* [PATCH v4 0/8] drm/i915/ttm: Evict and restore of compressed object
From: Ramalingam C @ 2022-03-19 20:42 UTC
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld

On Xe-HP and later devices, we use dedicated compression control
state (CCS) stored in local memory for each surface, to support
the 3D and media compression formats.

The memory required for the CCS of the entire local memory is
1/256 of the local memory size. So before the kernel boots, the
required memory is reserved for the CCS data, and a secure register
is programmed with the CCS base address.

So when we allocate an object in local memory, we don't need to
explicitly allocate space for the CCS data. But when we evict the obj
into smem, holding the compression-related data along with the obj
requires smem space of obj_size + (obj_size / 256).

Hence when we create the smem backing for an obj that could also be
placed in lmem, we create it with that extra space.
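
For example, just applying the 1/256 ratio above: evicting a 64 MiB
lmem obj needs 64 MiB + 64 MiB / 256 = 64 MiB + 256 KiB of smem.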

When we swap out a local memory obj on a flat-ccs capable platform,
we need to capture the CCS data too, along with the main memory, and
restore it when we swap the content back in.

When an lmem object is swapped into a smem obj, the smem obj has the
extra pages required to hold the CCS data corresponding to the lmem
main memory. So the lmem main memory is copied into the initial pages
of the smem obj, and then the CCS data corresponding to that main
memory is copied into the subsequent pages of smem.

Swapin happens in exactly the reverse order: first the lmem main
memory is restored from the smem obj's initial pages, and then the
CCS data is restored from the subsequent pages of smem.
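
The resulting page split can be sketched with a couple of standalone
helpers (helper names are made up for this cover letter; the series
computes the same split in i915_gem_ttm.c and intel_migrate.c):

	#define PAGE_SIZE		4096
	#define NUM_BYTES_PER_CCS_BYTE	256

	/* main memory fills smem pages [0, obj_size / PAGE_SIZE) */
	static unsigned long ccs_start_page(unsigned long obj_size)
	{
		return obj_size / PAGE_SIZE;
	}

	/* CCS data occupies the pages after that */
	static unsigned long ccs_page_count(unsigned long obj_size)
	{
		unsigned long ccs_bytes =
			(obj_size + NUM_BYTES_PER_CCS_BYTE - 1) /
			NUM_BYTES_PER_CCS_BYTE;

		return (ccs_bytes + PAGE_SIZE - 1) / PAGE_SIZE;
	}

So a 64 MiB obj has its main memory in pages 0..16383 and its CCS
data in the 64 pages (256 KiB) that follow.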

Extracting and restoring the CCS data is done through a special cmd
called XY_CTRL_SURF_COPY_BLT.

v4:
  Inflate the ttm_tt only when the obj is lmem-only.
  Back to XY_CTRL_SURF_COPY_BLT to clear the ccs, as FAST_CLEAR_0
  failed.
  Add a selftest for testing the ccs clearing.

Test-with: 20220314051432.15785-1-ramalingam.c@intel.com

Ramalingam C (8):
  drm/i915/gt: Use XY_FAST_COLOR_BLT to clear obj on graphics ver 12+
  drm/i915/gt: Clear compress metadata for Flat-ccs objects
  drm/i915/selftest_migrate: Consider the possible roundup of size
  drm/i915/selftest_migrate: Check CCS meta data clear
  drm/i915/gt: Optimize the migration loop
  drm/ttm: Add a parameter to add extra pages into ttm_tt
  drm/i915/gem: Add extra pages in ttm_tt for ccs data
  drm/i915/migrate: Evict and restore the flatccs capable lmem obj

 drivers/gpu/drm/drm_gem_vram_helper.c        |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c      |  29 +-
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  20 +
 drivers/gpu/drm/i915/gt/intel_migrate.c      | 404 +++++++++++++++++--
 drivers/gpu/drm/i915/gt/selftest_migrate.c   | 277 +++++++++++--
 drivers/gpu/drm/qxl/qxl_ttm.c                |   2 +-
 drivers/gpu/drm/ttm/ttm_agp_backend.c        |   2 +-
 drivers/gpu/drm/ttm/ttm_tt.c                 |  12 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c   |   2 +-
 include/drm/ttm/ttm_tt.h                     |   4 +-
 10 files changed, 688 insertions(+), 66 deletions(-)

-- 
2.20.1


* [PATCH v4 1/8] drm/i915/gt: Use XY_FAST_COLOR_BLT to clear obj on graphics ver 12+
From: Ramalingam C @ 2022-03-19 20:42 UTC
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld

XY_FAST_COLOR_BLT cmd is faster than the older XY_COLOR_BLT. Hence use
the faster cmd for clearing (zeroing out) the pages of a newly
allocated object.

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  5 +++
 drivers/gpu/drm/i915/gt/intel_migrate.c      | 43 +++++++++++++++++---
 2 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index d112ffd56418..925e55b6a94f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -205,6 +205,11 @@
 
 #define COLOR_BLT_CMD			(2 << 29 | 0x40 << 22 | (5 - 2))
 #define XY_COLOR_BLT_CMD		(2 << 29 | 0x50 << 22)
+#define XY_FAST_COLOR_BLT_CMD		(2 << 29 | 0x44 << 22)
+#define   XY_FAST_COLOR_BLT_DEPTH_32	(2 << 19)
+#define   XY_FAST_COLOR_BLT_DW		16
+#define   XY_FAST_COLOR_BLT_MOCS_MASK	GENMASK(27, 21)
+#define   XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT 31
 #define SRC_COPY_BLT_CMD		(2 << 29 | 0x43 << 22)
 #define GEN9_XY_FAST_COPY_BLT_CMD	(2 << 29 | 0x42 << 22)
 #define XY_SRC_COPY_BLT_CMD		(2 << 29 | 0x53 << 22)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 20444d6ceb3c..73199ebf0671 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -614,20 +614,53 @@ intel_context_migrate_copy(struct intel_context *ce,
 	return err;
 }
 
-static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
+static int emit_clear(struct i915_request *rq, u64 offset, int size,
+		      u32 value, bool is_lmem)
 {
-	const int ver = GRAPHICS_VER(rq->engine->i915);
+	struct drm_i915_private *i915 = rq->engine->i915;
+	int mocs = rq->engine->gt->mocs.uc_index << 1;
+	const int ver = GRAPHICS_VER(i915);
+	int ring_sz;
 	u32 *cs;
 
 	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
 
 	offset += (u64)rq->engine->instance << 32;
 
-	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
+	if (ver >= 12)
+		ring_sz = 16;
+	else if (ver >= 8)
+		ring_sz = 8;
+	else
+		ring_sz = 6;
+
+	cs = intel_ring_begin(rq, ring_sz);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	if (ver >= 8) {
+	if (ver >= 12) {
+		*cs++ = XY_FAST_COLOR_BLT_CMD | XY_FAST_COLOR_BLT_DEPTH_32 |
+			(XY_FAST_COLOR_BLT_DW - 2);
+		*cs++ = FIELD_PREP(XY_FAST_COLOR_BLT_MOCS_MASK, mocs) |
+			(PAGE_SIZE - 1);
+		*cs++ = 0;
+		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
+		*cs++ = lower_32_bits(offset);
+		*cs++ = upper_32_bits(offset);
+		*cs++ = !is_lmem << XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT;
+		/* BG7 */
+		*cs++ = value;
+		*cs++ = 0;
+		*cs++ = 0;
+		*cs++ = 0;
+		/* BG11 */
+		*cs++ = 0;
+		*cs++ = 0;
+		/* BG13 */
+		*cs++ = 0;
+		*cs++ = 0;
+		*cs++ = 0;
+	} else if (ver >= 8) {
 		*cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (7 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
 		*cs++ = 0;
@@ -711,7 +744,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_clear(rq, offset, len, value);
+		err = emit_clear(rq, offset, len, value, is_lmem);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-- 
2.20.1


* [PATCH v4 2/8] drm/i915/gt: Clear compress metadata for Flat-ccs objects
From: Ramalingam C @ 2022-03-19 20:42 UTC
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld

Xe-HP and later devices support Flat CCS, which reserves a portion of
device memory to store compression metadata. While clearing a device
memory buffer object, we also need to clear the associated CCS buffer.

XY_CTRL_SURF_COPY_BLT is a BLT cmd used for reading and writing the
CCS surface of an lmem object. So on Flat-CCS capable platforms we use
XY_CTRL_SURF_COPY_BLT to clear the CCS metadata.
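
As a worked example of the sizing done by calc_ctrl_surf_instr_size()
below, for one CHUNK_SZ (8M) chunk: CCS bytes = 8M / 256 = 32K;
blocks = 32K / 256 = 128; since 128 <= 1024 blocks fit in a single
transfer, that is one XY_CTRL_SURF_COPY_BLT of XY_CTRL_SURF_INSTR_SIZE
(5) dwords, plus two MI_FLUSH_DW of MI_FLUSH_DW_SIZE (3) dwords each:
5 + 6 = 11 dwords in the ring.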

v2: Fixed issues with platform naming [Lucas]
v3: Rebased [Ram]
    Used the round_up funcs [Bob]
v4: Fixed ccs blk calculation [Ram]
    Added Kdoc on flat-ccs.
v5: GENMASK is used [Matt]
    mocs fix [Matt]
    Comments Fix [Matt]
    Flush address programming [Ram]
v6: FLUSH_DW is fixed
    Few coding style fix
v7: Adopting the XY_FAST_COLOR_BLT [Thomas]
v8: XY_CTRL_SURF_COPY_BLT for ccs clearing.

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  15 ++
 drivers/gpu/drm/i915/gt/intel_migrate.c      | 138 ++++++++++++++++++-
 2 files changed, 150 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index 925e55b6a94f..6b4eb7927ec7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -153,8 +153,10 @@
 #define   MI_FLUSH_DW_PROTECTED_MEM_EN	(1 << 22)
 #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
 #define   MI_INVALIDATE_TLB		(1<<18)
+#define   MI_FLUSH_DW_CCS		(1<<16)
 #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
 #define   MI_FLUSH_DW_OP_MASK		(3<<14)
+#define   MI_FLUSH_DW_LLC		(1<<9)
 #define   MI_FLUSH_DW_NOTIFY		(1<<8)
 #define   MI_INVALIDATE_BSD		(1<<7)
 #define   MI_FLUSH_DW_USE_GTT		(1<<2)
@@ -203,6 +205,19 @@
 #define GFX_OP_DRAWRECT_INFO     ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
 #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
 
+#define XY_CTRL_SURF_INSTR_SIZE		5
+#define MI_FLUSH_DW_SIZE		3
+#define XY_CTRL_SURF_COPY_BLT		((2 << 29) | (0x48 << 22) | 3)
+#define   SRC_ACCESS_TYPE_SHIFT		21
+#define   DST_ACCESS_TYPE_SHIFT		20
+#define   CCS_SIZE_MASK			GENMASK(17, 8)
+#define   XY_CTRL_SURF_MOCS_MASK	GENMASK(31, 25)
+#define   NUM_CCS_BYTES_PER_BLOCK	256
+#define   NUM_BYTES_PER_CCS_BYTE	256
+#define   NUM_CCS_BLKS_PER_XFER		1024
+#define   INDIRECT_ACCESS		0
+#define   DIRECT_ACCESS			1
+
 #define COLOR_BLT_CMD			(2 << 29 | 0x40 << 22 | (5 - 2))
 #define XY_COLOR_BLT_CMD		(2 << 29 | 0x50 << 22)
 #define XY_FAST_COLOR_BLT_CMD		(2 << 29 | 0x44 << 22)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 73199ebf0671..c1db8daf994a 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -16,7 +16,8 @@ struct insert_pte_data {
 };
 
 #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
-
+#define GET_CCS_BYTES(i915, size)	(HAS_FLAT_CCS(i915) ? \
+					 DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0)
 static bool engine_supports_migration(struct intel_engine_cs *engine)
 {
 	if (!engine)
@@ -467,6 +468,110 @@ static bool wa_1209644611_applies(int ver, u32 size)
 	return height % 4 == 3 && height <= 8;
 }
 
+/**
+ * DOC: Flat-CCS - Memory compression for Local memory
+ *
+ * On Xe-HP and later devices, we use dedicated compression control state (CCS)
+ * stored in local memory for each surface, to support the 3D and media
+ * compression formats.
+ *
+ * The memory required for the CCS of the entire local memory is 1/256 of the
+ * local memory size. So before the kernel boot, the required memory is reserved
+ * for the CCS data and a secure register will be programmed with the CCS base
+ * address.
+ *
+ * Flat CCS data needs to be cleared when an lmem object is allocated.
+ * And CCS data can be copied in and out of the CCS region through
+ * XY_CTRL_SURF_COPY_BLT. The CPU can't access the CCS data directly.
+ *
+ * When we exhaust the lmem, if the object's placements support smem, then
+ * we can directly decompress the compressed lmem object into smem and
+ * start using it from smem itself.
+ *
+ * But when we need to swapout the compressed lmem object into a smem
+ * region even though the object's placement doesn't support smem, then we
+ * copy the lmem content as it is into the smem region along with the ccs
+ * data (using XY_CTRL_SURF_COPY_BLT). When the object is referenced, the
+ * lmem content will be swapped in along with restoration of the CCS data
+ * (using XY_CTRL_SURF_COPY_BLT) at the corresponding location.
+ */
+
+static inline u32 *i915_flush_dw(u32 *cmd, u32 flags)
+{
+	*cmd++ = MI_FLUSH_DW | flags;
+	*cmd++ = 0;
+	*cmd++ = 0;
+
+	return cmd;
+}
+
+static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
+{
+	u32 num_cmds, num_blks, total_size;
+
+	if (!GET_CCS_BYTES(i915, size))
+		return 0;
+
+	/*
+	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256-byte
+	 * blocks. One XY_CTRL_SURF_COPY_BLT command can
+	 * transfer up to 1024 blocks.
+	 */
+	num_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
+				NUM_CCS_BYTES_PER_BLOCK);
+	num_cmds = DIV_ROUND_UP(num_blks, NUM_CCS_BLKS_PER_XFER);
+	total_size = XY_CTRL_SURF_INSTR_SIZE * num_cmds;
+
+	/*
+	 * Adding a flush before and after XY_CTRL_SURF_COPY_BLT
+	 */
+	total_size += 2 * MI_FLUSH_DW_SIZE;
+
+	return total_size;
+}
+
+static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
+				     u8 src_mem_access, u8 dst_mem_access,
+				     int src_mocs, int dst_mocs,
+				     u32 ccs_blocks)
+{
+	/*
+	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
+	 * data in and out of the CCS region.
+	 *
+	 * We can copy at most 1024 blocks of 256 bytes using one
+	 * XY_CTRL_SURF_COPY_BLT instruction.
+	 *
+	 * In case we need to copy more than 1024 blocks, we need to add
+	 * another instruction to the same batch buffer.
+	 *
+	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
+	 *
+	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
+	 */
+	do {
+		int blks_per_copy;
+
+		blks_per_copy = ccs_blocks >= NUM_CCS_BLKS_PER_XFER ?
+				NUM_CCS_BLKS_PER_XFER : ccs_blocks;
+		*cmd++ = XY_CTRL_SURF_COPY_BLT |
+			 src_mem_access << SRC_ACCESS_TYPE_SHIFT |
+			 dst_mem_access << DST_ACCESS_TYPE_SHIFT |
+			 FIELD_PREP(CCS_SIZE_MASK, blks_per_copy - 1);
+		*cmd++ = lower_32_bits(src_addr);
+		*cmd++ = (upper_32_bits(src_addr) & 0xFFFF) |
+			  FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, src_mocs);
+		*cmd++ = lower_32_bits(dst_addr);
+		*cmd++ = (upper_32_bits(dst_addr) & 0xFFFF) |
+			  FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, dst_mocs);
+		src_addr += SZ_64M;
+		dst_addr += SZ_64M;
+		ccs_blocks -= blks_per_copy;
+	} while (ccs_blocks > 0);
+
+	return cmd;
+}
+
 static int emit_copy(struct i915_request *rq,
 		     u32 dst_offset, u32 src_offset, int size)
 {
@@ -618,8 +723,9 @@ static int emit_clear(struct i915_request *rq, u64 offset, int size,
 		      u32 value, bool is_lmem)
 {
 	struct drm_i915_private *i915 = rq->engine->i915;
-	int mocs = rq->engine->gt->mocs.uc_index << 1;
+	u32 mocs = rq->engine->gt->mocs.uc_index << 1;
 	const int ver = GRAPHICS_VER(i915);
+	u32 num_ccs_blks, ccs_ring_size = 0;
 	int ring_sz;
 	u32 *cs;
 
@@ -634,7 +740,12 @@ static int emit_clear(struct i915_request *rq, u64 offset, int size,
 	else
 		ring_sz = 6;
 
-	cs = intel_ring_begin(rq, ring_sz);
+	/* Clear CCS only when value is 0 */
+	ccs_ring_size = (HAS_FLAT_CCS(i915) && is_lmem && !value) ?
+			 calc_ctrl_surf_instr_size(i915, size) : 0;
+	ring_sz += ccs_ring_size;
+
+	cs = intel_ring_begin(rq, round_up(ring_sz, 2));
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
@@ -679,6 +790,27 @@ static int emit_clear(struct i915_request *rq, u64 offset, int size,
 		*cs++ = value;
 	}
 
+	if (ccs_ring_size) {
+		num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
+					    NUM_CCS_BYTES_PER_BLOCK);
+
+		/*
+		 * Flat CCS surface can only be accessed via
+		 * XY_CTRL_SURF_COPY_BLT CMD and using indirect
+		 * mapping of associated LMEM.
+		 * We can clear ccs surface by writing all 0s,
+		 * so we will flush the previously cleared buffer
+		 * and use it as a source.
+		 */
+		cs = i915_flush_dw(cs, MI_FLUSH_DW_LLC | MI_FLUSH_DW_CCS);
+		cs = _i915_ctrl_surf_copy_blt(cs, offset, offset,
+					      DIRECT_ACCESS, INDIRECT_ACCESS,
+					      mocs, mocs, num_ccs_blks);
+		cs = i915_flush_dw(cs, MI_FLUSH_DW_LLC | MI_FLUSH_DW_CCS);
+	}
+	if (ring_sz & 1)
+		*cs++ = MI_NOOP;
+
 	intel_ring_advance(rq, cs);
 	return 0;
 }
-- 
2.20.1


* [PATCH v4 3/8] drm/i915/selftest_migrate: Consider the possible roundup of size
From: Ramalingam C @ 2022-03-19 20:42 UTC
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld

At obj allocation, the size could be rounded up by the region's
min_page_size. So update the size as per the allocated obj size.
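
For instance (illustrative numbers, not from the series): on a region
with a 64K min_page_size, a 16K request is backed by a 64K object, so
the selftest must verify against obj->base.size (64K) rather than the
requested sz (16K).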

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_migrate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
index c9c4f391c5cc..b5da8b8cd039 100644
--- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
+++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
@@ -152,6 +152,9 @@ static int clear(struct intel_migrate *migrate,
 	if (IS_ERR(obj))
 		return 0;
 
+	/* Consider the rounded up memory too */
+	sz = obj->base.size;
+
 	for_i915_gem_ww(&ww, err, true) {
 		err = i915_gem_object_lock(obj, &ww);
 		if (err)
-- 
2.20.1


* [PATCH v4 4/8] drm/i915/selftest_migrate: Check CCS meta data clear
From: Ramalingam C @ 2022-03-19 20:42 UTC
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld

While clearing a Flat-CCS capable lmem object, we need to clear the
CCS metadata corresponding to that memory.

As part of live_migrate_clear, add a check for the CCS metadata clear
of a Flat-CCS capable lmem object.
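
In outline, the added check does roughly the following (a summary of
the selftest flow in the diff below, not additional code):

  1. Fill the obj's main memory with a pattern and copy it into the
     CCS surface (intel_migrate_ccs_copy, write_to_ccs = true).
  2. Overwrite the main memory, read the CCS surface back out
     (write_to_ccs = false) and verify the pattern round-tripped.
  3. Run the clear under test, then read the CCS surface back again
     and verify it is all zeros.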

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c    |  32 +++
 drivers/gpu/drm/i915/gt/selftest_migrate.c | 274 ++++++++++++++++++---
 2 files changed, 278 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index c1db8daf994a..bbfea570c239 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -572,6 +572,38 @@ static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
 	return cmd;
 }
 
+static int emit_copy_ccs(struct i915_request *rq,
+			 u32 dst_offset, u8 dst_access,
+			 u32 src_offset, u8 src_access, int size)
+{
+	struct drm_i915_private *i915 = rq->engine->i915;
+	int mocs = rq->engine->gt->mocs.uc_index << 1;
+	u32 num_ccs_blks, ccs_ring_size;
+	u32 *cs;
+
+	ccs_ring_size = calc_ctrl_surf_instr_size(i915, size);
+	WARN_ON(!ccs_ring_size);
+
+	cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2));
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
+				    NUM_CCS_BYTES_PER_BLOCK);
+
+	cs = i915_flush_dw(cs, MI_FLUSH_DW_LLC | MI_FLUSH_DW_CCS);
+	cs = _i915_ctrl_surf_copy_blt(cs, src_offset, dst_offset,
+				      src_access, dst_access,
+				      mocs, mocs, num_ccs_blks);
+	cs = i915_flush_dw(cs, MI_FLUSH_DW_LLC | MI_FLUSH_DW_CCS);
+	if (ccs_ring_size & 1)
+		*cs++ = MI_NOOP;
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
 static int emit_copy(struct i915_request *rq,
 		     u32 dst_offset, u32 src_offset, int size)
 {
diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
index b5da8b8cd039..e32cc994f4a2 100644
--- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
+++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
@@ -132,6 +132,126 @@ static int copy(struct intel_migrate *migrate,
 	return err;
 }
 
+static int intel_context_copy_ccs(struct intel_context *ce,
+				  const struct i915_deps *deps,
+				  struct scatterlist *sg,
+				  enum i915_cache_level cache_level,
+				  bool write_to_ccs,
+				  struct i915_request **out)
+{
+	u8 src_access = write_to_ccs ? DIRECT_ACCESS : INDIRECT_ACCESS;
+	u8 dst_access = write_to_ccs ? INDIRECT_ACCESS : DIRECT_ACCESS;
+	struct sgt_dma it = sg_sgt(sg);
+	struct i915_request *rq;
+	u32 offset;
+	int err;
+
+	GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
+	*out = NULL;
+
+	GEM_BUG_ON(ce->ring->size < SZ_64K);
+
+	offset = 0;
+	if (HAS_64K_PAGES(ce->engine->i915))
+		offset = CHUNK_SZ;
+
+	do {
+		int len;
+
+		rq = i915_request_create(ce);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto out_ce;
+		}
+
+		if (deps) {
+			err = i915_request_await_deps(rq, deps);
+			if (err)
+				goto out_rq;
+
+			if (rq->engine->emit_init_breadcrumb) {
+				err = rq->engine->emit_init_breadcrumb(rq);
+				if (err)
+					goto out_rq;
+			}
+
+			deps = NULL;
+		}
+
+		/* The PTE updates + copy must not be interrupted. */
+		err = emit_no_arbitration(rq);
+		if (err)
+			goto out_rq;
+
+		len = emit_pte(rq, &it, cache_level, true, offset, CHUNK_SZ);
+		if (len <= 0) {
+			err = len;
+			goto out_rq;
+		}
+
+		err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
+		if (err)
+			goto out_rq;
+
+		err = emit_copy_ccs(rq, offset, dst_access,
+				    offset, src_access, len);
+		if (err)
+			goto out_rq;
+
+		err = rq->engine->emit_flush(rq, EMIT_INVALIDATE |
+					     MI_FLUSH_DW_CCS);
+
+		/* Arbitration is re-enabled between requests. */
+out_rq:
+		if (*out)
+			i915_request_put(*out);
+		*out = i915_request_get(rq);
+		i915_request_add(rq);
+		if (err || !it.sg || !sg_dma_len(it.sg))
+			break;
+
+		cond_resched();
+	} while (1);
+
+out_ce:
+	return err;
+}
+
+static int
+intel_migrate_ccs_copy(struct intel_migrate *m,
+		       struct i915_gem_ww_ctx *ww,
+		       const struct i915_deps *deps,
+		       struct scatterlist *sg,
+		       enum i915_cache_level cache_level,
+		       bool write_to_ccs,
+		       struct i915_request **out)
+{
+	struct intel_context *ce;
+	int err;
+
+	*out = NULL;
+	if (!m->context)
+		return -ENODEV;
+
+	ce = intel_migrate_create_context(m);
+	if (IS_ERR(ce))
+		ce = intel_context_get(m->context);
+	GEM_BUG_ON(IS_ERR(ce));
+
+	err = intel_context_pin_ww(ce, ww);
+	if (err)
+		goto out;
+
+	err = intel_context_copy_ccs(ce, deps, sg, cache_level,
+				     write_to_ccs, out);
+
+	intel_context_unpin(ce);
+out:
+	intel_context_put(ce);
+	return err;
+}
+
 static int clear(struct intel_migrate *migrate,
 		 int (*fn)(struct intel_migrate *migrate,
 			   struct i915_gem_ww_ctx *ww,
@@ -144,7 +264,8 @@ static int clear(struct intel_migrate *migrate,
 	struct drm_i915_gem_object *obj;
 	struct i915_request *rq;
 	struct i915_gem_ww_ctx ww;
-	u32 *vaddr;
+	u32 *vaddr, val = 0;
+	bool ccs_cap = false;
 	int err = 0;
 	int i;
 
@@ -155,7 +276,12 @@ static int clear(struct intel_migrate *migrate,
 	/* Consider the rounded up memory too */
 	sz = obj->base.size;
 
+	if (HAS_FLAT_CCS(i915) && i915_gem_object_is_lmem(obj))
+		ccs_cap = true;
+
 	for_i915_gem_ww(&ww, err, true) {
+		int ccs_bytes = GET_CCS_BYTES(i915, sz);
+
 		err = i915_gem_object_lock(obj, &ww);
 		if (err)
 			continue;
@@ -170,44 +296,136 @@ static int clear(struct intel_migrate *migrate,
 			vaddr[i] = ~i;
 		i915_gem_object_flush_map(obj);
 
-		err = fn(migrate, &ww, obj, sz, &rq);
-		if (!err)
-			continue;
+		if (ccs_cap && !val) {
+			/* Write the obj data into ccs surface */
+			err = intel_migrate_ccs_copy(migrate, &ww, NULL,
+						     obj->mm.pages->sgl,
+						     obj->cache_level,
+						     true, &rq);
+			if (rq && !err) {
+				if (i915_request_wait(rq, 0, HZ) < 0) {
+					pr_err("%ps timed out, size: %u\n",
+					       fn, sz);
+					err = -ETIME;
+				}
+				i915_request_put(rq);
+				rq = NULL;
+			}
+			if (err)
+				continue;
+
+			for (i = 0; i < sz / sizeof(u32); i++)
+				vaddr[i] = 0x5a5a5a5a;
+			i915_gem_object_flush_map(obj);
+
+			err = intel_migrate_ccs_copy(migrate, &ww, NULL, obj->mm.pages->sgl,
+						     obj->cache_level, false, &rq);
+			if (rq && !err) {
+				if (i915_request_wait(rq, 0, HZ) < 0) {
+					pr_err("%ps timed out, size: %u\n",
+					       fn, sz);
+					err = -ETIME;
+				}
+				i915_request_put(rq);
+				rq = NULL;
+			}
+			if (err)
+				continue;
+
+			i915_gem_object_flush_map(obj);
+			for (i = 0; !err && i < ccs_bytes / sizeof(u32); i++) {
+				if (vaddr[i] != ~i) {
+					pr_err("%ps ccs write and read failed, offset: %d\n",
+					       fn, i);
+					err = -EINVAL;
+				}
+			}
+			if (err)
+				continue;
+
+			i915_gem_object_flush_map(obj);
+		}
 
-		if (err != -EDEADLK && err != -EINTR && err != -ERESTARTSYS)
-			pr_err("%ps failed, size: %u\n", fn, sz);
-		if (rq) {
-			i915_request_wait(rq, 0, HZ);
+		err = fn(migrate, &ww, obj, val, &rq);
+		if (rq && !err) {
+			if (i915_request_wait(rq, 0, HZ) < 0) {
+				pr_err("%ps timed out, size: %u\n", fn, sz);
+				err = -ETIME;
+			}
 			i915_request_put(rq);
+			rq = NULL;
 		}
-		i915_gem_object_unpin_map(obj);
-	}
-	if (err)
-		goto err_out;
+		if (err)
+			continue;
 
-	if (rq) {
-		if (i915_request_wait(rq, 0, HZ) < 0) {
-			pr_err("%ps timed out, size: %u\n", fn, sz);
-			err = -ETIME;
+		i915_gem_object_flush_map(obj);
+
+		/* Verify the set/clear of the obj mem */
+		for (i = 0; !err && i < sz / PAGE_SIZE; i++) {
+			int x = i * 1024 +
+				i915_prandom_u32_max_state(1024, prng);
+
+			if (vaddr[x] != val) {
+				pr_err("%ps failed, (%u != %u), offset: %zu\n",
+				       fn, vaddr[x], val,  x * sizeof(u32));
+				igt_hexdump(vaddr + i * 1024, 4096);
+				err = -EINVAL;
+			}
 		}
-		i915_request_put(rq);
-	}
+		if (err)
+			continue;
 
-	for (i = 0; !err && i < sz / PAGE_SIZE; i++) {
-		int x = i * 1024 + i915_prandom_u32_max_state(1024, prng);
+		if (ccs_cap && !val) {
+			for (i = 0; i < sz / sizeof(u32); i++)
+				vaddr[i] = ~i;
+			i915_gem_object_flush_map(obj);
+
+			err = intel_migrate_ccs_copy(migrate, &ww, NULL,
+						     obj->mm.pages->sgl,
+						     obj->cache_level,
+						     false, &rq);
+			if (rq && !err) {
+				if (i915_request_wait(rq, 0, HZ) < 0) {
+					pr_err("%ps timed out, size: %u\n",
+					       fn, sz);
+					err = -ETIME;
+				}
+				i915_request_put(rq);
+				rq = NULL;
+			}
+			if (err)
+				continue;
+
+			ccs_bytes = GET_CCS_BYTES(i915, sz);
+			i915_gem_object_flush_map(obj);
+			for (i = 0; !err && i < ccs_bytes / sizeof(u32); i++) {
+				if (vaddr[i]) {
+					pr_err("%ps ccs clearing failed, offset: %d/%lu\n",
+					       fn, i, (ccs_bytes / sizeof(u32)) -  1);
+					igt_hexdump(vaddr + i, ccs_bytes - i * sizeof(u32));
+					err = -EINVAL;
+				}
+			}
+			if (err)
+				continue;
+		}
+		i915_gem_object_unpin_map(obj);
+	}
 
-		if (vaddr[x] != sz) {
-			pr_err("%ps failed, size: %u, offset: %zu\n",
-			       fn, sz, x * sizeof(u32));
-			igt_hexdump(vaddr + i * 1024, 4096);
-			err = -EINVAL;
+	if (err) {
+		if (err != -EDEADLK && err != -EINTR && err != -ERESTARTSYS)
+			pr_err("%ps failed, size: %u\n", fn, sz);
+		if (rq && err != -EINVAL) {
+			i915_request_wait(rq, 0, HZ);
+			i915_request_put(rq);
 		}
+
+		i915_gem_object_unpin_map(obj);
+	} else {
+		pr_debug("%ps Passed. size: %u\n", fn, sz);
 	}
 
-	i915_gem_object_unpin_map(obj);
-err_out:
 	i915_gem_object_put(obj);
-
 	return err;
 }
 
-- 
2.20.1


* [PATCH v4 5/8] drm/i915/gt: Optimize the migration loop
From: Ramalingam C @ 2022-03-19 20:42 UTC
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld

Move the loop-invariant offset calculations out of the migration loop.

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 34 ++++++++++++-------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index bbfea570c239..b6c5a0102bc2 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -663,6 +663,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 			   struct i915_request **out)
 {
 	struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst);
+	u32 src_offset, dst_offset;
 	struct i915_request *rq;
 	int err;
 
@@ -671,8 +672,20 @@ intel_context_migrate_copy(struct intel_context *ce,
 
 	GEM_BUG_ON(ce->ring->size < SZ_64K);
 
+	src_offset = 0;
+	dst_offset = CHUNK_SZ;
+	if (HAS_64K_PAGES(ce->engine->i915)) {
+		GEM_BUG_ON(!src_is_lmem && !dst_is_lmem);
+
+		src_offset = 0;
+		dst_offset = 0;
+		if (src_is_lmem)
+			src_offset = CHUNK_SZ;
+		if (dst_is_lmem)
+			dst_offset = 2 * CHUNK_SZ;
+	}
+
 	do {
-		u32 src_offset, dst_offset;
 		int len;
 
 		rq = i915_request_create(ce);
@@ -700,19 +713,6 @@ intel_context_migrate_copy(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		src_offset = 0;
-		dst_offset = CHUNK_SZ;
-		if (HAS_64K_PAGES(ce->engine->i915)) {
-			GEM_BUG_ON(!src_is_lmem && !dst_is_lmem);
-
-			src_offset = 0;
-			dst_offset = 0;
-			if (src_is_lmem)
-				src_offset = CHUNK_SZ;
-			if (dst_is_lmem)
-				dst_offset = 2 * CHUNK_SZ;
-		}
-
 		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
 			       src_offset, CHUNK_SZ);
 		if (len <= 0) {
@@ -722,12 +722,10 @@ intel_context_migrate_copy(struct intel_context *ce,
 
 		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
 			       dst_offset, len);
-		if (err < 0)
-			goto out_rq;
-		if (err < len) {
+		if (err < len)
 			err = -EINVAL;
+		if (err < 0)
 			goto out_rq;
-		}
 
 		err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
 		if (err)
-- 
2.20.1


* [PATCH v4 6/8] drm/ttm: Add a parameter to add extra pages into ttm_tt
  2022-03-19 20:42 ` [Intel-gfx] " Ramalingam C
@ 2022-03-19 20:42   ` Ramalingam C
  -1 siblings, 0 replies; 33+ messages in thread
From: Ramalingam C @ 2022-03-19 20:42 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Thomas Hellstrom, Hellstrom Thomas, Matthew Auld, Christian Koenig

Add a parameter called "extra_pages" to ttm_tt_init, to indicate that
the driver needs extra pages in the ttm_tt.
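
A minimal usage sketch (hypothetical driver hook; only the trailing
extra-pages argument is new in this patch):

	static struct ttm_tt *my_ttm_tt_create(struct ttm_buffer_object *bo,
					       uint32_t page_flags)
	{
		struct ttm_tt *tt = kzalloc(sizeof(*tt), GFP_KERNEL);

		if (!tt)
			return NULL;

		/* Size the ttm_tt for the bo plus one extra page. */
		if (ttm_tt_init(tt, bo, page_flags, ttm_cached, 1)) {
			kfree(tt);
			return NULL;
		}

		return tt;
	}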

v2:
  Used imperative wording [Thomas and Christian]

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Christian Koenig <christian.koenig@amd.com>
cc: Hellstrom Thomas <thomas.hellstrom@intel.com>
Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/drm_gem_vram_helper.c      |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c    |  2 +-
 drivers/gpu/drm/qxl/qxl_ttm.c              |  2 +-
 drivers/gpu/drm/ttm/ttm_agp_backend.c      |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c               | 12 +++++++-----
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |  2 +-
 include/drm/ttm/ttm_tt.h                   |  4 +++-
 7 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
index dc7f938bfff2..123045b58fec 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -867,7 +867,7 @@ static struct ttm_tt *bo_driver_ttm_tt_create(struct ttm_buffer_object *bo,
 	if (!tt)
 		return NULL;
 
-	ret = ttm_tt_init(tt, bo, page_flags, ttm_cached);
+	ret = ttm_tt_init(tt, bo, page_flags, ttm_cached, 0);
 	if (ret < 0)
 		goto err_ttm_tt_init;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index e4a06fcf741a..3b9f99c765c4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -290,7 +290,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 		i915_tt->is_shmem = true;
 	}
 
-	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching);
+	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, 0);
 	if (ret)
 		goto err_free;
 
diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index b2e33d5ba5d0..52156b54498f 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -113,7 +113,7 @@ static struct ttm_tt *qxl_ttm_tt_create(struct ttm_buffer_object *bo,
 	ttm = kzalloc(sizeof(struct ttm_tt), GFP_KERNEL);
 	if (ttm == NULL)
 		return NULL;
-	if (ttm_tt_init(ttm, bo, page_flags, ttm_cached)) {
+	if (ttm_tt_init(ttm, bo, page_flags, ttm_cached, 0)) {
 		kfree(ttm);
 		return NULL;
 	}
diff --git a/drivers/gpu/drm/ttm/ttm_agp_backend.c b/drivers/gpu/drm/ttm/ttm_agp_backend.c
index 6ddc16f0fe2b..d27691f2e451 100644
--- a/drivers/gpu/drm/ttm/ttm_agp_backend.c
+++ b/drivers/gpu/drm/ttm/ttm_agp_backend.c
@@ -134,7 +134,7 @@ struct ttm_tt *ttm_agp_tt_create(struct ttm_buffer_object *bo,
 	agp_be->mem = NULL;
 	agp_be->bridge = bridge;
 
-	if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined)) {
+	if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined, 0)) {
 		kfree(agp_be);
 		return NULL;
 	}
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index d234aab800a0..1a66d9fc589a 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -134,9 +134,10 @@ void ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm)
 static void ttm_tt_init_fields(struct ttm_tt *ttm,
 			       struct ttm_buffer_object *bo,
 			       uint32_t page_flags,
-			       enum ttm_caching caching)
+			       enum ttm_caching caching,
+			       unsigned long extra_pages)
 {
-	ttm->num_pages = PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT;
+	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
 	ttm->caching = ttm_cached;
 	ttm->page_flags = page_flags;
 	ttm->dma_address = NULL;
@@ -146,9 +147,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
 }
 
 int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
-		uint32_t page_flags, enum ttm_caching caching)
+		uint32_t page_flags, enum ttm_caching caching,
+		unsigned long extra_pages)
 {
-	ttm_tt_init_fields(ttm, bo, page_flags, caching);
+	ttm_tt_init_fields(ttm, bo, page_flags, caching, extra_pages);
 
 	if (ttm_tt_alloc_page_directory(ttm)) {
 		pr_err("Failed allocating page table\n");
@@ -180,7 +182,7 @@ int ttm_sg_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
 {
 	int ret;
 
-	ttm_tt_init_fields(ttm, bo, page_flags, caching);
+	ttm_tt_init_fields(ttm, bo, page_flags, caching, 0);
 
 	if (page_flags & TTM_TT_FLAG_EXTERNAL)
 		ret = ttm_sg_tt_alloc_page_directory(ttm);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index b84ecc6d6611..4e3938e62c08 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -517,7 +517,7 @@ static struct ttm_tt *vmw_ttm_tt_create(struct ttm_buffer_object *bo,
 				     ttm_cached);
 	else
 		ret = ttm_tt_init(&vmw_be->dma_ttm, bo, page_flags,
-				  ttm_cached);
+				  ttm_cached, 0);
 	if (unlikely(ret != 0))
 		goto out_no_init;
 
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index f20832139815..17a0310e8aaa 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -140,6 +140,7 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc);
  * @bo: The buffer object we create the ttm for.
  * @page_flags: Page flags as identified by TTM_TT_FLAG_XX flags.
  * @caching: the desired caching state of the pages
+ * @extra_pages: Extra pages needed by the driver.
  *
  * Create a struct ttm_tt to back data with system memory pages.
  * No pages are actually allocated.
@@ -147,7 +148,8 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc);
  * NULL: Out of memory.
  */
 int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
-		uint32_t page_flags, enum ttm_caching caching);
+		uint32_t page_flags, enum ttm_caching caching,
+		unsigned long extra_pages);
 int ttm_sg_tt_init(struct ttm_tt *ttm_dma, struct ttm_buffer_object *bo,
 		   uint32_t page_flags, enum ttm_caching caching);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 7/8] drm/i915/gem: Add extra pages in ttm_tt for ccs data
  2022-03-19 20:42 ` [Intel-gfx] " Ramalingam C
@ 2022-03-19 20:42   ` Ramalingam C
  -1 siblings, 0 replies; 33+ messages in thread
From: Ramalingam C @ 2022-03-19 20:42 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Thomas Hellström, Hellstrom Thomas, Matthew Auld, Christian Koenig

On Xe-HP and later devices, a dedicated compression control state (CCS)
stored in local memory is used for each surface, to support the 3D and
media compression formats.

The memory required for the CCS of the entire local memory is 1/256 of
the local memory size. So before the kernel boots, the required memory
is reserved for the CCS data, and a secure register is programmed with
the CCS base address.

So when an object is allocated in local memory, we don't need to
explicitly allocate space for the ccs data. But when the obj is evicted
into smem, extra space is needed in smem to hold the compression
related data along with the obj, i.e. obj_size + (obj_size/256).

Hence when smem pages are allocated for an obj that can have an lmem
placement, we allocate them with the extra pages required for the ccs
data of the obj size.
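
As a worked example (assuming NUM_BYTES_PER_CCS_BYTE == 256, matching
the 1/256 ratio above), a 1 MiB lmem-only object needs a single extra
smem page:

	unsigned long ccs_bytes = DIV_ROUND_UP(SZ_1M, NUM_BYTES_PER_CCS_BYTE);
	unsigned long ccs_pages = DIV_ROUND_UP(ccs_bytes, PAGE_SIZE);
	/* ccs_bytes = 4096, ccs_pages = 1 with 4K pages */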

v2:
  Used imperative wording [Thomas]
v3:
  Inflate the pages only when obj's placement is lmem only

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Christian Koenig <christian.koenig@amd.com>
cc: Hellstrom Thomas <thomas.hellstrom@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 29 ++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 3b9f99c765c4..0305a150b9d4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -20,6 +20,7 @@
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_move.h"
 #include "gem/i915_gem_ttm_pm.h"
+#include "gt/intel_gpu_commands.h"
 
 #define I915_TTM_PRIO_PURGE     0
 #define I915_TTM_PRIO_NO_PAGES  1
@@ -262,12 +263,33 @@ static const struct i915_refct_sgt_ops tt_rsgt_ops = {
 	.release = i915_ttm_tt_release
 };
 
+static inline bool
+i915_gem_object_needs_ccs_pages(struct drm_i915_gem_object *obj)
+{
+	bool lmem_placement = false;
+	int i;
+
+	for (i = 0; i < obj->mm.n_placements; i++) {
+		/* Compression is not allowed for the objects with smem placement */
+		if (obj->mm.placements[i]->type == INTEL_MEMORY_SYSTEM)
+			return false;
+		if (!lmem_placement &&
+		    obj->mm.placements[i]->type == INTEL_MEMORY_LOCAL)
+			lmem_placement = true;
+	}
+
+	return lmem_placement;
+}
+
 static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 					 uint32_t page_flags)
 {
+	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
+						     bdev);
 	struct ttm_resource_manager *man =
 		ttm_manager_type(bo->bdev, bo->resource->mem_type);
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
+	unsigned long ccs_pages = 0;
 	enum ttm_caching caching;
 	struct i915_ttm_tt *i915_tt;
 	int ret;
@@ -290,7 +312,12 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 		i915_tt->is_shmem = true;
 	}
 
-	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, 0);
+	if (HAS_FLAT_CCS(i915) && i915_gem_object_needs_ccs_pages(obj))
+		ccs_pages = DIV_ROUND_UP(DIV_ROUND_UP(bo->base.size,
+						      NUM_BYTES_PER_CCS_BYTE),
+					 PAGE_SIZE);
+
+	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, ccs_pages);
 	if (ret)
 		goto err_free;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 8/8] drm/i915/migrate: Evict and restore the flatccs capable lmem obj
  2022-03-19 20:42 ` [Intel-gfx] " Ramalingam C
@ 2022-03-19 20:42   ` Ramalingam C
  -1 siblings, 0 replies; 33+ messages in thread
From: Ramalingam C @ 2022-03-19 20:42 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Hellstrom Thomas, Matthew Auld

When swapping out a local memory obj on a flat-ccs capable platform, we
need to capture the ccs data too, along with the main memory, and we
need to restore it when swapping the content back in.

When an lmem object is swapped into a smem obj, the smem obj will have
the extra pages required to hold the ccs data corresponding to the lmem
main memory. So the main memory of the lmem obj is copied into the
initial pages of the smem obj, and then the ccs data corresponding to
the main memory is copied into the subsequent pages of smem. The ccs
data is 1/256 of the lmem size.

Swap-in happens in exactly the reverse order: first the lmem main
memory is restored from the smem's initial pages, and then the ccs data
is restored from the subsequent pages of smem.
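
A sketch of the resulting smem backing store layout (offsets inferred
from the description above, assuming the 1/256 ccs ratio):

	/*
	 * smem pages [0, N)        lmem main memory, N = obj_size / PAGE_SIZE
	 * smem pages [N, N + ccs)  ccs data, where
	 *                          ccs = DIV_ROUND_UP(obj_size / 256, PAGE_SIZE)
	 */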

Extracting and restoring the CCS data is done through a special cmd
called XY_CTRL_SURF_COPY_BLT.

v2: Fixing the ccs handling
v3: Handle the ccs data at same loop as main memory [Thomas]
v4: changes for emit_copy_ccs

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 163 +++++++++++++++++++++++-
 1 file changed, 159 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index b6c5a0102bc2..ddc7df3de9bc 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -651,6 +651,65 @@ static int emit_copy(struct i915_request *rq,
 	return 0;
 }
 
+static int scatter_list_length(struct scatterlist *sg)
+{
+	int len = 0;
+
+	while (sg && sg_dma_len(sg)) {
+		len += sg_dma_len(sg);
+		sg = sg_next(sg);
+	}
+
+	return len;
+}
+
+static void
+calculate_chunk_sz(struct drm_i915_private *i915, bool src_is_lmem,
+		   int *src_sz, int *ccs_sz, u32 bytes_to_cpy,
+		   u32 ccs_bytes_to_cpy)
+{
+	if (ccs_bytes_to_cpy) {
+		/*
+		 * We can only copy the ccs data corresponding to
+		 * the CHUNK_SZ of lmem, which is
+		 * GET_CCS_BYTES(i915, CHUNK_SZ)
+		 */
+		*ccs_sz = min_t(int, ccs_bytes_to_cpy, GET_CCS_BYTES(i915, CHUNK_SZ));
+
+		if (!src_is_lmem)
+			/*
+			 * When CHUNK_SZ is passed, all the pages up to
+			 * CHUNK_SZ will be taken for the blt. On a flat-ccs
+			 * capable platform the smem obj will have more pages
+			 * than required for the main memory, hence limit it
+			 * to the size required for the main memory.
+			 */
+			*src_sz = min_t(int, bytes_to_cpy, CHUNK_SZ);
+	} else { /* ccs handling is not required */
+		*src_sz = CHUNK_SZ;
+	}
+}
+
+static void get_ccs_sg_sgt(struct sgt_dma *it, u32 bytes_to_cpy)
+{
+	u32 len;
+
+	do {
+		GEM_BUG_ON(!it->sg || !sg_dma_len(it->sg));
+		len = it->max - it->dma;
+		if (len > bytes_to_cpy) {
+			it->dma += bytes_to_cpy;
+			break;
+		}
+
+		bytes_to_cpy -= len;
+
+		it->sg = __sg_next(it->sg);
+		it->dma = sg_dma_address(it->sg);
+		it->max = it->dma + sg_dma_len(it->sg);
+	} while (bytes_to_cpy);
+}
+
 int
 intel_context_migrate_copy(struct intel_context *ce,
 			   const struct i915_deps *deps,
@@ -662,9 +721,15 @@ intel_context_migrate_copy(struct intel_context *ce,
 			   bool dst_is_lmem,
 			   struct i915_request **out)
 {
-	struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst);
+	struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst), it_ccs;
+	struct drm_i915_private *i915 = ce->engine->i915;
+	u32 ccs_bytes_to_cpy = 0, bytes_to_cpy;
+	enum i915_cache_level ccs_cache_level;
+	int src_sz, dst_sz, ccs_sz;
 	u32 src_offset, dst_offset;
+	u8 src_access, dst_access;
 	struct i915_request *rq;
+	bool ccs_is_src;
 	int err;
 
 	GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
@@ -672,6 +737,38 @@ intel_context_migrate_copy(struct intel_context *ce,
 
 	GEM_BUG_ON(ce->ring->size < SZ_64K);
 
+	src_sz = scatter_list_length(src);
+	bytes_to_cpy = src_sz;
+
+	if (HAS_FLAT_CCS(i915) && src_is_lmem ^ dst_is_lmem) {
+		src_access = !src_is_lmem && dst_is_lmem;
+		dst_access = !src_access;
+
+		dst_sz = scatter_list_length(dst);
+		if (src_is_lmem) {
+			it_ccs = it_dst;
+			ccs_cache_level = dst_cache_level;
+			ccs_is_src = false;
+		} else if (dst_is_lmem) {
+			bytes_to_cpy = dst_sz;
+			it_ccs = it_src;
+			ccs_cache_level = src_cache_level;
+			ccs_is_src = true;
+		}
+
+		/*
+		 * When an eviction of the ccs data is needed, smem will have
+		 * the extra pages for the ccs data.
+		 *
+		 * TODO: We want to move the size mismatch check to a WARN_ON,
+		 * but we still see some smem->lmem requests with the same
+		 * size. That needs to be fixed first.
+		 */
+		ccs_bytes_to_cpy = src_sz != dst_sz ? GET_CCS_BYTES(i915, bytes_to_cpy) : 0;
+		if (ccs_bytes_to_cpy)
+			get_ccs_sg_sgt(&it_ccs, bytes_to_cpy);
+	}
+
 	src_offset = 0;
 	dst_offset = CHUNK_SZ;
 	if (HAS_64K_PAGES(ce->engine->i915)) {
@@ -713,8 +810,11 @@ intel_context_migrate_copy(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
+		calculate_chunk_sz(i915, src_is_lmem, &src_sz, &ccs_sz,
+				   bytes_to_cpy, ccs_bytes_to_cpy);
+
 		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
-			       src_offset, CHUNK_SZ);
+			       src_offset, src_sz);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
@@ -731,7 +831,46 @@ intel_context_migrate_copy(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_copy(rq, dst_offset, src_offset, len);
+		err = emit_copy(rq, dst_offset, src_offset, len);
+		if (err)
+			goto out_rq;
+
+		bytes_to_cpy -= len;
+
+		if (ccs_bytes_to_cpy) {
+			err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
+			if (err)
+				goto out_rq;
+
+			err = emit_pte(rq, &it_ccs, ccs_cache_level, false,
+				       ccs_is_src ? src_offset : dst_offset,
+				       ccs_sz);
+			if (err < 0)
+				goto out_rq;
+
+			err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
+			if (err)
+				goto out_rq;
+
+			/*
+			 * Using max of src_sz and dst_sz, as we need to
+			 * pass the lmem size corresponding to the ccs
+			 * blocks we need to handle.
+			 */
+			ccs_sz = max_t(int, ccs_is_src ? ccs_sz : src_sz,
+				       ccs_is_src ? dst_sz : ccs_sz);
+
+			err = emit_copy_ccs(rq, dst_offset, dst_access,
+					    src_offset, src_access, ccs_sz);
+			if (err)
+				goto out_rq;
+
+			err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
+			if (err)
+				goto out_rq;
+
+			/* Converting back to ccs bytes */
+			ccs_sz = GET_CCS_BYTES(rq->engine->i915, ccs_sz);
+			ccs_bytes_to_cpy -= ccs_sz;
+		}
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
@@ -739,9 +878,25 @@ intel_context_migrate_copy(struct intel_context *ce,
 			i915_request_put(*out);
 		*out = i915_request_get(rq);
 		i915_request_add(rq);
-		if (err || !it_src.sg || !sg_dma_len(it_src.sg))
+
+		if (err)
 			break;
 
+		if (!bytes_to_cpy && !ccs_bytes_to_cpy) {
+			if (src_is_lmem)
+				WARN_ON(it_src.sg && sg_dma_len(it_src.sg));
+			else
+				WARN_ON(it_dst.sg && sg_dma_len(it_dst.sg));
+			break;
+		}
+
+		if (WARN_ON(!it_src.sg || !sg_dma_len(it_src.sg) ||
+			    !it_dst.sg || !sg_dma_len(it_dst.sg) ||
+			    (ccs_bytes_to_cpy &&
+			     (!it_ccs.sg || !sg_dma_len(it_ccs.sg))))) {
+			err = -EINVAL;
+			break;
+		}
+
 		cond_resched();
 	} while (1);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/ttm: Evict and restore of compressed object (rev2)
  2022-03-19 20:42 ` [Intel-gfx] " Ramalingam C
                   ` (8 preceding siblings ...)
  (?)
@ 2022-03-19 20:50 ` Patchwork
  -1 siblings, 0 replies; 33+ messages in thread
From: Patchwork @ 2022-03-19 20:50 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ttm: Evict and restore of compressed object (rev2)
URL   : https://patchwork.freedesktop.org/series/101106/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
d7aead7dd462 drm/i915/gt: Use XY_FAST_COLOR_BLT to clear obj on graphics ver 12+
77cd7fc660bc drm/i915/gt: Clear compress metadata for Flat-ccs objects
-:40: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#40: FILE: drivers/gpu/drm/i915/gt/intel_gpu_commands.h:156:
+#define   MI_FLUSH_DW_CCS		(1<<16)
                          		  ^

-:43: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#43: FILE: drivers/gpu/drm/i915/gt/intel_gpu_commands.h:159:
+#define   MI_FLUSH_DW_LLC		(1<<9)
                          		  ^

total: 0 errors, 0 warnings, 2 checks, 198 lines checked
670ff29d341b drm/i915/selftest_migrate: Consider the possible roundup of size
2dfff14cce14 drm/i915/selftest_migrate: Check CCS meta data clear
da00aef96bd2 drm/i915/gt: Optimize the migration loop
3311d89b98a5 drm/ttm: Add a parameter to add extra pages into ttm_tt
-:92: CHECK:PREFER_KERNEL_TYPES: Prefer kernel type 'u32' over 'uint32_t'
#92: FILE: drivers/gpu/drm/ttm/ttm_tt.c:150:
+		uint32_t page_flags, enum ttm_caching caching,

-:139: CHECK:PREFER_KERNEL_TYPES: Prefer kernel type 'u32' over 'uint32_t'
#139: FILE: include/drm/ttm/ttm_tt.h:151:
+		uint32_t page_flags, enum ttm_caching caching,

total: 0 errors, 0 warnings, 2 checks, 88 lines checked
302c4a819aef drm/i915/gem: Add extra pages in ttm_tt for ccs data
1c655127031f drm/i915/migrate: Evict and restore the flatccs capable lmem obj
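
For reference, changes matching the checks above would look like this
(illustrative snippets, not part of the series):

	#define   MI_FLUSH_DW_CCS		(1 << 16)
	#define   MI_FLUSH_DW_LLC		(1 << 9)

	int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
			u32 page_flags, enum ttm_caching caching,
			unsigned long extra_pages);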



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/ttm: Evict and restore of compressed object (rev2)
  2022-03-19 20:42 ` [Intel-gfx] " Ramalingam C
                   ` (9 preceding siblings ...)
  (?)
@ 2022-03-19 20:52 ` Patchwork
  -1 siblings, 0 replies; 33+ messages in thread
From: Patchwork @ 2022-03-19 20:52 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ttm: Evict and restore of compressed object (rev2)
URL   : https://patchwork.freedesktop.org/series/101106/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/ttm: Evict and restore of compressed object (rev2)
  2022-03-19 20:42 ` [Intel-gfx] " Ramalingam C
                   ` (10 preceding siblings ...)
  (?)
@ 2022-03-19 21:26 ` Patchwork
  -1 siblings, 0 replies; 33+ messages in thread
From: Patchwork @ 2022-03-19 21:26 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ttm: Evict and restore of compressed object (rev2)
URL   : https://patchwork.freedesktop.org/series/101106/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11385 -> Patchwork_22620
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_22620 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_22620, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/index.html

Participating hosts (47 -> 41)
------------------------------

  Missing    (6): fi-bdw-5557u shard-tglu fi-bsw-cyan shard-rkl shard-dg1 fi-bdw-samus 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_22620:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@migrate:
    - fi-bsw-kefka:       [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-bsw-kefka/igt@i915_selftest@live@migrate.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-bsw-kefka/igt@i915_selftest@live@migrate.html
    - fi-kbl-8809g:       [PASS][3] -> [DMESG-FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-kbl-8809g/igt@i915_selftest@live@migrate.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-kbl-8809g/igt@i915_selftest@live@migrate.html
    - fi-kbl-x1275:       [PASS][5] -> [DMESG-FAIL][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-kbl-x1275/igt@i915_selftest@live@migrate.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-kbl-x1275/igt@i915_selftest@live@migrate.html
    - fi-rkl-guc:         [PASS][7] -> [DMESG-FAIL][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-rkl-guc/igt@i915_selftest@live@migrate.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-rkl-guc/igt@i915_selftest@live@migrate.html
    - fi-skl-6700k2:      [PASS][9] -> [DMESG-FAIL][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-skl-6700k2/igt@i915_selftest@live@migrate.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-skl-6700k2/igt@i915_selftest@live@migrate.html
    - fi-cfl-guc:         [PASS][11] -> [DMESG-FAIL][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-cfl-guc/igt@i915_selftest@live@migrate.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-cfl-guc/igt@i915_selftest@live@migrate.html
    - fi-bsw-n3050:       [PASS][13] -> [DMESG-FAIL][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-bsw-n3050/igt@i915_selftest@live@migrate.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-bsw-n3050/igt@i915_selftest@live@migrate.html
    - fi-cfl-8700k:       [PASS][15] -> [DMESG-FAIL][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-cfl-8700k/igt@i915_selftest@live@migrate.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-cfl-8700k/igt@i915_selftest@live@migrate.html
    - fi-tgl-1115g4:      [PASS][17] -> [DMESG-FAIL][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-tgl-1115g4/igt@i915_selftest@live@migrate.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-tgl-1115g4/igt@i915_selftest@live@migrate.html
    - fi-bxt-dsi:         [PASS][19] -> [DMESG-FAIL][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-bxt-dsi/igt@i915_selftest@live@migrate.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-bxt-dsi/igt@i915_selftest@live@migrate.html
    - fi-cfl-8109u:       [PASS][21] -> [DMESG-FAIL][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-cfl-8109u/igt@i915_selftest@live@migrate.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-cfl-8109u/igt@i915_selftest@live@migrate.html
    - fi-glk-j4005:       [PASS][23] -> [DMESG-FAIL][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-glk-j4005/igt@i915_selftest@live@migrate.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-glk-j4005/igt@i915_selftest@live@migrate.html
    - fi-skl-guc:         [PASS][25] -> [DMESG-FAIL][26]
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-skl-guc/igt@i915_selftest@live@migrate.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-skl-guc/igt@i915_selftest@live@migrate.html
    - fi-kbl-7567u:       [PASS][27] -> [DMESG-FAIL][28]
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-kbl-7567u/igt@i915_selftest@live@migrate.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-kbl-7567u/igt@i915_selftest@live@migrate.html
    - fi-kbl-7500u:       [PASS][29] -> [DMESG-FAIL][30]
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-kbl-7500u/igt@i915_selftest@live@migrate.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-kbl-7500u/igt@i915_selftest@live@migrate.html
    - fi-glk-dsi:         [PASS][31] -> [DMESG-FAIL][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-glk-dsi/igt@i915_selftest@live@migrate.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-glk-dsi/igt@i915_selftest@live@migrate.html
    - fi-kbl-soraka:      [PASS][33] -> [DMESG-FAIL][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-kbl-soraka/igt@i915_selftest@live@migrate.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-kbl-soraka/igt@i915_selftest@live@migrate.html
    - fi-bsw-nick:        [PASS][35] -> [DMESG-FAIL][36]
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-bsw-nick/igt@i915_selftest@live@migrate.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-bsw-nick/igt@i915_selftest@live@migrate.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@gem_ringfill@basic-all:
    - {bat-dg2-9}:        [PASS][37] -> [DMESG-WARN][38]
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/bat-dg2-9/igt@gem_ringfill@basic-all.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/bat-dg2-9/igt@gem_ringfill@basic-all.html

  * igt@i915_selftest@live@migrate:
    - {fi-ehl-2}:         [PASS][39] -> [DMESG-FAIL][40]
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-ehl-2/igt@i915_selftest@live@migrate.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-ehl-2/igt@i915_selftest@live@migrate.html
    - {bat-jsl-1}:        [PASS][41] -> [DMESG-FAIL][42]
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/bat-jsl-1/igt@i915_selftest@live@migrate.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/bat-jsl-1/igt@i915_selftest@live@migrate.html
    - {fi-adl-ddr5}:      [PASS][43] -> [DMESG-FAIL][44]
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-adl-ddr5/igt@i915_selftest@live@migrate.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-adl-ddr5/igt@i915_selftest@live@migrate.html
    - {fi-tgl-dsi}:       [PASS][45] -> [DMESG-FAIL][46]
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-tgl-dsi/igt@i915_selftest@live@migrate.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-tgl-dsi/igt@i915_selftest@live@migrate.html
    - {bat-jsl-2}:        [PASS][47] -> [INCOMPLETE][48]
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/bat-jsl-2/igt@i915_selftest@live@migrate.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/bat-jsl-2/igt@i915_selftest@live@migrate.html
    - {fi-jsl-1}:         [PASS][49] -> [DMESG-FAIL][50]
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-jsl-1/igt@i915_selftest@live@migrate.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-jsl-1/igt@i915_selftest@live@migrate.html
    - {bat-adlp-6}:       [PASS][51] -> [DMESG-FAIL][52]
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/bat-adlp-6/igt@i915_selftest@live@migrate.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/bat-adlp-6/igt@i915_selftest@live@migrate.html

  * igt@kms_cursor_legacy@basic-flip-after-cursor-atomic:
    - {bat-adlm-1}:       [PASS][53] -> [INCOMPLETE][54]
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/bat-adlm-1/igt@kms_cursor_legacy@basic-flip-after-cursor-atomic.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/bat-adlm-1/igt@kms_cursor_legacy@basic-flip-after-cursor-atomic.html

  
Known issues
------------

  Here are the changes found in Patchwork_22620 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live@hangcheck:
    - fi-hsw-4770:        [PASS][55] -> [INCOMPLETE][56] ([i915#4785])
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cfl-8109u:       [PASS][57] -> [DMESG-FAIL][58] ([i915#295])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-cfl-8109u/igt@kms_frontbuffer_tracking@basic.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-cfl-8109u/igt@kms_frontbuffer_tracking@basic.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-c:
    - fi-cfl-8109u:       [PASS][59] -> [DMESG-WARN][60] ([i915#295] / [i915#5341])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-cfl-8109u/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-c.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-cfl-8109u/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-c.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-b:
    - fi-cfl-8109u:       [PASS][61] -> [DMESG-WARN][62] ([i915#295]) +9 similar issues
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/fi-cfl-8109u/igt@kms_pipe_crc_basic@read-crc-pipe-b.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-cfl-8109u/igt@kms_pipe_crc_basic@read-crc-pipe-b.html

  * igt@runner@aborted:
    - fi-cfl-8109u:       NOTRUN -> [FAIL][63] ([i915#4312] / [i915#5257])
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-cfl-8109u/igt@runner@aborted.html
    - fi-glk-dsi:         NOTRUN -> [FAIL][64] ([i915#4312] / [i915#5257] / [k.org#202321])
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-glk-dsi/igt@runner@aborted.html
    - fi-kbl-soraka:      NOTRUN -> [FAIL][65] ([i915#1436] / [i915#4312] / [i915#5257])
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-kbl-soraka/igt@runner@aborted.html
    - fi-hsw-4770:        NOTRUN -> [FAIL][66] ([fdo#109271] / [i915#1436] / [i915#4312])
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-hsw-4770/igt@runner@aborted.html
    - fi-kbl-7500u:       NOTRUN -> [FAIL][67] ([i915#1436] / [i915#4312] / [i915#5257])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-kbl-7500u/igt@runner@aborted.html
    - fi-bxt-dsi:         NOTRUN -> [FAIL][68] ([i915#4312] / [i915#5257])
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-bxt-dsi/igt@runner@aborted.html
    - fi-tgl-1115g4:      NOTRUN -> [FAIL][69] ([i915#1436] / [i915#4312] / [i915#5257])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-tgl-1115g4/igt@runner@aborted.html
    - fi-cfl-guc:         NOTRUN -> [FAIL][70] ([i915#4312] / [i915#5257])
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-cfl-guc/igt@runner@aborted.html
    - fi-glk-j4005:       NOTRUN -> [FAIL][71] ([i915#4312] / [i915#5257] / [k.org#202321])
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-glk-j4005/igt@runner@aborted.html
    - fi-kbl-7567u:       NOTRUN -> [FAIL][72] ([i915#1436] / [i915#4312] / [i915#5257])
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-kbl-7567u/igt@runner@aborted.html
    - fi-skl-guc:         NOTRUN -> [FAIL][73] ([i915#1436] / [i915#4312] / [i915#5257])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-skl-guc/igt@runner@aborted.html
    - fi-skl-6700k2:      NOTRUN -> [FAIL][74] ([i915#1436] / [i915#4312] / [i915#5257])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-skl-6700k2/igt@runner@aborted.html
    - fi-bsw-n3050:       NOTRUN -> [FAIL][75] ([fdo#109271] / [i915#1436] / [i915#3428] / [i915#4312])
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-bsw-n3050/igt@runner@aborted.html
    - fi-kbl-x1275:       NOTRUN -> [FAIL][76] ([i915#1436] / [i915#4312] / [i915#5257])
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-kbl-x1275/igt@runner@aborted.html
    - fi-bsw-kefka:       NOTRUN -> [FAIL][77] ([fdo#109271] / [i915#1436] / [i915#3428] / [i915#4312])
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-bsw-kefka/igt@runner@aborted.html
    - fi-cfl-8700k:       NOTRUN -> [FAIL][78] ([i915#4312] / [i915#5257])
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-cfl-8700k/igt@runner@aborted.html
    - fi-bsw-nick:        NOTRUN -> [FAIL][79] ([fdo#109271] / [i915#1436] / [i915#3428] / [i915#4312])
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-bsw-nick/igt@runner@aborted.html
    - fi-kbl-8809g:       NOTRUN -> [FAIL][80] ([i915#1436] / [i915#4312] / [i915#5257])
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-kbl-8809g/igt@runner@aborted.html
    - fi-rkl-guc:         NOTRUN -> [FAIL][81] ([i915#4312])
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/fi-rkl-guc/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@kms_busy@basic@modeset:
    - {bat-adlp-6}:       [DMESG-WARN][82] ([i915#3576]) -> [PASS][83]
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/bat-adlp-6/igt@kms_busy@basic@modeset.html
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/bat-adlp-6/igt@kms_busy@basic@modeset.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#1436]: https://gitlab.freedesktop.org/drm/intel/issues/1436
  [i915#2867]: https://gitlab.freedesktop.org/drm/intel/issues/2867
  [i915#295]: https://gitlab.freedesktop.org/drm/intel/issues/295
  [i915#3428]: https://gitlab.freedesktop.org/drm/intel/issues/3428
  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4785]: https://gitlab.freedesktop.org/drm/intel/issues/4785
  [i915#5257]: https://gitlab.freedesktop.org/drm/intel/issues/5257
  [i915#5270]: https://gitlab.freedesktop.org/drm/intel/issues/5270
  [i915#5339]: https://gitlab.freedesktop.org/drm/intel/issues/5339
  [i915#5341]: https://gitlab.freedesktop.org/drm/intel/issues/5341
  [i915#5342]: https://gitlab.freedesktop.org/drm/intel/issues/5342
  [i915#5356]: https://gitlab.freedesktop.org/drm/intel/issues/5356
  [k.org#202321]: https://bugzilla.kernel.org/show_bug.cgi?id=202321


Build changes
-------------

  * IGT: IGT_6386 -> IGTPW_6786
  * Linux: CI_DRM_11385 -> Patchwork_22620

  CI-20190529: 20190529
  CI_DRM_11385: 3babe046f5f5544ec772cd443f9d5ca24e342348 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGTPW_6786: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_6786/index.html
  IGT_6386: 0fcd59ad25b2960c0b654f90dfe4dd9e7c7b874d @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_22620: 1c655127031fcc99f4690104be416c9c54f37874 @ git://anongit.freedesktop.org/gfx-ci/linux


== Kernel 32bit build ==

Warning: Kernel 32bit buildtest failed:
https://intel-gfx-ci.01.org/Patchwork_22620/build_32bit.log

  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  CHK     include/generated/compile.h
  CC [M]  drivers/gpu/drm/i915/gt/intel_migrate.o
In file included from ./include/linux/kernel.h:29,
                 from ./include/linux/iova.h:13,
                 from ./include/linux/intel-iommu.h:14,
                 from ./drivers/gpu/drm/i915/i915_drv.h:37,
                 from drivers/gpu/drm/i915/gt/intel_migrate.c:6:
drivers/gpu/drm/i915/gt/selftest_migrate.c: In function ‘clear’:
./include/linux/kern_levels.h:5:18: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘unsigned int’ [-Werror=format=]
 #define KERN_SOH "\001"  /* ASCII Start Of Header */
                  ^~~~~~
./include/linux/printk.h:418:11: note: in definition of macro ‘printk_index_wrap’
   _p_func(_fmt, ##__VA_ARGS__);    \
           ^~~~
./include/linux/printk.h:489:2: note: in expansion of macro ‘printk’
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
  ^~~~~~
./include/linux/kern_levels.h:11:18: note: in expansion of macro ‘KERN_SOH’
 #define KERN_ERR KERN_SOH "3" /* error conditions */
                  ^~~~~~~~
./include/linux/printk.h:489:9: note: in expansion of macro ‘KERN_ERR’
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         ^~~~~~~~
drivers/gpu/drm/i915/gt/selftest_migrate.c:403:6: note: in expansion of macro ‘pr_err’
      pr_err("%ps ccs clearing failed, offset: %d/%lu\n",
      ^~~~~~
In file included from drivers/gpu/drm/i915/gt/intel_migrate.c:1167:
drivers/gpu/drm/i915/gt/selftest_migrate.c:403:52: note: format string is defined here
      pr_err("%ps ccs clearing failed, offset: %d/%lu\n",
                                                  ~~^
                                                  %u
cc1: all warnings being treated as errors
scripts/Makefile.build:288: recipe for target 'drivers/gpu/drm/i915/gt/intel_migrate.o' failed
make[4]: *** [drivers/gpu/drm/i915/gt/intel_migrate.o] Error 1
scripts/Makefile.build:550: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:550: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:550: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1831: recipe for target 'drivers' failed
make: *** [drivers] Error 2
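
For reference, the 32-bit failure above is a plain size_t format mismatch:
ccs_bytes / sizeof(u32) has type size_t, which is unsigned int on i386 but
unsigned long on x86-64, so the %lu specifier only builds cleanly on 64-bit.
A minimal sketch of a portable fix (not the change actually posted in the
series) is to print the value with %zu:

	/* selftest_migrate.c, sketch of a fix: %zu matches size_t on
	 * both 32-bit and 64-bit builds.
	 */
	pr_err("%ps ccs clearing failed, offset: %d/%zu\n",
	       fn, i, (ccs_bytes / sizeof(u32)) - 1);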


== Linux commits ==

1c655127031f drm/i915/migrate: Evict and restore the flatccs capable lmem obj
302c4a819aef drm/i915/gem: Add extra pages in ttm_tt for ccs data
3311d89b98a5 drm/ttm: Add a parameter to add extra pages into ttm_tt
da00aef96bd2 drm/i915/gt: Optimize the migration loop
2dfff14cce14 drm/i915/selftest_migrate: Check CCS meta data clear
670ff29d341b drm/i915/selftest_migrate: Consider the possible roundup of size
77cd7fc660bc drm/i915/gt: Clear compress metadata for Flat-ccs objects
d7aead7dd462 drm/i915/gt: Use XY_FASR_COLOR_BLT to clear obj on graphics ver 12+

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/index.html


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: warning for drm/i915/ttm: Evict and restore of compressed object (rev2)
  2022-03-19 20:42 ` [Intel-gfx] " Ramalingam C
                   ` (11 preceding siblings ...)
  (?)
@ 2022-03-19 21:26 ` Patchwork
  -1 siblings, 0 replies; 33+ messages in thread
From: Patchwork @ 2022-03-19 21:26 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/ttm: Evict and restore of compressed object (rev2)
URL   : https://patchwork.freedesktop.org/series/101106/
State : warning

== Summary ==

CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  CHK     include/generated/compile.h
  CC [M]  drivers/gpu/drm/i915/gt/intel_migrate.o
In file included from ./include/linux/kernel.h:29,
                 from ./include/linux/iova.h:13,
                 from ./include/linux/intel-iommu.h:14,
                 from ./drivers/gpu/drm/i915/i915_drv.h:37,
                 from drivers/gpu/drm/i915/gt/intel_migrate.c:6:
drivers/gpu/drm/i915/gt/selftest_migrate.c: In function ‘clear’:
./include/linux/kern_levels.h:5:18: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘unsigned int’ [-Werror=format=]
 #define KERN_SOH "\001"  /* ASCII Start Of Header */
                  ^~~~~~
./include/linux/printk.h:418:11: note: in definition of macro ‘printk_index_wrap’
   _p_func(_fmt, ##__VA_ARGS__);    \
           ^~~~
./include/linux/printk.h:489:2: note: in expansion of macro ‘printk’
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
  ^~~~~~
./include/linux/kern_levels.h:11:18: note: in expansion of macro ‘KERN_SOH’
 #define KERN_ERR KERN_SOH "3" /* error conditions */
                  ^~~~~~~~
./include/linux/printk.h:489:9: note: in expansion of macro ‘KERN_ERR’
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         ^~~~~~~~
drivers/gpu/drm/i915/gt/selftest_migrate.c:403:6: note: in expansion of macro ‘pr_err’
      pr_err("%ps ccs clearing failed, offset: %d/%lu\n",
      ^~~~~~
In file included from drivers/gpu/drm/i915/gt/intel_migrate.c:1167:
drivers/gpu/drm/i915/gt/selftest_migrate.c:403:52: note: format string is defined here
      pr_err("%ps ccs clearing failed, offset: %d/%lu\n",
                                                  ~~^
                                                  %u
cc1: all warnings being treated as errors
scripts/Makefile.build:288: recipe for target 'drivers/gpu/drm/i915/gt/intel_migrate.o' failed
make[4]: *** [drivers/gpu/drm/i915/gt/intel_migrate.o] Error 1
scripts/Makefile.build:550: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:550: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:550: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1831: recipe for target 'drivers' failed
make: *** [drivers] Error 2

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22620/build_32bit.log

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Intel-gfx] [PATCH v4 4/8] drm/i915/selftest_migrate: Check CCS meta data clear
  2022-03-19 20:42   ` [Intel-gfx] " Ramalingam C
  (?)
@ 2022-03-20  1:39   ` kernel test robot
  -1 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2022-03-20  1:39 UTC (permalink / raw)
  To: Ramalingam C, intel-gfx, dri-devel
  Cc: Hellstrom Thomas, kbuild-all, Matthew Auld

Hi Ramalingam,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on drm-intel/for-linux-next]
[also build test ERROR on drm/drm-next drm-tip/drm-tip next-20220318]
[cannot apply to v5.17-rc8]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ramalingam-C/drm-i915-ttm-Evict-and-restore-of-compressed-object/20220320-044242
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: i386-allyesconfig (https://download.01.org/0day-ci/archive/20220320/202203200912.4mqFVTe9-lkp@intel.com/config)
compiler: gcc-9 (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/afd58bdbf43437bf72ff2313776c3036ebf99a11
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ramalingam-C/drm-i915-ttm-Evict-and-restore-of-compressed-object/20220320-044242
        git checkout afd58bdbf43437bf72ff2313776c3036ebf99a11
        # save the config file to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/kernel.h:29,
                    from arch/x86/include/asm/percpu.h:27,
                    from arch/x86/include/asm/current.h:6,
                    from arch/x86/include/asm/processor.h:17,
                    from arch/x86/include/asm/kvm_para.h:5,
                    from arch/x86/include/asm/hypervisor.h:37,
                    from drivers/gpu/drm/i915/i915_drv.h:35,
                    from drivers/gpu/drm/i915/gt/intel_migrate.c:6:
   drivers/gpu/drm/i915/gt/selftest_migrate.c: In function 'clear':
>> include/linux/kern_levels.h:5:18: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Werror=format=]
       5 | #define KERN_SOH "\001"  /* ASCII Start Of Header */
         |                  ^~~~~~
   include/linux/printk.h:418:11: note: in definition of macro 'printk_index_wrap'
     418 |   _p_func(_fmt, ##__VA_ARGS__);    \
         |           ^~~~
   include/linux/printk.h:489:2: note: in expansion of macro 'printk'
     489 |  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         |  ^~~~~~
   include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
      11 | #define KERN_ERR KERN_SOH "3" /* error conditions */
         |                  ^~~~~~~~
   include/linux/printk.h:489:9: note: in expansion of macro 'KERN_ERR'
     489 |  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         |         ^~~~~~~~
   drivers/gpu/drm/i915/gt/selftest_migrate.c:403:6: note: in expansion of macro 'pr_err'
     403 |      pr_err("%ps ccs clearing failed, offset: %d/%lu\n",
         |      ^~~~~~
   In file included from drivers/gpu/drm/i915/gt/intel_migrate.c:1014:
   drivers/gpu/drm/i915/gt/selftest_migrate.c:403:52: note: format string is defined here
     403 |      pr_err("%ps ccs clearing failed, offset: %d/%lu\n",
         |                                                  ~~^
         |                                                    |
         |                                                    long unsigned int
         |                                                  %u
   drivers/gpu/drm/i915/gt/intel_migrate.c: In function 'intel_context_copy_ccs':
>> drivers/gpu/drm/i915/gt/selftest_migrate.c:157:19: error: 'rq' is used uninitialized in this function [-Werror=uninitialized]
     157 |  offset += (u64)rq->engine->instance << 32;
         |                 ~~^~~~~~~~
   cc1: all warnings being treated as errors


vim +/rq +157 drivers/gpu/drm/i915/gt/selftest_migrate.c

   134	
   135	static int intel_context_copy_ccs(struct intel_context *ce,
   136					  const struct i915_deps *deps,
   137					  struct scatterlist *sg,
   138					  enum i915_cache_level cache_level,
   139					  bool write_to_ccs,
   140					  struct i915_request **out)
   141	{
   142		u8 src_access = write_to_ccs ? DIRECT_ACCESS : INDIRECT_ACCESS;
   143		u8 dst_access = write_to_ccs ? INDIRECT_ACCESS : DIRECT_ACCESS;
   144		struct sgt_dma it = sg_sgt(sg);
   145		struct i915_request *rq;
   146		u32 offset;
   147		int err;
   148	
   149		GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
   150		*out = NULL;
   151	
   152		GEM_BUG_ON(ce->ring->size < SZ_64K);
   153	
   154		offset = 0;
   155		if (HAS_64K_PAGES(ce->engine->i915))
   156			offset = CHUNK_SZ;
 > 157		offset += (u64)rq->engine->instance << 32;
   158	
   159		do {
   160			int len;
   161	
   162			rq = i915_request_create(ce);
   163			if (IS_ERR(rq)) {
   164				err = PTR_ERR(rq);
   165				goto out_ce;
   166			}
   167	
   168			if (deps) {
   169				err = i915_request_await_deps(rq, deps);
   170				if (err)
   171					goto out_rq;
   172	
   173				if (rq->engine->emit_init_breadcrumb) {
   174					err = rq->engine->emit_init_breadcrumb(rq);
   175					if (err)
   176						goto out_rq;
   177				}
   178	
   179				deps = NULL;
   180			}
   181	
   182			/* The PTE updates + clear must not be interrupted. */
   183			err = emit_no_arbitration(rq);
   184			if (err)
   185				goto out_rq;
   186	
   187			len = emit_pte(rq, &it, cache_level, true, offset, CHUNK_SZ);
   188			if (len <= 0) {
   189				err = len;
   190				goto out_rq;
   191			}
   192	
   193			err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
   194			if (err)
   195				goto out_rq;
   196	
   197			err = emit_copy_ccs(rq, offset, dst_access,
   198					    offset, src_access, len);
   199			if (err)
   200				goto out_rq;
   201	
   202			err = rq->engine->emit_flush(rq, EMIT_INVALIDATE |
   203						     MI_FLUSH_DW_CCS);
   204	
   205			/* Arbitration is re-enabled between requests. */
   206	out_rq:
   207			if (*out)
   208				i915_request_put(*out);
   209			*out = i915_request_get(rq);
   210			i915_request_add(rq);
   211			if (err || !it.sg || !sg_dma_len(it.sg))
   212				break;
   213	
   214			cond_resched();
   215		} while (1);
   216	
   217	out_ce:
   218		return err;
   219	}
   220	
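
Both robot findings point at the same early dereference: at line 157 above,
rq is read before the loop has created any request, and offset is declared
u32, so the (u64)... << 32 contribution would be truncated in any case. One
plausible fix, as a sketch only (the engine is already reachable through the
context, so rq is not needed at this point):

	u64 offset = 0;			/* widened from u32 */

	if (HAS_64K_PAGES(ce->engine->i915))
		offset = CHUNK_SZ;
	/* ce->engine is valid before the loop; rq is not. */
	offset += (u64)ce->engine->instance << 32;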

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 1/8] drm/i915/gt: Use XY_FASR_COLOR_BLT to clear obj on graphics ver 12+
  2022-03-19 20:42   ` [Intel-gfx] " Ramalingam C
@ 2022-03-21  8:49     ` Hellstrom, Thomas
  -1 siblings, 0 replies; 33+ messages in thread
From: Hellstrom, Thomas @ 2022-03-21  8:49 UTC (permalink / raw)
  To: dri-devel, C, Ramalingam, intel-gfx; +Cc: Auld, Matthew

On Sun, 2022-03-20 at 02:12 +0530, Ramalingam C wrote:
> XY_FAST_COLOR_BLT cmd is faster than the older XY_COLOR_BLT. Hence
> for
> clearing (Zero out) the pages of the newly allocated object, faster
> cmd
> is used.

NIT: Imperative wording

> 
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Also there's a typo in the patch title.

With that fixed:
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


> ---
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  5 +++
>  drivers/gpu/drm/i915/gt/intel_migrate.c      | 43 +++++++++++++++++-
> --
>  2 files changed, 43 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index d112ffd56418..925e55b6a94f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -205,6 +205,11 @@
>  
>  #define COLOR_BLT_CMD                  (2 << 29 | 0x40 << 22 | (5 -
> 2))
>  #define XY_COLOR_BLT_CMD               (2 << 29 | 0x50 << 22)
> +#define XY_FAST_COLOR_BLT_CMD          (2 << 29 | 0x44 << 22)
> +#define   XY_FAST_COLOR_BLT_DEPTH_32   (2 << 19)
> +#define   XY_FAST_COLOR_BLT_DW         16
> +#define   XY_FAST_COLOR_BLT_MOCS_MASK  GENMASK(27, 21)
> +#define   XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT 31
>  #define SRC_COPY_BLT_CMD               (2 << 29 | 0x43 << 22)
>  #define GEN9_XY_FAST_COPY_BLT_CMD      (2 << 29 | 0x42 << 22)
>  #define XY_SRC_COPY_BLT_CMD            (2 << 29 | 0x53 << 22)
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c
> b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 20444d6ceb3c..73199ebf0671 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -614,20 +614,53 @@ intel_context_migrate_copy(struct intel_context
> *ce,
>         return err;
>  }
>  
> -static int emit_clear(struct i915_request *rq, u64 offset, int size,
> u32 value)
> +static int emit_clear(struct i915_request *rq, u64 offset, int size,
> +                     u32 value, bool is_lmem)
>  {
> -       const int ver = GRAPHICS_VER(rq->engine->i915);
> +       struct drm_i915_private *i915 = rq->engine->i915;
> +       int mocs = rq->engine->gt->mocs.uc_index << 1;
> +       const int ver = GRAPHICS_VER(i915);
> +       int ring_sz;
>         u32 *cs;
>  
>         GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
>  
>         offset += (u64)rq->engine->instance << 32;
>  
> -       cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
> +       if (ver >= 12)
> +               ring_sz = 16;
> +       else if (ver >= 8)
> +               ring_sz = 8;
> +       else
> +               ring_sz = 6;
> +
> +       cs = intel_ring_begin(rq, ring_sz);
>         if (IS_ERR(cs))
>                 return PTR_ERR(cs);
>  
> -       if (ver >= 8) {
> +       if (ver >= 12) {
> +               *cs++ = XY_FAST_COLOR_BLT_CMD |
> XY_FAST_COLOR_BLT_DEPTH_32 |
> +                       (XY_FAST_COLOR_BLT_DW - 2);
> +               *cs++ = FIELD_PREP(XY_FAST_COLOR_BLT_MOCS_MASK, mocs)
> |
> +                       (PAGE_SIZE - 1);
> +               *cs++ = 0;
> +               *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
> +               *cs++ = lower_32_bits(offset);
> +               *cs++ = upper_32_bits(offset);
> +               *cs++ = !is_lmem << XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT;
> +               /* BG7 */
> +               *cs++ = value;
> +               *cs++ = 0;
> +               *cs++ = 0;
> +               *cs++ = 0;
> +               /* BG11 */
> +               *cs++ = 0;
> +               *cs++ = 0;
> +               /* BG13 */
> +               *cs++ = 0;
> +               *cs++ = 0;
> +               *cs++ = 0;
> +       } else if (ver >= 8) {
>                 *cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (7 - 2);
>                 *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY |
> PAGE_SIZE;
>                 *cs++ = 0;
> @@ -711,7 +744,7 @@ intel_context_migrate_clear(struct intel_context
> *ce,
>                 if (err)
>                         goto out_rq;
>  
> -               err = emit_clear(rq, offset, len, value);
> +               err = emit_clear(rq, offset, len, value, is_lmem);
>  
>                 /* Arbitration is re-enabled between requests. */
>  out_rq:
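
As a quick sanity check on the ver >= 12 geometry above, a worked example in
comment form (assuming 4 KiB pages and a 64 KiB clear; the dword labels are
descriptive, not taken from the bspec):

	/*
	 * DW1 low bits: pitch = PAGE_SIZE - 1, i.e. one 4 KiB page per row.
	 * DW3: size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4
	 *	= (16 << 16) | 1024  ->  16 rows, 1024 dwords per row.
	 *
	 * A 1024 x 16 rectangle of 32-bit pixels is 16 pages = 64 KiB, and
	 * GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX) bounds the row count.
	 */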

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Intel-gfx] [PATCH v4 6/8] drm/ttm: Add a parameter to add extra pages into ttm_tt
  2022-03-19 20:42   ` [Intel-gfx] " Ramalingam C
  (?)
@ 2022-03-21 10:11   ` Das, Nirmoy
  2022-03-21 23:06     ` Ramalingam C
  -1 siblings, 1 reply; 33+ messages in thread
From: Das, Nirmoy @ 2022-03-21 10:11 UTC (permalink / raw)
  To: Ramalingam C, intel-gfx, dri-devel
  Cc: Thomas Hellstrom, Hellstrom Thomas, Matthew Auld, Christian Koenig

In the previous version I replied only to the mailing list email, so my
reply probably slipped through.

Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> for patches 6-7

On 3/19/2022 9:42 PM, Ramalingam C wrote:
> Add a parameter called "extra_pages" for ttm_tt_init, to indicate that
> driver needs extra pages in ttm_tt.
>
> v2:
>    Used imperative wording [Thomas and Christian]
>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> cc: Christian Koenig <christian.koenig@amd.com>
> cc: Hellstrom Thomas <thomas.hellstrom@intel.com>
> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/drm_gem_vram_helper.c      |  2 +-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c    |  2 +-
>   drivers/gpu/drm/qxl/qxl_ttm.c              |  2 +-
>   drivers/gpu/drm/ttm/ttm_agp_backend.c      |  2 +-
>   drivers/gpu/drm/ttm/ttm_tt.c               | 12 +++++++-----
>   drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |  2 +-
>   include/drm/ttm/ttm_tt.h                   |  4 +++-
>   7 files changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
> index dc7f938bfff2..123045b58fec 100644
> --- a/drivers/gpu/drm/drm_gem_vram_helper.c
> +++ b/drivers/gpu/drm/drm_gem_vram_helper.c
> @@ -867,7 +867,7 @@ static struct ttm_tt *bo_driver_ttm_tt_create(struct ttm_buffer_object *bo,
>   	if (!tt)
>   		return NULL;
>   
> -	ret = ttm_tt_init(tt, bo, page_flags, ttm_cached);
> +	ret = ttm_tt_init(tt, bo, page_flags, ttm_cached, 0);
>   	if (ret < 0)
>   		goto err_ttm_tt_init;
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index e4a06fcf741a..3b9f99c765c4 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -290,7 +290,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
>   		i915_tt->is_shmem = true;
>   	}
>   
> -	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching);
> +	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, 0);
>   	if (ret)
>   		goto err_free;
>   
> diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
> index b2e33d5ba5d0..52156b54498f 100644
> --- a/drivers/gpu/drm/qxl/qxl_ttm.c
> +++ b/drivers/gpu/drm/qxl/qxl_ttm.c
> @@ -113,7 +113,7 @@ static struct ttm_tt *qxl_ttm_tt_create(struct ttm_buffer_object *bo,
>   	ttm = kzalloc(sizeof(struct ttm_tt), GFP_KERNEL);
>   	if (ttm == NULL)
>   		return NULL;
> -	if (ttm_tt_init(ttm, bo, page_flags, ttm_cached)) {
> +	if (ttm_tt_init(ttm, bo, page_flags, ttm_cached, 0)) {
>   		kfree(ttm);
>   		return NULL;
>   	}
> diff --git a/drivers/gpu/drm/ttm/ttm_agp_backend.c b/drivers/gpu/drm/ttm/ttm_agp_backend.c
> index 6ddc16f0fe2b..d27691f2e451 100644
> --- a/drivers/gpu/drm/ttm/ttm_agp_backend.c
> +++ b/drivers/gpu/drm/ttm/ttm_agp_backend.c
> @@ -134,7 +134,7 @@ struct ttm_tt *ttm_agp_tt_create(struct ttm_buffer_object *bo,
>   	agp_be->mem = NULL;
>   	agp_be->bridge = bridge;
>   
> -	if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined)) {
> +	if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined, 0)) {
>   		kfree(agp_be);
>   		return NULL;
>   	}
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index d234aab800a0..1a66d9fc589a 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -134,9 +134,10 @@ void ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm)
>   static void ttm_tt_init_fields(struct ttm_tt *ttm,
>   			       struct ttm_buffer_object *bo,
>   			       uint32_t page_flags,
> -			       enum ttm_caching caching)
> +			       enum ttm_caching caching,
> +			       unsigned long extra_pages)
>   {
> -	ttm->num_pages = PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT;
> +	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
>   	ttm->caching = ttm_cached;
>   	ttm->page_flags = page_flags;
>   	ttm->dma_address = NULL;
> @@ -146,9 +147,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
>   }
>   
>   int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
> -		uint32_t page_flags, enum ttm_caching caching)
> +		uint32_t page_flags, enum ttm_caching caching,
> +		unsigned long extra_pages)
>   {
> -	ttm_tt_init_fields(ttm, bo, page_flags, caching);
> +	ttm_tt_init_fields(ttm, bo, page_flags, caching, extra_pages);
>   
>   	if (ttm_tt_alloc_page_directory(ttm)) {
>   		pr_err("Failed allocating page table\n");
> @@ -180,7 +182,7 @@ int ttm_sg_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
>   {
>   	int ret;
>   
> -	ttm_tt_init_fields(ttm, bo, page_flags, caching);
> +	ttm_tt_init_fields(ttm, bo, page_flags, caching, 0);
>   
>   	if (page_flags & TTM_TT_FLAG_EXTERNAL)
>   		ret = ttm_sg_tt_alloc_page_directory(ttm);
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> index b84ecc6d6611..4e3938e62c08 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> @@ -517,7 +517,7 @@ static struct ttm_tt *vmw_ttm_tt_create(struct ttm_buffer_object *bo,
>   				     ttm_cached);
>   	else
>   		ret = ttm_tt_init(&vmw_be->dma_ttm, bo, page_flags,
> -				  ttm_cached);
> +				  ttm_cached, 0);
>   	if (unlikely(ret != 0))
>   		goto out_no_init;
>   
> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
> index f20832139815..17a0310e8aaa 100644
> --- a/include/drm/ttm/ttm_tt.h
> +++ b/include/drm/ttm/ttm_tt.h
> @@ -140,6 +140,7 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc);
>    * @bo: The buffer object we create the ttm for.
>    * @page_flags: Page flags as identified by TTM_TT_FLAG_XX flags.
>    * @caching: the desired caching state of the pages
> + * @extra_pages: Extra pages needed for the driver.
>    *
>    * Create a struct ttm_tt to back data with system memory pages.
>    * No pages are actually allocated.
> @@ -147,7 +148,8 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc);
>    * NULL: Out of memory.
>    */
>   int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
> -		uint32_t page_flags, enum ttm_caching caching);
> +		uint32_t page_flags, enum ttm_caching caching,
> +		unsigned long extra_pages);
>   int ttm_sg_tt_init(struct ttm_tt *ttm_dma, struct ttm_buffer_object *bo,
>   		   uint32_t page_flags, enum ttm_caching caching);
>   
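
For readers following the series, a hypothetical caller sketch of the new
argument; the ccs_pages() helper and the NUM_BYTES_PER_CCS_BYTE ratio are
illustrative assumptions here, not code from this patch:

	/* Sketch: back a ttm_tt with extra system pages so an lmem
	 * object's CCS metadata has somewhere to live on eviction.
	 */
	static unsigned long ccs_pages(struct ttm_buffer_object *bo)
	{
		return DIV_ROUND_UP(bo->base.size / NUM_BYTES_PER_CCS_BYTE,
				    PAGE_SIZE);
	}

	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching,
			  ccs_pages(bo));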

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Intel-gfx] [PATCH v4 4/8] drm/i915/selftest_migrate: Check CCS meta data clear
  2022-03-19 20:42   ` [Intel-gfx] " Ramalingam C
@ 2022-03-21 10:39     ` Hellstrom, Thomas
  -1 siblings, 0 replies; 33+ messages in thread
From: Hellstrom, Thomas @ 2022-03-21 10:39 UTC (permalink / raw)
  To: dri-devel, C, Ramalingam, intel-gfx; +Cc: Auld, Matthew

On Sun, 2022-03-20 at 02:12 +0530, Ramalingam C wrote:
> While clearing the Flat-CCS capable lmem object, we need to clear the
> CCS
> meta data corresponding to the memory.
> 
> As part of live_migrate_clear add check for the ccs meta data clear
> for
> the Flat-CCS capable lmem object.
> 
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_migrate.c    |  32 +++
>  drivers/gpu/drm/i915/gt/selftest_migrate.c | 274 ++++++++++++++++++-
> --
>  2 files changed, 278 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c
> b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index c1db8daf994a..bbfea570c239 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -572,6 +572,38 @@ static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd,
> u64 src_addr, u64 dst_addr,
>         return cmd;
>  }
>  
> +static int emit_copy_ccs(struct i915_request *rq,
> +                        u32 dst_offset, u8 dst_access,
> +                        u32 src_offset, u8 src_access, int size)
> +{
> +       struct drm_i915_private *i915 = rq->engine->i915;
> +       int mocs = rq->engine->gt->mocs.uc_index << 1;
> +       u32 num_ccs_blks, ccs_ring_size;
> +       u32 *cs;
> +
> +       ccs_ring_size = calc_ctrl_surf_instr_size(i915, size);
> +       WARN_ON(!ccs_ring_size);
> +
> +       cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2));
> +       if (IS_ERR(cs))
> +               return PTR_ERR(cs);
> +
> +       num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> +                                   NUM_CCS_BYTES_PER_BLOCK);
> +
> +       cs = i915_flush_dw(cs, MI_FLUSH_DW_LLC | MI_FLUSH_DW_CCS);
> +       cs = _i915_ctrl_surf_copy_blt(cs, src_offset, dst_offset,
> +                                     src_access, dst_access,
> +                                     mocs, mocs, num_ccs_blks);
> +       cs = i915_flush_dw(cs, MI_FLUSH_DW_LLC | MI_FLUSH_DW_CCS);
> +       if (ccs_ring_size & 1)
> +               *cs++ = MI_NOOP;
> +
> +       intel_ring_advance(rq, cs);
> +
> +       return 0;
> +}


This would be an unused function if selftests are not configured,
right?
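
One conventional way to keep a selftest-only helper from tripping
-Werror (unused-function) in non-selftest builds, sketched here rather
than taken from the series, is to mark it __maybe_unused:

	static __maybe_unused int emit_copy_ccs(struct i915_request *rq,
						u32 dst_offset, u8 dst_access,
						u32 src_offset, u8 src_access,
						int size);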


> +
>  static int emit_copy(struct i915_request *rq,
>                      u32 dst_offset, u32 src_offset, int size)
>  {
> diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c
> b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> index b5da8b8cd039..e32cc994f4a2 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> @@ -132,6 +132,126 @@ static int copy(struct intel_migrate *migrate,
>         return err;
>  }
>  
> +static int intel_context_copy_ccs(struct intel_context *ce,
> +                                 const struct i915_deps *deps,
> +                                 struct scatterlist *sg,
> +                                 enum i915_cache_level cache_level,
> +                                 bool write_to_ccs,
> +                                 struct i915_request **out)
> +{
> +       u8 src_access = write_to_ccs ? DIRECT_ACCESS :
> INDIRECT_ACCESS;
> +       u8 dst_access = write_to_ccs ? INDIRECT_ACCESS :
> DIRECT_ACCESS;
> +       struct sgt_dma it = sg_sgt(sg);
> +       struct i915_request *rq;
> +       u32 offset;
> +       int err;
> +
> +       GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
> +       *out = NULL;
> +
> +       GEM_BUG_ON(ce->ring->size < SZ_64K);
> +
> +       offset = 0;
> +       if (HAS_64K_PAGES(ce->engine->i915))
> +               offset = CHUNK_SZ;
> +       offset += (u64)rq->engine->instance << 32;
> +
> +       do {
> +               int len;
> +
> +               rq = i915_request_create(ce);
> +               if (IS_ERR(rq)) {
> +                       err = PTR_ERR(rq);
> +                       goto out_ce;
> +               }
> +
> +               if (deps) {
> +                       err = i915_request_await_deps(rq, deps);
> +                       if (err)
> +                               goto out_rq;
> +
> +                       if (rq->engine->emit_init_breadcrumb) {
> +                               err = rq->engine-
> >emit_init_breadcrumb(rq);
> +                               if (err)
> +                                       goto out_rq;
> +                       }
> +
> +                       deps = NULL;
> +               }
> +
> +               /* The PTE updates + clear must not be interrupted.
> */
> +               err = emit_no_arbitration(rq);
> +               if (err)
> +                       goto out_rq;
> +
> +               len = emit_pte(rq, &it, cache_level, true, offset,
> CHUNK_SZ);
> +               if (len <= 0) {
> +                       err = len;
> +                       goto out_rq;
> +               }
> +
> +               err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
> +               if (err)
> +                       goto out_rq;
> +
> +               err = emit_copy_ccs(rq, offset, dst_access,
> +                                   offset, src_access, len);
> +               if (err)
> +                       goto out_rq;
> +
> +               err = rq->engine->emit_flush(rq, EMIT_INVALIDATE |
> +                                            MI_FLUSH_DW_CCS);
> +
> +               /* Arbitration is re-enabled between requests. */
> +out_rq:
> +               if (*out)
> +                       i915_request_put(*out);
> +               *out = i915_request_get(rq);
> +               i915_request_add(rq);
> +               if (err || !it.sg || !sg_dma_len(it.sg))
> +                       break;
> +
> +               cond_resched();
> +       } while (1);
> +
> +out_ce:
> +       return err;
> +}
> +
> +static int
> +intel_migrate_ccs_copy(struct intel_migrate *m,
> +                      struct i915_gem_ww_ctx *ww,
> +                      const struct i915_deps *deps,
> +                      struct scatterlist *sg,
> +                      enum i915_cache_level cache_level,
> +                      bool write_to_ccs,
> +                      struct i915_request **out)
> +{
> +       struct intel_context *ce;
> +       int err;
> +
> +       *out = NULL;
> +       if (!m->context)
> +               return -ENODEV;
> +
> +       ce = intel_migrate_create_context(m);
> +       if (IS_ERR(ce))
> +               ce = intel_context_get(m->context);
> +       GEM_BUG_ON(IS_ERR(ce));
> +
> +       err = intel_context_pin_ww(ce, ww);
> +       if (err)
> +               goto out;
> +
> +       err = intel_context_copy_ccs(ce, deps, sg, cache_level,
> +                                    write_to_ccs, out);
> +
> +       intel_context_unpin(ce);
> +out:
> +       intel_context_put(ce);
> +       return err;
> +}
> +
>  static int clear(struct intel_migrate *migrate,
>                  int (*fn)(struct intel_migrate *migrate,
>                            struct i915_gem_ww_ctx *ww,
> @@ -144,7 +264,8 @@ static int clear(struct intel_migrate *migrate,
>         struct drm_i915_gem_object *obj;
>         struct i915_request *rq;
>         struct i915_gem_ww_ctx ww;
> -       u32 *vaddr;
> +       u32 *vaddr, val = 0;
> +       bool ccs_cap = false;
>         int err = 0;
>         int i;
>  
> @@ -155,7 +276,12 @@ static int clear(struct intel_migrate *migrate,
>         /* Consider the rounded up memory too */
>         sz = obj->base.size;
>  
> +       if (HAS_FLAT_CCS(i915) && i915_gem_object_is_lmem(obj))
> +               ccs_cap = true;
> +
>         for_i915_gem_ww(&ww, err, true) {
> +               int ccs_bytes;
> +
>                 err = i915_gem_object_lock(obj, &ww);
>                 if (err)
>                         continue;
> @@ -170,44 +296,136 @@ static int clear(struct intel_migrate
> *migrate,
>                         vaddr[i] = ~i;
>                 i915_gem_object_flush_map(obj);
>  
> -               err = fn(migrate, &ww, obj, sz, &rq);
> -               if (!err)
> -                       continue;
> +               if (ccs_cap && !val) {
> +                       /* Write the obj data into ccs surface */
> +                       err = intel_migrate_ccs_copy(migrate, &ww,
> NULL,
> +                                                    obj->mm.pages-
> >sgl,
> +                                                    obj-
> >cache_level,
> +                                                    true, &rq);
> +                       if (rq && !err) {
> +                               if (i915_request_wait(rq, 0, HZ) < 0)
> {
> +                                       pr_err("%ps timed out, size:
> %u\n",
> +                                              fn, sz);
> +                                       err = -ETIME;
> +                               }
> +                               i915_request_put(rq);
> +                               rq = NULL;
> +                       }
> +                       if (err)
> +                               continue;
> +
> +                       for (i = 0; i < sz / sizeof(u32); i++)
> +                               vaddr[i] = 0x5a5a5a5a;
> +                       i915_gem_object_flush_map(obj);
> +
> +                       err = intel_migrate_ccs_copy(migrate, &ww,
> NULL, obj->mm.pages->sgl,
> +                                                    obj-
> >cache_level, false, &rq);

Why do we read back CCS content here?

> +                       if (rq && !err) {
> +                               if (i915_request_wait(rq, 0, HZ) < 0)
> {
> +                                       pr_err("%ps timed out, size:
> %u\n",
> +                                              fn, sz);
> +                                       err = -ETIME;
> +                               }
> +                               i915_request_put(rq);
> +                               rq = NULL;
> +                       }
> +                       if (err)
> +                               continue;
> +
> +                       i915_gem_object_flush_map(obj);
> +                       for (i = 0; !err && i < ccs_bytes; i += 4) {
> +                               if (vaddr[i] != ~i) {
> +                                       pr_err("%ps ccs write and
> read failed, offset: %d\n",
> +                                              fn, i);
> +                                       err = -EINVAL;
> +                               }
> +                       }
> +                       if (err)
> +                               continue;
> +
> +                       i915_gem_object_flush_map(obj);
> +               }
>  
> -               if (err != -EDEADLK && err != -EINTR && err != -
> ERESTARTSYS)
> -                       pr_err("%ps failed, size: %u\n", fn, sz);
> -               if (rq) {
> -                       i915_request_wait(rq, 0, HZ);
> +               err = fn(migrate, &ww, obj, val, &rq);
> +               if (rq && !err) {
> +                       if (i915_request_wait(rq, 0, HZ) < 0) {
> +                               pr_err("%ps timed out, size: %u\n",
> fn, sz);
> +                               err = -ETIME;
> +                       }
>                         i915_request_put(rq);
> +                       rq = NULL;
>                 }
> -               i915_gem_object_unpin_map(obj);
> -       }
> -       if (err)
> -               goto err_out;
> +               if (err)
> +                       continue;
>  
> -       if (rq) {
> -               if (i915_request_wait(rq, 0, HZ) < 0) {
> -                       pr_err("%ps timed out, size: %u\n", fn, sz);
> -                       err = -ETIME;
> +               i915_gem_object_flush_map(obj);
> +
> +               /* Verify the set/clear of the obj mem */
> +               for (i = 0; !err && i < sz / PAGE_SIZE; i++) {
> +                       int x = i * 1024 +
> +                               i915_prandom_u32_max_state(1024,
> prng);
> +
> +                       if (vaddr[x] != val) {
> +                               pr_err("%ps failed, (%u != %u),
> offset: %zu\n",
> +                                      fn, vaddr[x], val,  x *
> sizeof(u32));
> +                               igt_hexdump(vaddr + i * 1024, 4096);
> +                               err = -EINVAL;
> +                       }
>                 }
> -               i915_request_put(rq);
> -       }
> +               if (err)
> +                       continue;
>  
> -       for (i = 0; !err && i < sz / PAGE_SIZE; i++) {
> -               int x = i * 1024 + i915_prandom_u32_max_state(1024,
> prng);
> +               if (ccs_cap && !val) {
> +                       for (i = 0; i < sz / sizeof(u32); i++)
> +                               vaddr[i] = ~i;
> +                       i915_gem_object_flush_map(obj);
> +
> +                       err = intel_migrate_ccs_copy(migrate, &ww,
> NULL,
> +                                                    obj->mm.pages-
> >sgl,
> +                                                    obj-
> >cache_level,
> +                                                    false, &rq);
> +                       if (rq && !err) {
> +                               if (i915_request_wait(rq, 0, HZ) < 0)
> {
> +                                       pr_err("%ps timed out, size:
> %u\n",
> +                                              fn, sz);
> +                                       err = -ETIME;
> +                               }
> +                               i915_request_put(rq);
> +                               rq = NULL;
> +                       }
> +                       if (err)
> +                               continue;
> +
> +                       ccs_bytes = GET_CCS_BYTES(i915, sz);
> +                       i915_gem_object_flush_map(obj);
> +                       for (i = 0; !err && i < ccs_bytes /
> sizeof(u32); i++) {
> +                               if (vaddr[i]) {

I think this is incorrect. This assumes that CCS data is read back
contiguously for the whole buffer, but isn't CCS data read back
per 8 MiB chunk and placed at the beginning of each chunk?

/Thomas
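
If that reading is right, a per-chunk scan would look roughly like the
sketch below; CHUNK_SZ-granular CCS placement is an assumption taken from
the comment above, not something confirmed elsewhere in this thread:

	u32 chunk, n, i;

	for (chunk = 0; chunk < sz; chunk += CHUNK_SZ) {
		u32 *ccs = vaddr + chunk / sizeof(u32);

		n = GET_CCS_BYTES(i915, min_t(u32, sz - chunk, CHUNK_SZ)) /
		    sizeof(u32);
		for (i = 0; i < n; i++) {
			if (ccs[i]) {
				err = -EINVAL;
				break;
			}
		}
	}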



> +                                       pr_err("%ps ccs clearing
> failed, offset: %d/%lu\n",
> +                                              fn, i, (ccs_bytes /
> sizeof(u32)) -  1);
> +                                       igt_hexdump(vaddr + i,
> ccs_bytes - i * sizeof(u32));
> +                                       err = -EINVAL;
> +                               }
> +                       }
> +                       if (err)
> +                               continue;
> +               }
> +               i915_gem_object_unpin_map(obj);
> +       }
>  
> -               if (vaddr[x] != sz) {
> -                       pr_err("%ps failed, size: %u, offset: %zu\n",
> -                              fn, sz, x * sizeof(u32));
> -                       igt_hexdump(vaddr + i * 1024, 4096);
> -                       err = -EINVAL;
> +       if (err) {
> +               if (err != -EDEADLK && err != -EINTR && err != -
> ERESTARTSYS)
> +                       pr_err("%ps failed, size: %u\n", fn, sz);
> +               if (rq && err != -EINVAL) {
> +                       i915_request_wait(rq, 0, HZ);
> +                       i915_request_put(rq);
>                 }
> +
> +               i915_gem_object_unpin_map(obj);
> +       } else {
> +               pr_debug("%ps Passed. size: %u\n", fn, sz);
>         }
>  
> -       i915_gem_object_unpin_map(obj);
> -err_out:
>         i915_gem_object_put(obj);
> -
>         return err;
>  }
>  

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 4/8] drm/i915/selftest_migrate: Check CCS meta data clear
@ 2022-03-21 10:39     ` Hellstrom, Thomas
  0 siblings, 0 replies; 33+ messages in thread
From: Hellstrom, Thomas @ 2022-03-21 10:39 UTC (permalink / raw)
  To: dri-devel, C, Ramalingam, intel-gfx; +Cc: Auld, Matthew

On Sun, 2022-03-20 at 02:12 +0530, Ramalingam C wrote:
> While clearing the Flat-CCS capable lmem object, we need to clear the
> CCS
> meta data corresponding to the memory.
> 
> As part of live_migrate_clear add check for the ccs meta data clear
> for
> the Flat-CCS capable lmem object.
> 
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_migrate.c    |  32 +++
>  drivers/gpu/drm/i915/gt/selftest_migrate.c | 274 ++++++++++++++++++-
> --
>  2 files changed, 278 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c
> b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index c1db8daf994a..bbfea570c239 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -572,6 +572,38 @@ static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd,
> u64 src_addr, u64 dst_addr,
>         return cmd;
>  }
>  
> +static int emit_copy_ccs(struct i915_request *rq,
> +                        u32 dst_offset, u8 dst_access,
> +                        u32 src_offset, u8 src_access, int size)
> +{
> +       struct drm_i915_private *i915 = rq->engine->i915;
> +       int mocs = rq->engine->gt->mocs.uc_index << 1;
> +       u32 num_ccs_blks, ccs_ring_size;
> +       u32 *cs;
> +
> +       ccs_ring_size = calc_ctrl_surf_instr_size(i915, size);
> +       WARN_ON(!ccs_ring_size);
> +
> +       cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2));
> +       if (IS_ERR(cs))
> +               return PTR_ERR(cs);
> +
> +       num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> +                                   NUM_CCS_BYTES_PER_BLOCK);
> +
> +       cs = i915_flush_dw(cs, MI_FLUSH_DW_LLC | MI_FLUSH_DW_CCS);
> +       cs = _i915_ctrl_surf_copy_blt(cs, src_offset, dst_offset,
> +                                     src_access, dst_access,
> +                                     mocs, mocs, num_ccs_blks);
> +       cs = i915_flush_dw(cs, MI_FLUSH_DW_LLC | MI_FLUSH_DW_CCS);
> +       if (ccs_ring_size & 1)
> +               *cs++ = MI_NOOP;
> +
> +       intel_ring_advance(rq, cs);
> +
> +       return 0;
> +}


This would be an unused function if selftests are not configured,
right?


> +
>  static int emit_copy(struct i915_request *rq,
>                      u32 dst_offset, u32 src_offset, int size)
>  {
> diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c
> b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> index b5da8b8cd039..e32cc994f4a2 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> @@ -132,6 +132,126 @@ static int copy(struct intel_migrate *migrate,
>         return err;
>  }
>  
> +static int intel_context_copy_ccs(struct intel_context *ce,
> +                                 const struct i915_deps *deps,
> +                                 struct scatterlist *sg,
> +                                 enum i915_cache_level cache_level,
> +                                 bool write_to_ccs,
> +                                 struct i915_request **out)
> +{
> +       u8 src_access = write_to_ccs ? DIRECT_ACCESS :
> INDIRECT_ACCESS;
> +       u8 dst_access = write_to_ccs ? INDIRECT_ACCESS :
> DIRECT_ACCESS;
> +       struct sgt_dma it = sg_sgt(sg);
> +       struct i915_request *rq;
> +       u32 offset;
> +       int err;
> +
> +       GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
> +       *out = NULL;
> +
> +       GEM_BUG_ON(ce->ring->size < SZ_64K);
> +
> +       offset = 0;
> +       if (HAS_64K_PAGES(ce->engine->i915))
> +               offset = CHUNK_SZ;
> +       offset += (u64)rq->engine->instance << 32;
> +
> +       do {
> +               int len;
> +
> +               rq = i915_request_create(ce);
> +               if (IS_ERR(rq)) {
> +                       err = PTR_ERR(rq);
> +                       goto out_ce;
> +               }
> +
> +               if (deps) {
> +                       err = i915_request_await_deps(rq, deps);
> +                       if (err)
> +                               goto out_rq;
> +
> +                       if (rq->engine->emit_init_breadcrumb) {
> +                               err = rq->engine-
> >emit_init_breadcrumb(rq);
> +                               if (err)
> +                                       goto out_rq;
> +                       }
> +
> +                       deps = NULL;
> +               }
> +
> +               /* The PTE updates + clear must not be interrupted.
> */
> +               err = emit_no_arbitration(rq);
> +               if (err)
> +                       goto out_rq;
> +
> +               len = emit_pte(rq, &it, cache_level, true, offset,
> CHUNK_SZ);
> +               if (len <= 0) {
> +                       err = len;
> +                       goto out_rq;
> +               }
> +
> +               err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
> +               if (err)
> +                       goto out_rq;
> +
> +               err = emit_copy_ccs(rq, offset, dst_access,
> +                                   offset, src_access, len);
> +               if (err)
> +                       goto out_rq;
> +
> +               err = rq->engine->emit_flush(rq, EMIT_INVALIDATE |
> +                                            MI_FLUSH_DW_CCS);
> +
> +               /* Arbitration is re-enabled between requests. */
> +out_rq:
> +               if (*out)
> +                       i915_request_put(*out);
> +               *out = i915_request_get(rq);
> +               i915_request_add(rq);
> +               if (err || !it.sg || !sg_dma_len(it.sg))
> +                       break;
> +
> +               cond_resched();
> +       } while (1);
> +
> +out_ce:
> +       return err;
> +}
> +
> +static int
> +intel_migrate_ccs_copy(struct intel_migrate *m,
> +                      struct i915_gem_ww_ctx *ww,
> +                      const struct i915_deps *deps,
> +                      struct scatterlist *sg,
> +                      enum i915_cache_level cache_level,
> +                      bool write_to_ccs,
> +                      struct i915_request **out)
> +{
> +       struct intel_context *ce;
> +       int err;
> +
> +       *out = NULL;
> +       if (!m->context)
> +               return -ENODEV;
> +
> +       ce = intel_migrate_create_context(m);
> +       if (IS_ERR(ce))
> +               ce = intel_context_get(m->context);
> +       GEM_BUG_ON(IS_ERR(ce));
> +
> +       err = intel_context_pin_ww(ce, ww);
> +       if (err)
> +               goto out;
> +
> +       err = intel_context_copy_ccs(ce, deps, sg, cache_level,
> +                                    write_to_ccs, out);
> +
> +       intel_context_unpin(ce);
> +out:
> +       intel_context_put(ce);
> +       return err;
> +}
> +
>  static int clear(struct intel_migrate *migrate,
>                  int (*fn)(struct intel_migrate *migrate,
>                            struct i915_gem_ww_ctx *ww,
> @@ -144,7 +264,8 @@ static int clear(struct intel_migrate *migrate,
>         struct drm_i915_gem_object *obj;
>         struct i915_request *rq;
>         struct i915_gem_ww_ctx ww;
> -       u32 *vaddr;
> +       u32 *vaddr, val = 0;
> +       bool ccs_cap = false;
>         int err = 0;
>         int i;
>  
> @@ -155,7 +276,12 @@ static int clear(struct intel_migrate *migrate,
>         /* Consider the rounded up memory too */
>         sz = obj->base.size;
>  
> +       if (HAS_FLAT_CCS(i915) && i915_gem_object_is_lmem(obj))
> +               ccs_cap = true;
> +
>         for_i915_gem_ww(&ww, err, true) {
> +               int ccs_bytes;
> +
>                 err = i915_gem_object_lock(obj, &ww);
>                 if (err)
>                         continue;
> @@ -170,44 +296,136 @@ static int clear(struct intel_migrate
> *migrate,
>                         vaddr[i] = ~i;
>                 i915_gem_object_flush_map(obj);
>  
> -               err = fn(migrate, &ww, obj, sz, &rq);
> -               if (!err)
> -                       continue;
> +               if (ccs_cap && !val) {
> +                       /* Write the obj data into ccs surface */
> +                       err = intel_migrate_ccs_copy(migrate, &ww,
> NULL,
> +                                                    obj->mm.pages-
> >sgl,
> +                                                    obj-
> >cache_level,
> +                                                    true, &rq);
> +                       if (rq && !err) {
> +                               if (i915_request_wait(rq, 0, HZ) < 0)
> {
> +                                       pr_err("%ps timed out, size:
> %u\n",
> +                                              fn, sz);
> +                                       err = -ETIME;
> +                               }
> +                               i915_request_put(rq);
> +                               rq = NULL;
> +                       }
> +                       if (err)
> +                               continue;
> +
> +                       for (i = 0; i < sz / sizeof(u32); i++)
> +                               vaddr[i] = 0x5a5a5a5a;
> +                       i915_gem_object_flush_map(obj);
> +
> +                       err = intel_migrate_ccs_copy(migrate, &ww,
> NULL, obj->mm.pages->sgl,
> +                                                    obj-
> >cache_level, false, &rq);

Why do we read back CCS content here?

> +                       if (rq && !err) {
> +                               if (i915_request_wait(rq, 0, HZ) < 0) {
> +                                       pr_err("%ps timed out, size: %u\n",
> +                                              fn, sz);
> +                                       err = -ETIME;
> +                               }
> +                               i915_request_put(rq);
> +                               rq = NULL;
> +                       }
> +                       if (err)
> +                               continue;
> +
> +                       i915_gem_object_flush_map(obj);
> +                       for (i = 0; !err && i < ccs_bytes; i += 4) {
> +                               if (vaddr[i] != ~i) {
> +                                       pr_err("%ps ccs write and read failed, offset: %d\n",
> +                                              fn, i);
> +                                       err = -EINVAL;
> +                               }
> +                       }
> +                       if (err)
> +                               continue;
> +
> +                       i915_gem_object_flush_map(obj);
> +               }
>  
> -               if (err != -EDEADLK && err != -EINTR && err != -ERESTARTSYS)
> -                       pr_err("%ps failed, size: %u\n", fn, sz);
> -               if (rq) {
> -                       i915_request_wait(rq, 0, HZ);
> +               err = fn(migrate, &ww, obj, val, &rq);
> +               if (rq && !err) {
> +                       if (i915_request_wait(rq, 0, HZ) < 0) {
> +                               pr_err("%ps timed out, size: %u\n", fn, sz);
> +                               err = -ETIME;
> +                       }
>                         i915_request_put(rq);
> +                       rq = NULL;
>                 }
> -               i915_gem_object_unpin_map(obj);
> -       }
> -       if (err)
> -               goto err_out;
> +               if (err)
> +                       continue;
>  
> -       if (rq) {
> -               if (i915_request_wait(rq, 0, HZ) < 0) {
> -                       pr_err("%ps timed out, size: %u\n", fn, sz);
> -                       err = -ETIME;
> +               i915_gem_object_flush_map(obj);
> +
> +               /* Verify the set/clear of the obj mem */
> +               for (i = 0; !err && i < sz / PAGE_SIZE; i++) {
> +                       int x = i * 1024 +
> +                               i915_prandom_u32_max_state(1024, prng);
> +
> +                       if (vaddr[x] != val) {
> +                               pr_err("%ps failed, (%u != %u), offset: %zu\n",
> +                                      fn, vaddr[x], val,  x * sizeof(u32));
> +                               igt_hexdump(vaddr + i * 1024, 4096);
> +                               err = -EINVAL;
> +                       }
>                 }
> -               i915_request_put(rq);
> -       }
> +               if (err)
> +                       continue;
>  
> -       for (i = 0; !err && i < sz / PAGE_SIZE; i++) {
> -               int x = i * 1024 + i915_prandom_u32_max_state(1024, prng);
> +               if (ccs_cap && !val) {
> +                       for (i = 0; i < sz / sizeof(u32); i++)
> +                               vaddr[i] = ~i;
> +                       i915_gem_object_flush_map(obj);
> +
> +                       err = intel_migrate_ccs_copy(migrate, &ww, NULL,
> +                                                    obj->mm.pages->sgl,
> +                                                    obj->cache_level,
> +                                                    false, &rq);
> +                       if (rq && !err) {
> +                               if (i915_request_wait(rq, 0, HZ) < 0) {
> +                                       pr_err("%ps timed out, size: %u\n",
> +                                              fn, sz);
> +                                       err = -ETIME;
> +                               }
> +                               i915_request_put(rq);
> +                               rq = NULL;
> +                       }
> +                       if (err)
> +                               continue;
> +
> +                       ccs_bytes = GET_CCS_BYTES(i915, sz);
> +                       i915_gem_object_flush_map(obj);
> +                       for (i = 0; !err && i < ccs_bytes / sizeof(u32); i++) {
> +                               if (vaddr[i]) {

I think this is incorrect. This assumes that CCS data is read back
contiguous for the whole buffer, but instead CCS data is read back
per 8MiB chunk and placed at the beginning of each chunk?

/Thomas



> +                                       pr_err("%ps ccs clearing failed, offset: %d/%lu\n",
> +                                              fn, i, (ccs_bytes / sizeof(u32)) -  1);
> +                                       igt_hexdump(vaddr + i, ccs_bytes - i * sizeof(u32));
> +                                       err = -EINVAL;
> +                               }
> +                       }
> +                       if (err)
> +                               continue;
> +               }
> +               i915_gem_object_unpin_map(obj);
> +       }
>  
> -               if (vaddr[x] != sz) {
> -                       pr_err("%ps failed, size: %u, offset: %zu\n",
> -                              fn, sz, x * sizeof(u32));
> -                       igt_hexdump(vaddr + i * 1024, 4096);
> -                       err = -EINVAL;
> +       if (err) {
> +               if (err != -EDEADLK && err != -EINTR && err != -ERESTARTSYS)
> +                       pr_err("%ps failed, size: %u\n", fn, sz);
> +               if (rq && err != -EINVAL) {
> +                       i915_request_wait(rq, 0, HZ);
> +                       i915_request_put(rq);
>                 }
> +
> +               i915_gem_object_unpin_map(obj);
> +       } else {
> +               pr_debug("%ps Passed. size: %u\n", fn, sz);
>         }
>  
> -       i915_gem_object_unpin_map(obj);
> -err_out:
>         i915_gem_object_put(obj);
> -
>         return err;
>  }
>  

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 4/8] drm/i915/selftest_migrate: Check CCS meta data clear
  2022-03-21 10:39     ` Hellstrom, Thomas
@ 2022-03-21 23:05       ` Ramalingam C
  -1 siblings, 0 replies; 33+ messages in thread
From: Ramalingam C @ 2022-03-21 23:05 UTC (permalink / raw)
  To: Hellstrom, Thomas; +Cc: intel-gfx, Auld, Matthew, dri-devel

On 2022-03-21 at 16:09:08 +0530, Hellstrom, Thomas wrote:
> On Sun, 2022-03-20 at 02:12 +0530, Ramalingam C wrote:
> > While clearing the Flat-CCS capable lmem object, we need to clear the
> > CCS
> > meta data corresponding to the memory.
> >
> > As part of live_migrate_clear add check for the ccs meta data clear
> > for
> > the Flat-CCS capable lmem object.
> >
> > Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_migrate.c    |  32 +++
> >  drivers/gpu/drm/i915/gt/selftest_migrate.c | 274 ++++++++++++++++++---
> >  2 files changed, 278 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > index c1db8daf994a..bbfea570c239 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > @@ -572,6 +572,38 @@ static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
> >         return cmd;
> >  }
> >
> > +static int emit_copy_ccs(struct i915_request *rq,
> > +                        u32 dst_offset, u8 dst_access,
> > +                        u32 src_offset, u8 src_access, int size)
> > +{
> > +       struct drm_i915_private *i915 = rq->engine->i915;
> > +       int mocs = rq->engine->gt->mocs.uc_index << 1;
> > +       u32 num_ccs_blks, ccs_ring_size;
> > +       u32 *cs;
> > +
> > +       ccs_ring_size = calc_ctrl_surf_instr_size(i915, size);
> > +       WARN_ON(!ccs_ring_size);
> > +
> > +       cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2));
> > +       if (IS_ERR(cs))
> > +               return PTR_ERR(cs);
> > +
> > +       num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> > +                                   NUM_CCS_BYTES_PER_BLOCK);
> > +
> > +       cs = i915_flush_dw(cs, MI_FLUSH_DW_LLC | MI_FLUSH_DW_CCS);
> > +       cs = _i915_ctrl_surf_copy_blt(cs, src_offset, dst_offset,
> > +                                     src_access, dst_access,
> > +                                     mocs, mocs, num_ccs_blks);
> > +       cs = i915_flush_dw(cs, MI_FLUSH_DW_LLC | MI_FLUSH_DW_CCS);
> > +       if (ccs_ring_size & 1)
> > +               *cs++ = MI_NOOP;
> > +
> > +       intel_ring_advance(rq, cs);
> > +
> > +       return 0;
> > +}
> 
> 
> This would be an unused function if selftests are not configured,
> right?
No, Thomas. This is reused between the selftest and the eviction flow; in
the next version I am reusing it for evict_clear too.
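
A rough sketch of that reuse (assuming the helper keeps the signature
quoted above; the actual evict_clear call site in the next version may
differ): with src_access set to DIRECT_ACCESS the blitter reads the
already-cleared main surface, and with dst_access set to INDIRECT_ACCESS
it writes out the matching CCS metadata, zeroing it:

	/* Sketch only: clear the CCS metadata by "copying" the cleared
	 * main surface into the CCS surface for the same offset/len.
	 */
	err = emit_copy_ccs(rq, offset, INDIRECT_ACCESS,
			    offset, DIRECT_ACCESS, len);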

> 
> 
> > +
> >  static int emit_copy(struct i915_request *rq,
> >                      u32 dst_offset, u32 src_offset, int size)
> >  {
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> > index b5da8b8cd039..e32cc994f4a2 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> > @@ -132,6 +132,126 @@ static int copy(struct intel_migrate *migrate,
> >         return err;
> >  }
> >
> > +static int intel_context_copy_ccs(struct intel_context *ce,
> > +                                 const struct i915_deps *deps,
> > +                                 struct scatterlist *sg,
> > +                                 enum i915_cache_level cache_level,
> > +                                 bool write_to_ccs,
> > +                                 struct i915_request **out)
> > +{
> > +       u8 src_access = write_to_ccs ? DIRECT_ACCESS : INDIRECT_ACCESS;
> > +       u8 dst_access = write_to_ccs ? INDIRECT_ACCESS : DIRECT_ACCESS;
> > +       struct sgt_dma it = sg_sgt(sg);
> > +       struct i915_request *rq;
> > +       u32 offset;
> > +       int err;
> > +
> > +       GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm);
> > +       *out = NULL;
> > +
> > +       GEM_BUG_ON(ce->ring->size < SZ_64K);
> > +
> > +       offset = 0;
> > +       if (HAS_64K_PAGES(ce->engine->i915))
> > +               offset = CHUNK_SZ;
> > +       offset += (u64)rq->engine->instance << 32;
> > +
> > +       do {
> > +               int len;
> > +
> > +               rq = i915_request_create(ce);
> > +               if (IS_ERR(rq)) {
> > +                       err = PTR_ERR(rq);
> > +                       goto out_ce;
> > +               }
> > +
> > +               if (deps) {
> > +                       err = i915_request_await_deps(rq, deps);
> > +                       if (err)
> > +                               goto out_rq;
> > +
> > +                       if (rq->engine->emit_init_breadcrumb) {
> > +                               err = rq->engine->emit_init_breadcrumb(rq);
> > +                               if (err)
> > +                                       goto out_rq;
> > +                       }
> > +
> > +                       deps = NULL;
> > +               }
> > +
> > +               /* The PTE updates + clear must not be interrupted. */
> > +               err = emit_no_arbitration(rq);
> > +               if (err)
> > +                       goto out_rq;
> > +
> > +               len = emit_pte(rq, &it, cache_level, true, offset, CHUNK_SZ);
> > +               if (len <= 0) {
> > +                       err = len;
> > +                       goto out_rq;
> > +               }
> > +
> > +               err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
> > +               if (err)
> > +                       goto out_rq;
> > +
> > +               err = emit_copy_ccs(rq, offset, dst_access,
> > +                                   offset, src_access, len);
> > +               if (err)
> > +                       goto out_rq;
> > +
> > +               err = rq->engine->emit_flush(rq, EMIT_INVALIDATE |
> > +                                            MI_FLUSH_DW_CCS);
> > +
> > +               /* Arbitration is re-enabled between requests. */
> > +out_rq:
> > +               if (*out)
> > +                       i915_request_put(*out);
> > +               *out = i915_request_get(rq);
> > +               i915_request_add(rq);
> > +               if (err || !it.sg || !sg_dma_len(it.sg))
> > +                       break;
> > +
> > +               cond_resched();
> > +       } while (1);
> > +
> > +out_ce:
> > +       return err;
> > +}
> > +
> > +static int
> > +intel_migrate_ccs_copy(struct intel_migrate *m,
> > +                      struct i915_gem_ww_ctx *ww,
> > +                      const struct i915_deps *deps,
> > +                      struct scatterlist *sg,
> > +                      enum i915_cache_level cache_level,
> > +                      bool write_to_ccs,
> > +                      struct i915_request **out)
> > +{
> > +       struct intel_context *ce;
> > +       int err;
> > +
> > +       *out = NULL;
> > +       if (!m->context)
> > +               return -ENODEV;
> > +
> > +       ce = intel_migrate_create_context(m);
> > +       if (IS_ERR(ce))
> > +               ce = intel_context_get(m->context);
> > +       GEM_BUG_ON(IS_ERR(ce));
> > +
> > +       err = intel_context_pin_ww(ce, ww);
> > +       if (err)
> > +               goto out;
> > +
> > +       err = intel_context_copy_ccs(ce, deps, sg, cache_level,
> > +                                    write_to_ccs, out);
> > +
> > +       intel_context_unpin(ce);
> > +out:
> > +       intel_context_put(ce);
> > +       return err;
> > +}
> > +
> >  static int clear(struct intel_migrate *migrate,
> >                  int (*fn)(struct intel_migrate *migrate,
> >                            struct i915_gem_ww_ctx *ww,
> > @@ -144,7 +264,8 @@ static int clear(struct intel_migrate *migrate,
> >         struct drm_i915_gem_object *obj;
> >         struct i915_request *rq;
> >         struct i915_gem_ww_ctx ww;
> > -       u32 *vaddr;
> > +       u32 *vaddr, val = 0;
> > +       bool ccs_cap = false;
> >         int err = 0;
> >         int i;
> >
> > @@ -155,7 +276,12 @@ static int clear(struct intel_migrate *migrate,
> >         /* Consider the rounded up memory too */
> >         sz = obj->base.size;
> >
> > +       if (HAS_FLAT_CCS(i915) && i915_gem_object_is_lmem(obj))
> > +               ccs_cap = true;
> > +
> >         for_i915_gem_ww(&ww, err, true) {
> > +               int ccs_bytes;
> > +
> >                 err = i915_gem_object_lock(obj, &ww);
> >                 if (err)
> >                         continue;
> > @@ -170,44 +296,136 @@ static int clear(struct intel_migrate *migrate,
> >                         vaddr[i] = ~i;
> >                 i915_gem_object_flush_map(obj);
> >
> > -               err = fn(migrate, &ww, obj, sz, &rq);
> > -               if (!err)
> > -                       continue;
> > +               if (ccs_cap && !val) {
> > +                       /* Write the obj data into ccs surface */
> > +                       err = intel_migrate_ccs_copy(migrate, &ww, NULL,
> > +                                                    obj->mm.pages->sgl,
> > +                                                    obj->cache_level,
> > +                                                    true, &rq);
> > +                       if (rq && !err) {
> > +                               if (i915_request_wait(rq, 0, HZ) < 0) {
> > +                                       pr_err("%ps timed out, size: %u\n",
> > +                                              fn, sz);
> > +                                       err = -ETIME;
> > +                               }
> > +                               i915_request_put(rq);
> > +                               rq = NULL;
> > +                       }
> > +                       if (err)
> > +                               continue;
> > +
> > +                       for (i = 0; i < sz / sizeof(u32); i++)
> > +                               vaddr[i] = 0x5a5a5a5a;
> > +                       i915_gem_object_flush_map(obj);
> > +
> > +                       err = intel_migrate_ccs_copy(migrate, &ww, NULL, obj->mm.pages->sgl,
> > +                                                    obj->cache_level, false, &rq);
> 
> Why do we read back CCS content here?
Was rechecking the ccs copy, but this is not needed for the real
intention of the test. Removing it in the next version.
> 
> > +                       if (rq && !err) {
> > +                               if (i915_request_wait(rq, 0, HZ) < 0) {
> > +                                       pr_err("%ps timed out, size: %u\n",
> > +                                              fn, sz);
> > +                                       err = -ETIME;
> > +                               }
> > +                               i915_request_put(rq);
> > +                               rq = NULL;
> > +                       }
> > +                       if (err)
> > +                               continue;
> > +
> > +                       i915_gem_object_flush_map(obj);
> > +                       for (i = 0; !err && i < ccs_bytes; i += 4) {
> > +                               if (vaddr[i] != ~i) {
> > +                                       pr_err("%ps ccs write and read failed, offset: %d\n",
> > +                                              fn, i);
> > +                                       err = -EINVAL;
> > +                               }
> > +                       }
> > +                       if (err)
> > +                               continue;
> > +
> > +                       i915_gem_object_flush_map(obj);
> > +               }
> >
> > -               if (err != -EDEADLK && err != -EINTR && err != -ERESTARTSYS)
> > -                       pr_err("%ps failed, size: %u\n", fn, sz);
> > -               if (rq) {
> > -                       i915_request_wait(rq, 0, HZ);
> > +               err = fn(migrate, &ww, obj, val, &rq);
> > +               if (rq && !err) {
> > +                       if (i915_request_wait(rq, 0, HZ) < 0) {
> > +                               pr_err("%ps timed out, size: %u\n", fn, sz);
> > +                               err = -ETIME;
> > +                       }
> >                         i915_request_put(rq);
> > +                       rq = NULL;
> >                 }
> > -               i915_gem_object_unpin_map(obj);
> > -       }
> > -       if (err)
> > -               goto err_out;
> > +               if (err)
> > +                       continue;
> >
> > -       if (rq) {
> > -               if (i915_request_wait(rq, 0, HZ) < 0) {
> > -                       pr_err("%ps timed out, size: %u\n", fn, sz);
> > -                       err = -ETIME;
> > +               i915_gem_object_flush_map(obj);
> > +
> > +               /* Verify the set/clear of the obj mem */
> > +               for (i = 0; !err && i < sz / PAGE_SIZE; i++) {
> > +                       int x = i * 1024 +
> > +                               i915_prandom_u32_max_state(1024, prng);
> > +
> > +                       if (vaddr[x] != val) {
> > +                               pr_err("%ps failed, (%u != %u), offset: %zu\n",
> > +                                      fn, vaddr[x], val,  x * sizeof(u32));
> > +                               igt_hexdump(vaddr + i * 1024, 4096);
> > +                               err = -EINVAL;
> > +                       }
> >                 }
> > -               i915_request_put(rq);
> > -       }
> > +               if (err)
> > +                       continue;
> >
> > -       for (i = 0; !err && i < sz / PAGE_SIZE; i++) {
> > -               int x = i * 1024 + i915_prandom_u32_max_state(1024, prng);
> > +               if (ccs_cap && !val) {
> > +                       for (i = 0; i < sz / sizeof(u32); i++)
> > +                               vaddr[i] = ~i;
> > +                       i915_gem_object_flush_map(obj);
> > +
> > +                       err = intel_migrate_ccs_copy(migrate, &ww, NULL,
> > +                                                    obj->mm.pages->sgl,
> > +                                                    obj->cache_level,
> > +                                                    false, &rq);
> > +                       if (rq && !err) {
> > +                               if (i915_request_wait(rq, 0, HZ) < 0) {
> > +                                       pr_err("%ps timed out, size: %u\n",
> > +                                              fn, sz);
> > +                                       err = -ETIME;
> > +                               }
> > +                               i915_request_put(rq);
> > +                               rq = NULL;
> > +                       }
> > +                       if (err)
> > +                               continue;
> > +
> > +                       ccs_bytes = GET_CCS_BYTES(i915, sz);
> > +                       i915_gem_object_flush_map(obj);
> > +                       for (i = 0; !err && i < ccs_bytes / sizeof(u32); i++) {
> > +                               if (vaddr[i]) {
> 
> I think this is incorrect. This assumes that CCS data is read back
> contiguous for the whole buffer, but instead CCS data is read back
> per 8MiB chunk and placed at the beginning of each chunk?

Yes. This is the source of the problem I was discussing with you. Fixed
it in the next version; please share your feedback. I could have used a
dedicated obj for the ccs data, but instead just calculated the offset of
the ccs bytes within each chunk.
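
To make the offset math concrete, here is a sketch under the assumptions
in this thread (1 CCS byte per 256 bytes of main memory, CCS read back at
the head of each CHUNK_SZ block; ccs_offset() is a made-up helper, not
code from the series):

	#define NUM_BYTES_PER_CCS_BYTE	256

	/* Offset, in the read-back buffer, of the CCS byte covering
	 * main-memory offset 'byte': it lands at the start of that
	 * byte's CHUNK_SZ block, not packed contiguously for the
	 * whole object.
	 */
	static u32 ccs_offset(u32 byte)
	{
		u32 chunk = byte / CHUNK_SZ;

		return chunk * CHUNK_SZ +
		       (byte % CHUNK_SZ) / NUM_BYTES_PER_CCS_BYTE;
	}

So for an 8MiB chunk only the first 32KiB of it holds metadata, and the
verification loop has to skip ahead chunk by chunk instead of scanning
ccs_bytes linearly from the start of the buffer.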

Ram
> 
> /Thomas
> 
> 
> 
> > +                                       pr_err("%ps ccs clearing failed, offset: %d/%lu\n",
> > +                                              fn, i, (ccs_bytes / sizeof(u32)) -  1);
> > +                                       igt_hexdump(vaddr + i, ccs_bytes - i * sizeof(u32));
> > +                                       err = -EINVAL;
> > +                               }
> > +                       }
> > +                       if (err)
> > +                               continue;
> > +               }
> > +               i915_gem_object_unpin_map(obj);
> > +       }
> >
> > -               if (vaddr[x] != sz) {
> > -                       pr_err("%ps failed, size: %u, offset: %zu\n",
> > -                              fn, sz, x * sizeof(u32));
> > -                       igt_hexdump(vaddr + i * 1024, 4096);
> > -                       err = -EINVAL;
> > +       if (err) {
> > +               if (err != -EDEADLK && err != -EINTR && err != -ERESTARTSYS)
> > +                       pr_err("%ps failed, size: %u\n", fn, sz);
> > +               if (rq && err != -EINVAL) {
> > +                       i915_request_wait(rq, 0, HZ);
> > +                       i915_request_put(rq);
> >                 }
> > +
> > +               i915_gem_object_unpin_map(obj);
> > +       } else {
> > +               pr_debug("%ps Passed. size: %u\n", fn, sz);
> >         }
> >
> > -       i915_gem_object_unpin_map(obj);
> > -err_out:
> >         i915_gem_object_put(obj);
> > -
> >         return err;
> >  }
> >
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Intel-gfx] [PATCH v4 6/8] drm/ttm: Add a parameter to add extra pages into ttm_tt
  2022-03-21 10:11   ` Das, Nirmoy
@ 2022-03-21 23:06     ` Ramalingam C
  0 siblings, 0 replies; 33+ messages in thread
From: Ramalingam C @ 2022-03-21 23:06 UTC (permalink / raw)
  To: Das, Nirmoy
  Cc: Thomas Hellstrom, intel-gfx, dri-devel, Hellstrom Thomas,
	Matthew Auld, Christian Koenig

On 2022-03-21 at 11:11:33 +0100, Das, Nirmoy wrote:
> In the previous version I replied only to the mailing list email, so
> probably my email slipped through.

Sorry for the miss. Thanks so much for the review.
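
For context on how the new argument ends up being used (a sketch assuming
the 1/256 CCS ratio from the cover letter; ccs_pages_needed() is
illustrative, not the literal patch 7/8 code):

	#define NUM_BYTES_PER_CCS_BYTE	256

	/* Sketch only: extra system pages needed to hold the CCS
	 * metadata (1 byte per 256 bytes of main memory), handed to
	 * ttm_tt_init() through the new extra_pages argument.
	 */
	static unsigned long ccs_pages_needed(size_t size)
	{
		return DIV_ROUND_UP(size / NUM_BYTES_PER_CCS_BYTE, PAGE_SIZE);
	}

	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching,
			  ccs_pages_needed(bo->base.size));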

Ram
> 
> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> for patch 6-7
> 
> On 3/19/2022 9:42 PM, Ramalingam C wrote:
> > Add a parameter called "extra_pages" for ttm_tt_init, to indicate that
> > driver needs extra pages in ttm_tt.
> > 
> > v2:
> >    Used imperative wording [Thomas and Christian]
> > 
> > Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> > cc: Christian Koenig <christian.koenig@amd.com>
> > cc: Hellstrom Thomas <thomas.hellstrom@intel.com>
> > Reviewed-by: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
> > Reviewed-by: Christian Konig <christian.koenig@amd.com>
> > ---
> >   drivers/gpu/drm/drm_gem_vram_helper.c      |  2 +-
> >   drivers/gpu/drm/i915/gem/i915_gem_ttm.c    |  2 +-
> >   drivers/gpu/drm/qxl/qxl_ttm.c              |  2 +-
> >   drivers/gpu/drm/ttm/ttm_agp_backend.c      |  2 +-
> >   drivers/gpu/drm/ttm/ttm_tt.c               | 12 +++++++-----
> >   drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |  2 +-
> >   include/drm/ttm/ttm_tt.h                   |  4 +++-
> >   7 files changed, 15 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c
> > index dc7f938bfff2..123045b58fec 100644
> > --- a/drivers/gpu/drm/drm_gem_vram_helper.c
> > +++ b/drivers/gpu/drm/drm_gem_vram_helper.c
> > @@ -867,7 +867,7 @@ static struct ttm_tt *bo_driver_ttm_tt_create(struct ttm_buffer_object *bo,
> >   	if (!tt)
> >   		return NULL;
> > -	ret = ttm_tt_init(tt, bo, page_flags, ttm_cached);
> > +	ret = ttm_tt_init(tt, bo, page_flags, ttm_cached, 0);
> >   	if (ret < 0)
> >   		goto err_ttm_tt_init;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > index e4a06fcf741a..3b9f99c765c4 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> > @@ -290,7 +290,7 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
> >   		i915_tt->is_shmem = true;
> >   	}
> > -	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching);
> > +	ret = ttm_tt_init(&i915_tt->ttm, bo, page_flags, caching, 0);
> >   	if (ret)
> >   		goto err_free;
> > diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
> > index b2e33d5ba5d0..52156b54498f 100644
> > --- a/drivers/gpu/drm/qxl/qxl_ttm.c
> > +++ b/drivers/gpu/drm/qxl/qxl_ttm.c
> > @@ -113,7 +113,7 @@ static struct ttm_tt *qxl_ttm_tt_create(struct ttm_buffer_object *bo,
> >   	ttm = kzalloc(sizeof(struct ttm_tt), GFP_KERNEL);
> >   	if (ttm == NULL)
> >   		return NULL;
> > -	if (ttm_tt_init(ttm, bo, page_flags, ttm_cached)) {
> > +	if (ttm_tt_init(ttm, bo, page_flags, ttm_cached, 0)) {
> >   		kfree(ttm);
> >   		return NULL;
> >   	}
> > diff --git a/drivers/gpu/drm/ttm/ttm_agp_backend.c b/drivers/gpu/drm/ttm/ttm_agp_backend.c
> > index 6ddc16f0fe2b..d27691f2e451 100644
> > --- a/drivers/gpu/drm/ttm/ttm_agp_backend.c
> > +++ b/drivers/gpu/drm/ttm/ttm_agp_backend.c
> > @@ -134,7 +134,7 @@ struct ttm_tt *ttm_agp_tt_create(struct ttm_buffer_object *bo,
> >   	agp_be->mem = NULL;
> >   	agp_be->bridge = bridge;
> > -	if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined)) {
> > +	if (ttm_tt_init(&agp_be->ttm, bo, page_flags, ttm_write_combined, 0)) {
> >   		kfree(agp_be);
> >   		return NULL;
> >   	}
> > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> > index d234aab800a0..1a66d9fc589a 100644
> > --- a/drivers/gpu/drm/ttm/ttm_tt.c
> > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> > @@ -134,9 +134,10 @@ void ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm)
> >   static void ttm_tt_init_fields(struct ttm_tt *ttm,
> >   			       struct ttm_buffer_object *bo,
> >   			       uint32_t page_flags,
> > -			       enum ttm_caching caching)
> > +			       enum ttm_caching caching,
> > +			       unsigned long extra_pages)
> >   {
> > -	ttm->num_pages = PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT;
> > +	ttm->num_pages = (PAGE_ALIGN(bo->base.size) >> PAGE_SHIFT) + extra_pages;
> >   	ttm->caching = ttm_cached;
> >   	ttm->page_flags = page_flags;
> >   	ttm->dma_address = NULL;
> > @@ -146,9 +147,10 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
> >   }
> >   int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
> > -		uint32_t page_flags, enum ttm_caching caching)
> > +		uint32_t page_flags, enum ttm_caching caching,
> > +		unsigned long extra_pages)
> >   {
> > -	ttm_tt_init_fields(ttm, bo, page_flags, caching);
> > +	ttm_tt_init_fields(ttm, bo, page_flags, caching, extra_pages);
> >   	if (ttm_tt_alloc_page_directory(ttm)) {
> >   		pr_err("Failed allocating page table\n");
> > @@ -180,7 +182,7 @@ int ttm_sg_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
> >   {
> >   	int ret;
> > -	ttm_tt_init_fields(ttm, bo, page_flags, caching);
> > +	ttm_tt_init_fields(ttm, bo, page_flags, caching, 0);
> >   	if (page_flags & TTM_TT_FLAG_EXTERNAL)
> >   		ret = ttm_sg_tt_alloc_page_directory(ttm);
> > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> > index b84ecc6d6611..4e3938e62c08 100644
> > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
> > @@ -517,7 +517,7 @@ static struct ttm_tt *vmw_ttm_tt_create(struct ttm_buffer_object *bo,
> >   				     ttm_cached);
> >   	else
> >   		ret = ttm_tt_init(&vmw_be->dma_ttm, bo, page_flags,
> > -				  ttm_cached);
> > +				  ttm_cached, 0);
> >   	if (unlikely(ret != 0))
> >   		goto out_no_init;
> > diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
> > index f20832139815..17a0310e8aaa 100644
> > --- a/include/drm/ttm/ttm_tt.h
> > +++ b/include/drm/ttm/ttm_tt.h
> > @@ -140,6 +140,7 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc);
> >    * @bo: The buffer object we create the ttm for.
> >    * @page_flags: Page flags as identified by TTM_TT_FLAG_XX flags.
> >    * @caching: the desired caching state of the pages
> > + * @extra_pages: Extra pages needed for the driver.
> >    *
> >    * Create a struct ttm_tt to back data with system memory pages.
> >    * No pages are actually allocated.
> > @@ -147,7 +148,8 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc);
> >    * NULL: Out of memory.
> >    */
> >   int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
> > -		uint32_t page_flags, enum ttm_caching caching);
> > +		uint32_t page_flags, enum ttm_caching caching,
> > +		unsigned long extra_pages);
> >   int ttm_sg_tt_init(struct ttm_tt *ttm_dma, struct ttm_buffer_object *bo,
> >   		   uint32_t page_flags, enum ttm_caching caching);

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 1/8] drm/i915/gt: Use XY_FASR_COLOR_BLT to clear obj on graphics ver 12+
  2022-03-21  8:49     ` [Intel-gfx] " Hellstrom, Thomas
@ 2022-03-21 23:07       ` Ramalingam C
  -1 siblings, 0 replies; 33+ messages in thread
From: Ramalingam C @ 2022-03-21 23:07 UTC (permalink / raw)
  To: Hellstrom, Thomas; +Cc: intel-gfx, Auld, Matthew, dri-devel

On 2022-03-21 at 14:19:01 +0530, Hellstrom, Thomas wrote:
> On Sun, 2022-03-20 at 02:12 +0530, Ramalingam C wrote:
> > XY_FAST_COLOR_BLT cmd is faster than the older XY_COLOR_BLT. Hence
> > for
> > clearing (Zero out) the pages of the newly allocated object, faster
> > cmd
> > is used.
> 
> NIT: Imperative wording
> 
> >
> > Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Also there's a typo in the patch title.
Fixed them in the next version. Thanks for the review, Thomas.

Ram
> 
> With that fixed:
> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> 
> 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  5 +++
> >  drivers/gpu/drm/i915/gt/intel_migrate.c      | 43 +++++++++++++++++---
> >  2 files changed, 43 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > index d112ffd56418..925e55b6a94f 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > @@ -205,6 +205,11 @@
> >
> >  #define COLOR_BLT_CMD                  (2 << 29 | 0x40 << 22 | (5 - 2))
> >  #define XY_COLOR_BLT_CMD               (2 << 29 | 0x50 << 22)
> > +#define XY_FAST_COLOR_BLT_CMD          (2 << 29 | 0x44 << 22)
> > +#define   XY_FAST_COLOR_BLT_DEPTH_32   (2 << 19)
> > +#define   XY_FAST_COLOR_BLT_DW         16
> > +#define   XY_FAST_COLOR_BLT_MOCS_MASK  GENMASK(27, 21)
> > +#define   XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT 31
> >  #define SRC_COPY_BLT_CMD               (2 << 29 | 0x43 << 22)
> >  #define GEN9_XY_FAST_COPY_BLT_CMD      (2 << 29 | 0x42 << 22)
> >  #define XY_SRC_COPY_BLT_CMD            (2 << 29 | 0x53 << 22)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > index 20444d6ceb3c..73199ebf0671 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > @@ -614,20 +614,53 @@ intel_context_migrate_copy(struct intel_context *ce,
> >         return err;
> >  }
> >
> > -static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
> > +static int emit_clear(struct i915_request *rq, u64 offset, int size,
> > +                     u32 value, bool is_lmem)
> >  {
> > -       const int ver = GRAPHICS_VER(rq->engine->i915);
> > +       struct drm_i915_private *i915 = rq->engine->i915;
> > +       int mocs = rq->engine->gt->mocs.uc_index << 1;
> > +       const int ver = GRAPHICS_VER(i915);
> > +       int ring_sz;
> >         u32 *cs;
> >
> >         GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
> >
> >         offset += (u64)rq->engine->instance << 32;
> >
> > -       cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
> > +       if (ver >= 12)
> > +               ring_sz = 16;
> > +       else if (ver >= 8)
> > +               ring_sz = 8;
> > +       else
> > +               ring_sz = 6;
> > +
> > +       cs = intel_ring_begin(rq, ring_sz);
> >         if (IS_ERR(cs))
> >                 return PTR_ERR(cs);
> >
> > -       if (ver >= 8) {
> > +       if (ver >= 12) {
> > +               *cs++ = XY_FAST_COLOR_BLT_CMD | XY_FAST_COLOR_BLT_DEPTH_32 |
> > +                       (XY_FAST_COLOR_BLT_DW - 2);
> > +               *cs++ = FIELD_PREP(XY_FAST_COLOR_BLT_MOCS_MASK, mocs) |
> > +                       (PAGE_SIZE - 1);
> > +               *cs++ = 0;
> > +               *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
> > +               *cs++ = lower_32_bits(offset);
> > +               *cs++ = upper_32_bits(offset);
> > +               *cs++ = !is_lmem << XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT;
> > +               /* BG7 */
> > +               *cs++ = value;
> > +               *cs++ = 0;
> > +               *cs++ = 0;
> > +               *cs++ = 0;
> > +               /* BG11 */
> > +               *cs++ = 0;
> > +               *cs++ = 0;
> > +               /* BG13 */
> > +               *cs++ = 0;
> > +               *cs++ = 0;
> > +               *cs++ = 0;
> > +       } else if (ver >= 8) {
> >                 *cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (7 - 2);
> >                 *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
> >                 *cs++ = 0;
> > @@ -711,7 +744,7 @@ intel_context_migrate_clear(struct intel_context *ce,
> >                 if (err)
> >                         goto out_rq;
> >
> > -               err = emit_clear(rq, offset, len, value);
> > +               err = emit_clear(rq, offset, len, value, is_lmem);
> >
> >                 /* Arbitration is re-enabled between requests. */
> >  out_rq:
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-03-21 23:06 UTC | newest]

Thread overview: 33+ messages
2022-03-19 20:42 [PATCH v4 0/8] drm/i915/ttm: Evict and restore of compressed object Ramalingam C
2022-03-19 20:42 ` [Intel-gfx] " Ramalingam C
2022-03-19 20:42 ` [PATCH v4 1/8] drm/i915/gt: Use XY_FASR_COLOR_BLT to clear obj on graphics ver 12+ Ramalingam C
2022-03-19 20:42   ` [Intel-gfx] " Ramalingam C
2022-03-21  8:49   ` Hellstrom, Thomas
2022-03-21  8:49     ` [Intel-gfx] " Hellstrom, Thomas
2022-03-21 23:07     ` Ramalingam C
2022-03-21 23:07       ` [Intel-gfx] " Ramalingam C
2022-03-19 20:42 ` [PATCH v4 2/8] drm/i915/gt: Clear compress metadata for Flat-ccs objects Ramalingam C
2022-03-19 20:42   ` [Intel-gfx] " Ramalingam C
2022-03-19 20:42 ` [PATCH v4 3/8] drm/i915/selftest_migrate: Consider the possible roundup of size Ramalingam C
2022-03-19 20:42   ` [Intel-gfx] " Ramalingam C
2022-03-19 20:42 ` [PATCH v4 4/8] drm/i915/selftest_migrate: Check CCS meta data clear Ramalingam C
2022-03-19 20:42   ` [Intel-gfx] " Ramalingam C
2022-03-20  1:39   ` kernel test robot
2022-03-21 10:39   ` Hellstrom, Thomas
2022-03-21 10:39     ` Hellstrom, Thomas
2022-03-21 23:05     ` Ramalingam C
2022-03-21 23:05       ` [Intel-gfx] " Ramalingam C
2022-03-19 20:42 ` [PATCH v4 5/8] drm/i915/gt: Optimize the migration loop Ramalingam C
2022-03-19 20:42   ` [Intel-gfx] " Ramalingam C
2022-03-19 20:42 ` [PATCH v4 6/8] drm/ttm: Add a parameter to add extra pages into ttm_tt Ramalingam C
2022-03-19 20:42   ` [Intel-gfx] " Ramalingam C
2022-03-21 10:11   ` Das, Nirmoy
2022-03-21 23:06     ` Ramalingam C
2022-03-19 20:42 ` [PATCH v4 7/8] drm/i915/gem: Add extra pages in ttm_tt for ccs data Ramalingam C
2022-03-19 20:42   ` [Intel-gfx] " Ramalingam C
2022-03-19 20:42 ` [PATCH v4 8/8] drm/i915/migrate: Evict and restore the flatccs capable lmem obj Ramalingam C
2022-03-19 20:42   ` [Intel-gfx] " Ramalingam C
2022-03-19 20:50 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/ttm: Evict and restore of compressed object (rev2) Patchwork
2022-03-19 20:52 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-03-19 21:26 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2022-03-19 21:26 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning " Patchwork
