* [Intel-gfx] [RFC PATCH 0/5] Add basic support for flat-CCS bo evictions
@ 2022-01-21 22:22 Adrian Larumbe
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 1/5] drm/i915/flat-CCS: Add GEM bo structure fields for flat-CCS Adrian Larumbe
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Adrian Larumbe @ 2022-01-21 22:22 UTC (permalink / raw)
  To: daniel, ramalingam.c, intel-gfx; +Cc: adrian.larumbe

This series is a first attempt at handling eviction of flat-CCS compressed,
lmem-placed bos.

The official specification of flat CCS behaviour dictates that:

"Flat CCS data needs to be cleared when an lmem object is allocated. And CCS
data can be copied in and out of the CCS region through XY_CTRL_SURF_COPY_BLT.
The CPU can't access the CCS data directly.

When we exhaust the lmem, if the object's placements support smem, then we can
directly decompress the compressed lmem object into smem and start using it
from smem itself.

But when we need to swap out the compressed lmem object into a smem region
though the object's placement doesn't support smem, then we copy the lmem
content as it is into the smem region along with the CCS data (using
XY_CTRL_SURF_COPY_BLT). When the object is referenced again, the lmem content
will be swapped in along with restoration of the CCS data (using
XY_CTRL_SURF_COPY_BLT) at the corresponding location."

Design decisions:

 - A separate GEM bo of type `ttm_bo_type_kernel` is created to hold the
 contents of the CCS block when an lmem-only object is evicted onto smem. This
 is because this bo should never be mmap'able into user space.

 - Whether a bo is meant to contain flat-CCS compressed data is marked by
 attaching one of the new DG2 CCS surface modifiers to the FB object that
 contains the relevant bo. This is done through the drmModeAddFB2WithModifiers
 DRM library call.

 - At an eviction event, the bo's buffer data and its corresponding CCS block
 have to be moved between smem and lmem in two separate blit operations. Given
 my limited knowledge of the blit unit, it's quite possible that the way I
 programmed it is wrong.

 - At the moment, migrating a flat-CCS lmem-only object from smem back onto lmem
 will fail if its flat-CCS swap bo has not been created. However, a bo's
 lifecycle begins in smem when it is created, even if its original placement
 specifies lmem only. Between the time a bo is freshly created and an execbuf2
 ioctl actually moves it onto lmem and allocates its backing storage, the user
 might have called mmap on it, a scenario I haven't yet accounted for.

Part of the blitting engine code was borrowed from Ramalingam C
<ramalingam.c@intel.com>, who has been working on the same problem in
parallel, with a slightly different approach.

For testing, a flat-CCS driver self-test is under preparation.

Adrian Larumbe (5):
  drm/i915/flat-CCS: Add GEM BO structure fields for flat-CCS
  drm/i915/flat-CCS: Add flat CCS plane capabilities and modifiers
  drm/i915/flat-CCS: move GET_CCS_SIZE macro into driver-wide header
  drm/i915/flat-CCS: handle CCS block blit for bo migrations
  drm/i915/flat-CCS: handle creation and destruction of flat CCS bo's

 drivers/gpu/drm/i915/display/intel_fb.c       |  36 ++-
 drivers/gpu/drm/i915/display/intel_fb.h       |   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  10 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |  11 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c  |  78 ++++-
 drivers/gpu/drm/i915/gt/intel_migrate.c       | 289 ++++++++++++------
 drivers/gpu/drm/i915/gt/intel_migrate.h       |   2 +
 drivers/gpu/drm/i915/gt/selftest_migrate.c    |   3 +-
 drivers/gpu/drm/i915/i915_drv.h               |   5 +
 9 files changed, 342 insertions(+), 93 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Intel-gfx] [RFC PATCH 1/5] drm/i915/flat-CCS: Add GEM bo structure fields for flat-CCS
  2022-01-21 22:22 [Intel-gfx] [RFC PATCH 0/5] Add basic support for flat-CCS bo evictions Adrian Larumbe
@ 2022-01-21 22:22 ` Adrian Larumbe
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 2/5] drm/i915/flat-CCS: Add flat CCS plane capabilities and modifiers Adrian Larumbe
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Adrian Larumbe @ 2022-01-21 22:22 UTC (permalink / raw)
  To: daniel, ramalingam.c, intel-gfx; +Cc: adrian.larumbe

When a flat-CCS aware bo is evicted from lmem, its control surface will be
written out into smem, in the form of a kernel-only bo attached to the
original bo.

Signed-off-by: Adrian Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 71e778ecaeb8..9f574e149c58 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -628,6 +628,16 @@ struct drm_i915_gem_object {
 
 		void *gvt_info;
 	};
+
+	/**
+	 * This is set if the object is lmem-placeable, supports flat
+	 * CCS and is compressed. In that case, a separate block of
+	 * stolen lmem memory will contain its compression data.
+	 */
+	struct {
+		struct drm_i915_gem_object *swap;
+		bool enabled:1;
+	} flat_css;
 };
 
 static inline struct drm_i915_gem_object *
-- 
2.34.1



* [Intel-gfx] [RFC PATCH 2/5] drm/i915/flat-CCS: Add flat CCS plane capabilities and modifiers
  2022-01-21 22:22 [Intel-gfx] [RFC PATCH 0/5] Add basic support for flat-CCS bo evictions Adrian Larumbe
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 1/5] drm/i915/flat-CCS: Add GEM bo structure fields for flat-CCS Adrian Larumbe
@ 2022-01-21 22:22 ` Adrian Larumbe
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 3/5] drm/i915/flat-CCS: move GET_CCS_SIZE macro into driver-wide header Adrian Larumbe
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Adrian Larumbe @ 2022-01-21 22:22 UTC (permalink / raw)
  To: daniel, ramalingam.c, intel-gfx; +Cc: adrian.larumbe

Add framebuffer support code for flat-CCS devices such as DG2. A flat-CCS
modifier is attached to an fb object that contains the original bo by means
of the drmModeAddFB2WithModifiers DRM API call.

Signed-off-by: Adrian Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/i915/display/intel_fb.c | 36 ++++++++++++++++++++++---
 drivers/gpu/drm/i915/display/intel_fb.h |  1 +
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c
index 72040f580911..6f998d1956bb 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fb.c
@@ -158,19 +158,24 @@ static const struct intel_modifier_desc intel_modifiers[] = {
 	{
 		.modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
 		.display_ver = { 13, 14 },
-		.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC,
+		.plane_caps = INTEL_PLANE_CAP_TILING_4 |
+				INTEL_PLANE_CAP_CCS_MC |
+				INTEL_PLANE_CAP_DG2_CCS,
 	}, {
 		.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
 		.display_ver = { 13, 14 },
-		.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC,
-
+		.plane_caps = INTEL_PLANE_CAP_TILING_4 |
+				INTEL_PLANE_CAP_CCS_RC_CC |
+				INTEL_PLANE_CAP_DG2_CCS,
 		.ccs.cc_planes = BIT(1),
 
 		FORMAT_OVERRIDE(gen12_flat_ccs_cc_formats),
 	}, {
 		.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
 		.display_ver = { 13, 14 },
-		.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC,
+		.plane_caps = INTEL_PLANE_CAP_TILING_4 |
+				INTEL_PLANE_CAP_CCS_RC |
+				INTEL_PLANE_CAP_DG2_CCS,
 	}, {
 		.modifier = I915_FORMAT_MOD_4_TILED,
 		.display_ver = { 13, 14 },
@@ -313,6 +318,20 @@ bool intel_fb_is_ccs_modifier(u64 modifier)
 				      INTEL_PLANE_CAP_CCS_MASK);
 }
 
+/**
+ * intel_fb_is_dg2_ccs_modifier: Check if a modifier is a DG2 CCS modifier type
+ * @modifier: Modifier to check
+ *
+ * Returns:
+ * Returns %true if @modifier is a render, render with color clear or
+ * media compression modifier compatible with DG2 devices.
+ */
+bool intel_fb_is_dg2_ccs_modifier(u64 modifier)
+{
+	return plane_caps_contain_any(lookup_modifier(modifier)->plane_caps,
+				      INTEL_PLANE_CAP_DG2_CCS);
+}
+
 /**
  * intel_fb_is_rc_ccs_cc_modifier: Check if a modifier is an RC CCS CC modifier type
  * @modifier: Modifier to check
@@ -2000,6 +2019,15 @@ int intel_framebuffer_init(struct intel_framebuffer *intel_fb,
 		intel_fb->dpt_vm = vm;
 	}
 
+	/*
+	 * In devices with flat CCS support, a compressed buffer object
+	 * will need to shuffle its CCS block back and forth between lmem
+	 * and smem at object migration events.
+	 */
+	if (intel_fb_is_dg2_ccs_modifier(fb->modifier) && HAS_FLAT_CCS(dev_priv))
+		if (!i915_gem_object_migratable(obj) && i915_gem_object_is_lmem(obj))
+			obj->flat_css.enabled = true;
+
 	ret = drm_framebuffer_init(&dev_priv->drm, fb, &intel_fb_funcs);
 	if (ret) {
 		drm_err(&dev_priv->drm, "framebuffer init failed %d\n", ret);
diff --git a/drivers/gpu/drm/i915/display/intel_fb.h b/drivers/gpu/drm/i915/display/intel_fb.h
index 12386f13a4e0..5bd74ff9a449 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.h
+++ b/drivers/gpu/drm/i915/display/intel_fb.h
@@ -28,6 +28,7 @@ struct intel_plane_state;
 #define INTEL_PLANE_CAP_TILING_Y	BIT(4)
 #define INTEL_PLANE_CAP_TILING_Yf	BIT(5)
 #define INTEL_PLANE_CAP_TILING_4	BIT(6)
+#define INTEL_PLANE_CAP_DG2_CCS		BIT(7)
 
 bool intel_fb_is_ccs_modifier(u64 modifier);
 bool intel_fb_is_rc_ccs_cc_modifier(u64 modifier);
-- 
2.34.1



* [Intel-gfx] [RFC PATCH 3/5] drm/i915/flat-CCS: move GET_CCS_SIZE macro into driver-wide header
  2022-01-21 22:22 [Intel-gfx] [RFC PATCH 0/5] Add basic support for flat-CCS bo evictions Adrian Larumbe
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 1/5] drm/i915/flat-CCS: Add GEM bo structure fields for flat-CCS Adrian Larumbe
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 2/5] drm/i915/flat-CCS: Add flat CCS plane capabilities and modifiers Adrian Larumbe
@ 2022-01-21 22:22 ` Adrian Larumbe
  2022-01-24 16:00   ` Jani Nikula
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 4/5] drm/i915/flat-CCS: handle CCS block blit for bo migrations Adrian Larumbe
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Adrian Larumbe @ 2022-01-21 22:22 UTC (permalink / raw)
  To: daniel, ramalingam.c, intel-gfx; +Cc: adrian.larumbe

It needs to be used by files other than the low-level migration code.

Signed-off-by: Adrian Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 1 -
 drivers/gpu/drm/i915/i915_drv.h         | 5 +++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index a210c911905e..716f2f51c7f9 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -16,7 +16,6 @@ struct insert_pte_data {
 };
 
 #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
-#define GET_CCS_SIZE(i915, size)	(HAS_FLAT_CCS(i915) ? (size) >> 8 : 0)
 
 static bool engine_supports_migration(struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5623892ceab9..6b890a6674e4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -105,6 +105,7 @@
 #include "i915_request.h"
 #include "i915_scheduler.h"
 #include "gt/intel_timeline.h"
+#include "gt/intel_gpu_commands.h"
 #include "i915_vma.h"
 
 
@@ -1526,6 +1527,10 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 
 #define HAS_FLAT_CCS(dev_priv)   (INTEL_INFO(dev_priv)->has_flat_ccs)
 
+#define GET_CCS_SIZE(i915, size) (HAS_FLAT_CCS(i915) ? \
+				  DIV_ROUND_UP(size, NUM_CCS_BYTES_PER_BLOCK) : \
+				  0)
+
 #define HAS_GT_UC(dev_priv)	(INTEL_INFO(dev_priv)->has_gt_uc)
 
 #define HAS_POOLED_EU(dev_priv)	(INTEL_INFO(dev_priv)->has_pooled_eu)
-- 
2.34.1



* [Intel-gfx] [RFC PATCH 4/5] drm/i915/flat-CCS: handle CCS block blit for bo migrations
  2022-01-21 22:22 [Intel-gfx] [RFC PATCH 0/5] Add basic support for flat-CCS bo evictions Adrian Larumbe
                   ` (2 preceding siblings ...)
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 3/5] drm/i915/flat-CCS: move GET_CCS_SIZE macro into driver-wide header Adrian Larumbe
@ 2022-01-21 22:22 ` Adrian Larumbe
  2022-01-24 16:02   ` Jani Nikula
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 5/5] drm/i915/flat-CCS: handle creation and destruction of flat CCS bo's Adrian Larumbe
  2022-01-24 22:09 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add basic support for flat-CCS bo evictions Patchwork
  5 siblings, 1 reply; 11+ messages in thread
From: Adrian Larumbe @ 2022-01-21 22:22 UTC (permalink / raw)
  To: daniel, ramalingam.c, intel-gfx; +Cc: adrian.larumbe

Because the smem-evicted bo that holds the CCS block has to be blitted
separately from the original compressed bo, two sets of PTEs have to
be emitted for every bo copy.

This commit is partially based off another commit from Ramalingam C
<ramalingam.c@intel.com>, currently under discussion.

Signed-off-by: Adrian Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c    | 288 +++++++++++++++------
 drivers/gpu/drm/i915/gt/intel_migrate.h    |   2 +
 drivers/gpu/drm/i915/gt/selftest_migrate.c |   3 +-
 3 files changed, 207 insertions(+), 86 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 716f2f51c7f9..da0fcc42c43c 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -447,14 +447,183 @@ static bool wa_1209644611_applies(int ver, u32 size)
 	return height % 4 == 3 && height <= 8;
 }
 
+static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
+{
+	/* Mask the 3 LSB to use the PPGTT address space */
+	*cmd++ = MI_FLUSH_DW | flags;
+	*cmd++ = lower_32_bits(dst);
+	*cmd++ = upper_32_bits(dst);
+
+	return cmd;
+}
+
+static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915,
+				     int size)
+{
+	u32 num_cmds, num_blks, total_size;
+
+	if (!GET_CCS_SIZE(i915, size))
+		return 0;
+
+	/*
+	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
+	 * blocks. One XY_CTRL_SURF_COPY_BLT command can
+	 * transfer up to 1024 blocks.
+	 */
+	num_blks = (GET_CCS_SIZE(i915, size) +
+			   (NUM_CCS_BYTES_PER_BLOCK - 1)) >> 8;
+	num_cmds = (num_blks + (NUM_CCS_BLKS_PER_XFER - 1)) >> 10;
+	total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
+
+	/*
+	 * We need to add a flush before and after
+	 * XY_CTRL_SURF_COPY_BLT
+	 */
+	total_size += 2 * MI_FLUSH_DW_SIZE;
+	return total_size;
+}
+
+static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
+				     u8 src_mem_access, u8 dst_mem_access,
+				     int src_mocs, int dst_mocs,
+				     u16 num_ccs_blocks)
+{
+	int i = num_ccs_blocks;
+
+	/*
+	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
+	 * data in and out of the CCS region.
+	 *
+	 * We can copy at most 1024 blocks of 256 bytes using one
+	 * XY_CTRL_SURF_COPY_BLT instruction.
+	 *
+	 * In case we need to copy more than 1024 blocks, we need to add
+	 * another instruction to the same batch buffer.
+	 *
+	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
+	 *
+	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
+	 */
+	do {
+		/*
+		 * We use a bitwise AND with 1023, since the size field
+		 * takes values in the range 0-1023.
+		 */
+		*cmd++ = ((XY_CTRL_SURF_COPY_BLT) |
+			  (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
+			  (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
+			  (((i - 1) & 1023) << CCS_SIZE_SHIFT));
+		*cmd++ = lower_32_bits(src_addr);
+		*cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) |
+			  (src_mocs << XY_CTRL_SURF_MOCS_SHIFT));
+		*cmd++ = lower_32_bits(dst_addr);
+		*cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) |
+			  (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT));
+		src_addr += SZ_64M;
+		dst_addr += SZ_64M;
+		i -= NUM_CCS_BLKS_PER_XFER;
+	} while (i > 0);
+
+	return cmd;
+}
+
+static int emit_ccs(struct i915_request *rq,
+		    struct sgt_dma *it_lmem,
+		    enum i915_cache_level lmem_cache_level,
+		    struct sgt_dma *it_css,
+		    enum i915_cache_level css_cache_level,
+		    bool lmem2smem,
+		    int size)
+{
+	struct drm_i915_private *i915 = rq->engine->i915;
+	u32 num_ccs_blks = (GET_CCS_SIZE(i915, size) +
+			    NUM_CCS_BYTES_PER_BLOCK - 1) >> 8;
+	struct sgt_dma *it_src, *it_dst;
+	enum i915_cache_level src_cache_level;
+	enum i915_cache_level dst_cache_level;
+	u8 src_access, dst_access;
+	u32 src_offset, dst_offset;
+	u32 ccs_ring_size;
+	int err, len;
+	u32 *cs;
+
+	ccs_ring_size = calc_ctrl_surf_instr_size(i915, size);
+
+	err = emit_no_arbitration(rq);
+	if (err)
+		return err;
+
+	src_offset = 0;
+	dst_offset = CHUNK_SZ;
+	if (HAS_64K_PAGES(i915)) {
+		src_offset = 0;
+		dst_offset = 0;
+		if (lmem2smem)
+			src_offset = CHUNK_SZ;
+		else
+			dst_offset = 2 * CHUNK_SZ;
+	}
+
+	if (lmem2smem) {
+		it_src = it_lmem;
+		it_dst = it_css;
+		src_cache_level = lmem_cache_level;
+		dst_cache_level = css_cache_level;
+	} else {
+		it_src = it_css;
+		it_dst = it_lmem;
+		src_cache_level = css_cache_level;
+		dst_cache_level = lmem_cache_level;
+	}
+
+	len = emit_pte(rq, it_src, src_cache_level,
+		       lmem2smem,
+		       src_offset, CHUNK_SZ);
+	if (len <= 0)
+		return len;
+
+	err = emit_pte(rq, it_dst, dst_cache_level,
+		       !lmem2smem,
+		       dst_offset, len);
+	if (err < 0)
+		return err;
+	if (err < len)
+		return -EINVAL;
+
+	err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
+	if (err)
+		return err;
+
+	cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2));
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	src_access = !lmem2smem;
+	dst_access = !src_access;
+
+	cs = _i915_ctrl_surf_copy_blt(cs,
+				      src_offset,
+				      dst_offset,
+				      src_access,
+				      dst_access,
+				      1, 1,
+				      num_ccs_blks);
+	cs = i915_flush_dw(cs, dst_offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
 static int emit_copy(struct i915_request *rq,
-		     u32 dst_offset, u32 src_offset, int size)
+		     bool dst_is_lmem, u32 dst_offset,
+		     bool src_is_lmem, u32 src_offset,
+		     int size)
 {
 	const int ver = GRAPHICS_VER(rq->engine->i915);
 	u32 instance = rq->engine->instance;
 	u32 *cs;
 
-	cs = intel_ring_begin(rq, ver >= 8 ? 10 : 6);
+	cs = intel_ring_begin(rq, round_up(ver >= 8 ? 10 : 6, 2));
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
@@ -503,6 +672,8 @@ intel_context_migrate_copy(struct intel_context *ce,
 			   struct scatterlist *dst,
 			   enum i915_cache_level dst_cache_level,
 			   bool dst_is_lmem,
+			   struct scatterlist *css_blk,
+			   enum i915_cache_level css_cache_level,
 			   struct i915_request **out)
 {
 	struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst);
@@ -576,7 +747,31 @@ intel_context_migrate_copy(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_copy(rq, dst_offset, src_offset, len);
+		err = emit_copy(rq, dst_is_lmem, dst_offset, src_is_lmem,
+				src_offset, len);
+		if (err)
+			goto out_rq;
+
+		if (HAS_FLAT_CCS(ce->engine->i915) && css_blk) {
+			struct sgt_dma it_css_smem = sg_sgt(css_blk);
+			enum i915_cache_level lmem_cache_level;
+			struct sgt_dma *it_lmem;
+			bool lmem2smem;
+
+			if (dst_is_lmem) {
+				it_lmem = &it_dst;
+				lmem_cache_level = dst_cache_level;
+				lmem2smem = false;
+			} else {
+				it_lmem = &it_src;
+				lmem_cache_level = src_cache_level;
+				lmem2smem = true;
+			}
+
+			err = emit_ccs(rq, it_lmem, lmem_cache_level,
+				       &it_css_smem, css_cache_level,
+				       lmem2smem, len);
+			if (err)
+				goto out_rq;
+		}
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
@@ -641,84 +836,6 @@ intel_context_migrate_copy(struct intel_context *ce,
  * 4Kb tiles i.e Tile4 layout.
  */
 
-static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
-{
-	/* Mask the 3 LSB to use the PPGTT address space */
-	*cmd++ = MI_FLUSH_DW | flags;
-	*cmd++ = lower_32_bits(dst);
-	*cmd++ = upper_32_bits(dst);
-
-	return cmd;
-}
-
-static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
-{
-	u32 num_cmds, num_blks, total_size;
-
-	if (!GET_CCS_SIZE(i915, size))
-		return 0;
-
-	/*
-	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
-	 * blocks. one XY_CTRL_SURF_COPY_BLT command can
-	 * trnasfer upto 1024 blocks.
-	 */
-	num_blks = (GET_CCS_SIZE(i915, size) +
-			   (NUM_CCS_BYTES_PER_BLOCK - 1)) >> 8;
-	num_cmds = (num_blks + (NUM_CCS_BLKS_PER_XFER - 1)) >> 10;
-	total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
-
-	/*
-	 * We need to add a flush before and after
-	 * XY_CTRL_SURF_COPY_BLT
-	 */
-	total_size += 2 * MI_FLUSH_DW_SIZE;
-	return total_size;
-}
-
-static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
-				     u8 src_mem_access, u8 dst_mem_access,
-				     int src_mocs, int dst_mocs,
-				     u16 num_ccs_blocks)
-{
-	int i = num_ccs_blocks;
-
-	/*
-	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
-	 * data in and out of the CCS region.
-	 *
-	 * We can copy at most 1024 blocks of 256 bytes using one
-	 * XY_CTRL_SURF_COPY_BLT instruction.
-	 *
-	 * In case we need to copy more than 1024 blocks, we need to add
-	 * another instruction to the same batch buffer.
-	 *
-	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
-	 *
-	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
-	 */
-	do {
-		/*
-		 * We use logical AND with 1023 since the size field
-		 * takes values which is in the range of 0 - 1023
-		 */
-		*cmd++ = ((XY_CTRL_SURF_COPY_BLT) |
-			  (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
-			  (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
-			  (((i - 1) & 1023) << CCS_SIZE_SHIFT));
-		*cmd++ = lower_32_bits(src_addr);
-		*cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) |
-			  (src_mocs << XY_CTRL_SURF_MOCS_SHIFT));
-		*cmd++ = lower_32_bits(dst_addr);
-		*cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) |
-			  (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT));
-		src_addr += SZ_64M;
-		dst_addr += SZ_64M;
-		i -= NUM_CCS_BLKS_PER_XFER;
-	} while (i > 0);
-
-	return cmd;
-}
 
 static int emit_clear(struct i915_request *rq,
 		      u64 offset,
@@ -740,7 +857,7 @@ static int emit_clear(struct i915_request *rq,
 			 calc_ctrl_surf_instr_size(i915, size)
 			 : 0;
 
-	cs = intel_ring_begin(rq, ver >= 8 ? 8 + ccs_ring_size : 6);
+	cs = intel_ring_begin(rq, round_up(ver >= 8 ? 8 + ccs_ring_size : 6, 2));
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
@@ -764,8 +881,7 @@ static int emit_clear(struct i915_request *rq,
 	}
 
 	if (is_lmem && HAS_FLAT_CCS(i915) && !value) {
-		num_ccs_blks = (GET_CCS_SIZE(i915, size) +
-				NUM_CCS_BYTES_PER_BLOCK - 1) >> 8;
+		num_ccs_blks = GET_CCS_SIZE(i915, size);
 		/*
 		 * Flat CCS surface can only be accessed via
 		 * XY_CTRL_SURF_COPY_BLT CMD and using indirect
@@ -784,6 +900,8 @@ static int emit_clear(struct i915_request *rq,
 					      1, 1,
 					      num_ccs_blks);
 		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
+		if (ccs_ring_size & 1)
+			*cs++ = MI_NOOP;
 	}
 	intel_ring_advance(rq, cs);
 	return 0;
@@ -898,7 +1016,7 @@ int intel_migrate_copy(struct intel_migrate *m,
 	err = intel_context_migrate_copy(ce, deps,
 					 src, src_cache_level, src_is_lmem,
 					 dst, dst_cache_level, dst_is_lmem,
-					 out);
+					 NULL, I915_CACHE_NONE, out);
 
 	intel_context_unpin(ce);
 out:
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.h b/drivers/gpu/drm/i915/gt/intel_migrate.h
index ccc677ec4aa3..dce63a0dba33 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.h
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.h
@@ -41,6 +41,8 @@ int intel_context_migrate_copy(struct intel_context *ce,
 			       struct scatterlist *dst,
 			       enum i915_cache_level dst_cache_level,
 			       bool dst_is_lmem,
+			       struct scatterlist *cssblk,
+			       enum i915_cache_level css_cache_level,
 			       struct i915_request **out);
 
 int
diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
index fa4293d2944f..2a2fa6186e31 100644
--- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
+++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
@@ -231,7 +231,7 @@ static int __global_copy(struct intel_migrate *migrate,
 					  i915_gem_object_is_lmem(src),
 					  dst->mm.pages->sgl, dst->cache_level,
 					  i915_gem_object_is_lmem(dst),
-					  out);
+					  NULL, I915_CACHE_NONE, out);
 }
 
 static int
@@ -582,6 +582,7 @@ static int __perf_copy_blt(struct intel_context *ce,
 						 src_is_lmem,
 						 dst, dst_cache_level,
 						 dst_is_lmem,
+						 NULL, I915_CACHE_NONE,
 						 &rq);
 		if (rq) {
 			if (i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT) < 0)
-- 
2.34.1



* [Intel-gfx] [RFC PATCH 5/5] drm/i915/flat-CCS: handle creation and destruction of flat CCS bo's
  2022-01-21 22:22 [Intel-gfx] [RFC PATCH 0/5] Add basic support for flat-CCS bo evictions Adrian Larumbe
                   ` (3 preceding siblings ...)
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 4/5] drm/i915/flat-CCS: handle CCS block blit for bo migrations Adrian Larumbe
@ 2022-01-21 22:22 ` Adrian Larumbe
  2022-01-24 16:24   ` Thomas Hellström
  2022-01-24 22:09 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add basic support for flat-CCS bo evictions Patchwork
  5 siblings, 1 reply; 11+ messages in thread
From: Adrian Larumbe @ 2022-01-21 22:22 UTC (permalink / raw)
  To: daniel, ramalingam.c, intel-gfx; +Cc: adrian.larumbe

When a flat-CCS lmem-bound bo is evicted onto smem for the first time, a
separate swap GEM object is created to hold the contents of the CCS block.
It is assumed that, for a flat-CCS bo to be migrated back onto lmem, it
must have begun its life in lmem.

This commit also handles destruction of the swap bo when the original TTM
object reaches the end of its life.

Signed-off-by: Adrian Larumbe <adrian.larumbe@collabora.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c      | 11 +++
 drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 78 +++++++++++++++++++-
 2 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 84cae740b4a5..24708d6bfd9c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -474,11 +474,22 @@ static int i915_ttm_shrink(struct drm_i915_gem_object *obj, unsigned int flags)
 static void i915_ttm_delete_mem_notify(struct ttm_buffer_object *bo)
 {
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
+	struct drm_i915_private *i915 =
+		container_of(bo->bdev, typeof(*i915), bdev);
 
 	if (likely(obj)) {
 		__i915_gem_object_pages_fini(obj);
 		i915_ttm_free_cached_io_rsgt(obj);
 	}
+
+	if (HAS_FLAT_CCS(i915) && obj->flat_css.enabled) {
+		struct drm_i915_gem_object *swap_obj = obj->flat_css.swap;
+
+		if (swap_obj) {
+			swap_obj->base.funcs->free(&swap_obj->base);
+			obj->flat_css.swap = NULL;
+		}
+	}
 }
 
 static struct i915_refct_sgt *i915_ttm_tt_get_st(struct ttm_tt *ttm)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 1de306c03aaf..3479c4a37bd8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -162,6 +162,56 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo)
 	return 0;
 }
 
+static int
+i915_ccs_handle_move(struct drm_i915_gem_object *obj,
+		     struct ttm_resource *dst_mem)
+{
+	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
+	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
+						     bdev);
+	struct intel_memory_region *dst_reg;
+	size_t ccs_blk_size;
+	int ret;
+
+	dst_reg = i915_ttm_region(bo->bdev, dst_mem->mem_type);
+	ccs_blk_size = GET_CCS_SIZE(i915, obj->base.size);
+
+	if (dst_reg->type != INTEL_MEMORY_LOCAL &&
+	    dst_reg->type != INTEL_MEMORY_SYSTEM) {
+		DRM_DEBUG_DRIVER("Wrong memory region when using flat CCS.\n");
+		return -EINVAL;
+	}
+
+	if (dst_reg->type == INTEL_MEMORY_LOCAL &&
+	    (obj->flat_css.swap == NULL || !i915_gem_object_has_pages(obj->flat_css.swap))) {
+		/*
+		 * All BOs begin their life cycle in smem, even if meant to be
+		 * lmem-bound, and are only moved onto lmem by the execbuf2
+		 * ioctl before first use. Therefore, migrating a flat-CCS
+		 * lmem-only buffer back into lmem means a CCS swap buffer must
+		 * already have been allocated when it was first evicted from
+		 * lmem onto smem.
+		 */
+
+		drm_err(&i915->drm, "BO hasn't been evicted into smem yet\n");
+		return -EINVAL;
+
+	} else if (dst_reg->type == INTEL_MEMORY_SYSTEM &&
+		   !obj->flat_css.swap) {
+		/* First time object is swapped out onto smem */
+		obj->flat_css.swap =
+			i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_SMEM],
+						      ccs_blk_size, 0, 0);
+		if (IS_ERR(obj->flat_css.swap)) {
+			ret = PTR_ERR(obj->flat_css.swap);
+			obj->flat_css.swap = NULL;
+			return ret;
+		}
+
+		ret = __i915_gem_object_get_pages(obj->flat_css.swap);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 					     bool clear,
 					     struct ttm_resource *dst_mem,
@@ -172,9 +222,10 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
 						     bdev);
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
+	struct i915_refct_sgt *ccs_rsgt = NULL;
 	struct i915_request *rq;
 	struct ttm_tt *src_ttm = bo->ttm;
-	enum i915_cache_level src_level, dst_level;
+	enum i915_cache_level src_level, dst_level, ccs_level;
 	int ret;
 
 	if (!to_gt(i915)->migrate.context || intel_gt_is_wedged(to_gt(i915)))
@@ -196,6 +247,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 						  i915_ttm_gtt_binds_lmem(dst_mem),
 						  0, &rq);
 	} else {
+		struct ttm_buffer_object *swap_bo;
 		struct i915_refct_sgt *src_rsgt =
 			i915_ttm_resource_get_st(obj, bo->resource);
 
@@ -203,6 +255,25 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 			return ERR_CAST(src_rsgt);
 
 		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
+		ccs_level = I915_CACHE_NONE;
+
+		/* Handle CCS block */
+		if (HAS_FLAT_CCS(i915) && obj->flat_css.enabled) {
+			ret = i915_ccs_handle_move(obj, dst_mem);
+			if (ret) {
+				drm_err(&i915->drm,
+					"CCS block migration failed (%d)\n", ret);
+				return ERR_PTR(ret);
+			}
+
+			swap_bo = i915_gem_to_ttm(obj->flat_css.swap);
+			ccs_level = i915_ttm_cache_level(i915, swap_bo->resource, swap_bo->ttm);
+			ccs_rsgt = i915_ttm_resource_get_st(i915_ttm_to_gem(swap_bo),
+							    swap_bo->resource);
+			if (IS_ERR(ccs_rsgt))
+				return ERR_CAST(ccs_rsgt);
+		}
+
 		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
 		ret = intel_context_migrate_copy(to_gt(i915)->migrate.context,
 						 deps, src_rsgt->table.sgl,
@@ -210,9 +281,12 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
 						 i915_ttm_gtt_binds_lmem(bo->resource),
 						 dst_st->sgl, dst_level,
 						 i915_ttm_gtt_binds_lmem(dst_mem),
-						 &rq);
+						 ccs_rsgt ? ccs_rsgt->table.sgl : NULL,
+						 ccs_level, &rq);
 
 		i915_refct_sgt_put(src_rsgt);
+		if (ccs_rsgt)
+			i915_refct_sgt_put(ccs_rsgt);
 	}
 
 	intel_engine_pm_put(to_gt(i915)->migrate.context->engine);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 3/5] drm/i915/flat-CCS: move GET_CCS_SIZE macro into driver-wide header
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 3/5] drm/i915/flat-CCS: move GET_CCS_SIZE macro into driver-wide header Adrian Larumbe
@ 2022-01-24 16:00   ` Jani Nikula
  0 siblings, 0 replies; 11+ messages in thread
From: Jani Nikula @ 2022-01-24 16:00 UTC (permalink / raw)
  To: Adrian Larumbe, daniel, ramalingam.c, intel-gfx; +Cc: adrian.larumbe

On Fri, 21 Jan 2022, Adrian Larumbe <adrian.larumbe@collabora.com> wrote:
> It has to be used by files other than the low-level migration code.

Maybe, but i915_drv.h is not the dumping ground for this
stuff. Especially you shouldn't add anything in i915_drv.h that requires
you to pull in other headers. The goal is to go in the completely
opposite direction.

BR,
Jani.

>
> Signed-off-by: Adrian Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_migrate.c | 1 -
>  drivers/gpu/drm/i915/i915_drv.h         | 5 +++++
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index a210c911905e..716f2f51c7f9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -16,7 +16,6 @@ struct insert_pte_data {
>  };
>  
>  #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
> -#define GET_CCS_SIZE(i915, size)	(HAS_FLAT_CCS(i915) ? (size) >> 8 : 0)
>  
>  static bool engine_supports_migration(struct intel_engine_cs *engine)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 5623892ceab9..6b890a6674e4 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -105,6 +105,7 @@
>  #include "i915_request.h"
>  #include "i915_scheduler.h"
>  #include "gt/intel_timeline.h"
> +#include "gt/intel_gpu_commands.h"
>  #include "i915_vma.h"
>  
>  
> @@ -1526,6 +1527,10 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
>  
>  #define HAS_FLAT_CCS(dev_priv)   (INTEL_INFO(dev_priv)->has_flat_ccs)
>  
> +#define GET_CCS_SIZE(i915, size) (HAS_FLAT_CCS(i915) ? \
> +				  DIV_ROUND_UP(size, NUM_CCS_BYTES_PER_BLOCK) : \
> +				  0)
> +
>  #define HAS_GT_UC(dev_priv)	(INTEL_INFO(dev_priv)->has_gt_uc)
>  
>  #define HAS_POOLED_EU(dev_priv)	(INTEL_INFO(dev_priv)->has_pooled_eu)

-- 
Jani Nikula, Intel Open Source Graphics Center


* Re: [Intel-gfx] [RFC PATCH 4/5] drm/i915/flat-CCS: handle CCS block blit for bo migrations
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 4/5] drm/i915/flat-CCS: handle CCS block blit for bo migrations Adrian Larumbe
@ 2022-01-24 16:02   ` Jani Nikula
  0 siblings, 0 replies; 11+ messages in thread
From: Jani Nikula @ 2022-01-24 16:02 UTC (permalink / raw)
  To: Adrian Larumbe, daniel, ramalingam.c, intel-gfx; +Cc: adrian.larumbe

On Fri, 21 Jan 2022, Adrian Larumbe <adrian.larumbe@collabora.com> wrote:
> Because the smem-evicted bo that holds the CCS block has to be blitted
> separately from the original compressed bo, two sets of PTEs have to
> be emitted for every bo copy.
>
> This commit is partially based off another commit from Ramalingam C
> <ramalingam.c@intel.com>, currently under discussion.
>
> Signed-off-by: Adrian Larumbe <adrian.larumbe@collabora.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_migrate.c    | 288 +++++++++++++++------
>  drivers/gpu/drm/i915/gt/intel_migrate.h    |   2 +
>  drivers/gpu/drm/i915/gt/selftest_migrate.c |   3 +-
>  3 files changed, 207 insertions(+), 86 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 716f2f51c7f9..da0fcc42c43c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -447,14 +447,183 @@ static bool wa_1209644611_applies(int ver, u32 size)
>  	return height % 4 == 3 && height <= 8;
>  }
>  
> +static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)

As a general rule, please don't use the inline keyword in .c files, just
let the compiler decide. It's a premature optimization. And you won't
get warnings if it's unused.

BR,
Jani.

> +{
> +	/* Mask the 3 LSB to use the PPGTT address space */
> +	*cmd++ = MI_FLUSH_DW | flags;
> +	*cmd++ = lower_32_bits(dst);
> +	*cmd++ = upper_32_bits(dst);
> +
> +	return cmd;
> +}
> +
> +static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int
> +size)
> +{
> +	u32 num_cmds, num_blks, total_size;
> +
> +	if (!GET_CCS_SIZE(i915, size))
> +		return 0;
> +
> +	/*
> +	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
> +	 * blocks. One XY_CTRL_SURF_COPY_BLT command can
> +	 * transfer up to 1024 blocks.
> +	 */
> +	num_blks = (GET_CCS_SIZE(i915, size) +
> +			   (NUM_CCS_BYTES_PER_BLOCK - 1)) >> 8;
> +	num_cmds = (num_blks + (NUM_CCS_BLKS_PER_XFER - 1)) >> 10;
> +	total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
> +
> +	/*
> +	 * We need to add a flush before and after
> +	 * XY_CTRL_SURF_COPY_BLT
> +	 */
> +	total_size += 2 * MI_FLUSH_DW_SIZE;
> +	return total_size;
> +}
> +
> +static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
> +				     u8 src_mem_access, u8 dst_mem_access,
> +				     int src_mocs, int dst_mocs,
> +				     u16 num_ccs_blocks)
> +{
> +	int i = num_ccs_blocks;
> +
> +	/*
> +	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
> +	 * data in and out of the CCS region.
> +	 *
> +	 * We can copy at most 1024 blocks of 256 bytes using one
> +	 * XY_CTRL_SURF_COPY_BLT instruction.
> +	 *
> +	 * In case we need to copy more than 1024 blocks, we need to add
> +	 * another instruction to the same batch buffer.
> +	 *
> +	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
> +	 *
> +	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
> +	 */
> +	do {
> +		/*
> +		 * We use a bitwise AND with 1023 since the size field
> +		 * takes values in the range 0 - 1023
> +		 */
> +		*cmd++ = ((XY_CTRL_SURF_COPY_BLT) |
> +			  (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
> +			  (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
> +			  (((i - 1) & 1023) << CCS_SIZE_SHIFT));
> +		*cmd++ = lower_32_bits(src_addr);
> +		*cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) |
> +			  (src_mocs << XY_CTRL_SURF_MOCS_SHIFT));
> +		*cmd++ = lower_32_bits(dst_addr);
> +		*cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) |
> +			  (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT));
> +		src_addr += SZ_64M;
> +		dst_addr += SZ_64M;
> +		i -= NUM_CCS_BLKS_PER_XFER;
> +	} while (i > 0);
> +
> +	return cmd;
> +}
> +
> +static int emit_ccs(struct i915_request *rq,
> +		    struct sgt_dma *it_lmem,
> +		    enum i915_cache_level lmem_cache_level,
> +		    struct sgt_dma *it_css,
> +		    enum i915_cache_level css_cache_level,
> +		    bool lmem2smem,
> +		    int size)
> +{
> +	struct drm_i915_private *i915 = rq->engine->i915;
> +	u32 num_ccs_blks = (GET_CCS_SIZE(i915, size) +
> +			    NUM_CCS_BYTES_PER_BLOCK - 1) >> 8;
> +	struct sgt_dma *it_src, *it_dst;
> +	enum i915_cache_level src_cache_level;
> +	enum i915_cache_level dst_cache_level;
> +	u8 src_access, dst_access;
> +	u32 src_offset, dst_offset;
> +	u32 ccs_ring_size;
> +	int err, len;
> +	u32 *cs;
> +
> +	ccs_ring_size = calc_ctrl_surf_instr_size(i915, size);
> +
> +	err = emit_no_arbitration(rq);
> +	if (err)
> +		return -EINVAL;
> +
> +	src_offset = 0;
> +	dst_offset = CHUNK_SZ;
> +	if (HAS_64K_PAGES(i915)) {
> +		src_offset = 0;
> +		dst_offset = 0;
> +		if (lmem2smem)
> +			src_offset = CHUNK_SZ;
> +		else
> +			dst_offset = 2 * CHUNK_SZ;
> +	}
> +
> +	if (lmem2smem) {
> +		it_src = it_lmem;
> +		it_dst = it_css;
> +		src_cache_level = lmem_cache_level;
> +		dst_cache_level = css_cache_level;
> +	} else {
> +		it_src = it_css;
> +		it_dst = it_lmem;
> +		src_cache_level = css_cache_level;
> +		dst_cache_level = lmem_cache_level;
> +	}
> +
> +	len = emit_pte(rq, it_src, src_cache_level,
> +		       lmem2smem ? true : false,
> +		       src_offset, CHUNK_SZ);
> +	if (len <= 0)
> +		return len;
> +
> +	err = emit_pte(rq, it_dst, dst_cache_level,
> +		       lmem2smem ? false : true,
> +		       dst_offset, len);
> +	if (err < 0)
> +		return err;
> +	if (err < len)
> +		return -EINVAL;
> +
> +	err = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
> +	if (err)
> +		return err;
> +
> +	cs = intel_ring_begin(rq, round_up(ccs_ring_size, 2));
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	src_access = !lmem2smem;
> +	dst_access = !src_access;
> +
> +	cs = _i915_ctrl_surf_copy_blt(cs,
> +				      src_offset,
> +				      dst_offset,
> +				      src_access,
> +				      dst_access,
> +				      1, 1,
> +				      num_ccs_blks);
> +	cs = i915_flush_dw(cs, dst_offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
> +
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
>  static int emit_copy(struct i915_request *rq,
> -		     u32 dst_offset, u32 src_offset, int size)
> +		     bool dst_is_lmem, u32 dst_offset,
> +		     bool src_is_lmem, u32 src_offset,
> +		     int size)
>  {
>  	const int ver = GRAPHICS_VER(rq->engine->i915);
>  	u32 instance = rq->engine->instance;
>  	u32 *cs;
>  
> -	cs = intel_ring_begin(rq, ver >= 8 ? 10 : 6);
> +	cs = intel_ring_begin(rq, round_up(ver >= 8 ? 10 : 6, 2));
>  	if (IS_ERR(cs))
>  		return PTR_ERR(cs);
>  
> @@ -503,6 +672,8 @@ intel_context_migrate_copy(struct intel_context *ce,
>  			   struct scatterlist *dst,
>  			   enum i915_cache_level dst_cache_level,
>  			   bool dst_is_lmem,
> +			   struct scatterlist *css_blk,
> +			   enum i915_cache_level css_cache_level,
>  			   struct i915_request **out)
>  {
>  	struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst);
> @@ -576,7 +747,31 @@ intel_context_migrate_copy(struct intel_context *ce,
>  		if (err)
>  			goto out_rq;
>  
> -		err = emit_copy(rq, dst_offset, src_offset, len);
> +		err = emit_copy(rq, dst_is_lmem, dst_offset, src_is_lmem,
> +				src_offset, len);
> +
> +		if (HAS_FLAT_CCS(ce->engine->i915) && css_blk) {
> +			struct sgt_dma it_css_smem = sg_sgt(css_blk);
> +			enum i915_cache_level lmem_cache_level;
> +			struct sgt_dma *it_lmem;
> +			bool lmem2smem;
> +
> +			if (dst_is_lmem) {
> +				it_lmem = &it_dst;
> +				lmem_cache_level = dst_cache_level;
> +				lmem2smem = false;
> +			} else {
> +				it_lmem = &it_src;
> +				lmem_cache_level = src_cache_level;
> +				lmem2smem = true;
> +			}
> +
> +			err = emit_ccs(rq, it_lmem, lmem_cache_level,
> +				       &it_css_smem, css_cache_level,
> +				       lmem2smem, len);
> +			if (err)
> +				goto out_rq;
> +		}
>  
>  		/* Arbitration is re-enabled between requests. */
>  out_rq:
> @@ -641,84 +836,6 @@ intel_context_migrate_copy(struct intel_context *ce,
>   * 4Kb tiles i.e Tile4 layout.
>   */
>  
> -static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
> -{
> -	/* Mask the 3 LSB to use the PPGTT address space */
> -	*cmd++ = MI_FLUSH_DW | flags;
> -	*cmd++ = lower_32_bits(dst);
> -	*cmd++ = upper_32_bits(dst);
> -
> -	return cmd;
> -}
> -
> -static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
> -{
> -	u32 num_cmds, num_blks, total_size;
> -
> -	if (!GET_CCS_SIZE(i915, size))
> -		return 0;
> -
> -	/*
> -	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
> -	 * blocks. one XY_CTRL_SURF_COPY_BLT command can
> -	 * trnasfer upto 1024 blocks.
> -	 */
> -	num_blks = (GET_CCS_SIZE(i915, size) +
> -			   (NUM_CCS_BYTES_PER_BLOCK - 1)) >> 8;
> -	num_cmds = (num_blks + (NUM_CCS_BLKS_PER_XFER - 1)) >> 10;
> -	total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
> -
> -	/*
> -	 * We need to add a flush before and after
> -	 * XY_CTRL_SURF_COPY_BLT
> -	 */
> -	total_size += 2 * MI_FLUSH_DW_SIZE;
> -	return total_size;
> -}
> -
> -static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
> -				     u8 src_mem_access, u8 dst_mem_access,
> -				     int src_mocs, int dst_mocs,
> -				     u16 num_ccs_blocks)
> -{
> -	int i = num_ccs_blocks;
> -
> -	/*
> -	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
> -	 * data in and out of the CCS region.
> -	 *
> -	 * We can copy at most 1024 blocks of 256 bytes using one
> -	 * XY_CTRL_SURF_COPY_BLT instruction.
> -	 *
> -	 * In case we need to copy more than 1024 blocks, we need to add
> -	 * another instruction to the same batch buffer.
> -	 *
> -	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
> -	 *
> -	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
> -	 */
> -	do {
> -		/*
> -		 * We use logical AND with 1023 since the size field
> -		 * takes values which is in the range of 0 - 1023
> -		 */
> -		*cmd++ = ((XY_CTRL_SURF_COPY_BLT) |
> -			  (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
> -			  (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
> -			  (((i - 1) & 1023) << CCS_SIZE_SHIFT));
> -		*cmd++ = lower_32_bits(src_addr);
> -		*cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) |
> -			  (src_mocs << XY_CTRL_SURF_MOCS_SHIFT));
> -		*cmd++ = lower_32_bits(dst_addr);
> -		*cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) |
> -			  (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT));
> -		src_addr += SZ_64M;
> -		dst_addr += SZ_64M;
> -		i -= NUM_CCS_BLKS_PER_XFER;
> -	} while (i > 0);
> -
> -	return cmd;
> -}
>  
>  static int emit_clear(struct i915_request *rq,
>  		      u64 offset,
> @@ -740,7 +857,7 @@ static int emit_clear(struct i915_request *rq,
>  			 calc_ctrl_surf_instr_size(i915, size)
>  			 : 0;
>  
> -	cs = intel_ring_begin(rq, ver >= 8 ? 8 + ccs_ring_size : 6);
> +	cs = intel_ring_begin(rq, round_up(ver >= 8 ? 8 + ccs_ring_size : 6, 2));
>  	if (IS_ERR(cs))
>  		return PTR_ERR(cs);
>  
> @@ -764,8 +881,7 @@ static int emit_clear(struct i915_request *rq,
>  	}
>  
>  	if (is_lmem && HAS_FLAT_CCS(i915) && !value) {
> -		num_ccs_blks = (GET_CCS_SIZE(i915, size) +
> -				NUM_CCS_BYTES_PER_BLOCK - 1) >> 8;
> +		num_ccs_blks = GET_CCS_SIZE(i915, size);
>  		/*
>  		 * Flat CCS surface can only be accessed via
>  		 * XY_CTRL_SURF_COPY_BLT CMD and using indirect
> @@ -784,6 +900,8 @@ static int emit_clear(struct i915_request *rq,
>  					      1, 1,
>  					      num_ccs_blks);
>  		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
> +		if (ccs_ring_size & 1)
> +			*cs++ = MI_NOOP;
>  	}
>  	intel_ring_advance(rq, cs);
>  	return 0;
> @@ -898,7 +1016,7 @@ int intel_migrate_copy(struct intel_migrate *m,
>  	err = intel_context_migrate_copy(ce, deps,
>  					 src, src_cache_level, src_is_lmem,
>  					 dst, dst_cache_level, dst_is_lmem,
> -					 out);
> +					 NULL, I915_CACHE_NONE, out);
>  
>  	intel_context_unpin(ce);
>  out:
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.h b/drivers/gpu/drm/i915/gt/intel_migrate.h
> index ccc677ec4aa3..dce63a0dba33 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.h
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.h
> @@ -41,6 +41,8 @@ int intel_context_migrate_copy(struct intel_context *ce,
>  			       struct scatterlist *dst,
>  			       enum i915_cache_level dst_cache_level,
>  			       bool dst_is_lmem,
> +			       struct scatterlist *cssblk,
> +			       enum i915_cache_level css_cache_level,
>  			       struct i915_request **out);
>  
>  int
> diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> index fa4293d2944f..2a2fa6186e31 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c
> @@ -231,7 +231,7 @@ static int __global_copy(struct intel_migrate *migrate,
>  					  i915_gem_object_is_lmem(src),
>  					  dst->mm.pages->sgl, dst->cache_level,
>  					  i915_gem_object_is_lmem(dst),
> -					  out);
> +					  NULL, I915_CACHE_NONE, out);
>  }
>  
>  static int
> @@ -582,6 +582,7 @@ static int __perf_copy_blt(struct intel_context *ce,
>  						 src_is_lmem,
>  						 dst, dst_cache_level,
>  						 dst_is_lmem,
> +						 NULL, I915_CACHE_NONE,
>  						 &rq);
>  		if (rq) {
>  			if (i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT) < 0)

-- 
Jani Nikula, Intel Open Source Graphics Center


* Re: [Intel-gfx] [RFC PATCH 5/5] drm/i915/flat-CCS: handle creation and destruction of flat CCS bo's
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 5/5] drm/i915/flat-CCS: handle creation and destruction of flat CCS bo's Adrian Larumbe
@ 2022-01-24 16:24   ` Thomas Hellström
  2022-01-24 18:21     ` Thomas Hellström
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Hellström @ 2022-01-24 16:24 UTC (permalink / raw)
  To: Adrian Larumbe, daniel, ramalingam.c, intel-gfx, Christian König

Hi, Adrian

On 1/21/22 23:22, Adrian Larumbe wrote:
> When a flat-CCS lmem-bound BO is evicted onto smem for the first time, a
> separate swap gem object is created to hold the contents of the CCS block.
> It is assumed that, for a flat-CCS bo to be migrated back onto lmem, it
> should've begun its life in lmem.
>
> It also handles destruction of the swap bo when the original TTM object
> reaches the end of its life.
>
> Signed-off-by: Adrian Larumbe <adrian.larumbe@collabora.com>


While allocating a separate object for the CCS data is certainly
possible, it poses some additional difficulties that have not been
addressed here.

The CCS object needs to share the dma_resv of the original object. That
is because the CCS object needs to be locked and validated when we process it, and we
can only trylock within the ttm move callback which might therefore fail
and isn't sufficient on swapin. We'd need to create some
i915_gem_object_create_region_locked() that wraps ttm_bo_init_reserved().

Furthermore destruction also becomes complicated, as the main object
owns a refcount on the CCS object, but the CCS object also needs a
refcount on the dma_resv part of the main object which will create a
refcount loop requiring an additional dma_resv refcount for objects to
resolve, similar to how we've solved this for shared dma_resv shared with vms.

Also shouldn't we be destroying the CCS object when data is moved back into lmem?

Anyway, when we've earlier discussed how to handle this, we've
discussed a solution where the struct ttm_tt was given an inflated size
on creation to accommodate also the CCS data at the end. That would
waste some memory if we ever were to migrate such an object to system
while decompressing, but otherwise greatly simplify the handling.
Basically we'd only look at whether the object is flat-CCS enabled in
i915_ttm_tt_create() and inflate the ttm_tt size.

This requires an additional size parameter to ttm_tt_init(), but I've
once discussed this with Christian König, and he didn't seem to object
at the time. (+CC Christian König).

Thanks,
Thomas


> ---
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c      | 11 +++
>   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 78 +++++++++++++++++++-
>   2 files changed, 87 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 84cae740b4a5..24708d6bfd9c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -474,11 +474,22 @@ static int i915_ttm_shrink(struct drm_i915_gem_object *obj, unsigned int flags)
>   static void i915_ttm_delete_mem_notify(struct ttm_buffer_object *bo)
>   {
>   	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> +	struct drm_i915_private *i915 =
> +		container_of(bo->bdev, typeof(*i915), bdev);
>   
>   	if (likely(obj)) {
>   		__i915_gem_object_pages_fini(obj);
>   		i915_ttm_free_cached_io_rsgt(obj);
>   	}
> +
> +	if (HAS_FLAT_CCS(i915) && obj->flat_css.enabled) {
> +		struct drm_i915_gem_object *swap_obj = obj->flat_css.swap;
> +
> +		if (swap_obj) {
> +			swap_obj->base.funcs->free(&swap_obj->base);
> +			obj->flat_css.swap = NULL;
> +		}
> +	}
>   }
>   
>   static struct i915_refct_sgt *i915_ttm_tt_get_st(struct ttm_tt *ttm)
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> index 1de306c03aaf..3479c4a37bd8 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> @@ -162,6 +162,56 @@ int i915_ttm_move_notify(struct ttm_buffer_object *bo)
>   	return 0;
>   }
>   
> +static int
> +i915_ccs_handle_move(struct drm_i915_gem_object *obj,
> +		     struct ttm_resource *dst_mem)
> +{
> +	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
> +	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
> +						     bdev);
> +	struct intel_memory_region *dst_reg;
> +	size_t ccs_blk_size;
> +	int ret;
> +
> +	dst_reg = i915_ttm_region(bo->bdev, dst_mem->mem_type);
> +	ccs_blk_size = GET_CCS_SIZE(i915, obj->base.size);
> +
> +	if (dst_reg->type != INTEL_MEMORY_LOCAL &&
> +	    dst_reg->type != INTEL_MEMORY_SYSTEM) {
> +		DRM_DEBUG_DRIVER("Wrong memory region when using flat CCS.\n");
> +		return -EINVAL;
> +	}
> +
> +	if (dst_reg->type == INTEL_MEMORY_LOCAL &&
> +	    (obj->flat_css.swap == NULL || !i915_gem_object_has_pages(obj->flat_css.swap))) {
> +		/*
> +		 * All BOs begin their life cycle in smem, even if meant to be
> +		 * lmem-bound. Then, upon running the execbuf2 ioctl, get moved
> +		 * onto lmem before first use. Therefore, migrating a flat-CCS
> +		 * lmem-only buffer into lmem means a CCS swap buffer had already
> +		 * been allocated when first migrating it onto smem from lmem.
> +		 */
> +
> +		drm_err(&i915->drm, "BO hasn't been evicted into smem yet\n");
> +		return -EINVAL;
> +
> +	} else if (dst_reg->type == INTEL_MEMORY_SYSTEM &&
> +		   !obj->flat_css.swap) {
> +		/* First time object is swapped out onto smem */
> +		obj->flat_css.swap =
> +			i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_SMEM],
> +						      ccs_blk_size, 0, 0);
> +		if (IS_ERR(obj->flat_css.swap))
> +			return -ENOMEM;
> +
> +		ret = __i915_gem_object_get_pages(obj->flat_css.swap);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
>   static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   					     bool clear,
>   					     struct ttm_resource *dst_mem,
> @@ -172,9 +222,10 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   	struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915),
>   						     bdev);
>   	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> +	struct i915_refct_sgt *ccs_rsgt = NULL;
>   	struct i915_request *rq;
>   	struct ttm_tt *src_ttm = bo->ttm;
> -	enum i915_cache_level src_level, dst_level;
> +	enum i915_cache_level src_level, dst_level, ccs_level;
>   	int ret;
>   
>   	if (!to_gt(i915)->migrate.context || intel_gt_is_wedged(to_gt(i915)))
> @@ -196,6 +247,7 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   						  i915_ttm_gtt_binds_lmem(dst_mem),
>   						  0, &rq);
>   	} else {
> +		struct ttm_buffer_object *swap_bo;
>   		struct i915_refct_sgt *src_rsgt =
>   			i915_ttm_resource_get_st(obj, bo->resource);
>   
> @@ -203,6 +255,25 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   			return ERR_CAST(src_rsgt);
>   
>   		src_level = i915_ttm_cache_level(i915, bo->resource, src_ttm);
> +		ccs_level = I915_CACHE_NONE;
> +
> +		/* Handle CCS block */
> +		if (HAS_FLAT_CCS(i915) && obj->flat_css.enabled) {
> +			ret = i915_ccs_handle_move(obj, dst_mem);
> +			if (ret) {
> +				drm_err(&i915->drm,
> +					"CCS block migration failed (%d)\n", ret);
> +				return ERR_PTR(ret);
> +			}
> +
> +			swap_bo = i915_gem_to_ttm(obj->flat_css.swap);
> +			ccs_level = i915_ttm_cache_level(i915, swap_bo->resource, swap_bo->ttm);
> +			ccs_rsgt = i915_ttm_resource_get_st(i915_ttm_to_gem(swap_bo),
> +							    swap_bo->resource);
> +			if (IS_ERR(ccs_rsgt))
> +				return ERR_CAST(ccs_rsgt);
> +		}
> +
>   		intel_engine_pm_get(to_gt(i915)->migrate.context->engine);
>   		ret = intel_context_migrate_copy(to_gt(i915)->migrate.context,
>   						 deps, src_rsgt->table.sgl,
> @@ -210,9 +281,12 @@ static struct dma_fence *i915_ttm_accel_move(struct ttm_buffer_object *bo,
>   						 i915_ttm_gtt_binds_lmem(bo->resource),
>   						 dst_st->sgl, dst_level,
>   						 i915_ttm_gtt_binds_lmem(dst_mem),
> -						 &rq);
> +						 ccs_rsgt ? ccs_rsgt->table.sgl : NULL,
> +						 ccs_level, &rq);
>   
>   		i915_refct_sgt_put(src_rsgt);
> +		if (ccs_rsgt)
> +			i915_refct_sgt_put(ccs_rsgt);
>   	}
>   
>   	intel_engine_pm_put(to_gt(i915)->migrate.context->engine);


* Re: [Intel-gfx] [RFC PATCH 5/5] drm/i915/flat-CCS: handle creation and destruction of flat CCS bo's
  2022-01-24 16:24   ` Thomas Hellström
@ 2022-01-24 18:21     ` Thomas Hellström
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Hellström @ 2022-01-24 18:21 UTC (permalink / raw)
  To: Adrian Larumbe, daniel, ramalingam.c, intel-gfx, Christian König

On Mon, 2022-01-24 at 17:24 +0100, Thomas Hellström wrote:
> Hi, Adrian
> 
> On 1/21/22 23:22, Adrian Larumbe wrote:
> > When a flat-CCS lmem-bound BO is evicted onto smem for the first
> > time, a
> > separate swap gem object is created to hold the contents of the CCS
> > block.
> > It is assumed that, for a flat-CCS bo to be migrated back onto
> > lmem, it
> > should've begun its life in lmem.
> > 
> > It also handles destruction of the swap bo when the original TTM
> > object
> > reaches the end of its life.
> > 
> > Signed-off-by: Adrian Larumbe <adrian.larumbe@collabora.com>
> 
> 
> While allocating a separate object for the CCS data is certainly
> possible, it poses some additional difficulties that have not been
> addressed here.
> 
> The CCS object needs to share the dma_resv of the original object.
> That
> is because the CCS object needs to be locked and validated when we
> process it, and we
> can only trylock within the ttm move callback which might therefore
> fail
> and isn't sufficient on swapin. We'd need to create some
> i915_gem_object_create_region_locked() that wraps
> ttm_bo_init_reserved().

Actually that would be a function to create with a reservation object
shared from another object.

/Thomas




* [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add basic support for flat-CCS bo evictions
  2022-01-21 22:22 [Intel-gfx] [RFC PATCH 0/5] Add basic support for flat-CCS bo evictions Adrian Larumbe
                   ` (4 preceding siblings ...)
  2022-01-21 22:22 ` [Intel-gfx] [RFC PATCH 5/5] drm/i915/flat-CCS: handle creation and destruction of flat CCS bo's Adrian Larumbe
@ 2022-01-24 22:09 ` Patchwork
  5 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2022-01-24 22:09 UTC (permalink / raw)
  To: Adrian Larumbe; +Cc: intel-gfx

== Series Details ==

Series: Add basic support for flat-CCS bo evictions
URL   : https://patchwork.freedesktop.org/series/99248/
State : failure

== Summary ==

Applying: drm/i915/flat-CCS: Add GEM bo structure fields for flat-CCS
Applying: drm/i915/flat-CCS: Add flat CCS plane capabilities and modifiers
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/display/intel_fb.c
M	drivers/gpu/drm/i915/display/intel_fb.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/display/intel_fb.h
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/display/intel_fb.h
Auto-merging drivers/gpu/drm/i915/display/intel_fb.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/display/intel_fb.c
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 drm/i915/flat-CCS: Add flat CCS plane capabilities and modifiers
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



