* [PATCH 00/15] drm/i915: Enable DG2
From: Ramalingam C @ 2022-02-18 18:47 UTC
  To: intel-gfx, dri-devel; +Cc: lucas.demarchi

Enable DG2 support in drm/i915.

This series adds support for the 64K page size and documents its uAPI
impact. It also adds the basic flat-CCS enabling patches needed for
local memory initialization and object creation, along with kernel-doc
describing the flat-CCS support.

Flat-CCS modifiers will be enabled in upcoming patches.

Note:
This is a subset of https://patchwork.freedesktop.org/series/95686/
The remaining patches of that series will be pursued in subsequent
series.

A few patches were reviewed in, and pulled from, other series such as:
https://patchwork.freedesktop.org/series/99119/
https://patchwork.freedesktop.org/series/100373/
https://patchwork.freedesktop.org/series/97544/

Abdiel Janulgue (1):
  drm/i915/lmem: Enable lmem for platforms with Flat CCS

Ayaz A Siddiqui (1):
  drm/i915/gt: Clear compress metadata for Xe_HP platforms

CQ Tang (1):
  drm/i915/xehpsdv: Add has_flat_ccs to device info

John Harrison (1):
  drm/i915/dg2: Define GuC firmware version for DG2

Jouni Högander (1):
  drm/i915: Fix for PHY_MISC_TC1 offset

Matt Roper (2):
  drm/i915/dg2: Drop 38.4 MHz MPLLB tables
  drm/i915/dg2: Enable 5th port

Matthew Auld (6):
  drm/i915: enforce min GTT alignment for discrete cards
  drm/i915: support 64K GTT pages for discrete cards
  drm/i915/gtt: allow overriding the pt alignment
  drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
  drm/i915/migrate: add acceleration support for DG2
  drm/i915/uapi: document behaviour for DG2 64K support

Ramalingam C (1):
  drm/i915: add needs_compact_pt flag

Robert Beckett (1):
  drm/i915: add gtt misalignment test

 drivers/gpu/drm/i915/display/intel_display.c  |   1 +
 drivers/gpu/drm/i915/display/intel_gmbus.c    |  16 +-
 drivers/gpu/drm/i915/display/intel_snps_phy.c | 210 +----------
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++++
 .../i915/gem/selftests/i915_gem_client_blt.c  |  21 +-
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 158 +++++++-
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |  15 +
 drivers/gpu/drm/i915/gt/intel_gt.c            |  19 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |   1 +
 drivers/gpu/drm/i915/gt/intel_gt_regs.h       |   3 +
 drivers/gpu/drm/i915/gt/intel_gtt.c           |  12 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |  35 +-
 drivers/gpu/drm/i915/gt/intel_migrate.c       | 337 ++++++++++++++++--
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  17 +-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |  26 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c      |   1 +
 drivers/gpu/drm/i915/i915_drv.h               |  17 +-
 drivers/gpu/drm/i915/i915_irq.c               |   5 +-
 drivers/gpu/drm/i915/i915_pci.c               |   3 +
 drivers/gpu/drm/i915/i915_reg.h               |   7 +-
 drivers/gpu/drm/i915/i915_vma.c               |   9 +
 drivers/gpu/drm/i915/intel_device_info.h      |   2 +
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 222 ++++++++++--
 include/uapi/drm/i915_drm.h                   |  45 ++-
 24 files changed, 934 insertions(+), 308 deletions(-)

-- 
2.20.1


* [PATCH 01/15] drm/i915/dg2: Define GuC firmware version for DG2
From: Ramalingam C @ 2022-02-18 18:47 UTC
  To: intel-gfx, dri-devel
  Cc: Tomasz Mistat, lucas.demarchi, Daniele Ceraolo Spurio, John Harrison

From: John Harrison <John.C.Harrison@Intel.com>

First release of GuC for DG2.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
CC: Tomasz Mistat <tomasz.mistat@intel.com>
CC: Ramalingam C <ramalingam.c@intel.com>
CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
index c88113044494..55512db29183 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
@@ -52,6 +52,7 @@ void intel_uc_fw_change_status(struct intel_uc_fw *uc_fw,
  * firmware as TGL.
  */
 #define INTEL_GUC_FIRMWARE_DEFS(fw_def, guc_def) \
+	fw_def(DG2,          0, guc_def(dg2,  69, 0, 3)) \
 	fw_def(ALDERLAKE_P,  0, guc_def(adlp, 69, 0, 3)) \
 	fw_def(ALDERLAKE_S,  0, guc_def(tgl,  69, 0, 3)) \
 	fw_def(DG1,          0, guc_def(dg1,  69, 0, 3)) \
-- 
2.20.1
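
For reference, each guc_def() entry in the INTEL_GUC_FIRMWARE_DEFS table
above expands into a versioned firmware blob path that the uC loader
requests. A minimal sketch of that expansion, with simplified macro
names (the real helpers in intel_uc_fw.c carry additional fields):

/* Simplified model of the path construction; the macro name here is
 * illustrative, not the actual i915 helper.
 */
#define MAKE_GUC_FW_PATH(prefix, major, minor, patch) \
	"i915/" prefix "_guc_" #major "." #minor "." #patch ".bin"

/* The DG2 entry added above would resolve to: */
static const char * const dg2_guc_path =
	MAKE_GUC_FW_PATH("dg2", 69, 0, 3); /* "i915/dg2_guc_69.0.3.bin" */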


* [PATCH 02/15] drm/i915: Fix for PHY_MISC_TC1 offset
From: Ramalingam C @ 2022-02-18 18:47 UTC
  To: intel-gfx, dri-devel; +Cc: lucas.demarchi, Uma Shankar, Jouni Högander

From: Jouni Högander <jouni.hogander@intel.com>

Currently the ICL_PHY_MISC macro returns offset 0x64C10 for PHY_E.
The PORT_TC1 port is not yet enabled properly in the driver, but
intel_snps_phy.c relies on intel_phy_is_snps() to filter out
unavailable PHYs, and that function already treats the last PHY as
available. Correct the offset of the last PHY to 0x64C14 now; the rest
of its support arrives in subsequent commits.

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/display/intel_snps_phy.c | 2 +-
 drivers/gpu/drm/i915/i915_reg.h               | 6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_snps_phy.c b/drivers/gpu/drm/i915/display/intel_snps_phy.c
index 8fd00de981fc..4cdce0116883 100644
--- a/drivers/gpu/drm/i915/display/intel_snps_phy.c
+++ b/drivers/gpu/drm/i915/display/intel_snps_phy.c
@@ -32,7 +32,7 @@ void intel_snps_phy_wait_for_calibration(struct drm_i915_private *i915)
 		if (!intel_phy_is_snps(i915, phy))
 			continue;
 
-		if (intel_de_wait_for_clear(i915, ICL_PHY_MISC(phy),
+		if (intel_de_wait_for_clear(i915, DG2_PHY_MISC(phy),
 					    DG2_PHY_DP_TX_ACK_MASK, 25))
 			drm_err(&i915->drm, "SNPS PHY %c failed to calibrate after 25ms.\n",
 				phy_name(phy));
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index e2e9f543fb83..cc13918fe246 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -9361,8 +9361,10 @@ enum skl_power_gate {
 
 #define _ICL_PHY_MISC_A		0x64C00
 #define _ICL_PHY_MISC_B		0x64C04
-#define ICL_PHY_MISC(port)	_MMIO_PORT(port, _ICL_PHY_MISC_A, \
-						 _ICL_PHY_MISC_B)
+#define _DG2_PHY_MISC_TC1	0x64C14 /* TC1="PHY E" but offset as if "PHY F" */
+#define ICL_PHY_MISC(port)	_MMIO_PORT(port, _ICL_PHY_MISC_A, _ICL_PHY_MISC_B)
+#define DG2_PHY_MISC(port)	((port) == PHY_E ? _MMIO(_DG2_PHY_MISC_TC1) : \
+				 ICL_PHY_MISC(port))
 #define  ICL_PHY_MISC_MUX_DDID			(1 << 28)
 #define  ICL_PHY_MISC_DE_IO_COMP_PWR_DOWN	(1 << 23)
 #define  DG2_PHY_DP_TX_ACK_MASK			REG_GENMASK(23, 20)
-- 
2.20.1
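
The arithmetic being fixed here: _MMIO_PORT() computes the register
address as base + port * stride (see _PICK_EVEN() in i915_reg.h), which
for PHY_E lands one slot short of where DG2's TC1 instance actually
lives. A standalone sketch of the computation:

#include <assert.h>

/* Linear stride model used by ICL_PHY_MISC(); values copied from the
 * defines in the hunk above.
 */
static unsigned int icl_phy_misc(unsigned int phy)
{
	const unsigned int base = 0x64C00;		/* _ICL_PHY_MISC_A */
	const unsigned int stride = 0x64C04 - 0x64C00;	/* 4 bytes */

	return base + phy * stride;
}

int main(void)
{
	assert(icl_phy_misc(4) == 0x64C10);	/* linear result for PHY_E */
	/* DG2's TC1 register is actually at 0x64C14 -- one slot further,
	 * as if it were "PHY F" -- hence the DG2_PHY_MISC() special case.
	 */
	return 0;
}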


* [PATCH 03/15] drm/i915/dg2: Drop 38.4 MHz MPLLB tables
From: Ramalingam C @ 2022-02-18 18:47 UTC
  To: intel-gfx, dri-devel
  Cc: Anusha Srivatsa, lucas.demarchi, José Roberto de Souza, Uma Shankar

From: Matt Roper <matthew.d.roper@intel.com>

Our early understanding of DG2 was incorrect; since the 5th display
isn't actually a Type-C output, 38.4 MHz input clocks are never used on
this platform and we can drop the corresponding MPLLB tables.

Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/display/intel_snps_phy.c | 208 +-----------------
 1 file changed, 1 insertion(+), 207 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_snps_phy.c b/drivers/gpu/drm/i915/display/intel_snps_phy.c
index 4cdce0116883..7e6245b97fed 100644
--- a/drivers/gpu/drm/i915/display/intel_snps_phy.c
+++ b/drivers/gpu/drm/i915/display/intel_snps_phy.c
@@ -250,197 +250,6 @@ static const struct intel_mpllb_state * const dg2_dp_100_tables[] = {
 	NULL,
 };
 
-/*
- * Basic DP link rates with 38.4 MHz reference clock.
- */
-
-static const struct intel_mpllb_state dg2_dp_rbr_38_4 = {
-	.clock = 162000,
-	.ref_control =
-		REG_FIELD_PREP(SNPS_PHY_REF_CONTROL_REF_RANGE, 1),
-	.mpllb_cp =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT, 5) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP, 25) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT_GS, 65) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP_GS, 127),
-	.mpllb_div =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV5_CLK_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_TX_CLK_DIV, 2) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_PMIX_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_V2I, 2) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FREQ_VCO, 2),
-	.mpllb_div2 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_REF_CLK_DIV, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_MULTIPLIER, 304),
-	.mpllb_fracn1 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_CGG_UPDATE_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_DEN, 1),
-	.mpllb_fracn2 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_QUOT, 49152),
-};
-
-static const struct intel_mpllb_state dg2_dp_hbr1_38_4 = {
-	.clock = 270000,
-	.ref_control =
-		REG_FIELD_PREP(SNPS_PHY_REF_CONTROL_REF_RANGE, 1),
-	.mpllb_cp =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT, 5) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP, 25) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT_GS, 65) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP_GS, 127),
-	.mpllb_div =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV5_CLK_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_TX_CLK_DIV, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_PMIX_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_V2I, 2) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FREQ_VCO, 3),
-	.mpllb_div2 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_REF_CLK_DIV, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_MULTIPLIER, 248),
-	.mpllb_fracn1 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_CGG_UPDATE_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_DEN, 1),
-	.mpllb_fracn2 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_QUOT, 40960),
-};
-
-static const struct intel_mpllb_state dg2_dp_hbr2_38_4 = {
-	.clock = 540000,
-	.ref_control =
-		REG_FIELD_PREP(SNPS_PHY_REF_CONTROL_REF_RANGE, 1),
-	.mpllb_cp =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT, 5) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP, 25) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT_GS, 65) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP_GS, 127),
-	.mpllb_div =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV5_CLK_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_PMIX_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_V2I, 2) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FREQ_VCO, 3),
-	.mpllb_div2 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_REF_CLK_DIV, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_MULTIPLIER, 248),
-	.mpllb_fracn1 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_CGG_UPDATE_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_DEN, 1),
-	.mpllb_fracn2 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_QUOT, 40960),
-};
-
-static const struct intel_mpllb_state dg2_dp_hbr3_38_4 = {
-	.clock = 810000,
-	.ref_control =
-		REG_FIELD_PREP(SNPS_PHY_REF_CONTROL_REF_RANGE, 1),
-	.mpllb_cp =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT, 6) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP, 26) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT_GS, 65) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP_GS, 127),
-	.mpllb_div =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV5_CLK_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_PMIX_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_V2I, 2),
-	.mpllb_div2 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_REF_CLK_DIV, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_MULTIPLIER, 388),
-	.mpllb_fracn1 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_CGG_UPDATE_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_DEN, 1),
-	.mpllb_fracn2 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_QUOT, 61440),
-};
-
-static const struct intel_mpllb_state dg2_dp_uhbr10_38_4 = {
-	.clock = 1000000,
-	.ref_control =
-		REG_FIELD_PREP(SNPS_PHY_REF_CONTROL_REF_RANGE, 1),
-	.mpllb_cp =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT, 5) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP, 26) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT_GS, 65) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP_GS, 127),
-	.mpllb_div =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV5_CLK_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV_CLK_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV_MULTIPLIER, 8) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_PMIX_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_WORD_DIV2_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_DP2_MODE, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_SHIM_DIV32_CLK_SEL, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_V2I, 2),
-	.mpllb_div2 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_REF_CLK_DIV, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_MULTIPLIER, 488),
-	.mpllb_fracn1 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_CGG_UPDATE_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_DEN, 3),
-	.mpllb_fracn2 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_REM, 2) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_QUOT, 27306),
-
-	/*
-	 * SSC will be enabled, DP UHBR has a minimum SSC requirement.
-	 */
-	.mpllb_sscen =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_SSC_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_SSC_PEAK, 76800),
-	.mpllb_sscstep =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_SSC_STEPSIZE, 129024),
-};
-
-static const struct intel_mpllb_state dg2_dp_uhbr13_38_4 = {
-	.clock = 1350000,
-	.ref_control =
-		REG_FIELD_PREP(SNPS_PHY_REF_CONTROL_REF_RANGE, 1),
-	.mpllb_cp =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT, 6) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP, 56) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_INT_GS, 65) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_CP_PROP_GS, 127),
-	.mpllb_div =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV5_CLK_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV_CLK_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_DIV_MULTIPLIER, 8) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_PMIX_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_WORD_DIV2_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_DP2_MODE, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_V2I, 3),
-	.mpllb_div2 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_REF_CLK_DIV, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_MULTIPLIER, 670),
-	.mpllb_fracn1 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_CGG_UPDATE_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_DEN, 1),
-	.mpllb_fracn2 =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_FRACN_QUOT, 36864),
-
-	/*
-	 * SSC will be enabled, DP UHBR has a minimum SSC requirement.
-	 */
-	.mpllb_sscen =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_SSC_EN, 1) |
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_SSC_PEAK, 103680),
-	.mpllb_sscstep =
-		REG_FIELD_PREP(SNPS_PHY_MPLLB_SSC_STEPSIZE, 174182),
-};
-
-static const struct intel_mpllb_state * const dg2_dp_38_4_tables[] = {
-	&dg2_dp_rbr_38_4,
-	&dg2_dp_hbr1_38_4,
-	&dg2_dp_hbr2_38_4,
-	&dg2_dp_hbr3_38_4,
-	&dg2_dp_uhbr10_38_4,
-	&dg2_dp_uhbr13_38_4,
-	NULL,
-};
-
 /*
  * eDP link rates with 100 MHz reference clock.
  */
@@ -749,22 +558,7 @@ intel_mpllb_tables_get(struct intel_crtc_state *crtc_state,
 	if (intel_crtc_has_type(crtc_state, INTEL_OUTPUT_EDP)) {
 		return dg2_edp_tables;
 	} else if (intel_crtc_has_dp_encoder(crtc_state)) {
-		/*
-		 * FIXME: Initially we're just enabling the "combo" outputs on
-		 * port A-D.  The MPLLB for those ports takes an input from the
-		 * "Display Filter PLL" which always has an output frequency
-		 * of 100 MHz, hence the use of the _100 tables below.
-		 *
-		 * Once we enable port TC1 it will either use the same 100 MHz
-		 * "Display Filter PLL" (when strapped to support a native
-		 * display connection) or different 38.4 MHz "Filter PLL" when
-		 * strapped to support a USB connection, so we'll need to check
-		 * that to determine which table to use.
-		 */
-		if (0)
-			return dg2_dp_38_4_tables;
-		else
-			return dg2_dp_100_tables;
+		return dg2_dp_100_tables;
 	} else if (intel_crtc_has_type(crtc_state, INTEL_OUTPUT_HDMI)) {
 		return dg2_hdmi_tables;
 	}
-- 
2.20.1
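
The tables being dropped are built entirely from REG_FIELD_PREP(), which
shifts a value into the register field described by a mask. A simplified
model of its semantics (the kernel version is FIELD_PREP() with added
compile-time checking; the mask value below is made up for illustration):

/* Place 'val' into the register field described by 'mask'. */
static inline unsigned int reg_field_prep(unsigned int mask,
					  unsigned int val)
{
	return (val << __builtin_ctz(mask)) & mask;
}

/* e.g. packing SNPS_PHY_MPLLB_CP_INT = 5 into a hypothetical
 * GENMASK(23, 20)-style field (0x00F00000) yields 0x00500000; OR-ing
 * several such fields builds one mpllb_* register value as in the
 * tables above.
 */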


* [PATCH 04/15] drm/i915/dg2: Enable 5th port
From: Ramalingam C @ 2022-02-18 18:47 UTC
  To: intel-gfx, dri-devel
  Cc: Swathi Dhanavanthri, lucas.demarchi, José Roberto de Souza

From: Matt Roper <matthew.d.roper@intel.com>

DG2 supports a 5th display output which the hardware refers to as "TC1,"
even though it isn't a Type-C output.  This behaves similarly to the TC1
on past platforms, with just a couple of minor differences:

 * DG2's TC1 bit in SDEISR is at bit 25 rather than bit 24 as it is on
   ICP/TGP/ADP.
 * DG2 doesn't need the HPD inversion setting that we had to use on DG1.

v2:
  intel_ddi_init(dev_priv, PORT_TC1); [Matt]

Cc: Swathi Dhanavanthri <swathi.dhanavanthri@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/display/intel_display.c |  1 +
 drivers/gpu/drm/i915/display/intel_gmbus.c   | 16 ++++++++++++++--
 drivers/gpu/drm/i915/i915_irq.c              |  5 ++++-
 drivers/gpu/drm/i915/i915_reg.h              |  1 +
 4 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index aaf2aee4da35..69e15ad2c253 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -8757,6 +8757,7 @@ static void intel_setup_outputs(struct drm_i915_private *dev_priv)
 		intel_ddi_init(dev_priv, PORT_B);
 		intel_ddi_init(dev_priv, PORT_C);
 		intel_ddi_init(dev_priv, PORT_D_XELPD);
+		intel_ddi_init(dev_priv, PORT_TC1);
 	} else if (IS_ALDERLAKE_P(dev_priv)) {
 		intel_ddi_init(dev_priv, PORT_A);
 		intel_ddi_init(dev_priv, PORT_B);
diff --git a/drivers/gpu/drm/i915/display/intel_gmbus.c b/drivers/gpu/drm/i915/display/intel_gmbus.c
index 6ce8c10fe975..2fad03250661 100644
--- a/drivers/gpu/drm/i915/display/intel_gmbus.c
+++ b/drivers/gpu/drm/i915/display/intel_gmbus.c
@@ -98,11 +98,21 @@ static const struct gmbus_pin gmbus_pins_dg1[] = {
 	[GMBUS_PIN_4_CNP] = { "dpd", GPIOE },
 };
 
+static const struct gmbus_pin gmbus_pins_dg2[] = {
+	[GMBUS_PIN_1_BXT] = { "dpa", GPIOB },
+	[GMBUS_PIN_2_BXT] = { "dpb", GPIOC },
+	[GMBUS_PIN_3_BXT] = { "dpc", GPIOD },
+	[GMBUS_PIN_4_CNP] = { "dpd", GPIOE },
+	[GMBUS_PIN_9_TC1_ICP] = { "tc1", GPIOJ },
+};
+
 /* pin is expected to be valid */
 static const struct gmbus_pin *get_gmbus_pin(struct drm_i915_private *dev_priv,
 					     unsigned int pin)
 {
-	if (INTEL_PCH_TYPE(dev_priv) >= PCH_DG1)
+	if (INTEL_PCH_TYPE(dev_priv) >= PCH_DG2)
+		return &gmbus_pins_dg2[pin];
+	else if (INTEL_PCH_TYPE(dev_priv) >= PCH_DG1)
 		return &gmbus_pins_dg1[pin];
 	else if (INTEL_PCH_TYPE(dev_priv) >= PCH_ICP)
 		return &gmbus_pins_icp[pin];
@@ -123,7 +133,9 @@ bool intel_gmbus_is_valid_pin(struct drm_i915_private *dev_priv,
 {
 	unsigned int size;
 
-	if (INTEL_PCH_TYPE(dev_priv) >= PCH_DG1)
+	if (INTEL_PCH_TYPE(dev_priv) >= PCH_DG2)
+		size = ARRAY_SIZE(gmbus_pins_dg2);
+	else if (INTEL_PCH_TYPE(dev_priv) >= PCH_DG1)
 		size = ARRAY_SIZE(gmbus_pins_dg1);
 	else if (INTEL_PCH_TYPE(dev_priv) >= PCH_ICP)
 		size = ARRAY_SIZE(gmbus_pins_icp);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index fdd568ba4a16..4d81063b128c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -179,6 +179,7 @@ static const u32 hpd_sde_dg1[HPD_NUM_PINS] = {
 	[HPD_PORT_B] = SDE_DDI_HOTPLUG_ICP(HPD_PORT_B),
 	[HPD_PORT_C] = SDE_DDI_HOTPLUG_ICP(HPD_PORT_C),
 	[HPD_PORT_D] = SDE_DDI_HOTPLUG_ICP(HPD_PORT_D),
+	[HPD_PORT_TC1] = SDE_TC_HOTPLUG_DG2(HPD_PORT_TC1),
 };
 
 static void intel_hpd_init_pins(struct drm_i915_private *dev_priv)
@@ -4424,7 +4425,9 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
 		if (I915_HAS_HOTPLUG(dev_priv))
 			dev_priv->hotplug_funcs = &i915_hpd_funcs;
 	} else {
-		if (HAS_PCH_DG1(dev_priv))
+		if (HAS_PCH_DG2(dev_priv))
+			dev_priv->hotplug_funcs = &icp_hpd_funcs;
+		else if (HAS_PCH_DG1(dev_priv))
 			dev_priv->hotplug_funcs = &dg1_hpd_funcs;
 		else if (DISPLAY_VER(dev_priv) >= 11)
 			dev_priv->hotplug_funcs = &gen11_hpd_funcs;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index cc13918fe246..986fb30da9ab 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -6059,6 +6059,7 @@
 /* south display engine interrupt: ICP/TGP */
 #define SDE_GMBUS_ICP			(1 << 23)
 #define SDE_TC_HOTPLUG_ICP(hpd_pin)	REG_BIT(24 + _HPD_PIN_TC(hpd_pin))
+#define SDE_TC_HOTPLUG_DG2(hpd_pin)	REG_BIT(25 + _HPD_PIN_TC(hpd_pin)) /* sigh */
 #define SDE_DDI_HOTPLUG_ICP(hpd_pin)	REG_BIT(16 + _HPD_PIN_DDI(hpd_pin))
 #define SDE_DDI_HOTPLUG_MASK_ICP	(SDE_DDI_HOTPLUG_ICP(HPD_PORT_D) | \
 					 SDE_DDI_HOTPLUG_ICP(HPD_PORT_C) | \
-- 
2.20.1
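
The SDEISR difference called out above comes down to a one-bit shift of
the TC hotplug block. A sketch, assuming _HPD_PIN_TC() maps HPD_PORT_TC1
to 0 as on the ICP-era platforms:

#define BIT(n)	(1u << (n))

/* SDE_TC_HOTPLUG_ICP(HPD_PORT_TC1) -> BIT(24 + 0) */
static const unsigned int icp_tc1_hpd = BIT(24);	/* ICP/TGP/ADP */
/* SDE_TC_HOTPLUG_DG2(HPD_PORT_TC1) -> BIT(25 + 0) */
static const unsigned int dg2_tc1_hpd = BIT(25);	/* DG2 */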


* [PATCH 05/15] drm/i915: add needs_compact_pt flag
From: Ramalingam C @ 2022-02-18 18:47 UTC
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, lucas.demarchi, Matthew Auld

Add a new platform flag, needs_compact_pt, to mark the requirement of
compact PT layout support for the ppGTT when using 64K GTT pages.

With this flag, has_64k_pages will only indicate the requirement of 64K
GTT page sizes or larger for device local memory access.

v6:
	* minor doc formatting

Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          | 11 ++++++++---
 drivers/gpu/drm/i915/i915_pci.c          |  2 ++
 drivers/gpu/drm/i915/intel_device_info.h |  1 +
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f600d1cb01b3..4a3ac66e777a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1340,12 +1340,17 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 
 /*
  * Set this flag, when platform requires 64K GTT page sizes or larger for
- * device local memory access. Also this flag implies that we require or
- * at least support the compact PT layout for the ppGTT when using the 64K
- * GTT pages.
+ * device local memory access.
  */
 #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages)
 
+/*
+ * Set this flag when platform doesn't allow both 64k pages and 4k pages in
+ * the same PT. this flag means we need to support compact PT layout for the
+ * ppGTT when using the 64K GTT pages.
+ */
+#define NEEDS_COMPACT_PT(dev_priv) (INTEL_INFO(dev_priv)->needs_compact_pt)
+
 #define HAS_IPC(dev_priv)		 (INTEL_INFO(dev_priv)->display.has_ipc)
 
 #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 91677a9f330c..8df8887d76ae 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1030,6 +1030,7 @@ static const struct intel_device_info xehpsdv_info = {
 	PLATFORM(INTEL_XEHPSDV),
 	.display = { },
 	.has_64k_pages = 1,
+	.needs_compact_pt = 1,
 	.platform_engine_mask =
 		BIT(RCS0) | BIT(BCS0) |
 		BIT(VECS0) | BIT(VECS1) | BIT(VECS2) | BIT(VECS3) |
@@ -1048,6 +1049,7 @@ static const struct intel_device_info dg2_info = {
 	PLATFORM(INTEL_DG2),
 	.has_guc_deprivilege = 1,
 	.has_64k_pages = 1,
+	.needs_compact_pt = 1,
 	.platform_engine_mask =
 		BIT(RCS0) | BIT(BCS0) |
 		BIT(VECS0) | BIT(VECS1) |
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 27dcfe6f2429..f75673da768d 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -131,6 +131,7 @@ enum intel_ppgtt_type {
 	/* Keep has_* in alphabetical order */ \
 	func(has_64bit_reloc); \
 	func(has_64k_pages); \
+	func(needs_compact_pt); \
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
 	func(has_global_mocs); \
-- 
2.20.1
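
A sketch of the intended division of labour between the two flags once
this lands (illustrative only; the 2M padding shown is what the next
patch in the series applies for the compact PT layout):

#include "i915_drv.h"	/* NEEDS_COMPACT_PT(), round_up(), SZ_2M */

/* HAS_64K_PAGES():    lmem must be mapped with 64K (or larger) GTT pages.
 * NEEDS_COMPACT_PT(): additionally, 4K and 64K entries may not share a
 *                     page table, so reservations get padded.
 */
static u64 object_reserve_size(struct drm_i915_private *i915, u64 size)
{
	if (NEEDS_COMPACT_PT(i915))
		size = round_up(size, SZ_2M);	/* keep each PDE homogeneous */

	return size;
}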


* [PATCH 06/15] drm/i915: enforce min GTT alignment for discrete cards
From: Ramalingam C @ 2022-02-18 18:47 UTC
  To: intel-gfx, dri-devel
  Cc: Thomas Hellström, lucas.demarchi, Matthew Auld, Rodrigo Vivi

From: Matthew Auld <matthew.auld@intel.com>

For local-memory objects we need to align the GTT addresses
to 64K, both for the ppgtt and ggtt.

We need to support vm->min_alignment > 4K, depending
on the vm itself and the type of object we are inserting.
With this in mind, update the GTT selftests to take this
into account.

For compact-pt we further align and pad lmem object GTT addresses
to 2MB to ensure PDEs contain consistent page sizes as
required by the HW.

v3:
	* use needs_compact_pt flag to discriminate between
	  64K and 64K with compact-pt
	* add i915_vm_obj_min_alignment
	* use i915_vm_obj_min_alignment to round up vma reservation
	  if compact-pt instead of hard coding
v5:
	* fix i915_vm_obj_min_alignment for internal objects which
	  have no memory region
v6:
	* tiled_blits_create correctly pick largest required alignment
v8:
	* i915_vm_min_alignment protect against array overflow for mock region

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../i915/gem/selftests/i915_gem_client_blt.c  | 21 ++--
 drivers/gpu/drm/i915/gt/intel_gtt.c           | 12 +++
 drivers/gpu/drm/i915/gt/intel_gtt.h           | 22 +++++
 drivers/gpu/drm/i915/i915_vma.c               |  9 ++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++-------
 5 files changed, 119 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index 8f28e46e8ee5..ddd0772fd828 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -40,6 +40,7 @@ struct tiled_blits {
 	struct blit_buffer scratch;
 	struct i915_vma *batch;
 	u64 hole;
+	u64 align;
 	u32 width;
 	u32 height;
 };
@@ -411,14 +412,19 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_free;
 	}
 
-	hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
+	t->align = i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL);
+	t->align = max(t->align,
+		       i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
+
+	hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align);
 	hole_size *= 2; /* room to maneuver */
-	hole_size += 2 * I915_GTT_MIN_ALIGNMENT;
+	hole_size += 2 * t->align; /* padding on either side */
 
 	mutex_lock(&t->ce->vm->mutex);
 	memset(&hole, 0, sizeof(hole));
 	err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole,
-					  hole_size, 0, I915_COLOR_UNEVICTABLE,
+					  hole_size, t->align,
+					  I915_COLOR_UNEVICTABLE,
 					  0, U64_MAX,
 					  DRM_MM_INSERT_BEST);
 	if (!err)
@@ -429,7 +435,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng)
 		goto err_put;
 	}
 
-	t->hole = hole.start + I915_GTT_MIN_ALIGNMENT;
+	t->hole = hole.start + t->align;
 	pr_info("Using hole at %llx\n", t->hole);
 
 	err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng);
@@ -456,7 +462,7 @@ static void tiled_blits_destroy(struct tiled_blits *t)
 static int tiled_blits_prepare(struct tiled_blits *t,
 			       struct rnd_state *prng)
 {
-	u64 offset = PAGE_ALIGN(t->width * t->height * 4);
+	u64 offset = round_up(t->width * t->height * 4, t->align);
 	u32 *map;
 	int err;
 	int i;
@@ -487,8 +493,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
 
 static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 {
-	u64 offset =
-		round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT);
+	u64 offset = round_up(t->width * t->height * 4, 2 * t->align);
 	int err;
 
 	/* We want to check position invariant tiling across GTT eviction */
@@ -501,7 +506,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
 
 	/* Reposition so that we overlap the old addresses, and slightly off */
 	err = tiled_blit(t,
-			 &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT,
+			 &t->buffers[2], t->hole + t->align,
 			 &t->buffers[1], t->hole + 3 * offset / 2);
 	if (err)
 		return err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 49a8fb63e6e5..c548c193cd35 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -225,6 +225,18 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	GEM_BUG_ON(!vm->total);
 	drm_mm_init(&vm->mm, 0, vm->total);
+
+	memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT,
+		 ARRAY_SIZE(vm->min_alignment));
+
+	if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915)) {
+		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M;
+		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M;
+	} else if (HAS_64K_PAGES(vm->i915)) {
+		vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+		vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K;
+	}
+
 	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
 
 	INIT_LIST_HEAD(&vm->bound_list);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 8073438b67c8..6cd518a3277c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -29,6 +29,8 @@
 #include "i915_selftest.h"
 #include "i915_vma_resource.h"
 #include "i915_vma_types.h"
+#include "i915_params.h"
+#include "intel_memory_region.h"
 
 #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
 
@@ -223,6 +225,7 @@ struct i915_address_space {
 	struct device *dma;
 	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
 	u64 reserved;		/* size addr space reserved */
+	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
 
 	unsigned int bind_async_flags;
 
@@ -384,6 +387,25 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
 	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
 }
 
+static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
+					enum intel_memory_type type)
+{
+	/* avoid INTEL_MEMORY_MOCK overflow */
+	if ((int)type >= ARRAY_SIZE(vm->min_alignment))
+		type = INTEL_MEMORY_SYSTEM;
+
+	return vm->min_alignment[type];
+}
+
+static inline u64 i915_vm_obj_min_alignment(struct i915_address_space *vm,
+					    struct drm_i915_gem_object  *obj)
+{
+	struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
+	enum intel_memory_type type = mr ? mr->type : INTEL_MEMORY_SYSTEM;
+
+	return i915_vm_min_alignment(vm, type);
+}
+
 static inline bool
 i915_vm_has_cache_coloring(struct i915_address_space *vm)
 {
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 845cd88f8313..3558b16a929c 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -757,6 +757,14 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 		end = min_t(u64, end, (1ULL << 32) - I915_GTT_PAGE_SIZE);
 	GEM_BUG_ON(!IS_ALIGNED(end, I915_GTT_PAGE_SIZE));
 
+	alignment = max(alignment, i915_vm_obj_min_alignment(vma->vm, vma->obj));
+	/*
+	 * for compact-pt we round up the reservation to prevent
+	 * any smaller pages being used within the same PDE
+	 */
+	if (NEEDS_COMPACT_PT(vma->vm->i915))
+		size = round_up(size, alignment);
+
 	/* If binding the object/GGTT view requires more space than the entire
 	 * aperture has, reject it early before evicting everything in a vain
 	 * attempt to find space.
@@ -769,6 +777,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
 	}
 
 	color = 0;
+
 	if (i915_vm_has_cache_coloring(vma->vm))
 		color = vma->obj->cache_level;
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index e7e6c4b2c81d..0d80509ef3c4 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -239,6 +239,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			 u64 hole_start, u64 hole_end,
 			 unsigned long end_time)
 {
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	I915_RND_STATE(seed_prng);
 	struct i915_vma_resource *mock_vma_res;
 	unsigned int size;
@@ -252,9 +254,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		I915_RND_SUBSTATE(prng, seed_prng);
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -275,8 +278,8 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 		GEM_BUG_ON(!order);
 
-		GEM_BUG_ON(count * BIT_ULL(size) > vm->total);
-		GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end);
+		GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total);
+		GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
 
 		/* Ignore allocation failures (i.e. don't report them as
 		 * a test failure) as we are purposefully allocating very
@@ -299,10 +302,10 @@ static int lowlevel_hole(struct i915_address_space *vm,
 		}
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
-			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
+			GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
 
 			if (igt_timeout(end_time,
 					"%s timed out before %d/%d\n",
@@ -345,7 +348,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 			}
 
 			mock_vma_res->bi.pages = obj->mm.pages;
-			mock_vma_res->node_size = BIT_ULL(size);
+			mock_vma_res->node_size = BIT_ULL(aligned_size);
 			mock_vma_res->start = addr;
 
 			with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
@@ -356,7 +359,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
 
 		i915_random_reorder(order, count, &prng);
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 			intel_wakeref_t wakeref;
 
 			GEM_BUG_ON(addr + BIT_ULL(size) > vm->total);
@@ -400,8 +403,10 @@ static int fill_hole(struct i915_address_space *vm,
 {
 	const u64 hole_size = hole_end - hole_start;
 	struct drm_i915_gem_object *obj;
+	const unsigned int min_alignment =
+		i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
 	const unsigned long max_pages =
-		min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT);
+		min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment));
 	const unsigned long max_step = max(int_sqrt(max_pages), 2UL);
 	unsigned long npages, prime, flags;
 	struct i915_vma *vma;
@@ -442,14 +447,17 @@ static int fill_hole(struct i915_address_space *vm,
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -471,22 +479,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -507,22 +518,25 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					err = i915_vma_pin(vma, 0, 0, offset | flags);
@@ -544,22 +558,25 @@ static int fill_hole(struct i915_address_space *vm,
 					i915_vma_unpin(vma);
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 
 				offset = p->offset;
 				list_for_each_entry_reverse(obj, &objects, st_link) {
+					u64 aligned_size = round_up(obj->base.size,
+								    min_alignment);
+
 					vma = i915_vma_instance(obj, vm, NULL);
 					if (IS_ERR(vma))
 						continue;
 
 					if (p->step < 0) {
-						if (offset < hole_start + obj->base.size)
+						if (offset < hole_start + aligned_size)
 							break;
-						offset -= obj->base.size;
+						offset -= aligned_size;
 					}
 
 					if (!drm_mm_node_allocated(&vma->node) ||
@@ -580,9 +597,9 @@ static int fill_hole(struct i915_address_space *vm,
 					}
 
 					if (p->step > 0) {
-						if (offset + obj->base.size > hole_end)
+						if (offset + aligned_size > hole_end)
 							break;
-						offset += obj->base.size;
+						offset += aligned_size;
 					}
 				}
 			}
@@ -612,6 +629,7 @@ static int walk_hole(struct i915_address_space *vm,
 	const u64 hole_size = hole_end - hole_start;
 	const unsigned long max_pages =
 		min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT);
+	unsigned long min_alignment;
 	unsigned long flags;
 	u64 size;
 
@@ -621,6 +639,8 @@ static int walk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	for_each_prime_number_from(size, 1, max_pages) {
 		struct drm_i915_gem_object *obj;
 		struct i915_vma *vma;
@@ -639,7 +659,7 @@ static int walk_hole(struct i915_address_space *vm,
 
 		for (addr = hole_start;
 		     addr + obj->base.size < hole_end;
-		     addr += obj->base.size) {
+		     addr += round_up(obj->base.size, min_alignment)) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
 				pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n",
@@ -691,6 +711,7 @@ static int pot_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
+	unsigned int min_alignment;
 	unsigned long flags;
 	unsigned int pot;
 	int err = 0;
@@ -699,6 +720,8 @@ static int pot_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE);
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
@@ -711,13 +734,13 @@ static int pot_hole(struct i915_address_space *vm,
 
 	/* Insert a pair of pages across every pot boundary within the hole */
 	for (pot = fls64(hole_end - 1) - 1;
-	     pot > ilog2(2 * I915_GTT_PAGE_SIZE);
+	     pot > ilog2(2 * min_alignment);
 	     pot--) {
 		u64 step = BIT_ULL(pot);
 		u64 addr;
 
-		for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
-		     addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE;
+		for (addr = round_up(hole_start + min_alignment, step) - min_alignment;
+		     addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment;
 		     addr += step) {
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -762,6 +785,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		      unsigned long end_time)
 {
 	I915_RND_STATE(prng);
+	unsigned int min_alignment;
 	unsigned int size;
 	unsigned long flags;
 
@@ -769,15 +793,18 @@ static int drunk_hole(struct i915_address_space *vm,
 	if (i915_is_ggtt(vm))
 		flags |= PIN_GLOBAL;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (size = 12; (hole_end - hole_start) >> size; size++) {
 		struct drm_i915_gem_object *obj;
 		unsigned int *order, count, n;
 		struct i915_vma *vma;
-		u64 hole_size;
+		u64 hole_size, aligned_size;
 		int err = -ENODEV;
 
-		hole_size = (hole_end - hole_start) >> size;
+		aligned_size = max_t(u32, ilog2(min_alignment), size);
+		hole_size = (hole_end - hole_start) >> aligned_size;
 		if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32))
 			hole_size = KMALLOC_MAX_SIZE / sizeof(u32);
 		count = hole_size >> 1;
@@ -817,7 +844,7 @@ static int drunk_hole(struct i915_address_space *vm,
 		GEM_BUG_ON(vma->size != BIT_ULL(size));
 
 		for (n = 0; n < count; n++) {
-			u64 addr = hole_start + order[n] * BIT_ULL(size);
+			u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
 
 			err = i915_vma_pin(vma, 0, 0, addr | flags);
 			if (err) {
@@ -869,11 +896,14 @@ static int __shrink_hole(struct i915_address_space *vm,
 {
 	struct drm_i915_gem_object *obj;
 	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+	unsigned int min_alignment;
 	unsigned int order = 12;
 	LIST_HEAD(objects);
 	int err = 0;
 	u64 addr;
 
+	min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM);
+
 	/* Keep creating larger objects until one cannot fit into the hole */
 	for (addr = hole_start; addr < hole_end; ) {
 		struct i915_vma *vma;
@@ -914,7 +944,7 @@ static int __shrink_hole(struct i915_address_space *vm,
 		}
 
 		i915_vma_unpin(vma);
-		addr += size;
+		addr += round_up(size, min_alignment);
 
 		/*
 		 * Since we are injecting allocation faults at random intervals,
-- 
2.20.1

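As a usage sketch of the helpers added above (illustrative only: the
function name is hypothetical, and the values assume a DG2-like device
with 64K pages and compact-pt):

	static void example_min_alignment(struct i915_address_space *vm,
					  struct drm_i915_gem_object *obj)
	{
		/* 2M for an lmem object on a compact-pt device; smem (or
		 * any pre-DG2 device) stays at the 4K baseline */
		u64 align = i915_vm_obj_min_alignment(vm, obj);

		/* i915_vma_insert() pads the reservation the same way, so
		 * a 1M lmem object ends up reserving a whole 2M PDE */
		u64 padded = round_up(obj->base.size, align);

		(void)padded;
	}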

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 07/15] drm/i915: support 64K GTT pages for discrete cards
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
@ 2022-02-18 18:47   ` Ramalingam C
  -1 siblings, 0 replies; 47+ messages in thread
From: Ramalingam C @ 2022-02-18 18:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Thomas Hellström, kernel test robot, lucas.demarchi,
	Matthew Auld, Rodrigo Vivi

From: Matthew Auld <matthew.auld@intel.com>

Discrete cards optimise 64K GTT pages for local-memory, since everything
should be allocated at 64K granularity. We say goodbye to sparse
entries, and instead get a compact 256B page-table for 64K pages,
which should be more cache friendly. 4K pages for local-memory
are no longer supported by the HW.
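
For intuition, the 256B figure works out as follows -- a
back-of-the-envelope sketch, not part of the patch (PTEs are 8 bytes,
since gen8_pte_t is a u64, and each PDE spans 2M):

	#include <stdio.h>

	int main(void)
	{
		const unsigned long long pde_span = 2ULL << 20;	/* 2M  */
		const unsigned long long pte = 8;		/* u64 */

		/* classic 4K pages: 512 PTEs -> one full 4K page table */
		printf("4K : %llu bytes\n", pde_span / (4ULL << 10) * pte);
		/* compact 64K pages: 32 PTEs -> just 256 bytes */
		printf("64K: %llu bytes\n", pde_span / (64ULL << 10) * pte);
		return 0;
	}

The same 512/32 = 16 ratio is why the clear and insert paths below
divide PTE indices and counts by 16 whenever pt->is_compact is set.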

v4: don't return uninitialized err in igt_ppgtt_compact
Reported-by: kernel test robot <lkp@intel.com>

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  60 ++++++++++
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c          | 108 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   3 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   1 +
 4 files changed, 169 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 8424ee8c5eb8..0528fe1fc9b3 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1479,6 +1479,65 @@ static int igt_ppgtt_sanity_check(void *arg)
 	return err;
 }
 
+static int igt_ppgtt_compact(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct drm_i915_gem_object *obj;
+	int err;
+
+	/*
+	 * Simple test to catch issues with compact 64K pages -- since the pt is
+	 * compacted to 256B, which gives us 32 entries per pt. However, since the
+	 * backing page for the pt is 4K, any extra entries we might incorrectly
+	 * write out should be ignored by the HW. If we ever hit such a case, this
+	 * test should catch it since some of our writes would land in scratch.
+	 */
+
+	if (!HAS_64K_PAGES(i915)) {
+		pr_info("device lacks compact 64K page support, skipping\n");
+		return 0;
+	}
+
+	if (!HAS_LMEM(i915)) {
+		pr_info("device lacks LMEM support, skipping\n");
+		return 0;
+	}
+
+	/* We want the range to cover multiple page-table boundaries. */
+	obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	err = i915_gem_object_pin_pages_unlocked(obj);
+	if (err)
+		goto out_put;
+
+	if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
+		pr_info("LMEM compact unable to allocate huge-page(s)\n");
+		goto out_unpin;
+	}
+
+	/*
+	 * Disable 2M GTT pages by forcing the page-size to 64K for the GTT
+	 * insertion.
+	 */
+	obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
+
+	err = igt_write_huge(i915, obj);
+	if (err)
+		pr_err("LMEM compact write-huge failed\n");
+
+out_unpin:
+	i915_gem_object_unpin_pages(obj);
+out_put:
+	i915_gem_object_put(obj);
+
+	if (err == -ENOMEM)
+		err = 0;
+
+	return err;
+}
+
 static int igt_tmpfs_fallback(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -1736,6 +1795,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_tmpfs_fallback),
 		SUBTEST(igt_ppgtt_smoke_huge),
 		SUBTEST(igt_ppgtt_sanity_check),
+		SUBTEST(igt_ppgtt_compact),
 	};
 
 	if (!HAS_PPGTT(i915)) {
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index c43e724afa9f..62471730266c 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 						   start, end, lvl);
 		} else {
 			unsigned int count;
+			unsigned int pte = gen8_pd_index(start, 0);
+			unsigned int num_ptes;
 			u64 *vaddr;
 
 			count = gen8_pt_count(start, end);
@@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm,
 			    atomic_read(&pt->used));
 			GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
 
+			num_ptes = count;
+			if (pt->is_compact) {
+				GEM_BUG_ON(num_ptes % 16);
+				GEM_BUG_ON(pte % 16);
+				num_ptes /= 16;
+				pte /= 16;
+			}
+
 			vaddr = px_vaddr(pt);
-			memset64(vaddr + gen8_pd_index(start, 0),
+			memset64(vaddr + pte,
 				 vm->scratch[0]->encode,
-				 count);
+				 num_ptes);
 
 			atomic_sub(count, &pt->used);
 			start += count;
@@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	return idx;
 }
 
+static void
+xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
+			  struct i915_vma_resource *vma_res,
+			  struct sgt_dma *iter,
+			  enum i915_cache_level cache_level,
+			  u32 flags)
+{
+	const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
+	unsigned int rem = sg_dma_len(iter->sg);
+	u64 start = vma_res->start;
+
+	GEM_BUG_ON(!i915_vm_is_4lvl(vm));
+
+	do {
+		struct i915_page_directory * const pdp =
+			gen8_pdp_for_page_address(vm, start);
+		struct i915_page_directory * const pd =
+			i915_pd_entry(pdp, __gen8_pte_index(start, 2));
+		struct i915_page_table *pt =
+			i915_pt_entry(pd, __gen8_pte_index(start, 1));
+		gen8_pte_t encode = pte_encode;
+		unsigned int page_size;
+		gen8_pte_t *vaddr;
+		u16 index, max;
+
+		max = I915_PDES;
+
+		if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
+		    IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
+		    rem >= I915_GTT_PAGE_SIZE_2M &&
+		    !__gen8_pte_index(start, 0)) {
+			index = __gen8_pte_index(start, 1);
+			encode |= GEN8_PDE_PS_2M;
+			page_size = I915_GTT_PAGE_SIZE_2M;
+
+			vaddr = px_vaddr(pd);
+		} else {
+			if (encode & GEN12_PPGTT_PTE_LM) {
+				GEM_BUG_ON(__gen8_pte_index(start, 0) % 16);
+				GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K);
+				GEM_BUG_ON(!IS_ALIGNED(iter->dma,
+						       I915_GTT_PAGE_SIZE_64K));
+
+				index = __gen8_pte_index(start, 0) / 16;
+				page_size = I915_GTT_PAGE_SIZE_64K;
+
+				max /= 16;
+
+				vaddr = px_vaddr(pd);
+				vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;
+
+				pt->is_compact = true;
+			} else {
+				GEM_BUG_ON(pt->is_compact);
+				index = __gen8_pte_index(start, 0);
+				page_size = I915_GTT_PAGE_SIZE;
+			}
+
+			vaddr = px_vaddr(pt);
+		}
+
+		do {
+			GEM_BUG_ON(rem < page_size);
+			vaddr[index++] = encode | iter->dma;
+
+			start += page_size;
+			iter->dma += page_size;
+			rem -= page_size;
+			if (iter->dma >= iter->max) {
+				iter->sg = __sg_next(iter->sg);
+				if (!iter->sg)
+					break;
+
+				rem = sg_dma_len(iter->sg);
+				if (!rem)
+					break;
+
+				iter->dma = sg_dma_address(iter->sg);
+				iter->max = iter->dma + rem;
+
+				if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
+					break;
+			}
+		} while (rem >= page_size && index < max);
+
+		vma_res->page_sizes_gtt |= page_size;
+	} while (iter->sg && sg_dma_len(iter->sg));
+}
+
 static void gen8_ppgtt_insert_huge(struct i915_address_space *vm,
 				   struct i915_vma_resource *vma_res,
 				   struct sgt_dma *iter,
@@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
 	struct sgt_dma iter = sgt_dma(vma_res);
 
 	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
-		gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+		if (HAS_64K_PAGES(vm->i915))
+			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
+		else
+			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
 	} else  {
 		u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 6cd518a3277c..5e038cef0d9f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -92,6 +92,8 @@ typedef u64 gen8_pte_t;
 
 #define GEN12_GGTT_PTE_LM	BIT_ULL(1)
 
+#define GEN12_PDE_64K BIT(6)
+
 /*
  * Cacheability Control is a 4-bit value. The low three bits are stored in bits
  * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
@@ -160,6 +162,7 @@ struct i915_page_table {
 		atomic_t used;
 		struct i915_page_table *stash;
 	};
+	bool is_compact;
 };
 
 struct i915_page_directory {
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 48e6e2f87700..043652dc6892 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	pt->is_compact = false;
 	atomic_set(&pt->used, 0);
 	return pt;
 }
-- 
2.20.1

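As an aside on the index arithmetic in xehpsdv_ppgtt_insert_huge()
above (a sketch; compact_pte_index() is a hypothetical name, not part
of the patch): with 64K pages the low 16 bits of an address are the
page offset, so dividing the classic level-0 index by 16 is the same
as indexing directly by 64K pages:

	/* __gen8_pte_index(addr, 0) is (addr >> 12) & 0x1ff; for a
	 * 64K-aligned address that value is a multiple of 16, hence: */
	static inline unsigned int compact_pte_index(u64 addr)
	{
		return (addr >> 16) & 0x1f;	/* one of 32 slots */
	}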

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [Intel-gfx] [PATCH 08/15] drm/i915: add gtt misalignment test
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
@ 2022-02-18 18:47   ` Ramalingam C
  -1 siblings, 0 replies; 47+ messages in thread
From: Ramalingam C @ 2022-02-18 18:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, lucas.demarchi

From: Robert Beckett <bob.beckett@collabora.com>

Add a test to check the handling of misaligned offsets and sizes.

v4:
	* remove spurious blank lines
	* explicitly cast intel_region_id to intel_memory_type in misaligned_pin
Reported-by: kernel test robot <lkp@intel.com>
v6:
	* use NEEDS_COMPACT_PT instead of hard coding for DG2
v7:
	* use i915_vma_unbind_unlocked in misalignment test
v8:
	* handle the stolen smem region returning -ENODEV due to being
	  uninitialized on some setups
	* avoid trying to test bad alignments on single page hole regions
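
In effect the test encodes expectations like the following (a sketch
only; the values assume an lmem region with a 64K minimum alignment
and no compact-pt):

	/* an offset below the region minimum must be rejected */
	err = misaligned_case(vm, mr, addr + SZ_64K / 2, SZ_64K, flags);
	/* expect err == -EINVAL */

	/* a 4K object must still pin, with vma->size expanded to 64K
	 * (and, with compact-pt, vma->node.size expanded to 2M) */
	err = misaligned_case(vm, mr, addr, PAGE_SIZE, flags);
	/* expect err == 0 */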

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 126 ++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 0d80509ef3c4..cc814abb0105 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -26,10 +26,12 @@
 #include <linux/prime_numbers.h>
 
 #include "gem/i915_gem_context.h"
+#include "gem/i915_gem_region.h"
 #include "gem/i915_gem_internal.h"
 #include "gem/selftests/mock_context.h"
 #include "gt/intel_context.h"
 #include "gt/intel_gpu_commands.h"
+#include "gt/intel_gtt.h"
 
 #include "i915_random.h"
 #include "i915_selftest.h"
@@ -1068,6 +1070,118 @@ static int shrink_boom(struct i915_address_space *vm,
 	return err;
 }
 
+static int misaligned_case(struct i915_address_space *vm, struct intel_memory_region *mr,
+			   u64 addr, u64 size, unsigned long flags)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int err = 0;
+	u64 expected_vma_size, expected_node_size;
+	bool is_stolen = mr->type == INTEL_MEMORY_STOLEN_SYSTEM ||
+			 mr->type == INTEL_MEMORY_STOLEN_LOCAL;
+
+	obj = i915_gem_object_create_region(mr, size, 0, 0);
+	if (IS_ERR(obj)) {
+		/* if iGVT-g or DMAR is active, stolen mem will be uninitialized */
+		if (PTR_ERR(obj) == -ENODEV && is_stolen)
+			return 0;
+		return PTR_ERR(obj);
+	}
+
+	vma = i915_vma_instance(obj, vm, NULL);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_put;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, addr | flags);
+	if (err)
+		goto err_put;
+	i915_vma_unpin(vma);
+
+	if (!drm_mm_node_allocated(&vma->node)) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
+	if (i915_vma_misplaced(vma, 0, 0, addr | flags)) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
+	expected_vma_size = round_up(size, 1 << (ffs(vma->resource->page_sizes_gtt) - 1));
+	expected_node_size = expected_vma_size;
+
+	if (NEEDS_COMPACT_PT(vm->i915) && i915_gem_object_is_lmem(obj)) {
+		/* compact-pt should expand lmem node to 2MB */
+		expected_vma_size = round_up(size, I915_GTT_PAGE_SIZE_64K);
+		expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_2M);
+	}
+
+	if (vma->size != expected_vma_size || vma->node.size != expected_node_size) {
+		err = i915_vma_unbind_unlocked(vma);
+		err = -EBADSLT;
+		goto err_put;
+	}
+
+	err = i915_vma_unbind_unlocked(vma);
+	if (err)
+		goto err_put;
+
+	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
+
+err_put:
+	i915_gem_object_put(obj);
+	cleanup_freed_objects(vm->i915);
+	return err;
+}
+
+static int misaligned_pin(struct i915_address_space *vm,
+			  u64 hole_start, u64 hole_end,
+			  unsigned long end_time)
+{
+	struct intel_memory_region *mr;
+	enum intel_region_id id;
+	unsigned long flags = PIN_OFFSET_FIXED | PIN_USER;
+	int err = 0;
+	u64 hole_size = hole_end - hole_start;
+
+	if (i915_is_ggtt(vm))
+		flags |= PIN_GLOBAL;
+
+	for_each_memory_region(mr, vm->i915, id) {
+		u64 min_alignment = i915_vm_min_alignment(vm, (enum intel_memory_type)id);
+		u64 size = min_alignment;
+		u64 addr = round_down(hole_start + (hole_size / 2), min_alignment);
+
+		/* avoid -ENOSPC on very small hole setups */
+		if (hole_size < 3 * min_alignment)
+			continue;
+
+		/* we can't test < 4k alignment due to flags being encoded in lower bits */
+		if (min_alignment != I915_GTT_PAGE_SIZE_4K) {
+			err = misaligned_case(vm, mr, addr + (min_alignment / 2), size, flags);
+			/* misaligned should error with -EINVAL */
+			if (!err)
+				err = -EBADSLT;
+			if (err != -EINVAL)
+				return err;
+		}
+
+		/* test for vma->size expansion to min page size */
+		err = misaligned_case(vm, mr, addr, PAGE_SIZE, flags);
+		if (err)
+			return err;
+
+		/* test for intermediate size not expanding vma->size for large alignments */
+		err = misaligned_case(vm, mr, addr, size / 2, flags);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 			  int (*func)(struct i915_address_space *vm,
 				      u64 hole_start, u64 hole_end,
@@ -1137,6 +1251,11 @@ static int igt_ppgtt_shrink_boom(void *arg)
 	return exercise_ppgtt(arg, shrink_boom);
 }
 
+static int igt_ppgtt_misaligned_pin(void *arg)
+{
+	return exercise_ppgtt(arg, misaligned_pin);
+}
+
 static int sort_holes(void *priv, const struct list_head *A,
 		      const struct list_head *B)
 {
@@ -1209,6 +1328,11 @@ static int igt_ggtt_lowlevel(void *arg)
 	return exercise_ggtt(arg, lowlevel_hole);
 }
 
+static int igt_ggtt_misaligned_pin(void *arg)
+{
+	return exercise_ggtt(arg, misaligned_pin);
+}
+
 static int igt_ggtt_page(void *arg)
 {
 	const unsigned int count = PAGE_SIZE/sizeof(u32);
@@ -2181,12 +2305,14 @@ int i915_gem_gtt_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_ppgtt_fill),
 		SUBTEST(igt_ppgtt_shrink),
 		SUBTEST(igt_ppgtt_shrink_boom),
+		SUBTEST(igt_ppgtt_misaligned_pin),
 		SUBTEST(igt_ggtt_lowlevel),
 		SUBTEST(igt_ggtt_drunk),
 		SUBTEST(igt_ggtt_walk),
 		SUBTEST(igt_ggtt_pot),
 		SUBTEST(igt_ggtt_fill),
 		SUBTEST(igt_ggtt_page),
+		SUBTEST(igt_ggtt_misaligned_pin),
 		SUBTEST(igt_cs_tlb),
 	};
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread


* [PATCH 09/15] drm/i915/gtt: allow overriding the pt alignment
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
@ 2022-02-18 18:47   ` Ramalingam C
  -1 siblings, 0 replies; 47+ messages in thread
From: Ramalingam C @ 2022-02-18 18:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, lucas.demarchi, Matthew Auld

From: Matthew Auld <matthew.auld@intel.com>

On some platforms we have alignment restrictions when accessing LMEM
from the GTT. In the next few patches we need to be able to modify the
page-tables directly via the GTT itself.

Suggested-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gtt.h   | 10 +++++++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c | 16 ++++++++++++----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 5e038cef0d9f..9d83c2d3959c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -200,6 +200,14 @@ void *__px_vaddr(struct drm_i915_gem_object *p);
 struct i915_vm_pt_stash {
 	/* preallocated chains of page tables/directories */
 	struct i915_page_table *pt[2];
+	/*
+	 * Optionally override the alignment/size of the physical page that
+	 * contains each PT. If not set defaults back to the usual
+	 * I915_GTT_PAGE_SIZE_4K. This does not influence the other paging
+	 * structures. MUST be a power-of-two. ONLY applicable on discrete
+	 * platforms.
+	 */
+	int pt_sz;
 };
 
 struct i915_vma_ops {
@@ -595,7 +603,7 @@ void free_scratch(struct i915_address_space *vm);
 
 struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz);
 struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz);
-struct i915_page_table *alloc_pt(struct i915_address_space *vm);
+struct i915_page_table *alloc_pt(struct i915_address_space *vm, int sz);
 struct i915_page_directory *alloc_pd(struct i915_address_space *vm);
 struct i915_page_directory *__alloc_pd(int npde);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 043652dc6892..d91e2beb7517 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -12,7 +12,7 @@
 #include "gen6_ppgtt.h"
 #include "gen8_ppgtt.h"
 
-struct i915_page_table *alloc_pt(struct i915_address_space *vm)
+struct i915_page_table *alloc_pt(struct i915_address_space *vm, int sz)
 {
 	struct i915_page_table *pt;
 
@@ -20,7 +20,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm)
 	if (unlikely(!pt))
 		return ERR_PTR(-ENOMEM);
 
-	pt->base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K);
+	pt->base = vm->alloc_pt_dma(vm, sz);
 	if (IS_ERR(pt->base)) {
 		kfree(pt);
 		return ERR_PTR(-ENOMEM);
@@ -221,17 +221,25 @@ int i915_vm_alloc_pt_stash(struct i915_address_space *vm,
 			   u64 size)
 {
 	unsigned long count;
-	int shift, n;
+	int shift, n, pt_sz;
 
 	shift = vm->pd_shift;
 	if (!shift)
 		return 0;
 
+	pt_sz = stash->pt_sz;
+	if (!pt_sz)
+		pt_sz = I915_GTT_PAGE_SIZE_4K;
+	else
+		GEM_BUG_ON(!IS_DGFX(vm->i915));
+
+	GEM_BUG_ON(!is_power_of_2(pt_sz));
+
 	count = pd_count(size, shift);
 	while (count--) {
 		struct i915_page_table *pt;
 
-		pt = alloc_pt(vm);
+		pt = alloc_pt(vm, pt_sz);
 		if (IS_ERR(pt)) {
 			i915_vm_free_pt_stash(vm, stash);
 			return PTR_ERR(pt);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread
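
A minimal sketch of how a caller is expected to use the new pt_sz
override, assuming HAS_64K_PAGES() (added elsewhere in this series) as
the discrete-platform check; the migrate patch later in the series does
exactly this:

static int alloc_lmem_pt_stash(struct i915_address_space *vm, u64 size)
{
        struct i915_vm_pt_stash stash = {};

        /* Back each PT with a 64K-aligned physical page on discrete. */
        if (HAS_64K_PAGES(vm->i915))
                stash.pt_sz = I915_GTT_PAGE_SIZE_64K;

        return i915_vm_alloc_pt_stash(vm, &stash, size);
}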


* [PATCH 10/15] drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
@ 2022-02-18 18:47   ` Ramalingam C
  -1 siblings, 0 replies; 47+ messages in thread
From: Ramalingam C @ 2022-02-18 18:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, lucas.demarchi, Matthew Auld

From: Matthew Auld <matthew.auld@intel.com>

If this is LMEM then we get a 32-entry PT, with each PTE pointing to
some 64K block of memory; otherwise it's just the usual 512-entry PT.
This very much assumes the caller knows what they are doing.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 50 ++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 62471730266c..f574da00eff1 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -715,13 +715,56 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm,
 		gen8_pdp_for_page_index(vm, idx);
 	struct i915_page_directory *pd =
 		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
+	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
 	gen8_pte_t *vaddr;
 
-	vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
+	GEM_BUG_ON(pt->is_compact);
+
+	vaddr = px_vaddr(pt);
 	vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags);
 	clflush_cache_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr));
 }
 
+static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm,
+					    dma_addr_t addr,
+					    u64 offset,
+					    enum i915_cache_level level,
+					    u32 flags)
+{
+	u64 idx = offset >> GEN8_PTE_SHIFT;
+	struct i915_page_directory * const pdp =
+		gen8_pdp_for_page_index(vm, idx);
+	struct i915_page_directory *pd =
+		i915_pd_entry(pdp, gen8_pd_index(idx, 2));
+	struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1));
+	gen8_pte_t *vaddr;
+
+	GEM_BUG_ON(!IS_ALIGNED(addr, SZ_64K));
+	GEM_BUG_ON(!IS_ALIGNED(offset, SZ_64K));
+
+	if (!pt->is_compact) {
+		vaddr = px_vaddr(pd);
+		vaddr[gen8_pd_index(idx, 1)] |= GEN12_PDE_64K;
+		pt->is_compact = true;
+	}
+
+	vaddr = px_vaddr(pt);
+	vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags);
+}
+
+static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm,
+				       dma_addr_t addr,
+				       u64 offset,
+				       enum i915_cache_level level,
+				       u32 flags)
+{
+	if (flags & PTE_LM)
+		return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset,
+						       level, flags);
+
+	return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags);
+}
+
 static int gen8_init_scratch(struct i915_address_space *vm)
 {
 	u32 pte_flags;
@@ -921,7 +964,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
 
 	ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND;
 	ppgtt->vm.insert_entries = gen8_ppgtt_insert;
-	ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
+	if (HAS_64K_PAGES(gt->i915))
+		ppgtt->vm.insert_page = xehpsdv_ppgtt_insert_entry;
+	else
+		ppgtt->vm.insert_page = gen8_ppgtt_insert_entry;
 	ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc;
 	ppgtt->vm.clear_range = gen8_ppgtt_clear;
 	ppgtt->vm.foreach = gen8_ppgtt_foreach;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread
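
The division by 16 in __xehpsdv_ppgtt_insert_entry_lm() is the heart of
the compact layout: each PTE now covers 64K instead of 4K, so the 512
4K-granule slots of a normal PT collapse into 32. A sketch of just that
index math, with idx being the 4K-granule GTT index as in
gen8_ppgtt_insert_entry():

/* 64K / 4K == 16 granules per compact PTE; 512 / 16 == 32 entries. */
static unsigned int compact_pt_index(u64 idx)
{
        return gen8_pd_index(idx, 0) / 16;
}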


* [PATCH 11/15] drm/i915/migrate: add acceleration support for DG2
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
@ 2022-02-18 18:47   ` Ramalingam C
  -1 siblings, 0 replies; 47+ messages in thread
From: Ramalingam C @ 2022-02-18 18:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Thomas Hellström, lucas.demarchi, Matthew Auld

From: Matthew Auld <matthew.auld@intel.com>

This is all kinds of awkward since we now have to contend with using 64K
GTT pages when mapping anything in LMEM (including the page-tables
themselves).

v2 (Ram):
  - Document the ppGTT layout and add a better description for the
    different windows.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 196 ++++++++++++++++++++----
 1 file changed, 164 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 18b44af56969..20444d6ceb3c 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -32,6 +32,38 @@ static bool engine_supports_migration(struct intel_engine_cs *engine)
 	return true;
 }
 
+static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
+				struct i915_page_table *pt,
+				void *data)
+{
+	struct insert_pte_data *d = data;
+
+	/*
+	 * Insert a dummy PTE into every PT that will map to LMEM to ensure
+	 * we have a correctly setup PDE structure for later use.
+	 */
+	vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM);
+	GEM_BUG_ON(!pt->is_compact);
+	d->offset += SZ_2M;
+}
+
+static void xehpsdv_insert_pte(struct i915_address_space *vm,
+			       struct i915_page_table *pt,
+			       void *data)
+{
+	struct insert_pte_data *d = data;
+
+	/*
+	 * We are playing tricks here, since the actual pt, from the hw
+	 * pov, is only 256bytes with 32 entries, or 4096bytes with 512
+	 * entries, but we are still guaranteed that the physical
+	 * alignment is 64K underneath for the pt, and we are careful
+	 * not to access the space in the void.
+	 */
+	vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM);
+	d->offset += SZ_64K;
+}
+
 static void insert_pte(struct i915_address_space *vm,
 		       struct i915_page_table *pt,
 		       void *data)
@@ -74,7 +106,32 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 	 * i.e. within the same non-preemptible window so that we do not switch
 	 * to another migration context that overwrites the PTE.
 	 *
-	 * TODO: Add support for huge LMEM PTEs
+	 * This changes quite a bit on platforms with HAS_64K_PAGES support,
+	 * where we instead have three windows, each CHUNK_SIZE in size. The
+	 * first is reserved for mapping system-memory, and that just uses the
+	 * 512 entry layout using 4K GTT pages. The other two windows just map
+	 * lmem pages and must use the new compact 32 entry layout using 64K GTT
+	 * pages, which ensures we can address any lmem object that the user
+	 * throws at us. We then also use the xehpsdv_toggle_pdes as a way of
+	 * just toggling the PDE bit(GEN12_PDE_64K) for us, to enable the
+	 * compact layout for each of these page-tables, that fall within the
+	 * [CHUNK_SIZE, 3 * CHUNK_SIZE) range.
+	 *
+	 * We lay the ppGTT out as:
+	 *
+	 * [0, CHUNK_SZ) -> first window/object, maps smem
+	 * [CHUNK_SZ, 2 * CHUNK_SZ) -> second window/object, maps lmem src
+	 * [2 * CHUNK_SZ, 3 * CHUNK_SZ) -> third window/object, maps lmem dst
+	 *
+	 * For the PTE window it's also quite different, since each PTE must
+	 * point to some 64K page, one for each PT(since it's in lmem), and yet
+	 * each is only <= 4096bytes, but since the unused space within that PTE
+	 * range is never touched, this should be fine.
+	 *
+	 * So basically each PT now needs 64K of virtual memory, instead of 4K,
+	 * which looks like:
+	 *
+	 * [3 * CHUNK_SZ, 3 * CHUNK_SZ + ((3 * CHUNK_SZ / SZ_2M) * SZ_64K)] -> PTE
 	 */
 
 	vm = i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY);
@@ -86,6 +143,9 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 		goto err_vm;
 	}
 
+	if (HAS_64K_PAGES(gt->i915))
+		stash.pt_sz = I915_GTT_PAGE_SIZE_64K;
+
 	/*
 	 * Each engine instance is assigned its own chunk in the VM, so
 	 * that we can run multiple instances concurrently
@@ -105,14 +165,20 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 		 * We copy in 8MiB chunks. Each PDE covers 2MiB, so we need
 		 * 4x2 page directories for source/destination.
 		 */
-		sz = 2 * CHUNK_SZ;
+		if (HAS_64K_PAGES(gt->i915))
+			sz = 3 * CHUNK_SZ;
+		else
+			sz = 2 * CHUNK_SZ;
 		d.offset = base + sz;
 
 		/*
 		 * We need another page directory setup so that we can write
 		 * the 8x512 PTE in each chunk.
 		 */
-		sz += (sz >> 12) * sizeof(u64);
+		if (HAS_64K_PAGES(gt->i915))
+			sz += (sz / SZ_2M) * SZ_64K;
+		else
+			sz += (sz >> 12) * sizeof(u64);
 
 		err = i915_vm_alloc_pt_stash(&vm->vm, &stash, sz);
 		if (err)
@@ -133,7 +199,18 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt)
 			goto err_vm;
 
 		/* Now allow the GPU to rewrite the PTE via its own ppGTT */
-		vm->vm.foreach(&vm->vm, base, d.offset - base, insert_pte, &d);
+		if (HAS_64K_PAGES(gt->i915)) {
+			vm->vm.foreach(&vm->vm, base, d.offset - base,
+				       xehpsdv_insert_pte, &d);
+			d.offset = base + CHUNK_SZ;
+			vm->vm.foreach(&vm->vm,
+				       d.offset,
+				       2 * CHUNK_SZ,
+				       xehpsdv_toggle_pdes, &d);
+		} else {
+			vm->vm.foreach(&vm->vm, base, d.offset - base,
+				       insert_pte, &d);
+		}
 	}
 
 	return &vm->vm;
@@ -269,19 +346,38 @@ static int emit_pte(struct i915_request *rq,
 		    u64 offset,
 		    int length)
 {
+	bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915);
 	const u64 encode = rq->context->vm->pte_encode(0, cache_level,
 						       is_lmem ? PTE_LM : 0);
 	struct intel_ring *ring = rq->ring;
-	int total = 0;
+	int pkt, dword_length;
+	u32 total = 0;
+	u32 page_size;
 	u32 *hdr, *cs;
-	int pkt;
 
 	GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
 
+	page_size = I915_GTT_PAGE_SIZE;
+	dword_length = 0x400;
+
 	/* Compute the page directory offset for the target address range */
-	offset >>= 12;
-	offset *= sizeof(u64);
-	offset += 2 * CHUNK_SZ;
+	if (has_64K_pages) {
+		GEM_BUG_ON(!IS_ALIGNED(offset, SZ_2M));
+
+		offset /= SZ_2M;
+		offset *= SZ_64K;
+		offset += 3 * CHUNK_SZ;
+
+		if (is_lmem) {
+			page_size = I915_GTT_PAGE_SIZE_64K;
+			dword_length = 0x40;
+		}
+	} else {
+		offset >>= 12;
+		offset *= sizeof(u64);
+		offset += 2 * CHUNK_SZ;
+	}
+
 	offset += (u64)rq->engine->instance << 32;
 
 	cs = intel_ring_begin(rq, 6);
@@ -289,7 +385,7 @@ static int emit_pte(struct i915_request *rq,
 		return PTR_ERR(cs);
 
 	/* Pack as many PTE updates as possible into a single MI command */
-	pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
+	pkt = min_t(int, dword_length, ring->space / sizeof(u32) + 5);
 	pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
 
 	hdr = cs;
@@ -299,6 +395,8 @@ static int emit_pte(struct i915_request *rq,
 
 	do {
 		if (cs - hdr >= pkt) {
+			int dword_rem;
+
 			*hdr += cs - hdr - 2;
 			*cs++ = MI_NOOP;
 
@@ -310,7 +408,18 @@ static int emit_pte(struct i915_request *rq,
 			if (IS_ERR(cs))
 				return PTR_ERR(cs);
 
-			pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
+			dword_rem = dword_length;
+			if (has_64K_pages) {
+				if (IS_ALIGNED(total, SZ_2M)) {
+					offset = round_up(offset, SZ_64K);
+				} else {
+					dword_rem = SZ_2M - (total & (SZ_2M - 1));
+					dword_rem /= page_size;
+					dword_rem *= 2;
+				}
+			}
+
+			pkt = min_t(int, dword_rem, ring->space / sizeof(u32) + 5);
 			pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
 
 			hdr = cs;
@@ -319,13 +428,15 @@ static int emit_pte(struct i915_request *rq,
 			*cs++ = upper_32_bits(offset);
 		}
 
+		GEM_BUG_ON(!IS_ALIGNED(it->dma, page_size));
+
 		*cs++ = lower_32_bits(encode | it->dma);
 		*cs++ = upper_32_bits(encode | it->dma);
 
 		offset += 8;
-		total += I915_GTT_PAGE_SIZE;
+		total += page_size;
 
-		it->dma += I915_GTT_PAGE_SIZE;
+		it->dma += page_size;
 		if (it->dma >= it->max) {
 			it->sg = __sg_next(it->sg);
 			if (!it->sg || sg_dma_len(it->sg) == 0)
@@ -356,7 +467,8 @@ static bool wa_1209644611_applies(int ver, u32 size)
 	return height % 4 == 3 && height <= 8;
 }
 
-static int emit_copy(struct i915_request *rq, int size)
+static int emit_copy(struct i915_request *rq,
+		     u32 dst_offset, u32 src_offset, int size)
 {
 	const int ver = GRAPHICS_VER(rq->engine->i915);
 	u32 instance = rq->engine->instance;
@@ -371,31 +483,31 @@ static int emit_copy(struct i915_request *rq, int size)
 		*cs++ = BLT_DEPTH_32 | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = dst_offset;
 		*cs++ = instance;
 		*cs++ = 0;
 		*cs++ = PAGE_SIZE;
-		*cs++ = 0; /* src offset */
+		*cs++ = src_offset;
 		*cs++ = instance;
 	} else if (ver >= 8) {
 		*cs++ = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (10 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = dst_offset;
 		*cs++ = instance;
 		*cs++ = 0;
 		*cs++ = PAGE_SIZE;
-		*cs++ = 0; /* src offset */
+		*cs++ = src_offset;
 		*cs++ = instance;
 	} else {
 		GEM_BUG_ON(instance);
 		*cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE;
-		*cs++ = CHUNK_SZ; /* dst offset */
+		*cs++ = dst_offset;
 		*cs++ = PAGE_SIZE;
-		*cs++ = 0; /* src offset */
+		*cs++ = src_offset;
 	}
 
 	intel_ring_advance(rq, cs);
@@ -423,6 +535,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 	GEM_BUG_ON(ce->ring->size < SZ_64K);
 
 	do {
+		u32 src_offset, dst_offset;
 		int len;
 
 		rq = i915_request_create(ce);
@@ -450,15 +563,28 @@ intel_context_migrate_copy(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, 0,
-			       CHUNK_SZ);
+		src_offset = 0;
+		dst_offset = CHUNK_SZ;
+		if (HAS_64K_PAGES(ce->engine->i915)) {
+			GEM_BUG_ON(!src_is_lmem && !dst_is_lmem);
+
+			src_offset = 0;
+			dst_offset = 0;
+			if (src_is_lmem)
+				src_offset = CHUNK_SZ;
+			if (dst_is_lmem)
+				dst_offset = 2 * CHUNK_SZ;
+		}
+
+		len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
+			       src_offset, CHUNK_SZ);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
 		}
 
 		err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
-			       CHUNK_SZ, len);
+			       dst_offset, len);
 		if (err < 0)
 			goto out_rq;
 		if (err < len) {
@@ -470,7 +596,7 @@ intel_context_migrate_copy(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_copy(rq, len);
+		err = emit_copy(rq, dst_offset, src_offset, len);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
@@ -488,14 +614,15 @@ intel_context_migrate_copy(struct intel_context *ce,
 	return err;
 }
 
-static int emit_clear(struct i915_request *rq, int size, u32 value)
+static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
 {
 	const int ver = GRAPHICS_VER(rq->engine->i915);
-	u32 instance = rq->engine->instance;
 	u32 *cs;
 
 	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
 
+	offset += (u64)rq->engine->instance << 32;
+
 	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
@@ -505,17 +632,17 @@ static int emit_clear(struct i915_request *rq, int size, u32 value)
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = 0; /* offset */
-		*cs++ = instance;
+		*cs++ = lower_32_bits(offset);
+		*cs++ = upper_32_bits(offset);
 		*cs++ = value;
 		*cs++ = MI_NOOP;
 	} else {
-		GEM_BUG_ON(instance);
+		GEM_BUG_ON(upper_32_bits(offset));
 		*cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
 		*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE;
 		*cs++ = 0;
 		*cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
-		*cs++ = 0;
+		*cs++ = lower_32_bits(offset);
 		*cs++ = value;
 	}
 
@@ -542,6 +669,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 	GEM_BUG_ON(ce->ring->size < SZ_64K);
 
 	do {
+		u32 offset;
 		int len;
 
 		rq = i915_request_create(ce);
@@ -569,7 +697,11 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		len = emit_pte(rq, &it, cache_level, is_lmem, 0, CHUNK_SZ);
+		offset = 0;
+		if (HAS_64K_PAGES(ce->engine->i915) && is_lmem)
+			offset = CHUNK_SZ;
+
+		len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
 		if (len <= 0) {
 			err = len;
 			goto out_rq;
@@ -579,7 +711,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_clear(rq, len, value);
+		err = emit_clear(rq, offset, len, value);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread
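
The per-engine VA budget on HAS_64K_PAGES platforms follows directly
from the layout comment in migrate_vm(): three CHUNK_SZ windows, plus
one 64K PTE-window slot for every 2M of those windows. A sketch of that
arithmetic, using the same names as the patch:

static u64 migrate_va_per_engine_64k(void)
{
        u64 sz = 3 * CHUNK_SZ; /* smem, lmem src and lmem dst windows */

        return sz + (sz / SZ_2M) * SZ_64K; /* plus the PTE window */
}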


* [Intel-gfx] [PATCH 12/15] drm/i915/uapi: document behaviour for DG2 64K support
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
@ 2022-02-18 18:47   ` Ramalingam C
  -1 siblings, 0 replies; 47+ messages in thread
From: Ramalingam C @ 2022-02-18 18:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: Thomas Hellström, Simon Ser, lucas.demarchi,
	Kenneth Graunke, Slawomir Milczarek, Pekka Paalanen,
	Matthew Auld, mesa-dev

From: Matthew Auld <matthew.auld@intel.com>

On discrete platforms like DG2, we need to support a minimum page size
of 64K when dealing with device local-memory. This is quite tricky for
various reasons, so try to document the new implicit uapi for this.

v4: Kdoc modification.
v3: fix typos and less emphasis
v2: Fixed suggestions on formatting [Daniel]

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Acked-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
cc: Simon Ser <contact@emersion.fr>
cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-dev@lists.freedesktop.org
Cc: Tony Ye <tony.ye@intel.com>
Cc: Slawomir Milczarek <slawomir.milczarek@intel.com>
---
 include/uapi/drm/i915_drm.h | 45 ++++++++++++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 914ebd9290e5..05c3642aaece 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 {
 	/**
 	 * When the EXEC_OBJECT_PINNED flag is specified this is populated by
 	 * the user with the GTT offset at which this object will be pinned.
+	 *
 	 * When the I915_EXEC_NO_RELOC flag is specified this must contain the
 	 * presumed_offset of the object.
+	 *
 	 * During execbuffer2 the kernel populates it with the value of the
 	 * current GTT offset of the object, for future presumed_offset writes.
+	 *
+	 * See struct drm_i915_gem_create_ext for the rules when dealing with
+	 * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with
+	 * minimum page sizes, like DG2.
 	 */
 	__u64 offset;
 
@@ -3144,11 +3150,40 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * The (page-aligned) allocated size for the object will be returned.
 	 *
-	 * Note that for some devices we have might have further minimum
-	 * page-size restrictions(larger than 4K), like for device local-memory.
-	 * However in general the final size here should always reflect any
-	 * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
-	 * extension to place the object in device local-memory.
+	 *
+	 * DG2 64K min page size implications:
+	 *
+	 * On discrete platforms, starting from DG2, we have to contend with GTT
+	 * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE
+	 * objects.  Specifically the hardware only supports 64K or larger GTT
+	 * page sizes for such memory. The kernel will already ensure that all
+	 * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page
+	 * sizes underneath.
+	 *
+	 * Note that the returned size here will always reflect any required
+	 * rounding up done by the kernel, i.e 4K will now become 64K on devices
+	 * such as DG2.
+	 *
+	 * Special DG2 GTT address alignment requirement:
+	 *
+	 * The GTT alignment will also need to be at least 2M for such objects.
+	 *
+	 * Note that due to how the hardware implements 64K GTT page support, we
+	 * have some further complications:
+	 *
+	 *   1) The entire PDE (which covers a 2MB virtual address range), must
+	 *   contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same
+	 *   PDE is forbidden by the hardware.
+	 *
+	 *   2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
+	 *   objects.
+	 *
+	 * To keep things simple for userland, we mandate that any GTT mappings
+	 * must be aligned to and rounded up to 2MB. The kernel will internally
+	 * pad them out to the next 2MB boundary. As this only wastes virtual
+	 * address space and avoids userland having to copy any needlessly
+	 * complicated PDE sharing scheme (coloring) and only affects DG2, this
+	 * is deemed to be a good compromise.
 	 */
 	__u64 size;
 	/**
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread
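
For userspace the documented contract reduces to rounding both object
sizes and any pinned GTT offsets up to 2M on devices with a 64K minimum
page size. A sketch of that rounding; the constant and helper name are
illustrative only, not part of the uapi:

#define DG2_GTT_ALIGN (2ull * 1024 * 1024) /* 2M, per the kdoc above */

static inline unsigned long long dg2_gtt_align(unsigned long long v)
{
        return (v + DG2_GTT_ALIGN - 1) & ~(DG2_GTT_ALIGN - 1);
}

/* e.g. exec_object2.offset = dg2_gtt_align(offset) with EXEC_OBJECT_PINNED */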


* [Intel-gfx] [PATCH 13/15] drm/i915/xehpsdv: Add has_flat_ccs to device info
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
@ 2022-02-18 18:47   ` Ramalingam C
  -1 siblings, 0 replies; 47+ messages in thread
From: Ramalingam C @ 2022-02-18 18:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: lucas.demarchi, CQ Tang, Matthew Auld

From: CQ Tang <cq.tang@intel.com>

Platforms of XeHP and beyond support 3D surface (buffer) compression and
various compression formats. This is accomplished by an additional
compression control state (CCS) stored for each surface.

Gen 12 devices (TGL family and DG1) store compression states in a separate
region of memory. It is managed by user-space and has an associated set of
user-space managed page tables used by hardware for address translation.

In Xe HP and beyond (XEHPSDV, DG2, etc.), a new feature is introduced,
i.e. Flat CCS. It replaces AUX page tables with a flat indexed region of
device memory for storing compression states.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          | 6 ++++++
 drivers/gpu/drm/i915/i915_pci.c          | 1 +
 drivers/gpu/drm/i915/intel_device_info.h | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4a3ac66e777a..1c2f4ae4ebf9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1356,6 +1356,12 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
 #define HAS_LMEM(i915) HAS_REGION(i915, REGION_LMEM)
 
+/*
+ * Platform has a dedicated compression control state for each lmem surface,
+ * stored in lmem, to support the 3D and media compression formats.
+ */
+#define HAS_FLAT_CCS(dev_priv)   (INTEL_INFO(dev_priv)->has_flat_ccs)
+
 #define HAS_GT_UC(dev_priv)	(INTEL_INFO(dev_priv)->has_gt_uc)
 
 #define HAS_POOLED_EU(dev_priv)	(INTEL_INFO(dev_priv)->has_pooled_eu)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 8df8887d76ae..f449c454b6f8 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1005,6 +1005,7 @@ static const struct intel_device_info adl_p_info = {
 	XE_HP_PAGE_SIZES, \
 	.dma_mask_size = 46, \
 	.has_64bit_reloc = 1, \
+	.has_flat_ccs = 1, \
 	.has_global_mocs = 1, \
 	.has_gt_uc = 1, \
 	.has_llc = 1, \
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index f75673da768d..2508a47fb3f5 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -134,6 +134,7 @@ enum intel_ppgtt_type {
 	func(needs_compact_pt); \
 	func(gpu_reset_clobbers_display); \
 	func(has_reset_engine); \
+	func(has_flat_ccs); \
 	func(has_global_mocs); \
 	func(has_gt_uc); \
 	func(has_guc_deprivilege); \
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread
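
For readers unfamiliar with the device-info machinery, a simplified
sketch of how the one-line flag above is plumbed through (the actual
expansion lives in intel_device_info.h; the caller below is
hypothetical):

	/* DEV_INFO_FOR_EACH_FLAG stamps out one bitfield per flag: */
	#define DEFINE_FLAG(name) u8 name:1
	struct intel_device_info {
		/* ... */
		DEV_INFO_FOR_EACH_FLAG(DEFINE_FLAG);	/* includes u8 has_flat_ccs:1 */
		/* ... */
	};
	#undef DEFINE_FLAG

	/* which lets driver code test the capability directly, e.g.: */
	if (HAS_FLAT_CCS(i915))
		lmem_size = probe_flat_ccs_base(gt);	/* hypothetical helper */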

* [PATCH 14/15] drm/i915/lmem: Enable lmem for platforms with Flat CCS
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
@ 2022-02-18 18:47   ` Ramalingam C
  -1 siblings, 0 replies; 47+ messages in thread
From: Ramalingam C @ 2022-02-18 18:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Abdiel Janulgue, lucas.demarchi, Matthew Auld

From: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>

A portion of device memory is reserved for Flat CCS, so usable
device memory is reduced by the size of the Flat CCS region. The
size of Flat CCS is specified by the XEHPSDV_FLAT_CCS_BASE_ADDR
register. So to get the effective device memory we need to subtract
the Flat CCS memory size from the total device memory.

v2:
  Addressed the small bar related issue [Matt]
  Removed a redundant check [Matt]
v3:
  reg addr def is moved to intel_gt_regs.h [Lucas]
  removed a variable
  s/DRM_ERROR/drm_err [Lucas]

Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c          | 19 +++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gt.h          |  1 +
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     |  3 +++
 drivers/gpu/drm/i915/gt/intel_region_lmem.c | 26 +++++++++++++++++++--
 4 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index e8403fa53909..2da7dd0f66d7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -913,6 +913,25 @@ u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg)
 	return intel_uncore_read_fw(gt->uncore, reg);
 }
 
+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg)
+{
+	int type;
+	u8 sliceid, subsliceid;
+
+	for (type = 0; type < NUM_STEERING_TYPES; type++) {
+		if (intel_gt_reg_needs_read_steering(gt, reg, type)) {
+			intel_gt_get_valid_steering(gt, type, &sliceid,
+						    &subsliceid);
+			return intel_uncore_read_with_mcr_steering(gt->uncore,
+								   reg,
+								   sliceid,
+								   subsliceid);
+		}
+	}
+
+	return intel_uncore_read(gt->uncore, reg);
+}
+
 void intel_gt_info_print(const struct intel_gt_info *info,
 			 struct drm_printer *p)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 2dad46c3eff2..0f571c8ee22b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -85,6 +85,7 @@ static inline bool intel_gt_needs_read_steering(struct intel_gt *gt,
 }
 
 u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg);
+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg);
 
 void intel_gt_info_print(const struct intel_gt_info *info,
 			 struct drm_printer *p);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index bf4b942c62ee..935ba793a13b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -906,6 +906,9 @@
 #define XEHP_L3NODEARBCFG			_MMIO(0xb0b4)
 #define   XEHP_LNESPARE				REG_BIT(19)
 
+#define XEHPSDV_FLAT_CCS_BASE_ADDR		_MMIO(0x4910)
+#define   XEHPSDV_CCS_BASE_SHIFT		8
+
 #define GEN8_L3SQCREG1				_MMIO(0xb100)
 /*
  * Note that on CHV the following has an off-by-one error wrt. to BSpec.
diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
index cb3f66707b21..f3f0ce2c553a 100644
--- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
+++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
@@ -12,6 +12,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gt/intel_gt.h"
+#include "gt/intel_gt_regs.h"
 
 static int init_fake_lmem_bar(struct intel_memory_region *mem)
 {
@@ -206,8 +207,29 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
 	if (!IS_DGFX(i915))
 		return ERR_PTR(-ENODEV);
 
-	/* Stolen starts from GSMBASE on DG1 */
-	lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE);
+	if (HAS_FLAT_CCS(i915)) {
+		u64 tile_stolen, flat_ccs_base;
+
+		lmem_size = pci_resource_len(pdev, 2);
+		flat_ccs_base = intel_gt_read_register(gt, XEHPSDV_FLAT_CCS_BASE_ADDR);
+		flat_ccs_base = (flat_ccs_base >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K;
+
+		if (GEM_WARN_ON(lmem_size < flat_ccs_base))
+			return ERR_PTR(-ENODEV);
+
+		tile_stolen = lmem_size - flat_ccs_base;
+
+		/* If the FLAT_CCS_BASE_ADDR register is not populated, flag an error */
+		if (tile_stolen == lmem_size)
+			drm_err(&i915->drm,
+				"CCS_BASE_ADDR register did not have expected value\n");
+
+		lmem_size -= tile_stolen;
+	} else {
+		/* Stolen starts from GSMBASE without CCS */
+		lmem_size = intel_uncore_read64(&i915->uncore, GEN12_GSMBASE);
+	}
+
 
 	io_start = pci_resource_start(pdev, 2);
 	if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread
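
As a worked example of the sizing logic above (the BAR size is
illustrative, not taken from any real SKU), assuming the top of LMEM is
reserved entirely for the 1/256th CCS metadata:

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t bar = 16ull << 30;		/* 16GB LMEM BAR */
		uint64_t ccs = bar / 256;		/* 64MB of CCS metadata */
		uint64_t flat_ccs_base = bar - ccs;	/* CCS sits at the top */

		/* setup_lmem() ends up with lmem_size == flat_ccs_base */
		printf("usable lmem: %llu MiB\n",
		       (unsigned long long)(flat_ccs_base >> 20));	/* 16320 */
		return 0;
	}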

* [PATCH 15/15] drm/i915/gt: Clear compress metadata for Xe_HP platforms
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
@ 2022-02-18 18:47   ` Ramalingam C
  -1 siblings, 0 replies; 47+ messages in thread
From: Ramalingam C @ 2022-02-18 18:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: lucas.demarchi, CQ Tang, Ayaz A Siddiqui

From: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>

Xe-HP and later devices support Flat CCS, which reserves a portion of
the device memory to store compression metadata. During the clearing of
a device memory buffer object we therefore also need to clear the
associated CCS buffer.

Flat CCS memory cannot be directly accessed by S/W.
The address of the CCS buffer associated with a main BO is automatically
calculated by the device itself. KMD/UMD can only access this buffer
indirectly, using the XY_CTRL_SURF_COPY_BLT cmd via the address of the
device memory buffer.

v2: Fixed issues with platform naming [Lucas]
v3: Rebased [Ram]
    Used the round_up funcs [Bob]
v4: Fixed ccs blk calculation [Ram]
    Added Kdoc on flat-ccs.

Cc: CQ Tang <cq.tang@intel.com>
Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  15 ++
 drivers/gpu/drm/i915/gt/intel_migrate.c      | 145 ++++++++++++++++++-
 2 files changed, 156 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index f8253012d166..166de5436c4a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -203,6 +203,21 @@
 #define GFX_OP_DRAWRECT_INFO     ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
 #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
 
+#define XY_CTRL_SURF_INSTR_SIZE	5
+#define MI_FLUSH_DW_SIZE		3
+#define XY_CTRL_SURF_COPY_BLT		((2 << 29) | (0x48 << 22) | 3)
+#define   SRC_ACCESS_TYPE_SHIFT		21
+#define   DST_ACCESS_TYPE_SHIFT		20
+#define   CCS_SIZE_SHIFT		8
+#define   XY_CTRL_SURF_MOCS_SHIFT	25
+#define   NUM_CCS_BYTES_PER_BLOCK	256
+#define   NUM_BYTES_PER_CCS_BYTE	256
+#define   NUM_CCS_BLKS_PER_XFER		1024
+#define   INDIRECT_ACCESS		0
+#define   DIRECT_ACCESS			1
+#define  MI_FLUSH_LLC			BIT(9)
+#define  MI_FLUSH_CCS			BIT(16)
+
 #define COLOR_BLT_CMD			(2 << 29 | 0x40 << 22 | (5 - 2))
 #define XY_COLOR_BLT_CMD		(2 << 29 | 0x50 << 22)
 #define SRC_COPY_BLT_CMD		(2 << 29 | 0x43 << 22)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 20444d6ceb3c..9f9cd2649377 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -16,6 +16,8 @@ struct insert_pte_data {
 };
 
 #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
+#define GET_CCS_BYTES(i915, size)	(HAS_FLAT_CCS(i915) ? \
+					 DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0)
 
 static bool engine_supports_migration(struct intel_engine_cs *engine)
 {
@@ -467,6 +469,113 @@ static bool wa_1209644611_applies(int ver, u32 size)
 	return height % 4 == 3 && height <= 8;
 }
 
+/**
+ * DOC: Flat-CCS - Memory compression for Local memory
+ *
+ * On Xe-HP and later devices, we use dedicated compression control state (CCS)
+ * stored in local memory for each surface, to support the 3D and media
+ * compression formats.
+ *
+ * The memory required for the CCS of the entire local memory is 1/256 of the
+ * local memory size. So before the kernel boot, the required memory is reserved
+ * for the CCS data and a secure register will be programmed with the CCS base
+ * address.
+ *
+ * Flat CCS data needs to be cleared when a lmem object is allocated.
+ * And CCS data can be copied in and out of CCS region through
+ * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly.
+ *
+ * When we exaust the lmem, if the object's placements support smem, then we can
+ * directly decompress the compressed lmem object into smem and start using it
+ * from smem itself.
+ *
+ * But when we need to swap out the compressed lmem object into a smem
+ * region, even though the object's placement doesn't support smem, we copy
+ * the lmem content as it is into the smem region along with the ccs data
+ * (using XY_CTRL_SURF_COPY_BLT). When the object is referenced again, the
+ * lmem content will be swapped in along with restoration of the CCS data
+ * (using XY_CTRL_SURF_COPY_BLT) at the corresponding location.
+ */
+
+static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
+{
+	/* Mask the 3 LSB to use the PPGTT address space */
+	*cmd++ = MI_FLUSH_DW | flags;
+	*cmd++ = lower_32_bits(dst);
+	*cmd++ = upper_32_bits(dst);
+
+	return cmd;
+}
+
+static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
+{
+	u32 num_cmds, num_blks, total_size;
+
+	if (!GET_CCS_BYTES(i915, size))
+		return 0;
+
+	/*
+	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
+	 * blocks. one XY_CTRL_SURF_COPY_BLT command can
+	 * trnasfer upto 1024 blocks.
+	 */
+	num_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
+				NUM_CCS_BYTES_PER_BLOCK);
+	num_cmds = DIV_ROUND_UP(num_blks, NUM_CCS_BLKS_PER_XFER);
+	total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
+
+	/*
+	 * We need to add a flush before and after
+	 * XY_CTRL_SURF_COPY_BLT
+	 */
+	total_size += 2 * MI_FLUSH_DW_SIZE;
+	return total_size;
+}
+
+static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
+				     u8 src_mem_access, u8 dst_mem_access,
+				     int src_mocs, int dst_mocs,
+				     u16 num_ccs_blocks)
+{
+	int i = num_ccs_blocks;
+
+	/*
+	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
+	 * data in and out of the CCS region.
+	 *
+	 * We can copy at most 1024 blocks of 256 bytes using one
+	 * XY_CTRL_SURF_COPY_BLT instruction.
+	 *
+	 * In case we need to copy more than 1024 blocks, we need to add
+	 * another instruction to the same batch buffer.
+	 *
+	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
+	 *
+	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
+	 */
+	do {
+		/*
+		 * We use logical AND with 1023 since the size field
+		 * takes values which is in the range of 0 - 1023
+		 */
+		*cmd++ = ((XY_CTRL_SURF_COPY_BLT) |
+			  (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
+			  (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
+			  (((i - 1) & 1023) << CCS_SIZE_SHIFT));
+		*cmd++ = lower_32_bits(src_addr);
+		*cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) |
+			  (src_mocs << XY_CTRL_SURF_MOCS_SHIFT));
+		*cmd++ = lower_32_bits(dst_addr);
+		*cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) |
+			  (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT));
+		src_addr += SZ_64M;
+		dst_addr += SZ_64M;
+		i -= NUM_CCS_BLKS_PER_XFER;
+	} while (i > 0);
+
+	return cmd;
+}
+
 static int emit_copy(struct i915_request *rq,
 		     u32 dst_offset, u32 src_offset, int size)
 {
@@ -614,16 +723,23 @@ intel_context_migrate_copy(struct intel_context *ce,
 	return err;
 }
 
-static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
+static int emit_clear(struct i915_request *rq, u64 offset, int size,
+		      u32 value, bool is_lmem)
 {
-	const int ver = GRAPHICS_VER(rq->engine->i915);
+	struct drm_i915_private *i915 = rq->engine->i915;
+	const int ver = GRAPHICS_VER(i915);
+	u32 num_ccs_blks, ccs_ring_size;
 	u32 *cs;
 
 	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
 
 	offset += (u64)rq->engine->instance << 32;
 
-	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
+	/* Clear flat css only when value is 0 */
+	ccs_ring_size = (is_lmem && !value) ?
+			 calc_ctrl_surf_instr_size(i915, size) : 0;
+
+	cs = intel_ring_begin(rq, round_up(ver >= 8 ? 8 + ccs_ring_size : 6, 2));
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
@@ -646,6 +762,27 @@ static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
 		*cs++ = value;
 	}
 
+	if (is_lmem && HAS_FLAT_CCS(i915) && !value) {
+		num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
+					    NUM_CCS_BYTES_PER_BLOCK);
+
+		/*
+		 * Flat CCS surface can only be accessed via
+		 * XY_CTRL_SURF_COPY_BLT CMD and using indirect
+		 * mapping of associated LMEM.
+		 * We can clear ccs surface by writing all 0s,
+		 * so we will flush the previously cleared buffer
+		 * and use it as a source.
+		 */
+		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
+		cs = _i915_ctrl_surf_copy_blt(cs, offset, offset,
+					      DIRECT_ACCESS, INDIRECT_ACCESS,
+					      1, 1, num_ccs_blks);
+		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
+
+		if (ccs_ring_size & 1)
+			*cs++ = MI_NOOP;
+	}
 	intel_ring_advance(rq, cs);
 	return 0;
 }
@@ -711,7 +848,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		if (err)
 			goto out_rq;
 
-		err = emit_clear(rq, offset, len, value);
+		err = emit_clear(rq, offset, len, value, is_lmem);
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread
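
To sanity-check calc_ctrl_surf_instr_size() above, a standalone
recomputation for a single 8MB migration chunk (constants copied from
the patch):

	#include <stdio.h>

	#define NUM_BYTES_PER_CCS_BYTE	256
	#define NUM_CCS_BYTES_PER_BLOCK	256
	#define NUM_CCS_BLKS_PER_XFER	1024
	#define XY_CTRL_SURF_INSTR_SIZE	5
	#define MI_FLUSH_DW_SIZE	3
	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

	int main(void)
	{
		unsigned int size = 8u << 20;	/* CHUNK_SZ */
		unsigned int ccs_bytes = DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE);
		unsigned int blks = DIV_ROUND_UP(ccs_bytes, NUM_CCS_BYTES_PER_BLOCK);
		unsigned int cmds = DIV_ROUND_UP(blks, NUM_CCS_BLKS_PER_XFER);

		/* 32KB of CCS -> 128 blocks -> 1 blit + 2 flushes = 11 dwords */
		printf("%u\n", XY_CTRL_SURF_INSTR_SIZE * cmds + 2 * MI_FLUSH_DW_SIZE);
		return 0;
	}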

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Enable DG2
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
                   ` (15 preceding siblings ...)
  (?)
@ 2022-02-18 19:09 ` Patchwork
  -1 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2022-02-18 19:09 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Enable DG2
URL   : https://patchwork.freedesktop.org/series/100419/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
ecf3a6aabe11 drm/i915/dg2: Define GuC firmware version for DG2
9d7f9f8a3467 drm/i915: Fix for PHY_MISC_TC1 offset
-:49: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'port' - possible side-effects?
#49: FILE: drivers/gpu/drm/i915/i915_reg.h:9366:
+#define DG2_PHY_MISC(port)	((port) == PHY_E ? _MMIO(_DG2_PHY_MISC_TC1) : \
+				 ICL_PHY_MISC(port))

total: 0 errors, 0 warnings, 1 checks, 20 lines checked
fa0f06c942c8 drm/i915/dg2: Drop 38.4 MHz MPLLB tables
fa9afbc030e6 drm/i915/dg2: Enable 5th port
b8a7971164dc drm/i915: add needs_compact_pt flag
34bad68c9936 drm/i915: enforce min GTT alignment for discrete cards
-:304: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#304: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:458:
+						if (offset < hole_start + aligned_size)

-:316: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#316: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:482:
+						if (offset + aligned_size > hole_end)

-:334: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#334: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:498:
+						if (offset < hole_start + aligned_size)

-:346: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#346: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:521:
+						if (offset + aligned_size > hole_end)

-:364: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#364: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:537:
+						if (offset < hole_start + aligned_size)

-:376: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#376: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:561:
+						if (offset + aligned_size > hole_end)

-:394: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#394: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:577:
+						if (offset < hole_start + aligned_size)

-:406: WARNING:DEEP_INDENTATION: Too many leading tabs - consider code refactoring
#406: FILE: drivers/gpu/drm/i915/selftests/i915_gem_gtt.c:600:
+						if (offset + aligned_size > hole_end)

total: 0 errors, 8 warnings, 0 checks, 438 lines checked
2a36c620cd9c drm/i915: support 64K GTT pages for discrete cards
fcbf93599a67 drm/i915: add gtt misalignment test
c23f3ce9db9e drm/i915/gtt: allow overriding the pt alignment
720fb0c9e82d drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
4f1f68863bfa drm/i915/migrate: add acceleration support for DG2
ce72b8e0634a drm/i915/uapi: document behaviour for DG2 64K support
eb2aa0aafb5a drm/i915/xehpsdv: Add has_flat_ccs to device info
f94b70011b2d drm/i915/lmem: Enable lmem for platforms with Flat CCS
96e999374594 drm/i915/gt: Clear compress metadata for Xe_HP platforms



^ permalink raw reply	[flat|nested] 47+ messages in thread
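
For context on the MACRO_ARG_REUSE check above: 'port' is evaluated
twice in DG2_PHY_MISC(), once in the comparison and once in
ICL_PHY_MISC(), so a caller passing an expression with side effects
would evaluate it twice (next_port() here is purely illustrative):

	/* next_port() would run twice here: */
	u32 val = intel_de_read(i915, DG2_PHY_MISC(next_port()));

As long as callers only pass plain variables, the reuse is harmless.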

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: Enable DG2
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
                   ` (16 preceding siblings ...)
  (?)
@ 2022-02-18 19:10 ` Patchwork
  -1 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2022-02-18 19:10 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Enable DG2
URL   : https://patchwork.freedesktop.org/series/100419/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 01/15] drm/i915/dg2: Define GuC firmware version for DG2
  2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
@ 2022-02-18 19:14     ` Ceraolo Spurio, Daniele
  -1 siblings, 0 replies; 47+ messages in thread
From: Ceraolo Spurio, Daniele @ 2022-02-18 19:14 UTC (permalink / raw)
  To: Ramalingam C, intel-gfx, dri-devel
  Cc: Tomasz Mistat, lucas.demarchi, John Harrison



On 2/18/2022 10:47 AM, Ramalingam C wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> First release of GuC for DG2.
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> CC: Tomasz Mistat <tomasz.mistat@intel.com>
> CC: Ramalingam C <ramalingam.c@intel.com>
> CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Daniele

> ---
>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> index c88113044494..55512db29183 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
> @@ -52,6 +52,7 @@ void intel_uc_fw_change_status(struct intel_uc_fw *uc_fw,
>    * firmware as TGL.
>    */
>   #define INTEL_GUC_FIRMWARE_DEFS(fw_def, guc_def) \
> +	fw_def(DG2,          0, guc_def(dg2,  69, 0, 3)) \
>   	fw_def(ALDERLAKE_P,  0, guc_def(adlp, 69, 0, 3)) \
>   	fw_def(ALDERLAKE_S,  0, guc_def(tgl,  69, 0, 3)) \
>   	fw_def(DG1,          0, guc_def(dg1,  69, 0, 3)) \


^ permalink raw reply	[flat|nested] 47+ messages in thread
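
For context, each fw_def/guc_def entry in INTEL_GUC_FIRMWARE_DEFS
expands into a firmware blob path; a simplified sketch of the expansion
for the new DG2 entry, based on the path helpers in intel_uc_fw.c:

	/* roughly: guc_def(dg2, 69, 0, 3) -> "i915/dg2_guc_69.0.3.bin" */
	#define __MAKE_UC_FW_PATH(prefix_, name_, major_, minor_, patch_) \
		"i915/" \
		__stringify(prefix_) name_ \
		__stringify(major_) "." \
		__stringify(minor_) "." \
		__stringify(patch_) ".bin"

	#define MAKE_GUC_FW_PATH(prefix_, major_, minor_, patch_) \
		__MAKE_UC_FW_PATH(prefix_, "_guc_", major_, minor_, patch_)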

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Enable DG2
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
                   ` (17 preceding siblings ...)
  (?)
@ 2022-02-18 19:38 ` Patchwork
  -1 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2022-02-18 19:38 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx


== Series Details ==

Series: drm/i915: Enable DG2
URL   : https://patchwork.freedesktop.org/series/100419/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11250 -> Patchwork_22333
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/index.html

Participating hosts (45 -> 41)
------------------------------

  Additional (2): fi-apl-guc fi-kbl-8809g 
  Missing    (6): shard-tglu bat-dg2-8 fi-bsw-cyan fi-pnv-d510 shard-rkl shard-dg1 

Known issues
------------

  Here are the changes found in Patchwork_22333 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@debugfs_test@read_all_entries:
    - fi-apl-guc:         NOTRUN -> [DMESG-WARN][1] ([i915#1610])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-apl-guc/igt@debugfs_test@read_all_entries.html

  * igt@gem_exec_suspend@basic-s0@smem:
    - fi-kbl-8809g:       NOTRUN -> [DMESG-WARN][2] ([i915#4962]) +1 similar issue
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-kbl-8809g/igt@gem_exec_suspend@basic-s0@smem.html

  * igt@gem_exec_suspend@basic-s3:
    - fi-skl-6600u:       NOTRUN -> [INCOMPLETE][3] ([i915#4547])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-skl-6600u/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_huc_copy@huc-copy:
    - fi-kbl-8809g:       NOTRUN -> [SKIP][4] ([fdo#109271] / [i915#2190])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-kbl-8809g/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@random-engines:
    - fi-kbl-8809g:       NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#4613]) +3 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-kbl-8809g/igt@gem_lmem_swapping@random-engines.html

  * igt@i915_pm_rpm@basic-rte:
    - fi-kbl-8809g:       NOTRUN -> [SKIP][6] ([fdo#109271]) +54 similar issues
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-kbl-8809g/igt@i915_pm_rpm@basic-rte.html

  * igt@i915_selftest@live@requests:
    - fi-blb-e6850:       [PASS][7] -> [DMESG-FAIL][8] ([i915#5026])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/fi-blb-e6850/igt@i915_selftest@live@requests.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-blb-e6850/igt@i915_selftest@live@requests.html

  * igt@kms_chamelium@hdmi-edid-read:
    - fi-kbl-8809g:       NOTRUN -> [SKIP][9] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-kbl-8809g/igt@kms_chamelium@hdmi-edid-read.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cml-u2:          [PASS][10] -> [DMESG-WARN][11] ([i915#4269])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html
    - fi-cfl-8109u:       [PASS][12] -> [DMESG-FAIL][13] ([i915#295])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/fi-cfl-8109u/igt@kms_frontbuffer_tracking@basic.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-cfl-8109u/igt@kms_frontbuffer_tracking@basic.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - fi-kbl-8809g:       NOTRUN -> [SKIP][14] ([fdo#109271] / [i915#533])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-kbl-8809g/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-b:
    - fi-cfl-8109u:       [PASS][15] -> [DMESG-WARN][16] ([i915#295]) +10 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/fi-cfl-8109u/igt@kms_pipe_crc_basic@read-crc-pipe-b.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-cfl-8109u/igt@kms_pipe_crc_basic@read-crc-pipe-b.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - bat-dg1-5:          [PASS][17] -> [FAIL][18] ([fdo#103375])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/bat-dg1-5/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/bat-dg1-5/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

  * igt@runner@aborted:
    - fi-blb-e6850:       NOTRUN -> [FAIL][19] ([fdo#109271] / [i915#2403] / [i915#2426] / [i915#4312])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-blb-e6850/igt@runner@aborted.html
    - fi-apl-guc:         NOTRUN -> [FAIL][20] ([i915#2426] / [i915#4312])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/fi-apl-guc/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s3@smem:
    - {bat-rpls-1}:       [INCOMPLETE][21] ([i915#4898]) -> [PASS][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/bat-rpls-1/igt@gem_exec_suspend@basic-s3@smem.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/bat-rpls-1/igt@gem_exec_suspend@basic-s3@smem.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109308]: https://bugs.freedesktop.org/show_bug.cgi?id=109308
  [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1155]: https://gitlab.freedesktop.org/drm/intel/issues/1155
  [i915#1610]: https://gitlab.freedesktop.org/drm/intel/issues/1610
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2403]: https://gitlab.freedesktop.org/drm/intel/issues/2403
  [i915#2426]: https://gitlab.freedesktop.org/drm/intel/issues/2426
  [i915#295]: https://gitlab.freedesktop.org/drm/intel/issues/295
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4269]: https://gitlab.freedesktop.org/drm/intel/issues/4269
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4391]: https://gitlab.freedesktop.org/drm/intel/issues/4391
  [i915#4547]: https://gitlab.freedesktop.org/drm/intel/issues/4547
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4898]: https://gitlab.freedesktop.org/drm/intel/issues/4898
  [i915#4962]: https://gitlab.freedesktop.org/drm/intel/issues/4962
  [i915#5026]: https://gitlab.freedesktop.org/drm/intel/issues/5026
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533


Build changes
-------------

  * Linux: CI_DRM_11250 -> Patchwork_22333

  CI-20190529: 20190529
  CI_DRM_11250: 29edaa3f0f50b56967b046309ab09e0b51c8bda3 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6348: 9cb64a757d2ff1e180b1648e611439d94afd697d @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_22333: 96e999374594d5a56be090c063493d0380955ca1 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

96e999374594 drm/i915/gt: Clear compress metadata for Xe_HP platforms
f94b70011b2d drm/i915/lmem: Enable lmem for platforms with Flat CCS
eb2aa0aafb5a drm/i915/xehpsdv: Add has_flat_ccs to device info
ce72b8e0634a drm/i915/uapi: document behaviour for DG2 64K support
4f1f68863bfa drm/i915/migrate: add acceleration support for DG2
720fb0c9e82d drm/i915/gtt: add xehpsdv_ppgtt_insert_entry
c23f3ce9db9e drm/i915/gtt: allow overriding the pt alignment
fcbf93599a67 drm/i915: add gtt misalignment test
2a36c620cd9c drm/i915: support 64K GTT pages for discrete cards
34bad68c9936 drm/i915: enforce min GTT alignment for discrete cards
b8a7971164dc drm/i915: add needs_compact_pt flag
fa9afbc030e6 drm/i915/dg2: Enable 5th port
fa0f06c942c8 drm/i915/dg2: Drop 38.4 MHz MPLLB tables
9d7f9f8a3467 drm/i915: Fix for PHY_MISC_TC1 offset
ecf3a6aabe11 drm/i915/dg2: Define GuC firmware version for DG2

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/index.html


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 15/15] drm/i915/gt: Clear compress metadata for Xe_HP platforms
  2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
@ 2022-02-19  1:47     ` Matt Roper
  -1 siblings, 0 replies; 47+ messages in thread
From: Matt Roper @ 2022-02-19  1:47 UTC (permalink / raw)
  To: Ramalingam C
  Cc: intel-gfx, lucas.demarchi, CQ Tang, dri-devel, Ayaz A Siddiqui

On Sat, Feb 19, 2022 at 12:17:52AM +0530, Ramalingam C wrote:
> From: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> 
> Xe-HP and later devices support Flat CCS, which reserves a portion of
> the device memory to store compression metadata. During the clearing of
> a device memory buffer object we therefore also need to clear the
> associated CCS buffer.
> 
> Flat CCS memory cannot be directly accessed by S/W.
> The address of the CCS buffer associated with a main BO is automatically
> calculated by the device itself. KMD/UMD can only access this buffer
> indirectly, using the XY_CTRL_SURF_COPY_BLT cmd via the address of the
> device memory buffer.
> 
> v2: Fixed issues with platform naming [Lucas]
> v3: Rebased [Ram]
>     Used the round_up funcs [Bob]
> v4: Fixed ccs blk calculation [Ram]
>     Added Kdoc on flat-ccs.
> 
> Cc: CQ Tang <cq.tang@intel.com>
> Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  15 ++
>  drivers/gpu/drm/i915/gt/intel_migrate.c      | 145 ++++++++++++++++++-
>  2 files changed, 156 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index f8253012d166..166de5436c4a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -203,6 +203,21 @@
>  #define GFX_OP_DRAWRECT_INFO     ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
>  #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
>  
> +#define XY_CTRL_SURF_INSTR_SIZE	5
> +#define MI_FLUSH_DW_SIZE		3
> +#define XY_CTRL_SURF_COPY_BLT		((2 << 29) | (0x48 << 22) | 3)
> +#define   SRC_ACCESS_TYPE_SHIFT		21
> +#define   DST_ACCESS_TYPE_SHIFT		20
> +#define   CCS_SIZE_SHIFT		8

Rather than using a shift, it might be better to just define the
bitfield.  E.g.,

        #define CCS_SIZE        GENMASK(17, 8)

and then later

        FIELD_PREP(CCS_SIZE, i - 1)

to refer to the proper value.

> +#define   XY_CTRL_SURF_MOCS_SHIFT	25

Same here; we can use GENMASK(31, 25) to define the field.

> +#define   NUM_CCS_BYTES_PER_BLOCK	256
> +#define   NUM_BYTES_PER_CCS_BYTE	256
> +#define   NUM_CCS_BLKS_PER_XFER		1024
> +#define   INDIRECT_ACCESS		0
> +#define   DIRECT_ACCESS			1
> +#define  MI_FLUSH_LLC			BIT(9)
> +#define  MI_FLUSH_CCS			BIT(16)
> +
>  #define COLOR_BLT_CMD			(2 << 29 | 0x40 << 22 | (5 - 2))
>  #define XY_COLOR_BLT_CMD		(2 << 29 | 0x50 << 22)
>  #define SRC_COPY_BLT_CMD		(2 << 29 | 0x43 << 22)
> diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> index 20444d6ceb3c..9f9cd2649377 100644
> --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> @@ -16,6 +16,8 @@ struct insert_pte_data {
>  };
>  
>  #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
> +#define GET_CCS_BYTES(i915, size)	(HAS_FLAT_CCS(i915) ? \
> +					 DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0)
>  
>  static bool engine_supports_migration(struct intel_engine_cs *engine)
>  {
> @@ -467,6 +469,113 @@ static bool wa_1209644611_applies(int ver, u32 size)
>  	return height % 4 == 3 && height <= 8;
>  }
>  
> +/**
> + * DOC: Flat-CCS - Memory compression for Local memory
> + *
> + * On Xe-HP and later devices, we use dedicated compression control state (CCS)
> + * stored in local memory for each surface, to support the 3D and media
> + * compression formats.
> + *
> + * The memory required for the CCS of the entire local memory is 1/256 of the
> + * local memory size. So before the kernel boot, the required memory is reserved
> + * for the CCS data and a secure register will be programmed with the CCS base
> + * address.
> + *
> + * Flat CCS data needs to be cleared when a lmem object is allocated.
> + * And CCS data can be copied in and out of CCS region through
> + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly.
> + *
> + * When we exaust the lmem, if the object's placements support smem, then we can

Typo: exhaust

> + * directly decompress the compressed lmem object into smem and start using it
> + * from smem itself.
> + *
> + * But when we need to swap out the compressed lmem object into a smem
> + * region, even though the object's placement doesn't support smem, we copy
> + * the lmem content as it is into the smem region along with the ccs data
> + * (using XY_CTRL_SURF_COPY_BLT). When the object is referenced again, the
> + * lmem content will be swapped in along with restoration of the CCS data
> + * (using XY_CTRL_SURF_COPY_BLT) at the corresponding location.
> + */
> +
> +static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
> +{
> +	/* Mask the 3 LSB to use the PPGTT address space */

This comment implies that we'd be doing something like

        *cmd++ = lower_32_bits(dst) & GENMASK(31, 3);

but that doesn't seem to be the case.  The bspec does say the address
must be qword-aligned, so maybe we should just have a drm_WARN_ON() if
we get passed something that isn't aligned properly?

> +	*cmd++ = MI_FLUSH_DW | flags;
> +	*cmd++ = lower_32_bits(dst);
> +	*cmd++ = upper_32_bits(dst);
> +
> +	return cmd;
> +}
> +
> +static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
> +{
> +	u32 num_cmds, num_blks, total_size;
> +
> +	if (!GET_CCS_BYTES(i915, size))
> +		return 0;
> +
> +	/*
> +	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
> +	 * blocks. one XY_CTRL_SURF_COPY_BLT command can
> +	 * trnasfer upto 1024 blocks.

Typo: transfer.

> +	 */
> +	num_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> +				NUM_CCS_BYTES_PER_BLOCK);
> +	num_cmds = DIV_ROUND_UP(num_blks, NUM_CCS_BLKS_PER_XFER);
> +	total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
> +
> +	/*
> +	 * We need to add a flush before and after
> +	 * XY_CTRL_SURF_COPY_BLT

Do you have a bspec reference for this?  It sounds reasonable, but I
wanted to confirm with the bspec that we're programming the flush the
way it wants us to.

> +	 */
> +	total_size += 2 * MI_FLUSH_DW_SIZE;
> +	return total_size;
> +}
> +
> +static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
> +				     u8 src_mem_access, u8 dst_mem_access,
> +				     int src_mocs, int dst_mocs,
> +				     u16 num_ccs_blocks)
> +{
> +	int i = num_ccs_blocks;
> +
> +	/*
> +	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
> +	 * data in and out of the CCS region.
> +	 *
> +	 * We can copy at most 1024 blocks of 256 bytes using one
> +	 * XY_CTRL_SURF_COPY_BLT instruction.
> +	 *
> +	 * In case we need to copy more than 1024 blocks, we need to add
> +	 * another instruction to the same batch buffer.
> +	 *
> +	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
> +	 *
> +	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
> +	 */
> +	do {
> +		/*
> +		 * We use logical AND with 1023 since the size field
> +		 * takes values which is in the range of 0 - 1023

I think you mean 'bitwise AND' here?  A logical AND would be '&&' which
isn't what you want.

> +		 */
> +		*cmd++ = ((XY_CTRL_SURF_COPY_BLT) |
> +			  (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
> +			  (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
> +			  (((i - 1) & 1023) << CCS_SIZE_SHIFT));
> +		*cmd++ = lower_32_bits(src_addr);
> +		*cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) |
> +			  (src_mocs << XY_CTRL_SURF_MOCS_SHIFT));
> +		*cmd++ = lower_32_bits(dst_addr);
> +		*cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) |
> +			  (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT));
> +		src_addr += SZ_64M;
> +		dst_addr += SZ_64M;
> +		i -= NUM_CCS_BLKS_PER_XFER;
> +	} while (i > 0);
> +
> +	return cmd;
> +}
> +
>  static int emit_copy(struct i915_request *rq,
>  		     u32 dst_offset, u32 src_offset, int size)
>  {
> @@ -614,16 +723,23 @@ intel_context_migrate_copy(struct intel_context *ce,
>  	return err;
>  }
>  
> -static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
> +static int emit_clear(struct i915_request *rq, u64 offset, int size,
> +		      u32 value, bool is_lmem)
>  {
> -	const int ver = GRAPHICS_VER(rq->engine->i915);
> +	struct drm_i915_private *i915 = rq->engine->i915;
> +	const int ver = GRAPHICS_VER(i915);
> +	u32 num_ccs_blks, ccs_ring_size;
>  	u32 *cs;
>  
>  	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
>  
>  	offset += (u64)rq->engine->instance << 32;
>  
> -	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
> +	/* Clear flat css only when value is 0 */

Typo: ccs

> +	ccs_ring_size = (is_lmem && !value) ?
> +			 calc_ctrl_surf_instr_size(i915, size) : 0;
> +
> +	cs = intel_ring_begin(rq, round_up(ver >= 8 ? 8 + ccs_ring_size : 6, 2));
>  	if (IS_ERR(cs))
>  		return PTR_ERR(cs);
>  
> @@ -646,6 +762,27 @@ static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
>  		*cs++ = value;
>  	}
>  
> +	if (is_lmem && HAS_FLAT_CCS(i915) && !value) {
> +		num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> +					    NUM_CCS_BYTES_PER_BLOCK);
> +
> +		/*
> +		 * Flat CCS surface can only be accessed via
> +		 * XY_CTRL_SURF_COPY_BLT CMD and using indirect
> +		 * mapping of associated LMEM.
> +		 * We can clear ccs surface by writing all 0s,
> +		 * so we will flush the previously cleared buffer
> +		 * and use it as a source.
> +		 */
> +		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
> +		cs = _i915_ctrl_surf_copy_blt(cs, offset, offset,
> +					      DIRECT_ACCESS, INDIRECT_ACCESS,
> +					      1, 1, num_ccs_blks);

The magic number '1' used for MOCS here doesn't look right.  The proper
MOCS entry is probably going to vary from platform to platform.  Bspec
47980 says it should be UC with GO:Memory, so I think that would be
index 2 for DG2 and Xe_HP SDV.  Since MOCS values are (index << 1), that
would mean we'd need to program a value of "4" here if I'm reading
the description correctly.

Right now we have mocs->uc_index pointing to the uncached entry with
GO:L3.  But from a quick skim, I think the only places we're using
that value are the programming of BLIT_CCTL (bspec 45807) and
RING_CMD_CCTL (bspec 45826), both of which are supposed to be using
GO:Memory instead of GO:L3.  So maybe we should fix the uc_index value
for those platforms and then use "rq->engine->gt->mocs.uc_index << 1"
here.  Might be worth renaming the field to "uc_index_gomemory" just to
make it more explicit what it's representing to prevent mistakes during
enablement of future platforms.
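
I.e., the call site would end up something like (sketch, assuming the
rename above; the << 1 is the usual MOCS index encoding):

        u32 mocs = rq->engine->gt->mocs.uc_index_gomemory << 1;

        cs = _i915_ctrl_surf_copy_blt(cs, offset, offset,
                                      DIRECT_ACCESS, INDIRECT_ACCESS,
                                      mocs, mocs, num_ccs_blks);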


Matt

> +		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
> +
> +		if (ccs_ring_size & 1)
> +			*cs++ = MI_NOOP;
> +	}
>  	intel_ring_advance(rq, cs);
>  	return 0;
>  }
> @@ -711,7 +848,7 @@ intel_context_migrate_clear(struct intel_context *ce,
>  		if (err)
>  			goto out_rq;
>  
> -		err = emit_clear(rq, offset, len, value);
> +		err = emit_clear(rq, offset, len, value, is_lmem);
>  
>  		/* Arbitration is re-enabled between requests. */
>  out_rq:
> -- 
> 2.20.1
> 

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation
(916) 356-2795

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915: Enable DG2
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
                   ` (18 preceding siblings ...)
  (?)
@ 2022-02-19 11:53 ` Patchwork
  -1 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2022-02-19 11:53 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 30244 bytes --]

== Series Details ==

Series: drm/i915: Enable DG2
URL   : https://patchwork.freedesktop.org/series/100419/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11250_full -> Patchwork_22333_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (13 -> 13)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_22333_full:

### IGT changes ###

#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@gem_exec_schedule@smoketest-all:
    - {shard-rkl}:        NOTRUN -> [INCOMPLETE][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-rkl-5/igt@gem_exec_schedule@smoketest-all.html

  * igt@gem_pxp@verify-pxp-execution-after-suspend-resume:
    - {shard-rkl}:        [SKIP][2] ([i915#4270]) -> [INCOMPLETE][3]
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-rkl-5/igt@gem_pxp@verify-pxp-execution-after-suspend-resume.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-rkl-5/igt@gem_pxp@verify-pxp-execution-after-suspend-resume.html

  * igt@perf_pmu@module-unload:
    - {shard-rkl}:        [PASS][4] -> [FAIL][5]
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-rkl-4/igt@perf_pmu@module-unload.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-rkl-6/igt@perf_pmu@module-unload.html

  
Known issues
------------

  Here are the changes found in Patchwork_22333_full that come from known issues:

### CI changes ###

#### Possible fixes ####

  * boot:
    - shard-snb:          ([PASS][6], [PASS][7], [PASS][8], [PASS][9], [PASS][10], [PASS][11], [PASS][12], [PASS][13], [PASS][14], [PASS][15], [PASS][16], [PASS][17], [PASS][18], [PASS][19], [PASS][20], [PASS][21], [FAIL][22], [PASS][23], [PASS][24], [PASS][25], [PASS][26], [PASS][27], [PASS][28], [PASS][29], [PASS][30]) ([i915#4338]) -> ([PASS][31], [PASS][32], [PASS][33], [PASS][34], [PASS][35], [PASS][36], [PASS][37], [PASS][38], [PASS][39], [PASS][40], [PASS][41], [PASS][42], [PASS][43], [PASS][44], [PASS][45], [PASS][46], [PASS][47], [PASS][48], [PASS][49], [PASS][50], [PASS][51], [PASS][52], [PASS][53], [PASS][54], [PASS][55])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb2/boot.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb7/boot.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb7/boot.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb7/boot.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb7/boot.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb2/boot.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb7/boot.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb6/boot.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb6/boot.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb6/boot.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb6/boot.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb5/boot.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb5/boot.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb5/boot.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb5/boot.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb5/boot.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb5/boot.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb5/boot.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb4/boot.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb4/boot.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb4/boot.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb4/boot.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb4/boot.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb2/boot.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb2/boot.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb7/boot.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb7/boot.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb7/boot.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb7/boot.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb7/boot.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb6/boot.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb6/boot.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb6/boot.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb6/boot.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb6/boot.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb6/boot.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb5/boot.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb5/boot.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb5/boot.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb5/boot.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb4/boot.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb4/boot.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb4/boot.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb4/boot.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb4/boot.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb2/boot.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb2/boot.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb2/boot.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb2/boot.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb2/boot.html
    - shard-glk:          ([PASS][56], [PASS][57], [FAIL][58], [PASS][59], [PASS][60], [PASS][61], [PASS][62], [PASS][63], [PASS][64], [PASS][65], [PASS][66], [PASS][67], [PASS][68], [PASS][69], [PASS][70], [PASS][71], [PASS][72], [PASS][73], [PASS][74], [PASS][75], [PASS][76], [PASS][77], [PASS][78], [PASS][79], [PASS][80]) ([i915#4392]) -> ([PASS][81], [PASS][82], [PASS][83], [PASS][84], [PASS][85], [PASS][86], [PASS][87], [PASS][88], [PASS][89], [PASS][90], [PASS][91], [PASS][92], [PASS][93], [PASS][94], [PASS][95], [PASS][96], [PASS][97], [PASS][98], [PASS][99], [PASS][100], [PASS][101], [PASS][102], [PASS][103], [PASS][104], [PASS][105])
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk1/boot.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk1/boot.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk2/boot.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk2/boot.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk2/boot.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk2/boot.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk3/boot.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk3/boot.html
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk3/boot.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk4/boot.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk4/boot.html
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk4/boot.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk5/boot.html
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk5/boot.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk6/boot.html
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk6/boot.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk7/boot.html
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk7/boot.html
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk7/boot.html
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk8/boot.html
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk8/boot.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk8/boot.html
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk9/boot.html
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk9/boot.html
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk9/boot.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk9/boot.html
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk9/boot.html
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk9/boot.html
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk8/boot.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk8/boot.html
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk8/boot.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk7/boot.html
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk7/boot.html
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk7/boot.html
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk6/boot.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk6/boot.html
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk6/boot.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk5/boot.html
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk5/boot.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk4/boot.html
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk4/boot.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk4/boot.html
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk3/boot.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk3/boot.html
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk3/boot.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk2/boot.html
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk2/boot.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk1/boot.html
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk1/boot.html
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk1/boot.html

  

### IGT changes ###

#### Issues hit ####

  * igt@feature_discovery@display-3x:
    - shard-iclb:         NOTRUN -> [SKIP][106] ([i915#1839])
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb8/igt@feature_discovery@display-3x.html

  * igt@gem_create@create-massive:
    - shard-iclb:         NOTRUN -> [DMESG-WARN][107] ([i915#4991])
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb8/igt@gem_create@create-massive.html
    - shard-kbl:          NOTRUN -> [DMESG-WARN][108] ([i915#4991])
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-kbl4/igt@gem_create@create-massive.html

  * igt@gem_ctx_persistence@legacy-engines-persistence:
    - shard-snb:          NOTRUN -> [SKIP][109] ([fdo#109271] / [i915#1099]) +1 similar issue
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb4/igt@gem_ctx_persistence@legacy-engines-persistence.html

  * igt@gem_eio@in-flight-immediate:
    - shard-tglb:         [PASS][110] -> [TIMEOUT][111] ([i915#3063])
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-tglb5/igt@gem_eio@in-flight-immediate.html
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-tglb1/igt@gem_eio@in-flight-immediate.html

  * igt@gem_eio@kms:
    - shard-tglb:         [PASS][112] -> [FAIL][113] ([i915#232])
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-tglb6/igt@gem_eio@kms.html
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-tglb5/igt@gem_eio@kms.html

  * igt@gem_exec_balancer@parallel-balancer:
    - shard-iclb:         [PASS][114] -> [SKIP][115] ([i915#4525])
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-iclb4/igt@gem_exec_balancer@parallel-balancer.html
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb6/igt@gem_exec_balancer@parallel-balancer.html

  * igt@gem_exec_balancer@parallel-keep-in-fence:
    - shard-kbl:          NOTRUN -> [DMESG-WARN][116] ([i915#5076])
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-kbl7/igt@gem_exec_balancer@parallel-keep-in-fence.html

  * igt@gem_exec_capture@pi@vcs0:
    - shard-skl:          NOTRUN -> [INCOMPLETE][117] ([i915#4547])
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-skl10/igt@gem_exec_capture@pi@vcs0.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-skl:          NOTRUN -> [FAIL][118] ([i915#2846])
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-skl3/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-none-vip@rcs0:
    - shard-kbl:          [PASS][119] -> [FAIL][120] ([i915#2842]) +1 similar issue
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-kbl7/igt@gem_exec_fair@basic-none-vip@rcs0.html
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-kbl4/igt@gem_exec_fair@basic-none-vip@rcs0.html

  * igt@gem_exec_fair@basic-pace@vcs0:
    - shard-iclb:         [PASS][121] -> [FAIL][122] ([i915#2842]) +1 similar issue
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-iclb8/igt@gem_exec_fair@basic-pace@vcs0.html
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb2/igt@gem_exec_fair@basic-pace@vcs0.html

  * igt@gem_exec_fair@basic-pace@vcs1:
    - shard-iclb:         NOTRUN -> [FAIL][123] ([i915#2842]) +1 similar issue
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb2/igt@gem_exec_fair@basic-pace@vcs1.html

  * igt@gem_exec_whisper@basic-fds-priority:
    - shard-glk:          [PASS][124] -> [DMESG-WARN][125] ([i915#118]) +1 similar issue
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk4/igt@gem_exec_whisper@basic-fds-priority.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk1/igt@gem_exec_whisper@basic-fds-priority.html

  * igt@gem_lmem_swapping@heavy-random:
    - shard-glk:          NOTRUN -> [SKIP][126] ([fdo#109271] / [i915#4613])
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk4/igt@gem_lmem_swapping@heavy-random.html

  * igt@gem_lmem_swapping@parallel-random-engines:
    - shard-iclb:         NOTRUN -> [SKIP][127] ([i915#4613]) +1 similar issue
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb5/igt@gem_lmem_swapping@parallel-random-engines.html

  * igt@gem_pwrite@basic-exhaustion:
    - shard-apl:          NOTRUN -> [WARN][128] ([i915#2658])
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-apl8/igt@gem_pwrite@basic-exhaustion.html

  * igt@gem_pxp@reject-modify-context-protection-off-3:
    - shard-iclb:         NOTRUN -> [SKIP][129] ([i915#4270]) +4 similar issues
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb8/igt@gem_pxp@reject-modify-context-protection-off-3.html

  * igt@gem_render_copy@y-tiled-ccs-to-y-tiled-mc-ccs:
    - shard-iclb:         NOTRUN -> [SKIP][130] ([i915#768]) +1 similar issue
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb8/igt@gem_render_copy@y-tiled-ccs-to-y-tiled-mc-ccs.html

  * igt@gem_userptr_blits@dmabuf-sync:
    - shard-kbl:          NOTRUN -> [SKIP][131] ([fdo#109271] / [i915#3323])
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-kbl4/igt@gem_userptr_blits@dmabuf-sync.html
    - shard-iclb:         NOTRUN -> [SKIP][132] ([i915#3323])
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb8/igt@gem_userptr_blits@dmabuf-sync.html

  * igt@gem_userptr_blits@unsync-unmap-after-close:
    - shard-iclb:         NOTRUN -> [SKIP][133] ([i915#3297])
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb8/igt@gem_userptr_blits@unsync-unmap-after-close.html

  * igt@gen9_exec_parse@allowed-all:
    - shard-kbl:          [PASS][134] -> [DMESG-WARN][135] ([i915#1436] / [i915#716])
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-kbl7/igt@gen9_exec_parse@allowed-all.html
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-kbl1/igt@gen9_exec_parse@allowed-all.html

  * igt@gen9_exec_parse@unaligned-jump:
    - shard-iclb:         NOTRUN -> [SKIP][136] ([i915#2856]) +2 similar issues
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb8/igt@gen9_exec_parse@unaligned-jump.html

  * igt@i915_pm_dc@dc6-dpms:
    - shard-iclb:         [PASS][137] -> [FAIL][138] ([i915#454]) +1 similar issue
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-iclb8/igt@i915_pm_dc@dc6-dpms.html
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb3/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_dc@dc9-dpms:
    - shard-iclb:         [PASS][139] -> [SKIP][140] ([i915#4281])
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-iclb8/igt@i915_pm_dc@dc9-dpms.html
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb3/igt@i915_pm_dc@dc9-dpms.html

  * igt@i915_pm_rpm@dpms-non-lpsp:
    - shard-iclb:         NOTRUN -> [SKIP][141] ([fdo#110892])
   [141]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb8/igt@i915_pm_rpm@dpms-non-lpsp.html

  * igt@i915_pm_rpm@modeset-lpsp-stress:
    - shard-apl:          NOTRUN -> [SKIP][142] ([fdo#109271]) +20 similar issues
   [142]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-apl2/igt@i915_pm_rpm@modeset-lpsp-stress.html

  * igt@i915_suspend@sysfs-reader:
    - shard-apl:          [PASS][143] -> [DMESG-WARN][144] ([i915#180]) +3 similar issues
   [143]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-apl4/igt@i915_suspend@sysfs-reader.html
   [144]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-apl2/igt@i915_suspend@sysfs-reader.html

  * igt@kms_atomic_transition@plane-all-modeset-transition-fencing:
    - shard-iclb:         NOTRUN -> [SKIP][145] ([i915#1769])
   [145]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb3/igt@kms_atomic_transition@plane-all-modeset-transition-fencing.html

  * igt@kms_big_fb@linear-32bpp-rotate-0:
    - shard-glk:          NOTRUN -> [DMESG-WARN][146] ([i915#118])
   [146]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk4/igt@kms_big_fb@linear-32bpp-rotate-0.html

  * igt@kms_big_fb@linear-64bpp-rotate-270:
    - shard-iclb:         NOTRUN -> [SKIP][147] ([fdo#110725] / [fdo#111614])
   [147]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb5/igt@kms_big_fb@linear-64bpp-rotate-270.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip:
    - shard-apl:          NOTRUN -> [SKIP][148] ([fdo#109271] / [i915#3777])
   [148]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-apl2/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-async-flip:
    - shard-skl:          NOTRUN -> [FAIL][149] ([i915#3743])
   [149]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-skl3/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-async-flip.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-hflip:
    - shard-glk:          NOTRUN -> [SKIP][150] ([fdo#109271] / [i915#3777])
   [150]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk4/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-hflip.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip:
    - shard-kbl:          NOTRUN -> [SKIP][151] ([fdo#109271] / [i915#3777]) +2 similar issues
   [151]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-kbl7/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip.html
    - shard-skl:          NOTRUN -> [SKIP][152] ([fdo#109271] / [i915#3777])
   [152]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-skl3/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0:
    - shard-iclb:         NOTRUN -> [SKIP][153] ([fdo#110723]) +1 similar issue
   [153]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb3/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-0.html

  * igt@kms_ccs@pipe-a-bad-pixel-format-y_tiled_gen12_rc_ccs_cc:
    - shard-glk:          NOTRUN -> [SKIP][154] ([fdo#109271] / [i915#3886]) +1 similar issue
   [154]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk4/igt@kms_ccs@pipe-a-bad-pixel-format-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_mc_ccs:
    - shard-kbl:          NOTRUN -> [SKIP][155] ([fdo#109271] / [i915#3886]) +1 similar issue
   [155]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-kbl3/igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_rc_ccs_cc:
    - shard-iclb:         NOTRUN -> [SKIP][156] ([fdo#109278] / [i915#3886]) +2 similar issues
   [156]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb3/igt@kms_ccs@pipe-a-crc-sprite-planes-basic-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-a-random-ccs-data-y_tiled_gen12_rc_ccs_cc:
    - shard-skl:          NOTRUN -> [SKIP][157] ([fdo#109271] / [i915#3886]) +3 similar issues
   [157]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-skl2/igt@kms_ccs@pipe-a-random-ccs-data-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-b-ccs-on-another-bo-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][158] ([fdo#109271] / [i915#3886])
   [158]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-apl2/igt@kms_ccs@pipe-b-ccs-on-another-bo-y_tiled_gen12_mc_ccs.html

  * igt@kms_chamelium@vga-hpd-with-enabled-mode:
    - shard-iclb:         NOTRUN -> [SKIP][159] ([fdo#109284] / [fdo#111827]) +8 similar issues
   [159]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb6/igt@kms_chamelium@vga-hpd-with-enabled-mode.html

  * igt@kms_color@pipe-d-ctm-red-to-blue:
    - shard-iclb:         NOTRUN -> [SKIP][160] ([fdo#109278] / [i915#1149])
   [160]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb5/igt@kms_color@pipe-d-ctm-red-to-blue.html

  * igt@kms_color_chamelium@pipe-a-ctm-limited-range:
    - shard-snb:          NOTRUN -> [SKIP][161] ([fdo#109271] / [fdo#111827]) +3 similar issues
   [161]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb4/igt@kms_color_chamelium@pipe-a-ctm-limited-range.html

  * igt@kms_color_chamelium@pipe-b-degamma:
    - shard-glk:          NOTRUN -> [SKIP][162] ([fdo#109271] / [fdo#111827]) +2 similar issues
   [162]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk4/igt@kms_color_chamelium@pipe-b-degamma.html

  * igt@kms_color_chamelium@pipe-c-ctm-max:
    - shard-kbl:          NOTRUN -> [SKIP][163] ([fdo#109271] / [fdo#111827]) +7 similar issues
   [163]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-kbl7/igt@kms_color_chamelium@pipe-c-ctm-max.html

  * igt@kms_color_chamelium@pipe-c-degamma:
    - shard-apl:          NOTRUN -> [SKIP][164] ([fdo#109271] / [fdo#111827]) +1 similar issue
   [164]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-apl2/igt@kms_color_chamelium@pipe-c-degamma.html

  * igt@kms_color_chamelium@pipe-d-ctm-0-25:
    - shard-skl:          NOTRUN -> [SKIP][165] ([fdo#109271] / [fdo#111827]) +2 similar issues
   [165]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-skl3/igt@kms_color_chamelium@pipe-d-ctm-0-25.html

  * igt@kms_color_chamelium@pipe-d-gamma:
    - shard-iclb:         NOTRUN -> [SKIP][166] ([fdo#109278] / [fdo#109284] / [fdo#111827])
   [166]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb8/igt@kms_color_chamelium@pipe-d-gamma.html

  * igt@kms_content_protection@dp-mst-lic-type-1:
    - shard-iclb:         NOTRUN -> [SKIP][167] ([i915#3116])
   [167]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb6/igt@kms_content_protection@dp-mst-lic-type-1.html

  * igt@kms_cursor_crc@pipe-a-cursor-512x512-sliding:
    - shard-iclb:         NOTRUN -> [SKIP][168] ([fdo#109278] / [fdo#109279]) +5 similar issues
   [168]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb6/igt@kms_cursor_crc@pipe-a-cursor-512x512-sliding.html

  * igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic:
    - shard-iclb:         NOTRUN -> [SKIP][169] ([fdo#109274] / [fdo#109278])
   [169]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb8/igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic.html

  * igt@kms_cursor_legacy@pipe-d-single-move:
    - shard-iclb:         NOTRUN -> [SKIP][170] ([fdo#109278]) +32 similar issues
   [170]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb5/igt@kms_cursor_legacy@pipe-d-single-move.html

  * igt@kms_draw_crc@draw-method-xrgb8888-render-untiled:
    - shard-snb:          [PASS][171] -> [SKIP][172] ([fdo#109271]) +1 similar issue
   [171]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-snb2/igt@kms_draw_crc@draw-method-xrgb8888-render-untiled.html
   [172]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb4/igt@kms_draw_crc@draw-method-xrgb8888-render-untiled.html

  * igt@kms_flip@2x-nonexisting-fb:
    - shard-iclb:         NOTRUN -> [SKIP][173] ([fdo#109274]) +6 similar issues
   [173]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb8/igt@kms_flip@2x-nonexisting-fb.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1:
    - shard-skl:          [PASS][174] -> [FAIL][175] ([i915#79])
   [174]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-skl2/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1.html
   [175]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-skl1/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-edp1:
    - shard-skl:          [PASS][176] -> [INCOMPLETE][177] ([i915#4839])
   [176]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-skl1/igt@kms_flip@flip-vs-suspend-interruptible@a-edp1.html
   [177]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-skl4/igt@kms_flip@flip-vs-suspend-interruptible@a-edp1.html

  * igt@kms_flip@flip-vs-suspend@a-edp1:
    - shard-iclb:         [PASS][178] -> [DMESG-WARN][179] ([i915#1888])
   [178]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-iclb1/igt@kms_flip@flip-vs-suspend@a-edp1.html
   [179]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb8/igt@kms_flip@flip-vs-suspend@a-edp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-downscaling:
    - shard-iclb:         [PASS][180] -> [SKIP][181] ([i915#3701]) +1 similar issue
   [180]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-iclb4/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-downscaling.html
   [181]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-iclb2/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-downscaling.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile-upscaling:
    - shard-glk:          [PASS][182] -> [FAIL][183] ([i915#4911])
   [182]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11250/shard-glk3/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile-upscaling.html
   [183]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-glk8/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile-upscaling.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-msflip-blt:
    - shard-snb:          NOTRUN -> [SKIP][184] ([fdo#109271]) +96 similar issues
   [184]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-snb4/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-msflip-blt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-draw-blt:
    - shard-kbl:          NOTRUN -> [SKIP][185] ([fdo#109271]) +86 similar issues
   [185]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/shard-kbl4/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-blt:
    - shard-skl:          NOTRUN -> [SKIP][186] ([fdo#109271]) +42 similar issues
   [186]: https://int

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22333/index.html

[-- Attachment #2: Type: text/html, Size: 32984 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Intel-gfx] [PATCH 00/15] drm/i915: Enable DG2
  2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
                   ` (19 preceding siblings ...)
  (?)
@ 2022-02-20 10:18 ` Lucas De Marchi
  -1 siblings, 0 replies; 47+ messages in thread
From: Lucas De Marchi @ 2022-02-20 10:18 UTC (permalink / raw)
  To: Ramalingam C; +Cc: intel-gfx, dri-devel

On Sat, Feb 19, 2022 at 12:17:37AM +0530, Ramalingam C wrote:
>Enabling the Dg2 on drm/i915.
>
>This series adds support for 64k pagesize and documents the uapi
>impacts. And also adds basic flat-ccs enabling patches to
>support the local memory initialization and object creation. Kdoc is
>added to document the Flat-ccs support.
>
>Flat-ccs modifiers will be enabled in upcoming patches.
>
>Note:
>This is subset of https://patchwork.freedesktop.org/series/95686/ The
>remaining patches of the series will be pursued in subsequent series.
>
>And few patches are reviewed at and pulled from many series like
>https://patchwork.freedesktop.org/series/99119/
>https://patchwork.freedesktop.org/series/100373/
>https://patchwork.freedesktop.org/series/97544/
>

Patches 1-4 had already been applied from their own patch series.
Patch 15 still needs some changes.

All the other patches were applied to drm-intel-gt-next.

thanks
Lucas De Marchi

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 15/15] drm/i915/gt: Clear compress metadata for Xe_HP platforms
  2022-02-19  1:47     ` [Intel-gfx] " Matt Roper
@ 2022-02-27 16:52       ` Ramalingam C
  -1 siblings, 0 replies; 47+ messages in thread
From: Ramalingam C @ 2022-02-27 16:52 UTC (permalink / raw)
  To: Matt Roper; +Cc: intel-gfx, lucas.demarchi, CQ Tang, dri-devel, Ayaz A Siddiqui

Matt,

Thanks for the review.

On 2022-02-18 at 17:47:22 -0800, Matt Roper wrote:
> On Sat, Feb 19, 2022 at 12:17:52AM +0530, Ramalingam C wrote:
> > From: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> > 
> > Xe-HP and latest devices support Flat CCS which reserved a portion of
> > the device memory to store compression metadata, during the clearing of
> > device memory buffer object we also need to clear the associated
> > CCS buffer.
> > 
> > Flat CCS memory can not be directly accessed by S/W.
> > Address of CCS buffer associated main BO is automatically calculated
> > by device itself. KMD/UMD can only access this buffer indirectly using
> > XY_CTRL_SURF_COPY_BLT cmd via the address of device memory buffer.
> > 
> > v2: Fixed issues with platform naming [Lucas]
> > v3: Rebased [Ram]
> >     Used the round_up funcs [Bob]
> > v4: Fixed ccs blk calculation [Ram]
> >     Added Kdoc on flat-ccs.
> > 
> > Cc: CQ Tang <cq.tang@intel.com>
> > Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> > Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  15 ++
> >  drivers/gpu/drm/i915/gt/intel_migrate.c      | 145 ++++++++++++++++++-
> >  2 files changed, 156 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > index f8253012d166..166de5436c4a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > @@ -203,6 +203,21 @@
> >  #define GFX_OP_DRAWRECT_INFO     ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
> >  #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
> >  
> > +#define XY_CTRL_SURF_INSTR_SIZE	5
> > +#define MI_FLUSH_DW_SIZE		3
> > +#define XY_CTRL_SURF_COPY_BLT		((2 << 29) | (0x48 << 22) | 3)
> > +#define   SRC_ACCESS_TYPE_SHIFT		21
> > +#define   DST_ACCESS_TYPE_SHIFT		20
> > +#define   CCS_SIZE_SHIFT		8
> 
> Rather than using a shift, it might be better to just define the
> bitfield.  E.g.,
> 
>         #define CCS_SIZE        GENMASK(17, 8)
> 
> and then later
> 
>         FIELD_PREP(CCS_SIZE, i - 1)
> 
> to refer to the proper value.
> 
> > +#define   XY_CTRL_SURF_MOCS_SHIFT	25
> 
> Same here; we can use GENMASK(31, 25) to define the field.

Adapting to GENMASK and FIELD_PREP for these two macros.
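
Something like this (sketch; needs <linux/bitfield.h> for FIELD_PREP,
and the final mask names may differ):

        #define   CCS_SIZE_MASK                 GENMASK(17, 8)
        #define   XY_CTRL_SURF_MOCS_MASK        GENMASK(31, 25)

and then at the call site:

        *cmd++ = XY_CTRL_SURF_COPY_BLT |
                 (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
                 (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
                 FIELD_PREP(CCS_SIZE_MASK, i - 1);
        *cmd++ = lower_32_bits(src_addr);
        *cmd++ = (upper_32_bits(src_addr) & 0xFFFF) |
                 FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, src_mocs);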
> 
> > +#define   NUM_CCS_BYTES_PER_BLOCK	256
> > +#define   NUM_BYTES_PER_CCS_BYTE	256
> > +#define   NUM_CCS_BLKS_PER_XFER		1024
> > +#define   INDIRECT_ACCESS		0
> > +#define   DIRECT_ACCESS			1
> > +#define  MI_FLUSH_LLC			BIT(9)
> > +#define  MI_FLUSH_CCS			BIT(16)
> > +
> >  #define COLOR_BLT_CMD			(2 << 29 | 0x40 << 22 | (5 - 2))
> >  #define XY_COLOR_BLT_CMD		(2 << 29 | 0x50 << 22)
> >  #define SRC_COPY_BLT_CMD		(2 << 29 | 0x43 << 22)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > index 20444d6ceb3c..9f9cd2649377 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > @@ -16,6 +16,8 @@ struct insert_pte_data {
> >  };
> >  
> >  #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
> > +#define GET_CCS_BYTES(i915, size)	(HAS_FLAT_CCS(i915) ? \
> > +					 DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0)
> >  
> >  static bool engine_supports_migration(struct intel_engine_cs *engine)
> >  {
> > @@ -467,6 +469,113 @@ static bool wa_1209644611_applies(int ver, u32 size)
> >  	return height % 4 == 3 && height <= 8;
> >  }
> >  
> > +/**
> > + * DOC: Flat-CCS - Memory compression for Local memory
> > + *
> > + * On Xe-HP and later devices, we use dedicated compression control state (CCS)
> > + * stored in local memory for each surface, to support the 3D and media
> > + * compression formats.
> > + *
> > + * The memory required for the CCS of the entire local memory is 1/256 of the
> > + * local memory size. So before the kernel boot, the required memory is reserved
> > + * for the CCS data and a secure register will be programmed with the CCS base
> > + * address.
> > + *
> > + * Flat CCS data needs to be cleared when a lmem object is allocated.
> > + * And CCS data can be copied in and out of CCS region through
> > + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly.
> > + *
> > + * When we exaust the lmem, if the object's placements support smem, then we can
> 
> Typo: exhaust
> 
> > + * directly decompress the compressed lmem object into smem and start using it
> > + * from smem itself.
> > + *
> > + * But when we need to swapout the compressed lmem object into a smem region
> > + * though objects' placement doesn't support smem, then we copy the lmem content
> > + * as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT).
> > + * When the object is referred, lmem content will be swaped in along with
> > + * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding
> > + * location.
> > + */
> > +
> > +static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
> > +{
> > +	/* Mask the 3 LSB to use the PPGTT address space */
> 
> This comment implies that we'd be doing something like
> 
>         *cmd++ = lower_32_bits(dst) & GEN_MASK(31, 3);
> 
> but that doesn't seem to be the case.  The bspec does say the address
> must be qword-aligned, so maybe we should just have a drm_WARN_ON() if
> we get passed something that isn't aligned properly?

Thanks for pointing this out.

Bit(2) is supposed to be 0 to indicate this is from ppgtt. But since the
address field is bits 47:3, we need to shift the whole address left by 3.

Hence planning something like:

        /* Address needs to be QWORD aligned */
        WARN_ON(dst & 0x7);
        *cmd++ = MI_FLUSH_DW | flags;

        dst <<= 3;
        *cmd++ = lower_32_bits(dst);
        *cmd++ = upper_32_bits(dst);

> 
> > +	*cmd++ = MI_FLUSH_DW | flags;
> > +	*cmd++ = lower_32_bits(dst);
> > +	*cmd++ = upper_32_bits(dst);
> > +
> > +	return cmd;
> > +}
> > +
> > +static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
> > +{
> > +	u32 num_cmds, num_blks, total_size;
> > +
> > +	if (!GET_CCS_BYTES(i915, size))
> > +		return 0;
> > +
> > +	/*
> > +	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
> > +	 * blocks. one XY_CTRL_SURF_COPY_BLT command can
> > +	 * trnasfer upto 1024 blocks.
> 
> Typo: transfer.
> 
> > +	 */
> > +	num_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> > +				NUM_CCS_BYTES_PER_BLOCK);
> > +	num_cmds = DIV_ROUND_UP(num_blks, NUM_CCS_BLKS_PER_XFER);
> > +	total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
> > +
> > +	/*
> > +	 * We need to add a flush before and after
> > +	 * XY_CTRL_SURF_COPY_BLT
> 
> Do you have a bspec reference for this?  It sounds reasonable, but I
> wanted to confirm with the bspec that we're programming the flush the
> way it wants us to.

For emit_clear we first clear CHUNK_SZ of main memory, and then
immediately use the destination of that previous blt as the src for the
next ccs-clearing blt op. Hence we add a flush there, and we add another
flush after the blt that writes 0s for the ccs data, to make sure the
CCS cache is flushed.
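
I.e. the emission order in emit_clear() ends up like this (sketch; the
MOCS arguments are placeholders pending the discussion below):

        /* XY_COLOR_BLT: clear CHUNK_SZ of lmem first (emitted above) */

        /* flush so the cleared lmem is visible before reusing it as src */
        cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);

        /* reuse the zeroed lmem as src to write 0s into the CCS */
        cs = _i915_ctrl_surf_copy_blt(cs, offset, offset,
                                      DIRECT_ACCESS, INDIRECT_ACCESS,
                                      src_mocs, dst_mocs, num_ccs_blks);

        /* flush again so the CCS cache itself is written back */
        cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);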
> 
> > +	 */
> > +	total_size += 2 * MI_FLUSH_DW_SIZE;
> > +	return total_size;
> > +}
> > +
> > +static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
> > +				     u8 src_mem_access, u8 dst_mem_access,
> > +				     int src_mocs, int dst_mocs,
> > +				     u16 num_ccs_blocks)
> > +{
> > +	int i = num_ccs_blocks;
> > +
> > +	/*
> > +	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
> > +	 * data in and out of the CCS region.
> > +	 *
> > +	 * We can copy at most 1024 blocks of 256 bytes using one
> > +	 * XY_CTRL_SURF_COPY_BLT instruction.
> > +	 *
> > +	 * In case we need to copy more than 1024 blocks, we need to add
> > +	 * another instruction to the same batch buffer.
> > +	 *
> > +	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
> > +	 *
> > +	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
> > +	 */
> > +	do {
> > +		/*
> > +		 * We use logical AND with 1023 since the size field
> > +		 * takes values which is in the range of 0 - 1023
> 
> I think you mean 'bitwise AND' here?  A logical AND would be '&&' which
> isn't what you want.
Oops, that's wrong documentation; overlooked it.
> 
> > +		 */
> > +		*cmd++ = ((XY_CTRL_SURF_COPY_BLT) |
> > +			  (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
> > +			  (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
> > +			  (((i - 1) & 1023) << CCS_SIZE_SHIFT));
> > +		*cmd++ = lower_32_bits(src_addr);
> > +		*cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) |
> > +			  (src_mocs << XY_CTRL_SURF_MOCS_SHIFT));
> > +		*cmd++ = lower_32_bits(dst_addr);
> > +		*cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) |
> > +			  (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT));
> > +		src_addr += SZ_64M;
> > +		dst_addr += SZ_64M;
> > +		i -= NUM_CCS_BLKS_PER_XFER;
> > +	} while (i > 0);
> > +
> > +	return cmd;
> > +}
> > +
> >  static int emit_copy(struct i915_request *rq,
> >  		     u32 dst_offset, u32 src_offset, int size)
> >  {
> > @@ -614,16 +723,23 @@ intel_context_migrate_copy(struct intel_context *ce,
> >  	return err;
> >  }
> >  
> > -static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
> > +static int emit_clear(struct i915_request *rq, u64 offset, int size,
> > +		      u32 value, bool is_lmem)
> >  {
> > -	const int ver = GRAPHICS_VER(rq->engine->i915);
> > +	struct drm_i915_private *i915 = rq->engine->i915;
> > +	const int ver = GRAPHICS_VER(i915);
> > +	u32 num_ccs_blks, ccs_ring_size;
> >  	u32 *cs;
> >  
> >  	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
> >  
> >  	offset += (u64)rq->engine->instance << 32;
> >  
> > -	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
> > +	/* Clear flat css only when value is 0 */
> 
> Typo: ccs
> 
> > +	ccs_ring_size = (is_lmem && !value) ?
> > +			 calc_ctrl_surf_instr_size(i915, size) : 0;
> > +
> > +	cs = intel_ring_begin(rq, round_up(ver >= 8 ? 8 + ccs_ring_size : 6, 2));
> >  	if (IS_ERR(cs))
> >  		return PTR_ERR(cs);
> >  
> > @@ -646,6 +762,27 @@ static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
> >  		*cs++ = value;
> >  	}
> >  
> > +	if (is_lmem && HAS_FLAT_CCS(i915) && !value) {
> > +		num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> > +					    NUM_CCS_BYTES_PER_BLOCK);
> > +
> > +		/*
> > +		 * Flat CCS surface can only be accessed via
> > +		 * XY_CTRL_SURF_COPY_BLT CMD and using indirect
> > +		 * mapping of associated LMEM.
> > +		 * We can clear ccs surface by writing all 0s,
> > +		 * so we will flush the previously cleared buffer
> > +		 * and use it as a source.
> > +		 */
> > +		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
> > +		cs = _i915_ctrl_surf_copy_blt(cs, offset, offset,
> > +					      DIRECT_ACCESS, INDIRECT_ACCESS,
> > +					      1, 1, num_ccs_blks);
> 
> The magic number '1' used for MOCS here doesn't look right.  The proper
> MOCS entry is probably going to vary from platform to platform.  Bspec
> 47980 says it should be UC with GO:Memory, so I think that would be
> index 2 for DG2 and Xe_HP SDV.  Since MOCS values are (index << 1), that
> would mean we we'd need to program a value of "4" here if I'm reading
> the description correctly.
> 
> Right now we have mocs->uc_index pointing to the uncached entry with
> GO:L3.  But I from a quick skim, I think the only places we're using
> that value are the programming of BLIT_CCTL (bspec 45807) and
> RING_CMD_CCTL (bspec 45826), both of which are supposed to be using
> GO:Memory instead of GO:L3.  So maybe we should fix the uc_index value
> for those platforms and then use "rq->engine->gt->mocs.uc_index << 1"
> here.  Might be worth renaming the field to "uc_index_gomemory" just to
> make it more explicit what it's representing to prevent mistakes during
> enablement of future platforms.

Summarizing the moc index requirement based on the uc_index usage

CMD_CCTL for both CS Write Format Override” and “CS Read Format Override
	UC(Coherent, GO:Memory) 
BLIT_CCTL
  Src Mocs
	UC(Coherent, Go:L3) with UCL3LKDIS
  Dst Mocs
  	UC(Coherent, GoMemory) with UCL3LKDIS
XY_CTRL_SURF_COPY_BLT
  Same as BLIT_CCTL for src and dst Mocs.

So I infer that we need to have the uc_index and uc_gomemory_index both
so that we can use them for src and dst mocs respectively.

Please correct my understanding if i have missed something.

Ram.

> 
> 
> Matt
> 
> > +		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
> > +
> > +		if (ccs_ring_size & 1)
> > +			*cs++ = MI_NOOP;
> > +	}
> >  	intel_ring_advance(rq, cs);
> >  	return 0;
> >  }
> > @@ -711,7 +848,7 @@ intel_context_migrate_clear(struct intel_context *ce,
> >  		if (err)
> >  			goto out_rq;
> >  
> > -		err = emit_clear(rq, offset, len, value);
> > +		err = emit_clear(rq, offset, len, value, is_lmem);
> >  
> >  		/* Arbitration is re-enabled between requests. */
> >  out_rq:
> > -- 
> > 2.20.1
> > 
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> VTT-OSGC Platform Enablement
> Intel Corporation
> (916) 356-2795

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Intel-gfx] [PATCH 15/15] drm/i915/gt: Clear compress metadata for Xe_HP platforms
@ 2022-02-27 16:52       ` Ramalingam C
  0 siblings, 0 replies; 47+ messages in thread
From: Ramalingam C @ 2022-02-27 16:52 UTC (permalink / raw)
  To: Matt Roper; +Cc: intel-gfx, lucas.demarchi, CQ Tang, dri-devel

Matt,

Thanks for the review.

On 2022-02-18 at 17:47:22 -0800, Matt Roper wrote:
> On Sat, Feb 19, 2022 at 12:17:52AM +0530, Ramalingam C wrote:
> > From: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> > 
> > Xe-HP and latest devices support Flat CCS which reserved a portion of
> > the device memory to store compression metadata, during the clearing of
> > device memory buffer object we also need to clear the associated
> > CCS buffer.
> > 
> > Flat CCS memory can not be directly accessed by S/W.
> > Address of CCS buffer associated main BO is automatically calculated
> > by device itself. KMD/UMD can only access this buffer indirectly using
> > XY_CTRL_SURF_COPY_BLT cmd via the address of device memory buffer.
> > 
> > v2: Fixed issues with platform naming [Lucas]
> > v3: Rebased [Ram]
> >     Used the round_up funcs [Bob]
> > v4: Fixed ccs blk calculation [Ram]
> >     Added Kdoc on flat-ccs.
> > 
> > Cc: CQ Tang <cq.tang@intel.com>
> > Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> > Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  15 ++
> >  drivers/gpu/drm/i915/gt/intel_migrate.c      | 145 ++++++++++++++++++-
> >  2 files changed, 156 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > index f8253012d166..166de5436c4a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > @@ -203,6 +203,21 @@
> >  #define GFX_OP_DRAWRECT_INFO     ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
> >  #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
> >  
> > +#define XY_CTRL_SURF_INSTR_SIZE	5
> > +#define MI_FLUSH_DW_SIZE		3
> > +#define XY_CTRL_SURF_COPY_BLT		((2 << 29) | (0x48 << 22) | 3)
> > +#define   SRC_ACCESS_TYPE_SHIFT		21
> > +#define   DST_ACCESS_TYPE_SHIFT		20
> > +#define   CCS_SIZE_SHIFT		8
> 
> Rather than using a shift, it might be better to just define the
> bitfield.  E.g.,
> 
>         #define CCS_SIZE        GENMASK(17, 8)
> 
> and then later
> 
>         FIELD_PREP(CCS_SIZE, i - 1)
> 
> to refer to the proper value.
> 
> > +#define   XY_CTRL_SURF_MOCS_SHIFT	25
> 
> Same here; we can use GENMASK(31, 25) to define the field.

Adapting to GENMASK and FIELD_PREP for these two macros.
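
Roughly along these lines (untested sketch; the exact mask names may
differ in the next revision):

        #define   CCS_SIZE_MASK                 GENMASK(17, 8)
        #define   XY_CTRL_SURF_MOCS_MASK        GENMASK(31, 25)

and at the emit site:

        *cmd++ = XY_CTRL_SURF_COPY_BLT |
                 (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
                 (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
                 FIELD_PREP(CCS_SIZE_MASK, i - 1);
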
> 
> > +#define   NUM_CCS_BYTES_PER_BLOCK	256
> > +#define   NUM_BYTES_PER_CCS_BYTE	256
> > +#define   NUM_CCS_BLKS_PER_XFER		1024
> > +#define   INDIRECT_ACCESS		0
> > +#define   DIRECT_ACCESS			1
> > +#define  MI_FLUSH_LLC			BIT(9)
> > +#define  MI_FLUSH_CCS			BIT(16)
> > +
> >  #define COLOR_BLT_CMD			(2 << 29 | 0x40 << 22 | (5 - 2))
> >  #define XY_COLOR_BLT_CMD		(2 << 29 | 0x50 << 22)
> >  #define SRC_COPY_BLT_CMD		(2 << 29 | 0x43 << 22)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > index 20444d6ceb3c..9f9cd2649377 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > @@ -16,6 +16,8 @@ struct insert_pte_data {
> >  };
> >  
> >  #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
> > +#define GET_CCS_BYTES(i915, size)	(HAS_FLAT_CCS(i915) ? \
> > +					 DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0)
> >  
> >  static bool engine_supports_migration(struct intel_engine_cs *engine)
> >  {
> > @@ -467,6 +469,113 @@ static bool wa_1209644611_applies(int ver, u32 size)
> >  	return height % 4 == 3 && height <= 8;
> >  }
> >  
> > +/**
> > + * DOC: Flat-CCS - Memory compression for Local memory
> > + *
> > + * On Xe-HP and later devices, we use dedicated compression control state (CCS)
> > + * stored in local memory for each surface, to support the 3D and media
> > + * compression formats.
> > + *
> > + * The memory required for the CCS of the entire local memory is 1/256 of the
> > + * local memory size. So before the kernel boot, the required memory is reserved
> > + * for the CCS data and a secure register will be programmed with the CCS base
> > + * address.
> > + *
> > + * Flat CCS data needs to be cleared when a lmem object is allocated.
> > + * And CCS data can be copied in and out of CCS region through
> > + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly.
> > + *
> > + * When we exaust the lmem, if the object's placements support smem, then we can
> 
> Typo: exhaust
> 
> > + * directly decompress the compressed lmem object into smem and start using it
> > + * from smem itself.
> > + *
> > + * But when we need to swapout the compressed lmem object into a smem region
> > + * though objects' placement doesn't support smem, then we copy the lmem content
> > + * as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT).
> > + * When the object is referred, lmem content will be swaped in along with
> > + * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding
> > + * location.
> > + */
> > +
> > +static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
> > +{
> > +	/* Mask the 3 LSB to use the PPGTT address space */
> 
> This comment implies that we'd be doing something like
> 
>         *cmd++ = lower_32_bits(dst) & GENMASK(31, 3);
> 
> but that doesn't seem to be the case.  The bspec does say the address
> must be qword-aligned, so maybe we should just have a drm_WARN_ON() if
> we get passed something that isn't aligned properly?

Thanks for pointing this out.

Bit(2) is supposed to be 0 to indicate this is from ppgtt. But since
the address is [47:3], we need to << by 3 for the whole address.

Hence planning something like

        /* Address needs to be QWORD aligned */
        WARN_ON(dst & 0x7);
        *cmd++ = MI_FLUSH_DW | flags;

        dst <<= 3;
        *cmd++ = lower_32_bits(dst);
        *cmd++ = upper_32_bits(dst);

> 
> > +	*cmd++ = MI_FLUSH_DW | flags;
> > +	*cmd++ = lower_32_bits(dst);
> > +	*cmd++ = upper_32_bits(dst);
> > +
> > +	return cmd;
> > +}
> > +
> > +static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
> > +{
> > +	u32 num_cmds, num_blks, total_size;
> > +
> > +	if (!GET_CCS_BYTES(i915, size))
> > +		return 0;
> > +
> > +	/*
> > +	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
> > +	 * blocks. one XY_CTRL_SURF_COPY_BLT command can
> > +	 * trnasfer upto 1024 blocks.
> 
> Typo: transfer.
> 
> > +	 */
> > +	num_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> > +				NUM_CCS_BYTES_PER_BLOCK);
> > +	num_cmds = DIV_ROUND_UP(num_blks, NUM_CCS_BLKS_PER_XFER);
> > +	total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
> > +
> > +	/*
> > +	 * We need to add a flush before and after
> > +	 * XY_CTRL_SURF_COPY_BLT
> 
> Do you have a bspec reference for this?  It sounds reasonable, but I
> wanted to confirm with the bspec that we're programming the flush the
> way it wants us to.

For emit_clear we first clear CHUNK_SZ of main memory, and then we
immediately use the destination of that previous blt as the src for the
next ccs-clearing blt op. Hence we add the first flush there, and we add
the second flush after the blt that writes 0s for the ccs data, to make
sure that the CCS cache is flushed.
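
To illustrate, the per-chunk sequence we end up emitting looks roughly
like this (a sketch of the batch layout from the patch above):

        XY_COLOR_BLT               /* clear CHUNK_SZ of lmem to 0 */
        MI_FLUSH_DW (LLC | CCS)    /* make the cleared lmem globally visible */
        XY_CTRL_SURF_COPY_BLT      /* zeroed lmem -> its CCS, indirect access */
        MI_FLUSH_DW (LLC | CCS)    /* flush the CCS cache itself */

For CHUNK_SZ = 8M that is 8M / 256 = 32K of CCS data, i.e. 128 blocks,
which fits in a single XY_CTRL_SURF_COPY_BLT, so
calc_ctrl_surf_instr_size() returns 5 + 2 * 3 = 11 dwords.
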
> 
> > +	 */
> > +	total_size += 2 * MI_FLUSH_DW_SIZE;
> > +	return total_size;
> > +}
> > +
> > +static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
> > +				     u8 src_mem_access, u8 dst_mem_access,
> > +				     int src_mocs, int dst_mocs,
> > +				     u16 num_ccs_blocks)
> > +{
> > +	int i = num_ccs_blocks;
> > +
> > +	/*
> > +	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
> > +	 * data in and out of the CCS region.
> > +	 *
> > +	 * We can copy at most 1024 blocks of 256 bytes using one
> > +	 * XY_CTRL_SURF_COPY_BLT instruction.
> > +	 *
> > +	 * In case we need to copy more than 1024 blocks, we need to add
> > +	 * another instruction to the same batch buffer.
> > +	 *
> > +	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
> > +	 *
> > +	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
> > +	 */
> > +	do {
> > +		/*
> > +		 * We use logical AND with 1023 since the size field
> > +		 * takes values which is in the range of 0 - 1023
> 
> I think you mean 'bitwise AND' here?  A logical AND would be '&&' which
> isn't what you want.
Oops, that's wrong documentation; I overlooked it.
> 
> > +		 */
> > +		*cmd++ = ((XY_CTRL_SURF_COPY_BLT) |
> > +			  (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
> > +			  (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
> > +			  (((i - 1) & 1023) << CCS_SIZE_SHIFT));
> > +		*cmd++ = lower_32_bits(src_addr);
> > +		*cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) |
> > +			  (src_mocs << XY_CTRL_SURF_MOCS_SHIFT));
> > +		*cmd++ = lower_32_bits(dst_addr);
> > +		*cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) |
> > +			  (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT));
> > +		src_addr += SZ_64M;
> > +		dst_addr += SZ_64M;
> > +		i -= NUM_CCS_BLKS_PER_XFER;
> > +	} while (i > 0);
> > +
> > +	return cmd;
> > +}
> > +
> >  static int emit_copy(struct i915_request *rq,
> >  		     u32 dst_offset, u32 src_offset, int size)
> >  {
> > @@ -614,16 +723,23 @@ intel_context_migrate_copy(struct intel_context *ce,
> >  	return err;
> >  }
> >  
> > -static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
> > +static int emit_clear(struct i915_request *rq, u64 offset, int size,
> > +		      u32 value, bool is_lmem)
> >  {
> > -	const int ver = GRAPHICS_VER(rq->engine->i915);
> > +	struct drm_i915_private *i915 = rq->engine->i915;
> > +	const int ver = GRAPHICS_VER(i915);
> > +	u32 num_ccs_blks, ccs_ring_size;
> >  	u32 *cs;
> >  
> >  	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
> >  
> >  	offset += (u64)rq->engine->instance << 32;
> >  
> > -	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
> > +	/* Clear flat css only when value is 0 */
> 
> Typo: ccs
> 
> > +	ccs_ring_size = (is_lmem && !value) ?
> > +			 calc_ctrl_surf_instr_size(i915, size) : 0;
> > +
> > +	cs = intel_ring_begin(rq, round_up(ver >= 8 ? 8 + ccs_ring_size : 6, 2));
> >  	if (IS_ERR(cs))
> >  		return PTR_ERR(cs);
> >  
> > @@ -646,6 +762,27 @@ static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
> >  		*cs++ = value;
> >  	}
> >  
> > +	if (is_lmem && HAS_FLAT_CCS(i915) && !value) {
> > +		num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> > +					    NUM_CCS_BYTES_PER_BLOCK);
> > +
> > +		/*
> > +		 * Flat CCS surface can only be accessed via
> > +		 * XY_CTRL_SURF_COPY_BLT CMD and using indirect
> > +		 * mapping of associated LMEM.
> > +		 * We can clear ccs surface by writing all 0s,
> > +		 * so we will flush the previously cleared buffer
> > +		 * and use it as a source.
> > +		 */
> > +		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
> > +		cs = _i915_ctrl_surf_copy_blt(cs, offset, offset,
> > +					      DIRECT_ACCESS, INDIRECT_ACCESS,
> > +					      1, 1, num_ccs_blks);
> 
> The magic number '1' used for MOCS here doesn't look right.  The proper
> MOCS entry is probably going to vary from platform to platform.  Bspec
> 47980 says it should be UC with GO:Memory, so I think that would be
> index 2 for DG2 and Xe_HP SDV.  Since MOCS values are (index << 1), that
> would mean we'd need to program a value of "4" here if I'm reading
> the description correctly.
> 
> Right now we have mocs->uc_index pointing to the uncached entry with
> GO:L3.  But from a quick skim, I think the only places we're using
> that value are the programming of BLIT_CCTL (bspec 45807) and
> RING_CMD_CCTL (bspec 45826), both of which are supposed to be using
> GO:Memory instead of GO:L3.  So maybe we should fix the uc_index value
> for those platforms and then use "rq->engine->gt->mocs.uc_index << 1"
> here.  Might be worth renaming the field to "uc_index_gomemory" just to
> make it more explicit what it's representing to prevent mistakes during
> enablement of future platforms.

Summarizing the MOCS index requirements based on the uc_index usage:

CMD_CCTL, for both “CS Write Format Override” and “CS Read Format Override”
	UC(Coherent, GO:Memory)
BLIT_CCTL
  Src MOCS
	UC(Coherent, GO:L3) with UCL3LKDIS
  Dst MOCS
	UC(Coherent, GO:Memory) with UCL3LKDIS
XY_CTRL_SURF_COPY_BLT
  Same as BLIT_CCTL for the src and dst MOCS.

So I infer that we need both uc_index and a uc_gomemory_index, so that
we can use them for the src and dst MOCS respectively.

Please correct my understanding if I have missed something.

Ram.

> 
> 
> Matt
> 
> > +		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
> > +
> > +		if (ccs_ring_size & 1)
> > +			*cs++ = MI_NOOP;
> > +	}
> >  	intel_ring_advance(rq, cs);
> >  	return 0;
> >  }
> > @@ -711,7 +848,7 @@ intel_context_migrate_clear(struct intel_context *ce,
> >  		if (err)
> >  			goto out_rq;
> >  
> > -		err = emit_clear(rq, offset, len, value);
> > +		err = emit_clear(rq, offset, len, value, is_lmem);
> >  
> >  		/* Arbitration is re-enabled between requests. */
> >  out_rq:
> > -- 
> > 2.20.1
> > 
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> VTT-OSGC Platform Enablement
> Intel Corporation
> (916) 356-2795

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 15/15] drm/i915/gt: Clear compress metadata for Xe_HP platforms
  2022-02-27 16:52       ` [Intel-gfx] " Ramalingam C
@ 2022-03-03  5:28         ` Matt Roper
  -1 siblings, 0 replies; 47+ messages in thread
From: Matt Roper @ 2022-03-03  5:28 UTC (permalink / raw)
  To: Ramalingam C
  Cc: intel-gfx, lucas.demarchi, CQ Tang, dri-devel, Ayaz A Siddiqui

On Sun, Feb 27, 2022 at 10:22:20PM +0530, Ramalingam C wrote:
> Matt,
> 
> Thanks for the review.
> 
> On 2022-02-18 at 17:47:22 -0800, Matt Roper wrote:
> > On Sat, Feb 19, 2022 at 12:17:52AM +0530, Ramalingam C wrote:
> > > From: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> > > 
> > > Xe-HP and latest devices support Flat CCS which reserved a portion of
> > > the device memory to store compression metadata, during the clearing of
> > > device memory buffer object we also need to clear the associated
> > > CCS buffer.
> > > 
> > > Flat CCS memory can not be directly accessed by S/W.
> > > Address of CCS buffer associated main BO is automatically calculated
> > > by device itself. KMD/UMD can only access this buffer indirectly using
> > > XY_CTRL_SURF_COPY_BLT cmd via the address of device memory buffer.
> > > 
> > > v2: Fixed issues with platform naming [Lucas]
> > > v3: Rebased [Ram]
> > >     Used the round_up funcs [Bob]
> > > v4: Fixed ccs blk calculation [Ram]
> > >     Added Kdoc on flat-ccs.
> > > 
> > > Cc: CQ Tang <cq.tang@intel.com>
> > > Signed-off-by: Ayaz A Siddiqui <ayaz.siddiqui@intel.com>
> > > Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  15 ++
> > >  drivers/gpu/drm/i915/gt/intel_migrate.c      | 145 ++++++++++++++++++-
> > >  2 files changed, 156 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > > index f8253012d166..166de5436c4a 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > > @@ -203,6 +203,21 @@
> > >  #define GFX_OP_DRAWRECT_INFO     ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
> > >  #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
> > >  
> > > +#define XY_CTRL_SURF_INSTR_SIZE	5
> > > +#define MI_FLUSH_DW_SIZE		3
> > > +#define XY_CTRL_SURF_COPY_BLT		((2 << 29) | (0x48 << 22) | 3)
> > > +#define   SRC_ACCESS_TYPE_SHIFT		21
> > > +#define   DST_ACCESS_TYPE_SHIFT		20
> > > +#define   CCS_SIZE_SHIFT		8
> > 
> > Rather than using a shift, it might be better to just define the
> > bitfield.  E.g.,
> > 
> >         #define CCS_SIZE        GENMASK(17, 8)
> > 
> > and then later
> > 
> >         FIELD_PREP(CCS_SIZE, i - 1)
> > 
> > to refer to the proper value.
> > 
> > > +#define   XY_CTRL_SURF_MOCS_SHIFT	25
> > 
> > Same here; we can use GENMASK(31, 25) to define the field.
> 
> Adapting to GENMASK and FIELD_PREP for these two macros.
> > 
> > > +#define   NUM_CCS_BYTES_PER_BLOCK	256
> > > +#define   NUM_BYTES_PER_CCS_BYTE	256
> > > +#define   NUM_CCS_BLKS_PER_XFER		1024
> > > +#define   INDIRECT_ACCESS		0
> > > +#define   DIRECT_ACCESS			1
> > > +#define  MI_FLUSH_LLC			BIT(9)
> > > +#define  MI_FLUSH_CCS			BIT(16)
> > > +
> > >  #define COLOR_BLT_CMD			(2 << 29 | 0x40 << 22 | (5 - 2))
> > >  #define XY_COLOR_BLT_CMD		(2 << 29 | 0x50 << 22)
> > >  #define SRC_COPY_BLT_CMD		(2 << 29 | 0x43 << 22)
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > > index 20444d6ceb3c..9f9cd2649377 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_migrate.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
> > > @@ -16,6 +16,8 @@ struct insert_pte_data {
> > >  };
> > >  
> > >  #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */
> > > +#define GET_CCS_BYTES(i915, size)	(HAS_FLAT_CCS(i915) ? \
> > > +					 DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0)
> > >  
> > >  static bool engine_supports_migration(struct intel_engine_cs *engine)
> > >  {
> > > @@ -467,6 +469,113 @@ static bool wa_1209644611_applies(int ver, u32 size)
> > >  	return height % 4 == 3 && height <= 8;
> > >  }
> > >  
> > > +/**
> > > + * DOC: Flat-CCS - Memory compression for Local memory
> > > + *
> > > + * On Xe-HP and later devices, we use dedicated compression control state (CCS)
> > > + * stored in local memory for each surface, to support the 3D and media
> > > + * compression formats.
> > > + *
> > > + * The memory required for the CCS of the entire local memory is 1/256 of the
> > > + * local memory size. So before the kernel boot, the required memory is reserved
> > > + * for the CCS data and a secure register will be programmed with the CCS base
> > > + * address.
> > > + *
> > > + * Flat CCS data needs to be cleared when a lmem object is allocated.
> > > + * And CCS data can be copied in and out of CCS region through
> > > + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly.
> > > + *
> > > + * When we exaust the lmem, if the object's placements support smem, then we can
> > 
> > Typo: exhaust
> > 
> > > + * directly decompress the compressed lmem object into smem and start using it
> > > + * from smem itself.
> > > + *
> > > + * But when we need to swapout the compressed lmem object into a smem region
> > > + * though objects' placement doesn't support smem, then we copy the lmem content
> > > + * as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT).
> > > + * When the object is referred, lmem content will be swaped in along with
> > > + * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding
> > > + * location.
> > > + */
> > > +
> > > +static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags)
> > > +{
> > > +	/* Mask the 3 LSB to use the PPGTT address space */
> > 
> > This comment implies that we'd be doing something like
> > 
> >         *cmd++ = lower_32_bits(dst) & GENMASK(31, 3);
> > 
> > but that doesn't seem to be the case.  The bspec does say the address
> > must be qword-aligned, so maybe we should just have a drm_WARN_ON() if
> > we get passed something that isn't aligned properly?
> 
> Thanks for pointing this out.
> 
> Bit(2) is supposed to be 0 to indicate this is from ppgtt. But since
> the address is [47:3], we need to << by 3 for the whole address.

I don't think this is actually true; we've been bitten by similar
registers/instructions in the past because the bspec's notation for this
kind of stuff is really confusing.  One "pattern" that happens quite
often in hardware is that if we're supposed to pass a memory address
that has a specific alignment, the hardware knows that the lower bits of
the address will always be zero so it just doesn't have us specify them
in the register value; instead it overloads the specific bits that would
always be 0's to represent other flags.

So in this case we're programming a 48-bit address which would usually
require bits [47:0] to represent.  But since the hardware requires that
we pass a qword-aligned value, the lowest three bits of the address
don't need to be explicitly passed and dword1[2:0] get used for
something else.  So the final value we program winds up being either
"addr | BIT(2)" (for GGTT) or just "addr" (for PPGTT), with no actual
shifting taking place.  The "Format: GraphicsAddress[47:3]" tag in the
bspec is supposed to be the clue that they only actually want us to
program the upper 45 bits of the address.
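
I.e., a minimal version of the helper matching the bspec would be
something like this (untested sketch; I'm using GEM_WARN_ON only
because the helper has no i915 pointer handy for drm_WARN_ON):

        static u32 *i915_flush_dw(u32 *cmd, u64 gfx_addr, u32 flags)
        {
                /* Bspec: GraphicsAddress[47:3], i.e. qword aligned */
                GEM_WARN_ON(gfx_addr & 0x7);

                *cmd++ = MI_FLUSH_DW | flags;
                /* leaving bit 2 clear selects the PPGTT address space */
                *cmd++ = lower_32_bits(gfx_addr);
                *cmd++ = upper_32_bits(gfx_addr);

                return cmd;
        }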

> 
> Hence planning something like
> 
>         /* Address needs to be QWORD aligned */
>         WARN_ON(dst & 0x7);
>         *cmd++ = MI_FLUSH_DW | flags;
> 
>         dst <<= 3;
>         *cmd++ = lower_32_bits(dst);
>         *cmd++ = upper_32_bits(dst);
> 
> > 
> > > +	*cmd++ = MI_FLUSH_DW | flags;
> > > +	*cmd++ = lower_32_bits(dst);
> > > +	*cmd++ = upper_32_bits(dst);
> > > +
> > > +	return cmd;
> > > +}
> > > +
> > > +static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size)
> > > +{
> > > +	u32 num_cmds, num_blks, total_size;
> > > +
> > > +	if (!GET_CCS_BYTES(i915, size))
> > > +		return 0;
> > > +
> > > +	/*
> > > +	 * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte
> > > +	 * blocks. one XY_CTRL_SURF_COPY_BLT command can
> > > +	 * trnasfer upto 1024 blocks.
> > 
> > Typo: transfer.
> > 
> > > +	 */
> > > +	num_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> > > +				NUM_CCS_BYTES_PER_BLOCK);
> > > +	num_cmds = DIV_ROUND_UP(num_blks, NUM_CCS_BLKS_PER_XFER);
> > > +	total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds;
> > > +
> > > +	/*
> > > +	 * We need to add a flush before and after
> > > +	 * XY_CTRL_SURF_COPY_BLT
> > 
> > Do you have a bspec reference for this?  It sounds reasonable, but I
> > wanted to confirm with the bspec that we're programming the flush the
> > way it wants us to.
> 
> For emit_clear we first clear CHUNK_SZ of main memory, and then we
> immediately use the destination of that previous blt as the src for the
> next ccs-clearing blt op. Hence we add the first flush there, and we add
> the second flush after the blt that writes 0s for the ccs data, to make
> sure that the CCS cache is flushed.
> > 
> > > +	 */
> > > +	total_size += 2 * MI_FLUSH_DW_SIZE;
> > > +	return total_size;
> > > +}
> > > +
> > > +static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr,
> > > +				     u8 src_mem_access, u8 dst_mem_access,
> > > +				     int src_mocs, int dst_mocs,
> > > +				     u16 num_ccs_blocks)
> > > +{
> > > +	int i = num_ccs_blocks;
> > > +
> > > +	/*
> > > +	 * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS
> > > +	 * data in and out of the CCS region.
> > > +	 *
> > > +	 * We can copy at most 1024 blocks of 256 bytes using one
> > > +	 * XY_CTRL_SURF_COPY_BLT instruction.
> > > +	 *
> > > +	 * In case we need to copy more than 1024 blocks, we need to add
> > > +	 * another instruction to the same batch buffer.
> > > +	 *
> > > +	 * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS.
> > > +	 *
> > > +	 * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM.
> > > +	 */
> > > +	do {
> > > +		/*
> > > +		 * We use logical AND with 1023 since the size field
> > > +		 * takes values which is in the range of 0 - 1023
> > 
> > I think you mean 'bitwise AND' here?  A logical AND would be '&&' which
> > isn't what you want.
> Oops, that's wrong documentation; I overlooked it.
> > 
> > > +		 */
> > > +		*cmd++ = ((XY_CTRL_SURF_COPY_BLT) |
> > > +			  (src_mem_access << SRC_ACCESS_TYPE_SHIFT) |
> > > +			  (dst_mem_access << DST_ACCESS_TYPE_SHIFT) |
> > > +			  (((i - 1) & 1023) << CCS_SIZE_SHIFT));
> > > +		*cmd++ = lower_32_bits(src_addr);
> > > +		*cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) |
> > > +			  (src_mocs << XY_CTRL_SURF_MOCS_SHIFT));
> > > +		*cmd++ = lower_32_bits(dst_addr);
> > > +		*cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) |
> > > +			  (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT));
> > > +		src_addr += SZ_64M;
> > > +		dst_addr += SZ_64M;
> > > +		i -= NUM_CCS_BLKS_PER_XFER;
> > > +	} while (i > 0);
> > > +
> > > +	return cmd;
> > > +}
> > > +
> > >  static int emit_copy(struct i915_request *rq,
> > >  		     u32 dst_offset, u32 src_offset, int size)
> > >  {
> > > @@ -614,16 +723,23 @@ intel_context_migrate_copy(struct intel_context *ce,
> > >  	return err;
> > >  }
> > >  
> > > -static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
> > > +static int emit_clear(struct i915_request *rq, u64 offset, int size,
> > > +		      u32 value, bool is_lmem)
> > >  {
> > > -	const int ver = GRAPHICS_VER(rq->engine->i915);
> > > +	struct drm_i915_private *i915 = rq->engine->i915;
> > > +	const int ver = GRAPHICS_VER(i915);
> > > +	u32 num_ccs_blks, ccs_ring_size;
> > >  	u32 *cs;
> > >  
> > >  	GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
> > >  
> > >  	offset += (u64)rq->engine->instance << 32;
> > >  
> > > -	cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6);
> > > +	/* Clear flat css only when value is 0 */
> > 
> > Typo: ccs
> > 
> > > +	ccs_ring_size = (is_lmem && !value) ?
> > > +			 calc_ctrl_surf_instr_size(i915, size) : 0;
> > > +
> > > +	cs = intel_ring_begin(rq, round_up(ver >= 8 ? 8 + ccs_ring_size : 6, 2));
> > >  	if (IS_ERR(cs))
> > >  		return PTR_ERR(cs);
> > >  
> > > @@ -646,6 +762,27 @@ static int emit_clear(struct i915_request *rq, u64 offset, int size, u32 value)
> > >  		*cs++ = value;
> > >  	}
> > >  
> > > +	if (is_lmem && HAS_FLAT_CCS(i915) && !value) {
> > > +		num_ccs_blks = DIV_ROUND_UP(GET_CCS_BYTES(i915, size),
> > > +					    NUM_CCS_BYTES_PER_BLOCK);
> > > +
> > > +		/*
> > > +		 * Flat CCS surface can only be accessed via
> > > +		 * XY_CTRL_SURF_COPY_BLT CMD and using indirect
> > > +		 * mapping of associated LMEM.
> > > +		 * We can clear ccs surface by writing all 0s,
> > > +		 * so we will flush the previously cleared buffer
> > > +		 * and use it as a source.
> > > +		 */
> > > +		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
> > > +		cs = _i915_ctrl_surf_copy_blt(cs, offset, offset,
> > > +					      DIRECT_ACCESS, INDIRECT_ACCESS,
> > > +					      1, 1, num_ccs_blks);
> > 
> > The magic number '1' used for MOCS here doesn't look right.  The proper
> > MOCS entry is probably going to vary from platform to platform.  Bspec
> > 47980 says it should be UC with GO:Memory, so I think that would be
> > index 2 for DG2 and Xe_HP SDV.  Since MOCS values are (index << 1), that
> > would mean we'd need to program a value of "4" here if I'm reading
> > the description correctly.
> > 
> > Right now we have mocs->uc_index pointing to the uncached entry with
> > GO:L3.  But from a quick skim, I think the only places we're using
> > that value are the programming of BLIT_CCTL (bspec 45807) and
> > RING_CMD_CCTL (bspec 45826), both of which are supposed to be using
> > GO:Memory instead of GO:L3.  So maybe we should fix the uc_index value
> > for those platforms and then use "rq->engine->gt->mocs.uc_index << 1"
> > here.  Might be worth renaming the field to "uc_index_gomemory" just to
> > make it more explicit what it's representing to prevent mistakes during
> > enablement of future platforms.
> 
> Summarizing the MOCS index requirements based on the uc_index usage:
> 
> CMD_CCTL, for both “CS Write Format Override” and “CS Read Format Override”
> 	UC(Coherent, GO:Memory)
> BLIT_CCTL
>   Src MOCS
> 	UC(Coherent, GO:L3) with UCL3LKDIS
>   Dst MOCS
> 	UC(Coherent, GO:Memory) with UCL3LKDIS
> XY_CTRL_SURF_COPY_BLT
>   Same as BLIT_CCTL for the src and dst MOCS.
> 
> So I infer that we need both uc_index and a uc_gomemory_index, so that
> we can use them for the src and dst MOCS respectively.
> 
> Please correct my understanding if I have missed something.

This is another place where the bspec is often confusing --- "MOCS
value" and "MOCS index" refer to different things.  A "MOCS value" is a
7-bit value that consists of the MOCS index (in bits 6:1) and a flag in
bit 0 (which for our purposes we can always consider to be 0).  So the
"uc_index" field we have in the driver today holds the index of a
platform's "uncached" MOCS entry; when we want to program it into one of
these instructions that expects a 7-bit *value*, we need to use
"uc_index << 1" to get the right behavior.

On these recent platforms, "uncached" is further broken down into
"uncached GO:Memory" and "uncached GO:L3" --- it seems that we've been
assigning the "uncached GO:L3" entry to our uc_index field in the driver
today, but that isn't quite right --- everywhere that we actually pass
the value to the hardware is somewhere we're supposed to be passing an
"uncached GO:Memory" entry instead.


Matt

> 
> Ram.
> 
> > 
> > 
> > Matt
> > 
> > > +		cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS);
> > > +
> > > +		if (ccs_ring_size & 1)
> > > +			*cs++ = MI_NOOP;
> > > +	}
> > >  	intel_ring_advance(rq, cs);
> > >  	return 0;
> > >  }
> > > @@ -711,7 +848,7 @@ intel_context_migrate_clear(struct intel_context *ce,
> > >  		if (err)
> > >  			goto out_rq;
> > >  
> > > -		err = emit_clear(rq, offset, len, value);
> > > +		err = emit_clear(rq, offset, len, value, is_lmem);
> > >  
> > >  		/* Arbitration is re-enabled between requests. */
> > >  out_rq:
> > > -- 
> > > 2.20.1
> > > 
> > 
> > -- 
> > Matt Roper
> > Graphics Software Engineer
> > VTT-OSGC Platform Enablement
> > Intel Corporation
> > (916) 356-2795

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation
(916) 356-2795

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Intel-gfx] [PATCH 06/15] drm/i915: enforce min GTT alignment for discrete cards
  2022-02-18 18:47   ` Ramalingam C
@ 2022-03-03  9:43     ` Jani Nikula
  -1 siblings, 0 replies; 47+ messages in thread
From: Jani Nikula @ 2022-03-03  9:43 UTC (permalink / raw)
  To: Ramalingam C, intel-gfx, dri-devel
  Cc: Thomas Hellström, Tvrtko Ursulin, lucas.demarchi,
	Matthew Auld, Rodrigo Vivi

On Sat, 19 Feb 2022, Ramalingam C <ramalingam.c@intel.com> wrote:
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index 8073438b67c8..6cd518a3277c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -29,6 +29,8 @@
>  #include "i915_selftest.h"
>  #include "i915_vma_resource.h"
>  #include "i915_vma_types.h"
> +#include "i915_params.h"

Do you need this? Avoid includes from includes.

> +#include "intel_memory_region.h"
>  
>  #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
>  
> @@ -223,6 +225,7 @@ struct i915_address_space {
>  	struct device *dma;
>  	u64 total;		/* size addr space maps (ex. 2GB for ggtt) */
>  	u64 reserved;		/* size addr space reserved */
> +	u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
>  
>  	unsigned int bind_async_flags;
>  
> @@ -384,6 +387,25 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm)
>  	return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K);
>  }
>  
> +static inline u64 i915_vm_min_alignment(struct i915_address_space *vm,
> +					enum intel_memory_type type)
> +{
> +	/* avoid INTEL_MEMORY_MOCK overflow */
> +	if ((int)type >= ARRAY_SIZE(vm->min_alignment))
> +		type = INTEL_MEMORY_SYSTEM;
> +
> +	return vm->min_alignment[type];
> +}
> +
> +static inline u64 i915_vm_obj_min_alignment(struct i915_address_space *vm,
> +					    struct drm_i915_gem_object  *obj)
> +{
> +	struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
> +	enum intel_memory_type type = mr ? mr->type : INTEL_MEMORY_SYSTEM;
> +
> +	return i915_vm_min_alignment(vm, type);
> +}
> +

Is it performance critical that these two functions are inlines? Does
that warrant including more headers from headers, complicating the
interdependent mess that the gem/gt includes already are?
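
If not, moving them out of line avoids the new includes entirely (the
min_alignment[] array could be sized with a plain constant); a rough
sketch, assuming intel_gtt.c as the home:

        /* intel_gtt.h: a struct forward declaration is enough */
        struct drm_i915_gem_object;

        u64 i915_vm_obj_min_alignment(struct i915_address_space *vm,
                                      struct drm_i915_gem_object *obj);

        /* intel_gtt.c: the only file that then needs intel_memory_region.h */
        u64 i915_vm_obj_min_alignment(struct i915_address_space *vm,
                                      struct drm_i915_gem_object *obj)
        {
                struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
                enum intel_memory_type type = mr ? mr->type : INTEL_MEMORY_SYSTEM;

                /* avoid INTEL_MEMORY_MOCK overflow */
                if ((int)type >= ARRAY_SIZE(vm->min_alignment))
                        type = INTEL_MEMORY_SYSTEM;

                return vm->min_alignment[type];
        }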


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 47+ messages in thread

Thread overview: 47+ messages
2022-02-18 18:47 [PATCH 00/15] drm/i915: Enable DG2 Ramalingam C
2022-02-18 18:47 ` [Intel-gfx] " Ramalingam C
2022-02-18 18:47 ` [PATCH 01/15] drm/i915/dg2: Define GuC firmware version for DG2 Ramalingam C
2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
2022-02-18 19:14   ` Ceraolo Spurio, Daniele
2022-02-18 19:14     ` [Intel-gfx] " Ceraolo Spurio, Daniele
2022-02-18 18:47 ` [PATCH 02/15] drm/i915: Fix for PHY_MISC_TC1 offset Ramalingam C
2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
2022-02-18 18:47 ` [PATCH 03/15] drm/i915/dg2: Drop 38.4 MHz MPLLB tables Ramalingam C
2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
2022-02-18 18:47 ` [PATCH 04/15] drm/i915/dg2: Enable 5th port Ramalingam C
2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
2022-02-18 18:47 ` [PATCH 05/15] drm/i915: add needs_compact_pt flag Ramalingam C
2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
2022-02-18 18:47 ` [Intel-gfx] [PATCH 06/15] drm/i915: enforce min GTT alignment for discrete cards Ramalingam C
2022-02-18 18:47   ` Ramalingam C
2022-03-03  9:43   ` [Intel-gfx] " Jani Nikula
2022-03-03  9:43     ` Jani Nikula
2022-02-18 18:47 ` [PATCH 07/15] drm/i915: support 64K GTT pages " Ramalingam C
2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
2022-02-18 18:47 ` [Intel-gfx] [PATCH 08/15] drm/i915: add gtt misalignment test Ramalingam C
2022-02-18 18:47   ` Ramalingam C
2022-02-18 18:47 ` [PATCH 09/15] drm/i915/gtt: allow overriding the pt alignment Ramalingam C
2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
2022-02-18 18:47 ` [PATCH 10/15] drm/i915/gtt: add xehpsdv_ppgtt_insert_entry Ramalingam C
2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
2022-02-18 18:47 ` [PATCH 11/15] drm/i915/migrate: add acceleration support for DG2 Ramalingam C
2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
2022-02-18 18:47 ` [Intel-gfx] [PATCH 12/15] drm/i915/uapi: document behaviour for DG2 64K support Ramalingam C
2022-02-18 18:47   ` Ramalingam C
2022-02-18 18:47 ` [Intel-gfx] [PATCH 13/15] drm/i915/xehpsdv: Add has_flat_ccs to device info Ramalingam C
2022-02-18 18:47   ` Ramalingam C
2022-02-18 18:47 ` [PATCH 14/15] drm/i915/lmem: Enable lmem for platforms with Flat CCS Ramalingam C
2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
2022-02-18 18:47 ` [PATCH 15/15] drm/i915/gt: Clear compress metadata for Xe_HP platforms Ramalingam C
2022-02-18 18:47   ` [Intel-gfx] " Ramalingam C
2022-02-19  1:47   ` Matt Roper
2022-02-19  1:47     ` [Intel-gfx] " Matt Roper
2022-02-27 16:52     ` Ramalingam C
2022-02-27 16:52       ` [Intel-gfx] " Ramalingam C
2022-03-03  5:28       ` Matt Roper
2022-03-03  5:28         ` [Intel-gfx] " Matt Roper
2022-02-18 19:09 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Enable DG2 Patchwork
2022-02-18 19:10 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-02-18 19:38 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-02-19 11:53 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2022-02-20 10:18 ` [Intel-gfx] [PATCH 00/15] " Lucas De Marchi
