* [Intel-xe] [PATCH v2 0/6] Some small-bar prep patches
@ 2023-03-08 10:17 Matthew Auld
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 1/6] drm/xe: add xe_ttm_stolen_cpu_access_needs_ggtt() Matthew Auld
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Matthew Auld @ 2023-03-08 10:17 UTC (permalink / raw)
  To: intel-xe

Split from the small-bar series. Plus one extra patch to fix the overloaded
meaning behind xe_ttm_stolen_cpu_inaccessible(), as pointed out by Maarten.

v2: Apply s/lmem/vram/ for the entire driver

-- 
2.39.2



* [Intel-xe] [PATCH v2 1/6] drm/xe: add xe_ttm_stolen_cpu_access_needs_ggtt()
  2023-03-08 10:17 [Intel-xe] [PATCH v2 0/6] Some small-bar prep patches Matthew Auld
@ 2023-03-08 10:17 ` Matthew Auld
  2023-03-08 11:20   ` Maarten Lankhorst
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 2/6] drm/xe: s/lmem/vram/ Matthew Auld
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Matthew Auld @ 2023-03-08 10:17 UTC (permalink / raw)
  To: intel-xe

xe_ttm_stolen_cpu_inaccessible() was originally meant to just cover the
case where stolen is not directly CPU accessible on some older
integrated platforms, and as such a GGTT mapping was also required for
CPU access (as per the check in xe_bo_create_pin_map_at()).

However, with small-bar systems on dgfx we have one more case where
stolen is also inaccessible, except here there is no fallback GGTT mode
for CPU access. Fix the check in xe_bo_create_pin_map_at() to make this
distinction clear. In such a case the later vmap() will fail anyway.

v2: fix kernel-doc warning
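
To make the distinction concrete, the caller-side effect is roughly
(condensed from the diff below):

	if (flags & XE_BO_CREATE_STOLEN_BIT &&
	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
		flags |= XE_BO_CREATE_GGTT_BIT;

	/*
	 * needs_ggtt == inaccessible && !IS_DGFX, so on small-bar dgfx
	 * (stolen inaccessible, but no GGTT fallback either) we no longer
	 * pointlessly request a GGTT mapping; the later vmap() just fails.
	 */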

Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c             |  2 +-
 drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 14 ++++++++++++++
 drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h |  1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 2bfd3f6f2e9a..876f77669104 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1151,7 +1151,7 @@ struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_gt *gt,
 	u64 end = offset == ~0ull ? offset : start + size;
 
 	if (flags & XE_BO_CREATE_STOLEN_BIT &&
-	    xe_ttm_stolen_cpu_inaccessible(xe))
+	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
 		flags |= XE_BO_CREATE_GGTT_BIT;
 
 	bo = xe_bo_create_locked_range(xe, gt, vm, size, start, end, type, flags);
diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
index 2e8d07ad42ae..e608b49072ba 100644
--- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
@@ -66,6 +66,20 @@ bool xe_ttm_stolen_cpu_inaccessible(struct xe_device *xe)
 	return !mgr->io_base || GRAPHICS_VERx100(xe) < 1270;
 }
 
+/**
+ * xe_ttm_stolen_cpu_access_needs_ggtt - If we can't directly CPU access stolen,
+ * can we then fall back to mapping through the GGTT.
+ * @xe: xe device
+ *
+ * Some older integrated platforms don't support reliable CPU access for stolen,
+ * however on such hardware we can always use the mappable part of the GGTT for
+ * CPU access. Check if that's the case for this device.
+ */
+bool xe_ttm_stolen_cpu_access_needs_ggtt(struct xe_device *xe)
+{
+	return xe_ttm_stolen_cpu_inaccessible(xe) && !IS_DGFX(xe);
+}
+
 static s64 detect_bar2_dgfx(struct xe_device *xe, struct xe_ttm_stolen_mgr *mgr)
 {
 	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h
index 2fda97b97a05..e210dada636e 100644
--- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h
+++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h
@@ -15,6 +15,7 @@ struct xe_device;
 void xe_ttm_stolen_mgr_init(struct xe_device *xe);
 int xe_ttm_stolen_io_mem_reserve(struct xe_device *xe, struct ttm_resource *mem);
 bool xe_ttm_stolen_cpu_inaccessible(struct xe_device *xe);
+bool xe_ttm_stolen_cpu_access_needs_ggtt(struct xe_device *xe);
 u64 xe_ttm_stolen_io_offset(struct xe_bo *bo, u32 offset);
 u64 xe_ttm_stolen_gpu_offset(struct xe_device *xe);
 
-- 
2.39.2



* [Intel-xe] [PATCH v2 2/6] drm/xe: s/lmem/vram/
  2023-03-08 10:17 [Intel-xe] [PATCH v2 0/6] Some small-bar prep patches Matthew Auld
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 1/6] drm/xe: add xe_ttm_stolen_cpu_access_needs_ggtt() Matthew Auld
@ 2023-03-08 10:17 ` Matthew Auld
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 3/6] drm/xe/vram: start tracking the io_size Matthew Auld
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Matthew Auld @ 2023-03-08 10:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi

This seems to be the preferred nomenclature in xe. Currently we are
intermixing vram and lmem, which is confusing.

v2 (Gwan-gyeong Mun & Lucas):
  - Rather apply to the entire driver

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/Kconfig.debug      |  2 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c |  2 +-
 drivers/gpu/drm/xe/xe_bo.c            | 10 +++----
 drivers/gpu/drm/xe/xe_bo.h            |  6 ++--
 drivers/gpu/drm/xe/xe_ggtt.c          |  6 ++--
 drivers/gpu/drm/xe/xe_migrate.c       | 12 ++++----
 drivers/gpu/drm/xe/xe_mmio.c          | 40 +++++++++++++--------------
 drivers/gpu/drm/xe/xe_module.c        |  6 ++--
 drivers/gpu/drm/xe/xe_module.h        |  2 +-
 drivers/gpu/drm/xe/xe_pt.c            | 12 ++++----
 drivers/gpu/drm/xe/xe_vm.c            | 10 +++----
 11 files changed, 54 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/xe/Kconfig.debug b/drivers/gpu/drm/xe/Kconfig.debug
index 565be3f6b9b9..93b284cdd0a2 100644
--- a/drivers/gpu/drm/xe/Kconfig.debug
+++ b/drivers/gpu/drm/xe/Kconfig.debug
@@ -41,7 +41,7 @@ config DRM_XE_DEBUG_VM
 	  If in doubt, say "N".
 
 config DRM_XE_DEBUG_MEM
-	bool "Enable passing SYS/LMEM addresses to user space"
+	bool "Enable passing SYS/VRAM addresses to user space"
 	default n
 	help
 	  Pass object location trough uapi. Intended for extended
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index b7e4a126e8b7..ac659b94e7f5 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -129,7 +129,7 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
 	}
 	dma_fence_put(fence);
 
-	/* Try to copy 0xc0 from sysmem to lmem with 2MB or 64KiB/4KiB pages */
+	/* Try to copy 0xc0 from sysmem to vram with 2MB or 64KiB/4KiB pages */
 	xe_map_memset(xe, &sysmem->vmap, 0, 0xc0, sysmem->size);
 	xe_map_memset(xe, &bo->vmap, 0, 0xd0, bo->size);
 
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 876f77669104..73a7f2cd4ad8 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1290,12 +1290,12 @@ int xe_bo_pin(struct xe_bo *bo)
 	if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) &&
 	    bo->flags & XE_BO_INTERNAL_TEST)) {
 		struct ttm_place *place = &(bo->placements[0]);
-		bool lmem;
+		bool vram;
 
 		if (mem_type_is_vram(place->mem_type)) {
 			XE_BUG_ON(!(place->flags & TTM_PL_FLAG_CONTIGUOUS));
 
-			place->fpfn = (xe_bo_addr(bo, 0, PAGE_SIZE, &lmem) -
+			place->fpfn = (xe_bo_addr(bo, 0, PAGE_SIZE, &vram) -
 				       vram_region_io_offset(bo)) >> PAGE_SHIFT;
 			place->lpfn = place->fpfn + (bo->size >> PAGE_SHIFT);
 
@@ -1415,7 +1415,7 @@ bool xe_bo_is_xe_bo(struct ttm_buffer_object *bo)
 }
 
 dma_addr_t xe_bo_addr(struct xe_bo *bo, u64 offset,
-		      size_t page_size, bool *is_lmem)
+		      size_t page_size, bool *is_vram)
 {
 	struct xe_res_cursor cur;
 	u64 page;
@@ -1427,9 +1427,9 @@ dma_addr_t xe_bo_addr(struct xe_bo *bo, u64 offset,
 	page = offset >> PAGE_SHIFT;
 	offset &= (PAGE_SIZE - 1);
 
-	*is_lmem = xe_bo_is_vram(bo);
+	*is_vram = xe_bo_is_vram(bo);
 
-	if (!*is_lmem && !xe_bo_is_stolen(bo)) {
+	if (!*is_vram && !xe_bo_is_stolen(bo)) {
 		XE_BUG_ON(!bo->ttm.ttm);
 
 		xe_res_first_sg(xe_bo_get_sg(bo), page << PAGE_SHIFT,
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 0699b2b4c5ca..f841e74cd417 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -196,14 +196,14 @@ static inline void xe_bo_unpin_map_no_vm(struct xe_bo *bo)
 
 bool xe_bo_is_xe_bo(struct ttm_buffer_object *bo);
 dma_addr_t xe_bo_addr(struct xe_bo *bo, u64 offset,
-		      size_t page_size, bool *is_lmem);
+		      size_t page_size, bool *is_vram);
 
 static inline dma_addr_t
 xe_bo_main_addr(struct xe_bo *bo, size_t page_size)
 {
-	bool is_lmem;
+	bool is_vram;
 
-	return xe_bo_addr(bo, 0, page_size, &is_lmem);
+	return xe_bo_addr(bo, 0, page_size, &is_vram);
 }
 
 static inline u32
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index a0d672b6afb2..a430d1568890 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -28,12 +28,12 @@ u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
 {
 	struct xe_device *xe = xe_bo_device(bo);
 	u64 pte;
-	bool is_lmem;
+	bool is_vram;
 
-	pte = xe_bo_addr(bo, bo_offset, GEN8_PAGE_SIZE, &is_lmem);
+	pte = xe_bo_addr(bo, bo_offset, GEN8_PAGE_SIZE, &is_vram);
 	pte |= GEN8_PAGE_PRESENT;
 
-	if (is_lmem)
+	if (is_vram)
 		pte |= GEN12_GGTT_PTE_LM;
 
 	/* FIXME: vfunc + pass in caching rules */
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index bc69ec17d5ad..c0523d8fe944 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -222,15 +222,15 @@ static int xe_migrate_prepare_vm(struct xe_gt *gt, struct xe_migrate *m,
 			level++;
 		}
 	} else {
-		bool is_lmem;
-		u64 batch_addr = xe_bo_addr(batch, 0, GEN8_PAGE_SIZE, &is_lmem);
+		bool is_vram;
+		u64 batch_addr = xe_bo_addr(batch, 0, GEN8_PAGE_SIZE, &is_vram);
 
 		m->batch_base_ofs = xe_migrate_vram_ofs(batch_addr);
 
 		if (xe->info.supports_usm) {
 			batch = gt->usm.bb_pool.bo;
 			batch_addr = xe_bo_addr(batch, 0, GEN8_PAGE_SIZE,
-						&is_lmem);
+						&is_vram);
 			m->usm_batch_base_ofs = xe_migrate_vram_ofs(batch_addr);
 		}
 	}
@@ -933,12 +933,12 @@ static void write_pgtable(struct xe_gt *gt, struct xe_bb *bb, u64 ppgtt_ofs,
 	 */
 	XE_BUG_ON(update->qwords > 0x1ff);
 	if (!ppgtt_ofs) {
-		bool is_lmem;
+		bool is_vram;
 
 		ppgtt_ofs = xe_migrate_vram_ofs(xe_bo_addr(update->pt_bo, 0,
 							   GEN8_PAGE_SIZE,
-							   &is_lmem));
-		XE_BUG_ON(!is_lmem);
+							   &is_vram));
+		XE_BUG_ON(!is_vram);
 	}
 
 	do {
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index 65b0df9bb579..e5bd4609aaee 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -68,7 +68,7 @@ _resize_bar(struct xe_device *xe, int resno, resource_size_t size)
 	return 1;
 }
 
-static int xe_resize_lmem_bar(struct xe_device *xe, resource_size_t lmem_size)
+static int xe_resize_vram_bar(struct xe_device *xe, resource_size_t vram_size)
 {
 	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
 	struct pci_bus *root = pdev->bus;
@@ -78,31 +78,31 @@ static int xe_resize_lmem_bar(struct xe_device *xe, resource_size_t lmem_size)
 	u32 pci_cmd;
 	int i;
 	int ret;
-	u64 force_lmem_bar_size = xe_force_lmem_bar_size;
+	u64 force_vram_bar_size = xe_force_vram_bar_size;
 
 	current_size = roundup_pow_of_two(pci_resource_len(pdev, GEN12_LMEM_BAR));
 
-	if (force_lmem_bar_size) {
+	if (force_vram_bar_size) {
 		u32 bar_sizes;
 
-		rebar_size = force_lmem_bar_size * (resource_size_t)SZ_1M;
+		rebar_size = force_vram_bar_size * (resource_size_t)SZ_1M;
 		bar_sizes = pci_rebar_get_possible_sizes(pdev, GEN12_LMEM_BAR);
 
 		if (rebar_size == current_size)
 			return 0;
 
 		if (!(bar_sizes & BIT(pci_rebar_bytes_to_size(rebar_size))) ||
-		    rebar_size >= roundup_pow_of_two(lmem_size)) {
-			rebar_size = lmem_size;
+		    rebar_size >= roundup_pow_of_two(vram_size)) {
+			rebar_size = vram_size;
 			drm_info(&xe->drm,
 				 "Given bar size is not within supported size, setting it to default: %llu\n",
-				 (u64)lmem_size >> 20);
+				 (u64)vram_size >> 20);
 		}
 	} else {
 		rebar_size = current_size;
 
-		if (rebar_size != roundup_pow_of_two(lmem_size))
-			rebar_size = lmem_size;
+		if (rebar_size != roundup_pow_of_two(vram_size))
+			rebar_size = vram_size;
 		else
 			return 0;
 	}
@@ -117,7 +117,7 @@ static int xe_resize_lmem_bar(struct xe_device *xe, resource_size_t lmem_size)
 	}
 
 	if (!root_res) {
-		drm_info(&xe->drm, "Can't resize LMEM BAR - platform support is missing\n");
+		drm_info(&xe->drm, "Can't resize VRAM BAR - platform support is missing\n");
 		return -1;
 	}
 
@@ -168,7 +168,7 @@ int xe_mmio_total_vram_size(struct xe_device *xe, u64 *vram_size, u64 *usable_si
 	if (usable_size) {
 		reg = xe_gt_mcr_unicast_read_any(gt, XEHP_FLAT_CCS_BASE_ADDR);
 		*usable_size = (u64)REG_FIELD_GET(GENMASK(31, 8), reg) * SZ_64K;
-		drm_info(&xe->drm, "lmem_size: 0x%llx usable_size: 0x%llx\n",
+		drm_info(&xe->drm, "vram_size: 0x%llx usable_size: 0x%llx\n",
 			 *vram_size, *usable_size);
 	}
 
@@ -180,7 +180,7 @@ int xe_mmio_probe_vram(struct xe_device *xe)
 	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
 	struct xe_gt *gt;
 	u8 id;
-	u64 lmem_size;
+	u64 vram_size;
 	u64 original_size;
 	u64 current_size;
 	u64 usable_size;
@@ -207,29 +207,29 @@ int xe_mmio_probe_vram(struct xe_device *xe)
 	gt = xe_device_get_gt(xe, 0);
 	original_size = pci_resource_len(pdev, GEN12_LMEM_BAR);
 
-	err = xe_mmio_total_vram_size(xe, &lmem_size, &usable_size);
+	err = xe_mmio_total_vram_size(xe, &vram_size, &usable_size);
 	if (err)
 		return err;
 
-	resize_result = xe_resize_lmem_bar(xe, lmem_size);
+	resize_result = xe_resize_vram_bar(xe, vram_size);
 	current_size = pci_resource_len(pdev, GEN12_LMEM_BAR);
 	xe->mem.vram.io_start = pci_resource_start(pdev, GEN12_LMEM_BAR);
 
-	xe->mem.vram.size = min(current_size, lmem_size);
+	xe->mem.vram.size = min(current_size, vram_size);
 
 	if (!xe->mem.vram.size)
 		return -EIO;
 
 	if (resize_result > 0)
-		drm_info(&xe->drm, "Successfully resize LMEM from %lluMiB to %lluMiB\n",
+		drm_info(&xe->drm, "Successfully resize VRAM from %lluMiB to %lluMiB\n",
 			 (u64)original_size >> 20,
 			 (u64)current_size >> 20);
-	else if (xe->mem.vram.size < lmem_size && !xe_force_lmem_bar_size)
+	else if (xe->mem.vram.size < vram_size && !xe_force_vram_bar_size)
 		drm_info(&xe->drm, "Using a reduced BAR size of %lluMiB. Consider enabling 'Resizable BAR' support in your BIOS.\n",
 			 (u64)xe->mem.vram.size >> 20);
-	if (xe->mem.vram.size < lmem_size)
+	if (xe->mem.vram.size < vram_size)
 		drm_warn(&xe->drm, "Restricting VRAM size to PCI resource size (0x%llx->0x%llx)\n",
-			 lmem_size, (u64)xe->mem.vram.size);
+			 vram_size, (u64)xe->mem.vram.size);
 
 	xe->mem.vram.mapping = ioremap_wc(xe->mem.vram.io_start, xe->mem.vram.size);
 	xe->mem.vram.size = min_t(u64, xe->mem.vram.size, usable_size);
@@ -360,7 +360,7 @@ int xe_mmio_init(struct xe_device *xe)
 	 * and we should not continue with driver initialization.
 	 */
 	if (IS_DGFX(xe) && !(xe_mmio_read32(gt, GU_CNTL.reg) & LMEM_INIT)) {
-		drm_err(&xe->drm, "LMEM not initialized by firmware\n");
+		drm_err(&xe->drm, "VRAM not initialized by firmware\n");
 		return -ENODEV;
 	}
 
diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
index 5a51a9959eff..6860586ce7f8 100644
--- a/drivers/gpu/drm/xe/xe_module.c
+++ b/drivers/gpu/drm/xe/xe_module.c
@@ -22,9 +22,9 @@ bool enable_display = true;
 module_param_named(enable_display, enable_display, bool, 0444);
 MODULE_PARM_DESC(enable_display, "Enable display");
 
-u32 xe_force_lmem_bar_size;
-module_param_named(lmem_bar_size, xe_force_lmem_bar_size, uint, 0600);
-MODULE_PARM_DESC(lmem_bar_size, "Set the lmem bar size(in MiB)");
+u32 xe_force_vram_bar_size;
+module_param_named(vram_bar_size, xe_force_vram_bar_size, uint, 0600);
+MODULE_PARM_DESC(vram_bar_size, "Set the vram bar size(in MiB)");
 
 int xe_guc_log_level = 5;
 module_param_named(guc_log_level, xe_guc_log_level, int, 0600);
diff --git a/drivers/gpu/drm/xe/xe_module.h b/drivers/gpu/drm/xe/xe_module.h
index 2c6ee46f5595..86916c176382 100644
--- a/drivers/gpu/drm/xe/xe_module.h
+++ b/drivers/gpu/drm/xe/xe_module.h
@@ -8,6 +8,6 @@
 /* Module modprobe variables */
 extern bool enable_guc;
 extern bool enable_display;
-extern u32 xe_force_lmem_bar_size;
+extern u32 xe_force_vram_bar_size;
 extern int xe_guc_log_level;
 extern char *xe_param_force_probe;
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 06fc7b206997..dfd97b0ec42a 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -62,12 +62,12 @@ u64 gen8_pde_encode(struct xe_bo *bo, u64 bo_offset,
 		    const enum xe_cache_level level)
 {
 	u64 pde;
-	bool is_lmem;
+	bool is_vram;
 
-	pde = xe_bo_addr(bo, bo_offset, GEN8_PAGE_SIZE, &is_lmem);
+	pde = xe_bo_addr(bo, bo_offset, GEN8_PAGE_SIZE, &is_vram);
 	pde |= GEN8_PAGE_PRESENT | GEN8_PAGE_RW;
 
-	XE_WARN_ON(IS_DGFX(xe_bo_device(bo)) && !is_lmem);
+	XE_WARN_ON(IS_DGFX(xe_bo_device(bo)) && !is_vram);
 
 	/* FIXME: I don't think the PPAT handling is correct for MTL */
 
@@ -80,13 +80,13 @@ u64 gen8_pde_encode(struct xe_bo *bo, u64 bo_offset,
 }
 
 static dma_addr_t vma_addr(struct xe_vma *vma, u64 offset,
-			   size_t page_size, bool *is_lmem)
+			   size_t page_size, bool *is_vram)
 {
 	if (xe_vma_is_userptr(vma)) {
 		struct xe_res_cursor cur;
 		u64 page;
 
-		*is_lmem = false;
+		*is_vram = false;
 		page = offset >> PAGE_SHIFT;
 		offset &= (PAGE_SIZE - 1);
 
@@ -94,7 +94,7 @@ static dma_addr_t vma_addr(struct xe_vma *vma, u64 offset,
 				&cur);
 		return xe_res_dma(&cur) + offset;
 	} else {
-		return xe_bo_addr(vma->bo, offset, page_size, is_lmem);
+		return xe_bo_addr(vma->bo, offset, page_size, is_vram);
 	}
 }
 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index a0e35627e45e..6cf7172e84cc 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3355,7 +3355,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
 int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
 {
 	struct rb_node *node;
-	bool is_lmem;
+	bool is_vram;
 	uint64_t addr;
 
 	if (!down_read_trylock(&vm->lock)) {
@@ -3363,8 +3363,8 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
 		return 0;
 	}
 	if (vm->pt_root[gt_id]) {
-		addr = xe_bo_addr(vm->pt_root[gt_id]->bo, 0, GEN8_PAGE_SIZE, &is_lmem);
-		drm_printf(p, " VM root: A:0x%llx %s\n", addr, is_lmem ? "LMEM" : "SYS");
+		addr = xe_bo_addr(vm->pt_root[gt_id]->bo, 0, GEN8_PAGE_SIZE, &is_vram);
+		drm_printf(p, " VM root: A:0x%llx %s\n", addr, is_vram ? "VRAM" : "SYS");
 	}
 
 	for (node = rb_first(&vm->vmas); node; node = rb_next(node)) {
@@ -3377,11 +3377,11 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
 			xe_res_first_sg(vma->userptr.sg, 0, GEN8_PAGE_SIZE, &cur);
 			addr = xe_res_dma(&cur);
 		} else {
-			addr = xe_bo_addr(vma->bo, 0, GEN8_PAGE_SIZE, &is_lmem);
+			addr = xe_bo_addr(vma->bo, 0, GEN8_PAGE_SIZE, &is_vram);
 		}
 		drm_printf(p, " [%016llx-%016llx] S:0x%016llx A:%016llx %s\n",
 			   vma->start, vma->end, vma->end - vma->start + 1ull,
-			   addr, is_userptr ? "USR" : is_lmem ? "VRAM" : "SYS");
+			   addr, is_userptr ? "USR" : is_vram ? "VRAM" : "SYS");
 	}
 	up_read(&vm->lock);
 
-- 
2.39.2



* [Intel-xe] [PATCH v2 3/6] drm/xe/vram: start tracking the io_size
  2023-03-08 10:17 [Intel-xe] [PATCH v2 0/6] Some small-bar prep patches Matthew Auld
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 1/6] drm/xe: add xe_ttm_stolen_cpu_access_needs_ggtt() Matthew Auld
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 2/6] drm/xe: s/lmem/vram/ Matthew Auld
@ 2023-03-08 10:17 ` Matthew Auld
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 4/6] drm/xe/buddy: remove the virtualized start Matthew Auld
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Matthew Auld @ 2023-03-08 10:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi

The first step towards supporting small-bar is to track the io_size for
vram. We can no longer assume that io_size == vram size. This way we
know how much is CPU accessible via the BAR, and how much is not.
Effectively this gives us a two-tiered vram, where in some later patches
we can support different allocation strategies depending on whether the
memory needs to be CPU accessible or not.

Note that at this stage we still clamp the vram size to the usable vram
size. Only in the final patch do we turn this on for real, and allow
distinct io_size and vram_size.

v2: (Lucas):
  - Improve the commit message, plus improve the kernel-doc for the
    io_size to give a better sense of what it actually is.
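
As an illustration of the two tiers (vram_offset_is_cpu_visible() is a
hypothetical helper, not part of this series, just to show the intent of
the new field):

	/* hypothetical: only offsets below io_size are reachable via the BAR */
	static bool vram_offset_is_cpu_visible(struct xe_gt *gt, u64 offset)
	{
		return offset < gt->mem.vram.io_size;
	}

With small-bar, io_size < size, and the remainder of vram is reachable
by the GPU only.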

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h | 14 +++++++--
 drivers/gpu/drm/xe/xe_gt_types.h     | 14 +++++++--
 drivers/gpu/drm/xe/xe_mmio.c         | 44 ++++++++++++++++++++--------
 3 files changed, 55 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 034e0956f4ea..b98ffad45064 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -176,9 +176,19 @@ struct xe_device {
 	struct {
 		/** @vram: VRAM info for device */
 		struct {
-			/** @io_start: start address of VRAM */
+			/** @io_start: IO start address of VRAM */
 			resource_size_t io_start;
-			/** @size: size of VRAM */
+			/**
+			 * @io_size: IO size of VRAM.
+			 *
+			 * This represents how much of VRAM we can access via
+			 * the CPU through the VRAM BAR. This can be smaller
+			 * than @size, in which case only part of VRAM is CPU
+			 * accessible (typically the first 256M). This
+			 * configuration is known as small-bar.
+			 */
+			resource_size_t io_size;
+			/** @size: Total size of VRAM */
 			resource_size_t size;
 			/** @mapping: pointer to VRAM mappable space */
 			void *__iomem mapping;
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index 74b4e6776bf1..8f29aba455e0 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -143,9 +143,19 @@ struct xe_gt {
 		 * (virtual split), can be subset of global device VRAM
 		 */
 		struct {
-			/** @io_start: start address of VRAM */
+			/** @io_start: IO start address of this VRAM instance */
 			resource_size_t io_start;
-			/** @size: size of VRAM */
+			/**
+			 * @io_size: IO size of this VRAM instance
+			 *
+			 * This represents how much of this VRAM we can access
+			 * via the CPU through the VRAM BAR. This can be smaller
+			 * than @size, in which case only part of VRAM is CPU
+			 * accessible (typically the first 256M). This
+			 * configuration is known as small-bar.
+			 */
+			resource_size_t io_size;
+			/** @size: size of VRAM. */
 			resource_size_t size;
 			/** @mapping: pointer to VRAM mappable space */
 			void *__iomem mapping;
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index e5bd4609aaee..5cacaa05759a 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -182,7 +182,6 @@ int xe_mmio_probe_vram(struct xe_device *xe)
 	u8 id;
 	u64 vram_size;
 	u64 original_size;
-	u64 current_size;
 	u64 usable_size;
 	int resize_result, err;
 
@@ -190,11 +189,13 @@ int xe_mmio_probe_vram(struct xe_device *xe)
 		xe->mem.vram.mapping = 0;
 		xe->mem.vram.size = 0;
 		xe->mem.vram.io_start = 0;
+		xe->mem.vram.io_size = 0;
 
 		for_each_gt(gt, xe, id) {
 			gt->mem.vram.mapping = 0;
 			gt->mem.vram.size = 0;
 			gt->mem.vram.io_start = 0;
+			gt->mem.vram.io_size = 0;
 		}
 		return 0;
 	}
@@ -212,10 +213,10 @@ int xe_mmio_probe_vram(struct xe_device *xe)
 		return err;
 
 	resize_result = xe_resize_vram_bar(xe, vram_size);
-	current_size = pci_resource_len(pdev, GEN12_LMEM_BAR);
 	xe->mem.vram.io_start = pci_resource_start(pdev, GEN12_LMEM_BAR);
-
-	xe->mem.vram.size = min(current_size, vram_size);
+	xe->mem.vram.io_size = min(usable_size,
+				   pci_resource_len(pdev, GEN12_LMEM_BAR));
+	xe->mem.vram.size = xe->mem.vram.io_size;
 
 	if (!xe->mem.vram.size)
 		return -EIO;
@@ -223,15 +224,15 @@ int xe_mmio_probe_vram(struct xe_device *xe)
 	if (resize_result > 0)
 		drm_info(&xe->drm, "Successfully resize VRAM from %lluMiB to %lluMiB\n",
 			 (u64)original_size >> 20,
-			 (u64)current_size >> 20);
-	else if (xe->mem.vram.size < vram_size && !xe_force_vram_bar_size)
+			 (u64)xe->mem.vram.io_size >> 20);
+	else if (xe->mem.vram.io_size < usable_size && !xe_force_vram_bar_size)
 		drm_info(&xe->drm, "Using a reduced BAR size of %lluMiB. Consider enabling 'Resizable BAR' support in your BIOS.\n",
 			 (u64)xe->mem.vram.size >> 20);
 	if (xe->mem.vram.size < vram_size)
 		drm_warn(&xe->drm, "Restricting VRAM size to PCI resource size (0x%llx->0x%llx)\n",
 			 vram_size, (u64)xe->mem.vram.size);
 
-	xe->mem.vram.mapping = ioremap_wc(xe->mem.vram.io_start, xe->mem.vram.size);
+	xe->mem.vram.mapping = ioremap_wc(xe->mem.vram.io_start, xe->mem.vram.io_size);
 	xe->mem.vram.size = min_t(u64, xe->mem.vram.size, usable_size);
 
 	drm_info(&xe->drm, "TOTAL VRAM: %pa, %pa\n", &xe->mem.vram.io_start, &xe->mem.vram.size);
@@ -239,7 +240,7 @@ int xe_mmio_probe_vram(struct xe_device *xe)
 	/* FIXME: Assuming equally partitioned VRAM, incorrect */
 	if (xe->info.tile_count > 1) {
 		u8 adj_tile_count = xe->info.tile_count;
-		resource_size_t size, io_start;
+		resource_size_t size, io_start, io_size;
 
 		for_each_gt(gt, xe, id)
 			if (xe_gt_is_media_type(gt))
@@ -249,15 +250,31 @@ int xe_mmio_probe_vram(struct xe_device *xe)
 
 		size = xe->mem.vram.size / adj_tile_count;
 		io_start = xe->mem.vram.io_start;
+		io_size = xe->mem.vram.io_size;
 
 		for_each_gt(gt, xe, id) {
-			if (id && !xe_gt_is_media_type(gt))
-				io_start += size;
+			if (id && !xe_gt_is_media_type(gt)) {
+				io_size -= min(io_size, size);
+				io_start += io_size;
+			}
 
 			gt->mem.vram.size = size;
-			gt->mem.vram.io_start = io_start;
-			gt->mem.vram.mapping = xe->mem.vram.mapping +
-				(io_start - xe->mem.vram.io_start);
+
+			/*
+			 * XXX: multi-tile small-bar might be wild. Hopefully
+			 * full tile without any mappable vram is not something
+			 * we care about.
+			 */
+
+			gt->mem.vram.io_size = min(size, io_size);
+			if (io_size) {
+				gt->mem.vram.io_start = io_start;
+				gt->mem.vram.mapping = xe->mem.vram.mapping +
+					(io_start - xe->mem.vram.io_start);
+			} else {
+				drm_err(&xe->drm, "Tile without any CPU visible VRAM. Aborting.\n");
+				return -ENODEV;
+			}
 
 			drm_info(&xe->drm, "VRAM[%u, %u]: %pa, %pa\n",
 				 id, gt->info.vram_id, &gt->mem.vram.io_start,
@@ -266,6 +283,7 @@ int xe_mmio_probe_vram(struct xe_device *xe)
 	} else {
 		gt->mem.vram.size = xe->mem.vram.size;
 		gt->mem.vram.io_start = xe->mem.vram.io_start;
+		gt->mem.vram.io_size = xe->mem.vram.io_size;
 		gt->mem.vram.mapping = xe->mem.vram.mapping;
 
 		drm_info(&xe->drm, "VRAM: %pa\n", &gt->mem.vram.size);
-- 
2.39.2



* [Intel-xe] [PATCH v2 4/6] drm/xe/buddy: remove the virtualized start
  2023-03-08 10:17 [Intel-xe] [PATCH v2 0/6] Some small-bar prep patches Matthew Auld
                   ` (2 preceding siblings ...)
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 3/6] drm/xe/vram: start tracking the io_size Matthew Auld
@ 2023-03-08 10:17 ` Matthew Auld
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 5/6] drm/xe/buddy: add visible tracking Matthew Auld
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 6/6] drm/xe/buddy: add compatible and intersects hooks Matthew Auld
  5 siblings, 0 replies; 14+ messages in thread
From: Matthew Auld @ 2023-03-08 10:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi

Hopefully not needed anymore. We can add a .compatible() hook once we
need to differentiate between mappable and non-mappable vram. If the
allocation is not contiguous then the start value is kind of
meaningless, so rather just mark it as invalid.

In upstream, TTM wants to eventually remove the ttm_resource.start
usage.
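
In other words, the rule implemented below is roughly:

	if (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS)
		/* one span: offset of the first buddy block is meaningful */
		vres->base.start = drm_buddy_block_offset(block) >> PAGE_SHIFT;
	else
		/* scattered blocks: no single start exists */
		vres->base.start = XE_BO_INVALID_OFFSET;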

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c           |  6 ++++++
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 29 ++++++++++++++--------------
 2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 73a7f2cd4ad8..346573f406c2 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -662,6 +662,12 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 				void *new_addr = gt->mem.vram.mapping +
 					(new_mem->start << PAGE_SHIFT);
 
+				if (XE_WARN_ON(new_mem->start == XE_BO_INVALID_OFFSET)) {
+					ret = -EINVAL;
+					xe_device_mem_access_put(xe);
+					goto out;
+				}
+
 				XE_BUG_ON(new_mem->start !=
 					  bo->placements->fpfn);
 
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 643365b18bc7..e3ac8c6d3978 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -54,7 +54,6 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 	struct xe_ttm_vram_mgr_resource *vres;
 	u64 size, remaining_size, lpfn, fpfn;
 	struct drm_buddy *mm = &mgr->mm;
-	struct drm_buddy_block *block;
 	unsigned long pages_per_block;
 	int r;
 
@@ -186,24 +185,24 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 			list_splice_tail(trim_list, &vres->blocks);
 	}
 
-	vres->base.start = 0;
-	list_for_each_entry(block, &vres->blocks, link) {
-		unsigned long start;
+	if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
+	    xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
+		vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
 
-		start = drm_buddy_block_offset(block) +
-			drm_buddy_block_size(mm, block);
-		start >>= PAGE_SHIFT;
+	/*
+	 * For some kernel objects we still rely on the start when io mapping
+	 * the object.
+	 */
+	if (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) {
+		struct drm_buddy_block *block = list_first_entry(&vres->blocks,
+								 typeof(*block),
+								 link);
 
-		if (start > PFN_UP(vres->base.size))
-			start -= PFN_UP(vres->base.size);
-		else
-			start = 0;
-		vres->base.start = max(vres->base.start, start);
+		vres->base.start = drm_buddy_block_offset(block) >> PAGE_SHIFT;
+	} else {
+		vres->base.start = XE_BO_INVALID_OFFSET;
 	}
 
-	if (xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
-		vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
-
 	*res = &vres->base;
 	return 0;
 
-- 
2.39.2



* [Intel-xe] [PATCH v2 5/6] drm/xe/buddy: add visible tracking
  2023-03-08 10:17 [Intel-xe] [PATCH v2 0/6] Some small-bar prep patches Matthew Auld
                   ` (3 preceding siblings ...)
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 4/6] drm/xe/buddy: remove the virtualized start Matthew Auld
@ 2023-03-08 10:17 ` Matthew Auld
  2023-03-20 12:08   ` Gwan-gyeong Mun
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 6/6] drm/xe/buddy: add compatible and intersects hooks Matthew Auld
  5 siblings, 1 reply; 14+ messages in thread
From: Matthew Auld @ 2023-03-08 10:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi

Replace the allocation code with the i915 version. This simplifies the
code a little, and importantly we get the accounting at the mgr level,
which is useful for debug (and maybe userspace), plus per-resource
tracking so we can easily check if a resource is using one or more pages
in the mappable part of vram (useful for eviction), or if the resource
is completely within the mappable portion (useful for checking if the
resource can be safely CPU mapped).
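
Condensed, the per-resource accounting added below is:

	/* bytes of each buddy block that land below the mappable boundary */
	list_for_each_entry(block, &vres->blocks, link) {
		u64 start = drm_buddy_block_offset(block);
		u64 end = start + drm_buddy_block_size(mm, block);

		if (start < mgr->visible_size)
			vres->used_visible_size +=
				min(end, mgr->visible_size) - start;
	}
	mgr->visible_avail -= vres->used_visible_size;

used_visible_size == 0 then means evicting the resource frees no
mappable space, while used_visible_size == size means the whole resource
can be safely CPU mapped.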

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c     |  18 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       | 201 ++++++++++-----------
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.h       |   3 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |   6 +
 4 files changed, 118 insertions(+), 110 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
index e608b49072ba..35e6bc62766e 100644
--- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
@@ -158,7 +158,7 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
 {
 	struct xe_ttm_stolen_mgr *mgr = drmm_kzalloc(&xe->drm, sizeof(*mgr), GFP_KERNEL);
 	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
-	u64 stolen_size, pgsize;
+	u64 stolen_size, io_size, pgsize;
 	int err;
 
 	if (IS_DGFX(xe))
@@ -177,7 +177,17 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
 	if (pgsize < PAGE_SIZE)
 		pgsize = PAGE_SIZE;
 
-	err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, stolen_size, pgsize);
+	/*
+	 * We don't attempt partial visible support for stolen vram, since
+	 * stolen is always at the end of vram, and with small-bar the BAR
+	 * size is pretty much always 256M.
+	 */
+	io_size = 0;
+	if (!xe_ttm_stolen_cpu_inaccessible(xe))
+		io_size = stolen_size;
+
+	err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, stolen_size,
+				     io_size, pgsize);
 	if (err) {
 		drm_dbg_kms(&xe->drm, "Stolen mgr init failed: %i\n", err);
 		return;
@@ -186,8 +196,8 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
 	drm_dbg_kms(&xe->drm, "Initialized stolen memory support with %llu bytes\n",
 		    stolen_size);
 
-	if (!xe_ttm_stolen_cpu_inaccessible(xe))
-		mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, stolen_size);
+	if (io_size)
+		mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, io_size);
 }
 
 u64 xe_ttm_stolen_io_offset(struct xe_bo *bo, u32 offset)
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index e3ac8c6d3978..f703e962f499 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -49,45 +49,26 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 			       const struct ttm_place *place,
 			       struct ttm_resource **res)
 {
-	u64 max_bytes, cur_size, min_block_size;
 	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
 	struct xe_ttm_vram_mgr_resource *vres;
-	u64 size, remaining_size, lpfn, fpfn;
 	struct drm_buddy *mm = &mgr->mm;
-	unsigned long pages_per_block;
-	int r;
-
-	lpfn = (u64)place->lpfn << PAGE_SHIFT;
-	if (!lpfn || lpfn > man->size)
-		lpfn = man->size;
-
-	fpfn = (u64)place->fpfn << PAGE_SHIFT;
+	u64 size, remaining_size, min_page_size;
+	unsigned long lpfn;
+	int err;
 
-	max_bytes = mgr->manager.size;
-	if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
-		pages_per_block = ~0ul;
-	} else {
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-		pages_per_block = HPAGE_PMD_NR;
-#else
-		/* default to 2MB */
-		pages_per_block = 2UL << (20UL - PAGE_SHIFT);
-#endif
-
-		pages_per_block = max_t(uint32_t, pages_per_block,
-					tbo->page_alignment);
-	}
+	lpfn = place->lpfn;
+	if (!lpfn || lpfn > man->size >> PAGE_SHIFT)
+		lpfn = man->size >> PAGE_SHIFT;
 
 	vres = kzalloc(sizeof(*vres), GFP_KERNEL);
 	if (!vres)
 		return -ENOMEM;
 
 	ttm_resource_init(tbo, place, &vres->base);
-	remaining_size = vres->base.size;
 
 	/* bail out quickly if there's likely not enough VRAM for this BO */
-	if (ttm_resource_manager_usage(man) > max_bytes) {
-		r = -ENOSPC;
+	if (ttm_resource_manager_usage(man) > man->size) {
+		err = -ENOSPC;
 		goto error_fini;
 	}
 
@@ -96,95 +77,91 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 	if (place->flags & TTM_PL_FLAG_TOPDOWN)
 		vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
 
-	if (fpfn || lpfn != man->size)
-		/* Allocate blocks in desired range */
+	if (place->fpfn || lpfn != man->size >> PAGE_SHIFT)
 		vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
 
-	mutex_lock(&mgr->lock);
-	while (remaining_size) {
-		if (tbo->page_alignment)
-			min_block_size = tbo->page_alignment << PAGE_SHIFT;
-		else
-			min_block_size = mgr->default_page_size;
-
-		XE_BUG_ON(min_block_size < mm->chunk_size);
-
-		/* Limit maximum size to 2GiB due to SG table limitations */
-		size = min(remaining_size, 2ULL << 30);
-
-		if (size >= pages_per_block << PAGE_SHIFT)
-			min_block_size = pages_per_block << PAGE_SHIFT;
-
-		cur_size = size;
-
-		if (fpfn + size != place->lpfn << PAGE_SHIFT) {
-			/*
-			 * Except for actual range allocation, modify the size and
-			 * min_block_size conforming to continuous flag enablement
-			 */
-			if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
-				size = roundup_pow_of_two(size);
-				min_block_size = size;
-			/*
-			 * Modify the size value if size is not
-			 * aligned with min_block_size
-			 */
-			} else if (!IS_ALIGNED(size, min_block_size)) {
-				size = round_up(size, min_block_size);
-			}
-		}
+	XE_BUG_ON(!vres->base.size);
+	size = vres->base.size;
 
-		r = drm_buddy_alloc_blocks(mm, fpfn,
-					   lpfn,
-					   size,
-					   min_block_size,
-					   &vres->blocks,
-					   vres->flags);
-		if (unlikely(r))
-			goto error_free_blocks;
+	min_page_size = mgr->default_page_size;
+	if (tbo->page_alignment)
+		min_page_size = tbo->page_alignment << PAGE_SHIFT;
 
-		if (size > remaining_size)
-			remaining_size = 0;
-		else
-			remaining_size -= size;
+	XE_BUG_ON(min_page_size < mm->chunk_size);
+	XE_BUG_ON(min_page_size > SZ_2G); /* FIXME: sg limit */
+	XE_BUG_ON(size > SZ_2G &&
+		  (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS));
+	XE_BUG_ON(!IS_ALIGNED(size, min_page_size));
+
+	if (place->fpfn + (vres->base.size >> PAGE_SHIFT) != place->lpfn &&
+	    place->flags & TTM_PL_FLAG_CONTIGUOUS) {
+		unsigned long pages;
+
+		size = roundup_pow_of_two(size);
+		min_page_size = size;
+
+		pages = size >> ilog2(mm->chunk_size);
+		if (pages > lpfn)
+			lpfn = pages;
 	}
-	mutex_unlock(&mgr->lock);
 
-	if (cur_size != size) {
-		struct drm_buddy_block *block;
-		struct list_head *trim_list;
-		u64 original_size;
-		LIST_HEAD(temp);
+	if (size > lpfn << PAGE_SHIFT) {
+		err = -E2BIG; /* don't trigger eviction */
+		goto error_fini;
+	}
 
-		trim_list = &vres->blocks;
-		original_size = vres->base.size;
+	mutex_lock(&mgr->lock);
+	if (lpfn <= mgr->visible_size && size > mgr->visible_avail) {
+		mutex_unlock(&mgr->lock);
+		err = -ENOSPC;
+		goto error_fini;
+	}
 
+	remaining_size = size;
+	do {
 		/*
-		 * If size value is rounded up to min_block_size, trim the last
-		 * block to the required size
+		 * Limit maximum size to 2GiB due to SG table limitations.
+		 * FIXME: Should maybe be handled as part of sg construction.
 		 */
-		if (!list_is_singular(&vres->blocks)) {
-			block = list_last_entry(&vres->blocks, typeof(*block), link);
-			list_move_tail(&block->link, &temp);
-			trim_list = &temp;
-			/*
-			 * Compute the original_size value by subtracting the
-			 * last block size with (aligned size - original size)
-			 */
-			original_size = drm_buddy_block_size(mm, block) -
-				(size - cur_size);
-		}
+		u64 alloc_size = min_t(u64, remaining_size, SZ_2G);
+
+		err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
+					     (u64)lpfn << PAGE_SHIFT,
+					     alloc_size,
+					     min_page_size,
+					     &vres->blocks,
+					     vres->flags);
+		if (err)
+			goto error_free_blocks;
 
-		mutex_lock(&mgr->lock);
-		drm_buddy_block_trim(mm,
-				     original_size,
-				     trim_list);
-		mutex_unlock(&mgr->lock);
+		remaining_size -= alloc_size;
+	} while (remaining_size);
 
-		if (!list_empty(&temp))
-			list_splice_tail(trim_list, &vres->blocks);
+	if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
+		if (!drm_buddy_block_trim(mm, vres->base.size, &vres->blocks))
+			size = vres->base.size;
 	}
 
+	if (lpfn <= mgr->visible_size >> PAGE_SHIFT) {
+		vres->used_visible_size = size;
+	} else {
+		struct drm_buddy_block *block;
+
+		list_for_each_entry(block, &vres->blocks, link) {
+			u64 start = drm_buddy_block_offset(block);
+
+			if (start < mgr->visible_size) {
+				u64 end = start + drm_buddy_block_size(mm, block);
+
+				vres->used_visible_size +=
+					min(end, mgr->visible_size) - start;
+			}
+		}
+	}
+
+	mgr->visible_avail -= vres->used_visible_size;
+	mutex_unlock(&mgr->lock);
+
 	if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
 	    xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
 		vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
@@ -213,7 +190,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 	ttm_resource_fini(man, &vres->base);
 	kfree(vres);
 
-	return r;
+	return err;
 }
 
 static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
@@ -226,6 +203,7 @@ static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
 
 	mutex_lock(&mgr->lock);
 	drm_buddy_free_list(mm, &vres->blocks);
+	mgr->visible_avail += vres->used_visible_size;
 	mutex_unlock(&mgr->lock);
 
 	ttm_resource_fini(man, res);
@@ -240,6 +218,13 @@ static void xe_ttm_vram_mgr_debug(struct ttm_resource_manager *man,
 	struct drm_buddy *mm = &mgr->mm;
 
 	mutex_lock(&mgr->lock);
+	drm_printf(printer, "default_page_size: %lluKiB\n",
+		   mgr->default_page_size >> 10);
+	drm_printf(printer, "visible_avail: %lluMiB\n",
+		   (u64)mgr->visible_avail >> 20);
+	drm_printf(printer, "visible_size: %lluMiB\n",
+		   (u64)mgr->visible_size >> 20);
+
 	drm_buddy_print(mm, printer);
 	mutex_unlock(&mgr->lock);
 	drm_printf(printer, "man size:%llu\n", man->size);
@@ -262,6 +247,8 @@ static void ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
 	if (ttm_resource_manager_evict_all(&xe->ttm, man))
 		return;
 
+	WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
+
 	drm_buddy_fini(&mgr->mm);
 
 	ttm_resource_manager_cleanup(&mgr->manager);
@@ -270,7 +257,8 @@ static void ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
 }
 
 int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
-			   u32 mem_type, u64 size, u64 default_page_size)
+			   u32 mem_type, u64 size, u64 io_size,
+			   u64 default_page_size)
 {
 	struct ttm_resource_manager *man = &mgr->manager;
 	int err;
@@ -279,6 +267,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
 	mgr->mem_type = mem_type;
 	mutex_init(&mgr->lock);
 	mgr->default_page_size = default_page_size;
+	mgr->visible_size = io_size;
+	mgr->visible_avail = io_size;
 
 	ttm_resource_manager_init(man, &xe->ttm, size);
 	err = drm_buddy_init(&mgr->mm, man->size, default_page_size);
@@ -298,7 +288,8 @@ int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct xe_ttm_vram_mgr *mgr)
 	mgr->gt = gt;
 
 	return __xe_ttm_vram_mgr_init(xe, mgr, XE_PL_VRAM0 + gt->info.vram_id,
-				      gt->mem.vram.size, PAGE_SIZE);
+				      gt->mem.vram.size, gt->mem.vram.io_size,
+				      PAGE_SIZE);
 }
 
 int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
index 78f332d26224..35e5367a79fb 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
@@ -13,7 +13,8 @@ struct xe_device;
 struct xe_gt;
 
 int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
-			   u32 mem_type, u64 size, u64 default_page_size);
+			   u32 mem_type, u64 size, u64 io_size,
+			   u64 default_page_size);
 int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct xe_ttm_vram_mgr *mgr);
 int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
 			      struct ttm_resource *res,
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
index 39aa2ec1b968..3d9417ff7434 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
@@ -23,6 +23,10 @@ struct xe_ttm_vram_mgr {
 	struct ttm_resource_manager manager;
 	/** @mm: DRM buddy allocator which manages the VRAM */
 	struct drm_buddy mm;
+	/** @visible_size: Probed size of the CPU visible portion */
+	u64 visible_size;
+	/** @visible_avail: CPU visible portion still unallocated */
+	u64 visible_avail;
 	/** @default_page_size: default page size */
 	u64 default_page_size;
 	/** @lock: protects allocations of VRAM */
@@ -39,6 +43,8 @@ struct xe_ttm_vram_mgr_resource {
 	struct ttm_resource base;
 	/** @blocks: list of DRM buddy blocks */
 	struct list_head blocks;
+	/** @used_visible_size: How many CPU visible bytes this resource is using */
+	u64 used_visible_size;
 	/** @flags: flags associated with the resource */
 	unsigned long flags;
 };
-- 
2.39.2



* [Intel-xe] [PATCH v2 6/6] drm/xe/buddy: add compatible and intersects hooks
  2023-03-08 10:17 [Intel-xe] [PATCH v2 0/6] Some small-bar prep patches Matthew Auld
                   ` (4 preceding siblings ...)
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 5/6] drm/xe/buddy: add visible tracking Matthew Auld
@ 2023-03-08 10:17 ` Matthew Auld
  2023-03-20 14:15   ` Gwan-gyeong Mun
  5 siblings, 1 reply; 14+ messages in thread
From: Matthew Auld @ 2023-03-08 10:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi

Copy this from i915. We need .compatible for lmem -> lmem transfers, so
they don't just get nooped by ttm when we need to move something from
mappable to non-mappable or vice versa. The .intersects is needed for
eviction, to determine if a victim resource is worth evicting, e.g. if
we need mappable space there is no point in evicting a resource that has
zero mappable pages.
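
For the interesting mappable-only placement (fpfn == 0 and lpfn ==
visible_size in pages), the two hooks below reduce to:

	/* worth evicting: frees at least some mappable space */
	intersects = vres->used_visible_size > 0;

	/* already satisfies the placement: entirely within mappable vram */
	compatible = vres->used_visible_size == size;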

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 62 ++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index f703e962f499..8dd33ac65499 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -230,9 +230,71 @@ static void xe_ttm_vram_mgr_debug(struct ttm_resource_manager *man,
 	drm_printf(printer, "man size:%llu\n", man->size);
 }
 
+static bool xe_ttm_vram_mgr_intersects(struct ttm_resource_manager *man,
+				       struct ttm_resource *res,
+				       const struct ttm_place *place,
+				       size_t size)
+{
+	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
+	struct xe_ttm_vram_mgr_resource *vres =
+		to_xe_ttm_vram_mgr_resource(res);
+	struct drm_buddy *mm = &mgr->mm;
+	struct drm_buddy_block *block;
+
+	if (!place->fpfn && !place->lpfn)
+		return true;
+
+	if (!place->fpfn && place->lpfn == mgr->visible_size >> PAGE_SHIFT)
+		return vres->used_visible_size > 0;
+
+	list_for_each_entry(block, &vres->blocks, link) {
+		unsigned long fpfn =
+			drm_buddy_block_offset(block) >> PAGE_SHIFT;
+		unsigned long lpfn = fpfn +
+			(drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
+
+		if (place->fpfn < lpfn && place->lpfn > fpfn)
+			return true;
+	}
+
+	return false;
+}
+
+static bool xe_ttm_vram_mgr_compatible(struct ttm_resource_manager *man,
+				       struct ttm_resource *res,
+				       const struct ttm_place *place,
+				       size_t size)
+{
+	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
+	struct xe_ttm_vram_mgr_resource *vres =
+		to_xe_ttm_vram_mgr_resource(res);
+	struct drm_buddy *mm = &mgr->mm;
+	struct drm_buddy_block *block;
+
+	if (!place->fpfn && !place->lpfn)
+		return true;
+
+	if (!place->fpfn && place->lpfn == mgr->visible_size >> PAGE_SHIFT)
+		return vres->used_visible_size == size;
+
+	list_for_each_entry(block, &vres->blocks, link) {
+		unsigned long fpfn =
+			drm_buddy_block_offset(block) >> PAGE_SHIFT;
+		unsigned long lpfn = fpfn +
+			(drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
+
+		if (fpfn < place->fpfn || lpfn > place->lpfn)
+			return false;
+	}
+
+	return true;
+}
+
 static const struct ttm_resource_manager_func xe_ttm_vram_mgr_func = {
 	.alloc	= xe_ttm_vram_mgr_new,
 	.free	= xe_ttm_vram_mgr_del,
+	.intersects = xe_ttm_vram_mgr_intersects,
+	.compatible = xe_ttm_vram_mgr_compatible,
 	.debug	= xe_ttm_vram_mgr_debug
 };
 
-- 
2.39.2



* Re: [Intel-xe] [PATCH v2 1/6] drm/xe: add xe_ttm_stolen_cpu_access_needs_ggtt()
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 1/6] drm/xe: add xe_ttm_stolen_cpu_access_needs_ggtt() Matthew Auld
@ 2023-03-08 11:20   ` Maarten Lankhorst
  0 siblings, 0 replies; 14+ messages in thread
From: Maarten Lankhorst @ 2023-03-08 11:20 UTC (permalink / raw)
  To: Matthew Auld, intel-xe



On 2023-03-08 11:17, Matthew Auld wrote:
> xe_ttm_stolen_cpu_inaccessible() was originally meant to just cover the
> case where stolen is not directly CPU accessible on some older
> integrated platforms, and as such a GGTT mapping was also required for
> CPU access (as per the check in xe_bo_create_pin_map_at()).
>
> However, with small-bar systems on dgfx we have one more case where
> stolen is also inaccessible, except here there is no fallback GGTT mode
> for CPU access. Fix the check in xe_bo_create_pin_map_at() to
> make this distinction clear. In such a case the later vmap() will fail
> anyway.
>
> v2: fix kernel-doc warning
>
> Suggested-by: Maarten Lankhorst<maarten.lankhorst@linux.intel.com>
> Signed-off-by: Matthew Auld<matthew.auld@intel.com>
> Reviewed-by: Gwan-gyeong Mun<gwan-gyeong.mun@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_bo.c             |  2 +-
>   drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 14 ++++++++++++++
>   drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h |  1 +
>   3 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 2bfd3f6f2e9a..876f77669104 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1151,7 +1151,7 @@ struct xe_bo *xe_bo_create_pin_map_at(struct xe_device *xe, struct xe_gt *gt,
>   	u64 end = offset == ~0ull ? offset : start + size;
>   
>   	if (flags & XE_BO_CREATE_STOLEN_BIT &&
> -	    xe_ttm_stolen_cpu_inaccessible(xe))
> +	    xe_ttm_stolen_cpu_access_needs_ggtt(xe))
>   		flags |= XE_BO_CREATE_GGTT_BIT;
>   
>   	bo = xe_bo_create_locked_range(xe, gt, vm, size, start, end, type, flags);
> diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
> index 2e8d07ad42ae..e608b49072ba 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
> @@ -66,6 +66,20 @@ bool xe_ttm_stolen_cpu_inaccessible(struct xe_device *xe)
>   	return !mgr->io_base || GRAPHICS_VERx100(xe) < 1270;
>   }
>   
> +/**
> + * xe_ttm_stolen_cpu_access_needs_ggtt - If we can't directly CPU access stolen,
> + * can we then fall back to mapping through the GGTT.
> + * @xe: xe device
> + *
> + * Some older integrated platforms don't support reliable CPU access for stolen,
> + * however on such hardware we can always use the mappable part of the GGTT for
> + * CPU access. Check if that's the case for this device.
> + */
> +bool xe_ttm_stolen_cpu_access_needs_ggtt(struct xe_device *xe)
> +{
> +	return xe_ttm_stolen_cpu_inaccessible(xe) && !IS_DGFX(xe);
> +}
> +
>   static s64 detect_bar2_dgfx(struct xe_device *xe, struct xe_ttm_stolen_mgr *mgr)
>   {
>   	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);

I think it's more the other way around by the way: if you make
cpu_inaccessible use xe_ttm_stolen_cpu_access_needs_ggtt, it can be used
inside xe_ttm_stolen_mgr_init too.
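
I.e. something like this (just a sketch, untested):

	bool xe_ttm_stolen_cpu_access_needs_ggtt(struct xe_device *xe)
	{
		/* raw platform check, no stolen mgr lookup needed, so this
		 * is also safe to call from xe_ttm_stolen_mgr_init() */
		return GRAPHICS_VERx100(xe) < 1270 && !IS_DGFX(xe);
	}

	bool xe_ttm_stolen_cpu_inaccessible(struct xe_device *xe)
	{
		...
		return !mgr->io_base ||
		       xe_ttm_stolen_cpu_access_needs_ggtt(xe);
	}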

> diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h
> index 2fda97b97a05..e210dada636e 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h
> +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.h
> @@ -15,6 +15,7 @@ struct xe_device;
>   void xe_ttm_stolen_mgr_init(struct xe_device *xe);
>   int xe_ttm_stolen_io_mem_reserve(struct xe_device *xe, struct ttm_resource *mem);
>   bool xe_ttm_stolen_cpu_inaccessible(struct xe_device *xe);
> +bool xe_ttm_stolen_cpu_access_needs_ggtt(struct xe_device *xe);
>   u64 xe_ttm_stolen_io_offset(struct xe_bo *bo, u32 offset);
>   u64 xe_ttm_stolen_gpu_offset(struct xe_device *xe);
>   



* Re: [Intel-xe] [PATCH v2 5/6] drm/xe/buddy: add visible tracking
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 5/6] drm/xe/buddy: add visible tracking Matthew Auld
@ 2023-03-20 12:08   ` Gwan-gyeong Mun
  2023-03-20 13:03     ` Matthew Auld
  0 siblings, 1 reply; 14+ messages in thread
From: Gwan-gyeong Mun @ 2023-03-20 12:08 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Lucas De Marchi



On 3/8/23 12:17 PM, Matthew Auld wrote:
> Replace the allocation code with the i915 version. This simplifies the
> code a little, and importantly we get the accounting at the mgr level,
> which is useful for debug (and maybe userspace), plus per-resource
> tracking so we can easily check if a resource is using one or more pages
> in the mappable part of vram (useful for eviction), or if the resource is
> completely within the mappable portion (useful for checking if the
> resource can be safely CPU mapped).
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c     |  18 +-
>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       | 201 ++++++++++-----------
>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.h       |   3 +-
>   drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |   6 +
>   4 files changed, 118 insertions(+), 110 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
> index e608b49072ba..35e6bc62766e 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
> @@ -158,7 +158,7 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>   {
>   	struct xe_ttm_stolen_mgr *mgr = drmm_kzalloc(&xe->drm, sizeof(*mgr), GFP_KERNEL);
>   	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
> -	u64 stolen_size, pgsize;
> +	u64 stolen_size, io_size, pgsize;
>   	int err;
>   
>   	if (IS_DGFX(xe))
> @@ -177,7 +177,17 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>   	if (pgsize < PAGE_SIZE)
>   		pgsize = PAGE_SIZE;
>   
> -	err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, stolen_size, pgsize);
> +	/*
> +	 * We don't attempt partial visible support for stolen vram, since
> +	 * stolen is always at the end of vram, and with small-bar the BAR
> +	 * size is pretty much always 256M.
> +	 */
> +	io_size = 0;
> +	if (!xe_ttm_stolen_cpu_inaccessible(xe))
> +		io_size = stolen_size;
> +
> +	err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, stolen_size,
> +				     io_size, pgsize);
>   	if (err) {
>   		drm_dbg_kms(&xe->drm, "Stolen mgr init failed: %i\n", err);
>   		return;
> @@ -186,8 +196,8 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>   	drm_dbg_kms(&xe->drm, "Initialized stolen memory support with %llu bytes\n",
>   		    stolen_size);
>   
> -	if (!xe_ttm_stolen_cpu_inaccessible(xe))
> -		mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, stolen_size);
> +	if (io_size)
> +		mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, io_size);
>   }
>   
>   u64 xe_ttm_stolen_io_offset(struct xe_bo *bo, u32 offset)
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> index e3ac8c6d3978..f703e962f499 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> @@ -49,45 +49,26 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
>   			       const struct ttm_place *place,
>   			       struct ttm_resource **res)
>   {
> -	u64 max_bytes, cur_size, min_block_size;
>   	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
>   	struct xe_ttm_vram_mgr_resource *vres;
> -	u64 size, remaining_size, lpfn, fpfn;
>   	struct drm_buddy *mm = &mgr->mm;
> -	unsigned long pages_per_block;
> -	int r;
> -
> -	lpfn = (u64)place->lpfn << PAGE_SHIFT;
> -	if (!lpfn || lpfn > man->size)
> -		lpfn = man->size;
> -
> -	fpfn = (u64)place->fpfn << PAGE_SHIFT;
> +	u64 size, remaining_size, min_page_size;
> +	unsigned long lpfn;
> +	int err;
>   
> -	max_bytes = mgr->manager.size;
> -	if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
> -		pages_per_block = ~0ul;
> -	} else {
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -		pages_per_block = HPAGE_PMD_NR;
> -#else
> -		/* default to 2MB */
> -		pages_per_block = 2UL << (20UL - PAGE_SHIFT);
> -#endif
> -
> -		pages_per_block = max_t(uint32_t, pages_per_block,
> -					tbo->page_alignment);
> -	}
> +	lpfn = place->lpfn;
> +	if (!lpfn || lpfn > man->size >> PAGE_SHIFT)
> +		lpfn = man->size >> PAGE_SHIFT;
>   
>   	vres = kzalloc(sizeof(*vres), GFP_KERNEL);
>   	if (!vres)
>   		return -ENOMEM;
>   
>   	ttm_resource_init(tbo, place, &vres->base);
> -	remaining_size = vres->base.size;
>   
>   	/* bail out quickly if there's likely not enough VRAM for this BO */
> -	if (ttm_resource_manager_usage(man) > max_bytes) {
> -		r = -ENOSPC;
> +	if (ttm_resource_manager_usage(man) > man->size) {
> +		err = -ENOSPC;
>   		goto error_fini;
>   	}
>   
> @@ -96,95 +77,91 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
>   	if (place->flags & TTM_PL_FLAG_TOPDOWN)
>   		vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
>   
> -	if (fpfn || lpfn != man->size)
> -		/* Allocate blocks in desired range */
> +	if (place->fpfn || lpfn != man->size >> PAGE_SHIFT)
>   		vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
>   
> -	mutex_lock(&mgr->lock);
> -	while (remaining_size) {
> -		if (tbo->page_alignment)
> -			min_block_size = tbo->page_alignment << PAGE_SHIFT;
> -		else
> -			min_block_size = mgr->default_page_size;
> -
> -		XE_BUG_ON(min_block_size < mm->chunk_size);
> -
> -		/* Limit maximum size to 2GiB due to SG table limitations */
> -		size = min(remaining_size, 2ULL << 30);
> -
> -		if (size >= pages_per_block << PAGE_SHIFT)
> -			min_block_size = pages_per_block << PAGE_SHIFT;
> -
> -		cur_size = size;
> -
> -		if (fpfn + size != place->lpfn << PAGE_SHIFT) {
> -			/*
> -			 * Except for actual range allocation, modify the size and
> -			 * min_block_size conforming to continuous flag enablement
> -			 */
> -			if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
> -				size = roundup_pow_of_two(size);
> -				min_block_size = size;
> -			/*
> -			 * Modify the size value if size is not
> -			 * aligned with min_block_size
> -			 */
> -			} else if (!IS_ALIGNED(size, min_block_size)) {
> -				size = round_up(size, min_block_size);
> -			}
> -		}
> +	XE_BUG_ON(!vres->base.size);
> +	size = vres->base.size;
>   
> -		r = drm_buddy_alloc_blocks(mm, fpfn,
> -					   lpfn,
> -					   size,
> -					   min_block_size,
> -					   &vres->blocks,
> -					   vres->flags);
> -		if (unlikely(r))
> -			goto error_free_blocks;
> +	min_page_size = mgr->default_page_size;
> +	if (tbo->page_alignment)
> +		min_page_size = tbo->page_alignment << PAGE_SHIFT;
>   
> -		if (size > remaining_size)
> -			remaining_size = 0;
> -		else
> -			remaining_size -= size;
> +	XE_BUG_ON(min_page_size < mm->chunk_size);
> +	XE_BUG_ON(min_page_size > SZ_2G); /* FIXME: sg limit */
> +	XE_BUG_ON(size > SZ_2G &&
> +		  (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS));
Hi Matt,
What scenario are these checks meant to catch? Is the purpose to report an 
error for the small-bar case at the current stage?
> +	XE_BUG_ON(!IS_ALIGNED(size, min_page_size));
If it is not aligned, can't we just round it up to the aligned size and 
allocate that?
> +
> +	if (place->fpfn + (vres->base.size >> PAGE_SHIFT) != place->lpfn &&
> +	    place->flags & TTM_PL_FLAG_CONTIGUOUS) {
> +		unsigned long pages;
> +
> +		size = roundup_pow_of_two(size);
> +		min_page_size = size;
> +
> +		pages = size >> ilog2(mm->chunk_size);
Why did you use ilog2(mm->chunk_size) instead of PAGE_SHIFT here?
> +		if (pages > lpfn)
> +			lpfn = pages;
>   	}
> -	mutex_unlock(&mgr->lock);
>   
> -	if (cur_size != size) {
> -		struct drm_buddy_block *block;
> -		struct list_head *trim_list;
> -		u64 original_size;
> -		LIST_HEAD(temp);
> +	if (size > lpfn << PAGE_SHIFT) {
In the case of DRM_BUDDY_RANGE_ALLOCATION, shouldn't it be compared with 
size > (lpfn - fpfn) << PAGE_SHIFT?

Br,

G.G.
> +		err = -E2BIG; /* don't trigger eviction */
> +		goto error_fini;
> +	}
>   
> -		trim_list = &vres->blocks;
> -		original_size = vres->base.size;
> +	mutex_lock(&mgr->lock);
> +	if (lpfn <= mgr->visible_size && size > mgr->visible_avail) {
> +		mutex_unlock(&mgr->lock);
> +		err = -ENOSPC;
> +		goto error_fini;
> +	}
>   
> +	remaining_size = size;
> +	do {
>   		/*
> -		 * If size value is rounded up to min_block_size, trim the last
> -		 * block to the required size
> +		 * Limit maximum size to 2GiB due to SG table limitations.
> +		 * FIXME: Should maybe be handled as part of sg construction.
>   		 */
> -		if (!list_is_singular(&vres->blocks)) {
> -			block = list_last_entry(&vres->blocks, typeof(*block), link);
> -			list_move_tail(&block->link, &temp);
> -			trim_list = &temp;
> -			/*
> -			 * Compute the original_size value by subtracting the
> -			 * last block size with (aligned size - original size)
> -			 */
> -			original_size = drm_buddy_block_size(mm, block) -
> -				(size - cur_size);
> -		}
> +		u64 alloc_size = min_t(u64, remaining_size, SZ_2G);
> +
> +		err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
> +					     (u64)lpfn << PAGE_SHIFT,
> +					     alloc_size,
> +					     min_page_size,
> +					     &vres->blocks,
> +					     vres->flags);
> +		if (err)
> +			goto error_free_blocks;
>   
> -		mutex_lock(&mgr->lock);
> -		drm_buddy_block_trim(mm,
> -				     original_size,
> -				     trim_list);
> -		mutex_unlock(&mgr->lock);
> +		remaining_size -= alloc_size;
> +	} while (remaining_size);
>   
> -		if (!list_empty(&temp))
> -			list_splice_tail(trim_list, &vres->blocks);
> +	if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
> +		if (!drm_buddy_block_trim(mm, vres->base.size, &vres->blocks))
> +			size = vres->base.size;
>   	}
>   
> +	if (lpfn <= mgr->visible_size >> PAGE_SHIFT) {
> +		vres->used_visible_size = size;
> +	} else {
> +		struct drm_buddy_block *block;
> +
> +		list_for_each_entry(block, &vres->blocks, link) {
> +			u64 start = drm_buddy_block_offset(block);
> +
> +			if (start < mgr->visible_size) {
> +				u64 end = start + drm_buddy_block_size(mm, block);
> +
> +				vres->used_visible_size +=
> +					min(end, mgr->visible_size) - start;
> +			}
> +		}
> +	}
> +
> +	mgr->visible_avail -= vres->used_visible_size;
> +	mutex_unlock(&mgr->lock);
> +
>   	if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
>   	    xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
>   		vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
> @@ -213,7 +190,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
>   	ttm_resource_fini(man, &vres->base);
>   	kfree(vres);
>   
> -	return r;
> +	return err;
>   }
>   
>   static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
> @@ -226,6 +203,7 @@ static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
>   
>   	mutex_lock(&mgr->lock);
>   	drm_buddy_free_list(mm, &vres->blocks);
> +	mgr->visible_avail += vres->used_visible_size;
>   	mutex_unlock(&mgr->lock);
>   
>   	ttm_resource_fini(man, res);
> @@ -240,6 +218,13 @@ static void xe_ttm_vram_mgr_debug(struct ttm_resource_manager *man,
>   	struct drm_buddy *mm = &mgr->mm;
>   
>   	mutex_lock(&mgr->lock);
> +	drm_printf(printer, "default_page_size: %lluKiB\n",
> +		   mgr->default_page_size >> 10);
> +	drm_printf(printer, "visible_avail: %lluMiB\n",
> +		   (u64)mgr->visible_avail >> 20);
> +	drm_printf(printer, "visible_size: %lluMiB\n",
> +		   (u64)mgr->visible_size >> 20);
> +
>   	drm_buddy_print(mm, printer);
>   	mutex_unlock(&mgr->lock);
>   	drm_printf(printer, "man size:%llu\n", man->size);
> @@ -262,6 +247,8 @@ static void ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
>   	if (ttm_resource_manager_evict_all(&xe->ttm, man))
>   		return;
>   
> +	WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
> +
>   	drm_buddy_fini(&mgr->mm);
>   
>   	ttm_resource_manager_cleanup(&mgr->manager);
> @@ -270,7 +257,8 @@ static void ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
>   }
>   
>   int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
> -			   u32 mem_type, u64 size, u64 default_page_size)
> +			   u32 mem_type, u64 size, u64 io_size,
> +			   u64 default_page_size)
>   {
>   	struct ttm_resource_manager *man = &mgr->manager;
>   	int err;
> @@ -279,6 +267,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
>   	mgr->mem_type = mem_type;
>   	mutex_init(&mgr->lock);
>   	mgr->default_page_size = default_page_size;
> +	mgr->visible_size = io_size;
> +	mgr->visible_avail = io_size;
>   
>   	ttm_resource_manager_init(man, &xe->ttm, size);
>   	err = drm_buddy_init(&mgr->mm, man->size, default_page_size);
> @@ -298,7 +288,8 @@ int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct xe_ttm_vram_mgr *mgr)
>   	mgr->gt = gt;
>   
>   	return __xe_ttm_vram_mgr_init(xe, mgr, XE_PL_VRAM0 + gt->info.vram_id,
> -				      gt->mem.vram.size, PAGE_SIZE);
> +				      gt->mem.vram.size, gt->mem.vram.io_size,
> +				      PAGE_SIZE);
>   }
>   
>   int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> index 78f332d26224..35e5367a79fb 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> @@ -13,7 +13,8 @@ struct xe_device;
>   struct xe_gt;
>   
>   int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
> -			   u32 mem_type, u64 size, u64 default_page_size);
> +			   u32 mem_type, u64 size, u64 io_size,
> +			   u64 default_page_size);
>   int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct xe_ttm_vram_mgr *mgr);
>   int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
>   			      struct ttm_resource *res,
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> index 39aa2ec1b968..3d9417ff7434 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> @@ -23,6 +23,10 @@ struct xe_ttm_vram_mgr {
>   	struct ttm_resource_manager manager;
>   	/** @mm: DRM buddy allocator which manages the VRAM */
>   	struct drm_buddy mm;
> +	/** @visible_size: Probed size of the CPU visible portion */
> +	u64 visible_size;
> +	/** @visible_avail: CPU visible portion still unallocated */
> +	u64 visible_avail;
>   	/** @default_page_size: default page size */
>   	u64 default_page_size;
>   	/** @lock: protects allocations of VRAM */
> @@ -39,6 +43,8 @@ struct xe_ttm_vram_mgr_resource {
>   	struct ttm_resource base;
>   	/** @blocks: list of DRM buddy blocks */
>   	struct list_head blocks;
> +	/** @used_visible_size: How many CPU visible bytes this resource is using */
> +	u64 used_visible_size;
>   	/** @flags: flags associated with the resource */
>   	unsigned long flags;
>   };

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2 5/6] drm/xe/buddy: add visible tracking
  2023-03-20 12:08   ` Gwan-gyeong Mun
@ 2023-03-20 13:03     ` Matthew Auld
  2023-03-20 14:26       ` Gwan-gyeong Mun
  0 siblings, 1 reply; 14+ messages in thread
From: Matthew Auld @ 2023-03-20 13:03 UTC (permalink / raw)
  To: Gwan-gyeong Mun, intel-xe; +Cc: Lucas De Marchi

On 20/03/2023 12:08, Gwan-gyeong Mun wrote:
> 
> 
> On 3/8/23 12:17 PM, Matthew Auld wrote:
>> Replace the allocation code with the i915 version. This simplifies the
>> code a little, and importantly we get the accounting at the mgr level,
>> which is useful for debug (and maybe userspace), plus per resource
>> tracking so we can easily check if a resource is using one or more pages in
>> the mappable part of vram (useful for eviction), or if the resource is
>> completely within the mappable portion (useful for checking if the
>> resource can be safely CPU mapped).
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c     |  18 +-
>>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       | 201 ++++++++++-----------
>>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.h       |   3 +-
>>   drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |   6 +
>>   4 files changed, 118 insertions(+), 110 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c 
>> b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>> index e608b49072ba..35e6bc62766e 100644
>> --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>> +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>> @@ -158,7 +158,7 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>>   {
>>       struct xe_ttm_stolen_mgr *mgr = drmm_kzalloc(&xe->drm, 
>> sizeof(*mgr), GFP_KERNEL);
>>       struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
>> -    u64 stolen_size, pgsize;
>> +    u64 stolen_size, io_size, pgsize;
>>       int err;
>>       if (IS_DGFX(xe))
>> @@ -177,7 +177,17 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>>       if (pgsize < PAGE_SIZE)
>>           pgsize = PAGE_SIZE;
>> -    err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, 
>> stolen_size, pgsize);
>> +    /*
>> +     * We don't attempt partial visible support for stolen vram,
>> +     * since stolen is always at the end of vram, and the BAR size is 
>> pretty
>> +     * much always 256M, with small-bar.
>> +     */
>> +    io_size = 0;
>> +    if (!xe_ttm_stolen_cpu_inaccessible(xe))
>> +        io_size = stolen_size;
>> +
>> +    err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, 
>> stolen_size,
>> +                     io_size, pgsize);
>>       if (err) {
>>           drm_dbg_kms(&xe->drm, "Stolen mgr init failed: %i\n", err);
>>           return;
>> @@ -186,8 +196,8 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>>       drm_dbg_kms(&xe->drm, "Initialized stolen memory support with 
>> %llu bytes\n",
>>               stolen_size);
>> -    if (!xe_ttm_stolen_cpu_inaccessible(xe))
>> -        mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, 
>> stolen_size);
>> +    if (io_size)
>> +        mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, 
>> io_size);
>>   }
>>   u64 xe_ttm_stolen_io_offset(struct xe_bo *bo, u32 offset)
>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c 
>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> index e3ac8c6d3978..f703e962f499 100644
>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> @@ -49,45 +49,26 @@ static int xe_ttm_vram_mgr_new(struct 
>> ttm_resource_manager *man,
>>                      const struct ttm_place *place,
>>                      struct ttm_resource **res)
>>   {
>> -    u64 max_bytes, cur_size, min_block_size;
>>       struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
>>       struct xe_ttm_vram_mgr_resource *vres;
>> -    u64 size, remaining_size, lpfn, fpfn;
>>       struct drm_buddy *mm = &mgr->mm;
>> -    unsigned long pages_per_block;
>> -    int r;
>> -
>> -    lpfn = (u64)place->lpfn << PAGE_SHIFT;
>> -    if (!lpfn || lpfn > man->size)
>> -        lpfn = man->size;
>> -
>> -    fpfn = (u64)place->fpfn << PAGE_SHIFT;
>> +    u64 size, remaining_size, min_page_size;
>> +    unsigned long lpfn;
>> +    int err;
>> -    max_bytes = mgr->manager.size;
>> -    if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>> -        pages_per_block = ~0ul;
>> -    } else {
>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> -        pages_per_block = HPAGE_PMD_NR;
>> -#else
>> -        /* default to 2MB */
>> -        pages_per_block = 2UL << (20UL - PAGE_SHIFT);
>> -#endif
>> -
>> -        pages_per_block = max_t(uint32_t, pages_per_block,
>> -                    tbo->page_alignment);
>> -    }
>> +    lpfn = place->lpfn;
>> +    if (!lpfn || lpfn > man->size >> PAGE_SHIFT)
>> +        lpfn = man->size >> PAGE_SHIFT;
>>       vres = kzalloc(sizeof(*vres), GFP_KERNEL);
>>       if (!vres)
>>           return -ENOMEM;
>>       ttm_resource_init(tbo, place, &vres->base);
>> -    remaining_size = vres->base.size;
>>       /* bail out quickly if there's likely not enough VRAM for this 
>> BO */
>> -    if (ttm_resource_manager_usage(man) > max_bytes) {
>> -        r = -ENOSPC;
>> +    if (ttm_resource_manager_usage(man) > man->size) {
>> +        err = -ENOSPC;
>>           goto error_fini;
>>       }
>> @@ -96,95 +77,91 @@ static int xe_ttm_vram_mgr_new(struct 
>> ttm_resource_manager *man,
>>       if (place->flags & TTM_PL_FLAG_TOPDOWN)
>>           vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
>> -    if (fpfn || lpfn != man->size)
>> -        /* Allocate blocks in desired range */
>> +    if (place->fpfn || lpfn != man->size >> PAGE_SHIFT)
>>           vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
>> -    mutex_lock(&mgr->lock);
>> -    while (remaining_size) {
>> -        if (tbo->page_alignment)
>> -            min_block_size = tbo->page_alignment << PAGE_SHIFT;
>> -        else
>> -            min_block_size = mgr->default_page_size;
>> -
>> -        XE_BUG_ON(min_block_size < mm->chunk_size);
>> -
>> -        /* Limit maximum size to 2GiB due to SG table limitations */
>> -        size = min(remaining_size, 2ULL << 30);
>> -
>> -        if (size >= pages_per_block << PAGE_SHIFT)
>> -            min_block_size = pages_per_block << PAGE_SHIFT;
>> -
>> -        cur_size = size;
>> -
>> -        if (fpfn + size != place->lpfn << PAGE_SHIFT) {
>> -            /*
>> -             * Except for actual range allocation, modify the size and
>> -             * min_block_size conforming to continuous flag enablement
>> -             */
>> -            if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>> -                size = roundup_pow_of_two(size);
>> -                min_block_size = size;
>> -            /*
>> -             * Modify the size value if size is not
>> -             * aligned with min_block_size
>> -             */
>> -            } else if (!IS_ALIGNED(size, min_block_size)) {
>> -                size = round_up(size, min_block_size);
>> -            }
>> -        }
>> +    XE_BUG_ON(!vres->base.size);
>> +    size = vres->base.size;
>> -        r = drm_buddy_alloc_blocks(mm, fpfn,
>> -                       lpfn,
>> -                       size,
>> -                       min_block_size,
>> -                       &vres->blocks,
>> -                       vres->flags);
>> -        if (unlikely(r))
>> -            goto error_free_blocks;
>> +    min_page_size = mgr->default_page_size;
>> +    if (tbo->page_alignment)
>> +        min_page_size = tbo->page_alignment << PAGE_SHIFT;
>> -        if (size > remaining_size)
>> -            remaining_size = 0;
>> -        else
>> -            remaining_size -= size;
>> +    XE_BUG_ON(min_page_size < mm->chunk_size);
>> +    XE_BUG_ON(min_page_size > SZ_2G); /* FIXME: sg limit */
>> +    XE_BUG_ON(size > SZ_2G &&
>> +          (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS));
> Hi Matt,
> What scenario are these checks meant to catch? Is the purpose to report
> an error for the small-bar case at the current stage?

XE_BUG_ON(min_page_size < mm->chunk_size) - This one is just a general 
sanity check, which seemed good to add. The chunk_size is the smallest 
chunk of memory that we can allocate, so if we ask for something 
smaller, that's likely a bug.

XE_BUG_ON(min_page_size > SZ_2G) - This is also not small-bar related, 
but an existing limitation. If you look below, the allocation is split 
into a maximum allocation size of 2G (so that it can always fit in an sg 
entry without overflowing). If someone in the future asks for a 
page_size > 2G, then the alignment might not match due to this 
splitting, hence it's better to treat that as an error. This one has a 
FIXME, since the code that constructs the sg table could be improved to 
handle the limitation there, instead of leaking it into the buddy code.

XE_BUG_ON(size > SZ_2G && (vres->base.placement & 
TTM_PL_FLAG_CONTIGUOUS)) - And this is basically the same as above. We 
can't have a chunk larger than 2G AFAICT, so if someone asks for 
CONTIG in such a case, the underlying allocation might not be 
contiguous, since we need to make at least two separate allocation 
requests. But again, I don't think anyone currently needs such a thing, 
so it's best to just document/assert such limitations.
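
To make the splitting concrete, here is a rough sketch of the loop in 
question (illustrative only; start/end stand in for the 
(u64)place->fpfn << PAGE_SHIFT and (u64)lpfn << PAGE_SHIFT byte range, 
and the flow is simplified from the actual patch):

	/*
	 * Each drm_buddy_alloc_blocks() call is capped at SZ_2G so that
	 * every resulting block fits in a single sg entry. A 4G request
	 * with min_page_size = 4G would be split into two 2G pieces, so
	 * the second piece is only guaranteed 2G alignment, and the two
	 * pieces are not guaranteed to be physically contiguous, hence
	 * the asserts above.
	 */
	u64 remaining = size;

	do {
		u64 alloc_size = min_t(u64, remaining, SZ_2G);

		err = drm_buddy_alloc_blocks(mm, start, end, alloc_size,
					     min_page_size, &vres->blocks,
					     vres->flags);
		if (err)
			goto error_free_blocks;

		remaining -= alloc_size;
	} while (remaining);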

>> +    XE_BUG_ON(!IS_ALIGNED(size, min_page_size));
> If it is not aligned, can't we just round it up to the aligned size and
> allocate that?

Yeah, that's one possibility. It's just that the existing behaviour is 
to leave that as the responsibility of the caller. AFAICT this is for 
now mostly relevant on platforms where we need a 64K min page size for 
VRAM, and there the object size is meant to be rounded up, so we don't 
currently expect anyone to trigger this, unless there is a bug.
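
In other words, the expectation is that the caller has already done 
something like the below before the request reaches the vram mgr 
(hypothetical snippet; the SZ_64K value and requested_size name are 
just for illustration):

	/*
	 * Hypothetical caller-side handling: on platforms needing a 64K
	 * minimum page size for VRAM, round the BO size up front so the
	 * IS_ALIGNED() assert in the allocator never fires.
	 */
	u64 min_page_size = SZ_64K;	/* illustrative value */
	u64 bo_size = round_up(requested_size, min_page_size);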

>> +
>> +    if (place->fpfn + (vres->base.size >> PAGE_SHIFT) != place->lpfn &&
>> +        place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>> +        unsigned long pages;
>> +
>> +        size = roundup_pow_of_two(size);
>> +        min_page_size = size;
>> +
>> +        pages = size >> ilog2(mm->chunk_size);
> Why did you use ilog2(mm->chunk_size) instead of PAGE_SHIFT here?

Yeah, it looks like the existing i915 code mixes chunk_size and 
PAGE_SHIFT. I think back in the day the buddy allocator just accepted 
the order, but it now takes a number of bytes, so mixing the two 
doesn't make much sense. Will fix. Thanks for pointing this out.
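
For the record, with chunk_size == PAGE_SIZE the two happen to agree, 
which is presumably why this went unnoticed. The fix should look 
something like this (sketch only):

	/*
	 * lpfn is in units of PAGE_SIZE pages, so the page count must be
	 * derived with PAGE_SHIFT. With e.g. a 64K chunk_size,
	 * size >> ilog2(mm->chunk_size) would instead give the number of
	 * 64K chunks, under-counting the pages.
	 */
	pages = size >> PAGE_SHIFT;
	if (pages > lpfn)
		lpfn = pages;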

>> +        if (pages > lpfn)
>> +            lpfn = pages;
>>       }
>> -    mutex_unlock(&mgr->lock);
>> -    if (cur_size != size) {
>> -        struct drm_buddy_block *block;
>> -        struct list_head *trim_list;
>> -        u64 original_size;
>> -        LIST_HEAD(temp);
>> +    if (size > lpfn << PAGE_SHIFT) {
> In the case of DRM_BUDDY_RANGE_ALLOCATION, shouldn't it be compared with
> size > (lpfn - fpfn) << PAGE_SHIFT?

Yeah, that makes more sense. Also, this check should be moved higher 
up. Thanks for reviewing.
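
Something along these lines, done before any of the contiguous 
handling (sketch, untested):

	/*
	 * Account for the restricted [fpfn, lpfn) range when checking
	 * whether the request can fit at all.
	 */
	if (size > ((u64)lpfn - place->fpfn) << PAGE_SHIFT) {
		err = -E2BIG; /* don't trigger eviction */
		goto error_fini;
	}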

> 
> Br,
> 
> G.G.
>> +        err = -E2BIG; /* don't trigger eviction */
>> +        goto error_fini;
>> +    }
>> -        trim_list = &vres->blocks;
>> -        original_size = vres->base.size;
>> +    mutex_lock(&mgr->lock);
>> +    if (lpfn <= mgr->visible_size && size > mgr->visible_avail) {
>> +        mutex_unlock(&mgr->lock);
>> +        err = -ENOSPC;
>> +        goto error_fini;
>> +    }
>> +    remaining_size = size;
>> +    do {
>>           /*
>> -         * If size value is rounded up to min_block_size, trim the last
>> -         * block to the required size
>> +         * Limit maximum size to 2GiB due to SG table limitations.
>> +         * FIXME: Should maybe be handled as part of sg construction.
>>            */
>> -        if (!list_is_singular(&vres->blocks)) {
>> -            block = list_last_entry(&vres->blocks, typeof(*block), 
>> link);
>> -            list_move_tail(&block->link, &temp);
>> -            trim_list = &temp;
>> -            /*
>> -             * Compute the original_size value by subtracting the
>> -             * last block size with (aligned size - original size)
>> -             */
>> -            original_size = drm_buddy_block_size(mm, block) -
>> -                (size - cur_size);
>> -        }
>> +        u64 alloc_size = min_t(u64, remaining_size, SZ_2G);
>> +
>> +        err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
>> +                         (u64)lpfn << PAGE_SHIFT,
>> +                         alloc_size,
>> +                         min_page_size,
>> +                         &vres->blocks,
>> +                         vres->flags);
>> +        if (err)
>> +            goto error_free_blocks;
>> -        mutex_lock(&mgr->lock);
>> -        drm_buddy_block_trim(mm,
>> -                     original_size,
>> -                     trim_list);
>> -        mutex_unlock(&mgr->lock);
>> +        remaining_size -= alloc_size;
>> +    } while (remaining_size);
>> -        if (!list_empty(&temp))
>> -            list_splice_tail(trim_list, &vres->blocks);
>> +    if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>> +        if (!drm_buddy_block_trim(mm, vres->base.size, &vres->blocks))
>> +            size = vres->base.size;
>>       }
>> +    if (lpfn <= mgr->visible_size >> PAGE_SHIFT) {
>> +        vres->used_visible_size = size;
>> +    } else {
>> +        struct drm_buddy_block *block;
>> +
>> +        list_for_each_entry(block, &vres->blocks, link) {
>> +            u64 start = drm_buddy_block_offset(block);
>> +
>> +            if (start < mgr->visible_size) {
>> +                u64 end = start + drm_buddy_block_size(mm, block);
>> +
>> +                vres->used_visible_size +=
>> +                    min(end, mgr->visible_size) - start;
>> +            }
>> +        }
>> +    }
>> +
>> +    mgr->visible_avail -= vres->used_visible_size;
>> +    mutex_unlock(&mgr->lock);
>> +
>>       if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
>>           xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
>>           vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
>> @@ -213,7 +190,7 @@ static int xe_ttm_vram_mgr_new(struct 
>> ttm_resource_manager *man,
>>       ttm_resource_fini(man, &vres->base);
>>       kfree(vres);
>> -    return r;
>> +    return err;
>>   }
>>   static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
>> @@ -226,6 +203,7 @@ static void xe_ttm_vram_mgr_del(struct 
>> ttm_resource_manager *man,
>>       mutex_lock(&mgr->lock);
>>       drm_buddy_free_list(mm, &vres->blocks);
>> +    mgr->visible_avail += vres->used_visible_size;
>>       mutex_unlock(&mgr->lock);
>>       ttm_resource_fini(man, res);
>> @@ -240,6 +218,13 @@ static void xe_ttm_vram_mgr_debug(struct 
>> ttm_resource_manager *man,
>>       struct drm_buddy *mm = &mgr->mm;
>>       mutex_lock(&mgr->lock);
>> +    drm_printf(printer, "default_page_size: %lluKiB\n",
>> +           mgr->default_page_size >> 10);
>> +    drm_printf(printer, "visible_avail: %lluMiB\n",
>> +           (u64)mgr->visible_avail >> 20);
>> +    drm_printf(printer, "visible_size: %lluMiB\n",
>> +           (u64)mgr->visible_size >> 20);
>> +
>>       drm_buddy_print(mm, printer);
>>       mutex_unlock(&mgr->lock);
>>       drm_printf(printer, "man size:%llu\n", man->size);
>> @@ -262,6 +247,8 @@ static void ttm_vram_mgr_fini(struct drm_device 
>> *dev, void *arg)
>>       if (ttm_resource_manager_evict_all(&xe->ttm, man))
>>           return;
>> +    WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
>> +
>>       drm_buddy_fini(&mgr->mm);
>>       ttm_resource_manager_cleanup(&mgr->manager);
>> @@ -270,7 +257,8 @@ static void ttm_vram_mgr_fini(struct drm_device 
>> *dev, void *arg)
>>   }
>>   int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct 
>> xe_ttm_vram_mgr *mgr,
>> -               u32 mem_type, u64 size, u64 default_page_size)
>> +               u32 mem_type, u64 size, u64 io_size,
>> +               u64 default_page_size)
>>   {
>>       struct ttm_resource_manager *man = &mgr->manager;
>>       int err;
>> @@ -279,6 +267,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, 
>> struct xe_ttm_vram_mgr *mgr,
>>       mgr->mem_type = mem_type;
>>       mutex_init(&mgr->lock);
>>       mgr->default_page_size = default_page_size;
>> +    mgr->visible_size = io_size;
>> +    mgr->visible_avail = io_size;
>>       ttm_resource_manager_init(man, &xe->ttm, size);
>>       err = drm_buddy_init(&mgr->mm, man->size, default_page_size);
>> @@ -298,7 +288,8 @@ int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct 
>> xe_ttm_vram_mgr *mgr)
>>       mgr->gt = gt;
>>       return __xe_ttm_vram_mgr_init(xe, mgr, XE_PL_VRAM0 + 
>> gt->info.vram_id,
>> -                      gt->mem.vram.size, PAGE_SIZE);
>> +                      gt->mem.vram.size, gt->mem.vram.io_size,
>> +                      PAGE_SIZE);
>>   }
>>   int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h 
>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>> index 78f332d26224..35e5367a79fb 100644
>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>> @@ -13,7 +13,8 @@ struct xe_device;
>>   struct xe_gt;
>>   int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct 
>> xe_ttm_vram_mgr *mgr,
>> -               u32 mem_type, u64 size, u64 default_page_size);
>> +               u32 mem_type, u64 size, u64 io_size,
>> +               u64 default_page_size);
>>   int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct xe_ttm_vram_mgr 
>> *mgr);
>>   int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
>>                     struct ttm_resource *res,
>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h 
>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>> index 39aa2ec1b968..3d9417ff7434 100644
>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>> @@ -23,6 +23,10 @@ struct xe_ttm_vram_mgr {
>>       struct ttm_resource_manager manager;
>>       /** @mm: DRM buddy allocator which manages the VRAM */
>>       struct drm_buddy mm;
>> +    /** @visible_size: Probed size of the CPU visible portion */
>> +    u64 visible_size;
>> +    /** @visible_avail: CPU visible portion still unallocated */
>> +    u64 visible_avail;
>>       /** @default_page_size: default page size */
>>       u64 default_page_size;
>>       /** @lock: protects allocations of VRAM */
>> @@ -39,6 +43,8 @@ struct xe_ttm_vram_mgr_resource {
>>       struct ttm_resource base;
>>       /** @blocks: list of DRM buddy blocks */
>>       struct list_head blocks;
>> +    /** @used_visible_size: How many CPU visible bytes this resource 
>> is using */
>> +    u64 used_visible_size;
>>       /** @flags: flags associated with the resource */
>>       unsigned long flags;
>>   };

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2 6/6] drm/xe/buddy: add compatible and intersects hooks
  2023-03-08 10:17 ` [Intel-xe] [PATCH v2 6/6] drm/xe/buddy: add compatible and intersects hooks Matthew Auld
@ 2023-03-20 14:15   ` Gwan-gyeong Mun
  2023-03-20 14:25     ` Matthew Auld
  0 siblings, 1 reply; 14+ messages in thread
From: Gwan-gyeong Mun @ 2023-03-20 14:15 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Lucas De Marchi



On 3/8/23 12:17 PM, Matthew Auld wrote:
> Copy this from i915. We need .compatible for lmem -> lmem transfers, so
> they don't just get nooped by ttm if we need to move something from
> mappable to non-mappable or vice versa. The .intersects is needed for
> eviction, to determine if a victim resource is worth evicting, e.g. if we
> need mappable space there is no point in evicting a resource that has
> zero mappable pages.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 62 ++++++++++++++++++++++++++++
>   1 file changed, 62 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> index f703e962f499..8dd33ac65499 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> @@ -230,9 +230,71 @@ static void xe_ttm_vram_mgr_debug(struct ttm_resource_manager *man,
>   	drm_printf(printer, "man size:%llu\n", man->size);
>   }
>   
> +static bool xe_ttm_vram_mgr_intersects(struct ttm_resource_manager *man,
> +				       struct ttm_resource *res,
> +				       const struct ttm_place *place,
> +				       size_t size)
> +{
> +	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
> +	struct xe_ttm_vram_mgr_resource *vres =
> +		to_xe_ttm_vram_mgr_resource(res);
> +	struct drm_buddy *mm = &mgr->mm;
> +	struct drm_buddy_block *block;
> +
> +	if (!place->fpfn && !place->lpfn)
> +		return true;
> +

Hi Matt,
I understand that ttm_resource_intersects() returns true in the 
following case:

if (!place || !man->func->intersects)
		return true;

so this line handles it in the same way, but in a real scenario, is it 
possible to have a situation where place->fpfn == 0 and place->lpfn == 0?

Apart from the same check being repeated in 
xe_ttm_vram_mgr_compatible(), the rest of it looks good.

br,
G.G.
> +	if (!place->fpfn && place->lpfn == mgr->visible_size >> PAGE_SHIFT)
> +		return vres->used_visible_size > 0;
> +
> +	list_for_each_entry(block, &vres->blocks, link) {
> +		unsigned long fpfn =
> +			drm_buddy_block_offset(block) >> PAGE_SHIFT;
> +		unsigned long lpfn = fpfn +
> +			(drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
> +
> +		if (place->fpfn < lpfn && place->lpfn > fpfn)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +static bool xe_ttm_vram_mgr_compatible(struct ttm_resource_manager *man,
> +				       struct ttm_resource *res,
> +				       const struct ttm_place *place,
> +				       size_t size)
> +{
> +	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
> +	struct xe_ttm_vram_mgr_resource *vres =
> +		to_xe_ttm_vram_mgr_resource(res);
> +	struct drm_buddy *mm = &mgr->mm;
> +	struct drm_buddy_block *block;
> +
> +	if (!place->fpfn && !place->lpfn)
> +		return true;
> +
> +	if (!place->fpfn && place->lpfn == mgr->visible_size >> PAGE_SHIFT)
> +		return vres->used_visible_size == size;
> +
> +	list_for_each_entry(block, &vres->blocks, link) {
> +		unsigned long fpfn =
> +			drm_buddy_block_offset(block) >> PAGE_SHIFT;
> +		unsigned long lpfn = fpfn +
> +			(drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
> +
> +		if (fpfn < place->fpfn || lpfn > place->lpfn)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
>   static const struct ttm_resource_manager_func xe_ttm_vram_mgr_func = {
>   	.alloc	= xe_ttm_vram_mgr_new,
>   	.free	= xe_ttm_vram_mgr_del,
> +	.intersects = xe_ttm_vram_mgr_intersects,
> +	.compatible = xe_ttm_vram_mgr_compatible,
>   	.debug	= xe_ttm_vram_mgr_debug
>   };
>   

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2 6/6] drm/xe/buddy: add compatible and intersects hooks
  2023-03-20 14:15   ` Gwan-gyeong Mun
@ 2023-03-20 14:25     ` Matthew Auld
  2023-03-20 14:34       ` Gwan-gyeong Mun
  0 siblings, 1 reply; 14+ messages in thread
From: Matthew Auld @ 2023-03-20 14:25 UTC (permalink / raw)
  To: Gwan-gyeong Mun, intel-xe; +Cc: Lucas De Marchi

On 20/03/2023 14:15, Gwan-gyeong Mun wrote:
> 
> 
> On 3/8/23 12:17 PM, Matthew Auld wrote:
>> Copy this from i915. We need .compatible for lmem -> lmem transfers, so
>> they don't just get nooped by ttm if we need to move something from
>> mappable to non-mappable or vice versa. The .intersects is needed for
>> eviction, to determine if a victim resource is worth evicting, e.g. if we
>> need mappable space there is no point in evicting a resource that has
>> zero mappable pages.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 62 ++++++++++++++++++++++++++++
>>   1 file changed, 62 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c 
>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> index f703e962f499..8dd33ac65499 100644
>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>> @@ -230,9 +230,71 @@ static void xe_ttm_vram_mgr_debug(struct 
>> ttm_resource_manager *man,
>>       drm_printf(printer, "man size:%llu\n", man->size);
>>   }
>> +static bool xe_ttm_vram_mgr_intersects(struct ttm_resource_manager *man,
>> +                       struct ttm_resource *res,
>> +                       const struct ttm_place *place,
>> +                       size_t size)
>> +{
>> +    struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
>> +    struct xe_ttm_vram_mgr_resource *vres =
>> +        to_xe_ttm_vram_mgr_resource(res);
>> +    struct drm_buddy *mm = &mgr->mm;
>> +    struct drm_buddy_block *block;
>> +
>> +    if (!place->fpfn && !place->lpfn)
>> +        return true;
>> +
> 
> Hi Matt,
> I understand that ttm_resource_intersects() returns true in the 
> following case:
> 
> if (!place || !man->func->intersects)
>          return true;
> 
> so this line handles it in the same way, but in a real scenario, is it
> possible to have a situation where place->fpfn == 0 and place->lpfn == 0?

Yeah, fpfn = 0 and lpfn = 0 is the common case, where the user doesn't 
specifically care where the resource is placed in VRAM. Therefore 
everything intersects (so it is potentially valuable to evict), and is 
also always going to be compatible, so we just always return true.
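
For example, a typical placement with no VRAM range restriction leaves 
both fields at zero, roughly like this (illustrative only, not taken 
from the driver):

	/*
	 * fpfn == 0 && lpfn == 0 means "anywhere in this memory type",
	 * so .intersects()/.compatible() can trivially return true
	 * without walking the buddy blocks.
	 */
	struct ttm_place place = {
		.mem_type = XE_PL_VRAM0,
		.fpfn = 0,	/* no lower bound */
		.lpfn = 0,	/* no upper bound */
		.flags = 0,
	};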

> 
> Apart from the same check being repeated in
> xe_ttm_vram_mgr_compatible(), the rest of it looks good.
> 
> br,
> G.G.
>> +    if (!place->fpfn && place->lpfn == mgr->visible_size >> PAGE_SHIFT)
>> +        return vres->used_visible_size > 0;
>> +
>> +    list_for_each_entry(block, &vres->blocks, link) {
>> +        unsigned long fpfn =
>> +            drm_buddy_block_offset(block) >> PAGE_SHIFT;
>> +        unsigned long lpfn = fpfn +
>> +            (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
>> +
>> +        if (place->fpfn < lpfn && place->lpfn > fpfn)
>> +            return true;
>> +    }
>> +
>> +    return false;
>> +}
>> +
>> +static bool xe_ttm_vram_mgr_compatible(struct ttm_resource_manager *man,
>> +                       struct ttm_resource *res,
>> +                       const struct ttm_place *place,
>> +                       size_t size)
>> +{
>> +    struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
>> +    struct xe_ttm_vram_mgr_resource *vres =
>> +        to_xe_ttm_vram_mgr_resource(res);
>> +    struct drm_buddy *mm = &mgr->mm;
>> +    struct drm_buddy_block *block;
>> +
>> +    if (!place->fpfn && !place->lpfn)
>> +        return true;
>> +
>> +    if (!place->fpfn && place->lpfn == mgr->visible_size >> PAGE_SHIFT)
>> +        return vres->used_visible_size == size;
>> +
>> +    list_for_each_entry(block, &vres->blocks, link) {
>> +        unsigned long fpfn =
>> +            drm_buddy_block_offset(block) >> PAGE_SHIFT;
>> +        unsigned long lpfn = fpfn +
>> +            (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
>> +
>> +        if (fpfn < place->fpfn || lpfn > place->lpfn)
>> +            return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +
>>   static const struct ttm_resource_manager_func xe_ttm_vram_mgr_func = {
>>       .alloc    = xe_ttm_vram_mgr_new,
>>       .free    = xe_ttm_vram_mgr_del,
>> +    .intersects = xe_ttm_vram_mgr_intersects,
>> +    .compatible = xe_ttm_vram_mgr_compatible,
>>       .debug    = xe_ttm_vram_mgr_debug
>>   };

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2 5/6] drm/xe/buddy: add visible tracking
  2023-03-20 13:03     ` Matthew Auld
@ 2023-03-20 14:26       ` Gwan-gyeong Mun
  0 siblings, 0 replies; 14+ messages in thread
From: Gwan-gyeong Mun @ 2023-03-20 14:26 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Lucas De Marchi



On 3/20/23 3:03 PM, Matthew Auld wrote:
> On 20/03/2023 12:08, Gwan-gyeong Mun wrote:
>>
>>
>> On 3/8/23 12:17 PM, Matthew Auld wrote:
>>> Replace the allocation code with the i915 version. This simplifies the
>>> code a little, and importantly we get the accounting at the mgr level,
>>> which is useful for debug (and maybe userspace), plus per resource
>>> tracking so we can easily check if a resource is using one or more pages in
>>> the mappable part of vram (useful for eviction), or if the resource is
>>> completely within the mappable portion (useful for checking if the
>>> resource can be safely CPU mapped).
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c     |  18 +-
>>>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       | 201 ++++++++++-----------
>>>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.h       |   3 +-
>>>   drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |   6 +
>>>   4 files changed, 118 insertions(+), 110 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c 
>>> b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> index e608b49072ba..35e6bc62766e 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> @@ -158,7 +158,7 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>>>   {
>>>       struct xe_ttm_stolen_mgr *mgr = drmm_kzalloc(&xe->drm, 
>>> sizeof(*mgr), GFP_KERNEL);
>>>       struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
>>> -    u64 stolen_size, pgsize;
>>> +    u64 stolen_size, io_size, pgsize;
>>>       int err;
>>>       if (IS_DGFX(xe))
>>> @@ -177,7 +177,17 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>>>       if (pgsize < PAGE_SIZE)
>>>           pgsize = PAGE_SIZE;
>>> -    err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, 
>>> stolen_size, pgsize);
>>> +    /*
>>> +     * We don't attempt partial visible support for stolen vram,
>>> +     * since stolen is always at the end of vram, and the BAR size 
>>> is pretty
>>> +     * much always 256M, with small-bar.
>>> +     */
>>> +    io_size = 0;
>>> +    if (!xe_ttm_stolen_cpu_inaccessible(xe))
>>> +        io_size = stolen_size;
>>> +
>>> +    err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, 
>>> stolen_size,
>>> +                     io_size, pgsize);
>>>       if (err) {
>>>           drm_dbg_kms(&xe->drm, "Stolen mgr init failed: %i\n", err);
>>>           return;
>>> @@ -186,8 +196,8 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe)
>>>       drm_dbg_kms(&xe->drm, "Initialized stolen memory support with 
>>> %llu bytes\n",
>>>               stolen_size);
>>> -    if (!xe_ttm_stolen_cpu_inaccessible(xe))
>>> -        mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, 
>>> stolen_size);
>>> +    if (io_size)
>>> +        mgr->mapping = devm_ioremap_wc(&pdev->dev, mgr->io_base, 
>>> io_size);
>>>   }
>>>   u64 xe_ttm_stolen_io_offset(struct xe_bo *bo, u32 offset)
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c 
>>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> index e3ac8c6d3978..f703e962f499 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> @@ -49,45 +49,26 @@ static int xe_ttm_vram_mgr_new(struct 
>>> ttm_resource_manager *man,
>>>                      const struct ttm_place *place,
>>>                      struct ttm_resource **res)
>>>   {
>>> -    u64 max_bytes, cur_size, min_block_size;
>>>       struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
>>>       struct xe_ttm_vram_mgr_resource *vres;
>>> -    u64 size, remaining_size, lpfn, fpfn;
>>>       struct drm_buddy *mm = &mgr->mm;
>>> -    unsigned long pages_per_block;
>>> -    int r;
>>> -
>>> -    lpfn = (u64)place->lpfn << PAGE_SHIFT;
>>> -    if (!lpfn || lpfn > man->size)
>>> -        lpfn = man->size;
>>> -
>>> -    fpfn = (u64)place->fpfn << PAGE_SHIFT;
>>> +    u64 size, remaining_size, min_page_size;
>>> +    unsigned long lpfn;
>>> +    int err;
>>> -    max_bytes = mgr->manager.size;
>>> -    if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>>> -        pages_per_block = ~0ul;
>>> -    } else {
>>> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> -        pages_per_block = HPAGE_PMD_NR;
>>> -#else
>>> -        /* default to 2MB */
>>> -        pages_per_block = 2UL << (20UL - PAGE_SHIFT);
>>> -#endif
>>> -
>>> -        pages_per_block = max_t(uint32_t, pages_per_block,
>>> -                    tbo->page_alignment);
>>> -    }
>>> +    lpfn = place->lpfn;
>>> +    if (!lpfn || lpfn > man->size >> PAGE_SHIFT)
>>> +        lpfn = man->size >> PAGE_SHIFT;
>>>       vres = kzalloc(sizeof(*vres), GFP_KERNEL);
>>>       if (!vres)
>>>           return -ENOMEM;
>>>       ttm_resource_init(tbo, place, &vres->base);
>>> -    remaining_size = vres->base.size;
>>>       /* bail out quickly if there's likely not enough VRAM for this 
>>> BO */
>>> -    if (ttm_resource_manager_usage(man) > max_bytes) {
>>> -        r = -ENOSPC;
>>> +    if (ttm_resource_manager_usage(man) > man->size) {
>>> +        err = -ENOSPC;
>>>           goto error_fini;
>>>       }
>>> @@ -96,95 +77,91 @@ static int xe_ttm_vram_mgr_new(struct 
>>> ttm_resource_manager *man,
>>>       if (place->flags & TTM_PL_FLAG_TOPDOWN)
>>>           vres->flags |= DRM_BUDDY_TOPDOWN_ALLOCATION;
>>> -    if (fpfn || lpfn != man->size)
>>> -        /* Allocate blocks in desired range */
>>> +    if (place->fpfn || lpfn != man->size >> PAGE_SHIFT)
>>>           vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
>>> -    mutex_lock(&mgr->lock);
>>> -    while (remaining_size) {
>>> -        if (tbo->page_alignment)
>>> -            min_block_size = tbo->page_alignment << PAGE_SHIFT;
>>> -        else
>>> -            min_block_size = mgr->default_page_size;
>>> -
>>> -        XE_BUG_ON(min_block_size < mm->chunk_size);
>>> -
>>> -        /* Limit maximum size to 2GiB due to SG table limitations */
>>> -        size = min(remaining_size, 2ULL << 30);
>>> -
>>> -        if (size >= pages_per_block << PAGE_SHIFT)
>>> -            min_block_size = pages_per_block << PAGE_SHIFT;
>>> -
>>> -        cur_size = size;
>>> -
>>> -        if (fpfn + size != place->lpfn << PAGE_SHIFT) {
>>> -            /*
>>> -             * Except for actual range allocation, modify the size and
>>> -             * min_block_size conforming to continuous flag enablement
>>> -             */
>>> -            if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>>> -                size = roundup_pow_of_two(size);
>>> -                min_block_size = size;
>>> -            /*
>>> -             * Modify the size value if size is not
>>> -             * aligned with min_block_size
>>> -             */
>>> -            } else if (!IS_ALIGNED(size, min_block_size)) {
>>> -                size = round_up(size, min_block_size);
>>> -            }
>>> -        }
>>> +    XE_BUG_ON(!vres->base.size);
>>> +    size = vres->base.size;
>>> -        r = drm_buddy_alloc_blocks(mm, fpfn,
>>> -                       lpfn,
>>> -                       size,
>>> -                       min_block_size,
>>> -                       &vres->blocks,
>>> -                       vres->flags);
>>> -        if (unlikely(r))
>>> -            goto error_free_blocks;
>>> +    min_page_size = mgr->default_page_size;
>>> +    if (tbo->page_alignment)
>>> +        min_page_size = tbo->page_alignment << PAGE_SHIFT;
>>> -        if (size > remaining_size)
>>> -            remaining_size = 0;
>>> -        else
>>> -            remaining_size -= size;
>>> +    XE_BUG_ON(min_page_size < mm->chunk_size);
>>> +    XE_BUG_ON(min_page_size > SZ_2G); /* FIXME: sg limit */
>>> +    XE_BUG_ON(size > SZ_2G &&
>>> +          (vres->base.placement & TTM_PL_FLAG_CONTIGUOUS));
>> Hi Matt,
>> What scenario are these checks meant to catch? Is the purpose to report
>> an error for the small-bar case at the current stage?
> 
> XE_BUG_ON(min_page_size < mm->chunk_size) - This one is just a general 
> sanity check, which seemed good to add. The chunk_size is the smallest
> chunk of memory that we can allocate, so if we ask for something
> smaller, that's likely a bug.
> 
> XE_BUG_ON(min_page_size > SZ_2G) - This is also not small-bar related, 
> but an existing limitation. If you look below, the allocation is split
> into a maximum allocation size of 2G (so that it can always fit in an sg
> entry without overflowing). If someone in the future asks for a
> page_size > 2G, then the alignment might not match due to this
> splitting, hence it's better to treat that as an error. This one has a
> FIXME, since the code that constructs the sg table could be improved to
> handle the limitation there, instead of leaking it into the buddy code.
> 
> XE_BUG_ON(size > SZ_2G && (vres->base.placement & 
> TTM_PL_FLAG_CONTIGUOUS)) - And this is basically the same as above. We
> can't have a chunk larger than 2G AFAICT, so if someone asks for
> CONTIG in such a case, the underlying allocation might not be
> contiguous, since we need to make at least two separate allocation
> requests. But again, I don't think anyone currently needs such a thing,
> so it's best to just document/assert such limitations.
> 
>>> +    XE_BUG_ON(!IS_ALIGNED(size, min_page_size));
>> If it is not aligned, can't we just round it up to the aligned size and
>> allocate that?
> 
> Yeah, that's one possibility. It's just that the existing behaviour is 
> to leave that as the responsibility of the caller. AFAICT this is for
> now mostly relevant on platforms where we need a 64K min page size for
> VRAM, and there the object size is meant to be rounded up, so we don't
> currently expect anyone to trigger this, unless there is a bug.
> 
>>> +
>>> +    if (place->fpfn + (vres->base.size >> PAGE_SHIFT) != place->lpfn &&
>>> +        place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>>> +        unsigned long pages;
>>> +
>>> +        size = roundup_pow_of_two(size);
>>> +        min_page_size = size;
>>> +
>>> +        pages = size >> ilog2(mm->chunk_size);
>> Why did you use ilog2(mm->chunk_size) instead of PAGE_SHIFT here?
> 
> Yeah, it looks like the existing i915 code mixes chunk_size and
> PAGE_SHIFT. I think back in the day the buddy allocator just accepted
> the order, but it now takes a number of bytes, so mixing the two
> doesn't make much sense. Will fix. Thanks for pointing this out.
> 
>>> +        if (pages > lpfn)
>>> +            lpfn = pages;
>>>       }
>>> -    mutex_unlock(&mgr->lock);
>>> -    if (cur_size != size) {
>>> -        struct drm_buddy_block *block;
>>> -        struct list_head *trim_list;
>>> -        u64 original_size;
>>> -        LIST_HEAD(temp);
>>> +    if (size > lpfn << PAGE_SHIFT) {
>> In the case of DRM_BUDDY_RANGE_ALLOCATION, shouldn't it be compared with
>> size > (lpfn - fpfn) << PAGE_SHIFT?
> 
> Yeah, that makes more sense. Also, this check should be moved higher
> up. Thanks for reviewing.
Thanks for the comments.
Also, I should have replied to the v4 patch, but I replied to v2 
instead. Please treat this emailed reply as a reply to v4.
Br,
G.G.
> 
>>
>> Br,
>>
>> G.G.
>>> +        err = -E2BIG; /* don't trigger eviction */
>>> +        goto error_fini;
>>> +    }
>>> -        trim_list = &vres->blocks;
>>> -        original_size = vres->base.size;
>>> +    mutex_lock(&mgr->lock);
>>> +    if (lpfn <= mgr->visible_size && size > mgr->visible_avail) {
>>> +        mutex_unlock(&mgr->lock);
>>> +        err = -ENOSPC;
>>> +        goto error_fini;
>>> +    }
>>> +    remaining_size = size;
>>> +    do {
>>>           /*
>>> -         * If size value is rounded up to min_block_size, trim the last
>>> -         * block to the required size
>>> +         * Limit maximum size to 2GiB due to SG table limitations.
>>> +         * FIXME: Should maybe be handled as part of sg construction.
>>>            */
>>> -        if (!list_is_singular(&vres->blocks)) {
>>> -            block = list_last_entry(&vres->blocks, typeof(*block), 
>>> link);
>>> -            list_move_tail(&block->link, &temp);
>>> -            trim_list = &temp;
>>> -            /*
>>> -             * Compute the original_size value by subtracting the
>>> -             * last block size with (aligned size - original size)
>>> -             */
>>> -            original_size = drm_buddy_block_size(mm, block) -
>>> -                (size - cur_size);
>>> -        }
>>> +        u64 alloc_size = min_t(u64, remaining_size, SZ_2G);
>>> +
>>> +        err = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << 
>>> PAGE_SHIFT,
>>> +                         (u64)lpfn << PAGE_SHIFT,
>>> +                         alloc_size,
>>> +                         min_page_size,
>>> +                         &vres->blocks,
>>> +                         vres->flags);
>>> +        if (err)
>>> +            goto error_free_blocks;
>>> -        mutex_lock(&mgr->lock);
>>> -        drm_buddy_block_trim(mm,
>>> -                     original_size,
>>> -                     trim_list);
>>> -        mutex_unlock(&mgr->lock);
>>> +        remaining_size -= alloc_size;
>>> +    } while (remaining_size);
>>> -        if (!list_empty(&temp))
>>> -            list_splice_tail(trim_list, &vres->blocks);
>>> +    if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
>>> +        if (!drm_buddy_block_trim(mm, vres->base.size, &vres->blocks))
>>> +            size = vres->base.size;
>>>       }
>>> +    if (lpfn <= mgr->visible_size >> PAGE_SHIFT) {
>>> +        vres->used_visible_size = size;
>>> +    } else {
>>> +        struct drm_buddy_block *block;
>>> +
>>> +        list_for_each_entry(block, &vres->blocks, link) {
>>> +            u64 start = drm_buddy_block_offset(block);
>>> +
>>> +            if (start < mgr->visible_size) {
>>> +                u64 end = start + drm_buddy_block_size(mm, block);
>>> +
>>> +                vres->used_visible_size +=
>>> +                    min(end, mgr->visible_size) - start;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    mgr->visible_avail -= vres->used_visible_size;
>>> +    mutex_unlock(&mgr->lock);
>>> +
>>>       if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
>>>           xe_is_vram_mgr_blocks_contiguous(mm, &vres->blocks))
>>>           vres->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
>>> @@ -213,7 +190,7 @@ static int xe_ttm_vram_mgr_new(struct 
>>> ttm_resource_manager *man,
>>>       ttm_resource_fini(man, &vres->base);
>>>       kfree(vres);
>>> -    return r;
>>> +    return err;
>>>   }
>>>   static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
>>> @@ -226,6 +203,7 @@ static void xe_ttm_vram_mgr_del(struct 
>>> ttm_resource_manager *man,
>>>       mutex_lock(&mgr->lock);
>>>       drm_buddy_free_list(mm, &vres->blocks);
>>> +    mgr->visible_avail += vres->used_visible_size;
>>>       mutex_unlock(&mgr->lock);
>>>       ttm_resource_fini(man, res);
>>> @@ -240,6 +218,13 @@ static void xe_ttm_vram_mgr_debug(struct 
>>> ttm_resource_manager *man,
>>>       struct drm_buddy *mm = &mgr->mm;
>>>       mutex_lock(&mgr->lock);
>>> +    drm_printf(printer, "default_page_size: %lluKiB\n",
>>> +           mgr->default_page_size >> 10);
>>> +    drm_printf(printer, "visible_avail: %lluMiB\n",
>>> +           (u64)mgr->visible_avail >> 20);
>>> +    drm_printf(printer, "visible_size: %lluMiB\n",
>>> +           (u64)mgr->visible_size >> 20);
>>> +
>>>       drm_buddy_print(mm, printer);
>>>       mutex_unlock(&mgr->lock);
>>>       drm_printf(printer, "man size:%llu\n", man->size);
>>> @@ -262,6 +247,8 @@ static void ttm_vram_mgr_fini(struct drm_device 
>>> *dev, void *arg)
>>>       if (ttm_resource_manager_evict_all(&xe->ttm, man))
>>>           return;
>>> +    WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
>>> +
>>>       drm_buddy_fini(&mgr->mm);
>>>       ttm_resource_manager_cleanup(&mgr->manager);
>>> @@ -270,7 +257,8 @@ static void ttm_vram_mgr_fini(struct drm_device 
>>> *dev, void *arg)
>>>   }
>>>   int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct 
>>> xe_ttm_vram_mgr *mgr,
>>> -               u32 mem_type, u64 size, u64 default_page_size)
>>> +               u32 mem_type, u64 size, u64 io_size,
>>> +               u64 default_page_size)
>>>   {
>>>       struct ttm_resource_manager *man = &mgr->manager;
>>>       int err;
>>> @@ -279,6 +267,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, 
>>> struct xe_ttm_vram_mgr *mgr,
>>>       mgr->mem_type = mem_type;
>>>       mutex_init(&mgr->lock);
>>>       mgr->default_page_size = default_page_size;
>>> +    mgr->visible_size = io_size;
>>> +    mgr->visible_avail = io_size;
>>>       ttm_resource_manager_init(man, &xe->ttm, size);
>>>       err = drm_buddy_init(&mgr->mm, man->size, default_page_size);
>>> @@ -298,7 +288,8 @@ int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct 
>>> xe_ttm_vram_mgr *mgr)
>>>       mgr->gt = gt;
>>>       return __xe_ttm_vram_mgr_init(xe, mgr, XE_PL_VRAM0 + 
>>> gt->info.vram_id,
>>> -                      gt->mem.vram.size, PAGE_SIZE);
>>> +                      gt->mem.vram.size, gt->mem.vram.io_size,
>>> +                      PAGE_SIZE);
>>>   }
>>>   int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h 
>>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>>> index 78f332d26224..35e5367a79fb 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>>> @@ -13,7 +13,8 @@ struct xe_device;
>>>   struct xe_gt;
>>>   int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
>>> -               u32 mem_type, u64 size, u64 default_page_size);
>>> +               u32 mem_type, u64 size, u64 io_size,
>>> +               u64 default_page_size);
>>>   int xe_ttm_vram_mgr_init(struct xe_gt *gt, struct xe_ttm_vram_mgr *mgr);
>>>   int xe_ttm_vram_mgr_alloc_sgt(struct xe_device *xe,
>>>                     struct ttm_resource *res,
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>>> index 39aa2ec1b968..3d9417ff7434 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>>> @@ -23,6 +23,10 @@ struct xe_ttm_vram_mgr {
>>>       struct ttm_resource_manager manager;
>>>       /** @mm: DRM buddy allocator which manages the VRAM */
>>>       struct drm_buddy mm;
>>> +    /** @visible_size: Probed size of the CPU visible portion */
>>> +    u64 visible_size;
>>> +    /** @visible_avail: CPU visible portion still unallocated */
>>> +    u64 visible_avail;
>>>       /** @default_page_size: default page size */
>>>       u64 default_page_size;
>>>       /** @lock: protects allocations of VRAM */
>>> @@ -39,6 +43,8 @@ struct xe_ttm_vram_mgr_resource {
>>>       struct ttm_resource base;
>>>       /** @blocks: list of DRM buddy blocks */
>>>       struct list_head blocks;
>>> +    /** @used_visible_size: How many CPU visible bytes this resource is using */
>>> +    u64 used_visible_size;
>>>       /** @flags: flags associated with the resource */
>>>       unsigned long flags;
>>>   };
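
For reference, the accounting in this patch reduces to one invariant:
every allocation debits visible_avail by the resource's
used_visible_size, every free credits the same amount back, and
teardown expects the budget to be fully restored. A minimal
kernel-style sketch of just that pairing; the demo_* names are
hypothetical and not driver code:

#include <linux/errno.h>	/* -ENOSPC */
#include <linux/types.h>	/* u64 */

struct demo_vram_mgr {
	u64 visible_size;	/* probed CPU-visible (BAR) size in bytes */
	u64 visible_avail;	/* CPU-visible bytes still unallocated */
};

/* debit the visible budget at allocation time */
static int demo_visible_alloc(struct demo_vram_mgr *mgr, u64 used_visible_size)
{
	if (used_visible_size > mgr->visible_avail)
		return -ENOSPC;
	mgr->visible_avail -= used_visible_size;
	return 0;
}

/* credit back exactly the amount debited for this resource */
static void demo_visible_free(struct demo_vram_mgr *mgr, u64 used_visible_size)
{
	mgr->visible_avail += used_visible_size;
}

At teardown, visible_avail == visible_size holding again is exactly
what the WARN_ON_ONCE() in ttm_vram_mgr_fini() above checks for.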

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2 6/6] drm/xe/buddy: add compatible and intersects hooks
  2023-03-20 14:25     ` Matthew Auld
@ 2023-03-20 14:34       ` Gwan-gyeong Mun
  0 siblings, 0 replies; 14+ messages in thread
From: Gwan-gyeong Mun @ 2023-03-20 14:34 UTC (permalink / raw)
  To: Matthew Auld, intel-xe; +Cc: Lucas De Marchi



On 3/20/23 4:25 PM, Matthew Auld wrote:
> On 20/03/2023 14:15, Gwan-gyeong Mun wrote:
>>
>>
>> On 3/8/23 12:17 PM, Matthew Auld wrote:
>>> Copy this from i915. We need .compatible for lmem -> lmem transfers,
>>> so they don't just get nooped by ttm, if we need to move something
>>> from mappable to non-mappable or vice versa. The .intersects hook is
>>> needed for eviction, to determine if a victim resource is worth
>>> evicting. e.g. if we need mappable space there is no point in
>>> evicting a resource that has zero mappable pages.
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 62 ++++++++++++++++++++++++++++
>>>   1 file changed, 62 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> index f703e962f499..8dd33ac65499 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> @@ -230,9 +230,71 @@ static void xe_ttm_vram_mgr_debug(struct ttm_resource_manager *man,
>>>       drm_printf(printer, "man size:%llu\n", man->size);
>>>   }
>>> +static bool xe_ttm_vram_mgr_intersects(struct ttm_resource_manager *man,
>>> +                       struct ttm_resource *res,
>>> +                       const struct ttm_place *place,
>>> +                       size_t size)
>>> +{
>>> +    struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
>>> +    struct xe_ttm_vram_mgr_resource *vres =
>>> +        to_xe_ttm_vram_mgr_resource(res);
>>> +    struct drm_buddy *mm = &mgr->mm;
>>> +    struct drm_buddy_block *block;
>>> +
>>> +    if (!place->fpfn && !place->lpfn)
>>> +        return true;
>>> +
>>
>> Hi Matt,
>> I understand that ttm_resource_intersects() returns true in the 
>> following case,
>>
>> if (!place || !man->func->intersects)
>>          return true;
>>
>> so this early return mirrors that behaviour, but in a real scenario is
>> it possible to have a situation where place->fpfn == 0 and
>> place->lpfn == 0?
> 
> Yeah, fpfn = 0 and lpfn = 0 is the common case, where the user doesn't 
> care specifically where it is placed in VRAM. Therefore everything 
> intersects (potentially valuable to evict), and is also always going to 
> be compatible, so we just always return true.
> 
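
To make the three cases concrete, a minimal standalone sketch of that
decision (demo_* names are hypothetical; a single [res_fpfn, res_lpfn)
extent stands in for the real buddy-block walk):

#include <stdbool.h>
#include <stdint.h>

static bool demo_intersects(unsigned long fpfn, unsigned long lpfn,
			    unsigned long visible_pfns,
			    uint64_t used_visible_size,
			    unsigned long res_fpfn, unsigned long res_lpfn)
{
	/* no placement restriction: every resource is a potential victim */
	if (!fpfn && !lpfn)
		return true;

	/* "anywhere in the CPU-visible range" request */
	if (!fpfn && lpfn == visible_pfns)
		return used_visible_size > 0;

	/* explicit range: half-open interval overlap test */
	return fpfn < res_lpfn && lpfn > res_fpfn;
}

With a 256MiB visible window (visible_pfns = 65536 for 4KiB pages), a
request of fpfn = 0, lpfn = 65536 skips any victim whose
used_visible_size is 0, i.e. the "no point in evicting" case from the
commit message.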
Thanks for the detailed explanation.

Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>

Br,
G.G.
>>
>> The same question applies to the identical check in
>> xe_ttm_vram_mgr_compatible(); other than that, the rest looks good.
>>
>> br,
>> G.G.
>>> +    if (!place->fpfn && place->lpfn == mgr->visible_size >> PAGE_SHIFT)
>>> +        return vres->used_visible_size > 0;
>>> +
>>> +    list_for_each_entry(block, &vres->blocks, link) {
>>> +        unsigned long fpfn =
>>> +            drm_buddy_block_offset(block) >> PAGE_SHIFT;
>>> +        unsigned long lpfn = fpfn +
>>> +            (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
>>> +
>>> +        if (place->fpfn < lpfn && place->lpfn > fpfn)
>>> +            return true;
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> +
>>> +static bool xe_ttm_vram_mgr_compatible(struct ttm_resource_manager *man,
>>> +                       struct ttm_resource *res,
>>> +                       const struct ttm_place *place,
>>> +                       size_t size)
>>> +{
>>> +    struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
>>> +    struct xe_ttm_vram_mgr_resource *vres =
>>> +        to_xe_ttm_vram_mgr_resource(res);
>>> +    struct drm_buddy *mm = &mgr->mm;
>>> +    struct drm_buddy_block *block;
>>> +
>>> +    if (!place->fpfn && !place->lpfn)
>>> +        return true;
>>> +
>>> +    if (!place->fpfn && place->lpfn == mgr->visible_size >> PAGE_SHIFT)
>>> +        return vres->used_visible_size == size;
>>> +
>>> +    list_for_each_entry(block, &vres->blocks, link) {
>>> +        unsigned long fpfn =
>>> +            drm_buddy_block_offset(block) >> PAGE_SHIFT;
>>> +        unsigned long lpfn = fpfn +
>>> +            (drm_buddy_block_size(mm, block) >> PAGE_SHIFT);
>>> +
>>> +        if (fpfn < place->fpfn || lpfn > place->lpfn)
>>> +            return false;
>>> +    }
>>> +
>>> +    return true;
>>> +}
>>> +
>>>   static const struct ttm_resource_manager_func xe_ttm_vram_mgr_func = {
>>>       .alloc    = xe_ttm_vram_mgr_new,
>>>       .free    = xe_ttm_vram_mgr_del,
>>> +    .intersects = xe_ttm_vram_mgr_intersects,
>>> +    .compatible = xe_ttm_vram_mgr_compatible,
>>>       .debug    = xe_ttm_vram_mgr_debug
>>>   };
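
For context on how these hooks get exercised, a rough standalone model
of the TTM-side dispatch, simplified from the ttm_resource_intersects()
behaviour quoted above (not verbatim kernel code):

#include <drm/ttm/ttm_placement.h>
#include <drm/ttm/ttm_resource.h>

/* simplified stand-in for how TTM consults a manager's hook */
static bool demo_resource_intersects(struct ttm_resource_manager *man,
				     struct ttm_resource *res,
				     const struct ttm_place *place,
				     size_t size)
{
	/* no placement constraint, or no hook: assume it intersects */
	if (!place || !man->func->intersects)
		return true;

	return man->func->intersects(man, res, place, size);
}

The split between the two hooks is deliberate: .intersects only asks
whether a victim overlaps the requested range at all (is it worth
evicting), while .compatible asks whether an existing resource already
fully satisfies the placement (so the move can be nooped).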

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-03-20 14:35 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-08 10:17 [Intel-xe] [PATCH v2 0/6] Some small-bar prep patches Matthew Auld
2023-03-08 10:17 ` [Intel-xe] [PATCH v2 1/6] drm/xe: add xe_ttm_stolen_cpu_access_needs_ggtt() Matthew Auld
2023-03-08 11:20   ` Maarten Lankhorst
2023-03-08 10:17 ` [Intel-xe] [PATCH v2 2/6] drm/xe: s/lmem/vram/ Matthew Auld
2023-03-08 10:17 ` [Intel-xe] [PATCH v2 3/6] drm/xe/vram: start tracking the io_size Matthew Auld
2023-03-08 10:17 ` [Intel-xe] [PATCH v2 4/6] drm/xe/buddy: remove the virtualized start Matthew Auld
2023-03-08 10:17 ` [Intel-xe] [PATCH v2 5/6] drm/xe/buddy: add visible tracking Matthew Auld
2023-03-20 12:08   ` Gwan-gyeong Mun
2023-03-20 13:03     ` Matthew Auld
2023-03-20 14:26       ` Gwan-gyeong Mun
2023-03-08 10:17 ` [Intel-xe] [PATCH v2 6/6] drm/xe/buddy: add compatible and intersects hooks Matthew Auld
2023-03-20 14:15   ` Gwan-gyeong Mun
2023-03-20 14:25     ` Matthew Auld
2023-03-20 14:34       ` Gwan-gyeong Mun
