* Optimize VM handling a bit more
@ 2018-09-09 18:03 Christian König
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Christian König @ 2018-09-09 18:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi everyone,

Especially on Vega and Raven, VM handling is rather inefficient while creating PTEs, because we originally supported only 2-level page tables and implemented 4-level page tables on top of that.

This patch set reworks quite a bit of that handling and adds proper iterator and tree-walking functions, which are then used to update PTEs more efficiently.

A totally artificial test case which maps 2GB of VRAM at an unaligned address is reduced from ~45ms down to ~20ms on my test system.

As a very positive side effect this also adds support for 1GB giant VRAM pages in addition to the existing 2MB huge pages on Vega/Raven, and enables all the power-of-two sizes in between (2MB-2GB) for the L1.

This could be beneficial for applications which allocate very large amounts of memory, because it reduces the overhead of page table walks by 50% (huge pages were 25%).

Please comment and/or review,
Christian.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-09 18:03   ` Christian König
       [not found]     ` <20180909180339.1910-2-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2018-09-09 18:03   ` [PATCH 02/11] drm/amdgpu: add amdgpu_vm_pt_parent helper Christian König
                     ` (11 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Christian König @ 2018-09-09 18:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Try to allocate VRAM in power-of-two sizes and only fall back to the
fixed split size if that fails.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 52 +++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 9cfa8a9ada92..3f9d5d00c9b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -124,6 +124,28 @@ u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo)
 	return usage;
 }
 
+/**
+ * amdgpu_vram_mgr_virt_start - update virtual start address
+ *
+ * @mem: ttm_mem_reg to update
+ * @node: just allocated node
+ *
+ * Calculate a virtual BO start address to easily check if everything is CPU
+ * accessible.
+ */
+static void amdgpu_vram_mgr_virt_start(struct ttm_mem_reg *mem,
+				       struct drm_mm_node *node)
+{
+	unsigned long start;
+
+	start = node->start + node->size;
+	if (start > mem->num_pages)
+		start -= mem->num_pages;
+	else
+		start = 0;
+	mem->start = max(mem->start, start);
+}
+
 /**
  * amdgpu_vram_mgr_new - allocate new ranges
  *
@@ -176,10 +198,25 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
 	pages_left = mem->num_pages;
 
 	spin_lock(&mgr->lock);
-	for (i = 0; i < num_nodes; ++i) {
+	for (i = 0; pages_left >= pages_per_node; ++i) {
+		unsigned long pages = rounddown_pow_of_two(pages_left);
+
+		r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
+						pages_per_node, 0,
+						place->fpfn, lpfn,
+						mode);
+		if (unlikely(r))
+			break;
+
+		usage += nodes[i].size << PAGE_SHIFT;
+		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
+		amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
+		pages_left -= pages;
+	}
+
+	for (; pages_left; ++i) {
 		unsigned long pages = min(pages_left, pages_per_node);
 		uint32_t alignment = mem->page_alignment;
-		unsigned long start;
 
 		if (pages == pages_per_node)
 			alignment = pages_per_node;
@@ -193,16 +230,7 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
 
 		usage += nodes[i].size << PAGE_SHIFT;
 		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
-
-		/* Calculate a virtual BO start address to easily check if
-		 * everything is CPU accessible.
-		 */
-		start = nodes[i].start + nodes[i].size;
-		if (start > mem->num_pages)
-			start -= mem->num_pages;
-		else
-			start = 0;
-		mem->start = max(mem->start, start);
+		amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
 		pages_left -= pages;
 	}
 	spin_unlock(&mgr->lock);
-- 
2.14.1



* [PATCH 02/11] drm/amdgpu: add amdgpu_vm_pt_parent helper
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2018-09-09 18:03   ` [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two Christian König
@ 2018-09-09 18:03   ` Christian König
  2018-09-09 18:03   ` [PATCH 03/11] drm/amdgpu: add amdgpu_vm_update_func Christian König
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Christian König @ 2018-09-09 18:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Add a function to get the parent of a PD/PT.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 36 ++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 1f79a0ddc78a..5eae2bb5f368 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -337,6 +337,24 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 	amdgpu_vm_bo_evicted(base);
 }
 
+/**
+ * amdgpu_vm_pt_parent - get the parent page directory
+ *
+ * @pt: child page table
+ *
+ * Helper to get the parent entry for the child page table. NULL if we are at
+ * the root page directory.
+ */
+static struct amdgpu_vm_pt *amdgpu_vm_pt_parent(struct amdgpu_vm_pt *pt)
+{
+	struct amdgpu_bo *parent = pt->base.bo->parent;
+
+	if (!parent)
+		return NULL;
+
+	return list_first_entry(&parent->va, struct amdgpu_vm_pt, base.bo_list);
+}
+
 /**
  * amdgpu_vm_get_pd_bo - add the VM PD to a validation list
  *
@@ -1202,24 +1220,16 @@ int amdgpu_vm_update_directories(struct amdgpu_device *adev,
 	}
 
 	while (!list_empty(&vm->relocated)) {
-		struct amdgpu_vm_bo_base *bo_base, *parent;
 		struct amdgpu_vm_pt *pt, *entry;
-		struct amdgpu_bo *bo;
 
-		bo_base = list_first_entry(&vm->relocated,
-					   struct amdgpu_vm_bo_base,
-					   vm_status);
-		amdgpu_vm_bo_idle(bo_base);
+		entry = list_first_entry(&vm->relocated, struct amdgpu_vm_pt,
+					 base.vm_status);
+		amdgpu_vm_bo_idle(&entry->base);
 
-		bo = bo_base->bo->parent;
-		if (!bo)
+		pt = amdgpu_vm_pt_parent(entry);
+		if (!pt)
 			continue;
 
-		parent = list_first_entry(&bo->va, struct amdgpu_vm_bo_base,
-					  bo_list);
-		pt = container_of(parent, struct amdgpu_vm_pt, base);
-		entry = container_of(bo_base, struct amdgpu_vm_pt, base);
-
 		amdgpu_vm_update_pde(&params, vm, pt, entry);
 
 		if (!vm->use_cpu_for_update &&
-- 
2.14.1



* [PATCH 03/11] drm/amdgpu: add amdgpu_vm_update_func
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2018-09-09 18:03   ` [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two Christian König
  2018-09-09 18:03   ` [PATCH 02/11] drm/amdgpu: add amdgpu_vm_pt_parent helper Christian König
@ 2018-09-09 18:03   ` Christian König
  2018-09-09 18:03   ` [PATCH 04/11] drm/amdgpu: add some VM PD/PT iterators Christian König
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Christian König @ 2018-09-09 18:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Add a helper to call the update function for both the BO and its shadow.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 5eae2bb5f368..416eccd9ea29 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1109,6 +1109,22 @@ static int amdgpu_vm_wait_pd(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	return r;
 }
 
+/**
+ * amdgpu_vm_update_func - helper to call update function
+ *
+ * Calls the update function for both the given BO as well as its shadow.
+ */
+static void amdgpu_vm_update_func(struct amdgpu_pte_update_params *params,
+				  struct amdgpu_bo *bo,
+				  uint64_t pe, uint64_t addr,
+				  unsigned count, uint32_t incr,
+				  uint64_t flags)
+{
+	if (bo->shadow)
+		params->func(params, bo->shadow, pe, addr, count, incr, flags);
+	params->func(params, bo, pe, addr, count, incr, flags);
+}
+
 /*
  * amdgpu_vm_update_pde - update a single level in the hierarchy
  *
@@ -1138,9 +1154,7 @@ static void amdgpu_vm_update_pde(struct amdgpu_pte_update_params *params,
 	level += params->adev->vm_manager.root_level;
 	amdgpu_gmc_get_pde_for_bo(entry->base.bo, level, &pt, &flags);
 	pde = (entry - parent->entries) * 8;
-	if (bo->shadow)
-		params->func(params, bo->shadow, pde, pt, 1, 0, flags);
-	params->func(params, bo, pde, pt, 1, 0, flags);
+	amdgpu_vm_update_func(params, bo, pde, pt, 1, 0, flags);
 }
 
 /*
@@ -1347,9 +1361,7 @@ static void amdgpu_vm_handle_huge_pages(struct amdgpu_pte_update_params *p,
 	amdgpu_gmc_get_vm_pde(p->adev, AMDGPU_VM_PDB0, &dst, &flags);
 
 	pde = (entry - parent->entries) * 8;
-	if (parent->base.bo->shadow)
-		p->func(p, parent->base.bo->shadow, pde, dst, 1, 0, flags);
-	p->func(p, parent->base.bo, pde, dst, 1, 0, flags);
+	amdgpu_vm_update_func(p, parent->base.bo, pde, dst, 1, 0, flags);
 }
 
 /**
@@ -1399,11 +1411,9 @@ static int amdgpu_vm_update_ptes(struct amdgpu_pte_update_params *params,
 
 		pt = entry->base.bo;
 		pe_start = (addr & mask) * 8;
-		if (pt->shadow)
-			params->func(params, pt->shadow, pe_start, dst, nptes,
-				     AMDGPU_GPU_PAGE_SIZE, flags);
-		params->func(params, pt, pe_start, dst, nptes,
-			     AMDGPU_GPU_PAGE_SIZE, flags);
+		amdgpu_vm_update_func(params, pt, pe_start, dst, nptes,
+				      AMDGPU_GPU_PAGE_SIZE, flags);
+
 	}
 
 	return 0;
-- 
2.14.1



* [PATCH 04/11] drm/amdgpu: add some VM PD/PT iterators
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (2 preceding siblings ...)
  2018-09-09 18:03   ` [PATCH 03/11] drm/amdgpu: add amdgpu_vm_update_func Christian König
@ 2018-09-09 18:03   ` Christian König
       [not found]     ` <20180909180339.1910-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2018-09-09 18:03   ` [PATCH 05/11] drm/amdgpu: use leaf iterator for allocating PD/PT Christian König
                     ` (8 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Christian König @ 2018-09-09 18:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Add both a leaf and a DFS iterator to walk over all the PDs/PTs.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 221 +++++++++++++++++++++++++++++++++
 1 file changed, 221 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 416eccd9ea29..4007202585d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -355,6 +355,227 @@ static struct amdgpu_vm_pt *amdgpu_vm_pt_parent(struct amdgpu_vm_pt *pt)
 	return list_first_entry(&parent->va, struct amdgpu_vm_pt, base.bo_list);
 }
 
+/**
+ * amdgpu_vm_pt_cursor - state for the for_each_amdgpu_vm_pt_* walks
+ */
+struct amdgpu_vm_pt_cursor {
+	uint64_t pfn;
+	struct amdgpu_vm_pt *parent;
+	struct amdgpu_vm_pt *entry;
+	unsigned level;
+};
+
+/**
+ * amdgpu_vm_pt_start - start PD/PT walk
+ *
+ * @adev: amdgpu_device pointer
+ * @vm: amdgpu_vm structure
+ * @start: start address of the walk
+ * @cursor: state to initialize
+ *
+ * Initialize a amdgpu_vm_pt_cursor to start a walk.
+ */
+static void amdgpu_vm_pt_start(struct amdgpu_device *adev,
+			       struct amdgpu_vm *vm, uint64_t start,
+			       struct amdgpu_vm_pt_cursor *cursor)
+{
+	cursor->pfn = start;
+	cursor->parent = NULL;
+	cursor->entry = &vm->root;
+	cursor->level = adev->vm_manager.root_level;
+}
+
+/**
+ * amdgpu_vm_pt_descendant - go to child node
+ *
+ * @adev: amdgpu_device pointer
+ * @cursor: current state
+ *
+ * Walk to the child node of the current node.
+ * Returns:
+ * True if the walk was possible, false otherwise.
+ */
+static bool amdgpu_vm_pt_descendant(struct amdgpu_device *adev,
+				    struct amdgpu_vm_pt_cursor *cursor)
+{
+	unsigned num_entries, shift, idx;
+
+	if (!cursor->entry->entries)
+		return false;
+
+	BUG_ON(!cursor->entry->base.bo);
+	num_entries = amdgpu_vm_num_entries(adev, cursor->level);
+	shift = amdgpu_vm_level_shift(adev, cursor->level);
+
+	++cursor->level;
+	idx = (cursor->pfn >> shift) % num_entries;
+	cursor->parent = cursor->entry;
+	cursor->entry = &cursor->entry->entries[idx];
+	return true;
+}
+
+/**
+ * amdgpu_vm_pt_sibling - go to sibling node
+ *
+ * @adev: amdgpu_device pointer
+ * @cursor: current state
+ *
+ * Walk to the sibling node of the current node.
+ * Returns:
+ * True if the walk was possible, false otherwise.
+ */
+static bool amdgpu_vm_pt_sibling(struct amdgpu_device *adev,
+				 struct amdgpu_vm_pt_cursor *cursor)
+{
+	unsigned shift, num_entries;
+
+	/* Root doesn't have a sibling */
+	if (!cursor->parent)
+		return false;
+
+	/* Go to our parents and see if we got a sibling */
+	shift = amdgpu_vm_level_shift(adev, cursor->level - 1);
+	num_entries = amdgpu_vm_num_entries(adev, cursor->level - 1);
+
+	if (cursor->entry == &cursor->parent->entries[num_entries - 1])
+		return false;
+
+	cursor->pfn += 1ULL << shift;
+	cursor->pfn &= ~((1ULL << shift) - 1);
+	++cursor->entry;
+	return true;
+}
+
+/**
+ * amdgpu_vm_pt_ancestor - go to parent node
+ *
+ * @adev: amdgpu_device pointer
+ * @cursor: current state
+ *
+ * Walk to the parent node of the current node.
+ * Returns:
+ * True if the walk was possible, false otherwise.
+ */
+static bool amdgpu_vm_pt_ancestor(struct amdgpu_vm_pt_cursor *cursor)
+{
+	if (!cursor->parent)
+		return false;
+
+	--cursor->level;
+	cursor->entry = cursor->parent;
+	cursor->parent = amdgpu_vm_pt_parent(cursor->parent);
+	return true;
+}
+
+/**
+ * amdgpu_vm_pt_next - get next PD/PT in hierarchy
+ *
+ * @adev: amdgpu_device pointer
+ * @cursor: current state
+ *
+ * Walk the PD/PT tree to the next node.
+ */
+static void amdgpu_vm_pt_next(struct amdgpu_device *adev,
+			      struct amdgpu_vm_pt_cursor *cursor)
+{
+	/* First try a newborn child */
+	if (amdgpu_vm_pt_descendant(adev, cursor))
+		return;
+
+	/* If that didn't work, try to find a sibling */
+	while (!amdgpu_vm_pt_sibling(adev, cursor)) {
+		/* No sibling, go to our parents and grandparents */
+		if (!amdgpu_vm_pt_ancestor(cursor)) {
+			cursor->pfn = ~0ll;
+			return;
+		}
+	}
+}
+
+/**
+ * amdgpu_vm_pt_first_leaf - get first leaf PD/PT
+ *
+ * @adev: amdgpu_device pointer
+ * @vm: amdgpu_vm structure
+ * @start: start addr of the walk
+ * @cursor: state to initialize
+ *
+ * Start a walk and go directly to the leaf node.
+ */
+static void amdgpu_vm_pt_first_leaf(struct amdgpu_device *adev,
+				    struct amdgpu_vm *vm, uint64_t start,
+				    struct amdgpu_vm_pt_cursor *cursor)
+{
+	amdgpu_vm_pt_start(adev, vm, start, cursor);
+	while (amdgpu_vm_pt_descendant(adev, cursor));
+}
+
+/**
+ * amdgpu_vm_pt_next_leaf - get next leaf PD/PT
+ *
+ * @adev: amdgpu_device pointer
+ * @cursor: current state
+ *
+ * Walk the PD/PT tree to the next leaf node.
+ */
+static void amdgpu_vm_pt_next_leaf(struct amdgpu_device *adev,
+				   struct amdgpu_vm_pt_cursor *cursor)
+{
+	amdgpu_vm_pt_next(adev, cursor);
+	while (amdgpu_vm_pt_descendant(adev, cursor));
+}
+
+/**
+ * for_each_amdgpu_vm_pt_leaf - walk over all leaf PDs/PTs in the hierarchy
+ */
+#define for_each_amdgpu_vm_pt_leaf(adev, vm, start, end, cursor)		\
+	for (amdgpu_vm_pt_first_leaf((adev), (vm), (start), &(cursor));		\
+	     (cursor).pfn <= end; amdgpu_vm_pt_next_leaf((adev), &(cursor)))
+
+/**
+ * amdgpu_vm_pt_first_dfs - start a depth-first search
+ *
+ * @adev: amdgpu_device structure
+ * @vm: amdgpu_vm structure
+ * @cursor: state to initialize
+ *
+ * Starts a depth-first traversal of the PD/PT tree.
+ */
+static void amdgpu_vm_pt_first_dfs(struct amdgpu_device *adev,
+				   struct amdgpu_vm *vm,
+				   struct amdgpu_vm_pt_cursor *cursor)
+{
+	amdgpu_vm_pt_start(adev, vm, 0, cursor);
+	while (amdgpu_vm_pt_descendant(adev, cursor));
+}
+
+/**
+ * amdgpu_vm_pt_next_dfs - get the next node for a depth-first search
+ *
+ * @adev: amdgpu_device structure
+ * @cursor: current state
+ *
+ * Move the cursor to the next node in a depth-first search.
+ */
+static void amdgpu_vm_pt_next_dfs(struct amdgpu_device *adev,
+				  struct amdgpu_vm_pt_cursor *cursor)
+{
+	if (!cursor->parent)
+		cursor->entry = NULL;
+	else if (amdgpu_vm_pt_sibling(adev, cursor))
+		while (amdgpu_vm_pt_descendant(adev, cursor));
+	else
+		amdgpu_vm_pt_ancestor(cursor);
+}
+
+/**
+ * for_each_amdgpu_vm_pt_dfs_safe - safe depth-first search of all PDs/PTs
+ */
+#define for_each_amdgpu_vm_pt_dfs_safe(adev, vm, cursor, entry)			\
+	for (amdgpu_vm_pt_first_dfs((adev), (vm), &(cursor)),			\
+	     (entry) = (cursor).entry; (entry); (entry) = (cursor).entry,	\
+	     amdgpu_vm_pt_next_dfs((adev), &(cursor)))
+
 /**
  * amdgpu_vm_get_pd_bo - add the VM PD to a validation list
  *
-- 
2.14.1



* [PATCH 05/11] drm/amdgpu: use leaf iterator for allocating PD/PT
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (3 preceding siblings ...)
  2018-09-09 18:03   ` [PATCH 04/11] drm/amdgpu: add some VM PD/PT iterators Christian König
@ 2018-09-09 18:03   ` Christian König
  2018-09-09 18:03   ` [PATCH 06/11] drm/amdgpu: use dfs iterator to free PDs/PTs Christian König
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Christian König @ 2018-09-09 18:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Less code, and it allows for easier error handling.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 156 ++++++++++++---------------------
 1 file changed, 55 insertions(+), 101 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 4007202585d4..c9b7793501cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -837,103 +837,6 @@ static void amdgpu_vm_bo_param(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		bp->resv = vm->root.base.bo->tbo.resv;
 }
 
-/**
- * amdgpu_vm_alloc_levels - allocate the PD/PT levels
- *
- * @adev: amdgpu_device pointer
- * @vm: requested vm
- * @parent: parent PT
- * @saddr: start of the address range
- * @eaddr: end of the address range
- * @level: VMPT level
- * @ats: indicate ATS support from PTE
- *
- * Make sure the page directories and page tables are allocated
- *
- * Returns:
- * 0 on success, errno otherwise.
- */
-static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
-				  struct amdgpu_vm *vm,
-				  struct amdgpu_vm_pt *parent,
-				  uint64_t saddr, uint64_t eaddr,
-				  unsigned level, bool ats)
-{
-	unsigned shift = amdgpu_vm_level_shift(adev, level);
-	struct amdgpu_bo_param bp;
-	unsigned pt_idx, from, to;
-	int r;
-
-	if (!parent->entries) {
-		unsigned num_entries = amdgpu_vm_num_entries(adev, level);
-
-		parent->entries = kvmalloc_array(num_entries,
-						   sizeof(struct amdgpu_vm_pt),
-						   GFP_KERNEL | __GFP_ZERO);
-		if (!parent->entries)
-			return -ENOMEM;
-	}
-
-	from = saddr >> shift;
-	to = eaddr >> shift;
-	if (from >= amdgpu_vm_num_entries(adev, level) ||
-	    to >= amdgpu_vm_num_entries(adev, level))
-		return -EINVAL;
-
-	++level;
-	saddr = saddr & ((1 << shift) - 1);
-	eaddr = eaddr & ((1 << shift) - 1);
-
-	amdgpu_vm_bo_param(adev, vm, level, &bp);
-
-	/* walk over the address space and allocate the page tables */
-	for (pt_idx = from; pt_idx <= to; ++pt_idx) {
-		struct amdgpu_vm_pt *entry = &parent->entries[pt_idx];
-		struct amdgpu_bo *pt;
-
-		if (!entry->base.bo) {
-			r = amdgpu_bo_create(adev, &bp, &pt);
-			if (r)
-				return r;
-
-			r = amdgpu_vm_clear_bo(adev, vm, pt, level, ats);
-			if (r) {
-				amdgpu_bo_unref(&pt->shadow);
-				amdgpu_bo_unref(&pt);
-				return r;
-			}
-
-			if (vm->use_cpu_for_update) {
-				r = amdgpu_bo_kmap(pt, NULL);
-				if (r) {
-					amdgpu_bo_unref(&pt->shadow);
-					amdgpu_bo_unref(&pt);
-					return r;
-				}
-			}
-
-			/* Keep a reference to the root directory to avoid
-			* freeing them up in the wrong order.
-			*/
-			pt->parent = amdgpu_bo_ref(parent->base.bo);
-
-			amdgpu_vm_bo_base_init(&entry->base, vm, pt);
-		}
-
-		if (level < AMDGPU_VM_PTB) {
-			uint64_t sub_saddr = (pt_idx == from) ? saddr : 0;
-			uint64_t sub_eaddr = (pt_idx == to) ? eaddr :
-				((1 << shift) - 1);
-			r = amdgpu_vm_alloc_levels(adev, vm, entry, sub_saddr,
-						   sub_eaddr, level, ats);
-			if (r)
-				return r;
-		}
-	}
-
-	return 0;
-}
-
 /**
  * amdgpu_vm_alloc_pts - Allocate page tables.
  *
@@ -942,7 +845,7 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
  * @saddr: Start address which needs to be allocated
  * @size: Size from start address we need.
  *
- * Make sure the page tables are allocated.
+ * Make sure the page directories and page tables are allocated
  *
  * Returns:
  * 0 on success, errno otherwise.
@@ -951,8 +854,11 @@ int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
 			struct amdgpu_vm *vm,
 			uint64_t saddr, uint64_t size)
 {
-	uint64_t eaddr;
+	struct amdgpu_vm_pt_cursor cursor;
+	struct amdgpu_bo *pt;
 	bool ats = false;
+	uint64_t eaddr;
+	int r;
 
 	/* validate the parameters */
 	if (saddr & AMDGPU_GPU_PAGE_MASK || size & AMDGPU_GPU_PAGE_MASK)
@@ -972,8 +878,56 @@ int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
 		return -EINVAL;
 	}
 
-	return amdgpu_vm_alloc_levels(adev, vm, &vm->root, saddr, eaddr,
-				      adev->vm_manager.root_level, ats);
+	for_each_amdgpu_vm_pt_leaf(adev, vm, saddr, eaddr, cursor) {
+		struct amdgpu_vm_pt *entry = cursor.entry;
+		struct amdgpu_bo_param bp;
+
+		if (cursor.level < AMDGPU_VM_PTB) {
+			unsigned num_entries;
+
+			num_entries = amdgpu_vm_num_entries(adev, cursor.level);
+			entry->entries = kvmalloc_array(num_entries,
+							sizeof(*entry->entries),
+							GFP_KERNEL |
+							__GFP_ZERO);
+			if (!entry->entries)
+				return -ENOMEM;
+		}
+
+
+		if (entry->base.bo)
+			continue;
+
+		amdgpu_vm_bo_param(adev, vm, cursor.level, &bp);
+
+		r = amdgpu_bo_create(adev, &bp, &pt);
+		if (r)
+			return r;
+
+		r = amdgpu_vm_clear_bo(adev, vm, pt, cursor.level, ats);
+		if (r)
+			goto error_free_pt;
+
+		if (vm->use_cpu_for_update) {
+			r = amdgpu_bo_kmap(pt, NULL);
+			if (r)
+				goto error_free_pt;
+		}
+
+		/* Keep a reference to the root directory to avoid
+		* freeing them up in the wrong order.
+		*/
+		pt->parent = amdgpu_bo_ref(cursor.parent->base.bo);
+
+		amdgpu_vm_bo_base_init(&entry->base, vm, pt);
+	}
+
+	return 0;
+
+error_free_pt:
+	amdgpu_bo_unref(&pt->shadow);
+	amdgpu_bo_unref(&pt);
+	return r;
 }
 
 /**
-- 
2.14.1



* [PATCH 06/11] drm/amdgpu: use dfs iterator to free PDs/PTs
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (4 preceding siblings ...)
  2018-09-09 18:03   ` [PATCH 05/11] drm/amdgpu: use leaf iterator for allocating PD/PT Christian König
@ 2018-09-09 18:03   ` Christian König
  2018-09-09 18:03   ` [PATCH 07/11] drm/amdgpu: use the DFS iterator in amdgpu_vm_invalidate_level Christian König
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Christian König @ 2018-09-09 18:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Allows us to free all PDs/PTs without recursion.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 62 ++++++++++++++++------------------
 1 file changed, 30 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index c9b7793501cd..805cca89d8c7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -930,6 +930,35 @@ int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
 	return r;
 }
 
+/**
+ * amdgpu_vm_free_pts - free PD/PT levels
+ *
+ * @adev: amdgpu device structure
+ * @vm: amdgpu_vm structure whose PDs/PTs are freed
+ *
+ * Free the page directory or page table level and all sub levels.
+ */
+static void amdgpu_vm_free_pts(struct amdgpu_device *adev,
+			       struct amdgpu_vm *vm)
+{
+	struct amdgpu_vm_pt_cursor cursor;
+	struct amdgpu_vm_pt *entry;
+
+	for_each_amdgpu_vm_pt_dfs_safe(adev, vm, cursor, entry) {
+
+		if (entry->base.bo) {
+			list_del(&entry->base.bo_list);
+			list_del(&entry->base.vm_status);
+			amdgpu_bo_unref(&entry->base.bo->shadow);
+			amdgpu_bo_unref(&entry->base.bo);
+		}
+		kvfree(entry->entries);
+	}
+
+	BUG_ON(vm->root.base.bo);
+}
+
 /**
  * amdgpu_vm_check_compute_bug - check whether asic has compute vm bug
  *
@@ -3110,36 +3139,6 @@ void amdgpu_vm_release_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	vm->pasid = 0;
 }
 
-/**
- * amdgpu_vm_free_levels - free PD/PT levels
- *
- * @adev: amdgpu device structure
- * @parent: PD/PT starting level to free
- * @level: level of parent structure
- *
- * Free the page directory or page table level and all sub levels.
- */
-static void amdgpu_vm_free_levels(struct amdgpu_device *adev,
-				  struct amdgpu_vm_pt *parent,
-				  unsigned level)
-{
-	unsigned i, num_entries = amdgpu_vm_num_entries(adev, level);
-
-	if (parent->base.bo) {
-		list_del(&parent->base.bo_list);
-		list_del(&parent->base.vm_status);
-		amdgpu_bo_unref(&parent->base.bo->shadow);
-		amdgpu_bo_unref(&parent->base.bo);
-	}
-
-	if (parent->entries)
-		for (i = 0; i < num_entries; i++)
-			amdgpu_vm_free_levels(adev, &parent->entries[i],
-					      level + 1);
-
-	kvfree(parent->entries);
-}
-
 /**
  * amdgpu_vm_fini - tear down a vm instance
  *
@@ -3197,8 +3196,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	if (r) {
 		dev_err(adev->dev, "Leaking page tables because BO reservation failed\n");
 	} else {
-		amdgpu_vm_free_levels(adev, &vm->root,
-				      adev->vm_manager.root_level);
+		amdgpu_vm_free_pts(adev, vm);
 		amdgpu_bo_unreserve(root);
 	}
 	amdgpu_bo_unref(&root);
-- 
2.14.1



* [PATCH 07/11] drm/amdgpu: use the DFS iterator in amdgpu_vm_invalidate_level
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (5 preceding siblings ...)
  2018-09-09 18:03   ` [PATCH 06/11] drm/amdgpu: use dfs iterator to free PDs/PTs Christian König
@ 2018-09-09 18:03   ` Christian König
       [not found]     ` <20180909180339.1910-8-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2018-09-09 18:03   ` [PATCH 08/11] drm/amdgpu: use leaf iterator for filling PTs Christian König
                     ` (5 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Christian König @ 2018-09-09 18:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Less code and easier to maintain.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 28 ++++++----------------------
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 805cca89d8c7..65d95e6771aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1366,33 +1366,18 @@ static void amdgpu_vm_update_pde(struct amdgpu_pte_update_params *params,
  *
  * @adev: amdgpu_device pointer
  * @vm: related vm
- * @parent: parent PD
- * @level: VMPT level
  *
  * Mark all PD level as invalid after an error.
  */
 static void amdgpu_vm_invalidate_level(struct amdgpu_device *adev,
-				       struct amdgpu_vm *vm,
-				       struct amdgpu_vm_pt *parent,
-				       unsigned level)
+				       struct amdgpu_vm *vm)
 {
-	unsigned pt_idx, num_entries;
-
-	/*
-	 * Recurse into the subdirectories. This recursion is harmless because
-	 * we only have a maximum of 5 layers.
-	 */
-	num_entries = amdgpu_vm_num_entries(adev, level);
-	for (pt_idx = 0; pt_idx < num_entries; ++pt_idx) {
-		struct amdgpu_vm_pt *entry = &parent->entries[pt_idx];
-
-		if (!entry->base.bo)
-			continue;
+	struct amdgpu_vm_pt_cursor cursor;
+	struct amdgpu_vm_pt *entry;
 
-		if (!entry->base.moved)
+	for_each_amdgpu_vm_pt_dfs_safe(adev, vm, cursor, entry)
+		if (entry->base.bo && !entry->base.moved)
 			amdgpu_vm_bo_relocated(&entry->base);
-		amdgpu_vm_invalidate_level(adev, vm, entry, level + 1);
-	}
 }
 
 /*
@@ -1489,8 +1474,7 @@ int amdgpu_vm_update_directories(struct amdgpu_device *adev,
 	return 0;
 
 error:
-	amdgpu_vm_invalidate_level(adev, vm, &vm->root,
-				   adev->vm_manager.root_level);
+	amdgpu_vm_invalidate_level(adev, vm);
 	amdgpu_job_free(job);
 	return r;
 }
-- 
2.14.1



* [PATCH 08/11] drm/amdgpu: use leaf iterator for filling PTs
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (6 preceding siblings ...)
  2018-09-09 18:03   ` [PATCH 07/11] drm/amdgpu: use the DFS iterator in amdgpu_vm_invalidate_level Christian König
@ 2018-09-09 18:03   ` Christian König
  2018-09-09 18:03   ` [PATCH 09/11] drm/amdgpu: meld together VM fragment and huge page handling Christian König
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Christian König @ 2018-09-09 18:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Less overhead, and this is the starting point for further cleanups and
improvements.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 62 ++++++++--------------------------
 1 file changed, 15 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 65d95e6771aa..a7b3d27dd3cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1479,36 +1479,6 @@ int amdgpu_vm_update_directories(struct amdgpu_device *adev,
 	return r;
 }
 
-/**
- * amdgpu_vm_find_entry - find the entry for an address
- *
- * @p: see amdgpu_pte_update_params definition
- * @addr: virtual address in question
- * @entry: resulting entry or NULL
- * @parent: parent entry
- *
- * Find the vm_pt entry and it's parent for the given address.
- */
-void amdgpu_vm_get_entry(struct amdgpu_pte_update_params *p, uint64_t addr,
-			 struct amdgpu_vm_pt **entry,
-			 struct amdgpu_vm_pt **parent)
-{
-	unsigned level = p->adev->vm_manager.root_level;
-
-	*parent = NULL;
-	*entry = &p->vm->root;
-	while ((*entry)->entries) {
-		unsigned shift = amdgpu_vm_level_shift(p->adev, level++);
-
-		*parent = *entry;
-		*entry = &(*entry)->entries[addr >> shift];
-		addr &= (1ULL << shift) - 1;
-	}
-
-	if (level != AMDGPU_VM_PTB)
-		*entry = NULL;
-}
-
 /**
  * amdgpu_vm_handle_huge_pages - handle updating the PD with huge pages
  *
@@ -1572,36 +1542,34 @@ static int amdgpu_vm_update_ptes(struct amdgpu_pte_update_params *params,
 {
 	struct amdgpu_device *adev = params->adev;
 	const uint64_t mask = AMDGPU_VM_PTE_COUNT(adev) - 1;
-
-	uint64_t addr, pe_start;
-	struct amdgpu_bo *pt;
-	unsigned nptes;
+	struct amdgpu_vm_pt_cursor cursor;
 
 	/* walk over the address space and update the page tables */
-	for (addr = start; addr < end; addr += nptes,
-	     dst += nptes * AMDGPU_GPU_PAGE_SIZE) {
-		struct amdgpu_vm_pt *entry, *parent;
+	for_each_amdgpu_vm_pt_leaf(adev, params->vm, start, end - 1, cursor) {
+		struct amdgpu_bo *pt = cursor.entry->base.bo;
+		uint64_t pe_start;
+		unsigned nptes;
 
-		amdgpu_vm_get_entry(params, addr, &entry, &parent);
-		if (!entry)
+		if (!pt || cursor.level != AMDGPU_VM_PTB)
 			return -ENOENT;
 
-		if ((addr & ~mask) == (end & ~mask))
-			nptes = end - addr;
+		if ((cursor.pfn & ~mask) == (end & ~mask))
+			nptes = end - cursor.pfn;
 		else
-			nptes = AMDGPU_VM_PTE_COUNT(adev) - (addr & mask);
+			nptes = AMDGPU_VM_PTE_COUNT(adev) - (cursor.pfn & mask);
 
-		amdgpu_vm_handle_huge_pages(params, entry, parent,
+		amdgpu_vm_handle_huge_pages(params, cursor.entry, cursor.parent,
 					    nptes, dst, flags);
 		/* We don't need to update PTEs for huge pages */
-		if (entry->huge)
+		if (cursor.entry->huge) {
+			dst += nptes * AMDGPU_GPU_PAGE_SIZE;
 			continue;
+		}
 
-		pt = entry->base.bo;
-		pe_start = (addr & mask) * 8;
+		pe_start = (cursor.pfn & mask) * 8;
 		amdgpu_vm_update_func(params, pt, pe_start, dst, nptes,
 				      AMDGPU_GPU_PAGE_SIZE, flags);
-
+		dst += nptes * AMDGPU_GPU_PAGE_SIZE;
 	}
 
 	return 0;
-- 
2.14.1


* [PATCH 09/11] drm/amdgpu: meld together VM fragment and huge page handling
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (7 preceding siblings ...)
  2018-09-09 18:03   ` [PATCH 08/11] drm/amdgpu: use leaf iterator for filling PTs Christian König
@ 2018-09-09 18:03   ` Christian König
  2018-09-09 18:03   ` [PATCH 10/11] drm/amdgpu: use the maximum possible fragment size on Vega/Raven Christian König
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Christian König @ 2018-09-09 18:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

This optimizes the generation of PTEs by walking the hierarchy only once
for a range and making changes as necessary.

It allows both huge (2MB) and giant (1GB) pages to be used on
Vega and Raven.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 267 ++++++++++++++++++---------------
 1 file changed, 147 insertions(+), 120 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index a7b3d27dd3cc..5b8c931a5d37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1480,46 +1480,76 @@ int amdgpu_vm_update_directories(struct amdgpu_device *adev,
 }
 
 /**
- * amdgpu_vm_handle_huge_pages - handle updating the PD with huge pages
+ * amdgpu_vm_update_huge - figure out parameters for PTE updates
  *
- * @p: see amdgpu_pte_update_params definition
- * @entry: vm_pt entry to check
- * @parent: parent entry
- * @nptes: number of PTEs updated with this operation
- * @dst: destination address where the PTEs should point to
- * @flags: access flags fro the PTEs
- *
- * Check if we can update the PD with a huge page.
+ * Make sure to set the right flags for the PTEs at the desired level.
  */
-static void amdgpu_vm_handle_huge_pages(struct amdgpu_pte_update_params *p,
-					struct amdgpu_vm_pt *entry,
-					struct amdgpu_vm_pt *parent,
-					unsigned nptes, uint64_t dst,
-					uint64_t flags)
-{
-	uint64_t pde;
+static void amdgpu_vm_update_huge(struct amdgpu_pte_update_params *params,
+				  struct amdgpu_bo *bo, unsigned level,
+				  uint64_t pe, uint64_t addr,
+				  unsigned count, uint32_t incr,
+				  uint64_t flags)
 
-	/* In the case of a mixed PT the PDE must point to it*/
-	if (p->adev->asic_type >= CHIP_VEGA10 && !p->src &&
-	    nptes == AMDGPU_VM_PTE_COUNT(p->adev)) {
-		/* Set the huge page flag to stop scanning at this PDE */
+{
+	if (level != AMDGPU_VM_PTB) {
 		flags |= AMDGPU_PDE_PTE;
+		amdgpu_gmc_get_vm_pde(params->adev, level, &addr, &flags);
 	}
 
-	if (!(flags & AMDGPU_PDE_PTE)) {
-		if (entry->huge) {
-			/* Add the entry to the relocated list to update it. */
-			entry->huge = false;
-			amdgpu_vm_bo_relocated(&entry->base);
-		}
+	amdgpu_vm_update_func(params, bo, pe, addr, count, incr, flags);
+}
+
+/**
+ * amdgpu_vm_fragment - get fragment for PTEs
+ *
+ * @params: see amdgpu_pte_update_params definition
+ * @start: first PTE to handle
+ * @end: last PTE to handle
+ * @flags: hw mapping flags
+ * @frag: resulting fragment size
+ * @frag_end: end of this fragment
+ *
+ * Returns the first possible fragment for the start and end address.
+ */
+static void amdgpu_vm_fragment(struct amdgpu_pte_update_params *params,
+			       uint64_t start, uint64_t end, uint64_t flags,
+			       unsigned int *frag, uint64_t *frag_end)
+{
+	/**
+	 * The MC L1 TLB supports variable sized pages, based on a fragment
+	 * field in the PTE. When this field is set to a non-zero value, page
+	 * granularity is increased from 4KB to (1 << (12 + frag)). The PTE
+	 * flags are considered valid for all PTEs within the fragment range
+	 * and corresponding mappings are assumed to be physically contiguous.
+	 *
+	 * The L1 TLB can store a single PTE for the whole fragment,
+	 * significantly increasing the space available for translation
+	 * caching. This leads to large improvements in throughput when the
+	 * TLB is under pressure.
+	 *
+	 * The L2 TLB distributes small and large fragments into two
+	 * asymmetric partitions. The large fragment cache is significantly
+	 * larger. Thus, we try to use large fragments wherever possible.
+	 * Userspace can support this by aligning virtual base address and
+	 * allocation size to the fragment size.
+	 */
+	unsigned max_frag = params->adev->vm_manager.fragment_size;
+
+	/* system pages are non continuously */
+	if (params->src || !(flags & AMDGPU_PTE_VALID)) {
+		*frag = 0;
+		*frag_end = end;
 		return;
 	}
 
-	entry->huge = true;
-	amdgpu_gmc_get_vm_pde(p->adev, AMDGPU_VM_PDB0, &dst, &flags);
-
-	pde = (entry - parent->entries) * 8;
-	amdgpu_vm_update_func(p, parent->base.bo, pde, dst, 1, 0, flags);
+	/* This intentionally wraps around if no bit is set */
+	*frag = min((unsigned)ffs(start) - 1, (unsigned)fls64(end - start) - 1);
+	if (*frag >= max_frag) {
+		*frag = max_frag;
+		*frag_end = end & ~((1ULL << max_frag) - 1);
+	} else {
+		*frag_end = start + (1 << *frag);
+	}
 }
 
 /**
@@ -1537,108 +1567,105 @@ static void amdgpu_vm_handle_huge_pages(struct amdgpu_pte_update_params *p,
  * 0 for success, -EINVAL for failure.
  */
 static int amdgpu_vm_update_ptes(struct amdgpu_pte_update_params *params,
-				  uint64_t start, uint64_t end,
-				  uint64_t dst, uint64_t flags)
+				 uint64_t start, uint64_t end,
+				 uint64_t dst, uint64_t flags)
 {
 	struct amdgpu_device *adev = params->adev;
-	const uint64_t mask = AMDGPU_VM_PTE_COUNT(adev) - 1;
 	struct amdgpu_vm_pt_cursor cursor;
+	uint64_t frag_start = start, frag_end;
+	unsigned int frag;
 
-	/* walk over the address space and update the page tables */
-	for_each_amdgpu_vm_pt_leaf(adev, params->vm, start, end - 1, cursor) {
+	/* figure out the initial fragment */
+	amdgpu_vm_fragment(params, frag_start, end, flags, &frag, &frag_end);
+
+	/* walk over the address space and update the PTs */
+	amdgpu_vm_pt_start(adev, params->vm, start, &cursor);
+	while (cursor.pfn < end) {
 		struct amdgpu_bo *pt = cursor.entry->base.bo;
-		uint64_t pe_start;
-		unsigned nptes;
+		unsigned shift, parent_shift, num_entries;
+		uint64_t incr, entry_end, pe_start;
 
-		if (!pt || cursor.level != AMDGPU_VM_PTB)
+		if (!pt)
 			return -ENOENT;
 
-		if ((cursor.pfn & ~mask) == (end & ~mask))
-			nptes = end - cursor.pfn;
-		else
-			nptes = AMDGPU_VM_PTE_COUNT(adev) - (cursor.pfn & mask);
-
-		amdgpu_vm_handle_huge_pages(params, cursor.entry, cursor.parent,
-					    nptes, dst, flags);
-		/* We don't need to update PTEs for huge pages */
-		if (cursor.entry->huge) {
-			dst += nptes * AMDGPU_GPU_PAGE_SIZE;
+		/* The root level can't be a huge page */
+		if (cursor.level == adev->vm_manager.root_level) {
+			if (!amdgpu_vm_pt_descendant(adev, &cursor))
+				return -ENOENT;
 			continue;
 		}
 
-		pe_start = (cursor.pfn & mask) * 8;
-		amdgpu_vm_update_func(params, pt, pe_start, dst, nptes,
-				      AMDGPU_GPU_PAGE_SIZE, flags);
-		dst += nptes * AMDGPU_GPU_PAGE_SIZE;
-	}
-
-	return 0;
-}
+		/* First check if the entry is already handled */
+		if (cursor.pfn < frag_start) {
+			cursor.entry->huge = true;
+			amdgpu_vm_pt_next(adev, &cursor);
+			continue;
+		}
 
-/*
- * amdgpu_vm_frag_ptes - add fragment information to PTEs
- *
- * @params: see amdgpu_pte_update_params definition
- * @vm: requested vm
- * @start: first PTE to handle
- * @end: last PTE to handle
- * @dst: addr those PTEs should point to
- * @flags: hw mapping flags
- *
- * Returns:
- * 0 for success, -EINVAL for failure.
- */
-static int amdgpu_vm_frag_ptes(struct amdgpu_pte_update_params	*params,
-				uint64_t start, uint64_t end,
-				uint64_t dst, uint64_t flags)
-{
-	/**
-	 * The MC L1 TLB supports variable sized pages, based on a fragment
-	 * field in the PTE. When this field is set to a non-zero value, page
-	 * granularity is increased from 4KB to (1 << (12 + frag)). The PTE
-	 * flags are considered valid for all PTEs within the fragment range
-	 * and corresponding mappings are assumed to be physically contiguous.
-	 *
-	 * The L1 TLB can store a single PTE for the whole fragment,
-	 * significantly increasing the space available for translation
-	 * caching. This leads to large improvements in throughput when the
-	 * TLB is under pressure.
-	 *
-	 * The L2 TLB distributes small and large fragments into two
-	 * asymmetric partitions. The large fragment cache is significantly
-	 * larger. Thus, we try to use large fragments wherever possible.
-	 * Userspace can support this by aligning virtual base address and
-	 * allocation size to the fragment size.
-	 */
-	unsigned max_frag = params->adev->vm_manager.fragment_size;
-	int r;
+		/* If it isn't already handled it can't be a huge page */
+		if (cursor.entry->huge) {
+			/* Add the entry to the relocated list to update it. */
+			cursor.entry->huge = false;
+			amdgpu_vm_bo_relocated(&cursor.entry->base);
+		}
 
-	/* system pages are non continuously */
-	if (params->src || !(flags & AMDGPU_PTE_VALID))
-		return amdgpu_vm_update_ptes(params, start, end, dst, flags);
-
-	while (start != end) {
-		uint64_t frag_flags, frag_end;
-		unsigned frag;
-
-		/* This intentionally wraps around if no bit is set */
-		frag = min((unsigned)ffs(start) - 1,
-			   (unsigned)fls64(end - start) - 1);
-		if (frag >= max_frag) {
-			frag_flags = AMDGPU_PTE_FRAG(max_frag);
-			frag_end = end & ~((1ULL << max_frag) - 1);
-		} else {
-			frag_flags = AMDGPU_PTE_FRAG(frag);
-			frag_end = start + (1 << frag);
+		shift = amdgpu_vm_level_shift(adev, cursor.level);
+		parent_shift = amdgpu_vm_level_shift(adev, cursor.level - 1);
+		if (adev->asic_type < CHIP_VEGA10) {
+			/* No huge page support before GMC v9 */
+			if (cursor.level != AMDGPU_VM_PTB) {
+				if (!amdgpu_vm_pt_descendant(adev, &cursor))
+					return -ENOENT;
+				continue;
+			}
+		} else if (frag < shift) {
+			/* We can't use this level when the fragment size is
+			 * smaller than the address shift. Go to the next
+			 * child entry and try again.
+			 */
+			if (!amdgpu_vm_pt_descendant(adev, &cursor))
+				return -ENOENT;
+			continue;
+		} else if (frag >= parent_shift) {
+			/* If the fragment size is even larger than the parent
+			 * shift we should go up one level and check it again.
+			 */
+			if (!amdgpu_vm_pt_ancestor(&cursor))
+				return -ENOENT;
+			continue;
 		}
 
-		r = amdgpu_vm_update_ptes(params, start, frag_end, dst,
-					  flags | frag_flags);
-		if (r)
-			return r;
+		/* Looks good so far, calculate parameters for the update */
+		incr = AMDGPU_GPU_PAGE_SIZE << shift;
+		num_entries = amdgpu_vm_num_entries(adev, cursor.level);
+		pe_start = ((cursor.pfn >> shift) & (num_entries - 1)) * 8;
+		entry_end = num_entries << shift;
+		entry_end += cursor.pfn & ~(entry_end - 1);
+		entry_end = min(entry_end, end);
+
+		do {
+			uint64_t upd_end = min(entry_end, frag_end);
+			unsigned nptes = (upd_end - frag_start) >> shift;
+
+			amdgpu_vm_update_huge(params, pt, cursor.level,
+					      pe_start, dst, nptes, incr,
+					      flags | AMDGPU_PTE_FRAG(frag));
+
+			pe_start += nptes * 8;
+			dst += nptes * AMDGPU_GPU_PAGE_SIZE << shift;
+
+			frag_start = upd_end;
+			if (frag_start >= frag_end) {
+				/* figure out the next fragment */
+				amdgpu_vm_fragment(params, frag_start, end,
+						   flags, &frag, &frag_end);
+				if (frag < shift)
+					break;
+			}
+		} while (frag_start < entry_end);
 
-		dst += (frag_end - start) * AMDGPU_GPU_PAGE_SIZE;
-		start = frag_end;
+		if (frag >= shift)
+			amdgpu_vm_pt_next(adev, &cursor);
 	}
 
 	return 0;
@@ -1700,8 +1727,8 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 
 		params.func = amdgpu_vm_cpu_set_ptes;
 		params.pages_addr = pages_addr;
-		return amdgpu_vm_frag_ptes(&params, start, last + 1,
-					   addr, flags);
+		return amdgpu_vm_update_ptes(&params, start, last + 1,
+					     addr, flags);
 	}
 
 	ring = container_of(vm->entity.rq->sched, struct amdgpu_ring, sched);
@@ -1780,7 +1807,7 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	if (r)
 		goto error_free;
 
-	r = amdgpu_vm_frag_ptes(&params, start, last + 1, addr, flags);
+	r = amdgpu_vm_update_ptes(&params, start, last + 1, addr, flags);
 	if (r)
 		goto error_free;
 
-- 
2.14.1


* [PATCH 10/11] drm/amdgpu: use the maximum possible fragment size on Vega/Raven
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (8 preceding siblings ...)
  2018-09-09 18:03   ` [PATCH 09/11] drm/amdgpu: meld together VM fragment and huge page handling Christian König
@ 2018-09-09 18:03   ` Christian König
       [not found]     ` <20180909180339.1910-11-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2018-09-09 18:03   ` [PATCH 11/11] drm/amdgpu: allow fragment processing for invalid PTEs Christian König
                     ` (2 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Christian König @ 2018-09-09 18:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

The fragment size only controls the L1 on Vega/Raven, and larger
fragments no longer cause any extra overhead.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 5b8c931a5d37..433849d267f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1533,7 +1533,12 @@ static void amdgpu_vm_fragment(struct amdgpu_pte_update_params *params,
 	 * Userspace can support this by aligning virtual base address and
 	 * allocation size to the fragment size.
 	 */
-	unsigned max_frag = params->adev->vm_manager.fragment_size;
+	unsigned max_frag;
+
+	if (params->adev->asic_type < CHIP_VEGA10)
+		max_frag = params->adev->vm_manager.fragment_size;
+	else
+		max_frag = 31;
 
 	/* system pages are non continuously */
 	if (params->src || !(flags & AMDGPU_PTE_VALID)) {
-- 
2.14.1


* [PATCH 11/11] drm/amdgpu: allow fragment processing for invalid PTEs
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (9 preceding siblings ...)
  2018-09-09 18:03   ` [PATCH 10/11] drm/amdgpu: use the maximum possible fragment size on Vega/Raven Christian König
@ 2018-09-09 18:03   ` Christian König
  2018-09-11  2:17   ` Optimize VM handling a bit more Felix Kuehling
  2018-09-11  2:39   ` Zhang, Jerry (Junwei)
  12 siblings, 0 replies; 30+ messages in thread
From: Christian König @ 2018-09-09 18:03 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

That should improve the PRT performance on Vega quite a bit.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 433849d267f2..3af536fd27d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1541,7 +1541,7 @@ static void amdgpu_vm_fragment(struct amdgpu_pte_update_params *params,
 		max_frag = 31;
 
 	/* system pages are non continuously */
-	if (params->src || !(flags & AMDGPU_PTE_VALID)) {
+	if (params->src) {
 		*frag = 0;
 		*frag_end = end;
 		return;
-- 
2.14.1


* Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
       [not found]     ` <20180909180339.1910-2-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-11  0:08       ` Felix Kuehling
       [not found]         ` <60b9c96a-d31d-cd61-df22-c8414f0166dc-5C7GfCeVMHo@public.gmane.org>
  2018-09-11  9:50       ` Huang Rui
  1 sibling, 1 reply; 30+ messages in thread
From: Felix Kuehling @ 2018-09-11  0:08 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

This looks good. But it complicates something I've been looking at:
remembering which process drm_mm_nodes last belonged to, so that they
don't need to be cleared next time they are allocated by the same
process. Having most nodes the same size (vram_page_split pages) would
make this very easy and efficient for the most common cases (large
allocations without any exotic address limitations or alignment
requirements).

Does anything else in this patch series depend on this optimization?

Regards,
  Felix


On 2018-09-09 02:03 PM, Christian König wrote:
> Try to allocate VRAM in power of two sizes and only fallback to vram
> split sizes if that fails.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 52 +++++++++++++++++++++-------
>  1 file changed, 40 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 9cfa8a9ada92..3f9d5d00c9b3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -124,6 +124,28 @@ u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo)
>  	return usage;
>  }
>  
> +/**
> + * amdgpu_vram_mgr_virt_start - update virtual start address
> + *
> + * @mem: ttm_mem_reg to update
> + * @node: just allocated node
> + *
> + * Calculate a virtual BO start address to easily check if everything is CPU
> + * accessible.
> + */
> +static void amdgpu_vram_mgr_virt_start(struct ttm_mem_reg *mem,
> +				       struct drm_mm_node *node)
> +{
> +	unsigned long start;
> +
> +	start = node->start + node->size;
> +	if (start > mem->num_pages)
> +		start -= mem->num_pages;
> +	else
> +		start = 0;
> +	mem->start = max(mem->start, start);
> +}
> +
>  /**
>   * amdgpu_vram_mgr_new - allocate new ranges
>   *
> @@ -176,10 +198,25 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
>  	pages_left = mem->num_pages;
>  
>  	spin_lock(&mgr->lock);
> -	for (i = 0; i < num_nodes; ++i) {
> +	for (i = 0; pages_left >= pages_per_node; ++i) {
> +		unsigned long pages = rounddown_pow_of_two(pages_left);
> +
> +		r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
> +						pages_per_node, 0,
> +						place->fpfn, lpfn,
> +						mode);
> +		if (unlikely(r))
> +			break;
> +
> +		usage += nodes[i].size << PAGE_SHIFT;
> +		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
> +		amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
> +		pages_left -= pages;
> +	}
> +
> +	for (; pages_left; ++i) {
>  		unsigned long pages = min(pages_left, pages_per_node);
>  		uint32_t alignment = mem->page_alignment;
> -		unsigned long start;
>  
>  		if (pages == pages_per_node)
>  			alignment = pages_per_node;
> @@ -193,16 +230,7 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
>  
>  		usage += nodes[i].size << PAGE_SHIFT;
>  		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
> -
> -		/* Calculate a virtual BO start address to easily check if
> -		 * everything is CPU accessible.
> -		 */
> -		start = nodes[i].start + nodes[i].size;
> -		if (start > mem->num_pages)
> -			start -= mem->num_pages;
> -		else
> -			start = 0;
> -		mem->start = max(mem->start, start);
> +		amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>  		pages_left -= pages;
>  	}
>  	spin_unlock(&mgr->lock);


* Re: [PATCH 07/11] drm/amdgpu: use the DFS iterator in amdgpu_vm_invalidate_level
       [not found]     ` <20180909180339.1910-8-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-11  0:19       ` Felix Kuehling
  0 siblings, 0 replies; 30+ messages in thread
From: Felix Kuehling @ 2018-09-11  0:19 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-09-09 02:03 PM, Christian König wrote:
> Less code and easier to maintain.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 28 ++++++----------------------
>  1 file changed, 6 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 805cca89d8c7..65d95e6771aa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1366,33 +1366,18 @@ static void amdgpu_vm_update_pde(struct amdgpu_pte_update_params *params,
>   *
>   * @adev: amdgpu_device pointer
>   * @vm: related vm
> - * @parent: parent PD
> - * @level: VMPT level
>   *
>   * Mark all PD level as invalid after an error.
>   */
>  static void amdgpu_vm_invalidate_level(struct amdgpu_device *adev,

Since you removed the level parameter, maybe also remove "level" from
the function name.

Regards,
  Felix

> -				       struct amdgpu_vm *vm,
> -				       struct amdgpu_vm_pt *parent,
> -				       unsigned level)
> +				       struct amdgpu_vm *vm)
>  {
> -	unsigned pt_idx, num_entries;
> -
> -	/*
> -	 * Recurse into the subdirectories. This recursion is harmless because
> -	 * we only have a maximum of 5 layers.
> -	 */
> -	num_entries = amdgpu_vm_num_entries(adev, level);
> -	for (pt_idx = 0; pt_idx < num_entries; ++pt_idx) {
> -		struct amdgpu_vm_pt *entry = &parent->entries[pt_idx];
> -
> -		if (!entry->base.bo)
> -			continue;
> +	struct amdgpu_vm_pt_cursor cursor;
> +	struct amdgpu_vm_pt *entry;
>  
> -		if (!entry->base.moved)
> +	for_each_amdgpu_vm_pt_dfs_safe(adev, vm, cursor, entry)
> +		if (entry->base.bo && !entry->base.moved)
>  			amdgpu_vm_bo_relocated(&entry->base);
> -		amdgpu_vm_invalidate_level(adev, vm, entry, level + 1);
> -	}
>  }
>  
>  /*
> @@ -1489,8 +1474,7 @@ int amdgpu_vm_update_directories(struct amdgpu_device *adev,
>  	return 0;
>  
>  error:
> -	amdgpu_vm_invalidate_level(adev, vm, &vm->root,
> -				   adev->vm_manager.root_level);
> +	amdgpu_vm_invalidate_level(adev, vm);
>  	amdgpu_job_free(job);
>  	return r;
>  }


* Re: [PATCH 10/11] drm/amdgpu: use the maximum possible fragment size on Vega/Raven
       [not found]     ` <20180909180339.1910-11-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-11  0:45       ` Felix Kuehling
  0 siblings, 0 replies; 30+ messages in thread
From: Felix Kuehling @ 2018-09-11  0:45 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


On 2018-09-09 02:03 PM, Christian König wrote:
> The fragment size controls only the L1 on Vega/Raven and we now don't
> have any extra overhead any more because of larger fragments.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 5b8c931a5d37..433849d267f2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1533,7 +1533,12 @@ static void amdgpu_vm_fragment(struct amdgpu_pte_update_params *params,
>  	 * Userspace can support this by aligning virtual base address and
>  	 * allocation size to the fragment size.

Do you want to update the comment here as well with something similar to
the patch description?

Regards,
  Felix

>  	 */
> -	unsigned max_frag = params->adev->vm_manager.fragment_size;
> +	unsigned max_frag;
> +
> +	if (params->adev->asic_type < CHIP_VEGA10)
> +		max_frag = params->adev->vm_manager.fragment_size;
> +	else
> +		max_frag = 31;
>  
>  	/* system pages are non continuously */
>  	if (params->src || !(flags & AMDGPU_PTE_VALID)) {


* Re: [PATCH 04/11] drm/amdgpu: add some VM PD/PT iterators
       [not found]     ` <20180909180339.1910-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-11  1:51       ` Felix Kuehling
  2018-09-11  2:38       ` Zhang, Jerry (Junwei)
  1 sibling, 0 replies; 30+ messages in thread
From: Felix Kuehling @ 2018-09-11  1:51 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-09-09 02:03 PM, Christian König wrote:
> Both a leaf as well as dfs iterator to walk over all the PDs/PTs.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 221 +++++++++++++++++++++++++++++++++
>  1 file changed, 221 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 416eccd9ea29..4007202585d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -355,6 +355,227 @@ static struct amdgpu_vm_pt *amdgpu_vm_pt_parent(struct amdgpu_vm_pt *pt)
>  	return list_first_entry(&parent->va, struct amdgpu_vm_pt, base.bo_list);
>  }
>  
> +/**
> + * amdgpu_vm_pt_cursor - state for for_each_amdgpu_vm_pt
> + */
> +struct amdgpu_vm_pt_cursor {

A lot of functions below have an adev parameter. I wonder if it makes
sense to add that to the cursor structure. Maybe it doesn't matter. Many
of those small functions will be inlined anyway.

> +	uint64_t pfn;
> +	struct amdgpu_vm_pt *parent;
> +	struct amdgpu_vm_pt *entry;
> +	unsigned level;
> +};
> +
> +/**
> + * amdgpu_vm_pt_start - start PD/PT walk
> + *
> + * @adev: amdgpu_device pointer
> + * @vm: amdgpu_vm structure
> + * @start: start address of the walk
> + * @cursor: state to initialize
> + *
> + * Initialize a amdgpu_vm_pt_cursor to start a walk.
> + */
> +static void amdgpu_vm_pt_start(struct amdgpu_device *adev,
> +			       struct amdgpu_vm *vm, uint64_t start,
> +			       struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	cursor->pfn = start;
> +	cursor->parent = NULL;
> +	cursor->entry = &vm->root;
> +	cursor->level = adev->vm_manager.root_level;
> +}
> +
> +/**
> + * amdgpu_vm_pt_descendant - got to child node
> + *
> + * @adev: amdgpu_device pointer
> + * @cursor: current state
> + *
> + * Walk to the child node of the current node.
> + * Returns:
> + * True if the walk was possible, false otherwise.
> + */
> +static bool amdgpu_vm_pt_descendant(struct amdgpu_device *adev,
> +				    struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	unsigned num_entries, shift, idx;
> +
> +	if (!cursor->entry->entries)
> +		return false;
> +
> +	BUG_ON(!cursor->entry->base.bo);

Is that just a sanity check? Doesn't seem like anything in this function
depends on this. In KFD we made a choice to replace all BUG_ONs with
other means of reporting problems. Mostly WARN_ON, which is just as
noisy, but easier to debug. In a function that already has a failure
mode, it just adds one more condition for the function to fail, so the
code calling it shouldn't need any modifications.

checkpatch.pl will issue a warning for all BUG_ONs to consider alternatives.

> +	num_entries = amdgpu_vm_num_entries(adev, cursor->level);
> +	shift = amdgpu_vm_level_shift(adev, cursor->level);
> +
> +	++cursor->level;
> +	idx = (cursor->pfn >> shift) % num_entries;

& (num_entries - 1)?

> +	cursor->parent = cursor->entry;
> +	cursor->entry = &cursor->entry->entries[idx];
> +	return true;
> +}
> +
> +/**
> + * amdgpu_vm_pt_sibling - go to sibling node
> + *
> + * @adev: amdgpu_device pointer
> + * @cursor: current state
> + *
> + * Walk to the sibling node of the current node.
> + * Returns:
> + * True if the walk was possible, false otherwise.
> + */
> +static bool amdgpu_vm_pt_sibling(struct amdgpu_device *adev,
> +				 struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	unsigned shift, num_entries;
> +
> +	/* Root doesn't have a sibling */
> +	if (!cursor->parent)
> +		return false;
> +
> +	/* Go to our parents and see if we got a sibling */
> +	shift = amdgpu_vm_level_shift(adev, cursor->level - 1);
> +	num_entries = amdgpu_vm_num_entries(adev, cursor->level - 1);
> +
> +	if (cursor->entry == &cursor->parent->entries[num_entries - 1])
> +		return false;
> +
> +	cursor->pfn += 1ULL << shift;
> +	cursor->pfn &= ~((1ULL << shift) - 1);
> +	++cursor->entry;
> +	return true;
> +}
> +
> +/**
> + * amdgpu_vm_pt_ancestor - go to parent node
> + *
> + * @adev: amdgpu_device pointer

Doesn't really have an adev parameter.

> + * @cursor: current state
> + *
> + * Walk to the parent node of the current node.
> + * Returns:
> + * True if the walk was possible, false otherwise.
> + */
> +static bool amdgpu_vm_pt_ancestor(struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	if (!cursor->parent)
> +		return false;
> +
> +	--cursor->level;
> +	cursor->entry = cursor->parent;
> +	cursor->parent = amdgpu_vm_pt_parent(cursor->parent);
> +	return true;
> +}
> +
> +/**
> + * amdgpu_vm_pt_next - get next PD/PT in hieratchy
> + *
> + * @adev: amdgpu_device pointer
> + * @cursor: current state
> + *
> + * Walk the PD/PT tree to the next node.
> + */
> +static void amdgpu_vm_pt_next(struct amdgpu_device *adev,
> +			      struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	/* First try a newborn child */
> +	if (amdgpu_vm_pt_descendant(adev, cursor))
> +		return;
> +
> +	/* If that didn't worked try to find a sibling */
> +	while (!amdgpu_vm_pt_sibling(adev, cursor)) {
> +		/* No sibling, go to our parents and grandparents */
> +		if (!amdgpu_vm_pt_ancestor(cursor)) {
> +			cursor->pfn = ~0ll;
> +			return;
> +		}
> +	}
> +}
> +
> +/**
> + * amdgpu_vm_pt_first_leaf - get first leaf PD/PT
> + *
> + * @adev: amdgpu_device pointer
> + * @vm: amdgpu_vm structure
> + * @start: start addr of the walk
> + * @cursor: state to initialize
> + *
> + * Start a walk and go directly to the leaf node.
> + */
> +static void amdgpu_vm_pt_first_leaf(struct amdgpu_device *adev,
> +				    struct amdgpu_vm *vm, uint64_t start,
> +				    struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	amdgpu_vm_pt_start(adev, vm, start, cursor);
> +	while (amdgpu_vm_pt_descendant(adev, cursor));
> +}
> +
> +/**
> + * amdgpu_vm_pt_next_leaf - get next leaf PD/PT
> + *
> + * @adev: amdgpu_device pointer
> + * @cursor: current state
> + *
> + * Walk the PD/PT tree to the next leaf node.
> + */
> +static void amdgpu_vm_pt_next_leaf(struct amdgpu_device *adev,
> +				   struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	amdgpu_vm_pt_next(adev, cursor);
> +	while (amdgpu_vm_pt_descendant(adev, cursor));
> +}
> +
> +/**
> + * for_each_amdgpu_vm_pt_leaf - walk over all leaf PDs/PTs in the hierarchy
> + */
> +#define for_each_amdgpu_vm_pt_leaf(adev, vm, start, end, cursor)		\
> +	for (amdgpu_vm_pt_first_leaf((adev), (vm), (start), &(cursor));		\
> +	     (cursor).pfn <= end; amdgpu_vm_pt_next_leaf((adev), &(cursor)))
> +
> +/**
> + * amdgpu_vm_pt_first_dfs - start a deep first search
> + *
> + * @adev: amdgpu_device structure
> + * @vm: amdgpu_vm structure
> + * @cursor: state to initialize
> + *
> + * Starts a deep first traversal of the PD/PT tree.
> + */
> +static void amdgpu_vm_pt_first_dfs(struct amdgpu_device *adev,
> +				   struct amdgpu_vm *vm,
> +				   struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	amdgpu_vm_pt_start(adev, vm, 0, cursor);
> +	while (amdgpu_vm_pt_descendant(adev, cursor));
> +}
> +
> +/**
> + * amdgpu_vm_pt_next_dfs - get the next node for a deep first search
> + *
> + * @adev: amdgpu_device structure
> + * @cursor: current state
> + *
> + * Move the cursor to the next node in a deep first search.
> + */
> +static void amdgpu_vm_pt_next_dfs(struct amdgpu_device *adev,
> +				  struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	if (!cursor->parent)
> +		cursor->entry = NULL;
> +	else if (amdgpu_vm_pt_sibling(adev, cursor))
> +		while (amdgpu_vm_pt_descendant(adev, cursor));
> +	else
> +		amdgpu_vm_pt_ancestor(cursor);
> +}
> +
> +/**
> + * for_each_amdgpu_vm_pt_dfs_safe - safe deep first search of all PDs/PTs
> + */
> +#define for_each_amdgpu_vm_pt_dfs_safe(adev, vm, cursor, entry)			\
> +	for (amdgpu_vm_pt_first_dfs((adev), (vm), &(cursor)),			\
> +	     (entry) = (cursor).entry; (entry); (entry) = (cursor).entry,	\
> +	     amdgpu_vm_pt_next_dfs((adev), &(cursor)))

I think this will find the first entry twice. You need to add an
amdgpu_vm_pt_next_dfs(...) call to the initializer.
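The safe-iterator shape being asked for can be modeled in plain C (hypothetical names, a flat array standing in for the tree): the initializer both records the first entry and advances the cursor, so the cursor stays one step ahead of the entry being processed. Without that extra advance, the first entry is yielded twice.

```c
#include <assert.h>
#include <stddef.h>

struct it { int *entry; int *end; };

static void it_first(struct it *c, int *arr, size_t n)
{
	c->entry = n ? arr : NULL;
	c->end = arr + n;
}

static void it_next(struct it *c)
{
	if (!c->entry)
		return;
	c->entry = (c->entry + 1 < c->end) ? c->entry + 1 : NULL;
}

/* Initializer advances the cursor once, so each entry is visited exactly
 * once and may be freed while the cursor already points at its successor. */
#define for_each_safe(c, e, arr, n)					 \
	for (it_first(&(c), (arr), (n)), (e) = (c).entry, it_next(&(c)); \
	     (e); (e) = (c).entry, it_next(&(c)))
```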

Regards,
  Felix

> +
>  /**
>   * amdgpu_vm_get_pd_bo - add the VM PD to a validation list
>   *


* Re: Optimize VM handling a bit more
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (10 preceding siblings ...)
  2018-09-09 18:03   ` [PATCH 11/11] drm/amdgpu: allow fragment processing for invalid PTEs Christian König
@ 2018-09-11  2:17   ` Felix Kuehling
  2018-09-11  2:39   ` Zhang, Jerry (Junwei)
  12 siblings, 0 replies; 30+ messages in thread
From: Felix Kuehling @ 2018-09-11  2:17 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Patches 2, 3, 5, 6, 8, 9, 11 are Reviewed-by: Felix Kuehling
<Felix.Kuehling@amd.com>

I replied with comments to 1, 4, 7, 10.

On another thread, some of the machine learning guys found that the main
overhead of our memory allocator is the clearing of BOs. I'm thinking about
a way to avoid that, but your patch 1 interferes with it.

My idea is to cache vram_page_split-sized drm_mm_nodes in
amdgpu_vram_mgr by process, instead of just freeing them. When the same
process allocates memory next, first try to use an existing node that
was already used by the same process. This would work for the common
case that there are no special alignment and placement restrictions.
Having most nodes of the same size (typically 2MB) helps and makes the
lookup of existing nodes very fast. Having to deal with different node
sizes would make it more difficult. Also the cache would likely
interfere with attempts to get large nodes in the first place.
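A sketch of that caching idea (hypothetical structures, not amdgpu code): freed uniform-size nodes are kept on the owning process's free list and handed back on its next allocation, so they don't need to be cleared again. Uniform node sizes are what keep the lookup trivial.

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next; int needs_clear; };
struct proc_cache { struct node *free_list; };

static struct node *cache_alloc(struct proc_cache *pc)
{
	struct node *n = pc->free_list;

	if (n) {
		pc->free_list = n->next;
		n->needs_clear = 0;   /* old contents belong to this process */
	}
	return n;                     /* NULL: fall back to the real allocator */
}

static void cache_free(struct proc_cache *pc, struct node *n)
{
	n->next = pc->free_list;
	pc->free_list = n;
}
```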

I started some code, but I'm not sure I'll be able to send out something
working for review before my vacation at the end of this week, and then XDC.

Regards,
  Felix


On 2018-09-09 02:03 PM, Christian König wrote:
> Hi everyone,
>
> Especially on Vega and Raven VM handling is rather inefficient while creating PTEs because we originally only supported 2 level page tables and implemented 4 level page tables on top of that.
>
> This patch set reworks quite a bit of that handling and adds proper iterator and tree walking functions which are then used to update PTEs more efficiently.
>
> A totally constructed test case which tried to map 2GB of VRAM on an unaligned address is reduced from 45ms down to ~20ms on my test system.
>
> As a very positive side effect this also adds support for 1GB giant VRAM pages additional to the existing 2MB huge pages on Vega/Raven and also enables all additional power of two values (2MB-2GB) for the L1.
>
> This could be beneficial for applications which allocate very huge amounts of memory because it reduces the overhead of page table walks by 50% (huge pages where 25%).
>
> Please comment and/or review,
> Christian.
>

* Re: [PATCH 04/11] drm/amdgpu: add some VM PD/PT iterators
       [not found]     ` <20180909180339.1910-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2018-09-11  1:51       ` Felix Kuehling
@ 2018-09-11  2:38       ` Zhang, Jerry (Junwei)
  1 sibling, 0 replies; 30+ messages in thread
From: Zhang, Jerry (Junwei) @ 2018-09-11  2:38 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 09/10/2018 02:03 AM, Christian König wrote:
> Both a leaf as well as dfs iterator to walk over all the PDs/PTs.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 221 +++++++++++++++++++++++++++++++++
>   1 file changed, 221 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 416eccd9ea29..4007202585d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -355,6 +355,227 @@ static struct amdgpu_vm_pt *amdgpu_vm_pt_parent(struct amdgpu_vm_pt *pt)
>   	return list_first_entry(&parent->va, struct amdgpu_vm_pt, base.bo_list);
>   }
>
> +/**
> + * amdgpu_vm_pt_cursor - state for for_each_amdgpu_vm_pt
> + */
> +struct amdgpu_vm_pt_cursor {
> +	uint64_t pfn;
> +	struct amdgpu_vm_pt *parent;
> +	struct amdgpu_vm_pt *entry;
> +	unsigned level;
> +};
> +
> +/**
> + * amdgpu_vm_pt_start - start PD/PT walk
> + *
> + * @adev: amdgpu_device pointer
> + * @vm: amdgpu_vm structure
> + * @start: start address of the walk
> + * @cursor: state to initialize
> + *
> + * Initialize a amdgpu_vm_pt_cursor to start a walk.
> + */
> +static void amdgpu_vm_pt_start(struct amdgpu_device *adev,
> +			       struct amdgpu_vm *vm, uint64_t start,
> +			       struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	cursor->pfn = start;
> +	cursor->parent = NULL;
> +	cursor->entry = &vm->root;
> +	cursor->level = adev->vm_manager.root_level;
> +}
> +
> +/**
> + * amdgpu_vm_pt_descendant - got to child node

seems typo for "go to"

Jerry

> + *
> + * @adev: amdgpu_device pointer
> + * @cursor: current state
> + *
> + * Walk to the child node of the current node.
> + * Returns:
> + * True if the walk was possible, false otherwise.
> + */
> +static bool amdgpu_vm_pt_descendant(struct amdgpu_device *adev,
> +				    struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	unsigned num_entries, shift, idx;
> +
> +	if (!cursor->entry->entries)
> +		return false;
> +
> +	BUG_ON(!cursor->entry->base.bo);
> +	num_entries = amdgpu_vm_num_entries(adev, cursor->level);
> +	shift = amdgpu_vm_level_shift(adev, cursor->level);
> +
> +	++cursor->level;
> +	idx = (cursor->pfn >> shift) % num_entries;
> +	cursor->parent = cursor->entry;
> +	cursor->entry = &cursor->entry->entries[idx];
> +	return true;
> +}
> +
> +/**
> + * amdgpu_vm_pt_sibling - go to sibling node
> + *
> + * @adev: amdgpu_device pointer
> + * @cursor: current state
> + *
> + * Walk to the sibling node of the current node.
> + * Returns:
> + * True if the walk was possible, false otherwise.
> + */
> +static bool amdgpu_vm_pt_sibling(struct amdgpu_device *adev,
> +				 struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	unsigned shift, num_entries;
> +
> +	/* Root doesn't have a sibling */
> +	if (!cursor->parent)
> +		return false;
> +
> +	/* Go to our parents and see if we got a sibling */
> +	shift = amdgpu_vm_level_shift(adev, cursor->level - 1);
> +	num_entries = amdgpu_vm_num_entries(adev, cursor->level - 1);
> +
> +	if (cursor->entry == &cursor->parent->entries[num_entries - 1])
> +		return false;
> +
> +	cursor->pfn += 1ULL << shift;
> +	cursor->pfn &= ~((1ULL << shift) - 1);
> +	++cursor->entry;
> +	return true;
> +}
> +
> +/**
> + * amdgpu_vm_pt_ancestor - go to parent node
> + *
> + * @adev: amdgpu_device pointer
> + * @cursor: current state
> + *
> + * Walk to the parent node of the current node.
> + * Returns:
> + * True if the walk was possible, false otherwise.
> + */
> +static bool amdgpu_vm_pt_ancestor(struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	if (!cursor->parent)
> +		return false;
> +
> +	--cursor->level;
> +	cursor->entry = cursor->parent;
> +	cursor->parent = amdgpu_vm_pt_parent(cursor->parent);
> +	return true;
> +}
> +
> +/**
> + * amdgpu_vm_pt_next - get next PD/PT in hieratchy
> + *
> + * @adev: amdgpu_device pointer
> + * @cursor: current state
> + *
> + * Walk the PD/PT tree to the next node.
> + */
> +static void amdgpu_vm_pt_next(struct amdgpu_device *adev,
> +			      struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	/* First try a newborn child */
> +	if (amdgpu_vm_pt_descendant(adev, cursor))
> +		return;
> +
> +	/* If that didn't worked try to find a sibling */
> +	while (!amdgpu_vm_pt_sibling(adev, cursor)) {
> +		/* No sibling, go to our parents and grandparents */
> +		if (!amdgpu_vm_pt_ancestor(cursor)) {
> +			cursor->pfn = ~0ll;
> +			return;
> +		}
> +	}
> +}
> +
> +/**
> + * amdgpu_vm_pt_first_leaf - get first leaf PD/PT
> + *
> + * @adev: amdgpu_device pointer
> + * @vm: amdgpu_vm structure
> + * @start: start addr of the walk
> + * @cursor: state to initialize
> + *
> + * Start a walk and go directly to the leaf node.
> + */
> +static void amdgpu_vm_pt_first_leaf(struct amdgpu_device *adev,
> +				    struct amdgpu_vm *vm, uint64_t start,
> +				    struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	amdgpu_vm_pt_start(adev, vm, start, cursor);
> +	while (amdgpu_vm_pt_descendant(adev, cursor));
> +}
> +
> +/**
> + * amdgpu_vm_pt_next_leaf - get next leaf PD/PT
> + *
> + * @adev: amdgpu_device pointer
> + * @cursor: current state
> + *
> + * Walk the PD/PT tree to the next leaf node.
> + */
> +static void amdgpu_vm_pt_next_leaf(struct amdgpu_device *adev,
> +				   struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	amdgpu_vm_pt_next(adev, cursor);
> +	while (amdgpu_vm_pt_descendant(adev, cursor));
> +}
> +
> +/**
> + * for_each_amdgpu_vm_pt_leaf - walk over all leaf PDs/PTs in the hierarchy
> + */
> +#define for_each_amdgpu_vm_pt_leaf(adev, vm, start, end, cursor)		\
> +	for (amdgpu_vm_pt_first_leaf((adev), (vm), (start), &(cursor));		\
> +	     (cursor).pfn <= end; amdgpu_vm_pt_next_leaf((adev), &(cursor)))
> +
> +/**
> + * amdgpu_vm_pt_first_dfs - start a deep first search
> + *
> + * @adev: amdgpu_device structure
> + * @vm: amdgpu_vm structure
> + * @cursor: state to initialize
> + *
> + * Starts a deep first traversal of the PD/PT tree.
> + */
> +static void amdgpu_vm_pt_first_dfs(struct amdgpu_device *adev,
> +				   struct amdgpu_vm *vm,
> +				   struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	amdgpu_vm_pt_start(adev, vm, 0, cursor);
> +	while (amdgpu_vm_pt_descendant(adev, cursor));
> +}
> +
> +/**
> + * amdgpu_vm_pt_next_dfs - get the next node for a deep first search
> + *
> + * @adev: amdgpu_device structure
> + * @cursor: current state
> + *
> + * Move the cursor to the next node in a deep first search.
> + */
> +static void amdgpu_vm_pt_next_dfs(struct amdgpu_device *adev,
> +				  struct amdgpu_vm_pt_cursor *cursor)
> +{
> +	if (!cursor->parent)
> +		cursor->entry = NULL;
> +	else if (amdgpu_vm_pt_sibling(adev, cursor))
> +		while (amdgpu_vm_pt_descendant(adev, cursor));
> +	else
> +		amdgpu_vm_pt_ancestor(cursor);
> +}
> +
> +/**
> + * for_each_amdgpu_vm_pt_dfs_safe - safe deep first search of all PDs/PTs
> + */
> +#define for_each_amdgpu_vm_pt_dfs_safe(adev, vm, cursor, entry)			\
> +	for (amdgpu_vm_pt_first_dfs((adev), (vm), &(cursor)),			\
> +	     (entry) = (cursor).entry; (entry); (entry) = (cursor).entry,	\
> +	     amdgpu_vm_pt_next_dfs((adev), &(cursor)))
> +
>   /**
>    * amdgpu_vm_get_pd_bo - add the VM PD to a validation list
>    *
>

* Re: Optimize VM handling a bit more
       [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (11 preceding siblings ...)
  2018-09-11  2:17   ` Optimize VM handling a bit more Felix Kuehling
@ 2018-09-11  2:39   ` Zhang, Jerry (Junwei)
  12 siblings, 0 replies; 30+ messages in thread
From: Zhang, Jerry (Junwei) @ 2018-09-11  2:39 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Apart from Felix's comments,

this looks good to me. Patches 2 ~ 8 are
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>

Patches 9 ~ 11 are
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>


On 09/10/2018 02:03 AM, Christian König wrote:
> Hi everyone,
>
> Especially on Vega and Raven VM handling is rather inefficient while creating PTEs because we originally only supported 2 level page tables and implemented 4 level page tables on top of that.
>
> This patch set reworks quite a bit of that handling and adds proper iterator and tree walking functions which are then used to update PTEs more efficiently.
>
> A totally constructed test case which tried to map 2GB of VRAM on an unaligned address is reduced from 45ms down to ~20ms on my test system.
>
> As a very positive side effect this also adds support for 1GB giant VRAM pages additional to the existing 2MB huge pages on Vega/Raven and also enables all additional power of two values (2MB-2GB) for the L1.
>
> This could be beneficial for applications which allocate very huge amounts of memory because it reduces the overhead of page table walks by 50% (huge pages where 25%).
>
> Please comment and/or review,
> Christian.
>

* Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
       [not found]         ` <60b9c96a-d31d-cd61-df22-c8414f0166dc-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-11  6:49           ` Christian König
       [not found]             ` <9b61bb4c-4721-c8af-0810-4ecc18ed1ea4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Christian König @ 2018-09-11  6:49 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Yeah well the whole patch set depends on that :)

Otherwise we don't get pages larger than 2MB for the L1 on Vega10.
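A rough user-space model of the allocation strategy patch 1 introduces (illustrative only, not the kernel code): carve the request into descending power-of-two chunks while at least pages_per_node pages remain, then fall back to split-sized pieces for the remainder. The power-of-two chunks are what later allow huge-page mappings larger than 2MB.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel helper of the same name. */
static unsigned long rounddown_pow_of_two(unsigned long n)
{
	while (n & (n - 1))
		n &= n - 1;           /* clear low bits until one remains */
	return n;
}

static size_t carve(unsigned long pages, unsigned long per_node,
		    unsigned long *out, size_t max)
{
	size_t n = 0;

	/* First pass: largest power-of-two chunks that still fit. */
	while (pages >= per_node && n < max) {
		out[n] = rounddown_pow_of_two(pages);
		pages -= out[n++];
	}
	/* Fallback: split-sized (or smaller, final) pieces. */
	while (pages && n < max) {
		out[n] = pages < per_node ? pages : per_node;
		pages -= out[n++];
	}
	return n;
}
```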

But another question: Why do you want to clear VRAM on allocation? We 
perfectly support allocating VRAM without clearing it.

Regards,
Christian.

Am 11.09.2018 um 02:08 schrieb Felix Kuehling:
> This looks good. But it complicates something I've been looking at:
> Remembering which process drm_mm_nodes last belonged to, so that they
> don't need to be cleared next time they are allocated by the same
> process. Having most nodes the same size (vram_page_split pages) would
> make this very easy and efficient for the most common cases (large
> allocations without any exotic address limitations or alignment
> requirements).
>
> Does anything else in this patch series depend on this optimization?
>
> Regards,
>    Felix
>
>
> On 2018-09-09 02:03 PM, Christian König wrote:
>> Try to allocate VRAM in power of two sizes and only fallback to vram
>> split sizes if that fails.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 52 +++++++++++++++++++++-------
>>   1 file changed, 40 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> index 9cfa8a9ada92..3f9d5d00c9b3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> @@ -124,6 +124,28 @@ u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo)
>>   	return usage;
>>   }
>>   
>> +/**
>> + * amdgpu_vram_mgr_virt_start - update virtual start address
>> + *
>> + * @mem: ttm_mem_reg to update
>> + * @node: just allocated node
>> + *
>> + * Calculate a virtual BO start address to easily check if everything is CPU
>> + * accessible.
>> + */
>> +static void amdgpu_vram_mgr_virt_start(struct ttm_mem_reg *mem,
>> +				       struct drm_mm_node *node)
>> +{
>> +	unsigned long start;
>> +
>> +	start = node->start + node->size;
>> +	if (start > mem->num_pages)
>> +		start -= mem->num_pages;
>> +	else
>> +		start = 0;
>> +	mem->start = max(mem->start, start);
>> +}
>> +
>>   /**
>>    * amdgpu_vram_mgr_new - allocate new ranges
>>    *
>> @@ -176,10 +198,25 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
>>   	pages_left = mem->num_pages;
>>   
>>   	spin_lock(&mgr->lock);
>> -	for (i = 0; i < num_nodes; ++i) {
>> +	for (i = 0; pages_left >= pages_per_node; ++i) {
>> +		unsigned long pages = rounddown_pow_of_two(pages_left);
>> +
>> +		r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
>> +						pages_per_node, 0,
>> +						place->fpfn, lpfn,
>> +						mode);
>> +		if (unlikely(r))
>> +			break;
>> +
>> +		usage += nodes[i].size << PAGE_SHIFT;
>> +		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
>> +		amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>> +		pages_left -= pages;
>> +	}
>> +
>> +	for (; pages_left; ++i) {
>>   		unsigned long pages = min(pages_left, pages_per_node);
>>   		uint32_t alignment = mem->page_alignment;
>> -		unsigned long start;
>>   
>>   		if (pages == pages_per_node)
>>   			alignment = pages_per_node;
>> @@ -193,16 +230,7 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
>>   
>>   		usage += nodes[i].size << PAGE_SHIFT;
>>   		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
>> -
>> -		/* Calculate a virtual BO start address to easily check if
>> -		 * everything is CPU accessible.
>> -		 */
>> -		start = nodes[i].start + nodes[i].size;
>> -		if (start > mem->num_pages)
>> -			start -= mem->num_pages;
>> -		else
>> -			start = 0;
>> -		mem->start = max(mem->start, start);
>> +		amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>>   		pages_left -= pages;
>>   	}
>>   	spin_unlock(&mgr->lock);


* Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
       [not found]             ` <9b61bb4c-4721-c8af-0810-4ecc18ed1ea4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-09-11  7:37               ` Michel Dänzer
       [not found]                 ` <46b7da24-4b04-7daf-e6c3-07e10ecb6fd6-otUistvHUpPR7s880joybQ@public.gmane.org>
  2018-09-11 14:27               ` Kuehling, Felix
  1 sibling, 1 reply; 30+ messages in thread
From: Michel Dänzer @ 2018-09-11  7:37 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo, Felix Kuehling
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-09-11 8:49 a.m., Christian König wrote:
> 
> But another question: Why do you want to clear VRAM on allocation? We
> perfectly support allocating VRAM without clearing it.

Which is a problem of its own, as it can leak information from one
process to another.

Anyway, not clearing when userspace sets AMDGPU_GEM_CREATE_VRAM_CLEARED
would break the userspace ABI, so that's no go.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

* Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
       [not found]                 ` <46b7da24-4b04-7daf-e6c3-07e10ecb6fd6-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2018-09-11  7:46                   ` Christian König
       [not found]                     ` <d0af0309-e4f9-9825-fd30-9cf79ed5f823-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Christian König @ 2018-09-11  7:46 UTC (permalink / raw)
  To: Michel Dänzer, Felix Kuehling
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 11.09.2018 um 09:37 schrieb Michel Dänzer:
> On 2018-09-11 8:49 a.m., Christian König wrote:
>> But another question: Why do you want to clear VRAM on allocation? We
>> perfectly support allocating VRAM without clearing it.
> Which is a problem of its own, as it can leak information from one
> process to another.

How about adding a AMDGPU_GEM_CREATE_VRAM_SECRED flag which triggers 
clearing VRAM when the BO is freed?

> Anyway, not clearing when userspace sets AMDGPU_GEM_CREATE_VRAM_CLEARED
> would break the userspace ABI, so that's no go.
Yeah, exactly my point.

What Felix seems to be trying is to avoid clearing VRAM when the allocated 
memory previously belonged to the same process.

But that isn't the intention of the AMDGPU_GEM_CREATE_VRAM_CLEARED flag.

Regards,
Christian.

* Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
       [not found]                     ` <d0af0309-e4f9-9825-fd30-9cf79ed5f823-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-11  7:55                       ` Michel Dänzer
       [not found]                         ` <5fd331d0-24ab-7540-2d0a-b227230743c0-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Michel Dänzer @ 2018-09-11  7:55 UTC (permalink / raw)
  To: Christian König, Felix Kuehling
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-09-11 9:46 a.m., Christian König wrote:
> Am 11.09.2018 um 09:37 schrieb Michel Dänzer:
>> On 2018-09-11 8:49 a.m., Christian König wrote:
>>> But another question: Why do you want to clear VRAM on allocation? We
>>> perfectly support allocating VRAM without clearing it.
>> Which is a problem of its own, as it can leak information from one
>> process to another.
> 
> How about adding a AMDGPU_GEM_CREATE_VRAM_SECRED flag which triggers
> clearing VRAM when the BO is freed?

Doesn't sound like much better API than AMDGPU_GEM_CREATE_VRAM_CLEARED
to me. Why would any process be okay with leaking its BO contents to
other processes? Could that flag even prevent it in all cases, e.g. when
BOs are evicted or otherwise migrated?



* Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
       [not found]                         ` <5fd331d0-24ab-7540-2d0a-b227230743c0-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2018-09-11  8:20                           ` Christian König
       [not found]                             ` <a3c91610-e5e2-c666-cfda-344f1f40485c-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Christian König @ 2018-09-11  8:20 UTC (permalink / raw)
  To: Michel Dänzer, Christian König, Felix Kuehling
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 11.09.2018 um 09:55 schrieb Michel Dänzer:
> On 2018-09-11 9:46 a.m., Christian König wrote:
>> Am 11.09.2018 um 09:37 schrieb Michel Dänzer:
>>> On 2018-09-11 8:49 a.m., Christian König wrote:
>>>> But another question: Why do you want to clear VRAM on allocation? We
>>>> perfectly support allocating VRAM without clearing it.
>>> Which is a problem of its own, as it can leak information from one
>>> process to another.
>> How about adding a AMDGPU_GEM_CREATE_VRAM_SECRED flag which triggers
>> clearing VRAM when the BO is freed?
> Doesn't sound like much better API than AMDGPU_GEM_CREATE_VRAM_CLEARED
> to me. Why would any process be okay with leaking its BO contents to
> other processes?

Well the content of most BOs is uninteresting to other processes, e.g. 
textures.

Only things like page tables and maybe some security-related 
calculations need extra protection, as far as I can see.

> Could that flag even prevent it in all cases, e.g. when
> BOs are evicted or otherwise migrated?

Yeah, sure that shouldn't be a problem. Clearing a BO after it is moved 
should be trivial.

Christian.

* Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
       [not found]                             ` <a3c91610-e5e2-c666-cfda-344f1f40485c-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-09-11  8:25                               ` Michel Dänzer
  0 siblings, 0 replies; 30+ messages in thread
From: Michel Dänzer @ 2018-09-11  8:25 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo, Felix Kuehling
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-09-11 10:20 a.m., Christian König wrote:
> Am 11.09.2018 um 09:55 schrieb Michel Dänzer:
>> On 2018-09-11 9:46 a.m., Christian König wrote:
>>> Am 11.09.2018 um 09:37 schrieb Michel Dänzer:
>>>> On 2018-09-11 8:49 a.m., Christian König wrote:
>>>>> But another question: Why do you want to clear VRAM on allocation? We
>>>>> perfectly support allocating VRAM without clearing it.
>>>> Which is a problem of its own, as it can leak information from one
>>>> process to another.
>>> How about adding a AMDGPU_GEM_CREATE_VRAM_SECRED flag which triggers
>>> clearing VRAM when the BO is freed?
>> Doesn't sound like much better API than AMDGPU_GEM_CREATE_VRAM_CLEARED
>> to me. Why would any process be okay with leaking its BO contents to
>> other processes?
> 
> Well the content of most BOs is uninteresting to other processes, e.g.
> textures.

Textures can hold application window contents, which can be sensitive.
Not clearing BOs can result in such contents showing up in unexpected
places.



* Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
       [not found]     ` <20180909180339.1910-2-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2018-09-11  0:08       ` Felix Kuehling
@ 2018-09-11  9:50       ` Huang Rui
  2018-09-11 11:16         ` Christian König
  1 sibling, 1 reply; 30+ messages in thread
From: Huang Rui @ 2018-09-11  9:50 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Sun, Sep 09, 2018 at 08:03:29PM +0200, Christian König wrote:
> Try to allocate VRAM in power of two sizes and only fallback to vram
> split sizes if that fails.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 52 +++++++++++++++++++++-------
>  1 file changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 9cfa8a9ada92..3f9d5d00c9b3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -124,6 +124,28 @@ u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo)
>  	return usage;
>  }
>  
> +/**
> + * amdgpu_vram_mgr_virt_start - update virtual start address
> + *
> + * @mem: ttm_mem_reg to update
> + * @node: just allocated node
> + *
> + * Calculate a virtual BO start address to easily check if everything is CPU
> + * accessible.
> + */
> +static void amdgpu_vram_mgr_virt_start(struct ttm_mem_reg *mem,
> +				       struct drm_mm_node *node)
> +{
> +	unsigned long start;
> +
> +	start = node->start + node->size;
> +	if (start > mem->num_pages)
> +		start -= mem->num_pages;
> +	else
> +		start = 0;
> +	mem->start = max(mem->start, start);
> +}
> +

How about moving this function into ttm (ttm_bo_manager.c) and exporting the
symbol? It looks more like a ttm helper function.

Thanks,
Ray

>  /**
>   * amdgpu_vram_mgr_new - allocate new ranges
>   *
> @@ -176,10 +198,25 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
>  	pages_left = mem->num_pages;
>  
>  	spin_lock(&mgr->lock);
> -	for (i = 0; i < num_nodes; ++i) {
> +	for (i = 0; pages_left >= pages_per_node; ++i) {
> +		unsigned long pages = rounddown_pow_of_two(pages_left);
> +
> +		r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
> +						pages_per_node, 0,
> +						place->fpfn, lpfn,
> +						mode);
> +		if (unlikely(r))
> +			break;
> +
> +		usage += nodes[i].size << PAGE_SHIFT;
> +		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
> +		amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
> +		pages_left -= pages;
> +	}
> +
> +	for (; pages_left; ++i) {
>  		unsigned long pages = min(pages_left, pages_per_node);
>  		uint32_t alignment = mem->page_alignment;
> -		unsigned long start;
>  
>  		if (pages == pages_per_node)
>  			alignment = pages_per_node;
> @@ -193,16 +230,7 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
>  
>  		usage += nodes[i].size << PAGE_SHIFT;
>  		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
> -
> -		/* Calculate a virtual BO start address to easily check if
> -		 * everything is CPU accessible.
> -		 */
> -		start = nodes[i].start + nodes[i].size;
> -		if (start > mem->num_pages)
> -			start -= mem->num_pages;
> -		else
> -			start = 0;
> -		mem->start = max(mem->start, start);
> +		amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>  		pages_left -= pages;
>  	}
>  	spin_unlock(&mgr->lock);
> -- 
> 2.14.1
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
  2018-09-11  9:50       ` Huang Rui
@ 2018-09-11 11:16         ` Christian König
  0 siblings, 0 replies; 30+ messages in thread
From: Christian König @ 2018-09-11 11:16 UTC (permalink / raw)
  To: Huang Rui; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-09-11 at 11:50, Huang Rui wrote:
> On Sun, Sep 09, 2018 at 08:03:29PM +0200, Christian König wrote:
>> Try to allocate VRAM in power of two sizes and only fallback to vram
>> split sizes if that fails.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 52 +++++++++++++++++++++-------
>>   1 file changed, 40 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> index 9cfa8a9ada92..3f9d5d00c9b3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> @@ -124,6 +124,28 @@ u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo)
>>   	return usage;
>>   }
>>   
>> +/**
>> + * amdgpu_vram_mgr_virt_start - update virtual start address
>> + *
>> + * @mem: ttm_mem_reg to update
>> + * @node: just allocated node
>> + *
>> + * Calculate a virtual BO start address to easily check if everything is CPU
>> + * accessible.
>> + */
>> +static void amdgpu_vram_mgr_virt_start(struct ttm_mem_reg *mem,
>> +				       struct drm_mm_node *node)
>> +{
>> +	unsigned long start;
>> +
>> +	start = node->start + node->size;
>> +	if (start > mem->num_pages)
>> +		start -= mem->num_pages;
>> +	else
>> +		start = 0;
>> +	mem->start = max(mem->start, start);
>> +}
>> +
> How about moving this function into ttm (ttm_bo_manager.c) and exporting the
> symbol? It looks more like a ttm helper function.

No, that is a completely amdgpu specific hack.

We just provide a virtual start address so that the page fault function 
can quickly check if the BO is in CPU visible VRAM or not.

Regards,
Christian.

>
> Thanks,
> Ray
>
>>   /**
>>    * amdgpu_vram_mgr_new - allocate new ranges
>>    *
>> @@ -176,10 +198,25 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
>>   	pages_left = mem->num_pages;
>>   
>>   	spin_lock(&mgr->lock);
>> -	for (i = 0; i < num_nodes; ++i) {
>> +	for (i = 0; pages_left >= pages_per_node; ++i) {
>> +		unsigned long pages = rounddown_pow_of_two(pages_left);
>> +
>> +		r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
>> +						pages_per_node, 0,
>> +						place->fpfn, lpfn,
>> +						mode);
>> +		if (unlikely(r))
>> +			break;
>> +
>> +		usage += nodes[i].size << PAGE_SHIFT;
>> +		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
>> +		amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>> +		pages_left -= pages;
>> +	}
>> +
>> +	for (; pages_left; ++i) {
>>   		unsigned long pages = min(pages_left, pages_per_node);
>>   		uint32_t alignment = mem->page_alignment;
>> -		unsigned long start;
>>   
>>   		if (pages == pages_per_node)
>>   			alignment = pages_per_node;
>> @@ -193,16 +230,7 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
>>   
>>   		usage += nodes[i].size << PAGE_SHIFT;
>>   		vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
>> -
>> -		/* Calculate a virtual BO start address to easily check if
>> -		 * everything is CPU accessible.
>> -		 */
>> -		start = nodes[i].start + nodes[i].size;
>> -		if (start > mem->num_pages)
>> -			start -= mem->num_pages;
>> -		else
>> -			start = 0;
>> -		mem->start = max(mem->start, start);
>> +		amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>>   		pages_left -= pages;
>>   	}
>>   	spin_unlock(&mgr->lock);
>> -- 
>> 2.14.1
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



* Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
       [not found]             ` <9b61bb4c-4721-c8af-0810-4ecc18ed1ea4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2018-09-11  7:37               ` Michel Dänzer
@ 2018-09-11 14:27               ` Kuehling, Felix
       [not found]                 ` <DM5PR12MB17073F8B673AEFC092FBA5B392040-2J9CzHegvk9TCtO+SvGBKwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  1 sibling, 1 reply; 30+ messages in thread
From: Kuehling, Felix @ 2018-09-11 14:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Koenig, Christian

> Otherwise we don't get pages larger than 2MB for the L1 on Vega10.

Yeah, I realized that after reviewing the rest of the patch series.

> But another question: Why do you want to clear VRAM on allocation? We
> perfectly support allocating VRAM without clearing it.

As Michel pointed out, that's a security problem because it can leak data between processes. Therefore all VRAM allocated through KFD APIs is currently cleared on allocation. However, we have no requirement from our user mode APIs to provide initialized memory. So if we could skip the initialization it would be a big performance improvement in applications that do lots of memory management.

I was thinking, it should probably be easy to modify my drm_mm_node caching idea to deal with two node sizes: 2MB and 1GB. But having to deal with arbitrary power-of-two sizes in between would make things more challenging and less effective.

Regards,
  Felix

________________________________________
From: Christian König <ckoenig.leichtzumerken@gmail.com>
Sent: Tuesday, September 11, 2018 2:49:44 AM
To: Kuehling, Felix; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two

Yeah well the whole patch set depends on that :)

Otherwise we don't get pages larger than 2MB for the L1 on Vega10.

But another question: Why do you want to clear VRAM on allocation? We
perfectly support allocating VRAM without clearing it.

Regards,
Christian.

On 2018-09-11 at 02:08, Felix Kuehling wrote:
> This looks good. But it complicates something I've been looking at:
> Remembering which process drm_mm_nodes last belonged to, so that they
> don't need to be cleared next time they are allocated by the same
> process. Having most nodes the same size (vram_page_split pages) would
> make this very easy and efficient for the most common cases (large
> allocations without any exotic address limitations or alignment
> requirements).
>
> Does anything else in this patch series depend on this optimization?
>
> Regards,
>    Felix
>
>
> On 2018-09-09 02:03 PM, Christian König wrote:
>> Try to allocate VRAM in power of two sizes and only fallback to vram
>> split sizes if that fails.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 52 +++++++++++++++++++++-------
>>   1 file changed, 40 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> index 9cfa8a9ada92..3f9d5d00c9b3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> @@ -124,6 +124,28 @@ u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo)
>>      return usage;
>>   }
>>
>> +/**
>> + * amdgpu_vram_mgr_virt_start - update virtual start address
>> + *
>> + * @mem: ttm_mem_reg to update
>> + * @node: just allocated node
>> + *
>> + * Calculate a virtual BO start address to easily check if everything is CPU
>> + * accessible.
>> + */
>> +static void amdgpu_vram_mgr_virt_start(struct ttm_mem_reg *mem,
>> +                                   struct drm_mm_node *node)
>> +{
>> +    unsigned long start;
>> +
>> +    start = node->start + node->size;
>> +    if (start > mem->num_pages)
>> +            start -= mem->num_pages;
>> +    else
>> +            start = 0;
>> +    mem->start = max(mem->start, start);
>> +}
>> +
>>   /**
>>    * amdgpu_vram_mgr_new - allocate new ranges
>>    *
>> @@ -176,10 +198,25 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
>>      pages_left = mem->num_pages;
>>
>>      spin_lock(&mgr->lock);
>> -    for (i = 0; i < num_nodes; ++i) {
>> +    for (i = 0; pages_left >= pages_per_node; ++i) {
>> +            unsigned long pages = rounddown_pow_of_two(pages_left);
>> +
>> +            r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
>> +                                            pages_per_node, 0,
>> +                                            place->fpfn, lpfn,
>> +                                            mode);
>> +            if (unlikely(r))
>> +                    break;
>> +
>> +            usage += nodes[i].size << PAGE_SHIFT;
>> +            vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
>> +            amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>> +            pages_left -= pages;
>> +    }
>> +
>> +    for (; pages_left; ++i) {
>>              unsigned long pages = min(pages_left, pages_per_node);
>>              uint32_t alignment = mem->page_alignment;
>> -            unsigned long start;
>>
>>              if (pages == pages_per_node)
>>                      alignment = pages_per_node;
>> @@ -193,16 +230,7 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
>>
>>              usage += nodes[i].size << PAGE_SHIFT;
>>              vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
>> -
>> -            /* Calculate a virtual BO start address to easily check if
>> -             * everything is CPU accessible.
>> -             */
>> -            start = nodes[i].start + nodes[i].size;
>> -            if (start > mem->num_pages)
>> -                    start -= mem->num_pages;
>> -            else
>> -                    start = 0;
>> -            mem->start = max(mem->start, start);
>> +            amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>>              pages_left -= pages;
>>      }
>>      spin_unlock(&mgr->lock);



* Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
       [not found]                 ` <DM5PR12MB17073F8B673AEFC092FBA5B392040-2J9CzHegvk9TCtO+SvGBKwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2018-09-11 14:48                   ` Christian König
       [not found]                     ` <2361b55f-ed56-bd42-b628-fc001d9ea300-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Christian König @ 2018-09-11 14:48 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-09-11 at 16:27, Kuehling, Felix wrote:
>> Otherwise we don't get pages larger than 2MB for the L1 on Vega10.
> Yeah, I realized that after reviewing the rest of the patch series.
>
>> But another question: Why do you want to clear VRAM on allocation? We
>> perfectly support allocating VRAM without clearing it.
> As Michel pointed out, that's a security problem because it can leak data between processes. Therefore all VRAM allocated through KFD APIs is currently cleared on allocation. However, we have no requirement from our user mode APIs to provide initialized memory. So if we could skip the initialization it would be a big performance improvement in applications that do lots of memory management.
>
> I was thinking, it should probably be easy to modify my drm_mm_node caching idea to deal with two node sizes: 2MB and 1GB. But having to deal with arbitrary power-of-two sizes in between would make things more challenging and less effective.

It is rather unlikely that somebody wants to allocate 1GB, but a 32MB 
texture is perfectly possible. Would be a pity if we lose the extra L1 
optimization.

But I don't really see a problem with that, because you could still split 
up the larger allocations on freeing them (keep in mind that we always 
allocate enough nodes for 2MB pages).

In addition to that I would actually not track drm_mm_nodes at all, but 
instead use an array where I note for each 2MB page who the last owner was.

If I'm not completely mistaken, that would only be around 64KB for 16GB of 
VRAM, and it would be much simpler than scanning recently freed drm_mm_nodes.

Regards,
Christian.

>
> Regards,
>    Felix
>
> ________________________________________
> From: Christian König <ckoenig.leichtzumerken@gmail.com>
> Sent: Tuesday, September 11, 2018 2:49:44 AM
> To: Kuehling, Felix; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
>
> Yeah well the whole patch set depends on that :)
>
> Otherwise we don't get pages larger than 2MB for the L1 on Vega10.
>
> But another question: Why do you want to clear VRAM on allocation? We
> perfectly support allocating VRAM without clearing it.
>
> Regards,
> Christian.
>
> Am 11.09.2018 um 02:08 schrieb Felix Kuehling:
>> This looks good. But it complicates something I've been looking at:
>> Remembering which process drm_mm_nodes last belonged to, so that they
>> don't need to be cleared next time they are allocated by the same
>> process. Having most nodes the same size (vram_page_split pages) would
>> make this very easy and efficient for the most common cases (large
>> allocations without any exotic address limitations or alignment
>> requirements).
>>
>> Does anything else in this patch series depend on this optimization?
>>
>> Regards,
>>     Felix
>>
>>
>> On 2018-09-09 02:03 PM, Christian König wrote:
>>> Try to allocate VRAM in power of two sizes and only fallback to vram
>>> split sizes if that fails.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 52 +++++++++++++++++++++-------
>>>    1 file changed, 40 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> index 9cfa8a9ada92..3f9d5d00c9b3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> @@ -124,6 +124,28 @@ u64 amdgpu_vram_mgr_bo_visible_size(struct amdgpu_bo *bo)
>>>       return usage;
>>>    }
>>>
>>> +/**
>>> + * amdgpu_vram_mgr_virt_start - update virtual start address
>>> + *
>>> + * @mem: ttm_mem_reg to update
>>> + * @node: just allocated node
>>> + *
>>> + * Calculate a virtual BO start address to easily check if everything is CPU
>>> + * accessible.
>>> + */
>>> +static void amdgpu_vram_mgr_virt_start(struct ttm_mem_reg *mem,
>>> +                                   struct drm_mm_node *node)
>>> +{
>>> +    unsigned long start;
>>> +
>>> +    start = node->start + node->size;
>>> +    if (start > mem->num_pages)
>>> +            start -= mem->num_pages;
>>> +    else
>>> +            start = 0;
>>> +    mem->start = max(mem->start, start);
>>> +}
>>> +
>>>    /**
>>>     * amdgpu_vram_mgr_new - allocate new ranges
>>>     *
>>> @@ -176,10 +198,25 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
>>>       pages_left = mem->num_pages;
>>>
>>>       spin_lock(&mgr->lock);
>>> -    for (i = 0; i < num_nodes; ++i) {
>>> +    for (i = 0; pages_left >= pages_per_node; ++i) {
>>> +            unsigned long pages = rounddown_pow_of_two(pages_left);
>>> +
>>> +            r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
>>> +                                            pages_per_node, 0,
>>> +                                            place->fpfn, lpfn,
>>> +                                            mode);
>>> +            if (unlikely(r))
>>> +                    break;
>>> +
>>> +            usage += nodes[i].size << PAGE_SHIFT;
>>> +            vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
>>> +            amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>>> +            pages_left -= pages;
>>> +    }
>>> +
>>> +    for (; pages_left; ++i) {
>>>               unsigned long pages = min(pages_left, pages_per_node);
>>>               uint32_t alignment = mem->page_alignment;
>>> -            unsigned long start;
>>>
>>>               if (pages == pages_per_node)
>>>                       alignment = pages_per_node;
>>> @@ -193,16 +230,7 @@ static int amdgpu_vram_mgr_new(struct ttm_mem_type_manager *man,
>>>
>>>               usage += nodes[i].size << PAGE_SHIFT;
>>>               vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
>>> -
>>> -            /* Calculate a virtual BO start address to easily check if
>>> -             * everything is CPU accessible.
>>> -             */
>>> -            start = nodes[i].start + nodes[i].size;
>>> -            if (start > mem->num_pages)
>>> -                    start -= mem->num_pages;
>>> -            else
>>> -                    start = 0;
>>> -            mem->start = max(mem->start, start);
>>> +            amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>>>               pages_left -= pages;
>>>       }
>>>       spin_unlock(&mgr->lock);



* Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two
       [not found]                     ` <2361b55f-ed56-bd42-b628-fc001d9ea300-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-11 22:25                       ` Felix Kuehling
  0 siblings, 0 replies; 30+ messages in thread
From: Felix Kuehling @ 2018-09-11 22:25 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-09-11 10:48 AM, Christian König wrote:
> Am 11.09.2018 um 16:27 schrieb Kuehling, Felix:
>>> Otherwise we don't get pages larger than 2MB for the L1 on Vega10.
>> Yeah, I realized that after reviewing the rest of the patch series.
>>
>>> But another question: Why do you want to clear VRAM on allocation? We
>>> perfectly support allocating VRAM without clearing it.
>> As Michel pointed out, that's a security problem because it can leak
>> data between processes. Therefore all VRAM allocated through KFD APIs
>> is currently cleared on allocation. However, we have no requirement
>> from our user mode APIs to provide initialized memory. So if we could
>> skip the initialization it would be a big performance improvement in
>> applications that do lots of memory management.
>>
>> I was thinking, it should probably be easy to modify my drm_mm_node
>> caching idea to deal with two node sizes: 2MB and 1GB. But having to
>> deal with arbitrary power-of-two sizes in between would make things
>> more challenging and less effective.
>
> It is rather unlikely that somebody wants to allocate 1GB, but a 32MB
> texture is perfectly possible. Would be a pity if we lose the extra
> L1 optimization.

True.

>
> But I don't really see a problem with that, cause you could still
> split up the larger allocations on freeing them (keep in mind that we
> always allocate enough nodes for 2MB pages).
>
> Additional to that I would actually not track drm_mm_nodes at all, but
> instead use an array where I note for each 2MB page who the last owner
> was.

That way you'd know whether a node was previously owned by the same
process. But you're leaving it to chance whether you actually get a
"pre-owned" node. I was also trying to deliberately allocate nodes that
were previously owned by the same process to maximize the effectiveness.

>
> If I'm not completely mistaken, that would only be around 64KB for 16GB of
> VRAM, and much simpler than scanning recently freed drm_mm_nodes.

The nice thing with my plan was that I wouldn't need to scan anything if
there were no placement restrictions: just pick the first node from a
per-process list, and the nodes on the list would all be the same size.

Anyway, I don't want to hold up your change. I'll have to think of a
different way of doing this. And if this can't be done efficiently in
the kernel, the user mode team was even considering using something like
jemalloc. Feel free to submit your change and add my Reviewed-by.

Regards,
  Felix

>
> Regards,
> Christian.
>
>>
>> Regards,
>>    Felix
>>
>> ________________________________________
>> From: Christian König <ckoenig.leichtzumerken@gmail.com>
>> Sent: Tuesday, September 11, 2018 2:49:44 AM
>> To: Kuehling, Felix; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH 01/11] drm/amdgpu: try allocating VRAM as power
>> of two
>>
>> Yeah well the whole patch set depends on that :)
>>
>> Otherwise we don't get pages larger than 2MB for the L1 on Vega10.
>>
>> But another question: Why do you want to clear VRAM on allocation? We
>> perfectly support allocating VRAM without clearing it.
>>
>> Regards,
>> Christian.
>>
>> Am 11.09.2018 um 02:08 schrieb Felix Kuehling:
>>> This looks good. But it complicates something I've been looking at:
>>> Remembering which process drm_mm_nodes last belonged to, so that they
>>> don't need to be cleared next time they are allocated by the same
>>> process. Having most nodes the same size (vram_page_split pages) would
>>> make this very easy and efficient for the most common cases (large
>>> allocations without any exotic address limitations or alignment
>>> requirements).
>>>
>>> Does anything else in this patch series depend on this optimization?
>>>
>>> Regards,
>>>     Felix
>>>
>>>
>>> On 2018-09-09 02:03 PM, Christian König wrote:
>>>> Try to allocate VRAM in power of two sizes and only fallback to vram
>>>> split sizes if that fails.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 52
>>>> +++++++++++++++++++++-------
>>>>    1 file changed, 40 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>> index 9cfa8a9ada92..3f9d5d00c9b3 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>>> @@ -124,6 +124,28 @@ u64 amdgpu_vram_mgr_bo_visible_size(struct
>>>> amdgpu_bo *bo)
>>>>       return usage;
>>>>    }
>>>>
>>>> +/**
>>>> + * amdgpu_vram_mgr_virt_start - update virtual start address
>>>> + *
>>>> + * @mem: ttm_mem_reg to update
>>>> + * @node: just allocated node
>>>> + *
>>>> + * Calculate a virtual BO start address to easily check if
>>>> everything is CPU
>>>> + * accessible.
>>>> + */
>>>> +static void amdgpu_vram_mgr_virt_start(struct ttm_mem_reg *mem,
>>>> +                                   struct drm_mm_node *node)
>>>> +{
>>>> +    unsigned long start;
>>>> +
>>>> +    start = node->start + node->size;
>>>> +    if (start > mem->num_pages)
>>>> +            start -= mem->num_pages;
>>>> +    else
>>>> +            start = 0;
>>>> +    mem->start = max(mem->start, start);
>>>> +}
>>>> +
>>>>    /**
>>>>     * amdgpu_vram_mgr_new - allocate new ranges
>>>>     *
>>>> @@ -176,10 +198,25 @@ static int amdgpu_vram_mgr_new(struct
>>>> ttm_mem_type_manager *man,
>>>>       pages_left = mem->num_pages;
>>>>
>>>>       spin_lock(&mgr->lock);
>>>> -    for (i = 0; i < num_nodes; ++i) {
>>>> +    for (i = 0; pages_left >= pages_per_node; ++i) {
>>>> +            unsigned long pages = rounddown_pow_of_two(pages_left);
>>>> +
>>>> +            r = drm_mm_insert_node_in_range(mm, &nodes[i], pages,
>>>> +                                            pages_per_node, 0,
>>>> +                                            place->fpfn, lpfn,
>>>> +                                            mode);
>>>> +            if (unlikely(r))
>>>> +                    break;
>>>> +
>>>> +            usage += nodes[i].size << PAGE_SHIFT;
>>>> +            vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
>>>> +            amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>>>> +            pages_left -= pages;
>>>> +    }
>>>> +
>>>> +    for (; pages_left; ++i) {
>>>>               unsigned long pages = min(pages_left, pages_per_node);
>>>>               uint32_t alignment = mem->page_alignment;
>>>> -            unsigned long start;
>>>>
>>>>               if (pages == pages_per_node)
>>>>                       alignment = pages_per_node;
>>>> @@ -193,16 +230,7 @@ static int amdgpu_vram_mgr_new(struct
>>>> ttm_mem_type_manager *man,
>>>>
>>>>               usage += nodes[i].size << PAGE_SHIFT;
>>>>               vis_usage += amdgpu_vram_mgr_vis_size(adev, &nodes[i]);
>>>> -
>>>> -            /* Calculate a virtual BO start address to easily
>>>> check if
>>>> -             * everything is CPU accessible.
>>>> -             */
>>>> -            start = nodes[i].start + nodes[i].size;
>>>> -            if (start > mem->num_pages)
>>>> -                    start -= mem->num_pages;
>>>> -            else
>>>> -                    start = 0;
>>>> -            mem->start = max(mem->start, start);
>>>> +            amdgpu_vram_mgr_virt_start(mem, &nodes[i]);
>>>>               pages_left -= pages;
>>>>       }
>>>>       spin_unlock(&mgr->lock);
>



end of thread, other threads:[~2018-09-11 22:25 UTC | newest]

Thread overview: 30+ messages
-- links below jump to the message on this page --
2018-09-09 18:03 Optimize VM handling a bit more Christian König
     [not found] ` <20180909180339.1910-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2018-09-09 18:03   ` [PATCH 01/11] drm/amdgpu: try allocating VRAM as power of two Christian König
     [not found]     ` <20180909180339.1910-2-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2018-09-11  0:08       ` Felix Kuehling
     [not found]         ` <60b9c96a-d31d-cd61-df22-c8414f0166dc-5C7GfCeVMHo@public.gmane.org>
2018-09-11  6:49           ` Christian König
     [not found]             ` <9b61bb4c-4721-c8af-0810-4ecc18ed1ea4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-09-11  7:37               ` Michel Dänzer
     [not found]                 ` <46b7da24-4b04-7daf-e6c3-07e10ecb6fd6-otUistvHUpPR7s880joybQ@public.gmane.org>
2018-09-11  7:46                   ` Christian König
     [not found]                     ` <d0af0309-e4f9-9825-fd30-9cf79ed5f823-5C7GfCeVMHo@public.gmane.org>
2018-09-11  7:55                       ` Michel Dänzer
     [not found]                         ` <5fd331d0-24ab-7540-2d0a-b227230743c0-otUistvHUpPR7s880joybQ@public.gmane.org>
2018-09-11  8:20                           ` Christian König
     [not found]                             ` <a3c91610-e5e2-c666-cfda-344f1f40485c-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-09-11  8:25                               ` Michel Dänzer
2018-09-11 14:27               ` Kuehling, Felix
     [not found]                 ` <DM5PR12MB17073F8B673AEFC092FBA5B392040-2J9CzHegvk9TCtO+SvGBKwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-09-11 14:48                   ` Christian König
     [not found]                     ` <2361b55f-ed56-bd42-b628-fc001d9ea300-5C7GfCeVMHo@public.gmane.org>
2018-09-11 22:25                       ` Felix Kuehling
2018-09-11  9:50       ` Huang Rui
2018-09-11 11:16         ` Christian König
2018-09-09 18:03   ` [PATCH 02/11] drm/amdgpu: add amdgpu_vm_pt_parent helper Christian König
2018-09-09 18:03   ` [PATCH 03/11] drm/amdgpu: add amdgpu_vm_update_func Christian König
2018-09-09 18:03   ` [PATCH 04/11] drm/amdgpu: add some VM PD/PT iterators Christian König
     [not found]     ` <20180909180339.1910-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2018-09-11  1:51       ` Felix Kuehling
2018-09-11  2:38       ` Zhang, Jerry (Junwei)
2018-09-09 18:03   ` [PATCH 05/11] drm/amdgpu: use leaf iterator for allocating PD/PT Christian König
2018-09-09 18:03   ` [PATCH 06/11] drm/amdgpu: use dfs iterator to free PDs/PTs Christian König
2018-09-09 18:03   ` [PATCH 07/11] drm/amdgpu: use the DFS iterator in amdgpu_vm_invalidate_level Christian König
     [not found]     ` <20180909180339.1910-8-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2018-09-11  0:19       ` Felix Kuehling
2018-09-09 18:03   ` [PATCH 08/11] drm/amdgpu: use leaf iterator for filling PTs Christian König
2018-09-09 18:03   ` [PATCH 09/11] drm/amdgpu: meld together VM fragment and huge page handling Christian König
2018-09-09 18:03   ` [PATCH 10/11] drm/amdgpu: use the maximum possible fragment size on Vega/Raven Christian König
     [not found]     ` <20180909180339.1910-11-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2018-09-11  0:45       ` Felix Kuehling
2018-09-09 18:03   ` [PATCH 11/11] drm/amdgpu: allow fragment processing for invalid PTEs Christian König
2018-09-11  2:17   ` Optimize VM handling a bit more Felix Kuehling
2018-09-11  2:39   ` Zhang, Jerry (Junwei)
