* [PATCH 1/6] drm/ttm: add TTM_PL_FLAG_TEMPORARY flag v3
@ 2021-06-22 16:23 ` Andrey Grodzovsky
  0 siblings, 0 replies; 18+ messages in thread
From: Andrey Grodzovsky @ 2021-06-22 16:23 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: ckoenig.leichtzumerken, Lang.Yu, Christian König

From: Lang Yu <Lang.Yu@amd.com>

Sometimes drivers need to use bounce buffers to evict BOs. While those bounce
buffers reside in some domain, they are not necessarily suitable for CS.

Add a flag so that drivers can note that a bounce buffer needs to be
reallocated during validation.

v2: add detailed comments
v3 (chk): merge commits and rework commit message

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c    | 3 +++
 include/drm/ttm/ttm_placement.h | 7 +++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index db53fecca696..45145d02aed2 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -913,6 +913,9 @@ static bool ttm_bo_places_compat(const struct ttm_place *places,
 {
 	unsigned i;
 
+	if (mem->placement & TTM_PL_FLAG_TEMPORARY)
+		return false;
+
 	for (i = 0; i < num_placement; i++) {
 		const struct ttm_place *heap = &places[i];
 
diff --git a/include/drm/ttm/ttm_placement.h b/include/drm/ttm/ttm_placement.h
index aa6ba4d0cf78..8995c9e4ec1b 100644
--- a/include/drm/ttm/ttm_placement.h
+++ b/include/drm/ttm/ttm_placement.h
@@ -47,8 +47,11 @@
  * top of the memory area, instead of the bottom.
  */
 
-#define TTM_PL_FLAG_CONTIGUOUS  (1 << 19)
-#define TTM_PL_FLAG_TOPDOWN     (1 << 22)
+#define TTM_PL_FLAG_CONTIGUOUS  (1 << 0)
+#define TTM_PL_FLAG_TOPDOWN     (1 << 1)
+
+/* For multihop handling */
+#define TTM_PL_FLAG_TEMPORARY   (1 << 2)
 
 /**
  * struct ttm_place
-- 
2.25.1
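
For context, a driver-side move callback is expected to consume the new flag
roughly as sketched below. This is illustrative only: example_bo_move(),
needs_bounce() and do_direct_move() are placeholder names, not part of TTM or
of this patch (patch 2 adds the real amdgpu user).

static int example_bo_move(struct ttm_buffer_object *bo, bool evict,
			   struct ttm_operation_ctx *ctx,
			   struct ttm_resource *new_mem,
			   struct ttm_place *hop)
{
	/* Placeholder check: the move needs an intermediate copy. */
	if (needs_bounce(bo->resource, new_mem)) {
		hop->fpfn = 0;
		hop->lpfn = 0;
		hop->mem_type = TTM_PL_TT;
		/* Mark the hop as temporary so ttm_bo_places_compat()
		 * reports the bounce resource as incompatible and the BO
		 * gets reallocated into a proper domain on validation. */
		hop->flags = TTM_PL_FLAG_TEMPORARY;
		return -EMULTIHOP;
	}
	/* Placeholder for the driver's normal copy path. */
	return do_direct_move(bo, evict, ctx, new_mem);
}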


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/6] drm/amdgpu: use temporary GTT as bounce buffer
  2021-06-22 16:23 ` Andrey Grodzovsky
@ 2021-06-22 16:23   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 18+ messages in thread
From: Andrey Grodzovsky @ 2021-06-22 16:23 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: ckoenig.leichtzumerken, Lang.Yu, Christian König

From: Lang Yu <Lang.Yu@amd.com>

Currently we have a limited GTT memory size and need a bounce buffer
when doing buffer migration between the VRAM and SYSTEM domains.

The problem is that under GTT memory pressure we can't do buffer migration
between the VRAM and SYSTEM domains. But in some cases we really need that,
especially when validating a VRAM backing store BO which resides in the
SYSTEM domain.

v2: still account temporary GTT allocations
v3 (chk): revert to the simpler change for now

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 20 ++++++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c     |  2 +-
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index ec96e0b26b11..b694dc96b336 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -132,14 +132,15 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man,
 	struct amdgpu_gtt_node *node;
 	int r;
 
-	spin_lock(&mgr->lock);
-	if (tbo->resource && tbo->resource->mem_type != TTM_PL_TT &&
-	    atomic64_read(&mgr->available) < num_pages) {
+	if (!(place->flags & TTM_PL_FLAG_TEMPORARY)) {
+		spin_lock(&mgr->lock);
+		if (atomic64_read(&mgr->available) < num_pages) {
+			spin_unlock(&mgr->lock);
+			return -ENOSPC;
+		}
+		atomic64_sub(num_pages, &mgr->available);
 		spin_unlock(&mgr->lock);
-		return -ENOSPC;
 	}
-	atomic64_sub(num_pages, &mgr->available);
-	spin_unlock(&mgr->lock);
 
 	node = kzalloc(struct_size(node, base.mm_nodes, 1), GFP_KERNEL);
 	if (!node) {
@@ -175,7 +176,8 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man,
 	kfree(node);
 
 err_out:
-	atomic64_add(num_pages, &mgr->available);
+	if (!(place->flags & TTM_PL_FLAG_TEMPORARY))
+		atomic64_add(num_pages, &mgr->available);
 
 	return r;
 }
@@ -198,7 +200,9 @@ static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man,
 	if (drm_mm_node_allocated(&node->base.mm_nodes[0]))
 		drm_mm_remove_node(&node->base.mm_nodes[0]);
 	spin_unlock(&mgr->lock);
-	atomic64_add(res->num_pages, &mgr->available);
+	
+	if (!(res->placement & TTM_PL_FLAG_TEMPORARY))
+		atomic64_add(res->num_pages, &mgr->available);
 
 	kfree(node);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 80dff29f2bc7..79f875792b30 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -521,7 +521,7 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict,
 			hop->fpfn = 0;
 			hop->lpfn = 0;
 			hop->mem_type = TTM_PL_TT;
-			hop->flags = 0;
+			hop->flags = TTM_PL_FLAG_TEMPORARY;
 			return -EMULTIHOP;
 		}
 
-- 
2.25.1
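
For reference, the TTM core turns the hop returned by the driver into a
one-entry placement before allocating the bounce resource; this is the
existing ttm_bo_bounce_temp_buffer() logic (shown again in patch 6), and it
is how place->flags ends up carrying TTM_PL_FLAG_TEMPORARY when
amdgpu_gtt_mgr_new() runs:

	struct ttm_placement hop_placement;
	struct ttm_resource *hop_mem;

	hop_placement.num_placement = hop_placement.num_busy_placement = 1;
	hop_placement.placement = hop_placement.busy_placement = hop;

	/* For a TTM_PL_TT hop this lands in amdgpu_gtt_mgr_new(), where the
	 * TEMPORARY flag skips the mgr->available accounting added above. */
	ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx);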


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/6] drm/amdgpu: always allow evicting to SYSTEM domain
  2021-06-22 16:23 ` Andrey Grodzovsky
@ 2021-06-22 16:23   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 18+ messages in thread
From: Andrey Grodzovsky @ 2021-06-22 16:23 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: ckoenig.leichtzumerken, Lang.Yu, Christian König

From: Christian König <christian.koenig@amd.com>

When we run out of GTT we should still be able to evict VRAM->SYSTEM
with a bounce buffer.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 79f875792b30..b46726e47bce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -149,14 +149,16 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 			 * BOs to be evicted from VRAM
 			 */
 			amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM |
-							 AMDGPU_GEM_DOMAIN_GTT);
+							AMDGPU_GEM_DOMAIN_GTT |
+							AMDGPU_GEM_DOMAIN_CPU);
 			abo->placements[0].fpfn = adev->gmc.visible_vram_size >> PAGE_SHIFT;
 			abo->placements[0].lpfn = 0;
 			abo->placement.busy_placement = &abo->placements[1];
 			abo->placement.num_busy_placement = 1;
 		} else {
 			/* Move to GTT memory */
-			amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
+			amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT |
+							AMDGPU_GEM_DOMAIN_CPU);
 		}
 		break;
 	case TTM_PL_TT:
-- 
2.25.1
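
As a rough sketch of the resulting placement list for a CPU-inaccessible
VRAM BO being evicted (assuming amdgpu_bo_placement_from_domain() fills the
requested domains in VRAM, GTT, CPU order):

/*
 *   placements[0]  VRAM, restricted to the invisible part
 *                  (fpfn = visible_vram_size >> PAGE_SHIFT)
 *   placements[1]  GTT
 *   placements[2]  SYSTEM/CPU  (new with this patch)
 *
 *   busy_placement -> &placements[1], num_busy_placement = 1,
 *   i.e. only GTT is tried when memory is contended.
 */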


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/6] drm/amdgpu: switch gtt_mgr to counting used pages
  2021-06-22 16:23 ` Andrey Grodzovsky
@ 2021-06-22 16:23   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 18+ messages in thread
From: Andrey Grodzovsky @ 2021-06-22 16:23 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: ckoenig.leichtzumerken, Lang.Yu, Christian König

From: Lang Yu <Lang.Yu@amd.com>

Change mgr->available into mgr->used (invert the value).

It makes more sense to do it this way since we no longer need the spinlock
to double-check the accounting.

v3 (chk): separated from the TEMPORARY flag change.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 26 ++++++++-------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h     |  2 +-
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index b694dc96b336..495dd3bd4f1c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -132,14 +132,10 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man,
 	struct amdgpu_gtt_node *node;
 	int r;
 
-	if (!(place->flags & TTM_PL_FLAG_TEMPORARY)) {
-		spin_lock(&mgr->lock);
-		if (atomic64_read(&mgr->available) < num_pages) {
-			spin_unlock(&mgr->lock);
-			return -ENOSPC;
-		}
-		atomic64_sub(num_pages, &mgr->available);
-		spin_unlock(&mgr->lock);
+	if (!(place->flags & TTM_PL_FLAG_TEMPORARY) &&
+	    atomic64_add_return(num_pages, &mgr->used) >  man->size) {
+		atomic64_sub(num_pages, &mgr->used);
+		return -ENOSPC;
 	}
 
 	node = kzalloc(struct_size(node, base.mm_nodes, 1), GFP_KERNEL);
@@ -177,7 +173,7 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man,
 
 err_out:
 	if (!(place->flags & TTM_PL_FLAG_TEMPORARY))
-		atomic64_add(num_pages, &mgr->available);
+		atomic64_sub(num_pages, &mgr->used);
 
 	return r;
 }
@@ -202,7 +198,7 @@ static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man,
 	spin_unlock(&mgr->lock);
 	
 	if (!(res->placement & TTM_PL_FLAG_TEMPORARY))
-		atomic64_add(res->num_pages, &mgr->available);
+		atomic64_sub(res->num_pages, &mgr->used);
 
 	kfree(node);
 }
@@ -217,9 +213,8 @@ static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man,
 uint64_t amdgpu_gtt_mgr_usage(struct ttm_resource_manager *man)
 {
 	struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man);
-	s64 result = man->size - atomic64_read(&mgr->available);
 
-	return (result > 0 ? result : 0) * PAGE_SIZE;
+	return atomic64_read(&mgr->used) * PAGE_SIZE;
 }
 
 /**
@@ -269,9 +264,8 @@ static void amdgpu_gtt_mgr_debug(struct ttm_resource_manager *man,
 	drm_mm_print(&mgr->mm, printer);
 	spin_unlock(&mgr->lock);
 
-	drm_printf(printer, "man size:%llu pages, gtt available:%lld pages, usage:%lluMB\n",
-		   man->size, (u64)atomic64_read(&mgr->available),
-		   amdgpu_gtt_mgr_usage(man) >> 20);
+	drm_printf(printer, "man size:%llu pages,  gtt used:%llu pages\n",
+		   man->size, atomic64_read(&mgr->used));
 }
 
 static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func = {
@@ -303,7 +297,7 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t gtt_size)
 	size = (adev->gmc.gart_size >> PAGE_SHIFT) - start;
 	drm_mm_init(&mgr->mm, start, size);
 	spin_lock_init(&mgr->lock);
-	atomic64_set(&mgr->available, gtt_size >> PAGE_SHIFT);
+	atomic64_set(&mgr->used, 0);
 
 	ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_TT, &mgr->manager);
 	ttm_resource_manager_set_used(man, true);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index e69f3e8e06e5..3205fd520060 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -52,7 +52,7 @@ struct amdgpu_gtt_mgr {
 	struct ttm_resource_manager manager;
 	struct drm_mm mm;
 	spinlock_t lock;
-	atomic64_t available;
+	atomic64_t used;
 };
 
 struct amdgpu_preempt_mgr {
-- 
2.25.1
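
A short note on the pattern used above, since the commit message only hints
at it: the counter is bumped optimistically, compared against the limit, and
rolled back on failure. mgr->used can overshoot man->size only transiently,
between a failing caller's add and its rollback, while every caller that
keeps its pages saw the post-add total within the limit, so accounted usage
never settles above man->size. A minimal sketch (the helper name is
illustrative, not part of the patch):

static int gtt_reserve_pages(struct amdgpu_gtt_mgr *mgr,
			     struct ttm_resource_manager *man,
			     u64 num_pages)
{
	/* Optimistically account the pages... */
	if (atomic64_add_return(num_pages, &mgr->used) > man->size) {
		/* ...and roll back if that pushed us over the limit. */
		atomic64_sub(num_pages, &mgr->used);
		return -ENOSPC;
	}
	return 0;
}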


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/6] drm/amdgpu: Fix BUG_ON assert
  2021-06-22 16:23 ` Andrey Grodzovsky
@ 2021-06-22 16:23   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 18+ messages in thread
From: Andrey Grodzovsky @ 2021-06-22 16:23 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: ckoenig.leichtzumerken, Lang.Yu

With the CPU domain added to the placement you can now have 3 placements
at once, i.e. c can legitimately reach AMDGPU_BO_MAX_PLACEMENTS, so the
assert must only trigger when that count is exceeded.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b7a2070d90af..81268eded073 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -180,7 +180,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 		c++;
 	}
 
-	BUG_ON(c >= AMDGPU_BO_MAX_PLACEMENTS);
+	BUG_ON(c > AMDGPU_BO_MAX_PLACEMENTS);
 
 	placement->num_placement = c;
 	placement->placement = places;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/6] drm/ttm: Fix multihop assert on eviction.
  2021-06-22 16:23 ` Andrey Grodzovsky
@ 2021-06-22 16:23   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 18+ messages in thread
From: Andrey Grodzovsky @ 2021-06-22 16:23 UTC (permalink / raw)
  To: dri-devel, amd-gfx; +Cc: ckoenig.leichtzumerken, Lang.Yu

Problem:
Under memory pressure, when the GTT domain is almost full, the multihop
assert triggers when trying to evict an LRU BO from VRAM to SYSTEM.

Fix:
Don't assert on the multihop error in the evict code but rather retry,
as we do in ttm_bo_move_buffer().

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 63 +++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 45145d02aed2..5a2dc712c632 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -485,6 +485,31 @@ void ttm_bo_unlock_delayed_workqueue(struct ttm_device *bdev, int resched)
 }
 EXPORT_SYMBOL(ttm_bo_unlock_delayed_workqueue);
 
+static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,
+				     struct ttm_resource **mem,
+				     struct ttm_operation_ctx *ctx,
+				     struct ttm_place *hop)
+{
+	struct ttm_placement hop_placement;
+	struct ttm_resource *hop_mem;
+	int ret;
+
+	hop_placement.num_placement = hop_placement.num_busy_placement = 1;
+	hop_placement.placement = hop_placement.busy_placement = hop;
+
+	/* find space in the bounce domain */
+	ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx);
+	if (ret)
+		return ret;
+	/* move to the bounce domain */
+	ret = ttm_bo_handle_move_mem(bo, hop_mem, false, ctx, NULL);
+	if (ret) {
+		ttm_resource_free(bo, &hop_mem);
+		return ret;
+	}
+	return 0;
+}
+
 static int ttm_bo_evict(struct ttm_buffer_object *bo,
 			struct ttm_operation_ctx *ctx)
 {
@@ -524,12 +549,17 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 		goto out;
 	}
 
+bounce:
 	ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
-	if (unlikely(ret)) {
-		WARN(ret == -EMULTIHOP, "Unexpected multihop in eviction - likely driver bug\n");
-		if (ret != -ERESTARTSYS)
+	if (ret == -EMULTIHOP) {
+		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
+		if (ret) {
 			pr_err("Buffer eviction failed\n");
-		ttm_resource_free(bo, &evict_mem);
+			ttm_resource_free(bo, &evict_mem);
+			goto out;
+		}
+		/* try and move to final place now. */
+		goto bounce;
 	}
 out:
 	return ret;
@@ -844,31 +874,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
 }
 EXPORT_SYMBOL(ttm_bo_mem_space);
 
-static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,
-				     struct ttm_resource **mem,
-				     struct ttm_operation_ctx *ctx,
-				     struct ttm_place *hop)
-{
-	struct ttm_placement hop_placement;
-	struct ttm_resource *hop_mem;
-	int ret;
-
-	hop_placement.num_placement = hop_placement.num_busy_placement = 1;
-	hop_placement.placement = hop_placement.busy_placement = hop;
-
-	/* find space in the bounce domain */
-	ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx);
-	if (ret)
-		return ret;
-	/* move to the bounce domain */
-	ret = ttm_bo_handle_move_mem(bo, hop_mem, false, ctx, NULL);
-	if (ret) {
-		ttm_resource_free(bo, &hop_mem);
-		return ret;
-	}
-	return 0;
-}
-
 static int ttm_bo_move_buffer(struct ttm_buffer_object *bo,
 			      struct ttm_placement *placement,
 			      struct ttm_operation_ctx *ctx)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/6] drm/ttm: Fix multihop assert on eviction.
  2021-06-22 16:23   ` Andrey Grodzovsky
@ 2021-06-23  8:28     ` Christian König
  -1 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2021-06-23  8:28 UTC (permalink / raw)
  To: Andrey Grodzovsky, dri-devel, amd-gfx; +Cc: Lang.Yu

Am 22.06.21 um 18:23 schrieb Andrey Grodzovsky:
> Problem:
> Under memory pressure when GTT domain is almost full multihop assert
> will come up when trying to evict LRU BO from VRAM to SYSTEM.
>
> Fix:
> Don't assert on multihop error in evict code but rather do a retry
> as we do in ttm_bo_move_buffer
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

But I think you need to move this patch earlier in the series, otherwise
you break amdgpu eviction in between.

Christian.

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 63 +++++++++++++++++++-----------------
>   1 file changed, 34 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 45145d02aed2..5a2dc712c632 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -485,6 +485,31 @@ void ttm_bo_unlock_delayed_workqueue(struct ttm_device *bdev, int resched)
>   }
>   EXPORT_SYMBOL(ttm_bo_unlock_delayed_workqueue);
>   
> +static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,
> +				     struct ttm_resource **mem,
> +				     struct ttm_operation_ctx *ctx,
> +				     struct ttm_place *hop)
> +{
> +	struct ttm_placement hop_placement;
> +	struct ttm_resource *hop_mem;
> +	int ret;
> +
> +	hop_placement.num_placement = hop_placement.num_busy_placement = 1;
> +	hop_placement.placement = hop_placement.busy_placement = hop;
> +
> +	/* find space in the bounce domain */
> +	ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx);
> +	if (ret)
> +		return ret;
> +	/* move to the bounce domain */
> +	ret = ttm_bo_handle_move_mem(bo, hop_mem, false, ctx, NULL);
> +	if (ret) {
> +		ttm_resource_free(bo, &hop_mem);
> +		return ret;
> +	}
> +	return 0;
> +}
> +
>   static int ttm_bo_evict(struct ttm_buffer_object *bo,
>   			struct ttm_operation_ctx *ctx)
>   {
> @@ -524,12 +549,17 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
>   		goto out;
>   	}
>   
> +bounce:
>   	ret = ttm_bo_handle_move_mem(bo, evict_mem, true, ctx, &hop);
> -	if (unlikely(ret)) {
> -		WARN(ret == -EMULTIHOP, "Unexpected multihop in eviction - likely driver bug\n");
> -		if (ret != -ERESTARTSYS)
> +	if (ret == -EMULTIHOP) {
> +		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
> +		if (ret) {
>   			pr_err("Buffer eviction failed\n");
> -		ttm_resource_free(bo, &evict_mem);
> +			ttm_resource_free(bo, &evict_mem);
> +			goto out;
> +		}
> +		/* try and move to final place now. */
> +		goto bounce;
>   	}
>   out:
>   	return ret;
> @@ -844,31 +874,6 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
>   }
>   EXPORT_SYMBOL(ttm_bo_mem_space);
>   
> -static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,
> -				     struct ttm_resource **mem,
> -				     struct ttm_operation_ctx *ctx,
> -				     struct ttm_place *hop)
> -{
> -	struct ttm_placement hop_placement;
> -	struct ttm_resource *hop_mem;
> -	int ret;
> -
> -	hop_placement.num_placement = hop_placement.num_busy_placement = 1;
> -	hop_placement.placement = hop_placement.busy_placement = hop;
> -
> -	/* find space in the bounce domain */
> -	ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx);
> -	if (ret)
> -		return ret;
> -	/* move to the bounce domain */
> -	ret = ttm_bo_handle_move_mem(bo, hop_mem, false, ctx, NULL);
> -	if (ret) {
> -		ttm_resource_free(bo, &hop_mem);
> -		return ret;
> -	}
> -	return 0;
> -}
> -
>   static int ttm_bo_move_buffer(struct ttm_buffer_object *bo,
>   			      struct ttm_placement *placement,
>   			      struct ttm_operation_ctx *ctx)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] drm/amdgpu: Fix BUG_ON assert
  2021-06-22 16:23   ` Andrey Grodzovsky
@ 2021-06-23  8:30     ` Christian König
  -1 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2021-06-23  8:30 UTC (permalink / raw)
  To: Andrey Grodzovsky, dri-devel, amd-gfx; +Cc: Lang.Yu



Am 22.06.21 um 18:23 schrieb Andrey Grodzovsky:
> With added CPU domain to placement you can have
> now 3 placemnts at once.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

And please add CC: stable@kernel.org since this is triggerable from
userspace and actually a rather nasty bug.

Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index b7a2070d90af..81268eded073 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -180,7 +180,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
>   		c++;
>   	}
>   
> -	BUG_ON(c >= AMDGPU_BO_MAX_PLACEMENTS);
> +	BUG_ON(c > AMDGPU_BO_MAX_PLACEMENTS);
>   
>   	placement->num_placement = c;
>   	placement->placement = places;


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/6] drm/ttm: add TTM_PL_FLAG_TEMPORARY flag v3
  2021-06-22 16:23 ` Andrey Grodzovsky
                   ` (5 preceding siblings ...)
  (?)
@ 2021-06-23 18:18 ` Das, Nirmoy
  2021-06-23 19:10   ` Andrey Grodzovsky
  -1 siblings, 1 reply; 18+ messages in thread
From: Das, Nirmoy @ 2021-06-23 18:18 UTC (permalink / raw)
  To: dri-devel

Tried on vmwgfx and amdgpu, everything looks fine. I would have loved to
run kfdtest as well, since I think kfdtest does BO movement tests, but it
seems kfdtest is regressing even before this series. Trying to debug that.


The series is Acked-by: Nirmoy Das <nirmoy.das@amd.com>


On 6/22/2021 6:23 PM, Andrey Grodzovsky wrote:
> From: Lang Yu <Lang.Yu@amd.com>
>
> Sometimes drivers need to use bounce buffers to evict BOs. While those reside
> in some domain they are not necessarily suitable for CS.
>
> Add a flag so that drivers can note that a bounce buffers needs to be
> reallocated during validation.
>
> v2: add detailed comments
> v3 (chk): merge commits and rework commit message
>
> Suggested-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Lang Yu <Lang.Yu@amd.com>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c    | 3 +++
>   include/drm/ttm/ttm_placement.h | 7 +++++--
>   2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index db53fecca696..45145d02aed2 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -913,6 +913,9 @@ static bool ttm_bo_places_compat(const struct ttm_place *places,
>   {
>   	unsigned i;
>   
> +	if (mem->placement & TTM_PL_FLAG_TEMPORARY)
> +		return false;
> +
>   	for (i = 0; i < num_placement; i++) {
>   		const struct ttm_place *heap = &places[i];
>   
> diff --git a/include/drm/ttm/ttm_placement.h b/include/drm/ttm/ttm_placement.h
> index aa6ba4d0cf78..8995c9e4ec1b 100644
> --- a/include/drm/ttm/ttm_placement.h
> +++ b/include/drm/ttm/ttm_placement.h
> @@ -47,8 +47,11 @@
>    * top of the memory area, instead of the bottom.
>    */
>   
> -#define TTM_PL_FLAG_CONTIGUOUS  (1 << 19)
> -#define TTM_PL_FLAG_TOPDOWN     (1 << 22)
> +#define TTM_PL_FLAG_CONTIGUOUS  (1 << 0)
> +#define TTM_PL_FLAG_TOPDOWN     (1 << 1)
> +
> +/* For multihop handling */
> +#define TTM_PL_FLAG_TEMPORARY   (1 << 2)
>   
>   /**
>    * struct ttm_place

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/6] drm/ttm: add TTM_PL_FLAG_TEMPORARY flag v3
  2021-06-23 18:18 ` [PATCH 1/6] drm/ttm: add TTM_PL_FLAG_TEMPORARY flag v3 Das, Nirmoy
@ 2021-06-23 19:10   ` Andrey Grodzovsky
  0 siblings, 0 replies; 18+ messages in thread
From: Andrey Grodzovsky @ 2021-06-23 19:10 UTC (permalink / raw)
  To: Das, Nirmoy, dri-devel; +Cc: Olsak, Marek, Christian.Koenig

Thanks Nirmoy.

Pushed to drm-misc-next.

Andrey

On 2021-06-23 2:18 p.m., Das, Nirmoy wrote:
> Tried on vmwgfx and amdgpu, everything looks fine. I would have love 
> to do a kfdtest as
>
> I think kfdtest does bo movement tests but it seems kfdtest regressing 
> even before this series. Trying to debug that.
>
>
> The series is Acked-by: Nirmoy Das <nirmoy.das@amd.com>
>
>
> On 6/22/2021 6:23 PM, Andrey Grodzovsky wrote:
>> From: Lang Yu <Lang.Yu@amd.com>
>>
>> Sometimes drivers need to use bounce buffers to evict BOs. While 
>> those reside
>> in some domain they are not necessarily suitable for CS.
>>
>> Add a flag so that drivers can note that a bounce buffers needs to be
>> reallocated during validation.
>>
>> v2: add detailed comments
>> v3 (chk): merge commits and rework commit message
>>
>> Suggested-by: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Lang Yu <Lang.Yu@amd.com>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo.c    | 3 +++
>>   include/drm/ttm/ttm_placement.h | 7 +++++--
>>   2 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index db53fecca696..45145d02aed2 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -913,6 +913,9 @@ static bool ttm_bo_places_compat(const struct 
>> ttm_place *places,
>>   {
>>       unsigned i;
>>   +    if (mem->placement & TTM_PL_FLAG_TEMPORARY)
>> +        return false;
>> +
>>       for (i = 0; i < num_placement; i++) {
>>           const struct ttm_place *heap = &places[i];
>>   diff --git a/include/drm/ttm/ttm_placement.h 
>> b/include/drm/ttm/ttm_placement.h
>> index aa6ba4d0cf78..8995c9e4ec1b 100644
>> --- a/include/drm/ttm/ttm_placement.h
>> +++ b/include/drm/ttm/ttm_placement.h
>> @@ -47,8 +47,11 @@
>>    * top of the memory area, instead of the bottom.
>>    */
>>   -#define TTM_PL_FLAG_CONTIGUOUS  (1 << 19)
>> -#define TTM_PL_FLAG_TOPDOWN     (1 << 22)
>> +#define TTM_PL_FLAG_CONTIGUOUS  (1 << 0)
>> +#define TTM_PL_FLAG_TOPDOWN     (1 << 1)
>> +
>> +/* For multihop handling */
>> +#define TTM_PL_FLAG_TEMPORARY   (1 << 2)
>>     /**
>>    * struct ttm_place

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-06-23 19:10 UTC | newest]

Thread overview: 18+ messages
2021-06-22 16:23 [PATCH 1/6] drm/ttm: add TTM_PL_FLAG_TEMPORARY flag v3 Andrey Grodzovsky
2021-06-22 16:23 ` Andrey Grodzovsky
2021-06-22 16:23 ` [PATCH 2/6] drm/amdgpu: use temporary GTT as bounce buffer Andrey Grodzovsky
2021-06-22 16:23   ` Andrey Grodzovsky
2021-06-22 16:23 ` [PATCH 3/6] drm/amdgpu: always allow evicting to SYSTEM domain Andrey Grodzovsky
2021-06-22 16:23   ` Andrey Grodzovsky
2021-06-22 16:23 ` [PATCH 4/6] drm/amdgpu: switch gtt_mgr to counting used pages Andrey Grodzovsky
2021-06-22 16:23   ` Andrey Grodzovsky
2021-06-22 16:23 ` [PATCH 5/6] drm/amdgpu: Fix BUG_ON assert Andrey Grodzovsky
2021-06-22 16:23   ` Andrey Grodzovsky
2021-06-23  8:30   ` Christian König
2021-06-23  8:30     ` Christian König
2021-06-22 16:23 ` [PATCH 6/6] drm/ttm: Fix multihop assert on eviction Andrey Grodzovsky
2021-06-22 16:23   ` Andrey Grodzovsky
2021-06-23  8:28   ` Christian König
2021-06-23  8:28     ` Christian König
2021-06-23 18:18 ` [PATCH 1/6] drm/ttm: add TTM_PL_FLAG_TEMPORARY flag v3 Das, Nirmoy
2021-06-23 19:10   ` Andrey Grodzovsky
