All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/5] drm/amdgpu: stop pipelining VM PDs/PTs moves
@ 2018-09-14 13:42 Christian König
       [not found] ` <20180914134257.2196-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2018-09-14 13:42 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

We are going to need this for recoverable page fault handling and it
makes shadow handling during GPU reset much more easier.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e6909252aefa..e6e5e5e50c98 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1360,7 +1360,7 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
 {
 	WARN_ON_ONCE(bo->tbo.mem.mem_type == TTM_PL_SYSTEM);
 	WARN_ON_ONCE(!ww_mutex_is_locked(&bo->tbo.resv->lock) &&
-		     !bo->pin_count);
+		     !bo->pin_count && bo->tbo.type != ttm_bo_type_kernel);
 	WARN_ON_ONCE(bo->tbo.mem.start == AMDGPU_BO_INVALID_OFFSET);
 	WARN_ON_ONCE(bo->tbo.mem.mem_type == TTM_PL_VRAM &&
 		     !(bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS));
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8a158ee922f7..9e7991b1c8ff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -524,7 +524,11 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
 	if (r)
 		goto error;
 
-	r = ttm_bo_pipeline_move(bo, fence, evict, new_mem);
+	/* Always block for VM page tables before committing the new location */
+	if (bo->type == ttm_bo_type_kernel)
+		r = ttm_bo_move_accel_cleanup(bo, fence, true, new_mem);
+	else
+		r = ttm_bo_pipeline_move(bo, fence, evict, new_mem);
 	dma_fence_put(fence);
 	return r;
 
-- 
2.14.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/5] drm/amdgpu: always enable shadow BOs v2
       [not found] ` <20180914134257.2196-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-14 13:42   ` Christian König
  2018-09-14 13:42   ` [PATCH 3/5] drm/amdgpu: shadow BOs don't need any alignment Christian König
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Christian König @ 2018-09-14 13:42 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Even when GPU recovery is disabled we could run into a manually
triggered recovery.

v2: keep accidental removed comments

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Emily Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e6e5e5e50c98..d8e8d653d518 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -51,18 +51,6 @@
  *
  */
 
-static bool amdgpu_bo_need_backup(struct amdgpu_device *adev)
-{
-	if (adev->flags & AMD_IS_APU)
-		return false;
-
-	if (amdgpu_gpu_recovery == 0 ||
-	    (amdgpu_gpu_recovery == -1  && !amdgpu_sriov_vf(adev)))
-		return false;
-
-	return true;
-}
-
 /**
  * amdgpu_bo_subtract_pin_size - Remove BO from pin_size accounting
  *
@@ -593,7 +581,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 	if (r)
 		return r;
 
-	if ((flags & AMDGPU_GEM_CREATE_SHADOW) && amdgpu_bo_need_backup(adev)) {
+	if ((flags & AMDGPU_GEM_CREATE_SHADOW) && !(adev->flags & AMD_IS_APU)) {
 		if (!bp->resv)
 			WARN_ON(reservation_object_lock((*bo_ptr)->tbo.resv,
 							NULL));
-- 
2.14.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/5] drm/amdgpu: shadow BOs don't need any alignment
       [not found] ` <20180914134257.2196-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2018-09-14 13:42   ` [PATCH 2/5] drm/amdgpu: always enable shadow BOs v2 Christian König
@ 2018-09-14 13:42   ` Christian König
  2018-09-14 13:42   ` [PATCH 4/5] drm/amdgpu: always recover VRAM during GPU recovery Christian König
  2018-09-14 13:42   ` [PATCH 5/5] drm/amdgpu: fix shadow BO restoring Christian König
  3 siblings, 0 replies; 12+ messages in thread
From: Christian König @ 2018-09-14 13:42 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

They aren't directly used by the hardware.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index d8e8d653d518..650c45c896f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -526,7 +526,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 }
 
 static int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
-				   unsigned long size, int byte_align,
+				   unsigned long size,
 				   struct amdgpu_bo *bo)
 {
 	struct amdgpu_bo_param bp;
@@ -537,7 +537,6 @@ static int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
 
 	memset(&bp, 0, sizeof(bp));
 	bp.size = size;
-	bp.byte_align = byte_align;
 	bp.domain = AMDGPU_GEM_DOMAIN_GTT;
 	bp.flags = AMDGPU_GEM_CREATE_CPU_GTT_USWC |
 		AMDGPU_GEM_CREATE_SHADOW;
@@ -586,7 +585,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 			WARN_ON(reservation_object_lock((*bo_ptr)->tbo.resv,
 							NULL));
 
-		r = amdgpu_bo_create_shadow(adev, bp->size, bp->byte_align, (*bo_ptr));
+		r = amdgpu_bo_create_shadow(adev, bp->size, *bo_ptr);
 
 		if (!bp->resv)
 			reservation_object_unlock((*bo_ptr)->tbo.resv);
-- 
2.14.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/5] drm/amdgpu: always recover VRAM during GPU recovery
       [not found] ` <20180914134257.2196-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  2018-09-14 13:42   ` [PATCH 2/5] drm/amdgpu: always enable shadow BOs v2 Christian König
  2018-09-14 13:42   ` [PATCH 3/5] drm/amdgpu: shadow BOs don't need any alignment Christian König
@ 2018-09-14 13:42   ` Christian König
  2018-09-14 13:42   ` [PATCH 5/5] drm/amdgpu: fix shadow BO restoring Christian König
  3 siblings, 0 replies; 12+ messages in thread
From: Christian König @ 2018-09-14 13:42 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

It shouldn't add much overhead and we should make sure that critical
VRAM content is always restored.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 762dc5f886cd..899342c6dfad 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3003,7 +3003,7 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
 }
 
 /**
- * amdgpu_device_handle_vram_lost - Handle the loss of VRAM contents
+ * amdgpu_device_recover_vram - Recover some VRAM contents
  *
  * @adev: amdgpu_device pointer
  *
@@ -3012,7 +3012,7 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
  * the contents of VRAM might be lost.
  * Returns 0 on success, 1 on failure.
  */
-static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev)
+static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 {
 	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
 	struct amdgpu_bo *bo, *tmp;
@@ -3139,8 +3139,8 @@ static int amdgpu_device_reset(struct amdgpu_device *adev)
 		}
 	}
 
-	if (!r && ((need_full_reset && !(adev->flags & AMD_IS_APU)) || vram_lost))
-		r = amdgpu_device_handle_vram_lost(adev);
+	if (!r)
+		r = amdgpu_device_recover_vram(adev);
 
 	return r;
 }
@@ -3186,7 +3186,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
 	amdgpu_virt_release_full_gpu(adev, true);
 	if (!r && adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
 		atomic_inc(&adev->vram_lost_counter);
-		r = amdgpu_device_handle_vram_lost(adev);
+		r = amdgpu_device_recover_vram(adev);
 	}
 
 	return r;
-- 
2.14.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/5] drm/amdgpu: fix shadow BO restoring
       [not found] ` <20180914134257.2196-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
                     ` (2 preceding siblings ...)
  2018-09-14 13:42   ` [PATCH 4/5] drm/amdgpu: always recover VRAM during GPU recovery Christian König
@ 2018-09-14 13:42   ` Christian König
       [not found]     ` <20180914134257.2196-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  3 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2018-09-14 13:42 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Don't grab the reservation lock any more and simplify the handling quite
a bit.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 ++++++++---------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  46 ++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   8 +--
 3 files changed, 43 insertions(+), 120 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 899342c6dfad..1cbc372964f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2954,54 +2954,6 @@ static int amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev)
 	return 0;
 }
 
-/**
- * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM buffers
- *
- * @adev: amdgpu_device pointer
- * @ring: amdgpu_ring for the engine handling the buffer operations
- * @bo: amdgpu_bo buffer whose shadow is being restored
- * @fence: dma_fence associated with the operation
- *
- * Restores the VRAM buffer contents from the shadow in GTT.  Used to
- * restore things like GPUVM page tables after a GPU reset where
- * the contents of VRAM might be lost.
- * Returns 0 on success, negative error code on failure.
- */
-static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
-						  struct amdgpu_ring *ring,
-						  struct amdgpu_bo *bo,
-						  struct dma_fence **fence)
-{
-	uint32_t domain;
-	int r;
-
-	if (!bo->shadow)
-		return 0;
-
-	r = amdgpu_bo_reserve(bo, true);
-	if (r)
-		return r;
-	domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
-	/* if bo has been evicted, then no need to recover */
-	if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
-		r = amdgpu_bo_validate(bo->shadow);
-		if (r) {
-			DRM_ERROR("bo validate failed!\n");
-			goto err;
-		}
-
-		r = amdgpu_bo_restore_from_shadow(adev, ring, bo,
-						 NULL, fence, true);
-		if (r) {
-			DRM_ERROR("recover page table failed!\n");
-			goto err;
-		}
-	}
-err:
-	amdgpu_bo_unreserve(bo);
-	return r;
-}
-
 /**
  * amdgpu_device_recover_vram - Recover some VRAM contents
  *
@@ -3010,16 +2962,15 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
  * Restores the contents of VRAM buffers from the shadows in GTT.  Used to
  * restore things like GPUVM page tables after a GPU reset where
  * the contents of VRAM might be lost.
- * Returns 0 on success, 1 on failure.
+ *
+ * Returns:
+ * 0 on success, negative error code on failure.
  */
 static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 {
-	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
-	struct amdgpu_bo *bo, *tmp;
 	struct dma_fence *fence = NULL, *next = NULL;
-	long r = 1;
-	int i = 0;
-	long tmo;
+	struct amdgpu_bo *shadow;
+	long r = 1, tmo;
 
 	if (amdgpu_sriov_runtime(adev))
 		tmo = msecs_to_jiffies(8000);
@@ -3028,44 +2979,40 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 
 	DRM_INFO("recover vram bo from shadow start\n");
 	mutex_lock(&adev->shadow_list_lock);
-	list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
-		next = NULL;
-		amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
+	list_for_each_entry(shadow, &adev->shadow_list, shadow_list) {
+
+		/* No need to recover an evicted BO */
+		if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
+		    shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)
+			continue;
+
+		r = amdgpu_bo_restore_shadow(shadow, &next);
+		if (r)
+			break;
+
 		if (fence) {
 			r = dma_fence_wait_timeout(fence, false, tmo);
-			if (r == 0)
-				pr_err("wait fence %p[%d] timeout\n", fence, i);
-			else if (r < 0)
-				pr_err("wait fence %p[%d] interrupted\n", fence, i);
-			if (r < 1) {
-				dma_fence_put(fence);
-				fence = next;
+			dma_fence_put(fence);
+			fence = next;
+			if (r <= 0)
 				break;
-			}
-			i++;
+		} else {
+			fence = next;
 		}
-
-		dma_fence_put(fence);
-		fence = next;
 	}
 	mutex_unlock(&adev->shadow_list_lock);
 
-	if (fence) {
-		r = dma_fence_wait_timeout(fence, false, tmo);
-		if (r == 0)
-			pr_err("wait fence %p[%d] timeout\n", fence, i);
-		else if (r < 0)
-			pr_err("wait fence %p[%d] interrupted\n", fence, i);
-
-	}
+	if (fence)
+		tmo = dma_fence_wait_timeout(fence, false, tmo);
 	dma_fence_put(fence);
 
-	if (r > 0)
-		DRM_INFO("recover vram bo from shadow done\n");
-	else
+	if (r <= 0 || tmo <= 0) {
 		DRM_ERROR("recover vram bo from shadow failed\n");
+		return -EIO;
+	}
 
-	return (r > 0) ? 0 : 1;
+	DRM_INFO("recover vram bo from shadow done\n");
+	return 0;
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 650c45c896f0..ca8a0b7a5ac3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -547,7 +547,7 @@ static int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
 	if (!r) {
 		bo->shadow->parent = amdgpu_bo_ref(bo);
 		mutex_lock(&adev->shadow_list_lock);
-		list_add_tail(&bo->shadow_list, &adev->shadow_list);
+		list_add_tail(&bo->shadow->shadow_list, &adev->shadow_list);
 		mutex_unlock(&adev->shadow_list_lock);
 	}
 
@@ -679,13 +679,10 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
 }
 
 /**
- * amdgpu_bo_restore_from_shadow - restore an &amdgpu_bo buffer object
- * @adev: amdgpu device object
- * @ring: amdgpu_ring for the engine handling the buffer operations
- * @bo: &amdgpu_bo buffer to be restored
- * @resv: reservation object with embedded fence
+ * amdgpu_bo_restore_shadow - restore an &amdgpu_bo shadow
+ *
+ * @shadow: &amdgpu_bo shadow to be restored
  * @fence: dma_fence associated with the operation
- * @direct: whether to submit the job directly
  *
  * Copies a buffer object's shadow content back to the object.
  * This is used for recovering a buffer from its shadow in case of a gpu
@@ -694,36 +691,19 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
  * Returns:
  * 0 for success or a negative error code on failure.
  */
-int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
-				  struct amdgpu_ring *ring,
-				  struct amdgpu_bo *bo,
-				  struct reservation_object *resv,
-				  struct dma_fence **fence,
-				  bool direct)
+int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, struct dma_fence **fence)
 
 {
-	struct amdgpu_bo *shadow = bo->shadow;
-	uint64_t bo_addr, shadow_addr;
-	int r;
-
-	if (!shadow)
-		return -EINVAL;
-
-	bo_addr = amdgpu_bo_gpu_offset(bo);
-	shadow_addr = amdgpu_bo_gpu_offset(bo->shadow);
-
-	r = reservation_object_reserve_shared(bo->tbo.resv);
-	if (r)
-		goto err;
+	struct amdgpu_device *adev = amdgpu_ttm_adev(shadow->tbo.bdev);
+	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+	uint64_t shadow_addr, parent_addr;
 
-	r = amdgpu_copy_buffer(ring, shadow_addr, bo_addr,
-			       amdgpu_bo_size(bo), resv, fence,
-			       direct, false);
-	if (!r)
-		amdgpu_bo_fence(bo, *fence, true);
+	shadow_addr = amdgpu_bo_gpu_offset(shadow);
+	parent_addr = amdgpu_bo_gpu_offset(shadow->parent);
 
-err:
-	return r;
+	return amdgpu_copy_buffer(ring, shadow_addr, parent_addr,
+				  amdgpu_bo_size(shadow), NULL, fence,
+				  true, false);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 64337ff2ad63..7d3312d0da11 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -273,12 +273,8 @@ int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev,
 			       struct reservation_object *resv,
 			       struct dma_fence **fence, bool direct);
 int amdgpu_bo_validate(struct amdgpu_bo *bo);
-int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
-				  struct amdgpu_ring *ring,
-				  struct amdgpu_bo *bo,
-				  struct reservation_object *resv,
-				  struct dma_fence **fence,
-				  bool direct);
+int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow,
+			     struct dma_fence **fence);
 uint32_t amdgpu_bo_get_preferred_pin_domain(struct amdgpu_device *adev,
 					    uint32_t domain);
 
-- 
2.14.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] drm/amdgpu: fix shadow BO restoring
       [not found]     ` <20180914134257.2196-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-17  9:10       ` Huang Rui
  2018-09-17 11:42         ` Christian König
  0 siblings, 1 reply; 12+ messages in thread
From: Huang Rui @ 2018-09-17  9:10 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, Sep 14, 2018 at 03:42:57PM +0200, Christian König wrote:
> Don't grab the reservation lock any more and simplify the handling quite
> a bit.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 ++++++++---------------------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  46 ++++--------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   8 +--
>  3 files changed, 43 insertions(+), 120 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 899342c6dfad..1cbc372964f8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2954,54 +2954,6 @@ static int amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev)
>  	return 0;
>  }
>  
> -/**
> - * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM buffers
> - *
> - * @adev: amdgpu_device pointer
> - * @ring: amdgpu_ring for the engine handling the buffer operations
> - * @bo: amdgpu_bo buffer whose shadow is being restored
> - * @fence: dma_fence associated with the operation
> - *
> - * Restores the VRAM buffer contents from the shadow in GTT.  Used to
> - * restore things like GPUVM page tables after a GPU reset where
> - * the contents of VRAM might be lost.
> - * Returns 0 on success, negative error code on failure.
> - */
> -static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
> -						  struct amdgpu_ring *ring,
> -						  struct amdgpu_bo *bo,
> -						  struct dma_fence **fence)
> -{
> -	uint32_t domain;
> -	int r;
> -
> -	if (!bo->shadow)
> -		return 0;
> -
> -	r = amdgpu_bo_reserve(bo, true);
> -	if (r)
> -		return r;
> -	domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
> -	/* if bo has been evicted, then no need to recover */
> -	if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
> -		r = amdgpu_bo_validate(bo->shadow);
> -		if (r) {
> -			DRM_ERROR("bo validate failed!\n");
> -			goto err;
> -		}
> -
> -		r = amdgpu_bo_restore_from_shadow(adev, ring, bo,
> -						 NULL, fence, true);
> -		if (r) {
> -			DRM_ERROR("recover page table failed!\n");
> -			goto err;
> -		}
> -	}
> -err:
> -	amdgpu_bo_unreserve(bo);
> -	return r;
> -}
> -
>  /**
>   * amdgpu_device_recover_vram - Recover some VRAM contents
>   *
> @@ -3010,16 +2962,15 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
>   * Restores the contents of VRAM buffers from the shadows in GTT.  Used to
>   * restore things like GPUVM page tables after a GPU reset where
>   * the contents of VRAM might be lost.
> - * Returns 0 on success, 1 on failure.
> + *
> + * Returns:
> + * 0 on success, negative error code on failure.
>   */
>  static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>  {
> -	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> -	struct amdgpu_bo *bo, *tmp;
>  	struct dma_fence *fence = NULL, *next = NULL;
> -	long r = 1;
> -	int i = 0;
> -	long tmo;
> +	struct amdgpu_bo *shadow;
> +	long r = 1, tmo;
>  
>  	if (amdgpu_sriov_runtime(adev))
>  		tmo = msecs_to_jiffies(8000);
> @@ -3028,44 +2979,40 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>  
>  	DRM_INFO("recover vram bo from shadow start\n");
>  	mutex_lock(&adev->shadow_list_lock);
> -	list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
> -		next = NULL;
> -		amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
> +	list_for_each_entry(shadow, &adev->shadow_list, shadow_list) {
> +
> +		/* No need to recover an evicted BO */
> +		if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
> +		    shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)
> +			continue;
> +
> +		r = amdgpu_bo_restore_shadow(shadow, &next);
> +		if (r)
> +			break;
> +
>  		if (fence) {
>  			r = dma_fence_wait_timeout(fence, false, tmo);
> -			if (r == 0)
> -				pr_err("wait fence %p[%d] timeout\n", fence, i);
> -			else if (r < 0)
> -				pr_err("wait fence %p[%d] interrupted\n", fence, i);
> -			if (r < 1) {
> -				dma_fence_put(fence);
> -				fence = next;
> +			dma_fence_put(fence);
> +			fence = next;
> +			if (r <= 0)
>  				break;
> -			}
> -			i++;
> +		} else {
> +			fence = next;
>  		}
> -
> -		dma_fence_put(fence);
> -		fence = next;
>  	}
>  	mutex_unlock(&adev->shadow_list_lock);
>  
> -	if (fence) {
> -		r = dma_fence_wait_timeout(fence, false, tmo);
> -		if (r == 0)
> -			pr_err("wait fence %p[%d] timeout\n", fence, i);
> -		else if (r < 0)
> -			pr_err("wait fence %p[%d] interrupted\n", fence, i);
> -
> -	}
> +	if (fence)
> +		tmo = dma_fence_wait_timeout(fence, false, tmo);
>  	dma_fence_put(fence);
>  
> -	if (r > 0)
> -		DRM_INFO("recover vram bo from shadow done\n");
> -	else
> +	if (r <= 0 || tmo <= 0) {
>  		DRM_ERROR("recover vram bo from shadow failed\n");
> +		return -EIO;
> +	}
>  
> -	return (r > 0) ? 0 : 1;
> +	DRM_INFO("recover vram bo from shadow done\n");
> +	return 0;
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 650c45c896f0..ca8a0b7a5ac3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -547,7 +547,7 @@ static int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
>  	if (!r) {
>  		bo->shadow->parent = amdgpu_bo_ref(bo);
>  		mutex_lock(&adev->shadow_list_lock);
> -		list_add_tail(&bo->shadow_list, &adev->shadow_list);
> +		list_add_tail(&bo->shadow->shadow_list, &adev->shadow_list);
>  		mutex_unlock(&adev->shadow_list_lock);
>  	}
>  
> @@ -679,13 +679,10 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
>  }
>  
>  /**
> - * amdgpu_bo_restore_from_shadow - restore an &amdgpu_bo buffer object
> - * @adev: amdgpu device object
> - * @ring: amdgpu_ring for the engine handling the buffer operations
> - * @bo: &amdgpu_bo buffer to be restored
> - * @resv: reservation object with embedded fence
> + * amdgpu_bo_restore_shadow - restore an &amdgpu_bo shadow
> + *
> + * @shadow: &amdgpu_bo shadow to be restored
>   * @fence: dma_fence associated with the operation
> - * @direct: whether to submit the job directly
>   *
>   * Copies a buffer object's shadow content back to the object.
>   * This is used for recovering a buffer from its shadow in case of a gpu
> @@ -694,36 +691,19 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
>   * Returns:
>   * 0 for success or a negative error code on failure.
>   */
> -int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
> -				  struct amdgpu_ring *ring,
> -				  struct amdgpu_bo *bo,
> -				  struct reservation_object *resv,
> -				  struct dma_fence **fence,
> -				  bool direct)
> +int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, struct dma_fence **fence)
>  
>  {
> -	struct amdgpu_bo *shadow = bo->shadow;
> -	uint64_t bo_addr, shadow_addr;
> -	int r;
> -
> -	if (!shadow)
> -		return -EINVAL;
> -
> -	bo_addr = amdgpu_bo_gpu_offset(bo);
> -	shadow_addr = amdgpu_bo_gpu_offset(bo->shadow);
> -
> -	r = reservation_object_reserve_shared(bo->tbo.resv);
> -	if (r)
> -		goto err;
> +	struct amdgpu_device *adev = amdgpu_ttm_adev(shadow->tbo.bdev);
> +	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> +	uint64_t shadow_addr, parent_addr;
>  

May I know why won't need the reservation_object_reserve_shared() here? Is
it because we only copy buffer from shadow parent bo instead of orignal bo?

Thanks,
Ray

> -	r = amdgpu_copy_buffer(ring, shadow_addr, bo_addr,
> -			       amdgpu_bo_size(bo), resv, fence,
> -			       direct, false);
> -	if (!r)
> -		amdgpu_bo_fence(bo, *fence, true);
> +	shadow_addr = amdgpu_bo_gpu_offset(shadow);
> +	parent_addr = amdgpu_bo_gpu_offset(shadow->parent);
>  
> -err:
> -	return r;
> +	return amdgpu_copy_buffer(ring, shadow_addr, parent_addr,
> +				  amdgpu_bo_size(shadow), NULL, fence,
> +				  true, false);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> index 64337ff2ad63..7d3312d0da11 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> @@ -273,12 +273,8 @@ int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev,
>  			       struct reservation_object *resv,
>  			       struct dma_fence **fence, bool direct);
>  int amdgpu_bo_validate(struct amdgpu_bo *bo);
> -int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
> -				  struct amdgpu_ring *ring,
> -				  struct amdgpu_bo *bo,
> -				  struct reservation_object *resv,
> -				  struct dma_fence **fence,
> -				  bool direct);
> +int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow,
> +			     struct dma_fence **fence);
>  uint32_t amdgpu_bo_get_preferred_pin_domain(struct amdgpu_device *adev,
>  					    uint32_t domain);
>  
> -- 
> 2.14.1
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] drm/amdgpu: fix shadow BO restoring
  2018-09-17  9:10       ` Huang Rui
@ 2018-09-17 11:42         ` Christian König
       [not found]           ` <e1c78397-4a41-0f74-2049-5c63a279630b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2018-09-17 11:42 UTC (permalink / raw)
  To: Huang Rui; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 17.09.2018 um 11:10 schrieb Huang Rui:
> On Fri, Sep 14, 2018 at 03:42:57PM +0200, Christian König wrote:
>> Don't grab the reservation lock any more and simplify the handling quite
>> a bit.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 ++++++++---------------------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  46 ++++--------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   8 +--
>>   3 files changed, 43 insertions(+), 120 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 899342c6dfad..1cbc372964f8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2954,54 +2954,6 @@ static int amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev)
>>   	return 0;
>>   }
>>   
>> -/**
>> - * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM buffers
>> - *
>> - * @adev: amdgpu_device pointer
>> - * @ring: amdgpu_ring for the engine handling the buffer operations
>> - * @bo: amdgpu_bo buffer whose shadow is being restored
>> - * @fence: dma_fence associated with the operation
>> - *
>> - * Restores the VRAM buffer contents from the shadow in GTT.  Used to
>> - * restore things like GPUVM page tables after a GPU reset where
>> - * the contents of VRAM might be lost.
>> - * Returns 0 on success, negative error code on failure.
>> - */
>> -static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
>> -						  struct amdgpu_ring *ring,
>> -						  struct amdgpu_bo *bo,
>> -						  struct dma_fence **fence)
>> -{
>> -	uint32_t domain;
>> -	int r;
>> -
>> -	if (!bo->shadow)
>> -		return 0;
>> -
>> -	r = amdgpu_bo_reserve(bo, true);
>> -	if (r)
>> -		return r;
>> -	domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
>> -	/* if bo has been evicted, then no need to recover */
>> -	if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
>> -		r = amdgpu_bo_validate(bo->shadow);
>> -		if (r) {
>> -			DRM_ERROR("bo validate failed!\n");
>> -			goto err;
>> -		}
>> -
>> -		r = amdgpu_bo_restore_from_shadow(adev, ring, bo,
>> -						 NULL, fence, true);
>> -		if (r) {
>> -			DRM_ERROR("recover page table failed!\n");
>> -			goto err;
>> -		}
>> -	}
>> -err:
>> -	amdgpu_bo_unreserve(bo);
>> -	return r;
>> -}
>> -
>>   /**
>>    * amdgpu_device_recover_vram - Recover some VRAM contents
>>    *
>> @@ -3010,16 +2962,15 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
>>    * Restores the contents of VRAM buffers from the shadows in GTT.  Used to
>>    * restore things like GPUVM page tables after a GPU reset where
>>    * the contents of VRAM might be lost.
>> - * Returns 0 on success, 1 on failure.
>> + *
>> + * Returns:
>> + * 0 on success, negative error code on failure.
>>    */
>>   static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>>   {
>> -	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>> -	struct amdgpu_bo *bo, *tmp;
>>   	struct dma_fence *fence = NULL, *next = NULL;
>> -	long r = 1;
>> -	int i = 0;
>> -	long tmo;
>> +	struct amdgpu_bo *shadow;
>> +	long r = 1, tmo;
>>   
>>   	if (amdgpu_sriov_runtime(adev))
>>   		tmo = msecs_to_jiffies(8000);
>> @@ -3028,44 +2979,40 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>>   
>>   	DRM_INFO("recover vram bo from shadow start\n");
>>   	mutex_lock(&adev->shadow_list_lock);
>> -	list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
>> -		next = NULL;
>> -		amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
>> +	list_for_each_entry(shadow, &adev->shadow_list, shadow_list) {
>> +
>> +		/* No need to recover an evicted BO */
>> +		if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
>> +		    shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)
>> +			continue;
>> +
>> +		r = amdgpu_bo_restore_shadow(shadow, &next);
>> +		if (r)
>> +			break;
>> +
>>   		if (fence) {
>>   			r = dma_fence_wait_timeout(fence, false, tmo);
>> -			if (r == 0)
>> -				pr_err("wait fence %p[%d] timeout\n", fence, i);
>> -			else if (r < 0)
>> -				pr_err("wait fence %p[%d] interrupted\n", fence, i);
>> -			if (r < 1) {
>> -				dma_fence_put(fence);
>> -				fence = next;
>> +			dma_fence_put(fence);
>> +			fence = next;
>> +			if (r <= 0)
>>   				break;
>> -			}
>> -			i++;
>> +		} else {
>> +			fence = next;
>>   		}
>> -
>> -		dma_fence_put(fence);
>> -		fence = next;
>>   	}
>>   	mutex_unlock(&adev->shadow_list_lock);
>>   
>> -	if (fence) {
>> -		r = dma_fence_wait_timeout(fence, false, tmo);
>> -		if (r == 0)
>> -			pr_err("wait fence %p[%d] timeout\n", fence, i);
>> -		else if (r < 0)
>> -			pr_err("wait fence %p[%d] interrupted\n", fence, i);
>> -
>> -	}
>> +	if (fence)
>> +		tmo = dma_fence_wait_timeout(fence, false, tmo);
>>   	dma_fence_put(fence);
>>   
>> -	if (r > 0)
>> -		DRM_INFO("recover vram bo from shadow done\n");
>> -	else
>> +	if (r <= 0 || tmo <= 0) {
>>   		DRM_ERROR("recover vram bo from shadow failed\n");
>> +		return -EIO;
>> +	}
>>   
>> -	return (r > 0) ? 0 : 1;
>> +	DRM_INFO("recover vram bo from shadow done\n");
>> +	return 0;
>>   }
>>   
>>   /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index 650c45c896f0..ca8a0b7a5ac3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -547,7 +547,7 @@ static int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
>>   	if (!r) {
>>   		bo->shadow->parent = amdgpu_bo_ref(bo);
>>   		mutex_lock(&adev->shadow_list_lock);
>> -		list_add_tail(&bo->shadow_list, &adev->shadow_list);
>> +		list_add_tail(&bo->shadow->shadow_list, &adev->shadow_list);
>>   		mutex_unlock(&adev->shadow_list_lock);
>>   	}
>>   
>> @@ -679,13 +679,10 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
>>   }
>>   
>>   /**
>> - * amdgpu_bo_restore_from_shadow - restore an &amdgpu_bo buffer object
>> - * @adev: amdgpu device object
>> - * @ring: amdgpu_ring for the engine handling the buffer operations
>> - * @bo: &amdgpu_bo buffer to be restored
>> - * @resv: reservation object with embedded fence
>> + * amdgpu_bo_restore_shadow - restore an &amdgpu_bo shadow
>> + *
>> + * @shadow: &amdgpu_bo shadow to be restored
>>    * @fence: dma_fence associated with the operation
>> - * @direct: whether to submit the job directly
>>    *
>>    * Copies a buffer object's shadow content back to the object.
>>    * This is used for recovering a buffer from its shadow in case of a gpu
>> @@ -694,36 +691,19 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
>>    * Returns:
>>    * 0 for success or a negative error code on failure.
>>    */
>> -int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
>> -				  struct amdgpu_ring *ring,
>> -				  struct amdgpu_bo *bo,
>> -				  struct reservation_object *resv,
>> -				  struct dma_fence **fence,
>> -				  bool direct)
>> +int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, struct dma_fence **fence)
>>   
>>   {
>> -	struct amdgpu_bo *shadow = bo->shadow;
>> -	uint64_t bo_addr, shadow_addr;
>> -	int r;
>> -
>> -	if (!shadow)
>> -		return -EINVAL;
>> -
>> -	bo_addr = amdgpu_bo_gpu_offset(bo);
>> -	shadow_addr = amdgpu_bo_gpu_offset(bo->shadow);
>> -
>> -	r = reservation_object_reserve_shared(bo->tbo.resv);
>> -	if (r)
>> -		goto err;
>> +	struct amdgpu_device *adev = amdgpu_ttm_adev(shadow->tbo.bdev);
>> +	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>> +	uint64_t shadow_addr, parent_addr;
>>   
> May I know why won't need the reservation_object_reserve_shared() here? Is
> it because we only copy buffer from shadow parent bo instead of orignal bo?

The scheduler and delayed free thread are disabled and we wait for the 
copy to finish before allowing any eviction to proceed.

Adding the BO as shared fence to the BO could actually be harmful 
because TTM might already want to destroy it.

Regards,
Christian.

>
> Thanks,
> Ray
>
>> -	r = amdgpu_copy_buffer(ring, shadow_addr, bo_addr,
>> -			       amdgpu_bo_size(bo), resv, fence,
>> -			       direct, false);
>> -	if (!r)
>> -		amdgpu_bo_fence(bo, *fence, true);
>> +	shadow_addr = amdgpu_bo_gpu_offset(shadow);
>> +	parent_addr = amdgpu_bo_gpu_offset(shadow->parent);
>>   
>> -err:
>> -	return r;
>> +	return amdgpu_copy_buffer(ring, shadow_addr, parent_addr,
>> +				  amdgpu_bo_size(shadow), NULL, fence,
>> +				  true, false);
>>   }
>>   
>>   /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> index 64337ff2ad63..7d3312d0da11 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> @@ -273,12 +273,8 @@ int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev,
>>   			       struct reservation_object *resv,
>>   			       struct dma_fence **fence, bool direct);
>>   int amdgpu_bo_validate(struct amdgpu_bo *bo);
>> -int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
>> -				  struct amdgpu_ring *ring,
>> -				  struct amdgpu_bo *bo,
>> -				  struct reservation_object *resv,
>> -				  struct dma_fence **fence,
>> -				  bool direct);
>> +int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow,
>> +			     struct dma_fence **fence);
>>   uint32_t amdgpu_bo_get_preferred_pin_domain(struct amdgpu_device *adev,
>>   					    uint32_t domain);
>>   
>> -- 
>> 2.14.1
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] drm/amdgpu: fix shadow BO restoring
       [not found]           ` <e1c78397-4a41-0f74-2049-5c63a279630b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-09-18  8:40             ` Huang Rui
  0 siblings, 0 replies; 12+ messages in thread
From: Huang Rui @ 2018-09-18  8:40 UTC (permalink / raw)
  To: Koenig, Christian; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Mon, Sep 17, 2018 at 07:42:38PM +0800, Christian König wrote:
> Am 17.09.2018 um 11:10 schrieb Huang Rui:
> > On Fri, Sep 14, 2018 at 03:42:57PM +0200, Christian König wrote:
> >> Don't grab the reservation lock any more and simplify the handling quite
> >> a bit.
> >>
> >> Signed-off-by: Christian König <christian.koenig@amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 ++++++++---------------------
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  46 ++++--------
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   8 +--
> >>   3 files changed, 43 insertions(+), 120 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index 899342c6dfad..1cbc372964f8 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -2954,54 +2954,6 @@ static int amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev)
> >>   	return 0;
> >>   }
> >>   
> >> -/**
> >> - * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM buffers
> >> - *
> >> - * @adev: amdgpu_device pointer
> >> - * @ring: amdgpu_ring for the engine handling the buffer operations
> >> - * @bo: amdgpu_bo buffer whose shadow is being restored
> >> - * @fence: dma_fence associated with the operation
> >> - *
> >> - * Restores the VRAM buffer contents from the shadow in GTT.  Used to
> >> - * restore things like GPUVM page tables after a GPU reset where
> >> - * the contents of VRAM might be lost.
> >> - * Returns 0 on success, negative error code on failure.
> >> - */
> >> -static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
> >> -						  struct amdgpu_ring *ring,
> >> -						  struct amdgpu_bo *bo,
> >> -						  struct dma_fence **fence)
> >> -{
> >> -	uint32_t domain;
> >> -	int r;
> >> -
> >> -	if (!bo->shadow)
> >> -		return 0;
> >> -
> >> -	r = amdgpu_bo_reserve(bo, true);
> >> -	if (r)
> >> -		return r;
> >> -	domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
> >> -	/* if bo has been evicted, then no need to recover */
> >> -	if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
> >> -		r = amdgpu_bo_validate(bo->shadow);
> >> -		if (r) {
> >> -			DRM_ERROR("bo validate failed!\n");
> >> -			goto err;
> >> -		}
> >> -
> >> -		r = amdgpu_bo_restore_from_shadow(adev, ring, bo,
> >> -						 NULL, fence, true);
> >> -		if (r) {
> >> -			DRM_ERROR("recover page table failed!\n");
> >> -			goto err;
> >> -		}
> >> -	}
> >> -err:
> >> -	amdgpu_bo_unreserve(bo);
> >> -	return r;
> >> -}
> >> -
> >>   /**
> >>    * amdgpu_device_recover_vram - Recover some VRAM contents
> >>    *
> >> @@ -3010,16 +2962,15 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
> >>    * Restores the contents of VRAM buffers from the shadows in GTT.  Used to
> >>    * restore things like GPUVM page tables after a GPU reset where
> >>    * the contents of VRAM might be lost.
> >> - * Returns 0 on success, 1 on failure.
> >> + *
> >> + * Returns:
> >> + * 0 on success, negative error code on failure.
> >>    */
> >>   static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
> >>   {
> >> -	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> >> -	struct amdgpu_bo *bo, *tmp;
> >>   	struct dma_fence *fence = NULL, *next = NULL;
> >> -	long r = 1;
> >> -	int i = 0;
> >> -	long tmo;
> >> +	struct amdgpu_bo *shadow;
> >> +	long r = 1, tmo;
> >>   
> >>   	if (amdgpu_sriov_runtime(adev))
> >>   		tmo = msecs_to_jiffies(8000);
> >> @@ -3028,44 +2979,40 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
> >>   
> >>   	DRM_INFO("recover vram bo from shadow start\n");
> >>   	mutex_lock(&adev->shadow_list_lock);
> >> -	list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
> >> -		next = NULL;
> >> -		amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
> >> +	list_for_each_entry(shadow, &adev->shadow_list, shadow_list) {
> >> +
> >> +		/* No need to recover an evicted BO */
> >> +		if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
> >> +		    shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)
> >> +			continue;
> >> +
> >> +		r = amdgpu_bo_restore_shadow(shadow, &next);
> >> +		if (r)
> >> +			break;
> >> +
> >>   		if (fence) {
> >>   			r = dma_fence_wait_timeout(fence, false, tmo);
> >> -			if (r == 0)
> >> -				pr_err("wait fence %p[%d] timeout\n", fence, i);
> >> -			else if (r < 0)
> >> -				pr_err("wait fence %p[%d] interrupted\n", fence, i);
> >> -			if (r < 1) {
> >> -				dma_fence_put(fence);
> >> -				fence = next;
> >> +			dma_fence_put(fence);
> >> +			fence = next;
> >> +			if (r <= 0)
> >>   				break;
> >> -			}
> >> -			i++;
> >> +		} else {
> >> +			fence = next;
> >>   		}
> >> -
> >> -		dma_fence_put(fence);
> >> -		fence = next;
> >>   	}
> >>   	mutex_unlock(&adev->shadow_list_lock);
> >>   
> >> -	if (fence) {
> >> -		r = dma_fence_wait_timeout(fence, false, tmo);
> >> -		if (r == 0)
> >> -			pr_err("wait fence %p[%d] timeout\n", fence, i);
> >> -		else if (r < 0)
> >> -			pr_err("wait fence %p[%d] interrupted\n", fence, i);
> >> -
> >> -	}
> >> +	if (fence)
> >> +		tmo = dma_fence_wait_timeout(fence, false, tmo);
> >>   	dma_fence_put(fence);
> >>   
> >> -	if (r > 0)
> >> -		DRM_INFO("recover vram bo from shadow done\n");
> >> -	else
> >> +	if (r <= 0 || tmo <= 0) {
> >>   		DRM_ERROR("recover vram bo from shadow failed\n");
> >> +		return -EIO;
> >> +	}
> >>   
> >> -	return (r > 0) ? 0 : 1;
> >> +	DRM_INFO("recover vram bo from shadow done\n");
> >> +	return 0;
> >>   }
> >>   
> >>   /**
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >> index 650c45c896f0..ca8a0b7a5ac3 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >> @@ -547,7 +547,7 @@ static int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
> >>   	if (!r) {
> >>   		bo->shadow->parent = amdgpu_bo_ref(bo);
> >>   		mutex_lock(&adev->shadow_list_lock);
> >> -		list_add_tail(&bo->shadow_list, &adev->shadow_list);
> >> +		list_add_tail(&bo->shadow->shadow_list, &adev->shadow_list);
> >>   		mutex_unlock(&adev->shadow_list_lock);
> >>   	}
> >>   
> >> @@ -679,13 +679,10 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
> >>   }
> >>   
> >>   /**
> >> - * amdgpu_bo_restore_from_shadow - restore an &amdgpu_bo buffer object
> >> - * @adev: amdgpu device object
> >> - * @ring: amdgpu_ring for the engine handling the buffer operations
> >> - * @bo: &amdgpu_bo buffer to be restored
> >> - * @resv: reservation object with embedded fence
> >> + * amdgpu_bo_restore_shadow - restore an &amdgpu_bo shadow
> >> + *
> >> + * @shadow: &amdgpu_bo shadow to be restored
> >>    * @fence: dma_fence associated with the operation
> >> - * @direct: whether to submit the job directly
> >>    *
> >>    * Copies a buffer object's shadow content back to the object.
> >>    * This is used for recovering a buffer from its shadow in case of a gpu
> >> @@ -694,36 +691,19 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
> >>    * Returns:
> >>    * 0 for success or a negative error code on failure.
> >>    */
> >> -int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
> >> -				  struct amdgpu_ring *ring,
> >> -				  struct amdgpu_bo *bo,
> >> -				  struct reservation_object *resv,
> >> -				  struct dma_fence **fence,
> >> -				  bool direct)
> >> +int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, struct dma_fence **fence)
> >>   
> >>   {
> >> -	struct amdgpu_bo *shadow = bo->shadow;
> >> -	uint64_t bo_addr, shadow_addr;
> >> -	int r;
> >> -
> >> -	if (!shadow)
> >> -		return -EINVAL;
> >> -
> >> -	bo_addr = amdgpu_bo_gpu_offset(bo);
> >> -	shadow_addr = amdgpu_bo_gpu_offset(bo->shadow);
> >> -
> >> -	r = reservation_object_reserve_shared(bo->tbo.resv);
> >> -	if (r)
> >> -		goto err;
> >> +	struct amdgpu_device *adev = amdgpu_ttm_adev(shadow->tbo.bdev);
> >> +	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> >> +	uint64_t shadow_addr, parent_addr;
> >>   
> > May I know why won't need the reservation_object_reserve_shared() here? Is
> > it because we only copy buffer from shadow parent bo instead of orignal bo?
> 
> The scheduler and delayed free thread are disabled and we wait for the 
> copy to finish before allowing any eviction to proceed.
> 
> Adding the BO as shared fence to the BO could actually be harmful 
> because TTM might already want to destroy it.
> 

Thanks, patch is Reviewed-by: Huang Rui <ray.huang@amd.com>

> Regards,
> Christian.
> 
> >
> > Thanks,
> > Ray
> >
> >> -	r = amdgpu_copy_buffer(ring, shadow_addr, bo_addr,
> >> -			       amdgpu_bo_size(bo), resv, fence,
> >> -			       direct, false);
> >> -	if (!r)
> >> -		amdgpu_bo_fence(bo, *fence, true);
> >> +	shadow_addr = amdgpu_bo_gpu_offset(shadow);
> >> +	parent_addr = amdgpu_bo_gpu_offset(shadow->parent);
> >>   
> >> -err:
> >> -	return r;
> >> +	return amdgpu_copy_buffer(ring, shadow_addr, parent_addr,
> >> +				  amdgpu_bo_size(shadow), NULL, fence,
> >> +				  true, false);
> >>   }
> >>   
> >>   /**
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> >> index 64337ff2ad63..7d3312d0da11 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> >> @@ -273,12 +273,8 @@ int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev,
> >>   			       struct reservation_object *resv,
> >>   			       struct dma_fence **fence, bool direct);
> >>   int amdgpu_bo_validate(struct amdgpu_bo *bo);
> >> -int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
> >> -				  struct amdgpu_ring *ring,
> >> -				  struct amdgpu_bo *bo,
> >> -				  struct reservation_object *resv,
> >> -				  struct dma_fence **fence,
> >> -				  bool direct);
> >> +int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow,
> >> +			     struct dma_fence **fence);
> >>   uint32_t amdgpu_bo_get_preferred_pin_domain(struct amdgpu_device *adev,
> >>   					    uint32_t domain);
> >>   
> >> -- 
> >> 2.14.1
> >>
> >> _______________________________________________
> >> amd-gfx mailing list
> >> amd-gfx@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> 
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] drm/amdgpu: fix shadow BO restoring
       [not found]             ` <fa8fb53f-ff84-fb45-59c7-1d52ec57fba7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-09-18  6:15               ` Zhang, Jerry(Junwei)
  0 siblings, 0 replies; 12+ messages in thread
From: Zhang, Jerry(Junwei) @ 2018-09-18  6:15 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 09/14/2018 07:54 PM, Christian König wrote:
> Am 13.09.2018 um 11:29 schrieb Zhang, Jerry(Junwei):
>> On 09/11/2018 05:56 PM, Christian König wrote:
>>> Don't grab the reservation lock any more and simplify the handling 
>>> quite
>>> a bit.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 
>>> ++++++++---------------------
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  46 ++++--------
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   8 +--
>>>   3 files changed, 43 insertions(+), 120 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 5eba66ecf668..20bb702f5c7f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -2940,54 +2940,6 @@ static int 
>>> amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev)
>>>       return 0;
>>>   }
>>>   -/**
>>> - * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM 
>>> buffers
>>> - *
>>> - * @adev: amdgpu_device pointer
>>> - * @ring: amdgpu_ring for the engine handling the buffer operations
>>> - * @bo: amdgpu_bo buffer whose shadow is being restored
>>> - * @fence: dma_fence associated with the operation
>>> - *
>>> - * Restores the VRAM buffer contents from the shadow in GTT. Used to
>>> - * restore things like GPUVM page tables after a GPU reset where
>>> - * the contents of VRAM might be lost.
>>> - * Returns 0 on success, negative error code on failure.
>>> - */
>>> -static int amdgpu_device_recover_vram_from_shadow(struct 
>>> amdgpu_device *adev,
>>> -                          struct amdgpu_ring *ring,
>>> -                          struct amdgpu_bo *bo,
>>> -                          struct dma_fence **fence)
>>> -{
>>> -    uint32_t domain;
>>> -    int r;
>>> -
>>> -    if (!bo->shadow)
>>> -        return 0;
>>> -
>>> -    r = amdgpu_bo_reserve(bo, true);
>>> -    if (r)
>>> -        return r;
>>> -    domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
>>> -    /* if bo has been evicted, then no need to recover */
>>> -    if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
>>> -        r = amdgpu_bo_validate(bo->shadow);
>>> -        if (r) {
>>> -            DRM_ERROR("bo validate failed!\n");
>>> -            goto err;
>>> -        }
>>> -
>>> -        r = amdgpu_bo_restore_from_shadow(adev, ring, bo,
>>> -                         NULL, fence, true);
>>> -        if (r) {
>>> -            DRM_ERROR("recover page table failed!\n");
>>> -            goto err;
>>> -        }
>>> -    }
>>> -err:
>>> -    amdgpu_bo_unreserve(bo);
>>> -    return r;
>>> -}
>>> -
>>>   /**
>>>    * amdgpu_device_recover_vram - Recover some VRAM contents
>>>    *
>>> @@ -2996,16 +2948,15 @@ static int 
>>> amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
>>>    * Restores the contents of VRAM buffers from the shadows in GTT.  
>>> Used to
>>>    * restore things like GPUVM page tables after a GPU reset where
>>>    * the contents of VRAM might be lost.
>>> - * Returns 0 on success, 1 on failure.
>>> + *
>>> + * Returns:
>>> + * 0 on success, negative error code on failure.
>>>    */
>>>   static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>>>   {
>>> -    struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>> -    struct amdgpu_bo *bo, *tmp;
>>>       struct dma_fence *fence = NULL, *next = NULL;
>>> -    long r = 1;
>>> -    int i = 0;
>>> -    long tmo;
>>> +    struct amdgpu_bo *shadow;
>>> +    long r = 1, tmo;
>>>         if (amdgpu_sriov_runtime(adev))
>>>           tmo = msecs_to_jiffies(8000);
>>> @@ -3014,44 +2965,40 @@ static int amdgpu_device_recover_vram(struct 
>>> amdgpu_device *adev)
>>>         DRM_INFO("recover vram bo from shadow start\n");
>>>       mutex_lock(&adev->shadow_list_lock);
>>> -    list_for_each_entry_safe(bo, tmp, &adev->shadow_list, 
>>> shadow_list) {
>>> -        next = NULL;
>>> -        amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
>>> +    list_for_each_entry(shadow, &adev->shadow_list, shadow_list) {
>>> +
>>> +        /* No need to recover an evicted BO */
>>> +        if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
>>> +            shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)
>> is there a change that shadow bo evicted to other domain?
>> like SYSTEM?
>
> Yes, that's why I test "!= TTM_PL_TT" here.
>
> What can happen is that either the shadow or the page table or page 
> directory is evicted.
>
> But in this case we don't need to restore anything because of patch #1 
> in this series.

Thanks, then it's
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>

Regards,
Jerry

>
> Regards,
> Christian.
>
>>
>> Regards,
>> Jerry
>>> +            continue;
>>> +
>>> +        r = amdgpu_bo_restore_shadow(shadow, &next);
>>> +        if (r)
>>> +            break;
>>> +
>>>           if (fence) {
>>>               r = dma_fence_wait_timeout(fence, false, tmo);
>>> -            if (r == 0)
>>> -                pr_err("wait fence %p[%d] timeout\n", fence, i);
>>> -            else if (r < 0)
>>> -                pr_err("wait fence %p[%d] interrupted\n", fence, i);
>>> -            if (r < 1) {
>>> -                dma_fence_put(fence);
>>> -                fence = next;
>>> +            dma_fence_put(fence);
>>> +            fence = next;
>>> +            if (r <= 0)
>>>                   break;
>>> -            }
>>> -            i++;
>>> +        } else {
>>> +            fence = next;
>>>           }
>>> -
>>> -        dma_fence_put(fence);
>>> -        fence = next;
>>>       }
>>>       mutex_unlock(&adev->shadow_list_lock);
>>>   -    if (fence) {
>>> -        r = dma_fence_wait_timeout(fence, false, tmo);
>>> -        if (r == 0)
>>> -            pr_err("wait fence %p[%d] timeout\n", fence, i);
>>> -        else if (r < 0)
>>> -            pr_err("wait fence %p[%d] interrupted\n", fence, i);
>>> -
>>> -    }
>>> +    if (fence)
>>> +        tmo = dma_fence_wait_timeout(fence, false, tmo);
>>>       dma_fence_put(fence);
>>>   -    if (r > 0)
>>> -        DRM_INFO("recover vram bo from shadow done\n");
>>> -    else
>>> +    if (r <= 0 || tmo <= 0) {
>>>           DRM_ERROR("recover vram bo from shadow failed\n");
>>> +        return -EIO;
>>> +    }
>>>   -    return (r > 0) ? 0 : 1;
>>> +    DRM_INFO("recover vram bo from shadow done\n");
>>> +    return 0;
>>>   }
>>>     /**
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index 3a6f92de5504..5b6d5fcc2151 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -537,7 +537,7 @@ static int amdgpu_bo_create_shadow(struct 
>>> amdgpu_device *adev,
>>>       if (!r) {
>>>           bo->shadow->parent = amdgpu_bo_ref(bo);
>>>           mutex_lock(&adev->shadow_list_lock);
>>> -        list_add_tail(&bo->shadow_list, &adev->shadow_list);
>>> +        list_add_tail(&bo->shadow->shadow_list, &adev->shadow_list);
>>>           mutex_unlock(&adev->shadow_list_lock);
>>>       }
>>>   @@ -669,13 +669,10 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
>>>   }
>>>     /**
>>> - * amdgpu_bo_restore_from_shadow - restore an &amdgpu_bo buffer object
>>> - * @adev: amdgpu device object
>>> - * @ring: amdgpu_ring for the engine handling the buffer operations
>>> - * @bo: &amdgpu_bo buffer to be restored
>>> - * @resv: reservation object with embedded fence
>>> + * amdgpu_bo_restore_shadow - restore an &amdgpu_bo shadow
>>> + *
>>> + * @shadow: &amdgpu_bo shadow to be restored
>>>    * @fence: dma_fence associated with the operation
>>> - * @direct: whether to submit the job directly
>>>    *
>>>    * Copies a buffer object's shadow content back to the object.
>>>    * This is used for recovering a buffer from its shadow in case of 
>>> a gpu
>>> @@ -684,36 +681,19 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
>>>    * Returns:
>>>    * 0 for success or a negative error code on failure.
>>>    */
>>> -int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
>>> -                  struct amdgpu_ring *ring,
>>> -                  struct amdgpu_bo *bo,
>>> -                  struct reservation_object *resv,
>>> -                  struct dma_fence **fence,
>>> -                  bool direct)
>>> +int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, struct 
>>> dma_fence **fence)
>>>     {
>>> -    struct amdgpu_bo *shadow = bo->shadow;
>>> -    uint64_t bo_addr, shadow_addr;
>>> -    int r;
>>> -
>>> -    if (!shadow)
>>> -        return -EINVAL;
>>> -
>>> -    bo_addr = amdgpu_bo_gpu_offset(bo);
>>> -    shadow_addr = amdgpu_bo_gpu_offset(bo->shadow);
>>> -
>>> -    r = reservation_object_reserve_shared(bo->tbo.resv);
>>> -    if (r)
>>> -        goto err;
>>> +    struct amdgpu_device *adev = amdgpu_ttm_adev(shadow->tbo.bdev);
>>> +    struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>> +    uint64_t shadow_addr, parent_addr;
>>>   -    r = amdgpu_copy_buffer(ring, shadow_addr, bo_addr,
>>> -                   amdgpu_bo_size(bo), resv, fence,
>>> -                   direct, false);
>>> -    if (!r)
>>> -        amdgpu_bo_fence(bo, *fence, true);
>>> +    shadow_addr = amdgpu_bo_gpu_offset(shadow);
>>> +    parent_addr = amdgpu_bo_gpu_offset(shadow->parent);
>>>   -err:
>>> -    return r;
>>> +    return amdgpu_copy_buffer(ring, shadow_addr, parent_addr,
>>> +                  amdgpu_bo_size(shadow), NULL, fence,
>>> +                  true, false);
>>>   }
>>>     /**
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> index 907fdf46d895..363db417fb2e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>>> @@ -273,12 +273,8 @@ int amdgpu_bo_backup_to_shadow(struct 
>>> amdgpu_device *adev,
>>>                      struct reservation_object *resv,
>>>                      struct dma_fence **fence, bool direct);
>>>   int amdgpu_bo_validate(struct amdgpu_bo *bo);
>>> -int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
>>> -                  struct amdgpu_ring *ring,
>>> -                  struct amdgpu_bo *bo,
>>> -                  struct reservation_object *resv,
>>> -                  struct dma_fence **fence,
>>> -                  bool direct);
>>> +int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow,
>>> +                 struct dma_fence **fence);
>>>   uint32_t amdgpu_bo_get_preferred_pin_domain(struct amdgpu_device 
>>> *adev,
>>>                           uint32_t domain);
>>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] drm/amdgpu: fix shadow BO restoring
       [not found]         ` <073f24f7-df48-55f6-f4fb-2d250cfd2dd1-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-14 11:54           ` Christian König
       [not found]             ` <fa8fb53f-ff84-fb45-59c7-1d52ec57fba7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2018-09-14 11:54 UTC (permalink / raw)
  To: Zhang, Jerry(Junwei), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 13.09.2018 um 11:29 schrieb Zhang, Jerry(Junwei):
> On 09/11/2018 05:56 PM, Christian König wrote:
>> Don't grab the reservation lock any more and simplify the handling quite
>> a bit.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 
>> ++++++++---------------------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  46 ++++--------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   8 +--
>>   3 files changed, 43 insertions(+), 120 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 5eba66ecf668..20bb702f5c7f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2940,54 +2940,6 @@ static int 
>> amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev)
>>       return 0;
>>   }
>>   -/**
>> - * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM 
>> buffers
>> - *
>> - * @adev: amdgpu_device pointer
>> - * @ring: amdgpu_ring for the engine handling the buffer operations
>> - * @bo: amdgpu_bo buffer whose shadow is being restored
>> - * @fence: dma_fence associated with the operation
>> - *
>> - * Restores the VRAM buffer contents from the shadow in GTT. Used to
>> - * restore things like GPUVM page tables after a GPU reset where
>> - * the contents of VRAM might be lost.
>> - * Returns 0 on success, negative error code on failure.
>> - */
>> -static int amdgpu_device_recover_vram_from_shadow(struct 
>> amdgpu_device *adev,
>> -                          struct amdgpu_ring *ring,
>> -                          struct amdgpu_bo *bo,
>> -                          struct dma_fence **fence)
>> -{
>> -    uint32_t domain;
>> -    int r;
>> -
>> -    if (!bo->shadow)
>> -        return 0;
>> -
>> -    r = amdgpu_bo_reserve(bo, true);
>> -    if (r)
>> -        return r;
>> -    domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
>> -    /* if bo has been evicted, then no need to recover */
>> -    if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
>> -        r = amdgpu_bo_validate(bo->shadow);
>> -        if (r) {
>> -            DRM_ERROR("bo validate failed!\n");
>> -            goto err;
>> -        }
>> -
>> -        r = amdgpu_bo_restore_from_shadow(adev, ring, bo,
>> -                         NULL, fence, true);
>> -        if (r) {
>> -            DRM_ERROR("recover page table failed!\n");
>> -            goto err;
>> -        }
>> -    }
>> -err:
>> -    amdgpu_bo_unreserve(bo);
>> -    return r;
>> -}
>> -
>>   /**
>>    * amdgpu_device_recover_vram - Recover some VRAM contents
>>    *
>> @@ -2996,16 +2948,15 @@ static int 
>> amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
>>    * Restores the contents of VRAM buffers from the shadows in GTT.  
>> Used to
>>    * restore things like GPUVM page tables after a GPU reset where
>>    * the contents of VRAM might be lost.
>> - * Returns 0 on success, 1 on failure.
>> + *
>> + * Returns:
>> + * 0 on success, negative error code on failure.
>>    */
>>   static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>>   {
>> -    struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>> -    struct amdgpu_bo *bo, *tmp;
>>       struct dma_fence *fence = NULL, *next = NULL;
>> -    long r = 1;
>> -    int i = 0;
>> -    long tmo;
>> +    struct amdgpu_bo *shadow;
>> +    long r = 1, tmo;
>>         if (amdgpu_sriov_runtime(adev))
>>           tmo = msecs_to_jiffies(8000);
>> @@ -3014,44 +2965,40 @@ static int amdgpu_device_recover_vram(struct 
>> amdgpu_device *adev)
>>         DRM_INFO("recover vram bo from shadow start\n");
>>       mutex_lock(&adev->shadow_list_lock);
>> -    list_for_each_entry_safe(bo, tmp, &adev->shadow_list, 
>> shadow_list) {
>> -        next = NULL;
>> -        amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
>> +    list_for_each_entry(shadow, &adev->shadow_list, shadow_list) {
>> +
>> +        /* No need to recover an evicted BO */
>> +        if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
>> +            shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)
> is there a change that shadow bo evicted to other domain?
> like SYSTEM?

Yes, that's why I test "!= TTM_PL_TT" here.

What can happen is that either the shadow or the page table or page 
directory is evicted.

But in this case we don't need to restore anything because of patch #1 
in this series.

Regards,
Christian.

>
> Regards,
> Jerry
>> +            continue;
>> +
>> +        r = amdgpu_bo_restore_shadow(shadow, &next);
>> +        if (r)
>> +            break;
>> +
>>           if (fence) {
>>               r = dma_fence_wait_timeout(fence, false, tmo);
>> -            if (r == 0)
>> -                pr_err("wait fence %p[%d] timeout\n", fence, i);
>> -            else if (r < 0)
>> -                pr_err("wait fence %p[%d] interrupted\n", fence, i);
>> -            if (r < 1) {
>> -                dma_fence_put(fence);
>> -                fence = next;
>> +            dma_fence_put(fence);
>> +            fence = next;
>> +            if (r <= 0)
>>                   break;
>> -            }
>> -            i++;
>> +        } else {
>> +            fence = next;
>>           }
>> -
>> -        dma_fence_put(fence);
>> -        fence = next;
>>       }
>>       mutex_unlock(&adev->shadow_list_lock);
>>   -    if (fence) {
>> -        r = dma_fence_wait_timeout(fence, false, tmo);
>> -        if (r == 0)
>> -            pr_err("wait fence %p[%d] timeout\n", fence, i);
>> -        else if (r < 0)
>> -            pr_err("wait fence %p[%d] interrupted\n", fence, i);
>> -
>> -    }
>> +    if (fence)
>> +        tmo = dma_fence_wait_timeout(fence, false, tmo);
>>       dma_fence_put(fence);
>>   -    if (r > 0)
>> -        DRM_INFO("recover vram bo from shadow done\n");
>> -    else
>> +    if (r <= 0 || tmo <= 0) {
>>           DRM_ERROR("recover vram bo from shadow failed\n");
>> +        return -EIO;
>> +    }
>>   -    return (r > 0) ? 0 : 1;
>> +    DRM_INFO("recover vram bo from shadow done\n");
>> +    return 0;
>>   }
>>     /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index 3a6f92de5504..5b6d5fcc2151 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -537,7 +537,7 @@ static int amdgpu_bo_create_shadow(struct 
>> amdgpu_device *adev,
>>       if (!r) {
>>           bo->shadow->parent = amdgpu_bo_ref(bo);
>>           mutex_lock(&adev->shadow_list_lock);
>> -        list_add_tail(&bo->shadow_list, &adev->shadow_list);
>> +        list_add_tail(&bo->shadow->shadow_list, &adev->shadow_list);
>>           mutex_unlock(&adev->shadow_list_lock);
>>       }
>>   @@ -669,13 +669,10 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
>>   }
>>     /**
>> - * amdgpu_bo_restore_from_shadow - restore an &amdgpu_bo buffer object
>> - * @adev: amdgpu device object
>> - * @ring: amdgpu_ring for the engine handling the buffer operations
>> - * @bo: &amdgpu_bo buffer to be restored
>> - * @resv: reservation object with embedded fence
>> + * amdgpu_bo_restore_shadow - restore an &amdgpu_bo shadow
>> + *
>> + * @shadow: &amdgpu_bo shadow to be restored
>>    * @fence: dma_fence associated with the operation
>> - * @direct: whether to submit the job directly
>>    *
>>    * Copies a buffer object's shadow content back to the object.
>>    * This is used for recovering a buffer from its shadow in case of 
>> a gpu
>> @@ -684,36 +681,19 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
>>    * Returns:
>>    * 0 for success or a negative error code on failure.
>>    */
>> -int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
>> -                  struct amdgpu_ring *ring,
>> -                  struct amdgpu_bo *bo,
>> -                  struct reservation_object *resv,
>> -                  struct dma_fence **fence,
>> -                  bool direct)
>> +int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, struct 
>> dma_fence **fence)
>>     {
>> -    struct amdgpu_bo *shadow = bo->shadow;
>> -    uint64_t bo_addr, shadow_addr;
>> -    int r;
>> -
>> -    if (!shadow)
>> -        return -EINVAL;
>> -
>> -    bo_addr = amdgpu_bo_gpu_offset(bo);
>> -    shadow_addr = amdgpu_bo_gpu_offset(bo->shadow);
>> -
>> -    r = reservation_object_reserve_shared(bo->tbo.resv);
>> -    if (r)
>> -        goto err;
>> +    struct amdgpu_device *adev = amdgpu_ttm_adev(shadow->tbo.bdev);
>> +    struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>> +    uint64_t shadow_addr, parent_addr;
>>   -    r = amdgpu_copy_buffer(ring, shadow_addr, bo_addr,
>> -                   amdgpu_bo_size(bo), resv, fence,
>> -                   direct, false);
>> -    if (!r)
>> -        amdgpu_bo_fence(bo, *fence, true);
>> +    shadow_addr = amdgpu_bo_gpu_offset(shadow);
>> +    parent_addr = amdgpu_bo_gpu_offset(shadow->parent);
>>   -err:
>> -    return r;
>> +    return amdgpu_copy_buffer(ring, shadow_addr, parent_addr,
>> +                  amdgpu_bo_size(shadow), NULL, fence,
>> +                  true, false);
>>   }
>>     /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> index 907fdf46d895..363db417fb2e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> @@ -273,12 +273,8 @@ int amdgpu_bo_backup_to_shadow(struct 
>> amdgpu_device *adev,
>>                      struct reservation_object *resv,
>>                      struct dma_fence **fence, bool direct);
>>   int amdgpu_bo_validate(struct amdgpu_bo *bo);
>> -int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
>> -                  struct amdgpu_ring *ring,
>> -                  struct amdgpu_bo *bo,
>> -                  struct reservation_object *resv,
>> -                  struct dma_fence **fence,
>> -                  bool direct);
>> +int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow,
>> +                 struct dma_fence **fence);
>>   uint32_t amdgpu_bo_get_preferred_pin_domain(struct amdgpu_device 
>> *adev,
>>                           uint32_t domain);
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] drm/amdgpu: fix shadow BO restoring
       [not found]     ` <20180911095602.10152-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-13  9:29       ` Zhang, Jerry(Junwei)
       [not found]         ` <073f24f7-df48-55f6-f4fb-2d250cfd2dd1-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Zhang, Jerry(Junwei) @ 2018-09-13  9:29 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 09/11/2018 05:56 PM, Christian König wrote:
> Don't grab the reservation lock any more and simplify the handling quite
> a bit.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 ++++++++---------------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  46 ++++--------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   8 +--
>   3 files changed, 43 insertions(+), 120 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5eba66ecf668..20bb702f5c7f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2940,54 +2940,6 @@ static int amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev)
>   	return 0;
>   }
>   
> -/**
> - * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM buffers
> - *
> - * @adev: amdgpu_device pointer
> - * @ring: amdgpu_ring for the engine handling the buffer operations
> - * @bo: amdgpu_bo buffer whose shadow is being restored
> - * @fence: dma_fence associated with the operation
> - *
> - * Restores the VRAM buffer contents from the shadow in GTT.  Used to
> - * restore things like GPUVM page tables after a GPU reset where
> - * the contents of VRAM might be lost.
> - * Returns 0 on success, negative error code on failure.
> - */
> -static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
> -						  struct amdgpu_ring *ring,
> -						  struct amdgpu_bo *bo,
> -						  struct dma_fence **fence)
> -{
> -	uint32_t domain;
> -	int r;
> -
> -	if (!bo->shadow)
> -		return 0;
> -
> -	r = amdgpu_bo_reserve(bo, true);
> -	if (r)
> -		return r;
> -	domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
> -	/* if bo has been evicted, then no need to recover */
> -	if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
> -		r = amdgpu_bo_validate(bo->shadow);
> -		if (r) {
> -			DRM_ERROR("bo validate failed!\n");
> -			goto err;
> -		}
> -
> -		r = amdgpu_bo_restore_from_shadow(adev, ring, bo,
> -						 NULL, fence, true);
> -		if (r) {
> -			DRM_ERROR("recover page table failed!\n");
> -			goto err;
> -		}
> -	}
> -err:
> -	amdgpu_bo_unreserve(bo);
> -	return r;
> -}
> -
>   /**
>    * amdgpu_device_recover_vram - Recover some VRAM contents
>    *
> @@ -2996,16 +2948,15 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
>    * Restores the contents of VRAM buffers from the shadows in GTT.  Used to
>    * restore things like GPUVM page tables after a GPU reset where
>    * the contents of VRAM might be lost.
> - * Returns 0 on success, 1 on failure.
> + *
> + * Returns:
> + * 0 on success, negative error code on failure.
>    */
>   static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>   {
> -	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> -	struct amdgpu_bo *bo, *tmp;
>   	struct dma_fence *fence = NULL, *next = NULL;
> -	long r = 1;
> -	int i = 0;
> -	long tmo;
> +	struct amdgpu_bo *shadow;
> +	long r = 1, tmo;
>   
>   	if (amdgpu_sriov_runtime(adev))
>   		tmo = msecs_to_jiffies(8000);
> @@ -3014,44 +2965,40 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
>   
>   	DRM_INFO("recover vram bo from shadow start\n");
>   	mutex_lock(&adev->shadow_list_lock);
> -	list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
> -		next = NULL;
> -		amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
> +	list_for_each_entry(shadow, &adev->shadow_list, shadow_list) {
> +
> +		/* No need to recover an evicted BO */
> +		if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
> +		    shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)
is there a change that shadow bo evicted to other domain?
like SYSTEM?

Regards,
Jerry
> +			continue;
> +
> +		r = amdgpu_bo_restore_shadow(shadow, &next);
> +		if (r)
> +			break;
> +
>   		if (fence) {
>   			r = dma_fence_wait_timeout(fence, false, tmo);
> -			if (r == 0)
> -				pr_err("wait fence %p[%d] timeout\n", fence, i);
> -			else if (r < 0)
> -				pr_err("wait fence %p[%d] interrupted\n", fence, i);
> -			if (r < 1) {
> -				dma_fence_put(fence);
> -				fence = next;
> +			dma_fence_put(fence);
> +			fence = next;
> +			if (r <= 0)
>   				break;
> -			}
> -			i++;
> +		} else {
> +			fence = next;
>   		}
> -
> -		dma_fence_put(fence);
> -		fence = next;
>   	}
>   	mutex_unlock(&adev->shadow_list_lock);
>   
> -	if (fence) {
> -		r = dma_fence_wait_timeout(fence, false, tmo);
> -		if (r == 0)
> -			pr_err("wait fence %p[%d] timeout\n", fence, i);
> -		else if (r < 0)
> -			pr_err("wait fence %p[%d] interrupted\n", fence, i);
> -
> -	}
> +	if (fence)
> +		tmo = dma_fence_wait_timeout(fence, false, tmo);
>   	dma_fence_put(fence);
>   
> -	if (r > 0)
> -		DRM_INFO("recover vram bo from shadow done\n");
> -	else
> +	if (r <= 0 || tmo <= 0) {
>   		DRM_ERROR("recover vram bo from shadow failed\n");
> +		return -EIO;
> +	}
>   
> -	return (r > 0) ? 0 : 1;
> +	DRM_INFO("recover vram bo from shadow done\n");
> +	return 0;
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 3a6f92de5504..5b6d5fcc2151 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -537,7 +537,7 @@ static int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
>   	if (!r) {
>   		bo->shadow->parent = amdgpu_bo_ref(bo);
>   		mutex_lock(&adev->shadow_list_lock);
> -		list_add_tail(&bo->shadow_list, &adev->shadow_list);
> +		list_add_tail(&bo->shadow->shadow_list, &adev->shadow_list);
>   		mutex_unlock(&adev->shadow_list_lock);
>   	}
>   
> @@ -669,13 +669,10 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
>   }
>   
>   /**
> - * amdgpu_bo_restore_from_shadow - restore an &amdgpu_bo buffer object
> - * @adev: amdgpu device object
> - * @ring: amdgpu_ring for the engine handling the buffer operations
> - * @bo: &amdgpu_bo buffer to be restored
> - * @resv: reservation object with embedded fence
> + * amdgpu_bo_restore_shadow - restore an &amdgpu_bo shadow
> + *
> + * @shadow: &amdgpu_bo shadow to be restored
>    * @fence: dma_fence associated with the operation
> - * @direct: whether to submit the job directly
>    *
>    * Copies a buffer object's shadow content back to the object.
>    * This is used for recovering a buffer from its shadow in case of a gpu
> @@ -684,36 +681,19 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
>    * Returns:
>    * 0 for success or a negative error code on failure.
>    */
> -int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
> -				  struct amdgpu_ring *ring,
> -				  struct amdgpu_bo *bo,
> -				  struct reservation_object *resv,
> -				  struct dma_fence **fence,
> -				  bool direct)
> +int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, struct dma_fence **fence)
>   
>   {
> -	struct amdgpu_bo *shadow = bo->shadow;
> -	uint64_t bo_addr, shadow_addr;
> -	int r;
> -
> -	if (!shadow)
> -		return -EINVAL;
> -
> -	bo_addr = amdgpu_bo_gpu_offset(bo);
> -	shadow_addr = amdgpu_bo_gpu_offset(bo->shadow);
> -
> -	r = reservation_object_reserve_shared(bo->tbo.resv);
> -	if (r)
> -		goto err;
> +	struct amdgpu_device *adev = amdgpu_ttm_adev(shadow->tbo.bdev);
> +	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> +	uint64_t shadow_addr, parent_addr;
>   
> -	r = amdgpu_copy_buffer(ring, shadow_addr, bo_addr,
> -			       amdgpu_bo_size(bo), resv, fence,
> -			       direct, false);
> -	if (!r)
> -		amdgpu_bo_fence(bo, *fence, true);
> +	shadow_addr = amdgpu_bo_gpu_offset(shadow);
> +	parent_addr = amdgpu_bo_gpu_offset(shadow->parent);
>   
> -err:
> -	return r;
> +	return amdgpu_copy_buffer(ring, shadow_addr, parent_addr,
> +				  amdgpu_bo_size(shadow), NULL, fence,
> +				  true, false);
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> index 907fdf46d895..363db417fb2e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> @@ -273,12 +273,8 @@ int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev,
>   			       struct reservation_object *resv,
>   			       struct dma_fence **fence, bool direct);
>   int amdgpu_bo_validate(struct amdgpu_bo *bo);
> -int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
> -				  struct amdgpu_ring *ring,
> -				  struct amdgpu_bo *bo,
> -				  struct reservation_object *resv,
> -				  struct dma_fence **fence,
> -				  bool direct);
> +int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow,
> +			     struct dma_fence **fence);
>   uint32_t amdgpu_bo_get_preferred_pin_domain(struct amdgpu_device *adev,
>   					    uint32_t domain);
>   

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 5/5] drm/amdgpu: fix shadow BO restoring
       [not found] ` <20180911095602.10152-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-11  9:56   ` Christian König
       [not found]     ` <20180911095602.10152-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2018-09-11  9:56 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Don't grab the reservation lock any more and simplify the handling quite
a bit.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 109 ++++++++---------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  46 ++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   8 +--
 3 files changed, 43 insertions(+), 120 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5eba66ecf668..20bb702f5c7f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2940,54 +2940,6 @@ static int amdgpu_device_ip_post_soft_reset(struct amdgpu_device *adev)
 	return 0;
 }
 
-/**
- * amdgpu_device_recover_vram_from_shadow - restore shadowed VRAM buffers
- *
- * @adev: amdgpu_device pointer
- * @ring: amdgpu_ring for the engine handling the buffer operations
- * @bo: amdgpu_bo buffer whose shadow is being restored
- * @fence: dma_fence associated with the operation
- *
- * Restores the VRAM buffer contents from the shadow in GTT.  Used to
- * restore things like GPUVM page tables after a GPU reset where
- * the contents of VRAM might be lost.
- * Returns 0 on success, negative error code on failure.
- */
-static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
-						  struct amdgpu_ring *ring,
-						  struct amdgpu_bo *bo,
-						  struct dma_fence **fence)
-{
-	uint32_t domain;
-	int r;
-
-	if (!bo->shadow)
-		return 0;
-
-	r = amdgpu_bo_reserve(bo, true);
-	if (r)
-		return r;
-	domain = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
-	/* if bo has been evicted, then no need to recover */
-	if (domain == AMDGPU_GEM_DOMAIN_VRAM) {
-		r = amdgpu_bo_validate(bo->shadow);
-		if (r) {
-			DRM_ERROR("bo validate failed!\n");
-			goto err;
-		}
-
-		r = amdgpu_bo_restore_from_shadow(adev, ring, bo,
-						 NULL, fence, true);
-		if (r) {
-			DRM_ERROR("recover page table failed!\n");
-			goto err;
-		}
-	}
-err:
-	amdgpu_bo_unreserve(bo);
-	return r;
-}
-
 /**
  * amdgpu_device_recover_vram - Recover some VRAM contents
  *
@@ -2996,16 +2948,15 @@ static int amdgpu_device_recover_vram_from_shadow(struct amdgpu_device *adev,
  * Restores the contents of VRAM buffers from the shadows in GTT.  Used to
  * restore things like GPUVM page tables after a GPU reset where
  * the contents of VRAM might be lost.
- * Returns 0 on success, 1 on failure.
+ *
+ * Returns:
+ * 0 on success, negative error code on failure.
  */
 static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 {
-	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
-	struct amdgpu_bo *bo, *tmp;
 	struct dma_fence *fence = NULL, *next = NULL;
-	long r = 1;
-	int i = 0;
-	long tmo;
+	struct amdgpu_bo *shadow;
+	long r = 1, tmo;
 
 	if (amdgpu_sriov_runtime(adev))
 		tmo = msecs_to_jiffies(8000);
@@ -3014,44 +2965,40 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev)
 
 	DRM_INFO("recover vram bo from shadow start\n");
 	mutex_lock(&adev->shadow_list_lock);
-	list_for_each_entry_safe(bo, tmp, &adev->shadow_list, shadow_list) {
-		next = NULL;
-		amdgpu_device_recover_vram_from_shadow(adev, ring, bo, &next);
+	list_for_each_entry(shadow, &adev->shadow_list, shadow_list) {
+
+		/* No need to recover an evicted BO */
+		if (shadow->tbo.mem.mem_type != TTM_PL_TT ||
+		    shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM)
+			continue;
+
+		r = amdgpu_bo_restore_shadow(shadow, &next);
+		if (r)
+			break;
+
 		if (fence) {
 			r = dma_fence_wait_timeout(fence, false, tmo);
-			if (r == 0)
-				pr_err("wait fence %p[%d] timeout\n", fence, i);
-			else if (r < 0)
-				pr_err("wait fence %p[%d] interrupted\n", fence, i);
-			if (r < 1) {
-				dma_fence_put(fence);
-				fence = next;
+			dma_fence_put(fence);
+			fence = next;
+			if (r <= 0)
 				break;
-			}
-			i++;
+		} else {
+			fence = next;
 		}
-
-		dma_fence_put(fence);
-		fence = next;
 	}
 	mutex_unlock(&adev->shadow_list_lock);
 
-	if (fence) {
-		r = dma_fence_wait_timeout(fence, false, tmo);
-		if (r == 0)
-			pr_err("wait fence %p[%d] timeout\n", fence, i);
-		else if (r < 0)
-			pr_err("wait fence %p[%d] interrupted\n", fence, i);
-
-	}
+	if (fence)
+		tmo = dma_fence_wait_timeout(fence, false, tmo);
 	dma_fence_put(fence);
 
-	if (r > 0)
-		DRM_INFO("recover vram bo from shadow done\n");
-	else
+	if (r <= 0 || tmo <= 0) {
 		DRM_ERROR("recover vram bo from shadow failed\n");
+		return -EIO;
+	}
 
-	return (r > 0) ? 0 : 1;
+	DRM_INFO("recover vram bo from shadow done\n");
+	return 0;
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 3a6f92de5504..5b6d5fcc2151 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -537,7 +537,7 @@ static int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
 	if (!r) {
 		bo->shadow->parent = amdgpu_bo_ref(bo);
 		mutex_lock(&adev->shadow_list_lock);
-		list_add_tail(&bo->shadow_list, &adev->shadow_list);
+		list_add_tail(&bo->shadow->shadow_list, &adev->shadow_list);
 		mutex_unlock(&adev->shadow_list_lock);
 	}
 
@@ -669,13 +669,10 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
 }
 
 /**
- * amdgpu_bo_restore_from_shadow - restore an &amdgpu_bo buffer object
- * @adev: amdgpu device object
- * @ring: amdgpu_ring for the engine handling the buffer operations
- * @bo: &amdgpu_bo buffer to be restored
- * @resv: reservation object with embedded fence
+ * amdgpu_bo_restore_shadow - restore an &amdgpu_bo shadow
+ *
+ * @shadow: &amdgpu_bo shadow to be restored
  * @fence: dma_fence associated with the operation
- * @direct: whether to submit the job directly
  *
  * Copies a buffer object's shadow content back to the object.
  * This is used for recovering a buffer from its shadow in case of a gpu
@@ -684,36 +681,19 @@ int amdgpu_bo_validate(struct amdgpu_bo *bo)
  * Returns:
  * 0 for success or a negative error code on failure.
  */
-int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
-				  struct amdgpu_ring *ring,
-				  struct amdgpu_bo *bo,
-				  struct reservation_object *resv,
-				  struct dma_fence **fence,
-				  bool direct)
+int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow, struct dma_fence **fence)
 
 {
-	struct amdgpu_bo *shadow = bo->shadow;
-	uint64_t bo_addr, shadow_addr;
-	int r;
-
-	if (!shadow)
-		return -EINVAL;
-
-	bo_addr = amdgpu_bo_gpu_offset(bo);
-	shadow_addr = amdgpu_bo_gpu_offset(bo->shadow);
-
-	r = reservation_object_reserve_shared(bo->tbo.resv);
-	if (r)
-		goto err;
+	struct amdgpu_device *adev = amdgpu_ttm_adev(shadow->tbo.bdev);
+	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+	uint64_t shadow_addr, parent_addr;
 
-	r = amdgpu_copy_buffer(ring, shadow_addr, bo_addr,
-			       amdgpu_bo_size(bo), resv, fence,
-			       direct, false);
-	if (!r)
-		amdgpu_bo_fence(bo, *fence, true);
+	shadow_addr = amdgpu_bo_gpu_offset(shadow);
+	parent_addr = amdgpu_bo_gpu_offset(shadow->parent);
 
-err:
-	return r;
+	return amdgpu_copy_buffer(ring, shadow_addr, parent_addr,
+				  amdgpu_bo_size(shadow), NULL, fence,
+				  true, false);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 907fdf46d895..363db417fb2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -273,12 +273,8 @@ int amdgpu_bo_backup_to_shadow(struct amdgpu_device *adev,
 			       struct reservation_object *resv,
 			       struct dma_fence **fence, bool direct);
 int amdgpu_bo_validate(struct amdgpu_bo *bo);
-int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev,
-				  struct amdgpu_ring *ring,
-				  struct amdgpu_bo *bo,
-				  struct reservation_object *resv,
-				  struct dma_fence **fence,
-				  bool direct);
+int amdgpu_bo_restore_shadow(struct amdgpu_bo *shadow,
+			     struct dma_fence **fence);
 uint32_t amdgpu_bo_get_preferred_pin_domain(struct amdgpu_device *adev,
 					    uint32_t domain);
 
-- 
2.14.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-09-18  8:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-14 13:42 [PATCH 1/5] drm/amdgpu: stop pipelining VM PDs/PTs moves Christian König
     [not found] ` <20180914134257.2196-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2018-09-14 13:42   ` [PATCH 2/5] drm/amdgpu: always enable shadow BOs v2 Christian König
2018-09-14 13:42   ` [PATCH 3/5] drm/amdgpu: shadow BOs don't need any alignment Christian König
2018-09-14 13:42   ` [PATCH 4/5] drm/amdgpu: always recover VRAM during GPU recovery Christian König
2018-09-14 13:42   ` [PATCH 5/5] drm/amdgpu: fix shadow BO restoring Christian König
     [not found]     ` <20180914134257.2196-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2018-09-17  9:10       ` Huang Rui
2018-09-17 11:42         ` Christian König
     [not found]           ` <e1c78397-4a41-0f74-2049-5c63a279630b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-09-18  8:40             ` Huang Rui
  -- strict thread matches above, loose matches on Subject: below --
2018-09-11  9:55 [PATCH 1/5] drm/amdgpu: stop pipelining VM PDs/PTs moves Christian König
     [not found] ` <20180911095602.10152-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2018-09-11  9:56   ` [PATCH 5/5] drm/amdgpu: fix shadow BO restoring Christian König
     [not found]     ` <20180911095602.10152-5-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2018-09-13  9:29       ` Zhang, Jerry(Junwei)
     [not found]         ` <073f24f7-df48-55f6-f4fb-2d250cfd2dd1-5C7GfCeVMHo@public.gmane.org>
2018-09-14 11:54           ` Christian König
     [not found]             ` <fa8fb53f-ff84-fb45-59c7-1d52ec57fba7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-09-18  6:15               ` Zhang, Jerry(Junwei)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.