* [PATCH 00/10] GART table recovery
@ 2016-08-02  8:00 Chunming Zhou
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Chunming Zhou @ 2016-08-02  8:00 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chunming Zhou

The GART table is stored in a bo that must be ready before GART init, but the shadow bo can only be created after GART is ready, so the two cannot be created at the same time. The shadow bo itself is also mapped through the GART table, so the shadow bo needs a synchronization after device init. After the sync, the contents of the bo and the shadow bo are the same, and both are updated together from then on. We are then able to recover the GART table from the shadow bo on a full GPU reset.

Patch 10 is a fix for a memory leak.

Chunming Zhou (10):
  drm/amdgpu: make need_backup generic
  drm/amdgpu: implement gart late_init/fini
  drm/amdgpu: add gart_late_init/fini to gmc V7/8
  drm/amdgpu: abstract amdgpu_bo_create_shadow
  drm/amdgpu: shadow gart table support
  drm/amdgpu: make recover_bo_from_shadow be generic
  drm/amdgpu: implement gart recovery
  drm/amdgpu: recover gart table first when full reset
  drm/amdgpu: sync gart table before initialization completed
  drm/amdgpu: fix memory leak of sched fence

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   9 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 139 +++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  80 ++++++++++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   9 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c     |  50 ++---------
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c      |  39 +++++++-
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c      |  40 ++++++++-
 9 files changed, 304 insertions(+), 66 deletions(-)

-- 
1.9.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread
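
Editorial sketch, not part of the series: the scheme described in the cover letter (mirrored writes, a one-time sync, recovery from the shadow) can be modelled in a few lines of userspace C. Everything below is hypothetical (struct toy_gart, the toy_* helpers, the table size) and is only meant to illustrate the idea under simplified assumptions; it is not the amdgpu implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_GART_ENTRIES 8	/* hypothetical table size */

/* Toy stand-in for the GART table and its shadow copy. */
struct toy_gart {
	uint64_t table[TOY_GART_ENTRIES];	/* primary copy */
	uint64_t shadow[TOY_GART_ENTRIES];	/* backup copy */
	int has_shadow;				/* shadow is set up after the table */
};

/* Write one entry; mirror it into the shadow once the shadow exists. */
static void toy_gart_set_pte(struct toy_gart *g, unsigned idx, uint64_t pte)
{
	assert(idx < TOY_GART_ENTRIES);
	g->table[idx] = pte;
	if (g->has_shadow)
		g->shadow[idx] = pte;
}

/* One-time sync: pick up entries written before the shadow was usable. */
static void toy_gart_shadow_sync(struct toy_gart *g)
{
	memcpy(g->shadow, g->table, sizeof(g->shadow));
	g->has_shadow = 1;
}

/* After a reset has wiped the primary table, restore it from the shadow. */
static int toy_gart_recover(struct toy_gart *g)
{
	if (!g->has_shadow)
		return -1;	/* nothing to recover from */
	memcpy(g->table, g->shadow, sizeof(g->table));
	return 0;
}
```

In this model, an entry written before toy_gart_shadow_sync() would be lost by a recovery if the sync were skipped, which is the gap the cover letter says the synchronization closes.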

* [PATCH 01/10] drm/amdgpu: make need_backup generic
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
@ 2016-08-02  8:00   ` Chunming Zhou
  2016-08-02  8:00   ` [PATCH 02/10] drm/amdgpu: implement gart late_init/fini Chunming Zhou
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Chunming Zhou @ 2016-08-02  8:00 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chunming Zhou

It will be used in other places.

Change-Id: I213faf16e25a95bef4c45a65ab21f4d61db4ef41
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h    | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b1a4af0..daf07ff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -975,6 +975,7 @@ void amdgpu_vm_manager_init(struct amdgpu_device *adev);
 void amdgpu_vm_manager_fini(struct amdgpu_device *adev);
 int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
+bool amdgpu_vm_need_backup(struct amdgpu_device *adev);
 void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
 			 struct list_head *validated,
 			 struct amdgpu_bo_list_entry *entry);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index a34d94a..01dd888 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -112,7 +112,7 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
 	list_add(&entry->tv.head, validated);
 }
 
-static bool amdgpu_vm_need_backup(struct amdgpu_device *adev)
+bool amdgpu_vm_need_backup(struct amdgpu_device *adev)
 {
 	if (adev->flags & AMD_IS_APU)
 		return false;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 02/10] drm/amdgpu: implement gart late_init/fini
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
  2016-08-02  8:00   ` [PATCH 01/10] drm/amdgpu: make need_backup generic Chunming Zhou
@ 2016-08-02  8:00   ` Chunming Zhou
  2016-08-02  8:00   ` [PATCH 03/10] drm/amdgpu: add gart_late_init/fini to gmc V7/8 Chunming Zhou
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Chunming Zhou @ 2016-08-02  8:00 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chunming Zhou

Add a recovery entity to the GART.

Change-Id: Ieb400c8a731ef25619ea3c0b5198a6e7ce56580e
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 18 ++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index daf07ff..419a33b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -646,6 +646,7 @@ struct amdgpu_gart {
 #endif
 	bool				ready;
 	const struct amdgpu_gart_funcs *gart_funcs;
+	struct amd_sched_entity         recover_entity;
 };
 
 int amdgpu_gart_table_ram_alloc(struct amdgpu_device *adev);
@@ -656,6 +657,8 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev);
 void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev);
 int amdgpu_gart_init(struct amdgpu_device *adev);
 void amdgpu_gart_fini(struct amdgpu_device *adev);
+int amdgpu_gart_late_init(struct amdgpu_device *adev);
+void amdgpu_gart_late_fini(struct amdgpu_device *adev);
 void amdgpu_gart_unbind(struct amdgpu_device *adev, unsigned offset,
 			int pages);
 int amdgpu_gart_bind(struct amdgpu_device *adev, unsigned offset,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index 921bce2..c1f226b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -363,3 +363,21 @@ void amdgpu_gart_fini(struct amdgpu_device *adev)
 #endif
 	amdgpu_dummy_page_fini(adev);
 }
+
+int amdgpu_gart_late_init(struct amdgpu_device *adev)
+{
+	struct amd_sched_rq *rq;
+	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+
+	rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_RECOVER];
+	return amd_sched_entity_init(&ring->sched, &adev->gart.recover_entity,
+				     rq, amdgpu_sched_jobs);
+
+}
+
+void amdgpu_gart_late_fini(struct amdgpu_device *adev)
+{
+	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+
+	amd_sched_entity_fini(&ring->sched, &adev->gart.recover_entity);
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 03/10] drm/amdgpu: add gart_late_init/fini to gmc V7/8
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
  2016-08-02  8:00   ` [PATCH 01/10] drm/amdgpu: make need_backup generic Chunming Zhou
  2016-08-02  8:00   ` [PATCH 02/10] drm/amdgpu: implement gart late_init/fini Chunming Zhou
@ 2016-08-02  8:00   ` Chunming Zhou
  2016-08-02  8:00   ` [PATCH 04/10] drm/amdgpu: abstract amdgpu_bo_create_shadow Chunming Zhou
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Chunming Zhou @ 2016-08-02  8:00 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chunming Zhou

Change-Id: I47b132d1ac5ed57f5805f759d5698948c35721ba
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 24 ++++++++++++++++++++----
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 24 ++++++++++++++++++++----
 2 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 0b0f086..0771c04 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -887,11 +887,26 @@ static int gmc_v7_0_early_init(void *handle)
 static int gmc_v7_0_late_init(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+	int r;
 
-	if (amdgpu_vm_fault_stop != AMDGPU_VM_FAULT_STOP_ALWAYS)
-		return amdgpu_irq_get(adev, &adev->mc.vm_fault, 0);
-	else
-		return 0;
+	r = amdgpu_gart_late_init(adev);
+	if (r)
+		return r;
+
+	if (amdgpu_vm_fault_stop != AMDGPU_VM_FAULT_STOP_ALWAYS) {
+		r = amdgpu_irq_get(adev, &adev->mc.vm_fault, 0);
+		if (r)
+			amdgpu_gart_late_fini(adev);
+	}
+
+	return r;
+}
+
+static void gmc_v7_0_late_fini(void *handle)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+	amdgpu_gart_late_fini(adev);
 }
 
 static int gmc_v7_0_sw_init(void *handle)
@@ -1242,6 +1257,7 @@ const struct amd_ip_funcs gmc_v7_0_ip_funcs = {
 	.name = "gmc_v7_0",
 	.early_init = gmc_v7_0_early_init,
 	.late_init = gmc_v7_0_late_init,
+	.late_fini = gmc_v7_0_late_fini,
 	.sw_init = gmc_v7_0_sw_init,
 	.sw_fini = gmc_v7_0_sw_fini,
 	.hw_init = gmc_v7_0_hw_init,
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 0a23b83..c26bee9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -876,11 +876,26 @@ static int gmc_v8_0_early_init(void *handle)
 static int gmc_v8_0_late_init(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+	int r;
 
-	if (amdgpu_vm_fault_stop != AMDGPU_VM_FAULT_STOP_ALWAYS)
-		return amdgpu_irq_get(adev, &adev->mc.vm_fault, 0);
-	else
-		return 0;
+	r = amdgpu_gart_late_init(adev);
+	if (r)
+		return r;
+
+	if (amdgpu_vm_fault_stop != AMDGPU_VM_FAULT_STOP_ALWAYS) {
+		r = amdgpu_irq_get(adev, &adev->mc.vm_fault, 0);
+		if (r)
+			amdgpu_gart_late_fini(adev);
+	}
+
+	return r;
+}
+
+static void gmc_v8_0_late_fini(void *handle)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+	amdgpu_gart_late_fini(adev);
 }
 
 #define mmMC_SEQ_MISC0_FIJI 0xA71
@@ -1434,6 +1449,7 @@ const struct amd_ip_funcs gmc_v8_0_ip_funcs = {
 	.name = "gmc_v8_0",
 	.early_init = gmc_v8_0_early_init,
 	.late_init = gmc_v8_0_late_init,
+	.late_fini = gmc_v8_0_late_fini,
 	.sw_init = gmc_v8_0_sw_init,
 	.sw_fini = gmc_v8_0_sw_fini,
 	.hw_init = gmc_v8_0_hw_init,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread
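
Editorial sketch, not part of the series: the reworked late_init in this patch follows a common two-step pattern, where the first step is undone if the second one fails. The userspace model below uses hypothetical names throughout (struct toy_dev, the toy_* helpers, TOY_ERR_IRQ) and only illustrates that unwinding order; it makes no claim about the real amdgpu or scheduler APIs.

```c
#include <assert.h>

#define TOY_ERR_IRQ (-22)	/* hypothetical error code for the sketch */

struct toy_dev {
	int entity_ready;	/* set by the first init step */
	int irq_enabled;	/* set by the second init step */
	int fail_irq;		/* test knob: force the second step to fail */
};

static int toy_entity_init(struct toy_dev *d)
{
	d->entity_ready = 1;
	return 0;
}

static void toy_entity_fini(struct toy_dev *d)
{
	d->entity_ready = 0;
}

static int toy_irq_get(struct toy_dev *d)
{
	if (d->fail_irq)
		return TOY_ERR_IRQ;
	d->irq_enabled = 1;
	return 0;
}

/* Step 1 always runs; step 2 is optional and rolls step 1 back on failure. */
static int toy_late_init(struct toy_dev *d, int want_irq)
{
	int r;

	r = toy_entity_init(d);
	if (r)
		return r;

	if (want_irq) {
		r = toy_irq_get(d);
		if (r)
			toy_entity_fini(d);
	}

	return r;
}
```

Because r is assigned by the first step before the optional branch, the function returns 0 when the second step is skipped, which matches the behaviour of the old "else return 0" path.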

* [PATCH 04/10] drm/amdgpu: abstract amdgpu_bo_create_shadow
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
                     ` (2 preceding siblings ...)
  2016-08-02  8:00   ` [PATCH 03/10] drm/amdgpu: add gart_late_init/fini to gmc V7/8 Chunming Zhou
@ 2016-08-02  8:00   ` Chunming Zhou
  2016-08-02  8:00   ` [PATCH 05/10] drm/amdgpu: shadow gart table support Chunming Zhou
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Chunming Zhou @ 2016-08-02  8:00 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chunming Zhou

Change-Id: Id0e89f350a05f8668ea00e3fff8c0bd6f3049cec
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 40 ++++++++++++++++++++----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  3 +++
 2 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e6ecf16..c1111c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -380,6 +380,32 @@ fail_free:
 	return r;
 }
 
+int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
+			    unsigned long size, int byte_align,
+			    struct amdgpu_bo *bo)
+{
+	struct ttm_placement placement = {0};
+	struct ttm_place placements[AMDGPU_GEM_DOMAIN_MAX + 1];
+
+	if (bo->shadow)
+		return 0;
+
+	bo->flags |= AMDGPU_GEM_CREATE_SHADOW;
+	memset(&placements, 0,
+	       (AMDGPU_GEM_DOMAIN_MAX + 1) * sizeof(struct ttm_place));
+
+	amdgpu_ttm_placement_init(adev, &placement,
+				  placements, AMDGPU_GEM_DOMAIN_GTT,
+				  AMDGPU_GEM_CREATE_CPU_GTT_USWC);
+
+	return amdgpu_bo_create_restricted(adev, size, byte_align, true,
+					   AMDGPU_GEM_DOMAIN_GTT,
+					   AMDGPU_GEM_CREATE_CPU_GTT_USWC,
+					   NULL, &placement,
+					   bo->tbo.resv,
+					   &bo->shadow);
+}
+
 int amdgpu_bo_create(struct amdgpu_device *adev,
 		     unsigned long size, int byte_align,
 		     bool kernel, u32 domain, u64 flags,
@@ -404,19 +430,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 		return r;
 
 	if (flags & AMDGPU_GEM_CREATE_SHADOW) {
-		memset(&placements, 0,
-		       (AMDGPU_GEM_DOMAIN_MAX + 1) * sizeof(struct ttm_place));
-
-		amdgpu_ttm_placement_init(adev, &placement,
-					  placements, AMDGPU_GEM_DOMAIN_GTT,
-					  AMDGPU_GEM_CREATE_CPU_GTT_USWC);
-
-		r = amdgpu_bo_create_restricted(adev, size, byte_align, kernel,
-						AMDGPU_GEM_DOMAIN_GTT,
-						AMDGPU_GEM_CREATE_CPU_GTT_USWC,
-						NULL, &placement,
-						(*bo_ptr)->tbo.resv,
-						&(*bo_ptr)->shadow);
+		r = amdgpu_bo_create_shadow(adev, size, byte_align, (*bo_ptr));
 		if (r)
 			amdgpu_bo_unref(bo_ptr);
 	} else
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index d650b42..b994fd4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -117,6 +117,9 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 			    struct sg_table *sg,
 			    struct reservation_object *resv,
 			    struct amdgpu_bo **bo_ptr);
+int amdgpu_bo_create_shadow(struct amdgpu_device *adev,
+			    unsigned long size, int byte_align,
+			    struct amdgpu_bo *bo);
 int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
 				unsigned long size, int byte_align,
 				bool kernel, u32 domain, u64 flags,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread
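
Editorial sketch, not part of the series: the new helper returns early when the bo already has a shadow, so it can be called more than once for the same bo. A minimal userspace model of that idempotent behaviour, with hypothetical names (struct toy_bo, toy_bo_create_shadow, TOY_ERR_NOMEM) and plain calloc() standing in for the real allocation path:

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_ERR_NOMEM (-12)	/* hypothetical error code for the sketch */

struct toy_bo {
	size_t size;
	struct toy_bo *shadow;	/* NULL until a shadow has been created */
};

/* Create a shadow of the same size; a second call is a no-op. */
static int toy_bo_create_shadow(struct toy_bo *bo)
{
	struct toy_bo *s;

	assert(bo != NULL);
	if (bo->shadow)
		return 0;

	s = calloc(1, sizeof(*s));
	if (!s)
		return TOY_ERR_NOMEM;

	s->size = bo->size;
	bo->shadow = s;
	return 0;
}

/* Release the shadow and clear the pointer so it can be recreated. */
static void toy_bo_free_shadow(struct toy_bo *bo)
{
	free(bo->shadow);
	bo->shadow = NULL;
}
```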

* [PATCH 05/10] drm/amdgpu: shadow gart table support
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
                     ` (3 preceding siblings ...)
  2016-08-02  8:00   ` [PATCH 04/10] drm/amdgpu: abstract amdgpu_bo_create_shadow Chunming Zhou
@ 2016-08-02  8:00   ` Chunming Zhou
  2016-08-02  8:00   ` [PATCH 06/10] drm/amdgpu: make recover_bo_from_shadow be generic Chunming Zhou
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Chunming Zhou @ 2016-08-02  8:00 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chunming Zhou

Allocate a GART shadow bo and use it to back up the GART table.

Change-Id: Ib2beae9cea1ad1314c57f0fcdcc254816f39b9b2
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 47 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c    | 15 ++++++++++
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c    | 16 +++++++++++
 4 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 419a33b..2985578d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -638,6 +638,7 @@ struct amdgpu_gart {
 	dma_addr_t			table_addr;
 	struct amdgpu_bo		*robj;
 	void				*ptr;
+	void				*shadow_ptr;
 	unsigned			num_gpu_pages;
 	unsigned			num_cpu_pages;
 	unsigned			table_size;
@@ -655,6 +656,8 @@ int amdgpu_gart_table_vram_alloc(struct amdgpu_device *adev);
 void amdgpu_gart_table_vram_free(struct amdgpu_device *adev);
 int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev);
 void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev);
+int amdgpu_gart_table_vram_shadow_pin(struct amdgpu_device *adev);
+void amdgpu_gart_table_vram_shadow_unpin(struct amdgpu_device *adev);
 int amdgpu_gart_init(struct amdgpu_device *adev);
 void amdgpu_gart_fini(struct amdgpu_device *adev);
 int amdgpu_gart_late_init(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index c1f226b..b306684 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -248,6 +248,9 @@ void amdgpu_gart_unbind(struct amdgpu_device *adev, unsigned offset,
 		for (j = 0; j < (PAGE_SIZE / AMDGPU_GPU_PAGE_SIZE); j++, t++) {
 			amdgpu_gart_set_pte_pde(adev, adev->gart.ptr,
 						t, page_base, flags);
+			if (amdgpu_vm_need_backup(adev) && adev->gart.robj->shadow)
+				amdgpu_gart_set_pte_pde(adev, adev->gart.shadow_ptr,
+							t, page_base, flags);
 			page_base += AMDGPU_GPU_PAGE_SIZE;
 		}
 	}
@@ -293,6 +296,9 @@ int amdgpu_gart_bind(struct amdgpu_device *adev, unsigned offset,
 			page_base = dma_addr[i];
 			for (j = 0; j < (PAGE_SIZE / AMDGPU_GPU_PAGE_SIZE); j++, t++) {
 				amdgpu_gart_set_pte_pde(adev, adev->gart.ptr, t, page_base, flags);
+				if (amdgpu_vm_need_backup(adev) && adev->gart.robj->shadow)
+					amdgpu_gart_set_pte_pde(adev, adev->gart.shadow_ptr,
+								t, page_base, flags);
 				page_base += AMDGPU_GPU_PAGE_SIZE;
 			}
 		}
@@ -364,6 +370,46 @@ void amdgpu_gart_fini(struct amdgpu_device *adev)
 	amdgpu_dummy_page_fini(adev);
 }
 
+int amdgpu_gart_table_vram_shadow_pin(struct amdgpu_device *adev)
+{
+	uint64_t gpu_addr;
+	int r;
+
+	if (!adev->gart.robj->shadow)
+		return -EINVAL;
+
+	r = amdgpu_bo_reserve(adev->gart.robj->shadow, false);
+	if (unlikely(r != 0))
+		return r;
+	r = amdgpu_bo_pin(adev->gart.robj->shadow,
+				AMDGPU_GEM_DOMAIN_GTT, &gpu_addr);
+	if (r) {
+		amdgpu_bo_unreserve(adev->gart.robj->shadow);
+		return r;
+	}
+	r = amdgpu_bo_kmap(adev->gart.robj->shadow, &adev->gart.shadow_ptr);
+	if (r)
+		amdgpu_bo_unpin(adev->gart.robj->shadow);
+	amdgpu_bo_unreserve(adev->gart.robj->shadow);
+	return r;
+}
+
+void amdgpu_gart_table_vram_shadow_unpin(struct amdgpu_device *adev)
+{
+	int r;
+
+	if (adev->gart.robj->shadow == NULL)
+		return;
+
+	r = amdgpu_bo_reserve(adev->gart.robj->shadow, false);
+	if (likely(r == 0)) {
+		amdgpu_bo_kunmap(adev->gart.robj->shadow);
+		amdgpu_bo_unpin(adev->gart.robj->shadow);
+		amdgpu_bo_unreserve(adev->gart.robj->shadow);
+		adev->gart.shadow_ptr = NULL;
+	}
+}
+
 int amdgpu_gart_late_init(struct amdgpu_device *adev)
 {
 	struct amd_sched_rq *rq;
@@ -372,7 +418,6 @@ int amdgpu_gart_late_init(struct amdgpu_device *adev)
 	rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_RECOVER];
 	return amd_sched_entity_init(&ring->sched, &adev->gart.recover_entity,
 				     rq, amdgpu_sched_jobs);
-
 }
 
 void amdgpu_gart_late_fini(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 0771c04..5470a28 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -589,7 +589,21 @@ static int gmc_v7_0_gart_enable(struct amdgpu_device *adev)
 		 (unsigned)(adev->mc.gtt_size >> 20),
 		 (unsigned long long)adev->gart.table_addr);
 	adev->gart.ready = true;
+	if (amdgpu_vm_need_backup(adev) && adev->gart.robj) {
+		r = amdgpu_bo_create_shadow(adev, adev->gart.table_size,
+					    PAGE_SIZE, adev->gart.robj);
+		if (r)
+			goto err;
+		r = amdgpu_gart_table_vram_shadow_pin(adev);
+		if (r)
+			goto err;
+	}
+
 	return 0;
+err:
+	amdgpu_gart_table_vram_unpin(adev);
+
+	return r;
 }
 
 static int gmc_v7_0_gart_init(struct amdgpu_device *adev)
@@ -634,6 +648,7 @@ static void gmc_v7_0_gart_disable(struct amdgpu_device *adev)
 	WREG32(mmVM_L2_CNTL, tmp);
 	WREG32(mmVM_L2_CNTL2, 0);
 	amdgpu_gart_table_vram_unpin(adev);
+	amdgpu_gart_table_vram_shadow_unpin(adev);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index c26bee9..6c2b5de 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -704,7 +704,22 @@ static int gmc_v8_0_gart_enable(struct amdgpu_device *adev)
 		 (unsigned)(adev->mc.gtt_size >> 20),
 		 (unsigned long long)adev->gart.table_addr);
 	adev->gart.ready = true;
+
+	if (amdgpu_vm_need_backup(adev) && adev->gart.robj) {
+		r = amdgpu_bo_create_shadow(adev, adev->gart.table_size,
+					    PAGE_SIZE, adev->gart.robj);
+		if (r)
+			goto err;
+		r = amdgpu_gart_table_vram_shadow_pin(adev);
+		if (r)
+			goto err;
+	}
+
 	return 0;
+err:
+	amdgpu_gart_table_vram_unpin(adev);
+
+	return r;
 }
 
 static int gmc_v8_0_gart_init(struct amdgpu_device *adev)
@@ -749,6 +764,7 @@ static void gmc_v8_0_gart_disable(struct amdgpu_device *adev)
 	WREG32(mmVM_L2_CNTL, tmp);
 	WREG32(mmVM_L2_CNTL2, 0);
 	amdgpu_gart_table_vram_unpin(adev);
+	amdgpu_gart_table_vram_shadow_unpin(adev);
 }
 
 /**
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread
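
Editorial sketch, not part of the series: the bind and unbind loops touched by this patch write several GPU-page entries per CPU page and, when a shadow is present, write each entry twice. The userspace model below shows that loop shape only. The page sizes, toy_gart_bind and its parameters are assumptions made for the illustration, and a plain array store stands in for the real PTE encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CPU_PAGE_SIZE 16384u	/* hypothetical: CPU page larger than GPU page */
#define TOY_GPU_PAGE_SIZE 4096u
#define TOY_PTES_PER_CPU_PAGE (TOY_CPU_PAGE_SIZE / TOY_GPU_PAGE_SIZE)

/*
 * Fill table entries for 'pages' CPU pages starting at entry 'first_pte'.
 * Each CPU page expands to TOY_PTES_PER_CPU_PAGE consecutive entries.
 * If 'shadow' is non-NULL, every entry is mirrored into it.
 */
static void toy_gart_bind(uint64_t *table, uint64_t *shadow, unsigned nptes,
			  unsigned first_pte, const uint64_t *dma_addr,
			  unsigned pages)
{
	unsigned i, j, t = first_pte;

	assert(first_pte + pages * TOY_PTES_PER_CPU_PAGE <= nptes);

	for (i = 0; i < pages; i++) {
		uint64_t page_base = dma_addr[i];

		for (j = 0; j < TOY_PTES_PER_CPU_PAGE; j++, t++) {
			table[t] = page_base;
			if (shadow)
				shadow[t] = page_base;
			page_base += TOY_GPU_PAGE_SIZE;
		}
	}
}
```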

* [PATCH 06/10] drm/amdgpu: make recover_bo_from_shadow be generic
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-08-02  8:00   ` [PATCH 05/10] drm/amdgpu: shadow gart table support Chunming Zhou
@ 2016-08-02  8:00   ` Chunming Zhou
  2016-08-02  8:00   ` [PATCH 07/10] drm/amdgpu: implement gart recovery Chunming Zhou
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Chunming Zhou @ 2016-08-02  8:00 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chunming Zhou

Change-Id: I74758b9ca84058f3f2db5509822d8aad840d283e
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 40 +++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  6 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c     | 48 ++++--------------------------
 3 files changed, 51 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index c1111c9..1d0bdfb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -439,6 +439,46 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 	return r;
 }
 
+int amdgpu_bo_recover_bo_from_shadow(struct amdgpu_device *adev,
+				     struct amdgpu_ring *ring,
+				     struct amd_sched_entity *entity,
+				     struct amdgpu_bo *bo,
+				     struct reservation_object *resv,
+				     struct fence **fence)
+
+{
+	int r;
+	uint64_t vram_addr, gtt_addr;
+
+	r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_VRAM, &vram_addr);
+	if (r) {
+		DRM_ERROR("Failed to pin bo object\n");
+		goto err1;
+	}
+	r = amdgpu_bo_pin(bo->shadow, AMDGPU_GEM_DOMAIN_GTT, &gtt_addr);
+	if (r) {
+		DRM_ERROR("Failed to pin bo shadow object\n");
+		goto err2;
+	}
+
+	r = reservation_object_reserve_shared(bo->tbo.resv);
+	if (r)
+		goto err3;
+
+	r = amdgpu_copy_buffer(ring, entity, gtt_addr,
+			       vram_addr, amdgpu_bo_size(bo), resv, fence);
+	if (!r)
+		amdgpu_bo_fence(bo, *fence, true);
+
+err3:
+	amdgpu_bo_unpin(bo->shadow);
+err2:
+	amdgpu_bo_unpin(bo);
+err1:
+
+	return r;
+}
+
 int amdgpu_bo_kmap(struct amdgpu_bo *bo, void **ptr)
 {
 	bool is_iomem;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index b994fd4..f35fd68 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -158,6 +158,12 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo);
 void amdgpu_bo_fence(struct amdgpu_bo *bo, struct fence *fence,
 		     bool shared);
 u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo);
+int amdgpu_bo_recover_bo_from_shadow(struct amdgpu_device *adev,
+				     struct amdgpu_ring *ring,
+				     struct amd_sched_entity *entity,
+				     struct amdgpu_bo *bo,
+				     struct reservation_object *resv,
+				     struct fence **fence);
 
 /*
  * sub allocation
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 01dd888..3eecddc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -713,46 +713,6 @@ error_free:
 	return r;
 }
 
-static int amdgpu_vm_recover_bo_from_shadow(struct amdgpu_device *adev,
-					    struct amdgpu_vm *vm,
-					    struct amdgpu_bo *bo,
-					    struct amdgpu_bo *bo_shadow,
-					    struct reservation_object *resv,
-					    struct fence **fence)
-
-{
-	int r;
-	uint64_t vram_addr, gtt_addr;
-
-	r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_VRAM, &vram_addr);
-	if (r) {
-		DRM_ERROR("Failed to pin bo object\n");
-		goto err1;
-	}
-	r = amdgpu_bo_pin(bo_shadow, AMDGPU_GEM_DOMAIN_GTT, &gtt_addr);
-	if (r) {
-		DRM_ERROR("Failed to pin bo shadow object\n");
-		goto err2;
-	}
-
-	r = reservation_object_reserve_shared(bo->tbo.resv);
-	if (r)
-		goto err3;
-
-	r = amdgpu_copy_buffer(vm->ring, &vm->recover_entity, gtt_addr,
-			       vram_addr, amdgpu_bo_size(bo), resv, fence);
-	if (!r)
-		amdgpu_bo_fence(bo, *fence, true);
-
-err3:
-	amdgpu_bo_unpin(bo_shadow);
-err2:
-	amdgpu_bo_unpin(bo);
-err1:
-
-	return r;
-}
-
 int amdgpu_vm_recover_page_table_from_shadow(struct amdgpu_device *adev,
 					     struct amdgpu_vm *vm)
 {
@@ -767,8 +727,9 @@ int amdgpu_vm_recover_page_table_from_shadow(struct amdgpu_device *adev,
 	if (unlikely(r != 0))
 		return r;
 
-	r = amdgpu_vm_recover_bo_from_shadow(adev, vm, vm->page_directory,
-					     vm->page_directory->shadow,
+	r = amdgpu_bo_recover_bo_from_shadow(adev, vm->ring,
+					     &vm->recover_entity,
+					     vm->page_directory,
 					     NULL, &fence);
 	if (r) {
 		DRM_ERROR("recover page table failed!\n");
@@ -784,7 +745,8 @@ int amdgpu_vm_recover_page_table_from_shadow(struct amdgpu_device *adev,
 
 		if (!bo || !bo_shadow)
 			continue;
-		r = amdgpu_vm_recover_bo_from_shadow(adev, vm, bo, bo_shadow,
+		r = amdgpu_bo_recover_bo_from_shadow(adev, vm->ring,
+						     &vm->recover_entity, bo,
 						     NULL, &fence);
 		if (r) {
 			DRM_ERROR("recover page table failed!\n");
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread
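
Editorial sketch, not part of the series: the helper moved in this patch uses a goto ladder (err3/err2/err1) so that each acquired resource is released in reverse order, on success as well as on failure. A userspace model of that control flow is shown below. All names (struct toy_buf, toy_pin, toy_recover, the TOY_ERR_* codes) are hypothetical, and the reserve and copy steps are reduced to flags.

```c
#include <assert.h>

#define TOY_ERR_NOMEM (-12)	/* hypothetical error codes for the sketch */
#define TOY_ERR_IO    (-5)

struct toy_buf {
	int pin_count;
	int fail_pin;	/* test knob: make pinning this buffer fail */
};

static int toy_pin(struct toy_buf *b)
{
	if (b->fail_pin)
		return TOY_ERR_NOMEM;
	b->pin_count++;
	return 0;
}

static void toy_unpin(struct toy_buf *b)
{
	b->pin_count--;
}

/* Pin both buffers, "copy" src to dst, then unpin in reverse order. */
static int toy_recover(struct toy_buf *dst, struct toy_buf *src,
		       int fail_reserve, int fail_copy)
{
	int r;

	r = toy_pin(dst);
	if (r)
		goto err1;
	r = toy_pin(src);
	if (r)
		goto err2;

	r = fail_reserve ? TOY_ERR_NOMEM : 0;	/* stand-in for the reserve step */
	if (r)
		goto err3;

	r = fail_copy ? TOY_ERR_IO : 0;		/* stand-in for the copy itself */

err3:
	toy_unpin(src);
err2:
	toy_unpin(dst);
err1:
	return r;
}
```

Whichever step fails, both pin counts end at zero, which is the property the ladder is there to guarantee.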

* [PATCH 07/10] drm/amdgpu: implement gart recovery
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-08-02  8:00   ` [PATCH 06/10] drm/amdgpu: make recover_bo_from_shadow be generic Chunming Zhou
@ 2016-08-02  8:00   ` Chunming Zhou
  2016-08-02  8:00   ` [PATCH 08/10] drm/amdgpu: recover gart table first when full reset Chunming Zhou
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Chunming Zhou @ 2016-08-02  8:00 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chunming Zhou

Recover the GART bo from its shadow bo.

Change-Id: Idbb91d62b1c3cf73f7d90b5f2c650f2690e5a42b
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2985578d..3ee01fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -667,6 +667,7 @@ void amdgpu_gart_unbind(struct amdgpu_device *adev, unsigned offset,
 int amdgpu_gart_bind(struct amdgpu_device *adev, unsigned offset,
 		     int pages, struct page **pagelist,
 		     dma_addr_t *dma_addr, uint32_t flags);
+int amdgpu_gart_table_recover_from_shadow(struct amdgpu_device *adev);
 
 /*
  * GPU MC structures, functions & helpers
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index b306684..baeaee2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -135,6 +135,37 @@ int amdgpu_gart_table_vram_alloc(struct amdgpu_device *adev)
 	return 0;
 }
 
+int amdgpu_gart_table_recover_from_shadow(struct amdgpu_device *adev)
+{
+	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+	struct fence *fence;
+	int r;
+
+	if (!amdgpu_vm_need_backup(adev))
+		return 0;
+	/* bo and shadow use same resv, so reserve one time */
+	r = amdgpu_bo_reserve(adev->gart.robj, false);
+	if (unlikely(r != 0))
+		return r;
+
+	r = amdgpu_bo_recover_bo_from_shadow(adev, ring,
+					     &adev->gart.recover_entity,
+					     adev->gart.robj,
+					     NULL, &fence);
+	amdgpu_bo_unreserve(adev->gart.robj);
+	if (r) {
+		DRM_ERROR("recover page table failed!\n");
+		goto err;
+	}
+
+	if (fence)
+		r = fence_wait(fence, false);
+	fence_put(fence);
+
+err:
+	return r;
+}
+
 /**
  * amdgpu_gart_table_vram_pin - pin gart page table in vram
  *
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 08/10] drm/amdgpu: recover gart table first when full reset
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
                     ` (6 preceding siblings ...)
  2016-08-02  8:00   ` [PATCH 07/10] drm/amdgpu: implement gart recovery Chunming Zhou
@ 2016-08-02  8:00   ` Chunming Zhou
  2016-08-02  8:00   ` [PATCH 09/10] drm/amdgpu: sync gart table before initialization completed Chunming Zhou
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Chunming Zhou @ 2016-08-02  8:00 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chunming Zhou

Change-Id: Iad7a90646dbb5df930a8ba177ce6bdc48415ff7d
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b7b4cf8..16ba37d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2191,6 +2191,8 @@ retry:
 				kthread_unpark(ring->sched.thread);
 				unpark_bits |= 1 << ring->idx;
 			}
+			DRM_INFO("recover gart table first\n");
+			amdgpu_gart_table_recover_from_shadow(adev);
 
 			spin_lock(&adev->vm_list_lock);
 			list_for_each_entry_safe(vm, tmp, &adev->vm_list, list) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 09/10] drm/amdgpu: sync gart table before initialization completed
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
                     ` (7 preceding siblings ...)
  2016-08-02  8:00   ` [PATCH 08/10] drm/amdgpu: recover gart table first when full reset Chunming Zhou
@ 2016-08-02  8:00   ` Chunming Zhou
  2016-08-02  8:00   ` [PATCH 10/10] drm/amdgpu: fix memory leak of sched fence Chunming Zhou
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Chunming Zhou @ 2016-08-02  8:00 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chunming Zhou

Since the shadow is in GTT, the PTEs of the shadow itself are not in the shadow.
We need to do a sync before initialization is completed.

Change-Id: I29b433da6c71fc790a32ef202dd85a72ab6b5787
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 47 +++++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 3ee01fe..4cad4b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -648,6 +648,7 @@ struct amdgpu_gart {
 	bool				ready;
 	const struct amdgpu_gart_funcs *gart_funcs;
 	struct amd_sched_entity         recover_entity;
+	u64                             shadow_gpu_addr;
 };
 
 int amdgpu_gart_table_ram_alloc(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index baeaee2..e99c8a3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -421,10 +421,46 @@ int amdgpu_gart_table_vram_shadow_pin(struct amdgpu_device *adev)
 	r = amdgpu_bo_kmap(adev->gart.robj->shadow, &adev->gart.shadow_ptr);
 	if (r)
 		amdgpu_bo_unpin(adev->gart.robj->shadow);
+	adev->gart.shadow_gpu_addr = gpu_addr;
 	amdgpu_bo_unreserve(adev->gart.robj->shadow);
 	return r;
 }
 
+/* Since the shadow is in GTT, the PTEs of the shadow itself are not in the
+   shadow, so we need to do a sync before initialization is completed. */
+static int amdgpu_gart_table_shadow_sync(struct amdgpu_device *adev)
+{
+	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+	struct amd_sched_entity *entity = &adev->gart.recover_entity;
+	struct fence *fence;
+	u64 vram_addr = adev->gart.table_addr;
+	u64 shadow_addr = adev->gart.shadow_gpu_addr;
+	int r;
+
+	if (!adev->gart.ready) {
+		DRM_ERROR("cannot sync gart table for shadow.\n");
+		return -EINVAL;
+	}
+	if (!amdgpu_vm_need_backup(adev) || !adev->gart.robj ||
+	    !adev->gart.robj->shadow)
+		return 0;
+	r = amdgpu_bo_reserve(adev->gart.robj->shadow, false);
+	if (unlikely(r != 0))
+		return r;
+	/* if adev->gart.ready, means both gart bo and shadow bo are pinned */
+	r = amdgpu_copy_buffer(ring, entity, vram_addr,
+			       shadow_addr, amdgpu_bo_size(adev->gart.robj),
+			       adev->gart.robj->tbo.resv, &fence);
+	if (!r)
+		amdgpu_bo_fence(adev->gart.robj, fence, true);
+	amdgpu_bo_unreserve(adev->gart.robj->shadow);
+	if (r)
+		return r;	/* copy failed, fence was never set */
+	r = fence_wait(fence, true);
+	fence_put(fence);
+	return r;
+}
+
 void amdgpu_gart_table_vram_shadow_unpin(struct amdgpu_device *adev)
 {
 	int r;
@@ -445,10 +481,19 @@ int amdgpu_gart_late_init(struct amdgpu_device *adev)
 {
 	struct amd_sched_rq *rq;
 	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+	int r;
 
 	rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_RECOVER];
-	return amd_sched_entity_init(&ring->sched, &adev->gart.recover_entity,
+	r = amd_sched_entity_init(&ring->sched, &adev->gart.recover_entity,
 				     rq, amdgpu_sched_jobs);
+	if (r)
+		return r;
+	r = amdgpu_gart_table_shadow_sync(adev);
+	if (r) {
+		DRM_ERROR("sync gart table failed (%d).\n", r);
+		amd_sched_entity_fini(&ring->sched, &adev->gart.recover_entity);
+	}
+	return r;
 }
 
 void amdgpu_gart_late_fini(struct amdgpu_device *adev)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 10/10] drm/amdgpu: fix memory leak of sched fence
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
                     ` (8 preceding siblings ...)
  2016-08-02  8:00   ` [PATCH 09/10] drm/amdgpu: sync gart table before initialization completed Chunming Zhou
@ 2016-08-02  8:00   ` Chunming Zhou
  2016-08-02 15:15   ` [PATCH 00/10] GART table recovery Christian König
  2016-08-03 14:01   ` Christian König
  11 siblings, 0 replies; 18+ messages in thread
From: Chunming Zhou @ 2016-08-02  8:00 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Chunming Zhou

amdgpu_job_free_resources() is already called by amdgpu_job_submit().
If it is also called in amdgpu_job_run(), the sched fence could be referenced
twice in the SA BO free, and then a memory leak happens.

Change-Id: I833612e31cf22b62174f3f76546fd11c9ea38780
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 8d87a9a..d56247d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -183,7 +183,7 @@ static struct fence *amdgpu_job_run(struct amd_sched_job *sched_job)
 	/* if gpu reset, hw fence will be replaced here */
 	fence_put(job->fence);
 	job->fence = fence_get(fence);
-	amdgpu_job_free_resources(job);
+
 	return fence;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/10] GART table recovery
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
                     ` (9 preceding siblings ...)
  2016-08-02  8:00   ` [PATCH 10/10] drm/amdgpu: fix memory leak of sched fence Chunming Zhou
@ 2016-08-02 15:15   ` Christian König
       [not found]     ` <f1b6c786-7e9c-ff61-1de9-299bc4daed15-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  2016-08-03 14:01   ` Christian König
  11 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2016-08-02 15:15 UTC (permalink / raw)
  To: Chunming Zhou, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Well you have been hardworking during my vacation :)

Looks pretty good to me, but I hope I can take a closer look tomorrow.

Is there any particular order in which the three sets must be applied?

Regards,
Christian.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/10] GART table recovery
       [not found]     ` <f1b6c786-7e9c-ff61-1de9-299bc4daed15-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2016-08-03  1:33       ` zhoucm1
  0 siblings, 0 replies; 18+ messages in thread
From: zhoucm1 @ 2016-08-03  1:33 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 2016-08-02 23:15, Christian König wrote:
> Well you have been hardworking during my vacation :)
>
> Looks pretty good to me, but hope that I can get a closer look tomorrow.
>
> Is there any particular order the three sets must be applied?
They depend on my development order:
1. [PATCH 00/13] shadow page table support
2. [PATCH 00/11] add recovery entity and run queue
3. [PATCH 00/10] GART table recovery

Thanks,
David Zhou

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/10] GART table recovery
       [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
                     ` (10 preceding siblings ...)
  2016-08-02 15:15   ` [PATCH 00/10] GART table recovery Christian König
@ 2016-08-03 14:01   ` Christian König
       [not found]     ` <54bb3255-2dda-f6ad-3682-8e4396ec932a-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  11 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2016-08-03 14:01 UTC (permalink / raw)
  To: Chunming Zhou, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Well patch #10 is incorrect. The SA BO will be set to NULL by 
amdgpu_sa_bo_free(), so it can't be freed twice and so you can't 
reference the fence twice.
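
(A minimal user-space sketch of that pattern, not the driver code: the
free helper clears the caller's pointer, so a repeated call returns early
and takes no second fence reference. All demo_* names are hypothetical
stand-ins for the real fence and SA BO types.)

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the driver's real types. */
struct demo_fence {
	int refcount;
};

struct demo_sa_bo {
	struct demo_fence *fence;	/* fence guarding reuse of the memory */
};

/* Take one extra reference on a fence. */
struct demo_fence *demo_fence_get(struct demo_fence *f)
{
	assert(f && f->refcount > 0);
	f->refcount++;
	return f;
}

/*
 * Release a sub-allocation and remember the fence that protects it.
 * The caller's pointer is cleared, so a second call returns early and
 * cannot take a second fence reference.
 */
void demo_sa_bo_free(struct demo_sa_bo **sa_bo, struct demo_fence *f)
{
	if (!sa_bo || !*sa_bo)
		return;

	(*sa_bo)->fence = demo_fence_get(f);
	/* A real allocator would park the object until the fence signals
	 * and drop the reference then; this sketch frees it right away. */
	free(*sa_bo);
	*sa_bo = NULL;
}
```

In this sketch, calling demo_sa_bo_free() twice on the same pointer
leaves the fence with exactly one extra reference.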

In addition to that, the whole approach here of restoring the GART from 
the backup using the SDMA won't work either. For the SDMA to work, you 
need the GART to access the ring buffer.

So you run into a chicken-and-egg problem here: for the ring buffer to 
work you need the GART, and for the GART backup to work you need the ring 
buffer.

We should just restore the GART content from the housekeeping structure 
instead. I'm going to evaluate if and how that might be possible.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/10] GART table recovery
       [not found]     ` <54bb3255-2dda-f6ad-3682-8e4396ec932a-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2016-08-04  3:35       ` zhoucm1
       [not found]         ` <57A2B810.6050209-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: zhoucm1 @ 2016-08-04  3:35 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 2016-08-03 22:01, Christian König wrote:
> Well patch #10 is incorrect. The SA BO will be set to NULL by 
> amdgpu_sa_bo_free(), so it can't be freed twice and so you can't 
> reference the fence twice.
I see.
But amdgpu_job_free_resources() still shouldn't be called twice, right? 
That's an obvious duplication, although it seems to have no effect now. 
Is there any other reason?

>
> Additional to that the whole approach here of restoring the GART from 
> the backup using the SDMA won't work either. For the SDMA to work you 
> need the GART to access the ring buffer.
>
> So you run into a chicken and egg problem here, for the ring buffer to 
> work you need the GART and for the GART backup to work you need the 
> ring buffer.
Good catch, the ring buffer is a GTT buffer as well.

Then can we use memcpy to copy from GTT to VRAM? Fortunately, the GART 
table is only one BO.

Regards,
David Zhou

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/10] GART table recovery
       [not found]         ` <57A2B810.6050209-5C7GfCeVMHo@public.gmane.org>
@ 2016-08-04  9:58           ` Christian König
       [not found]             ` <077bb11d-957d-c6f2-2f87-248fbc19304a-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2016-08-04  9:58 UTC (permalink / raw)
  To: zhoucm1, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 04.08.2016 at 05:35, zhoucm1 wrote:
>
>
> On 2016-08-03 22:01, Christian König wrote:
>> Well patch #10 is incorrect. The SA BO will be set to NULL by 
>> amdgpu_sa_bo_free(), so it can't be freed twice and so you can't 
>> reference the fence twice.
> I see.
> But amdgpu_job_free_resources still shouldn't be called twice, right? 
> That's an obvious duplication although it seems no effect now. Is 
> there any other reason?

It's actually called from a couple of different locations:
1. From the CS path in amdgpu_cs.c as soon as we have a scheduler fence.
2. From the amdgpu_job_submit() path as soon as we have a scheduler fence.
3. From amdgpu_job_run() after submitting the job to the hardware ring.
4. From amdgpu_job_free(), this is for direct submissions or for freeing 
the job when something went wrong.

Thinking about it you could be right and we could probably drop the one 
in amdgpu_job_run(), because amdgpu_job_submit() should have already 
taken care of that. But I'm not 100% sure of that.

>
>>
>> Additional to that the whole approach here of restoring the GART from 
>> the backup using the SDMA won't work either. For the SDMA to work you 
>> need the GART to access the ring buffer.
>>
>> So you run into a chicken and egg problem here, for the ring buffer 
>> to work you need the GART and for the GART backup to work you need 
>> the ring buffer.
> Good catch, ring buffer is a GTT buffer as well.
>
> Then Can we use memcpy to copy GTT to VRAM? Fortunately, the GART bo 
> is only one bo.

Yeah, that is what we did with radeon as well. Unfortunately the double 
housekeeping costs quite a bunch of memory.

And actually we have exactly the same information in the TTM MM as well; 
we would just need to bind all BOs again.

Give me a day or two to double check that. It might be that the solution 
is rather simple.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/10] GART table recovery
       [not found]             ` <077bb11d-957d-c6f2-2f87-248fbc19304a-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2016-08-18  8:50               ` zhoucm1
       [not found]                 ` <57B576BB.4030400-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: zhoucm1 @ 2016-08-18  8:50 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 2016-08-04 17:58, Christian König wrote:
> On 04.08.2016 at 05:35, zhoucm1 wrote:
>>
>>
>> On 2016-08-03 22:01, Christian König wrote:
>>> Well patch #10 is incorrect. The SA BO will be set to NULL by 
>>> amdgpu_sa_bo_free(), so it can't be freed twice and so you can't 
>>> reference the fence twice.
>> I see.
>> But amdgpu_job_free_resources still shouldn't be called twice, right? 
>> That's an obvious duplication although it seems no effect now. Is 
>> there any other reason?
>
> It's actually called from a couple of different locations:
> 1. From the CS path in amdgpu_cs.c as soon as we have a scheduler fence.
> 2. From the amdgpu_job_submit() path as soon as we have a scheduler 
> fence.
> 3. From amdgpu_job_run() after submitting the job to the hardware ring.
> 4. From amdgpu_job_free(), this is for direct submissions or for 
> freeing the job when something went wrong.
>
> Thinking about it you could be right and we could probably drop the 
> one in amdgpu_job_run(), because amdgpu_job_submit() should have 
> already taken care of that. But I'm not 100% sure of that.
>
>>
>>>
>>> Additional to that the whole approach here of restoring the GART 
>>> from the backup using the SDMA won't work either. For the SDMA to 
>>> work you need the GART to access the ring buffer.
>>>
>>> So you run into a chicken and egg problem here, for the ring buffer 
>>> to work you need the GART and for the GART backup to work you need 
>>> the ring buffer.
>> Good catch, ring buffer is a GTT buffer as well.
>>
>> Then Can we use memcpy to copy GTT to VRAM? Fortunately, the GART bo 
>> is only one bo.
>
> Yeah that is what we did with radeon as well. Unfortunately the double 
> housekeeping costs quite a bunch of memory.
>
> And actually we have the exactly same information in the TTM MM as 
> well, we would just need to bind all BOs again.
>
> Give me a day or two to double check that. Might be that the solution 
> is rather simple.
Any update on this? Do you have a better idea for it? Or should we just 
change the ring buffer to a VRAM BO?

Regards,
David Zhou

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/10] GART table recovery
       [not found]                 ` <57B576BB.4030400-5C7GfCeVMHo@public.gmane.org>
@ 2016-08-18  9:03                   ` Christian König
  0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2016-08-18  9:03 UTC (permalink / raw)
  To: zhoucm1, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 18.08.2016 at 10:50, zhoucm1 wrote:
>
>
> On 2016-08-04 17:58, Christian König wrote:
>> On 04.08.2016 at 05:35, zhoucm1 wrote:
>>>
>>>
>>> On 2016-08-03 22:01, Christian König wrote:
>>>> Well patch #10 is incorrect. The SA BO will be set to NULL by 
>>>> amdgpu_sa_bo_free(), so it can't be freed twice and so you can't 
>>>> reference the fence twice.
>>> I see.
>>> But amdgpu_job_free_resources still shouldn't be called twice, 
>>> right? That's an obvious duplication although it seems no effect 
>>> now. Is there any other reason?
>>
>> It's actually called from a couple of different locations:
>> 1. From the CS path in amdgpu_cs.c as soon as we have a scheduler fence.
>> 2. From the amdgpu_job_submit() path as soon as we have a scheduler 
>> fence.
>> 3. From amdgpu_job_run() after submitting the job to the hardware ring.
>> 4. From amdgpu_job_free(), this is for direct submissions or for 
>> freeing the job when something went wrong.
>>
>> Thinking about it you could be right and we could probably drop the 
>> one in amdgpu_job_run(), because amdgpu_job_submit() should have 
>> already taken care of that. But I'm not 100% sure of that.
>>
>>>
>>>>
>>>> Additional to that the whole approach here of restoring the GART 
>>>> from the backup using the SDMA won't work either. For the SDMA to 
>>>> work you need the GART to access the ring buffer.
>>>>
>>>> So you run into a chicken and egg problem here, for the ring buffer 
>>>> to work you need the GART and for the GART backup to work you need 
>>>> the ring buffer.
>>> Good catch, ring buffer is a GTT buffer as well.
>>>
>>> Then Can we use memcpy to copy GTT to VRAM? Fortunately, the GART bo 
>>> is only one bo.
>>
>> Yeah that is what we did with radeon as well. Unfortunately the 
>> double housekeeping costs quite a bunch of memory.
>>
>> And actually we have the exactly same information in the TTM MM as 
>> well, we would just need to bind all BOs again.
>>
>> Give me a day or two to double check that. Might be that the solution 
>> is rather simple.
> How about this? Do you have any better idea for it?

Sorry for not answering earlier, but you know humans have only one head 
and two hands to type :)

The information needed to restore the GART is already stored in the 
amdgpu_ttm_tt structures.

We just need to link them together in amdgpu_ttm_backend_bind() and 
unlink them in amdgpu_ttm_backend_unbind() to be able to restore the 
GART table after a reset.
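
(A rough user-space sketch of that idea, under stated assumptions: every
bound object is linked into a list at bind time and unlinked at unbind
time, so the table can be rewritten by the CPU after a reset without any
ring or SDMA. The demo_* types and functions are hypothetical and only
model the bookkeeping; they are not the amdgpu_ttm_tt layout or the real
bind/unbind callbacks.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_GART_ENTRIES 16

/* Hypothetical per-BO housekeeping kept on the CPU side. */
struct demo_tt {
	struct demo_tt *next;		/* link in the list of bound BOs */
	size_t offset;			/* first GART entry used by this BO */
	size_t num_pages;
	const uint64_t *dma_addrs;	/* one bus address per page */
};

struct demo_gart {
	uint64_t table[DEMO_GART_ENTRIES];	/* stands in for the VRAM table */
	struct demo_tt *bound;			/* head of the bound list */
};

/* Write the entries of one BO into the table. */
static void demo_gart_write(struct demo_gart *gart, const struct demo_tt *tt)
{
	assert(tt->offset + tt->num_pages <= DEMO_GART_ENTRIES);
	for (size_t i = 0; i < tt->num_pages; i++)
		gart->table[tt->offset + i] = tt->dma_addrs[i];
}

/* Bind: fill the table and link the BO so it can be replayed later. */
int demo_gart_bind(struct demo_gart *gart, struct demo_tt *tt)
{
	if (tt->offset > DEMO_GART_ENTRIES ||
	    tt->num_pages > DEMO_GART_ENTRIES - tt->offset)
		return -1;

	demo_gart_write(gart, tt);
	tt->next = gart->bound;
	gart->bound = tt;
	return 0;
}

/* Unbind: clear the entries and unlink the BO, if it is bound at all. */
void demo_gart_unbind(struct demo_gart *gart, struct demo_tt *tt)
{
	for (struct demo_tt **p = &gart->bound; *p; p = &(*p)->next) {
		if (*p != tt)
			continue;
		for (size_t i = 0; i < tt->num_pages; i++)
			gart->table[tt->offset + i] = 0;
		*p = tt->next;
		tt->next = NULL;
		return;
	}
}

/* After a reset: rebuild the whole table from the bound list, CPU only. */
void demo_gart_recover(struct demo_gart *gart)
{
	memset(gart->table, 0, sizeof(gart->table));
	for (const struct demo_tt *tt = gart->bound; tt; tt = tt->next)
		demo_gart_write(gart, tt);
}
```

The point of the sketch is that recovery only walks CPU-side bookkeeping,
so it does not depend on a working ring buffer.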

> or just change ring buffer to VRAM bo?

Interesting idea, but considering how much overhead writing to VRAM from 
the CPU has, it clearly wouldn't be a good idea to do that in general.

Regards,
Christian.

>
> Regards,
> David Zhou
>>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>> David Zhou
>>>
>>>>
>>>> We should just restore the GART content from the housekeeping 
>>>> structure instead. Going to evaluate if and how that might be 
>>>> possible.
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> Am 02.08.2016 um 10:00 schrieb Chunming Zhou:
>>>>> gart table is stored in one bo which must be ready before gart 
>>>>> init, but the shadow bo must be created after gart is ready, so 
>>>>> they cannot be created at a same time. shado bo itself aslo is 
>>>>> included in gart table, So shadow bo needs a synchronization after 
>>>>> device init. After sync, the contents of bo and shadwo bo will be 
>>>>> same, and be updated at a same time. Then we will be able to 
>>>>> recover gart table from shadow bo when gpu full reset.
>>>>>
>>>>> patch10 is a fix for memory leak.
>>>>>
>>>>> Chunming Zhou (10):
>>>>>    drm/amdgpu: make need_backup generic
>>>>>    drm/amdgpu: implement gart late_init/fini
>>>>>    drm/amdgpu: add gart_late_init/fini to gmc V7/8
>>>>>    drm/amdgpu: abstract amdgpu_bo_create_shadow
>>>>>    drm/amdgpu: shadow gart table support
>>>>>    drm/amdgpu: make recover_bo_from_shadow be generic
>>>>>    drm/amdgpu: implement gart recovery
>>>>>    drm/amdgpu: recover gart table first when full reset
>>>>>    drm/amdgpu: sync gart table before initialization completed
>>>>>    drm/amdgpu: fix memory leak of sched fence
>>>>>
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   9 ++
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 +
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 139 +++++++++++++++++++++++++++++
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |   2 +-
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  80 ++++++++++++++---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   9 ++
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c     |  50 ++---------
>>>>>   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c      |  39 +++++++-
>>>>>   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c      |  40 ++++++++-
>>>>>   9 files changed, 304 insertions(+), 66 deletions(-)
>>>>>


end of thread, other threads:[~2016-08-18  9:03 UTC | newest]

Thread overview: 18+ messages
2016-08-02  8:00 [PATCH 00/10] GART table recovery Chunming Zhou
     [not found] ` <1470124840-26170-1-git-send-email-David1.Zhou-5C7GfCeVMHo@public.gmane.org>
2016-08-02  8:00   ` [PATCH 01/10] drm/amdgpu: make need_backup generic Chunming Zhou
2016-08-02  8:00   ` [PATCH 02/10] drm/amdgpu: implement gart late_init/fini Chunming Zhou
2016-08-02  8:00   ` [PATCH 03/10] drm/amdgpu: add gart_late_init/fini to gmc V7/8 Chunming Zhou
2016-08-02  8:00   ` [PATCH 04/10] drm/amdgpu: abstract amdgpu_bo_create_shadow Chunming Zhou
2016-08-02  8:00   ` [PATCH 05/10] drm/amdgpu: shadow gart table support Chunming Zhou
2016-08-02  8:00   ` [PATCH 06/10] drm/amdgpu: make recover_bo_from_shadow be generic Chunming Zhou
2016-08-02  8:00   ` [PATCH 07/10] drm/amdgpu: implement gart recovery Chunming Zhou
2016-08-02  8:00   ` [PATCH 08/10] drm/amdgpu: recover gart table first when full reset Chunming Zhou
2016-08-02  8:00   ` [PATCH 09/10] drm/amdgpu: sync gart table before initialization completed Chunming Zhou
2016-08-02  8:00   ` [PATCH 10/10] drm/amdgpu: fix memory leak of sched fence Chunming Zhou
2016-08-02 15:15   ` [PATCH 00/10] GART table recovery Christian König
     [not found]     ` <f1b6c786-7e9c-ff61-1de9-299bc4daed15-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2016-08-03  1:33       ` zhoucm1
2016-08-03 14:01   ` Christian König
     [not found]     ` <54bb3255-2dda-f6ad-3682-8e4396ec932a-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2016-08-04  3:35       ` zhoucm1
     [not found]         ` <57A2B810.6050209-5C7GfCeVMHo@public.gmane.org>
2016-08-04  9:58           ` Christian König
     [not found]             ` <077bb11d-957d-c6f2-2f87-248fbc19304a-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2016-08-18  8:50               ` zhoucm1
     [not found]                 ` <57B576BB.4030400-5C7GfCeVMHo@public.gmane.org>
2016-08-18  9:03                   ` Christian König
