* [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu
From: Andrey Grodzovsky @ 2021-12-17 22:27 UTC
  To: dri-devel, amd-gfx; +Cc: Monk.Liu, horace.chen, christian.koenig

This patchset is based on earlier work by Boris [1] that introduced an
ordered workqueue at the driver level, used by the different schedulers
to queue their timeout work. On top of that, any GPU reset we trigger
from within amdgpu code is now serialized through the same ordered wq.
This somewhat simplifies our GPU reset code, since we no longer need to
protect against concurrency between multiple GPU reset triggers such as
a TDR on one hand and a sysfs or RAS trigger on the other.

As advised by Christian and Daniel, I defined a reset_domain struct such
that all the entities that go through reset together are serialized
against one another.

A TDR triggered by multiple entities within the same domain for the same
reason will only run once, as the first such reset cancels all the
pending ones. This applies only to TDR timers and not to resets
triggered from RAS or sysfs; those will still run after the in-flight
resets finish.
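
For reference, the guarantee this all leans on is simply that of an
ordered workqueue: at most one work item executes at a time, in queueing
order. A minimal sketch of the idea (illustrative names only, not the
actual amdgpu code):

	/* One ordered wq per reset domain - every reset source funnels into it. */
	struct workqueue_struct *reset_wq =
		alloc_ordered_workqueue("reset-domain", 0);

	/* TDR, sysfs and RAS triggers all queue onto the same wq ... */
	queue_work(reset_wq, &tdr_work);
	queue_work(reset_wq, &ras_work);

	/*
	 * ... so no two handlers ever run concurrently, and the reset path
	 * itself needs no extra locking against other resets.
	 */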

[1] https://patchwork.kernel.org/project/dri-devel/patch/20210629073510.2764391-3-boris.brezillon@collabora.com/

P.S. This series goes through drm-misc-next rather than
amd-staging-drm-next, as Boris' work hasn't landed in the latter yet.

Andrey Grodzovsky (6):
  drm/amdgpu: Init GPU reset single threaded wq
  drm/amdgpu: Move scheduler init to after XGMI is ready
  drm/amdgpu: Fix crash on modprobe
  drm/amdgpu: Serialize non TDR gpu recovery with TDRs
  drm/amdgpu: Drop hive->in_reset
  drm/amdgpu: Drop concurrent GPU reset protection for device

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   9 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 206 +++++++++++----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  36 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |   3 +-
 7 files changed, 132 insertions(+), 136 deletions(-)

-- 
2.25.1


* [RFC 1/6] drm/amdgpu: Init GPU reset single threaded wq
From: Andrey Grodzovsky @ 2021-12-17 22:27 UTC
  To: dri-devel, amd-gfx; +Cc: Monk.Liu, horace.chen, christian.koenig

Initialize the GPU reset single threaded wq for both the single
device and the XGMI hive case.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  7 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +++++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  9 +++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  2 ++
 4 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9f017663ac50..b5ff76aae7e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -812,6 +812,11 @@ struct amd_powerplay {
 
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4
+
+struct amdgpu_reset_domain {
+	struct workqueue_struct *wq;
+};
+
 struct amdgpu_device {
 	struct device			*dev;
 	struct pci_dev			*pdev;
@@ -1096,6 +1101,8 @@ struct amdgpu_device {
 
 	struct amdgpu_reset_control     *reset_cntl;
 	uint32_t                        ip_versions[HW_ID_MAX][HWIP_MAX_INSTANCE];
+
+	struct amdgpu_reset_domain	reset_domain;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5625f7736e37..5f13195d23d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2391,9 +2391,27 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (r)
 		goto init_failed;
 
-	if (adev->gmc.xgmi.num_physical_nodes > 1)
+	if (adev->gmc.xgmi.num_physical_nodes > 1) {
+		struct amdgpu_hive_info *hive;
+
 		amdgpu_xgmi_add_device(adev);
 
+		hive = amdgpu_get_xgmi_hive(adev);
+		if (!hive || !hive->reset_domain.wq) {
+			DRM_ERROR("Failed to obtain reset domain info for XGMI hive:%llx", hive->hive_id);
+			r = -EINVAL;
+			goto init_failed;
+		}
+
+		adev->reset_domain.wq = hive->reset_domain.wq;
+	} else {
+		adev->reset_domain.wq = alloc_ordered_workqueue("amdgpu-reset-dev", 0);
+		if (!adev->reset_domain.wq) {
+			r = -ENOMEM;
+			goto init_failed;
+		}
+	}
+
 	/* Don't init kfd if whole hive need to be reset during init */
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 0fad2bf854ae..8b116f398101 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -391,6 +391,14 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
 		goto pro_end;
 	}
 
+	hive->reset_domain.wq = alloc_ordered_workqueue("amdgpu-reset-hive", 0);
+	if (!hive->reset_domain.wq) {
+		dev_err(adev->dev, "XGMI: failed allocating wq for reset domain!\n");
+		kfree(hive);
+		hive = NULL;
+		goto pro_end;
+	}
+
 	hive->hive_id = adev->gmc.xgmi.hive_id;
 	INIT_LIST_HEAD(&hive->device_list);
 	INIT_LIST_HEAD(&hive->node);
@@ -400,6 +408,7 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
 	task_barrier_init(&hive->tb);
 	hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN;
 	hive->hi_req_gpu = NULL;
+
 	/*
 	 * hive pstate on boot is high in vega20 so we have to go to low
 	 * pstate on after boot.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index d2189bf7d428..6121aaa292cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -42,6 +42,8 @@ struct amdgpu_hive_info {
 		AMDGPU_XGMI_PSTATE_MAX_VEGA20,
 		AMDGPU_XGMI_PSTATE_UNKNOWN
 	} pstate;
+
+	struct amdgpu_reset_domain reset_domain;
 };
 
 struct amdgpu_pcs_ras_field {
-- 
2.25.1


* [RFC 2/6] drm/amdgpu: Move scheduler init to after XGMI is ready
From: Andrey Grodzovsky @ 2021-12-17 22:27 UTC
  To: dri-devel, amd-gfx; +Cc: Monk.Liu, horace.chen, christian.koenig

Before we initialize the schedulers we must know which reset
domain we are in - for a single device there is a single
domain per device and so a single wq per device. For XGMI
the reset domain spans the entire XGMI hive and so the
reset wq is per hive.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 34 ++--------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  2 +
 3 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5f13195d23d1..b595e6d699b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2284,6 +2284,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev)
 	return r;
 }
 
+static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
+{
+	long timeout;
+	int r, i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		/* No need to setup the GPU scheduler for rings that don't need it */
+		if (!ring || ring->no_scheduler)
+			continue;
+
+		switch (ring->funcs->type) {
+		case AMDGPU_RING_TYPE_GFX:
+			timeout = adev->gfx_timeout;
+			break;
+		case AMDGPU_RING_TYPE_COMPUTE:
+			timeout = adev->compute_timeout;
+			break;
+		case AMDGPU_RING_TYPE_SDMA:
+			timeout = adev->sdma_timeout;
+			break;
+		default:
+			timeout = adev->video_timeout;
+			break;
+		}
+
+		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
+				   ring->num_hw_submission, amdgpu_job_hang_limit,
+				   timeout, adev->reset_domain.wq, ring->sched_score, ring->name);
+		if (r) {
+			DRM_ERROR("Failed to create scheduler on ring %s.\n",
+				  ring->name);
+			return r;
+		}
+	}
+
+	return 0;
+}
+
+
 /**
  * amdgpu_device_ip_init - run init for hardware IPs
  *
@@ -2412,6 +2453,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 		}
 	}
 
+	r = amdgpu_device_init_schedulers(adev);
+	if (r)
+		goto init_failed;
+
 	/* Don't init kfd if whole hive need to be reset during init */
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 3b7e86ea7167..5527c68c51de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -456,8 +456,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
 				  atomic_t *sched_score)
 {
 	struct amdgpu_device *adev = ring->adev;
-	long timeout;
-	int r;
 
 	if (!adev)
 		return -EINVAL;
@@ -477,36 +475,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
 	spin_lock_init(&ring->fence_drv.lock);
 	ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *),
 					 GFP_KERNEL);
-	if (!ring->fence_drv.fences)
-		return -ENOMEM;
 
-	/* No need to setup the GPU scheduler for rings that don't need it */
-	if (ring->no_scheduler)
-		return 0;
+	ring->num_hw_submission = num_hw_submission;
+	ring->sched_score = sched_score;
 
-	switch (ring->funcs->type) {
-	case AMDGPU_RING_TYPE_GFX:
-		timeout = adev->gfx_timeout;
-		break;
-	case AMDGPU_RING_TYPE_COMPUTE:
-		timeout = adev->compute_timeout;
-		break;
-	case AMDGPU_RING_TYPE_SDMA:
-		timeout = adev->sdma_timeout;
-		break;
-	default:
-		timeout = adev->video_timeout;
-		break;
-	}
-
-	r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
-			   num_hw_submission, amdgpu_job_hang_limit,
-			   timeout, NULL, sched_score, ring->name);
-	if (r) {
-		DRM_ERROR("Failed to create scheduler on ring %s.\n",
-			  ring->name);
-		return r;
-	}
+	if (!ring->fence_drv.fences)
+		return -ENOMEM;
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 4d380e79752c..a4b8279e3011 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -253,6 +253,8 @@ struct amdgpu_ring {
 	bool			has_compute_vm_bug;
 	bool			no_scheduler;
 	int			hw_prio;
+	unsigned 		num_hw_submission;
+	atomic_t		*sched_score;
 };
 
 #define amdgpu_ring_parse_cs(r, p, ib) ((r)->funcs->parse_cs((p), (ib)))
-- 
2.25.1
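
The effect here is that drm_sched_init() now receives the reset domain's
ordered wq as its timeout workqueue (instead of NULL, which makes the
scheduler fall back to the system wq), so the timeout/TDR work of every
scheduler in the domain is serialized. A sketch of the resulting setup
for two rings in one domain (names are illustrative):

	/* Both schedulers hand their TDR work to the same ordered wq ... */
	drm_sched_init(&ring_a->sched, &amdgpu_sched_ops,
		       ring_a->num_hw_submission, amdgpu_job_hang_limit,
		       timeout_a, domain_wq, ring_a->sched_score, ring_a->name);
	drm_sched_init(&ring_b->sched, &amdgpu_sched_ops,
		       ring_b->num_hw_submission, amdgpu_job_hang_limit,
		       timeout_b, domain_wq, ring_b->sched_score, ring_b->name);

	/*
	 * ... so if both rings hang at once, their timedout_job callbacks
	 * run back to back, never concurrently.
	 */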


* [RFC 3/6] drm/amdgpu: Fix crash on modprobe
From: Andrey Grodzovsky @ 2021-12-17 22:27 UTC
  To: dri-devel, amd-gfx; +Cc: Monk.Liu, horace.chen, christian.koenig

Restrict job resubmission to the suspend case
only, since the schedulers are not initialized
yet at probe time.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 5527c68c51de..8ebd954e06c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev)
 		if (!ring || !ring->fence_drv.initialized)
 			continue;
 
-		if (!ring->no_scheduler) {
+		if (adev->in_suspend && !ring->no_scheduler) {
 			drm_sched_resubmit_jobs(&ring->sched);
 			drm_sched_start(&ring->sched, true);
 		}
-- 
2.25.1


* [RFC 4/6] drm/amdgpu: Serialize non TDR gpu recovery with TDRs
From: Andrey Grodzovsky @ 2021-12-17 22:27 UTC
  To: dri-devel, amd-gfx; +Cc: Monk.Liu, horace.chen, christian.koenig

Use the reset domain wq also for non-TDR GPU recovery triggers
such as sysfs and RAS. We must serialize all possible
GPU recoveries to guarantee there is no concurrency there.
For TDR, call the original recovery function directly since
it already executes from within the wq. For the others, use
a wrapper to queue the work and wait for it to finish.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +++++++++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +-
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b5ff76aae7e0..8e96b9a14452 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev);
 bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev);
 int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 			      struct amdgpu_job* job);
+int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
+			      struct amdgpu_job *job);
 void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
 int amdgpu_device_pci_reset(struct amdgpu_device *adev);
 bool amdgpu_device_need_post(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b595e6d699b5..55cd67b9ede2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs(
  * Returns 0 for success or an error on failure.
  */
 
-int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
+int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 			      struct amdgpu_job *job)
 {
 	struct list_head device_list, *device_list_handle =  NULL;
@@ -5236,6 +5236,37 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	return r;
 }
 
+struct recover_work_struct {
+	struct work_struct base;
+	struct amdgpu_device *adev;
+	struct amdgpu_job *job;
+	int ret;
+};
+
+static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work)
+{
+	struct recover_work_struct *recover_work = container_of(work, struct recover_work_struct, base);
+
+	recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job);
+}
+/*
+ * Serialize gpu recover into reset domain single threaded wq
+ */
+int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
+				    struct amdgpu_job *job)
+{
+	struct recover_work_struct work = {.adev = adev, .job = job};
+
+	INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
+
+	if (!queue_work(adev->reset_domain.wq, &work.base))
+		return -EAGAIN;
+
+	flush_work(&work.base);
+
+	return work.ret;
+}
+
 /**
  * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index bfc47bea23db..38c9fd7b7ad4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 		  ti.process_name, ti.tgid, ti.task_name, ti.pid);
 
 	if (amdgpu_device_should_recover_gpu(ring->adev)) {
-		amdgpu_device_gpu_recover(ring->adev, job);
+		amdgpu_device_gpu_recover_imp(ring->adev, job);
 	} else {
 		drm_sched_suspend_timeout(&ring->sched);
 		if (amdgpu_sriov_vf(adev))
-- 
2.25.1
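
The wrapper added here is the standard kernel pattern for running
something synchronously on another context: an on-stack work item,
container_of() in the handler to recover the enclosing struct, and
flush_work() to wait for completion. A generic sketch of the same
pattern (illustrative names and payload, not amdgpu code):

	struct sync_work {
		struct work_struct base;
		int ret;			/* result handed back to the caller */
	};

	static void sync_work_func(struct work_struct *work)
	{
		struct sync_work *s = container_of(work, struct sync_work, base);

		s->ret = do_the_real_work();	/* placeholder payload */
	}

	static int run_on_wq(struct workqueue_struct *wq)
	{
		struct sync_work work = { .ret = 0 };

		INIT_WORK(&work.base, sync_work_func);
		if (!queue_work(wq, &work.base))
			return -EAGAIN;		/* item was already queued */
		flush_work(&work.base);		/* safe: the item lives on our stack */
		return work.ret;
	}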


* [RFC 5/6] drm/amdgpu: Drop hive->in_reset
From: Andrey Grodzovsky @ 2021-12-17 22:27 UTC
  To: dri-devel, amd-gfx; +Cc: Monk.Liu, horace.chen, christian.koenig

Since we serialize all resets, there is no need to protect against
concurrent resets.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 -
 3 files changed, 1 insertion(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 55cd67b9ede2..d2701e4d0622 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5013,25 +5013,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 	dev_info(adev->dev, "GPU %s begin!\n",
 		need_emergency_restart ? "jobs stop":"reset");
 
-	/*
-	 * Here we trylock to avoid chain of resets executing from
-	 * either trigger by jobs on different adevs in XGMI hive or jobs on
-	 * different schedulers for same device while this TO handler is running.
-	 * We always reset all schedulers for device and all devices for XGMI
-	 * hive so that should take care of them too.
-	 */
 	hive = amdgpu_get_xgmi_hive(adev);
-	if (hive) {
-		if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
-			DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
-				job ? job->base.id : -1, hive->hive_id);
-			amdgpu_put_xgmi_hive(hive);
-			if (job && job->vm)
-				drm_sched_increase_karma(&job->base);
-			return 0;
-		}
+	if (hive)
 		mutex_lock(&hive->hive_lock);
-	}
 
 	reset_context.method = AMD_RESET_METHOD_NONE;
 	reset_context.reset_req_dev = adev;
@@ -5226,7 +5210,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 
 skip_recovery:
 	if (hive) {
-		atomic_set(&hive->in_reset, 0);
 		mutex_unlock(&hive->hive_lock);
 		amdgpu_put_xgmi_hive(hive);
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 8b116f398101..0d54bef5c494 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -403,7 +403,6 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
 	INIT_LIST_HEAD(&hive->device_list);
 	INIT_LIST_HEAD(&hive->node);
 	mutex_init(&hive->hive_lock);
-	atomic_set(&hive->in_reset, 0);
 	atomic_set(&hive->number_devices, 0);
 	task_barrier_init(&hive->tb);
 	hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index 6121aaa292cb..2f2ce53645a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -33,7 +33,6 @@ struct amdgpu_hive_info {
 	struct list_head node;
 	atomic_t number_devices;
 	struct mutex hive_lock;
-	atomic_t in_reset;
 	int hi_req_count;
 	struct amdgpu_device *hi_req_gpu;
 	struct task_barrier tb;
-- 
2.25.1


* [RFC 6/6] drm/amdgpu: Drop concurrent GPU reset protection for device
From: Andrey Grodzovsky @ 2021-12-17 22:27 UTC
  To: dri-devel, amd-gfx; +Cc: Monk.Liu, horace.chen, christian.koenig

Since all GPU resets are now serialized, there is no need for this.

This patch also reverts 'drm/amdgpu: race issue when jobs on 2 ring timeout'

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 89 ++--------------------
 1 file changed, 7 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d2701e4d0622..311e0b9e1e4f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4763,11 +4763,10 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 	return r;
 }
 
-static bool amdgpu_device_lock_adev(struct amdgpu_device *adev,
+static void amdgpu_device_lock_adev(struct amdgpu_device *adev,
 				struct amdgpu_hive_info *hive)
 {
-	if (atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) != 0)
-		return false;
+	atomic_set(&adev->in_gpu_reset, 1);
 
 	if (hive) {
 		down_write_nest_lock(&adev->reset_sem, &hive->hive_lock);
@@ -4786,8 +4785,6 @@ static bool amdgpu_device_lock_adev(struct amdgpu_device *adev,
 		adev->mp1_state = PP_MP1_STATE_NONE;
 		break;
 	}
-
-	return true;
 }
 
 static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
@@ -4798,46 +4795,6 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
 	up_write(&adev->reset_sem);
 }
 
-/*
- * to lockup a list of amdgpu devices in a hive safely, if not a hive
- * with multiple nodes, it will be similar as amdgpu_device_lock_adev.
- *
- * unlock won't require roll back.
- */
-static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgpu_hive_info *hive)
-{
-	struct amdgpu_device *tmp_adev = NULL;
-
-	if (adev->gmc.xgmi.num_physical_nodes > 1) {
-		if (!hive) {
-			dev_err(adev->dev, "Hive is NULL while device has multiple xgmi nodes");
-			return -ENODEV;
-		}
-		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
-			if (!amdgpu_device_lock_adev(tmp_adev, hive))
-				goto roll_back;
-		}
-	} else if (!amdgpu_device_lock_adev(adev, hive))
-		return -EAGAIN;
-
-	return 0;
-roll_back:
-	if (!list_is_first(&tmp_adev->gmc.xgmi.head, &hive->device_list)) {
-		/*
-		 * if the lockup iteration break in the middle of a hive,
-		 * it may means there may has a race issue,
-		 * or a hive device locked up independently.
-		 * we may be in trouble and may not, so will try to roll back
-		 * the lock and give out a warnning.
-		 */
-		dev_warn(tmp_adev->dev, "Hive lock iteration broke in the middle. Rolling back to unlock");
-		list_for_each_entry_continue_reverse(tmp_adev, &hive->device_list, gmc.xgmi.head) {
-			amdgpu_device_unlock_adev(tmp_adev);
-		}
-	}
-	return -EAGAIN;
-}
-
 static void amdgpu_device_resume_display_audio(struct amdgpu_device *adev)
 {
 	struct pci_dev *p = NULL;
@@ -5023,22 +4980,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 	reset_context.hive = hive;
 	clear_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
 
-	/*
-	 * lock the device before we try to operate the linked list
-	 * if didn't get the device lock, don't touch the linked list since
-	 * others may iterating it.
-	 */
-	r = amdgpu_device_lock_hive_adev(adev, hive);
-	if (r) {
-		dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress",
-					job ? job->base.id : -1);
-
-		/* even we skipped this reset, still need to set the job to guilty */
-		if (job && job->vm)
-			drm_sched_increase_karma(&job->base);
-		goto skip_recovery;
-	}
-
 	/*
 	 * Build list of devices to reset.
 	 * In case we are in XGMI hive mode, resort the device list
@@ -5058,6 +4999,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 
 	/* block all schedulers and reset given job's ring */
 	list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
+
+		amdgpu_device_lock_adev(tmp_adev, hive);
+
 		/*
 		 * Try to put the audio codec into suspend state
 		 * before gpu reset started.
@@ -5208,13 +5152,12 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 		amdgpu_device_unlock_adev(tmp_adev);
 	}
 
-skip_recovery:
 	if (hive) {
 		mutex_unlock(&hive->hive_lock);
 		amdgpu_put_xgmi_hive(hive);
 	}
 
-	if (r && r != -EAGAIN)
+	if (r)
 		dev_info(adev->dev, "GPU reset end with ret = %d\n", r);
 	return r;
 }
@@ -5437,20 +5380,6 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
 	return 0;
 }
 
-static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
-{
-	int i;
-
-	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
-		struct amdgpu_ring *ring = adev->rings[i];
-
-		if (!ring || !ring->sched.thread)
-			continue;
-
-		cancel_delayed_work_sync(&ring->sched.work_tdr);
-	}
-}
-
 /**
  * amdgpu_pci_error_detected - Called when a PCI error is detected.
  * @pdev: PCI device struct
@@ -5481,14 +5410,10 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
 	/* Fatal error, prepare for slot reset */
 	case pci_channel_io_frozen:
 		/*
-		 * Cancel and wait for all TDRs in progress if failing to
-		 * set  adev->in_gpu_reset in amdgpu_device_lock_adev
-		 *
 		 * Locking adev->reset_sem will prevent any external access
 		 * to GPU during PCI error recovery
 		 */
-		while (!amdgpu_device_lock_adev(adev, NULL))
-			amdgpu_cancel_all_tdr(adev);
+		amdgpu_device_lock_adev(adev, NULL);
 
 		/*
 		 * Block any work scheduling as we do for regular GPU reset
-- 
2.25.1


* Re: [RFC 2/6] drm/amdgpu: Move scheduler init to after XGMI is ready
From: Christian König @ 2021-12-20  7:16 UTC
  To: Andrey Grodzovsky, dri-devel, amd-gfx
  Cc: horace.chen, christian.koenig, Monk.Liu



On 17.12.21 at 23:27, Andrey Grodzovsky wrote:
> Before we initialize the schedulers we must know which reset
> domain we are in - for a single device there is a single
> domain per device and so a single wq per device. For XGMI
> the reset domain spans the entire XGMI hive and so the
> reset wq is per hive.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 ++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 34 ++--------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  2 +
>   3 files changed, 51 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5f13195d23d1..b595e6d699b5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2284,6 +2284,47 @@ static int amdgpu_device_fw_loading(struct amdgpu_device *adev)
>   	return r;
>   }
>   
> +static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
> +{
> +	long timeout;
> +	int r, i;
> +
> +	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> +		struct amdgpu_ring *ring = adev->rings[i];
> +
> +		/* No need to setup the GPU scheduler for rings that don't need it */
> +		if (!ring || ring->no_scheduler)
> +			continue;
> +
> +		switch (ring->funcs->type) {
> +		case AMDGPU_RING_TYPE_GFX:
> +			timeout = adev->gfx_timeout;
> +			break;
> +		case AMDGPU_RING_TYPE_COMPUTE:
> +			timeout = adev->compute_timeout;
> +			break;
> +		case AMDGPU_RING_TYPE_SDMA:
> +			timeout = adev->sdma_timeout;
> +			break;
> +		default:
> +			timeout = adev->video_timeout;
> +			break;
> +		}
> +


> +		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
> +				   ring->num_hw_submission, amdgpu_job_hang_limit,
> +				   timeout, adev->reset_domain.wq, ring->sched_score, ring->name);
> +		if (r) {
> +			DRM_ERROR("Failed to create scheduler on ring %s.\n",
> +				  ring->name);
> +			return r;
> +		}

Maybe better to put that into amdgpu_ring.c. But not really a hard 
requirement, more a gut feeling.

> +	}
> +
> +	return 0;
> +}
> +
> +
>   /**
>    * amdgpu_device_ip_init - run init for hardware IPs
>    *
> @@ -2412,6 +2453,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>   		}
>   	}
>   
> +	r = amdgpu_device_init_schedulers(adev);
> +	if (r)
> +		goto init_failed;
> +
>   	/* Don't init kfd if whole hive need to be reset during init */
>   	if (!adev->gmc.xgmi.pending_reset)
>   		amdgpu_amdkfd_device_init(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 3b7e86ea7167..5527c68c51de 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -456,8 +456,6 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>   				  atomic_t *sched_score)
>   {
>   	struct amdgpu_device *adev = ring->adev;
> -	long timeout;
> -	int r;
>   
>   	if (!adev)
>   		return -EINVAL;
> @@ -477,36 +475,12 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>   	spin_lock_init(&ring->fence_drv.lock);
>   	ring->fence_drv.fences = kcalloc(num_hw_submission * 2, sizeof(void *),
>   					 GFP_KERNEL);
> -	if (!ring->fence_drv.fences)
> -		return -ENOMEM;
>   
> -	/* No need to setup the GPU scheduler for rings that don't need it */
> -	if (ring->no_scheduler)
> -		return 0;
> +	ring->num_hw_submission = num_hw_submission;
> +	ring->sched_score = sched_score;

Probably better to set that in the caller and drop the parameters from 
the amdgpu_fence_driver_init_ring() function completely.

Christian.

>   
> -	switch (ring->funcs->type) {
> -	case AMDGPU_RING_TYPE_GFX:
> -		timeout = adev->gfx_timeout;
> -		break;
> -	case AMDGPU_RING_TYPE_COMPUTE:
> -		timeout = adev->compute_timeout;
> -		break;
> -	case AMDGPU_RING_TYPE_SDMA:
> -		timeout = adev->sdma_timeout;
> -		break;
> -	default:
> -		timeout = adev->video_timeout;
> -		break;
> -	}
> -
> -	r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
> -			   num_hw_submission, amdgpu_job_hang_limit,
> -			   timeout, NULL, sched_score, ring->name);
> -	if (r) {
> -		DRM_ERROR("Failed to create scheduler on ring %s.\n",
> -			  ring->name);
> -		return r;
> -	}
> +	if (!ring->fence_drv.fences)
> +		return -ENOMEM;
>   
>   	return 0;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 4d380e79752c..a4b8279e3011 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -253,6 +253,8 @@ struct amdgpu_ring {
>   	bool			has_compute_vm_bug;
>   	bool			no_scheduler;
>   	int			hw_prio;
> +	unsigned 		num_hw_submission;
> +	atomic_t		*sched_score;
>   };
>   
>   #define amdgpu_ring_parse_cs(r, p, ib) ((r)->funcs->parse_cs((p), (ib)))
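
(For reference, a rough sketch of the refactor Christian suggests above,
untested and with the call site assumed to be amdgpu_ring_init(): the
caller stores the scheduler parameters on the ring and the fence function
takes only the ring.)

	/* in the caller, before initializing the fence driver ... */
	ring->num_hw_submission = num_hw_submission;
	ring->sched_score = sched_score;
	r = amdgpu_fence_driver_init_ring(ring);

	/* ... with the two parameters dropped from the prototype */
	int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring);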


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 3/6] drm/amdgpu: Fix crash on modprobe
  2021-12-17 22:27   ` Andrey Grodzovsky
@ 2021-12-20  7:17     ` Christian König
  -1 siblings, 0 replies; 46+ messages in thread
From: Christian König @ 2021-12-20  7:17 UTC (permalink / raw)
  To: Andrey Grodzovsky, dri-devel, amd-gfx
  Cc: horace.chen, christian.koenig, Monk.Liu

Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
> Restrict job resubmission to the suspend case
> only, since the schedulers are not initialised
> yet on probe.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 5527c68c51de..8ebd954e06c6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev)
>   		if (!ring || !ring->fence_drv.initialized)
>   			continue;
>   
> -		if (!ring->no_scheduler) {
> +		if (adev->in_suspend && !ring->no_scheduler) {

Uff, why is that suddenly necessary? Because of the changed order?

Christian.

>   			drm_sched_resubmit_jobs(&ring->sched);
>   			drm_sched_start(&ring->sched, true);
>   		}


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 4/6] drm/amdgpu: Serialize non TDR gpu recovery with TDRs
  2021-12-17 22:27   ` Andrey Grodzovsky
@ 2021-12-20  7:20     ` Christian König
  -1 siblings, 0 replies; 46+ messages in thread
From: Christian König @ 2021-12-20  7:20 UTC (permalink / raw)
  To: Andrey Grodzovsky, dri-devel, amd-gfx
  Cc: horace.chen, christian.koenig, Monk.Liu

Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
> Use the reset domain wq also for non-TDR gpu recovery triggers
> such as sysfs and RAS. We must serialize all possible
> GPU recoveries to guarantee no concurrency there.
> For TDR call the original recovery function directly since
> it's already executed from within the wq. For others just
> use a wrapper to queue work and wait on it to finish.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +++++++++++++++++++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +-
>   3 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index b5ff76aae7e0..8e96b9a14452 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev);
>   bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev);
>   int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   			      struct amdgpu_job* job);
> +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
> +			      struct amdgpu_job *job);
>   void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
>   int amdgpu_device_pci_reset(struct amdgpu_device *adev);
>   bool amdgpu_device_need_post(struct amdgpu_device *adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index b595e6d699b5..55cd67b9ede2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs(
>    * Returns 0 for success or an error on failure.
>    */
>   
> -int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
> +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
>   			      struct amdgpu_job *job)
>   {
>   	struct list_head device_list, *device_list_handle =  NULL;
> @@ -5236,6 +5236,37 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   	return r;
>   }
>   
> +struct recover_work_struct {

Please add an amdgpu_ prefix to the name.
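
i.e. something like this (sketch of just the rename):

	/* renamed with the amdgpu_ prefix as suggested above - sketch */
	struct amdgpu_recover_work_struct {
		struct work_struct base;
		struct amdgpu_device *adev;
		struct amdgpu_job *job;
		int ret;
	};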

> +	struct work_struct base;
> +	struct amdgpu_device *adev;
> +	struct amdgpu_job *job;
> +	int ret;
> +};
> +
> +static void amdgpu_device_queue_gpu_recover_work(struct work_struct *work)
> +{
> +	struct recover_work_struct *recover_work = container_of(work, struct recover_work_struct, base);
> +
> +	recover_work->ret = amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job);
> +}
> +/*
> + * Serialize gpu recover into reset domain single threaded wq
> + */
> +int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
> +				    struct amdgpu_job *job)
> +{
> +	struct recover_work_struct work = {.adev = adev, .job = job};
> +
> +	INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
> +
> +	if (!queue_work(adev->reset_domain.wq, &work.base))
> +		return -EAGAIN;
> +
> +	flush_work(&work.base);
> +
> +	return work.ret;
> +}

Maybe that should be part of the scheduler code? Not really sure, just 
an idea.
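
If it were to move there, a hedged sketch of what a generic
scheduler-level helper could look like - note that drm_sched_run_reset()
is an invented name here, no such drm_sched API exists:

	/* Hypothetical: run a driver supplied reset routine on the
	 * ordered reset wq and wait for it to finish. Illustrative only. */
	struct drm_sched_reset_work {
		struct work_struct base;
		int (*reset)(void *ctx);	/* driver supplied reset routine */
		void *ctx;
		int ret;
	};

	static void drm_sched_reset_func(struct work_struct *work)
	{
		struct drm_sched_reset_work *w =
			container_of(work, struct drm_sched_reset_work, base);

		w->ret = w->reset(w->ctx);
	}

	int drm_sched_run_reset(struct workqueue_struct *wq,
				int (*reset)(void *ctx), void *ctx)
	{
		struct drm_sched_reset_work w = { .reset = reset, .ctx = ctx };

		INIT_WORK_ONSTACK(&w.base, drm_sched_reset_func);
		if (!queue_work(wq, &w.base))
			return -EAGAIN;

		flush_work(&w.base);
		destroy_work_on_stack(&w.base);
		return w.ret;
	}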

Apart from that looks good to me,
Christian.

> +
>   /**
>    * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot
>    *
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index bfc47bea23db..38c9fd7b7ad4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
>   		  ti.process_name, ti.tgid, ti.task_name, ti.pid);
>   
>   	if (amdgpu_device_should_recover_gpu(ring->adev)) {
> -		amdgpu_device_gpu_recover(ring->adev, job);
> +		amdgpu_device_gpu_recover_imp(ring->adev, job);
>   	} else {
>   		drm_sched_suspend_timeout(&ring->sched);
>   		if (amdgpu_sriov_vf(adev))


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu
  2021-12-17 22:27 ` Andrey Grodzovsky
@ 2021-12-20  7:25   ` Christian König
  -1 siblings, 0 replies; 46+ messages in thread
From: Christian König @ 2021-12-20  7:25 UTC (permalink / raw)
  To: Andrey Grodzovsky, dri-devel, amd-gfx
  Cc: horace.chen, christian.koenig, Monk.Liu

Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
> This patchset is based on earlier work by Boris[1] that allowed to have an
> ordered workqueue at the driver level that will be used by the different
> schedulers to queue their timeout work. On top of that I also serialized
> any GPU reset we trigger from within amdgpu code to also go through the same
> ordered wq and in this way simplify somewhat our GPU reset code so we don't need
> > to protect from concurrency by multiple GPU reset triggers such as TDR on one
> hand and sysfs trigger or RAS trigger on the other hand.
>
> As advised by Christian and Daniel I defined a reset_domain struct such that
> all the entities that go through reset together will be serialized one against
> another.
>
> TDR triggered by multiple entities within the same domain due to the same reason will not
> be triggered as the first such reset will cancel all the pending resets. This is
> relevant only to TDR timers and not to triggered resets coming from RAS or SYSFS,
> > those will still happen after the in-flight resets finish.
>
> [1] https://patchwork.kernel.org/project/dri-devel/patch/20210629073510.2764391-3-boris.brezillon@collabora.com/
>
> P.S Going through drm-misc-next and not amd-staging-drm-next as Boris work hasn't landed yet there.

Patches #1 and #5, #6 are Reviewed-by: Christian König 
<christian.koenig@amd.com>

Some minor comments on the rest, but in general absolutely looks like 
the way we want to go.

Regards,
Christian.

>
> Andrey Grodzovsky (6):
>    drm/amdgpu: Init GPU reset single threaded wq
>    drm/amdgpu: Move scheduler init to after XGMI is ready
>    drm/amdgpu: Fix crash on modprobe
>    drm/amdgpu: Serialize non TDR gpu recovery with TDRs
>    drm/amdgpu: Drop hive->in_reset
>    drm/amdgpu: Drop concurrent GPU reset protection for device
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   9 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 206 +++++++++++----------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  36 +---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  10 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |   3 +-
>   7 files changed, 132 insertions(+), 136 deletions(-)
>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu
  2021-12-20  7:25   ` Christian König
@ 2021-12-20  9:43     ` Daniel Vetter
  -1 siblings, 0 replies; 46+ messages in thread
From: Daniel Vetter @ 2021-12-20  9:43 UTC (permalink / raw)
  To: Christian König
  Cc: horace.chen, amd-gfx, dri-devel, christian.koenig, Monk.Liu

On Mon, Dec 20, 2021 at 08:25:05AM +0100, Christian König wrote:
> Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
> > This patchset is based on earlier work by Boris[1] that allowed to have an
> > ordered workqueue at the driver level that will be used by the different
> > schedulers to queue their timeout work. On top of that I also serialized
> > any GPU reset we trigger from within amdgpu code to also go through the same
> > ordered wq and in this way simplify somewhat our GPU reset code so we don't need
> > to protect from concurrency by multiple GPU reset triggers such as TDR on one
> > hand and sysfs trigger or RAS trigger on the other hand.
> > 
> > As advised by Christian and Daniel I defined a reset_domain struct such that
> > all the entities that go through reset together will be serialized one against
> > another.
> > 
> > TDR triggered by multiple entities within the same domain due to the same reason will not
> > be triggered as the first such reset will cancel all the pending resets. This is
> > relevant only to TDR timers and not to triggered resets coming from RAS or SYSFS,
> > those will still happen after the in-flight resets finish.
> > 
> > [1] https://patchwork.kernel.org/project/dri-devel/patch/20210629073510.2764391-3-boris.brezillon@collabora.com/
> > 
> > P.S Going through drm-misc-next and not amd-staging-drm-next as Boris work hasn't landed yet there.
> 
> Patches #1 and #5, #6 are Reviewed-by: Christian König
> <christian.koenig@amd.com>
> 
> Some minor comments on the rest, but in general absolutely looks like the
> way we want to go.

I only scrolled through quickly, but yeah I'm concurring.
-Daniel
> 
> Regards,
> Christian.
> 
> > 
> > Andrey Grodzovsky (6):
> >    drm/amdgpu: Init GPU reset single threaded wq
> >    drm/amdgpu: Move scheduler init to after XGMI is ready
> >    drm/amdgpu: Fix crash on modprobe
> >    drm/amdgpu: Serialize non TDR gpu recovery with TDRs
> >    drm/amdgpu: Drop hive->in_reset
> >    drm/amdgpu: Drop concurrent GPU reset protection for device
> > 
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   9 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 206 +++++++++++----------
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  36 +---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |   2 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   2 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  10 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |   3 +-
> >   7 files changed, 132 insertions(+), 136 deletions(-)
> > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu
  2021-12-20  7:25   ` Christian König
@ 2021-12-20 17:06     ` Liu, Shaoyun
  -1 siblings, 0 replies; 46+ messages in thread
From: Liu, Shaoyun @ 2021-12-20 17:06 UTC (permalink / raw)
  To: Christian König, Grodzovsky, Andrey, dri-devel, amd-gfx
  Cc: Liu, Monk, Chen, Horace, Koenig, Christian

[AMD Official Use Only]


Hi Andrey,
I actually have some concerns about this change.
1. On an SRIOV configuration, the reset notification comes from the host, and the driver already queues a work item to handle the reset (check xgpu_*_mailbox_flr_work). Is it a good idea to queue another work item from inside that work item? Can we just use the new one you added?
2. For KFD, ROCm uses user queues for submission, which don't go through the drm scheduler and hence have no job timeout. Can we handle that with your new change?
3. For an XGMI hive, there is only a whole-hive reset for all devices on a bare-metal config, but for an SRIOV config the VF will support VF FLR, which means the host might only need to reset a specific device instead of triggering a whole hive reset. So we might still need a reset_domain for an individual device within the hive for the SRIOV configuration.

Anyway, I think this change needs to be verified on an SRIOV configuration on XGMI with some ROCm app running.

Regards
Shaoyun.liu

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
Sent: Monday, December 20, 2021 2:25 AM
To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
Cc: daniel@ffwll.ch; Chen, Horace <Horace.Chen@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>
Subject: Re: [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu

Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
> This patchset is based on earlier work by Boris[1] that allowed to 
> have an ordered workqueue at the driver level that will be used by the 
> different schedulers to queue their timeout work. On top of that I 
> also serialized any GPU reset we trigger from within amdgpu code to 
> also go through the same ordered wq and in this way simplify somewhat 
> our GPU reset code so we don't need to protect from concurrency by 
> multiple GPU reset triggers such as TDR on one hand and sysfs trigger or RAS trigger on the other hand.
>
> As advised by Christian and Daniel I defined a reset_domain struct 
> such that all the entities that go through reset together will be 
> serialized one against another.
>
> TDR triggered by multiple entities within the same domain due to the 
> same reason will not be triggered as the first such reset will cancel 
> all the pending resets. This is relevant only to TDR timers and not to 
> triggered resets coming from RAS or SYSFS, those will still happen after the in-flight resets finish.
>
> [1] https://patchwork.kernel.org/project/dri-devel/patch/20210629073510.2764391-3-boris.brezillon@collabora.com/
>
> P.S Going through drm-misc-next and not amd-staging-drm-next as Boris work hasn't landed yet there.

Patches #1 and #5, #6 are Reviewed-by: Christian König <christian.koenig@amd.com>

Some minor comments on the rest, but in general absolutely looks like the way we want to go.

Regards,
Christian.

>
> Andrey Grodzovsky (6):
>    drm/amdgpu: Init GPU reset single threaded wq
>    drm/amdgpu: Move scheduler init to after XGMI is ready
>    drm/amdgpu: Fix crash on modprobe
>    drm/amdgpu: Serialize non TDR gpu recovery with TDRs
>    drm/amdgpu: Drop hive->in_reset
>    drm/amdgpu: Drop concurrent GPU reset protection for device
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   9 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 206 +++++++++++----------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  36 +---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  10 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |   3 +-
>   7 files changed, 132 insertions(+), 136 deletions(-)
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu
  2021-12-20 17:06     ` Liu, Shaoyun
@ 2021-12-20 19:11       ` Andrey Grodzovsky
  -1 siblings, 0 replies; 46+ messages in thread
From: Andrey Grodzovsky @ 2021-12-20 19:11 UTC (permalink / raw)
  To: Liu, Shaoyun, Christian König, dri-devel, amd-gfx
  Cc: Liu, Monk, Chen, Horace, Koenig, Christian

On 2021-12-20 12:06 p.m., Liu, Shaoyun wrote:

> [AMD Official Use Only]
>
>
> Hi Andrey,
> I actually have some concerns about this change.
> 1. On an SRIOV configuration, the reset notification comes from the host, and the driver already queues a work item to handle the reset (check xgpu_*_mailbox_flr_work). Is it a good idea to queue another work item from inside that work item? Can we just use the new one you added?


Shouldn't be a problem, I will change it. In fact it's a great idea
because then it looks like we can totally drop 'adev->in_gpu_reset'
since we don't need to lock against concurrent resets anymore.
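
For reference, a rough sketch of that, using xgpu_ai_mailbox_flr_work()
as the example call site (the body shown is illustrative, not the final
patch):

	static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
	{
		struct amdgpu_virt *virt = container_of(work, struct amdgpu_virt,
							flr_work);
		struct amdgpu_device *adev = container_of(virt, struct amdgpu_device,
							  virt);

		/* ... wait for the host to signal FLR completion, as today ... */

		/* route the host initiated reset through the serializing
		 * wrapper so it lines up behind TDR/sysfs/RAS triggers */
		if (amdgpu_device_should_recover_gpu(adev))
			amdgpu_device_gpu_recover(adev, NULL);
	}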


> 2. For KFD, ROCm uses user queues for submission, which don't go through the drm scheduler and hence have no job timeout. Can we handle that with your new change?


I think that's not a problem - a lot of places use direct submission and
not the scheduler; in case they need to synchronize against concurrent GPU
resets they lock adev->reset_sem. Nothing changes in this sense.
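
i.e. roughly the existing pattern on those paths (sketch):

	/* sketch of the existing synchronization for direct submission
	 * paths; adev->reset_sem is taken for write by the reset itself */
	if (down_read_killable(&adev->reset_sem))
		return -EINTR;

	/* ... do the direct submission while resets are excluded ... */

	up_read(&adev->reset_sem);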


>   
> 3. For an XGMI hive, there is only a whole-hive reset for all devices on a bare-metal config, but for an SRIOV config the VF will support VF FLR, which means the host might only need to reset a specific device instead of triggering a whole hive reset. So we might still need a reset_domain for an individual device within the hive for the SRIOV configuration.


This is something for the future, right? I don't see it in the code - in this
case we will have to account for this as part of the generic design for
this kind of single device reset within an XGMI hive. It should require
only a minor addition to the current design: creating 2 parallel reset
domains - one for the hive and one per device, as sketched below.
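
Very roughly, something like this hypothetical layout (the hive-side
field is invented here purely for illustration):

	/* Hypothetical: an SRIOV VF FLR would serialize on the
	 * device-local domain, while a full hive reset serializes on
	 * the hive-wide one. Only the relevant fields are shown. */
	struct amdgpu_device {
		/* ... existing members ... */
		struct amdgpu_reset_domain	reset_domain;	/* per device */
	};

	struct amdgpu_hive_info {
		/* ... existing members ... */
		struct amdgpu_reset_domain	reset_domain;	/* per hive */
	};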


>
> Anyway, I think this change needs to be verified on an SRIOV configuration on XGMI with some ROCm app running.


I do have an XGMI setup where I still test XGMI resets. It has the ROCm
stack there - can you please log in there and tell me if I have what's
needed to do the tests you advise? I am not very familiar with the ROCm
tools as I usually test using libdrm. (Ping me on Teams for the device
IP and user name.)

Andrey


>
> Regards
> Shaoyun.liu
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
> Sent: Monday, December 20, 2021 2:25 AM
> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
> Cc: daniel@ffwll.ch; Chen, Horace <Horace.Chen@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Liu, Monk <Monk.Liu@amd.com>
> Subject: Re: [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu
>
> Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
>> This patchset is based on earlier work by Boris[1] that allowed to
>> have an ordered workqueue at the driver level that will be used by the
>> different schedulers to queue their timeout work. On top of that I
>> also serialized any GPU reset we trigger from within amdgpu code to
>> also go through the same ordered wq and in this way simplify somewhat
>> our GPU reset code so we don't need to protect from concurrency by
>> multiple GPU reset triggers such as TDR on one hand and sysfs trigger or RAS trigger on the other hand.
>>
>> As advised by Christian and Daniel I defined a reset_domain struct
>> such that all the entities that go through reset together will be
>> serialized one against another.
>>
>> TDR triggered by multiple entities within the same domain due to the
>> same reason will not be triggered as the first such reset will cancel
>> all the pending resets. This is relevant only to TDR timers and not to
>> triggered resets coming from RAS or SYSFS, those will still happen after the in-flight resets finish.
>>
>> [1] https://patchwork.kernel.org/project/dri-devel/patch/20210629073510.2764391-3-boris.brezillon@collabora.com/
>>
>> P.S Going through drm-misc-next and not amd-staging-drm-next as Boris work hasn't landed yet there.
> Patches #1 and #5, #6 are Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Some minor comments on the rest, but in general absolutely looks like the way we want to go.
>
> Regards,
> Christian.
>
>> Andrey Grodzovsky (6):
>>     drm/amdgpu: Init GPU reset single threaded wq
>>     drm/amdgpu: Move scheduler init to after XGMI is ready
>>     drm/amdgpu: Fix crash on modprobe
>>     drm/amdgpu: Serialize non TDR gpu recovery with TDRs
>>     drm/amdgpu: Drop hive->in_reset
>>     drm/amdgpu: Drop concurrent GPU reset protection for device
>>
>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   9 +
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 206 +++++++++++----------
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  36 +---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |   2 +-
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   2 +
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  10 +-
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |   3 +-
>>    7 files changed, 132 insertions(+), 136 deletions(-)
>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 3/6] drm/amdgpu: Fix crash on modprobe
  2021-12-20  7:17     ` Christian König
@ 2021-12-20 19:22       ` Andrey Grodzovsky
  -1 siblings, 0 replies; 46+ messages in thread
From: Andrey Grodzovsky @ 2021-12-20 19:22 UTC (permalink / raw)
  To: Christian König, dri-devel, amd-gfx
  Cc: daniel, horace.chen, christian.koenig, Monk.Liu


On 2021-12-20 2:17 a.m., Christian König wrote:
> Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
>> Restrict job resubmission to the suspend case
>> only, since the schedulers are not initialised
>> yet on probe.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index 5527c68c51de..8ebd954e06c6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct 
>> amdgpu_device *adev)
>>           if (!ring || !ring->fence_drv.initialized)
>>               continue;
>>   -        if (!ring->no_scheduler) {
>> +        if (adev->in_suspend && !ring->no_scheduler) {
>
> Uff, why is that suddenly necessary? Because of the changed order?
>
> Christian.


Yes.

Andrey


>
>> drm_sched_resubmit_jobs(&ring->sched);
>>               drm_sched_start(&ring->sched, true);
>>           }
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 4/6] drm/amdgpu: Serialize non TDR gpu recovery with TDRs
  2021-12-20  7:20     ` Christian König
@ 2021-12-20 22:17       ` Andrey Grodzovsky
  -1 siblings, 0 replies; 46+ messages in thread
From: Andrey Grodzovsky @ 2021-12-20 22:17 UTC (permalink / raw)
  To: Christian König, dri-devel, amd-gfx
  Cc: horace.chen, christian.koenig, Monk.Liu


On 2021-12-20 2:20 a.m., Christian König wrote:
> Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
>> Use reset domain wq also for non TDR gpu recovery triggers
>> such as sysfs and RAS. We must serialize all possible
>> GPU recoveries to guarantee no concurrency there.
>> For TDR call the original recovery function directly since
>> it's already executed from within the wq. For others just
>> use a wrapper to queue work and wait on it to finish.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 +++++++++++++++++++++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +-
>>   3 files changed, 35 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index b5ff76aae7e0..8e96b9a14452 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct 
>> amdgpu_device *adev);
>>   bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev);
>>   int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>                     struct amdgpu_job* job);
>> +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
>> +                  struct amdgpu_job *job);
>>   void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
>>   int amdgpu_device_pci_reset(struct amdgpu_device *adev);
>>   bool amdgpu_device_need_post(struct amdgpu_device *adev);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index b595e6d699b5..55cd67b9ede2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs(
>>    * Returns 0 for success or an error on failure.
>>    */
>>   -int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>> +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
>>                     struct amdgpu_job *job)
>>   {
>>       struct list_head device_list, *device_list_handle =  NULL;
>> @@ -5236,6 +5236,37 @@ int amdgpu_device_gpu_recover(struct 
>> amdgpu_device *adev,
>>       return r;
>>   }
>>   +struct recover_work_struct {
>
> Please add an amdgpu_ prefix to the name.
>
>> +    struct work_struct base;
>> +    struct amdgpu_device *adev;
>> +    struct amdgpu_job *job;
>> +    int ret;
>> +};
>> +
>> +static void amdgpu_device_queue_gpu_recover_work(struct work_struct 
>> *work)
>> +{
>> +    struct recover_work_struct *recover_work = container_of(work, 
>> struct recover_work_struct, base);
>> +
>> +    recover_work->ret = 
>> amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job);
>> +}
>> +/*
>> + * Serialize gpu recover into reset domain single threaded wq
>> + */
>> +int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>> +                    struct amdgpu_job *job)
>> +{
>> +    struct recover_work_struct work = {.adev = adev, .job = job};
>> +
>> +    INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
>> +
>> +    if (!queue_work(adev->reset_domain.wq, &work.base))
>> +        return -EAGAIN;
>> +
>> +    flush_work(&work.base);
>> +
>> +    return work.ret;
>> +}
>
> Maybe that should be part of the scheduler code? Not really sure, just 
> an idea.


Seems to me that since the reset domain is almost always above a 
single scheduler granularity, it wouldn't fit very well there.
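
Just to illustrate the granularity point, the relationship is roughly 
the following (sketch only, patch 1 has the actual definition):

    /* One reset domain serializes recovery for all schedulers that
     * must be reset together. For XGMI that is every ring of every
     * device in the hive, so the domain sits above the schedulers. */
    struct amdgpu_reset_domain {
        struct workqueue_struct *wq;  /* ordered, single threaded */
    };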

Andrey


>
> Apart from that looks good to me,
> Christian.
>
>> +
>>   /**
>>    * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot
>>    *
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index bfc47bea23db..38c9fd7b7ad4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat 
>> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>> -        amdgpu_device_gpu_recover(ring->adev, job);
>> +        amdgpu_device_gpu_recover_imp(ring->adev, job);
>>       } else {
>>           drm_sched_suspend_timeout(&ring->sched);
>>           if (amdgpu_sriov_vf(adev))
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 3/6] drm/amdgpu: Fix crash on modprobe
  2021-12-20 19:22       ` Andrey Grodzovsky
@ 2021-12-21  7:02         ` Christian König
  -1 siblings, 0 replies; 46+ messages in thread
From: Christian König @ 2021-12-21  7:02 UTC (permalink / raw)
  To: Andrey Grodzovsky, dri-devel, amd-gfx
  Cc: daniel, horace.chen, christian.koenig, Monk.Liu



Am 20.12.21 um 20:22 schrieb Andrey Grodzovsky:
>
> On 2021-12-20 2:17 a.m., Christian König wrote:
>> Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
>>> Restrict job resubmission to the suspend case
>>> only since schedulers are not initialised yet on
>>> probe.
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> index 5527c68c51de..8ebd954e06c6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct 
>>> amdgpu_device *adev)
>>>           if (!ring || !ring->fence_drv.initialized)
>>>               continue;
>>>   -        if (!ring->no_scheduler) {
>>> +        if (adev->in_suspend && !ring->no_scheduler) {
>>
>> Uff, why is that suddenly necessary? Because of the changed order?
>>
>> Christian.
>
>
> Yes.

Mhm, that's quite bad design then.

How about we keep the order as is and allow specifying the reset work 
queue with drm_sched_start()?
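
Something like this hypothetical signature (just a sketch, the current 
drm_sched_start() takes no wq, so the exact form is up for discussion):

    /* hypothetical extension, not existing API */
    void drm_sched_start(struct drm_gpu_scheduler *sched,
                         bool full_recovery,
                         struct workqueue_struct *reset_wq);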

Christian.

>
> Andrey
>
>
>>
>>> drm_sched_resubmit_jobs(&ring->sched);
>>>               drm_sched_start(&ring->sched, true);
>>>           }
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 2/6] drm/amdgpu: Move scheduler init to after XGMI is ready
  2021-12-20 21:51       ` Andrey Grodzovsky
@ 2021-12-21  7:05         ` Christian König
  -1 siblings, 0 replies; 46+ messages in thread
From: Christian König @ 2021-12-21  7:05 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, dri-devel, amd-gfx
  Cc: horace.chen, Monk.Liu

Am 20.12.21 um 22:51 schrieb Andrey Grodzovsky:
>
> On 2021-12-20 2:16 a.m., Christian König wrote:
>>
>>
>> Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
>>> Before we initialize schedulers we must know which reset
>>> domain we are in - for a single device there is a single
>>> domain per device and so a single wq per device. For XGMI
>>> the reset domain spans the entire XGMI hive and so the
>>> reset wq is per hive.
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 45 
>>> ++++++++++++++++++++++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 34 ++--------------
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  2 +
>>>   3 files changed, 51 insertions(+), 30 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 5f13195d23d1..b595e6d699b5 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -2284,6 +2284,47 @@ static int amdgpu_device_fw_loading(struct 
>>> amdgpu_device *adev)
>>>       return r;
>>>   }
>>>   +static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>>> +{
>>> +    long timeout;
>>> +    int r, i;
>>> +
>>> +    for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>> +        struct amdgpu_ring *ring = adev->rings[i];
>>> +
>>> +        /* No need to setup the GPU scheduler for rings that don't 
>>> need it */
>>> +        if (!ring || ring->no_scheduler)
>>> +            continue;
>>> +
>>> +        switch (ring->funcs->type) {
>>> +        case AMDGPU_RING_TYPE_GFX:
>>> +            timeout = adev->gfx_timeout;
>>> +            break;
>>> +        case AMDGPU_RING_TYPE_COMPUTE:
>>> +            timeout = adev->compute_timeout;
>>> +            break;
>>> +        case AMDGPU_RING_TYPE_SDMA:
>>> +            timeout = adev->sdma_timeout;
>>> +            break;
>>> +        default:
>>> +            timeout = adev->video_timeout;
>>> +            break;
>>> +        }
>>> +
>>
>>
>>> +        r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>> +                   ring->num_hw_submission, amdgpu_job_hang_limit,
>>> +                   timeout, adev->reset_domain.wq, 
>>> ring->sched_score, ring->name);
>>> +        if (r) {
>>> +            DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>> +                  ring->name);
>>> +            return r;
>>> +        }
>>
>> Maybe better put that into amdgpu_ring.c. But not really a hard 
>> requirement, more a gut feeling.
>>
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +
>>>   /**
>>>    * amdgpu_device_ip_init - run init for hardware IPs
>>>    *
>>> @@ -2412,6 +2453,10 @@ static int amdgpu_device_ip_init(struct 
>>> amdgpu_device *adev)
>>>           }
>>>       }
>>>   +    r = amdgpu_device_init_schedulers(adev);
>>> +    if (r)
>>> +        goto init_failed;
>>> +
>>>       /* Don't init kfd if whole hive need to be reset during init */
>>>       if (!adev->gmc.xgmi.pending_reset)
>>>           amdgpu_amdkfd_device_init(adev);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> index 3b7e86ea7167..5527c68c51de 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> @@ -456,8 +456,6 @@ int amdgpu_fence_driver_init_ring(struct 
>>> amdgpu_ring *ring,
>>>                     atomic_t *sched_score)
>>>   {
>>>       struct amdgpu_device *adev = ring->adev;
>>> -    long timeout;
>>> -    int r;
>>>         if (!adev)
>>>           return -EINVAL;
>>> @@ -477,36 +475,12 @@ int amdgpu_fence_driver_init_ring(struct 
>>> amdgpu_ring *ring,
>>>       spin_lock_init(&ring->fence_drv.lock);
>>>       ring->fence_drv.fences = kcalloc(num_hw_submission * 2, 
>>> sizeof(void *),
>>>                        GFP_KERNEL);
>>> -    if (!ring->fence_drv.fences)
>>> -        return -ENOMEM;
>>>   -    /* No need to setup the GPU scheduler for rings that don't 
>>> need it */
>>> -    if (ring->no_scheduler)
>>> -        return 0;
>>> +    ring->num_hw_submission = num_hw_submission;
>>> +    ring->sched_score = sched_score;
>>
>> Probably better to set that in the caller and drop the parameters 
>> from the amdgpu_fence_driver_init_ring() function completely.
>>
>> Christian.
>
>
> I noticed that at least num_hw_submission is validated within the 
> function, so I'm not sure we should discard the parameters then.

Good point. It also doesn't make sense to move this check up because the 
power of two requirement comes from the fences, doesn't it?
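
For the archive: the fence array is indexed by masking the sequence 
number, roughly like this, so the mask math only works out for power 
of two values:

    /* at init time (amdgpu_fence_driver_init_ring) */
    ring->fence_drv.num_fences_mask = num_hw_submission * 2 - 1;

    /* at emit time (amdgpu_fence_emit) */
    ptr = &ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask];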

Ok in this case just keep it like it is.

Christian.

>
> Andrey
>
>
>>
>>>   -    switch (ring->funcs->type) {
>>> -    case AMDGPU_RING_TYPE_GFX:
>>> -        timeout = adev->gfx_timeout;
>>> -        break;
>>> -    case AMDGPU_RING_TYPE_COMPUTE:
>>> -        timeout = adev->compute_timeout;
>>> -        break;
>>> -    case AMDGPU_RING_TYPE_SDMA:
>>> -        timeout = adev->sdma_timeout;
>>> -        break;
>>> -    default:
>>> -        timeout = adev->video_timeout;
>>> -        break;
>>> -    }
>>> -
>>> -    r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>> -               num_hw_submission, amdgpu_job_hang_limit,
>>> -               timeout, NULL, sched_score, ring->name);
>>> -    if (r) {
>>> -        DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>> -              ring->name);
>>> -        return r;
>>> -    }
>>> +    if (!ring->fence_drv.fences)
>>> +        return -ENOMEM;
>>>         return 0;
>>>   }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> index 4d380e79752c..a4b8279e3011 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> @@ -253,6 +253,8 @@ struct amdgpu_ring {
>>>       bool            has_compute_vm_bug;
>>>       bool            no_scheduler;
>>>       int            hw_prio;
>>> +    unsigned         num_hw_submission;
>>> +    atomic_t        *sched_score;
>>>   };
>>>     #define amdgpu_ring_parse_cs(r, p, ib) 
>>> ((r)->funcs->parse_cs((p), (ib)))
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 4/6] drm/amdgpu: Serialize non TDR gpu recovery with TDRs
  2021-12-20 22:17       ` Andrey Grodzovsky
@ 2021-12-21  7:59         ` Christian König
  -1 siblings, 0 replies; 46+ messages in thread
From: Christian König @ 2021-12-21  7:59 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, dri-devel, amd-gfx
  Cc: horace.chen, Monk.Liu

Am 20.12.21 um 23:17 schrieb Andrey Grodzovsky:
>
> On 2021-12-20 2:20 a.m., Christian König wrote:
>> Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
>>> Use reset domain wq also for non TDR gpu recovery triggers
>>> such as sysfs and RAS. We must serialize all possible
>>> GPU recoveries to guarantee no concurrency there.
>>> For TDR call the original recovery function directly since
>>> it's already executed from within the wq. For others just
>>> use a wrapper to queue work and wait on it to finish.
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 
>>> +++++++++++++++++++++-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +-
>>>   3 files changed, 35 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index b5ff76aae7e0..8e96b9a14452 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct 
>>> amdgpu_device *adev);
>>>   bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev);
>>>   int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>>                     struct amdgpu_job* job);
>>> +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
>>> +                  struct amdgpu_job *job);
>>>   void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
>>>   int amdgpu_device_pci_reset(struct amdgpu_device *adev);
>>>   bool amdgpu_device_need_post(struct amdgpu_device *adev);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index b595e6d699b5..55cd67b9ede2 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs(
>>>    * Returns 0 for success or an error on failure.
>>>    */
>>>   -int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>> +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
>>>                     struct amdgpu_job *job)
>>>   {
>>>       struct list_head device_list, *device_list_handle = NULL;
>>> @@ -5236,6 +5236,37 @@ int amdgpu_device_gpu_recover(struct 
>>> amdgpu_device *adev,
>>>       return r;
>>>   }
>>>   +struct recover_work_struct {
>>
>> Please add an amdgpu_ prefix to the name.
>>
>>> +    struct work_struct base;
>>> +    struct amdgpu_device *adev;
>>> +    struct amdgpu_job *job;
>>> +    int ret;
>>> +};
>>> +
>>> +static void amdgpu_device_queue_gpu_recover_work(struct work_struct 
>>> *work)
>>> +{
>>> +    struct recover_work_struct *recover_work = container_of(work, 
>>> struct recover_work_struct, base);
>>> +
>>> +    recover_work->ret = 
>>> amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job);
>>> +}
>>> +/*
>>> + * Serialize gpu recover into reset domain single threaded wq
>>> + */
>>> +int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>> +                    struct amdgpu_job *job)
>>> +{
>>> +    struct recover_work_struct work = {.adev = adev, .job = job};
>>> +
>>> +    INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
>>> +
>>> +    if (!queue_work(adev->reset_domain.wq, &work.base))
>>> +        return -EAGAIN;
>>> +
>>> +    flush_work(&work.base);
>>> +
>>> +    return work.ret;
>>> +}
>>
>> Maybe that should be part of the scheduler code? Not really sure, 
>> just an idea.
>
>
>> Seems to me that since the reset domain is almost always above a 
>> single scheduler granularity, it wouldn't fit very well there.

Yeah, but what if we introduce a drm_sched_recover_queue and 
drm_sched_recover_work object?

It's probably ok to go forward with that for now, but this handling 
makes quite some sense to have independently of which driver is using 
it. So as soon as we see a second similar implementation we should 
move it into common code.
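
Rough sketch of what I have in mind, all the names here are invented:

    struct drm_sched_recover_work {
        struct work_struct base;
        int (*recover)(void *ctx);  /* driver supplied callback */
        void *ctx;
        int ret;
    };

    static void drm_sched_recover_work_func(struct work_struct *w)
    {
        struct drm_sched_recover_work *work =
            container_of(w, struct drm_sched_recover_work, base);

        work->ret = work->recover(work->ctx);
    }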

Regards,
Christian.

>
> Andrey
>
>
>>
>> Apart from that looks good to me,
>> Christian.
>>
>>> +
>>>   /**
>>>    * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot
>>>    *
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index bfc47bea23db..38c9fd7b7ad4 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat 
>>> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>>>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>>> -        amdgpu_device_gpu_recover(ring->adev, job);
>>> +        amdgpu_device_gpu_recover_imp(ring->adev, job);
>>>       } else {
>>>           drm_sched_suspend_timeout(&ring->sched);
>>>           if (amdgpu_sriov_vf(adev))
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 3/6] drm/amdgpu: Fix crash on modprobe
  2021-12-21  7:02         ` Christian König
@ 2021-12-21 16:03           ` Andrey Grodzovsky
  -1 siblings, 0 replies; 46+ messages in thread
From: Andrey Grodzovsky @ 2021-12-21 16:03 UTC (permalink / raw)
  To: Christian König, dri-devel, amd-gfx
  Cc: horace.chen, christian.koenig, Monk.Liu


On 2021-12-21 2:02 a.m., Christian König wrote:
>
>
> Am 20.12.21 um 20:22 schrieb Andrey Grodzovsky:
>>
>> On 2021-12-20 2:17 a.m., Christian König wrote:
>>> Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
>>>> Restrict job resubmission to the suspend case
>>>> only since schedulers are not initialised yet on
>>>> probe.
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> index 5527c68c51de..8ebd954e06c6 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct 
>>>> amdgpu_device *adev)
>>>>           if (!ring || !ring->fence_drv.initialized)
>>>>               continue;
>>>>   -        if (!ring->no_scheduler) {
>>>> +        if (adev->in_suspend && !ring->no_scheduler) {
>>>
>>> Uff, why is that suddenly necessary? Because of the changed order?
>>>
>>> Christian.
>>
>>
>> Yes.
>
> Mhm, that's quite bad design then.


If you look at the original patch for this 
https://www.spinics.net/lists/amd-gfx/msg67560.html you will
see that restarting the scheduler here is only relevant for the
suspend/resume case because there was a race to fix. There is no
point in this code on driver init because nothing has been submitted
to the scheduler yet, so it seems fine to me to add a condition that
this code runs only in the in_suspend case.


>
> How about we keep the order as is and allow specifying the reset work 
> queue with drm_sched_start()?


As I mentioned above, the fact we even have drm_sched_start there is 
just part of a solution to resolve a race during suspend/resume. It is 
not for device initialization and indeed, other client drivers of the 
gpu scheduler never call drm_sched_start on device init. We must 
guarantee that the reset work queue is already initialized before any 
job submission to the scheduler, and because of that IMHO the right 
place for this is drm_sched_init.
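
Concretely, that is what patch 2 in this series already does - the wq 
is handed over when the scheduler is created:

    r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
                       ring->num_hw_submission, amdgpu_job_hang_limit,
                       timeout, adev->reset_domain.wq,
                       ring->sched_score, ring->name);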

Andrey


>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>>> drm_sched_resubmit_jobs(&ring->sched);
>>>>               drm_sched_start(&ring->sched, true);
>>>>           }
>>>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC 4/6] drm/amdgpu: Serialize non TDR gpu recovery with TDRs
  2021-12-21  7:59         ` Christian König
@ 2021-12-21 16:10           ` Andrey Grodzovsky
  -1 siblings, 0 replies; 46+ messages in thread
From: Andrey Grodzovsky @ 2021-12-21 16:10 UTC (permalink / raw)
  To: Christian König, Christian König, dri-devel, amd-gfx
  Cc: horace.chen, Monk.Liu


On 2021-12-21 2:59 a.m., Christian König wrote:
> Am 20.12.21 um 23:17 schrieb Andrey Grodzovsky:
>>
>> On 2021-12-20 2:20 a.m., Christian König wrote:
>>> Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
>>>> Use reset domain wq also for non TDR gpu recovery triggers
>>>> such as sysfs and RAS. We must serialize all possible
>>>> GPU recoveries to guarantee no concurrency there.
>>>> For TDR call the original recovery function directly since
>>>> it's already executed from within the wq. For others just
>>>> use a wrapper to queue work and wait on it to finish.
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 33 
>>>> +++++++++++++++++++++-
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  2 +-
>>>>   3 files changed, 35 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index b5ff76aae7e0..8e96b9a14452 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -1296,6 +1296,8 @@ bool amdgpu_device_has_job_running(struct 
>>>> amdgpu_device *adev);
>>>>   bool amdgpu_device_should_recover_gpu(struct amdgpu_device *adev);
>>>>   int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>>>                     struct amdgpu_job* job);
>>>> +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
>>>> +                  struct amdgpu_job *job);
>>>>   void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
>>>>   int amdgpu_device_pci_reset(struct amdgpu_device *adev);
>>>>   bool amdgpu_device_need_post(struct amdgpu_device *adev);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index b595e6d699b5..55cd67b9ede2 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -4979,7 +4979,7 @@ static void amdgpu_device_recheck_guilty_jobs(
>>>>    * Returns 0 for success or an error on failure.
>>>>    */
>>>>   -int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>>> +int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
>>>>                     struct amdgpu_job *job)
>>>>   {
>>>>       struct list_head device_list, *device_list_handle = NULL;
>>>> @@ -5236,6 +5236,37 @@ int amdgpu_device_gpu_recover(struct 
>>>> amdgpu_device *adev,
>>>>       return r;
>>>>   }
>>>>   +struct recover_work_struct {
>>>
>>> Please add an amdgpu_ prefix to the name.
>>>
>>>> +    struct work_struct base;
>>>> +    struct amdgpu_device *adev;
>>>> +    struct amdgpu_job *job;
>>>> +    int ret;
>>>> +};
>>>> +
>>>> +static void amdgpu_device_queue_gpu_recover_work(struct 
>>>> work_struct *work)
>>>> +{
>>>> +    struct recover_work_struct *recover_work = container_of(work, 
>>>> struct recover_work_struct, base);
>>>> +
>>>> +    recover_work->ret = 
>>>> amdgpu_device_gpu_recover_imp(recover_work->adev, recover_work->job);
>>>> +}
>>>> +/*
>>>> + * Serialize gpu recover into reset domain single threaded wq
>>>> + */
>>>> +int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>>>> +                    struct amdgpu_job *job)
>>>> +{
>>>> +    struct recover_work_struct work = {.adev = adev, .job = job};
>>>> +
>>>> +    INIT_WORK(&work.base, amdgpu_device_queue_gpu_recover_work);
>>>> +
>>>> +    if (!queue_work(adev->reset_domain.wq, &work.base))
>>>> +        return -EAGAIN;
>>>> +
>>>> +    flush_work(&work.base);
>>>> +
>>>> +    return work.ret;
>>>> +}
>>>
>>> Maybe that should be part of the scheduler code? Not really sure, 
>>> just an idea.
>>
>>
>> Seems to me that since the reset domain is almost always above a 
>> single scheduler granularity, it wouldn't fit very well there.
>
> Yeah, but what if we introduce a drm_sched_recover_queue and 
> drm_sched_recover_work object?
>
> It's probably ok to go forward with that for now, but this handling 
> makes quite some sense to have independently of which driver is using 
> it. So as soon as we see a second similar implementation we should 
> move it into common code.
>
> Regards,
> Christian.


Agree, the only point I would stress is that we need an entity separate 
from drm_gpu_scheduler, something like drm_gpu_reset_domain, which 
should point to or be pointed at by a set of schedulers that should go 
through reset together.
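
Something along these lines (pure sketch, names made up):

    /* Each scheduler would then hold a reference to the domain it
     * belongs to, and all schedulers in one domain share the wq. */
    struct drm_gpu_reset_domain {
        struct kref refcount;
        struct workqueue_struct *wq; /* serializes all resets in the domain */
    };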

Andrey


>>
>> Andrey
>>
>>
>>>
>>> Apart from that looks good to me,
>>> Christian.
>>>
>>>> +
>>>>   /**
>>>>    * amdgpu_device_get_pcie_info - fence pcie info about the PCIE slot
>>>>    *
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> index bfc47bea23db..38c9fd7b7ad4 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> @@ -63,7 +63,7 @@ static enum drm_gpu_sched_stat 
>>>> amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>             ti.process_name, ti.tgid, ti.task_name, ti.pid);
>>>>         if (amdgpu_device_should_recover_gpu(ring->adev)) {
>>>> -        amdgpu_device_gpu_recover(ring->adev, job);
>>>> +        amdgpu_device_gpu_recover_imp(ring->adev, job);
>>>>       } else {
>>>>           drm_sched_suspend_timeout(&ring->sched);
>>>>           if (amdgpu_sriov_vf(adev))
>>>
>

^ permalink raw reply	[flat|nested] 46+ messages in thread


* Re: [RFC 3/6] drm/amdgpu: Fix crash on modprobe
  2021-12-21 16:03           ` Andrey Grodzovsky
@ 2021-12-22  7:50             ` Christian König
  -1 siblings, 0 replies; 46+ messages in thread
From: Christian König @ 2021-12-22  7:50 UTC (permalink / raw)
  To: Andrey Grodzovsky, dri-devel, amd-gfx
  Cc: horace.chen, christian.koenig, Monk.Liu

On 21.12.21 at 17:03, Andrey Grodzovsky wrote:
>
> On 2021-12-21 2:02 a.m., Christian König wrote:
>>
>>
>> On 20.12.21 at 20:22, Andrey Grodzovsky wrote:
>>>
>>> On 2021-12-20 2:17 a.m., Christian König wrote:
>>>> Am 17.12.21 um 23:27 schrieb Andrey Grodzovsky:
>>>>> Restrict job resubmission to the suspend case
>>>>> only, since the schedulers are not initialised
>>>>> yet on probe.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> index 5527c68c51de..8ebd954e06c6 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> @@ -582,7 +582,7 @@ void amdgpu_fence_driver_hw_init(struct 
>>>>> amdgpu_device *adev)
>>>>>           if (!ring || !ring->fence_drv.initialized)
>>>>>               continue;
>>>>>   -        if (!ring->no_scheduler) {
>>>>> +        if (adev->in_suspend && !ring->no_scheduler) {
>>>>
>>>> Uff, why is that suddenly necessary? Because of the changed order?
>>>>
>>>> Christian.
>>>
>>>
>>> Yes.
>>
>> Mhm, that's quite bad design then.
>
>
> If you look at the original patch for this,
> https://www.spinics.net/lists/amd-gfx/msg67560.html, you will
> see that restarting the scheduler here is only relevant for the
> suspend/resume case, because there was a race to fix. There is no
> point in this code on driver init, because nothing has been
> submitted to the scheduler yet, so it seems OK to me to add a
> condition so that this code runs only in the in_suspend case.

Yeah, but having extra logic like this means that we have some design 
issue in the IP block handling.

We need to clean that and some other odd approaches up at some point, 
but probably not now.

Christian.

>
>
>>
>> How about we keep the order as is and allow specifying the reset work
>> queue with drm_sched_start()?
>
>
> As I mentioned above, the fact that we even have drm_sched_start
> there is just part of a solution to resolve a race during
> suspend/resume. It is not for device initialization, and indeed
> other client drivers of the GPU scheduler never call
> drm_sched_start on device init. We must guarantee that the reset
> work queue is already initialized before any job submission to the
> scheduler, and because of that IMHO the right place for this is
> drm_sched_init.
>
> Andrey
>
>
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>> drm_sched_resubmit_jobs(&ring->sched);
>>>>>               drm_sched_start(&ring->sched, true);
>>>>>           }
>>>>
>>
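
As a rough illustration of that last point: create the domain's ordered
workqueue first, then hand it to the scheduler at init time rather than
patching it in later via drm_sched_start(). A sketch only; the
drm_sched_init() signature with a timeout workqueue parameter and the
amdgpu-side names here are assumptions:

	struct workqueue_struct *wq;
	int r;

	/* One ordered wq per reset domain, created before any scheduler
	 * exists, so no job can be submitted before the wq is ready. */
	wq = alloc_ordered_workqueue("amdgpu-reset-dom", 0);
	if (!wq)
		return -ENOMEM;
	adev->reset_domain.wq = wq;

	/* Pass the wq to the scheduler at init instead of after the fact */
	r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
			   num_hw_submission, amdgpu_job_hang_limit,
			   timeout, adev->reset_domain.wq,
			   NULL, ring->name);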


^ permalink raw reply	[flat|nested] 46+ messages in thread


end of thread, other threads:[~2021-12-22  7:50 UTC | newest]

Thread overview: 46+ messages
2021-12-17 22:27 [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu Andrey Grodzovsky
2021-12-17 22:27 ` [RFC 1/6] drm/amdgpu: Init GPU reset single threaded wq Andrey Grodzovsky
2021-12-17 22:27 ` [RFC 2/6] drm/amdgpu: Move scheduler init to after XGMI is ready Andrey Grodzovsky
2021-12-20  7:16   ` Christian König
2021-12-20 21:51     ` Andrey Grodzovsky
2021-12-21  7:05       ` Christian König
2021-12-17 22:27 ` [RFC 3/6] drm/amdgpu: Fix crash on modprobe Andrey Grodzovsky
2021-12-20  7:17   ` Christian König
2021-12-20 19:22     ` Andrey Grodzovsky
2021-12-21  7:02       ` Christian König
2021-12-21 16:03         ` Andrey Grodzovsky
2021-12-22  7:50           ` Christian König
2021-12-17 22:27 ` [RFC 4/6] drm/amdgpu: Serialize non TDR gpu recovery with TDRs Andrey Grodzovsky
2021-12-20  7:20   ` Christian König
2021-12-20 22:17     ` Andrey Grodzovsky
2021-12-21  7:59       ` Christian König
2021-12-21 16:10         ` Andrey Grodzovsky
2021-12-17 22:27 ` [RFC 5/6] drm/amdgpu: Drop hive->in_reset Andrey Grodzovsky
2021-12-17 22:27 ` [RFC 6/6] drm/amdgpu: Drop concurrent GPU reset protection for device Andrey Grodzovsky
2021-12-20  7:25 ` [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu Christian König
2021-12-20  9:43   ` Daniel Vetter
2021-12-20 17:06   ` Liu, Shaoyun
2021-12-20 19:11     ` Andrey Grodzovsky
