* [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
@ 2021-03-18  7:23 Dennis Li
  2021-03-18  7:23 ` [PATCH 1/4] drm/amdgpu: remove reset lock from low level functions Dennis Li
                   ` (4 more replies)
  0 siblings, 5 replies; 56+ messages in thread
From: Dennis Li @ 2021-03-18  7:23 UTC (permalink / raw)
  To: amd-gfx, Alexander.Deucher, felix.kuehling, Hawking.Zhang,
	christian.koenig
  Cc: Dennis Li

We have defined two variables, in_gpu_reset and reset_sem, in the adev object. The atomic variable in_gpu_reset is used to prevent the recovery thread from reentering and to make lower-level functions return earlier when recovery starts, but it cannot block other threads while the recovery thread accesses the hardware. The r/w semaphore reset_sem is used to solve these synchronization issues between the recovery thread and other threads.

The original solution took the reset lock around register accesses in the lower-level functions, which introduces the following issues:

1) Many lower-level functions are used by both the recovery thread and other threads. Firstly, we must identify all of these functions, and it is easy to miss some. Secondly, each of these functions must select which lock (read or write) to take according to the thread it runs in; if the thread context isn't considered, the added lock easily introduces deadlocks. Besides that, developers often forget to add locks to new functions.

2) Performance drop. Lower-level functions are called very frequently, so locking in them adds noticeable overhead.

3) It easily triggers false positive lockdep complaints, because the write lock covers a large range in the recovery thread, while the low-level functions holding the read lock may also be protected by other locks in other threads.

Therefore the new solution instead adds lock protection to the kfd ioctls. Its goal is that no thread other than the recovery thread or its children (for XGMI) accesses the hardware while a GPU reset and resume is in progress. The recovery thread is refined as follows:

Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
   1) If it fails, a recovery thread is already running and the current thread exits directly.
   2) If it succeeds, enter the recovery thread.

Step 1: Cancel all delayed work, stop the DRM scheduler, complete all outstanding fences and so on. This tries to stop or pause the other threads.

Step 2: Call down_write(&adev->reset_sem) to take the write lock, which blocks the recovery thread until the other threads have released their read locks.

Step 3: Normally only the recovery thread is left accessing the hardware now, so it is safe to do the GPU reset.

Step 4: Do the post-reset work, such as calling all IPs' resume functions.

Step 5: Atomically set adev->in_gpu_reset back to 0, wake up the other threads and release the write lock. The recovery thread exits normally.

Other threads call amdgpu_read_lock to synchronize with the recovery thread. If a thread finds that in_gpu_reset is 1, it releases the read lock if it holds one and then blocks, waiting for the recovery-finished event. If the thread successfully takes the read lock and in_gpu_reset is 0, it continues; it will either exit normally or be stopped by the recovery thread in step 1.
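For illustration, the intended usage from a non-recovery code path looks roughly like the sketch below (amdgpu_read_lock/amdgpu_read_unlock are the helpers added in patch 2/4; the surrounding function is made up for the example and error handling is simplified):

	/* hypothetical caller, for illustration only */
	int example_hw_access(struct drm_device *dev)
	{
		int r;

		/* Synchronize with GPU recovery: waits (or returns an error in
		 * the interruptible case) while a reset is in progress.
		 */
		r = amdgpu_read_lock(dev, true);
		if (r)
			return r;

		/* ... safe to touch the hardware here, a reset cannot start ... */

		amdgpu_read_unlock(dev);
		return 0;
	}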

Dennis Li (4):
  drm/amdgpu: remove reset lock from low level functions
  drm/amdgpu: refine the GPU recovery sequence
  drm/amdgpu: instead of using down/up_read directly
  drm/amdkfd: add reset lock protection for kfd entry functions

 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 173 +++++++++++++-----
 .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c    |   8 -
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        |   4 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         |   9 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         |   5 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c         |   5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 172 ++++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   4 +
 .../amd/amdkfd/kfd_process_queue_manager.c    |  17 ++
 12 files changed, 345 insertions(+), 75 deletions(-)

-- 
2.17.1


* [PATCH 1/4] drm/amdgpu: remove reset lock from low level functions
  2021-03-18  7:23 [PATCH 0/4] Refine GPU recovery sequence to enhance its stability Dennis Li
@ 2021-03-18  7:23 ` Dennis Li
  2021-03-18  7:23 ` [PATCH 2/4] drm/amdgpu: refine the GPU recovery sequence Dennis Li
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 56+ messages in thread
From: Dennis Li @ 2021-03-18  7:23 UTC (permalink / raw)
  To: amd-gfx, Alexander.Deucher, felix.kuehling, Hawking.Zhang,
	christian.koenig
  Cc: Dennis Li

It is easy to cause performance drops when taking the lock in low-level
functions, so remove the reset lock from them.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 0b1e0127056f..24ff5992cb02 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -374,13 +374,10 @@ uint32_t amdgpu_device_rreg(struct amdgpu_device *adev,
 
 	if ((reg * 4) < adev->rmmio_size) {
 		if (!(acc_flags & AMDGPU_REGS_NO_KIQ) &&
-		    amdgpu_sriov_runtime(adev) &&
-		    down_read_trylock(&adev->reset_sem)) {
+		    amdgpu_sriov_runtime(adev))
 			ret = amdgpu_kiq_rreg(adev, reg);
-			up_read(&adev->reset_sem);
-		} else {
+		else
 			ret = readl(((void __iomem *)adev->rmmio) + (reg * 4));
-		}
 	} else {
 		ret = adev->pcie_rreg(adev, reg * 4);
 	}
@@ -459,13 +456,10 @@ void amdgpu_device_wreg(struct amdgpu_device *adev,
 
 	if ((reg * 4) < adev->rmmio_size) {
 		if (!(acc_flags & AMDGPU_REGS_NO_KIQ) &&
-		    amdgpu_sriov_runtime(adev) &&
-		    down_read_trylock(&adev->reset_sem)) {
+		    amdgpu_sriov_runtime(adev))
 			amdgpu_kiq_wreg(adev, reg, v);
-			up_read(&adev->reset_sem);
-		} else {
+		else
 			writel(v, ((void __iomem *)adev->rmmio) + (reg * 4));
-		}
 	} else {
 		adev->pcie_wreg(adev, reg * 4, v);
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index a05dbbbd9803..9f6eaca107ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -155,11 +155,7 @@ static int __update_table_header(struct amdgpu_ras_eeprom_control *control,
 
 	msg.addr = control->i2c_address;
 
-	/* i2c may be unstable in gpu reset */
-	down_read(&adev->reset_sem);
 	ret = i2c_transfer(&adev->pm.smu_i2c, &msg, 1);
-	up_read(&adev->reset_sem);
-
 	if (ret < 1)
 		DRM_ERROR("Failed to write EEPROM table header, ret:%d", ret);
 
@@ -546,11 +542,7 @@ int amdgpu_ras_eeprom_process_recods(struct amdgpu_ras_eeprom_control *control,
 		control->next_addr += EEPROM_TABLE_RECORD_SIZE;
 	}
 
-	/* i2c may be unstable in gpu reset */
-	down_read(&adev->reset_sem);
 	ret = i2c_transfer(&adev->pm.smu_i2c, msgs, num);
-	up_read(&adev->reset_sem);
-
 	if (ret < 1) {
 		DRM_ERROR("Failed to process EEPROM table records, ret:%d", ret);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 33e54eed2eec..690f368ce378 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -317,8 +317,7 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
 	 * Directly use kiq to do the vm invalidation instead
 	 */
 	if (adev->gfx.kiq.ring.sched.ready &&
-	    (amdgpu_sriov_runtime(adev) || !amdgpu_sriov_vf(adev)) &&
-	    down_read_trylock(&adev->reset_sem)) {
+	    (amdgpu_sriov_runtime(adev) || !amdgpu_sriov_vf(adev))) {
 		struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
 		const unsigned eng = 17;
 		u32 inv_req = hub->vmhub_funcs->get_invalidate_req(vmid, flush_type);
@@ -328,7 +327,6 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
 		amdgpu_virt_kiq_reg_write_reg_wait(adev, req, ack, inv_req,
 				1 << vmid);
 
-		up_read(&adev->reset_sem);
 		return;
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 1567dd227f51..ec3c05360776 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -757,14 +757,12 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
 	 * as GFXOFF under bare metal
 	 */
 	if (adev->gfx.kiq.ring.sched.ready &&
-	    (amdgpu_sriov_runtime(adev) || !amdgpu_sriov_vf(adev)) &&
-	    down_read_trylock(&adev->reset_sem)) {
+	    (amdgpu_sriov_runtime(adev) || !amdgpu_sriov_vf(adev))) {
 		uint32_t req = hub->vm_inv_eng0_req + hub->eng_distance * eng;
 		uint32_t ack = hub->vm_inv_eng0_ack + hub->eng_distance * eng;
 
 		amdgpu_virt_kiq_reg_write_reg_wait(adev, req, ack, inv_req,
 						   1 << vmid);
-		up_read(&adev->reset_sem);
 		return;
 	}
 
@@ -859,7 +857,7 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
 	if (amdgpu_in_reset(adev))
 		return -EIO;
 
-	if (ring->sched.ready && down_read_trylock(&adev->reset_sem)) {
+	if (ring->sched.ready) {
 		/* Vega20+XGMI caches PTEs in TC and TLB. Add a
 		 * heavy-weight TLB flush (type 2), which flushes
 		 * both. Due to a race condition with concurrent
@@ -886,7 +884,6 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
 		if (r) {
 			amdgpu_ring_undo(ring);
 			spin_unlock(&adev->gfx.kiq.ring_lock);
-			up_read(&adev->reset_sem);
 			return -ETIME;
 		}
 
@@ -895,10 +892,8 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct amdgpu_device *adev,
 		r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
 		if (r < 1) {
 			dev_err(adev->dev, "wait for kiq fence error: %ld.\n", r);
-			up_read(&adev->reset_sem);
 			return -ETIME;
 		}
-		up_read(&adev->reset_sem);
 		return 0;
 	}
 
-- 
2.17.1


* [PATCH 2/4] drm/amdgpu: refine the GPU recovery sequence
  2021-03-18  7:23 [PATCH 0/4] Refine GPU recovery sequence to enhance its stability Dennis Li
  2021-03-18  7:23 ` [PATCH 1/4] drm/amdgpu: remove reset lock from low level functions Dennis Li
@ 2021-03-18  7:23 ` Dennis Li
  2021-03-18  7:56   ` Christian König
  2021-03-18  7:23 ` [PATCH 3/4] drm/amdgpu: instead of using down/up_read directly Dennis Li
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 56+ messages in thread
From: Dennis Li @ 2021-03-18  7:23 UTC (permalink / raw)
  To: amd-gfx, Alexander.Deucher, felix.kuehling, Hawking.Zhang,
	christian.koenig
  Cc: Dennis Li

Change to only set in_gpu_reset to 1 when the recovery thread begins, and
delay taking reset_sem until after the pre-reset work but before the reset
itself. This makes sure that other threads have exited or been blocked
before the GPU reset is done. Compared with the old code, it lets some
threads exit earlier without waiting for a timeout.
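
For orientation, the resulting ordering in amdgpu_device_gpu_recover boils down to roughly the following (a simplified sketch, not literal code; error handling and XGMI hive handling are omitted):

	if (atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) != 0)
		return 0;	/* another recovery is already in progress */

	/* pre-reset: cancel delayed work, park schedulers, signal fences, ... */

	down_write(&adev->reset_sem);	/* wait for the remaining readers to drain */

	/* ... actual ASIC reset and IP resume ... */

	atomic_set(&adev->in_gpu_reset, 0);
	wake_up_interruptible_all(&adev->recovery_fini_event);
	up_write(&adev->reset_sem);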

Introduce an event, recovery_fini_event, which is used to block new
threads once the recovery thread has begun. These threads are only woken
up when the recovery thread exits.

v2: remove the code that checks the usage of adev->reset_sem, because
lockdep already shows all locks held in the system when a hung-task
timeout is detected in the recovery thread.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 02a34f9a26aa..67c716e5ee8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1044,6 +1044,8 @@ struct amdgpu_device {
 	atomic_t 			in_gpu_reset;
 	enum pp_mp1_state               mp1_state;
 	struct rw_semaphore reset_sem;
+	wait_queue_head_t recovery_fini_event;
+
 	struct amdgpu_doorbell_index doorbell_index;
 
 	struct mutex			notifier_lock;
@@ -1406,4 +1408,8 @@ static inline int amdgpu_in_reset(struct amdgpu_device *adev)
 {
 	return atomic_read(&adev->in_gpu_reset);
 }
+
+int amdgpu_read_lock(struct drm_device *dev, bool interruptible);
+void amdgpu_read_unlock(struct drm_device *dev);
+
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 24ff5992cb02..15235610cc54 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -211,6 +211,60 @@ static ssize_t amdgpu_device_get_serial_number(struct device *dev,
 static DEVICE_ATTR(serial_number, S_IRUGO,
 		amdgpu_device_get_serial_number, NULL);
 
+int amdgpu_read_lock(struct drm_device *dev, bool interruptible)
+{
+	struct amdgpu_device *adev = drm_to_adev(dev);
+	int ret = 0;
+
+	/*
+	 * If a thread holds the read lock but the recovery thread has started,
+	 * it should release the read lock and wait for the recovery thread to
+	 * finish, because the pre-reset functions have begun, which stop the
+	 * old threads but do not include the current thread.
+	 */
+	if (interruptible) {
+		while (!(ret = down_read_killable(&adev->reset_sem)) &&
+			amdgpu_in_reset(adev)) {
+			up_read(&adev->reset_sem);
+			ret = wait_event_interruptible(adev->recovery_fini_event,
+							!amdgpu_in_reset(adev));
+			if (ret)
+				break;
+		}
+	} else {
+		down_read(&adev->reset_sem);
+		while (amdgpu_in_reset(adev)) {
+			up_read(&adev->reset_sem);
+			wait_event(adev->recovery_fini_event,
+				   !amdgpu_in_reset(adev));
+			down_read(&adev->reset_sem);
+		}
+	}
+
+	return ret;
+}
+
+void amdgpu_read_unlock(struct drm_device *dev)
+{
+	struct amdgpu_device *adev = drm_to_adev(dev);
+
+	up_read(&adev->reset_sem);
+}
+
+static void amdgpu_write_lock(struct amdgpu_device *adev, struct amdgpu_hive_info *hive)
+{
+	if (hive) {
+		down_write_nest_lock(&adev->reset_sem, &hive->hive_lock);
+	} else {
+		down_write(&adev->reset_sem);
+	}
+}
+
+static void amdgpu_write_unlock(struct amdgpu_device *adev)
+{
+	up_write(&adev->reset_sem);
+}
+
 /**
  * amdgpu_device_supports_atpx - Is the device a dGPU with HG/PX power control
  *
@@ -3280,6 +3334,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	hash_init(adev->mn_hash);
 	atomic_set(&adev->in_gpu_reset, 0);
 	init_rwsem(&adev->reset_sem);
+	init_waitqueue_head(&adev->recovery_fini_event);
 	mutex_init(&adev->psp.mutex);
 	mutex_init(&adev->notifier_lock);
 
@@ -4509,39 +4564,18 @@ int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
 	return r;
 }
 
-static bool amdgpu_device_lock_adev(struct amdgpu_device *adev,
-				struct amdgpu_hive_info *hive)
+static bool amdgpu_device_recovery_enter(struct amdgpu_device *adev)
 {
 	if (atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) != 0)
 		return false;
 
-	if (hive) {
-		down_write_nest_lock(&adev->reset_sem, &hive->hive_lock);
-	} else {
-		down_write(&adev->reset_sem);
-	}
-
-	switch (amdgpu_asic_reset_method(adev)) {
-	case AMD_RESET_METHOD_MODE1:
-		adev->mp1_state = PP_MP1_STATE_SHUTDOWN;
-		break;
-	case AMD_RESET_METHOD_MODE2:
-		adev->mp1_state = PP_MP1_STATE_RESET;
-		break;
-	default:
-		adev->mp1_state = PP_MP1_STATE_NONE;
-		break;
-	}
-
 	return true;
 }
 
-static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
+static void amdgpu_device_recovery_exit(struct amdgpu_device *adev)
 {
-	amdgpu_vf_error_trans_all(adev);
-	adev->mp1_state = PP_MP1_STATE_NONE;
 	atomic_set(&adev->in_gpu_reset, 0);
-	up_write(&adev->reset_sem);
+	wake_up_interruptible_all(&adev->recovery_fini_event);
 }
 
 /*
@@ -4550,7 +4584,7 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
  *
  * unlock won't require roll back.
  */
-static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgpu_hive_info *hive)
+static int amdgpu_hive_recovery_enter(struct amdgpu_device *adev, struct amdgpu_hive_info *hive)
 {
 	struct amdgpu_device *tmp_adev = NULL;
 
@@ -4560,10 +4594,10 @@ static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgp
 			return -ENODEV;
 		}
 		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
-			if (!amdgpu_device_lock_adev(tmp_adev, hive))
+			if (!amdgpu_device_recovery_enter(tmp_adev))
 				goto roll_back;
 		}
-	} else if (!amdgpu_device_lock_adev(adev, hive))
+	} else if (!amdgpu_device_recovery_enter(adev))
 		return -EAGAIN;
 
 	return 0;
@@ -4578,12 +4612,61 @@ static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgp
 		 */
 		dev_warn(tmp_adev->dev, "Hive lock iteration broke in the middle. Rolling back to unlock");
 		list_for_each_entry_continue_reverse(tmp_adev, &hive->device_list, gmc.xgmi.head) {
-			amdgpu_device_unlock_adev(tmp_adev);
+			amdgpu_device_recovery_exit(tmp_adev);
 		}
 	}
 	return -EAGAIN;
 }
 
+static void amdgpu_device_lock_adev(struct amdgpu_device *adev,
+				struct amdgpu_hive_info *hive)
+{
+	amdgpu_write_lock(adev, hive);
+
+	switch (amdgpu_asic_reset_method(adev)) {
+	case AMD_RESET_METHOD_MODE1:
+		adev->mp1_state = PP_MP1_STATE_SHUTDOWN;
+		break;
+	case AMD_RESET_METHOD_MODE2:
+		adev->mp1_state = PP_MP1_STATE_RESET;
+		break;
+	default:
+		adev->mp1_state = PP_MP1_STATE_NONE;
+		break;
+	}
+}
+
+static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
+{
+	amdgpu_vf_error_trans_all(adev);
+	adev->mp1_state = PP_MP1_STATE_NONE;
+	amdgpu_write_unlock(adev);
+}
+
+/*
+ * to lockup a list of amdgpu devices in a hive safely, if not a hive
+ * with multiple nodes, it will be similar as amdgpu_device_lock_adev.
+ *
+ * unlock won't require roll back.
+ */
+static void amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgpu_hive_info *hive)
+{
+	struct amdgpu_device *tmp_adev = NULL;
+
+	if (adev->gmc.xgmi.num_physical_nodes > 1) {
+		if (!hive) {
+			dev_err(adev->dev, "Hive is NULL while device has multiple xgmi nodes");
+			return;
+		}
+
+		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
+			amdgpu_device_lock_adev(tmp_adev, hive);
+		}
+	} else {
+		amdgpu_device_lock_adev(adev, hive);
+	}
+}
+
 static void amdgpu_device_resume_display_audio(struct amdgpu_device *adev)
 {
 	struct pci_dev *p = NULL;
@@ -4732,6 +4815,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	bool need_emergency_restart = false;
 	bool audio_suspended = false;
 	int tmp_vram_lost_counter;
+	bool locked = false;
 
 	/*
 	 * Special case: RAS triggered and full reset isn't supported
@@ -4777,7 +4861,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	 * if didn't get the device lock, don't touch the linked list since
 	 * others may iterating it.
 	 */
-	r = amdgpu_device_lock_hive_adev(adev, hive);
+	r = amdgpu_hive_recovery_enter(adev, hive);
 	if (r) {
 		dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress",
 					job ? job->base.id : -1);
@@ -4884,6 +4968,16 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	}
 
 	tmp_vram_lost_counter = atomic_read(&((adev)->vram_lost_counter));
+	/*
+	 * The pre-reset functions are called before taking the lock, which makes
+	 * sure that other threads owning the reset lock exit successfully. No
+	 * other thread runs in the driver while the recovery thread runs.
+	 */
+	if (!locked) {
+		amdgpu_device_lock_hive_adev(adev, hive);
+		locked = true;
+	}
+
 	/* Actual ASIC resets if needed.*/
 	/* TODO Implement XGMI hive reset logic for SRIOV */
 	if (amdgpu_sriov_vf(adev)) {
@@ -4955,7 +5049,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
 		if (audio_suspended)
 			amdgpu_device_resume_display_audio(tmp_adev);
-		amdgpu_device_unlock_adev(tmp_adev);
+		amdgpu_device_recovery_exit(tmp_adev);
+		if (locked)
+			amdgpu_device_unlock_adev(tmp_adev);
 	}
 
 skip_recovery:
@@ -5199,9 +5295,10 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
 		 * Locking adev->reset_sem will prevent any external access
 		 * to GPU during PCI error recovery
 		 */
-		while (!amdgpu_device_lock_adev(adev, NULL))
+		while (!amdgpu_device_recovery_enter(adev))
 			amdgpu_cancel_all_tdr(adev);
 
+		amdgpu_device_lock_adev(adev, NULL);
 		/*
 		 * Block any work scheduling as we do for regular GPU reset
 		 * for the duration of the recovery
-- 
2.17.1


* [PATCH 3/4] drm/amdgpu: instead of using down/up_read directly
  2021-03-18  7:23 [PATCH 0/4] Refine GPU recovery sequence to enhance its stability Dennis Li
  2021-03-18  7:23 ` [PATCH 1/4] drm/amdgpu: remove reset lock from low level functions Dennis Li
  2021-03-18  7:23 ` [PATCH 2/4] drm/amdgpu: refine the GPU recovery sequence Dennis Li
@ 2021-03-18  7:23 ` Dennis Li
  2021-03-18  7:23 ` [PATCH 4/4] drm/amdkfd: add reset lock protection for kfd entry functions Dennis Li
  2021-03-18  7:53 ` [PATCH 0/4] Refine GPU recovery sequence to enhance its stability Christian König
  4 siblings, 0 replies; 56+ messages in thread
From: Dennis Li @ 2021-03-18  7:23 UTC (permalink / raw)
  To: amd-gfx, Alexander.Deucher, felix.kuehling, Hawking.Zhang,
	christian.koenig
  Cc: Dennis Li

Change to use amdgpu_read_lock/unlock, which can handle more cases.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index bcaf271b39bf..66dec0f49c4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -59,11 +59,12 @@ int amdgpu_debugfs_wait_dump(struct amdgpu_device *adev)
 static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
 {
 	struct amdgpu_device *adev = inode->i_private;
+	struct drm_device *dev = adev_to_drm(adev);
 	int ret;
 
 	file->private_data = adev;
 
-	ret = down_read_killable(&adev->reset_sem);
+	ret = amdgpu_read_lock(dev, true);
 	if (ret)
 		return ret;
 
@@ -74,7 +75,7 @@ static int amdgpu_debugfs_autodump_open(struct inode *inode, struct file *file)
 		ret = -EBUSY;
 	}
 
-	up_read(&adev->reset_sem);
+	amdgpu_read_unlock(dev);
 
 	return ret;
 }
@@ -1206,7 +1207,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
 	}
 
 	/* Avoid accidently unparking the sched thread during GPU reset */
-	r = down_read_killable(&adev->reset_sem);
+	r = amdgpu_read_lock(dev, true);
 	if (r)
 		return r;
 
@@ -1235,7 +1236,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
 		kthread_unpark(ring->sched.thread);
 	}
 
-	up_read(&adev->reset_sem);
+	amdgpu_read_unlock(dev);
 
 	pm_runtime_mark_last_busy(dev->dev);
 	pm_runtime_put_autosuspend(dev->dev);
@@ -1427,6 +1428,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 	struct amdgpu_ring *ring;
 	struct dma_fence **fences = NULL;
 	struct amdgpu_device *adev = (struct amdgpu_device *)data;
+	struct drm_device *dev = adev_to_drm(adev);
 
 	if (val >= AMDGPU_MAX_RINGS)
 		return -EINVAL;
@@ -1446,7 +1448,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 		return -ENOMEM;
 
 	/* Avoid accidently unparking the sched thread during GPU reset */
-	r = down_read_killable(&adev->reset_sem);
+	r = amdgpu_read_lock(dev, true);
 	if (r)
 		goto pro_end;
 
@@ -1489,7 +1491,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 	/* restart the scheduler */
 	kthread_unpark(ring->sched.thread);
 
-	up_read(&adev->reset_sem);
+	amdgpu_read_unlock(dev);
 
 	ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, resched);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index 3ee481557fc9..113c63bf187f 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -247,12 +247,13 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
 	struct amdgpu_virt *virt = container_of(work, struct amdgpu_virt, flr_work);
 	struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt);
 	int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
+	struct drm_device *dev = adev_to_drm(adev);
 
 	/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
 	 * otherwise the mailbox msg will be ruined/reseted by
 	 * the VF FLR.
 	 */
-	if (!down_read_trylock(&adev->reset_sem))
+	if (amdgpu_read_lock(dev, true))
 		return;
 
 	amdgpu_virt_fini_data_exchange(adev);
@@ -268,7 +269,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
 
 flr_done:
 	atomic_set(&adev->in_gpu_reset, 0);
-	up_read(&adev->reset_sem);
+	amdgpu_read_unlock(dev);
 
 	/* Trigger recovery for world switch failure if no TDR */
 	if (amdgpu_device_should_recover_gpu(adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index 48e588d3c409..2cd910e5caa7 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -268,12 +268,13 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 	struct amdgpu_virt *virt = container_of(work, struct amdgpu_virt, flr_work);
 	struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt);
 	int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;
+	struct drm_device *dev = adev_to_drm(adev);
 
 	/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
 	 * otherwise the mailbox msg will be ruined/reseted by
 	 * the VF FLR.
 	 */
-	if (!down_read_trylock(&adev->reset_sem))
+	if (amdgpu_read_lock(dev, true))
 		return;
 
 	amdgpu_virt_fini_data_exchange(adev);
@@ -289,7 +290,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 
 flr_done:
 	atomic_set(&adev->in_gpu_reset, 0);
-	up_read(&adev->reset_sem);
+	amdgpu_read_unlock(dev);
 
 	/* Trigger recovery for world switch failure if no TDR */
 	if (amdgpu_device_should_recover_gpu(adev)
-- 
2.17.1


* [PATCH 4/4] drm/amdkfd: add reset lock protection for kfd entry functions
  2021-03-18  7:23 [PATCH 0/4] Refine GPU recovery sequence to enhance its stability Dennis Li
                   ` (2 preceding siblings ...)
  2021-03-18  7:23 ` [PATCH 3/4] drm/amdgpu: instead of using down/up_read directly Dennis Li
@ 2021-03-18  7:23 ` Dennis Li
  2021-03-18  7:53 ` [PATCH 0/4] Refine GPU recovery sequence to enhance its stability Christian König
  4 siblings, 0 replies; 56+ messages in thread
From: Dennis Li @ 2021-03-18  7:23 UTC (permalink / raw)
  To: amd-gfx, Alexander.Deucher, felix.kuehling, Hawking.Zhang,
	christian.koenig
  Cc: Dennis Li

When doing a GPU reset, try to block all kfd functions that may access
the hardware, including the kfd ioctls and the file close function.

v2: fix a potential recursive locking issue

kfd_ioctl_dbg_register may call into pqm_create_queue, which would cause
recursive locking. So remove the read lock from the process queue manager
and take it in the related ioctls instead.

v3: put pqm_query_dev_by_qid under the protection of p->mutex

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6802c616e10e..283ba9435233 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -40,6 +40,7 @@
 #include "kfd_dbgmgr.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
+#include "amdgpu.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
@@ -298,6 +299,9 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 	}
 
 	mutex_lock(&p->mutex);
+	err = amdgpu_read_lock(dev->ddev, true);
+	if (err)
+		goto err_read_lock;
 
 	pdd = kfd_bind_process_to_device(dev, p);
 	if (IS_ERR(pdd)) {
@@ -326,6 +330,7 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 		 */
 		args->doorbell_offset |= doorbell_offset_in_process;
 
+	amdgpu_read_unlock(dev->ddev);
 	mutex_unlock(&p->mutex);
 
 	pr_debug("Queue id %d was created successfully\n", args->queue_id);
@@ -343,6 +348,8 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 
 err_create_queue:
 err_bind_process:
+	amdgpu_read_unlock(dev->ddev);
+err_read_lock:
 	mutex_unlock(&p->mutex);
 	return err;
 }
@@ -352,6 +359,7 @@ static int kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p,
 {
 	int retval;
 	struct kfd_ioctl_destroy_queue_args *args = data;
+	struct kfd_dev *dev;
 
 	pr_debug("Destroying queue id %d for pasid 0x%x\n",
 				args->queue_id,
@@ -359,8 +367,20 @@ static int kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p,
 
 	mutex_lock(&p->mutex);
 
+	dev = pqm_query_dev_by_qid(&p->pqm, args->queue_id);
+	if (!dev) {
+		retval = -EINVAL;
+		goto err_query_dev;
+	}
+
+	retval = amdgpu_read_lock(dev->ddev, true);
+	if (retval)
+		goto err_read_lock;
 	retval = pqm_destroy_queue(&p->pqm, args->queue_id);
+	amdgpu_read_unlock(dev->ddev);
 
+err_read_lock:
+err_query_dev:
 	mutex_unlock(&p->mutex);
 	return retval;
 }
@@ -371,6 +391,7 @@ static int kfd_ioctl_update_queue(struct file *filp, struct kfd_process *p,
 	int retval;
 	struct kfd_ioctl_update_queue_args *args = data;
 	struct queue_properties properties;
+	struct kfd_dev *dev;
 
 	if (args->queue_percentage > KFD_MAX_QUEUE_PERCENTAGE) {
 		pr_err("Queue percentage must be between 0 to KFD_MAX_QUEUE_PERCENTAGE\n");
@@ -404,10 +425,21 @@ static int kfd_ioctl_update_queue(struct file *filp, struct kfd_process *p,
 
 	mutex_lock(&p->mutex);
 
+	dev = pqm_query_dev_by_qid(&p->pqm, args->queue_id);
+	if (!dev) {
+		retval = -EINVAL;
+		goto err_query_dev;
+	}
+
+	retval = amdgpu_read_lock(dev->ddev, true);
+	if (retval)
+		goto err_read_lock;
 	retval = pqm_update_queue(&p->pqm, args->queue_id, &properties);
+	amdgpu_read_unlock(dev->ddev);
 
+err_read_lock:
+err_query_dev:
 	mutex_unlock(&p->mutex);
-
 	return retval;
 }
 
@@ -420,6 +452,7 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
 	struct queue_properties properties;
 	uint32_t __user *cu_mask_ptr = (uint32_t __user *)args->cu_mask_ptr;
 	size_t cu_mask_size = sizeof(uint32_t) * (args->num_cu_mask / 32);
+	struct kfd_dev *dev;
 
 	if ((args->num_cu_mask % 32) != 0) {
 		pr_debug("num_cu_mask 0x%x must be a multiple of 32",
@@ -456,8 +489,20 @@ static int kfd_ioctl_set_cu_mask(struct file *filp, struct kfd_process *p,
 
 	mutex_lock(&p->mutex);
 
+	dev = pqm_query_dev_by_qid(&p->pqm, args->queue_id);
+	if (!dev) {
+		retval = -EINVAL;
+		goto err_query_dev;
+	}
+
+	retval = amdgpu_read_lock(dev->ddev, true);
+	if (retval)
+		goto err_read_lock;
 	retval = pqm_set_cu_mask(&p->pqm, args->queue_id, &properties);
+	amdgpu_read_unlock(dev->ddev);
 
+err_read_lock:
+err_query_dev:
 	mutex_unlock(&p->mutex);
 
 	if (retval)
@@ -471,14 +516,27 @@ static int kfd_ioctl_get_queue_wave_state(struct file *filep,
 {
 	struct kfd_ioctl_get_queue_wave_state_args *args = data;
 	int r;
+	struct kfd_dev *dev;
 
 	mutex_lock(&p->mutex);
 
+	dev = pqm_query_dev_by_qid(&p->pqm, args->queue_id);
+	if (!dev) {
+		r = -EINVAL;
+		goto err_query_dev;
+	}
+
+	r = amdgpu_read_lock(dev->ddev, true);
+	if (r)
+		goto err_read_lock;
 	r = pqm_get_wave_state(&p->pqm, args->queue_id,
 			       (void __user *)args->ctl_stack_address,
 			       &args->ctl_stack_used_size,
 			       &args->save_area_used_size);
+	amdgpu_read_unlock(dev->ddev);
 
+err_read_lock:
+err_query_dev:
 	mutex_unlock(&p->mutex);
 
 	return r;
@@ -509,6 +567,10 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
 
 	mutex_lock(&p->mutex);
 
+	err = amdgpu_read_lock(dev->ddev, true);
+	if (err)
+		goto err_read_lock;
+
 	pdd = kfd_bind_process_to_device(dev, p);
 	if (IS_ERR(pdd)) {
 		err = -ESRCH;
@@ -531,6 +593,9 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
 		err = -EINVAL;
 
 out:
+	amdgpu_read_unlock(dev->ddev);
+
+err_read_lock:
 	mutex_unlock(&p->mutex);
 
 	return err;
@@ -550,6 +615,10 @@ static int kfd_ioctl_set_trap_handler(struct file *filep,
 
 	mutex_lock(&p->mutex);
 
+	err = amdgpu_read_lock(dev->ddev, true);
+	if (err)
+		goto err_read_lock;
+
 	pdd = kfd_bind_process_to_device(dev, p);
 	if (IS_ERR(pdd)) {
 		err = -ESRCH;
@@ -559,6 +628,9 @@ static int kfd_ioctl_set_trap_handler(struct file *filep,
 	kfd_process_set_trap_handler(&pdd->qpd, args->tba_addr, args->tma_addr);
 
 out:
+	amdgpu_read_unlock(dev->ddev);
+
+err_read_lock:
 	mutex_unlock(&p->mutex);
 
 	return err;
@@ -584,6 +656,11 @@ static int kfd_ioctl_dbg_register(struct file *filep,
 	}
 
 	mutex_lock(&p->mutex);
+
+	status = amdgpu_read_lock(dev->ddev, true);
+	if (status)
+		goto err_read_lock;
+
 	mutex_lock(kfd_get_dbgmgr_mutex());
 
 	/*
@@ -613,6 +690,9 @@ static int kfd_ioctl_dbg_register(struct file *filep,
 
 out:
 	mutex_unlock(kfd_get_dbgmgr_mutex());
+	amdgpu_read_unlock(dev->ddev);
+
+err_read_lock:
 	mutex_unlock(&p->mutex);
 
 	return status;
@@ -634,6 +714,10 @@ static int kfd_ioctl_dbg_unregister(struct file *filep,
 		return -EINVAL;
 	}
 
+	status = amdgpu_read_lock(dev->ddev, true);
+	if (status)
+		return status;
+
 	mutex_lock(kfd_get_dbgmgr_mutex());
 
 	status = kfd_dbgmgr_unregister(dev->dbgmgr, p);
@@ -644,6 +728,8 @@ static int kfd_ioctl_dbg_unregister(struct file *filep,
 
 	mutex_unlock(kfd_get_dbgmgr_mutex());
 
+	amdgpu_read_unlock(dev->ddev);
+
 	return status;
 }
 
@@ -743,15 +829,19 @@ static int kfd_ioctl_dbg_address_watch(struct file *filep,
 	/* Currently HSA Event is not supported for DBG */
 	aw_info.watch_event = NULL;
 
+	status = amdgpu_read_lock(dev->ddev, true);
+	if (status)
+		goto out;
+
 	mutex_lock(kfd_get_dbgmgr_mutex());
 
 	status = kfd_dbgmgr_address_watch(dev->dbgmgr, &aw_info);
 
 	mutex_unlock(kfd_get_dbgmgr_mutex());
 
+	amdgpu_read_unlock(dev->ddev);
 out:
 	kfree(args_buff);
-
 	return status;
 }
 
@@ -822,6 +912,10 @@ static int kfd_ioctl_dbg_wave_control(struct file *filep,
 					*((uint32_t *)(&args_buff[args_idx]));
 	wac_info.dbgWave_msg.MemoryVA = NULL;
 
+	status = amdgpu_read_lock(dev->ddev, true);
+	if (status)
+		goto pro_end;
+
 	mutex_lock(kfd_get_dbgmgr_mutex());
 
 	pr_debug("Calling dbg manager process %p, operand %u, mode %u, trapId %u, message %u\n",
@@ -835,6 +929,9 @@ static int kfd_ioctl_dbg_wave_control(struct file *filep,
 
 	mutex_unlock(kfd_get_dbgmgr_mutex());
 
+	amdgpu_read_unlock(dev->ddev);
+
+pro_end:
 	kfree(args_buff);
 
 	return status;
@@ -847,10 +944,11 @@ static int kfd_ioctl_get_clock_counters(struct file *filep,
 	struct kfd_dev *dev;
 
 	dev = kfd_device_by_id(args->gpu_id);
-	if (dev)
+	if (dev && !amdgpu_read_lock(dev->ddev, true)) {
 		/* Reading GPU clock counter from KGD */
 		args->gpu_clock_counter = amdgpu_amdkfd_get_gpu_clock_counter(dev->kgd);
-	else
+		amdgpu_read_unlock(dev->ddev);
+	} else
 		/* Node without GPU resource */
 		args->gpu_clock_counter = 0;
 
@@ -1056,13 +1154,20 @@ static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
 		}
 		mutex_unlock(&p->mutex);
 
+		err = amdgpu_read_lock(kfd->ddev, true);
+		if (err)
+			return err;
+
 		err = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(kfd->kgd,
 						mem, &kern_addr, &size);
 		if (err) {
 			pr_err("Failed to map event page to kernel\n");
+			amdgpu_read_unlock(kfd->ddev);
 			return err;
 		}
 
+		amdgpu_read_unlock(kfd->ddev);
+
 		err = kfd_event_page_set(p, kern_addr, size);
 		if (err) {
 			pr_err("Failed to set event page\n");
@@ -1144,11 +1249,17 @@ static int kfd_ioctl_set_scratch_backing_va(struct file *filep,
 
 	mutex_unlock(&p->mutex);
 
+	err = amdgpu_read_lock(dev->ddev, true);
+	if (err)
+		return err;
+
 	if (dev->dqm->sched_policy == KFD_SCHED_POLICY_NO_HWS &&
 	    pdd->qpd.vmid != 0 && dev->kfd2kgd->set_scratch_backing_va)
 		dev->kfd2kgd->set_scratch_backing_va(
 			dev->kgd, args->va_addr, pdd->qpd.vmid);
 
+	amdgpu_read_unlock(dev->ddev);
+
 	return 0;
 
 bind_process_to_device_fail:
@@ -1217,6 +1328,10 @@ static int kfd_ioctl_acquire_vm(struct file *filep, struct kfd_process *p,
 
 	mutex_lock(&p->mutex);
 
+	ret = amdgpu_read_lock(dev->ddev, true);
+	if (ret)
+		goto err_read_lock;
+
 	pdd = kfd_get_process_device_data(dev, p);
 	if (!pdd) {
 		ret = -EINVAL;
@@ -1231,12 +1346,16 @@ static int kfd_ioctl_acquire_vm(struct file *filep, struct kfd_process *p,
 	ret = kfd_process_device_init_vm(pdd, drm_file);
 	if (ret)
 		goto err_unlock;
+
+	amdgpu_read_unlock(dev->ddev);
 	/* On success, the PDD keeps the drm_file reference */
 	mutex_unlock(&p->mutex);
 
 	return 0;
 
 err_unlock:
+	amdgpu_read_unlock(dev->ddev);
+err_read_lock:
 	mutex_unlock(&p->mutex);
 	fput(drm_file);
 	return ret;
@@ -1289,6 +1408,10 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
 
 	mutex_lock(&p->mutex);
 
+	err = amdgpu_read_lock(dev->ddev, true);
+	if (err)
+		goto err_read_lock;
+
 	pdd = kfd_bind_process_to_device(dev, p);
 	if (IS_ERR(pdd)) {
 		err = PTR_ERR(pdd);
@@ -1331,6 +1454,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
 	if (flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
 		WRITE_ONCE(pdd->vram_usage, pdd->vram_usage + args->size);
 
+	amdgpu_read_unlock(dev->ddev);
 	mutex_unlock(&p->mutex);
 
 	args->handle = MAKE_HANDLE(args->gpu_id, idr_handle);
@@ -1348,6 +1472,8 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
 err_free:
 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem, NULL);
 err_unlock:
+	amdgpu_read_unlock(dev->ddev);
+err_read_lock:
 	mutex_unlock(&p->mutex);
 	return err;
 }
@@ -1368,6 +1494,10 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
 
 	mutex_lock(&p->mutex);
 
+	ret = amdgpu_read_lock(dev->ddev, true);
+	if (ret)
+		goto err_read_lock;
+
 	pdd = kfd_get_process_device_data(dev, p);
 	if (!pdd) {
 		pr_err("Process device data doesn't exist\n");
@@ -1395,6 +1525,8 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
 	WRITE_ONCE(pdd->vram_usage, pdd->vram_usage - size);
 
 err_unlock:
+	amdgpu_read_unlock(dev->ddev);
+err_read_lock:
 	mutex_unlock(&p->mutex);
 	return ret;
 }
@@ -1465,13 +1597,21 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 			err = PTR_ERR(peer_pdd);
 			goto get_mem_obj_from_handle_failed;
 		}
+
+		err = amdgpu_read_lock(peer->ddev, true);
+		if (err)
+			goto map_memory_to_gpu_failed;
+
 		err = amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 			peer->kgd, (struct kgd_mem *)mem, peer_pdd->vm);
 		if (err) {
 			pr_err("Failed to map to gpu %d/%d\n",
 			       i, args->n_devices);
+			amdgpu_read_unlock(peer->ddev);
 			goto map_memory_to_gpu_failed;
 		}
+
+		amdgpu_read_unlock(peer->ddev);
 		args->n_success = i+1;
 	}
 
@@ -1491,7 +1631,10 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 		peer_pdd = kfd_get_process_device_data(peer, p);
 		if (WARN_ON_ONCE(!peer_pdd))
 			continue;
-		kfd_flush_tlb(peer_pdd);
+		if (!amdgpu_read_lock(peer->ddev, true)) {
+			kfd_flush_tlb(peer_pdd);
+			amdgpu_read_unlock(peer->ddev);
+		}
 	}
 
 	kfree(devices_arr);
@@ -1572,13 +1715,20 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 			err = -ENODEV;
 			goto get_mem_obj_from_handle_failed;
 		}
+
+		err = amdgpu_read_lock(peer->ddev, true);
+		if (err)
+			goto unmap_memory_from_gpu_failed;
+
 		err = amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 			peer->kgd, (struct kgd_mem *)mem, peer_pdd->vm);
 		if (err) {
 			pr_err("Failed to unmap from gpu %d/%d\n",
 			       i, args->n_devices);
+			amdgpu_read_unlock(peer->ddev);
 			goto unmap_memory_from_gpu_failed;
 		}
+		amdgpu_read_unlock(peer->ddev);
 		args->n_success = i+1;
 	}
 	kfree(devices_arr);
@@ -1624,7 +1774,13 @@ static int kfd_ioctl_alloc_queue_gws(struct file *filep,
 		goto out_unlock;
 	}
 
+	retval = amdgpu_read_lock(dev->ddev, true);
+	if (retval)
+		goto out_unlock;
+
 	retval = pqm_set_gws(&p->pqm, args->queue_id, args->num_gws ? dev->gws : NULL);
+
+	amdgpu_read_unlock(dev->ddev);
 	mutex_unlock(&p->mutex);
 
 	args->first_gws = 0;
@@ -1711,6 +1867,9 @@ static int kfd_ioctl_import_dmabuf(struct file *filep,
 		return PTR_ERR(dmabuf);
 
 	mutex_lock(&p->mutex);
+	r = amdgpu_read_lock(dev->ddev, true);
+	if (r)
+		goto err_read_lock;
 
 	pdd = kfd_bind_process_to_device(dev, p);
 	if (IS_ERR(pdd)) {
@@ -1731,6 +1890,7 @@ static int kfd_ioctl_import_dmabuf(struct file *filep,
 		goto err_free;
 	}
 
+	amdgpu_read_unlock(dev->ddev);
 	mutex_unlock(&p->mutex);
 	dma_buf_put(dmabuf);
 
@@ -1741,6 +1901,8 @@ static int kfd_ioctl_import_dmabuf(struct file *filep,
 err_free:
 	amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem, NULL);
 err_unlock:
+	amdgpu_read_unlock(dev->ddev);
+err_read_lock:
 	mutex_unlock(&p->mutex);
 	dma_buf_put(dmabuf);
 	return r;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index d8c8b5ff449a..5ea25c7dff0d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1011,7 +1011,8 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
 		       void __user *ctl_stack,
 		       u32 *ctl_stack_used_size,
 		       u32 *save_area_used_size);
-
+struct kfd_dev *pqm_query_dev_by_qid(struct process_queue_manager *pqm,
+				     unsigned int qid);
 int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
 			      unsigned int fence_value,
 			      unsigned int timeout_ms);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index f5237997fa18..d02ca231ad83 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -898,11 +898,15 @@ static void kfd_process_device_free_bos(struct kfd_process_device *pdd)
 				    per_device_list) {
 			if (!peer_pdd->vm)
 				continue;
+			amdgpu_read_lock(peer_pdd->dev->ddev, false);
 			amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 				peer_pdd->dev->kgd, mem, peer_pdd->vm);
+			amdgpu_read_unlock(peer_pdd->dev->ddev);
 		}
 
+		amdgpu_read_lock(pdd->dev->ddev, false);
 		amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd, mem, NULL);
+		amdgpu_read_unlock(pdd->dev->ddev);
 		kfd_process_device_remove_obj_handle(pdd, id);
 	}
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index eb1635ac8988..2b2308c0b006 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -64,6 +64,23 @@ static int find_available_queue_slot(struct process_queue_manager *pqm,
 	return 0;
 }
 
+struct kfd_dev *pqm_query_dev_by_qid(struct process_queue_manager *pqm,
+				     unsigned int qid)
+{
+	struct process_queue_node *pqn;
+
+	pqn = get_queue_by_qid(pqm, qid);
+	if (!pqn) {
+		pr_err("Queue id does not match any known queue\n");
+		return NULL;
+	}
+
+	if (pqn->q)
+		return pqn->q->device;
+
+	return NULL;
+}
+
 void kfd_process_dequeue_from_device(struct kfd_process_device *pdd)
 {
 	struct kfd_dev *dev = pdd->dev;
-- 
2.17.1


* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-03-18  7:23 [PATCH 0/4] Refine GPU recovery sequence to enhance its stability Dennis Li
                   ` (3 preceding siblings ...)
  2021-03-18  7:23 ` [PATCH 4/4] drm/amdkfd: add reset lock protection for kfd entry functions Dennis Li
@ 2021-03-18  7:53 ` Christian König
  2021-03-18  8:28   ` Li, Dennis
  4 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-03-18  7:53 UTC (permalink / raw)
  To: Dennis Li, amd-gfx, Alexander.Deucher, felix.kuehling, Hawking.Zhang

On 18.03.21 at 08:23, Dennis Li wrote:
> We have defined two variables, in_gpu_reset and reset_sem, in the adev object. The atomic variable in_gpu_reset is used to prevent the recovery thread from reentering and to make lower-level functions return earlier when recovery starts, but it cannot block other threads while the recovery thread accesses the hardware. The r/w semaphore reset_sem is used to solve these synchronization issues between the recovery thread and other threads.
>
> The original solution took the reset lock around register accesses in the lower-level functions, which introduces the following issues:
>
> 1) Many lower-level functions are used by both the recovery thread and other threads. Firstly, we must identify all of these functions, and it is easy to miss some. Secondly, each of these functions must select which lock (read or write) to take according to the thread it runs in; if the thread context isn't considered, the added lock easily introduces deadlocks. Besides that, developers often forget to add locks to new functions.
>
> 2) Performance drop. Lower-level functions are called very frequently, so locking in them adds noticeable overhead.
>
> 3) It easily triggers false positive lockdep complaints, because the write lock covers a large range in the recovery thread, while the low-level functions holding the read lock may also be protected by other locks in other threads.
>
> Therefore the new solution instead adds lock protection to the kfd ioctls. Its goal is that no thread other than the recovery thread or its children (for XGMI) accesses the hardware while a GPU reset and resume is in progress. The recovery thread is refined as follows:
>
> Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>     1) If it fails, a recovery thread is already running and the current thread exits directly.
>     2) If it succeeds, enter the recovery thread.
>
> Step 1: Cancel all delayed work, stop the DRM scheduler, complete all outstanding fences and so on. This tries to stop or pause the other threads.
>
> Step 2: Call down_write(&adev->reset_sem) to take the write lock, which blocks the recovery thread until the other threads have released their read locks.

Those two steps need to be exchanged or otherwise it is possible that 
new delayed work items etc are started before the lock is taken.

Just to make it clear: until this is fixed, the whole patch set is a NAK.

Regards,
Christian.

>
> Step 3: Normally only the recovery thread is left accessing the hardware now, so it is safe to do the GPU reset.
>
> Step 4: Do the post-reset work, such as calling all IPs' resume functions.
>
> Step 5: Atomically set adev->in_gpu_reset back to 0, wake up the other threads and release the write lock. The recovery thread exits normally.
>
> Other threads call amdgpu_read_lock to synchronize with the recovery thread. If a thread finds that in_gpu_reset is 1, it releases the read lock if it holds one and then blocks, waiting for the recovery-finished event. If the thread successfully takes the read lock and in_gpu_reset is 0, it continues; it will either exit normally or be stopped by the recovery thread in step 1.
>
> Dennis Li (4):
>    drm/amdgpu: remove reset lock from low level functions
>    drm/amdgpu: refine the GPU recovery sequence
>    drm/amdgpu: instead of using down/up_read directly
>    drm/amdkfd: add reset lock protection for kfd entry functions
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   6 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  14 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 173 +++++++++++++-----
>   .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c    |   8 -
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        |   4 +-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         |   9 +-
>   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         |   5 +-
>   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c         |   5 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 172 ++++++++++++++++-
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   4 +
>   .../amd/amdkfd/kfd_process_queue_manager.c    |  17 ++
>   12 files changed, 345 insertions(+), 75 deletions(-)
>


* Re: [PATCH 2/4] drm/amdgpu: refine the GPU recovery sequence
  2021-03-18  7:23 ` [PATCH 2/4] drm/amdgpu: refine the GPU recovery sequence Dennis Li
@ 2021-03-18  7:56   ` Christian König
  0 siblings, 0 replies; 56+ messages in thread
From: Christian König @ 2021-03-18  7:56 UTC (permalink / raw)
  To: Dennis Li, amd-gfx, Alexander.Deucher, felix.kuehling, Hawking.Zhang

On 18.03.21 at 08:23, Dennis Li wrote:
> Change to only set in_gpu_reset to 1 when the recovery thread begins, and
> delay taking reset_sem until after the pre-reset work but before the reset
> itself. This makes sure that other threads have exited or been blocked
> before the GPU reset is done. Compared with the old code, it lets some
> threads exit earlier without waiting for a timeout.
>
> Introduce an event, recovery_fini_event, which is used to block new
> threads once the recovery thread has begun. These threads are only woken
> up when the recovery thread exits.
>
> v2: remove the code that checks the usage of adev->reset_sem, because
> lockdep already shows all locks held in the system when a hung-task
> timeout is detected in the recovery thread.
>
> Signed-off-by: Dennis Li <Dennis.Li@amd.com>
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 02a34f9a26aa..67c716e5ee8d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1044,6 +1044,8 @@ struct amdgpu_device {
>   	atomic_t 			in_gpu_reset;
>   	enum pp_mp1_state               mp1_state;
>   	struct rw_semaphore reset_sem;
> +	wait_queue_head_t recovery_fini_event;
> +
>   	struct amdgpu_doorbell_index doorbell_index;
>   
>   	struct mutex			notifier_lock;
> @@ -1406,4 +1408,8 @@ static inline int amdgpu_in_reset(struct amdgpu_device *adev)
>   {
>   	return atomic_read(&adev->in_gpu_reset);
>   }
> +
> +int amdgpu_read_lock(struct drm_device *dev, bool interruptible);
> +void amdgpu_read_unlock(struct drm_device *dev);
> +
>   #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 24ff5992cb02..15235610cc54 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -211,6 +211,60 @@ static ssize_t amdgpu_device_get_serial_number(struct device *dev,
>   static DEVICE_ATTR(serial_number, S_IRUGO,
>   		amdgpu_device_get_serial_number, NULL);
>   
> +int amdgpu_read_lock(struct drm_device *dev, bool interruptible)
> +{
> +	struct amdgpu_device *adev = drm_to_adev(dev);
> +	int ret = 0;
> +
> +	/*
> +	 * If a thread holds the read lock but the recovery thread has started,
> +	 * it should release the read lock and wait for the recovery thread to
> +	 * finish, because the pre-reset functions have begun, which stop the
> +	 * old threads but do not include the current thread.
> +	 */
> +	if (interruptible) {
> +		while (!(ret = down_read_killable(&adev->reset_sem)) &&
> +			amdgpu_in_reset(adev)) {
> +			up_read(&adev->reset_sem);
> +			ret = wait_event_interruptible(adev->recovery_fini_event,
> +							!amdgpu_in_reset(adev));
> +			if (ret)
> +				break;
> +		}
> +	} else {
> +		down_read(&adev->reset_sem);
> +		while (amdgpu_in_reset(adev)) {
> +			up_read(&adev->reset_sem);
> +			wait_event(adev->recovery_fini_event,
> +				   !amdgpu_in_reset(adev));
> +			down_read(&adev->reset_sem);
> +		}
> +	}

Ok, once more: this general approach is a NAK. We have already tried this
and it doesn't work.

All you do here is replace the GPU reset lock with
wait_event_interruptible().

From an upstream perspective that is strictly illegal, since it will just
prevent lockdep warnings from filling the logs and doesn't really solve
any problem.

Regards,
Christian.

> +
> +	return ret;
> +}
> +
> +void amdgpu_read_unlock(struct drm_device *dev)
> +{
> +	struct amdgpu_device *adev = drm_to_adev(dev);
> +
> +	up_read(&adev->reset_sem);
> +}
> +
> +static void amdgpu_write_lock(struct amdgpu_device *adev, struct amdgpu_hive_info *hive)
> +{
> +	if (hive) {
> +		down_write_nest_lock(&adev->reset_sem, &hive->hive_lock);
> +	} else {
> +		down_write(&adev->reset_sem);
> +	}
> +}
> +
> +static void amdgpu_write_unlock(struct amdgpu_device *adev)
> +{
> +	up_write(&adev->reset_sem);
> +}
> +
>   /**
>    * amdgpu_device_supports_atpx - Is the device a dGPU with HG/PX power control
>    *
> @@ -3280,6 +3334,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   	hash_init(adev->mn_hash);
>   	atomic_set(&adev->in_gpu_reset, 0);
>   	init_rwsem(&adev->reset_sem);
> +	init_waitqueue_head(&adev->recovery_fini_event);
>   	mutex_init(&adev->psp.mutex);
>   	mutex_init(&adev->notifier_lock);
>   
> @@ -4509,39 +4564,18 @@ int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
>   	return r;
>   }
>   
> -static bool amdgpu_device_lock_adev(struct amdgpu_device *adev,
> -				struct amdgpu_hive_info *hive)
> +static bool amdgpu_device_recovery_enter(struct amdgpu_device *adev)
>   {
>   	if (atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) != 0)
>   		return false;
>   
> -	if (hive) {
> -		down_write_nest_lock(&adev->reset_sem, &hive->hive_lock);
> -	} else {
> -		down_write(&adev->reset_sem);
> -	}
> -
> -	switch (amdgpu_asic_reset_method(adev)) {
> -	case AMD_RESET_METHOD_MODE1:
> -		adev->mp1_state = PP_MP1_STATE_SHUTDOWN;
> -		break;
> -	case AMD_RESET_METHOD_MODE2:
> -		adev->mp1_state = PP_MP1_STATE_RESET;
> -		break;
> -	default:
> -		adev->mp1_state = PP_MP1_STATE_NONE;
> -		break;
> -	}
> -
>   	return true;
>   }
>   
> -static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
> +static void amdgpu_device_recovery_exit(struct amdgpu_device *adev)
>   {
> -	amdgpu_vf_error_trans_all(adev);
> -	adev->mp1_state = PP_MP1_STATE_NONE;
>   	atomic_set(&adev->in_gpu_reset, 0);
> -	up_write(&adev->reset_sem);
> +	wake_up_interruptible_all(&adev->recovery_fini_event);
>   }
>   
>   /*
> @@ -4550,7 +4584,7 @@ static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
>    *
>    * unlock won't require roll back.
>    */
> -static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgpu_hive_info *hive)
> +static int amdgpu_hive_recovery_enter(struct amdgpu_device *adev, struct amdgpu_hive_info *hive)
>   {
>   	struct amdgpu_device *tmp_adev = NULL;
>   
> @@ -4560,10 +4594,10 @@ static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgp
>   			return -ENODEV;
>   		}
>   		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
> -			if (!amdgpu_device_lock_adev(tmp_adev, hive))
> +			if (!amdgpu_device_recovery_enter(tmp_adev))
>   				goto roll_back;
>   		}
> -	} else if (!amdgpu_device_lock_adev(adev, hive))
> +	} else if (!amdgpu_device_recovery_enter(adev))
>   		return -EAGAIN;
>   
>   	return 0;
> @@ -4578,12 +4612,61 @@ static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgp
>   		 */
>   		dev_warn(tmp_adev->dev, "Hive lock iteration broke in the middle. Rolling back to unlock");
>   		list_for_each_entry_continue_reverse(tmp_adev, &hive->device_list, gmc.xgmi.head) {
> -			amdgpu_device_unlock_adev(tmp_adev);
> +			amdgpu_device_recovery_exit(tmp_adev);
>   		}
>   	}
>   	return -EAGAIN;
>   }
>   
> +static void amdgpu_device_lock_adev(struct amdgpu_device *adev,
> +				struct amdgpu_hive_info *hive)
> +{
> +	amdgpu_write_lock(adev, hive);
> +
> +	switch (amdgpu_asic_reset_method(adev)) {
> +	case AMD_RESET_METHOD_MODE1:
> +		adev->mp1_state = PP_MP1_STATE_SHUTDOWN;
> +		break;
> +	case AMD_RESET_METHOD_MODE2:
> +		adev->mp1_state = PP_MP1_STATE_RESET;
> +		break;
> +	default:
> +		adev->mp1_state = PP_MP1_STATE_NONE;
> +		break;
> +	}
> +}
> +
> +static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
> +{
> +	amdgpu_vf_error_trans_all(adev);
> +	adev->mp1_state = PP_MP1_STATE_NONE;
> +	amdgpu_write_unlock(adev);
> +}
> +
> +/*
> + * to lockup a list of amdgpu devices in a hive safely, if not a hive
> + * with multiple nodes, it will be similar as amdgpu_device_lock_adev.
> + *
> + * unlock won't require roll back.
> + */
> +static void amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgpu_hive_info *hive)
> +{
> +	struct amdgpu_device *tmp_adev = NULL;
> +
> +	if (adev->gmc.xgmi.num_physical_nodes > 1) {
> +		if (!hive) {
> +			dev_err(adev->dev, "Hive is NULL while device has multiple xgmi nodes");
> +			return;
> +		}
> +
> +		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
> +			amdgpu_device_lock_adev(tmp_adev, hive);
> +		}
> +	} else {
> +		amdgpu_device_lock_adev(adev, hive);
> +	}
> +}
> +
>   static void amdgpu_device_resume_display_audio(struct amdgpu_device *adev)
>   {
>   	struct pci_dev *p = NULL;
> @@ -4732,6 +4815,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   	bool need_emergency_restart = false;
>   	bool audio_suspended = false;
>   	int tmp_vram_lost_counter;
> +	bool locked = false;
>   
>   	/*
>   	 * Special case: RAS triggered and full reset isn't supported
> @@ -4777,7 +4861,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   	 * if didn't get the device lock, don't touch the linked list since
>   	 * others may iterating it.
>   	 */
> -	r = amdgpu_device_lock_hive_adev(adev, hive);
> +	r = amdgpu_hive_recovery_enter(adev, hive);
>   	if (r) {
>   		dev_info(adev->dev, "Bailing on TDR for s_job:%llx, as another already in progress",
>   					job ? job->base.id : -1);
> @@ -4884,6 +4968,16 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   	}
>   
>   	tmp_vram_lost_counter = atomic_read(&((adev)->vram_lost_counter));
> +	/*
> +	 * Pre reset functions called before lock, which make sure other threads
> +	 * who own reset lock exit successfully. No other thread runs in the driver
> +	 * while the recovery thread runs
> +	 */
> +	if (!locked) {
> +		amdgpu_device_lock_hive_adev(adev, hive);
> +		locked = true;
> +	}
> +
>   	/* Actual ASIC resets if needed.*/
>   	/* TODO Implement XGMI hive reset logic for SRIOV */
>   	if (amdgpu_sriov_vf(adev)) {
> @@ -4955,7 +5049,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   
>   		if (audio_suspended)
>   			amdgpu_device_resume_display_audio(tmp_adev);
> -		amdgpu_device_unlock_adev(tmp_adev);
> +		amdgpu_device_recovery_exit(tmp_adev);
> +		if (locked)
> +			amdgpu_device_unlock_adev(tmp_adev);
>   	}
>   
>   skip_recovery:
> @@ -5199,9 +5295,10 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
>   		 * Locking adev->reset_sem will prevent any external access
>   		 * to GPU during PCI error recovery
>   		 */
> -		while (!amdgpu_device_lock_adev(adev, NULL))
> +		while (!amdgpu_device_recovery_enter(adev))
>   			amdgpu_cancel_all_tdr(adev);
>   
> +		amdgpu_device_lock_adev(adev, NULL);
>   		/*
>   		 * Block any work scheduling as we do for regular GPU reset
>   		 * for the duration of the recovery


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-03-18  7:53 ` [PATCH 0/4] Refine GPU recovery sequence to enhance its stability Christian König
@ 2021-03-18  8:28   ` Li, Dennis
  2021-03-18  8:58     ` AW: " Koenig, Christian
  0 siblings, 1 reply; 56+ messages in thread
From: Li, Dennis @ 2021-03-18  8:28 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx, Deucher, Alexander, Kuehling, Felix,
	Zhang, Hawking

>>> Those two steps need to be exchanged or otherwise it is possible that new delayed work items etc are started before the lock is taken.
What about adding a check for adev->in_gpu_reset in the work items? If we exchange the two steps, it may introduce a deadlock. For example, if a user thread holds the read lock while waiting for a fence, and the recovery thread tries to take the write lock before completing the fences, the recovery thread will be blocked forever.
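To make the scenario concrete, here is a minimal sketch (simplified pseudo-driver code, not the actual call paths; dma_fence_wait() and amdgpu_fence_driver_force_completion() just stand in for any fence wait / fence completion):

/* User thread: holds the read side while waiting for a fence that only
 * the recovery thread will complete. */
static void example_user_thread(struct amdgpu_device *adev,
                                struct dma_fence *fence)
{
        down_read(&adev->reset_sem);
        dma_fence_wait(fence, false);   /* blocks until recovery completes it */
        up_read(&adev->reset_sem);
}

/* Recovery thread with the two steps exchanged: it never reaches the
 * fence completion, because the write lock waits for the reader above. */
static void example_recovery_thread(struct amdgpu_device *adev,
                                    struct amdgpu_ring *ring)
{
        down_write(&adev->reset_sem);                   /* blocks forever */
        amdgpu_fence_driver_force_completion(ring);     /* never reached */
        up_write(&adev->reset_sem);
}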

Best Regards
Dennis Li
-----Original Message-----
From: Koenig, Christian <Christian.Koenig@amd.com> 
Sent: Thursday, March 18, 2021 3:54 PM
To: Li, Dennis <Dennis.Li@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

Am 18.03.21 um 08:23 schrieb Dennis Li:
> We have defined two variables in_gpu_reset and reset_sem in adev object. The atomic type variable in_gpu_reset is used to avoid recovery thread reenter and make lower functions return more earlier when recovery start, but couldn't block recovery thread when it access hardware. The r/w semaphore reset_sem is used to solve these synchronization issues between recovery thread and other threads.
>
> The original solution locked registers' access in lower functions, which will introduce following issues:
>
> 1) many lower functions are used in both recovery thread and others. Firstly we must harvest these functions, it is easy to miss someones. Secondly these functions need select which lock (read lock or write lock) will be used, according to the thread it is running in. If the thread context isn't considered, the added lock will easily introduce deadlock. Besides that, in most time, developer easily forget to add locks for new functions.
>
> 2) performance drop. More lower functions are more frequently called.
>
> 3) easily introduce false positive lockdep complaint, because write lock has big range in recovery thread, but low level functions will hold read lock may be protected by other locks in other threads.
>
> Therefore the new solution will try to add lock protection for ioctls of kfd. Its goal is that there are no threads except for recovery thread or its children (for xgmi) to access hardware when doing GPU reset and resume. So refine recovery thread as the following:
>
> Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>     1). if failed, it means system had a recovery thread running, current thread exit directly;
>     2). if success, enter recovery thread;
>
> Step 1: cancel all delay works, stop drm schedule, complete all unreceived fences and so on. It try to stop or pause other threads.
>
> Step 2: call down_write(&adev->reset_sem) to hold write lock, which will block recovery thread until other threads release read locks.

Those two steps need to be exchanged or otherwise it is possible that new delayed work items etc are started before the lock is taken.

Just to make it clear until this is fixed the whole patch set is a NAK.

Regards,
Christian.

>
> Step 3: normally, there is only recovery threads running to access hardware, it is safe to do gpu reset now.
>
> Step 4: do post gpu reset, such as call all ips' resume functions;
>
> Step 5: atomic set adev->in_gpu_reset as 0, wake up other threads and release write lock. Recovery thread exit normally.
>
> Other threads call the amdgpu_read_lock to synchronize with recovery thread. If it finds that in_gpu_reset is 1, it should release read lock if it has holden one, and then blocks itself to wait for recovery finished event. If thread successfully hold read lock and in_gpu_reset is 0, it continues. It will exit normally or be stopped by recovery thread in step 1.
>
> Dennis Li (4):
>    drm/amdgpu: remove reset lock from low level functions
>    drm/amdgpu: refine the GPU recovery sequence
>    drm/amdgpu: instead of using down/up_read directly
>    drm/amdkfd: add reset lock protection for kfd entry functions
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   6 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  14 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 173 +++++++++++++-----
>   .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c    |   8 -
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        |   4 +-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         |   9 +-
>   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         |   5 +-
>   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c         |   5 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 172 ++++++++++++++++-
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   4 +
>   .../amd/amdkfd/kfd_process_queue_manager.c    |  17 ++
>   12 files changed, 345 insertions(+), 75 deletions(-)
>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* AW: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-03-18  8:28   ` Li, Dennis
@ 2021-03-18  8:58     ` Koenig, Christian
  2021-03-18  9:30       ` Li, Dennis
  0 siblings, 1 reply; 56+ messages in thread
From: Koenig, Christian @ 2021-03-18  8:58 UTC (permalink / raw)
  To: Li, Dennis, amd-gfx, Deucher, Alexander, Kuehling, Felix, Zhang, Hawking



Exactly that's what you don't seem to understand.

The GPU reset doesn't complete the fences we wait for. It only completes the hardware fences as part of the reset.

So waiting for a fence while holding the reset lock is illegal and needs to be avoided.

Lockdep also complains about this when it is used correctly. The only reason it doesn't complain here is because you use an atomic+wait_event instead of a locking primitive.
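Roughly, the difference looks like this (illustrative sketch only, simplified):

/* With a real lock, lockdep tracks the read side of reset_sem, and
 * dma_fence_wait() has lockdep annotations of its own (with lockdep
 * enabled), so the "fence wait while holding the reset lock" dependency
 * becomes visible: */
static void example_with_rwsem(struct amdgpu_device *adev,
                               struct dma_fence *fence)
{
        down_read(&adev->reset_sem);
        dma_fence_wait(fence, false);
        up_read(&adev->reset_sem);
}

/* With an atomic flag plus wait_event there is no lock for lockdep to
 * track, so the same dependency goes unnoticed: */
static void example_with_atomic(struct amdgpu_device *adev)
{
        if (atomic_read(&adev->in_gpu_reset))
                wait_event_interruptible(adev->recovery_fini_event,
                                         !atomic_read(&adev->in_gpu_reset));
}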

Regards,
Christian.

________________________________
Von: Li, Dennis <Dennis.Li@amd.com>
Gesendet: Donnerstag, 18. März 2021 09:28
An: Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
Betreff: RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

>>> Those two steps need to be exchanged or otherwise it is possible that new delayed work items etc are started before the lock is taken.
What about adding check for adev->in_gpu_reset in work item? If exchange the two steps, it maybe introduce the deadlock.  For example, the user thread hold the read lock and waiting for the fence, if recovery thread try to hold write lock and then complete fences, in this case, recovery thread will always be blocked.

Best Regards
Dennis Li
-----Original Message-----
From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Thursday, March 18, 2021 3:54 PM
To: Li, Dennis <Dennis.Li@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

Am 18.03.21 um 08:23 schrieb Dennis Li:
> We have defined two variables in_gpu_reset and reset_sem in adev object. The atomic type variable in_gpu_reset is used to avoid recovery thread reenter and make lower functions return more earlier when recovery start, but couldn't block recovery thread when it access hardware. The r/w semaphore reset_sem is used to solve these synchronization issues between recovery thread and other threads.
>
> The original solution locked registers' access in lower functions, which will introduce following issues:
>
> 1) many lower functions are used in both recovery thread and others. Firstly we must harvest these functions, it is easy to miss someones. Secondly these functions need select which lock (read lock or write lock) will be used, according to the thread it is running in. If the thread context isn't considered, the added lock will easily introduce deadlock. Besides that, in most time, developer easily forget to add locks for new functions.
>
> 2) performance drop. More lower functions are more frequently called.
>
> 3) easily introduce false positive lockdep complaint, because write lock has big range in recovery thread, but low level functions will hold read lock may be protected by other locks in other threads.
>
> Therefore the new solution will try to add lock protection for ioctls of kfd. Its goal is that there are no threads except for recovery thread or its children (for xgmi) to access hardware when doing GPU reset and resume. So refine recovery thread as the following:
>
> Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>     1). if failed, it means system had a recovery thread running, current thread exit directly;
>     2). if success, enter recovery thread;
>
> Step 1: cancel all delay works, stop drm schedule, complete all unreceived fences and so on. It try to stop or pause other threads.
>
> Step 2: call down_write(&adev->reset_sem) to hold write lock, which will block recovery thread until other threads release read locks.

Those two steps need to be exchanged or otherwise it is possible that new delayed work items etc are started before the lock is taken.

Just to make it clear until this is fixed the whole patch set is a NAK.

Regards,
Christian.

>
> Step 3: normally, there is only recovery threads running to access hardware, it is safe to do gpu reset now.
>
> Step 4: do post gpu reset, such as call all ips' resume functions;
>
> Step 5: atomic set adev->in_gpu_reset as 0, wake up other threads and release write lock. Recovery thread exit normally.
>
> Other threads call the amdgpu_read_lock to synchronize with recovery thread. If it finds that in_gpu_reset is 1, it should release read lock if it has holden one, and then blocks itself to wait for recovery finished event. If thread successfully hold read lock and in_gpu_reset is 0, it continues. It will exit normally or be stopped by recovery thread in step 1.
>
> Dennis Li (4):
>    drm/amdgpu: remove reset lock from low level functions
>    drm/amdgpu: refine the GPU recovery sequence
>    drm/amdgpu: instead of using down/up_read directly
>    drm/amdkfd: add reset lock protection for kfd entry functions
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   6 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  14 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 173 +++++++++++++-----
>   .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c    |   8 -
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        |   4 +-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         |   9 +-
>   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         |   5 +-
>   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c         |   5 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 172 ++++++++++++++++-
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   4 +
>   .../amd/amdkfd/kfd_process_queue_manager.c    |  17 ++
>   12 files changed, 345 insertions(+), 75 deletions(-)
>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-03-18  8:58     ` AW: " Koenig, Christian
@ 2021-03-18  9:30       ` Li, Dennis
  2021-03-18  9:51         ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Li, Dennis @ 2021-03-18  9:30 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx, Deucher, Alexander, Kuehling, Felix,
	Zhang, Hawking



>>> The GPU reset doesn't complete the fences we wait for. It only completes the hardware fences as part of the reset.
>>> So waiting for a fence while holding the reset lock is illegal and needs to be avoided.
I understand your concern. It is more complex for DRM GFX, therefore I am abandoning the lock protection for DRM ioctls for now. Maybe we can try to add all kernel dma_fence waits to a list and signal them all in the recovery thread. Do you have the same concern for the compute cases?
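Purely as an illustration of that idea (the names and the adev fields below are made up, and this ignores the question of whether signalling these fences is actually safe):

/* Assumed new per-device state: a lock plus a list of in-flight waits. */
struct amdgpu_tracked_wait {
        struct list_head        node;
        struct dma_fence        *fence;
};

/* Hypothetical replacement for a bare dma_fence_wait() in kernel paths. */
static long amdgpu_tracked_fence_wait(struct amdgpu_device *adev,
                                      struct dma_fence *fence)
{
        struct amdgpu_tracked_wait wait = { .fence = fence };
        long r;

        spin_lock(&adev->tracked_wait_lock);
        list_add_tail(&wait.node, &adev->tracked_waits);
        spin_unlock(&adev->tracked_wait_lock);

        r = dma_fence_wait(fence, true);

        spin_lock(&adev->tracked_wait_lock);
        list_del(&wait.node);
        spin_unlock(&adev->tracked_wait_lock);

        return r;
}

/* The recovery thread would then walk adev->tracked_waits and signal
 * every entry before taking the write lock. */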

>>> Lockdep also complains about this when it is used correctly. The only reason it doesn't complain here is because you use an atomic+wait_event instead of a locking primitive.
Agreed. This approach escapes lockdep's monitoring. Its goal is to block other threads when the GPU recovery thread starts, but I couldn't find a better method to solve this problem. Do you have a suggestion?

Best Regards
Dennis Li

From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Thursday, March 18, 2021 4:59 PM
To: Li, Dennis <Dennis.Li@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
Subject: AW: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

Exactly that's what you don't seem to understand.

The GPU reset doesn't complete the fences we wait for. It only completes the hardware fences as part of the reset.

So waiting for a fence while holding the reset lock is illegal and needs to be avoided.

Lockdep also complains about this when it is used correctly. The only reason it doesn't complain here is because you use an atomic+wait_event instead of a locking primitive.

Regards,
Christian.

________________________________
Von: Li, Dennis <Dennis.Li@amd.com<mailto:Dennis.Li@amd.com>>
Gesendet: Donnerstag, 18. März 2021 09:28
An: Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Kuehling, Felix <Felix.Kuehling@amd.com<mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking <Hawking.Zhang@amd.com<mailto:Hawking.Zhang@amd.com>>
Betreff: RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

>>> Those two steps need to be exchanged or otherwise it is possible that new delayed work items etc are started before the lock is taken.
What about adding check for adev->in_gpu_reset in work item? If exchange the two steps, it maybe introduce the deadlock.  For example, the user thread hold the read lock and waiting for the fence, if recovery thread try to hold write lock and then complete fences, in this case, recovery thread will always be blocked.

Best Regards
Dennis Li
-----Original Message-----
From: Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Sent: Thursday, March 18, 2021 3:54 PM
To: Li, Dennis <Dennis.Li@amd.com<mailto:Dennis.Li@amd.com>>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Kuehling, Felix <Felix.Kuehling@amd.com<mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking <Hawking.Zhang@amd.com<mailto:Hawking.Zhang@amd.com>>
Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

Am 18.03.21 um 08:23 schrieb Dennis Li:
> We have defined two variables in_gpu_reset and reset_sem in adev object. The atomic type variable in_gpu_reset is used to avoid recovery thread reenter and make lower functions return more earlier when recovery start, but couldn't block recovery thread when it access hardware. The r/w semaphore reset_sem is used to solve these synchronization issues between recovery thread and other threads.
>
> The original solution locked registers' access in lower functions, which will introduce following issues:
>
> 1) many lower functions are used in both recovery thread and others. Firstly we must harvest these functions, it is easy to miss someones. Secondly these functions need select which lock (read lock or write lock) will be used, according to the thread it is running in. If the thread context isn't considered, the added lock will easily introduce deadlock. Besides that, in most time, developer easily forget to add locks for new functions.
>
> 2) performance drop. More lower functions are more frequently called.
>
> 3) easily introduce false positive lockdep complaint, because write lock has big range in recovery thread, but low level functions will hold read lock may be protected by other locks in other threads.
>
> Therefore the new solution will try to add lock protection for ioctls of kfd. Its goal is that there are no threads except for recovery thread or its children (for xgmi) to access hardware when doing GPU reset and resume. So refine recovery thread as the following:
>
> Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>     1). if failed, it means system had a recovery thread running, current thread exit directly;
>     2). if success, enter recovery thread;
>
> Step 1: cancel all delay works, stop drm schedule, complete all unreceived fences and so on. It try to stop or pause other threads.
>
> Step 2: call down_write(&adev->reset_sem) to hold write lock, which will block recovery thread until other threads release read locks.

Those two steps need to be exchanged or otherwise it is possible that new delayed work items etc are started before the lock is taken.

Just to make it clear until this is fixed the whole patch set is a NAK.

Regards,
Christian.

>
> Step 3: normally, there is only recovery threads running to access hardware, it is safe to do gpu reset now.
>
> Step 4: do post gpu reset, such as call all ips' resume functions;
>
> Step 5: atomic set adev->in_gpu_reset as 0, wake up other threads and release write lock. Recovery thread exit normally.
>
> Other threads call the amdgpu_read_lock to synchronize with recovery thread. If it finds that in_gpu_reset is 1, it should release read lock if it has holden one, and then blocks itself to wait for recovery finished event. If thread successfully hold read lock and in_gpu_reset is 0, it continues. It will exit normally or be stopped by recovery thread in step 1.
>
> Dennis Li (4):
>    drm/amdgpu: remove reset lock from low level functions
>    drm/amdgpu: refine the GPU recovery sequence
>    drm/amdgpu: instead of using down/up_read directly
>    drm/amdkfd: add reset lock protection for kfd entry functions
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   6 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  14 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 173 +++++++++++++-----
>   .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c    |   8 -
>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        |   4 +-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         |   9 +-
>   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         |   5 +-
>   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c         |   5 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 172 ++++++++++++++++-
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   4 +
>   .../amd/amdkfd/kfd_process_queue_manager.c    |  17 ++
>   12 files changed, 345 insertions(+), 75 deletions(-)
>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-03-18  9:30       ` Li, Dennis
@ 2021-03-18  9:51         ` Christian König
  2021-04-05 17:58           ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-03-18  9:51 UTC (permalink / raw)
  To: Li, Dennis, amd-gfx, Deucher, Alexander, Kuehling, Felix, Zhang, Hawking



Am 18.03.21 um 10:30 schrieb Li, Dennis:
>
> >>> The GPU reset doesn't complete the fences we wait for. It only 
> completes the hardware fences as part of the reset.
>
> >>> So waiting for a fence while holding the reset lock is illegal and 
> needs to be avoided.
>
> I understood your concern. It is more complex for DRM GFX, therefore I 
> abandon adding lock protection for DRM ioctls now. Maybe we can try to 
> add all kernel  dma_fence waiting in a list, and signal all in 
> recovery threads. Do you have same concern for compute cases?
>

Yes, compute (KFD) is even harder to handle.

See, you can't signal the dma_fences that are being waited on. Waiting for 
a dma_fence also means you wait for the GPU reset to finish.

When we would signal the dma_fence during the GPU reset then we would 
run into memory corruption because the hardware jobs running after the 
GPU reset would access memory which is already freed.

> >>> Lockdep also complains about this when it is used correctly. The 
> only reason it doesn't complain here is because you use an 
> atomic+wait_event instead of a locking primitive.
>
> Agree. This approach will escape the monitor of lockdep.  Its goal is 
> to block other threads when GPU recovery thread start. But I couldn’t 
> find a better method to solve this problem. Do you have some suggestion?
>

Well, completely abandon those changes here.

What we need to do is to identify where hardware access happens and then 
insert taking the read side of the GPU reset lock so that we don't wait 
for a dma_fence or allocate memory, but still protect the hardware from 
concurrent access and reset.

Regards,
Christian.

> Best Regards
>
> Dennis Li
>
> *From:* Koenig, Christian <Christian.Koenig@amd.com>
> *Sent:* Thursday, March 18, 2021 4:59 PM
> *To:* Li, Dennis <Dennis.Li@amd.com>; amd-gfx@lists.freedesktop.org; 
> Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix 
> <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
> *Subject:* AW: [PATCH 0/4] Refine GPU recovery sequence to enhance its 
> stability
>
> Exactly that's what you don't seem to understand.
>
> The GPU reset doesn't complete the fences we wait for. It only 
> completes the hardware fences as part of the reset.
>
> So waiting for a fence while holding the reset lock is illegal and 
> needs to be avoided.
>
> Lockdep also complains about this when it is used correctly. The only 
> reason it doesn't complain here is because you use an 
> atomic+wait_event instead of a locking primitive.
>
> Regards,
>
> Christian.
>
> ------------------------------------------------------------------------
>
> *Von:*Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>
> *Gesendet:* Donnerstag, 18. März 2021 09:28
> *An:* Koenig, Christian <Christian.Koenig@amd.com 
> <mailto:Christian.Koenig@amd.com>>; amd-gfx@lists.freedesktop.org 
> <mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org 
> <mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander 
> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
> Kuehling, Felix <Felix.Kuehling@amd.com 
> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
> *Betreff:* RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its 
> stability
>
> >>> Those two steps need to be exchanged or otherwise it is possible 
> that new delayed work items etc are started before the lock is taken.
> What about adding check for adev->in_gpu_reset in work item? If 
> exchange the two steps, it maybe introduce the deadlock.  For example, 
> the user thread hold the read lock and waiting for the fence, if 
> recovery thread try to hold write lock and then complete fences, in 
> this case, recovery thread will always be blocked.
>
>
> Best Regards
> Dennis Li
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig@amd.com 
> <mailto:Christian.Koenig@amd.com>>
> Sent: Thursday, March 18, 2021 3:54 PM
> To: Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>; 
> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>; 
> Deucher, Alexander <Alexander.Deucher@amd.com 
> <mailto:Alexander.Deucher@amd.com>>; Kuehling, Felix 
> <Felix.Kuehling@amd.com <mailto:Felix.Kuehling@amd.com>>; Zhang, 
> Hawking <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its 
> stability
>
> Am 18.03.21 um 08:23 schrieb Dennis Li:
> > We have defined two variables in_gpu_reset and reset_sem in adev 
> object. The atomic type variable in_gpu_reset is used to avoid 
> recovery thread reenter and make lower functions return more earlier 
> when recovery start, but couldn't block recovery thread when it access 
> hardware. The r/w semaphore reset_sem is used to solve these 
> synchronization issues between recovery thread and other threads.
> >
> > The original solution locked registers' access in lower functions, 
> which will introduce following issues:
> >
> > 1) many lower functions are used in both recovery thread and others. 
> Firstly we must harvest these functions, it is easy to miss someones. 
> Secondly these functions need select which lock (read lock or write 
> lock) will be used, according to the thread it is running in. If the 
> thread context isn't considered, the added lock will easily introduce 
> deadlock. Besides that, in most time, developer easily forget to add 
> locks for new functions.
> >
> > 2) performance drop. More lower functions are more frequently called.
> >
> > 3) easily introduce false positive lockdep complaint, because write 
> lock has big range in recovery thread, but low level functions will 
> hold read lock may be protected by other locks in other threads.
> >
> > Therefore the new solution will try to add lock protection for 
> ioctls of kfd. Its goal is that there are no threads except for 
> recovery thread or its children (for xgmi) to access hardware when 
> doing GPU reset and resume. So refine recovery thread as the following:
> >
> > Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
> >     1). if failed, it means system had a recovery thread running, 
> current thread exit directly;
> >     2). if success, enter recovery thread;
> >
> > Step 1: cancel all delay works, stop drm schedule, complete all 
> unreceived fences and so on. It try to stop or pause other threads.
> >
> > Step 2: call down_write(&adev->reset_sem) to hold write lock, which 
> will block recovery thread until other threads release read locks.
>
> Those two steps need to be exchanged or otherwise it is possible that 
> new delayed work items etc are started before the lock is taken.
>
> Just to make it clear until this is fixed the whole patch set is a NAK.
>
> Regards,
> Christian.
>
> >
> > Step 3: normally, there is only recovery threads running to access 
> hardware, it is safe to do gpu reset now.
> >
> > Step 4: do post gpu reset, such as call all ips' resume functions;
> >
> > Step 5: atomic set adev->in_gpu_reset as 0, wake up other threads 
> and release write lock. Recovery thread exit normally.
> >
> > Other threads call the amdgpu_read_lock to synchronize with recovery 
> thread. If it finds that in_gpu_reset is 1, it should release read 
> lock if it has holden one, and then blocks itself to wait for recovery 
> finished event. If thread successfully hold read lock and in_gpu_reset 
> is 0, it continues. It will exit normally or be stopped by recovery 
> thread in step 1.
> >
> > Dennis Li (4):
> >    drm/amdgpu: remove reset lock from low level functions
> >    drm/amdgpu: refine the GPU recovery sequence
> >    drm/amdgpu: instead of using down/up_read directly
> >    drm/amdkfd: add reset lock protection for kfd entry functions
> >
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   6 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   |  14 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 173 +++++++++++++-----
> >   .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c    |   8 -
> >   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        |   4 +-
> >   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         |   9 +-
> >   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         |   5 +-
> >   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c         |   5 +-
> >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 172 ++++++++++++++++-
> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   3 +-
> >   drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   4 +
> >   .../amd/amdkfd/kfd_process_queue_manager.c    |  17 ++
> >   12 files changed, 345 insertions(+), 75 deletions(-)
> >
>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-03-18  9:51         ` Christian König
@ 2021-04-05 17:58           ` Andrey Grodzovsky
  2021-04-06 10:34             ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-05 17:58 UTC (permalink / raw)
  To: Christian König, Li, Dennis, amd-gfx, Deucher, Alexander,
	Kuehling, Felix, Zhang, Hawking



Dennis, Christian, are there any updates on the plan for how to move on 
with this? As you know, I need very similar code for my upstreaming of 
device hot-unplug. My latest solution 
(https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html) 
was not acceptable because of the low-level guards at the register 
accessor level, which hurt performance. Basically I need a way to prevent 
any MMIO write access from the kernel driver after the device is removed 
(UMD accesses are taken care of by the page-faulting dummy page). We are 
now using the hot-unplug code for the Freemont program, so upstreaming 
has become more of a priority than before. This MMIO access issue is 
currently my main blocker for upstreaming. Is there any way I can assist 
in pushing this forward?
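For illustration, the kind of guard this needs could look roughly like the following, reusing the drm_dev_enter()/drm_dev_exit() SRCU section around a whole hardware-touching section rather than every single register access (mmEXAMPLE_CTRL is a made-up register; this is a sketch, not the proposed patch):

static void example_hw_fini(struct amdgpu_device *adev)
{
        int idx;

        /* drm_dev_enter() returns false once the device is unplugged,
         * so the whole MMIO section below is skipped in that case. */
        if (!drm_dev_enter(adev_to_drm(adev), &idx))
                return;

        WREG32(mmEXAMPLE_CTRL, 0);
        /* ... further register programming ... */

        drm_dev_exit(idx);
}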

Andrey

On 2021-03-18 5:51 a.m., Christian König wrote:
> Am 18.03.21 um 10:30 schrieb Li, Dennis:
>>
>> >>> The GPU reset doesn't complete the fences we wait for. It only 
>> completes the hardware fences as part of the reset.
>>
>> >>> So waiting for a fence while holding the reset lock is illegal 
>> and needs to be avoided.
>>
>> I understood your concern. It is more complex for DRM GFX, therefore 
>> I abandon adding lock protection for DRM ioctls now. Maybe we can try 
>> to add all kernel  dma_fence waiting in a list, and signal all in 
>> recovery threads. Do you have same concern for compute cases?
>>
>
> Yes, compute (KFD) is even harder to handle.
>
> See you can't signal the dma_fence waiting. Waiting for a dma_fence 
> also means you wait for the GPU reset to finish.
>
> When we would signal the dma_fence during the GPU reset then we would 
> run into memory corruption because the hardware jobs running after the 
> GPU reset would access memory which is already freed.
>
>> >>> Lockdep also complains about this when it is used correctly. The 
>> only reason it doesn't complain here is because you use an 
>> atomic+wait_event instead of a locking primitive.
>>
>> Agree. This approach will escape the monitor of lockdep.  Its goal is 
>> to block other threads when GPU recovery thread start. But I couldn’t 
>> find a better method to solve this problem. Do you have some suggestion?
>>
>
> Well, completely abandon those change here.
>
> What we need to do is to identify where hardware access happens and 
> then insert taking the read side of the GPU reset lock so that we 
> don't wait for a dma_fence or allocate memory, but still protect the 
> hardware from concurrent access and reset.
>
> Regards,
> Christian.
>
>> Best Regards
>>
>> Dennis Li
>>
>> *From:* Koenig, Christian <Christian.Koenig@amd.com>
>> *Sent:* Thursday, March 18, 2021 4:59 PM
>> *To:* Li, Dennis <Dennis.Li@amd.com>; amd-gfx@lists.freedesktop.org; 
>> Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix 
>> <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
>> *Subject:* AW: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>> its stability
>>
>> Exactly that's what you don't seem to understand.
>>
>> The GPU reset doesn't complete the fences we wait for. It only 
>> completes the hardware fences as part of the reset.
>>
>> So waiting for a fence while holding the reset lock is illegal and 
>> needs to be avoided.
>>
>> Lockdep also complains about this when it is used correctly. The only 
>> reason it doesn't complain here is because you use an 
>> atomic+wait_event instead of a locking primitive.
>>
>> Regards,
>>
>> Christian.
>>
>> ------------------------------------------------------------------------
>>
>> *Von:*Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>
>> *Gesendet:* Donnerstag, 18. März 2021 09:28
>> *An:* Koenig, Christian <Christian.Koenig@amd.com 
>> <mailto:Christian.Koenig@amd.com>>; amd-gfx@lists.freedesktop.org 
>> <mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org 
>> <mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander 
>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>> Kuehling, Felix <Felix.Kuehling@amd.com 
>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>> *Betreff:* RE: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>> its stability
>>
>> >>> Those two steps need to be exchanged or otherwise it is possible 
>> that new delayed work items etc are started before the lock is taken.
>> What about adding check for adev->in_gpu_reset in work item? If 
>> exchange the two steps, it maybe introduce the deadlock.  For 
>> example, the user thread hold the read lock and waiting for the 
>> fence, if recovery thread try to hold write lock and then complete 
>> fences, in this case, recovery thread will always be blocked.
>>
>>
>> Best Regards
>> Dennis Li
>> -----Original Message-----
>> From: Koenig, Christian <Christian.Koenig@amd.com 
>> <mailto:Christian.Koenig@amd.com>>
>> Sent: Thursday, March 18, 2021 3:54 PM
>> To: Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>; 
>> amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>; 
>> Deucher, Alexander <Alexander.Deucher@amd.com 
>> <mailto:Alexander.Deucher@amd.com>>; Kuehling, Felix 
>> <Felix.Kuehling@amd.com <mailto:Felix.Kuehling@amd.com>>; Zhang, 
>> Hawking <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its 
>> stability
>>
>> Am 18.03.21 um 08:23 schrieb Dennis Li:
>> > We have defined two variables in_gpu_reset and reset_sem in adev 
>> object. The atomic type variable in_gpu_reset is used to avoid 
>> recovery thread reenter and make lower functions return more earlier 
>> when recovery start, but couldn't block recovery thread when it 
>> access hardware. The r/w semaphore reset_sem is used to solve these 
>> synchronization issues between recovery thread and other threads.
>> >
>> > The original solution locked registers' access in lower functions, 
>> which will introduce following issues:
>> >
>> > 1) many lower functions are used in both recovery thread and 
>> others. Firstly we must harvest these functions, it is easy to miss 
>> someones. Secondly these functions need select which lock (read lock 
>> or write lock) will be used, according to the thread it is running 
>> in. If the thread context isn't considered, the added lock will 
>> easily introduce deadlock. Besides that, in most time, developer 
>> easily forget to add locks for new functions.
>> >
>> > 2) performance drop. More lower functions are more frequently called.
>> >
>> > 3) easily introduce false positive lockdep complaint, because write 
>> lock has big range in recovery thread, but low level functions will 
>> hold read lock may be protected by other locks in other threads.
>> >
>> > Therefore the new solution will try to add lock protection for 
>> ioctls of kfd. Its goal is that there are no threads except for 
>> recovery thread or its children (for xgmi) to access hardware when 
>> doing GPU reset and resume. So refine recovery thread as the following:
>> >
>> > Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>> >     1). if failed, it means system had a recovery thread running, 
>> current thread exit directly;
>> >     2). if success, enter recovery thread;
>> >
>> > Step 1: cancel all delay works, stop drm schedule, complete all 
>> unreceived fences and so on. It try to stop or pause other threads.
>> >
>> > Step 2: call down_write(&adev->reset_sem) to hold write lock, which 
>> will block recovery thread until other threads release read locks.
>>
>> Those two steps need to be exchanged or otherwise it is possible that 
>> new delayed work items etc are started before the lock is taken.
>>
>> Just to make it clear until this is fixed the whole patch set is a NAK.
>>
>> Regards,
>> Christian.
>>
>> >
>> > Step 3: normally, there is only recovery threads running to access 
>> hardware, it is safe to do gpu reset now.
>> >
>> > Step 4: do post gpu reset, such as call all ips' resume functions;
>> >
>> > Step 5: atomic set adev->in_gpu_reset as 0, wake up other threads 
>> and release write lock. Recovery thread exit normally.
>> >
>> > Other threads call the amdgpu_read_lock to synchronize with 
>> recovery thread. If it finds that in_gpu_reset is 1, it should 
>> release read lock if it has holden one, and then blocks itself to 
>> wait for recovery finished event. If thread successfully hold read 
>> lock and in_gpu_reset is 0, it continues. It will exit normally or be 
>> stopped by recovery thread in step 1.
>> >
>> > Dennis Li (4):
>> >    drm/amdgpu: remove reset lock from low level functions
>> >    drm/amdgpu: refine the GPU recovery sequence
>> >    drm/amdgpu: instead of using down/up_read directly
>> >    drm/amdkfd: add reset lock protection for kfd entry functions
>> >
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 6 +
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   | 14 +-
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 173 
>> +++++++++++++-----
>> >   .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c    | 8 -
>> >   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        | 4 +-
>> >   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         | 9 +-
>> >   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         | 5 +-
>> >   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c         | 5 +-
>> >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 172 ++++++++++++++++-
>> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         | 3 +-
>> >   drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 4 +
>> >   .../amd/amdkfd/kfd_process_queue_manager.c    | 17 ++
>> >   12 files changed, 345 insertions(+), 75 deletions(-)
>> >
>>
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-05 17:58           ` Andrey Grodzovsky
@ 2021-04-06 10:34             ` Christian König
  2021-04-06 11:21               ` Christian König
  2021-04-06 21:22               ` Andrey Grodzovsky
  0 siblings, 2 replies; 56+ messages in thread
From: Christian König @ 2021-04-06 10:34 UTC (permalink / raw)
  To: Andrey Grodzovsky, Li, Dennis, amd-gfx, Deucher, Alexander,
	Kuehling, Felix, Zhang, Hawking



Hi Andrey,

well good question. My job is to watch over the implementation and 
design and while I always help I can adjust anybodies schedule.

Is the patch to print a warning when the hardware is accessed without 
holding the locks merged yet? If not then that would probably be a good 
starting point.
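For reference, the check in question is roughly of this shape (names assumed here, not the actual patch):

static void example_check_reset_lock(struct amdgpu_device *adev)
{
        /* Called from the register accessors: complain if the hardware is
         * touched without the reset lock held on either side. */
        if (!rwsem_is_locked(&adev->reset_sem))
                dev_warn_once(adev->dev,
                              "register access without holding the reset lock\n");
}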

Then we would need to unify this with the SRCU so that we both hold the 
reset lock and block the hotplug code from reusing the MMIO space.
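One possible shape of that unified guard (illustrative only, helper names made up): hold the read side of the reset lock and the drm_dev_enter() SRCU section at the same time, so a hardware access is protected against both a concurrent reset and a concurrent hot-unplug.

static bool example_hw_access_begin(struct amdgpu_device *adev, int *idx)
{
        if (!drm_dev_enter(adev_to_drm(adev), idx))
                return false;           /* device already unplugged */

        down_read(&adev->reset_sem);    /* exclude a concurrent GPU reset */
        return true;
}

static void example_hw_access_end(struct amdgpu_device *adev, int idx)
{
        up_read(&adev->reset_sem);
        drm_dev_exit(idx);
}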

And then testing, testing, testing to see if we have missed something.

Christian.

Am 05.04.21 um 19:58 schrieb Andrey Grodzovsky:
>
> Denis, Christian, are there any updates in the plan on how to move on 
> with this ? As you know I need very similar code for my up-streaming 
> of device hot-unplug. My latest solution 
> (https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html) 
> was not acceptable because of low level guards on the register 
> accessors level which was hurting performance. Basically I need a way 
> to prevent any MMIO write accesses from kernel driver after device is 
> removed (UMD accesses are taken care of by page faulting dummy page). 
> We are using now hot-unplug code for Freemont program and so 
> up-streaming became more of a priority then before. This MMIO access 
> issue is currently my main blocker from up-streaming. Is there any way 
> I can assist in pushing this on ?
>
> Andrey
>
> On 2021-03-18 5:51 a.m., Christian König wrote:
>> Am 18.03.21 um 10:30 schrieb Li, Dennis:
>>>
>>> >>> The GPU reset doesn't complete the fences we wait for. It only 
>>> completes the hardware fences as part of the reset.
>>>
>>> >>> So waiting for a fence while holding the reset lock is illegal 
>>> and needs to be avoided.
>>>
>>> I understood your concern. It is more complex for DRM GFX, therefore 
>>> I abandon adding lock protection for DRM ioctls now. Maybe we can 
>>> try to add all kernel  dma_fence waiting in a list, and signal all 
>>> in recovery threads. Do you have same concern for compute cases?
>>>
>>
>> Yes, compute (KFD) is even harder to handle.
>>
>> See you can't signal the dma_fence waiting. Waiting for a dma_fence 
>> also means you wait for the GPU reset to finish.
>>
>> When we would signal the dma_fence during the GPU reset then we would 
>> run into memory corruption because the hardware jobs running after 
>> the GPU reset would access memory which is already freed.
>>
>>> >>> Lockdep also complains about this when it is used correctly. The 
>>> only reason it doesn't complain here is because you use an 
>>> atomic+wait_event instead of a locking primitive.
>>>
>>> Agree. This approach will escape the monitor of lockdep.  Its goal 
>>> is to block other threads when GPU recovery thread start. But I 
>>> couldn’t find a better method to solve this problem. Do you have 
>>> some suggestion?
>>>
>>
>> Well, completely abandon those change here.
>>
>> What we need to do is to identify where hardware access happens and 
>> then insert taking the read side of the GPU reset lock so that we 
>> don't wait for a dma_fence or allocate memory, but still protect the 
>> hardware from concurrent access and reset.
>>
>> Regards,
>> Christian.
>>
>>> Best Regards
>>>
>>> Dennis Li
>>>
>>> *From:* Koenig, Christian <Christian.Koenig@amd.com>
>>> *Sent:* Thursday, March 18, 2021 4:59 PM
>>> *To:* Li, Dennis <Dennis.Li@amd.com>; amd-gfx@lists.freedesktop.org; 
>>> Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix 
>>> <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
>>> *Subject:* AW: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>> its stability
>>>
>>> Exactly that's what you don't seem to understand.
>>>
>>> The GPU reset doesn't complete the fences we wait for. It only 
>>> completes the hardware fences as part of the reset.
>>>
>>> So waiting for a fence while holding the reset lock is illegal and 
>>> needs to be avoided.
>>>
>>> Lockdep also complains about this when it is used correctly. The 
>>> only reason it doesn't complain here is because you use an 
>>> atomic+wait_event instead of a locking primitive.
>>>
>>> Regards,
>>>
>>> Christian.
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *Von:*Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>
>>> *Gesendet:* Donnerstag, 18. März 2021 09:28
>>> *An:* Koenig, Christian <Christian.Koenig@amd.com 
>>> <mailto:Christian.Koenig@amd.com>>; amd-gfx@lists.freedesktop.org 
>>> <mailto:amd-gfx@lists.freedesktop.org> 
>>> <amd-gfx@lists.freedesktop.org 
>>> <mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander 
>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>> *Betreff:* RE: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>> its stability
>>>
>>> >>> Those two steps need to be exchanged or otherwise it is possible 
>>> that new delayed work items etc are started before the lock is taken.
>>> What about adding check for adev->in_gpu_reset in work item? If 
>>> exchange the two steps, it maybe introduce the deadlock.  For 
>>> example, the user thread hold the read lock and waiting for the 
>>> fence, if recovery thread try to hold write lock and then complete 
>>> fences, in this case, recovery thread will always be blocked.
>>>
>>>
>>> Best Regards
>>> Dennis Li
>>> -----Original Message-----
>>> From: Koenig, Christian <Christian.Koenig@amd.com 
>>> <mailto:Christian.Koenig@amd.com>>
>>> Sent: Thursday, March 18, 2021 3:54 PM
>>> To: Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>; 
>>> amd-gfx@lists.freedesktop.org 
>>> <mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander 
>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its 
>>> stability
>>>
>>> Am 18.03.21 um 08:23 schrieb Dennis Li:
>>> > We have defined two variables in_gpu_reset and reset_sem in adev 
>>> object. The atomic type variable in_gpu_reset is used to avoid 
>>> recovery thread reenter and make lower functions return more earlier 
>>> when recovery start, but couldn't block recovery thread when it 
>>> access hardware. The r/w semaphore reset_sem is used to solve these 
>>> synchronization issues between recovery thread and other threads.
>>> >
>>> > The original solution locked registers' access in lower functions, 
>>> which will introduce following issues:
>>> >
>>> > 1) many lower functions are used in both recovery thread and 
>>> others. Firstly we must harvest these functions, it is easy to miss 
>>> someones. Secondly these functions need select which lock (read lock 
>>> or write lock) will be used, according to the thread it is running 
>>> in. If the thread context isn't considered, the added lock will 
>>> easily introduce deadlock. Besides that, in most time, developer 
>>> easily forget to add locks for new functions.
>>> >
>>> > 2) performance drop. More lower functions are more frequently called.
>>> >
>>> > 3) easily introduce false positive lockdep complaint, because 
>>> write lock has big range in recovery thread, but low level functions 
>>> will hold read lock may be protected by other locks in other threads.
>>> >
>>> > Therefore the new solution will try to add lock protection for 
>>> ioctls of kfd. Its goal is that there are no threads except for 
>>> recovery thread or its children (for xgmi) to access hardware when 
>>> doing GPU reset and resume. So refine recovery thread as the following:
>>> >
>>> > Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>>> >     1). if failed, it means system had a recovery thread running, 
>>> current thread exit directly;
>>> >     2). if success, enter recovery thread;
>>> >
>>> > Step 1: cancel all delay works, stop drm schedule, complete all 
>>> unreceived fences and so on. It try to stop or pause other threads.
>>> >
>>> > Step 2: call down_write(&adev->reset_sem) to hold write lock, 
>>> which will block recovery thread until other threads release read locks.
>>>
>>> Those two steps need to be exchanged or otherwise it is possible 
>>> that new delayed work items etc are started before the lock is taken.
>>>
>>> Just to make it clear until this is fixed the whole patch set is a NAK.
>>>
>>> Regards,
>>> Christian.
>>>
>>> >
>>> > Step 3: normally, there is only recovery threads running to access 
>>> hardware, it is safe to do gpu reset now.
>>> >
>>> > Step 4: do post gpu reset, such as call all ips' resume functions;
>>> >
>>> > Step 5: atomic set adev->in_gpu_reset as 0, wake up other threads 
>>> and release write lock. Recovery thread exit normally.
>>> >
>>> > Other threads call the amdgpu_read_lock to synchronize with 
>>> recovery thread. If it finds that in_gpu_reset is 1, it should 
>>> release read lock if it has holden one, and then blocks itself to 
>>> wait for recovery finished event. If thread successfully hold read 
>>> lock and in_gpu_reset is 0, it continues. It will exit normally or 
>>> be stopped by recovery thread in step 1.
>>> >
>>> > Dennis Li (4):
>>> >    drm/amdgpu: remove reset lock from low level functions
>>> >    drm/amdgpu: refine the GPU recovery sequence
>>> >    drm/amdgpu: instead of using down/up_read directly
>>> >    drm/amdkfd: add reset lock protection for kfd entry functions
>>> >
>>> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h |   6 +
>>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  14 +-
>>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 173 
>>> +++++++++++++-----
>>> >   .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c |   8 -
>>> >   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |   4 +-
>>> >   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |   9 +-
>>> >   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c |   5 +-
>>> >   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c |   5 +-
>>> >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 172 
>>> ++++++++++++++++-
>>> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +-
>>> >   drivers/gpu/drm/amd/amdkfd/kfd_process.c |   4 +
>>> >   .../amd/amdkfd/kfd_process_queue_manager.c |  17 ++
>>> >   12 files changed, 345 insertions(+), 75 deletions(-)
>>> >
>>>
>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-06 10:34             ` Christian König
@ 2021-04-06 11:21               ` Christian König
  2021-04-06 21:22               ` Andrey Grodzovsky
  1 sibling, 0 replies; 56+ messages in thread
From: Christian König @ 2021-04-06 11:21 UTC (permalink / raw)
  To: Andrey Grodzovsky, Li, Dennis, amd-gfx, Deucher, Alexander,
	Kuehling, Felix, Zhang, Hawking





Am 06.04.21 um 12:34 schrieb Christian König:
> Hi Andrey,
>
> well good question. My job is to watch over the implementation and 
> design and while I always help I can adjust anybodies schedule.

That should read "I can't adjust anybodies schedule".

Christian.

>
> Is the patch to print a warning when the hardware is accessed without 
> holding the locks merged yet? If not then that would probably be a 
> good starting point.
>
> Then we would need to unify this with the SRCU to make sure that we 
> have both the reset lock as well as block the hotplug code from 
> reusing the MMIO space.
>
> And then testing, testing, testing to see if we have missed something.
>
> Christian.
>
> Am 05.04.21 um 19:58 schrieb Andrey Grodzovsky:
>>
>> Denis, Christian, are there any updates in the plan on how to move on 
>> with this ? As you know I need very similar code for my up-streaming 
>> of device hot-unplug. My latest solution 
>> (https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html) 
>> was not acceptable because of low level guards on the register 
>> accessors level which was hurting performance. Basically I need a way 
>> to prevent any MMIO write accesses from kernel driver after device is 
>> removed (UMD accesses are taken care of by page faulting dummy page). 
>> We are using now hot-unplug code for Freemont program and so 
>> up-streaming became more of a priority then before. This MMIO access 
>> issue is currently my main blocker from up-streaming. Is there any 
>> way I can assist in pushing this on ?
>>
>> Andrey
>>
>> On 2021-03-18 5:51 a.m., Christian König wrote:
>>> Am 18.03.21 um 10:30 schrieb Li, Dennis:
>>>>
>>>> >>> The GPU reset doesn't complete the fences we wait for. It only 
>>>> completes the hardware fences as part of the reset.
>>>>
>>>> >>> So waiting for a fence while holding the reset lock is illegal 
>>>> and needs to be avoided.
>>>>
>>>> I understood your concern. It is more complex for DRM GFX, 
>>>> therefore I abandon adding lock protection for DRM ioctls now. 
>>>> Maybe we can try to add all kernel  dma_fence waiting in a list, 
>>>> and signal all in recovery threads. Do you have same concern for 
>>>> compute cases?
>>>>
>>>
>>> Yes, compute (KFD) is even harder to handle.
>>>
>>> See you can't signal the dma_fence waiting. Waiting for a dma_fence 
>>> also means you wait for the GPU reset to finish.
>>>
>>> When we would signal the dma_fence during the GPU reset then we 
>>> would run into memory corruption because the hardware jobs running 
>>> after the GPU reset would access memory which is already freed.
>>>
>>>> >>> Lockdep also complains about this when it is used correctly. 
>>>> The only reason it doesn't complain here is because you use an 
>>>> atomic+wait_event instead of a locking primitive.
>>>>
>>>> Agree. This approach will escape the monitor of lockdep.  Its goal 
>>>> is to block other threads when GPU recovery thread start. But I 
>>>> couldn’t find a better method to solve this problem. Do you have 
>>>> some suggestion?
>>>>
>>>
>>> Well, completely abandon those change here.
>>>
>>> What we need to do is to identify where hardware access happens and 
>>> then insert taking the read side of the GPU reset lock so that we 
>>> don't wait for a dma_fence or allocate memory, but still protect the 
>>> hardware from concurrent access and reset.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> Best Regards
>>>>
>>>> Dennis Li
>>>>
>>>> *From:* Koenig, Christian <Christian.Koenig@amd.com>
>>>> *Sent:* Thursday, March 18, 2021 4:59 PM
>>>> *To:* Li, Dennis <Dennis.Li@amd.com>; 
>>>> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix 
>>>> <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
>>>> *Subject:* AW: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>>> its stability
>>>>
>>>> Exactly that's what you don't seem to understand.
>>>>
>>>> The GPU reset doesn't complete the fences we wait for. It only 
>>>> completes the hardware fences as part of the reset.
>>>>
>>>> So waiting for a fence while holding the reset lock is illegal and 
>>>> needs to be avoided.
>>>>
>>>> Lockdep also complains about this when it is used correctly. The 
>>>> only reason it doesn't complain here is because you use an 
>>>> atomic+wait_event instead of a locking primitive.
>>>>
>>>> Regards,
>>>>
>>>> Christian.
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> *Von:*Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>
>>>> *Gesendet:* Donnerstag, 18. März 2021 09:28
>>>> *An:* Koenig, Christian <Christian.Koenig@amd.com 
>>>> <mailto:Christian.Koenig@amd.com>>; amd-gfx@lists.freedesktop.org 
>>>> <mailto:amd-gfx@lists.freedesktop.org> 
>>>> <amd-gfx@lists.freedesktop.org 
>>>> <mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander 
>>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>>> *Betreff:* RE: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>>> its stability
>>>>
>>>> >>> Those two steps need to be exchanged or otherwise it is 
>>>> possible that new delayed work items etc are started before the 
>>>> lock is taken.
>>>> What about adding check for adev->in_gpu_reset in work item? If 
>>>> exchange the two steps, it maybe introduce the deadlock.  For 
>>>> example, the user thread hold the read lock and waiting for the 
>>>> fence, if recovery thread try to hold write lock and then complete 
>>>> fences, in this case, recovery thread will always be blocked.
>>>>
>>>>
>>>> Best Regards
>>>> Dennis Li
>>>> -----Original Message-----
>>>> From: Koenig, Christian <Christian.Koenig@amd.com 
>>>> <mailto:Christian.Koenig@amd.com>>
>>>> Sent: Thursday, March 18, 2021 3:54 PM
>>>> To: Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>; 
>>>> amd-gfx@lists.freedesktop.org 
>>>> <mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander 
>>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>>> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>>> its stability
>>>>
>>>> Am 18.03.21 um 08:23 schrieb Dennis Li:
>>>> > We have defined two variables in_gpu_reset and reset_sem in adev 
>>>> object. The atomic type variable in_gpu_reset is used to avoid 
>>>> recovery thread reenter and make lower functions return more 
>>>> earlier when recovery start, but couldn't block recovery thread 
>>>> when it access hardware. The r/w semaphore reset_sem is used to 
>>>> solve these synchronization issues between recovery thread and 
>>>> other threads.
>>>> >
>>>> > The original solution locked registers' access in lower 
>>>> functions, which will introduce following issues:
>>>> >
>>>> > 1) many lower functions are used in both recovery thread and 
>>>> others. Firstly we must harvest these functions, it is easy to miss 
>>>> someones. Secondly these functions need select which lock (read 
>>>> lock or write lock) will be used, according to the thread it is 
>>>> running in. If the thread context isn't considered, the added lock 
>>>> will easily introduce deadlock. Besides that, in most time, 
>>>> developer easily forget to add locks for new functions.
>>>> >
>>>> > 2) performance drop. More lower functions are more frequently called.
>>>> >
>>>> > 3) easily introduce false positive lockdep complaint, because 
>>>> write lock has big range in recovery thread, but low level 
>>>> functions will hold read lock may be protected by other locks in 
>>>> other threads.
>>>> >
>>>> > Therefore the new solution will try to add lock protection for 
>>>> ioctls of kfd. Its goal is that there are no threads except for 
>>>> recovery thread or its children (for xgmi) to access hardware when 
>>>> doing GPU reset and resume. So refine recovery thread as the following:
>>>> >
>>>> > Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>>>> >     1). if failed, it means system had a recovery thread running, 
>>>> current thread exit directly;
>>>> >     2). if success, enter recovery thread;
>>>> >
>>>> > Step 1: cancel all delay works, stop drm schedule, complete all 
>>>> unreceived fences and so on. It try to stop or pause other threads.
>>>> >
>>>> > Step 2: call down_write(&adev->reset_sem) to hold write lock, 
>>>> which will block recovery thread until other threads release read 
>>>> locks.
>>>>
>>>> Those two steps need to be exchanged or otherwise it is possible 
>>>> that new delayed work items etc are started before the lock is taken.
>>>>
>>>> Just to make it clear until this is fixed the whole patch set is a NAK.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> >
>>>> > Step 3: normally, there is only recovery threads running to 
>>>> access hardware, it is safe to do gpu reset now.
>>>> >
>>>> > Step 4: do post gpu reset, such as call all ips' resume functions;
>>>> >
>>>> > Step 5: atomic set adev->in_gpu_reset as 0, wake up other threads 
>>>> and release write lock. Recovery thread exit normally.
>>>> >
>>>> > Other threads call the amdgpu_read_lock to synchronize with 
>>>> recovery thread. If it finds that in_gpu_reset is 1, it should 
>>>> release read lock if it has holden one, and then blocks itself to 
>>>> wait for recovery finished event. If thread successfully hold read 
>>>> lock and in_gpu_reset is 0, it continues. It will exit normally or 
>>>> be stopped by recovery thread in step 1.
>>>> >
>>>> > Dennis Li (4):
>>>> >    drm/amdgpu: remove reset lock from low level functions
>>>> >    drm/amdgpu: refine the GPU recovery sequence
>>>> >    drm/amdgpu: instead of using down/up_read directly
>>>> >    drm/amdkfd: add reset lock protection for kfd entry functions
>>>> >
>>>> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h |   6 +
>>>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  14 +-
>>>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 173 +++++++++++++-----
>>>> >   .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c |   8 -
>>>> >   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |   4 +-
>>>> >   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |   9 +-
>>>> >   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c |   5 +-
>>>> >   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c |   5 +-
>>>> >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 172 ++++++++++++++++-
>>>> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +-
>>>> >   drivers/gpu/drm/amd/amdkfd/kfd_process.c |   4 +
>>>> >   .../amd/amdkfd/kfd_process_queue_manager.c |  17 ++
>>>> >   12 files changed, 345 insertions(+), 75 deletions(-)
>>>> >
>>>>
>>>
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-06 10:34             ` Christian König
  2021-04-06 11:21               ` Christian König
@ 2021-04-06 21:22               ` Andrey Grodzovsky
  2021-04-07 10:28                 ` Christian König
  1 sibling, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-06 21:22 UTC (permalink / raw)
  To: Christian König, Li, Dennis, amd-gfx, Deucher, Alexander,
	Kuehling, Felix, Zhang, Hawking



Hey Christian, Dennis, see below -

On 2021-04-06 6:34 a.m., Christian König wrote:
> Hi Andrey,
>
> well good question. My job is to watch over the implementation and 
> design and while I always help I can adjust anybodies schedule.
>
> Is the patch to print a warning when the hardware is accessed without 
> holding the locks merged yet? If not then that would probably be a 
> good starting point.


It's merged into amd-staging-drm-next, and since I work on drm-misc-next 
I will cherry-pick it there.


>
> Then we would need to unify this with the SRCU to make sure that we 
> have both the reset lock as well as block the hotplug code from 
> reusing the MMIO space.

In my understanding there is a significant difference between handling 
of GPU reset and unplug - while the GPU reset use case requires any HW 
accessing code to block, wait for the reset to finish and then proceed, 
hot-unplug is permanent, so there is no need to wait and proceed but 
rather to abort at once. This is why I think that in any place where we 
already check for device reset we should also add a check for 
hot-unplug, but the handling would be different in that for hot-unplug 
we would abort instead of keeping on waiting.
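A rough sketch of the check pattern I mean (the function and its body 
are made up just for illustration; drm_dev_is_unplugged() is the 
existing DRM helper, and reset_sem / the amdgpu_read_lock() helper are 
from this series):

static int example_hw_access(struct amdgpu_device *adev)
{
	/* Hot-unplug is permanent - abort right away. */
	if (drm_dev_is_unplugged(adev_to_drm(adev)))
		return -ENODEV;

	/* GPU reset is transient - block until it is finished. */
	down_read(&adev->reset_sem);

	/* ... MMIO access ... */

	up_read(&adev->reset_sem);
	return 0;
}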

Similar to handling device reset, for unplug we obviously also need to 
stop and block any MMIO accesses once the device is unplugged and, as 
Daniel Vetter mentioned, we have to do it before finishing pci_remove 
(early device fini) and not later (when the last device reference is 
dropped from user space) in order to prevent reuse of MMIO space we 
still access by other hot-plugging devices. As in the device reset case 
we need to cancel all delayed work, stop the drm scheduler and complete 
all unfinished fences (both HW and scheduler fences). While you stated a 
strong objection to force-signalling scheduler fences from GPU reset, 
quote:

"you can't signal the dma_fence waiting. Waiting for a dma_fence also 
means you wait for the GPU reset to finish. When we would signal the 
dma_fence during the GPU reset then we would run into memory corruption 
because the hardware jobs running after the GPU reset would access 
memory which is already freed."
To my understanding this is a key difference from hot-unplug: the 
device is gone, all those concerns are irrelevant, and hence we can 
actually force-signal scheduler fences (setting an error on them first) 
to force completion of any waiting clients such as IOCTLs or async page 
flips etc.
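Roughly, the forced completion would look like this (just a sketch of 
the idea, not real code):

/* On unplug, complete a pending scheduler fence with an error so that
 * anything waiting on it (IOCTLs, async flips, ...) unblocks instead of
 * hanging forever. */
static void example_force_complete(struct drm_sched_fence *s_fence)
{
	dma_fence_set_error(&s_fence->finished, -ENODEV);
	dma_fence_signal(&s_fence->finished);
}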

Beyond blocking all delayed work and scheduler threads we also need to 
guarantee that no IOCTL can access MMIO post device unplug, or that 
in-flight IOCTLs are done before we finish pci_remove (amdgpu_pci_remove 
for us). For this I suggest we do something like what we worked on with 
Takashi Iwai, the ALSA maintainer, recently when he helped implement PCI 
BAR move support for snd_hda_intel. Take a look at
https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497 
and
https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440
We had the same issue there: how to prevent MMIO accesses while the 
BARs are migrating. What was done there is that a refcount was added to 
count all IOCTLs in flight; for any in-flight IOCTL the BAR migration 
handler blocks until the refcount drops to 0 before it proceeds, and any 
later IOCTL stops and waits if the device is in the migration state. We 
don't even need the wait part - there is nothing to wait for, we just 
return -ENODEV in this case.
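In rough pseudo-kernel-C the counting idea is (all names made up for 
illustration, the real code is in the links above):

static atomic_t ioctls_in_flight = ATOMIC_INIT(0);
static DECLARE_WAIT_QUEUE_HEAD(ioctls_drained);

/* IOCTL entry: refuse new work once the device is gone. */
static int example_ioctl_enter(struct drm_device *dev)
{
	if (drm_dev_is_unplugged(dev))
		return -ENODEV;
	/* the real code must close the race between this check and the
	 * increment, e.g. by rechecking after the increment or by using
	 * SRCU instead */
	atomic_inc(&ioctls_in_flight);
	return 0;
}

/* IOCTL exit. */
static void example_ioctl_exit(void)
{
	if (atomic_dec_and_test(&ioctls_in_flight))
		wake_up_all(&ioctls_drained);
}

/* pci_remove path: drain in-flight IOCTLs before MMIO goes away. */
static void example_drain_ioctls(void)
{
	wait_event(ioctls_drained, atomic_read(&ioctls_in_flight) == 0);
}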

The above approach should allow us to wait for all the IOCTLs in 
flight; together with stopping scheduler threads and cancelling and 
flushing all in-flight work items and timers, I think it should give us 
a full solution for the hot-unplug case of preventing any MMIO accesses 
post device pci_remove.

Let me know what you think guys.

Andrey


>
> And then testing, testing, testing to see if we have missed something.
>
> Christian.
>
> Am 05.04.21 um 19:58 schrieb Andrey Grodzovsky:
>>
>> Denis, Christian, are there any updates in the plan on how to move on 
>> with this ? As you know I need very similar code for my up-streaming 
>> of device hot-unplug. My latest solution 
>> (https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html) 
>> was not acceptable because of low level guards on the register 
>> accessors level which was hurting performance. Basically I need a way 
>> to prevent any MMIO write accesses from kernel driver after device is 
>> removed (UMD accesses are taken care of by page faulting dummy page). 
>> We are using now hot-unplug code for Freemont program and so 
>> up-streaming became more of a priority then before. This MMIO access 
>> issue is currently my main blocker from up-streaming. Is there any 
>> way I can assist in pushing this on ?
>>
>> Andrey
>>
>> On 2021-03-18 5:51 a.m., Christian König wrote:
>>> Am 18.03.21 um 10:30 schrieb Li, Dennis:
>>>>
>>>> >>> The GPU reset doesn't complete the fences we wait for. It only 
>>>> completes the hardware fences as part of the reset.
>>>>
>>>> >>> So waiting for a fence while holding the reset lock is illegal 
>>>> and needs to be avoided.
>>>>
>>>> I understood your concern. It is more complex for DRM GFX, 
>>>> therefore I abandon adding lock protection for DRM ioctls now. 
>>>> Maybe we can try to add all kernel  dma_fence waiting in a list, 
>>>> and signal all in recovery threads. Do you have same concern for 
>>>> compute cases?
>>>>
>>>
>>> Yes, compute (KFD) is even harder to handle.
>>>
>>> See you can't signal the dma_fence waiting. Waiting for a dma_fence 
>>> also means you wait for the GPU reset to finish.
>>>
>>> When we would signal the dma_fence during the GPU reset then we 
>>> would run into memory corruption because the hardware jobs running 
>>> after the GPU reset would access memory which is already freed.
>>>
>>>> >>> Lockdep also complains about this when it is used correctly. 
>>>> The only reason it doesn't complain here is because you use an 
>>>> atomic+wait_event instead of a locking primitive.
>>>>
>>>> Agree. This approach will escape the monitor of lockdep.  Its goal 
>>>> is to block other threads when GPU recovery thread start. But I 
>>>> couldn’t find a better method to solve this problem. Do you have 
>>>> some suggestion?
>>>>
>>>
>>> Well, completely abandon those change here.
>>>
>>> What we need to do is to identify where hardware access happens and 
>>> then insert taking the read side of the GPU reset lock so that we 
>>> don't wait for a dma_fence or allocate memory, but still protect the 
>>> hardware from concurrent access and reset.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> Best Regards
>>>>
>>>> Dennis Li
>>>>
>>>> *From:* Koenig, Christian <Christian.Koenig@amd.com>
>>>> *Sent:* Thursday, March 18, 2021 4:59 PM
>>>> *To:* Li, Dennis <Dennis.Li@amd.com>; 
>>>> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix 
>>>> <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
>>>> *Subject:* AW: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>>> its stability
>>>>
>>>> Exactly that's what you don't seem to understand.
>>>>
>>>> The GPU reset doesn't complete the fences we wait for. It only 
>>>> completes the hardware fences as part of the reset.
>>>>
>>>> So waiting for a fence while holding the reset lock is illegal and 
>>>> needs to be avoided.
>>>>
>>>> Lockdep also complains about this when it is used correctly. The 
>>>> only reason it doesn't complain here is because you use an 
>>>> atomic+wait_event instead of a locking primitive.
>>>>
>>>> Regards,
>>>>
>>>> Christian.
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> *Von:*Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>
>>>> *Gesendet:* Donnerstag, 18. März 2021 09:28
>>>> *An:* Koenig, Christian <Christian.Koenig@amd.com 
>>>> <mailto:Christian.Koenig@amd.com>>; amd-gfx@lists.freedesktop.org 
>>>> <mailto:amd-gfx@lists.freedesktop.org> 
>>>> <amd-gfx@lists.freedesktop.org 
>>>> <mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander 
>>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>>> *Betreff:* RE: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>>> its stability
>>>>
>>>> >>> Those two steps need to be exchanged or otherwise it is 
>>>> possible that new delayed work items etc are started before the 
>>>> lock is taken.
>>>> What about adding check for adev->in_gpu_reset in work item? If 
>>>> exchange the two steps, it maybe introduce the deadlock.  For 
>>>> example, the user thread hold the read lock and waiting for the 
>>>> fence, if recovery thread try to hold write lock and then complete 
>>>> fences, in this case, recovery thread will always be blocked.
>>>>
>>>>
>>>> Best Regards
>>>> Dennis Li
>>>> -----Original Message-----
>>>> From: Koenig, Christian <Christian.Koenig@amd.com 
>>>> <mailto:Christian.Koenig@amd.com>>
>>>> Sent: Thursday, March 18, 2021 3:54 PM
>>>> To: Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>; 
>>>> amd-gfx@lists.freedesktop.org 
>>>> <mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander 
>>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>>> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>>> its stability
>>>>
>>>> Am 18.03.21 um 08:23 schrieb Dennis Li:
>>>> > We have defined two variables in_gpu_reset and reset_sem in adev 
>>>> object. The atomic type variable in_gpu_reset is used to avoid 
>>>> recovery thread reenter and make lower functions return more 
>>>> earlier when recovery start, but couldn't block recovery thread 
>>>> when it access hardware. The r/w semaphore reset_sem is used to 
>>>> solve these synchronization issues between recovery thread and 
>>>> other threads.
>>>> >
>>>> > The original solution locked registers' access in lower 
>>>> functions, which will introduce following issues:
>>>> >
>>>> > 1) many lower functions are used in both recovery thread and 
>>>> others. Firstly we must harvest these functions, it is easy to miss 
>>>> someones. Secondly these functions need select which lock (read 
>>>> lock or write lock) will be used, according to the thread it is 
>>>> running in. If the thread context isn't considered, the added lock 
>>>> will easily introduce deadlock. Besides that, in most time, 
>>>> developer easily forget to add locks for new functions.
>>>> >
>>>> > 2) performance drop. More lower functions are more frequently called.
>>>> >
>>>> > 3) easily introduce false positive lockdep complaint, because 
>>>> write lock has big range in recovery thread, but low level 
>>>> functions will hold read lock may be protected by other locks in 
>>>> other threads.
>>>> >
>>>> > Therefore the new solution will try to add lock protection for 
>>>> ioctls of kfd. Its goal is that there are no threads except for 
>>>> recovery thread or its children (for xgmi) to access hardware when 
>>>> doing GPU reset and resume. So refine recovery thread as the following:
>>>> >
>>>> > Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>>>> >     1). if failed, it means system had a recovery thread running, 
>>>> current thread exit directly;
>>>> >     2). if success, enter recovery thread;
>>>> >
>>>> > Step 1: cancel all delay works, stop drm schedule, complete all 
>>>> unreceived fences and so on. It try to stop or pause other threads.
>>>> >
>>>> > Step 2: call down_write(&adev->reset_sem) to hold write lock, 
>>>> which will block recovery thread until other threads release read 
>>>> locks.
>>>>
>>>> Those two steps need to be exchanged or otherwise it is possible 
>>>> that new delayed work items etc are started before the lock is taken.
>>>>
>>>> Just to make it clear until this is fixed the whole patch set is a NAK.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> >
>>>> > Step 3: normally, there is only recovery threads running to 
>>>> access hardware, it is safe to do gpu reset now.
>>>> >
>>>> > Step 4: do post gpu reset, such as call all ips' resume functions;
>>>> >
>>>> > Step 5: atomic set adev->in_gpu_reset as 0, wake up other threads 
>>>> and release write lock. Recovery thread exit normally.
>>>> >
>>>> > Other threads call the amdgpu_read_lock to synchronize with 
>>>> recovery thread. If it finds that in_gpu_reset is 1, it should 
>>>> release read lock if it has holden one, and then blocks itself to 
>>>> wait for recovery finished event. If thread successfully hold read 
>>>> lock and in_gpu_reset is 0, it continues. It will exit normally or 
>>>> be stopped by recovery thread in step 1.
>>>> >
>>>> > Dennis Li (4):
>>>> >    drm/amdgpu: remove reset lock from low level functions
>>>> >    drm/amdgpu: refine the GPU recovery sequence
>>>> >    drm/amdgpu: instead of using down/up_read directly
>>>> >    drm/amdkfd: add reset lock protection for kfd entry functions
>>>> >
>>>> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h |   6 +
>>>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  14 +-
>>>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 173 +++++++++++++-----
>>>> >   .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c |   8 -
>>>> >   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |   4 +-
>>>> >   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |   9 +-
>>>> >   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c |   5 +-
>>>> >   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c |   5 +-
>>>> >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 172 ++++++++++++++++-
>>>> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +-
>>>> >   drivers/gpu/drm/amd/amdkfd/kfd_process.c |   4 +
>>>> >   .../amd/amdkfd/kfd_process_queue_manager.c |  17 ++
>>>> >   12 files changed, 345 insertions(+), 75 deletions(-)
>>>> >
>>>>
>>>
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-06 21:22               ` Andrey Grodzovsky
@ 2021-04-07 10:28                 ` Christian König
  2021-04-07 19:44                   ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-07 10:28 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking



Hi Andrey,

Am 06.04.21 um 23:22 schrieb Andrey Grodzovsky:
>
> Hey Christian, Denis, see bellow -
>
> On 2021-04-06 6:34 a.m., Christian König wrote:
>> Hi Andrey,
>>
>> well good question. My job is to watch over the implementation and 
>> design and while I always help I can adjust anybodies schedule.
>>
>> Is the patch to print a warning when the hardware is accessed without 
>> holding the locks merged yet? If not then that would probably be a 
>> good starting point.
>
>
> It's merged into amd-staging-drm-next and since I work on 
> drm-misc-next I will cherry-pick it into there.
>

Ok good to know, I haven't tracked that one further.

>
>>
>> Then we would need to unify this with the SRCU to make sure that we 
>> have both the reset lock as well as block the hotplug code from 
>> reusing the MMIO space.
>
> In my understanding there is a significant difference between handling 
> of GPU reset and unplug - while GPU reset use case requires any HW 
> accessing code to block and wait for the reset to finish and then 
> proceed, hot-unplug
> is permanent and hence no need to wait and proceed but rather abort at 
> once.
>

Yes, absolutely correct.

> This why I think that in any place we already check for device reset 
> we should also add a check for hot-unplug but the handling would be 
> different
> in that for hot-unplug we would abort instead of keep waiting.
>

Yes, that's the rough picture in my head as well.

Essentially Daniel's patch of having an 
amdgpu_device_hwaccess_begin()/_end() was the right approach. You just 
can't do it in the top level IOCTL handler, but rather need it somewhere 
between frontend and backend.
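Roughly something like this (just a sketch from memory, not the actual 
patch):

static int amdgpu_device_hwaccess_begin(struct amdgpu_device *adev, int *idx)
{
	/* Bail out for good once the device is gone. */
	if (!drm_dev_enter(adev_to_drm(adev), idx))
		return -ENODEV;

	/* Block while a GPU reset is in progress. */
	down_read(&adev->reset_sem);
	return 0;
}

static void amdgpu_device_hwaccess_end(struct amdgpu_device *adev, int idx)
{
	up_read(&adev->reset_sem);
	drm_dev_exit(idx);
}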

> Similar to handling device reset for unplug we obviously also need to 
> stop and block any MMIO accesses once device is unplugged and, as 
> Daniel Vetter mentioned - we have to do it before finishing pci_remove 
> (early device fini)
> and not later (when last device reference is dropped from user space) 
> in order to prevent reuse of MMIO space we still access by other hot 
> plugging devices. As in device reset case we need to cancel all delay 
> works, stop drm schedule, complete all unfinished fences(both HW and 
> scheduler fences). While you stated strong objection to force 
> signalling scheduler fences from GPU reset, quote:
>
> "you can't signal the dma_fence waiting. Waiting for a dma_fence also 
> means you wait for the GPU reset to finish. When we would signal the 
> dma_fence during the GPU reset then we would run into memory 
> corruption because the hardware jobs running after the GPU reset would 
> access memory which is already freed."
> To my understating this is a key difference with hot-unplug, the 
> device is gone, all those concerns are irrelevant and hence we can 
> actually force signal scheduler fences (setting and error to them 
> before) to force completion of any
> waiting clients such as possibly IOCTLs or async page flips e.t.c.
>

Yes, absolutely correct. That's what I also mentioned to Daniel. When we 
are able to nuke the device and any memory access it might do, we can 
also signal the fences.

> Beyond blocking all delayed works and scheduler threads we also need 
> to guarantee no  IOCTL can access MMIO post device unplug OR in flight 
> IOCTLs are done before we finish pci_remove (amdgpu_pci_remove for us).
> For this I suggest we do something like what we worked on with Takashi 
> Iwai the ALSA maintainer recently when he helped implementing PCI BARs 
> move support for snd_hda_intel. Take a look at
> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497 
> and
> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440
> We also had same issue there, how to prevent MMIO accesses while the 
> BARs are migrating. What was done there is a refcount was added to 
> count all IOCTLs in flight, for any in flight  IOCTL the BAR migration 
> handler would
> block for the refcount to drop to 0 before it would proceed, for any 
> later IOCTL it stops and wait if device is in migration state. We even 
> don't need the wait part, nothing to wait for, we just return with 
> -ENODEV for this case.
>

This is essentially what the DRM SRCU is doing as well.

For the hotplug case we could do this at the top level since we can 
signal the fences and don't need to block memory management.

But I'm not sure; maybe we should handle it the same way as reset, or 
maybe we should have it at the top level.

Regards,
Christian.

> The above approach should allow us to wait for all the IOCTLs in 
> flight, together with stopping scheduler threads and cancelling and 
> flushing all in flight work items and timers i think It should give as 
> full solution for the hot-unplug case
> of preventing any MMIO accesses post device pci_remove.
>
> Let me know what you think guys.
>
> Andrey
>
>
>>
>> And then testing, testing, testing to see if we have missed something.
>>
>> Christian.
>>
>> Am 05.04.21 um 19:58 schrieb Andrey Grodzovsky:
>>>
>>> Denis, Christian, are there any updates in the plan on how to move 
>>> on with this ? As you know I need very similar code for my 
>>> up-streaming of device hot-unplug. My latest solution 
>>> (https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html) 
>>> was not acceptable because of low level guards on the register 
>>> accessors level which was hurting performance. Basically I need a 
>>> way to prevent any MMIO write accesses from kernel driver after 
>>> device is removed (UMD accesses are taken care of by page faulting 
>>> dummy page). We are using now hot-unplug code for Freemont program 
>>> and so up-streaming became more of a priority then before. This MMIO 
>>> access issue is currently my main blocker from up-streaming. Is 
>>> there any way I can assist in pushing this on ?
>>>
>>> Andrey
>>>
>>> On 2021-03-18 5:51 a.m., Christian König wrote:
>>>> Am 18.03.21 um 10:30 schrieb Li, Dennis:
>>>>>
>>>>> >>> The GPU reset doesn't complete the fences we wait for. It only 
>>>>> completes the hardware fences as part of the reset.
>>>>>
>>>>> >>> So waiting for a fence while holding the reset lock is illegal 
>>>>> and needs to be avoided.
>>>>>
>>>>> I understood your concern. It is more complex for DRM GFX, 
>>>>> therefore I abandon adding lock protection for DRM ioctls now. 
>>>>> Maybe we can try to add all kernel  dma_fence waiting in a list, 
>>>>> and signal all in recovery threads. Do you have same concern for 
>>>>> compute cases?
>>>>>
>>>>
>>>> Yes, compute (KFD) is even harder to handle.
>>>>
>>>> See you can't signal the dma_fence waiting. Waiting for a dma_fence 
>>>> also means you wait for the GPU reset to finish.
>>>>
>>>> When we would signal the dma_fence during the GPU reset then we 
>>>> would run into memory corruption because the hardware jobs running 
>>>> after the GPU reset would access memory which is already freed.
>>>>
>>>>> >>> Lockdep also complains about this when it is used correctly. 
>>>>> The only reason it doesn't complain here is because you use an 
>>>>> atomic+wait_event instead of a locking primitive.
>>>>>
>>>>> Agree. This approach will escape the monitor of lockdep.  Its goal 
>>>>> is to block other threads when GPU recovery thread start. But I 
>>>>> couldn’t find a better method to solve this problem. Do you have 
>>>>> some suggestion?
>>>>>
>>>>
>>>> Well, completely abandon those change here.
>>>>
>>>> What we need to do is to identify where hardware access happens and 
>>>> then insert taking the read side of the GPU reset lock so that we 
>>>> don't wait for a dma_fence or allocate memory, but still protect 
>>>> the hardware from concurrent access and reset.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Best Regards
>>>>>
>>>>> Dennis Li
>>>>>
>>>>> *From:* Koenig, Christian <Christian.Koenig@amd.com>
>>>>> *Sent:* Thursday, March 18, 2021 4:59 PM
>>>>> *To:* Li, Dennis <Dennis.Li@amd.com>; 
>>>>> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix 
>>>>> <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
>>>>> *Subject:* AW: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>>>> its stability
>>>>>
>>>>> Exactly that's what you don't seem to understand.
>>>>>
>>>>> The GPU reset doesn't complete the fences we wait for. It only 
>>>>> completes the hardware fences as part of the reset.
>>>>>
>>>>> So waiting for a fence while holding the reset lock is illegal and 
>>>>> needs to be avoided.
>>>>>
>>>>> Lockdep also complains about this when it is used correctly. The 
>>>>> only reason it doesn't complain here is because you use an 
>>>>> atomic+wait_event instead of a locking primitive.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Christian.
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> *Von:*Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>
>>>>> *Gesendet:* Donnerstag, 18. März 2021 09:28
>>>>> *An:* Koenig, Christian <Christian.Koenig@amd.com 
>>>>> <mailto:Christian.Koenig@amd.com>>; amd-gfx@lists.freedesktop.org 
>>>>> <mailto:amd-gfx@lists.freedesktop.org> 
>>>>> <amd-gfx@lists.freedesktop.org 
>>>>> <mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander 
>>>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>>>> *Betreff:* RE: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>>>> its stability
>>>>>
>>>>> >>> Those two steps need to be exchanged or otherwise it is 
>>>>> possible that new delayed work items etc are started before the 
>>>>> lock is taken.
>>>>> What about adding check for adev->in_gpu_reset in work item? If 
>>>>> exchange the two steps, it maybe introduce the deadlock.  For 
>>>>> example, the user thread hold the read lock and waiting for the 
>>>>> fence, if recovery thread try to hold write lock and then complete 
>>>>> fences, in this case, recovery thread will always be blocked.
>>>>>
>>>>>
>>>>> Best Regards
>>>>> Dennis Li
>>>>> -----Original Message-----
>>>>> From: Koenig, Christian <Christian.Koenig@amd.com 
>>>>> <mailto:Christian.Koenig@amd.com>>
>>>>> Sent: Thursday, March 18, 2021 3:54 PM
>>>>> To: Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>; 
>>>>> amd-gfx@lists.freedesktop.org 
>>>>> <mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander 
>>>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>>>> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>>>> its stability
>>>>>
>>>>> Am 18.03.21 um 08:23 schrieb Dennis Li:
>>>>> > We have defined two variables in_gpu_reset and reset_sem in adev 
>>>>> object. The atomic type variable in_gpu_reset is used to avoid 
>>>>> recovery thread reenter and make lower functions return more 
>>>>> earlier when recovery start, but couldn't block recovery thread 
>>>>> when it access hardware. The r/w semaphore reset_sem is used to 
>>>>> solve these synchronization issues between recovery thread and 
>>>>> other threads.
>>>>> >
>>>>> > The original solution locked registers' access in lower 
>>>>> functions, which will introduce following issues:
>>>>> >
>>>>> > 1) many lower functions are used in both recovery thread and 
>>>>> others. Firstly we must harvest these functions, it is easy to 
>>>>> miss someones. Secondly these functions need select which lock 
>>>>> (read lock or write lock) will be used, according to the thread it 
>>>>> is running in. If the thread context isn't considered, the added 
>>>>> lock will easily introduce deadlock. Besides that, in most time, 
>>>>> developer easily forget to add locks for new functions.
>>>>> >
>>>>> > 2) performance drop. More lower functions are more frequently 
>>>>> called.
>>>>> >
>>>>> > 3) easily introduce false positive lockdep complaint, because 
>>>>> write lock has big range in recovery thread, but low level 
>>>>> functions will hold read lock may be protected by other locks in 
>>>>> other threads.
>>>>> >
>>>>> > Therefore the new solution will try to add lock protection for 
>>>>> ioctls of kfd. Its goal is that there are no threads except for 
>>>>> recovery thread or its children (for xgmi) to access hardware when 
>>>>> doing GPU reset and resume. So refine recovery thread as the 
>>>>> following:
>>>>> >
>>>>> > Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>>>>> >     1). if failed, it means system had a recovery thread 
>>>>> running, current thread exit directly;
>>>>> >     2). if success, enter recovery thread;
>>>>> >
>>>>> > Step 1: cancel all delay works, stop drm schedule, complete all 
>>>>> unreceived fences and so on. It try to stop or pause other threads.
>>>>> >
>>>>> > Step 2: call down_write(&adev->reset_sem) to hold write lock, 
>>>>> which will block recovery thread until other threads release read 
>>>>> locks.
>>>>>
>>>>> Those two steps need to be exchanged or otherwise it is possible 
>>>>> that new delayed work items etc are started before the lock is taken.
>>>>>
>>>>> Just to make it clear until this is fixed the whole patch set is a 
>>>>> NAK.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> >
>>>>> > Step 3: normally, there is only recovery threads running to 
>>>>> access hardware, it is safe to do gpu reset now.
>>>>> >
>>>>> > Step 4: do post gpu reset, such as call all ips' resume functions;
>>>>> >
>>>>> > Step 5: atomic set adev->in_gpu_reset as 0, wake up other 
>>>>> threads and release write lock. Recovery thread exit normally.
>>>>> >
>>>>> > Other threads call the amdgpu_read_lock to synchronize with 
>>>>> recovery thread. If it finds that in_gpu_reset is 1, it should 
>>>>> release read lock if it has holden one, and then blocks itself to 
>>>>> wait for recovery finished event. If thread successfully hold read 
>>>>> lock and in_gpu_reset is 0, it continues. It will exit normally or 
>>>>> be stopped by recovery thread in step 1.
>>>>> >
>>>>> > Dennis Li (4):
>>>>> >    drm/amdgpu: remove reset lock from low level functions
>>>>> >    drm/amdgpu: refine the GPU recovery sequence
>>>>> >    drm/amdgpu: instead of using down/up_read directly
>>>>> >    drm/amdkfd: add reset lock protection for kfd entry functions
>>>>> >
>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 6 +
>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   | 14 +-
>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 173 
>>>>> +++++++++++++-----
>>>>> > .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c    | 8 -
>>>>> > drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        | 4 +-
>>>>> > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         | 9 +-
>>>>> > drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         | 5 +-
>>>>> > drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c         | 5 +-
>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 172 
>>>>> ++++++++++++++++-
>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_priv.h         | 3 +-
>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 4 +
>>>>> > .../amd/amdkfd/kfd_process_queue_manager.c    | 17 ++
>>>>> >   12 files changed, 345 insertions(+), 75 deletions(-)
>>>>> >
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-07 10:28                 ` Christian König
@ 2021-04-07 19:44                   ` Andrey Grodzovsky
  2021-04-08  8:22                     ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-07 19:44 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking




On 2021-04-07 6:28 a.m., Christian König wrote:
> Hi Andrey,
>
> Am 06.04.21 um 23:22 schrieb Andrey Grodzovsky:
>>
>> Hey Christian, Denis, see bellow -
>>
>> On 2021-04-06 6:34 a.m., Christian König wrote:
>>> Hi Andrey,
>>>
>>> well good question. My job is to watch over the implementation and 
>>> design and while I always help I can adjust anybodies schedule.
>>>
>>> Is the patch to print a warning when the hardware is accessed 
>>> without holding the locks merged yet? If not then that would 
>>> probably be a good starting point.
>>
>>
>> It's merged into amd-staging-drm-next and since I work on 
>> drm-misc-next I will cherry-pick it into there.
>>
>
> Ok good to know, I haven't tracked that one further.
>
>>
>>>
>>> Then we would need to unify this with the SRCU to make sure that we 
>>> have both the reset lock as well as block the hotplug code from 
>>> reusing the MMIO space.
>>
>> In my understanding there is a significant difference between 
>> handling of GPU reset and unplug - while GPU reset use case requires 
>> any HW accessing code to block and wait for the reset to finish and 
>> then proceed, hot-unplug
>> is permanent and hence no need to wait and proceed but rather abort 
>> at once.
>>
>
> Yes, absolutely correct.
>
>> This why I think that in any place we already check for device reset 
>> we should also add a check for hot-unplug but the handling would be 
>> different
>> in that for hot-unplug we would abort instead of keep waiting.
>>
>
> Yes, that's the rough picture in my head as well.
>
> Essentially Daniels patch of having an 
> amdgpu_device_hwaccess_begin()/_end() was the right approach. You just 
> can't do it in the top level IOCTL handler, but rather need it 
> somewhere between front end and backend.


Can you point me to which patch that was? I can't find it.


>
>> Similar to handling device reset for unplug we obviously also need to 
>> stop and block any MMIO accesses once device is unplugged and, as 
>> Daniel Vetter mentioned - we have to do it before finishing 
>> pci_remove (early device fini)
>> and not later (when last device reference is dropped from user space) 
>> in order to prevent reuse of MMIO space we still access by other hot 
>> plugging devices. As in device reset case we need to cancel all delay 
>> works, stop drm schedule, complete all unfinished fences(both HW and 
>> scheduler fences). While you stated strong objection to force 
>> signalling scheduler fences from GPU reset, quote:
>>
>> "you can't signal the dma_fence waiting. Waiting for a dma_fence also 
>> means you wait for the GPU reset to finish. When we would signal the 
>> dma_fence during the GPU reset then we would run into memory 
>> corruption because the hardware jobs running after the GPU reset 
>> would access memory which is already freed."
>> To my understating this is a key difference with hot-unplug, the 
>> device is gone, all those concerns are irrelevant and hence we can 
>> actually force signal scheduler fences (setting and error to them 
>> before) to force completion of any
>> waiting clients such as possibly IOCTLs or async page flips e.t.c.
>>
>
> Yes, absolutely correct. That's what I also mentioned to Daniel. When 
> we are able to nuke the device and any memory access it might do we 
> can also signal the fences.
>
>> Beyond blocking all delayed works and scheduler threads we also need 
>> to guarantee no  IOCTL can access MMIO post device unplug OR in 
>> flight IOCTLs are done before we finish pci_remove (amdgpu_pci_remove 
>> for us).
>> For this I suggest we do something like what we worked on with 
>> Takashi Iwai the ALSA maintainer recently when he helped implementing 
>> PCI BARs move support for snd_hda_intel. Take a look at
>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497 
>> and
>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440
>> We also had same issue there, how to prevent MMIO accesses while the 
>> BARs are migrating. What was done there is a refcount was added to 
>> count all IOCTLs in flight, for any in flight  IOCTL the BAR 
>> migration handler would
>> block for the refcount to drop to 0 before it would proceed, for any 
>> later IOCTL it stops and wait if device is in migration state. We 
>> even don't need the wait part, nothing to wait for, we just return 
>> with -ENODEV for this case.
>>
>
> This is essentially what the DRM SRCU is doing as well.
>
> For the hotplug case we could do this in the toplevel since we can 
> signal the fence and don't need to block memory management.


To make SRCU 'wait for' all IOCTLs in flight we would need to wrap every 
IOCTL (practically just the drm_ioctl function) with 
drm_dev_enter/drm_dev_exit - can we do that?
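Something like this (sketch only, the wrapper name is invented):

/* Take the SRCU read side for the full duration of every IOCTL, so that
 * drm_dev_unplug() waits for all in-flight IOCTLs to finish. */
static long example_wrapped_ioctl(struct file *filp, unsigned int cmd,
				  unsigned long arg)
{
	struct drm_file *file_priv = filp->private_data;
	struct drm_device *dev = file_priv->minor->dev;
	long ret;
	int idx;

	if (!drm_dev_enter(dev, &idx))
		return -ENODEV;	/* device already unplugged */

	ret = drm_ioctl(filp, cmd, arg);

	drm_dev_exit(idx);
	return ret;
}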


>
> But I'm not sure, maybe we should handle it the same way as reset or 
> maybe we should have it at the top level.


If by top level you mean checking whether the device is unplugged and 
bailing out at the entry to an IOCTL or right at the start of any work 
item/timer function we have, then it seems to me that's better and 
clearer. Once we have flushed all of the in-flight ones there is no 
reason for them to execute any more once the device is unplugged.
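E.g. for a work item the bail-out would look roughly like this 
(example_work is a made-up field, just to illustrate the check):

static void example_work_handler(struct work_struct *work)
{
	struct amdgpu_device *adev =
		container_of(work, struct amdgpu_device, example_work.work);

	/* Device is gone - don't touch MMIO any more. */
	if (drm_dev_is_unplugged(adev_to_drm(adev)))
		return;

	/* ... normal HW access ... */
}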

Andrey


>
> Regards,
> Christian.
>
>> The above approach should allow us to wait for all the IOCTLs in 
>> flight, together with stopping scheduler threads and cancelling and 
>> flushing all in flight work items and timers i think It should give 
>> as full solution for the hot-unplug case
>> of preventing any MMIO accesses post device pci_remove.
>>
>> Let me know what you think guys.
>>
>> Andrey
>>
>>
>>>
>>> And then testing, testing, testing to see if we have missed something.
>>>
>>> Christian.
>>>
>>> Am 05.04.21 um 19:58 schrieb Andrey Grodzovsky:
>>>>
>>>> Denis, Christian, are there any updates in the plan on how to move 
>>>> on with this ? As you know I need very similar code for my 
>>>> up-streaming of device hot-unplug. My latest solution 
>>>> (https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html) 
>>>> was not acceptable because of low level guards on the register 
>>>> accessors level which was hurting performance. Basically I need a 
>>>> way to prevent any MMIO write accesses from kernel driver after 
>>>> device is removed (UMD accesses are taken care of by page faulting 
>>>> dummy page). We are using now hot-unplug code for Freemont program 
>>>> and so up-streaming became more of a priority then before. This 
>>>> MMIO access issue is currently my main blocker from up-streaming. 
>>>> Is there any way I can assist in pushing this on ?
>>>>
>>>> Andrey
>>>>
>>>> On 2021-03-18 5:51 a.m., Christian König wrote:
>>>>> Am 18.03.21 um 10:30 schrieb Li, Dennis:
>>>>>>
>>>>>> >>> The GPU reset doesn't complete the fences we wait for. It 
>>>>>> only completes the hardware fences as part of the reset.
>>>>>>
>>>>>> >>> So waiting for a fence while holding the reset lock is 
>>>>>> illegal and needs to be avoided.
>>>>>>
>>>>>> I understood your concern. It is more complex for DRM GFX, 
>>>>>> therefore I abandon adding lock protection for DRM ioctls now. 
>>>>>> Maybe we can try to add all kernel  dma_fence waiting in a list, 
>>>>>> and signal all in recovery threads. Do you have same concern for 
>>>>>> compute cases?
>>>>>>
>>>>>
>>>>> Yes, compute (KFD) is even harder to handle.
>>>>>
>>>>> See you can't signal the dma_fence waiting. Waiting for a 
>>>>> dma_fence also means you wait for the GPU reset to finish.
>>>>>
>>>>> When we would signal the dma_fence during the GPU reset then we 
>>>>> would run into memory corruption because the hardware jobs running 
>>>>> after the GPU reset would access memory which is already freed.
>>>>>
>>>>>> >>> Lockdep also complains about this when it is used correctly. 
>>>>>> The only reason it doesn't complain here is because you use an 
>>>>>> atomic+wait_event instead of a locking primitive.
>>>>>>
>>>>>> Agree. This approach will escape the monitor of lockdep.  Its 
>>>>>> goal is to block other threads when GPU recovery thread start. 
>>>>>> But I couldn’t find a better method to solve this problem. Do you 
>>>>>> have some suggestion?
>>>>>>
>>>>>
>>>>> Well, completely abandon those change here.
>>>>>
>>>>> What we need to do is to identify where hardware access happens 
>>>>> and then insert taking the read side of the GPU reset lock so that 
>>>>> we don't wait for a dma_fence or allocate memory, but still 
>>>>> protect the hardware from concurrent access and reset.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Best Regards
>>>>>>
>>>>>> Dennis Li
>>>>>>
>>>>>> *From:* Koenig, Christian <Christian.Koenig@amd.com>
>>>>>> *Sent:* Thursday, March 18, 2021 4:59 PM
>>>>>> *To:* Li, Dennis <Dennis.Li@amd.com>; 
>>>>>> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
>>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix 
>>>>>> <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
>>>>>> *Subject:* AW: [PATCH 0/4] Refine GPU recovery sequence to 
>>>>>> enhance its stability
>>>>>>
>>>>>> Exactly that's what you don't seem to understand.
>>>>>>
>>>>>> The GPU reset doesn't complete the fences we wait for. It only 
>>>>>> completes the hardware fences as part of the reset.
>>>>>>
>>>>>> So waiting for a fence while holding the reset lock is illegal 
>>>>>> and needs to be avoided.
>>>>>>
>>>>>> Lockdep also complains about this when it is used correctly. The 
>>>>>> only reason it doesn't complain here is because you use an 
>>>>>> atomic+wait_event instead of a locking primitive.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>> *Von:*Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>
>>>>>> *Gesendet:* Donnerstag, 18. März 2021 09:28
>>>>>> *An:* Koenig, Christian <Christian.Koenig@amd.com 
>>>>>> <mailto:Christian.Koenig@amd.com>>; amd-gfx@lists.freedesktop.org 
>>>>>> <mailto:amd-gfx@lists.freedesktop.org> 
>>>>>> <amd-gfx@lists.freedesktop.org 
>>>>>> <mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander 
>>>>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>>>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>>>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>>>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>>>>> *Betreff:* RE: [PATCH 0/4] Refine GPU recovery sequence to 
>>>>>> enhance its stability
>>>>>>
>>>>>> >>> Those two steps need to be exchanged or otherwise it is 
>>>>>> possible that new delayed work items etc are started before the 
>>>>>> lock is taken.
>>>>>> What about adding check for adev->in_gpu_reset in work item? If 
>>>>>> exchange the two steps, it maybe introduce the deadlock. For 
>>>>>> example, the user thread hold the read lock and waiting for the 
>>>>>> fence, if recovery thread try to hold write lock and then 
>>>>>> complete fences, in this case, recovery thread will always be 
>>>>>> blocked.
>>>>>>
>>>>>>
>>>>>> Best Regards
>>>>>> Dennis Li
>>>>>> -----Original Message-----
>>>>>> From: Koenig, Christian <Christian.Koenig@amd.com 
>>>>>> <mailto:Christian.Koenig@amd.com>>
>>>>>> Sent: Thursday, March 18, 2021 3:54 PM
>>>>>> To: Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>; 
>>>>>> amd-gfx@lists.freedesktop.org 
>>>>>> <mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander 
>>>>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>>>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>>>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>>>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>>>>> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>>>>> its stability
>>>>>>
>>>>>> Am 18.03.21 um 08:23 schrieb Dennis Li:
>>>>>> > We have defined two variables in_gpu_reset and reset_sem in 
>>>>>> adev object. The atomic type variable in_gpu_reset is used to 
>>>>>> avoid recovery thread reenter and make lower functions return 
>>>>>> more earlier when recovery start, but couldn't block recovery 
>>>>>> thread when it access hardware. The r/w semaphore reset_sem is 
>>>>>> used to solve these synchronization issues between recovery 
>>>>>> thread and other threads.
>>>>>> >
>>>>>> > The original solution locked registers' access in lower 
>>>>>> functions, which will introduce following issues:
>>>>>> >
>>>>>> > 1) many lower functions are used in both recovery thread and 
>>>>>> others. Firstly we must harvest these functions, it is easy to 
>>>>>> miss someones. Secondly these functions need select which lock 
>>>>>> (read lock or write lock) will be used, according to the thread 
>>>>>> it is running in. If the thread context isn't considered, the 
>>>>>> added lock will easily introduce deadlock. Besides that, in most 
>>>>>> time, developer easily forget to add locks for new functions.
>>>>>> >
>>>>>> > 2) performance drop. More lower functions are more frequently 
>>>>>> called.
>>>>>> >
>>>>>> > 3) easily introduce false positive lockdep complaint, because 
>>>>>> write lock has big range in recovery thread, but low level 
>>>>>> functions will hold read lock may be protected by other locks in 
>>>>>> other threads.
>>>>>> >
>>>>>> > Therefore the new solution will try to add lock protection for 
>>>>>> ioctls of kfd. Its goal is that there are no threads except for 
>>>>>> recovery thread or its children (for xgmi) to access hardware 
>>>>>> when doing GPU reset and resume. So refine recovery thread as the 
>>>>>> following:
>>>>>> >
>>>>>> > Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>>>>>> >     1). if failed, it means system had a recovery thread 
>>>>>> running, current thread exit directly;
>>>>>> >     2). if success, enter recovery thread;
>>>>>> >
>>>>>> > Step 1: cancel all delay works, stop drm schedule, complete all 
>>>>>> unreceived fences and so on. It try to stop or pause other threads.
>>>>>> >
>>>>>> > Step 2: call down_write(&adev->reset_sem) to hold write lock, 
>>>>>> which will block recovery thread until other threads release read 
>>>>>> locks.
>>>>>>
>>>>>> Those two steps need to be exchanged or otherwise it is possible 
>>>>>> that new delayed work items etc are started before the lock is taken.
>>>>>>
>>>>>> Just to make it clear until this is fixed the whole patch set is 
>>>>>> a NAK.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>> >
>>>>>> > Step 3: normally, there is only recovery threads running to 
>>>>>> access hardware, it is safe to do gpu reset now.
>>>>>> >
>>>>>> > Step 4: do post gpu reset, such as call all ips' resume functions;
>>>>>> >
>>>>>> > Step 5: atomic set adev->in_gpu_reset as 0, wake up other 
>>>>>> threads and release write lock. Recovery thread exit normally.
>>>>>> >
>>>>>> > Other threads call the amdgpu_read_lock to synchronize with 
>>>>>> recovery thread. If it finds that in_gpu_reset is 1, it should 
>>>>>> release read lock if it has holden one, and then blocks itself to 
>>>>>> wait for recovery finished event. If thread successfully hold 
>>>>>> read lock and in_gpu_reset is 0, it continues. It will exit 
>>>>>> normally or be stopped by recovery thread in step 1.
>>>>>> >
>>>>>> > Dennis Li (4):
>>>>>> >    drm/amdgpu: remove reset lock from low level functions
>>>>>> >    drm/amdgpu: refine the GPU recovery sequence
>>>>>> >    drm/amdgpu: instead of using down/up_read directly
>>>>>> >    drm/amdkfd: add reset lock protection for kfd entry functions
>>>>>> >
>>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu.h |   6 +
>>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c   | 14 +-
>>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 173 
>>>>>> +++++++++++++-----
>>>>>> > .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c |   8 -
>>>>>> > drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |   4 +-
>>>>>> > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |   9 +-
>>>>>> > drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c |   5 +-
>>>>>> > drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c |   5 +-
>>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 172 
>>>>>> ++++++++++++++++-
>>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +-
>>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_process.c |   4 +
>>>>>> > .../amd/amdkfd/kfd_process_queue_manager.c    | 17 ++
>>>>>> >   12 files changed, 345 insertions(+), 75 deletions(-)
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-07 19:44                   ` Andrey Grodzovsky
@ 2021-04-08  8:22                     ` Christian König
  2021-04-08  8:32                       ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-08  8:22 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking



Hi Andrey,

On 07.04.21 at 21:44, Andrey Grodzovsky wrote:
>
>
> On 2021-04-07 6:28 a.m., Christian König wrote:
>> Hi Andrey,
>>
>> Am 06.04.21 um 23:22 schrieb Andrey Grodzovsky:
>>>
>>> Hey Christian, Denis, see bellow -
>>>
>>> On 2021-04-06 6:34 a.m., Christian König wrote:
>>>> Hi Andrey,
>>>>
>>>> well good question. My job is to watch over the implementation and 
>>>> design and while I always help I can adjust anybodies schedule.
>>>>
>>>> Is the patch to print a warning when the hardware is accessed 
>>>> without holding the locks merged yet? If not then that would 
>>>> probably be a good starting point.
>>>
>>>
>>> It's merged into amd-staging-drm-next and since I work on 
>>> drm-misc-next I will cherry-pick it into there.
>>>
>>
>> Ok good to know, I haven't tracked that one further.
>>
>>>
>>>>
>>>> Then we would need to unify this with the SRCU to make sure that we 
>>>> have both the reset lock as well as block the hotplug code from 
>>>> reusing the MMIO space.
>>>
>>> In my understanding there is a significant difference between 
>>> handling of GPU reset and unplug - while GPU reset use case requires 
>>> any HW accessing code to block and wait for the reset to finish and 
>>> then proceed, hot-unplug
>>> is permanent and hence no need to wait and proceed but rather abort 
>>> at once.
>>>
>>
>> Yes, absolutely correct.
>>
>>> This why I think that in any place we already check for device reset 
>>> we should also add a check for hot-unplug but the handling would be 
>>> different
>>> in that for hot-unplug we would abort instead of keep waiting.
>>>
>>
>> Yes, that's the rough picture in my head as well.
>>
>> Essentially Daniels patch of having an 
>> amdgpu_device_hwaccess_begin()/_end() was the right approach. You 
>> just can't do it in the top level IOCTL handler, but rather need it 
>> somewhere between front end and backend.
>
>
> Can you point me to what patch was it ? Can't find.
>

What I meant was the approach in patch #3 of this series, where he 
replaced the down_read()/up_read() calls with 
amdgpu_read_lock()/amdgpu_read_unlock().

I would just not call it amdgpu_read_lock()/amdgpu_read_unlock(), but 
something more descriptive.
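
For illustration, a minimal sketch of what such a wrapper could look 
like, based only on the sequence described in the cover letter; the 
wait-queue name and the exact flow are assumptions, not the code from 
patch #3:

/*
 * Illustrative sketch only, not the code from patch #3. reset_sem and
 * in_gpu_reset are the fields described in the cover letter; the wait
 * queue name (recovery_done_wq) is an assumption.
 */
static void amdgpu_read_lock(struct amdgpu_device *adev)
{
	for (;;) {
		down_read(&adev->reset_sem);

		/* No recovery in flight: proceed holding the read lock. */
		if (!atomic_read(&adev->in_gpu_reset))
			return;

		/*
		 * A reset has started: drop the read lock so the recovery
		 * thread can take the write side, then wait for it to finish.
		 */
		up_read(&adev->reset_sem);
		wait_event(adev->recovery_done_wq,
			   !atomic_read(&adev->in_gpu_reset));
	}
}

static void amdgpu_read_unlock(struct amdgpu_device *adev)
{
	up_read(&adev->reset_sem);
}

A caller would then bracket its hardware access with 
amdgpu_read_lock(adev)/amdgpu_read_unlock(adev) instead of calling 
down_read()/up_read() on reset_sem directly.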

Regards,
Christian.

>
>>
>>> Similar to handling device reset for unplug we obviously also need 
>>> to stop and block any MMIO accesses once device is unplugged and, as 
>>> Daniel Vetter mentioned - we have to do it before finishing 
>>> pci_remove (early device fini)
>>> and not later (when last device reference is dropped from user 
>>> space) in order to prevent reuse of MMIO space we still access by 
>>> other hot plugging devices. As in device reset case we need to 
>>> cancel all delay works, stop drm schedule, complete all unfinished 
>>> fences(both HW and scheduler fences). While you stated strong 
>>> objection to force signalling scheduler fences from GPU reset, quote:
>>>
>>> "you can't signal the dma_fence waiting. Waiting for a dma_fence 
>>> also means you wait for the GPU reset to finish. When we would 
>>> signal the dma_fence during the GPU reset then we would run into 
>>> memory corruption because the hardware jobs running after the GPU 
>>> reset would access memory which is already freed."
>>> To my understating this is a key difference with hot-unplug, the 
>>> device is gone, all those concerns are irrelevant and hence we can 
>>> actually force signal scheduler fences (setting and error to them 
>>> before) to force completion of any
>>> waiting clients such as possibly IOCTLs or async page flips e.t.c.
>>>
>>
>> Yes, absolutely correct. That's what I also mentioned to Daniel. When 
>> we are able to nuke the device and any memory access it might do we 
>> can also signal the fences.
>>
>>> Beyond blocking all delayed works and scheduler threads we also need 
>>> to guarantee no  IOCTL can access MMIO post device unplug OR in 
>>> flight IOCTLs are done before we finish pci_remove 
>>> (amdgpu_pci_remove for us).
>>> For this I suggest we do something like what we worked on with 
>>> Takashi Iwai the ALSA maintainer recently when he helped 
>>> implementing PCI BARs move support for snd_hda_intel. Take a look at
>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497 
>>> and
>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440
>>> We also had same issue there, how to prevent MMIO accesses while the 
>>> BARs are migrating. What was done there is a refcount was added to 
>>> count all IOCTLs in flight, for any in flight  IOCTL the BAR 
>>> migration handler would
>>> block for the refcount to drop to 0 before it would proceed, for any 
>>> later IOCTL it stops and wait if device is in migration state. We 
>>> even don't need the wait part, nothing to wait for, we just return 
>>> with -ENODEV for this case.
>>>
>>
>> This is essentially what the DRM SRCU is doing as well.
>>
>> For the hotplug case we could do this in the toplevel since we can 
>> signal the fence and don't need to block memory management.
>
>
> To make SRCU 'wait for' all IOCTLs in flight we would need to wrap 
> every IOCTL ( practically - just drm_ioctl function) with 
> drm_dev_enter/drm_dev_exit - can we do it ?
>
>
>>
>> But I'm not sure, maybe we should handle it the same way as reset or 
>> maybe we should have it at the top level.
>
>
> If by top level you mean checking for device unplugged and bailing out 
> at the entry to IOCTL or right at start of any work_item/timer 
> function we have then seems to me it's better and more clear. Once we 
> flushed all of them in flight there is no reason for them to execute 
> any more when device is unplugged.
>
> Andrey
>
>
>>
>> Regards,
>> Christian.
>>
>>> The above approach should allow us to wait for all the IOCTLs in 
>>> flight, together with stopping scheduler threads and cancelling and 
>>> flushing all in flight work items and timers i think It should give 
>>> as full solution for the hot-unplug case
>>> of preventing any MMIO accesses post device pci_remove.
>>>
>>> Let me know what you think guys.
>>>
>>> Andrey
>>>
>>>
>>>>
>>>> And then testing, testing, testing to see if we have missed something.
>>>>
>>>> Christian.
>>>>
>>>> Am 05.04.21 um 19:58 schrieb Andrey Grodzovsky:
>>>>>
>>>>> Denis, Christian, are there any updates in the plan on how to move 
>>>>> on with this ? As you know I need very similar code for my 
>>>>> up-streaming of device hot-unplug. My latest solution 
>>>>> (https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html) 
>>>>> was not acceptable because of low level guards on the register 
>>>>> accessors level which was hurting performance. Basically I need a 
>>>>> way to prevent any MMIO write accesses from kernel driver after 
>>>>> device is removed (UMD accesses are taken care of by page faulting 
>>>>> dummy page). We are using now hot-unplug code for Freemont program 
>>>>> and so up-streaming became more of a priority then before. This 
>>>>> MMIO access issue is currently my main blocker from up-streaming. 
>>>>> Is there any way I can assist in pushing this on ?
>>>>>
>>>>> Andrey
>>>>>
>>>>> On 2021-03-18 5:51 a.m., Christian König wrote:
>>>>>> Am 18.03.21 um 10:30 schrieb Li, Dennis:
>>>>>>>
>>>>>>> >>> The GPU reset doesn't complete the fences we wait for. It 
>>>>>>> only completes the hardware fences as part of the reset.
>>>>>>>
>>>>>>> >>> So waiting for a fence while holding the reset lock is 
>>>>>>> illegal and needs to be avoided.
>>>>>>>
>>>>>>> I understood your concern. It is more complex for DRM GFX, 
>>>>>>> therefore I abandon adding lock protection for DRM ioctls now. 
>>>>>>> Maybe we can try to add all kernel  dma_fence waiting in a list, 
>>>>>>> and signal all in recovery threads. Do you have same concern for 
>>>>>>> compute cases?
>>>>>>>
>>>>>>
>>>>>> Yes, compute (KFD) is even harder to handle.
>>>>>>
>>>>>> See you can't signal the dma_fence waiting. Waiting for a 
>>>>>> dma_fence also means you wait for the GPU reset to finish.
>>>>>>
>>>>>> When we would signal the dma_fence during the GPU reset then we 
>>>>>> would run into memory corruption because the hardware jobs 
>>>>>> running after the GPU reset would access memory which is already 
>>>>>> freed.
>>>>>>
>>>>>>> >>> Lockdep also complains about this when it is used correctly. 
>>>>>>> The only reason it doesn't complain here is because you use an 
>>>>>>> atomic+wait_event instead of a locking primitive.
>>>>>>>
>>>>>>> Agree. This approach will escape the monitor of lockdep.  Its 
>>>>>>> goal is to block other threads when GPU recovery thread start. 
>>>>>>> But I couldn’t find a better method to solve this problem. Do 
>>>>>>> you have some suggestion?
>>>>>>>
>>>>>>
>>>>>> Well, completely abandon those change here.
>>>>>>
>>>>>> What we need to do is to identify where hardware access happens 
>>>>>> and then insert taking the read side of the GPU reset lock so 
>>>>>> that we don't wait for a dma_fence or allocate memory, but still 
>>>>>> protect the hardware from concurrent access and reset.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Dennis Li
>>>>>>>
>>>>>>> *From:* Koenig, Christian <Christian.Koenig@amd.com>
>>>>>>> *Sent:* Thursday, March 18, 2021 4:59 PM
>>>>>>> *To:* Li, Dennis <Dennis.Li@amd.com>; 
>>>>>>> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
>>>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix 
>>>>>>> <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
>>>>>>> *Subject:* AW: [PATCH 0/4] Refine GPU recovery sequence to 
>>>>>>> enhance its stability
>>>>>>>
>>>>>>> Exactly that's what you don't seem to understand.
>>>>>>>
>>>>>>> The GPU reset doesn't complete the fences we wait for. It only 
>>>>>>> completes the hardware fences as part of the reset.
>>>>>>>
>>>>>>> So waiting for a fence while holding the reset lock is illegal 
>>>>>>> and needs to be avoided.
>>>>>>>
>>>>>>> Lockdep also complains about this when it is used correctly. The 
>>>>>>> only reason it doesn't complain here is because you use an 
>>>>>>> atomic+wait_event instead of a locking primitive.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>> *Von:*Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>
>>>>>>> *Gesendet:* Donnerstag, 18. März 2021 09:28
>>>>>>> *An:* Koenig, Christian <Christian.Koenig@amd.com 
>>>>>>> <mailto:Christian.Koenig@amd.com>>; 
>>>>>>> amd-gfx@lists.freedesktop.org 
>>>>>>> <mailto:amd-gfx@lists.freedesktop.org> 
>>>>>>> <amd-gfx@lists.freedesktop.org 
>>>>>>> <mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander 
>>>>>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>>>>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>>>>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>>>>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>>>>>> *Betreff:* RE: [PATCH 0/4] Refine GPU recovery sequence to 
>>>>>>> enhance its stability
>>>>>>>
>>>>>>> >>> Those two steps need to be exchanged or otherwise it is 
>>>>>>> possible that new delayed work items etc are started before the 
>>>>>>> lock is taken.
>>>>>>> What about adding check for adev->in_gpu_reset in work item? If 
>>>>>>> exchange the two steps, it maybe introduce the deadlock.  For 
>>>>>>> example, the user thread hold the read lock and waiting for the 
>>>>>>> fence, if recovery thread try to hold write lock and then 
>>>>>>> complete fences, in this case, recovery thread will always be 
>>>>>>> blocked.
>>>>>>>
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Dennis Li
>>>>>>> -----Original Message-----
>>>>>>> From: Koenig, Christian <Christian.Koenig@amd.com 
>>>>>>> <mailto:Christian.Koenig@amd.com>>
>>>>>>> Sent: Thursday, March 18, 2021 3:54 PM
>>>>>>> To: Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>; 
>>>>>>> amd-gfx@lists.freedesktop.org 
>>>>>>> <mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander 
>>>>>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>>>>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>>>>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>>>>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>>>>>> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance 
>>>>>>> its stability
>>>>>>>
>>>>>>> Am 18.03.21 um 08:23 schrieb Dennis Li:
>>>>>>> > We have defined two variables in_gpu_reset and reset_sem in 
>>>>>>> adev object. The atomic type variable in_gpu_reset is used to 
>>>>>>> avoid recovery thread reenter and make lower functions return 
>>>>>>> more earlier when recovery start, but couldn't block recovery 
>>>>>>> thread when it access hardware. The r/w semaphore reset_sem is 
>>>>>>> used to solve these synchronization issues between recovery 
>>>>>>> thread and other threads.
>>>>>>> >
>>>>>>> > The original solution locked registers' access in lower 
>>>>>>> functions, which will introduce following issues:
>>>>>>> >
>>>>>>> > 1) many lower functions are used in both recovery thread and 
>>>>>>> others. Firstly we must harvest these functions, it is easy to 
>>>>>>> miss someones. Secondly these functions need select which lock 
>>>>>>> (read lock or write lock) will be used, according to the thread 
>>>>>>> it is running in. If the thread context isn't considered, the 
>>>>>>> added lock will easily introduce deadlock. Besides that, in most 
>>>>>>> time, developer easily forget to add locks for new functions.
>>>>>>> >
>>>>>>> > 2) performance drop. More lower functions are more frequently 
>>>>>>> called.
>>>>>>> >
>>>>>>> > 3) easily introduce false positive lockdep complaint, because 
>>>>>>> write lock has big range in recovery thread, but low level 
>>>>>>> functions will hold read lock may be protected by other locks in 
>>>>>>> other threads.
>>>>>>> >
>>>>>>> > Therefore the new solution will try to add lock protection for 
>>>>>>> ioctls of kfd. Its goal is that there are no threads except for 
>>>>>>> recovery thread or its children (for xgmi) to access hardware 
>>>>>>> when doing GPU reset and resume. So refine recovery thread as 
>>>>>>> the following:
>>>>>>> >
>>>>>>> > Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>>>>>>> >     1). if failed, it means system had a recovery thread 
>>>>>>> running, current thread exit directly;
>>>>>>> >     2). if success, enter recovery thread;
>>>>>>> >
>>>>>>> > Step 1: cancel all delay works, stop drm schedule, complete 
>>>>>>> all unreceived fences and so on. It try to stop or pause other 
>>>>>>> threads.
>>>>>>> >
>>>>>>> > Step 2: call down_write(&adev->reset_sem) to hold write lock, 
>>>>>>> which will block recovery thread until other threads release 
>>>>>>> read locks.
>>>>>>>
>>>>>>> Those two steps need to be exchanged or otherwise it is possible 
>>>>>>> that new delayed work items etc are started before the lock is 
>>>>>>> taken.
>>>>>>>
>>>>>>> Just to make it clear until this is fixed the whole patch set is 
>>>>>>> a NAK.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> >
>>>>>>> > Step 3: normally, there is only recovery threads running to 
>>>>>>> access hardware, it is safe to do gpu reset now.
>>>>>>> >
>>>>>>> > Step 4: do post gpu reset, such as call all ips' resume functions;
>>>>>>> >
>>>>>>> > Step 5: atomic set adev->in_gpu_reset as 0, wake up other 
>>>>>>> threads and release write lock. Recovery thread exit normally.
>>>>>>> >
>>>>>>> > Other threads call the amdgpu_read_lock to synchronize with 
>>>>>>> recovery thread. If it finds that in_gpu_reset is 1, it should 
>>>>>>> release read lock if it has holden one, and then blocks itself 
>>>>>>> to wait for recovery finished event. If thread successfully hold 
>>>>>>> read lock and in_gpu_reset is 0, it continues. It will exit 
>>>>>>> normally or be stopped by recovery thread in step 1.
>>>>>>> >
>>>>>>> > Dennis Li (4):
>>>>>>> >    drm/amdgpu: remove reset lock from low level functions
>>>>>>> >    drm/amdgpu: refine the GPU recovery sequence
>>>>>>> >    drm/amdgpu: instead of using down/up_read directly
>>>>>>> >    drm/amdkfd: add reset lock protection for kfd entry functions
>>>>>>> >
>>>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu.h |   6 +
>>>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  14 +-
>>>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 173 
>>>>>>> +++++++++++++-----
>>>>>>> > .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c |   8 -
>>>>>>> > drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |   4 +-
>>>>>>> > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |   9 +-
>>>>>>> > drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c |   5 +-
>>>>>>> > drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c |   5 +-
>>>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 172 ++++++++++++++++-
>>>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +-
>>>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_process.c |   4 +
>>>>>>> > .../amd/amdkfd/kfd_process_queue_manager.c |  17 ++
>>>>>>> >   12 files changed, 345 insertions(+), 75 deletions(-)
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-08  8:22                     ` Christian König
@ 2021-04-08  8:32                       ` Christian König
  2021-04-08 16:08                         ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-08  8:32 UTC (permalink / raw)
  To: Christian König, Andrey Grodzovsky, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking





On 08.04.21 at 10:22, Christian König wrote:
> Hi Andrey,
>
> Am 07.04.21 um 21:44 schrieb Andrey Grodzovsky:
>>
>>
>> On 2021-04-07 6:28 a.m., Christian König wrote:
>>> Hi Andrey,
>>>
>>> Am 06.04.21 um 23:22 schrieb Andrey Grodzovsky:
>>>>
>>>> Hey Christian, Denis, see bellow -
>>>>
>>>> On 2021-04-06 6:34 a.m., Christian König wrote:
>>>>> Hi Andrey,
>>>>>
>>>>> well good question. My job is to watch over the implementation and 
>>>>> design and while I always help I can adjust anybodies schedule.
>>>>>
>>>>> Is the patch to print a warning when the hardware is accessed 
>>>>> without holding the locks merged yet? If not then that would 
>>>>> probably be a good starting point.
>>>>
>>>>
>>>> It's merged into amd-staging-drm-next and since I work on 
>>>> drm-misc-next I will cherry-pick it into there.
>>>>
>>>
>>> Ok good to know, I haven't tracked that one further.
>>>
>>>>
>>>>>
>>>>> Then we would need to unify this with the SRCU to make sure that 
>>>>> we have both the reset lock as well as block the hotplug code from 
>>>>> reusing the MMIO space.
>>>>
>>>> In my understanding there is a significant difference between 
>>>> handling of GPU reset and unplug - while GPU reset use case 
>>>> requires any HW accessing code to block and wait for the reset to 
>>>> finish and then proceed, hot-unplug
>>>> is permanent and hence no need to wait and proceed but rather abort 
>>>> at once.
>>>>
>>>
>>> Yes, absolutely correct.
>>>
>>>> This why I think that in any place we already check for device 
>>>> reset we should also add a check for hot-unplug but the handling 
>>>> would be different
>>>> in that for hot-unplug we would abort instead of keep waiting.
>>>>
>>>
>>> Yes, that's the rough picture in my head as well.
>>>
>>> Essentially Daniels patch of having an 
>>> amdgpu_device_hwaccess_begin()/_end() was the right approach. You 
>>> just can't do it in the top level IOCTL handler, but rather need it 
>>> somewhere between front end and backend.
>>
>>
>> Can you point me to what patch was it ? Can't find.
>>
>
> What I mean was the approach in patch #3 in this series where he 
> replaced the down_read/up_read with 
> amdgpu_read_lock()/amdgpu_read_unlock().
>
> I would just not call it amdgpu_read_lock()/amdgpu_read_unlock(), but 
> something more descriptive.
>
> Regards,
> Christian.
>
>>
>>>
>>>> Similar to handling device reset for unplug we obviously also need 
>>>> to stop and block any MMIO accesses once device is unplugged and, 
>>>> as Daniel Vetter mentioned - we have to do it before finishing 
>>>> pci_remove (early device fini)
>>>> and not later (when last device reference is dropped from user 
>>>> space) in order to prevent reuse of MMIO space we still access by 
>>>> other hot plugging devices. As in device reset case we need to 
>>>> cancel all delay works, stop drm schedule, complete all unfinished 
>>>> fences(both HW and scheduler fences). While you stated strong 
>>>> objection to force signalling scheduler fences from GPU reset, quote:
>>>>
>>>> "you can't signal the dma_fence waiting. Waiting for a dma_fence 
>>>> also means you wait for the GPU reset to finish. When we would 
>>>> signal the dma_fence during the GPU reset then we would run into 
>>>> memory corruption because the hardware jobs running after the GPU 
>>>> reset would access memory which is already freed."
>>>> To my understating this is a key difference with hot-unplug, the 
>>>> device is gone, all those concerns are irrelevant and hence we can 
>>>> actually force signal scheduler fences (setting and error to them 
>>>> before) to force completion of any
>>>> waiting clients such as possibly IOCTLs or async page flips e.t.c.
>>>>
>>>
>>> Yes, absolutely correct. That's what I also mentioned to Daniel. 
>>> When we are able to nuke the device and any memory access it might 
>>> do we can also signal the fences.
>>>
>>>> Beyond blocking all delayed works and scheduler threads we also 
>>>> need to guarantee no  IOCTL can access MMIO post device unplug OR 
>>>> in flight IOCTLs are done before we finish pci_remove 
>>>> (amdgpu_pci_remove for us).
>>>> For this I suggest we do something like what we worked on with 
>>>> Takashi Iwai the ALSA maintainer recently when he helped 
>>>> implementing PCI BARs move support for snd_hda_intel. Take a look at
>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497 
>>>> and
>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440
>>>> We also had same issue there, how to prevent MMIO accesses while 
>>>> the BARs are migrating. What was done there is a refcount was added 
>>>> to count all IOCTLs in flight, for any in flight  IOCTL the BAR 
>>>> migration handler would
>>>> block for the refcount to drop to 0 before it would proceed, for 
>>>> any later IOCTL it stops and wait if device is in migration state. 
>>>> We even don't need the wait part, nothing to wait for, we just 
>>>> return with -ENODEV for this case.
>>>>
>>>
>>> This is essentially what the DRM SRCU is doing as well.
>>>
>>> For the hotplug case we could do this in the toplevel since we can 
>>> signal the fence and don't need to block memory management.
>>
>>
>> To make SRCU 'wait for' all IOCTLs in flight we would need to wrap 
>> every IOCTL ( practically - just drm_ioctl function) with 
>> drm_dev_enter/drm_dev_exit - can we do it ?
>>

Sorry, I totally missed this question.

Yes, exactly that. As discussed, for the hotplug case we can do this.
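
To make the idea concrete, a rough sketch of such a wrapper (the 
wrapper name is made up; drm_dev_enter()/drm_dev_exit() and drm_ioctl() 
are the existing DRM helpers):

/*
 * Rough sketch, not a patch: guard the whole ioctl path with
 * drm_dev_enter()/drm_dev_exit() so that drm_dev_unplug()'s
 * synchronize_srcu() waits for every ioctl already in flight, while
 * new ioctls bail out with -ENODEV once the device is unplugged.
 */
static long amdgpu_drm_ioctl_guarded(struct file *filp, unsigned int cmd,
				     unsigned long arg)
{
	struct drm_file *file_priv = filp->private_data;
	struct drm_device *dev = file_priv->minor->dev;
	long ret;
	int idx;

	if (!drm_dev_enter(dev, &idx))
		return -ENODEV;		/* device already unplugged */

	ret = drm_ioctl(filp, cmd, arg);

	drm_dev_exit(idx);
	return ret;
}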

>>
>>>
>>> But I'm not sure, maybe we should handle it the same way as reset or 
>>> maybe we should have it at the top level.
>>
>>
>> If by top level you mean checking for device unplugged and bailing 
>> out at the entry to IOCTL or right at start of any work_item/timer 
>> function we have then seems to me it's better and more clear. Once we 
>> flushed all of them in flight there is no reason for them to execute 
>> any more when device is unplugged.
>>

Well, I'm open to both approaches. I just think that having 
drm_dev_enter()/drm_dev_exit() in each work item would be more 
defensive in case we forget to cancel/sync one.
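
For illustration, a sketch of that defensive pattern in a work item 
(the work-item field and handler names are placeholders, not existing 
amdgpu code):

/*
 * Sketch of the defensive pattern: the work handler bails out early
 * when the device is gone, even if the unplug path forgot to cancel
 * or flush this particular work item.
 */
static void amdgpu_example_work_handler(struct work_struct *work)
{
	struct amdgpu_device *adev =
		container_of(work, struct amdgpu_device, example_work.work);
	int idx;

	if (!drm_dev_enter(adev_to_drm(adev), &idx))
		return;		/* device unplugged, nothing left to do */

	/* ... MMIO/register access goes here ... */

	drm_dev_exit(idx);
}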

Christian.

>> Andrey
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>> The above approach should allow us to wait for all the IOCTLs in 
>>>> flight, together with stopping scheduler threads and cancelling and 
>>>> flushing all in flight work items and timers i think It should give 
>>>> as full solution for the hot-unplug case
>>>> of preventing any MMIO accesses post device pci_remove.
>>>>
>>>> Let me know what you think guys.
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>> And then testing, testing, testing to see if we have missed something.
>>>>>
>>>>> Christian.
>>>>>
>>>>> Am 05.04.21 um 19:58 schrieb Andrey Grodzovsky:
>>>>>>
>>>>>> Denis, Christian, are there any updates in the plan on how to 
>>>>>> move on with this ? As you know I need very similar code for my 
>>>>>> up-streaming of device hot-unplug. My latest solution 
>>>>>> (https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html) 
>>>>>> was not acceptable because of low level guards on the register 
>>>>>> accessors level which was hurting performance. Basically I need a 
>>>>>> way to prevent any MMIO write accesses from kernel driver after 
>>>>>> device is removed (UMD accesses are taken care of by page 
>>>>>> faulting dummy page). We are using now hot-unplug code for 
>>>>>> Freemont program and so up-streaming became more of a priority 
>>>>>> then before. This MMIO access issue is currently my main blocker 
>>>>>> from up-streaming. Is there any way I can assist in pushing this on ?
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>> On 2021-03-18 5:51 a.m., Christian König wrote:
>>>>>>> Am 18.03.21 um 10:30 schrieb Li, Dennis:
>>>>>>>>
>>>>>>>> >>> The GPU reset doesn't complete the fences we wait for. It 
>>>>>>>> only completes the hardware fences as part of the reset.
>>>>>>>>
>>>>>>>> >>> So waiting for a fence while holding the reset lock is 
>>>>>>>> illegal and needs to be avoided.
>>>>>>>>
>>>>>>>> I understood your concern. It is more complex for DRM GFX, 
>>>>>>>> therefore I abandon adding lock protection for DRM ioctls now. 
>>>>>>>> Maybe we can try to add all kernel  dma_fence waiting in a 
>>>>>>>> list, and signal all in recovery threads. Do you have same 
>>>>>>>> concern for compute cases?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, compute (KFD) is even harder to handle.
>>>>>>>
>>>>>>> See you can't signal the dma_fence waiting. Waiting for a 
>>>>>>> dma_fence also means you wait for the GPU reset to finish.
>>>>>>>
>>>>>>> When we would signal the dma_fence during the GPU reset then we 
>>>>>>> would run into memory corruption because the hardware jobs 
>>>>>>> running after the GPU reset would access memory which is already 
>>>>>>> freed.
>>>>>>>
>>>>>>>> >>> Lockdep also complains about this when it is used 
>>>>>>>> correctly. The only reason it doesn't complain here is because 
>>>>>>>> you use an atomic+wait_event instead of a locking primitive.
>>>>>>>>
>>>>>>>> Agree. This approach will escape the monitor of lockdep.  Its 
>>>>>>>> goal is to block other threads when GPU recovery thread start. 
>>>>>>>> But I couldn’t find a better method to solve this problem. Do 
>>>>>>>> you have some suggestion?
>>>>>>>>
>>>>>>>
>>>>>>> Well, completely abandon those change here.
>>>>>>>
>>>>>>> What we need to do is to identify where hardware access happens 
>>>>>>> and then insert taking the read side of the GPU reset lock so 
>>>>>>> that we don't wait for a dma_fence or allocate memory, but still 
>>>>>>> protect the hardware from concurrent access and reset.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Dennis Li
>>>>>>>>
>>>>>>>> *From:* Koenig, Christian <Christian.Koenig@amd.com>
>>>>>>>> *Sent:* Thursday, March 18, 2021 4:59 PM
>>>>>>>> *To:* Li, Dennis <Dennis.Li@amd.com>; 
>>>>>>>> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
>>>>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix 
>>>>>>>> <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
>>>>>>>> *Subject:* AW: [PATCH 0/4] Refine GPU recovery sequence to 
>>>>>>>> enhance its stability
>>>>>>>>
>>>>>>>> Exactly that's what you don't seem to understand.
>>>>>>>>
>>>>>>>> The GPU reset doesn't complete the fences we wait for. It only 
>>>>>>>> completes the hardware fences as part of the reset.
>>>>>>>>
>>>>>>>> So waiting for a fence while holding the reset lock is illegal 
>>>>>>>> and needs to be avoided.
>>>>>>>>
>>>>>>>> Lockdep also complains about this when it is used correctly. 
>>>>>>>> The only reason it doesn't complain here is because you use an 
>>>>>>>> atomic+wait_event instead of a locking primitive.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> *Von:*Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>
>>>>>>>> *Gesendet:* Donnerstag, 18. März 2021 09:28
>>>>>>>> *An:* Koenig, Christian <Christian.Koenig@amd.com 
>>>>>>>> <mailto:Christian.Koenig@amd.com>>; 
>>>>>>>> amd-gfx@lists.freedesktop.org 
>>>>>>>> <mailto:amd-gfx@lists.freedesktop.org> 
>>>>>>>> <amd-gfx@lists.freedesktop.org 
>>>>>>>> <mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander 
>>>>>>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>>>>>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>>>>>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>>>>>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>>>>>>> *Betreff:* RE: [PATCH 0/4] Refine GPU recovery sequence to 
>>>>>>>> enhance its stability
>>>>>>>>
>>>>>>>> >>> Those two steps need to be exchanged or otherwise it is 
>>>>>>>> possible that new delayed work items etc are started before the 
>>>>>>>> lock is taken.
>>>>>>>> What about adding check for adev->in_gpu_reset in work item? If 
>>>>>>>> exchange the two steps, it maybe introduce the deadlock.  For 
>>>>>>>> example, the user thread hold the read lock and waiting for the 
>>>>>>>> fence, if recovery thread try to hold write lock and then 
>>>>>>>> complete fences, in this case, recovery thread will always be 
>>>>>>>> blocked.
>>>>>>>>
>>>>>>>>
>>>>>>>> Best Regards
>>>>>>>> Dennis Li
>>>>>>>> -----Original Message-----
>>>>>>>> From: Koenig, Christian <Christian.Koenig@amd.com 
>>>>>>>> <mailto:Christian.Koenig@amd.com>>
>>>>>>>> Sent: Thursday, March 18, 2021 3:54 PM
>>>>>>>> To: Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>; 
>>>>>>>> amd-gfx@lists.freedesktop.org 
>>>>>>>> <mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander 
>>>>>>>> <Alexander.Deucher@amd.com <mailto:Alexander.Deucher@amd.com>>; 
>>>>>>>> Kuehling, Felix <Felix.Kuehling@amd.com 
>>>>>>>> <mailto:Felix.Kuehling@amd.com>>; Zhang, Hawking 
>>>>>>>> <Hawking.Zhang@amd.com <mailto:Hawking.Zhang@amd.com>>
>>>>>>>> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to 
>>>>>>>> enhance its stability
>>>>>>>>
>>>>>>>> Am 18.03.21 um 08:23 schrieb Dennis Li:
>>>>>>>> > We have defined two variables in_gpu_reset and reset_sem in 
>>>>>>>> adev object. The atomic type variable in_gpu_reset is used to 
>>>>>>>> avoid recovery thread reenter and make lower functions return 
>>>>>>>> more earlier when recovery start, but couldn't block recovery 
>>>>>>>> thread when it access hardware. The r/w semaphore reset_sem is 
>>>>>>>> used to solve these synchronization issues between recovery 
>>>>>>>> thread and other threads.
>>>>>>>> >
>>>>>>>> > The original solution locked registers' access in lower 
>>>>>>>> functions, which will introduce following issues:
>>>>>>>> >
>>>>>>>> > 1) many lower functions are used in both recovery thread and 
>>>>>>>> others. Firstly we must harvest these functions, it is easy to 
>>>>>>>> miss someones. Secondly these functions need select which lock 
>>>>>>>> (read lock or write lock) will be used, according to the thread 
>>>>>>>> it is running in. If the thread context isn't considered, the 
>>>>>>>> added lock will easily introduce deadlock. Besides that, in 
>>>>>>>> most time, developer easily forget to add locks for new functions.
>>>>>>>> >
>>>>>>>> > 2) performance drop. More lower functions are more frequently 
>>>>>>>> called.
>>>>>>>> >
>>>>>>>> > 3) easily introduce false positive lockdep complaint, because 
>>>>>>>> write lock has big range in recovery thread, but low level 
>>>>>>>> functions will hold read lock may be protected by other locks 
>>>>>>>> in other threads.
>>>>>>>> >
>>>>>>>> > Therefore the new solution will try to add lock protection 
>>>>>>>> for ioctls of kfd. Its goal is that there are no threads except 
>>>>>>>> for recovery thread or its children (for xgmi) to access 
>>>>>>>> hardware when doing GPU reset and resume. So refine recovery 
>>>>>>>> thread as the following:
>>>>>>>> >
>>>>>>>> > Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>>>>>>>> >     1). if failed, it means system had a recovery thread 
>>>>>>>> running, current thread exit directly;
>>>>>>>> >     2). if success, enter recovery thread;
>>>>>>>> >
>>>>>>>> > Step 1: cancel all delay works, stop drm schedule, complete 
>>>>>>>> all unreceived fences and so on. It try to stop or pause other 
>>>>>>>> threads.
>>>>>>>> >
>>>>>>>> > Step 2: call down_write(&adev->reset_sem) to hold write lock, 
>>>>>>>> which will block recovery thread until other threads release 
>>>>>>>> read locks.
>>>>>>>>
>>>>>>>> Those two steps need to be exchanged or otherwise it is 
>>>>>>>> possible that new delayed work items etc are started before the 
>>>>>>>> lock is taken.
>>>>>>>>
>>>>>>>> Just to make it clear until this is fixed the whole patch set 
>>>>>>>> is a NAK.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> >
>>>>>>>> > Step 3: normally, there is only recovery threads running to 
>>>>>>>> access hardware, it is safe to do gpu reset now.
>>>>>>>> >
>>>>>>>> > Step 4: do post gpu reset, such as call all ips' resume 
>>>>>>>> functions;
>>>>>>>> >
>>>>>>>> > Step 5: atomic set adev->in_gpu_reset as 0, wake up other 
>>>>>>>> threads and release write lock. Recovery thread exit normally.
>>>>>>>> >
>>>>>>>> > Other threads call the amdgpu_read_lock to synchronize with 
>>>>>>>> recovery thread. If it finds that in_gpu_reset is 1, it should 
>>>>>>>> release read lock if it has holden one, and then blocks itself 
>>>>>>>> to wait for recovery finished event. If thread successfully 
>>>>>>>> hold read lock and in_gpu_reset is 0, it continues. It will 
>>>>>>>> exit normally or be stopped by recovery thread in step 1.
>>>>>>>> >
>>>>>>>> > Dennis Li (4):
>>>>>>>> >    drm/amdgpu: remove reset lock from low level functions
>>>>>>>> >    drm/amdgpu: refine the GPU recovery sequence
>>>>>>>> >    drm/amdgpu: instead of using down/up_read directly
>>>>>>>> >    drm/amdkfd: add reset lock protection for kfd entry functions
>>>>>>>> >
>>>>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu.h |   6 +
>>>>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  14 +-
>>>>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 173 
>>>>>>>> +++++++++++++-----
>>>>>>>> > .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c |   8 -
>>>>>>>> > drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |   4 +-
>>>>>>>> > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |   9 +-
>>>>>>>> > drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c |   5 +-
>>>>>>>> > drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c |   5 +-
>>>>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 172 ++++++++++++++++-
>>>>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +-
>>>>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_process.c |   4 +
>>>>>>>> > .../amd/amdkfd/kfd_process_queue_manager.c |  17 ++
>>>>>>>> >   12 files changed, 345 insertions(+), 75 deletions(-)
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> amd-gfx mailing list
>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-08  8:32                       ` Christian König
@ 2021-04-08 16:08                         ` Andrey Grodzovsky
  2021-04-08 18:58                           ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-08 16:08 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking




On 2021-04-08 4:32 a.m., Christian König wrote:
>
>
> Am 08.04.21 um 10:22 schrieb Christian König:
>> Hi Andrey,
>>
>> Am 07.04.21 um 21:44 schrieb Andrey Grodzovsky:
>>>
>>>
>>> On 2021-04-07 6:28 a.m., Christian König wrote:
>>>> Hi Andrey,
>>>>
>>>> Am 06.04.21 um 23:22 schrieb Andrey Grodzovsky:
>>>>>
>>>>> Hey Christian, Denis, see bellow -
>>>>>
>>>>> On 2021-04-06 6:34 a.m., Christian König wrote:
>>>>>> Hi Andrey,
>>>>>>
>>>>>> well good question. My job is to watch over the implementation 
>>>>>> and design and while I always help I can adjust anybodies schedule.
>>>>>>
>>>>>> Is the patch to print a warning when the hardware is accessed 
>>>>>> without holding the locks merged yet? If not then that would 
>>>>>> probably be a good starting point.
>>>>>
>>>>>
>>>>> It's merged into amd-staging-drm-next and since I work on 
>>>>> drm-misc-next I will cherry-pick it into there.
>>>>>
>>>>
>>>> Ok good to know, I haven't tracked that one further.
>>>>
>>>>>
>>>>>>
>>>>>> Then we would need to unify this with the SRCU to make sure that 
>>>>>> we have both the reset lock as well as block the hotplug code 
>>>>>> from reusing the MMIO space.
>>>>>
>>>>> In my understanding there is a significant difference between 
>>>>> handling of GPU reset and unplug - while GPU reset use case 
>>>>> requires any HW accessing code to block and wait for the reset to 
>>>>> finish and then proceed, hot-unplug
>>>>> is permanent and hence no need to wait and proceed but rather 
>>>>> abort at once.
>>>>>
>>>>
>>>> Yes, absolutely correct.
>>>>
>>>>> This why I think that in any place we already check for device 
>>>>> reset we should also add a check for hot-unplug but the handling 
>>>>> would be different
>>>>> in that for hot-unplug we would abort instead of keep waiting.
>>>>>
>>>>
>>>> Yes, that's the rough picture in my head as well.
>>>>
>>>> Essentially Daniels patch of having an 
>>>> amdgpu_device_hwaccess_begin()/_end() was the right approach. You 
>>>> just can't do it in the top level IOCTL handler, but rather need it 
>>>> somewhere between front end and backend.
>>>
>>>
>>> Can you point me to what patch was it ? Can't find.
>>>
>>
>> What I mean was the approach in patch #3 in this series where he 
>> replaced the down_read/up_read with 
>> amdgpu_read_lock()/amdgpu_read_unlock().
>>
>> I would just not call it amdgpu_read_lock()/amdgpu_read_unlock(), but 
>> something more descriptive.
>>
>> Regards,
>> Christian.
>>
>>>
>>>>
>>>>> Similar to handling device reset for unplug we obviously also need 
>>>>> to stop and block any MMIO accesses once device is unplugged and, 
>>>>> as Daniel Vetter mentioned - we have to do it before finishing 
>>>>> pci_remove (early device fini)
>>>>> and not later (when last device reference is dropped from user 
>>>>> space) in order to prevent reuse of MMIO space we still access by 
>>>>> other hot plugging devices. As in device reset case we need to 
>>>>> cancel all delay works, stop drm schedule, complete all unfinished 
>>>>> fences(both HW and scheduler fences). While you stated strong 
>>>>> objection to force signalling scheduler fences from GPU reset, quote:
>>>>>
>>>>> "you can't signal the dma_fence waiting. Waiting for a dma_fence 
>>>>> also means you wait for the GPU reset to finish. When we would 
>>>>> signal the dma_fence during the GPU reset then we would run into 
>>>>> memory corruption because the hardware jobs running after the GPU 
>>>>> reset would access memory which is already freed."
>>>>> To my understating this is a key difference with hot-unplug, the 
>>>>> device is gone, all those concerns are irrelevant and hence we can 
>>>>> actually force signal scheduler fences (setting and error to them 
>>>>> before) to force completion of any
>>>>> waiting clients such as possibly IOCTLs or async page flips e.t.c.
>>>>>
>>>>
>>>> Yes, absolutely correct. That's what I also mentioned to Daniel. 
>>>> When we are able to nuke the device and any memory access it might 
>>>> do we can also signal the fences.
>>>>
>>>>> Beyond blocking all delayed works and scheduler threads we also 
>>>>> need to guarantee no  IOCTL can access MMIO post device unplug OR 
>>>>> in flight IOCTLs are done before we finish pci_remove 
>>>>> (amdgpu_pci_remove for us).
>>>>> For this I suggest we do something like what we worked on with 
>>>>> Takashi Iwai the ALSA maintainer recently when he helped 
>>>>> implementing PCI BARs move support for snd_hda_intel. Take a look at
>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497 
>>>>> and
>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440
>>>>> We also had same issue there, how to prevent MMIO accesses while 
>>>>> the BARs are migrating. What was done there is a refcount was 
>>>>> added to count all IOCTLs in flight, for any in flight  IOCTL the 
>>>>> BAR migration handler would
>>>>> block for the refcount to drop to 0 before it would proceed, for 
>>>>> any later IOCTL it stops and wait if device is in migration state. 
>>>>> We even don't need the wait part, nothing to wait for, we just 
>>>>> return with -ENODEV for this case.
>>>>>
>>>>
>>>> This is essentially what the DRM SRCU is doing as well.
>>>>
>>>> For the hotplug case we could do this in the toplevel since we can 
>>>> signal the fence and don't need to block memory management.
>>>
>>>
>>> To make SRCU 'wait for' all IOCTLs in flight we would need to wrap 
>>> every IOCTL ( practically - just drm_ioctl function) with 
>>> drm_dev_enter/drm_dev_exit - can we do it ?
>>>
>
> Sorry totally missed this question.
>
> Yes, exactly that. As discussed for the hotplug case we can do this.


Thinking more about it: if we treat synchronize_srcu as a 'wait for 
completion' of every in-flight {drm_dev_enter, drm_dev_exit} scope, 
some of those scopes might call dma_fence_wait inside. Since we haven't 
force-signaled the fences yet, we would end up in a deadlock. We have 
to signal all the various fences before doing the 'wait for'. But we 
also can't signal the fences before setting 'dev->unplugged = true', 
because further CS submissions and other work could still create new 
fences that we were supposed to force-signal and would now miss. 
Effectively, setting 'dev->unplugged = true' and doing synchronize_srcu 
in one call, as drm_dev_unplug does, without signalling all the fences 
in the device between these two steps looks like a possible deadlock to 
me - what do you think?
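
To spell out the ordering being argued for, a pseudo-code sketch (all 
helper names below are placeholders; splitting drm_dev_unplug() into 
separate 'mark' and 'wait' steps does not exist today and is only an 
assumption for illustration):

/*
 * Pseudo-code illustration of the ordering problem. drm_dev_unplug()
 * today does the 'mark' and the 'wait' back to back:
 *
 *   dev->unplugged = true;   // new drm_dev_enter() calls now fail
 *   synchronize_srcu(...);   // waits for in-flight drm_dev_enter scopes
 *
 * If one of those in-flight scopes is blocked in dma_fence_wait(),
 * synchronize_srcu() never returns, because nothing will signal that
 * fence any more. Hence the split below; all helpers are placeholders.
 */
static void amdgpu_pci_remove_sketch(struct drm_device *dev)
{
	/* 1) Mark the device unplugged so no new fences get created. */
	example_mark_device_unplugged(dev);	/* placeholder */

	/* 2) Force-signal (with an error) every fence still outstanding. */
	example_force_signal_all_fences(dev);	/* placeholder */

	/* 3) Only now wait for the in-flight drm_dev_enter() scopes. */
	example_wait_for_srcu_readers(dev);	/* placeholder */
}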

Andrey

>
>>>
>>>>
>>>> But I'm not sure, maybe we should handle it the same way as reset 
>>>> or maybe we should have it at the top level.
>>>
>>>
>>> If by top level you mean checking for device unplugged and bailing 
>>> out at the entry to IOCTL or right at start of any work_item/timer 
>>> function we have then seems to me it's better and more clear. Once 
>>> we flushed all of them in flight there is no reason for them to 
>>> execute any more when device is unplugged.
>>>
>
> Well I'm open to both approaches. I just think having 
> drm_dev_enter/exit on each work item would be more defensive in case 
> we forgot to cancel/sync one.
>
> Christian.
>
>>> Andrey
>>>
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> The above approach should allow us to wait for all the IOCTLs in 
>>>>> flight, together with stopping scheduler threads and cancelling 
>>>>> and flushing all in flight work items and timers i think It should 
>>>>> give as full solution for the hot-unplug case
>>>>> of preventing any MMIO accesses post device pci_remove.
>>>>>
>>>>> Let me know what you think guys.
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>> And then testing, testing, testing to see if we have missed 
>>>>>> something.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> Am 05.04.21 um 19:58 schrieb Andrey Grodzovsky:
>>>>>>>
>>>>>>> Denis, Christian, are there any updates in the plan on how to 
>>>>>>> move on with this ? As you know I need very similar code for my 
>>>>>>> up-streaming of device hot-unplug. My latest solution 
>>>>>>> (https://lists.freedesktop.org/archives/amd-gfx/2021-January/058606.html) 
>>>>>>> was not acceptable because of low level guards on the register 
>>>>>>> accessors level which was hurting performance. Basically I need 
>>>>>>> a way to prevent any MMIO write accesses from kernel driver 
>>>>>>> after device is removed (UMD accesses are taken care of by page 
>>>>>>> faulting dummy page). We are using now hot-unplug code for 
>>>>>>> Freemont program and so up-streaming became more of a priority 
>>>>>>> then before. This MMIO access issue is currently my main blocker 
>>>>>>> from up-streaming. Is there any way I can assist in pushing this 
>>>>>>> on ?
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>> On 2021-03-18 5:51 a.m., Christian König wrote:
>>>>>>>> Am 18.03.21 um 10:30 schrieb Li, Dennis:
>>>>>>>>>
>>>>>>>>> >>> The GPU reset doesn't complete the fences we wait for. It 
>>>>>>>>> only completes the hardware fences as part of the reset.
>>>>>>>>>
>>>>>>>>> >>> So waiting for a fence while holding the reset lock is 
>>>>>>>>> illegal and needs to be avoided.
>>>>>>>>>
>>>>>>>>> I understood your concern. It is more complex for DRM GFX, 
>>>>>>>>> therefore I abandon adding lock protection for DRM ioctls now. 
>>>>>>>>> Maybe we can try to add all kernel  dma_fence waiting in a 
>>>>>>>>> list, and signal all in recovery threads. Do you have same 
>>>>>>>>> concern for compute cases?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, compute (KFD) is even harder to handle.
>>>>>>>>
>>>>>>>> See you can't signal the dma_fence waiting. Waiting for a 
>>>>>>>> dma_fence also means you wait for the GPU reset to finish.
>>>>>>>>
>>>>>>>> When we would signal the dma_fence during the GPU reset then we 
>>>>>>>> would run into memory corruption because the hardware jobs 
>>>>>>>> running after the GPU reset would access memory which is 
>>>>>>>> already freed.
>>>>>>>>
>>>>>>>>> >>> Lockdep also complains about this when it is used 
>>>>>>>>> correctly. The only reason it doesn't complain here is because 
>>>>>>>>> you use an atomic+wait_event instead of a locking primitive.
>>>>>>>>>
>>>>>>>>> Agree. This approach will escape the monitor of lockdep.  Its 
>>>>>>>>> goal is to block other threads when GPU recovery thread start. 
>>>>>>>>> But I couldn’t find a better method to solve this problem. Do 
>>>>>>>>> you have some suggestion?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Well, completely abandon those change here.
>>>>>>>>
>>>>>>>> What we need to do is to identify where hardware access happens 
>>>>>>>> and then insert taking the read side of the GPU reset lock so 
>>>>>>>> that we don't wait for a dma_fence or allocate memory, but 
>>>>>>>> still protect the hardware from concurrent access and reset.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>>
>>>>>>>>> Dennis Li
>>>>>>>>>
>>>>>>>>> *From:* Koenig, Christian <Christian.Koenig@amd.com>
>>>>>>>>> *Sent:* Thursday, March 18, 2021 4:59 PM
>>>>>>>>> *To:* Li, Dennis <Dennis.Li@amd.com>; 
>>>>>>>>> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
>>>>>>>>> <Alexander.Deucher@amd.com>; Kuehling, Felix 
>>>>>>>>> <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>
>>>>>>>>> *Subject:* AW: [PATCH 0/4] Refine GPU recovery sequence to 
>>>>>>>>> enhance its stability
>>>>>>>>>
>>>>>>>>> Exactly that's what you don't seem to understand.
>>>>>>>>>
>>>>>>>>> The GPU reset doesn't complete the fences we wait for. It only 
>>>>>>>>> completes the hardware fences as part of the reset.
>>>>>>>>>
>>>>>>>>> So waiting for a fence while holding the reset lock is illegal 
>>>>>>>>> and needs to be avoided.
>>>>>>>>>
>>>>>>>>> Lockdep also complains about this when it is used correctly. 
>>>>>>>>> The only reason it doesn't complain here is because you use an 
>>>>>>>>> atomic+wait_event instead of a locking primitive.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> *Von:*Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>
>>>>>>>>> *Gesendet:* Donnerstag, 18. März 2021 09:28
>>>>>>>>> *An:* Koenig, Christian <Christian.Koenig@amd.com 
>>>>>>>>> <mailto:Christian.Koenig@amd.com>>; 
>>>>>>>>> amd-gfx@lists.freedesktop.org 
>>>>>>>>> <mailto:amd-gfx@lists.freedesktop.org> 
>>>>>>>>> <amd-gfx@lists.freedesktop.org 
>>>>>>>>> <mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander 
>>>>>>>>> <Alexander.Deucher@amd.com 
>>>>>>>>> <mailto:Alexander.Deucher@amd.com>>; Kuehling, Felix 
>>>>>>>>> <Felix.Kuehling@amd.com <mailto:Felix.Kuehling@amd.com>>; 
>>>>>>>>> Zhang, Hawking <Hawking.Zhang@amd.com 
>>>>>>>>> <mailto:Hawking.Zhang@amd.com>>
>>>>>>>>> *Betreff:* RE: [PATCH 0/4] Refine GPU recovery sequence to 
>>>>>>>>> enhance its stability
>>>>>>>>>
>>>>>>>>> >>> Those two steps need to be exchanged or otherwise it is 
>>>>>>>>> possible that new delayed work items etc are started before 
>>>>>>>>> the lock is taken.
>>>>>>>>> What about adding check for adev->in_gpu_reset in work item? 
>>>>>>>>> If exchange the two steps, it maybe introduce the deadlock.  
>>>>>>>>> For example, the user thread hold the read lock and waiting 
>>>>>>>>> for the fence, if recovery thread try to hold write lock and 
>>>>>>>>> then complete fences, in this case, recovery thread will 
>>>>>>>>> always be blocked.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>> Dennis Li
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Koenig, Christian <Christian.Koenig@amd.com 
>>>>>>>>> <mailto:Christian.Koenig@amd.com>>
>>>>>>>>> Sent: Thursday, March 18, 2021 3:54 PM
>>>>>>>>> To: Li, Dennis <Dennis.Li@amd.com <mailto:Dennis.Li@amd.com>>; 
>>>>>>>>> amd-gfx@lists.freedesktop.org 
>>>>>>>>> <mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander 
>>>>>>>>> <Alexander.Deucher@amd.com 
>>>>>>>>> <mailto:Alexander.Deucher@amd.com>>; Kuehling, Felix 
>>>>>>>>> <Felix.Kuehling@amd.com <mailto:Felix.Kuehling@amd.com>>; 
>>>>>>>>> Zhang, Hawking <Hawking.Zhang@amd.com 
>>>>>>>>> <mailto:Hawking.Zhang@amd.com>>
>>>>>>>>> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to 
>>>>>>>>> enhance its stability
>>>>>>>>>
>>>>>>>>> Am 18.03.21 um 08:23 schrieb Dennis Li:
>>>>>>>>> > We have defined two variables in_gpu_reset and reset_sem in 
>>>>>>>>> adev object. The atomic type variable in_gpu_reset is used to 
>>>>>>>>> avoid recovery thread reenter and make lower functions return 
>>>>>>>>> more earlier when recovery start, but couldn't block recovery 
>>>>>>>>> thread when it access hardware. The r/w semaphore reset_sem is 
>>>>>>>>> used to solve these synchronization issues between recovery 
>>>>>>>>> thread and other threads.
>>>>>>>>> >
>>>>>>>>> > The original solution locked registers' access in lower 
>>>>>>>>> functions, which will introduce following issues:
>>>>>>>>> >
>>>>>>>>> > 1) many lower functions are used in both recovery thread and 
>>>>>>>>> others. Firstly we must harvest these functions, it is easy to 
>>>>>>>>> miss someones. Secondly these functions need select which lock 
>>>>>>>>> (read lock or write lock) will be used, according to the 
>>>>>>>>> thread it is running in. If the thread context isn't 
>>>>>>>>> considered, the added lock will easily introduce deadlock. 
>>>>>>>>> Besides that, in most time, developer easily forget to add 
>>>>>>>>> locks for new functions.
>>>>>>>>> >
>>>>>>>>> > 2) performance drop. More lower functions are more 
>>>>>>>>> frequently called.
>>>>>>>>> >
>>>>>>>>> > 3) easily introduce false positive lockdep complaint, 
>>>>>>>>> because write lock has big range in recovery thread, but low 
>>>>>>>>> level functions will hold read lock may be protected by other 
>>>>>>>>> locks in other threads.
>>>>>>>>> >
>>>>>>>>> > Therefore the new solution will try to add lock protection 
>>>>>>>>> for ioctls of kfd. Its goal is that there are no threads 
>>>>>>>>> except for recovery thread or its children (for xgmi) to 
>>>>>>>>> access hardware when doing GPU reset and resume. So refine 
>>>>>>>>> recovery thread as the following:
>>>>>>>>> >
>>>>>>>>> > Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1)
>>>>>>>>> >     1). if failed, it means system had a recovery thread 
>>>>>>>>> running, current thread exit directly;
>>>>>>>>> >     2). if success, enter recovery thread;
>>>>>>>>> >
>>>>>>>>> > Step 1: cancel all delay works, stop drm schedule, complete 
>>>>>>>>> all unreceived fences and so on. It try to stop or pause other 
>>>>>>>>> threads.
>>>>>>>>> >
>>>>>>>>> > Step 2: call down_write(&adev->reset_sem) to hold write 
>>>>>>>>> lock, which will block recovery thread until other threads 
>>>>>>>>> release read locks.
>>>>>>>>>
>>>>>>>>> Those two steps need to be exchanged or otherwise it is 
>>>>>>>>> possible that new delayed work items etc are started before 
>>>>>>>>> the lock is taken.
>>>>>>>>>
>>>>>>>>> Just to make it clear until this is fixed the whole patch set 
>>>>>>>>> is a NAK.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> >
>>>>>>>>> > Step 3: normally, there is only recovery threads running to 
>>>>>>>>> access hardware, it is safe to do gpu reset now.
>>>>>>>>> >
>>>>>>>>> > Step 4: do post gpu reset, such as call all ips' resume 
>>>>>>>>> functions;
>>>>>>>>> >
>>>>>>>>> > Step 5: atomic set adev->in_gpu_reset as 0, wake up other 
>>>>>>>>> threads and release write lock. Recovery thread exit normally.
>>>>>>>>> >
>>>>>>>>> > Other threads call the amdgpu_read_lock to synchronize with 
>>>>>>>>> recovery thread. If it finds that in_gpu_reset is 1, it should 
>>>>>>>>> release read lock if it has holden one, and then blocks itself 
>>>>>>>>> to wait for recovery finished event. If thread successfully 
>>>>>>>>> hold read lock and in_gpu_reset is 0, it continues. It will 
>>>>>>>>> exit normally or be stopped by recovery thread in step 1.
>>>>>>>>> >
>>>>>>>>> > Dennis Li (4):
>>>>>>>>> >    drm/amdgpu: remove reset lock from low level functions
>>>>>>>>> >    drm/amdgpu: refine the GPU recovery sequence
>>>>>>>>> >    drm/amdgpu: instead of using down/up_read directly
>>>>>>>>> >    drm/amdkfd: add reset lock protection for kfd entry functions
>>>>>>>>> >
>>>>>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu.h |   6 +
>>>>>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  14 +-
>>>>>>>>> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 173 
>>>>>>>>> +++++++++++++-----
>>>>>>>>> > .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c |   8 -
>>>>>>>>> > drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c |   4 +-
>>>>>>>>> > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |   9 +-
>>>>>>>>> > drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c |   5 +-
>>>>>>>>> > drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c |   5 +-
>>>>>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 172 ++++++++++++++++-
>>>>>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   3 +-
>>>>>>>>> > drivers/gpu/drm/amd/amdkfd/kfd_process.c |   4 +
>>>>>>>>> > .../amd/amdkfd/kfd_process_queue_manager.c |  17 ++
>>>>>>>>> >   12 files changed, 345 insertions(+), 75 deletions(-)
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> amd-gfx mailing list
>>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>
>


[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-08 16:08                         ` Andrey Grodzovsky
@ 2021-04-08 18:58                           ` Christian König
  2021-04-08 20:39                             ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-08 18:58 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking

On 08.04.21 at 18:08, Andrey Grodzovsky wrote:
> On 2021-04-08 4:32 a.m., Christian König wrote:
>> Am 08.04.21 um 10:22 schrieb Christian König:
>>> [SNIP]
>>>>>
>>>>>
>>>>>> Beyond blocking all delayed works and scheduler threads we also 
>>>>>> need to guarantee no  IOCTL can access MMIO post device unplug OR 
>>>>>> in flight IOCTLs are done before we finish pci_remove 
>>>>>> (amdgpu_pci_remove for us).
>>>>>> For this I suggest we do something like what we worked on with 
>>>>>> Takashi Iwai the ALSA maintainer recently when he helped 
>>>>>> implementing PCI BARs move support for snd_hda_intel. Take a look at
>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497 
>>>>>> and
>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440
>>>>>> We also had same issue there, how to prevent MMIO accesses while 
>>>>>> the BARs are migrating. What was done there is a refcount was 
>>>>>> added to count all IOCTLs in flight, for any in flight  IOCTL the 
>>>>>> BAR migration handler would
>>>>>> block for the refcount to drop to 0 before it would proceed, for 
>>>>>> any later IOCTL it stops and wait if device is in migration 
>>>>>> state. We even don't need the wait part, nothing to wait for, we 
>>>>>> just return with -ENODEV for this case.
>>>>>>
>>>>>
>>>>> This is essentially what the DRM SRCU is doing as well.
>>>>>
>>>>> For the hotplug case we could do this in the toplevel since we can 
>>>>> signal the fence and don't need to block memory management.
>>>>
>>>>
>>>> To make SRCU 'wait for' all IOCTLs in flight we would need to wrap 
>>>> every IOCTL ( practically - just drm_ioctl function) with 
>>>> drm_dev_enter/drm_dev_exit - can we do it ?
>>>>
>>
>> Sorry totally missed this question.
>>
>> Yes, exactly that. As discussed for the hotplug case we can do this.
>
>
> Thinking more about it - assuming we are  treating synchronize_srcu as 
> a 'wait for completion' of any in flight {drm_dev_enter, drm_dev_exit} 
> scope, some of those scopes might do dma_fence_wait inside. Since we 
> haven't force signaled the fences yet we will end up a deadlock. We 
> have to signal all the various fences before doing the 'wait for'. But 
> we can't signal the fences before setting 'dev->unplugged = true' to 
> reject further CS and other stuff which might create more fences we 
> were supposed-to force signal and now missed them. Effectively setting 
> 'dev->unplugged = true' and doing synchronize_srcu in one call like 
> drm_dev_unplug does without signalling all the fences in the device in 
> between these two steps looks luck a possible deadlock to me - what do 
> you think ?
>

Indeed, that is a really good argument to handle it the same way as the 
reset lock.

E.g. not taking it at the high-level IOCTL, but rather when the frontend 
of the driver has acquired all the necessary locks (BO resv, VM lock, 
etc.) before calling into the backend to actually do things with the 
hardware.
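
As a rough illustration of that pattern (just a sketch, not the actual 
patch; the helper name and the register access are placeholders, only 
reset_sem is from the series):

    /* Hypothetical backend helper; the frontend has already taken the BO
     * reservations, VM lock etc. before calling in here.
     */
    static int amdgpu_backend_touch_hw(struct amdgpu_device *adev)
    {
        int r;

        /* Blocks here while the recovery thread holds the write side. */
        down_read(&adev->reset_sem);

        r = do_the_register_access(adev); /* placeholder for the real HW work */

        up_read(&adev->reset_sem);
        return r;
    }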

Christian.

> Andrey
>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-08 18:58                           ` Christian König
@ 2021-04-08 20:39                             ` Andrey Grodzovsky
  2021-04-09  6:53                               ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-08 20:39 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


On 2021-04-08 2:58 p.m., Christian König wrote:
> Am 08.04.21 um 18:08 schrieb Andrey Grodzovsky:
>> On 2021-04-08 4:32 a.m., Christian König wrote:
>>> Am 08.04.21 um 10:22 schrieb Christian König:
>>>> [SNIP]
>>>>>>
>>>>>>
>>>>>>> Beyond blocking all delayed works and scheduler threads we also 
>>>>>>> need to guarantee no  IOCTL can access MMIO post device unplug 
>>>>>>> OR in flight IOCTLs are done before we finish pci_remove 
>>>>>>> (amdgpu_pci_remove for us).
>>>>>>> For this I suggest we do something like what we worked on with 
>>>>>>> Takashi Iwai the ALSA maintainer recently when he helped 
>>>>>>> implementing PCI BARs move support for snd_hda_intel. Take a 
>>>>>>> look at
>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497 
>>>>>>> and
>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440 
>>>>>>>
>>>>>>> We also had same issue there, how to prevent MMIO accesses while 
>>>>>>> the BARs are migrating. What was done there is a refcount was 
>>>>>>> added to count all IOCTLs in flight, for any in flight  IOCTL 
>>>>>>> the BAR migration handler would
>>>>>>> block for the refcount to drop to 0 before it would proceed, for 
>>>>>>> any later IOCTL it stops and wait if device is in migration 
>>>>>>> state. We even don't need the wait part, nothing to wait for, we 
>>>>>>> just return with -ENODEV for this case.
>>>>>>>
>>>>>>
>>>>>> This is essentially what the DRM SRCU is doing as well.
>>>>>>
>>>>>> For the hotplug case we could do this in the toplevel since we 
>>>>>> can signal the fence and don't need to block memory management.
>>>>>
>>>>>
>>>>> To make SRCU 'wait for' all IOCTLs in flight we would need to wrap 
>>>>> every IOCTL ( practically - just drm_ioctl function) with 
>>>>> drm_dev_enter/drm_dev_exit - can we do it ?
>>>>>
>>>
>>> Sorry totally missed this question.
>>>
>>> Yes, exactly that. As discussed for the hotplug case we can do this.
>>
>>
>> Thinking more about it - assuming we are  treating synchronize_srcu 
>> as a 'wait for completion' of any in flight {drm_dev_enter, 
>> drm_dev_exit} scope, some of those scopes might do dma_fence_wait 
>> inside. Since we haven't force signaled the fences yet we will end up 
>> a deadlock. We have to signal all the various fences before doing the 
>> 'wait for'. But we can't signal the fences before setting 
>> 'dev->unplugged = true' to reject further CS and other stuff which 
>> might create more fences we were supposed-to force signal and now 
>> missed them. Effectively setting 'dev->unplugged = true' and doing 
>> synchronize_srcu in one call like drm_dev_unplug does without 
>> signalling all the fences in the device in between these two steps 
>> looks luck a possible deadlock to me - what do you think ?
>>
>
> Indeed, that is a really good argument to handle it the same way as 
> the reset lock.
>
> E.g. not taking it at the high level IOCTL, but rather when the 
> frontend of the driver has acquired all the necessary locks (BO resv, 
> VM lock etc...) before calling into the backend to actually do things 
> with the hardware.
>
> Christian.

From what you said I understand that you want to solve this problem by 
using drm_dev_enter/exit brackets low enough in the code such that they 
will not include any fence wait.

But inserting drm_dev_enter/exit at the highest level, in drm_ioctl, is 
much less effort and leaves less room for error than going through each 
IOCTL and trying to identify at what point (possibly multiple points) it 
is about to access HW; some of this is hidden deep in HAL layers such as 
the DC layer in the display driver or the multiple layers of 
powerplay/SMU libraries. Also, we can't limit ourselves to the back-end 
only, if by this you mean ASIC-specific functions which access 
registers. We also need to take care of any MMIO kernel BO (VRAM BOs) 
where we may access MMIO space directly by pointer from the front end of 
the driver (HW agnostic) and the TTM/DRM layers.

Our problem here is how to signal all the existing fences on one hand, 
and on the other prevent any new dma_fence waits after we have finished 
signaling the existing fences. Once we solve this, there is no problem 
using drm_dev_unplug in conjunction with drm_dev_enter/exit at the 
highest level of drm_ioctl to flush any IOCTLs in flight and block any 
new ones.

IMHO when we speak about signalling all fences we don't mean ALL the 
currently existing dma_fence structs (they are spread all over the 
place) but rather all the HW fences, because the HW is what's gone and 
we can't expect those fences to ever be signaled. All the rest, such as 
scheduler fences, user fences, drm_gem reservation objects etc., are 
either dependent on those HW fences, and hence signaling the HW fences 
will in turn signal them, or are not impacted by the HW being gone and 
hence can still be waited on and will complete. If this assumption is 
correct then I think that we should use some flag to prevent any new 
submission to HW which creates HW fences (somewhere around 
amdgpu_fence_emit), then traverse all existing HW fences (currently they 
are spread in a few places, so maybe we need to track them in a list) 
and signal them. After that it's safe to call drm_dev_unplug and be sure 
synchronize_srcu won't stall because of dma_fence_wait. After that we 
can proceed to canceling work items, stopping schedulers etc.

Andrey


>
>> Andrey
>>
>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-08 20:39                             ` Andrey Grodzovsky
@ 2021-04-09  6:53                               ` Christian König
  2021-04-09  7:01                                 ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-09  6:53 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking

On 08.04.21 at 22:39, Andrey Grodzovsky wrote:
>
> On 2021-04-08 2:58 p.m., Christian König wrote:
>> Am 08.04.21 um 18:08 schrieb Andrey Grodzovsky:
>>> On 2021-04-08 4:32 a.m., Christian König wrote:
>>>> Am 08.04.21 um 10:22 schrieb Christian König:
>>>>> [SNIP]
>>>>>>>
>>>>>>>
>>>>>>>> Beyond blocking all delayed works and scheduler threads we also 
>>>>>>>> need to guarantee no  IOCTL can access MMIO post device unplug 
>>>>>>>> OR in flight IOCTLs are done before we finish pci_remove 
>>>>>>>> (amdgpu_pci_remove for us).
>>>>>>>> For this I suggest we do something like what we worked on with 
>>>>>>>> Takashi Iwai the ALSA maintainer recently when he helped 
>>>>>>>> implementing PCI BARs move support for snd_hda_intel. Take a 
>>>>>>>> look at
>>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497 
>>>>>>>> and
>>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440 
>>>>>>>>
>>>>>>>> We also had same issue there, how to prevent MMIO accesses 
>>>>>>>> while the BARs are migrating. What was done there is a refcount 
>>>>>>>> was added to count all IOCTLs in flight, for any in flight  
>>>>>>>> IOCTL the BAR migration handler would
>>>>>>>> block for the refcount to drop to 0 before it would proceed, 
>>>>>>>> for any later IOCTL it stops and wait if device is in migration 
>>>>>>>> state. We even don't need the wait part, nothing to wait for, 
>>>>>>>> we just return with -ENODEV for this case.
>>>>>>>>
>>>>>>>
>>>>>>> This is essentially what the DRM SRCU is doing as well.
>>>>>>>
>>>>>>> For the hotplug case we could do this in the toplevel since we 
>>>>>>> can signal the fence and don't need to block memory management.
>>>>>>
>>>>>>
>>>>>> To make SRCU 'wait for' all IOCTLs in flight we would need to 
>>>>>> wrap every IOCTL ( practically - just drm_ioctl function) with 
>>>>>> drm_dev_enter/drm_dev_exit - can we do it ?
>>>>>>
>>>>
>>>> Sorry totally missed this question.
>>>>
>>>> Yes, exactly that. As discussed for the hotplug case we can do this.
>>>
>>>
>>> Thinking more about it - assuming we are  treating synchronize_srcu 
>>> as a 'wait for completion' of any in flight {drm_dev_enter, 
>>> drm_dev_exit} scope, some of those scopes might do dma_fence_wait 
>>> inside. Since we haven't force signaled the fences yet we will end 
>>> up a deadlock. We have to signal all the various fences before doing 
>>> the 'wait for'. But we can't signal the fences before setting 
>>> 'dev->unplugged = true' to reject further CS and other stuff which 
>>> might create more fences we were supposed-to force signal and now 
>>> missed them. Effectively setting 'dev->unplugged = true' and doing 
>>> synchronize_srcu in one call like drm_dev_unplug does without 
>>> signalling all the fences in the device in between these two steps 
>>> looks luck a possible deadlock to me - what do you think ?
>>>
>>
>> Indeed, that is a really good argument to handle it the same way as 
>> the reset lock.
>>
>> E.g. not taking it at the high level IOCTL, but rather when the 
>> frontend of the driver has acquired all the necessary locks (BO resv, 
>> VM lock etc...) before calling into the backend to actually do things 
>> with the hardware.
>>
>> Christian.
>
> From what you said I understand that you want to solve this problem by 
> using drm_dev_enter/exit brackets low enough in the code such that it 
> will not include and fence wait.
>
> But inserting dmr_dev_enter/exit on the highest level in drm_ioctl is 
> much less effort and less room for error then going through each IOCTL 
> and trying to identify at what point (possibly multiple points) they 
> are about to access HW, some of this is hidden deep in HAL layers such 
> as DC layer in display driver or the multi layers of powerplay/SMU 
> libraries. Also, we can't only limit our-self to back-end if by this 
> you mean ASIC specific functions which access registers. We also need 
> to take care of any MMIO kernel BO (VRAM BOs) where we may access 
> directly MMIO space by pointer from the front end of the driver (HW 
> agnostic) and TTM/DRM layers.

Exactly, yes. The key point is that we need to identify such places 
anyway for GPU reset to work properly. So we could just piggyback 
hotplug on top of that work and be done.

>
> Our problem here is how to signal all the existing  fences on one hand 
> and on the other prevent any new dma_fence waits after we finished 
> signaling existing fences. Once we solved this then there is no 
> problem using drm_dev_unplug in conjunction with drm_dev_enter/exit at 
> the highest level of drm_ioctl to flush any IOCTLs in flight and block 
> any new ones.
>
> IMHO when we speak about signalling all fences we don't mean ALL the 
> currently existing dma_fence structs (they are spread all over the 
> place) but rather signal all the HW fences because HW is what's gone 
> and we can't expect for those fences to be ever signaled. All the rest 
> such as: scheduler fences,  user fences, drm_gem reservation objects 
> e.t.c. are either dependent on those HW fences and hence signaling the 
> HW fences will in turn signal them or, are not impacted by the HW 
> being gone and hence can still be waited on and will complete. If this 
> assumption is correct then I think that we should use some flag to 
> prevent any new submission to HW which creates HW fences (somewhere 
> around amdgpu_fence_emit), then traverse all existing HW fences 
> (currently they are spread in a few places so maybe we need to track 
> them in a list) and signal them. After that it's safe to cal 
> drm_dev_unplug and be sure synchronize_srcu won't stall because of of 
> dma_fence_wait. After that we can proceed to canceling work items, 
> stopping schedulers e.t.c.

That is problematic as well, since you need to make sure that the 
scheduler is not creating a new hardware fence at the moment you try to 
signal all of them. It would require another SRCU or lock for this.

Christian.

>
> Andrey
>
>
>>
>>> Andrey
>>>
>>>
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-09  6:53                               ` Christian König
@ 2021-04-09  7:01                                 ` Christian König
  2021-04-09 15:42                                   ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-09  7:01 UTC (permalink / raw)
  To: Christian König, Andrey Grodzovsky, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking

On 09.04.21 at 08:53, Christian König wrote:
> Am 08.04.21 um 22:39 schrieb Andrey Grodzovsky:
>>
>> On 2021-04-08 2:58 p.m., Christian König wrote:
>>> Am 08.04.21 um 18:08 schrieb Andrey Grodzovsky:
>>>> On 2021-04-08 4:32 a.m., Christian König wrote:
>>>>> Am 08.04.21 um 10:22 schrieb Christian König:
>>>>>> [SNIP]
>>>>>>>>
>>>>>>>>
>>>>>>>>> Beyond blocking all delayed works and scheduler threads we 
>>>>>>>>> also need to guarantee no  IOCTL can access MMIO post device 
>>>>>>>>> unplug OR in flight IOCTLs are done before we finish 
>>>>>>>>> pci_remove (amdgpu_pci_remove for us).
>>>>>>>>> For this I suggest we do something like what we worked on with 
>>>>>>>>> Takashi Iwai the ALSA maintainer recently when he helped 
>>>>>>>>> implementing PCI BARs move support for snd_hda_intel. Take a 
>>>>>>>>> look at
>>>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497 
>>>>>>>>> and
>>>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440 
>>>>>>>>>
>>>>>>>>> We also had same issue there, how to prevent MMIO accesses 
>>>>>>>>> while the BARs are migrating. What was done there is a 
>>>>>>>>> refcount was added to count all IOCTLs in flight, for any in 
>>>>>>>>> flight  IOCTL the BAR migration handler would
>>>>>>>>> block for the refcount to drop to 0 before it would proceed, 
>>>>>>>>> for any later IOCTL it stops and wait if device is in 
>>>>>>>>> migration state. We even don't need the wait part, nothing to 
>>>>>>>>> wait for, we just return with -ENODEV for this case.
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is essentially what the DRM SRCU is doing as well.
>>>>>>>>
>>>>>>>> For the hotplug case we could do this in the toplevel since we 
>>>>>>>> can signal the fence and don't need to block memory management.
>>>>>>>
>>>>>>>
>>>>>>> To make SRCU 'wait for' all IOCTLs in flight we would need to 
>>>>>>> wrap every IOCTL ( practically - just drm_ioctl function) with 
>>>>>>> drm_dev_enter/drm_dev_exit - can we do it ?
>>>>>>>
>>>>>
>>>>> Sorry totally missed this question.
>>>>>
>>>>> Yes, exactly that. As discussed for the hotplug case we can do this.
>>>>
>>>>
>>>> Thinking more about it - assuming we are  treating synchronize_srcu 
>>>> as a 'wait for completion' of any in flight {drm_dev_enter, 
>>>> drm_dev_exit} scope, some of those scopes might do dma_fence_wait 
>>>> inside. Since we haven't force signaled the fences yet we will end 
>>>> up a deadlock. We have to signal all the various fences before 
>>>> doing the 'wait for'. But we can't signal the fences before setting 
>>>> 'dev->unplugged = true' to reject further CS and other stuff which 
>>>> might create more fences we were supposed-to force signal and now 
>>>> missed them. Effectively setting 'dev->unplugged = true' and doing 
>>>> synchronize_srcu in one call like drm_dev_unplug does without 
>>>> signalling all the fences in the device in between these two steps 
>>>> looks luck a possible deadlock to me - what do you think ?
>>>>
>>>
>>> Indeed, that is a really good argument to handle it the same way as 
>>> the reset lock.
>>>
>>> E.g. not taking it at the high level IOCTL, but rather when the 
>>> frontend of the driver has acquired all the necessary locks (BO 
>>> resv, VM lock etc...) before calling into the backend to actually do 
>>> things with the hardware.
>>>
>>> Christian.
>>
>> From what you said I understand that you want to solve this problem 
>> by using drm_dev_enter/exit brackets low enough in the code such that 
>> it will not include and fence wait.
>>
>> But inserting dmr_dev_enter/exit on the highest level in drm_ioctl is 
>> much less effort and less room for error then going through each 
>> IOCTL and trying to identify at what point (possibly multiple points) 
>> they are about to access HW, some of this is hidden deep in HAL 
>> layers such as DC layer in display driver or the multi layers of 
>> powerplay/SMU libraries. Also, we can't only limit our-self to 
>> back-end if by this you mean ASIC specific functions which access 
>> registers. We also need to take care of any MMIO kernel BO (VRAM BOs) 
>> where we may access directly MMIO space by pointer from the front end 
>> of the driver (HW agnostic) and TTM/DRM layers.
>
> Exactly, yes. The key point is we need to identify such places anyway 
> for GPU reset to work properly. So we could just piggy back hotplug on 
> top of that work and are done.
>
>>
>> Our problem here is how to signal all the existing  fences on one 
>> hand and on the other prevent any new dma_fence waits after we 
>> finished signaling existing fences. Once we solved this then there is 
>> no problem using drm_dev_unplug in conjunction with 
>> drm_dev_enter/exit at the highest level of drm_ioctl to flush any 
>> IOCTLs in flight and block any new ones.
>>
>> IMHO when we speak about signalling all fences we don't mean ALL the 
>> currently existing dma_fence structs (they are spread all over the 
>> place) but rather signal all the HW fences because HW is what's gone 
>> and we can't expect for those fences to be ever signaled. All the 
>> rest such as: scheduler fences,  user fences, drm_gem reservation 
>> objects e.t.c. are either dependent on those HW fences and hence 
>> signaling the HW fences will in turn signal them or, are not impacted 
>> by the HW being gone and hence can still be waited on and will 
>> complete. If this assumption is correct then I think that we should 
>> use some flag to prevent any new submission to HW which creates HW 
>> fences (somewhere around amdgpu_fence_emit), then traverse all 
>> existing HW fences (currently they are spread in a few places so 
>> maybe we need to track them in a list) and signal them. After that 
>> it's safe to cal drm_dev_unplug and be sure synchronize_srcu won't 
>> stall because of of dma_fence_wait. After that we can proceed to 
>> canceling work items, stopping schedulers e.t.c.
>
> That is problematic as well since you need to make sure that the 
> scheduler is not creating a new hardware fence in the moment you try 
> to signal all of them. It would require another SRCU or lock for this.

Alternatively, grabbing the reset write side and stopping and then 
restarting the scheduler could work as well.
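
Roughly something like this (sketch only; the ring iteration is the one 
the reset path already uses, error handling and ordering details are 
glossed over):

    down_write(&adev->reset_sem);

    for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
        struct amdgpu_ring *ring = adev->rings[i];

        if (!ring || !ring->sched.thread)
            continue;

        drm_sched_stop(&ring->sched, NULL);
        amdgpu_fence_driver_force_completion(ring);
        drm_sched_start(&ring->sched, true);
    }

    up_write(&adev->reset_sem);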

Christian.

>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>>> Andrey
>>>>
>>>>
>>>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-09  7:01                                 ` Christian König
@ 2021-04-09 15:42                                   ` Andrey Grodzovsky
  2021-04-09 16:39                                     ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-09 15:42 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


On 2021-04-09 3:01 a.m., Christian König wrote:
> Am 09.04.21 um 08:53 schrieb Christian König:
>> Am 08.04.21 um 22:39 schrieb Andrey Grodzovsky:
>>>
>>> On 2021-04-08 2:58 p.m., Christian König wrote:
>>>> Am 08.04.21 um 18:08 schrieb Andrey Grodzovsky:
>>>>> On 2021-04-08 4:32 a.m., Christian König wrote:
>>>>>> Am 08.04.21 um 10:22 schrieb Christian König:
>>>>>>> [SNIP]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Beyond blocking all delayed works and scheduler threads we 
>>>>>>>>>> also need to guarantee no  IOCTL can access MMIO post device 
>>>>>>>>>> unplug OR in flight IOCTLs are done before we finish 
>>>>>>>>>> pci_remove (amdgpu_pci_remove for us).
>>>>>>>>>> For this I suggest we do something like what we worked on 
>>>>>>>>>> with Takashi Iwai the ALSA maintainer recently when he helped 
>>>>>>>>>> implementing PCI BARs move support for snd_hda_intel. Take a 
>>>>>>>>>> look at
>>>>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=cbaa324799718e2b828a8c7b5b001dd896748497 
>>>>>>>>>> and
>>>>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=yadro/pcie_hotplug/movable_bars_v9.1&id=e36365d9ab5bbc30bdc221ab4b3437de34492440 
>>>>>>>>>>
>>>>>>>>>> We also had same issue there, how to prevent MMIO accesses 
>>>>>>>>>> while the BARs are migrating. What was done there is a 
>>>>>>>>>> refcount was added to count all IOCTLs in flight, for any in 
>>>>>>>>>> flight  IOCTL the BAR migration handler would
>>>>>>>>>> block for the refcount to drop to 0 before it would proceed, 
>>>>>>>>>> for any later IOCTL it stops and wait if device is in 
>>>>>>>>>> migration state. We even don't need the wait part, nothing to 
>>>>>>>>>> wait for, we just return with -ENODEV for this case.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is essentially what the DRM SRCU is doing as well.
>>>>>>>>>
>>>>>>>>> For the hotplug case we could do this in the toplevel since we 
>>>>>>>>> can signal the fence and don't need to block memory management.
>>>>>>>>
>>>>>>>>
>>>>>>>> To make SRCU 'wait for' all IOCTLs in flight we would need to 
>>>>>>>> wrap every IOCTL ( practically - just drm_ioctl function) with 
>>>>>>>> drm_dev_enter/drm_dev_exit - can we do it ?
>>>>>>>>
>>>>>>
>>>>>> Sorry totally missed this question.
>>>>>>
>>>>>> Yes, exactly that. As discussed for the hotplug case we can do this.
>>>>>
>>>>>
>>>>> Thinking more about it - assuming we are  treating 
>>>>> synchronize_srcu as a 'wait for completion' of any in flight 
>>>>> {drm_dev_enter, drm_dev_exit} scope, some of those scopes might do 
>>>>> dma_fence_wait inside. Since we haven't force signaled the fences 
>>>>> yet we will end up a deadlock. We have to signal all the various 
>>>>> fences before doing the 'wait for'. But we can't signal the fences 
>>>>> before setting 'dev->unplugged = true' to reject further CS and 
>>>>> other stuff which might create more fences we were supposed-to 
>>>>> force signal and now missed them. Effectively setting 
>>>>> 'dev->unplugged = true' and doing synchronize_srcu in one call 
>>>>> like drm_dev_unplug does without signalling all the fences in the 
>>>>> device in between these two steps looks luck a possible deadlock 
>>>>> to me - what do you think ?
>>>>>
>>>>
>>>> Indeed, that is a really good argument to handle it the same way as 
>>>> the reset lock.
>>>>
>>>> E.g. not taking it at the high level IOCTL, but rather when the 
>>>> frontend of the driver has acquired all the necessary locks (BO 
>>>> resv, VM lock etc...) before calling into the backend to actually 
>>>> do things with the hardware.
>>>>
>>>> Christian.
>>>
>>> From what you said I understand that you want to solve this problem 
>>> by using drm_dev_enter/exit brackets low enough in the code such 
>>> that it will not include and fence wait.
>>>
>>> But inserting dmr_dev_enter/exit on the highest level in drm_ioctl 
>>> is much less effort and less room for error then going through each 
>>> IOCTL and trying to identify at what point (possibly multiple 
>>> points) they are about to access HW, some of this is hidden deep in 
>>> HAL layers such as DC layer in display driver or the multi layers of 
>>> powerplay/SMU libraries. Also, we can't only limit our-self to 
>>> back-end if by this you mean ASIC specific functions which access 
>>> registers. We also need to take care of any MMIO kernel BO (VRAM 
>>> BOs) where we may access directly MMIO space by pointer from the 
>>> front end of the driver (HW agnostic) and TTM/DRM layers.
>>
>> Exactly, yes. The key point is we need to identify such places anyway 
>> for GPU reset to work properly. So we could just piggy back hotplug 
>> on top of that work and are done.


I see most of this was done by Dennis in this patch 
https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=df9c8d1aa278c435c30a69b8f2418b4a52fcb929. 
Indeed, this doesn't cover the direct by-pointer accesses to MMIO, and 
will introduce many more of those; as people write new code, new places 
to cover will pop up, leading to regressions and extra work to fix. It 
would be much better if we could blanket-cover it at the very top, such 
as the root of all IOCTLs or, for any queued work/timer, at the very top 
function, and handle it once and for all.
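
The blanket variant would look roughly like this (sketch only; the 
existing body of drm_ioctl is collapsed into a placeholder helper that 
doesn't exist under that name):

    long drm_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
    {
        struct drm_file *file_priv = filp->private_data;
        struct drm_device *dev = file_priv->minor->dev;
        long ret;
        int idx;

        /* Fails once drm_dev_unplug() has been called on this device. */
        if (!drm_dev_enter(dev, &idx))
            return -ENODEV;

        ret = __drm_ioctl_body(filp, cmd, arg); /* placeholder for the current body */

        drm_dev_exit(idx);
        return ret;
    }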


>>
>>>
>>> Our problem here is how to signal all the existing  fences on one 
>>> hand and on the other prevent any new dma_fence waits after we 
>>> finished signaling existing fences. Once we solved this then there 
>>> is no problem using drm_dev_unplug in conjunction with 
>>> drm_dev_enter/exit at the highest level of drm_ioctl to flush any 
>>> IOCTLs in flight and block any new ones.
>>>
>>> IMHO when we speak about signalling all fences we don't mean ALL the 
>>> currently existing dma_fence structs (they are spread all over the 
>>> place) but rather signal all the HW fences because HW is what's gone 
>>> and we can't expect for those fences to be ever signaled. All the 
>>> rest such as: scheduler fences, user fences, drm_gem reservation 
>>> objects e.t.c. are either dependent on those HW fences and hence 
>>> signaling the HW fences will in turn signal them or, are not 
>>> impacted by the HW being gone and hence can still be waited on and 
>>> will complete. If this assumption is correct then I think that we 
>>> should use some flag to prevent any new submission to HW which 
>>> creates HW fences (somewhere around amdgpu_fence_emit), then 
>>> traverse all existing HW fences (currently they are spread in a few 
>>> places so maybe we need to track them in a list) and signal them. 
>>> After that it's safe to cal drm_dev_unplug and be sure 
>>> synchronize_srcu won't stall because of of dma_fence_wait. After 
>>> that we can proceed to canceling work items, stopping schedulers e.t.c.
>>
>> That is problematic as well since you need to make sure that the 
>> scheduler is not creating a new hardware fence in the moment you try 
>> to signal all of them. It would require another SRCU or lock for this.


Say we use a list and a flag called 'emit_allowed' under a lock, such 
that in amdgpu_fence_emit we lock the list, check the flag and, if true, 
add the new HW fence to the list and proceed to HW emission as normal, 
otherwise return with -ENODEV. In amdgpu_pci_remove we take the lock, 
set the flag to false, and then iterate the list and force-signal it. 
Will this not prevent any new HW fence creation from then on, from any 
place trying to do so?
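
Something like this is what I have in mind (sketch; 'emit_allowed', 
'hw_fence_lock', 'hw_fence_list' and the list node are made-up names for 
illustration, and af stands for the struct amdgpu_fence that embeds the 
dma_fence):

    /* In amdgpu_fence_emit(), before touching the ring: */
    spin_lock(&adev->hw_fence_lock);
    if (!adev->emit_allowed) {
        spin_unlock(&adev->hw_fence_lock);
        return -ENODEV;
    }
    list_add_tail(&af->unplug_node, &adev->hw_fence_list);
    spin_unlock(&adev->hw_fence_lock);
    /* ... continue with the normal HW emission ... */

    /* In amdgpu_pci_remove(): */
    spin_lock(&adev->hw_fence_lock);
    adev->emit_allowed = false;
    list_for_each_entry(af, &adev->hw_fence_list, unplug_node)
        dma_fence_signal(&af->base);
    spin_unlock(&adev->hw_fence_lock);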


>
> Alternatively grabbing the reset write side and stopping and then 
> restarting the scheduler could work as well.
>
> Christian.


I didn't get the above, and I don't see why I need to reuse the GPU 
reset rw_lock. I rely on the SRCU unplug flag for unplug. Also, it's not 
clear to me why we are focusing on the scheduler threads; any code path 
that generates HW fences should be covered, so any code leading to 
amdgpu_fence_emit needs to be taken into account, such as direct IB 
submissions, VM flushes, etc.

Andrey


>
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>
>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-09 15:42                                   ` Andrey Grodzovsky
@ 2021-04-09 16:39                                     ` Christian König
  2021-04-09 18:18                                       ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-09 16:39 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking

On 09.04.21 at 17:42, Andrey Grodzovsky wrote:
>
> On 2021-04-09 3:01 a.m., Christian König wrote:
>> Am 09.04.21 um 08:53 schrieb Christian König:
>>> Am 08.04.21 um 22:39 schrieb Andrey Grodzovsky:
>>>> [SNIP]
>>>> But inserting dmr_dev_enter/exit on the highest level in drm_ioctl 
>>>> is much less effort and less room for error then going through each 
>>>> IOCTL and trying to identify at what point (possibly multiple 
>>>> points) they are about to access HW, some of this is hidden deep in 
>>>> HAL layers such as DC layer in display driver or the multi layers 
>>>> of powerplay/SMU libraries. Also, we can't only limit our-self to 
>>>> back-end if by this you mean ASIC specific functions which access 
>>>> registers. We also need to take care of any MMIO kernel BO (VRAM 
>>>> BOs) where we may access directly MMIO space by pointer from the 
>>>> front end of the driver (HW agnostic) and TTM/DRM layers.
>>>
>>> Exactly, yes. The key point is we need to identify such places 
>>> anyway for GPU reset to work properly. So we could just piggy back 
>>> hotplug on top of that work and are done.
>
>
> I see most of this was done By Denis in this patch 
> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=df9c8d1aa278c435c30a69b8f2418b4a52fcb929, 
> indeed this doesn't cover the direct by pointer accesses of MMIO and 
> will introduce much more of those and, as people write new code, new 
> places to cover will pop up leading to regressions and extra work to 
> fix. It would be really much better if we could blanket cover it at 
> the very top  such as root of all IOCTLs or, for any queued work/timer 
> at the very top function, to handle it once and for all.

And that's exactly what is not possible. At least for the reset case you 
need to look at each hardware access and handle it bit by bit, and I 
think that for the hotplug case we should go down that route as well.

>>>
>>>>
>>>> Our problem here is how to signal all the existing  fences on one 
>>>> hand and on the other prevent any new dma_fence waits after we 
>>>> finished signaling existing fences. Once we solved this then there 
>>>> is no problem using drm_dev_unplug in conjunction with 
>>>> drm_dev_enter/exit at the highest level of drm_ioctl to flush any 
>>>> IOCTLs in flight and block any new ones.
>>>>
>>>> IMHO when we speak about signalling all fences we don't mean ALL 
>>>> the currently existing dma_fence structs (they are spread all over 
>>>> the place) but rather signal all the HW fences because HW is what's 
>>>> gone and we can't expect for those fences to be ever signaled. All 
>>>> the rest such as: scheduler fences, user fences, drm_gem 
>>>> reservation objects e.t.c. are either dependent on those HW fences 
>>>> and hence signaling the HW fences will in turn signal them or, are 
>>>> not impacted by the HW being gone and hence can still be waited on 
>>>> and will complete. If this assumption is correct then I think that 
>>>> we should use some flag to prevent any new submission to HW which 
>>>> creates HW fences (somewhere around amdgpu_fence_emit), then 
>>>> traverse all existing HW fences (currently they are spread in a few 
>>>> places so maybe we need to track them in a list) and signal them. 
>>>> After that it's safe to cal drm_dev_unplug and be sure 
>>>> synchronize_srcu won't stall because of of dma_fence_wait. After 
>>>> that we can proceed to canceling work items, stopping schedulers 
>>>> e.t.c.
>>>
>>> That is problematic as well since you need to make sure that the 
>>> scheduler is not creating a new hardware fence in the moment you try 
>>> to signal all of them. It would require another SRCU or lock for this.
>
>
> If we use a list and a flag called 'emit_allowed' under a lock such 
> that in amdgpu_fence_emit we lock the list, check the flag and if true 
> add the new HW fence to list and proceed to HW emition as normal, 
> otherwise return with -ENODEV. In amdgpu_pci_remove we take the lock, 
> set the flag to false, and then iterate the list and force signal it. 
> Will this not prevent any new HW fence creation from now on from any 
> place trying to do so ?

Way too much overhead. The fence processing is intentionally lock-free 
to avoid cache line bouncing, because the IRQ can move from CPU to CPU.

We need something which doesn't affect the processing of fences in the 
interrupt handler at all.

>>
>> Alternatively grabbing the reset write side and stopping and then 
>> restarting the scheduler could work as well.
>>
>> Christian.
>
>
> I didn't get the above and I don't see why I need to reuse the GPU 
> reset rw_lock. I rely on the SRCU unplug flag for unplug. Also, not 
> clear to me why are we focusing on the scheduler threads, any code 
> patch to generate HW fences should be covered, so any code leading to 
> amdgpu_fence_emit needs to be taken into account such as, direct IB 
> submissions, VM flushes e.t.c

You need to work together with the reset lock anyway, because a hotplug 
could run at the same time as a reset.


Christian.

>
> Andrey
>
>
>>
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>
>>>
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-09 16:39                                     ` Christian König
@ 2021-04-09 18:18                                       ` Andrey Grodzovsky
  2021-04-10 17:34                                         ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-09 18:18 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


On 2021-04-09 12:39 p.m., Christian König wrote:
> Am 09.04.21 um 17:42 schrieb Andrey Grodzovsky:
>>
>> On 2021-04-09 3:01 a.m., Christian König wrote:
>>> Am 09.04.21 um 08:53 schrieb Christian König:
>>>> Am 08.04.21 um 22:39 schrieb Andrey Grodzovsky:
>>>>> [SNIP]
>>>>> But inserting dmr_dev_enter/exit on the highest level in drm_ioctl 
>>>>> is much less effort and less room for error then going through 
>>>>> each IOCTL and trying to identify at what point (possibly multiple 
>>>>> points) they are about to access HW, some of this is hidden deep 
>>>>> in HAL layers such as DC layer in display driver or the multi 
>>>>> layers of powerplay/SMU libraries. Also, we can't only limit 
>>>>> our-self to back-end if by this you mean ASIC specific functions 
>>>>> which access registers. We also need to take care of any MMIO 
>>>>> kernel BO (VRAM BOs) where we may access directly MMIO space by 
>>>>> pointer from the front end of the driver (HW agnostic) and TTM/DRM 
>>>>> layers.
>>>>
>>>> Exactly, yes. The key point is we need to identify such places 
>>>> anyway for GPU reset to work properly. So we could just piggy back 
>>>> hotplug on top of that work and are done.
>>
>>
>> I see most of this was done By Denis in this patch 
>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=df9c8d1aa278c435c30a69b8f2418b4a52fcb929, 
>> indeed this doesn't cover the direct by pointer accesses of MMIO and 
>> will introduce much more of those and, as people write new code, new 
>> places to cover will pop up leading to regressions and extra work to 
>> fix. It would be really much better if we could blanket cover it at 
>> the very top  such as root of all IOCTLs or, for any queued 
>> work/timer at the very top function, to handle it once and for all.
>
> And exactly that's what is not possible. At least for the reset case 
> you need to look into each hardware access and handle that bit by bit 
> and I think that for the hotplug case we should go down that route as 
> well.
>
>>>>
>>>>>
>>>>> Our problem here is how to signal all the existing  fences on one 
>>>>> hand and on the other prevent any new dma_fence waits after we 
>>>>> finished signaling existing fences. Once we solved this then there 
>>>>> is no problem using drm_dev_unplug in conjunction with 
>>>>> drm_dev_enter/exit at the highest level of drm_ioctl to flush any 
>>>>> IOCTLs in flight and block any new ones.
>>>>>
>>>>> IMHO when we speak about signalling all fences we don't mean ALL 
>>>>> the currently existing dma_fence structs (they are spread all over 
>>>>> the place) but rather signal all the HW fences because HW is 
>>>>> what's gone and we can't expect for those fences to be ever 
>>>>> signaled. All the rest such as: scheduler fences, user fences, 
>>>>> drm_gem reservation objects e.t.c. are either dependent on those 
>>>>> HW fences and hence signaling the HW fences will in turn signal 
>>>>> them or, are not impacted by the HW being gone and hence can still 
>>>>> be waited on and will complete. If this assumption is correct then 
>>>>> I think that we should use some flag to prevent any new submission 
>>>>> to HW which creates HW fences (somewhere around 
>>>>> amdgpu_fence_emit), then traverse all existing HW fences 
>>>>> (currently they are spread in a few places so maybe we need to 
>>>>> track them in a list) and signal them. After that it's safe to cal 
>>>>> drm_dev_unplug and be sure synchronize_srcu won't stall because of 
>>>>> of dma_fence_wait. After that we can proceed to canceling work 
>>>>> items, stopping schedulers e.t.c.
>>>>
>>>> That is problematic as well since you need to make sure that the 
>>>> scheduler is not creating a new hardware fence in the moment you 
>>>> try to signal all of them. It would require another SRCU or lock 
>>>> for this.
>>
>>
>> If we use a list and a flag called 'emit_allowed' under a lock such 
>> that in amdgpu_fence_emit we lock the list, check the flag and if 
>> true add the new HW fence to list and proceed to HW emition as 
>> normal, otherwise return with -ENODEV. In amdgpu_pci_remove we take 
>> the lock, set the flag to false, and then iterate the list and force 
>> signal it. Will this not prevent any new HW fence creation from now 
>> on from any place trying to do so ?
>
> Way to much overhead. The fence processing is intentionally lock free 
> to avoid cache line bouncing because the IRQ can move from CPU to CPU.
>
> We need something which at least the processing of fences in the 
> interrupt handler doesn't affect at all.


As far as I can see in the code, amdgpu_fence_emit is only called from 
task context. Also, we can skip the list I proposed and just use 
amdgpu_fence_driver_force_completion for each ring to signal all created 
HW fences.
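
I.e. something like (sketch; the loop is the usual ring iteration from 
amdgpu_fence.c):

    int i;

    for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
        struct amdgpu_ring *ring = adev->rings[i];

        if (!ring || !ring->fence_drv.initialized)
            continue;

        /* Writes sync_seq and processes this ring's fence array. */
        amdgpu_fence_driver_force_completion(ring);
    }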


>
>>>
>>> Alternatively grabbing the reset write side and stopping and then 
>>> restarting the scheduler could work as well.
>>>
>>> Christian.
>>
>>
>> I didn't get the above and I don't see why I need to reuse the GPU 
>> reset rw_lock. I rely on the SRCU unplug flag for unplug. Also, not 
>> clear to me why are we focusing on the scheduler threads, any code 
>> patch to generate HW fences should be covered, so any code leading to 
>> amdgpu_fence_emit needs to be taken into account such as, direct IB 
>> submissions, VM flushes e.t.c
>
> You need to work together with the reset lock anyway, cause a hotplug 
> could run at the same time as a reset.


If we go my way, then indeed I now see that I have to take the reset 
write-side lock while signalling the HW fences, in order to protect 
against scheduler/HW fence detachment and reattachment during the 
scheduler stop/restart. But if we go with your approach, then calling 
drm_dev_unplug and scoping amdgpu_job_timeout with drm_dev_enter/exit 
should be enough to prevent any concurrent GPU resets during unplug. In 
fact I already do it anyway - 
https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1
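
The scoping itself is small; roughly (just a sketch of the idea - the 
timeout handler is amdgpu_job_timedout in the code, and the early-return 
status here is only a stand-in):

    static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
    {
        struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
        int idx;

        if (!drm_dev_enter(adev_to_drm(ring->adev), &idx)) {
            /* Device already unplugged, nothing left to recover. */
            return DRM_GPU_SCHED_STAT_NOMINAL;
        }

        /* ... existing timeout handling / GPU reset ... */

        drm_dev_exit(idx);
        return DRM_GPU_SCHED_STAT_NOMINAL;
    }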

Andrey


>
>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-09 18:18                                       ` Andrey Grodzovsky
@ 2021-04-10 17:34                                         ` Christian König
  2021-04-12 17:27                                           ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-10 17:34 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking

Hi Andrey,

On 09.04.21 at 20:18, Andrey Grodzovsky wrote:
> [SNIP]
>>>
>>> If we use a list and a flag called 'emit_allowed' under a lock such 
>>> that in amdgpu_fence_emit we lock the list, check the flag and if 
>>> true add the new HW fence to list and proceed to HW emition as 
>>> normal, otherwise return with -ENODEV. In amdgpu_pci_remove we take 
>>> the lock, set the flag to false, and then iterate the list and force 
>>> signal it. Will this not prevent any new HW fence creation from now 
>>> on from any place trying to do so ?
>>
>> Way to much overhead. The fence processing is intentionally lock free 
>> to avoid cache line bouncing because the IRQ can move from CPU to CPU.
>>
>> We need something which at least the processing of fences in the 
>> interrupt handler doesn't affect at all.
>
>
> As far as I see in the code, amdgpu_fence_emit is only called from 
> task context. Also, we can skip this list I proposed and just use 
> amdgpu_fence_driver_force_completion for each ring to signal all 
> created HW fences.

Ah, wait a second, this gave me another idea.

See amdgpu_fence_driver_force_completion():

amdgpu_fence_write(ring, ring->fence_drv.sync_seq);

If we change that to something like:

amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);

Not only the currently submitted, but also the next 0x3FFFFFFF fences 
will be considered signaled.

This basically solves our problem of making sure that new fences are 
also signaled, without any additional overhead whatsoever.
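
I.e. roughly (sketch; today the function writes sync_seq and then calls 
amdgpu_fence_process()):

    void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring)
    {
        /* Write a seqno far ahead of sync_seq so that fences emitted after
         * this point are also treated as already signaled.
         */
        amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);
        amdgpu_fence_process(ring);
    }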

>
>>>>
>>>> Alternatively grabbing the reset write side and stopping and then 
>>>> restarting the scheduler could work as well.
>>>>
>>>> Christian.
>>>
>>>
>>> I didn't get the above and I don't see why I need to reuse the GPU 
>>> reset rw_lock. I rely on the SRCU unplug flag for unplug. Also, not 
>>> clear to me why are we focusing on the scheduler threads, any code 
>>> patch to generate HW fences should be covered, so any code leading 
>>> to amdgpu_fence_emit needs to be taken into account such as, direct 
>>> IB submissions, VM flushes e.t.c
>>
>> You need to work together with the reset lock anyway, cause a hotplug 
>> could run at the same time as a reset.
>
>
> For going my way indeed now I see now that I have to take reset write 
> side lock during HW fences signalling in order to protect against 
> scheduler/HW fences detachment and reattachment during schedulers 
> stop/restart. But if we go with your approach  then calling 
> drm_dev_unplug and scoping amdgpu_job_timeout with drm_dev_enter/exit 
> should be enough to prevent any concurrent GPU resets during unplug. 
> In fact I already do it anyway - 
> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1

Yes, good point as well.

Christian.

>
> Andrey
>
>
>>
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>>
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-10 17:34                                         ` Christian König
@ 2021-04-12 17:27                                           ` Andrey Grodzovsky
  2021-04-12 17:44                                             ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-12 17:27 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


[-- Attachment #1.1: Type: text/plain, Size: 5152 bytes --]


On 2021-04-10 1:34 p.m., Christian König wrote:
> Hi Andrey,
>
> Am 09.04.21 um 20:18 schrieb Andrey Grodzovsky:
>> [SNIP]
>>>>
>>>> If we use a list and a flag called 'emit_allowed' under a lock such 
>>>> that in amdgpu_fence_emit we lock the list, check the flag and if 
>>>> true add the new HW fence to list and proceed to HW emition as 
>>>> normal, otherwise return with -ENODEV. In amdgpu_pci_remove we take 
>>>> the lock, set the flag to false, and then iterate the list and 
>>>> force signal it. Will this not prevent any new HW fence creation 
>>>> from now on from any place trying to do so ?
>>>
>>> Way to much overhead. The fence processing is intentionally lock 
>>> free to avoid cache line bouncing because the IRQ can move from CPU 
>>> to CPU.
>>>
>>> We need something which at least the processing of fences in the 
>>> interrupt handler doesn't affect at all.
>>
>>
>> As far as I see in the code, amdgpu_fence_emit is only called from 
>> task context. Also, we can skip this list I proposed and just use 
>> amdgpu_fence_driver_force_completion for each ring to signal all 
>> created HW fences.
>
> Ah, wait a second this gave me another idea.
>
> See amdgpu_fence_driver_force_completion():
>
> amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
>
> If we change that to something like:
>
> amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);
>
> Not only the currently submitted, but also the next 0x3FFFFFFF fences 
> will be considered signaled.
>
> This basically solves out problem of making sure that new fences are 
> also signaled without any additional overhead whatsoever.


The problem with this is that setting the sync_seq to some MAX value 
alone is not enough; you actually have to call amdgpu_fence_process to 
iterate and signal the fences currently stored in the 
ring->fence_drv.fences array, and to guarantee that once you are done 
with your signalling no more HW fences will be added to that array. I 
was thinking of doing something like below:

amdgpu_fence_emit()

{

     dma_fence_init(fence);

     srcu_read_lock(amdgpu_unplug_srcu)

     if (!adev->unplug) {

         seq = ++ring->fence_drv.sync_seq;
         emit_fence(fence);

         /* We can't wait forever as the HW might be gone at any point */
         dma_fence_wait_timeout(old_fence, 5S);

         ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask] = fence;

     } else {

         dma_fence_set_error(fence, -ENODEV);
         dma_fence_signal(fence)

     }

     srcu_read_unlock(amdgpu_unplug_srcu)
     return fence;

}

amdgpu_pci_remove

{

     adev->unplug = true;
     synchronize_srcu(amdgpu_unplug_srcu)

     /* Past this point no more fences are submitted to the HW ring and hence
      * we can safely force-signal all that are currently there.
      * Any subsequently created HW fences will be returned signaled with an
      * error code right away.
      */

     for_each_ring(adev)
         amdgpu_fence_process(ring)

     drm_dev_unplug(dev);
     Stop schedulers
     cancel_sync(all timers and queued works);
     hw_fini
     unmap_mmio

}
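
As a reference for the gating the sketch above relies on, here is a minimal, 
self-contained version of the SRCU idea. All names here (unplug_srcu, 
unplugged, both functions) are illustrative placeholders, not existing amdgpu 
symbols; in real code the reader side would sit in the fence emission path and 
the writer side in the PCI remove path.

#include <linux/srcu.h>
#include <linux/errno.h>

DEFINE_STATIC_SRCU(unplug_srcu);	/* placeholder SRCU domain */
static bool unplugged;			/* placeholder "device is gone" flag */

/* Reader side: any path that wants to touch the hardware. */
static int touch_hw_if_present(void)
{
	int idx, ret;

	idx = srcu_read_lock(&unplug_srcu);
	if (!READ_ONCE(unplugged))
		ret = 0;	/* safe to program the HW inside this section */
	else
		ret = -ENODEV;	/* device already gone, bail out */
	srcu_read_unlock(&unplug_srcu, idx);

	return ret;
}

/* Remove side: flip the flag, then wait until every reader that might still
 * have seen the old value has left its read-side critical section. */
static void mark_unplugged(void)
{
	WRITE_ONCE(unplugged, true);
	synchronize_srcu(&unplug_srcu);
	/* from here on no reader can be half-way through a HW access */
}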


Andrey


>
>
>>
>>>>>
>>>>> Alternatively grabbing the reset write side and stopping and then 
>>>>> restarting the scheduler could work as well.
>>>>>
>>>>> Christian.
>>>>
>>>>
>>>> I didn't get the above and I don't see why I need to reuse the GPU 
>>>> reset rw_lock. I rely on the SRCU unplug flag for unplug. Also, not 
>>>> clear to me why are we focusing on the scheduler threads, any code 
>>>> patch to generate HW fences should be covered, so any code leading 
>>>> to amdgpu_fence_emit needs to be taken into account such as, direct 
>>>> IB submissions, VM flushes e.t.c
>>>
>>> You need to work together with the reset lock anyway, cause a 
>>> hotplug could run at the same time as a reset.
>>
>>
>> For going my way indeed now I see now that I have to take reset write 
>> side lock during HW fences signalling in order to protect against 
>> scheduler/HW fences detachment and reattachment during schedulers 
>> stop/restart. But if we go with your approach  then calling 
>> drm_dev_unplug and scoping amdgpu_job_timeout with drm_dev_enter/exit 
>> should be enough to prevent any concurrent GPU resets during unplug. 
>> In fact I already do it anyway - 
>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1
>
> Yes, good point as well.
>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>
>

[-- Attachment #1.2: Type: text/html, Size: 9256 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-12 17:27                                           ` Andrey Grodzovsky
@ 2021-04-12 17:44                                             ` Christian König
  2021-04-12 18:01                                               ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-12 17:44 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


[-- Attachment #1.1: Type: text/plain, Size: 5776 bytes --]


Am 12.04.21 um 19:27 schrieb Andrey Grodzovsky:
> On 2021-04-10 1:34 p.m., Christian König wrote:
>> Hi Andrey,
>>
>> Am 09.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>> [SNIP]
>>>>>
>>>>> If we use a list and a flag called 'emit_allowed' under a lock 
>>>>> such that in amdgpu_fence_emit we lock the list, check the flag 
>>>>> and if true add the new HW fence to list and proceed to HW emition 
>>>>> as normal, otherwise return with -ENODEV. In amdgpu_pci_remove we 
>>>>> take the lock, set the flag to false, and then iterate the list 
>>>>> and force signal it. Will this not prevent any new HW fence 
>>>>> creation from now on from any place trying to do so ?
>>>>
>>>> Way to much overhead. The fence processing is intentionally lock 
>>>> free to avoid cache line bouncing because the IRQ can move from CPU 
>>>> to CPU.
>>>>
>>>> We need something which at least the processing of fences in the 
>>>> interrupt handler doesn't affect at all.
>>>
>>>
>>> As far as I see in the code, amdgpu_fence_emit is only called from 
>>> task context. Also, we can skip this list I proposed and just use 
>>> amdgpu_fence_driver_force_completion for each ring to signal all 
>>> created HW fences.
>>
>> Ah, wait a second this gave me another idea.
>>
>> See amdgpu_fence_driver_force_completion():
>>
>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
>>
>> If we change that to something like:
>>
>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);
>>
>> Not only the currently submitted, but also the next 0x3FFFFFFF fences 
>> will be considered signaled.
>>
>> This basically solves out problem of making sure that new fences are 
>> also signaled without any additional overhead whatsoever.
>
>
> Problem with this is that the act of setting the sync_seq to some MAX 
> value alone is not enough, you actually have to call 
> amdgpu_fence_process to iterate and signal the fences currently stored 
> in ring->fence_drv.fences array and to guarantee that once you done 
> your signalling no more HW fences will be added to that array anymore. 
> I was thinking to do something like bellow:
>

Well we could implement the is_signaled callback once more, but I'm not 
sure if that is a good idea.
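
A rough sketch of what implementing the is_signaled callback once more could 
look like; to_amdgpu_fence_adev(), adev->unplug and fence_seq_signaled() are 
illustrative assumptions here, not existing symbols, and the remaining ops are 
elided:

#include <linux/dma-fence.h>

/* Hooked in through dma_fence_ops.signaled: report the fence as signaled as
 * soon as the device is flagged as gone, so waiters never block on vanished
 * hardware. Note that this alone does not set an error code on the fence. */
static bool amdgpu_fence_is_signaled(struct dma_fence *f)
{
	struct amdgpu_device *adev = to_amdgpu_fence_adev(f);	/* assumed helper */

	if (READ_ONCE(adev->unplug))	/* assumed unplug flag */
		return true;

	return fence_seq_signaled(f);	/* placeholder for the usual seqno check */
}

static const struct dma_fence_ops amdgpu_unplug_fence_ops = {
	.signaled = amdgpu_fence_is_signaled,
	/* .get_driver_name / .get_timeline_name etc. as in the existing ops */
};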

> amdgpu_fence_emit()
>
> {
>
>     dma_fence_init(fence);
>
>     srcu_read_lock(amdgpu_unplug_srcu)
>
>     if (!adev->unplug)) {
>
>         seq = ++ring->fence_drv.sync_seq;
>         emit_fence(fence);
>
>          /* We can't wait forever as the HW might be gone at any point */
>          dma_fence_wait_timeout(old_fence, 5S);
>

You can pretty much ignore this wait here. It is only as a last resort 
so that we never overwrite the ring buffers.

But it should not have a timeout as far as I can see.

>         ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask] 
> = fence;
>
>     } else {
>
>         dma_fence_set_error(fence, -ENODEV);
>         DMA_fence_signal(fence)
>
>     }
>
>     srcu_read_unlock(amdgpu_unplug_srcu)
>     return fence;
>
> }
>
> amdgpu_pci_remove
>
> {
>
>     adev->unplug = true;
>     synchronize_srcu(amdgpu_unplug_srcu)
>

Well that is just duplicating what drm_dev_unplug() should be doing on a 
different level.

Christian.

>     /* Past this point no more fence are submitted to HW ring and 
> hence we can safely call force signal on all that are currently there.
>      * Any subsequently created  HW fences will be returned signaled 
> with an error code right away
>      */
>
>     for_each_ring(adev)
>         amdgpu_fence_process(ring)
>
>     drm_dev_unplug(dev);
>     Stop schedulers
>     cancel_sync(all timers and queued works);
>     hw_fini
>     unmap_mmio
>
> }
>
>
> Andrey
>
>
>>
>>
>>>
>>>>>>
>>>>>> Alternatively grabbing the reset write side and stopping and then 
>>>>>> restarting the scheduler could work as well.
>>>>>>
>>>>>> Christian.
>>>>>
>>>>>
>>>>> I didn't get the above and I don't see why I need to reuse the GPU 
>>>>> reset rw_lock. I rely on the SRCU unplug flag for unplug. Also, 
>>>>> not clear to me why are we focusing on the scheduler threads, any 
>>>>> code patch to generate HW fences should be covered, so any code 
>>>>> leading to amdgpu_fence_emit needs to be taken into account such 
>>>>> as, direct IB submissions, VM flushes e.t.c
>>>>
>>>> You need to work together with the reset lock anyway, cause a 
>>>> hotplug could run at the same time as a reset.
>>>
>>>
>>> For going my way indeed now I see now that I have to take reset 
>>> write side lock during HW fences signalling in order to protect 
>>> against scheduler/HW fences detachment and reattachment during 
>>> schedulers stop/restart. But if we go with your approach  then 
>>> calling drm_dev_unplug and scoping amdgpu_job_timeout with 
>>> drm_dev_enter/exit should be enough to prevent any concurrent GPU 
>>> resets during unplug. In fact I already do it anyway - 
>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1
>>
>> Yes, good point as well.
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>


[-- Attachment #1.2: Type: text/html, Size: 10279 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-12 17:44                                             ` Christian König
@ 2021-04-12 18:01                                               ` Andrey Grodzovsky
  2021-04-12 18:05                                                 ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-12 18:01 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


[-- Attachment #1.1: Type: text/plain, Size: 6597 bytes --]


On 2021-04-12 1:44 p.m., Christian König wrote:
>
> Am 12.04.21 um 19:27 schrieb Andrey Grodzovsky:
>> On 2021-04-10 1:34 p.m., Christian König wrote:
>>> Hi Andrey,
>>>
>>> Am 09.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>>> [SNIP]
>>>>>>
>>>>>> If we use a list and a flag called 'emit_allowed' under a lock 
>>>>>> such that in amdgpu_fence_emit we lock the list, check the flag 
>>>>>> and if true add the new HW fence to list and proceed to HW 
>>>>>> emition as normal, otherwise return with -ENODEV. In 
>>>>>> amdgpu_pci_remove we take the lock, set the flag to false, and 
>>>>>> then iterate the list and force signal it. Will this not prevent 
>>>>>> any new HW fence creation from now on from any place trying to do 
>>>>>> so ?
>>>>>
>>>>> Way to much overhead. The fence processing is intentionally lock 
>>>>> free to avoid cache line bouncing because the IRQ can move from 
>>>>> CPU to CPU.
>>>>>
>>>>> We need something which at least the processing of fences in the 
>>>>> interrupt handler doesn't affect at all.
>>>>
>>>>
>>>> As far as I see in the code, amdgpu_fence_emit is only called from 
>>>> task context. Also, we can skip this list I proposed and just use 
>>>> amdgpu_fence_driver_force_completion for each ring to signal all 
>>>> created HW fences.
>>>
>>> Ah, wait a second this gave me another idea.
>>>
>>> See amdgpu_fence_driver_force_completion():
>>>
>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
>>>
>>> If we change that to something like:
>>>
>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);
>>>
>>> Not only the currently submitted, but also the next 0x3FFFFFFF 
>>> fences will be considered signaled.
>>>
>>> This basically solves out problem of making sure that new fences are 
>>> also signaled without any additional overhead whatsoever.
>>
>>
>> Problem with this is that the act of setting the sync_seq to some MAX 
>> value alone is not enough, you actually have to call 
>> amdgpu_fence_process to iterate and signal the fences currently 
>> stored in ring->fence_drv.fences array and to guarantee that once you 
>> done your signalling no more HW fences will be added to that array 
>> anymore. I was thinking to do something like bellow:
>>
>
> Well we could implement the is_signaled callback once more, but I'm 
> not sure if that is a good idea.


This indeed could save the explicit signaling I am doing below, but I also 
set an error code there which might be helpful to propagate to users.
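
To make that error propagation concrete, a small hedged sketch of a waiter 
picking up the error afterwards (the 100 ms timeout and the function name are 
arbitrary choices for the example):

#include <linux/dma-fence.h>
#include <linux/jiffies.h>

/* dma_fence_set_error() has to be called before dma_fence_signal(); a waiter
 * can then read the result back through dma_fence_get_status(). */
static int wait_for_hw_fence(struct dma_fence *fence)
{
	long r = dma_fence_wait_timeout(fence, true, msecs_to_jiffies(100));
	int status;

	if (r == 0)
		return -ETIMEDOUT;	/* still pending */
	if (r < 0)
		return r;		/* interrupted */

	status = dma_fence_get_status(fence);
	return status < 0 ? status : 0;	/* -ENODEV when force-signaled on unplug */
}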


>
>> amdgpu_fence_emit()
>>
>> {
>>
>>     dma_fence_init(fence);
>>
>>     srcu_read_lock(amdgpu_unplug_srcu)
>>
>>     if (!adev->unplug)) {
>>
>>         seq = ++ring->fence_drv.sync_seq;
>>         emit_fence(fence);
>>
>>         /* We can't wait forever as the HW might be gone at any point */
>>         dma_fence_wait_timeout(old_fence, 5S);
>>
>
> You can pretty much ignore this wait here. It is only as a last resort 
> so that we never overwrite the ring buffers.


If the device is present, how can I ignore this?


>
> But it should not have a timeout as far as I can see.


Without a timeout on the wait the whole approach falls apart, as I can't 
call synchronize_srcu on this scope: once the device is physically gone 
the wait here will be forever


>
>>         ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask] 
>> = fence;
>>
>>     } else {
>>
>>         dma_fence_set_error(fence, -ENODEV);
>>         DMA_fence_signal(fence)
>>
>>     }
>>
>>     srcu_read_unlock(amdgpu_unplug_srcu)
>>     return fence;
>>
>> }
>>
>> amdgpu_pci_remove
>>
>> {
>>
>>     adev->unplug = true;
>>     synchronize_srcu(amdgpu_unplug_srcu)
>>
>
> Well that is just duplicating what drm_dev_unplug() should be doing on 
> a different level.


drm_dev_unplug has a much wider scope, covering everything in the device, 
including 'flushing' in-flight IOCTLs; this deals specifically with the 
issue of force signalling HW fences
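
For contrast, the drm_dev_unplug()/drm_dev_enter() scoping referred to in 
this thread works roughly like this on the IOCTL side (the function name and 
the elided hardware access are illustrative):

#include <drm/drm_drv.h>

static int some_hw_touching_path(struct drm_device *ddev)
{
	int idx;

	if (!drm_dev_enter(ddev, &idx))
		return -ENODEV;		/* drm_dev_unplug() already ran */

	/* ... program the hardware ... */

	drm_dev_exit(idx);
	return 0;
}

drm_dev_unplug() marks the device as gone and then waits for all such sections 
to finish, which is the much wider scope mentioned above.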

Andrey


>
> Christian.
>
>>     /* Past this point no more fence are submitted to HW ring and 
>> hence we can safely call force signal on all that are currently there.
>>      * Any subsequently created  HW fences will be returned signaled 
>> with an error code right away
>>      */
>>
>>     for_each_ring(adev)
>>         amdgpu_fence_process(ring)
>>
>>     drm_dev_unplug(dev);
>>     Stop schedulers
>>     cancel_sync(all timers and queued works);
>>     hw_fini
>>     unmap_mmio
>>
>> }
>>
>>
>> Andrey
>>
>>
>>>
>>>
>>>>
>>>>>>>
>>>>>>> Alternatively grabbing the reset write side and stopping and 
>>>>>>> then restarting the scheduler could work as well.
>>>>>>>
>>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>> I didn't get the above and I don't see why I need to reuse the 
>>>>>> GPU reset rw_lock. I rely on the SRCU unplug flag for unplug. 
>>>>>> Also, not clear to me why are we focusing on the scheduler 
>>>>>> threads, any code patch to generate HW fences should be covered, 
>>>>>> so any code leading to amdgpu_fence_emit needs to be taken into 
>>>>>> account such as, direct IB submissions, VM flushes e.t.c
>>>>>
>>>>> You need to work together with the reset lock anyway, cause a 
>>>>> hotplug could run at the same time as a reset.
>>>>
>>>>
>>>> For going my way indeed now I see now that I have to take reset 
>>>> write side lock during HW fences signalling in order to protect 
>>>> against scheduler/HW fences detachment and reattachment during 
>>>> schedulers stop/restart. But if we go with your approach  then 
>>>> calling drm_dev_unplug and scoping amdgpu_job_timeout with 
>>>> drm_dev_enter/exit should be enough to prevent any concurrent GPU 
>>>> resets during unplug. In fact I already do it anyway - 
>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1
>>>
>>> Yes, good point as well.
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>>
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>

[-- Attachment #1.2: Type: text/html, Size: 12171 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-12 18:01                                               ` Andrey Grodzovsky
@ 2021-04-12 18:05                                                 ` Christian König
  2021-04-12 18:18                                                   ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-12 18:05 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


[-- Attachment #1.1: Type: text/plain, Size: 7110 bytes --]

Am 12.04.21 um 20:01 schrieb Andrey Grodzovsky:
>
> On 2021-04-12 1:44 p.m., Christian König wrote:
>
>>
>> Am 12.04.21 um 19:27 schrieb Andrey Grodzovsky:
>>> On 2021-04-10 1:34 p.m., Christian König wrote:
>>>> Hi Andrey,
>>>>
>>>> Am 09.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>>>> [SNIP]
>>>>>>>
>>>>>>> If we use a list and a flag called 'emit_allowed' under a lock 
>>>>>>> such that in amdgpu_fence_emit we lock the list, check the flag 
>>>>>>> and if true add the new HW fence to list and proceed to HW 
>>>>>>> emition as normal, otherwise return with -ENODEV. In 
>>>>>>> amdgpu_pci_remove we take the lock, set the flag to false, and 
>>>>>>> then iterate the list and force signal it. Will this not prevent 
>>>>>>> any new HW fence creation from now on from any place trying to 
>>>>>>> do so ?
>>>>>>
>>>>>> Way to much overhead. The fence processing is intentionally lock 
>>>>>> free to avoid cache line bouncing because the IRQ can move from 
>>>>>> CPU to CPU.
>>>>>>
>>>>>> We need something which at least the processing of fences in the 
>>>>>> interrupt handler doesn't affect at all.
>>>>>
>>>>>
>>>>> As far as I see in the code, amdgpu_fence_emit is only called from 
>>>>> task context. Also, we can skip this list I proposed and just use 
>>>>> amdgpu_fence_driver_force_completion for each ring to signal all 
>>>>> created HW fences.
>>>>
>>>> Ah, wait a second this gave me another idea.
>>>>
>>>> See amdgpu_fence_driver_force_completion():
>>>>
>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
>>>>
>>>> If we change that to something like:
>>>>
>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);
>>>>
>>>> Not only the currently submitted, but also the next 0x3FFFFFFF 
>>>> fences will be considered signaled.
>>>>
>>>> This basically solves out problem of making sure that new fences 
>>>> are also signaled without any additional overhead whatsoever.
>>>
>>>
>>> Problem with this is that the act of setting the sync_seq to some 
>>> MAX value alone is not enough, you actually have to call 
>>> amdgpu_fence_process to iterate and signal the fences currently 
>>> stored in ring->fence_drv.fences array and to guarantee that once 
>>> you done your signalling no more HW fences will be added to that 
>>> array anymore. I was thinking to do something like bellow:
>>>
>>
>> Well we could implement the is_signaled callback once more, but I'm 
>> not sure if that is a good idea.
>
>
> This indeed could save the explicit signaling I am doing bellow but I 
> also set an error code there which might be helpful to propagate to users
>
>
>>
>>> amdgpu_fence_emit()
>>>
>>> {
>>>
>>>     dma_fence_init(fence);
>>>
>>>     srcu_read_lock(amdgpu_unplug_srcu)
>>>
>>>     if (!adev->unplug)) {
>>>
>>>         seq = ++ring->fence_drv.sync_seq;
>>>         emit_fence(fence);
>>>
>>>         /* We can't wait forever as the HW might be gone at any point */
>>>         dma_fence_wait_timeout(old_fence, 5S);
>>>
>>
>> You can pretty much ignore this wait here. It is only as a last 
>> resort so that we never overwrite the ring buffers.
>
>
> If device is present how can I ignore this ?
>
>
>>
>> But it should not have a timeout as far as I can see.
>
>
> Without timeout wait the who approach falls apart as I can't call 
> srcu_synchronize on this scope because once device is physically gone 
> the wait here will be forever
>

Yeah, but this is intentional. The only alternative to avoid corruption 
is to wait with a timeout and call BUG() if that triggers. That isn't 
much better.
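
Spelled out, the alternative described here would replace the plain wait in 
the amdgpu_fence_emit() sketch with something like the following (the 10 
second timeout is an arbitrary example value):

	/* Last-resort protection against overwriting the ring buffer. */
	long r = dma_fence_wait_timeout(old_fence, false, 10 * HZ);

	/* A timeout means the previous fence never signaled and the ring is
	 * about to be corrupted; there is no sane way to continue from that. */
	BUG_ON(r == 0);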

>
>>
>>>         ring->fence_drv.fences[seq & 
>>> ring->fence_drv.num_fences_mask] = fence;
>>>
>>>     } else {
>>>
>>>         dma_fence_set_error(fence, -ENODEV);
>>>         DMA_fence_signal(fence)
>>>
>>>     }
>>>
>>>     srcu_read_unlock(amdgpu_unplug_srcu)
>>>     return fence;
>>>
>>> }
>>>
>>> amdgpu_pci_remove
>>>
>>> {
>>>
>>>     adev->unplug = true;
>>>     synchronize_srcu(amdgpu_unplug_srcu)
>>>
>>
>> Well that is just duplicating what drm_dev_unplug() should be doing 
>> on a different level.
>
>
> drm_dev_unplug is on a much wider scope, for everything in the device 
> including 'flushing' in flight IOCTLs, this deals specifically with 
> the issue of force signalling HW fences
>

Yeah, but it adds the same overhead as the device srcu.

Christian.

> Andrey
>
>
>>
>> Christian.
>>
>>>     /* Past this point no more fence are submitted to HW ring and 
>>> hence we can safely call force signal on all that are currently there.
>>>      * Any subsequently created  HW fences will be returned signaled 
>>> with an error code right away
>>>      */
>>>
>>>     for_each_ring(adev)
>>>         amdgpu_fence_process(ring)
>>>
>>>     drm_dev_unplug(dev);
>>>     Stop schedulers
>>>     cancel_sync(all timers and queued works);
>>>     hw_fini
>>>     unmap_mmio
>>>
>>> }
>>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>
>>>>>
>>>>>>>>
>>>>>>>> Alternatively grabbing the reset write side and stopping and 
>>>>>>>> then restarting the scheduler could work as well.
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>
>>>>>>>
>>>>>>> I didn't get the above and I don't see why I need to reuse the 
>>>>>>> GPU reset rw_lock. I rely on the SRCU unplug flag for unplug. 
>>>>>>> Also, not clear to me why are we focusing on the scheduler 
>>>>>>> threads, any code patch to generate HW fences should be covered, 
>>>>>>> so any code leading to amdgpu_fence_emit needs to be taken into 
>>>>>>> account such as, direct IB submissions, VM flushes e.t.c
>>>>>>
>>>>>> You need to work together with the reset lock anyway, cause a 
>>>>>> hotplug could run at the same time as a reset.
>>>>>
>>>>>
>>>>> For going my way indeed now I see now that I have to take reset 
>>>>> write side lock during HW fences signalling in order to protect 
>>>>> against scheduler/HW fences detachment and reattachment during 
>>>>> schedulers stop/restart. But if we go with your approach  then 
>>>>> calling drm_dev_unplug and scoping amdgpu_job_timeout with 
>>>>> drm_dev_enter/exit should be enough to prevent any concurrent GPU 
>>>>> resets during unplug. In fact I already do it anyway - 
>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1
>>>>
>>>> Yes, good point as well.
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>


[-- Attachment #1.2: Type: text/html, Size: 13311 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-12 18:05                                                 ` Christian König
@ 2021-04-12 18:18                                                   ` Andrey Grodzovsky
  2021-04-12 18:23                                                     ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-12 18:18 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


[-- Attachment #1.1: Type: text/plain, Size: 7668 bytes --]


On 2021-04-12 2:05 p.m., Christian König wrote:
> Am 12.04.21 um 20:01 schrieb Andrey Grodzovsky:
>>
>> On 2021-04-12 1:44 p.m., Christian König wrote:
>>
>>>
>>> Am 12.04.21 um 19:27 schrieb Andrey Grodzovsky:
>>>> On 2021-04-10 1:34 p.m., Christian König wrote:
>>>>> Hi Andrey,
>>>>>
>>>>> Am 09.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>>>>> [SNIP]
>>>>>>>>
>>>>>>>> If we use a list and a flag called 'emit_allowed' under a lock 
>>>>>>>> such that in amdgpu_fence_emit we lock the list, check the flag 
>>>>>>>> and if true add the new HW fence to list and proceed to HW 
>>>>>>>> emition as normal, otherwise return with -ENODEV. In 
>>>>>>>> amdgpu_pci_remove we take the lock, set the flag to false, and 
>>>>>>>> then iterate the list and force signal it. Will this not 
>>>>>>>> prevent any new HW fence creation from now on from any place 
>>>>>>>> trying to do so ?
>>>>>>>
>>>>>>> Way to much overhead. The fence processing is intentionally lock 
>>>>>>> free to avoid cache line bouncing because the IRQ can move from 
>>>>>>> CPU to CPU.
>>>>>>>
>>>>>>> We need something which at least the processing of fences in the 
>>>>>>> interrupt handler doesn't affect at all.
>>>>>>
>>>>>>
>>>>>> As far as I see in the code, amdgpu_fence_emit is only called 
>>>>>> from task context. Also, we can skip this list I proposed and 
>>>>>> just use amdgpu_fence_driver_force_completion for each ring to 
>>>>>> signal all created HW fences.
>>>>>
>>>>> Ah, wait a second this gave me another idea.
>>>>>
>>>>> See amdgpu_fence_driver_force_completion():
>>>>>
>>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
>>>>>
>>>>> If we change that to something like:
>>>>>
>>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);
>>>>>
>>>>> Not only the currently submitted, but also the next 0x3FFFFFFF 
>>>>> fences will be considered signaled.
>>>>>
>>>>> This basically solves out problem of making sure that new fences 
>>>>> are also signaled without any additional overhead whatsoever.
>>>>
>>>>
>>>> Problem with this is that the act of setting the sync_seq to some 
>>>> MAX value alone is not enough, you actually have to call 
>>>> amdgpu_fence_process to iterate and signal the fences currently 
>>>> stored in ring->fence_drv.fences array and to guarantee that once 
>>>> you done your signalling no more HW fences will be added to that 
>>>> array anymore. I was thinking to do something like bellow:
>>>>
>>>
>>> Well we could implement the is_signaled callback once more, but I'm 
>>> not sure if that is a good idea.
>>
>>
>> This indeed could save the explicit signaling I am doing bellow but I 
>> also set an error code there which might be helpful to propagate to users
>>
>>
>>>
>>>> amdgpu_fence_emit()
>>>>
>>>> {
>>>>
>>>>     dma_fence_init(fence);
>>>>
>>>>     srcu_read_lock(amdgpu_unplug_srcu)
>>>>
>>>>     if (!adev->unplug)) {
>>>>
>>>>         seq = ++ring->fence_drv.sync_seq;
>>>>         emit_fence(fence);
>>>>
>>>>         /* We can't wait forever as the HW might be gone at any point */
>>>>         dma_fence_wait_timeout(old_fence, 5S);
>>>>
>>>
>>> You can pretty much ignore this wait here. It is only as a last 
>>> resort so that we never overwrite the ring buffers.
>>
>>
>> If device is present how can I ignore this ?
>>

I think you missed my question here


>>
>>>
>>> But it should not have a timeout as far as I can see.
>>
>>
>> Without timeout wait the who approach falls apart as I can't call 
>> srcu_synchronize on this scope because once device is physically gone 
>> the wait here will be forever
>>
>
> Yeah, but this is intentional. The only alternative to avoid 
> corruption is to wait with a timeout and call BUG() if that triggers. 
> That isn't much better.
>
>>
>>>
>>>>         ring->fence_drv.fences[seq & 
>>>> ring->fence_drv.num_fences_mask] = fence;
>>>>
>>>>     } else {
>>>>
>>>>         dma_fence_set_error(fence, -ENODEV);
>>>>         DMA_fence_signal(fence)
>>>>
>>>>     }
>>>>
>>>>     srcu_read_unlock(amdgpu_unplug_srcu)
>>>>     return fence;
>>>>
>>>> }
>>>>
>>>> amdgpu_pci_remove
>>>>
>>>> {
>>>>
>>>>     adev->unplug = true;
>>>>     synchronize_srcu(amdgpu_unplug_srcu)
>>>>
>>>
>>> Well that is just duplicating what drm_dev_unplug() should be doing 
>>> on a different level.
>>
>>
>> drm_dev_unplug is on a much wider scope, for everything in the device 
>> including 'flushing' in flight IOCTLs, this deals specifically with 
>> the issue of force signalling HW fences
>>
>
> Yeah, but it adds the same overhead as the device srcu.
>
> Christian.


So what's the right approach? How do we guarantee that when running 
amdgpu_fence_driver_force_completion we signal all the HW fences and do not 
race against more fences being inserted into that array?

Andrey


>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>>>     /* Past this point no more fence are submitted to HW ring and 
>>>> hence we can safely call force signal on all that are currently there.
>>>>      * Any subsequently created  HW fences will be returned 
>>>> signaled with an error code right away
>>>>      */
>>>>
>>>>     for_each_ring(adev)
>>>>         amdgpu_fence_process(ring)
>>>>
>>>>     drm_dev_unplug(dev);
>>>>     Stop schedulers
>>>>     cancel_sync(all timers and queued works);
>>>>     hw_fini
>>>>     unmap_mmio
>>>>
>>>> }
>>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>>>
>>>>>>>>> Alternatively grabbing the reset write side and stopping and 
>>>>>>>>> then restarting the scheduler could work as well.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>
>>>>>>>> I didn't get the above and I don't see why I need to reuse the 
>>>>>>>> GPU reset rw_lock. I rely on the SRCU unplug flag for unplug. 
>>>>>>>> Also, not clear to me why are we focusing on the scheduler 
>>>>>>>> threads, any code patch to generate HW fences should be 
>>>>>>>> covered, so any code leading to amdgpu_fence_emit needs to be 
>>>>>>>> taken into account such as, direct IB submissions, VM flushes 
>>>>>>>> e.t.c
>>>>>>>
>>>>>>> You need to work together with the reset lock anyway, cause a 
>>>>>>> hotplug could run at the same time as a reset.
>>>>>>
>>>>>>
>>>>>> For going my way indeed now I see now that I have to take reset 
>>>>>> write side lock during HW fences signalling in order to protect 
>>>>>> against scheduler/HW fences detachment and reattachment during 
>>>>>> schedulers stop/restart. But if we go with your approach  then 
>>>>>> calling drm_dev_unplug and scoping amdgpu_job_timeout with 
>>>>>> drm_dev_enter/exit should be enough to prevent any concurrent GPU 
>>>>>> resets during unplug. In fact I already do it anyway - 
>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1
>>>>>
>>>>> Yes, good point as well.
>>>>>
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>

[-- Attachment #1.2: Type: text/html, Size: 14876 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-12 18:18                                                   ` Andrey Grodzovsky
@ 2021-04-12 18:23                                                     ` Christian König
  2021-04-12 19:12                                                       ` Andrey Grodzovsky
  2021-04-13  5:36                                                       ` Andrey Grodzovsky
  0 siblings, 2 replies; 56+ messages in thread
From: Christian König @ 2021-04-12 18:23 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


[-- Attachment #1.1: Type: text/plain, Size: 8579 bytes --]

Am 12.04.21 um 20:18 schrieb Andrey Grodzovsky:
>
> On 2021-04-12 2:05 p.m., Christian König wrote:
>
>> Am 12.04.21 um 20:01 schrieb Andrey Grodzovsky:
>>>
>>> On 2021-04-12 1:44 p.m., Christian König wrote:
>>>
>>>>
>>>> Am 12.04.21 um 19:27 schrieb Andrey Grodzovsky:
>>>>> On 2021-04-10 1:34 p.m., Christian König wrote:
>>>>>> Hi Andrey,
>>>>>>
>>>>>> Am 09.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>>>>>> [SNIP]
>>>>>>>>>
>>>>>>>>> If we use a list and a flag called 'emit_allowed' under a lock 
>>>>>>>>> such that in amdgpu_fence_emit we lock the list, check the 
>>>>>>>>> flag and if true add the new HW fence to list and proceed to 
>>>>>>>>> HW emition as normal, otherwise return with -ENODEV. In 
>>>>>>>>> amdgpu_pci_remove we take the lock, set the flag to false, and 
>>>>>>>>> then iterate the list and force signal it. Will this not 
>>>>>>>>> prevent any new HW fence creation from now on from any place 
>>>>>>>>> trying to do so ?
>>>>>>>>
>>>>>>>> Way to much overhead. The fence processing is intentionally 
>>>>>>>> lock free to avoid cache line bouncing because the IRQ can move 
>>>>>>>> from CPU to CPU.
>>>>>>>>
>>>>>>>> We need something which at least the processing of fences in 
>>>>>>>> the interrupt handler doesn't affect at all.
>>>>>>>
>>>>>>>
>>>>>>> As far as I see in the code, amdgpu_fence_emit is only called 
>>>>>>> from task context. Also, we can skip this list I proposed and 
>>>>>>> just use amdgpu_fence_driver_force_completion for each ring to 
>>>>>>> signal all created HW fences.
>>>>>>
>>>>>> Ah, wait a second this gave me another idea.
>>>>>>
>>>>>> See amdgpu_fence_driver_force_completion():
>>>>>>
>>>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
>>>>>>
>>>>>> If we change that to something like:
>>>>>>
>>>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);
>>>>>>
>>>>>> Not only the currently submitted, but also the next 0x3FFFFFFF 
>>>>>> fences will be considered signaled.
>>>>>>
>>>>>> This basically solves out problem of making sure that new fences 
>>>>>> are also signaled without any additional overhead whatsoever.
>>>>>
>>>>>
>>>>> Problem with this is that the act of setting the sync_seq to some 
>>>>> MAX value alone is not enough, you actually have to call 
>>>>> amdgpu_fence_process to iterate and signal the fences currently 
>>>>> stored in ring->fence_drv.fences array and to guarantee that once 
>>>>> you done your signalling no more HW fences will be added to that 
>>>>> array anymore. I was thinking to do something like bellow:
>>>>>
>>>>
>>>> Well we could implement the is_signaled callback once more, but I'm 
>>>> not sure if that is a good idea.
>>>
>>>
>>> This indeed could save the explicit signaling I am doing bellow but 
>>> I also set an error code there which might be helpful to propagate 
>>> to users
>>>
>>>
>>>>
>>>>> amdgpu_fence_emit()
>>>>>
>>>>> {
>>>>>
>>>>>     dma_fence_init(fence);
>>>>>
>>>>>     srcu_read_lock(amdgpu_unplug_srcu)
>>>>>
>>>>>     if (!adev->unplug)) {
>>>>>
>>>>>         seq = ++ring->fence_drv.sync_seq;
>>>>>         emit_fence(fence);
>>>>>
>>>>>         /* We can't wait forever as the HW might be gone at any point */
>>>>>         dma_fence_wait_timeout(old_fence, 5S);
>>>>>
>>>>
>>>> You can pretty much ignore this wait here. It is only as a last 
>>>> resort so that we never overwrite the ring buffers.
>>>
>>>
>>> If device is present how can I ignore this ?
>>>
>
> I think you missed my question here
>

Sorry, I thought I answered that below.

See, this is just the last resort so that we don't need to worry about 
ring buffer overflows during testing.

We should not get here in practice, and if we do, generating a deadlock 
might actually be the best handling.

The alternative would be to call BUG().

>>>
>>>>
>>>> But it should not have a timeout as far as I can see.
>>>
>>>
>>> Without timeout wait the who approach falls apart as I can't call 
>>> srcu_synchronize on this scope because once device is physically 
>>> gone the wait here will be forever
>>>
>>
>> Yeah, but this is intentional. The only alternative to avoid 
>> corruption is to wait with a timeout and call BUG() if that triggers. 
>> That isn't much better.
>>
>>>
>>>>
>>>>>         ring->fence_drv.fences[seq & 
>>>>> ring->fence_drv.num_fences_mask] = fence;
>>>>>
>>>>>     } else {
>>>>>
>>>>>         dma_fence_set_error(fence, -ENODEV);
>>>>>         DMA_fence_signal(fence)
>>>>>
>>>>>     }
>>>>>
>>>>>     srcu_read_unlock(amdgpu_unplug_srcu)
>>>>>     return fence;
>>>>>
>>>>> }
>>>>>
>>>>> amdgpu_pci_remove
>>>>>
>>>>> {
>>>>>
>>>>>     adev->unplug = true;
>>>>>     synchronize_srcu(amdgpu_unplug_srcu)
>>>>>
>>>>
>>>> Well that is just duplicating what drm_dev_unplug() should be doing 
>>>> on a different level.
>>>
>>>
>>> drm_dev_unplug is on a much wider scope, for everything in the 
>>> device including 'flushing' in flight IOCTLs, this deals 
>>> specifically with the issue of force signalling HW fences
>>>
>>
>> Yeah, but it adds the same overhead as the device srcu.
>>
>> Christian.
>
>
> So what's the right approach ? How we guarantee that when running 
> amdgpu_fence_driver_force_completion we will signal all the HW fences 
> and not racing against some more fences insertion into that array ?
>

Well I would still say the best approach would be to insert this between 
the front end and the backend and not rely on signaling fences while 
holding the device srcu.

BTW: Could it be that the device SRCU protects more than one device and 
we deadlock because of this?

Christian.

> Andrey
>
>
>>
>>> Andrey
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>>     /* Past this point no more fence are submitted to HW ring and 
>>>>> hence we can safely call force signal on all that are currently 
>>>>> there.
>>>>>      * Any subsequently created  HW fences will be returned 
>>>>> signaled with an error code right away
>>>>>      */
>>>>>
>>>>>     for_each_ring(adev)
>>>>>         amdgpu_fence_process(ring)
>>>>>
>>>>>     drm_dev_unplug(dev);
>>>>>     Stop schedulers
>>>>>     cancel_sync(all timers and queued works);
>>>>>     hw_fini
>>>>>     unmap_mmio
>>>>>
>>>>> }
>>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Alternatively grabbing the reset write side and stopping and 
>>>>>>>>>> then restarting the scheduler could work as well.
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I didn't get the above and I don't see why I need to reuse the 
>>>>>>>>> GPU reset rw_lock. I rely on the SRCU unplug flag for unplug. 
>>>>>>>>> Also, not clear to me why are we focusing on the scheduler 
>>>>>>>>> threads, any code patch to generate HW fences should be 
>>>>>>>>> covered, so any code leading to amdgpu_fence_emit needs to be 
>>>>>>>>> taken into account such as, direct IB submissions, VM flushes 
>>>>>>>>> e.t.c
>>>>>>>>
>>>>>>>> You need to work together with the reset lock anyway, cause a 
>>>>>>>> hotplug could run at the same time as a reset.
>>>>>>>
>>>>>>>
>>>>>>> For going my way indeed now I see now that I have to take reset 
>>>>>>> write side lock during HW fences signalling in order to protect 
>>>>>>> against scheduler/HW fences detachment and reattachment during 
>>>>>>> schedulers stop/restart. But if we go with your approach  then 
>>>>>>> calling drm_dev_unplug and scoping amdgpu_job_timeout with 
>>>>>>> drm_dev_enter/exit should be enough to prevent any concurrent 
>>>>>>> GPU resets during unplug. In fact I already do it anyway - 
>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1
>>>>>>
>>>>>> Yes, good point as well.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>


[-- Attachment #1.2: Type: text/html, Size: 16567 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-12 18:23                                                     ` Christian König
@ 2021-04-12 19:12                                                       ` Andrey Grodzovsky
  2021-04-12 19:18                                                         ` Christian König
  2021-04-13  5:36                                                       ` Andrey Grodzovsky
  1 sibling, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-12 19:12 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


[-- Attachment #1.1: Type: text/plain, Size: 9490 bytes --]


On 2021-04-12 2:23 p.m., Christian König wrote:
> Am 12.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>
>> On 2021-04-12 2:05 p.m., Christian König wrote:
>>
>>> Am 12.04.21 um 20:01 schrieb Andrey Grodzovsky:
>>>>
>>>> On 2021-04-12 1:44 p.m., Christian König wrote:
>>>>
>>>>>
>>>>> Am 12.04.21 um 19:27 schrieb Andrey Grodzovsky:
>>>>>> On 2021-04-10 1:34 p.m., Christian König wrote:
>>>>>>> Hi Andrey,
>>>>>>>
>>>>>>> Am 09.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>>>>>>> [SNIP]
>>>>>>>>>>
>>>>>>>>>> If we use a list and a flag called 'emit_allowed' under a 
>>>>>>>>>> lock such that in amdgpu_fence_emit we lock the list, check 
>>>>>>>>>> the flag and if true add the new HW fence to list and proceed 
>>>>>>>>>> to HW emition as normal, otherwise return with -ENODEV. In 
>>>>>>>>>> amdgpu_pci_remove we take the lock, set the flag to false, 
>>>>>>>>>> and then iterate the list and force signal it. Will this not 
>>>>>>>>>> prevent any new HW fence creation from now on from any place 
>>>>>>>>>> trying to do so ?
>>>>>>>>>
>>>>>>>>> Way to much overhead. The fence processing is intentionally 
>>>>>>>>> lock free to avoid cache line bouncing because the IRQ can 
>>>>>>>>> move from CPU to CPU.
>>>>>>>>>
>>>>>>>>> We need something which at least the processing of fences in 
>>>>>>>>> the interrupt handler doesn't affect at all.
>>>>>>>>
>>>>>>>>
>>>>>>>> As far as I see in the code, amdgpu_fence_emit is only called 
>>>>>>>> from task context. Also, we can skip this list I proposed and 
>>>>>>>> just use amdgpu_fence_driver_force_completion for each ring to 
>>>>>>>> signal all created HW fences.
>>>>>>>
>>>>>>> Ah, wait a second this gave me another idea.
>>>>>>>
>>>>>>> See amdgpu_fence_driver_force_completion():
>>>>>>>
>>>>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
>>>>>>>
>>>>>>> If we change that to something like:
>>>>>>>
>>>>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);
>>>>>>>
>>>>>>> Not only the currently submitted, but also the next 0x3FFFFFFF 
>>>>>>> fences will be considered signaled.
>>>>>>>
>>>>>>> This basically solves out problem of making sure that new fences 
>>>>>>> are also signaled without any additional overhead whatsoever.
>>>>>>
>>>>>>
>>>>>> Problem with this is that the act of setting the sync_seq to some 
>>>>>> MAX value alone is not enough, you actually have to call 
>>>>>> amdgpu_fence_process to iterate and signal the fences currently 
>>>>>> stored in ring->fence_drv.fences array and to guarantee that once 
>>>>>> you done your signalling no more HW fences will be added to that 
>>>>>> array anymore. I was thinking to do something like bellow:
>>>>>>
>>>>>
>>>>> Well we could implement the is_signaled callback once more, but 
>>>>> I'm not sure if that is a good idea.
>>>>
>>>>
>>>> This indeed could save the explicit signaling I am doing bellow but 
>>>> I also set an error code there which might be helpful to propagate 
>>>> to users
>>>>
>>>>
>>>>>
>>>>>> amdgpu_fence_emit()
>>>>>>
>>>>>> {
>>>>>>
>>>>>>     dma_fence_init(fence);
>>>>>>
>>>>>>     srcu_read_lock(amdgpu_unplug_srcu)
>>>>>>
>>>>>>     if (!adev->unplug)) {
>>>>>>
>>>>>>         seq = ++ring->fence_drv.sync_seq;
>>>>>>         emit_fence(fence);
>>>>>>
>>>>>>         /* We can't wait forever as the HW might be gone at any point */
>>>>>>         dma_fence_wait_timeout(old_fence, 5S);
>>>>>>
>>>>>
>>>>> You can pretty much ignore this wait here. It is only as a last 
>>>>> resort so that we never overwrite the ring buffers.
>>>>
>>>>
>>>> If device is present how can I ignore this ?
>>>>
>>
>> I think you missed my question here
>>
>
> Sorry I thought I answered that below.
>
> See this is just the last resort so that we don't need to worry about 
> ring buffer overflows during testing.
>
> We should not get here in practice and if we get here generating a 
> deadlock might actually be the best handling.
>
> The alternative would be to call BUG().
>
>>>>
>>>>>
>>>>> But it should not have a timeout as far as I can see.
>>>>
>>>>
>>>> Without timeout wait the who approach falls apart as I can't call 
>>>> srcu_synchronize on this scope because once device is physically 
>>>> gone the wait here will be forever
>>>>
>>>
>>> Yeah, but this is intentional. The only alternative to avoid 
>>> corruption is to wait with a timeout and call BUG() if that 
>>> triggers. That isn't much better.
>>>
>>>>
>>>>>
>>>>>>         ring->fence_drv.fences[seq & 
>>>>>> ring->fence_drv.num_fences_mask] = fence;
>>>>>>
>>>>>>     } else {
>>>>>>
>>>>>>         dma_fence_set_error(fence, -ENODEV);
>>>>>>         DMA_fence_signal(fence)
>>>>>>
>>>>>>     }
>>>>>>
>>>>>>     srcu_read_unlock(amdgpu_unplug_srcu)
>>>>>>     return fence;
>>>>>>
>>>>>> }
>>>>>>
>>>>>> amdgpu_pci_remove
>>>>>>
>>>>>> {
>>>>>>
>>>>>>     adev->unplug = true;
>>>>>>     synchronize_srcu(amdgpu_unplug_srcu)
>>>>>>
>>>>>
>>>>> Well that is just duplicating what drm_dev_unplug() should be 
>>>>> doing on a different level.
>>>>
>>>>
>>>> drm_dev_unplug is on a much wider scope, for everything in the 
>>>> device including 'flushing' in flight IOCTLs, this deals 
>>>> specifically with the issue of force signalling HW fences
>>>>
>>>
>>> Yeah, but it adds the same overhead as the device srcu.
>>>
>>> Christian.
>>
>>
>> So what's the right approach ? How we guarantee that when running 
>> amdgpu_fence_driver_force_completion we will signal all the HW fences 
>> and not racing against some more fences insertion into that array ?
>>
>
> Well I would still say the best approach would be to insert this 
> between the front end and the backend and not rely on signaling fences 
> while holding the device srcu.


My question is: even now, when we run 
amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, what 
prevents a race with another fence being emitted and inserted into the fence 
array at the same time? Looks like nothing.
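
For reference, the force-completion helper being discussed is essentially the 
following pair of calls (reconstructed from how it is quoted earlier in this 
thread; details may differ in the tree), which is exactly why the ordering 
question above matters:

void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring)
{
	/* bump the fence memory to the last emitted seqno ... */
	amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
	/* ... then walk fence_drv.fences and signal everything up to it */
	amdgpu_fence_process(ring);
}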


>
> BTW: Could it be that the device SRCU protects more than one device 
> and we deadlock because of this?


I haven't actually experienced any deadlock until now, but yes, 
drm_unplug_srcu is defined as static in drm_drv.c, so in the presence of 
multiple devices from the same or different drivers we are in fact dependent 
on all their critical sections, I guess.

Andrey


>
> Christian.
>
>> Andrey
>>
>>
>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>> Christian.
>>>>>
>>>>>>     /* Past this point no more fence are submitted to HW ring and 
>>>>>> hence we can safely call force signal on all that are currently 
>>>>>> there.
>>>>>>      * Any subsequently created  HW fences will be returned 
>>>>>> signaled with an error code right away
>>>>>>      */
>>>>>>
>>>>>>     for_each_ring(adev)
>>>>>>         amdgpu_fence_process(ring)
>>>>>>
>>>>>>     drm_dev_unplug(dev);
>>>>>>     Stop schedulers
>>>>>>     cancel_sync(all timers and queued works);
>>>>>>     hw_fini
>>>>>>     unmap_mmio
>>>>>>
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Alternatively grabbing the reset write side and stopping and 
>>>>>>>>>>> then restarting the scheduler could work as well.
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I didn't get the above and I don't see why I need to reuse 
>>>>>>>>>> the GPU reset rw_lock. I rely on the SRCU unplug flag for 
>>>>>>>>>> unplug. Also, not clear to me why are we focusing on the 
>>>>>>>>>> scheduler threads, any code patch to generate HW fences 
>>>>>>>>>> should be covered, so any code leading to amdgpu_fence_emit 
>>>>>>>>>> needs to be taken into account such as, direct IB 
>>>>>>>>>> submissions, VM flushes e.t.c
>>>>>>>>>
>>>>>>>>> You need to work together with the reset lock anyway, cause a 
>>>>>>>>> hotplug could run at the same time as a reset.
>>>>>>>>
>>>>>>>>
>>>>>>>> For going my way indeed now I see now that I have to take reset 
>>>>>>>> write side lock during HW fences signalling in order to protect 
>>>>>>>> against scheduler/HW fences detachment and reattachment during 
>>>>>>>> schedulers stop/restart. But if we go with your approach  then 
>>>>>>>> calling drm_dev_unplug and scoping amdgpu_job_timeout with 
>>>>>>>> drm_dev_enter/exit should be enough to prevent any concurrent 
>>>>>>>> GPU resets during unplug. In fact I already do it anyway - 
>>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1
>>>>>>>
>>>>>>> Yes, good point as well.
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>

[-- Attachment #1.2: Type: text/html, Size: 18466 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-12 19:12                                                       ` Andrey Grodzovsky
@ 2021-04-12 19:18                                                         ` Christian König
  2021-04-12 20:01                                                           ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-12 19:18 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking

Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
> [SNIP]
>>>
>>> So what's the right approach ? How we guarantee that when running 
>>> amdgpu_fence_driver_force_completion we will signal all the HW 
>>> fences and not racing against some more fences insertion into that 
>>> array ?
>>>
>>
>> Well I would still say the best approach would be to insert this 
>> between the front end and the backend and not rely on signaling 
>> fences while holding the device srcu.
>
>
> My question is, even now, when we run 
> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
> what there prevents a race with another fence being at the same time 
> emitted and inserted into the fence array ? Looks like nothing.
>

Each ring can only be used by one thread at the same time; this includes 
emitting fences as well as other stuff.

During GPU reset we make sure nobody writes to the rings by stopping the 
scheduler and taking the GPU reset lock (so that nobody else can start 
the scheduler again).
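
A hedged sketch of that sequence; the ring iteration is simplified, the helper 
name is illustrative, and reset_sem stands in for the GPU reset lock mentioned 
above:

static void quiesce_rings_for_reset(struct amdgpu_device *adev)
{
	int i;

	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (!ring || !ring->sched.thread)
			continue;

		/* no new jobs reach the ring once its scheduler is stopped */
		drm_sched_stop(&ring->sched, NULL);
	}

	/* waits for all read-side holders (other HW-touching threads) and
	 * keeps them out until the recovery path drops the write lock again */
	down_write(&adev->reset_sem);
}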

>>
>> BTW: Could it be that the device SRCU protects more than one device 
>> and we deadlock because of this?
>
>
> I haven't actually experienced any deadlock until now but, yes, 
> drm_unplug_srcu is defined as static in drm_drv.c and so in the 
> presence  of multiple devices from same or different drivers we in 
> fact are dependent on all their critical sections i guess.
>

Shit, yeah the devil is a squirrel. So for A+I laptops we actually need 
to sync that up with Daniel and the rest of the i915 guys.

IIRC we could actually have an amdgpu device in a docking station which 
needs hotplug and the driver might depend on waiting for the i915 driver 
as well.

Christian.

> Andrey
>
>
>>
>> Christian.
>>
>>> Andrey
>>>
>>>
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>     /* Past this point no more fence are submitted to HW ring 
>>>>>>> and hence we can safely call force signal on all that are 
>>>>>>> currently there.
>>>>>>>      * Any subsequently created  HW fences will be returned 
>>>>>>> signaled with an error code right away
>>>>>>>      */
>>>>>>>
>>>>>>>     for_each_ring(adev)
>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>
>>>>>>>     drm_dev_unplug(dev);
>>>>>>>     Stop schedulers
>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>     hw_fini
>>>>>>>     unmap_mmio
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Alternatively grabbing the reset write side and stopping 
>>>>>>>>>>>> and then restarting the scheduler could work as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I didn't get the above and I don't see why I need to reuse 
>>>>>>>>>>> the GPU reset rw_lock. I rely on the SRCU unplug flag for 
>>>>>>>>>>> unplug. Also, not clear to me why are we focusing on the 
>>>>>>>>>>> scheduler threads, any code patch to generate HW fences 
>>>>>>>>>>> should be covered, so any code leading to amdgpu_fence_emit 
>>>>>>>>>>> needs to be taken into account such as, direct IB 
>>>>>>>>>>> submissions, VM flushes e.t.c
>>>>>>>>>>
>>>>>>>>>> You need to work together with the reset lock anyway, cause a 
>>>>>>>>>> hotplug could run at the same time as a reset.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For going my way indeed now I see now that I have to take 
>>>>>>>>> reset write side lock during HW fences signalling in order to 
>>>>>>>>> protect against scheduler/HW fences detachment and 
>>>>>>>>> reattachment during schedulers stop/restart. But if we go with 
>>>>>>>>> your approach  then calling drm_dev_unplug and scoping 
>>>>>>>>> amdgpu_job_timeout with drm_dev_enter/exit should be enough to 
>>>>>>>>> prevent any concurrent GPU resets during unplug. In fact I 
>>>>>>>>> already do it anyway - 
>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Ceefa9c90ed8c405ec3b708d8fc46daaa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637536728550884740%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=UiNaJE%2BH45iYmbwSDnMSKZS5z0iak0fNlbbfYqKS2Jo%3D&amp;reserved=0
>>>>>>>>
>>>>>>>> Yes, good point as well.
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-12 19:18                                                         ` Christian König
@ 2021-04-12 20:01                                                           ` Andrey Grodzovsky
  2021-04-13  7:10                                                             ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-12 20:01 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


On 2021-04-12 3:18 p.m., Christian König wrote:
> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
>> [SNIP]
>>>>
>>>> So what's the right approach ? How we guarantee that when running 
>>>> amdgpu_fence_driver_force_completion we will signal all the HW 
>>>> fences and not racing against some more fences insertion into that 
>>>> array ?
>>>>
>>>
>>> Well I would still say the best approach would be to insert this 
>>> between the front end and the backend and not rely on signaling 
>>> fences while holding the device srcu.
>>
>>
>> My question is, even now, when we run 
>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
>> what there prevents a race with another fence being at the same time 
>> emitted and inserted into the fence array ? Looks like nothing.
>>
>
> Each ring can only be used by one thread at the same time, this 
> includes emitting fences as well as other stuff.
>
> During GPU reset we make sure nobody writes to the rings by stopping 
> the scheduler and taking the GPU reset lock (so that nobody else can 
> start the scheduler again).


What about direct submissions that don't go through the scheduler, like 
amdgpu_job_submit_direct? I don't see how those are protected.


>
>>>
>>> BTW: Could it be that the device SRCU protects more than one device 
>>> and we deadlock because of this?
>>
>>
>> I haven't actually experienced any deadlock until now but, yes, 
>> drm_unplug_srcu is defined as static in drm_drv.c and so in the 
>> presence  of multiple devices from same or different drivers we in 
>> fact are dependent on all their critical sections i guess.
>>
>
> Shit, yeah the devil is a squirrel. So for A+I laptops we actually 
> need to sync that up with Daniel and the rest of the i915 guys.
>
> IIRC we could actually have an amdgpu device in a docking station 
> which needs hotplug and the driver might depend on waiting for the 
> i915 driver as well.


Can't we propose a patch to make drm_unplug_srcu per drm_device? I 
don't see why it has to be global rather than a per-device thing.

Andrey


>
> Christian.
>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>>     /* Past this point no more fence are submitted to HW ring 
>>>>>>>> and hence we can safely call force signal on all that are 
>>>>>>>> currently there.
>>>>>>>>      * Any subsequently created  HW fences will be returned 
>>>>>>>> signaled with an error code right away
>>>>>>>>      */
>>>>>>>>
>>>>>>>>     for_each_ring(adev)
>>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>>
>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>     Stop schedulers
>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>     hw_fini
>>>>>>>>     unmap_mmio
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alternatively grabbing the reset write side and stopping 
>>>>>>>>>>>>> and then restarting the scheduler could work as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I didn't get the above and I don't see why I need to reuse 
>>>>>>>>>>>> the GPU reset rw_lock. I rely on the SRCU unplug flag for 
>>>>>>>>>>>> unplug. Also, not clear to me why are we focusing on the 
>>>>>>>>>>>> scheduler threads, any code patch to generate HW fences 
>>>>>>>>>>>> should be covered, so any code leading to amdgpu_fence_emit 
>>>>>>>>>>>> needs to be taken into account such as, direct IB 
>>>>>>>>>>>> submissions, VM flushes e.t.c
>>>>>>>>>>>
>>>>>>>>>>> You need to work together with the reset lock anyway, cause 
>>>>>>>>>>> a hotplug could run at the same time as a reset.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For going my way indeed now I see now that I have to take 
>>>>>>>>>> reset write side lock during HW fences signalling in order to 
>>>>>>>>>> protect against scheduler/HW fences detachment and 
>>>>>>>>>> reattachment during schedulers stop/restart. But if we go 
>>>>>>>>>> with your approach  then calling drm_dev_unplug and scoping 
>>>>>>>>>> amdgpu_job_timeout with drm_dev_enter/exit should be enough 
>>>>>>>>>> to prevent any concurrent GPU resets during unplug. In fact I 
>>>>>>>>>> already do it anyway - 
>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Ceefa9c90ed8c405ec3b708d8fc46daaa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637536728550884740%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=UiNaJE%2BH45iYmbwSDnMSKZS5z0iak0fNlbbfYqKS2Jo%3D&amp;reserved=0
>>>>>>>>>
>>>>>>>>> Yes, good point as well.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-12 18:23                                                     ` Christian König
  2021-04-12 19:12                                                       ` Andrey Grodzovsky
@ 2021-04-13  5:36                                                       ` Andrey Grodzovsky
  2021-04-13  7:07                                                         ` Christian König
  1 sibling, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-13  5:36 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


[-- Attachment #1.1: Type: text/plain, Size: 9273 bytes --]


On 2021-04-12 2:23 p.m., Christian König wrote:
> Am 12.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>
>> On 2021-04-12 2:05 p.m., Christian König wrote:
>>
>>> Am 12.04.21 um 20:01 schrieb Andrey Grodzovsky:
>>>>
>>>> On 2021-04-12 1:44 p.m., Christian König wrote:
>>>>
>>>>>
>>>>> Am 12.04.21 um 19:27 schrieb Andrey Grodzovsky:
>>>>>> On 2021-04-10 1:34 p.m., Christian König wrote:
>>>>>>> Hi Andrey,
>>>>>>>
>>>>>>> Am 09.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>>>>>>> [SNIP]
>>>>>>>>>>
>>>>>>>>>> If we use a list and a flag called 'emit_allowed' under a 
>>>>>>>>>> lock such that in amdgpu_fence_emit we lock the list, check 
>>>>>>>>>> the flag and if true add the new HW fence to list and proceed 
>>>>>>>>>> to HW emition as normal, otherwise return with -ENODEV. In 
>>>>>>>>>> amdgpu_pci_remove we take the lock, set the flag to false, 
>>>>>>>>>> and then iterate the list and force signal it. Will this not 
>>>>>>>>>> prevent any new HW fence creation from now on from any place 
>>>>>>>>>> trying to do so ?
>>>>>>>>>
>>>>>>>>> Way to much overhead. The fence processing is intentionally 
>>>>>>>>> lock free to avoid cache line bouncing because the IRQ can 
>>>>>>>>> move from CPU to CPU.
>>>>>>>>>
>>>>>>>>> We need something which at least the processing of fences in 
>>>>>>>>> the interrupt handler doesn't affect at all.
>>>>>>>>
>>>>>>>>
>>>>>>>> As far as I see in the code, amdgpu_fence_emit is only called 
>>>>>>>> from task context. Also, we can skip this list I proposed and 
>>>>>>>> just use amdgpu_fence_driver_force_completion for each ring to 
>>>>>>>> signal all created HW fences.
>>>>>>>
>>>>>>> Ah, wait a second this gave me another idea.
>>>>>>>
>>>>>>> See amdgpu_fence_driver_force_completion():
>>>>>>>
>>>>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq);
>>>>>>>
>>>>>>> If we change that to something like:
>>>>>>>
>>>>>>> amdgpu_fence_write(ring, ring->fence_drv.sync_seq + 0x3FFFFFFF);
>>>>>>>
>>>>>>> Not only the currently submitted, but also the next 0x3FFFFFFF 
>>>>>>> fences will be considered signaled.
>>>>>>>
>>>>>>> This basically solves out problem of making sure that new fences 
>>>>>>> are also signaled without any additional overhead whatsoever.
>>>>>>
>>>>>>
>>>>>> Problem with this is that the act of setting the sync_seq to some 
>>>>>> MAX value alone is not enough, you actually have to call 
>>>>>> amdgpu_fence_process to iterate and signal the fences currently 
>>>>>> stored in ring->fence_drv.fences array and to guarantee that once 
>>>>>> you done your signalling no more HW fences will be added to that 
>>>>>> array anymore. I was thinking to do something like bellow:
>>>>>>
>>>>>
>>>>> Well we could implement the is_signaled callback once more, but 
>>>>> I'm not sure if that is a good idea.
>>>>
>>>>
>>>> This indeed could save the explicit signaling I am doing bellow but 
>>>> I also set an error code there which might be helpful to propagate 
>>>> to users
>>>>
>>>>
>>>>>
>>>>>> amdgpu_fence_emit()
>>>>>>
>>>>>> {
>>>>>>
>>>>>>     dma_fence_init(fence);
>>>>>>
>>>>>>     srcu_read_lock(amdgpu_unplug_srcu)
>>>>>>
>>>>>>     if (!adev->unplug)) {
>>>>>>
>>>>>>         seq = ++ring->fence_drv.sync_seq;
>>>>>>         emit_fence(fence);
>>>>>>
>>>>>> */* We can't wait forever as the HW might be gone at any point*/**
>>>>>>        dma_fence_wait_timeout(old_fence, 5S);*
>>>>>>
>>>>>
>>>>> You can pretty much ignore this wait here. It is only as a last 
>>>>> resort so that we never overwrite the ring buffers.
>>>>
>>>>
>>>> If device is present how can I ignore this ?
>>>>
>>
>> I think you missed my question here
>>
>
> Sorry I thought I answered that below.
>
> See this is just the last resort so that we don't need to worry about 
> ring buffer overflows during testing.
>
> We should not get here in practice and if we get here generating a 
> deadlock might actually be the best handling.
>
> The alternative would be to call BUG().


BTW, I am not sure it's so improbable to get here in case of a sudden 
device remove: if rapid command submission to the ring is going on at 
that time, you could easily hit a ring buffer overrun, because EOP 
interrupts are gone and fences are no longer removed, while new ones 
keep arriving from submissions which haven't stopped yet.

Andrey


>
>>>>
>>>>>
>>>>> But it should not have a timeout as far as I can see.
>>>>
>>>>
>>>> Without timeout wait the who approach falls apart as I can't call 
>>>> srcu_synchronize on this scope because once device is physically 
>>>> gone the wait here will be forever
>>>>
>>>
>>> Yeah, but this is intentional. The only alternative to avoid 
>>> corruption is to wait with a timeout and call BUG() if that 
>>> triggers. That isn't much better.
>>>
>>>>
>>>>>
>>>>>>         ring->fence_drv.fences[seq & 
>>>>>> ring->fence_drv.num_fences_mask] = fence;
>>>>>>
>>>>>>     } else {
>>>>>>
>>>>>>         dma_fence_set_error(fence, -ENODEV);
>>>>>>         DMA_fence_signal(fence)
>>>>>>
>>>>>>     }
>>>>>>
>>>>>>     srcu_read_unlock(amdgpu_unplug_srcu)
>>>>>>     return fence;
>>>>>>
>>>>>> }
>>>>>>
>>>>>> amdgpu_pci_remove
>>>>>>
>>>>>> {
>>>>>>
>>>>>>     adev->unplug = true;
>>>>>>     synchronize_srcu(amdgpu_unplug_srcu)
>>>>>>
>>>>>
>>>>> Well that is just duplicating what drm_dev_unplug() should be 
>>>>> doing on a different level.
>>>>
>>>>
>>>> drm_dev_unplug is on a much wider scope, for everything in the 
>>>> device including 'flushing' in flight IOCTLs, this deals 
>>>> specifically with the issue of force signalling HW fences
>>>>
>>>
>>> Yeah, but it adds the same overhead as the device srcu.
>>>
>>> Christian.
>>
>>
>> So what's the right approach ? How we guarantee that when running 
>> amdgpu_fence_driver_force_completion we will signal all the HW fences 
>> and not racing against some more fences insertion into that array ?
>>
>
> Well I would still say the best approach would be to insert this 
> between the front end and the backend and not rely on signaling fences 
> while holding the device srcu.
>
> BTW: Could it be that the device SRCU protects more than one device 
> and we deadlock because of this?
>
> Christian.
>
>> Andrey
>>
>>
>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>> Christian.
>>>>>
>>>>>>     /* Past this point no more fence are submitted to HW ring and 
>>>>>> hence we can safely call force signal on all that are currently 
>>>>>> there.
>>>>>>      * Any subsequently created  HW fences will be returned 
>>>>>> signaled with an error code right away
>>>>>>      */
>>>>>>
>>>>>>     for_each_ring(adev)
>>>>>>         amdgpu_fence_process(ring)
>>>>>>
>>>>>>     drm_dev_unplug(dev);
>>>>>>     Stop schedulers
>>>>>>     cancel_sync(all timers and queued works);
>>>>>>     hw_fini
>>>>>>     unmap_mmio
>>>>>>
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Alternatively grabbing the reset write side and stopping and 
>>>>>>>>>>> then restarting the scheduler could work as well.
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I didn't get the above and I don't see why I need to reuse 
>>>>>>>>>> the GPU reset rw_lock. I rely on the SRCU unplug flag for 
>>>>>>>>>> unplug. Also, not clear to me why are we focusing on the 
>>>>>>>>>> scheduler threads, any code patch to generate HW fences 
>>>>>>>>>> should be covered, so any code leading to amdgpu_fence_emit 
>>>>>>>>>> needs to be taken into account such as, direct IB 
>>>>>>>>>> submissions, VM flushes e.t.c
>>>>>>>>>
>>>>>>>>> You need to work together with the reset lock anyway, cause a 
>>>>>>>>> hotplug could run at the same time as a reset.
>>>>>>>>
>>>>>>>>
>>>>>>>> For going my way indeed now I see now that I have to take reset 
>>>>>>>> write side lock during HW fences signalling in order to protect 
>>>>>>>> against scheduler/HW fences detachment and reattachment during 
>>>>>>>> schedulers stop/restart. But if we go with your approach  then 
>>>>>>>> calling drm_dev_unplug and scoping amdgpu_job_timeout with 
>>>>>>>> drm_dev_enter/exit should be enough to prevent any concurrent 
>>>>>>>> GPU resets during unplug. In fact I already do it anyway - 
>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Ceefa9c90ed8c405ec3b708d8fc46daaa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637536728550884740%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=UiNaJE%2BH45iYmbwSDnMSKZS5z0iak0fNlbbfYqKS2Jo%3D&amp;reserved=0
>>>>>>>
>>>>>>> Yes, good point as well.
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>

[-- Attachment #1.2: Type: text/html, Size: 18081 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-13  5:36                                                       ` Andrey Grodzovsky
@ 2021-04-13  7:07                                                         ` Christian König
  0 siblings, 0 replies; 56+ messages in thread
From: Christian König @ 2021-04-13  7:07 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking


[-- Attachment #1.1: Type: text/plain, Size: 1586 bytes --]

Am 13.04.21 um 07:36 schrieb Andrey Grodzovsky:
> [SNIP]

> emit_fence(fence);
>>>>>>>
>>>>>>> */* We can't wait forever as the HW might be gone at any point*/**
>>>>>>>        dma_fence_wait_timeout(old_fence, 5S);*
>>>>>>>
>>>>>>
>>>>>> You can pretty much ignore this wait here. It is only as a last 
>>>>>> resort so that we never overwrite the ring buffers.
>>>>>
>>>>>
>>>>> If device is present how can I ignore this ?
>>>>>
>>>
>>> I think you missed my question here
>>>
>>
>> Sorry I thought I answered that below.
>>
>> See this is just the last resort so that we don't need to worry about 
>> ring buffer overflows during testing.
>>
>> We should not get here in practice and if we get here generating a 
>> deadlock might actually be the best handling.
>>
>> The alternative would be to call BUG().
>
>
> BTW, I am not sure it's so improbable to get here in case of sudden 
> device remove, if you are during rapid commands submission to the ring 
> during this time  you could easily get to ring buffer overrun because 
> EOP interrupts are gone and fences are not removed anymore but new 
> ones keep arriving from new submissions which don't stop yet.
>

During normal operation hardware fences are only created by two code paths:
1. The scheduler when it pushes jobs to the hardware.
2. The KIQ when it does register access on SRIOV.

Both are limited in how many submissions can be made.

The only case where this becomes necessary is during GPU reset, when we 
do direct submissions that bypass the scheduler for IB and other tests.

Christian.

> Andrey
>


[-- Attachment #1.2: Type: text/html, Size: 3310 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-12 20:01                                                           ` Andrey Grodzovsky
@ 2021-04-13  7:10                                                             ` Christian König
  2021-04-13  9:13                                                               ` Li, Dennis
                                                                                 ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: Christian König @ 2021-04-13  7:10 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter

Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
>
> On 2021-04-12 3:18 p.m., Christian König wrote:
>> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
>>> [SNIP]
>>>>>
>>>>> So what's the right approach ? How we guarantee that when running 
>>>>> amdgpu_fence_driver_force_completion we will signal all the HW 
>>>>> fences and not racing against some more fences insertion into that 
>>>>> array ?
>>>>>
>>>>
>>>> Well I would still say the best approach would be to insert this 
>>>> between the front end and the backend and not rely on signaling 
>>>> fences while holding the device srcu.
>>>
>>>
>>> My question is, even now, when we run 
>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
>>> what there prevents a race with another fence being at the same time 
>>> emitted and inserted into the fence array ? Looks like nothing.
>>>
>>
>> Each ring can only be used by one thread at the same time, this 
>> includes emitting fences as well as other stuff.
>>
>> During GPU reset we make sure nobody writes to the rings by stopping 
>> the scheduler and taking the GPU reset lock (so that nobody else can 
>> start the scheduler again).
>
>
> What about direct submissions not through scheduler - 
> amdgpu_job_submit_direct, I don't see how this is protected.

Those only happen during startup and GPU reset.

>>
>>>>
>>>> BTW: Could it be that the device SRCU protects more than one device 
>>>> and we deadlock because of this?
>>>
>>>
>>> I haven't actually experienced any deadlock until now but, yes, 
>>> drm_unplug_srcu is defined as static in drm_drv.c and so in the 
>>> presence  of multiple devices from same or different drivers we in 
>>> fact are dependent on all their critical sections i guess.
>>>
>>
>> Shit, yeah the devil is a squirrel. So for A+I laptops we actually 
>> need to sync that up with Daniel and the rest of the i915 guys.
>>
>> IIRC we could actually have an amdgpu device in a docking station 
>> which needs hotplug and the driver might depend on waiting for the 
>> i915 driver as well.
>
>
> Can't we propose a patch to make drm_unplug_srcu per drm_device ? I 
> don't see why it has to be global and not per device thing.

I've really been wondering the same thing for quite a while now.

Adding Daniel as well; maybe he knows why drm_unplug_srcu is global.
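
Conceptually a per-device variant would look something like the sketch 
below. Purely illustrative: it assumes a hypothetical unplug_srcu member 
in struct drm_device, while upstream the SRCU is the file-static 
drm_unplug_srcu in drm_drv.c and is shared by all DRM devices.

#include <linux/srcu.h>
#include <drm/drm_drv.h>

/* Purely illustrative sketch of a per-device variant. It assumes a
 * hypothetical "struct srcu_struct unplug_srcu" member in struct
 * drm_device, which does not exist today. */
bool drm_dev_enter(struct drm_device *dev, int *idx)
{
	*idx = srcu_read_lock(&dev->unplug_srcu);

	if (dev->unplugged) {
		srcu_read_unlock(&dev->unplug_srcu, *idx);
		return false;
	}

	return true;
}

void drm_dev_unplug(struct drm_device *dev)
{
	dev->unplugged = true;
	/* Only waits for critical sections of *this* device instead of
	 * every DRM device in the system. */
	synchronize_srcu(&dev->unplug_srcu);

	drm_dev_unregister(dev);
}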

Regards,
Christian.

>
> Andrey
>
>
>>
>> Christian.
>>
>>> Andrey
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>     /* Past this point no more fence are submitted to HW ring 
>>>>>>>>> and hence we can safely call force signal on all that are 
>>>>>>>>> currently there.
>>>>>>>>>      * Any subsequently created  HW fences will be returned 
>>>>>>>>> signaled with an error code right away
>>>>>>>>>      */
>>>>>>>>>
>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>>>
>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>     Stop schedulers
>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>     hw_fini
>>>>>>>>>     unmap_mmio
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alternatively grabbing the reset write side and stopping 
>>>>>>>>>>>>>> and then restarting the scheduler could work as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I didn't get the above and I don't see why I need to reuse 
>>>>>>>>>>>>> the GPU reset rw_lock. I rely on the SRCU unplug flag for 
>>>>>>>>>>>>> unplug. Also, not clear to me why are we focusing on the 
>>>>>>>>>>>>> scheduler threads, any code patch to generate HW fences 
>>>>>>>>>>>>> should be covered, so any code leading to 
>>>>>>>>>>>>> amdgpu_fence_emit needs to be taken into account such as, 
>>>>>>>>>>>>> direct IB submissions, VM flushes e.t.c
>>>>>>>>>>>>
>>>>>>>>>>>> You need to work together with the reset lock anyway, cause 
>>>>>>>>>>>> a hotplug could run at the same time as a reset.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> For going my way indeed now I see now that I have to take 
>>>>>>>>>>> reset write side lock during HW fences signalling in order 
>>>>>>>>>>> to protect against scheduler/HW fences detachment and 
>>>>>>>>>>> reattachment during schedulers stop/restart. But if we go 
>>>>>>>>>>> with your approach  then calling drm_dev_unplug and scoping 
>>>>>>>>>>> amdgpu_job_timeout with drm_dev_enter/exit should be enough 
>>>>>>>>>>> to prevent any concurrent GPU resets during unplug. In fact 
>>>>>>>>>>> I already do it anyway - 
>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Ceefa9c90ed8c405ec3b708d8fc46daaa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637536728550884740%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=UiNaJE%2BH45iYmbwSDnMSKZS5z0iak0fNlbbfYqKS2Jo%3D&amp;reserved=0
>>>>>>>>>>
>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-13  7:10                                                             ` Christian König
@ 2021-04-13  9:13                                                               ` Li, Dennis
  2021-04-13  9:14                                                                 ` Christian König
  2021-04-13 20:08                                                                 ` Daniel Vetter
  2021-04-13 15:12                                                               ` Andrey Grodzovsky
  2021-04-13 20:07                                                               ` Daniel Vetter
  2 siblings, 2 replies; 56+ messages in thread
From: Li, Dennis @ 2021-04-13  9:13 UTC (permalink / raw)
  To: Christian König, Grodzovsky, Andrey, Koenig, Christian,
	amd-gfx, Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter

[AMD Official Use Only - Internal Distribution Only]

Hi, Christian and Andrey,
      Maybe we could implement the "wait" callback function of dma_fence_ops: when a GPU reset or unplug happens, make this callback return -ENODEV to notify the caller that the device is lost.

	 * Must return -ERESTARTSYS if the wait is intr = true and the wait was
	 * interrupted, and remaining jiffies if fence has signaled, or 0 if wait
	 * timed out. Can also return other error values on custom implementations,
	 * which should be treated as if the fence is signaled. For example a hardware
	 * lockup could be reported like that.
	 *
	 * This callback is optional.
	 */
	signed long (*wait)(struct dma_fence *fence,
			    bool intr, signed long timeout);
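
A rough sketch of how that could look (illustrative only; 
amdgpu_fence_to_adev() is a made-up helper, and a similar check for an 
ongoing GPU reset could be added the same way):

#include <linux/dma-fence.h>
#include <drm/drm_drv.h>
#include "amdgpu.h"

/* Illustrative: report the device as lost instead of blocking forever.
 * amdgpu_fence_to_adev() is a hypothetical helper to get from the fence
 * back to the device; this would be hooked up as the .wait callback of
 * amdgpu's dma_fence_ops. */
static signed long amdgpu_fence_wait_unplug_aware(struct dma_fence *fence,
						  bool intr,
						  signed long timeout)
{
	struct amdgpu_device *adev = amdgpu_fence_to_adev(fence);

	if (drm_dev_is_unplugged(adev_to_drm(adev)))
		return -ENODEV;

	return dma_fence_default_wait(fence, intr, timeout);
}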

Best Regards
Dennis Li
-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@gmail.com> 
Sent: Tuesday, April 13, 2021 3:10 PM
To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Li, Dennis <Dennis.Li@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>; Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability

Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
>
> On 2021-04-12 3:18 p.m., Christian König wrote:
>> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
>>> [SNIP]
>>>>>
>>>>> So what's the right approach ? How we guarantee that when running 
>>>>> amdgpu_fence_driver_force_completion we will signal all the HW 
>>>>> fences and not racing against some more fences insertion into that 
>>>>> array ?
>>>>>
>>>>
>>>> Well I would still say the best approach would be to insert this 
>>>> between the front end and the backend and not rely on signaling 
>>>> fences while holding the device srcu.
>>>
>>>
>>> My question is, even now, when we run 
>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion,
>>> what there prevents a race with another fence being at the same time 
>>> emitted and inserted into the fence array ? Looks like nothing.
>>>
>>
>> Each ring can only be used by one thread at the same time, this 
>> includes emitting fences as well as other stuff.
>>
>> During GPU reset we make sure nobody writes to the rings by stopping 
>> the scheduler and taking the GPU reset lock (so that nobody else can 
>> start the scheduler again).
>
>
> What about direct submissions not through scheduler - 
> amdgpu_job_submit_direct, I don't see how this is protected.

Those only happen during startup and GPU reset.

>>
>>>>
>>>> BTW: Could it be that the device SRCU protects more than one device 
>>>> and we deadlock because of this?
>>>
>>>
>>> I haven't actually experienced any deadlock until now but, yes, 
>>> drm_unplug_srcu is defined as static in drm_drv.c and so in the 
>>> presence  of multiple devices from same or different drivers we in 
>>> fact are dependent on all their critical sections i guess.
>>>
>>
>> Shit, yeah the devil is a squirrel. So for A+I laptops we actually 
>> need to sync that up with Daniel and the rest of the i915 guys.
>>
>> IIRC we could actually have an amdgpu device in a docking station 
>> which needs hotplug and the driver might depend on waiting for the
>> i915 driver as well.
>
>
> Can't we propose a patch to make drm_unplug_srcu per drm_device ? I 
> don't see why it has to be global and not per device thing.

I'm really wondering the same thing for quite a while now.

Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.

Regards,
Christian.

>
> Andrey
>
>
>>
>> Christian.
>>
>>> Andrey
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>     /* Past this point no more fence are submitted to HW ring 
>>>>>>>>> and hence we can safely call force signal on all that are 
>>>>>>>>> currently there.
>>>>>>>>>      * Any subsequently created  HW fences will be returned 
>>>>>>>>> signaled with an error code right away
>>>>>>>>>      */
>>>>>>>>>
>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>>>
>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>     Stop schedulers
>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>     hw_fini
>>>>>>>>>     unmap_mmio
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alternatively grabbing the reset write side and stopping 
>>>>>>>>>>>>>> and then restarting the scheduler could work as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I didn't get the above and I don't see why I need to reuse 
>>>>>>>>>>>>> the GPU reset rw_lock. I rely on the SRCU unplug flag for 
>>>>>>>>>>>>> unplug. Also, not clear to me why are we focusing on the 
>>>>>>>>>>>>> scheduler threads, any code patch to generate HW fences 
>>>>>>>>>>>>> should be covered, so any code leading to 
>>>>>>>>>>>>> amdgpu_fence_emit needs to be taken into account such as, 
>>>>>>>>>>>>> direct IB submissions, VM flushes e.t.c
>>>>>>>>>>>>
>>>>>>>>>>>> You need to work together with the reset lock anyway, cause 
>>>>>>>>>>>> a hotplug could run at the same time as a reset.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> For going my way indeed now I see now that I have to take 
>>>>>>>>>>> reset write side lock during HW fences signalling in order 
>>>>>>>>>>> to protect against scheduler/HW fences detachment and 
>>>>>>>>>>> reattachment during schedulers stop/restart. But if we go 
>>>>>>>>>>> with your approach  then calling drm_dev_unplug and scoping 
>>>>>>>>>>> amdgpu_job_timeout with drm_dev_enter/exit should be enough 
>>>>>>>>>>> to prevent any concurrent GPU resets during unplug. In fact 
>>>>>>>>>>> I already do it anyway -
>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2
>>>>>>>>>>> F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh
>>>>>>>>>>> %3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc
>>>>>>>>>>> 36b1&amp;data=04%7C01%7CDennis.Li%40amd.com%7Cc7fc6cb505c34a
>>>>>>>>>>> edfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C
>>>>>>>>>>> 0%7C637538946323194151%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wL
>>>>>>>>>>> jAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000
>>>>>>>>>>> &amp;sdata=%2Fe%2BqJNlcuUjLHsLvfHCKqerK%2Ff8lzujqOBhnMBIRP8E
>>>>>>>>>>> %3D&amp;reserved=0
>>>>>>>>>>
>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-13  9:13                                                               ` Li, Dennis
@ 2021-04-13  9:14                                                                 ` Christian König
  2021-04-13 20:08                                                                 ` Daniel Vetter
  1 sibling, 0 replies; 56+ messages in thread
From: Christian König @ 2021-04-13  9:14 UTC (permalink / raw)
  To: Li, Dennis, Christian König, Grodzovsky, Andrey, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter

Hi Dennis,

yeah, that just has the same downside as the is_signaled callback: a lot 
of additional overhead.

Bouncing cache lines on the CPU isn't funny at all.

Christian.

Am 13.04.21 um 11:13 schrieb Li, Dennis:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi, Christian and Andrey,
>        We maybe try to implement "wait" callback function of dma_fence_ops, when GPU reset or unplug happen, make this callback return - ENODEV, to notify the caller device lost.
>
> 	 * Must return -ERESTARTSYS if the wait is intr = true and the wait was
> 	 * interrupted, and remaining jiffies if fence has signaled, or 0 if wait
> 	 * timed out. Can also return other error values on custom implementations,
> 	 * which should be treated as if the fence is signaled. For example a hardware
> 	 * lockup could be reported like that.
> 	 *
> 	 * This callback is optional.
> 	 */
> 	signed long (*wait)(struct dma_fence *fence,
> 			    bool intr, signed long timeout);
>
> Best Regards
> Dennis Li
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken@gmail.com>
> Sent: Tuesday, April 13, 2021 3:10 PM
> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Li, Dennis <Dennis.Li@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>; Daniel Vetter <daniel@ffwll.ch>
> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
>
> Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
>> On 2021-04-12 3:18 p.m., Christian König wrote:
>>> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
>>>> [SNIP]
>>>>>> So what's the right approach ? How we guarantee that when running
>>>>>> amdgpu_fence_driver_force_completion we will signal all the HW
>>>>>> fences and not racing against some more fences insertion into that
>>>>>> array ?
>>>>>>
>>>>> Well I would still say the best approach would be to insert this
>>>>> between the front end and the backend and not rely on signaling
>>>>> fences while holding the device srcu.
>>>>
>>>> My question is, even now, when we run
>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or
>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion,
>>>> what there prevents a race with another fence being at the same time
>>>> emitted and inserted into the fence array ? Looks like nothing.
>>>>
>>> Each ring can only be used by one thread at the same time, this
>>> includes emitting fences as well as other stuff.
>>>
>>> During GPU reset we make sure nobody writes to the rings by stopping
>>> the scheduler and taking the GPU reset lock (so that nobody else can
>>> start the scheduler again).
>>
>> What about direct submissions not through scheduler -
>> amdgpu_job_submit_direct, I don't see how this is protected.
> Those only happen during startup and GPU reset.
>
>>>>> BTW: Could it be that the device SRCU protects more than one device
>>>>> and we deadlock because of this?
>>>>
>>>> I haven't actually experienced any deadlock until now but, yes,
>>>> drm_unplug_srcu is defined as static in drm_drv.c and so in the
>>>> presence  of multiple devices from same or different drivers we in
>>>> fact are dependent on all their critical sections i guess.
>>>>
>>> Shit, yeah the devil is a squirrel. So for A+I laptops we actually
>>> need to sync that up with Daniel and the rest of the i915 guys.
>>>
>>> IIRC we could actually have an amdgpu device in a docking station
>>> which needs hotplug and the driver might depend on waiting for the
>>> i915 driver as well.
>>
>> Can't we propose a patch to make drm_unplug_srcu per drm_device ? I
>> don't see why it has to be global and not per device thing.
> I'm really wondering the same thing for quite a while now.
>
> Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.
>
> Regards,
> Christian.
>
>> Andrey
>>
>>
>>> Christian.
>>>
>>>> Andrey
>>>>
>>>>
>>>>> Christian.
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>      /* Past this point no more fence are submitted to HW ring
>>>>>>>>>> and hence we can safely call force signal on all that are
>>>>>>>>>> currently there.
>>>>>>>>>>       * Any subsequently created  HW fences will be returned
>>>>>>>>>> signaled with an error code right away
>>>>>>>>>>       */
>>>>>>>>>>
>>>>>>>>>>      for_each_ring(adev)
>>>>>>>>>>          amdgpu_fence_process(ring)
>>>>>>>>>>
>>>>>>>>>>      drm_dev_unplug(dev);
>>>>>>>>>>      Stop schedulers
>>>>>>>>>>      cancel_sync(all timers and queued works);
>>>>>>>>>>      hw_fini
>>>>>>>>>>      unmap_mmio
>>>>>>>>>>
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>>> Alternatively grabbing the reset write side and stopping
>>>>>>>>>>>>>>> and then restarting the scheduler could work as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I didn't get the above and I don't see why I need to reuse
>>>>>>>>>>>>>> the GPU reset rw_lock. I rely on the SRCU unplug flag for
>>>>>>>>>>>>>> unplug. Also, not clear to me why are we focusing on the
>>>>>>>>>>>>>> scheduler threads, any code patch to generate HW fences
>>>>>>>>>>>>>> should be covered, so any code leading to
>>>>>>>>>>>>>> amdgpu_fence_emit needs to be taken into account such as,
>>>>>>>>>>>>>> direct IB submissions, VM flushes e.t.c
>>>>>>>>>>>>> You need to work together with the reset lock anyway, cause
>>>>>>>>>>>>> a hotplug could run at the same time as a reset.
>>>>>>>>>>>>
>>>>>>>>>>>> For going my way indeed now I see now that I have to take
>>>>>>>>>>>> reset write side lock during HW fences signalling in order
>>>>>>>>>>>> to protect against scheduler/HW fences detachment and
>>>>>>>>>>>> reattachment during schedulers stop/restart. But if we go
>>>>>>>>>>>> with your approach  then calling drm_dev_unplug and scoping
>>>>>>>>>>>> amdgpu_job_timeout with drm_dev_enter/exit should be enough
>>>>>>>>>>>> to prevent any concurrent GPU resets during unplug. In fact
>>>>>>>>>>>> I already do it anyway -
>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2
>>>>>>>>>>>> F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh
>>>>>>>>>>>> %3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc
>>>>>>>>>>>> 36b1&amp;data=04%7C01%7CDennis.Li%40amd.com%7Cc7fc6cb505c34a
>>>>>>>>>>>> edfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C
>>>>>>>>>>>> 0%7C637538946323194151%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wL
>>>>>>>>>>>> jAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000
>>>>>>>>>>>> &amp;sdata=%2Fe%2BqJNlcuUjLHsLvfHCKqerK%2Ff8lzujqOBhnMBIRP8E
>>>>>>>>>>>> %3D&amp;reserved=0
>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-13  7:10                                                             ` Christian König
  2021-04-13  9:13                                                               ` Li, Dennis
@ 2021-04-13 15:12                                                               ` Andrey Grodzovsky
  2021-04-13 18:03                                                                 ` Christian König
  2021-04-13 20:07                                                               ` Daniel Vetter
  2 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-13 15:12 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter


On 2021-04-13 3:10 a.m., Christian König wrote:
> Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
>>
>> On 2021-04-12 3:18 p.m., Christian König wrote:
>>> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
>>>> [SNIP]
>>>>>>
>>>>>> So what's the right approach ? How we guarantee that when running 
>>>>>> amdgpu_fence_driver_force_completion we will signal all the HW 
>>>>>> fences and not racing against some more fences insertion into 
>>>>>> that array ?
>>>>>>
>>>>>
>>>>> Well I would still say the best approach would be to insert this 
>>>>> between the front end and the backend and not rely on signaling 
>>>>> fences while holding the device srcu.
>>>>
>>>>
>>>> My question is, even now, when we run 
>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
>>>> what there prevents a race with another fence being at the same 
>>>> time emitted and inserted into the fence array ? Looks like nothing.
>>>>
>>>
>>> Each ring can only be used by one thread at the same time, this 
>>> includes emitting fences as well as other stuff.
>>>
>>> During GPU reset we make sure nobody writes to the rings by stopping 
>>> the scheduler and taking the GPU reset lock (so that nobody else can 
>>> start the scheduler again).
>>
>>
>> What about direct submissions not through scheduler - 
>> amdgpu_job_submit_direct, I don't see how this is protected.
>
> Those only happen during startup and GPU reset.


OK, but then it looks like I am missing something; see the following steps 
in amdgpu_pci_remove:

1) Use disable_irq API function to stop and flush all in flight HW 
interrupts handlers

2) Grab the reset lock and stop all the schedulers

After the above 2 steps the HW fences array is idle: no more insertions 
and no more extractions from the array

3) Run one time amdgpu_fence_process to signal all current HW fences

4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release the 
GPU reset lock and go on with the rest of the sequence (cancel timers, 
work items e.t.c)
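
In code, the above would look roughly like the sketch below (simplified; 
it assumes the reset_sem from this series and the exact helper names are 
approximate):

#include <linux/pci.h>
#include "amdgpu.h"

static void amdgpu_pci_remove(struct pci_dev *pdev)
{
	struct drm_device *dev = pci_get_drvdata(pdev);
	struct amdgpu_device *adev = drm_to_adev(dev);
	int i;

	/* 1) stop and flush all in-flight interrupt handlers
	 *    (simplified, a real driver has more than one vector) */
	disable_irq(pdev->irq);

	/* 2) block concurrent GPU resets and park every scheduler thread */
	down_write(&adev->reset_sem);
	for (i = 0; i < AMDGPU_MAX_RINGS; ++i)
		if (adev->rings[i] && adev->rings[i]->sched.thread)
			drm_sched_stop(&adev->rings[i]->sched, NULL);

	/* 3) the fence arrays are stable now, signal whatever is left */
	for (i = 0; i < AMDGPU_MAX_RINGS; ++i)
		if (adev->rings[i])
			amdgpu_fence_process(adev->rings[i]);

	/* 4) flush in-flight IOCTLs, then continue with the teardown */
	drm_dev_unplug(dev);
	up_write(&adev->reset_sem);

	/* cancel timers and work items, hw_fini, unmap MMIO, ... */
}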

What's problematic in this sequence?

Andrey


>
>
>>>
>>>>>
>>>>> BTW: Could it be that the device SRCU protects more than one 
>>>>> device and we deadlock because of this?
>>>>
>>>>
>>>> I haven't actually experienced any deadlock until now but, yes, 
>>>> drm_unplug_srcu is defined as static in drm_drv.c and so in the 
>>>> presence  of multiple devices from same or different drivers we in 
>>>> fact are dependent on all their critical sections i guess.
>>>>
>>>
>>> Shit, yeah the devil is a squirrel. So for A+I laptops we actually 
>>> need to sync that up with Daniel and the rest of the i915 guys.
>>>
>>> IIRC we could actually have an amdgpu device in a docking station 
>>> which needs hotplug and the driver might depend on waiting for the 
>>> i915 driver as well.
>>
>>
>> Can't we propose a patch to make drm_unplug_srcu per drm_device ? I 
>> don't see why it has to be global and not per device thing.
>
> I'm really wondering the same thing for quite a while now.
>
> Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.
>
> Regards,
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>> Christian.
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>     /* Past this point no more fence are submitted to HW ring 
>>>>>>>>>> and hence we can safely call force signal on all that are 
>>>>>>>>>> currently there.
>>>>>>>>>>      * Any subsequently created  HW fences will be returned 
>>>>>>>>>> signaled with an error code right away
>>>>>>>>>>      */
>>>>>>>>>>
>>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>>>>
>>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>>     Stop schedulers
>>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>>     hw_fini
>>>>>>>>>>     unmap_mmio
>>>>>>>>>>
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alternatively grabbing the reset write side and stopping 
>>>>>>>>>>>>>>> and then restarting the scheduler could work as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I didn't get the above and I don't see why I need to 
>>>>>>>>>>>>>> reuse the GPU reset rw_lock. I rely on the SRCU unplug 
>>>>>>>>>>>>>> flag for unplug. Also, not clear to me why are we 
>>>>>>>>>>>>>> focusing on the scheduler threads, any code patch to 
>>>>>>>>>>>>>> generate HW fences should be covered, so any code leading 
>>>>>>>>>>>>>> to amdgpu_fence_emit needs to be taken into account such 
>>>>>>>>>>>>>> as, direct IB submissions, VM flushes e.t.c
>>>>>>>>>>>>>
>>>>>>>>>>>>> You need to work together with the reset lock anyway, 
>>>>>>>>>>>>> cause a hotplug could run at the same time as a reset.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> For going my way indeed now I see now that I have to take 
>>>>>>>>>>>> reset write side lock during HW fences signalling in order 
>>>>>>>>>>>> to protect against scheduler/HW fences detachment and 
>>>>>>>>>>>> reattachment during schedulers stop/restart. But if we go 
>>>>>>>>>>>> with your approach  then calling drm_dev_unplug and scoping 
>>>>>>>>>>>> amdgpu_job_timeout with drm_dev_enter/exit should be enough 
>>>>>>>>>>>> to prevent any concurrent GPU resets during unplug. In fact 
>>>>>>>>>>>> I already do it anyway - 
>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cc7fc6cb505c34aedfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637538946324857369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=64362PRC8xTgR2Uj2R256bMegVm8YWq1KI%2BAjzeYXv4%3D&amp;reserved=0
>>>>>>>>>>>
>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-13 15:12                                                               ` Andrey Grodzovsky
@ 2021-04-13 18:03                                                                 ` Christian König
  2021-04-13 18:18                                                                   ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-13 18:03 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter

Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky:
>
> On 2021-04-13 3:10 a.m., Christian König wrote:
>> Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
>>>
>>> On 2021-04-12 3:18 p.m., Christian König wrote:
>>>> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
>>>>> [SNIP]
>>>>>>>
>>>>>>> So what's the right approach ? How we guarantee that when 
>>>>>>> running amdgpu_fence_driver_force_completion we will signal all 
>>>>>>> the HW fences and not racing against some more fences insertion 
>>>>>>> into that array ?
>>>>>>>
>>>>>>
>>>>>> Well I would still say the best approach would be to insert this 
>>>>>> between the front end and the backend and not rely on signaling 
>>>>>> fences while holding the device srcu.
>>>>>
>>>>>
>>>>> My question is, even now, when we run 
>>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
>>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
>>>>> what there prevents a race with another fence being at the same 
>>>>> time emitted and inserted into the fence array ? Looks like nothing.
>>>>>
>>>>
>>>> Each ring can only be used by one thread at the same time, this 
>>>> includes emitting fences as well as other stuff.
>>>>
>>>> During GPU reset we make sure nobody writes to the rings by 
>>>> stopping the scheduler and taking the GPU reset lock (so that 
>>>> nobody else can start the scheduler again).
>>>
>>>
>>> What about direct submissions not through scheduler - 
>>> amdgpu_job_submit_direct, I don't see how this is protected.
>>
>> Those only happen during startup and GPU reset.
>
>
> Ok, but then looks like I am missing something, see the following 
> steps in amdgpu_pci_remove -
>
> 1) Use disable_irq API function to stop and flush all in flight HW 
> interrupts handlers
>
> 2) Grab the reset lock and stop all the schedulers
>
> After above 2 steps the HW fences array is idle, no more insertions 
> and no more extractions from the array
>
> 3) Run one time amdgpu_fence_process to signal all current HW fences
>
> 4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release the 
> GPU reset lock and go on with the rest of the sequence (cancel timers, 
> work items e.t.c)
>
> What's problematic in this sequence ?

drm_dev_unplug() will wait for the IOCTLs to finish.

The IOCTLs in turn can wait for fences. Those can be hardware fences, 
scheduler fences, or fences from other devices (and KIQ fences for 
register writes under SRIOV, but we can hopefully ignore those for now).

We have handled the hardware fences, but we have no idea when the 
scheduler fences or the fences from other devices will signal.

Scheduler fences won't signal until the scheduler threads are restarted 
or we somehow cancel the submissions. Doable, but tricky as well.

As for waiting on fences from another device, I have no idea whether 
that couldn't deadlock somehow.
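
For the "somehow cancel the submissions" part, the rough shape could be 
something like the sketch below. Illustrative only: it covers just the 
jobs already on the pending list (jobs still queued in the entities would 
need the same treatment, which is part of what makes it tricky), it 
assumes the scheduler thread is already stopped, the locking is 
simplified, and the field names are as of roughly v5.12.

#include <linux/dma-fence.h>
#include <drm/gpu_scheduler.h>

/* Illustrative sketch: force-signal the scheduler fences of jobs that
 * were already pushed to the hardware so that anything waiting on them
 * (for example an in-flight IOCTL that drm_dev_unplug() is waiting for)
 * can return with an error instead of blocking forever. */
static void sched_cancel_pending_jobs(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job;

	spin_lock(&sched->job_list_lock);
	list_for_each_entry(job, &sched->pending_list, list) {
		dma_fence_set_error(&job->s_fence->finished, -ENODEV);
		dma_fence_signal(&job->s_fence->finished);
	}
	spin_unlock(&sched->job_list_lock);
}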

Regards,
Christian.

>
> Andrey
>
>
>>
>>
>>>>
>>>>>>
>>>>>> BTW: Could it be that the device SRCU protects more than one 
>>>>>> device and we deadlock because of this?
>>>>>
>>>>>
>>>>> I haven't actually experienced any deadlock until now but, yes, 
>>>>> drm_unplug_srcu is defined as static in drm_drv.c and so in the 
>>>>> presence  of multiple devices from same or different drivers we in 
>>>>> fact are dependent on all their critical sections i guess.
>>>>>
>>>>
>>>> Shit, yeah the devil is a squirrel. So for A+I laptops we actually 
>>>> need to sync that up with Daniel and the rest of the i915 guys.
>>>>
>>>> IIRC we could actually have an amdgpu device in a docking station 
>>>> which needs hotplug and the driver might depend on waiting for the 
>>>> i915 driver as well.
>>>
>>>
>>> Can't we propose a patch to make drm_unplug_srcu per drm_device ? I 
>>> don't see why it has to be global and not per device thing.
>>
>> I'm really wondering the same thing for quite a while now.
>>
>> Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>>     /* Past this point no more fence are submitted to HW 
>>>>>>>>>>> ring and hence we can safely call force signal on all that 
>>>>>>>>>>> are currently there.
>>>>>>>>>>>      * Any subsequently created  HW fences will be returned 
>>>>>>>>>>> signaled with an error code right away
>>>>>>>>>>>      */
>>>>>>>>>>>
>>>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>>>>>
>>>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>>>     Stop schedulers
>>>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>>>     hw_fini
>>>>>>>>>>>     unmap_mmio
>>>>>>>>>>>
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Alternatively grabbing the reset write side and 
>>>>>>>>>>>>>>>> stopping and then restarting the scheduler could work 
>>>>>>>>>>>>>>>> as well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I didn't get the above and I don't see why I need to 
>>>>>>>>>>>>>>> reuse the GPU reset rw_lock. I rely on the SRCU unplug 
>>>>>>>>>>>>>>> flag for unplug. Also, not clear to me why are we 
>>>>>>>>>>>>>>> focusing on the scheduler threads, any code patch to 
>>>>>>>>>>>>>>> generate HW fences should be covered, so any code 
>>>>>>>>>>>>>>> leading to amdgpu_fence_emit needs to be taken into 
>>>>>>>>>>>>>>> account such as, direct IB submissions, VM flushes e.t.c
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You need to work together with the reset lock anyway, 
>>>>>>>>>>>>>> cause a hotplug could run at the same time as a reset.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> For going my way indeed now I see now that I have to take 
>>>>>>>>>>>>> reset write side lock during HW fences signalling in order 
>>>>>>>>>>>>> to protect against scheduler/HW fences detachment and 
>>>>>>>>>>>>> reattachment during schedulers stop/restart. But if we go 
>>>>>>>>>>>>> with your approach  then calling drm_dev_unplug and 
>>>>>>>>>>>>> scoping amdgpu_job_timeout with drm_dev_enter/exit should 
>>>>>>>>>>>>> be enough to prevent any concurrent GPU resets during 
>>>>>>>>>>>>> unplug. In fact I already do it anyway - 
>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cc7fc6cb505c34aedfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637538946324857369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=64362PRC8xTgR2Uj2R256bMegVm8YWq1KI%2BAjzeYXv4%3D&amp;reserved=0
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-13 18:03                                                                 ` Christian König
@ 2021-04-13 18:18                                                                   ` Andrey Grodzovsky
  2021-04-13 18:25                                                                     ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-13 18:18 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter


On 2021-04-13 2:03 p.m., Christian König wrote:
> Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky:
>>
>> On 2021-04-13 3:10 a.m., Christian König wrote:
>>> Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
>>>>
>>>> On 2021-04-12 3:18 p.m., Christian König wrote:
>>>>> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
>>>>>> [SNIP]
>>>>>>>>
>>>>>>>> So what's the right approach ? How we guarantee that when 
>>>>>>>> running amdgpu_fence_driver_force_completion we will signal all 
>>>>>>>> the HW fences and not racing against some more fences insertion 
>>>>>>>> into that array ?
>>>>>>>>
>>>>>>>
>>>>>>> Well I would still say the best approach would be to insert this 
>>>>>>> between the front end and the backend and not rely on signaling 
>>>>>>> fences while holding the device srcu.
>>>>>>
>>>>>>
>>>>>> My question is, even now, when we run 
>>>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
>>>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
>>>>>> what there prevents a race with another fence being at the same 
>>>>>> time emitted and inserted into the fence array ? Looks like nothing.
>>>>>>
>>>>>
>>>>> Each ring can only be used by one thread at the same time, this 
>>>>> includes emitting fences as well as other stuff.
>>>>>
>>>>> During GPU reset we make sure nobody writes to the rings by 
>>>>> stopping the scheduler and taking the GPU reset lock (so that 
>>>>> nobody else can start the scheduler again).
>>>>
>>>>
>>>> What about direct submissions not through scheduler - 
>>>> amdgpu_job_submit_direct, I don't see how this is protected.
>>>
>>> Those only happen during startup and GPU reset.
>>
>>
>> Ok, but then looks like I am missing something, see the following 
>> steps in amdgpu_pci_remove -
>>
>> 1) Use disable_irq API function to stop and flush all in flight HW 
>> interrupts handlers
>>
>> 2) Grab the reset lock and stop all the schedulers
>>
>> After above 2 steps the HW fences array is idle, no more insertions 
>> and no more extractions from the array
>>
>> 3) Run one time amdgpu_fence_process to signal all current HW fences
>>
>> 4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release 
>> the GPU reset lock and go on with the rest of the sequence (cancel 
>> timers, work items e.t.c)
>>
>> What's problematic in this sequence ?
>
> drm_dev_unplug() will wait for the IOCTLs to finish.
>
> The IOCTLs in turn can wait for fences. That can be both hardware 
> fences, scheduler fences, as well as fences from other devices (and 
> KIQ fences for register writes under SRIOV, but we can hopefully 
> ignore them for now).
>
> We have handled the hardware fences, but we have no idea when the 
> scheduler fences or the fences from other devices will signal.
>
> Scheduler fences won't signal until the scheduler threads are 
> restarted or we somehow cancel the submissions. Doable, but tricky as 
> well.


For scheduler fences I am not worried: the sched_fence->finished 
fences are by definition attached to HW fences which have already 
signaled, and for sched_fence->scheduled we should run 
drm_sched_entity_kill_jobs for each entity after stopping the 
scheduler threads and before calling drm_dev_unplug.


>
> For waiting for other device I have no idea if that couldn't deadlock 
> somehow.


Yeah, not sure about imported fences and dma_bufs; I would assume the 
other devices should not be impacted by our device removal, but who 
knows...

So I guess we are NOT going with finalizing HW fences before 
drm_dev_unplug and will instead just call drm_dev_enter/exit at the 
back-ends all over the place where there are MMIO accesses?
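
For reference, that back-end pattern would look roughly like this 
(sketch only; the helper name and register are placeholders, not a 
specific amdgpu code path):

/* Guard an MMIO access with drm_dev_enter()/drm_dev_exit() so it is
 * silently dropped once the device is gone. */
static void my_ip_write_reg(struct amdgpu_device *adev, u32 reg, u32 val)
{
	int idx;

	if (!drm_dev_enter(adev_to_drm(adev), &idx))
		return;		/* device unplugged, drop the access */

	WREG32(reg, val);

	drm_dev_exit(idx);
}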

Andrey

>
> Regards,
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>>
>>>>>
>>>>>>>
>>>>>>> BTW: Could it be that the device SRCU protects more than one 
>>>>>>> device and we deadlock because of this?
>>>>>>
>>>>>>
>>>>>> I haven't actually experienced any deadlock until now but, yes, 
>>>>>> drm_unplug_srcu is defined as static in drm_drv.c and so in the 
>>>>>> presence  of multiple devices from same or different drivers we 
>>>>>> in fact are dependent on all their critical sections i guess.
>>>>>>
>>>>>
>>>>> Shit, yeah the devil is a squirrel. So for A+I laptops we actually 
>>>>> need to sync that up with Daniel and the rest of the i915 guys.
>>>>>
>>>>> IIRC we could actually have an amdgpu device in a docking station 
>>>>> which needs hotplug and the driver might depend on waiting for the 
>>>>> i915 driver as well.
>>>>
>>>>
>>>> Can't we propose a patch to make drm_unplug_srcu per drm_device ? I 
>>>> don't see why it has to be global and not per device thing.
>>>
>>> I'm really wondering the same thing for quite a while now.
>>>
>>> Adding Daniel as well, maybe he knows why the drm_unplug_srcu is 
>>> global.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>> Christian.
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>>     /* Past this point no more fence are submitted to HW 
>>>>>>>>>>>> ring and hence we can safely call force signal on all that 
>>>>>>>>>>>> are currently there.
>>>>>>>>>>>>      * Any subsequently created  HW fences will be returned 
>>>>>>>>>>>> signaled with an error code right away
>>>>>>>>>>>>      */
>>>>>>>>>>>>
>>>>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>>>>>>
>>>>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>>>>     Stop schedulers
>>>>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>>>>     hw_fini
>>>>>>>>>>>>     unmap_mmio
>>>>>>>>>>>>
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Alternatively grabbing the reset write side and 
>>>>>>>>>>>>>>>>> stopping and then restarting the scheduler could work 
>>>>>>>>>>>>>>>>> as well.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I didn't get the above and I don't see why I need to 
>>>>>>>>>>>>>>>> reuse the GPU reset rw_lock. I rely on the SRCU unplug 
>>>>>>>>>>>>>>>> flag for unplug. Also, not clear to me why are we 
>>>>>>>>>>>>>>>> focusing on the scheduler threads, any code patch to 
>>>>>>>>>>>>>>>> generate HW fences should be covered, so any code 
>>>>>>>>>>>>>>>> leading to amdgpu_fence_emit needs to be taken into 
>>>>>>>>>>>>>>>> account such as, direct IB submissions, VM flushes e.t.c
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You need to work together with the reset lock anyway, 
>>>>>>>>>>>>>>> cause a hotplug could run at the same time as a reset.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For going my way indeed now I see now that I have to take 
>>>>>>>>>>>>>> reset write side lock during HW fences signalling in 
>>>>>>>>>>>>>> order to protect against scheduler/HW fences detachment 
>>>>>>>>>>>>>> and reattachment during schedulers stop/restart. But if 
>>>>>>>>>>>>>> we go with your approach  then calling drm_dev_unplug and 
>>>>>>>>>>>>>> scoping amdgpu_job_timeout with drm_dev_enter/exit should 
>>>>>>>>>>>>>> be enough to prevent any concurrent GPU resets during 
>>>>>>>>>>>>>> unplug. In fact I already do it anyway - 
>>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cc7fc6cb505c34aedfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637538946324857369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=64362PRC8xTgR2Uj2R256bMegVm8YWq1KI%2BAjzeYXv4%3D&amp;reserved=0 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-13 18:18                                                                   ` Andrey Grodzovsky
@ 2021-04-13 18:25                                                                     ` Christian König
  2021-04-13 18:30                                                                       ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-13 18:25 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter



Am 13.04.21 um 20:18 schrieb Andrey Grodzovsky:
>
> On 2021-04-13 2:03 p.m., Christian König wrote:
>> Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky:
>>>
>>> On 2021-04-13 3:10 a.m., Christian König wrote:
>>>> Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
>>>>>
>>>>> On 2021-04-12 3:18 p.m., Christian König wrote:
>>>>>> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
>>>>>>> [SNIP]
>>>>>>>>>
>>>>>>>>> So what's the right approach ? How we guarantee that when 
>>>>>>>>> running amdgpu_fence_driver_force_completion we will signal 
>>>>>>>>> all the HW fences and not racing against some more fences 
>>>>>>>>> insertion into that array ?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Well I would still say the best approach would be to insert 
>>>>>>>> this between the front end and the backend and not rely on 
>>>>>>>> signaling fences while holding the device srcu.
>>>>>>>
>>>>>>>
>>>>>>> My question is, even now, when we run 
>>>>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
>>>>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
>>>>>>> what there prevents a race with another fence being at the same 
>>>>>>> time emitted and inserted into the fence array ? Looks like 
>>>>>>> nothing.
>>>>>>>
>>>>>>
>>>>>> Each ring can only be used by one thread at the same time, this 
>>>>>> includes emitting fences as well as other stuff.
>>>>>>
>>>>>> During GPU reset we make sure nobody writes to the rings by 
>>>>>> stopping the scheduler and taking the GPU reset lock (so that 
>>>>>> nobody else can start the scheduler again).
>>>>>
>>>>>
>>>>> What about direct submissions not through scheduler - 
>>>>> amdgpu_job_submit_direct, I don't see how this is protected.
>>>>
>>>> Those only happen during startup and GPU reset.
>>>
>>>
>>> Ok, but then looks like I am missing something, see the following 
>>> steps in amdgpu_pci_remove -
>>>
>>> 1) Use disable_irq API function to stop and flush all in flight HW 
>>> interrupts handlers
>>>
>>> 2) Grab the reset lock and stop all the schedulers
>>>
>>> After above 2 steps the HW fences array is idle, no more insertions 
>>> and no more extractions from the array
>>>
>>> 3) Run one time amdgpu_fence_process to signal all current HW fences
>>>
>>> 4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release 
>>> the GPU reset lock and go on with the rest of the sequence (cancel 
>>> timers, work items e.t.c)
>>>
>>> What's problematic in this sequence ?
>>
>> drm_dev_unplug() will wait for the IOCTLs to finish.
>>
>> The IOCTLs in turn can wait for fences. That can be both hardware 
>> fences, scheduler fences, as well as fences from other devices (and 
>> KIQ fences for register writes under SRIOV, but we can hopefully 
>> ignore them for now).
>>
>> We have handled the hardware fences, but we have no idea when the 
>> scheduler fences or the fences from other devices will signal.
>>
>> Scheduler fences won't signal until the scheduler threads are 
>> restarted or we somehow cancel the submissions. Doable, but tricky as 
>> well.
>
>
> For scheduler fences I am not worried, for the sched_fence->finished 
> fence they are by definition attached to HW fences which already 
> signaledfor sched_fence->scheduled we should run 
> drm_sched_entity_kill_jobs for each entity after stopping the 
> scheduler threads and before setting drm_dev_unplug.

Well exactly that is what is tricky here. drm_sched_entity_kill_jobs() 
assumes that there are no more jobs pushed into the entity.

We are racing here once more and need to handle that.

>>
>> For waiting for other device I have no idea if that couldn't deadlock 
>> somehow.
>
>
> Yea, not sure for imported fences and dma_bufs, I would assume the 
> other devices should not be impacted by our device removal but, who 
> knows...
>
> So I guess we are NOT going with finalizing HW fences before 
> drm_dev_unplug and instead will just call drm_dev_enter/exit at the 
> back-ends all over the place where there are MMIO accesses ?

Good question. As you said that is really the hard path.

Handling it all at once at IOCTL level certainly has some appeal as 
well, but I have no idea if we can guarantee that this is lock free.
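
As a rough illustration of the "all at once at IOCTL level" idea 
(purely a sketch: the wrapper name is made up and the reset_sem usage 
follows the scheme proposed in this series, not existing code):

static long amdgpu_drm_ioctl_guarded(struct file *filp, unsigned int cmd,
				     unsigned long arg)
{
	struct drm_file *file_priv = filp->private_data;
	struct drm_device *dev = file_priv->minor->dev;
	struct amdgpu_device *adev = drm_to_adev(dev);
	long ret;

	if (down_read_killable(&adev->reset_sem))
		return -EINTR;

	if (drm_dev_is_unplugged(dev)) {
		ret = -ENODEV;
	} else {
		/*
		 * The open question from above: can we guarantee that
		 * nothing reached from here takes reset_sem again or
		 * waits on something that needs it?
		 */
		ret = drm_ioctl(filp, cmd, arg);
	}

	up_read(&adev->reset_sem);
	return ret;
}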

Christian.

>
> Andrey
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>
>>>>>>
>>>>>>>>
>>>>>>>> BTW: Could it be that the device SRCU protects more than one 
>>>>>>>> device and we deadlock because of this?
>>>>>>>
>>>>>>>
>>>>>>> I haven't actually experienced any deadlock until now but, yes, 
>>>>>>> drm_unplug_srcu is defined as static in drm_drv.c and so in the 
>>>>>>> presence  of multiple devices from same or different drivers we 
>>>>>>> in fact are dependent on all their critical sections i guess.
>>>>>>>
>>>>>>
>>>>>> Shit, yeah the devil is a squirrel. So for A+I laptops we 
>>>>>> actually need to sync that up with Daniel and the rest of the 
>>>>>> i915 guys.
>>>>>>
>>>>>> IIRC we could actually have an amdgpu device in a docking station 
>>>>>> which needs hotplug and the driver might depend on waiting for 
>>>>>> the i915 driver as well.
>>>>>
>>>>>
>>>>> Can't we propose a patch to make drm_unplug_srcu per drm_device ? 
>>>>> I don't see why it has to be global and not per device thing.
>>>>
>>>> I'm really wondering the same thing for quite a while now.
>>>>
>>>> Adding Daniel as well, maybe he knows why the drm_unplug_srcu is 
>>>> global.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>>     /* Past this point no more fence are submitted to HW 
>>>>>>>>>>>>> ring and hence we can safely call force signal on all that 
>>>>>>>>>>>>> are currently there.
>>>>>>>>>>>>>      * Any subsequently created  HW fences will be 
>>>>>>>>>>>>> returned signaled with an error code right away
>>>>>>>>>>>>>      */
>>>>>>>>>>>>>
>>>>>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>>>>>>>
>>>>>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>>>>>     Stop schedulers
>>>>>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>>>>>     hw_fini
>>>>>>>>>>>>>     unmap_mmio
>>>>>>>>>>>>>
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Alternatively grabbing the reset write side and 
>>>>>>>>>>>>>>>>>> stopping and then restarting the scheduler could work 
>>>>>>>>>>>>>>>>>> as well.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I didn't get the above and I don't see why I need to 
>>>>>>>>>>>>>>>>> reuse the GPU reset rw_lock. I rely on the SRCU unplug 
>>>>>>>>>>>>>>>>> flag for unplug. Also, not clear to me why are we 
>>>>>>>>>>>>>>>>> focusing on the scheduler threads, any code patch to 
>>>>>>>>>>>>>>>>> generate HW fences should be covered, so any code 
>>>>>>>>>>>>>>>>> leading to amdgpu_fence_emit needs to be taken into 
>>>>>>>>>>>>>>>>> account such as, direct IB submissions, VM flushes e.t.c
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You need to work together with the reset lock anyway, 
>>>>>>>>>>>>>>>> cause a hotplug could run at the same time as a reset.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For going my way indeed now I see now that I have to 
>>>>>>>>>>>>>>> take reset write side lock during HW fences signalling 
>>>>>>>>>>>>>>> in order to protect against scheduler/HW fences 
>>>>>>>>>>>>>>> detachment and reattachment during schedulers 
>>>>>>>>>>>>>>> stop/restart. But if we go with your approach  then 
>>>>>>>>>>>>>>> calling drm_dev_unplug and scoping amdgpu_job_timeout 
>>>>>>>>>>>>>>> with drm_dev_enter/exit should be enough to prevent any 
>>>>>>>>>>>>>>> concurrent GPU resets during unplug. In fact I already 
>>>>>>>>>>>>>>> do it anyway - 
>>>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cc7fc6cb505c34aedfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637538946324857369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=64362PRC8xTgR2Uj2R256bMegVm8YWq1KI%2BAjzeYXv4%3D&amp;reserved=0 
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-13 18:25                                                                     ` Christian König
@ 2021-04-13 18:30                                                                       ` Andrey Grodzovsky
  2021-04-14  7:01                                                                         ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-13 18:30 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter


On 2021-04-13 2:25 p.m., Christian König wrote:
>
>
> Am 13.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>
>> On 2021-04-13 2:03 p.m., Christian König wrote:
>>> Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky:
>>>>
>>>> On 2021-04-13 3:10 a.m., Christian König wrote:
>>>>> Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
>>>>>>
>>>>>> On 2021-04-12 3:18 p.m., Christian König wrote:
>>>>>>> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
>>>>>>>> [SNIP]
>>>>>>>>>>
>>>>>>>>>> So what's the right approach ? How we guarantee that when 
>>>>>>>>>> running amdgpu_fence_driver_force_completion we will signal 
>>>>>>>>>> all the HW fences and not racing against some more fences 
>>>>>>>>>> insertion into that array ?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Well I would still say the best approach would be to insert 
>>>>>>>>> this between the front end and the backend and not rely on 
>>>>>>>>> signaling fences while holding the device srcu.
>>>>>>>>
>>>>>>>>
>>>>>>>> My question is, even now, when we run 
>>>>>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
>>>>>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
>>>>>>>> what there prevents a race with another fence being at the same 
>>>>>>>> time emitted and inserted into the fence array ? Looks like 
>>>>>>>> nothing.
>>>>>>>>
>>>>>>>
>>>>>>> Each ring can only be used by one thread at the same time, this 
>>>>>>> includes emitting fences as well as other stuff.
>>>>>>>
>>>>>>> During GPU reset we make sure nobody writes to the rings by 
>>>>>>> stopping the scheduler and taking the GPU reset lock (so that 
>>>>>>> nobody else can start the scheduler again).
>>>>>>
>>>>>>
>>>>>> What about direct submissions not through scheduler - 
>>>>>> amdgpu_job_submit_direct, I don't see how this is protected.
>>>>>
>>>>> Those only happen during startup and GPU reset.
>>>>
>>>>
>>>> Ok, but then looks like I am missing something, see the following 
>>>> steps in amdgpu_pci_remove -
>>>>
>>>> 1) Use disable_irq API function to stop and flush all in flight HW 
>>>> interrupts handlers
>>>>
>>>> 2) Grab the reset lock and stop all the schedulers
>>>>
>>>> After above 2 steps the HW fences array is idle, no more insertions 
>>>> and no more extractions from the array
>>>>
>>>> 3) Run one time amdgpu_fence_process to signal all current HW fences
>>>>
>>>> 4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release 
>>>> the GPU reset lock and go on with the rest of the sequence (cancel 
>>>> timers, work items e.t.c)
>>>>
>>>> What's problematic in this sequence ?
>>>
>>> drm_dev_unplug() will wait for the IOCTLs to finish.
>>>
>>> The IOCTLs in turn can wait for fences. That can be both hardware 
>>> fences, scheduler fences, as well as fences from other devices (and 
>>> KIQ fences for register writes under SRIOV, but we can hopefully 
>>> ignore them for now).
>>>
>>> We have handled the hardware fences, but we have no idea when the 
>>> scheduler fences or the fences from other devices will signal.
>>>
>>> Scheduler fences won't signal until the scheduler threads are 
>>> restarted or we somehow cancel the submissions. Doable, but tricky 
>>> as well.
>>
>>
>> For scheduler fences I am not worried, for the sched_fence->finished 
>> fence they are by definition attached to HW fences which already 
>> signaledfor sched_fence->scheduled we should run 
>> drm_sched_entity_kill_jobs for each entity after stopping the 
>> scheduler threads and before setting drm_dev_unplug.
>
> Well exactly that is what is tricky here. drm_sched_entity_kill_jobs() 
> assumes that there are no more jobs pushed into the entity.
>
> We are racing here once more and need to handle that.


But why? I wrote above that we first stop all the schedulers and only 
then call drm_sched_entity_kill_jobs.


>
>>>
>>> For waiting for other device I have no idea if that couldn't 
>>> deadlock somehow.
>>
>>
>> Yea, not sure for imported fences and dma_bufs, I would assume the 
>> other devices should not be impacted by our device removal but, who 
>> knows...
>>
>> So I guess we are NOT going with finalizing HW fences before 
>> drm_dev_unplug and instead will just call drm_dev_enter/exit at the 
>> back-ends all over the place where there are MMIO accesses ?
>
> Good question. As you said that is really the hard path.
>
> Handling it all at once at IOCTL level certainly has some appeal as 
> well, but I have no idea if we can guarantee that this is lock free.


Maybe just empirically - let's try it and see what actually happens 
under different test scenarios?

Andrey


>
> Christian.
>
>>
>> Andrey
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> BTW: Could it be that the device SRCU protects more than one 
>>>>>>>>> device and we deadlock because of this?
>>>>>>>>
>>>>>>>>
>>>>>>>> I haven't actually experienced any deadlock until now but, yes, 
>>>>>>>> drm_unplug_srcu is defined as static in drm_drv.c and so in the 
>>>>>>>> presence  of multiple devices from same or different drivers we 
>>>>>>>> in fact are dependent on all their critical sections i guess.
>>>>>>>>
>>>>>>>
>>>>>>> Shit, yeah the devil is a squirrel. So for A+I laptops we 
>>>>>>> actually need to sync that up with Daniel and the rest of the 
>>>>>>> i915 guys.
>>>>>>>
>>>>>>> IIRC we could actually have an amdgpu device in a docking 
>>>>>>> station which needs hotplug and the driver might depend on 
>>>>>>> waiting for the i915 driver as well.
>>>>>>
>>>>>>
>>>>>> Can't we propose a patch to make drm_unplug_srcu per drm_device ? 
>>>>>> I don't see why it has to be global and not per device thing.
>>>>>
>>>>> I'm really wondering the same thing for quite a while now.
>>>>>
>>>>> Adding Daniel as well, maybe he knows why the drm_unplug_srcu is 
>>>>> global.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>     /* Past this point no more fence are submitted to HW 
>>>>>>>>>>>>>> ring and hence we can safely call force signal on all 
>>>>>>>>>>>>>> that are currently there.
>>>>>>>>>>>>>>      * Any subsequently created  HW fences will be 
>>>>>>>>>>>>>> returned signaled with an error code right away
>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>>>>>>     Stop schedulers
>>>>>>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>>>>>>     hw_fini
>>>>>>>>>>>>>>     unmap_mmio
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Alternatively grabbing the reset write side and 
>>>>>>>>>>>>>>>>>>> stopping and then restarting the scheduler could 
>>>>>>>>>>>>>>>>>>> work as well.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I didn't get the above and I don't see why I need to 
>>>>>>>>>>>>>>>>>> reuse the GPU reset rw_lock. I rely on the SRCU 
>>>>>>>>>>>>>>>>>> unplug flag for unplug. Also, not clear to me why are 
>>>>>>>>>>>>>>>>>> we focusing on the scheduler threads, any code patch 
>>>>>>>>>>>>>>>>>> to generate HW fences should be covered, so any code 
>>>>>>>>>>>>>>>>>> leading to amdgpu_fence_emit needs to be taken into 
>>>>>>>>>>>>>>>>>> account such as, direct IB submissions, VM flushes e.t.c
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You need to work together with the reset lock anyway, 
>>>>>>>>>>>>>>>>> cause a hotplug could run at the same time as a reset.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For going my way indeed now I see now that I have to 
>>>>>>>>>>>>>>>> take reset write side lock during HW fences signalling 
>>>>>>>>>>>>>>>> in order to protect against scheduler/HW fences 
>>>>>>>>>>>>>>>> detachment and reattachment during schedulers 
>>>>>>>>>>>>>>>> stop/restart. But if we go with your approach  then 
>>>>>>>>>>>>>>>> calling drm_dev_unplug and scoping amdgpu_job_timeout 
>>>>>>>>>>>>>>>> with drm_dev_enter/exit should be enough to prevent any 
>>>>>>>>>>>>>>>> concurrent GPU resets during unplug. In fact I already 
>>>>>>>>>>>>>>>> do it anyway - 
>>>>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cc7fc6cb505c34aedfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637538946324857369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=64362PRC8xTgR2Uj2R256bMegVm8YWq1KI%2BAjzeYXv4%3D&amp;reserved=0 
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-13  7:10                                                             ` Christian König
  2021-04-13  9:13                                                               ` Li, Dennis
  2021-04-13 15:12                                                               ` Andrey Grodzovsky
@ 2021-04-13 20:07                                                               ` Daniel Vetter
  2 siblings, 0 replies; 56+ messages in thread
From: Daniel Vetter @ 2021-04-13 20:07 UTC (permalink / raw)
  To: Christian König
  Cc: Andrey Grodzovsky, Kuehling, Felix, amd-gfx, Deucher, Alexander,
	Christian König, Li, Dennis, Zhang, Hawking

On Tue, Apr 13, 2021 at 9:10 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
> >
> > On 2021-04-12 3:18 p.m., Christian König wrote:
> >> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
> >>> [SNIP]
> >>>>>
> >>>>> So what's the right approach ? How we guarantee that when running
> >>>>> amdgpu_fence_driver_force_completion we will signal all the HW
> >>>>> fences and not racing against some more fences insertion into that
> >>>>> array ?
> >>>>>
> >>>>
> >>>> Well I would still say the best approach would be to insert this
> >>>> between the front end and the backend and not rely on signaling
> >>>> fences while holding the device srcu.
> >>>
> >>>
> >>> My question is, even now, when we run
> >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or
> >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion,
> >>> what there prevents a race with another fence being at the same time
> >>> emitted and inserted into the fence array ? Looks like nothing.
> >>>
> >>
> >> Each ring can only be used by one thread at the same time, this
> >> includes emitting fences as well as other stuff.
> >>
> >> During GPU reset we make sure nobody writes to the rings by stopping
> >> the scheduler and taking the GPU reset lock (so that nobody else can
> >> start the scheduler again).
> >
> >
> > What about direct submissions not through scheduler -
> > amdgpu_job_submit_direct, I don't see how this is protected.
>
> Those only happen during startup and GPU reset.
>
> >>
> >>>>
> >>>> BTW: Could it be that the device SRCU protects more than one device
> >>>> and we deadlock because of this?
> >>>
> >>>
> >>> I haven't actually experienced any deadlock until now but, yes,
> >>> drm_unplug_srcu is defined as static in drm_drv.c and so in the
> >>> presence  of multiple devices from same or different drivers we in
> >>> fact are dependent on all their critical sections i guess.
> >>>
> >>
> >> Shit, yeah the devil is a squirrel. So for A+I laptops we actually
> >> need to sync that up with Daniel and the rest of the i915 guys.
> >>
> >> IIRC we could actually have an amdgpu device in a docking station
> >> which needs hotplug and the driver might depend on waiting for the
> >> i915 driver as well.
> >
> >
> > Can't we propose a patch to make drm_unplug_srcu per drm_device ? I
> > don't see why it has to be global and not per device thing.
>
> I'm really wondering the same thing for quite a while now.
>
> Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.

SRCU isn't exactly the cheapest thing, but aside from that we could
make it per-device. I don't see much point in it though, since if you
do end up stuck in an ioctl this might happen with anything really.

Also note that dma_fence_waits are supposed to be time bound, so you
shouldn't end up waiting on them forever. It should all get sorted out
in due time with TDR I hope (e.g. if i915 is stuck on a fence because
you're unlucky).
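
For context, the core helpers involved are small; roughly (simplified
from memory of drm_drv.c, the point being that the SRCU instance is a
single static one shared by every device):

static DEFINE_STATIC_SRCU(drm_unplug_srcu);

bool drm_dev_enter(struct drm_device *dev, int *idx)
{
	*idx = srcu_read_lock(&drm_unplug_srcu);

	if (dev->unplugged) {
		srcu_read_unlock(&drm_unplug_srcu, *idx);
		return false;
	}

	return true;
}

void drm_dev_exit(int idx)
{
	srcu_read_unlock(&drm_unplug_srcu, idx);
}

void drm_dev_unplug(struct drm_device *dev)
{
	/*
	 * After the synchronize_srcu() every new critical section sees
	 * unplugged == true and every section that might have seen the
	 * old value has finished.  A per-device srcu_struct would only
	 * narrow what "has finished" covers.
	 */
	dev->unplugged = true;
	synchronize_srcu(&drm_unplug_srcu);

	drm_dev_unregister(dev);
}
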
-Daniel

>
> Regards,
> Christian.
>
> >
> > Andrey
> >
> >
> >>
> >> Christian.
> >>
> >>> Andrey
> >>>
> >>>
> >>>>
> >>>> Christian.
> >>>>
> >>>>> Andrey
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>> Andrey
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Christian.
> >>>>>>>>
> >>>>>>>>>     /* Past this point no more fence are submitted to HW ring
> >>>>>>>>> and hence we can safely call force signal on all that are
> >>>>>>>>> currently there.
> >>>>>>>>>      * Any subsequently created  HW fences will be returned
> >>>>>>>>> signaled with an error code right away
> >>>>>>>>>      */
> >>>>>>>>>
> >>>>>>>>>     for_each_ring(adev)
> >>>>>>>>>         amdgpu_fence_process(ring)
> >>>>>>>>>
> >>>>>>>>>     drm_dev_unplug(dev);
> >>>>>>>>>     Stop schedulers
> >>>>>>>>>     cancel_sync(all timers and queued works);
> >>>>>>>>>     hw_fini
> >>>>>>>>>     unmap_mmio
> >>>>>>>>>
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Andrey
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Alternatively grabbing the reset write side and stopping
> >>>>>>>>>>>>>> and then restarting the scheduler could work as well.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Christian.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I didn't get the above and I don't see why I need to reuse
> >>>>>>>>>>>>> the GPU reset rw_lock. I rely on the SRCU unplug flag for
> >>>>>>>>>>>>> unplug. Also, not clear to me why are we focusing on the
> >>>>>>>>>>>>> scheduler threads, any code patch to generate HW fences
> >>>>>>>>>>>>> should be covered, so any code leading to
> >>>>>>>>>>>>> amdgpu_fence_emit needs to be taken into account such as,
> >>>>>>>>>>>>> direct IB submissions, VM flushes e.t.c
> >>>>>>>>>>>>
> >>>>>>>>>>>> You need to work together with the reset lock anyway, cause
> >>>>>>>>>>>> a hotplug could run at the same time as a reset.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> For going my way indeed now I see now that I have to take
> >>>>>>>>>>> reset write side lock during HW fences signalling in order
> >>>>>>>>>>> to protect against scheduler/HW fences detachment and
> >>>>>>>>>>> reattachment during schedulers stop/restart. But if we go
> >>>>>>>>>>> with your approach  then calling drm_dev_unplug and scoping
> >>>>>>>>>>> amdgpu_job_timeout with drm_dev_enter/exit should be enough
> >>>>>>>>>>> to prevent any concurrent GPU resets during unplug. In fact
> >>>>>>>>>>> I already do it anyway -
> >>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Ceefa9c90ed8c405ec3b708d8fc46daaa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637536728550884740%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=UiNaJE%2BH45iYmbwSDnMSKZS5z0iak0fNlbbfYqKS2Jo%3D&amp;reserved=0
> >>>>>>>>>>
> >>>>>>>>>> Yes, good point as well.
> >>>>>>>>>>
> >>>>>>>>>> Christian.
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Andrey
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Christian.
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Andrey
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Christian.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Andrey
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Andrey
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-13  9:13                                                               ` Li, Dennis
  2021-04-13  9:14                                                                 ` Christian König
@ 2021-04-13 20:08                                                                 ` Daniel Vetter
  1 sibling, 0 replies; 56+ messages in thread
From: Daniel Vetter @ 2021-04-13 20:08 UTC (permalink / raw)
  To: Li, Dennis
  Cc: Grodzovsky, Andrey, Christian König, Kuehling, Felix,
	amd-gfx, Deucher, Alexander, Koenig, Christian, Zhang, Hawking

On Tue, Apr 13, 2021 at 11:13 AM Li, Dennis <Dennis.Li@amd.com> wrote:
>
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi, Christian and Andrey,
>       We maybe try to implement "wait" callback function of dma_fence_ops, when GPU reset or unplug happen, make this callback return - ENODEV, to notify the caller device lost.
>
>          * Must return -ERESTARTSYS if the wait is intr = true and the wait was
>          * interrupted, and remaining jiffies if fence has signaled, or 0 if wait
>          * timed out. Can also return other error values on custom implementations,
>          * which should be treated as if the fence is signaled. For example a hardware
>          * lockup could be reported like that.
>          *
>          * This callback is optional.
>          */
>         signed long (*wait)(struct dma_fence *fence,
>                             bool intr, signed long timeout);

Uh, this callback is for old horrors like unreliable irq delivery on
radeon. Please don't use it for anything; if we need to make fences
bail out on error then we need something that works for all fences.
Also TDR should recover you here already and make sure the
dma_fence_wait() is bound in time.
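
For completeness, the generic mechanism that works for all fences is
to attach the error to the fence itself before force-signaling it,
e.g. (sketch; the caller that iterates the fences to be
force-completed is left out):

static void force_complete_with_error(struct dma_fence *fence)
{
	/* Must be set before signaling, afterwards the status is fixed. */
	dma_fence_set_error(fence, -ENODEV);
	dma_fence_signal(fence);
}

Waiters then see the fence as signaled and can check the error via
dma_fence_get_status() instead of relying on a driver-specific .wait
implementation.
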
-Daniel

>
> Best Regards
> Dennis Li
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken@gmail.com>
> Sent: Tuesday, April 13, 2021 3:10 PM
> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Li, Dennis <Dennis.Li@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>; Daniel Vetter <daniel@ffwll.ch>
> Subject: Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
>
> Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
> >
> > On 2021-04-12 3:18 p.m., Christian König wrote:
> >> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
> >>> [SNIP]
> >>>>>
> >>>>> So what's the right approach ? How we guarantee that when running
> >>>>> amdgpu_fence_driver_force_completion we will signal all the HW
> >>>>> fences and not racing against some more fences insertion into that
> >>>>> array ?
> >>>>>
> >>>>
> >>>> Well I would still say the best approach would be to insert this
> >>>> between the front end and the backend and not rely on signaling
> >>>> fences while holding the device srcu.
> >>>
> >>>
> >>> My question is, even now, when we run
> >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or
> >>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion,
> >>> what there prevents a race with another fence being at the same time
> >>> emitted and inserted into the fence array ? Looks like nothing.
> >>>
> >>
> >> Each ring can only be used by one thread at the same time, this
> >> includes emitting fences as well as other stuff.
> >>
> >> During GPU reset we make sure nobody writes to the rings by stopping
> >> the scheduler and taking the GPU reset lock (so that nobody else can
> >> start the scheduler again).
> >
> >
> > What about direct submissions not through scheduler -
> > amdgpu_job_submit_direct, I don't see how this is protected.
>
> Those only happen during startup and GPU reset.
>
> >>
> >>>>
> >>>> BTW: Could it be that the device SRCU protects more than one device
> >>>> and we deadlock because of this?
> >>>
> >>>
> >>> I haven't actually experienced any deadlock until now but, yes,
> >>> drm_unplug_srcu is defined as static in drm_drv.c and so in the
> >>> presence  of multiple devices from same or different drivers we in
> >>> fact are dependent on all their critical sections i guess.
> >>>
> >>
> >> Shit, yeah the devil is a squirrel. So for A+I laptops we actually
> >> need to sync that up with Daniel and the rest of the i915 guys.
> >>
> >> IIRC we could actually have an amdgpu device in a docking station
> >> which needs hotplug and the driver might depend on waiting for the
> >> i915 driver as well.
> >
> >
> > Can't we propose a patch to make drm_unplug_srcu per drm_device ? I
> > don't see why it has to be global and not per device thing.
>
> I'm really wondering the same thing for quite a while now.
>
> Adding Daniel as well, maybe he knows why the drm_unplug_srcu is global.
>
> Regards,
> Christian.
>
> >
> > Andrey
> >
> >
> >>
> >> Christian.
> >>
> >>> Andrey
> >>>
> >>>
> >>>>
> >>>> Christian.
> >>>>
> >>>>> Andrey
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>> Andrey
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Christian.
> >>>>>>>>
> >>>>>>>>>     /* Past this point no more fence are submitted to HW ring
> >>>>>>>>> and hence we can safely call force signal on all that are
> >>>>>>>>> currently there.
> >>>>>>>>>      * Any subsequently created  HW fences will be returned
> >>>>>>>>> signaled with an error code right away
> >>>>>>>>>      */
> >>>>>>>>>
> >>>>>>>>>     for_each_ring(adev)
> >>>>>>>>>         amdgpu_fence_process(ring)
> >>>>>>>>>
> >>>>>>>>>     drm_dev_unplug(dev);
> >>>>>>>>>     Stop schedulers
> >>>>>>>>>     cancel_sync(all timers and queued works);
> >>>>>>>>>     hw_fini
> >>>>>>>>>     unmap_mmio
> >>>>>>>>>
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Andrey
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Alternatively grabbing the reset write side and stopping
> >>>>>>>>>>>>>> and then restarting the scheduler could work as well.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Christian.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I didn't get the above and I don't see why I need to reuse
> >>>>>>>>>>>>> the GPU reset rw_lock. I rely on the SRCU unplug flag for
> >>>>>>>>>>>>> unplug. Also, not clear to me why are we focusing on the
> >>>>>>>>>>>>> scheduler threads, any code patch to generate HW fences
> >>>>>>>>>>>>> should be covered, so any code leading to
> >>>>>>>>>>>>> amdgpu_fence_emit needs to be taken into account such as,
> >>>>>>>>>>>>> direct IB submissions, VM flushes e.t.c
> >>>>>>>>>>>>
> >>>>>>>>>>>> You need to work together with the reset lock anyway, cause
> >>>>>>>>>>>> a hotplug could run at the same time as a reset.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> For going my way indeed now I see now that I have to take
> >>>>>>>>>>> reset write side lock during HW fences signalling in order
> >>>>>>>>>>> to protect against scheduler/HW fences detachment and
> >>>>>>>>>>> reattachment during schedulers stop/restart. But if we go
> >>>>>>>>>>> with your approach  then calling drm_dev_unplug and scoping
> >>>>>>>>>>> amdgpu_job_timeout with drm_dev_enter/exit should be enough
> >>>>>>>>>>> to prevent any concurrent GPU resets during unplug. In fact
> >>>>>>>>>>> I already do it anyway -
> >>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2
> >>>>>>>>>>> F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh
> >>>>>>>>>>> %3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc
> >>>>>>>>>>> 36b1&amp;data=04%7C01%7CDennis.Li%40amd.com%7Cc7fc6cb505c34a
> >>>>>>>>>>> edfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C
> >>>>>>>>>>> 0%7C637538946323194151%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wL
> >>>>>>>>>>> jAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000
> >>>>>>>>>>> &amp;sdata=%2Fe%2BqJNlcuUjLHsLvfHCKqerK%2Ff8lzujqOBhnMBIRP8E
> >>>>>>>>>>> %3D&amp;reserved=0
> >>>>>>>>>>
> >>>>>>>>>> Yes, good point as well.
> >>>>>>>>>>
> >>>>>>>>>> Christian.
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Andrey
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Christian.
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Andrey
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Christian.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Andrey
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Andrey
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-13 18:30                                                                       ` Andrey Grodzovsky
@ 2021-04-14  7:01                                                                         ` Christian König
  2021-04-14 14:36                                                                           ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-14  7:01 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter

Am 13.04.21 um 20:30 schrieb Andrey Grodzovsky:
>
> On 2021-04-13 2:25 p.m., Christian König wrote:
>>
>>
>> Am 13.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>>
>>> On 2021-04-13 2:03 p.m., Christian König wrote:
>>>> Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky:
>>>>>
>>>>> On 2021-04-13 3:10 a.m., Christian König wrote:
>>>>>> Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
>>>>>>>
>>>>>>> On 2021-04-12 3:18 p.m., Christian König wrote:
>>>>>>>> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
>>>>>>>>> [SNIP]
>>>>>>>>>>>
>>>>>>>>>>> So what's the right approach ? How we guarantee that when 
>>>>>>>>>>> running amdgpu_fence_driver_force_completion we will signal 
>>>>>>>>>>> all the HW fences and not racing against some more fences 
>>>>>>>>>>> insertion into that array ?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Well I would still say the best approach would be to insert 
>>>>>>>>>> this between the front end and the backend and not rely on 
>>>>>>>>>> signaling fences while holding the device srcu.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> My question is, even now, when we run 
>>>>>>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
>>>>>>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
>>>>>>>>> what there prevents a race with another fence being at the 
>>>>>>>>> same time emitted and inserted into the fence array ? Looks 
>>>>>>>>> like nothing.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Each ring can only be used by one thread at the same time, this 
>>>>>>>> includes emitting fences as well as other stuff.
>>>>>>>>
>>>>>>>> During GPU reset we make sure nobody writes to the rings by 
>>>>>>>> stopping the scheduler and taking the GPU reset lock (so that 
>>>>>>>> nobody else can start the scheduler again).
>>>>>>>
>>>>>>>
>>>>>>> What about direct submissions not through scheduler - 
>>>>>>> amdgpu_job_submit_direct, I don't see how this is protected.
>>>>>>
>>>>>> Those only happen during startup and GPU reset.
>>>>>
>>>>>
>>>>> Ok, but then looks like I am missing something, see the following 
>>>>> steps in amdgpu_pci_remove -
>>>>>
>>>>> 1) Use disable_irq API function to stop and flush all in flight HW 
>>>>> interrupts handlers
>>>>>
>>>>> 2) Grab the reset lock and stop all the schedulers
>>>>>
>>>>> After above 2 steps the HW fences array is idle, no more 
>>>>> insertions and no more extractions from the array
>>>>>
>>>>> 3) Run one time amdgpu_fence_process to signal all current HW fences
>>>>>
>>>>> 4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), release 
>>>>> the GPU reset lock and go on with the rest of the sequence (cancel 
>>>>> timers, work items e.t.c)
>>>>>
>>>>> What's problematic in this sequence ?
>>>>
>>>> drm_dev_unplug() will wait for the IOCTLs to finish.
>>>>
>>>> The IOCTLs in turn can wait for fences. That can be both hardware 
>>>> fences, scheduler fences, as well as fences from other devices (and 
>>>> KIQ fences for register writes under SRIOV, but we can hopefully 
>>>> ignore them for now).
>>>>
>>>> We have handled the hardware fences, but we have no idea when the 
>>>> scheduler fences or the fences from other devices will signal.
>>>>
>>>> Scheduler fences won't signal until the scheduler threads are 
>>>> restarted or we somehow cancel the submissions. Doable, but tricky 
>>>> as well.
>>>
>>>
>>> For scheduler fences I am not worried, for the sched_fence->finished 
>>> fence they are by definition attached to HW fences which already 
>>> signaledfor sched_fence->scheduled we should run 
>>> drm_sched_entity_kill_jobs for each entity after stopping the 
>>> scheduler threads and before setting drm_dev_unplug.
>>
>> Well exactly that is what is tricky here. 
>> drm_sched_entity_kill_jobs() assumes that there are no more jobs 
>> pushed into the entity.
>>
>> We are racing here once more and need to handle that.
>
>
> But why, I wrote above that we first stop the all schedulers, then 
> only call drm_sched_entity_kill_jobs.

The schedulers consuming jobs is not the problem, we already handle that 
correctly.

The problem is that the entities might continue feeding stuff into the 
scheduler.
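
The feeding path in question is the normal submission side, which does 
not care whether the scheduler threads are parked; roughly (sketch 
with the job/entity setup elided, using the two-argument push_job API 
of the kernel this thread is based on):

static int submit_one_job(struct drm_sched_job *job,
			  struct drm_sched_entity *entity, void *owner)
{
	int r;

	r = drm_sched_job_init(job, entity, owner);
	if (r)
		return r;

	/*
	 * Nothing here checks for a pending unplug: a job pushed at this
	 * point, after drm_sched_entity_kill_jobs() already ran for the
	 * entity, is exactly the race described above.
	 */
	drm_sched_entity_push_job(job, entity);
	return 0;
}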

>>
>>>>
>>>> For waiting for other device I have no idea if that couldn't 
>>>> deadlock somehow.
>>>
>>>
>>> Yea, not sure for imported fences and dma_bufs, I would assume the 
>>> other devices should not be impacted by our device removal but, who 
>>> knows...
>>>
>>> So I guess we are NOT going with finalizing HW fences before 
>>> drm_dev_unplug and instead will just call drm_dev_enter/exit at the 
>>> back-ends all over the place where there are MMIO accesses ?
>>
>> Good question. As you said that is really the hard path.
>>
>> Handling it all at once at IOCTL level certainly has some appeal as 
>> well, but I have no idea if we can guarantee that this is lock free.
>
>
> Maybe just empirically - let's try it and see under different test 
> scenarios what actually happens  ?

Not a good idea in general; we take that approach way too often at AMD 
and are then surprised that everything works in QA but fails in 
production.

But Daniel already noted in his reply that waiting for a fence while 
holding the SRCU is expected to work.

So let's stick with the approach of high level locking for hotplug.
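
Spelled out, high-level locking for hotplug on the removal path could 
look roughly like this (sketch only; the function name and the elided 
teardown steps are illustrative, the reset_sem is the one introduced 
in this series):

static void amdgpu_pci_remove_sketch(struct pci_dev *pdev)
{
	struct drm_device *dev = pci_get_drvdata(pdev);
	struct amdgpu_device *adev = drm_to_adev(dev);

	/* Exclude a concurrent GPU reset while tearing the device down. */
	down_write(&adev->reset_sem);

	/* ... stop schedulers, force-complete HW fences ... */

	up_write(&adev->reset_sem);

	/*
	 * Flush in-flight IOCTL critical sections and mark the device as
	 * gone; everything below runs with userspace locked out.
	 */
	drm_dev_unplug(dev);

	/* ... cancel timers and work items, hw_fini, unmap MMIO ... */
}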

Christian.

>
> Andrey
>
>
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> BTW: Could it be that the device SRCU protects more than one 
>>>>>>>>>> device and we deadlock because of this?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I haven't actually experienced any deadlock until now but, 
>>>>>>>>> yes, drm_unplug_srcu is defined as static in drm_drv.c and so 
>>>>>>>>> in the presence  of multiple devices from same or different 
>>>>>>>>> drivers we in fact are dependent on all their critical 
>>>>>>>>> sections i guess.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Shit, yeah the devil is a squirrel. So for A+I laptops we 
>>>>>>>> actually need to sync that up with Daniel and the rest of the 
>>>>>>>> i915 guys.
>>>>>>>>
>>>>>>>> IIRC we could actually have an amdgpu device in a docking 
>>>>>>>> station which needs hotplug and the driver might depend on 
>>>>>>>> waiting for the i915 driver as well.
>>>>>>>
>>>>>>>
>>>>>>> Can't we propose a patch to make drm_unplug_srcu per drm_device 
>>>>>>> ? I don't see why it has to be global and not per device thing.
>>>>>>
>>>>>> I'm really wondering the same thing for quite a while now.
>>>>>>
>>>>>> Adding Daniel as well, maybe he knows why the drm_unplug_srcu is 
>>>>>> global.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     /* Past this point no more fence are submitted to HW 
>>>>>>>>>>>>>>> ring and hence we can safely call force signal on all 
>>>>>>>>>>>>>>> that are currently there.
>>>>>>>>>>>>>>>      * Any subsequently created  HW fences will be 
>>>>>>>>>>>>>>> returned signaled with an error code right away
>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>>>>>>>     Stop schedulers
>>>>>>>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>>>>>>>     hw_fini
>>>>>>>>>>>>>>>     unmap_mmio
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Alternatively grabbing the reset write side and 
>>>>>>>>>>>>>>>>>>>> stopping and then restarting the scheduler could 
>>>>>>>>>>>>>>>>>>>> work as well.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I didn't get the above and I don't see why I need to 
>>>>>>>>>>>>>>>>>>> reuse the GPU reset rw_lock. I rely on the SRCU 
>>>>>>>>>>>>>>>>>>> unplug flag for unplug. Also, not clear to me why 
>>>>>>>>>>>>>>>>>>> are we focusing on the scheduler threads, any code 
>>>>>>>>>>>>>>>>>>> patch to generate HW fences should be covered, so 
>>>>>>>>>>>>>>>>>>> any code leading to amdgpu_fence_emit needs to be 
>>>>>>>>>>>>>>>>>>> taken into account such as, direct IB submissions, 
>>>>>>>>>>>>>>>>>>> VM flushes e.t.c
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You need to work together with the reset lock anyway, 
>>>>>>>>>>>>>>>>>> cause a hotplug could run at the same time as a reset.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For going my way indeed now I see now that I have to 
>>>>>>>>>>>>>>>>> take reset write side lock during HW fences signalling 
>>>>>>>>>>>>>>>>> in order to protect against scheduler/HW fences 
>>>>>>>>>>>>>>>>> detachment and reattachment during schedulers 
>>>>>>>>>>>>>>>>> stop/restart. But if we go with your approach  then 
>>>>>>>>>>>>>>>>> calling drm_dev_unplug and scoping amdgpu_job_timeout 
>>>>>>>>>>>>>>>>> with drm_dev_enter/exit should be enough to prevent 
>>>>>>>>>>>>>>>>> any concurrent GPU resets during unplug. In fact I 
>>>>>>>>>>>>>>>>> already do it anyway - 
>>>>>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cc7fc6cb505c34aedfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637538946324857369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=64362PRC8xTgR2Uj2R256bMegVm8YWq1KI%2BAjzeYXv4%3D&amp;reserved=0 
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-14  7:01                                                                         ` Christian König
@ 2021-04-14 14:36                                                                           ` Andrey Grodzovsky
  2021-04-14 14:58                                                                             ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-14 14:36 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter


On 2021-04-14 3:01 a.m., Christian König wrote:
> Am 13.04.21 um 20:30 schrieb Andrey Grodzovsky:
>>
>> On 2021-04-13 2:25 p.m., Christian König wrote:
>>>
>>>
>>> Am 13.04.21 um 20:18 schrieb Andrey Grodzovsky:
>>>>
>>>> On 2021-04-13 2:03 p.m., Christian König wrote:
>>>>> Am 13.04.21 um 17:12 schrieb Andrey Grodzovsky:
>>>>>>
>>>>>> On 2021-04-13 3:10 a.m., Christian König wrote:
>>>>>>> Am 12.04.21 um 22:01 schrieb Andrey Grodzovsky:
>>>>>>>>
>>>>>>>> On 2021-04-12 3:18 p.m., Christian König wrote:
>>>>>>>>> Am 12.04.21 um 21:12 schrieb Andrey Grodzovsky:
>>>>>>>>>> [SNIP]
>>>>>>>>>>>>
>>>>>>>>>>>> So what's the right approach ? How we guarantee that when 
>>>>>>>>>>>> running amdgpu_fence_driver_force_completion we will signal 
>>>>>>>>>>>> all the HW fences and not racing against some more fences 
>>>>>>>>>>>> insertion into that array ?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Well I would still say the best approach would be to insert 
>>>>>>>>>>> this between the front end and the backend and not rely on 
>>>>>>>>>>> signaling fences while holding the device srcu.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> My question is, even now, when we run 
>>>>>>>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_wait_empty or 
>>>>>>>>>> amdgpu_fence_driver_fini_hw->amdgpu_fence_driver_force_completion, 
>>>>>>>>>> what there prevents a race with another fence being at the 
>>>>>>>>>> same time emitted and inserted into the fence array ? Looks 
>>>>>>>>>> like nothing.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Each ring can only be used by one thread at the same time, 
>>>>>>>>> this includes emitting fences as well as other stuff.
>>>>>>>>>
>>>>>>>>> During GPU reset we make sure nobody writes to the rings by 
>>>>>>>>> stopping the scheduler and taking the GPU reset lock (so that 
>>>>>>>>> nobody else can start the scheduler again).
>>>>>>>>
>>>>>>>>
>>>>>>>> What about direct submissions not through scheduler - 
>>>>>>>> amdgpu_job_submit_direct, I don't see how this is protected.
>>>>>>>
>>>>>>> Those only happen during startup and GPU reset.
>>>>>>
>>>>>>
>>>>>> Ok, but then looks like I am missing something, see the following 
>>>>>> steps in amdgpu_pci_remove -
>>>>>>
>>>>>> 1) Use disable_irq API function to stop and flush all in flight 
>>>>>> HW interrupts handlers
>>>>>>
>>>>>> 2) Grab the reset lock and stop all the schedulers
>>>>>>
>>>>>> After above 2 steps the HW fences array is idle, no more 
>>>>>> insertions and no more extractions from the array
>>>>>>
>>>>>> 3) Run one time amdgpu_fence_process to signal all current HW fences
>>>>>>
>>>>>> 4) Set drm_dev_unplug (will 'flush' all in flight IOCTLs), 
>>>>>> release the GPU reset lock and go on with the rest of the 
>>>>>> sequence (cancel timers, work items e.t.c)
>>>>>>
>>>>>> What's problematic in this sequence ?
>>>>>
>>>>> drm_dev_unplug() will wait for the IOCTLs to finish.
>>>>>
>>>>> The IOCTLs in turn can wait for fences. That can be both hardware 
>>>>> fences, scheduler fences, as well as fences from other devices 
>>>>> (and KIQ fences for register writes under SRIOV, but we can 
>>>>> hopefully ignore them for now).
>>>>>
>>>>> We have handled the hardware fences, but we have no idea when the 
>>>>> scheduler fences or the fences from other devices will signal.
>>>>>
>>>>> Scheduler fences won't signal until the scheduler threads are 
>>>>> restarted or we somehow cancel the submissions. Doable, but tricky 
>>>>> as well.
>>>>
>>>>
>>>> For scheduler fences I am not worried, for the 
>>>> sched_fence->finished fence they are by definition attached to HW 
>>>> fences which already signaled; for sched_fence->scheduled we should 
>>>> run drm_sched_entity_kill_jobs for each entity after stopping the 
>>>> scheduler threads and before setting drm_dev_unplug.
>>>
>>> Well exactly that is what is tricky here. 
>>> drm_sched_entity_kill_jobs() assumes that there are no more jobs 
>>> pushed into the entity.
>>>
>>> We are racing here once more and need to handle that.
>>
>>
>> But why, I wrote above that we first stop the all schedulers, then 
>> only call drm_sched_entity_kill_jobs.
>
> The schedulers consuming jobs is not the problem, we already handle 
> that correct.
>
> The problem is that the entities might continue feeding stuff into the 
> scheduler.


Missed that. OK, can I just use non-sleeping RCU with a flag around 
drm_sched_entity_push_job at the amdgpu level (only 2 functions call it 
- amdgpu_cs_submit and amdgpu_job_submit) as a preliminary step to flush 
and block in-flight and future submissions to the entity queue?
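
Roughly something like this - just a sketch, the in_unplug flag is made
up here and the error code is debatable:

    /* amdgpu side, wrapping both drm_sched_entity_push_job() callers */
    rcu_read_lock();
    if (READ_ONCE(adev->in_unplug)) {       /* hypothetical flag */
            rcu_read_unlock();
            return -ENODEV;
    }
    drm_sched_entity_push_job(&job->base, entity);
    rcu_read_unlock();

    /* unplug side, before calling drm_sched_entity_kill_jobs() */
    WRITE_ONCE(adev->in_unplug, true);
    synchronize_rcu();  /* wait for in-flight push_job sections to drain */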


>
>>>
>>>>>
>>>>> For waiting for other device I have no idea if that couldn't 
>>>>> deadlock somehow.
>>>>
>>>>
>>>> Yea, not sure for imported fences and dma_bufs, I would assume the 
>>>> other devices should not be impacted by our device removal but, who 
>>>> knows...
>>>>
>>>> So I guess we are NOT going with finalizing HW fences before 
>>>> drm_dev_unplug and instead will just call drm_dev_enter/exit at the 
>>>> back-ends all over the place where there are MMIO accesses ?
>>>
>>> Good question. As you said that is really the hard path.
>>>
>>> Handling it all at once at IOCTL level certainly has some appeal as 
>>> well, but I have no idea if we can guarantee that this is lock free.
>>
>>
>> Maybe just empirically - let's try it and see under different test 
>> scenarios what actually happens  ?
>
> Not a good idea in general, we have that approach way to often at AMD 
> and are then surprised that everything works in QA but fails in 
> production.
>
> But Daniel already noted in his reply that waiting for a fence while 
> holding the SRCU is expected to work.
>
> So let's stick with the approach of high level locking for hotplug.


To my understanding this is true for other devices, not the one being 
extracted; for that one you still need to do all the HW fence signalling 
dance, because the HW is gone and we block any TDRs (which won't help 
anyway).

Andrey


>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> BTW: Could it be that the device SRCU protects more than one 
>>>>>>>>>>> device and we deadlock because of this?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I haven't actually experienced any deadlock until now but, 
>>>>>>>>>> yes, drm_unplug_srcu is defined as static in drm_drv.c and so 
>>>>>>>>>> in the presence of multiple devices from same or different 
>>>>>>>>>> drivers we in fact are dependent on all their critical 
>>>>>>>>>> sections i guess.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Shit, yeah the devil is a squirrel. So for A+I laptops we 
>>>>>>>>> actually need to sync that up with Daniel and the rest of the 
>>>>>>>>> i915 guys.
>>>>>>>>>
>>>>>>>>> IIRC we could actually have an amdgpu device in a docking 
>>>>>>>>> station which needs hotplug and the driver might depend on 
>>>>>>>>> waiting for the i915 driver as well.
>>>>>>>>
>>>>>>>>
>>>>>>>> Can't we propose a patch to make drm_unplug_srcu per drm_device 
>>>>>>>> ? I don't see why it has to be global and not per device thing.
>>>>>>>
>>>>>>> I'm really wondering the same thing for quite a while now.
>>>>>>>
>>>>>>> Adding Daniel as well, maybe he knows why the drm_unplug_srcu is 
>>>>>>> global.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     /* Past this point no more fence are submitted to 
>>>>>>>>>>>>>>>> HW ring and hence we can safely call force signal on 
>>>>>>>>>>>>>>>> all that are currently there.
>>>>>>>>>>>>>>>>      * Any subsequently created  HW fences will be 
>>>>>>>>>>>>>>>> returned signaled with an error code right away
>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>>>>>>>>     Stop schedulers
>>>>>>>>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>>>>>>>>     hw_fini
>>>>>>>>>>>>>>>>     unmap_mmio
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Alternatively grabbing the reset write side and 
>>>>>>>>>>>>>>>>>>>>> stopping and then restarting the scheduler could 
>>>>>>>>>>>>>>>>>>>>> work as well.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I didn't get the above and I don't see why I need 
>>>>>>>>>>>>>>>>>>>> to reuse the GPU reset rw_lock. I rely on the SRCU 
>>>>>>>>>>>>>>>>>>>> unplug flag for unplug. Also, not clear to me why 
>>>>>>>>>>>>>>>>>>>> are we focusing on the scheduler threads, any code 
>>>>>>>>>>>>>>>>>>>> patch to generate HW fences should be covered, so 
>>>>>>>>>>>>>>>>>>>> any code leading to amdgpu_fence_emit needs to be 
>>>>>>>>>>>>>>>>>>>> taken into account such as, direct IB submissions, 
>>>>>>>>>>>>>>>>>>>> VM flushes e.t.c
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You need to work together with the reset lock 
>>>>>>>>>>>>>>>>>>> anyway, cause a hotplug could run at the same time 
>>>>>>>>>>>>>>>>>>> as a reset.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For going my way indeed now I see now that I have to 
>>>>>>>>>>>>>>>>>> take reset write side lock during HW fences 
>>>>>>>>>>>>>>>>>> signalling in order to protect against scheduler/HW 
>>>>>>>>>>>>>>>>>> fences detachment and reattachment during schedulers 
>>>>>>>>>>>>>>>>>> stop/restart. But if we go with your approach  then 
>>>>>>>>>>>>>>>>>> calling drm_dev_unplug and scoping amdgpu_job_timeout 
>>>>>>>>>>>>>>>>>> with drm_dev_enter/exit should be enough to prevent 
>>>>>>>>>>>>>>>>>> any concurrent GPU resets during unplug. In fact I 
>>>>>>>>>>>>>>>>>> already do it anyway - 
>>>>>>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cc7fc6cb505c34aedfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637538946324857369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=64362PRC8xTgR2Uj2R256bMegVm8YWq1KI%2BAjzeYXv4%3D&amp;reserved=0 
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-14 14:36                                                                           ` Andrey Grodzovsky
@ 2021-04-14 14:58                                                                             ` Christian König
  2021-04-15  6:27                                                                               ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-14 14:58 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter

Am 14.04.21 um 16:36 schrieb Andrey Grodzovsky:
>  [SNIP]
>>>>
>>>> We are racing here once more and need to handle that.
>>>
>>>
>>> But why, I wrote above that we first stop the all schedulers, then 
>>> only call drm_sched_entity_kill_jobs.
>>
>> The schedulers consuming jobs is not the problem, we already handle 
>> that correct.
>>
>> The problem is that the entities might continue feeding stuff into 
>> the scheduler.
>
>
> Missed that.  Ok, can I just use non sleeping RCU with a flag around 
> drm_sched_entity_push_job at the amdgpu level (only 2 functions call 
> it - amdgpu_cs_submit and amdgpu_job_submit) as a preliminary step to 
> flush and block in flight and future submissions to entity queue ?

Double-checking the code, I think we can use the notifier_lock for this.

E.g. in amdgpu_cs.c, see where we have the goto error_abort.

That is the place where such a check could be added without any 
additional overhead.
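
Something along these lines, just as a sketch - the actual unplug check
is a placeholder, it could be a dedicated flag or the drm_dev_enter()
state:

    /* amdgpu_cs_submit(), inside the notifier_lock protected section */
    mutex_lock(&p->adev->notifier_lock);

    if (amdgpu_device_unplugged(p->adev)) { /* placeholder check */
            r = -ENODEV;
            goto error_abort;   /* drops notifier_lock on the way out */
    }

    ...
    drm_sched_entity_push_job(&job->base, entity);

The unplug path then sets the flag and takes/drops the notifier_lock
once, after which no new job can reach an entity.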

Christian.

>
>
>>
>>>>
>>>>>>
>>>>>> For waiting for other device I have no idea if that couldn't 
>>>>>> deadlock somehow.
>>>>>
>>>>>
>>>>> Yea, not sure for imported fences and dma_bufs, I would assume the 
>>>>> other devices should not be impacted by our device removal but, 
>>>>> who knows...
>>>>>
>>>>> So I guess we are NOT going with finalizing HW fences before 
>>>>> drm_dev_unplug and instead will just call drm_dev_enter/exit at 
>>>>> the back-ends all over the place where there are MMIO accesses ?
>>>>
>>>> Good question. As you said that is really the hard path.
>>>>
>>>> Handling it all at once at IOCTL level certainly has some appeal as 
>>>> well, but I have no idea if we can guarantee that this is lock free.
>>>
>>>
>>> Maybe just empirically - let's try it and see under different test 
>>> scenarios what actually happens  ?
>>
>> Not a good idea in general, we have that approach way to often at AMD 
>> and are then surprised that everything works in QA but fails in 
>> production.
>>
>> But Daniel already noted in his reply that waiting for a fence while 
>> holding the SRCU is expected to work.
>>
>> So let's stick with the approach of high level locking for hotplug.
>
>
> To my understanding this is true for other devises, not the one being 
> extracted, for him you still need to do all the HW fence signalling 
> dance because the HW is gone and we block any TDRs (which won't help 
> anyway).
>
> Andrey
>
>
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> BTW: Could it be that the device SRCU protects more than 
>>>>>>>>>>>> one device and we deadlock because of this?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I haven't actually experienced any deadlock until now but, 
>>>>>>>>>>> yes, drm_unplug_srcu is defined as static in drm_drv.c and 
>>>>>>>>>>> so in the presence of multiple devices from same or 
>>>>>>>>>>> different drivers we in fact are dependent on all their 
>>>>>>>>>>> critical sections i guess.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Shit, yeah the devil is a squirrel. So for A+I laptops we 
>>>>>>>>>> actually need to sync that up with Daniel and the rest of the 
>>>>>>>>>> i915 guys.
>>>>>>>>>>
>>>>>>>>>> IIRC we could actually have an amdgpu device in a docking 
>>>>>>>>>> station which needs hotplug and the driver might depend on 
>>>>>>>>>> waiting for the i915 driver as well.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can't we propose a patch to make drm_unplug_srcu per 
>>>>>>>>> drm_device ? I don't see why it has to be global and not per 
>>>>>>>>> device thing.
>>>>>>>>
>>>>>>>> I'm really wondering the same thing for quite a while now.
>>>>>>>>
>>>>>>>> Adding Daniel as well, maybe he knows why the drm_unplug_srcu 
>>>>>>>> is global.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     /* Past this point no more fence are submitted to 
>>>>>>>>>>>>>>>>> HW ring and hence we can safely call force signal on 
>>>>>>>>>>>>>>>>> all that are currently there.
>>>>>>>>>>>>>>>>>      * Any subsequently created HW fences will be 
>>>>>>>>>>>>>>>>> returned signaled with an error code right away
>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>>>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>>>>>>>>>     Stop schedulers
>>>>>>>>>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>>>>>>>>>     hw_fini
>>>>>>>>>>>>>>>>>     unmap_mmio
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Alternatively grabbing the reset write side and 
>>>>>>>>>>>>>>>>>>>>>> stopping and then restarting the scheduler could 
>>>>>>>>>>>>>>>>>>>>>> work as well.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I didn't get the above and I don't see why I need 
>>>>>>>>>>>>>>>>>>>>> to reuse the GPU reset rw_lock. I rely on the SRCU 
>>>>>>>>>>>>>>>>>>>>> unplug flag for unplug. Also, not clear to me why 
>>>>>>>>>>>>>>>>>>>>> are we focusing on the scheduler threads, any code 
>>>>>>>>>>>>>>>>>>>>> patch to generate HW fences should be covered, so 
>>>>>>>>>>>>>>>>>>>>> any code leading to amdgpu_fence_emit needs to be 
>>>>>>>>>>>>>>>>>>>>> taken into account such as, direct IB submissions, 
>>>>>>>>>>>>>>>>>>>>> VM flushes e.t.c
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> You need to work together with the reset lock 
>>>>>>>>>>>>>>>>>>>> anyway, cause a hotplug could run at the same time 
>>>>>>>>>>>>>>>>>>>> as a reset.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For going my way indeed now I see now that I have to 
>>>>>>>>>>>>>>>>>>> take reset write side lock during HW fences 
>>>>>>>>>>>>>>>>>>> signalling in order to protect against scheduler/HW 
>>>>>>>>>>>>>>>>>>> fences detachment and reattachment during schedulers 
>>>>>>>>>>>>>>>>>>> stop/restart. But if we go with your approach then 
>>>>>>>>>>>>>>>>>>> calling drm_dev_unplug and scoping 
>>>>>>>>>>>>>>>>>>> amdgpu_job_timeout with drm_dev_enter/exit should be 
>>>>>>>>>>>>>>>>>>> enough to prevent any concurrent GPU resets during 
>>>>>>>>>>>>>>>>>>> unplug. In fact I already do it anyway - 
>>>>>>>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cc7fc6cb505c34aedfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637538946324857369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=64362PRC8xTgR2Uj2R256bMegVm8YWq1KI%2BAjzeYXv4%3D&amp;reserved=0 
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-14 14:58                                                                             ` Christian König
@ 2021-04-15  6:27                                                                               ` Andrey Grodzovsky
  2021-04-15  7:02                                                                                 ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-15  6:27 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter


On 2021-04-14 10:58 a.m., Christian König wrote:
> Am 14.04.21 um 16:36 schrieb Andrey Grodzovsky:
>>  [SNIP]
>>>>>
>>>>> We are racing here once more and need to handle that.
>>>>
>>>>
>>>> But why, I wrote above that we first stop the all schedulers, then 
>>>> only call drm_sched_entity_kill_jobs.
>>>
>>> The schedulers consuming jobs is not the problem, we already handle 
>>> that correct.
>>>
>>> The problem is that the entities might continue feeding stuff into 
>>> the scheduler.
>>
>>
>> Missed that.  Ok, can I just use non sleeping RCU with a flag around 
>> drm_sched_entity_push_job at the amdgpu level (only 2 functions call 
>> it - amdgpu_cs_submit and amdgpu_job_submit) as a preliminary step to 
>> flush and block in flight and future submissions to entity queue ?
>
> Double checking the code I think we can use the notifier_lock for this.
>
> E.g. in amdgpu_cs.c see where we have the goto error_abort.
>
> That is the place where such a check could be added without any 
> additional overhead.


Sure, I will just have to add this lock to amdgpu_job_submit too.


>
> Christian.
>
>>
>>
>>>
>>>>>
>>>>>>>
>>>>>>> For waiting for other device I have no idea if that couldn't 
>>>>>>> deadlock somehow.
>>>>>>
>>>>>>
>>>>>> Yea, not sure for imported fences and dma_bufs, I would assume 
>>>>>> the other devices should not be impacted by our device removal 
>>>>>> but, who knows...
>>>>>>
>>>>>> So I guess we are NOT going with finalizing HW fences before 
>>>>>> drm_dev_unplug and instead will just call drm_dev_enter/exit at 
>>>>>> the back-ends all over the place where there are MMIO accesses ?
>>>>>
>>>>> Good question. As you said that is really the hard path.
>>>>>
>>>>> Handling it all at once at IOCTL level certainly has some appeal 
>>>>> as well, but I have no idea if we can guarantee that this is lock 
>>>>> free.
>>>>
>>>>
>>>> Maybe just empirically - let's try it and see under different test 
>>>> scenarios what actually happens  ?
>>>
>>> Not a good idea in general, we have that approach way to often at 
>>> AMD and are then surprised that everything works in QA but fails in 
>>> production.
>>>
>>> But Daniel already noted in his reply that waiting for a fence while 
>>> holding the SRCU is expected to work.
>>>
>>> So let's stick with the approach of high level locking for hotplug.
>>
>>
>> To my understanding this is true for other devises, not the one being 
>> extracted, for him you still need to do all the HW fence signalling 
>> dance because the HW is gone and we block any TDRs (which won't help 
>> anyway).
>>
>> Andrey


Do you agree with the above?

Andrey


>>
>>
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> BTW: Could it be that the device SRCU protects more than 
>>>>>>>>>>>>> one device and we deadlock because of this?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I haven't actually experienced any deadlock until now but, 
>>>>>>>>>>>> yes, drm_unplug_srcu is defined as static in drm_drv.c and 
>>>>>>>>>>>> so in the presence of multiple devices from same or 
>>>>>>>>>>>> different drivers we in fact are dependent on all their 
>>>>>>>>>>>> critical sections i guess.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Shit, yeah the devil is a squirrel. So for A+I laptops we 
>>>>>>>>>>> actually need to sync that up with Daniel and the rest of 
>>>>>>>>>>> the i915 guys.
>>>>>>>>>>>
>>>>>>>>>>> IIRC we could actually have an amdgpu device in a docking 
>>>>>>>>>>> station which needs hotplug and the driver might depend on 
>>>>>>>>>>> waiting for the i915 driver as well.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Can't we propose a patch to make drm_unplug_srcu per 
>>>>>>>>>> drm_device ? I don't see why it has to be global and not per 
>>>>>>>>>> device thing.
>>>>>>>>>
>>>>>>>>> I'm really wondering the same thing for quite a while now.
>>>>>>>>>
>>>>>>>>> Adding Daniel as well, maybe he knows why the drm_unplug_srcu 
>>>>>>>>> is global.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     /* Past this point no more fence are submitted to 
>>>>>>>>>>>>>>>>>> HW ring and hence we can safely call force signal on 
>>>>>>>>>>>>>>>>>> all that are currently there.
>>>>>>>>>>>>>>>>>>      * Any subsequently created HW fences will be 
>>>>>>>>>>>>>>>>>> returned signaled with an error code right away
>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>>>>>>>>>> amdgpu_fence_process(ring)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>>>>>>>>>>     Stop schedulers
>>>>>>>>>>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>>>>>>>>>>     hw_fini
>>>>>>>>>>>>>>>>>>     unmap_mmio
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Alternatively grabbing the reset write side and 
>>>>>>>>>>>>>>>>>>>>>>> stopping and then restarting the scheduler could 
>>>>>>>>>>>>>>>>>>>>>>> work as well.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I didn't get the above and I don't see why I need 
>>>>>>>>>>>>>>>>>>>>>> to reuse the GPU reset rw_lock. I rely on the 
>>>>>>>>>>>>>>>>>>>>>> SRCU unplug flag for unplug. Also, not clear to 
>>>>>>>>>>>>>>>>>>>>>> me why are we focusing on the scheduler threads, 
>>>>>>>>>>>>>>>>>>>>>> any code patch to generate HW fences should be 
>>>>>>>>>>>>>>>>>>>>>> covered, so any code leading to amdgpu_fence_emit 
>>>>>>>>>>>>>>>>>>>>>> needs to be taken into account such as, direct IB 
>>>>>>>>>>>>>>>>>>>>>> submissions, VM flushes e.t.c
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You need to work together with the reset lock 
>>>>>>>>>>>>>>>>>>>>> anyway, cause a hotplug could run at the same time 
>>>>>>>>>>>>>>>>>>>>> as a reset.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For going my way indeed now I see now that I have 
>>>>>>>>>>>>>>>>>>>> to take reset write side lock during HW fences 
>>>>>>>>>>>>>>>>>>>> signalling in order to protect against scheduler/HW 
>>>>>>>>>>>>>>>>>>>> fences detachment and reattachment during 
>>>>>>>>>>>>>>>>>>>> schedulers stop/restart. But if we go with your 
>>>>>>>>>>>>>>>>>>>> approach then calling drm_dev_unplug and scoping 
>>>>>>>>>>>>>>>>>>>> amdgpu_job_timeout with drm_dev_enter/exit should 
>>>>>>>>>>>>>>>>>>>> be enough to prevent any concurrent GPU resets 
>>>>>>>>>>>>>>>>>>>> during unplug. In fact I already do it anyway - 
>>>>>>>>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cc7fc6cb505c34aedfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637538946324857369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=64362PRC8xTgR2Uj2R256bMegVm8YWq1KI%2BAjzeYXv4%3D&amp;reserved=0 
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-15  6:27                                                                               ` Andrey Grodzovsky
@ 2021-04-15  7:02                                                                                 ` Christian König
  2021-04-15 14:11                                                                                   ` Andrey Grodzovsky
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2021-04-15  7:02 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter

Am 15.04.21 um 08:27 schrieb Andrey Grodzovsky:
>
> On 2021-04-14 10:58 a.m., Christian König wrote:
>> Am 14.04.21 um 16:36 schrieb Andrey Grodzovsky:
>>>  [SNIP]
>>>>>>
>>>>>> We are racing here once more and need to handle that.
>>>>>
>>>>>
>>>>> But why, I wrote above that we first stop the all schedulers, then 
>>>>> only call drm_sched_entity_kill_jobs.
>>>>
>>>> The schedulers consuming jobs is not the problem, we already handle 
>>>> that correct.
>>>>
>>>> The problem is that the entities might continue feeding stuff into 
>>>> the scheduler.
>>>
>>>
>>> Missed that.  Ok, can I just use non sleeping RCU with a flag around 
>>> drm_sched_entity_push_job at the amdgpu level (only 2 functions call 
>>> it - amdgpu_cs_submit and amdgpu_job_submit) as a preliminary step 
>>> to flush and block in flight and future submissions to entity queue ?
>>
>> Double checking the code I think we can use the notifier_lock for this.
>>
>> E.g. in amdgpu_cs.c see where we have the goto error_abort.
>>
>> That is the place where such a check could be added without any 
>> additional overhead.
>
>
> Sure, I will just have to add this lock to amdgpu_job_submit too.

Not ideal, but I think that's fine with me. You might want to rename the 
lock for this, though.

>
>> [SNIP]
>>>>>
>>>>> Maybe just empirically - let's try it and see under different test 
>>>>> scenarios what actually happens  ?
>>>>
>>>> Not a good idea in general, we have that approach way to often at 
>>>> AMD and are then surprised that everything works in QA but fails in 
>>>> production.
>>>>
>>>> But Daniel already noted in his reply that waiting for a fence 
>>>> while holding the SRCU is expected to work.
>>>>
>>>> So let's stick with the approach of high level locking for hotplug.
>>>
>>>
>>> To my understanding this is true for other devises, not the one 
>>> being extracted, for him you still need to do all the HW fence 
>>> signalling dance because the HW is gone and we block any TDRs (which 
>>> won't help anyway).
>>>
>>> Andrey
>
>
> Do you agree to the above ?

Yeah, I think that is correct.

But on the other hand, what Daniel reminded me of is that the handling 
needs to be consistent across different devices. And since some devices 
already go with the approach of canceling everything, we simply have to 
go down that route as well.

Christian.

>
> Andrey
>
>
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> BTW: Could it be that the device SRCU protects more than 
>>>>>>>>>>>>>> one device and we deadlock because of this?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I haven't actually experienced any deadlock until now but, 
>>>>>>>>>>>>> yes, drm_unplug_srcu is defined as static in drm_drv.c and 
>>>>>>>>>>>>> so in the presence of multiple devices from same or 
>>>>>>>>>>>>> different drivers we in fact are dependent on all their 
>>>>>>>>>>>>> critical sections i guess.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Shit, yeah the devil is a squirrel. So for A+I laptops we 
>>>>>>>>>>>> actually need to sync that up with Daniel and the rest of 
>>>>>>>>>>>> the i915 guys.
>>>>>>>>>>>>
>>>>>>>>>>>> IIRC we could actually have an amdgpu device in a docking 
>>>>>>>>>>>> station which needs hotplug and the driver might depend on 
>>>>>>>>>>>> waiting for the i915 driver as well.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Can't we propose a patch to make drm_unplug_srcu per 
>>>>>>>>>>> drm_device ? I don't see why it has to be global and not per 
>>>>>>>>>>> device thing.
>>>>>>>>>>
>>>>>>>>>> I'm really wondering the same thing for quite a while now.
>>>>>>>>>>
>>>>>>>>>> Adding Daniel as well, maybe he knows why the drm_unplug_srcu 
>>>>>>>>>> is global.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     /* Past this point no more fence are submitted 
>>>>>>>>>>>>>>>>>>> to HW ring and hence we can safely call force signal 
>>>>>>>>>>>>>>>>>>> on all that are currently there.
>>>>>>>>>>>>>>>>>>>      * Any subsequently created HW fences will be 
>>>>>>>>>>>>>>>>>>> returned signaled with an error code right away
>>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>>>>>>>>>>> amdgpu_fence_process(ring)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>>>>>>>>>>>     Stop schedulers
>>>>>>>>>>>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>>>>>>>>>>>     hw_fini
>>>>>>>>>>>>>>>>>>>     unmap_mmio
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Alternatively grabbing the reset write side and 
>>>>>>>>>>>>>>>>>>>>>>>> stopping and then restarting the scheduler 
>>>>>>>>>>>>>>>>>>>>>>>> could work as well.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I didn't get the above and I don't see why I 
>>>>>>>>>>>>>>>>>>>>>>> need to reuse the GPU reset rw_lock. I rely on 
>>>>>>>>>>>>>>>>>>>>>>> the SRCU unplug flag for unplug. Also, not clear 
>>>>>>>>>>>>>>>>>>>>>>> to me why are we focusing on the scheduler 
>>>>>>>>>>>>>>>>>>>>>>> threads, any code patch to generate HW fences 
>>>>>>>>>>>>>>>>>>>>>>> should be covered, so any code leading to 
>>>>>>>>>>>>>>>>>>>>>>> amdgpu_fence_emit needs to be taken into account 
>>>>>>>>>>>>>>>>>>>>>>> such as, direct IB submissions, VM flushes e.t.c
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> You need to work together with the reset lock 
>>>>>>>>>>>>>>>>>>>>>> anyway, cause a hotplug could run at the same 
>>>>>>>>>>>>>>>>>>>>>> time as a reset.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> For going my way indeed now I see now that I have 
>>>>>>>>>>>>>>>>>>>>> to take reset write side lock during HW fences 
>>>>>>>>>>>>>>>>>>>>> signalling in order to protect against 
>>>>>>>>>>>>>>>>>>>>> scheduler/HW fences detachment and reattachment 
>>>>>>>>>>>>>>>>>>>>> during schedulers stop/restart. But if we go with 
>>>>>>>>>>>>>>>>>>>>> your approach then calling drm_dev_unplug and 
>>>>>>>>>>>>>>>>>>>>> scoping amdgpu_job_timeout with drm_dev_enter/exit 
>>>>>>>>>>>>>>>>>>>>> should be enough to prevent any concurrent GPU 
>>>>>>>>>>>>>>>>>>>>> resets during unplug. In fact I already do it 
>>>>>>>>>>>>>>>>>>>>> anyway - 
>>>>>>>>>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cc7fc6cb505c34aedfe6d08d8fe4b3947%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637538946324857369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=64362PRC8xTgR2Uj2R256bMegVm8YWq1KI%2BAjzeYXv4%3D&amp;reserved=0 
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-15  7:02                                                                                 ` Christian König
@ 2021-04-15 14:11                                                                                   ` Andrey Grodzovsky
  2021-04-15 15:09                                                                                     ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Grodzovsky @ 2021-04-15 14:11 UTC (permalink / raw)
  To: Christian König, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter


On 2021-04-15 3:02 a.m., Christian König wrote:
> Am 15.04.21 um 08:27 schrieb Andrey Grodzovsky:
>>
>> On 2021-04-14 10:58 a.m., Christian König wrote:
>>> Am 14.04.21 um 16:36 schrieb Andrey Grodzovsky:
>>>>  [SNIP]
>>>>>>>
>>>>>>> We are racing here once more and need to handle that.
>>>>>>
>>>>>>
>>>>>> But why, I wrote above that we first stop the all schedulers, 
>>>>>> then only call drm_sched_entity_kill_jobs.
>>>>>
>>>>> The schedulers consuming jobs is not the problem, we already 
>>>>> handle that correct.
>>>>>
>>>>> The problem is that the entities might continue feeding stuff into 
>>>>> the scheduler.
>>>>
>>>>
>>>> Missed that.  Ok, can I just use non sleeping RCU with a flag 
>>>> around drm_sched_entity_push_job at the amdgpu level (only 2 
>>>> functions call it - amdgpu_cs_submit and amdgpu_job_submit) as a 
>>>> preliminary step to flush and block in flight and future 
>>>> submissions to entity queue ?
>>>
>>> Double checking the code I think we can use the notifier_lock for this.
>>>
>>> E.g. in amdgpu_cs.c see where we have the goto error_abort.
>>>
>>> That is the place where such a check could be added without any 
>>> additional overhead.
>>
>>
>> Sure, I will just have to add this lock to amdgpu_job_submit too.
>
> Not ideal, but I think that's fine with me. You might want to rename 
> the lock for this thought.
>
>>
>>> [SNIP]
>>>>>>
>>>>>> Maybe just empirically - let's try it and see under different 
>>>>>> test scenarios what actually happens  ?
>>>>>
>>>>> Not a good idea in general, we have that approach way to often at 
>>>>> AMD and are then surprised that everything works in QA but fails 
>>>>> in production.
>>>>>
>>>>> But Daniel already noted in his reply that waiting for a fence 
>>>>> while holding the SRCU is expected to work.
>>>>>
>>>>> So let's stick with the approach of high level locking for hotplug.
>>>>
>>>>
>>>> To my understanding this is true for other devises, not the one 
>>>> being extracted, for him you still need to do all the HW fence 
>>>> signalling dance because the HW is gone and we block any TDRs 
>>>> (which won't help anyway).
>>>>
>>>> Andrey
>>
>>
>> Do you agree to the above ?
>
> Yeah, I think that is correct.
>
> But on the other hand what Daniel reminded me of is that the handling 
> needs to be consistent over different devices. And since some device 
> already go with the approach of canceling everything we simply have to 
> go down that route as well.
>
> Christian.


What does that mean in our context? What needs to be done that we are 
not doing now?

Andrey


>
>>
>> Andrey
>>
>>
>>>>
>>>>
>>>>>
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> BTW: Could it be that the device SRCU protects more than 
>>>>>>>>>>>>>>> one device and we deadlock because of this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I haven't actually experienced any deadlock until now 
>>>>>>>>>>>>>> but, yes, drm_unplug_srcu is defined as static in 
>>>>>>>>>>>>>> drm_drv.c and so in the presence of multiple devices from 
>>>>>>>>>>>>>> same or different drivers we in fact are dependent on all 
>>>>>>>>>>>>>> their critical sections i guess.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Shit, yeah the devil is a squirrel. So for A+I laptops we 
>>>>>>>>>>>>> actually need to sync that up with Daniel and the rest of 
>>>>>>>>>>>>> the i915 guys.
>>>>>>>>>>>>>
>>>>>>>>>>>>> IIRC we could actually have an amdgpu device in a docking 
>>>>>>>>>>>>> station which needs hotplug and the driver might depend on 
>>>>>>>>>>>>> waiting for the i915 driver as well.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Can't we propose a patch to make drm_unplug_srcu per 
>>>>>>>>>>>> drm_device ? I don't see why it has to be global and not 
>>>>>>>>>>>> per device thing.
>>>>>>>>>>>
>>>>>>>>>>> I'm really wondering the same thing for quite a while now.
>>>>>>>>>>>
>>>>>>>>>>> Adding Daniel as well, maybe he knows why the 
>>>>>>>>>>> drm_unplug_srcu is global.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     /* Past this point no more fence are submitted 
>>>>>>>>>>>>>>>>>>>> to HW ring and hence we can safely call force 
>>>>>>>>>>>>>>>>>>>> signal on all that are currently there.
>>>>>>>>>>>>>>>>>>>>      * Any subsequently created HW fences will be 
>>>>>>>>>>>>>>>>>>>> returned signaled with an error code right away
>>>>>>>>>>>>>>>>>>>>      */
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>>>>>>>>>>>> amdgpu_fence_process(ring)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>>>>>>>>>>>>     Stop schedulers
>>>>>>>>>>>>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>>>>>>>>>>>>     hw_fini
>>>>>>>>>>>>>>>>>>>>     unmap_mmio
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Alternatively grabbing the reset write side 
>>>>>>>>>>>>>>>>>>>>>>>>> and stopping and then restarting the scheduler 
>>>>>>>>>>>>>>>>>>>>>>>>> could work as well.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I didn't get the above and I don't see why I 
>>>>>>>>>>>>>>>>>>>>>>>> need to reuse the GPU reset rw_lock. I rely on 
>>>>>>>>>>>>>>>>>>>>>>>> the SRCU unplug flag for unplug. Also, not 
>>>>>>>>>>>>>>>>>>>>>>>> clear to me why are we focusing on the 
>>>>>>>>>>>>>>>>>>>>>>>> scheduler threads, any code patch to generate 
>>>>>>>>>>>>>>>>>>>>>>>> HW fences should be covered, so any code 
>>>>>>>>>>>>>>>>>>>>>>>> leading to amdgpu_fence_emit needs to be taken 
>>>>>>>>>>>>>>>>>>>>>>>> into account such as, direct IB submissions, VM 
>>>>>>>>>>>>>>>>>>>>>>>> flushes e.t.c
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> You need to work together with the reset lock 
>>>>>>>>>>>>>>>>>>>>>>> anyway, cause a hotplug could run at the same 
>>>>>>>>>>>>>>>>>>>>>>> time as a reset.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> For going my way indeed now I see now that I have 
>>>>>>>>>>>>>>>>>>>>>> to take reset write side lock during HW fences 
>>>>>>>>>>>>>>>>>>>>>> signalling in order to protect against 
>>>>>>>>>>>>>>>>>>>>>> scheduler/HW fences detachment and reattachment 
>>>>>>>>>>>>>>>>>>>>>> during schedulers stop/restart. But if we go with 
>>>>>>>>>>>>>>>>>>>>>> your approach then calling drm_dev_unplug and 
>>>>>>>>>>>>>>>>>>>>>> scoping amdgpu_job_timeout with 
>>>>>>>>>>>>>>>>>>>>>> drm_dev_enter/exit should be enough to prevent 
>>>>>>>>>>>>>>>>>>>>>> any concurrent GPU resets during unplug. In fact 
>>>>>>>>>>>>>>>>>>>>>> I already do it anyway - 
>>>>>>>>>>>>>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Ddrm-misc-next%26id%3Def0ea4dd29ef44d2649c5eda16c8f4869acc36b1&amp;data=04%7C01%7Candrey.grodzovsky%40amd.com%7Ca64b1f5e0df0403a656408d8ffdc7bdb%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637540669732692484%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=pLcplnlDIESV998tLO7iydxEo5lh71BjQCbAOxKif2Q%3D&amp;reserved=0 
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 0/4] Refine GPU recovery sequence to enhance its stability
  2021-04-15 14:11                                                                                   ` Andrey Grodzovsky
@ 2021-04-15 15:09                                                                                     ` Christian König
  0 siblings, 0 replies; 56+ messages in thread
From: Christian König @ 2021-04-15 15:09 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Li, Dennis, amd-gfx,
	Deucher, Alexander, Kuehling, Felix, Zhang, Hawking,
	Daniel Vetter


>>>
>>>> [SNIP]
>>>>>>>
>>>>>>> Maybe just empirically - let's try it and see under different 
>>>>>>> test scenarios what actually happens  ?
>>>>>>
>>>>>> Not a good idea in general, we have that approach way to often at 
>>>>>> AMD and are then surprised that everything works in QA but fails 
>>>>>> in production.
>>>>>>
>>>>>> But Daniel already noted in his reply that waiting for a fence 
>>>>>> while holding the SRCU is expected to work.
>>>>>>
>>>>>> So let's stick with the approach of high level locking for hotplug.
>>>>>
>>>>>
>>>>> To my understanding this is true for the other devices, not the 
>>>>> one being extracted; for that one you still need to do the whole 
>>>>> HW fence signalling dance because the HW is gone and we block any 
>>>>> TDRs (which won't help anyway).
>>>>>
>>>>> Andrey
>>>
>>>
>>> Do you agree with the above?
>>
>> Yeah, I think that is correct.
>>
>> But on the other hand, what Daniel reminded me of is that the 
>> handling needs to be consistent across different devices. And since 
>> some devices already go with the approach of canceling everything, 
>> we simply have to go down that route as well.
>>
>> Christian.
>
>
> What does that mean in our context? What needs to be done that we 
> are not doing now?

I think we are fine; we just need to continue with the approach of 
forcefully signaling all fences on hotplug.

Christian.
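
(For illustration, a minimal sketch of what "forcefully signaling all fences
on hotplug" could look like, assuming the existing amdgpu ring array and the
amdgpu_fence_driver_force_completion() helper; the surrounding teardown order
- stopping the schedulers, drm_dev_unplug(), hw_fini - is the one sketched in
the pseudo-code quoted further down and is not repeated here.)

#include "amdgpu.h" /* assumes this sits inside the amdgpu driver */

/*
 * Sketch: force-complete the outstanding HW fences on every initialized
 * ring so that nothing keeps waiting on hardware that is about to
 * disappear. Assumes job submission has already been stopped.
 */
static void example_force_signal_all_fences(struct amdgpu_device *adev)
{
        int i;

        for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
                struct amdgpu_ring *ring = adev->rings[i];

                if (!ring || !ring->fence_drv.initialized)
                        continue;

                /* Marks all fences emitted so far on this ring as signaled. */
                amdgpu_fence_driver_force_completion(ring);
        }
}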

>
> Andrey
>
>
>>
>>>
>>> Andrey
>>>
>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> BTW: Could it be that the device SRCU protects more 
>>>>>>>>>>>>>>>> than one device and we deadlock because of this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I haven't actually experienced any deadlock until now, 
>>>>>>>>>>>>>>> but, yes, drm_unplug_srcu is defined as static in 
>>>>>>>>>>>>>>> drm_drv.c, and so in the presence of multiple devices 
>>>>>>>>>>>>>>> from the same or different drivers we are in fact 
>>>>>>>>>>>>>>> dependent on all their critical sections, I guess.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Shit, yeah the devil is a squirrel. So for A+I laptops we 
>>>>>>>>>>>>>> actually need to sync that up with Daniel and the rest of 
>>>>>>>>>>>>>> the i915 guys.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> IIRC we could actually have an amdgpu device in a docking 
>>>>>>>>>>>>>> station which needs hotplug and the driver might depend 
>>>>>>>>>>>>>> on waiting for the i915 driver as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can't we propose a patch to make drm_unplug_srcu per 
>>>>>>>>>>>>> drm_device? I don't see why it has to be global rather 
>>>>>>>>>>>>> than a per-device thing.
>>>>>>>>>>>>
>>>>>>>>>>>> I've really been wondering the same thing for quite a while now.
>>>>>>>>>>>>
>>>>>>>>>>>> Adding Daniel as well, maybe he knows why the 
>>>>>>>>>>>> drm_unplug_srcu is global.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> /* Past this point no more fences are submitted 
>>>>>>>>>>>>>>>>>>>>>  * to the HW rings, hence we can safely 
>>>>>>>>>>>>>>>>>>>>>  * force-signal all that are currently there. 
>>>>>>>>>>>>>>>>>>>>>  * Any subsequently created HW fences will be 
>>>>>>>>>>>>>>>>>>>>>  * returned signaled with an error code right away. 
>>>>>>>>>>>>>>>>>>>>>  */
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     for_each_ring(adev)
>>>>>>>>>>>>>>>>>>>>>         amdgpu_fence_process(ring)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>     drm_dev_unplug(dev);
>>>>>>>>>>>>>>>>>>>>>     Stop schedulers
>>>>>>>>>>>>>>>>>>>>>     cancel_sync(all timers and queued works);
>>>>>>>>>>>>>>>>>>>>>     hw_fini
>>>>>>>>>>>>>>>>>>>>>     unmap_mmio
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Alternatively grabbing the reset write side 
>>>>>>>>>>>>>>>>>>>>>>>>>> and stopping and then restarting the 
>>>>>>>>>>>>>>>>>>>>>>>>>> scheduler could work as well.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I didn't get the above, and I don't see why I 
>>>>>>>>>>>>>>>>>>>>>>>>> need to reuse the GPU reset rw_lock. I rely on 
>>>>>>>>>>>>>>>>>>>>>>>>> the SRCU unplug flag for unplug. Also, it's not 
>>>>>>>>>>>>>>>>>>>>>>>>> clear to me why we are focusing on the 
>>>>>>>>>>>>>>>>>>>>>>>>> scheduler threads; any code path that generates 
>>>>>>>>>>>>>>>>>>>>>>>>> HW fences should be covered, so any code 
>>>>>>>>>>>>>>>>>>>>>>>>> leading to amdgpu_fence_emit needs to be taken 
>>>>>>>>>>>>>>>>>>>>>>>>> into account, such as direct IB submissions, 
>>>>>>>>>>>>>>>>>>>>>>>>> VM flushes, etc.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> You need to work together with the reset lock 
>>>>>>>>>>>>>>>>>>>>>>>> anyway, because a hotplug could run at the same 
>>>>>>>>>>>>>>>>>>>>>>>> time as a reset.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Going my way, indeed I now see that I 
>>>>>>>>>>>>>>>>>>>>>>> have to take the reset write side lock during HW 
>>>>>>>>>>>>>>>>>>>>>>> fence signalling in order to protect against 
>>>>>>>>>>>>>>>>>>>>>>> scheduler/HW fence detachment and reattachment 
>>>>>>>>>>>>>>>>>>>>>>> during scheduler stop/restart. But if we go 
>>>>>>>>>>>>>>>>>>>>>>> with your approach, then calling drm_dev_unplug 
>>>>>>>>>>>>>>>>>>>>>>> and scoping amdgpu_job_timeout with 
>>>>>>>>>>>>>>>>>>>>>>> drm_dev_enter/exit should be enough to prevent 
>>>>>>>>>>>>>>>>>>>>>>> any concurrent GPU resets during unplug. In fact 
>>>>>>>>>>>>>>>>>>>>>>> I already do it anyway - 
>>>>>>>>>>>>>>>>>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=drm-misc-next&id=ef0ea4dd29ef44d2649c5eda16c8f4869acc36b1
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yes, good point as well.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Andrey

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2021-04-15 15:09 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-18  7:23 [PATCH 0/4] Refine GPU recovery sequence to enhance its stability Dennis Li
2021-03-18  7:23 ` [PATCH 1/4] drm/amdgpu: remove reset lock from low level functions Dennis Li
2021-03-18  7:23 ` [PATCH 2/4] drm/amdgpu: refine the GPU recovery sequence Dennis Li
2021-03-18  7:56   ` Christian König
2021-03-18  7:23 ` [PATCH 3/4] drm/amdgpu: instead of using down/up_read directly Dennis Li
2021-03-18  7:23 ` [PATCH 4/4] drm/amdkfd: add reset lock protection for kfd entry functions Dennis Li
2021-03-18  7:53 ` [PATCH 0/4] Refine GPU recovery sequence to enhance its stability Christian König
2021-03-18  8:28   ` Li, Dennis
2021-03-18  8:58     ` AW: " Koenig, Christian
2021-03-18  9:30       ` Li, Dennis
2021-03-18  9:51         ` Christian König
2021-04-05 17:58           ` Andrey Grodzovsky
2021-04-06 10:34             ` Christian König
2021-04-06 11:21               ` Christian König
2021-04-06 21:22               ` Andrey Grodzovsky
2021-04-07 10:28                 ` Christian König
2021-04-07 19:44                   ` Andrey Grodzovsky
2021-04-08  8:22                     ` Christian König
2021-04-08  8:32                       ` Christian König
2021-04-08 16:08                         ` Andrey Grodzovsky
2021-04-08 18:58                           ` Christian König
2021-04-08 20:39                             ` Andrey Grodzovsky
2021-04-09  6:53                               ` Christian König
2021-04-09  7:01                                 ` Christian König
2021-04-09 15:42                                   ` Andrey Grodzovsky
2021-04-09 16:39                                     ` Christian König
2021-04-09 18:18                                       ` Andrey Grodzovsky
2021-04-10 17:34                                         ` Christian König
2021-04-12 17:27                                           ` Andrey Grodzovsky
2021-04-12 17:44                                             ` Christian König
2021-04-12 18:01                                               ` Andrey Grodzovsky
2021-04-12 18:05                                                 ` Christian König
2021-04-12 18:18                                                   ` Andrey Grodzovsky
2021-04-12 18:23                                                     ` Christian König
2021-04-12 19:12                                                       ` Andrey Grodzovsky
2021-04-12 19:18                                                         ` Christian König
2021-04-12 20:01                                                           ` Andrey Grodzovsky
2021-04-13  7:10                                                             ` Christian König
2021-04-13  9:13                                                               ` Li, Dennis
2021-04-13  9:14                                                                 ` Christian König
2021-04-13 20:08                                                                 ` Daniel Vetter
2021-04-13 15:12                                                               ` Andrey Grodzovsky
2021-04-13 18:03                                                                 ` Christian König
2021-04-13 18:18                                                                   ` Andrey Grodzovsky
2021-04-13 18:25                                                                     ` Christian König
2021-04-13 18:30                                                                       ` Andrey Grodzovsky
2021-04-14  7:01                                                                         ` Christian König
2021-04-14 14:36                                                                           ` Andrey Grodzovsky
2021-04-14 14:58                                                                             ` Christian König
2021-04-15  6:27                                                                               ` Andrey Grodzovsky
2021-04-15  7:02                                                                                 ` Christian König
2021-04-15 14:11                                                                                   ` Andrey Grodzovsky
2021-04-15 15:09                                                                                     ` Christian König
2021-04-13 20:07                                                               ` Daniel Vetter
2021-04-13  5:36                                                       ` Andrey Grodzovsky
2021-04-13  7:07                                                         ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.