dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
To: <dri-devel@lists.freedesktop.org>, <amd-gfx@lists.freedesktop.org>
Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com
Subject: [RFC 5/6] drm/amdgpu: Drop hive->in_reset
Date: Fri, 17 Dec 2021 17:27:44 -0500	[thread overview]
Message-ID: <20211217222745.881637-6-andrey.grodzovsky@amd.com> (raw)
In-Reply-To: <20211217222745.881637-1-andrey.grodzovsky@amd.com>

Since we serialize all resets no need to protect from concurrent
resets.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  1 -
 3 files changed, 1 insertion(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 55cd67b9ede2..d2701e4d0622 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5013,25 +5013,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 	dev_info(adev->dev, "GPU %s begin!\n",
 		need_emergency_restart ? "jobs stop":"reset");
 
-	/*
-	 * Here we trylock to avoid chain of resets executing from
-	 * either trigger by jobs on different adevs in XGMI hive or jobs on
-	 * different schedulers for same device while this TO handler is running.
-	 * We always reset all schedulers for device and all devices for XGMI
-	 * hive so that should take care of them too.
-	 */
 	hive = amdgpu_get_xgmi_hive(adev);
-	if (hive) {
-		if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
-			DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
-				job ? job->base.id : -1, hive->hive_id);
-			amdgpu_put_xgmi_hive(hive);
-			if (job && job->vm)
-				drm_sched_increase_karma(&job->base);
-			return 0;
-		}
+	if (hive)
 		mutex_lock(&hive->hive_lock);
-	}
 
 	reset_context.method = AMD_RESET_METHOD_NONE;
 	reset_context.reset_req_dev = adev;
@@ -5226,7 +5210,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev,
 
 skip_recovery:
 	if (hive) {
-		atomic_set(&hive->in_reset, 0);
 		mutex_unlock(&hive->hive_lock);
 		amdgpu_put_xgmi_hive(hive);
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 8b116f398101..0d54bef5c494 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -403,7 +403,6 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev)
 	INIT_LIST_HEAD(&hive->device_list);
 	INIT_LIST_HEAD(&hive->node);
 	mutex_init(&hive->hive_lock);
-	atomic_set(&hive->in_reset, 0);
 	atomic_set(&hive->number_devices, 0);
 	task_barrier_init(&hive->tb);
 	hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index 6121aaa292cb..2f2ce53645a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -33,7 +33,6 @@ struct amdgpu_hive_info {
 	struct list_head node;
 	atomic_t number_devices;
 	struct mutex hive_lock;
-	atomic_t in_reset;
 	int hi_req_count;
 	struct amdgpu_device *hi_req_gpu;
 	struct task_barrier tb;
-- 
2.25.1


  parent reply	other threads:[~2021-12-17 22:28 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-17 22:27 [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu Andrey Grodzovsky
2021-12-17 22:27 ` [RFC 1/6] drm/amdgpu: Init GPU reset single threaded wq Andrey Grodzovsky
2021-12-17 22:27 ` [RFC 2/6] drm/amdgpu: Move scheduler init to after XGMI is ready Andrey Grodzovsky
2021-12-20  7:16   ` Christian König
2021-12-20 21:51     ` Andrey Grodzovsky
2021-12-21  7:05       ` Christian König
2021-12-17 22:27 ` [RFC 3/6] drm/amdgpu: Fix crash on modprobe Andrey Grodzovsky
2021-12-20  7:17   ` Christian König
2021-12-20 19:22     ` Andrey Grodzovsky
2021-12-21  7:02       ` Christian König
2021-12-21 16:03         ` Andrey Grodzovsky
2021-12-22  7:50           ` Christian König
2021-12-17 22:27 ` [RFC 4/6] drm/amdgpu: Serialize non TDR gpu recovery with TDRs Andrey Grodzovsky
2021-12-20  7:20   ` Christian König
2021-12-20 22:17     ` Andrey Grodzovsky
2021-12-21  7:59       ` Christian König
2021-12-21 16:10         ` Andrey Grodzovsky
2021-12-17 22:27 ` Andrey Grodzovsky [this message]
2021-12-17 22:27 ` [RFC 6/6] drm/amdgpu: Drop concurrent GPU reset protection for device Andrey Grodzovsky
2021-12-20  7:25 ` [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu Christian König
2021-12-20  9:43   ` Daniel Vetter
2021-12-20 17:06   ` Liu, Shaoyun
2021-12-20 19:11     ` Andrey Grodzovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211217222745.881637-6-andrey.grodzovsky@amd.com \
    --to=andrey.grodzovsky@amd.com \
    --cc=Monk.Liu@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=horace.chen@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).