From: Andrey Grodzovsky <andrey.grodzovsky@amd.com> To: <dri-devel@lists.freedesktop.org>, <amd-gfx@lists.freedesktop.org> Cc: Monk.Liu@amd.com, horace.chen@amd.com, christian.koenig@amd.com Subject: [RFC 5/6] drm/amdgpu: Drop hive->in_reset Date: Fri, 17 Dec 2021 17:27:44 -0500 [thread overview] Message-ID: <20211217222745.881637-6-andrey.grodzovsky@amd.com> (raw) In-Reply-To: <20211217222745.881637-1-andrey.grodzovsky@amd.com> Since we serialize all resets no need to protect from concurrent resets. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +------------------ drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 1 - 3 files changed, 1 insertion(+), 20 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 55cd67b9ede2..d2701e4d0622 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5013,25 +5013,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, dev_info(adev->dev, "GPU %s begin!\n", need_emergency_restart ? "jobs stop":"reset"); - /* - * Here we trylock to avoid chain of resets executing from - * either trigger by jobs on different adevs in XGMI hive or jobs on - * different schedulers for same device while this TO handler is running. - * We always reset all schedulers for device and all devices for XGMI - * hive so that should take care of them too. - */ hive = amdgpu_get_xgmi_hive(adev); - if (hive) { - if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) { - DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", - job ? job->base.id : -1, hive->hive_id); - amdgpu_put_xgmi_hive(hive); - if (job && job->vm) - drm_sched_increase_karma(&job->base); - return 0; - } + if (hive) mutex_lock(&hive->hive_lock); - } reset_context.method = AMD_RESET_METHOD_NONE; reset_context.reset_req_dev = adev; @@ -5226,7 +5210,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, skip_recovery: if (hive) { - atomic_set(&hive->in_reset, 0); mutex_unlock(&hive->hive_lock); amdgpu_put_xgmi_hive(hive); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index 8b116f398101..0d54bef5c494 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c @@ -403,7 +403,6 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev) INIT_LIST_HEAD(&hive->device_list); INIT_LIST_HEAD(&hive->node); mutex_init(&hive->hive_lock); - atomic_set(&hive->in_reset, 0); atomic_set(&hive->number_devices, 0); task_barrier_init(&hive->tb); hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h index 6121aaa292cb..2f2ce53645a5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h @@ -33,7 +33,6 @@ struct amdgpu_hive_info { struct list_head node; atomic_t number_devices; struct mutex hive_lock; - atomic_t in_reset; int hi_req_count; struct amdgpu_device *hi_req_gpu; struct task_barrier tb; -- 2.25.1
WARNING: multiple messages have this Message-ID (diff)
From: Andrey Grodzovsky <andrey.grodzovsky@amd.com> To: <dri-devel@lists.freedesktop.org>, <amd-gfx@lists.freedesktop.org> Cc: Monk.Liu@amd.com, Andrey Grodzovsky <andrey.grodzovsky@amd.com>, horace.chen@amd.com, christian.koenig@amd.com, daniel@ffwll.ch Subject: [RFC 5/6] drm/amdgpu: Drop hive->in_reset Date: Fri, 17 Dec 2021 17:27:44 -0500 [thread overview] Message-ID: <20211217222745.881637-6-andrey.grodzovsky@amd.com> (raw) In-Reply-To: <20211217222745.881637-1-andrey.grodzovsky@amd.com> Since we serialize all resets no need to protect from concurrent resets. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +------------------ drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 1 - 3 files changed, 1 insertion(+), 20 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 55cd67b9ede2..d2701e4d0622 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5013,25 +5013,9 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, dev_info(adev->dev, "GPU %s begin!\n", need_emergency_restart ? "jobs stop":"reset"); - /* - * Here we trylock to avoid chain of resets executing from - * either trigger by jobs on different adevs in XGMI hive or jobs on - * different schedulers for same device while this TO handler is running. - * We always reset all schedulers for device and all devices for XGMI - * hive so that should take care of them too. - */ hive = amdgpu_get_xgmi_hive(adev); - if (hive) { - if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) { - DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress", - job ? job->base.id : -1, hive->hive_id); - amdgpu_put_xgmi_hive(hive); - if (job && job->vm) - drm_sched_increase_karma(&job->base); - return 0; - } + if (hive) mutex_lock(&hive->hive_lock); - } reset_context.method = AMD_RESET_METHOD_NONE; reset_context.reset_req_dev = adev; @@ -5226,7 +5210,6 @@ int amdgpu_device_gpu_recover_imp(struct amdgpu_device *adev, skip_recovery: if (hive) { - atomic_set(&hive->in_reset, 0); mutex_unlock(&hive->hive_lock); amdgpu_put_xgmi_hive(hive); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c index 8b116f398101..0d54bef5c494 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c @@ -403,7 +403,6 @@ struct amdgpu_hive_info *amdgpu_get_xgmi_hive(struct amdgpu_device *adev) INIT_LIST_HEAD(&hive->device_list); INIT_LIST_HEAD(&hive->node); mutex_init(&hive->hive_lock); - atomic_set(&hive->in_reset, 0); atomic_set(&hive->number_devices, 0); task_barrier_init(&hive->tb); hive->pstate = AMDGPU_XGMI_PSTATE_UNKNOWN; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h index 6121aaa292cb..2f2ce53645a5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h @@ -33,7 +33,6 @@ struct amdgpu_hive_info { struct list_head node; atomic_t number_devices; struct mutex hive_lock; - atomic_t in_reset; int hi_req_count; struct amdgpu_device *hi_req_gpu; struct task_barrier tb; -- 2.25.1
next prev parent reply other threads:[~2021-12-17 22:28 UTC|newest] Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-12-17 22:27 [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu Andrey Grodzovsky 2021-12-17 22:27 ` Andrey Grodzovsky 2021-12-17 22:27 ` [RFC 1/6] drm/amdgpu: Init GPU reset single threaded wq Andrey Grodzovsky 2021-12-17 22:27 ` Andrey Grodzovsky 2021-12-17 22:27 ` [RFC 2/6] drm/amdgpu: Move scheduler init to after XGMI is ready Andrey Grodzovsky 2021-12-17 22:27 ` Andrey Grodzovsky 2021-12-20 7:16 ` Christian König 2021-12-20 7:16 ` Christian König 2021-12-20 21:51 ` Andrey Grodzovsky 2021-12-20 21:51 ` Andrey Grodzovsky 2021-12-21 7:05 ` Christian König 2021-12-21 7:05 ` Christian König 2021-12-17 22:27 ` [RFC 3/6] drm/amdgpu: Fix crash on modprobe Andrey Grodzovsky 2021-12-17 22:27 ` Andrey Grodzovsky 2021-12-20 7:17 ` Christian König 2021-12-20 7:17 ` Christian König 2021-12-20 19:22 ` Andrey Grodzovsky 2021-12-20 19:22 ` Andrey Grodzovsky 2021-12-21 7:02 ` Christian König 2021-12-21 7:02 ` Christian König 2021-12-21 16:03 ` Andrey Grodzovsky 2021-12-21 16:03 ` Andrey Grodzovsky 2021-12-22 7:50 ` Christian König 2021-12-22 7:50 ` Christian König 2021-12-17 22:27 ` [RFC 4/6] drm/amdgpu: Serialize non TDR gpu recovery with TDRs Andrey Grodzovsky 2021-12-17 22:27 ` Andrey Grodzovsky 2021-12-20 7:20 ` Christian König 2021-12-20 7:20 ` Christian König 2021-12-20 22:17 ` Andrey Grodzovsky 2021-12-20 22:17 ` Andrey Grodzovsky 2021-12-21 7:59 ` Christian König 2021-12-21 7:59 ` Christian König 2021-12-21 16:10 ` Andrey Grodzovsky 2021-12-21 16:10 ` Andrey Grodzovsky 2021-12-17 22:27 ` Andrey Grodzovsky [this message] 2021-12-17 22:27 ` [RFC 5/6] drm/amdgpu: Drop hive->in_reset Andrey Grodzovsky 2021-12-17 22:27 ` [RFC 6/6] drm/amdgpu: Drop concurrent GPU reset protection for device Andrey Grodzovsky 2021-12-17 22:27 ` Andrey Grodzovsky 2021-12-20 7:25 ` [RFC 0/6] Define and use reset domain for GPU recovery in amdgpu Christian König 2021-12-20 7:25 ` Christian König 2021-12-20 9:43 ` Daniel Vetter 2021-12-20 9:43 ` Daniel Vetter 2021-12-20 17:06 ` Liu, Shaoyun 2021-12-20 17:06 ` Liu, Shaoyun 2021-12-20 19:11 ` Andrey Grodzovsky 2021-12-20 19:11 ` Andrey Grodzovsky
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20211217222745.881637-6-andrey.grodzovsky@amd.com \ --to=andrey.grodzovsky@amd.com \ --cc=Monk.Liu@amd.com \ --cc=amd-gfx@lists.freedesktop.org \ --cc=christian.koenig@amd.com \ --cc=dri-devel@lists.freedesktop.org \ --cc=horace.chen@amd.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.