All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Lancelot SIX <lancelot.six@amd.com>,
	Felix Kuehling <felix.kuehling@amd.com>,
	Alex Deucher <alexander.deucher@amd.com>,
	Sasha Levin <sashal@kernel.org>,
	Felix.Kuehling@amd.com, christian.koenig@amd.com,
	Xinhui.Pan@amd.com, airlied@gmail.com, daniel@ffwll.ch,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 5.15 12/15] drm/amdkfd: Flush the process wq before creating a kfd_process
Date: Tue,  7 May 2024 19:13:21 -0400	[thread overview]
Message-ID: <20240507231333.394765-12-sashal@kernel.org> (raw)
In-Reply-To: <20240507231333.394765-1-sashal@kernel.org>

From: Lancelot SIX <lancelot.six@amd.com>

[ Upstream commit f5b9053398e70a0c10aa9cb4dd5910ab6bc457c5 ]

There is a race condition when re-creating a kfd_process for a process.
This has been observed when a process under the debugger executes
exec(3).  In this scenario:
- The process executes exec.
 - This will eventually release the process's mm, which will cause the
   kfd_process object associated with the process to be freed
   (kfd_process_free_notifier decrements the reference count to the
   kfd_process to 0).  This causes kfd_process_ref_release to enqueue
   kfd_process_wq_release to the kfd_process_wq.
- The debugger receives the PTRACE_EVENT_EXEC notification, and tries to
  re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE).
 - When handling this request, KFD tries to re-create a kfd_process.
   This eventually calls kfd_create_process and kobject_init_and_add.

At this point the call to kobject_init_and_add can fail because the
old kfd_process.kobj has not been freed yet by kfd_process_wq_release.

This patch proposes to avoid this race by making sure to drain
kfd_process_wq before creating a new kfd_process object.  This way, we
know that any cleanup task is done executing when we reach
kobject_init_and_add.

Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 21ec8a18cad29..7f69031f2b61a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -818,6 +818,14 @@ struct kfd_process *kfd_create_process(struct file *filep)
 	if (process) {
 		pr_debug("Process already found\n");
 	} else {
+		/* If the process just called exec(3), it is possible that the
+		 * cleanup of the kfd_process (following the release of the mm
+		 * of the old process image) is still in the cleanup work queue.
+		 * Make sure to drain any job before trying to recreate any
+		 * resource for this process.
+		 */
+		flush_workqueue(kfd_process_wq);
+
 		process = create_process(thread);
 		if (IS_ERR(process))
 			goto out;
-- 
2.43.0


  parent reply	other threads:[~2024-05-07 23:13 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-07 23:13 [PATCH AUTOSEL 5.15 01/15] regulator: irq_helpers: duplicate IRQ name Sasha Levin
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 02/15] ASoC: rt5645: Fix the electric noise due to the CBJ contacts floating Sasha Levin
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 03/15] ASoC: dt-bindings: rt5645: add cbj sleeve gpio property Sasha Levin
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 04/15] regulator: vqmmc-ipq4019: fix module autoloading Sasha Levin
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 05/15] ASoC: rt715: add vendor clear control register Sasha Levin
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 06/15] ASoC: rt715-sdca: volume step modification Sasha Levin
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 07/15] softirq: Fix suspicious RCU usage in __do_softirq() Sasha Levin
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 08/15] net: qede: sanitize 'rc' in qede_add_tc_flower_fltr() Sasha Levin
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 09/15] firewire: nosy: ensure user_length is taken into account when fetching packet contents Sasha Levin
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 10/15] ASoC: da7219-aad: fix usage of device_get_named_child_node() Sasha Levin
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 11/15] drm/amd/display: Atom Integrated System Info v2_2 for DCN35 Sasha Levin
2024-05-07 23:13 ` Sasha Levin [this message]
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 13/15] x86/mm: Remove broken vsyscall emulation code from the page fault code Sasha Levin
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 14/15] nvme: find numa distance only if controller has valid numa id Sasha Levin
2024-05-07 23:13 ` [PATCH AUTOSEL 5.15 15/15] epoll: be better about file lifetimes Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240507231333.394765-12-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=Xinhui.Pan@amd.com \
    --cc=airlied@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=felix.kuehling@amd.com \
    --cc=lancelot.six@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.