linux-cve-announce.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* CVE-2024-26628: drm/amdkfd: Fix lock dependency warning
@ 2024-03-06  6:46 Greg Kroah-Hartman
  0 siblings, 0 replies; only message in thread
From: Greg Kroah-Hartman @ 2024-03-06  6:46 UTC (permalink / raw)
  To: linux-cve-announce; +Cc: Greg Kroah-Hartman

Description
===========

In the Linux kernel, the following vulnerability has been resolved:

drm/amdkfd: Fix lock dependency warning

======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
------------------------------------------------------
kworker/8:2/2676 is trying to acquire lock:
ffff9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550

but task is already holding lock:
ffff9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&svms->lock){+.+.}-{3:3}:
       __mutex_lock+0x97/0xd30
       kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
       kfd_ioctl+0x1b2/0x5d0 [amdgpu]
       __x64_sys_ioctl+0x86/0xc0
       do_syscall_64+0x39/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

-> #1 (&mm->mmap_lock){++++}-{3:3}:
       down_read+0x42/0x160
       svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

-> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}:
       __lock_acquire+0x1426/0x2200
       lock_acquire+0xc1/0x2b0
       __flush_work+0x80/0x550
       __cancel_work_timer+0x109/0x190
       svm_range_bo_release+0xdc/0x1c0 [amdgpu]
       svm_range_free+0x175/0x180 [amdgpu]
       svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
       process_one_work+0x27a/0x540
       worker_thread+0x53/0x3e0
       kthread+0xeb/0x120
       ret_from_fork+0x31/0x50
       ret_from_fork_asm+0x11/0x20

other info that might help us debug this:

Chain exists of:
  (work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&svms->lock);
                               lock(&mm->mmap_lock);
                               lock(&svms->lock);
  lock((work_completion)(&svm_bo->eviction_work));

I believe this cannot really lead to a deadlock in practice, because
svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO
refcount is non-0. That means it's impossible that svm_range_bo_release
is running concurrently. However, there is no good way to annotate this.

To avoid the problem, take a BO reference in
svm_range_schedule_evict_svm_bo instead of in the worker. That way it's
impossible for a BO to get freed while eviction work is pending and the
cancel_work_sync call in svm_range_bo_release can be eliminated.

v2: Use svm_bo_ref_unless_zero and explained why that's safe. Also
removed redundant checks that are already done in
amdkfd_fence_enable_signaling.

The Linux kernel CVE team has assigned CVE-2024-26628 to this issue.


Affected and fixed versions
===========================

	Fixed in 5.15.149 with commit 7a70663ba02b
	Fixed in 6.1.77 with commit 8b25d397162b
	Fixed in 6.6.16 with commit 28d2d623d2fb
	Fixed in 6.7.4 with commit cb96e492d72d
	Fixed in 6.8-rc1 with commit 47bf0f83fc86

Please see https://www.kernel.org or a full list of currently supported
kernel versions by the kernel community.

Unaffected versions might change over time as fixes are backported to
older supported kernel versions.  The official CVE entry at
	https://cve.org/CVERecord/?id=CVE-2024-26628
will be updated if fixes are backported, please check that for the most
up to date information about this issue.


Affected files
==============

The file(s) affected by this issue are:
	drivers/gpu/drm/amd/amdkfd/kfd_svm.c


Mitigation
==========

The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this, and many other bugfixes.  Individual
changes are never tested alone, but rather are part of a larger kernel
release.  Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all.  If however, updating to
the latest release is impossible, the individual changes to resolve this
issue can be found at these commits:
	https://git.kernel.org/stable/c/7a70663ba02bd4e19aea8d70c979eb3bd03d839d
	https://git.kernel.org/stable/c/8b25d397162b0316ceda40afaa63ee0c4a97d28b
	https://git.kernel.org/stable/c/28d2d623d2fbddcca5c24600474e92f16ebb3a05
	https://git.kernel.org/stable/c/cb96e492d72d143d57db2d2bc143a1cee8741807
	https://git.kernel.org/stable/c/47bf0f83fc86df1bf42b385a91aadb910137c5c9

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2024-03-06  6:47 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-06  6:46 CVE-2024-26628: drm/amdkfd: Fix lock dependency warning Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).