From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: "Rajneesh Bhardwaj" <rajneesh.bhardwaj@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Alex Deucher" <Alexander.Deucher@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Sasha Levin" <sashal@kernel.org>,
	Xinhui.Pan@amd.com, airlied@linux.ie, daniel@ffwll.ch,
	nirmoy.das@amd.com, matthew.auld@intel.com, Roy.Sun@amd.com,
	tzimmermann@suse.de, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org
Subject: [PATCH AUTOSEL 5.10 20/65] drm/amdgpu: Fix recursive locking warning
Date: Fri,  1 Apr 2022 10:41:21 -0400
Message-ID: <20220401144206.1953700-20-sashal@kernel.org>
In-Reply-To: <20220401144206.1953700-1-sashal@kernel.org>

From: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

[ Upstream commit 447c7997b62a5115ba4da846dcdee4fc12298a6a ]

Noticed the warning below while running a PyTorch workload on Vega10
GPUs. Switch amdgpu_bo_release_notify() from dma_resv_lock() to
dma_resv_trylock() to avoid conflicts with reservation locks that are
already held.

[  +0.000003] WARNING: possible recursive locking detected
[  +0.000003] 5.13.0-kfd-rajneesh #1030 Not tainted
[  +0.000004] --------------------------------------------
[  +0.000002] python/4822 is trying to acquire lock:
[  +0.000004] ffff932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000203]
              but task is already holding lock:
[  +0.000003] ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[  +0.000017]
              other info that might help us debug this:
[  +0.000002]  Possible unsafe locking scenario:

[  +0.000003]        CPU0
[  +0.000002]        ----
[  +0.000002]   lock(reservation_ww_class_mutex);
[  +0.000004]   lock(reservation_ww_class_mutex);
[  +0.000003]
               *** DEADLOCK ***

[  +0.000002]  May be due to missing lock nesting notation

[  +0.000003] 7 locks held by python/4822:
[  +0.000003]  #0: ffff932c4ac028d0 (&process->mutex){+.+.}-{3:3}, at:
kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu]
[  +0.000232]  #1: ffff932c55e830a8 (&info->lock#2){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu]
[  +0.000241]  #2: ffff932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu]
[  +0.000236]  #3: ffffb2b35606fd28
(reservation_ww_class_acquire){+.+.}-{0:0}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu]
[  +0.000235]  #4: ffff932cbb7181f8
(reservation_ww_class_mutex){+.+.}-{3:3}, at:
ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[  +0.000015]  #5: ffffffffc045f700 (*(sspp++)){....}-{0:0}, at:
drm_dev_enter+0x5/0xa0 [drm]
[  +0.000038]  #6: ffff932c52da7078 (&vm->eviction_lock){+.+.}-{3:3},
at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu]
[  +0.000195]
              stack backtrace:
[  +0.000003] CPU: 11 PID: 4822 Comm: python Not tainted
5.13.0-kfd-rajneesh #1030
[  +0.000005] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02
08/29/2018
[  +0.000003] Call Trace:
[  +0.000003]  dump_stack+0x6d/0x89
[  +0.000010]  __lock_acquire+0xb93/0x1a90
[  +0.000009]  lock_acquire+0x25d/0x2d0
[  +0.000005]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000184]  ? lock_is_held_type+0xa2/0x110
[  +0.000006]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000184]  __ww_mutex_lock.constprop.17+0xca/0x1060
[  +0.000007]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000183]  ? lock_release+0x13f/0x270
[  +0.000005]  ? lock_is_held_type+0xa2/0x110
[  +0.000006]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000183]  amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000185]  ttm_bo_release+0x4c6/0x580 [ttm]
[  +0.000010]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
[  +0.000183]  amdgpu_vm_free_table+0x76/0xa0 [amdgpu]
[  +0.000189]  amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu]
[  +0.000189]  amdgpu_vm_update_ptes+0x411/0x770 [amdgpu]
[  +0.000191]  amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu]
[  +0.000191]  amdgpu_vm_bo_update+0x251/0x610 [amdgpu]
[  +0.000191]  update_gpuvm_pte+0xcc/0x290 [amdgpu]
[  +0.000229]  ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu]
[  +0.000190]  amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60
[amdgpu]
[  +0.000234]  kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu]
[  +0.000218]  kfd_ioctl+0x2b9/0x600 [amdgpu]
[  +0.000216]  ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu]
[  +0.000216]  ? lock_release+0x13f/0x270
[  +0.000006]  ? __fget_files+0x107/0x1e0
[  +0.000007]  __x64_sys_ioctl+0x8b/0xd0
[  +0.000007]  do_syscall_64+0x36/0x70
[  +0.000004]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  +0.000007] RIP: 0033:0x7fbff90a7317
[  +0.000004] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[  +0.000005] RSP: 002b:00007fbe301fe648 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  +0.000006] RAX: ffffffffffffffda RBX: 00007fbcc402d820 RCX:
00007fbff90a7317
[  +0.000003] RDX: 00007fbe301fe690 RSI: 00000000c0184b18 RDI:
0000000000000004
[  +0.000003] RBP: 00007fbe301fe690 R08: 0000000000000000 R09:
00007fbcc402d880
[  +0.000003] R10: 0000000002001000 R11: 0000000000000246 R12:
00000000c0184b18
[  +0.000003] R13: 0000000000000004 R14: 00007fbf689593a0 R15:
00007fbcc402d820

Cc: Christian König <christian.koenig@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
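
Note for readers: the pattern in the hunk below (try the lock, and skip
the optional wipe if it cannot be taken immediately) can be sketched in
user space. This is only an analogy built on POSIX mutexes, not amdgpu
or TTM code. The names fake_bo, fake_bo_init and fake_bo_release_notify
are hypothetical, and the sketch reuses a single mutex, whereas the
lockdep report above involves two different objects of the same lock
class.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a buffer object with a reservation lock. */
struct fake_bo {
	pthread_mutex_t resv;	/* plays the role of bo->base.resv */
	unsigned char data[16];
	bool wiped;
};

static void fake_bo_init(struct fake_bo *bo)
{
	int ret = pthread_mutex_init(&bo->resv, NULL);

	assert(ret == 0);
	(void)ret;
	memset(bo->data, 0xAB, sizeof(bo->data));
	bo->wiped = false;
}

/*
 * Release hook: wipe the buffer only if the lock can be taken without
 * waiting. Returns true if the wipe ran, false if it was skipped
 * because the lock was already held.
 */
static bool fake_bo_release_notify(struct fake_bo *bo)
{
	/* Never block here: the caller may already hold a lock. */
	if (pthread_mutex_trylock(&bo->resv) != 0)
		return false;

	memset(bo->data, 0, sizeof(bo->data));
	bo->wiped = true;
	pthread_mutex_unlock(&bo->resv);
	return true;
}
```

Calling fake_bo_release_notify() while the same thread holds bo->resv
returns false instead of blocking forever. The cost is that the wipe is
skipped in that case, which is the same trade-off the hunk below makes
explicit with WARN_ON_ONCE().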

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index ad9863b84f1f..f615ecc06a22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1338,7 +1338,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
 	    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
 		return;
 
-	dma_resv_lock(bo->base.resv, NULL);
+	if (WARN_ON_ONCE(!dma_resv_trylock(bo->base.resv)))
+		return;
 
 	r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, &fence);
 	if (!WARN_ON(r)) {
-- 
2.34.1


