All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: LKML <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-media@vger.kernel.org, intel-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	"Christian König" <christian.koenig@amd.com>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Arunpravin Paneer Selvam" <Arunpravin.PaneerSelvam@amd.com>
Subject: [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()
Date: Tue, 7 Mar 2023 21:22:23 -0500	[thread overview]
Message-ID: <20230307212223.7e49384a@gandalf.local.home> (raw)


In a report for a regression in my code, I tried to run v6.3-rc1 through my
tests. It crashed at boot up on my first test (my start up tests do take a
long time, hence the 206 seconds of boot!).

[  206.238782] ------------[ cut here ]------------
[  206.277786] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[  206.277946] WARNING: CPU: 0 PID: 332 at kernel/locking/mutex.c:582 __ww_mutex_lock.constprop.0+0x566/0xfec
[  206.313338] Modules linked in:
[  206.324732] CPU: 0 PID: 332 Comm: kworker/0:13H Not tainted 6.3.0-rc1-test-00001-ga98bd42762ed-dirty #965
[  206.338273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[  206.353596] Workqueue: ttm ttm_bo_delayed_delete
[  206.370520] EIP: __ww_mutex_lock.constprop.0+0x566/0xfec
[  206.382855] Code: e8 ab 59 95 ff 85 c0 0f 84 25 fb ff ff 8b 0d 58 c0 3b cf 85 c9 0f 85 17 fb ff ff 68 e0 8d 07 cf 68 2b ac 05 cf e8 e6 e6 3f ff <0f> 0b 58 5a e9 ff fa ff ff e8 78 59 95 ff 85 c0 74 0e 8b 0d 58 c0
[  206.411247] EAX: 00000028 EBX: 00000000 ECX: c3ae5dd8 EDX: 00000002
[  206.425193] ESI: 00000000 EDI: c2d5f0bc EBP: c3ae5f00 ESP: c3ae5eac
[  206.439236] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
[  206.453597] CR0: 80050033 CR2: ff9ff000 CR3: 0f512000 CR4: 00150ef0
[  206.467841] Call Trace:
[  206.481059]  ? ttm_bo_delayed_delete+0x30/0x94
[  206.494980]  ww_mutex_lock+0x32/0x94
[  206.508699]  ttm_bo_delayed_delete+0x30/0x94
[  206.522371]  process_one_work+0x21a/0x538
[  206.536306]  worker_thread+0x146/0x398
[  206.549860]  kthread+0xea/0x10c
[  206.563141]  ? process_one_work+0x538/0x538
[  206.576835]  ? kthread_complete_and_exit+0x1c/0x1c
[  206.590652]  ret_from_fork+0x1c/0x28
[  206.604522] irq event stamp: 4219
[  206.617852] hardirqs last  enabled at (4219): [<ced2a039>] _raw_spin_unlock_irqrestore+0x2d/0x58
[  206.633077] hardirqs last disabled at (4218): [<ce1d3a65>] kvfree_call_rcu+0x155/0x2ec
[  206.648161] softirqs last  enabled at (3570): [<ced2b113>] __do_softirq+0x2f3/0x48b
[  206.663025] softirqs last disabled at (3565): [<ce0c84e9>] call_on_stack+0x45/0x4c
[  206.678065] ---[ end trace 0000000000000000 ]---

Looks like there was a lock possibly used after free. But as commit
9bff18d13473a9fdf81d5158248472a9d8ecf2bd ("drm/ttm: use per BO cleanup
workers") changed a lot of this code, I figured it may be the culprit.

-- Steve

WARNING: multiple messages have this Message-ID (diff)
From: Steven Rostedt <rostedt@goodmis.org>
To: LKML <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Arunpravin Paneer Selvam" <Arunpravin.PaneerSelvam@amd.com>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	amd-gfx@lists.freedesktop.org,
	"Christian König" <christian.koenig@amd.com>,
	dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	linux-media@vger.kernel.org
Subject: [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()
Date: Tue, 7 Mar 2023 21:22:23 -0500	[thread overview]
Message-ID: <20230307212223.7e49384a@gandalf.local.home> (raw)


In a report for a regression in my code, I tried to run v6.3-rc1 through my
tests. It crashed at boot up on my first test (my start up tests do take a
long time, hence the 206 seconds of boot!).

[  206.238782] ------------[ cut here ]------------
[  206.277786] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[  206.277946] WARNING: CPU: 0 PID: 332 at kernel/locking/mutex.c:582 __ww_mutex_lock.constprop.0+0x566/0xfec
[  206.313338] Modules linked in:
[  206.324732] CPU: 0 PID: 332 Comm: kworker/0:13H Not tainted 6.3.0-rc1-test-00001-ga98bd42762ed-dirty #965
[  206.338273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[  206.353596] Workqueue: ttm ttm_bo_delayed_delete
[  206.370520] EIP: __ww_mutex_lock.constprop.0+0x566/0xfec
[  206.382855] Code: e8 ab 59 95 ff 85 c0 0f 84 25 fb ff ff 8b 0d 58 c0 3b cf 85 c9 0f 85 17 fb ff ff 68 e0 8d 07 cf 68 2b ac 05 cf e8 e6 e6 3f ff <0f> 0b 58 5a e9 ff fa ff ff e8 78 59 95 ff 85 c0 74 0e 8b 0d 58 c0
[  206.411247] EAX: 00000028 EBX: 00000000 ECX: c3ae5dd8 EDX: 00000002
[  206.425193] ESI: 00000000 EDI: c2d5f0bc EBP: c3ae5f00 ESP: c3ae5eac
[  206.439236] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
[  206.453597] CR0: 80050033 CR2: ff9ff000 CR3: 0f512000 CR4: 00150ef0
[  206.467841] Call Trace:
[  206.481059]  ? ttm_bo_delayed_delete+0x30/0x94
[  206.494980]  ww_mutex_lock+0x32/0x94
[  206.508699]  ttm_bo_delayed_delete+0x30/0x94
[  206.522371]  process_one_work+0x21a/0x538
[  206.536306]  worker_thread+0x146/0x398
[  206.549860]  kthread+0xea/0x10c
[  206.563141]  ? process_one_work+0x538/0x538
[  206.576835]  ? kthread_complete_and_exit+0x1c/0x1c
[  206.590652]  ret_from_fork+0x1c/0x28
[  206.604522] irq event stamp: 4219
[  206.617852] hardirqs last  enabled at (4219): [<ced2a039>] _raw_spin_unlock_irqrestore+0x2d/0x58
[  206.633077] hardirqs last disabled at (4218): [<ce1d3a65>] kvfree_call_rcu+0x155/0x2ec
[  206.648161] softirqs last  enabled at (3570): [<ced2b113>] __do_softirq+0x2f3/0x48b
[  206.663025] softirqs last disabled at (3565): [<ce0c84e9>] call_on_stack+0x45/0x4c
[  206.678065] ---[ end trace 0000000000000000 ]---

Looks like there was a lock possibly used after free. But as commit
9bff18d13473a9fdf81d5158248472a9d8ecf2bd ("drm/ttm: use per BO cleanup
workers") changed a lot of this code, I figured it may be the culprit.

-- Steve

WARNING: multiple messages have this Message-ID (diff)
From: Steven Rostedt <rostedt@goodmis.org>
To: LKML <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Arunpravin Paneer Selvam" <Arunpravin.PaneerSelvam@amd.com>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	amd-gfx@lists.freedesktop.org,
	"Christian König" <christian.koenig@amd.com>,
	dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	linux-media@vger.kernel.org
Subject: [Intel-gfx] [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()
Date: Tue, 7 Mar 2023 21:22:23 -0500	[thread overview]
Message-ID: <20230307212223.7e49384a@gandalf.local.home> (raw)


In a report for a regression in my code, I tried to run v6.3-rc1 through my
tests. It crashed at boot up on my first test (my start up tests do take a
long time, hence the 206 seconds of boot!).

[  206.238782] ------------[ cut here ]------------
[  206.277786] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[  206.277946] WARNING: CPU: 0 PID: 332 at kernel/locking/mutex.c:582 __ww_mutex_lock.constprop.0+0x566/0xfec
[  206.313338] Modules linked in:
[  206.324732] CPU: 0 PID: 332 Comm: kworker/0:13H Not tainted 6.3.0-rc1-test-00001-ga98bd42762ed-dirty #965
[  206.338273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[  206.353596] Workqueue: ttm ttm_bo_delayed_delete
[  206.370520] EIP: __ww_mutex_lock.constprop.0+0x566/0xfec
[  206.382855] Code: e8 ab 59 95 ff 85 c0 0f 84 25 fb ff ff 8b 0d 58 c0 3b cf 85 c9 0f 85 17 fb ff ff 68 e0 8d 07 cf 68 2b ac 05 cf e8 e6 e6 3f ff <0f> 0b 58 5a e9 ff fa ff ff e8 78 59 95 ff 85 c0 74 0e 8b 0d 58 c0
[  206.411247] EAX: 00000028 EBX: 00000000 ECX: c3ae5dd8 EDX: 00000002
[  206.425193] ESI: 00000000 EDI: c2d5f0bc EBP: c3ae5f00 ESP: c3ae5eac
[  206.439236] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
[  206.453597] CR0: 80050033 CR2: ff9ff000 CR3: 0f512000 CR4: 00150ef0
[  206.467841] Call Trace:
[  206.481059]  ? ttm_bo_delayed_delete+0x30/0x94
[  206.494980]  ww_mutex_lock+0x32/0x94
[  206.508699]  ttm_bo_delayed_delete+0x30/0x94
[  206.522371]  process_one_work+0x21a/0x538
[  206.536306]  worker_thread+0x146/0x398
[  206.549860]  kthread+0xea/0x10c
[  206.563141]  ? process_one_work+0x538/0x538
[  206.576835]  ? kthread_complete_and_exit+0x1c/0x1c
[  206.590652]  ret_from_fork+0x1c/0x28
[  206.604522] irq event stamp: 4219
[  206.617852] hardirqs last  enabled at (4219): [<ced2a039>] _raw_spin_unlock_irqrestore+0x2d/0x58
[  206.633077] hardirqs last disabled at (4218): [<ce1d3a65>] kvfree_call_rcu+0x155/0x2ec
[  206.648161] softirqs last  enabled at (3570): [<ced2b113>] __do_softirq+0x2f3/0x48b
[  206.663025] softirqs last disabled at (3565): [<ce0c84e9>] call_on_stack+0x45/0x4c
[  206.678065] ---[ end trace 0000000000000000 ]---

Looks like there was a lock possibly used after free. But as commit
9bff18d13473a9fdf81d5158248472a9d8ecf2bd ("drm/ttm: use per BO cleanup
workers") changed a lot of this code, I figured it may be the culprit.

-- Steve

             reply	other threads:[~2023-03-08  2:22 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-08  2:22 Steven Rostedt [this message]
2023-03-08  2:22 ` [Intel-gfx] [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete() Steven Rostedt
2023-03-08  2:22 ` Steven Rostedt
2023-03-08  2:26 ` Steven Rostedt
2023-03-08  2:26   ` Steven Rostedt
2023-03-08  2:26   ` [Intel-gfx] " Steven Rostedt
2023-03-08  6:17   ` Christian König
2023-03-08  6:17     ` [Intel-gfx] " Christian König
2023-03-08  6:17     ` Christian König
2023-03-08 12:43     ` Steven Rostedt
2023-03-08 12:43       ` Steven Rostedt
2023-03-08 12:43       ` [Intel-gfx] " Steven Rostedt
2023-03-15 18:41       ` Christian König
2023-03-15 18:41         ` Christian König
2023-03-15 18:41         ` [Intel-gfx] " Christian König
2023-03-15 19:15         ` Matthew Auld
2023-03-15 19:15           ` Matthew Auld
2023-03-15 19:51           ` Christian König
2023-03-15 19:51             ` Christian König
2023-03-15 20:20             ` Steven Rostedt
2023-03-15 20:20               ` Steven Rostedt
2023-03-15 20:20               ` Steven Rostedt
2023-03-16  0:21               ` Steven Rostedt
2023-03-16  0:21                 ` Steven Rostedt
2023-03-16  0:21                 ` Steven Rostedt
2023-03-16  0:22                 ` Steven Rostedt
2023-03-16  0:22                   ` Steven Rostedt
2023-03-16  0:22                   ` Steven Rostedt
2023-03-17 17:42                   ` Linus Torvalds
2023-03-17 17:42                     ` Linus Torvalds
2023-03-17 17:42                     ` Linus Torvalds
2023-03-15 15:09     ` Steven Rostedt
2023-03-15 15:09       ` Steven Rostedt
2023-03-15 15:09       ` [Intel-gfx] " Steven Rostedt
2023-03-15 15:25       ` Christian König
2023-03-15 15:25         ` [Intel-gfx] " Christian König
2023-03-15 15:25         ` Christian König
2023-03-15 15:57         ` Steven Rostedt
2023-03-15 15:57           ` [Intel-gfx] " Steven Rostedt
2023-03-15 15:57           ` Steven Rostedt
2023-03-15 17:31           ` Steven Rostedt
2023-03-15 17:31             ` [Intel-gfx] " Steven Rostedt
2023-03-15 17:31             ` Steven Rostedt
2023-03-15 18:34             ` Christian König
2023-03-15 18:34               ` Christian König
2023-03-15 18:34               ` [Intel-gfx] " Christian König
2023-03-15 17:54           ` Steven Rostedt
2023-03-15 17:54             ` [Intel-gfx] " Steven Rostedt
2023-03-15 17:54             ` Steven Rostedt
2023-03-15 18:25             ` Christian König
2023-03-15 18:25               ` [Intel-gfx] " Christian König
2023-03-15 18:25               ` Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230307212223.7e49384a@gandalf.local.home \
    --to=rostedt@goodmis.org \
    --cc=Arunpravin.PaneerSelvam@amd.com \
    --cc=Felix.Kuehling@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.