stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drm/i915/ttm: never purge busy objects
@ 2022-11-07 13:30 Matthew Auld
  2022-11-11 18:11 ` Niranjana Vishwanathapura
  0 siblings, 1 reply; 2+ messages in thread
From: Matthew Auld @ 2022-11-07 13:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Niranjana Vishwanathapura, Andrzej Hajda, Nirmoy Das, stable

In i915_gem_madvise_ioctl() we immediately purge the object is not
currently used, like when the mm.pages are NULL.  With shmem the pages
might still be hanging around or are perhaps swapped out. Similarly with
ttm we might still have the pages hanging around on the ttm resource,
like with lmem or shmem, but here we need to be extra careful since
async unbinds are possible as well as in-progress kernel moves. In
i915_ttm_purge() we expect the pipeline-gutting to nuke the ttm resource
for us, however if it's busy the memory is only moved to a ghost object,
which then leads to broken behaviour when for example clearing the
i915_tt->filp, since the actual ttm_tt is still alive and populated,
even though it's been moved to the ghost object.  When we later destroy
the ghost object we hit the following, since the filp is now NULL:

[  +0.006982] #PF: supervisor read access in kernel mode
[  +0.005149] #PF: error_code(0x0000) - not-present page
[  +0.005147] PGD 11631d067 P4D 11631d067 PUD 115972067 PMD 0
[  +0.005676] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  +0.012962] Workqueue: events ttm_device_delayed_workqueue [ttm]
[  +0.006022] RIP: 0010:i915_ttm_tt_unpopulate+0x3a/0x70 [i915]
[  +0.005879] Code: 89 fb 48 85 f6 74 11 8b 55 4c 48 8b 7d 30 45 31 c0 31 c9 e8 18 6a e5 e0 80 7d 60 00 74 20 48 8b 45 68
8b 55 08 4c 89 e7 5b 5d <48> 8b 40 20 83 e2 01 41 5c 89 d1 48 8b 70
 30 e9 42 b2 ff ff 4c 89
[  +0.018782] RSP: 0000:ffffc9000bf6fd70 EFLAGS: 00010202
[  +0.005244] RAX: 0000000000000000 RBX: ffff8883e12ae380 RCX: 0000000000000000
[  +0.007150] RDX: 000000008000000e RSI: ffffffff823559b4 RDI: ffff8883e12ae3c0
[  +0.007142] RBP: ffff888103b65d48 R08: 0000000000000001 R09: 0000000000000001
[  +0.007144] R10: 0000000000000001 R11: ffff88829c2c8040 R12: ffff8883e12ae3c0
[  +0.007148] R13: 0000000000000001 R14: ffff888115184140 R15: ffff888115184248
[  +0.007154] FS:  0000000000000000(0000) GS:ffff88844db00000(0000) knlGS:0000000000000000
[  +0.008108] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.005763] CR2: 0000000000000020 CR3: 000000013fdb4004 CR4: 00000000003706e0
[  +0.007152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0.007145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  +0.007154] Call Trace:
[  +0.002459]  <TASK>
[  +0.002126]  ttm_tt_unpopulate.part.0+0x17/0x70 [ttm]
[  +0.005068]  ttm_bo_tt_destroy+0x1c/0x50 [ttm]
[  +0.004464]  ttm_bo_cleanup_memtype_use+0x25/0x40 [ttm]
[  +0.005244]  ttm_bo_cleanup_refs+0x90/0x2c0 [ttm]
[  +0.004721]  ttm_bo_delayed_delete+0x235/0x250 [ttm]
[  +0.004981]  ttm_device_delayed_workqueue+0x13/0x40 [ttm]
[  +0.005422]  process_one_work+0x248/0x560
[  +0.004028]  worker_thread+0x4b/0x390
[  +0.003682]  ? process_one_work+0x560/0x560
[  +0.004199]  kthread+0xeb/0x120
[  +0.003163]  ? kthread_complete_and_exit+0x20/0x20
[  +0.004815]  ret_from_fork+0x1f/0x30

Fixes: 213d50927763 ("drm/i915/ttm: Introduce a TTM i915 gem object backend")
Reported-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: <stable@vger.kernel.org> # v5.15+
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 25129af70f70..fb452a388b5e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -599,10 +599,18 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
 static int i915_ttm_truncate(struct drm_i915_gem_object *obj)
 {
 	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
+	struct ttm_operation_ctx ctx = {
+		.interruptible = true,
+		.no_wait_gpu = false,
+	};
 	int err;
 
 	WARN_ON_ONCE(obj->mm.madv == I915_MADV_WILLNEED);
 
+	err = ttm_bo_wait_ctx(bo, &ctx);
+	if (err)
+		return err;
+
 	err = i915_ttm_move_notify(bo);
 	if (err)
 		return err;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] drm/i915/ttm: never purge busy objects
  2022-11-07 13:30 [PATCH] drm/i915/ttm: never purge busy objects Matthew Auld
@ 2022-11-11 18:11 ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 2+ messages in thread
From: Niranjana Vishwanathapura @ 2022-11-11 18:11 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-gfx, Andrzej Hajda, Nirmoy Das, stable

On Mon, Nov 07, 2022 at 01:30:27PM +0000, Matthew Auld wrote:
>In i915_gem_madvise_ioctl() we immediately purge the object is not
>currently used, like when the mm.pages are NULL.  With shmem the pages
>might still be hanging around or are perhaps swapped out. Similarly with
>ttm we might still have the pages hanging around on the ttm resource,
>like with lmem or shmem, but here we need to be extra careful since
>async unbinds are possible as well as in-progress kernel moves. In
>i915_ttm_purge() we expect the pipeline-gutting to nuke the ttm resource
>for us, however if it's busy the memory is only moved to a ghost object,
>which then leads to broken behaviour when for example clearing the
>i915_tt->filp, since the actual ttm_tt is still alive and populated,
>even though it's been moved to the ghost object.  When we later destroy
>the ghost object we hit the following, since the filp is now NULL:
>
>[  +0.006982] #PF: supervisor read access in kernel mode
>[  +0.005149] #PF: error_code(0x0000) - not-present page
>[  +0.005147] PGD 11631d067 P4D 11631d067 PUD 115972067 PMD 0
>[  +0.005676] Oops: 0000 [#1] PREEMPT SMP NOPTI
>[  +0.012962] Workqueue: events ttm_device_delayed_workqueue [ttm]
>[  +0.006022] RIP: 0010:i915_ttm_tt_unpopulate+0x3a/0x70 [i915]
>[  +0.005879] Code: 89 fb 48 85 f6 74 11 8b 55 4c 48 8b 7d 30 45 31 c0 31 c9 e8 18 6a e5 e0 80 7d 60 00 74 20 48 8b 45 68
>8b 55 08 4c 89 e7 5b 5d <48> 8b 40 20 83 e2 01 41 5c 89 d1 48 8b 70
> 30 e9 42 b2 ff ff 4c 89
>[  +0.018782] RSP: 0000:ffffc9000bf6fd70 EFLAGS: 00010202
>[  +0.005244] RAX: 0000000000000000 RBX: ffff8883e12ae380 RCX: 0000000000000000
>[  +0.007150] RDX: 000000008000000e RSI: ffffffff823559b4 RDI: ffff8883e12ae3c0
>[  +0.007142] RBP: ffff888103b65d48 R08: 0000000000000001 R09: 0000000000000001
>[  +0.007144] R10: 0000000000000001 R11: ffff88829c2c8040 R12: ffff8883e12ae3c0
>[  +0.007148] R13: 0000000000000001 R14: ffff888115184140 R15: ffff888115184248
>[  +0.007154] FS:  0000000000000000(0000) GS:ffff88844db00000(0000) knlGS:0000000000000000
>[  +0.008108] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[  +0.005763] CR2: 0000000000000020 CR3: 000000013fdb4004 CR4: 00000000003706e0
>[  +0.007152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>[  +0.007145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>[  +0.007154] Call Trace:
>[  +0.002459]  <TASK>
>[  +0.002126]  ttm_tt_unpopulate.part.0+0x17/0x70 [ttm]
>[  +0.005068]  ttm_bo_tt_destroy+0x1c/0x50 [ttm]
>[  +0.004464]  ttm_bo_cleanup_memtype_use+0x25/0x40 [ttm]
>[  +0.005244]  ttm_bo_cleanup_refs+0x90/0x2c0 [ttm]
>[  +0.004721]  ttm_bo_delayed_delete+0x235/0x250 [ttm]
>[  +0.004981]  ttm_device_delayed_workqueue+0x13/0x40 [ttm]
>[  +0.005422]  process_one_work+0x248/0x560
>[  +0.004028]  worker_thread+0x4b/0x390
>[  +0.003682]  ? process_one_work+0x560/0x560
>[  +0.004199]  kthread+0xeb/0x120
>[  +0.003163]  ? kthread_complete_and_exit+0x20/0x20
>[  +0.004815]  ret_from_fork+0x1f/0x30
>
>Fixes: 213d50927763 ("drm/i915/ttm: Introduce a TTM i915 gem object backend")
>Reported-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: Andrzej Hajda <andrzej.hajda@intel.com>
>Cc: Nirmoy Das <nirmoy.das@intel.com>
>Cc: <stable@vger.kernel.org> # v5.15+
>---
> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>index 25129af70f70..fb452a388b5e 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>@@ -599,10 +599,18 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
> static int i915_ttm_truncate(struct drm_i915_gem_object *obj)
> {
> 	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>+	struct ttm_operation_ctx ctx = {
>+		.interruptible = true,
>+		.no_wait_gpu = false,
>+	};
> 	int err;
>
> 	WARN_ON_ONCE(obj->mm.madv == I915_MADV_WILLNEED);
>
>+	err = ttm_bo_wait_ctx(bo, &ctx);
>+	if (err)
>+		return err;
>+

We probably can call ttm_bo_wait() directly without the ctx?

Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

> 	err = i915_ttm_move_notify(bo);
> 	if (err)
> 		return err;
>-- 
>2.38.1
>

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2022-11-11 18:11 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-07 13:30 [PATCH] drm/i915/ttm: never purge busy objects Matthew Auld
2022-11-11 18:11 ` Niranjana Vishwanathapura

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).