* [Intel-xe] [PATCH] drm/xe: Fix potential deadlock handling page faults
@ 2023-03-18  3:28 Matthew Brost
  2023-03-18  3:30 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Matthew Brost @ 2023-03-18  3:28 UTC (permalink / raw)
  To: intel-xe

Within an engine class, the GuC halts scheduling when the head of the
queue can't be scheduled, so the whole queue blocks. This can deadlock
if BCS0-7 all have faults and another engine on BCS0-7 is at the head
of the GuC scheduling queue, because the migration engine used to fix
the faults is then blocked behind it. To work around this, set the
migration engine to the highest priority when servicing page faults.
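The paragraph above describes head-of-line blocking: only the head of a class's queue is considered, so a stuck head starves every entry behind it, including the migration engine that would resolve the faults. A minimal C sketch of that situation, under stated assumptions: `struct class_queue`, `enqueue()` and `pick_next()` are hypothetical names, the queue is a plain array ordered by priority (FIFO within one priority), and `runnable` abstracts away why a context can or cannot be placed on an engine. It is not the GuC's actual scheduling algorithm; it only shows why moving the migration context ahead of a stuck head breaks the wait cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified model of one per-class queue.
 * Not the GuC firmware algorithm.
 */
enum prio { PRIO_NORMAL = 0, PRIO_KERNEL = 1 };

struct ctx {
	const char *name;
	enum prio prio;
	/* false: cannot be placed right now, e.g. all BCS engines are faulted */
	bool runnable;
};

#define QMAX 16

struct class_queue {
	struct ctx q[QMAX];
	size_t len;
};

/* Keep higher priority first; FIFO order among equal priorities. */
static void enqueue(struct class_queue *cq, struct ctx c)
{
	size_t i = cq->len;

	assert(cq->len < QMAX);
	while (i > 0 && cq->q[i - 1].prio < c.prio) {
		cq->q[i] = cq->q[i - 1];
		i--;
	}
	cq->q[i] = c;
	cq->len++;
}

/*
 * Only the head is considered. If it cannot be scheduled, the whole
 * class stalls and NULL is returned, even if entries behind it could run.
 */
static const struct ctx *pick_next(const struct class_queue *cq)
{
	if (cq->len == 0 || !cq->q[0].runnable)
		return NULL;
	return &cq->q[0];
}
```

With both contexts at `PRIO_NORMAL`, a non-runnable user context enqueued first leaves `pick_next()` returning NULL even though the runnable migration context sits right behind it. Enqueuing the migration context at `PRIO_KERNEL` instead places it at the head, so it is picked.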

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 76ec40567a78..8fad6e60f826 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -106,6 +106,7 @@ static struct xe_vma *lookup_vma(struct xe_vm *vm, u64 page_addr)
 static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 {
 	struct xe_device *xe = gt_to_xe(gt);
+	struct xe_engine *e = xe_gt_migrate_engine(gt);
 	struct xe_vm *vm;
 	struct xe_vma *vma = NULL;
 	struct xe_bo *bo;
@@ -185,6 +186,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 	if (ret)
 		goto unlock_vm;
 
+	e->ops->set_priority(e, DRM_SCHED_PRIORITY_KERNEL);
 	if (atomic) {
 		if (xe_vma_is_userptr(vma)) {
 			ret = -EACCES;
@@ -204,7 +206,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 
 	/* Bind VMA only to the GT that has faulted */
 	trace_xe_vma_pf_bind(vma);
-	fence = __xe_pt_bind_vma(gt, vma, xe_gt_migrate_engine(gt), NULL, 0,
+	fence = __xe_pt_bind_vma(gt, vma, e, NULL, 0,
 				 vma->gt_present & BIT(gt->info.id));
 	if (IS_ERR(fence)) {
 		ret = PTR_ERR(fence);
@@ -218,6 +220,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 	 */
 	dma_fence_wait(fence, false);
 	dma_fence_put(fence);
+	e->ops->set_priority(e, DRM_SCHED_PRIORITY_NORMAL);
 
 	if (xe_vma_is_userptr(vma))
 		ret = xe_vma_userptr_check_repin(vma);
-- 
2.34.1
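v1 raises the migration engine's priority before the bind and lowers it only after waiting on the bind fence, so every exit between those two points has to reach the restore. The hunks above do not show where the intermediate error paths (`ret = -EACCES`, `ret = PTR_ERR(fence)`) jump to, so whether they bypass the restore cannot be read off this diff alone. A minimal sketch of the common kernel-style way to make such a restore unconditional, routing every exit through a single label; `struct engine`, `set_priority()`, `do_bind()` and `service_fault()` are hypothetical stand-ins, not the xe driver's API.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins; these are not the xe driver's types. */
enum sched_prio { PRIO_NORMAL, PRIO_KERNEL };

struct engine {
	enum sched_prio prio;
};

static void set_priority(struct engine *e, enum sched_prio p)
{
	e->prio = p;
}

/* Hypothetical bind step that fails with -EACCES on request. */
static int do_bind(int fail)
{
	return fail ? -EACCES : 0;
}

/*
 * Raise the priority while the fault is serviced and route every exit
 * through one label, so no error path can skip the restore.
 */
static int service_fault(struct engine *e, int fail_bind)
{
	int ret;

	/* A priority leaked by an earlier call would trip this check. */
	assert(e->prio == PRIO_NORMAL);
	set_priority(e, PRIO_KERNEL);

	ret = do_bind(fail_bind);
	if (ret)
		goto restore;

	/* ... wait on the bind fence here ... */

restore:
	set_priority(e, PRIO_NORMAL);
	return ret;
}
```

v2 avoids the question altogether by setting the priority once, when the migration engine is created.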


* [Intel-xe] [PATCH] drm/xe: Fix potential deadlock handling page faults
@ 2023-03-21  3:42 Matthew Brost
  2023-03-24 16:21 ` Maarten Lankhorst
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Brost @ 2023-03-21  3:42 UTC (permalink / raw)
  To: intel-xe

Within an engine class, the GuC halts scheduling when the head of the
queue can't be scheduled, so the whole queue blocks. This can deadlock
if BCS0-7 all have faults and another engine on BCS0-7 is at the head
of the GuC scheduling queue, because the migration engine used to fix
the faults is then blocked behind it. To work around this, set the
migration engine to the highest priority so it is not blocked when
servicing page faults.

v2 (Maarten): Set priority to kernel once at creation

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_migrate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 77a6d71f6e89..546711a0ec39 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -359,6 +359,8 @@ struct xe_migrate *xe_migrate_init(struct xe_gt *gt)
 		xe_vm_close_and_put(vm);
 		return ERR_CAST(m->eng);
 	}
+	if (xe->info.supports_usm)
+		m->eng->entity->priority = DRM_SCHED_PRIORITY_KERNEL;
 
 	mutex_init(&m->job_mutex);
 
-- 
2.34.1




Thread overview: 9+ messages
2023-03-18  3:28 [Intel-xe] [PATCH] drm/xe: Fix potential deadlock handling page faults Matthew Brost
2023-03-18  3:30 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
2023-03-18  3:31 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-03-18  3:35 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-03-18  3:47 ` [Intel-xe] ○ CI.BAT: info " Patchwork
2023-03-20 11:31 ` [Intel-xe] [PATCH] " Maarten Lankhorst
2023-03-20 17:14   ` Matthew Brost
2023-03-21  3:42 Matthew Brost
2023-03-24 16:21 ` Maarten Lankhorst
