* [PATCH 1/2] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock
@ 2017-10-06  9:06 ` Daniel Vetter
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Vetter @ 2017-10-06  9:06 UTC (permalink / raw)
  To: Intel Graphics Development
  Cc: LKML, Daniel Vetter, Chris Wilson, Tvrtko Ursulin,
	Joonas Lahtinen, Peter Zijlstra, Thomas Gleixner, Sasha Levin,
	Marta Lofstedt, Tejun Heo, Daniel Vetter

4.14-rc1 gained the fancy new cross-release support in lockdep, which
seems to have uncovered a few more rules about what is and isn't
allowed.

This one here seems to indicate that allocating a workqueue while
holding mmap_sem is a no-go, so let's try to preallocate it.

Of course another way to break this chain would be somewhere in the
cpu hotplug code, since this isn't the only trace we're finding now
which goes through msr_create_device.

Full lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc1-CI-CI_DRM_3118+ #1 Tainted: G     U
------------------------------------------------------
prime_mmap/1551 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109dbb7>] apply_workqueue_attrs+0x17/0x50

but task is already holding lock:
 (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&dev_priv->mm_lock){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
       i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
       drm_ioctl_kernel+0x69/0xb0
       drm_ioctl+0x2f9/0x3d0
       do_vfs_ioctl+0x94/0x670
       SyS_ioctl+0x41/0x70
       entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #5 (&mm->mmap_sem){++++}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __might_fault+0x68/0x90
       _copy_to_user+0x23/0x70
       filldir+0xa5/0x120
       dcache_readdir+0xf9/0x170
       iterate_dir+0x69/0x1a0
       SyS_getdents+0xa5/0x140
       entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #4 (&sb->s_type->i_mutex_key#5){++++}:
       down_write+0x3b/0x70
       handle_create+0xcb/0x1e0
       devtmpfsd+0x139/0x180
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #3 ((complete)&req.done){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       wait_for_common+0x58/0x210
       wait_for_completion+0x1d/0x20
       devtmpfs_create_node+0x13d/0x160
       device_add+0x5eb/0x620
       device_create_groups_vargs+0xe0/0xf0
       device_create+0x3a/0x40
       msr_device_create+0x2b/0x40
       cpuhp_invoke_callback+0xa3/0x840
       cpuhp_thread_fun+0x7a/0x150
       smpboot_thread_fn+0x18a/0x280
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #2 (cpuhp_state){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpuhp_issue_call+0x10b/0x170
       __cpuhp_setup_state_cpuslocked+0x134/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_writeback_init+0x43/0x67
       pagecache_init+0x3d/0x42
       start_kernel+0x3a8/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #1 (cpuhp_state_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       __cpuhp_setup_state_cpuslocked+0x52/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_alloc_init+0x28/0x30
       start_kernel+0x145/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #0 (cpu_hotplug_lock.rw_sem){++++}:
       check_prev_add+0x430/0x840
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpus_read_lock+0x3d/0xb0
       apply_workqueue_attrs+0x17/0x50
       __alloc_workqueue_key+0x1d8/0x4d9
       i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
       i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
       drm_ioctl_kernel+0x69/0xb0
       drm_ioctl+0x2f9/0x3d0
       do_vfs_ioctl+0x94/0x670
       SyS_ioctl+0x41/0x70
       entry_SYSCALL_64_fastpath+0x1c/0xb1

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev_priv->mm_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev_priv->mm_lock);
                               lock(&mm->mmap_sem);
                               lock(&dev_priv->mm_lock);
  lock(cpu_hotplug_lock.rw_sem);

 *** DEADLOCK ***

2 locks held by prime_mmap/1551:
 #0:  (&mm->mmap_sem){++++}, at: [<ffffffffa01a7b18>] i915_gem_userptr_init__mmu_notifier+0x138/0x270 [i915]
 #1:  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]

stack backtrace:
CPU: 4 PID: 1551 Comm: prime_mmap Tainted: G     U          4.14.0-rc1-CI-CI_DRM_3118+ #1
Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
Call Trace:
 dump_stack+0x68/0x9f
 print_circular_bug+0x235/0x3c0
 ? lockdep_init_map_crosslock+0x20/0x20
 check_prev_add+0x430/0x840
 __lock_acquire+0x1420/0x15e0
 ? __lock_acquire+0x1420/0x15e0
 ? lockdep_init_map_crosslock+0x20/0x20
 lock_acquire+0xb0/0x200
 ? apply_workqueue_attrs+0x17/0x50
 cpus_read_lock+0x3d/0xb0
 ? apply_workqueue_attrs+0x17/0x50
 apply_workqueue_attrs+0x17/0x50
 __alloc_workqueue_key+0x1d8/0x4d9
 ? __lockdep_init_map+0x57/0x1c0
 i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
 i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
 ? i915_gem_userptr_release+0x140/0x140 [i915]
 drm_ioctl_kernel+0x69/0xb0
 drm_ioctl+0x2f9/0x3d0
 ? i915_gem_userptr_release+0x140/0x140 [i915]
 ? __do_page_fault+0x2a4/0x570
 do_vfs_ioctl+0x94/0x670
 ? entry_SYSCALL_64_fastpath+0x5/0xb1
 ? __this_cpu_preempt_check+0x13/0x20
 ? trace_hardirqs_on_caller+0xe3/0x1b0
 SyS_ioctl+0x41/0x70
 entry_SYSCALL_64_fastpath+0x1c/0xb1
RIP: 0033:0x7fbb83c39587
RSP: 002b:00007fff188dc228 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: ffffffff81492963 RCX: 00007fbb83c39587
RDX: 00007fff188dc260 RSI: 00000000c0186473 RDI: 0000000000000003
RBP: ffffc90001487f88 R08: 0000000000000000 R09: 00007fff188dc2ac
R10: 00007fbb83efcb58 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000003 R14: 00000000c0186473 R15: 00007fff188dc2ac
 ? __this_cpu_preempt_check+0x13/0x20

v2: Set ret correctly when we raced with another thread.

v3: Use Chris' diff. Attach the right lockdep splat.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sasha Levin <alexander.levin@verizon.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Cc: Tejun Heo <tj@kernel.org>
References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 35 +++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 2d4996de7331..f9b3406401af 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -164,7 +164,6 @@ static struct i915_mmu_notifier *
 i915_mmu_notifier_create(struct mm_struct *mm)
 {
 	struct i915_mmu_notifier *mn;
-	int ret;
 
 	mn = kmalloc(sizeof(*mn), GFP_KERNEL);
 	if (mn == NULL)
@@ -179,14 +178,6 @@ i915_mmu_notifier_create(struct mm_struct *mm)
 		return ERR_PTR(-ENOMEM);
 	}
 
-	 /* Protected by mmap_sem (write-lock) */
-	ret = __mmu_notifier_register(&mn->mn, mm);
-	if (ret) {
-		destroy_workqueue(mn->wq);
-		kfree(mn);
-		return ERR_PTR(ret);
-	}
-
 	return mn;
 }
 
@@ -210,23 +201,37 @@ i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
 static struct i915_mmu_notifier *
 i915_mmu_notifier_find(struct i915_mm_struct *mm)
 {
-	struct i915_mmu_notifier *mn = mm->mn;
+	struct i915_mmu_notifier *mn;
+	int err;
 
 	mn = mm->mn;
 	if (mn)
 		return mn;
 
+	mn = i915_mmu_notifier_create(mm->mm);
+	if (IS_ERR(mn))
+		return mn;
+
+	err = 0;
 	down_write(&mm->mm->mmap_sem);
 	mutex_lock(&mm->i915->mm_lock);
-	if ((mn = mm->mn) == NULL) {
-		mn = i915_mmu_notifier_create(mm->mm);
-		if (!IS_ERR(mn))
-			mm->mn = mn;
+	if (mm->mn == NULL) {
+		/* Protected by mmap_sem (write-lock) */
+		err = __mmu_notifier_register(&mn->mn, mm->mm);
+		if (!err) {
+			/* Protected by mm_lock */
+			mm->mn = fetch_and_zero(&mn);
+		}
 	}
 	mutex_unlock(&mm->i915->mm_lock);
 	up_write(&mm->mm->mmap_sem);
 
-	return mn;
+	if (mn) {
+		destroy_workqueue(mn->wq);
+		kfree(mn);
+	}
+
+	return err ? ERR_PTR(err) : mm->mn;
 }
 
 static int
-- 
2.14.1


* [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-06  9:06 ` Daniel Vetter
@ 2017-10-06  9:06   ` Daniel Vetter
  -1 siblings, 0 replies; 35+ messages in thread
From: Daniel Vetter @ 2017-10-06  9:06 UTC (permalink / raw)
  To: Intel Graphics Development
  Cc: LKML, Daniel Vetter, Chris Wilson, Mika Kuoppala,
	Thomas Gleixner, Marta Lofstedt, Daniel Vetter

stop_machine is not really a locking primitive we should use, except
when the hw folks tell us the hw is broken and that's the only way to
work around it.

This patch tries to address the locking abuse of stop_machine() from

commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 22 14:41:21 2016 +0000

    drm/i915: Stop the machine as we install the wedged submit_request handler

Chris said part of the reason for going with stop_machine() was that
it adds no overhead to the fast-path. But these callbacks use irqsave
spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.

To stay as close as possible to the stop_machine semantics we first
update all the submit function pointers to the nop handler, then call
synchronize_rcu() to make sure no new requests can be submitted. This
should give us exactly the huge barrier we want.

I pondered whether we should annotate engine->submit_request as __rcu
and use rcu_assign_pointer and rcu_dereference on it. But the reason
those exist is to provide the compiler/cpu barriers needed when the
pointer target is an actual data structure, so that all the writes to
it are seen correctly on the read side. Here we just have a function
pointer, and .text isn't changed, so no need for these barriers and
hence no need for annotations.

This should fix the following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
------------------------------------------------------
kworker/3:4/562 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40

but task is already holding lock:
 (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&dev->struct_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_interruptible_nested+0x1b/0x20
       i915_mutex_lock_interruptible+0x51/0x130 [i915]
       i915_gem_fault+0x209/0x650 [i915]
       __do_fault+0x1e/0x80
       __handle_mm_fault+0xa08/0xed0
       handle_mm_fault+0x156/0x300
       __do_page_fault+0x2c5/0x570
       do_page_fault+0x28/0x250
       page_fault+0x22/0x30

-> #5 (&mm->mmap_sem){++++}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __might_fault+0x68/0x90
       _copy_to_user+0x23/0x70
       filldir+0xa5/0x120
       dcache_readdir+0xf9/0x170
       iterate_dir+0x69/0x1a0
       SyS_getdents+0xa5/0x140
       entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #4 (&sb->s_type->i_mutex_key#5){++++}:
       down_write+0x3b/0x70
       handle_create+0xcb/0x1e0
       devtmpfsd+0x139/0x180
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #3 ((complete)&req.done){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       wait_for_common+0x58/0x210
       wait_for_completion+0x1d/0x20
       devtmpfs_create_node+0x13d/0x160
       device_add+0x5eb/0x620
       device_create_groups_vargs+0xe0/0xf0
       device_create+0x3a/0x40
       msr_device_create+0x2b/0x40
       cpuhp_invoke_callback+0xc9/0xbf0
       cpuhp_thread_fun+0x17b/0x240
       smpboot_thread_fn+0x18a/0x280
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #2 (cpuhp_state-up){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpuhp_issue_call+0x133/0x1c0
       __cpuhp_setup_state_cpuslocked+0x139/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_writeback_init+0x43/0x67
       pagecache_init+0x3d/0x42
       start_kernel+0x3a8/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #1 (cpuhp_state_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       __cpuhp_setup_state_cpuslocked+0x53/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_alloc_init+0x28/0x30
       start_kernel+0x145/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #0 (cpu_hotplug_lock.rw_sem){++++}:
       check_prev_add+0x430/0x840
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpus_read_lock+0x3d/0xb0
       stop_machine+0x1c/0x40
       i915_gem_set_wedged+0x1a/0x20 [i915]
       i915_reset+0xb9/0x230 [i915]
       i915_reset_device+0x1f6/0x260 [i915]
       i915_handle_error+0x2d8/0x430 [i915]
       hangcheck_declare_hang+0xd3/0xf0 [i915]
       i915_hangcheck_elapsed+0x262/0x2d0 [i915]
       process_one_work+0x233/0x660
       worker_thread+0x4e/0x3b0
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev->struct_mutex);
                               lock(&mm->mmap_sem);
                               lock(&dev->struct_mutex);
  lock(cpu_hotplug_lock.rw_sem);

 *** DEADLOCK ***

3 locks held by kworker/3:4/562:
 #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
 #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
 #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]

stack backtrace:
CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
Workqueue: events_long i915_hangcheck_elapsed [i915]
Call Trace:
 dump_stack+0x68/0x9f
 print_circular_bug+0x235/0x3c0
 ? lockdep_init_map_crosslock+0x20/0x20
 check_prev_add+0x430/0x840
 ? irq_work_queue+0x86/0xe0
 ? wake_up_klogd+0x53/0x70
 __lock_acquire+0x1420/0x15e0
 ? __lock_acquire+0x1420/0x15e0
 ? lockdep_init_map_crosslock+0x20/0x20
 lock_acquire+0xb0/0x200
 ? stop_machine+0x1c/0x40
 ? i915_gem_object_truncate+0x50/0x50 [i915]
 cpus_read_lock+0x3d/0xb0
 ? stop_machine+0x1c/0x40
 stop_machine+0x1c/0x40
 i915_gem_set_wedged+0x1a/0x20 [i915]
 i915_reset+0xb9/0x230 [i915]
 i915_reset_device+0x1f6/0x260 [i915]
 ? gen8_gt_irq_ack+0x170/0x170 [i915]
 ? work_on_cpu_safe+0x60/0x60
 i915_handle_error+0x2d8/0x430 [i915]
 ? vsnprintf+0xd1/0x4b0
 ? scnprintf+0x3a/0x70
 hangcheck_declare_hang+0xd3/0xf0 [i915]
 ? intel_runtime_pm_put+0x56/0xa0 [i915]
 i915_hangcheck_elapsed+0x262/0x2d0 [i915]
 process_one_work+0x233/0x660
 worker_thread+0x4e/0x3b0
 kthread+0x152/0x190
 ? process_one_work+0x660/0x660
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x27/0x40
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang

v2: Have 1 global synchronize_rcu() barrier across all engines, and
improve commit message.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
 drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
 drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
 3 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ab8c6946fea4..e79a6ca60265 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
 	intel_engine_init_global_seqno(request->engine, request->global_seqno);
 }
 
-static void engine_set_wedged(struct intel_engine_cs *engine)
+static void engine_complete_requests(struct intel_engine_cs *engine)
 {
-	/* We need to be sure that no thread is running the old callback as
-	 * we install the nop handler (otherwise we would submit a request
-	 * to hardware that will never complete). In order to prevent this
-	 * race, we wait until the machine is idle before making the swap
-	 * (using stop_machine()).
-	 */
-	engine->submit_request = nop_submit_request;
-
 	/* Mark all executing requests as skipped */
 	engine->cancel_requests(engine);
 
@@ -3041,24 +3033,25 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
 				       intel_engine_last_submit(engine));
 }
 
-static int __i915_gem_set_wedged_BKL(void *data)
+void i915_gem_set_wedged(struct drm_i915_private *i915)
 {
-	struct drm_i915_private *i915 = data;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
 	for_each_engine(engine, i915, id)
-		engine_set_wedged(engine);
+		engine->submit_request = nop_submit_request;
 
-	set_bit(I915_WEDGED, &i915->gpu_error.flags);
-	wake_up_all(&i915->gpu_error.reset_queue);
+	/* Make sure no one is running the old callback before we proceed with
+	 * cancelling requests and resetting the completion tracking. Otherwise
+	 * we might submit a request to the hardware which never completes.
+	 */
+	synchronize_rcu();
 
-	return 0;
-}
+	for_each_engine(engine, i915, id)
+		engine_complete_requests(engine);
 
-void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
-{
-	stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
+	set_bit(I915_WEDGED, &i915->gpu_error.flags);
+	wake_up_all(&i915->gpu_error.reset_queue);
 }
 
 bool i915_gem_unset_wedged(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index b100b38f1dd2..ef78a85cb845 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	switch (state) {
 	case FENCE_COMPLETE:
 		trace_i915_gem_request_submit(request);
+		rcu_read_lock();
 		request->engine->submit_request(request);
+		rcu_read_unlock();
 		break;
 
 	case FENCE_FREE:
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
index 78b9f811707f..a999161e8db1 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
@@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
 	}
 	i915_gem_request_get(vip);
 	i915_add_request(vip);
+	rcu_read_lock();
 	request->engine->submit_request(request);
+	rcu_read_unlock();
 
 	mutex_unlock(&i915->drm.struct_mutex);
 
-- 
2.14.1

       __cpuhp_setup_state+0x46/0x60
       page_writeback_init+0x43/0x67
       pagecache_init+0x3d/0x42
       start_kernel+0x3a8/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #1 (cpuhp_state_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       __cpuhp_setup_state_cpuslocked+0x53/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_alloc_init+0x28/0x30
       start_kernel+0x145/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #0 (cpu_hotplug_lock.rw_sem){++++}:
       check_prev_add+0x430/0x840
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpus_read_lock+0x3d/0xb0
       stop_machine+0x1c/0x40
       i915_gem_set_wedged+0x1a/0x20 [i915]
       i915_reset+0xb9/0x230 [i915]
       i915_reset_device+0x1f6/0x260 [i915]
       i915_handle_error+0x2d8/0x430 [i915]
       hangcheck_declare_hang+0xd3/0xf0 [i915]
       i915_hangcheck_elapsed+0x262/0x2d0 [i915]
       process_one_work+0x233/0x660
       worker_thread+0x4e/0x3b0
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev->struct_mutex);
                               lock(&mm->mmap_sem);
                               lock(&dev->struct_mutex);
  lock(cpu_hotplug_lock.rw_sem);

 *** DEADLOCK ***

3 locks held by kworker/3:4/562:
 #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
 #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
 #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]

stack backtrace:
CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
Workqueue: events_long i915_hangcheck_elapsed [i915]
Call Trace:
 dump_stack+0x68/0x9f
 print_circular_bug+0x235/0x3c0
 ? lockdep_init_map_crosslock+0x20/0x20
 check_prev_add+0x430/0x840
 ? irq_work_queue+0x86/0xe0
 ? wake_up_klogd+0x53/0x70
 __lock_acquire+0x1420/0x15e0
 ? __lock_acquire+0x1420/0x15e0
 ? lockdep_init_map_crosslock+0x20/0x20
 lock_acquire+0xb0/0x200
 ? stop_machine+0x1c/0x40
 ? i915_gem_object_truncate+0x50/0x50 [i915]
 cpus_read_lock+0x3d/0xb0
 ? stop_machine+0x1c/0x40
 stop_machine+0x1c/0x40
 i915_gem_set_wedged+0x1a/0x20 [i915]
 i915_reset+0xb9/0x230 [i915]
 i915_reset_device+0x1f6/0x260 [i915]
 ? gen8_gt_irq_ack+0x170/0x170 [i915]
 ? work_on_cpu_safe+0x60/0x60
 i915_handle_error+0x2d8/0x430 [i915]
 ? vsnprintf+0xd1/0x4b0
 ? scnprintf+0x3a/0x70
 hangcheck_declare_hang+0xd3/0xf0 [i915]
 ? intel_runtime_pm_put+0x56/0xa0 [i915]
 i915_hangcheck_elapsed+0x262/0x2d0 [i915]
 process_one_work+0x233/0x660
 worker_thread+0x4e/0x3b0
 kthread+0x152/0x190
 ? process_one_work+0x660/0x660
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x27/0x40
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang

v2: Have 1 global synchronize_rcu() barrier across all engines, and
improve commit message.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
 drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
 drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
 3 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ab8c6946fea4..e79a6ca60265 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
 	intel_engine_init_global_seqno(request->engine, request->global_seqno);
 }
 
-static void engine_set_wedged(struct intel_engine_cs *engine)
+static void engine_complete_requests(struct intel_engine_cs *engine)
 {
-	/* We need to be sure that no thread is running the old callback as
-	 * we install the nop handler (otherwise we would submit a request
-	 * to hardware that will never complete). In order to prevent this
-	 * race, we wait until the machine is idle before making the swap
-	 * (using stop_machine()).
-	 */
-	engine->submit_request = nop_submit_request;
-
 	/* Mark all executing requests as skipped */
 	engine->cancel_requests(engine);
 
@@ -3041,24 +3033,25 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
 				       intel_engine_last_submit(engine));
 }
 
-static int __i915_gem_set_wedged_BKL(void *data)
+void i915_gem_set_wedged(struct drm_i915_private *i915)
 {
-	struct drm_i915_private *i915 = data;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
 	for_each_engine(engine, i915, id)
-		engine_set_wedged(engine);
+		engine->submit_request = nop_submit_request;
 
-	set_bit(I915_WEDGED, &i915->gpu_error.flags);
-	wake_up_all(&i915->gpu_error.reset_queue);
+	/* Make sure no one is running the old callback before we proceed with
+	 * cancelling requests and resetting the completion tracking. Otherwise
+	 * we might submit a request to the hardware which never completes.
+	 */
+	synchronize_rcu();
 
-	return 0;
-}
+	for_each_engine(engine, i915, id)
+		engine_complete_requests(engine);
 
-void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
-{
-	stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
+	set_bit(I915_WEDGED, &i915->gpu_error.flags);
+	wake_up_all(&i915->gpu_error.reset_queue);
 }
 
 bool i915_gem_unset_wedged(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index b100b38f1dd2..ef78a85cb845 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	switch (state) {
 	case FENCE_COMPLETE:
 		trace_i915_gem_request_submit(request);
+		rcu_read_lock();
 		request->engine->submit_request(request);
+		rcu_read_unlock();
 		break;
 
 	case FENCE_FREE:
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
index 78b9f811707f..a999161e8db1 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
@@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
 	}
 	i915_gem_request_get(vip);
 	i915_add_request(vip);
+	rcu_read_lock();
 	request->engine->submit_request(request);
+	rcu_read_unlock();
 
 	mutex_unlock(&i915->drm.struct_mutex);
 
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-06  9:06   ` Daniel Vetter
@ 2017-10-06  9:17     ` Chris Wilson
  -1 siblings, 0 replies; 35+ messages in thread
From: Chris Wilson @ 2017-10-06  9:17 UTC (permalink / raw)
  To: Daniel Vetter, Intel Graphics Development
  Cc: LKML, Daniel Vetter, Mika Kuoppala, Thomas Gleixner,
	Marta Lofstedt, Daniel Vetter

Quoting Daniel Vetter (2017-10-06 10:06:37)
> stop_machine is not really a locking primitive we should use, except
> when the hw folks tell us the hw is broken and that's the only way to
> work around it.
> 
> This patch tries to address the locking abuse of stop_machine() from
> 
> commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Nov 22 14:41:21 2016 +0000
> 
>     drm/i915: Stop the machine as we install the wedged submit_request handler
> 
> Chris said part of the reason for going with stop_machine() was that
> it adds no overhead to the fast-path. But these callbacks use irqsave
> spinlocks and do a bunch of MMIO, and rcu_read_lock() is _really_ fast.

I still want a discussion here of the reasoning for keeping the normal
path clean, and of why an alternative is sought. That design leads into vv

> To stay as close as possible to the stop_machine semantics we first
> update all the submit function pointers to the nop handler, then call
> synchronize_rcu() to make sure no new requests can be submitted. This
> should give us exactly the huge barrier we want.
> 
> I pondered whether we should annotate engine->submit_request as __rcu
> and use rcu_assign_pointer and rcu_dereference on it. But the reason
> behind those is to make sure the compiler/cpu barriers are there for
> when you have an actual data structure you point at, to make sure all
> the writes are seen correctly on the read side. But we just have a
> function pointer, and .text isn't changed, so no need for these
> barriers and hence no need for annotations.
> 
> This should fix the following lockdep splat:
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
> ------------------------------------------------------
> kworker/3:4/562 is trying to acquire lock:
>  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
> 
> but task is already holding lock:
>  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #6 (&dev->struct_mutex){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        __mutex_lock+0x86/0x9b0
>        mutex_lock_interruptible_nested+0x1b/0x20
>        i915_mutex_lock_interruptible+0x51/0x130 [i915]
>        i915_gem_fault+0x209/0x650 [i915]
>        __do_fault+0x1e/0x80
>        __handle_mm_fault+0xa08/0xed0
>        handle_mm_fault+0x156/0x300
>        __do_page_fault+0x2c5/0x570
>        do_page_fault+0x28/0x250
>        page_fault+0x22/0x30
> 
> -> #5 (&mm->mmap_sem){++++}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        __might_fault+0x68/0x90
>        _copy_to_user+0x23/0x70
>        filldir+0xa5/0x120
>        dcache_readdir+0xf9/0x170
>        iterate_dir+0x69/0x1a0
>        SyS_getdents+0xa5/0x140
>        entry_SYSCALL_64_fastpath+0x1c/0xb1
> 
> -> #4 (&sb->s_type->i_mutex_key#5){++++}:
>        down_write+0x3b/0x70
>        handle_create+0xcb/0x1e0
>        devtmpfsd+0x139/0x180
>        kthread+0x152/0x190
>        ret_from_fork+0x27/0x40
> 
> -> #3 ((complete)&req.done){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        wait_for_common+0x58/0x210
>        wait_for_completion+0x1d/0x20
>        devtmpfs_create_node+0x13d/0x160
>        device_add+0x5eb/0x620
>        device_create_groups_vargs+0xe0/0xf0
>        device_create+0x3a/0x40
>        msr_device_create+0x2b/0x40
>        cpuhp_invoke_callback+0xc9/0xbf0
>        cpuhp_thread_fun+0x17b/0x240
>        smpboot_thread_fn+0x18a/0x280
>        kthread+0x152/0x190
>        ret_from_fork+0x27/0x40
> 
> -> #2 (cpuhp_state-up){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        cpuhp_issue_call+0x133/0x1c0
>        __cpuhp_setup_state_cpuslocked+0x139/0x2a0
>        __cpuhp_setup_state+0x46/0x60
>        page_writeback_init+0x43/0x67
>        pagecache_init+0x3d/0x42
>        start_kernel+0x3a8/0x3fc
>        x86_64_start_reservations+0x2a/0x2c
>        x86_64_start_kernel+0x6d/0x70
>        verify_cpu+0x0/0xfb
> 
> -> #1 (cpuhp_state_mutex){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        __mutex_lock+0x86/0x9b0
>        mutex_lock_nested+0x1b/0x20
>        __cpuhp_setup_state_cpuslocked+0x53/0x2a0
>        __cpuhp_setup_state+0x46/0x60
>        page_alloc_init+0x28/0x30
>        start_kernel+0x145/0x3fc
>        x86_64_start_reservations+0x2a/0x2c
>        x86_64_start_kernel+0x6d/0x70
>        verify_cpu+0x0/0xfb
> 
> -> #0 (cpu_hotplug_lock.rw_sem){++++}:
>        check_prev_add+0x430/0x840
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        cpus_read_lock+0x3d/0xb0
>        stop_machine+0x1c/0x40
>        i915_gem_set_wedged+0x1a/0x20 [i915]
>        i915_reset+0xb9/0x230 [i915]
>        i915_reset_device+0x1f6/0x260 [i915]
>        i915_handle_error+0x2d8/0x430 [i915]
>        hangcheck_declare_hang+0xd3/0xf0 [i915]
>        i915_hangcheck_elapsed+0x262/0x2d0 [i915]
>        process_one_work+0x233/0x660
>        worker_thread+0x4e/0x3b0
>        kthread+0x152/0x190
>        ret_from_fork+0x27/0x40
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&dev->struct_mutex);
>                                lock(&mm->mmap_sem);
>                                lock(&dev->struct_mutex);
>   lock(cpu_hotplug_lock.rw_sem);
> 
>  *** DEADLOCK ***
> 
> 3 locks held by kworker/3:4/562:
>  #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
>  #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
>  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> 
> stack backtrace:
> CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
> Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
> Workqueue: events_long i915_hangcheck_elapsed [i915]
> Call Trace:
>  dump_stack+0x68/0x9f
>  print_circular_bug+0x235/0x3c0
>  ? lockdep_init_map_crosslock+0x20/0x20
>  check_prev_add+0x430/0x840
>  ? irq_work_queue+0x86/0xe0
>  ? wake_up_klogd+0x53/0x70
>  __lock_acquire+0x1420/0x15e0
>  ? __lock_acquire+0x1420/0x15e0
>  ? lockdep_init_map_crosslock+0x20/0x20
>  lock_acquire+0xb0/0x200
>  ? stop_machine+0x1c/0x40
>  ? i915_gem_object_truncate+0x50/0x50 [i915]
>  cpus_read_lock+0x3d/0xb0
>  ? stop_machine+0x1c/0x40
>  stop_machine+0x1c/0x40
>  i915_gem_set_wedged+0x1a/0x20 [i915]
>  i915_reset+0xb9/0x230 [i915]
>  i915_reset_device+0x1f6/0x260 [i915]
>  ? gen8_gt_irq_ack+0x170/0x170 [i915]
>  ? work_on_cpu_safe+0x60/0x60
>  i915_handle_error+0x2d8/0x430 [i915]
>  ? vsnprintf+0xd1/0x4b0
>  ? scnprintf+0x3a/0x70
>  hangcheck_declare_hang+0xd3/0xf0 [i915]
>  ? intel_runtime_pm_put+0x56/0xa0 [i915]
>  i915_hangcheck_elapsed+0x262/0x2d0 [i915]
>  process_one_work+0x233/0x660
>  worker_thread+0x4e/0x3b0
>  kthread+0x152/0x190
>  ? process_one_work+0x660/0x660
>  ? kthread_create_on_node+0x40/0x40
>  ret_from_fork+0x27/0x40
> Setting dangerous option reset - tainting kernel
> i915 0000:00:02.0: Resetting chip after gpu hang
> Setting dangerous option reset - tainting kernel
> i915 0000:00:02.0: Resetting chip after gpu hang
> 
> v2: Have 1 global synchronize_rcu() barrier across all engines, and
> improve commit message.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
>  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
>  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
>  3 files changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ab8c6946fea4..e79a6ca60265 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>         intel_engine_init_global_seqno(request->engine, request->global_seqno);
>  }
>  
> -static void engine_set_wedged(struct intel_engine_cs *engine)
> +static void engine_complete_requests(struct intel_engine_cs *engine)
>  {
> -       /* We need to be sure that no thread is running the old callback as
> -        * we install the nop handler (otherwise we would submit a request
> -        * to hardware that will never complete). In order to prevent this
> -        * race, we wait until the machine is idle before making the swap
> -        * (using stop_machine()).
> -        */
> -       engine->submit_request = nop_submit_request;
> -
>         /* Mark all executing requests as skipped */
>         engine->cancel_requests(engine);
>  
> @@ -3041,24 +3033,25 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>                                        intel_engine_last_submit(engine));
>  }
>  
> -static int __i915_gem_set_wedged_BKL(void *data)
> +void i915_gem_set_wedged(struct drm_i915_private *i915)
>  {
> -       struct drm_i915_private *i915 = data;
>         struct intel_engine_cs *engine;
>         enum intel_engine_id id;
>  
>         for_each_engine(engine, i915, id)
> -               engine_set_wedged(engine);
> +               engine->submit_request = nop_submit_request;
>  
> -       set_bit(I915_WEDGED, &i915->gpu_error.flags);
> -       wake_up_all(&i915->gpu_error.reset_queue);
> +       /* Make sure no one is running the old callback before we proceed with
> +        * cancelling requests and resetting the completion tracking. Otherwise
> +        * we might submit a request to the hardware which never completes.
> +        */
> +       synchronize_rcu();
>  
> -       return 0;
> -}
> +       for_each_engine(engine, i915, id)
> +               engine_complete_requests(engine);
>  
> -void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
> -{
> -       stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
> +       set_bit(I915_WEDGED, &i915->gpu_error.flags);
> +       wake_up_all(&i915->gpu_error.reset_queue);
>  }
>  
>  bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index b100b38f1dd2..ef78a85cb845 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>         switch (state) {
>         case FENCE_COMPLETE:
>                 trace_i915_gem_request_submit(request);
> +               rcu_read_lock();
>                 request->engine->submit_request(request);
> +               rcu_read_unlock();
>                 break;
>  
>         case FENCE_FREE:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> index 78b9f811707f..a999161e8db1 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
>         }
>         i915_gem_request_get(vip);
>         i915_add_request(vip);
> +       rcu_read_lock();
>         request->engine->submit_request(request);
> +       rcu_read_unlock();
>  
>         mutex_unlock(&i915->drm.struct_mutex);
>  
> -- 
> 2.14.1
> 


> +               rcu_read_unlock();
>                 break;
>  
>         case FENCE_FREE:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> index 78b9f811707f..a999161e8db1 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
>         }
>         i915_gem_request_get(vip);
>         i915_add_request(vip);
> +       rcu_read_lock();
>         request->engine->submit_request(request);
> +       rcu_read_unlock();
>  
>         mutex_unlock(&i915->drm.struct_mutex);
>  
> -- 
> 2.14.1
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock
  2017-10-06  9:06 ` Daniel Vetter
@ 2017-10-06  9:23   ` Chris Wilson
  -1 siblings, 0 replies; 35+ messages in thread
From: Chris Wilson @ 2017-10-06  9:23 UTC (permalink / raw)
  To: Daniel Vetter, Intel Graphics Development
  Cc: LKML, Daniel Vetter, Tvrtko Ursulin, Joonas Lahtinen,
	Peter Zijlstra, Thomas Gleixner, Sasha Levin, Marta Lofstedt,
	Tejun Heo, Daniel Vetter

Quoting Daniel Vetter (2017-10-06 10:06:36)
> 4.14-rc1 gained the fancy new cross-release support in lockdep, which
> seems to have uncovered a few more rules about what is allowed and
> isn't.
> 
> This one here seems to indicate that allocating a work-queue while
> holding mmap_sem is a no-go, so let's try to preallocate it.

But you haven't mentioned why we might want to preallocate to reduce the
mmap/mm_lock coverage anyway.
-Chris

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock
  2017-10-06  9:06 ` Daniel Vetter
@ 2017-10-06  9:48   ` Chris Wilson
  -1 siblings, 0 replies; 35+ messages in thread
From: Chris Wilson @ 2017-10-06  9:48 UTC (permalink / raw)
  To: Daniel Vetter, Intel Graphics Development
  Cc: LKML, Daniel Vetter, Tvrtko Ursulin, Joonas Lahtinen,
	Peter Zijlstra, Thomas Gleixner, Sasha Levin, Marta Lofstedt,
	Tejun Heo, Daniel Vetter

Quoting Daniel Vetter (2017-10-06 10:06:36)
> 4.14-rc1 gained the fancy new cross-release support in lockdep, which
> seems to have uncovered a few more rules about what is allowed and
> isn't.
> 
> This one here seems to indicate that allocating a work-queue while
> holding mmap_sem is a no-go, so let's try to preallocate it.
> 
> Of course another way to break this chain would be somewhere in the
> cpu hotplug code, since this isn't the only trace we're finding now
> which goes through msr_create_device.
> 
> Full lockdep splat:
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.14.0-rc1-CI-CI_DRM_3118+ #1 Tainted: G     U
> ------------------------------------------------------
> prime_mmap/1551 is trying to acquire lock:
>  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109dbb7>] apply_workqueue_attrs+0x17/0x50
> 
> but task is already holding lock:
>  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #6 (&dev_priv->mm_lock){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        __mutex_lock+0x86/0x9b0
>        mutex_lock_nested+0x1b/0x20
>        i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
>        i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>        drm_ioctl_kernel+0x69/0xb0
>        drm_ioctl+0x2f9/0x3d0
>        do_vfs_ioctl+0x94/0x670
>        SyS_ioctl+0x41/0x70
>        entry_SYSCALL_64_fastpath+0x1c/0xb1
> 
> -> #5 (&mm->mmap_sem){++++}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        __might_fault+0x68/0x90
>        _copy_to_user+0x23/0x70
>        filldir+0xa5/0x120
>        dcache_readdir+0xf9/0x170
>        iterate_dir+0x69/0x1a0
>        SyS_getdents+0xa5/0x140
>        entry_SYSCALL_64_fastpath+0x1c/0xb1
> 
> -> #4 (&sb->s_type->i_mutex_key#5){++++}:
>        down_write+0x3b/0x70
>        handle_create+0xcb/0x1e0
>        devtmpfsd+0x139/0x180
>        kthread+0x152/0x190
>        ret_from_fork+0x27/0x40
> 
> -> #3 ((complete)&req.done){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        wait_for_common+0x58/0x210
>        wait_for_completion+0x1d/0x20
>        devtmpfs_create_node+0x13d/0x160
>        device_add+0x5eb/0x620
>        device_create_groups_vargs+0xe0/0xf0
>        device_create+0x3a/0x40
>        msr_device_create+0x2b/0x40
>        cpuhp_invoke_callback+0xa3/0x840
>        cpuhp_thread_fun+0x7a/0x150
>        smpboot_thread_fn+0x18a/0x280
>        kthread+0x152/0x190
>        ret_from_fork+0x27/0x40
> 
> -> #2 (cpuhp_state){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        cpuhp_issue_call+0x10b/0x170
>        __cpuhp_setup_state_cpuslocked+0x134/0x2a0
>        __cpuhp_setup_state+0x46/0x60
>        page_writeback_init+0x43/0x67
>        pagecache_init+0x3d/0x42
>        start_kernel+0x3a8/0x3fc
>        x86_64_start_reservations+0x2a/0x2c
>        x86_64_start_kernel+0x6d/0x70
>        verify_cpu+0x0/0xfb
> 
> -> #1 (cpuhp_state_mutex){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        __mutex_lock+0x86/0x9b0
>        mutex_lock_nested+0x1b/0x20
>        __cpuhp_setup_state_cpuslocked+0x52/0x2a0
>        __cpuhp_setup_state+0x46/0x60
>        page_alloc_init+0x28/0x30
>        start_kernel+0x145/0x3fc
>        x86_64_start_reservations+0x2a/0x2c
>        x86_64_start_kernel+0x6d/0x70
>        verify_cpu+0x0/0xfb
> 
> -> #0 (cpu_hotplug_lock.rw_sem){++++}:
>        check_prev_add+0x430/0x840
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        cpus_read_lock+0x3d/0xb0
>        apply_workqueue_attrs+0x17/0x50
>        __alloc_workqueue_key+0x1d8/0x4d9
>        i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
>        i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>        drm_ioctl_kernel+0x69/0xb0
>        drm_ioctl+0x2f9/0x3d0
>        do_vfs_ioctl+0x94/0x670
>        SyS_ioctl+0x41/0x70
>        entry_SYSCALL_64_fastpath+0x1c/0xb1
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev_priv->mm_lock
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&dev_priv->mm_lock);
>                                lock(&mm->mmap_sem);
>                                lock(&dev_priv->mm_lock);
>   lock(cpu_hotplug_lock.rw_sem);
> 
>  *** DEADLOCK ***
> 
> 2 locks held by prime_mmap/1551:
>  #0:  (&mm->mmap_sem){++++}, at: [<ffffffffa01a7b18>] i915_gem_userptr_init__mmu_notifier+0x138/0x270 [i915]
>  #1:  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
> 
> stack backtrace:
> CPU: 4 PID: 1551 Comm: prime_mmap Tainted: G     U          4.14.0-rc1-CI-CI_DRM_3118+ #1
> Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
> Call Trace:
>  dump_stack+0x68/0x9f
>  print_circular_bug+0x235/0x3c0
>  ? lockdep_init_map_crosslock+0x20/0x20
>  check_prev_add+0x430/0x840
>  __lock_acquire+0x1420/0x15e0
>  ? __lock_acquire+0x1420/0x15e0
>  ? lockdep_init_map_crosslock+0x20/0x20
>  lock_acquire+0xb0/0x200
>  ? apply_workqueue_attrs+0x17/0x50
>  cpus_read_lock+0x3d/0xb0
>  ? apply_workqueue_attrs+0x17/0x50
>  apply_workqueue_attrs+0x17/0x50
>  __alloc_workqueue_key+0x1d8/0x4d9
>  ? __lockdep_init_map+0x57/0x1c0
>  i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
>  i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>  ? i915_gem_userptr_release+0x140/0x140 [i915]
>  drm_ioctl_kernel+0x69/0xb0
>  drm_ioctl+0x2f9/0x3d0
>  ? i915_gem_userptr_release+0x140/0x140 [i915]
>  ? __do_page_fault+0x2a4/0x570
>  do_vfs_ioctl+0x94/0x670
>  ? entry_SYSCALL_64_fastpath+0x5/0xb1
>  ? __this_cpu_preempt_check+0x13/0x20
>  ? trace_hardirqs_on_caller+0xe3/0x1b0
>  SyS_ioctl+0x41/0x70
>  entry_SYSCALL_64_fastpath+0x1c/0xb1
> RIP: 0033:0x7fbb83c39587
> RSP: 002b:00007fff188dc228 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: ffffffff81492963 RCX: 00007fbb83c39587
> RDX: 00007fff188dc260 RSI: 00000000c0186473 RDI: 0000000000000003
> RBP: ffffc90001487f88 R08: 0000000000000000 R09: 00007fff188dc2ac
> R10: 00007fbb83efcb58 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000003 R14: 00000000c0186473 R15: 00007fff188dc2ac
>  ? __this_cpu_preempt_check+0x13/0x20
> 
> v2: Set ret correctly when we raced with another thread.
> 
> v3: Use Chris' diff. Attach the right lockdep splat.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sasha Levin <alexander.levin@verizon.com>
> Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> Cc: Tejun Heo <tj@kernel.org>
> References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939

The difference is that I would have s/Bugzilla/References/.
-Chris

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-06  9:17     ` Chris Wilson
@ 2017-10-06 10:12       ` Thomas Gleixner
  -1 siblings, 0 replies; 35+ messages in thread
From: Thomas Gleixner @ 2017-10-06 10:12 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Daniel Vetter, Intel Graphics Development, LKML, Mika Kuoppala,
	Marta Lofstedt, Daniel Vetter, Peter Zijlstra

On Fri, 6 Oct 2017, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-10-06 10:06:37)
> > stop_machine is not really a locking primitive we should use, except
> > when the hw folks tell us the hw is broken and that's the only way to
> > work around it.
> > 
> > This patch tries to address the locking abuse of stop_machine() from
> > 
> > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Tue Nov 22 14:41:21 2016 +0000
> > 
> >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > 
> > Chris said parts of the reasons for going with stop_machine() was that
> > it's no overhead for the fast-path. But these callbacks use irqsave
> > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> 
> I still want a discussion of the reasons for keeping the normal path clean
> and of why an alternative is sought here. That design leads into vv

stop_machine() is the last resort when serialization problems cannot be
solved otherwise. We try to avoid it wherever we can. While at the call
site it looks simple, it is invasive in terms of locking, as shown by the
lockdep splat, and it imposes latencies and other side effects on all
CPUs in the system. So if you don't have a compelling technical reason to
use it, then it _is_ the wrong tool.

As Daniel has shown, it's not required, so there is no technical reason why
stop_machine() has to be used here.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-06  9:06   ` Daniel Vetter
@ 2017-10-06 11:03     ` Chris Wilson
  -1 siblings, 0 replies; 35+ messages in thread
From: Chris Wilson @ 2017-10-06 11:03 UTC (permalink / raw)
  To: Daniel Vetter, Intel Graphics Development
  Cc: LKML, Daniel Vetter, Mika Kuoppala, Thomas Gleixner,
	Marta Lofstedt, Daniel Vetter

Quoting Daniel Vetter (2017-10-06 10:06:37)
> stop_machine is not really a locking primitive we should use, except
> when the hw folks tell us the hw is broken and that's the only way to
> work around it.
> 
> This patch tries to address the locking abuse of stop_machine() from
> 
> commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Nov 22 14:41:21 2016 +0000
> 
>     drm/i915: Stop the machine as we install the wedged submit_request handler
> 
> Chris said parts of the reasons for going with stop_machine() was that
> it's no overhead for the fast-path. But these callbacks use irqsave
> spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> 
> To stay as close as possible to the stop_machine semantics we first
> update all the submit function pointers to the nop handler, then call
> synchronize_rcu() to make sure no new requests can be submitted. This
> should give us exactly the huge barrier we want.
> 
> I pondered whether we should annotate engine->submit_request as __rcu
> and use rcu_assign_pointer and rcu_dereference on it. But the reason
> behind those is to make sure the compiler/cpu barriers are there for
> when you have an actual data structure you point at, to make sure all
> the writes are seen correctly on the read side. But we just have a
> function pointer, and .text isn't changed, so no need for these
> barriers and hence no need for annotations.
> 
> This should fix the following lockdep splat:
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
> ------------------------------------------------------
> kworker/3:4/562 is trying to acquire lock:
>  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
> 
> but task is already holding lock:
>  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #6 (&dev->struct_mutex){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        __mutex_lock+0x86/0x9b0
>        mutex_lock_interruptible_nested+0x1b/0x20
>        i915_mutex_lock_interruptible+0x51/0x130 [i915]
>        i915_gem_fault+0x209/0x650 [i915]
>        __do_fault+0x1e/0x80
>        __handle_mm_fault+0xa08/0xed0
>        handle_mm_fault+0x156/0x300
>        __do_page_fault+0x2c5/0x570
>        do_page_fault+0x28/0x250
>        page_fault+0x22/0x30
> 
> -> #5 (&mm->mmap_sem){++++}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        __might_fault+0x68/0x90
>        _copy_to_user+0x23/0x70
>        filldir+0xa5/0x120
>        dcache_readdir+0xf9/0x170
>        iterate_dir+0x69/0x1a0
>        SyS_getdents+0xa5/0x140
>        entry_SYSCALL_64_fastpath+0x1c/0xb1
> 
> -> #4 (&sb->s_type->i_mutex_key#5){++++}:
>        down_write+0x3b/0x70
>        handle_create+0xcb/0x1e0
>        devtmpfsd+0x139/0x180
>        kthread+0x152/0x190
>        ret_from_fork+0x27/0x40
> 
> -> #3 ((complete)&req.done){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        wait_for_common+0x58/0x210
>        wait_for_completion+0x1d/0x20
>        devtmpfs_create_node+0x13d/0x160
>        device_add+0x5eb/0x620
>        device_create_groups_vargs+0xe0/0xf0
>        device_create+0x3a/0x40
>        msr_device_create+0x2b/0x40
>        cpuhp_invoke_callback+0xc9/0xbf0
>        cpuhp_thread_fun+0x17b/0x240
>        smpboot_thread_fn+0x18a/0x280
>        kthread+0x152/0x190
>        ret_from_fork+0x27/0x40
> 
> -> #2 (cpuhp_state-up){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        cpuhp_issue_call+0x133/0x1c0
>        __cpuhp_setup_state_cpuslocked+0x139/0x2a0
>        __cpuhp_setup_state+0x46/0x60
>        page_writeback_init+0x43/0x67
>        pagecache_init+0x3d/0x42
>        start_kernel+0x3a8/0x3fc
>        x86_64_start_reservations+0x2a/0x2c
>        x86_64_start_kernel+0x6d/0x70
>        verify_cpu+0x0/0xfb
> 
> -> #1 (cpuhp_state_mutex){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        __mutex_lock+0x86/0x9b0
>        mutex_lock_nested+0x1b/0x20
>        __cpuhp_setup_state_cpuslocked+0x53/0x2a0
>        __cpuhp_setup_state+0x46/0x60
>        page_alloc_init+0x28/0x30
>        start_kernel+0x145/0x3fc
>        x86_64_start_reservations+0x2a/0x2c
>        x86_64_start_kernel+0x6d/0x70
>        verify_cpu+0x0/0xfb
> 
> -> #0 (cpu_hotplug_lock.rw_sem){++++}:
>        check_prev_add+0x430/0x840
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        cpus_read_lock+0x3d/0xb0
>        stop_machine+0x1c/0x40
>        i915_gem_set_wedged+0x1a/0x20 [i915]
>        i915_reset+0xb9/0x230 [i915]
>        i915_reset_device+0x1f6/0x260 [i915]
>        i915_handle_error+0x2d8/0x430 [i915]
>        hangcheck_declare_hang+0xd3/0xf0 [i915]
>        i915_hangcheck_elapsed+0x262/0x2d0 [i915]
>        process_one_work+0x233/0x660
>        worker_thread+0x4e/0x3b0
>        kthread+0x152/0x190
>        ret_from_fork+0x27/0x40
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&dev->struct_mutex);
>                                lock(&mm->mmap_sem);
>                                lock(&dev->struct_mutex);
>   lock(cpu_hotplug_lock.rw_sem);
> 
>  *** DEADLOCK ***
> 
> 3 locks held by kworker/3:4/562:
>  #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
>  #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
>  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> 
> stack backtrace:
> CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
> Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
> Workqueue: events_long i915_hangcheck_elapsed [i915]
> Call Trace:
>  dump_stack+0x68/0x9f
>  print_circular_bug+0x235/0x3c0
>  ? lockdep_init_map_crosslock+0x20/0x20
>  check_prev_add+0x430/0x840
>  ? irq_work_queue+0x86/0xe0
>  ? wake_up_klogd+0x53/0x70
>  __lock_acquire+0x1420/0x15e0
>  ? __lock_acquire+0x1420/0x15e0
>  ? lockdep_init_map_crosslock+0x20/0x20
>  lock_acquire+0xb0/0x200
>  ? stop_machine+0x1c/0x40
>  ? i915_gem_object_truncate+0x50/0x50 [i915]
>  cpus_read_lock+0x3d/0xb0
>  ? stop_machine+0x1c/0x40
>  stop_machine+0x1c/0x40
>  i915_gem_set_wedged+0x1a/0x20 [i915]
>  i915_reset+0xb9/0x230 [i915]
>  i915_reset_device+0x1f6/0x260 [i915]
>  ? gen8_gt_irq_ack+0x170/0x170 [i915]
>  ? work_on_cpu_safe+0x60/0x60
>  i915_handle_error+0x2d8/0x430 [i915]
>  ? vsnprintf+0xd1/0x4b0
>  ? scnprintf+0x3a/0x70
>  hangcheck_declare_hang+0xd3/0xf0 [i915]
>  ? intel_runtime_pm_put+0x56/0xa0 [i915]
>  i915_hangcheck_elapsed+0x262/0x2d0 [i915]
>  process_one_work+0x233/0x660
>  worker_thread+0x4e/0x3b0
>  kthread+0x152/0x190
>  ? process_one_work+0x660/0x660
>  ? kthread_create_on_node+0x40/0x40
>  ret_from_fork+0x27/0x40
> Setting dangerous option reset - tainting kernel
> i915 0000:00:02.0: Resetting chip after gpu hang
> Setting dangerous option reset - tainting kernel
> i915 0000:00:02.0: Resetting chip after gpu hang
> 
> v2: Have 1 global synchronize_rcu() barrier across all engines, and
> improve commit message.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
>  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
>  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
>  3 files changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ab8c6946fea4..e79a6ca60265 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>         intel_engine_init_global_seqno(request->engine, request->global_seqno);
>  }
>  
> -static void engine_set_wedged(struct intel_engine_cs *engine)
> +static void engine_complete_requests(struct intel_engine_cs *engine)
>  {
> -       /* We need to be sure that no thread is running the old callback as
> -        * we install the nop handler (otherwise we would submit a request
> -        * to hardware that will never complete). In order to prevent this
> -        * race, we wait until the machine is idle before making the swap
> -        * (using stop_machine()).
> -        */
> -       engine->submit_request = nop_submit_request;
> -
>         /* Mark all executing requests as skipped */
>         engine->cancel_requests(engine);

How are we planning to serialise the intel_engine_init_global_seqno()
here with the in-flight nop_submit? With sufficient thrust we will get a
stale breadcrumb and an incomplete request.
-Chris

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-06  9:06   ` Daniel Vetter
                     ` (2 preceding siblings ...)
  (?)
@ 2017-10-06 11:10   ` Peter Zijlstra
  -1 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2017-10-06 11:10 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Intel Graphics Development, LKML, Chris Wilson, Mika Kuoppala,
	Thomas Gleixner, Marta Lofstedt, Daniel Vetter

On Fri, Oct 06, 2017 at 11:06:37AM +0200, Daniel Vetter wrote:

> I pondered whether we should annotate engine->submit_request as __rcu
> and use rcu_assign_pointer and rcu_dereference on it. But the reason
> behind those is to make sure the compiler/cpu barriers are there for
> when you have an actual data structure you point at, to make sure all
> the writes are seen correctly on the read side. But we just have a
> function pointer, and .text isn't changed, so no need for these
> barriers and hence no need for annotations.

synchronize_*() provides an smp_mb() on the calling CPU and ensures an
smp_mb() on all other CPUs before completion, such that everybody agrees
on the state prior to calling synchronize_rcu(). So yes, no additional
ordering requirements.

>  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
>  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
>  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
>  3 files changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ab8c6946fea4..e79a6ca60265 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>  	intel_engine_init_global_seqno(request->engine, request->global_seqno);
>  }
>  
> +static void engine_complete_requests(struct intel_engine_cs *engine)
>  {
>  	/* Mark all executing requests as skipped */
>  	engine->cancel_requests(engine);
>  
> @@ -3041,24 +3033,25 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>  				       intel_engine_last_submit(engine));
>  }
>  
> +void i915_gem_set_wedged(struct drm_i915_private *i915)
>  {
>  	struct intel_engine_cs *engine;
>  	enum intel_engine_id id;
>  
>  	for_each_engine(engine, i915, id)
> +		engine->submit_request = nop_submit_request;
>  
> +	/* Make sure no one is running the old callback before we proceed with
> +	 * cancelling requests and resetting the completion tracking. Otherwise
> +	 * we might submit a request to the hardware which never completes.
> +	 */

ARGH @ horrid comment style..

  http://lkml.iu.edu/hypermail/linux/kernel/1607.1/00627.html

:-)

> +	synchronize_rcu();
>  
> +	for_each_engine(engine, i915, id)
> +		engine_complete_requests(engine);
>  
> +	set_bit(I915_WEDGED, &i915->gpu_error.flags);
> +	wake_up_all(&i915->gpu_error.reset_queue);
>  }
>  
>  bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index b100b38f1dd2..ef78a85cb845 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>  	switch (state) {
>  	case FENCE_COMPLETE:
>  		trace_i915_gem_request_submit(request);
> +		rcu_read_lock();
>  		request->engine->submit_request(request);
> +		rcu_read_unlock();
>  		break;
>  
>  	case FENCE_FREE:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> index 78b9f811707f..a999161e8db1 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
>  	}
>  	i915_gem_request_get(vip);
>  	i915_add_request(vip);
> +	rcu_read_lock();
>  	request->engine->submit_request(request);
> +	rcu_read_unlock();
>  
>  	mutex_unlock(&i915->drm.struct_mutex);

Yes, this is a correct and good replacement; however, you said:

> Chris said parts of the reasons for going with stop_machine() was that
> it's no overhead for the fast-path. But these callbacks use irqsave
> spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.

This means that you could simply do synchronize_sched() without the
addition of rcu_read_lock()s and still be fine.


* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-06 10:12       ` Thomas Gleixner
@ 2017-10-06 11:12         ` Peter Zijlstra
  -1 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2017-10-06 11:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Chris Wilson, Daniel Vetter, Intel Graphics Development, LKML,
	Mika Kuoppala, Marta Lofstedt, Daniel Vetter

On Fri, Oct 06, 2017 at 12:12:41PM +0200, Thomas Gleixner wrote:
> So if you don't have a compelling technical reason to
> use it, then it _is_ the wrong tool.

This. stop_machine() effectively takes down _all_ CPUs for the duration
of your callback. That is something you really should avoid at pretty
much any cost.


* Re: [Intel-gfx] [PATCH 1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock
  2017-10-06  9:06 ` Daniel Vetter
@ 2017-10-06 11:34   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2017-10-06 11:34 UTC (permalink / raw)
  To: Daniel Vetter, Intel Graphics Development
  Cc: Peter Zijlstra, LKML, Tejun Heo, Daniel Vetter, Thomas Gleixner,
	Sasha Levin


On 06/10/2017 10:06, Daniel Vetter wrote:
> 4.14-rc1 gained the fancy new cross-release support in lockdep, which
> seems to have uncovered a few more rules about what is allowed and
> isn't.
> 
> This one here seems to indicate that allocating a work-queue while
> holding mmap_sem is a no-go, so let's try to preallocate it.
> 
> Of course another way to break this chain would be somewhere in the
> cpu hotplug code, since this isn't the only trace we're finding now
> which goes through msr_create_device.
> 
> Full lockdep splat:
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.14.0-rc1-CI-CI_DRM_3118+ #1 Tainted: G     U
> ------------------------------------------------------
> prime_mmap/1551 is trying to acquire lock:
>   (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109dbb7>] apply_workqueue_attrs+0x17/0x50
> 
> but task is already holding lock:
>   (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #6 (&dev_priv->mm_lock){+.+.}:
>         __lock_acquire+0x1420/0x15e0
>         lock_acquire+0xb0/0x200
>         __mutex_lock+0x86/0x9b0
>         mutex_lock_nested+0x1b/0x20
>         i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
>         i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>         drm_ioctl_kernel+0x69/0xb0
>         drm_ioctl+0x2f9/0x3d0
>         do_vfs_ioctl+0x94/0x670
>         SyS_ioctl+0x41/0x70
>         entry_SYSCALL_64_fastpath+0x1c/0xb1
> 
> -> #5 (&mm->mmap_sem){++++}:
>         __lock_acquire+0x1420/0x15e0
>         lock_acquire+0xb0/0x200
>         __might_fault+0x68/0x90
>         _copy_to_user+0x23/0x70
>         filldir+0xa5/0x120
>         dcache_readdir+0xf9/0x170
>         iterate_dir+0x69/0x1a0
>         SyS_getdents+0xa5/0x140
>         entry_SYSCALL_64_fastpath+0x1c/0xb1
> 
> -> #4 (&sb->s_type->i_mutex_key#5){++++}:
>         down_write+0x3b/0x70
>         handle_create+0xcb/0x1e0
>         devtmpfsd+0x139/0x180
>         kthread+0x152/0x190
>         ret_from_fork+0x27/0x40
> 
> -> #3 ((complete)&req.done){+.+.}:
>         __lock_acquire+0x1420/0x15e0
>         lock_acquire+0xb0/0x200
>         wait_for_common+0x58/0x210
>         wait_for_completion+0x1d/0x20
>         devtmpfs_create_node+0x13d/0x160
>         device_add+0x5eb/0x620
>         device_create_groups_vargs+0xe0/0xf0
>         device_create+0x3a/0x40
>         msr_device_create+0x2b/0x40
>         cpuhp_invoke_callback+0xa3/0x840
>         cpuhp_thread_fun+0x7a/0x150
>         smpboot_thread_fn+0x18a/0x280
>         kthread+0x152/0x190
>         ret_from_fork+0x27/0x40
> 
> -> #2 (cpuhp_state){+.+.}:
>         __lock_acquire+0x1420/0x15e0
>         lock_acquire+0xb0/0x200
>         cpuhp_issue_call+0x10b/0x170
>         __cpuhp_setup_state_cpuslocked+0x134/0x2a0
>         __cpuhp_setup_state+0x46/0x60
>         page_writeback_init+0x43/0x67
>         pagecache_init+0x3d/0x42
>         start_kernel+0x3a8/0x3fc
>         x86_64_start_reservations+0x2a/0x2c
>         x86_64_start_kernel+0x6d/0x70
>         verify_cpu+0x0/0xfb
> 
> -> #1 (cpuhp_state_mutex){+.+.}:
>         __lock_acquire+0x1420/0x15e0
>         lock_acquire+0xb0/0x200
>         __mutex_lock+0x86/0x9b0
>         mutex_lock_nested+0x1b/0x20
>         __cpuhp_setup_state_cpuslocked+0x52/0x2a0
>         __cpuhp_setup_state+0x46/0x60
>         page_alloc_init+0x28/0x30
>         start_kernel+0x145/0x3fc
>         x86_64_start_reservations+0x2a/0x2c
>         x86_64_start_kernel+0x6d/0x70
>         verify_cpu+0x0/0xfb
> 
> -> #0 (cpu_hotplug_lock.rw_sem){++++}:
>         check_prev_add+0x430/0x840
>         __lock_acquire+0x1420/0x15e0
>         lock_acquire+0xb0/0x200
>         cpus_read_lock+0x3d/0xb0
>         apply_workqueue_attrs+0x17/0x50
>         __alloc_workqueue_key+0x1d8/0x4d9
>         i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
>         i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>         drm_ioctl_kernel+0x69/0xb0
>         drm_ioctl+0x2f9/0x3d0
>         do_vfs_ioctl+0x94/0x670
>         SyS_ioctl+0x41/0x70
>         entry_SYSCALL_64_fastpath+0x1c/0xb1
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev_priv->mm_lock
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&dev_priv->mm_lock);
>                                 lock(&mm->mmap_sem);
>                                 lock(&dev_priv->mm_lock);
>    lock(cpu_hotplug_lock.rw_sem);
> 
>   *** DEADLOCK ***
> 
> 2 locks held by prime_mmap/1551:
>   #0:  (&mm->mmap_sem){++++}, at: [<ffffffffa01a7b18>] i915_gem_userptr_init__mmu_notifier+0x138/0x270 [i915]
>   #1:  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
> 
> stack backtrace:
> CPU: 4 PID: 1551 Comm: prime_mmap Tainted: G     U          4.14.0-rc1-CI-CI_DRM_3118+ #1
> Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
> Call Trace:
>   dump_stack+0x68/0x9f
>   print_circular_bug+0x235/0x3c0
>   ? lockdep_init_map_crosslock+0x20/0x20
>   check_prev_add+0x430/0x840
>   __lock_acquire+0x1420/0x15e0
>   ? __lock_acquire+0x1420/0x15e0
>   ? lockdep_init_map_crosslock+0x20/0x20
>   lock_acquire+0xb0/0x200
>   ? apply_workqueue_attrs+0x17/0x50
>   cpus_read_lock+0x3d/0xb0
>   ? apply_workqueue_attrs+0x17/0x50
>   apply_workqueue_attrs+0x17/0x50
>   __alloc_workqueue_key+0x1d8/0x4d9
>   ? __lockdep_init_map+0x57/0x1c0
>   i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
>   i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>   ? i915_gem_userptr_release+0x140/0x140 [i915]
>   drm_ioctl_kernel+0x69/0xb0
>   drm_ioctl+0x2f9/0x3d0
>   ? i915_gem_userptr_release+0x140/0x140 [i915]
>   ? __do_page_fault+0x2a4/0x570
>   do_vfs_ioctl+0x94/0x670
>   ? entry_SYSCALL_64_fastpath+0x5/0xb1
>   ? __this_cpu_preempt_check+0x13/0x20
>   ? trace_hardirqs_on_caller+0xe3/0x1b0
>   SyS_ioctl+0x41/0x70
>   entry_SYSCALL_64_fastpath+0x1c/0xb1
> RIP: 0033:0x7fbb83c39587
> RSP: 002b:00007fff188dc228 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: ffffffff81492963 RCX: 00007fbb83c39587
> RDX: 00007fff188dc260 RSI: 00000000c0186473 RDI: 0000000000000003
> RBP: ffffc90001487f88 R08: 0000000000000000 R09: 00007fff188dc2ac
> R10: 00007fbb83efcb58 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000003 R14: 00000000c0186473 R15: 00007fff188dc2ac
>   ? __this_cpu_preempt_check+0x13/0x20
> 
> v2: Set ret correctly when we raced with another thread.
> 
> v3: Use Chris' diff. Attach the right lockdep splat.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sasha Levin <alexander.levin@verizon.com>
> Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> Cc: Tejun Heo <tj@kernel.org>
> References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_userptr.c | 35 +++++++++++++++++++--------------
>   1 file changed, 20 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 2d4996de7331..f9b3406401af 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -164,7 +164,6 @@ static struct i915_mmu_notifier *
>   i915_mmu_notifier_create(struct mm_struct *mm)
>   {
>   	struct i915_mmu_notifier *mn;
> -	int ret;
>   
>   	mn = kmalloc(sizeof(*mn), GFP_KERNEL);
>   	if (mn == NULL)
> @@ -179,14 +178,6 @@ i915_mmu_notifier_create(struct mm_struct *mm)
>   		return ERR_PTR(-ENOMEM);
>   	}
>   
> -	 /* Protected by mmap_sem (write-lock) */
> -	ret = __mmu_notifier_register(&mn->mn, mm);
> -	if (ret) {
> -		destroy_workqueue(mn->wq);
> -		kfree(mn);
> -		return ERR_PTR(ret);
> -	}
> -
>   	return mn;
>   }
>   
> @@ -210,23 +201,37 @@ i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
>   static struct i915_mmu_notifier *
>   i915_mmu_notifier_find(struct i915_mm_struct *mm)
>   {
> -	struct i915_mmu_notifier *mn = mm->mn;
> +	struct i915_mmu_notifier *mn;
> +	int err;
>   
>   	mn = mm->mn;
>   	if (mn)
>   		return mn;
>   
> +	mn = i915_mmu_notifier_create(mm->mm);
> +	if (IS_ERR(mn))
> +		return mn;

Strictly speaking we don't want to fail just yet, only if we actually 
needed a new notifier and we failed to create it.

> +
> +	err = 0;
>   	down_write(&mm->mm->mmap_sem);
>   	mutex_lock(&mm->i915->mm_lock);
> -	if ((mn = mm->mn) == NULL) {
> -		mn = i915_mmu_notifier_create(mm->mm);
> -		if (!IS_ERR(mn))
> -			mm->mn = mn;
> +	if (mm->mn == NULL) {
> +		/* Protected by mmap_sem (write-lock) */
> +		err = __mmu_notifier_register(&mn->mn, mm->mm);
> +		if (!err) {
> +			/* Protected by mm_lock */
> +			mm->mn = fetch_and_zero(&mn);
> +		}
>   	}
>   	mutex_unlock(&mm->i915->mm_lock);
>   	up_write(&mm->mm->mmap_sem);
>   
> -	return mn;
> +	if (mn) {
> +		destroy_workqueue(mn->wq);
> +		kfree(mn);
> +	}
> +
> +	return err ? ERR_PTR(err) : mm->mn;
>   }
>   
>   static int
> 

Otherwise looks good to me.

I would also put a note in the commit message on how working around the 
locking issue also benefits performance, by moving the allocation step 
outside of mmap_sem.

Regards,

Tvrtko


>                                 lock(&mm->mmap_sem);
>                                 lock(&dev_priv->mm_lock);
>    lock(cpu_hotplug_lock.rw_sem);
> 
>   *** DEADLOCK ***
> 
> 2 locks held by prime_mmap/1551:
>   #0:  (&mm->mmap_sem){++++}, at: [<ffffffffa01a7b18>] i915_gem_userptr_init__mmu_notifier+0x138/0x270 [i915]
>   #1:  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
> 
> stack backtrace:
> CPU: 4 PID: 1551 Comm: prime_mmap Tainted: G     U          4.14.0-rc1-CI-CI_DRM_3118+ #1
> Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
> Call Trace:
>   dump_stack+0x68/0x9f
>   print_circular_bug+0x235/0x3c0
>   ? lockdep_init_map_crosslock+0x20/0x20
>   check_prev_add+0x430/0x840
>   __lock_acquire+0x1420/0x15e0
>   ? __lock_acquire+0x1420/0x15e0
>   ? lockdep_init_map_crosslock+0x20/0x20
>   lock_acquire+0xb0/0x200
>   ? apply_workqueue_attrs+0x17/0x50
>   cpus_read_lock+0x3d/0xb0
>   ? apply_workqueue_attrs+0x17/0x50
>   apply_workqueue_attrs+0x17/0x50
>   __alloc_workqueue_key+0x1d8/0x4d9
>   ? __lockdep_init_map+0x57/0x1c0
>   i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
>   i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>   ? i915_gem_userptr_release+0x140/0x140 [i915]
>   drm_ioctl_kernel+0x69/0xb0
>   drm_ioctl+0x2f9/0x3d0
>   ? i915_gem_userptr_release+0x140/0x140 [i915]
>   ? __do_page_fault+0x2a4/0x570
>   do_vfs_ioctl+0x94/0x670
>   ? entry_SYSCALL_64_fastpath+0x5/0xb1
>   ? __this_cpu_preempt_check+0x13/0x20
>   ? trace_hardirqs_on_caller+0xe3/0x1b0
>   SyS_ioctl+0x41/0x70
>   entry_SYSCALL_64_fastpath+0x1c/0xb1
> RIP: 0033:0x7fbb83c39587
> RSP: 002b:00007fff188dc228 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: ffffffff81492963 RCX: 00007fbb83c39587
> RDX: 00007fff188dc260 RSI: 00000000c0186473 RDI: 0000000000000003
> RBP: ffffc90001487f88 R08: 0000000000000000 R09: 00007fff188dc2ac
> R10: 00007fbb83efcb58 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000003 R14: 00000000c0186473 R15: 00007fff188dc2ac
>   ? __this_cpu_preempt_check+0x13/0x20
> 
> v2: Set ret correctly when we raced with another thread.
> 
> v3: Use Chris' diff. Attach the right lockdep splat.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sasha Levin <alexander.levin@verizon.com>
> Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> Cc: Tejun Heo <tj@kernel.org>
> References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_userptr.c | 35 +++++++++++++++++++--------------
>   1 file changed, 20 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 2d4996de7331..f9b3406401af 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -164,7 +164,6 @@ static struct i915_mmu_notifier *
>   i915_mmu_notifier_create(struct mm_struct *mm)
>   {
>   	struct i915_mmu_notifier *mn;
> -	int ret;
>   
>   	mn = kmalloc(sizeof(*mn), GFP_KERNEL);
>   	if (mn == NULL)
> @@ -179,14 +178,6 @@ i915_mmu_notifier_create(struct mm_struct *mm)
>   		return ERR_PTR(-ENOMEM);
>   	}
>   
> -	 /* Protected by mmap_sem (write-lock) */
> -	ret = __mmu_notifier_register(&mn->mn, mm);
> -	if (ret) {
> -		destroy_workqueue(mn->wq);
> -		kfree(mn);
> -		return ERR_PTR(ret);
> -	}
> -
>   	return mn;
>   }
>   
> @@ -210,23 +201,37 @@ i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
>   static struct i915_mmu_notifier *
>   i915_mmu_notifier_find(struct i915_mm_struct *mm)
>   {
> -	struct i915_mmu_notifier *mn = mm->mn;
> +	struct i915_mmu_notifier *mn;
> +	int err;
>   
>   	mn = mm->mn;
>   	if (mn)
>   		return mn;
>   
> +	mn = i915_mmu_notifier_create(mm->mm);
> +	if (IS_ERR(mn))
> +		return mn;

Strictly speaking we don't want to fail just yet, only if we actually 
needed a new notifier and failed to create it.

> +
> +	err = 0;
>   	down_write(&mm->mm->mmap_sem);
>   	mutex_lock(&mm->i915->mm_lock);
> -	if ((mn = mm->mn) == NULL) {
> -		mn = i915_mmu_notifier_create(mm->mm);
> -		if (!IS_ERR(mn))
> -			mm->mn = mn;
> +	if (mm->mn == NULL) {
> +		/* Protected by mmap_sem (write-lock) */
> +		err = __mmu_notifier_register(&mn->mn, mm->mm);
> +		if (!err) {
> +			/* Protected by mm_lock */
> +			mm->mn = fetch_and_zero(&mn);
> +		}
>   	}
>   	mutex_unlock(&mm->i915->mm_lock);
>   	up_write(&mm->mm->mmap_sem);
>   
> -	return mn;
> +	if (mn) {
> +		destroy_workqueue(mn->wq);
> +		kfree(mn);
> +	}
> +
> +	return err ? ERR_PTR(err) : mm->mn;
>   }
>   
>   static int
> 

Otherwise looks good to me.

I would also put a note in the commit message on how working around the 
locking issue also benefits performance, since it moves the allocation 
step outside the mmap_sem.
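The race-aware preallocation the patch implements can be sketched roughly like this (a userspace pthreads stand-in, not the i915 code itself: all names are made up for illustration, and __mmu_notifier_register() is reduced to setting a flag):

```c
/* Sketch of the "allocate outside the lock, publish or discard under
 * the lock" pattern.  Hypothetical names; mm_lock stands in for
 * dev_priv->mm_lock and shared_mn for mm->mn. */
#include <pthread.h>
#include <stdlib.h>

struct notifier { int registered; };

static pthread_mutex_t mm_lock = PTHREAD_MUTEX_INITIALIZER;
static struct notifier *shared_mn;      /* mm->mn in the patch */

static struct notifier *notifier_find(void)
{
	struct notifier *mn = shared_mn;

	if (mn)
		return mn;

	/* Preallocate with no locks held -- this is what breaks the
	 * mmap_sem -> cpu_hotplug_lock dependency in the splat. */
	mn = malloc(sizeof(*mn));
	if (!mn)
		return NULL;

	pthread_mutex_lock(&mm_lock);
	if (!shared_mn) {
		mn->registered = 1;     /* stands in for __mmu_notifier_register() */
		shared_mn = mn;
		mn = NULL;              /* ownership transferred */
	}
	pthread_mutex_unlock(&mm_lock);

	free(mn);                       /* NULL if we won the race */
	return shared_mn;
}
```

If another thread raced us and installed its notifier first, our preallocated copy is simply freed after dropping the lock, matching the destroy_workqueue()/kfree() tail of the patch.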

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 35+ messages in thread

* ✗ Fi.CI.BAT: warning for series starting with [1/2] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock
  2017-10-06  9:06 ` Daniel Vetter
                   ` (4 preceding siblings ...)
  (?)
@ 2017-10-06 12:50 ` Patchwork
  -1 siblings, 0 replies; 35+ messages in thread
From: Patchwork @ 2017-10-06 12:50 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/2] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock
URL   : https://patchwork.freedesktop.org/series/31476/
State : warning

== Summary ==

Series 31476v1 series starting with [1/2] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock
https://patchwork.freedesktop.org/api/1.0/series/31476/revisions/1/mbox/

Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-legacy:
                pass       -> DMESG-WARN (fi-bxt-dsi)
Test kms_frontbuffer_tracking:
        Subgroup basic:
                dmesg-warn -> PASS       (fi-bdw-5557u) fdo#102473
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                pass       -> DMESG-WARN (fi-byt-n2820) fdo#101705

fdo#102473 https://bugs.freedesktop.org/show_bug.cgi?id=102473
fdo#101705 https://bugs.freedesktop.org/show_bug.cgi?id=101705

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:456s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:471s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:395s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:576s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:287s
fi-bxt-dsi       total:289  pass:258  dwarn:1   dfail:0   fail:0   skip:30  time:526s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:524s
fi-byt-j1900     total:289  pass:253  dwarn:1   dfail:0   fail:0   skip:35  time:543s
fi-byt-n2820     total:289  pass:249  dwarn:1   dfail:0   fail:0   skip:39  time:528s
fi-cfl-s         total:289  pass:256  dwarn:1   dfail:0   fail:0   skip:32  time:567s
fi-cnl-y         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:625s
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:437s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:594s
fi-hsw-4770      total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:440s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:417s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:509s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:475s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:498s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:578s
fi-kbl-7567u     total:289  pass:265  dwarn:4   dfail:0   fail:0   skip:20  time:493s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:595s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:660s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:467s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:660s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:536s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:518s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:466s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:584s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:429s

f0ca28b3597bea929c1e58e05eb49a75881d9502 drm-tip: 2017y-10m-06d-11h-12m-22s UTC integration manifest
d3d5abd754a3 drm/i915: Use rcu instead of stop_machine in set_wedged
0b4e7df35196 drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5926/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-06 10:12       ` Thomas Gleixner
  (?)
  (?)
@ 2017-10-06 14:12       ` Daniel Vetter
  -1 siblings, 0 replies; 35+ messages in thread
From: Daniel Vetter @ 2017-10-06 14:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Chris Wilson, Daniel Vetter, Intel Graphics Development, LKML,
	Mika Kuoppala, Marta Lofstedt, Daniel Vetter, Peter Zijlstra

On Fri, Oct 06, 2017 at 12:12:41PM +0200, Thomas Gleixner wrote:
> On Fri, 6 Oct 2017, Chris Wilson wrote:
> > Quoting Daniel Vetter (2017-10-06 10:06:37)
> > > stop_machine is not really a locking primitive we should use, except
> > > when the hw folks tell us the hw is broken and that's the only way to
> > > work around it.
> > > 
> > > This patch tries to address the locking abuse of stop_machine() from
> > > 
> > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date:   Tue Nov 22 14:41:21 2016 +0000
> > > 
> > >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > > 
> > > Chris said parts of the reasons for going with stop_machine() was that
> > > it's no overhead for the fast-path. But these callbacks use irqsave
> > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> > 
> > I still want a discussion of the reason why keeping the normal path clean
> > and why an alternative is sought, here. That design leads into vv
> 
> stop_machine() is the least resort when serialization problems cannot be
> solved otherwise. We try to avoid it where ever we can. While on the call
> site it looks simple, it's invasive in terms of locking as shown by the
> lockdep splat and it's imposing latencies and other side effects on all
> CPUs in the system. So if you don't have a compelling technical reason to
> use it, then it _is_ the wrong tool.
> 
> As Daniel has shown it's not required, so there is no technical reason why
> stop_machine() has to be used here.

Well I'm not sure yet whether my fix is actually correct :-)

But imo there's a bunch more reason why stop_machine is uncool, beyond
just the "it's a huge shotgun which doesn't play well with anything else"
aspect:

- What we actually seem to want is to make sure that all the
  engine->submit_request have completed, which happen to all run in
  hardirq context. It's an artifact of stop_machine that it completes all
  hardirq handlers, but afaiui stop_machine is really just aimed at
  getting all cpus to execute a specific well know loop (so that your
  callback can start patching .text and other evil stuff). If we move our
  callback into a thread that gets preempted, we have a problem.

- As a consequence, no lockdep annotations for the locking we actually
  want. And since this is for gpu hang recovery (something relatively rare
  that just _has_ to work) we really need all the support from all the
  debug tools we can get to catch possible issues.

- Another consequence is that the read side critical sections aren't
  annotated in the code. That makes it ever so more likely that a redesign
  moves them out of hardirq context and breaks it all.

- Not relevant here (I think), but stop_machine doesn't remove the need
  for read-side (compiler) barriers. In other cases we might still need
  to sprinkle READ_ONCE all over to make sure gcc doesn't reload values
  and create races that way.

rcu has all these bits covered, is maintained by very smart people, and
the overhead is somewhere between 0 and a cacheline access that we touch
anyway (preempt_count is also wrangled by our spinlocks in all the
callbacks). No way this will ever show up against all the mmio writes the
callback does anyway.
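The pointer-swap-plus-barrier scheme described above can be sketched in plain C11 (a single-threaded userspace stand-in, not the i915 code: rcu_read_lock()/synchronize_rcu() are modelled with an explicit reader counter, and all names here are hypothetical):

```c
/* Sketch: swap engine->submit_request to a nop handler, then wait for
 * any in-flight callers of the old handler to drain before touching
 * the seqno state.  The reader counter crudely models what the RCU
 * read-side critical section plus synchronize_rcu() guarantee. */
#include <stdatomic.h>

typedef int (*submit_fn)(int);

static int real_submit(int x) { return x * 2; }       /* does real work */
static int nop_submit(int x)  { (void)x; return -1; } /* wedged: fail fast */

static _Atomic(submit_fn) submit = real_submit;
static atomic_int readers;                  /* models rcu_read_lock() depth */

static int do_submit(int x)                 /* models the read side */
{
	atomic_fetch_add(&readers, 1);      /* rcu_read_lock() */
	int ret = atomic_load(&submit)(x);
	atomic_fetch_sub(&readers, 1);      /* rcu_read_unlock() */
	return ret;
}

static void set_wedged(void)
{
	atomic_store(&submit, nop_submit);  /* swap all pointers first */
	while (atomic_load(&readers))       /* crude synchronize_rcu(): */
		;                           /* spin until readers drain */
}
```

After set_wedged() returns, no caller can still be running real_submit(), which is the "huge barrier" the commit message asks synchronize_rcu() to provide.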
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-06 11:03     ` Chris Wilson
  (?)
@ 2017-10-06 14:20     ` Daniel Vetter
  2017-10-06 17:29       ` Chris Wilson
  2017-10-06 17:37       ` Chris Wilson
  -1 siblings, 2 replies; 35+ messages in thread
From: Daniel Vetter @ 2017-10-06 14:20 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Daniel Vetter, Intel Graphics Development, LKML, Mika Kuoppala,
	Thomas Gleixner, Marta Lofstedt, Daniel Vetter

On Fri, Oct 06, 2017 at 12:03:49PM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-10-06 10:06:37)
> > stop_machine is not really a locking primitive we should use, except
> > when the hw folks tell us the hw is broken and that's the only way to
> > work around it.
> > 
> > This patch tries to address the locking abuse of stop_machine() from
> > 
> > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Tue Nov 22 14:41:21 2016 +0000
> > 
> >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > 
> > Chris said parts of the reasons for going with stop_machine() was that
> > it's no overhead for the fast-path. But these callbacks use irqsave
> > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> > 
> > To stay as close as possible to the stop_machine semantics we first
> > update all the submit function pointers to the nop handler, then call
> > synchronize_rcu() to make sure no new requests can be submitted. This
> > should give us exactly the huge barrier we want.
> > 
> > I pondered whether we should annotate engine->submit_request as __rcu
> > and use rcu_assign_pointer and rcu_dereference on it. But the reason
> > behind those is to make sure the compiler/cpu barriers are there for
> > when you have an actual data structure you point at, to make sure all
> > the writes are seen correctly on the read side. But we just have a
> > function pointer, and .text isn't changed, so no need for these
> > barriers and hence no need for annotations.
> > 
> > > This should fix the following lockdep splat:
> > 
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
> > ------------------------------------------------------
> > kworker/3:4/562 is trying to acquire lock:
> >  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
> > 
> > but task is already holding lock:
> >  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > 
> > which lock already depends on the new lock.
> > 
> > the existing dependency chain (in reverse order) is:
> > 
> > -> #6 (&dev->struct_mutex){+.+.}:
> >        __lock_acquire+0x1420/0x15e0
> >        lock_acquire+0xb0/0x200
> >        __mutex_lock+0x86/0x9b0
> >        mutex_lock_interruptible_nested+0x1b/0x20
> >        i915_mutex_lock_interruptible+0x51/0x130 [i915]
> >        i915_gem_fault+0x209/0x650 [i915]
> >        __do_fault+0x1e/0x80
> >        __handle_mm_fault+0xa08/0xed0
> >        handle_mm_fault+0x156/0x300
> >        __do_page_fault+0x2c5/0x570
> >        do_page_fault+0x28/0x250
> >        page_fault+0x22/0x30
> > 
> > -> #5 (&mm->mmap_sem){++++}:
> >        __lock_acquire+0x1420/0x15e0
> >        lock_acquire+0xb0/0x200
> >        __might_fault+0x68/0x90
> >        _copy_to_user+0x23/0x70
> >        filldir+0xa5/0x120
> >        dcache_readdir+0xf9/0x170
> >        iterate_dir+0x69/0x1a0
> >        SyS_getdents+0xa5/0x140
> >        entry_SYSCALL_64_fastpath+0x1c/0xb1
> > 
> > -> #4 (&sb->s_type->i_mutex_key#5){++++}:
> >        down_write+0x3b/0x70
> >        handle_create+0xcb/0x1e0
> >        devtmpfsd+0x139/0x180
> >        kthread+0x152/0x190
> >        ret_from_fork+0x27/0x40
> > 
> > -> #3 ((complete)&req.done){+.+.}:
> >        __lock_acquire+0x1420/0x15e0
> >        lock_acquire+0xb0/0x200
> >        wait_for_common+0x58/0x210
> >        wait_for_completion+0x1d/0x20
> >        devtmpfs_create_node+0x13d/0x160
> >        device_add+0x5eb/0x620
> >        device_create_groups_vargs+0xe0/0xf0
> >        device_create+0x3a/0x40
> >        msr_device_create+0x2b/0x40
> >        cpuhp_invoke_callback+0xc9/0xbf0
> >        cpuhp_thread_fun+0x17b/0x240
> >        smpboot_thread_fn+0x18a/0x280
> >        kthread+0x152/0x190
> >        ret_from_fork+0x27/0x40
> > 
> > -> #2 (cpuhp_state-up){+.+.}:
> >        __lock_acquire+0x1420/0x15e0
> >        lock_acquire+0xb0/0x200
> >        cpuhp_issue_call+0x133/0x1c0
> >        __cpuhp_setup_state_cpuslocked+0x139/0x2a0
> >        __cpuhp_setup_state+0x46/0x60
> >        page_writeback_init+0x43/0x67
> >        pagecache_init+0x3d/0x42
> >        start_kernel+0x3a8/0x3fc
> >        x86_64_start_reservations+0x2a/0x2c
> >        x86_64_start_kernel+0x6d/0x70
> >        verify_cpu+0x0/0xfb
> > 
> > -> #1 (cpuhp_state_mutex){+.+.}:
> >        __lock_acquire+0x1420/0x15e0
> >        lock_acquire+0xb0/0x200
> >        __mutex_lock+0x86/0x9b0
> >        mutex_lock_nested+0x1b/0x20
> >        __cpuhp_setup_state_cpuslocked+0x53/0x2a0
> >        __cpuhp_setup_state+0x46/0x60
> >        page_alloc_init+0x28/0x30
> >        start_kernel+0x145/0x3fc
> >        x86_64_start_reservations+0x2a/0x2c
> >        x86_64_start_kernel+0x6d/0x70
> >        verify_cpu+0x0/0xfb
> > 
> > -> #0 (cpu_hotplug_lock.rw_sem){++++}:
> >        check_prev_add+0x430/0x840
> >        __lock_acquire+0x1420/0x15e0
> >        lock_acquire+0xb0/0x200
> >        cpus_read_lock+0x3d/0xb0
> >        stop_machine+0x1c/0x40
> >        i915_gem_set_wedged+0x1a/0x20 [i915]
> >        i915_reset+0xb9/0x230 [i915]
> >        i915_reset_device+0x1f6/0x260 [i915]
> >        i915_handle_error+0x2d8/0x430 [i915]
> >        hangcheck_declare_hang+0xd3/0xf0 [i915]
> >        i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> >        process_one_work+0x233/0x660
> >        worker_thread+0x4e/0x3b0
> >        kthread+0x152/0x190
> >        ret_from_fork+0x27/0x40
> > 
> > other info that might help us debug this:
> > 
> > Chain exists of:
> >   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
> > 
> >  Possible unsafe locking scenario:
> > 
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&dev->struct_mutex);
> >                                lock(&mm->mmap_sem);
> >                                lock(&dev->struct_mutex);
> >   lock(cpu_hotplug_lock.rw_sem);
> > 
> >  *** DEADLOCK ***
> > 
> > 3 locks held by kworker/3:4/562:
> >  #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> >  #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> >  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > 
> > stack backtrace:
> > CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
> > Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
> > Workqueue: events_long i915_hangcheck_elapsed [i915]
> > Call Trace:
> >  dump_stack+0x68/0x9f
> >  print_circular_bug+0x235/0x3c0
> >  ? lockdep_init_map_crosslock+0x20/0x20
> >  check_prev_add+0x430/0x840
> >  ? irq_work_queue+0x86/0xe0
> >  ? wake_up_klogd+0x53/0x70
> >  __lock_acquire+0x1420/0x15e0
> >  ? __lock_acquire+0x1420/0x15e0
> >  ? lockdep_init_map_crosslock+0x20/0x20
> >  lock_acquire+0xb0/0x200
> >  ? stop_machine+0x1c/0x40
> >  ? i915_gem_object_truncate+0x50/0x50 [i915]
> >  cpus_read_lock+0x3d/0xb0
> >  ? stop_machine+0x1c/0x40
> >  stop_machine+0x1c/0x40
> >  i915_gem_set_wedged+0x1a/0x20 [i915]
> >  i915_reset+0xb9/0x230 [i915]
> >  i915_reset_device+0x1f6/0x260 [i915]
> >  ? gen8_gt_irq_ack+0x170/0x170 [i915]
> >  ? work_on_cpu_safe+0x60/0x60
> >  i915_handle_error+0x2d8/0x430 [i915]
> >  ? vsnprintf+0xd1/0x4b0
> >  ? scnprintf+0x3a/0x70
> >  hangcheck_declare_hang+0xd3/0xf0 [i915]
> >  ? intel_runtime_pm_put+0x56/0xa0 [i915]
> >  i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> >  process_one_work+0x233/0x660
> >  worker_thread+0x4e/0x3b0
> >  kthread+0x152/0x190
> >  ? process_one_work+0x660/0x660
> >  ? kthread_create_on_node+0x40/0x40
> >  ret_from_fork+0x27/0x40
> > Setting dangerous option reset - tainting kernel
> > i915 0000:00:02.0: Resetting chip after gpu hang
> > Setting dangerous option reset - tainting kernel
> > i915 0000:00:02.0: Resetting chip after gpu hang
> > 
> > v2: Have 1 global synchronize_rcu() barrier across all engines, and
> > improve commit message.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
> >  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
> >  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
> >  3 files changed, 16 insertions(+), 19 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index ab8c6946fea4..e79a6ca60265 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
> >         intel_engine_init_global_seqno(request->engine, request->global_seqno);
> >  }
> >  
> > -static void engine_set_wedged(struct intel_engine_cs *engine)
> > +static void engine_complete_requests(struct intel_engine_cs *engine)
> >  {
> > -       /* We need to be sure that no thread is running the old callback as
> > -        * we install the nop handler (otherwise we would submit a request
> > -        * to hardware that will never complete). In order to prevent this
> > -        * race, we wait until the machine is idle before making the swap
> > -        * (using stop_machine()).
> > -        */
> > -       engine->submit_request = nop_submit_request;
> > -
> >         /* Mark all executing requests as skipped */
> >         engine->cancel_requests(engine);
> 
> How are we planning to serialise the intel_engine_init_global_seqno()
> here with the in-flight nop_submit? With sufficient thrust we will get a
> stale breadcrumb and an incomplete request.

Yeah that part looks indeed fishy. Well the entire "let the nop handler
fake-complete requests" logic is something I don't really understand. I
guess there's an exclusive relationship between requests handled directly
(and cancelled in engine->cancel_request) and requests with external
dma_fence dependencies.

But then I'm not really seeing what I'm changing, since even with the stop
machine you might end up with a bunch of requests depending upon external
fences, which then all complete at roughly the same time and race multiple
calls to intel_engine_init_global_seqno with one another.

With the fake submission, do we really need to call intel_engine_init_global_seqno?

So yeah, no idea, but pretty sure I didn't make it worse.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH 1/2] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock
  2017-10-06 11:34   ` Tvrtko Ursulin
@ 2017-10-06 14:23     ` Daniel Vetter
  -1 siblings, 0 replies; 35+ messages in thread
From: Daniel Vetter @ 2017-10-06 14:23 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Daniel Vetter, Intel Graphics Development, Peter Zijlstra, LKML,
	Tejun Heo, Daniel Vetter, Thomas Gleixner, Sasha Levin

On Fri, Oct 06, 2017 at 12:34:02PM +0100, Tvrtko Ursulin wrote:
> 
> On 06/10/2017 10:06, Daniel Vetter wrote:
> > 4.14-rc1 gained the fancy new cross-release support in lockdep, which
> > seems to have uncovered a few more rules about what is allowed and
> > isn't.
> > 
> > This one here seems to indicate that allocating a work-queue while
> > holding mmap_sem is a no-go, so let's try to preallocate it.
> > 
> > Of course another way to break this chain would be somewhere in the
> > cpu hotplug code, since this isn't the only trace we're finding now
> > which goes through msr_create_device.
> > 
> > Full lockdep splat:
> > 
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.14.0-rc1-CI-CI_DRM_3118+ #1 Tainted: G     U
> > ------------------------------------------------------
> > prime_mmap/1551 is trying to acquire lock:
> >   (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109dbb7>] apply_workqueue_attrs+0x17/0x50
> > 
> > but task is already holding lock:
> >   (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
> > 
> > which lock already depends on the new lock.
> > 
> > the existing dependency chain (in reverse order) is:
> > 
> > -> #6 (&dev_priv->mm_lock){+.+.}:
> >         __lock_acquire+0x1420/0x15e0
> >         lock_acquire+0xb0/0x200
> >         __mutex_lock+0x86/0x9b0
> >         mutex_lock_nested+0x1b/0x20
> >         i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
> >         i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
> >         drm_ioctl_kernel+0x69/0xb0
> >         drm_ioctl+0x2f9/0x3d0
> >         do_vfs_ioctl+0x94/0x670
> >         SyS_ioctl+0x41/0x70
> >         entry_SYSCALL_64_fastpath+0x1c/0xb1
> > 
> > -> #5 (&mm->mmap_sem){++++}:
> >         __lock_acquire+0x1420/0x15e0
> >         lock_acquire+0xb0/0x200
> >         __might_fault+0x68/0x90
> >         _copy_to_user+0x23/0x70
> >         filldir+0xa5/0x120
> >         dcache_readdir+0xf9/0x170
> >         iterate_dir+0x69/0x1a0
> >         SyS_getdents+0xa5/0x140
> >         entry_SYSCALL_64_fastpath+0x1c/0xb1
> > 
> > -> #4 (&sb->s_type->i_mutex_key#5){++++}:
> >         down_write+0x3b/0x70
> >         handle_create+0xcb/0x1e0
> >         devtmpfsd+0x139/0x180
> >         kthread+0x152/0x190
> >         ret_from_fork+0x27/0x40
> > 
> > -> #3 ((complete)&req.done){+.+.}:
> >         __lock_acquire+0x1420/0x15e0
> >         lock_acquire+0xb0/0x200
> >         wait_for_common+0x58/0x210
> >         wait_for_completion+0x1d/0x20
> >         devtmpfs_create_node+0x13d/0x160
> >         device_add+0x5eb/0x620
> >         device_create_groups_vargs+0xe0/0xf0
> >         device_create+0x3a/0x40
> >         msr_device_create+0x2b/0x40
> >         cpuhp_invoke_callback+0xa3/0x840
> >         cpuhp_thread_fun+0x7a/0x150
> >         smpboot_thread_fn+0x18a/0x280
> >         kthread+0x152/0x190
> >         ret_from_fork+0x27/0x40
> > 
> > -> #2 (cpuhp_state){+.+.}:
> >         __lock_acquire+0x1420/0x15e0
> >         lock_acquire+0xb0/0x200
> >         cpuhp_issue_call+0x10b/0x170
> >         __cpuhp_setup_state_cpuslocked+0x134/0x2a0
> >         __cpuhp_setup_state+0x46/0x60
> >         page_writeback_init+0x43/0x67
> >         pagecache_init+0x3d/0x42
> >         start_kernel+0x3a8/0x3fc
> >         x86_64_start_reservations+0x2a/0x2c
> >         x86_64_start_kernel+0x6d/0x70
> >         verify_cpu+0x0/0xfb
> > 
> > -> #1 (cpuhp_state_mutex){+.+.}:
> >         __lock_acquire+0x1420/0x15e0
> >         lock_acquire+0xb0/0x200
> >         __mutex_lock+0x86/0x9b0
> >         mutex_lock_nested+0x1b/0x20
> >         __cpuhp_setup_state_cpuslocked+0x52/0x2a0
> >         __cpuhp_setup_state+0x46/0x60
> >         page_alloc_init+0x28/0x30
> >         start_kernel+0x145/0x3fc
> >         x86_64_start_reservations+0x2a/0x2c
> >         x86_64_start_kernel+0x6d/0x70
> >         verify_cpu+0x0/0xfb
> > 
> > -> #0 (cpu_hotplug_lock.rw_sem){++++}:
> >         check_prev_add+0x430/0x840
> >         __lock_acquire+0x1420/0x15e0
> >         lock_acquire+0xb0/0x200
> >         cpus_read_lock+0x3d/0xb0
> >         apply_workqueue_attrs+0x17/0x50
> >         __alloc_workqueue_key+0x1d8/0x4d9
> >         i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
> >         i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
> >         drm_ioctl_kernel+0x69/0xb0
> >         drm_ioctl+0x2f9/0x3d0
> >         do_vfs_ioctl+0x94/0x670
> >         SyS_ioctl+0x41/0x70
> >         entry_SYSCALL_64_fastpath+0x1c/0xb1
> > 
> > other info that might help us debug this:
> > 
> > Chain exists of:
> >    cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev_priv->mm_lock
> > 
> >   Possible unsafe locking scenario:
> > 
> >         CPU0                    CPU1
> >         ----                    ----
> >    lock(&dev_priv->mm_lock);
> >                                 lock(&mm->mmap_sem);
> >                                 lock(&dev_priv->mm_lock);
> >    lock(cpu_hotplug_lock.rw_sem);
> > 
> >   *** DEADLOCK ***
> > 
> > 2 locks held by prime_mmap/1551:
> >   #0:  (&mm->mmap_sem){++++}, at: [<ffffffffa01a7b18>] i915_gem_userptr_init__mmu_notifier+0x138/0x270 [i915]
> >   #1:  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
> > 
> > stack backtrace:
> > CPU: 4 PID: 1551 Comm: prime_mmap Tainted: G     U          4.14.0-rc1-CI-CI_DRM_3118+ #1
> > Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
> > Call Trace:
> >   dump_stack+0x68/0x9f
> >   print_circular_bug+0x235/0x3c0
> >   ? lockdep_init_map_crosslock+0x20/0x20
> >   check_prev_add+0x430/0x840
> >   __lock_acquire+0x1420/0x15e0
> >   ? __lock_acquire+0x1420/0x15e0
> >   ? lockdep_init_map_crosslock+0x20/0x20
> >   lock_acquire+0xb0/0x200
> >   ? apply_workqueue_attrs+0x17/0x50
> >   cpus_read_lock+0x3d/0xb0
> >   ? apply_workqueue_attrs+0x17/0x50
> >   apply_workqueue_attrs+0x17/0x50
> >   __alloc_workqueue_key+0x1d8/0x4d9
> >   ? __lockdep_init_map+0x57/0x1c0
> >   i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
> >   i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
> >   ? i915_gem_userptr_release+0x140/0x140 [i915]
> >   drm_ioctl_kernel+0x69/0xb0
> >   drm_ioctl+0x2f9/0x3d0
> >   ? i915_gem_userptr_release+0x140/0x140 [i915]
> >   ? __do_page_fault+0x2a4/0x570
> >   do_vfs_ioctl+0x94/0x670
> >   ? entry_SYSCALL_64_fastpath+0x5/0xb1
> >   ? __this_cpu_preempt_check+0x13/0x20
> >   ? trace_hardirqs_on_caller+0xe3/0x1b0
> >   SyS_ioctl+0x41/0x70
> >   entry_SYSCALL_64_fastpath+0x1c/0xb1
> > RIP: 0033:0x7fbb83c39587
> > RSP: 002b:00007fff188dc228 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > RAX: ffffffffffffffda RBX: ffffffff81492963 RCX: 00007fbb83c39587
> > RDX: 00007fff188dc260 RSI: 00000000c0186473 RDI: 0000000000000003
> > RBP: ffffc90001487f88 R08: 0000000000000000 R09: 00007fff188dc2ac
> > R10: 00007fbb83efcb58 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000000000000003 R14: 00000000c0186473 R15: 00007fff188dc2ac
> >   ? __this_cpu_preempt_check+0x13/0x20
> > 
> > v2: Set ret correctly when we raced with another thread.
> > 
> > v3: Use Chris' diff. Attach the right lockdep splat.
> > 
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Sasha Levin <alexander.levin@verizon.com>
> > Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> > Cc: Tejun Heo <tj@kernel.org>
> > References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem_userptr.c | 35 +++++++++++++++++++--------------
> >   1 file changed, 20 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > index 2d4996de7331..f9b3406401af 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > @@ -164,7 +164,6 @@ static struct i915_mmu_notifier *
> >   i915_mmu_notifier_create(struct mm_struct *mm)
> >   {
> >   	struct i915_mmu_notifier *mn;
> > -	int ret;
> >   	mn = kmalloc(sizeof(*mn), GFP_KERNEL);
> >   	if (mn == NULL)
> > @@ -179,14 +178,6 @@ i915_mmu_notifier_create(struct mm_struct *mm)
> >   		return ERR_PTR(-ENOMEM);
> >   	}
> > -	 /* Protected by mmap_sem (write-lock) */
> > -	ret = __mmu_notifier_register(&mn->mn, mm);
> > -	if (ret) {
> > -		destroy_workqueue(mn->wq);
> > -		kfree(mn);
> > -		return ERR_PTR(ret);
> > -	}
> > -
> >   	return mn;
> >   }
> > @@ -210,23 +201,37 @@ i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
> >   static struct i915_mmu_notifier *
> >   i915_mmu_notifier_find(struct i915_mm_struct *mm)
> >   {
> > -	struct i915_mmu_notifier *mn = mm->mn;
> > +	struct i915_mmu_notifier *mn;
> > +	int err;
> >   	mn = mm->mn;
> >   	if (mn)
> >   		return mn;
> > +	mn = i915_mmu_notifier_create(mm->mm);
> > +	if (IS_ERR(mn))
> > +		return mn;
> 
> Strictly speaking we don't want to fail just yet, only if we actually needed
> a new notifier and we failed to create it.

Is the check 2 lines above not good enough? It's somewhat racy, but I'm not
sure what value we provide by being perfectly correct against low memory.
This thread racing against a 2nd one, where the minimal allocation of the
2nd one pushed us perfectly over the oom threshold, seems a very unlikely
scenario.

Also, small allocations actually never fail :-)

> 
> > +
> > +	err = 0;
> >   	down_write(&mm->mm->mmap_sem);
> >   	mutex_lock(&mm->i915->mm_lock);
> > -	if ((mn = mm->mn) == NULL) {
> > -		mn = i915_mmu_notifier_create(mm->mm);
> > -		if (!IS_ERR(mn))
> > -			mm->mn = mn;
> > +	if (mm->mn == NULL) {
> > +		/* Protected by mmap_sem (write-lock) */
> > +		err = __mmu_notifier_register(&mn->mn, mm->mm);
> > +		if (!err) {
> > +			/* Protected by mm_lock */
> > +			mm->mn = fetch_and_zero(&mn);
> > +		}
> >   	}
> >   	mutex_unlock(&mm->i915->mm_lock);
> >   	up_write(&mm->mm->mmap_sem);
> > -	return mn;
> > +	if (mn) {
> > +		destroy_workqueue(mn->wq);
> > +		kfree(mn);
> > +	}
> > +
> > +	return err ? ERR_PTR(err) : mm->mn;
> >   }
> >   static int
> > 
> 
> Otherwise looks good to me.
> 
> I would also put a note in the commit on how working around the locking
> issue is also beneficial to performance with moving the allocation step
> outside the mmap_sem.

Yeah Chris brought that up too, I don't really buy it given how
heavy-weight __mmu_notifier_register is. But I can add something like:

"This also has the minor benefit of slightly reducing the critical
section where we hold mmap_sem."

r-b with that added to the commit message?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH 1/2] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock
  2017-10-06 14:23     ` Daniel Vetter
  (?)
@ 2017-10-06 14:44     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2017-10-06 14:44 UTC (permalink / raw)
  To: Intel Graphics Development, Peter Zijlstra, LKML, Tejun Heo,
	Daniel Vetter, Thomas Gleixner, Sasha Levin


On 06/10/2017 15:23, Daniel Vetter wrote:
> On Fri, Oct 06, 2017 at 12:34:02PM +0100, Tvrtko Ursulin wrote:
>>
>> On 06/10/2017 10:06, Daniel Vetter wrote:
>>> 4.14-rc1 gained the fancy new cross-release support in lockdep, which
>>> seems to have uncovered a few more rules about what is allowed and
>>> isn't.
>>>
>>> This one here seems to indicate that allocating a work-queue while
>>> holding mmap_sem is a no-go, so let's try to preallocate it.
>>>
>>> Of course another way to break this chain would be somewhere in the
>>> cpu hotplug code, since this isn't the only trace we're finding now
>>> which goes through msr_create_device.
>>>
>>> Full lockdep splat:

[snipped lockdep splat]

>>> v2: Set ret correctly when we raced with another thread.
>>>
>>> v3: Use Chris' diff. Attach the right lockdep splat.
>>>
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Sasha Levin <alexander.levin@verizon.com>
>>> Cc: Marta Lofstedt <marta.lofstedt@intel.com>
>>> Cc: Tejun Heo <tj@kernel.org>
>>> References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939
>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem_userptr.c | 35 +++++++++++++++++++--------------
>>>    1 file changed, 20 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
>>> index 2d4996de7331..f9b3406401af 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
>>> @@ -164,7 +164,6 @@ static struct i915_mmu_notifier *
>>>    i915_mmu_notifier_create(struct mm_struct *mm)
>>>    {
>>>    	struct i915_mmu_notifier *mn;
>>> -	int ret;
>>>    	mn = kmalloc(sizeof(*mn), GFP_KERNEL);
>>>    	if (mn == NULL)
>>> @@ -179,14 +178,6 @@ i915_mmu_notifier_create(struct mm_struct *mm)
>>>    		return ERR_PTR(-ENOMEM);
>>>    	}
>>> -	 /* Protected by mmap_sem (write-lock) */
>>> -	ret = __mmu_notifier_register(&mn->mn, mm);
>>> -	if (ret) {
>>> -		destroy_workqueue(mn->wq);
>>> -		kfree(mn);
>>> -		return ERR_PTR(ret);
>>> -	}
>>> -
>>>    	return mn;
>>>    }
>>> @@ -210,23 +201,37 @@ i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
>>>    static struct i915_mmu_notifier *
>>>    i915_mmu_notifier_find(struct i915_mm_struct *mm)
>>>    {
>>> -	struct i915_mmu_notifier *mn = mm->mn;
>>> +	struct i915_mmu_notifier *mn;
>>> +	int err;
>>>    	mn = mm->mn;
>>>    	if (mn)
>>>    		return mn;
>>> +	mn = i915_mmu_notifier_create(mm->mm);
>>> +	if (IS_ERR(mn))
>>> +		return mn;
>>
>> Strictly speaking we don't want to fail just yet, only if we actually needed
>> a new notifier and we failed to create it.
> 
> The check 2 lines above not good enough? It's somewhat racy, but I'm not
> sure what value we provide by being perfectly correct against low memory.
> This thread racing against a 2nd one, where the minimal allocation of the
> 2nd one pushed us perfectly over the oom threshold seems a very unlikely
> scenario.
> 
> Also, small allocations actually never fail :-)

Yes, but we otherwise make each other re-spin for much smaller things
than bailout logic being conceptually in the wrong place. So for me I'd
like a respin. It's not complicated at all, just move the bailout to
before the __mmu_notifier_register:

...

err = 0;
if (IS_ERR(mn))
	err = PTR_ERR(..);

...

if (mana->manah == NULL) { /* ;-D */
	/* Protect by mmap_sem...
	if (err == 0) {
		err = __mmu_notifier_register(..);
		...
	}
}

...

if (mn && !IS_ERR(mn)) {
	...free...
}

I think.. ?
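Spelled out as a self-contained sketch, the flow above looks roughly like
the following. All names here (fake_mm, fake_notifier, notifier_find) are
invented stand-ins, not the real driver types: a pthread mutex plays the
role of mm_lock, setting a flag stands in for __mmu_notifier_register(),
and NULL stands in for ERR_PTR() on failure:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Invented stand-ins for the driver structures. */
struct fake_notifier {
	int registered;			/* stands in for a registered mmu notifier */
};

struct fake_mm {
	pthread_mutex_t lock;		/* stands in for dev_priv->mm_lock */
	struct fake_notifier *mn;
};

/* Allocate outside the lock; "register" under the lock only if nobody
 * else beat us to it; free our copy if we lost the race. */
static struct fake_notifier *notifier_find(struct fake_mm *mm)
{
	struct fake_notifier *mn;
	int err = 0;

	if (mm->mn)			/* racy unlocked fast path */
		return mm->mn;

	mn = calloc(1, sizeof(*mn));	/* preallocated, no locks held */
	if (!mn)
		err = -ENOMEM;

	pthread_mutex_lock(&mm->lock);
	if (!mm->mn && !err) {
		mn->registered = 1;	/* register under the lock */
		mm->mn = mn;
		mn = NULL;		/* ownership transferred */
	} else if (mm->mn) {
		err = 0;		/* someone else won; drop our own error */
	}
	pthread_mutex_unlock(&mm->lock);

	free(mn);			/* loser's copy, or NULL */

	return err ? NULL : mm->mn;
}
```

Calling it twice returns the same notifier, and a caller that lost the
race to a successful thread gets the installed notifier rather than an
error — which is exactly the bailout placement being argued for.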

R-b on this, plus below, unless I got something wrong.

> 
>>
>>> +
>>> +	err = 0;
>>>    	down_write(&mm->mm->mmap_sem);
>>>    	mutex_lock(&mm->i915->mm_lock);
>>> -	if ((mn = mm->mn) == NULL) {
>>> -		mn = i915_mmu_notifier_create(mm->mm);
>>> -		if (!IS_ERR(mn))
>>> -			mm->mn = mn;
>>> +	if (mm->mn == NULL) {
>>> +		/* Protected by mmap_sem (write-lock) */
>>> +		err = __mmu_notifier_register(&mn->mn, mm->mm);
>>> +		if (!err) {
>>> +			/* Protected by mm_lock */
>>> +			mm->mn = fetch_and_zero(&mn);
>>> +		}
>>>    	}
>>>    	mutex_unlock(&mm->i915->mm_lock);
>>>    	up_write(&mm->mm->mmap_sem);
>>> -	return mn;
>>> +	if (mn) {
>>> +		destroy_workqueue(mn->wq);
>>> +		kfree(mn);
>>> +	}
>>> +
>>> +	return err ? ERR_PTR(err) : mm->mn;
>>>    }
>>>    static int
>>>
>>
>> Otherwise looks good to me.
>>
>> I would also put a note in the commit on how working around the locking
>> issue is also beneficial to performance with moving the allocation step
>> outside the mmap_sem.
> 
> Yeah Chris brought that up too, I don't really buy it given how
> heavy-weight __mmu_notifier_register is. But I can add something like:
> 
> "This also has the minor benefit of slightly reducing the critical
> section where we hold mmap_sem."
> 
> r-b with that added to the commit message?

I think for me it is more about making it clear that we are working 
around something where we are not strictly broken by design, with a 
(however meaningless) optimisation at the same time.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock
  2017-10-06  9:06 ` Daniel Vetter
                   ` (5 preceding siblings ...)
  (?)
@ 2017-10-06 15:52 ` Daniel Vetter
  2017-10-06 16:07   ` Tvrtko Ursulin
  -1 siblings, 1 reply; 35+ messages in thread
From: Daniel Vetter @ 2017-10-06 15:52 UTC (permalink / raw)
  To: Intel Graphics Development
  Cc: Peter Zijlstra, Daniel Vetter, Tejun Heo, Daniel Vetter,
	Thomas Gleixner, Sasha Levin

4.14-rc1 gained the fancy new cross-release support in lockdep, which
seems to have uncovered a few more rules about what is allowed and
isn't.

This one here seems to indicate that allocating a work-queue while
holding mmap_sem is a no-go, so let's try to preallocate it.

Of course another way to break this chain would be somewhere in the
cpu hotplug code, since this isn't the only trace we're finding now
which goes through msr_create_device.

Full lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc1-CI-CI_DRM_3118+ #1 Tainted: G     U
------------------------------------------------------
prime_mmap/1551 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109dbb7>] apply_workqueue_attrs+0x17/0x50

but task is already holding lock:
 (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&dev_priv->mm_lock){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
       i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
       drm_ioctl_kernel+0x69/0xb0
       drm_ioctl+0x2f9/0x3d0
       do_vfs_ioctl+0x94/0x670
       SyS_ioctl+0x41/0x70
       entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #5 (&mm->mmap_sem){++++}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __might_fault+0x68/0x90
       _copy_to_user+0x23/0x70
       filldir+0xa5/0x120
       dcache_readdir+0xf9/0x170
       iterate_dir+0x69/0x1a0
       SyS_getdents+0xa5/0x140
       entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #4 (&sb->s_type->i_mutex_key#5){++++}:
       down_write+0x3b/0x70
       handle_create+0xcb/0x1e0
       devtmpfsd+0x139/0x180
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #3 ((complete)&req.done){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       wait_for_common+0x58/0x210
       wait_for_completion+0x1d/0x20
       devtmpfs_create_node+0x13d/0x160
       device_add+0x5eb/0x620
       device_create_groups_vargs+0xe0/0xf0
       device_create+0x3a/0x40
       msr_device_create+0x2b/0x40
       cpuhp_invoke_callback+0xa3/0x840
       cpuhp_thread_fun+0x7a/0x150
       smpboot_thread_fn+0x18a/0x280
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #2 (cpuhp_state){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpuhp_issue_call+0x10b/0x170
       __cpuhp_setup_state_cpuslocked+0x134/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_writeback_init+0x43/0x67
       pagecache_init+0x3d/0x42
       start_kernel+0x3a8/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #1 (cpuhp_state_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       __cpuhp_setup_state_cpuslocked+0x52/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_alloc_init+0x28/0x30
       start_kernel+0x145/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #0 (cpu_hotplug_lock.rw_sem){++++}:
       check_prev_add+0x430/0x840
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpus_read_lock+0x3d/0xb0
       apply_workqueue_attrs+0x17/0x50
       __alloc_workqueue_key+0x1d8/0x4d9
       i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
       i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
       drm_ioctl_kernel+0x69/0xb0
       drm_ioctl+0x2f9/0x3d0
       do_vfs_ioctl+0x94/0x670
       SyS_ioctl+0x41/0x70
       entry_SYSCALL_64_fastpath+0x1c/0xb1

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev_priv->mm_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev_priv->mm_lock);
                               lock(&mm->mmap_sem);
                               lock(&dev_priv->mm_lock);
  lock(cpu_hotplug_lock.rw_sem);

 *** DEADLOCK ***

2 locks held by prime_mmap/1551:
 #0:  (&mm->mmap_sem){++++}, at: [<ffffffffa01a7b18>] i915_gem_userptr_init__mmu_notifier+0x138/0x270 [i915]
 #1:  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]

stack backtrace:
CPU: 4 PID: 1551 Comm: prime_mmap Tainted: G     U          4.14.0-rc1-CI-CI_DRM_3118+ #1
Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
Call Trace:
 dump_stack+0x68/0x9f
 print_circular_bug+0x235/0x3c0
 ? lockdep_init_map_crosslock+0x20/0x20
 check_prev_add+0x430/0x840
 __lock_acquire+0x1420/0x15e0
 ? __lock_acquire+0x1420/0x15e0
 ? lockdep_init_map_crosslock+0x20/0x20
 lock_acquire+0xb0/0x200
 ? apply_workqueue_attrs+0x17/0x50
 cpus_read_lock+0x3d/0xb0
 ? apply_workqueue_attrs+0x17/0x50
 apply_workqueue_attrs+0x17/0x50
 __alloc_workqueue_key+0x1d8/0x4d9
 ? __lockdep_init_map+0x57/0x1c0
 i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
 i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
 ? i915_gem_userptr_release+0x140/0x140 [i915]
 drm_ioctl_kernel+0x69/0xb0
 drm_ioctl+0x2f9/0x3d0
 ? i915_gem_userptr_release+0x140/0x140 [i915]
 ? __do_page_fault+0x2a4/0x570
 do_vfs_ioctl+0x94/0x670
 ? entry_SYSCALL_64_fastpath+0x5/0xb1
 ? __this_cpu_preempt_check+0x13/0x20
 ? trace_hardirqs_on_caller+0xe3/0x1b0
 SyS_ioctl+0x41/0x70
 entry_SYSCALL_64_fastpath+0x1c/0xb1
RIP: 0033:0x7fbb83c39587
RSP: 002b:00007fff188dc228 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: ffffffff81492963 RCX: 00007fbb83c39587
RDX: 00007fff188dc260 RSI: 00000000c0186473 RDI: 0000000000000003
RBP: ffffc90001487f88 R08: 0000000000000000 R09: 00007fff188dc2ac
R10: 00007fbb83efcb58 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000003 R14: 00000000c0186473 R15: 00007fff188dc2ac
 ? __this_cpu_preempt_check+0x13/0x20

Note that this also has the minor benefit of slightly reducing the
critical section where we hold mmap_sem.

v2: Set ret correctly when we raced with another thread.

v3: Use Chris' diff. Attach the right lockdep splat.

v4: Repaint in Tvrtko's colors (aka don't report ENOMEM if we raced and
some other thread managed to not also get an ENOMEM and successfully
installed the mmu notifier; note that the kernel guarantees that small
allocations succeed, so this never actually happens).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sasha Levin <alexander.levin@verizon.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Cc: Tejun Heo <tj@kernel.org>
References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 38 ++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 2d4996de7331..be55d9af754e 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -164,7 +164,6 @@ static struct i915_mmu_notifier *
 i915_mmu_notifier_create(struct mm_struct *mm)
 {
 	struct i915_mmu_notifier *mn;
-	int ret;
 
 	mn = kmalloc(sizeof(*mn), GFP_KERNEL);
 	if (mn == NULL)
@@ -179,14 +178,6 @@ i915_mmu_notifier_create(struct mm_struct *mm)
 		return ERR_PTR(-ENOMEM);
 	}
 
-	 /* Protected by mmap_sem (write-lock) */
-	ret = __mmu_notifier_register(&mn->mn, mm);
-	if (ret) {
-		destroy_workqueue(mn->wq);
-		kfree(mn);
-		return ERR_PTR(ret);
-	}
-
 	return mn;
 }
 
@@ -210,23 +201,40 @@ i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
 static struct i915_mmu_notifier *
 i915_mmu_notifier_find(struct i915_mm_struct *mm)
 {
-	struct i915_mmu_notifier *mn = mm->mn;
+	struct i915_mmu_notifier *mn;
+	int err = 0;
 
 	mn = mm->mn;
 	if (mn)
 		return mn;
 
+	mn = i915_mmu_notifier_create(mm->mm);
+	if (IS_ERR(mn))
+		err = PTR_ERR(mn);
+
 	down_write(&mm->mm->mmap_sem);
 	mutex_lock(&mm->i915->mm_lock);
-	if ((mn = mm->mn) == NULL) {
-		mn = i915_mmu_notifier_create(mm->mm);
-		if (!IS_ERR(mn))
-			mm->mn = mn;
+	if (mm->mn == NULL && !err) {
+		/* Protected by mmap_sem (write-lock) */
+		err = __mmu_notifier_register(&mn->mn, mm->mm);
+		if (!err) {
+			/* Protected by mm_lock */
+			mm->mn = fetch_and_zero(&mn);
+		}
+	} else {
+		/* someone else raced and successfully installed the mmu
+		 * notifier, we can cancel our own errors */
+		err = 0;
 	}
 	mutex_unlock(&mm->i915->mm_lock);
 	up_write(&mm->mm->mmap_sem);
 
-	return mn;
+	if (mn) {
+		destroy_workqueue(mn->wq);
+		kfree(mn);
+	}
+
+	return err ? ERR_PTR(err) : mm->mn;
 }
 
 static int
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock
  2017-10-06 15:52 ` [PATCH] " Daniel Vetter
@ 2017-10-06 16:07   ` Tvrtko Ursulin
  0 siblings, 0 replies; 35+ messages in thread
From: Tvrtko Ursulin @ 2017-10-06 16:07 UTC (permalink / raw)
  To: Daniel Vetter, Intel Graphics Development
  Cc: Peter Zijlstra, Tejun Heo, Thomas Gleixner, Sasha Levin, Daniel Vetter


On 06/10/2017 16:52, Daniel Vetter wrote:
> 4.14-rc1 gained the fancy new cross-release support in lockdep, which
> seems to have uncovered a few more rules about what is allowed and
> isn't.
> 
> This one here seems to indicate that allocating a work-queue while
> holding mmap_sem is a no-go, so let's try to preallocate it.
> 
> Of course another way to break this chain would be somewhere in the
> cpu hotplug code, since this isn't the only trace we're finding now
> which goes through msr_create_device.
> 
> Full lockdep splat:
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.14.0-rc1-CI-CI_DRM_3118+ #1 Tainted: G     U
> ------------------------------------------------------
> prime_mmap/1551 is trying to acquire lock:
>   (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109dbb7>] apply_workqueue_attrs+0x17/0x50
> 
> but task is already holding lock:
>   (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #6 (&dev_priv->mm_lock){+.+.}:
>         __lock_acquire+0x1420/0x15e0
>         lock_acquire+0xb0/0x200
>         __mutex_lock+0x86/0x9b0
>         mutex_lock_nested+0x1b/0x20
>         i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
>         i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>         drm_ioctl_kernel+0x69/0xb0
>         drm_ioctl+0x2f9/0x3d0
>         do_vfs_ioctl+0x94/0x670
>         SyS_ioctl+0x41/0x70
>         entry_SYSCALL_64_fastpath+0x1c/0xb1
> 
> -> #5 (&mm->mmap_sem){++++}:
>         __lock_acquire+0x1420/0x15e0
>         lock_acquire+0xb0/0x200
>         __might_fault+0x68/0x90
>         _copy_to_user+0x23/0x70
>         filldir+0xa5/0x120
>         dcache_readdir+0xf9/0x170
>         iterate_dir+0x69/0x1a0
>         SyS_getdents+0xa5/0x140
>         entry_SYSCALL_64_fastpath+0x1c/0xb1
> 
> -> #4 (&sb->s_type->i_mutex_key#5){++++}:
>         down_write+0x3b/0x70
>         handle_create+0xcb/0x1e0
>         devtmpfsd+0x139/0x180
>         kthread+0x152/0x190
>         ret_from_fork+0x27/0x40
> 
> -> #3 ((complete)&req.done){+.+.}:
>         __lock_acquire+0x1420/0x15e0
>         lock_acquire+0xb0/0x200
>         wait_for_common+0x58/0x210
>         wait_for_completion+0x1d/0x20
>         devtmpfs_create_node+0x13d/0x160
>         device_add+0x5eb/0x620
>         device_create_groups_vargs+0xe0/0xf0
>         device_create+0x3a/0x40
>         msr_device_create+0x2b/0x40
>         cpuhp_invoke_callback+0xa3/0x840
>         cpuhp_thread_fun+0x7a/0x150
>         smpboot_thread_fn+0x18a/0x280
>         kthread+0x152/0x190
>         ret_from_fork+0x27/0x40
> 
> -> #2 (cpuhp_state){+.+.}:
>         __lock_acquire+0x1420/0x15e0
>         lock_acquire+0xb0/0x200
>         cpuhp_issue_call+0x10b/0x170
>         __cpuhp_setup_state_cpuslocked+0x134/0x2a0
>         __cpuhp_setup_state+0x46/0x60
>         page_writeback_init+0x43/0x67
>         pagecache_init+0x3d/0x42
>         start_kernel+0x3a8/0x3fc
>         x86_64_start_reservations+0x2a/0x2c
>         x86_64_start_kernel+0x6d/0x70
>         verify_cpu+0x0/0xfb
> 
> -> #1 (cpuhp_state_mutex){+.+.}:
>         __lock_acquire+0x1420/0x15e0
>         lock_acquire+0xb0/0x200
>         __mutex_lock+0x86/0x9b0
>         mutex_lock_nested+0x1b/0x20
>         __cpuhp_setup_state_cpuslocked+0x52/0x2a0
>         __cpuhp_setup_state+0x46/0x60
>         page_alloc_init+0x28/0x30
>         start_kernel+0x145/0x3fc
>         x86_64_start_reservations+0x2a/0x2c
>         x86_64_start_kernel+0x6d/0x70
>         verify_cpu+0x0/0xfb
> 
> -> #0 (cpu_hotplug_lock.rw_sem){++++}:
>         check_prev_add+0x430/0x840
>         __lock_acquire+0x1420/0x15e0
>         lock_acquire+0xb0/0x200
>         cpus_read_lock+0x3d/0xb0
>         apply_workqueue_attrs+0x17/0x50
>         __alloc_workqueue_key+0x1d8/0x4d9
>         i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
>         i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>         drm_ioctl_kernel+0x69/0xb0
>         drm_ioctl+0x2f9/0x3d0
>         do_vfs_ioctl+0x94/0x670
>         SyS_ioctl+0x41/0x70
>         entry_SYSCALL_64_fastpath+0x1c/0xb1
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev_priv->mm_lock
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&dev_priv->mm_lock);
>                                 lock(&mm->mmap_sem);
>                                 lock(&dev_priv->mm_lock);
>    lock(cpu_hotplug_lock.rw_sem);
> 
>   *** DEADLOCK ***
> 
> 2 locks held by prime_mmap/1551:
>   #0:  (&mm->mmap_sem){++++}, at: [<ffffffffa01a7b18>] i915_gem_userptr_init__mmu_notifier+0x138/0x270 [i915]
>   #1:  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
> 
> stack backtrace:
> CPU: 4 PID: 1551 Comm: prime_mmap Tainted: G     U          4.14.0-rc1-CI-CI_DRM_3118+ #1
> Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
> Call Trace:
>   dump_stack+0x68/0x9f
>   print_circular_bug+0x235/0x3c0
>   ? lockdep_init_map_crosslock+0x20/0x20
>   check_prev_add+0x430/0x840
>   __lock_acquire+0x1420/0x15e0
>   ? __lock_acquire+0x1420/0x15e0
>   ? lockdep_init_map_crosslock+0x20/0x20
>   lock_acquire+0xb0/0x200
>   ? apply_workqueue_attrs+0x17/0x50
>   cpus_read_lock+0x3d/0xb0
>   ? apply_workqueue_attrs+0x17/0x50
>   apply_workqueue_attrs+0x17/0x50
>   __alloc_workqueue_key+0x1d8/0x4d9
>   ? __lockdep_init_map+0x57/0x1c0
>   i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
>   i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>   ? i915_gem_userptr_release+0x140/0x140 [i915]
>   drm_ioctl_kernel+0x69/0xb0
>   drm_ioctl+0x2f9/0x3d0
>   ? i915_gem_userptr_release+0x140/0x140 [i915]
>   ? __do_page_fault+0x2a4/0x570
>   do_vfs_ioctl+0x94/0x670
>   ? entry_SYSCALL_64_fastpath+0x5/0xb1
>   ? __this_cpu_preempt_check+0x13/0x20
>   ? trace_hardirqs_on_caller+0xe3/0x1b0
>   SyS_ioctl+0x41/0x70
>   entry_SYSCALL_64_fastpath+0x1c/0xb1
> RIP: 0033:0x7fbb83c39587
> RSP: 002b:00007fff188dc228 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: ffffffff81492963 RCX: 00007fbb83c39587
> RDX: 00007fff188dc260 RSI: 00000000c0186473 RDI: 0000000000000003
> RBP: ffffc90001487f88 R08: 0000000000000000 R09: 00007fff188dc2ac
> R10: 00007fbb83efcb58 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000003 R14: 00000000c0186473 R15: 00007fff188dc2ac
>   ? __this_cpu_preempt_check+0x13/0x20
> 
> Note that this also has the minor benefit of slightly reducing the
> critical section where we hold mmap_sem.
> 
> v2: Set ret correctly when we raced with another thread.
> 
> v3: Use Chris' diff. Attach the right lockdep splat.
> 
> v4: Repaint in Tvrtko's colors (aka don't report ENOMEM if we race and
> some other thread managed to avoid the ENOMEM and successfully
> installed the mmu notifier; note that the kernel guarantees that small
> allocations succeed, so this never actually happens).
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sasha Levin <alexander.levin@verizon.com>
> Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> Cc: Tejun Heo <tj@kernel.org>
> References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_userptr.c | 38 ++++++++++++++++++++-------------
>   1 file changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 2d4996de7331..be55d9af754e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -164,7 +164,6 @@ static struct i915_mmu_notifier *
>   i915_mmu_notifier_create(struct mm_struct *mm)
>   {
>   	struct i915_mmu_notifier *mn;
> -	int ret;
>   
>   	mn = kmalloc(sizeof(*mn), GFP_KERNEL);
>   	if (mn == NULL)
> @@ -179,14 +178,6 @@ i915_mmu_notifier_create(struct mm_struct *mm)
>   		return ERR_PTR(-ENOMEM);
>   	}
>   
> -	 /* Protected by mmap_sem (write-lock) */
> -	ret = __mmu_notifier_register(&mn->mn, mm);
> -	if (ret) {
> -		destroy_workqueue(mn->wq);
> -		kfree(mn);
> -		return ERR_PTR(ret);
> -	}
> -
>   	return mn;
>   }
>   
> @@ -210,23 +201,40 @@ i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
>   static struct i915_mmu_notifier *
>   i915_mmu_notifier_find(struct i915_mm_struct *mm)
>   {
> -	struct i915_mmu_notifier *mn = mm->mn;
> +	struct i915_mmu_notifier *mn;
> +	int err = 0;
>   
>   	mn = mm->mn;
>   	if (mn)
>   		return mn;
>   
> +	mn = i915_mmu_notifier_create(mm->mm);
> +	if (IS_ERR(mn))
> +		err = PTR_ERR(mn);
> +
>   	down_write(&mm->mm->mmap_sem);
>   	mutex_lock(&mm->i915->mm_lock);
> -	if ((mn = mm->mn) == NULL) {
> -		mn = i915_mmu_notifier_create(mm->mm);
> -		if (!IS_ERR(mn))
> -			mm->mn = mn;
> +	if (mm->mn == NULL && !err) {
> +		/* Protected by mmap_sem (write-lock) */
> +		err = __mmu_notifier_register(&mn->mn, mm->mm);
> +		if (!err) {
> +			/* Protected by mm_lock */
> +			mm->mn = fetch_and_zero(&mn);
> +		}
> +	} else {
> +		/* someone else raced and successfully installed the mmu
> +		 * notifier, we can cancel our own errors */

Better make sure Linus is not lurking. ;)

> +		err = 0;
>   	}
>   	mutex_unlock(&mm->i915->mm_lock);
>   	up_write(&mm->mm->mmap_sem);
>   
> -	return mn;
> +	if (mn) {
> +		destroy_workqueue(mn->wq);
> +		kfree(mn);
> +	}
> +
> +	return err ? ERR_PTR(err) : mm->mn;
>   }
>   
>   static int
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 35+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock (rev2)
  2017-10-06  9:06 ` Daniel Vetter
                   ` (6 preceding siblings ...)
  (?)
@ 2017-10-06 16:29 ` Patchwork
  -1 siblings, 0 replies; 35+ messages in thread
From: Patchwork @ 2017-10-06 16:29 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: series starting with drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock (rev2)
URL   : https://patchwork.freedesktop.org/series/31476/
State : success

== Summary ==

Series 31476v2 series starting with drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock
https://patchwork.freedesktop.org/api/1.0/series/31476/revisions/2/mbox/

Test drv_module_reload:
        Subgroup basic-reload-inject:
                incomplete -> PASS       (fi-cfl-s) fdo#103022

fdo#103022 https://bugs.freedesktop.org/show_bug.cgi?id=103022

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:455s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:482s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:397s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:566s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:287s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:526s
fi-byt-j1900     total:289  pass:253  dwarn:1   dfail:0   fail:0   skip:35  time:544s
fi-byt-n2820     total:289  pass:249  dwarn:1   dfail:0   fail:0   skip:39  time:527s
fi-cfl-s         total:289  pass:256  dwarn:1   dfail:0   fail:0   skip:32  time:559s
fi-cnl-y         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:618s
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:430s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:612s
fi-hsw-4770      total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:440s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:417s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:504s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:482s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:505s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:584s
fi-kbl-7567u     total:289  pass:265  dwarn:4   dfail:0   fail:0   skip:20  time:501s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:597s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:656s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:469s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:659s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:531s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:516s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:476s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:582s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:432s
fi-bxt-j4205 failed to connect after reboot

cb32cc2ad1c3ccd0803276d5af46c410f5104951 drm-tip: 2017y-10m-06d-15h-01m-44s UTC integration manifest
999c4f026e85 drm/i915: Use rcu instead of stop_machine in set_wedged
cef3c4054a61 drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5935/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-06 14:20     ` Daniel Vetter
@ 2017-10-06 17:29       ` Chris Wilson
  2017-10-09  9:12           ` Daniel Vetter
  2017-10-06 17:37       ` Chris Wilson
  1 sibling, 1 reply; 35+ messages in thread
From: Chris Wilson @ 2017-10-06 17:29 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Daniel Vetter, Intel Graphics Development, LKML, Daniel Vetter,
	Thomas Gleixner, Mika Kuoppala

Quoting Daniel Vetter (2017-10-06 15:20:09)
> On Fri, Oct 06, 2017 at 12:03:49PM +0100, Chris Wilson wrote:
> > Quoting Daniel Vetter (2017-10-06 10:06:37)
> > > stop_machine is not really a locking primitive we should use, except
> > > when the hw folks tell us the hw is broken and that's the only way to
> > > work around it.
> > > 
> > > This patch tries to address the locking abuse of stop_machine() from
> > > 
> > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date:   Tue Nov 22 14:41:21 2016 +0000
> > > 
> > >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > > 
> > > Chris said part of the reason for going with stop_machine() was that
> > > it's no overhead for the fast-path. But these callbacks use irqsave
> > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> > > 
> > > To stay as close as possible to the stop_machine semantics we first
> > > update all the submit function pointers to the nop handler, then call
> > > synchronize_rcu() to make sure no new requests can be submitted. This
> > > should give us exactly the huge barrier we want.
> > > 
> > > I pondered whether we should annotate engine->submit_request as __rcu
> > > and use rcu_assign_pointer and rcu_dereference on it. But the reason
> > > behind those is to make sure the compiler/cpu barriers are there for
> > > when you have an actual data structure you point at, to make sure all
> > > the writes are seen correctly on the read side. But we just have a
> > > function pointer, and .text isn't changed, so no need for these
> > > barriers and hence no need for annotations.
> > > 
> > > This should fix the following lockdep splat:
> > > 
> > > ======================================================
> > > WARNING: possible circular locking dependency detected
> > > 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
> > > ------------------------------------------------------
> > > kworker/3:4/562 is trying to acquire lock:
> > >  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
> > > 
> > > but task is already holding lock:
> > >  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > 
> > > which lock already depends on the new lock.
> > > 
> > > the existing dependency chain (in reverse order) is:
> > > 
> > > -> #6 (&dev->struct_mutex){+.+.}:
> > >        __lock_acquire+0x1420/0x15e0
> > >        lock_acquire+0xb0/0x200
> > >        __mutex_lock+0x86/0x9b0
> > >        mutex_lock_interruptible_nested+0x1b/0x20
> > >        i915_mutex_lock_interruptible+0x51/0x130 [i915]
> > >        i915_gem_fault+0x209/0x650 [i915]
> > >        __do_fault+0x1e/0x80
> > >        __handle_mm_fault+0xa08/0xed0
> > >        handle_mm_fault+0x156/0x300
> > >        __do_page_fault+0x2c5/0x570
> > >        do_page_fault+0x28/0x250
> > >        page_fault+0x22/0x30
> > > 
> > > -> #5 (&mm->mmap_sem){++++}:
> > >        __lock_acquire+0x1420/0x15e0
> > >        lock_acquire+0xb0/0x200
> > >        __might_fault+0x68/0x90
> > >        _copy_to_user+0x23/0x70
> > >        filldir+0xa5/0x120
> > >        dcache_readdir+0xf9/0x170
> > >        iterate_dir+0x69/0x1a0
> > >        SyS_getdents+0xa5/0x140
> > >        entry_SYSCALL_64_fastpath+0x1c/0xb1
> > > 
> > > -> #4 (&sb->s_type->i_mutex_key#5){++++}:
> > >        down_write+0x3b/0x70
> > >        handle_create+0xcb/0x1e0
> > >        devtmpfsd+0x139/0x180
> > >        kthread+0x152/0x190
> > >        ret_from_fork+0x27/0x40
> > > 
> > > -> #3 ((complete)&req.done){+.+.}:
> > >        __lock_acquire+0x1420/0x15e0
> > >        lock_acquire+0xb0/0x200
> > >        wait_for_common+0x58/0x210
> > >        wait_for_completion+0x1d/0x20
> > >        devtmpfs_create_node+0x13d/0x160
> > >        device_add+0x5eb/0x620
> > >        device_create_groups_vargs+0xe0/0xf0
> > >        device_create+0x3a/0x40
> > >        msr_device_create+0x2b/0x40
> > >        cpuhp_invoke_callback+0xc9/0xbf0
> > >        cpuhp_thread_fun+0x17b/0x240
> > >        smpboot_thread_fn+0x18a/0x280
> > >        kthread+0x152/0x190
> > >        ret_from_fork+0x27/0x40
> > > 
> > > -> #2 (cpuhp_state-up){+.+.}:
> > >        __lock_acquire+0x1420/0x15e0
> > >        lock_acquire+0xb0/0x200
> > >        cpuhp_issue_call+0x133/0x1c0
> > >        __cpuhp_setup_state_cpuslocked+0x139/0x2a0
> > >        __cpuhp_setup_state+0x46/0x60
> > >        page_writeback_init+0x43/0x67
> > >        pagecache_init+0x3d/0x42
> > >        start_kernel+0x3a8/0x3fc
> > >        x86_64_start_reservations+0x2a/0x2c
> > >        x86_64_start_kernel+0x6d/0x70
> > >        verify_cpu+0x0/0xfb
> > > 
> > > -> #1 (cpuhp_state_mutex){+.+.}:
> > >        __lock_acquire+0x1420/0x15e0
> > >        lock_acquire+0xb0/0x200
> > >        __mutex_lock+0x86/0x9b0
> > >        mutex_lock_nested+0x1b/0x20
> > >        __cpuhp_setup_state_cpuslocked+0x53/0x2a0
> > >        __cpuhp_setup_state+0x46/0x60
> > >        page_alloc_init+0x28/0x30
> > >        start_kernel+0x145/0x3fc
> > >        x86_64_start_reservations+0x2a/0x2c
> > >        x86_64_start_kernel+0x6d/0x70
> > >        verify_cpu+0x0/0xfb
> > > 
> > > -> #0 (cpu_hotplug_lock.rw_sem){++++}:
> > >        check_prev_add+0x430/0x840
> > >        __lock_acquire+0x1420/0x15e0
> > >        lock_acquire+0xb0/0x200
> > >        cpus_read_lock+0x3d/0xb0
> > >        stop_machine+0x1c/0x40
> > >        i915_gem_set_wedged+0x1a/0x20 [i915]
> > >        i915_reset+0xb9/0x230 [i915]
> > >        i915_reset_device+0x1f6/0x260 [i915]
> > >        i915_handle_error+0x2d8/0x430 [i915]
> > >        hangcheck_declare_hang+0xd3/0xf0 [i915]
> > >        i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > >        process_one_work+0x233/0x660
> > >        worker_thread+0x4e/0x3b0
> > >        kthread+0x152/0x190
> > >        ret_from_fork+0x27/0x40
> > > 
> > > other info that might help us debug this:
> > > 
> > > Chain exists of:
> > >   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
> > > 
> > >  Possible unsafe locking scenario:
> > > 
> > >        CPU0                    CPU1
> > >        ----                    ----
> > >   lock(&dev->struct_mutex);
> > >                                lock(&mm->mmap_sem);
> > >                                lock(&dev->struct_mutex);
> > >   lock(cpu_hotplug_lock.rw_sem);
> > > 
> > >  *** DEADLOCK ***
> > > 
> > > 3 locks held by kworker/3:4/562:
> > >  #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > >  #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > >  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > 
> > > stack backtrace:
> > > CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
> > > Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
> > > Workqueue: events_long i915_hangcheck_elapsed [i915]
> > > Call Trace:
> > >  dump_stack+0x68/0x9f
> > >  print_circular_bug+0x235/0x3c0
> > >  ? lockdep_init_map_crosslock+0x20/0x20
> > >  check_prev_add+0x430/0x840
> > >  ? irq_work_queue+0x86/0xe0
> > >  ? wake_up_klogd+0x53/0x70
> > >  __lock_acquire+0x1420/0x15e0
> > >  ? __lock_acquire+0x1420/0x15e0
> > >  ? lockdep_init_map_crosslock+0x20/0x20
> > >  lock_acquire+0xb0/0x200
> > >  ? stop_machine+0x1c/0x40
> > >  ? i915_gem_object_truncate+0x50/0x50 [i915]
> > >  cpus_read_lock+0x3d/0xb0
> > >  ? stop_machine+0x1c/0x40
> > >  stop_machine+0x1c/0x40
> > >  i915_gem_set_wedged+0x1a/0x20 [i915]
> > >  i915_reset+0xb9/0x230 [i915]
> > >  i915_reset_device+0x1f6/0x260 [i915]
> > >  ? gen8_gt_irq_ack+0x170/0x170 [i915]
> > >  ? work_on_cpu_safe+0x60/0x60
> > >  i915_handle_error+0x2d8/0x430 [i915]
> > >  ? vsnprintf+0xd1/0x4b0
> > >  ? scnprintf+0x3a/0x70
> > >  hangcheck_declare_hang+0xd3/0xf0 [i915]
> > >  ? intel_runtime_pm_put+0x56/0xa0 [i915]
> > >  i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > >  process_one_work+0x233/0x660
> > >  worker_thread+0x4e/0x3b0
> > >  kthread+0x152/0x190
> > >  ? process_one_work+0x660/0x660
> > >  ? kthread_create_on_node+0x40/0x40
> > >  ret_from_fork+0x27/0x40
> > > Setting dangerous option reset - tainting kernel
> > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > Setting dangerous option reset - tainting kernel
> > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > 
> > > v2: Have 1 global synchronize_rcu() barrier across all engines, and
> > > improve commit message.
> > > 
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
> > >  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
> > >  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
> > >  3 files changed, 16 insertions(+), 19 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index ab8c6946fea4..e79a6ca60265 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
> > >         intel_engine_init_global_seqno(request->engine, request->global_seqno);
> > >  }
> > >  
> > > -static void engine_set_wedged(struct intel_engine_cs *engine)
> > > +static void engine_complete_requests(struct intel_engine_cs *engine)
> > >  {
> > > -       /* We need to be sure that no thread is running the old callback as
> > > -        * we install the nop handler (otherwise we would submit a request
> > > -        * to hardware that will never complete). In order to prevent this
> > > -        * race, we wait until the machine is idle before making the swap
> > > -        * (using stop_machine()).
> > > -        */
> > > -       engine->submit_request = nop_submit_request;
> > > -
> > >         /* Mark all executing requests as skipped */
> > >         engine->cancel_requests(engine);
> > 
> > How are we planning to serialise the intel_engine_init_global_seqno()
> > here with the in-flight nop_submit? With sufficient thrust we will get a
> > stale breadcrumb and an incomplete request.
> 
> Yeah that part looks indeed fishy. Well the entire "let the nop handler
> fake-complete requests" logic is something I don't really understand. I
> guess there's an exclusive relationship between requests handled directly
> (and cancelled in engine->cancel_request) and requests with external
> dma_fence dependencies.
> 
> But then I'm not really seeing what I'm changing, since even with the stop
> machine you might end up with a bunch of requests depending upon external
> fences, which then all complete at roughly the same time and race multiple
> calls to intel_engine_init_global_seqno with each another.

The stop_machine serialised the update here with the nop handlers;
that's the bit that changes.
 
> With the fake submission, do we really need to call intel_engine_init_global_seqno?

Yes. Completion is still determined by i915_seqno_passed() comparing the
rq against the engine.

You need this

@@ -3246,6 +3246,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
 
 static void engine_set_wedged(struct intel_engine_cs *engine)
 {
+       unsigned long flags;
+
        /* We need to be sure that no thread is running the old callback as
         * we install the nop handler (otherwise we would submit a request
         * to hardware that will never complete). In order to prevent this
@@ -3261,8 +3263,10 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
         * (lockless) lookup doesn't try and wait upon the request as we
         * reset it.
         */
+       spin_lock_irqsave(&engine->timeline->lock, flags);
        intel_engine_init_global_seqno(engine,
                                       intel_engine_last_submit(engine));
+       spin_unlock_irqrestore(&engine->timeline->lock, flags);
 }
 
So that the seqno written is ordered with the same spinlock used inside
the nop submission.
-Chris

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-06 14:20     ` Daniel Vetter
  2017-10-06 17:29       ` Chris Wilson
@ 2017-10-06 17:37       ` Chris Wilson
  2017-10-09  9:26           ` Daniel Vetter
  1 sibling, 1 reply; 35+ messages in thread
From: Chris Wilson @ 2017-10-06 17:37 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Daniel Vetter, Intel Graphics Development, LKML, Daniel Vetter,
	Thomas Gleixner, Mika Kuoppala

Quoting Daniel Vetter (2017-10-06 15:20:09)
> On Fri, Oct 06, 2017 at 12:03:49PM +0100, Chris Wilson wrote:
> > Quoting Daniel Vetter (2017-10-06 10:06:37)
> > > stop_machine is not really a locking primitive we should use, except
> > > when the hw folks tell us the hw is broken and that's the only way to
> > > work around it.
> > > 
> > > This patch tries to address the locking abuse of stop_machine() from
> > > 
> > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date:   Tue Nov 22 14:41:21 2016 +0000
> > > 
> > >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > > 
> > > Chris said part of the reason for going with stop_machine() was that
> > > it's no overhead for the fast-path. But these callbacks use irqsave
> > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> > > 
> > > To stay as close as possible to the stop_machine semantics we first
> > > update all the submit function pointers to the nop handler, then call
> > > synchronize_rcu() to make sure no new requests can be submitted. This
> > > should give us exactly the huge barrier we want.
> > > 
> > > I pondered whether we should annotate engine->submit_request as __rcu
> > > and use rcu_assign_pointer and rcu_dereference on it. But the reason
> > > behind those is to make sure the compiler/cpu barriers are there for
> > > when you have an actual data structure you point at, to make sure all
> > > the writes are seen correctly on the read side. But we just have a
> > > function pointer, and .text isn't changed, so no need for these
> > > barriers and hence no need for annotations.
> > > 
> > > This should fix the following lockdep splat:
> > > 
> > > ======================================================
> > > WARNING: possible circular locking dependency detected
> > > 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
> > > ------------------------------------------------------
> > > kworker/3:4/562 is trying to acquire lock:
> > >  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
> > > 
> > > but task is already holding lock:
> > >  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > 
> > > which lock already depends on the new lock.
> > > 
> > > the existing dependency chain (in reverse order) is:
> > > 
> > > -> #6 (&dev->struct_mutex){+.+.}:
> > >        __lock_acquire+0x1420/0x15e0
> > >        lock_acquire+0xb0/0x200
> > >        __mutex_lock+0x86/0x9b0
> > >        mutex_lock_interruptible_nested+0x1b/0x20
> > >        i915_mutex_lock_interruptible+0x51/0x130 [i915]
> > >        i915_gem_fault+0x209/0x650 [i915]
> > >        __do_fault+0x1e/0x80
> > >        __handle_mm_fault+0xa08/0xed0
> > >        handle_mm_fault+0x156/0x300
> > >        __do_page_fault+0x2c5/0x570
> > >        do_page_fault+0x28/0x250
> > >        page_fault+0x22/0x30
> > > 
> > > -> #5 (&mm->mmap_sem){++++}:
> > >        __lock_acquire+0x1420/0x15e0
> > >        lock_acquire+0xb0/0x200
> > >        __might_fault+0x68/0x90
> > >        _copy_to_user+0x23/0x70
> > >        filldir+0xa5/0x120
> > >        dcache_readdir+0xf9/0x170
> > >        iterate_dir+0x69/0x1a0
> > >        SyS_getdents+0xa5/0x140
> > >        entry_SYSCALL_64_fastpath+0x1c/0xb1
> > > 
> > > -> #4 (&sb->s_type->i_mutex_key#5){++++}:
> > >        down_write+0x3b/0x70
> > >        handle_create+0xcb/0x1e0
> > >        devtmpfsd+0x139/0x180
> > >        kthread+0x152/0x190
> > >        ret_from_fork+0x27/0x40
> > > 
> > > -> #3 ((complete)&req.done){+.+.}:
> > >        __lock_acquire+0x1420/0x15e0
> > >        lock_acquire+0xb0/0x200
> > >        wait_for_common+0x58/0x210
> > >        wait_for_completion+0x1d/0x20
> > >        devtmpfs_create_node+0x13d/0x160
> > >        device_add+0x5eb/0x620
> > >        device_create_groups_vargs+0xe0/0xf0
> > >        device_create+0x3a/0x40
> > >        msr_device_create+0x2b/0x40
> > >        cpuhp_invoke_callback+0xc9/0xbf0
> > >        cpuhp_thread_fun+0x17b/0x240
> > >        smpboot_thread_fn+0x18a/0x280
> > >        kthread+0x152/0x190
> > >        ret_from_fork+0x27/0x40
> > > 
> > > -> #2 (cpuhp_state-up){+.+.}:
> > >        __lock_acquire+0x1420/0x15e0
> > >        lock_acquire+0xb0/0x200
> > >        cpuhp_issue_call+0x133/0x1c0
> > >        __cpuhp_setup_state_cpuslocked+0x139/0x2a0
> > >        __cpuhp_setup_state+0x46/0x60
> > >        page_writeback_init+0x43/0x67
> > >        pagecache_init+0x3d/0x42
> > >        start_kernel+0x3a8/0x3fc
> > >        x86_64_start_reservations+0x2a/0x2c
> > >        x86_64_start_kernel+0x6d/0x70
> > >        verify_cpu+0x0/0xfb
> > > 
> > > -> #1 (cpuhp_state_mutex){+.+.}:
> > >        __lock_acquire+0x1420/0x15e0
> > >        lock_acquire+0xb0/0x200
> > >        __mutex_lock+0x86/0x9b0
> > >        mutex_lock_nested+0x1b/0x20
> > >        __cpuhp_setup_state_cpuslocked+0x53/0x2a0
> > >        __cpuhp_setup_state+0x46/0x60
> > >        page_alloc_init+0x28/0x30
> > >        start_kernel+0x145/0x3fc
> > >        x86_64_start_reservations+0x2a/0x2c
> > >        x86_64_start_kernel+0x6d/0x70
> > >        verify_cpu+0x0/0xfb
> > > 
> > > -> #0 (cpu_hotplug_lock.rw_sem){++++}:
> > >        check_prev_add+0x430/0x840
> > >        __lock_acquire+0x1420/0x15e0
> > >        lock_acquire+0xb0/0x200
> > >        cpus_read_lock+0x3d/0xb0
> > >        stop_machine+0x1c/0x40
> > >        i915_gem_set_wedged+0x1a/0x20 [i915]
> > >        i915_reset+0xb9/0x230 [i915]
> > >        i915_reset_device+0x1f6/0x260 [i915]
> > >        i915_handle_error+0x2d8/0x430 [i915]
> > >        hangcheck_declare_hang+0xd3/0xf0 [i915]
> > >        i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > >        process_one_work+0x233/0x660
> > >        worker_thread+0x4e/0x3b0
> > >        kthread+0x152/0x190
> > >        ret_from_fork+0x27/0x40
> > > 
> > > other info that might help us debug this:
> > > 
> > > Chain exists of:
> > >   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
> > > 
> > >  Possible unsafe locking scenario:
> > > 
> > >        CPU0                    CPU1
> > >        ----                    ----
> > >   lock(&dev->struct_mutex);
> > >                                lock(&mm->mmap_sem);
> > >                                lock(&dev->struct_mutex);
> > >   lock(cpu_hotplug_lock.rw_sem);
> > > 
> > >  *** DEADLOCK ***
> > > 
> > > 3 locks held by kworker/3:4/562:
> > >  #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > >  #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > >  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > 
> > > stack backtrace:
> > > CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
> > > Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
> > > Workqueue: events_long i915_hangcheck_elapsed [i915]
> > > Call Trace:
> > >  dump_stack+0x68/0x9f
> > >  print_circular_bug+0x235/0x3c0
> > >  ? lockdep_init_map_crosslock+0x20/0x20
> > >  check_prev_add+0x430/0x840
> > >  ? irq_work_queue+0x86/0xe0
> > >  ? wake_up_klogd+0x53/0x70
> > >  __lock_acquire+0x1420/0x15e0
> > >  ? __lock_acquire+0x1420/0x15e0
> > >  ? lockdep_init_map_crosslock+0x20/0x20
> > >  lock_acquire+0xb0/0x200
> > >  ? stop_machine+0x1c/0x40
> > >  ? i915_gem_object_truncate+0x50/0x50 [i915]
> > >  cpus_read_lock+0x3d/0xb0
> > >  ? stop_machine+0x1c/0x40
> > >  stop_machine+0x1c/0x40
> > >  i915_gem_set_wedged+0x1a/0x20 [i915]
> > >  i915_reset+0xb9/0x230 [i915]
> > >  i915_reset_device+0x1f6/0x260 [i915]
> > >  ? gen8_gt_irq_ack+0x170/0x170 [i915]
> > >  ? work_on_cpu_safe+0x60/0x60
> > >  i915_handle_error+0x2d8/0x430 [i915]
> > >  ? vsnprintf+0xd1/0x4b0
> > >  ? scnprintf+0x3a/0x70
> > >  hangcheck_declare_hang+0xd3/0xf0 [i915]
> > >  ? intel_runtime_pm_put+0x56/0xa0 [i915]
> > >  i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > >  process_one_work+0x233/0x660
> > >  worker_thread+0x4e/0x3b0
> > >  kthread+0x152/0x190
> > >  ? process_one_work+0x660/0x660
> > >  ? kthread_create_on_node+0x40/0x40
> > >  ret_from_fork+0x27/0x40
> > > Setting dangerous option reset - tainting kernel
> > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > Setting dangerous option reset - tainting kernel
> > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > 
> > > v2: Have 1 global synchronize_rcu() barrier across all engines, and
> > > improve commit message.
> > > 
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
> > >  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
> > >  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
> > >  3 files changed, 16 insertions(+), 19 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index ab8c6946fea4..e79a6ca60265 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
> > >         intel_engine_init_global_seqno(request->engine, request->global_seqno);
> > >  }
> > >  
> > > -static void engine_set_wedged(struct intel_engine_cs *engine)
> > > +static void engine_complete_requests(struct intel_engine_cs *engine)
> > >  {
> > > -       /* We need to be sure that no thread is running the old callback as
> > > -        * we install the nop handler (otherwise we would submit a request
> > > -        * to hardware that will never complete). In order to prevent this
> > > -        * race, we wait until the machine is idle before making the swap
> > > -        * (using stop_machine()).
> > > -        */
> > > -       engine->submit_request = nop_submit_request;
> > > -
> > >         /* Mark all executing requests as skipped */
> > >         engine->cancel_requests(engine);
> > 
> > How are we planning to serialise the intel_engine_init_global_seqno()
> > here with the in-flight nop_submit? With sufficient thrust we will get a
> > stale breadcrumb and an incomplete request.
> 
> Yeah that part looks indeed fishy. Well the entire "let the nop handler
> fake-complete requests" logic is something I don't really understand. I
> guess there's an exclusive relationship between requests handled directly
> (and cancelled in engine->cancel_request) and requests with external
> dma_fence dependencies.
> 
> But then I'm not really seeing what I'm changing, since even with the stop
> machine you might end up with a bunch of requests depending upon external
> fences, which then all complete at roughly the same time and race multiple
> calls to intel_engine_init_global_seqno with one another.

Ugh, there's another issue. If nop_submit_request is executed before
cancel_requests, we will consider the execution queue as completed and
not in error, i.e. we will not flag those requests with the user-visible
-EIO.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 35+ messages in thread

* ✗ Fi.CI.IGT: warning for series starting with drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock (rev2)
  2017-10-06  9:06 ` Daniel Vetter
                   ` (7 preceding siblings ...)
  (?)
@ 2017-10-06 22:20 ` Patchwork
  -1 siblings, 0 replies; 35+ messages in thread
From: Patchwork @ 2017-10-06 22:20 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: series starting with drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock (rev2)
URL   : https://patchwork.freedesktop.org/series/31476/
State : warning

== Summary ==

Test gem_eio:
        Subgroup in-flight-contexts:
                dmesg-warn -> PASS       (shard-hsw) fdo#102886 +4
Test kms_cursor_crc:
        Subgroup cursor-64x64-sliding:
                pass       -> DMESG-WARN (shard-hsw)
Test prime_mmap:
        Subgroup test_userptr:
                dmesg-warn -> PASS       (shard-hsw) fdo#102939

fdo#102886 https://bugs.freedesktop.org/show_bug.cgi?id=102886
fdo#102939 https://bugs.freedesktop.org/show_bug.cgi?id=102939

shard-hsw        total:2446 pass:1333 dwarn:1   dfail:0   fail:9   skip:1103 time:10142s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5935/shards.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-06 17:29       ` Chris Wilson
@ 2017-10-09  9:12           ` Daniel Vetter
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Vetter @ 2017-10-09  9:12 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development, LKML,
	Mika Kuoppala, Thomas Gleixner, Marta Lofstedt, Daniel Vetter

On Fri, Oct 06, 2017 at 06:29:08PM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-10-06 15:20:09)
> > On Fri, Oct 06, 2017 at 12:03:49PM +0100, Chris Wilson wrote:
> > > Quoting Daniel Vetter (2017-10-06 10:06:37)
> > > > stop_machine is not really a locking primitive we should use, except
> > > > when the hw folks tell us the hw is broken and that's the only way to
> > > > work around it.
> > > > 
> > > > This patch tries to address the locking abuse of stop_machine() from
> > > > 
> > > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Date:   Tue Nov 22 14:41:21 2016 +0000
> > > > 
> > > >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > > > 
> > > > Chris said part of the reason for going with stop_machine() was that
> > > > it has no overhead for the fast-path. But these callbacks use irqsave
> > > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> > > > 
> > > > To stay as close as possible to the stop_machine semantics we first
> > > > update all the submit function pointers to the nop handler, then call
> > > > synchronize_rcu() to make sure no new requests can be submitted. This
> > > > should give us exactly the huge barrier we want.
> > > > 
> > > > I pondered whether we should annotate engine->submit_request as __rcu
> > > > and use rcu_assign_pointer and rcu_dereference on it. But the reason
> > > > behind those is to make sure the compiler/cpu barriers are there for
> > > > when you have an actual data structure you point at, to make sure all
> > > > the writes are seen correctly on the read side. But we just have a
> > > > function pointer, and .text isn't changed, so no need for these
> > > > barriers and hence no need for annotations.
> > > > 
> > > > This should fix the following lockdep splat:
> > > > 
> > > > ======================================================
> > > > WARNING: possible circular locking dependency detected
> > > > 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
> > > > ------------------------------------------------------
> > > > kworker/3:4/562 is trying to acquire lock:
> > > >  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
> > > > 
> > > > but task is already holding lock:
> > > >  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > > 
> > > > which lock already depends on the new lock.
> > > > 
> > > > the existing dependency chain (in reverse order) is:
> > > > 
> > > > -> #6 (&dev->struct_mutex){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __mutex_lock+0x86/0x9b0
> > > >        mutex_lock_interruptible_nested+0x1b/0x20
> > > >        i915_mutex_lock_interruptible+0x51/0x130 [i915]
> > > >        i915_gem_fault+0x209/0x650 [i915]
> > > >        __do_fault+0x1e/0x80
> > > >        __handle_mm_fault+0xa08/0xed0
> > > >        handle_mm_fault+0x156/0x300
> > > >        __do_page_fault+0x2c5/0x570
> > > >        do_page_fault+0x28/0x250
> > > >        page_fault+0x22/0x30
> > > > 
> > > > -> #5 (&mm->mmap_sem){++++}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __might_fault+0x68/0x90
> > > >        _copy_to_user+0x23/0x70
> > > >        filldir+0xa5/0x120
> > > >        dcache_readdir+0xf9/0x170
> > > >        iterate_dir+0x69/0x1a0
> > > >        SyS_getdents+0xa5/0x140
> > > >        entry_SYSCALL_64_fastpath+0x1c/0xb1
> > > > 
> > > > -> #4 (&sb->s_type->i_mutex_key#5){++++}:
> > > >        down_write+0x3b/0x70
> > > >        handle_create+0xcb/0x1e0
> > > >        devtmpfsd+0x139/0x180
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > -> #3 ((complete)&req.done){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        wait_for_common+0x58/0x210
> > > >        wait_for_completion+0x1d/0x20
> > > >        devtmpfs_create_node+0x13d/0x160
> > > >        device_add+0x5eb/0x620
> > > >        device_create_groups_vargs+0xe0/0xf0
> > > >        device_create+0x3a/0x40
> > > >        msr_device_create+0x2b/0x40
> > > >        cpuhp_invoke_callback+0xc9/0xbf0
> > > >        cpuhp_thread_fun+0x17b/0x240
> > > >        smpboot_thread_fn+0x18a/0x280
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > -> #2 (cpuhp_state-up){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        cpuhp_issue_call+0x133/0x1c0
> > > >        __cpuhp_setup_state_cpuslocked+0x139/0x2a0
> > > >        __cpuhp_setup_state+0x46/0x60
> > > >        page_writeback_init+0x43/0x67
> > > >        pagecache_init+0x3d/0x42
> > > >        start_kernel+0x3a8/0x3fc
> > > >        x86_64_start_reservations+0x2a/0x2c
> > > >        x86_64_start_kernel+0x6d/0x70
> > > >        verify_cpu+0x0/0xfb
> > > > 
> > > > -> #1 (cpuhp_state_mutex){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __mutex_lock+0x86/0x9b0
> > > >        mutex_lock_nested+0x1b/0x20
> > > >        __cpuhp_setup_state_cpuslocked+0x53/0x2a0
> > > >        __cpuhp_setup_state+0x46/0x60
> > > >        page_alloc_init+0x28/0x30
> > > >        start_kernel+0x145/0x3fc
> > > >        x86_64_start_reservations+0x2a/0x2c
> > > >        x86_64_start_kernel+0x6d/0x70
> > > >        verify_cpu+0x0/0xfb
> > > > 
> > > > -> #0 (cpu_hotplug_lock.rw_sem){++++}:
> > > >        check_prev_add+0x430/0x840
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        cpus_read_lock+0x3d/0xb0
> > > >        stop_machine+0x1c/0x40
> > > >        i915_gem_set_wedged+0x1a/0x20 [i915]
> > > >        i915_reset+0xb9/0x230 [i915]
> > > >        i915_reset_device+0x1f6/0x260 [i915]
> > > >        i915_handle_error+0x2d8/0x430 [i915]
> > > >        hangcheck_declare_hang+0xd3/0xf0 [i915]
> > > >        i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > > >        process_one_work+0x233/0x660
> > > >        worker_thread+0x4e/0x3b0
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > other info that might help us debug this:
> > > > 
> > > > Chain exists of:
> > > >   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
> > > > 
> > > >  Possible unsafe locking scenario:
> > > > 
> > > >        CPU0                    CPU1
> > > >        ----                    ----
> > > >   lock(&dev->struct_mutex);
> > > >                                lock(&mm->mmap_sem);
> > > >                                lock(&dev->struct_mutex);
> > > >   lock(cpu_hotplug_lock.rw_sem);
> > > > 
> > > >  *** DEADLOCK ***
> > > > 
> > > > 3 locks held by kworker/3:4/562:
> > > >  #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > > >  #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > > >  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > > 
> > > > stack backtrace:
> > > > CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
> > > > Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
> > > > Workqueue: events_long i915_hangcheck_elapsed [i915]
> > > > Call Trace:
> > > >  dump_stack+0x68/0x9f
> > > >  print_circular_bug+0x235/0x3c0
> > > >  ? lockdep_init_map_crosslock+0x20/0x20
> > > >  check_prev_add+0x430/0x840
> > > >  ? irq_work_queue+0x86/0xe0
> > > >  ? wake_up_klogd+0x53/0x70
> > > >  __lock_acquire+0x1420/0x15e0
> > > >  ? __lock_acquire+0x1420/0x15e0
> > > >  ? lockdep_init_map_crosslock+0x20/0x20
> > > >  lock_acquire+0xb0/0x200
> > > >  ? stop_machine+0x1c/0x40
> > > >  ? i915_gem_object_truncate+0x50/0x50 [i915]
> > > >  cpus_read_lock+0x3d/0xb0
> > > >  ? stop_machine+0x1c/0x40
> > > >  stop_machine+0x1c/0x40
> > > >  i915_gem_set_wedged+0x1a/0x20 [i915]
> > > >  i915_reset+0xb9/0x230 [i915]
> > > >  i915_reset_device+0x1f6/0x260 [i915]
> > > >  ? gen8_gt_irq_ack+0x170/0x170 [i915]
> > > >  ? work_on_cpu_safe+0x60/0x60
> > > >  i915_handle_error+0x2d8/0x430 [i915]
> > > >  ? vsnprintf+0xd1/0x4b0
> > > >  ? scnprintf+0x3a/0x70
> > > >  hangcheck_declare_hang+0xd3/0xf0 [i915]
> > > >  ? intel_runtime_pm_put+0x56/0xa0 [i915]
> > > >  i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > > >  process_one_work+0x233/0x660
> > > >  worker_thread+0x4e/0x3b0
> > > >  kthread+0x152/0x190
> > > >  ? process_one_work+0x660/0x660
> > > >  ? kthread_create_on_node+0x40/0x40
> > > >  ret_from_fork+0x27/0x40
> > > > Setting dangerous option reset - tainting kernel
> > > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > > Setting dangerous option reset - tainting kernel
> > > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > > 
> > > > v2: Have 1 global synchronize_rcu() barrier across all engines, and
> > > > improve commit message.
> > > > 
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
> > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
> > > >  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
> > > >  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
> > > >  3 files changed, 16 insertions(+), 19 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index ab8c6946fea4..e79a6ca60265 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
> > > >         intel_engine_init_global_seqno(request->engine, request->global_seqno);
> > > >  }
> > > >  
> > > > -static void engine_set_wedged(struct intel_engine_cs *engine)
> > > > +static void engine_complete_requests(struct intel_engine_cs *engine)
> > > >  {
> > > > -       /* We need to be sure that no thread is running the old callback as
> > > > -        * we install the nop handler (otherwise we would submit a request
> > > > -        * to hardware that will never complete). In order to prevent this
> > > > -        * race, we wait until the machine is idle before making the swap
> > > > -        * (using stop_machine()).
> > > > -        */
> > > > -       engine->submit_request = nop_submit_request;
> > > > -
> > > >         /* Mark all executing requests as skipped */
> > > >         engine->cancel_requests(engine);
> > > 
> > > How are we planning to serialise the intel_engine_init_global_seqno()
> > > here with the in-flight nop_submit? With sufficient thrust we will get a
> > > stale breadcrumb and an incomplete request.
> > 
> > Yeah that part looks indeed fishy. Well the entire "let the nop handler
> > fake-complete requests" logic is something I don't really understand. I
> > guess there's an exclusive relationship between requests handled directly
> > (and cancelled in engine->cancel_request) and requests with external
> > dma_fence dependencies.
> > 
> > But then I'm not really seeing what I'm changing, since even with the stop
> > machine you might end up with a bunch of requests depending upon external
> > fences, which then all complete at roughly the same time and race multiple
> > calls to intel_engine_init_global_seqno with one another.
> 
> The stop_machine serialised the update here with the nop_handlers,
> that's the bit that changes.
>  
> > With the fake submission, do we really need to call intel_engine_init_global_seqno?
> 
> Yes. Completion is still determined by i915_seqno_passed() comparing the
> rq against the engine.
> 
> You need this
> 
> @@ -3246,6 +3246,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>  
>  static void engine_set_wedged(struct intel_engine_cs *engine)
>  {
> +       unsigned long flags;
> +
>         /* We need to be sure that no thread is running the old callback as
>          * we install the nop handler (otherwise we would submit a request
>          * to hardware that will never complete). In order to prevent this
> @@ -3261,8 +3263,10 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>          * (lockless) lookup doesn't try and wait upon the request as we
>          * reset it.
>          */
> +       spin_lock_irqsave(&engine->timeline->lock, flags);
>         intel_engine_init_global_seqno(engine,
>                                        intel_engine_last_submit(engine));
> +       spin_unlock_irqrestore(&engine->timeline->lock, flags);
>  }
>  
> So that the seqno written is ordered with the same spinlock used inside
> the nop submission.

Makes sense, I entirely missed the spinlock on Fri evening. Call me blind
:-)

I'll amend the patch.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
@ 2017-10-09  9:12           ` Daniel Vetter
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Vetter @ 2017-10-09  9:12 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Daniel Vetter, Intel Graphics Development, LKML, Daniel Vetter,
	Thomas Gleixner, Mika Kuoppala

On Fri, Oct 06, 2017 at 06:29:08PM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-10-06 15:20:09)
> > On Fri, Oct 06, 2017 at 12:03:49PM +0100, Chris Wilson wrote:
> > > Quoting Daniel Vetter (2017-10-06 10:06:37)
> > > > stop_machine is not really a locking primitive we should use, except
> > > > when the hw folks tell us the hw is broken and that's the only way to
> > > > work around it.
> > > > 
> > > > This patch tries to address the locking abuse of stop_machine() from
> > > > 
> > > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Date:   Tue Nov 22 14:41:21 2016 +0000
> > > > 
> > > >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > > > 
> > > > Chris said part of the reason for going with stop_machine() was that
> > > > it has no overhead for the fast-path. But these callbacks use irqsave
> > > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> > > > 
> > > > To stay as close as possible to the stop_machine semantics we first
> > > > update all the submit function pointers to the nop handler, then call
> > > > synchronize_rcu() to make sure no new requests can be submitted. This
> > > > should give us exactly the huge barrier we want.
> > > > 
> > > > I pondered whether we should annotate engine->submit_request as __rcu
> > > > and use rcu_assign_pointer and rcu_dereference on it. But the reason
> > > > behind those is to make sure the compiler/cpu barriers are there for
> > > > when you have an actual data structure you point at, to make sure all
> > > > the writes are seen correctly on the read side. But we just have a
> > > > function pointer, and .text isn't changed, so no need for these
> > > > barriers and hence no need for annotations.
> > > > 
> > > > This should fix the following lockdep splat:
> > > > 
> > > > ======================================================
> > > > WARNING: possible circular locking dependency detected
> > > > 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
> > > > ------------------------------------------------------
> > > > kworker/3:4/562 is trying to acquire lock:
> > > >  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
> > > > 
> > > > but task is already holding lock:
> > > >  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > > 
> > > > which lock already depends on the new lock.
> > > > 
> > > > the existing dependency chain (in reverse order) is:
> > > > 
> > > > -> #6 (&dev->struct_mutex){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __mutex_lock+0x86/0x9b0
> > > >        mutex_lock_interruptible_nested+0x1b/0x20
> > > >        i915_mutex_lock_interruptible+0x51/0x130 [i915]
> > > >        i915_gem_fault+0x209/0x650 [i915]
> > > >        __do_fault+0x1e/0x80
> > > >        __handle_mm_fault+0xa08/0xed0
> > > >        handle_mm_fault+0x156/0x300
> > > >        __do_page_fault+0x2c5/0x570
> > > >        do_page_fault+0x28/0x250
> > > >        page_fault+0x22/0x30
> > > > 
> > > > -> #5 (&mm->mmap_sem){++++}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __might_fault+0x68/0x90
> > > >        _copy_to_user+0x23/0x70
> > > >        filldir+0xa5/0x120
> > > >        dcache_readdir+0xf9/0x170
> > > >        iterate_dir+0x69/0x1a0
> > > >        SyS_getdents+0xa5/0x140
> > > >        entry_SYSCALL_64_fastpath+0x1c/0xb1
> > > > 
> > > > -> #4 (&sb->s_type->i_mutex_key#5){++++}:
> > > >        down_write+0x3b/0x70
> > > >        handle_create+0xcb/0x1e0
> > > >        devtmpfsd+0x139/0x180
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > -> #3 ((complete)&req.done){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        wait_for_common+0x58/0x210
> > > >        wait_for_completion+0x1d/0x20
> > > >        devtmpfs_create_node+0x13d/0x160
> > > >        device_add+0x5eb/0x620
> > > >        device_create_groups_vargs+0xe0/0xf0
> > > >        device_create+0x3a/0x40
> > > >        msr_device_create+0x2b/0x40
> > > >        cpuhp_invoke_callback+0xc9/0xbf0
> > > >        cpuhp_thread_fun+0x17b/0x240
> > > >        smpboot_thread_fn+0x18a/0x280
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > -> #2 (cpuhp_state-up){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        cpuhp_issue_call+0x133/0x1c0
> > > >        __cpuhp_setup_state_cpuslocked+0x139/0x2a0
> > > >        __cpuhp_setup_state+0x46/0x60
> > > >        page_writeback_init+0x43/0x67
> > > >        pagecache_init+0x3d/0x42
> > > >        start_kernel+0x3a8/0x3fc
> > > >        x86_64_start_reservations+0x2a/0x2c
> > > >        x86_64_start_kernel+0x6d/0x70
> > > >        verify_cpu+0x0/0xfb
> > > > 
> > > > -> #1 (cpuhp_state_mutex){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __mutex_lock+0x86/0x9b0
> > > >        mutex_lock_nested+0x1b/0x20
> > > >        __cpuhp_setup_state_cpuslocked+0x53/0x2a0
> > > >        __cpuhp_setup_state+0x46/0x60
> > > >        page_alloc_init+0x28/0x30
> > > >        start_kernel+0x145/0x3fc
> > > >        x86_64_start_reservations+0x2a/0x2c
> > > >        x86_64_start_kernel+0x6d/0x70
> > > >        verify_cpu+0x0/0xfb
> > > > 
> > > > -> #0 (cpu_hotplug_lock.rw_sem){++++}:
> > > >        check_prev_add+0x430/0x840
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        cpus_read_lock+0x3d/0xb0
> > > >        stop_machine+0x1c/0x40
> > > >        i915_gem_set_wedged+0x1a/0x20 [i915]
> > > >        i915_reset+0xb9/0x230 [i915]
> > > >        i915_reset_device+0x1f6/0x260 [i915]
> > > >        i915_handle_error+0x2d8/0x430 [i915]
> > > >        hangcheck_declare_hang+0xd3/0xf0 [i915]
> > > >        i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > > >        process_one_work+0x233/0x660
> > > >        worker_thread+0x4e/0x3b0
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > other info that might help us debug this:
> > > > 
> > > > Chain exists of:
> > > >   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
> > > > 
> > > >  Possible unsafe locking scenario:
> > > > 
> > > >        CPU0                    CPU1
> > > >        ----                    ----
> > > >   lock(&dev->struct_mutex);
> > > >                                lock(&mm->mmap_sem);
> > > >                                lock(&dev->struct_mutex);
> > > >   lock(cpu_hotplug_lock.rw_sem);
> > > > 
> > > >  *** DEADLOCK ***
> > > > 
> > > > 3 locks held by kworker/3:4/562:
> > > >  #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > > >  #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > > >  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > > 
> > > > stack backtrace:
> > > > CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
> > > > Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
> > > > Workqueue: events_long i915_hangcheck_elapsed [i915]
> > > > Call Trace:
> > > >  dump_stack+0x68/0x9f
> > > >  print_circular_bug+0x235/0x3c0
> > > >  ? lockdep_init_map_crosslock+0x20/0x20
> > > >  check_prev_add+0x430/0x840
> > > >  ? irq_work_queue+0x86/0xe0
> > > >  ? wake_up_klogd+0x53/0x70
> > > >  __lock_acquire+0x1420/0x15e0
> > > >  ? __lock_acquire+0x1420/0x15e0
> > > >  ? lockdep_init_map_crosslock+0x20/0x20
> > > >  lock_acquire+0xb0/0x200
> > > >  ? stop_machine+0x1c/0x40
> > > >  ? i915_gem_object_truncate+0x50/0x50 [i915]
> > > >  cpus_read_lock+0x3d/0xb0
> > > >  ? stop_machine+0x1c/0x40
> > > >  stop_machine+0x1c/0x40
> > > >  i915_gem_set_wedged+0x1a/0x20 [i915]
> > > >  i915_reset+0xb9/0x230 [i915]
> > > >  i915_reset_device+0x1f6/0x260 [i915]
> > > >  ? gen8_gt_irq_ack+0x170/0x170 [i915]
> > > >  ? work_on_cpu_safe+0x60/0x60
> > > >  i915_handle_error+0x2d8/0x430 [i915]
> > > >  ? vsnprintf+0xd1/0x4b0
> > > >  ? scnprintf+0x3a/0x70
> > > >  hangcheck_declare_hang+0xd3/0xf0 [i915]
> > > >  ? intel_runtime_pm_put+0x56/0xa0 [i915]
> > > >  i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > > >  process_one_work+0x233/0x660
> > > >  worker_thread+0x4e/0x3b0
> > > >  kthread+0x152/0x190
> > > >  ? process_one_work+0x660/0x660
> > > >  ? kthread_create_on_node+0x40/0x40
> > > >  ret_from_fork+0x27/0x40
> > > > Setting dangerous option reset - tainting kernel
> > > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > > Setting dangerous option reset - tainting kernel
> > > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > > 
> > > > v2: Have 1 global synchronize_rcu() barrier across all engines, and
> > > > improve commit message.
> > > > 
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
> > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
> > > >  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
> > > >  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
> > > >  3 files changed, 16 insertions(+), 19 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index ab8c6946fea4..e79a6ca60265 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
> > > >         intel_engine_init_global_seqno(request->engine, request->global_seqno);
> > > >  }
> > > >  
> > > > -static void engine_set_wedged(struct intel_engine_cs *engine)
> > > > +static void engine_complete_requests(struct intel_engine_cs *engine)
> > > >  {
> > > > -       /* We need to be sure that no thread is running the old callback as
> > > > -        * we install the nop handler (otherwise we would submit a request
> > > > -        * to hardware that will never complete). In order to prevent this
> > > > -        * race, we wait until the machine is idle before making the swap
> > > > -        * (using stop_machine()).
> > > > -        */
> > > > -       engine->submit_request = nop_submit_request;
> > > > -
> > > >         /* Mark all executing requests as skipped */
> > > >         engine->cancel_requests(engine);
> > > 
> > > How are we planning to serialise the intel_engine_init_global_seqno()
> > > here with the in-flight nop_submit? With sufficient thrust we will get a
> > > stale breadcrumb and an incomplete request.
> > 
> > Yeah that part looks indeed fishy. Well the entire "let the nop handler
> > fake-complete requests" logic is something I don't really understand. I
> > guess there's an exclusive relationship between requests handled directly
> > (and cancelled in engine->cancel_request) and requests with external
> > dma_fence dependencies.
> > 
> > But then I'm not really seeing what I'm changing, since even with the stop
> > machine you might end up with a bunch of requests depending upon external
> > fences, which then all complete at roughly the same time and race multiple
> > calls to intel_engine_init_global_seqno with one another.
> 
> The stop_machine serialised the update here with the nop_handlers,
> that's the bit that changes.
>  
> > With the fake submission, do we really need to call intel_engine_init_global_seqno?
> 
> Yes. Completion is still determined by i915_seqno_passed() comparing the
> rq against the engine.
> 
> You need this
> 
> @@ -3246,6 +3246,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>  
>  static void engine_set_wedged(struct intel_engine_cs *engine)
>  {
> +       unsigned long flags;
> +
>         /* We need to be sure that no thread is running the old callback as
>          * we install the nop handler (otherwise we would submit a request
>          * to hardware that will never complete). In order to prevent this
> @@ -3261,8 +3263,10 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>          * (lockless) lookup doesn't try and wait upon the request as we
>          * reset it.
>          */
> +       spin_lock_irqsave(&engine->timeline->lock, flags);
>        intel_engine_init_global_seqno(engine,
>                                        intel_engine_last_submit(engine));
> +       spin_unlock_irqrestore(&engine->timeline->lock, flags);
>  }
>  
> So that the seqno written is ordered with the same spinlock used inside
> the nop submission.

Makes sense, I entirely missed the spinlock on Fri evening. Call me blind
:-)

I'll amend the patch.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-06 17:37       ` Chris Wilson
@ 2017-10-09  9:26           ` Daniel Vetter
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Vetter @ 2017-10-09  9:26 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development, LKML,
	Mika Kuoppala, Thomas Gleixner, Marta Lofstedt, Daniel Vetter

On Fri, Oct 06, 2017 at 06:37:52PM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-10-06 15:20:09)
> > On Fri, Oct 06, 2017 at 12:03:49PM +0100, Chris Wilson wrote:
> > > Quoting Daniel Vetter (2017-10-06 10:06:37)
> > > > stop_machine is not really a locking primitive we should use, except
> > > > when the hw folks tell us the hw is broken and that's the only way to
> > > > work around it.
> > > > 
> > > > This patch tries to address the locking abuse of stop_machine() from
> > > > 
> > > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Date:   Tue Nov 22 14:41:21 2016 +0000
> > > > 
> > > >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > > > 
> > > > Chris said parts of the reasons for going with stop_machine() was that
> > > > it's no overhead for the fast-path. But these callbacks use irqsave
> > > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> > > > 
> > > > To stay as close as possible to the stop_machine semantics we first
> > > > update all the submit function pointers to the nop handler, then call
> > > > synchronize_rcu() to make sure no new requests can be submitted. This
> > > > should give us exactly the huge barrier we want.
> > > > 
> > > > I pondered whether we should annotate engine->submit_request as __rcu
> > > > and use rcu_assign_pointer and rcu_dereference on it. But the reason
> > > > behind those is to make sure the compiler/cpu barriers are there for
> > > > when you have an actual data structure you point at, to make sure all
> > > > the writes are seen correctly on the read side. But we just have a
> > > > function pointer, and .text isn't changed, so no need for these
> > > > barriers and hence no need for annotations.
> > > > 
> > > > This should fix the following lockdep splat:
> > > > 
> > > > ======================================================
> > > > WARNING: possible circular locking dependency detected
> > > > 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
> > > > ------------------------------------------------------
> > > > kworker/3:4/562 is trying to acquire lock:
> > > >  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
> > > > 
> > > > but task is already holding lock:
> > > >  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > > 
> > > > which lock already depends on the new lock.
> > > > 
> > > > the existing dependency chain (in reverse order) is:
> > > > 
> > > > -> #6 (&dev->struct_mutex){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __mutex_lock+0x86/0x9b0
> > > >        mutex_lock_interruptible_nested+0x1b/0x20
> > > >        i915_mutex_lock_interruptible+0x51/0x130 [i915]
> > > >        i915_gem_fault+0x209/0x650 [i915]
> > > >        __do_fault+0x1e/0x80
> > > >        __handle_mm_fault+0xa08/0xed0
> > > >        handle_mm_fault+0x156/0x300
> > > >        __do_page_fault+0x2c5/0x570
> > > >        do_page_fault+0x28/0x250
> > > >        page_fault+0x22/0x30
> > > > 
> > > > -> #5 (&mm->mmap_sem){++++}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __might_fault+0x68/0x90
> > > >        _copy_to_user+0x23/0x70
> > > >        filldir+0xa5/0x120
> > > >        dcache_readdir+0xf9/0x170
> > > >        iterate_dir+0x69/0x1a0
> > > >        SyS_getdents+0xa5/0x140
> > > >        entry_SYSCALL_64_fastpath+0x1c/0xb1
> > > > 
> > > > -> #4 (&sb->s_type->i_mutex_key#5){++++}:
> > > >        down_write+0x3b/0x70
> > > >        handle_create+0xcb/0x1e0
> > > >        devtmpfsd+0x139/0x180
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > -> #3 ((complete)&req.done){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        wait_for_common+0x58/0x210
> > > >        wait_for_completion+0x1d/0x20
> > > >        devtmpfs_create_node+0x13d/0x160
> > > >        device_add+0x5eb/0x620
> > > >        device_create_groups_vargs+0xe0/0xf0
> > > >        device_create+0x3a/0x40
> > > >        msr_device_create+0x2b/0x40
> > > >        cpuhp_invoke_callback+0xc9/0xbf0
> > > >        cpuhp_thread_fun+0x17b/0x240
> > > >        smpboot_thread_fn+0x18a/0x280
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > -> #2 (cpuhp_state-up){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        cpuhp_issue_call+0x133/0x1c0
> > > >        __cpuhp_setup_state_cpuslocked+0x139/0x2a0
> > > >        __cpuhp_setup_state+0x46/0x60
> > > >        page_writeback_init+0x43/0x67
> > > >        pagecache_init+0x3d/0x42
> > > >        start_kernel+0x3a8/0x3fc
> > > >        x86_64_start_reservations+0x2a/0x2c
> > > >        x86_64_start_kernel+0x6d/0x70
> > > >        verify_cpu+0x0/0xfb
> > > > 
> > > > -> #1 (cpuhp_state_mutex){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __mutex_lock+0x86/0x9b0
> > > >        mutex_lock_nested+0x1b/0x20
> > > >        __cpuhp_setup_state_cpuslocked+0x53/0x2a0
> > > >        __cpuhp_setup_state+0x46/0x60
> > > >        page_alloc_init+0x28/0x30
> > > >        start_kernel+0x145/0x3fc
> > > >        x86_64_start_reservations+0x2a/0x2c
> > > >        x86_64_start_kernel+0x6d/0x70
> > > >        verify_cpu+0x0/0xfb
> > > > 
> > > > -> #0 (cpu_hotplug_lock.rw_sem){++++}:
> > > >        check_prev_add+0x430/0x840
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        cpus_read_lock+0x3d/0xb0
> > > >        stop_machine+0x1c/0x40
> > > >        i915_gem_set_wedged+0x1a/0x20 [i915]
> > > >        i915_reset+0xb9/0x230 [i915]
> > > >        i915_reset_device+0x1f6/0x260 [i915]
> > > >        i915_handle_error+0x2d8/0x430 [i915]
> > > >        hangcheck_declare_hang+0xd3/0xf0 [i915]
> > > >        i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > > >        process_one_work+0x233/0x660
> > > >        worker_thread+0x4e/0x3b0
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > other info that might help us debug this:
> > > > 
> > > > Chain exists of:
> > > >   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
> > > > 
> > > >  Possible unsafe locking scenario:
> > > > 
> > > >        CPU0                    CPU1
> > > >        ----                    ----
> > > >   lock(&dev->struct_mutex);
> > > >                                lock(&mm->mmap_sem);
> > > >                                lock(&dev->struct_mutex);
> > > >   lock(cpu_hotplug_lock.rw_sem);
> > > > 
> > > >  *** DEADLOCK ***
> > > > 
> > > > 3 locks held by kworker/3:4/562:
> > > >  #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > > >  #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > > >  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > > 
> > > > stack backtrace:
> > > > CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
> > > > Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
> > > > Workqueue: events_long i915_hangcheck_elapsed [i915]
> > > > Call Trace:
> > > >  dump_stack+0x68/0x9f
> > > >  print_circular_bug+0x235/0x3c0
> > > >  ? lockdep_init_map_crosslock+0x20/0x20
> > > >  check_prev_add+0x430/0x840
> > > >  ? irq_work_queue+0x86/0xe0
> > > >  ? wake_up_klogd+0x53/0x70
> > > >  __lock_acquire+0x1420/0x15e0
> > > >  ? __lock_acquire+0x1420/0x15e0
> > > >  ? lockdep_init_map_crosslock+0x20/0x20
> > > >  lock_acquire+0xb0/0x200
> > > >  ? stop_machine+0x1c/0x40
> > > >  ? i915_gem_object_truncate+0x50/0x50 [i915]
> > > >  cpus_read_lock+0x3d/0xb0
> > > >  ? stop_machine+0x1c/0x40
> > > >  stop_machine+0x1c/0x40
> > > >  i915_gem_set_wedged+0x1a/0x20 [i915]
> > > >  i915_reset+0xb9/0x230 [i915]
> > > >  i915_reset_device+0x1f6/0x260 [i915]
> > > >  ? gen8_gt_irq_ack+0x170/0x170 [i915]
> > > >  ? work_on_cpu_safe+0x60/0x60
> > > >  i915_handle_error+0x2d8/0x430 [i915]
> > > >  ? vsnprintf+0xd1/0x4b0
> > > >  ? scnprintf+0x3a/0x70
> > > >  hangcheck_declare_hang+0xd3/0xf0 [i915]
> > > >  ? intel_runtime_pm_put+0x56/0xa0 [i915]
> > > >  i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > > >  process_one_work+0x233/0x660
> > > >  worker_thread+0x4e/0x3b0
> > > >  kthread+0x152/0x190
> > > >  ? process_one_work+0x660/0x660
> > > >  ? kthread_create_on_node+0x40/0x40
> > > >  ret_from_fork+0x27/0x40
> > > > Setting dangerous option reset - tainting kernel
> > > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > > Setting dangerous option reset - tainting kernel
> > > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > > 
> > > > v2: Have 1 global synchronize_rcu() barrier across all engines, and
> > > > improve commit message.
> > > > 
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
> > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
> > > >  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
> > > >  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
> > > >  3 files changed, 16 insertions(+), 19 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index ab8c6946fea4..e79a6ca60265 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
> > > >         intel_engine_init_global_seqno(request->engine, request->global_seqno);
> > > >  }
> > > >  
> > > > -static void engine_set_wedged(struct intel_engine_cs *engine)
> > > > +static void engine_complete_requests(struct intel_engine_cs *engine)
> > > >  {
> > > > -       /* We need to be sure that no thread is running the old callback as
> > > > -        * we install the nop handler (otherwise we would submit a request
> > > > -        * to hardware that will never complete). In order to prevent this
> > > > -        * race, we wait until the machine is idle before making the swap
> > > > -        * (using stop_machine()).
> > > > -        */
> > > > -       engine->submit_request = nop_submit_request;
> > > > -
> > > >         /* Mark all executing requests as skipped */
> > > >         engine->cancel_requests(engine);
> > > 
> > > How are we planning to serialise the intel_engine_init_global_seqno()
> > > here with the in-flight nop_submit? With sufficient thrust we will get a
> > > stale breadcrumb and an incomplete request.
> > 
> > Yeah that part looks indeed fishy. Well the entire "let the nop handler
> > fake-complete requests" logic is something I don't really understand. I
> > guess there's an exclusive relationship between requests handled directly
> > (and cancelled in engine->cancel_request) and requests with external
> > dma_fence dependencies.
> > 
> > But then I'm not really seeing what I'm changing, since even with the stop
> > machine you might end up with a bunch of requests depending upon external
> > fences, which then all complete at roughly the same time and race multiple
> > calls to intel_engine_init_global_seqno with one another.
> 
> Ugh, there's another issue. If nop_submit_request is executed before
> cancel_requests, we will consider the execution queue as completed and
> not in error, i.e. we will not flag those requests with the user visible
> -EIO.

Hm, that one's a bit of a mess. What about the following sequence:

1. We set all submit_request functions to a nop handler which does _not_
update the global seqno.

2. synchronize_rcu()

3. ->cancel_requests.

4. We set all submit_request to the current nop submit, i.e. including
pushing the global seqno forward to complete them all. Maybe rename that
one to nop_submit_complete_requests or something like that.

5. synchronize_rcu()

6. Call the spin-locked global seqno init that's currently in complete
requests.

1/3/4/6 would all be wrapped in for_each_engine ofc.
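Roughly, in kernel-style pseudo-code (a sketch only: names like
nop_complete_submit_request are placeholders for whatever we end up calling
them, and the locking details are just the hunk Chris posted above, not the
actual patch):

```c
/* Pseudo-code sketch of the proposed i915_gem_set_wedged() flow.
 * Names and details are illustrative, not the upstream implementation.
 */
void i915_gem_set_wedged(struct drm_i915_private *i915)
{
	struct intel_engine_cs *engine;
	enum intel_engine_id id;
	unsigned long flags;

	/* 1. Install a nop handler that does _not_ advance the global
	 * seqno, so nothing can fake-complete before cancel_requests
	 * has run. */
	for_each_engine(engine, i915, id)
		engine->submit_request = nop_submit_request;

	/* 2. Grace period: after this, no CPU can still be running the
	 * old submit_request callback. */
	synchronize_rcu();

	/* 3. Mark all queued requests as skipped, i.e. -EIO. */
	for_each_engine(engine, i915, id)
		engine->cancel_requests(engine);

	/* 4. Switch to a second nop handler that also completes
	 * requests by pushing the global seqno forward. */
	for_each_engine(engine, i915, id)
		engine->submit_request = nop_complete_submit_request;

	/* 5. Second barrier so step 6 cannot race with the phase-1
	 * nop handler. */
	synchronize_rcu();

	/* 6. Fake-complete everything, with the seqno write ordered
	 * against the nop submissions by the timeline lock. */
	for_each_engine(engine, i915, id) {
		spin_lock_irqsave(&engine->timeline->lock, flags);
		intel_engine_init_global_seqno(engine,
					       intel_engine_last_submit(engine));
		spin_unlock_irqrestore(&engine->timeline->lock, flags);
	}
}
```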

I think that would also clarify a bit why we have to move the global_seqno
in the nop_submit_request function.

Definitely needs lots of comments to explain what's going on.

Thoughts?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
@ 2017-10-09  9:26           ` Daniel Vetter
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Vetter @ 2017-10-09  9:26 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Daniel Vetter, Intel Graphics Development, LKML, Daniel Vetter,
	Thomas Gleixner, Mika Kuoppala

On Fri, Oct 06, 2017 at 06:37:52PM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-10-06 15:20:09)
> > On Fri, Oct 06, 2017 at 12:03:49PM +0100, Chris Wilson wrote:
> > > Quoting Daniel Vetter (2017-10-06 10:06:37)
> > > > stop_machine is not really a locking primitive we should use, except
> > > > when the hw folks tell us the hw is broken and that's the only way to
> > > > work around it.
> > > > 
> > > > This patch tries to address the locking abuse of stop_machine() from
> > > > 
> > > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Date:   Tue Nov 22 14:41:21 2016 +0000
> > > > 
> > > >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > > > 
> > > > Chris said parts of the reasons for going with stop_machine() was that
> > > > it's no overhead for the fast-path. But these callbacks use irqsave
> > > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> > > > 
> > > > To stay as close as possible to the stop_machine semantics we first
> > > > update all the submit function pointers to the nop handler, then call
> > > > synchronize_rcu() to make sure no new requests can be submitted. This
> > > > should give us exactly the huge barrier we want.
> > > > 
> > > > I pondered whether we should annotate engine->submit_request as __rcu
> > > > and use rcu_assign_pointer and rcu_dereference on it. But the reason
> > > > behind those is to make sure the compiler/cpu barriers are there for
> > > > when you have an actual data structure you point at, to make sure all
> > > > the writes are seen correctly on the read side. But we just have a
> > > > function pointer, and .text isn't changed, so no need for these
> > > > barriers and hence no need for annotations.
> > > > 
> > > > This should fix the followwing lockdep splat:
> > > > 
> > > > ======================================================
> > > > WARNING: possible circular locking dependency detected
> > > > 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
> > > > ------------------------------------------------------
> > > > kworker/3:4/562 is trying to acquire lock:
> > > >  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
> > > > 
> > > > but task is already holding lock:
> > > >  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > > 
> > > > which lock already depends on the new lock.
> > > > 
> > > > the existing dependency chain (in reverse order) is:
> > > > 
> > > > -> #6 (&dev->struct_mutex){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __mutex_lock+0x86/0x9b0
> > > >        mutex_lock_interruptible_nested+0x1b/0x20
> > > >        i915_mutex_lock_interruptible+0x51/0x130 [i915]
> > > >        i915_gem_fault+0x209/0x650 [i915]
> > > >        __do_fault+0x1e/0x80
> > > >        __handle_mm_fault+0xa08/0xed0
> > > >        handle_mm_fault+0x156/0x300
> > > >        __do_page_fault+0x2c5/0x570
> > > >        do_page_fault+0x28/0x250
> > > >        page_fault+0x22/0x30
> > > > 
> > > > -> #5 (&mm->mmap_sem){++++}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __might_fault+0x68/0x90
> > > >        _copy_to_user+0x23/0x70
> > > >        filldir+0xa5/0x120
> > > >        dcache_readdir+0xf9/0x170
> > > >        iterate_dir+0x69/0x1a0
> > > >        SyS_getdents+0xa5/0x140
> > > >        entry_SYSCALL_64_fastpath+0x1c/0xb1
> > > > 
> > > > -> #4 (&sb->s_type->i_mutex_key#5){++++}:
> > > >        down_write+0x3b/0x70
> > > >        handle_create+0xcb/0x1e0
> > > >        devtmpfsd+0x139/0x180
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > -> #3 ((complete)&req.done){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        wait_for_common+0x58/0x210
> > > >        wait_for_completion+0x1d/0x20
> > > >        devtmpfs_create_node+0x13d/0x160
> > > >        device_add+0x5eb/0x620
> > > >        device_create_groups_vargs+0xe0/0xf0
> > > >        device_create+0x3a/0x40
> > > >        msr_device_create+0x2b/0x40
> > > >        cpuhp_invoke_callback+0xc9/0xbf0
> > > >        cpuhp_thread_fun+0x17b/0x240
> > > >        smpboot_thread_fn+0x18a/0x280
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > -> #2 (cpuhp_state-up){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        cpuhp_issue_call+0x133/0x1c0
> > > >        __cpuhp_setup_state_cpuslocked+0x139/0x2a0
> > > >        __cpuhp_setup_state+0x46/0x60
> > > >        page_writeback_init+0x43/0x67
> > > >        pagecache_init+0x3d/0x42
> > > >        start_kernel+0x3a8/0x3fc
> > > >        x86_64_start_reservations+0x2a/0x2c
> > > >        x86_64_start_kernel+0x6d/0x70
> > > >        verify_cpu+0x0/0xfb
> > > > 
> > > > -> #1 (cpuhp_state_mutex){+.+.}:
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        __mutex_lock+0x86/0x9b0
> > > >        mutex_lock_nested+0x1b/0x20
> > > >        __cpuhp_setup_state_cpuslocked+0x53/0x2a0
> > > >        __cpuhp_setup_state+0x46/0x60
> > > >        page_alloc_init+0x28/0x30
> > > >        start_kernel+0x145/0x3fc
> > > >        x86_64_start_reservations+0x2a/0x2c
> > > >        x86_64_start_kernel+0x6d/0x70
> > > >        verify_cpu+0x0/0xfb
> > > > 
> > > > -> #0 (cpu_hotplug_lock.rw_sem){++++}:
> > > >        check_prev_add+0x430/0x840
> > > >        __lock_acquire+0x1420/0x15e0
> > > >        lock_acquire+0xb0/0x200
> > > >        cpus_read_lock+0x3d/0xb0
> > > >        stop_machine+0x1c/0x40
> > > >        i915_gem_set_wedged+0x1a/0x20 [i915]
> > > >        i915_reset+0xb9/0x230 [i915]
> > > >        i915_reset_device+0x1f6/0x260 [i915]
> > > >        i915_handle_error+0x2d8/0x430 [i915]
> > > >        hangcheck_declare_hang+0xd3/0xf0 [i915]
> > > >        i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > > >        process_one_work+0x233/0x660
> > > >        worker_thread+0x4e/0x3b0
> > > >        kthread+0x152/0x190
> > > >        ret_from_fork+0x27/0x40
> > > > 
> > > > other info that might help us debug this:
> > > > 
> > > > Chain exists of:
> > > >   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
> > > > 
> > > >  Possible unsafe locking scenario:
> > > > 
> > > >        CPU0                    CPU1
> > > >        ----                    ----
> > > >   lock(&dev->struct_mutex);
> > > >                                lock(&mm->mmap_sem);
> > > >                                lock(&dev->struct_mutex);
> > > >   lock(cpu_hotplug_lock.rw_sem);
> > > > 
> > > >  *** DEADLOCK ***
> > > > 
> > > > 3 locks held by kworker/3:4/562:
> > > >  #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > > >  #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
> > > >  #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
> > > > 
> > > > stack backtrace:
> > > > CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
> > > > Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
> > > > Workqueue: events_long i915_hangcheck_elapsed [i915]
> > > > Call Trace:
> > > >  dump_stack+0x68/0x9f
> > > >  print_circular_bug+0x235/0x3c0
> > > >  ? lockdep_init_map_crosslock+0x20/0x20
> > > >  check_prev_add+0x430/0x840
> > > >  ? irq_work_queue+0x86/0xe0
> > > >  ? wake_up_klogd+0x53/0x70
> > > >  __lock_acquire+0x1420/0x15e0
> > > >  ? __lock_acquire+0x1420/0x15e0
> > > >  ? lockdep_init_map_crosslock+0x20/0x20
> > > >  lock_acquire+0xb0/0x200
> > > >  ? stop_machine+0x1c/0x40
> > > >  ? i915_gem_object_truncate+0x50/0x50 [i915]
> > > >  cpus_read_lock+0x3d/0xb0
> > > >  ? stop_machine+0x1c/0x40
> > > >  stop_machine+0x1c/0x40
> > > >  i915_gem_set_wedged+0x1a/0x20 [i915]
> > > >  i915_reset+0xb9/0x230 [i915]
> > > >  i915_reset_device+0x1f6/0x260 [i915]
> > > >  ? gen8_gt_irq_ack+0x170/0x170 [i915]
> > > >  ? work_on_cpu_safe+0x60/0x60
> > > >  i915_handle_error+0x2d8/0x430 [i915]
> > > >  ? vsnprintf+0xd1/0x4b0
> > > >  ? scnprintf+0x3a/0x70
> > > >  hangcheck_declare_hang+0xd3/0xf0 [i915]
> > > >  ? intel_runtime_pm_put+0x56/0xa0 [i915]
> > > >  i915_hangcheck_elapsed+0x262/0x2d0 [i915]
> > > >  process_one_work+0x233/0x660
> > > >  worker_thread+0x4e/0x3b0
> > > >  kthread+0x152/0x190
> > > >  ? process_one_work+0x660/0x660
> > > >  ? kthread_create_on_node+0x40/0x40
> > > >  ret_from_fork+0x27/0x40
> > > > Setting dangerous option reset - tainting kernel
> > > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > > Setting dangerous option reset - tainting kernel
> > > > i915 0000:00:02.0: Resetting chip after gpu hang
> > > > 
> > > > v2: Have 1 global synchronize_rcu() barrier across all engines, and
> > > > improve commit message.
> > > > 
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
> > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
> > > >  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
> > > >  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
> > > >  3 files changed, 16 insertions(+), 19 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index ab8c6946fea4..e79a6ca60265 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
> > > >         intel_engine_init_global_seqno(request->engine, request->global_seqno);
> > > >  }
> > > >  
> > > > -static void engine_set_wedged(struct intel_engine_cs *engine)
> > > > +static void engine_complete_requests(struct intel_engine_cs *engine)
> > > >  {
> > > > -       /* We need to be sure that no thread is running the old callback as
> > > > -        * we install the nop handler (otherwise we would submit a request
> > > > -        * to hardware that will never complete). In order to prevent this
> > > > -        * race, we wait until the machine is idle before making the swap
> > > > -        * (using stop_machine()).
> > > > -        */
> > > > -       engine->submit_request = nop_submit_request;
> > > > -
> > > >         /* Mark all executing requests as skipped */
> > > >         engine->cancel_requests(engine);
> > > 
> > > How are we planning to serialise the intel_engine_init_global_seqno()
> > > here with the in-flight nop_submit? With sufficient thrust we will get a
> > > stale breadcrumb and an incomplete request.
> > 
> > Yeah that part looks indeed fishy. Well the entire "let the nop handler
> > fake-complete requests" logic is something I don't really understand. I
> > guess there's an exclusive relationship between requests handled directly
> > (and cancelled in engine->cancel_requests) and requests with external
> > dma_fence dependencies.
> > 
> > But then I'm not really seeing what I'm changing, since even with the stop
> > machine you might end up with a bunch of requests depending upon external
> > fences, which then all complete at roughly the same time and race multiple
> > calls to intel_engine_init_global_seqno with each other.
> 
> Ugh, there's another issue. If nop_submit_request is executed before
> cancel_requests, we will consider the execution queue as completed and
> not in error, i.e. we will not flag those requests with the user visible
> -EIO.

Hm, that one's a bit of a mess. What about the following sequence:

1. We set all submit_request functions to a nop request which does _not_
update the global seqno.

2. synchronize_rcu()

3. ->cancel_requests.

4. We set all submit_request to the current nop submit, i.e. including
pushing the global seqno forward to complete them all. Maybe rename that
one to nop_submit_complete_requests or something like that.

5. synchronize_rcu()

6. Call the spin-locked global seqno init that's currently in complete
requests.

1/3/4/6 would all be wrapped in for_each_engine ofc.

I think that would also clarify a bit why we have to move the global_seqno
update into the nop_submit_request function.

Definitely needs lots of comments to explain what's going on.

Thoughts?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
