* [PATCH 1/2] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock
@ 2017-10-09 16:44 Daniel Vetter
  2017-10-09 16:44 ` [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged Daniel Vetter
  ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Daniel Vetter @ 2017-10-09 16:44 UTC (permalink / raw)
To: Intel Graphics Development
Cc: Peter Zijlstra, Daniel Vetter, Tejun Heo, Daniel Vetter, Thomas Gleixner, Sasha Levin

4.14-rc1 gained the fancy new cross-release support in lockdep, which
seems to have uncovered a few more rules about what is and isn't
allowed. This one here seems to indicate that allocating a workqueue
while holding mmap_sem is a no-go, so let's try to preallocate it.

Of course another way to break this chain would be somewhere in the
cpu hotplug code, since this isn't the only trace we're finding now
which goes through msr_create_device.

Full lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc1-CI-CI_DRM_3118+ #1 Tainted: G U
------------------------------------------------------
prime_mmap/1551 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109dbb7>] apply_workqueue_attrs+0x17/0x50

but task is already holding lock:
 (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #6 (&dev_priv->mm_lock){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
       i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
       drm_ioctl_kernel+0x69/0xb0
       drm_ioctl+0x2f9/0x3d0
       do_vfs_ioctl+0x94/0x670
       SyS_ioctl+0x41/0x70
       entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #5 (&mm->mmap_sem){++++}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __might_fault+0x68/0x90
       _copy_to_user+0x23/0x70
       filldir+0xa5/0x120
       dcache_readdir+0xf9/0x170
       iterate_dir+0x69/0x1a0
       SyS_getdents+0xa5/0x140
       entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #4 (&sb->s_type->i_mutex_key#5){++++}:
       down_write+0x3b/0x70
       handle_create+0xcb/0x1e0
       devtmpfsd+0x139/0x180
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #3 ((complete)&req.done){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       wait_for_common+0x58/0x210
       wait_for_completion+0x1d/0x20
       devtmpfs_create_node+0x13d/0x160
       device_add+0x5eb/0x620
       device_create_groups_vargs+0xe0/0xf0
       device_create+0x3a/0x40
       msr_device_create+0x2b/0x40
       cpuhp_invoke_callback+0xa3/0x840
       cpuhp_thread_fun+0x7a/0x150
       smpboot_thread_fn+0x18a/0x280
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #2 (cpuhp_state){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpuhp_issue_call+0x10b/0x170
       __cpuhp_setup_state_cpuslocked+0x134/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_writeback_init+0x43/0x67
       pagecache_init+0x3d/0x42
       start_kernel+0x3a8/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #1 (cpuhp_state_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       __cpuhp_setup_state_cpuslocked+0x52/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_alloc_init+0x28/0x30
       start_kernel+0x145/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #0 (cpu_hotplug_lock.rw_sem){++++}:
       check_prev_add+0x430/0x840
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpus_read_lock+0x3d/0xb0
       apply_workqueue_attrs+0x17/0x50
       __alloc_workqueue_key+0x1d8/0x4d9
       i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
       i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
       drm_ioctl_kernel+0x69/0xb0
       drm_ioctl+0x2f9/0x3d0
       do_vfs_ioctl+0x94/0x670
       SyS_ioctl+0x41/0x70
       entry_SYSCALL_64_fastpath+0x1c/0xb1

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev_priv->mm_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev_priv->mm_lock);
                               lock(&mm->mmap_sem);
                               lock(&dev_priv->mm_lock);
  lock(cpu_hotplug_lock.rw_sem);

 *** DEADLOCK ***

2 locks held by prime_mmap/1551:
 #0:  (&mm->mmap_sem){++++}, at: [<ffffffffa01a7b18>] i915_gem_userptr_init__mmu_notifier+0x138/0x270 [i915]
 #1:  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]

stack backtrace:
CPU: 4 PID: 1551 Comm: prime_mmap Tainted: G U 4.14.0-rc1-CI-CI_DRM_3118+ #1
Hardware name: Dell Inc. XPS 8300 /0Y2MRG, BIOS A06 10/17/2011
Call Trace:
 dump_stack+0x68/0x9f
 print_circular_bug+0x235/0x3c0
 ? lockdep_init_map_crosslock+0x20/0x20
 check_prev_add+0x430/0x840
 __lock_acquire+0x1420/0x15e0
 ? __lock_acquire+0x1420/0x15e0
 ? lockdep_init_map_crosslock+0x20/0x20
 lock_acquire+0xb0/0x200
 ? apply_workqueue_attrs+0x17/0x50
 cpus_read_lock+0x3d/0xb0
 ? apply_workqueue_attrs+0x17/0x50
 apply_workqueue_attrs+0x17/0x50
 __alloc_workqueue_key+0x1d8/0x4d9
 ? __lockdep_init_map+0x57/0x1c0
 i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
 i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
 ? i915_gem_userptr_release+0x140/0x140 [i915]
 drm_ioctl_kernel+0x69/0xb0
 drm_ioctl+0x2f9/0x3d0
 ? i915_gem_userptr_release+0x140/0x140 [i915]
 ? __do_page_fault+0x2a4/0x570
 do_vfs_ioctl+0x94/0x670
 ? entry_SYSCALL_64_fastpath+0x5/0xb1
 ? __this_cpu_preempt_check+0x13/0x20
 ? trace_hardirqs_on_caller+0xe3/0x1b0
 SyS_ioctl+0x41/0x70
 entry_SYSCALL_64_fastpath+0x1c/0xb1
RIP: 0033:0x7fbb83c39587
RSP: 002b:00007fff188dc228 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: ffffffff81492963 RCX: 00007fbb83c39587
RDX: 00007fff188dc260 RSI: 00000000c0186473 RDI: 0000000000000003
RBP: ffffc90001487f88 R08: 0000000000000000 R09: 00007fff188dc2ac
R10: 00007fbb83efcb58 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000003 R14: 00000000c0186473 R15: 00007fff188dc2ac
 ? __this_cpu_preempt_check+0x13/0x20

Note that this also has the minor benefit of slightly reducing the
critical section where we hold mmap_sem.

v2: Set ret correctly when we raced with another thread.

v3: Use Chris' diff. Attach the right lockdep splat.

v4: Repaint in Tvrtko's colors (aka don't report ENOMEM if we race and
some other thread avoided the ENOMEM and successfully installed the
mmu notifier. Note that the kernel guarantees that small allocations
succeed, so this never actually happens).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sasha Levin <alexander.levin@verizon.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Cc: Tejun Heo <tj@kernel.org>
References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 38 ++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 41e16e19c3f3..65fbf334508e 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -164,7 +164,6 @@ static struct i915_mmu_notifier *
 i915_mmu_notifier_create(struct mm_struct *mm)
 {
 	struct i915_mmu_notifier *mn;
-	int ret;
 
 	mn = kmalloc(sizeof(*mn), GFP_KERNEL);
 	if (mn == NULL)
@@ -179,14 +178,6 @@ i915_mmu_notifier_create(struct mm_struct *mm)
 		return ERR_PTR(-ENOMEM);
 	}
 
-	/* Protected by mmap_sem (write-lock) */
-	ret = __mmu_notifier_register(&mn->mn, mm);
-	if (ret) {
-		destroy_workqueue(mn->wq);
-		kfree(mn);
-		return ERR_PTR(ret);
-	}
-
 	return mn;
 }
 
@@ -210,23 +201,40 @@ i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
 static struct i915_mmu_notifier *
 i915_mmu_notifier_find(struct i915_mm_struct *mm)
 {
-	struct i915_mmu_notifier *mn = mm->mn;
+	struct i915_mmu_notifier *mn;
+	int err = 0;
 
 	mn = mm->mn;
 	if (mn)
 		return mn;
 
+	mn = i915_mmu_notifier_create(mm->mm);
+	if (IS_ERR(mn))
+		err = PTR_ERR(mn);
+
 	down_write(&mm->mm->mmap_sem);
 	mutex_lock(&mm->i915->mm_lock);
-	if ((mn = mm->mn) == NULL) {
-		mn = i915_mmu_notifier_create(mm->mm);
-		if (!IS_ERR(mn))
-			mm->mn = mn;
+	if (mm->mn == NULL && !err) {
+		/* Protected by mmap_sem (write-lock) */
+		err = __mmu_notifier_register(&mn->mn, mm->mm);
+		if (!err) {
+			/* Protected by mm_lock */
+			mm->mn = fetch_and_zero(&mn);
+		}
+	} else {
+		/* someone else raced and successfully installed the mmu
+		 * notifier, we can cancel our own errors */
+		err = 0;
 	}
 	mutex_unlock(&mm->i915->mm_lock);
 	up_write(&mm->mm->mmap_sem);
 
-	return mn;
+	if (mn) {
+		destroy_workqueue(mn->wq);
+		kfree(mn);
+	}
+
+	return err ? ERR_PTR(err) : mm->mn;
 }
 
 static int
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread
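[Editorial aside: the fix pattern in this patch — do the expensive allocation before taking any locks, install the result under the lock, and throw away our copy if another thread won the race — is generic. Below is a minimal user-space sketch of that pattern; all names are invented for illustration, `notifier_find` only loosely mirrors `i915_mmu_notifier_find`, a plain pthread mutex stands in for `mm_lock`, and the mmap_sem/`__mmu_notifier_register` step is omitted.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the i915 structure whose creation is
 * expensive (in the real driver it allocates a workqueue, which must
 * not happen while holding mmap_sem). */
struct notifier { int dummy; };

static pthread_mutex_t mm_lock = PTHREAD_MUTEX_INITIALIZER;
static struct notifier *installed;

static struct notifier *notifier_create(void)
{
	return calloc(1, sizeof(struct notifier));
}

/* Allocate first, outside any lock; only the cheap pointer install
 * happens under mm_lock.  If another thread won the race, discard our
 * copy and return the winner's, mirroring the v4 behaviour of not
 * reporting a spurious error after losing the race. */
static struct notifier *notifier_find(void)
{
	struct notifier *mn = notifier_create();	/* may be NULL */
	struct notifier *ret;

	pthread_mutex_lock(&mm_lock);
	if (!installed && mn) {
		installed = mn;
		mn = NULL;		/* ownership transferred */
	}
	ret = installed;
	pthread_mutex_unlock(&mm_lock);

	free(mn);			/* lost the race, or alloc failed */
	return ret;			/* NULL only if nobody could allocate */
}
```

The key property, as in the patch, is that no allocator (and hence no lock chain through cpu_hotplug_lock) runs inside the critical section; the lock only guards a pointer compare-and-install.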
* [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-09 16:44 [PATCH 1/2] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock Daniel Vetter
@ 2017-10-09 16:44 ` Daniel Vetter
  2017-10-10 9:21   ` Chris Wilson
  ` (2 more replies)
  2017-10-09 18:42 ` ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915: Preallocate our mmu notifier workqueue to unbreak cpu hotplug deadlock Patchwork
  ` (2 subsequent siblings)
  3 siblings, 3 replies; 12+ messages in thread
From: Daniel Vetter @ 2017-10-09 16:44 UTC (permalink / raw)
To: Intel Graphics Development
Cc: Daniel Vetter, Daniel Vetter, Thomas Gleixner, Mika Kuoppala

stop_machine is not really a locking primitive we should use, except
when the hw folks tell us the hw is broken and that's the only way to
work around it.

This patch tries to address the locking abuse of stop_machine() from

commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 22 14:41:21 2016 +0000

    drm/i915: Stop the machine as we install the wedged submit_request handler

Chris said part of the reason for going with stop_machine() was that
it adds no overhead to the fast-path. But these callbacks use irqsave
spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.

To stay as close as possible to the stop_machine semantics we first
update all the submit function pointers to the nop handler, then call
synchronize_rcu() to make sure no new requests can be submitted. This
should give us exactly the huge barrier we want.

I pondered whether we should annotate engine->submit_request as __rcu
and use rcu_assign_pointer and rcu_dereference on it. But the reason
behind those is to make sure the compiler/cpu barriers are there for
when you have an actual data structure you point at, to make sure all
the writes are seen correctly on the read side.
But we just have a function pointer, and .text isn't changed, so no
need for these barriers and hence no need for annotations.

Unfortunately there's a complication with the call to
intel_engine_init_global_seqno:

- Without stop_machine we must hold the corresponding spinlock.

- Without stop_machine we must ensure that all requests are marked as
  having failed with dma_fence_set_error() before we call it. That
  means we need to split the nop request submission into two phases,
  both synchronized with rcu:

  1. Only stop submitting the requests to hw and mark them as failed.

  2. After all pending requests in the scheduler/ring are suitably
     marked up as failed and we can force complete them all, also
     force complete by calling intel_engine_init_global_seqno().

This should fix the following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G U
------------------------------------------------------
kworker/3:4/562 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40

but task is already holding lock:
 (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #6 (&dev->struct_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_interruptible_nested+0x1b/0x20
       i915_mutex_lock_interruptible+0x51/0x130 [i915]
       i915_gem_fault+0x209/0x650 [i915]
       __do_fault+0x1e/0x80
       __handle_mm_fault+0xa08/0xed0
       handle_mm_fault+0x156/0x300
       __do_page_fault+0x2c5/0x570
       do_page_fault+0x28/0x250
       page_fault+0x22/0x30

-> #5 (&mm->mmap_sem){++++}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __might_fault+0x68/0x90
       _copy_to_user+0x23/0x70
       filldir+0xa5/0x120
       dcache_readdir+0xf9/0x170
       iterate_dir+0x69/0x1a0
       SyS_getdents+0xa5/0x140
       entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #4 (&sb->s_type->i_mutex_key#5){++++}:
       down_write+0x3b/0x70
       handle_create+0xcb/0x1e0
       devtmpfsd+0x139/0x180
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #3 ((complete)&req.done){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       wait_for_common+0x58/0x210
       wait_for_completion+0x1d/0x20
       devtmpfs_create_node+0x13d/0x160
       device_add+0x5eb/0x620
       device_create_groups_vargs+0xe0/0xf0
       device_create+0x3a/0x40
       msr_device_create+0x2b/0x40
       cpuhp_invoke_callback+0xc9/0xbf0
       cpuhp_thread_fun+0x17b/0x240
       smpboot_thread_fn+0x18a/0x280
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

-> #2 (cpuhp_state-up){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpuhp_issue_call+0x133/0x1c0
       __cpuhp_setup_state_cpuslocked+0x139/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_writeback_init+0x43/0x67
       pagecache_init+0x3d/0x42
       start_kernel+0x3a8/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #1 (cpuhp_state_mutex){+.+.}:
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       __mutex_lock+0x86/0x9b0
       mutex_lock_nested+0x1b/0x20
       __cpuhp_setup_state_cpuslocked+0x53/0x2a0
       __cpuhp_setup_state+0x46/0x60
       page_alloc_init+0x28/0x30
       start_kernel+0x145/0x3fc
       x86_64_start_reservations+0x2a/0x2c
       x86_64_start_kernel+0x6d/0x70
       verify_cpu+0x0/0xfb

-> #0 (cpu_hotplug_lock.rw_sem){++++}:
       check_prev_add+0x430/0x840
       __lock_acquire+0x1420/0x15e0
       lock_acquire+0xb0/0x200
       cpus_read_lock+0x3d/0xb0
       stop_machine+0x1c/0x40
       i915_gem_set_wedged+0x1a/0x20 [i915]
       i915_reset+0xb9/0x230 [i915]
       i915_reset_device+0x1f6/0x260 [i915]
       i915_handle_error+0x2d8/0x430 [i915]
       hangcheck_declare_hang+0xd3/0xf0 [i915]
       i915_hangcheck_elapsed+0x262/0x2d0 [i915]
       process_one_work+0x233/0x660
       worker_thread+0x4e/0x3b0
       kthread+0x152/0x190
       ret_from_fork+0x27/0x40

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev->struct_mutex);
                               lock(&mm->mmap_sem);
                               lock(&dev->struct_mutex);
  lock(cpu_hotplug_lock.rw_sem);

 *** DEADLOCK ***

3 locks held by kworker/3:4/562:
 #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
 #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
 #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]

stack backtrace:
CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G U 4.14.0-rc3-CI-CI_DRM_3179+ #1
Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
Workqueue: events_long i915_hangcheck_elapsed [i915]
Call Trace:
 dump_stack+0x68/0x9f
 print_circular_bug+0x235/0x3c0
 ? lockdep_init_map_crosslock+0x20/0x20
 check_prev_add+0x430/0x840
 ? irq_work_queue+0x86/0xe0
 ? wake_up_klogd+0x53/0x70
 __lock_acquire+0x1420/0x15e0
 ? __lock_acquire+0x1420/0x15e0
 ? lockdep_init_map_crosslock+0x20/0x20
 lock_acquire+0xb0/0x200
 ? stop_machine+0x1c/0x40
 ? i915_gem_object_truncate+0x50/0x50 [i915]
 cpus_read_lock+0x3d/0xb0
 ? stop_machine+0x1c/0x40
 stop_machine+0x1c/0x40
 i915_gem_set_wedged+0x1a/0x20 [i915]
 i915_reset+0xb9/0x230 [i915]
 i915_reset_device+0x1f6/0x260 [i915]
 ? gen8_gt_irq_ack+0x170/0x170 [i915]
 ? work_on_cpu_safe+0x60/0x60
 i915_handle_error+0x2d8/0x430 [i915]
 ? vsnprintf+0xd1/0x4b0
 ? scnprintf+0x3a/0x70
 hangcheck_declare_hang+0xd3/0xf0 [i915]
 ? intel_runtime_pm_put+0x56/0xa0 [i915]
 i915_hangcheck_elapsed+0x262/0x2d0 [i915]
 process_one_work+0x233/0x660
 worker_thread+0x4e/0x3b0
 kthread+0x152/0x190
 ? process_one_work+0x660/0x660
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x27/0x40

Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang

v2: Have 1 global synchronize_rcu() barrier across all engines, and
improve commit message.

v3: We need to protect the seqno update with the timeline spinlock (in
set_wedged) to avoid racing with other updates of the seqno, like we
already do in nop_submit_request (Chris).

v4: Use two-phase sequence to plug the race Chris spotted where we can
complete requests before they're marked up with -EIO.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c                   | 76 +++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_request.c           |  2 +
 drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 +
 3 files changed, 53 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 82a10036fb38..8d7d8d1f78db 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3057,49 +3057,71 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
 
 	spin_lock_irqsave(&request->engine->timeline->lock, flags);
 	__i915_gem_request_submit(request);
-	intel_engine_init_global_seqno(request->engine, request->global_seqno);
 	spin_unlock_irqrestore(&request->engine->timeline->lock, flags);
 }
-static void engine_set_wedged(struct intel_engine_cs *engine)
+static void nop_complete_submit_request(struct drm_i915_gem_request *request)
 {
-	/* We need to be sure that no thread is running the old callback as
-	 * we install the nop handler (otherwise we would submit a request
-	 * to hardware that will never complete). In order to prevent this
-	 * race, we wait until the machine is idle before making the swap
-	 * (using stop_machine()).
-	 */
-	engine->submit_request = nop_submit_request;
+	unsigned long flags;
 
-	/* Mark all executing requests as skipped */
-	engine->cancel_requests(engine);
+	GEM_BUG_ON(!i915_terminally_wedged(&request->i915->gpu_error));
+	dma_fence_set_error(&request->fence, -EIO);
 
-	/* Mark all pending requests as complete so that any concurrent
-	 * (lockless) lookup doesn't try and wait upon the request as we
-	 * reset it.
-	 */
-	intel_engine_init_global_seqno(engine,
-				       intel_engine_last_submit(engine));
+	spin_lock_irqsave(&request->engine->timeline->lock, flags);
+	__i915_gem_request_submit(request);
+	intel_engine_init_global_seqno(request->engine, request->global_seqno);
+	spin_unlock_irqrestore(&request->engine->timeline->lock, flags);
 }
 
-static int __i915_gem_set_wedged_BKL(void *data)
+void i915_gem_set_wedged(struct drm_i915_private *i915)
 {
-	struct drm_i915_private *i915 = data;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
+	/* First, stop submission to hw, but do not yet complete requests by
+	 * rolling the global seqno forward (since this would complete requests
+	 * for which we haven't set the fence error to EIO yet).
+	 */
 	for_each_engine(engine, i915, id)
-		engine_set_wedged(engine);
+		engine->submit_request = nop_submit_request;
 
-	set_bit(I915_WEDGED, &i915->gpu_error.flags);
-	wake_up_all(&i915->gpu_error.reset_queue);
+	/* Make sure no one is running the old callback before we proceed with
+	 * cancelling requests and resetting the completion tracking. Otherwise
+	 * we might submit a request to the hardware which never completes.
+	 */
+	synchronize_rcu();
 
-	return 0;
-}
+	for_each_engine(engine, i915, id) {
+		/* Mark all executing requests as skipped */
+		engine->cancel_requests(engine);
 
-void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
-{
-	stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
+		/* Only once we've force-cancelled all in-flight requests can we
+		 * start to complete all requests.
+		 */
+		engine->submit_request = nop_complete_submit_request;
+	}
+
+	/* Make sure no request can slip through without getting completed by
+	 * either this call here to intel_engine_init_global_seqno, or the one
+	 * in nop_complete_submit_request.
+	 */
+	synchronize_rcu();
+
+	for_each_engine(engine, i915, id) {
+		unsigned long flags;
+
+		/* Mark all pending requests as complete so that any concurrent
+		 * (lockless) lookup doesn't try and wait upon the request as we
+		 * reset it.
+		 */
+		spin_lock_irqsave(&engine->timeline->lock, flags);
+		intel_engine_init_global_seqno(engine,
+					       intel_engine_last_submit(engine));
+		spin_unlock_irqrestore(&engine->timeline->lock, flags);
+	}
+
+	set_bit(I915_WEDGED, &i915->gpu_error.flags);
+	wake_up_all(&i915->gpu_error.reset_queue);
 }
 
 bool i915_gem_unset_wedged(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index b100b38f1dd2..ef78a85cb845 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	switch (state) {
 	case FENCE_COMPLETE:
 		trace_i915_gem_request_submit(request);
+		rcu_read_lock();
 		request->engine->submit_request(request);
+		rcu_read_unlock();
 		break;
 
 	case FENCE_FREE:
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
index 78b9f811707f..a999161e8db1 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
@@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
 	}
 	i915_gem_request_get(vip);
 	i915_add_request(vip);
+	rcu_read_lock();
 	request->engine->submit_request(request);
+	rcu_read_unlock();
 
 	mutex_unlock(&i915->drm.struct_mutex);
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread
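[Editorial aside: the two-phase wedge sequence in the patch can be sketched outside the kernel with a C11 atomic function pointer. This is a hypothetical single-threaded model (all names invented for illustration); `synchronize_readers()` is an empty stand-in for `synchronize_rcu()`, whose real guarantee — that no CPU is still running the old hook — is exactly what a sketch cannot reproduce.]

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy model: a "request" is an int tracking its final state, and the
 * engine's submit hook is an atomic function pointer that gets swapped
 * twice during the wedge, with a grace period after each swap. */

enum { SUBMITTED = 1, FAILED = 2, COMPLETED = 3 };

static int real_submit(int *req)  { *req = SUBMITTED; return 0; }
/* Phase 1: stop hw submission, mark as failed, do NOT complete yet. */
static int nop_submit(int *req)   { *req = FAILED; return 0; }
/* Phase 2: everything is marked up, so completing is now safe. */
static int nop_complete(int *req) { *req = COMPLETED; return 0; }

static _Atomic(int (*)(int *)) submit_request = real_submit;

/* Stand-in for synchronize_rcu(): in the kernel this waits until no
 * reader can still be executing the previously installed hook. */
static void synchronize_readers(void) { }

static void set_wedged(void)
{
	atomic_store(&submit_request, nop_submit);
	synchronize_readers();
	/* ... cancel_requests(), mark pending requests with -EIO ... */
	atomic_store(&submit_request, nop_complete);
	synchronize_readers();
	/* ... roll the global seqno forward under the timeline lock ... */
}

/* Submission path: readers dereference the hook; in the kernel this
 * happens inside rcu_read_lock()/rcu_read_unlock(). */
static int submit(int *req)
{
	return atomic_load(&submit_request)(req);
}
```

The point of the two swaps mirrors the commit message: after the first grace period no request can reach hardware, and only after the second may anything be force-completed, so no request can be completed before it has been marked as failed.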
* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-09 16:44 ` [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged Daniel Vetter
@ 2017-10-10 9:21   ` Chris Wilson
  2017-10-10 12:30     ` Daniel Vetter
  2017-10-10 13:29     ` Chris Wilson
  2017-10-10 9:50   ` Mika Kuoppala
  2017-10-10 13:39   ` Chris Wilson
  2 siblings, 2 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-10 9:21 UTC (permalink / raw)
To: Intel Graphics Development
Cc: Daniel Vetter, Daniel Vetter, Thomas Gleixner, Mika Kuoppala

Quoting Daniel Vetter (2017-10-09 17:44:01)
> stop_machine is not really a locking primitive we should use, except
> when the hw folks tell us the hw is broken and that's the only way to
> work around it.
>
> This patch tries to address the locking abuse of stop_machine() from
>
> commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Nov 22 14:41:21 2016 +0000
>
>     drm/i915: Stop the machine as we install the wedged submit_request handler
>
> Chris said parts of the reasons for going with stop_machine() was that
> it's no overhead for the fast-path. But these callbacks use irqsave
> spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
>
> To stay as close as possible to the stop_machine semantics we first
> update all the submit function pointers to the nop handler, then call
> synchronize_rcu() to make sure no new requests can be submitted. This
> should give us exactly the huge barrier we want.
>
> I pondered whether we should annotate engine->submit_request as __rcu
> and use rcu_assign_pointer and rcu_dereference on it. But the reason
> behind those is to make sure the compiler/cpu barriers are there for
> when you have an actual data structure you point at, to make sure all
> the writes are seen correctly on the read side. But we just have a
> function pointer, and .text isn't changed, so no need for these
> barriers and hence no need for annotations.
>
> Unfortunately there's a complication with the call to
> intel_engine_init_global_seqno:

This is still broken in the same way as nop_submit_request may execute
while you sleep, breaking cancel_requests.
-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-10 9:21 ` Chris Wilson
@ 2017-10-10 12:30   ` Daniel Vetter
  2017-10-10 13:29   ` Chris Wilson
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel Vetter @ 2017-10-10 12:30 UTC (permalink / raw)
To: Chris Wilson
Cc: Daniel Vetter, Intel Graphics Development, Daniel Vetter, Thomas Gleixner, Mika Kuoppala

On Tue, Oct 10, 2017 at 10:21:45AM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-10-09 17:44:01)
> > stop_machine is not really a locking primitive we should use, except
> > when the hw folks tell us the hw is broken and that's the only way to
> > work around it.
> >
> > This patch tries to address the locking abuse of stop_machine() from
> >
> > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Tue Nov 22 14:41:21 2016 +0000
> >
> >     drm/i915: Stop the machine as we install the wedged submit_request handler
> >
> > Chris said parts of the reasons for going with stop_machine() was that
> > it's no overhead for the fast-path. But these callbacks use irqsave
> > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> >
> > To stay as close as possible to the stop_machine semantics we first
> > update all the submit function pointers to the nop handler, then call
> > synchronize_rcu() to make sure no new requests can be submitted. This
> > should give us exactly the huge barrier we want.
> >
> > I pondered whether we should annotate engine->submit_request as __rcu
> > and use rcu_assign_pointer and rcu_dereference on it. But the reason
> > behind those is to make sure the compiler/cpu barriers are there for
> > when you have an actual data structure you point at, to make sure all
> > the writes are seen correctly on the read side. But we just have a
> > function pointer, and .text isn't changed, so no need for these
> > barriers and hence no need for annotations.
> >
> > Unfortunately there's a complication with the call to
> > intel_engine_init_global_seqno:
>
> This is still broken in the same way as nop_submit_request may execute
> while you sleep, breaking cancel_requests.

I guess then I didn't understand which race you mean, since I think the
one I've found should be fixed now. Can you pls explain in more detail
what exactly is racing against what else?

From what I can tell legacy cancel_request is entirely under the
spinlock, so hard to race, same for lrc. And with the global seqno
update untangled, they shouldn't complete too early anymore, which I
thought was the race you pointed out on the previous thread. I did
reply to that to check my understanding, but didn't get a reply.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-10 9:21 ` Chris Wilson
  2017-10-10 12:30 ` Daniel Vetter
@ 2017-10-10 13:29   ` Chris Wilson
  1 sibling, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-10 13:29 UTC (permalink / raw)
To: Intel Graphics Development
Cc: Daniel Vetter, Daniel Vetter, Thomas Gleixner, Mika Kuoppala

Quoting Chris Wilson (2017-10-10 10:21:45)
> Quoting Daniel Vetter (2017-10-09 17:44:01)
> > stop_machine is not really a locking primitive we should use, except
> > when the hw folks tell us the hw is broken and that's the only way to
> > work around it.
> >
> > This patch tries to address the locking abuse of stop_machine() from
> >
> > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Tue Nov 22 14:41:21 2016 +0000
> >
> >     drm/i915: Stop the machine as we install the wedged submit_request handler
> >
> > Chris said parts of the reasons for going with stop_machine() was that
> > it's no overhead for the fast-path. But these callbacks use irqsave
> > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> >
> > To stay as close as possible to the stop_machine semantics we first
> > update all the submit function pointers to the nop handler, then call
> > synchronize_rcu() to make sure no new requests can be submitted. This
> > should give us exactly the huge barrier we want.
> >
> > I pondered whether we should annotate engine->submit_request as __rcu
> > and use rcu_assign_pointer and rcu_dereference on it. But the reason
> > behind those is to make sure the compiler/cpu barriers are there for
> > when you have an actual data structure you point at, to make sure all
> > the writes are seen correctly on the read side. But we just have a
> > function pointer, and .text isn't changed, so no need for these
> > barriers and hence no need for annotations.
> >
> > Unfortunately there's a complication with the call to
> > intel_engine_init_global_seqno:
>
> This is still broken in the same way as nop_submit_request may execute
> while you sleep, breaking cancel_requests.

My apologies, I misread the diff and didn't catch the removal of
init_seqno. From that pov, the execlists should be intact and I can't
see a way for a lack of -EIO being reported. (That we have them is a
topic for another day, as set-wedged is not meant to be regarded as a
normal operation, but user data loss.)

The residual panic I have is that we had to throw set-wedged vs
unwedge ordering out of the window; it used to be struct_mutex. Making
set-wedged prolonged only makes the window wider, it didn't create
that window, and at the moment it's carefully ordered inside the
i915_handle_error fallback. (We blissfully ignore debugfs.)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged 2017-10-09 16:44 ` [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged Daniel Vetter 2017-10-10 9:21 ` Chris Wilson @ 2017-10-10 9:50 ` Mika Kuoppala 2017-10-10 12:37 ` Daniel Vetter 2017-10-10 13:39 ` Chris Wilson 2 siblings, 1 reply; 12+ messages in thread From: Mika Kuoppala @ 2017-10-10 9:50 UTC (permalink / raw) To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Thomas Gleixner Daniel Vetter <daniel.vetter@ffwll.ch> writes: > stop_machine is not really a locking primitive we should use, except > when the hw folks tell us the hw is broken and that's the only way to > work around it. > > This patch tries to address the locking abuse of stop_machine() from > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Tue Nov 22 14:41:21 2016 +0000 > > drm/i915: Stop the machine as we install the wedged submit_request handler > > Chris said parts of the reasons for going with stop_machine() was that > it's no overhead for the fast-path. But these callbacks use irqsave > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast. > > To stay as close as possible to the stop_machine semantics we first > update all the submit function pointers to the nop handler, then call > synchronize_rcu() to make sure no new requests can be submitted. This > should give us exactly the huge barrier we want. > > I pondered whether we should annotate engine->submit_request as __rcu > and use rcu_assign_pointer and rcu_dereference on it. But the reason > behind those is to make sure the compiler/cpu barriers are there for > when you have an actual data structure you point at, to make sure all > the writes are seen correctly on the read side. But we just have a > function pointer, and .text isn't changed, so no need for these > barriers and hence no need for annotations. 
> > Unfortunately there's a complication with the call to > intel_engine_init_global_seqno: > > - Without stop_machine we must hold the corresponding spinlock. > > - Without stop_machine we must ensure that all requests are marked as > having failed with dma_fence_set_error() before we call it. That > means we need to split the nop request submission into two phases, > both synchronized with rcu: > > 1. Only stop submitting the requests to hw and mark them as failed. > > 2. Once all pending requests in the scheduler/ring are suitably > marked up as failed, force complete them all by calling > intel_engine_init_global_seqno(). > > This should fix the following lockdep splat: > > ====================================================== > WARNING: possible circular locking dependency detected > 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G U > ------------------------------------------------------ > kworker/3:4/562 is trying to acquire lock: > (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40 > > but task is already holding lock: > (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915] > > which lock already depends on the new lock. 
> > the existing dependency chain (in reverse order) is: > > -> #6 (&dev->struct_mutex){+.+.}: > __lock_acquire+0x1420/0x15e0 > lock_acquire+0xb0/0x200 > __mutex_lock+0x86/0x9b0 > mutex_lock_interruptible_nested+0x1b/0x20 > i915_mutex_lock_interruptible+0x51/0x130 [i915] > i915_gem_fault+0x209/0x650 [i915] > __do_fault+0x1e/0x80 > __handle_mm_fault+0xa08/0xed0 > handle_mm_fault+0x156/0x300 > __do_page_fault+0x2c5/0x570 > do_page_fault+0x28/0x250 > page_fault+0x22/0x30 > > -> #5 (&mm->mmap_sem){++++}: > __lock_acquire+0x1420/0x15e0 > lock_acquire+0xb0/0x200 > __might_fault+0x68/0x90 > _copy_to_user+0x23/0x70 > filldir+0xa5/0x120 > dcache_readdir+0xf9/0x170 > iterate_dir+0x69/0x1a0 > SyS_getdents+0xa5/0x140 > entry_SYSCALL_64_fastpath+0x1c/0xb1 > > -> #4 (&sb->s_type->i_mutex_key#5){++++}: > down_write+0x3b/0x70 > handle_create+0xcb/0x1e0 > devtmpfsd+0x139/0x180 > kthread+0x152/0x190 > ret_from_fork+0x27/0x40 > > -> #3 ((complete)&req.done){+.+.}: > __lock_acquire+0x1420/0x15e0 > lock_acquire+0xb0/0x200 > wait_for_common+0x58/0x210 > wait_for_completion+0x1d/0x20 > devtmpfs_create_node+0x13d/0x160 > device_add+0x5eb/0x620 > device_create_groups_vargs+0xe0/0xf0 > device_create+0x3a/0x40 > msr_device_create+0x2b/0x40 > cpuhp_invoke_callback+0xc9/0xbf0 > cpuhp_thread_fun+0x17b/0x240 > smpboot_thread_fn+0x18a/0x280 > kthread+0x152/0x190 > ret_from_fork+0x27/0x40 > > -> #2 (cpuhp_state-up){+.+.}: > __lock_acquire+0x1420/0x15e0 > lock_acquire+0xb0/0x200 > cpuhp_issue_call+0x133/0x1c0 > __cpuhp_setup_state_cpuslocked+0x139/0x2a0 > __cpuhp_setup_state+0x46/0x60 > page_writeback_init+0x43/0x67 > pagecache_init+0x3d/0x42 > start_kernel+0x3a8/0x3fc > x86_64_start_reservations+0x2a/0x2c > x86_64_start_kernel+0x6d/0x70 > verify_cpu+0x0/0xfb > > -> #1 (cpuhp_state_mutex){+.+.}: > __lock_acquire+0x1420/0x15e0 > lock_acquire+0xb0/0x200 > __mutex_lock+0x86/0x9b0 > mutex_lock_nested+0x1b/0x20 > __cpuhp_setup_state_cpuslocked+0x53/0x2a0 > __cpuhp_setup_state+0x46/0x60 > 
page_alloc_init+0x28/0x30 > start_kernel+0x145/0x3fc > x86_64_start_reservations+0x2a/0x2c > x86_64_start_kernel+0x6d/0x70 > verify_cpu+0x0/0xfb > > -> #0 (cpu_hotplug_lock.rw_sem){++++}: > check_prev_add+0x430/0x840 > __lock_acquire+0x1420/0x15e0 > lock_acquire+0xb0/0x200 > cpus_read_lock+0x3d/0xb0 > stop_machine+0x1c/0x40 > i915_gem_set_wedged+0x1a/0x20 [i915] > i915_reset+0xb9/0x230 [i915] > i915_reset_device+0x1f6/0x260 [i915] > i915_handle_error+0x2d8/0x430 [i915] > hangcheck_declare_hang+0xd3/0xf0 [i915] > i915_hangcheck_elapsed+0x262/0x2d0 [i915] > process_one_work+0x233/0x660 > worker_thread+0x4e/0x3b0 > kthread+0x152/0x190 > ret_from_fork+0x27/0x40 > > other info that might help us debug this: > > Chain exists of: > cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(&dev->struct_mutex); > lock(&mm->mmap_sem); > lock(&dev->struct_mutex); > lock(cpu_hotplug_lock.rw_sem); > > *** DEADLOCK *** > > 3 locks held by kworker/3:4/562: > #0: ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660 > #1: ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660 > #2: (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915] > > stack backtrace: > CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G U 4.14.0-rc3-CI-CI_DRM_3179+ #1 > Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017 > Workqueue: events_long i915_hangcheck_elapsed [i915] > Call Trace: > dump_stack+0x68/0x9f > print_circular_bug+0x235/0x3c0 > ? lockdep_init_map_crosslock+0x20/0x20 > check_prev_add+0x430/0x840 > ? irq_work_queue+0x86/0xe0 > ? wake_up_klogd+0x53/0x70 > __lock_acquire+0x1420/0x15e0 > ? __lock_acquire+0x1420/0x15e0 > ? lockdep_init_map_crosslock+0x20/0x20 > lock_acquire+0xb0/0x200 > ? stop_machine+0x1c/0x40 > ? i915_gem_object_truncate+0x50/0x50 [i915] > cpus_read_lock+0x3d/0xb0 > ? 
stop_machine+0x1c/0x40 > stop_machine+0x1c/0x40 > i915_gem_set_wedged+0x1a/0x20 [i915] > i915_reset+0xb9/0x230 [i915] > i915_reset_device+0x1f6/0x260 [i915] > ? gen8_gt_irq_ack+0x170/0x170 [i915] > ? work_on_cpu_safe+0x60/0x60 > i915_handle_error+0x2d8/0x430 [i915] > ? vsnprintf+0xd1/0x4b0 > ? scnprintf+0x3a/0x70 > hangcheck_declare_hang+0xd3/0xf0 [i915] > ? intel_runtime_pm_put+0x56/0xa0 [i915] > i915_hangcheck_elapsed+0x262/0x2d0 [i915] > process_one_work+0x233/0x660 > worker_thread+0x4e/0x3b0 > kthread+0x152/0x190 > ? process_one_work+0x660/0x660 > ? kthread_create_on_node+0x40/0x40 > ret_from_fork+0x27/0x40 > Setting dangerous option reset - tainting kernel > i915 0000:00:02.0: Resetting chip after gpu hang > Setting dangerous option reset - tainting kernel > i915 0000:00:02.0: Resetting chip after gpu hang > > v2: Have 1 global synchronize_rcu() barrier across all engines, and > improve commit message. > > v3: We need to protect the seqno update with the timeline spinlock (in > set_wedged) to avoid racing with other updates of the seqno, like we > already do in nop_submit_request (Chris). > > v4: Use two-phase sequence to plug the race Chris spotted where we can > complete requests before they're marked up with -EIO. 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886 > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096 > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Mika Kuoppala <mika.kuoppala@intel.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Marta Lofstedt <marta.lofstedt@intel.com> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > --- > drivers/gpu/drm/i915/i915_gem.c | 76 +++++++++++++++-------- > drivers/gpu/drm/i915/i915_gem_request.c | 2 + > drivers/gpu/drm/i915/selftests/i915_gem_request.c | 2 + > 3 files changed, 53 insertions(+), 27 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 82a10036fb38..8d7d8d1f78db 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -3057,49 +3057,71 @@ static void nop_submit_request(struct drm_i915_gem_request *request) > > spin_lock_irqsave(&request->engine->timeline->lock, flags); > __i915_gem_request_submit(request); > - intel_engine_init_global_seqno(request->engine, request->global_seqno); > spin_unlock_irqrestore(&request->engine->timeline->lock, flags); > } > > -static void engine_set_wedged(struct intel_engine_cs *engine) > +static void nop_complete_submit_request(struct drm_i915_gem_request *request) > { > - /* We need to be sure that no thread is running the old callback as > - * we install the nop handler (otherwise we would submit a request > - * to hardware that will never complete). In order to prevent this > - * race, we wait until the machine is idle before making the swap > - * (using stop_machine()). 
> - */ > - engine->submit_request = nop_submit_request; > + unsigned long flags; > > - /* Mark all executing requests as skipped */ > - engine->cancel_requests(engine); > + GEM_BUG_ON(!i915_terminally_wedged(&request->i915->gpu_error)); > + dma_fence_set_error(&request->fence, -EIO); > > - /* Mark all pending requests as complete so that any concurrent > - * (lockless) lookup doesn't try and wait upon the request as we > - * reset it. > - */ > - intel_engine_init_global_seqno(engine, > - intel_engine_last_submit(engine)); > + spin_lock_irqsave(&request->engine->timeline->lock, flags); > + __i915_gem_request_submit(request); > + intel_engine_init_global_seqno(request->engine, request->global_seqno); > + spin_unlock_irqrestore(&request->engine->timeline->lock, flags); > } > > -static int __i915_gem_set_wedged_BKL(void *data) > +void i915_gem_set_wedged(struct drm_i915_private *i915) > { > - struct drm_i915_private *i915 = data; > struct intel_engine_cs *engine; > enum intel_engine_id id; > > + /* First, stop submission to hw, but do not yet complete requests by > + * rolling the global seqno forward (since this would complete requests > + * for which we haven't set the fence error to EIO yet). > + */ > for_each_engine(engine, i915, id) > - engine_set_wedged(engine); > + engine->submit_request = nop_submit_request; > > - set_bit(I915_WEDGED, &i915->gpu_error.flags); > - wake_up_all(&i915->gpu_error.reset_queue); > + /* Make sure no one is running the old callback before we proceed with > + * cancelling requests and resetting the completion tracking. Otherwise > + * we might submit a request to the hardware which never completes. 
> + */ > + synchronize_rcu(); > > - return 0; > -} > + for_each_engine(engine, i915, id) { > + /* Mark all executing requests as skipped */ > + engine->cancel_requests(engine); > > -void i915_gem_set_wedged(struct drm_i915_private *dev_priv) > -{ > - stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL); > + /* Only once we've force-cancelled all in-flight requests can we > + * start to complete all requests. > + */ > + engine->submit_request = nop_complete_submit_request; > + } > + > + /* Make sure no request can slip through without getting completed by > + * either this call here to intel_engine_init_global_seqno, or the one > + * in nop_complete_submit_request. > + */ > + synchronize_rcu(); > + > + for_each_engine(engine, i915, id) { > + unsigned long flags; > + > + /* Mark all pending requests as complete so that any concurrent > + * (lockless) lookup doesn't try and wait upon the request as we > + * reset it. > + */ > + spin_lock_irqsave(&engine->timeline->lock, flags); > + intel_engine_init_global_seqno(engine, > + intel_engine_last_submit(engine)); > + spin_unlock_irqrestore(&engine->timeline->lock, flags); > + } > + > + set_bit(I915_WEDGED, &i915->gpu_error.flags); Can we move this set_bit above the loop, right after synchronize_rcu()? The thinking is that we then stop accepting requests we can't complete as early as possible. 
-Mika > + wake_up_all(&i915->gpu_error.reset_queue); > } > > bool i915_gem_unset_wedged(struct drm_i915_private *i915) > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c > index b100b38f1dd2..ef78a85cb845 100644 > --- a/drivers/gpu/drm/i915/i915_gem_request.c > +++ b/drivers/gpu/drm/i915/i915_gem_request.c > @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) > switch (state) { > case FENCE_COMPLETE: > trace_i915_gem_request_submit(request); > + rcu_read_lock(); > request->engine->submit_request(request); > + rcu_read_unlock(); > break; > > case FENCE_FREE: > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c > index 78b9f811707f..a999161e8db1 100644 > --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c > @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg) > } > i915_gem_request_get(vip); > i915_add_request(vip); > + rcu_read_lock(); > request->engine->submit_request(request); > + rcu_read_unlock(); > > mutex_unlock(&i915->drm.struct_mutex); > > -- > 2.14.1 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged 2017-10-10 9:50 ` Mika Kuoppala @ 2017-10-10 12:37 ` Daniel Vetter 0 siblings, 0 replies; 12+ messages in thread From: Daniel Vetter @ 2017-10-10 12:37 UTC (permalink / raw) To: Mika Kuoppala Cc: Daniel Vetter, Intel Graphics Development, Daniel Vetter, Thomas Gleixner On Tue, Oct 10, 2017 at 12:50:45PM +0300, Mika Kuoppala wrote: > Daniel Vetter <daniel.vetter@ffwll.ch> writes: > > > stop_machine is not really a locking primitive we should use, except > > when the hw folks tell us the hw is broken and that's the only way to > > work around it. > > > > This patch tries to address the locking abuse of stop_machine() from > > > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5 > > Author: Chris Wilson <chris@chris-wilson.co.uk> > > Date: Tue Nov 22 14:41:21 2016 +0000 > > > > drm/i915: Stop the machine as we install the wedged submit_request handler > > > > Chris said parts of the reasons for going with stop_machine() was that > > it's no overhead for the fast-path. But these callbacks use irqsave > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast. > > > > To stay as close as possible to the stop_machine semantics we first > > update all the submit function pointers to the nop handler, then call > > synchronize_rcu() to make sure no new requests can be submitted. This > > should give us exactly the huge barrier we want. > > > > I pondered whether we should annotate engine->submit_request as __rcu > > and use rcu_assign_pointer and rcu_dereference on it. But the reason > > behind those is to make sure the compiler/cpu barriers are there for > > when you have an actual data structure you point at, to make sure all > > the writes are seen correctly on the read side. But we just have a > > function pointer, and .text isn't changed, so no need for these > > barriers and hence no need for annotations. 
> > > > Unfortunately there's a complication with the call to > > intel_engine_init_global_seqno: > > > > - Without stop_machine we must hold the corresponding spinlock. > > > > - Without stop_machine we must ensure that all requests are marked as > > having failed with dma_fence_set_error() before we call it. That > > means we need to split the nop request submission into two phases, > > both synchronized with rcu: > > > > 1. Only stop submitting the requests to hw and mark them as failed. > > > > 2. Once all pending requests in the scheduler/ring are suitably > > marked up as failed, force complete them all by calling > > intel_engine_init_global_seqno(). > > > > This should fix the following lockdep splat: > > > > ====================================================== > > WARNING: possible circular locking dependency detected > > 4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G U > > ------------------------------------------------------ > > kworker/3:4/562 is trying to acquire lock: > > (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40 > > > > but task is already holding lock: > > (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915] > > > > which lock already depends on the new lock. 
> > > > the existing dependency chain (in reverse order) is: > > > > -> #6 (&dev->struct_mutex){+.+.}: > > __lock_acquire+0x1420/0x15e0 > > lock_acquire+0xb0/0x200 > > __mutex_lock+0x86/0x9b0 > > mutex_lock_interruptible_nested+0x1b/0x20 > > i915_mutex_lock_interruptible+0x51/0x130 [i915] > > i915_gem_fault+0x209/0x650 [i915] > > __do_fault+0x1e/0x80 > > __handle_mm_fault+0xa08/0xed0 > > handle_mm_fault+0x156/0x300 > > __do_page_fault+0x2c5/0x570 > > do_page_fault+0x28/0x250 > > page_fault+0x22/0x30 > > > > -> #5 (&mm->mmap_sem){++++}: > > __lock_acquire+0x1420/0x15e0 > > lock_acquire+0xb0/0x200 > > __might_fault+0x68/0x90 > > _copy_to_user+0x23/0x70 > > filldir+0xa5/0x120 > > dcache_readdir+0xf9/0x170 > > iterate_dir+0x69/0x1a0 > > SyS_getdents+0xa5/0x140 > > entry_SYSCALL_64_fastpath+0x1c/0xb1 > > > > -> #4 (&sb->s_type->i_mutex_key#5){++++}: > > down_write+0x3b/0x70 > > handle_create+0xcb/0x1e0 > > devtmpfsd+0x139/0x180 > > kthread+0x152/0x190 > > ret_from_fork+0x27/0x40 > > > > -> #3 ((complete)&req.done){+.+.}: > > __lock_acquire+0x1420/0x15e0 > > lock_acquire+0xb0/0x200 > > wait_for_common+0x58/0x210 > > wait_for_completion+0x1d/0x20 > > devtmpfs_create_node+0x13d/0x160 > > device_add+0x5eb/0x620 > > device_create_groups_vargs+0xe0/0xf0 > > device_create+0x3a/0x40 > > msr_device_create+0x2b/0x40 > > cpuhp_invoke_callback+0xc9/0xbf0 > > cpuhp_thread_fun+0x17b/0x240 > > smpboot_thread_fn+0x18a/0x280 > > kthread+0x152/0x190 > > ret_from_fork+0x27/0x40 > > > > -> #2 (cpuhp_state-up){+.+.}: > > __lock_acquire+0x1420/0x15e0 > > lock_acquire+0xb0/0x200 > > cpuhp_issue_call+0x133/0x1c0 > > __cpuhp_setup_state_cpuslocked+0x139/0x2a0 > > __cpuhp_setup_state+0x46/0x60 > > page_writeback_init+0x43/0x67 > > pagecache_init+0x3d/0x42 > > start_kernel+0x3a8/0x3fc > > x86_64_start_reservations+0x2a/0x2c > > x86_64_start_kernel+0x6d/0x70 > > verify_cpu+0x0/0xfb > > > > -> #1 (cpuhp_state_mutex){+.+.}: > > __lock_acquire+0x1420/0x15e0 > > lock_acquire+0xb0/0x200 > > 
__mutex_lock+0x86/0x9b0 > > mutex_lock_nested+0x1b/0x20 > > __cpuhp_setup_state_cpuslocked+0x53/0x2a0 > > __cpuhp_setup_state+0x46/0x60 > > page_alloc_init+0x28/0x30 > > start_kernel+0x145/0x3fc > > x86_64_start_reservations+0x2a/0x2c > > x86_64_start_kernel+0x6d/0x70 > > verify_cpu+0x0/0xfb > > > > -> #0 (cpu_hotplug_lock.rw_sem){++++}: > > check_prev_add+0x430/0x840 > > __lock_acquire+0x1420/0x15e0 > > lock_acquire+0xb0/0x200 > > cpus_read_lock+0x3d/0xb0 > > stop_machine+0x1c/0x40 > > i915_gem_set_wedged+0x1a/0x20 [i915] > > i915_reset+0xb9/0x230 [i915] > > i915_reset_device+0x1f6/0x260 [i915] > > i915_handle_error+0x2d8/0x430 [i915] > > hangcheck_declare_hang+0xd3/0xf0 [i915] > > i915_hangcheck_elapsed+0x262/0x2d0 [i915] > > process_one_work+0x233/0x660 > > worker_thread+0x4e/0x3b0 > > kthread+0x152/0x190 > > ret_from_fork+0x27/0x40 > > > > other info that might help us debug this: > > > > Chain exists of: > > cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex > > > > Possible unsafe locking scenario: > > > > CPU0 CPU1 > > ---- ---- > > lock(&dev->struct_mutex); > > lock(&mm->mmap_sem); > > lock(&dev->struct_mutex); > > lock(cpu_hotplug_lock.rw_sem); > > > > *** DEADLOCK *** > > > > 3 locks held by kworker/3:4/562: > > #0: ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660 > > #1: ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660 > > #2: (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915] > > > > stack backtrace: > > CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G U 4.14.0-rc3-CI-CI_DRM_3179+ #1 > > Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017 > > Workqueue: events_long i915_hangcheck_elapsed [i915] > > Call Trace: > > dump_stack+0x68/0x9f > > print_circular_bug+0x235/0x3c0 > > ? lockdep_init_map_crosslock+0x20/0x20 > > check_prev_add+0x430/0x840 > > ? irq_work_queue+0x86/0xe0 > > ? 
wake_up_klogd+0x53/0x70 > > __lock_acquire+0x1420/0x15e0 > > ? __lock_acquire+0x1420/0x15e0 > > ? lockdep_init_map_crosslock+0x20/0x20 > > lock_acquire+0xb0/0x200 > > ? stop_machine+0x1c/0x40 > > ? i915_gem_object_truncate+0x50/0x50 [i915] > > cpus_read_lock+0x3d/0xb0 > > ? stop_machine+0x1c/0x40 > > stop_machine+0x1c/0x40 > > i915_gem_set_wedged+0x1a/0x20 [i915] > > i915_reset+0xb9/0x230 [i915] > > i915_reset_device+0x1f6/0x260 [i915] > > ? gen8_gt_irq_ack+0x170/0x170 [i915] > > ? work_on_cpu_safe+0x60/0x60 > > i915_handle_error+0x2d8/0x430 [i915] > > ? vsnprintf+0xd1/0x4b0 > > ? scnprintf+0x3a/0x70 > > hangcheck_declare_hang+0xd3/0xf0 [i915] > > ? intel_runtime_pm_put+0x56/0xa0 [i915] > > i915_hangcheck_elapsed+0x262/0x2d0 [i915] > > process_one_work+0x233/0x660 > > worker_thread+0x4e/0x3b0 > > kthread+0x152/0x190 > > ? process_one_work+0x660/0x660 > > ? kthread_create_on_node+0x40/0x40 > > ret_from_fork+0x27/0x40 > > Setting dangerous option reset - tainting kernel > > i915 0000:00:02.0: Resetting chip after gpu hang > > Setting dangerous option reset - tainting kernel > > i915 0000:00:02.0: Resetting chip after gpu hang > > > > v2: Have 1 global synchronize_rcu() barrier across all engines, and > > improve commit message. > > > > v3: We need to protect the seqno update with the timeline spinlock (in > > set_wedged) to avoid racing with other updates of the seqno, like we > > already do in nop_submit_request (Chris). > > > > v4: Use two-phase sequence to plug the race Chris spotted where we can > > complete requests before they're marked up with -EIO. 
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886 > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096 > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Marta Lofstedt <marta.lofstedt@intel.com> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > > --- > > drivers/gpu/drm/i915/i915_gem.c | 76 +++++++++++++++-------- > > drivers/gpu/drm/i915/i915_gem_request.c | 2 + > > drivers/gpu/drm/i915/selftests/i915_gem_request.c | 2 + > > 3 files changed, 53 insertions(+), 27 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > index 82a10036fb38..8d7d8d1f78db 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -3057,49 +3057,71 @@ static void nop_submit_request(struct drm_i915_gem_request *request) > > > > spin_lock_irqsave(&request->engine->timeline->lock, flags); > > __i915_gem_request_submit(request); > > - intel_engine_init_global_seqno(request->engine, request->global_seqno); > > spin_unlock_irqrestore(&request->engine->timeline->lock, flags); > > } > > > > -static void engine_set_wedged(struct intel_engine_cs *engine) > > +static void nop_complete_submit_request(struct drm_i915_gem_request *request) > > { > > - /* We need to be sure that no thread is running the old callback as > > - * we install the nop handler (otherwise we would submit a request > > - * to hardware that will never complete). In order to prevent this > > - * race, we wait until the machine is idle before making the swap > > - * (using stop_machine()). 
> > - */ > > - engine->submit_request = nop_submit_request; > > + unsigned long flags; > > > > - /* Mark all executing requests as skipped */ > > - engine->cancel_requests(engine); > > + GEM_BUG_ON(!i915_terminally_wedged(&request->i915->gpu_error)); > > + dma_fence_set_error(&request->fence, -EIO); > > > > - /* Mark all pending requests as complete so that any concurrent > > - * (lockless) lookup doesn't try and wait upon the request as we > > - * reset it. > > - */ > > - intel_engine_init_global_seqno(engine, > > - intel_engine_last_submit(engine)); > > + spin_lock_irqsave(&request->engine->timeline->lock, flags); > > + __i915_gem_request_submit(request); > > + intel_engine_init_global_seqno(request->engine, request->global_seqno); > > + spin_unlock_irqrestore(&request->engine->timeline->lock, flags); > > } > > > > -static int __i915_gem_set_wedged_BKL(void *data) > > +void i915_gem_set_wedged(struct drm_i915_private *i915) > > { > > - struct drm_i915_private *i915 = data; > > struct intel_engine_cs *engine; > > enum intel_engine_id id; > > > > + /* First, stop submission to hw, but do not yet complete requests by > > + * rolling the global seqno forward (since this would complete requests > > + * for which we haven't set the fence error to EIO yet). > > + */ > > for_each_engine(engine, i915, id) > > - engine_set_wedged(engine); > > + engine->submit_request = nop_submit_request; > > > > - set_bit(I915_WEDGED, &i915->gpu_error.flags); > > - wake_up_all(&i915->gpu_error.reset_queue); > > + /* Make sure no one is running the old callback before we proceed with > > + * cancelling requests and resetting the completion tracking. Otherwise > > + * we might submit a request to the hardware which never completes. 
> > + */ > > + synchronize_rcu(); > > > > - return 0; > > -} > > + for_each_engine(engine, i915, id) { > > + /* Mark all executing requests as skipped */ > > + engine->cancel_requests(engine); > > > > -void i915_gem_set_wedged(struct drm_i915_private *dev_priv) > > -{ > > - stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL); > > + /* Only once we've force-cancelled all in-flight requests can we > > + * start to complete all requests. > > + */ > > + engine->submit_request = nop_complete_submit_request; > > + } > > + > > + /* Make sure no request can slip through without getting completed by > > + * either this call here to intel_engine_init_global_seqno, or the one > > + * in nop_complete_submit_request. > > + */ > > + synchronize_rcu(); > > + > > + for_each_engine(engine, i915, id) { > > + unsigned long flags; > > + > > + /* Mark all pending requests as complete so that any concurrent > > + * (lockless) lookup doesn't try and wait upon the request as we > > + * reset it. > > + */ > > + spin_lock_irqsave(&engine->timeline->lock, flags); > > + intel_engine_init_global_seqno(engine, > > + intel_engine_last_submit(engine)); > > + spin_unlock_irqrestore(&engine->timeline->lock, flags); > > + } > > + > > + set_bit(I915_WEDGED, &i915->gpu_error.flags); > > Can we move this set_bit above the loop, right after synchronize_rcu()? > The thinking is that we then stop accepting requests we can't complete > as early as possible. tbh I have no idea what this thing does there anymore; it looks like this will terminally wedge the gpu and refuse all subsequent requests (see i915_gem_request_alloc), except that i915_gem_set_wedged seems to be the new way to recover a fully hung gpu (and then later on requests are ok again). In short, I have no idea what this does nor where it should be put. Can you please explain more? 
Thanks, Daniel > > -Mika > > > + wake_up_all(&i915->gpu_error.reset_queue); > > } > > > > bool i915_gem_unset_wedged(struct drm_i915_private *i915) > > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c > > index b100b38f1dd2..ef78a85cb845 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_request.c > > +++ b/drivers/gpu/drm/i915/i915_gem_request.c > > @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) > > switch (state) { > > case FENCE_COMPLETE: > > trace_i915_gem_request_submit(request); > > + rcu_read_lock(); > > request->engine->submit_request(request); > > + rcu_read_unlock(); > > break; > > > > case FENCE_FREE: > > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c > > index 78b9f811707f..a999161e8db1 100644 > > --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c > > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c > > @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg) > > } > > i915_gem_request_get(vip); > > i915_add_request(vip); > > + rcu_read_lock(); > > request->engine->submit_request(request); > > + rcu_read_unlock(); > > > > mutex_unlock(&i915->drm.struct_mutex); > > > > -- > > 2.14.1 -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged 2017-10-09 16:44 ` [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged Daniel Vetter 2017-10-10 9:21 ` Chris Wilson 2017-10-10 9:50 ` Mika Kuoppala @ 2017-10-10 13:39 ` Chris Wilson 2017-10-10 14:09 ` Daniel Vetter 2 siblings, 1 reply; 12+ messages in thread From: Chris Wilson @ 2017-10-10 13:39 UTC (permalink / raw) To: Intel Graphics Development Cc: Daniel Vetter, Daniel Vetter, Thomas Gleixner, Mika Kuoppala Style nits: Quoting Daniel Vetter (2017-10-09 17:44:01) > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 82a10036fb38..8d7d8d1f78db 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -3057,49 +3057,71 @@ static void nop_submit_request(struct drm_i915_gem_request *request) > > spin_lock_irqsave(&request->engine->timeline->lock, flags); > __i915_gem_request_submit(request); > - intel_engine_init_global_seqno(request->engine, request->global_seqno); > spin_unlock_irqrestore(&request->engine->timeline->lock, flags); This reduced to i915_gem_request_submit(). Hah, you can just engine->submit_request = i915_gem_request_submit, and keep the nop_submit_request for the second phase! That may make the diff neater? > -static int __i915_gem_set_wedged_BKL(void *data) > +void i915_gem_set_wedged(struct drm_i915_private *i915) > { > - struct drm_i915_private *i915 = data; > struct intel_engine_cs *engine; > enum intel_engine_id id; > > + /* First, stop submission to hw, but do not yet complete requests by > + * rolling the global seqno forward (since this would complete requests > + * for which we haven't set the fence error to EIO yet). > + */ I'm doing a quiet fixup of all my comments to follow /* * The core convention, which you normally use anyway. 
*/ > for_each_engine(engine, i915, id) > - engine_set_wedged(engine); > + engine->submit_request = nop_submit_request; > > - set_bit(I915_WEDGED, &i915->gpu_error.flags); > - wake_up_all(&i915->gpu_error.reset_queue); > + /* Make sure no one is running the old callback before we proceed with > + * cancelling requests and resetting the completion tracking. Otherwise > + * we might submit a request to the hardware which never completes. > + */ > + synchronize_rcu(); > > - return 0; > -} > + for_each_engine(engine, i915, id) { > + /* Mark all executing requests as skipped */ > + engine->cancel_requests(engine); > > -void i915_gem_set_wedged(struct drm_i915_private *dev_priv) > -{ > - stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL); > + /* Only once we've force-cancelled all in-flight requests can we > + * start to complete all requests. > + */ > + engine->submit_request = nop_complete_submit_request; > + } > + > + /* Make sure no request can slip through without getting completed by > + * either this call here to intel_engine_init_global_seqno, or the one > + * in nop_complete_submit_request. > + */ > + synchronize_rcu(); > + > + for_each_engine(engine, i915, id) { > + unsigned long flags; > + > + /* Mark all pending requests as complete so that any concurrent > + * (lockless) lookup doesn't try and wait upon the request as we > + * reset it. 
> + */ > + spin_lock_irqsave(&engine->timeline->lock, flags); > + intel_engine_init_global_seqno(engine, > + intel_engine_last_submit(engine)); > + spin_unlock_irqrestore(&engine->timeline->lock, flags); > + } > + > + set_bit(I915_WEDGED, &i915->gpu_error.flags); > + wake_up_all(&i915->gpu_error.reset_queue); > } > > bool i915_gem_unset_wedged(struct drm_i915_private *i915) > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c > index b100b38f1dd2..ef78a85cb845 100644 > --- a/drivers/gpu/drm/i915/i915_gem_request.c > +++ b/drivers/gpu/drm/i915/i915_gem_request.c > @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) > switch (state) { > case FENCE_COMPLETE: > trace_i915_gem_request_submit(request); > + rcu_read_lock(); This trick needs a comment. /* * We need to serialize use of the submit_request() callback with its * hotplugging performed during an emergency i915_gem_set_wedged(). * We use the RCU mechanism to mark the critical section in order to * force i915_gem_set_wedged() to wait until the submit_request() is * completed before proceeding. */ -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
  2017-10-10 13:39 ` Chris Wilson
@ 2017-10-10 14:09 ` Daniel Vetter
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Vetter @ 2017-10-10 14:09 UTC (permalink / raw)
To: Chris Wilson
Cc: Daniel Vetter, Intel Graphics Development, Thomas Gleixner, Mika Kuoppala

On Tue, Oct 10, 2017 at 3:39 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Style nits:
>
> Quoting Daniel Vetter (2017-10-09 17:44:01)
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 82a10036fb38..8d7d8d1f78db 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -3057,49 +3057,71 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>>
>>  	spin_lock_irqsave(&request->engine->timeline->lock, flags);
>>  	__i915_gem_request_submit(request);
>> -	intel_engine_init_global_seqno(request->engine, request->global_seqno);
>>  	spin_unlock_irqrestore(&request->engine->timeline->lock, flags);
>
> This reduced to i915_gem_request_submit().
>
> Hah, you can just engine->submit_request = i915_gem_request_submit, and
> keep the nop_submit_request for the second phase! That may make the
> diff neater?

We need to call dma_fence_set_error still, and we need to start doing
that before we start calling ->cancel_request, or we might miss some
requests.

I'll do all the other suggestions and resubmit.
-Daniel

>> -static int __i915_gem_set_wedged_BKL(void *data)
>> +void i915_gem_set_wedged(struct drm_i915_private *i915)
>>  {
>> -	struct drm_i915_private *i915 = data;
>>  	struct intel_engine_cs *engine;
>>  	enum intel_engine_id id;
>>
>> +	/* First, stop submission to hw, but do not yet complete requests by
>> +	 * rolling the global seqno forward (since this would complete requests
>> +	 * for which we haven't set the fence error to EIO yet).
>> +	 */
>
> I'm doing a quiet fixup of all my comments to follow
> /*
>  * The core convention, which you normally use anyway.
>  */
>
>>	for_each_engine(engine, i915, id)
>> -		engine_set_wedged(engine);
>> +		engine->submit_request = nop_submit_request;
>>
>> -	set_bit(I915_WEDGED, &i915->gpu_error.flags);
>> -	wake_up_all(&i915->gpu_error.reset_queue);
>> +	/* Make sure no one is running the old callback before we proceed with
>> +	 * cancelling requests and resetting the completion tracking. Otherwise
>> +	 * we might submit a request to the hardware which never completes.
>> +	 */
>> +	synchronize_rcu();
>>
>> -	return 0;
>> -}
>> +	for_each_engine(engine, i915, id) {
>> +		/* Mark all executing requests as skipped */
>> +		engine->cancel_requests(engine);
>>
>> -void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
>> -{
>> -	stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
>> +		/* Only once we've force-cancelled all in-flight requests can we
>> +		 * start to complete all requests.
>> +		 */
>> +		engine->submit_request = nop_complete_submit_request;
>> +	}
>> +
>> +	/* Make sure no request can slip through without getting completed by
>> +	 * either this call here to intel_engine_init_global_seqno, or the one
>> +	 * in nop_complete_submit_request.
>> +	 */
>> +	synchronize_rcu();
>> +
>> +	for_each_engine(engine, i915, id) {
>> +		unsigned long flags;
>> +
>> +		/* Mark all pending requests as complete so that any concurrent
>> +		 * (lockless) lookup doesn't try and wait upon the request as we
>> +		 * reset it.
>> +		 */
>> +		spin_lock_irqsave(&engine->timeline->lock, flags);
>> +		intel_engine_init_global_seqno(engine,
>> +					       intel_engine_last_submit(engine));
>> +		spin_unlock_irqrestore(&engine->timeline->lock, flags);
>> +	}
>> +
>> +	set_bit(I915_WEDGED, &i915->gpu_error.flags);
>> +	wake_up_all(&i915->gpu_error.reset_queue);
>>  }
>>
>>  bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
>> index b100b38f1dd2..ef78a85cb845 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_request.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
>> @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>>  	switch (state) {
>>  	case FENCE_COMPLETE:
>>  		trace_i915_gem_request_submit(request);
>> +		rcu_read_lock();
>
> This trick needs a comment.
>
> /*
>  * We need to serialize use of the submit_request() callback with its
>  * hotplugging performed during an emergency i915_gem_set_wedged().
>  * We use the RCU mechanism to mark the critical section in order to
>  * force i915_gem_set_wedged() to wait until the submit_request() is
>  * completed before proceeding.
>  */
> -Chris
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
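The `rcu_read_lock()`/`synchronize_rcu()` pairing that Chris asks to document can be emulated in userspace with an active-reader counter. This is a sketch only: real RCU makes the reader side essentially free, whereas the counter here merely mimics the semantics, and all names are invented stand-ins.

```c
/* Userspace emulation of the reader/writer pairing behind the patch.
 * The counter stands in for RCU read-side critical sections. */
#include <stdatomic.h>

typedef void (*submit_fn)(int *hw_submissions);

static _Atomic submit_fn submit_request;   /* the "hotpluggable" callback   */
static atomic_int active_readers;          /* emulated read-side sections   */

static void real_submit(int *hw) { (*hw)++; }   /* touches "hardware" */
static void nop_submit(int *hw) { (void)hw; }   /* wedged: do nothing */

/* Reader side, as in submit_notify(): bracket the callback invocation. */
static void submit_notify(int *hw)
{
	atomic_fetch_add(&active_readers, 1);   /* rcu_read_lock()   */
	atomic_load(&submit_request)(hw);       /* old or new cb     */
	atomic_fetch_sub(&active_readers, 1);   /* rcu_read_unlock() */
}

/* Writer side, as in i915_gem_set_wedged(): publish, then wait. */
static void wedge(void)
{
	atomic_store(&submit_request, nop_submit);  /* publish new callback  */
	while (atomic_load(&active_readers))        /* synchronize_rcu():    */
		;                                   /* drain old readers     */
	/* Past this point no thread can still be inside real_submit(). */
}
```

The writer never blocks readers; it only waits until every reader that might have seen the old callback pointer has left its critical section, which is exactly the guarantee `i915_gem_set_wedged()` needs before cancelling requests.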
* ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock
  2017-10-09 16:44 [PATCH 1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock Daniel Vetter
  2017-10-09 16:44 ` [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged Daniel Vetter
@ 2017-10-09 18:42 ` Patchwork
  2017-10-10  0:56 ` ✓ Fi.CI.IGT: " Patchwork
  2017-10-10 10:57 ` [PATCH 1/2] " Daniel Vetter
  3 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2017-10-09 18:42 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock
URL   : https://patchwork.freedesktop.org/series/31601/
State : success

== Summary ==

Series 31601v1 series starting with [1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock
https://patchwork.freedesktop.org/api/1.0/series/31601/revisions/1/mbox/

fi-bdw-5557u     total:289  pass:268  dwarn:0  dfail:0  fail:0  skip:21   time:452s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0  dfail:0  fail:0  skip:24   time:464s
fi-blb-e6850     total:289  pass:223  dwarn:1  dfail:0  fail:0  skip:65   time:391s
fi-bsw-n3050     total:289  pass:243  dwarn:0  dfail:0  fail:0  skip:46   time:575s
fi-bwr-2160      total:289  pass:183  dwarn:0  dfail:0  fail:0  skip:106  time:284s
fi-bxt-dsi       total:289  pass:259  dwarn:0  dfail:0  fail:0  skip:30   time:525s
fi-bxt-j4205     total:289  pass:260  dwarn:0  dfail:0  fail:0  skip:29   time:521s
fi-byt-j1900     total:289  pass:253  dwarn:1  dfail:0  fail:0  skip:35   time:538s
fi-byt-n2820     total:289  pass:249  dwarn:1  dfail:0  fail:0  skip:39   time:522s
fi-cfl-s         total:289  pass:256  dwarn:1  dfail:0  fail:0  skip:32   time:558s
fi-cnl-y         total:289  pass:262  dwarn:0  dfail:0  fail:0  skip:27   time:627s
fi-elk-e7500     total:289  pass:229  dwarn:0  dfail:0  fail:0  skip:60   time:434s
fi-glk-1         total:289  pass:261  dwarn:0  dfail:0  fail:0  skip:28   time:601s
fi-hsw-4770      total:289  pass:262  dwarn:0  dfail:0  fail:0  skip:27   time:442s
fi-hsw-4770r     total:289  pass:262  dwarn:0  dfail:0  fail:0  skip:27   time:419s
fi-ilk-650       total:289  pass:228  dwarn:0  dfail:0  fail:0  skip:61   time:464s
fi-ivb-3520m     total:289  pass:260  dwarn:0  dfail:0  fail:0  skip:29   time:497s
fi-ivb-3770      total:289  pass:260  dwarn:0  dfail:0  fail:0  skip:29   time:475s
fi-kbl-7500u     total:289  pass:264  dwarn:1  dfail:0  fail:0  skip:24   time:505s
fi-kbl-7560u     total:289  pass:270  dwarn:0  dfail:0  fail:0  skip:19   time:577s
fi-kbl-7567u     total:289  pass:265  dwarn:4  dfail:0  fail:0  skip:20   time:490s
fi-kbl-r         total:289  pass:262  dwarn:0  dfail:0  fail:0  skip:27   time:591s
fi-pnv-d510      total:289  pass:222  dwarn:1  dfail:0  fail:0  skip:66   time:667s
fi-skl-6260u     total:289  pass:269  dwarn:0  dfail:0  fail:0  skip:20   time:466s
fi-skl-6700hq    total:289  pass:263  dwarn:0  dfail:0  fail:0  skip:26   time:652s
fi-skl-6700k     total:289  pass:265  dwarn:0  dfail:0  fail:0  skip:24   time:534s
fi-skl-6770hq    total:289  pass:269  dwarn:0  dfail:0  fail:0  skip:20   time:532s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0  dfail:0  fail:0  skip:23   time:473s
fi-snb-2520m     total:289  pass:250  dwarn:0  dfail:0  fail:0  skip:39   time:585s
fi-snb-2600      total:289  pass:249  dwarn:0  dfail:0  fail:0  skip:40   time:435s

18b15be401ecdc29d6b819714f054cfbb9f94a48 drm-tip: 2017y-10m-09d-17h-21m-07s UTC integration manifest
d71f3e3bc2fa drm/i915: Use rcu instead of stop_machine in set_wedged
bb437d41a183 drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5959/
* ✓ Fi.CI.IGT: success for series starting with [1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock
  2017-10-09 16:44 [PATCH 1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock Daniel Vetter
  2017-10-09 16:44 ` [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged Daniel Vetter
  2017-10-09 18:42 ` ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock Patchwork
@ 2017-10-10  0:56 ` Patchwork
  2017-10-10 10:57 ` [PATCH 1/2] " Daniel Vetter
  3 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2017-10-10 0:56 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock
URL   : https://patchwork.freedesktop.org/series/31601/
State : success

== Summary ==

Test gem_userptr_blits:
        Subgroup map-fixed-invalidate-overlap-gup:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup stress-mm-invalidate-close:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup dmabuf-unsync:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup coherency-sync:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup stress-mm-invalidate-close-overlap:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup map-fixed-invalidate-overlap-busy:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup map-fixed-invalidate-overlap:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup stress-mm:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup sync-overlap:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup process-exit-gtt:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup map-fixed-invalidate-overlap-busy-gup:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup map-fixed-invalidate-busy-gup:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup sync-unmap-after-close:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup process-exit-busy:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup sync-unmap:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup map-fixed-invalidate-busy:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup dmabuf-sync:
                dmesg-warn -> PASS       (shard-hsw)
        Subgroup process-exit-gtt-busy:
                dmesg-warn -> PASS       (shard-hsw)
Test gem_eio:
        Subgroup execbuf:
                dmesg-warn -> PASS       (shard-hsw) fdo#102886 +3
Test kms_setmode:
        Subgroup basic:
                fail       -> PASS       (shard-hsw) fdo#99912
Test drv_module_reload:
        Subgroup basic-no-display:
                pass       -> DMESG-WARN (shard-hsw) fdo#102707

fdo#102886 https://bugs.freedesktop.org/show_bug.cgi?id=102886
fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
fdo#102707 https://bugs.freedesktop.org/show_bug.cgi?id=102707

shard-hsw        total:2482 pass:1369 dwarn:2   dfail:0   fail:8   skip:1103 time:9937s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5959/shards.html
* Re: [PATCH 1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock
  2017-10-09 16:44 [PATCH 1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock Daniel Vetter
                   ` (2 preceding siblings ...)
  2017-10-10  0:56 ` ✓ Fi.CI.IGT: " Patchwork
@ 2017-10-10 10:57 ` Daniel Vetter
  3 siblings, 0 replies; 12+ messages in thread
From: Daniel Vetter @ 2017-10-10 10:57 UTC (permalink / raw)
To: Intel Graphics Development
Cc: Peter Zijlstra, Daniel Vetter, Tejun Heo, Daniel Vetter, Thomas Gleixner, Sasha Levin

On Mon, Oct 09, 2017 at 06:44:00PM +0200, Daniel Vetter wrote:
> 4.14-rc1 gained the fancy new cross-release support in lockdep, which
> seems to have uncovered a few more rules about what is allowed and
> isn't.
>
> This one here seems to indicate that allocating a work-queue while
> holding mmap_sem is a no-go, so let's try to preallocate it.
>
> Of course another way to break this chain would be somewhere in the
> cpu hotplug code, since this isn't the only trace we're finding now
> which goes through msr_create_device.
>
> Full lockdep splat:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.14.0-rc1-CI-CI_DRM_3118+ #1 Tainted: G     U
> ------------------------------------------------------
> prime_mmap/1551 is trying to acquire lock:
>  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8109dbb7>] apply_workqueue_attrs+0x17/0x50
>
> but task is already holding lock:
>  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #6 (&dev_priv->mm_lock){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        __mutex_lock+0x86/0x9b0
>        mutex_lock_nested+0x1b/0x20
>        i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
>        i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>        drm_ioctl_kernel+0x69/0xb0
>        drm_ioctl+0x2f9/0x3d0
>        do_vfs_ioctl+0x94/0x670
>        SyS_ioctl+0x41/0x70
>        entry_SYSCALL_64_fastpath+0x1c/0xb1
>
> -> #5 (&mm->mmap_sem){++++}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        __might_fault+0x68/0x90
>        _copy_to_user+0x23/0x70
>        filldir+0xa5/0x120
>        dcache_readdir+0xf9/0x170
>        iterate_dir+0x69/0x1a0
>        SyS_getdents+0xa5/0x140
>        entry_SYSCALL_64_fastpath+0x1c/0xb1
>
> -> #4 (&sb->s_type->i_mutex_key#5){++++}:
>        down_write+0x3b/0x70
>        handle_create+0xcb/0x1e0
>        devtmpfsd+0x139/0x180
>        kthread+0x152/0x190
>        ret_from_fork+0x27/0x40
>
> -> #3 ((complete)&req.done){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        wait_for_common+0x58/0x210
>        wait_for_completion+0x1d/0x20
>        devtmpfs_create_node+0x13d/0x160
>        device_add+0x5eb/0x620
>        device_create_groups_vargs+0xe0/0xf0
>        device_create+0x3a/0x40
>        msr_device_create+0x2b/0x40
>        cpuhp_invoke_callback+0xa3/0x840
>        cpuhp_thread_fun+0x7a/0x150
>        smpboot_thread_fn+0x18a/0x280
>        kthread+0x152/0x190
>        ret_from_fork+0x27/0x40
>
> -> #2 (cpuhp_state){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        cpuhp_issue_call+0x10b/0x170
>        __cpuhp_setup_state_cpuslocked+0x134/0x2a0
>        __cpuhp_setup_state+0x46/0x60
>        page_writeback_init+0x43/0x67
>        pagecache_init+0x3d/0x42
>        start_kernel+0x3a8/0x3fc
>        x86_64_start_reservations+0x2a/0x2c
>        x86_64_start_kernel+0x6d/0x70
>        verify_cpu+0x0/0xfb
>
> -> #1 (cpuhp_state_mutex){+.+.}:
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        __mutex_lock+0x86/0x9b0
>        mutex_lock_nested+0x1b/0x20
>        __cpuhp_setup_state_cpuslocked+0x52/0x2a0
>        __cpuhp_setup_state+0x46/0x60
>        page_alloc_init+0x28/0x30
>        start_kernel+0x145/0x3fc
>        x86_64_start_reservations+0x2a/0x2c
>        x86_64_start_kernel+0x6d/0x70
>        verify_cpu+0x0/0xfb
>
> -> #0 (cpu_hotplug_lock.rw_sem){++++}:
>        check_prev_add+0x430/0x840
>        __lock_acquire+0x1420/0x15e0
>        lock_acquire+0xb0/0x200
>        cpus_read_lock+0x3d/0xb0
>        apply_workqueue_attrs+0x17/0x50
>        __alloc_workqueue_key+0x1d8/0x4d9
>        i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
>        i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>        drm_ioctl_kernel+0x69/0xb0
>        drm_ioctl+0x2f9/0x3d0
>        do_vfs_ioctl+0x94/0x670
>        SyS_ioctl+0x41/0x70
>        entry_SYSCALL_64_fastpath+0x1c/0xb1
>
> other info that might help us debug this:
>
> Chain exists of:
>   cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev_priv->mm_lock
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&dev_priv->mm_lock);
>                                lock(&mm->mmap_sem);
>                                lock(&dev_priv->mm_lock);
>   lock(cpu_hotplug_lock.rw_sem);
>
>  *** DEADLOCK ***
>
> 2 locks held by prime_mmap/1551:
>  #0:  (&mm->mmap_sem){++++}, at: [<ffffffffa01a7b18>] i915_gem_userptr_init__mmu_notifier+0x138/0x270 [i915]
>  #1:  (&dev_priv->mm_lock){+.+.}, at: [<ffffffffa01a7b2a>] i915_gem_userptr_init__mmu_notifier+0x14a/0x270 [i915]
>
> stack backtrace:
> CPU: 4 PID: 1551 Comm: prime_mmap Tainted: G     U          4.14.0-rc1-CI-CI_DRM_3118+ #1
> Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
> Call Trace:
>  dump_stack+0x68/0x9f
>  print_circular_bug+0x235/0x3c0
>  ? lockdep_init_map_crosslock+0x20/0x20
>  check_prev_add+0x430/0x840
>  __lock_acquire+0x1420/0x15e0
>  ? __lock_acquire+0x1420/0x15e0
>  ? lockdep_init_map_crosslock+0x20/0x20
>  lock_acquire+0xb0/0x200
>  ? apply_workqueue_attrs+0x17/0x50
>  cpus_read_lock+0x3d/0xb0
>  ? apply_workqueue_attrs+0x17/0x50
>  apply_workqueue_attrs+0x17/0x50
>  __alloc_workqueue_key+0x1d8/0x4d9
>  ? __lockdep_init_map+0x57/0x1c0
>  i915_gem_userptr_init__mmu_notifier+0x1fb/0x270 [i915]
>  i915_gem_userptr_ioctl+0x222/0x2c0 [i915]
>  ? i915_gem_userptr_release+0x140/0x140 [i915]
>  drm_ioctl_kernel+0x69/0xb0
>  drm_ioctl+0x2f9/0x3d0
>  ? i915_gem_userptr_release+0x140/0x140 [i915]
>  ? __do_page_fault+0x2a4/0x570
>  do_vfs_ioctl+0x94/0x670
>  ? entry_SYSCALL_64_fastpath+0x5/0xb1
>  ? __this_cpu_preempt_check+0x13/0x20
>  ? trace_hardirqs_on_caller+0xe3/0x1b0
>  SyS_ioctl+0x41/0x70
>  entry_SYSCALL_64_fastpath+0x1c/0xb1
> RIP: 0033:0x7fbb83c39587
> RSP: 002b:00007fff188dc228 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: ffffffff81492963 RCX: 00007fbb83c39587
> RDX: 00007fff188dc260 RSI: 00000000c0186473 RDI: 0000000000000003
> RBP: ffffc90001487f88 R08: 0000000000000000 R09: 00007fff188dc2ac
> R10: 00007fbb83efcb58 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000003 R14: 00000000c0186473 R15: 00007fff188dc2ac
>  ? __this_cpu_preempt_check+0x13/0x20
>
> Note that this also has the minor benefit of slightly reducing the
> critical section where we hold mmap_sem.
>
> v2: Set ret correctly when we raced with another thread.
>
> v3: Use Chris' diff. Attach the right lockdep splat.
>
> v4: Repaint in Tvrtko's colors (aka don't report ENOMEM if we race and
> some other thread managed to not also get an ENOMEM and successfully
> install the mmu notifier. Note that the kernel guarantees that small
> allocations succeed, so this never actually happens).
>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sasha Levin <alexander.levin@verizon.com>
> Cc: Marta Lofstedt <marta.lofstedt@intel.com>
> Cc: Tejun Heo <tj@kernel.org>
> References: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3180/shard-hsw3/igt@prime_mmap@test_userptr.html
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102939
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Merged this one with the r-b from Tvrtko from the previous round (I
somehow missed that). But the snag in set_wedged is the one that CI is
hitting a lot more often.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_userptr.c | 38 ++++++++++++++++++++-------------
>  1 file changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 41e16e19c3f3..65fbf334508e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -164,7 +164,6 @@ static struct i915_mmu_notifier *
>  i915_mmu_notifier_create(struct mm_struct *mm)
>  {
>  	struct i915_mmu_notifier *mn;
> -	int ret;
>
>  	mn = kmalloc(sizeof(*mn), GFP_KERNEL);
>  	if (mn == NULL)
> @@ -179,14 +178,6 @@ i915_mmu_notifier_create(struct mm_struct *mm)
>  		return ERR_PTR(-ENOMEM);
>  	}
>
> -	/* Protected by mmap_sem (write-lock) */
> -	ret = __mmu_notifier_register(&mn->mn, mm);
> -	if (ret) {
> -		destroy_workqueue(mn->wq);
> -		kfree(mn);
> -		return ERR_PTR(ret);
> -	}
> -
>  	return mn;
>  }
>
> @@ -210,23 +201,40 @@ i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
>  static struct i915_mmu_notifier *
>  i915_mmu_notifier_find(struct i915_mm_struct *mm)
>  {
> -	struct i915_mmu_notifier *mn = mm->mn;
> +	struct i915_mmu_notifier *mn;
> +	int err = 0;
>
>  	mn = mm->mn;
>  	if (mn)
>  		return mn;
>
> +	mn = i915_mmu_notifier_create(mm->mm);
> +	if (IS_ERR(mn))
> +		err = PTR_ERR(mn);
> +
>  	down_write(&mm->mm->mmap_sem);
>  	mutex_lock(&mm->i915->mm_lock);
> -	if ((mn = mm->mn) == NULL) {
> -		mn = i915_mmu_notifier_create(mm->mm);
> -		if (!IS_ERR(mn))
> -			mm->mn = mn;
> +	if (mm->mn == NULL && !err) {
> +		/* Protected by mmap_sem (write-lock) */
> +		err = __mmu_notifier_register(&mn->mn, mm->mm);
> +		if (!err) {
> +			/* Protected by mm_lock */
> +			mm->mn = fetch_and_zero(&mn);
> +		}
> +	} else {
> +		/* someone else raced and successfully installed the mmu
> +		 * notifier, we can cancel our own errors */
> +		err = 0;
>  	}
>  	mutex_unlock(&mm->i915->mm_lock);
>  	up_write(&mm->mm->mmap_sem);
>
> -	return mn;
> +	if (mn) {
> +		destroy_workqueue(mn->wq);
> +		kfree(mn);
> +	}
> +
> +	return err ? ERR_PTR(err) : mm->mn;
>  }
>
>  static int
> --
> 2.14.1
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
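The reordering in this diff — allocate before taking `mmap_sem`/`mm_lock`, register under the lock, and discard the loser's copy if another thread won the race — is a general pattern for keeping blocking allocations out of lock-ordering chains. Below is a hedged userspace analogue: stand-in types and a pthread mutex replace the kernel locks, and the simplified error handling does not reproduce the patch's ENOMEM-cancellation detail.

```c
/* Userspace sketch of the preallocate-then-race-to-install pattern.
 * All names here are invented stand-ins for the i915 structures. */
#include <pthread.h>
#include <stdlib.h>

struct notifier { int registered; };

struct mm_struct {
	pthread_mutex_t lock;    /* stand-in for mmap_sem + mm_lock */
	struct notifier *mn;     /* lazily created, protected by lock */
};

static int creations;            /* how many notifiers were ever allocated */

static struct notifier *notifier_create(void)
{
	creations++;
	return calloc(1, sizeof(struct notifier));
}

static struct notifier *notifier_find(struct mm_struct *mm)
{
	struct notifier *mn;

	mn = mm->mn;
	if (mn)
		return mn;

	/* Allocate BEFORE taking the lock: blocking allocations under
	 * mmap_sem are what created the lockdep chain into
	 * cpu_hotplug_lock in the first place. */
	mn = notifier_create();
	if (!mn)
		return NULL;

	pthread_mutex_lock(&mm->lock);
	if (!mm->mn) {
		mn->registered = 1;      /* ~ __mmu_notifier_register() */
		mm->mn = mn;
		mn = NULL;               /* ownership transferred */
	}
	pthread_mutex_unlock(&mm->lock);

	free(mn);                        /* lost the race: discard our copy */
	return mm->mn;
}
```

The cost is one potentially wasted allocation on a race, which is cheap compared to widening the lock's dependency graph.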
end of thread, other threads:[~2017-10-10 14:09 UTC | newest]

Thread overview: 12+ messages
2017-10-09 16:44 [PATCH 1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock Daniel Vetter
2017-10-09 16:44 ` [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged Daniel Vetter
2017-10-10  9:21   ` Chris Wilson
2017-10-10 12:30     ` Daniel Vetter
2017-10-10 13:29       ` Chris Wilson
2017-10-10  9:50   ` Mika Kuoppala
2017-10-10 12:37     ` Daniel Vetter
2017-10-10 13:39   ` Chris Wilson
2017-10-10 14:09     ` Daniel Vetter
2017-10-09 18:42 ` ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915: Preallocate our mmu notifier workequeu to unbreak cpu hotplug deadlock Patchwork
2017-10-10  0:56 ` ✓ Fi.CI.IGT: " Patchwork
2017-10-10 10:57 ` [PATCH 1/2] " Daniel Vetter