* 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
From: Martin Wilck @ 2019-08-09 12:41 UTC
To: intel-gfx; +Cc: linux-kernel

This happened to me today, running kernel 5.3.0-rc3-1.g571863b-default
(5.3-rc3 with just a few patches on top), after starting a KVM virtual
machine. The X screen was frozen. Remote login via ssh was still
possible, thus I was able to retrieve basic logs.

sysrq-w showed two blocked processes (kcompactd0 and KVM). After a
minute, the same two processes were still blocked. KVM seems to try to
acquire a lock that kcompactd is holding. kcompactd is waiting for IO
to complete on pages owned by the i915 driver.

kcompactd stack:

Aug 09 12:12:48 apollon.suse.de kernel: sysrq: Show Blocked State
Aug 09 12:12:48 apollon.suse.de kernel: task PC stack pid father
Aug 09 12:12:48 apollon.suse.de kernel: kcompactd0 D 0 43 2 0x80004000
Aug 09 12:12:48 apollon.suse.de kernel: Call Trace:
Aug 09 12:12:48 apollon.suse.de kernel: ? __schedule+0x2af/0x6a0
Aug 09 12:12:48 apollon.suse.de kernel: schedule+0x33/0x90
Aug 09 12:12:48 apollon.suse.de kernel: io_schedule+0x12/0x40
Aug 09 12:12:48 apollon.suse.de kernel: __lock_page+0x123/0x200
Aug 09 12:12:48 apollon.suse.de kernel: ? gen8_ppgtt_clear_pdp+0xc0/0x140 [i915]
Aug 09 12:12:48 apollon.suse.de kernel: ? file_fdatawait_range+0x20/0x20
Aug 09 12:12:48 apollon.suse.de kernel: set_page_dirty_lock+0x49/0x50
Aug 09 12:12:48 apollon.suse.de kernel: i915_gem_userptr_put_pages+0x13f/0x1c0 [i915]
Aug 09 12:12:48 apollon.suse.de kernel: __i915_gem_object_put_pages+0x5e/0xa0 [i915]
Aug 09 12:12:48 apollon.suse.de kernel: userptr_mn_invalidate_range_start+0x1ff/0x220 [i915]
Aug 09 12:12:48 apollon.suse.de kernel: __mmu_notifier_invalidate_range_start+0x57/0xa0
Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap_one+0xa0b/0xae0
Aug 09 12:12:48 apollon.suse.de kernel: ? __mod_lruvec_state+0x3f/0xf0
Aug 09 12:12:48 apollon.suse.de kernel: rmap_walk_file+0xf2/0x250
Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap+0xa6/0xe0
Aug 09 12:12:48 apollon.suse.de kernel: ? page_remove_rmap+0x290/0x290
Aug 09 12:12:48 apollon.suse.de kernel: ? page_not_mapped+0x20/0x20
Aug 09 12:12:48 apollon.suse.de kernel: ? page_get_anon_vma+0x80/0x80
Aug 09 12:12:48 apollon.suse.de kernel: migrate_pages+0x8cd/0xbc0
Aug 09 12:12:48 apollon.suse.de kernel: ? fast_isolate_freepages+0x6b0/0x6b0
Aug 09 12:12:48 apollon.suse.de kernel: ? move_freelist_tail+0xb0/0xb0
Aug 09 12:12:48 apollon.suse.de kernel: compact_zone+0x669/0xc80
Aug 09 12:12:48 apollon.suse.de kernel: ? entry_SYSCALL_64_after_hwframe+0xb8/0xbe
Aug 09 12:12:48 apollon.suse.de kernel: kcompactd_do_work+0x120/0x290

KVM stack:

Aug 09 12:12:48 apollon.suse.de kernel: CPU 0/KVM D 0 25189 1 0x00000320
Aug 09 12:12:48 apollon.suse.de kernel: Call Trace:
Aug 09 12:12:48 apollon.suse.de kernel: ? __schedule+0x2af/0x6a0
Aug 09 12:12:48 apollon.suse.de kernel: schedule+0x33/0x90
Aug 09 12:12:48 apollon.suse.de kernel: schedule_preempt_disabled+0xa/0x10
Aug 09 12:12:48 apollon.suse.de kernel: __mutex_lock.isra.0+0x172/0x4d0
Aug 09 12:12:48 apollon.suse.de kernel: userptr_mn_invalidate_range_start+0x1bf/0x220 [i915]
Aug 09 12:12:48 apollon.suse.de kernel: __mmu_notifier_invalidate_range_start+0x57/0xa0
Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap_one+0xa0b/0xae0
Aug 09 12:12:48 apollon.suse.de kernel: rmap_walk_file+0xf2/0x250
Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap+0xa6/0xe0
Aug 09 12:12:48 apollon.suse.de kernel: ? page_remove_rmap+0x290/0x290
Aug 09 12:12:48 apollon.suse.de kernel: ? page_not_mapped+0x20/0x20
Aug 09 12:12:48 apollon.suse.de kernel: ? page_get_anon_vma+0x80/0x80
Aug 09 12:12:48 apollon.suse.de kernel: migrate_pages+0x8cd/0xbc0
Aug 09 12:12:48 apollon.suse.de kernel: ? fast_isolate_freepages+0x6b0/0x6b0
Aug 09 12:12:48 apollon.suse.de kernel: ? move_freelist_tail+0xb0/0xb0
Aug 09 12:12:48 apollon.suse.de kernel: compact_zone+0x669/0xc80
Aug 09 12:12:48 apollon.suse.de kernel: compact_zone_order+0xc6/0xf0
Aug 09 12:12:48 apollon.suse.de kernel: try_to_compact_pages+0xcc/0x2a0
Aug 09 12:12:48 apollon.suse.de kernel: __alloc_pages_direct_compact+0x7c/0x150
Aug 09 12:12:48 apollon.suse.de kernel: __alloc_pages_slowpath+0x1ee/0xd00
Aug 09 12:12:48 apollon.suse.de kernel: ? vmx_vcpu_load+0x100/0x120 [kvm_intel]

Full logs can be found under https://pastebin.com/KJ6tccj4

I haven't yet tried if this is reproducible.

Regards
Martin

* Re: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
From: Chris Wilson @ 2019-08-09 12:53 UTC
To: intel-gfx, Martin Wilck; +Cc: linux-kernel

Quoting Martin Wilck (2019-08-09 13:41:42)
> This happened to me today, running kernel 5.3.0-rc3-1.g571863b-default
> (5.3-rc3 with just a few patches on top), after starting a KVM virtual
> machine. The X screen was frozen. Remote login via ssh was still
> possible, thus I was able to retrieve basic logs.
>
> sysrq-w showed two blocked processes (kcompactd0 and KVM). After a
> minute, the same two processes were still blocked. KVM seems to try to
> acquire a lock that kcompactd is holding. kcompactd is waiting for IO
> to complete on pages owned by the i915 driver.

My bad, it's known. We haven't decided on whether to revert the
unfortunate recursive locking (and so hit another warn elsewhere) or to
ignore the dirty pages (and so risk losing data across swap).

cb6d7c7dc7ff ("drm/i915/userptr: Acquire the page lock around set_page_dirty()")
-Chris

* Re: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
From: Leho Kraav @ 2019-09-10 14:20 UTC
To: Chris Wilson; +Cc: intel-gfx, Martin Wilck, linux-kernel

On Fri, Aug 09, 2019 at 01:53:43PM +0100, Chris Wilson wrote:
> Quoting Martin Wilck (2019-08-09 13:41:42)
> > This happened to me today, running kernel 5.3.0-rc3-1.g571863b-default
> > (5.3-rc3 with just a few patches on top), after starting a KVM virtual
> > machine. The X screen was frozen. Remote login via ssh was still
> > possible, thus I was able to retrieve basic logs.
> >
> > sysrq-w showed two blocked processes (kcompactd0 and KVM). After a
> > minute, the same two processes were still blocked. KVM seems to try to
> > acquire a lock that kcompactd is holding. kcompactd is waiting for IO
> > to complete on pages owned by the i915 driver.
>
> My bad, it's known. We haven't decided on whether to revert the
> unfortunate recursive locking (and so hit another warn elsewhere) or to
> ignore the dirty pages (and so risk losing data across swap).
>
> cb6d7c7dc7ff ("drm/i915/userptr: Acquire the page lock around set_page_dirty()")
> -Chris

Hi Chris. Is this exactly what I'm hitting at
https://bugs.freedesktop.org/show_bug.cgi?id=111500 perhaps?

It reliably breaks the graphics userland, as the machine consistently
freezes at any random moment.

Any workaround options, even if with a performance penalty? Revert
cb6d7c7dc7ff but side effects?

5.3 has useful NVMe power mgmt updates for laptops, I'd like to stick
with the newest if possible.

--
Leho Kraav, senior technology & digital marketing architect

* Re: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
From: Martin Wilck @ 2019-09-12 11:23 UTC
To: chris; +Cc: torvalds, Michal Koutny, intel-gfx, leho, tiwai, linux-kernel

Hi Chris,

On Tue, 2019-09-10 at 17:20 +0300, Leho Kraav wrote:
> On Fri, Aug 09, 2019 at 01:53:43PM +0100, Chris Wilson wrote:
> > Quoting Martin Wilck (2019-08-09 13:41:42)
> > > This happened to me today, running kernel 5.3.0-rc3-1.g571863b-default
> > > (5.3-rc3 with just a few patches on top), after starting a KVM virtual
> > > machine. The X screen was frozen. Remote login via ssh was still
> > > possible, thus I was able to retrieve basic logs.
> > >
> > > sysrq-w showed two blocked processes (kcompactd0 and KVM). After a
> > > minute, the same two processes were still blocked. KVM seems to try to
> > > acquire a lock that kcompactd is holding. kcompactd is waiting for IO
> > > to complete on pages owned by the i915 driver.
> >
> > My bad, it's known. We haven't decided on whether to revert the
> > unfortunate recursive locking (and so hit another warn elsewhere) or to
> > ignore the dirty pages (and so risk losing data across swap).
> >
> > cb6d7c7dc7ff ("drm/i915/userptr: Acquire the page lock around set_page_dirty()")
> > -Chris
>
> Hi Chris. Is this exactly what I'm hitting at
> https://bugs.freedesktop.org/show_bug.cgi?id=111500 perhaps?
>
> It reliably breaks the graphics userland, as the machine consistently
> freezes at any random moment.
>
> Any workaround options, even if with a performance penalty? Revert
> cb6d7c7dc7ff but side effects?
>
> 5.3 has useful NVMe power mgmt updates for laptops, I'd like to stick
> with the newest if possible.

There's a considerable risk that many users will start seeing this
regression when 5.3 is released. I am not aware of a workaround.

Is there an alternative to reverting aa56a292ce62 ("drm/i915/userptr:
Acquire the page lock around set_page_dirty()")? And if we do, what
would be the consequences? Would other patches need to be reverted,
too?

Thanks,
Martin

* Re: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
From: leho @ 2019-09-12 11:58 UTC
To: Martin Wilck
Cc: chris, torvalds, Michal Koutny, intel-gfx, tiwai, linux-kernel

On Thu, Sep 12, 2019 at 11:23:09AM +0000, Martin Wilck wrote:
>
> There's a considerable risk that many users will start seeing this
> regression when 5.3 is released. I am not aware of a workaround.
>
> Is there an alternative to reverting aa56a292ce62 ("drm/i915/userptr:
> Acquire the page lock around set_page_dirty()")? And if we do, what
> would be the consequences? Would other patches need to be reverted,
> too?

I've been running with revert patch for a couple of days and have not
encountered any kernel warnings thus far, nor any other ill effects that
could be attributed to this locking mechanism. But I'm far from familiar
with these subsystems.

Graphics does not hang anymore.

I've also received developer feedback in private that this really should
be fixed before 5.3 release.

--
Leho Kraav, senior technology & digital marketing architect

* Re: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
From: Linus Torvalds @ 2019-09-12 11:59 UTC
To: Martin Wilck; +Cc: chris, Michal Koutny, intel-gfx, leho, tiwai, linux-kernel

On Thu, Sep 12, 2019 at 12:51 PM Martin Wilck <Martin.Wilck@suse.com> wrote:
>
> Is there an alternative to reverting aa56a292ce62 ("drm/i915/userptr:
> Acquire the page lock around set_page_dirty()")? And if we do, what
> would be the consequences? Would other patches need to be reverted,
> too?

Looking at that commit, and the backtrace of the lockup, I think that
reverting it is the correct thing to do.

You can't take the page lock in invalidate_range(), since it's called
from try_to_unmap(), which is called with the page lock already held.

So commit aa56a292ce62 is just fundamentally completely wrong and
should be reverted.

Linus

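For readers less familiar with why re-taking a lock that is already held hangs, here is a minimal userspace sketch in C. It is an analogy only: it assumes a POSIX error-checking mutex as a stand-in for the page lock (the kernel's page lock is a flag bit in struct page, not a pthread mutex), and the names page_lock, notifier_dirty_locked() and unmap_with_lock_held() are hypothetical. The error-checking mutex type makes the nested lock attempt return EDEADLK instead of blocking, which marks the point where the real task sleeps in __lock_page().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the page lock (not how the kernel does it). */
static pthread_mutex_t page_lock;

/*
 * Create the lock as an error-checking mutex, so that a second lock
 * attempt by the current owner reports EDEADLK instead of hanging.
 */
static void page_lock_init(void)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	assert(ret == 0);
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(ret == 0);
	ret = pthread_mutex_init(&page_lock, &attr);
	assert(ret == 0);
	pthread_mutexattr_destroy(&attr);
}

/* Models a callback that takes the lock itself before dirtying the page. */
static int notifier_dirty_locked(void)
{
	int ret = pthread_mutex_lock(&page_lock);

	if (ret == 0)
		pthread_mutex_unlock(&page_lock);
	return ret;
}

/* Models the caller: it already holds the lock when it runs the callback. */
static int unmap_with_lock_held(void)
{
	int ret;

	pthread_mutex_lock(&page_lock);
	ret = notifier_dirty_locked();	/* nested lock attempt by the owner */
	pthread_mutex_unlock(&page_lock);
	return ret;
}
```

After page_lock_init(), calling notifier_dirty_locked() on its own returns 0, while unmap_with_lock_held() returns EDEADLK. With a PTHREAD_MUTEX_NORMAL mutex the nested call would deadlock instead of returning an error.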
* Re: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
From: Chris Wilson @ 2019-09-12 12:47 UTC
To: Linus Torvalds, Martin Wilck
Cc: Michal Koutny, intel-gfx, leho, tiwai, linux-kernel

Quoting Linus Torvalds (2019-09-12 12:59:25)
> On Thu, Sep 12, 2019 at 12:51 PM Martin Wilck <Martin.Wilck@suse.com> wrote:
> >
> > Is there an alternative to reverting aa56a292ce62 ("drm/i915/userptr:
> > Acquire the page lock around set_page_dirty()")? And if we do, what
> > would be the consequences? Would other patches need to be reverted,
> > too?
>
> Looking at that commit, and the backtrace of the lockup, I think that
> reverting it is the correct thing to do.
>
> You can't take the page lock in invalidate_range(), since it's called
> from try_to_unmap(), which is called with the page lock already held.
>
> So commit aa56a292ce62 is just fundamentally completely wrong and
> should be reverted.

There's still the dilemma that we get called without the page lock, but
at this moment in time in order to hit 5.3, it needs a revert sent
directly to Linus.
-Chris

* [PATCH] Revert "drm/i915/userptr: Acquire the page lock around set_page_dirty()"
From: Chris Wilson @ 2019-09-12 12:56 UTC
To: intel-gfx, torvalds
Cc: Martin.Wilck, MKoutny, leho, tiwai, linux-kernel, Chris Wilson,
	Lionel Landwerlin, Tvrtko Ursulin, Jani Nikula, Joonas Lahtinen,
	stable

The userptr put_pages can be called from inside try_to_unmap, and so
enters with the page lock held on one of the object's backing pages. We
cannot take the page lock ourselves for fear of recursion.

Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Martin Wilck <Martin.Wilck@suse.com>
Reported-by: Leo Kraav <leho@kraav.com>
Fixes: aa56a292ce62 ("drm/i915/userptr: Acquire the page lock around set_page_dirty()")
References: https://bugzilla.kernel.org/show_bug.cgi?id=203317
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 74da35611d7c..11b231c187c5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -672,15 +672,7 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
 
 	for_each_sgt_page(page, sgt_iter, pages) {
 		if (obj->mm.dirty)
-			/*
-			 * As this may not be anonymous memory (e.g. shmem)
-			 * but exist on a real mapping, we have to lock
-			 * the page in order to dirty it -- holding
-			 * the page reference is not sufficient to
-			 * prevent the inode from being truncated.
-			 * Play safe and take the lock.
-			 */
-			set_page_dirty_lock(page);
+			set_page_dirty(page);
 
 		mark_page_accessed(page);
 		put_page(page);
--
2.23.0

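The effect of the one-line change on the put_pages loop can be illustrated with a small standalone model in C. This is a sketch under stated assumptions, not kernel code: struct model_page and the model_* functions are hypothetical, the page lock is reduced to a boolean, and "would block" is represented by returning false rather than sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page: only the two bits this sketch needs. */
struct model_page {
	bool locked;	/* page lock is currently held by someone */
	bool dirty;
};

/*
 * Models the pre-revert call: the lock is needed before dirtying.
 * Returns false where the real code would sleep waiting for the lock.
 */
static bool model_set_page_dirty_lock(struct model_page *page)
{
	if (page->locked)
		return false;
	page->locked = true;
	page->dirty = true;
	page->locked = false;
	return true;
}

/* Models the post-revert call: dirties the page without the lock. */
static bool model_set_page_dirty(struct model_page *page)
{
	page->dirty = true;
	return true;
}

/*
 * Walks the pages in order, as the put_pages loop does, and returns
 * how many pages were dirtied before the walk would have stalled.
 */
static size_t model_put_pages(struct model_page *pages, size_t count,
			      bool take_lock)
{
	size_t done = 0;

	assert(pages != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		bool ok = take_lock ? model_set_page_dirty_lock(&pages[i])
				    : model_set_page_dirty(&pages[i]);
		if (!ok)
			break;
		done++;
	}
	return done;
}
```

With three pages of which the second is already locked by the caller, model_put_pages(pages, 3, true) stops after one page, while model_put_pages(pages, 3, false) dirties all three. The model deliberately ignores the trade-off raised earlier in the thread, namely that dirtying without the lock is not safe against truncation when the callback is entered without the page lock held.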
* Re: [Intel-gfx] [PATCH] Revert "drm/i915/userptr: Acquire the page lock around set_page_dirty()"
From: Tvrtko Ursulin @ 2019-09-12 14:39 UTC
To: Chris Wilson, intel-gfx, torvalds
Cc: tiwai, linux-kernel, leho, Jani Nikula, MKoutny, stable, Martin.Wilck

On 12/09/2019 13:56, Chris Wilson wrote:
> The userptr put_pages can be called from inside try_to_unmap, and so
> enters with the page lock held on one of the object's backing pages. We
> cannot take the page lock ourselves for fear of recursion.
>
> Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Reported-by: Martin Wilck <Martin.Wilck@suse.com>
> Reported-by: Leo Kraav <leho@kraav.com>
> Fixes: aa56a292ce62 ("drm/i915/userptr: Acquire the page lock around set_page_dirty()")
> References: https://bugzilla.kernel.org/show_bug.cgi?id=203317
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Jani Nikula <jani.nikula@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> index 74da35611d7c..11b231c187c5 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> @@ -672,15 +672,7 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
>
>  	for_each_sgt_page(page, sgt_iter, pages) {
>  		if (obj->mm.dirty)
> -			/*
> -			 * As this may not be anonymous memory (e.g. shmem)
> -			 * but exist on a real mapping, we have to lock
> -			 * the page in order to dirty it -- holding
> -			 * the page reference is not sufficient to
> -			 * prevent the inode from being truncated.
> -			 * Play safe and take the lock.
> -			 */
> -			set_page_dirty_lock(page);
> +			set_page_dirty(page);
>
>  		mark_page_accessed(page);
>  		put_page(page);
>

Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

end of thread

Thread overview: 9+ messages

2019-08-09 12:41 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages Martin Wilck
2019-08-09 12:53 ` Chris Wilson
2019-09-10 14:20   ` Leho Kraav
2019-09-12 11:23     ` Martin Wilck
2019-09-12 11:58       ` leho
2019-09-12 11:59       ` Linus Torvalds
2019-09-12 12:47         ` Chris Wilson
2019-09-12 12:56         ` [PATCH] Revert "drm/i915/userptr: Acquire the page lock around set_page_dirty()" Chris Wilson
2019-09-12 14:39           ` [Intel-gfx] " Tvrtko Ursulin