* Re: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
From: Hillf Danton @ 2019-08-09 14:31 UTC
To: Martin Wilck; +Cc: intel-gfx, linux-kernel
[off topic: plain text mail please]
On Fri, 9 Aug 2019 12:41:42 +0000 Martin Wilck wrote:
>
> This happened to me today, running kernel 5.3.0-rc3-1.g571863b-default
> (5.3-rc3 with just a few patches on top), after starting a KVM virtual
> machine. The X screen was frozen. Remote login via ssh was still
> possible, thus I was able to retrieve basic logs.
Thanks for the report.
>
> sysrq-w showed two blocked processes (kcompactd0 and KVM). After a
> minute, the same two processes were still blocked. KVM seems to try to
> acquire a lock that kcompactd is holding. kcompactd is waiting for IO
> to complete on pages owned by the i915 driver.
>
> kcompactd stack:
>
> Aug 09 12:12:48 apollon.suse.de kernel: sysrq: Show Blocked State
> Aug 09 12:12:48 apollon.suse.de kernel: task PC stack pid father
> Aug 09 12:12:48 apollon.suse.de kernel: kcompactd0 D 0 43 2 0x80004000
> Aug 09 12:12:48 apollon.suse.de kernel: Call Trace:
> Aug 09 12:12:48 apollon.suse.de kernel: ? __schedule+0x2af/0x6a0
> Aug 09 12:12:48 apollon.suse.de kernel: schedule+0x33/0x90
> Aug 09 12:12:48 apollon.suse.de kernel: io_schedule+0x12/0x40
> Aug 09 12:12:48 apollon.suse.de kernel: __lock_page+0x123/0x200
> Aug 09 12:12:48 apollon.suse.de kernel: ? gen8_ppgtt_clear_pdp+0xc0/0x140 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel: ? file_fdatawait_range+0x20/0x20
> Aug 09 12:12:48 apollon.suse.de kernel: set_page_dirty_lock+0x49/0x50
> Aug 09 12:12:48 apollon.suse.de kernel: i915_gem_userptr_put_pages+0x13f/0x1c0 [i915]
The two lines above show that commit aa56a292ce62 ("drm/i915/userptr:
Acquire the page lock around set_page_dirty()") is the culprit.
> Aug 09 12:12:48 apollon.suse.de kernel: __i915_gem_object_put_pages+0x5e/0xa0 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel: userptr_mn_invalidate_range_start+0x1ff/0x220 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel: __mmu_notifier_invalidate_range_start+0x57/0xa0
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap_one+0xa0b/0xae0
> Aug 09 12:12:48 apollon.suse.de kernel: ? __mod_lruvec_state+0x3f/0xf0
> Aug 09 12:12:48 apollon.suse.de kernel: rmap_walk_file+0xf2/0x250
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap+0xa6/0xe0
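The page is already locked before try_to_unmap() is called, and the dirty
page table entry is handled in try_to_unmap_one(), so the page lock added
by aa56a292ce62 is unnecessary in this call trace; taking it here
apparently means waiting on a lock the migration path already holds.
A bigger pain is that the commit cannot simply be reverted, because it
carries a Fixes tag.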
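To illustrate the lock ordering described above, here is a minimal
userspace sketch in C. It is only a model, not kernel code: page_model,
lock_page_bounded(), put_pages_model() and migrate_page_model() are
hypothetical names, a single atomic flag stands in for the page lock, and
the sleeping lock is replaced by a bounded retry loop so that the example
terminates instead of hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only: one flag stands in for the kernel's page lock. */
struct page_model {
	atomic_flag locked;
};

static bool trylock_page_model(struct page_model *page)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&page->locked);
}

static void unlock_page_model(struct page_model *page)
{
	atomic_flag_clear(&page->locked);
}

/*
 * Stand-in for a sleeping lock. It gives up after max_tries attempts so
 * the example terminates; a real blocking lock would wait without limit.
 */
static bool lock_page_bounded(struct page_model *page, int max_tries)
{
	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++)
		if (trylock_page_model(page))
			return true;
	return false;
}

/* Hypothetical stand-in for the callback that locks the page again. */
static bool put_pages_model(struct page_model *page)
{
	if (!lock_page_bounded(page, 1000))
		return false;	/* a blocking lock would never return here */
	unlock_page_model(page);
	return true;
}

/* Hypothetical stand-in for migration: lock the page, run the callback. */
static bool migrate_page_model(void)
{
	struct page_model page = { .locked = ATOMIC_FLAG_INIT };
	bool ok;

	if (!trylock_page_model(&page))
		return false;
	ok = put_pages_model(&page);	/* same context, lock already held */
	unlock_page_model(&page);
	return ok;
}
```

In this model migrate_page_model() returns false, because the callback can
never obtain a lock its own caller holds, while put_pages_model() on an
unlocked page succeeds.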
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_remove_rmap+0x290/0x290
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_not_mapped+0x20/0x20
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_get_anon_vma+0x80/0x80
> Aug 09 12:12:48 apollon.suse.de kernel: migrate_pages+0x8cd/0xbc0
> Aug 09 12:12:48 apollon.suse.de kernel: ? fast_isolate_freepages+0x6b0/0x6b0
> Aug 09 12:12:48 apollon.suse.de kernel: ? move_freelist_tail+0xb0/0xb0
> Aug 09 12:12:48 apollon.suse.de kernel: compact_zone+0x669/0xc80
> Aug 09 12:12:48 apollon.suse.de kernel: ? entry_SYSCALL_64_after_hwframe+0xb8/0xbe
> Aug 09 12:12:48 apollon.suse.de kernel: kcompactd_do_work+0x120/0x290
>
> KVM stack:
>
> Aug 09 12:12:48 apollon.suse.de kernel: CPU 0/KVM D 0 25189 1 0x00000320
> Aug 09 12:12:48 apollon.suse.de kernel: Call Trace:
> Aug 09 12:12:48 apollon.suse.de kernel: ? __schedule+0x2af/0x6a0
> Aug 09 12:12:48 apollon.suse.de kernel: schedule+0x33/0x90
> Aug 09 12:12:48 apollon.suse.de kernel: schedule_preempt_disabled+0xa/0x10
> Aug 09 12:12:48 apollon.suse.de kernel: __mutex_lock.isra.0+0x172/0x4d0
> Aug 09 12:12:48 apollon.suse.de kernel: userptr_mn_invalidate_range_start+0x1bf/0x220 [i915]
> Aug 09 12:12:48 apollon.suse.de kernel: __mmu_notifier_invalidate_range_start+0x57/0xa0
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap_one+0xa0b/0xae0
> Aug 09 12:12:48 apollon.suse.de kernel: rmap_walk_file+0xf2/0x250
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_unmap+0xa6/0xe0
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_remove_rmap+0x290/0x290
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_not_mapped+0x20/0x20
> Aug 09 12:12:48 apollon.suse.de kernel: ? page_get_anon_vma+0x80/0x80
> Aug 09 12:12:48 apollon.suse.de kernel: migrate_pages+0x8cd/0xbc0
> Aug 09 12:12:48 apollon.suse.de kernel: ? fast_isolate_freepages+0x6b0/0x6b0
> Aug 09 12:12:48 apollon.suse.de kernel: ? move_freelist_tail+0xb0/0xb0
> Aug 09 12:12:48 apollon.suse.de kernel: compact_zone+0x669/0xc80
> Aug 09 12:12:48 apollon.suse.de kernel: compact_zone_order+0xc6/0xf0
> Aug 09 12:12:48 apollon.suse.de kernel: try_to_compact_pages+0xcc/0x2a0
> Aug 09 12:12:48 apollon.suse.de kernel: __alloc_pages_direct_compact+0x7c/0x150
> Aug 09 12:12:48 apollon.suse.de kernel: __alloc_pages_slowpath+0x1ee/0xd00
> Aug 09 12:12:48 apollon.suse.de kernel: ? vmx_vcpu_load+0x100/0x120 [kvm_intel]
>
> Full logs can be found under https://pastebin.com/KJ6tccj4
> I haven't yet tried if this is reproducible.
Set the page dirty unless someone else is already taking care of it.
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -663,7 +663,7 @@ i915_gem_userptr_put_pages(struct drm_i9
 	i915_gem_gtt_finish_pages(obj, pages);
 
 	for_each_sgt_page(page, sgt_iter, pages) {
-		if (obj->mm.dirty)
+		if (obj->mm.dirty) {
 			/*
 			 * As this may not be anonymous memory (e.g. shmem)
 			 * but exist on a real mapping, we have to lock
@@ -672,8 +672,15 @@ i915_gem_userptr_put_pages(struct drm_i9
 			 * prevent the inode from being truncated.
 			 * Play safe and take the lock.
 			 */
-			set_page_dirty_lock(page);
-
+			if (trylock_page(page)) {
+				set_page_dirty(page);
+				unlock_page(page);
+			}
+			/*
+			 * else someone else is taking care of page and
+			 * we can do nothing about it to avoid deadlock
+			 */
+		}
 		mark_page_accessed(page);
 		put_page(page);
 	}
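
For readers who want to see the trylock pattern from the diff in
isolation, the following is a rough userspace sketch in C. It is not the
driver code: page_model and put_page_model() are hypothetical names, an
atomic flag stands in for the page lock, and a plain bool stands in for
the dirty state. It only shows the control flow: dirty the page when the
lock is free, skip it when the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: a lock flag plus a dirty flag per page. */
struct page_model {
	atomic_flag locked;
	bool dirty;
};

static bool trylock_page_model(struct page_model *page)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&page->locked);
}

static void unlock_page_model(struct page_model *page)
{
	atomic_flag_clear(&page->locked);
}

/*
 * Same shape as the patched loop body: never block on the page lock.
 * Returns true if this call marked the page dirty.
 */
static bool put_page_model(struct page_model *page, bool obj_dirty)
{
	bool marked = false;

	assert(page != NULL);
	if (obj_dirty) {
		if (trylock_page_model(page)) {
			page->dirty = true;
			unlock_page_model(page);
			marked = true;
		}
		/* else: the lock holder is assumed to handle the page */
	}
	return marked;
}
```

Called on an unlocked page with obj_dirty set, put_page_model() marks the
page dirty and returns true; called while the lock is held, it returns
false and leaves the dirty flag untouched. Whether skipping is safe in
that second case depends on what the lock holder does, which this sketch
does not model.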
--
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* ✗ Fi.CI.BAT: failure for 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
From: Patchwork @ 2019-08-09 16:46 UTC
To: Hillf Danton; +Cc: intel-gfx
== Series Details ==
Series: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
URL : https://patchwork.freedesktop.org/series/64983/
State : failure
== Summary ==
Applying: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
error: sha1 information is lacking or useless (drivers/gpu/drm/i915/gem/i915_gem_userptr.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".