From: Demi Marie Obenour <email@example.com>
To: Juergen Gross <firstname.lastname@example.org>,
	Xen developer discussion <email@example.com>,
	Andrew Morton <firstname.lastname@example.org>
Cc: "Boris Ostrovsky" <email@example.com>,
	"Marek Marczykowski-Górecki" <firstname.lastname@example.org>,
	email@example.com,
	"Jani Nikula" <firstname.lastname@example.org>,
	"Joonas Lahtinen" <email@example.com>,
	"Rodrigo Vivi" <firstname.lastname@example.org>,
	"Tvrtko Ursulin" <email@example.com>,
	"David Airlie" <firstname.lastname@example.org>,
	"Daniel Vetter" <email@example.com>,
	"Intel Graphics Development" <firstname.lastname@example.org>,
	"DRI Development" <email@example.com>,
	"Linux Memory Management" <firstname.lastname@example.org>,
	email@example.com
Subject: [REGRESSION] Hang in 5.17.4+ that appears to be due to Xen
Date: Thu, 19 May 2022 12:39:40 -0400	[thread overview]
Message-ID: <YoZy6BRIkfoeY8af@itl-email> (raw)
In-Reply-To: <YoJZcUsiE3y6oul5@itl-email>

[-- Attachment #1: Type: text/plain, Size: 3568 bytes --]

On Mon, May 16, 2022 at 10:00:07AM -0400, Demi Marie Obenour wrote:
> On Mon, May 16, 2022 at 08:48:17AM +0200, Juergen Gross wrote:
> > On 14.05.22 17:55, Demi Marie Obenour wrote:
> > > In https://github.com/QubesOS/qubes-issues/issues/7481, a user reported
> > > that Xorg locked up when resizing a VM window.  While I do not have the
> > > same hardware the user does and thus cannot reproduce the bug, the stack
> > > trace seems to indicate a deadlock between xen_gntdev and i915.  It
> > > appears that gnttab_unmap_refs_sync() is waiting for i915 to free the
> > > pages, while i915 is waiting for the MMU notifier that called
> > > gnttab_unmap_refs_sync() to return.  Result: deadlock.
> > >
> > > The problem appears to be that a mapped grant in PV mode will stay in
> > > the “invalidating” state until it is freed.  While MMU notifiers are
> > > allowed to sleep, it appears that they cannot wait for the page to be
> > > freed, as is happening here.
> > > That said, I am not very familiar with
> > > this code, so my diagnosis might be incorrect.
> >
> > All I can say for now is that your patch seems to be introducing a use after
> > free issue, as the parameters of the delayed work might get freed now before
> > the delayed work is being executed.
>
> I figured it was wrong, not least because I don’t think it compiles
> (invalid use of void value).  That said, the current behavior is quite
> suspicious to me.  For one, it appears that munmap() on a grant in a PV
> domain will not return until nobody else is using the page.  This is not
> what I would expect, and I can easily imagine it causing deadlocks in
> userspace.  Instead, I would expect gntdev to automatically release the
> grant when the reference count hits zero.  This would also allow the
> same grant to be mapped in multiple processes, and might even unlock
> DMA-BUF support.
>
> > I don't know why this is happening only with rather recent kernels, as the
> > last gntdev changes in this area have been made in kernel 4.13.
> >
> > I'd suggest to look at i915, as quite some work has happened in the code
> > visible in your stack backtraces rather recently. Maybe it would be possible
> > to free the pages in i915 before calling the MMU notifier?
>
> While I agree that the actual problem is almost certainly in i915, the
> gntdev code does appear rather fragile.  Since so few people use i915 +
> Xen, problems with the combination generally don’t show up until some
> Qubes user makes a bug report, which isn’t great.  It would be better if
> Xen didn’t introduce requirements on other kernel code that do not hold
> when not running on Xen.
>
> In this case, if it is actually an invariant that one must not call MMU
> notifiers for pages that are still in use, it would be better if this
> were caught by a WARN_ON() or BUG_ON() in the core memory management
> code.
> That would have found the bug instantly and deterministically on
> all platforms, whereas the current failure is nondeterministic and only
> happens under Xen.
>
> I also wonder if this is a bug in the core MMU notifier infrastructure.
> My reading of the mmu_interval_notifier_remove() documentation is that
> it should only wait for the specific notifier being removed to finish,
> not for all notifiers to finish.  Adding the memory management
> maintainers.

Also adding the kernel regression tracker.

#regzbot introduced v5.16..v5.17.4

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
Thread overview: 5+ messages
2022-05-14 15:55 Demi Marie Obenour
2022-05-16  6:48 ` Juergen Gross
2022-05-16 14:00   ` Demi Marie Obenour
2022-05-19 16:39     ` Demi Marie Obenour [this message]
2022-05-19 18:17       ` Demi Marie Obenour