From: Demi Marie Obenour <demi@invisiblethingslab.com>
To: Juergen Gross <jgross@suse.com>,
Xen developer discussion <xen-devel@lists.xenproject.org>,
Andrew Morton <akpm@linux-foundation.org>
Cc: "Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
"Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>,
linux-kernel@vger.kernel.org,
"Jani Nikula" <jani.nikula@linux.intel.com>,
"Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
"Tvrtko Ursulin" <tvrtko.ursulin@linux.intel.com>,
"David Airlie" <airlied@linux.ie>,
"Daniel Vetter" <daniel@ffwll.ch>,
"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
"DRI Development" <dri-devel@lists.freedesktop.org>,
"Linux Memory Management" <linux-mm@kvack.org>,
regressions@lists.linux.dev
Subject: [REGRESSION] Hang in 5.17.4+ that appears to be due to Xen
Date: Thu, 19 May 2022 12:39:40 -0400
Message-ID: <YoZy6BRIkfoeY8af@itl-email>
In-Reply-To: <YoJZcUsiE3y6oul5@itl-email>
On Mon, May 16, 2022 at 10:00:07AM -0400, Demi Marie Obenour wrote:
> On Mon, May 16, 2022 at 08:48:17AM +0200, Juergen Gross wrote:
> > On 14.05.22 17:55, Demi Marie Obenour wrote:
> > > In https://github.com/QubesOS/qubes-issues/issues/7481, a user reported
> > > that Xorg locked up when resizing a VM window. While I do not have the
> > > same hardware the user does and thus cannot reproduce the bug, the stack
> > > trace seems to indicate a deadlock between xen_gntdev and i915. It
> > > appears that gnttab_unmap_refs_sync() is waiting for i915 to free the
> > > pages, while i915 is waiting for the MMU notifier that called
> > > gnttab_unmap_refs_sync() to return. Result: deadlock.
> > >
> > > The problem appears to be that a mapped grant in PV mode will stay in
> > > the “invalidating” state until it is freed. While MMU notifiers are
> > > allowed to sleep, it appears that they cannot wait for the page to be
> > > freed, as is happening here. That said, I am not very familiar with
> > > this code, so my diagnosis might be incorrect.
> >
> > All I can say for now is that your patch seems to introduce a
> > use-after-free issue, as the parameters of the delayed work might now get
> > freed before the delayed work is executed.
>
> I figured it was wrong, not least because I don’t think it compiles
> (invalid use of void value). That said, the current behavior is quite
> suspicious to me. For one, it appears that munmap() on a grant in a PV
> domain will not return until nobody else is using the page. This is not
> what I would expect, and I can easily imagine it causing deadlocks in
> userspace. Instead, I would expect gntdev to automatically release
> the grant when the reference count hits zero. This would also allow
> the same grant to be mapped in multiple processes, and might even unlock
> DMA-BUF support.
>
> > I don't know why this is happening only with rather recent kernels, as the
> > last gntdev changes in this area have been made in kernel 4.13.
> >
> > I'd suggest looking at i915, as quite a lot of work has happened in the code
> > visible in your stack backtraces rather recently. Maybe it would be possible
> > to free the pages in i915 before calling the MMU notifier?
>
> While I agree that the actual problem is almost certainly in i915, the
> gntdev code does appear rather fragile. Since so few people use i915 +
> Xen, problems with the combination generally don’t show up until some
> Qubes user makes a bug report, which isn’t great. It would be better if
> Xen didn’t introduce requirements on other kernel code that do not hold
> when not running on Xen.
>
> In this case, if it is actually an invariant that one must not call MMU
> notifiers for pages that are still in use, it would be better if this
> was caught by a WARN_ON() or BUG_ON() in the core memory management
> code. That would have found the bug instantly and deterministically on
> all platforms, whereas the current failure is nondeterministic and only
> happens under Xen.
>
> I also wonder if this is a bug in the core MMU notifier infrastructure.
> My reading of the mmu_interval_notifier_remove() documentation is that
> it should only wait for the specific notifier being removed to finish,
> not for all notifiers to finish. Adding the memory management
> maintainers.
Also adding the kernel regression tracker.

#regzbot introduced v5.16..v5.17.4
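
As a rough illustration of the suspected cycle described in the quoted
report, the waits can be written down as a toy wait-for graph. This is only
a sketch under stated assumptions: it is Python rather than kernel code, the
node labels are informal descriptions of the steps named in the report, and
find_cycle, reported, and proposed are hypothetical names made up for this
example. The second graph models the behaviour suggested in the quoted text
(the notifier returns without waiting, and the grant is released once the
last reference is dropped); it is not an actual patch.

```python
def find_cycle(waits_for):
    """Return a list of nodes forming a wait-for cycle, or None if acyclic."""
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Back edge: everything from the first visit of `node` is a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in waits_for.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in list(waits_for):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# Assumed reading of the report: each key waits for the entries in its list.
reported = {
    "gnttab_unmap_refs_sync": ["i915 frees the pages"],
    "i915 frees the pages": ["MMU notifier returns"],
    "MMU notifier returns": ["gnttab_unmap_refs_sync"],
}

# Hypothetical alternative: the notifier no longer blocks on the page being
# freed; the grant is released only after the last reference goes away.
proposed = {
    "MMU notifier returns": [],
    "i915 frees the pages": ["MMU notifier returns"],
    "grant released": ["i915 frees the pages"],
}

print(find_cycle(reported))
print(find_cycle(proposed))
```

The first graph contains a cycle, which is the deadlock; the second does not,
because nothing the notifier waits on depends on the notifier itself.
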
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab