kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Will Deacon <will@kernel.org>
To: mst@redhat.com, stefanha@redhat.com
Cc: jasowang@redhat.com, torvalds@linux-foundation.org,
	ascull@google.com, maz@kernel.org, keirf@google.com,
	jiyong@google.com, kernel-team@android.com,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org
Subject: IOTLB support for vhost/vsock breaks crosvm on Android
Date: Fri, 5 Aug 2022 19:11:06 +0100	[thread overview]
Message-ID: <20220805181105.GA29848@willie-the-truck> (raw)

Hi folks,

[tl;dr a change from ~18 months ago breaks Android userspace and I don't
 know what to do about it]

As part of the development work for next year's Android, we've recently
been bringing up a 5.15 KVM host and have observed that vsock no longer
works for communicating with a guest because crosvm gets an unexpected
-EFAULT back from the VHOST_VSOCK_SET_RUNNING ioctl():

 | E crosvm : vpipe worker thread exited with error: VhostVsockStart(IoctlError(Os { code: 14, kind: Uncategorized, message: "Bad address" }))

The same guest payload works correctly on a 5.10 KVM host kernel.

After some digging, we narrowed this change in behaviour down to
e13a6915a03f ("vhost/vsock: add IOTLB API support") and further digging
reveals that the infamous VIRTIO_F_ACCESS_PLATFORM feature flag is to
blame. Indeed, our tests once again pass if we revert that patch (there's
a trivial conflict with the later addition of VIRTIO_VSOCK_F_SEQPACKET
but otherwise it reverts cleanly).

On Android, KVM runs in a mode where the host kernel is, by default,
unable to access guest memory [1] and so memory used for virtio (e.g.
the rings and descriptor data) needs to be shared explicitly with the
host using hypercalls issued by the guest. We implement this on top of
restricted DMA [2], whereby the guest allocates a pool of shared memory
during boot and bounces all virtio transfers through this window. In
order to get the guest to use the DMA API for virtio devices (which is
required for the SWIOTLB code to kick in and do the aforementioned
bouncing), crosvm sets the VIRTIO_F_ACCESS_PLATFORM feature flag on its
emulated devices and this is picked up by virtio_has_dma_quirk() in the
guest kernel. This has been working well for us so far.

With e13a6915a03f, the vhost backend for vsock now advertises
VIRTIO_F_ACCESS_PLATFORM in its response to the VHOST_GET_FEATURES
ioctl() and accepts it in the VHOST_SET_FEATURES as a mechanism to
enable the IOTLB feature (note: this is nothing to do with SWIOTLB!).
This feature is used for emulation of a virtual IOMMU and requires
explicit support for issuing IOTLB messages (see VHOST_IOTLB_MSG and
VHOST_IOTLB_MSG_V2) from userspace to manage address translations for
the virtio device.

Looking at how crosvm uses these vhost ioctl()s, we can see:

        let avail_features = self
            .vhost_handle
            .get_features()
            .map_err(Error::VhostGetFeatures)?;

        let features: c_ulonglong = self.acked_features & avail_features;
        self.vhost_handle
            .set_features(features)
            .map_err(Error::VhostSetFeatures)?;

where 'acked_features' is the feature set negotiated between crosvm and
the guest, while 'avail_features' is the supported feature set
advertised by vhost. The intersection of these now includes
VIRTIO_F_ACCESS_PLATFORM in the 5.15 kernel and so we quietly start
enabling IOTLB, despite not having any of the necessary support in
crosvm and therefore the vsock thread effectively grinds to a halt and
we cannot communicate with the guest.

The fundamental issue is, I think, that VIRTIO_F_ACCESS_PLATFORM is
being used for two very different things within the same device; for the
guest it basically means "use the DMA API, it knows what to do" but for
vhost it very specifically means "enable IOTLB". We've recently had
other problems with this flag [3] but in this case it used to work
reliably and now it doesn't anymore.

So how should we fix this? One possibility is for us to hack crosvm to
clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features,
but others here have reasonably pointed out that they didn't expect a
kernel change to break userspace. On the flip side, the offending commit
in the kernel isn't exactly new (it's from the end of 2020!) and so it's
likely that others (e.g. QEMU) are using this feature. I also strongly
suspect that vhost net suffers from exactly the same issue, we just
don't happen to be using that (yet) in Android.

Thanks,

Will

[1] https://lwn.net/Articles/836693/
[2] https://lwn.net/Articles/841916/
[3] https://lore.kernel.org/lkml/YtkCQsSvE9AZyrys@google.com/

             reply	other threads:[~2022-08-05 18:11 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-05 18:11 Will Deacon [this message]
2022-08-05 22:57 ` IOTLB support for vhost/vsock breaks crosvm on Android Linus Torvalds
2022-08-06  8:17   ` Stefano Garzarella
2022-08-06  9:42   ` Will Deacon
2022-08-07  6:52   ` Christoph Hellwig
2022-08-07 13:35     ` Michael S. Tsirkin
2022-08-07 13:27   ` Michael S. Tsirkin
2022-08-06  7:48 ` Stefano Garzarella
2022-08-06  9:42   ` Will Deacon
2022-08-06 10:52     ` Stefan Hajnoczi
2022-08-06 10:52       ` Stefan Hajnoczi
2022-08-06 14:34       ` Will Deacon
2022-08-06 21:47         ` Stefan Hajnoczi
2022-08-06 10:52     ` Stefano Garzarella
2022-08-07 13:14 ` Michael S. Tsirkin
2022-08-08 10:18   ` Will Deacon
2022-08-08 12:45     ` Michael S. Tsirkin
2022-08-08 14:33       ` Stefan Hajnoczi
2022-08-09  3:16       ` Jason Wang
2022-08-17 13:48       ` Will Deacon
2022-08-17 17:05         ` Michael S. Tsirkin
2022-08-09  3:12   ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220805181105.GA29848@willie-the-truck \
    --to=will@kernel.org \
    --cc=ascull@google.com \
    --cc=jasowang@redhat.com \
    --cc=jiyong@google.com \
    --cc=keirf@google.com \
    --cc=kernel-team@android.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=mst@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).