* IOTLB support for vhost/vsock breaks crosvm on Android @ 2022-08-05 18:11 Will Deacon 2022-08-05 22:57 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Will Deacon @ 2022-08-05 18:11 UTC (permalink / raw) To: mst, stefanha Cc: jasowang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm Hi folks, [tl;dr a change from ~18 months ago breaks Android userspace and I don't know what to do about it] As part of the development work for next year's Android, we've recently been bringing up a 5.15 KVM host and have observed that vsock no longer works for communicating with a guest because crosvm gets an unexpected -EFAULT back from the VHOST_VSOCK_SET_RUNNING ioctl(): | E crosvm : vpipe worker thread exited with error: VhostVsockStart(IoctlError(Os { code: 14, kind: Uncategorized, message: "Bad address" })) The same guest payload works correctly on a 5.10 KVM host kernel. After some digging, we narrowed this change in behaviour down to e13a6915a03f ("vhost/vsock: add IOTLB API support") and further digging reveals that the infamous VIRTIO_F_ACCESS_PLATFORM feature flag is to blame. Indeed, our tests once again pass if we revert that patch (there's a trivial conflict with the later addition of VIRTIO_VSOCK_F_SEQPACKET but otherwise it reverts cleanly). On Android, KVM runs in a mode where the host kernel is, by default, unable to access guest memory [1] and so memory used for virtio (e.g. the rings and descriptor data) needs to be shared explicitly with the host using hypercalls issued by the guest. We implement this on top of restricted DMA [2], whereby the guest allocates a pool of shared memory during boot and bounces all virtio transfers through this window. In order to get the guest to use the DMA API for virtio devices (which is required for the SWIOTLB code to kick in and do the aforementioned bouncing), crosvm sets the VIRTIO_F_ACCESS_PLATFORM feature flag on its emulated devices and this is picked up by virtio_has_dma_quirk() in the guest kernel. This has been working well for us so far. With e13a6915a03f, the vhost backend for vsock now advertises VIRTIO_F_ACCESS_PLATFORM in its response to the VHOST_GET_FEATURES ioctl() and accepts it in the VHOST_SET_FEATURES as a mechanism to enable the IOTLB feature (note: this is nothing to do with SWIOTLB!). This feature is used for emulation of a virtual IOMMU and requires explicit support for issuing IOTLB messages (see VHOST_IOTLB_MSG and VHOST_IOTLB_MSG_V2) from userspace to manage address translations for the virtio device. Looking at how crosvm uses these vhost ioctl()s, we can see: let avail_features = self .vhost_handle .get_features() .map_err(Error::VhostGetFeatures)?; let features: c_ulonglong = self.acked_features & avail_features; self.vhost_handle .set_features(features) .map_err(Error::VhostSetFeatures)?; where 'acked_features' is the feature set negotiated between crosvm and the guest, while 'avail_features' is the supported feature set advertised by vhost. The intersection of these now includes VIRTIO_F_ACCESS_PLATFORM in the 5.15 kernel and so we quietly start enabling IOTLB, despite not having any of the necessary support in crosvm and therefore the vsock thread effectively grinds to a halt and we cannot communicate with the guest. The fundamental issue is, I think, that VIRTIO_F_ACCESS_PLATFORM is being used for two very different things within the same device; for the guest it basically means "use the DMA API, it knows what to do" but for vhost it very specifically means "enable IOTLB". We've recently had other problems with this flag [3] but in this case it used to work reliably and now it doesn't anymore. So how should we fix this? One possibility is for us to hack crosvm to clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features, but others here have reasonably pointed out that they didn't expect a kernel change to break userspace. On the flip side, the offending commit in the kernel isn't exactly new (it's from the end of 2020!) and so it's likely that others (e.g. QEMU) are using this feature. I also strongly suspect that vhost net suffers from exactly the same issue, we just don't happen to be using that (yet) in Android. Thanks, Will [1] https://lwn.net/Articles/836693/ [2] https://lwn.net/Articles/841916/ [3] https://lore.kernel.org/lkml/YtkCQsSvE9AZyrys@google.com/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-05 18:11 IOTLB support for vhost/vsock breaks crosvm on Android Will Deacon @ 2022-08-05 22:57 ` Linus Torvalds 2022-08-06 8:17 ` Stefano Garzarella ` (3 more replies) 2022-08-06 7:48 ` Stefano Garzarella 2022-08-07 13:14 ` Michael S. Tsirkin 2 siblings, 4 replies; 22+ messages in thread From: Linus Torvalds @ 2022-08-05 22:57 UTC (permalink / raw) To: Will Deacon Cc: mst, stefanha, jasowang, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm, Stefano Garzarella On Fri, Aug 5, 2022 at 11:11 AM Will Deacon <will@kernel.org> wrote: > > [tl;dr a change from ~18 months ago breaks Android userspace and I don't > know what to do about it] Augh. I had hoped that android being "closer" to upstream would have meant that somebody actually tests android with upstream kernels. People occasionally talk about it, but apparently it's not actually done. Or maybe it's done onl;y with a very limited android user space. The whole "we notice that something that happened 18 months ago broke our environment" is kind of broken. > After some digging, we narrowed this change in behaviour down to > e13a6915a03f ("vhost/vsock: add IOTLB API support") and further digging > reveals that the infamous VIRTIO_F_ACCESS_PLATFORM feature flag is to > blame. Indeed, our tests once again pass if we revert that patch (there's > a trivial conflict with the later addition of VIRTIO_VSOCK_F_SEQPACKET > but otherwise it reverts cleanly). I have to say, this smells for *so* many reasons. Why is "IOMMU support" called "VIRTIO_F_ACCESS_PLATFORM"? That seems insane, but seems fundamental in that commit e13a6915a03f ("vhost/vsock: add IOTLB API support") This code if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) { if (vhost_init_device_iotlb(&vsock->dev, true)) goto err; } just makes me go "What?" It makes no sense. Why isn't that feature called something-something-IOTLB? Can we please just split that flag into two, and have that odd "platform access" be one bit, and the "enable iommu" be an entirely different bit? Now, since clearly nobody runs Android on newer kernels, I do think that the actual bit number choice should probably be one that makes the non-android use case binaries continue to work. And then the android system binaries that use this could maybe be compiled to know about *both* bits,. and work regardless? I'm also hoping that maybe Google android people could actually do some *testing*? I know, that sounds like a lot to ask, but humor me. Even if the product team runs stuff that is 18 months old, how about the dev team have a machine or two that actually tests current kernels, so that it's not a "oh, a few years have passed, and now we notice that a change doesn't work for us" situation any more. Is that really too much to ask for a big company like google? And hey, it's possible that the bit encoding is *so* incestuous that it's really hard to split it into two. But it really sounds to me like somebody mindlessly re-used a feature bit for a *completely* different thing. Why? Why have feature bits at all, when you then re-use the same bit for two different features? It kind of seems to defeat the whole purpose. Linus ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-05 22:57 ` Linus Torvalds @ 2022-08-06 8:17 ` Stefano Garzarella 2022-08-06 9:42 ` Will Deacon ` (2 subsequent siblings) 3 siblings, 0 replies; 22+ messages in thread From: Stefano Garzarella @ 2022-08-06 8:17 UTC (permalink / raw) To: Linus Torvalds, mst, jasowang Cc: Will Deacon, stefanha, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm Hi Linus, On Fri, Aug 05, 2022 at 03:57:08PM -0700, Linus Torvalds wrote: >On Fri, Aug 5, 2022 at 11:11 AM Will Deacon <will@kernel.org> wrote: >> >> [tl;dr a change from ~18 months ago breaks Android userspace and I don't >> know what to do about it] > >Augh. > >I had hoped that android being "closer" to upstream would have meant >that somebody actually tests android with upstream kernels. People >occasionally talk about it, but apparently it's not actually done. > >Or maybe it's done onl;y with a very limited android user space. > >The whole "we notice that something that happened 18 months ago broke >our environment" is kind of broken. > >> After some digging, we narrowed this change in behaviour down to >> e13a6915a03f ("vhost/vsock: add IOTLB API support") and further digging >> reveals that the infamous VIRTIO_F_ACCESS_PLATFORM feature flag is to >> blame. Indeed, our tests once again pass if we revert that patch (there's >> a trivial conflict with the later addition of VIRTIO_VSOCK_F_SEQPACKET >> but otherwise it reverts cleanly). > >I have to say, this smells for *so* many reasons. > >Why is "IOMMU support" called "VIRTIO_F_ACCESS_PLATFORM"? > >That seems insane, but seems fundamental in that commit e13a6915a03f >("vhost/vsock: add IOTLB API support") > >This code > > if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) { > if (vhost_init_device_iotlb(&vsock->dev, true)) > goto err; > } > >just makes me go "What?" It makes no sense. Why isn't that feature >called something-something-IOTLB? I honestly don't know the reason for the name but VIRTIO_F_ACCESS_PLATFORM comes from the virtio specification: https://docs.oasis-open.org/virtio/virtio/v1.2/cs01/virtio-v1.2-cs01.html#x1-6600006 VIRTIO_F_ACCESS_PLATFORM(33) This feature indicates that the device can be used on a platform where device access to data in memory is limited and/or translated. E.g. this is the case if the device can be located behind an IOMMU that translates bus addresses from the device into physical addresses in memory, if the device can be limited to only access certain memory addresses or if special commands such as a cache flush can be needed to synchronise data in memory with the device. Whether accesses are actually limited or translated is described by platform-specific means. If this feature bit is set to 0, then the device has same access to memory addresses supplied to it as the driver has. In particular, the device will always use physical addresses matching addresses used by the driver (typically meaning physical addresses used by the CPU) and not translated further, and can access any address supplied to it by the driver. When clear, this overrides any platform-specific description of whether device access is limited or translated in any way, e.g. whether an IOMMU may be present. > >Can we please just split that flag into two, and have that odd >"platform access" be one bit, and the "enable iommu" be an entirely >different bit? IIUC the problem here is that the VMM does the translation and then for the device there is actually no need to translate, so this feature should not be negotiated by crosvm and vhost-vsock, but just between guest's driver and crosvm. Perhaps the confusion is that we use VIRTIO_F_ACCESS_PLATFORM both between guest and VMM and between VMM and vhost device. In fact, prior to commit e13a6915a03f ("vhost/vsock: add IOTLB API support"), vhost-vsock did not work when a VMM (e.g., QEMU) tried to negotiate translation with the device: https://bugzilla.redhat.com/show_bug.cgi?id=1894101 The simplest solution is that crosvm doesn't negotiate VIRTIO_F_ACCESS_PLATFORM with the vhost-vsock device if it doesn't want to use translation and send messages to set it. In fact before commit e13a6915a03f ("vhost/vsock: add IOTLB API support") this feature was not exposed by the vhost-vsock device, so it was never negotiated. Now crosvm is enabling a new feature (not masking guest-negotiated features) so I don't think it's a break in user space, if the user space enable it. I tried to explain what I understood when I made the change, Michael and Jason surely can add more information. Thanks, Stefano ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-05 22:57 ` Linus Torvalds 2022-08-06 8:17 ` Stefano Garzarella @ 2022-08-06 9:42 ` Will Deacon 2022-08-07 6:52 ` Christoph Hellwig 2022-08-07 13:27 ` Michael S. Tsirkin 3 siblings, 0 replies; 22+ messages in thread From: Will Deacon @ 2022-08-06 9:42 UTC (permalink / raw) To: Linus Torvalds Cc: mst, stefanha, jasowang, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm, Stefano Garzarella On Fri, Aug 05, 2022 at 03:57:08PM -0700, Linus Torvalds wrote: > On Fri, Aug 5, 2022 at 11:11 AM Will Deacon <will@kernel.org> wrote: > > > > [tl;dr a change from ~18 months ago breaks Android userspace and I don't > > know what to do about it] > > Augh. > > I had hoped that android being "closer" to upstream would have meant > that somebody actually tests android with upstream kernels. People > occasionally talk about it, but apparently it's not actually done. > > Or maybe it's done onl;y with a very limited android user space. We do actually test every -rc with Android (and run a whole bunch of regression tests), this is largely using x86 builds for convenience but we've been bringing up arm64 recently and are getting increasingly more coverage there. So this _will_ improve and relatively soon. The kicker in this case is that we'd only catch it on systems using pKVM (arm64 host only; upstreaming ongoing) with restricted DMA (requires device-tree) and so it slipped through. This is made more challenging for CI because arm64 devices don't tend to have support for nested virtualisation and so we have to run bare-metal but, as I say, we're getting there. > > After some digging, we narrowed this change in behaviour down to > > e13a6915a03f ("vhost/vsock: add IOTLB API support") and further digging > > reveals that the infamous VIRTIO_F_ACCESS_PLATFORM feature flag is to > > blame. Indeed, our tests once again pass if we revert that patch (there's > > a trivial conflict with the later addition of VIRTIO_VSOCK_F_SEQPACKET > > but otherwise it reverts cleanly). > > I have to say, this smells for *so* many reasons. > > Why is "IOMMU support" called "VIRTIO_F_ACCESS_PLATFORM"? It was already renamed once (!) It used to be VIRTIO_F_IOMMU_PLATFORM... > That seems insane, but seems fundamental in that commit e13a6915a03f > ("vhost/vsock: add IOTLB API support") > > This code > > if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) { > if (vhost_init_device_iotlb(&vsock->dev, true)) > goto err; > } > > just makes me go "What?" It makes no sense. Why isn't that feature > called something-something-IOTLB? > > Can we please just split that flag into two, and have that odd > "platform access" be one bit, and the "enable iommu" be an entirely > different bit? Something along those lines makes sense to me, but it's fiddly because the bits being used here are part of the virtio spec and we can't freely allocate them in Linux. I reckon it would probably be better to have a separate mechanism to enable IOTLB and not repurpose this flag for it. Hindsight is a wonderful thing. > And hey, it's possible that the bit encoding is *so* incestuous that > it's really hard to split it into two. But it really sounds to me like > somebody mindlessly re-used a feature bit for a *completely* different > thing. Why? > > Why have feature bits at all, when you then re-use the same bit for > two different features? It kind of seems to defeat the whole purpose. No argument here, and it's a big part of the reason I made the effort to write this up. Yes, we hit this in Android. Yes, we should've hit it sooner. But is it specific to Android? No. Anybody wanting a guest to use the DMA API for its virtio devices is going to be setting this flag and if they implement the same algorithm as crosvm then they're going to hit exactly the same problem that we did. Will ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-05 22:57 ` Linus Torvalds 2022-08-06 8:17 ` Stefano Garzarella 2022-08-06 9:42 ` Will Deacon @ 2022-08-07 6:52 ` Christoph Hellwig 2022-08-07 13:35 ` Michael S. Tsirkin 2022-08-07 13:27 ` Michael S. Tsirkin 3 siblings, 1 reply; 22+ messages in thread From: Christoph Hellwig @ 2022-08-07 6:52 UTC (permalink / raw) To: Linus Torvalds Cc: Will Deacon, mst, stefanha, jasowang, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm, Stefano Garzarella On Fri, Aug 05, 2022 at 03:57:08PM -0700, Linus Torvalds wrote: > Why is "IOMMU support" called "VIRTIO_F_ACCESS_PLATFORM"? Because, as far as the virtio spec and virtio "guest" implementation is concerned it is not about IOMMU support at all. It is about treating virtio DMA as real DMA by the platform, which lets the platform let whatever method of DMA mapping it needs to the virtio device. This is needed to make sure harware virtio device are treated like actual hardware and not like a magic thing bypassing the normal PCIe rules. Using an IOMMU if one is present for bus is just one thing, others are using offets of DMAs that are very common on non-x86 platforms, or doing the horrible cache flushing needed on devices where PCIe is not cache coherent. It really is vhost that seems to abuse it so that if the guest claims it can handle VIRTIO_F_ACCESS_PLATFORM (which every modern guest should) it enables magic behavior, which I don't think is what the virtio spec intended. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-07 6:52 ` Christoph Hellwig @ 2022-08-07 13:35 ` Michael S. Tsirkin 0 siblings, 0 replies; 22+ messages in thread From: Michael S. Tsirkin @ 2022-08-07 13:35 UTC (permalink / raw) To: Christoph Hellwig Cc: Linus Torvalds, Will Deacon, stefanha, jasowang, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm, Stefano Garzarella On Sat, Aug 06, 2022 at 11:52:13PM -0700, Christoph Hellwig wrote: > It really is vhost that seems to abuse it so that if the guest > claims it can handle VIRTIO_F_ACCESS_PLATFORM (which every modern > guest should) it enables magic behavior, which I don't think is what > the virtio spec intended. Well the magic behavour happens to be used by QEMU to implement a virtual IOMMU. And when you have a virtual IOMMU you generally want VIRTIO_F_ACCESS_PLATFORM. This is how it came to be reused for that. And since QEMU never passed guest features to vhost unfiltered we never saw the issue even with old QEMU versions on new kernels. It seems natural to pass features unfiltered and we never even said userspace should not do it, so it's quite understandable that this is what corsvm did. -- MST ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-05 22:57 ` Linus Torvalds ` (2 preceding siblings ...) 2022-08-07 6:52 ` Christoph Hellwig @ 2022-08-07 13:27 ` Michael S. Tsirkin 3 siblings, 0 replies; 22+ messages in thread From: Michael S. Tsirkin @ 2022-08-07 13:27 UTC (permalink / raw) To: Linus Torvalds Cc: Will Deacon, stefanha, jasowang, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm, Stefano Garzarella On Fri, Aug 05, 2022 at 03:57:08PM -0700, Linus Torvalds wrote: > And hey, it's possible that the bit encoding is *so* incestuous that > it's really hard to split it into two. But it really sounds to me like > somebody mindlessly re-used a feature bit for a *completely* different > thing. Why? > > Why have feature bits at all, when you then re-use the same bit for > two different features? It kind of seems to defeat the whole purpose. What can I say? Hindsight is 20/20. The two things are *related* in that IOTLB in vhost is a way for userspace (the platform) to limit device access to guest memory. So we reused the feature bits (it's not the only one, just the one we changed most recently). It bothered me a bit but everyone seemed happy and was able to refer to virtio spec for documentation so there was less documentation to write for Linux. It's not that it's hard to split it generally, it's just that it's been there like this for a while so it's hard to change now - we need to find a way that does not break existing userspace. -- MST ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-05 18:11 IOTLB support for vhost/vsock breaks crosvm on Android Will Deacon 2022-08-05 22:57 ` Linus Torvalds @ 2022-08-06 7:48 ` Stefano Garzarella 2022-08-06 9:42 ` Will Deacon 2022-08-07 13:14 ` Michael S. Tsirkin 2 siblings, 1 reply; 22+ messages in thread From: Stefano Garzarella @ 2022-08-06 7:48 UTC (permalink / raw) To: Will Deacon Cc: mst, stefanha, jasowang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm Hi Will, On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: >Hi folks, > >[tl;dr a change from ~18 months ago breaks Android userspace and I don't > know what to do about it] > >As part of the development work for next year's Android, we've recently >been bringing up a 5.15 KVM host and have observed that vsock no longer >works for communicating with a guest because crosvm gets an unexpected >-EFAULT back from the VHOST_VSOCK_SET_RUNNING ioctl(): > > | E crosvm : vpipe worker thread exited with error: VhostVsockStart(IoctlError(Os { code: 14, kind: Uncategorized, message: "Bad address" })) > >The same guest payload works correctly on a 5.10 KVM host kernel. > >After some digging, we narrowed this change in behaviour down to >e13a6915a03f ("vhost/vsock: add IOTLB API support") and further digging >reveals that the infamous VIRTIO_F_ACCESS_PLATFORM feature flag is to >blame. Indeed, our tests once again pass if we revert that patch (there's >a trivial conflict with the later addition of VIRTIO_VSOCK_F_SEQPACKET >but otherwise it reverts cleanly). > >On Android, KVM runs in a mode where the host kernel is, by default, >unable to access guest memory [1] and so memory used for virtio (e.g. >the rings and descriptor data) needs to be shared explicitly with the >host using hypercalls issued by the guest. We implement this on top of >restricted DMA [2], whereby the guest allocates a pool of shared memory >during boot and bounces all virtio transfers through this window. In >order to get the guest to use the DMA API for virtio devices (which is >required for the SWIOTLB code to kick in and do the aforementioned >bouncing), crosvm sets the VIRTIO_F_ACCESS_PLATFORM feature flag on its >emulated devices and this is picked up by virtio_has_dma_quirk() in the >guest kernel. This has been working well for us so far. > >With e13a6915a03f, the vhost backend for vsock now advertises >VIRTIO_F_ACCESS_PLATFORM in its response to the VHOST_GET_FEATURES >ioctl() and accepts it in the VHOST_SET_FEATURES as a mechanism to >enable the IOTLB feature (note: this is nothing to do with SWIOTLB!). >This feature is used for emulation of a virtual IOMMU and requires >explicit support for issuing IOTLB messages (see VHOST_IOTLB_MSG and >VHOST_IOTLB_MSG_V2) from userspace to manage address translations for >the virtio device. > >Looking at how crosvm uses these vhost ioctl()s, we can see: > > let avail_features = self > .vhost_handle > .get_features() > .map_err(Error::VhostGetFeatures)?; > > let features: c_ulonglong = self.acked_features & avail_features; > self.vhost_handle > .set_features(features) > .map_err(Error::VhostSetFeatures)?; > >where 'acked_features' is the feature set negotiated between crosvm and >the guest, while 'avail_features' is the supported feature set >advertised by vhost. The intersection of these now includes >VIRTIO_F_ACCESS_PLATFORM in the 5.15 kernel and so we quietly start >enabling IOTLB, despite not having any of the necessary support in >crosvm and therefore the vsock thread effectively grinds to a halt and >we cannot communicate with the guest. > >The fundamental issue is, I think, that VIRTIO_F_ACCESS_PLATFORM is >being used for two very different things within the same device; for the >guest it basically means "use the DMA API, it knows what to do" but for >vhost it very specifically means "enable IOTLB". We've recently had >other problems with this flag [3] but in this case it used to work >reliably and now it doesn't anymore. > >So how should we fix this? One possibility is for us to hack crosvm to >clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost Why do you consider this a hack? If the VMM implements the translation feature, it is right in my opinion that it does not enable the feature for the vhost device. Otherwise, if it wants the vhost device to do the translation, enable the feature and send the IOTLB messages to set the translation. QEMU for example masks features when not required or supported. crosvm should negotiate only the features it supports. @Michael and @Jason can correct me, but if a vhost device negotiates VIRTIO_F_ACCESS_PLATFORM, then it expects the VMM to send IOTLB messages to set the translation. Indeed that patch was to fix https://bugzilla.redhat.com/show_bug.cgi?id=1894101 >features, >but others here have reasonably pointed out that they didn't expect a >kernel change to break userspace. On the flip side, the offending commit >in the kernel isn't exactly new (it's from the end of 2020!) and so it's >likely that others (e.g. QEMU) are using this feature. Yep, QEMU can use it. >I also strongly >suspect that vhost net suffers from exactly the same issue, we just >don't happen to be using that (yet) in Android. I think so too, the implementation in vsock was done following vhost-net. Thanks, Stefano ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-06 7:48 ` Stefano Garzarella @ 2022-08-06 9:42 ` Will Deacon 2022-08-06 10:52 ` Stefan Hajnoczi 2022-08-06 10:52 ` Stefano Garzarella 0 siblings, 2 replies; 22+ messages in thread From: Will Deacon @ 2022-08-06 9:42 UTC (permalink / raw) To: Stefano Garzarella Cc: mst, stefanha, jasowang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm Hi Stefano, On Sat, Aug 06, 2022 at 09:48:28AM +0200, Stefano Garzarella wrote: > On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: > > The fundamental issue is, I think, that VIRTIO_F_ACCESS_PLATFORM is > > being used for two very different things within the same device; for the > > guest it basically means "use the DMA API, it knows what to do" but for > > vhost it very specifically means "enable IOTLB". We've recently had > > other problems with this flag [3] but in this case it used to work > > reliably and now it doesn't anymore. > > > > So how should we fix this? One possibility is for us to hack crosvm to > > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost > > Why do you consider this a hack? I think it's a hack for two reasons: (1) We're changing userspace to avoid a breaking change in kernel behaviour (2) I think that crosvm's approach is actually pretty reasonable To elaborate on (2), crosvm has a set of device features that it has negotiated with the guest. It then takes the intersection of these features with those advertised by VHOST_GET_FEATURES and calls VHOST_SET_FEATURES with the result. If there was a common interpretation of what these features do, then this would work and would mean we wouldn't have to opt-in on a per-flag basis for vhost. Since VIRTIO_F_ACCESS_PLATFORM is being overloaded to mean two completely different things, then it breaks and I think masking out that specific flag is a hack because it's basically crosvm saying "yeah, I may have negotiated this with the driver but vhost _actually_ means 'IOTLB' when it says it supports this flag so I'll mask it out because I know better". > If the VMM implements the translation feature, it is right in my opinion > that it does not enable the feature for the vhost device. Otherwise, if it > wants the vhost device to do the translation, enable the feature and send > the IOTLB messages to set the translation. > > QEMU for example masks features when not required or supported. > crosvm should negotiate only the features it supports. > > @Michael and @Jason can correct me, but if a vhost device negotiates > VIRTIO_F_ACCESS_PLATFORM, then it expects the VMM to send IOTLB messages to > set the translation. As above, the issue is that vhost now unconditionally advertises this in VHOST_GET_FEATURES and so a VMM with no knowledge of IOTLB can end up enabling it by accident. Will ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-06 9:42 ` Will Deacon @ 2022-08-06 10:52 ` Stefan Hajnoczi 2022-08-06 10:52 ` Stefan Hajnoczi 2022-08-06 14:34 ` Will Deacon 2022-08-06 10:52 ` Stefano Garzarella 1 sibling, 2 replies; 22+ messages in thread From: Stefan Hajnoczi @ 2022-08-06 10:52 UTC (permalink / raw) To: Will Deacon Cc: Stefano Garzarella, mst, Stefan Hajnoczi, Jason Wang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm On Sat, Aug 6, 2022 at 5:50 AM Will Deacon <will@kernel.org> wrote: > On Sat, Aug 06, 2022 at 09:48:28AM +0200, Stefano Garzarella wrote: > > On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: > > If the VMM implements the translation feature, it is right in my opinion > > that it does not enable the feature for the vhost device. Otherwise, if it > > wants the vhost device to do the translation, enable the feature and send > > the IOTLB messages to set the translation. > > > > QEMU for example masks features when not required or supported. > > crosvm should negotiate only the features it supports. > > > > @Michael and @Jason can correct me, but if a vhost device negotiates > > VIRTIO_F_ACCESS_PLATFORM, then it expects the VMM to send IOTLB messages to > > set the translation. > > As above, the issue is that vhost now unconditionally advertises this in > VHOST_GET_FEATURES and so a VMM with no knowledge of IOTLB can end up > enabling it by accident. Unconditionally exposing all vhost feature bits to the guest is incorrect. The emulator must filter out only the feature bits that it supports. For example, see QEMU's vhost-net device's vhost feature bit allowlist: https://gitlab.com/qemu-project/qemu/-/blob/master/hw/net/vhost_net.c#L40 The reason why the emulator (crosvm/QEMU/etc) must filter out feature bits is that vhost devices are full VIRTIO devices. They are a subset of a VIRTIO device and the emulator is responsible for the rest of the device. Some features will require both vhost and emulator support. Therefore it is incorrect to expect the device to work correctly if the vhost feature bits are passed through to the guest. Stefan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-06 10:52 ` Stefan Hajnoczi @ 2022-08-06 10:52 ` Stefan Hajnoczi 2022-08-06 14:34 ` Will Deacon 1 sibling, 0 replies; 22+ messages in thread From: Stefan Hajnoczi @ 2022-08-06 10:52 UTC (permalink / raw) To: Will Deacon Cc: Stefano Garzarella, mst, Stefan Hajnoczi, Jason Wang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm s/vhost devices are full VIRTIO devices/vhost devices are not full VIRTIO devices/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-06 10:52 ` Stefan Hajnoczi 2022-08-06 10:52 ` Stefan Hajnoczi @ 2022-08-06 14:34 ` Will Deacon 2022-08-06 21:47 ` Stefan Hajnoczi 1 sibling, 1 reply; 22+ messages in thread From: Will Deacon @ 2022-08-06 14:34 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Stefano Garzarella, mst, Stefan Hajnoczi, Jason Wang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm On Sat, Aug 06, 2022 at 06:52:15AM -0400, Stefan Hajnoczi wrote: > On Sat, Aug 6, 2022 at 5:50 AM Will Deacon <will@kernel.org> wrote: > > On Sat, Aug 06, 2022 at 09:48:28AM +0200, Stefano Garzarella wrote: > > > On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: > > > If the VMM implements the translation feature, it is right in my opinion > > > that it does not enable the feature for the vhost device. Otherwise, if it > > > wants the vhost device to do the translation, enable the feature and send > > > the IOTLB messages to set the translation. > > > > > > QEMU for example masks features when not required or supported. > > > crosvm should negotiate only the features it supports. > > > > > > @Michael and @Jason can correct me, but if a vhost device negotiates > > > VIRTIO_F_ACCESS_PLATFORM, then it expects the VMM to send IOTLB messages to > > > set the translation. > > > > As above, the issue is that vhost now unconditionally advertises this in > > VHOST_GET_FEATURES and so a VMM with no knowledge of IOTLB can end up > > enabling it by accident. > > Unconditionally exposing all vhost feature bits to the guest is > incorrect. The emulator must filter out only the feature bits that it > supports. I've evidently done a bad job of explaining this, sorry. crosvm _does_ filter the feature bits which it passes to vhost. It takes the feature set which it has negotiated with the guest and then takes the intersection of this set with the set of features which vhost advertises. The result is what is passed to VHOST_SET_FEATURES. I included the rust for this in my initial mail, but in C it might look something like: u64 features = negotiate_features_with_guest(dev); ioctl(vhost_fd, VHOST_GET_FEATURES, &vhost_features); vhost_features &= features; /* Mask out unsupported features */ ioctl(vhost_fd, VHOST_SET_FEATURES, &vhost_features); The problem is that crosvm has negotiated VIRTIO_F_ACCESS_PLATFORM with the guest so that restricted DMA is used for the virtio devices. With e13a6915a03f, VIRTIO_F_ACCESS_PLATFORM is now advertised by VHOST_GET_FEATURES and so IOTLB is enabled by the sequence above. > For example, see QEMU's vhost-net device's vhost feature bit allowlist: > https://gitlab.com/qemu-project/qemu/-/blob/master/hw/net/vhost_net.c#L40 I agree that changing crosvm to use an allowlist would fix the problem, I'm just questioning whether we should be changing userspace at all to resolve this issue. > The reason why the emulator (crosvm/QEMU/etc) must filter out feature > bits is that vhost devices are full VIRTIO devices. They are a subset > of a VIRTIO device and the emulator is responsible for the rest of the > device. Some features will require both vhost and emulator support. > Therefore it is incorrect to expect the device to work correctly if > the vhost feature bits are passed through to the guest. I think crosvm is trying to cater for this by masking out the features it doesn't know about. Will ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-06 14:34 ` Will Deacon @ 2022-08-06 21:47 ` Stefan Hajnoczi 0 siblings, 0 replies; 22+ messages in thread From: Stefan Hajnoczi @ 2022-08-06 21:47 UTC (permalink / raw) To: Will Deacon Cc: Stefano Garzarella, Michael Tsirkin, Stefan Hajnoczi, Jason Wang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm On Sat, Aug 6, 2022 at 10:35 AM Will Deacon <will@kernel.org> wrote: > > On Sat, Aug 06, 2022 at 06:52:15AM -0400, Stefan Hajnoczi wrote: > > On Sat, Aug 6, 2022 at 5:50 AM Will Deacon <will@kernel.org> wrote: > > > On Sat, Aug 06, 2022 at 09:48:28AM +0200, Stefano Garzarella wrote: > > > > On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: > > > > If the VMM implements the translation feature, it is right in my opinion > > > > that it does not enable the feature for the vhost device. Otherwise, if it > > > > wants the vhost device to do the translation, enable the feature and send > > > > the IOTLB messages to set the translation. > > > > > > > > QEMU for example masks features when not required or supported. > > > > crosvm should negotiate only the features it supports. > > > > > > > > @Michael and @Jason can correct me, but if a vhost device negotiates > > > > VIRTIO_F_ACCESS_PLATFORM, then it expects the VMM to send IOTLB messages to > > > > set the translation. > > > > > > As above, the issue is that vhost now unconditionally advertises this in > > > VHOST_GET_FEATURES and so a VMM with no knowledge of IOTLB can end up > > > enabling it by accident. > > > > Unconditionally exposing all vhost feature bits to the guest is > > incorrect. The emulator must filter out only the feature bits that it > > supports. > > I've evidently done a bad job of explaining this, sorry. > > crosvm _does_ filter the feature bits which it passes to vhost. It takes the > feature set which it has negotiated with the guest and then takes the > intersection of this set with the set of features which vhost advertises. > The result is what is passed to VHOST_SET_FEATURES. I included the rust > for this in my initial mail, but in C it might look something like: > > u64 features = negotiate_features_with_guest(dev); > > ioctl(vhost_fd, VHOST_GET_FEATURES, &vhost_features); > vhost_features &= features; /* Mask out unsupported features */ > ioctl(vhost_fd, VHOST_SET_FEATURES, &vhost_features); This is unrelated to the current issue, but this code looks wrong. VHOST_GET_FEATURES must be called before negotiating with the guest. The device features must be restricted by vhost before advertising them to the guest. For example, if a new crosvm binary runs on an old kernel then feature bits crosvm negotiated with the guest may not be supported by the vhost kernel module and the device is broken. > The problem is that crosvm has negotiated VIRTIO_F_ACCESS_PLATFORM with > the guest so that restricted DMA is used for the virtio devices. With > e13a6915a03f, VIRTIO_F_ACCESS_PLATFORM is now advertised by > VHOST_GET_FEATURES and so IOTLB is enabled by the sequence above. > > > For example, see QEMU's vhost-net device's vhost feature bit allowlist: > > https://gitlab.com/qemu-project/qemu/-/blob/master/hw/net/vhost_net.c#L40 > > I agree that changing crosvm to use an allowlist would fix the problem, > I'm just questioning whether we should be changing userspace at all to > resolve this issue. > > > The reason why the emulator (crosvm/QEMU/etc) must filter out feature > > bits is that vhost devices are full VIRTIO devices. They are a subset > > of a VIRTIO device and the emulator is responsible for the rest of the > > device. Some features will require both vhost and emulator support. > > Therefore it is incorrect to expect the device to work correctly if > > the vhost feature bits are passed through to the guest. > > I think crosvm is trying to cater for this by masking out the features > it doesn't know about. Can you point to the guest driver code for restricted DMA? It's unclear to me what the guest drivers are doing and whether that is VIRTIO spec compliant. Is the driver compliant with VIRTIO 1.2 "6.1 Driver Requirements: Reserved Feature Bits": A driver SHOULD accept VIRTIO_F_ACCESS_PLATFORM if it is offered, and it MUST then either disable the IOMMU or configure the IOMMU to translate bus addresses passed to the device into physical addresses in memory. If VIRTIO_F_ACCESS_PLATFORM is not offered, then a driver MUST pass only physical addresses to the device. Stefan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-06 9:42 ` Will Deacon 2022-08-06 10:52 ` Stefan Hajnoczi @ 2022-08-06 10:52 ` Stefano Garzarella 1 sibling, 0 replies; 22+ messages in thread From: Stefano Garzarella @ 2022-08-06 10:52 UTC (permalink / raw) To: Will Deacon Cc: mst, stefanha, jasowang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm On Sat, Aug 06, 2022 at 10:42:40AM +0100, Will Deacon wrote: >Hi Stefano, > >On Sat, Aug 06, 2022 at 09:48:28AM +0200, Stefano Garzarella wrote: >> On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: >> > The fundamental issue is, I think, that VIRTIO_F_ACCESS_PLATFORM is >> > being used for two very different things within the same device; for the >> > guest it basically means "use the DMA API, it knows what to do" but for >> > vhost it very specifically means "enable IOTLB". We've recently had >> > other problems with this flag [3] but in this case it used to work >> > reliably and now it doesn't anymore. >> > >> > So how should we fix this? One possibility is for us to hack crosvm to >> > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost >> >> Why do you consider this a hack? > >I think it's a hack for two reasons: > > (1) We're changing userspace to avoid a breaking change in kernel behaviour > (2) I think that crosvm's approach is actually pretty reasonable > >To elaborate on (2), crosvm has a set of device features that it has >negotiated with the guest. It then takes the intersection of these features >with those advertised by VHOST_GET_FEATURES and calls VHOST_SET_FEATURES >with the result. If there was a common interpretation of what these features >do, then this would work and would mean we wouldn't have to opt-in on a >per-flag basis for vhost. Since VIRTIO_F_ACCESS_PLATFORM is being overloaded >to mean two completely different things, then it breaks and I think masking >out that specific flag is a hack because it's basically crosvm saying "yeah, >I may have negotiated this with the driver but vhost _actually_ means >'IOTLB' when it says it supports this flag so I'll mask it out because I >know better". Thanks for elaborating, now I think I get your point! If I understand you correctly, what you would like is that GET_FEATURES should return only the data path features (thus exposed to the guest) and not the features for the VMM, right? In that case, since we also negotiate backend features (with SET|GET_BACKEND_FEATURES ioctls) for IOTLB messages to work, maybe we could only expose that feature if VHOST_BACKEND_F_IOTLB_MSG_V2 has been negotiated @Michael, @Jason, do you think this could be doable? > >> If the VMM implements the translation feature, it is right in my opinion >> that it does not enable the feature for the vhost device. Otherwise, if it >> wants the vhost device to do the translation, enable the feature and send >> the IOTLB messages to set the translation. >> >> QEMU for example masks features when not required or supported. >> crosvm should negotiate only the features it supports. >> >> @Michael and @Jason can correct me, but if a vhost device negotiates >> VIRTIO_F_ACCESS_PLATFORM, then it expects the VMM to send IOTLB messages to >> set the translation. > >As above, the issue is that vhost now unconditionally advertises this in >VHOST_GET_FEATURES and so a VMM with no knowledge of IOTLB can end up >enabling it by accident. I honestly don't know what the initial design was, though, from what I've seen in QEMU, it only enables the known features, in fact for example when we added F_SEQPACKET for vhost-vsock, we had to update QEMU to pass the feature to the guest, so I think the initial idea was to not unconditionally accept all the features exposed by the vhost device. Maybe this part should be clarified. Thanks, Stefano ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-05 18:11 IOTLB support for vhost/vsock breaks crosvm on Android Will Deacon 2022-08-05 22:57 ` Linus Torvalds 2022-08-06 7:48 ` Stefano Garzarella @ 2022-08-07 13:14 ` Michael S. Tsirkin 2022-08-08 10:18 ` Will Deacon 2022-08-09 3:12 ` Jason Wang 2 siblings, 2 replies; 22+ messages in thread From: Michael S. Tsirkin @ 2022-08-07 13:14 UTC (permalink / raw) To: Will Deacon Cc: stefanha, jasowang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm Will, thanks very much for the analysis and the writeup! On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: > So how should we fix this? One possibility is for us to hack crosvm to > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features, > but others here have reasonably pointed out that they didn't expect a > kernel change to break userspace. On the flip side, the offending commit > in the kernel isn't exactly new (it's from the end of 2020!) and so it's > likely that others (e.g. QEMU) are using this feature. Exactly, that's the problem. vhost is reusing the virtio bits and it's only natural that what you are doing would happen. To be precise, this is what we expected people to do (and what QEMU does): #define QEMU_VHOST_FEATURES ((1 << VIRTIO_F_VERSION_1) | (1 << VIRTIO_NET_F_RX_MRG) | .... ) VHOST_GET_FEATURES(... &host_features); host_features &= QEMU_VHOST_FEATURES VHOST_SET_FEATURES(host_features & guest_features) Here QEMU_VHOST_FEATURES are the bits userspace knows about. Our assumption was that whatever userspace enables, it knows what the effect on vhost is going to be. But yes, I understand absolutely how someone would instead just use the guest features. It is unfortunate that we did not catch this in time. In hindsight, we should have just created vhost level macros instead of reusing virtio ones. Would address the concern about naming: PLATFORM_ACCESS makes sense for the guest since there it means "whatever access rules platform has", but for vhost a better name would be VHOST_F_IOTLB. We should have also taken greater pains to document what we expect userspace to do. I remember now how I thought about something like this but after coding this up in QEMU I forgot to document this :( Also, I suspect given the history the GET/SET features ioctl and burned wrt extending it and we have to use a new when we add new features. All this we can do going forward. But what can we do about the specific issue? I am not 100% sure since as Will points out, QEMU and other userspace already rely on the current behaviour. Looking at QEMU specifically, it always sends some translations at startup, this in order to handle device rings. So, *maybe* we can get away with assuming that if no IOTLB ioctl was ever invoked then this userspace does not know about IOTLB and translation should ignore IOTLB completely. I am a bit nervous about breaking some *other* userspace which actually wants device to be blocked from accessing memory until IOTLB has been setup. If we get it wrong we are making guest and possibly even host vulnerable. And of course just revering is not an option either since there are now whole stacks depending on the feature. Will I'd like your input on whether you feel a hack in the kernel is justified here. Also yes, I think it's a good idea to change crosvm anyway. While the work around I describe might make sense upstream I don't think it's a reasonable thing to do in stable kernels. I think I'll prepare a patch documenting the legal vhost features as a 1st step even though crosvm is rust so it's not importing the header directly, right? -- MST ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-07 13:14 ` Michael S. Tsirkin @ 2022-08-08 10:18 ` Will Deacon 2022-08-08 12:45 ` Michael S. Tsirkin 2022-08-09 3:12 ` Jason Wang 1 sibling, 1 reply; 22+ messages in thread From: Will Deacon @ 2022-08-08 10:18 UTC (permalink / raw) To: Michael S. Tsirkin Cc: stefanha, jasowang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm Hi Michael, On Sun, Aug 07, 2022 at 09:14:43AM -0400, Michael S. Tsirkin wrote: > Will, thanks very much for the analysis and the writeup! No problem, and thanks for following up. > On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: > > So how should we fix this? One possibility is for us to hack crosvm to > > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features, > > but others here have reasonably pointed out that they didn't expect a > > kernel change to break userspace. On the flip side, the offending commit > > in the kernel isn't exactly new (it's from the end of 2020!) and so it's > > likely that others (e.g. QEMU) are using this feature. > > Exactly, that's the problem. > > vhost is reusing the virtio bits and it's only natural that > what you are doing would happen. > > To be precise, this is what we expected people to do (and what QEMU does): > > > #define QEMU_VHOST_FEATURES ((1 << VIRTIO_F_VERSION_1) | > (1 << VIRTIO_NET_F_RX_MRG) | .... ) > > VHOST_GET_FEATURES(... &host_features); > host_features &= QEMU_VHOST_FEATURES > VHOST_SET_FEATURES(host_features & guest_features) > > > Here QEMU_VHOST_FEATURES are the bits userspace knows about. > > Our assumption was that whatever userspace enables, it > knows what the effect on vhost is going to be. > > But yes, I understand absolutely how someone would instead just use the > guest features. It is unfortunate that we did not catch this in time. > > > In hindsight, we should have just created vhost level macros > instead of reusing virtio ones. Would address the concern > about naming: PLATFORM_ACCESS makes sense for the > guest since there it means "whatever access rules platform has", > but for vhost a better name would be VHOST_F_IOTLB. > We should have also taken greater pains to document what > we expect userspace to do. I remember now how I thought about something > like this but after coding this up in QEMU I forgot to document this :( > Also, I suspect given the history the GET/SET features ioctl and burned > wrt extending it and we have to use a new when we add new features. > All this we can do going forward. Makes sense. The crosvm developers are also pretty friendly in my experience, so I'm sure they wouldn't mind being involved in discussions around any future ABI extensions. Just be aware that they _very_ recently moved their mailing lists, so I think it lives here now: https://groups.google.com/a/chromium.org/g/crosvm-dev > But what can we do about the specific issue? > I am not 100% sure since as Will points out, QEMU and other > userspace already rely on the current behaviour. > > Looking at QEMU specifically, it always sends some translations at > startup, this in order to handle device rings. > > So, *maybe* we can get away with assuming that if no IOTLB ioctl was > ever invoked then this userspace does not know about IOTLB and > translation should ignore IOTLB completely. There was a similar suggestion from Stefano: https://lore.kernel.org/r/20220806105225.crkui6nw53kbm5ge@sgarzare-redhat about spotting the backend ioctl for IOTLB and using that to enable the negotiation of F_ACCESS_PLATFORM. Would that work for qemu? > I am a bit nervous about breaking some *other* userspace which actually > wants device to be blocked from accessing memory until IOTLB > has been setup. If we get it wrong we are making guest > and possibly even host vulnerable. > And of course just revering is not an option either since there > are now whole stacks depending on the feature. Absolutely, I'm not seriously suggesting the revert. I just did it locally to confirm the issue I was seeing. > Will I'd like your input on whether you feel a hack in the kernel > is justified here. If we can come up with something that we have confidence in and won't be a pig to maintain, then I think we should do it, but otherwise we can go ahead and change crosvm to mask out this feature flag on the vhost side for now. We mainly wanted to raise the issue to illustrate that this flag continues to attract problems in the hope that it might inform further usage and/or spec work in this area. In any case, I'm happy to test any kernel patches with our setup if you want to give it a shot. > Also yes, I think it's a good idea to change crosvm anyway. While the > work around I describe might make sense upstream I don't think it's a > reasonable thing to do in stable kernels. > I think I'll prepare a patch documenting the legal vhost features > as a 1st step even though crosvm is rust so it's not importing > the header directly, right? Documentation is a good idea regardless, so thanks for that. Even though crosvm has its own bindings for the vhost ioctl()s, the documentation can be reference or duplicated once it's available in the kernel headers. Will ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-08 10:18 ` Will Deacon @ 2022-08-08 12:45 ` Michael S. Tsirkin 2022-08-08 14:33 ` Stefan Hajnoczi ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Michael S. Tsirkin @ 2022-08-08 12:45 UTC (permalink / raw) To: Will Deacon Cc: stefanha, jasowang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm, crosvm-dev On Mon, Aug 08, 2022 at 11:18:50AM +0100, Will Deacon wrote: > Hi Michael, > > On Sun, Aug 07, 2022 at 09:14:43AM -0400, Michael S. Tsirkin wrote: > > Will, thanks very much for the analysis and the writeup! > > No problem, and thanks for following up. > > > On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: > > > So how should we fix this? One possibility is for us to hack crosvm to > > > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features, > > > but others here have reasonably pointed out that they didn't expect a > > > kernel change to break userspace. On the flip side, the offending commit > > > in the kernel isn't exactly new (it's from the end of 2020!) and so it's > > > likely that others (e.g. QEMU) are using this feature. > > > > Exactly, that's the problem. > > > > vhost is reusing the virtio bits and it's only natural that > > what you are doing would happen. > > > > To be precise, this is what we expected people to do (and what QEMU does): > > > > > > #define QEMU_VHOST_FEATURES ((1 << VIRTIO_F_VERSION_1) | > > (1 << VIRTIO_NET_F_RX_MRG) | .... ) > > > > VHOST_GET_FEATURES(... &host_features); > > host_features &= QEMU_VHOST_FEATURES > > VHOST_SET_FEATURES(host_features & guest_features) > > > > > > Here QEMU_VHOST_FEATURES are the bits userspace knows about. > > > > Our assumption was that whatever userspace enables, it > > knows what the effect on vhost is going to be. > > > > But yes, I understand absolutely how someone would instead just use the > > guest features. It is unfortunate that we did not catch this in time. > > > > > > In hindsight, we should have just created vhost level macros > > instead of reusing virtio ones. Would address the concern > > about naming: PLATFORM_ACCESS makes sense for the > > guest since there it means "whatever access rules platform has", > > but for vhost a better name would be VHOST_F_IOTLB. > > We should have also taken greater pains to document what > > we expect userspace to do. I remember now how I thought about something > > like this but after coding this up in QEMU I forgot to document this :( > > Also, I suspect given the history the GET/SET features ioctl and burned > > wrt extending it and we have to use a new when we add new features. > > All this we can do going forward. > > Makes sense. The crosvm developers are also pretty friendly in my > experience, so I'm sure they wouldn't mind being involved in discussions > around any future ABI extensions. Just be aware that they _very_ recently > moved their mailing lists, so I think it lives here now: > > https://groups.google.com/a/chromium.org/g/crosvm-dev > > > But what can we do about the specific issue? > > I am not 100% sure since as Will points out, QEMU and other > > userspace already rely on the current behaviour. > > > > Looking at QEMU specifically, it always sends some translations at > > startup, this in order to handle device rings. > > > > So, *maybe* we can get away with assuming that if no IOTLB ioctl was > > ever invoked then this userspace does not know about IOTLB and > > translation should ignore IOTLB completely. > > There was a similar suggestion from Stefano: > > https://lore.kernel.org/r/20220806105225.crkui6nw53kbm5ge@sgarzare-redhat > > about spotting the backend ioctl for IOTLB and using that to enable > the negotiation of F_ACCESS_PLATFORM. Would that work for qemu? Hmm I would worry that this disables the feature for old QEMU :( > > I am a bit nervous about breaking some *other* userspace which actually > > wants device to be blocked from accessing memory until IOTLB > > has been setup. If we get it wrong we are making guest > > and possibly even host vulnerable. > > And of course just revering is not an option either since there > > are now whole stacks depending on the feature. > > Absolutely, I'm not seriously suggesting the revert. I just did it locally > to confirm the issue I was seeing. > > > Will I'd like your input on whether you feel a hack in the kernel > > is justified here. > > If we can come up with something that we have confidence in and won't be a > pig to maintain, then I think we should do it, but otherwise we can go ahead > and change crosvm to mask out this feature flag on the vhost side for now. > We mainly wanted to raise the issue to illustrate that this flag continues > to attract problems in the hope that it might inform further usage and/or > spec work in this area. > > In any case, I'm happy to test any kernel patches with our setup if you > want to give it a shot. Thanks! I'm a bit concerned that the trick I proposed changes the configuration where iotlb was not set up from "access to memory not allowed" to "access to all memory allowed". This just might have security implications if some application assumed the former. And the one Stefano proposed disables IOTLB for old QEMU versions. > > Also yes, I think it's a good idea to change crosvm anyway. While the > > work around I describe might make sense upstream I don't think it's a > > reasonable thing to do in stable kernels. > > I think I'll prepare a patch documenting the legal vhost features > > as a 1st step even though crosvm is rust so it's not importing > > the header directly, right? > > Documentation is a good idea regardless, so thanks for that. Even though > crosvm has its own bindings for the vhost ioctl()s, the documentation > can be reference or duplicated once it's available in the kernel headers. > > Will So for crosvm change, I will post the documentation change and you guys can discuss? -- MST ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-08 12:45 ` Michael S. Tsirkin @ 2022-08-08 14:33 ` Stefan Hajnoczi 2022-08-09 3:16 ` Jason Wang 2022-08-17 13:48 ` Will Deacon 2 siblings, 0 replies; 22+ messages in thread From: Stefan Hajnoczi @ 2022-08-08 14:33 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Will Deacon, Stefan Hajnoczi, Jason Wang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm, crosvm-dev On Mon, Aug 8, 2022 at 8:46 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Mon, Aug 08, 2022 at 11:18:50AM +0100, Will Deacon wrote: > > Hi Michael, > > > > On Sun, Aug 07, 2022 at 09:14:43AM -0400, Michael S. Tsirkin wrote: > > > Will, thanks very much for the analysis and the writeup! > > > > No problem, and thanks for following up. > > > > > On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: > > > > So how should we fix this? One possibility is for us to hack crosvm to > > > > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features, > > > > but others here have reasonably pointed out that they didn't expect a > > > > kernel change to break userspace. On the flip side, the offending commit > > > > in the kernel isn't exactly new (it's from the end of 2020!) and so it's > > > > likely that others (e.g. QEMU) are using this feature. > > > > > > Exactly, that's the problem. > > > > > > vhost is reusing the virtio bits and it's only natural that > > > what you are doing would happen. > > > > > > To be precise, this is what we expected people to do (and what QEMU does): > > > > > > > > > #define QEMU_VHOST_FEATURES ((1 << VIRTIO_F_VERSION_1) | > > > (1 << VIRTIO_NET_F_RX_MRG) | .... ) > > > > > > VHOST_GET_FEATURES(... &host_features); > > > host_features &= QEMU_VHOST_FEATURES > > > VHOST_SET_FEATURES(host_features & guest_features) > > > > > > > > > Here QEMU_VHOST_FEATURES are the bits userspace knows about. > > > > > > Our assumption was that whatever userspace enables, it > > > knows what the effect on vhost is going to be. > > > > > > But yes, I understand absolutely how someone would instead just use the > > > guest features. It is unfortunate that we did not catch this in time. > > > > > > > > > In hindsight, we should have just created vhost level macros > > > instead of reusing virtio ones. Would address the concern > > > about naming: PLATFORM_ACCESS makes sense for the > > > guest since there it means "whatever access rules platform has", > > > but for vhost a better name would be VHOST_F_IOTLB. > > > We should have also taken greater pains to document what > > > we expect userspace to do. I remember now how I thought about something > > > like this but after coding this up in QEMU I forgot to document this :( > > > Also, I suspect given the history the GET/SET features ioctl and burned > > > wrt extending it and we have to use a new when we add new features. > > > All this we can do going forward. > > > > Makes sense. The crosvm developers are also pretty friendly in my > > experience, so I'm sure they wouldn't mind being involved in discussions > > around any future ABI extensions. Just be aware that they _very_ recently > > moved their mailing lists, so I think it lives here now: > > > > https://groups.google.com/a/chromium.org/g/crosvm-dev > > > > > But what can we do about the specific issue? > > > I am not 100% sure since as Will points out, QEMU and other > > > userspace already rely on the current behaviour. > > > > > > Looking at QEMU specifically, it always sends some translations at > > > startup, this in order to handle device rings. > > > > > > So, *maybe* we can get away with assuming that if no IOTLB ioctl was > > > ever invoked then this userspace does not know about IOTLB and > > > translation should ignore IOTLB completely. > > > > There was a similar suggestion from Stefano: > > > > https://lore.kernel.org/r/20220806105225.crkui6nw53kbm5ge@sgarzare-redhat > > > > about spotting the backend ioctl for IOTLB and using that to enable > > the negotiation of F_ACCESS_PLATFORM. Would that work for qemu? > > Hmm I would worry that this disables the feature for old QEMU :( > > > > > I am a bit nervous about breaking some *other* userspace which actually > > > wants device to be blocked from accessing memory until IOTLB > > > has been setup. If we get it wrong we are making guest > > > and possibly even host vulnerable. > > > And of course just revering is not an option either since there > > > are now whole stacks depending on the feature. > > > > Absolutely, I'm not seriously suggesting the revert. I just did it locally > > to confirm the issue I was seeing. > > > > > Will I'd like your input on whether you feel a hack in the kernel > > > is justified here. > > > > If we can come up with something that we have confidence in and won't be a > > pig to maintain, then I think we should do it, but otherwise we can go ahead > > and change crosvm to mask out this feature flag on the vhost side for now. > > We mainly wanted to raise the issue to illustrate that this flag continues > > to attract problems in the hope that it might inform further usage and/or > > spec work in this area. > > > > In any case, I'm happy to test any kernel patches with our setup if you > > want to give it a shot. > > Thanks! > I'm a bit concerned that the trick I proposed changes the configuration > where iotlb was not set up from "access to memory not allowed" to > "access to all memory allowed". This just might have security > implications if some application assumed the former. > And the one Stefano proposed disables IOTLB for old QEMU versions. Adding hacks to vhost in order to work around userspace applications that misunderstand the vhost model seems like a it will lead to problems. Userspace applications need to follow the vhost model: vhost is designed for virtqueue passthrough, but the rest of the vhost interface is not suitable for pass through. It's similar to how VFIO PCI passthrough needs to do a significant amount of stuff in userspace to emulate a PCI configuration space and it won't work properly if you pass through the physical PCI device's PCI configuration space. The emulator has to mediate between the guest device and vhost device because it still emulates the VIRTIO transport, configuration space, device lifecycle, etc even when all virtqueues are passed through. Let's document this for vhost and vDPA because it is not obvious. Stefan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-08 12:45 ` Michael S. Tsirkin 2022-08-08 14:33 ` Stefan Hajnoczi @ 2022-08-09 3:16 ` Jason Wang 2022-08-17 13:48 ` Will Deacon 2 siblings, 0 replies; 22+ messages in thread From: Jason Wang @ 2022-08-09 3:16 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Will Deacon, Stefan Hajnoczi, Linus Torvalds, ascull, Marc Zyngier, Keir Fraser, jiyong, kernel-team, linux-kernel, virtualization, kvm, crosvm-dev On Mon, Aug 8, 2022 at 8:45 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Mon, Aug 08, 2022 at 11:18:50AM +0100, Will Deacon wrote: > > Hi Michael, > > > > On Sun, Aug 07, 2022 at 09:14:43AM -0400, Michael S. Tsirkin wrote: > > > Will, thanks very much for the analysis and the writeup! > > > > No problem, and thanks for following up. > > > > > On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: > > > > So how should we fix this? One possibility is for us to hack crosvm to > > > > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features, > > > > but others here have reasonably pointed out that they didn't expect a > > > > kernel change to break userspace. On the flip side, the offending commit > > > > in the kernel isn't exactly new (it's from the end of 2020!) and so it's > > > > likely that others (e.g. QEMU) are using this feature. > > > > > > Exactly, that's the problem. > > > > > > vhost is reusing the virtio bits and it's only natural that > > > what you are doing would happen. > > > > > > To be precise, this is what we expected people to do (and what QEMU does): > > > > > > > > > #define QEMU_VHOST_FEATURES ((1 << VIRTIO_F_VERSION_1) | > > > (1 << VIRTIO_NET_F_RX_MRG) | .... ) > > > > > > VHOST_GET_FEATURES(... &host_features); > > > host_features &= QEMU_VHOST_FEATURES > > > VHOST_SET_FEATURES(host_features & guest_features) > > > > > > > > > Here QEMU_VHOST_FEATURES are the bits userspace knows about. > > > > > > Our assumption was that whatever userspace enables, it > > > knows what the effect on vhost is going to be. > > > > > > But yes, I understand absolutely how someone would instead just use the > > > guest features. It is unfortunate that we did not catch this in time. > > > > > > > > > In hindsight, we should have just created vhost level macros > > > instead of reusing virtio ones. Would address the concern > > > about naming: PLATFORM_ACCESS makes sense for the > > > guest since there it means "whatever access rules platform has", > > > but for vhost a better name would be VHOST_F_IOTLB. > > > We should have also taken greater pains to document what > > > we expect userspace to do. I remember now how I thought about something > > > like this but after coding this up in QEMU I forgot to document this :( > > > Also, I suspect given the history the GET/SET features ioctl and burned > > > wrt extending it and we have to use a new when we add new features. > > > All this we can do going forward. > > > > Makes sense. The crosvm developers are also pretty friendly in my > > experience, so I'm sure they wouldn't mind being involved in discussions > > around any future ABI extensions. Just be aware that they _very_ recently > > moved their mailing lists, so I think it lives here now: > > > > https://groups.google.com/a/chromium.org/g/crosvm-dev > > > > > But what can we do about the specific issue? > > > I am not 100% sure since as Will points out, QEMU and other > > > userspace already rely on the current behaviour. > > > > > > Looking at QEMU specifically, it always sends some translations at > > > startup, this in order to handle device rings. > > > > > > So, *maybe* we can get away with assuming that if no IOTLB ioctl was > > > ever invoked then this userspace does not know about IOTLB and > > > translation should ignore IOTLB completely. > > > > There was a similar suggestion from Stefano: > > > > https://lore.kernel.org/r/20220806105225.crkui6nw53kbm5ge@sgarzare-redhat > > > > about spotting the backend ioctl for IOTLB and using that to enable > > the negotiation of F_ACCESS_PLATFORM. Would that work for qemu? > > Hmm I would worry that this disables the feature for old QEMU :( > > > > > I am a bit nervous about breaking some *other* userspace which actually > > > wants device to be blocked from accessing memory until IOTLB > > > has been setup. If we get it wrong we are making guest > > > and possibly even host vulnerable. > > > And of course just revering is not an option either since there > > > are now whole stacks depending on the feature. > > > > Absolutely, I'm not seriously suggesting the revert. I just did it locally > > to confirm the issue I was seeing. > > > > > Will I'd like your input on whether you feel a hack in the kernel > > > is justified here. > > > > If we can come up with something that we have confidence in and won't be a > > pig to maintain, then I think we should do it, but otherwise we can go ahead > > and change crosvm to mask out this feature flag on the vhost side for now. > > We mainly wanted to raise the issue to illustrate that this flag continues > > to attract problems in the hope that it might inform further usage and/or > > spec work in this area. > > > > In any case, I'm happy to test any kernel patches with our setup if you > > want to give it a shot. > > Thanks! > I'm a bit concerned that the trick I proposed changes the configuration > where iotlb was not set up from "access to memory not allowed" to > "access to all memory allowed". This just might have security > implications if some application assumed the former. > And the one Stefano proposed disables IOTLB for old QEMU versions. Yes, if there is no choice, having some known cases that are broken is better than silently breaking unknown setups. Thanks > > > > > > Also yes, I think it's a good idea to change crosvm anyway. While the > > > work around I describe might make sense upstream I don't think it's a > > > reasonable thing to do in stable kernels. > > > I think I'll prepare a patch documenting the legal vhost features > > > as a 1st step even though crosvm is rust so it's not importing > > > the header directly, right? > > > > Documentation is a good idea regardless, so thanks for that. Even though > > crosvm has its own bindings for the vhost ioctl()s, the documentation > > can be reference or duplicated once it's available in the kernel headers. > > > > Will > > So for crosvm change, I will post the documentation change and > you guys can discuss? > > -- > MST > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-08 12:45 ` Michael S. Tsirkin 2022-08-08 14:33 ` Stefan Hajnoczi 2022-08-09 3:16 ` Jason Wang @ 2022-08-17 13:48 ` Will Deacon 2022-08-17 17:05 ` Michael S. Tsirkin 2 siblings, 1 reply; 22+ messages in thread From: Will Deacon @ 2022-08-17 13:48 UTC (permalink / raw) To: Michael S. Tsirkin Cc: stefanha, jasowang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm, crosvm-dev On Mon, Aug 08, 2022 at 08:45:48AM -0400, Michael S. Tsirkin wrote: > > > Also yes, I think it's a good idea to change crosvm anyway. While the > > > work around I describe might make sense upstream I don't think it's a > > > reasonable thing to do in stable kernels. > > > I think I'll prepare a patch documenting the legal vhost features > > > as a 1st step even though crosvm is rust so it's not importing > > > the header directly, right? > > > > Documentation is a good idea regardless, so thanks for that. Even though > > crosvm has its own bindings for the vhost ioctl()s, the documentation > > can be reference or duplicated once it's available in the kernel headers. > > > So for crosvm change, I will post the documentation change and > you guys can discuss? FYI, the crosvm patch is merged here: https://github.com/google/crosvm/commit/4e7d00be2e135b0a2d964320ea4276e5d896f426 Will ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-17 13:48 ` Will Deacon @ 2022-08-17 17:05 ` Michael S. Tsirkin 0 siblings, 0 replies; 22+ messages in thread From: Michael S. Tsirkin @ 2022-08-17 17:05 UTC (permalink / raw) To: Will Deacon Cc: stefanha, jasowang, torvalds, ascull, maz, keirf, jiyong, kernel-team, linux-kernel, virtualization, kvm, crosvm-dev On Wed, Aug 17, 2022 at 02:48:22PM +0100, Will Deacon wrote: > On Mon, Aug 08, 2022 at 08:45:48AM -0400, Michael S. Tsirkin wrote: > > > > Also yes, I think it's a good idea to change crosvm anyway. While the > > > > work around I describe might make sense upstream I don't think it's a > > > > reasonable thing to do in stable kernels. > > > > I think I'll prepare a patch documenting the legal vhost features > > > > as a 1st step even though crosvm is rust so it's not importing > > > > the header directly, right? > > > > > > Documentation is a good idea regardless, so thanks for that. Even though > > > crosvm has its own bindings for the vhost ioctl()s, the documentation > > > can be reference or duplicated once it's available in the kernel headers. > > > > > So for crosvm change, I will post the documentation change and > > you guys can discuss? > > FYI, the crosvm patch is merged here: > > https://github.com/google/crosvm/commit/4e7d00be2e135b0a2d964320ea4276e5d896f426 > > Will Great thanks a lot! I'm on vacation next week but will work on it afterwards. -- MST ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: IOTLB support for vhost/vsock breaks crosvm on Android 2022-08-07 13:14 ` Michael S. Tsirkin 2022-08-08 10:18 ` Will Deacon @ 2022-08-09 3:12 ` Jason Wang 1 sibling, 0 replies; 22+ messages in thread From: Jason Wang @ 2022-08-09 3:12 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Will Deacon, Stefan Hajnoczi, Linus Torvalds, ascull, Marc Zyngier, Keir Fraser, jiyong, kernel-team, linux-kernel, virtualization, kvm On Sun, Aug 7, 2022 at 9:14 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > Will, thanks very much for the analysis and the writeup! > > On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: > > So how should we fix this? One possibility is for us to hack crosvm to > > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features, > > but others here have reasonably pointed out that they didn't expect a > > kernel change to break userspace. On the flip side, the offending commit > > in the kernel isn't exactly new (it's from the end of 2020!) and so it's > > likely that others (e.g. QEMU) are using this feature. > > Exactly, that's the problem. > > vhost is reusing the virtio bits and it's only natural that > what you are doing would happen. > > To be precise, this is what we expected people to do (and what QEMU does): > > > #define QEMU_VHOST_FEATURES ((1 << VIRTIO_F_VERSION_1) | > (1 << VIRTIO_NET_F_RX_MRG) | .... ) > > VHOST_GET_FEATURES(... &host_features); > host_features &= QEMU_VHOST_FEATURES > VHOST_SET_FEATURES(host_features & guest_features) > > > Here QEMU_VHOST_FEATURES are the bits userspace knows about. > > Our assumption was that whatever userspace enables, it > knows what the effect on vhost is going to be. > > But yes, I understand absolutely how someone would instead just use the > guest features. It is unfortunate that we did not catch this in time. > > > In hindsight, we should have just created vhost level macros > instead of reusing virtio ones. Would address the concern > about naming: PLATFORM_ACCESS makes sense for the > guest since there it means "whatever access rules platform has", > but for vhost a better name would be VHOST_F_IOTLB. Yes, in the original patch it is called VHOST_F_DEVICE_IOTLB actually to make it differ from virtio like VHOST_F_LOG_ALL etc. And I remember I tried to post patch to avoid the bit duplication but the conclusion is that it's too late for the changes. > We should have also taken greater pains to document what > we expect userspace to do. I remember now how I thought about something > like this but after coding this up in QEMU I forgot to document this :( > Also, I suspect given the history the GET/SET features ioctl and burned > wrt extending it and we have to use a new when we add new features. > All this we can do going forward. > > > But what can we do about the specific issue? > I am not 100% sure since as Will points out, QEMU and other > userspace already rely on the current behaviour. > > Looking at QEMU specifically, it always sends some translations at > startup, this in order to handle device rings. > > So, *maybe* we can get away with assuming that if no IOTLB ioctl was > ever invoked then this userspace does not know about IOTLB and > translation should ignore IOTLB completely. I think this breaks the security assumptions of some setups. > > I am a bit nervous about breaking some *other* userspace which actually > wants device to be blocked from accessing memory until IOTLB > has been setup. If we get it wrong we are making guest > and possibly even host vulnerable. Yes. > And of course just revering is not an option either since there > are now whole stacks depending on the feature. > > Will I'd like your input on whether you feel a hack in the kernel > is justified here. > > Also yes, I think it's a good idea to change crosvm anyway. While the > work around I describe might make sense upstream I don't think it's a > reasonable thing to do in stable kernels. +1 Thanks > I think I'll prepare a patch documenting the legal vhost features > as a 1st step even though crosvm is rust so it's not importing > the header directly, right? > > -- > MST > ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2022-08-17 17:06 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-08-05 18:11 IOTLB support for vhost/vsock breaks crosvm on Android Will Deacon 2022-08-05 22:57 ` Linus Torvalds 2022-08-06 8:17 ` Stefano Garzarella 2022-08-06 9:42 ` Will Deacon 2022-08-07 6:52 ` Christoph Hellwig 2022-08-07 13:35 ` Michael S. Tsirkin 2022-08-07 13:27 ` Michael S. Tsirkin 2022-08-06 7:48 ` Stefano Garzarella 2022-08-06 9:42 ` Will Deacon 2022-08-06 10:52 ` Stefan Hajnoczi 2022-08-06 10:52 ` Stefan Hajnoczi 2022-08-06 14:34 ` Will Deacon 2022-08-06 21:47 ` Stefan Hajnoczi 2022-08-06 10:52 ` Stefano Garzarella 2022-08-07 13:14 ` Michael S. Tsirkin 2022-08-08 10:18 ` Will Deacon 2022-08-08 12:45 ` Michael S. Tsirkin 2022-08-08 14:33 ` Stefan Hajnoczi 2022-08-09 3:16 ` Jason Wang 2022-08-17 13:48 ` Will Deacon 2022-08-17 17:05 ` Michael S. Tsirkin 2022-08-09 3:12 ` Jason Wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).