From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christoffer Dall
Subject: Re: issues with emulated PCI MMIO backed by host memory under KVM
Date: Mon, 27 Jun 2016 15:35:08 +0200
Message-ID: <20160627133508.GI26498@cbox>
References: <20160627091619.GB26498@cbox>
 <20160627103421.GC26498@cbox>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
To: Ard Biesheuvel
Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek,
 "kvmarm@lists.cs.columbia.edu"
List-Id: kvmarm@lists.cs.columbia.edu

On Mon, Jun 27, 2016 at 02:30:46PM +0200, Ard Biesheuvel wrote:
> On 27 June 2016 at 12:34, Christoffer Dall wrote:
> > On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote:
> >> On 27 June 2016 at 11:16, Christoffer Dall wrote:
> >> > Hi,
> >> >
> >> > I'm going to ask some stupid questions here...
> >> >
> >> > On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote:
> >> >> Hi all,
> >> >>
> >> >> This old subject came up again in a discussion related to PCIe
> >> >> support for QEMU/KVM under Tianocore. The fact that we need to map
> >> >> PCI MMIO regions as cacheable is preventing us from reusing a
> >> >> significant slice of the PCIe support infrastructure, and so I'd
> >> >> like to bring this up again, perhaps just to reiterate why we're
> >> >> simply out of luck.
> >> >>
> >> >> To refresh your memories, the issue is that on ARM, PCI MMIO
> >> >> regions for emulated devices may be backed by memory that is mapped
> >> >> cacheable by the host. Note that this has nothing to do with the
> >> >> device being DMA coherent or not: in this case, we are dealing with
> >> >> regions that are not memory from the POV of the guest, and it is
> >> >> reasonable for the guest to assume that accesses to such a region
> >> >> are not visible to the device before they hit the actual PCI MMIO
> >> >> window and are translated into cycles on the PCI bus.
> >> >
> >> > For the sake of completeness, why is this reasonable?
> >> >
> >>
> >> Because the whole point of accessing these regions is to communicate
> >> with the device. It is common to use write-combining mappings for
> >> things like framebuffers to group writes before they hit the PCI bus,
> >> but any caching just makes it more difficult for the driver state and
> >> device state to remain synchronized.
> >>
> >> > Is this how any real ARM system implementing PCI would actually
> >> > work?
> >> >
> >>
> >> Yes.
> >>
> >> >> That means that mapping such a region cacheable is a strange thing
> >> >> to do, in fact, and it is unlikely that patches implementing this
> >> >> against the generic PCI stack in Tianocore will be accepted by the
> >> >> maintainers.
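(To make the write-combining point above concrete: on bare metal, a
framebuffer driver would typically map its BAR along the lines below.
This is only a rough sketch; 'pdev' stands in for whatever struct
pci_dev the driver was probed with, and BAR 0 is an arbitrary example.)

#include <linux/pci.h>
#include <linux/io.h>

static void __iomem *map_fb_bar(struct pci_dev *pdev)
{
	resource_size_t start = pci_resource_start(pdev, 0);
	resource_size_t size  = pci_resource_len(pdev, 0);

	/* write-combined: stores may be merged into bursts on their way
	 * to the bus, but they never linger in a cache */
	return ioremap_wc(start, size);
}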
> >> >>
> >> >> Note that this issue not only affects framebuffers on PCI cards, it
> >> >> also affects emulated USB host controllers (perhaps Alex can remind
> >> >> us which one exactly?) and likely other emulated generic PCI
> >> >> devices as well.
> >> >>
> >> >> Since the issue exists only for emulated PCI devices whose MMIO
> >> >> regions are backed by host memory, is there any way we can already
> >> >> distinguish such memslots from ordinary ones? If we can, is there
> >> >> anything we could do to treat these specially? Perhaps something
> >> >> like using read-only memslots so we can at least trap guest writes
> >> >> instead of having main memory go out of sync with the caches
> >> >> unnoticed? I am just brainstorming here ...
> >> >
> >> > I think the only sensible solution is to make sure that the guest
> >> > and emulation mappings use the same memory type, either cached or
> >> > non-cached, and we 'simply' have to find the best way to implement
> >> > this.
> >> >
> >> > As Drew suggested, forcing some S2 mappings to be non-cacheable is
> >> > one way.
> >> >
> >> > The other way is to use something like what you once wrote that
> >> > rewrites stage-1 mappings to be cacheable; does that apply here?
> >> >
> >> > Do we have a clear picture of why we'd prefer one way over the
> >> > other?
> >> >
> >>
> >> So first of all, let me reiterate that I could only find a single
> >> instance in QEMU where a PCI MMIO region is backed by host memory,
> >> which is vga-pci.c. I wonder if there are any other occurrences, but
> >> if there aren't any, it makes much more sense to prohibit PCI BARs
> >> backed by host memory rather than spend a lot of effort working
> >> around it.
> >
> > Right, ok. So Marc's point during his KVM Forum talk was basically:
> > don't use the legacy VGA adapter on ARM, use virtio graphics instead,
> > right?
> >
>
> Yes. But nothing currently prevents you from using it, and I think we
> should prefer crappy performance but correct operation over the
> current situation. So in general, we should either disallow PCI BARs
> backed by host memory, or emulate them, but never back them with a
> RAM memslot when running under ARM/KVM.

Agreed. I just think that emulating accesses by trapping them is not
just slow; it's not really possible in practice, and even if it is,
it's probably *unusably* slow.
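To be explicit about what trapping would mean mechanically: it is the
read-only memslot idea you brainstormed above. From userspace it would
look roughly like the sketch below, against the raw KVM ioctl
interface; vm_fd, backing, and the guest-physical address are made-up
placeholders.

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdio.h>

/*
 * Sketch: register the host memory backing an emulated BAR as a
 * read-only memslot.  Guest reads are satisfied from the backing
 * pages, but every guest write faults and exits to userspace as
 * KVM_EXIT_MMIO, where QEMU can emulate it.
 */
static int register_readonly_bar(int vm_fd, void *backing)
{
	struct kvm_userspace_memory_region region = {
		.slot            = 1,
		.flags           = KVM_MEM_READONLY,
		.guest_phys_addr = 0x10000000,   /* example BAR address */
		.memory_size     = 16 << 20,     /* example BAR size */
		.userspace_addr  = (unsigned long)backing,
	};

	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0) {
		perror("KVM_SET_USER_MEMORY_REGION");
		return -1;
	}
	return 0;
}

That would at least stop guest writes from going out of sync with
QEMU's view unnoticed, but at the cost of a full exit on every store,
which is where the "unusably slow" part comes in.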
> > What is the proposed solution for someone shipping an ARM server
> > and wishing to provide a graphical output for that server?
> >
>
> The problem does not exist on bare metal. It is an implementation
> detail of KVM on ARM that guest PCI BAR mappings are incoherent with
> the view of the emulator in QEMU.
>
> > It feels strange to work around supporting PCI VGA adapters in ARM
> > VMs if that's not a supported real-hardware case. However, I don't
> > see what would prevent someone from plugging a VGA adapter into the
> > PCI slot on an ARM server, and people selling ARM servers probably
> > want this to happen, I'm guessing.
> >
>
> As I said, the problem does not exist on bare metal.
>
> >>
> >> If we do decide to fix this, the best way would be to use uncached
> >> attributes for the QEMU userland mapping, and force it uncached in
> >> the guest via a stage 2 override (as Drew suggests). The only
> >> problem I see here is that the host's kernel direct mapping has a
> >> cached alias that we need to get rid of.
> >
> > Do we have a way to accomplish that?
> >
> > Will we run into a bunch of other problems if we begin punching
> > holes in the direct mapping for regular RAM?
> >
>
> I think the policy up until now has been not to remap regions in the
> kernel direct mapping for the purposes of DMA, and I think by the
> same reasoning it is not preferable for KVM either.

I guess the difference is that from the (host) kernel's point of view
this is not DMA memory, but just regular RAM. I just don't know enough
about the kernel's VM mappings to know what's involved here, but we
should find out somehow...

Thanks,
-Christoffer
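P.S. On the "uncached attributes for the QEMU userland mapping" idea:
userspace cannot ask mmap() for an uncached mapping directly, so QEMU
would need a kernel driver whose mmap handler applies the attribute.
A hypothetical sketch follows; uc_mmap and bar_backing_pfn are made-up
names, only pgprot_noncached() and remap_pfn_range() are existing
interfaces. Note that even with this, the cached alias in the kernel's
linear mapping remains, which is exactly the problem discussed above.

#include <linux/fs.h>
#include <linux/mm.h>

static unsigned long bar_backing_pfn;	/* placeholder: set by the driver */

/*
 * Hand userspace (QEMU) a non-cacheable mapping of the pages backing
 * an emulated BAR, so the userland alias uses the same memory type
 * the guest would be forced to use via the stage 2 override.
 */
static int uc_mmap(struct file *file, struct vm_area_struct *vma)
{
	/* non-cacheable user mapping (Device memory on arm64) */
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

	return remap_pfn_range(vma, vma->vm_start, bar_backing_pfn,
			       vma->vm_end - vma->vm_start,
			       vma->vm_page_prot);
}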