From: Laszlo Ersek <lersek@redhat.com>
To: Christoffer Dall <christoffer.dall@linaro.org>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>
Subject: Re: issues with emulated PCI MMIO backed by host memory under KVM
Date: Tue, 28 Jun 2016 13:06:36 +0200	[thread overview]
Message-ID: <9fbfb578-2235-2f2a-4502-a285e9ba22e6@redhat.com> (raw)
In-Reply-To: <20160628100405.GK26498@cbox>

On 06/28/16 12:04, Christoffer Dall wrote:
> On Mon, Jun 27, 2016 at 03:57:28PM +0200, Ard Biesheuvel wrote:
>> On 27 June 2016 at 15:35, Christoffer Dall <christoffer.dall@linaro.org> wrote:
>>> On Mon, Jun 27, 2016 at 02:30:46PM +0200, Ard Biesheuvel wrote:
>>>> On 27 June 2016 at 12:34, Christoffer Dall <christoffer.dall@linaro.org> wrote:
>>>>> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote:
>>>>>> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@linaro.org> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm going to ask some stupid questions here...
>>>>>>>
>>>>>>> On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> This old subject came up again in a discussion related to PCIe support
>>>>>>>> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO
>>>>>>>> regions as cacheable is preventing us from reusing a significant slice
>>>>>>>> of the PCIe support infrastructure, and so I'd like to bring this up
>>>>>>>> again, perhaps just to reiterate why we're simply out of luck.
>>>>>>>>
>>>>>>>> To refresh your memories, the issue is that on ARM, PCI MMIO regions
>>>>>>>> for emulated devices may be backed by memory that is mapped cacheable
>>>>>>>> by the host. Note that this has nothing to do with the device being
>>>>>>>> DMA coherent or not: in this case, we are dealing with regions that
>>>>>>>> are not memory from the POV of the guest, and it is reasonable for the
>>>>>>>> guest to assume that accesses to such a region are not visible to the
>>>>>>>> device before they hit the actual PCI MMIO window and are translated
>>>>>>>> into cycles on the PCI bus.
>>>>>>>
>>>>>>> For the sake of completeness, why is this reasonable?
>>>>>>>
>>>>>>
>>>>>> Because the whole point of accessing these regions is to communicate
>>>>>> with the device. It is common to use write combining mappings for
>>>>>> things like framebuffers to group writes before they hit the PCI bus,
>>>>>> but any caching just makes it more difficult for the driver state and
>>>>>> device state to remain synchronized.
>>>>>>
>>>>>>> Is this how any real ARM system implementing PCI would actually work?
>>>>>>>
>>>>>>
>>>>>> Yes.
>>>>>>
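(As an aside, a minimal sketch of what Ard describes above, i.e. how a
Linux PCI driver would typically map a framebuffer BAR write-combined
rather than cached; the probe routine, BAR number and register usage
below are made up purely for illustration.)

#include <linux/pci.h>
#include <linux/io.h>

/* Hypothetical probe routine -- BAR layout and values are placeholders. */
static int demo_fb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        void __iomem *fb;
        int ret;

        ret = pci_enable_device(pdev);
        if (ret)
                return ret;

        /*
         * Write-combining: stores may be merged before they reach the
         * bus, but they are never held in the cache, so the device sees
         * them without any explicit cache maintenance by the driver.
         */
        fb = ioremap_wc(pci_resource_start(pdev, 0),
                        pci_resource_len(pdev, 0));
        if (!fb)
                return -ENOMEM;

        writel(0x00ffffff, fb);         /* paint the first pixel, say */

        iounmap(fb);
        return 0;
}
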
>>>>>>>> That means that mapping such a region
>>>>>>>> cacheable is a strange thing to do, in fact, and it is unlikely that
>>>>>>>> patches implementing this against the generic PCI stack in Tianocore
>>>>>>>> will be accepted by the maintainers.
>>>>>>>>
>>>>>>>> Note that this issue not only affects framebuffers on PCI cards, it
>>>>>>>> also affects emulated USB host controllers (perhaps Alex can remind us
>>>>>>>> which one exactly?) and likely other emulated generic PCI devices as
>>>>>>>> well.
>>>>>>>>
>>>>>>>> Since the issue exists only for emulated PCI devices whose MMIO
>>>>>>>> regions are backed by host memory, is there any way we can already
>>>>>>>> distinguish such memslots from ordinary ones? If we can, is there
>>>>>>>> anything we could do to treat these specially? Perhaps something like
>>>>>>>> using read-only memslots so we can at least trap guest writes instead
>>>>>>>> of having main memory going out of sync with the caches unnoticed? I
>>>>>>>> am just brainstorming here ...
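(To make that brainstorming a bit more concrete: KVM already accepts a
KVM_MEM_READONLY flag on KVM_SET_USER_MEMORY_REGION, and guest writes to
such a slot fault and exit to userspace as MMIO -- whether the faulting
instruction can then be emulated at all is a separate problem, as
discussed further down. A rough sketch, with the slot number, addresses
and sizes as placeholders:)

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch only: expose the host buffer backing an emulated BAR as a
 * read-only memslot.  Guest reads are served from the memslot; guest
 * writes trap and exit to userspace (KVM_EXIT_MMIO), where the VMM can
 * emulate them and keep its own view coherent.
 */
static int register_ro_bar(int vm_fd, __u64 guest_gpa, void *host_buf,
                           __u64 size)
{
        struct kvm_userspace_memory_region region;

        memset(&region, 0, sizeof(region));
        region.slot = 1;                        /* placeholder slot id */
        region.flags = KVM_MEM_READONLY;
        region.guest_phys_addr = guest_gpa;
        region.memory_size = size;
        region.userspace_addr = (__u64)(unsigned long)host_buf;

        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}
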
>>>>>>>
>>>>>>> I think the only sensible solution is to make sure that the guest and
>>>>>>> emulation mappings use the same memory type, either cached or
>>>>>>> non-cached, and we 'simply' have to find the best way to implement this.
>>>>>>>
>>>>>>> As Drew suggested, forcing some S2 mappings to be non-cacheable is the
>>>>>>> one way.
>>>>>>>
>>>>>>> The other way is to use something like what you once wrote that rewrites
>>>>>>> stage-1 mappings to be cacheable, does that apply here ?
>>>>>>>
>>>>>>> Do we have a clear picture of why we'd prefer one way over the other?
>>>>>>>
>>>>>>
>>>>>> So first of all, let me reiterate that I could only find a single
>>>>>> instance in QEMU where a PCI MMIO region is backed by host memory,
>>>>>> which is vga-pci.c. I wonder if there are any other occurrences, but
>>>>>> if there aren't any, it makes much more sense to prohibit PCI BARs
>>>>>> backed by host memory rather than spend a lot of effort working around
>>>>>> it.
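(For context, the pattern in question looks roughly like this on the
QEMU side -- a simplified sketch, not the actual vga-pci.c code; the
device type and the 16 MB size are invented. The point is that the BAR
contents are an ordinary RAM-backed MemoryRegion, so under KVM they end
up installed as a normal, cacheable RAM memslot.)

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "hw/pci/pci.h"

/* Hypothetical device state and QOM cast macro, for illustration only. */
typedef struct DemoVGAState {
    PCIDevice parent_obj;
    MemoryRegion vram;
} DemoVGAState;
#define DEMO_VGA(obj) OBJECT_CHECK(DemoVGAState, (obj), "demo-vga")

static void demo_vga_realize(PCIDevice *dev, Error **errp)
{
    DemoVGAState *s = DEMO_VGA(dev);

    /* The BAR contents are plain host RAM from QEMU's point of view... */
    memory_region_init_ram(&s->vram, OBJECT(dev), "demo-vga.vram",
                           16 * 1024 * 1024, &error_fatal);

    /* ...while the guest sees a prefetchable PCI MMIO window, which KVM
     * backs with an ordinary (cacheable) RAM memslot. */
    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_MEM_PREFETCH, &s->vram);
}
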
>>>>>
>>>>> Right, ok.  So Marc's point during his KVM Forum talk was basically,
>>>>> don't use the legacy VGA adapter on ARM and use virtio graphics, right?
>>>>>
>>>>
>>>> Yes. But nothing is preventing you currently from using that, and I
>>>> think we should prefer crappy performance but correct operation over
>>>> the current situation. So in general, we should either disallow PCI
>>>> BARs backed by host memory, or emulate them, but never back them by a
>>>> RAM memslot when running under ARM/KVM.
>>>
>>> agreed, I just think that emulating accesses by trapping them is not
>>> just slow, it's not really possible in practice and even if it is, it's
>>> probably *unusably* slow.
>>>
>>
>> Well, it would probably involve a lot of effort to implement emulation
>> of instructions with multiple output registers, such as ldp/stp and
>> register writeback. And indeed, trapping on each store instruction to
>> the framebuffer is going to be sloooooowwwww.
>>
>> So let's disregard that option for now ...
>>
>>>>
>>>>> What is the proposed solution for someone shipping an ARM server and
>>>>> wishing to provide a graphical output for that server?
>>>>>
>>>>
>>>> The problem does not exist on bare metal. It is an implementation
>>>> detail of KVM on ARM that guest PCI BAR mappings are incoherent with
>>>> the view of the emulator in QEMU.
>>>>
>>>>> It feels strange to work around supporting PCI VGA adapters in ARM VMs,
>>>>> if that's not a supported real hardware case.  However, I don't see what
>>>>> would prevent someone from plugging a VGA adapter into the PCI slot on
>>>>> an ARM server, and people selling ARM servers probably want this to
>>>>> happen, I'm guessing.
>>>>>
>>>>
>>>> As I said, the problem does not exist on bare metal.
>>>>
>>>>>>
>>>>>> If we do decide to fix this, the best way would be to use uncached
>>>>>> attributes for the QEMU userland mapping, and force it uncached in the
>>>>>> guest via a stage 2 override (as Drew suggests). The only problem I
>>>>>> see here is that the host's kernel direct mapping has a cached alias
>>>>>> that we need to get rid of.
>>>>>
>>>>> Do we have a way to accomplish that?
>>>>>
>>>>> Will we run into a bunch of other problems if we begin punching holes in
>>>>> the direct mapping for regular RAM?
>>>>>
>>>>
>>>> I think the policy up until now has been not to remap regions in the
>>>> kernel direct mapping for the purposes of DMA, and I think by the same
>>>> reasoning, it is not preferable for KVM either.
>>>
>>> I guess the difference is that from the (host) kernel's point of view
>>> this is not DMA memory, but just regular RAM.  I just don't know enough
>>> about the kernel's VM mappings to know what's involved here, but we
>>> should find out somehow...
>>>
>>
>> Whether it is DMA memory or not does not make a difference. The point
>> is simply that arm64 maps all RAM owned by the kernel as cacheable,
>> and remapping arbitrary ranges with different attributes is
>> problematic, since it is also likely to involve splitting of regions,
>> which is cumbersome with a mapping that is always live.
>>
>> So instead, we'd have to reserve some system memory early on and
>> remove it from the linear mapping, the complexity of which is more
>> than we are probably prepared to put up with.
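(If somebody did want to go down that road, I imagine it would look
roughly like the following: carve out a region early -- e.g. via a
reserved-memory node marked "no-map" -- so it never gets a cacheable
alias in the linear mapping, and then remap it with the attributes we
actually want. A very rough sketch, with the address and size made up;
whether this is worth the complexity is exactly the question.)

#include <linux/io.h>

#define DEMO_BASE       0x80000000UL            /* placeholder phys address */
#define DEMO_SIZE       (16UL << 20)            /* placeholder size: 16 MB */

/*
 * Very rough sketch: DEMO_BASE/DEMO_SIZE are assumed to have been
 * reserved early and kept out of the kernel's linear mapping, so
 * remapping them non-cacheable/write-combined here does not create a
 * conflicting cacheable alias.
 */
static void *demo_map_uncached_backing(void)
{
        return memremap(DEMO_BASE, DEMO_SIZE, MEMREMAP_WC);
}
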
> 
> Don't we have any existing frameworks for such things, like ion or
> other things like that?  Not sure if these systems export anything to
> userspace or even serve the purpose we want, but thought I'd throw it
> out there.
> 
>>
>> So if vga-pci.c is the only problematic device, for which a reasonable
>> alternative exists (virtio-gpu), I think the only feasible solution is
>> to educate QEMU not to allow RAM memslots to be exposed via PCI BARs
>> when running under KVM/ARM.
> 
> It would be good if we could support vga-pci under KVM/ARM, but if
> there's no other way than rewriting the arm64 kernel's memory mappings
> completely, then probably we're stuck there, unfortunately.

It's been mentioned earlier in this thread that the way S1 and S2 memory
attributes combine on AArch64 is actually an *architecture bug*. If we
accept that qualification, then we should recognize that what we are
really looking for here is a *workaround*.

In your blog post
<http://www.linaro.org/blog/core-dump/on-the-performance-of-arm-virtualization/>,
you mention VHE ("Virtualization Host Extensions"). That's clearly a
sign of the architecture adapting to the needs of virtualization
software.

Do you see any chance that the S1-S2 attribute combination rules, too,
can be fixed in a new revision of the architecture?

Thanks
Laszlo
