* issues with emulated PCI MMIO backed by host memory under KVM @ 2016-06-24 14:04 Ard Biesheuvel 2016-06-24 14:57 ` Andrew Jones ` (3 more replies) 0 siblings, 4 replies; 32+ messages in thread From: Ard Biesheuvel @ 2016-06-24 14:04 UTC (permalink / raw) To: Christoffer Dall, Peter Maydell, Marc Zyngier, Andrew Jones, Laszlo Ersek, kvmarm, Alexander Graf Cc: Catalin Marinas Hi all, This old subject came up again in a discussion related to PCIe support for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO regions as cacheable is preventing us from reusing a significant slice of the PCIe support infrastructure, and so I'd like to bring this up again, perhaps just to reiterate why we're simply out of luck. To refresh your memories, the issue is that on ARM, PCI MMIO regions for emulated devices may be backed by memory that is mapped cacheable by the host. Note that this has nothing to do with the device being DMA coherent or not: in this case, we are dealing with regions that are not memory from the POV of the guest, and it is reasonable for the guest to assume that accesses to such a region are not visible to the device before they hit the actual PCI MMIO window and are translated into cycles on the PCI bus. That means that mapping such a region cacheable is a strange thing to do, in fact, and it is unlikely that patches implementing this against the generic PCI stack in Tianocore will be accepted by the maintainers. Note that this issue not only affects framebuffers on PCI cards, it also affects emulated USB host controllers (perhaps Alex can remind us which one exactly?) and likely other emulated generic PCI devices as well. Since the issue exists only for emulated PCI devices whose MMIO regions are backed by host memory, is there any way we can already distinguish such memslots from ordinary ones? If we can, is there anything we could do to treat these specially? 
Perhaps something like using read-only memslots so we can at least trap guest writes instead of having main memory going out of sync with the caches unnoticed? I am just brainstorming here ... In any case, it would be good to put this to bed one way or the other (assuming it hasn't been put to bed already) Thanks, Ard. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-24 14:04 issues with emulated PCI MMIO backed by host memory under KVM Ard Biesheuvel @ 2016-06-24 14:57 ` Andrew Jones 2016-06-27 8:17 ` Marc Zyngier 2016-06-24 18:16 ` Ard Biesheuvel ` (2 subsequent siblings) 3 siblings, 1 reply; 32+ messages in thread From: Andrew Jones @ 2016-06-24 14:57 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm Hi Ard, Thanks for bringing this back up again (I think :-) On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: > Hi all, > > This old subject came up again in a discussion related to PCIe support > for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO > regions as cacheable is preventing us from reusing a significant slice > of the PCIe support infrastructure, and so I'd like to bring this up > again, perhaps just to reiterate why we're simply out of luck. > > To refresh your memories, the issue is that on ARM, PCI MMIO regions > for emulated devices may be backed by memory that is mapped cacheable > by the host. Note that this has nothing to do with the device being > DMA coherent or not: in this case, we are dealing with regions that > are not memory from the POV of the guest, and it is reasonable for the > guest to assume that accesses to such a region are not visible to the > device before they hit the actual PCI MMIO window and are translated > into cycles on the PCI bus. That means that mapping such a region > cacheable is a strange thing to do, in fact, and it is unlikely that > patches implementing this against the generic PCI stack in Tianocore > will be accepted by the maintainers. > > Note that this issue not only affects framebuffers on PCI cards, it > also affects emulated USB host controllers (perhaps Alex can remind us > which one exactly?) and likely other emulated generic PCI devices as > well. 
> > Since the issue exists only for emulated PCI devices whose MMIO > regions are backed by host memory, is there any way we can already > distinguish such memslots from ordinary ones? If we can, is there When I was looking at this I didn't see any way to identify these memslots. I wrote some patches to add a new flag, KVM_MEM_NONCACHEABLE, allowing userspace to point them out. That was the easy part (although I didn't like that userspace developers would have to go around finding all memory regions that needed to be flagged, and new devices would likely not be flagged when developed on non-arm architectures, so we'd always be chasing it...) However what really slowed/stopped me was trying to figure out what to do with those identified memslots. My last idea, which had implementation issues (probably because I was getting in over my head), was 1) introduce PAGE_S2_NORMAL_NC and use it when mapping the guest's pages 2) flush the userspace pages and update all PTEs to be NC The reasoning was that, while we can't force a guest to use cacheable memory, we can take advantage of the noncacheable precedence of the architecture, forcing the memory accesses to be noncached by way of S2 attributes. And of course userspace mappings also need to become NC to finally have coherency. > anything we could do to treat these specially? Perhaps something like > using read-only memslots so we can at least trap guest writes instead > of having main memory going out of sync with the caches unnoticed? I > am just brainstorming here ... > > In any case, it would be good to put this to bed one way or the other > (assuming it hasn't been put to bed already) I'm willing to work on this again (because it's fun), but I'm a bit overloaded right now, and last time I touched it it sucked me into a time hole... drew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-24 14:57 ` Andrew Jones @ 2016-06-27 8:17 ` Marc Zyngier 0 siblings, 0 replies; 32+ messages in thread From: Marc Zyngier @ 2016-06-27 8:17 UTC (permalink / raw) To: Andrew Jones, Ard Biesheuvel; +Cc: Catalin Marinas, Laszlo Ersek, kvmarm On 24/06/16 15:57, Andrew Jones wrote: > > Hi Ard, > > Thanks for bringing this back up again (I think :-) > > On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: >> Hi all, >> >> This old subject came up again in a discussion related to PCIe support >> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO >> regions as cacheable is preventing us from reusing a significant slice >> of the PCIe support infrastructure, and so I'd like to bring this up >> again, perhaps just to reiterate why we're simply out of luck. >> >> To refresh your memories, the issue is that on ARM, PCI MMIO regions >> for emulated devices may be backed by memory that is mapped cacheable >> by the host. Note that this has nothing to do with the device being >> DMA coherent or not: in this case, we are dealing with regions that >> are not memory from the POV of the guest, and it is reasonable for the >> guest to assume that accesses to such a region are not visible to the >> device before they hit the actual PCI MMIO window and are translated >> into cycles on the PCI bus. That means that mapping such a region >> cacheable is a strange thing to do, in fact, and it is unlikely that >> patches implementing this against the generic PCI stack in Tianocore >> will be accepted by the maintainers. >> >> Note that this issue not only affects framebuffers on PCI cards, it >> also affects emulated USB host controllers (perhaps Alex can remind us >> which one exactly?) and likely other emulated generic PCI devices as >> well. 
>> >> Since the issue exists only for emulated PCI devices whose MMIO >> regions are backed by host memory, is there any way we can already >> distinguish such memslots from ordinary ones? If we can, is there > > When I was looking at this I didn't see any way to identify these > memslots. I wrote some patches to add a new flag, KVM_MEM_NONCACHEABLE, > allowing userspace to point them out. That was the easy part (although > I didn't like that userspace developers would have to go around finding > all memory regions that needed to be flagged, and new devices would > likely not be flagged when developed on non-arm architectures, so we'd > always be chasing it...) However what really slowed/stopped me was > trying to figure out what to do with those identified memslots. > > My last idea, which had implementation issues (probably because I was > getting in over my head), was > > 1) introduce PAGE_S2_NORMAL_NC and use it when mapping the guest's pages > 2) flush the userspace pages and update all PTEs to be NC > > The reasoning was that, while we can't force a guest to use cacheable > memory, we can take advantage of the noncacheable precedence of the > architecture, forcing the memory accesses to be noncached by way of > S2 attributes. And of course userspace mappings also need to become NC > to finally have coherency. I think this is a sensible course of action, as long as you can identify a specific memblock on which to apply this. You may even not have to "repaint" the PTEs, but instead obtain a non-cacheable mapping from the kernel (at a different address). I'm more worried if we end-up having both cacheable and non-cacheable pages inside the same VMA (and Alex seems to point at USB having weird requirements around this). Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-24 14:04 issues with emulated PCI MMIO backed by host memory under KVM Ard Biesheuvel 2016-06-24 14:57 ` Andrew Jones @ 2016-06-24 18:16 ` Ard Biesheuvel 2016-06-25 7:15 ` Alexander Graf 2016-06-25 7:19 ` Alexander Graf 2016-06-27 9:16 ` Christoffer Dall 3 siblings, 1 reply; 32+ messages in thread From: Ard Biesheuvel @ 2016-06-24 18:16 UTC (permalink / raw) To: Christoffer Dall, Peter Maydell, Marc Zyngier, Andrew Jones, Laszlo Ersek, kvmarm, Alexander Graf Cc: Catalin Marinas On 24 June 2016 at 16:04, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: [...] > Note that this issue not only affects framebuffers on PCI cards, it > also affects emulated USB host controllers (perhaps Alex can remind us > which one exactly?) Actually, looking at the QEMU source code, I am not able to spot the USB hcd emulation code that backs a PCI MMIO BAR using host memory, and in fact, the only instance I *can* find is vga-pci.c @Alex: could you please explain which exact issue with USB emulation is suspected to be caused by this? @team-RH: are there any other examples beyond VGA PCI where this is a problem? Thanks, Ard. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-24 18:16 ` Ard Biesheuvel @ 2016-06-25 7:15 ` Alexander Graf 0 siblings, 0 replies; 32+ messages in thread From: Alexander Graf @ 2016-06-25 7:15 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm > On 24.06.2016 at 20:16, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > >> On 24 June 2016 at 16:04, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >> [...] >> Note that this issue not only affects framebuffers on PCI cards, it >> also affects emulated USB host controllers (perhaps Alex can remind us >> which one exactly?) > > Actually, looking at the QEMU source code, I am not able to spot the > USB hcd emulation code that backs a PCI MMIO BAR using host memory, > and in fact, the only instance I *can* find is vga-pci.c > > @Alex: could you please explain which exact issue with USB emulation > is suspected to be caused by this? IIRC Linux put the USB rings into guest memory and mapped them as NC inside the guest. So the host will see stale data from the cache. Alex ^ permalink raw reply [flat|nested] 32+ messages in thread

* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-24 14:04 issues with emulated PCI MMIO backed by host memory under KVM Ard Biesheuvel 2016-06-24 14:57 ` Andrew Jones 2016-06-24 18:16 ` Ard Biesheuvel @ 2016-06-25 7:19 ` Alexander Graf 2016-06-27 8:11 ` Marc Zyngier 2016-06-27 9:16 ` Christoffer Dall 3 siblings, 1 reply; 32+ messages in thread From: Alexander Graf @ 2016-06-25 7:19 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm > Am 24.06.2016 um 16:04 schrieb Ard Biesheuvel <ard.biesheuvel@linaro.org>: > > Hi all, > > This old subject came up again in a discussion related to PCIe support > for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO > regions as cacheable is preventing us from reusing a significant slice > of the PCIe support infrastructure, and so I'd like to bring this up > again, perhaps just to reiterate why we're simply out of luck. > > To refresh your memories, the issue is that on ARM, PCI MMIO regions > for emulated devices may be backed by memory that is mapped cacheable > by the host. Note that this has nothing to do with the device being > DMA coherent or not: in this case, we are dealing with regions that > are not memory from the POV of the guest, and it is reasonable for the > guest to assume that accesses to such a region are not visible to the > device before they hit the actual PCI MMIO window and are translated > into cycles on the PCI bus. That means that mapping such a region > cacheable is a strange thing to do, in fact, and it is unlikely that > patches implementing this against the generic PCI stack in Tianocore > will be accepted by the maintainers. > > Note that this issue not only affects framebuffers on PCI cards, it > also affects emulated USB host controllers (perhaps Alex can remind us > which one exactly?) and likely other emulated generic PCI devices as > well. 
> > Since the issue exists only for emulated PCI devices whose MMIO > regions are backed by host memory, is there any way we can already > distinguish such memslots from ordinary ones? If we can, is there > anything we could do to treat these specially? Perhaps something like > using read-only memslots so we can at least trap guest writes instead > of having main memory going out of sync with the caches unnoticed? I > am just brainstorming here ... The "easiest" first step would be to simply not map host memory into the guest when we're on arm. Unfortunately that would mean we trap on everything as mmio accesses, including user space access from Xorg for example. That in turn means we'd need to mmio emulate neon instructions and all other sorts of things that can trigger mmio exits without being emulated today. Also, even with that working and maybe even coalesced mmio implemented, I'd guess it'd still be too slow for real world usage... Alex ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-25 7:19 ` Alexander Graf @ 2016-06-27 8:11 ` Marc Zyngier 0 siblings, 0 replies; 32+ messages in thread From: Marc Zyngier @ 2016-06-27 8:11 UTC (permalink / raw) To: Alexander Graf, Ard Biesheuvel; +Cc: Catalin Marinas, Laszlo Ersek, kvmarm On 25/06/16 08:19, Alexander Graf wrote: > > >> Am 24.06.2016 um 16:04 schrieb Ard Biesheuvel <ard.biesheuvel@linaro.org>: >> >> Hi all, >> >> This old subject came up again in a discussion related to PCIe support >> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO >> regions as cacheable is preventing us from reusing a significant slice >> of the PCIe support infrastructure, and so I'd like to bring this up >> again, perhaps just to reiterate why we're simply out of luck. >> >> To refresh your memories, the issue is that on ARM, PCI MMIO regions >> for emulated devices may be backed by memory that is mapped cacheable >> by the host. Note that this has nothing to do with the device being >> DMA coherent or not: in this case, we are dealing with regions that >> are not memory from the POV of the guest, and it is reasonable for the >> guest to assume that accesses to such a region are not visible to the >> device before they hit the actual PCI MMIO window and are translated >> into cycles on the PCI bus. That means that mapping such a region >> cacheable is a strange thing to do, in fact, and it is unlikely that >> patches implementing this against the generic PCI stack in Tianocore >> will be accepted by the maintainers. >> >> Note that this issue not only affects framebuffers on PCI cards, it >> also affects emulated USB host controllers (perhaps Alex can remind us >> which one exactly?) and likely other emulated generic PCI devices as >> well. >> >> Since the issue exists only for emulated PCI devices whose MMIO >> regions are backed by host memory, is there any way we can already >> distinguish such memslots from ordinary ones? 
If we can, is there >> anything we could do to treat these specially? Perhaps something like >> using read-only memslots so we can at least trap guest writes instead >> of having main memory going out of sync with the caches unnoticed? I >> am just brainstorming here ... > > The "easiest" first step would be to simply not map host memory into > the guest when we're on arm. Unfortunately that would mean we trap on > everything as mmio accesses, including user space access from Xorg > for example. That in turn means we'd need to mmio emulate neon > instructions and all other sorts of things that can trigger mmio > exits without being emulated today. It is not possible to emulate these instructions (load/store multiple, whether they are GP or FP registers) other than with a "stop the world" approach (in order to close the race where you read the instruction from memory while another vcpu changes the page tables). > Also, even with that working and maybe even coalesced mmio > implemented, I'd guess it'd still be too slow for real world > usage... And probably even slower than you think. There is no way around using the architecture as it should be used. Either the guest is using cacheable memory, or userspace is using non-cacheable memory. Everything else is bound to fail one way or another. Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-24 14:04 issues with emulated PCI MMIO backed by host memory under KVM Ard Biesheuvel ` (2 preceding siblings ...) 2016-06-25 7:19 ` Alexander Graf @ 2016-06-27 9:16 ` Christoffer Dall 2016-06-27 9:47 ` Ard Biesheuvel 3 siblings, 1 reply; 32+ messages in thread From: Christoffer Dall @ 2016-06-27 9:16 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm Hi, I'm going to ask some stupid questions here... On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: > Hi all, > > This old subject came up again in a discussion related to PCIe support > for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO > regions as cacheable is preventing us from reusing a significant slice > of the PCIe support infrastructure, and so I'd like to bring this up > again, perhaps just to reiterate why we're simply out of luck. > > To refresh your memories, the issue is that on ARM, PCI MMIO regions > for emulated devices may be backed by memory that is mapped cacheable > by the host. Note that this has nothing to do with the device being > DMA coherent or not: in this case, we are dealing with regions that > are not memory from the POV of the guest, and it is reasonable for the > guest to assume that accesses to such a region are not visible to the > device before they hit the actual PCI MMIO window and are translated > into cycles on the PCI bus. For the sake of completeness, why is this reasonable? Is this how any real ARM system implementing PCI would actually work? > That means that mapping such a region > cacheable is a strange thing to do, in fact, and it is unlikely that > patches implementing this against the generic PCI stack in Tianocore > will be accepted by the maintainers. > > Note that this issue not only affects framebuffers on PCI cards, it > also affects emulated USB host controllers (perhaps Alex can remind us > which one exactly?) 
and likely other emulated generic PCI devices as > well. > > Since the issue exists only for emulated PCI devices whose MMIO > regions are backed by host memory, is there any way we can already > distinguish such memslots from ordinary ones? If we can, is there > anything we could do to treat these specially? Perhaps something like > using read-only memslots so we can at least trap guest writes instead > of having main memory going out of sync with the caches unnoticed? I > am just brainstorming here ... I think the only sensible solution is to make sure that the guest and emulation mappings use the same memory type, either cached or non-cached, and we 'simply' have to find the best way to implement this. As Drew suggested, forcing some S2 mappings to be non-cacheable is the one way. The other way is to use something like what you once wrote that rewrites stage-1 mappings to be cacheable; does that apply here? Do we have a clear picture of why we'd prefer one way over the other? > > In any case, it would be good to put this to bed one way or the other > (assuming it hasn't been put to bed already) > Agreed. Thanks for the mail! -Christoffer ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 9:16 ` Christoffer Dall @ 2016-06-27 9:47 ` Ard Biesheuvel 2016-06-27 10:34 ` Christoffer Dall 2016-06-27 13:15 ` Peter Maydell 0 siblings, 2 replies; 32+ messages in thread From: Ard Biesheuvel @ 2016-06-27 9:47 UTC (permalink / raw) To: Christoffer Dall; +Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@linaro.org> wrote: > Hi, > > I'm going to ask some stupid questions here... > > On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: >> Hi all, >> >> This old subject came up again in a discussion related to PCIe support >> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO >> regions as cacheable is preventing us from reusing a significant slice >> of the PCIe support infrastructure, and so I'd like to bring this up >> again, perhaps just to reiterate why we're simply out of luck. >> >> To refresh your memories, the issue is that on ARM, PCI MMIO regions >> for emulated devices may be backed by memory that is mapped cacheable >> by the host. Note that this has nothing to do with the device being >> DMA coherent or not: in this case, we are dealing with regions that >> are not memory from the POV of the guest, and it is reasonable for the >> guest to assume that accesses to such a region are not visible to the >> device before they hit the actual PCI MMIO window and are translated >> into cycles on the PCI bus. > > For the sake of completeness, why is this reasonable? > Because the whole point of accessing these regions is to communicate with the device. It is common to use write combining mappings for things like framebuffers to group writes before they hit the PCI bus, but any caching just makes it more difficult for the driver state and device state to remain synchronized. > Is this how any real ARM system implementing PCI would actually work? > Yes. 
>> That means that mapping such a region >> cacheable is a strange thing to do, in fact, and it is unlikely that >> patches implementing this against the generic PCI stack in Tianocore >> will be accepted by the maintainers. >> >> Note that this issue not only affects framebuffers on PCI cards, it >> also affects emulated USB host controllers (perhaps Alex can remind us >> which one exactly?) and likely other emulated generic PCI devices as >> well. >> >> Since the issue exists only for emulated PCI devices whose MMIO >> regions are backed by host memory, is there any way we can already >> distinguish such memslots from ordinary ones? If we can, is there >> anything we could do to treat these specially? Perhaps something like >> using read-only memslots so we can at least trap guest writes instead >> of having main memory going out of sync with the caches unnoticed? I >> am just brainstorming here ... > > I think the only sensible solution is to make sure that the guest and > emulation mappings use the same memory type, either cached or > non-cached, and we 'simply' have to find the best way to implement this. > > As Drew suggested, forcing some S2 mappings to be non-cacheable is the > one way. > > The other way is to use something like what you once wrote that rewrites > stage-1 mappings to be cacheable; does that apply here? > > Do we have a clear picture of why we'd prefer one way over the other? > So first of all, let me reiterate that I could only find a single instance in QEMU where a PCI MMIO region is backed by host memory, which is vga-pci.c. I wonder if there are any other occurrences, but if there aren't any, it makes much more sense to prohibit PCI BARs backed by host memory rather than spend a lot of effort working around it. If we do decide to fix this, the best way would be to use uncached attributes for the QEMU userland mapping, and force it uncached in the guest via a stage 2 override (as Drew suggests). 
The only problem I see here is that the host's kernel direct mapping has a cached alias that we need to get rid of. The MAIR hack is just that, a hack, since there are corner cases that cannot be handled (but please refer to the old thread for the details) As for the USB case, I can't really figure out what is going on here, but I am fairly certain it is a different issue. If this is related to DMA, I wonder if adding the 'dma-coherent' property to the PCIe root complex node fixes anything. -- Ard. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 9:47 ` Ard Biesheuvel @ 2016-06-27 10:34 ` Christoffer Dall 2016-06-27 12:30 ` Ard Biesheuvel ` (2 more replies) 2016-06-27 13:15 ` Peter Maydell 1 sibling, 3 replies; 32+ messages in thread From: Christoffer Dall @ 2016-06-27 10:34 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: > On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@linaro.org> wrote: > > Hi, > > > > I'm going to ask some stupid questions here... > > > > On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: > >> Hi all, > >> > >> This old subject came up again in a discussion related to PCIe support > >> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO > >> regions as cacheable is preventing us from reusing a significant slice > >> of the PCIe support infrastructure, and so I'd like to bring this up > >> again, perhaps just to reiterate why we're simply out of luck. > >> > >> To refresh your memories, the issue is that on ARM, PCI MMIO regions > >> for emulated devices may be backed by memory that is mapped cacheable > >> by the host. Note that this has nothing to do with the device being > >> DMA coherent or not: in this case, we are dealing with regions that > >> are not memory from the POV of the guest, and it is reasonable for the > >> guest to assume that accesses to such a region are not visible to the > >> device before they hit the actual PCI MMIO window and are translated > >> into cycles on the PCI bus. > > > > For the sake of completeness, why is this reasonable? > > > > Because the whole point of accessing these regions is to communicate > with the device. 
It is common to use write combining mappings for > things like framebuffers to group writes before they hit the PCI bus, > but any caching just makes it more difficult for the driver state and > device state to remain synchronized. > > > Is this how any real ARM system implementing PCI would actually work? > > > > Yes. > > >> That means that mapping such a region > >> cacheable is a strange thing to do, in fact, and it is unlikely that > >> patches implementing this against the generic PCI stack in Tianocore > >> will be accepted by the maintainers. > >> > >> Note that this issue not only affects framebuffers on PCI cards, it > >> also affects emulated USB host controllers (perhaps Alex can remind us > >> which one exactly?) and likely other emulated generic PCI devices as > >> well. > >> > >> Since the issue exists only for emulated PCI devices whose MMIO > >> regions are backed by host memory, is there any way we can already > >> distinguish such memslots from ordinary ones? If we can, is there > >> anything we could do to treat these specially? Perhaps something like > >> using read-only memslots so we can at least trap guest writes instead > >> of having main memory going out of sync with the caches unnoticed? I > >> am just brainstorming here ... > > > > I think the only sensible solution is to make sure that the guest and > > emulation mappings use the same memory type, either cached or > > non-cached, and we 'simply' have to find the best way to implement this. > > > > As Drew suggested, forcing some S2 mappings to be non-cacheable is the > > one way. > > > > The other way is to use something like what you once wrote that rewrites > > stage-1 mappings to be cacheable, does that apply here ? > > > > Do we have a clear picture of why we'd prefer one way over the other? > > > > So first of all, let me reiterate that I could only find a single > instance in QEMU where a PCI MMIO region is backed by host memory, > which is vga-pci.c. 
I wonder of there are any other occurrences, but > if there aren't any, it makes much more sense to prohibit PCI BARs > backed by host memory rather than spend a lot of effort working around > it. Right, ok. So Marc's point during his KVM Forum talk was basically, don't use the legacy VGA adapter on ARM and use virtio graphics, right? What is the proposed solution for someone shipping an ARM server and wishing to provide a graphical output for that server? It feels strange to work around supporting PCI VGA adapters in ARM VMs, if that's not a supported real hardware case. However, I don't see what would prevent someone from plugging a VGA adapter into the PCI slot on an ARM server, and people selling ARM servers probably want this to happen, I'm guessing. > > If we do decide to fix this, the best way would be to use uncached > attributes for the QEMU userland mapping, and force it uncached in the > guest via a stage 2 override (as Drews suggests). The only problem I > see here is that the host's kernel direct mapping has a cached alias > that we need to get rid of. Do we have a way to accomplish that? Will we run into a bunch of other problems if we begin punching holes in the direct mapping for regular RAM? Thanks, -Christoffer ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 10:34 ` Christoffer Dall @ 2016-06-27 12:30 ` Ard Biesheuvel 2016-06-27 13:35 ` Christoffer Dall 2016-06-27 14:24 ` Alexander Graf 2016-06-28 10:55 ` Laszlo Ersek 2 siblings, 1 reply; 32+ messages in thread From: Ard Biesheuvel @ 2016-06-27 12:30 UTC (permalink / raw) To: Christoffer Dall; +Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm On 27 June 2016 at 12:34, Christoffer Dall <christoffer.dall@linaro.org> wrote: > On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: >> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@linaro.org> wrote: >> > Hi, >> > >> > I'm going to ask some stupid questions here... >> > >> > On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: >> >> Hi all, >> >> >> >> This old subject came up again in a discussion related to PCIe support >> >> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO >> >> regions as cacheable is preventing us from reusing a significant slice >> >> of the PCIe support infrastructure, and so I'd like to bring this up >> >> again, perhaps just to reiterate why we're simply out of luck. >> >> >> >> To refresh your memories, the issue is that on ARM, PCI MMIO regions >> >> for emulated devices may be backed by memory that is mapped cacheable >> >> by the host. Note that this has nothing to do with the device being >> >> DMA coherent or not: in this case, we are dealing with regions that >> >> are not memory from the POV of the guest, and it is reasonable for the >> >> guest to assume that accesses to such a region are not visible to the >> >> device before they hit the actual PCI MMIO window and are translated >> >> into cycles on the PCI bus. >> > >> > For the sake of completeness, why is this reasonable? >> > >> >> Because the whole point of accessing these regions is to communicate >> with the device. 
It is common to use write combining mappings for >> things like framebuffers to group writes before they hit the PCI bus, >> but any caching just makes it more difficult for the driver state and >> device state to remain synchronized. >> >> > Is this how any real ARM system implementing PCI would actually work? >> > >> >> Yes. >> >> >> That means that mapping such a region >> >> cacheable is a strange thing to do, in fact, and it is unlikely that >> >> patches implementing this against the generic PCI stack in Tianocore >> >> will be accepted by the maintainers. >> >> >> >> Note that this issue not only affects framebuffers on PCI cards, it >> >> also affects emulated USB host controllers (perhaps Alex can remind us >> >> which one exactly?) and likely other emulated generic PCI devices as >> >> well. >> >> >> >> Since the issue exists only for emulated PCI devices whose MMIO >> >> regions are backed by host memory, is there any way we can already >> >> distinguish such memslots from ordinary ones? If we can, is there >> >> anything we could do to treat these specially? Perhaps something like >> >> using read-only memslots so we can at least trap guest writes instead >> >> of having main memory going out of sync with the caches unnoticed? I >> >> am just brainstorming here ... >> > >> > I think the only sensible solution is to make sure that the guest and >> > emulation mappings use the same memory type, either cached or >> > non-cached, and we 'simply' have to find the best way to implement this. >> > >> > As Drew suggested, forcing some S2 mappings to be non-cacheable is the >> > one way. >> > >> > The other way is to use something like what you once wrote that rewrites >> > stage-1 mappings to be cacheable, does that apply here ? >> > >> > Do we have a clear picture of why we'd prefer one way over the other? 
>> > >> >> So first of all, let me reiterate that I could only find a single >> instance in QEMU where a PCI MMIO region is backed by host memory, >> which is vga-pci.c. I wonder of there are any other occurrences, but >> if there aren't any, it makes much more sense to prohibit PCI BARs >> backed by host memory rather than spend a lot of effort working around >> it. > > Right, ok. So Marc's point during his KVM Forum talk was basically, > don't use the legacy VGA adapter on ARM and use virtio graphics, right? > Yes. But nothing is preventing you currently from using that, and I think we should prefer crappy performance but correct operation over the current situation. So in general, we should either disallow PCI BARs backed by host memory, or emulate them, but never back them by a RAM memslot when running under ARM/KVM. > What is the proposed solution for someone shipping an ARM server and > wishing to provide a graphical output for that server? > The problem does not exist on bare metal. It is an implementation detail of KVM on ARM that guest PCI BAR mappings are incoherent with the view of the emulator in QEMU. > It feels strange to work around supporting PCI VGA adapters in ARM VMs, > if that's not a supported real hardware case. However, I don't see what > would prevent someone from plugging a VGA adapter into the PCI slot on > an ARM server, and people selling ARM servers probably want this to > happen, I'm guessing. > As I said, the problem does not exist on bare metal. >> >> If we do decide to fix this, the best way would be to use uncached >> attributes for the QEMU userland mapping, and force it uncached in the >> guest via a stage 2 override (as Drews suggests). The only problem I >> see here is that the host's kernel direct mapping has a cached alias >> that we need to get rid of. > > Do we have a way to accomplish that? > > Will we run into a bunch of other problems if we begin punching holes in > the direct mapping for regular RAM? 
> I think the policy up until now has been not to remap regions in the kernel direct mapping for the purposes of DMA, and I think by the same reasoning, it is not preferable for KVM either. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 12:30 ` Ard Biesheuvel @ 2016-06-27 13:35 ` Christoffer Dall 2016-06-27 13:57 ` Ard Biesheuvel 0 siblings, 1 reply; 32+ messages in thread From: Christoffer Dall @ 2016-06-27 13:35 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm On Mon, Jun 27, 2016 at 02:30:46PM +0200, Ard Biesheuvel wrote: > On 27 June 2016 at 12:34, Christoffer Dall <christoffer.dall@linaro.org> wrote: > > On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: > >> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@linaro.org> wrote: > >> > Hi, > >> > > >> > I'm going to ask some stupid questions here... > >> > > >> > On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: > >> >> Hi all, > >> >> > >> >> This old subject came up again in a discussion related to PCIe support > >> >> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO > >> >> regions as cacheable is preventing us from reusing a significant slice > >> >> of the PCIe support infrastructure, and so I'd like to bring this up > >> >> again, perhaps just to reiterate why we're simply out of luck. > >> >> > >> >> To refresh your memories, the issue is that on ARM, PCI MMIO regions > >> >> for emulated devices may be backed by memory that is mapped cacheable > >> >> by the host. Note that this has nothing to do with the device being > >> >> DMA coherent or not: in this case, we are dealing with regions that > >> >> are not memory from the POV of the guest, and it is reasonable for the > >> >> guest to assume that accesses to such a region are not visible to the > >> >> device before they hit the actual PCI MMIO window and are translated > >> >> into cycles on the PCI bus. > >> > > >> > For the sake of completeness, why is this reasonable? > >> > > >> > >> Because the whole point of accessing these regions is to communicate > >> with the device. 
It is common to use write combining mappings for > >> things like framebuffers to group writes before they hit the PCI bus, > >> but any caching just makes it more difficult for the driver state and > >> device state to remain synchronized. > >> > >> > Is this how any real ARM system implementing PCI would actually work? > >> > > >> > >> Yes. > >> > >> >> That means that mapping such a region > >> >> cacheable is a strange thing to do, in fact, and it is unlikely that > >> >> patches implementing this against the generic PCI stack in Tianocore > >> >> will be accepted by the maintainers. > >> >> > >> >> Note that this issue not only affects framebuffers on PCI cards, it > >> >> also affects emulated USB host controllers (perhaps Alex can remind us > >> >> which one exactly?) and likely other emulated generic PCI devices as > >> >> well. > >> >> > >> >> Since the issue exists only for emulated PCI devices whose MMIO > >> >> regions are backed by host memory, is there any way we can already > >> >> distinguish such memslots from ordinary ones? If we can, is there > >> >> anything we could do to treat these specially? Perhaps something like > >> >> using read-only memslots so we can at least trap guest writes instead > >> >> of having main memory going out of sync with the caches unnoticed? I > >> >> am just brainstorming here ... > >> > > >> > I think the only sensible solution is to make sure that the guest and > >> > emulation mappings use the same memory type, either cached or > >> > non-cached, and we 'simply' have to find the best way to implement this. > >> > > >> > As Drew suggested, forcing some S2 mappings to be non-cacheable is the > >> > one way. > >> > > >> > The other way is to use something like what you once wrote that rewrites > >> > stage-1 mappings to be cacheable, does that apply here ? > >> > > >> > Do we have a clear picture of why we'd prefer one way over the other? 
> >> > > >> > >> So first of all, let me reiterate that I could only find a single > >> instance in QEMU where a PCI MMIO region is backed by host memory, > >> which is vga-pci.c. I wonder of there are any other occurrences, but > >> if there aren't any, it makes much more sense to prohibit PCI BARs > >> backed by host memory rather than spend a lot of effort working around > >> it. > > > > Right, ok. So Marc's point during his KVM Forum talk was basically, > > don't use the legacy VGA adapter on ARM and use virtio graphics, right? > > > > Yes. But nothing is preventing you currently from using that, and I > think we should prefer crappy performance but correct operation over > the current situation. So in general, we should either disallow PCI > BARs backed by host memory, or emulate them, but never back them by a > RAM memslot when running under ARM/KVM. agreed, I just think that emulating accesses by trapping them is not just slow, it's not really possible in practice and even if it is, it's probably *unusably* slow. > > > What is the proposed solution for someone shipping an ARM server and > > wishing to provide a graphical output for that server? > > > > The problem does not exist on bare metal. It is an implementation > detail of KVM on ARM that guest PCI BAR mappings are incoherent with > the view of the emulator in QEMU. > > > It feels strange to work around supporting PCI VGA adapters in ARM VMs, > > if that's not a supported real hardware case. However, I don't see what > > would prevent someone from plugging a VGA adapter into the PCI slot on > > an ARM server, and people selling ARM servers probably want this to > > happen, I'm guessing. > > > > As I said, the problem does not exist on bare metal. > > >> > >> If we do decide to fix this, the best way would be to use uncached > >> attributes for the QEMU userland mapping, and force it uncached in the > >> guest via a stage 2 override (as Drews suggests). 
The only problem I > >> see here is that the host's kernel direct mapping has a cached alias > >> that we need to get rid of. > > > > Do we have a way to accomplish that? > > > > Will we run into a bunch of other problems if we begin punching holes in > > the direct mapping for regular RAM? > > > > I think the policy up until now has been not to remap regions in the > kernel direct mapping for the purposes of DMA, and I think by the same > reasoning, it is not preferable for KVM either I guess the difference is that from the (host) kernel's point of view this is not DMA memory, but just regular RAM. I just don't know enough about the kernel's VM mappings to know what's involved here, but we should find out somehow... Thanks, -Christoffer ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 13:35 ` Christoffer Dall @ 2016-06-27 13:57 ` Ard Biesheuvel 2016-06-27 14:29 ` Alexander Graf 2016-06-28 10:04 ` Christoffer Dall 0 siblings, 2 replies; 32+ messages in thread From: Ard Biesheuvel @ 2016-06-27 13:57 UTC (permalink / raw) To: Christoffer Dall; +Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm On 27 June 2016 at 15:35, Christoffer Dall <christoffer.dall@linaro.org> wrote: > On Mon, Jun 27, 2016 at 02:30:46PM +0200, Ard Biesheuvel wrote: >> On 27 June 2016 at 12:34, Christoffer Dall <christoffer.dall@linaro.org> wrote: >> > On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: >> >> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@linaro.org> wrote: >> >> > Hi, >> >> > >> >> > I'm going to ask some stupid questions here... >> >> > >> >> > On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: >> >> >> Hi all, >> >> >> >> >> >> This old subject came up again in a discussion related to PCIe support >> >> >> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO >> >> >> regions as cacheable is preventing us from reusing a significant slice >> >> >> of the PCIe support infrastructure, and so I'd like to bring this up >> >> >> again, perhaps just to reiterate why we're simply out of luck. >> >> >> >> >> >> To refresh your memories, the issue is that on ARM, PCI MMIO regions >> >> >> for emulated devices may be backed by memory that is mapped cacheable >> >> >> by the host. Note that this has nothing to do with the device being >> >> >> DMA coherent or not: in this case, we are dealing with regions that >> >> >> are not memory from the POV of the guest, and it is reasonable for the >> >> >> guest to assume that accesses to such a region are not visible to the >> >> >> device before they hit the actual PCI MMIO window and are translated >> >> >> into cycles on the PCI bus. 
>> >> > >> >> > For the sake of completeness, why is this reasonable? >> >> > >> >> >> >> Because the whole point of accessing these regions is to communicate >> >> with the device. It is common to use write combining mappings for >> >> things like framebuffers to group writes before they hit the PCI bus, >> >> but any caching just makes it more difficult for the driver state and >> >> device state to remain synchronized. >> >> >> >> > Is this how any real ARM system implementing PCI would actually work? >> >> > >> >> >> >> Yes. >> >> >> >> >> That means that mapping such a region >> >> >> cacheable is a strange thing to do, in fact, and it is unlikely that >> >> >> patches implementing this against the generic PCI stack in Tianocore >> >> >> will be accepted by the maintainers. >> >> >> >> >> >> Note that this issue not only affects framebuffers on PCI cards, it >> >> >> also affects emulated USB host controllers (perhaps Alex can remind us >> >> >> which one exactly?) and likely other emulated generic PCI devices as >> >> >> well. >> >> >> >> >> >> Since the issue exists only for emulated PCI devices whose MMIO >> >> >> regions are backed by host memory, is there any way we can already >> >> >> distinguish such memslots from ordinary ones? If we can, is there >> >> >> anything we could do to treat these specially? Perhaps something like >> >> >> using read-only memslots so we can at least trap guest writes instead >> >> >> of having main memory going out of sync with the caches unnoticed? I >> >> >> am just brainstorming here ... >> >> > >> >> > I think the only sensible solution is to make sure that the guest and >> >> > emulation mappings use the same memory type, either cached or >> >> > non-cached, and we 'simply' have to find the best way to implement this. >> >> > >> >> > As Drew suggested, forcing some S2 mappings to be non-cacheable is the >> >> > one way. 
>> >> > >> >> > The other way is to use something like what you once wrote that rewrites >> >> > stage-1 mappings to be cacheable, does that apply here ? >> >> > >> >> > Do we have a clear picture of why we'd prefer one way over the other? >> >> > >> >> >> >> So first of all, let me reiterate that I could only find a single >> >> instance in QEMU where a PCI MMIO region is backed by host memory, >> >> which is vga-pci.c. I wonder of there are any other occurrences, but >> >> if there aren't any, it makes much more sense to prohibit PCI BARs >> >> backed by host memory rather than spend a lot of effort working around >> >> it. >> > >> > Right, ok. So Marc's point during his KVM Forum talk was basically, >> > don't use the legacy VGA adapter on ARM and use virtio graphics, right? >> > >> >> Yes. But nothing is preventing you currently from using that, and I >> think we should prefer crappy performance but correct operation over >> the current situation. So in general, we should either disallow PCI >> BARs backed by host memory, or emulate them, but never back them by a >> RAM memslot when running under ARM/KVM. > > agreed, I just think that emulating accesses by trapping them is not > just slow, it's not really possible in practice and even if it is, it's > probably *unusably* slow. > Well, it would probably involve a lot of effort to implement emulation of instructions with multiple output registers, such as ldp/stp and register writeback. And indeed, trapping on each store instruction to the framebuffer is going to be sloooooowwwww. So let's disregard that option for now ... >> >> > What is the proposed solution for someone shipping an ARM server and >> > wishing to provide a graphical output for that server? >> > >> >> The problem does not exist on bare metal. It is an implementation >> detail of KVM on ARM that guest PCI BAR mappings are incoherent with >> the view of the emulator in QEMU. 
>> >> > It feels strange to work around supporting PCI VGA adapters in ARM VMs, >> > if that's not a supported real hardware case. However, I don't see what >> > would prevent someone from plugging a VGA adapter into the PCI slot on >> > an ARM server, and people selling ARM servers probably want this to >> > happen, I'm guessing. >> > >> >> As I said, the problem does not exist on bare metal. >> >> >> >> >> If we do decide to fix this, the best way would be to use uncached >> >> attributes for the QEMU userland mapping, and force it uncached in the >> >> guest via a stage 2 override (as Drews suggests). The only problem I >> >> see here is that the host's kernel direct mapping has a cached alias >> >> that we need to get rid of. >> > >> > Do we have a way to accomplish that? >> > >> > Will we run into a bunch of other problems if we begin punching holes in >> > the direct mapping for regular RAM? >> > >> >> I think the policy up until now has been not to remap regions in the >> kernel direct mapping for the purposes of DMA, and I think by the same >> reasoning, it is not preferable for KVM either > > I guess the difference is that from the (host) kernel's point of view > this is not DMA memory, but just regular RAM. I just don't know enough > about the kernel's VM mappings to know what's involved here, but we > should find out somehow... > Whether it is DMA memory or not does not make a difference. The point is simply that arm64 maps all RAM owned by the kernel as cacheable, and remapping arbitrary ranges with different attributes is problematic, since it is also likely to involve splitting of regions, which is cumbersome with a mapping that is always live. So instead, we'd have to reserve some system memory early on and remove it from the linear mapping, the complexity of which is more than we are probably prepared to put up with. 
So if vga-pci.c is the only problematic device, for which a reasonable alternative exists (virtio-gpu), I think the only feasible solution is to educate QEMU not to allow RAM memslots being exposed via PCI BARs when running under KVM/ARM. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 13:57 ` Ard Biesheuvel @ 2016-06-27 14:29 ` Alexander Graf 2016-06-28 11:02 ` Laszlo Ersek 2016-06-28 10:04 ` Christoffer Dall 1 sibling, 1 reply; 32+ messages in thread From: Alexander Graf @ 2016-06-27 14:29 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm > Am 27.06.2016 um 15:57 schrieb Ard Biesheuvel <ard.biesheuvel@linaro.org>: > >> On 27 June 2016 at 15:35, Christoffer Dall <christoffer.dall@linaro.org> wrote: >>> On Mon, Jun 27, 2016 at 02:30:46PM +0200, Ard Biesheuvel wrote: >>>> On 27 June 2016 at 12:34, Christoffer Dall <christoffer.dall@linaro.org> wrote: >>>>> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: >>>>>> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@linaro.org> wrote: >>>>>> Hi, >>>>>> >>>>>> I'm going to ask some stupid questions here... >>>>>> >>>>>>> On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> This old subject came up again in a discussion related to PCIe support >>>>>>> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO >>>>>>> regions as cacheable is preventing us from reusing a significant slice >>>>>>> of the PCIe support infrastructure, and so I'd like to bring this up >>>>>>> again, perhaps just to reiterate why we're simply out of luck. >>>>>>> >>>>>>> To refresh your memories, the issue is that on ARM, PCI MMIO regions >>>>>>> for emulated devices may be backed by memory that is mapped cacheable >>>>>>> by the host. 
Note that this has nothing to do with the device being >>>>>>> DMA coherent or not: in this case, we are dealing with regions that >>>>>>> are not memory from the POV of the guest, and it is reasonable for the >>>>>>> guest to assume that accesses to such a region are not visible to the >>>>>>> device before they hit the actual PCI MMIO window and are translated >>>>>>> into cycles on the PCI bus. >>>>>> >>>>>> For the sake of completeness, why is this reasonable? >>>>> >>>>> Because the whole point of accessing these regions is to communicate >>>>> with the device. It is common to use write combining mappings for >>>>> things like framebuffers to group writes before they hit the PCI bus, >>>>> but any caching just makes it more difficult for the driver state and >>>>> device state to remain synchronized. >>>>> >>>>>> Is this how any real ARM system implementing PCI would actually work? >>>>> >>>>> Yes. >>>>> >>>>>>> That means that mapping such a region >>>>>>> cacheable is a strange thing to do, in fact, and it is unlikely that >>>>>>> patches implementing this against the generic PCI stack in Tianocore >>>>>>> will be accepted by the maintainers. >>>>>>> >>>>>>> Note that this issue not only affects framebuffers on PCI cards, it >>>>>>> also affects emulated USB host controllers (perhaps Alex can remind us >>>>>>> which one exactly?) and likely other emulated generic PCI devices as >>>>>>> well. >>>>>>> >>>>>>> Since the issue exists only for emulated PCI devices whose MMIO >>>>>>> regions are backed by host memory, is there any way we can already >>>>>>> distinguish such memslots from ordinary ones? If we can, is there >>>>>>> anything we could do to treat these specially? Perhaps something like >>>>>>> using read-only memslots so we can at least trap guest writes instead >>>>>>> of having main memory going out of sync with the caches unnoticed? I >>>>>>> am just brainstorming here ... 
>>>>>> >>>>>> I think the only sensible solution is to make sure that the guest and >>>>>> emulation mappings use the same memory type, either cached or >>>>>> non-cached, and we 'simply' have to find the best way to implement this. >>>>>> >>>>>> As Drew suggested, forcing some S2 mappings to be non-cacheable is the >>>>>> one way. >>>>>> >>>>>> The other way is to use something like what you once wrote that rewrites >>>>>> stage-1 mappings to be cacheable, does that apply here ? >>>>>> >>>>>> Do we have a clear picture of why we'd prefer one way over the other? >>>>> >>>>> So first of all, let me reiterate that I could only find a single >>>>> instance in QEMU where a PCI MMIO region is backed by host memory, >>>>> which is vga-pci.c. I wonder of there are any other occurrences, but >>>>> if there aren't any, it makes much more sense to prohibit PCI BARs >>>>> backed by host memory rather than spend a lot of effort working around >>>>> it. >>>> >>>> Right, ok. So Marc's point during his KVM Forum talk was basically, >>>> don't use the legacy VGA adapter on ARM and use virtio graphics, right? >>> >>> Yes. But nothing is preventing you currently from using that, and I >>> think we should prefer crappy performance but correct operation over >>> the current situation. So in general, we should either disallow PCI >>> BARs backed by host memory, or emulate them, but never back them by a >>> RAM memslot when running under ARM/KVM. >> >> agreed, I just think that emulating accesses by trapping them is not >> just slow, it's not really possible in practice and even if it is, it's >> probably *unusably* slow. > > Well, it would probably involve a lot of effort to implement emulation > of instructions with multiple output registers, such as ldp/stp and > register writeback. And indeed, trapping on each store instruction to > the framebuffer is going to be sloooooowwwww. > > So let's disregard that option for now ... 
> >>> >>>> What is the proposed solution for someone shipping an ARM server and >>>> wishing to provide a graphical output for that server? >>> >>> The problem does not exist on bare metal. It is an implementation >>> detail of KVM on ARM that guest PCI BAR mappings are incoherent with >>> the view of the emulator in QEMU. >>> >>>> It feels strange to work around supporting PCI VGA adapters in ARM VMs, >>>> if that's not a supported real hardware case. However, I don't see what >>>> would prevent someone from plugging a VGA adapter into the PCI slot on >>>> an ARM server, and people selling ARM servers probably want this to >>>> happen, I'm guessing. >>> >>> As I said, the problem does not exist on bare metal. >>> >>>>> >>>>> If we do decide to fix this, the best way would be to use uncached >>>>> attributes for the QEMU userland mapping, and force it uncached in the >>>>> guest via a stage 2 override (as Drews suggests). The only problem I >>>>> see here is that the host's kernel direct mapping has a cached alias >>>>> that we need to get rid of. >>>> >>>> Do we have a way to accomplish that? >>>> >>>> Will we run into a bunch of other problems if we begin punching holes in >>>> the direct mapping for regular RAM? >>> >>> I think the policy up until now has been not to remap regions in the >>> kernel direct mapping for the purposes of DMA, and I think by the same >>> reasoning, it is not preferable for KVM either >> >> I guess the difference is that from the (host) kernel's point of view >> this is not DMA memory, but just regular RAM. I just don't know enough >> about the kernel's VM mappings to know what's involved here, but we >> should find out somehow... > > Whether it is DMA memory or not does not make a difference. 
The point > is simply that arm64 maps all RAM owned by the kernel as cacheable, > and remapping arbitrary ranges with different attributes is > problematic, since it is also likely to involve splitting of regions, > which is cumbersome with a mapping that is always live. > > So instead, we'd have to reserve some system memory early on and > remove it from the linear mapping, the complexity of which is more > than we are probably prepared to put up with. > > So if vga-pci.c is the only problematic device, for which a reasonable > alternative exists (virtio-gpu), I think the only feasible solution is > to educate QEMU not to allow RAM memslots being exposed via PCI BARs > when running under KVM/ARM. That's ok, if there is a viable alternative. So if we had working virtio-gpu support in OVMF, we could just disable the legacy vga device with kvm on arm altogether - it'd either crash your guest (unhandled opcode in mmio emulation) or give you broken graphics. But first, someone would need to sit down and make virtio-gpu work in OVMF. Alex ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 14:29 ` Alexander Graf @ 2016-06-28 11:02 ` Laszlo Ersek 0 siblings, 0 replies; 32+ messages in thread From: Laszlo Ersek @ 2016-06-28 11:02 UTC (permalink / raw) To: Alexander Graf, Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, kvmarm On 06/27/16 16:29, Alexander Graf wrote: > > >> Am 27.06.2016 um 15:57 schrieb Ard Biesheuvel <ard.biesheuvel@linaro.org>: >> >>> On 27 June 2016 at 15:35, Christoffer Dall <christoffer.dall@linaro.org> wrote: >>>> On Mon, Jun 27, 2016 at 02:30:46PM +0200, Ard Biesheuvel wrote: >>>>> On 27 June 2016 at 12:34, Christoffer Dall <christoffer.dall@linaro.org> wrote: >>>>>> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: >>>>>>> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@linaro.org> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I'm going to ask some stupid questions here... >>>>>>> >>>>>>>> On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> This old subject came up again in a discussion related to PCIe support >>>>>>>> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO >>>>>>>> regions as cacheable is preventing us from reusing a significant slice >>>>>>>> of the PCIe support infrastructure, and so I'd like to bring this up >>>>>>>> again, perhaps just to reiterate why we're simply out of luck. >>>>>>>> >>>>>>>> To refresh your memories, the issue is that on ARM, PCI MMIO regions >>>>>>>> for emulated devices may be backed by memory that is mapped cacheable >>>>>>>> by the host. 
Note that this has nothing to do with the device being >>>>>>>> DMA coherent or not: in this case, we are dealing with regions that >>>>>>>> are not memory from the POV of the guest, and it is reasonable for the >>>>>>>> guest to assume that accesses to such a region are not visible to the >>>>>>>> device before they hit the actual PCI MMIO window and are translated >>>>>>>> into cycles on the PCI bus. >>>>>>> >>>>>>> For the sake of completeness, why is this reasonable? >>>>>> >>>>>> Because the whole point of accessing these regions is to communicate >>>>>> with the device. It is common to use write combining mappings for >>>>>> things like framebuffers to group writes before they hit the PCI bus, >>>>>> but any caching just makes it more difficult for the driver state and >>>>>> device state to remain synchronized. >>>>>> >>>>>>> Is this how any real ARM system implementing PCI would actually work? >>>>>> >>>>>> Yes. >>>>>> >>>>>>>> That means that mapping such a region >>>>>>>> cacheable is a strange thing to do, in fact, and it is unlikely that >>>>>>>> patches implementing this against the generic PCI stack in Tianocore >>>>>>>> will be accepted by the maintainers. >>>>>>>> >>>>>>>> Note that this issue not only affects framebuffers on PCI cards, it >>>>>>>> also affects emulated USB host controllers (perhaps Alex can remind us >>>>>>>> which one exactly?) and likely other emulated generic PCI devices as >>>>>>>> well. >>>>>>>> >>>>>>>> Since the issue exists only for emulated PCI devices whose MMIO >>>>>>>> regions are backed by host memory, is there any way we can already >>>>>>>> distinguish such memslots from ordinary ones? If we can, is there >>>>>>>> anything we could do to treat these specially? Perhaps something like >>>>>>>> using read-only memslots so we can at least trap guest writes instead >>>>>>>> of having main memory going out of sync with the caches unnoticed? I >>>>>>>> am just brainstorming here ... 
>>>>>>> >>>>>>> I think the only sensible solution is to make sure that the guest and >>>>>>> emulation mappings use the same memory type, either cached or >>>>>>> non-cached, and we 'simply' have to find the best way to implement this. >>>>>>> >>>>>>> As Drew suggested, forcing some S2 mappings to be non-cacheable is the >>>>>>> one way. >>>>>>> >>>>>>> The other way is to use something like what you once wrote that rewrites >>>>>>> stage-1 mappings to be cacheable, does that apply here ? >>>>>>> >>>>>>> Do we have a clear picture of why we'd prefer one way over the other? >>>>>> >>>>>> So first of all, let me reiterate that I could only find a single >>>>>> instance in QEMU where a PCI MMIO region is backed by host memory, >>>>>> which is vga-pci.c. I wonder of there are any other occurrences, but >>>>>> if there aren't any, it makes much more sense to prohibit PCI BARs >>>>>> backed by host memory rather than spend a lot of effort working around >>>>>> it. >>>>> >>>>> Right, ok. So Marc's point during his KVM Forum talk was basically, >>>>> don't use the legacy VGA adapter on ARM and use virtio graphics, right? >>>> >>>> Yes. But nothing is preventing you currently from using that, and I >>>> think we should prefer crappy performance but correct operation over >>>> the current situation. So in general, we should either disallow PCI >>>> BARs backed by host memory, or emulate them, but never back them by a >>>> RAM memslot when running under ARM/KVM. >>> >>> agreed, I just think that emulating accesses by trapping them is not >>> just slow, it's not really possible in practice and even if it is, it's >>> probably *unusably* slow. >> >> Well, it would probably involve a lot of effort to implement emulation >> of instructions with multiple output registers, such as ldp/stp and >> register writeback. And indeed, trapping on each store instruction to >> the framebuffer is going to be sloooooowwwww. >> >> So let's disregard that option for now ... 
>> >>>> >>>>> What is the proposed solution for someone shipping an ARM server and >>>>> wishing to provide a graphical output for that server? >>>> >>>> The problem does not exist on bare metal. It is an implementation >>>> detail of KVM on ARM that guest PCI BAR mappings are incoherent with >>>> the view of the emulator in QEMU. >>>> >>>>> It feels strange to work around supporting PCI VGA adapters in ARM VMs, >>>>> if that's not a supported real hardware case. However, I don't see what >>>>> would prevent someone from plugging a VGA adapter into the PCI slot on >>>>> an ARM server, and people selling ARM servers probably want this to >>>>> happen, I'm guessing. >>>> >>>> As I said, the problem does not exist on bare metal. >>>> >>>>>> >>>>>> If we do decide to fix this, the best way would be to use uncached >>>>>> attributes for the QEMU userland mapping, and force it uncached in the >>>>>> guest via a stage 2 override (as Drew suggests). The only problem I >>>>>> see here is that the host's kernel direct mapping has a cached alias >>>>>> that we need to get rid of. >>>>> >>>>> Do we have a way to accomplish that? >>>>> >>>>> Will we run into a bunch of other problems if we begin punching holes in >>>>> the direct mapping for regular RAM? >>>> >>>> I think the policy up until now has been not to remap regions in the >>>> kernel direct mapping for the purposes of DMA, and I think by the same >>>> reasoning, it is not preferable for KVM either >>> >>> I guess the difference is that from the (host) kernel's point of view >>> this is not DMA memory, but just regular RAM. I just don't know enough >>> about the kernel's VM mappings to know what's involved here, but we >>> should find out somehow... >> >> Whether it is DMA memory or not does not make a difference. 
The point >> is simply that arm64 maps all RAM owned by the kernel as cacheable, >> and remapping arbitrary ranges with different attributes is >> problematic, since it is also likely to involve splitting of regions, >> which is cumbersome with a mapping that is always live. >> >> So instead, we'd have to reserve some system memory early on and >> remove it from the linear mapping, the complexity of which is more >> than we are probably prepared to put up with. >> >> So if vga-pci.c is the only problematic device, for which a reasonable >> alternative exists (virtio-gpu), I think the only feasible solution is >> to educate QEMU not to allow RAM memslots being exposed via PCI BARs >> when running under KVM/ARM. > > That's ok, if there is a viable alternative. So if we had working virtio-gpu support in OVMF, we could just disable the legacy vga device with kvm on arm altogether - it'd either crash your guest (unhandled opcode in mmio emulation) or give you broken graphics. > > But first, someone would need to sit down and make virtio-gpu work in OVMF. I've offered to (attempt to) implement a GOP driver for virtio-gpu, to be used by OvmfPkg and ArmVirtPkg, once the virtio-gpu bits become part of the official virtio specification. However, as I mentioned elsewhere in this thread, a GOP driver for virtio-gpu could provide the Blt() kind of display output *only*. Virtio-gpu (the device model) lacks a linear framebuffer by design, hence no GOP can expose it. The GOP can only offer the Blt() member function, which would internally turn the block transfer requests into virtio-gpu requests. Unfortunately, offering Blt() *only* is not good enough. Some UEFI boot loaders (on x86 at least) depend on direct framebuffer access. Let me put it this way: the UEFI spec describes a possibility for the GOP implementor to expose a linear framebuffer for the display device if there is one. If there is one, great; if there isn't, that's fine too, the GOP specification allows it as well. 
It is the UEFI boot loaders (well, some of them) that ultimately depend on the linear framebuffer, which is what virtio-gpu (the device model) lacks. Thanks Laszlo ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 13:57 ` Ard Biesheuvel 2016-06-27 14:29 ` Alexander Graf @ 2016-06-28 10:04 ` Christoffer Dall 2016-06-28 11:06 ` Laszlo Ersek 1 sibling, 1 reply; 32+ messages in thread From: Christoffer Dall @ 2016-06-28 10:04 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm On Mon, Jun 27, 2016 at 03:57:28PM +0200, Ard Biesheuvel wrote: > On 27 June 2016 at 15:35, Christoffer Dall <christoffer.dall@linaro.org> wrote: > > On Mon, Jun 27, 2016 at 02:30:46PM +0200, Ard Biesheuvel wrote: > >> On 27 June 2016 at 12:34, Christoffer Dall <christoffer.dall@linaro.org> wrote: > >> > On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: > >> >> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@linaro.org> wrote: > >> >> > Hi, > >> >> > > >> >> > I'm going to ask some stupid questions here... > >> >> > > >> >> > On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: > >> >> >> Hi all, > >> >> >> > >> >> >> This old subject came up again in a discussion related to PCIe support > >> >> >> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO > >> >> >> regions as cacheable is preventing us from reusing a significant slice > >> >> >> of the PCIe support infrastructure, and so I'd like to bring this up > >> >> >> again, perhaps just to reiterate why we're simply out of luck. > >> >> >> > >> >> >> To refresh your memories, the issue is that on ARM, PCI MMIO regions > >> >> >> for emulated devices may be backed by memory that is mapped cacheable > >> >> >> by the host. 
Note that this has nothing to do with the device being > >> >> >> DMA coherent or not: in this case, we are dealing with regions that > >> >> >> are not memory from the POV of the guest, and it is reasonable for the > >> >> >> guest to assume that accesses to such a region are not visible to the > >> >> >> device before they hit the actual PCI MMIO window and are translated > >> >> >> into cycles on the PCI bus. > >> >> > > >> >> > For the sake of completeness, why is this reasonable? > >> >> > > >> >> > >> >> Because the whole point of accessing these regions is to communicate > >> >> with the device. It is common to use write combining mappings for > >> >> things like framebuffers to group writes before they hit the PCI bus, > >> >> but any caching just makes it more difficult for the driver state and > >> >> device state to remain synchronized. > >> >> > >> >> > Is this how any real ARM system implementing PCI would actually work? > >> >> > > >> >> > >> >> Yes. > >> >> > >> >> >> That means that mapping such a region > >> >> >> cacheable is a strange thing to do, in fact, and it is unlikely that > >> >> >> patches implementing this against the generic PCI stack in Tianocore > >> >> >> will be accepted by the maintainers. > >> >> >> > >> >> >> Note that this issue not only affects framebuffers on PCI cards, it > >> >> >> also affects emulated USB host controllers (perhaps Alex can remind us > >> >> >> which one exactly?) and likely other emulated generic PCI devices as > >> >> >> well. > >> >> >> > >> >> >> Since the issue exists only for emulated PCI devices whose MMIO > >> >> >> regions are backed by host memory, is there any way we can already > >> >> >> distinguish such memslots from ordinary ones? If we can, is there > >> >> >> anything we could do to treat these specially? Perhaps something like > >> >> >> using read-only memslots so we can at least trap guest writes instead > >> >> >> of having main memory going out of sync with the caches unnoticed? 
I > >> >> >> am just brainstorming here ... > >> >> > > >> >> > I think the only sensible solution is to make sure that the guest and > >> >> > emulation mappings use the same memory type, either cached or > >> >> > non-cached, and we 'simply' have to find the best way to implement this. > >> >> > > >> >> > As Drew suggested, forcing some S2 mappings to be non-cacheable is the > >> >> > one way. > >> >> > > >> >> > The other way is to use something like what you once wrote that rewrites > >> >> > stage-1 mappings to be cacheable, does that apply here? > >> >> > > >> >> > Do we have a clear picture of why we'd prefer one way over the other? > >> >> > > >> >> > >> >> So first of all, let me reiterate that I could only find a single > >> >> instance in QEMU where a PCI MMIO region is backed by host memory, > >> >> which is vga-pci.c. I wonder if there are any other occurrences, but > >> >> if there aren't any, it makes much more sense to prohibit PCI BARs > >> >> backed by host memory rather than spend a lot of effort working around > >> >> it. > >> > > >> > Right, ok. So Marc's point during his KVM Forum talk was basically, > >> > don't use the legacy VGA adapter on ARM and use virtio graphics, right? > >> > > >> > >> Yes. But nothing is preventing you currently from using that, and I > >> think we should prefer crappy performance but correct operation over > >> the current situation. So in general, we should either disallow PCI > >> BARs backed by host memory, or emulate them, but never back them by a > >> RAM memslot when running under ARM/KVM. > > > > agreed, I just think that emulating accesses by trapping them is not > > just slow, it's not really possible in practice and even if it is, it's > > probably *unusably* slow. > > > > Well, it would probably involve a lot of effort to implement emulation > of instructions with multiple output registers, such as ldp/stp and > register writeback. 
And indeed, trapping on each store instruction to > the framebuffer is going to be sloooooowwwww. > > So let's disregard that option for now ... > > >> > >> > What is the proposed solution for someone shipping an ARM server and > >> > wishing to provide a graphical output for that server? > >> > > >> > >> The problem does not exist on bare metal. It is an implementation > >> detail of KVM on ARM that guest PCI BAR mappings are incoherent with > >> the view of the emulator in QEMU. > >> > >> > It feels strange to work around supporting PCI VGA adapters in ARM VMs, > >> > if that's not a supported real hardware case. However, I don't see what > >> > would prevent someone from plugging a VGA adapter into the PCI slot on > >> > an ARM server, and people selling ARM servers probably want this to > >> > happen, I'm guessing. > >> > > >> > >> As I said, the problem does not exist on bare metal. > >> > >> >> > >> >> If we do decide to fix this, the best way would be to use uncached > >> >> attributes for the QEMU userland mapping, and force it uncached in the > >> >> guest via a stage 2 override (as Drew suggests). The only problem I > >> >> see here is that the host's kernel direct mapping has a cached alias > >> >> that we need to get rid of. > >> > > >> > Do we have a way to accomplish that? > >> > > >> > Will we run into a bunch of other problems if we begin punching holes in > >> > the direct mapping for regular RAM? > >> > > >> > >> I think the policy up until now has been not to remap regions in the > >> kernel direct mapping for the purposes of DMA, and I think by the same > >> reasoning, it is not preferable for KVM either > > > > I guess the difference is that from the (host) kernel's point of view > > this is not DMA memory, but just regular RAM. I just don't know enough > > about the kernel's VM mappings to know what's involved here, but we > > should find out somehow... > > > > Whether it is DMA memory or not does not make a difference. 
The point > is simply that arm64 maps all RAM owned by the kernel as cacheable, > and remapping arbitrary ranges with different attributes is > problematic, since it is also likely to involve splitting of regions, > which is cumbersome with a mapping that is always live. > > So instead, we'd have to reserve some system memory early on and > remove it from the linear mapping, the complexity of which is more > than we are probably prepared to put up with. Don't we have any existing frameworks for such things, like ion or other things like that? Not sure if these systems export anything to userspace or even serve the purpose we want, but thought I'd throw it out there. > > So if vga-pci.c is the only problematic device, for which a reasonable > alternative exists (virtio-gpu), I think the only feasible solution is > to educate QEMU not to allow RAM memslots being exposed via PCI BARs > when running under KVM/ARM. It would be good if we could support vga-pci under KVM/ARM, but if there's no other way than rewriting the arm64 kernel's memory mappings completely, then probably we're stuck there, unfortunately. Thanks, -Christoffer
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-28 10:04 ` Christoffer Dall @ 2016-06-28 11:06 ` Laszlo Ersek 2016-06-28 12:20 ` Christoffer Dall 0 siblings, 1 reply; 32+ messages in thread From: Laszlo Ersek @ 2016-06-28 11:06 UTC (permalink / raw) To: Christoffer Dall, Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, kvmarm On 06/28/16 12:04, Christoffer Dall wrote: > On Mon, Jun 27, 2016 at 03:57:28PM +0200, Ard Biesheuvel wrote: >> On 27 June 2016 at 15:35, Christoffer Dall <christoffer.dall@linaro.org> wrote: >>> On Mon, Jun 27, 2016 at 02:30:46PM +0200, Ard Biesheuvel wrote: >>>> On 27 June 2016 at 12:34, Christoffer Dall <christoffer.dall@linaro.org> wrote: >>>>> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: >>>>>> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@linaro.org> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I'm going to ask some stupid questions here... >>>>>>> >>>>>>> On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> This old subject came up again in a discussion related to PCIe support >>>>>>>> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO >>>>>>>> regions as cacheable is preventing us from reusing a significant slice >>>>>>>> of the PCIe support infrastructure, and so I'd like to bring this up >>>>>>>> again, perhaps just to reiterate why we're simply out of luck. >>>>>>>> >>>>>>>> To refresh your memories, the issue is that on ARM, PCI MMIO regions >>>>>>>> for emulated devices may be backed by memory that is mapped cacheable >>>>>>>> by the host. 
Note that this has nothing to do with the device being >>>>>>>> DMA coherent or not: in this case, we are dealing with regions that >>>>>>>> are not memory from the POV of the guest, and it is reasonable for the >>>>>>>> guest to assume that accesses to such a region are not visible to the >>>>>>>> device before they hit the actual PCI MMIO window and are translated >>>>>>>> into cycles on the PCI bus. >>>>>>> >>>>>>> For the sake of completeness, why is this reasonable? >>>>>>> >>>>>> >>>>>> Because the whole point of accessing these regions is to communicate >>>>>> with the device. It is common to use write combining mappings for >>>>>> things like framebuffers to group writes before they hit the PCI bus, >>>>>> but any caching just makes it more difficult for the driver state and >>>>>> device state to remain synchronized. >>>>>> >>>>>>> Is this how any real ARM system implementing PCI would actually work? >>>>>>> >>>>>> >>>>>> Yes. >>>>>> >>>>>>>> That means that mapping such a region >>>>>>>> cacheable is a strange thing to do, in fact, and it is unlikely that >>>>>>>> patches implementing this against the generic PCI stack in Tianocore >>>>>>>> will be accepted by the maintainers. >>>>>>>> >>>>>>>> Note that this issue not only affects framebuffers on PCI cards, it >>>>>>>> also affects emulated USB host controllers (perhaps Alex can remind us >>>>>>>> which one exactly?) and likely other emulated generic PCI devices as >>>>>>>> well. >>>>>>>> >>>>>>>> Since the issue exists only for emulated PCI devices whose MMIO >>>>>>>> regions are backed by host memory, is there any way we can already >>>>>>>> distinguish such memslots from ordinary ones? If we can, is there >>>>>>>> anything we could do to treat these specially? Perhaps something like >>>>>>>> using read-only memslots so we can at least trap guest writes instead >>>>>>>> of having main memory going out of sync with the caches unnoticed? I >>>>>>>> am just brainstorming here ... 
>>>>>>> >>>>>>> I think the only sensible solution is to make sure that the guest and >>>>>>> emulation mappings use the same memory type, either cached or >>>>>>> non-cached, and we 'simply' have to find the best way to implement this. >>>>>>> >>>>>>> As Drew suggested, forcing some S2 mappings to be non-cacheable is the >>>>>>> one way. >>>>>>> >>>>>>> The other way is to use something like what you once wrote that rewrites >>>>>>> stage-1 mappings to be cacheable, does that apply here? >>>>>>> >>>>>>> Do we have a clear picture of why we'd prefer one way over the other? >>>>>>> >>>>>> >>>>>> So first of all, let me reiterate that I could only find a single >>>>>> instance in QEMU where a PCI MMIO region is backed by host memory, >>>>>> which is vga-pci.c. I wonder if there are any other occurrences, but >>>>>> if there aren't any, it makes much more sense to prohibit PCI BARs >>>>>> backed by host memory rather than spend a lot of effort working around >>>>>> it. >>>>> >>>>> Right, ok. So Marc's point during his KVM Forum talk was basically, >>>>> don't use the legacy VGA adapter on ARM and use virtio graphics, right? >>>>> >>>> >>>> Yes. But nothing is preventing you currently from using that, and I >>>> think we should prefer crappy performance but correct operation over >>>> the current situation. So in general, we should either disallow PCI >>>> BARs backed by host memory, or emulate them, but never back them by a >>>> RAM memslot when running under ARM/KVM. >>> >>> agreed, I just think that emulating accesses by trapping them is not >>> just slow, it's not really possible in practice and even if it is, it's >>> probably *unusably* slow. >>> >> >> Well, it would probably involve a lot of effort to implement emulation >> of instructions with multiple output registers, such as ldp/stp and >> register writeback. And indeed, trapping on each store instruction to >> the framebuffer is going to be sloooooowwwww. >> >> So let's disregard that option for now ... 
>> >>>> >>>>> What is the proposed solution for someone shipping an ARM server and >>>>> wishing to provide a graphical output for that server? >>>>> >>>> >>>> The problem does not exist on bare metal. It is an implementation >>>> detail of KVM on ARM that guest PCI BAR mappings are incoherent with >>>> the view of the emulator in QEMU. >>>> >>>>> It feels strange to work around supporting PCI VGA adapters in ARM VMs, >>>>> if that's not a supported real hardware case. However, I don't see what >>>>> would prevent someone from plugging a VGA adapter into the PCI slot on >>>>> an ARM server, and people selling ARM servers probably want this to >>>>> happen, I'm guessing. >>>>> >>>> >>>> As I said, the problem does not exist on bare metal. >>>> >>>>>> >>>>>> If we do decide to fix this, the best way would be to use uncached >>>>>> attributes for the QEMU userland mapping, and force it uncached in the >>>>>> guest via a stage 2 override (as Drew suggests). The only problem I >>>>>> see here is that the host's kernel direct mapping has a cached alias >>>>>> that we need to get rid of. >>>>> >>>>> Do we have a way to accomplish that? >>>>> >>>>> Will we run into a bunch of other problems if we begin punching holes in >>>>> the direct mapping for regular RAM? >>>>> >>>> >>>> I think the policy up until now has been not to remap regions in the >>>> kernel direct mapping for the purposes of DMA, and I think by the same >>>> reasoning, it is not preferable for KVM either >>> >>> I guess the difference is that from the (host) kernel's point of view >>> this is not DMA memory, but just regular RAM. I just don't know enough >>> about the kernel's VM mappings to know what's involved here, but we >>> should find out somehow... >>> >> >> Whether it is DMA memory or not does not make a difference. 
The point >> is simply that arm64 maps all RAM owned by the kernel as cacheable, >> and remapping arbitrary ranges with different attributes is >> problematic, since it is also likely to involve splitting of regions, >> which is cumbersome with a mapping that is always live. >> >> So instead, we'd have to reserve some system memory early on and >> remove it from the linear mapping, the complexity of which is more >> than we are probably prepared to put up with. > > Don't we have any existing frameworks for such things, like ion or > other things like that? Not sure if these systems export anything to > userspace or even serve the purpose we want, but thought I'd throw it > out there. > >> >> So if vga-pci.c is the only problematic device, for which a reasonable >> alternative exists (virtio-gpu), I think the only feasible solution is >> to educate QEMU not to allow RAM memslots being exposed via PCI BARs >> when running under KVM/ARM. > > It would be good if we could support vga-pci under KVM/ARM, but if > there's no other way than rewriting the arm64 kernel's memory mappings > completely, then probably we're stuck there, unfortunately. It's been mentioned earlier that the specific combination of S1 and S2 mappings on aarch64 is actually an *architecture bug*. If we accept that qualification, then we should realize our efforts here target finding a *workaround*. In your blog post <http://www.linaro.org/blog/core-dump/on-the-performance-of-arm-virtualization/>, you mention VHE ("Virtualization Host Extensions"). That's clearly a sign of the architecture adapting to virt software needs. Do you see any chance that the S1-S2 combinations too can be fixed in a new revision of the architecture? Thanks Laszlo
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-28 11:06 ` Laszlo Ersek @ 2016-06-28 12:20 ` Christoffer Dall 2016-06-28 13:10 ` Catalin Marinas 0 siblings, 1 reply; 32+ messages in thread From: Christoffer Dall @ 2016-06-28 12:20 UTC (permalink / raw) To: Laszlo Ersek; +Cc: Ard Biesheuvel, Marc Zyngier, Catalin Marinas, kvmarm On Tue, Jun 28, 2016 at 01:06:36PM +0200, Laszlo Ersek wrote: > On 06/28/16 12:04, Christoffer Dall wrote: > > On Mon, Jun 27, 2016 at 03:57:28PM +0200, Ard Biesheuvel wrote: > >> On 27 June 2016 at 15:35, Christoffer Dall <christoffer.dall@linaro.org> wrote: > >>> On Mon, Jun 27, 2016 at 02:30:46PM +0200, Ard Biesheuvel wrote: > >>>> On 27 June 2016 at 12:34, Christoffer Dall <christoffer.dall@linaro.org> wrote: > >>>>> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: > >>>>>> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@linaro.org> wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>> I'm going to ask some stupid questions here... > >>>>>>> > >>>>>>> On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: > >>>>>>>> Hi all, > >>>>>>>> > >>>>>>>> This old subject came up again in a discussion related to PCIe support > >>>>>>>> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO > >>>>>>>> regions as cacheable is preventing us from reusing a significant slice > >>>>>>>> of the PCIe support infrastructure, and so I'd like to bring this up > >>>>>>>> again, perhaps just to reiterate why we're simply out of luck. > >>>>>>>> > >>>>>>>> To refresh your memories, the issue is that on ARM, PCI MMIO regions > >>>>>>>> for emulated devices may be backed by memory that is mapped cacheable > >>>>>>>> by the host. 
Note that this has nothing to do with the device being > >>>>>>>> DMA coherent or not: in this case, we are dealing with regions that > >>>>>>>> are not memory from the POV of the guest, and it is reasonable for the > >>>>>>>> guest to assume that accesses to such a region are not visible to the > >>>>>>>> device before they hit the actual PCI MMIO window and are translated > >>>>>>>> into cycles on the PCI bus. > >>>>>>> > >>>>>>> For the sake of completeness, why is this reasonable? > >>>>>>> > >>>>>> > >>>>>> Because the whole point of accessing these regions is to communicate > >>>>>> with the device. It is common to use write combining mappings for > >>>>>> things like framebuffers to group writes before they hit the PCI bus, > >>>>>> but any caching just makes it more difficult for the driver state and > >>>>>> device state to remain synchronized. > >>>>>> > >>>>>>> Is this how any real ARM system implementing PCI would actually work? > >>>>>>> > >>>>>> > >>>>>> Yes. > >>>>>> > >>>>>>>> That means that mapping such a region > >>>>>>>> cacheable is a strange thing to do, in fact, and it is unlikely that > >>>>>>>> patches implementing this against the generic PCI stack in Tianocore > >>>>>>>> will be accepted by the maintainers. > >>>>>>>> > >>>>>>>> Note that this issue not only affects framebuffers on PCI cards, it > >>>>>>>> also affects emulated USB host controllers (perhaps Alex can remind us > >>>>>>>> which one exactly?) and likely other emulated generic PCI devices as > >>>>>>>> well. > >>>>>>>> > >>>>>>>> Since the issue exists only for emulated PCI devices whose MMIO > >>>>>>>> regions are backed by host memory, is there any way we can already > >>>>>>>> distinguish such memslots from ordinary ones? If we can, is there > >>>>>>>> anything we could do to treat these specially? 
Perhaps something like > >>>>>>>> using read-only memslots so we can at least trap guest writes instead > >>>>>>>> of having main memory going out of sync with the caches unnoticed? I > >>>>>>>> am just brainstorming here ... > >>>>>>> > >>>>>>> I think the only sensible solution is to make sure that the guest and > >>>>>>> emulation mappings use the same memory type, either cached or > >>>>>>> non-cached, and we 'simply' have to find the best way to implement this. > >>>>>>> > >>>>>>> As Drew suggested, forcing some S2 mappings to be non-cacheable is the > >>>>>>> one way. > >>>>>>> > >>>>>>> The other way is to use something like what you once wrote that rewrites > >>>>>>> stage-1 mappings to be cacheable, does that apply here? > >>>>>>> > >>>>>>> Do we have a clear picture of why we'd prefer one way over the other? > >>>>>>> > >>>>>> > >>>>>> So first of all, let me reiterate that I could only find a single > >>>>>> instance in QEMU where a PCI MMIO region is backed by host memory, > >>>>>> which is vga-pci.c. I wonder if there are any other occurrences, but > >>>>>> if there aren't any, it makes much more sense to prohibit PCI BARs > >>>>>> backed by host memory rather than spend a lot of effort working around > >>>>>> it. > >>>>> > >>>>> Right, ok. So Marc's point during his KVM Forum talk was basically, > >>>>> don't use the legacy VGA adapter on ARM and use virtio graphics, right? > >>>>> > >>>> > >>>> Yes. But nothing is preventing you currently from using that, and I > >>>> think we should prefer crappy performance but correct operation over > >>>> the current situation. So in general, we should either disallow PCI > >>>> BARs backed by host memory, or emulate them, but never back them by a > >>>> RAM memslot when running under ARM/KVM. > >>> > >>> agreed, I just think that emulating accesses by trapping them is not > >>> just slow, it's not really possible in practice and even if it is, it's > >>> probably *unusably* slow. 
> >>> > >> > >> Well, it would probably involve a lot of effort to implement emulation > >> of instructions with multiple output registers, such as ldp/stp and > >> register writeback. And indeed, trapping on each store instruction to > >> the framebuffer is going to be sloooooowwwww. > >> > >> So let's disregard that option for now ... > >> > >>>> > >>>>> What is the proposed solution for someone shipping an ARM server and > >>>>> wishing to provide a graphical output for that server? > >>>>> > >>>> > >>>> The problem does not exist on bare metal. It is an implementation > >>>> detail of KVM on ARM that guest PCI BAR mappings are incoherent with > >>>> the view of the emulator in QEMU. > >>>> > >>>>> It feels strange to work around supporting PCI VGA adapters in ARM VMs, > >>>>> if that's not a supported real hardware case. However, I don't see what > >>>>> would prevent someone from plugging a VGA adapter into the PCI slot on > >>>>> an ARM server, and people selling ARM servers probably want this to > >>>>> happen, I'm guessing. > >>>>> > >>>> > >>>> As I said, the problem does not exist on bare metal. > >>>> > >>>>>> > >>>>>> If we do decide to fix this, the best way would be to use uncached > >>>>>> attributes for the QEMU userland mapping, and force it uncached in the > >>>>>> guest via a stage 2 override (as Drew suggests). The only problem I > >>>>>> see here is that the host's kernel direct mapping has a cached alias > >>>>>> that we need to get rid of. > >>>>> > >>>>> Do we have a way to accomplish that? > >>>>> > >>>>> Will we run into a bunch of other problems if we begin punching holes in > >>>>> the direct mapping for regular RAM? 
> >>>>> > >>>> > >>>> I think the policy up until now has been not to remap regions in the > >>>> kernel direct mapping for the purposes of DMA, and I think by the same > >>>> reasoning, it is not preferable for KVM either > >>> > >>> I guess the difference is that from the (host) kernel's point of view > >>> this is not DMA memory, but just regular RAM. I just don't know enough > >>> about the kernel's VM mappings to know what's involved here, but we > >>> should find out somehow... > >>> > >> > >> Whether it is DMA memory or not does not make a difference. The point > >> is simply that arm64 maps all RAM owned by the kernel as cacheable, > >> and remapping arbitrary ranges with different attributes is > >> problematic, since it is also likely to involve splitting of regions, > >> which is cumbersome with a mapping that is always live. > >> > >> So instead, we'd have to reserve some system memory early on and > >> remove it from the linear mapping, the complexity of which is more > >> than we are probably prepared to put up with. > > > > Don't we have any existing frameworks for such things, like ion or > > other things like that? Not sure if these systems export anything to > > userspace or even serve the purpose we want, but thought I'd throw it > > out there. > > > >> > >> So if vga-pci.c is the only problematic device, for which a reasonable > >> alternative exists (virtio-gpu), I think the only feasible solution is > >> to educate QEMU not to allow RAM memslots being exposed via PCI BARs > >> when running under KVM/ARM. > > > > It would be good if we could support vga-pci under KVM/ARM, but if > > there's no other way than rewriting the arm64 kernel's memory mappings > > completely, then probably we're stuck there, unfortunately. > > It's been mentioned earlier that the specific combination of S1 and S2 > mappings on aarch64 is actually an *architecture bug*. 
If we accept that > qualification, then we should realize our efforts here target finding a > *workaround*. > > In your blog post > <http://www.linaro.org/blog/core-dump/on-the-performance-of-arm-virtualization/>, > you mention VHE ("Virtualization Host Extensions"). That's clearly a > sign of the architecture adapting to virt software needs. > > Do you see any chance that the S1-S2 combinations too can be fixed in a > new revision of the architecture? > I really can't speculate about this, I assume there are reasons for why the architecture is defined in this particular way, but I haven't investigated this aspect in any depth. -Christoffer
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-28 12:20 ` Christoffer Dall @ 2016-06-28 13:10 ` Catalin Marinas 2016-06-28 13:19 ` Ard Biesheuvel 0 siblings, 1 reply; 32+ messages in thread From: Catalin Marinas @ 2016-06-28 13:10 UTC (permalink / raw) To: Christoffer Dall; +Cc: Ard Biesheuvel, Marc Zyngier, Laszlo Ersek, kvmarm On Tue, Jun 28, 2016 at 02:20:43PM +0200, Christoffer Dall wrote: > On Tue, Jun 28, 2016 at 01:06:36PM +0200, Laszlo Ersek wrote: > > On 06/28/16 12:04, Christoffer Dall wrote: > > > On Mon, Jun 27, 2016 at 03:57:28PM +0200, Ard Biesheuvel wrote: > > >> So if vga-pci.c is the only problematic device, for which a reasonable > > >> alternative exists (virtio-gpu), I think the only feasible solution is > > >> to educate QEMU not to allow RAM memslots being exposed via PCI BARs > > >> when running under KVM/ARM. > > > > > > It would be good if we could support vga-pci under KVM/ARM, but if > > > there's no other way than rewriting the arm64 kernel's memory mappings > > > completely, then probably we're stuck there, unfortunately. Just to be clear, the behaviour of mismatched memory attributes is defined in the ARM ARM and so far Linux worked fine with such cacheable vs non-cacheable (as long as only one of them is accessed *or* cache maintenance is performed accordingly). I don't think the arm64 kernel memory map needs to be rewritten. > > It's been mentioned earlier that the specific combination of S1 and S2 > > mappings on aarch64 is actually an *architecture bug*. If we accept that > > qualification, then we should realize our efforts here target finding a > > *workaround*. I haven't read this thread in detail but I doubt it's an architecture bug. You may say a missing feature. > > In your blog post > > <http://www.linaro.org/blog/core-dump/on-the-performance-of-arm-virtualization/>, > > you mention VHE ("Virtualization Host Extensions"). 
That's clearly a > > sign of the architecture adapting to virt software needs. > > > > Do you see any chance that the S1-S2 combinations too can be fixed in a > > new revision of the architecture? > > I really can't speculate about this, I assume there are reasons for why > the architecture is defined in this particular way, but I haven't > investigated this aspect in any depth. In general, there are software issues with forcing cacheability at S2 when S1 required non-cacheable transactions, with all the coherency assumptions. The problem becomes even more complicated when memory types, not just cacheability, are "upgraded". E.g. forcing S1 Device to S2 Normal with consequences on memory ordering that the guest is not aware of. While there are potential, specific, hardware solutions, they can't be "back-ported" to existing CPU implementation, so we need a solution in software. *If* the only software solution has severe performance implications and it is on a critical path, the architecture might be improved in the future (like we did with VHE). But I don't think that's the case here. -- Catalin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-28 13:10 ` Catalin Marinas @ 2016-06-28 13:19 ` Ard Biesheuvel 2016-06-28 13:25 ` Catalin Marinas 0 siblings, 1 reply; 32+ messages in thread From: Ard Biesheuvel @ 2016-06-28 13:19 UTC (permalink / raw) To: Catalin Marinas; +Cc: Marc Zyngier, Laszlo Ersek, kvmarm On 28 June 2016 at 15:10, Catalin Marinas <catalin.marinas@arm.com> wrote: > On Tue, Jun 28, 2016 at 02:20:43PM +0200, Christoffer Dall wrote: >> On Tue, Jun 28, 2016 at 01:06:36PM +0200, Laszlo Ersek wrote: >> > On 06/28/16 12:04, Christoffer Dall wrote: >> > > On Mon, Jun 27, 2016 at 03:57:28PM +0200, Ard Biesheuvel wrote: >> > >> So if vga-pci.c is the only problematic device, for which a reasonable >> > >> alternative exists (virtio-gpu), I think the only feasible solution is >> > >> to educate QEMU not to allow RAM memslots being exposed via PCI BARs >> > >> when running under KVM/ARM. >> > > >> > > It would be good if we could support vga-pci under KVM/ARM, but if >> > > there's no other way than rewriting the arm64 kernel's memory mappings >> > > completely, then probably we're stuck there, unfortunately. > > Just to be clear, the behaviour of mismatched memory attributes is > defined in the ARM ARM and so far Linux worked fine with such cacheable > vs non-cacheable (as long as only one of them is accessed *or* cache > maintenance is performed accordingly). I don't think the arm64 kernel > memory map needs to be rewritten. > That would suggest that having an uncached userland mapping in QEMU and an uncached kernel mapping in the guest would be ok as long as we don't access the host kernel's cacheable alias? In that case, Drew's approach would be feasible, and the pci_register_bar() function in QEMU could be modified to force the userland mapping and the stage2 mapping to 'device' [when running under KVM/ARM] if it refers to a memslot that is backed by host memory. -- Ard. 
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-28 13:19 ` Ard Biesheuvel @ 2016-06-28 13:25 ` Catalin Marinas 2016-06-28 14:02 ` Andrew Jones 0 siblings, 1 reply; 32+ messages in thread From: Catalin Marinas @ 2016-06-28 13:25 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: Marc Zyngier, Laszlo Ersek, kvmarm On Tue, Jun 28, 2016 at 03:19:14PM +0200, Ard Biesheuvel wrote: > On 28 June 2016 at 15:10, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Tue, Jun 28, 2016 at 02:20:43PM +0200, Christoffer Dall wrote: > >> On Tue, Jun 28, 2016 at 01:06:36PM +0200, Laszlo Ersek wrote: > >> > On 06/28/16 12:04, Christoffer Dall wrote: > >> > > On Mon, Jun 27, 2016 at 03:57:28PM +0200, Ard Biesheuvel wrote: > >> > >> So if vga-pci.c is the only problematic device, for which a reasonable > >> > >> alternative exists (virtio-gpu), I think the only feasible solution is > >> > >> to educate QEMU not to allow RAM memslots being exposed via PCI BARs > >> > >> when running under KVM/ARM. > >> > > > >> > > It would be good if we could support vga-pci under KVM/ARM, but if > >> > > there's no other way than rewriting the arm64 kernel's memory mappings > >> > > completely, then probably we're stuck there, unfortunately. > > > > Just to be clear, the behaviour of mismatched memory attributes is > > defined in the ARM ARM and so far Linux worked fine with such cacheable > > vs non-cacheable (as long as only one of them is accessed *or* cache > > maintenance is performed accordingly). I don't think the arm64 kernel > > memory map needs to be rewritten. > > That would suggest that having an uncached userland mapping in QEMU > and an uncached kernel mapping in the guest would be ok as long as we > don't access the host kernel's cacheable alias? Yes, from an architecture perspective. Many framebuffer drivers already work in a similar way and map the framebuffer memory in user as non-cacheable. 
Of course, one difference is that the other agent accessing the memory is a DMA device rather than the CPU. > In that case, Drew's approach would be feasible, and the > pci_register_bar() function in QEMU could be modified to force the > userland mapping and the stage2 mapping to 'device' [when running > under KVM/ARM] if it refers to a memslot that is backed by host > memory. Device or normal non-cacheable (depending on the unaligned access requirements). Since such memory is allocated by Qemu (rather than a kernel driver), KVM would need to mark the pages as reserved so that they are not moved around by the host kernel, especially since it would use the cacheable alias. Another issue is taking care of the host kernel merging adjacent vmas since we only want to apply the attributes to a single region. -- Catalin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-28 13:25 ` Catalin Marinas @ 2016-06-28 14:02 ` Andrew Jones 0 siblings, 0 replies; 32+ messages in thread From: Andrew Jones @ 2016-06-28 14:02 UTC (permalink / raw) To: Catalin Marinas; +Cc: Ard Biesheuvel, Marc Zyngier, Laszlo Ersek, kvmarm On Tue, Jun 28, 2016 at 02:25:19PM +0100, Catalin Marinas wrote: > On Tue, Jun 28, 2016 at 03:19:14PM +0200, Ard Biesheuvel wrote: > > On 28 June 2016 at 15:10, Catalin Marinas <catalin.marinas@arm.com> wrote: > > > On Tue, Jun 28, 2016 at 02:20:43PM +0200, Christoffer Dall wrote: > > >> On Tue, Jun 28, 2016 at 01:06:36PM +0200, Laszlo Ersek wrote: > > >> > On 06/28/16 12:04, Christoffer Dall wrote: > > >> > > On Mon, Jun 27, 2016 at 03:57:28PM +0200, Ard Biesheuvel wrote: > > >> > >> So if vga-pci.c is the only problematic device, for which a reasonable > > >> > >> alternative exists (virtio-gpu), I think the only feasible solution is > > >> > >> to educate QEMU not to allow RAM memslots being exposed via PCI BARs > > >> > >> when running under KVM/ARM. > > >> > > > > >> > > It would be good if we could support vga-pci under KVM/ARM, but if > > >> > > there's no other way than rewriting the arm64 kernel's memory mappings > > >> > > completely, then probably we're stuck there, unfortunately. > > > > > > Just to be clear, the behaviour of mismatched memory attributes is > > > defined in the ARM ARM and so far Linux worked fine with such cacheable > > > vs non-cacheable (as long as only one of them is accessed *or* cache > > > maintenance is performed accordingly). I don't think the arm64 kernel > > > memory map needs to be rewritten. > > > > That would suggest that having an uncached userland mapping in QEMU > > and an uncached kernel mapping in the guest would be ok as long as we > > don't access the host kernel's cacheable alias? > > Yes, from an architecture perspective. 
Many framebuffer drivers already > work in a similar way and map the framebuffer memory in user as > non-cacheable. Of course, one difference is that the other agent > accessing the memory is a DMA device rather than the CPU. > > > In that case, Drew's approach would be feasible, and the > > pci_register_bar() function in QEMU could be modified to force the > > userland mapping and the stage2 mapping to 'device' [when running > > under KVM/ARM] if it refers to a memslot that is backed by host > > memory. > > Device or normal non-cacheable (depending on the unaligned access > requirements). > > Since such memory is allocated by Qemu (rather than a kernel driver), > KVM would need to mark the pages as reserved so that they are not moved > around by the host kernel, especially since it would use the cacheable > alias. > > Another issue is taking care of the host kernel merging adjacent vmas > since we only want to apply the attributes to a single region. I also experimented with dropping the KVM memslot flag in favor of an madvise flag, allowing us to avoid vma merging problems. I forget if I dropped that for any other reason than I thought it would generate too much hate mail... Or maybe it was because I ended up needing to add a new mprotect flag too, which I was quite sure would generate hate mail, even though I found precedence for it ef3d3246a0d0 powerpc/mm: Add Strong Access Ordering support I have experimental patches (somewhere, they don't seem to be on this laptop) for that stuff, but I can't recall what was still broken with them in the end. I just recall it still didn't work, which is why I never posted even a crazy RFC. Thanks, drew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 10:34 ` Christoffer Dall 2016-06-27 12:30 ` Ard Biesheuvel @ 2016-06-27 14:24 ` Alexander Graf 2016-06-28 10:55 ` Laszlo Ersek 2 siblings, 0 replies; 32+ messages in thread From: Alexander Graf @ 2016-06-27 14:24 UTC (permalink / raw) To: Christoffer Dall Cc: Ard Biesheuvel, Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm > Am 27.06.2016 um 12:34 schrieb Christoffer Dall <christoffer.dall@linaro.org>: > >> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: >>> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.dall@linaro.org> wrote: >>> Hi, >>> >>> I'm going to ask some stupid questions here... >>> >>>> On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote: >>>> Hi all, >>>> >>>> This old subject came up again in a discussion related to PCIe support >>>> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO >>>> regions as cacheable is preventing us from reusing a significant slice >>>> of the PCIe support infrastructure, and so I'd like to bring this up >>>> again, perhaps just to reiterate why we're simply out of luck. >>>> >>>> To refresh your memories, the issue is that on ARM, PCI MMIO regions >>>> for emulated devices may be backed by memory that is mapped cacheable >>>> by the host. Note that this has nothing to do with the device being >>>> DMA coherent or not: in this case, we are dealing with regions that >>>> are not memory from the POV of the guest, and it is reasonable for the >>>> guest to assume that accesses to such a region are not visible to the >>>> device before they hit the actual PCI MMIO window and are translated >>>> into cycles on the PCI bus. >>> >>> For the sake of completeness, why is this reasonable? >> >> Because the whole point of accessing these regions is to communicate >> with the device. 
It is common to use write combining mappings for >> things like framebuffers to group writes before they hit the PCI bus, >> but any caching just makes it more difficult for the driver state and >> device state to remain synchronized. >> >>> Is this how any real ARM system implementing PCI would actually work? >> >> Yes. >> >>>> That means that mapping such a region >>>> cacheable is a strange thing to do, in fact, and it is unlikely that >>>> patches implementing this against the generic PCI stack in Tianocore >>>> will be accepted by the maintainers. >>>> >>>> Note that this issue not only affects framebuffers on PCI cards, it >>>> also affects emulated USB host controllers (perhaps Alex can remind us >>>> which one exactly?) and likely other emulated generic PCI devices as >>>> well. >>>> >>>> Since the issue exists only for emulated PCI devices whose MMIO >>>> regions are backed by host memory, is there any way we can already >>>> distinguish such memslots from ordinary ones? If we can, is there >>>> anything we could do to treat these specially? Perhaps something like >>>> using read-only memslots so we can at least trap guest writes instead >>>> of having main memory going out of sync with the caches unnoticed? I >>>> am just brainstorming here ... >>> >>> I think the only sensible solution is to make sure that the guest and >>> emulation mappings use the same memory type, either cached or >>> non-cached, and we 'simply' have to find the best way to implement this. >>> >>> As Drew suggested, forcing some S2 mappings to be non-cacheable is the >>> one way. >>> >>> The other way is to use something like what you once wrote that rewrites >>> stage-1 mappings to be cacheable, does that apply here ? >>> >>> Do we have a clear picture of why we'd prefer one way over the other? >> >> So first of all, let me reiterate that I could only find a single >> instance in QEMU where a PCI MMIO region is backed by host memory, >> which is vga-pci.c. 
I wonder of there are any other occurrences, but >> if there aren't any, it makes much more sense to prohibit PCI BARs >> backed by host memory rather than spend a lot of effort working around >> it. > > Right, ok. So Marc's point during his KVM Forum talk was basically, > don't use the legacy VGA adapter on ARM and use virtio graphics, right? > > What is the proposed solution for someone shipping an ARM server and > wishing to provide a graphical output for that server? Well, there is at least one server that I know of that has PCI VGA built in ;). I think he was more concerned about VMs rather than real hardware. > > It feels strange to work around supporting PCI VGA adapters in ARM VMs, > if that's not a supported real hardware case. However, I don't see what > would prevent someone from plugging a VGA adapter into the PCI slot on > an ARM server, and people selling ARM servers probably want this to > happen, I'm guessing. > >> >> If we do decide to fix this, the best way would be to use uncached >> attributes for the QEMU userland mapping, and force it uncached in the >> guest via a stage 2 override (as Drews suggests). The only problem I >> see here is that the host's kernel direct mapping has a cached alias >> that we need to get rid of. > > Do we have a way to accomplish that? > > Will we run into a bunch of other problems if we begin punching holes in > the direct mapping for regular RAM? Yeah, and how do you deal with aliases on that memory? You'd also need to stop ksm to run on it for example. Alex ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 10:34 ` Christoffer Dall 2016-06-27 12:30 ` Ard Biesheuvel 2016-06-27 14:24 ` Alexander Graf @ 2016-06-28 10:55 ` Laszlo Ersek 2016-06-28 13:14 ` Ard Biesheuvel 2016-06-28 15:23 ` Alexander Graf 2 siblings, 2 replies; 32+ messages in thread From: Laszlo Ersek @ 2016-06-28 10:55 UTC (permalink / raw) To: Christoffer Dall, Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, kvmarm On 06/27/16 12:34, Christoffer Dall wrote: > On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: >> So first of all, let me reiterate that I could only find a single >> instance in QEMU where a PCI MMIO region is backed by host memory, >> which is vga-pci.c. I wonder if there are any other occurrences, but >> if there aren't any, it makes much more sense to prohibit PCI BARs >> backed by host memory rather than spend a lot of effort working around >> it. > > Right, ok. So Marc's point during his KVM Forum talk was basically, > don't use the legacy VGA adapter on ARM and use virtio graphics, right? The EFI GOP (Graphics Output Protocol) abstraction provides two ways for UEFI applications to access the display, and one way for a runtime OS to inherit the display hardware from the firmware (without OS native drivers).

(a) For UEFI apps:
    - direct framebuffer access
    - Blt() (block transfer) member function

(b) For runtime OS:
    - direct framebuffer access ("efifb" in Linux)

Virtio-gpu lacks a linear framebuffer by design. Therefore the above methods are reduced to the following:

(c) UEFI apps can access virtio-gpu with:
    - GOP.Blt() member function only

(d) The runtime guest OS can access the virtio-gpu device as-inherited from the firmware (i.e., without native drivers) with:
    - n/a.

Given that we expect all aarch64 OSes to include native virtio-gpu drivers on their install media, (d) is actually not a problem. Whenever the OS kernel runs, we expect to have no need for "efifb", ever. So that's good.
The problem is (c). UEFI boot loaders would have to be taught to call GOP.Blt() manually, whenever they need to display something. I'm not sure about grub2's current status, but it is free software, so in theory it should be doable. However, UEFI windows boot loaders are proprietary *and* they require direct framebuffer access (on x86 at least); they don't work with Blt()-only. (I found some Microsoft presentations about this earlier.) So, virtio-gpu is an almost universal solution for the problem, but not entirely. For any given GOP, offering Blt() *only* (i.e., not exposing a linear framebuffer) conforms to the UEFI spec, but some boot loaders are known to present further requirements (on x86 anyway). Thanks Laszlo ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-28 10:55 ` Laszlo Ersek @ 2016-06-28 13:14 ` Ard Biesheuvel 2016-06-28 13:32 ` Laszlo Ersek 2016-06-28 15:23 ` Alexander Graf 1 sibling, 1 reply; 32+ messages in thread From: Ard Biesheuvel @ 2016-06-28 13:14 UTC (permalink / raw) To: Laszlo Ersek; +Cc: Marc Zyngier, Catalin Marinas, kvmarm On 28 June 2016 at 12:55, Laszlo Ersek <lersek@redhat.com> wrote: > On 06/27/16 12:34, Christoffer Dall wrote: >> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: > >>> So first of all, let me reiterate that I could only find a single >>> instance in QEMU where a PCI MMIO region is backed by host memory, >>> which is vga-pci.c. I wonder of there are any other occurrences, but >>> if there aren't any, it makes much more sense to prohibit PCI BARs >>> backed by host memory rather than spend a lot of effort working around >>> it. >> >> Right, ok. So Marc's point during his KVM Forum talk was basically, >> don't use the legacy VGA adapter on ARM and use virtio graphics, right? > > The EFI GOP (Graphics Output Protocol) abstraction provides two ways for > UEFI applications to access the display, and one way for a runtime OS to > inherit the display hardware from the firmware (without OS native drivers). > > (a) For UEFI apps: > - direct framebuffer access > - Blt() (block transfer) member function > > (b) For runtime OS: > - direct framebuffer access ("efifb" in Linux) > > Virtio-gpu lacks a linear framebuffer by design. Therefore the above > methods are reduced to the following: > > (c) UEFI apps can access virtio-gpu with: > - GOP.Blt() member function only > > (d) The runtime guest OS can access the virtio-gpu device as-inherited > from the firmware (i.e., without native drivers) with: > - n/a. > > Given that we expect all aarch64 OSes to include native virtio-gpu > drivers on their install media, (d) is actually not a problem. 
Whenever > the OS kernel runs, we except to have no need for "efifb", ever. So > that's good. > > The problem is (c). UEFI boot loaders would have to be taught to call > GOP.Blt() manually, whenever they need to display something. I'm not > sure about grub2's current status, but it is free software, so in theory > it should be doable. However, UEFI windows boot loaders are proprietary > *and* they require direct framebuffer access (on x86 at least); they > don't work with Blt()-only. (I found some Microsoft presentations about > this earlier.) > > So, virtio-gpu is an almost universal solution for the problem, but not > entirely. For any given GOP, offering Blt() *only* (i.e., not exposing a > linear framebuffer) conforms to the UEFI spec, but some boot loaders are > known to present further requirements (on x86 anyway). > Even if virtio-gpu would expose a linear framebuffer, it would likely expose it as a PCI BAR, and we would be in the exact same situation. The only way we can work around this is to emulate a DMA coherent device that uses a framebuffer in system RAM. I looked at the PL111, which is already supported both in EDK2 and the Linux kernel, and would only require minor changes to support DMA coherent devices. Unfortunately, we would not be able to advertise its presence when running under ACPI, since it is not a PCI device. In any case, reconciling software that requires a framebuffer with a GPU emulation that does not expose one by design is going to be problematic even without this issue. How is this supposed to work on x86? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-28 13:14 ` Ard Biesheuvel @ 2016-06-28 13:32 ` Laszlo Ersek 2016-06-29 7:12 ` Gerd Hoffmann 0 siblings, 1 reply; 32+ messages in thread From: Laszlo Ersek @ 2016-06-28 13:32 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, Gerd Hoffmann, kvmarm (adding Gerd) On 06/28/16 15:14, Ard Biesheuvel wrote: > In any case, reconciling software that requires a framebuffer with a > GPU emulation that does not expose one by design is going to be > problematic even without this issue. How is this supposed to work on > x86? AFAIK: "virtio-gpu-pci" is the device model without the framebuffer. It is good for secondary displays (i.e. those that you don't boot with, only use after the guest kernel starts up). "virtio-vga" is the same, but it also has the legacy VGA framebuffer, hence it can be used for accommodating boot loaders. (Except it won't work for aarch64 KVM guests, because of $SUBJECT.) Thanks Laszlo ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-28 13:32 ` Laszlo Ersek @ 2016-06-29 7:12 ` Gerd Hoffmann 0 siblings, 0 replies; 32+ messages in thread From: Gerd Hoffmann @ 2016-06-29 7:12 UTC (permalink / raw) To: Laszlo Ersek; +Cc: Ard Biesheuvel, Marc Zyngier, Catalin Marinas, kvmarm On Di, 2016-06-28 at 15:32 +0200, Laszlo Ersek wrote: > (adding Gerd) > > On 06/28/16 15:14, Ard Biesheuvel wrote: > > > In any case, reconciling software that requires a framebuffer with a > > GPU emulation that does not expose one by design is going to be > > problematic even without this issue. How is this supposed to work on > > x86? > > AFAIK: > > "virtio-gpu-pci" is the device model without the framebuffer. It is good > for secondary displays (i.e. those that you don't boot with, only use > after the guest kernel starts up). > > "virtio-vga" is the same, but it also has the legacy VGA framebuffer, > hence it can be used for accommodating boot loaders. (Except it won't > work for aarch64 KVM guests, because of $SUBJECT.) Exactly. virtio-vga is basically virtio-gpu-pci + stdvga combined. Power-on default is vga mode. It switches into virtio mode when the guest configures a output using virtio commands. It switches back to vga mode on reset. You can get a simple framebuffer by using the stdvga part of the device. QemuVideoDxe does exactly that. The linux kernel switches from vga mode (efifb) to virtio mode when the virtio-gpu kms driver loads. Of course virtio-vga in vga mode has exactly the same cache coherency issues as stdvga on arm. So, once the linux kernel with the virtio-gpu is up'n'running everything is fine, but how to handle early bootloader display isn't solved yet. cheers, Gerd ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-28 10:55 ` Laszlo Ersek 2016-06-28 13:14 ` Ard Biesheuvel @ 2016-06-28 15:23 ` Alexander Graf 1 sibling, 0 replies; 32+ messages in thread From: Alexander Graf @ 2016-06-28 15:23 UTC (permalink / raw) To: Laszlo Ersek, Christoffer Dall, Ard Biesheuvel Cc: Marc Zyngier, Catalin Marinas, kvmarm On 06/28/2016 12:55 PM, Laszlo Ersek wrote: > On 06/27/16 12:34, Christoffer Dall wrote: >> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote: >>> So first of all, let me reiterate that I could only find a single >>> instance in QEMU where a PCI MMIO region is backed by host memory, >>> which is vga-pci.c. I wonder of there are any other occurrences, but >>> if there aren't any, it makes much more sense to prohibit PCI BARs >>> backed by host memory rather than spend a lot of effort working around >>> it. >> Right, ok. So Marc's point during his KVM Forum talk was basically, >> don't use the legacy VGA adapter on ARM and use virtio graphics, right? > The EFI GOP (Graphics Output Protocol) abstraction provides two ways for > UEFI applications to access the display, and one way for a runtime OS to > inherit the display hardware from the firmware (without OS native drivers). > > (a) For UEFI apps: > - direct framebuffer access > - Blt() (block transfer) member function > > (b) For runtime OS: > - direct framebuffer access ("efifb" in Linux) > > Virtio-gpu lacks a linear framebuffer by design. Therefore the above > methods are reduced to the following: > > (c) UEFI apps can access virtio-gpu with: > - GOP.Blt() member function only > > (d) The runtime guest OS can access the virtio-gpu device as-inherited > from the firmware (i.e., without native drivers) with: > - n/a. > > Given that we expect all aarch64 OSes to include native virtio-gpu > drivers on their install media, (d) is actually not a problem. Whenever > the OS kernel runs, we except to have no need for "efifb", ever. 
So > that's good. > > The problem is (c). UEFI boot loaders would have to be taught to call > GOP.Blt() manually, whenever they need to display something. I'm not > sure about grub2's current status, but it is free software, so in theory > it should be doable. However, UEFI windows boot loaders are proprietary Yes, grub2 already ignores the frame buffer target address and instead uses Blt() operations only. > *and* they require direct framebuffer access (on x86 at least); they > don't work with Blt()-only. (I found some Microsoft presentations about > this earlier.) > > So, virtio-gpu is an almost universal solution for the problem, but not > entirely. For any given GOP, offering Blt() *only* (i.e., not exposing a > linear framebuffer) conforms to the UEFI spec, but some boot loaders are > known to present further requirements (on x86 anyway). Well, I'm perfectly happy in ignoring Windows on KVM for now, if that gets us working, smooth Linux guest support :). Alex ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 9:47 ` Ard Biesheuvel 2016-06-27 10:34 ` Christoffer Dall @ 2016-06-27 13:15 ` Peter Maydell 2016-06-27 13:49 ` Mark Rutland 1 sibling, 1 reply; 32+ messages in thread From: Peter Maydell @ 2016-06-27 13:15 UTC (permalink / raw) To: Ard Biesheuvel; +Cc: Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm On 27 June 2016 at 10:47, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > As for the USB case, I can't really figure out what is going on here, > but I am fairly certain it is a different issue. If this is related to > DMA, I wonder if adding the 'dma-coherent' property to the PCIe root > complex node fixes anything. I get the impression dma-coherent is the right thing to advertise anyway. Do you have the documentation to hand that specifies what "dma-coherent" means? The Documentation/devicetree docs in the kernel tree seem to rather unhelpfully define it as "Present if dma operations are coherent", which doesn't really clarify anything to me... thanks -- PMM ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 13:15 ` Peter Maydell @ 2016-06-27 13:49 ` Mark Rutland 2016-06-27 14:10 ` Peter Maydell 0 siblings, 1 reply; 32+ messages in thread From: Mark Rutland @ 2016-06-27 13:49 UTC (permalink / raw) To: Peter Maydell Cc: Ard Biesheuvel, Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm On Mon, Jun 27, 2016 at 02:15:29PM +0100, Peter Maydell wrote: > On 27 June 2016 at 10:47, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > > As for the USB case, I can't really figure out what is going on here, > > but I am fairly certain it is a different issue. If this is related to > > DMA, I wonder if adding the 'dma-coherent' property to the PCIe root > > complex node fixes anything. > > I get the impression dma-coherent is the right thing to advertise > anyway. Do you have the documentation to hand that specifies what > "dma-coherent" means? The Documentation/devicetree docs in the > kernel tree seem to rather unhelpfully define it as "Present if > dma operations are coherent", which doesn't really clarify anything > to me... It's ill-defined today, and the precise definition is an open question. See replies to [1], which seems to have stalled as of [2]. My view is that for arm/arm64 this should mean the device makes accesses which are coherent with Inner Shareable Normal Inner-WB Outer-WB attributes, as this is the functional de-facto semantics today, and anything short of that is not well-defined or usable. Thanks, Mark. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/433626.html [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-June/434143.html ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 13:49 ` Mark Rutland @ 2016-06-27 14:10 ` Peter Maydell 2016-06-28 10:05 ` Christoffer Dall 0 siblings, 1 reply; 32+ messages in thread From: Peter Maydell @ 2016-06-27 14:10 UTC (permalink / raw) To: Mark Rutland Cc: Ard Biesheuvel, Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm On 27 June 2016 at 14:49, Mark Rutland <mark.rutland@arm.com> wrote: > On Mon, Jun 27, 2016 at 02:15:29PM +0100, Peter Maydell wrote: >> I get the impression dma-coherent is the right thing to advertise >> anyway. Do you have the documentation to hand that specifies what >> "dma-coherent" means? The Documentation/devicetree docs in the >> kernel tree seem to rather unhelpfully define it as "Present if >> dma operations are coherent", which doesn't really clarify anything >> to me... > > It's ill-defined today, and the precise definition is an open question. > See replies to [1], which seems to have stalled as of [2]. > > My view is that for arm/arm64 this should mean the device makes accesses > which are coherent with Inner Shareable Normal Inner-WB Outer-WB > attributes, as this is the functional de-facto semantics today, and > anything short of that is not well-defined or usable. OK, so for any emulated device in QEMU we should specify dma-coherent by those rules. I think our only DMA devices in the virt board are the emulated PCI devices; dma-coherent here is a property of the pci-controller and applies to any device on it, right? Presumably this means that if the host pci-controller doesn't advertise itself as dma-coherent then we cannot do any PCI passthrough of host hardware? thanks -- PMM ^ permalink raw reply [flat|nested] 32+ messages in thread
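In device tree terms, the property under discussion would end up on the host-bridge node roughly as below; the node name, unit address, and the many omitted properties (ranges, interrupt maps, and so on) are placeholders rather than QEMU's actual generated output. On Mark's reading, the property asserts that DMA from devices behind this bridge is coherent with Inner Shareable Normal Inner-WB Outer-WB mappings:

```dts
/* Illustrative fragment only, not QEMU's real DT output. */
pcie@10000000 {
        compatible = "pci-host-ecam-generic";
        device_type = "pci";
        /* Applies to every device behind this root complex; left
         * absent on a host bridge whose DMA is not cache-coherent. */
        dma-coherent;
};
```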
* Re: issues with emulated PCI MMIO backed by host memory under KVM 2016-06-27 14:10 ` Peter Maydell @ 2016-06-28 10:05 ` Christoffer Dall 0 siblings, 0 replies; 32+ messages in thread From: Christoffer Dall @ 2016-06-28 10:05 UTC (permalink / raw) To: Peter Maydell Cc: Ard Biesheuvel, Marc Zyngier, Catalin Marinas, Laszlo Ersek, kvmarm On Mon, Jun 27, 2016 at 03:10:20PM +0100, Peter Maydell wrote: > On 27 June 2016 at 14:49, Mark Rutland <mark.rutland@arm.com> wrote: > > On Mon, Jun 27, 2016 at 02:15:29PM +0100, Peter Maydell wrote: > >> I get the impression dma-coherent is the right thing to advertise > >> anyway. Do you have the documentation to hand that specifies what > >> "dma-coherent" means? The Documentation/devicetree docs in the > >> kernel tree seem to rather unhelpfully define it as "Present if > >> dma operations are coherent", which doesn't really clarify anything > >> to me... > > > > It's ill-defined today, and the precise definition is an open question. > > See replies to [1], which seems to have stalled as of [2]. > > > > My view is that for arm/arm64 this should mean the device makes accesses > > which are coherent with Inner Shareable Normal Inner-WB Outer-WB > > attributes, as this is the functional de-facto semantics today, and > > anything short of that is not well-defined or usable. > > OK, so for any emulated device in QEMU we should specify > dma-coherent by those rules. I think our only DMA devices > in the virt board are the emulated PCI devices; dma-coherent > here is a property of the pci-controller and applies to any > device on it, right? Presumably this means that if the host > pci-controller doesn't advertise itself as dma-coherent then > we cannot do any PCI passthrough of host hardware? > Someone suggested a while back to have a second PCI controller matching the host properties for this purpose... -Christoffer ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2016-06-29 7:07 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-06-24 14:04 issues with emulated PCI MMIO backed by host memory under KVM Ard Biesheuvel 2016-06-24 14:57 ` Andrew Jones 2016-06-27 8:17 ` Marc Zyngier 2016-06-24 18:16 ` Ard Biesheuvel 2016-06-25 7:15 ` Alexander Graf 2016-06-25 7:19 ` Alexander Graf 2016-06-27 8:11 ` Marc Zyngier 2016-06-27 9:16 ` Christoffer Dall 2016-06-27 9:47 ` Ard Biesheuvel 2016-06-27 10:34 ` Christoffer Dall 2016-06-27 12:30 ` Ard Biesheuvel 2016-06-27 13:35 ` Christoffer Dall 2016-06-27 13:57 ` Ard Biesheuvel 2016-06-27 14:29 ` Alexander Graf 2016-06-28 11:02 ` Laszlo Ersek 2016-06-28 10:04 ` Christoffer Dall 2016-06-28 11:06 ` Laszlo Ersek 2016-06-28 12:20 ` Christoffer Dall 2016-06-28 13:10 ` Catalin Marinas 2016-06-28 13:19 ` Ard Biesheuvel 2016-06-28 13:25 ` Catalin Marinas 2016-06-28 14:02 ` Andrew Jones 2016-06-27 14:24 ` Alexander Graf 2016-06-28 10:55 ` Laszlo Ersek 2016-06-28 13:14 ` Ard Biesheuvel 2016-06-28 13:32 ` Laszlo Ersek 2016-06-29 7:12 ` Gerd Hoffmann 2016-06-28 15:23 ` Alexander Graf 2016-06-27 13:15 ` Peter Maydell 2016-06-27 13:49 ` Mark Rutland 2016-06-27 14:10 ` Peter Maydell 2016-06-28 10:05 ` Christoffer Dall