On 2011-05-19 20:40, Gleb Natapov wrote: > On Thu, May 19, 2011 at 08:27:50PM +0200, Jan Kiszka wrote: >> On 2011-05-19 20:22, Gleb Natapov wrote: >>> On Thu, May 19, 2011 at 08:11:39PM +0200, Jan Kiszka wrote: >>>> On 2011-05-19 19:39, Gleb Natapov wrote: >>>>> On Thu, May 19, 2011 at 03:37:58PM +0200, Jan Kiszka wrote: >>>>>> On 2011-05-19 15:36, Anthony Liguori wrote: >>>>>>> On 05/18/2011 02:40 PM, Jan Kiszka wrote: >>>>>>>> On 2011-05-18 15:12, Avi Kivity wrote: >>>>>>>>> void cpu_register_memory_region(MemoryRegion *mr, target_phys_addr_t >>>>>>>>> addr); >>>>>>>> >>>>>>>> OK, let's allow overlapping, but make it explicit: >>>>>>>> >>>>>>>> void cpu_register_memory_region_overlap(MemoryRegion *mr, >>>>>>>> target_phys_addr_t addr, >>>>>>>> int priority); >>>>>>> >>>>>>> The device doesn't actually know how overlapping is handled. This is >>>>>>> based on the bus hierarchy. >>>>>> >>>>>> Devices don't register their regions, buses do. >>>>>> >>>>> Today PCI device may register region that overlaps with any other >>>>> registered memory region without even knowing it. Guest can write any >>>>> RAM address into PCI BAR and this RAM address will be come mmio are. More >>>>> interesting is what happens when guest reprogram PCI BAR to other address >>>>> - the RAM that was at the previous address just disappears. Obviously >>>>> this is crazy behaviour, but the question is how do we want to handle >>>>> it? One option is to disallow such overlapping registration, another is >>>>> to restore RAM mapping after PCI BAR is reprogrammed. If we chose second >>>>> one the PCI will not know that _overlap() should be called. >>>> >>>> BARs may overlap with other BARs or with RAM. That's well-known, so PCI >>>> bridged need to register their regions with the _overlap variant >>>> unconditionally. In contrast to the current PhysPageDesc mechanism, the >>> With what priority? If it needs to call _overlap unconditionally why not >>> always call _overlap and drop not _overlap variant? >> >> Because we should catch accidental overlaps in all those non PCI devices >> with hard-wired addressing. That's a bug in the device/machine model and >> should be reported as such by QEMU. > Why should we complicate API to catch unlikely errors? If you want to > debug that add capability to dump memory map from the monitor. Because we need to switch tons of code that so far saw a fairly different reaction of the core to overlapping regions. > >> >>> >>>> new region management will not cause any harm to overlapping regions so >>>> that they can "recover" when the overlap is gone. >>>> >>>>> >>>>> Another example may be APIC region and PCI. They overlap, but neither >>>>> CPU nor PCI knows about it. >>>> >>>> And they do not need to. The APIC regions will be managed by the per-CPU >>>> region management, reusing the tool box we need for all bridges. It will >>>> register the APIC page with a priority higher than the default one, thus >>>> overriding everything that comes from the host bridge. I think that >>>> reflects pretty well real machine behaviour. >>>> >>> What is "higher"? How does it know that priority is high enough? >> >> Because no one else manages priorities at a specific hierarchy level. >> There is only one. >> > PCI and CPU are on different hierarchy levels. PCI is under the PIIX and > CPU is on a system BUS. The priority for the APIC mapping will be applied at CPU level, of course. So it will override everything, not just PCI. > >>> I >>> thought, from reading other replies, that priorities are meaningful >>> only on the same hierarchy level (which kinda make sense), but now you >>> are saying that you will override PCI address from another part of >>> the topology? >> >> Everything from below in the hierarchy is fed in with default priority, >> the lowest one. So to let some region created at this level override >> those regions, just pick default+1. If you want to create more overlay >> levels (can't imagine a good scenario, though), pick >> default+1..default+n. It's really that simple. >> > Except that PCI and CPU are not on the same level. See above. Jan