From: "Michael S. Tsirkin"
Date: Tue, 16 Feb 2016 16:53:25 +0200
To: Igor Mammedov
Cc: Xiao Guangrong, ehabkost@redhat.com, Marcel Apfelbaum, ghammer@redhat.com,
 qemu-devel@nongnu.org, lcapitulino@redhat.com, lersek@redhat.com
Subject: Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
Message-ID: <20160216163727-mutt-send-email-mst@redhat.com>
In-Reply-To: <20160216145125.7f12882f@nial.brq.redhat.com>
References: <20160202105953.476a05bd@nial.brq.redhat.com>
 <20160202123756-mutt-send-email-mst@redhat.com>
 <20160209114608.4f89b528@nial.brq.redhat.com>
 <20160209131656-mutt-send-email-mst@redhat.com>
 <20160211161605.0022ed38@nial.brq.redhat.com>
 <20160211180836-mutt-send-email-mst@redhat.com>
 <56C2F46D.4080907@redhat.com>
 <20160216131737.7df40a1d@nial.brq.redhat.com>
 <56C317E1.1020602@redhat.com>
 <20160216145125.7f12882f@nial.brq.redhat.com>

On Tue, Feb 16, 2016 at 02:51:25PM +0100, Igor Mammedov wrote:
> On Tue, 16 Feb 2016 14:36:49 +0200
> Marcel Apfelbaum wrote:
> 
> > On 02/16/2016 02:17 PM, Igor Mammedov wrote:
> > > On Tue, 16 Feb 2016 12:05:33 +0200
> > > Marcel Apfelbaum wrote:
> > >
> > >> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:
> > >>> On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> > >>>> On Tue, 9 Feb 2016 14:17:44 +0200
> > >>>> "Michael S. Tsirkin" wrote:
> > >>>>
> > >>>>> On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> > >>>>>>> So the linker interface solves this rather neatly:
> > >>>>>>> bios allocates memory, bios passes memory map to guest.
> > >>>>>>> It has served us well for several years without needing extensions,
> > >>>>>>> and it does solve the VM GEN ID problem, even though
> > >>>>>>> 1. it was never designed for huge areas like nvdimm seems to want to use
> > >>>>>>> 2. we might want to add a new 64 bit flag to avoid touching low memory
> > >>>>>> The linker interface is fine for some read-only data, like ACPI tables,
> > >>>>>> especially fixed tables; not so for AML ones, if one wants to patch them.
> > >>>>>>
> > >>>>>> However, once you want to use it for other purposes, you start
> > >>>>>> adding extensions and other guest->QEMU channels to communicate
> > >>>>>> patching info back.
> > >>>>>> It also steals the guest's memory, which is not nice and doesn't scale well.
> > >>>>>
> > >>>>> This is an argument I don't get. Memory is memory. Call it guest memory
> > >>>>> or a RAM-backed PCI BAR - same thing. MMIO is cheaper, of course,
> > >>>>> but much slower.
> > >>>>>
> > >>>>> ...
> > >>>> It does matter for the user, though: he pays for a guest with XXX RAM but gets less
> > >>>> than that. And that will keep getting worse as the number of such devices
> > >>>> increases.
> > >>>>
> > >>>>>>> OK fine, but returning the PCI BAR address to the guest is wrong.
> > >>>>>>> How about reading it from ACPI then? Is it really
> > >>>>>>> broken unless there's *also* a driver?
> > >>>>>> I don't get the question: the MS spec requires an address (ADDR method),
> > >>>>>> and it's read by ACPI (AML).
> > >>>>>
> > >>>>> You were unhappy about DMA into guest memory.
> > >>>>> As a replacement for DMA, we could have AML read from
> > >>>>> e.g. PCI and write into RAM.
> > >>>>> This way we don't need to pass an address to QEMU.
> > >>>> That sounds better, as it saves us from allocating an IO port
> > >>>> and QEMU doesn't need to write into guest memory; the only question is
> > >>>> whether a PCI_Config OpRegion would work with a driver-less PCI device.
> > >>>
> > >>> Or a PCI BAR, for that matter. I don't know for sure.
> > >>>
> > >>>> And it's still pretty much not testable, since it would require
> > >>>> a fully running OSPM to execute the AML side.
> > >>>
> > >>> AML is not testable, but that's nothing new.
> > >>> You can test reading from PCI.
> > >>>
> > >>>>>> As for a PCI_Config OpRegion working without a driver, I haven't tried,
> > >>>>>> but I wouldn't be surprised if it doesn't, taking into account that
> > >>>>>> the MS-introduced _DSM doesn't.
> > >>>>>>
> > >>>>>>>>>> Just compare with a graphics card design, where on-device memory
> > >>>>>>>>>> is mapped directly at some GPA, not wasting RAM that the guest could
> > >>>>>>>>>> use for other tasks.
> > >>>>>>>>>
> > >>>>>>>>> This might have been true 20 years ago. Most modern cards do DMA.
> > >>>>>>>>
> > >>>>>>>> Modern cards, with their own RAM, map their VRAM into the address space directly
> > >>>>>>>> and allow users to use it (GEM API). So they do not waste conventional RAM.
> > >>>>>>>> For example, NVIDIA VRAM is mapped as PCI BARs the same way as in this
> > >>>>>>>> series (even the PCI class id is the same).
> > >>>>>>>
> > >>>>>>> Don't know enough about graphics really, I'm not sure how these are
> > >>>>>>> relevant. NICs and disks certainly do DMA. And virtio gl seems to
> > >>>>>>> mostly use guest RAM, not on-card RAM.
> > >>>>>>>
> > >>>>>>>>>> The VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > >>>>>>>>>> instead of consuming the guest's RAM they should be mapped at
> > >>>>>>>>>> some GPA and their memory accessed directly.
> > >>>>>>>>>
> > >>>>>>>>> VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > >>>>>>>>> address. This breaks the straightforward approach of using a
> > >>>>>>>>> rebalanceable PCI BAR.
> > >>>>>>>>
> > >>>>>>>> For PCI rebalancing to work on Windows, one has to provide a working PCI driver,
> > >>>>>>>> otherwise the OS will ignore the device when rebalancing happens and
> > >>>>>>>> might map something else over the ignored BAR.
> > >>>>>>>
> > >>>>>>> Does it disable the BAR then? Or just move it elsewhere?
> > >>>>>> It doesn't; it just blindly ignores the BAR's existence and maps the BAR of
> > >>>>>> another device with a driver over it.
> > >>>>>
> > >>>>> Interesting. On classical PCI this is a forbidden configuration.
> > >>>>> Maybe we do something that confuses Windows?
> > >>>>> Could you tell me how to reproduce this behaviour?
> > >>>> #cat > t << EOF
> > >>>> pci_update_mappings_del
> > >>>> pci_update_mappings_add
> > >>>> EOF
> > >>>>
> > >>>> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> > >>>>  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> > >>>>  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> > >>>>  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> > >>>>
> > >>>> Wait till the OS boots and note the BARs programmed for ivshmem;
> > >>>> in my case it was
> > >>>>   01:01.0 0,0xfe800000+0x100
> > >>>> then execute the script and watch the pci_update_mappings* trace events:
> > >>>>
> > >>>> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> > >>>>
> > >>>> Hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, where
> > >>>> Windows unmaps all BARs of the NICs on the bridge but doesn't touch ivshmem,
> > >>>> and then programs new BARs, where:
> > >>>>   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> > >>>> creates a BAR overlapping with ivshmem.
> > >>>
> > >>> Thanks!
> > >>> We need to figure this out because currently this does not
> > >>> work properly (or maybe it works, but merely by chance).
> > >>> Marcel and I will play with this.
> > >>>
> > >> I checked and indeed we have 2 separate problems:
> > >>
> > >> 1. ivshmem is declared as a PCI RAM controller and Windows *does* have the drivers
> > >> for it; however, it is not remapped on rebalancing.
> > > Does it really have a driver, i.e. an ivshmem-specific one?
> > > It should have its own driver, otherwise userspace
> > > won't be able to access/work with it and it would be pointless
> > > to add such a device to the machine.
> > 
> > No, it does not.
> So it's a "PCI RAM controller", which is marked as NODRV in the INF file;
> they use NODRV as a stub to stop Windows from asking for a driver, on the
> assumption that the HW owns/manages the device.
> And when rebalancing happens, Windows completely ignores NODRV
> BARs, which causes overlaps with devices that do have PCI drivers.

But that can't work for classic PCI: if BARs overlap, behaviour
is undefined. We must be doing something Windows does not expect
that makes it create this setup.
Is it firmware enabling the BARs? Something else?

> > >> You can see in Device Manager 2 working devices with the same MMIO region - strange!
> > >> This may be because PCI RAM controllers can't be remapped? Even then, it should not be overridden.
> > >> Maybe we need to add a clue for the OS in ACPI regarding this range?
> > >>
> > >> 2. PCI devices with no driver installed are not remapped. This can be OK
> > >> from the Windows point of view, because the Resources window does not show the MMIO range
> > >> for such a device.
> > >>
> > >> If the other (remapped) device is working, it is pure luck. Both memory regions occupy the same range
> > >> and have the same priority.
> > >>
> > >> We need to think about how to solve this.
> > >> One way would be to defer the BAR activation to the guest OS, but I am not sure of the consequences.
> > > Deferring won't solve the problem, as rebalancing could happen later
> > > and make BARs overlap.
> > 
> > Why not? If we do not activate the BAR in firmware and Windows does not have a driver
> > for it, it will not activate it at all, right?
> > Why would Windows activate the device's BAR if it can't use it? At least that is what I hope.
> > Any other idea would be appreciated.
> > 
> > > I've noticed that at startup Windows unmaps and then maps BARs
> > > at the same addresses where the BIOS had put them before.
> > 
> > Including devices without a working driver?
> I've just tried; it does so for ivshmem.
> 
> > Thanks,
> > Marcel
> > 
> > >> And this does not solve the ivshmem problem.
> > > So far the only way to avoid overlapping BARs, due to Windows
> > > doing rebalancing for driver-less devices, is to pin such
> > > BARs statically with _CRS in the ACPI table, but as Michael said,
> > > it fragments the PCI address space.
> > >
> > >> Thanks,
> > >> Marcel
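
For illustration only, here is a rough ASL sketch of the two ideas discussed
above: statically pinning the region with _CRS so rebalancing never touches it,
and letting AML read the data itself instead of having QEMU DMA it into guest
RAM. The device name, the _HID, the 0xFE900000 address and the sizes are made
up for the example; this is not what QEMU generates.

    Device (VGEN)
    {
        Name (_HID, "XXXX0000")   // placeholder ID, invented for this sketch

        // Pin the window statically so the OS has no reason to rebalance it.
        Name (_CRS, ResourceTemplate ()
        {
            Memory32Fixed (ReadOnly, 0xFE900000, 0x00001000)
        })

        // AML-side access to the 16-byte GUID buffer, no DMA into guest RAM.
        OperationRegion (VGID, SystemMemory, 0xFE900000, 0x10)
        Field (VGID, DWordAcc, NoLock, Preserve)
        {
            DAT0, 32,
            DAT1, 32,
            DAT2, 32,
            DAT3, 32,
        }

        // The ADDR method required by the MS spec: a package with the low
        // and high 32 bits of the GUID buffer's physical address.
        Name (ABUF, Package (0x02) { 0xFE900000, Zero })
        Method (ADDR, 0, NotSerialized)
        {
            Return (ABUF)
        }
    }

Whether Windows respects such a _CRS for a device it has no driver for, and
whether the same would work through a PCI_Config OpRegion, are exactly the
open questions above.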