Date: Fri, 12 Feb 2016 08:15:04 +0200
From: "Michael S. Tsirkin"
Message-ID: <20160212003752-mutt-send-email-mst@redhat.com>
References: <20160128145348-mutt-send-email-mst@redhat.com>
 <20160129121359.17842fef@nial.brq.redhat.com>
 <20160131170118-mutt-send-email-mst@redhat.com>
 <20160202105953.476a05bd@nial.brq.redhat.com>
 <20160202123756-mutt-send-email-mst@redhat.com>
 <20160209114608.4f89b528@nial.brq.redhat.com>
 <20160209131656-mutt-send-email-mst@redhat.com>
 <20160211161605.0022ed38@nial.brq.redhat.com>
 <20160211180836-mutt-send-email-mst@redhat.com>
 <56BCC63C.4060702@redhat.com>
In-Reply-To: <56BCC63C.4060702@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
To: Marcel Apfelbaum
Cc: Xiao Guangrong, ehabkost@redhat.com, Marcel Apfelbaum, ghammer@redhat.com,
 qemu-devel@nongnu.org, lcapitulino@redhat.com, Igor Mammedov, lersek@redhat.com

On Thu, Feb 11, 2016 at 07:34:52PM +0200, Marcel Apfelbaum wrote:
> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:
> >On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> >>On Tue, 9 Feb 2016 14:17:44 +0200
> >>"Michael S. Tsirkin" wrote:
> >>
> >>>On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> >>>>>So the linker interface solves this rather neatly:
> >>>>>bios allocates memory, bios passes memory map to guest.
> >>>>>Served us well for several years without need for extensions,
> >>>>>and it does solve the VM GEN ID problem, even though
> >>>>>1. it was never designed for huge areas like nvdimm seems to want to use
> >>>>>2. we might want to add a new 64 bit flag to avoid touching low memory
> >>>>The linker interface is fine for some read-only data, like ACPI tables,
> >>>>especially fixed tables; not so much for AML ones if one wants to patch them.
> >>>>
> >>>>However, now that you want to use it for other purposes, you start
> >>>>adding extensions and other guest->QEMU channels to communicate
> >>>>patching info back.
> >>>>It steals the guest's memory, which is also not nice and doesn't scale well.
> >>>
> >>>This is an argument I don't get. Memory is memory. Call it guest memory
> >>>or a RAM-backed PCI BAR - same thing. MMIO is cheaper of course
> >>>but much slower.
> >>>
> >>>...
> >>It does matter for the user, though: he pays for a guest with XXX RAM but gets
> >>less than that. And that will keep getting worse as the number of such devices
> >>increases.
> >>
> >>>>>OK fine, but returning PCI BAR address to guest is wrong.
> >>>>>How about reading it from ACPI then? Is it really
> >>>>>broken unless there's *also* a driver?
> >>>>I don't get the question; the MS spec requires an address (ADDR method),
> >>>>and it's read by ACPI (AML).
> >>>
> >>>You were unhappy about DMA into guest memory.
> >>>As a replacement for DMA, we could have AML read from
> >>>e.g. PCI and write into RAM.
> >>>This way we don't need to pass the address to QEMU.
> >>That sounds better, as it saves us from allocating an IO port
> >>and QEMU doesn't need to write into guest memory; the only question is
> >>whether a PCI_Config opregion would work with a driver-less PCI device.
> >
> >Or a PCI BAR, for that matter. I don't know for sure.
> >
> >>
> >>And it's still pretty much not testable, since it would require
> >>a fully running OSPM to execute the AML side.
> >
> >AML is not testable, but that's nothing new.
> >You can test reading from PCI.
> >
> >>>
> >>>>As for a PCI_Config OpRegion working without a driver, I haven't tried,
> >>>>but I wouldn't be surprised if it doesn't work, taking into account that
> >>>>the MS-introduced _DSM doesn't.
> >>>>
> >>>>>
> >>>>>
> >>>>>>>> Just compare with a graphics card design, where on-device memory
> >>>>>>>> is mapped directly at some GPA, not wasting RAM that the guest could
> >>>>>>>> use for other tasks.
> >>>>>>>
> >>>>>>>This might have been true 20 years ago. Most modern cards do DMA.
> >>>>>>
> >>>>>>Modern cards, with their own RAM, map their VRAM into the address space
> >>>>>>directly and allow users to use it (GEM API). So they do not waste
> >>>>>>conventional RAM. For example, NVIDIA VRAM is mapped as PCI BARs the same
> >>>>>>way as in this series (even the PCI class id is the same).
> >>>>>
> >>>>>I don't know enough about graphics really, so I'm not sure how these are
> >>>>>relevant. NICs and disks certainly do DMA. And virtio-gl seems to
> >>>>>mostly use guest RAM, not on-card RAM.
> >>>>>
> >>>>>>>> VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> >>>>>>>> instead of consuming guest's RAM they should be mapped at
> >>>>>>>> some GPA and their memory accessed directly.
> >>>>>>>
> >>>>>>>VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> >>>>>>>address. This breaks the straight-forward approach of using a
> >>>>>>>rebalanceable PCI BAR.
> >>>>>>
> >>>>>>For PCI rebalance to work on Windows, one has to provide a working PCI
> >>>>>>driver; otherwise the OS will ignore the device when rebalancing happens
> >>>>>>and might map something else over the ignored BAR.
> >>>>>
> >>>>>Does it disable the BAR then? Or just move it elsewhere?
> >>>>It doesn't; it just blindly ignores the BAR's existence and maps the BAR
> >>>>of another device (one with a driver) over it.
> >>>
> >>>Interesting. On classical PCI this is a forbidden configuration.
> >>>Maybe we do something that confuses Windows?
> >>>Could you tell me how to reproduce this behaviour?
> >>#cat > t << EOF
> >>pci_update_mappings_del
> >>pci_update_mappings_add
> >>EOF
> >>
> >>#./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> >>  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> >>  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> >>  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> >>
> >>Wait till the OS boots and note the BARs programmed for ivshmem;
> >>  in my case it was
> >>   01:01.0 0,0xfe800000+0x100
> >>then execute the script below and watch the pci_update_mappings* trace events:
> >>
> >># for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> >>
> >>Hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, where
> >>Windows unmaps all BARs of the NICs on the bridge but doesn't touch ivshmem,
> >>and then programs new BARs, where:
> >>  pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> >>creates a BAR overlapping with ivshmem.
> 
> Hi,
> 
> Let me see if I understand.
> You say that in Windows, if a device does not have a driver installed,
> its BAR ranges can be used by other devices after re-balancing, right?
> 
> If yes, in Windows we cannot use the device anyway, so we shouldn't care, right?

If e1000 (which has a driver) overlaps ivshmem (no driver), we have a problem,
as e1000 won't work - or it will, but mostly by luck.

> Our only remaining problem is the overlapping memory regions for
> the old device and the new one, and we need to ensure only the new device
> will use that region?
> 
> 
> Thanks,
> Marcel
> 
> 
> >
> >Thanks!
> >We need to figure this out because currently this does not
> >work properly (or maybe it works, but merely by chance).
> >Marcel and I will play with this.
> >
> [...]
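
As a quick way to double-check the overlap from the recipe above, the monitor
can also be asked for the guest-visible BAR layout after the rebalance. This
is only a sketch: it assumes the same /tmp/m monitor socket and the ivshmem
(01:01.0) / hot-plugged e1000 (01:11.0) addresses from the trace above, and it
uses the standard HMP commands "info pci" and "info mtree"; the sleep just
gives the monitor time to answer before nc exits.

  MON=/tmp/m

  # BARs as currently programmed by the guest: if the overlap really happened,
  # ivshmem at 01:01.0 and the e1000 at 01:11.0 both report a range starting
  # at 0xfe800000, matching the pci_update_mappings_add trace line.
  { echo "info pci"; sleep 1; } | nc -U "$MON"

  # Memory-region tree: an overlap shows up as two regions mapped over the
  # same stretch of system address space.
  { echo "info mtree"; sleep 1; } | nc -U "$MON"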