Date: Fri, 12 Feb 2016 08:15:04 +0200
From: "Michael S. Tsirkin"
Message-ID: <20160212003752-mutt-send-email-mst@redhat.com>
References: <20160128145348-mutt-send-email-mst@redhat.com>
 <20160129121359.17842fef@nial.brq.redhat.com>
 <20160131170118-mutt-send-email-mst@redhat.com>
 <20160202105953.476a05bd@nial.brq.redhat.com>
 <20160202123756-mutt-send-email-mst@redhat.com>
 <20160209114608.4f89b528@nial.brq.redhat.com>
 <20160209131656-mutt-send-email-mst@redhat.com>
 <20160211161605.0022ed38@nial.brq.redhat.com>
 <20160211180836-mutt-send-email-mst@redhat.com>
 <56BCC63C.4060702@redhat.com>
In-Reply-To: <56BCC63C.4060702@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
To: Marcel Apfelbaum
Cc: Xiao Guangrong, ehabkost@redhat.com, Marcel Apfelbaum, ghammer@redhat.com,
 qemu-devel@nongnu.org, lcapitulino@redhat.com, Igor Mammedov, lersek@redhat.com

On Thu, Feb 11, 2016 at 07:34:52PM +0200, Marcel Apfelbaum wrote:
> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:
> >On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> >>On Tue, 9 Feb 2016 14:17:44 +0200
> >>"Michael S. Tsirkin" wrote:
> >>
> >>>On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> >>>>>So the linker interface solves this rather neatly:
> >>>>>bios allocates memory, bios passes memory map to guest.
> >>>>>Served us well for several years without need for extensions,
> >>>>>and it does solve the VM GEN ID problem, even though
> >>>>>1. it was never designed for huge areas like nvdimm seems to want to use
> >>>>>2. we might want to add a new 64 bit flag to avoid touching low memory
> >>>>The linker interface is fine for some read-only data, like ACPI tables,
> >>>>especially fixed tables; not so much for AML ones if one wants to patch them.
> >>>>
> >>>>However, now that you want to use it for other purposes, you start
> >>>>adding extensions and other guest->QEMU channels to communicate
> >>>>patching info back.
> >>>>It steals the guest's memory, which is also not nice and doesn't scale well.
> >>>
> >>>This is an argument I don't get. Memory is memory. Call it guest memory
> >>>or a RAM-backed PCI BAR - same thing. MMIO is cheaper of course
> >>>but much slower.
> >>>
> >>>...
> >>It does matter for the user, though: he pays for a guest with XXX RAM but gets
> >>less than that. And that will keep getting worse as the number of such devices
> >>increases.
> >>
> >>>>>OK fine, but returning PCI BAR address to guest is wrong.
> >>>>>How about reading it from ACPI then? Is it really
> >>>>>broken unless there's *also* a driver?
> >>>>I don't get the question; the MS spec requires an address (ADDR method),
> >>>>and it's read by ACPI (AML).
> >>>
> >>>You were unhappy about DMA into guest memory.
> >>>As a replacement for DMA, we could have AML read from
> >>>e.g. PCI and write into RAM.
> >>>This way we don't need to pass the address to QEMU.
> >>That sounds better, as it saves us from allocating an IO port
> >>and QEMU doesn't need to write into guest memory; the only question is
> >>whether a PCI_Config opregion would work with a driver-less PCI device.
> >
> >Or a PCI BAR, for that matter. I don't know for sure.
> >
> >>
> >>And it's still pretty much not testable, since it would require
> >>a fully running OSPM to execute the AML side.
> >
> >AML is not testable, but that's nothing new.
> >You can test reading from PCI.
> >
> >>>
> >>>>As for a PCI_Config OpRegion working without a driver, I haven't tried,
> >>>>but I wouldn't be surprised if it doesn't work, taking into account that
> >>>>the MS-introduced _DSM doesn't.
> >>>>
> >>>>>
> >>>>>
> >>>>>>>> Just compare with a graphics card design, where on-device memory
> >>>>>>>> is mapped directly at some GPA, not wasting RAM that the guest could
> >>>>>>>> use for other tasks.
> >>>>>>>
> >>>>>>>This might have been true 20 years ago. Most modern cards do DMA.
> >>>>>>
> >>>>>>Modern cards, with their own RAM, map their VRAM into the address space
> >>>>>>directly and allow users to use it (GEM API). So they do not waste
> >>>>>>conventional RAM. For example, NVIDIA VRAM is mapped as PCI BARs the same
> >>>>>>way as in this series (even the PCI class id is the same).
> >>>>>
> >>>>>I don't know enough about graphics really, so I'm not sure how these are
> >>>>>relevant. NICs and disks certainly do DMA. And virtio-gl seems to
> >>>>>mostly use guest RAM, not on-card RAM.
> >>>>>
> >>>>>>>> VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> >>>>>>>> instead of consuming guest's RAM they should be mapped at
> >>>>>>>> some GPA and their memory accessed directly.
> >>>>>>>
> >>>>>>>VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> >>>>>>>address. This breaks the straight-forward approach of using a
> >>>>>>>rebalanceable PCI BAR.
> >>>>>>
> >>>>>>For PCI rebalance to work on Windows, one has to provide a working PCI
> >>>>>>driver; otherwise the OS will ignore the device when rebalancing happens
> >>>>>>and might map something else over the ignored BAR.
> >>>>>
> >>>>>Does it disable the BAR then? Or just move it elsewhere?
> >>>>It doesn't; it just blindly ignores the BAR's existence and maps the BAR
> >>>>of another device (one with a driver) over it.
> >>>
> >>>Interesting. On classical PCI this is a forbidden configuration.
> >>>Maybe we do something that confuses Windows?
> >>>Could you tell me how to reproduce this behaviour?
> >>#cat > t << EOF
> >>pci_update_mappings_del
> >>pci_update_mappings_add
> >>EOF
> >>
> >>#./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> >>  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> >>  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> >>  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> >>
> >>Wait till the OS boots and note the BARs programmed for ivshmem;
> >>  in my case it was
> >>   01:01.0 0,0xfe800000+0x100
> >>then execute the script below and watch the pci_update_mappings* trace events:
> >>
> >># for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> >>
> >>Hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, where
> >>Windows unmaps all BARs of the NICs on the bridge but doesn't touch ivshmem,
> >>and then programs new BARs, where:
> >>  pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> >>creates a BAR overlapping with ivshmem.
> 
> Hi,
> 
> Let me see if I understand.
> You say that in Windows, if a device does not have a driver installed,
> its BAR ranges can be used by other devices after re-balancing, right?
> 
> If yes, in Windows we cannot use the device anyway, so we shouldn't care, right?

If e1000 (which has a driver) overlaps ivshmem (no driver), we have a problem,
as e1000 won't work - or it will, but mostly by luck.

> Our only remaining problem is the overlapping memory regions for
> the old device and the new one, and we need to ensure only the new device
> will use that region?
> 
> 
> Thanks,
> Marcel
> 
> 
> >
> >Thanks!
> >We need to figure this out because currently this does not
> >work properly (or maybe it works, but merely by chance).
> >Marcel and I will play with this.
> >
> [...]
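
As a quick way to double-check the overlap from the recipe above, the monitor
can also be asked for the guest-visible BAR layout after the rebalance. This
is only a sketch: it assumes the same /tmp/m monitor socket and the ivshmem
(01:01.0) / hot-plugged e1000 (01:11.0) addresses from the trace above, and it
uses the standard HMP commands "info pci" and "info mtree"; the sleep just
gives the monitor time to answer before nc exits.

  MON=/tmp/m

  # BARs as currently programmed by the guest: if the overlap really happened,
  # ivshmem at 01:01.0 and the e1000 at 01:11.0 both report a range starting
  # at 0xfe800000, matching the pci_update_mappings_add trace line.
  { echo "info pci"; sleep 1; } | nc -U "$MON"

  # Memory-region tree: an overlap shows up as two regions mapped over the
  # same stretch of system address space.
  { echo "info mtree"; sleep 1; } | nc -U "$MON"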