From: Marcel Apfelbaum <marcel@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	ehabkost@redhat.com, "Michael S. Tsirkin" <mst@redhat.com>,
	ghammer@redhat.com, Marcel Apfelbaum <marcel.a@redhat.com>,
	qemu-devel@nongnu.org, lcapitulino@redhat.com, lersek@redhat.com
Subject: Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
Date: Tue, 16 Feb 2016 14:36:49 +0200
Message-ID: <56C317E1.1020602@redhat.com>
In-Reply-To: <20160216131737.7df40a1d@nial.brq.redhat.com>

On 02/16/2016 02:17 PM, Igor Mammedov wrote:
> On Tue, 16 Feb 2016 12:05:33 +0200
> Marcel Apfelbaum <marcel@redhat.com> wrote:
>
>> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:
>>> On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
>>>> On Tue, 9 Feb 2016 14:17:44 +0200
>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>
>>>>> On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
>>>>>>> So the linker interface solves this rather neatly:
>>>>>>> bios allocates memory, bios passes memory map to guest.
>>>>>>> Served us well for several years without need for extensions,
>>>>>>> and it does solve the VM GEN ID problem, even though
>>>>>>> 1. it was never designed for huge areas like nvdimm seems to want to use
>>>>>>> 2. we might want to add a new 64 bit flag to avoid touching low memory
>>>>>> The linker interface is fine for some read-only data, like ACPI tables
>>>>>> (especially fixed tables), but not so for AML ones, if one wants to patch them.
>>>>>>
>>>>>> However, now that you want to use it for other purposes, you start
>>>>>> adding extensions and other guest->QEMU channels to communicate
>>>>>> patching info back.
>>>>>> It also steals the guest's memory, which is not nice and doesn't scale well.
>>>>>
>>>>> This is an argument I don't get. Memory is memory: call it guest memory
>>>>> or a RAM-backed PCI BAR - same thing. MMIO is cheaper of course,
>>>>> but much slower.
>>>>>
>>>>> ...
>>>> It matters for the user, however: he pays for a guest with XXX RAM but gets
>>>> less than that. And that will keep getting worse as the number of such
>>>> devices increases.
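
(This is visible from inside a Linux guest, too: MemTotal ends up below the
configured -m value once such areas are carved out of guest RAM - though
ordinary kernel reservations lower it as well, so treat this only as a rough
sanity check:

   # compare the guest-visible memory against the -m value given to QEMU
   grep MemTotal /proc/meminfo

)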
>>>>
>>>>>>> OK fine, but returning the PCI BAR address to the guest is wrong.
>>>>>>> How about reading it from ACPI then? Is it really
>>>>>>> broken unless there's *also* a driver?
>>>>>> I don't get the question; the MS spec requires an address (the ADDR
>>>>>> method), and it's read by ACPI (AML).
>>>>>
>>>>> You were unhappy about DMA into guest memory.
>>>>> As a replacement for DMA, we could have AML read from
>>>>> e.g. PCI and write into RAM.
>>>>> This way we don't need to pass address to QEMU.
>>>> That sounds better, as it saves us from allocating an IO port,
>>>> and QEMU doesn't need to write into guest memory. The only question is
>>>> whether a PCI_Config OpRegion would work with a driver-less PCI device.
>>>
>>> Or a PCI BAR, for that matter. I don't know for sure.
>>>
>>>>
>>>> And it's still pretty much not testable, since it would require
>>>> a fully running OSPM to execute the AML side.
>>>
>>> AML is not testable, but that's nothing new.
>>> You can test reading from PCI.
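
(Side note: even without a full OSPM run, the AML that QEMU generates can at
least be reviewed offline; a quick sketch, assuming acpica-tools is available
in a Linux guest:

   acpidump -b        # dump the ACPI tables to binary files (dsdt.dat, ...)
   iasl -d dsdt.dat   # disassemble the DSDT into dsdt.dsl for review

)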
>>>
>>>>>
>>>>>> As for a PCI_Config OpRegion working without a driver, I haven't tried,
>>>>>> but I wouldn't be surprised if it doesn't work, taking into account that
>>>>>> the MS-introduced _DSM doesn't.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>>      Just compare with a graphics card design, where on-device memory
>>>>>>>>>>      is mapped directly at some GPA, not wasting RAM that the guest
>>>>>>>>>>      could use for other tasks.
>>>>>>>>>
>>>>>>>>> This might have been true 20 years ago.  Most modern cards do DMA.
>>>>>>>>
>>>>>>>> Modern cards, with their own RAM, map their VRAM into the address space
>>>>>>>> directly and allow users to use it (GEM API). So they do not waste
>>>>>>>> conventional RAM. For example, NVIDIA VRAM is mapped as PCI BARs the
>>>>>>>> same way as in this series (even the PCI class id is the same).
>>>>>>>
>>>>>>> I don't know enough about graphics really, so I'm not sure how these are
>>>>>>> relevant.  NICs and disks certainly do DMA.  And virtio-gl seems to
>>>>>>> mostly use guest RAM, not on-card RAM.
>>>>>>>
>>>>>>>>>>      The VMGENID and NVDIMM use cases look exactly the same to me, i.e.
>>>>>>>>>>      instead of consuming the guest's RAM they should be mapped at
>>>>>>>>>>      some GPA and their memory accessed directly.
>>>>>>>>>
>>>>>>>>> VMGENID is tied to a spec that rather arbitrarily asks for a fixed
>>>>>>>>> address. This breaks the straight-forward approach of using a
>>>>>>>>> rebalanceable PCI BAR.
>>>>>>>>
>>>>>>>> For PCI rebalancing to work on Windows, one has to provide a working PCI
>>>>>>>> driver; otherwise the OS will ignore the device when rebalancing happens
>>>>>>>> and might map something else over the ignored BAR.
>>>>>>>
>>>>>>> Does it disable the BAR then? Or just move it elsewhere?
>>>>>> It doesn't; it just blindly ignores the BAR's existence and maps the BAR
>>>>>> of another device (one with a driver) over it.
>>>>>
>>>>> Interesting. On classical PCI this is a forbidden configuration.
>>>>> Maybe we do something that confuses Windows?
>>>>> Could you tell me how to reproduce this behaviour?
>>>> # enable only the two BAR-mapping trace events
>>>> cat > t << EOF
>>>> pci_update_mappings_del
>>>> pci_update_mappings_add
>>>> EOF
>>>>
>>>> # boot the Windows image with an ivshmem device behind a PCI bridge
>>>> ./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm \
>>>>    -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
>>>>    -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
>>>>    -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
>>>>
>>>> Wait till the OS boots and note the BARs programmed for the ivshmem device;
>>>>    in my case it was
>>>>      01:01.0 0,0xfe800000+0x100
>>>> Then execute the loop below and watch the pci_update_mappings* trace events:
>>>>
>>>> # hotplug e1000 NICs on the bridge one by one via the monitor
>>>> for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
>>>>
>>>> Hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, during which
>>>> Windows unmaps all the BARs of the NICs on the bridge but doesn't touch
>>>> ivshmem, and then programs new BARs, where
>>>>     pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
>>>> creates a BAR overlapping with ivshmem.
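
(As a side check, the resulting BAR layout can also be queried from QEMU's
side through the monitor socket used above - a small sketch reusing the
/tmp/m path from Igor's command line:

   # list the PCI devices and their programmed BARs via the human monitor
   echo "info pci" | nc -U /tmp/m

)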
>>>
>>>
>>> Thanks!
>>> We need to figure this out, because currently this does not
>>> work properly (or maybe it works, but merely by chance).
>>> Marcel and I will play with this.
>>>
>>
>> I checked and indeed we have two separate problems:
>>
>> 1. ivshmem is declared as a PCI RAM controller and Windows *does* have drivers
>>      for it; however, it is not remapped on rebalancing.
> Does it really have a driver, i.e. an ivshmem-specific one?
> It should have its own driver, otherwise userspace
> won't be able to access/work with it and it would be pointless
> to add such a device to the machine.

No, it does not.

>
>>      You can see in Device Manager 2 working devices with the same MMIO region - strange!
>>      This may be because PCI RAM controllers can't be remapped? Even then, the region should not be overridden.
>>      Maybe we need to give the OS a clue in ACPI regarding this range?
>>
>> 2. PCI devices with no driver installed are not remapped. This can be OK
>>      from the Windows point of view, because the Resources window does not show
>>      the MMIO range for such a device.
>>
>>      If the other (remapped) device works, it is pure luck: both memory regions
>>      occupy the same range and have the same priority.
>>
>> We need to think about how to solve this.
>> One way would be to defer the BAR activation to the guest OS, but I am not sure of the consequences.
> Deferring won't solve the problem, as rebalancing could happen later
> and make the BARs overlap.

Why not? If we do not activate the BAR in firmware and Windows does not have a driver
for the device, it will not activate the BAR at all, right?
Why would Windows activate the device's BAR if it can't use it? At least that is what I hope.
Any other ideas would be appreciated.
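
One way to verify whether Windows really left such a BAR disabled would be the
memory-space bit of the device's command register; e.g. from a Linux guest,
using the ivshmem slot from Igor's reproducer (a sketch, the slot may differ):

   # bit 1 of COMMAND = memory space enable; 0 means the BAR is not decoded
   setpci -s 01:01.0 COMMAND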


> I've noticed that at startup Windows unmaps and then maps BARs
> at the same addresses where the BIOS had put them before.

Including devices without a working driver?


Thanks,
Marcel

>
>> And this does not solve the ivshmem problem.
> So far the only way to avoid overlapping BARs due to Windows
> rebalancing around driver-less devices is to pin such
> BARs statically with _CRS in the ACPI tables, but as Michael said,
> that fragments the PCI address space.
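
(On a Linux guest the resulting carve-up of the address space is easy to
eyeball, for what it's worth:

   # show how PCI windows and reserved ranges ended up being laid out
   sudo cat /proc/iomem

)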
>
>>
>> Thanks,
>> Marcel
