* [Bug]  Intel RMRR support with upstream Qemu
@ 2017-07-21 10:57 Zhang, Xiong Y
  2017-07-21 13:28 ` Alexey G
  0 siblings, 1 reply; 18+ messages in thread
From: Zhang, Xiong Y @ 2017-07-21 10:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Zhang, Xiong Y



On an Intel Skylake machine with upstream QEMU, if I add rdm="strategy=host,policy=strict" to hvm.cfg, a Windows 8.1 DomU couldn't boot up and kept rebooting.

Steps to reproduce this issue:

1) Boot Xen with iommu=1 to enable the IOMMU

2) hvm.cfg contains:

builder="hvm"
memory=xxxx
disk=['win8.1 img']
device_model_override='qemu-system-i386'
device_model_version='qemu-xen'
rdm="strategy=host,policy=strict"

3) xl cr hvm.cfg

Conditions to reproduce this issue:

1) DomU memory size > the top address of the RMRR; otherwise this issue disappears.

2) rdm="strategy=host,policy=strict" must be present.

3) Windows DomU. A Linux DomU doesn't have this issue.

4) Upstream QEMU. Traditional QEMU doesn't have this issue.

In this situation, hvmloader relocates some guest RAM from below the RMRR to high memory, and it seems the Windows guest then accesses an invalid address.
Could someone give me some suggestions on how to debug this?

thanks


* Re: [Bug]  Intel RMRR support with upstream Qemu
  2017-07-21 10:57 [Bug] Intel RMRR support with upstream Qemu Zhang, Xiong Y
@ 2017-07-21 13:28 ` Alexey G
  2017-07-21 13:56   ` Alexey G
  0 siblings, 1 reply; 18+ messages in thread
From: Alexey G @ 2017-07-21 13:28 UTC (permalink / raw)
  To: Zhang, Xiong Y; +Cc: xen-devel

Hi,

On Fri, 21 Jul 2017 10:57:55 +0000
"Zhang, Xiong Y" <xiong.y.zhang@intel.com> wrote:

> On an intel skylake machine with upstream qemu, if I add
> "rdm=strategy=host, policy=strict" to hvm.cfg, win 8.1 DomU couldn't boot
> up and continues reboot.
> 
> Steps to reproduce this issue:
> 
> 1)       Boot xen with iommu=1 to enable iommu
> 2)       hvm.cfg contain:
> 
> builder="hvm"
> 
> memory=xxxx
> 
> disk=['win8.1 img']
> 
> device_model_override='qemu-system-i386'
> 
> device_model_version='qemu-xen'
> 
> rdm="strategy=host,policy=strict"
> 
> 3)       xl cr hvm.cfg
> 
> Conditions to reproduce this issue:
> 
> 1)       DomU memory size > the top address of RMRR. Otherwise, this
> issue will disappear.
> 2)       rdm=" strategy=host,policy=strict" should exist
> 3)       Windows DomU.  Linux DomU doesn't have such issue.
> 4)       Upstream qemu.  Traditional qemu doesn't have such issue.
> 
> In this situation, hvmloader will relocate some guest ram below RMRR to
> high memory, and it seems window guest access an invalid address. Could
> someone give me some suggestions on how to debug this ?

You likely have RMRR range(s) below the 2GB boundary.

You may try the following:

1. Specify some large 'mmio_hole' value in your domain configuration file,
e.g. mmio_hole=2560
2. If that doesn't help, the 'xl dmesg' output might come in useful

Right now upstream QEMU still doesn't support relocating parts of guest RAM
above the 4GB boundary when they are overlapped by MMIO ranges.
AFAIR, forcing allow_memory_relocate to 1 for hvmloader didn't bring anything
good for the HVM guest.

Setting the mmio_hole size manually lets you create a "predefined"
memory/MMIO hole layout for both QEMU (via 'max-ram-below-4g') and
hvmloader (via a XenStore param), effectively avoiding MMIO/RMRR overlaps
and RAM relocation in hvmloader, so this might help.
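
For illustration, using the example value above, the domain config would gain a single line (the number is only an example; it just needs to make the hole large enough to cover the RMRR ranges and the PCI BARs):

mmio_hole=2560

A 2560MB lower hole means guest RAM below 4GB ends at 4096MB - 2560MB = 1536MB (0x60000000), and that same boundary is what ends up being handed to QEMU ('max-ram-below-4g') and to hvmloader (XenStore), so all components agree on the layout.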


* Re: [Bug]  Intel RMRR support with upstream Qemu
  2017-07-21 13:28 ` Alexey G
@ 2017-07-21 13:56   ` Alexey G
  2017-07-24  8:07     ` Zhang, Xiong Y
  0 siblings, 1 reply; 18+ messages in thread
From: Alexey G @ 2017-07-21 13:56 UTC (permalink / raw)
  To: Zhang, Xiong Y; +Cc: xen-devel

> On Fri, 21 Jul 2017 10:57:55 +0000
> "Zhang, Xiong Y" <xiong.y.zhang@intel.com> wrote:
> 
> > On an intel skylake machine with upstream qemu, if I add
> > "rdm=strategy=host, policy=strict" to hvm.cfg, win 8.1 DomU couldn't
> > boot up and continues reboot.
> > 
> > Steps to reproduce this issue:
> > 
> > 1)       Boot xen with iommu=1 to enable iommu
> > 2)       hvm.cfg contain:
> > 
> > builder="hvm"
> > 
> > memory=xxxx
> > 
> > disk=['win8.1 img']
> > 
> > device_model_override='qemu-system-i386'
> > 
> > device_model_version='qemu-xen'
> > 
> > rdm="strategy=host,policy=strict"
> > 
> > 3)       xl cr hvm.cfg
> > 
> > Conditions to reproduce this issue:
> > 
> > 1)       DomU memory size > the top address of RMRR. Otherwise, this
> > issue will disappear.
> > 2)       rdm=" strategy=host,policy=strict" should exist
> > 3)       Windows DomU.  Linux DomU doesn't have such issue.
> > 4)       Upstream qemu.  Traditional qemu doesn't have such issue.
> > 
> > In this situation, hvmloader will relocate some guest ram below RMRR to
> > high memory, and it seems window guest access an invalid address. Could
> > someone give me some suggestions on how to debug this ?  
> 
> You're likely have RMRR range(s) below 2GB boundary.
> 
> You may try the following:
> 
> 1. Specify some large 'mmio_hole' value in your domain configuration file,
> ex. mmio_hole=2560
> 2. If it won't help, 'xl dmesg' output might come useful
> 
> Right now upstream QEMU still doesn't support relocation of parts
> of guest RAM to >4GB boundary if they were overlapped by MMIO ranges.
> AFAIR forcing allow_memory_relocate to 1 for hvmloader didn't bring
> anything good for HVM guest.
> 
> Setting the mmio_hole size manually allows to create a "predefined"
> memory/MMIO hole layout for both QEMU (via 'max-ram-below-4g') and
> hvmloader (via a XenStore param), effectively avoiding MMIO/RMRR overlaps
> or RAM relocation in hvmloader, so this might help.

Wrote too soon: "policy=strict" means that you won't be able to create the
DomU if the RMRR were below 2GB... so it actually should be above 2GB. Anyway,
try setting the mmio_hole size.


* Re: [Bug]  Intel RMRR support with upstream Qemu
  2017-07-21 13:56   ` Alexey G
@ 2017-07-24  8:07     ` Zhang, Xiong Y
  2017-07-24  9:53       ` Igor Druzhinin
  2017-07-24 16:24       ` Alexey G
  0 siblings, 2 replies; 18+ messages in thread
From: Zhang, Xiong Y @ 2017-07-24  8:07 UTC (permalink / raw)
  To: Alexey G; +Cc: Zhang, Xiong Y, xen-devel

> > On Fri, 21 Jul 2017 10:57:55 +0000
> > "Zhang, Xiong Y" <xiong.y.zhang@intel.com> wrote:
> >
> > > On an intel skylake machine with upstream qemu, if I add
> > > "rdm=strategy=host, policy=strict" to hvm.cfg, win 8.1 DomU couldn't
> > > boot up and continues reboot.
> > >
> > > Steps to reproduce this issue:
> > >
> > > 1)       Boot xen with iommu=1 to enable iommu
> > > 2)       hvm.cfg contain:
> > >
> > > builder="hvm"
> > >
> > > memory=xxxx
> > >
> > > disk=['win8.1 img']
> > >
> > > device_model_override='qemu-system-i386'
> > >
> > > device_model_version='qemu-xen'
> > >
> > > rdm="strategy=host,policy=strict"
> > >
> > > 3)       xl cr hvm.cfg
> > >
> > > Conditions to reproduce this issue:
> > >
> > > 1)       DomU memory size > the top address of RMRR. Otherwise, this
> > > issue will disappear.
> > > 2)       rdm=" strategy=host,policy=strict" should exist
> > > 3)       Windows DomU.  Linux DomU doesn't have such issue.
> > > 4)       Upstream qemu.  Traditional qemu doesn't have such issue.
> > >
> > > In this situation, hvmloader will relocate some guest ram below RMRR to
> > > high memory, and it seems window guest access an invalid address. Could
> > > someone give me some suggestions on how to debug this ?
> >
> > You're likely have RMRR range(s) below 2GB boundary.
> >
> > You may try the following:
> >
> > 1. Specify some large 'mmio_hole' value in your domain configuration file,
> > ex. mmio_hole=2560
> > 2. If it won't help, 'xl dmesg' output might come useful
> >
> > Right now upstream QEMU still doesn't support relocation of parts
> > of guest RAM to >4GB boundary if they were overlapped by MMIO ranges.
> > AFAIR forcing allow_memory_relocate to 1 for hvmloader didn't bring
> > anything good for HVM guest.
> >
> > Setting the mmio_hole size manually allows to create a "predefined"
> > memory/MMIO hole layout for both QEMU (via 'max-ram-below-4g') and
> > hvmloader (via a XenStore param), effectively avoiding MMIO/RMRR
> overlaps
> > or RAM relocation in hvmloader, so this might help.
> 
> Wrote too soon, "policy=strict" means that you won't be able to create a
> DomU if RMRR was below 2G... so it's actually should be above 2GB. Anyway,
> try setting mmio_hole size.
[Zhang, Xiong Y] Thanks for your suggestion.
Indeed, if I set mmio_hole >= 4G - RMRR_Base, that fixes my issue.
I still have two questions about this, could you help me?
1) If hvmloader does low-memory relocation, hvmloader and QEMU will see different guest memory layouts, so QEMU RAM may overlap with MMIO. Does Xen have a plan to fix this?

2) Just now I did an experiment: in hvmloader, I set HVM_BELOW_4G_RAM_END to 3G and reserved one area, 0xF0000000 ~ 0xFC000000, for QEMU RAM allocation; in QEMU, I modified xen_ram_alloc() to make sure it only allocates gfns in 0xF0000000 ~ 0xFC000000. In this case QEMU RAM won't overlap with MMIO, but this workaround couldn't fix my issue.
It seems QEMU still has another interface besides xen_ram_alloc() to allocate gfns; do you know this interface?

thanks


* Re: [Bug] Intel RMRR support with upstream Qemu
  2017-07-24  8:07     ` Zhang, Xiong Y
@ 2017-07-24  9:53       ` Igor Druzhinin
  2017-07-24 10:49         ` Zhang, Xiong Y
  2017-07-24 16:42         ` Alexey G
  2017-07-24 16:24       ` Alexey G
  1 sibling, 2 replies; 18+ messages in thread
From: Igor Druzhinin @ 2017-07-24  9:53 UTC (permalink / raw)
  To: Zhang, Xiong Y, Alexey G; +Cc: xen-devel

On 24/07/17 09:07, Zhang, Xiong Y wrote:
>>> On Fri, 21 Jul 2017 10:57:55 +0000
>>> "Zhang, Xiong Y" <xiong.y.zhang@intel.com> wrote:
>>>
>>>> On an intel skylake machine with upstream qemu, if I add
>>>> "rdm=strategy=host, policy=strict" to hvm.cfg, win 8.1 DomU couldn't
>>>> boot up and continues reboot.
>>>>
>>>> Steps to reproduce this issue:
>>>>
>>>> 1)       Boot xen with iommu=1 to enable iommu
>>>> 2)       hvm.cfg contain:
>>>>
>>>> builder="hvm"
>>>>
>>>> memory=xxxx
>>>>
>>>> disk=['win8.1 img']
>>>>
>>>> device_model_override='qemu-system-i386'
>>>>
>>>> device_model_version='qemu-xen'
>>>>
>>>> rdm="strategy=host,policy=strict"
>>>>
>>>> 3)       xl cr hvm.cfg
>>>>
>>>> Conditions to reproduce this issue:
>>>>
>>>> 1)       DomU memory size > the top address of RMRR. Otherwise, this
>>>> issue will disappear.
>>>> 2)       rdm=" strategy=host,policy=strict" should exist
>>>> 3)       Windows DomU.  Linux DomU doesn't have such issue.
>>>> 4)       Upstream qemu.  Traditional qemu doesn't have such issue.
>>>>
>>>> In this situation, hvmloader will relocate some guest ram below RMRR to
>>>> high memory, and it seems window guest access an invalid address. Could
>>>> someone give me some suggestions on how to debug this ?
>>>
>>> You're likely have RMRR range(s) below 2GB boundary.
>>>
>>> You may try the following:
>>>
>>> 1. Specify some large 'mmio_hole' value in your domain configuration file,
>>> ex. mmio_hole=2560
>>> 2. If it won't help, 'xl dmesg' output might come useful
>>>
>>> Right now upstream QEMU still doesn't support relocation of parts
>>> of guest RAM to >4GB boundary if they were overlapped by MMIO ranges.
>>> AFAIR forcing allow_memory_relocate to 1 for hvmloader didn't bring
>>> anything good for HVM guest.
>>>
>>> Setting the mmio_hole size manually allows to create a "predefined"
>>> memory/MMIO hole layout for both QEMU (via 'max-ram-below-4g') and
>>> hvmloader (via a XenStore param), effectively avoiding MMIO/RMRR
>> overlaps
>>> or RAM relocation in hvmloader, so this might help.
>>
>> Wrote too soon, "policy=strict" means that you won't be able to create a
>> DomU if RMRR was below 2G... so it's actually should be above 2GB. Anyway,
>> try setting mmio_hole size.
> [Zhang, Xiong Y] Thanks for your suggestion.
> Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
> For this I still have two questions, could you help me ?
> 1) If hvmloader do low memory relocation, hvmloader and qemu will see a different guest memory layout . So qemu ram maybe overlop with mmio, does xen have plan to fix this ?
> 

hvmloader doesn't do memory relocation - this ability is turned off by
default. The reason for the issue is that libxl initially sets the size
of the lower MMIO hole (based on the RMRR regions present and their size)
but doesn't communicate it to QEMU using the 'max-ram-below-4g' argument.

When you set the 'mmio_hole' size parameter, you basically force libxl to
pass this argument to QEMU.

That means the proper fix would be to make libxl pass this argument
to QEMU whenever RMRR regions are present.
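
A very rough sketch of that idea (purely illustrative; the helper name and the way the value would be emitted are assumptions, not actual libxl code):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Turn the lower-hole size libxl already computed (from the RMRR list
     * or from an explicit mmio_hole=) into the value QEMU expects, e.g.
     * appended to the device model command line as
     *   -machine <type>,max-ram-below-4g=<value>
     * format_machine_prop() is a made-up name for this example. */
    static void format_machine_prop(uint64_t mmio_hole_bytes,
                                    char *buf, size_t len)
    {
        uint64_t max_ram_below_4g = (1ULL << 32) - mmio_hole_bytes;

        snprintf(buf, len, "max-ram-below-4g=%#" PRIx64, max_ram_below_4g);
    }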

Igor

> 2) Just now, I did an experiment: In hvmloader, I set HVM_BELOW_4G_RAM_END to 3G and reserve one area for qemu_ram_allocate like 0xF0000000 ~ 0xFC000000; In Qemu, I modified xen_ram_alloc() to make sure it only allocate gfn in 0xF0000000 ~ 0xFC000000. In this case qemu_ram won't overlap with mmio, but this workaround couldn't fix my issue.
>  It seems qemu still has another interface to allocate gfn except xen_ram_alloc(), do you know this interface ?
> 
> thanks
> 

* Re: [Bug] Intel RMRR support with upstream Qemu
  2017-07-24  9:53       ` Igor Druzhinin
@ 2017-07-24 10:49         ` Zhang, Xiong Y
  2017-07-24 16:42         ` Alexey G
  1 sibling, 0 replies; 18+ messages in thread
From: Zhang, Xiong Y @ 2017-07-24 10:49 UTC (permalink / raw)
  To: Igor Druzhinin, Alexey G; +Cc: Zhang, Xiong Y, xen-devel

> On 24/07/17 09:07, Zhang, Xiong Y wrote:
> >>> On Fri, 21 Jul 2017 10:57:55 +0000
> >>> "Zhang, Xiong Y" <xiong.y.zhang@intel.com> wrote:
> >>>
> >>>> On an intel skylake machine with upstream qemu, if I add
> >>>> "rdm=strategy=host, policy=strict" to hvm.cfg, win 8.1 DomU couldn't
> >>>> boot up and continues reboot.
> >>>>
> >>>> Steps to reproduce this issue:
> >>>>
> >>>> 1)       Boot xen with iommu=1 to enable iommu
> >>>> 2)       hvm.cfg contain:
> >>>>
> >>>> builder="hvm"
> >>>>
> >>>> memory=xxxx
> >>>>
> >>>> disk=['win8.1 img']
> >>>>
> >>>> device_model_override='qemu-system-i386'
> >>>>
> >>>> device_model_version='qemu-xen'
> >>>>
> >>>> rdm="strategy=host,policy=strict"
> >>>>
> >>>> 3)       xl cr hvm.cfg
> >>>>
> >>>> Conditions to reproduce this issue:
> >>>>
> >>>> 1)       DomU memory size > the top address of RMRR. Otherwise, this
> >>>> issue will disappear.
> >>>> 2)       rdm=" strategy=host,policy=strict" should exist
> >>>> 3)       Windows DomU.  Linux DomU doesn't have such issue.
> >>>> 4)       Upstream qemu.  Traditional qemu doesn't have such issue.
> >>>>
> >>>> In this situation, hvmloader will relocate some guest ram below RMRR to
> >>>> high memory, and it seems window guest access an invalid address.
> Could
> >>>> someone give me some suggestions on how to debug this ?
> >>>
> >>> You're likely have RMRR range(s) below 2GB boundary.
> >>>
> >>> You may try the following:
> >>>
> >>> 1. Specify some large 'mmio_hole' value in your domain configuration file,
> >>> ex. mmio_hole=2560
> >>> 2. If it won't help, 'xl dmesg' output might come useful
> >>>
> >>> Right now upstream QEMU still doesn't support relocation of parts
> >>> of guest RAM to >4GB boundary if they were overlapped by MMIO ranges.
> >>> AFAIR forcing allow_memory_relocate to 1 for hvmloader didn't bring
> >>> anything good for HVM guest.
> >>>
> >>> Setting the mmio_hole size manually allows to create a "predefined"
> >>> memory/MMIO hole layout for both QEMU (via 'max-ram-below-4g') and
> >>> hvmloader (via a XenStore param), effectively avoiding MMIO/RMRR
> >> overlaps
> >>> or RAM relocation in hvmloader, so this might help.
> >>
> >> Wrote too soon, "policy=strict" means that you won't be able to create a
> >> DomU if RMRR was below 2G... so it's actually should be above 2GB.
> Anyway,
> >> try setting mmio_hole size.
> > [Zhang, Xiong Y] Thanks for your suggestion.
> > Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
> > For this I still have two questions, could you help me ?
> > 1) If hvmloader do low memory relocation, hvmloader and qemu will see a
> different guest memory layout . So qemu ram maybe overlop with mmio, does
> xen have plan to fix this ?
> >
> 
> hvmloader doesn't do memory relocation - this ability is turned off by
> default. The reason for the issue is that libxl initially sets the size
> of lower MMIO hole (based on the RMRR regions present and their size)
> and doesn't communicate it to QEMU using 'max-ram-below-4g' argument.
> 
> When you set 'mmio_hole' size parameter you basically forces libxl to
> pass this argument to QEMU.
> 
> That means the proper fix would be to make libxl to pass this argument
> to QEMU in case there are RMRR regions present.
[Zhang, Xiong Y] Thanks for your clarification, I will try this solution.

What I meant by memory relocation is: both QEMU and hvmloader think the default pci_mem_start is 0xF0000000, but hvmloader will decrease pci_mem_start to 3G or 2G when mmio_total is big, and hvmloader's modification of pci_mem_start is not communicated to QEMU. I have met two more issues caused by this in my IGD passthrough environment:
(1) If guest RAM is 2G, hvmloader's pci_mem_start is 2G; QEMU will allocate gfns in xen_ram_alloc() above 2G, so QEMU's RAM overlaps with MMIO.
(2) If guest RAM >= 4G, hvmloader's pci_mem_start < 0xF0000000; QEMU will declare all gfns below 4G as guest RAM, so when hvmloader sets one device's BAR base address < 0xF0000000, the memory listener callback in QEMU for this BAR can't be triggered, as the BAR's range has already been covered by QEMU's guest RAM.

Although the above two issues can be worked around by setting a big enough mmio_hole parameter, it would be better for Xen to have a proper fix.

For reference, a comment from tools/firmware/hvmloader/pci.c:
        /*
         * At the moment qemu-xen can't deal with relocated memory regions.
         * It's too close to the release to make a proper fix; for now,
         * only allow the MMIO hole to grow large enough to move guest memory
         * if we're running qemu-traditional.  Items that don't fit will be
         * relocated into the 64-bit address space.

thanks
> 
> Igor
> 
> > 2) Just now, I did an experiment: In hvmloader, I set
> HVM_BELOW_4G_RAM_END to 3G and reserve one area for
> qemu_ram_allocate like 0xF0000000 ~ 0xFC000000; In Qemu, I modified
> xen_ram_alloc() to make sure it only allocate gfn in 0xF0000000 ~ 0xFC000000.
> In this case qemu_ram won't overlap with mmio, but this workaround couldn't
> fix my issue.
> >  It seems qemu still has another interface to allocate gfn except
> xen_ram_alloc(), do you know this interface ?
> >
> > thanks
> >

* Re: [Bug]  Intel RMRR support with upstream Qemu
  2017-07-24  8:07     ` Zhang, Xiong Y
  2017-07-24  9:53       ` Igor Druzhinin
@ 2017-07-24 16:24       ` Alexey G
  2017-07-25  2:52         ` Zhang, Xiong Y
  1 sibling, 1 reply; 18+ messages in thread
From: Alexey G @ 2017-07-24 16:24 UTC (permalink / raw)
  To: Zhang, Xiong Y; +Cc: xen-devel

Hi,

On Mon, 24 Jul 2017 08:07:02 +0000
"Zhang, Xiong Y" <xiong.y.zhang@intel.com> wrote:

> [Zhang, Xiong Y] Thanks for your suggestion.
> Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
> For this I still have two questions, could you help me ?
> 1) If hvmloader do low memory relocation, hvmloader and qemu will see a
> different guest memory layout . So qemu ram maybe overlop with mmio, does
> xen have plan to fix this ?
> 
> 2) Just now, I did an experiment: In hvmloader, I set
> HVM_BELOW_4G_RAM_END to 3G and reserve one area for qemu_ram_allocate
> like 0xF0000000 ~ 0xFC000000; In Qemu, I modified xen_ram_alloc() to make
> sure it only allocate gfn in 0xF0000000 ~ 0xFC000000. In this case
> qemu_ram won't overlap with mmio, but this workaround couldn't fix my
> issue. It seems qemu still has another interface to allocate gfn except
> xen_ram_alloc(), do you know this interface ?

Please share your 'xl dmesg' output, so we can have a look at your guest's
MMIO map and which RMRRs and PCI MBARs are present there.

If an RMRR range happens to overlap some of the guest's RAM below pci_start
(dictated by the lack of relocation support and the low_mem_pgend value), I
think your problem might be solved by sacrificing the part of guest RAM that
is overlapped by the RMRR -- by changing the E820 map in hvmloader.
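
A minimal sketch of that idea (self-contained for illustration only; hvmloader has its own e820 structures and constants, the ones below are stand-ins, and a complete version would also re-add any RAM left above the reserved range and check that the table has room for the extra entry):

    #include <stdint.h>

    #define E820_RAM      1
    #define E820_RESERVED 2

    struct e820entry { uint64_t addr, size; uint32_t type; };

    /* Carve a reserved hole (e.g. an RMRR range) out of the RAM entry that
     * fully contains it, i.e. "sacrifice" the overlapped guest RAM instead
     * of relocating it. */
    static void e820_reserve_range(struct e820entry *e, unsigned int *nr,
                                   uint64_t start, uint64_t end)
    {
        for ( unsigned int i = 0; i < *nr; i++ )
        {
            if ( e[i].type != E820_RAM ||
                 start < e[i].addr || end > e[i].addr + e[i].size )
                continue;
            e[i].size = start - e[i].addr;   /* truncate RAM at the hole   */
            e[*nr].addr = start;             /* append the reserved range  */
            e[*nr].size = end - start;
            e[*nr].type = E820_RESERVED;
            (*nr)++;
            break;
        }
    }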




* Re: [Bug] Intel RMRR support with upstream Qemu
  2017-07-24  9:53       ` Igor Druzhinin
  2017-07-24 10:49         ` Zhang, Xiong Y
@ 2017-07-24 16:42         ` Alexey G
  2017-07-24 17:01           ` Andrew Cooper
  2017-07-24 20:39           ` Igor Druzhinin
  1 sibling, 2 replies; 18+ messages in thread
From: Alexey G @ 2017-07-24 16:42 UTC (permalink / raw)
  To: Igor Druzhinin; +Cc: Zhang, Xiong Y, xen-devel

Hi,

On Mon, 24 Jul 2017 10:53:16 +0100
Igor Druzhinin <igor.druzhinin@citrix.com> wrote:
> > [Zhang, Xiong Y] Thanks for your suggestion.
> > Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
> > For this I still have two questions, could you help me ?
> > 1) If hvmloader do low memory relocation, hvmloader and qemu will see a
> > different guest memory layout . So qemu ram maybe overlop with mmio,
> > does xen have plan to fix this ? 
> 
> hvmloader doesn't do memory relocation - this ability is turned off by
> default. The reason for the issue is that libxl initially sets the size
> of lower MMIO hole (based on the RMRR regions present and their size)
> and doesn't communicate it to QEMU using 'max-ram-below-4g' argument.
> 
> When you set 'mmio_hole' size parameter you basically forces libxl to
> pass this argument to QEMU.
> 
> That means the proper fix would be to make libxl to pass this argument
> to QEMU in case there are RMRR regions present.

I tend to disagree a bit.
What we actually lack is a way to perform a 'dynamic' physmem relocation
while the guest domain is already running. Right now it works only in a
'static' way - i.e. the memory layout must be known to both QEMU and
hvmloader before the guest domain starts, with no means of arbitrarily
changing this layout at runtime when hvmloader runs.

But the problem is that the overall MMIO hole(s) requirements are not known
exactly at the time the HVM domain is being created. Some PCI devices will be
emulated, some will be merely passed through, and on top of that there will
be some RMRR ranges. libxl can't know all this stuff - some of it comes from
the host, some comes from the DM. So the actual MMIO requirements are known
to hvmloader only at PCI bus enumeration time.

libxl can be taught to retrieve all the missing info from QEMU, but this
approach requires doing all the grunt work of PCI BAR allocation in libxl
itself - in order to calculate the real MMIO hole(s) size, one needs to
take into account all the PCI BAR sizes, their diverse alignment
requirements, and the existing gaps due to RMRR ranges... basically, libxl
would need to do most of hvmloader/pci.c's job.

My 2kop opinion here is that we don't need to move all PCI BAR allocation to
libxl, invent new QMP interfaces, or introduce new hypercalls or anything
else. A simple and reasonably good solution would be to implement this
missing hvmloader <-> QEMU interface in the same manner as it is done on
real hardware.

When we move some part of guest memory in the 4GB range to the address space
above 4GB via XENMEM_add_to_physmap, we basically perform what the chipset's
'remap' (aka reclaim) feature does. So we can implement this interface
between hvmloader and QEMU by providing custom emulation of the MCH's
remap/TOLUD/TOUUD registers in QEMU if xen_enabled().

In this way hvmloader will calculate MMIO hole sizes as usual, relocate some
guest RAM above the 4GB boundary and communicate this information to QEMU
via the emulated host bridge registers -- so QEMU can then sync its memory
layout with the actual physmap.
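
To make the 'remap' analogy concrete, a simplified sketch (the relocation loop is condensed from what hvmloader already does in tools/firmware/hvmloader/pci.c -- the real code batches the moves -- while the final step is the proposed part and the register it mentions does not exist today):

    /* Move pages from the top of low RAM to the top of high RAM, one page
     * at a time, then tell the device model where low RAM now ends. */
    while ( hvm_info->low_mem_pgend > (pci_mem_start >> PAGE_SHIFT) )
    {
        struct xen_add_to_physmap xatp = {
            .domid = DOMID_SELF,
            .space = XENMAPSPACE_gmfn,
            .idx   = --hvm_info->low_mem_pgend,  /* gfn being vacated       */
            .gpfn  = hvm_info->high_mem_pgend++, /* its new home above 4GB  */
        };

        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
            BUG();
    }

    /* Proposed extra step: write the new low-RAM top into a TOLUD-like
     * register of the emulated host bridge, so QEMU can resync its
     * RAM/MMIO layout (no such register is emulated today). */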


* Re: [Bug] Intel RMRR support with upstream Qemu
  2017-07-24 16:42         ` Alexey G
@ 2017-07-24 17:01           ` Andrew Cooper
  2017-07-24 18:34             ` Alexey G
  2017-07-24 20:39           ` Igor Druzhinin
  1 sibling, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2017-07-24 17:01 UTC (permalink / raw)
  To: Alexey G, Igor Druzhinin; +Cc: Zhang, Xiong Y, xen-devel

On 24/07/17 17:42, Alexey G wrote:
> Hi,
>
> On Mon, 24 Jul 2017 10:53:16 +0100
> Igor Druzhinin <igor.druzhinin@citrix.com> wrote:
>>> [Zhang, Xiong Y] Thanks for your suggestion.
>>> Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
>>> For this I still have two questions, could you help me ?
>>> 1) If hvmloader do low memory relocation, hvmloader and qemu will see a
>>> different guest memory layout . So qemu ram maybe overlop with mmio,
>>> does xen have plan to fix this ? 
>> hvmloader doesn't do memory relocation - this ability is turned off by
>> default. The reason for the issue is that libxl initially sets the size
>> of lower MMIO hole (based on the RMRR regions present and their size)
>> and doesn't communicate it to QEMU using 'max-ram-below-4g' argument.
>>
>> When you set 'mmio_hole' size parameter you basically forces libxl to
>> pass this argument to QEMU.
>>
>> That means the proper fix would be to make libxl to pass this argument
>> to QEMU in case there are RMRR regions present.
> I tend to disagree a bit. 
> What we lack actually is some way to perform a 'dynamical' physmem
> relocation, when a guest domain is running already. Right now it works only
> in the 'static' way - i.e. if memory layout was known for both QEMU and
> hvmloader before starting a guest domain and with no means of arbitrarily
> changing this layout at runtime when hvmloader runs.
>
> But, the problem is that overall MMIO hole(s) requirements are not known
> exactly at the time the HVM domain being created. Some PCI devices will be
> emulated, some will be merely passed through and yet there will be some
> RMRR ranges. libxl can't know all this stuff - some comes from the host,
> some comes from DM. So actual MMIO requirements are known to hvmloader at
> the PCI bus enumeration time.
>
> libxl can be taught to retrieve all missing info from QEMU, but this way
> will require to perform all grunt work of PCI BARs allocation in libxl
> itself - in order to calculate the real MMIO hole(s) size, one needs to
> take into account all PCI BARs sizes and their alignment requirements
> diversity + existing gaps due to RMRR ranges... basically, libxl will
> need to do most of hvmloader/pci.c's job.
>
> My 2kop opinion here is that we don't need to move all PCI BAR allocation to
> libxl, or invent some new QMP-interfaces, or introduce new hypercalls or
> else. A simple and somewhat good solution would be to implement this missing
> hvmloader <-> QEMU interface in the same manner how it is done in real
> hardware.
>
> When we move some part of guest memory in 4GB range to address space above
> 4GB via XENMEM_add_to_physmap, we basically perform what chipset's
> 'remap' (aka reclaim) does. So we can implement this interface between
> hvmloader and QEMU via providing custom emulation for MCH's
> remap/TOLUD/TOUUD stuff in QEMU if xen_enabled().
>
> In this way hvmloader will calculate MMIO hole sizes as usual, relocate
> some guest RAM above 4GB base and communicate this information to QEMU via
> emulated host bridge registers -- so then QEMU will sync its memory layout
> info to actual physmap's.

Qemu isn't the only entity which needs to know.  There is currently an
attack surface via Xen by virtue of the fact that any hole in the p2m
gets emulated and forwarded to qemu.  (Two problems caused by this are a
qemu segfault and qemu infinite loop.)

The solution is to have Xen know which gfn ranges are supposed to be
MMIO, and terminate the access early if the guest frame falls outside of
the MMIO range.

Doing this by working it out statically at domain creation time is far
more simple for all components involved.
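
As a purely conceptual sketch (none of this is actual Xen code; the types and names are made up for the example), the idea amounts to a per-domain table of legitimate MMIO gfn ranges consulted before anything is forwarded to the device model:

    #include <stdbool.h>
    #include <stdint.h>

    struct mmio_range { uint64_t start_gfn, end_gfn; };

    /* If an access to a p2m hole misses every registered range, the
     * hypervisor would terminate it (ignore, or fault the guest) instead
     * of sending an ioreq to QEMU. */
    static bool gfn_is_registered_mmio(const struct mmio_range *r,
                                       unsigned int nr, uint64_t gfn)
    {
        for ( unsigned int i = 0; i < nr; i++ )
            if ( gfn >= r[i].start_gfn && gfn <= r[i].end_gfn )
                return true;

        return false;
    }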

~Andrew


* Re: [Bug] Intel RMRR support with upstream Qemu
  2017-07-24 17:01           ` Andrew Cooper
@ 2017-07-24 18:34             ` Alexey G
  0 siblings, 0 replies; 18+ messages in thread
From: Alexey G @ 2017-07-24 18:34 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Zhang, Xiong Y, Igor Druzhinin, xen-devel

On Mon, 24 Jul 2017 18:01:39 +0100
Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 24/07/17 17:42, Alexey G wrote:
> > Hi,
> >
> > On Mon, 24 Jul 2017 10:53:16 +0100
> > Igor Druzhinin <igor.druzhinin@citrix.com> wrote:  
> >>> [Zhang, Xiong Y] Thanks for your suggestion.
> >>> Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
> >>> For this I still have two questions, could you help me ?
> >>> 1) If hvmloader do low memory relocation, hvmloader and qemu will see
> >>> a different guest memory layout . So qemu ram maybe overlop with mmio,
> >>> does xen have plan to fix this ?   
> >> hvmloader doesn't do memory relocation - this ability is turned off by
> >> default. The reason for the issue is that libxl initially sets the size
> >> of lower MMIO hole (based on the RMRR regions present and their size)
> >> and doesn't communicate it to QEMU using 'max-ram-below-4g' argument.
> >>
> >> When you set 'mmio_hole' size parameter you basically forces libxl to
> >> pass this argument to QEMU.
> >>
> >> That means the proper fix would be to make libxl to pass this argument
> >> to QEMU in case there are RMRR regions present.  
> > I tend to disagree a bit. 
> > What we lack actually is some way to perform a 'dynamical' physmem
> > relocation, when a guest domain is running already. Right now it works
> > only in the 'static' way - i.e. if memory layout was known for both
> > QEMU and hvmloader before starting a guest domain and with no means of
> > arbitrarily changing this layout at runtime when hvmloader runs.
> >
> > But, the problem is that overall MMIO hole(s) requirements are not known
> > exactly at the time the HVM domain being created. Some PCI devices will
> > be emulated, some will be merely passed through and yet there will be
> > some RMRR ranges. libxl can't know all this stuff - some comes from the
> > host, some comes from DM. So actual MMIO requirements are known to
> > hvmloader at the PCI bus enumeration time.
> >
> > libxl can be taught to retrieve all missing info from QEMU, but this way
> > will require to perform all grunt work of PCI BARs allocation in libxl
> > itself - in order to calculate the real MMIO hole(s) size, one needs to
> > take into account all PCI BARs sizes and their alignment requirements
> > diversity + existing gaps due to RMRR ranges... basically, libxl will
> > need to do most of hvmloader/pci.c's job.
> >
> > My 2kop opinion here is that we don't need to move all PCI BAR
> > allocation to libxl, or invent some new QMP-interfaces, or introduce
> > new hypercalls or else. A simple and somewhat good solution would be to
> > implement this missing hvmloader <-> QEMU interface in the same manner
> > how it is done in real hardware.
> >
> > When we move some part of guest memory in 4GB range to address space
> > above 4GB via XENMEM_add_to_physmap, we basically perform what chipset's
> > 'remap' (aka reclaim) does. So we can implement this interface between
> > hvmloader and QEMU via providing custom emulation for MCH's
> > remap/TOLUD/TOUUD stuff in QEMU if xen_enabled().
> >
> > In this way hvmloader will calculate MMIO hole sizes as usual, relocate
> > some guest RAM above 4GB base and communicate this information to QEMU
> > via emulated host bridge registers -- so then QEMU will sync its memory
> > layout info to actual physmap's.  
> 
> Qemu isn't the only entity which needs to know.  There is currently an
> attack surface via Xen by virtue of the fact that any hole in the p2m
> gets emulated and forwarded to qemu.  (Two problems caused by this are a
> qemu segfault and qemu infinite loop.)
> 
> The solution is to have Xen know which gfn ranges are supposed to be
> MMIO, and terminate the access early if the guest frame falls outside of
> the MMIO range.
> 
> Doing this by working it out statically at domain creation time is far
> more simple for all components involved.

Well, I'm not acquainted with these issues, but this looks like a somewhat
different problem, I think.

We can possibly provide fine-grained access to the MMIO hole space by
tracking all passthrough MMIO ranges and IOREQ ranges and restricting
accesses/forwarding for all currently 'unassigned' parts of the MMIO holes;
this will yield the least possible attack surface.
This approach builds on the existing ioreq/p2m_mmio_direct ranges concept,
while a one-time restriction of access to parts of the MMIO hole at domain
creation time requires introducing a new view of the guest's MMIO hole(s)
into Xen.

And not all MMIO-related information will be available statically. There are
two things to consider: hotplugging PT PCI devices (some of them may have
large Mem BARs) and the guest's attempts to move BARs to some arbitrary
base, e.g. in a (large) high MMIO hole. If the guest sees a large MMIO hole
described in the DSDT, it should be able to use any part of it to relocate
PCI BARs. On the other hand, if we limit the MMIO hole size in the DSDT to
some (barely sufficient) minimum, we will have problems when hotplugging PT
devices -- the guest OS might see no space to assign their resources. So
some MMIO freedom and space are assumed.


* Re: [Bug] Intel RMRR support with upstream Qemu
  2017-07-24 16:42         ` Alexey G
  2017-07-24 17:01           ` Andrew Cooper
@ 2017-07-24 20:39           ` Igor Druzhinin
  2017-07-25  7:03             ` Zhang, Xiong Y
  2017-07-25 16:40             ` Alexey G
  1 sibling, 2 replies; 18+ messages in thread
From: Igor Druzhinin @ 2017-07-24 20:39 UTC (permalink / raw)
  To: Alexey G; +Cc: Zhang, Xiong Y, xen-devel

On 24/07/17 17:42, Alexey G wrote:
> Hi,
> 
> On Mon, 24 Jul 2017 10:53:16 +0100
> Igor Druzhinin <igor.druzhinin@citrix.com> wrote:
>>> [Zhang, Xiong Y] Thanks for your suggestion.
>>> Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
>>> For this I still have two questions, could you help me ?
>>> 1) If hvmloader do low memory relocation, hvmloader and qemu will see a
>>> different guest memory layout . So qemu ram maybe overlop with mmio,
>>> does xen have plan to fix this ? 
>>
>> hvmloader doesn't do memory relocation - this ability is turned off by
>> default. The reason for the issue is that libxl initially sets the size
>> of lower MMIO hole (based on the RMRR regions present and their size)
>> and doesn't communicate it to QEMU using 'max-ram-below-4g' argument.
>>
>> When you set 'mmio_hole' size parameter you basically forces libxl to
>> pass this argument to QEMU.
>>
>> That means the proper fix would be to make libxl to pass this argument
>> to QEMU in case there are RMRR regions present.
> 
> I tend to disagree a bit. 
> What we lack actually is some way to perform a 'dynamical' physmem
> relocation, when a guest domain is running already. Right now it works only
> in the 'static' way - i.e. if memory layout was known for both QEMU and
> hvmloader before starting a guest domain and with no means of arbitrarily
> changing this layout at runtime when hvmloader runs.
> 
> But, the problem is that overall MMIO hole(s) requirements are not known
> exactly at the time the HVM domain being created. Some PCI devices will be
> emulated, some will be merely passed through and yet there will be some
> RMRR ranges. libxl can't know all this stuff - some comes from the host,
> some comes from DM. So actual MMIO requirements are known to hvmloader at
> the PCI bus enumeration time.
> 

IMO hvmloader shouldn't really be allowed to relocate memory under any
conditions. As Andrew said, it's much easier to provision the hole
statically in libxl during the domain construction process, and it doesn't
really compromise any functionality. Having one more entity responsible
for guest memory layout only makes things more convoluted.

> libxl can be taught to retrieve all missing info from QEMU, but this way
> will require to perform all grunt work of PCI BARs allocation in libxl
> itself - in order to calculate the real MMIO hole(s) size, one needs to
> take into account all PCI BARs sizes and their alignment requirements
> diversity + existing gaps due to RMRR ranges... basically, libxl will
> need to do most of hvmloader/pci.c's job.
> 

The algorithm implemented in hvmloader for that is not complicated and
can be moved to libxl easily. What we can do is provision a hole big
enough to include all the initially assigned PCI devices. We can also
account for emulated MMIO regions if necessary. But, to be honest, it
doesn't really matter, since if there is not enough space in the lower
MMIO hole for some BARs, they can easily be relocated to the upper MMIO
hole by hvmloader or by the guest itself (dynamically).
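
To illustrate the kind of up-front sizing meant here, a standalone sketch (not the actual hvmloader/libxl algorithm -- the real allocator, for instance, packs BARs largest-first):

    #include <stdint.h>

    /* Estimate the lower MMIO hole needed for a set of 32-bit BARs plus
     * the RMRR ranges that must live in it.  BAR sizes are powers of two
     * and each BAR is naturally aligned to its own size. */
    static uint64_t lower_hole_estimate(const uint64_t *bar_sizes,
                                        unsigned int nr_bars,
                                        uint64_t rmrr_bytes)
    {
        uint64_t total = rmrr_bytes;

        for ( unsigned int i = 0; i < nr_bars; i++ )
        {
            uint64_t align = bar_sizes[i];

            total = (total + align - 1) & ~(align - 1); /* align placement */
            total += bar_sizes[i];
        }

        return total;
    }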

Igor

> My 2kop opinion here is that we don't need to move all PCI BAR allocation to
> libxl, or invent some new QMP-interfaces, or introduce new hypercalls or
> else. A simple and somewhat good solution would be to implement this missing
> hvmloader <-> QEMU interface in the same manner how it is done in real
> hardware.
> 
> When we move some part of guest memory in 4GB range to address space above
> 4GB via XENMEM_add_to_physmap, we basically perform what chipset's
> 'remap' (aka reclaim) does. So we can implement this interface between
> hvmloader and QEMU via providing custom emulation for MCH's
> remap/TOLUD/TOUUD stuff in QEMU if xen_enabled().
> 
> In this way hvmloader will calculate MMIO hole sizes as usual, relocate
> some guest RAM above 4GB base and communicate this information to QEMU via
> emulated host bridge registers -- so then QEMU will sync its memory layout
> info to actual physmap's.
> 


* Re: [Bug]  Intel RMRR support with upstream Qemu
  2017-07-24 16:24       ` Alexey G
@ 2017-07-25  2:52         ` Zhang, Xiong Y
  0 siblings, 0 replies; 18+ messages in thread
From: Zhang, Xiong Y @ 2017-07-25  2:52 UTC (permalink / raw)
  To: Alexey G; +Cc: Zhang, Xiong Y, xen-devel


> On Mon, 24 Jul 2017 08:07:02 +0000
> "Zhang, Xiong Y" <xiong.y.zhang@intel.com> wrote:
> 
> > [Zhang, Xiong Y] Thanks for your suggestion.
> > Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
> > For this I still have two questions, could you help me ?
> > 1) If hvmloader do low memory relocation, hvmloader and qemu will see a
> > different guest memory layout . So qemu ram maybe overlop with mmio,
> does
> > xen have plan to fix this ?
> >
> > 2) Just now, I did an experiment: In hvmloader, I set
> > HVM_BELOW_4G_RAM_END to 3G and reserve one area for
> qemu_ram_allocate
> > like 0xF0000000 ~ 0xFC000000; In Qemu, I modified xen_ram_alloc() to
> make
> > sure it only allocate gfn in 0xF0000000 ~ 0xFC000000. In this case
> > qemu_ram won't overlap with mmio, but this workaround couldn't fix my
> > issue. It seems qemu still has another interface to allocate gfn except
> > xen_ram_alloc(), do you know this interface ?
> 
> Please share your 'xl dmesg' output, to have a look at your guest's MMIO
> map and which RMRRs and PCI MBARs are present there.
[Zhang, Xiong Y] Thanks a lot for your help.
The attachment is my 'xl dmesg' output:
RMRR region: base_addr 3a271000 end_addr 3a290fff
RMRR region: base_addr 3b800000 end_addr 3fffffff
Because they are below 2G, I set rdm_mem_boundary=700 to avoid guest creation failure.

Guest RAM is 1G.
The guest's MMIO BARs are:
(d47) pci dev 03:0 bar 10 size 002000000: 0e0000008
(d47) pci dev 02:0 bar 14 size 001000000: 0e2000008
(d47) pci dev 04:0 bar 30 size 000040000: 0e3000000
(d47) pci dev 03:0 bar 30 size 000010000: 0e3040000
(d47) pci dev 03:0 bar 14 size 000001000: 0e3050000
(d47) pci dev 02:0 bar 10 size 000000100: 00000c001
(d47) pci dev 04:0 bar 10 size 000000100: 00000c101
(d47) pci dev 04:0 bar 14 size 000000100: 0e3051000
(d47) pci dev 01:2 bar 20 size 000000020: 00000c201
(d47) pci dev 01:1 bar 20 size 000000010: 00000c221
Gfns f0000000 ~ fc000000 are reserved for xen_ram_alloc().
> 
> If RMRR range happens to overlap some guest's RAM below pci_start
> (dictated by lack of relocation support and low_mem_pgend value), I think
> your problem might be solved by sacrificing some part of guest RAM which is
> overlapped by RMRR -- by changing the E820 map in hvmloader.
[Zhang, Xiong Y] Yes, this is my case, and your suggestion could fix it.
> 


[-- Attachment #2: xl_dmesg --]

 Xen 4.10-unstable
(XEN) Xen version 4.10-unstable (test@) (gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609) debug=y  Thu Jul  6 03:41:23 CST 2017
(XEN) Latest ChangeSet: Tue Jul 4 22:35:28 2017 +0800 git:d23afa6
(XEN) Bootloader: EFI
(XEN) Command line: console=vga loglvl=all guest_loglvl=all dom0_mem=2G msi=1 conring_size=128M iommu=1,debug iommu_inclusive_mapping=1 cpuidle=0
(XEN) Xen image load base address: 0x2ee00000
(XEN) Video information:
(XEN)  VGA is graphics mode 1280x1024, 32 bpp
(XEN) Disc information:
(XEN)  Found 0 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) EFI RAM map:
(XEN)  0000000000000000 - 0000000000058000 (usable)
(XEN)  0000000000058000 - 0000000000059000 (reserved)
(XEN)  0000000000059000 - 000000000009e000 (usable)
(XEN)  000000000009e000 - 00000000000a0000 (reserved)
(XEN)  0000000000100000 - 0000000031c0f000 (usable)
(XEN)  0000000031c0f000 - 0000000031c10000 (ACPI NVS)
(XEN)  0000000031c10000 - 0000000031c5a000 (reserved)
(XEN)  0000000031c5a000 - 000000003a1bd000 (usable)
(XEN)  000000003a1bd000 - 000000003a507000 (reserved)
(XEN)  000000003a507000 - 000000003a543000 (ACPI data)
(XEN)  000000003a543000 - 000000003ae3c000 (ACPI NVS)
(XEN)  000000003ae3c000 - 000000003b2fe000 (reserved)
(XEN)  000000003b2fe000 - 000000003b2ff000 (usable)
(XEN)  000000003b300000 - 000000003b400000 (reserved)
(XEN)  00000000e0000000 - 00000000f0000000 (reserved)
(XEN)  00000000fe000000 - 00000000fe011000 (reserved)
(XEN)  00000000fec00000 - 00000000fec01000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ff000000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 00000008bf000000 (usable)
(XEN) ACPI: RSDP 3A513000, 0024 (r2 INTEL )
(XEN) ACPI: XSDT 3A5130A8, 00CC (r1  INTEL NUC6i7KY  1072009 AMI     10013)
(XEN) ACPI: FACP 3A536910, 010C (r5  INTEL NUC6i7KY  1072009 AMI     10013)
(XEN) ACPI: DSDT 3A513200, 23710 (r2  INTEL NUC6i7KY  1072009 INTL 20120913)
(XEN) ACPI: FACS 3AE3BF80, 0040
(XEN) ACPI: APIC 3A536A20, 00BC (r3  INTEL NUC6i7KY  1072009 AMI     10013)
(XEN) ACPI: FPDT 3A536AE0, 0044 (r1  INTEL NUC6i7KY  1072009 AMI     10013)
(XEN) ACPI: FIDT 3A536B28, 009C (r1  INTEL NUC6i7KY  1072009 AMI     10013)
(XEN) ACPI: MCFG 3A536BC8, 003C (r1  INTEL NUC6i7KY  1072009 MSFT       97)
(XEN) ACPI: HPET 3A536C08, 0038 (r1  INTEL NUC6i7KY  1072009 AMI.    5000B)
(XEN) ACPI: LPIT 3A536C40, 0094 (r1  INTEL NUC6i7KY        0 MSFT       5F)
(XEN) ACPI: SSDT 3A536CD8, 0248 (r2  INTEL NUC6i7KY        0 INTL 20120913)
(XEN) ACPI: SSDT 3A536F20, 2BAE (r2  INTEL NUC6i7KY     1000 INTL 20120913)
(XEN) ACPI: SSDT 3A539AD0, 0BE3 (r2  INTEL NUC6i7KY     1000 INTL 20120913)
(XEN) ACPI: SSDT 3A53A6B8, 04A3 (r2  INTEL NUC6i7KY     1000 INTL 20120913)
(XEN) ACPI: DBGP 3A53AB60, 0034 (r1  INTEL NUC6i7KY        0 MSFT       5F)
(XEN) ACPI: DBG2 3A53AB98, 0054 (r0  INTEL NUC6i7KY        0 MSFT       5F)
(XEN) ACPI: SSDT 3A53ABF0, 0631 (r2  INTEL NUC6i7KY        0 INTL 20120913)
(XEN) ACPI: SSDT 3A53B228, 5480 (r2  INTEL NUC6i7KY     3000 INTL 20120913)
(XEN) ACPI: UEFI 3A5406A8, 0042 (r1  INTEL NUC6i7KY        0             0)
(XEN) ACPI: SSDT 3A5406F0, 0E73 (r2  INTEL NUC6i7KY     3000 INTL 20120913)
(XEN) ACPI: SSDT 3A541568, 0735 (r2  INTEL NUC6i7KY     1000 INTL 20120913)
(XEN) ACPI: DMAR 3A541CA0, 00A8 (r1  INTEL NUC6i7KY        1 INTL        1)
(XEN) ACPI: SSDT 3A541D48, 0533 (r1  INTEL NUC6i7KY     1000 INTL 20120913)
(XEN) ACPI: BGRT 3A542280, 0038 (r1  INTEL NUC6i7KY  1072009 AMI     10013)
(XEN) System RAM: 32657MB (33440832kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-00000008bf000000
(XEN) Domain heap initialised
(XEN) Allocated console ring of 131072 KiB.
(XEN) vesafb: framebuffer at 0x80000000, mapped to 0xffff82c000201000, using 5120k, total 5120k
(XEN) vesafb: mode is 1280x1024x32, linelength=5120, font 8x16
(XEN) vesafb: Truecolor: size=8:8:8:8, shift=24:16:8:0
(XEN) CPU Vendor: Intel, Family 6 (0x6), Model 94 (0x5e), Stepping 3 (raw 000506e3)
(XEN) SMBIOS 3.0 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x1808 (32 bits)
(XEN) ACPI: v5 SLEEP INFO: control[0:0], status[0:0]
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:1804,1:0], pm1x_evt[1:1800,1:0]
(XEN) ACPI: 32/64X FACS address mismatch in FADT - 3ae3bf80/0000000000000000, using 32
(XEN) ACPI:             wakeup_vec[3ae3bf8c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x05] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled)
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
(XEN) ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1])
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a701 base: 0xfed00000
(XEN) [VT-D]Host address width 39
(XEN) [VT-D]found ACPI_DMAR_DRHD:
(XEN) [VT-D]  dmaru->address = fed90000
(XEN) [VT-D]drhd->address = fed90000 iommu->reg = ffff82c00071c000
(XEN) [VT-D]cap = 1c0000c40660462 ecap = 7e3ff0505e
(XEN) [VT-D] endpoint: 0000:00:02.0
(XEN) [VT-D]found ACPI_DMAR_DRHD:
(XEN) [VT-D]  dmaru->address = fed91000
(XEN) [VT-D]drhd->address = fed91000 iommu->reg = ffff82c00071e000
(XEN) [VT-D]cap = d2008c40660462 ecap = f050da
(XEN) [VT-D] IOAPIC: 0000:f0:1f.0
(XEN) [VT-D] MSI HPET: 0000:00:1f.0
(XEN) [VT-D]  flags: INCLUDE_ALL
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: 0000:00:14.0
(XEN) [VT-D]dmar.c:638:   RMRR region: base_addr 3a271000 end_addr 3a290fff
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D]  RMRR address range 3b800000..3fffffff not in reserved memory; need "iommu_inclusive_mapping=1"?
(XEN) [VT-D] endpoint: 0000:00:02.0
(XEN) [VT-D]dmar.c:638:   RMRR region: base_addr 3b800000 end_addr 3fffffff
(XEN) ERST table was not found
(XEN) ACPI: BGRT: invalidating v1 image at 0x384d8018
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 8 CPUs (0 hotplug CPUs)
(XEN) IRQ limits: 120 GSI, 1432 MSI/MSI-X
(XEN) Switched to APIC driver x2apic_cluster.
(XEN) xstate: size: 0x440 and states: 0x1f
(XEN) mce_intel.c:763: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
(XEN) CPU0: Intel machine check reporting enabled
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Platform timer is 23.999MHz HPET
(XEN) Detected 2592.132 MHz processor.
(XEN) EFI memory map:
(XEN)  0000000000000-0000000057fff type=7 attr=000000000000000f
(XEN)  0000000058000-0000000058fff type=0 attr=000000000000000f
(XEN)  0000000059000-000000009bfff type=7 attr=000000000000000f
(XEN)  000000009c000-000000009dfff type=2 attr=000000000000000f
(XEN)  000000009e000-000000009ffff type=0 attr=000000000000000f
(XEN)  0000000100000-000001513afff type=7 attr=000000000000000f
(XEN)  000001513b000-0000029cc7fff type=2 attr=000000000000000f
(XEN)  0000029cc8000-0000029d07fff type=4 attr=000000000000000f
(XEN)  0000029d08000-000002e60dfff type=7 attr=000000000000000f
(XEN)  000002e60e000-000002ed1bfff type=2 attr=000000000000000f
(XEN)  000002ed1c000-000002ff1bfff type=1 attr=000000000000000f
(XEN)  000002ff1c000-000002ff1cfff type=2 attr=000000000000000f
(XEN)  000002ff1d000-000002ff1dfff type=7 attr=000000000000000f
(XEN)  000002ff1e000-000002ff48fff type=2 attr=000000000000000f
(XEN)  000002ff49000-0000030005fff type=1 attr=000000000000000f
(XEN)  0000030006000-0000031c0efff type=4 attr=000000000000000f
(XEN)  0000031c0f000-0000031c0ffff type=10 attr=000000000000000f
(XEN)  0000031c10000-0000031c59fff type=6 attr=800000000000000f
(XEN)  0000031c5a000-0000031cb7fff type=4 attr=000000000000000f
(XEN)  0000031cb8000-0000031cb8fff type=7 attr=000000000000000f
(XEN)  0000031cb9000-0000031cbafff type=2 attr=000000000000000f
(XEN)  0000031cbb000-0000031cbbfff type=7 attr=000000000000000f
(XEN)  0000031cbc000-0000031cc6fff type=2 attr=000000000000000f
(XEN)  0000031cc7000-000003982afff type=4 attr=000000000000000f
(XEN)  000003982b000-0000039a29fff type=7 attr=000000000000000f
(XEN)  0000039a2a000-000003a1bcfff type=3 attr=000000000000000f
(XEN)  000003a1bd000-000003a506fff type=0 attr=000000000000000f
(XEN)  000003a507000-000003a542fff type=9 attr=000000000000000f
(XEN)  000003a543000-000003ae3bfff type=10 attr=000000000000000f
(XEN)  000003ae3c000-000003b293fff type=6 attr=800000000000000f
(XEN)  000003b294000-000003b2fdfff type=5 attr=800000000000000f
(XEN)  000003b2fe000-000003b2fefff type=4 attr=000000000000000f
(XEN)  0000100000000-00008beffffff type=7 attr=000000000000000f
(XEN)  000003b300000-000003b3fffff type=0 attr=0000000000000000
(XEN)  00000e0000000-00000efffffff type=11 attr=8000000000000001
(XEN)  00000fe000000-00000fe010fff type=11 attr=8000000000000001
(XEN)  00000fec00000-00000fec00fff type=11 attr=8000000000000001
(XEN)  00000fee00000-00000fee00fff type=11 attr=8000000000000001
(XEN)  00000ff000000-00000ffffffff type=11 attr=8000000000000001
(XEN) Initing memory sharing.
(XEN) alt table ffff82d080670750 -> ffff82d080671d40
(XEN) PCI: MCFG configuration 0: base e0000000 segment 0000 buses 00 - ff
(XEN) PCI: MCFG area at e0000000 reserved in E820
(XEN) PCI: Using MCFG for segment 0000 bus 00-ff
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) nr_sockets: 1
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=0 pin2=0
(XEN) TSC deadline timer enabled
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN)  - VM Functions
(XEN)  - Virtualisation Exceptions
(XEN)  - Page Modification Logging
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) [VT-D]iommu.c:909: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:911: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]INTR-REMAP: Request device [0000:f0:1f.0] fault index 0, iommu reg = ffff82c00071e000
(XEN) [VT-D]INTR-REMAP: reason 22 - Present field in the IRTE entry is clear
(XEN) Brought up 8 CPUs
(XEN) build-id: 24f4bed6bf8a30c6a1459975ad114c51
(XEN) Running stub recovery selftests...
(XEN) traps.c:1530: GPF (0000): ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d08035ed22
(XEN) traps.c:738: Trap 12: ffff82d0bffff040 [ffff82d0bffff040] -> ffff82d08035ed22
(XEN) traps.c:1068: Trap 3: ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d08035ed22
(XEN) ACPI sleep modes: S3
(XEN) VPMU: disabled
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) Dom0 has maximum 888 PIRQs
(XEN) NX (Execute Disable) protection active
(XEN) *** LOADING DOMAIN 0 ***
(XEN) ELF: phdr: paddr=0x1000000 memsz=0xdac000
(XEN) ELF: phdr: paddr=0x1e00000 memsz=0x170000
(XEN) ELF: phdr: paddr=0x1f70000 memsz=0x1c4d8
(XEN) ELF: phdr: paddr=0x1f8d000 memsz=0x2ca000
(XEN) ELF: memory: 0x1000000 -> 0x2257000
(XEN) ELF: note: GUEST_OS = "linux"
(XEN) ELF: note: GUEST_VERSION = "2.6"
(XEN) ELF: note: XEN_VERSION = "xen-3.0"
(XEN) ELF: note: VIRT_BASE = 0xffffffff80000000
(XEN) ELF: note: INIT_P2M = 0x8000000000
(XEN) ELF: note: ENTRY = 0xffffffff81f8d180
(XEN) ELF: note: HYPERCALL_PAGE = 0xffffffff81001000
(XEN) ELF: note: FEATURES = "!writable_page_tables|pae_pgdir_above_4gb"
(XEN) ELF: note: SUPPORTED_FEATURES = 0x801
(XEN) ELF: note: PAE_MODE = "yes"
(XEN) ELF: note: LOADER = "generic"
(XEN) ELF: note: unknown (0xd)
(XEN) ELF: note: SUSPEND_CANCEL = 0x1
(XEN) ELF: note: MOD_START_PFN = 0x1
(XEN) ELF: note: HV_START_LOW = 0xffff800000000000
(XEN) ELF: note: PADDR_OFFSET = 0
(XEN) ELF: note: PHYS32_ENTRY = 0x10002d0
(XEN) ELF: addresses:
(XEN)     virt_base        = 0xffffffff80000000
(XEN)     elf_paddr_offset = 0x0
(XEN)     virt_offset      = 0xffffffff80000000
(XEN)     virt_kstart      = 0xffffffff81000000
(XEN)     virt_kend        = 0xffffffff82257000
(XEN)     virt_entry       = 0xffffffff81f8d180
(XEN)     p2m_base         = 0x8000000000
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x2257000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000884000000->0000000888000000 (423027 pages to be allocated)
(XEN)  Init. ramdisk: 00000008aa473000->00000008befff8a8
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff82257000
(XEN)  Init. ramdisk: 0000000000000000->0000000000000000
(XEN)  Phys-Mach map: 0000008000000000->0000008000400000
(XEN)  Start info:    ffffffff82257000->ffffffff822574b4
(XEN)  Page tables:   ffffffff82258000->ffffffff8226d000
(XEN)  Boot stack:    ffffffff8226d000->ffffffff8226e000
(XEN)  TOTAL:         ffffffff80000000->ffffffff82400000
(XEN)  ENTRY ADDRESS: ffffffff81f8d180
(XEN) Dom0 has maximum 8 VCPUs
(XEN) ELF: phdr 0 at 0xffffffff81000000 -> 0xffffffff81dac000
(XEN) ELF: phdr 1 at 0xffffffff81e00000 -> 0xffffffff81f70000
(XEN) ELF: phdr 2 at 0xffffffff81f70000 -> 0xffffffff81f8c4d8
(XEN) ELF: phdr 3 at 0xffffffff81f8d000 -> 0xffffffff82108000
(XEN) [VT-D]d0:Hostbridge: skip 0000:00:00.0 map
(XEN) Bogus DMIBAR 0xfed18001 on 0000:00:00.0
(XEN) [VT-D]d0:PCIe: map 0000:00:02.0
(XEN) [VT-D]d0:PCI: map 0000:00:08.0
(XEN) [VT-D]d0:PCI: map 0000:00:14.0
(XEN) [VT-D]d0:PCI: map 0000:00:14.2
(XEN) [VT-D]d0:PCI: map 0000:00:16.0
(XEN) [VT-D]d0:PCI: map 0000:00:17.0
(XEN) [VT-D]d0:PCI: map 0000:00:1f.0
(XEN) [VT-D]d0:PCI: map 0000:00:1f.2
(XEN) [VT-D]d0:PCI: map 0000:00:1f.3
(XEN) [VT-D]d0:PCI: map 0000:00:1f.4
(XEN) [VT-D]d0:PCI: map 0000:00:1f.6
(XEN) [VT-D]d0:PCIe: map 0000:02:00.0
(XEN) [VT-D]d0:PCIe: map 0000:03:00.0
(XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00071c000
(XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00071e000
(XEN) Scrubbing Free RAM on 1 nodes using 4 CPUs
(XEN) [VT-D]iommu.c:909: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:911: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:00:02.0] fault addr 0, iommu reg = ffff82c00071c000
(XEN) [VT-D]DMAR: reason 01 - Present bit in root entry is clear
(XEN) print_vtd_entries: iommu #0 dev 0000:00:02.0 gmfn 00000
(XEN)     root_entry[00] = 81c422001
(XEN)     context[10] = 2_820a3c001
(XEN)     l4[000] = 820a3b003
(XEN)     l3[000] = 820a3a003
(XEN)     l2[000] = 820a39003
(XEN)     l1[000] = 3
(XEN) ......................................................................done.
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 2048kB init memory
(XEN) d0: Forcing write emulation on MFNs e0000-effff
(XEN) Bogus DMIBAR 0xfed18001 on 0000:00:00.0
(XEN) PCI add device 0000:00:00.0
(XEN) PCI add device 0000:00:02.0
(XEN) PCI add device 0000:00:08.0
(XEN) PCI add device 0000:00:14.0
(XEN) PCI add device 0000:00:14.2
(XEN) PCI add device 0000:00:16.0
(XEN) PCI add device 0000:00:17.0
(XEN) PCI add device 0000:00:1c.0
(XEN) PCI add device 0000:00:1c.1
(XEN) PCI add device 0000:00:1c.2
(XEN) PCI add device 0000:00:1c.4
(XEN) PCI add device 0000:00:1f.0
(XEN) PCI add device 0000:00:1f.2
(XEN) PCI add device 0000:00:1f.3
(XEN) PCI add device 0000:00:1f.4
(XEN) PCI add device 0000:00:1f.6
(XEN) PCI add device 0000:02:00.0
(XEN) PCI add device 0000:03:00.0
(XEN) d0: Forcing read-only access to MFN fed00
(XEN) Cannot bind IRQ2 to dom0. In use by 'cascade'.
(XEN) Cannot bind IRQ2 to dom0. In use by 'cascade'.
(XEN) traps.c:1530: GPF (0000): ffff82d0803549c8 [emul-priv-op.c#priv_op_read_msr+0x4da/0x51c] -> ffff82d08035f4a0
(XEN) traps.c:1530: GPF (0000): ffff82d0803549c8 [emul-priv-op.c#priv_op_read_msr+0x4da/0x51c] -> ffff82d08035f4a0
(XEN) traps.c:1530: GPF (0000): ffff82d0803549c8 [emul-priv-op.c#priv_op_read_msr+0x4da/0x51c] -> ffff82d08035f4a0
(XEN) traps.c:1530: GPF (0000): ffff82d0803549c8 [emul-priv-op.c#priv_op_read_msr+0x4da/0x51c] -> ffff82d08035f4a0
(XEN) emul-priv-op.c:1195:d0v0 Domain attempted WRMSR 00000610 from 0x004281c0009f8168 to 0x004281c00015816a
(XEN) emul-priv-op.c:1195:d0v0 Domain attempted WRMSR 00000610 from 0x004281c0009f8168 to 0x004281c0009f0168
(XEN) emul-priv-op.c:1195:d0v0 Domain attempted WRMSR 00000610 from 0x004281c0009f8168 to 0x004281c0009e8168
(XEN) emul-priv-op.c:1195:d0v0 Domain attempted WRMSR 00000610 from 0x004281c0009f8168 to 0x004201c0009f8168
(XEN) traps.c:1530: GPF (0000): ffff82d0803549c8 [emul-priv-op.c#priv_op_read_msr+0x4da/0x51c] -> ffff82d08035f4a0
(XEN) HVM47 save: CPU
(XEN) HVM47 save: PIC
(XEN) HVM47 save: IOAPIC
(XEN) HVM47 save: LAPIC
(XEN) HVM47 save: LAPIC_REGS
(XEN) HVM47 save: PCI_IRQ
(XEN) HVM47 save: ISA_IRQ
(XEN) HVM47 save: PCI_LINK
(XEN) HVM47 save: PIT
(XEN) HVM47 save: RTC
(XEN) HVM47 save: HPET
(XEN) HVM47 save: PMTIMER
(XEN) HVM47 save: MTRR
(XEN) HVM47 save: VIRIDIAN_DOMAIN
(XEN) HVM47 save: CPU_XSAVE
(XEN) HVM47 save: VIRIDIAN_VCPU
(XEN) HVM47 save: VMCE_VCPU
(XEN) HVM47 save: TSC_ADJUST
(XEN) HVM47 save: CPU_MSR
(XEN) HVM47 restore: CPU 0
(d47) HVM Loader
(d47) Detected Xen v4.10-unstable
(d47) Xenbus rings @0xfeffc000, event channel 1
(d47) System requested SeaBIOS
(d47) CPU speed is 2592 MHz
(d47) Relocating guest memory for lowmem MMIO space disabled
(XEN) irq.c:327: Dom47 PCI link 0 changed 0 -> 5
(d47) PCI-ISA link 0 routed to IRQ5
(XEN) irq.c:327: Dom47 PCI link 1 changed 0 -> 10
(d47) PCI-ISA link 1 routed to IRQ10
(XEN) irq.c:327: Dom47 PCI link 2 changed 0 -> 11
(d47) PCI-ISA link 2 routed to IRQ11
(XEN) irq.c:327: Dom47 PCI link 3 changed 0 -> 5
(d47) PCI-ISA link 3 routed to IRQ5
(d47) pci dev 01:2 INTD->IRQ5
(d47) pci dev 01:3 INTA->IRQ10
(d47) pci dev 02:0 INTA->IRQ11
(d47) pci dev 04:0 INTA->IRQ5
(d47) RAM in high memory; setting high_mem resource base to 10558f000
(d47) pci dev 03:0 bar 10 size 002000000: 0e0000008
(d47) pci dev 02:0 bar 14 size 001000000: 0e2000008
(d47) pci dev 04:0 bar 30 size 000040000: 0e3000000
(d47) pci dev 03:0 bar 30 size 000010000: 0e3040000
(d47) pci dev 03:0 bar 14 size 000001000: 0e3050000
(d47) pci dev 02:0 bar 10 size 000000100: 00000c001
(d47) pci dev 04:0 bar 10 size 000000100: 00000c101
(d47) pci dev 04:0 bar 14 size 000000100: 0e3051000
(d47) pci dev 01:2 bar 20 size 000000020: 00000c201
(d47) pci dev 01:1 bar 20 size 000000010: 00000c221
(d47) Multiprocessor initialisation:
(d47)  - CPU0 ... 39-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done.
(d47)  - CPU1 ... 39-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done.
(d47) Testing HVM environment:
(d47)  - REP INSB across page boundaries ... passed
(d47)  - GS base MSRs and SWAPGS ... passed
(d47) Passed 2 of 2 tests
(d47) Writing SMBIOS tables ...
(d47) Loading SeaBIOS ...
(d47) Creating MP tables ...
(d47) Loading ACPI ...
(d47) CONV disabled
(d47) vm86 TSS at fc00a680
(d47) BIOS map:
(d47)  10000-100e3: Scratch space
(d47)  c0000-fffff: Main BIOS
(d47) E820 table:
(d47)  [00]: 00000000:00000000 - 00000000:000a0000: RAM
(d47)  HOLE: 00000000:000a0000 - 00000000:000c0000
(d47)  [01]: 00000000:000c0000 - 00000000:00100000: RESERVED
(d47)  [02]: 00000000:00100000 - 00000000:3a271000: RAM
(d47)  [03]: 00000000:3a271000 - 00000000:3a291000: RESERVED
(d47)  HOLE: 00000000:3a291000 - 00000000:3b800000
(d47)  [04]: 00000000:3b800000 - 00000000:40000000: RESERVED
(d47)  HOLE: 00000000:40000000 - 00000000:f0000000
(d47)  [05]: 00000000:f0000000 - 00000000:fc000000: RESERVED
(d47)  [06]: 00000000:fc000000 - 00000001:00000000: RESERVED
(d47)  [07]: 00000001:00000000 - 00000001:0558f000: RAM
(d47) Invoking SeaBIOS ...
(d47) SeaBIOS (version rel-1.10.2-0-g5f4c7b1)
(d47) BUILD: gcc: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 binutils: (GNU Binut
(d47) ils for Ubuntu) 2.26.1
(d47) 
(d47) Found Xen hypervisor signature at 40000000
(d47) Running on QEMU (i440fx)
(d47) xen: copy e820...
(d47) Relocating init from 0x000da500 to 0x3a21de20 (size 78144)
(d47) Found 8 PCI devices (max PCI bus is 00)
(d47) Allocated Xen hypercall page at 3a270000
(d47) Detected Xen v4.10-unstable
(d47) xen: copy BIOS tables...
(d47) Copying SMBIOS entry point from 0x00010020 to 0x000f6980
(d47) Copying MPTABLE from 0xfc001170/fc001180 to 0x000f6880
(d47) Copying PIR from 0x00010040 to 0x000f6800
(d47) Copying ACPI RSDP from 0x000100c0 to 0x000f67d0
(d47) Using pmtimer, ioport 0xb008
(d47) Scan for VGA option rom
(d47) Running option rom at c000:0003
(XEN) stdvga.c:173:d47v0 entering stdvga mode
(d47) pmm call arg1=0
(d47) Turning on vga text mode console
(d47) SeaBIOS (version rel-1.10.2-0-g5f4c7b1)
(d47) Machine UUID 14086470-ef72-4ac8-8897-6f73a46ff05a
(d47) UHCI init on dev 00:01.2 (io=c200)
(d47) ATA controller 1 at 1f0/3f4/c220 (irq 14 dev 9)
(d47) ATA controller 2 at 170/374/c228 (irq 15 dev 9)
(d47) Found 0 lpt ports
(d47) Found 0 serial ports
(d47) ata0-0: QEMU HARDDISK ATA-7 Hard-Disk (80 GiBytes)
(d47) Searching bootorder for: /pci@i0cf8/*@1,1/drive@0/disk@0
(d47) PS2 keyboard initialized
(d47) All threads complete.
(d47) Scan for option roms
(d47) Running option rom at c980:0003
(d47) pmm call arg1=1
(d47) pmm call arg1=0
(d47) pmm call arg1=1
(d47) pmm call arg1=0
(d47) Searching bootorder for: /pci@i0cf8/*@4
(d47) 
(d47) Press ESC for boot menu.
(d47) 
(d47) Searching bootorder for: HALT
(d47) drive 0x000f6760: PCHS=16383/16/63 translation=lba LCHS=1024/255/63 s=167772160
(d47) 
(d47) Space available for UMB: ca800-ec800, f61a0-f6760
(d47) Returned 258048 bytes of ZoneHigh
(d47) e820 map has 8 items:
(d47)   0: 0000000000000000 - 000000000009fc00 = 1 RAM
(d47)   1: 000000000009fc00 - 00000000000a0000 = 2 RESERVED
(d47)   2: 00000000000f0000 - 0000000000100000 = 2 RESERVED
(d47)   3: 0000000000100000 - 000000003a270000 = 1 RAM
(d47)   4: 000000003a270000 - 000000003a291000 = 2 RESERVED
(d47)   5: 000000003b800000 - 0000000040000000 = 2 RESERVED
(d47)   6: 00000000f0000000 - 0000000100000000 = 2 RESERVED
(d47)   7: 0000000100000000 - 000000010558f000 = 1 RAM
(d47) enter handle_19:
(d47)   NULL
(d47) Booting from Hard Disk...
(d47) Booting from 0000:7c00
(XEN) stdvga.c:178:d47v0 leaving stdvga mode
(XEN) irq.c:327: Dom47 PCI link 0 changed 5 -> 0
(XEN) irq.c:327: Dom47 PCI link 1 changed 10 -> 0
(XEN) irq.c:327: Dom47 PCI link 2 changed 11 -> 0
(XEN) irq.c:327: Dom47 PCI link 3 changed 5 -> 0


* Re: [Bug] Intel RMRR support with upstream Qemu
  2017-07-24 20:39           ` Igor Druzhinin
@ 2017-07-25  7:03             ` Zhang, Xiong Y
  2017-07-25 14:13               ` Igor Druzhinin
  2017-07-25 16:40             ` Alexey G
  1 sibling, 1 reply; 18+ messages in thread
From: Zhang, Xiong Y @ 2017-07-25  7:03 UTC (permalink / raw)
  To: Igor Druzhinin, Alexey G; +Cc: Zhang, Xiong Y, xen-devel

> On 24/07/17 17:42, Alexey G wrote:
> > Hi,
> >
> > On Mon, 24 Jul 2017 10:53:16 +0100
> > Igor Druzhinin <igor.druzhinin@citrix.com> wrote:
> >>> [Zhang, Xiong Y] Thanks for your suggestion.
> >>> Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
> >>> For this I still have two questions, could you help me ?
> >>> 1) If hvmloader do low memory relocation, hvmloader and qemu will see a
> >>> different guest memory layout . So qemu ram maybe overlop with mmio,
> >>> does xen have plan to fix this ?
> >>
> >> hvmloader doesn't do memory relocation - this ability is turned off by
> >> default. The reason for the issue is that libxl initially sets the size
> >> of lower MMIO hole (based on the RMRR regions present and their size)
> >> and doesn't communicate it to QEMU using 'max-ram-below-4g' argument.
> >>
> >> When you set 'mmio_hole' size parameter you basically forces libxl to
> >> pass this argument to QEMU.
> >>
> >> That means the proper fix would be to make libxl to pass this argument
> >> to QEMU in case there are RMRR regions present.
> >
> > I tend to disagree a bit.
> > What we lack actually is some way to perform a 'dynamical' physmem
> > relocation, when a guest domain is running already. Right now it works only
> > in the 'static' way - i.e. if memory layout was known for both QEMU and
> > hvmloader before starting a guest domain and with no means of arbitrarily
> > changing this layout at runtime when hvmloader runs.
> >
> > But, the problem is that overall MMIO hole(s) requirements are not known
> > exactly at the time the HVM domain being created. Some PCI devices will be
> > emulated, some will be merely passed through and yet there will be some
> > RMRR ranges. libxl can't know all this stuff - some comes from the host,
> > some comes from DM. So actual MMIO requirements are known to
> hvmloader at
> > the PCI bus enumeration time.
> >
> 
> IMO hvmloader shouldn't really be allowed to relocate memory under any
> conditions. As Andrew said it's much easier to provision the hole
> statically in libxl during domain construction process and it doesn't
> really compromise any functionality. Having one more entity responsible
> for guest memory layout only makes things more convoluted.
> 
> > libxl can be taught to retrieve all missing info from QEMU, but this way
> > will require to perform all grunt work of PCI BARs allocation in libxl
> > itself - in order to calculate the real MMIO hole(s) size, one needs to
> > take into account all PCI BARs sizes and their alignment requirements
> > diversity + existing gaps due to RMRR ranges... basically, libxl will
> > need to do most of hvmloader/pci.c's job.
> >
> 
> The algorithm implemented in hvmloader for that is not complicated and
> can be moved to libxl easily. What we can do is to provision a hole big
> enough to include all the initially assigned PCI devices. We can also
> account for emulated MMIO regions if necessary. But, to be honest, it
> doesn't really matter since if there is no enough space in lower MMIO
> hole for some BARs - they can be easily relocated to upper MMIO
> hole by hvmloader or the guest itself (dynamically).
> 
> Igor
[Zhang, Xiong Y] Yes, if we could supply a big enough MMIO hole and not allow hvmloader to relocate memory, things would be easier. But how could we supply a big enough MMIO hole?
a. Statically set the base address of the MMIO hole to 2G/3G (a rough sketch of this follows below).
b. Probe all the PCI devices, as hvmloader does, and calculate the required MMIO size. But this runs prior to QEMU, so how would we probe the PCI devices?
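
For option (a), the calculation could look roughly like the following
(illustrative only -- the names, the 3G default and the 256MB alignment
are assumptions for the sketch, not actual libxl code):

/*
 * Illustrative only: pick a lower MMIO hole base that is no higher than
 * the usual default and also below every host RMRR region, so that
 * hvmloader never has to relocate guest RAM.
 */
#include <stdint.h>

#define DEFAULT_HOLE_BASE  0xc0000000ULL   /* 3G */
#define HOLE_ALIGN         0x10000000ULL   /* keep the base 256MB aligned */

struct rmrr_range { uint64_t start, end; };

static uint64_t pick_low_hole_base(const struct rmrr_range *rmrr, int nr)
{
    uint64_t base = DEFAULT_HOLE_BASE;

    for (int i = 0; i < nr; i++)
        if (rmrr[i].start < base)
            base = rmrr[i].start & ~(HOLE_ALIGN - 1);

    return base;    /* hole size = 4G - base */
}

The resulting size (4G minus the chosen base) is effectively what the
'mmio_hole' option communicates to QEMU today via 'max-ram-below-4g'.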

thanks
> > My 2kop opinion here is that we don't need to move all PCI BAR allocation to
> > libxl, or invent some new QMP-interfaces, or introduce new hypercalls or
> > else. A simple and somewhat good solution would be to implement this
> missing
> > hvmloader <-> QEMU interface in the same manner how it is done in real
> > hardware.
> >
> > When we move some part of guest memory in 4GB range to address space
> above
> > 4GB via XENMEM_add_to_physmap, we basically perform what chipset's
> > 'remap' (aka reclaim) does. So we can implement this interface between
> > hvmloader and QEMU via providing custom emulation for MCH's
> > remap/TOLUD/TOUUD stuff in QEMU if xen_enabled().
> >
> > In this way hvmloader will calculate MMIO hole sizes as usual, relocate
> > some guest RAM above 4GB base and communicate this information to
> QEMU via
> > emulated host bridge registers -- so then QEMU will sync its memory layout
> > info to actual physmap's.
> >

* Re: [Bug] Intel RMRR support with upstream Qemu
  2017-07-25  7:03             ` Zhang, Xiong Y
@ 2017-07-25 14:13               ` Igor Druzhinin
  2017-07-25 16:49                 ` Alexey G
  0 siblings, 1 reply; 18+ messages in thread
From: Igor Druzhinin @ 2017-07-25 14:13 UTC (permalink / raw)
  To: Zhang, Xiong Y, Alexey G; +Cc: xen-devel

On 25/07/17 08:03, Zhang, Xiong Y wrote:
>> On 24/07/17 17:42, Alexey G wrote:
>>> Hi,
>>>
>>> On Mon, 24 Jul 2017 10:53:16 +0100
>>> Igor Druzhinin <igor.druzhinin@citrix.com> wrote:
>>>>> [Zhang, Xiong Y] Thanks for your suggestion.
>>>>> Indeed, if I set mmi_hole >= 4G - RMRR_Base, this could fix my issue.
>>>>> For this I still have two questions, could you help me ?
>>>>> 1) If hvmloader do low memory relocation, hvmloader and qemu will see a
>>>>> different guest memory layout . So qemu ram maybe overlop with mmio,
>>>>> does xen have plan to fix this ?
>>>>
>>>> hvmloader doesn't do memory relocation - this ability is turned off by
>>>> default. The reason for the issue is that libxl initially sets the size
>>>> of lower MMIO hole (based on the RMRR regions present and their size)
>>>> and doesn't communicate it to QEMU using 'max-ram-below-4g' argument.
>>>>
>>>> When you set 'mmio_hole' size parameter you basically forces libxl to
>>>> pass this argument to QEMU.
>>>>
>>>> That means the proper fix would be to make libxl to pass this argument
>>>> to QEMU in case there are RMRR regions present.
>>>
>>> I tend to disagree a bit.
>>> What we lack actually is some way to perform a 'dynamical' physmem
>>> relocation, when a guest domain is running already. Right now it works only
>>> in the 'static' way - i.e. if memory layout was known for both QEMU and
>>> hvmloader before starting a guest domain and with no means of arbitrarily
>>> changing this layout at runtime when hvmloader runs.
>>>
>>> But, the problem is that overall MMIO hole(s) requirements are not known
>>> exactly at the time the HVM domain being created. Some PCI devices will be
>>> emulated, some will be merely passed through and yet there will be some
>>> RMRR ranges. libxl can't know all this stuff - some comes from the host,
>>> some comes from DM. So actual MMIO requirements are known to
>> hvmloader at
>>> the PCI bus enumeration time.
>>>
>>
>> IMO hvmloader shouldn't really be allowed to relocate memory under any
>> conditions. As Andrew said it's much easier to provision the hole
>> statically in libxl during domain construction process and it doesn't
>> really compromise any functionality. Having one more entity responsible
>> for guest memory layout only makes things more convoluted.
>>
>>> libxl can be taught to retrieve all missing info from QEMU, but this way
>>> will require to perform all grunt work of PCI BARs allocation in libxl
>>> itself - in order to calculate the real MMIO hole(s) size, one needs to
>>> take into account all PCI BARs sizes and their alignment requirements
>>> diversity + existing gaps due to RMRR ranges... basically, libxl will
>>> need to do most of hvmloader/pci.c's job.
>>>
>>
>> The algorithm implemented in hvmloader for that is not complicated and
>> can be moved to libxl easily. What we can do is to provision a hole big
>> enough to include all the initially assigned PCI devices. We can also
>> account for emulated MMIO regions if necessary. But, to be honest, it
>> doesn't really matter since if there is no enough space in lower MMIO
>> hole for some BARs - they can be easily relocated to upper MMIO
>> hole by hvmloader or the guest itself (dynamically).
>>
>> Igor
> [Zhang, Xiong Y] yes, If we could supply a big enough mmio hole and don't allow hvmloader to do relocate, things will be easier. But how could we supply a big enough mmio hole ?
> a. statical set base address of mmio hole to 2G/3G.
> b. Like hvmloader to probe all the pci devices and calculate mmio size. But this runs prior to qemu, how to probe pci devices ? 
> 

It's true that we don't know the space occupied by emulated devices
before QEMU is started, but QEMU needs to be started with some lower
MMIO hole size statically assigned.

One possible solution is to calculate the hole size required to include
all the assigned pass-through devices and round it up to the nearest GB
boundary, but no larger than 2GB total. If that's not enough to also
include all the emulated devices - then it's not enough; some of the PCI
devices will be relocated to the upper MMIO hole in that case.
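
Roughly something like this (an illustrative helper only, not actual
libxl code):

/*
 * Sketch of the provisioning rule described above: add up the BAR sizes
 * of the pass-through devices assigned at domain creation, round the
 * total up to a 1G boundary and cap the lower hole at 2G.
 */
#include <stdint.h>

#define GB(x)          ((uint64_t)(x) << 30)
#define MAX_LOW_HOLE   GB(2)

static uint64_t provision_low_hole(const uint64_t *bar_size, int nr_bars)
{
    uint64_t total = 0;

    for (int i = 0; i < nr_bars; i++)
        total += bar_size[i];

    total = (total + GB(1) - 1) & ~(GB(1) - 1);   /* round up to 1G */

    return total > MAX_LOW_HOLE ? MAX_LOW_HOLE : total;
}

Whatever doesn't fit in the result ends up in a 64-bit BAR in the upper
hole, as described above.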

Igor

> thanks
>>> My 2kop opinion here is that we don't need to move all PCI BAR allocation to
>>> libxl, or invent some new QMP-interfaces, or introduce new hypercalls or
>>> else. A simple and somewhat good solution would be to implement this
>> missing
>>> hvmloader <-> QEMU interface in the same manner how it is done in real
>>> hardware.
>>>
>>> When we move some part of guest memory in 4GB range to address space
>> above
>>> 4GB via XENMEM_add_to_physmap, we basically perform what chipset's
>>> 'remap' (aka reclaim) does. So we can implement this interface between
>>> hvmloader and QEMU via providing custom emulation for MCH's
>>> remap/TOLUD/TOUUD stuff in QEMU if xen_enabled().
>>>
>>> In this way hvmloader will calculate MMIO hole sizes as usual, relocate
>>> some guest RAM above 4GB base and communicate this information to
>> QEMU via
>>> emulated host bridge registers -- so then QEMU will sync its memory layout
>>> info to actual physmap's.
>>>


* Re: [Bug] Intel RMRR support with upstream Qemu
  2017-07-24 20:39           ` Igor Druzhinin
  2017-07-25  7:03             ` Zhang, Xiong Y
@ 2017-07-25 16:40             ` Alexey G
  2017-07-25 17:04               ` Igor Druzhinin
  2017-07-25 17:47               ` Andrew Cooper
  1 sibling, 2 replies; 18+ messages in thread
From: Alexey G @ 2017-07-25 16:40 UTC (permalink / raw)
  To: Igor Druzhinin; +Cc: Zhang, Xiong Y, xen-devel

On Mon, 24 Jul 2017 21:39:08 +0100
Igor Druzhinin <igor.druzhinin@citrix.com> wrote:
> > But, the problem is that overall MMIO hole(s) requirements are not known
> > exactly at the time the HVM domain being created. Some PCI devices will
> > be emulated, some will be merely passed through and yet there will be
> > some RMRR ranges. libxl can't know all this stuff - some comes from the
> > host, some comes from DM. So actual MMIO requirements are known to
> > hvmloader at the PCI bus enumeration time.
> >   
> 
> IMO hvmloader shouldn't really be allowed to relocate memory under any
> conditions. As Andrew said it's much easier to provision the hole
> statically in libxl during domain construction process and it doesn't
> really compromise any functionality. Having one more entity responsible
> for guest memory layout only makes things more convoluted.

If moving most of hvmloader's tasks to libxl is a feature planned at
Citrix, please let it be discussed on xen-devel first, as it may affect
many people... and not all of them might be happy. :)

(tons of IMO and TLDR ahead, be warned)

Moving PCI BAR allocation from the guest side to libxl is a controversial
step, and may in fact be the architecturally wrong way. There are
properties and areas of responsibility: among the primary responsibilities
of a guest's firmware are PCI BAR allocation and MMIO hole sizing. That's
the guest's territory. The guest relocates PCI BARs (and not just the BIOS
is able to do this), and the guest firmware relocates the MMIO hole base
for them. On a real system, all tasks like PCI BAR allocation, remapping
part of RAM above 4G, etc. are done by the system BIOS. In our case some
SeaBIOS/OVMF responsibilities were offloaded to hvmloader - PCI BAR
allocation, sizing the MMIO hole(s) for them, and generating ACPI tables.
And that's OK, as hvmloader can be considered merely 'supplemental'
firmware that performs some of SeaBIOS/OVMF's tasks before passing control
to them. This solution has at least some architectural logic and doesn't
look bad.

On the other hand, moving the PCI hole calculation to libxl just to let
Xen/libxl know what the MMIO size is might be a bad idea. Aside from some
code duplication, straying too far from the real-hardware paths, or
breaking existing (or future) interfaces, this may have other negative
consequences. For example, who will initialize the guest's ACPI tables if
only libxl knows the memory layout? Some new interface between libxl and
hvmloader just to let the latter know which values to write into the ACPI
tables being created? Or will libxl initialize the guest's ACPI tables as
well (another of the guest's internal tasks)? Similar concerns apply to
the guest's final E820 construction.

Another thing is that handling ioreq/passthrough MMIO ranges is somewhat
a property of the device model (at least for now). Right now it is QEMU
that traps PCI BAR accesses and tells Xen how to handle specific ranges of
MMIO space. If QEMU already tells Xen which ranges should be passed
through or trapped, it can tell it the current overall MMIO limits as
well... or enforce those limits itself -- if an MMIO hole range check is
all that is required to avoid misuse of MMIO space, such a check can
easily be implemented in QEMU, provided QEMU knows what the memory/MMIO
layout is. There is a lot of implementation freedom in where to place
restrictions and checks - Xen or QEMU. Strictly speaking, the MMIO hole
itself can be considered a property of the emulated machine and may be
implemented differently for different emulated chipsets. For example, the
real i440FX northbridge has no notion of a high MMIO hole at all.
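
If QEMU were to enforce such a limit itself, the check is simple once the
layout is known -- a sketch (a hypothetical helper, not an existing QEMU
function):

/*
 * Hypothetical sketch: reject BAR placements that fall outside the low
 * or high MMIO hole as the emulated machine understands them.
 */
#include <stdbool.h>
#include <stdint.h>

struct mmio_hole { uint64_t start, end; };   /* half-open: [start, end) */

static bool bar_within_holes(uint64_t addr, uint64_t size,
                             const struct mmio_hole *low,
                             const struct mmio_hole *high)
{
    uint64_t last;

    if (size == 0)
        return false;
    last = addr + size - 1;

    return (addr >= low->start  && last < low->end) ||
           (addr >= high->start && last < high->end);
}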

We already have a sort of interface between hvmloader and QEMU --
hvmloader has to do basic initialization of some of the emulated chipset's
registers (and this depends on the machine). Providing additional handling
for a few other registers (TOM/TOLUD/etc.) would cost almost nothing, and
the purpose of these registers would actually match their usage on real
HW. This way we can use an existing, available interface and not stray too
far from how real hardware does it.
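
In rough terms this is just a couple of extra config-space writes from
hvmloader's chipset init path. A sketch of the idea -- the register
offsets below are placeholders rather than those of any particular MCH
(the real ones would come from the datasheet of the emulated chipset and
its QEMU model):

/*
 * Sketch only: report the guest memory layout to the emulated host
 * bridge the way firmware programs TOLUD/TOUUD on real hardware.
 */
#include <stdint.h>

#define HB_BDF          0x0000   /* host bridge at 00:00.0 */
#define HB_TOLUD_REG    0xb0     /* placeholder offset */
#define HB_TOUUD_REG    0xa8     /* placeholder offset */

static inline void outl(uint16_t port, uint32_t val)
{
    asm volatile ("outl %0, %1" :: "a" (val), "Nd" (port));
}

/* Standard 0xcf8/0xcfc config write, as hvmloader already does for
 * chipset initialization. */
static void pci_cfg_writel(uint32_t bdf, uint32_t reg, uint32_t val)
{
    outl(0xcf8, 0x80000000u | (bdf << 8) | (reg & 0xfc));
    outl(0xcfc, val);
}

static void report_memory_layout(uint32_t tolud, uint64_t touud)
{
    pci_cfg_writel(HB_BDF, HB_TOLUD_REG, tolud);                    /* top of low RAM  */
    pci_cfg_writel(HB_BDF, HB_TOUUD_REG, (uint32_t)touud);          /* top of all RAM, */
    pci_cfg_writel(HB_BDF, HB_TOUUD_REG + 4, (uint32_t)(touud >> 32)); /* incl. remap  */
}

QEMU's emulated northbridge would then use these values to bring its RAM
regions in line with the physmap changes made via XENMEM_add_to_physmap.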

I want to try this approach in the Q35 bringup patches for Xen I'm
currently working on. I'll send these patches as an RFC and will be glad
to receive constructive criticism.

> > libxl can be taught to retrieve all missing info from QEMU, but this way
> > will require to perform all grunt work of PCI BARs allocation in libxl
> > itself - in order to calculate the real MMIO hole(s) size, one needs to
> > take into account all PCI BARs sizes and their alignment requirements
> > diversity + existing gaps due to RMRR ranges... basically, libxl will
> > need to do most of hvmloader/pci.c's job.
> >   
> 
> The algorithm implemented in hvmloader for that is not complicated and
> can be moved to libxl easily. What we can do is to provision a hole big
> enough to include all the initially assigned PCI devices. We can also
> account for emulated MMIO regions if necessary. But, to be honest, it
> doesn't really matter since if there is no enough space in lower MMIO
> hole for some BARs - they can be easily relocated to upper MMIO
> hole by hvmloader or the guest itself (dynamically).


* Re: [Bug] Intel RMRR support with upstream Qemu
  2017-07-25 14:13               ` Igor Druzhinin
@ 2017-07-25 16:49                 ` Alexey G
  0 siblings, 0 replies; 18+ messages in thread
From: Alexey G @ 2017-07-25 16:49 UTC (permalink / raw)
  To: Igor Druzhinin; +Cc: Zhang, Xiong Y, xen-devel

On Tue, 25 Jul 2017 15:13:17 +0100
Igor Druzhinin <igor.druzhinin@citrix.com> wrote:
> >> The algorithm implemented in hvmloader for that is not complicated and
> >> can be moved to libxl easily. What we can do is to provision a hole big
> >> enough to include all the initially assigned PCI devices. We can also
> >> account for emulated MMIO regions if necessary. But, to be honest, it
> >> doesn't really matter since if there is no enough space in lower MMIO
> >> hole for some BARs - they can be easily relocated to upper MMIO
> >> hole by hvmloader or the guest itself (dynamically).
> >>
> >> Igor  
> > [Zhang, Xiong Y] yes, If we could supply a big enough mmio hole and
> > don't allow hvmloader to do relocate, things will be easier. But how
> > could we supply a big enough mmio hole ? a. statical set base address
> > of mmio hole to 2G/3G. b. Like hvmloader to probe all the pci devices
> > and calculate mmio size. But this runs prior to qemu, how to probe pci
> > devices ? 
> 
> It's true that we don't know the space occupied by emulated device
> before QEMU is started.  But QEMU needs to be started with some lower
> MMIO hole size statically assigned.
> 
> One of the possible solutions is to calculate a hole size required to
> include all the assigned pass-through devices and round it up to the
> nearest GB boundary but not larger than 2GB total. If it's not enough to
> also include all the emulated devices - it's not enough, some of the PCI
> device are going to be relocated to upper MMIO hole in that case.

Not all devices are BAR64-capable, and even those which are may have
option ROM BARs (mem32 only). There are also 32-bit guests which will find
64-bit BARs placed above 4GB extremely unacceptable. The low MMIO hole is
a precious resource. Also, one needs to consider the implications of PCI
device hotplug for the 'static' precalculation approach.


* Re: [Bug] Intel RMRR support with upstream Qemu
  2017-07-25 16:40             ` Alexey G
@ 2017-07-25 17:04               ` Igor Druzhinin
  2017-07-25 17:47               ` Andrew Cooper
  1 sibling, 0 replies; 18+ messages in thread
From: Igor Druzhinin @ 2017-07-25 17:04 UTC (permalink / raw)
  To: Alexey G; +Cc: Zhang, Xiong Y, xen-devel

On 25/07/17 17:40, Alexey G wrote:
> On Mon, 24 Jul 2017 21:39:08 +0100
> Igor Druzhinin <igor.druzhinin@citrix.com> wrote:
>>> But, the problem is that overall MMIO hole(s) requirements are not known
>>> exactly at the time the HVM domain being created. Some PCI devices will
>>> be emulated, some will be merely passed through and yet there will be
>>> some RMRR ranges. libxl can't know all this stuff - some comes from the
>>> host, some comes from DM. So actual MMIO requirements are known to
>>> hvmloader at the PCI bus enumeration time.
>>>   
>>
>> IMO hvmloader shouldn't really be allowed to relocate memory under any
>> conditions. As Andrew said it's much easier to provision the hole
>> statically in libxl during domain construction process and it doesn't
>> really compromise any functionality. Having one more entity responsible
>> for guest memory layout only makes things more convoluted.
> 
> If moving most tasks of hvmloader to libxl is a planned feature in Citrix,
> please let it be discussed on xen-devel first as it may affect many
> people... and not all of them might be happy. :)
> 

Everything always goes through the mailing list.

> (tons of IMO and TLDR ahead, be warned)
> 
> Moving PCI BAR allocation from guest side to libxl is a controversial step.
> This may be the architecturally wrong way in fact. There are properties and
> areas of responsibility. Among primary responsibilities of guest's firmware
> is PCI BARs and MMIO hole size allocation. That's a guest's territory.
> Guest relocates PCI BARs (and not just BIOS able to do this), guest
> firmware relocates MMIO hole base for them. If it was a real system, all
> tasks like PCI BAR allocation, remapping part of RAM above 4G etc were done
> by system BIOS. In our case some of SeaBIOS/OVMF responsibilities were
> offloaded to hvmloader, like PCI BARs allocation, sizing MMIO hole(s) for
> them and generating ACPI tables. And that's ok as hvmloader can be
> considered merely a 'supplemental' firmware to perform some tasks of
> SeaBIOS/OVMF before passing control to them. This solution has some
> architecture logic at least and doesn't look bad.
> 

libxl is also part of the firmware, so to speak. It's incorrect to think
that only hvmloader and the BIOS images are "proper" firmware.

> On other hand, moving PCI hole calculation to libxl just to let Xen/libxl
> know what the MMIO size value is might be a bad idea.
> Aside from some code duplication, straying too far from the real hw paths,
> or breaking existing (or future) interfaces this might have some other
> negative consequences. Ex. who will be initializing guest's ACPI tables if
> only libxl will know the memory layout? Some new interfaces between libxl
> and hvmloader just to let the latter know what values to write to ACPI
> tables being created? Or libxl will be initializing guest's ACPI tables as
> well (another guest's internal task)? Similar concerns are applicable to
> guest's final E820 construction.
> 

The information is not confined to libxl - it's passed on to hvmloader,
which can finish the tasks libxl couldn't. That said, the ACPI tables
could harmlessly be initialized inside libxl as well (see the PVH
implementation).

> Another thing is that handling ioreq/PT MMIO ranges is somewhat a property
> of the device model (at least for now). Right now it's QEMU who traps PCI
> BAR accesses and tells Xen how to handle specific ranges of MMIO space. If
> QEMU already talks to Xen which ranges should be passed through or trapped
> -- it can tell him the current overall MMIO limits as well... or handle
> these limits himself -- if the MMIO hole range check is all what required to
> avoid MMIO space misusing, this check can be easily implemented in QEMU,
> provided that QEMU knows what memory/MMIO layout is. There is a lot of
> implementation freedom where to place restrictions and checks, Xen or QEMU.
> Strictly speaking, the MMIO hole itself can be considered a property of the
> emulated machine and may have implementation differences for different
> emulated chipsets. For example, the real i440' NB do not have an idea of
> high MMIO hole at all.
> 
> We have already a sort of an interface between hvmloader and QEMU --
> hvmloader has to do basic initialization for some emulated chipset's
> registers (and this depends on the machine). Providing additional handling
> for few other registers (TOM/TOLUD/etc) will cost almost nothing and
> purpose of this registers will actually match their usage in real HW. This
> way we can use an existing available interface and don't stray too far from
> the real HW ways. 
> 
> I want to try this approach for Q35 bringup patches for Xen I'm currently
> working on. I'll send these patches as RFC and will be glad to receive some
> constructive criticism.
> 

Sure. Static hole size provisioning doesn't prohibit its dynamic
counterpart.

Igor

>>> libxl can be taught to retrieve all missing info from QEMU, but this way
>>> will require to perform all grunt work of PCI BARs allocation in libxl
>>> itself - in order to calculate the real MMIO hole(s) size, one needs to
>>> take into account all PCI BARs sizes and their alignment requirements
>>> diversity + existing gaps due to RMRR ranges... basically, libxl will
>>> need to do most of hvmloader/pci.c's job.
>>>   
>>
>> The algorithm implemented in hvmloader for that is not complicated and
>> can be moved to libxl easily. What we can do is to provision a hole big
>> enough to include all the initially assigned PCI devices. We can also
>> account for emulated MMIO regions if necessary. But, to be honest, it
>> doesn't really matter since if there is no enough space in lower MMIO
>> hole for some BARs - they can be easily relocated to upper MMIO
>> hole by hvmloader or the guest itself (dynamically).


* Re: [Bug] Intel RMRR support with upstream Qemu
  2017-07-25 16:40             ` Alexey G
  2017-07-25 17:04               ` Igor Druzhinin
@ 2017-07-25 17:47               ` Andrew Cooper
  1 sibling, 0 replies; 18+ messages in thread
From: Andrew Cooper @ 2017-07-25 17:47 UTC (permalink / raw)
  To: Alexey G, Igor Druzhinin; +Cc: Zhang, Xiong Y, xen-devel

On 25/07/17 17:40, Alexey G wrote:
> On Mon, 24 Jul 2017 21:39:08 +0100
> Igor Druzhinin <igor.druzhinin@citrix.com> wrote:
>>> But, the problem is that overall MMIO hole(s) requirements are not known
>>> exactly at the time the HVM domain being created. Some PCI devices will
>>> be emulated, some will be merely passed through and yet there will be
>>> some RMRR ranges. libxl can't know all this stuff - some comes from the
>>> host, some comes from DM. So actual MMIO requirements are known to
>>> hvmloader at the PCI bus enumeration time.
>>>   
>> IMO hvmloader shouldn't really be allowed to relocate memory under any
>> conditions. As Andrew said it's much easier to provision the hole
>> statically in libxl during domain construction process and it doesn't
>> really compromise any functionality. Having one more entity responsible
>> for guest memory layout only makes things more convoluted.
> If moving most tasks of hvmloader to libxl is a planned feature in Citrix,
> please let it be discussed on xen-devel first as it may affect many
> people... and not all of them might be happy. :)
>
> (tons of IMO and TLDR ahead, be warned)
>
> Moving PCI BAR allocation from guest side to libxl is a controversial step.
> This may be the architecturally wrong way in fact. There are properties and
> areas of responsibility. Among primary responsibilities of guest's firmware
> is PCI BARs and MMIO hole size allocation.

There is already a very blurry line concerning "firmware".  What you
describe is correct for real hardware, but remember that virtual
machines are anything but.  There are already a lot of aspects of
initialisation covered by Xen or the toolstack which would be covered by
"firmware" on a native system.  A lot of these are never ever going to
move into guest control.

> That's a guest's territory.

Every tweakable which is available inside the guest is a security attack
surface.

It is important to weigh up all options, and it might indeed be the case
that putting the tweakable inside the guest is the correct action to
take, but simply "because that's what real hardware does" is not a good
enough argument.

We've had far too many XSAs due to insufficient forethought when lashing
things together in the past.

> Guest relocates PCI BARs (and not just BIOS able to do this), guest
> firmware relocates MMIO hole base for them. If it was a real system, all
> tasks like PCI BAR allocation, remapping part of RAM above 4G etc were done
> by system BIOS. In our case some of SeaBIOS/OVMF responsibilities were
> offloaded to hvmloader, like PCI BARs allocation, sizing MMIO hole(s) for
> them and generating ACPI tables. And that's ok as hvmloader can be
> considered merely a 'supplemental' firmware to perform some tasks of
> SeaBIOS/OVMF before passing control to them. This solution has some
> architecture logic at least and doesn't look bad.

PCI BAR relocation isn't interesting to consider.  It obviously has to
be dynamic (as the OS is free to renumber the bridges).

The issue I am concerned with is purely the MMIO window selection.  From
the point of view of the guest, this is fixed at boot; changing it
requires a reboot and altering the BIOS settings.

>
> On other hand, moving PCI hole calculation to libxl just to let Xen/libxl
> know what the MMIO size value is might be a bad idea.
> Aside from some code duplication, straying too far from the real hw paths,
> or breaking existing (or future) interfaces this might have some other
> negative consequences. Ex. who will be initializing guest's ACPI tables if
> only libxl will know the memory layout? Some new interfaces between libxl
> and hvmloader just to let the latter know what values to write to ACPI
> tables being created? Or libxl will be initializing guest's ACPI tables as
> well (another guest's internal task)? Similar concerns are applicable to
> guest's final E820 construction.

Who said anything about only libxl knowing the layout?

Whatever ends up happening, the hypervisor needs to know the layout to
be able to sensibly audit a number of guest actions which currently go
unaudited.  (I am disappointed that this wasn't done in the first place,
and surprised that Xen as a whole has managed to last this long without
this information being known to the hypervisor.)

>
> Another thing is that handling ioreq/PT MMIO ranges is somewhat a property
> of the device model (at least for now). Right now it's QEMU who traps PCI
> BAR accesses and tells Xen how to handle specific ranges of MMIO space. If
> QEMU already talks to Xen which ranges should be passed through or trapped
> -- it can tell him the current overall MMIO limits as well... or handle
> these limits himself -- if the MMIO hole range check is all what required to
> avoid MMIO space misusing, this check can be easily implemented in QEMU,
> provided that QEMU knows what memory/MMIO layout is. There is a lot of
> implementation freedom where to place restrictions and checks, Xen or QEMU.
> Strictly speaking, the MMIO hole itself can be considered a property of the
> emulated machine and may have implementation differences for different
> emulated chipsets. For example, the real i440' NB do not have an idea of
> high MMIO hole at all.
>
> We have already a sort of an interface between hvmloader and QEMU --
> hvmloader has to do basic initialization for some emulated chipset's
> registers (and this depends on the machine). Providing additional handling
> for few other registers (TOM/TOLUD/etc) will cost almost nothing and
> purpose of this registers will actually match their usage in real HW. This
> way we can use an existing available interface and don't stray too far from
> the real HW ways. 

The difference here is that there are two broad choices of how to proceed:
1) Calculate and set up the guest physical address space statically
during creation, making it immutable once the guest starts executing
code, or
2) Support the guest having dynamic control over its physical address space.

Which of these is a smaller attack surface?

So far, I see no advantage in going with option 2 (as it doesn't affect
any guest-visible behaviour), and a compelling set of reasons (based on
simplicity and reduction of the security attack surface) to prefer
option 1.

~Andrew

