* GPU passthrough performance regression in >4GB vms due to XSA-60 changes
@ 2014-05-15  9:11 Tomasz Wroblewski
  2014-05-15 12:32 ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-15  9:11 UTC (permalink / raw)
  To: xen-devel; +Cc: jinsong.liu, JBeulich

Hello,

We've recently updated from Xen 4.3.1 to 4.3.2 and found a major 
regression in GPU passthrough performance in VMs using >4GB of memory. 
When using GPU passthrough (some Radeon cards, also integrated Intel 
GPUs), CPU load is constantly near maximum and the screen is slow to 
update. The machines are Intel Haswell/Ivy Bridge laptops/desktops; 
the guests are Windows 7 64-bit HVMs.

I've bisected the failure to be due to XSA-60 changes, specifically:

commit e81d0ac25464825b3828cff5dc9e8285612992c4
Author: Liu Jinsong <jinsong.liu@intel.com>
Date:   Mon Dec 9 14:26:03 2013 +0100

     VMX: remove the problematic set_uc_mode logic

This commit removed logic which, when the guest briefly set the 
cache-disable bit in CR0, iterated over all mapped pfns and reset the 
memory type in the EPTs to be consistent with the result of the 
mtrr.c:epte_get_entry_emt() call. My tracing indicates this used to 
return the WRITEBACK caching type for the 64-bit memory areas where 
the GPU's BARs seem to be located.

This code path no longer runs; my suspicion is that the PCI BAR area 
now stays uncached, which causes the general slowness. Note that I'm 
not talking about slow performance during the window in which CR0 has 
caching disabled: it stays slow even after the guest re-enables 
caching shortly afterwards, since the problem seems to be a side 
effect of the removed loop having set some default EPT policies on all 
pfns. Reintroducing the removed loop fixes the problem.

I would welcome comments/ideas on how to debug this further, or maybe 
there's an obvious fix.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-15 12:32 ` Jan Beulich
@ 2014-05-15 12:10   ` Tomasz Wroblewski
  2014-05-15 13:23     ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-15 12:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: jinsong.liu, xen-devel


On 05/15/2014 02:32 PM, Jan Beulich wrote:
>>>> On 15.05.14 at 11:11, <tomasz.wroblewski@gmail.com> wrote:
>> We've recently updated from Xen 4.3.1 to 4.3.2 and found out a major
>> regression in gpu passthrough performance in VMs using >4GB of memory.
>> When using GPU pt (some radeon cards, also intergrated intel gpu pt),
>> load on cpu is constantly near maximum and screen is slow to update. The
>> machines are intel haswell/ivybridge laptops/desktops, the guests are
>> windows 7 64-bit HVMs.
>>
>> I've bisected the failure to be due to XSA-60 changes, specifically:
>>
>> commit e81d0ac25464825b3828cff5dc9e8285612992c4
>> Author: Liu Jinsong <jinsong.liu@intel.com>
>> Date:   Mon Dec 9 14:26:03 2013 +0100
>>
>>       VMX: remove the problematic set_uc_mode logic
>>
>> This commit seems to have removed a bit of logic which, when guest was
>> setting cache disable bit in CR0 for a brief time, was iterating on all
>> mapped pfns and resetting memory type in EPTs to be consistent with the
>> result of mtrr.c:epte_get_entry_emt() call. I believe my tracing
>> indicates this used to return WRITEBACK caching strategy for the 64bit
>> memory areas where the BARs of the gpu seem to be located.
>>
>> This bit of code is not happening anymore, speculatively I think the PCI
>> BAR area stays as uncached which causes the general slowness.
> But of course you would have to ask yourself whether it was correct
> for that range to be anything but UC. And as you may be aware,
> there has been more work in this area recently, so further UC-ness
> may be there in 4.5...
>
Yes, indeed it is very possible it was only accidentally working 
properly before. Thanks, I wasn't aware of the further work in this 
area in 4.5; I will take a look.

>> Note that
>> I'm not talking about slow performance during the window the CR0 has
>> caching disabled, it does stays slow even after guest reenables it
>> shortly after since the problem seems to be a side effect of removed
>> loop setting some default EPT policies on all pfns. Reintroducing the
>> removed loop fixes the problem.
> Doing so is clearly not going to be an option.
Yes, I was merely stating how things are.
> To at least harden your suspicion, did you look at D debug key
> output with and without the change (for that to be useful here
> you may want to add memory type dumping as was added in
> -unstable).
Yes, I have dumped the EPT memory types on the affected ranges before 
and after the change. Before the change we were getting writeback, 
and from debugging, that value originated in the 
mtrr.c:get_mtrr_type() function (called by epte_get_entry_emt()), 
specifically its "return m->def_type" statement, so it seems it was 
just going off the default type when the range was not covered by an 
MTRR. After the change, it stays UC.

I'm not really sure why it only affects 64-bit VMs, but I've just 
noticed the PCI BARs for the card are being relocated by hvmloader, 
per these logs:

(XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled
(XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for 
lowmem MMIO hole
(XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for lowmem 
MMIO hole

So it might also be related to that.



> Jan
>


* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-15  9:11 GPU passthrough performance regression in >4GB vms due to XSA-60 changes Tomasz Wroblewski
@ 2014-05-15 12:32 ` Jan Beulich
  2014-05-15 12:10   ` Tomasz Wroblewski
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-05-15 12:32 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: jinsong.liu, xen-devel

>>> On 15.05.14 at 11:11, <tomasz.wroblewski@gmail.com> wrote:
> We've recently updated from Xen 4.3.1 to 4.3.2 and found out a major 
> regression in gpu passthrough performance in VMs using >4GB of memory. 
> When using GPU pt (some radeon cards, also intergrated intel gpu pt), 
> load on cpu is constantly near maximum and screen is slow to update. The 
> machines are intel haswell/ivybridge laptops/desktops, the guests are 
> windows 7 64-bit HVMs.
> 
> I've bisected the failure to be due to XSA-60 changes, specifically:
> 
> commit e81d0ac25464825b3828cff5dc9e8285612992c4
> Author: Liu Jinsong <jinsong.liu@intel.com>
> Date:   Mon Dec 9 14:26:03 2013 +0100
> 
>      VMX: remove the problematic set_uc_mode logic
> 
> This commit seems to have removed a bit of logic which, when guest was 
> setting cache disable bit in CR0 for a brief time, was iterating on all 
> mapped pfns and resetting memory type in EPTs to be consistent with the 
> result of mtrr.c:epte_get_entry_emt() call. I believe my tracing 
> indicates this used to return WRITEBACK caching strategy for the 64bit 
> memory areas where the BARs of the gpu seem to be located.
> 
> This bit of code is not happening anymore, speculatively I think the PCI 
> BAR area stays as uncached which causes the general slowness.

But of course you would have to ask yourself whether it was correct
for that range to be anything but UC. And as you may be aware,
there has been more work in this area recently, so further UC-ness
may be there in 4.5...

> Note that 
> I'm not talking about slow performance during the window the CR0 has 
> caching disabled, it does stays slow even after guest reenables it 
> shortly after since the problem seems to be a side effect of removed 
> loop setting some default EPT policies on all pfns. Reintroducing the 
> removed loop fixes the problem.

Doing so is clearly not going to be an option.

To at least harden your suspicion, did you look at the 'D' debug key
output with and without the change? (For that to be useful here you
may want to add memory type dumping, as was added in -unstable.)

Jan


* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-15 12:10   ` Tomasz Wroblewski
@ 2014-05-15 13:23     ` Jan Beulich
  2014-05-15 13:39       ` Tomasz Wroblewski
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-05-15 13:23 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: jinsong.liu, xen-devel

>>> On 15.05.14 at 14:10, <tomasz.wroblewski@gmail.com> wrote:
> Not really sure why it only affects 64bit vms but I've just noticed the 
> pci BARs for the card are being relocated by hvmloader as per some logs:
> 
> (XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled
> (XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for 
> lowmem MMIO hole
> (XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for lowmem 
> MMIO hole
> 
> So it might be also related to that.

Indeed it might - what are the (guest) MTRR types for those regions?

Jan


* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-15 13:23     ` Jan Beulich
@ 2014-05-15 13:39       ` Tomasz Wroblewski
  2014-05-15 14:34         ` Tomasz Wroblewski
  2014-05-15 16:01         ` Jan Beulich
  0 siblings, 2 replies; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-15 13:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: jinsong.liu, xen-devel


On 05/15/2014 03:23 PM, Jan Beulich wrote:
>>>> On 15.05.14 at 14:10, <tomasz.wroblewski@gmail.com> wrote:
>> Not really sure why it only affects 64bit vms but I've just noticed the
>> pci BARs for the card are being relocated by hvmloader as per some logs:
>>
>> (XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled
>> (XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for
>> lowmem MMIO hole
>> (XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for lowmem
>> MMIO hole
>>
>> So it might be also related to that.
> Indeed it might - what are the (guest) MTRR types for those regions?
It's writeback for both the 32-bit and 64-bit ranges above.
> Jan
>


* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-15 13:39       ` Tomasz Wroblewski
@ 2014-05-15 14:34         ` Tomasz Wroblewski
  2014-05-15 14:56           ` Tomasz Wroblewski
  2014-05-15 16:01         ` Jan Beulich
  1 sibling, 1 reply; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-15 14:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: jinsong.liu, xen-devel


On 05/15/2014 03:39 PM, Tomasz Wroblewski wrote:
>
> On 05/15/2014 03:23 PM, Jan Beulich wrote:
>>>>> On 15.05.14 at 14:10, <tomasz.wroblewski@gmail.com> wrote:
>>> Not really sure why it only affects 64bit vms but I've just noticed the
>>> pci BARs for the card are being relocated by hvmloader as per some 
>>> logs:
>>>
>>> (XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled
>>> (XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for
>>> lowmem MMIO hole
>>> (XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for lowmem
>>> MMIO hole
>>>
>>> So it might be also related to that.
>> Indeed it might - what are the (guest) MTRR types for those regions?
> It's writeback for both the 32bit and 64bit above ranges.
... however, after a bit more debugging: it's uncached at the time 
hvmloader does the relocation, which is why it ends up like that in 
the EPT tables. It only changes to writeback soon afterwards. I 
haven't pinpointed the exact point for that yet, nor why it's being 
updated to writeback, but it seems to happen before the guest starts 
booting (i.e. still on BIOS screens).

>> Jan
>>
>


* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-15 14:34         ` Tomasz Wroblewski
@ 2014-05-15 14:56           ` Tomasz Wroblewski
  2014-05-15 16:07             ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-15 14:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: jinsong.liu, xen-devel


On 05/15/2014 04:34 PM, Tomasz Wroblewski wrote:
>
> On 05/15/2014 03:39 PM, Tomasz Wroblewski wrote:
>>
>> On 05/15/2014 03:23 PM, Jan Beulich wrote:
>>>>>> On 15.05.14 at 14:10, <tomasz.wroblewski@gmail.com> wrote:
>>>> Not really sure why it only affects 64bit vms but I've just noticed 
>>>> the
>>>> pci BARs for the card are being relocated by hvmloader as per some 
>>>> logs:
>>>>
>>>> (XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled
>>>> (XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for
>>>> lowmem MMIO hole
>>>> (XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for 
>>>> lowmem
>>>> MMIO hole
>>>>
>>>> So it might be also related to that.
>>> Indeed it might - what are the (guest) MTRR types for those regions?
>> It's writeback for both the 32bit and 64bit above ranges.
> ... however, after a bit more debugging its uncached at the time 
> hvmloader does the relocation so that's why it ends up like that in 
> EPT tables. It does go to writeback only soon after. Haven't 
> pinpointed the exact time point for that yet nor why it's being 
> updated to writeback, but it seems to be before the guest starts 
> booting (i.e. still on bios screens).
... and after even more debugging I see that the type is uncached at 
the time the relocation happens because the MTRRs are disabled at that 
point, so get_mtrr_type() exits with the uncached value in its first 
few lines. Later, when the guest enables the MTRRs, the EPT is not 
updated. So maybe the EPTs should be updated in some way at that time, 
but I guess the obvious way of doing that would just reintroduce the 
DoS covered by XSA-60 (triggerable by fiddling with MTRR state instead 
of the CR0 CD bit), so I don't really know.


* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-15 16:07             ` Jan Beulich
@ 2014-05-15 15:39               ` Tomasz Wroblewski
  2014-05-16  6:33                 ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-15 15:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jinsong Liu, xen-devel


On 05/15/2014 06:07 PM, Jan Beulich wrote:
>>>> On 15.05.14 at 16:56, <tomasz.wroblewski@gmail.com> wrote:
>> On 05/15/2014 04:34 PM, Tomasz Wroblewski wrote:
>>> On 05/15/2014 03:39 PM, Tomasz Wroblewski wrote:
>>>> On 05/15/2014 03:23 PM, Jan Beulich wrote:
>>>>>>>> On 15.05.14 at 14:10, <tomasz.wroblewski@gmail.com> wrote:
>>>>>> Not really sure why it only affects 64bit vms but I've just noticed
>>>>>> the
>>>>>> pci BARs for the card are being relocated by hvmloader as per some
>>>>>> logs:
>>>>>>
>>>>>> (XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled
>>>>>> (XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for
>>>>>> lowmem MMIO hole
>>>>>> (XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for
>>>>>> lowmem
>>>>>> MMIO hole
>>>>>>
>>>>>> So it might be also related to that.
>>>>> Indeed it might - what are the (guest) MTRR types for those regions?
>>>> It's writeback for both the 32bit and 64bit above ranges.
>>> ... however, after a bit more debugging its uncached at the time
>>> hvmloader does the relocation so that's why it ends up like that in
>>> EPT tables. It does go to writeback only soon after. Haven't
>>> pinpointed the exact time point for that yet nor why it's being
>>> updated to writeback, but it seems to be before the guest starts
>>> booting (i.e. still on bios screens).
>> ... and after even more I see that the type is uncached at the time the
>> relocation is happening because mtrr is disabled at that time and
>> get_mtrr_type() function exits with uncached value in the first few
>> lines of it. Later when guests enabled MTRR, ept is not updated. So
>> maybe the EPTs should be updated in some way at that time,
> Which is what -unstable is now doing.
Right.
> But the question remains why this region doesn't get marked UC or
> WC, but WB.
The region doesn't seem to be covered by any MTRR, so it just goes 
off the default type ((struct mtrr_state *)->def_type), which seems 
to be WB.

> Jan
>


* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-15 13:39       ` Tomasz Wroblewski
  2014-05-15 14:34         ` Tomasz Wroblewski
@ 2014-05-15 16:01         ` Jan Beulich
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2014-05-15 16:01 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: jinsong.liu, xen-devel

>>> On 15.05.14 at 15:39, <tomasz.wroblewski@gmail.com> wrote:

> On 05/15/2014 03:23 PM, Jan Beulich wrote:
>>>>> On 15.05.14 at 14:10, <tomasz.wroblewski@gmail.com> wrote:
>>> Not really sure why it only affects 64bit vms but I've just noticed the
>>> pci BARs for the card are being relocated by hvmloader as per some logs:
>>>
>>> (XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled
>>> (XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for
>>> lowmem MMIO hole
>>> (XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for lowmem
>>> MMIO hole
>>>
>>> So it might be also related to that.
>> Indeed it might - what are the (guest) MTRR types for those regions?
> It's writeback for both the 32bit and 64bit above ranges.

Which seems rather wrong - it should at best be WC. Did you try to
find out why it is WB?

Jan


* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-15 14:56           ` Tomasz Wroblewski
@ 2014-05-15 16:07             ` Jan Beulich
  2014-05-15 15:39               ` Tomasz Wroblewski
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-05-15 16:07 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: Jinsong Liu, xen-devel

>>> On 15.05.14 at 16:56, <tomasz.wroblewski@gmail.com> wrote:

> On 05/15/2014 04:34 PM, Tomasz Wroblewski wrote:
>>
>> On 05/15/2014 03:39 PM, Tomasz Wroblewski wrote:
>>>
>>> On 05/15/2014 03:23 PM, Jan Beulich wrote:
>>>>>>> On 15.05.14 at 14:10, <tomasz.wroblewski@gmail.com> wrote:
>>>>> Not really sure why it only affects 64bit vms but I've just noticed 
>>>>> the
>>>>> pci BARs for the card are being relocated by hvmloader as per some 
>>>>> logs:
>>>>>
>>>>> (XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled
>>>>> (XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for
>>>>> lowmem MMIO hole
>>>>> (XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for 
>>>>> lowmem
>>>>> MMIO hole
>>>>>
>>>>> So it might be also related to that.
>>>> Indeed it might - what are the (guest) MTRR types for those regions?
>>> It's writeback for both the 32bit and 64bit above ranges.
>> ... however, after a bit more debugging its uncached at the time 
>> hvmloader does the relocation so that's why it ends up like that in 
>> EPT tables. It does go to writeback only soon after. Haven't 
>> pinpointed the exact time point for that yet nor why it's being 
>> updated to writeback, but it seems to be before the guest starts 
>> booting (i.e. still on bios screens).
> ... and after even more I see that the type is uncached at the time the 
> relocation is happening because mtrr is disabled at that time and 
> get_mtrr_type() function exits with uncached value in the first few 
> lines of it. Later when guests enabled MTRR, ept is not updated. So 
> maybe the EPTs should be updated in some way at that time,

Which is what -unstable is now doing.

But the question remains why this region doesn't get marked UC or
WC, but WB.

Jan


* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-15 15:39               ` Tomasz Wroblewski
@ 2014-05-16  6:33                 ` Jan Beulich
  2014-05-16 11:18                   ` Tomasz Wroblewski
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-05-16  6:33 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: Jinsong Liu, xen-devel

>>> On 15.05.14 at 17:39, <tomasz.wroblewski@gmail.com> wrote:

> On 05/15/2014 06:07 PM, Jan Beulich wrote:
>>>>> On 15.05.14 at 16:56, <tomasz.wroblewski@gmail.com> wrote:
>>> On 05/15/2014 04:34 PM, Tomasz Wroblewski wrote:
>>>> On 05/15/2014 03:39 PM, Tomasz Wroblewski wrote:
>>>>> On 05/15/2014 03:23 PM, Jan Beulich wrote:
>>>>>>>>> On 15.05.14 at 14:10, <tomasz.wroblewski@gmail.com> wrote:
>>>>>>> Not really sure why it only affects 64bit vms but I've just noticed
>>>>>>> the
>>>>>>> pci BARs for the card are being relocated by hvmloader as per some
>>>>>>> logs:
>>>>>>>
>>>>>>> (XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled
>>>>>>> (XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for
>>>>>>> lowmem MMIO hole
>>>>>>> (XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for
>>>>>>> lowmem
>>>>>>> MMIO hole
>>>>>>>
>>>>>>> So it might be also related to that.
>>>>>> Indeed it might - what are the (guest) MTRR types for those regions?
>>>>> It's writeback for both the 32bit and 64bit above ranges.
>>>> ... however, after a bit more debugging its uncached at the time
>>>> hvmloader does the relocation so that's why it ends up like that in
>>>> EPT tables. It does go to writeback only soon after. Haven't
>>>> pinpointed the exact time point for that yet nor why it's being
>>>> updated to writeback, but it seems to be before the guest starts
>>>> booting (i.e. still on bios screens).
>>> ... and after even more I see that the type is uncached at the time the
>>> relocation is happening because mtrr is disabled at that time and
>>> get_mtrr_type() function exits with uncached value in the first few
>>> lines of it. Later when guests enabled MTRR, ept is not updated. So
>>> maybe the EPTs should be updated in some way at that time,
>> Which is what -unstable is now doing.
> Right.
>> But the question remains why this region doesn't get marked UC or
>> WC, but WB.
> The region doesn't seem to be marked in any way in mtrr so it just goes 
> off the default type for that mtrr ((struct mtrr_state*)->def_type) 
> which seems to be WB.

I was expecting this for the relocated (above 4Gb) region, but is this
also the case for the one below?

In any event, all MMIO regions of passed-through devices absolutely
have to be represented in the MTRRs as long as the regions' types
in the host MTRRs differ from the default type in the guest ones.
In the end, though, this makes me (once again) question whether
defaulting to WB and setting up exceptions for MMIO isn't the wrong
approach, especially when pass-through is being used. This used to
be the other way around until April 2008 (commit a6a82232:
"x86, hvm: Lots of MTRR/PAT emulation cleanup").

If I coded up a patch to deal with this on -unstable, would you be
able to test that?

Jan


* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-16  6:33                 ` Jan Beulich
@ 2014-05-16 11:18                   ` Tomasz Wroblewski
  2014-05-16 11:38                     ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-16 11:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jinsong Liu, xen-devel


>>>> ... and after even more I see that the type is uncached at the time the
>>>> relocation is happening because mtrr is disabled at that time and
>>>> get_mtrr_type() function exits with uncached value in the first few
>>>> lines of it. Later when guests enabled MTRR, ept is not updated. So
>>>> maybe the EPTs should be updated in some way at that time,
>>> Which is what -unstable is now doing.
>> Right.
>>> But the question remains why this region doesn't get marked UC or
>>> WC, but WB.
>> The region doesn't seem to be marked in any way in mtrr so it just goes
>> off the default type for that mtrr ((struct mtrr_state*)->def_type)
>> which seems to be WB.
> I was expecting this for the relocated (above 4Gb) region, but is this
> also the case for the one below?
You're right, I had a silly typo in the pfn number I was testing; the 
lower region indeed does have an entry in the MTRRs, and it is set to 
UC. The relocated region doesn't.

> In any event - all MMIO regions of passed through devices absolutely
> have to be represented in the MTRRs as long as the regions' types
> in the host MTRRs differ from the default type in the guest ones,
> though in the end this makes me (once again) question whether
> defaulting to WB and setting up exceptions for MMIO isn't the wrong
> approach especially when pass-through is being used. This used to
> be the other way around until April 2008 (commit a6a82232:
> "x86, hvm: Lots of MTRR/PAT emulation cleanup").
>
> If I coded up a patch to deal with this on -unstable, would you be
> able to test that?
I'm willing to give it a go (Xen major version updates are often 
problematic to do, though, so I can't promise success). What would 
your patch do? Add entries to the MTRRs for the relocated regions? I 
fear it might not be enough, since through recent debugging it seems 
the relocation code does more wrong - the p2m entries for the 
relocated region, for example, are not set up as p2m_mmio_direct but 
rather as the default p2m_ram_rw (unless this has changed again past 
4.3.2).


> Jan
>


* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-16 11:18                   ` Tomasz Wroblewski
@ 2014-05-16 11:38                     ` Jan Beulich
  2014-05-16 14:36                       ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-05-16 11:38 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: Jinsong Liu, xen-devel

>>> On 16.05.14 at 13:18, <tomasz.wroblewski@gmail.com> wrote:
>> If I coded up a patch to deal with this on -unstable, would you be
>> able to test that?
> Willing to give it a go (xen major version updates are often problematic 
> to do though so can't promise success). What would your patch be doing? 
> Adding entries to MTRR for the relocated regions?

That, plus properly declaring the region in ACPI's _CRS. For starters
I'll probably try keeping the WB default overlaid with UC variable
ranges, as that's going to be the less intrusive change.

> I do fear it might not 
> be enough since thru recent debugging it seems there are more wrongs 
> done by the relocation code - the p2m entries for the relocated region 
> are not setup with p2m_mmio_direct for example but rather default 
> p2m_ram_rw (unless this has changed again past 4.3.2).

Hmm, I would hope that's not a problem anymore on -unstable, but
then again I neither recall a change nor am I sure where this would
go wrong (in the hypervisor - all it cares about is that the tools
properly invoke XEN_DOMCTL_memory_mapping; please check that
these calls are being made).

Jan


* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-16 11:38                     ` Jan Beulich
@ 2014-05-16 14:36                       ` Jan Beulich
  2014-05-19 10:29                         ` Tomasz Wroblewski
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-05-16 14:36 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: Jinsong Liu, xen-devel

[-- Attachment #1: Type: text/plain, Size: 1071 bytes --]

>>> On 16.05.14 at 13:38, <JBeulich@suse.com> wrote:
>>>> On 16.05.14 at 13:18, <tomasz.wroblewski@gmail.com> wrote:
>>> If I coded up a patch to deal with this on -unstable, would you be
>>> able to test that?
>> Willing to give it a go (xen major version updates are often problematic 
>> to do though so can't promise success). What would your patch be doing? 
>> Adding entries to MTRR for the relocated regions?
> 
> This and properly declare the region in ACPI's _CRS. For starters I'll
> probably try keeping the WB default overlaid with UC variable ranges,
> as that's going to be the less intrusive change.

Okay, here are two patches - the first deals with the above-mentioned
items, and the second further increases correctness while also
shrinking the number of MTRR ranges needed.

Afaict they apply equally well to stable-4.3, master, and staging.

But to be honest I don't expect any performance improvement; all I'd
expect is that BARs relocated above 4GB now get treated the same as
those below 4GB - UC in all cases.

Jan


[-- Attachment #2: hvmloader-PCI-above-4G.patch --]
[-- Type: text/plain, Size: 7006 bytes --]

--- a/tools/firmware/hvmloader/acpi/build.c
+++ b/tools/firmware/hvmloader/acpi/build.c
@@ -53,6 +53,7 @@ struct acpi_info {
     uint32_t madt_csum_addr;    /* 12   - Address of MADT checksum */
     uint32_t madt_lapic0_addr;  /* 16   - Address of first MADT LAPIC struct */
     uint32_t vm_gid_addr;       /* 20   - Address of VM generation id buffer */
+    uint64_t pci_hi_min, pci_hi_len; /* 24, 32 - PCI I/O hole boundaries */
 };
 
 /* Number of processor objects in the chosen DSDT. */
@@ -541,6 +542,11 @@ void acpi_build_tables(struct acpi_confi
     acpi_info->hpet_present = hpet_exists(ACPI_HPET_ADDRESS);
     acpi_info->pci_min = pci_mem_start;
     acpi_info->pci_len = pci_mem_end - pci_mem_start;
+    if ( pci_hi_mem_end > pci_hi_mem_start )
+    {
+        acpi_info->pci_hi_min = pci_hi_mem_start;
+        acpi_info->pci_hi_len = pci_hi_mem_end - pci_hi_mem_start;
+    }
 
     return;
 
--- a/tools/firmware/hvmloader/acpi/dsdt.asl
+++ b/tools/firmware/hvmloader/acpi/dsdt.asl
@@ -45,7 +45,7 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, 
     Scope (\_SB)
     {
        /* ACPI_INFO_PHYSICAL_ADDRESS == 0xFC000000 */
-       OperationRegion(BIOS, SystemMemory, 0xFC000000, 24)
+       OperationRegion(BIOS, SystemMemory, 0xFC000000, 40)
        Field(BIOS, ByteAcc, NoLock, Preserve) {
            UAR1, 1,
            UAR2, 1,
@@ -56,7 +56,9 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, 
            PLEN, 32,
            MSUA, 32, /* MADT checksum address */
            MAPA, 32, /* MADT LAPIC0 address */
-           VGIA, 32  /* VM generation id address */
+           VGIA, 32, /* VM generation id address */
+           HMIN, 64,
+           HLEN, 64
        }
 
         /* Fix HCT test for 0x400 pci memory:
@@ -136,7 +138,7 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, 
                     /* reserve memory for pci devices */
                     DWordMemory(
                         ResourceProducer, PosDecode, MinFixed, MaxFixed,
-                        Cacheable, ReadWrite,
+                        WriteCombining, ReadWrite,
                         0x00000000,
                         0x000A0000,
                         0x000BFFFF,
@@ -145,13 +147,24 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, 
 
                     DWordMemory(
                         ResourceProducer, PosDecode, MinFixed, MaxFixed,
-                        Cacheable, ReadWrite,
+                        NonCacheable, ReadWrite,
                         0x00000000,
                         0xF0000000,
                         0xF4FFFFFF,
                         0x00000000,
                         0x05000000,
                         ,, _Y01)
+
+                    QWordMemory (
+                        ResourceProducer, PosDecode, MinFixed, MaxFixed,
+                        NonCacheable, ReadWrite,
+                        0x0000000000000000,
+                        0x0000000000000000,
+                        0x0000000000000000,
+                        0x0000000000000000,
+                        0x0000000000000000,
+                        ,, _Y02)
+
                 })
 
                 CreateDWordField(PRT0, \_SB.PCI0._CRS._Y01._MIN, MMIN)
@@ -163,6 +176,15 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, 
                 Add(MMIN, MLEN, MMAX)
                 Subtract(MMAX, One, MMAX)
 
+                CreateQWordField(PRT0, \_SB.PCI0._CRS._Y02._MIN, HMIN)
+                CreateQWordField(PRT0, \_SB.PCI0._CRS._Y02._MAX, HMAX)
+                CreateQWordField(PRT0, \_SB.PCI0._CRS._Y02._LEN, HLEN)
+
+                Store(\_SB.HMIN, HMIN)
+                Store(\_SB.HLEN, HLEN)
+                Add(HMIN, HLEN, HMAX)
+                Subtract(HMAX, One, HMAX)
+
                 Return (PRT0)
             }
 
--- a/tools/firmware/hvmloader/cacheattr.c
+++ b/tools/firmware/hvmloader/cacheattr.c
@@ -97,8 +97,7 @@ void cacheattr_init(void)
     nr_var_ranges = (uint8_t)mtrr_cap;
     if ( nr_var_ranges != 0 )
     {
-        unsigned long base = pci_mem_start, size;
-        int i;
+        uint64_t base = pci_mem_start, size;
 
         for ( i = 0; (base != pci_mem_end) && (i < nr_var_ranges); i++ )
         {
@@ -109,8 +108,22 @@ void cacheattr_init(void)
                 size >>= 1;
 
             wrmsr(MSR_MTRRphysBase(i), base);
-            wrmsr(MSR_MTRRphysMask(i),
-                  (~(uint64_t)(size-1) & addr_mask) | (1u << 11));
+            wrmsr(MSR_MTRRphysMask(i), (~(size - 1) & addr_mask) | (1u << 11));
+
+            base += size;
+        }
+
+        for ( base = pci_hi_mem_start;
+              (base != pci_hi_mem_end) && (i < nr_var_ranges); i++ )
+        {
+            size = PAGE_SIZE;
+            while ( !(base & size) )
+                size <<= 1;
+            while ( (base + size < base) || (base + size > pci_hi_mem_end) )
+                size >>= 1;
+
+            wrmsr(MSR_MTRRphysBase(i), base);
+            wrmsr(MSR_MTRRphysMask(i), (~(size - 1) & addr_mask) | (1u << 11));
 
             base += size;
         }
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -57,7 +57,7 @@ extern struct bios_config ovmf_config;
 #define PCI_MEM_END         0xfc000000
 
 extern unsigned long pci_mem_start, pci_mem_end;
-
+extern uint64_t pci_hi_mem_start, pci_hi_mem_end;
 
 /* Memory map. */
 #define SCRATCH_PHYSICAL_ADDRESS      0x00010000
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -32,6 +32,7 @@
 
 unsigned long pci_mem_start = PCI_MEM_START;
 unsigned long pci_mem_end = PCI_MEM_END;
+uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
@@ -345,9 +346,8 @@ void pci_setup(void)
                 if ( high_mem_resource.base & (bar_sz - 1) )
                     high_mem_resource.base = high_mem_resource.base - 
                         (high_mem_resource.base & (bar_sz - 1)) + bar_sz;
-                else
-                    high_mem_resource.base = high_mem_resource.base - 
-                        (high_mem_resource.base & (bar_sz - 1));
+                if ( !pci_hi_mem_start )
+                    pci_hi_mem_start = high_mem_resource.base;
                 resource = &high_mem_resource;
                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
             } 
@@ -398,6 +398,16 @@ void pci_setup(void)
         pci_writew(devfn, PCI_COMMAND, cmd);
     }
 
+    if ( pci_hi_mem_start )
+    {
+        /*
+         * Make end address alignment match the start address one's so that
+         * fewer variable range MTRRs are needed to cover the range.
+         */
+        pci_hi_mem_end = ((high_mem_resource.base - 1) |
+                          ((pci_hi_mem_start & -pci_hi_mem_start) - 1)) + 1;
+    }
+
     if ( vga_devfn != 256 )
     {
         /*

[-- Attachment #3: hvmloader-UC-up-to-4G.patch --]
[-- Type: text/plain, Size: 685 bytes --]

--- a/tools/firmware/hvmloader/cacheattr.c
+++ b/tools/firmware/hvmloader/cacheattr.c
@@ -99,12 +99,12 @@ void cacheattr_init(void)
     {
         uint64_t base = pci_mem_start, size;
 
-        for ( i = 0; (base != pci_mem_end) && (i < nr_var_ranges); i++ )
+        for ( i = 0; !(base >> 32) && (i < nr_var_ranges); i++ )
         {
             size = PAGE_SIZE;
             while ( !(base & size) )
                 size <<= 1;
-            while ( ((base + size) < base) || ((base + size) > pci_mem_end) )
+            while ( ((base + size) < base) || ((base + size - 1) >> 32) )
                 size >>= 1;
 
             wrmsr(MSR_MTRRphysBase(i), base);

[-- Attachment #4: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-16 14:36                       ` Jan Beulich
@ 2014-05-19 10:29                         ` Tomasz Wroblewski
  2014-05-19 10:38                           ` Jan Beulich
  2014-05-19 10:42                           ` Tomasz Wroblewski
  0 siblings, 2 replies; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-19 10:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


On 05/16/2014 04:36 PM, Jan Beulich wrote:
>>>> On 16.05.14 at 13:38, <JBeulich@suse.com> wrote:
>>>>> On 16.05.14 at 13:18, <tomasz.wroblewski@gmail.com> wrote:
>>>> If I coded up a patch to deal with this on -unstable, would you be
>>>> able to test that?
>>> Willing to give it a go (xen major version updates are often problematic
>>> to do though so can't promise success). What would your patch be doing?
>>> Adding entries to MTRR for the relocated regions?
>> This and properly declare the region in ACPI's _CRS. For starters I'll
>> probably try keeping the WB default overlaid with UC variable ranges,
>> as that's going to be the less intrusive change.
> Okay here are two patches - the first to deal with the above mentioned
> items, and the second to further increase correctness and at once
> shrink the number of MTRR regions needed.
>
> Afaict they apply equally well to stable-4.3, master, and staging.
>
> But to be honest I don't expect any performance improvement, all
> I'd expect is that BARs relocated above 4Gb would now get treated
> equally to such below 4Gb - UC in all cases.
Thanks Jan. I've tried the patches and you're correct, putting UC in 
MTRR for the relocated region didn't help the issue. However, I had to 
hack that in manually - the code paths to do that in your hvmloader patch 
were not activating. The hvmloader is not programming guest PCI BARs to 
64-bit regions at all, rather still programming them with 32-bit 
regions... upon a look this seems to be because the using_64bar condition, 
as well as bar64_relocate in hvmloader/pci.c, is always false.

So BAR relocation to 64-bit is not happening, but RAM relocation as per 
the code tagged as /* Relocate RAM that overlaps PCI space (in 64k-page 
chunks). */ is happening. This maybe is correct (?), although I think 
the fact that RAM is relocated but the BAR is not causes the tools (i.e. 
qemu) to lose sight of what memory is used for MMIO and, as you mentioned 
in one of the previous posts, the calls which would set it to 
mmio_direct in the p2m table are not happening. Our qemu is pretty ancient 
and doesn't support 64-bit BARs, so it's not trivial to verify 
whether relocating BARs to 64-bit would help. Trying to make sense out of 
this..

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 10:29                         ` Tomasz Wroblewski
@ 2014-05-19 10:38                           ` Jan Beulich
  2014-05-19 10:47                             ` Tomasz Wroblewski
  2014-05-19 10:42                           ` Tomasz Wroblewski
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-05-19 10:38 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: xen-devel

>>> On 19.05.14 at 12:29, <tomasz.wroblewski@gmail.com> wrote:

> On 05/16/2014 04:36 PM, Jan Beulich wrote:
>>>>> On 16.05.14 at 13:38, <JBeulich@suse.com> wrote:
>>>>>> On 16.05.14 at 13:18, <tomasz.wroblewski@gmail.com> wrote:
>>>>> If I coded up a patch to deal with this on -unstable, would you be
>>>>> able to test that?
>>>> Willing to give it a go (xen major version updates are often problematic
>>>> to do though so can't promise success). What would your patch be doing?
>>>> Adding entries to MTRR for the relocated regions?
>>> This and properly declare the region in ACPI's _CRS. For starters I'll
>>> probably try keeping the WB default overlaid with UC variable ranges,
>>> as that's going to be the less intrusive change.
>> Okay here are two patches - the first to deal with the above mentioned
>> items, and the second to further increase correctness and at once
>> shrink the number of MTRR regions needed.
>>
>> Afaict they apply equally well to stable-4.3, master, and staging.
>>
>> But to be honest I don't expect any performance improvement, all
>> I'd expect is that BARs relocated above 4Gb would now get treated
>> equally to such below 4Gb - UC in all cases.
> Thanks Jan. I've tried the patches and you're correct, putting UC in 
> MTRR for the relocated region didn't help the issue. However, I had to 
> hack that manually - the codepaths to do that in your hvmloader patch 
> were not activating. The hvmloader is not programming guest pci bars to 
> 64bit regions at all, rather still programming them with 32 bit 
> regions... upon a look this seems because using_64bar condition, as well 
> as bar64_relocate in hvmloader/pci.c is always false.

I'm confused - iirc this started out because you saw the graphics
card BARs to be put above 4Gb. And now you say they aren't being
put there. But ...

> So bar relocation to 64bit is not happening, but ram relocation as per 
> the code tagged as /* Relocate RAM that overlaps PCI space (in 64k-page 
> chunks). */ is happening. This maybe is correct (?), although I think 
> the fact that RAM is relocated but not the BAR causes the tools (i.e. 
> qemu) to lose sight of what memory is used for mmio and as you mentioned 
> in one of the previous posts, the calls which would set it to 
> mmio_direct in p2m table are not happening. Our qemu is pretty ancient 
> and doesn't support 64bit bars so its not super trivial to verify 
> whether relocating bars to 64bit would help. Trying to make sense out of 
> this..

... indeed I was apparently mis-interpreting what you said - all that
really was to be concluded from the log messages you quoted was
that RAM pages got relocated. But according to

(XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled
(XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for lowmem MMIO hole
(XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for lowmem MMIO hole

and assuming that these were all related messages, this really isn't
a sign of using 64-bit BARs yet. All it tells us is that the PCI region
gets extended from 0xf0000000-0xfc000000 to 0xe0000000-0xfc000000.

So perhaps time for sending complete logs, plus suitable information
from inside the guest of how things (RAM, MMIO, MTRRs) end up being
set up?

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 10:29                         ` Tomasz Wroblewski
  2014-05-19 10:38                           ` Jan Beulich
@ 2014-05-19 10:42                           ` Tomasz Wroblewski
  2014-05-19 11:01                             ` Jan Beulich
  1 sibling, 1 reply; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-19 10:42 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


On 05/19/2014 12:29 PM, Tomasz Wroblewski wrote:
>
> On 05/16/2014 04:36 PM, Jan Beulich wrote:
>>>>> On 16.05.14 at 13:38, <JBeulich@suse.com> wrote:
>>>>>> On 16.05.14 at 13:18, <tomasz.wroblewski@gmail.com> wrote:
>>>>> If I coded up a patch to deal with this on -unstable, would you be
>>>>> able to test that?
>>>> Willing to give it a go (xen major version updates are often 
>>>> problematic
>>>> to do though so can't promise success). What would your patch be 
>>>> doing?
>>>> Adding entries to MTRR for the relocated regions?
>>> This and properly declare the region in ACPI's _CRS. For starters I'll
>>> probably try keeping the WB default overlaid with UC variable ranges,
>>> as that's going to be the less intrusive change.
>> Okay here are two patches - the first to deal with the above mentioned
>> items, and the second to further increase correctness and at once
>> shrink the number of MTRR regions needed.
>>
>> Afaict they apply equally well to stable-4.3, master, and staging.
>>
>> But to be honest I don't expect any performance improvement, all
>> I'd expect is that BARs relocated above 4Gb would now get treated
>> equally to such below 4Gb - UC in all cases.
> Thanks Jan. I've tried the patches and you're correct, putting UC in 
> MTRR for the relocated region didn't help the issue. However, I had to 
> hack that manually - the codepaths to do that in your hvmloader patch 
> were not activating. The hvmloader is not programming guest pci bars 
> to 64bit regions at all, rather still programming them with 32 bit 
> regions... upon a look this seems because using_64bar condition, as 
> well as bar64_relocate in hvmloader/pci.c is always false.
>
> So bar relocation to 64bit is not happening, but ram relocation as per 
> the code tagged as /* Relocate RAM that overlaps PCI space (in 
> 64k-page chunks). */ is happening. This maybe is correct (?), although 
> I think the fact that RAM is relocated but not the BAR causes the 
> tools (i.e. qemu) to lose sight of what memory is used for mmio and as 
> you mentioned in one of the previous posts, the calls which would set 
> it to mmio_direct in p2m table are not happening. Our qemu is pretty 
> ancient and doesn't support 64bit bars so its not super trivial to 
> verify whether relocating bars to 64bit would help. Trying to make 
> sense out of this..
>
>
Actually it seems like the plausible explanation for the performance 
issues we see could be that
 
- some region of guest space has been relocated by hvmloader out to 
64-bit memory to enlarge the PCI MMIO hole (which stays in 32-bit space)
- BUT the caching on that relocated region is UC, since at the time of 
the relocation MTRRs were disabled, and that caused the EPT entry to get 
the UC type.
- however, since this is just some region of guest memory not actually 
used for MMIO, merely relocated out of the MMIO hole, the caching should be WB
- the guest doesn't use that region for MMIO but for some other tasks; 
access to it is slow and slows the guest down.
- as you mentioned, it might already be fixed on unstable since EPTs are 
updated there when MTRRs are enabled.
 
That would explain why reintroducing the old loop removed by XSA-60 fixes 
the perf issue: since it runs at the time MTRRs are enabled, it reverts the 
relocated region to WB (which is correct, I guess, for the non-MMIO regions)




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 10:38                           ` Jan Beulich
@ 2014-05-19 10:47                             ` Tomasz Wroblewski
  2014-05-19 11:07                               ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-19 10:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


On 05/19/2014 12:38 PM, Jan Beulich wrote:
>>>> On 19.05.14 at 12:29, <tomasz.wroblewski@gmail.com> wrote:
>> On 05/16/2014 04:36 PM, Jan Beulich wrote:
>>>>>> On 16.05.14 at 13:38, <JBeulich@suse.com> wrote:
>>>>>>> On 16.05.14 at 13:18, <tomasz.wroblewski@gmail.com> wrote:
>>>>>> If I coded up a patch to deal with this on -unstable, would you be
>>>>>> able to test that?
>>>>> Willing to give it a go (xen major version updates are often problematic
>>>>> to do though so can't promise success). What would your patch be doing?
>>>>> Adding entries to MTRR for the relocated regions?
>>>> This and properly declare the region in ACPI's _CRS. For starters I'll
>>>> probably try keeping the WB default overlaid with UC variable ranges,
>>>> as that's going to be the less intrusive change.
>>> Okay here are two patches - the first to deal with the above mentioned
>>> items, and the second to further increase correctness and at once
>>> shrink the number of MTRR regions needed.
>>>
>>> Afaict they apply equally well to stable-4.3, master, and staging.
>>>
>>> But to be honest I don't expect any performance improvement, all
>>> I'd expect is that BARs relocated above 4Gb would now get treated
>>> equally to such below 4Gb - UC in all cases.
>> Thanks Jan. I've tried the patches and you're correct, putting UC in
>> MTRR for the relocated region didn't help the issue. However, I had to
>> hack that manually - the codepaths to do that in your hvmloader patch
>> were not activating. The hvmloader is not programming guest pci bars to
>> 64bit regions at all, rather still programming them with 32 bit
>> regions... upon a look this seems because using_64bar condition, as well
>> as bar64_relocate in hvmloader/pci.c is always false.
> I'm confused - iirc this started out because you saw the graphics
> card BARs to be put above 4Gb. And now you say they aren't being
> put there. But ...
>
>> So bar relocation to 64bit is not happening, but ram relocation as per
>> the code tagged as /* Relocate RAM that overlaps PCI space (in 64k-page
>> chunks). */ is happening. This maybe is correct (?), although I think
>> the fact that RAM is relocated but not the BAR causes the tools (i.e.
>> qemu) to lose sight of what memory is used for mmio and as you mentioned
>> in one of the previous posts, the calls which would set it to
>> mmio_direct in p2m table are not happening. Our qemu is pretty ancient
>> and doesn't support 64bit bars so its not super trivial to verify
>> whether relocating bars to 64bit would help. Trying to make sense out of
>> this..
> ... indeed I was apparently mis-interpreting what you said - all that
> really was to be concluded from the log messages you quoted was
> that RAM pages got relocated. But according to
>
> (XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled
> (XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for lowmem MMIO hole
> (XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for lowmem MMIO hole
>
> and assuming that these were all related messages, this really isn't
> a sign of using 64-bit BARs yet. All it tells us is that the PCI region
> gets extended from 0xf0000000-0xfc000000 to 0xe0000000-0xfc000000.
>
> So perhaps time for sending complete logs, plus suitable information
> from inside the guest of how things (RAM, MMIO, MTRRs) end up being
> set up?
Could be, though please read the explanation I came up with in the other 
post and judge whether it's enough - I think it makes sense... 64-bit 
guest BARs are indeed not in use (confirmed from the guest). MTRR is set 
up such that only the low region is UC, which is correct.

But the RAM relocation code causes the caching on the relocated region to 
be UC instead of WB due to the timing (very early, MTRRs disabled) at 
which it runs, which is incorrect. I am thinking enabling MTRRs during 
that relocation would probably fix it on 4.3


> Jan
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 10:42                           ` Tomasz Wroblewski
@ 2014-05-19 11:01                             ` Jan Beulich
  2014-05-19 11:09                               ` Tomasz Wroblewski
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-05-19 11:01 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: xen-devel

>>> On 19.05.14 at 12:42, <tomasz.wroblewski@gmail.com> wrote:
> Actually seems to be like the plausible explanation for the performance 
> issues we see could be that
> 
> - some region of guest space has been relocated by hvmloader out to 
> 64bit memory to enlarge pci mmio hole (which stays in 32bit space)
> - BUT the caching on that relocated region is UC since at the time of 
> the relocation MTRR was disabled and that caused the EPT entry to get UC 
> type.

But this would be easily seen from 'D' debug key output - did you
check this is what is the case?

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 10:47                             ` Tomasz Wroblewski
@ 2014-05-19 11:07                               ` Jan Beulich
  2014-05-19 11:32                                 ` Tomasz Wroblewski
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-05-19 11:07 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: xen-devel

>>> On 19.05.14 at 12:47, <tomasz.wroblewski@gmail.com> wrote:
> On 05/19/2014 12:38 PM, Jan Beulich wrote:
>> So perhaps time for sending complete logs, plus suitable information
>> from inside the guest of how things (RAM, MMIO, MTRRs) end up being
>> set up?
> Could be, though please read the explanation I came up in the other post 
> whether its enough, I think it makes sense... 64bit guest BARs are 
> indeed not in use (confirmed from guest). MTRR is setup such that only 
> the low region is UC, which is correct.

Yes, that's a very sensible theory, which - as just said in the other
reply - can be easily verified.

> But the RAM relocation code causes the caching on relocated region to be 
> UC instead of WB due to the timing (very early, MTRR disabled) at which 
> it runs, which is incorrect. I am thinking enabling MTRR during that 
> relocation would probably fix it on 4.3

Except that this is a chicken and egg problem then: In order to
populate the variable range MTRRs, the BAR assignment (and hence
the prerequisite RAM relocation) need to be done already. What we
might be able to do (after careful evaluation whether backporting
what we have on -unstable wouldn't be the better option, which
first of all implies you'd need to test things there) is enable MTRRs
with WB default, but without any variable ranges set up _before_
doing the RAM relocation/BAR assignment. The main problem to solve
there, however, would still be to make sure the EPT entries get
re-created for the regions that we don't want to be WB (after the
variable ranges got set up). That, I'm afraid, would still lead us to
considerations on backporting at least some of the changes from
-unstable.

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 11:01                             ` Jan Beulich
@ 2014-05-19 11:09                               ` Tomasz Wroblewski
  2014-05-19 11:19                                 ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-19 11:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


On 05/19/2014 01:01 PM, Jan Beulich wrote:
>>>> On 19.05.14 at 12:42, <tomasz.wroblewski@gmail.com> wrote:
>> Actually seems to be like the plausible explanation for the performance
>> issues we see could be that
>>
>> - some region of guest space has been relocated by hvmloader out to
>> 64bit memory to enlarge pci mmio hole (which stays in 32bit space)
>> - BUT the caching on that relocated region is UC since at the time of
>> the relocation MTRR was disabled and that caused the EPT entry to get UC
>> type.
> But this would be easily seen from 'D' debug key output - did you
> check this is what is the case?
Yes, I've checked that, though I used some homebrew EPT dumping code 
instead. I mentioned it a bit earlier, in the original post as well as in 
http://lists.xen.org/archives/html/xen-devel/2014-05/msg01965.html. I 
just incorrectly thought that it was the BAR being relocated, which 
introduced some confusion..

> Jan
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 11:09                               ` Tomasz Wroblewski
@ 2014-05-19 11:19                                 ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2014-05-19 11:19 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: xen-devel

>>> On 19.05.14 at 13:09, <tomasz.wroblewski@gmail.com> wrote:

> On 05/19/2014 01:01 PM, Jan Beulich wrote:
>>>>> On 19.05.14 at 12:42, <tomasz.wroblewski@gmail.com> wrote:
>>> Actually seems to be like the plausible explanation for the performance
>>> issues we see could be that
>>>
>>> - some region of guest space has been relocated by hvmloader out to
>>> 64bit memory to enlarge pci mmio hole (which stays in 32bit space)
>>> - BUT the caching on that relocated region is UC since at the time of
>>> the relocation MTRR was disabled and that caused the EPT entry to get UC
>>> type.
>> But this would be easily seen from 'D' debug key output - did you
>> check this is what is the case?
> Yes I've checked that, though I used some homebrew EPT dumping code 
> instead. Mentioned it a bit earlier, in the original post as well as 
> http://lists.xen.org/archives/html/xen-devel/2014-05/msg01965.html. I 
> just incorrectly thought that it's the BAR being relocated which 
> introduced some confusion..

Ah, okay. So then the question is how to address this. Which means
it now becomes really important to be certain that this is already
fixed on -unstable, so we can evaluate whether we want to backport
some non-trivial changes, or find a workaround just for the stable
trees (which, as said before, I can't easily see how it would look like).

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 11:07                               ` Jan Beulich
@ 2014-05-19 11:32                                 ` Tomasz Wroblewski
  2014-05-19 12:06                                   ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-19 11:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


On 05/19/2014 01:07 PM, Jan Beulich wrote:
>>>> On 19.05.14 at 12:47, <tomasz.wroblewski@gmail.com> wrote:
>> On 05/19/2014 12:38 PM, Jan Beulich wrote:
>>> So perhaps time for sending complete logs, plus suitable information
>>> from inside the guest of how things (RAM, MMIO, MTRRs) end up being
>>> set up?
>> Could be, though please read the explanation I came up in the other post
>> whether its enough, I think it makes sense... 64bit guest BARs are
>> indeed not in use (confirmed from guest). MTRR is setup such that only
>> the low region is UC, which is correct.
> Yes, that's a very sensible theory, which - as just said in the other
> reply - can be easily verified.
>
>> But the RAM relocation code causes the caching on relocated region to be
>> UC instead of WB due to the timing (very early, MTRR disabled) at which
>> it runs, which is incorrect. I am thinking enabling MTRR during that
>> relocation would probably fix it on 4.3
> Except that this is a chicken and egg problem then: In order to
> populate the variable range MTRRs, the BAR assignment (and hence
> the prerequisite RAM relocation) need to be done already.
I am not sure; looking at the hvmloader code, wouldn't it be possible to 
calculate the BAR locations first, then update the MTRR variable ranges 
and enable them, and only then actually write the BAR registers (from the 
precalculated info)? Presumably it's only the write part which needs to 
be done after relocation, as it causes qemu to set up MMIO etc.

> What we
> might be able to do (after careful evaluation whether backporting
> what we have on -unstable wouldn't be the better option, which
> first of all implies you'd need to test things there) is enable MTRRs
> with WB default, but without any variable ranges set up _before_
> doing the RAM relocation/BAR assignment. The main problem to solve
> there, however, would still be to make sure the EPT entries get
> re-created for the regions that we don't want to be WB (after the
> variable ranges got set up). That, I'm afraid, would still lead us to
> considerations on backporting at least some of the changes from
> -unstable.
Yeah, I gave about a day of effort to porting us onto unstable and testing 
there, but it sadly looks to be a bigger job, so I'm leaving that as a 
last resort (though planning to spend a couple more days on it soon).


> Jan
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 11:32                                 ` Tomasz Wroblewski
@ 2014-05-19 12:06                                   ` Jan Beulich
  2014-05-19 12:17                                     ` Tomasz Wroblewski
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-05-19 12:06 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: xen-devel

>>> On 19.05.14 at 13:32, <tomasz.wroblewski@gmail.com> wrote:

> On 05/19/2014 01:07 PM, Jan Beulich wrote:
>>>>> On 19.05.14 at 12:47, <tomasz.wroblewski@gmail.com> wrote:
>>> On 05/19/2014 12:38 PM, Jan Beulich wrote:
>>>> So perhaps time for sending complete logs, plus suitable information
>>>> from inside the guest of how things (RAM, MMIO, MTRRs) end up being
>>>> set up?
>>> Could be, though please read the explanation I came up in the other post
>>> whether its enough, I think it makes sense... 64bit guest BARs are
>>> indeed not in use (confirmed from guest). MTRR is setup such that only
>>> the low region is UC, which is correct.
>> Yes, that's a very sensible theory, which - as just said in the other
>> reply - can be easily verified.
>>
>>> But the RAM relocation code causes the caching on relocated region to be
>>> UC instead of WB due to the timing (very early, MTRR disabled) at which
>>> it runs, which is incorrect. I am thinking enabling MTRR during that
>>> relocation would probably fix it on 4.3
>> Except that this is a chicken and egg problem then: In order to
>> populate the variable range MTRRs, the BAR assignment (and hence
>> the prerequisite RAM relocation) need to be done already.
> I am not sure; looking at hvmloader code, wouldn't it be possible to 
> calculate the BAR locations first, then update the MTRR var ranges and 
> enable it, and only then actually write the BAR registers (from 
> precalculated info)? Presumably it's only the write part which needs to 
> be done after relocation as it causes qemu to setup mmio etc.

Leaving aside that this would require splitting pci_setup(), and
hence communicating state from its main part (RAM relocation and
resource allocation) to the final one (BAR writing), which by itself is
already not as simple a change as one would like for something that
is intended to go _only_ into the stable trees, you also already
imply with the above that we'd add a pre-enabling step for the
MTRRs. I.e. we'd end up with

- enable fixed-range MTRRs and set default to WB (no var ranges)
- pci_setup_early()
- set variable range MTRRs
- pci_setup_late()
- set MTRRs in one go on APs

Yes, that ought to work. But do we want this much diverging from
-unstable on 4.3 and 4.4? Are we certain that namely the two-stage
MTRR setup won't have any unintended side effects?

> Yeah I gave about a day of effort to port us onto unstable and test 
> there but it sadly looks to be a bigger job, so leaving that as a last 
> resort (though planning to spend couple more days on it soon).

Then as an alternative did you try pulling over the EPT changes
from -unstable?

Jan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 12:06                                   ` Jan Beulich
@ 2014-05-19 12:17                                     ` Tomasz Wroblewski
  2014-05-19 12:44                                       ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-19 12:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


On 05/19/2014 02:06 PM, Jan Beulich wrote:
>>>> On 19.05.14 at 13:32, <tomasz.wroblewski@gmail.com> wrote:
>> On 05/19/2014 01:07 PM, Jan Beulich wrote:
>>>>>> On 19.05.14 at 12:47, <tomasz.wroblewski@gmail.com> wrote:
>>>> On 05/19/2014 12:38 PM, Jan Beulich wrote:
>>>>> So perhaps time for sending complete logs, plus suitable information
>>>>> from inside the guest of how things (RAM, MMIO, MTRRs) end up being
>>>>> set up?
>>>> Could be, though please read the explanation I came up in the other post
>>>> whether its enough, I think it makes sense... 64bit guest BARs are
>>>> indeed not in use (confirmed from guest). MTRR is setup such that only
>>>> the low region is UC, which is correct.
>>> Yes, that's a very sensible theory, which - as just said in the other
>>> reply - can be easily verified.
>>>
>>>> But the RAM relocation code causes the caching on relocated region to be
>>>> UC instead of WB due to the timing (very early, MTRR disabled) at which
>>>> it runs, which is incorrect. I am thinking enabling MTRR during that
>>>> relocation would probably fix it on 4.3
>>> Except that this is a chicken and egg problem then: In order to
>>> populate the variable range MTRRs, the BAR assignment (and hence
>>> the prerequisite RAM relocation) need to be done already.
>> I am not sure; looking at hvmloader code, wouldn't it be possible to
>> calculate the BAR locations first, then update the MTRR var ranges and
>> enable it, and only then actually write the BAR registers (from
>> precalculated info)? Presumably it's only the write part which needs to
>> be done after relocation as it causes qemu to setup mmio etc.
> Leaving aside that this would require splitting pci_setup(), and
> hence communicating state from its main part (RAM relocation and
> resource allocation) to the final one (BAR writing), which by itself is
> already not as simple a change as one would like for something that
> is intended to go _only_ into the stable trees, you also already
> imply with the above that we'd add a pre-enabling step for the
> MTRRs. I.e. we'd end up with
>
> - enable fixed-range MTRRs and set default to WB (no var ranges)
> - pci_setup_early()
> - set variable range MTRRs
> - pci_setup_late()
> - set MTRRs in one go on APs
>
> Yes, that ought to work. But do we want this much diverging from
> -unstable on 4.3 and 4.4? Are we certain that namely the two-stage
> MTRR setup won't have any unintended side effects?
>
>> Yeah I gave about a day of effort to port us onto unstable and test
>> there but it sadly looks to be a bigger job, so leaving that as a last
>> resort (though planning to spend couple more days on it soon).
> Then as an alternative did you try pulling over the EPT changes
> from -unstable?
That would indeed be preferable; I've looked over them but couldn't 
figure out which particular change fixes the EPT update after MTRR 
enable. Do you remember which one that was? I could test it and try to 
narrow down any other commits it would require (there seem to have been 
a lot of EPT-related changes).


> Jan
>

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 12:17                                     ` Tomasz Wroblewski
@ 2014-05-19 12:44                                       ` Jan Beulich
  2014-05-19 14:20                                         ` Tomasz Wroblewski
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2014-05-19 12:44 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: xen-devel

>>> On 19.05.14 at 14:17, <tomasz.wroblewski@gmail.com> wrote:
> On 05/19/2014 02:06 PM, Jan Beulich wrote:
>>>>> On 19.05.14 at 13:32, <tomasz.wroblewski@gmail.com> wrote:
>>> Yeah I gave about a day of effort to port us onto unstable and test
>>> there but it sadly looks to be a bigger job, so leaving that as a last
>>> resort (though planning to spend couple more days on it soon).
>> Then as an alternative did you try pulling over the EPT changes
>> from -unstable?
> That would be indeed preferable, I've looked over them but couldn't 
> figure out which particular change would fix the EPT update after MTRR 
> enable. Do you remember which that was? I could test it and try to 
> narrow any other commits it'd require (seems there were a lot of ept 
> related changes)

I used plural for a reason - I'm afraid you would need to start out with
taking them all, and then possibly determine which ones to drop as
being unrelated to the issue at hand.

Jan

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 12:44                                       ` Jan Beulich
@ 2014-05-19 14:20                                         ` Tomasz Wroblewski
  2014-05-19 15:24                                           ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-19 14:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 1561 bytes --]


On 05/19/2014 02:44 PM, Jan Beulich wrote:
>>>> On 19.05.14 at 14:17, <tomasz.wroblewski@gmail.com> wrote:
>> On 05/19/2014 02:06 PM, Jan Beulich wrote:
>>>>>> On 19.05.14 at 13:32, <tomasz.wroblewski@gmail.com> wrote:
>>>> Yeah I gave about a day of effort to port us onto unstable and test
>>>> there but it sadly looks to be a bigger job, so leaving that as a last
>>>> resort (though planning to spend couple more days on it soon).
>>> Then as an alternative did you try pulling over the EPT changes
>>> from -unstable?
>> That would be indeed preferable, I've looked over them but couldn't
>> figure out which particular change would fix the EPT update after MTRR
>> enable. Do you remember which that was? I could test it and try to
>> narrow any other commits it'd require (seems there were a lot of ept
>> related changes)
> I used plural for a reason - I'm afraid you would need to start out with
> taking them all, and then possibly determine which ones to drop as
> being unrelated to the issue at hand.
Looks like a partial backport of your commit

commit aa9114edd97b292cd89b3616e3f2089471fd2201
Author: Jan Beulich <jbeulich@suse.com>
Date:   Thu Apr 10 16:01:41 2014 +0200

     x86/EPT: force re-evaluation of memory type as necessary

is all that's necessary. Attaching it versus 4.3.2. I only left the 
memory_type_changed() calls in MTRR-related areas, since only those are 
problematic for this particular issue. This is probably good enough for 
us, thanks for the pointers! Do you think this one is relatively safe 
for the stable branches?


[-- Attachment #2: ept-bp-aa9114edd97b292cd89b3616e3f2089471fd2201 --]
[-- Type: text/plain, Size: 13286 bytes --]

@@ -3249,13 +3250,13 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
     case MSR_MTRRdefType:
         if ( !mtrr )
             goto gp_fault;
-        if ( !mtrr_def_type_msr_set(&v->arch.hvm_vcpu.mtrr, msr_content) )
+        if ( !mtrr_def_type_msr_set(v->domain, &v->arch.hvm_vcpu.mtrr, msr_content) )
            goto gp_fault;
         break;
     case MSR_MTRRfix64K_00000:
         if ( !mtrr )
             goto gp_fault;
-        if ( !mtrr_fix_range_msr_set(&v->arch.hvm_vcpu.mtrr, 0, msr_content) )
+        if ( !mtrr_fix_range_msr_set(v->domain, &v->arch.hvm_vcpu.mtrr, 0, msr_content) )
             goto gp_fault;
         break;
     case MSR_MTRRfix16K_80000:
@@ -3263,7 +3264,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         if ( !mtrr )
             goto gp_fault;
         index = msr - MSR_MTRRfix16K_80000 + 1;
-        if ( !mtrr_fix_range_msr_set(&v->arch.hvm_vcpu.mtrr,
+        if ( !mtrr_fix_range_msr_set(v->domain, &v->arch.hvm_vcpu.mtrr,
                                      index, msr_content) )
             goto gp_fault;
         break;
@@ -3271,7 +3272,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
         if ( !mtrr )
             goto gp_fault;
         index = msr - MSR_MTRRfix4K_C0000 + 3;
-        if ( !mtrr_fix_range_msr_set(&v->arch.hvm_vcpu.mtrr,
+        if ( !mtrr_fix_range_msr_set(v->domain, &v->arch.hvm_vcpu.mtrr,
                                      index, msr_content) )
             goto gp_fault;
         break;
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 83ff1ff..6763322 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -403,7 +403,7 @@ uint32_t get_pat_flags(struct vcpu *v,
     return pat_type_2_pte_flags(pat_entry_value);
 }
 
-bool_t mtrr_def_type_msr_set(struct mtrr_state *m, uint64_t msr_content)
+bool_t mtrr_def_type_msr_set(struct domain *d, struct mtrr_state *m, uint64_t msr_content)
 {
     uint8_t def_type = msr_content & 0xff;
     uint8_t enabled = (msr_content >> 10) & 0x3;
@@ -422,13 +422,17 @@ bool_t mtrr_def_type_msr_set(struct mtrr_state *m, uint64_t msr_content)
          return 0;
     }
 
-    m->enabled = enabled;
-    m->def_type = def_type;
+    if ( m->enabled != enabled || m->def_type != def_type )
+    {
+        m->enabled = enabled;
+        m->def_type = def_type;
+        memory_type_changed(d);
+    }
 
     return 1;
 }
 
-bool_t mtrr_fix_range_msr_set(struct mtrr_state *m, uint32_t row,
+bool_t mtrr_fix_range_msr_set(struct domain *d, struct mtrr_state *m, uint32_t row,
                               uint64_t msr_content)
 {
     uint64_t *fixed_range_base = (uint64_t *)m->fixed_ranges;
@@ -447,6 +451,7 @@ bool_t mtrr_fix_range_msr_set(struct mtrr_state *m, uint32_t row,
         }
 
         fixed_range_base[row] = msr_content;
+        memory_type_changed(d);
     }
 
     return 1;
@@ -488,6 +493,8 @@ bool_t mtrr_var_range_msr_set(
 
     m->overlapped = is_var_mtrr_overlapped(m);
 
+    memory_type_changed(d);
+
     return 1;
 }
 
@@ -662,7 +669,7 @@ static int hvm_load_mtrr_msr(struct domain *d, hvm_domain_context_t *h)
     mtrr_state->mtrr_cap = hw_mtrr.msr_mtrr_cap;
 
     for ( i = 0; i < NUM_FIXED_MSR; i++ )
-        mtrr_fix_range_msr_set(mtrr_state, i, hw_mtrr.msr_mtrr_fixed[i]);
+        mtrr_fix_range_msr_set(d, mtrr_state, i, hw_mtrr.msr_mtrr_fixed[i]);
 
     for ( i = 0; i < MTRR_VCNT; i++ )
     {
@@ -672,7 +679,7 @@ static int hvm_load_mtrr_msr(struct domain *d, hvm_domain_context_t *h)
                 MTRRphysMask_MSR(i), hw_mtrr.msr_mtrr_var[i*2+1]);
     }
 
-    mtrr_def_type_msr_set(mtrr_state, hw_mtrr.msr_mtrr_def_type);
+    mtrr_def_type_msr_set(d, mtrr_state, hw_mtrr.msr_mtrr_def_type);
 
     return 0;
 }
@@ -680,6 +687,12 @@ static int hvm_load_mtrr_msr(struct domain *d, hvm_domain_context_t *h)
 HVM_REGISTER_SAVE_RESTORE(MTRR, hvm_save_mtrr_msr, hvm_load_mtrr_msr,
                           1, HVMSR_PER_VCPU);
 
+void memory_type_changed(struct domain *d)
+{
+    if ( iommu_enabled && !iommu_snoop && d->vcpu && d->vcpu[0] )
+        p2m_memory_type_changed(d);
+}
+
 uint8_t epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
                            uint8_t *ipat, bool_t direct_mmio)
 {
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 03216b6..2e9d12d 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2907,6 +2907,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         break;
     }
 
+    case EXIT_REASON_EPT_MISCONFIG:
+    {
+        paddr_t gpa = __vmread(GUEST_PHYSICAL_ADDRESS);
+        if ( !ept_handle_misconfig(gpa) )
+            goto exit_and_crash;
+        break;
+    }
+
     case EXIT_REASON_MONITOR_TRAP_FLAG:
         v->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
         vmx_update_cpu_exec_control(v);
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 92d9e2d..16ba317 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -271,6 +271,125 @@ static int ept_next_level(struct p2m_domain *p2m, bool_t read_only,
     return GUEST_TABLE_NORMAL_PAGE;
 }
 
+static bool_t ept_invalidate_emt(mfn_t mfn)
+{
+    ept_entry_t *epte = map_domain_page(mfn_x(mfn));
+    unsigned int i;
+    bool_t changed = 0;
+
+    for ( i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
+    {
+        ept_entry_t e = atomic_read_ept_entry(&epte[i]);
+
+        if ( !is_epte_valid(&e) || !is_epte_present(&e) ||
+             e.emt == MTRR_NUM_TYPES )
+            continue;
+
+        e.emt = MTRR_NUM_TYPES;
+        atomic_write_ept_entry(&epte[i], e);
+        changed = 1;
+    }
+
+    unmap_domain_page(epte);
+
+    return changed;
+}
+
+bool_t ept_handle_misconfig(uint64_t gpa)
+{
+    struct vcpu *curr = current;
+    struct p2m_domain *p2m = p2m_get_hostp2m(curr->domain);
+    struct ept_data *ept = &p2m->ept;
+    unsigned int level = ept_get_wl(ept);
+    unsigned long gfn = PFN_DOWN(gpa);
+    unsigned long mfn = ept_get_asr(ept);
+    ept_entry_t *epte;
+    int okay;
+
+    if ( !mfn )
+        return 0;
+
+    p2m_lock(p2m);
+
+    okay = -curr->arch.hvm_vmx.ept_spurious_misconfig;
+    for ( ; ; --level )
+    {
+        ept_entry_t e;
+        unsigned int i;
+
+        epte = map_domain_page(mfn);
+        i = (gfn >> (level * EPT_TABLE_ORDER)) & (EPT_PAGETABLE_ENTRIES - 1);
+        e = atomic_read_ept_entry(&epte[i]);
+
+        if ( level == 0 || is_epte_superpage(&e) )
+        {
+            uint8_t ipat = 0;
+
+            if ( e.emt != MTRR_NUM_TYPES )
+                break;
+
+            if ( level == 0 )
+            {
+                for ( gfn -= i, i = 0; i < EPT_PAGETABLE_ENTRIES; ++i )
+                {
+                    e = atomic_read_ept_entry(&epte[i]);
+                    if ( e.emt == MTRR_NUM_TYPES )
+                        e.emt = 0;
+                    if ( !is_epte_valid(&e) || !is_epte_present(&e) )
+                        continue;
+                    e.emt = epte_get_entry_emt(p2m->domain, gfn + i,
+                                               _mfn(e.mfn), &ipat,
+                                               e.sa_p2mt == p2m_mmio_direct);
+                    e.ipat = ipat;
+                    atomic_write_ept_entry(&epte[i], e);
+                }
+            }
+            else
+            {
+                e.emt = epte_get_entry_emt(p2m->domain, gfn, _mfn(e.mfn),
+                                           &ipat,
+                                           e.sa_p2mt == p2m_mmio_direct);
+                e.ipat = ipat;
+                atomic_write_ept_entry(&epte[i], e);
+            }
+
+            okay = 1;
+            break;
+        }
+
+        if ( e.emt == MTRR_NUM_TYPES )
+        {
+            ASSERT(is_epte_present(&e));
+            ept_invalidate_emt(_mfn(e.mfn));
+            smp_wmb();
+            e.emt = 0;
+            atomic_write_ept_entry(&epte[i], e);
+            unmap_domain_page(epte);
+            okay = 1;
+        }
+        else if ( is_epte_present(&e) && !e.emt )
+            unmap_domain_page(epte);
+        else
+            break;
+
+        mfn = e.mfn;
+    }
+
+    unmap_domain_page(epte);
+    if ( okay > 0 )
+    {
+        struct vcpu *v;
+
+        for_each_vcpu ( curr->domain, v )
+            v->arch.hvm_vmx.ept_spurious_misconfig = 1;
+    }
+    curr->arch.hvm_vmx.ept_spurious_misconfig = 0;
+    ept_sync_domain(p2m);
+    p2m_unlock(p2m);
+
+    return !!okay;
+}
+
 /*
  * ept_set_entry() computes 'need_modify_vtd_table' for itself,
  * by observing whether any gfn->mfn translations are modified.
@@ -687,6 +806,17 @@ static void ept_change_entry_type_global(struct p2m_domain *p2m,
     ept_sync_domain(p2m);
 }
 
+static void ept_memory_type_changed(struct p2m_domain *p2m)
+{
+    unsigned long mfn = ept_get_asr(&p2m->ept);
+
+    if ( !mfn )
+        return;
+
+    if ( ept_invalidate_emt(_mfn(mfn)) )
+        ept_sync_domain(p2m);
+}
+
 static void __ept_sync_domain(void *info)
 {
     struct ept_data *ept = &((struct p2m_domain *)info)->ept;
@@ -724,6 +854,7 @@ int ept_p2m_init(struct p2m_domain *p2m)
     p2m->set_entry = ept_set_entry;
     p2m->get_entry = ept_get_entry;
     p2m->change_entry_type_global = ept_change_entry_type_global;
+    p2m->memory_type_changed = ept_memory_type_changed;
     p2m->audit_p2m = NULL;
 
     /* Set the memory type used when accessing EPT paging structures. */
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index f5ddd20..a3ecb36 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -200,6 +200,18 @@ void p2m_change_entry_type_global(struct domain *d,
     p2m_unlock(p2m);
 }
 
+void p2m_memory_type_changed(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    if ( p2m->memory_type_changed )
+    {
+        p2m_lock(p2m);
+        p2m->memory_type_changed(p2m);
+        p2m_unlock(p2m);
+    }
+}
+
 mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn,
                     p2m_type_t *t, p2m_access_t *a, p2m_query_t q,
                     unsigned int *page_order, bool_t locked)
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 4d55573..0d85347 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -124,6 +124,9 @@ struct arch_vmx_struct {
 
     unsigned long        host_cr0;
 
+    /* Do we need to tolerate a spurious EPT_MISCONFIG VM exit? */
+    bool_t               ept_spurious_misconfig;
+
     /* Is the guest in real mode? */
     uint8_t              vmx_realmode;
     /* Are we emulating rather than VMENTERing? */
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index f4d759b..55daed9 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -455,6 +455,7 @@ int ept_p2m_init(struct p2m_domain *p2m);
 void ept_p2m_uninit(struct p2m_domain *p2m);
 
 void ept_walk_table(struct domain *d, unsigned long gfn);
+bool_t ept_handle_misconfig(uint64_t gpa);
 void setup_ept_dump(void);
 
 void update_guest_eip(void);
diff --git a/xen/include/asm-x86/mtrr.h b/xen/include/asm-x86/mtrr.h
index 6b4d632..a6f426e 100644
--- a/xen/include/asm-x86/mtrr.h
+++ b/xen/include/asm-x86/mtrr.h
@@ -78,10 +78,11 @@ extern void mtrr_bp_restore(void);
 extern bool_t mtrr_var_range_msr_set(
     struct domain *d, struct mtrr_state *m,
     uint32_t msr, uint64_t msr_content);
-extern bool_t mtrr_fix_range_msr_set(struct mtrr_state *v,
+extern bool_t mtrr_fix_range_msr_set(struct domain *d, struct mtrr_state *v,
 				uint32_t row, uint64_t msr_content);
-extern bool_t mtrr_def_type_msr_set(struct mtrr_state *v, uint64_t msr_content);
+extern bool_t mtrr_def_type_msr_set(struct domain *d, struct mtrr_state *v, uint64_t msr_content);
 extern bool_t pat_msr_set(uint64_t *pat, uint64_t msr);
+extern void memory_type_changed(struct domain *);
 
 bool_t is_var_mtrr_overlapped(struct mtrr_state *m);
 bool_t mtrr_pat_not_equal(struct vcpu *vd, struct vcpu *vs);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index f4e7253..facd318 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -233,6 +233,7 @@ struct p2m_domain {
     void               (*change_entry_type_global)(struct p2m_domain *p2m,
                                                    p2m_type_t ot,
                                                    p2m_type_t nt);
+    void               (*memory_type_changed)(struct p2m_domain *p2m);
     
     void               (*write_p2m_entry)(struct p2m_domain *p2m,
                                           unsigned long gfn, l1_pgentry_t *p,
@@ -506,6 +507,9 @@ void p2m_change_type_range(struct domain *d,
 p2m_type_t p2m_change_type(struct domain *d, unsigned long gfn,
                            p2m_type_t ot, p2m_type_t nt);
 
+/* Report a change affecting memory types. */
+void p2m_memory_type_changed(struct domain *d);
+
 /* Set mmio addresses in the p2m table (for pass-through) */
 int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn);

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 14:20                                         ` Tomasz Wroblewski
@ 2014-05-19 15:24                                           ` Jan Beulich
  2014-05-19 15:48                                             ` Tomasz Wroblewski
  2014-05-19 17:36                                             ` Tim Deegan
  0 siblings, 2 replies; 31+ messages in thread
From: Jan Beulich @ 2014-05-19 15:24 UTC (permalink / raw)
  To: Tomasz Wroblewski, Tim Deegan; +Cc: xen-devel

>>> On 19.05.14 at 16:20, <tomasz.wroblewski@gmail.com> wrote:
> On 05/19/2014 02:44 PM, Jan Beulich wrote:
>> I used plural for a reason - I'm afraid you would need to start out with
>> taking them all, and then possibly determine which ones to drop as
>> being unrelated to the issue at hand.
> Looks like a partial backport of your commit
> 
> commit aa9114edd97b292cd89b3616e3f2089471fd2201
> Author: Jan Beulich <jbeulich@suse.com>
> Date:   Thu Apr 10 16:01:41 2014 +0200
> 
>      x86/EPT: force re-evaluation of memory type as necessary
> 
> is all that's necessary. Attaching it versus 4.3.2. I only left the 
> memory_type_change calls in MTRR related areas, since only this is 
> problematic for the particular issue. This is probably good enough for 
> us, thanks for the pointers! Do you think this one is a relatively safe 
> for the stable branches?

I'm rather reluctant to put in any half-baked stuff like this - in going
through the set of changes you certainly noticed that there are
quite a few more fixes that all deal with similar problems. So the
partial (and amended) backport you provided is really of the sort
"my problem is fixed, let's ignore everything else"...

Independently of this - Tim, do you think this EPT misconfig stuff can
already be considered mature enough for backporting? It's been in a
little over a month, and considering that we're using formally
undefined behavior here, I'd rather view this as not yet a backport
candidate.

Jan

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 15:24                                           ` Jan Beulich
@ 2014-05-19 15:48                                             ` Tomasz Wroblewski
  2014-05-19 17:36                                             ` Tim Deegan
  1 sibling, 0 replies; 31+ messages in thread
From: Tomasz Wroblewski @ 2014-05-19 15:48 UTC (permalink / raw)
  To: Jan Beulich, Tim Deegan; +Cc: xen-devel


On 05/19/2014 05:24 PM, Jan Beulich wrote:
>>>> On 19.05.14 at 16:20, <tomasz.wroblewski@gmail.com> wrote:
>> On 05/19/2014 02:44 PM, Jan Beulich wrote:
>>> I used plural for a reason - I'm afraid you would need to start out with
>>> taking them all, and then possibly determine which ones to drop as
>>> being unrelated to the issue at hand.
>> Looks like a partial backport of your commit
>>
>> commit aa9114edd97b292cd89b3616e3f2089471fd2201
>> Author: Jan Beulich <jbeulich@suse.com>
>> Date:   Thu Apr 10 16:01:41 2014 +0200
>>
>>       x86/EPT: force re-evaluation of memory type as necessary
>>
>> is all that's necessary. Attaching it versus 4.3.2. I only left the
>> memory_type_change calls in MTRR related areas, since only this is
>> problematic for the particular issue. This is probably good enough for
>> us, thanks for the pointers! Do you think this one is a relatively safe
>> for the stable branches?
> I'm rather reluctant to put in any half-baked stuff like this - in going
> through the set of changes you certainly noticed that there are
> quite a few more fixes, that all deal with similar problems. So the
> partial (and amended) backport you provided is really of the sort
> "my problem is fixed, let's ignore everything else"...
Yup. Backporting the smallest viable fix seems an OK solution for us, so 
I did that and we'll test it a bit, but I fully agree it's likely not 
the best thing for the upstream project, especially since that code is 
very fresh. Given there doesn't seem to be much noise about these recent 
(and very obvious) VM performance regressions with PCI passthrough and 
large amounts of memory, it may be too unpopular a use case to justify a 
risky backport.


> Independently of this - Tim, do you think this EPT misconfig stuff can
> already be considered mature enough for backporting? It's been in a
> little over a month, and considering that we're using formally
> undefined behavior here, I'd rather view this as not yet a backport
> candidate.
>
> Jan
>

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 15:24                                           ` Jan Beulich
  2014-05-19 15:48                                             ` Tomasz Wroblewski
@ 2014-05-19 17:36                                             ` Tim Deegan
  2014-05-20  6:31                                               ` Jan Beulich
  1 sibling, 1 reply; 31+ messages in thread
From: Tim Deegan @ 2014-05-19 17:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Tomasz Wroblewski

At 16:24 +0100 on 19 May (1400513093), Jan Beulich wrote:
> Independently of this - Tim, do you think this EPT misconfig stuff can
> already be considered mature enough for backporting? It's been in a
> little over a month, and considering that we're using formally
> undefined behavior here, I'd rather view this as not yet a backport
> candidate.

I agree -- it's a pretty major set of changes and I don't think it's
ready for a stable branch yet.  Of course, it depends on what the
alternatives are...

Tim.

* Re: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
  2014-05-19 17:36                                             ` Tim Deegan
@ 2014-05-20  6:31                                               ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2014-05-20  6:31 UTC (permalink / raw)
  To: Tim Deegan; +Cc: xen-devel, Tomasz Wroblewski

>>> On 19.05.14 at 19:36, <tim@xen.org> wrote:
> At 16:24 +0100 on 19 May (1400513093), Jan Beulich wrote:
>> Independently of this - Tim, do you think this EPT misconfig stuff can
>> already be considered mature enough for backporting? It's been in a
>> little over a month, and considering that we're using formally
>> undefined behavior here, I'd rather view this as not yet a backport
>> candidate.
> 
> I agree -- it's a pretty major set of changes and I don't think it's
> ready for a stable branch yet.   Of course, it depends what the
> alternatives are...

If you go back in the thread, you'll find that the only alternative
found so far would be to split hvmloader's pci_setup() and
cacheattr_init(), with the downside of this becoming a stable-branch-only
change (and the uncertainty of it having some unexpected side
effects).

Jan

end of thread, other threads:[~2014-05-20  6:31 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-15  9:11 GPU passthrough performance regression in >4GB vms due to XSA-60 changes Tomasz Wroblewski
2014-05-15 12:32 ` Jan Beulich
2014-05-15 12:10   ` Tomasz Wroblewski
2014-05-15 13:23     ` Jan Beulich
2014-05-15 13:39       ` Tomasz Wroblewski
2014-05-15 14:34         ` Tomasz Wroblewski
2014-05-15 14:56           ` Tomasz Wroblewski
2014-05-15 16:07             ` Jan Beulich
2014-05-15 15:39               ` Tomasz Wroblewski
2014-05-16  6:33                 ` Jan Beulich
2014-05-16 11:18                   ` Tomasz Wroblewski
2014-05-16 11:38                     ` Jan Beulich
2014-05-16 14:36                       ` Jan Beulich
2014-05-19 10:29                         ` Tomasz Wroblewski
2014-05-19 10:38                           ` Jan Beulich
2014-05-19 10:47                             ` Tomasz Wroblewski
2014-05-19 11:07                               ` Jan Beulich
2014-05-19 11:32                                 ` Tomasz Wroblewski
2014-05-19 12:06                                   ` Jan Beulich
2014-05-19 12:17                                     ` Tomasz Wroblewski
2014-05-19 12:44                                       ` Jan Beulich
2014-05-19 14:20                                         ` Tomasz Wroblewski
2014-05-19 15:24                                           ` Jan Beulich
2014-05-19 15:48                                             ` Tomasz Wroblewski
2014-05-19 17:36                                             ` Tim Deegan
2014-05-20  6:31                                               ` Jan Beulich
2014-05-19 10:42                           ` Tomasz Wroblewski
2014-05-19 11:01                             ` Jan Beulich
2014-05-19 11:09                               ` Tomasz Wroblewski
2014-05-19 11:19                                 ` Jan Beulich
2014-05-15 16:01         ` Jan Beulich
