* AMD Ryzen KVM/NPT/IOMMU issue
@ 2017-05-03 14:37 Matthias Ehrenfeuchter
       [not found] ` <575f8fbc-0fdc-f336-e3da-53f27da4b2e1-5Zrl/DuVEGLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Matthias Ehrenfeuchter @ 2017-05-03 14:37 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi,

There are a lot of messages/threads out there about bad performance
when using AMD's Ryzen with KVM GPU passthrough. It all revolves around
enabling/disabling npt: with npt enabled, overall VM performance is
nice, but GPU performance is only about 20% of what I get with npt
disabled (with a lot of drops to zero GPU usage while CPU/disk/RAM are
also doing nothing). With npt disabled, however, overall VM performance
is like being on a 486 with a floppy disk as the only storage. (E.g. it
takes 2 seconds just to open the Start menu while host and VM are idle,
and neither CPU pinning, changing the CPU model, changing the storage
device, nor using hugepages changed anything.)

So everything I have read points to a bug in the npt implementation.
Is there anything I can do to narrow down what is causing this?

Best Regards

efeu

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found] ` <575f8fbc-0fdc-f336-e3da-53f27da4b2e1-5Zrl/DuVEGLQT0dZR+AlfA@public.gmane.org>
@ 2017-05-03 16:28   ` Nick Sarnie
       [not found]     ` <CAOcCaLbdi9KZoXiV5htjShc_mYvZ5jK2B3Ot7NeM=3v_ZA39aA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-05-05 17:27     ` Alex Williamson
  0 siblings, 2 replies; 29+ messages in thread
From: Nick Sarnie @ 2017-05-03 16:28 UTC (permalink / raw)
  To: Matthias Ehrenfeuchter; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

I heard from Joerg that it might be related to a lower intercept rate
being used when NPT is enabled, but we haven't been able to find a way
to trace that to confirm.
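
One possible way to get at the intercept rate is to count VM exits per
exit reason under npt=0 and npt=1 and compare the two. A minimal sketch,
assuming the host exposes the kvm:kvm_exit tracepoint and that each trace
line contains the text "kvm_exit: ... reason <name>"; the exact line format
varies by kernel version, and the sample lines below are made up purely to
show the input shape:

```python
import re
from collections import Counter

# Assumed line shape: "... kvm_exit: [vcpu N] reason <name> rip 0x..."
_EXIT_RE = re.compile(r"kvm_exit:.*?\breason\s+(\S+)")


def count_exit_reasons(lines):
    """Count VM exits per exit reason in an iterable of trace lines."""
    counts = Counter()
    for line in lines:
        match = _EXIT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def exits_per_second(counts, duration_s):
    """Turn raw counts into per-second rates for a capture lasting duration_s."""
    return {reason: n / duration_s for reason, n in counts.items()}


# Made-up sample lines (not real trace output from this thread).
sample = [
    "qemu-1234 [003] 100.000001: kvm_exit: reason npf rip 0xffffffff81000000 info 0 0",
    "qemu-1234 [003] 100.000105: kvm_exit: reason hlt rip 0xffffffff81000010 info 0 0",
    "qemu-1234 [003] 100.000200: kvm_exit: reason npf rip 0xffffffff81000020 info 0 0",
    "qemu-1234 [003] 100.000300: kvm_entry: vcpu 0",
]
counts = count_exit_reasons(sample)
```

Capturing the same guest workload once per npt setting and comparing the
two tables would show which exit reasons change; how the trace itself is
captured (for example with trace-cmd or perf) is left out here.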


* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]     ` <CAOcCaLbdi9KZoXiV5htjShc_mYvZ5jK2B3Ot7NeM=3v_ZA39aA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-05 12:05       ` Matthias Ehrenfeuchter
  0 siblings, 0 replies; 29+ messages in thread
From: Matthias Ehrenfeuchter @ 2017-05-05 12:05 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

I noticed that (with npt disabled) the VM gets slower over time; e.g. in
Windows the System process takes more and more CPU. A soft restart
helps to make it "usable" again. I am also wondering whether this is a
hardware-related issue in Ryzen and, if so, whether the upcoming Naples
has it too. That would be a no-go for the server platform and, in my
eyes, would kill it before it is even released.

Regards



* Re: AMD Ryzen KVM/NPT/IOMMU issue
  2017-05-03 16:28   ` Nick Sarnie
       [not found]     ` <CAOcCaLbdi9KZoXiV5htjShc_mYvZ5jK2B3Ot7NeM=3v_ZA39aA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-05 17:27     ` Alex Williamson
       [not found]       ` <20170505112706.7785948c-1yVPhWWZRC1BDLzU/O5InQ@public.gmane.org>
  1 sibling, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2017-05-05 17:27 UTC (permalink / raw)
  To: Nick Sarnie; +Cc: Matthias Ehrenfeuchter, iommu, Paolo Bonzini, kvm

On Wed, 3 May 2017 12:28:35 -0400
Nick Sarnie <commendsarnex@gmail.com> wrote:

> I heard from Joerg that it might be related to a lower intercept rate
> being used when NPT is enabled, but we haven't been able to find a way
> to trace that to confirm.

Joerg/Paolo, any ideas how we might debug this?  Anyone from AMD
watching?  Thanks,

Alex


* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]       ` <20170505112706.7785948c-1yVPhWWZRC1BDLzU/O5InQ@public.gmane.org>
@ 2017-06-25  5:55         ` Nick Sarnie
       [not found]           ` <CAOcCaLbAS0FkRrG8YZNM5rYUtCFeUGkdgdy=4o16Njufdy8Gag-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Nick Sarnie @ 2017-06-25  5:55 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Paolo Bonzini, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	John.Bridgman-5C7GfCeVMHo, kvm-u79uwXL29TY76Z2rM5mHXA,
	Matthias Ehrenfeuchter

Hi all,

A somewhat major update.

I managed to install Xen with my GPU passthrough config and test the
performance with NPT enabled.

There is no performance drop with NPT on Xen; it matches the GPU
performance of KVM with NPT disabled. The CPU performance is also
great.

John Bridgman (cc'd) from AMD says he's going to ask around AMD about
this next week, but it would be even better if some AMD guys who read
this ML shared their ideas or took a look.

Let me know if you need any more information.

Thanks,
Sarnex


* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]           ` <CAOcCaLbAS0FkRrG8YZNM5rYUtCFeUGkdgdy=4o16Njufdy8Gag-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-28 17:23             ` Suravee Suthikulpanit
  2017-06-28 17:26               ` Steven Walter
       [not found]               ` <545f19a3-4923-cdec-4ce9-2a4155a04f6a-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 2 replies; 29+ messages in thread
From: Suravee Suthikulpanit @ 2017-06-28 17:23 UTC (permalink / raw)
  To: Nick Sarnie, Alex Williamson
  Cc: Paolo Bonzini, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Matthias Ehrenfeuchter, kvm-u79uwXL29TY76Z2rM5mHXA,
	John.Bridgman-5C7GfCeVMHo



So, I'm trying to reproduce this issue on the Ryzen system w/ the following setup:

   * Host kernel v4.11 (with this patch https://lkml.org/lkml/2017/6/23/295)

   * guest VM RHEL7.3

   * guest graphic driver = radeon

   * qemu-system-x86_64 --version
     QEMU emulator version 2.9.50 (v2.9.0-1659-g577caa2-dirty)

   * kvm-amd npt=1

   * dGPU is 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
     [AMD/ATI] Tobago PRO [Radeon R7 360 / R9 360 OEM] (rev 81)

   * qemu-system-x86_64 -smp 4 -enable-kvm -M q35 -m 4096 -cpu host \
       -bios /usr/share/qemu/bios.bin \
       -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
       -drive file=/sandbox/vm-images/rhel7.3.qcow2,if=virtio,id=disk0 \
       -net none -vga none -nodefaults \
       -device vfio-pci,host=08:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1,romfile=/sandbox/vm-images/vbios.rom \
       -usb -device usb-host,hostbus=3,hostport=1 \
       -device usb-host,hostbus=3,hostport=3 \
       -device vfio-pci,host=0000:08:00.1 \
       -device vfio-pci,host=0000:09:00.0

With this setup, I am able to pass-through the dGPU and run the following test:
   * Starting up the guest w/ full GNOME GUI on the attached monitor.
   * glxgears (running @ 60 FPS)
   * Playing 1080p HD video on Youtube

I am not noticing issues here. What kind of test are you running in the guest VM?

Thanks,
Suravee
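
When comparing setups like the one above, it can help to confirm which
npt value the host is actually running with. A minimal sketch, assuming the
parameter is readable at /sys/module/kvm_amd/parameters/npt and reads back
as 1/0 or Y/N depending on the kernel version; the helper names are
hypothetical:

```python
from pathlib import Path

# Assumed sysfs location of the kvm-amd "npt" module parameter.
NPT_PARAM = Path("/sys/module/kvm_amd/parameters/npt")


def parse_npt(raw):
    """Interpret the parameter text; int and bool module params print differently."""
    value = raw.strip()
    if value in ("1", "Y", "y"):
        return True
    if value in ("0", "N", "n"):
        return False
    raise ValueError(f"unexpected npt value: {value!r}")


def npt_enabled(path=NPT_PARAM):
    """Return True/False for the running host, or None if the file is missing
    (for example because kvm_amd is not loaded)."""
    try:
        return parse_npt(path.read_text())
    except FileNotFoundError:
        return None
```

Logging this value next to each benchmark result makes it harder to mix up
npt=0 and npt=1 runs when results are compared later.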


* Re: AMD Ryzen KVM/NPT/IOMMU issue
  2017-06-28 17:23             ` Suravee Suthikulpanit
@ 2017-06-28 17:26               ` Steven Walter
       [not found]                 ` <CAK8d-aJ+XHi+5sr6bHj3D2BaG94v6Lyk1C_ZuA4erDVhEyp-uQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
       [not found]               ` <545f19a3-4923-cdec-4ce9-2a4155a04f6a-5C7GfCeVMHo@public.gmane.org>
  1 sibling, 1 reply; 29+ messages in thread
From: Steven Walter @ 2017-06-28 17:26 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: Nick Sarnie, Alex Williamson, Paolo Bonzini, iommu,
	Matthias Ehrenfeuchter, kvm, John.Bridgman

On Wed, Jun 28, 2017 at 1:23 PM, Suravee Suthikulpanit
<Suravee.Suthikulpanit@amd.com> wrote:
> I am not noticing issues here. What kind of test are you running in the
> guest VM?

Try running the open source game "torcs" inside the VM.  I think
you'll find that there's a very noticeable performance difference
between npt=1 and npt=0.
-- 
-Steven Walter <stevenrwalter@gmail.com>


* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]               ` <545f19a3-4923-cdec-4ce9-2a4155a04f6a-5C7GfCeVMHo@public.gmane.org>
@ 2017-06-28 17:31                 ` Alex Williamson
  0 siblings, 0 replies; 29+ messages in thread
From: Alex Williamson @ 2017-06-28 17:31 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Matthias Ehrenfeuchter,
	John.Bridgman-5C7GfCeVMHo,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Paolo Bonzini

On Thu, 29 Jun 2017 00:23:20 +0700
Suravee Suthikulpanit <Suravee.Suthikulpanit-5C7GfCeVMHo@public.gmane.org> wrote:
> With this setup, I am able to pass-through the dGPU and run the following test:
>    * Starting up the guest w/ full GNOME GUI on the attached monitor.
>    * glxgears (running @ 60 FPS)
>    * Playing 1080p HD video on Youtube
> 
> I am not noticing issues here. What kind of test are you running in the guest VM?


Hi Suravee,

Thanks for your help!  I think we'd be in real trouble if glxgears
reported something less than 60fps (perhaps that might be easier to
debug though).  I'd suggest trying one of the Unigine benchmarks, like
Heaven.  There should be a noticeable and consistent framerate
difference for npt=0/1.  Thanks,

Alex


* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]                 ` <CAK8d-aJ+XHi+5sr6bHj3D2BaG94v6Lyk1C_ZuA4erDVhEyp-uQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-28 18:53                   ` Suravee Suthikulpanit
       [not found]                     ` <5d2ea709-8f90-bfaa-975d-48aed39e75ad-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Suravee Suthikulpanit @ 2017-06-28 18:53 UTC (permalink / raw)
  To: Steven Walter
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Matthias Ehrenfeuchter,
	John.Bridgman-5C7GfCeVMHo,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Paolo Bonzini



On 6/29/17 00:26, Steven Walter wrote:
> Try running the open source game "torcs" inside the VM.  I think
> you'll find that there's a very noticeable performance difference
> between npt=1 and npt=0.

Hm.. actually torcs seems to be running fine w/ npt=1 setup. Although I think my 
driving skill is ~20% worse :(

S


* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]                     ` <5d2ea709-8f90-bfaa-975d-48aed39e75ad-5C7GfCeVMHo@public.gmane.org>
@ 2017-06-28 19:08                       ` Alex Williamson
       [not found]                         ` <20170628130855.76c2b700-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2017-06-28 19:08 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Matthias Ehrenfeuchter,
	John.Bridgman-5C7GfCeVMHo,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Paolo Bonzini

On Thu, 29 Jun 2017 01:53:57 +0700
Suravee Suthikulpanit <Suravee.Suthikulpanit-5C7GfCeVMHo@public.gmane.org> wrote:

> Hm.. actually torcs seems to be running fine w/ npt=1 setup. Although I think my 
> driving skill is ~20% worse :(

A clarification on the issue: it's not that these games/benchmarks
don't work or aren't playable, it's that they run slower (as measured by
framerate) with npt=1 vs npt=0.  A virtualized guest on AMD hardware is
hindered either by lower graphics performance (npt=1) or higher CPU
virtualization overhead (npt=0) for high intensity games or graphics
workloads.  Intel's equivalent ept feature does not have this issue.  I
would encourage looking at the framerate for one mode vs the other
before drawing any conclusions on whether it's working well.  Thanks,

Alex
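
The comparison suggested above can be made concrete by recording the
average framerate of the same benchmark run under each mode. A minimal
sketch; the function name and the sample numbers are hypothetical and are
not measurements from this thread:

```python
from statistics import mean


def npt_slowdown(fps_npt0, fps_npt1):
    """Return the fractional framerate loss of npt=1 relative to npt=0.

    0.0 means no difference; 0.25 means npt=1 averages 25% fewer frames.
    """
    baseline = mean(fps_npt0)
    if baseline <= 0:
        raise ValueError("baseline framerate must be positive")
    return (baseline - mean(fps_npt1)) / baseline


# Hypothetical per-run averages from the same benchmark scene.
loss = npt_slowdown([100.0, 100.0], [60.0, 90.0])
```

Using several runs per mode, rather than a single number, gives some idea
of whether a difference is consistent or just run-to-run noise.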


* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]                         ` <20170628130855.76c2b700-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>
@ 2017-06-28 19:28                           ` Bridgman, John
  2017-06-28 19:29                             ` Bridgman, John
  0 siblings, 1 reply; 29+ messages in thread
From: Bridgman, John @ 2017-06-28 19:28 UTC (permalink / raw)
  To: Alex Williamson, Suthikulpanit, Suravee
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Matthias Ehrenfeuchter,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Paolo Bonzini


________________________________
From: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Sent: June 28, 2017 3:08 PM
To: Suthikulpanit, Suravee
Cc: Steven Walter; Nick Sarnie; Paolo Bonzini; iommu-cunTk1MwBs9QetFLy7KEmy65B3kUBaIG@public.gmane.org.org; Matthias Ehrenfeuchter; kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bridgman, John
Subject: Re: AMD Ryzen KVM/NPT/IOMMU issue

> A clarification on the issue: it's not that these games/benchmarks
> don't work or aren't playable, it's that they run slower (as measured by
> framerate) with npt=1 vs npt=0.  A virtualized guest on AMD hardware is
> hindered either by lower graphics performance (npt=1) or higher CPU
> virtualization overhead (npt=0) for high intensity games or graphics
> workloads.  Intel's equivalent ept feature does not have this issue.  I
> would encourage looking at the framerate for one mode vs the other
> before drawing any conclusions on whether it's working well.  Thanks,
> Alex

One more data point - Nick did some testing with Xen enabling/disabling npt and
found that (a) performance was not affected much whether npt was on or off, and
(b) performance with npt on was pretty close to KVM performance with npt off.

This suggests something specific to KVM that doesn't play well with npt, although
I have no idea what that might be. I was going to talk to our folks to see if they had
any suggestions re: ways to narrow down where the performance impact is coming
from, but I ended up going off sick instead.

Haven't gone through the whole thread here yet so apologies if this has already been
mentioned.


* Re: AMD Ryzen KVM/NPT/IOMMU issue
  2017-06-28 19:28                           ` Bridgman, John
@ 2017-06-28 19:29                             ` Bridgman, John
       [not found]                               ` <BN6PR12MB13481A39CD3EA714754FEE49E8DD0-/b2+HYfkarQX0pEhCR5T8QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Bridgman, John @ 2017-06-28 19:29 UTC (permalink / raw)
  To: Alex Williamson, Suthikulpanit, Suravee
  Cc: Steven Walter, Nick Sarnie, Paolo Bonzini, iommu,
	Matthias Ehrenfeuchter, kvm



From: Alex Williamson <alex.williamson@redhat.com>
Sent: June 28, 2017 3:08 PM
To: Suthikulpanit, Suravee
Cc: Steven Walter; Nick Sarnie; Paolo Bonzini; iommu@lists.linux-foundation.org; Matthias Ehrenfeuchter; kvm@vger.kernel.org; Bridgman, John
Subject: Re: AMD Ryzen KVM/NPT/IOMMU issue
    
On Thu, 29 Jun 2017 01:53:57 +0700
Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> wrote:

> On 6/29/17 00:26, Steven Walter wrote:
> >> So, I'm trying to reproduce this issue on the Ryzen system w/ the following
> >> setup:
> >>
> >>   * Host kernel v4.11 (with this patch  https://lkml.org/lkml/2017/6/23/295)
> >>
> >>   * guest VM RHEL7.3
> >>
> >>   * guest graphic driver = radeon
> >>
> >>   * qemu-system-x86_64 --version
> >>     QEMU emulator version 2.9.50 (v2.9.0-1659-g577caa2-dirty)
> >>
> >>   * kvm-amd npt=1
> >>
> >>   * dGPU is 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> >> [AMD/ATI] Tobago PRO [Radeon R7 360 / R9 360 OEM] (rev 81)
> >>
> >>   * qemu-system-x86_64 -smp 4 -enable-kvm -M q35 -m 4096 -cpu host -bios
> >> /usr/share/qemu/bios.bin -device
> >> ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1
> >> -drive file=/sandbox/vm-images/rhel7.3.qcow2,if=virtio,id=disk0 -net none
> >> -vga none -nodefaults -device
> >> vfio-pci,host=08:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1,romfile=/sandbox/vm-images/vbios.rom
> >> -usb -device usb-host,hostbus=3,hostport=1 -device
> >> usb-host,hostbus=3,hostport=3 -device vfio-pci,host=0000:08:00.1 -device
> >> vfio-pci,host=0000:09:00.0
> >>
> >> With this setup, I am able to pass-through the dGPU and run the following
> >> test:
> >>   * Starting up the guest w/ full GNOME GUI on the attached monitor.
> >>   * glxgears (running @ 60 FPS)
> >>   * Playing 1080p HD video on Youtube
> >>
> >> I am not noticing issues here. What kind of test are you running in the
> >> guest VM?  
> > Try running the open source game "torcs" inside the VM.  I think
> > > you'll find that there's a very noticeable performance difference
> > between npt=1 and npt=0  
> 
> Hm.. actually torcs seems to be running fine w/ npt=1 setup. Although I think my
> driving skill is ~20% worse :(

>A clarification on the issue, it's not that these games/benchmarks
>don't work or aren't playable, it's that they run slower (as measured by
>framerate) with npt=1 vs npt=0.  A virtualized guest on AMD hardware is
>hindered either by lower graphics performance (npt=1) or higher CPU
>virtualization overhead (npt=0) for high intensity games or graphics
>workloads.  Intel's equivalent ept feature does not have this issue.  I
>would encourage looking at the framerate for one mode vs the other
>before drawing any conclusions on whether it's working well.  Thanks,
>Alex

One more data point - Nick did some testing with Xen enabling/disabling npt and 
found that (a) performance was not affected much whether npt was on or off, and
(b) performance with npt on was pretty close to KVM performance with npt off. 

This suggests something specific to KVM that doesn't play well with npt, although
I have no idea what that might be. I was going to talk to our folks to see if they had
any suggestions re: ways to narrow down where the performance impact is coming
from, but I ended up going off sick instead. 

Haven't gone through the whole thread here yet so apologies if this has already been
mentioned (and Nick sorry for the delay). 
      

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]                               ` <BN6PR12MB13481A39CD3EA714754FEE49E8DD0-/b2+HYfkarQX0pEhCR5T8QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-06-28 19:52                                 ` Graham Neville
       [not found]                                   ` <CAEk7i1-Ar0ES8ekmSGiRrrWzTz8gFb2RDTW6KsbuNdDubVerww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Graham Neville @ 2017-06-28 19:52 UTC (permalink / raw)
  To: Bridgman, John
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Matthias Ehrenfeuchter,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Paolo Bonzini


[-- Attachment #1.1: Type: text/plain, Size: 4766 bytes --]

Although not related to graphics card performance, there is definitely
another issue with regards to running KVM nested L2 guests when npt=1.

Thought I'd mention this in case it helps with identifying performance
issues with NPT.

I'm unable to start any L2 guests with KVM acceleration (--enable-kvm). As
soon as it attempts to bring up the L2 guest, the L1 host crashes while the L0
host remains online. Nothing is printed in either L1 or L0's dmesg.

My L0 is running Arch with 4.11.0-rc6, with qemu 2.8.0. I've tried
different L1 hosts (Ubuntu, Arch) and different kernels right up to the 4.12-rc5
kernel, along with different qemu versions.

This used to work fine with my Intel i7-4770s setup.

With npt=0, L2 guests can start but performance is dire.

On Wed, Jun 28, 2017 at 7:29 PM, Bridgman, John <John.Bridgman-5C7GfCeVMHo@public.gmane.org>
wrote:

>
>
> From: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Sent: June 28, 2017 3:08 PM
> To: Suthikulpanit, Suravee
> Cc: Steven Walter; Nick Sarnie; Paolo Bonzini;
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; Matthias Ehrenfeuchter;
> kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Bridgman, John
> Subject: Re: AMD Ryzen KVM/NPT/IOMMU issue
>
> On Thu, 29 Jun 2017 01:53:57 +0700
> Suravee Suthikulpanit <Suravee.Suthikulpanit-5C7GfCeVMHo@public.gmane.org> wrote:
>
> > On 6/29/17 00:26, Steven Walter wrote:
> > >> So, I'm trying to reproduce this issue on the Ryzen system w/ the
> following
> > >> setup:
> > >>
> > >>   * Host kernel v4.11 (with this patch  https://lkml.org/lkml/2017/6/
> 23/295)
> > >>
> > >>   * guest VM RHEL7.3
> > >>
> > >>   * guest graphic driver = radeon
> > >>
> > >>   * qemu-system-x86_64 --version
> > >>     QEMU emulator version 2.9.50 (v2.9.0-1659-g577caa2-dirty)
> > >>
> > >>   * kvm-amd npt=1
> > >>
> > >>   * dGPU is 08:00.0 VGA compatible controller: Advanced Micro
> Devices, Inc.
> > >> [AMD/ATI] Tobago PRO [Radeon R7 360 / R9 360 OEM] (rev 81)
> > >>
> > >>   * qemu-system-x86_64 -smp 4 -enable-kvm -M q35 -m 4096 -cpu host
> -bios
> > >> /usr/share/qemu/bios.bin -device
> > >> ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,
> chassis=1,id=root.1
> > >> -drive file=/sandbox/vm-images/rhel7.3.qcow2,if=virtio,id=disk0 -net
> none
> > >> -vga none -nodefaults -device
> > >> vfio-pci,host=08:00.0,x-vga=on,addr=0.0,multifunction=on,
> bus=root.1,romfile=/sandbox/vm-images/vbios.rom
> > >> -usb -device usb-host,hostbus=3,hostport=1 -device
> > >> usb-host,hostbus=3,hostport=3 -device vfio-pci,host=0000:08:00.1
> -device
> > >> vfio-pci,host=0000:09:00.0
> > >>
> > >> With this setup, I am able to pass-through the dGPU and run the
> following
> > >> test:
> > >>   * Starting up the guest w/ full GNOME GUI on the attached monitor.
> > >>   * glxgears (running @ 60 FPS)
> > >>   * Playing 1080p HD video on Youtube
> > >>
> > >> I am not noticing issues here. What kind of test are you running in
> the
> > >> guest VM?
> > > Try running the open source game "torcs" inside the VM.  I think
> > > > you'll find that there's a very noticeable performance difference
> > > between npt=1 and npt=0
> >
> > Hm.. actually torcs seems to be running fine w/ npt=1 setup. Although I
> think my
> > driving skill is ~20% worse :(
>
> >A clarification on the issue, it's not that these games/benchmarks
> >don't work or aren't playable, it's that they run slower (as measured by
> >framerate) with npt=1 vs npt=0.  A virtualized guest on AMD hardware is
> >hindered either by lower graphics performance (npt=1) or higher CPU
> >virtualization overhead (npt=0) for high intensity games or graphics
> >workloads.  Intel's equivalent ept feature does not have this issue.  I
> >would encourage looking at the framerate for one mode vs the other
> >before drawing any conclusions on whether it's working well.  Thanks,
> >Alex
>
> One more data point - Nick did some testing with Xen enabling/disabling
> npt and
> found that (a) performance was not affected much whether npt was on or
> off, and
> (b) performance with npt on was pretty close to KVM performance with npt
> off.
>
> This suggests something specific to KVM that doesn't play well with npt,
> although
> I have no idea what that might be. I was going to talk to our folks to see
> if they had
> any suggestions re: ways to narrow down where the performance impact is
> coming
> from, but I ended up going off sick instead.
>
> Haven't gone through the whole thread here yet so apologies if this has
> already been
> mentioned (and Nick sorry for the delay).
>
> _______________________________________________
> iommu mailing list
> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]                                   ` <CAEk7i1-Ar0ES8ekmSGiRrrWzTz8gFb2RDTW6KsbuNdDubVerww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-28 20:33                                     ` Paolo Bonzini
  2017-06-28 22:34                                       ` Nick Sarnie
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2017-06-28 20:33 UTC (permalink / raw)
  To: Graham Neville, Bridgman, John
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, Matthias Ehrenfeuchter



On 28/06/2017 21:52, Graham Neville wrote:
> Although not related to graphics card performance, there is definitely
> another issue with regards to running KVM nested L2 guests when npt=1.
> 
> Thought I'd mention this in case it helps with identifying performance
> issues with NPT.
> 
> I'm unable to start any L2 guests with KVM acceleration (--enable-kvm).
> As soon as it attempts to bring up the L2 guest the L1 host crashes, L0
> host remains online. Nothing is printed in either L1 or L0's dmesg.
> 
> My L0 is running Arch with 4.11.0-rc6, with qemu 2.8.0. I've tried
> different L1 hosts (Ubuntu,Arch) and different kernels right to 4.12-rc5
> kernel, along with different qemu versions.
> 
> This used to work fine with my Intel i7-4770s setup.
> 
> With npt=0, L2 guests can start but performance is dire.

Nested AMD needs some care.  It's known, but time has been lacking...

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
  2017-06-28 20:33                                     ` Paolo Bonzini
@ 2017-06-28 22:34                                       ` Nick Sarnie
       [not found]                                         ` <CAOcCaLao_Y-8KP60baoSehtCu7C5CVnuuZNEom-zi54Fa2h+sQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Nick Sarnie @ 2017-06-28 22:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Graham Neville, Bridgman, John, iommu, kvm, Matthias Ehrenfeuchter

On Wed, Jun 28, 2017 at 4:33 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 28/06/2017 21:52, Graham Neville wrote:
>> Although not related to graphics card performance, there is definitely
>> another issue with regards to running KVM nested L2 guests when npt=1.
>>
>> Thought I'd mention this in case it helps with identifying performance
>> issues with NPT.
>>
>> I'm unable to start any L2 guests with KVM acceleration (--enable-kvm).
>> As soon as it attempts to bring up the L2 guest the L1 host crashes, L0
>> host remains online. Nothing is printed in either L1 or L0's dmesg.
>>
>> My L0 is running Arch with 4.11.0-rc6, with qemu 2.8.0. I've tried
>> different L1 hosts (Ubuntu,Arch) and different kernels right to 4.12-rc5
>> kernel, along with different qemu versions.
>>
>> This used to work fine with my Intel i7-4770s setup.
>>
>> With npt=0, L2 guests can start but performance is dire.
>
> Nested AMD needs some care.  It's known, but time has been lacking...
>
> Paolo

Hi Suravee,

Thanks a lot for helping. Torcs does not appear graphically demanding
on modern hardware, so this issue may not be easily noticeable. I was
able to easily reproduce the problem using the Unigine Heaven
benchmark, but I'm sure anything moderately graphically demanding
would show a performance loss with NPT enabled. As an example, when I
tested this with Fedora on my RX480, I got around 30-35 FPS with NPT
on and around 55-60 with NPT off.
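
When comparing the two modes it helps to confirm which one the host is actually
in before each run. A minimal sketch, assuming the kvm_amd module is loaded and
exposes its npt parameter at /sys/module/kvm_amd/parameters/npt; the helper names
are made up for illustration, and the value may read as 1/0 or Y/N depending on
the kernel version:

```python
from pathlib import Path

# Assumed sysfs location of the kvm_amd "npt" module parameter.
NPT_PARAM = Path("/sys/module/kvm_amd/parameters/npt")


def parse_npt(raw: str) -> bool:
    """Interpret the parameter text; kernels may print 1/0 or Y/N."""
    value = raw.strip().upper()
    if value in ("1", "Y"):
        return True
    if value in ("0", "N"):
        return False
    raise ValueError(f"unexpected npt value: {raw!r}")


def npt_enabled(path: Path = NPT_PARAM) -> bool:
    """Read the live setting; raises FileNotFoundError if kvm_amd is not loaded."""
    return parse_npt(path.read_text())


if __name__ == "__main__":
    try:
        print("npt enabled:", npt_enabled())
    except FileNotFoundError:
        print("kvm_amd does not appear to be loaded")
```

Recording this value next to each FPS figure should make results from different
people easier to compare.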

Let me know if you need any more information or have any questions.

(no problem John, thanks a lot for taking interest in this)

Thanks again,
Sarnex

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]                                         ` <CAOcCaLao_Y-8KP60baoSehtCu7C5CVnuuZNEom-zi54Fa2h+sQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-29  0:21                                           ` Thiago Padilha
       [not found]                                             ` <CAAq2Xdpu_rv7FgVfGCv-nYttGzH6hZujqdYvcf4qgXetkOGLzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Thiago Padilha @ 2017-06-29  0:21 UTC (permalink / raw)
  To: Nick Sarnie
  Cc: Paolo Bonzini, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Bridgman, John, kvm-u79uwXL29TY76Z2rM5mHXA,
	Matthias Ehrenfeuchter

On Wed, Jun 28, 2017 at 7:34 PM, Nick Sarnie <commendsarnex-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Suravee,
>
> Thanks a lot for helping. Torcs does not appear graphically demanding
> on modern hardware, so this issue may not be easily noticeable. I was
> able to easily reproduce the problem using the Unigine Heaven
> benchmark, but I'm sure anything moderately graphically demanding
> would show a performance loss with NPT enabled. As an example, when I
> tested this with Fedora on my RX480, I got around 30-35 FPS with NPT
> on and around 55-60 with NPT off.
>
> Let me know if you need any more information or have any questions.
>
> (no problem John, thanks a lot for taking interest in this)
>
> Thanks again,
> Sarnex

Hi

I don't think the FPS drop is proportional to how graphically demanding the
workload is. On the contrary, at first sight it would seem like the less
demanding a workload is, the bigger the FPS impact suffered, though as some
numbers I will show in a moment suggest, this is not always the case.

Unfortunately I haven't been able to find a pattern in what causes the most
impact on FPS, except that the relative drop increases with higher FPS values.
Other than that, it seems very specific to the workload/benchmark used.

Here's some data I've collected to help with the investigation. The system is a
Ryzen 1700 (no overclock, 3 GHz), GTX 1070, Windows 10 guest.

I've used Unigine Heaven and Passmark's PerformanceTest 9.0.

First Heaven benchmark with ultra settings on 1920x1080:

- DirectX 11:
  - npt=0: 87.0 fps
  - npt=1: 78.4 fps (10% drop)
- DirectX 9:
  - npt=0: 100.0 fps
  - npt=1: 66.4 fps (33% drop)
- OpenGL:
  - npt=0: 82.5 fps
  - npt=1: 35.2 fps (58% drop)

Heaven Benchmark again, this time with low settings on 1280x720:

- DirectX 11:
  - npt=0: 182.5 fps
  - npt=1: 140.1 fps (25% drop)
- DirectX 9:
  - npt=0: 169.2 fps
  - npt=1: 74.1 fps (56% drop)
- OpenGL:
  - npt=0: 202.8 fps
  - npt=1: 45.0 fps (78% drop)

PerformanceTest 9.0 3d benchmark:

- DirectX 9:
  - npt=0: 157 fps
  - npt=1: 13 fps (92% drop)
- DirectX 10:
  - npt=0: 220 fps
  - npt=1: 212 fps (4% drop)
- DirectX 11:
  - npt=0: 234 fps
  - npt=1: 140 fps (40% drop)
- DirectX 12:
  - npt=0: 88 fps (scored 35 because the FPS is penalized for not being
    able to run at 4k)
  - npt=1: 4.5 fps (scored 1, 95% drop)
- GPU Compute:
  - Mandel:
    - npt=0: ~= 2000 fps
    - npt=1: ~= 2000 fps
  - Bitonic Sort:
    - npt=0: ~= 153583696.0 elements/sec
    - npt=1: ~= 106233376.0 elements/sec (31% drop)
  - QJulia4D:
    - npt=0: ~= 1000 fps
    - npt=1: ~= 1000 fps
  - OpenCL:
    - npt=0: ~= 750 fps
    - npt=1: ~= 220 fps

As you can see, in some cases there's only about a 5% drop (which could be within
the margin of error), while in others the drop is as high as 95%. Some points of
interest:

- Passmark DirectX 9 is not graphically demanding (it runs at 1024x768 and the
  GTX 1070 doesn't break a sweat), yet it suffers a 92% drop in FPS.
- Unigine DirectX 11 on ultra is graphically demanding and suffers less than a
  10% drop in FPS.
- Passmark DirectX 12 is graphically demanding and suffers a 95% drop in FPS.
- The bitonic sort is not a graphical benchmark; it shows its results (average
  number of sorted elements/sec) in a console window, yet it suffers a 31% drop
  in performance.
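
The percentages in this message are relative drops, i.e. (npt=0 - npt=1) / npt=0.
A minimal sketch for recomputing them from the raw figures, in case anyone wants
to add their own results in the same form; the function name and the selection of
rows are only for illustration:

```python
def drop_percent(baseline: float, measured: float) -> float:
    """Relative drop of `measured` versus `baseline`, in percent."""
    return (baseline - measured) / baseline * 100.0


# (test name, npt=0 result, npt=1 result), copied from the figures in this message.
results = [
    ("Heaven ultra, DirectX 11", 87.0, 78.4),
    ("Heaven ultra, OpenGL", 82.5, 35.2),
    ("PerformanceTest, DirectX 9", 157.0, 13.0),
    ("PerformanceTest, DirectX 10", 220.0, 212.0),
    ("PerformanceTest, Bitonic Sort", 153583696.0, 106233376.0),
]

for name, npt_off, npt_on in results:
    print(f"{name}: {drop_percent(npt_off, npt_on):.1f}% drop")
```

Small differences from the rounded percentages quoted in the lists are expected.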

I think it would take someone with experience in GPU programming, and with
knowledge of what each benchmark does, to find a pattern in these numbers.

Thiago

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]                                             ` <CAAq2Xdpu_rv7FgVfGCv-nYttGzH6hZujqdYvcf4qgXetkOGLzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-29  1:50                                               ` Thiago Padilha
       [not found]                                                 ` <CAAq2XdppNcKcmbJhPQ9WfTowKSmp76jhDa9JHM1rc92Enx=1Zg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: Thiago Padilha @ 2017-06-29  1:50 UTC (permalink / raw)
  To: Nick Sarnie
  Cc: Paolo Bonzini, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Bridgman, John, kvm-u79uwXL29TY76Z2rM5mHXA,
	Matthias Ehrenfeuchter

Some more data from 3DMark benchmarks:

Time Spy(DirectX 12):
- Graphics test 1:
  - npt=0: 37.65 FPS
  - npt=1: 24.22 FPS (36% drop)
- Graphics test 2:
  - npt=0: 33.05 FPS
  - npt=1: 29.65 FPS (10% drop)
- CPU test:
  - npt=0: 17.35 FPS
  - npt=1: 12.03 FPS (31% drop)

Fire Strike(DirectX 11):
- Graphics test 1:
  - npt=0: 80.56 FPS
  - npt=1: 41.89 FPS (49% drop)
- Graphics test 2:
  - npt=0: 70.64 FPS
  - npt=1: 60.75 FPS (14% drop)
- Physics test:
  - npt=0: 50.14 FPS
  - npt=1: 5.78 FPS (89% drop)
- Combined test:
  - npt=0: 32.83 FPS
  - npt=1: 17.70 FPS (47% drop)

Sky Diver(DirectX 11):
- Graphics test 1:
  - npt=0: 248.81 FPS
  - npt=1: 173.63 FPS (31% drop)
- Graphics test 2:
  - npt=0: 250.49 FPS
  - npt=1: 124.84 FPS (51% drop)
- Physics test:
  - 8 threads:
    - npt=0: 140.93 FPS
    - npt=1: 119.08 FPS (15% drop)
  - 24 threads:
    - npt=0: 110.22 FPS
    - npt=1: 74.55 FPS (33% drop)
  - 48 threads:
    - npt=0: 71.56 FPS
    - npt=1: 45.93 FPS (36% drop)
  - 96 threads:
    - npt=0: 41.04 FPS
    - npt=1: 24.81 FPS (40% drop)
- Combined test:
  - npt=0: 75.65 FPS
  - npt=1: 50.45 FPS (33% drop)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]                                                 ` <CAAq2XdppNcKcmbJhPQ9WfTowKSmp76jhDa9JHM1rc92Enx=1Zg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-29  1:54                                                   ` Nick Sarnie
  2017-07-01 14:15                                                     ` Thiago Padilha
  0 siblings, 1 reply; 29+ messages in thread
From: Nick Sarnie @ 2017-06-29  1:54 UTC (permalink / raw)
  To: Thiago Padilha
  Cc: Paolo Bonzini, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Bridgman, John, kvm-u79uwXL29TY76Z2rM5mHXA,
	Matthias Ehrenfeuchter

On Wed, Jun 28, 2017 at 9:50 PM, Thiago Padilha <tpadilha84-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Some more data from 3DMark benchmarks:
>
> Time Spy(DirectX 12):
> - Graphics test 1:
>   - npt=0: 37.65 FPS
>   - npt=1: 24.22 FPS (36% drop)
> - Graphics test 2:
>   - npt=0: 33.05 FPS
>   - npt=1: 29.65 FPS (10% drop)
> - CPU test:
>   - npt=0: 17.35 FPS
>   - npt=1: 12.03 FPS (31% drop)
>
> Fire Strike(DirectX 11):
> - Graphics test 1:
>   - npt=0: 80.56 FPS
>   - npt=1: 41.89 FPS (49% drop)
> - Graphics test 2:
>   - npt=0: 70.64 FPS
>   - npt=1: 60.75 FPS (14% drop)
> - Physics test:
>   - npt=0: 50.14 FPS
>   - npt=1: 5.78 FPS (89% drop)
> - Combined test:
>   - npt=0: 32.83 FPS
>   - npt=1: 17.70 FPS (47% drop)
>
> Sky Diver(DirectX 11):
> - Graphics test 1:
>   - npt=0: 248.81 FPS
>   - npt=1: 173.63 FPS (31% drop)
> - Graphics test 2:
>   - npt=0: 250.49 FPS
>   - npt=1: 124.84 FPS (51% drop)
> - Physics test:
>   - 8 threads:
>     - npt=0: 140.93 FPS
>     - npt=1: 119.08 FPS (15% drop)
>   - 24 threads:
>     - npt=0: 110.22 FPS
>     - npt=1: 74.55 FPS (33% drop)
>   - 48 threads:
>     - npt=0: 71.56 FPS
>     - npt=1: 45.93 FPS (36% drop)
>   - 96 threads:
>     - npt=0: 41.04 FPS
>     - npt=1: 24.81 FPS (40% drop)
> - Combined test:
>   - npt=0: 75.65 FPS
>   - npt=1: 50.45 FPS (33% drop)

Hi Thiago,

Thanks for the data, I'm sure it will be useful. Sorry, I should have
noted that I've only tested a few games/benchmarks and never any
non-intensive loads.

Sarnex

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
  2017-06-29  1:54                                                   ` Nick Sarnie
@ 2017-07-01 14:15                                                     ` Thiago Padilha
  2017-10-17  4:16                                                       ` Nick Sarnie
  0 siblings, 1 reply; 29+ messages in thread
From: Thiago Padilha @ 2017-07-01 14:15 UTC (permalink / raw)
  To: Nick Sarnie
  Cc: Paolo Bonzini, iommu, Matthias Ehrenfeuchter, kvm, Bridgman, John

Hi

Just wanted to add that I've also tested passthrough with Xen and my Radeon
RX460 (unfortunately Xen doesn't work with nvidia yet), and I can confirm that
hardware assisted paging seems to work flawlessly without affecting GPU
performance.

An easy way to verify this is to run the following Passmark's PerformanceTest
benchmarks:

- CPU
- Memory
- 3d

and compare with the results on bare metal and with KVM when npt=1/npt=0. This
benchmark is ideal for reproducing the issue because the 3d performance impact
on the directx9/directx12 tests is significant (about a 90% drop in FPS, as I
mentioned in my previous post).

With Xen, I get the same CPU/Memory scores as KVM with npt=1,
and the same CPU/3d scores as KVM with npt=0. Summary:

- npt=0 causes a drop in the Memory test score
- npt=1 causes a drop in 3d performance

Xen doesn't suffer a performance hit in any of the benchmarks (assuming HAP is
enabled), and is very close to bare metal in all benchmarks except for the disk
benchmark. This result (together with Nick Sarnie's previous result) suggests
that the bug is located in KVM.

Thiago.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
  2017-07-01 14:15                                                     ` Thiago Padilha
@ 2017-10-17  4:16                                                       ` Nick Sarnie
  0 siblings, 0 replies; 29+ messages in thread
From: Nick Sarnie @ 2017-10-17  4:16 UTC (permalink / raw)
  To: Suravee.Suthikulpanit; +Cc: iommu, kvm, Bridgman, John, Jerome Glisse

On Wed, Jun 28, 2017 at 2:53 PM, Suravee Suthikulpanit
<Suravee.Suthikulpanit@amd.com> wrote:
>
>
> On 6/29/17 00:26, Steven Walter wrote:
>>>
>>> So, I'm trying to reproduce this issue on the Ryzen system w/ the
>>> following
>>> setup:
>>>
>>>   * Host kernel v4.11 (with this patch
>>> https://lkml.org/lkml/2017/6/23/295)
>>>
>>>   * guest VM RHEL7.3
>>>
>>>   * guest graphic driver = radeon
>>>
>>>   * qemu-system-x86_64 --version
>>>     QEMU emulator version 2.9.50 (v2.9.0-1659-g577caa2-dirty)
>>>
>>>   * kvm-amd npt=1
>>>
>>>   * dGPU is 08:00.0 VGA compatible controller: Advanced Micro Devices,
>>> Inc.
>>> [AMD/ATI] Tobago PRO [Radeon R7 360 / R9 360 OEM] (rev 81)
>>>
>>>   * qemu-system-x86_64 -smp 4 -enable-kvm -M q35 -m 4096 -cpu host -bios
>>> /usr/share/qemu/bios.bin -device
>>> ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1
>>> -drive file=/sandbox/vm-images/rhel7.3.qcow2,if=virtio,id=disk0 -net none
>>> -vga none -nodefaults -device
>>>
>>> vfio-pci,host=08:00.0,x-vga=on,addr=0.0,multifunction=on,bus=root.1,romfile=/sandbox/vm-images/vbios.rom
>>> -usb -device usb-host,hostbus=3,hostport=1 -device
>>> usb-host,hostbus=3,hostport=3 -device vfio-pci,host=0000:08:00.1 -device
>>> vfio-pci,host=0000:09:00.0
>>>
>>> With this setup, I am able to pass-through the dGPU and run the following
>>> test:
>>>   * Starting up the guest w/ full GNOME GUI on the attached monitor.
>>>   * glxgears (running @ 60 FPS)
>>>   * Playing 1080p HD video on Youtube
>>>
>>> I am not noticing issues here. What kind of test are you running in the
>>> guest VM?
>>
>> Try running the open source game "torcs" inside the VM.  I think
>> you'll find that there's a very noticeable performance difference
>> between npt=1 and npt=0
>
>
> Hm.. actually torcs seems to be running fine w/ npt=1 setup. Although I
> think my driving skill is ~20% worse :(
>
> S

Hi Suravee,

Did you get a chance to test this again with a more graphically
intense program such as the Unigine Heaven benchmark? This is still a
massive issue in the community.

Thanks,
Sarnex

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]                   ` <a909bd77b381f5beef6d74c97307265d-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org>
@ 2017-10-24 23:39                     ` Nick Sarnie
  0 siblings, 0 replies; 29+ messages in thread
From: Nick Sarnie @ 2017-10-24 23:39 UTC (permalink / raw)
  To: Geoffrey McRae
  Cc: Paolo Bonzini, geoff--- via iommu, kvm-u79uwXL29TY76Z2rM5mHXA

On Tue, Oct 24, 2017 at 5:39 PM, geoff--- via iommu
<iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org> wrote:
> On 2017-10-25 08:31, Alex Williamson wrote:
>>
>> On Wed, 25 Oct 2017 07:16:46 +1100
>> geoff--- via iommu <iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org> wrote:
>>
>>> I have isolated it to a single change, although I do not completely
>>> understand what other implications it might have.
>>>
>>> By just changing the line in `init_vmcb` that reads:
>>>
>>>    save->g_pat = svm->vcpu.arch.pat;
>>>
>>> To:
>>>
>>>    save->g_pat = 0x0606060606060606;
>>>
>>> This enables write back and performance jumps through the roof.
>>>
>>> This needs someone with more experience to write a proper patch that
>>> addresses this in a smarter way rather than just hard coding the value.
>>>
>>> This patch looks like an attempt to fix this issue but it yields no
>>> detectable performance gains.
>>>
>>> https://patchwork.kernel.org/patch/6748441/
>>>
>>> Any takers?
>>
>>
>> IOMMU is not the right list for such a change.  I'm dubious this is
>> correct since you're basically going against the comment immediately
>> previous in the code, but perhaps it's a hint in the right direction.
>> Thanks,
>>
>> Alex
>
>
> As am I, which is why it needs someone with more experience to figure out
> why this has had such a huge impact. I have been testing everything since
> I made that change and I am finding that everything I throw at it works
> at near native performance.
>
> I will post my findings to the KVM mailing list as it is clearly a KVM
> issue with SVM, perhaps someone there can write a patch to fix this, or
> at the very least allow for a workaround/quirk module parameter.
>
>
>>
>>> On 2017-10-25 06:08, geoff-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org wrote:
>>> > I have identified the issue! With NPT enabled I am now getting near
>>> > bare
>>> > metal performance with PCI pass through. The issue was with some stubs
>>> > that have not been properly implemented. I will clean my code up and
>>> > submit a patch shortly.
>>> >
>>> > This is a 10 year old bug that has only become evident with the recent
>>> > ability to perform PCI pass-through with dedicated graphics cards. I
>>> > would expect this to improve performance across most workloads that use
>>> > AMD NPT.
>>> >
>>> > Here are some benchmarks to show what I am getting in my dev
>>> > environment:
>>> >
>>> > https://www.3dmark.com/3dm/22878932
>>> > https://www.3dmark.com/3dm/22879024
>>> >
>>> > -Geoff
>>> >
>>> >
>>> > On 2017-10-24 16:15, geoff-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org wrote:
>>> >> Further to this I have verified that IOMMU is working fine, traces and
>>> >> additional printk's added to the kernel module were used to check. All
>>> >> accesses are successful and hit the correct addresses.
>>> >>
>>> >> However profiling under Windows shows there might be an issue with
>>> >> IRQs
>>> >> not reaching the guest. When FluidMark is running at 5fps I still see
>>> >> excellent system responsiveness with the CPU 90% idle and the GPU load
>>> >> at 6%.
>>> >>
>>> >> When switching PhysX to CPU mode the GPU enters low power mode,
>>> >> indicating that the card is no longer in use. This would seem to
>>> >> confirm that the GPU is indeed in use by the PhysX API correctly.
>>> >>
>>> >> My assumption now is that the IRQs from the video card are getting
>>> >> lost.
>>> >>
>>> >> I could be completely off base here but at this point it seems like
>>> >> the
>>> >> best way to proceed unless someone cares to comment.
>>> >>
>>> >> -Geoff
>>> >>
>>> >>
>>> >> On 2017-10-24 10:49, geoff-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org wrote:
>>> >>> Hi,
>>> >>>
>>> >>> I realize this is an older thread but I have spent much of today
>>> >>> trying to
>>> >>> diagnose the problem.
>>> >>>
>>> >>> I have discovered how to reliably reproduce the problem with very
>>> >>> little effort.
>>> >>> It seems that reproducing the issue has been hit and miss for people
>>> >>> as it seems
>>> >>> to primarily affect games/programs that make use of nVidia PhysX. My
>>> >>> understanding of npt's inner workings is quite primitive but I have
>>> >>> still spent
>>> >>> much of my time trying to diagnose the fault and identify the cause.
>>> >>>
>>> >>> Using the free program FluidMark[1] it is possible to reproduce the
>>> >>> issue, where
>>> >>> on a GTX 1080Ti the rendering rate drops to around 4 fps with npt
>>> >>> turned on, but
>>> >>> if turned off the render rate is in excess of 60fps.
>>> >>>
>>> >>> I have produced traces with and without npt enabled during these
>>> >>> tests which
>>> >>> I can provide if it will help. So far I have been digging through how
>>> >>> npt works
>>> >>> and trying to glean as much information as I can from the source and
>>> >>> the AMD
>>> >>> specifications but much of this and how mmu works is very new to me
>>> >>> so progress
>>> >>> is slow.
>>> >>>
>>> >>> If anyone else has looked into this and has more information to share
>>> >>> I would be
>>> >>> very interested.
>>> >>>
>>> >>> Kind Regards,
>>> >>> Geoffrey McRae
>>> >>> HostFission
>>> >>> https://hostfission.com
>>> >>>
>>> >>>
>>> >>> [1]:
>>> >>>
>>> >>> http://www.geeks3d.com/20130308/fluidmark-1-5-1-physx-benchmark-fluid-sph-simulation-opengl-download/
>>>
>>> _______________________________________________
>>> iommu mailing list
>>> iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu

Hi all,

Yeah, I just tested it and I confirm this works around the GPU
performance hit we've all been seeing. Amazing find, and I'll be happy
to see the final solution be merged upstream one day.

Thanks,
Sarnex

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]               ` <20171024233137.295a6b39-1yVPhWWZRC1BDLzU/O5InQ@public.gmane.org>
@ 2017-10-24 21:39                 ` geoff--- via iommu
       [not found]                   ` <a909bd77b381f5beef6d74c97307265d-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: geoff--- via iommu @ 2017-10-24 21:39 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Paolo Bonzini, geoff--- via iommu, kvm-u79uwXL29TY76Z2rM5mHXA

On 2017-10-25 08:31, Alex Williamson wrote:
> On Wed, 25 Oct 2017 07:16:46 +1100
> geoff--- via iommu <iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org> wrote:
> 
>> I have isolated it to a single change, although I do not completely
>> understand what other implications it might have.
>> 
>> By just changing the line in `init_vmcb` that reads:
>> 
>>    save->g_pat = svm->vcpu.arch.pat;
>> 
>> To:
>> 
>>    save->g_pat = 0x0606060606060606;
>> 
>> This enables write back and performance jumps through the roof.
>> 
>> This needs someone with more experience to write a proper patch that
>> addresses this in a smarter way rather than just hard coding the
>> value.
>> 
>> This patch looks like an attempt to fix this issue but it yields no
>> detectable performance gains.
>> 
>> https://patchwork.kernel.org/patch/6748441/
>> 
>> Any takers?
> 
> IOMMU is not the right list for such a change.  I'm dubious this is
> correct since you're basically going against the comment immediately
> previous in the code, but perhaps it's a hint in the right direction.
> Thanks,
> 
> Alex

As am I, which is why it needs someone with more experience to figure out
why this has had such a huge impact. I have been testing everything since
I made that change and I am finding that everything I throw at it works
at near native performance.

I will post my findings to the KVM mailing list as it is clearly a KVM
issue with SVM. Perhaps someone there can write a patch to fix this, or
at the very least allow for a workaround/quirk module parameter.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
  2017-10-24 20:16           ` geoff--- via iommu
@ 2017-10-24 21:31               ` Alex Williamson
  0 siblings, 0 replies; 29+ messages in thread
From: Alex Williamson @ 2017-10-24 21:31 UTC (permalink / raw)
  To: geoff--- via iommu; +Cc: geoff, kvm, Paolo Bonzini, suravee.suthikulpanit

On Wed, 25 Oct 2017 07:16:46 +1100
geoff--- via iommu <iommu@lists.linux-foundation.org> wrote:

> I have isolated it to a single change, although I do not completely
> understand what other implications it might have.
> 
> By just changing the line in `init_vmcb` that reads:
> 
>    save->g_pat = svm->vcpu.arch.pat;
> 
> To:
> 
>    save->g_pat = 0x0606060606060606;
> 
> This enables write back and performance jumps through the roof.
> 
> This needs someone with more experience to write a proper patch that
> addresses this in a smarter way rather than just hard coding the value.
> 
> This patch looks like an attempt to fix this issue but it yields no
> detectable performance gains.
> 
> https://patchwork.kernel.org/patch/6748441/
> 
> Any takers?

IOMMU is not the right list for such a change.  I'm dubious this is
correct since you're basically going against the comment immediately
previous in the code, but perhaps it's a hint in the right direction.
Thanks,

Alex
 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]         ` <1b4a39530fde35783be63470003f0911-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org>
@ 2017-10-24 20:16           ` geoff--- via iommu
  2017-10-24 21:31               ` Alex Williamson
  0 siblings, 1 reply; 29+ messages in thread
From: geoff--- via iommu @ 2017-10-24 20:16 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

I have isolated it to a single change, although I do not completely
understand what other implications it might have.

By just changing the line in `init_vmcb` that reads:

   save->g_pat = svm->vcpu.arch.pat;

To:

   save->g_pat = 0x0606060606060606;

This enables write back and performance jumps through the roof.
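
For context on why that constant means write back: each byte of the 64-bit
guest PAT value is one PAT entry (PA0 to PA7), and memory-type encoding 6 is
WB, so 0x0606060606060606 sets all eight entries to write back. Below is a
minimal standalone sketch that decodes a PAT value. It is not kernel code, and
the helper names pat_type_name, pat_entry and pat_dump are made up for
illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Map a PAT memory-type encoding (low 3 bits of an entry) to its name. */
const char *pat_type_name(uint8_t enc)
{
    switch (enc & 0x7) {
    case 0x0: return "UC";       /* uncacheable */
    case 0x1: return "WC";       /* write combining */
    case 0x4: return "WT";       /* write through */
    case 0x5: return "WP";       /* write protected */
    case 0x6: return "WB";       /* write back */
    case 0x7: return "UC-";      /* uncacheable minus */
    default:  return "reserved"; /* encodings 2 and 3 */
    }
}

/* Name of PAT entry idx (0..7); each entry occupies one byte of the value. */
const char *pat_entry(uint64_t pat, unsigned int idx)
{
    return pat_type_name((uint8_t)(pat >> (idx * 8)));
}

/* Print all eight entries of a PAT value, one per line. */
void pat_dump(uint64_t pat)
{
    for (unsigned int i = 0; i < 8; i++)
        printf("PA%u = %s\n", i, pat_entry(pat, i));
}
```

Calling pat_dump(0x0606060606060606ULL) reports WB for every entry, whereas
the architectural reset value 0x0007040600070406 mixes WB, WT, UC- and UC.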

This needs someone with more experience to write a proper patch that
addresses this in a smarter way rather than just hard coding the value.

The patch linked below looks like an attempt to fix this issue, but it
yields no detectable performance gains.

https://patchwork.kernel.org/patch/6748441/

Any takers?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found]     ` <cb2b1ee0a3b705e668ac3cf19cfa1ecc-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org>
@ 2017-10-24 19:08       ` geoff--- via iommu
       [not found]         ` <1b4a39530fde35783be63470003f0911-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: geoff--- via iommu @ 2017-10-24 19:08 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

I have identified the issue! With NPT enabled I am now getting near bare
metal performance with PCI pass through. The issue was with some stubs
that have not been properly implemented. I will clean my code up and
submit a patch shortly.

This is a 10 year old bug that has only become evident with the recent
ability to perform PCI pass-through with dedicated graphics cards. I
would expect this to improve performance across most workloads that use
AMD NPT.

Here are some benchmarks to show what I am getting in my dev environment:

https://www.3dmark.com/3dm/22878932
https://www.3dmark.com/3dm/22879024

-Geoff



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: AMD Ryzen KVM/NPT/IOMMU issue
       [not found] ` <b88fc14b230d7ecac6066bdd9e95be19-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org>
@ 2017-10-24  5:15   ` geoff--- via iommu
       [not found]     ` <cb2b1ee0a3b705e668ac3cf19cfa1ecc-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: geoff--- via iommu @ 2017-10-24  5:15 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Further to this, I have verified that the IOMMU is working fine; traces and
additional printk's added to the kernel module were used to check. All
accesses are successful and hit the correct addresses.

However, profiling under Windows shows there might be an issue with IRQs
not reaching the guest. When FluidMark is running at 5 fps I still see
excellent system responsiveness, with the CPU 90% idle and the GPU load
at 6%.

When switching PhysX to CPU mode the GPU enters low power mode,
indicating that the card is no longer in use. This would seem to
confirm that the GPU is indeed in use by the PhysX API correctly.

My assumption now is that the IRQs from the video card are getting lost.

I could be completely off base here, but at this point chasing the lost IRQs
seems like the best way to proceed unless someone cares to comment.

-Geoff



^ permalink raw reply	[flat|nested] 29+ messages in thread

* AMD Ryzen KVM/NPT/IOMMU issue
@ 2017-10-23 23:49 geoff--- via iommu
       [not found] ` <b88fc14b230d7ecac6066bdd9e95be19-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org>
  0 siblings, 1 reply; 29+ messages in thread
From: geoff--- via iommu @ 2017-10-23 23:49 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi,

I realize this is an older thread but I have spent much of today trying to
diagnose the problem.

I have discovered how to reliably reproduce the problem with very little
effort. Reproducing the issue seems to have been hit and miss for people, as
it primarily affects games/programs that make use of nVidia PhysX. My
understanding of npt's inner workings is quite primitive but I have still
spent much of my time trying to diagnose the fault and identify the cause.

Using the free program FluidMark[1] it is possible to reproduce the issue: on
a GTX 1080Ti the rendering rate drops to around 4 fps with npt turned on, but
with npt turned off the render rate is in excess of 60 fps.
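
To confirm which mode a given test run actually used, the npt setting of the
loaded kvm_amd module can be read back from sysfs. Below is a minimal sketch,
assuming the parameter is exposed at /sys/module/kvm_amd/parameters/npt and
reads back as either 1/0 or Y/N depending on the kernel version. The function
names are hypothetical.

```c
#include <stdio.h>

/* Interpret the parameter file contents: 1 = on, 0 = off, -1 = unknown. */
int npt_param_enabled(const char *text)
{
    if (!text)
        return -1;
    switch (text[0]) {
    case '1': case 'Y': case 'y': return 1;
    case '0': case 'N': case 'n': return 0;
    default:                      return -1;
    }
}

/* Read the parameter from sysfs; returns -1 if kvm_amd is not loaded
 * or the file cannot be read. */
int npt_enabled_on_host(void)
{
    char buf[8] = "";
    FILE *f = fopen("/sys/module/kvm_amd/parameters/npt", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f))
        buf[0] = '\0';
    fclose(f);
    return npt_param_enabled(buf);
}
```

The parameter itself is chosen when the module is loaded, for example with
"modprobe kvm_amd npt=0".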

I have produced traces with and without npt enabled during these tests, which
I can provide if it will help. So far I have been digging through how npt
works and trying to glean as much information as I can from the source and the
AMD specifications, but much of this, and how the mmu works, is very new to me
so progress is slow.

If anyone else has looked into this and has more information to share I would
be very interested.

Kind Regards,
Geoffrey McRae
HostFission
https://hostfission.com


[1]: 
http://www.geeks3d.com/20130308/fluidmark-1-5-1-physx-benchmark-fluid-sph-simulation-opengl-download/

^ permalink raw reply	[flat|nested] 29+ messages in thread

* AMD Ryzen KVM/NPT/IOMMU issue
@ 2017-06-28 19:17 Graham Neville
  0 siblings, 0 replies; 29+ messages in thread
From: Graham Neville @ 2017-06-28 19:17 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA


Although not related to graphics card performance, there is definitely
another issue with regards to running KVM nested L2 guests when npt=1.

Thought I'd mention this in case it helps with identifying performance
issues with NPT.

I'm unable to start any L2 guests with KVM acceleration (--enable-kvm). As
soon as it attempts to bring up the L2 guest the L1 host crashes, while the
L0 host remains online. Nothing is printed in either L1's or L0's dmesg.

My L0 is running Arch with 4.11.0-rc6 and qemu 2.8.0. I've tried
different L1 hosts (Ubuntu, Arch) and different kernels right up to
4.12-rc5, along with different qemu versions.

This used to work fine with my Intel i7-4770s setup.

With npt=0, L2 guests can start but performance is dire.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-10-24 23:39 UTC | newest]

Thread overview: 29+ messages
2017-05-03 14:37 AMD Ryzen KVM/NPT/IOMMU issue Matthias Ehrenfeuchter
     [not found] ` <575f8fbc-0fdc-f336-e3da-53f27da4b2e1-5Zrl/DuVEGLQT0dZR+AlfA@public.gmane.org>
2017-05-03 16:28   ` Nick Sarnie
     [not found]     ` <CAOcCaLbdi9KZoXiV5htjShc_mYvZ5jK2B3Ot7NeM=3v_ZA39aA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-05 12:05       ` Matthias Ehrenfeuchter
2017-05-05 17:27     ` Alex Williamson
     [not found]       ` <20170505112706.7785948c-1yVPhWWZRC1BDLzU/O5InQ@public.gmane.org>
2017-06-25  5:55         ` Nick Sarnie
     [not found]           ` <CAOcCaLbAS0FkRrG8YZNM5rYUtCFeUGkdgdy=4o16Njufdy8Gag-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-28 17:23             ` Suravee Suthikulpanit
2017-06-28 17:26               ` Steven Walter
     [not found]                 ` <CAK8d-aJ+XHi+5sr6bHj3D2BaG94v6Lyk1C_ZuA4erDVhEyp-uQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-28 18:53                   ` Suravee Suthikulpanit
     [not found]                     ` <5d2ea709-8f90-bfaa-975d-48aed39e75ad-5C7GfCeVMHo@public.gmane.org>
2017-06-28 19:08                       ` Alex Williamson
     [not found]                         ` <20170628130855.76c2b700-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>
2017-06-28 19:28                           ` Bridgman, John
2017-06-28 19:29                             ` Bridgman, John
     [not found]                               ` <BN6PR12MB13481A39CD3EA714754FEE49E8DD0-/b2+HYfkarQX0pEhCR5T8QdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-06-28 19:52                                 ` Graham Neville
     [not found]                                   ` <CAEk7i1-Ar0ES8ekmSGiRrrWzTz8gFb2RDTW6KsbuNdDubVerww-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-28 20:33                                     ` Paolo Bonzini
2017-06-28 22:34                                       ` Nick Sarnie
     [not found]                                         ` <CAOcCaLao_Y-8KP60baoSehtCu7C5CVnuuZNEom-zi54Fa2h+sQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-29  0:21                                           ` Thiago Padilha
     [not found]                                             ` <CAAq2Xdpu_rv7FgVfGCv-nYttGzH6hZujqdYvcf4qgXetkOGLzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-29  1:50                                               ` Thiago Padilha
     [not found]                                                 ` <CAAq2XdppNcKcmbJhPQ9WfTowKSmp76jhDa9JHM1rc92Enx=1Zg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-29  1:54                                                   ` Nick Sarnie
2017-07-01 14:15                                                     ` Thiago Padilha
2017-10-17  4:16                                                       ` Nick Sarnie
     [not found]               ` <545f19a3-4923-cdec-4ce9-2a4155a04f6a-5C7GfCeVMHo@public.gmane.org>
2017-06-28 17:31                 ` Alex Williamson
2017-06-28 19:17 Graham Neville
2017-10-23 23:49 geoff--- via iommu
     [not found] ` <b88fc14b230d7ecac6066bdd9e95be19-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org>
2017-10-24  5:15   ` geoff--- via iommu
     [not found]     ` <cb2b1ee0a3b705e668ac3cf19cfa1ecc-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org>
2017-10-24 19:08       ` geoff--- via iommu
     [not found]         ` <1b4a39530fde35783be63470003f0911-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org>
2017-10-24 20:16           ` geoff--- via iommu
2017-10-24 21:31             ` Alex Williamson
2017-10-24 21:31               ` Alex Williamson
     [not found]               ` <20171024233137.295a6b39-1yVPhWWZRC1BDLzU/O5InQ@public.gmane.org>
2017-10-24 21:39                 ` geoff--- via iommu
     [not found]                   ` <a909bd77b381f5beef6d74c97307265d-9M2dFRIgpjGrDvn5mFPilA@public.gmane.org>
2017-10-24 23:39                     ` Nick Sarnie
