* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
@ 2018-06-06 13:33 Gabriel C
From: Gabriel C @ 2018-06-06 13:33 UTC (permalink / raw)
  To: Christian König
  Cc: Jean-Marc Valin, Dave Airlie, alexander.deucher, Felix Kuehling,
	Laura Abbott, Andrew Morton, michel.daenzer, dri-devel, LKML,
	Linus Torvalds

2018-06-06 14:19 GMT+02:00 Christian König <christian.koenig@amd.com>:
> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>
>> 2018-06-06 13:33 GMT+02:00 Christian König <christian.koenig@amd.com>:
>>>
>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>
>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C <nix.or.die@gmail.com>:
>>>>>>
>>>>>> 2018-04-11 6:00 GMT+02:00 Gabriel C <nix.or.die@gmail.com>:
>>>>>> 2018-04-09 11:42 GMT+02:00 Christian König
>>>>>> <ckoenig.leichtzumerken@gmail.com>:
>>>>>>>
>>>>>>> Am 07.04.2018 um 00:00 schrieb Jean-Marc Valin:
>>>>>
>>>>> ...
>>>>>>
>>>>>> I can help test code for 4.17 and later if you wish, but that is a
>>>>>> *different* story.
>>>>>>
>>>>> Quickly tested 4.16.0-11490-gb284d4d5a678; the amdgpu and radeon drivers
>>>>> are now broken in this one.
>>>>>
>>>>> radeon tells:
>>>>>
>>>>> ...
>>>>>
>>>>> [    6.337838] [drm] PCIE GART of 2048M enabled (table at 0x00000000001D6000).
>>>>> [    6.338210] radeon 0000:21:00.0: (-12) create WB bo failed
>>>>> [    6.338214] radeon 0000:21:00.0: disabling GPU acceleration
>>>>>
>>>>> ...
>>>>>
>>>> I have the same issue now with final 4.17.
>>>
>>>
>>> Actually, Michel came up with a fix for the performance regression, which
>>> has now been backported to older kernels as well.
>>>
>>> So the original issue of this mail thread should be fixed by now.
>>
>> OK, will test as soon as I get the GPU to work :))
>>
>>>> I also played with BIOS options, which does not fix anything but
>>>> changes the error message.
>>>>
>>>> With IOMMU and SR-IOV disabled, the error changes to this:
>>>>
>>>> [    7.092044] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
>>>> [    7.092059] radeon 0000:21:00.0: disabling GPU acceleration
>>>>
>>>>
>>>> While I could work around SWIOTLB bugs in 4.15 and 4.16, 4.17 seems to
>>>> kill the GPU with no way for me to make it work (at least I have not
>>>> found any workaround so far).
>>>
>>>
>>> That actually sounds like something completely different. Can you provide
>>> a full dmesg of radeon and/or amdgpu?
>>
>> Sure, here are boot logs with IOMMU/SR-IOV ON/OFF in the BIOS:
>>
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>
>> Nothing else changed in that setup; I am just testing kernel 4.17.
>
>
> That has nothing to do with the driver or the original bug you reported. The
> problem is that SME is active, and that is currently not supported at all
> with that hardware.
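The two radeon failures quoted above can be read as follows. The `(-12)` in `create WB bo failed` is a negative errno, and 12 is `ENOMEM` on Linux. The `r600_ring_test` error is more specific: roughly, the driver seeds a scratch register with `0xCAFEDEAD`, asks the GPU through the ring buffer to overwrite it with `0xDEADBEEF`, and polls. Reading back `0xCAFEDEAD` means the GPU never executed the submitted command. The sketch below models only that logic in user space; `struct fake_gpu` and `fake_gpu_submit_write()` are hypothetical stand-ins for the hardware, not driver code.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_TEST_SEED  0xCAFEDEADu /* written by the CPU before the test */
#define RING_TEST_MAGIC 0xDEADBEEFu /* value the GPU is asked to write */

/* Hypothetical stand-in for the hardware: one scratch register and a flag
 * saying whether the command processor actually executes ring commands. */
struct fake_gpu {
    uint32_t scratch;
    bool executes_ring;
};

/* Hypothetical: "submit" a ring command that writes value to scratch. */
static void fake_gpu_submit_write(struct fake_gpu *gpu, uint32_t value)
{
    if (gpu->executes_ring)
        gpu->scratch = value;
}

/* Seed, submit, poll. Returns 0 on success or -1 on timeout; *last receives
 * the final value read back, which is what gets reported on failure. */
int ring_test(struct fake_gpu *gpu, unsigned timeout, uint32_t *last)
{
    gpu->scratch = RING_TEST_SEED;
    fake_gpu_submit_write(gpu, RING_TEST_MAGIC);

    *last = gpu->scratch;
    for (unsigned i = 0; i < timeout; i++) {
        /* poll; real hardware needs a short delay between reads */
        *last = gpu->scratch;
        if (*last == RING_TEST_MAGIC)
            return 0;
    }
    return -1;
}
```

In this model a working GPU returns 0 with `0xDEADBEEF` read back, while a GPU that never runs the ring times out with the seed value still in place, matching the `scratch(0x850C)=0xCAFEDEAD` line in the log.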

OK... so are we now playing kernel and AMD hardware roulette with each release?

SME was configured the same way with kernel 4.16.x here, and everything worked.

Also, if you no longer support SME at all on this hardware, even though it
worked before, please add proper error handling and proper dmesg messages
letting the user know.

radeon: xxxx: SME is no longer supported on this hardware, please
disable SME...
radeon: xxxx: Update your GPU <or whatever>

How hard would that be?

No one but developers can guess from these error messages why their
hardware suddenly stopped working just because they updated the kernel.
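Gabriel's request amounts to a probe-time check that turns an opaque failure into an actionable hint. A minimal user-space sketch, assuming the driver already knows whether SME is active and has the error code from the failed acceleration setup; `format_sme_hint()` and the message wording are hypothetical, not existing radeon code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: if acceleration setup failed (accel_err != 0) while
 * SME is active, format a user-facing hint into buf. Returns the length
 * snprintf() reports, or 0 when no hint is needed. */
int format_sme_hint(char *buf, size_t size, const char *dev,
                    bool sme_active, int accel_err)
{
    if (!sme_active || accel_err == 0)
        return 0;
    return snprintf(buf, size,
                    "%s: GPU acceleration failed (%d) with SME active; "
                    "booting with mem_encrypt=off may help",
                    dev, accel_err);
}
```

Keeping the check as a pure formatting function leaves the policy question (which layer should detect the SME conflict) open; it only shows how little is needed to name the likely cause.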


>
> Try to disable SME either in the BIOS or on the kernel command line.

Yes, that works, but that is not the point.

Really, you just can't break users' setups like this.
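Christian's workaround is to disable SME in the BIOS or on the kernel command line, where the parameter is `mem_encrypt=off`. A minimal sketch for checking what the running kernel was booted with. It assumes that the last `mem_encrypt=` occurrence wins, and it cannot tell whether SME is actually active, since that also depends on the BIOS and on the `CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT` build option. Both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the last mem_encrypt= token is "off", 0 if it is "on", and
 * -1 if it is absent or unrecognized (the build-time default then applies).
 * Assumption: the last occurrence on the command line wins. */
int sme_cmdline_state(const char *cmdline)
{
    static const char key[] = "mem_encrypt=";
    const size_t klen = sizeof(key) - 1;
    const char *val = NULL; /* value of the last mem_encrypt= token */
    size_t vlen = 0;
    const char *p = cmdline;

    for (;;) {
        p += strspn(p, " \t\n");         /* skip separators */
        size_t len = strcspn(p, " \t\n"); /* length of the next token */
        if (len == 0)
            break;                        /* end of string */
        if (len >= klen && strncmp(p, key, klen) == 0) {
            val = p + klen;
            vlen = len - klen;
        }
        p += len;
    }
    if (val && vlen == 3 && strncmp(val, "off", 3) == 0)
        return 1;
    if (val && vlen == 2 && strncmp(val, "on", 2) == 0)
        return 0;
    return -1;
}

/* Reads the running kernel's command line into buf (NUL-terminated).
 * Returns 0 on success, -1 on error. */
int read_proc_cmdline(char *buf, size_t size)
{
    if (size == 0)
        return -1;
    FILE *f = fopen("/proc/cmdline", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, size - 1, f);
    fclose(f);
    buf[n] = '\0';
    return 0;
}
```

For example, `sme_cmdline_state("root=/dev/sda1 mem_encrypt=off quiet")` returns 1.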

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
@ 2018-06-06 14:12   ` Michel Dänzer
From: Michel Dänzer @ 2018-06-06 14:12 UTC (permalink / raw)
  To: Gabriel C, Christian König
  Cc: Jean-Marc Valin, Dave Airlie, alexander.deucher, Felix Kuehling,
	Laura Abbott, Andrew Morton, dri-devel, LKML, Linus Torvalds

On 2018-06-06 03:33 PM, Gabriel C wrote:
> 2018-06-06 14:19 GMT+02:00 Christian König <christian.koenig@amd.com>:
>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>> 2018-06-06 13:33 GMT+02:00 Christian König <christian.koenig@amd.com>:
>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C <nix.or.die@gmail.com>:
>>>>>>
>>>>>> [    6.337838] [drm] PCIE GART of 2048M enabled (table at 0x00000000001D6000).
>>>>>> [    6.338210] radeon 0000:21:00.0: (-12) create WB bo failed
>>>>>> [    6.338214] radeon 0000:21:00.0: disabling GPU acceleration
>>>>>>
>>>>>> ...
>>>>>>
>>>>> I have the same issue now with final 4.17.

Please file a bug report, and ideally bisect which commit(s) introduced 
the issue(s).
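`git bisect` is a binary search between a known-good point (here 4.16) and a known-bad one (4.17), so the number of kernels to build grows only logarithmically with the number of commits in between. The sketch below models that search over a linear list of commits; real history is a DAG, which `git bisect` handles and this sketch does not. `is_bad_fn` stands in for building, booting and checking one kernel, and `bad_from()` is a hypothetical predicate for trying it out.

```c
#include <stddef.h>

/* Stand-in for "build, boot and test commit i": nonzero means bad. */
typedef int (*is_bad_fn)(size_t i, void *ctx);

/* Binary search over commits 0..n-1 (n >= 2), assuming commit 0 is good,
 * commit n-1 is bad, and every commit after the first bad one is bad too.
 * Returns the index of the first bad commit; *steps counts the tests run. */
size_t bisect_first_bad(size_t n, is_bad_fn is_bad, void *ctx, unsigned *steps)
{
    size_t good = 0, bad = n - 1; /* invariant: good is good, bad is bad */

    *steps = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        ++*steps;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Example predicate: every index from *(size_t *)ctx onward is bad. */
int bad_from(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

With 10,000 commits between the good and bad endpoints, the search needs at most 14 test builds, since the interval of 9,999 commits is halved each step and ⌈log2 9999⌉ = 14.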


>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>
>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>
>>> Nothing else changed in that setup; I am just testing kernel 4.17.
>>
>>
>> That has nothing to do with the driver or the original bug you reported. The
>> problem is that SME is active, and that is currently not supported at all
>> with that hardware.
> 
> OK... so are we now playing kernel and AMD hardware roulette with each release?
> 
> SME was configured the same way with kernel 4.16.x here, and everything worked.

If that is true, again please bisect which commit broke it.

All the reports I've seen before this indicated that at least amdgpu has 
never worked with SME (which BTW doesn't mean it's never going to work 
or that we don't want to support it, just that as far as we know it's 
currently not working).


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
@ 2018-06-06 14:29   ` Christian König
From: Christian König @ 2018-06-06 14:29 UTC (permalink / raw)
  To: Gabriel C
  Cc: Jean-Marc Valin, Dave Airlie, alexander.deucher, Felix Kuehling,
	Laura Abbott, Andrew Morton, michel.daenzer, dri-devel, LKML,
	Linus Torvalds

Am 06.06.2018 um 15:33 schrieb Gabriel C:
> 2018-06-06 14:19 GMT+02:00 Christian König <christian.koenig@amd.com>:
>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>> [SNIP]
>>>
>>> That has nothing to do with the driver or the original bug you reported. The
>>> problem is that SME is active, and that is currently not supported at all
>>> with that hardware.
> OK... so are we now playing kernel and AMD hardware roulette with each release?
>
> SME was configured the same way with kernel 4.16.x here, and everything worked.
>
> Also, if you no longer support SME at all on this hardware, even though it
> worked before, please add proper error handling and proper dmesg messages
> letting the user know.
>
> radeon: xxxx: SME is no longer supported on this hardware, please
> disable SME...
> radeon: xxxx: Update your GPU <or whatever>
>
> How hard would that be?

Well, to be precise, it isn't the job of the GFX driver to care about
such things.

It is a well-known and documented limitation of SME that it is, in
general, mostly incompatible with GFX (or compute) hardware, and it
actually doesn't matter which hardware or driver you use.

In other words, as soon as you use GFX (or compute), SME gets
disabled transparently.

The problem is that this happens only on the slow DMA path, which we
just disabled because of the performance problems.

I'm going to propose reverting that change, or at least only applying it
when SME is disabled.

Regards,
Christian.
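Christian's proposal boils down to a simple policy: keep the fast page-allocation path only when it is safe, and fall back to the coherent DMA allocator (the slow path on which, per his explanation, SME is handled transparently) whenever SME is active. A minimal sketch of the second option he mentions; the enum and function names are hypothetical, and `need_swiotlb` stands for the existing address-range check.

```c
#include <stdbool.h>

/* Hypothetical labels for the two ways the driver can obtain pages. */
enum alloc_path {
    ALLOC_FAST_POOL,   /* regular page pool: the faster path */
    ALLOC_DMA_COHERENT /* coherent DMA allocator: the slow path */
};

/* need_swiotlb: result of the address-range check (some memory is beyond
 * the GPU's reach). sme_active: whether memory encryption is in use. */
enum alloc_path choose_alloc_path(bool need_swiotlb, bool sme_active)
{
    if (need_swiotlb || sme_active)
        return ALLOC_DMA_COHERENT;
    return ALLOC_FAST_POOL;
}
```

The point of the `sme_active` term is that the address check alone can conclude the fast path is fine on a machine where SME still requires the slow one.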

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
@ 2018-06-06 14:44     ` Christian König
From: Christian König @ 2018-06-06 14:44 UTC (permalink / raw)
  To: Michel Dänzer, Gabriel C
  Cc: Jean-Marc Valin, Dave Airlie, alexander.deucher, Felix Kuehling,
	Laura Abbott, Andrew Morton, dri-devel, LKML, Linus Torvalds

Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
> On 2018-06-06 03:33 PM, Gabriel C wrote:
>> 2018-06-06 14:19 GMT+02:00 Christian König <christian.koenig@amd.com>:
>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>> 2018-06-06 13:33 GMT+02:00 Christian König <christian.koenig@amd.com>:
>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C <nix.or.die@gmail.com>:
>>>>>>>
>>>>>>> [    6.337838] [drm] PCIE GART of 2048M enabled (table at 0x00000000001D6000).
>>>>>>> [    6.338210] radeon 0000:21:00.0: (-12) create WB bo failed
>>>>>>> [    6.338214] radeon 0000:21:00.0: disabling GPU acceleration
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>> I have the same issue now with final 4.17.
>
> Please file a bug report, and ideally bisect which commit(s) 
> introduced the issue(s).
>
>
>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>
>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>
>>>> Nothing else changed in that setup; I am just testing kernel 4.17.
>>>
>>>
>>> That has nothing to do with the driver or the original bug you
>>> reported. The problem is that SME is active, and that is currently
>>> not supported at all with that hardware.
>>
>> OK... so are we now playing kernel and AMD hardware roulette with each
>> release?
>>
>> SME was configured the same way with kernel 4.16.x here, and everything worked.
>
> If that is true, again please bisect which commit broke it.
>
> All the reports I've seen before this indicated that at least amdgpu 
> has never worked with SME (which BTW doesn't mean it's never going to 
> work or that we don't want to support it, just that as far as we know 
> it's currently not working).

At least in theory it should work when we use the coherent DMA allocator.

If that really worked before, then the most likely commit that broke
this is:

commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
Author: Chunming Zhou <david1.zhou@amd.com>
Date:   Fri Feb 9 10:44:09 2018 +0800

     drm/amdgpu: only enable swiotlb alloc when need v2

     get the max io mapping address of system memory to see if it is over
     our card accessing range.
     v2: move checking later

     Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
     Reviewed-by: Monk Liu <monk.liu@amd.com>
     Reviewed-by: Christian König <christian.koenig@amd.com>
     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

I'm currently looking into how we could improve this detection.

Regards,
Christian.
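The commit's idea, as its message describes it, is to find the highest physical address of system memory and use the swiotlb allocation path only when that address is beyond what the card can reach. A minimal sketch under that description; `struct mem_range` and both functions are hypothetical. Such a check looks only at addresses, so on a machine with SME active it can choose the fast path even though, per Christian, only the coherent DMA path copes with SME.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one system-RAM range; end is inclusive. */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/* Highest physical address covered by any range (0 if there are none). */
uint64_t max_iomem(const struct mem_range *r, size_t n)
{
    uint64_t max = 0;
    for (size_t i = 0; i < n; i++)
        if (r[i].end > max)
            max = r[i].end;
    return max;
}

/* True if some memory lies above what a device with dma_bits address bits
 * can reach, i.e. above 2^dma_bits - 1. */
bool need_swiotlb(const struct mem_range *r, size_t n, unsigned dma_bits)
{
    if (dma_bits >= 64)
        return false; /* the device can reach the whole 64-bit space */
    return max_iomem(r, n) > (((uint64_t)1 << dma_bits) - 1);
}
```

For example, a machine whose RAM ends just below 17 GiB needs the fallback for a 32-bit DMA mask but not for a 40-bit one.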

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
@ 2018-06-06 15:03       ` Michel Dänzer
From: Michel Dänzer @ 2018-06-06 15:03 UTC (permalink / raw)
  To: Christian König, Gabriel C
  Cc: Jean-Marc Valin, Dave Airlie, Felix Kuehling, LKML, dri-devel,
	alexander.deucher, Andrew Morton, Linus Torvalds

On 2018-06-06 04:44 PM, Christian König wrote:
> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>> 2018-06-06 14:19 GMT+02:00 Christian König <christian.koenig@amd.com>:
>>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>>> 2018-06-06 13:33 GMT+02:00 Christian König <christian.koenig@amd.com>:
>>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>
>>>>> Nothing else changed in that setup; I am just testing kernel 4.17.
>>>>
>>>>
>>>> That has nothing to do with the driver or the original bug you
>>>> reported. The problem is that SME is active, and that is currently
>>>> not supported at all with that hardware.
>>>
>>> OK... so are we now playing kernel and AMD hardware roulette with each
>>> release?
>>>
>>> SME was configured the same way with kernel 4.16.x here, and everything worked.
>>
>> If that is true, again please bisect which commit broke it.
>>
>> All the reports I've seen before this indicated that at least amdgpu
>> has never worked with SME (which BTW doesn't mean it's never going to
>> work or that we don't want to support it, just that as far as we know
>> it's currently not working).
> 
> At least in theory it should work when we use the coherent DMA allocator.
> 
> If that really worked before, then the most likely commit that broke
> this is:
> 
> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
> Author: Chunming Zhou <david1.zhou@amd.com>
> Date:   Fri Feb 9 10:44:09 2018 +0800
> 
>     drm/amdgpu: only enable swiotlb alloc when need v2
> 
>     get the max io mapping address of system memory to see if it is over
>     our card accessing range.
>     v2: move checking later
> 
>     Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>     Reviewed-by: Monk Liu <monk.liu@amd.com>
>     Reviewed-by: Christian König <christian.koenig@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> 
> I'm currently looking into how we could improve this detection.

I guess this could fit Gabriel's case, but e.g.
https://bugs.freedesktop.org/104437 says amdgpu was already broken with
SME in 4.15, if not 4.14 (I suspect there was simply no SME support
earlier).


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
@ 2018-06-06 15:24     ` Gabriel C
From: Gabriel C @ 2018-06-06 15:24 UTC (permalink / raw)
  To: Christian König
  Cc: Michel Dänzer, Jean-Marc Valin, Dave Airlie,
	alexander.deucher, Felix Kuehling, Laura Abbott, Andrew Morton,
	dri-devel, LKML, Linus Torvalds

2018-06-06 16:44 GMT+02:00 Christian König <christian.koenig@amd.com>:
> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>
>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>>
>>> 2018-06-06 14:19 GMT+02:00 Christian König <christian.koenig@amd.com>:
>>>>
>>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>>>
>>>>> 2018-06-06 13:33 GMT+02:00 Christian König <christian.koenig@amd.com>:
>>>>>>
>>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>>>>>
>>>>>>> 2018-04-11 7:02 GMT+02:00 Gabriel C <nix.or.die@gmail.com>:
>>>>>>>>
>>>>>>>>
>>>>>>>> [    6.337838] [drm] PCIE GART of 2048M enabled (table at 0x00000000001D6000).
>>>>>>>> [    6.338210] radeon 0000:21:00.0: (-12) create WB bo failed
>>>>>>>> [    6.338214] radeon 0000:21:00.0: disabling GPU acceleration
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>> I have the same issue now with final 4.17.
>>
>>
>> Please file a bug report, and ideally bisect which commit(s) introduced
>> the issue(s).
>>
>>
>>>>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>
>>>>>
>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>
>>>>> Nothing else changed in that setup; I am just testing kernel 4.17.
>>>>
>>>>
>>>>
>>>> That has nothing to do with the driver or the original bug you
>>>> reported. The problem is that SME is active, and that is currently
>>>> not supported at all with that hardware.
>>>
>>>
>>> OK... so are we now playing kernel and AMD hardware roulette with each
>>> release?
>>>
>>> SME was configured the same way with kernel 4.16.x here, and everything worked.
>>
>>
>> If that is true, again please bisect which commit broke it.
>>
>> All the reports I've seen before this indicated that at least amdgpu has
>> never worked with SME (which BTW doesn't mean it's never going to work or
>> that we don't want to support it, just that as far as we know it's currently
>> not working).
>
>
> At least in theory it should work when we use the coherent DMA allocator.
>
> If that really worked before, then the most likely commit that broke this
> is:
>
> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
> Author: Chunming Zhou <david1.zhou@amd.com>
> Date:   Fri Feb 9 10:44:09 2018 +0800
>
>     drm/amdgpu: only enable swiotlb alloc when need v2
>
>     get the max io mapping address of system memory to see if it is over
>     our card accessing range.
>     v2: move checking later
>
>     Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>     Reviewed-by: Monk Liu <monk.liu@amd.com>
>     Reviewed-by: Christian König <christian.koenig@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>
> I'm currently looking into how we could improve this detection.

It's not this one; I've built a kernel with it reverted.

I'll do a bisect tonight or tomorrow.

>
> Regards,
> Christian.

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
@ 2018-06-06 15:44       ` Gabriel C
From: Gabriel C @ 2018-06-06 15:44 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Christian König, Jean-Marc Valin, Dave Airlie,
	Felix Kuehling, LKML, dri-devel, alexander.deucher,
	Andrew Morton, Linus Torvalds

2018-06-06 17:03 GMT+02:00 Michel Dänzer <michel@daenzer.net>:
> On 2018-06-06 04:44 PM, Christian König wrote:
>> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>> On 2018-06-06 03:33 PM, Gabriel C wrote:
>>>> 2018-06-06 14:19 GMT+02:00 Christian König <christian.koenig@amd.com>:
>>>>> Am 06.06.2018 um 14:08 schrieb Gabriel C:
>>>>>> 2018-06-06 13:33 GMT+02:00 Christian König <christian.koenig@amd.com>:
>>>>>>> Am 06.06.2018 um 13:28 schrieb Gabriel C:
>>>
>>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-off.txt
>>>>>>
>>>>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-iommu-sr-iov-on.txt
>>>>>>
>>>>>> Nothing else changed in that setup; I am just testing kernel 4.17.
>>>>>
>>>>>
>>>>> That has nothing to do with the driver or the original bug you
>>>>> reported. The problem is that SME is active, and that is currently
>>>>> not supported at all with that hardware.
>>>>
>>>> OK... so are we now playing kernel and AMD hardware roulette with each
>>>> release?
>>>>
>>>> SME was configured the same way with kernel 4.16.x here, and everything worked.
>>>
>>> If that is true, again please bisect which commit broke it.
>>>
>>> All the reports I've seen before this indicated that at least amdgpu
>>> has never worked with SME (which BTW doesn't mean it's never going to
>>> work or that we don't want to support it, just that as far as we know
>>> it's currently not working).
>>
>> At least in theory it should work when we use the coherent DMA allocator.
>>
>> If that really worked before, then the most likely commit that broke
>> this is:
>>
>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>> Author: Chunming Zhou <david1.zhou@amd.com>
>> Date:   Fri Feb 9 10:44:09 2018 +0800
>>
>>     drm/amdgpu: only enable swiotlb alloc when need v2
>>
>>     get the max io mapping address of system memory to see if it is over
>>     our card accessing range.
>>     v2: move checking later
>>
>>     Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>     Reviewed-by: Monk Liu <monk.liu@amd.com>
>>     Reviewed-by: Christian König <christian.koenig@amd.com>
>>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>
>> I'm currently looking into how we could improve this detection.
>
> I guess this could fit Gabriel's case, but e.g.
> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
> earlier).

I had strange performance issues with 4.15 and 4.16, but SME was ON
on that setup (even before it hit mainline) and it never broke the GPU like this.

Here is a 4.16.13 boot dmesg, which shows no such issue:

http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt

With the setup as is, booting 4.16.x works, while 4.17 throws the errors.

>
>
> --
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
  2018-06-06 15:44       ` Gabriel C
@ 2018-06-07  7:07         ` Christian König
  2018-06-07 12:32           ` Gabriel C
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2018-06-07  7:07 UTC (permalink / raw)
  To: Gabriel C, Michel Dänzer
  Cc: Jean-Marc Valin, Dave Airlie, Felix Kuehling, LKML, dri-devel,
	alexander.deucher, Andrew Morton, Linus Torvalds

Am 06.06.2018 um 17:44 schrieb Gabriel C:
> 2018-06-06 17:03 GMT+02:00 Michel Dänzer <michel@daenzer.net>:
>> On 2018-06-06 04:44 PM, Christian König wrote:
>>> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>> [SNIP]
>>> At least in theory it should work when we use the coherent DMA allocator.
>>>
>>> If that really worked before, then the most likely commit which broke
>>> this is:
>>>
>>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>>> Author: Chunming Zhou <david1.zhou@amd.com>
>>> Date:   Fri Feb 9 10:44:09 2018 +0800
>>>
>>>      drm/amdgpu: only enable swiotlb alloc when need v2
>>>
>>>      get the max io mapping address of system memory to see if it is over
>>>      our card accessing range.
>>>      v2: move checking later
>>>
>>>      Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>>      Reviewed-by: Monk Liu <monk.liu@amd.com>
>>>      Reviewed-by: Christian König <christian.koenig@amd.com>
>>>      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>
>>> Currently looking into how we could somehow improve this detection.
>> I guess this could fit for Gabriel, but e.g.
>> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
>> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
>> earlier).

And what I totally missed is that Gabriel is using radeon and not amdgpu.

So Gabriel you need to revert this one for testing:
commit 1bc3d3cce8c3b44c2b5ac6cee98c830bb40e6b0f
Author: Chunming Zhou <david1.zhou@amd.com>
Date:   Fri Feb 9 10:44:10 2018 +0800

     drm/radeon: only enable swiotlb path when need v2

     swiotlb expands our card accessing range, but its path always is slower
     than ttm pool allocation.
     So add condition to use it.
     v2: move a bit later

     Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
     Reviewed-by: Monk Liu <monk.liu@amd.com>
     Reviewed-by: Christian König <christian.koenig@amd.com>
     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
     Link: https://patchwork.freedesktop.org/patch/msgid/20180209024410.1469-3-david1.zhou@amd.com

> I got a strange performance issue with 4.15 and 4.16, but SME was ON
> on that setup (even before it hit mainline) and it never broke the GPU like this.

Well that is very interesting, you are the first one who reports that 
SME + GFX works in some way. So far we only got negative reports for that.

> There is a 4.16.13 boot dmesg which has no such issue:
>
> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt
>
> With the setup as is, booting 4.16.x works, while 4.17 throws the errors.

Please do the bisect if the patch I've mentioned above doesn't help.

Thanks,
Christian.

>
>>
>> --
>> Earthling Michel Dänzer               |               http://www.amd.com
>> Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
  2018-06-07  7:07         ` Christian König
@ 2018-06-07 12:32           ` Gabriel C
  2018-06-07 16:24             ` Gabriel C
  2018-06-08  6:02               ` Christoph Hellwig
  0 siblings, 2 replies; 25+ messages in thread
From: Gabriel C @ 2018-06-07 12:32 UTC (permalink / raw)
  To: Christian König
  Cc: Michel Dänzer, Jean-Marc Valin, Dave Airlie, Felix Kuehling,
	LKML, dri-devel, alexander.deucher, Andrew Morton,
	Linus Torvalds, Tom Lendacky, Joerg Roedel, Christoph Hellwig

2018-06-07 9:07 GMT+02:00 Christian König <christian.koenig@amd.com>:
> Am 06.06.2018 um 17:44 schrieb Gabriel C:
>>
>> 2018-06-06 17:03 GMT+02:00 Michel Dänzer <michel@daenzer.net>:
>>>
>>> On 2018-06-06 04:44 PM, Christian König wrote:
>>>>
>>>> Am 06.06.2018 um 16:12 schrieb Michel Dänzer:
>>>> [SNIP]
>>>> At least in theory it should work when we use the coherent DMA
>>>> allocator.
>>>>
>>>> If that really worked before, then the most likely commit which broke
>>>> this is:
>>>>
>>>> commit fd5fd480dd8fe4910546e7b080b3ae345e57fe9f
>>>> Author: Chunming Zhou <david1.zhou@amd.com>
>>>> Date:   Fri Feb 9 10:44:09 2018 +0800
>>>>
>>>>      drm/amdgpu: only enable swiotlb alloc when need v2
>>>>
>>>>      get the max io mapping address of system memory to see if it is over
>>>>      our card accessing range.
>>>>      v2: move checking later
>>>>
>>>>      Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>>>>      Reviewed-by: Monk Liu <monk.liu@amd.com>
>>>>      Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>>>>
>>>> Currently looking into how we could somehow improve this detection.
>>>
>>> I guess this could fit for Gabriel, but e.g.
>>> https://bugs.freedesktop.org/104437 says amdgpu was already broken with
>>> SME in 4.15, if not 4.14 (I suspect there was simply no SME support
>>> earlier).
>
>
> And what I totally missed is that Gabriel is using radeon and not amdgpu.
>
> So Gabriel you need to revert this one for testing:
> commit 1bc3d3cce8c3b44c2b5ac6cee98c830bb40e6b0f
> Author: Chunming Zhou <david1.zhou@amd.com>
> Date:   Fri Feb 9 10:44:10 2018 +0800
>
>     drm/radeon: only enable swiotlb path when need v2
>
>     swiotlb expands our card accessing range, but its path always is slower
>     than ttm pool allocation.
>     So add condition to use it.
>     v2: move a bit later
>
>     Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
>     Reviewed-by: Monk Liu <monk.liu@amd.com>
>     Reviewed-by: Christian König <christian.koenig@amd.com>
>     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
>     Link: https://patchwork.freedesktop.org/patch/msgid/20180209024410.1469-3-david1.zhou@amd.com
>
>> I got a strange performance issue with 4.15 and 4.16, but SME was ON
>> on that setup (even before it hit mainline) and it never broke the GPU like this.
>
>
> Well that is very interesting, you are the first one who reports that SME +
> GFX works in some way. So far we only got negative reports for that.
>
>> There is a 4.16.13 boot dmesg which has no such issue:
>>
>>
>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt
>>
>> With the setup as is, booting 4.16.x works, while 4.17 throws the errors.
>
>
> Please do the bisect if the patch I've mentioned above doesn't help.

Ok done.. bisect points to:

b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Mar 19 11:38:19 2018 +0100

   iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()

   This cleans up the code a lot by removing duplicate logic.

   Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
   Tested-by: Joerg Roedel <jroedel@suse.de>
   Signed-off-by: Christoph Hellwig <hch@lst.de>
   Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
   Acked-by: Joerg Roedel <jroedel@suse.de>
   Cc: David Woodhouse <dwmw2@infradead.org>
   Cc: Joerg Roedel <joro@8bytes.org>
   Cc: Jon Mason <jdmason@kudzu.us>
   Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
   Cc: Linus Torvalds <torvalds@linux-foundation.org>
   Cc: Muli Ben-Yehuda <mulix@mulix.org>
   Cc: Peter Zijlstra <peterz@infradead.org>
   Cc: iommu@lists.linux-foundation.org
   Link: http://lkml.kernel.org/r/20180319103826.12853-8-hch@lst.de
   Signed-off-by: Ingo Molnar <mingo@kernel.org>


I'll try to revert this once I'm home.

BR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
  2018-06-07 12:32           ` Gabriel C
@ 2018-06-07 16:24             ` Gabriel C
  2018-06-07 17:20               ` Christian König
  2018-06-08  6:02               ` Christoph Hellwig
  1 sibling, 1 reply; 25+ messages in thread
From: Gabriel C @ 2018-06-07 16:24 UTC (permalink / raw)
  To: Christian König
  Cc: Michel Dänzer, Jean-Marc Valin, Dave Airlie, Felix Kuehling,
	LKML, dri-devel, alexander.deucher, Andrew Morton,
	Linus Torvalds, Tom Lendacky, Joerg Roedel, Christoph Hellwig

>> Well that is very interesting, you are the first one who reports that SME +
>> GFX works in some way. So far we only got negative reports for that.
>>
>>> There is a 4.16.13 boot dmesg which has no such issue:
>>>
>>>
>>> http://ftp.frugalware.org/pub/other/people/crazy/radeon/dmesg-radeon-SME-ON-kernel-4.16.txt
>>>
>>> With the setup as is, booting 4.16.x works, while 4.17 throws the errors.
>>
>>
>> Please do the bisect if the patch I've mentioned above doesn't help.
>
> Ok done.. bisect points to:
>
> b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
> commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
> Author: Christoph Hellwig <hch@lst.de>
> Date:   Mon Mar 19 11:38:19 2018 +0100
>
>    iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
>
>    This cleans up the code a lot by removing duplicate logic.
>
>    Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
>    Tested-by: Joerg Roedel <jroedel@suse.de>
>    Signed-off-by: Christoph Hellwig <hch@lst.de>
>    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
>    Acked-by: Joerg Roedel <jroedel@suse.de>
>    Cc: David Woodhouse <dwmw2@infradead.org>
>    Cc: Joerg Roedel <joro@8bytes.org>
>    Cc: Jon Mason <jdmason@kudzu.us>
>    Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>    Cc: Linus Torvalds <torvalds@linux-foundation.org>
>    Cc: Muli Ben-Yehuda <mulix@mulix.org>
>    Cc: Peter Zijlstra <peterz@infradead.org>
>    Cc: iommu@lists.linux-foundation.org
>    Link: http://lkml.kernel.org/r/20180319103826.12853-8-hch@lst.de
>    Signed-off-by: Ingo Molnar <mingo@kernel.org>
>
>
> I'll try to revert this once I'm home.

I can confirm reverting b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
fixes that issue for me.

The GPU is working fine with SME enabled.

Now with a working GPU :) I can also confirm performance is back to normal
without doing any other workarounds.

The only app still acting up a bit is Firefox: just minor frame drops,
but nothing too bad (probably a Firefox bug too).

Chromium/Chrome is fine: even with 10 tabs open, each one playing
a video on YouTube, there are no glitches at all.

The desktop is also fine now; I could not find anything wrong.


BR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
  2018-06-07 16:24             ` Gabriel C
@ 2018-06-07 17:20               ` Christian König
  2018-06-08  6:01                 ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2018-06-07 17:20 UTC (permalink / raw)
  To: Gabriel C, Christoph Hellwig
  Cc: Michel Dänzer, Jean-Marc Valin, Dave Airlie, Felix Kuehling,
	LKML, dri-devel, alexander.deucher, Andrew Morton,
	Linus Torvalds, Tom Lendacky, Joerg Roedel

Hi Christopher,

Am 07.06.2018 um 18:24 schrieb Gabriel C:
>> [SNIP]
>> Ok done.. bisect points to:
>>
>> b468620f2a1dfdcfddfd6fa54367b8bcc1b51248 is the first bad commit
>> commit b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
>> Author: Christoph Hellwig <hch@lst.de>
>> Date:   Mon Mar 19 11:38:19 2018 +0100
>>
>>     iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}()
>>
>>     This cleans up the code a lot by removing duplicate logic.
>>
>>     Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
>>     Tested-by: Joerg Roedel <jroedel@suse.de>
>>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>>     Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
>>     Acked-by: Joerg Roedel <jroedel@suse.de>
>>     Cc: David Woodhouse <dwmw2@infradead.org>
>>     Cc: Joerg Roedel <joro@8bytes.org>
>>     Cc: Jon Mason <jdmason@kudzu.us>
>>     Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>     Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>     Cc: Muli Ben-Yehuda <mulix@mulix.org>
>>     Cc: Peter Zijlstra <peterz@infradead.org>
>>     Cc: iommu@lists.linux-foundation.org
>>     Link: http://lkml.kernel.org/r/20180319103826.12853-8-hch@lst.de
>>     Signed-off-by: Ingo Molnar <mingo@kernel.org>
>>
>>
>> I'll try to revert this once I'm home.
> I can confirm reverting b468620f2a1dfdcfddfd6fa54367b8bcc1b51248
> fixes that issue for me.

Any idea what could cause that? Basically, this patch breaks radeon when
SME is enabled.

> The GPU is working fine with SME enabled.
>
> Now with a working GPU :) I can also confirm performance is back to normal
> without doing any other workarounds.
>
> The only app still acting up a bit is Firefox: just minor frame drops,
> but nothing too bad (probably a Firefox bug too).
>
> Chromium/Chrome is fine: even with 10 tabs open, each one playing
> a video on YouTube, there are no glitches at all.
>
> The desktop is also fine now; I could not find anything wrong.

Thanks for testing,
Christian.

>
>
> BR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
  2018-06-07 17:20               ` Christian König
@ 2018-06-08  6:01                 ` Christoph Hellwig
  2018-06-08  6:47                   ` Christian König
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2018-06-08  6:01 UTC (permalink / raw)
  To: Christian König
  Cc: Gabriel C, Christoph Hellwig, Michel Dänzer,
	Jean-Marc Valin, Dave Airlie, Felix Kuehling, LKML, dri-devel,
	alexander.deucher, Andrew Morton, Linus Torvalds, Tom Lendacky,
	Joerg Roedel

On Thu, Jun 07, 2018 at 07:20:37PM +0200, Christian König wrote:
> Hi Christopher,

I don't see a Christopher on the Cc list..

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
  2018-06-07 12:32           ` Gabriel C
@ 2018-06-08  6:02               ` Christoph Hellwig
  2018-06-08  6:02               ` Christoph Hellwig
  1 sibling, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2018-06-08  6:02 UTC (permalink / raw)
  To: Gabriel C
  Cc: Christian König, Michel Dänzer, Jean-Marc Valin,
	Dave Airlie, Felix Kuehling, LKML, dri-devel, alexander.deucher,
	Andrew Morton, Linus Torvalds, Tom Lendacky, Joerg Roedel,
	Christoph Hellwig

On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote:
> Ok done.. bisect points to:

What is the failure mode you are seeing?  Can't find anything in the
mail unfortunately.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
  2018-06-08  6:01                 ` Christoph Hellwig
@ 2018-06-08  6:47                   ` Christian König
  0 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2018-06-08  6:47 UTC (permalink / raw)
  To: Christoph Hellwig, Christian König
  Cc: Tom Lendacky, Jean-Marc Valin, Gabriel C, Dave Airlie,
	Felix Kuehling, LKML, dri-devel, Joerg Roedel, alexander.deucher,
	Andrew Morton, Linus Torvalds, Michel Dänzer

Hi Christoph,

Am 08.06.2018 um 08:01 schrieb Christoph Hellwig:
> On Thu, Jun 07, 2018 at 07:20:37PM +0200, Christian König wrote:
>> Hi Christopher,
> I don't see a Christopher on the Cc list..

Sorry, auto-uncorrection. I indeed meant you :)

Christian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
  2018-06-08  6:02               ` Christoph Hellwig
@ 2018-06-08  6:52                 ` Christian König
  -1 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2018-06-08  6:52 UTC (permalink / raw)
  To: Christoph Hellwig, Gabriel C
  Cc: Tom Lendacky, Jean-Marc Valin, Dave Airlie, Felix Kuehling, LKML,
	dri-devel, Christian König, Joerg Roedel, alexander.deucher,
	Andrew Morton, Linus Torvalds, Michel Dänzer

Am 08.06.2018 um 08:02 schrieb Christoph Hellwig:
> On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote:
>> Ok done.. bisect points to:
> What is the failure mode you are seeing?  Can't find anything in the
> mail unfortunately.

As far as I have analyzed it, we now get an -ENOMEM from dma_alloc_attrs() in
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c when the IOMMU is enabled.

I still need to figure out which parameters we want to use for the
allocation, but I think it is only 4k or 8k.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
  2018-06-08  6:52                 ` Christian König
  (?)
@ 2018-06-08 13:32                 ` Gabriel C
  2018-06-11  7:15                   ` Christoph Hellwig
  -1 siblings, 1 reply; 25+ messages in thread
From: Gabriel C @ 2018-06-08 13:32 UTC (permalink / raw)
  To: Christian König
  Cc: Christoph Hellwig, Tom Lendacky, Jean-Marc Valin, Dave Airlie,
	Felix Kuehling, LKML, dri-devel, Joerg Roedel, alexander.deucher,
	Andrew Morton, Linus Torvalds, Michel Dänzer

2018-06-08 8:52 GMT+02:00 Christian König <christian.koenig@amd.com>:
> Am 08.06.2018 um 08:02 schrieb Christoph Hellwig:
>>
>> On Thu, Jun 07, 2018 at 02:32:46PM +0200, Gabriel C wrote:
>>>
>>> Ok done.. bisect points to:
>>
>> What is the failure mode you are seeing?  Can't find anything in the
>> mail unfortunately.
>
>
> As far as I have analyzed it, we now get an -ENOMEM from dma_alloc_attrs() in
> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c when the IOMMU is enabled.
>
> I still need to figure out which parameters we want to use for the allocation,
> but I think it is only 4k or 8k.

If you guys need me to test something, run debug patches,
or patches of any sort, just let me know.

>
> Regards,
> Christian.

BR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
  2018-06-08 13:32                 ` Gabriel C
@ 2018-06-11  7:15                   ` Christoph Hellwig
  2018-06-11 19:23                       ` Linus Torvalds
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2018-06-11  7:15 UTC (permalink / raw)
  To: Gabriel C
  Cc: Christian König, Christoph Hellwig, Tom Lendacky,
	Jean-Marc Valin, Dave Airlie, Felix Kuehling, LKML, dri-devel,
	Joerg Roedel, alexander.deucher, Andrew Morton, Linus Torvalds,
	Michel Dänzer

I think the prime issue is that dma_direct_alloc respects the DMA
mask, which we don't need if we are actually using the IOMMU. This would
be mostly harmless except for the SEV bit high in the address that
makes the checks fail.

For now I'd say revert this commit for 4.17/4.18-rc and I'll look into
addressing these issues properly.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later )
  2018-06-11  7:15                   ` Christoph Hellwig
@ 2018-06-11 19:23                       ` Linus Torvalds
  0 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2018-06-11 19:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Gabriel C, Christian König, Tom Lendacky, jmvalin,
	Dave Airlie, Felix.Kuehling, Linux Kernel Mailing List, DRI,
	Joerg Roedel, Alex Deucher, Andrew Morton, Michel Dänzer

On Mon, Jun 11, 2018 at 12:07 AM Christoph Hellwig <hch@lst.de> wrote:
>
> For now I'd say revert this commit for 4.17/4.18-rc and I'll look into
> addressing these issues properly.

Ok, reverted in my tree, and marked for stable (for 4.17). Thanks,

                     Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-06-11 19:24 UTC | newest]

Thread overview: 25+ messages
2018-06-06 13:33 Kernel and ADM hardware roulette ( was AMD graphics performance regression in 4.15 and later ) Gabriel C
2018-06-06 14:12 ` Michel Dänzer
2018-06-06 14:44   ` Christian König
2018-06-06 15:03     ` Michel Dänzer
2018-06-06 15:44       ` Gabriel C
2018-06-07  7:07         ` Christian König
2018-06-07 12:32           ` Gabriel C
2018-06-07 16:24             ` Gabriel C
2018-06-07 17:20               ` Christian König
2018-06-08  6:01                 ` Christoph Hellwig
2018-06-08  6:47                   ` Christian König
2018-06-08  6:02             ` Christoph Hellwig
2018-06-08  6:52               ` Christian König
2018-06-08 13:32                 ` Gabriel C
2018-06-11  7:15                   ` Christoph Hellwig
2018-06-11 19:23                     ` Linus Torvalds
2018-06-06 15:24     ` Gabriel C
2018-06-06 14:29 ` Christian König
