* Regression on gfx8 with ring init
@ 2018-09-18 13:27 Tom St Denis
       [not found] ` <8cdb037b-7db7-9be9-2c8a-d52c1b058454-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Tom St Denis @ 2018-09-18 13:27 UTC (permalink / raw)
  To: amd-gfx mailing list, Koenig, Christian, Zhou, David(ChunMing)

This commit:

[root@raven linux]# git bisect good
9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
Author: Christian König <christian.koenig@amd.com>
Date:   Tue Sep 18 10:38:09 2018 +0200

     drm/amdgpu: remove fence fallback

     DC doesn't seem to have a fallback path either.

     So when interrupts doesn't work any more we are pretty much busted no
     matter what.

     Signed-off-by: Christian König <christian.koenig@amd.com>
     Reviewed-by: Chunming Zhou <david1.zhou@amd.com>

Results in this:

[   24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:07:00.0 on minor 1
[   24.335674] modprobe (3895) used greatest stack depth: 12600 bytes left
[   26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[   26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring 9 (-110).
[   26.407885] [drm:process_one_work] *ERROR* ib ring test failed (-110).
[   28.506708] fuse init (API version 7.27)

On init with my polaris/raven1 system.
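
For anyone decoding the log: the (-110) is a negated errno, and 110 is
ETIMEDOUT in Linux's asm-generic numbering, which fits the "IB test timed
out" line. A minimal sketch of that lookup follows; the table is a small
hand-picked subset and kernel_errno_name() is a made-up helper for
illustration, not a kernel function.

```c
#include <stddef.h>

/* A few errno values as defined in Linux's asm-generic headers.
 * Only ETIMEDOUT (110) appears in the log; the rest are for context. */
struct errno_entry {
    int value;
    const char *name;
};

static const struct errno_entry known_errnos[] = {
    { 5,   "EIO" },
    { 12,  "ENOMEM" },
    { 16,  "EBUSY" },
    { 22,  "EINVAL" },
    { 110, "ETIMEDOUT" },
};

/* Hypothetical helper: map a negative kernel-style return code
 * to its symbolic name, or "unknown" if it is not in the table. */
static const char *kernel_errno_name(int ret)
{
    size_t i;

    for (i = 0; i < sizeof(known_errnos) / sizeof(known_errnos[0]); i++) {
        if (known_errnos[i].value == -ret)
            return known_errnos[i].name;
    }
    return "unknown";
}
```

Calling kernel_errno_name(-110) returns "ETIMEDOUT". Some architectures
number errno values differently, so the table is only an assumption for
the common case.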

Cheers,
Tom
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found] ` <8cdb037b-7db7-9be9-2c8a-d52c1b058454-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-18 13:30   ` Christian König
       [not found]     ` <7f748397-265d-20e9-b081-108b28994c1f-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2018-09-18 13:30 UTC (permalink / raw)
  To: Tom St Denis, amd-gfx mailing list, Koenig, Christian, Zhou,
	David(ChunMing)

Great, not sure if that is good or bad news.

Anyway, I'm going to revert the change for now. Does anybody volunteer to 
figure out why interrupts sometimes don't work correctly on Raven?

Christian.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]     ` <7f748397-265d-20e9-b081-108b28994c1f-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-09-18 13:32       ` Tom St Denis
       [not found]         ` <1fdbd1f8-afb8-59e7-c057-10da9b9f6e25-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Tom St Denis @ 2018-09-18 13:32 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo, amd-gfx mailing list, Zhou,
	David(ChunMing)

On 2018-09-18 9:30 a.m., Christian König wrote:
> Great, not sure if that is a good or a bad news.
> 
> Anyway going to revert the change for now. Does anybody volunteer to 
> figure out why interrupts sometimes doesn't work correctly on Raven?

What does "doesn't work correctly" mean?  My workstation is a Raven1 (Ryzen 
2400G) and, other than the TTM bulk move issue, has been perfectly stable 
(through suspend/resumes too, I might add).

Anything I could test with my devel raven?

Tom


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]         ` <1fdbd1f8-afb8-59e7-c057-10da9b9f6e25-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-18 13:35           ` Christian König
       [not found]             ` <80d8437f-0873-8318-01c1-2710adea67e0-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2018-09-18 13:35 UTC (permalink / raw)
  To: Tom St Denis, amd-gfx mailing list, Zhou, David(ChunMing)

On 18.09.2018 at 15:32, Tom St Denis wrote:
> On 2018-09-18 9:30 a.m., Christian König wrote:
>> Great, not sure if that is a good or a bad news.
>>
>> Anyway going to revert the change for now. Does anybody volunteer to 
>> figure out why interrupts sometimes doesn't work correctly on Raven?
>
> What does "doesn't work correctly?"  My workstation is a Raven1 (Ryzen 
> 2400G) and other than the TTM bulk move issue has been perfectly 
> stable (through suspend/resumes too I might add).
>
> Anything I could test with my devel raven?

The problem seems to be that on some boards IH handling doesn't work as 
it should.

Can you try to disable the onboard graphics and try again?

If that still doesn't work, there is a DRM_DEBUG in amdgpu_ih_process(); 
make that a DRM_ERROR and send me the resulting dmesg from loading amdgpu 
(but don't start any UMD).
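
The reason for the change, as I understand it, is that DRM_DEBUG output
only shows up when DRM debugging is switched on, while DRM_ERROR always
lands in dmesg. Here is a userspace sketch of that gating. Everything in
it is a stand-in: mock_debug_enabled, mock_log() and mock_ih_trace() are
invented names, and the "ih: rptr ..., wptr ..." text is only an assumed
shape for the trace line, not the driver's actual message.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the DRM debug switch; assumed off, as on a default boot. */
static bool mock_debug_enabled = false;

/* Hypothetical log sink: formats into buf and returns the length written.
 * Debug-level messages are dropped unless debugging is enabled;
 * error-level messages are always emitted. */
static int mock_log(char *buf, size_t len, bool is_error, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (!is_error && !mock_debug_enabled) {
        if (len > 0)
            buf[0] = '\0';
        return 0;
    }

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}

/* Mock of a per-interrupt trace line. Passing as_error = true models
 * switching the call from debug level to error level. */
static int mock_ih_trace(char *buf, size_t len, bool as_error,
                         unsigned int rptr, unsigned int wptr)
{
    return mock_log(buf, len, as_error, "ih: rptr %u, wptr %u\n", rptr, wptr);
}
```

With debugging off, mock_ih_trace(..., false, ...) writes nothing, while
mock_ih_trace(..., true, ...) always produces the line, which is the
effect wanted for the requested dmesg.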

Thanks,
Christian.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]             ` <80d8437f-0873-8318-01c1-2710adea67e0-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-18 13:58               ` Tom St Denis
       [not found]                 ` <aa62adca-48cb-fb3c-65ac-7d2e3311d602-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Tom St Denis @ 2018-09-18 13:58 UTC (permalink / raw)
  To: Christian König, amd-gfx mailing list, Zhou, David(ChunMing)

Odd, I couldn't even boot my system with the dGPU as primary after 
rebuilding the kernel.  It got hung up in the IOMMU driver (loads of 
AMD-Vi IOMMU errors), which I wasn't able to capture because it panicked 
before loading the network stack.

Bizarre.

I'll keep trying.

Tom


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                 ` <aa62adca-48cb-fb3c-65ac-7d2e3311d602-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-18 14:09                   ` Tom St Denis
       [not found]                     ` <43e69bf1-8751-dbe8-6b8d-5250c527154c-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Tom St Denis @ 2018-09-18 14:09 UTC (permalink / raw)
  To: Christian König, amd-gfx mailing list, Zhou, David(ChunMing)

[-- Attachment #1: Type: text/plain, Size: 3147 bytes --]

Disabling IOMMU in the BIOS resulted in a correct boot up...

Here's the log.

Tom


[-- Attachment #2: amdgpu_ih_process.log.gz --]
[-- Type: application/gzip, Size: 20766 bytes --]


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                     ` <43e69bf1-8751-dbe8-6b8d-5250c527154c-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-18 14:13                       ` Christian König
       [not found]                         ` <34359f9e-be6f-945c-e084-c109e6584d67-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2018-09-18 14:13 UTC (permalink / raw)
  To: Tom St Denis, amd-gfx mailing list, Zhou, David(ChunMing)

Hmm, there is no failed IB test in there any more, is there?

Christian.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                         ` <34359f9e-be6f-945c-e084-c109e6584d67-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-18 14:20                           ` Tom St Denis
       [not found]                             ` <12ac8b66-0ce2-0304-d9ad-6e3f2479e04f-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Tom St Denis @ 2018-09-18 14:20 UTC (permalink / raw)
  To: Christian König, amd-gfx mailing list, Zhou, David(ChunMing)

[-- Attachment #1: Type: text/plain, Size: 3627 bytes --]

On 2018-09-18 10:13 a.m., Christian König wrote:
> Mhm, there is no more failed IB-test in there isn't it?

Oh sorry, I thought you wanted to test HEAD~. Attached is a log from 
the tip of drm-next.

Tom



[-- Attachment #2: amdgpu_ih_process2.log.gz --]
[-- Type: application/gzip, Size: 21165 bytes --]


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                             ` <12ac8b66-0ce2-0304-d9ad-6e3f2479e04f-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-18 14:31                               ` Christian König
       [not found]                                 ` <3ad24617-bdee-846e-b47c-d854c48fce43-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2018-09-18 14:31 UTC (permalink / raw)
  To: Tom St Denis, amd-gfx mailing list, Zhou, David(ChunMing)

Well, it looks like interrupt processing is working perfectly fine.

But looking at the error message once more, I see that this actually 
affects ring number 9 and not the GFX ring.

Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the 
number?

That must be one of the compute rings.
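
Something along these lines is what I have in mind. This is only a
userspace sketch: struct mock_ring and format_ib_test_failure() are
invented for illustration, and the message text is copied from the log
with the ring number swapped for the name. The real change would go into
the error print in amdgpu_ib_ring_tests().

```c
#include <stdio.h>

/* Hypothetical stand-in for the driver's ring structure;
 * only the fields needed for the message are modeled. */
struct mock_ring {
    const char *name;   /* human-readable ring name */
    unsigned int idx;   /* position in the ring array, e.g. 9 */
};

/* Format the IB test failure using the ring's name instead of its index,
 * so the log identifies the ring type without a lookup.
 * Returns the snprintf() result. */
static int format_ib_test_failure(char *buf, size_t len,
                                  const struct mock_ring *ring, int err)
{
    return snprintf(buf, len,
                    "amdgpu: failed testing IB on ring %s (%d).",
                    ring->name, err);
}
```

With a ring whose name is, say, "comp_1.2.0" (a made-up example value)
and err = -110, the message reads "failed testing IB on ring comp_1.2.0
(-110)." rather than "ring 9".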

Thanks,
Christian.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                                 ` <3ad24617-bdee-846e-b47c-d854c48fce43-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-18 14:36                                   ` Deucher, Alexander
       [not found]                                     ` <BN6PR12MB1809B0E02DDA1E8AACFFD1DAF71D0-/b2+HYfkarSEx6ez0IUAagdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2018-09-18 14:40                                   ` Tom St Denis
  1 sibling, 1 reply; 18+ messages in thread
From: Deucher, Alexander @ 2018-09-18 14:36 UTC (permalink / raw)
  To: Koenig, Christian, StDenis, Tom, amd-gfx mailing list, Zhou,
	David(ChunMing)


[-- Attachment #1.1: Type: text/plain, Size: 5046 bytes --]

FWIW, a number of consumer Raven boards have bad IVRS tables (Windows doesn't use interrupt remapping, so they are sometimes wrong and probably not validated).  There are a number of workarounds to manually override the IVRS tables to make interrupts work.  I think specifying pci=noacpi is also a possible workaround.


Alex

________________________________
From: amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> on behalf of Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
Sent: Tuesday, September 18, 2018 10:31:16 AM
To: StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
Subject: Re: Regression on gfx8 with ring init

Well looks like interrupt processing is working perfectly fine.

But looking at the error message once more I see that this actually
affects ring number 9 and not the GFX ring.

Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the
number?

That must be some of the compute rings.

Thanks,
Christian.

On 18.09.2018 at 16:20, Tom St Denis wrote:
> On 2018-09-18 10:13 a.m., Christian König wrote:
>> Mhm, there is no more failed IB-test in there isn't it?
>
> oh sorry I thought you wanted to test HEAD~ ... Attached is a log from
> the tip of drm-next
>
> Tom
>
>>
>> Christian.
>>
>> Am 18.09.2018 um 16:09 schrieb Tom St Denis:
>>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>>>
>>> Here's the log.
>>>
>>> Tom
>>>
>>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>>> Odd I couldn't even boot my system with the dGPU as primary after
>>>> rebuilding the kernel.  It got hung up in the IOMMU driver (loads
>>>> of AMD-Vi IOMMU errors) which I wasn't able to capture because it
>>>> panic'ed before loading the network stack.
>>>>
>>>> Bizarre.
>>>>
>>>> I'll keep trying.
>>>>
>>>> Tom
>>>>
>>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>>>> Am 18.09.2018 um 15:32 schrieb Tom St Denis:
>>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>>>>>> Great, not sure if that is a good or a bad news.
>>>>>>>
>>>>>>> Anyway going to revert the change for now. Does anybody
>>>>>>> volunteer to figure out why interrupts sometimes doesn't work
>>>>>>> correctly on Raven?
>>>>>>
>>>>>> What does "doesn't work correctly?"  My workstation is a Raven1
>>>>>> (Ryzen 2400G) and other than the TTM bulk move issue has been
>>>>>> perfectly stable (through suspend/resumes too I might add).
>>>>>>
>>>>>> Anything I could test with my devel raven?
>>>>>
>>>>> The problem seems to be that on some boards IH handling doesn't
>>>>> work as it should.
>>>>>
>>>>> Can you try to disable the onboard graphics and try again?
>>>>>
>>>>> If that still doesn't work there is a DRM_DEBUG in
>>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the
>>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>>>>
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 18.09.2018 um 15:27 schrieb Tom St Denis:
>>>>>>>> This commit:
>>>>>>>>
>>>>>>>> [root@raven linux]# git bisect good
>>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
>>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>>>>>>> Author: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>>>>> Date:   Tue Sep 18 10:38:09 2018 +0200
>>>>>>>>
>>>>>>>>     drm/amdgpu: remove fence fallback
>>>>>>>>
>>>>>>>>     DC doesn't seem to have a fallback path either.
>>>>>>>>
>>>>>>>>     So when interrupts doesn't work any more we are pretty much
>>>>>>>> busted no
>>>>>>>>     matter what.
>>>>>>>>
>>>>>>>>     Signed-off-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>>>>>     Reviewed-by: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
>>>>>>>>
>>>>>>>> Results in this:
>>>>>>>>
>>>>>>>> [   24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for
>>>>>>>> 0000:07:00.0 on minor 1
>>>>>>>> [   24.335674] modprobe (3895) used greatest stack depth: 12600
>>>>>>>> bytes left
>>>>>>>> [   26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
>>>>>>>> amdgpu: IB test timed out.
>>>>>>>> [   26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
>>>>>>>> amdgpu: failed testing IB on ring 9 (-110).
>>>>>>>> [   26.407885] [drm:process_one_work] *ERROR* ib ring test
>>>>>>>> failed (-110).
>>>>>>>> [   28.506708] fuse init (API version 7.27)
>>>>>>>>
>>>>>>>> On init with my polaris/raven1 system.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Tom
>>>>>>>> _______________________________________________
>>>>>>>> amd-gfx mailing list
>>>>>>>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                                 ` <3ad24617-bdee-846e-b47c-d854c48fce43-5C7GfCeVMHo@public.gmane.org>
  2018-09-18 14:36                                   ` Deucher, Alexander
@ 2018-09-18 14:40                                   ` Tom St Denis
  1 sibling, 0 replies; 18+ messages in thread
From: Tom St Denis @ 2018-09-18 14:40 UTC (permalink / raw)
  To: Christian König, amd-gfx mailing list, Zhou, David(ChunMing)

On 2018-09-18 10:31 a.m., Christian König wrote:
> Well looks like interrupt processing is working perfectly fine.
> 
> But looking at the error message once more I see that this actually 
> affects ring number 9 and not the GFX ring.
> 
> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the 
> number?
> 
> That must be some of the compute rings.

That's a bingo.

[   32.231734] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:01:00.0 on minor 0
[   32.233803] modprobe (3816) used greatest stack depth: 12464 bytes left
[   35.266007] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[   35.266373] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on ring (kiq_2.1.0) 9 (-110).
[   35.403034] [drm:process_one_work] *ERROR* ib ring test failed (-110).
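
For reference, the message format in the log can be sketched as below. This is only a userspace illustration, not the driver code: struct demo_ring and demo_format_ib_failure() are hypothetical names, and the only thing taken from the log is the wording of the line, with the ring name in parentheses followed by the index and the error code (-110 is -ETIMEDOUT on Linux).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the driver's ring structure; only the
 * fields needed for the message are modelled. */
struct demo_ring {
	const char *name;	/* e.g. "kiq_2.1.0" */
	unsigned int idx;	/* position in the ring array */
};

/* Format the failure line with both the ring name and its index,
 * mirroring the wording seen in the log. Returns snprintf()'s result. */
static int demo_format_ib_failure(char *buf, size_t len,
				  const struct demo_ring *ring, int err)
{
	assert(buf && ring && ring->name);
	return snprintf(buf, len,
			"failed testing IB on ring (%s) %u (%d).",
			ring->name, ring->idx, err);
}
```

Printing the name alongside the index is what made it obvious here that the failing ring is the KIQ and not one of the ordinary compute rings.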

Should point out that kfd still has the old fence logic:

[root@raven amd]# git grep enable_signaling
amdgpu/amdgpu_amdkfd_fence.c: *  nofity when the BO is free to move. fence_add_callback --> enable_signaling
amdgpu/amdgpu_amdkfd_fence.c: *  --> amdgpu_amdkfd_fence.enable_signaling
amdgpu/amdgpu_amdkfd_fence.c: * amdgpu_amdkfd_fence.enable_signaling - Start a work item that will quiesce
amdgpu/amdgpu_amdkfd_fence.c: * amdkfd_fence_enable_signaling - This gets called when TTM wants to evict
amdgpu/amdgpu_amdkfd_fence.c:static bool amdkfd_fence_enable_signaling(struct dma_fence *f)
amdgpu/amdgpu_amdkfd_fence.c:   .enable_signaling = amdkfd_fence_enable_signaling,


Tom

> 
> Thanks,
> Christian.
> 
> Am 18.09.2018 um 16:20 schrieb Tom St Denis:
>> On 2018-09-18 10:13 a.m., Christian König wrote:
>>> Mhm, there is no more failed IB-test in there isn't it?
>>
>> oh sorry I thought you wanted to test HEAD~ ... Attached is a log from 
>> the tip of drm-next
>>
>> Tom
>>
>>>
>>> Christian.
>>>
>>> Am 18.09.2018 um 16:09 schrieb Tom St Denis:
>>>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>>>>
>>>> Here's the log.
>>>>
>>>> Tom
>>>>
>>>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>>>> Odd I couldn't even boot my system with the dGPU as primary after 
>>>>> rebuilding the kernel.  It got hung up in the IOMMU driver (loads 
>>>>> of AMD-Vi IOMMU errors) which I wasn't able to capture because it 
>>>>> panic'ed before loading the network stack.
>>>>>
>>>>> Bizarre.
>>>>>
>>>>> I'll keep trying.
>>>>>
>>>>> Tom
>>>>>
>>>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>>>>> Am 18.09.2018 um 15:32 schrieb Tom St Denis:
>>>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>>>>>>> Great, not sure if that is a good or a bad news.
>>>>>>>>
>>>>>>>> Anyway going to revert the change for now. Does anybody 
>>>>>>>> volunteer to figure out why interrupts sometimes doesn't work 
>>>>>>>> correctly on Raven?
>>>>>>>
>>>>>>> What does "doesn't work correctly?"  My workstation is a Raven1 
>>>>>>> (Ryzen 2400G) and other than the TTM bulk move issue has been 
>>>>>>> perfectly stable (through suspend/resumes too I might add).
>>>>>>>
>>>>>>> Anything I could test with my devel raven?
>>>>>>
>>>>>> The problem seems to be that on some boards IH handling doesn't 
>>>>>> work as it should.
>>>>>>
>>>>>> Can you try to disable the onboard graphics and try again?
>>>>>>
>>>>>> If that still doesn't work there is a DRM_DEBUG in 
>>>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the 
>>>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>>>>>
>>>>>> Thanks,
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> Am 18.09.2018 um 15:27 schrieb Tom St Denis:
>>>>>>>>> This commit:
>>>>>>>>>
>>>>>>>>> [root@raven linux]# git bisect good
>>>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
>>>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>>>>>>>> Author: Christian König <christian.koenig@amd.com>
>>>>>>>>> Date:   Tue Sep 18 10:38:09 2018 +0200
>>>>>>>>>
>>>>>>>>>     drm/amdgpu: remove fence fallback
>>>>>>>>>
>>>>>>>>>     DC doesn't seem to have a fallback path either.
>>>>>>>>>
>>>>>>>>>     So when interrupts doesn't work any more we are pretty much 
>>>>>>>>> busted no
>>>>>>>>>     matter what.
>>>>>>>>>
>>>>>>>>>     Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>>>>>     Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
>>>>>>>>>
>>>>>>>>> Results in this:
>>>>>>>>>
>>>>>>>>> [   24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for 
>>>>>>>>> 0000:07:00.0 on minor 1
>>>>>>>>> [   24.335674] modprobe (3895) used greatest stack depth: 12600 
>>>>>>>>> bytes left
>>>>>>>>> [   26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* 
>>>>>>>>> amdgpu: IB test timed out.
>>>>>>>>> [   26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* 
>>>>>>>>> amdgpu: failed testing IB on ring 9 (-110).
>>>>>>>>> [   26.407885] [drm:process_one_work] *ERROR* ib ring test 
>>>>>>>>> failed (-110).
>>>>>>>>> [   28.506708] fuse init (API version 7.27)
>>>>>>>>>
>>>>>>>>> On init with my polaris/raven1 system.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Tom
>>>>>>>>> _______________________________________________
>>>>>>>>> amd-gfx mailing list
>>>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                                     ` <BN6PR12MB1809B0E02DDA1E8AACFFD1DAF71D0-/b2+HYfkarSEx6ez0IUAagdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2018-09-18 14:41                                       ` Christian König
       [not found]                                         ` <4a250398-d2ac-1650-739d-e4a6598f1c48-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2018-09-18 14:41 UTC (permalink / raw)
  To: Deucher, Alexander, Koenig, Christian, StDenis, Tom,
	amd-gfx mailing list, Zhou, David(ChunMing)


[-- Attachment #1.1: Type: text/plain, Size: 5712 bytes --]

CRTC and GFX interrupts seem to be working perfectly fine.

It looks like only the EOP interrupts from the compute queue are not 
handled correctly.

Most likely a bug somewhere in gfx_v8_0_eop_irq().
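
As an illustration only (not a diagnosis), here is one way an EOP interrupt could end up unhandled. The sketch assumes that the handler decodes me/pipe/queue from the ring_id of the interrupt entry and then walks only a list of ordinary compute rings; the bit layout, struct demo_ring and both demo_* functions are assumptions made for this example, not the actual gfx_v8_0_eop_irq() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced ring descriptor: just the hardware coordinates. */
struct demo_ring {
	unsigned int me, pipe, queue;
	unsigned int fences_processed;	/* counts simulated fence processing */
};

/* Assumed bit layout of the ring_id field in an interrupt entry. */
static void demo_decode_ring_id(uint32_t ring_id, unsigned int *me,
				unsigned int *pipe, unsigned int *queue)
{
	*me    = (ring_id & 0x0c) >> 2;
	*pipe  = (ring_id & 0x03) >> 0;
	*queue = (ring_id & 0x70) >> 4;
}

/*
 * Walk only the given ring list and process fences on the ring whose
 * coordinates match. Returns 1 if a ring was found, 0 if the interrupt
 * was silently dropped.
 */
static int demo_eop_irq(struct demo_ring *rings, size_t n, uint32_t ring_id)
{
	unsigned int me, pipe, queue;
	size_t i;

	assert(rings || n == 0);
	demo_decode_ring_id(ring_id, &me, &pipe, &queue);
	for (i = 0; i < n; i++) {
		if (rings[i].me == me && rings[i].pipe == pipe &&
		    rings[i].queue == queue) {
			rings[i].fences_processed++;
			return 1;
		}
	}
	return 0;
}
```

Under these assumptions, an interrupt for a ring that is not part of the list being walked (for example one at me=2, pipe=1, queue=0) matches nothing, so its fence would never be signaled from the interrupt path.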

Christian.

Am 18.09.2018 um 16:36 schrieb Deucher, Alexander:
>
> FWIW, a number of consumer Raven boards have bad IVRS tables (windows 
> doesn't use interrupt remapping so they are sometimes wrong and 
> probably not validated.  There are a number of workaround to manually 
> override the IVRS tables to make interrupts work.  I think specifying 
> pci=noacpi is also a possible workaround.
>
>
> Alex
>
> ------------------------------------------------------------------------
> *From:* amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> on behalf of 
> Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
> *Sent:* Tuesday, September 18, 2018 10:31:16 AM
> *To:* StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
> *Subject:* Re: Regression on gfx8 with ring init
> Well looks like interrupt processing is working perfectly fine.
>
> But looking at the error message once more I see that this actually
> affects ring number 9 and not the GFX ring.
>
> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the
> number?
>
> That must be some of the compute rings.
>
> Thanks,
> Christian.
>
> Am 18.09.2018 um 16:20 schrieb Tom St Denis:
> > On 2018-09-18 10:13 a.m., Christian König wrote:
> >> Mhm, there is no more failed IB-test in there isn't it?
> >
> > oh sorry I thought you wanted to test HEAD~ ... Attached is a log from
> > the tip of drm-next
> >
> > Tom
> >
> >>
> >> Christian.
> >>
> >> Am 18.09.2018 um 16:09 schrieb Tom St Denis:
> >>> Disabling IOMMU in the BIOS resulted in a correct boot up...
> >>>
> >>> Here's the log.
> >>>
> >>> Tom
> >>>
> >>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
> >>>> Odd I couldn't even boot my system with the dGPU as primary after
> >>>> rebuilding the kernel.  It got hung up in the IOMMU driver (loads
> >>>> of AMD-Vi IOMMU errors) which I wasn't able to capture because it
> >>>> panic'ed before loading the network stack.
> >>>>
> >>>> Bizarre.
> >>>>
> >>>> I'll keep trying.
> >>>>
> >>>> Tom
> >>>>
> >>>> On 2018-09-18 9:35 a.m., Christian König wrote:
> >>>>> Am 18.09.2018 um 15:32 schrieb Tom St Denis:
> >>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
> >>>>>>> Great, not sure if that is a good or a bad news.
> >>>>>>>
> >>>>>>> Anyway going to revert the change for now. Does anybody
> >>>>>>> volunteer to figure out why interrupts sometimes doesn't work
> >>>>>>> correctly on Raven?
> >>>>>>
> >>>>>> What does "doesn't work correctly?"  My workstation is a Raven1
> >>>>>> (Ryzen 2400G) and other than the TTM bulk move issue has been
> >>>>>> perfectly stable (through suspend/resumes too I might add).
> >>>>>>
> >>>>>> Anything I could test with my devel raven?
> >>>>>
> >>>>> The problem seems to be that on some boards IH handling doesn't
> >>>>> work as it should.
> >>>>>
> >>>>> Can you try to disable the onboard graphics and try again?
> >>>>>
> >>>>> If that still doesn't work there is a DRM_DEBUG in
> >>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the
> >>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
> >>>>>
> >>>>> Thanks,
> >>>>> Christian.
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> Tom
> >>>>>>
> >>>>>>>
> >>>>>>> Christian.
> >>>>>>>
> >>>>>>> Am 18.09.2018 um 15:27 schrieb Tom St Denis:
> >>>>>>>> This commit:
> >>>>>>>>
> >>>>>>>> [root@raven linux]# git bisect good
> >>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
> >>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
> >>>>>>>> Author: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
> >>>>>>>> Date:   Tue Sep 18 10:38:09 2018 +0200
> >>>>>>>>
> >>>>>>>>     drm/amdgpu: remove fence fallback
> >>>>>>>>
> >>>>>>>>     DC doesn't seem to have a fallback path either.
> >>>>>>>>
> >>>>>>>>     So when interrupts doesn't work any more we are pretty much
> >>>>>>>> busted no
> >>>>>>>>     matter what.
> >>>>>>>>
> >>>>>>>>     Signed-off-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
> >>>>>>>>     Reviewed-by: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
> >>>>>>>>
> >>>>>>>> Results in this:
> >>>>>>>>
> >>>>>>>> [   24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for
> >>>>>>>> 0000:07:00.0 on minor 1
> >>>>>>>> [   24.335674] modprobe (3895) used greatest stack depth: 12600
> >>>>>>>> bytes left
> >>>>>>>> [   26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
> >>>>>>>> amdgpu: IB test timed out.
> >>>>>>>> [   26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
> >>>>>>>> amdgpu: failed testing IB on ring 9 (-110).
> >>>>>>>> [   26.407885] [drm:process_one_work] *ERROR* ib ring test
> >>>>>>>> failed (-110).
> >>>>>>>> [   28.506708] fuse init (API version 7.27)
> >>>>>>>>
> >>>>>>>> On init with my polaris/raven1 system.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Tom
> >>>>>>>> _______________________________________________
> >>>>>>>> amd-gfx mailing list
> >>>>>>>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                                         ` <4a250398-d2ac-1650-739d-e4a6598f1c48-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-09-18 15:00                                           ` Christian König
       [not found]                                             ` <edd44be9-2ef3-3c39-3342-5d3b4bbfa40a-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2018-09-18 15:00 UTC (permalink / raw)
  To: Deucher, Alexander, StDenis, Tom, amd-gfx mailing list, Zhou,
	David(ChunMing)


[-- Attachment #1.1: Type: text/plain, Size: 6995 bytes --]

Tom,

can you check whether the following makes it work again?

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index b6160de70d12..d65f5ba92fc5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -937,6 +937,10 @@ static int gfx_v8_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
         return r;
  }

+static int gfx_v8_0_kiq_ring_test_ib(struct amdgpu_ring *ring, long timeout)
+{
+       return 0;
+}

  static void gfx_v8_0_free_microcode(struct amdgpu_device *adev)
  {
@@ -7174,7 +7178,7 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_kiq = {
         .emit_ib = gfx_v8_0_ring_emit_ib_compute,
         .emit_fence = gfx_v8_0_ring_emit_fence_kiq,
         .test_ring = gfx_v8_0_ring_test_ring,
-       .test_ib = gfx_v8_0_ring_test_ib,
+       .test_ib = gfx_v8_0_kiq_ring_test_ib,
         .insert_nop = amdgpu_ring_insert_nop,
         .pad_ib = amdgpu_ring_generic_pad_ib,
         .emit_rreg = gfx_v8_0_ring_emit_rreg,
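
The effect of the patch can be modelled in userspace roughly as follows. This is a minimal sketch under stated assumptions: every demo_* name is hypothetical, the irq_works flag merely simulates whether a completion interrupt arrives, and the real IB test does far more than this. It only shows how a per-ring callback table lets the KIQ skip the IB test while the other rings keep the generic one.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ring;

/* Hypothetical, reduced version of a per-ring callback table. */
struct demo_ring_funcs {
	int (*test_ib)(struct demo_ring *ring, long timeout);
};

struct demo_ring {
	const char *name;
	const struct demo_ring_funcs *funcs;
	int irq_works;	/* simulates whether a completion interrupt arrives */
};

#define DEMO_ETIMEDOUT 110

/* Generic test: pretend to submit an IB and wait for its fence. */
static int demo_ring_test_ib(struct demo_ring *ring, long timeout)
{
	(void)timeout;
	return ring->irq_works ? 0 : -DEMO_ETIMEDOUT;
}

/* Stub for the KIQ: reports success without testing anything. */
static int demo_kiq_ring_test_ib(struct demo_ring *ring, long timeout)
{
	(void)ring;
	(void)timeout;
	return 0;
}

static const struct demo_ring_funcs demo_funcs_generic = {
	.test_ib = demo_ring_test_ib,
};

static const struct demo_ring_funcs demo_funcs_kiq = {
	.test_ib = demo_kiq_ring_test_ib,
};

/* Run test_ib on every ring and return the first error, if any. */
static int demo_ib_ring_tests(struct demo_ring *rings, size_t n, long timeout)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int r = rings[i].funcs->test_ib(&rings[i], timeout);

		if (r)
			return r;
	}
	return 0;
}
```

Note that a stub like this hides the timeout rather than explaining it, so it works as a workaround and not as a root-cause fix.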


Thanks,
Christian.

Am 18.09.2018 um 16:41 schrieb Christian König:
> CRTC and GFX interrupts seem to be working perfectly fine.
>
> The problem here looks like only EOP interrupts from the Compute queue 
> are not correctly handled.
>
> Most likely a bug somewhere in gfx_v8_0_eop_irq().
>
> Christian.
>
> Am 18.09.2018 um 16:36 schrieb Deucher, Alexander:
>>
>> FWIW, a number of consumer Raven boards have bad IVRS tables (windows 
>> doesn't use interrupt remapping so they are sometimes wrong and 
>> probably not validated.  There are a number of workaround to manually 
>> override the IVRS tables to make interrupts work. I think specifying 
>> pci=noacpi is also a possible workaround.
>>
>>
>> Alex
>>
>> ------------------------------------------------------------------------
>> *From:* amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> on behalf of 
>> Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>> *Sent:* Tuesday, September 18, 2018 10:31:16 AM
>> *To:* StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
>> *Subject:* Re: Regression on gfx8 with ring init
>> Well looks like interrupt processing is working perfectly fine.
>>
>> But looking at the error message once more I see that this actually
>> affects ring number 9 and not the GFX ring.
>>
>> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the
>> number?
>>
>> That must be some of the compute rings.
>>
>> Thanks,
>> Christian.
>>
>> Am 18.09.2018 um 16:20 schrieb Tom St Denis:
>> > On 2018-09-18 10:13 a.m., Christian König wrote:
>> >> Mhm, there is no more failed IB-test in there isn't it?
>> >
>> > oh sorry I thought you wanted to test HEAD~ ... Attached is a log from
>> > the tip of drm-next
>> >
>> > Tom
>> >
>> >>
>> >> Christian.
>> >>
>> >> Am 18.09.2018 um 16:09 schrieb Tom St Denis:
>> >>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>> >>>
>> >>> Here's the log.
>> >>>
>> >>> Tom
>> >>>
>> >>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>> >>>> Odd I couldn't even boot my system with the dGPU as primary after
>> >>>> rebuilding the kernel.  It got hung up in the IOMMU driver (loads
>> >>>> of AMD-Vi IOMMU errors) which I wasn't able to capture because it
>> >>>> panic'ed before loading the network stack.
>> >>>>
>> >>>> Bizarre.
>> >>>>
>> >>>> I'll keep trying.
>> >>>>
>> >>>> Tom
>> >>>>
>> >>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>> >>>>> Am 18.09.2018 um 15:32 schrieb Tom St Denis:
>> >>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>> >>>>>>> Great, not sure if that is a good or a bad news.
>> >>>>>>>
>> >>>>>>> Anyway going to revert the change for now. Does anybody
>> >>>>>>> volunteer to figure out why interrupts sometimes doesn't work
>> >>>>>>> correctly on Raven?
>> >>>>>>
>> >>>>>> What does "doesn't work correctly?"  My workstation is a Raven1
>> >>>>>> (Ryzen 2400G) and other than the TTM bulk move issue has been
>> >>>>>> perfectly stable (through suspend/resumes too I might add).
>> >>>>>>
>> >>>>>> Anything I could test with my devel raven?
>> >>>>>
>> >>>>> The problem seems to be that on some boards IH handling doesn't
>> >>>>> work as it should.
>> >>>>>
>> >>>>> Can you try to disable the onboard graphics and try again?
>> >>>>>
>> >>>>> If that still doesn't work there is a DRM_DEBUG in
>> >>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the
>> >>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Christian.
>> >>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Tom
>> >>>>>>
>> >>>>>>>
>> >>>>>>> Christian.
>> >>>>>>>
>> >>>>>>> Am 18.09.2018 um 15:27 schrieb Tom St Denis:
>> >>>>>>>> This commit:
>> >>>>>>>>
>> >>>>>>>> [root@raven linux]# git bisect good
>> >>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
>> >>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>> >>>>>>>> Author: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>> >>>>>>>> Date:   Tue Sep 18 10:38:09 2018 +0200
>> >>>>>>>>
>> >>>>>>>>     drm/amdgpu: remove fence fallback
>> >>>>>>>>
>> >>>>>>>>     DC doesn't seem to have a fallback path either.
>> >>>>>>>>
>> >>>>>>>>     So when interrupts doesn't work any more we are pretty much
>> >>>>>>>> busted no
>> >>>>>>>>     matter what.
>> >>>>>>>>
>> >>>>>>>>     Signed-off-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>> >>>>>>>>     Reviewed-by: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
>> >>>>>>>>
>> >>>>>>>> Results in this:
>> >>>>>>>>
>> >>>>>>>> [   24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for
>> >>>>>>>> 0000:07:00.0 on minor 1
>> >>>>>>>> [   24.335674] modprobe (3895) used greatest stack depth: 12600
>> >>>>>>>> bytes left
>> >>>>>>>> [   26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
>> >>>>>>>> amdgpu: IB test timed out.
>> >>>>>>>> [   26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
>> >>>>>>>> amdgpu: failed testing IB on ring 9 (-110).
>> >>>>>>>> [   26.407885] [drm:process_one_work] *ERROR* ib ring test
>> >>>>>>>> failed (-110).
>> >>>>>>>> [   28.506708] fuse init (API version 7.27)
>> >>>>>>>>
>> >>>>>>>> On init with my polaris/raven1 system.
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Tom
>> >>>>>>>> _______________________________________________
>> >>>>>>>> amd-gfx mailing list
>> >>>>>>>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>> >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                                             ` <edd44be9-2ef3-3c39-3342-5d3b4bbfa40a-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-20 20:35                                               ` Andrey Grodzovsky
       [not found]                                                 ` <4afeb01c-37e9-ca76-8055-5dd15fca98d3-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Andrey Grodzovsky @ 2018-09-20 20:35 UTC (permalink / raw)
  To: Christian König, Deucher, Alexander, StDenis, Tom,
	amd-gfx mailing list, Zhou, David(ChunMing)


[-- Attachment #1.1: Type: text/plain, Size: 7805 bytes --]

What's the status of this error and the suggested patch to fix it? It 
impacts GPU reset on Polaris11.

Do we want to investigate why the original patch breaks it, or just 
disable the KIQ IB test with the proposed patch?


P.S. Suspend/resume also stopped working on the latest branch - I will 
bisect it later today or tomorrow.


Andrey


On 09/18/2018 11:00 AM, Christian König wrote:
> Tom,
>
> can you try if the following makes it working again?
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index b6160de70d12..d65f5ba92fc5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -937,6 +937,10 @@ static int gfx_v8_0_ring_test_ib(struct 
> amdgpu_ring *ring, long timeout)
>         return r;
>  }
>
> +static int gfx_v8_0_kiq_ring_test_ib(struct amdgpu_ring *ring, long 
> timeout)
> +{
> +       return 0;
> +}
>
>  static void gfx_v8_0_free_microcode(struct amdgpu_device *adev)
>  {
> @@ -7174,7 +7178,7 @@ static const struct amdgpu_ring_funcs 
> gfx_v8_0_ring_funcs_kiq = {
>         .emit_ib = gfx_v8_0_ring_emit_ib_compute,
>         .emit_fence = gfx_v8_0_ring_emit_fence_kiq,
>         .test_ring = gfx_v8_0_ring_test_ring,
> -       .test_ib = gfx_v8_0_ring_test_ib,
> +       .test_ib = gfx_v8_0_kiq_ring_test_ib,
>         .insert_nop = amdgpu_ring_insert_nop,
>         .pad_ib = amdgpu_ring_generic_pad_ib,
>         .emit_rreg = gfx_v8_0_ring_emit_rreg,
>
>
> Thanks,
> Christian.
>
> Am 18.09.2018 um 16:41 schrieb Christian König:
>> CRTC and GFX interrupts seem to be working perfectly fine.
>>
>> The problem here looks like only EOP interrupts from the Compute 
>> queue are not correctly handled.
>>
>> Most likely a bug somewhere in gfx_v8_0_eop_irq().
>>
>> Christian.
>>
>> Am 18.09.2018 um 16:36 schrieb Deucher, Alexander:
>>>
>>> FWIW, a number of consumer Raven boards have bad IVRS tables 
>>> (windows doesn't use interrupt remapping so they are sometimes wrong 
>>> and probably not validated.  There are a number of workaround to 
>>> manually override the IVRS tables to make interrupts work.  I think 
>>> specifying pci=noacpi is also a possible workaround.
>>>
>>>
>>> Alex
>>>
>>> ------------------------------------------------------------------------
>>> *From:* amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> on behalf of 
>>> Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>> *Sent:* Tuesday, September 18, 2018 10:31:16 AM
>>> *To:* StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
>>> *Subject:* Re: Regression on gfx8 with ring init
>>> Well looks like interrupt processing is working perfectly fine.
>>>
>>> But looking at the error message once more I see that this actually
>>> affects ring number 9 and not the GFX ring.
>>>
>>> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the
>>> number?
>>>
>>> That must be some of the compute rings.
>>>
>>> Thanks,
>>> Christian.
>>>
>>> Am 18.09.2018 um 16:20 schrieb Tom St Denis:
>>> > On 2018-09-18 10:13 a.m., Christian König wrote:
>>> >> Mhm, there is no more failed IB-test in there isn't it?
>>> >
>>> > oh sorry I thought you wanted to test HEAD~ ... Attached is a log 
>>> from
>>> > the tip of drm-next
>>> >
>>> > Tom
>>> >
>>> >>
>>> >> Christian.
>>> >>
>>> >> Am 18.09.2018 um 16:09 schrieb Tom St Denis:
>>> >>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>>> >>>
>>> >>> Here's the log.
>>> >>>
>>> >>> Tom
>>> >>>
>>> >>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>> >>>> Odd I couldn't even boot my system with the dGPU as primary after
>>> >>>> rebuilding the kernel.  It got hung up in the IOMMU driver (loads
>>> >>>> of AMD-Vi IOMMU errors) which I wasn't able to capture because it
>>> >>>> panic'ed before loading the network stack.
>>> >>>>
>>> >>>> Bizarre.
>>> >>>>
>>> >>>> I'll keep trying.
>>> >>>>
>>> >>>> Tom
>>> >>>>
>>> >>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>> >>>>> Am 18.09.2018 um 15:32 schrieb Tom St Denis:
>>> >>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>> >>>>>>> Great, not sure if that is a good or a bad news.
>>> >>>>>>>
>>> >>>>>>> Anyway going to revert the change for now. Does anybody
>>> >>>>>>> volunteer to figure out why interrupts sometimes doesn't work
>>> >>>>>>> correctly on Raven?
>>> >>>>>>
>>> >>>>>> What does "doesn't work correctly?"  My workstation is a Raven1
>>> >>>>>> (Ryzen 2400G) and other than the TTM bulk move issue has been
>>> >>>>>> perfectly stable (through suspend/resumes too I might add).
>>> >>>>>>
>>> >>>>>> Anything I could test with my devel raven?
>>> >>>>>
>>> >>>>> The problem seems to be that on some boards IH handling doesn't
>>> >>>>> work as it should.
>>> >>>>>
>>> >>>>> Can you try to disable the onboard graphics and try again?
>>> >>>>>
>>> >>>>> If that still doesn't work there is a DRM_DEBUG in
>>> >>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the
>>> >>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>> >>>>>
>>> >>>>> Thanks,
>>> >>>>> Christian.
>>> >>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Tom
>>> >>>>>>
>>> >>>>>>>
>>> >>>>>>> Christian.
>>> >>>>>>>
>>> >>>>>>> Am 18.09.2018 um 15:27 schrieb Tom St Denis:
>>> >>>>>>>> This commit:
>>> >>>>>>>>
>>> >>>>>>>> [root@raven linux]# git bisect good
>>> >>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad 
>>> commit
>>> >>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>> >>>>>>>> Author: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>> >>>>>>>> Date:   Tue Sep 18 10:38:09 2018 +0200
>>> >>>>>>>>
>>> >>>>>>>>     drm/amdgpu: remove fence fallback
>>> >>>>>>>>
>>> >>>>>>>>     DC doesn't seem to have a fallback path either.
>>> >>>>>>>>
>>> >>>>>>>>     So when interrupts doesn't work any more we are pretty 
>>> much
>>> >>>>>>>> busted no
>>> >>>>>>>>     matter what.
>>> >>>>>>>>
>>> >>>>>>>>     Signed-off-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>> >>>>>>>>     Reviewed-by: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
>>> >>>>>>>>
>>> >>>>>>>> Results in this:
>>> >>>>>>>>
>>> >>>>>>>> [   24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for
>>> >>>>>>>> 0000:07:00.0 on minor 1
>>> >>>>>>>> [   24.335674] modprobe (3895) used greatest stack depth: 
>>> 12600
>>> >>>>>>>> bytes left
>>> >>>>>>>> [   26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
>>> >>>>>>>> amdgpu: IB test timed out.
>>> >>>>>>>> [   26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
>>> >>>>>>>> amdgpu: failed testing IB on ring 9 (-110).
>>> >>>>>>>> [   26.407885] [drm:process_one_work] *ERROR* ib ring test
>>> >>>>>>>> failed (-110).
>>> >>>>>>>> [   28.506708] fuse init (API version 7.27)
>>> >>>>>>>>
>>> >>>>>>>> On init with my polaris/raven1 system.
>>> >>>>>>>>
>>> >>>>>>>> Cheers,
>>> >>>>>>>> Tom

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                                                 ` <4afeb01c-37e9-ca76-8055-5dd15fca98d3-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-21 17:11                                                   ` Andrey Grodzovsky
       [not found]                                                     ` <c81338de-5fc7-3be3-961a-bba0eba05351-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Andrey Grodzovsky @ 2018-09-21 17:11 UTC (permalink / raw)
  To: Christian König, Deucher, Alexander, StDenis, Tom,
	amd-gfx mailing list, Zhou, David(ChunMing)


[-- Attachment #1.1: Type: text/plain, Size: 8308 bytes --]

Ping...


Andrey


On 09/20/2018 04:35 PM, Andrey Grodzovsky wrote:
>
> What's the status of this error and the suggested patch to fix it?
> It impacts GPU reset on Polaris11.
>
> Do we want to investigate why the original patch breaks it, or just
> disable the IB test with the proposed patch?
>
>
> P.S. Suspend/resume also stopped working on the latest branch; I will
> bisect it later today or tomorrow.
>
>
> Andrey
>
>
> On 09/18/2018 11:00 AM, Christian König wrote:
>> Tom,
>>
>> can you try if the following makes it working again?
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> index b6160de70d12..d65f5ba92fc5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>> @@ -937,6 +937,10 @@ static int gfx_v8_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>         return r;
>>  }
>>
>> +static int gfx_v8_0_kiq_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>> +{
>> +       return 0;
>> +}
>>
>>  static void gfx_v8_0_free_microcode(struct amdgpu_device *adev)
>>  {
>> @@ -7174,7 +7178,7 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_kiq = {
>>         .emit_ib = gfx_v8_0_ring_emit_ib_compute,
>>         .emit_fence = gfx_v8_0_ring_emit_fence_kiq,
>>         .test_ring = gfx_v8_0_ring_test_ring,
>> -       .test_ib = gfx_v8_0_ring_test_ib,
>> +       .test_ib = gfx_v8_0_kiq_ring_test_ib,
>>         .insert_nop = amdgpu_ring_insert_nop,
>>         .pad_ib = amdgpu_ring_generic_pad_ib,
>>         .emit_rreg = gfx_v8_0_ring_emit_rreg,
>>
>>
>> Thanks,
>> Christian.
>>
>> Am 18.09.2018 um 16:41 schrieb Christian König:
>>> CRTC and GFX interrupts seem to be working perfectly fine.
>>>
>>> The problem here looks like only EOP interrupts from the Compute 
>>> queue are not correctly handled.
>>>
>>> Most likely a bug somewhere in gfx_v8_0_eop_irq().
>>>
>>> Christian.
>>>
>>> Am 18.09.2018 um 16:36 schrieb Deucher, Alexander:
>>>>
>>>> FWIW, a number of consumer Raven boards have bad IVRS tables
>>>> (Windows doesn't use interrupt remapping, so they are sometimes
>>>> wrong and probably not validated).  There are a number of workarounds
>>>> to manually override the IVRS tables to make interrupts work.  I
>>>> think specifying pci=noacpi is also a possible workaround.
>>>>
>>>>
>>>> Alex
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> on behalf 
>>>> of Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>> *Sent:* Tuesday, September 18, 2018 10:31:16 AM
>>>> *To:* StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
>>>> *Subject:* Re: Regression on gfx8 with ring init
>>>> Well looks like interrupt processing is working perfectly fine.
>>>>
>>>> But looking at the error message once more I see that this actually
>>>> affects ring number 9 and not the GFX ring.
>>>>
>>>> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the
>>>> number?
>>>>
>>>> That must be some of the compute rings.
>>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>> Am 18.09.2018 um 16:20 schrieb Tom St Denis:
>>>> > On 2018-09-18 10:13 a.m., Christian König wrote:
>>>> >> Mhm, there is no more failed IB-test in there isn't it?
>>>> >
>>>> > oh sorry I thought you wanted to test HEAD~ ... Attached is a log 
>>>> from
>>>> > the tip of drm-next
>>>> >
>>>> > Tom
>>>> >
>>>> >>
>>>> >> Christian.
>>>> >>
>>>> >> Am 18.09.2018 um 16:09 schrieb Tom St Denis:
>>>> >>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>>>> >>>
>>>> >>> Here's the log.
>>>> >>>
>>>> >>> Tom
>>>> >>>
>>>> >>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>>> >>>> Odd I couldn't even boot my system with the dGPU as primary after
>>>> >>>> rebuilding the kernel.  It got hung up in the IOMMU driver (loads
>>>> >>>> of AMD-Vi IOMMU errors) which I wasn't able to capture because it
>>>> >>>> panic'ed before loading the network stack.
>>>> >>>>
>>>> >>>> Bizarre.
>>>> >>>>
>>>> >>>> I'll keep trying.
>>>> >>>>
>>>> >>>> Tom
>>>> >>>>
>>>> >>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>>> >>>>> Am 18.09.2018 um 15:32 schrieb Tom St Denis:
>>>> >>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>>> >>>>>>> Great, not sure if that is a good or a bad news.
>>>> >>>>>>>
>>>> >>>>>>> Anyway going to revert the change for now. Does anybody
>>>> >>>>>>> volunteer to figure out why interrupts sometimes doesn't work
>>>> >>>>>>> correctly on Raven?
>>>> >>>>>>
>>>> >>>>>> What does "doesn't work correctly?"  My workstation is a Raven1
>>>> >>>>>> (Ryzen 2400G) and other than the TTM bulk move issue has been
>>>> >>>>>> perfectly stable (through suspend/resumes too I might add).
>>>> >>>>>>
>>>> >>>>>> Anything I could test with my devel raven?
>>>> >>>>>
>>>> >>>>> The problem seems to be that on some boards IH handling doesn't
>>>> >>>>> work as it should.
>>>> >>>>>
>>>> >>>>> Can you try to disable the onboard graphics and try again?
>>>> >>>>>
>>>> >>>>> If that still doesn't work there is a DRM_DEBUG in
>>>> >>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the
>>>> >>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>>> >>>>>
>>>> >>>>> Thanks,
>>>> >>>>> Christian.
>>>> >>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Tom
>>>> >>>>>>
>>>> >>>>>>>
>>>> >>>>>>> Christian.
>>>> >>>>>>>
>>>> >>>>>>> Am 18.09.2018 um 15:27 schrieb Tom St Denis:
>>>> >>>>>>>> This commit:
>>>> >>>>>>>>
>>>> >>>>>>>> [root@raven linux]# git bisect good
>>>> >>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad 
>>>> commit
>>>> >>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>>> >>>>>>>> Author: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>> >>>>>>>> Date:   Tue Sep 18 10:38:09 2018 +0200
>>>> >>>>>>>>
>>>> >>>>>>>>     drm/amdgpu: remove fence fallback
>>>> >>>>>>>>
>>>> >>>>>>>>     DC doesn't seem to have a fallback path either.
>>>> >>>>>>>>
>>>> >>>>>>>>     So when interrupts doesn't work any more we are pretty 
>>>> much
>>>> >>>>>>>> busted no
>>>> >>>>>>>>     matter what.
>>>> >>>>>>>>
>>>> >>>>>>>>     Signed-off-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>> >>>>>>>>     Reviewed-by: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
>>>> >>>>>>>>
>>>> >>>>>>>> Results in this:
>>>> >>>>>>>>
>>>> >>>>>>>> [   24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for
>>>> >>>>>>>> 0000:07:00.0 on minor 1
>>>> >>>>>>>> [   24.335674] modprobe (3895) used greatest stack depth: 
>>>> 12600
>>>> >>>>>>>> bytes left
>>>> >>>>>>>> [   26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
>>>> >>>>>>>> amdgpu: IB test timed out.
>>>> >>>>>>>> [   26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
>>>> >>>>>>>> amdgpu: failed testing IB on ring 9 (-110).
>>>> >>>>>>>> [   26.407885] [drm:process_one_work] *ERROR* ib ring test
>>>> >>>>>>>> failed (-110).
>>>> >>>>>>>> [   28.506708] fuse init (API version 7.27)
>>>> >>>>>>>>
>>>> >>>>>>>> On init with my polaris/raven1 system.
>>>> >>>>>>>>
>>>> >>>>>>>> Cheers,
>>>> >>>>>>>> Tom

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                                                     ` <c81338de-5fc7-3be3-961a-bba0eba05351-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-21 17:53                                                       ` Christian König
       [not found]                                                         ` <04944e7b-044b-4b16-3d2f-e760eedcee9a-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2018-09-21 17:53 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, Deucher, Alexander,
	StDenis, Tom, amd-gfx mailing list, Zhou, David(ChunMing)


[-- Attachment #1.1: Type: text/plain, Size: 8990 bytes --]

I unfortunately don't have a Polaris to test this myself.

But please give me time until Monday so that I can at least try one more
thing to fix it.

Christian.

Am 21.09.2018 um 19:11 schrieb Andrey Grodzovsky:
>
> Ping...
>
>
> Andrey
>
>
> On 09/20/2018 04:35 PM, Andrey Grodzovsky wrote:
>>
>> What's the status of this error and the suggested patch to fix it?
>> It impacts GPU reset on Polaris11.
>>
>> Do we want to investigate why the original patch breaks it, or just
>> disable the IB test with the proposed patch?
>>
>>
>> P.S. Suspend/resume also stopped working on the latest branch; I will
>> bisect it later today or tomorrow.
>>
>>
>> Andrey
>>
>>
>> On 09/18/2018 11:00 AM, Christian König wrote:
>>> Tom,
>>>
>>> can you try if the following makes it working again?
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> index b6160de70d12..d65f5ba92fc5 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> @@ -937,6 +937,10 @@ static int gfx_v8_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>>         return r;
>>>  }
>>>
>>> +static int gfx_v8_0_kiq_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>> +{
>>> +       return 0;
>>> +}
>>>
>>>  static void gfx_v8_0_free_microcode(struct amdgpu_device *adev)
>>>  {
>>> @@ -7174,7 +7178,7 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_kiq = {
>>>         .emit_ib = gfx_v8_0_ring_emit_ib_compute,
>>>         .emit_fence = gfx_v8_0_ring_emit_fence_kiq,
>>>         .test_ring = gfx_v8_0_ring_test_ring,
>>> -       .test_ib = gfx_v8_0_ring_test_ib,
>>> +       .test_ib = gfx_v8_0_kiq_ring_test_ib,
>>>         .insert_nop = amdgpu_ring_insert_nop,
>>>         .pad_ib = amdgpu_ring_generic_pad_ib,
>>>         .emit_rreg = gfx_v8_0_ring_emit_rreg,
>>>
>>>
>>> Thanks,
>>> Christian.
>>>
>>> Am 18.09.2018 um 16:41 schrieb Christian König:
>>>> CRTC and GFX interrupts seem to be working perfectly fine.
>>>>
>>>> The problem here looks like only EOP interrupts from the Compute 
>>>> queue are not correctly handled.
>>>>
>>>> Most likely a bug somewhere in gfx_v8_0_eop_irq().
>>>>
>>>> Christian.
>>>>
>>>> Am 18.09.2018 um 16:36 schrieb Deucher, Alexander:
>>>>>
>>>>> FWIW, a number of consumer Raven boards have bad IVRS tables
>>>>> (Windows doesn't use interrupt remapping, so they are sometimes
>>>>> wrong and probably not validated).  There are a number of
>>>>> workarounds to manually override the IVRS tables to make interrupts
>>>>> work.  I think specifying pci=noacpi is also a possible workaround.
>>>>>
>>>>>
>>>>> Alex
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>> *From:* amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> on behalf 
>>>>> of Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>> *Sent:* Tuesday, September 18, 2018 10:31:16 AM
>>>>> *To:* StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
>>>>> *Subject:* Re: Regression on gfx8 with ring init
>>>>> Well looks like interrupt processing is working perfectly fine.
>>>>>
>>>>> But looking at the error message once more I see that this actually
>>>>> affects ring number 9 and not the GFX ring.
>>>>>
>>>>> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the
>>>>> number?
>>>>>
>>>>> That must be some of the compute rings.
>>>>>
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>> Am 18.09.2018 um 16:20 schrieb Tom St Denis:
>>>>> > On 2018-09-18 10:13 a.m., Christian König wrote:
>>>>> >> Mhm, there is no more failed IB-test in there isn't it?
>>>>> >
>>>>> > oh sorry I thought you wanted to test HEAD~ ... Attached is a 
>>>>> log from
>>>>> > the tip of drm-next
>>>>> >
>>>>> > Tom
>>>>> >
>>>>> >>
>>>>> >> Christian.
>>>>> >>
>>>>> >> Am 18.09.2018 um 16:09 schrieb Tom St Denis:
>>>>> >>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>>>>> >>>
>>>>> >>> Here's the log.
>>>>> >>>
>>>>> >>> Tom
>>>>> >>>
>>>>> >>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>>>> >>>> Odd I couldn't even boot my system with the dGPU as primary 
>>>>> after
>>>>> >>>> rebuilding the kernel.  It got hung up in the IOMMU driver 
>>>>> (loads
>>>>> >>>> of AMD-Vi IOMMU errors) which I wasn't able to capture 
>>>>> because it
>>>>> >>>> panic'ed before loading the network stack.
>>>>> >>>>
>>>>> >>>> Bizarre.
>>>>> >>>>
>>>>> >>>> I'll keep trying.
>>>>> >>>>
>>>>> >>>> Tom
>>>>> >>>>
>>>>> >>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>>>> >>>>> Am 18.09.2018 um 15:32 schrieb Tom St Denis:
>>>>> >>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>>>> >>>>>>> Great, not sure if that is a good or a bad news.
>>>>> >>>>>>>
>>>>> >>>>>>> Anyway going to revert the change for now. Does anybody
>>>>> >>>>>>> volunteer to figure out why interrupts sometimes doesn't work
>>>>> >>>>>>> correctly on Raven?
>>>>> >>>>>>
>>>>> >>>>>> What does "doesn't work correctly?"  My workstation is a 
>>>>> Raven1
>>>>> >>>>>> (Ryzen 2400G) and other than the TTM bulk move issue has been
>>>>> >>>>>> perfectly stable (through suspend/resumes too I might add).
>>>>> >>>>>>
>>>>> >>>>>> Anything I could test with my devel raven?
>>>>> >>>>>
>>>>> >>>>> The problem seems to be that on some boards IH handling doesn't
>>>>> >>>>> work as it should.
>>>>> >>>>>
>>>>> >>>>> Can you try to disable the onboard graphics and try again?
>>>>> >>>>>
>>>>> >>>>> If that still doesn't work there is a DRM_DEBUG in
>>>>> >>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the
>>>>> >>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>>>> >>>>>
>>>>> >>>>> Thanks,
>>>>> >>>>> Christian.
>>>>> >>>>>
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> Tom
>>>>> >>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> Christian.
>>>>> >>>>>>>
>>>>> >>>>>>> Am 18.09.2018 um 15:27 schrieb Tom St Denis:
>>>>> >>>>>>>> This commit:
>>>>> >>>>>>>>
>>>>> >>>>>>>> [root@raven linux]# git bisect good
>>>>> >>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad 
>>>>> commit
>>>>> >>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>>>> >>>>>>>> Author: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>> >>>>>>>> Date:   Tue Sep 18 10:38:09 2018 +0200
>>>>> >>>>>>>>
>>>>> >>>>>>>>     drm/amdgpu: remove fence fallback
>>>>> >>>>>>>>
>>>>> >>>>>>>>     DC doesn't seem to have a fallback path either.
>>>>> >>>>>>>>
>>>>> >>>>>>>>     So when interrupts doesn't work any more we are 
>>>>> pretty much
>>>>> >>>>>>>> busted no
>>>>> >>>>>>>>     matter what.
>>>>> >>>>>>>>
>>>>> >>>>>>>> Signed-off-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>> >>>>>>>>     Reviewed-by: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
>>>>> >>>>>>>>
>>>>> >>>>>>>> Results in this:
>>>>> >>>>>>>>
>>>>> >>>>>>>> [   24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for
>>>>> >>>>>>>> 0000:07:00.0 on minor 1
>>>>> >>>>>>>> [   24.335674] modprobe (3895) used greatest stack depth: 
>>>>> 12600
>>>>> >>>>>>>> bytes left
>>>>> >>>>>>>> [   26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
>>>>> >>>>>>>> amdgpu: IB test timed out.
>>>>> >>>>>>>> [   26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
>>>>> >>>>>>>> amdgpu: failed testing IB on ring 9 (-110).
>>>>> >>>>>>>> [   26.407885] [drm:process_one_work] *ERROR* ib ring test
>>>>> >>>>>>>> failed (-110).
>>>>> >>>>>>>> [   28.506708] fuse init (API version 7.27)
>>>>> >>>>>>>>
>>>>> >>>>>>>> On init with my polaris/raven1 system.
>>>>> >>>>>>>>
>>>>> >>>>>>>> Cheers,
>>>>> >>>>>>>> Tom

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                                                         ` <04944e7b-044b-4b16-3d2f-e760eedcee9a-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-09-21 17:56                                                           ` Andrey Grodzovsky
       [not found]                                                             ` <681ddd4e-6bd2-db28-4286-2cc577d0f00a-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Andrey Grodzovsky @ 2018-09-21 17:56 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo, Deucher, Alexander, StDenis, Tom,
	amd-gfx mailing list, Zhou, David(ChunMing)


[-- Attachment #1.1: Type: text/plain, Size: 9488 bytes --]

No worries, I will just revert locally until then to clear the extra 
errors during my investigation of current GPU reset status and issues.


Andrey
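
A local revert along those lines might look like the sketch below. It
assumes the bisected commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
("drm/amdgpu: remove fence fallback") is reachable from the checked-out
branch and reverts without conflicts; the existence check and the fallback
message are illustrative additions, not something posted in this thread.

```shell
# Commit identified by the bisect at the start of this thread.
commit=9b0df0937a852d299fbe42a5939c9a8a4cc83c55

# Only attempt the revert if the commit object exists in this repository.
if git cat-file -e "${commit}^{commit}" 2>/dev/null; then
    # --no-edit keeps the default "Revert ..." commit message.
    git revert --no-edit "$commit"
else
    echo "commit $commit not found in this tree"
fi
```

Dropping the resulting revert commit again (for example with an interactive
rebase) restores the branch once a proper fix is available.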


On 09/21/2018 01:53 PM, Christian König wrote:
> I unfortunately don't have a Polaris to test this myself.
>
> But please give me time until Monday so that I can at least try one
> more thing to fix it.
>
> Christian.
>
> Am 21.09.2018 um 19:11 schrieb Andrey Grodzovsky:
>>
>> Ping...
>>
>>
>> Andrey
>>
>>
>> On 09/20/2018 04:35 PM, Andrey Grodzovsky wrote:
>>>
>>> What's the status of this error and the suggested patch to fix it?
>>> It impacts GPU reset on Polaris11.
>>>
>>> Do we want to investigate why the original patch breaks it, or just
>>> disable the IB test with the proposed patch?
>>>
>>>
>>> P.S. Suspend/resume also stopped working on the latest branch; I will
>>> bisect it later today or tomorrow.
>>>
>>>
>>> Andrey
>>>
>>>
>>> On 09/18/2018 11:00 AM, Christian König wrote:
>>>> Tom,
>>>>
>>>> can you try if the following makes it working again?
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>>> index b6160de70d12..d65f5ba92fc5 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>>> @@ -937,6 +937,10 @@ static int gfx_v8_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>>>         return r;
>>>>  }
>>>>
>>>> +static int gfx_v8_0_kiq_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>>> +{
>>>> +       return 0;
>>>> +}
>>>>
>>>>  static void gfx_v8_0_free_microcode(struct amdgpu_device *adev)
>>>>  {
>>>> @@ -7174,7 +7178,7 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_kiq = {
>>>>         .emit_ib = gfx_v8_0_ring_emit_ib_compute,
>>>>         .emit_fence = gfx_v8_0_ring_emit_fence_kiq,
>>>>         .test_ring = gfx_v8_0_ring_test_ring,
>>>> -       .test_ib = gfx_v8_0_ring_test_ib,
>>>> +       .test_ib = gfx_v8_0_kiq_ring_test_ib,
>>>>         .insert_nop = amdgpu_ring_insert_nop,
>>>>         .pad_ib = amdgpu_ring_generic_pad_ib,
>>>>         .emit_rreg = gfx_v8_0_ring_emit_rreg,
>>>>
>>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>> Am 18.09.2018 um 16:41 schrieb Christian König:
>>>>> CRTC and GFX interrupts seem to be working perfectly fine.
>>>>>
>>>>> The problem here looks like only EOP interrupts from the Compute 
>>>>> queue are not correctly handled.
>>>>>
>>>>> Most likely a bug somewhere in gfx_v8_0_eop_irq().
>>>>>
>>>>> Christian.
>>>>>
>>>>> Am 18.09.2018 um 16:36 schrieb Deucher, Alexander:
>>>>>>
>>>>>> FWIW, a number of consumer Raven boards have bad IVRS tables
>>>>>> (Windows doesn't use interrupt remapping, so they are sometimes
>>>>>> wrong and probably not validated).  There are a number of
>>>>>> workarounds to manually override the IVRS tables to make
>>>>>> interrupts work.  I think specifying pci=noacpi is also a
>>>>>> possible workaround.
>>>>>>
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> *From:* amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> on behalf 
>>>>>> of Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>>> *Sent:* Tuesday, September 18, 2018 10:31:16 AM
>>>>>> *To:* StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
>>>>>> *Subject:* Re: Regression on gfx8 with ring init
>>>>>> Well looks like interrupt processing is working perfectly fine.
>>>>>>
>>>>>> But looking at the error message once more I see that this actually
>>>>>> affects ring number 9 and not the GFX ring.
>>>>>>
>>>>>> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of 
>>>>>> the
>>>>>> number?
>>>>>>
>>>>>> That must be some of the compute rings.
>>>>>>
>>>>>> Thanks,
>>>>>> Christian.
>>>>>>
>>>>>> Am 18.09.2018 um 16:20 schrieb Tom St Denis:
>>>>>> > On 2018-09-18 10:13 a.m., Christian König wrote:
>>>>>> >> Mhm, there is no more failed IB-test in there isn't it?
>>>>>> >
>>>>>> > oh sorry I thought you wanted to test HEAD~ ... Attached is a 
>>>>>> log from
>>>>>> > the tip of drm-next
>>>>>> >
>>>>>> > Tom
>>>>>> >
>>>>>> >>
>>>>>> >> Christian.
>>>>>> >>
>>>>>> >> Am 18.09.2018 um 16:09 schrieb Tom St Denis:
>>>>>> >>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>>>>>> >>>
>>>>>> >>> Here's the log.
>>>>>> >>>
>>>>>> >>> Tom
>>>>>> >>>
>>>>>> >>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>>>>> >>>> Odd I couldn't even boot my system with the dGPU as primary 
>>>>>> after
>>>>>> >>>> rebuilding the kernel.  It got hung up in the IOMMU driver 
>>>>>> (loads
>>>>>> >>>> of AMD-Vi IOMMU errors) which I wasn't able to capture 
>>>>>> because it
>>>>>> >>>> panic'ed before loading the network stack.
>>>>>> >>>>
>>>>>> >>>> Bizarre.
>>>>>> >>>>
>>>>>> >>>> I'll keep trying.
>>>>>> >>>>
>>>>>> >>>> Tom
>>>>>> >>>>
>>>>>> >>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>>>>> >>>>> Am 18.09.2018 um 15:32 schrieb Tom St Denis:
>>>>>> >>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>>>>> >>>>>>> Great, not sure if that is a good or a bad news.
>>>>>> >>>>>>>
>>>>>> >>>>>>> Anyway going to revert the change for now. Does anybody
>>>>>> >>>>>>> volunteer to figure out why interrupts sometimes doesn't 
>>>>>> work
>>>>>> >>>>>>> correctly on Raven?
>>>>>> >>>>>>
>>>>>> >>>>>> What does "doesn't work correctly?"  My workstation is a 
>>>>>> Raven1
>>>>>> >>>>>> (Ryzen 2400G) and other than the TTM bulk move issue has been
>>>>>> >>>>>> perfectly stable (through suspend/resumes too I might add).
>>>>>> >>>>>>
>>>>>> >>>>>> Anything I could test with my devel raven?
>>>>>> >>>>>
>>>>>> >>>>> The problem seems to be that on some boards IH handling 
>>>>>> doesn't
>>>>>> >>>>> work as it should.
>>>>>> >>>>>
>>>>>> >>>>> Can you try to disable the onboard graphics and try again?
>>>>>> >>>>>
>>>>>> >>>>> If that still doesn't work there is a DRM_DEBUG in
>>>>>> >>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the
>>>>>> >>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>>>>> >>>>>
>>>>>> >>>>> Thanks,
>>>>>> >>>>> Christian.
>>>>>> >>>>>
>>>>>> >>>>>>
>>>>>> >>>>>>
>>>>>> >>>>>> Tom
>>>>>> >>>>>>
>>>>>> >>>>>>>
>>>>>> >>>>>>> Christian.
>>>>>> >>>>>>>
>>>>>> >>>>>>> Am 18.09.2018 um 15:27 schrieb Tom St Denis:
>>>>>> >>>>>>>> This commit:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> [root@raven linux]# git bisect good
>>>>>> >>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first 
>>>>>> bad commit
>>>>>> >>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>>>>> >>>>>>>> Author: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>>> >>>>>>>> Date:   Tue Sep 18 10:38:09 2018 +0200
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>     drm/amdgpu: remove fence fallback
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>     DC doesn't seem to have a fallback path either.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>>     So when interrupts doesn't work any more we are 
>>>>>> pretty much
>>>>>> >>>>>>>> busted no
>>>>>> >>>>>>>>     matter what.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Signed-off-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>>> >>>>>>>> Reviewed-by: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Results in this:
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> [   24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for
>>>>>> >>>>>>>> 0000:07:00.0 on minor 1
>>>>>> >>>>>>>> [   24.335674] modprobe (3895) used greatest stack 
>>>>>> depth: 12600
>>>>>> >>>>>>>> bytes left
>>>>>> >>>>>>>> [   26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
>>>>>> >>>>>>>> amdgpu: IB test timed out.
>>>>>> >>>>>>>> [   26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
>>>>>> >>>>>>>> amdgpu: failed testing IB on ring 9 (-110).
>>>>>> >>>>>>>> [   26.407885] [drm:process_one_work] *ERROR* ib ring test
>>>>>> >>>>>>>> failed (-110).
>>>>>> >>>>>>>> [   28.506708] fuse init (API version 7.27)
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> On init with my polaris/raven1 system.
>>>>>> >>>>>>>>
>>>>>> >>>>>>>> Cheers,
>>>>>> >>>>>>>> Tom

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Regression on gfx8 with ring init
       [not found]                                                             ` <681ddd4e-6bd2-db28-4286-2cc577d0f00a-5C7GfCeVMHo@public.gmane.org>
@ 2018-09-21 18:04                                                               ` Andrey Grodzovsky
  0 siblings, 0 replies; 18+ messages in thread
From: Andrey Grodzovsky @ 2018-09-21 18:04 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo, Deucher, Alexander, StDenis, Tom,
	amd-gfx mailing list, Zhou, David(ChunMing)


BTW, this also seems to be what breaks suspend/resume.


Andrey


On 09/21/2018 01:56 PM, Andrey Grodzovsky wrote:
>
> No worries, I will just revert locally until then to clear the extra 
> errors during my investigation of current GPU reset status and issues.
>
>
> Andrey
>
>
> On 09/21/2018 01:53 PM, Christian König wrote:
>> I unfortunately don't have a Polaris to test this myself.
>>
>> But please give me time till Monday so that I can at least try one
>> more thing to fix it.
>>
>> Christian.
>>
>> Am 21.09.2018 um 19:11 schrieb Andrey Grodzovsky:
>>>
>>> Ping...
>>>
>>>
>>> Andrey
>>>
>>>
>>> On 09/20/2018 04:35 PM, Andrey Grodzovsky wrote:
>>>>
>>>> What's the status of this error and the suggested patch to fix it?
>>>> It impacts GPU reset on Polaris11.
>>>>
>>>> Do we want to investigate why the original patch breaks it, or just
>>>> disable the KIQ IB test with the proposed patch?
>>>>
>>>>
>>>> P.S. Suspend/resume also stopped working on the latest branch - will
>>>> bisect it later today or tomorrow.
>>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>> On 09/18/2018 11:00 AM, Christian König wrote:
>>>>> Tom,
>>>>>
>>>>> can you check whether the following makes it work again?
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>>>> index b6160de70d12..d65f5ba92fc5 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>>>> @@ -937,6 +937,10 @@ static int gfx_v8_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>>>>         return r;
>>>>>  }
>>>>>
>>>>> +static int gfx_v8_0_kiq_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>>>> +{
>>>>> +       return 0;
>>>>> +}
>>>>>
>>>>>  static void gfx_v8_0_free_microcode(struct amdgpu_device *adev)
>>>>>  {
>>>>> @@ -7174,7 +7178,7 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_kiq = {
>>>>>         .emit_ib = gfx_v8_0_ring_emit_ib_compute,
>>>>>         .emit_fence = gfx_v8_0_ring_emit_fence_kiq,
>>>>>         .test_ring = gfx_v8_0_ring_test_ring,
>>>>> -       .test_ib = gfx_v8_0_ring_test_ib,
>>>>> +       .test_ib = gfx_v8_0_kiq_ring_test_ib,
>>>>>         .insert_nop = amdgpu_ring_insert_nop,
>>>>>         .pad_ib = amdgpu_ring_generic_pad_ib,
>>>>>         .emit_rreg = gfx_v8_0_ring_emit_rreg,
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>> Am 18.09.2018 um 16:41 schrieb Christian König:
>>>>>> CRTC and GFX interrupts seem to be working perfectly fine.
>>>>>>
>>>>>> The problem here looks to be that only EOP interrupts from the
>>>>>> compute queue are not handled correctly.
>>>>>>
>>>>>> Most likely a bug somewhere in gfx_v8_0_eop_irq().
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> Am 18.09.2018 um 16:36 schrieb Deucher, Alexander:
>>>>>>>
>>>>>>> FWIW, a number of consumer Raven boards have bad IVRS tables
>>>>>>> (Windows doesn't use interrupt remapping, so they are sometimes
>>>>>>> wrong and probably not validated).  There are a number of
>>>>>>> workarounds to manually override the IVRS tables to make
>>>>>>> interrupts work.  I think specifying pci=noacpi is also a
>>>>>>> possible workaround.
>>>>>>>
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>> *From:* amd-gfx <amd-gfx-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> on 
>>>>>>> behalf of Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>>>> *Sent:* Tuesday, September 18, 2018 10:31:16 AM
>>>>>>> *To:* StDenis, Tom; amd-gfx mailing list; Zhou, David(ChunMing)
>>>>>>> *Subject:* Re: Regression on gfx8 with ring init
>>>>>>> Well looks like interrupt processing is working perfectly fine.
>>>>>>>
>>>>>>> But looking at the error message once more I see that this actually
>>>>>>> affects ring number 9 and not the GFX ring.
>>>>>>>
>>>>>>> Can you fix amdgpu_ib_ring_tests() to print ring->name instead of the
>>>>>>> number?
>>>>>>>
>>>>>>> That must be one of the compute rings.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 18.09.2018 um 16:20 schrieb Tom St Denis:
>>>>>>> > On 2018-09-18 10:13 a.m., Christian König wrote:
>>>>>>> >> Mhm, there is no failed IB test in there any more, is there?
>>>>>>> >
>>>>>>> > oh sorry I thought you wanted to test HEAD~ ... Attached is a log from
>>>>>>> > the tip of drm-next
>>>>>>> >
>>>>>>> > Tom
>>>>>>> >
>>>>>>> >>
>>>>>>> >> Christian.
>>>>>>> >>
>>>>>>> >> Am 18.09.2018 um 16:09 schrieb Tom St Denis:
>>>>>>> >>> Disabling IOMMU in the BIOS resulted in a correct boot up...
>>>>>>> >>>
>>>>>>> >>> Here's the log.
>>>>>>> >>>
>>>>>>> >>> Tom
>>>>>>> >>>
>>>>>>> >>> On 2018-09-18 9:58 a.m., Tom St Denis wrote:
>>>>>>> >>>> Odd I couldn't even boot my system with the dGPU as primary after
>>>>>>> >>>> rebuilding the kernel.  It got hung up in the IOMMU driver (loads
>>>>>>> >>>> of AMD-Vi IOMMU errors) which I wasn't able to capture because it
>>>>>>> >>>> panic'ed before loading the network stack.
>>>>>>> >>>>
>>>>>>> >>>> Bizarre.
>>>>>>> >>>>
>>>>>>> >>>> I'll keep trying.
>>>>>>> >>>>
>>>>>>> >>>> Tom
>>>>>>> >>>>
>>>>>>> >>>> On 2018-09-18 9:35 a.m., Christian König wrote:
>>>>>>> >>>>> Am 18.09.2018 um 15:32 schrieb Tom St Denis:
>>>>>>> >>>>>> On 2018-09-18 9:30 a.m., Christian König wrote:
>>>>>>> >>>>>>> Great, not sure if that is good or bad news.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Anyway going to revert the change for now. Does anybody
>>>>>>> >>>>>>> volunteer to figure out why interrupts sometimes don't work
>>>>>>> >>>>>>> correctly on Raven?
>>>>>>> >>>>>>
>>>>>>> >>>>>> What does "doesn't work correctly" mean?  My workstation is a Raven1
>>>>>>> >>>>>> (Ryzen 2400G) and other than the TTM bulk move issue has been
>>>>>>> >>>>>> perfectly stable (through suspend/resumes too I might add).
>>>>>>> >>>>>>
>>>>>>> >>>>>> Anything I could test with my devel raven?
>>>>>>> >>>>>
>>>>>>> >>>>> The problem seems to be that on some boards IH handling doesn't
>>>>>>> >>>>> work as it should.
>>>>>>> >>>>>
>>>>>>> >>>>> Can you try to disable the onboard graphics and try again?
>>>>>>> >>>>>
>>>>>>> >>>>> If that still doesn't work there is a DRM_DEBUG in
>>>>>>> >>>>> amdgpu_ih_process(), make that a DRM_ERROR and send me the
>>>>>>> >>>>> resulting dmesg of loading amdgpu (but don't start any UMD).
>>>>>>> >>>>>
>>>>>>> >>>>> Thanks,
>>>>>>> >>>>> Christian.
>>>>>>> >>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>> Tom
>>>>>>> >>>>>>
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Christian.
>>>>>>> >>>>>>>
>>>>>>> >>>>>>> Am 18.09.2018 um 15:27 schrieb Tom St Denis:
>>>>>>> >>>>>>>> This commit:
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> [root@raven linux]# git bisect good
>>>>>>> >>>>>>>> 9b0df0937a852d299fbe42a5939c9a8a4cc83c55 is the first bad commit
>>>>>>> >>>>>>>> commit 9b0df0937a852d299fbe42a5939c9a8a4cc83c55
>>>>>>> >>>>>>>> Author: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>>>> >>>>>>>> Date:   Tue Sep 18 10:38:09 2018 +0200
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> drm/amdgpu: remove fence fallback
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>>     DC doesn't seem to have a fallback path either.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>>     So when interrupts doesn't work any more we are pretty much busted no
>>>>>>> >>>>>>>>     matter what.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Signed-off-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
>>>>>>> >>>>>>>> Reviewed-by: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Results in this:
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> [ 24.334025] [drm] Initialized amdgpu 3.27.0 20150101 for
>>>>>>> >>>>>>>> 0000:07:00.0 on minor 1
>>>>>>> >>>>>>>> [ 24.335674] modprobe (3895) used greatest stack depth: 12600
>>>>>>> >>>>>>>> bytes left
>>>>>>> >>>>>>>> [ 26.272358] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR*
>>>>>>> >>>>>>>> amdgpu: IB test timed out.
>>>>>>> >>>>>>>> [ 26.272460] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
>>>>>>> >>>>>>>> amdgpu: failed testing IB on ring 9 (-110).
>>>>>>> >>>>>>>> [ 26.407885] [drm:process_one_work] *ERROR* ib ring test
>>>>>>> >>>>>>>> failed (-110).
>>>>>>> >>>>>>>> [ 28.506708] fuse init (API version 7.27)
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> On init with my polaris/raven1 system.
>>>>>>> >>>>>>>>
>>>>>>> >>>>>>>> Cheers,
>>>>>>> >>>>>>>> Tom
>>>>>>> >>>>>>>
>>>>>>> >>>>>>
>>>>>>> >>>>>
>>>>>>> >>>>
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >
>>>>>>>
>>
>



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-09-21 18:04 UTC | newest]

Thread overview: 18+ messages
2018-09-18 13:27 Regression on gfx8 with ring init Tom St Denis
2018-09-18 13:30   ` Christian König
2018-09-18 13:32       ` Tom St Denis
2018-09-18 13:35           ` Christian König
2018-09-18 13:58               ` Tom St Denis
2018-09-18 14:09                   ` Tom St Denis
2018-09-18 14:13                       ` Christian König
2018-09-18 14:20                           ` Tom St Denis
2018-09-18 14:31                               ` Christian König
2018-09-18 14:36                                   ` Deucher, Alexander
2018-09-18 14:41                                       ` Christian König
2018-09-18 15:00                                           ` Christian König
2018-09-20 20:35                                               ` Andrey Grodzovsky
2018-09-21 17:11                                                   ` Andrey Grodzovsky
2018-09-21 17:53                                                       ` Christian König
2018-09-21 17:56                                                           ` Andrey Grodzovsky
2018-09-21 18:04                                                               ` Andrey Grodzovsky
2018-09-18 14:40                                   ` Tom St Denis
