* amdgpu display corruption and hang on AMD A10-9620P
@ 2017-05-09 16:54 Daniel Drake
       [not found] ` <CAD8Lp46UgXx4Du_cpFvvJc+xkuX2_dTC+q=RyZggUc1Ui8oQWg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Drake @ 2017-05-09 16:54 UTC (permalink / raw)
  To: dri-devel, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Deucher, Alexander
  Cc: Linux Upstreaming Team, Chris Chiu

Hi,

We are working with new laptops built around this AMD SoC:

AMD A10-9620P RADEON R5, 10 COMPUTE CORES 4C+6G

I think this is the Bristol Ridge chipset.

During boot, the display becomes unusable at the point where the
amdgpu driver loads. You can see at least two horizontal lines of
garbage at this point. We have reproduced this on 4.8, 4.10 and
Linus' master (early 4.12).

Photo: http://pasteboard.co/qrC9mh4p.jpg

Getting logs is tricky because the system appears to freeze at that point.

Is this a known issue? Is there anything we can do to help diagnose it?

Thanks
Daniel
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* RE: amdgpu display corruption and hang on AMD A10-9620P
       [not found] ` <CAD8Lp46UgXx4Du_cpFvvJc+xkuX2_dTC+q=RyZggUc1Ui8oQWg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-09 17:03   ` Deucher, Alexander
       [not found]     ` <BN6PR12MB16521B41AEABD4933E63FFD0F7EF0-/b2+HYfkarQqUD6E6FAiowdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Deucher, Alexander @ 2017-05-09 17:03 UTC (permalink / raw)
  To: 'Daniel Drake',
	dri-devel, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Linux Upstreaming Team, Chris Chiu

> -----Original Message-----
> From: Daniel Drake [mailto:drake@endlessm.com]
> Sent: Tuesday, May 09, 2017 12:55 PM
> To: dri-devel; amd-gfx@lists.freedesktop.org; Deucher, Alexander
> Cc: Chris Chiu; Linux Upstreaming Team
> Subject: amdgpu display corruption and hang on AMD A10-9620P
> 
> Hi,
> 
> We are working with new laptops that have the AMD Bristol Ridge
> chipset with this SoC:
> 
> AMD A10-9620P RADEON R5, 10 COMPUTE CORES 4C+6G
> 
> I think this is the Bristol Ridge chipset.
> 
> During boot, the display becomes unusable at the point where the
> amdgpu driver loads. You can see at least two horizontal lines of
> garbage at this point. We have reproduced on 4.8, 4.10 and linus
> master (early 4.12).
> 
> Photo: http://pasteboard.co/qrC9mh4p.jpg
> 
> Getting logs is tricky because the system appears to freeze at that point.
> 
> Is this a known issue? Anything we can do to help diagnosis?

I'm not aware of any specific issues.  Please file a bug and attach your logs (https://bugs.freedesktop.org) along with information about the system.

Alex


* Re: amdgpu display corruption and hang on AMD A10-9620P
       [not found]     ` <BN6PR12MB16521B41AEABD4933E63FFD0F7EF0-/b2+HYfkarQqUD6E6FAiowdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2017-06-12 10:24       ` Carlo Caione
       [not found]         ` <CAL9uMOE2hD31_uO8sD1Zh7g5_WkZ9Wi-mExrMN64v5-mcRMuiw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Carlo Caione @ 2017-06-12 10:24 UTC (permalink / raw)
  To: Deucher, Alexander
  Cc: Linux Upstreaming Team, Chris Chiu,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, dri-devel,
	Daniel Drake

On Tue, May 9, 2017 at 7:03 PM, Deucher, Alexander
<Alexander.Deucher@amd.com> wrote:
>> -----Original Message-----
>> From: Daniel Drake [mailto:drake@endlessm.com]
>> Sent: Tuesday, May 09, 2017 12:55 PM
>> To: dri-devel; amd-gfx@lists.freedesktop.org; Deucher, Alexander
>> Cc: Chris Chiu; Linux Upstreaming Team
>> Subject: amdgpu display corruption and hang on AMD A10-9620P
>>
>> Hi,
>>
>> We are working with new laptops that have the AMD Bristol Ridge
>> chipset with this SoC:
>>
>> AMD A10-9620P RADEON R5, 10 COMPUTE CORES 4C+6G
>>
>> I think this is the Bristol Ridge chipset.
>>
>> During boot, the display becomes unusable at the point where the
>> amdgpu driver loads. You can see at least two horizontal lines of
>> garbage at this point. We have reproduced on 4.8, 4.10 and linus
>> master (early 4.12).
>>
>> Photo: http://pasteboard.co/qrC9mh4p.jpg
>>
>> Getting logs is tricky because the system appears to freeze at that point.
>>
>> Is this a known issue? Anything we can do to help diagnosis?
>
> I'm not aware of any specific issues.  Please file a bug and attach your logs (https://bugs.freedesktop.org) along with information about the system.

I opened https://bugs.freedesktop.org/show_bug.cgi?id=101387 to track
this bug and attached the full log we get when modprobing amdgpu.
For the sake of documentation I am reporting only the trace here (the
full log is attached to the freedesktop bug).

[   80.766408] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffc0c88942
[   80.766408]
[   80.766428] CPU: 1 PID: 1594 Comm: modprobe Not tainted 4.11.3+ #2
[   80.766431] Hardware name: Acer Aspire A515-41G/Wartortle_BS, BIOS V0.09 04/19/2017
[   80.766434] Call Trace:
[   80.766445]  dump_stack+0x63/0x90
[   80.766451]  panic+0xe8/0x236
[   80.766526]  ? amdgpu_atombios_crtc_powergate_init+0x52/0x60 [amdgpu]
[   80.766537]  __stack_chk_fail+0x1b/0x20
[   80.766571]  amdgpu_atombios_crtc_powergate_init+0x52/0x60 [amdgpu]
[   80.766610]  dce_v11_0_hw_init+0x3e/0x2d0 [amdgpu]
[   80.766643]  amdgpu_device_init+0xe23/0x13c0 [amdgpu]
[   80.766647]  ? kmalloc_order+0x18/0x40
[   80.766650]  ? kmalloc_order_trace+0x24/0xa0
[   80.766683]  amdgpu_driver_load_kms+0x5d/0x240 [amdgpu]
[   80.766708]  drm_dev_register+0x148/0x1e0 [drm]
[   80.766721]  drm_get_pci_dev+0xa0/0x160 [drm]
[   80.766754]  amdgpu_pci_probe+0xb9/0xf0 [amdgpu]
[   80.766759]  local_pci_probe+0x45/0xa0
[   80.766762]  pci_device_probe+0xf4/0x150
[   80.766768]  driver_probe_device+0x2c5/0x470
[   80.766772]  __driver_attach+0xdf/0xf0
[   80.766776]  ? driver_probe_device+0x470/0x470
[   80.766780]  bus_for_each_dev+0x6c/0xc0
[   80.766784]  driver_attach+0x1e/0x20
[   80.766787]  bus_add_driver+0x45/0x270
[   80.766790]  ? 0xffffffffc09a8000
[   80.766794]  driver_register+0x60/0xe0
[   80.766796]  ? 0xffffffffc09a8000
[   80.766799]  __pci_register_driver+0x4c/0x50
[   80.766811]  drm_pci_init+0xed/0x100 [drm]
[   80.766816]  ? vga_switcheroo_register_handler+0x6c/0x90
[   80.766819]  ? 0xffffffffc09a8000
[   80.766850]  amdgpu_init+0x9b/0xac [amdgpu]
[   80.766855]  do_one_initcall+0x53/0x1c0
[   80.766860]  ? __vunmap+0x81/0xd0
[   80.766865]  ? kmem_cache_alloc_trace+0xdb/0x1b0
[   80.766868]  ? kfree+0x161/0x170
[   80.766876]  do_init_module+0x60/0x202
[   80.766881]  load_module+0x2612/0x29f0
[   80.766885]  SYSC_finit_module+0xa6/0xf0
[   80.766888]  ? SYSC_finit_module+0xa6/0xf0
[   80.766892]  SyS_finit_module+0xe/0x10
[   80.766896]  entry_SYSCALL_64_fastpath+0x1e/0xad
[   80.766899] RIP: 0033:0x7fa525e60709
[   80.766902] RSP: 002b:00007fff2f5bbbf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   80.766905] RAX: ffffffffffffffda RBX: 00007fa526129760 RCX: 00007fa525e60709
[   80.766908] RDX: 0000000000000000 RSI: 000055f51f1c9439 RDI: 000000000000000b
[   80.766910] RBP: 0000000000000070 R08: 0000000000000000 R09: 000055f51fcd83f0
[   80.766913] R10: 000000000000000b R11: 0000000000000246 R12: 000055f51fcd9ff0
[   80.766915] R13: 0000000000000007 R14: 00007fa5261297b8 R15: 0000000000002710
[   80.766931] Kernel Offset: 0x22800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   80.766937] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffc0c88942

-- 
Carlo Caione  |  +39.340.80.30.096  |  Endless

* Re: amdgpu display corruption and hang on AMD A10-9620P
       [not found]         ` <CAL9uMOE2hD31_uO8sD1Zh7g5_WkZ9Wi-mExrMN64v5-mcRMuiw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-06-15  6:46           ` Carlo Caione
  0 siblings, 0 replies; 4+ messages in thread
From: Carlo Caione @ 2017-06-15  6:46 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: michel-otUistvHUpPR7s880joybQ, Chris Chiu, dri-devel,
	Daniel Drake, Deucher, Alexander, Linux Upstreaming Team

On Mon, Jun 12, 2017 at 12:24 PM, Carlo Caione <carlo@endlessm.com> wrote:
> On Tue, May 9, 2017 at 7:03 PM, Deucher, Alexander
> <Alexander.Deucher@amd.com> wrote:
>>> -----Original Message-----
>>> From: Daniel Drake [mailto:drake@endlessm.com]
>>> Sent: Tuesday, May 09, 2017 12:55 PM
>>> To: dri-devel; amd-gfx@lists.freedesktop.org; Deucher, Alexander
>>> Cc: Chris Chiu; Linux Upstreaming Team
>>> Subject: amdgpu display corruption and hang on AMD A10-9620P
>>>
>>> Hi,
>>>
>>> We are working with new laptops that have the AMD Bristol Ridge
>>> chipset with this SoC:
>>>
>>> AMD A10-9620P RADEON R5, 10 COMPUTE CORES 4C+6G
>>>
>>> I think this is the Bristol Ridge chipset.
>>>
>>> During boot, the display becomes unusable at the point where the
>>> amdgpu driver loads. You can see at least two horizontal lines of
>>> garbage at this point. We have reproduced on 4.8, 4.10 and linus
>>> master (early 4.12).
>>>
>>> Photo: http://pasteboard.co/qrC9mh4p.jpg
>>>
>>> Getting logs is tricky because the system appears to freeze at that point.
>>>
>>> Is this a known issue? Anything we can do to help diagnosis?
>>
>> I'm not aware of any specific issues.  Please file a bug and attach your logs (https://bugs.freedesktop.org) along with information about the system.
>
> Opened https://bugs.freedesktop.org/show_bug.cgi?id=101387 to trace
> this bug. I also have attached there the full log we get when
> modprobing amdgpu.
> Reporting here only the trace for the sake of documentation (full log
> attached to the bug opened on freedesktop)
>
> [   80.766937] ---[ end Kernel panic - not syncing: stack-protector:
> Kernel stack is corrupted in: ffffffffc0c88942
> [   80.766937]
> [   80.766408] Kernel panic - not syncing: stack-protector: Kernel
> stack is corrupted in: ffffffffc0c88942
> [   80.766408]
> [   80.766428] CPU: 1 PID: 1594 Comm: modprobe Not tainted 4.11.3+ #2
> [   80.766431] Hardware name: Acer Aspire A515-41G/Wartortle_BS, BIOS
> V0.09 04/19/2017
> [   80.766434] Call Trace:
> [   80.766445]  dump_stack+0x63/0x90
> [   80.766451]  panic+0xe8/0x236
> [   80.766526]  ? amdgpu_atombios_crtc_powergate_init+0x52/0x60 [amdgpu]
> [   80.766537]  __stack_chk_fail+0x1b/0x20
> [   80.766571]  amdgpu_atombios_crtc_powergate_init+0x52/0x60 [amdgpu]
> [   80.766610]  dce_v11_0_hw_init+0x3e/0x2d0 [amdgpu]
> [   80.766643]  amdgpu_device_init+0xe23/0x13c0 [amdgpu]
> [   80.766647]  ? kmalloc_order+0x18/0x40
> [   80.766650]  ? kmalloc_order_trace+0x24/0xa0
> [   80.766683]  amdgpu_driver_load_kms+0x5d/0x240 [amdgpu]
> [   80.766708]  drm_dev_register+0x148/0x1e0 [drm]
> [   80.766721]  drm_get_pci_dev+0xa0/0x160 [drm]
> [   80.766754]  amdgpu_pci_probe+0xb9/0xf0 [amdgpu]
> [   80.766759]  local_pci_probe+0x45/0xa0
> [   80.766762]  pci_device_probe+0xf4/0x150
> [   80.766768]  driver_probe_device+0x2c5/0x470
> [   80.766772]  __driver_attach+0xdf/0xf0
> [   80.766776]  ? driver_probe_device+0x470/0x470
> [   80.766780]  bus_for_each_dev+0x6c/0xc0
> [   80.766784]  driver_attach+0x1e/0x20
> [   80.766787]  bus_add_driver+0x45/0x270
> [   80.766790]  ? 0xffffffffc09a8000
> [   80.766794]  driver_register+0x60/0xe0
> [   80.766796]  ? 0xffffffffc09a8000
> [   80.766799]  __pci_register_driver+0x4c/0x50
> [   80.766811]  drm_pci_init+0xed/0x100 [drm]
> [   80.766816]  ? vga_switcheroo_register_handler+0x6c/0x90
> [   80.766819]  ? 0xffffffffc09a8000
> [   80.766850]  amdgpu_init+0x9b/0xac [amdgpu]
> [   80.766855]  do_one_initcall+0x53/0x1c0
> [   80.766860]  ? __vunmap+0x81/0xd0
> [   80.766865]  ? kmem_cache_alloc_trace+0xdb/0x1b0
> [   80.766868]  ? kfree+0x161/0x170
> [   80.766876]  do_init_module+0x60/0x202
> [   80.766881]  load_module+0x2612/0x29f0
> [   80.766885]  SYSC_finit_module+0xa6/0xf0
> [   80.766888]  ? SYSC_finit_module+0xa6/0xf0
> [   80.766892]  SyS_finit_module+0xe/0x10
> [   80.766896]  entry_SYSCALL_64_fastpath+0x1e/0xad
> [   80.766899] RIP: 0033:0x7fa525e60709
> [   80.766902] RSP: 002b:00007fff2f5bbbf8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000139
> [   80.766905] RAX: ffffffffffffffda RBX: 00007fa526129760 RCX: 00007fa525e60709
> [   80.766908] RDX: 0000000000000000 RSI: 000055f51f1c9439 RDI: 000000000000000b
> [   80.766910] RBP: 0000000000000070 R08: 0000000000000000 R09: 000055f51fcd83f0
> [   80.766913] R10: 000000000000000b R11: 0000000000000246 R12: 000055f51fcd9ff0
> [   80.766915] R13: 0000000000000007 R14: 00007fa5261297b8 R15: 0000000000002710
> [   80.766931] Kernel Offset: 0x22800000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [   80.766937] ---[ end Kernel panic - not syncing: stack-protector:
> Kernel stack is corrupted in: ffffffffc0c88942

I am moving this discussion here for more visibility. This is what is
happening.

In amdgpu_atombios_crtc_powergate_init() we declare args, of type
ENABLE_DISP_POWER_GATING_PARAMETERS_V2_1, as the parameter space. This
is 32 bits (4 bytes) wide and is passed down to the atombios
interpreter in ctx->ps.

Calling amdgpu_atombios_crtc_powergate_init() triggers the parsing of
the command table with index == 13 [>> execute C5C0 (len 589, WS 0,
PS 0)]. During the execution of this table several CALL_TABLE opcodes
(op == 82) are executed. In more detail, we first jump to the table
with index == 78 [>> execute F166 (len 588, WS 0, PS 8)], then to the
table with index == 51 [>> execute F446 (len 465, WS 4, PS 4)] and
then to the table with index == 75 [>> execute F6CC (len 1330, WS 4,
PS 0)] before finally reaching the EOT for table 13. By the time we
return to amdgpu_atombios_crtc_powergate_init(), the stack is already
corrupted.

The corruption happens while executing the code in table 75
[>> execute F6CC (len 1330, WS 4, PS 0)]. In this table a MOVE_PS is
executed with a destination index == 1; it writes to ctx->ps[idx] and
that write corrupts the stack.
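
To illustrate the arithmetic, here is a standalone sketch (not the
kernel code). It assumes the parameter space is indexed in 32-bit
words and that the argument block is 4 bytes, so only index 0 is
valid. The struct is a local stand-in and ps_write_checked() is a
hypothetical helper, not an existing amdgpu function.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the args struct: assumed to be 4 bytes (one 32-bit word). */
struct demo_powergate_args {
    uint8_t pipe_id;
    uint8_t enable;
    uint8_t padding[2];
};

/* Number of 32-bit parameter-space slots that a buffer of `bytes` bytes provides. */
static size_t ps_slots(size_t bytes)
{
    return bytes / sizeof(uint32_t);
}

/*
 * Hypothetical bounds-checked MOVE_PS-style store.
 * Returns 0 on success, -1 if idx lies outside the buffer.
 */
static int ps_write_checked(uint32_t *ps, size_t ps_bytes, size_t idx, uint32_t val)
{
    if (idx >= ps_slots(ps_bytes))
        return -1; /* an unchecked store here would overwrite the caller's stack */
    ps[idx] = val;
    return 0;
}
```

With a 4-byte block, ps_slots() is 1, so a store to index 1 is
rejected by the check; without such a check it lands on whatever
follows the block on the caller's stack.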

My first guess is that something is wrong in the atombios code.
Table 75 has WS == 4 and PS == 0, and looking at the opcodes in the
table I see almost only *_WS opcodes (MOVE_WS, TEST_WS, ADD_WS, etc.)
and just two *_PS instructions (MOVE_PS and OR_PS), which (guess what)
are the instructions causing the stack corruption. So I suspect that
the *_PS opcodes in the atombios are wrong and should actually be
*_WS opcodes.
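
The mismatch can be expressed as a small check (a sketch only, under
the assumption that the WS and PS values printed in the debug output
are sizes in bytes). struct demo_table_hdr and ps_index_declared() are
hypothetical names used for illustration; they are not part of the
atombios interpreter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a command table header, as printed in the debug log. */
struct demo_table_hdr {
    uint16_t len;      /* table length in bytes */
    uint8_t  ws_bytes; /* declared work space size (assumed to be bytes) */
    uint8_t  ps_bytes; /* declared parameter space size (assumed to be bytes) */
};

/* True if 32-bit parameter slot `idx` fits inside the PS size the table declares. */
static bool ps_index_declared(const struct demo_table_hdr *hdr, size_t idx)
{
    return (idx + 1) * sizeof(uint32_t) <= hdr->ps_bytes;
}
```

For table 75 (PS 0) the check fails for every index, including the
index 1 used by MOVE_PS, while for table 78 (PS 8) indexes 0 and 1
pass.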

Another possibility is that the atombios interpreter is doing
something wrong. After a CALL_TABLE, don't we need to allocate a ps
allocation struct (ctx->ps) for the command table we are about to
execute, sized to match the ps size in that table's header? IIUC, the
kernel code does not reallocate ctx->ps when we jump to a different
table.

Thanks,

-- 
Carlo Caione  |  +39.340.80.30.096  |  Endless
