* Stack out of bounds in KFD on Arcturus
@ 2019-10-17 20:09 Grodzovsky, Andrey
       [not found] ` <a81a3f82-1f21-663f-150c-cdbbbf231ab3-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Grodzovsky, Andrey @ 2019-10-17 20:09 UTC (permalink / raw)
  To: Kuehling, Felix; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Felix - I see this on boot when working with Arcturus.

Andrey


[  103.602092] kfd kfd: Allocated 3969056 bytes on gart
[  103.610769] ==================================================================
[  103.611469] BUG: KASAN: stack-out-of-bounds in kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
[  103.611646] Read of size 4 at addr ffff8883cb19ee38 by task modprobe/1122

[  103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G           O      5.3.0-rc3+ #45
[  103.611847] Hardware name: System manufacturer System Product Name/Z170-PRO, BIOS 1902 06/27/2016
[  103.611856] Call Trace:
[  103.611879]  dump_stack+0x71/0xab
[  103.611907]  print_address_description+0x1da/0x3c0
[  103.612453]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
[  103.612479]  __kasan_report+0x13f/0x1a0
[  103.613022]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
[  103.613580]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
[  103.613604]  kasan_report+0xe/0x20
[  103.614149]  kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
[  103.614762]  ? kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu]
[  103.614796]  ? __alloc_pages_nodemask+0x2c9/0x560
[  103.614824]  ? __alloc_pages_slowpath+0x1390/0x1390
[  103.614898]  ? kmalloc_order+0x63/0x70
[  103.615469]  kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu]
[  103.616054]  ? kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu]
[  103.616095]  ? up_write+0x4b/0x70
[  103.616649]  kfd_topology_add_device+0x98d/0xb10 [amdgpu]
[  103.617207]  ? kfd_topology_shutdown+0x60/0x60 [amdgpu]
[  103.617743]  ? start_cpsch+0x2ff/0x3a0 [amdgpu]
[  103.617777]  ? mutex_lock_io_nested+0xac0/0xac0
[  103.617807]  ? __mutex_unlock_slowpath+0xda/0x420
[  103.617848]  ? __mutex_unlock_slowpath+0xda/0x420
[  103.617877]  ? wait_for_completion+0x200/0x200
[  103.618461]  ? start_cpsch+0x38b/0x3a0 [amdgpu]
[  103.619011]  ? create_queue_cpsch+0x670/0x670 [amdgpu]
[  103.619573]  ? kfd_iommu_device_init+0x92/0x1e0 [amdgpu]
[  103.620112]  ? kfd_iommu_resume+0x2c/0x2c0 [amdgpu]
[  103.620655]  ? kfd_iommu_check_device+0xf0/0xf0 [amdgpu]
[  103.621228]  kgd2kfd_device_init+0x474/0x870 [amdgpu]
[  103.621781]  amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu]
[  103.622329]  ? amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu]
[  103.622344]  ? kmsg_dump_rewind_nolock+0x59/0x59
[  103.622895]  ? amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu]
[  103.623424]  amdgpu_device_init+0x1bbe/0x2f00 [amdgpu]
[  103.623819]  ? amdgpu_device_has_dc_support+0x30/0x30 [amdgpu]
[  103.623842]  ? __isolate_free_page+0x290/0x290
[  103.623852]  ? fs_reclaim_acquire.part.97+0x5/0x30
[  103.623891]  ? __alloc_pages_nodemask+0x2c9/0x560
[  103.623912]  ? __alloc_pages_slowpath+0x1390/0x1390
[  103.623945]  ? kasan_unpoison_shadow+0x31/0x40
[  103.623970]  ? kmalloc_order+0x63/0x70
[  103.624337]  amdgpu_driver_load_kms+0xd9/0x430 [amdgpu]
[  103.624690]  ? amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu]
[  103.624756]  ? drm_dev_register+0x19c/0x310 [drm]
[  103.624768]  ? __kasan_slab_free+0x133/0x160
[  103.624849]  drm_dev_register+0x1f5/0x310 [drm]
[  103.625212]  amdgpu_pci_probe+0x109/0x1f0 [amdgpu]
[  103.625565]  ? amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu]
[  103.625580]  local_pci_probe+0x74/0xd0
[  103.625603]  pci_device_probe+0x1fa/0x310
[  103.625620]  ? pci_device_remove+0x1c0/0x1c0
[  103.625640]  ? sysfs_do_create_link_sd.isra.2+0x74/0xe0
[  103.625673]  really_probe+0x367/0x5d0
[  103.625700]  driver_probe_device+0x177/0x1b0
[  103.625721]  device_driver_attach+0x8a/0x90
[  103.625737]  ? device_driver_attach+0x90/0x90
[  103.625746]  __driver_attach+0xeb/0x190
[  103.625765]  ? device_driver_attach+0x90/0x90
[  103.625773]  bus_for_each_dev+0xe4/0x160
[  103.625789]  ? subsys_dev_iter_exit+0x10/0x10
[  103.625829]  bus_add_driver+0x277/0x330
[  103.625855]  driver_register+0xc6/0x1a0
[  103.625866]  ? 0xffffffffa0d88000
[  103.625880]  do_one_initcall+0xd3/0x334
[  103.625895]  ? trace_event_raw_event_initcall_finish+0x150/0x150
[  103.625911]  ? kasan_unpoison_shadow+0x31/0x40
[  103.625924]  ? __kasan_kmalloc+0xd5/0xf0
[  103.625946]  ? kmem_cache_alloc_trace+0x154/0x300
[  103.625955]  ? kasan_unpoison_shadow+0x31/0x40
[  103.625985]  do_init_module+0xec/0x354
[  103.626011]  load_module+0x3c91/0x4980
[  103.626118]  ? module_frob_arch_sections+0x20/0x20
[  103.626132]  ? ima_read_file+0x10/0x10
[  103.626142]  ? vfs_read+0x127/0x190
[  103.626163]  ? kernel_read+0x95/0xb0
[  103.626187]  ? kernel_read_file+0x1a5/0x340
[  103.626277]  ? __do_sys_finit_module+0x175/0x1b0
[  103.626287]  __do_sys_finit_module+0x175/0x1b0
[  103.626301]  ? __ia32_sys_init_module+0x40/0x40
[  103.626338]  ? lock_downgrade+0x390/0x390
[  103.626396]  ? vtime_user_exit+0xc8/0xe0
[  103.626423]  do_syscall_64+0x7d/0x250
[  103.626440]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  103.626450] RIP: 0033:0x7f09984854d9
[  103.626461] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01 48
[  103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX: 00007f09984854d9
[  103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI: 0000000000000006
[  103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09: 0000000000000000
[  103.626500] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
[  103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15: 0000000000000013

[  103.626592] The buggy address belongs to the page:
[  103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
[  103.626675] flags: 0x2ffff0000000000()
[  103.626686] raw: 02ffff0000000000 0000000000000000 ffffea000f2c6788 0000000000000000
[  103.626696] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[  103.626702] page dumped because: kasan: bad access detected

[  103.626742] addr ffff8883cb19ee38 is located in stack of task modprobe/1122 at offset 264 in frame:
[  103.627233]  kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]

[  103.627346] this frame has 3 objects:
[  103.627405]  [32, 36) 'avail_size'
[  103.627410]  [96, 120) 'local_mem_info'
[  103.627466]  [160, 264) 'cu_info'

[  103.627602] Memory state around the buggy address:
[  103.627675]  ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f2 f2
[  103.627780]  ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00 00 00 00 00
[  103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00
[  103.627989]                                         ^
[  103.628065]  ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  103.628169]  ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00 00
[  103.628273] ==================================================================
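
One way to read the frame description in the report: the faulting offset 264 is the exclusive end of 'cu_info' ([160, 264)), so the 4-byte read appears to start on the first byte past that object. Below is a minimal Python sketch of that arithmetic. The `FRAME_OBJECTS` table and the `locate` helper are hypothetical names used only for illustration; the offsets are copied from the report.

```python
# Stack objects reported by KASAN for kfd_create_vcrat_image_gpu,
# as (start, end_exclusive, name) offsets within the frame.
FRAME_OBJECTS = [
    (32, 36, "avail_size"),
    (96, 120, "local_mem_info"),
    (160, 264, "cu_info"),
]


def locate(offset, size, objects=FRAME_OBJECTS):
    """Describe where an access of `size` bytes at `offset` falls.

    Hypothetical helper for reading the report; not part of any kernel tool.
    """
    for start, end, name in objects:
        if start <= offset < end:
            overrun = offset + size - end
            if overrun <= 0:
                return f"inside '{name}'"
            return f"overruns '{name}' by {overrun} bytes"
    # The access starts outside every object: find the closest object
    # that ends at or before the access.
    below = [(offset - end, name) for _start, end, name in objects if end <= offset]
    if below:
        gap, name = min(below)
        return f"{gap} bytes past the end of '{name}'"
    return "before the first object"


print(locate(264, 4))  # the access from the report
```

For the access in the report this prints `0 bytes past the end of 'cu_info'`, which suggests an out-of-bounds read next to `cu_info` rather than exhaustion of the whole kernel stack.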

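The "Memory state" rows can be decoded in a similar way. The sketch below assumes generic KASAN, where one shadow byte describes 8 bytes of memory, and assumes the stack redzone values used by kernels of that era (f1 left, f2 mid, f3 right, f4 partial). The names `shadow_for`, `ROW` and `MEANING` are made up for this example; the row contents and addresses are copied from the report.

```python
SHADOW_SCALE = 8  # assumption: generic KASAN, one shadow byte per 8 bytes

# Row marked with '>' in the report, and the faulting address.
ROW_BASE = 0xFFFF8883CB19EE00
ROW = "00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00".split()
FAULT_ADDR = 0xFFFF8883CB19EE38

# Assumed meaning of the stack-related shadow values.
MEANING = {
    0xF1: "stack left redzone",
    0xF2: "stack mid redzone",
    0xF3: "stack right redzone",
    0xF4: "stack partial redzone",
}


def shadow_for(addr, row_base=ROW_BASE, row=ROW):
    """Return (index, value, description) of the shadow byte covering addr."""
    index = (addr - row_base) // SHADOW_SCALE
    value = int(row[index], 16)
    if value == 0:
        desc = "all 8 bytes accessible"
    elif value < SHADOW_SCALE:
        desc = f"first {value} bytes accessible"
    else:
        desc = MEANING.get(value, "poisoned (other)")
    return index, value, desc


index, value, desc = shadow_for(FAULT_ADDR)
print(index, hex(value), desc)
```

Under these assumptions the faulting address maps to shadow byte 7 of the marked row, the first `f4` after the seven `00` bytes, which is also where the `^` marker points.
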
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: Stack out of bounds in KFD on Arcturus
       [not found] ` <a81a3f82-1f21-663f-150c-cdbbbf231ab3-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-17 21:29   ` Kuehling, Felix
       [not found]     ` <31aa5ae0-5eb4-38ca-aed7-d807ab19e2ca-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Kuehling, Felix @ 2019-10-17 21:29 UTC (permalink / raw)
  To: Grodzovsky, Andrey; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

I don't see why this problem would be specific to Arcturus. I don't see 
any excessive allocations on the stack either. Also the code involved 
here hasn't changed recently.

Are you using some weird kernel config with a smaller stack? Is it 
specific to a compiler version or some optimization flags? I've 
sometimes seen function inlining cause excessive stack usage.

Regards,
   Felix

On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
> Hi Felix - I see this on boot when working with Arcturus.
>
> Andrey


* Re: Stack out of bounds in KFD on Arcturus
       [not found]     ` <31aa5ae0-5eb4-38ca-aed7-d807ab19e2ca-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-17 22:38       ` Grodzovsky, Andrey
       [not found]         ` <96393d3a-ebf7-3c2b-5b51-6a968ee9b4f8-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Grodzovsky, Andrey @ 2019-10-17 22:38 UTC (permalink / raw)
  To: Kuehling, Felix; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Not that I am aware of. Is there a special Kconfig flag that determines the
stack size?

Andrey

On 10/17/19 5:29 PM, Kuehling, Felix wrote:
> I don't see why this problem would be specific to Arcturus. I don't see
> any excessive allocations on the stack either. Also the code involved
> here hasn't changed recently.
>
> Are you using some weird kernel config with a smaller stack? Is it
> specific to a compiler version or some optimization flags? I've
> sometimes seen function inlining cause excessive stack usage.
>
> Regards,
>     Felix


* Re: Stack out of bounds in KFD on Arcturus
       [not found]         ` <96393d3a-ebf7-3c2b-5b51-6a968ee9b4f8-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-18 20:55           ` Kuehling, Felix
       [not found]             ` <134de413-61fe-a6ee-96ac-73b694fcb94c-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Kuehling, Felix @ 2019-10-18 20:55 UTC (permalink / raw)
  To: Grodzovsky, Andrey; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
> Not that I am aware of. Is there a special Kconfig flag that determines the
> stack size?

I remember there used to be a Kconfig option to force a 4KB kernel 
stack. I don't see it in the current kernel any more.

I don't have time to work on this myself. I'll create a ticket and see 
if I can find someone to investigate.

Thanks,
   Felix


>
> Andrey
>
> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>> I don't see why this problem would be specific to Arcturus. I don't see
>> any excessive allocations on the stack either. Also the code involved
>> here hasn't changed recently.
>>
>> Are you using some weird kernel config with a smaller stack? Is it
>> specific to a compiler version or some optimization flags? I've
>> sometimes seen function inlining cause excessive stack usage.
>>
>> Regards,
>>      Felix
>>
>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>> He Felix - I see this on boot when working with Arcturus.
>>>
>>> Andrey
>>>
>>>
>>> [  103.602092] kfd kfd: Allocated 3969056 bytes on gart
>>> [  103.610769]
>>> ==================================================================
>>> [  103.611469] BUG: KASAN: stack-out-of-bounds in
>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.611646] Read of size 4 at addr ffff8883cb19ee38 by task modprobe/1122
>>>
>>> [  103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G
>>> O      5.3.0-rc3+ #45
>>> [  103.611847] Hardware name: System manufacturer System Product
>>> Name/Z170-PRO, BIOS 1902 06/27/2016
>>> [  103.611856] Call Trace:
>>> [  103.611879]  dump_stack+0x71/0xab
>>> [  103.611907]  print_address_description+0x1da/0x3c0
>>> [  103.612453]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.612479]  __kasan_report+0x13f/0x1a0
>>> [  103.613022]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.613580]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.613604]  kasan_report+0xe/0x20
>>> [  103.614149]  kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.614762]  ? kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu]
>>> [  103.614796]  ? __alloc_pages_nodemask+0x2c9/0x560
>>> [  103.614824]  ? __alloc_pages_slowpath+0x1390/0x1390
>>> [  103.614898]  ? kmalloc_order+0x63/0x70
>>> [  103.615469]  kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu]
>>> [  103.616054]  ? kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu]
>>> [  103.616095]  ? up_write+0x4b/0x70
>>> [  103.616649]  kfd_topology_add_device+0x98d/0xb10 [amdgpu]
>>> [  103.617207]  ? kfd_topology_shutdown+0x60/0x60 [amdgpu]
>>> [  103.617743]  ? start_cpsch+0x2ff/0x3a0 [amdgpu]
>>> [  103.617777]  ? mutex_lock_io_nested+0xac0/0xac0
>>> [  103.617807]  ? __mutex_unlock_slowpath+0xda/0x420
>>> [  103.617848]  ? __mutex_unlock_slowpath+0xda/0x420
>>> [  103.617877]  ? wait_for_completion+0x200/0x200
>>> [  103.618461]  ? start_cpsch+0x38b/0x3a0 [amdgpu]
>>> [  103.619011]  ? create_queue_cpsch+0x670/0x670 [amdgpu]
>>> [  103.619573]  ? kfd_iommu_device_init+0x92/0x1e0 [amdgpu]
>>> [  103.620112]  ? kfd_iommu_resume+0x2c/0x2c0 [amdgpu]
>>> [  103.620655]  ? kfd_iommu_check_device+0xf0/0xf0 [amdgpu]
>>> [  103.621228]  kgd2kfd_device_init+0x474/0x870 [amdgpu]
>>> [  103.621781]  amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu]
>>> [  103.622329]  ? amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu]
>>> [  103.622344]  ? kmsg_dump_rewind_nolock+0x59/0x59
>>> [  103.622895]  ? amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu]
>>> [  103.623424]  amdgpu_device_init+0x1bbe/0x2f00 [amdgpu]
>>> [  103.623819]  ? amdgpu_device_has_dc_support+0x30/0x30 [amdgpu]
>>> [  103.623842]  ? __isolate_free_page+0x290/0x290
>>> [  103.623852]  ? fs_reclaim_acquire.part.97+0x5/0x30
>>> [  103.623891]  ? __alloc_pages_nodemask+0x2c9/0x560
>>> [  103.623912]  ? __alloc_pages_slowpath+0x1390/0x1390
>>> [  103.623945]  ? kasan_unpoison_shadow+0x31/0x40
>>> [  103.623970]  ? kmalloc_order+0x63/0x70
>>> [  103.624337]  amdgpu_driver_load_kms+0xd9/0x430 [amdgpu]
>>> [  103.624690]  ? amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu]
>>> [  103.624756]  ? drm_dev_register+0x19c/0x310 [drm]
>>> [  103.624768]  ? __kasan_slab_free+0x133/0x160
>>> [  103.624849]  drm_dev_register+0x1f5/0x310 [drm]
>>> [  103.625212]  amdgpu_pci_probe+0x109/0x1f0 [amdgpu]
>>> [  103.625565]  ? amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu]
>>> [  103.625580]  local_pci_probe+0x74/0xd0
>>> [  103.625603]  pci_device_probe+0x1fa/0x310
>>> [  103.625620]  ? pci_device_remove+0x1c0/0x1c0
>>> [  103.625640]  ? sysfs_do_create_link_sd.isra.2+0x74/0xe0
>>> [  103.625673]  really_probe+0x367/0x5d0
>>> [  103.625700]  driver_probe_device+0x177/0x1b0
>>> [  103.625721]  device_driver_attach+0x8a/0x90
>>> [  103.625737]  ? device_driver_attach+0x90/0x90
>>> [  103.625746]  __driver_attach+0xeb/0x190
>>> [  103.625765]  ? device_driver_attach+0x90/0x90
>>> [  103.625773]  bus_for_each_dev+0xe4/0x160
>>> [  103.625789]  ? subsys_dev_iter_exit+0x10/0x10
>>> [  103.625829]  bus_add_driver+0x277/0x330
>>> [  103.625855]  driver_register+0xc6/0x1a0
>>> [  103.625866]  ? 0xffffffffa0d88000
>>> [  103.625880]  do_one_initcall+0xd3/0x334
>>> [  103.625895]  ? trace_event_raw_event_initcall_finish+0x150/0x150
>>> [  103.625911]  ? kasan_unpoison_shadow+0x31/0x40
>>> [  103.625924]  ? __kasan_kmalloc+0xd5/0xf0
>>> [  103.625946]  ? kmem_cache_alloc_trace+0x154/0x300
>>> [  103.625955]  ? kasan_unpoison_shadow+0x31/0x40
>>> [  103.625985]  do_init_module+0xec/0x354
>>> [  103.626011]  load_module+0x3c91/0x4980
>>> [  103.626118]  ? module_frob_arch_sections+0x20/0x20
>>> [  103.626132]  ? ima_read_file+0x10/0x10
>>> [  103.626142]  ? vfs_read+0x127/0x190
>>> [  103.626163]  ? kernel_read+0x95/0xb0
>>> [  103.626187]  ? kernel_read_file+0x1a5/0x340
>>> [  103.626277]  ? __do_sys_finit_module+0x175/0x1b0
>>> [  103.626287]  __do_sys_finit_module+0x175/0x1b0
>>> [  103.626301]  ? __ia32_sys_init_module+0x40/0x40
>>> [  103.626338]  ? lock_downgrade+0x390/0x390
>>> [  103.626396]  ? vtime_user_exit+0xc8/0xe0
>>> [  103.626423]  do_syscall_64+0x7d/0x250
>>> [  103.626440]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [  103.626450] RIP: 0033:0x7f09984854d9
>>> [  103.626461] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
>>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01 48
>>> [  103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX:
>>> 0000000000000139
>>> [  103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX:
>>> 00007f09984854d9
>>> [  103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI:
>>> 0000000000000006
>>> [  103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09:
>>> 0000000000000000
>>> [  103.626500] R10: 0000000000000006 R11: 0000000000000246 R12:
>>> 0000000000000000
>>> [  103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15:
>>> 0000000000000013
>>>
>>> [  103.626592] The buggy address belongs to the page:
>>> [  103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0
>>> mapping:0000000000000000 index:0x0
>>> [  103.626675] flags: 0x2ffff0000000000()
>>> [  103.626686] raw: 02ffff0000000000 0000000000000000 ffffea000f2c6788
>>> 0000000000000000
>>> [  103.626696] raw: 0000000000000000 0000000000000000 00000000ffffffff
>>> 0000000000000000
>>> [  103.626702] page dumped because: kasan: bad access detected
>>>
>>> [  103.626742] addr ffff8883cb19ee38 is located in stack of task
>>> modprobe/1122 at offset 264 in frame:
>>> [  103.627233]  kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]
>>>
>>> [  103.627346] this frame has 3 objects:
>>> [  103.627405]  [32, 36) 'avail_size'
>>> [  103.627410]  [96, 120) 'local_mem_info'
>>> [  103.627466]  [160, 264) 'cu_info'
>>>
>>> [  103.627602] Memory state around the buggy address:
>>> [  103.627675]  ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04 f4 f4
>>> f4 f2 f2
>>> [  103.627780]  ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00 00
>>> 00 00 00
>>> [  103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3
>>> f3 00 00
>>> [  103.627989]                                         ^
>>> [  103.628065]  ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00
>>> [  103.628169]  ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00
>>> 00 00 00
>>> [  103.628273]
>>> ==================================================================
>>>
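
As a reading aid for the report quoted above, here is a minimal Python sketch that redoes the arithmetic KASAN printed. Every number is copied from the log. The shadow-byte meanings are an assumption based on mm/kasan/kasan.h in kernels of this era (around v5.3), and the helper names are hypothetical, chosen only for illustration.

```python
# Numbers copied from the KASAN report quoted above.
BAD_ADDR = 0xFFFF8883CB19EE38          # "Read of size 4 at addr ..."
ACCESS_SIZE = 4
FRAME_OFFSET = 264                     # "at offset 264 in frame"
FRAME_OBJECTS = {                      # "this frame has 3 objects"
    "avail_size": (32, 36),
    "local_mem_info": (96, 120),
    "cu_info": (160, 264),
}
SHADOW_ROW_BASE = 0xFFFF8883CB19EE00   # the shadow row marked with '>'
SHADOW_ROW = "00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00"

# Assumption: shadow encodings as in mm/kasan/kasan.h around v5.3.
SHADOW_MEANING = {
    0x00: "all 8 bytes addressable",
    0xF1: "stack left redzone",
    0xF2: "stack mid redzone",
    0xF3: "stack right redzone",
    0xF4: "stack partial redzone",
}
SHADOW_GRANULE = 8  # one shadow byte describes 8 bytes of memory


def frame_base(addr, offset):
    """Address that corresponds to offset 0 of the reported stack frame."""
    return addr - offset


def locate(offset, objects):
    """Say where a frame offset falls relative to the listed objects.

    Returns (name, "inside", bytes_from_start) when the offset is within an
    object, otherwise (name, "past_end", bytes_past_end) for the closest
    object that ends at or before the offset.
    """
    for name, (start, end) in objects.items():
        if start <= offset < end:
            return name, "inside", offset - start
    below = [(end, name) for name, (start, end) in objects.items() if end <= offset]
    if not below:
        return None, "before_first_object", offset
    end, name = max(below)
    return name, "past_end", offset - end


def shadow_byte(addr, row_base, row_text):
    """Pick the shadow byte that covers addr out of one dumped shadow row."""
    values = [int(tok, 16) for tok in row_text.split()]
    index = (addr - row_base) // SHADOW_GRANULE
    return index, values[index]


if __name__ == "__main__":
    name, relation, distance = locate(FRAME_OFFSET, FRAME_OBJECTS)
    index, value = shadow_byte(BAD_ADDR, SHADOW_ROW_BASE, SHADOW_ROW)
    print(f"frame base : {frame_base(BAD_ADDR, FRAME_OFFSET):#x}")
    print(f"access     : {ACCESS_SIZE} bytes, {relation} '{name}' by {distance} byte(s)")
    print(f"shadow byte: index {index} = {value:#04x} ({SHADOW_MEANING.get(value, 'unknown')})")
```

With the values from this report, the read starts 0 bytes past the end of 'cu_info' (offset 264 is its exclusive upper bound), and the shadow byte covering the address is f4.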
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: Stack out of bounds in KFD on Arcturus
       [not found]             ` <134de413-61fe-a6ee-96ac-73b694fcb94c-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-18 21:31               ` Zeng, Oak
       [not found]                 ` <BL0PR12MB25806E425A051EA059C805EF806C0-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Zeng, Oak @ 2019-10-18 21:31 UTC (permalink / raw)
  To: Kuehling, Felix, Grodzovsky, Andrey
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

[-- Attachment #1: Type: text/plain, Size: 9702 bytes --]

Hi Andrey, 

What is your system configuration? I haven't seen this issue before. Also see the attached QA configuration; you can compare it with yours to spot any differences.

Also, I believe the default kernel stack size on x86-64 is 16 KB. Is that what your Kconfig uses?
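
On the stack-size question, a rough sketch of how x86-64 appears to derive the kernel stack size in arch/x86/include/asm/page_64_types.h: THREAD_SIZE is PAGE_SIZE shifted left by THREAD_SIZE_ORDER, and THREAD_SIZE_ORDER is 2 plus one extra order when CONFIG_KASAN is set. The Python below only mirrors that arithmetic. The function name and the 4 KB page size default are assumptions for illustration, so please check against the tree actually being built.

```python
PAGE_SIZE = 4096  # assumption: 4 KB base pages, the usual x86-64 setting


def thread_size(kasan_enabled, page_size=PAGE_SIZE):
    """Kernel stack size in bytes: PAGE_SIZE << (2 + KASAN_STACK_ORDER)."""
    kasan_stack_order = 1 if kasan_enabled else 0
    thread_size_order = 2 + kasan_stack_order
    return page_size << thread_size_order


if __name__ == "__main__":
    for kasan in (False, True):
        flag = "y" if kasan else "n"
        print(f"CONFIG_KASAN={flag}: {thread_size(kasan) // 1024} KB")
```

Under these assumptions 16 KB is the non-KASAN default, and a kernel that prints KASAN reports like the one in this thread would be using 32 KB stacks.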

Regards,
Oak

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Kuehling, Felix
Sent: Friday, October 18, 2019 4:55 PM
To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: Stack out of bounds in KFD on Arcturus

On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
> Not that I am aware of. Is there a special Kconfig flag that determines
> the stack size?

I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.

I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.

Thanks,
   Felix


>
> Andrey
>
> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>> I don't see why this problem would be specific to Arcturus. I don't 
>> see any excessive allocations on the stack either. Also the code 
>> involved here hasn't changed recently.
>>
>> Are you using some weird kernel config with a smaller stack? Is it 
>> specific to a compiler version or some optimization flags? I've 
>> sometimes seen function inlining cause excessive stack usage.
>>
>> Regards,
>>      Felix
>>
>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>> Hi Felix - I see this on boot when working with Arcturus.
>>>
>>> Andrey
>>>
>>>

[-- Attachment #2: Type: message/rfc822, Size: 212414 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 12150 bytes --]


MI100 HW Enablement Linux SW Stack(VBIOS, FW, DKMS Kernel Driver) Integration Test Report<http://confluence.amd.com/x/c9iEC>
Dashboard

Hardware info

AMDGPU Linux Stack

Linux Distro

Status

SUT-1 Configuration:

  *   Motherboard: ASUS PRIME Z270-A
  *   CPU: i7-7700K CPU @ 4.20GHz
  *   Memory: Kingston DDR4 2133 8GB *2
  *   ASIC: MI100 socket PA Non-Secure board revB 102-D34101-01

  *   VBIOS:
     *   10 Oct 2019 D3410100.019<http://storeiis2/BIOSTest/SignedBIOS/G0484/484666/D3410100.019>
  *   ROCm DKMS Package:
     *   Firmware: http://git.amd.com:8080/plugins/gitiles/brahma/ec/utility/brahma-utils/+log/amd-staging
        *   commit: 18bb9059 firmware/arcturus: update rlc firmware
        *   version: RLC: 21.1, MEC: 33.45, SMC: 54.7, SDMA: 34.44, SOS: 0x0017002a; ASD: 0x21000018; XGMI TA: 0x20000003; RAS TA: 1B00000C
     *   Kernel: http://git.amd.com:8080/plugins/gitiles/brahma/ec/linux/+log/amd-mainline-dkms-5.0
        *   commit: 6b05d1f005c0 drm/amdgpu/swSMU: custom UMD pstate peak clock for navi14
     *   amdgpu-dkms package: amdgpu-dkms_1910121037-6b05d1f005c0_all.deb<http://srdcartifactory/artifactory/api/download/linux-ci-generic-local/builds/canli/secure/amdgpu-dkms_1910121037-6b05d1f005c0_all.deb>
  *   ROCm LKG build for UMD:
     *   20 Sep 2019 http://rocm-ci/job/compute-rocm-dkms-no-npi/1004/

Ubuntu 18.04.3 LTS

PROMOTABLE

SUT-2 Configuration:

  *   Motherboard: Supermicro X10DRG-OT (SYS-4028GR-TRT2)
  *   CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
  *   Memory: Micron DDR4 2667 MT/s 64GB *12
  *   ASIC: MI100 102-D34302-00 PCIe Product Board 32GB (U/F) Non-Secure board XGMI 2P

  *   VBIOS:
     *   10 Oct 2019 D3430200.L19<http://storeiis2/BIOSTest/SignedBIOS/G0484/484668/D3430200.L19> (enable flag: ENABLE_SEC_POLICY_ON_UNSEC_ASIC and Large Bar)
  *   ROCm DKMS Package:
     *   Firmware: http://git.amd.com:8080/plugins/gitiles/brahma/ec/utility/brahma-utils/+log/amd-staging
        *   commit: 18bb9059 firmware/arcturus: update rlc firmware
        *   version: RLC: 21.1, MEC: 33.45, SMC: 54.7, SDMA: 34.44, SOS: 0x0017002a; ASD: 0x21000018; XGMI TA: 0x20000003; RAS TA: 1B00000C
     *   Kernel: http://git.amd.com:8080/plugins/gitiles/brahma/ec/linux/+log/amd-mainline-dkms-5.0
        *   commit: 6b05d1f005c0 drm/amdgpu/swSMU: custom UMD pstate peak clock for navi14
     *   amdgpu-dkms package: amdgpu-dkms_1910121037-6b05d1f005c0_all.deb<http://srdcartifactory/artifactory/api/download/linux-ci-generic-local/builds/canli/secure/amdgpu-dkms_1910121037-6b05d1f005c0_all.deb>
  *   ROCm LKG build for UMD:
     *   20 Sep 2019 http://rocm-ci/job/compute-rocm-dkms-no-npi/1004/
Ubuntu 18.04.3 LTS

PROMOTABLE

SUT-3 Configuration:

  *   Motherboard: Supermicro X10DRG-Q (SYS-7048GR-TR)
  *   CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
  *   Memory: Micron DDR4 2133 MT/s 16GB *7
  *   ASIC: MI100 102-D34302-00 PCIe Product Board 32GB (U/F) Non-Secure board *2 non-XGMI

Reference:

MI100 VBIOS: http://home.amd.com/VideoBios/Video%20BIOS%20Releases/SingleASICRelease.asp?AsicName=MI100

ROCm build for MI100: http://rocm-ci/job/compute-rocm-dkms-no-npi/

How to replace kernel driver and FWs: How to install and replace kernel driver and FWs for MI100<http://confluence.amd.com/display/~canli/How+to+install+and+replace+kernel+driver+and+FWs+for+MI100>

Executive Summary
What's Current and New?

  *   Outstanding issues:
     *   Issue can be observed with VBIOS L18 on XGMI 2P but not on non-XGMI
        *   SWDEV-207030<http://ontrack-internal.amd.com/browse/SWDEV-207030> - [MI100] kfdtest subtests failed on XGMI 2P with large bar enabled (Opened)
  *   Existing issues:
     *   SWDEV-204604<http://ontrack-internal.amd.com/browse/SWDEV-204604> - [MI100 XGMI] UCLK/SOCCLK/FCLK DPM are still disabled with XGMI enabled (Opened)
     *   SWDEV-201443<http://ontrack-internal.amd.com/browse/SWDEV-201443> - Linux Pro: KFDMemoryTest.BigBufferStressTest fails (Assessed)
  *   VBIOS upgraded to v19
  *   RLC FW upgrade to 21.1, SOS FW upgrade to SOS: 0x0017002a
  *   Power Feature enablement status

Feature                         SMU FW Ready     AMDGPU Kernel Ready
------------------------------  ---------------  -------------------
DPM_PREFETCHER                  Yes              Yes
DPM_GFXCLK                      Yes              Yes
DPM_UCLK                        Yes              Checking on driver side, SWDEV-204604<http://ontrack-internal.amd.com/browse/SWDEV-204604> Opened
DPM_SOCCLK                      Yes              Checking on driver side, SWDEV-204604<http://ontrack-internal.amd.com/browse/SWDEV-204604> Opened
DPM_FCLK                        Yes              Checking on driver side, SWDEV-204604<http://ontrack-internal.amd.com/browse/SWDEV-204604> Opened
DPM_XGMI                        No               No
DS_GFXCLK                       Yes              Yes
DS_SOCCLK                       Yes              Yes
DS_LCLK                         Yes              Yes
                                                 Require ASPM L1 support in Driver and M/B (Under discussion)
DS_FCLK                         Yes              Yes
GFX_ULV                         Yes              Yes
DPM_VCN                         Yes              VCN disabled for PSP front door loading due to the issue: SWDEV-203022<http://ontrack-internal.amd.com/browse/SWDEV-203022> Assessed
RSMU_SMN_CG                     Yes              Yes
WAFL_CG                         No               No
PPT                             Yes              Yes
                                                 Depends on PPTable setting to enable 4 PPT (PPTable not ready) or 1 PPT
TDC                             Yes              Yes
APCC_PLUS                       Yes              Pending on pptable release
VR0HOT                          Yes              Yes
VR1HOT                          No               No
FW_CTF                          Yes              Yes
FAN CONTROL                     Not POR          N/A
THERMAL CONTROL                 Yes              Yes
OUT_OF_BAND_MONITOR             Yes              Yes
TEMP_DEPENDENT_VMIN             Yes              Pending on pptable release
GFX CG                          NOT SMU feature  Yes
HDP CG                          NOT SMU feature  Yes
SDMA CG                         NOT SMU feature  Yes
MMHUB CG                        NOT SMU feature  Yes
UMC CG                          NOT SMU feature  Yes
DF CG                           NOT SMU feature  Yes
ATHUB CG                        NOT SMU feature  Yes
PSP CG                          NOT SMU feature  Checking the readiness
User Mode Stable Power State    NOT SMU feature  Yes
Workload Aware Dynamic Power    Yes              Yes
  Management / User Power Control

Test Coverage

Test case                       MI100 GPU  MI100 mGPU          MI100 mGPU
                                (D34101)   (D34302*2 XGMI 2P)  (D34302*2 non-XGMI)
------------------------------  ---------  ------------------  -------------------
Base
amdgpu_test
  Basic Tests                   PASS       PASS                PASS
  BO Tests                      PASS       PASS                PASS
  VCN Tests                     N/A        N/A                 N/A
      Comments: Skip VCN Test due to Skip VCN IP initialization after switch to FW front door loading.
                SWDEV-203022<http://ontrack-internal.amd.com/browse/SWDEV-203022> Assessed
  VM Tests                      PASS       PASS                PASS
Power
  GFX DPM check                 PASS       PASS                PASS
  Force GFX DPM level check     PASS       PASS                PASS
  GFX ULV check                 PASS       PASS                PASS
  DS GFXCLK check               PASS       PASS                PASS
  DS SOCCLK check               PASS       FAIL                PASS
      Comments: SWDEV-204604<http://ontrack-internal.amd.com/browse/SWDEV-204604> Opened
  DS FCLK check                 PASS       FAIL                PASS
      Comments: SWDEV-204604<http://ontrack-internal.amd.com/browse/SWDEV-204604> Opened
ROCr/KFD
  rocm_info                     PASS       PASS                PASS
  kfdtest                       PASS       FAIL                PASS
      Comments:
  *   KFDPerformanceTest.P2PBandWidthTest and KFDGraphicsInterop.RegisterForeignDeviceMem tests failed via XGMI on Large bar enabled
     *   SWDEV-207030<http://ontrack-internal.amd.com/browse/SWDEV-207030> Opened
  *   Existing issue with large size system memory
     *   SWDEV-201443<http://ontrack-internal.amd.com/browse/SWDEV-201443> Assessed
  rocrtst                       PASS       PASS                PASS
  rocm_bandwidth_test           PASS       PASS                PASS
      Comments:
  *   Using RBT built in rocm no-npi-dkms build#1060 to verify the data path passed.
  *   Bad performance via XGMI
  rocm-smi                      PASS       PASS                PASS
  rsmitst                       PASS       PASS                PASS
OCL
  ocltst                        PASS       PASS                PASS
HIP
  hipsamples_utils              PASS       PASS                PASS
Frameworks
Tensorflow
  tf_convolutional_quick_test   PASS       PASS                PASS
Pytorch unit test
  test_autograd                 PASS       PASS                PASS
  test_nn                       PASS       PASS                PASS
MIOpen unit test
  MIOpen (HIP)                  PASS       PASS                PASS
  MIOpen(OpenCL)                PASS       PASS                PASS
Math libs
  rocBLAS                       PASS       PASS                PASS
      Comments: Run quick tests only
  hipBLAS                       PASS       PASS                PASS

Additional Information

Note: All tests run with latest VBIOS/FW/Kernel and ROCm LKG build

Defect list

Key:               SWDEV-207030<http://ontrack-internal.amd.com/browse/SWDEV-207030?src=confmacro>
Summary:           [MI100] kfdtest subtests failed on XGMI 2P with large bar enabled
Triage assignment: VBIOS
Target SW release: -
Assignee:          Tao, Cherry

Key:               SWDEV-204604<http://ontrack-internal.amd.com/browse/SWDEV-204604?src=confmacro>
Summary:           [MI100 XGMI] UCLK/SOCCLK/FCLK DPM are still disabled with XGMI enabled
Triage assignment: Base dGPU Enablement
Target SW release: -
Assignee:          Quan, Evan

Key:               SWDEV-203022<http://ontrack-internal.amd.com/browse/SWDEV-203022?src=confmacro>
Summary:           MI100 VCN engine hangs after FW loading with PSP
Triage assignment: Multimedia
Target SW release: Staging-DRM-Next
Assignee:          Zhu, James

Key:               SWDEV-202188<http://ontrack-internal.amd.com/browse/SWDEV-202188?src=confmacro>
Summary:           [MI100] HSA_STATUS_ERROR_OUT_OF_RESOURCES when run rocminfo on Gigabyte Eypc platform
Triage assignment: HSA KFD
Target SW release: -
Assignee:          Keely, Sean

Key:               SWDEV-201817<http://ontrack-internal.amd.com/browse/SWDEV-201817?src=confmacro>
Summary:           [MI100] rocrtst test failed on Gigabyte Eypc platform
Triage assignment: Runtime
Target SW release: -
Assignee:          Keely, Sean

Key:               SWDEV-200753<http://ontrack-internal.amd.com/browse/SWDEV-200753?src=confmacro>
Summary:           [ROCm QA][no-npi-dkms][MI100] XGMI Links not working with 4P/2P
Triage assignment: Base
Target SW release: ROC-Master
Assignee:          Clements, John



BCC: Rose, Danny <Danny.Rose@amd.com<mailto:Danny.Rose@amd.com>>; dl.MLSE.QA <dl.MLSE.QA@amd.com<mailto:dl.MLSE.QA@amd.com>>; Weyman, Jeff <Jeffrey.Weyman@amd.com<mailto:Jeffrey.Weyman@amd.com>>; Fan, Fai <Fai.Fan@amd.com<mailto:Fai.Fan@amd.com>>; Marsan, Luugi <Luugi.Marsan@amd.com<mailto:Luugi.Marsan@amd.com>>; sw.dl.ERP.LuugiM <sw.dl.ERP.LuugiM@amd.com<mailto:sw.dl.ERP.LuugiM@amd.com>>; dl.srdc_lnx_mi100 <dl.srdc_lnx_mi100@amd.com<mailto:dl.srdc_lnx_mi100@amd.com>>; Tim Writer <Tim.Writer@amd.com<mailto:Tim.Writer@amd.com>>; dl.SRDC_SW_Linux_dev dl.SRDC_SW_Linux_dev@amd.com<mailto:dl.SRDC_SW_Linux_dev@amd.com>; Guo, Miaomiao <Miaomiao.Guo@amd.com<http://amd.com>>; Yao, Yoyo <Yoyo.Yao@amd.com<http://amd.com>>; Jain, Praveen <Praveen.Jain@amd.com<http://amd.com>>; Arora, Jitesh <Jitesh.Arora@amd.com<mailto:Arora@amd.com>>; Zhu, James <James.Zhu@amd.com<http://amd.com>>; Bridgman, John <John.Bridgman@amd.com<http://amd.com>>; Islam, Jamin <Jamin.Islam@amd.com<http://amd.com>>; Koohestani, Ehsan <Ehsan.Koohestani@amd.com<http://amd.com>>; Wang, Cloud <Cloud.Wang@amd.com<http://amd.com>>; Gong, Yakov <Yakov.Gong@amd.com<http://amd.com>>; Yang, Alice (SRDC 3D) <Alice1.Yang@amd.com<mailto:Alice1.Yang@amd.com>>; Ma, Sigil <Sigil.Ma@amd.com<http://amd.com>>; Li, Colin <Colin.Li@amd.com<http://amd.com>>; Tang, Moon <Moon.Tang@amd.com<http://amd.com>>; Khan, Irfan <Irfan.Khan@amd.com<http://amd.com>>; Nasim, Kam <Kam.Nasim@amd.com<http://amd.com>>; Shavakh, Shadi <Shadi.Shavakh@amd.com<http://amd.com>>; Lotfi, Khatereh <Khatereh.Lotfi@amd.com<http://amd.com>>; Feng, Haifeng <Haifeng.Feng@amd.com<http://amd.com>>; Liang, Ming <Ming.Liang@amd.com<http://amd.com>>; "Min.Xu2@amd.com<http://amd.com>";dl.MI100_CTA <dl.MI100_CTA@amd.com<http://amd.com>>; Chen, Joe <Joe.Chen@amd.com<http://amd.com>>




Thanks,
Candice Li

[-- Attachment #2.1.2: Type: text/html, Size: 142872 bytes --]


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Stack out of bounds in KFD on Arcturus
       [not found]                 ` <BL0PR12MB25806E425A051EA059C805EF806C0-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2019-10-22 16:48                   ` Grodzovsky, Andrey
       [not found]                     ` <f865ffcd-2be0-0135-ba78-f78b370aa1fd-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Grodzovsky, Andrey @ 2019-10-22 16:48 UTC (permalink / raw)
  To: Zeng, Oak, Kuehling, Felix; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

[-- Attachment #1: Type: text/plain, Size: 9959 bytes --]

On 10/18/19 5:31 PM, Zeng, Oak wrote:

> Hi Andrey,
>
> What is your system configuration? I haven't seen this issue before. Also see the attached QA configuration; you can compare it with yours to spot any differences.


Attached is my lshw

>
> Also, I believe the default kernel stack size on x86-64 is 16 KB. Is that what your Kconfig uses?


What do you mean by asking whether this is my Kconfig? Is there a particular
Kconfig flag you know of that I can look for?

Andrey


>
> Regards,
> Oak
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Kuehling, Felix
> Sent: Friday, October 18, 2019 4:55 PM
> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD on Arcturus
>
> On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
>> Not that I am aware of. Is there a special Kconfig flag that determines
>> the stack size?
> I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.
>
> I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.
>
> Thanks,
>     Felix
>
>
>> Andrey
>>
>> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>>> I don't see why this problem would be specific to Arcturus. I don't
>>> see any excessive allocations on the stack either. Also the code
>>> involved here hasn't changed recently.
>>>
>>> Are you using some weird kernel config with a smaller stack? Is it
>>> specific to a compiler version or some optimization flags? I've
>>> sometimes seen function inlining cause excessive stack usage.
>>>
>>> Regards,
>>>       Felix
>>>
>>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>>> Hi Felix - I see this on boot when working with Arcturus.
>>>>
>>>> Andrey
>>>>
>>>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[-- Attachment #2: lshw --]
[-- Type: text/plain, Size: 20351 bytes --]

dal@ubuntu-1604-test:~$ sudo lshw
[sudo] password for dal: 
ubuntu-1604-test          
    description: Desktop Computer
    product: System Product Name (SKU)
    vendor: System manufacturer
    version: System Version
    serial: System Serial Number
    width: 64 bits
    capabilities: smbios-3.0 dmi-3.0 vsyscall32
    configuration: boot=normal chassis=desktop family=To be filled by O.E.M. sku=SKU uuid=204CDE28-DAD7-DD11-B0DC-38D54727F70C
  *-core
       description: Motherboard
       product: Z170-PRO
       vendor: ASUSTeK COMPUTER INC.
       physical id: 0
       version: Rev 1.xx
       serial: 160879880901004
       slot: Default string
     *-firmware
          description: BIOS
          vendor: American Megatrends Inc.
          physical id: 0
          version: 1902
          date: 06/27/2016
          size: 64KiB
          capacity: 15MiB
          capabilities: pci apm upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
     *-cache:0
          description: L1 cache
          physical id: 41
          slot: L1 Cache
          size: 128KiB
          capacity: 128KiB
          capabilities: synchronous internal write-back data
          configuration: level=1
     *-cache:1
          description: L1 cache
          physical id: 42
          slot: L1 Cache
          size: 128KiB
          capacity: 128KiB
          capabilities: synchronous internal write-back instruction
          configuration: level=1
     *-cache:2
          description: L2 cache
          physical id: 43
          slot: L2 Cache
          size: 1MiB
          capacity: 1MiB
          capabilities: synchronous internal write-back unified
          configuration: level=2
     *-cache:3
          description: L3 cache
          physical id: 44
          slot: L3 Cache
          size: 8MiB
          capacity: 8MiB
          capabilities: synchronous internal write-back unified
          configuration: level=3
     *-cpu
          description: CPU
          product: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
          vendor: Intel Corp.
          physical id: 45
          bus info: cpu@0
          version: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
          serial: To Be Filled By O.E.M.
          slot: LGA1151
          size: 3907MHz
          capacity: 4200MHz
          width: 64 bits
          clock: 100MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp cpufreq
          configuration: cores=4 enabledcores=4 threads=8
     *-memory
          description: System Memory
          physical id: 46
          slot: System board or motherboard
          size: 16GiB
        *-bank:0
             description: [empty]
             physical id: 0
             slot: ChannelA-DIMM1
        *-bank:1
             description: DIMM Synchronous 2133 MHz (0.5 ns)
             product: CMK16GX4M2B3000C15
             vendor: Corsair
             physical id: 1
             serial: 00000000
             slot: ChannelA-DIMM2
             size: 8GiB
             width: 64 bits
             clock: 2133MHz (0.5ns)
        *-bank:2
             description: [empty]
             physical id: 2
             slot: ChannelB-DIMM1
        *-bank:3
             description: DIMM Synchronous 2133 MHz (0.5 ns)
             product: CMK16GX4M2B3000C15
             vendor: Corsair
             physical id: 3
             serial: 00000000
             slot: ChannelB-DIMM2
             size: 8GiB
             width: 64 bits
             clock: 2133MHz (0.5ns)
     *-pci
          description: Host bridge
          product: Sky Lake Host Bridge/DRAM Registers
          vendor: Intel Corporation
          physical id: 100
          bus info: pci@0000:00:00.0
          version: 07
          width: 32 bits
          clock: 33MHz
          configuration: driver=skl_uncore
          resources: irq:0
        *-pci:0
             description: PCI bridge
             product: Sky Lake PCIe Controller (x16)
             vendor: Intel Corporation
             physical id: 1
             bus info: pci@0000:00:01.0
             version: 07
             width: 32 bits
             clock: 33MHz
             capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:120 ioport:e000(size=4096) memory:df000000-df1fffff ioport:c0000000(size=270532608)
           *-pci
                description: PCI bridge
                product: Advanced Micro Devices, Inc. [AMD/ATI]
                vendor: Advanced Micro Devices, Inc. [AMD/ATI]
                physical id: 0
                bus info: pci@0000:01:00.0
                version: 00
                width: 32 bits
                clock: 33MHz
                capabilities: pci pm pciexpress msi normal_decode bus_master cap_list
                configuration: driver=pcieport
                resources: irq:16 memory:df100000-df103fff ioport:e000(size=4096) memory:df000000-df0fffff ioport:c0000000(size=270532608)
              *-pci
                   description: PCI bridge
                   product: Advanced Micro Devices, Inc. [AMD/ATI]
                   vendor: Advanced Micro Devices, Inc. [AMD/ATI]
                   physical id: 0
                   bus info: pci@0000:02:00.0
                   version: 00
                   width: 32 bits
                   clock: 33MHz
                   capabilities: pci pm pciexpress msi normal_decode bus_master cap_list
                   configuration: driver=pcieport
                   resources: irq:124 ioport:e000(size=4096) memory:df000000-df0fffff ioport:c0000000(size=270532608)
                 *-display UNCLAIMED
                      description: Display controller
                      product: Advanced Micro Devices, Inc. [AMD/ATI]
                      vendor: Advanced Micro Devices, Inc. [AMD/ATI]
                      physical id: 0
                      bus info: pci@0000:03:00.0
                      version: 00
                      width: 64 bits
                      clock: 33MHz
                      capabilities: pm pciexpress msi bus_master cap_list
                      configuration: latency=0
                      resources: memory:c0000000-cfffffff memory:d0000000-d01fffff ioport:e000(size=256) memory:df000000-df07ffff memory:df080000-df09ffff
        *-display
             description: VGA compatible controller
             product: Sky Lake Integrated Graphics
             vendor: Intel Corporation
             physical id: 2
             bus info: pci@0000:00:02.0
             version: 06
             width: 64 bits
             clock: 33MHz
             capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
             configuration: driver=i915 latency=0
             resources: irq:133 memory:de000000-deffffff memory:b0000000-bfffffff ioport:f000(size=64) memory:c0000-dffff
        *-usb
             description: USB controller
             product: Sunrise Point-H USB 3.0 xHCI Controller
             vendor: Intel Corporation
             physical id: 14
             bus info: pci@0000:00:14.0
             version: 31
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi xhci bus_master cap_list
             configuration: driver=xhci_hcd latency=0
             resources: irq:129 memory:df330000-df33ffff
           *-usbhost:0
                product: xHCI Host Controller
                vendor: Linux 5.3.0-rc3+ xhci-hcd
                physical id: 0
                bus info: usb@1
                logical name: usb1
                version: 5.03
                capabilities: usb-2.00
                configuration: driver=hub slots=16 speed=480Mbit/s
           *-usbhost:1
                product: xHCI Host Controller
                vendor: Linux 5.3.0-rc3+ xhci-hcd
                physical id: 1
                bus info: usb@2
                logical name: usb2
                version: 5.03
                capabilities: usb-3.00
                configuration: driver=hub slots=10 speed=5000Mbit/s
        *-communication
             description: Communication controller
             product: Sunrise Point-H CSME HECI #1
             vendor: Intel Corporation
             physical id: 16
             bus info: pci@0000:00:16.0
             version: 31
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi bus_master cap_list
             configuration: driver=mei_me latency=0
             resources: irq:134 memory:df34d000-df34dfff
        *-storage
             description: SATA controller
             product: Sunrise Point-H SATA controller [AHCI mode]
             vendor: Intel Corporation
             physical id: 17
             bus info: pci@0000:00:17.0
             version: 31
             width: 32 bits
             clock: 66MHz
             capabilities: storage msi pm ahci_1.0 bus_master cap_list
             configuration: driver=ahci latency=0
             resources: irq:132 memory:df348000-df349fff memory:df34c000-df34c0ff ioport:f090(size=8) ioport:f080(size=4) ioport:f060(size=32) memory:df34b000-df34b7ff
        *-pci:1
             description: PCI bridge
             product: Sunrise Point-H PCI Root Port #17
             vendor: Intel Corporation
             physical id: 1b
             bus info: pci@0000:00:1b.0
             version: f1
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:121 ioport:2000(size=4096) memory:7b000000-7b1fffff ioport:7b200000(size=2097152)
        *-pci:2
             description: PCI bridge
             product: Sunrise Point-H PCI Express Root Port #1
             vendor: Intel Corporation
             physical id: 1c
             bus info: pci@0000:00:1c.0
             version: f1
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:122 ioport:3000(size=8192) memory:df200000-df2fffff ioport:7b400000(size=6291456)
           *-pci
                description: PCI bridge
                product: Intel Corporation
                vendor: Intel Corporation
                physical id: 0
                bus info: pci@0000:05:00.0
                version: 00
                width: 32 bits
                clock: 33MHz
                capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
                configuration: driver=pcieport
                resources: irq:16 ioport:3000(size=8192) memory:df200000-df2fffff ioport:7b400000(size=6291456)
              *-pci:0
                   description: PCI bridge
                   product: Intel Corporation
                   vendor: Intel Corporation
                   physical id: 0
                   bus info: pci@0000:06:00.0
                   version: 00
                   width: 32 bits
                   clock: 33MHz
                   capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
                   configuration: driver=pcieport
                   resources: irq:125
              *-pci:1
                   description: PCI bridge
                   product: Intel Corporation
                   vendor: Intel Corporation
                   physical id: 1
                   bus info: pci@0000:06:01.0
                   version: 00
                   width: 32 bits
                   clock: 33MHz
                   capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
                   configuration: driver=pcieport
                   resources: irq:126 ioport:3000(size=4096) ioport:7b400000(size=2097152)
              *-pci:2
                   description: PCI bridge
                   product: Intel Corporation
                   vendor: Intel Corporation
                   physical id: 2
                   bus info: pci@0000:06:02.0
                   version: 00
                   width: 32 bits
                   clock: 33MHz
                   capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
                   configuration: driver=pcieport
                   resources: irq:127 memory:df200000-df2fffff
                 *-usb
                      description: USB controller
                      product: Intel Corporation
                      vendor: Intel Corporation
                      physical id: 0
                      bus info: pci@0000:09:00.0
                      version: 00
                      width: 32 bits
                      clock: 33MHz
                      capabilities: pm msi pciexpress xhci cap_list
                      configuration: driver=xhci_hcd latency=0
                      resources: irq:130 memory:df200000-df20ffff
                    *-usbhost:0
                         product: xHCI Host Controller
                         vendor: Linux 5.3.0-rc3+ xhci-hcd
                         physical id: 0
                         bus info: usb@3
                         logical name: usb3
                         version: 5.03
                         capabilities: usb-2.00
                         configuration: driver=hub slots=2 speed=480Mbit/s
                    *-usbhost:1
                         product: xHCI Host Controller
                         vendor: Linux 5.3.0-rc3+ xhci-hcd
                         physical id: 1
                         bus info: usb@4
                         logical name: usb4
                         version: 5.03
                         capabilities: usb-3.00
                         configuration: driver=hub slots=2 speed=5000Mbit/s
              *-pci:3
                   description: PCI bridge
                   product: Intel Corporation
                   vendor: Intel Corporation
                   physical id: 4
                   bus info: pci@0000:06:04.0
                   version: 00
                   width: 32 bits
                   clock: 33MHz
                   capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
                   configuration: driver=pcieport
                   resources: irq:128 ioport:4000(size=4096) ioport:7b600000(size=2097152)
        *-pci:3
             description: PCI bridge
             product: Sunrise Point-H PCI Express Root Port #9
             vendor: Intel Corporation
             physical id: 1d
             bus info: pci@0000:00:1d.0
             version: f1
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:123 ioport:5000(size=4096) memory:7ba00000-7bbfffff ioport:7bc00000(size=2097152)
        *-isa
             description: ISA bridge
             product: Sunrise Point-H LPC Controller
             vendor: Intel Corporation
             physical id: 1f
             bus info: pci@0000:00:1f.0
             version: 31
             width: 32 bits
             clock: 33MHz
             capabilities: isa bus_master
             configuration: latency=0
        *-memory UNCLAIMED
             description: Memory controller
             product: Sunrise Point-H PMC
             vendor: Intel Corporation
             physical id: 1f.2
             bus info: pci@0000:00:1f.2
             version: 31
             width: 32 bits
             clock: 33MHz (30.3ns)
             capabilities: bus_master
             configuration: latency=0
             resources: memory:df344000-df347fff
        *-multimedia
             description: Audio device
             product: Sunrise Point-H HD Audio
             vendor: Intel Corporation
             physical id: 1f.3
             bus info: pci@0000:00:1f.3
             version: 31
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi bus_master cap_list
             configuration: driver=snd_hda_intel latency=32
             resources: irq:135 memory:df340000-df343fff memory:df320000-df32ffff
        *-serial UNCLAIMED
             description: SMBus
             product: Sunrise Point-H SMBus
             vendor: Intel Corporation
             physical id: 1f.4
             bus info: pci@0000:00:1f.4
             version: 31
             width: 64 bits
             clock: 33MHz
             configuration: latency=0
             resources: memory:df34a000-df34a0ff ioport:f040(size=32)
        *-network
             description: Ethernet interface
             product: Ethernet Connection (2) I219-V
             vendor: Intel Corporation
             physical id: 1f.6
             bus info: pci@0000:00:1f.6
             logical name: enp0s31f6
             version: 31
             serial: 38:d5:47:27:f7:0c
             size: 1Gbit/s
             capacity: 1Gbit/s
             width: 32 bits
             clock: 33MHz
             capabilities: pm msi bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
             configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=3.2.6-k duplex=full firmware=0.7-4 ip=172.27.234.186 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
             resources: irq:131 memory:df300000-df31ffff
     *-scsi
          physical id: 1
          logical name: scsi4
          capabilities: emulated
        *-disk
             description: ATA Disk
             product: Samsung SSD 850
             physical id: 0.0.0
             bus info: scsi@4:0.0.0
             logical name: /dev/sda
             version: 2B6Q
             serial: S251NX0H703541J
             size: 238GiB (256GB)
             capabilities: partitioned partitioned:dos
             configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 signature=74397aa1
           *-volume:0
                description: EXT4 volume
                vendor: Linux
                physical id: 1
                bus info: scsi@4:0.0.0,1
                logical name: /dev/sda1
                logical name: /
                version: 1.0
                serial: 80cc92c9-bd8b-47f9-82b8-14d0a93b29f9
                size: 109GiB
                capacity: 109GiB
                capabilities: primary bootable journaled extended_attributes large_files huge_files dir_nlink extents ext4 ext2 initialized
                configuration: created=2016-03-17 11:35:41 filesystem=ext4 lastmountpoint=/ modified=2019-10-18 16:56:08 mount.fstype=ext4 mount.options=rw,relatime,errors=remount-ro mounted=2019-10-18 16:22:05 state=mounted
           *-volume:1
                description: Extended partition
                physical id: 2
                bus info: scsi@4:0.0.0,2
                logical name: /dev/sda2
                size: 2043MiB
                capacity: 2043MiB
                capabilities: primary extended partitioned partitioned:extended
              *-logicalvolume
                   description: Linux swap / Solaris partition
                   physical id: 5
                   logical name: /dev/sda5
                   capacity: 2043MiB
                   capabilities: nofs
  *-power UNCLAIMED
       description: To Be Filled By O.E.M.
       product: To Be Filled By O.E.M.
       vendor: To Be Filled By O.E.M.
       physical id: 1
       version: To Be Filled By O.E.M.
       serial: To Be Filled By O.E.M.
       capacity: 32768mWh



^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: Stack out of bounds in KFD on Arcturus
       [not found]                     ` <f865ffcd-2be0-0135-ba78-f78b370aa1fd-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-22 17:17                       ` Zeng, Oak
       [not found]                         ` <BL0PR12MB2580ED7FB1607624E3D884B280680-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Zeng, Oak @ 2019-10-22 17:17 UTC (permalink / raw)
  To: Grodzovsky, Andrey, Kuehling, Felix
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Sorry, I meant: is the kernel stack size 16KB in your Kconfig?

Oak

-----Original Message-----
From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com> 
Sent: Tuesday, October 22, 2019 12:49 PM
To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: Stack out of bounds in KFD on Arcturus

On 10/18/19 5:31 PM, Zeng, Oak wrote:

> Hi Andrey,
>
> What is your system configuration? I haven't seen this issue before. Also see the attached QA configuration; you can compare to see any differences.


Attached is my lshw output.

>
> Also, I believe the default kernel stack size for x86-64 is 16KB. Is that what your Kconfig has?


What do you mean by "is this your Kconfig"? Is there a particular Kconfig flag that I can look for?

Andrey


>
> Regards,
> Oak
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of 
> Kuehling, Felix
> Sent: Friday, October 18, 2019 4:55 PM
> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD on Arcturus
>
> On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
>> Not that I am aware of. Is there a special Kconfig flag that determines
>> the stack size?
> I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.
>
> I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.
>
> Thanks,
>     Felix
>
>
>> Andrey
>>
>> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>>> I don't see why this problem would be specific to Arcturus. I don't 
>>> see any excessive allocations on the stack either. Also the code 
>>> involved here hasn't changed recently.
>>>
>>> Are you using some weird kernel config with a smaller stack? Is it 
>>> specific to a compiler version or some optimization flags? I've 
>>> sometimes seen function inlining cause excessive stack usage.
>>>
>>> Regards,
>>>       Felix
>>>
>>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>>> Hi Felix - I see this on boot when working with Arcturus.
>>>>
>>>> Andrey
>>>>
>>>>
>>>> [  103.602092] kfd kfd: Allocated 3969056 bytes on gart [ 
>>>> 103.610769] 
>>>> ==================================================================
>>>> [  103.611469] BUG: KASAN: stack-out-of-bounds in
>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.611646] Read 
>>>> of size 4 at addr ffff8883cb19ee38 by task modprobe/1122
>>>>
>>>> [  103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G O 
>>>> 5.3.0-rc3+ #45 [  103.611847] Hardware name: System manufacturer 
>>>> System Product Name/Z170-PRO, BIOS 1902 06/27/2016 [  103.611856] 
>>>> Call Trace:
>>>> [  103.611879]  dump_stack+0x71/0xab [  103.611907]
>>>> print_address_description+0x1da/0x3c0
>>>> [  103.612453]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 
>>>> 103.612479]  __kasan_report+0x13f/0x1a0 [  103.613022]  ?
>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.613580]  ?
>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.613604]
>>>> kasan_report+0xe/0x20 [  103.614149]
>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.614762]  ?
>>>> kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu] [  103.614796]  ?
>>>> __alloc_pages_nodemask+0x2c9/0x560
>>>> [  103.614824]  ? __alloc_pages_slowpath+0x1390/0x1390
>>>> [  103.614898]  ? kmalloc_order+0x63/0x70 [  103.615469]
>>>> kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu] [  103.616054]  ?
>>>> kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu] [  103.616095]  ?
>>>> up_write+0x4b/0x70 [  103.616649]
>>>> kfd_topology_add_device+0x98d/0xb10 [amdgpu] [  103.617207]  ?
>>>> kfd_topology_shutdown+0x60/0x60 [amdgpu] [  103.617743]  ?
>>>> start_cpsch+0x2ff/0x3a0 [amdgpu] [  103.617777]  ?
>>>> mutex_lock_io_nested+0xac0/0xac0 [  103.617807]  ?
>>>> __mutex_unlock_slowpath+0xda/0x420
>>>> [  103.617848]  ? __mutex_unlock_slowpath+0xda/0x420
>>>> [  103.617877]  ? wait_for_completion+0x200/0x200 [  103.618461]  ?
>>>> start_cpsch+0x38b/0x3a0 [amdgpu] [  103.619011]  ?
>>>> create_queue_cpsch+0x670/0x670 [amdgpu] [  103.619573]  ?
>>>> kfd_iommu_device_init+0x92/0x1e0 [amdgpu] [  103.620112]  ?
>>>> kfd_iommu_resume+0x2c/0x2c0 [amdgpu] [  103.620655]  ?
>>>> kfd_iommu_check_device+0xf0/0xf0 [amdgpu] [  103.621228]
>>>> kgd2kfd_device_init+0x474/0x870 [amdgpu] [  103.621781]
>>>> amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu] [  103.622329]  ?
>>>> amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu] [  103.622344]  ?
>>>> kmsg_dump_rewind_nolock+0x59/0x59 [  103.622895]  ?
>>>> amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu] [  103.623424]
>>>> amdgpu_device_init+0x1bbe/0x2f00 [amdgpu] [  103.623819]  ?
>>>> amdgpu_device_has_dc_support+0x30/0x30 [amdgpu] [  103.623842]  ?
>>>> __isolate_free_page+0x290/0x290 [  103.623852]  ?
>>>> fs_reclaim_acquire.part.97+0x5/0x30
>>>> [  103.623891]  ? __alloc_pages_nodemask+0x2c9/0x560
>>>> [  103.623912]  ? __alloc_pages_slowpath+0x1390/0x1390
>>>> [  103.623945]  ? kasan_unpoison_shadow+0x31/0x40 [  103.623970]  ?
>>>> kmalloc_order+0x63/0x70 [  103.624337]
>>>> amdgpu_driver_load_kms+0xd9/0x430 [amdgpu] [  103.624690]  ?
>>>> amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu] [  103.624756]  ?
>>>> drm_dev_register+0x19c/0x310 [drm] [  103.624768]  ?
>>>> __kasan_slab_free+0x133/0x160 [  103.624849]
>>>> drm_dev_register+0x1f5/0x310 [drm] [  103.625212]
>>>> amdgpu_pci_probe+0x109/0x1f0 [amdgpu] [  103.625565]  ?
>>>> amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu] [  103.625580]
>>>> local_pci_probe+0x74/0xd0 [  103.625603]
>>>> pci_device_probe+0x1fa/0x310 [  103.625620]  ?
>>>> pci_device_remove+0x1c0/0x1c0 [  103.625640]  ?
>>>> sysfs_do_create_link_sd.isra.2+0x74/0xe0
>>>> [  103.625673]  really_probe+0x367/0x5d0 [  103.625700]
>>>> driver_probe_device+0x177/0x1b0 [  103.625721]
>>>> device_driver_attach+0x8a/0x90 [  103.625737]  ?
>>>> device_driver_attach+0x90/0x90 [  103.625746]
>>>> __driver_attach+0xeb/0x190 [  103.625765]  ?
>>>> device_driver_attach+0x90/0x90 [  103.625773]
>>>> bus_for_each_dev+0xe4/0x160 [  103.625789]  ?
>>>> subsys_dev_iter_exit+0x10/0x10 [  103.625829]
>>>> bus_add_driver+0x277/0x330 [  103.625855]
>>>> driver_register+0xc6/0x1a0 [  103.625866]  ? 0xffffffffa0d88000 [ 
>>>> 103.625880]  do_one_initcall+0xd3/0x334 [  103.625895]  ?
>>>> trace_event_raw_event_initcall_finish+0x150/0x150
>>>> [  103.625911]  ? kasan_unpoison_shadow+0x31/0x40 [  103.625924]  ?
>>>> __kasan_kmalloc+0xd5/0xf0 [  103.625946]  ?
>>>> kmem_cache_alloc_trace+0x154/0x300
>>>> [  103.625955]  ? kasan_unpoison_shadow+0x31/0x40 [  103.625985]
>>>> do_init_module+0xec/0x354 [  103.626011]  load_module+0x3c91/0x4980 
>>>> [  103.626118]  ? module_frob_arch_sections+0x20/0x20
>>>> [  103.626132]  ? ima_read_file+0x10/0x10 [  103.626142]  ?
>>>> vfs_read+0x127/0x190 [  103.626163]  ? kernel_read+0x95/0xb0 [ 
>>>> 103.626187]  ? kernel_read_file+0x1a5/0x340 [  103.626277]  ?
>>>> __do_sys_finit_module+0x175/0x1b0 [  103.626287]
>>>> __do_sys_finit_module+0x175/0x1b0 [  103.626301]  ?
>>>> __ia32_sys_init_module+0x40/0x40 [  103.626338]  ?
>>>> lock_downgrade+0x390/0x390 [  103.626396]  ?
>>>> vtime_user_exit+0xc8/0xe0 [  103.626423]  do_syscall_64+0x7d/0x250 
>>>> [ 103.626440]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [  103.626450] RIP: 0033:0x7f09984854d9 [  103.626461] Code: 00 f3
>>>> c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 
>>>> 08 0f
>>>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01
>>>> 48 [  103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX:
>>>> 0000000000000139
>>>> [  103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX:
>>>> 00007f09984854d9
>>>> [  103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI:
>>>> 0000000000000006
>>>> [  103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09:
>>>> 0000000000000000
>>>> [  103.626500] R10: 0000000000000006 R11: 0000000000000246 R12:
>>>> 0000000000000000
>>>> [  103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15:
>>>> 0000000000000013
>>>>
>>>> [  103.626592] The buggy address belongs to the page:
>>>> [  103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
>>>> [  103.626675] flags: 0x2ffff0000000000()
>>>> [  103.626686] raw: 02ffff0000000000 0000000000000000 ffffea000f2c6788 0000000000000000
>>>> [  103.626696] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
>>>> [  103.626702] page dumped because: kasan: bad access detected
>>>>
>>>> [  103.626742] addr ffff8883cb19ee38 is located in stack of task modprobe/1122 at offset 264 in frame:
>>>> [  103.627233]  kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]
>>>>
>>>> [  103.627346] this frame has 3 objects:
>>>> [  103.627405]  [32, 36) 'avail_size'
>>>> [  103.627410]  [96, 120) 'local_mem_info'
>>>> [  103.627466]  [160, 264) 'cu_info'
>>>>
>>>> [  103.627602] Memory state around the buggy address:
>>>> [  103.627675]  ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f2 f2
>>>> [  103.627780]  ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00 00 00 00 00
>>>> [  103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00
>>>> [  103.627989]                                         ^
>>>> [  103.628065]  ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>> [  103.628169]  ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00 00
>>>> [  103.628273] ==================================================================
>>>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Stack out of bounds in KFD on Arcturus
       [not found]                         ` <BL0PR12MB2580ED7FB1607624E3D884B280680-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2019-10-22 17:28                           ` Grodzovsky, Andrey
       [not found]                             ` <bbba4ea5-f253-5974-397a-c38f8d4c857f-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Grodzovsky, Andrey @ 2019-10-22 17:28 UTC (permalink / raw)
  To: Zeng, Oak, Kuehling, Felix; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

I don't know. Which Kconfig flag should I look at?

Andrey

On 10/22/19 1:17 PM, Zeng, Oak wrote:
> Sorry, I meant: is the kernel stack size 16KB in your Kconfig?
>
> Oak
>
> -----Original Message-----
> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Sent: Tuesday, October 22, 2019 12:49 PM
> To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD on Arcturus
>
> On 10/18/19 5:31 PM, Zeng, Oak wrote:
>
>> Hi Andrey,
>>
>> What is your system configuration? I haven't seen this issue before. Also see the attached QA configuration; you can compare it with yours to spot any differences.
>
> Attached is my lshw output.
>
>> Also, I believe the default kernel stack size for x86-64 is 16KB. Is that what your Kconfig has?
>
> What do you mean by asking whether this is my Kconfig? Is there a particular Kconfig flag you know of that I can look for?
>
> Andrey
>
>
>> Regards,
>> Oak
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>> Kuehling, Felix
>> Sent: Friday, October 18, 2019 4:55 PM
>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org
>> Subject: Re: Stack out of bounds in KFD on Arcturus
>>
>> On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
>>> Not that I'm aware of. Is there a special Kconfig flag that determines
>>> the stack size?
>> I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.
>>
>> I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.
>>
>> Thanks,
>>      Felix
>>
>>
>>> Andrey
>>>
>>> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>>>> I don't see why this problem would be specific to Arcturus. I don't
>>>> see any excessive allocations on the stack either. Also the code
>>>> involved here hasn't changed recently.
>>>>
>>>> Are you using some weird kernel config with a smaller stack? Is it
>>>> specific to a compiler version or some optimization flags? I've
>>>> sometimes seen function inlining cause excessive stack usage.
>>>>
>>>> Regards,
>>>>        Felix
>>>>
>>>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>>>> Hi Felix - I see this on boot when working with Arcturus.
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>> [  103.602092] kfd kfd: Allocated 3969056 bytes on gart [
>>>>> 103.610769]
>>>>> ==================================================================
>>>>> [  103.611469] BUG: KASAN: stack-out-of-bounds in
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.611646] Read
>>>>> of size 4 at addr ffff8883cb19ee38 by task modprobe/1122
>>>>>
>>>>> [  103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G O
>>>>> 5.3.0-rc3+ #45 [  103.611847] Hardware name: System manufacturer
>>>>> System Product Name/Z170-PRO, BIOS 1902 06/27/2016 [  103.611856]
>>>>> Call Trace:
>>>>> [  103.611879]  dump_stack+0x71/0xab [  103.611907]
>>>>> print_address_description+0x1da/0x3c0
>>>>> [  103.612453]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [
>>>>> 103.612479]  __kasan_report+0x13f/0x1a0 [  103.613022]  ?
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.613580]  ?
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.613604]
>>>>> kasan_report+0xe/0x20 [  103.614149]
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.614762]  ?
>>>>> kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu] [  103.614796]  ?
>>>>> __alloc_pages_nodemask+0x2c9/0x560
>>>>> [  103.614824]  ? __alloc_pages_slowpath+0x1390/0x1390
>>>>> [  103.614898]  ? kmalloc_order+0x63/0x70 [  103.615469]
>>>>> kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu] [  103.616054]  ?
>>>>> kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu] [  103.616095]  ?
>>>>> up_write+0x4b/0x70 [  103.616649]
>>>>> kfd_topology_add_device+0x98d/0xb10 [amdgpu] [  103.617207]  ?
>>>>> kfd_topology_shutdown+0x60/0x60 [amdgpu] [  103.617743]  ?
>>>>> start_cpsch+0x2ff/0x3a0 [amdgpu] [  103.617777]  ?
>>>>> mutex_lock_io_nested+0xac0/0xac0 [  103.617807]  ?
>>>>> __mutex_unlock_slowpath+0xda/0x420
>>>>> [  103.617848]  ? __mutex_unlock_slowpath+0xda/0x420
>>>>> [  103.617877]  ? wait_for_completion+0x200/0x200 [  103.618461]  ?
>>>>> start_cpsch+0x38b/0x3a0 [amdgpu] [  103.619011]  ?
>>>>> create_queue_cpsch+0x670/0x670 [amdgpu] [  103.619573]  ?
>>>>> kfd_iommu_device_init+0x92/0x1e0 [amdgpu] [  103.620112]  ?
>>>>> kfd_iommu_resume+0x2c/0x2c0 [amdgpu] [  103.620655]  ?
>>>>> kfd_iommu_check_device+0xf0/0xf0 [amdgpu] [  103.621228]
>>>>> kgd2kfd_device_init+0x474/0x870 [amdgpu] [  103.621781]
>>>>> amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu] [  103.622329]  ?
>>>>> amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu] [  103.622344]  ?
>>>>> kmsg_dump_rewind_nolock+0x59/0x59 [  103.622895]  ?
>>>>> amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu] [  103.623424]
>>>>> amdgpu_device_init+0x1bbe/0x2f00 [amdgpu] [  103.623819]  ?
>>>>> amdgpu_device_has_dc_support+0x30/0x30 [amdgpu] [  103.623842]  ?
>>>>> __isolate_free_page+0x290/0x290 [  103.623852]  ?
>>>>> fs_reclaim_acquire.part.97+0x5/0x30
>>>>> [  103.623891]  ? __alloc_pages_nodemask+0x2c9/0x560
>>>>> [  103.623912]  ? __alloc_pages_slowpath+0x1390/0x1390
>>>>> [  103.623945]  ? kasan_unpoison_shadow+0x31/0x40 [  103.623970]  ?
>>>>> kmalloc_order+0x63/0x70 [  103.624337]
>>>>> amdgpu_driver_load_kms+0xd9/0x430 [amdgpu] [  103.624690]  ?
>>>>> amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu] [  103.624756]  ?
>>>>> drm_dev_register+0x19c/0x310 [drm] [  103.624768]  ?
>>>>> __kasan_slab_free+0x133/0x160 [  103.624849]
>>>>> drm_dev_register+0x1f5/0x310 [drm] [  103.625212]
>>>>> amdgpu_pci_probe+0x109/0x1f0 [amdgpu] [  103.625565]  ?
>>>>> amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu] [  103.625580]
>>>>> local_pci_probe+0x74/0xd0 [  103.625603]
>>>>> pci_device_probe+0x1fa/0x310 [  103.625620]  ?
>>>>> pci_device_remove+0x1c0/0x1c0 [  103.625640]  ?
>>>>> sysfs_do_create_link_sd.isra.2+0x74/0xe0
>>>>> [  103.625673]  really_probe+0x367/0x5d0 [  103.625700]
>>>>> driver_probe_device+0x177/0x1b0 [  103.625721]
>>>>> device_driver_attach+0x8a/0x90 [  103.625737]  ?
>>>>> device_driver_attach+0x90/0x90 [  103.625746]
>>>>> __driver_attach+0xeb/0x190 [  103.625765]  ?
>>>>> device_driver_attach+0x90/0x90 [  103.625773]
>>>>> bus_for_each_dev+0xe4/0x160 [  103.625789]  ?
>>>>> subsys_dev_iter_exit+0x10/0x10 [  103.625829]
>>>>> bus_add_driver+0x277/0x330 [  103.625855]
>>>>> driver_register+0xc6/0x1a0 [  103.625866]  ? 0xffffffffa0d88000 [
>>>>> 103.625880]  do_one_initcall+0xd3/0x334 [  103.625895]  ?
>>>>> trace_event_raw_event_initcall_finish+0x150/0x150
>>>>> [  103.625911]  ? kasan_unpoison_shadow+0x31/0x40 [  103.625924]  ?
>>>>> __kasan_kmalloc+0xd5/0xf0 [  103.625946]  ?
>>>>> kmem_cache_alloc_trace+0x154/0x300
>>>>> [  103.625955]  ? kasan_unpoison_shadow+0x31/0x40 [  103.625985]
>>>>> do_init_module+0xec/0x354 [  103.626011]  load_module+0x3c91/0x4980
>>>>> [  103.626118]  ? module_frob_arch_sections+0x20/0x20
>>>>> [  103.626132]  ? ima_read_file+0x10/0x10 [  103.626142]  ?
>>>>> vfs_read+0x127/0x190 [  103.626163]  ? kernel_read+0x95/0xb0 [
>>>>> 103.626187]  ? kernel_read_file+0x1a5/0x340 [  103.626277]  ?
>>>>> __do_sys_finit_module+0x175/0x1b0 [  103.626287]
>>>>> __do_sys_finit_module+0x175/0x1b0 [  103.626301]  ?
>>>>> __ia32_sys_init_module+0x40/0x40 [  103.626338]  ?
>>>>> lock_downgrade+0x390/0x390 [  103.626396]  ?
>>>>> vtime_user_exit+0xc8/0xe0 [  103.626423]  do_syscall_64+0x7d/0x250
>>>>> [ 103.626440]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [  103.626450] RIP: 0033:0x7f09984854d9 [  103.626461] Code: 00 f3
>>>>> c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>>> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
>>>>> 08 0f
>>>>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01
>>>>> 48 [  103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX:
>>>>> 0000000000000139
>>>>> [  103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX:
>>>>> 00007f09984854d9
>>>>> [  103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI:
>>>>> 0000000000000006
>>>>> [  103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09:
>>>>> 0000000000000000
>>>>> [  103.626500] R10: 0000000000000006 R11: 0000000000000246 R12:
>>>>> 0000000000000000
>>>>> [  103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15:
>>>>> 0000000000000013
>>>>>
>>>>> [  103.626592] The buggy address belongs to the page:
>>>>> [  103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
>>>>> [  103.626675] flags: 0x2ffff0000000000()
>>>>> [  103.626686] raw: 02ffff0000000000 0000000000000000 ffffea000f2c6788 0000000000000000
>>>>> [  103.626696] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
>>>>> [  103.626702] page dumped because: kasan: bad access detected
>>>>>
>>>>> [  103.626742] addr ffff8883cb19ee38 is located in stack of task modprobe/1122 at offset 264 in frame:
>>>>> [  103.627233]  kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]
>>>>>
>>>>> [  103.627346] this frame has 3 objects:
>>>>> [  103.627405]  [32, 36) 'avail_size'
>>>>> [  103.627410]  [96, 120) 'local_mem_info'
>>>>> [  103.627466]  [160, 264) 'cu_info'
>>>>>
>>>>> [  103.627602] Memory state around the buggy address:
>>>>> [  103.627675]  ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f2 f2
>>>>> [  103.627780]  ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00 00 00 00 00
>>>>> [  103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00
>>>>> [  103.627989]                                         ^
>>>>> [  103.628065]  ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> [  103.628169]  ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00 00
>>>>> [  103.628273] ==================================================================
>>>>>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: Stack out of bounds in KFD on Arcturus
       [not found]                             ` <bbba4ea5-f253-5974-397a-c38f8d4c857f-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-22 17:46                               ` Zeng, Oak
       [not found]                                 ` <BL0PR12MB258071C07B015BBE3C4CA54A80680-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Zeng, Oak @ 2019-10-22 17:46 UTC (permalink / raw)
  To: Grodzovsky, Andrey, Kuehling, Felix
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Sorry, I searched my Kconfig and couldn't find a stack size option anymore. Maybe the kernel stack size is no longer configurable.

Can you try your kernel on Vega10, Vega20, or Navi10? We want to know whether this is an MI100-specific issue.

Oak

-----Original Message-----
From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com> 
Sent: Tuesday, October 22, 2019 1:28 PM
To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: Stack out of bounds in KFD on Arcturus

I don't know. Which Kconfig flag should I look at?

Andrey

On 10/22/19 1:17 PM, Zeng, Oak wrote:
> Sorry, I meant: is the kernel stack size 16KB in your Kconfig?
>
> Oak
>
> -----Original Message-----
> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Sent: Tuesday, October 22, 2019 12:49 PM
> To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix 
> <Felix.Kuehling@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD on Arcturus
>
> On 10/18/19 5:31 PM, Zeng, Oak wrote:
>
>> Hi Andrey,
>>
>> What is your system configuration? I haven't seen this issue before. Also see the attached QA configuration; you can compare it with yours to spot any differences.
>
> Attached is my lshw output.
>
>> Also, I believe the default kernel stack size for x86-64 is 16KB. Is that what your Kconfig has?
>
> What do you mean by asking whether this is my Kconfig? Is there a particular Kconfig flag you know of that I can look for?
>
> Andrey
>
>
>> Regards,
>> Oak
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of 
>> Kuehling, Felix
>> Sent: Friday, October 18, 2019 4:55 PM
>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org
>> Subject: Re: Stack out of bounds in KFD on Arcturus
>>
>> On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
>>> Not that I'm aware of. Is there a special Kconfig flag that determines
>>> the stack size?
>> I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.
>>
>> I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.
>>
>> Thanks,
>>      Felix
>>
>>
>>> Andrey
>>>
>>> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>>>> I don't see why this problem would be specific to Arcturus. I don't 
>>>> see any excessive allocations on the stack either. Also the code 
>>>> involved here hasn't changed recently.
>>>>
>>>> Are you using some weird kernel config with a smaller stack? Is it 
>>>> specific to a compiler version or some optimization flags? I've 
>>>> sometimes seen function inlining cause excessive stack usage.
>>>>
>>>> Regards,
>>>>        Felix
>>>>
>>>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>>>> Hi Felix - I see this on boot when working with Arcturus.
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>> [  103.602092] kfd kfd: Allocated 3969056 bytes on gart [ 
>>>>> 103.610769] 
>>>>> ==================================================================
>>>>> [  103.611469] BUG: KASAN: stack-out-of-bounds in
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.611646] 
>>>>> Read of size 4 at addr ffff8883cb19ee38 by task modprobe/1122
>>>>>
>>>>> [  103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G O 
>>>>> 5.3.0-rc3+ #45 [  103.611847] Hardware name: System manufacturer 
>>>>> System Product Name/Z170-PRO, BIOS 1902 06/27/2016 [  103.611856] 
>>>>> Call Trace:
>>>>> [  103.611879]  dump_stack+0x71/0xab [  103.611907]
>>>>> print_address_description+0x1da/0x3c0
>>>>> [  103.612453]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] 
>>>>> [ 103.612479]  __kasan_report+0x13f/0x1a0 [  103.613022]  ?
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.613580]  ?
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.613604]
>>>>> kasan_report+0xe/0x20 [  103.614149]
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.614762]  ?
>>>>> kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu] [  103.614796]  ?
>>>>> __alloc_pages_nodemask+0x2c9/0x560
>>>>> [  103.614824]  ? __alloc_pages_slowpath+0x1390/0x1390
>>>>> [  103.614898]  ? kmalloc_order+0x63/0x70 [  103.615469]
>>>>> kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu] [  103.616054]  ?
>>>>> kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu] [  103.616095]  ?
>>>>> up_write+0x4b/0x70 [  103.616649]
>>>>> kfd_topology_add_device+0x98d/0xb10 [amdgpu] [  103.617207]  ?
>>>>> kfd_topology_shutdown+0x60/0x60 [amdgpu] [  103.617743]  ?
>>>>> start_cpsch+0x2ff/0x3a0 [amdgpu] [  103.617777]  ?
>>>>> mutex_lock_io_nested+0xac0/0xac0 [  103.617807]  ?
>>>>> __mutex_unlock_slowpath+0xda/0x420
>>>>> [  103.617848]  ? __mutex_unlock_slowpath+0xda/0x420
>>>>> [  103.617877]  ? wait_for_completion+0x200/0x200 [  103.618461]  ?
>>>>> start_cpsch+0x38b/0x3a0 [amdgpu] [  103.619011]  ?
>>>>> create_queue_cpsch+0x670/0x670 [amdgpu] [  103.619573]  ?
>>>>> kfd_iommu_device_init+0x92/0x1e0 [amdgpu] [  103.620112]  ?
>>>>> kfd_iommu_resume+0x2c/0x2c0 [amdgpu] [  103.620655]  ?
>>>>> kfd_iommu_check_device+0xf0/0xf0 [amdgpu] [  103.621228]
>>>>> kgd2kfd_device_init+0x474/0x870 [amdgpu] [  103.621781]
>>>>> amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu] [  103.622329]  ?
>>>>> amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu] [  103.622344]  ?
>>>>> kmsg_dump_rewind_nolock+0x59/0x59 [  103.622895]  ?
>>>>> amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu] [  103.623424]
>>>>> amdgpu_device_init+0x1bbe/0x2f00 [amdgpu] [  103.623819]  ?
>>>>> amdgpu_device_has_dc_support+0x30/0x30 [amdgpu] [  103.623842]  ?
>>>>> __isolate_free_page+0x290/0x290 [  103.623852]  ?
>>>>> fs_reclaim_acquire.part.97+0x5/0x30
>>>>> [  103.623891]  ? __alloc_pages_nodemask+0x2c9/0x560
>>>>> [  103.623912]  ? __alloc_pages_slowpath+0x1390/0x1390
>>>>> [  103.623945]  ? kasan_unpoison_shadow+0x31/0x40 [  103.623970]  ?
>>>>> kmalloc_order+0x63/0x70 [  103.624337]
>>>>> amdgpu_driver_load_kms+0xd9/0x430 [amdgpu] [  103.624690]  ?
>>>>> amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu] [  103.624756]  ?
>>>>> drm_dev_register+0x19c/0x310 [drm] [  103.624768]  ?
>>>>> __kasan_slab_free+0x133/0x160 [  103.624849]
>>>>> drm_dev_register+0x1f5/0x310 [drm] [  103.625212]
>>>>> amdgpu_pci_probe+0x109/0x1f0 [amdgpu] [  103.625565]  ?
>>>>> amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu] [  103.625580]
>>>>> local_pci_probe+0x74/0xd0 [  103.625603]
>>>>> pci_device_probe+0x1fa/0x310 [  103.625620]  ?
>>>>> pci_device_remove+0x1c0/0x1c0 [  103.625640]  ?
>>>>> sysfs_do_create_link_sd.isra.2+0x74/0xe0
>>>>> [  103.625673]  really_probe+0x367/0x5d0 [  103.625700]
>>>>> driver_probe_device+0x177/0x1b0 [  103.625721]
>>>>> device_driver_attach+0x8a/0x90 [  103.625737]  ?
>>>>> device_driver_attach+0x90/0x90 [  103.625746]
>>>>> __driver_attach+0xeb/0x190 [  103.625765]  ?
>>>>> device_driver_attach+0x90/0x90 [  103.625773]
>>>>> bus_for_each_dev+0xe4/0x160 [  103.625789]  ?
>>>>> subsys_dev_iter_exit+0x10/0x10 [  103.625829]
>>>>> bus_add_driver+0x277/0x330 [  103.625855]
>>>>> driver_register+0xc6/0x1a0 [  103.625866]  ? 0xffffffffa0d88000 [ 
>>>>> 103.625880]  do_one_initcall+0xd3/0x334 [  103.625895]  ?
>>>>> trace_event_raw_event_initcall_finish+0x150/0x150
>>>>> [  103.625911]  ? kasan_unpoison_shadow+0x31/0x40 [  103.625924]  ?
>>>>> __kasan_kmalloc+0xd5/0xf0 [  103.625946]  ?
>>>>> kmem_cache_alloc_trace+0x154/0x300
>>>>> [  103.625955]  ? kasan_unpoison_shadow+0x31/0x40 [  103.625985]
>>>>> do_init_module+0xec/0x354 [  103.626011]  
>>>>> load_module+0x3c91/0x4980 [  103.626118]  ? 
>>>>> module_frob_arch_sections+0x20/0x20
>>>>> [  103.626132]  ? ima_read_file+0x10/0x10 [  103.626142]  ?
>>>>> vfs_read+0x127/0x190 [  103.626163]  ? kernel_read+0x95/0xb0 [ 
>>>>> 103.626187]  ? kernel_read_file+0x1a5/0x340 [  103.626277]  ?
>>>>> __do_sys_finit_module+0x175/0x1b0 [  103.626287]
>>>>> __do_sys_finit_module+0x175/0x1b0 [  103.626301]  ?
>>>>> __ia32_sys_init_module+0x40/0x40 [  103.626338]  ?
>>>>> lock_downgrade+0x390/0x390 [  103.626396]  ?
>>>>> vtime_user_exit+0xc8/0xe0 [  103.626423]  do_syscall_64+0x7d/0x250 
>>>>> [ 103.626440]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [  103.626450] RIP: 0033:0x7f09984854d9 [  103.626461] Code: 00 f3
>>>>> c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>>> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
>>>>> 08 0f
>>>>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 
>>>>> 01
>>>>> 48 [  103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX:
>>>>> 0000000000000139
>>>>> [  103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX:
>>>>> 00007f09984854d9
>>>>> [  103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI:
>>>>> 0000000000000006
>>>>> [  103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09:
>>>>> 0000000000000000
>>>>> [  103.626500] R10: 0000000000000006 R11: 0000000000000246 R12:
>>>>> 0000000000000000
>>>>> [  103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15:
>>>>> 0000000000000013
>>>>>
>>>>> [  103.626592] The buggy address belongs to the page:
>>>>> [  103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
>>>>> [  103.626675] flags: 0x2ffff0000000000()
>>>>> [  103.626686] raw: 02ffff0000000000 0000000000000000 ffffea000f2c6788 0000000000000000
>>>>> [  103.626696] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
>>>>> [  103.626702] page dumped because: kasan: bad access detected
>>>>>
>>>>> [  103.626742] addr ffff8883cb19ee38 is located in stack of task modprobe/1122 at offset 264 in frame:
>>>>> [  103.627233]  kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]
>>>>>
>>>>> [  103.627346] this frame has 3 objects:
>>>>> [  103.627405]  [32, 36) 'avail_size'
>>>>> [  103.627410]  [96, 120) 'local_mem_info'
>>>>> [  103.627466]  [160, 264) 'cu_info'
>>>>>
>>>>> [  103.627602] Memory state around the buggy address:
>>>>> [  103.627675]  ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f2 f2
>>>>> [  103.627780]  ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00 00 00 00 00
>>>>> [  103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00
>>>>> [  103.627989]                                         ^
>>>>> [  103.628065]  ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> [  103.628169]  ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00 00
>>>>> [  103.628273] ==================================================================
>>>>>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Stack out of bounds in KFD on Arcturus
       [not found]                                 ` <BL0PR12MB258071C07B015BBE3C4CA54A80680-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2019-10-22 18:00                                   ` Grodzovsky, Andrey
  0 siblings, 0 replies; 10+ messages in thread
From: Grodzovsky, Andrey @ 2019-10-22 18:00 UTC (permalink / raw)
  To: Zeng, Oak, Kuehling, Felix; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

No problem on Vega 20

Andrey

On 10/22/19 1:46 PM, Zeng, Oak wrote:
> Sorry, I searched my Kconfig and couldn't find a stack size option anymore. Maybe the kernel stack size is no longer configurable.
>
> Can you try your kernel on Vega10, Vega20, or Navi10? We want to know whether this is an MI100-specific issue.
>
> Oak
>
> -----Original Message-----
> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Sent: Tuesday, October 22, 2019 1:28 PM
> To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD on Arcturus
>
> I don't know. Which Kconfig flag should I look at?
>
> Andrey
>
> On 10/22/19 1:17 PM, Zeng, Oak wrote:
>> Sorry, I meant: is the kernel stack size 16KB in your Kconfig?
>>
>> Oak
>>
>> -----Original Message-----
>> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>> Sent: Tuesday, October 22, 2019 12:49 PM
>> To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix
>> <Felix.Kuehling@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org
>> Subject: Re: Stack out of bounds in KFD on Arcturus
>>
>> On 10/18/19 5:31 PM, Zeng, Oak wrote:
>>
>>> Hi Andrey,
>>>
>>> What is your system configuration? I haven't seen this issue before. Also see the attached QA configuration; you can compare it with yours to spot any differences.
>> Attached is my lshw output.
>>
>>> Also, I believe the default kernel stack size for x86-64 is 16KB. Is that what your Kconfig has?
>> What do you mean by asking whether this is my Kconfig? Is there a particular Kconfig flag you know of that I can look for?
>>
>> Andrey
>>
>>
>>> Regards,
>>> Oak
>>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>>> Kuehling, Felix
>>> Sent: Friday, October 18, 2019 4:55 PM
>>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>>> Cc: amd-gfx@lists.freedesktop.org
>>> Subject: Re: Stack out of bounds in KFD on Arcturus
>>>
>>> On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
>>>> Not that I'm aware of. Is there a special Kconfig flag that determines
>>>> the stack size?
>>> I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.
>>>
>>> I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.
>>>
>>> Thanks,
>>>       Felix
>>>
>>>
>>>> Andrey
>>>>
>>>> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>>>>> I don't see why this problem would be specific to Arcturus. I don't
>>>>> see any excessive allocations on the stack either. Also the code
>>>>> involved here hasn't changed recently.
>>>>>
>>>>> Are you using some weird kernel config with a smaller stack? Is it
>>>>> specific to a compiler version or some optimization flags? I've
>>>>> sometimes seen function inlining cause excessive stack usage.
>>>>>
>>>>> Regards,
>>>>>         Felix
>>>>>
>>>>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>>>>> Hi Felix - I see this on boot when working with Arcturus.
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>> [  103.602092] kfd kfd: Allocated 3969056 bytes on gart [
>>>>>> 103.610769]
>>>>>> ==================================================================
>>>>>> [  103.611469] BUG: KASAN: stack-out-of-bounds in
>>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.611646]
>>>>>> Read of size 4 at addr ffff8883cb19ee38 by task modprobe/1122
>>>>>>
>>>>>> [  103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G O
>>>>>> 5.3.0-rc3+ #45 [  103.611847] Hardware name: System manufacturer
>>>>>> System Product Name/Z170-PRO, BIOS 1902 06/27/2016 [  103.611856]
>>>>>> Call Trace:
>>>>>> [  103.611879]  dump_stack+0x71/0xab [  103.611907]
>>>>>> print_address_description+0x1da/0x3c0
>>>>>> [  103.612453]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>>>>> [ 103.612479]  __kasan_report+0x13f/0x1a0 [  103.613022]  ?
>>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.613580]  ?
>>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.613604]
>>>>>> kasan_report+0xe/0x20 [  103.614149]
>>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [  103.614762]  ?
>>>>>> kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu] [  103.614796]  ?
>>>>>> __alloc_pages_nodemask+0x2c9/0x560
>>>>>> [  103.614824]  ? __alloc_pages_slowpath+0x1390/0x1390
>>>>>> [  103.614898]  ? kmalloc_order+0x63/0x70 [  103.615469]
>>>>>> kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu] [  103.616054]  ?
>>>>>> kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu] [  103.616095]  ?
>>>>>> up_write+0x4b/0x70 [  103.616649]
>>>>>> kfd_topology_add_device+0x98d/0xb10 [amdgpu] [  103.617207]  ?
>>>>>> kfd_topology_shutdown+0x60/0x60 [amdgpu] [  103.617743]  ?
>>>>>> start_cpsch+0x2ff/0x3a0 [amdgpu] [  103.617777]  ?
>>>>>> mutex_lock_io_nested+0xac0/0xac0 [  103.617807]  ?
>>>>>> __mutex_unlock_slowpath+0xda/0x420
>>>>>> [  103.617848]  ? __mutex_unlock_slowpath+0xda/0x420
>>>>>> [  103.617877]  ? wait_for_completion+0x200/0x200 [  103.618461]  ?
>>>>>> start_cpsch+0x38b/0x3a0 [amdgpu] [  103.619011]  ?
>>>>>> create_queue_cpsch+0x670/0x670 [amdgpu] [  103.619573]  ?
>>>>>> kfd_iommu_device_init+0x92/0x1e0 [amdgpu] [  103.620112]  ?
>>>>>> kfd_iommu_resume+0x2c/0x2c0 [amdgpu] [  103.620655]  ?
>>>>>> kfd_iommu_check_device+0xf0/0xf0 [amdgpu] [  103.621228]
>>>>>> kgd2kfd_device_init+0x474/0x870 [amdgpu] [  103.621781]
>>>>>> amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu] [  103.622329]  ?
>>>>>> amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu] [  103.622344]  ?
>>>>>> kmsg_dump_rewind_nolock+0x59/0x59 [  103.622895]  ?
>>>>>> amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu] [  103.623424]
>>>>>> amdgpu_device_init+0x1bbe/0x2f00 [amdgpu] [  103.623819]  ?
>>>>>> amdgpu_device_has_dc_support+0x30/0x30 [amdgpu] [  103.623842]  ?
>>>>>> __isolate_free_page+0x290/0x290 [  103.623852]  ?
>>>>>> fs_reclaim_acquire.part.97+0x5/0x30
>>>>>> [  103.623891]  ? __alloc_pages_nodemask+0x2c9/0x560
>>>>>> [  103.623912]  ? __alloc_pages_slowpath+0x1390/0x1390
>>>>>> [  103.623945]  ? kasan_unpoison_shadow+0x31/0x40
>>>>>> [  103.623970]  ? kmalloc_order+0x63/0x70
>>>>>> [  103.624337]  amdgpu_driver_load_kms+0xd9/0x430 [amdgpu]
>>>>>> [  103.624690]  ? amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu]
>>>>>> [  103.624756]  ? drm_dev_register+0x19c/0x310 [drm]
>>>>>> [  103.624768]  ? __kasan_slab_free+0x133/0x160
>>>>>> [  103.624849]  drm_dev_register+0x1f5/0x310 [drm]
>>>>>> [  103.625212]  amdgpu_pci_probe+0x109/0x1f0 [amdgpu]
>>>>>> [  103.625565]  ? amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu]
>>>>>> [  103.625580]  local_pci_probe+0x74/0xd0
>>>>>> [  103.625603]  pci_device_probe+0x1fa/0x310
>>>>>> [  103.625620]  ? pci_device_remove+0x1c0/0x1c0
>>>>>> [  103.625640]  ? sysfs_do_create_link_sd.isra.2+0x74/0xe0
>>>>>> [  103.625673]  really_probe+0x367/0x5d0
>>>>>> [  103.625700]  driver_probe_device+0x177/0x1b0
>>>>>> [  103.625721]  device_driver_attach+0x8a/0x90
>>>>>> [  103.625737]  ? device_driver_attach+0x90/0x90
>>>>>> [  103.625746]  __driver_attach+0xeb/0x190
>>>>>> [  103.625765]  ? device_driver_attach+0x90/0x90
>>>>>> [  103.625773]  bus_for_each_dev+0xe4/0x160
>>>>>> [  103.625789]  ? subsys_dev_iter_exit+0x10/0x10
>>>>>> [  103.625829]  bus_add_driver+0x277/0x330
>>>>>> [  103.625855]  driver_register+0xc6/0x1a0
>>>>>> [  103.625866]  ? 0xffffffffa0d88000
>>>>>> [  103.625880]  do_one_initcall+0xd3/0x334
>>>>>> [  103.625895]  ? trace_event_raw_event_initcall_finish+0x150/0x150
>>>>>> [  103.625911]  ? kasan_unpoison_shadow+0x31/0x40
>>>>>> [  103.625924]  ? __kasan_kmalloc+0xd5/0xf0
>>>>>> [  103.625946]  ? kmem_cache_alloc_trace+0x154/0x300
>>>>>> [  103.625955]  ? kasan_unpoison_shadow+0x31/0x40
>>>>>> [  103.625985]  do_init_module+0xec/0x354
>>>>>> [  103.626011]  load_module+0x3c91/0x4980
>>>>>> [  103.626118]  ? module_frob_arch_sections+0x20/0x20
>>>>>> [  103.626132]  ? ima_read_file+0x10/0x10
>>>>>> [  103.626142]  ? vfs_read+0x127/0x190
>>>>>> [  103.626163]  ? kernel_read+0x95/0xb0
>>>>>> [  103.626187]  ? kernel_read_file+0x1a5/0x340
>>>>>> [  103.626277]  ? __do_sys_finit_module+0x175/0x1b0
>>>>>> [  103.626287]  __do_sys_finit_module+0x175/0x1b0
>>>>>> [  103.626301]  ? __ia32_sys_init_module+0x40/0x40
>>>>>> [  103.626338]  ? lock_downgrade+0x390/0x390
>>>>>> [  103.626396]  ? vtime_user_exit+0xc8/0xe0
>>>>>> [  103.626423]  do_syscall_64+0x7d/0x250
>>>>>> [  103.626440]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>> [  103.626450] RIP: 0033:0x7f09984854d9
>>>>>> [  103.626461] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01 48
>>>>>> [  103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>>>> [  103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX: 00007f09984854d9
>>>>>> [  103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI: 0000000000000006
>>>>>> [  103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09: 0000000000000000
>>>>>> [  103.626500] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
>>>>>> [  103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15: 0000000000000013
>>>>>>
>>>>>> [  103.626592] The buggy address belongs to the page:
>>>>>> [  103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
>>>>>> [  103.626675] flags: 0x2ffff0000000000()
>>>>>> [  103.626686] raw: 02ffff0000000000 0000000000000000 ffffea000f2c6788 0000000000000000
>>>>>> [  103.626696] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
>>>>>> [  103.626702] page dumped because: kasan: bad access detected
>>>>>>
>>>>>> [  103.626742] addr ffff8883cb19ee38 is located in stack of task modprobe/1122 at offset 264 in frame:
>>>>>> [  103.627233]  kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]
>>>>>>
>>>>>> [  103.627346] this frame has 3 objects:
>>>>>> [  103.627405]  [32, 36) 'avail_size'
>>>>>> [  103.627410]  [96, 120) 'local_mem_info'
>>>>>> [  103.627466]  [160, 264) 'cu_info'
>>>>>>
>>>>>> [  103.627602] Memory state around the buggy address:
>>>>>> [  103.627675]  ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f2 f2
>>>>>> [  103.627780]  ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00 00 00 00 00
>>>>>> [  103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00
>>>>>> [  103.627989]                                         ^
>>>>>> [  103.628065]  ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>> [  103.628169]  ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00 00
>>>>>> [  103.628273]
>>>>>> ==================================================================
>>>>>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Thread overview: 10+ messages
2019-10-17 20:09 Stack out of bounds in KFD on Arcturus Grodzovsky, Andrey
2019-10-17 21:29   ` Kuehling, Felix
2019-10-17 22:38       ` Grodzovsky, Andrey
2019-10-18 20:55           ` Kuehling, Felix
2019-10-18 21:31               ` Zeng, Oak
2019-10-22 16:48                   ` Grodzovsky, Andrey
2019-10-22 17:17                       ` Zeng, Oak
2019-10-22 17:28                           ` Grodzovsky, Andrey
2019-10-22 17:46                               ` Zeng, Oak
2019-10-22 18:00                                   ` Grodzovsky, Andrey
