* Stack out of bounds in KFD on Arcturus
From: Grodzovsky, Andrey @ 2019-10-17 20:09 UTC
To: Kuehling, Felix; +Cc: amd-gfx@lists.freedesktop.org
Hi Felix - I see this on boot when working with Arcturus.
Andrey
[ 103.602092] kfd kfd: Allocated 3969056 bytes on gart
[ 103.610769] ==================================================================
[ 103.611469] BUG: KASAN: stack-out-of-bounds in kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
[ 103.611646] Read of size 4 at addr ffff8883cb19ee38 by task modprobe/1122
[ 103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G O 5.3.0-rc3+ #45
[ 103.611847] Hardware name: System manufacturer System Product Name/Z170-PRO, BIOS 1902 06/27/2016
[ 103.611856] Call Trace:
[ 103.611879] dump_stack+0x71/0xab
[ 103.611907] print_address_description+0x1da/0x3c0
[ 103.612453] ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
[ 103.612479] __kasan_report+0x13f/0x1a0
[ 103.613022] ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
[ 103.613580] ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
[ 103.613604] kasan_report+0xe/0x20
[ 103.614149] kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
[ 103.614762] ? kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu]
[ 103.614796] ? __alloc_pages_nodemask+0x2c9/0x560
[ 103.614824] ? __alloc_pages_slowpath+0x1390/0x1390
[ 103.614898] ? kmalloc_order+0x63/0x70
[ 103.615469] kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu]
[ 103.616054] ? kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu]
[ 103.616095] ? up_write+0x4b/0x70
[ 103.616649] kfd_topology_add_device+0x98d/0xb10 [amdgpu]
[ 103.617207] ? kfd_topology_shutdown+0x60/0x60 [amdgpu]
[ 103.617743] ? start_cpsch+0x2ff/0x3a0 [amdgpu]
[ 103.617777] ? mutex_lock_io_nested+0xac0/0xac0
[ 103.617807] ? __mutex_unlock_slowpath+0xda/0x420
[ 103.617848] ? __mutex_unlock_slowpath+0xda/0x420
[ 103.617877] ? wait_for_completion+0x200/0x200
[ 103.618461] ? start_cpsch+0x38b/0x3a0 [amdgpu]
[ 103.619011] ? create_queue_cpsch+0x670/0x670 [amdgpu]
[ 103.619573] ? kfd_iommu_device_init+0x92/0x1e0 [amdgpu]
[ 103.620112] ? kfd_iommu_resume+0x2c/0x2c0 [amdgpu]
[ 103.620655] ? kfd_iommu_check_device+0xf0/0xf0 [amdgpu]
[ 103.621228] kgd2kfd_device_init+0x474/0x870 [amdgpu]
[ 103.621781] amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu]
[ 103.622329] ? amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu]
[ 103.622344] ? kmsg_dump_rewind_nolock+0x59/0x59
[ 103.622895] ? amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu]
[ 103.623424] amdgpu_device_init+0x1bbe/0x2f00 [amdgpu]
[ 103.623819] ? amdgpu_device_has_dc_support+0x30/0x30 [amdgpu]
[ 103.623842] ? __isolate_free_page+0x290/0x290
[ 103.623852] ? fs_reclaim_acquire.part.97+0x5/0x30
[ 103.623891] ? __alloc_pages_nodemask+0x2c9/0x560
[ 103.623912] ? __alloc_pages_slowpath+0x1390/0x1390
[ 103.623945] ? kasan_unpoison_shadow+0x31/0x40
[ 103.623970] ? kmalloc_order+0x63/0x70
[ 103.624337] amdgpu_driver_load_kms+0xd9/0x430 [amdgpu]
[ 103.624690] ? amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu]
[ 103.624756] ? drm_dev_register+0x19c/0x310 [drm]
[ 103.624768] ? __kasan_slab_free+0x133/0x160
[ 103.624849] drm_dev_register+0x1f5/0x310 [drm]
[ 103.625212] amdgpu_pci_probe+0x109/0x1f0 [amdgpu]
[ 103.625565] ? amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu]
[ 103.625580] local_pci_probe+0x74/0xd0
[ 103.625603] pci_device_probe+0x1fa/0x310
[ 103.625620] ? pci_device_remove+0x1c0/0x1c0
[ 103.625640] ? sysfs_do_create_link_sd.isra.2+0x74/0xe0
[ 103.625673] really_probe+0x367/0x5d0
[ 103.625700] driver_probe_device+0x177/0x1b0
[ 103.625721] device_driver_attach+0x8a/0x90
[ 103.625737] ? device_driver_attach+0x90/0x90
[ 103.625746] __driver_attach+0xeb/0x190
[ 103.625765] ? device_driver_attach+0x90/0x90
[ 103.625773] bus_for_each_dev+0xe4/0x160
[ 103.625789] ? subsys_dev_iter_exit+0x10/0x10
[ 103.625829] bus_add_driver+0x277/0x330
[ 103.625855] driver_register+0xc6/0x1a0
[ 103.625866] ? 0xffffffffa0d88000
[ 103.625880] do_one_initcall+0xd3/0x334
[ 103.625895] ? trace_event_raw_event_initcall_finish+0x150/0x150
[ 103.625911] ? kasan_unpoison_shadow+0x31/0x40
[ 103.625924] ? __kasan_kmalloc+0xd5/0xf0
[ 103.625946] ? kmem_cache_alloc_trace+0x154/0x300
[ 103.625955] ? kasan_unpoison_shadow+0x31/0x40
[ 103.625985] do_init_module+0xec/0x354
[ 103.626011] load_module+0x3c91/0x4980
[ 103.626118] ? module_frob_arch_sections+0x20/0x20
[ 103.626132] ? ima_read_file+0x10/0x10
[ 103.626142] ? vfs_read+0x127/0x190
[ 103.626163] ? kernel_read+0x95/0xb0
[ 103.626187] ? kernel_read_file+0x1a5/0x340
[ 103.626277] ? __do_sys_finit_module+0x175/0x1b0
[ 103.626287] __do_sys_finit_module+0x175/0x1b0
[ 103.626301] ? __ia32_sys_init_module+0x40/0x40
[ 103.626338] ? lock_downgrade+0x390/0x390
[ 103.626396] ? vtime_user_exit+0xc8/0xe0
[ 103.626423] do_syscall_64+0x7d/0x250
[ 103.626440] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 103.626450] RIP: 0033:0x7f09984854d9
[ 103.626461] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01 48
[ 103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX: 00007f09984854d9
[ 103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI: 0000000000000006
[ 103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09: 0000000000000000
[ 103.626500] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
[ 103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15: 0000000000000013
[ 103.626592] The buggy address belongs to the page:
[ 103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 103.626675] flags: 0x2ffff0000000000()
[ 103.626686] raw: 02ffff0000000000 0000000000000000 ffffea000f2c6788 0000000000000000
[ 103.626696] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 103.626702] page dumped because: kasan: bad access detected
[ 103.626742] addr ffff8883cb19ee38 is located in stack of task modprobe/1122 at offset 264 in frame:
[ 103.627233] kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]
[ 103.627346] this frame has 3 objects:
[ 103.627405] [32, 36) 'avail_size'
[ 103.627410] [96, 120) 'local_mem_info'
[ 103.627466] [160, 264) 'cu_info'
[ 103.627602] Memory state around the buggy address:
[ 103.627675] ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f2 f2
[ 103.627780] ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00 00 00 00 00
[ 103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00
[ 103.627989] ^
[ 103.628065] ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 103.628169] ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00 00
[ 103.628273] ==================================================================
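The report above can be cross-checked by hand. The following is a minimal Python sketch, not part of the original message, that uses only numbers copied from the log. The helper names (classify, shadow_byte) are made up for illustration, and the granule of 8 bytes of memory per shadow byte is the generic KASAN layout, which is assumed to apply here.

```python
# Frame objects as printed by KASAN: name -> [start, end) byte offsets in the frame.
FRAME_OBJECTS = {
    "avail_size": (32, 36),
    "local_mem_info": (96, 120),
    "cu_info": (160, 264),
}

BAD_ADDR = 0xFFFF8883CB19EE38  # "Read of size 4 at addr ..."
BAD_OFFSET = 264               # "at offset 264 in frame"
SHADOW_GRANULE = 8             # assumption: generic KASAN, 8 bytes per shadow byte

# Shadow row that contains the bad address, copied from
# "Memory state around the buggy address".
ROW_BASE = 0xFFFF8883CB19EE00
ROW_SHADOW = bytes.fromhex("00000000000000f4f4f4f3f3f3f30000")


def classify(offset, size):
    """Describe where the access [offset, offset + size) falls in the frame."""
    for name, (start, end) in FRAME_OBJECTS.items():
        if start <= offset and offset + size <= end:
            return f"inside {name}"
        if offset == end:
            return f"immediately past the end of {name}"
    return "in a redzone between objects"


def shadow_byte(addr):
    """Return the shadow byte covering addr within the copied row."""
    index = (addr - ROW_BASE) // SHADOW_GRANULE
    return ROW_SHADOW[index]


if __name__ == "__main__":
    print(classify(BAD_OFFSET, 4))
    print(hex(shadow_byte(BAD_ADDR)))
    print(hex(BAD_ADDR - BAD_OFFSET))  # start of the frame as KASAN counts it
```

Under those assumptions the 4-byte read starts exactly where 'cu_info' ends, and the shadow byte covering it is 0xf4 rather than 00 (to the best of my knowledge, kernels of this vintage used 0xf4 for a partial stack redzone). That pattern looks more like an access one element past the end of 'cu_info' than like an exhausted stack, although the log alone does not show which source line is responsible.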
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* Re: Stack out of bounds in KFD on Arcturus
From: Kuehling, Felix @ 2019-10-17 21:29 UTC
To: Grodzovsky, Andrey; +Cc: amd-gfx@lists.freedesktop.org
I don't see why this problem would be specific to Arcturus. I don't see
any excessive allocations on the stack either. Also the code involved
here hasn't changed recently.
Are you using some weird kernel config with a smaller stack? Is it
specific to a compiler version or some optimization flags? I've
sometimes seen function inlining cause excessive stack usage.
Regards,
Felix
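
As a rough check on the stack-usage question, the frame objects that KASAN lists for kfd_create_vcrat_image_gpu can be added up. This is a minimal Python sketch using only the [start, end) ranges from the report. KASAN lists only the objects it instruments, so the real frame may hold more than this, and the 16 KiB stack size is an assumption (the usual x86-64 figure without KASAN), not something taken from the reporter's config.

```python
# [start, end) offsets copied from "this frame has 3 objects:" in the report.
FRAME_OBJECTS = {
    "avail_size": (32, 36),
    "local_mem_info": (96, 120),
    "cu_info": (160, 264),
}

# Assumption for illustration only: a 16 KiB kernel stack.
ASSUMED_STACK_BYTES = 16 * 1024


def object_sizes(objects):
    """Return the size in bytes of each listed frame object."""
    return {name: end - start for name, (start, end) in objects.items()}


sizes = object_sizes(FRAME_OBJECTS)
total = sum(sizes.values())

if __name__ == "__main__":
    print(sizes)
    print(f"{total} bytes, {100 * total / ASSUMED_STACK_BYTES:.2f}% of the assumed stack")
```

The listed objects add up to 132 bytes, which is small next to any plausible kernel stack size, so these three variables alone would not explain running out of stack.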
* Re: Stack out of bounds in KFD on Arcturus
From: Grodzovsky, Andrey @ 2019-10-17 22:38 UTC
To: Kuehling, Felix; +Cc: amd-gfx@lists.freedesktop.org
Not that I'm aware of. Is there a special Kconfig flag that determines the stack
size?
Andrey
* Re: Stack out of bounds in KFD on Arcturus
From: Kuehling, Felix @ 2019-10-18 20:55 UTC
To: Grodzovsky, Andrey; +Cc: amd-gfx@lists.freedesktop.org
On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
> Not that I'm aware of. Is there a special Kconfig flag that determines the stack
> size?
I remember there used to be a Kconfig option to force a 4KB kernel
stack. I don't see it in the current kernel any more.
I don't have time to work on this myself. I'll create a ticket and see
if I can find someone to investigate.
Thanks,
Felix
>
> Andrey
>
> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>> I don't see why this problem would be specific to Arcturus. I don't see
>> any excessive allocations on the stack either. Also the code involved
>> here hasn't changed recently.
>>
>> Are you using some weird kernel config with a smaller stack? Is it
>> specific to a compiler version or some optimization flags? I've
>> sometimes seen function inlining cause excessive stack usage.
>>
>> Regards,
>> Felix
>>
>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>> He Felix - I see this on boot when working with Arcturus.
>>>
>>> Andrey
>>>
>>>
>>> [ 103.602092] kfd kfd: Allocated 3969056 bytes on gart
>>> [ 103.610769]
>>> ==================================================================
>>> [ 103.611469] BUG: KASAN: stack-out-of-bounds in
>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [ 103.611646] Read of size 4 at addr ffff8883cb19ee38 by task modprobe/1122
>>>
>>> [ 103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G
>>> O 5.3.0-rc3+ #45
>>> [ 103.611847] Hardware name: System manufacturer System Product
>>> Name/Z170-PRO, BIOS 1902 06/27/2016
>>> [ 103.611856] Call Trace:
>>> [ 103.611879] dump_stack+0x71/0xab
>>> [ 103.611907] print_address_description+0x1da/0x3c0
>>> [ 103.612453] ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [ 103.612479] __kasan_report+0x13f/0x1a0
>>> [ 103.613022] ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [ 103.613580] ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [ 103.613604] kasan_report+0xe/0x20
>>> [ 103.614149] kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [ 103.614762] ? kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu]
>>> [ 103.614796] ? __alloc_pages_nodemask+0x2c9/0x560
>>> [ 103.614824] ? __alloc_pages_slowpath+0x1390/0x1390
>>> [ 103.614898] ? kmalloc_order+0x63/0x70
>>> [ 103.615469] kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu]
>>> [ 103.616054] ? kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu]
>>> [ 103.616095] ? up_write+0x4b/0x70
>>> [ 103.616649] kfd_topology_add_device+0x98d/0xb10 [amdgpu]
>>> [ 103.617207] ? kfd_topology_shutdown+0x60/0x60 [amdgpu]
>>> [ 103.617743] ? start_cpsch+0x2ff/0x3a0 [amdgpu]
>>> [ 103.617777] ? mutex_lock_io_nested+0xac0/0xac0
>>> [ 103.617807] ? __mutex_unlock_slowpath+0xda/0x420
>>> [ 103.617848] ? __mutex_unlock_slowpath+0xda/0x420
>>> [ 103.617877] ? wait_for_completion+0x200/0x200
>>> [ 103.618461] ? start_cpsch+0x38b/0x3a0 [amdgpu]
>>> [ 103.619011] ? create_queue_cpsch+0x670/0x670 [amdgpu]
>>> [ 103.619573] ? kfd_iommu_device_init+0x92/0x1e0 [amdgpu]
>>> [ 103.620112] ? kfd_iommu_resume+0x2c/0x2c0 [amdgpu]
>>> [ 103.620655] ? kfd_iommu_check_device+0xf0/0xf0 [amdgpu]
>>> [ 103.621228] kgd2kfd_device_init+0x474/0x870 [amdgpu]
>>> [ 103.621781] amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu]
>>> [ 103.622329] ? amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu]
>>> [ 103.622344] ? kmsg_dump_rewind_nolock+0x59/0x59
>>> [ 103.622895] ? amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu]
>>> [ 103.623424] amdgpu_device_init+0x1bbe/0x2f00 [amdgpu]
>>> [ 103.623819] ? amdgpu_device_has_dc_support+0x30/0x30 [amdgpu]
>>> [ 103.623842] ? __isolate_free_page+0x290/0x290
>>> [ 103.623852] ? fs_reclaim_acquire.part.97+0x5/0x30
>>> [ 103.623891] ? __alloc_pages_nodemask+0x2c9/0x560
>>> [ 103.623912] ? __alloc_pages_slowpath+0x1390/0x1390
>>> [ 103.623945] ? kasan_unpoison_shadow+0x31/0x40
>>> [ 103.623970] ? kmalloc_order+0x63/0x70
>>> [ 103.624337] amdgpu_driver_load_kms+0xd9/0x430 [amdgpu]
>>> [ 103.624690] ? amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu]
>>> [ 103.624756] ? drm_dev_register+0x19c/0x310 [drm]
>>> [ 103.624768] ? __kasan_slab_free+0x133/0x160
>>> [ 103.624849] drm_dev_register+0x1f5/0x310 [drm]
>>> [ 103.625212] amdgpu_pci_probe+0x109/0x1f0 [amdgpu]
>>> [ 103.625565] ? amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu]
>>> [ 103.625580] local_pci_probe+0x74/0xd0
>>> [ 103.625603] pci_device_probe+0x1fa/0x310
>>> [ 103.625620] ? pci_device_remove+0x1c0/0x1c0
>>> [ 103.625640] ? sysfs_do_create_link_sd.isra.2+0x74/0xe0
>>> [ 103.625673] really_probe+0x367/0x5d0
>>> [ 103.625700] driver_probe_device+0x177/0x1b0
>>> [ 103.625721] device_driver_attach+0x8a/0x90
>>> [ 103.625737] ? device_driver_attach+0x90/0x90
>>> [ 103.625746] __driver_attach+0xeb/0x190
>>> [ 103.625765] ? device_driver_attach+0x90/0x90
>>> [ 103.625773] bus_for_each_dev+0xe4/0x160
>>> [ 103.625789] ? subsys_dev_iter_exit+0x10/0x10
>>> [ 103.625829] bus_add_driver+0x277/0x330
>>> [ 103.625855] driver_register+0xc6/0x1a0
>>> [ 103.625866] ? 0xffffffffa0d88000
>>> [ 103.625880] do_one_initcall+0xd3/0x334
>>> [ 103.625895] ? trace_event_raw_event_initcall_finish+0x150/0x150
>>> [ 103.625911] ? kasan_unpoison_shadow+0x31/0x40
>>> [ 103.625924] ? __kasan_kmalloc+0xd5/0xf0
>>> [ 103.625946] ? kmem_cache_alloc_trace+0x154/0x300
>>> [ 103.625955] ? kasan_unpoison_shadow+0x31/0x40
>>> [ 103.625985] do_init_module+0xec/0x354
>>> [ 103.626011] load_module+0x3c91/0x4980
>>> [ 103.626118] ? module_frob_arch_sections+0x20/0x20
>>> [ 103.626132] ? ima_read_file+0x10/0x10
>>> [ 103.626142] ? vfs_read+0x127/0x190
>>> [ 103.626163] ? kernel_read+0x95/0xb0
>>> [ 103.626187] ? kernel_read_file+0x1a5/0x340
>>> [ 103.626277] ? __do_sys_finit_module+0x175/0x1b0
>>> [ 103.626287] __do_sys_finit_module+0x175/0x1b0
>>> [ 103.626301] ? __ia32_sys_init_module+0x40/0x40
>>> [ 103.626338] ? lock_downgrade+0x390/0x390
>>> [ 103.626396] ? vtime_user_exit+0xc8/0xe0
>>> [ 103.626423] do_syscall_64+0x7d/0x250
>>> [ 103.626440] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [ 103.626450] RIP: 0033:0x7f09984854d9
>>> [ 103.626461] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
>>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01 48
>>> [ 103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX:
>>> 0000000000000139
>>> [ 103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX:
>>> 00007f09984854d9
>>> [ 103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI:
>>> 0000000000000006
>>> [ 103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09:
>>> 0000000000000000
>>> [ 103.626500] R10: 0000000000000006 R11: 0000000000000246 R12:
>>> 0000000000000000
>>> [ 103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15:
>>> 0000000000000013
>>>
>>> [ 103.626592] The buggy address belongs to the page:
>>> [ 103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0
>>> mapping:0000000000000000 index:0x0
>>> [ 103.626675] flags: 0x2ffff0000000000()
>>> [ 103.626686] raw: 02ffff0000000000 0000000000000000 ffffea000f2c6788
>>> 0000000000000000
>>> [ 103.626696] raw: 0000000000000000 0000000000000000 00000000ffffffff
>>> 0000000000000000
>>> [ 103.626702] page dumped because: kasan: bad access detected
>>>
>>> [ 103.626742] addr ffff8883cb19ee38 is located in stack of task
>>> modprobe/1122 at offset 264 in frame:
>>> [ 103.627233] kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]
>>>
>>> [ 103.627346] this frame has 3 objects:
>>> [ 103.627405] [32, 36) 'avail_size'
>>> [ 103.627410] [96, 120) 'local_mem_info'
>>> [ 103.627466] [160, 264) 'cu_info'
>>>
>>> [ 103.627602] Memory state around the buggy address:
>>> [ 103.627675] ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04 f4 f4
>>> f4 f2 f2
>>> [ 103.627780] ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00 00
>>> 00 00 00
>>> [ 103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3
>>> f3 00 00
>>> [ 103.627989] ^
>>> [ 103.628065] ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00
>>> [ 103.628169] ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00
>>> 00 00 00
>>> [ 103.628273]
>>> ==================================================================
>>>
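For readers decoding the report above, here is a minimal Python sketch (not part of the original thread) that maps the faulting address onto the quoted shadow rows. It assumes the generic KASAN layout of one shadow byte per 8 bytes of memory, and that values 0xf1 to 0xf4 are stack redzone markers in this kernel version. The helper names are hypothetical.

```python
# Illustrative only: every value is copied from the KASAN report quoted above.
SHADOW_GRANULE = 8  # assumption: generic KASAN, one shadow byte per 8 bytes

SHADOW_ROWS = {
    0xFFFF8883CB19ED80: "f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00 00 00 00 00",
    0xFFFF8883CB19EE00: "00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00",
}
BUGGY_ADDR = 0xFFFF8883CB19EE38
FRAME_OFFSET = 264  # "at offset 264 in frame"
FRAME_OBJECTS = {
    "avail_size": (32, 36),
    "local_mem_info": (96, 120),
    "cu_info": (160, 264),
}


def shadow_byte(addr):
    """Return the shadow byte covering addr, looked up in the quoted rows."""
    for base, row in SHADOW_ROWS.items():
        cells = row.split()
        if base <= addr < base + len(cells) * SHADOW_GRANULE:
            return int(cells[(addr - base) // SHADOW_GRANULE], 16)
    raise KeyError(hex(addr))


def nearest_object_below(offset):
    """Return (name, distance past its end) for the closest object ending at or before offset."""
    name, (_, end) = max(
        ((n, r) for n, r in FRAME_OBJECTS.items() if r[1] <= offset),
        key=lambda item: item[1][1],
    )
    return name, offset - end


frame_base = BUGGY_ADDR - FRAME_OFFSET
cu_start = frame_base + FRAME_OBJECTS["cu_info"][0]

print(hex(shadow_byte(BUGGY_ADDR)))        # shadow byte at the faulting address
print(hex(shadow_byte(cu_start)))          # first granule of cu_info
print(nearest_object_below(FRAME_OFFSET))  # object the read starts just after
```

Under these assumptions the 4-byte read starts exactly at the end of 'cu_info' (offset 264 of a [160, 264) object), on the first redzone granule after it. That suggests an access just past 'cu_info' rather than a stack that ran out of space.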
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* RE: Stack out of bounds in KFD on Arcturus
[not found] ` <134de413-61fe-a6ee-96ac-73b694fcb94c-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-18 21:31 ` Zeng, Oak
[not found] ` <BL0PR12MB25806E425A051EA059C805EF806C0-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Zeng, Oak @ 2019-10-18 21:31 UTC (permalink / raw)
To: Kuehling, Felix, Grodzovsky, Andrey
Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
[-- Attachment #1: Type: text/plain, Size: 9702 bytes --]
Hi Andrey,
What is your system configuration? I haven't seen this issue before. See also the attached QA configuration, which you can compare against yours for differences.
Also, I believe the default kernel stack size on x86-64 is 16 KB. Is that what your Kconfig uses?
Regards,
Oak
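As a rough illustration of the stack-size question, the sketch below (not part of the original thread) computes the x86-64 kernel stack size the way the kernel headers are understood to define it: THREAD_SIZE = PAGE_SIZE << THREAD_SIZE_ORDER, with the order raised by one when CONFIG_KASAN is set. The function name is hypothetical, and the constants are an assumption about kernels of this era, not something stated in the thread.

```python
PAGE_SIZE = 4096  # x86-64 base page size


def thread_size(kasan_enabled):
    """Kernel stack size in bytes, assuming THREAD_SIZE_ORDER = 2 + KASAN_STACK_ORDER."""
    kasan_stack_order = 1 if kasan_enabled else 0  # assumption: order grows by 1 under KASAN
    thread_size_order = 2 + kasan_stack_order
    return PAGE_SIZE << thread_size_order


print(thread_size(False) // 1024, "KB without KASAN")
print(thread_size(True) // 1024, "KB with KASAN")
```

If that holds, the 16 KB figure applies to a non-KASAN build, and a KASAN build like the one in the report would run with larger stacks. The size would then follow from CONFIG_KASAN rather than from a dedicated stack-size option.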
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Kuehling, Felix
Sent: Friday, October 18, 2019 4:55 PM
To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: Stack out of bounds in KFD on Arcturus
On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
> Not that I'm aware of. Is there a special Kconfig flag that determines
> the stack size?
I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.
I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.
Thanks,
Felix
>
> Andrey
>
> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>> I don't see why this problem would be specific to Arcturus. I don't
>> see any excessive allocations on the stack either. Also the code
>> involved here hasn't changed recently.
>>
>> Are you using some weird kernel config with a smaller stack? Is it
>> specific to a compiler version or some optimization flags? I've
>> sometimes seen function inlining cause excessive stack usage.
>>
>> Regards,
>> Felix
>>
>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>> He Felix - I see this on boot when working with Arcturus.
>>>
>>> Andrey
>>>
>>>
>>>
[-- Attachment #2: Type: message/rfc822, Size: 212414 bytes --]
[-- Attachment #2.1.1: Type: text/plain, Size: 12150 bytes --]
MI100 HW Enablement Linux SW Stack(VBIOS, FW, DKMS Kernel Driver) Integration Test Report<http://confluence.amd.com/x/c9iEC>
Dashboard
Hardware info
AMDGPU Linux Stack
Linux Distro
Status
SUT-1 Configuration:
* Motherboard: ASUS PRIME Z270-A
* CPU: i7-7700K CPU @ 4.20GHz
* Memory: Kingston DDR4 2133 8GB *2
* ASIC: MI100 socket PA Non-Secure board revB 102-D34101-01
* VBIOS:
* 10 Oct 2019 D3410100.019<http://storeiis2/BIOSTest/SignedBIOS/G0484/484666/D3410100.019>
* ROCm DKMS Package:
* Firmware: http://git.amd.com:8080/plugins/gitiles/brahma/ec/utility/brahma-utils/+log/amd-staging
* commit: 18bb9059 firmware/arcturus: update rlc firmware
* version: RLC: 21.1, MEC: 33.45, SMC: 54.7, SDMA: 34.44, SOS: 0x0017002a; ASD: 0x21000018; XGMI TA: 0x20000003; RAS TA: 1B00000C
* Kernel: http://git.amd.com:8080/plugins/gitiles/brahma/ec/linux/+log/amd-mainline-dkms-5.0
* commit: 6b05d1f005c0 drm/amdgpu/swSMU: custom UMD pstate peak clock for navi14
* amdgpu-dkms package: amdgpu-dkms_1910121037-6b05d1f005c0_all.deb<http://srdcartifactory/artifactory/api/download/linux-ci-generic-local/builds/canli/secure/amdgpu-dkms_1910121037-6b05d1f005c0_all.deb>
* ROCm LKG build for UMD:
* 20 Sep 2019 http://rocm-ci/job/compute-rocm-dkms-no-npi/1004/
Ubuntu 18.04.3 LTS
PROMOTABLE
SUT-2 Configuration:
* Motherboard: Supermicro X10DRG-OT (SYS-4028GR-TRT2)
* CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
* Memory: Micron DDR4 2667 MT/s 64GB *12
* ASIC: MI100 102-D34302-00 PCIe Product Board 32GB (U/F) Non-Secure board XGMI 2P
* VBIOS:
* 10 Oct 2019 D3430200.L19<http://storeiis2/BIOSTest/SignedBIOS/G0484/484668/D3430200.L19> (enable flag: ENABLE_SEC_POLICY_ON_UNSEC_ASIC and Large Bar)
* ROCm DKMS Package:
* Firmware: http://git.amd.com:8080/plugins/gitiles/brahma/ec/utility/brahma-utils/+log/amd-staging
* commit: 18bb9059 firmware/arcturus: update rlc firmware
* version: RLC: 21.1, MEC: 33.45, SMC: 54.7, SDMA: 34.44, SOS: 0x0017002a; ASD: 0x21000018; XGMI TA: 0x20000003; RAS TA: 1B00000C
* Kernel: http://git.amd.com:8080/plugins/gitiles/brahma/ec/linux/+log/amd-mainline-dkms-5.0
* commit: 6b05d1f005c0 drm/amdgpu/swSMU: custom UMD pstate peak clock for navi14
* amdgpu-dkms package: amdgpu-dkms_1910121037-6b05d1f005c0_all.deb<http://srdcartifactory/artifactory/api/download/linux-ci-generic-local/builds/canli/secure/amdgpu-dkms_1910121037-6b05d1f005c0_all.deb>
* ROCm LKG build for UMD:
* 20 Sep 2019 http://rocm-ci/job/compute-rocm-dkms-no-npi/1004/
Ubuntu 18.04.3 LTS
PROMOTABLE
SUT-3 Configuration:
* Motherboard: Supermicro X10DRG-Q (SYS-7048GR-TR)
* CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
* Memory: Micron DDR4 2133 MT/s 16GB *7
* ASIC: MI100 102-D34302-00 PCIe Product Board 32GB (U/F) Non-Secure board *2 non-XGMI
Reference:
MI100 VBIOS: http://home.amd.com/VideoBios/Video%20BIOS%20Releases/SingleASICRelease.asp?AsicName=MI100
ROCm build for MI100: http://rocm-ci/job/compute-rocm-dkms-no-npi/
How to replace kernel driver and FWs: How to install and replace kernel driver and FWs for MI100<http://confluence.amd.com/display/~canli/How+to+install+and+replace+kernel+driver+and+FWs+for+MI100>
Executive Summary
What's Current and New?
* Outstanding issues:
* Issue can be observed with VBIOS L18 on XGMI 2P but not on non-XGMI
* SWDEV-207030<http://ontrack-internal.amd.com/browse/SWDEV-207030> - [MI100] kfdtest subtests failed on XGMI 2P with large bar enabled Opened
* Existing issues:
* SWDEV-204604<http://ontrack-internal.amd.com/browse/SWDEV-204604> - [MI100 XGMI] UCLK/SOCCLK/FCLK DPM are still disabled with XGMI enabled Opened
* SWDEV-201443<http://ontrack-internal.amd.com/browse/SWDEV-201443> - Linux Pro: KFDMemoryTest.BigBufferStressTest fails Assessed
* VBIOS upgraded to v19
* RLC FW upgrade to 21.1, SOS FW upgrade to SOS: 0x0017002a
* Power Feature enablement status
Feature              | SMU FW Ready    | AMDGPU Kernel Ready
DPM_PREFETCHER       | Yes             | Yes
DPM_GFXCLK           | Yes             | Yes
DPM_UCLK             | Yes             | Checking on driver side; SWDEV-204604 Opened
DPM_SOCCLK           | Yes             | Checking on driver side; SWDEV-204604 Opened
DPM_FCLK             | Yes             | Checking on driver side; SWDEV-204604 Opened
DPM_XGMI             | No              | No
DS_GFXCLK            | Yes             | Yes
DS_SOCCLK            | Yes             | Yes
DS_LCLK              | Yes             | Yes; Require ASPM L1 support in Driver and M/B (Under discussion)
DS_FCLK              | Yes             | Yes
GFX_ULV              | Yes             | Yes
DPM_VCN              | Yes             | VCN disabled for PSP front door loading due to the issue: SWDEV-203022 Assessed
RSMU_SMN_CG          | Yes             | Yes
WAFL_CG              | No              | No
PPT                  | Yes             | Yes; Depends on PPTable setting to enable 4 PPT (PPTable Not ready) or 1 PPT
TDC                  | Yes             | Yes
APCC_PLUS            | Yes             | Pending on pptable release
VR0HOT               | Yes             | Yes
VR1HOT               | No              | No
FW_CTF               | Yes             | Yes
FAN CONTROL          | Not POR         | N/A
THERMAL CONTROL      | Yes             | Yes
OUT_OF_BAND_MONITOR  | Yes             | Yes
TEMP_DEPENDENT_VMIN  | Yes             | Pending on pptable release
GFX CG               | NOT SMU feature | Yes
HDP CG               | NOT SMU feature | Yes
SDMA CG              | NOT SMU feature | Yes
MMHUB CG             | NOT SMU feature | Yes
UMC CG               | NOT SMU feature | Yes
DF CG                | NOT SMU feature | Yes
ATHUB CG             | NOT SMU feature | Yes
PSP CG               | NOT SMU feature | Checking the readiness
User Mode Stable Power State | NOT SMU feature | Yes
Workload Aware Dynamic Power Management / User Power Control | Yes | Yes
Test Coverage
Test case | MI100 GPU (D34101) | MI100 mGPU (D34302*2 XGMI 2P) | MI100 mGPU (D34302*2 non-XGMI) | Comments
Base
amdgpu_test
Basic Tests | PASS | PASS | PASS |
BO Tests | PASS | PASS | PASS |
VCN Tests | N/A | N/A | N/A | Skip VCN Test due to Skip VCN IP initialization after switch to FW front door loading. SWDEV-203022 Assessed
VM Tests | PASS | PASS | PASS |
Power
GFX DPM check | PASS | PASS | PASS |
Force GFX DPM level check | PASS | PASS | PASS |
GFX ULV check | PASS | PASS | PASS |
DS GFXCLK check | PASS | PASS | PASS |
DS SOCCLK check | PASS | FAIL | PASS | SWDEV-204604 Opened
DS FCLK check | PASS | FAIL | PASS | SWDEV-204604 Opened
ROCr/KFD
rocm_info | PASS | PASS | PASS |
kfdtest | PASS | FAIL | PASS | KFDPerformanceTest.P2PBandWidthTest and KFDGraphicsInterop.RegisterForeignDeviceMem tests failed via XGMI on Large bar enabled: SWDEV-207030 Opened; Existing issue with large size system memory: SWDEV-201443 Assessed
rocrtst | PASS | PASS | PASS |
rocm_bandwidth_test | PASS | PASS | PASS | Using RBT built in rocm no-npi-dkms build#1060 to verify the data path passed; Bad performance via XGMI
rocm-smi | PASS | PASS | PASS |
rsmitst | PASS | PASS | PASS |
OCL
ocltst | PASS | PASS | PASS |
HIP
hipsamples_utils | PASS | PASS | PASS |
Frameworks
Tensorflow
tf_convolutional_quick_test | PASS | PASS | PASS |
Pytorch unit test
test_autograd | PASS | PASS | PASS |
test_nn | PASS | PASS | PASS |
MIOpen unit test
MIOpen (HIP) | PASS | PASS | PASS |
MIOpen(OpenCL) | PASS | PASS | PASS |
Math libs
rocBLAS | PASS | PASS | PASS | Run quick tests only
hipBLAS | PASS | PASS | PASS |
Additional Information
Note: All tests run with latest VBIOS/FW/Kernel and ROCm LKG build
Defect list
Key | Summary | triage assignment | target sw release | Assignee
SWDEV-207030<http://ontrack-internal.amd.com/browse/SWDEV-207030?src=confmacro> | [MI100] kfdtest subtests failed on XGMI 2P with large bar enabled | VBIOS | | Tao, Cherry
SWDEV-204604<http://ontrack-internal.amd.com/browse/SWDEV-204604?src=confmacro> | [MI100 XGMI] UCLK/SOCCLK/FCLK DPM are still disabled with XGMI enabled | Base dGPU Enablement | | Quan, Evan
SWDEV-203022<http://ontrack-internal.amd.com/browse/SWDEV-203022?src=confmacro> | MI100 VCN engine hangs after FW loading with PSP | Multimedia | Staging-DRM-Next | Zhu, James
SWDEV-202188<http://ontrack-internal.amd.com/browse/SWDEV-202188?src=confmacro> | [MI100] HSA_STATUS_ERROR_OUT_OF_RESOURCES when run rocminfo on Gigabyte Eypc platform | HSA KFD | | Keely, Sean
SWDEV-201817<http://ontrack-internal.amd.com/browse/SWDEV-201817?src=confmacro> | [MI100] rocrtst test failed on Gigabyte Eypc platform | Runtime | | Keely, Sean
SWDEV-200753<http://ontrack-internal.amd.com/browse/SWDEV-200753?src=confmacro> | [ROCm QA][no-npi-dkms][MI100] XGMI Links not working with 4P/2P | Base | ROC-Master | Clements, John
BCC: Rose, Danny <Danny.Rose@amd.com<mailto:Danny.Rose@amd.com>>; dl.MLSE.QA <dl.MLSE.QA@amd.com<mailto:dl.MLSE.QA@amd.com>>; Weyman, Jeff <Jeffrey.Weyman@amd.com<mailto:Jeffrey.Weyman@amd.com>>; Fan, Fai <Fai.Fan@amd.com<mailto:Fai.Fan@amd.com>>; Marsan, Luugi <Luugi.Marsan@amd.com<mailto:Luugi.Marsan@amd.com>>; sw.dl.ERP.LuugiM <sw.dl.ERP.LuugiM@amd.com<mailto:sw.dl.ERP.LuugiM@amd.com>>; dl.srdc_lnx_mi100 <dl.srdc_lnx_mi100@amd.com<mailto:dl.srdc_lnx_mi100@amd.com>>; Tim Writer <Tim.Writer@amd.com<mailto:Tim.Writer@amd.com>>; dl.SRDC_SW_Linux_dev dl.SRDC_SW_Linux_dev@amd.com<mailto:dl.SRDC_SW_Linux_dev@amd.com>; Guo, Miaomiao <Miaomiao.Guo@amd.com<http://amd.com>>; Yao, Yoyo <Yoyo.Yao@amd.com<http://amd.com>>; Jain, Praveen <Praveen.Jain@amd.com<http://amd.com>>; Arora, Jitesh <Jitesh.Arora@amd.com<mailto:Arora@amd.com>>; Zhu, James <James.Zhu@amd.com<http://amd.com>>; Bridgman, John <John.Bridgman@amd.com<http://amd.com>>; Islam, Jamin <Jamin.Islam@amd.com<http://amd.com>>; Koohestani, Ehsan <Ehsan.Koohestani@amd.com<http://amd.com>>; Wang, Cloud <Cloud.Wang@amd.com<http://amd.com>>; Gong, Yakov <Yakov.Gong@amd.com<http://amd.com>>; Yang, Alice (SRDC 3D) <Alice1.Yang@amd.com<mailto:Alice1.Yang@amd.com>>; Ma, Sigil <Sigil.Ma@amd.com<http://amd.com>>; Li, Colin <Colin.Li@amd.com<http://amd.com>>; Tang, Moon <Moon.Tang@amd.com<http://amd.com>>; Khan, Irfan <Irfan.Khan@amd.com<http://amd.com>>; Nasim, Kam <Kam.Nasim@amd.com<http://amd.com>>; Shavakh, Shadi <Shadi.Shavakh@amd.com<http://amd.com>>; Lotfi, Khatereh <Khatereh.Lotfi@amd.com<http://amd.com>>; Feng, Haifeng <Haifeng.Feng@amd.com<http://amd.com>>; Liang, Ming <Ming.Liang@amd.com<http://amd.com>>; "Min.Xu2@amd.com<http://amd.com>";dl.MI100_CTA <dl.MI100_CTA@amd.com<http://amd.com>>; Chen, Joe <Joe.Chen@amd.com<http://amd.com>>
Thanks,
Candice Li
[-- Attachment #2.1.2: Type: text/html, Size: 142872 bytes --]
* Re: Stack out of bounds in KFD on Arcturus
[not found] ` <BL0PR12MB25806E425A051EA059C805EF806C0-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2019-10-22 16:48 ` Grodzovsky, Andrey
[not found] ` <f865ffcd-2be0-0135-ba78-f78b370aa1fd-5C7GfCeVMHo@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Grodzovsky, Andrey @ 2019-10-22 16:48 UTC (permalink / raw)
To: Zeng, Oak, Kuehling, Felix; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
[-- Attachment #1: Type: text/plain, Size: 9959 bytes --]
On 10/18/19 5:31 PM, Zeng, Oak wrote:
> Hi Andrey,
>
> What is your system configuration? I didn’t see this issue before. Also see attached QA's configuration - you can compare to see any difference.
Attached is my lshw output.
>
> Also I believe for x86-64, the default kernel stack size is 16kb? Is this your Kconfig?
What do you mean by asking whether this is my Kconfig? Is there a
particular Kconfig flag that I can look for?
Andrey
>
> Regards,
> Oak
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Kuehling, Felix
> Sent: Friday, October 18, 2019 4:55 PM
> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD on Arcturus
>
> On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
>> Not that I'm aware of. Is there a special Kconfig flag that determines
>> the stack size?
> I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.
>
> I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.
>
> Thanks,
> Felix
>
>
>> Andrey
>>
>> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>>> I don't see why this problem would be specific to Arcturus. I don't
>>> see any excessive allocations on the stack either. Also the code
>>> involved here hasn't changed recently.
>>>
>>> Are you using some weird kernel config with a smaller stack? Is it
>>> specific to a compiler version or some optimization flags? I've
>>> sometimes seen function inlining cause excessive stack usage.
>>>
>>> Regards,
>>> Felix
>>>
>>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>>> He Felix - I see this on boot when working with Arcturus.
>>>>
>>>> Andrey
>>>>
>>>>
>>>>
[-- Attachment #2: lshw --]
[-- Type: text/plain, Size: 20351 bytes --]
dal@ubuntu-1604-test:~$ sudo lshw
[sudo] password for dal:
ubuntu-1604-test
description: Desktop Computer
product: System Product Name (SKU)
vendor: System manufacturer
version: System Version
serial: System Serial Number
width: 64 bits
capabilities: smbios-3.0 dmi-3.0 vsyscall32
configuration: boot=normal chassis=desktop family=To be filled by O.E.M. sku=SKU uuid=204CDE28-DAD7-DD11-B0DC-38D54727F70C
*-core
description: Motherboard
product: Z170-PRO
vendor: ASUSTeK COMPUTER INC.
physical id: 0
version: Rev 1.xx
serial: 160879880901004
slot: Default string
*-firmware
description: BIOS
vendor: American Megatrends Inc.
physical id: 0
version: 1902
date: 06/27/2016
size: 64KiB
capacity: 15MiB
capabilities: pci apm upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
*-cache:0
description: L1 cache
physical id: 41
slot: L1 Cache
size: 128KiB
capacity: 128KiB
capabilities: synchronous internal write-back data
configuration: level=1
*-cache:1
description: L1 cache
physical id: 42
slot: L1 Cache
size: 128KiB
capacity: 128KiB
capabilities: synchronous internal write-back instruction
configuration: level=1
*-cache:2
description: L2 cache
physical id: 43
slot: L2 Cache
size: 1MiB
capacity: 1MiB
capabilities: synchronous internal write-back unified
configuration: level=2
*-cache:3
description: L3 cache
physical id: 44
slot: L3 Cache
size: 8MiB
capacity: 8MiB
capabilities: synchronous internal write-back unified
configuration: level=3
*-cpu
description: CPU
product: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
vendor: Intel Corp.
physical id: 45
bus info: cpu@0
version: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
serial: To Be Filled By O.E.M.
slot: LGA1151
size: 3907MHz
capacity: 4200MHz
width: 64 bits
clock: 100MHz
capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp cpufreq
configuration: cores=4 enabledcores=4 threads=8
*-memory
description: System Memory
physical id: 46
slot: System board or motherboard
size: 16GiB
*-bank:0
description: [empty]
physical id: 0
slot: ChannelA-DIMM1
*-bank:1
description: DIMM Synchronous 2133 MHz (0.5 ns)
product: CMK16GX4M2B3000C15
vendor: Corsair
physical id: 1
serial: 00000000
slot: ChannelA-DIMM2
size: 8GiB
width: 64 bits
clock: 2133MHz (0.5ns)
*-bank:2
description: [empty]
physical id: 2
slot: ChannelB-DIMM1
*-bank:3
description: DIMM Synchronous 2133 MHz (0.5 ns)
product: CMK16GX4M2B3000C15
vendor: Corsair
physical id: 3
serial: 00000000
slot: ChannelB-DIMM2
size: 8GiB
width: 64 bits
clock: 2133MHz (0.5ns)
*-pci
description: Host bridge
product: Sky Lake Host Bridge/DRAM Registers
vendor: Intel Corporation
physical id: 100
bus info: pci@0000:00:00.0
version: 07
width: 32 bits
clock: 33MHz
configuration: driver=skl_uncore
resources: irq:0
*-pci:0
description: PCI bridge
product: Sky Lake PCIe Controller (x16)
vendor: Intel Corporation
physical id: 1
bus info: pci@0000:00:01.0
version: 07
width: 32 bits
clock: 33MHz
capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:120 ioport:e000(size=4096) memory:df000000-df1fffff ioport:c0000000(size=270532608)
*-pci
description: PCI bridge
product: Advanced Micro Devices, Inc. [AMD/ATI]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:01:00.0
version: 00
width: 32 bits
clock: 33MHz
capabilities: pci pm pciexpress msi normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:16 memory:df100000-df103fff ioport:e000(size=4096) memory:df000000-df0fffff ioport:c0000000(size=270532608)
*-pci
description: PCI bridge
product: Advanced Micro Devices, Inc. [AMD/ATI]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:02:00.0
version: 00
width: 32 bits
clock: 33MHz
capabilities: pci pm pciexpress msi normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:124 ioport:e000(size=4096) memory:df000000-df0fffff ioport:c0000000(size=270532608)
*-display UNCLAIMED
description: Display controller
product: Advanced Micro Devices, Inc. [AMD/ATI]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:03:00.0
version: 00
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi bus_master cap_list
configuration: latency=0
resources: memory:c0000000-cfffffff memory:d0000000-d01fffff ioport:e000(size=256) memory:df000000-df07ffff memory:df080000-df09ffff
*-display
description: VGA compatible controller
product: Sky Lake Integrated Graphics
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 06
width: 64 bits
clock: 33MHz
capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:133 memory:de000000-deffffff memory:b0000000-bfffffff ioport:f000(size=64) memory:c0000-dffff
*-usb
description: USB controller
product: Sunrise Point-H USB 3.0 xHCI Controller
vendor: Intel Corporation
physical id: 14
bus info: pci@0000:00:14.0
version: 31
width: 64 bits
clock: 33MHz
capabilities: pm msi xhci bus_master cap_list
configuration: driver=xhci_hcd latency=0
resources: irq:129 memory:df330000-df33ffff
*-usbhost:0
product: xHCI Host Controller
vendor: Linux 5.3.0-rc3+ xhci-hcd
physical id: 0
bus info: usb@1
logical name: usb1
version: 5.03
capabilities: usb-2.00
configuration: driver=hub slots=16 speed=480Mbit/s
*-usbhost:1
product: xHCI Host Controller
vendor: Linux 5.3.0-rc3+ xhci-hcd
physical id: 1
bus info: usb@2
logical name: usb2
version: 5.03
capabilities: usb-3.00
configuration: driver=hub slots=10 speed=5000Mbit/s
*-communication
description: Communication controller
product: Sunrise Point-H CSME HECI #1
vendor: Intel Corporation
physical id: 16
bus info: pci@0000:00:16.0
version: 31
width: 64 bits
clock: 33MHz
capabilities: pm msi bus_master cap_list
configuration: driver=mei_me latency=0
resources: irq:134 memory:df34d000-df34dfff
*-storage
description: SATA controller
product: Sunrise Point-H SATA controller [AHCI mode]
vendor: Intel Corporation
physical id: 17
bus info: pci@0000:00:17.0
version: 31
width: 32 bits
clock: 66MHz
capabilities: storage msi pm ahci_1.0 bus_master cap_list
configuration: driver=ahci latency=0
resources: irq:132 memory:df348000-df349fff memory:df34c000-df34c0ff ioport:f090(size=8) ioport:f080(size=4) ioport:f060(size=32) memory:df34b000-df34b7ff
*-pci:1
description: PCI bridge
product: Sunrise Point-H PCI Root Port #17
vendor: Intel Corporation
physical id: 1b
bus info: pci@0000:00:1b.0
version: f1
width: 32 bits
clock: 33MHz
capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:121 ioport:2000(size=4096) memory:7b000000-7b1fffff ioport:7b200000(size=2097152)
*-pci:2
description: PCI bridge
product: Sunrise Point-H PCI Express Root Port #1
vendor: Intel Corporation
physical id: 1c
bus info: pci@0000:00:1c.0
version: f1
width: 32 bits
clock: 33MHz
capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:122 ioport:3000(size=8192) memory:df200000-df2fffff ioport:7b400000(size=6291456)
*-pci
description: PCI bridge
product: Intel Corporation
vendor: Intel Corporation
physical id: 0
bus info: pci@0000:05:00.0
version: 00
width: 32 bits
clock: 33MHz
capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:16 ioport:3000(size=8192) memory:df200000-df2fffff ioport:7b400000(size=6291456)
*-pci:0
description: PCI bridge
product: Intel Corporation
vendor: Intel Corporation
physical id: 0
bus info: pci@0000:06:00.0
version: 00
width: 32 bits
clock: 33MHz
capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:125
*-pci:1
description: PCI bridge
product: Intel Corporation
vendor: Intel Corporation
physical id: 1
bus info: pci@0000:06:01.0
version: 00
width: 32 bits
clock: 33MHz
capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:126 ioport:3000(size=4096) ioport:7b400000(size=2097152)
*-pci:2
description: PCI bridge
product: Intel Corporation
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:06:02.0
version: 00
width: 32 bits
clock: 33MHz
capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:127 memory:df200000-df2fffff
*-usb
description: USB controller
product: Intel Corporation
vendor: Intel Corporation
physical id: 0
bus info: pci@0000:09:00.0
version: 00
width: 32 bits
clock: 33MHz
capabilities: pm msi pciexpress xhci cap_list
configuration: driver=xhci_hcd latency=0
resources: irq:130 memory:df200000-df20ffff
*-usbhost:0
product: xHCI Host Controller
vendor: Linux 5.3.0-rc3+ xhci-hcd
physical id: 0
bus info: usb@3
logical name: usb3
version: 5.03
capabilities: usb-2.00
configuration: driver=hub slots=2 speed=480Mbit/s
*-usbhost:1
product: xHCI Host Controller
vendor: Linux 5.3.0-rc3+ xhci-hcd
physical id: 1
bus info: usb@4
logical name: usb4
version: 5.03
capabilities: usb-3.00
configuration: driver=hub slots=2 speed=5000Mbit/s
*-pci:3
description: PCI bridge
product: Intel Corporation
vendor: Intel Corporation
physical id: 4
bus info: pci@0000:06:04.0
version: 00
width: 32 bits
clock: 33MHz
capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:128 ioport:4000(size=4096) ioport:7b600000(size=2097152)
*-pci:3
description: PCI bridge
product: Sunrise Point-H PCI Express Root Port #9
vendor: Intel Corporation
physical id: 1d
bus info: pci@0000:00:1d.0
version: f1
width: 32 bits
clock: 33MHz
capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:123 ioport:5000(size=4096) memory:7ba00000-7bbfffff ioport:7bc00000(size=2097152)
*-isa
description: ISA bridge
product: Sunrise Point-H LPC Controller
vendor: Intel Corporation
physical id: 1f
bus info: pci@0000:00:1f.0
version: 31
width: 32 bits
clock: 33MHz
capabilities: isa bus_master
configuration: latency=0
*-memory UNCLAIMED
description: Memory controller
product: Sunrise Point-H PMC
vendor: Intel Corporation
physical id: 1f.2
bus info: pci@0000:00:1f.2
version: 31
width: 32 bits
clock: 33MHz (30.3ns)
capabilities: bus_master
configuration: latency=0
resources: memory:df344000-df347fff
*-multimedia
description: Audio device
product: Sunrise Point-H HD Audio
vendor: Intel Corporation
physical id: 1f.3
bus info: pci@0000:00:1f.3
version: 31
width: 64 bits
clock: 33MHz
capabilities: pm msi bus_master cap_list
configuration: driver=snd_hda_intel latency=32
resources: irq:135 memory:df340000-df343fff memory:df320000-df32ffff
*-serial UNCLAIMED
description: SMBus
product: Sunrise Point-H SMBus
vendor: Intel Corporation
physical id: 1f.4
bus info: pci@0000:00:1f.4
version: 31
width: 64 bits
clock: 33MHz
configuration: latency=0
resources: memory:df34a000-df34a0ff ioport:f040(size=32)
*-network
description: Ethernet interface
product: Ethernet Connection (2) I219-V
vendor: Intel Corporation
physical id: 1f.6
bus info: pci@0000:00:1f.6
logical name: enp0s31f6
version: 31
serial: 38:d5:47:27:f7:0c
size: 1Gbit/s
capacity: 1Gbit/s
width: 32 bits
clock: 33MHz
capabilities: pm msi bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=3.2.6-k duplex=full firmware=0.7-4 ip=172.27.234.186 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
resources: irq:131 memory:df300000-df31ffff
*-scsi
physical id: 1
logical name: scsi4
capabilities: emulated
*-disk
description: ATA Disk
product: Samsung SSD 850
physical id: 0.0.0
bus info: scsi@4:0.0.0
logical name: /dev/sda
version: 2B6Q
serial: S251NX0H703541J
size: 238GiB (256GB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 signature=74397aa1
*-volume:0
description: EXT4 volume
vendor: Linux
physical id: 1
bus info: scsi@4:0.0.0,1
logical name: /dev/sda1
logical name: /
version: 1.0
serial: 80cc92c9-bd8b-47f9-82b8-14d0a93b29f9
size: 109GiB
capacity: 109GiB
capabilities: primary bootable journaled extended_attributes large_files huge_files dir_nlink extents ext4 ext2 initialized
configuration: created=2016-03-17 11:35:41 filesystem=ext4 lastmountpoint=/ modified=2019-10-18 16:56:08 mount.fstype=ext4 mount.options=rw,relatime,errors=remount-ro mounted=2019-10-18 16:22:05 state=mounted
*-volume:1
description: Extended partition
physical id: 2
bus info: scsi@4:0.0.0,2
logical name: /dev/sda2
size: 2043MiB
capacity: 2043MiB
capabilities: primary extended partitioned partitioned:extended
*-logicalvolume
description: Linux swap / Solaris partition
physical id: 5
logical name: /dev/sda5
capacity: 2043MiB
capabilities: nofs
*-power UNCLAIMED
description: To Be Filled By O.E.M.
product: To Be Filled By O.E.M.
vendor: To Be Filled By O.E.M.
physical id: 1
version: To Be Filled By O.E.M.
serial: To Be Filled By O.E.M.
capacity: 32768mWh
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: Stack out of bounds in KFD on Arcturus
[not found] ` <f865ffcd-2be0-0135-ba78-f78b370aa1fd-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-22 17:17 ` Zeng, Oak
[not found] ` <BL0PR12MB2580ED7FB1607624E3D884B280680-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Zeng, Oak @ 2019-10-22 17:17 UTC (permalink / raw)
To: Grodzovsky, Andrey, Kuehling, Felix
Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Sorry, I meant: is the kernel stack size 16KB in your Kconfig?
Oak
-----Original Message-----
From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Sent: Tuesday, October 22, 2019 12:49 PM
To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: Stack out of bounds in KFD on Arcturus
On 10/18/19 5:31 PM, Zeng, Oak wrote:
> Hi Andrey,
>
> What is your system configuration? I didn’t see this issue before. Also see attached QA's configuration - you can compare to see any difference.
Attached is my lshw
>
> Also I believe for x86-64, the default kernel stack size is 16kb? Is this your Kconfig?
What do you mean by "is this your Kconfig"? Is there a particular Kconfig flag that I can look for?
Andrey
>
> Regards,
> Oak
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
> Kuehling, Felix
> Sent: Friday, October 18, 2019 4:55 PM
> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD on Arcturus
>
> On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
>> Not that I am aware of. Is there a special Kconfig flag that determines
>> the stack size?
> I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.
>
> I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.
>
> Thanks,
> Felix
>
>
>> Andrey
>>
>> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>>> I don't see why this problem would be specific to Arcturus. I don't
>>> see any excessive allocations on the stack either. Also the code
>>> involved here hasn't changed recently.
>>>
>>> Are you using some weird kernel config with a smaller stack? Is it
>>> specific to a compiler version or some optimization flags? I've
>>> sometimes seen function inlining cause excessive stack usage.
>>>
>>> Regards,
>>> Felix
>>>
>>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>>> Hi Felix - I see this on boot when working with Arcturus.
>>>>
>>>> Andrey
>>>>
>>>>
>>>> [ 103.602092] kfd kfd: Allocated 3969056 bytes on gart [
>>>> 103.610769]
>>>> ==================================================================
>>>> [ 103.611469] BUG: KASAN: stack-out-of-bounds in
>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.611646] Read
>>>> of size 4 at addr ffff8883cb19ee38 by task modprobe/1122
>>>>
>>>> [ 103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G O
>>>> 5.3.0-rc3+ #45 [ 103.611847] Hardware name: System manufacturer
>>>> System Product Name/Z170-PRO, BIOS 1902 06/27/2016 [ 103.611856]
>>>> Call Trace:
>>>> [ 103.611879] dump_stack+0x71/0xab [ 103.611907]
>>>> print_address_description+0x1da/0x3c0
>>>> [ 103.612453] ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [
>>>> 103.612479] __kasan_report+0x13f/0x1a0 [ 103.613022] ?
>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.613580] ?
>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.613604]
>>>> kasan_report+0xe/0x20 [ 103.614149]
>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.614762] ?
>>>> kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu] [ 103.614796] ?
>>>> __alloc_pages_nodemask+0x2c9/0x560
>>>> [ 103.614824] ? __alloc_pages_slowpath+0x1390/0x1390
>>>> [ 103.614898] ? kmalloc_order+0x63/0x70 [ 103.615469]
>>>> kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu] [ 103.616054] ?
>>>> kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu] [ 103.616095] ?
>>>> up_write+0x4b/0x70 [ 103.616649]
>>>> kfd_topology_add_device+0x98d/0xb10 [amdgpu] [ 103.617207] ?
>>>> kfd_topology_shutdown+0x60/0x60 [amdgpu] [ 103.617743] ?
>>>> start_cpsch+0x2ff/0x3a0 [amdgpu] [ 103.617777] ?
>>>> mutex_lock_io_nested+0xac0/0xac0 [ 103.617807] ?
>>>> __mutex_unlock_slowpath+0xda/0x420
>>>> [ 103.617848] ? __mutex_unlock_slowpath+0xda/0x420
>>>> [ 103.617877] ? wait_for_completion+0x200/0x200 [ 103.618461] ?
>>>> start_cpsch+0x38b/0x3a0 [amdgpu] [ 103.619011] ?
>>>> create_queue_cpsch+0x670/0x670 [amdgpu] [ 103.619573] ?
>>>> kfd_iommu_device_init+0x92/0x1e0 [amdgpu] [ 103.620112] ?
>>>> kfd_iommu_resume+0x2c/0x2c0 [amdgpu] [ 103.620655] ?
>>>> kfd_iommu_check_device+0xf0/0xf0 [amdgpu] [ 103.621228]
>>>> kgd2kfd_device_init+0x474/0x870 [amdgpu] [ 103.621781]
>>>> amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu] [ 103.622329] ?
>>>> amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu] [ 103.622344] ?
>>>> kmsg_dump_rewind_nolock+0x59/0x59 [ 103.622895] ?
>>>> amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu] [ 103.623424]
>>>> amdgpu_device_init+0x1bbe/0x2f00 [amdgpu] [ 103.623819] ?
>>>> amdgpu_device_has_dc_support+0x30/0x30 [amdgpu] [ 103.623842] ?
>>>> __isolate_free_page+0x290/0x290 [ 103.623852] ?
>>>> fs_reclaim_acquire.part.97+0x5/0x30
>>>> [ 103.623891] ? __alloc_pages_nodemask+0x2c9/0x560
>>>> [ 103.623912] ? __alloc_pages_slowpath+0x1390/0x1390
>>>> [ 103.623945] ? kasan_unpoison_shadow+0x31/0x40 [ 103.623970] ?
>>>> kmalloc_order+0x63/0x70 [ 103.624337]
>>>> amdgpu_driver_load_kms+0xd9/0x430 [amdgpu] [ 103.624690] ?
>>>> amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu] [ 103.624756] ?
>>>> drm_dev_register+0x19c/0x310 [drm] [ 103.624768] ?
>>>> __kasan_slab_free+0x133/0x160 [ 103.624849]
>>>> drm_dev_register+0x1f5/0x310 [drm] [ 103.625212]
>>>> amdgpu_pci_probe+0x109/0x1f0 [amdgpu] [ 103.625565] ?
>>>> amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu] [ 103.625580]
>>>> local_pci_probe+0x74/0xd0 [ 103.625603]
>>>> pci_device_probe+0x1fa/0x310 [ 103.625620] ?
>>>> pci_device_remove+0x1c0/0x1c0 [ 103.625640] ?
>>>> sysfs_do_create_link_sd.isra.2+0x74/0xe0
>>>> [ 103.625673] really_probe+0x367/0x5d0 [ 103.625700]
>>>> driver_probe_device+0x177/0x1b0 [ 103.625721]
>>>> device_driver_attach+0x8a/0x90 [ 103.625737] ?
>>>> device_driver_attach+0x90/0x90 [ 103.625746]
>>>> __driver_attach+0xeb/0x190 [ 103.625765] ?
>>>> device_driver_attach+0x90/0x90 [ 103.625773]
>>>> bus_for_each_dev+0xe4/0x160 [ 103.625789] ?
>>>> subsys_dev_iter_exit+0x10/0x10 [ 103.625829]
>>>> bus_add_driver+0x277/0x330 [ 103.625855]
>>>> driver_register+0xc6/0x1a0 [ 103.625866] ? 0xffffffffa0d88000 [
>>>> 103.625880] do_one_initcall+0xd3/0x334 [ 103.625895] ?
>>>> trace_event_raw_event_initcall_finish+0x150/0x150
>>>> [ 103.625911] ? kasan_unpoison_shadow+0x31/0x40 [ 103.625924] ?
>>>> __kasan_kmalloc+0xd5/0xf0 [ 103.625946] ?
>>>> kmem_cache_alloc_trace+0x154/0x300
>>>> [ 103.625955] ? kasan_unpoison_shadow+0x31/0x40 [ 103.625985]
>>>> do_init_module+0xec/0x354 [ 103.626011] load_module+0x3c91/0x4980
>>>> [ 103.626118] ? module_frob_arch_sections+0x20/0x20
>>>> [ 103.626132] ? ima_read_file+0x10/0x10 [ 103.626142] ?
>>>> vfs_read+0x127/0x190 [ 103.626163] ? kernel_read+0x95/0xb0 [
>>>> 103.626187] ? kernel_read_file+0x1a5/0x340 [ 103.626277] ?
>>>> __do_sys_finit_module+0x175/0x1b0 [ 103.626287]
>>>> __do_sys_finit_module+0x175/0x1b0 [ 103.626301] ?
>>>> __ia32_sys_init_module+0x40/0x40 [ 103.626338] ?
>>>> lock_downgrade+0x390/0x390 [ 103.626396] ?
>>>> vtime_user_exit+0xc8/0xe0 [ 103.626423] do_syscall_64+0x7d/0x250
>>>> [ 103.626440] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [ 103.626450] RIP: 0033:0x7f09984854d9 [ 103.626461] Code: 00 f3
>>>> c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
>>>> 08 0f
>>>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01
>>>> 48 [ 103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX:
>>>> 0000000000000139
>>>> [ 103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX:
>>>> 00007f09984854d9
>>>> [ 103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI:
>>>> 0000000000000006
>>>> [ 103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09:
>>>> 0000000000000000
>>>> [ 103.626500] R10: 0000000000000006 R11: 0000000000000246 R12:
>>>> 0000000000000000
>>>> [ 103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15:
>>>> 0000000000000013
>>>>
>>>> [ 103.626592] The buggy address belongs to the page:
>>>> [ 103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0
>>>> mapping:0000000000000000 index:0x0
>>>> [ 103.626675] flags: 0x2ffff0000000000() [ 103.626686] raw:
>>>> 02ffff0000000000 0000000000000000 ffffea000f2c6788
>>>> 0000000000000000
>>>> [ 103.626696] raw: 0000000000000000 0000000000000000
>>>> 00000000ffffffff
>>>> 0000000000000000
>>>> [ 103.626702] page dumped because: kasan: bad access detected
>>>>
>>>> [ 103.626742] addr ffff8883cb19ee38 is located in stack of task
>>>> modprobe/1122 at offset 264 in frame:
>>>> [ 103.627233] kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]
>>>>
>>>> [ 103.627346] this frame has 3 objects:
>>>> [ 103.627405] [32, 36) 'avail_size'
>>>> [ 103.627410] [96, 120) 'local_mem_info'
>>>> [ 103.627466] [160, 264) 'cu_info'
>>>>
>>>> [ 103.627602] Memory state around the buggy address:
>>>> [ 103.627675] ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f2 f2
>>>> [ 103.627780] ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00 00 00 00 00
>>>> [ 103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00
>>>> [ 103.627989] ^
>>>> [ 103.628065] ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>> [ 103.628169] ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00 00
>>>> [ 103.628273]
>>>> ==================================================================
>>>>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Stack out of bounds in KFD on Arcturus
[not found] ` <BL0PR12MB2580ED7FB1607624E3D884B280680-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2019-10-22 17:28 ` Grodzovsky, Andrey
[not found] ` <bbba4ea5-f253-5974-397a-c38f8d4c857f-5C7GfCeVMHo@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Grodzovsky, Andrey @ 2019-10-22 17:28 UTC (permalink / raw)
To: Zeng, Oak, Kuehling, Felix; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
I don't know. Which Kconfig flag should I look at?
Andrey
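On the Kconfig question: x86-64 does not appear to have a Kconfig option that sets the kernel stack size directly. In kernels of this era (an assumption based on arch/x86/include/asm/page_64_types.h around v5.3), THREAD_SIZE is defined as PAGE_SIZE << (2 + KASAN_STACK_ORDER), where KASAN_STACK_ORDER is 1 when CONFIG_KASAN is set and 0 otherwise. A rough sketch of that arithmetic in Python, assuming 4 KiB pages; the function name x86_64_thread_size is invented here for illustration:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86-64 configuration


def x86_64_thread_size(config_text):
    """Kernel stack size in bytes implied by the text of a .config file.

    Mirrors THREAD_SIZE = PAGE_SIZE << (2 + KASAN_STACK_ORDER), which is an
    assumption about the v5.3-era x86-64 headers, not a queried value.
    """
    enabled = {line.strip() for line in config_text.splitlines()}
    kasan_stack_order = 1 if "CONFIG_KASAN=y" in enabled else 0
    return PAGE_SIZE << (2 + kasan_stack_order)


# Hypothetical .config fragments, not taken from the reporter's system.
with_kasan = "CONFIG_X86_64=y\nCONFIG_KASAN=y\n"
without_kasan = "CONFIG_X86_64=y\n# CONFIG_KASAN is not set\n"

print(x86_64_thread_size(with_kasan) // 1024, "KiB")
print(x86_64_thread_size(without_kasan) // 1024, "KiB")
```

Under these assumptions a KASAN build like the one that produced the report would come out at 32 KiB, and a non-KASAN build at 16 KiB, so CONFIG_KASAN would be the only symbol worth checking in the .config.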
On 10/22/19 1:17 PM, Zeng, Oak wrote:
> Sorry I meant is the kernel stack size 16KB in your kconfig?
>
> Oak
>
> -----Original Message-----
> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Sent: Tuesday, October 22, 2019 12:49 PM
> To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD on Arcturus
>
> On 10/18/19 5:31 PM, Zeng, Oak wrote:
>
>> Hi Andrey,
>>
>> What is your system configuration? I didn’t see this issue before. Also see attached QA's configuration - you can compare to see any difference.
>
> Attached is my lshw
>
>> Also I believe for x86-64, the default kernel stack size is 16kb? Is this your Kconfig?
>
> What do you mean if this is my Kconfig ? Is there particular Kconfig flag you know that i can look for ?
>
> Andrey
>
>
>> Regards,
>> Oak
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>> Kuehling, Felix
>> Sent: Friday, October 18, 2019 4:55 PM
>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org
>> Subject: Re: Stack out of bounds in KFD on Arcturus
>>
>> On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
>>> Not that I am aware of. Is there a special Kconfig flag that determines
>>> the stack size?
>> I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.
>>
>> I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.
>>
>> Thanks,
>> Felix
>>
>>
>>> Andrey
>>>
>>> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>>>> I don't see why this problem would be specific to Arcturus. I don't
>>>> see any excessive allocations on the stack either. Also the code
>>>> involved here hasn't changed recently.
>>>>
>>>> Are you using some weird kernel config with a smaller stack? Is it
>>>> specific to a compiler version or some optimization flags? I've
>>>> sometimes seen function inlining cause excessive stack usage.
>>>>
>>>> Regards,
>>>> Felix
>>>>
>>>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>>>> Hi Felix - I see this on boot when working with Arcturus.
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>> [ 103.602092] kfd kfd: Allocated 3969056 bytes on gart [
>>>>> 103.610769]
>>>>> ==================================================================
>>>>> [ 103.611469] BUG: KASAN: stack-out-of-bounds in
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.611646] Read
>>>>> of size 4 at addr ffff8883cb19ee38 by task modprobe/1122
>>>>>
>>>>> [ 103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G O
>>>>> 5.3.0-rc3+ #45 [ 103.611847] Hardware name: System manufacturer
>>>>> System Product Name/Z170-PRO, BIOS 1902 06/27/2016 [ 103.611856]
>>>>> Call Trace:
>>>>> [ 103.611879] dump_stack+0x71/0xab [ 103.611907]
>>>>> print_address_description+0x1da/0x3c0
>>>>> [ 103.612453] ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [
>>>>> 103.612479] __kasan_report+0x13f/0x1a0 [ 103.613022] ?
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.613580] ?
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.613604]
>>>>> kasan_report+0xe/0x20 [ 103.614149]
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.614762] ?
>>>>> kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu] [ 103.614796] ?
>>>>> __alloc_pages_nodemask+0x2c9/0x560
>>>>> [ 103.614824] ? __alloc_pages_slowpath+0x1390/0x1390
>>>>> [ 103.614898] ? kmalloc_order+0x63/0x70 [ 103.615469]
>>>>> kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu] [ 103.616054] ?
>>>>> kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu] [ 103.616095] ?
>>>>> up_write+0x4b/0x70 [ 103.616649]
>>>>> kfd_topology_add_device+0x98d/0xb10 [amdgpu] [ 103.617207] ?
>>>>> kfd_topology_shutdown+0x60/0x60 [amdgpu] [ 103.617743] ?
>>>>> start_cpsch+0x2ff/0x3a0 [amdgpu] [ 103.617777] ?
>>>>> mutex_lock_io_nested+0xac0/0xac0 [ 103.617807] ?
>>>>> __mutex_unlock_slowpath+0xda/0x420
>>>>> [ 103.617848] ? __mutex_unlock_slowpath+0xda/0x420
>>>>> [ 103.617877] ? wait_for_completion+0x200/0x200 [ 103.618461] ?
>>>>> start_cpsch+0x38b/0x3a0 [amdgpu] [ 103.619011] ?
>>>>> create_queue_cpsch+0x670/0x670 [amdgpu] [ 103.619573] ?
>>>>> kfd_iommu_device_init+0x92/0x1e0 [amdgpu] [ 103.620112] ?
>>>>> kfd_iommu_resume+0x2c/0x2c0 [amdgpu] [ 103.620655] ?
>>>>> kfd_iommu_check_device+0xf0/0xf0 [amdgpu] [ 103.621228]
>>>>> kgd2kfd_device_init+0x474/0x870 [amdgpu] [ 103.621781]
>>>>> amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu] [ 103.622329] ?
>>>>> amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu] [ 103.622344] ?
>>>>> kmsg_dump_rewind_nolock+0x59/0x59 [ 103.622895] ?
>>>>> amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu] [ 103.623424]
>>>>> amdgpu_device_init+0x1bbe/0x2f00 [amdgpu] [ 103.623819] ?
>>>>> amdgpu_device_has_dc_support+0x30/0x30 [amdgpu] [ 103.623842] ?
>>>>> __isolate_free_page+0x290/0x290 [ 103.623852] ?
>>>>> fs_reclaim_acquire.part.97+0x5/0x30
>>>>> [ 103.623891] ? __alloc_pages_nodemask+0x2c9/0x560
>>>>> [ 103.623912] ? __alloc_pages_slowpath+0x1390/0x1390
>>>>> [ 103.623945] ? kasan_unpoison_shadow+0x31/0x40 [ 103.623970] ?
>>>>> kmalloc_order+0x63/0x70 [ 103.624337]
>>>>> amdgpu_driver_load_kms+0xd9/0x430 [amdgpu] [ 103.624690] ?
>>>>> amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu] [ 103.624756] ?
>>>>> drm_dev_register+0x19c/0x310 [drm] [ 103.624768] ?
>>>>> __kasan_slab_free+0x133/0x160 [ 103.624849]
>>>>> drm_dev_register+0x1f5/0x310 [drm] [ 103.625212]
>>>>> amdgpu_pci_probe+0x109/0x1f0 [amdgpu] [ 103.625565] ?
>>>>> amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu] [ 103.625580]
>>>>> local_pci_probe+0x74/0xd0 [ 103.625603]
>>>>> pci_device_probe+0x1fa/0x310 [ 103.625620] ?
>>>>> pci_device_remove+0x1c0/0x1c0 [ 103.625640] ?
>>>>> sysfs_do_create_link_sd.isra.2+0x74/0xe0
>>>>> [ 103.625673] really_probe+0x367/0x5d0 [ 103.625700]
>>>>> driver_probe_device+0x177/0x1b0 [ 103.625721]
>>>>> device_driver_attach+0x8a/0x90 [ 103.625737] ?
>>>>> device_driver_attach+0x90/0x90 [ 103.625746]
>>>>> __driver_attach+0xeb/0x190 [ 103.625765] ?
>>>>> device_driver_attach+0x90/0x90 [ 103.625773]
>>>>> bus_for_each_dev+0xe4/0x160 [ 103.625789] ?
>>>>> subsys_dev_iter_exit+0x10/0x10 [ 103.625829]
>>>>> bus_add_driver+0x277/0x330 [ 103.625855]
>>>>> driver_register+0xc6/0x1a0 [ 103.625866] ? 0xffffffffa0d88000 [
>>>>> 103.625880] do_one_initcall+0xd3/0x334 [ 103.625895] ?
>>>>> trace_event_raw_event_initcall_finish+0x150/0x150
>>>>> [ 103.625911] ? kasan_unpoison_shadow+0x31/0x40 [ 103.625924] ?
>>>>> __kasan_kmalloc+0xd5/0xf0 [ 103.625946] ?
>>>>> kmem_cache_alloc_trace+0x154/0x300
>>>>> [ 103.625955] ? kasan_unpoison_shadow+0x31/0x40 [ 103.625985]
>>>>> do_init_module+0xec/0x354 [ 103.626011] load_module+0x3c91/0x4980
>>>>> [ 103.626118] ? module_frob_arch_sections+0x20/0x20
>>>>> [ 103.626132] ? ima_read_file+0x10/0x10 [ 103.626142] ?
>>>>> vfs_read+0x127/0x190 [ 103.626163] ? kernel_read+0x95/0xb0 [
>>>>> 103.626187] ? kernel_read_file+0x1a5/0x340 [ 103.626277] ?
>>>>> __do_sys_finit_module+0x175/0x1b0 [ 103.626287]
>>>>> __do_sys_finit_module+0x175/0x1b0 [ 103.626301] ?
>>>>> __ia32_sys_init_module+0x40/0x40 [ 103.626338] ?
>>>>> lock_downgrade+0x390/0x390 [ 103.626396] ?
>>>>> vtime_user_exit+0xc8/0xe0 [ 103.626423] do_syscall_64+0x7d/0x250
>>>>> [ 103.626440] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [ 103.626450] RIP: 0033:0x7f09984854d9 [ 103.626461] Code: 00 f3
>>>>> c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>>> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
>>>>> 08 0f
>>>>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01
>>>>> 48 [ 103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX:
>>>>> 0000000000000139
>>>>> [ 103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX:
>>>>> 00007f09984854d9
>>>>> [ 103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI:
>>>>> 0000000000000006
>>>>> [ 103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09:
>>>>> 0000000000000000
>>>>> [ 103.626500] R10: 0000000000000006 R11: 0000000000000246 R12:
>>>>> 0000000000000000
>>>>> [ 103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15:
>>>>> 0000000000000013
>>>>>
>>>>> [ 103.626592] The buggy address belongs to the page:
>>>>> [ 103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0
>>>>> mapping:0000000000000000 index:0x0
>>>>> [ 103.626675] flags: 0x2ffff0000000000() [ 103.626686] raw:
>>>>> 02ffff0000000000 0000000000000000 ffffea000f2c6788
>>>>> 0000000000000000
>>>>> [ 103.626696] raw: 0000000000000000 0000000000000000
>>>>> 00000000ffffffff
>>>>> 0000000000000000
>>>>> [ 103.626702] page dumped because: kasan: bad access detected
>>>>>
>>>>> [ 103.626742] addr ffff8883cb19ee38 is located in stack of task
>>>>> modprobe/1122 at offset 264 in frame:
>>>>> [ 103.627233] kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]
>>>>>
>>>>> [ 103.627346] this frame has 3 objects:
>>>>> [ 103.627405] [32, 36) 'avail_size'
>>>>> [ 103.627410] [96, 120) 'local_mem_info'
>>>>> [ 103.627466] [160, 264) 'cu_info'
>>>>>
>>>>> [ 103.627602] Memory state around the buggy address:
>>>>> [ 103.627675] ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04
>>>>> f4 f4
>>>>> f4 f2 f2
>>>>> [ 103.627780] ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00
>>>>> 00 00
>>>>> 00 00 00
>>>>> [ 103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3
>>>>> f3 f3
>>>>> f3 00 00
>>>>> [ 103.627989] ^ [
>>>>> 103.628065] ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> 00
>>>>> 00 00 00
>>>>> [ 103.628169] ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3
>>>>> f3 00
>>>>> 00 00 00
>>>>> [ 103.628273]
>>>>> ==================================================================
>>>>>
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: Stack out of bounds in KFD on Arcturus
[not found] ` <bbba4ea5-f253-5974-397a-c38f8d4c857f-5C7GfCeVMHo@public.gmane.org>
@ 2019-10-22 17:46 ` Zeng, Oak
[not found] ` <BL0PR12MB258071C07B015BBE3C4CA54A80680-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
0 siblings, 1 reply; 10+ messages in thread
From: Zeng, Oak @ 2019-10-22 17:46 UTC (permalink / raw)
To: Grodzovsky, Andrey, Kuehling, Felix
Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
Sorry, I searched my kconfig and could not find a stack size option anymore. Maybe the kernel stack size is no longer configurable.
Can you try your kernel on Vega10, Vega20, or Navi10? We want to know whether this is an MI100-specific issue.
Oak
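For context on the stack size question, here is a minimal sketch of how x86-64 derives the kernel stack size. It assumes the definitions in arch/x86/include/asm/page_64_types.h in kernels of this era (THREAD_SIZE_ORDER = 2 + KASAN_STACK_ORDER, with KASAN_STACK_ORDER set to 1 when CONFIG_KASAN is enabled). The Python names are illustrative only and are not kernel identifiers.

```python
# Sketch only: mirrors the x86-64 THREAD_SIZE arithmetic as an assumption,
# not a copy of kernel code.
PAGE_SIZE = 4096  # bytes, the usual x86-64 base page size


def thread_size(kasan_enabled: bool) -> int:
    """Return the assumed kernel stack size in bytes."""
    # Assumed: KASAN_STACK_ORDER is 1 with CONFIG_KASAN, otherwise 0.
    kasan_stack_order = 1 if kasan_enabled else 0
    # Assumed: THREAD_SIZE_ORDER = 2 + KASAN_STACK_ORDER.
    thread_size_order = 2 + kasan_stack_order
    # Assumed: THREAD_SIZE = PAGE_SIZE << THREAD_SIZE_ORDER.
    return PAGE_SIZE << thread_size_order


if __name__ == "__main__":
    print("without KASAN:", thread_size(False) // 1024, "KB")
    print("with KASAN:   ", thread_size(True) // 1024, "KB")
```

If those definitions hold, the stack size is fixed by the architecture headers rather than by a Kconfig option, and a KASAN build such as the one in the report would use a larger stack than the 16 KB default. Note also that a KASAN stack-out-of-bounds report flags an access outside a stack object, not exhaustion of the stack.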
-----Original Message-----
From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Sent: Tuesday, October 22, 2019 1:28 PM
To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: Stack out of bounds in KFD on Arcturus
I don't know. Which Kconfig flag should I look at?
Andrey
On 10/22/19 1:17 PM, Zeng, Oak wrote:
> Sorry I meant is the kernel stack size 16KB in your kconfig?
>
> Oak
>
> -----Original Message-----
> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Sent: Tuesday, October 22, 2019 12:49 PM
> To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix
> <Felix.Kuehling@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD on Arcturus
>
> On 10/18/19 5:31 PM, Zeng, Oak wrote:
>
>> Hi Andrey,
>>
>> What is your system configuration? I didn’t see this issue before. Also see attached QA's configuration - you can compare to see any difference.
>
> Attached is my lshw
>
>> Also I believe for x86-64, the default kernel stack size is 16kb? Is this your Kconfig?
>
> What do you mean by "is this your Kconfig"? Is there a particular Kconfig flag that I can look for?
>
> Andrey
>
>
>> Regards,
>> Oak
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>> Kuehling, Felix
>> Sent: Friday, October 18, 2019 4:55 PM
>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org
>> Subject: Re: Stack out of bounds in KFD on Arcturus
>>
>> On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
>>> Not that I am aware of. Is there a special Kconfig flag that
>>> determines the stack size?
>> I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.
>>
>> I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.
>>
>> Thanks,
>> Felix
>>
>>
>>> Andrey
>>>
>>> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>>>> I don't see why this problem would be specific to Arcturus. I don't
>>>> see any excessive allocations on the stack either. Also the code
>>>> involved here hasn't changed recently.
>>>>
>>>> Are you using some weird kernel config with a smaller stack? Is it
>>>> specific to a compiler version or some optimization flags? I've
>>>> sometimes seen function inlining cause excessive stack usage.
>>>>
>>>> Regards,
>>>> Felix
>>>>
>>>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>>>> Hi Felix - I see this on boot when working with Arcturus.
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>> [ 103.602092] kfd kfd: Allocated 3969056 bytes on gart [
>>>>> 103.610769]
>>>>> ==================================================================
>>>>> [ 103.611469] BUG: KASAN: stack-out-of-bounds in
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.611646]
>>>>> Read of size 4 at addr ffff8883cb19ee38 by task modprobe/1122
>>>>>
>>>>> [ 103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G O
>>>>> 5.3.0-rc3+ #45 [ 103.611847] Hardware name: System manufacturer
>>>>> System Product Name/Z170-PRO, BIOS 1902 06/27/2016 [ 103.611856]
>>>>> Call Trace:
>>>>> [ 103.611879] dump_stack+0x71/0xab [ 103.611907]
>>>>> print_address_description+0x1da/0x3c0
>>>>> [ 103.612453] ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>>>> [ 103.612479] __kasan_report+0x13f/0x1a0 [ 103.613022] ?
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.613580] ?
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.613604]
>>>>> kasan_report+0xe/0x20 [ 103.614149]
>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.614762] ?
>>>>> kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu] [ 103.614796] ?
>>>>> __alloc_pages_nodemask+0x2c9/0x560
>>>>> [ 103.614824] ? __alloc_pages_slowpath+0x1390/0x1390
>>>>> [ 103.614898] ? kmalloc_order+0x63/0x70 [ 103.615469]
>>>>> kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu] [ 103.616054] ?
>>>>> kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu] [ 103.616095] ?
>>>>> up_write+0x4b/0x70 [ 103.616649]
>>>>> kfd_topology_add_device+0x98d/0xb10 [amdgpu] [ 103.617207] ?
>>>>> kfd_topology_shutdown+0x60/0x60 [amdgpu] [ 103.617743] ?
>>>>> start_cpsch+0x2ff/0x3a0 [amdgpu] [ 103.617777] ?
>>>>> mutex_lock_io_nested+0xac0/0xac0 [ 103.617807] ?
>>>>> __mutex_unlock_slowpath+0xda/0x420
>>>>> [ 103.617848] ? __mutex_unlock_slowpath+0xda/0x420
>>>>> [ 103.617877] ? wait_for_completion+0x200/0x200 [ 103.618461] ?
>>>>> start_cpsch+0x38b/0x3a0 [amdgpu] [ 103.619011] ?
>>>>> create_queue_cpsch+0x670/0x670 [amdgpu] [ 103.619573] ?
>>>>> kfd_iommu_device_init+0x92/0x1e0 [amdgpu] [ 103.620112] ?
>>>>> kfd_iommu_resume+0x2c/0x2c0 [amdgpu] [ 103.620655] ?
>>>>> kfd_iommu_check_device+0xf0/0xf0 [amdgpu] [ 103.621228]
>>>>> kgd2kfd_device_init+0x474/0x870 [amdgpu] [ 103.621781]
>>>>> amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu] [ 103.622329] ?
>>>>> amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu] [ 103.622344] ?
>>>>> kmsg_dump_rewind_nolock+0x59/0x59 [ 103.622895] ?
>>>>> amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu] [ 103.623424]
>>>>> amdgpu_device_init+0x1bbe/0x2f00 [amdgpu] [ 103.623819] ?
>>>>> amdgpu_device_has_dc_support+0x30/0x30 [amdgpu] [ 103.623842] ?
>>>>> __isolate_free_page+0x290/0x290 [ 103.623852] ?
>>>>> fs_reclaim_acquire.part.97+0x5/0x30
>>>>> [ 103.623891] ? __alloc_pages_nodemask+0x2c9/0x560
>>>>> [ 103.623912] ? __alloc_pages_slowpath+0x1390/0x1390
>>>>> [ 103.623945] ? kasan_unpoison_shadow+0x31/0x40 [ 103.623970] ?
>>>>> kmalloc_order+0x63/0x70 [ 103.624337]
>>>>> amdgpu_driver_load_kms+0xd9/0x430 [amdgpu] [ 103.624690] ?
>>>>> amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu] [ 103.624756] ?
>>>>> drm_dev_register+0x19c/0x310 [drm] [ 103.624768] ?
>>>>> __kasan_slab_free+0x133/0x160 [ 103.624849]
>>>>> drm_dev_register+0x1f5/0x310 [drm] [ 103.625212]
>>>>> amdgpu_pci_probe+0x109/0x1f0 [amdgpu] [ 103.625565] ?
>>>>> amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu] [ 103.625580]
>>>>> local_pci_probe+0x74/0xd0 [ 103.625603]
>>>>> pci_device_probe+0x1fa/0x310 [ 103.625620] ?
>>>>> pci_device_remove+0x1c0/0x1c0 [ 103.625640] ?
>>>>> sysfs_do_create_link_sd.isra.2+0x74/0xe0
>>>>> [ 103.625673] really_probe+0x367/0x5d0 [ 103.625700]
>>>>> driver_probe_device+0x177/0x1b0 [ 103.625721]
>>>>> device_driver_attach+0x8a/0x90 [ 103.625737] ?
>>>>> device_driver_attach+0x90/0x90 [ 103.625746]
>>>>> __driver_attach+0xeb/0x190 [ 103.625765] ?
>>>>> device_driver_attach+0x90/0x90 [ 103.625773]
>>>>> bus_for_each_dev+0xe4/0x160 [ 103.625789] ?
>>>>> subsys_dev_iter_exit+0x10/0x10 [ 103.625829]
>>>>> bus_add_driver+0x277/0x330 [ 103.625855]
>>>>> driver_register+0xc6/0x1a0 [ 103.625866] ? 0xffffffffa0d88000 [
>>>>> 103.625880] do_one_initcall+0xd3/0x334 [ 103.625895] ?
>>>>> trace_event_raw_event_initcall_finish+0x150/0x150
>>>>> [ 103.625911] ? kasan_unpoison_shadow+0x31/0x40 [ 103.625924] ?
>>>>> __kasan_kmalloc+0xd5/0xf0 [ 103.625946] ?
>>>>> kmem_cache_alloc_trace+0x154/0x300
>>>>> [ 103.625955] ? kasan_unpoison_shadow+0x31/0x40 [ 103.625985]
>>>>> do_init_module+0xec/0x354 [ 103.626011]
>>>>> load_module+0x3c91/0x4980 [ 103.626118] ?
>>>>> module_frob_arch_sections+0x20/0x20
>>>>> [ 103.626132] ? ima_read_file+0x10/0x10 [ 103.626142] ?
>>>>> vfs_read+0x127/0x190 [ 103.626163] ? kernel_read+0x95/0xb0 [
>>>>> 103.626187] ? kernel_read_file+0x1a5/0x340 [ 103.626277] ?
>>>>> __do_sys_finit_module+0x175/0x1b0 [ 103.626287]
>>>>> __do_sys_finit_module+0x175/0x1b0 [ 103.626301] ?
>>>>> __ia32_sys_init_module+0x40/0x40 [ 103.626338] ?
>>>>> lock_downgrade+0x390/0x390 [ 103.626396] ?
>>>>> vtime_user_exit+0xc8/0xe0 [ 103.626423] do_syscall_64+0x7d/0x250
>>>>> [ 103.626440] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [ 103.626450] RIP: 0033:0x7f09984854d9 [ 103.626461] Code: 00 f3
>>>>> c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>>> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
>>>>> 08 0f
>>>>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89
>>>>> 01
>>>>> 48 [ 103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX:
>>>>> 0000000000000139
>>>>> [ 103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX:
>>>>> 00007f09984854d9
>>>>> [ 103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI:
>>>>> 0000000000000006
>>>>> [ 103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09:
>>>>> 0000000000000000
>>>>> [ 103.626500] R10: 0000000000000006 R11: 0000000000000246 R12:
>>>>> 0000000000000000
>>>>> [ 103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15:
>>>>> 0000000000000013
>>>>>
>>>>> [ 103.626592] The buggy address belongs to the page:
>>>>> [ 103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0
>>>>> mapping:0000000000000000 index:0x0 [ 103.626675] flags:
>>>>> 0x2ffff0000000000() [ 103.626686] raw:
>>>>> 02ffff0000000000 0000000000000000 ffffea000f2c6788
>>>>> 0000000000000000
>>>>> [ 103.626696] raw: 0000000000000000 0000000000000000
>>>>> 00000000ffffffff
>>>>> 0000000000000000
>>>>> [ 103.626702] page dumped because: kasan: bad access detected
>>>>>
>>>>> [ 103.626742] addr ffff8883cb19ee38 is located in stack of task
>>>>> modprobe/1122 at offset 264 in frame:
>>>>> [ 103.627233] kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]
>>>>>
>>>>> [ 103.627346] this frame has 3 objects:
>>>>> [ 103.627405] [32, 36) 'avail_size'
>>>>> [ 103.627410] [96, 120) 'local_mem_info'
>>>>> [ 103.627466] [160, 264) 'cu_info'
>>>>>
>>>>> [ 103.627602] Memory state around the buggy address:
>>>>> [ 103.627675] ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04
>>>>> f4 f4
>>>>> f4 f2 f2
>>>>> [ 103.627780] ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00
>>>>> 00 00
>>>>> 00 00 00
>>>>> [ 103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3
>>>>> f3 f3
>>>>> f3 00 00
>>>>> [ 103.627989] ^ [
>>>>> 103.628065] ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> 00
>>>>> 00 00 00
>>>>> [ 103.628169] ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3
>>>>> f3 00
>>>>> 00 00 00
>>>>> [ 103.628273]
>>>>> ==================================================================
>>>>>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Stack out of bounds in KFD on Arcturus
[not found] ` <BL0PR12MB258071C07B015BBE3C4CA54A80680-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2019-10-22 18:00 ` Grodzovsky, Andrey
0 siblings, 0 replies; 10+ messages in thread
From: Grodzovsky, Andrey @ 2019-10-22 18:00 UTC (permalink / raw)
To: Zeng, Oak, Kuehling, Felix; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
No problem on Vega 20
Andrey
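The KASAN report quoted below can be read mechanically. The following is a minimal sketch that uses only values copied from that report to check which frame object the faulting offset borders and which shadow byte covers the faulting address. The helper names are hypothetical, and treating every shadow value other than 00 to 07 as a redzone is a simplification.

```python
# Sketch only: decodes the numbers printed in the KASAN report of this thread.
KASAN_SHADOW_SCALE = 8  # one shadow byte describes 8 bytes of memory

# Values copied from the report.
BUGGY_ADDR = 0xFFFF8883CB19EE38
ACCESS_SIZE = 4
FRAME_OFFSET = 264
FRAME_OBJECTS = [
    (32, 36, "avail_size"),
    (96, 120, "local_mem_info"),
    (160, 264, "cu_info"),
]
SHADOW_ROW_BASE = 0xFFFF8883CB19EE00
SHADOW_ROW = "00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00"


def covering_object(offset, size, objects):
    """Name of the object that fully contains the access, or None."""
    for start, end, name in objects:
        if start <= offset and offset + size <= end:
            return name
    return None


def object_ending_before(offset, objects):
    """Object whose end is closest to, and not above, the offset."""
    candidates = [obj for obj in objects if obj[1] <= offset]
    return max(candidates, key=lambda obj: obj[1]) if candidates else None


def shadow_byte(addr, row_base, row):
    """Shadow value covering addr within one dumped shadow row."""
    values = [int(token, 16) for token in row.split()]
    return values[(addr - row_base) // KASAN_SHADOW_SCALE]


def describe(shadow):
    if shadow == 0:
        return "all 8 bytes addressable"
    if 1 <= shadow <= 7:
        return f"first {shadow} bytes addressable"
    return "poisoned (redzone)"


if __name__ == "__main__":
    print("inside object:", covering_object(FRAME_OFFSET, ACCESS_SIZE, FRAME_OBJECTS))
    print("just past:", object_ending_before(FRAME_OFFSET, FRAME_OBJECTS))
    sb = shadow_byte(BUGGY_ADDR, SHADOW_ROW_BASE, SHADOW_ROW)
    print(f"shadow byte: {sb:02x} ->", describe(sb))
```

With the report's numbers, the 4-byte read at frame offset 264 is not inside any of the three objects. It starts exactly where 'cu_info' ([160, 264)) ends, and the shadow byte covering the address is f4, which is a poisoned value.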
On 10/22/19 1:46 PM, Zeng, Oak wrote:
> Sorry, I searched my kconfig and could not find a stack size option anymore. Maybe the kernel stack size is no longer configurable.
>
> Can you try your kernel on Vega10, Vega20, or Navi10? We want to know whether this is an MI100-specific issue.
>
> Oak
>
> -----Original Message-----
> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Sent: Tuesday, October 22, 2019 1:28 PM
> To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD on Arcturus
>
> I don't know - what Kconfig flag should I look at ?
>
> Andrey
>
> On 10/22/19 1:17 PM, Zeng, Oak wrote:
>> Sorry I meant is the kernel stack size 16KB in your kconfig?
>>
>> Oak
>>
>> -----Original Message-----
>> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>> Sent: Tuesday, October 22, 2019 12:49 PM
>> To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix
>> <Felix.Kuehling@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org
>> Subject: Re: Stack out of bounds in KFD on Arcturus
>>
>> On 10/18/19 5:31 PM, Zeng, Oak wrote:
>>
>>> Hi Andrey,
>>>
>>> What is your system configuration? I didn’t see this issue before. Also see attached QA's configuration - you can compare to see any difference.
>> Attached is my lshw
>>
>>> Also I believe for x86-64, the default kernel stack size is 16kb? Is this your Kconfig?
>> What do you mean by "is this your Kconfig"? Is there a particular Kconfig flag that I can look for?
>>
>> Andrey
>>
>>
>>> Regards,
>>> Oak
>>>
>>> -----Original Message-----
>>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>>> Kuehling, Felix
>>> Sent: Friday, October 18, 2019 4:55 PM
>>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>>> Cc: amd-gfx@lists.freedesktop.org
>>> Subject: Re: Stack out of bounds in KFD on Arcturus
>>>
>>> On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
>>>> Not that I am aware of. Is there a special Kconfig flag that
>>>> determines the stack size?
>>> I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.
>>>
>>> I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.
>>>
>>> Thanks,
>>> Felix
>>>
>>>
>>>> Andrey
>>>>
>>>> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>>>>> I don't see why this problem would be specific to Arcturus. I don't
>>>>> see any excessive allocations on the stack either. Also the code
>>>>> involved here hasn't changed recently.
>>>>>
>>>>> Are you using some weird kernel config with a smaller stack? Is it
>>>>> specific to a compiler version or some optimization flags? I've
>>>>> sometimes seen function inlining cause excessive stack usage.
>>>>>
>>>>> Regards,
>>>>> Felix
>>>>>
>>>>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>>>>> Hi Felix - I see this on boot when working with Arcturus.
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>> [ 103.602092] kfd kfd: Allocated 3969056 bytes on gart [
>>>>>> 103.610769]
>>>>>> ==================================================================
>>>>>> [ 103.611469] BUG: KASAN: stack-out-of-bounds in
>>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.611646]
>>>>>> Read of size 4 at addr ffff8883cb19ee38 by task modprobe/1122
>>>>>>
>>>>>> [ 103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G O
>>>>>> 5.3.0-rc3+ #45 [ 103.611847] Hardware name: System manufacturer
>>>>>> System Product Name/Z170-PRO, BIOS 1902 06/27/2016 [ 103.611856]
>>>>>> Call Trace:
>>>>>> [ 103.611879] dump_stack+0x71/0xab [ 103.611907]
>>>>>> print_address_description+0x1da/0x3c0
>>>>>> [ 103.612453] ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>>>>> [ 103.612479] __kasan_report+0x13f/0x1a0 [ 103.613022] ?
>>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.613580] ?
>>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.613604]
>>>>>> kasan_report+0xe/0x20 [ 103.614149]
>>>>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu] [ 103.614762] ?
>>>>>> kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu] [ 103.614796] ?
>>>>>> __alloc_pages_nodemask+0x2c9/0x560
>>>>>> [ 103.614824] ? __alloc_pages_slowpath+0x1390/0x1390
>>>>>> [ 103.614898] ? kmalloc_order+0x63/0x70 [ 103.615469]
>>>>>> kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu] [ 103.616054] ?
>>>>>> kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu] [ 103.616095] ?
>>>>>> up_write+0x4b/0x70 [ 103.616649]
>>>>>> kfd_topology_add_device+0x98d/0xb10 [amdgpu] [ 103.617207] ?
>>>>>> kfd_topology_shutdown+0x60/0x60 [amdgpu] [ 103.617743] ?
>>>>>> start_cpsch+0x2ff/0x3a0 [amdgpu] [ 103.617777] ?
>>>>>> mutex_lock_io_nested+0xac0/0xac0 [ 103.617807] ?
>>>>>> __mutex_unlock_slowpath+0xda/0x420
>>>>>> [ 103.617848] ? __mutex_unlock_slowpath+0xda/0x420
>>>>>> [ 103.617877] ? wait_for_completion+0x200/0x200 [ 103.618461] ?
>>>>>> start_cpsch+0x38b/0x3a0 [amdgpu] [ 103.619011] ?
>>>>>> create_queue_cpsch+0x670/0x670 [amdgpu] [ 103.619573] ?
>>>>>> kfd_iommu_device_init+0x92/0x1e0 [amdgpu] [ 103.620112] ?
>>>>>> kfd_iommu_resume+0x2c/0x2c0 [amdgpu] [ 103.620655] ?
>>>>>> kfd_iommu_check_device+0xf0/0xf0 [amdgpu] [ 103.621228]
>>>>>> kgd2kfd_device_init+0x474/0x870 [amdgpu] [ 103.621781]
>>>>>> amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu] [ 103.622329] ?
>>>>>> amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu] [ 103.622344] ?
>>>>>> kmsg_dump_rewind_nolock+0x59/0x59 [ 103.622895] ?
>>>>>> amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu] [ 103.623424]
>>>>>> amdgpu_device_init+0x1bbe/0x2f00 [amdgpu] [ 103.623819] ?
>>>>>> amdgpu_device_has_dc_support+0x30/0x30 [amdgpu] [ 103.623842] ?
>>>>>> __isolate_free_page+0x290/0x290 [ 103.623852] ?
>>>>>> fs_reclaim_acquire.part.97+0x5/0x30
>>>>>> [ 103.623891] ? __alloc_pages_nodemask+0x2c9/0x560
>>>>>> [ 103.623912] ? __alloc_pages_slowpath+0x1390/0x1390
>>>>>> [ 103.623945] ? kasan_unpoison_shadow+0x31/0x40 [ 103.623970] ?
>>>>>> kmalloc_order+0x63/0x70 [ 103.624337]
>>>>>> amdgpu_driver_load_kms+0xd9/0x430 [amdgpu] [ 103.624690] ?
>>>>>> amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu] [ 103.624756] ?
>>>>>> drm_dev_register+0x19c/0x310 [drm] [ 103.624768] ?
>>>>>> __kasan_slab_free+0x133/0x160 [ 103.624849]
>>>>>> drm_dev_register+0x1f5/0x310 [drm] [ 103.625212]
>>>>>> amdgpu_pci_probe+0x109/0x1f0 [amdgpu] [ 103.625565] ?
>>>>>> amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu] [ 103.625580]
>>>>>> local_pci_probe+0x74/0xd0 [ 103.625603]
>>>>>> pci_device_probe+0x1fa/0x310 [ 103.625620] ?
>>>>>> pci_device_remove+0x1c0/0x1c0 [ 103.625640] ?
>>>>>> sysfs_do_create_link_sd.isra.2+0x74/0xe0
>>>>>> [ 103.625673] really_probe+0x367/0x5d0 [ 103.625700]
>>>>>> driver_probe_device+0x177/0x1b0 [ 103.625721]
>>>>>> device_driver_attach+0x8a/0x90 [ 103.625737] ?
>>>>>> device_driver_attach+0x90/0x90 [ 103.625746]
>>>>>> __driver_attach+0xeb/0x190 [ 103.625765] ?
>>>>>> device_driver_attach+0x90/0x90 [ 103.625773]
>>>>>> bus_for_each_dev+0xe4/0x160 [ 103.625789] ?
>>>>>> subsys_dev_iter_exit+0x10/0x10 [ 103.625829]
>>>>>> bus_add_driver+0x277/0x330 [ 103.625855]
>>>>>> driver_register+0xc6/0x1a0 [ 103.625866] ? 0xffffffffa0d88000 [
>>>>>> 103.625880] do_one_initcall+0xd3/0x334 [ 103.625895] ?
>>>>>> trace_event_raw_event_initcall_finish+0x150/0x150
>>>>>> [ 103.625911] ? kasan_unpoison_shadow+0x31/0x40 [ 103.625924] ?
>>>>>> __kasan_kmalloc+0xd5/0xf0 [ 103.625946] ?
>>>>>> kmem_cache_alloc_trace+0x154/0x300
>>>>>> [ 103.625955] ? kasan_unpoison_shadow+0x31/0x40 [ 103.625985]
>>>>>> do_init_module+0xec/0x354 [ 103.626011]
>>>>>> load_module+0x3c91/0x4980 [ 103.626118] ?
>>>>>> module_frob_arch_sections+0x20/0x20
>>>>>> [ 103.626132] ? ima_read_file+0x10/0x10 [ 103.626142] ?
>>>>>> vfs_read+0x127/0x190 [ 103.626163] ? kernel_read+0x95/0xb0 [
>>>>>> 103.626187] ? kernel_read_file+0x1a5/0x340 [ 103.626277] ?
>>>>>> __do_sys_finit_module+0x175/0x1b0 [ 103.626287]
>>>>>> __do_sys_finit_module+0x175/0x1b0 [ 103.626301] ?
>>>>>> __ia32_sys_init_module+0x40/0x40 [ 103.626338] ?
>>>>>> lock_downgrade+0x390/0x390 [ 103.626396] ?
>>>>>> vtime_user_exit+0xc8/0xe0 [ 103.626423] do_syscall_64+0x7d/0x250
>>>>>> [ 103.626440] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>> [ 103.626450] RIP: 0033:0x7f09984854d9 [ 103.626461] Code: 00 f3
>>>>>> c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>>>>>> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
>>>>>> 08 0f
>>>>>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89
>>>>>> 01
>>>>>> 48 [ 103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX:
>>>>>> 0000000000000139
>>>>>> [ 103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX:
>>>>>> 00007f09984854d9
>>>>>> [ 103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI:
>>>>>> 0000000000000006
>>>>>> [ 103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09:
>>>>>> 0000000000000000
>>>>>> [ 103.626500] R10: 0000000000000006 R11: 0000000000000246 R12:
>>>>>> 0000000000000000
>>>>>> [ 103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15:
>>>>>> 0000000000000013
>>>>>>
>>>>>> [ 103.626592] The buggy address belongs to the page:
>>>>>> [ 103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0
>>>>>> mapping:0000000000000000 index:0x0 [ 103.626675] flags:
>>>>>> 0x2ffff0000000000() [ 103.626686] raw:
>>>>>> 02ffff0000000000 0000000000000000 ffffea000f2c6788
>>>>>> 0000000000000000
>>>>>> [ 103.626696] raw: 0000000000000000 0000000000000000
>>>>>> 00000000ffffffff
>>>>>> 0000000000000000
>>>>>> [ 103.626702] page dumped because: kasan: bad access detected
>>>>>>
>>>>>> [ 103.626742] addr ffff8883cb19ee38 is located in stack of task
>>>>>> modprobe/1122 at offset 264 in frame:
>>>>>> [ 103.627233] kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]
>>>>>>
>>>>>> [ 103.627346] this frame has 3 objects:
>>>>>> [ 103.627405] [32, 36) 'avail_size'
>>>>>> [ 103.627410] [96, 120) 'local_mem_info'
>>>>>> [ 103.627466] [160, 264) 'cu_info'
>>>>>>
>>>>>> [ 103.627602] Memory state around the buggy address:
>>>>>> [ 103.627675] ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04
>>>>>> f4 f4
>>>>>> f4 f2 f2
>>>>>> [ 103.627780] ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00
>>>>>> 00 00
>>>>>> 00 00 00
>>>>>> [ 103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3
>>>>>> f3 f3
>>>>>> f3 00 00
>>>>>> [ 103.627989] ^ [
>>>>>> 103.628065] ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>> 00
>>>>>> 00 00 00
>>>>>> [ 103.628169] ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3
>>>>>> f3 00
>>>>>> 00 00 00
>>>>>> [ 103.628273]
>>>>>> ==================================================================
>>>>>>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-10-22 18:00 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-17 20:09 Stack out of bounds in KFD on Arcturus Grodzovsky, Andrey
[not found] ` <a81a3f82-1f21-663f-150c-cdbbbf231ab3-5C7GfCeVMHo@public.gmane.org>
2019-10-17 21:29 ` Kuehling, Felix
[not found] ` <31aa5ae0-5eb4-38ca-aed7-d807ab19e2ca-5C7GfCeVMHo@public.gmane.org>
2019-10-17 22:38 ` Grodzovsky, Andrey
[not found] ` <96393d3a-ebf7-3c2b-5b51-6a968ee9b4f8-5C7GfCeVMHo@public.gmane.org>
2019-10-18 20:55 ` Kuehling, Felix
[not found] ` <134de413-61fe-a6ee-96ac-73b694fcb94c-5C7GfCeVMHo@public.gmane.org>
2019-10-18 21:31 ` Zeng, Oak
[not found] ` <BL0PR12MB25806E425A051EA059C805EF806C0-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-10-22 16:48 ` Grodzovsky, Andrey
[not found] ` <f865ffcd-2be0-0135-ba78-f78b370aa1fd-5C7GfCeVMHo@public.gmane.org>
2019-10-22 17:17 ` Zeng, Oak
[not found] ` <BL0PR12MB2580ED7FB1607624E3D884B280680-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-10-22 17:28 ` Grodzovsky, Andrey
[not found] ` <bbba4ea5-f253-5974-397a-c38f8d4c857f-5C7GfCeVMHo@public.gmane.org>
2019-10-22 17:46 ` Zeng, Oak
[not found] ` <BL0PR12MB258071C07B015BBE3C4CA54A80680-b4cIHhjg/p/XzH18dTCKOgdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2019-10-22 18:00 ` Grodzovsky, Andrey