From: "Zeng, Oak" <Oak.Zeng-5C7GfCeVMHo@public.gmane.org>
To: "Grodzovsky,
	Andrey" <Andrey.Grodzovsky-5C7GfCeVMHo@public.gmane.org>,
	"Kuehling, Felix" <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
Cc: "amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Subject: RE: Stack out of bounds in KFD on Arcturus
Date: Tue, 22 Oct 2019 17:46:51 +0000
Message-ID: <BL0PR12MB258071C07B015BBE3C4CA54A80680@BL0PR12MB2580.namprd12.prod.outlook.com>
In-Reply-To: <bbba4ea5-f253-5974-397a-c38f8d4c857f-5C7GfCeVMHo@public.gmane.org>

Sorry, I searched my Kconfig and couldn't find a stack size option anymore. Maybe the kernel stack size is no longer configurable.
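
For reference, here is a minimal sketch of how the x86-64 kernel stack size appears to be derived at build time rather than set through a dedicated Kconfig option. It assumes the definitions in arch/x86/include/asm/page_64_types.h for kernels of this era (THREAD_SIZE_ORDER = 2 + KASAN_STACK_ORDER, where KASAN_STACK_ORDER is 1 when CONFIG_KASAN is set and 0 otherwise) and 4 KiB pages. The function name is illustrative only.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on x86-64


def thread_size(kasan_enabled: bool) -> int:
    """Kernel stack size in bytes, mirroring THREAD_SIZE = PAGE_SIZE << THREAD_SIZE_ORDER."""
    # Assumed from page_64_types.h: KASAN builds get one extra order of stack.
    kasan_stack_order = 1 if kasan_enabled else 0
    thread_size_order = 2 + kasan_stack_order
    return PAGE_SIZE << thread_size_order


if __name__ == "__main__":
    print(thread_size(False))  # stack size without KASAN
    print(thread_size(True))   # stack size with KASAN
```

Under these assumptions a non-KASAN build gets 16 KiB and a KASAN build gets 32 KiB, so a KASAN-enabled kernel like the one in the report would not be running with a reduced stack.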

Can you try your kernel on vega10, vega20, or navi10? We want to know whether this is an mi100-specific issue.

Oak

-----Original Message-----
From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com> 
Sent: Tuesday, October 22, 2019 1:28 PM
To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: Stack out of bounds in KFD on Arcturus

I don't know. Which Kconfig flag should I look at?

Andrey

On 10/22/19 1:17 PM, Zeng, Oak wrote:
> Sorry, I meant: is the kernel stack size 16 KB in your Kconfig?
>
> Oak
>
> -----Original Message-----
> From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Sent: Tuesday, October 22, 2019 12:49 PM
> To: Zeng, Oak <Oak.Zeng@amd.com>; Kuehling, Felix 
> <Felix.Kuehling@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: Stack out of bounds in KFD on Arcturus
>
> On 10/18/19 5:31 PM, Zeng, Oak wrote:
>
>> Hi Andrey,
>>
>> What is your system configuration? I haven't seen this issue before. Also see the attached QA configuration; you can compare it against yours to spot any differences.
>
> Attached is my lshw output.
>
>> Also, I believe the default kernel stack size for x86-64 is 16 KB. Is that what your Kconfig uses?
>
> What do you mean by asking whether this is my Kconfig? Is there a particular Kconfig flag that I can look for?
>
> Andrey
>
>
>> Regards,
>> Oak
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of 
>> Kuehling, Felix
>> Sent: Friday, October 18, 2019 4:55 PM
>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org
>> Subject: Re: Stack out of bounds in KFD on Arcturus
>>
>> On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
>>> Not that I am aware of. Is there a special Kconfig flag that determines
>>> the stack size?
>> I remember there used to be a Kconfig option to force a 4KB kernel stack. I don't see it in the current kernel any more.
>>
>> I don't have time to work on this myself. I'll create a ticket and see if I can find someone to investigate.
>>
>> Thanks,
>>      Felix
>>
>>
>>> Andrey
>>>
>>> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>>>> I don't see why this problem would be specific to Arcturus. I don't 
>>>> see any excessive allocations on the stack either. Also the code 
>>>> involved here hasn't changed recently.
>>>>
>>>> Are you using some weird kernel config with a smaller stack? Is it 
>>>> specific to a compiler version or some optimization flags? I've 
>>>> sometimes seen function inlining cause excessive stack usage.
>>>>
>>>> Regards,
>>>>        Felix
>>>>
>>>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>>>> Hi Felix, I see this on boot when working with Arcturus.
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>> [  103.602092] kfd kfd: Allocated 3969056 bytes on gart
>>>>> [  103.610769] ==================================================================
>>>>> [  103.611469] BUG: KASAN: stack-out-of-bounds in kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>>>> [  103.611646] Read of size 4 at addr ffff8883cb19ee38 by task modprobe/1122
>>>>>
>>>>> [  103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G O 5.3.0-rc3+ #45
>>>>> [  103.611847] Hardware name: System manufacturer System Product Name/Z170-PRO, BIOS 1902 06/27/2016
>>>>> [  103.611856] Call Trace:
>>>>> [  103.611879]  dump_stack+0x71/0xab
>>>>> [  103.611907]  print_address_description+0x1da/0x3c0
>>>>> [  103.612453]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>>>> [  103.612479]  __kasan_report+0x13f/0x1a0
>>>>> [  103.613022]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>>>> [  103.613580]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>>>> [  103.613604]  kasan_report+0xe/0x20
>>>>> [  103.614149]  kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>>>> [  103.614762]  ? kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu]
>>>>> [  103.614796]  ? __alloc_pages_nodemask+0x2c9/0x560
>>>>> [  103.614824]  ? __alloc_pages_slowpath+0x1390/0x1390
>>>>> [  103.614898]  ? kmalloc_order+0x63/0x70
>>>>> [  103.615469]  kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu]
>>>>> [  103.616054]  ? kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu]
>>>>> [  103.616095]  ? up_write+0x4b/0x70
>>>>> [  103.616649]  kfd_topology_add_device+0x98d/0xb10 [amdgpu]
>>>>> [  103.617207]  ? kfd_topology_shutdown+0x60/0x60 [amdgpu]
>>>>> [  103.617743]  ? start_cpsch+0x2ff/0x3a0 [amdgpu]
>>>>> [  103.617777]  ? mutex_lock_io_nested+0xac0/0xac0
>>>>> [  103.617807]  ? __mutex_unlock_slowpath+0xda/0x420
>>>>> [  103.617848]  ? __mutex_unlock_slowpath+0xda/0x420
>>>>> [  103.617877]  ? wait_for_completion+0x200/0x200
>>>>> [  103.618461]  ? start_cpsch+0x38b/0x3a0 [amdgpu]
>>>>> [  103.619011]  ? create_queue_cpsch+0x670/0x670 [amdgpu]
>>>>> [  103.619573]  ? kfd_iommu_device_init+0x92/0x1e0 [amdgpu]
>>>>> [  103.620112]  ? kfd_iommu_resume+0x2c/0x2c0 [amdgpu]
>>>>> [  103.620655]  ? kfd_iommu_check_device+0xf0/0xf0 [amdgpu]
>>>>> [  103.621228]  kgd2kfd_device_init+0x474/0x870 [amdgpu]
>>>>> [  103.621781]  amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu]
>>>>> [  103.622329]  ? amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu]
>>>>> [  103.622344]  ? kmsg_dump_rewind_nolock+0x59/0x59
>>>>> [  103.622895]  ? amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu]
>>>>> [  103.623424]  amdgpu_device_init+0x1bbe/0x2f00 [amdgpu]
>>>>> [  103.623819]  ? amdgpu_device_has_dc_support+0x30/0x30 [amdgpu]
>>>>> [  103.623842]  ? __isolate_free_page+0x290/0x290
>>>>> [  103.623852]  ? fs_reclaim_acquire.part.97+0x5/0x30
>>>>> [  103.623891]  ? __alloc_pages_nodemask+0x2c9/0x560
>>>>> [  103.623912]  ? __alloc_pages_slowpath+0x1390/0x1390
>>>>> [  103.623945]  ? kasan_unpoison_shadow+0x31/0x40
>>>>> [  103.623970]  ? kmalloc_order+0x63/0x70
>>>>> [  103.624337]  amdgpu_driver_load_kms+0xd9/0x430 [amdgpu]
>>>>> [  103.624690]  ? amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu]
>>>>> [  103.624756]  ? drm_dev_register+0x19c/0x310 [drm]
>>>>> [  103.624768]  ? __kasan_slab_free+0x133/0x160
>>>>> [  103.624849]  drm_dev_register+0x1f5/0x310 [drm]
>>>>> [  103.625212]  amdgpu_pci_probe+0x109/0x1f0 [amdgpu]
>>>>> [  103.625565]  ? amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu]
>>>>> [  103.625580]  local_pci_probe+0x74/0xd0
>>>>> [  103.625603]  pci_device_probe+0x1fa/0x310
>>>>> [  103.625620]  ? pci_device_remove+0x1c0/0x1c0
>>>>> [  103.625640]  ? sysfs_do_create_link_sd.isra.2+0x74/0xe0
>>>>> [  103.625673]  really_probe+0x367/0x5d0
>>>>> [  103.625700]  driver_probe_device+0x177/0x1b0
>>>>> [  103.625721]  device_driver_attach+0x8a/0x90
>>>>> [  103.625737]  ? device_driver_attach+0x90/0x90
>>>>> [  103.625746]  __driver_attach+0xeb/0x190
>>>>> [  103.625765]  ? device_driver_attach+0x90/0x90
>>>>> [  103.625773]  bus_for_each_dev+0xe4/0x160
>>>>> [  103.625789]  ? subsys_dev_iter_exit+0x10/0x10
>>>>> [  103.625829]  bus_add_driver+0x277/0x330
>>>>> [  103.625855]  driver_register+0xc6/0x1a0
>>>>> [  103.625866]  ? 0xffffffffa0d88000
>>>>> [  103.625880]  do_one_initcall+0xd3/0x334
>>>>> [  103.625895]  ? trace_event_raw_event_initcall_finish+0x150/0x150
>>>>> [  103.625911]  ? kasan_unpoison_shadow+0x31/0x40
>>>>> [  103.625924]  ? __kasan_kmalloc+0xd5/0xf0
>>>>> [  103.625946]  ? kmem_cache_alloc_trace+0x154/0x300
>>>>> [  103.625955]  ? kasan_unpoison_shadow+0x31/0x40
>>>>> [  103.625985]  do_init_module+0xec/0x354
>>>>> [  103.626011]  load_module+0x3c91/0x4980
>>>>> [  103.626118]  ? module_frob_arch_sections+0x20/0x20
>>>>> [  103.626132]  ? ima_read_file+0x10/0x10
>>>>> [  103.626142]  ? vfs_read+0x127/0x190
>>>>> [  103.626163]  ? kernel_read+0x95/0xb0
>>>>> [  103.626187]  ? kernel_read_file+0x1a5/0x340
>>>>> [  103.626277]  ? __do_sys_finit_module+0x175/0x1b0
>>>>> [  103.626287]  __do_sys_finit_module+0x175/0x1b0
>>>>> [  103.626301]  ? __ia32_sys_init_module+0x40/0x40
>>>>> [  103.626338]  ? lock_downgrade+0x390/0x390
>>>>> [  103.626396]  ? vtime_user_exit+0xc8/0xe0
>>>>> [  103.626423]  do_syscall_64+0x7d/0x250
>>>>> [  103.626440]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [  103.626450] RIP: 0033:0x7f09984854d9
>>>>> [  103.626461] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 29 2c 00 f7 d8 64 89 01 48
>>>>> [  103.626468] RSP: 002b:00007ffc42896008 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>>> [  103.626479] RAX: ffffffffffffffda RBX: 0000559a52495400 RCX: 00007f09984854d9
>>>>> [  103.626486] RDX: 0000000000000000 RSI: 0000559a52499900 RDI: 0000000000000006
>>>>> [  103.626493] RBP: 0000559a52499900 R08: 0000000000000000 R09: 0000000000000000
>>>>> [  103.626500] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
>>>>> [  103.626508] R13: 0000559a52499b30 R14: 0000000000040000 R15: 0000000000000013
>>>>>
>>>>> [  103.626592] The buggy address belongs to the page:
>>>>> [  103.626665] page:ffffea000f2c6780 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
>>>>> [  103.626675] flags: 0x2ffff0000000000()
>>>>> [  103.626686] raw: 02ffff0000000000 0000000000000000 ffffea000f2c6788 0000000000000000
>>>>> [  103.626696] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
>>>>> [  103.626702] page dumped because: kasan: bad access detected
>>>>>
>>>>> [  103.626742] addr ffff8883cb19ee38 is located in stack of task modprobe/1122 at offset 264 in frame:
>>>>> [  103.627233]  kfd_create_vcrat_image_gpu+0x0/0xb80 [amdgpu]
>>>>>
>>>>> [  103.627346] this frame has 3 objects:
>>>>> [  103.627405]  [32, 36) 'avail_size'
>>>>> [  103.627410]  [96, 120) 'local_mem_info'
>>>>> [  103.627466]  [160, 264) 'cu_info'
>>>>>
>>>>> [  103.627602] Memory state around the buggy address:
>>>>> [  103.627675]  ffff8883cb19ed00: 00 00 00 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f2 f2
>>>>> [  103.627780]  ffff8883cb19ed80: f2 f2 00 00 00 f4 f2 f2 f2 f2 00 00 00 00 00 00
>>>>> [  103.627885] >ffff8883cb19ee00: 00 00 00 00 00 00 00 f4 f4 f4 f3 f3 f3 f3 00 00
>>>>> [  103.627989]                                         ^
>>>>> [  103.628065]  ffff8883cb19ee80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> [  103.628169]  ffff8883cb19ef00: f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00 00
>>>>> [  103.628273] ==================================================================
>>>>>
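
As a reading aid for the report above: KASAN's stack-out-of-bounds class flags an access to a redzone around a stack variable, which is different from exhausting the kernel stack. The sketch below takes the frame description from the report (transcribed by hand into the REPORT string) and works out where the reported offset falls relative to the listed objects. The helper names parse_frame and classify are hypothetical and are not part of any kernel tooling.

```python
import re

# Frame description transcribed from the KASAN report quoted above.
REPORT = """\
addr ffff8883cb19ee38 is located in stack of task modprobe/1122 at offset 264 in frame:
this frame has 3 objects:
 [32, 36) 'avail_size'
 [96, 120) 'local_mem_info'
 [160, 264) 'cu_info'
"""


def parse_frame(report):
    """Return (offset, [(start, end, name), ...]) extracted from the report text."""
    offset = int(re.search(r"at offset (\d+) in frame", report).group(1))
    objects = [
        (int(start), int(end), name)
        for start, end, name in re.findall(r"\[(\d+), (\d+)\) '([^']+)'", report)
    ]
    return offset, objects


def classify(offset, objects):
    """Describe where offset falls relative to the half-open object ranges."""
    for start, end, name in objects:
        if start <= offset < end:
            return f"inside '{name}'"
    # Otherwise report the distance from the nearest object ending at or below offset.
    below = [(end, name) for start, end, name in objects if end <= offset]
    if below:
        end, name = max(below)
        return f"{offset - end} bytes past the end of '{name}'"
    return "before the first object"


if __name__ == "__main__":
    offset, objects = parse_frame(REPORT)
    print(classify(offset, objects))
```

Read this way, the 4-byte read starts at offset 264, the first byte after 'cu_info' (which occupies [160, 264)). That points at an access just beyond that object rather than at a stack that is too small. This is an inference from the report, not something confirmed in the thread.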
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
