From: Nathan Chancellor <nathan@kernel.org>
To: Yifan Zhang <yifan1.zhang@amd.com>
Cc: amd-gfx@lists.freedesktop.org, Felix.Kuehling@amd.com,
llvm@lists.linux.dev
Subject: Re: [PATCH 1/2] drm/amdkfd: judge get_atc_vmid_pasid_mapping_info before call
Date: Tue, 8 Mar 2022 10:11:24 -0700 [thread overview]
Message-ID: <YieOPO1WaQ/VnqdD@dev-arch.thelio-3990X> (raw)
In-Reply-To: <20220303080514.680150-1-yifan1.zhang@amd.com>
On Thu, Mar 03, 2022 at 04:05:13PM +0800, Yifan Zhang wrote:
> Fix the NULL point issue:
>
> [ 3076.255609] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 3076.255624] #PF: supervisor instruction fetch in kernel mode
> [ 3076.255637] #PF: error_code(0x0010) - not-present page
> [ 3076.255649] PGD 0 P4D 0
> [ 3076.255660] Oops: 0010 [#1] SMP NOPTI
> [ 3076.255669] CPU: 20 PID: 2415 Comm: kfdtest Tainted: G W OE 5.11.0-41-generic #45~20.04.1-Ubuntu
> [ 3076.255691] Hardware name: AMD Splinter/Splinter-RPL, BIOS VS2326337N.FD 02/07/2022
> [ 3076.255706] RIP: 0010:0x0
> [ 3076.255718] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
> [ 3076.255732] RSP: 0018:ffffb64283e3fc10 EFLAGS: 00010297
> [ 3076.255744] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000027
> [ 3076.255759] RDX: ffffb64283e3fc1e RSI: 0000000000000008 RDI: ffff8c7a87f60000
> [ 3076.255776] RBP: ffffb64283e3fc78 R08: ffff8c7d88518ac0 R09: ffffb64283e3fa60
> [ 3076.255791] R10: 0000000000000001 R11: 0000000000000001 R12: 000000000000000f
> [ 3076.255805] R13: ffff8c7bdcea5800 R14: ffff8c7a9f3f3000 R15: ffff8c7a8696bc00
> [ 3076.255820] FS: 0000000000000000(0000) GS:ffff8c7d88500000(0000) knlGS:0000000000000000
> [ 3076.255839] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3076.255851] CR2: ffffffffffffffd6 CR3: 0000000109e3c000 CR4: 0000000000750ee0
> [ 3076.255866] PKRU: 55555554
> [ 3076.255873] Call Trace:
> [ 3076.255884] dbgdev_wave_reset_wavefronts+0x72/0x160 [amdgpu]
> [ 3076.256025] process_termination_cpsch.cold+0x26/0x2f [amdgpu]
> [ 3076.256182] ? ktime_get_mono_fast_ns+0x4e/0xa0
> [ 3076.256196] kfd_process_dequeue_from_all_devices+0x49/0x70 [amdgpu]
> [ 3076.256328] kfd_process_notifier_release+0x187/0x2b0 [amdgpu]
> [ 3076.256451] ? mn_itree_inv_end+0xdc/0x110
> [ 3076.256463] __mmu_notifier_release+0x74/0x1f0
> [ 3076.256474] exit_mmap+0x170/0x200
> [ 3076.256484] ? __handle_mm_fault+0x677/0x920
> [ 3076.256496] ? _cond_resched+0x19/0x30
> [ 3076.256507] mmput+0x5d/0x130
> [ 3076.256518] do_exit+0x332/0xaf0
> [ 3076.256526] ? handle_mm_fault+0xd7/0x2b0
> [ 3076.256537] do_group_exit+0x43/0xa0
> [ 3076.256548] __x64_sys_exit_group+0x18/0x20
> [ 3076.256559] do_syscall_64+0x38/0x90
> [ 3076.256569] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
> ---
> .../drm/amd/amdkfd/kfd_device_queue_manager.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 1cd2ea536bd0..77364afdc606 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -505,14 +505,16 @@ static int dbgdev_wave_reset_wavefronts(struct kfd_dev *dev, struct kfd_process
> * to check which VMID the current process is mapped to.
> */
>
> - for (vmid = first_vmid_to_scan; vmid <= last_vmid_to_scan; vmid++) {
> - status = dev->kfd2kgd->get_atc_vmid_pasid_mapping_info
> - (dev->adev, vmid, &queried_pasid);
> -
> - if (status && queried_pasid == p->pasid) {
> - pr_debug("Killing wave fronts of vmid %d and pasid 0x%x\n",
> - vmid, p->pasid);
> - break;
> + if (dev->kfd2kgd->get_atc_vmid_pasid_mapping_info) {
> + for (vmid = first_vmid_to_scan; vmid <= last_vmid_to_scan; vmid++) {
> + status = dev->kfd2kgd->get_atc_vmid_pasid_mapping_info
> + (dev->adev, vmid, &queried_pasid);
> +
> + if (status && queried_pasid == p->pasid) {
> + pr_debug("Killing wave fronts of vmid %d and pasid 0x%x\n",
> + vmid, p->pasid);
> + break;
> + }
> }
> }
>
> --
> 2.25.1
>
>
Apologies if this has been reported and fixed already, I have not seen
it if it has.
This patch as commit c8b0507f40de ("drm/amdkfd: judge
get_atc_vmid_pasid_mapping_info before call") in -next causes the
following clang warning, which appears to be legitimate.
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:508:6: error: variable 'vmid' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
if (dev->kfd2kgd->get_atc_vmid_pasid_mapping_info) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:521:6: note: uninitialized use occurs here
if (vmid > last_vmid_to_scan) {
^~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:508:2: note: remove the 'if' if its condition is always true
if (dev->kfd2kgd->get_atc_vmid_pasid_mapping_info) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:490:19: note: initialize the variable 'vmid' to silence this warning
unsigned int vmid;
^
= 0
1 error generated.
Should this function bail out earlier if
->get_atc_vmid_pasid_mapping_info() is NULL, rather than skipping the
for loop?
Cheers,
Nathan
WARNING: multiple messages have this Message-ID
From: Nathan Chancellor <nathan@kernel.org>
To: Yifan Zhang <yifan1.zhang@amd.com>
Cc: Felix.Kuehling@amd.com, llvm@lists.linux.dev,
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/2] drm/amdkfd: judge get_atc_vmid_pasid_mapping_info before call
Date: Tue, 8 Mar 2022 10:11:24 -0700 [thread overview]
Message-ID: <YieOPO1WaQ/VnqdD@dev-arch.thelio-3990X> (raw)
In-Reply-To: <20220303080514.680150-1-yifan1.zhang@amd.com>
On Thu, Mar 03, 2022 at 04:05:13PM +0800, Yifan Zhang wrote:
> Fix the NULL point issue:
>
> [ 3076.255609] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 3076.255624] #PF: supervisor instruction fetch in kernel mode
> [ 3076.255637] #PF: error_code(0x0010) - not-present page
> [ 3076.255649] PGD 0 P4D 0
> [ 3076.255660] Oops: 0010 [#1] SMP NOPTI
> [ 3076.255669] CPU: 20 PID: 2415 Comm: kfdtest Tainted: G W OE 5.11.0-41-generic #45~20.04.1-Ubuntu
> [ 3076.255691] Hardware name: AMD Splinter/Splinter-RPL, BIOS VS2326337N.FD 02/07/2022
> [ 3076.255706] RIP: 0010:0x0
> [ 3076.255718] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
> [ 3076.255732] RSP: 0018:ffffb64283e3fc10 EFLAGS: 00010297
> [ 3076.255744] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000027
> [ 3076.255759] RDX: ffffb64283e3fc1e RSI: 0000000000000008 RDI: ffff8c7a87f60000
> [ 3076.255776] RBP: ffffb64283e3fc78 R08: ffff8c7d88518ac0 R09: ffffb64283e3fa60
> [ 3076.255791] R10: 0000000000000001 R11: 0000000000000001 R12: 000000000000000f
> [ 3076.255805] R13: ffff8c7bdcea5800 R14: ffff8c7a9f3f3000 R15: ffff8c7a8696bc00
> [ 3076.255820] FS: 0000000000000000(0000) GS:ffff8c7d88500000(0000) knlGS:0000000000000000
> [ 3076.255839] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3076.255851] CR2: ffffffffffffffd6 CR3: 0000000109e3c000 CR4: 0000000000750ee0
> [ 3076.255866] PKRU: 55555554
> [ 3076.255873] Call Trace:
> [ 3076.255884] dbgdev_wave_reset_wavefronts+0x72/0x160 [amdgpu]
> [ 3076.256025] process_termination_cpsch.cold+0x26/0x2f [amdgpu]
> [ 3076.256182] ? ktime_get_mono_fast_ns+0x4e/0xa0
> [ 3076.256196] kfd_process_dequeue_from_all_devices+0x49/0x70 [amdgpu]
> [ 3076.256328] kfd_process_notifier_release+0x187/0x2b0 [amdgpu]
> [ 3076.256451] ? mn_itree_inv_end+0xdc/0x110
> [ 3076.256463] __mmu_notifier_release+0x74/0x1f0
> [ 3076.256474] exit_mmap+0x170/0x200
> [ 3076.256484] ? __handle_mm_fault+0x677/0x920
> [ 3076.256496] ? _cond_resched+0x19/0x30
> [ 3076.256507] mmput+0x5d/0x130
> [ 3076.256518] do_exit+0x332/0xaf0
> [ 3076.256526] ? handle_mm_fault+0xd7/0x2b0
> [ 3076.256537] do_group_exit+0x43/0xa0
> [ 3076.256548] __x64_sys_exit_group+0x18/0x20
> [ 3076.256559] do_syscall_64+0x38/0x90
> [ 3076.256569] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
> ---
> .../drm/amd/amdkfd/kfd_device_queue_manager.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 1cd2ea536bd0..77364afdc606 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -505,14 +505,16 @@ static int dbgdev_wave_reset_wavefronts(struct kfd_dev *dev, struct kfd_process
> * to check which VMID the current process is mapped to.
> */
>
> - for (vmid = first_vmid_to_scan; vmid <= last_vmid_to_scan; vmid++) {
> - status = dev->kfd2kgd->get_atc_vmid_pasid_mapping_info
> - (dev->adev, vmid, &queried_pasid);
> -
> - if (status && queried_pasid == p->pasid) {
> - pr_debug("Killing wave fronts of vmid %d and pasid 0x%x\n",
> - vmid, p->pasid);
> - break;
> + if (dev->kfd2kgd->get_atc_vmid_pasid_mapping_info) {
> + for (vmid = first_vmid_to_scan; vmid <= last_vmid_to_scan; vmid++) {
> + status = dev->kfd2kgd->get_atc_vmid_pasid_mapping_info
> + (dev->adev, vmid, &queried_pasid);
> +
> + if (status && queried_pasid == p->pasid) {
> + pr_debug("Killing wave fronts of vmid %d and pasid 0x%x\n",
> + vmid, p->pasid);
> + break;
> + }
> }
> }
>
> --
> 2.25.1
>
>
Apologies if this has been reported and fixed already, I have not seen
it if it has.
This patch as commit c8b0507f40de ("drm/amdkfd: judge
get_atc_vmid_pasid_mapping_info before call") in -next causes the
following clang warning, which appears to be legitimate.
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:508:6: error: variable 'vmid' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
if (dev->kfd2kgd->get_atc_vmid_pasid_mapping_info) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:521:6: note: uninitialized use occurs here
if (vmid > last_vmid_to_scan) {
^~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:508:2: note: remove the 'if' if its condition is always true
if (dev->kfd2kgd->get_atc_vmid_pasid_mapping_info) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:490:19: note: initialize the variable 'vmid' to silence this warning
unsigned int vmid;
^
= 0
1 error generated.
Should this function bail out earlier if
->get_atc_vmid_pasid_mapping_info() is NULL, rather than skipping the
for loop?
Cheers,
Nathan
next prev parent reply other threads:[~2022-03-08 17:11 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-03 8:05 [PATCH 1/2] drm/amdkfd: judge get_atc_vmid_pasid_mapping_info before call Yifan Zhang
2022-03-03 8:05 ` [PATCH 2/2] drm/amdkfd: implement get_atc_vmid_pasid_mapping_info for gfx10.3 Yifan Zhang
2022-03-03 15:32 ` Deucher, Alexander
2022-03-08 17:11 ` Nathan Chancellor [this message]
2022-03-08 17:11 ` [PATCH 1/2] drm/amdkfd: judge get_atc_vmid_pasid_mapping_info before call Nathan Chancellor
2022-03-08 20:28 ` Felix Kuehling
2022-03-08 20:28 ` Felix Kuehling
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YieOPO1WaQ/VnqdD@dev-arch.thelio-3990X \
--to=nathan@kernel.org \
--cc=Felix.Kuehling@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=llvm@lists.linux.dev \
--cc=yifan1.zhang@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.