AMD-GFX Archive on lore.kernel.org
From: Nirmoy <nirmodas@amd.com>
To: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Check entity rq
Date: Wed, 25 Mar 2020 12:03:33 +0100
Message-ID: <985d6068-7dd9-b305-e34f-d9c59a6db11b@amd.com>
In-Reply-To: <E108D9FC-CE64-4F4F-B5C9-08780CFF1356@amd.com>


On 3/25/20 10:23 AM, Pan, Xinhui wrote:
>
>> On 25 March 2020, at 15:48, Koenig, Christian <Christian.Koenig@amd.com> wrote:
>>
>> On 25.03.20 at 06:47, xinhui pan wrote:
>>> Hit a panic during a GPU recovery test. drm_sched_entity_select_rq() might
>>> set rq to NULL, so add a check like the one drm_sched_job_init() does.
>> NAK, the rq should never be set to NULL in the first place.
>>
>> How did that happen?
> Well, I have not checked the details yet,
> but I got the call trace below.
> It looks like the scheduler is not ready, and drm_sched_entity_select_rq() sets entity->rq to NULL.
> The next amdgpu_vm_sdma_commit() then panics when it dereferences entity->rq.

"drm/amdgpu: stop disable the scheduler during HW fini" from Christian
should already have fixed this, but I can't find that commit in
brahma/amd-staging-drm-next.

Regards,

Nirmoy

>
> 297567 [   44.667677] amdgpu 0000:03:00.0: GPU reset begin!
> 297568 [   44.929047] [drm] scheduler sdma0 is not ready, skipping
> 297569 [   44.929048] [drm] scheduler sdma1 is not ready, skipping
> 297570 [   44.934608] [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-2)
> 297571 [   44.947941] BUG: kernel NULL pointer dereference, address: 0000000000000038
> 297572 [   44.955132] #PF: supervisor read access in kernel mode
> 297573 [   44.960451] #PF: error_code(0x0000) - not-present page
> 297574 [   44.965714] PGD 0 P4D 0
> 297575 [   44.968331] Oops: 0000 [#1] SMP PTI
> 297576 [   44.971911] CPU: 7 PID: 2496 Comm: gnome-shell Tainted: G        W         5.4.0-rc7+ #1
> 297577 [   44.980221] Hardware name: System manufacturer System Product Name/Z170-A, BIOS 1702 01/28/2016
> 297578 [   44.989177] RIP: 0010:amdgpu_vm_sdma_commit+0x55/0x190 [amdgpu]
> 297579 [   44.995242] Code: 47 20 80 7f 10 00 4c 8b a0 88 01 00 00 48 8b 47 08 4c 8d a8 70 01 00 00 75 07 4c 8d a8 88 02 00 00 49 8b 45 10 41 8b 54 24 08 <48> 8b 40 38 85 d2 48 8d b8 30 ff ff ff 0f 84 06 01 00 00 48 8b 80
> 297580 [   45.014931] RSP: 0018:ffffb66e008839d0 EFLAGS: 00010246
> 297581 [   45.020504] RAX: 0000000000000000 RBX: ffffb66e00883a30 RCX: 0000000000100400
> 297582 [   45.028062] RDX: 000000000000003c RSI: ffff8df123662138 RDI: ffffb66e00883a30
> 297583 [   45.035662] RBP: ffffb66e00883a00 R08: ffffb66e0088395c R09: ffffb66e00883960
> 297584 [   45.043298] R10: 0000000000100240 R11: 0000000000000035 R12: ffff8df1425385e8
> 297585 [   45.050916] R13: ffff8df13cfd1288 R14: ffff8df123662138 R15: ffff8df13cfd1000
> 297586 [   45.058524] FS:  00007fcc8f6b2100(0000) GS:ffff8df15e380000(0000) knlGS:0000000000000000
> 297587 [   45.067114] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 297588 [   45.073206] CR2: 0000000000000038 CR3: 0000000641fb6006 CR4: 00000000003606e0
> 297589 [   45.080791] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> 297590 [   45.088277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> 297591 [   45.095773] Call Trace:
> 297592 [   45.098354]  amdgpu_vm_bo_update_mapping+0x1c1/0x1f0 [amdgpu]
> 297593 [   45.104427]  ? mark_held_locks+0x4d/0x80
> 297594 [   45.108682]  amdgpu_vm_bo_update+0x3b7/0x960 [amdgpu]
> 297595 [   45.114049]  ? rcu_read_lock_sched_held+0x4f/0x80
> 297596 [   45.119111]  amdgpu_gem_va_ioctl+0x4f3/0x510 [amdgpu]
> 297597 [   45.124495]  ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu]
> 297598 [   45.130250]  drm_ioctl_kernel+0xb0/0x100 [drm]
> 297599 [   45.134988]  ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu]
> 297600 [   45.140742]  ? drm_ioctl_kernel+0xb0/0x100 [drm]
> 297601 [   45.145622]  drm_ioctl+0x389/0x450 [drm]
> 297602 [   45.149804]  ? amdgpu_gem_va_map_flags+0x70/0x70 [amdgpu]
> 297603 [   45.155551]  ? trace_hardirqs_on+0x3b/0xf0
> 297604 [   45.159892]  amdgpu_drm_ioctl+0x4f/0x80 [amdgpu]
> 297605 [   45.172104]  do_vfs_ioctl+0xa9/0x6f0
> 297606 [   45.175909]  ? tomoyo_file_ioctl+0x19/0x20
> 297607 [   45.180241]  ksys_ioctl+0x75/0x80
> 297608 [   45.183760]  ? do_syscall_64+0x17/0x230
> 297609 [   45.187833]  __x64_sys_ioctl+0x1a/0x20
> 297610 [   45.191846]  do_syscall_64+0x5f/0x230
> 297611 [   45.195764]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 297612 [   45.201126] RIP: 0033:0x7fcc8c7725d7
>
>> Regards,
>> Christian.
>>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
>>> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>>> index cf96c335b258..d30d103e48a2 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>>> @@ -95,6 +95,8 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>>>   	int r;
>>>
>>>   	entity = p->direct ? &p->vm->direct : &p->vm->delayed;
>>> +	if (!entity->rq)
>>> +		return -ENOENT;
>>>   	ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);
>>>
>>>   	WARN_ON(ib->length_dw == 0);
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
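
For readers following the thread, the check proposed in the quoted patch can be sketched in user space. This is only an illustration under stated assumptions: every mock_* type and function below is a hypothetical stand-in, not the kernel's definition, and the real structures carry many more fields. Note also that the patch was NAKed in this thread on the grounds that rq should never become NULL in the first place, so the sketch shows the shape of the proposed guard, not the fix that was accepted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures
 * named in the patch. They exist only to make the sketch compile. */
struct mock_sched  { int id; };
struct mock_rq     { struct mock_sched *sched; };
struct mock_entity { struct mock_rq *rq; };
struct mock_vm     { struct mock_entity direct; struct mock_entity delayed; };
struct mock_params { struct mock_vm *vm; bool direct; };

/* Mirrors the shape of the proposed check: select the entity, then return
 * -ENOENT instead of dereferencing a NULL run queue. */
static int mock_commit(const struct mock_params *p, struct mock_sched **out)
{
	const struct mock_entity *entity;

	/* Caller bugs are asserted; a missing rq is a runtime condition. */
	assert(p != NULL && p->vm != NULL && out != NULL);

	entity = p->direct ? &p->vm->direct : &p->vm->delayed;
	if (!entity->rq)
		return -ENOENT;

	/* Without the check above, this read faults when rq is NULL. */
	*out = entity->rq->sched;
	return 0;
}
```

In the oops above, RAX is 0 and the faulting address in CR2 is 0x38, which is consistent with reading a field at a small offset from a NULL pointer. In the sketch, calling mock_commit() on an entity whose rq is NULL returns -ENOENT and leaves *out untouched.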

Thread overview: 11+ messages
2020-03-25  5:47 xinhui pan
2020-03-25  7:48 ` Christian König
2020-03-25  9:23   ` Pan, Xinhui
2020-03-25 10:54     ` Pan, Xinhui
2020-03-25 11:03     ` Nirmoy [this message]
2020-03-30 11:11       ` Christian König
2020-03-25 11:07 xinhui pan
2020-03-25 11:14 ` Nirmoy
2020-03-25 11:13   ` Koenig, Christian
2020-03-25 11:34     ` Pan, Xinhui
2020-03-25 11:37     ` Pan, Xinhui
