From: "Christian König" <christian.koenig@amd.com>
To: "Andrey Grodzovsky" <andrey.grodzovsky@amd.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	chris@chris-wilson.co.uk,
	"daniel.vetter@ffwll.ch" <daniel.vetter@ffwll.ch>
Subject: Re: Lockdep spalt on killing a processes
Date: Fri, 29 Oct 2021 09:07:42 +0200
Message-ID: <6aa79474-e998-368b-bb53-b43f135f5a0c@amd.com>
In-Reply-To: <a0a54261-f83a-8402-31dd-009588adece6@amd.com>

On 28.10.21 at 19:26, Andrey Grodzovsky wrote:
>
> On 2021-10-27 3:58 p.m., Andrey Grodzovsky wrote:
>>
>> On 2021-10-27 10:50 a.m., Christian König wrote:
>>> On 27.10.21 at 16:47, Andrey Grodzovsky wrote:
>>>>
>>>> On 2021-10-27 10:34 a.m., Christian König wrote:
>>>>> On 27.10.21 at 16:27, Andrey Grodzovsky wrote:
>>>>>> [SNIP]
>>>>>>>
>>>>>>>> Let me please know if I am still missing some point of yours.
>>>>>>>
>>>>>>> Well, I mean we need to be able to handle this for all drivers.
>>>>>>
>>>>>>
>>>>>> For sure, but as i said above in my opinion we need to change 
>>>>>> only for those drivers that don't use the _locked version.
>>>>>
>>>>> And that absolutely won't work.
>>>>>
>>>>> See the dma_fence is a contract between drivers, so you need the 
>>>>> same calling convention between all drivers.
>>>>>
>>>>> Either we always call the callback with the lock held or we always 
>>>>> call it without the lock, but sometimes like that and sometimes 
>>>>> otherwise won't work.
>>>>>
>>>>> Christian.
>>>>
>>>>
>>>> I am not sure I fully understand what problems this will cause but 
>>>> anyway, then we are back to irq_work. We cannot embed irq_work as 
>>>> union within dma_fence's cb_list
>>>> because it's already reused as timestamp and as rcu head after the 
>>>> fence is signaled. So I will do it within drm_scheduler with single 
>>>> irq_work per drm_sched_entity
>>>> as we discussed before.
>>>
>>> That won't work either. We free up the entity after the cleanup 
>>> function. That's the reason we use the callback on the job in the 
>>> first place.
>>
>>
>> Yep, missed it.
>>
>>
>>>
>>> We could overload the cb structure in the job though.
>>
>>
>> I guess, since no one else is using this member after the cb has 
>> executed.
>>
>> Andrey
>
>
> Attached a patch. Give it a try please, I tested it on my side and 
> tried to generate the right conditions to trigger this code path by 
> repeatedly submitting commands while issuing GPU reset to stop the 
> scheduler and then killing command submissions process in the middle. 
> But for some reason looks like the job_queue was always empty already 
> at the time of entity kill.

It was trivial to trigger with the stress utility I've hacked together:

amdgpu_stress -b v 1g -b g 1g -c 1 2 1g 1k

Then, while it is copying, just press Ctrl+C to kill it.
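
For repeated runs, the manual interrupt can be scripted. This is only a rough sketch, assuming GNU coreutils timeout is installed; the helper name interrupt_after and the 5 second delay are made up for illustration and are not part of amdgpu_stress. Note that sending SIGINT from timeout approximates Ctrl+C but is not identical to it, since Ctrl+C signals the whole foreground process group.

```shell
#!/bin/sh
# Hypothetical helper (not part of amdgpu_stress): run a command and send it
# SIGINT after a delay, approximating a Ctrl+C while the copy is in flight.
# Assumption: GNU coreutils `timeout` is available on the test machine.
interrupt_after() {
    delay="$1"
    shift
    # timeout exits with status 124 when the delay expired and the signal
    # was sent; otherwise it returns the command's own exit status.
    timeout --signal=INT "$delay" "$@"
}

# Example invocation with the command line from above; the 5 second delay is
# an arbitrary guess and should be tuned so the signal lands mid-copy:
#   interrupt_after 5 amdgpu_stress -b v 1g -b g 1g -c 1 2 1g 1k
```

An exit status of 124 means the run was interrupted by the timer; any other status means the command finished (or failed) on its own before the delay expired.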

The patch itself is:

Tested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

Thanks,
Christian.

>
> Andrey
>
>
>>
>>
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>
>>>


