From: Luben Tuikov <luben.tuikov@amd.com>
To: Nirmoy <nirmodas@amd.com>,
	christian.koenig@amd.com,
	Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>,
	amd-gfx@lists.freedesktop.org
Cc: alexander.deucher@amd.com, Boyuan.Zhang@amd.com,
	nirmoy.das@amd.com, Leo.Liu@amd.com, James.Zhu@amd.com
Subject: Re: [PATCH 1/1] drm/amdgpu: disable gpu_sched load balancer for vcn jobs
Date: Tue, 17 Mar 2020 16:46:24 -0400	[thread overview]
Message-ID: <4616f7cc-ce70-d74c-5a62-d736fec08085@amd.com> (raw)
In-Reply-To: <3fac4046-0c2c-7fa9-7c83-6af9149e50bf@amd.com>

On 2020-03-12 06:56, Nirmoy wrote:
> 
> On 3/12/20 9:50 AM, Christian König wrote:
>> Am 11.03.20 um 21:55 schrieb Nirmoy:
>>>
>>> On 3/11/20 9:35 PM, Andrey Grodzovsky wrote:
>>>>
>>>> On 3/11/20 4:32 PM, Nirmoy wrote:
>>>>>
>>>>> On 3/11/20 9:02 PM, Andrey Grodzovsky wrote:
>>>>>>
>>>>>> On 3/11/20 4:00 PM, Andrey Grodzovsky wrote:
>>>>>>>
>>>>>>> On 3/11/20 4:00 PM, Nirmoy Das wrote:
>>>>>>>> [SNIP]
>>>>>>>> @@ -1257,6 +1258,9 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>>>>>>       priority = job->base.s_priority;
>>>>>>>>       drm_sched_entity_push_job(&job->base, entity);
>>>>>>>>
>>>>>>>> +    if (ring->funcs->no_gpu_sched_loadbalance)
>>>>>>>> +        amdgpu_ctx_disable_gpu_sched_load_balance(entity);
>>>>>>>> +
>>>>>>>
>>>>>>>
>>>>>>> Why does this need to be done each time a job is submitted, and not
>>>>>>> once in drm_sched_entity_init (same for amdgpu_job_submit below)?
>>>>>>>
>>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>> My bad - not in drm_sched_entity_init but in relevant amdgpu code.
>>>>>
>>>>>
>>>>> Hi Andrey,
>>>>>
>>>>> Do you mean drm_sched_job_init() or after creating VCN entities?
>>>>>
>>>>>
>>>>> Nirmoy
>>>>
>>>>
>>>> I guess after creating the VCN entities (it has to be amdgpu-specific
>>>> code). I just don't get why it needs to be done each time a job is
>>>> submitted. Since you set .no_gpu_sched_loadbalance = true anyway, this
>>>> is always true, so shouldn't you just initialize the VCN entity with a
>>>> scheduler list consisting of one scheduler and that's it?
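
A minimal sketch of that idea, assuming the scheduler-list variant of
drm_sched_entity_init(); the variable names here are only illustrative, and
the instance would have to be chosen once, before the entity is created:

    /* Sketch: give the VCN entity a one-element scheduler list so the
     * scheduler core can never load balance it to another instance.
     */
    struct drm_gpu_scheduler *vcn_sched_list[] = { chosen_sched };

    r = drm_sched_entity_init(entity, priority, vcn_sched_list, 1, guilty);
    if (r)
            return r;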
>>>
>>>
>>> Assumption: if I understand correctly, we shouldn't be load balancing
>>> VCN jobs within the same context. Christian, James and Leo can clarify
>>> if I am wrong.
>>>
>>> But we can still load balance VCN jobs among multiple contexts. That
>>> load balancing decision happens in drm_sched_entity_init(). If we
>>> initialize the VCN entity with one scheduler, then all entities,
>>> irrespective of context, get that one scheduler, which means we are not
>>> utilizing the extra VCN instances.
>>
>> Andrey has a very good point here. So far we have only looked at this
>> from the hardware-requirement side, namely that we can't change the ring
>> any more after the first submission.
>>
>> But it is certainly valuable to keep the extra overhead out of the hot 
>> path during command submission.
> 
> 
> 
>>
>>> Ideally we should be calling
>>> amdgpu_ctx_disable_gpu_sched_load_balance() only once, after the first
>>> call to drm_sched_entity_init() for a VCN job. I am not sure how to do
>>> that efficiently.
>>>
>>> Another option might be to copy the logic of
>>> drm_sched_entity_get_free_sched() and choose a suitable VCN scheduler
>>> at/after VCN entity creation.
>>
>> Yes, but we should not copy the logic but rather refactor it :)
>>
>> Basically we need a drm_sched_pick_best() function which gets an array 
>> of drm_gpu_scheduler structures and returns the one with the least 
>> load on it.
>>
>> This function can then be used by VCN to pick one instance before 
>> initializing the entity as well as a replacement for 
>> drm_sched_entity_get_free_sched() to change the scheduler for load 
>> balancing.
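
A minimal sketch of what such a helper could look like; this is not the
actual implementation, it just reuses the least-loaded idea from
drm_sched_entity_get_free_sched() and assumes the per-scheduler num_jobs
counter as the load metric:

    /* Sketch: return the least-loaded scheduler from the array, or NULL
     * if the array is empty.
     */
    static struct drm_gpu_scheduler *
    drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
                        unsigned int num_sched_list)
    {
            struct drm_gpu_scheduler *best = NULL;
            unsigned int min_jobs = UINT_MAX, i;

            for (i = 0; i < num_sched_list; i++) {
                    unsigned int jobs = atomic_read(&sched_list[i]->num_jobs);

                    if (jobs < min_jobs) {
                            min_jobs = jobs;
                            best = sched_list[i];
                    }
            }

            return best;
    }

VCN could call this once before entity creation, and the load-balancing path
could call it in place of drm_sched_entity_get_free_sched().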
> 
> 
> This sounds like an optimal solution here.
> 
> Thanks Andrey and Christian. I will resend with suggested changes.

Note that this isn't an optimal solution. drm_sched_pick_best() and
drm_sched_entity_get_free_sched() (these names are too long) are similar in
what they do: they pick a scheduler, which is still centralized decision
making.

An optimal solution would be for each execution unit to pick work
when work is available, which is a decentralized decision model.

I'm not sure how the array would be used as proposed here. Would that be an
O(n) search through the array?

In any case, centralized decision making introduces a bottleneck. Decentralized
solutions are available for scheduling with O(1) time complexity.
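
For contrast, in a pull model the submitter enqueues work once and each
execution unit takes the next job itself when it becomes idle, so every
scheduling decision is O(1). A rough illustration; the queue API here is
hypothetical, not drm_sched code:

    /* Producer: one O(1) enqueue; no scheduler is picked here. */
    enqueue(&shared_fifo, job);

    /* Consumer: each execution unit pulls its own work when idle,
     * also O(1), so load spreads without a central decision.
     */
    while (running) {
            job = dequeue(&shared_fifo);    /* blocks until work arrives */
            run(job);
    }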

Regards,
Luben


> 
> 
>>
>> Regards,
>> Christian.
>>
>>>
>>>
>>> Regards,
>>>
>>> Nirmoy
>>>
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
