Re: [PATCH 1/2] drm/ttm: cleanup and add TTM_PL_FLAG_TEMPORARY

From: "Thomas Hellström (Intel)" <thomas_os@shipmail.org>
To: "Christian König" <christian.koenig@amd.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Yu, Lang" <Lang.Yu@amd.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 1/2] drm/ttm: cleanup and add TTM_PL_FLAG_TEMPORARY
Date: Mon, 31 May 2021 14:09:59 +0200	[thread overview]
Message-ID: <7d545d70-8212-3fbe-e803-5198af69f5c2@shipmail.org> (raw)
In-Reply-To: <cd081dda-7430-d891-6b02-8aedda7b67be@amd.com>


On 5/31/21 2:02 PM, Christian König wrote:
> Am 31.05.21 um 13:19 schrieb Thomas Hellström (Intel):
>>
>> On 5/31/21 12:56 PM, Christian König wrote:
>>> Am 31.05.21 um 12:46 schrieb Thomas Hellström (Intel):
>>>>
>>>> On 5/31/21 12:32 PM, Christian König wrote:
>>>>> Am 31.05.21 um 11:52 schrieb Thomas Hellström (Intel):
>>>>>> Hi, Lang,
>>>>>>
>>>>>> On 5/31/21 10:19 AM, Yu, Lang wrote:
>>>>>>> [Public]
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> On 5/27/21 3:30 AM, Lang Yu wrote:
>>>>>>>>> Make TTM_PL_FLAG_* start from zero and add
>>>>>>>>> TTM_PL_FLAG_TEMPORARY flag for temporary
>>>>>>>>> GTT allocation use.
>>>>>>>> GTT is a driver private acronym, right? And it doesn't look like
>>>>>>>> TTM_PL_FLAG_TEMPORARY will be used in core TTM, so should we 
>>>>>>>> instead set
>>>>>>>> aside a mask in the PL flag for driver-private use?
>>>>>>> Hi Thomas,
>>>>>>>
>>>>>>> Thanks for your comments and advice, GTT means Graphics 
>>>>>>> Translation Table here, seems
>>>>>>> a general acronym. TTM_PL_FLAG_TEMPORARY may also be used by 
>>>>>>> other drives.
>>>>>>> I have made other patches for this. Please help review.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Lang
>>>>>>>
>>>>>> My point was really that the flag naming and documentation should 
>>>>>> reflect what core ttm is doing with that flag. If there is no 
>>>>>> specific core TTM usage, IMO we should move it to a driver 
>>>>>> specific flag to avoid future confusion. In particular a writer 
>>>>>> of a generic TTM resource manager should be able to know without 
>>>>>> looking at an old commit message what the placement flag is 
>>>>>> intended for.
>>>>>>
>>>>>> So here we add a flag where core TTM forces a bo move on validate 
>>>>>> and that's it. And that appears to be how it's used when an 
>>>>>> amdgpu bo is evicted to GTT, (btw should it be accounted in this 
>>>>>> situation?)
>>>>>>
>>>>>> The other use is to force the amdgpu driver to temporarily accept 
>>>>>> it into GTT when there is a lack of space, and IMHO that's a 
>>>>>> driver specific use and we should allocate a mask for driver 
>>>>>> specific flags for that.
>>>>>>
>>>>>> So shouldn't this be two flags, really?
>>>>>
>>>>> Well one flag makes sense for the use case at hand that drivers 
>>>>> want to signal to TTM that an allocation is only temporary and not 
>>>>> considered valid.
>>>>>
>>>>> That we then use this flag to implement temporary GTT allocations 
>>>>> to avoid problems during eviction is just extending that use case.
>>>>>
>>>> OK, but it looked like there were actually two use-cases. One for 
>>>> possibly long-term VRAM evictions to GTT, the other for the 
>>>> temporary use-case where the hop resource allocations sometimes 
>>>> failed. Or did I misunderstand the code?
>>>
>>> Ok sounds like we need more documentation here. That's really one 
>>> use case.
>>>
>>> Key point is we need temporary allocation during multihop which 
>>> should be handled differently to normal allocations.
>>
>> Yes, that part is clear from the patches. The part that I can't fit 
>> into that pattern is why the evict flags when evicting from visible 
>> VRAM to GTT or ordinary VRAM is marked with TTM_PL_FLAG_TEMPORARY. 
>> Wouldn't those remain evicted in that placement until re-validated to 
>> visible VRAM at an unknown future time?
>
> Not necessarily.
>
> The situation we ran into was the following:
> 1. OOM on VRAM, we try to evict.
>
> 2. GTT space is used up as well, ok evict directly to SYSTEM.
>
> 3. For VRAM->SYSTEM eviction we use a temporary bounce buffer.
>
> 4. Waiting for the bounce buffer to become idle is interrupted by a 
> signal so BO is still backed by bounce buffer.
>
> 5. Next CS, BO is validated with VRAM|GTT. TTM sees that it is in GTT 
> and doesn't move the buffer.
>
> 6. CS goes into nirvana because bounce buffers are not meant to be 
> used for CS (we can ignore alignment and accounting for them).
>
Yes, makes sense to me.

/Thomas