Re: [PATCH] drm/ttm: stop warning on TT shrinker failure

From: "Christian König" <christian.koenig@amd.com>
To: "Thomas Hellström (Intel)" <thomas_os@shipmail.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"Dave Chinner" <dchinner@redhat.com>, "Leo Liu" <Leo.Liu@amd.com>
Subject: Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
Date: Wed, 24 Mar 2021 13:07:44 +0100
Message-ID: <31a52f86-e4af-f1d3-90b2-6eff8ec5f300@amd.com>
In-Reply-To: <YFsqN7068vUL8rAM@phenom.ffwll.local>

Am 24.03.21 um 13:01 schrieb Daniel Vetter:
> On Wed, Mar 24, 2021 at 01:00:28PM +0100, Christian König wrote:
>> Am 24.03.21 um 12:55 schrieb Daniel Vetter:
>>> On Wed, Mar 24, 2021 at 11:19:13AM +0100, Thomas Hellström (Intel) wrote:
>>>> On 3/23/21 4:45 PM, Christian König wrote:
>>>>> Am 23.03.21 um 16:13 schrieb Michal Hocko:
>>>>>> On Tue 23-03-21 14:56:54, Christian König wrote:
>>>>>>> Am 23.03.21 um 14:41 schrieb Michal Hocko:
>>>>>> [...]
>>>>>>>> Anyway, I am wondering whether the overall approach is
>>>>>>>> sound. Why don't
>>>>>>>> you simply use shmem as your backing storage from the
>>>>>>>> beginning and pin
>>>>>>>> those pages if they are used by the device?
>>>>>>> Yeah, that is exactly what the Intel guys are doing for their
>>>>>>> integrated
>>>>>>> GPUs :)
>>>>>>>
>>>>>>> Problem is for TTM I need to be able to handle dGPUs and those have all
>>>>>>> kinds of funny allocation restrictions. In other words I need to
>>>>>>> guarantee
>>>>>>> that the allocated memory is coherently accessible to the GPU
>>>>>>> without using
>>>>>>> SWIOTLB.
>>>>>>>
>>>>>>> The simple case is that the device can only do DMA32, but you also have
>>>>>>> devices which can only do 40 bits or 48 bits.
>>>>>>>
>>>>>>> On top of that you also have AGP, CMA and stuff like CPU cache behavior
>>>>>>> changes (write-back vs. write-through vs. uncached).
>>>>>> OK, so the underlying problem seems to be that gfp mask (thus
>>>>>> mapping_gfp_mask) cannot really reflect your requirements, right? Would
>>>>>> it help if shmem allowed you to provide an allocation callback to
>>>>>> override alloc_page_vma, which is used currently? I am pretty sure there
>>>>>> will be more to handle, but going through shmem for the whole lifetime
>>>>>> is just so much easier to reason about than some tricks to abuse shmem
>>>>>> just for the swapout path.
>>>>> Well it's a start, but the pages can have special CPU cache settings. So
>>>>> direct IO from/to them usually doesn't work as expected.
>>>>>
>>>>> In addition to that, for AGP and CMA I need to make sure that I give those
>>>>> pages back to the relevant subsystems instead of just dropping the page
>>>>> reference.
>>>>>
>>>>> So I would need to block until the swap I/O has completed.
>>>>>
>>>>> Anyway I probably need to revert those patches for now since this isn't
>>>>> working as we hoped it would.
>>>>>
>>>>> Thanks for the explanation of how stuff works here.
>>>> Another alternative here that I've tried before without being successful
>>>> would perhaps be to drop shmem completely and, if it's a normal page (no DMA
>>>> or funny caching attributes), just use add_to_swap_cache()? If it's something
>>>> else, try to allocate a page with relevant gfp attributes, copy and
>>>> add_to_swap_cache()? Or perhaps that doesn't work well from a shrinker
>>>> either?
>>> So before we toss everything and go on a great rewrite-the-world tour,
>>> what if we just try to split up big objects? So for objects which are
>>> bigger than e.g. 10 MB:
>>>
>>> - move them to a special "under eviction" list
>>> - keep a note of how far we have evicted thus far
>>> - interleave allocating shmem pages, copying data and releasing the ttm
>>>     backing store on a chunk basis (maybe 10 MB or whatever, tuning tbh)
>>>
>>> If that's not enough, occasionally break out of the shrinker entirely so
>>> other parts of reclaim can reclaim the shmem stuff. But just releasing our
>>> own pages as we go should help a lot I think.
>> Yeah, the latter is exactly what I was currently prototyping.
>>
>> I just didn't use a limit but rather a list of only partially evicted BOs
>> which is used when we fail to allocate a page.
>>
>> For the 5.12 cycle I think we should just go back to a hard 50% limit for
>> now and then resurrect this when we have solved the issues.
> Can we do the 50% limit without tossing out all the code we've done thus
> far? Just so this doesn't get too disruptive.

Yeah, I just need to get back to v1 of this patch, from before you convinced
me that the shrinker is the better approach :)

Cheers,
Christian.

> -Daniel
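
For illustration only, the chunked eviction idea quoted above (track how far
an object has been evicted, then interleave allocating, copying and releasing
one chunk at a time) could be sketched as a small userspace simulation. This
is not TTM code: struct sim_bo, sim_evict_chunk(), sim_evict() and the
free-page counter are hypothetical names made up for this sketch, and the
copy step is only a comment. The point it tries to show is that the peak
extra memory needed is one chunk rather than the whole object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object that is being evicted. */
struct sim_bo {
	size_t num_pages;	/* total pages backing the object */
	size_t evicted;		/* note of how far eviction has progressed */
};

/*
 * Evict at most one chunk of the object. Returns the number of pages
 * moved, or 0 if the chunk could not be allocated from the free pages.
 */
static size_t sim_evict_chunk(struct sim_bo *bo, size_t *free_pages,
			      size_t chunk)
{
	size_t left = bo->num_pages - bo->evicted;
	size_t n = left < chunk ? left : chunk;

	if (n == 0 || n > *free_pages)
		return 0;		/* nothing to do, or not enough memory */

	*free_pages -= n;		/* allocate n destination (shmem) pages */
	/* ... copying the n pages would happen here ... */
	bo->evicted += n;		/* remember the progress */
	*free_pages += n;		/* release the n source backing pages */

	assert(bo->evicted <= bo->num_pages);
	return n;
}

/*
 * Evict the whole object chunk by chunk. Returns 0 on success and -1
 * when a chunk cannot be allocated, which is where a real shrinker
 * would break out and let other parts of reclaim make progress.
 */
static int sim_evict(struct sim_bo *bo, size_t *free_pages, size_t chunk)
{
	while (bo->evicted < bo->num_pages) {
		if (sim_evict_chunk(bo, free_pages, chunk) == 0)
			return -1;
	}
	return 0;
}
```

In this model a 100 page object with only 8 free pages can be evicted with a
chunk size of 4, while trying to move all 100 pages in one go fails because
the destination pages cannot be allocated up front. The chunk size is a
tuning knob here, just as in the proposal.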