linux-mm.kvack.org archive mirror
From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <christian.koenig@amd.com>
Cc: "Thomas Hellström (Intel)" <thomas_os@shipmail.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"Dave Chinner" <dchinner@redhat.com>, "Leo Liu" <Leo.Liu@amd.com>
Subject: Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
Date: Wed, 24 Mar 2021 20:20:57 +0100
Message-ID: <YFuRGdLPF8FX0Be1@phenom.ffwll.local>
In-Reply-To: <31a52f86-e4af-f1d3-90b2-6eff8ec5f300@amd.com>

On Wed, Mar 24, 2021 at 01:07:44PM +0100, Christian König wrote:
> 
> 
> On 24.03.21 at 13:01, Daniel Vetter wrote:
> > On Wed, Mar 24, 2021 at 01:00:28PM +0100, Christian König wrote:
> > > On 24.03.21 at 12:55, Daniel Vetter wrote:
> > > > On Wed, Mar 24, 2021 at 11:19:13AM +0100, Thomas Hellström (Intel) wrote:
> > > > > On 3/23/21 4:45 PM, Christian König wrote:
> > > > > > On 23.03.21 at 16:13, Michal Hocko wrote:
> > > > > > > On Tue 23-03-21 14:56:54, Christian König wrote:
> > > > > > > > On 23.03.21 at 14:41, Michal Hocko wrote:
> > > > > > > [...]
> > > > > > > > > Anyway, I am wondering whether the overall approach is sound. Why don't
> > > > > > > > > you simply use shmem as your backing storage from the beginning and pin
> > > > > > > > > those pages if they are used by the device?
> > > > > > > > Yeah, that is exactly what the Intel guys are doing for their integrated
> > > > > > > > GPUs :)
> > > > > > > > 
> > > > > > > > Problem is for TTM I need to be able to handle dGPUs and those have all
> > > > > > > > kinds of funny allocation restrictions. In other words I need to guarantee
> > > > > > > > that the allocated memory is coherently accessible to the GPU without using
> > > > > > > > SWIOTLB.
> > > > > > > > 
> > > > > > > > The simple case is that the device can only do DMA32, but there are also
> > > > > > > > devices which can only do 40 bits or 48 bits.
> > > > > > > > 
> > > > > > > > On top of that there are also AGP, CMA and stuff like CPU cache behavior
> > > > > > > > changes (write back vs. write through vs. uncached).
> > > > > > > OK, so the underlying problem seems to be that gfp mask (thus
> > > > > > > mapping_gfp_mask) cannot really reflect your requirements, right? Would
> > > > > > > it help if shmem allowed you to provide an allocation callback to
> > > > > > > override alloc_page_vma, which is used currently? I am pretty sure there
> > > > > > > will be more to handle, but going through shmem for the whole lifetime
> > > > > > > is just so much easier to reason about than some tricks to abuse shmem
> > > > > > > just for the swapout path.
> > > > > > Well it's a start, but the pages can have special CPU cache settings. So
> > > > > > direct IO from/to them usually doesn't work as expected.
> > > > > > 
> > > > > > In addition to that, for AGP and CMA I need to make sure that I give those
> > > > > > pages back to the relevant subsystems instead of just dropping the page
> > > > > > reference.
> > > > > > 
> > > > > > So I would need to block for the swapio to be completed.
> > > > > > 
> > > > > > Anyway I probably need to revert those patches for now since this isn't
> > > > > > working as we hoped it would.
> > > > > > 
> > > > > > Thanks for the explanation of how stuff works here.
> > > > > Another alternative here that I've tried before without being successful
> > > > > would perhaps be to drop shmem completely and, if it's a normal page (no dma
> > > > > or funny caching attributes), just use add_to_swap_cache()? If it's something
> > > > > else, try to allocate a page with relevant gfp attributes, copy and
> > > > > add_to_swap_cache()? Or perhaps that doesn't work well from a shrinker
> > > > > either?
> > > > So before we toss everything and go on a great rewrite-the-world tour,
> > > > what if we just try to split up big objects? So for objects which are
> > > > bigger than e.g. 10 MB:
> > > > 
> > > > - move them to a special "under eviction" list
> > > > - keep a note of how far we have evicted thus far
> > > > - interleave allocating shmem pages, copying data and releasing the ttm
> > > >     backing store on a chunk basis (maybe 10 MB or whatever, needs tuning tbh)
> > > > 
> > > > If that's not enough, occasionally break out of the shrinker entirely so
> > > > other parts of reclaim can reclaim the shmem stuff. But just releasing our
> > > > own pages as we go should help a lot I think.
> > > Yeah, the latter is exactly what I was currently prototyping.
> > > 
> > > I just didn't use a limit, but rather a list of only partially evicted BOs
> > > which is used when we fail to allocate a page.
> > > 
> > > For the 5.12 cycle I think we should just go back to a hard 50% limit for
> > > now and then resurrect this when we have solved the issues.
> > Can we do the 50% limit without tossing out all the code we've done thus
> > far? Just so this doesn't get too disruptive.
> 
> Yeah, I just need to get back to v1 of this patch, from before you convinced
> me that the shrinker is the better approach :)

I don't think there's anything other than the shrinker if you want
dynamically sized memory usage. Or pinning it all. Implementing our own
kswapd and not tying into direct reclaim does not sound like a good idea
to me.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
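
The chunked eviction idea from the quoted discussion above (put big objects
on an "under eviction" list, remember how far eviction got, and interleave
allocating shmem pages, copying, and releasing the TTM backing store one
chunk at a time) can be illustrated with a toy model. This is only a sketch
in Python, not TTM or kernel code: `BufferObject`, `shrink`, `CHUNK_PAGES`
and the page accounting are all hypothetical. It assumes that the TTM pages
of a chunk return to the free pool right after the copy, so only one chunk
of extra memory is needed at any time.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size in bytes
# 10 MiB expressed in pages; the thread treats the chunk size as a tuning knob.
CHUNK_PAGES = (10 * 1024 * 1024) // PAGE_SIZE


@dataclass
class BufferObject:
    """Hypothetical stand-in for a buffer object on the "under eviction" list."""
    total_pages: int
    evicted_pages: int = 0  # the note of how far eviction has progressed

    @property
    def resident_pages(self) -> int:
        return self.total_pages - self.evicted_pages


def shrink(bo: BufferObject, free_pages: int, chunk_pages: int = CHUNK_PAGES) -> bool:
    """Evict bo chunk by chunk; return True once it is fully evicted.

    Toy accounting: copying a chunk needs that many free pages for the shmem
    copies, and releasing the chunk's TTM pages afterwards gives the same
    number back, so free_pages is unchanged between iterations.
    """
    while bo.resident_pages > 0:
        n = min(chunk_pages, bo.resident_pages)
        if n > free_pages:
            # Not enough memory for this chunk: stop here. Progress is kept
            # in bo.evicted_pages, so a later call resumes where this one ended.
            return False
        bo.evicted_pages += n
    return True
```

In this model, a 25600-page (100 MiB) object cannot be evicted in one piece
when only 4096 pages are free, because all the shmem copies would have to
exist before any TTM page is released. With the 2560-page chunk size the same
object is evicted completely. Real reclaim is more involved: the shmem pages
still occupy memory until they are written to swap, which is why the quoted
proposal also suggests occasionally breaking out of the shrinker.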


