linux-mm.kvack.org archive mirror
From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Linux MM <linux-mm@kvack.org>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>,
	Dave Chinner <dchinner@redhat.com>, Leo Liu <Leo.Liu@amd.com>
Subject: Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
Date: Tue, 23 Mar 2021 12:48:58 +0100
Message-ID: <cd17d2ca-140e-1e69-37ac-c2726cc1ef9f@gmail.com>
In-Reply-To: <YFnQ1Lk6BTo8L7pr@phenom.ffwll.local>

On 23.03.21 at 12:28, Daniel Vetter wrote:
> On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
>> On Mon 22-03-21 20:34:25, Christian König wrote:
>>> On 22.03.21 at 18:02, Daniel Vetter wrote:
>>>> On Mon, Mar 22, 2021 at 5:06 PM Michal Hocko <mhocko@suse.com> wrote:
>>>>> On Mon 22-03-21 14:05:48, Matthew Wilcox wrote:
>>>>>> On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
>>>>>>> On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
>>>>>>>> On 20.03.21 at 14:17, Daniel Vetter wrote:
>>>>>>>>> On Sat, Mar 20, 2021 at 10:04 AM Christian König
>>>>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>>>>>>> On 19.03.21 at 20:06, Daniel Vetter wrote:
>>>>>>>>>>> On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
>>>>>>>>>>>> On 19.03.21 at 18:52, Daniel Vetter wrote:
>>>>>>>>>>>>> On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
>>>>>>>>>>>>>> Don't print a warning when we fail to allocate a page for swapping things out.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
>>>>>>>>>>>>> Uh this part doesn't make sense. Especially since you only do it for the
>>>>>>>>>>>>> debugfs file, not in general. Which means you've just completely broken
>>>>>>>>>>>>> the shrinker.
>>>>>>>>>>>> Are you sure? My impression is that GFP_NOFS should now work much more out
>>>>>>>>>>>> of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
>>>>>>>>>>> Yeah, if you'd put it in the right place :-)
>>>>>>>>>>>
>>>>>>>>>>> But also -mm folks are very clear that memalloc_no*() family is for dire
>>>>>>>>>>> situation where there's really no other way out. For anything where you
>>>>>>>>>>> know what you're doing, you really should use explicit gfp flags.
>>>>>>>>>> My impression is just the other way around. You should try to avoid the
>>>>>>>>>> NOFS/NOIO flags and use the memalloc_no* approach instead.
>>>>>>>>> Where did you get that idea?
>>>>>>>> Well from the kernel comment on GFP_NOFS:
>>>>>>>>
>>>>>>>>    * %GFP_NOFS will use direct reclaim but will not use any filesystem
>>>>>>>>    * interfaces. Please try to avoid using this flag directly and instead use
>>>>>>>>    * memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't
>>>>>>>>    * recurse into the FS layer with a short explanation why. All allocation
>>>>>>>>    * requests will inherit GFP_NOFS implicitly.
>>>>>>> Huh that's interesting, since iirc Willy or Dave told me the opposite, and
>>>>>>> the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
>>>>>>> GFP_NOFS) or swap on top of a filesystems (even needs GFP_NOIO I think).
>>>>>>>
>>>>>>> Adding them, maybe I got confused.
>>>>>> My impression is that the scoped API is preferred these days.
>>>>>>
>>>>>> https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html
>>>>>>
>>>>>> I'd probably need to spend a few months learning the DRM subsystem to
>>>>>> have a more detailed opinion on whether passing GFP flags around explicitly
>>>>>> or using the scope API is the better approach for your situation.
>>>>> yes, in an ideal world we would have a clearly defined scope of the
>>>>> reclaim recursion wrt FS/IO associated with it. I've got back to
>>>>> https://lore.kernel.org/amd-gfx/20210319140857.2262-1-christian.koenig@amd.com/
>>>>> and there are two things standing out. Why does ttm_tt_debugfs_shrink_show
>>>>> really require NOFS semantic? And why does it play with
>>>>> fs_reclaim_acquire?
>>>> It's our shrinker. shrink_show simply triggers that specific shrinker
>>>> asking it to shrink everything it can, which helps a lot with testing
>>>> without having to drive the entire system against the OOM wall.
>> Yes I figured that much. But...
>>
>>>> fs_reclaim_acquire is there to make sure lockdep understands that this
>>>> is a shrinker and that it checks all the dependencies for us like if
>>>> we'd be in real reclaim. There is some drop caches interfaces in proc
>>>> iirc, but those drop everything, and they don't have the fs_reclaim
>>>> annotations to teach lockdep about what we're doing.
>> ... I really do not follow this. You shouldn't really care whether this
>> is a reclaim interface or not. Or maybe I just do not understand this...
> We're heavily relying on lockdep and fs_reclaim to make sure we get it all
> right. So any drop caches interface that isn't wrapped in fs_reclaim
> context is kinda useless for testing. Plus ideally we want to only hit our
> own paths, and not trash every other cache in the system. Speed matters in
> CI.
>
>>> To summarize the debugfs code is basically to test if that stuff really
>>> works with GFP_NOFS.
>> What do you mean by testing GFP_NOFS. Do you mean to test that GFP_NOFS
>> context is sufficiently powerful to reclaim enough objects due to some
>> internal constraints?
>>
>>> My only concern is that if I could rely on memalloc_no* being used we could
>>> optimize this quite a bit further.
>> Yes you can use the scope API and you will be guaranteed that _any_
>> allocation from the enclosed context will inherit GFP_NO* semantic.

The question is whether this is also guaranteed the other way around.

In other words, if somebody calls get_free_page(GFP_NOFS), are the context
flags set as well?
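
To make the direction of the question concrete, here is a minimal user-space sketch of how the scope API relates to explicit flags, as far as I understand it. It assumes the scope API works by setting a per-task flag that the allocator consults when computing the effective mask, and that passing GFP_NOFS explicitly only affects that one call. All model_* names and MODEL_* values are hypothetical and exist only for illustration; this is not kernel code.

```c
/* User-space model only: names and values are illustrative, not the kernel's. */
#include <assert.h>

#define MODEL_GFP_IO      0x1u
#define MODEL_GFP_FS      0x2u
#define MODEL_GFP_KERNEL  (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS    (MODEL_GFP_IO)

#define MODEL_PF_MEMALLOC_NOFS 0x1u

/* Stand-in for the per-task state that the scope API manipulates. */
struct model_task {
	unsigned int flags;
};

/* Enter a NOFS scope; returns the previous state so scopes can nest. */
static unsigned int model_nofs_save(struct model_task *t)
{
	unsigned int old = t->flags & MODEL_PF_MEMALLOC_NOFS;

	t->flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

/* Leave the scope by putting back whatever state was saved. */
static void model_nofs_restore(struct model_task *t, unsigned int old)
{
	t->flags = (t->flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/*
 * Model of an allocation: the effective mask is the caller's mask with
 * the FS bit stripped when the task is inside a NOFS scope. The task is
 * const here on purpose: the explicit mask never writes the task flags.
 */
static unsigned int model_alloc(const struct model_task *t, unsigned int gfp)
{
	if (t->flags & MODEL_PF_MEMALLOC_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}
```

In this model, a MODEL_GFP_KERNEL allocation inside a save/restore pair ends up with the NOFS mask, while an explicit MODEL_GFP_NOFS allocation outside any scope leaves the task flags at zero, so code called from there cannot detect the restriction by looking at the task.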

>> I think this is where I don't get yet what Christian tries to do: We
>> really shouldn't do different tricks and calling contexts between direct
>> reclaim and kswapd reclaim. Otherwise very hard to track down bugs are
>> pretty much guaranteed. So whether we use explicit gfp flags or the
>> context apis, result is exactly the same.

Ok, let us recap what TTM's TT shrinker does here:

1. We have memory which is not swappable because it might be accessed by
the GPU at any time.
2. Make sure the memory is not accessed by the GPU and that drivers need to
grab a lock before they can make it accessible again.
3. Allocate a shmem file and copy over the non-swappable pages.
4. Free the non-swappable/non-reclaimable pages.

The pages we got from the shmem file are easily swappable to disk after
the copy is completed. But only if IO is not already blocked because the
shrinker was called from an allocation restricted by GFP_NOFS or GFP_NOIO.
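
The steps above can be sketched as a small user-space model. This is only an illustration of the sequence and of the constraint in the last paragraph, not the TTM implementation: struct tt_model, the tt_model_* functions and the TT_GFP_* bits are hypothetical names, and the "both IO and FS allowed" check is my reading of the statement that GFP_NOFS or GFP_NOIO callers block the write-out.

```c
/* User-space model only: hypothetical names, not the TTM or mm API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TT_GFP_IO 0x1u
#define TT_GFP_FS 0x2u

struct tt_model {
	bool gpu_bound;       /* step 1: GPU may touch the pages at any time */
	bool locked_out;      /* step 2: driver must take a lock to rebind */
	size_t pinned_pages;  /* pages that cannot be swapped */
	size_t shmem_pages;   /* copies living in the shmem file */
};

/* Run steps 2 to 4 and return the number of pinned pages released. */
static size_t tt_model_swapout(struct tt_model *tt)
{
	size_t moved = tt->pinned_pages;

	tt->gpu_bound = false;      /* step 2: stop GPU access ... */
	tt->locked_out = true;      /* ... and require a lock to undo it */
	tt->shmem_pages += moved;   /* step 3: copy into the shmem file */
	tt->pinned_pages = 0;       /* step 4: free the original pages */
	return moved;
}

/*
 * The shmem copies only reach disk if the calling allocation context
 * permits both IO and FS activity; a NOFS or NOIO caller blocks it.
 */
static bool tt_model_can_write_to_swap(unsigned int gfp)
{
	return (gfp & TT_GFP_IO) && (gfp & TT_GFP_FS);
}
```

The point of separating the two functions is that the copy into shmem always succeeds in the model, but whether it actually relieves memory pressure depends on the mask the shrinker was invoked with.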

Regards,
Christian.

> -Daniel


