From: Yang Shi <shy828301@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>,
	Michal Hocko <mhocko@suse.com>, Nico Pache <npache@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shakeel Butt <shakeelb@google.com>, Roman Gushchin <guro@fb.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	raquini@redhat.com
Subject: Re: [RFC PATCH 2/2] mm/vmscan.c: Prevent allocating shrinker_info on offlined nodes
Date: Mon, 6 Dec 2021 13:28:34 -0800
Message-ID: <CAHbLzkrfU3SQ8r4FyhumDHr02DSKd8oWbhwwVbBUHF7GCGY2Hg@mail.gmail.com>
In-Reply-To: <a48c16d6-07df-ff44-67e6-f0942672ec28@redhat.com>

On Mon, Dec 6, 2021 at 11:01 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 06.12.21 19:42, Yang Shi wrote:
> > On Mon, Dec 6, 2021 at 5:19 AM Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> >>
> >> On 06.12.2021 13:45, David Hildenbrand wrote:
> >>>> This doesn't seem complete. Slab shrinkers are used in the reclaim
> >>>> context. Previously offline nodes could be onlined later and this would
> >>>> lead to NULL ptr because there is no hook to allocate new shrinker
> >>>> infos. This would be also really impractical because this would have to
> >>>> update all existing memcgs...
> >>>
> >>> Instead of going through the trouble of updating...
> >>>
> >>> ...  maybe just keep for_each_node() and check if the target node is
> >>> offline. If it's offline, just allocate from the first online node.
> >>> After all, we're not using __GFP_THISNODE, so there are no guarantees
> >>> either way ...
> >>
> >> Hm, can't we add shrinker maps allocation to __try_online_node() in addition
> >> to this patch?
> >
> > I think the below fix (an example, doesn't cover all affected
> > callsites) should be good enough for now? It doesn't touch the hot
> > path of the page allocator.
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index fb9584641ac7..1252a33f7c28 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -222,13 +222,15 @@ static int expand_one_shrinker_info(struct
> > mem_cgroup *memcg,
> >         int size = map_size + defer_size;
> >
> >         for_each_node(nid) {
> > +               int tmp = nid;
> >                 pn = memcg->nodeinfo[nid];
> >                 old = shrinker_info_protected(memcg, nid);
> >                 /* Not yet online memcg */
> >                 if (!old)
> >                         return 0;
> > -
> > -               new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
> > +               if (!node_online(nid))
> > +                       tmp = -1;
> > +               new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, tmp);
> >                 if (!new)
> >                         return -ENOMEM;
> >
> > It used to use kvmalloc instead of kvmalloc_node(). The commit
> > 86daf94efb11d7319fbef5e480018c4807add6ef ("mm/memcontrol.c: allocate
> > shrinker_map on appropriate NUMA node") changed to use *_node()
> > version. The justification was that "kswapd is always bound to
> > specific node. So allocate shrinker_map from the related NUMA node to
> > respect its NUMA locality." There is no kswapd for offlined node, so
> > just allocate shrinker info on node 0. This is also what
> > alloc_mem_cgroup_per_node_info() does.
>
> Yes, that's what I refer to as fixing it in the caller -- similar to
> [1]. Michal's point is to not require such node_online() checks at all,
> neither in the caller nor in the buddy.
>
> I see 2 options short-term
>
> 1) What we have in [1].
> 2) What I proposed in [2], fixing it for all such instances until we
> have something better.
>
> Long term I tend to agree that what Michal proposes is better.
>
> Short term I tend to like [2], because it avoids having to mess with all
> such instances to eventually get it right and the temporary overhead
> until we have the code reworked should be really negligible ...

Thanks, David. Either option looks fine to me, but I'm a
little concerned about [2]. It silently changes the node requested
by the caller, so it may paper over real bugs. And what happens if
a caller specifies __GFP_THISNODE? (I haven't checked whether such
callers exist in the current code.)

How about a helper function, for example
kvmalloc_best_node()? It would do:

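void *kvmalloc_best_node(size_t size, gfp_t flags, int nid)
{
    bool onlined = node_online(nid);

    /* __GFP_THISNODE cannot be honoured for an offline node. */
    WARN_ON_ONCE((flags & __GFP_THISNODE) && !onlined);

    /* Offline node: drop the node preference instead of failing. */
    if (!onlined)
        nid = NUMA_NO_NODE;

    return kvmalloc_node(size, flags, nid);
}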
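
To make the intended behaviour concrete, here is a rough userspace
model of the helper. It is only a sketch: every mock_* function and
MOCK_* constant is a made-up stand-in (not a kernel definition), the
online-node table is arbitrary, and the warning fires on every
offending call rather than once as WARN_ON_ONCE() would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative stand-ins only; none of these are kernel definitions. */
#define MOCK_MAX_NODES    4
#define MOCK_NO_NODE      (-1)   /* plays the role of NUMA_NO_NODE */
#define MOCK_GFP_KERNEL   0x1u   /* plays the role of GFP_KERNEL */
#define MOCK_GFP_THISNODE 0x2u   /* plays the role of __GFP_THISNODE */

/* Arbitrary topology for the sketch: nodes 0 and 2 online, 1 and 3 offline. */
static bool mock_online[MOCK_MAX_NODES] = { true, false, true, false };

static int mock_last_nid;    /* node id the mock allocator was last asked for */
static int mock_warn_count;  /* number of warnings raised by the helper */

static bool mock_node_online(int nid)
{
    return nid >= 0 && nid < MOCK_MAX_NODES && mock_online[nid];
}

/* Stand-in for kvmalloc_node(): records the requested node, then malloc()s. */
static void *mock_kvmalloc_node(size_t size, unsigned int flags, int nid)
{
    (void)flags;
    mock_last_nid = nid;
    return malloc(size);
}

/* Same shape as the proposed helper, written against the mocks above. */
void *mock_kvmalloc_best_node(size_t size, unsigned int flags, int nid)
{
    bool onlined = mock_node_online(nid);

    assert(nid < MOCK_MAX_NODES);

    /* A "this node only" request for an offline node cannot be satisfied. */
    if ((flags & MOCK_GFP_THISNODE) && !onlined) {
        mock_warn_count++;
        fprintf(stderr, "warning: THISNODE request for offline node %d\n", nid);
    }

    /* Offline node: drop the node preference instead of failing. */
    if (!onlined)
        nid = MOCK_NO_NODE;

    return mock_kvmalloc_node(size, flags, nid);
}
```

With this table, a request for node 2 is passed through unchanged,
a request for node 1 reaches the mock allocator as MOCK_NO_NODE, and
a MOCK_GFP_THISNODE request for node 3 additionally bumps
mock_warn_count. The point of the warning is that the caller's bug
stays visible instead of being silently redirected.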

>
>
>
> [1] https://lkml.kernel.org/r/20211108202325.20304-1-amakhalov@vmware.com
> [2]
> https://lkml.kernel.org/r/51c65635-1dae-6ba4-daf9-db9df0ec35d8@redhat.com
>
> --
> Thanks,
>
> David / dhildenb
>
