All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Yafang Shao <laoar.shao@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Quentin Monnet <quentin@isovalent.com>,
	Hao Luo <haoluo@google.com>, bpf <bpf@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
Date: Wed, 6 Jul 2022 15:54:04 -0700	[thread overview]
Message-ID: <YsYSjAUGUkERUZ4q@castle> (raw)
In-Reply-To: <CAADnVQ+c_2Q6GxH3E0iD0RkOy2H2-UhuYL4V3v2BTQ6sZNxQAA@mail.gmail.com>

On Wed, Jul 06, 2022 at 03:11:46PM -0700, Alexei Starovoitov wrote:
> On Wed, Jul 6, 2022 at 12:09 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > On Wed, Jul 06, 2022 at 09:47:32AM -0700, Alexei Starovoitov wrote:
> > > On Wed, Jul 6, 2022 at 8:59 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > > > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > > > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > > > easily break the memcg limit by force charge. So it is very dangerous to
> > > > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > > > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > > > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > > > too much memory.
> > > >
> > > > We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
> > > > too memory expensive for some cases. That means removing __GFP_HIGH
> > > > doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
> > > > it-avoiding issues caused by too much memory. So let's remove it.
> > > >
> > > > The force charge of GFP_ATOMIC was introduced in
> > > > commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing
> > > > __GFP_ATOMIC charges") by checking __GFP_ATOMIC, then got improved in
> > > > commit 1461e8c2b6af ("memcg: unify force charging conditions") by
> > > > checking __GFP_HIGH (that is no problem because both __GFP_HIGH and
> > > > __GFP_ATOMIC are set in GFP_AOMIC). So, if we want to fix it in memcg,
> > > > we have to carefully verify all the callsites. Now that we can fix it in
> > > > BPF, we'd better not modify the memcg code.
> > > >
> > > > This fix can also apply to other run-time allocations, for example, the
> > > > allocation in lpm trie, local storage and devmap. So let fix it
> > > > consistently over the bpf code
> > > >
> > > > __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure neither
> > > > currently. But the memcg code can be improved to make
> > > > __GFP_KSWAPD_RECLAIM work well under memcg pressure if desired.
> > >
> > > Could you elaborate ?
> > >
> > > > It also fixes a typo in the comment.
> > > >
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> > >
> > > Roman, do you agree with this change ?
> >
> > Yes, removing __GFP_HIGH makes sense to me. I can imagine we might want
> > it for *some* bpf allocations, but applying it unconditionally looks wrong.
> 
> Yeah. It's a difficult trade-off to make without having the data
> to decide whether removing __GFP_HIGH can cause issues or not,

Yeah, the change looks reasonable, but it's hard to say without giving
it a good testing in (something close to) a production environment.

> but do you agree that __GFP_HIGH doesn't cooperate well with memcg ?
> If so it's a bug on memcg side, right?

No. Historically we allowed high-prio allocations to exceed the memcg limit
because otherwise there were too many stability and performance issues.
It's not a memcg bug, it's a way to avoid exposing ENOMEM handling bugs all over
the kernel code. Without memory cgroups GFP_ATOMIC allocations rarely fail
and a lot of code paths in the kernel are not really ready for it (at least
it was the case several years ago, maybe things are better now).

But it was usually thought in the context of small(ish) allocations which do not
change the global memory usage picture. Subsequent "normal" allocations are
triggering reclaim/OOM, so from a user's POV the limit works as expected.

But with the ownership model and size of bpf maps it's a different story:
if a bpf map belongs to an abandoned cgroup, it might consume a lot of memory
and there will be no "normal" allocations. So cgroup memory limit will be never
applied. It's a valid issue, I agree with Yafang here.

> but we should probably
> apply this band-aid on bpf side to fix the bleeding.
> Later we can add a knob to allow __GFP_HIGH usage on demand from
> bpf prog.

Yes, it sounds like a good idea. I have to think what's the best approach
here, it's not obvious for me.

Thanks!

  reply	other threads:[~2022-07-06 22:54 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-06 15:58 [PATCH bpf-next v2 0/2] bpf: Minor fixes for non-preallocated memory Yafang Shao
2022-07-06 15:58 ` [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority Yafang Shao
2022-07-06 16:47   ` Alexei Starovoitov
2022-07-06 19:09     ` Roman Gushchin
2022-07-06 22:11       ` Alexei Starovoitov
2022-07-06 22:54         ` Roman Gushchin [this message]
2022-07-06 23:22           ` Alexei Starovoitov
2022-07-07  0:07   ` Shakeel Butt
2022-07-07  0:14     ` Alexei Starovoitov
2022-07-07  0:25     ` Roman Gushchin
2022-07-07  2:09       ` Alexei Starovoitov
2022-07-07  3:36         ` Roman Gushchin
2022-07-07 10:27     ` Yafang Shao
2022-07-07 15:44       ` Alexei Starovoitov
2022-07-07 16:19         ` Yafang Shao
2022-07-06 15:58 ` [PATCH bpf-next v2 2/2] bpf: Warn on non-preallocated case for missed trace types Yafang Shao
2022-07-06 16:50   ` Alexei Starovoitov
2022-07-07 10:29     ` Yafang Shao
2022-07-07 15:45       ` Alexei Starovoitov
2022-07-07 16:22         ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YsYSjAUGUkERUZ4q@castle \
    --to=roman.gushchin@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=laoar.shao@gmail.com \
    --cc=linux-mm@kvack.org \
    --cc=quentin@isovalent.com \
    --cc=songliubraving@fb.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.