All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/2] bpf: Minor fixes for non-preallocated memory
@ 2022-07-06 15:58 Yafang Shao
  2022-07-06 15:58 ` [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority Yafang Shao
  2022-07-06 15:58 ` [PATCH bpf-next v2 2/2] bpf: Warn on non-preallocated case for missed trace types Yafang Shao
  0 siblings, 2 replies; 20+ messages in thread
From: Yafang Shao @ 2022-07-06 15:58 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin, roman.gushchin, haoluo
  Cc: bpf, linux-mm, Yafang Shao

When I was implementing bpf recharge[1], I found some other issues.
These issues are independent, so I send them separately.

[1]. https://lore.kernel.org/bpf/20220619155032.32515-1-laoar.shao@gmail.com/

v2:
- fix GFP_HIGH consistently over the bpf code. (Daniel, Roman)
- get rid of an error patch (Hao)

Yafang Shao (2):
  bpf: Make non-preallocated allocation low priority
  bpf: Warn on non-preallocated case for missed trace types

 kernel/bpf/devmap.c        | 3 ++-
 kernel/bpf/hashtab.c       | 8 +++++---
 kernel/bpf/local_storage.c | 3 ++-
 kernel/bpf/lpm_trie.c      | 3 ++-
 kernel/bpf/verifier.c      | 2 ++
 5 files changed, 13 insertions(+), 6 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-06 15:58 [PATCH bpf-next v2 0/2] bpf: Minor fixes for non-preallocated memory Yafang Shao
@ 2022-07-06 15:58 ` Yafang Shao
  2022-07-06 16:47   ` Alexei Starovoitov
  2022-07-07  0:07   ` Shakeel Butt
  2022-07-06 15:58 ` [PATCH bpf-next v2 2/2] bpf: Warn on non-preallocated case for missed trace types Yafang Shao
  1 sibling, 2 replies; 20+ messages in thread
From: Yafang Shao @ 2022-07-06 15:58 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin, roman.gushchin, haoluo
  Cc: bpf, linux-mm, Yafang Shao

GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
if we allocate too much GFP_ATOMIC memory. For example, when we set the
memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
easily break the memcg limit by force charge. So it is very dangerous to
use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
__GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
too much memory.

We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
too memory expensive for some cases. That means removing __GFP_HIGH
doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
it-avoiding issues caused by too much memory. So let's remove it.

The force charge of GFP_ATOMIC was introduced in
commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing
__GFP_ATOMIC charges") by checking __GFP_ATOMIC, then got improved in
commit 1461e8c2b6af ("memcg: unify force charging conditions") by
checking __GFP_HIGH (that is no problem because both __GFP_HIGH and
__GFP_ATOMIC are set in GFP_AOMIC). So, if we want to fix it in memcg,
we have to carefully verify all the callsites. Now that we can fix it in
BPF, we'd better not modify the memcg code.

This fix can also apply to other run-time allocations, for example, the
allocation in lpm trie, local storage and devmap. So let fix it
consistently over the bpf code

__GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure neither
currently. But the memcg code can be improved to make
__GFP_KSWAPD_RECLAIM work well under memcg pressure if desired.

It also fixes a typo in the comment.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 kernel/bpf/devmap.c        | 3 ++-
 kernel/bpf/hashtab.c       | 8 +++++---
 kernel/bpf/local_storage.c | 3 ++-
 kernel/bpf/lpm_trie.c      | 3 ++-
 4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index c2867068e5bd..7672946126d5 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -845,7 +845,8 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
 	struct bpf_dtab_netdev *dev;
 
 	dev = bpf_map_kmalloc_node(&dtab->map, sizeof(*dev),
-				   GFP_ATOMIC | __GFP_NOWARN,
+				   __GFP_ATOMIC | __GFP_NOWARN |
+				   __GFP_KSWAPD_RECLAIM,
 				   dtab->map.numa_node);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 17fb69c0e0dc..9d4559a1c032 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -61,7 +61,7 @@
  *
  * As regular device interrupt handlers and soft interrupts are forced into
  * thread context, the existing code which does
- *   spin_lock*(); alloc(GPF_ATOMIC); spin_unlock*();
+ *   spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*();
  * just works.
  *
  * In theory the BPF locks could be converted to regular spinlocks as well,
@@ -978,7 +978,8 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 				goto dec_count;
 			}
 		l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size,
-					     GFP_ATOMIC | __GFP_NOWARN,
+					     __GFP_ATOMIC | __GFP_NOWARN |
+					     __GFP_KSWAPD_RECLAIM,
 					     htab->map.numa_node);
 		if (!l_new) {
 			l_new = ERR_PTR(-ENOMEM);
@@ -996,7 +997,8 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 		} else {
 			/* alloc_percpu zero-fills */
 			pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
-						    GFP_ATOMIC | __GFP_NOWARN);
+						    __GFP_ATOMIC | __GFP_NOWARN |
+						    __GFP_KSWAPD_RECLAIM);
 			if (!pptr) {
 				kfree(l_new);
 				l_new = ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 8654fc97f5fe..534b69682b17 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -165,7 +165,8 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *key,
 	}
 
 	new = bpf_map_kmalloc_node(map, struct_size(new, data, map->value_size),
-				   __GFP_ZERO | GFP_ATOMIC | __GFP_NOWARN,
+				   __GFP_ZERO | __GFP_ATOMIC | __GFP_NOWARN |
+				   __GFP_KSWAPD_RECLAIM,
 				   map->numa_node);
 	if (!new)
 		return -ENOMEM;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index f0d05a3cc4b9..7bae7133f1dd 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -285,7 +285,8 @@ static struct lpm_trie_node *lpm_trie_node_alloc(const struct lpm_trie *trie,
 	if (value)
 		size += trie->map.value_size;
 
-	node = bpf_map_kmalloc_node(&trie->map, size, GFP_ATOMIC | __GFP_NOWARN,
+	node = bpf_map_kmalloc_node(&trie->map, size, __GFP_ATOMIC |
+				    __GFP_KSWAPD_RECLAIM | __GFP_NOWARN,
 				    trie->map.numa_node);
 	if (!node)
 		return NULL;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v2 2/2] bpf: Warn on non-preallocated case for missed trace types
  2022-07-06 15:58 [PATCH bpf-next v2 0/2] bpf: Minor fixes for non-preallocated memory Yafang Shao
  2022-07-06 15:58 ` [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority Yafang Shao
@ 2022-07-06 15:58 ` Yafang Shao
  2022-07-06 16:50   ` Alexei Starovoitov
  1 sibling, 1 reply; 20+ messages in thread
From: Yafang Shao @ 2022-07-06 15:58 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin, roman.gushchin, haoluo
  Cc: bpf, linux-mm, Yafang Shao

BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE and BPF_PROG_TYPE_TRACING are
trace type as well, which may also cause unexpected memory allocation if
we set BPF_F_NO_PREALLOC.
Let's also warn on both of them.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/verifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index df3ec6b05f05..f9c0f4889a3a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12570,6 +12570,8 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
 	case BPF_PROG_TYPE_TRACEPOINT:
 	case BPF_PROG_TYPE_PERF_EVENT:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT:
+	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
+	case BPF_PROG_TYPE_TRACING:
 		return true;
 	default:
 		return false;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-06 15:58 ` [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority Yafang Shao
@ 2022-07-06 16:47   ` Alexei Starovoitov
  2022-07-06 19:09     ` Roman Gushchin
  2022-07-07  0:07   ` Shakeel Butt
  1 sibling, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2022-07-06 16:47 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Roman Gushchin, Hao Luo, bpf, linux-mm

On Wed, Jul 6, 2022 at 8:59 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> if we allocate too much GFP_ATOMIC memory. For example, when we set the
> memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> easily break the memcg limit by force charge. So it is very dangerous to
> use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> too much memory.
>
> We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
> too memory expensive for some cases. That means removing __GFP_HIGH
> doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
> it-avoiding issues caused by too much memory. So let's remove it.
>
> The force charge of GFP_ATOMIC was introduced in
> commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing
> __GFP_ATOMIC charges") by checking __GFP_ATOMIC, then got improved in
> commit 1461e8c2b6af ("memcg: unify force charging conditions") by
> checking __GFP_HIGH (that is no problem because both __GFP_HIGH and
> __GFP_ATOMIC are set in GFP_AOMIC). So, if we want to fix it in memcg,
> we have to carefully verify all the callsites. Now that we can fix it in
> BPF, we'd better not modify the memcg code.
>
> This fix can also apply to other run-time allocations, for example, the
> allocation in lpm trie, local storage and devmap. So let fix it
> consistently over the bpf code
>
> __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure neither
> currently. But the memcg code can be improved to make
> __GFP_KSWAPD_RECLAIM work well under memcg pressure if desired.

Could you elaborate ?

> It also fixes a typo in the comment.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>

Roman, do you agree with this change ?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 2/2] bpf: Warn on non-preallocated case for missed trace types
  2022-07-06 15:58 ` [PATCH bpf-next v2 2/2] bpf: Warn on non-preallocated case for missed trace types Yafang Shao
@ 2022-07-06 16:50   ` Alexei Starovoitov
  2022-07-07 10:29     ` Yafang Shao
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2022-07-06 16:50 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Roman Gushchin, Hao Luo, bpf, linux-mm

On Wed, Jul 6, 2022 at 8:59 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE and BPF_PROG_TYPE_TRACING are
> trace type as well, which may also cause unexpected memory allocation if
> we set BPF_F_NO_PREALLOC.
> Let's also warn on both of them.
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  kernel/bpf/verifier.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index df3ec6b05f05..f9c0f4889a3a 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12570,6 +12570,8 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
>         case BPF_PROG_TYPE_TRACEPOINT:
>         case BPF_PROG_TYPE_PERF_EVENT:
>         case BPF_PROG_TYPE_RAW_TRACEPOINT:
> +       case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
> +       case BPF_PROG_TYPE_TRACING:

BPF_TRACE_ITER should probably be excluded.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-06 16:47   ` Alexei Starovoitov
@ 2022-07-06 19:09     ` Roman Gushchin
  2022-07-06 22:11       ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Roman Gushchin @ 2022-07-06 19:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yafang Shao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Quentin Monnet, Hao Luo, bpf, linux-mm

On Wed, Jul 06, 2022 at 09:47:32AM -0700, Alexei Starovoitov wrote:
> On Wed, Jul 6, 2022 at 8:59 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > easily break the memcg limit by force charge. So it is very dangerous to
> > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > too much memory.
> >
> > We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
> > too memory expensive for some cases. That means removing __GFP_HIGH
> > doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
> > it-avoiding issues caused by too much memory. So let's remove it.
> >
> > The force charge of GFP_ATOMIC was introduced in
> > commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing
> > __GFP_ATOMIC charges") by checking __GFP_ATOMIC, then got improved in
> > commit 1461e8c2b6af ("memcg: unify force charging conditions") by
> > checking __GFP_HIGH (that is no problem because both __GFP_HIGH and
> > __GFP_ATOMIC are set in GFP_AOMIC). So, if we want to fix it in memcg,
> > we have to carefully verify all the callsites. Now that we can fix it in
> > BPF, we'd better not modify the memcg code.
> >
> > This fix can also apply to other run-time allocations, for example, the
> > allocation in lpm trie, local storage and devmap. So let fix it
> > consistently over the bpf code
> >
> > __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure neither
> > currently. But the memcg code can be improved to make
> > __GFP_KSWAPD_RECLAIM work well under memcg pressure if desired.
> 
> Could you elaborate ?
> 
> > It also fixes a typo in the comment.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> 
> Roman, do you agree with this change ?

Yes, removing __GFP_HIGH makes sense to me. I can imagine we might want
it for *some* bpf allocations, but applying it unconditionally looks wrong.

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-06 19:09     ` Roman Gushchin
@ 2022-07-06 22:11       ` Alexei Starovoitov
  2022-07-06 22:54         ` Roman Gushchin
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2022-07-06 22:11 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Yafang Shao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Quentin Monnet, Hao Luo, bpf, linux-mm

On Wed, Jul 6, 2022 at 12:09 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Wed, Jul 06, 2022 at 09:47:32AM -0700, Alexei Starovoitov wrote:
> > On Wed, Jul 6, 2022 at 8:59 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > > easily break the memcg limit by force charge. So it is very dangerous to
> > > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > > too much memory.
> > >
> > > We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
> > > too memory expensive for some cases. That means removing __GFP_HIGH
> > > doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
> > > it-avoiding issues caused by too much memory. So let's remove it.
> > >
> > > The force charge of GFP_ATOMIC was introduced in
> > > commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing
> > > __GFP_ATOMIC charges") by checking __GFP_ATOMIC, then got improved in
> > > commit 1461e8c2b6af ("memcg: unify force charging conditions") by
> > > checking __GFP_HIGH (that is no problem because both __GFP_HIGH and
> > > __GFP_ATOMIC are set in GFP_AOMIC). So, if we want to fix it in memcg,
> > > we have to carefully verify all the callsites. Now that we can fix it in
> > > BPF, we'd better not modify the memcg code.
> > >
> > > This fix can also apply to other run-time allocations, for example, the
> > > allocation in lpm trie, local storage and devmap. So let fix it
> > > consistently over the bpf code
> > >
> > > __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure neither
> > > currently. But the memcg code can be improved to make
> > > __GFP_KSWAPD_RECLAIM work well under memcg pressure if desired.
> >
> > Could you elaborate ?
> >
> > > It also fixes a typo in the comment.
> > >
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> >
> > Roman, do you agree with this change ?
>
> Yes, removing __GFP_HIGH makes sense to me. I can imagine we might want
> it for *some* bpf allocations, but applying it unconditionally looks wrong.

Yeah. It's a difficult trade-off to make without having the data
to decide whether removing __GFP_HIGH can cause issues or not,
but do you agree that __GFP_HIGH doesn't cooperate well with memcg ?
If so it's a bug on memcg side, right? but we should probably
apply this band-aid on bpf side to fix the bleeding.
Later we can add a knob to allow __GFP_HIGH usage on demand from
bpf prog.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-06 22:11       ` Alexei Starovoitov
@ 2022-07-06 22:54         ` Roman Gushchin
  2022-07-06 23:22           ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Roman Gushchin @ 2022-07-06 22:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yafang Shao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Quentin Monnet, Hao Luo, bpf, linux-mm

On Wed, Jul 06, 2022 at 03:11:46PM -0700, Alexei Starovoitov wrote:
> On Wed, Jul 6, 2022 at 12:09 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > On Wed, Jul 06, 2022 at 09:47:32AM -0700, Alexei Starovoitov wrote:
> > > On Wed, Jul 6, 2022 at 8:59 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > > > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > > > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > > > easily break the memcg limit by force charge. So it is very dangerous to
> > > > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > > > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > > > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > > > too much memory.
> > > >
> > > > We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
> > > > too memory expensive for some cases. That means removing __GFP_HIGH
> > > > doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
> > > > it-avoiding issues caused by too much memory. So let's remove it.
> > > >
> > > > The force charge of GFP_ATOMIC was introduced in
> > > > commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing
> > > > __GFP_ATOMIC charges") by checking __GFP_ATOMIC, then got improved in
> > > > commit 1461e8c2b6af ("memcg: unify force charging conditions") by
> > > > checking __GFP_HIGH (that is no problem because both __GFP_HIGH and
> > > > __GFP_ATOMIC are set in GFP_AOMIC). So, if we want to fix it in memcg,
> > > > we have to carefully verify all the callsites. Now that we can fix it in
> > > > BPF, we'd better not modify the memcg code.
> > > >
> > > > This fix can also apply to other run-time allocations, for example, the
> > > > allocation in lpm trie, local storage and devmap. So let fix it
> > > > consistently over the bpf code
> > > >
> > > > __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure neither
> > > > currently. But the memcg code can be improved to make
> > > > __GFP_KSWAPD_RECLAIM work well under memcg pressure if desired.
> > >
> > > Could you elaborate ?
> > >
> > > > It also fixes a typo in the comment.
> > > >
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> > >
> > > Roman, do you agree with this change ?
> >
> > Yes, removing __GFP_HIGH makes sense to me. I can imagine we might want
> > it for *some* bpf allocations, but applying it unconditionally looks wrong.
> 
> Yeah. It's a difficult trade-off to make without having the data
> to decide whether removing __GFP_HIGH can cause issues or not,

Yeah, the change looks reasonable, but it's hard to say without giving
it a good testing in (something close to) a production environment.

> but do you agree that __GFP_HIGH doesn't cooperate well with memcg ?
> If so it's a bug on memcg side, right?

No. Historically we allowed high-prio allocations to exceed the memcg limit
because otherwise there were too many stability and performance issues.
It's not a memcg bug, it's a way to avoid exposing ENOMEM handling bugs all over
the kernel code. Without memory cgroups GFP_ATOMIC allocations rarely fail
and a lot of code paths in the kernel are not really ready for it (at least
it was the case several years ago, maybe things are better now).

But it was usually thought in the context of small(ish) allocations which do not
change the global memory usage picture. Subsequent "normal" allocations are
triggering reclaim/OOM, so from a user's POV the limit works as expected.

But with the ownership model and size of bpf maps it's a different story:
if a bpf map belongs to an abandoned cgroup, it might consume a lot of memory
and there will be no "normal" allocations. So cgroup memory limit will be never
applied. It's a valid issue, I agree with Yafang here.

> but we should probably
> apply this band-aid on bpf side to fix the bleeding.
> Later we can add a knob to allow __GFP_HIGH usage on demand from
> bpf prog.

Yes, it sounds like a good idea. I have to think what's the best approach
here, it's not obvious for me.

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-06 22:54         ` Roman Gushchin
@ 2022-07-06 23:22           ` Alexei Starovoitov
  0 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2022-07-06 23:22 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Yafang Shao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Quentin Monnet, Hao Luo, bpf, linux-mm

On Wed, Jul 6, 2022 at 3:54 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Wed, Jul 06, 2022 at 03:11:46PM -0700, Alexei Starovoitov wrote:
> > On Wed, Jul 6, 2022 at 12:09 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > >
> > > On Wed, Jul 06, 2022 at 09:47:32AM -0700, Alexei Starovoitov wrote:
> > > > On Wed, Jul 6, 2022 at 8:59 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > >
> > > > > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > > > > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > > > > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > > > > easily break the memcg limit by force charge. So it is very dangerous to
> > > > > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > > > > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > > > > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > > > > too much memory.
> > > > >
> > > > > We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
> > > > > too memory expensive for some cases. That means removing __GFP_HIGH
> > > > > doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
> > > > > it-avoiding issues caused by too much memory. So let's remove it.
> > > > >
> > > > > The force charge of GFP_ATOMIC was introduced in
> > > > > commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing
> > > > > __GFP_ATOMIC charges") by checking __GFP_ATOMIC, then got improved in
> > > > > commit 1461e8c2b6af ("memcg: unify force charging conditions") by
> > > > > checking __GFP_HIGH (that is no problem because both __GFP_HIGH and
> > > > > __GFP_ATOMIC are set in GFP_AOMIC). So, if we want to fix it in memcg,
> > > > > we have to carefully verify all the callsites. Now that we can fix it in
> > > > > BPF, we'd better not modify the memcg code.
> > > > >
> > > > > This fix can also apply to other run-time allocations, for example, the
> > > > > allocation in lpm trie, local storage and devmap. So let fix it
> > > > > consistently over the bpf code
> > > > >
> > > > > __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure neither
> > > > > currently. But the memcg code can be improved to make
> > > > > __GFP_KSWAPD_RECLAIM work well under memcg pressure if desired.
> > > >
> > > > Could you elaborate ?
> > > >
> > > > > It also fixes a typo in the comment.
> > > > >
> > > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > > Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> > > >
> > > > Roman, do you agree with this change ?
> > >
> > > Yes, removing __GFP_HIGH makes sense to me. I can imagine we might want
> > > it for *some* bpf allocations, but applying it unconditionally looks wrong.
> >
> > Yeah. It's a difficult trade-off to make without having the data
> > to decide whether removing __GFP_HIGH can cause issues or not,
>
> Yeah, the change looks reasonable, but it's hard to say without giving
> it a good testing in (something close to) a production environment.
>
> > but do you agree that __GFP_HIGH doesn't cooperate well with memcg ?
> > If so it's a bug on memcg side, right?
>
> No. Historically we allowed high-prio allocations to exceed the memcg limit
> because otherwise there were too many stability and performance issues.
> It's not a memcg bug, it's a way to avoid exposing ENOMEM handling bugs all over
> the kernel code. Without memory cgroups GFP_ATOMIC allocations rarely fail
> and a lot of code paths in the kernel are not really ready for it (at least
> it was the case several years ago, maybe things are better now).
>
> But it was usually thought in the context of small(ish) allocations which do not
> change the global memory usage picture. Subsequent "normal" allocations are
> triggering reclaim/OOM, so from a user's POV the limit works as expected.
>
> But with the ownership model and size of bpf maps it's a different story:
> if a bpf map belongs to an abandoned cgroup, it might consume a lot of memory
> and there will be no "normal" allocations. So cgroup memory limit will be never
> applied. It's a valid issue, I agree with Yafang here.

Understood.

> > but we should probably
> > apply this band-aid on bpf side to fix the bleeding.
> > Later we can add a knob to allow __GFP_HIGH usage on demand from
> > bpf prog.
>
> Yes, it sounds like a good idea. I have to think what's the best approach
> here, it's not obvious for me.

Ok. Applied this patch for now.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-06 15:58 ` [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority Yafang Shao
  2022-07-06 16:47   ` Alexei Starovoitov
@ 2022-07-07  0:07   ` Shakeel Butt
  2022-07-07  0:14     ` Alexei Starovoitov
                       ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Shakeel Butt @ 2022-07-07  0:07 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin, roman.gushchin, haoluo, bpf, linux-mm

On Wed, Jul 06, 2022 at 03:58:47PM +0000, Yafang Shao wrote:
> GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> if we allocate too much GFP_ATOMIC memory. For example, when we set the
> memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> easily break the memcg limit by force charge. So it is very dangerous to
> use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> too much memory.

Please use GFP_NOWAIT instead of (__GFP_ATOMIC | __GFP_KSWAPD_RECLAIM).
There is already a plan to completely remove __GFP_ATOMIC and mm-tree
already have a patch for that.

> 
> We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
> too memory expensive for some cases. That means removing __GFP_HIGH
> doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
> it-avoiding issues caused by too much memory. So let's remove it.
> 
> The force charge of GFP_ATOMIC was introduced in
> commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing
> __GFP_ATOMIC charges") by checking __GFP_ATOMIC, then got improved in
> commit 1461e8c2b6af ("memcg: unify force charging conditions") by
> checking __GFP_HIGH (that is no problem because both __GFP_HIGH and
> __GFP_ATOMIC are set in GFP_AOMIC). So, if we want to fix it in memcg,
> we have to carefully verify all the callsites. Now that we can fix it in
> BPF, we'd better not modify the memcg code.
> 
> This fix can also apply to other run-time allocations, for example, the
> allocation in lpm trie, local storage and devmap. So let fix it
> consistently over the bpf code
> 
> __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure neither
> currently. But the memcg code can be improved to make
> __GFP_KSWAPD_RECLAIM work well under memcg pressure if desired.
> 

IMO there is no need to give all this detail and background on
GFP_ATOMIC and __GFP_KSWAPD_RECLAIM. Just say kernel allows GFP_ATOMIC
allocations to exceed memcg limits which we don't want in this case. So,
replace with GFP_NOWAIT which obey memcg limits. Both of these flags
tell kernel that the caller can not sleep.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-07  0:07   ` Shakeel Butt
@ 2022-07-07  0:14     ` Alexei Starovoitov
  2022-07-07  0:25     ` Roman Gushchin
  2022-07-07 10:27     ` Yafang Shao
  2 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2022-07-07  0:14 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Yafang Shao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Quentin Monnet, Roman Gushchin,
	Hao Luo, bpf, linux-mm

On Wed, Jul 6, 2022 at 5:07 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Wed, Jul 06, 2022 at 03:58:47PM +0000, Yafang Shao wrote:
> > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > easily break the memcg limit by force charge. So it is very dangerous to
> > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > too much memory.
>
> Please use GFP_NOWAIT instead of (__GFP_ATOMIC | __GFP_KSWAPD_RECLAIM).
> There is already a plan to completely remove __GFP_ATOMIC and mm-tree
> already have a patch for that.

hmm. ok. reverted.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-07  0:07   ` Shakeel Butt
  2022-07-07  0:14     ` Alexei Starovoitov
@ 2022-07-07  0:25     ` Roman Gushchin
  2022-07-07  2:09       ` Alexei Starovoitov
  2022-07-07 10:27     ` Yafang Shao
  2 siblings, 1 reply; 20+ messages in thread
From: Roman Gushchin @ 2022-07-07  0:25 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Yafang Shao, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, quentin, haoluo, bpf, linux-mm

On Thu, Jul 07, 2022 at 12:07:21AM +0000, Shakeel Butt wrote:
> On Wed, Jul 06, 2022 at 03:58:47PM +0000, Yafang Shao wrote:
> > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > easily break the memcg limit by force charge. So it is very dangerous to
> > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > too much memory.
> 
> Please use GFP_NOWAIT instead of (__GFP_ATOMIC | __GFP_KSWAPD_RECLAIM).
> There is already a plan to completely remove __GFP_ATOMIC and mm-tree
> already have a patch for that.

Oh, I didn't know this, thanks for heads up!
I agree that GFP_NOWAIT is the best choice then.

Btw, we probably shouldn't even add GFP_NOWAIT if the allocation is performed
from the bpf syscall context. Why would we fail to pre-allocate a map if
we can easily go into the reclaim? But probably better to leave it for
a separate change.

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-07  0:25     ` Roman Gushchin
@ 2022-07-07  2:09       ` Alexei Starovoitov
  2022-07-07  3:36         ` Roman Gushchin
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2022-07-07  2:09 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Shakeel Butt, Yafang Shao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Quentin Monnet, Hao Luo, bpf, linux-mm

On Wed, Jul 6, 2022 at 5:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Thu, Jul 07, 2022 at 12:07:21AM +0000, Shakeel Butt wrote:
> > On Wed, Jul 06, 2022 at 03:58:47PM +0000, Yafang Shao wrote:
> > > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > > easily break the memcg limit by force charge. So it is very dangerous to
> > > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > > too much memory.
> >
> > Please use GFP_NOWAIT instead of (__GFP_ATOMIC | __GFP_KSWAPD_RECLAIM).
> > There is already a plan to completely remove __GFP_ATOMIC and mm-tree
> > already have a patch for that.
>
> Oh, I didn't know this, thanks for heads up!
> I agree that GFP_NOWAIT is the best choice then.
>
> Btw, we probably shouldn't even add GFP_NOWAIT if the allocation is performed
> from the bpf syscall context. Why would we fail to pre-allocate a map if
> we can easily go into the reclaim? But probably better to leave it for
> a separate change.

The places affected by this patch are in atomic context.
Prealloc path from syscall is using GFP_USER.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-07  2:09       ` Alexei Starovoitov
@ 2022-07-07  3:36         ` Roman Gushchin
  0 siblings, 0 replies; 20+ messages in thread
From: Roman Gushchin @ 2022-07-07  3:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Shakeel Butt, Yafang Shao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Quentin Monnet, Hao Luo, bpf, linux-mm

On Wed, Jul 06, 2022 at 07:09:22PM -0700, Alexei Starovoitov wrote:
> On Wed, Jul 6, 2022 at 5:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > On Thu, Jul 07, 2022 at 12:07:21AM +0000, Shakeel Butt wrote:
> > > On Wed, Jul 06, 2022 at 03:58:47PM +0000, Yafang Shao wrote:
> > > > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > > > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > > > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > > > easily break the memcg limit by force charge. So it is very dangerous to
> > > > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > > > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > > > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > > > too much memory.
> > >
> > > Please use GFP_NOWAIT instead of (__GFP_ATOMIC | __GFP_KSWAPD_RECLAIM).
> > > There is already a plan to completely remove __GFP_ATOMIC and mm-tree
> > > already have a patch for that.
> >
> > Oh, I didn't know this, thanks for heads up!
> > I agree that GFP_NOWAIT is the best choice then.
> >
> > Btw, we probably shouldn't even add GFP_NOWAIT if the allocation is performed
> > from the bpf syscall context. Why would we fail to pre-allocate a map if
> > we can easily go into the reclaim? But probably better to leave it for
> > a separate change.
> 
> The places affected by this patch are in atomic context.
> Prealloc path from syscall is using GFP_USER.

Right. Sorry, my bad, for some reason I was under an impression it's a common
path for all allocations.

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-07  0:07   ` Shakeel Butt
  2022-07-07  0:14     ` Alexei Starovoitov
  2022-07-07  0:25     ` Roman Gushchin
@ 2022-07-07 10:27     ` Yafang Shao
  2022-07-07 15:44       ` Alexei Starovoitov
  2 siblings, 1 reply; 20+ messages in thread
From: Yafang Shao @ 2022-07-07 10:27 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Quentin Monnet, Roman Gushchin, Hao Luo, bpf, Linux MM

On Thu, Jul 7, 2022 at 8:07 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Wed, Jul 06, 2022 at 03:58:47PM +0000, Yafang Shao wrote:
> > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > easily break the memcg limit by force charge. So it is very dangerous to
> > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > too much memory.
>
> Please use GFP_NOWAIT instead of (__GFP_ATOMIC | __GFP_KSWAPD_RECLAIM).
> There is already a plan to completely remove __GFP_ATOMIC and mm-tree
> already have a patch for that.
>

After reading the discussion[1], it looks good to me to use GFP_NOWAIT
instead. I will update it.

[1]. https://lore.kernel.org/linux-mm/163712397076.13692.4727608274002939094@noble.neil.brown.name/

> >
> > We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
> > too memory expensive for some cases. That means removing __GFP_HIGH
> > doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
> > it-avoiding issues caused by too much memory. So let's remove it.
> >
> > The force charge of GFP_ATOMIC was introduced in
> > commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing
> > __GFP_ATOMIC charges") by checking __GFP_ATOMIC, then got improved in
> > commit 1461e8c2b6af ("memcg: unify force charging conditions") by
> > checking __GFP_HIGH (that is no problem because both __GFP_HIGH and
> > __GFP_ATOMIC are set in GFP_AOMIC). So, if we want to fix it in memcg,
> > we have to carefully verify all the callsites. Now that we can fix it in
> > BPF, we'd better not modify the memcg code.
> >
> > This fix can also apply to other run-time allocations, for example, the
> > allocation in lpm trie, local storage and devmap. So let fix it
> > consistently over the bpf code
> >
> > __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure neither
> > currently. But the memcg code can be improved to make
> > __GFP_KSWAPD_RECLAIM work well under memcg pressure if desired.
> >
>
> IMO there is no need to give all this detail and background on
> GFP_ATOMIC and __GFP_KSWAPD_RECLAIM. Just say kernel allows GFP_ATOMIC
> allocations to exceed memcg limits which we don't want in this case. So,
> replace with GFP_NOWAIT which obey memcg limits. Both of these flags
> tell kernel that the caller can not sleep.
>

Sure, thanks.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 2/2] bpf: Warn on non-preallocated case for missed trace types
  2022-07-06 16:50   ` Alexei Starovoitov
@ 2022-07-07 10:29     ` Yafang Shao
  2022-07-07 15:45       ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Yafang Shao @ 2022-07-07 10:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Roman Gushchin, Hao Luo, bpf, linux-mm

On Thu, Jul 7, 2022 at 12:50 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jul 6, 2022 at 8:59 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE and BPF_PROG_TYPE_TRACING are
> > trace type as well, which may also cause unexpected memory allocation if
> > we set BPF_F_NO_PREALLOC.
> > Let's also warn on both of them.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  kernel/bpf/verifier.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index df3ec6b05f05..f9c0f4889a3a 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -12570,6 +12570,8 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
> >         case BPF_PROG_TYPE_TRACEPOINT:
> >         case BPF_PROG_TYPE_PERF_EVENT:
> >         case BPF_PROG_TYPE_RAW_TRACEPOINT:
> > +       case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
> > +       case BPF_PROG_TYPE_TRACING:
>
> BPF_TRACE_ITER should probably be excluded.

Right, I have verified that BPF_TRACE_ITER can be excluded.
Will change it.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-07 10:27     ` Yafang Shao
@ 2022-07-07 15:44       ` Alexei Starovoitov
  2022-07-07 16:19         ` Yafang Shao
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2022-07-07 15:44 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Shakeel Butt, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin Lau, Song Liu, Yonghong Song,
	john fastabend, KP Singh, Quentin Monnet, Roman Gushchin,
	Hao Luo, bpf, Linux MM

On Thu, Jul 7, 2022 at 3:28 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Thu, Jul 7, 2022 at 8:07 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Wed, Jul 06, 2022 at 03:58:47PM +0000, Yafang Shao wrote:
> > > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > > easily break the memcg limit by force charge. So it is very dangerous to
> > > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > > too much memory.
> >
> > Please use GFP_NOWAIT instead of (__GFP_ATOMIC | __GFP_KSWAPD_RECLAIM).
> > There is already a plan to completely remove __GFP_ATOMIC and mm-tree
> > already have a patch for that.
> >
>
> After reading the discussion[1], it looks good to me to use GFP_NOWAIT
> instead. I will update it.

Should we use GFP_ATOMIC | __GFP_NOMEMALLOC instead
to align with its usage in the networking stack?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 2/2] bpf: Warn on non-preallocated case for missed trace types
  2022-07-07 10:29     ` Yafang Shao
@ 2022-07-07 15:45       ` Alexei Starovoitov
  2022-07-07 16:22         ` Yafang Shao
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2022-07-07 15:45 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Roman Gushchin, Hao Luo, bpf, linux-mm

On Thu, Jul 7, 2022 at 3:30 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Thu, Jul 7, 2022 at 12:50 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Jul 6, 2022 at 8:59 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE and BPF_PROG_TYPE_TRACING are
> > > trace type as well, which may also cause unexpected memory allocation if
> > > we set BPF_F_NO_PREALLOC.
> > > Let's also warn on both of them.
> > >
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > ---
> > >  kernel/bpf/verifier.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index df3ec6b05f05..f9c0f4889a3a 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -12570,6 +12570,8 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
> > >         case BPF_PROG_TYPE_TRACEPOINT:
> > >         case BPF_PROG_TYPE_PERF_EVENT:
> > >         case BPF_PROG_TYPE_RAW_TRACEPOINT:
> > > +       case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
> > > +       case BPF_PROG_TYPE_TRACING:
> >
> > BPF_TRACE_ITER should probably be excluded.
>
> Right, I have verified that BPF_TRACE_ITER can be excluded.
> Will change it.

Probably more than that. See that your change broke BPF CI
and selftests are failing.
Which means it breaks existing bpf programs.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-07 15:44       ` Alexei Starovoitov
@ 2022-07-07 16:19         ` Yafang Shao
  0 siblings, 0 replies; 20+ messages in thread
From: Yafang Shao @ 2022-07-07 16:19 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Shakeel Butt, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin Lau, Song Liu, Yonghong Song,
	john fastabend, KP Singh, Quentin Monnet, Roman Gushchin,
	Hao Luo, bpf, Linux MM

On Thu, Jul 7, 2022 at 11:44 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jul 7, 2022 at 3:28 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Thu, Jul 7, 2022 at 8:07 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > On Wed, Jul 06, 2022 at 03:58:47PM +0000, Yafang Shao wrote:
> > > > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > > > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > > > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > > > easily break the memcg limit by force charge. So it is very dangerous to
> > > > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > > > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > > > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > > > too much memory.
> > >
> > > Please use GFP_NOWAIT instead of (__GFP_ATOMIC | __GFP_KSWAPD_RECLAIM).
> > > There is already a plan to completely remove __GFP_ATOMIC and mm-tree
> > > already have a patch for that.
> > >
> >
> > After reading the discussion[1], it looks good to me to use GFP_NOWAIT
> > instead. I will update it.
>
> Should we use GFP_ATOMIC | __GFP_NOMEMALLOC instead
> to align with its usage in the networking stack?

GFP_ATOMIC | __GFP_NOMEMALLOC will continue to break the memcg limit,
so we have to modify the try_charge_memcg() code to make sure
__GFP_NOMEMALLOC takes precedence over the __GFP_HIGH flag, IOW, if
both of them are set we won't allow it to break memcg limit.  That
will need more verification.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 2/2] bpf: Warn on non-preallocated case for missed trace types
  2022-07-07 15:45       ` Alexei Starovoitov
@ 2022-07-07 16:22         ` Yafang Shao
  0 siblings, 0 replies; 20+ messages in thread
From: Yafang Shao @ 2022-07-07 16:22 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Quentin Monnet, Roman Gushchin, Hao Luo, bpf, linux-mm

On Thu, Jul 7, 2022 at 11:45 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jul 7, 2022 at 3:30 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Thu, Jul 7, 2022 at 12:50 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Jul 6, 2022 at 8:59 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE and BPF_PROG_TYPE_TRACING are
> > > > trace type as well, which may also cause unexpected memory allocation if
> > > > we set BPF_F_NO_PREALLOC.
> > > > Let's also warn on both of them.
> > > >
> > > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > > > ---
> > > >  kernel/bpf/verifier.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index df3ec6b05f05..f9c0f4889a3a 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -12570,6 +12570,8 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
> > > >         case BPF_PROG_TYPE_TRACEPOINT:
> > > >         case BPF_PROG_TYPE_PERF_EVENT:
> > > >         case BPF_PROG_TYPE_RAW_TRACEPOINT:
> > > > +       case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
> > > > +       case BPF_PROG_TYPE_TRACING:
> > >
> > > BPF_TRACE_ITER should probably be excluded.
> >
> > Right, I have verified that BPF_TRACE_ITER can be excluded.
> > Will change it.
>
> Probably more than that. See that your change broke BPF CI
> and selftests are failing.
> Which means it breaks existing bpf programs.

Ah, yes, "#194 timer:FAIL" which is caused by this change.
I will be more careful here.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-07-07 16:22 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-06 15:58 [PATCH bpf-next v2 0/2] bpf: Minor fixes for non-preallocated memory Yafang Shao
2022-07-06 15:58 ` [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority Yafang Shao
2022-07-06 16:47   ` Alexei Starovoitov
2022-07-06 19:09     ` Roman Gushchin
2022-07-06 22:11       ` Alexei Starovoitov
2022-07-06 22:54         ` Roman Gushchin
2022-07-06 23:22           ` Alexei Starovoitov
2022-07-07  0:07   ` Shakeel Butt
2022-07-07  0:14     ` Alexei Starovoitov
2022-07-07  0:25     ` Roman Gushchin
2022-07-07  2:09       ` Alexei Starovoitov
2022-07-07  3:36         ` Roman Gushchin
2022-07-07 10:27     ` Yafang Shao
2022-07-07 15:44       ` Alexei Starovoitov
2022-07-07 16:19         ` Yafang Shao
2022-07-06 15:58 ` [PATCH bpf-next v2 2/2] bpf: Warn on non-preallocated case for missed trace types Yafang Shao
2022-07-06 16:50   ` Alexei Starovoitov
2022-07-07 10:29     ` Yafang Shao
2022-07-07 15:45       ` Alexei Starovoitov
2022-07-07 16:22         ` Yafang Shao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.