From: Yafang Shao <laoar.shao@gmail.com>
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
	john.fastabend@gmail.com, kpsingh@kernel.org,
	quentin@isovalent.com, roman.gushchin@linux.dev,
	haoluo@google.com
Cc: bpf@vger.kernel.org, linux-mm@kvack.org,
	Yafang Shao <laoar.shao@gmail.com>
Subject: [PATCH bpf-next v2 1/2] bpf: Make non-preallocated allocation low priority
Date: Wed,  6 Jul 2022 15:58:47 +0000
Message-ID: <20220706155848.4939-2-laoar.shao@gmail.com>
In-Reply-To: <20220706155848.4939-1-laoar.shao@gmail.com>

GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
if we allocate too much GFP_ATOMIC memory. For example, when we set a
memcg limit to restrict non-preallocated bpf memory, GFP_ATOMIC
allocations can easily exceed that limit through force charging. So it is
very dangerous to use GFP_ATOMIC in the non-preallocated case. One way to
make it safe is to remove __GFP_HIGH from GFP_ATOMIC, IOW, use
(__GFP_ATOMIC | __GFP_KSWAPD_RECLAIM) instead, so that the allocation is
limited if we allocate too much memory.

BPF_F_NO_PREALLOC was introduced because full map pre-allocation is too
memory expensive for some cases. That means removing __GFP_HIGH doesn't
break the rule of BPF_F_NO_PREALLOC, but has the same goal as it:
avoiding issues caused by too much memory. So let's remove it.

The force charge of GFP_ATOMIC was introduced in
commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing
__GFP_ATOMIC charges") by checking __GFP_ATOMIC, and was later improved in
commit 1461e8c2b6af ("memcg: unify force charging conditions") by
checking __GFP_HIGH (that is no problem because both __GFP_HIGH and
__GFP_ATOMIC are set in GFP_ATOMIC). So, if we want to fix it in memcg,
we have to carefully verify all the callsites. Since we can fix it in
BPF, we'd better not modify the memcg code.
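
The flag arithmetic described above can be sketched in user space as
follows. This is only a simplified model, not the kernel code: the
MODEL_* names, the gfp_model_t type and model_may_force_charge() are
hypothetical, the bit values are illustrative stand-ins rather than
values taken from a kernel header, and the real memcg charge path
checks more conditions than this single bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel's gfp bits. */
typedef unsigned int gfp_model_t;

#define MODEL_GFP_HIGH           0x20u   /* models __GFP_HIGH */
#define MODEL_GFP_ATOMIC_BIT     0x200u  /* models __GFP_ATOMIC */
#define MODEL_GFP_KSWAPD_RECLAIM 0x800u  /* models __GFP_KSWAPD_RECLAIM */

/* GFP_ATOMIC as described in this commit message: all three bits set. */
#define MODEL_GFP_ATOMIC \
	(MODEL_GFP_HIGH | MODEL_GFP_ATOMIC_BIT | MODEL_GFP_KSWAPD_RECLAIM)

/* The mask this patch switches to: GFP_ATOMIC without __GFP_HIGH. */
#define MODEL_GFP_LOW_PRIO \
	(MODEL_GFP_ATOMIC_BIT | MODEL_GFP_KSWAPD_RECLAIM)

/* Compile-time check that the new mask is GFP_ATOMIC minus the HIGH bit. */
static_assert(MODEL_GFP_LOW_PRIO == (MODEL_GFP_ATOMIC & ~MODEL_GFP_HIGH),
	      "low-priority mask must equal GFP_ATOMIC without __GFP_HIGH");

/*
 * Simplified model of the force-charge condition after commit
 * 1461e8c2b6af: only the __GFP_HIGH bit is considered here.
 */
static bool model_may_force_charge(gfp_model_t gfp)
{
	return (gfp & MODEL_GFP_HIGH) != 0;
}
```

In this model, model_may_force_charge(MODEL_GFP_ATOMIC) is true while
model_may_force_charge(MODEL_GFP_LOW_PRIO) is false, which is the
behavior change the patch relies on.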

This fix can also apply to other run-time allocations, for example, the
allocations in lpm trie, local storage and devmap. So let's fix it
consistently across the bpf code.

__GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg pressure either
currently. But the memcg code can be improved to make
__GFP_KSWAPD_RECLAIM work well under memcg pressure if desired.

This patch also fixes a typo (GPF_ATOMIC) in a comment in hashtab.c.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 kernel/bpf/devmap.c        | 3 ++-
 kernel/bpf/hashtab.c       | 8 +++++---
 kernel/bpf/local_storage.c | 3 ++-
 kernel/bpf/lpm_trie.c      | 3 ++-
 4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index c2867068e5bd..7672946126d5 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -845,7 +845,8 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
 	struct bpf_dtab_netdev *dev;
 
 	dev = bpf_map_kmalloc_node(&dtab->map, sizeof(*dev),
-				   GFP_ATOMIC | __GFP_NOWARN,
+				   __GFP_ATOMIC | __GFP_NOWARN |
+				   __GFP_KSWAPD_RECLAIM,
 				   dtab->map.numa_node);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 17fb69c0e0dc..9d4559a1c032 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -61,7 +61,7 @@
  *
  * As regular device interrupt handlers and soft interrupts are forced into
  * thread context, the existing code which does
- *   spin_lock*(); alloc(GPF_ATOMIC); spin_unlock*();
+ *   spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*();
  * just works.
  *
  * In theory the BPF locks could be converted to regular spinlocks as well,
@@ -978,7 +978,8 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 				goto dec_count;
 			}
 		l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size,
-					     GFP_ATOMIC | __GFP_NOWARN,
+					     __GFP_ATOMIC | __GFP_NOWARN |
+					     __GFP_KSWAPD_RECLAIM,
 					     htab->map.numa_node);
 		if (!l_new) {
 			l_new = ERR_PTR(-ENOMEM);
@@ -996,7 +997,8 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 		} else {
 			/* alloc_percpu zero-fills */
 			pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
-						    GFP_ATOMIC | __GFP_NOWARN);
+						    __GFP_ATOMIC | __GFP_NOWARN |
+						    __GFP_KSWAPD_RECLAIM);
 			if (!pptr) {
 				kfree(l_new);
 				l_new = ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 8654fc97f5fe..534b69682b17 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -165,7 +165,8 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *key,
 	}
 
 	new = bpf_map_kmalloc_node(map, struct_size(new, data, map->value_size),
-				   __GFP_ZERO | GFP_ATOMIC | __GFP_NOWARN,
+				   __GFP_ZERO | __GFP_ATOMIC | __GFP_NOWARN |
+				   __GFP_KSWAPD_RECLAIM,
 				   map->numa_node);
 	if (!new)
 		return -ENOMEM;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index f0d05a3cc4b9..7bae7133f1dd 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -285,7 +285,8 @@ static struct lpm_trie_node *lpm_trie_node_alloc(const struct lpm_trie *trie,
 	if (value)
 		size += trie->map.value_size;
 
-	node = bpf_map_kmalloc_node(&trie->map, size, GFP_ATOMIC | __GFP_NOWARN,
+	node = bpf_map_kmalloc_node(&trie->map, size, __GFP_ATOMIC |
+				    __GFP_KSWAPD_RECLAIM | __GFP_NOWARN,
 				    trie->map.numa_node);
 	if (!node)
 		return NULL;
-- 
2.17.1

