linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/2] bpf: Minor fixes for non-preallocated memory
@ 2022-07-09 15:44 Yafang Shao
  2022-07-09 15:44 ` [PATCH bpf-next v3 1/2] bpf: Make non-preallocated allocation low priority Yafang Shao
  2022-07-09 15:44 ` [PATCH bpf-next v3 2/2] bpf: Warn on non-preallocated case for missed trace types Yafang Shao
  0 siblings, 2 replies; 10+ messages in thread
From: Yafang Shao @ 2022-07-09 15:44 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin, roman.gushchin, haoluo, shakeelb
  Cc: bpf, linux-mm, Yafang Shao

When I was implementing bpf recharge[1], I found some other issues.
These issues are independent, so I send them separately.

[1]. https://lore.kernel.org/bpf/20220619155032.32515-1-laoar.shao@gmail.com/

v3:
- use GFP_NOWAIT instead and update commit log (Shakeel)
- exclude some attach types for BPF_PROG_TYPE_TRACING (Alexei)

v2:
- fix GFP_HIGH consistently over the bpf code. (Daniel, Roman)
- get rid of an error patch (Hao)

Yafang Shao (2):
  bpf: Make non-preallocated allocation low priority
  bpf: Warn on non-preallocated case for missed trace types

 kernel/bpf/devmap.c        |  2 +-
 kernel/bpf/hashtab.c       |  6 +++---
 kernel/bpf/local_storage.c |  2 +-
 kernel/bpf/lpm_trie.c      |  2 +-
 kernel/bpf/verifier.c      | 18 +++++++++++++-----
 5 files changed, 19 insertions(+), 11 deletions(-)

-- 
2.17.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v3 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-09 15:44 [PATCH bpf-next v3 0/2] bpf: Minor fixes for non-preallocated memory Yafang Shao
@ 2022-07-09 15:44 ` Yafang Shao
  2022-07-11 19:19   ` Shakeel Butt
  2022-07-09 15:44 ` [PATCH bpf-next v3 2/2] bpf: Warn on non-preallocated case for missed trace types Yafang Shao
  1 sibling, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2022-07-09 15:44 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin, roman.gushchin, haoluo, shakeelb
  Cc: bpf, linux-mm, Yafang Shao, NeilBrown

GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
if we allocate too much GFP_ATOMIC memory. For example, when we set the
memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
easily break the memcg limit by force charge. So it is very dangerous to
use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
__GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
too much memory. There's a plan to completely remove __GFP_ATOMIC in the
mm side[1], so let's use GFP_NOWAIT instead.

We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
too memory expensive for some cases. That means removing __GFP_HIGH
doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
it-avoiding issues caused by too much memory. So let's remove it.

This fix can also apply to other run-time allocations, for example, the
allocation in lpm trie, local storage and devmap. So let fix it
consistently over the bpf code

It also fixes a typo in the comment.

[1]. https://lore.kernel.org/linux-mm/163712397076.13692.4727608274002939094@noble.neil.brown.name/

Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/devmap.c        | 2 +-
 kernel/bpf/hashtab.c       | 6 +++---
 kernel/bpf/local_storage.c | 2 +-
 kernel/bpf/lpm_trie.c      | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index c2867068e5bd..1400561efb15 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -845,7 +845,7 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
 	struct bpf_dtab_netdev *dev;
 
 	dev = bpf_map_kmalloc_node(&dtab->map, sizeof(*dev),
-				   GFP_ATOMIC | __GFP_NOWARN,
+				   GFP_NOWAIT | __GFP_NOWARN,
 				   dtab->map.numa_node);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 17fb69c0e0dc..da7578426a46 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -61,7 +61,7 @@
  *
  * As regular device interrupt handlers and soft interrupts are forced into
  * thread context, the existing code which does
- *   spin_lock*(); alloc(GPF_ATOMIC); spin_unlock*();
+ *   spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*();
  * just works.
  *
  * In theory the BPF locks could be converted to regular spinlocks as well,
@@ -978,7 +978,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 				goto dec_count;
 			}
 		l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size,
-					     GFP_ATOMIC | __GFP_NOWARN,
+					     GFP_NOWAIT | __GFP_NOWARN,
 					     htab->map.numa_node);
 		if (!l_new) {
 			l_new = ERR_PTR(-ENOMEM);
@@ -996,7 +996,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 		} else {
 			/* alloc_percpu zero-fills */
 			pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
-						    GFP_ATOMIC | __GFP_NOWARN);
+						    GFP_NOWAIT | __GFP_NOWARN);
 			if (!pptr) {
 				kfree(l_new);
 				l_new = ERR_PTR(-ENOMEM);
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 8654fc97f5fe..49ef0ce040c7 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -165,7 +165,7 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *key,
 	}
 
 	new = bpf_map_kmalloc_node(map, struct_size(new, data, map->value_size),
-				   __GFP_ZERO | GFP_ATOMIC | __GFP_NOWARN,
+				   __GFP_ZERO | GFP_NOWAIT | __GFP_NOWARN,
 				   map->numa_node);
 	if (!new)
 		return -ENOMEM;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index f0d05a3cc4b9..d789e3b831ad 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -285,7 +285,7 @@ static struct lpm_trie_node *lpm_trie_node_alloc(const struct lpm_trie *trie,
 	if (value)
 		size += trie->map.value_size;
 
-	node = bpf_map_kmalloc_node(&trie->map, size, GFP_ATOMIC | __GFP_NOWARN,
+	node = bpf_map_kmalloc_node(&trie->map, size, GFP_NOWAIT | __GFP_NOWARN,
 				    trie->map.numa_node);
 	if (!node)
 		return NULL;
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH bpf-next v3 2/2] bpf: Warn on non-preallocated case for missed trace types
  2022-07-09 15:44 [PATCH bpf-next v3 0/2] bpf: Minor fixes for non-preallocated memory Yafang Shao
  2022-07-09 15:44 ` [PATCH bpf-next v3 1/2] bpf: Make non-preallocated allocation low priority Yafang Shao
@ 2022-07-09 15:44 ` Yafang Shao
  2022-07-10 17:51   ` Yonghong Song
  1 sibling, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2022-07-09 15:44 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin, roman.gushchin, haoluo, shakeelb
  Cc: bpf, linux-mm, Yafang Shao

The raw tracepoint may cause unexpected memory allocation if we set
BPF_F_NO_PREALLOC. So let's warn on it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/verifier.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e3cf6194c24f..3cd8260827e0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12574,14 +12574,20 @@ static int check_map_prealloc(struct bpf_map *map)
 		!(map->map_flags & BPF_F_NO_PREALLOC);
 }
 
-static bool is_tracing_prog_type(enum bpf_prog_type type)
+static bool is_tracing_prog_type(enum bpf_prog_type prog_type,
+				 enum bpf_attach_type attach_type)
 {
-	switch (type) {
+	switch (prog_type) {
 	case BPF_PROG_TYPE_KPROBE:
 	case BPF_PROG_TYPE_TRACEPOINT:
 	case BPF_PROG_TYPE_PERF_EVENT:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT:
+	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
 		return true;
+	case BPF_PROG_TYPE_TRACING:
+		if (attach_type == BPF_TRACE_RAW_TP)
+			return true;
+		return false;
 	default:
 		return false;
 	}
@@ -12601,7 +12607,9 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 					struct bpf_prog *prog)
 
 {
+	enum bpf_attach_type attach_type = prog->expected_attach_type;
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
+
 	/*
 	 * Validate that trace type programs use preallocated hash maps.
 	 *
@@ -12619,7 +12627,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	 * now, but warnings are emitted so developers are made aware of
 	 * the unsafety and can fix their programs before this is enforced.
 	 */
-	if (is_tracing_prog_type(prog_type) && !is_preallocated_map(map)) {
+	if (is_tracing_prog_type(prog_type, attach_type) && !is_preallocated_map(map)) {
 		if (prog_type == BPF_PROG_TYPE_PERF_EVENT) {
 			verbose(env, "perf_event programs can only use preallocated hash map\n");
 			return -EINVAL;
@@ -12638,7 +12646,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 			return -EINVAL;
 		}
 
-		if (is_tracing_prog_type(prog_type)) {
+		if (is_tracing_prog_type(prog_type, attach_type)) {
 			verbose(env, "tracing progs cannot use bpf_spin_lock yet\n");
 			return -EINVAL;
 		}
@@ -12650,7 +12658,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	}
 
 	if (map_value_has_timer(map)) {
-		if (is_tracing_prog_type(prog_type)) {
+		if (is_tracing_prog_type(prog_type, attach_type)) {
 			verbose(env, "tracing progs cannot use bpf_timer yet\n");
 			return -EINVAL;
 		}
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 2/2] bpf: Warn on non-preallocated case for missed trace types
  2022-07-09 15:44 ` [PATCH bpf-next v3 2/2] bpf: Warn on non-preallocated case for missed trace types Yafang Shao
@ 2022-07-10 17:51   ` Yonghong Song
  2022-07-11  6:48     ` Yafang Shao
  0 siblings, 1 reply; 10+ messages in thread
From: Yonghong Song @ 2022-07-10 17:51 UTC (permalink / raw)
  To: Yafang Shao, ast, daniel, andrii, kafai, songliubraving,
	john.fastabend, kpsingh, quentin, roman.gushchin, haoluo,
	shakeelb
  Cc: bpf, linux-mm



On 7/9/22 8:44 AM, Yafang Shao wrote:
> The raw tracepoint may cause unexpected memory allocation if we set
> BPF_F_NO_PREALLOC. So let's warn on it.

Please extend raw_tracepoint to other attach types which
may cause runtime map allocations.

> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>   kernel/bpf/verifier.c | 18 +++++++++++++-----
>   1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index e3cf6194c24f..3cd8260827e0 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12574,14 +12574,20 @@ static int check_map_prealloc(struct bpf_map *map)
>   		!(map->map_flags & BPF_F_NO_PREALLOC);
>   }
>   
> -static bool is_tracing_prog_type(enum bpf_prog_type type)
> +static bool is_tracing_prog_type(enum bpf_prog_type prog_type,
> +				 enum bpf_attach_type attach_type)
>   {
> -	switch (type) {
> +	switch (prog_type) {
>   	case BPF_PROG_TYPE_KPROBE:
>   	case BPF_PROG_TYPE_TRACEPOINT:
>   	case BPF_PROG_TYPE_PERF_EVENT:
>   	case BPF_PROG_TYPE_RAW_TRACEPOINT:
> +	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
>   		return true;
> +	case BPF_PROG_TYPE_TRACING:
> +		if (attach_type == BPF_TRACE_RAW_TP)
> +			return true;

As Alexei mentioned earlier, here we should have
		if (attach_type != BPF_TRACE_ITER)
			return true;

For attach types with BPF_PROG_TYPE_TRACING programs,
BPF_TRACE_ITER attach type can only appear in process context.
All other attach types may appear in non-process context.

> +		return false;
>   	default:
>   		return false;
>   	}
> @@ -12601,7 +12607,9 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>   					struct bpf_prog *prog)
>   
>   {
> +	enum bpf_attach_type attach_type = prog->expected_attach_type;
>   	enum bpf_prog_type prog_type = resolve_prog_type(prog);
> +
[...]


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 2/2] bpf: Warn on non-preallocated case for missed trace types
  2022-07-10 17:51   ` Yonghong Song
@ 2022-07-11  6:48     ` Yafang Shao
  2022-07-11 19:04       ` Yonghong Song
  0 siblings, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2022-07-11  6:48 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, john fastabend, KP Singh, Quentin Monnet,
	Roman Gushchin, Hao Luo, Shakeel Butt, bpf, Linux MM

On Mon, Jul 11, 2022 at 1:51 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/9/22 8:44 AM, Yafang Shao wrote:
> > The raw tracepoint may cause unexpected memory allocation if we set
> > BPF_F_NO_PREALLOC. So let's warn on it.
>
> Please extend raw_tracepoint to other attach types which
> may cause runtime map allocations.
>

Per my understanding, it is safe to allocate memory in a non-process
context as long as we don't allow blocking it.
So this issue doesn't matter with whether it causes runtime map
allocations or not, while it really matters with the tracepoint or
kprobe.
Regarding the tracepoint or kprobe, if we don't use non-preallocated
maps, it may allocate other extra memory besides the map element
itself.
I have verified that it is safe to use non-preallocated maps in
BPF_TRACE_ITER or BPF_TRACE_FENTRY.
So filtering out BPF_TRACE_RAW_TP only is enough.

> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >   kernel/bpf/verifier.c | 18 +++++++++++++-----
> >   1 file changed, 13 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index e3cf6194c24f..3cd8260827e0 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -12574,14 +12574,20 @@ static int check_map_prealloc(struct bpf_map *map)
> >               !(map->map_flags & BPF_F_NO_PREALLOC);
> >   }
> >
> > -static bool is_tracing_prog_type(enum bpf_prog_type type)
> > +static bool is_tracing_prog_type(enum bpf_prog_type prog_type,
> > +                              enum bpf_attach_type attach_type)
> >   {
> > -     switch (type) {
> > +     switch (prog_type) {
> >       case BPF_PROG_TYPE_KPROBE:
> >       case BPF_PROG_TYPE_TRACEPOINT:
> >       case BPF_PROG_TYPE_PERF_EVENT:
> >       case BPF_PROG_TYPE_RAW_TRACEPOINT:
> > +     case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
> >               return true;
> > +     case BPF_PROG_TYPE_TRACING:
> > +             if (attach_type == BPF_TRACE_RAW_TP)
> > +                     return true;
>
> As Alexei mentioned earlier, here we should have
>                 if (attach_type != BPF_TRACE_ITER)
>                         return true;

That will break selftests/bpf/progs/timer.c, because it uses timer in fentry.

> For attach types with BPF_PROG_TYPE_TRACING programs,
> BPF_TRACE_ITER attach type can only appear in process context.
> All other attach types may appear in non-process context.
>

Thanks for the explanation.

> > +             return false;
> >       default:
> >               return false;
> >       }
> > @@ -12601,7 +12607,9 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
> >                                       struct bpf_prog *prog)
> >
> >   {
> > +     enum bpf_attach_type attach_type = prog->expected_attach_type;
> >       enum bpf_prog_type prog_type = resolve_prog_type(prog);
> > +
> [...]



-- 
Regards
Yafang


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 2/2] bpf: Warn on non-preallocated case for missed trace types
  2022-07-11  6:48     ` Yafang Shao
@ 2022-07-11 19:04       ` Yonghong Song
  2022-07-12  8:26         ` Yafang Shao
  0 siblings, 1 reply; 10+ messages in thread
From: Yonghong Song @ 2022-07-11 19:04 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, john fastabend, KP Singh, Quentin Monnet,
	Roman Gushchin, Hao Luo, Shakeel Butt, bpf, Linux MM



On 7/10/22 11:48 PM, Yafang Shao wrote:
> On Mon, Jul 11, 2022 at 1:51 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/9/22 8:44 AM, Yafang Shao wrote:
>>> The raw tracepoint may cause unexpected memory allocation if we set
>>> BPF_F_NO_PREALLOC. So let's warn on it.
>>
>> Please extend raw_tracepoint to other attach types which
>> may cause runtime map allocations.
>>
> 
> Per my understanding, it is safe to allocate memory in a non-process
> context as long as we don't allow blocking it.
> So this issue doesn't matter with whether it causes runtime map
> allocations or not, while it really matters with the tracepoint or
> kprobe.
> Regarding the tracepoint or kprobe, if we don't use non-preallocated
> maps, it may allocate other extra memory besides the map element
> itself.
> I have verified that it is safe to use non-preallocated maps in
> BPF_TRACE_ITER or BPF_TRACE_FENTRY.
> So filtering out BPF_TRACE_RAW_TP only is enough. >
>>>
>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>>> ---
>>>    kernel/bpf/verifier.c | 18 +++++++++++++-----
>>>    1 file changed, 13 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index e3cf6194c24f..3cd8260827e0 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -12574,14 +12574,20 @@ static int check_map_prealloc(struct bpf_map *map)
>>>                !(map->map_flags & BPF_F_NO_PREALLOC);
>>>    }
>>>
>>> -static bool is_tracing_prog_type(enum bpf_prog_type type)
>>> +static bool is_tracing_prog_type(enum bpf_prog_type prog_type,
>>> +                              enum bpf_attach_type attach_type)
>>>    {
>>> -     switch (type) {
>>> +     switch (prog_type) {
>>>        case BPF_PROG_TYPE_KPROBE:
>>>        case BPF_PROG_TYPE_TRACEPOINT:
>>>        case BPF_PROG_TYPE_PERF_EVENT:
>>>        case BPF_PROG_TYPE_RAW_TRACEPOINT:
>>> +     case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
>>>                return true;
>>> +     case BPF_PROG_TYPE_TRACING:
>>> +             if (attach_type == BPF_TRACE_RAW_TP)
>>> +                     return true;
>>
>> As Alexei mentioned earlier, here we should have
>>                  if (attach_type != BPF_TRACE_ITER)
>>                          return true;
> 
> That will break selftests/bpf/progs/timer.c, because it uses timer in fentry.

Okay. Let us remove BPF_PROG_TYPE_TRACING from this patch for now.
fentry/fexit/fmod may attach any kallsyms functions and many of them
are called in process context and non-preallocated hashmap totally fine.
It is not good to prohibit non-preallocated hashmap for any 
fentry/fexit/fmod bpf programs.

> 
>> For attach types with BPF_PROG_TYPE_TRACING programs,
>> BPF_TRACE_ITER attach type can only appear in process context.
>> All other attach types may appear in non-process context.
>>
> 
> Thanks for the explanation.
> 
>>> +             return false;
>>>        default:
>>>                return false;
>>>        }
>>> @@ -12601,7 +12607,9 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>>>                                        struct bpf_prog *prog)
>>>
>>>    {
>>> +     enum bpf_attach_type attach_type = prog->expected_attach_type;
>>>        enum bpf_prog_type prog_type = resolve_prog_type(prog);
>>> +
>> [...]
> 
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-09 15:44 ` [PATCH bpf-next v3 1/2] bpf: Make non-preallocated allocation low priority Yafang Shao
@ 2022-07-11 19:19   ` Shakeel Butt
  2022-07-13  0:49     ` Alexei Starovoitov
  0 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2022-07-11 19:19 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	kpsingh, quentin, Roman Gushchin, Hao Luo, bpf, Linux MM,
	NeilBrown

On Sat, Jul 9, 2022 at 8:45 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> if we allocate too much GFP_ATOMIC memory. For example, when we set the
> memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> easily break the memcg limit by force charge. So it is very dangerous to
> use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> too much memory. There's a plan to completely remove __GFP_ATOMIC in the
> mm side[1], so let's use GFP_NOWAIT instead.
>
> We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
> too memory expensive for some cases. That means removing __GFP_HIGH
> doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
> it-avoiding issues caused by too much memory. So let's remove it.
>
> This fix can also apply to other run-time allocations, for example, the
> allocation in lpm trie, local storage and devmap. So let fix it
> consistently over the bpf code
>
> It also fixes a typo in the comment.
>
> [1]. https://lore.kernel.org/linux-mm/163712397076.13692.4727608274002939094@noble.neil.brown.name/
>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: NeilBrown <neilb@suse.de>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

Reviewed-by: Shakeel Butt <shakeelb@google.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 2/2] bpf: Warn on non-preallocated case for missed trace types
  2022-07-11 19:04       ` Yonghong Song
@ 2022-07-12  8:26         ` Yafang Shao
  0 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2022-07-12  8:26 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, john fastabend, KP Singh, Quentin Monnet,
	Roman Gushchin, Hao Luo, Shakeel Butt, bpf, Linux MM

On Tue, Jul 12, 2022 at 3:04 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/10/22 11:48 PM, Yafang Shao wrote:
> > On Mon, Jul 11, 2022 at 1:51 AM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/9/22 8:44 AM, Yafang Shao wrote:
> >>> The raw tracepoint may cause unexpected memory allocation if we set
> >>> BPF_F_NO_PREALLOC. So let's warn on it.
> >>
> >> Please extend raw_tracepoint to other attach types which
> >> may cause runtime map allocations.
> >>
> >
> > Per my understanding, it is safe to allocate memory in a non-process
> > context as long as we don't allow blocking it.
> > So this issue doesn't matter with whether it causes runtime map
> > allocations or not, while it really matters with the tracepoint or
> > kprobe.
> > Regarding the tracepoint or kprobe, if we don't use non-preallocated
> > maps, it may allocate other extra memory besides the map element
> > itself.
> > I have verified that it is safe to use non-preallocated maps in
> > BPF_TRACE_ITER or BPF_TRACE_FENTRY.
> > So filtering out BPF_TRACE_RAW_TP only is enough. >
> >>>
> >>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> >>> ---
> >>>    kernel/bpf/verifier.c | 18 +++++++++++++-----
> >>>    1 file changed, 13 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>> index e3cf6194c24f..3cd8260827e0 100644
> >>> --- a/kernel/bpf/verifier.c
> >>> +++ b/kernel/bpf/verifier.c
> >>> @@ -12574,14 +12574,20 @@ static int check_map_prealloc(struct bpf_map *map)
> >>>                !(map->map_flags & BPF_F_NO_PREALLOC);
> >>>    }
> >>>
> >>> -static bool is_tracing_prog_type(enum bpf_prog_type type)
> >>> +static bool is_tracing_prog_type(enum bpf_prog_type prog_type,
> >>> +                              enum bpf_attach_type attach_type)
> >>>    {
> >>> -     switch (type) {
> >>> +     switch (prog_type) {
> >>>        case BPF_PROG_TYPE_KPROBE:
> >>>        case BPF_PROG_TYPE_TRACEPOINT:
> >>>        case BPF_PROG_TYPE_PERF_EVENT:
> >>>        case BPF_PROG_TYPE_RAW_TRACEPOINT:
> >>> +     case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
> >>>                return true;
> >>> +     case BPF_PROG_TYPE_TRACING:
> >>> +             if (attach_type == BPF_TRACE_RAW_TP)
> >>> +                     return true;
> >>
> >> As Alexei mentioned earlier, here we should have
> >>                  if (attach_type != BPF_TRACE_ITER)
> >>                          return true;
> >
> > That will break selftests/bpf/progs/timer.c, because it uses timer in fentry.
>
> Okay. Let us remove BPF_PROG_TYPE_TRACING from this patch for now.
> fentry/fexit/fmod may attach any kallsyms functions and many of them
> are called in process context and non-preallocated hashmap totally fine.
> It is not good to prohibit non-preallocated hashmap for any
> fentry/fexit/fmod bpf programs.
>

Got it. I will do it.

-- 
Regards
Yafang


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-11 19:19   ` Shakeel Butt
@ 2022-07-13  0:49     ` Alexei Starovoitov
  2022-07-13  2:12       ` Roman Gushchin
  0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2022-07-13  0:49 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Yafang Shao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Quentin Monnet, Roman Gushchin,
	Hao Luo, bpf, Linux MM, NeilBrown

On Mon, Jul 11, 2022 at 12:19 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Sat, Jul 9, 2022 at 8:45 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > easily break the memcg limit by force charge. So it is very dangerous to
> > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > too much memory. There's a plan to completely remove __GFP_ATOMIC in the
> > mm side[1], so let's use GFP_NOWAIT instead.
> >
> > We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
> > too memory expensive for some cases. That means removing __GFP_HIGH
> > doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
> > it-avoiding issues caused by too much memory. So let's remove it.
> >
> > This fix can also apply to other run-time allocations, for example, the
> > allocation in lpm trie, local storage and devmap. So let fix it
> > consistently over the bpf code
> >
> > It also fixes a typo in the comment.
> >
> > [1]. https://lore.kernel.org/linux-mm/163712397076.13692.4727608274002939094@noble.neil.brown.name/
> >
> > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: Shakeel Butt <shakeelb@google.com>
> > Cc: NeilBrown <neilb@suse.de>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>

Applied to bpf-next.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH bpf-next v3 1/2] bpf: Make non-preallocated allocation low priority
  2022-07-13  0:49     ` Alexei Starovoitov
@ 2022-07-13  2:12       ` Roman Gushchin
  0 siblings, 0 replies; 10+ messages in thread
From: Roman Gushchin @ 2022-07-13  2:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Shakeel Butt, Yafang Shao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Quentin Monnet, Hao Luo, bpf, Linux MM,
	NeilBrown

On Tue, Jul 12, 2022 at 05:49:24PM -0700, Alexei Starovoitov wrote:
> On Mon, Jul 11, 2022 at 12:19 PM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Sat, Jul 9, 2022 at 8:45 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > > easily break the memcg limit by force charge. So it is very dangerous to
> > > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > > too much memory. There's a plan to completely remove __GFP_ATOMIC in the
> > > mm side[1], so let's use GFP_NOWAIT instead.
> > >
> > > We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is
> > > too memory expensive for some cases. That means removing __GFP_HIGH
> > > doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with
> > > it-avoiding issues caused by too much memory. So let's remove it.
> > >
> > > This fix can also apply to other run-time allocations, for example, the
> > > allocation in lpm trie, local storage and devmap. So let fix it
> > > consistently over the bpf code
> > >
> > > It also fixes a typo in the comment.
> > >
> > > [1]. https://lore.kernel.org/linux-mm/163712397076.13692.4727608274002939094@noble.neil.brown.name/
> > >
> > > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > > Cc: Shakeel Butt <shakeelb@google.com>
> > > Cc: NeilBrown <neilb@suse.de>
> > > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> >
> > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> 
> Applied to bpf-next.

Looks like I'm a bit late to the party, but my ack still applies.

Thanks!


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-07-13  2:12 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-09 15:44 [PATCH bpf-next v3 0/2] bpf: Minor fixes for non-preallocated memory Yafang Shao
2022-07-09 15:44 ` [PATCH bpf-next v3 1/2] bpf: Make non-preallocated allocation low priority Yafang Shao
2022-07-11 19:19   ` Shakeel Butt
2022-07-13  0:49     ` Alexei Starovoitov
2022-07-13  2:12       ` Roman Gushchin
2022-07-09 15:44 ` [PATCH bpf-next v3 2/2] bpf: Warn on non-preallocated case for missed trace types Yafang Shao
2022-07-10 17:51   ` Yonghong Song
2022-07-11  6:48     ` Yafang Shao
2022-07-11 19:04       ` Yonghong Song
2022-07-12  8:26         ` Yafang Shao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).