* [PATCH bpf-next 0/4] bpf: Minor fixes
@ 2022-06-29 15:48 Yafang Shao
  2022-06-29 15:48 ` [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority Yafang Shao
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Yafang Shao @ 2022-06-29 15:48 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin
  Cc: netdev, bpf, Yafang Shao

While implementing bpf recharge[1], I found some other issues.
These issues are independent of that work, so I'm sending them
separately.

[1]. https://lore.kernel.org/bpf/20220619155032.32515-1-laoar.shao@gmail.com/

Yafang Shao (4):
  bpf: Make non-preallocated allocation low priority
  bpf: Warn on non-preallocated case for missed trace types
  bpf: Don't do preempt check when migrate is disabled
  bpftool: Show also the name of type BPF_OBJ_LINK

 kernel/bpf/bpf_task_storage.c |  8 ++++----
 kernel/bpf/hashtab.c          | 14 ++++++++------
 kernel/bpf/trampoline.c       |  4 ++--
 kernel/bpf/verifier.c         |  2 ++
 tools/bpf/bpftool/common.c    |  1 +
 5 files changed, 17 insertions(+), 12 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority
  2022-06-29 15:48 [PATCH bpf-next 0/4] bpf: Minor fixes Yafang Shao
@ 2022-06-29 15:48 ` Yafang Shao
  2022-06-30 21:47   ` Daniel Borkmann
  2022-07-02  3:54   ` Roman Gushchin
  2022-06-29 15:48 ` [PATCH bpf-next 2/4] bpf: Warn on non-preallocated case for missed trace types Yafang Shao
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Yafang Shao @ 2022-06-29 15:48 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin
  Cc: netdev, bpf, Yafang Shao, Roman Gushchin

GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
if we allocate too much GFP_ATOMIC memory. For example, when we set a
memcg limit to bound non-preallocated bpf memory, GFP_ATOMIC can easily
break the memcg limit by force charging. So it is very dangerous to use
GFP_ATOMIC in the non-preallocated case. One way to make it safe is to
remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
__GFP_KSWAPD_RECLAIM) instead; then the allocation will be limited if we
allocate too much memory.

We introduced BPF_F_NO_PREALLOC because full map pre-allocation is too
memory-expensive for some cases. That means removing __GFP_HIGH doesn't
break the rule of BPF_F_NO_PREALLOC, but shares the same goal with it:
avoiding issues caused by too much memory. So let's remove it.

Currently, __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg
pressure either, but the memcg code can be improved to make
__GFP_KSWAPD_RECLAIM work well under memcg pressure.

This patch also fixes a typo in a comment.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
---
 kernel/bpf/hashtab.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 17fb69c0e0dc..9d4559a1c032 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -61,7 +61,7 @@
  *
  * As regular device interrupt handlers and soft interrupts are forced into
  * thread context, the existing code which does
- *   spin_lock*(); alloc(GPF_ATOMIC); spin_unlock*();
+ *   spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*();
  * just works.
  *
  * In theory the BPF locks could be converted to regular spinlocks as well,
@@ -978,7 +978,8 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 				goto dec_count;
 			}
 		l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size,
-					     GFP_ATOMIC | __GFP_NOWARN,
+					     __GFP_ATOMIC | __GFP_NOWARN |
+					     __GFP_KSWAPD_RECLAIM,
 					     htab->map.numa_node);
 		if (!l_new) {
 			l_new = ERR_PTR(-ENOMEM);
@@ -996,7 +997,8 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 		} else {
 			/* alloc_percpu zero-fills */
 			pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
-						    GFP_ATOMIC | __GFP_NOWARN);
+						    __GFP_ATOMIC | __GFP_NOWARN |
+						    __GFP_KSWAPD_RECLAIM);
 			if (!pptr) {
 				kfree(l_new);
 				l_new = ERR_PTR(-ENOMEM);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH bpf-next 2/4] bpf: Warn on non-preallocated case for missed trace types
  2022-06-29 15:48 [PATCH bpf-next 0/4] bpf: Minor fixes Yafang Shao
  2022-06-29 15:48 ` [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority Yafang Shao
@ 2022-06-29 15:48 ` Yafang Shao
  2022-06-29 15:48 ` [PATCH bpf-next 3/4] bpf: Don't do preempt check when migrate is disabled Yafang Shao
  2022-06-29 15:48 ` [PATCH bpf-next 4/4] bpftool: Show also the name of type BPF_OBJ_LINK Yafang Shao
  3 siblings, 0 replies; 14+ messages in thread
From: Yafang Shao @ 2022-06-29 15:48 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin
  Cc: netdev, bpf, Yafang Shao

BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE and BPF_PROG_TYPE_TRACING are
tracing types as well, and may also cause unexpected memory allocation
if BPF_F_NO_PREALLOC is set.
Let's warn on both of them too.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/verifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4938477912cd..8452e5746f59 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12539,6 +12539,8 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
 	case BPF_PROG_TYPE_TRACEPOINT:
 	case BPF_PROG_TYPE_PERF_EVENT:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT:
+	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
+	case BPF_PROG_TYPE_TRACING:
 		return true;
 	default:
 		return false;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH bpf-next 3/4] bpf: Don't do preempt check when migrate is disabled
  2022-06-29 15:48 [PATCH bpf-next 0/4] bpf: Minor fixes Yafang Shao
  2022-06-29 15:48 ` [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority Yafang Shao
  2022-06-29 15:48 ` [PATCH bpf-next 2/4] bpf: Warn on non-preallocated case for missed trace types Yafang Shao
@ 2022-06-29 15:48 ` Yafang Shao
  2022-06-30 20:43   ` Hao Luo
  2022-06-29 15:48 ` [PATCH bpf-next 4/4] bpftool: Show also the name of type BPF_OBJ_LINK Yafang Shao
  3 siblings, 1 reply; 14+ messages in thread
From: Yafang Shao @ 2022-06-29 15:48 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin
  Cc: netdev, bpf, Yafang Shao

There is no need to do the preempt check when migration is disabled,
after commit
74d862b682f5 ("sched: Make migrate_disable/enable() independent of RT").

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/bpf_task_storage.c | 8 ++++----
 kernel/bpf/hashtab.c          | 6 +++---
 kernel/bpf/trampoline.c       | 4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index e9014dc62682..6f290623347e 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -26,20 +26,20 @@ static DEFINE_PER_CPU(int, bpf_task_storage_busy);
 static void bpf_task_storage_lock(void)
 {
 	migrate_disable();
-	__this_cpu_inc(bpf_task_storage_busy);
+	this_cpu_inc(bpf_task_storage_busy);
 }
 
 static void bpf_task_storage_unlock(void)
 {
-	__this_cpu_dec(bpf_task_storage_busy);
+	this_cpu_dec(bpf_task_storage_busy);
 	migrate_enable();
 }
 
 static bool bpf_task_storage_trylock(void)
 {
 	migrate_disable();
-	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
-		__this_cpu_dec(bpf_task_storage_busy);
+	if (unlikely(this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
+		this_cpu_dec(bpf_task_storage_busy);
 		migrate_enable();
 		return false;
 	}
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 9d4559a1c032..6a3a95037aac 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -166,8 +166,8 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
 	hash = hash & HASHTAB_MAP_LOCK_MASK;
 
 	migrate_disable();
-	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
-		__this_cpu_dec(*(htab->map_locked[hash]));
+	if (unlikely(this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
+		this_cpu_dec(*(htab->map_locked[hash]));
 		migrate_enable();
 		return -EBUSY;
 	}
@@ -190,7 +190,7 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
 		raw_spin_unlock_irqrestore(&b->raw_lock, flags);
 	else
 		spin_unlock_irqrestore(&b->lock, flags);
-	__this_cpu_dec(*(htab->map_locked[hash]));
+	this_cpu_dec(*(htab->map_locked[hash]));
 	migrate_enable();
 }
 
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 93c7675f0c9e..f4486e54fdb3 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -585,7 +585,7 @@ u64 notrace __bpf_prog_enter(struct bpf_prog *prog, struct bpf_tramp_run_ctx *ru
 
 	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
 
-	if (unlikely(__this_cpu_inc_return(*(prog->active)) != 1)) {
+	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
 		inc_misses_counter(prog);
 		return 0;
 	}
@@ -631,7 +631,7 @@ u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, struct bpf_tramp_r
 	migrate_disable();
 	might_fault();
 
-	if (unlikely(__this_cpu_inc_return(*(prog->active)) != 1)) {
+	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
 		inc_misses_counter(prog);
 		return 0;
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH bpf-next 4/4] bpftool: Show also the name of type BPF_OBJ_LINK
  2022-06-29 15:48 [PATCH bpf-next 0/4] bpf: Minor fixes Yafang Shao
                   ` (2 preceding siblings ...)
  2022-06-29 15:48 ` [PATCH bpf-next 3/4] bpf: Don't do preempt check when migrate is disabled Yafang Shao
@ 2022-06-29 15:48 ` Yafang Shao
  2022-06-29 16:22   ` Quentin Monnet
  3 siblings, 1 reply; 14+ messages in thread
From: Yafang Shao @ 2022-06-29 15:48 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin
  Cc: netdev, bpf, Yafang Shao

For example, /sys/fs/bpf/maps.debug is a bpf link. When you run
`bpftool map show` on it:
- before
  $ bpftool map show pinned /sys/fs/bpf/maps.debug
  Error: incorrect object type: unknown
- after
  $ bpftool map show pinned /sys/fs/bpf/maps.debug
  Error: incorrect object type: link

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 tools/bpf/bpftool/common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index a0d4acd7c54a..5e979269c89a 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -251,6 +251,7 @@ const char *get_fd_type_name(enum bpf_obj_type type)
 		[BPF_OBJ_UNKNOWN]	= "unknown",
 		[BPF_OBJ_PROG]		= "prog",
 		[BPF_OBJ_MAP]		= "map",
+		[BPF_OBJ_LINK]		= "link",
 	};
 
 	if (type < 0 || type >= ARRAY_SIZE(names) || !names[type])
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 4/4] bpftool: Show also the name of type BPF_OBJ_LINK
  2022-06-29 15:48 ` [PATCH bpf-next 4/4] bpftool: Show also the name of type BPF_OBJ_LINK Yafang Shao
@ 2022-06-29 16:22   ` Quentin Monnet
  2022-06-30 21:55     ` Daniel Borkmann
  0 siblings, 1 reply; 14+ messages in thread
From: Quentin Monnet @ 2022-06-29 16:22 UTC (permalink / raw)
  To: Yafang Shao, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh
  Cc: netdev, bpf

On 29/06/2022 16:48, Yafang Shao wrote:
> For example, /sys/fs/bpf/maps.debug is a bpf link. When you run
> `bpftool map show` on it:
> - before
>   $ bpftool map show pinned /sys/fs/bpf/maps.debug
>   Error: incorrect object type: unknown
> - after
>   $ bpftool map show pinned /sys/fs/bpf/maps.debug
>   Error: incorrect object type: link
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  tools/bpf/bpftool/common.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> index a0d4acd7c54a..5e979269c89a 100644
> --- a/tools/bpf/bpftool/common.c
> +++ b/tools/bpf/bpftool/common.c
> @@ -251,6 +251,7 @@ const char *get_fd_type_name(enum bpf_obj_type type)
>  		[BPF_OBJ_UNKNOWN]	= "unknown",
>  		[BPF_OBJ_PROG]		= "prog",
>  		[BPF_OBJ_MAP]		= "map",
> +		[BPF_OBJ_LINK]		= "link",
>  	};
>  
>  	if (type < 0 || type >= ARRAY_SIZE(names) || !names[type])

Reviewed-by: Quentin Monnet <quentin@isovalent.com>

Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 3/4] bpf: Don't do preempt check when migrate is disabled
  2022-06-29 15:48 ` [PATCH bpf-next 3/4] bpf: Don't do preempt check when migrate is disabled Yafang Shao
@ 2022-06-30 20:43   ` Hao Luo
  2022-07-02  2:34     ` Yafang Shao
  0 siblings, 1 reply; 14+ messages in thread
From: Hao Luo @ 2022-06-30 20:43 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin, netdev, bpf

Hi Yafang,

On Wed, Jun 29, 2022 at 8:49 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> There is no need to do the preempt check when migration is disabled,
> after commit
> 74d862b682f5 ("sched: Make migrate_disable/enable() independent of RT").
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---

In my understanding, migrate_disable() doesn't imply
preempt_disable(), so I think this is not safe. Am I missing something?

Hao

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority
  2022-06-29 15:48 ` [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority Yafang Shao
@ 2022-06-30 21:47   ` Daniel Borkmann
  2022-07-02  2:30     ` Yafang Shao
  2022-07-02  4:14     ` Roman Gushchin
  2022-07-02  3:54   ` Roman Gushchin
  1 sibling, 2 replies; 14+ messages in thread
From: Daniel Borkmann @ 2022-06-30 21:47 UTC (permalink / raw)
  To: Yafang Shao, ast, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, quentin
  Cc: netdev, bpf, Roman Gushchin

Hi Yafang,

On 6/29/22 5:48 PM, Yafang Shao wrote:
> GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> if we allocate too much GFP_ATOMIC memory. For example, when we set the
> memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> easily break the memcg limit by force charge. So it is very dangerous to
> use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> too much memory.
> 
> We introduced BPF_F_NO_PREALLOC because full map pre-allocation is too
> memory-expensive for some cases. That means removing __GFP_HIGH
> doesn't break the rule of BPF_F_NO_PREALLOC, but shares the same goal
> with it: avoiding issues caused by too much memory. So let's remove it.
> 
> Currently, __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg
> pressure either, but the memcg code can be improved to make
> __GFP_KSWAPD_RECLAIM work well under memcg pressure.

Ok, but could you also explain in commit desc why it's a specific problem
to BPF hashtab?

Afaik, there is plenty of other code using GFP_ATOMIC | __GFP_NOWARN outside
of BPF e.g. under net/, so it's a generic memcg problem?

Why are lpm trie and local storage map for BPF not affected (at least I don't
see them covered in the patch)?

Thanks,
Daniel

> It also fixes a typo in the comment.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> ---
>   kernel/bpf/hashtab.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 17fb69c0e0dc..9d4559a1c032 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -61,7 +61,7 @@
>    *
>    * As regular device interrupt handlers and soft interrupts are forced into
>    * thread context, the existing code which does
> - *   spin_lock*(); alloc(GPF_ATOMIC); spin_unlock*();
> + *   spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*();
>    * just works.
>    *
>    * In theory the BPF locks could be converted to regular spinlocks as well,
> @@ -978,7 +978,8 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
>   				goto dec_count;
>   			}
>   		l_new = bpf_map_kmalloc_node(&htab->map, htab->elem_size,
> -					     GFP_ATOMIC | __GFP_NOWARN,
> +					     __GFP_ATOMIC | __GFP_NOWARN |
> +					     __GFP_KSWAPD_RECLAIM,
>   					     htab->map.numa_node);
>   		if (!l_new) {
>   			l_new = ERR_PTR(-ENOMEM);
> @@ -996,7 +997,8 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
>   		} else {
>   			/* alloc_percpu zero-fills */
>   			pptr = bpf_map_alloc_percpu(&htab->map, size, 8,
> -						    GFP_ATOMIC | __GFP_NOWARN);
> +						    __GFP_ATOMIC | __GFP_NOWARN |
> +						    __GFP_KSWAPD_RECLAIM);
>   			if (!pptr) {
>   				kfree(l_new);
>   				l_new = ERR_PTR(-ENOMEM);
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 4/4] bpftool: Show also the name of type BPF_OBJ_LINK
  2022-06-29 16:22   ` Quentin Monnet
@ 2022-06-30 21:55     ` Daniel Borkmann
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2022-06-30 21:55 UTC (permalink / raw)
  To: Quentin Monnet, Yafang Shao, ast, andrii, kafai, songliubraving,
	yhs, john.fastabend, kpsingh
  Cc: netdev, bpf

On 6/29/22 6:22 PM, Quentin Monnet wrote:
> On 29/06/2022 16:48, Yafang Shao wrote:
>> For example, /sys/fs/bpf/maps.debug is a bpf link. When you run
>> `bpftool map show` on it:
>> - before
>>    $ bpftool map show pinned /sys/fs/bpf/maps.debug
>>    Error: incorrect object type: unknown
>> - after
>>    $ bpftool map show pinned /sys/fs/bpf/maps.debug
>>    Error: incorrect object type: link
>>
>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>> ---
>>   tools/bpf/bpftool/common.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
>> index a0d4acd7c54a..5e979269c89a 100644
>> --- a/tools/bpf/bpftool/common.c
>> +++ b/tools/bpf/bpftool/common.c
>> @@ -251,6 +251,7 @@ const char *get_fd_type_name(enum bpf_obj_type type)
>>   		[BPF_OBJ_UNKNOWN]	= "unknown",
>>   		[BPF_OBJ_PROG]		= "prog",
>>   		[BPF_OBJ_MAP]		= "map",
>> +		[BPF_OBJ_LINK]		= "link",
>>   	};
>>   
>>   	if (type < 0 || type >= ARRAY_SIZE(names) || !names[type])
> 
> Reviewed-by: Quentin Monnet <quentin@isovalent.com>

(Took this one in so far given fairly independent of the rest, thanks Yafang!)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority
  2022-06-30 21:47   ` Daniel Borkmann
@ 2022-07-02  2:30     ` Yafang Shao
  2022-07-02  4:14     ` Roman Gushchin
  1 sibling, 0 replies; 14+ messages in thread
From: Yafang Shao @ 2022-07-02  2:30 UTC (permalink / raw)
  To: Daniel Borkmann, Shakeel Butt, Johannes Weiner
  Cc: Alexei Starovoitov, Andrii Nakryiko, Martin Lau, Song Liu,
	Yonghong Song, john fastabend, KP Singh, Quentin Monnet, netdev,
	bpf, Roman Gushchin

On Fri, Jul 1, 2022 at 5:47 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Hi Yafang,
>
> On 6/29/22 5:48 PM, Yafang Shao wrote:
> > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > easily break the memcg limit by force charge. So it is very dangerous to
> > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > too much memory.
> >
> > We introduced BPF_F_NO_PREALLOC because full map pre-allocation is
> > too memory-expensive for some cases. That means removing __GFP_HIGH
> > doesn't break the rule of BPF_F_NO_PREALLOC, but shares the same
> > goal with it: avoiding issues caused by too much memory. So let's
> > remove it.
> >
> > Currently, __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg
> > pressure either, but the memcg code can be improved to make
> > __GFP_KSWAPD_RECLAIM work well under memcg pressure.
>
> Ok, but could you also explain in commit desc why it's a specific problem
> to BPF hashtab?
>

It should be a problem specific to non-preallocated BPF maps.
A BPF program has the special property that it can run without a user
application, for example, when it is pinned.
So if the user agent exits after pinning the BPF program, and the BPF
maps are not preallocated, the memory allocated by the maps will be
continuously charged to a memcg. Then it may happen that the memcg
eventually fills up with kernel memory. That is not a usual case for
other memcg users. Finally, the memcg can't limit it, because the
allocations can force charge.

Regarding preallocated BPF maps, they don't have this problem because
all their memory is allocated at load time, and if that memory exceeds
the memcg limit, the map won't be loaded.

> Afaik, there is plenty of other code using GFP_ATOMIC | __GFP_NOWARN outside
> of BPF e.g. under net/, so it's a generic memcg problem?
>

I haven't checked all the other GFP_ATOMIC usage yet.
But per my understanding, the net/ code has user applications behind
it, and if an allocation exceeds the memcg limit, the user application
will be killed and will then stop allocating new memory.

[ + Shakeel, Johannes ]

This issue can be fixed in the memcg charge path, but the memcg charge
path is fragile.
The force charge of GFP_ATOMIC was introduced in
commit 869712fd3de5 ("mm: memcontrol: fix network errors from failing
__GFP_ATOMIC charges") by checking __GFP_ATOMIC, and then got improved
in
commit 1461e8c2b6af ("memcg: unify force charging conditions") by
checking __GFP_HIGH (which is no problem because both __GFP_HIGH and
__GFP_ATOMIC are set in GFP_ATOMIC).
So it seems that once we fix a problem in memcg, another problem may be
introduced by the fix. If we want to fix it in memcg, we have to
carefully verify all the callsites. Now that we can fix it in BPF, we'd
better not modify the memcg code. But maybe a comment is needed in
memcg, in case someone changes the behavior of GFP_ATOMIC in the memcg
charge path once more, for example,

nomem:
+   /* Please don't force charge a __GFP_ATOMIC allocation if
+    * neither __GFP_HIGH nor __GFP_NOFAIL is set with it.
+    */
    if (!(gfp_mask & (__GFP_NOFAIL | __GFP_HIGH)))
        return -ENOMEM;
force:


> Why are lpm trie and local storage map for BPF not affected (at least I don't
> see them covered in the patch)?
>

I haven't verified lpm trie and local storage map yet.
I will verify them.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 3/4] bpf: Don't do preempt check when migrate is disabled
  2022-06-30 20:43   ` Hao Luo
@ 2022-07-02  2:34     ` Yafang Shao
  0 siblings, 0 replies; 14+ messages in thread
From: Yafang Shao @ 2022-07-02  2:34 UTC (permalink / raw)
  To: Hao Luo
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Quentin Monnet, netdev, bpf

On Fri, Jul 1, 2022 at 4:43 AM Hao Luo <haoluo@google.com> wrote:
>
> Hi Yafang,
>
> On Wed, Jun 29, 2022 at 8:49 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > There is no need to do the preempt check when migration is
> > disabled, after commit
> > 74d862b682f5 ("sched: Make migrate_disable/enable() independent of RT").
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
>
> In my understanding, migrate_disable() doesn't imply
> preempt_disable(), so I think this is not safe. Am I missing something?
>

On second thought, it seems I had some misunderstanding of it.
I will think more about it.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority
  2022-06-29 15:48 ` [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority Yafang Shao
  2022-06-30 21:47   ` Daniel Borkmann
@ 2022-07-02  3:54   ` Roman Gushchin
  1 sibling, 0 replies; 14+ messages in thread
From: Roman Gushchin @ 2022-07-02  3:54 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, quentin, netdev, bpf

On Wed, Jun 29, 2022 at 03:48:29PM +0000, Yafang Shao wrote:
> GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> if we allocate too much GFP_ATOMIC memory. For example, when we set the
> memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> easily break the memcg limit by force charge. So it is very dangerous to
> use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> too much memory.
> 
> We introduced BPF_F_NO_PREALLOC because full map pre-allocation is too
> memory-expensive for some cases. That means removing __GFP_HIGH
> doesn't break the rule of BPF_F_NO_PREALLOC, but shares the same goal
> with it: avoiding issues caused by too much memory. So let's remove it.
> 
> Currently, __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg
> pressure either, but the memcg code can be improved to make
> __GFP_KSWAPD_RECLAIM work well under memcg pressure.
> 
> It also fixes a typo in the comment.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>

I agree, it makes total sense to me. Bpf allocations are not high
priority and should not be enforced at both the memcg and page
allocator levels.

Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks, Yafang!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority
  2022-06-30 21:47   ` Daniel Borkmann
  2022-07-02  2:30     ` Yafang Shao
@ 2022-07-02  4:14     ` Roman Gushchin
  2022-07-02 15:08       ` Yafang Shao
  1 sibling, 1 reply; 14+ messages in thread
From: Roman Gushchin @ 2022-07-02  4:14 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Yafang Shao, ast, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, quentin, netdev, bpf

On Thu, Jun 30, 2022 at 11:47:00PM +0200, Daniel Borkmann wrote:
> Hi Yafang,
> 
> On 6/29/22 5:48 PM, Yafang Shao wrote:
> > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > easily break the memcg limit by force charge. So it is very dangerous to
> > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > too much memory.
> > 
> > We introduced BPF_F_NO_PREALLOC because full map pre-allocation is
> > too memory-expensive for some cases. That means removing __GFP_HIGH
> > doesn't break the rule of BPF_F_NO_PREALLOC, but shares the same
> > goal with it: avoiding issues caused by too much memory. So let's
> > remove it.
> > 
> > Currently, __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg
> > pressure either, but the memcg code can be improved to make
> > __GFP_KSWAPD_RECLAIM work well under memcg pressure.
> 
> Ok, but could you also explain in commit desc why it's a specific problem
> to BPF hashtab?
> 
> Afaik, there is plenty of other code using GFP_ATOMIC | __GFP_NOWARN outside
> of BPF e.g. under net/, so it's a generic memcg problem?

I'd be careful here and not change them all together.

__GFP_NOWARN might be used to suppress warnings which would otherwise be
too verbose and disruptive (especially if we talk about net/ allocations
in conjunction with netconsole), and might not imply a low/lower
priority.

> Why are lpm trie and local storage map for BPF not affected (at least I don't
> see them covered in the patch)?

Yes, it would be nice to fix this consistently across the bpf code.
Yafang, would you mind fixing those too?

Thanks!

Roman

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority
  2022-07-02  4:14     ` Roman Gushchin
@ 2022-07-02 15:08       ` Yafang Shao
  0 siblings, 0 replies; 14+ messages in thread
From: Yafang Shao @ 2022-07-02 15:08 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh,
	Quentin Monnet, netdev, bpf

On Sat, Jul 2, 2022 at 12:14 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Thu, Jun 30, 2022 at 11:47:00PM +0200, Daniel Borkmann wrote:
> > Hi Yafang,
> >
> > On 6/29/22 5:48 PM, Yafang Shao wrote:
> > > GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially
> > > if we allocate too much GFP_ATOMIC memory. For example, when we set the
> > > memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can
> > > easily break the memcg limit by force charge. So it is very dangerous to
> > > use GFP_ATOMIC in non-preallocated case. One way to make it safe is to
> > > remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC |
> > > __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate
> > > too much memory.
> > >
> > > We introduced BPF_F_NO_PREALLOC because full map pre-allocation
> > > is too memory-expensive for some cases. That means removing
> > > __GFP_HIGH doesn't break the rule of BPF_F_NO_PREALLOC, but shares
> > > the same goal with it: avoiding issues caused by too much memory.
> > > So let's remove it.
> > >
> > > Currently, __GFP_KSWAPD_RECLAIM doesn't cooperate well with memcg
> > > pressure either, but the memcg code can be improved to make
> > > __GFP_KSWAPD_RECLAIM work well under memcg pressure.
> >
> > Ok, but could you also explain in commit desc why it's a specific problem
> > to BPF hashtab?
> >
> > Afaik, there is plenty of other code using GFP_ATOMIC | __GFP_NOWARN outside
> > of BPF e.g. under net/, so it's a generic memcg problem?
>
> I'd be careful here and not change it all together.
>
> __GFP_NOWARN might be used to suppress warnings which otherwise would be too
> verbose and disruptive (especially if we talk about /net allocations in
> conjunction with netconsole) and might not mean a low/lower priority.
>
> > Why are lpm trie and local storage map for BPF not affected (at least I don't
> > see them covered in the patch)?
>
> Yes, it would be nice to fix this consistently over the bpf code.
> Yafang, would you mind to fix it too?
>

I will fix it.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-07-02 15:09 UTC | newest]

Thread overview: 14+ messages
2022-06-29 15:48 [PATCH bpf-next 0/4] bpf: Minor fixes Yafang Shao
2022-06-29 15:48 ` [PATCH bpf-next 1/4] bpf: Make non-preallocated allocation low priority Yafang Shao
2022-06-30 21:47   ` Daniel Borkmann
2022-07-02  2:30     ` Yafang Shao
2022-07-02  4:14     ` Roman Gushchin
2022-07-02 15:08       ` Yafang Shao
2022-07-02  3:54   ` Roman Gushchin
2022-06-29 15:48 ` [PATCH bpf-next 2/4] bpf: Warn on non-preallocated case for missed trace types Yafang Shao
2022-06-29 15:48 ` [PATCH bpf-next 3/4] bpf: Don't do preempt check when migrate is disabled Yafang Shao
2022-06-30 20:43   ` Hao Luo
2022-07-02  2:34     ` Yafang Shao
2022-06-29 15:48 ` [PATCH bpf-next 4/4] bpftool: Show also the name of type BPF_OBJ_LINK Yafang Shao
2022-06-29 16:22   ` Quentin Monnet
2022-06-30 21:55     ` Daniel Borkmann
