* [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs()
@ 2023-12-08 10:23 Hou Tao
  2023-12-08 10:23 ` [PATCH bpf-next 1/7] bpf: Remove unnecessary wait from bpf_map_copy_value() Hou Tao
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Hou Tao @ 2023-12-08 10:23 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

From: Hou Tao <houtao1@huawei.com>

Hi,

The patch set aims to fix the problems found when inspecting the code
related to maybe_wait_bpf_programs().

Patch #1 removes an unnecessary invocation of maybe_wait_bpf_programs().
Patch #2 calls maybe_wait_bpf_programs() only once for a batched update.
Patch #3 adds the missing wait for batched lookup-and-deletion on an htab
of maps. Patch #4 does the wait only if the update or deletion operation
succeeds. Patch #5 fixes the value of batch.count when memory allocation
fails. Patch #6 does the same as patch #4, except that it fixes the
problem for batched map operations. Patch #7 handles sleepable BPF
programs in maybe_wait_bpf_programs(), but it doesn't handle bpf syscalls
issued from syscall programs.
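
For reference, before this series maybe_wait_bpf_programs() in
kernel/bpf/syscall.c looks roughly as follows; this is a sketch
reconstructed from the lines removed by patch #7, and only map-in-map
map types trigger the wait:

  static void maybe_wait_bpf_programs(struct bpf_map *map)
  {
  	/* Wait for any running BPF programs to complete so that
  	 * userspace, when we return to it, knows that all programs
  	 * that could be running use the new map value.
  	 */
  	if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
  	    map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
  		synchronize_rcu();
  }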

Please see individual patches for more details. Comments are always
welcome.

Hou Tao (7):
  bpf: Remove unnecessary wait from bpf_map_copy_value()
  bpf: Call maybe_wait_bpf_programs() only once for
    generic_map_update_batch()
  bpf: Add missed maybe_wait_bpf_programs() for htab of maps
  bpf: Only call maybe_wait_bpf_programs() when map operation succeeds
  bpf: Set uattr->batch.count as zero before batched update or deletion
  bpf: Only call maybe_wait_bpf_programs() when at least one map
    operation succeeds
  bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs()

 include/linux/bpf.h  | 14 +++++------
 kernel/bpf/hashtab.c | 20 ++++++++-------
 kernel/bpf/syscall.c | 60 +++++++++++++++++++++++++++++++-------------
 3 files changed, 61 insertions(+), 33 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH bpf-next 1/7] bpf: Remove unnecessary wait from bpf_map_copy_value()
  2023-12-08 10:23 [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs() Hou Tao
@ 2023-12-08 10:23 ` Hou Tao
  2023-12-08 10:23 ` [PATCH bpf-next 2/7] bpf: Call maybe_wait_bpf_programs() only once for generic_map_update_batch() Hou Tao
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Hou Tao @ 2023-12-08 10:23 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

From: Hou Tao <houtao1@huawei.com>

Both map_lookup_elem() and generic_map_lookup_batch() use
bpf_map_copy_value() to look up and copy the value, and there is no
update operation in bpf_map_copy_value(), so just remove the invocation
of maybe_wait_bpf_programs() from it.

Fixes: 15c14a3dca42 ("bpf: Add bpf_map_{value_size, update_value, map_copy_value} functions")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/syscall.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ccd14d8e43df..71d869f74236 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -264,7 +264,6 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 	}
 
 	bpf_enable_instrumentation();
-	maybe_wait_bpf_programs(map);
 
 	return err;
 }
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next 2/7] bpf: Call maybe_wait_bpf_programs() only once for generic_map_update_batch()
  2023-12-08 10:23 [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs() Hou Tao
  2023-12-08 10:23 ` [PATCH bpf-next 1/7] bpf: Remove unnecessary wait from bpf_map_copy_value() Hou Tao
@ 2023-12-08 10:23 ` Hou Tao
  2023-12-08 10:23 ` [PATCH bpf-next 3/7] bpf: Add missed maybe_wait_bpf_programs() for htab of maps Hou Tao
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Hou Tao @ 2023-12-08 10:23 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

From: Hou Tao <houtao1@huawei.com>

Just like commit 9087c6ff8dfe ("bpf: Call maybe_wait_bpf_programs() only
once from generic_map_delete_batch()"), there is also no need to call
maybe_wait_bpf_programs() for each update during a batched update, so
call it only once in generic_map_update_batch().

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/syscall.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 71d869f74236..9d9bb82e6062 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -203,7 +203,6 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
 		rcu_read_unlock();
 	}
 	bpf_enable_instrumentation();
-	maybe_wait_bpf_programs(map);
 
 	return err;
 }
@@ -1561,6 +1560,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
 	}
 
 	err = bpf_map_update_value(map, f.file, key, value, attr->flags);
+	maybe_wait_bpf_programs(map);
 
 	kvfree(value);
 free_key:
@@ -1800,6 +1800,8 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
 
 	kvfree(value);
 	kvfree(key);
+
+	maybe_wait_bpf_programs(map);
 	return err;
 }
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next 3/7] bpf: Add missed maybe_wait_bpf_programs() for htab of maps
  2023-12-08 10:23 [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs() Hou Tao
  2023-12-08 10:23 ` [PATCH bpf-next 1/7] bpf: Remove unnecessary wait from bpf_map_copy_value() Hou Tao
  2023-12-08 10:23 ` [PATCH bpf-next 2/7] bpf: Call maybe_wait_bpf_programs() only once for generic_map_update_batch() Hou Tao
@ 2023-12-08 10:23 ` Hou Tao
  2023-12-08 10:23 ` [PATCH bpf-next 4/7] bpf: Only call maybe_wait_bpf_programs() when map operation succeeds Hou Tao
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Hou Tao @ 2023-12-08 10:23 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

From: Hou Tao <houtao1@huawei.com>

When doing batched lookup-and-deletion operations on an htab of maps,
maybe_wait_bpf_programs() is needed to ensure that no program still uses
the inner map after the bpf syscall returns.

Instead of adding the wait in __htab_map_lookup_and_delete_batch(), add
it in bpf_map_do_batch() and also remove the calls to
maybe_wait_bpf_programs() from generic_map_{delete,update}_batch().

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/syscall.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9d9bb82e6062..47a7c8786e70 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1742,7 +1742,6 @@ int generic_map_delete_batch(struct bpf_map *map,
 
 	kvfree(key);
 
-	maybe_wait_bpf_programs(map);
 	return err;
 }
 
@@ -1801,7 +1800,6 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
 	kvfree(value);
 	kvfree(key);
 
-	maybe_wait_bpf_programs(map);
 	return err;
 }
 
@@ -4959,8 +4957,10 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
 	else
 		BPF_DO_BATCH(map->ops->map_delete_batch, map, attr, uattr);
 err_put:
-	if (has_write)
+	if (has_write) {
+		maybe_wait_bpf_programs(map);
 		bpf_map_write_active_dec(map);
+	}
 	fdput(f);
 	return err;
 }
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next 4/7] bpf: Only call maybe_wait_bpf_programs() when map operation succeeds
  2023-12-08 10:23 [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs() Hou Tao
                   ` (2 preceding siblings ...)
  2023-12-08 10:23 ` [PATCH bpf-next 3/7] bpf: Add missed maybe_wait_bpf_programs() for htab of maps Hou Tao
@ 2023-12-08 10:23 ` Hou Tao
  2023-12-08 10:23 ` [PATCH bpf-next 5/7] bpf: Set uattr->batch.count as zero before batched update or deletion Hou Tao
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Hou Tao @ 2023-12-08 10:23 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

From: Hou Tao <houtao1@huawei.com>

There is no need to call maybe_wait_bpf_programs() if the update or
deletion operation fails, so call maybe_wait_bpf_programs() only when the
operation succeeds.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/syscall.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 47a7c8786e70..b711cc941664 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1560,7 +1560,8 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
 	}
 
 	err = bpf_map_update_value(map, f.file, key, value, attr->flags);
-	maybe_wait_bpf_programs(map);
+	if (!err)
+		maybe_wait_bpf_programs(map);
 
 	kvfree(value);
 free_key:
@@ -1616,7 +1617,8 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
 	err = map->ops->map_delete_elem(map, key);
 	rcu_read_unlock();
 	bpf_enable_instrumentation();
-	maybe_wait_bpf_programs(map);
+	if (!err)
+		maybe_wait_bpf_programs(map);
 out:
 	kvfree(key);
 err_put:
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next 5/7] bpf: Set uattr->batch.count as zero before batched update or deletion
  2023-12-08 10:23 [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs() Hou Tao
                   ` (3 preceding siblings ...)
  2023-12-08 10:23 ` [PATCH bpf-next 4/7] bpf: Only call maybe_wait_bpf_programs() when map operation succeeds Hou Tao
@ 2023-12-08 10:23 ` Hou Tao
  2023-12-08 10:23 ` [PATCH bpf-next 6/7] bpf: Only call maybe_wait_bpf_programs() when at least one map operation succeeds Hou Tao
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Hou Tao @ 2023-12-08 10:23 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

From: Hou Tao <houtao1@huawei.com>

generic_map_{delete,update}_batch() don't set uattr->batch.count to zero
before trying to allocate memory for the key. If the memory allocation
fails, the value of uattr->batch.count will be incorrect.

Fix it by setting uattr->batch.count to zero before starting the batched
update or deletion.
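
As context only (not part of the patch), a minimal userspace sketch of
why a stale count is misleading; it assumes libbpf's
bpf_map_delete_batch() helper, and delete_some() is a hypothetical
caller:

  #include <stdio.h>
  #include <bpf/bpf.h>

  /* Delete up to "n" keys in one batched syscall. Callers read "count"
   * even on error to learn how many deletions were actually done, so the
   * kernel must zero uattr->batch.count before any early failure such as
   * the -ENOMEM case fixed by this patch.
   */
  static int delete_some(int map_fd, const void *keys, __u32 n)
  {
  	__u32 count = n;
  	LIBBPF_OPTS(bpf_map_batch_opts, opts);
  	int err = bpf_map_delete_batch(map_fd, keys, &count, &opts);

  	printf("deleted %u of %u keys, err=%d\n", count, n, err);
  	return err;
  }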

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/syscall.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b711cc941664..efda2353a7d5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1715,6 +1715,9 @@ int generic_map_delete_batch(struct bpf_map *map,
 	if (!max_count)
 		return 0;
 
+	if (put_user(0, &uattr->batch.count))
+		return -EFAULT;
+
 	key = kvmalloc(map->key_size, GFP_USER | __GFP_NOWARN);
 	if (!key)
 		return -ENOMEM;
@@ -1771,6 +1774,9 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
 	if (!max_count)
 		return 0;
 
+	if (put_user(0, &uattr->batch.count))
+		return -EFAULT;
+
 	key = kvmalloc(map->key_size, GFP_USER | __GFP_NOWARN);
 	if (!key)
 		return -ENOMEM;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next 6/7] bpf: Only call maybe_wait_bpf_programs() when at least one map operation succeeds
  2023-12-08 10:23 [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs() Hou Tao
                   ` (4 preceding siblings ...)
  2023-12-08 10:23 ` [PATCH bpf-next 5/7] bpf: Set uattr->batch.count as zero before batched update or deletion Hou Tao
@ 2023-12-08 10:23 ` Hou Tao
  2023-12-08 22:55   ` Yonghong Song
  2023-12-08 10:23 ` [PATCH bpf-next 7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs() Hou Tao
  2023-12-10  2:40 ` [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs() patchwork-bot+netdevbpf
  7 siblings, 1 reply; 18+ messages in thread
From: Hou Tao @ 2023-12-08 10:23 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

From: Hou Tao <houtao1@huawei.com>

There is no need to call maybe_wait_bpf_programs() if all operations in a
batched update, deletion, or lookup-and-deletion fail, so call
maybe_wait_bpf_programs() only if at least one map operation succeeds.

Similar to uattr->batch.count, which returns the number of successful map
operations to the userspace application, use attr->batch.count to record
the number of successful map operations in the kernel. Sometimes these
two numbers may differ. For example, in
__htab_map_lookup_and_delete_batch(do_delete=true) it is possible that 10
items in the current bucket have been successfully deleted, but copying
the deleted keys to the userspace application fails; attr->batch.count
will then be 10 while uattr->batch.count will be 0.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 include/linux/bpf.h  | 14 +++++++-------
 kernel/bpf/hashtab.c | 20 +++++++++++---------
 kernel/bpf/syscall.c | 21 ++++++++++++++-------
 3 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f7aa255c634f..a0c4d696a231 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -81,17 +81,17 @@ struct bpf_map_ops {
 	int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
 	void (*map_release_uref)(struct bpf_map *map);
 	void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key);
-	int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr,
+	int (*map_lookup_batch)(struct bpf_map *map, union bpf_attr *attr,
 				union bpf_attr __user *uattr);
 	int (*map_lookup_and_delete_elem)(struct bpf_map *map, void *key,
 					  void *value, u64 flags);
 	int (*map_lookup_and_delete_batch)(struct bpf_map *map,
-					   const union bpf_attr *attr,
+					   union bpf_attr *attr,
 					   union bpf_attr __user *uattr);
 	int (*map_update_batch)(struct bpf_map *map, struct file *map_file,
-				const union bpf_attr *attr,
+				union bpf_attr *attr,
 				union bpf_attr __user *uattr);
-	int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr,
+	int (*map_delete_batch)(struct bpf_map *map, union bpf_attr *attr,
 				union bpf_attr __user *uattr);
 
 	/* funcs callable from userspace and from eBPF programs */
@@ -2095,13 +2095,13 @@ void bpf_map_area_free(void *base);
 bool bpf_map_write_active(const struct bpf_map *map);
 void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
 int  generic_map_lookup_batch(struct bpf_map *map,
-			      const union bpf_attr *attr,
+			      union bpf_attr *attr,
 			      union bpf_attr __user *uattr);
 int  generic_map_update_batch(struct bpf_map *map, struct file *map_file,
-			      const union bpf_attr *attr,
+			      union bpf_attr *attr,
 			      union bpf_attr __user *uattr);
 int  generic_map_delete_batch(struct bpf_map *map,
-			      const union bpf_attr *attr,
+			      union bpf_attr *attr,
 			      union bpf_attr __user *uattr);
 struct bpf_map *bpf_map_get_curr_or_next(u32 *id);
 struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 5b9146fa825f..b777bd8d4f8d 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1673,7 +1673,7 @@ static int htab_lru_percpu_map_lookup_and_delete_elem(struct bpf_map *map,
 
 static int
 __htab_map_lookup_and_delete_batch(struct bpf_map *map,
-				   const union bpf_attr *attr,
+				   union bpf_attr *attr,
 				   union bpf_attr __user *uattr,
 				   bool do_delete, bool is_lru_map,
 				   bool is_percpu)
@@ -1708,6 +1708,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	if (!max_count)
 		return 0;
 
+	attr->batch.count = 0;
 	if (put_user(0, &uattr->batch.count))
 		return -EFAULT;
 
@@ -1845,6 +1846,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		}
 		dst_key += key_size;
 		dst_val += value_size;
+		attr->batch.count++;
 	}
 
 	htab_unlock_bucket(htab, b, batch, flags);
@@ -1900,7 +1902,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 }
 
 static int
-htab_percpu_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
+htab_percpu_map_lookup_batch(struct bpf_map *map, union bpf_attr *attr,
 			     union bpf_attr __user *uattr)
 {
 	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
@@ -1909,7 +1911,7 @@ htab_percpu_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
 
 static int
 htab_percpu_map_lookup_and_delete_batch(struct bpf_map *map,
-					const union bpf_attr *attr,
+					union bpf_attr *attr,
 					union bpf_attr __user *uattr)
 {
 	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
@@ -1917,7 +1919,7 @@ htab_percpu_map_lookup_and_delete_batch(struct bpf_map *map,
 }
 
 static int
-htab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
+htab_map_lookup_batch(struct bpf_map *map, union bpf_attr *attr,
 		      union bpf_attr __user *uattr)
 {
 	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
@@ -1926,7 +1928,7 @@ htab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
 
 static int
 htab_map_lookup_and_delete_batch(struct bpf_map *map,
-				 const union bpf_attr *attr,
+				 union bpf_attr *attr,
 				 union bpf_attr __user *uattr)
 {
 	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
@@ -1935,7 +1937,7 @@ htab_map_lookup_and_delete_batch(struct bpf_map *map,
 
 static int
 htab_lru_percpu_map_lookup_batch(struct bpf_map *map,
-				 const union bpf_attr *attr,
+				 union bpf_attr *attr,
 				 union bpf_attr __user *uattr)
 {
 	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
@@ -1944,7 +1946,7 @@ htab_lru_percpu_map_lookup_batch(struct bpf_map *map,
 
 static int
 htab_lru_percpu_map_lookup_and_delete_batch(struct bpf_map *map,
-					    const union bpf_attr *attr,
+					    union bpf_attr *attr,
 					    union bpf_attr __user *uattr)
 {
 	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
@@ -1952,7 +1954,7 @@ htab_lru_percpu_map_lookup_and_delete_batch(struct bpf_map *map,
 }
 
 static int
-htab_lru_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
+htab_lru_map_lookup_batch(struct bpf_map *map, union bpf_attr *attr,
 			  union bpf_attr __user *uattr)
 {
 	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
@@ -1961,7 +1963,7 @@ htab_lru_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
 
 static int
 htab_lru_map_lookup_and_delete_batch(struct bpf_map *map,
-				     const union bpf_attr *attr,
+				     union bpf_attr *attr,
 				     union bpf_attr __user *uattr)
 {
 	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index efda2353a7d5..d2641e51a1a7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1695,7 +1695,7 @@ static int map_get_next_key(union bpf_attr *attr)
 }
 
 int generic_map_delete_batch(struct bpf_map *map,
-			     const union bpf_attr *attr,
+			     union bpf_attr *attr,
 			     union bpf_attr __user *uattr)
 {
 	void __user *keys = u64_to_user_ptr(attr->batch.keys);
@@ -1715,6 +1715,7 @@ int generic_map_delete_batch(struct bpf_map *map,
 	if (!max_count)
 		return 0;
 
+	attr->batch.count = 0;
 	if (put_user(0, &uattr->batch.count))
 		return -EFAULT;
 
@@ -1742,6 +1743,8 @@ int generic_map_delete_batch(struct bpf_map *map,
 			break;
 		cond_resched();
 	}
+
+	attr->batch.count = cp;
 	if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
 		err = -EFAULT;
 
@@ -1751,7 +1754,7 @@ int generic_map_delete_batch(struct bpf_map *map,
 }
 
 int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
-			     const union bpf_attr *attr,
+			     union bpf_attr *attr,
 			     union bpf_attr __user *uattr)
 {
 	void __user *values = u64_to_user_ptr(attr->batch.values);
@@ -1774,6 +1777,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
 	if (!max_count)
 		return 0;
 
+	attr->batch.count = 0;
 	if (put_user(0, &uattr->batch.count))
 		return -EFAULT;
 
@@ -1802,6 +1806,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
 		cond_resched();
 	}
 
+	attr->batch.count = cp;
 	if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
 		err = -EFAULT;
 
@@ -1813,9 +1818,8 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
 
 #define MAP_LOOKUP_RETRIES 3
 
-int generic_map_lookup_batch(struct bpf_map *map,
-				    const union bpf_attr *attr,
-				    union bpf_attr __user *uattr)
+int generic_map_lookup_batch(struct bpf_map *map, union bpf_attr *attr,
+			     union bpf_attr __user *uattr)
 {
 	void __user *uobatch = u64_to_user_ptr(attr->batch.out_batch);
 	void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
@@ -1838,6 +1842,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
 	if (!max_count)
 		return 0;
 
+	attr->batch.count = 0;
 	if (put_user(0, &uattr->batch.count))
 		return -EFAULT;
 
@@ -1903,6 +1908,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
 	if (err == -EFAULT)
 		goto free_buf;
 
+	attr->batch.count = cp;
 	if ((copy_to_user(&uattr->batch.count, &cp, sizeof(cp)) ||
 		    (cp && copy_to_user(uobatch, prev_key, map->key_size))))
 		err = -EFAULT;
@@ -4926,7 +4932,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
 		err = fn(__VA_ARGS__);		\
 	} while (0)
 
-static int bpf_map_do_batch(const union bpf_attr *attr,
+static int bpf_map_do_batch(union bpf_attr *attr,
 			    union bpf_attr __user *uattr,
 			    int cmd)
 {
@@ -4966,7 +4972,8 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
 		BPF_DO_BATCH(map->ops->map_delete_batch, map, attr, uattr);
 err_put:
 	if (has_write) {
-		maybe_wait_bpf_programs(map);
+		if (attr->batch.count)
+			maybe_wait_bpf_programs(map);
 		bpf_map_write_active_dec(map);
 	}
 	fdput(f);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next 7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs()
  2023-12-08 10:23 [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs() Hou Tao
                   ` (5 preceding siblings ...)
  2023-12-08 10:23 ` [PATCH bpf-next 6/7] bpf: Only call maybe_wait_bpf_programs() when at least one map operation succeeds Hou Tao
@ 2023-12-08 10:23 ` Hou Tao
  2023-12-08 22:30   ` Andrii Nakryiko
  2023-12-10  2:11   ` Alexei Starovoitov
  2023-12-10  2:40 ` [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs() patchwork-bot+netdevbpf
  7 siblings, 2 replies; 18+ messages in thread
From: Hou Tao @ 2023-12-08 10:23 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

From: Hou Tao <houtao1@huawei.com>

Since commit 638e4b825d52 ("bpf: Allows per-cpu maps and map-in-map in
sleepable programs"), sleepable BPF programs can use map-in-map, but
maybe_wait_bpf_programs() doesn't take them into account.

So check the value of sleepable_refcnt in maybe_wait_bpf_programs(): if
it is non-zero, use synchronize_rcu_mult() to wait for both sleepable and
non-sleepable BPF programs. However, a bpf syscall issued from a syscall
program is special, because that bpf syscall runs with
rcu_read_lock_trace() held, and using synchronize_rcu_mult() to wait for
the exit of sleepable BPF programs would deadlock, so skip the wait for
sleepable BPF programs when the bpf syscall comes from a syscall program.
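
As an illustrative sketch only (not part of the patch; pick_wait() is a
hypothetical name), a small userspace model of the decision the patched
maybe_wait_bpf_programs() makes:

  #include <stdbool.h>
  #include <stdio.h>

  /* Which grace period is waited for, depending on whether a sleepable
   * program holds the map-in-map and whether the caller already holds
   * rcu_read_lock_trace(), i.e. the bpf syscall came from a syscall
   * program.
   */
  static const char *pick_wait(bool map_in_map, long sleepable_refcnt,
  			     bool rcu_trace_lock_held)
  {
  	if (!map_in_map)
  		return "no wait";
  	if (sleepable_refcnt && !rcu_trace_lock_held)
  		return "synchronize_rcu_mult(call_rcu, call_rcu_tasks_trace)";
  	return "synchronize_rcu()";
  }

  int main(void)
  {
  	/* Userspace update of a hash-of-maps also used by a sleepable prog. */
  	printf("%s\n", pick_wait(true, 1, false));
  	/* Same update issued by a syscall program: skip the tasks-trace wait. */
  	printf("%s\n", pick_wait(true, 1, true));
  	return 0;
  }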

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/syscall.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d2641e51a1a7..6b9d7990d95f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -35,6 +35,7 @@
 #include <linux/rcupdate_trace.h>
 #include <linux/memcontrol.h>
 #include <linux/trace_events.h>
+#include <linux/rcupdate_wait.h>
 
 #include <net/netfilter/nf_bpf_link.h>
 #include <net/netkit.h>
@@ -140,15 +141,24 @@ static u32 bpf_map_value_size(const struct bpf_map *map)
 		return  map->value_size;
 }
 
-static void maybe_wait_bpf_programs(struct bpf_map *map)
+static void maybe_wait_bpf_programs(struct bpf_map *map, bool rcu_trace_lock_held)
 {
-	/* Wait for any running BPF programs to complete so that
-	 * userspace, when we return to it, knows that all programs
-	 * that could be running use the new map value.
+	/* Wait for any running non-sleepable and sleepable BPF programs to
+	 * complete, so that userspace, when we return to it, knows that all
+	 * programs that could be running use the new map value. However
+	 * syscall program can also use bpf syscall to update or delete inner
+	 * map in outer map, and it holds rcu_read_lock_trace() before doing
+	 * the bpf syscall. If use synchronize_rcu_mult(call_rcu_tasks_trace)
+	 * to wait for the exit of running sleepable BPF programs, there will
+	 * be dead-lock, so skip the waiting for syscall program.
 	 */
 	if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
-	    map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
-		synchronize_rcu();
+	    map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
+		if (atomic64_read(&map->sleepable_refcnt) && !rcu_trace_lock_held)
+			synchronize_rcu_mult(call_rcu, call_rcu_tasks_trace);
+		else
+			synchronize_rcu();
+	}
 }
 
 static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
@@ -1561,7 +1571,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
 
 	err = bpf_map_update_value(map, f.file, key, value, attr->flags);
 	if (!err)
-		maybe_wait_bpf_programs(map);
+		maybe_wait_bpf_programs(map, bpfptr_is_kernel(uattr));
 
 	kvfree(value);
 free_key:
@@ -1618,7 +1628,7 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
 	rcu_read_unlock();
 	bpf_enable_instrumentation();
 	if (!err)
-		maybe_wait_bpf_programs(map);
+		maybe_wait_bpf_programs(map, bpfptr_is_kernel(uattr));
 out:
 	kvfree(key);
 err_put:
@@ -4973,7 +4983,7 @@ static int bpf_map_do_batch(union bpf_attr *attr,
 err_put:
 	if (has_write) {
 		if (attr->batch.count)
-			maybe_wait_bpf_programs(map);
+			maybe_wait_bpf_programs(map, false);
 		bpf_map_write_active_dec(map);
 	}
 	fdput(f);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs()
  2023-12-08 10:23 ` [PATCH bpf-next 7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs() Hou Tao
@ 2023-12-08 22:30   ` Andrii Nakryiko
  2023-12-11  1:17     ` Hou Tao
  2023-12-10  2:11   ` Alexei Starovoitov
  1 sibling, 1 reply; 18+ messages in thread
From: Andrii Nakryiko @ 2023-12-08 22:30 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

On Fri, Dec 8, 2023 at 2:23 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> Since commit 638e4b825d52 ("bpf: Allows per-cpu maps and map-in-map in
> sleepable programs"), sleepable BPF program can use map-in-map, but
> maybe_wait_bpf_programs() doesn't consider it accordingly.
>
> So checking the value of sleepable_refcnt in maybe_wait_bpf_programs(),
> if the value is not zero, use synchronize_rcu_mult() to wait for both
> sleepable and non-sleepable BPF programs. But bpf syscall from syscall
> program is special, because the bpf syscall is called with
> rcu_read_lock_trace() being held, and there will be dead-lock if
> synchronize_rcu_mult() is used to wait for the exit of sleepable BPF
> program, so just skip the waiting of sleepable BPF program for bpf
> syscall which comes from syscall program.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/syscall.c | 28 +++++++++++++++++++---------
>  1 file changed, 19 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index d2641e51a1a7..6b9d7990d95f 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -35,6 +35,7 @@
>  #include <linux/rcupdate_trace.h>
>  #include <linux/memcontrol.h>
>  #include <linux/trace_events.h>
> +#include <linux/rcupdate_wait.h>
>
>  #include <net/netfilter/nf_bpf_link.h>
>  #include <net/netkit.h>
> @@ -140,15 +141,24 @@ static u32 bpf_map_value_size(const struct bpf_map *map)
>                 return  map->value_size;
>  }
>
> -static void maybe_wait_bpf_programs(struct bpf_map *map)
> +static void maybe_wait_bpf_programs(struct bpf_map *map, bool rcu_trace_lock_held)
>  {
> -       /* Wait for any running BPF programs to complete so that
> -        * userspace, when we return to it, knows that all programs
> -        * that could be running use the new map value.
> +       /* Wait for any running non-sleepable and sleepable BPF programs to
> +        * complete, so that userspace, when we return to it, knows that all
> +        * programs that could be running use the new map value. However
> +        * syscall program can also use bpf syscall to update or delete inner
> +        * map in outer map, and it holds rcu_read_lock_trace() before doing
> +        * the bpf syscall. If use synchronize_rcu_mult(call_rcu_tasks_trace)
> +        * to wait for the exit of running sleepable BPF programs, there will
> +        * be dead-lock, so skip the waiting for syscall program.
>          */
>         if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
> -           map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
> -               synchronize_rcu();
> +           map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
> +               if (atomic64_read(&map->sleepable_refcnt) && !rcu_trace_lock_held)

why is this correct and non-racy without holding used_maps_mutex under
which this sleepable_refcnt is incremented?

> +                       synchronize_rcu_mult(call_rcu, call_rcu_tasks_trace);
> +               else
> +                       synchronize_rcu();
> +       }
>  }
>
>  static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
> @@ -1561,7 +1571,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
>
>         err = bpf_map_update_value(map, f.file, key, value, attr->flags);
>         if (!err)
> -               maybe_wait_bpf_programs(map);
> +               maybe_wait_bpf_programs(map, bpfptr_is_kernel(uattr));
>
>         kvfree(value);
>  free_key:
> @@ -1618,7 +1628,7 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
>         rcu_read_unlock();
>         bpf_enable_instrumentation();
>         if (!err)
> -               maybe_wait_bpf_programs(map);
> +               maybe_wait_bpf_programs(map, bpfptr_is_kernel(uattr));
>  out:
>         kvfree(key);
>  err_put:
> @@ -4973,7 +4983,7 @@ static int bpf_map_do_batch(union bpf_attr *attr,
>  err_put:
>         if (has_write) {
>                 if (attr->batch.count)
> -                       maybe_wait_bpf_programs(map);
> +                       maybe_wait_bpf_programs(map, false);
>                 bpf_map_write_active_dec(map);
>         }
>         fdput(f);
> --
> 2.29.2
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 6/7] bpf: Only call maybe_wait_bpf_programs() when at least one map operation succeeds
  2023-12-08 10:23 ` [PATCH bpf-next 6/7] bpf: Only call maybe_wait_bpf_programs() when at least one map operation succeeds Hou Tao
@ 2023-12-08 22:55   ` Yonghong Song
  2023-12-10  2:11     ` Alexei Starovoitov
  0 siblings, 1 reply; 18+ messages in thread
From: Yonghong Song @ 2023-12-08 22:55 UTC (permalink / raw)
  To: Hou Tao, bpf
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Daniel Borkmann, KP Singh, Stanislav Fomichev,
	Jiri Olsa, John Fastabend, houtao1


On 12/8/23 2:23 AM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> There is no need to call maybe_wait_bpf_programs() if all operations in
> batched update, deletion, or lookup_and_deletion fail. So only call
> maybe_wait_bpf_programs() if at least one map operation succeeds.
>
> Similar with uattr->batch.count which is used to return the number of
> succeeded map operations to userspace application, use attr->batch.count
> to record the number of succeeded map operations in kernel. Sometimes
> these two number may be different. For example, in
> __htab_map_lookup_and_delete_batch(do_delete=true), it is possible that
> 10 items in current bucket have been successfully deleted, but copying
> the deleted keys to userspace application fails, attr->batch.count will
> be 10 but uattr->batch.count will be 0 instead.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>   include/linux/bpf.h  | 14 +++++++-------
>   kernel/bpf/hashtab.c | 20 +++++++++++---------
>   kernel/bpf/syscall.c | 21 ++++++++++++++-------
>   3 files changed, 32 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f7aa255c634f..a0c4d696a231 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -81,17 +81,17 @@ struct bpf_map_ops {
>   	int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
>   	void (*map_release_uref)(struct bpf_map *map);
>   	void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key);
> -	int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr,
> +	int (*map_lookup_batch)(struct bpf_map *map, union bpf_attr *attr,
>   				union bpf_attr __user *uattr);
>   	int (*map_lookup_and_delete_elem)(struct bpf_map *map, void *key,
>   					  void *value, u64 flags);
>   	int (*map_lookup_and_delete_batch)(struct bpf_map *map,
> -					   const union bpf_attr *attr,
> +					   union bpf_attr *attr,
>   					   union bpf_attr __user *uattr);
>   	int (*map_update_batch)(struct bpf_map *map, struct file *map_file,
> -				const union bpf_attr *attr,
> +				union bpf_attr *attr,
>   				union bpf_attr __user *uattr);
> -	int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr,
> +	int (*map_delete_batch)(struct bpf_map *map, union bpf_attr *attr,
>   				union bpf_attr __user *uattr);
>   
>   	/* funcs callable from userspace and from eBPF programs */
> @@ -2095,13 +2095,13 @@ void bpf_map_area_free(void *base);
>   bool bpf_map_write_active(const struct bpf_map *map);
>   void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
>   int  generic_map_lookup_batch(struct bpf_map *map,
> -			      const union bpf_attr *attr,
> +			      union bpf_attr *attr,
>   			      union bpf_attr __user *uattr);
>   int  generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> -			      const union bpf_attr *attr,
> +			      union bpf_attr *attr,
>   			      union bpf_attr __user *uattr);
>   int  generic_map_delete_batch(struct bpf_map *map,
> -			      const union bpf_attr *attr,
> +			      union bpf_attr *attr,
>   			      union bpf_attr __user *uattr);
>   struct bpf_map *bpf_map_get_curr_or_next(u32 *id);
>   struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id);
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 5b9146fa825f..b777bd8d4f8d 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -1673,7 +1673,7 @@ static int htab_lru_percpu_map_lookup_and_delete_elem(struct bpf_map *map,
>   
>   static int
>   __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> -				   const union bpf_attr *attr,
> +				   union bpf_attr *attr,
>   				   union bpf_attr __user *uattr,
>   				   bool do_delete, bool is_lru_map,
>   				   bool is_percpu)
> @@ -1708,6 +1708,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>   	if (!max_count)
>   		return 0;
>   
> +	attr->batch.count = 0;
>   	if (put_user(0, &uattr->batch.count))
>   		return -EFAULT;
>   
> @@ -1845,6 +1846,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>   		}
>   		dst_key += key_size;
>   		dst_val += value_size;
> +		attr->batch.count++;
>   	}
>   
>   	htab_unlock_bucket(htab, b, batch, flags);

[...]

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index efda2353a7d5..d2641e51a1a7 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1695,7 +1695,7 @@ static int map_get_next_key(union bpf_attr *attr)
>   }
>   
>   int generic_map_delete_batch(struct bpf_map *map,
> -			     const union bpf_attr *attr,
> +			     union bpf_attr *attr,
>   			     union bpf_attr __user *uattr)
>   {
>   	void __user *keys = u64_to_user_ptr(attr->batch.keys);
> @@ -1715,6 +1715,7 @@ int generic_map_delete_batch(struct bpf_map *map,
>   	if (!max_count)
>   		return 0;
>   
> +	attr->batch.count = 0;
>   	if (put_user(0, &uattr->batch.count))
>   		return -EFAULT;
>   
> @@ -1742,6 +1743,8 @@ int generic_map_delete_batch(struct bpf_map *map,
>   			break;
>   		cond_resched();
>   	}
> +
> +	attr->batch.count = cp;
>   	if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
>   		err = -EFAULT;
>   
> @@ -1751,7 +1754,7 @@ int generic_map_delete_batch(struct bpf_map *map,
>   }
>   
>   int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> -			     const union bpf_attr *attr,
> +			     union bpf_attr *attr,
>   			     union bpf_attr __user *uattr)
>   {
>   	void __user *values = u64_to_user_ptr(attr->batch.values);
> @@ -1774,6 +1777,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
>   	if (!max_count)
>   		return 0;
>   
> +	attr->batch.count = 0;
>   	if (put_user(0, &uattr->batch.count))
>   		return -EFAULT;
>   
> @@ -1802,6 +1806,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
>   		cond_resched();
>   	}
>   
> +	attr->batch.count = cp;
>   	if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
>   		err = -EFAULT;
>   
> @@ -1813,9 +1818,8 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
>   
>   #define MAP_LOOKUP_RETRIES 3
>   
> -int generic_map_lookup_batch(struct bpf_map *map,
> -				    const union bpf_attr *attr,
> -				    union bpf_attr __user *uattr)
> +int generic_map_lookup_batch(struct bpf_map *map, union bpf_attr *attr,
> +			     union bpf_attr __user *uattr)
>   {
>   	void __user *uobatch = u64_to_user_ptr(attr->batch.out_batch);
>   	void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
> @@ -1838,6 +1842,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
>   	if (!max_count)
>   		return 0;
>   
> +	attr->batch.count = 0;
>   	if (put_user(0, &uattr->batch.count))
>   		return -EFAULT;
>   
> @@ -1903,6 +1908,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
>   	if (err == -EFAULT)
>   		goto free_buf;
>   
> +	attr->batch.count = cp;

You don't need to change generic_map_lookup_batch() here. It won't trigger
maybe_wait_bpf_programs().

>   	if ((copy_to_user(&uattr->batch.count, &cp, sizeof(cp)) ||
>   		    (cp && copy_to_user(uobatch, prev_key, map->key_size))))
>   		err = -EFAULT;
> @@ -4926,7 +4932,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>   		err = fn(__VA_ARGS__);		\
>   	} while (0)
>   
> -static int bpf_map_do_batch(const union bpf_attr *attr,
> +static int bpf_map_do_batch(union bpf_attr *attr,
>   			    union bpf_attr __user *uattr,
>   			    int cmd)
>   {
> @@ -4966,7 +4972,8 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
>   		BPF_DO_BATCH(map->ops->map_delete_batch, map, attr, uattr);
>   err_put:
>   	if (has_write) {
> -		maybe_wait_bpf_programs(map);
> +		if (attr->batch.count)
> +			maybe_wait_bpf_programs(map);

Your code logic sounds correct, but I feel you are optimizing for extreme
corner cases. In reality, especially in a production environment, a fault in
something like copy_to_user() should be extremely rare. So in my opinion, this
optimization is not needed.

>   		bpf_map_write_active_dec(map);
>   	}
>   	fdput(f);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs()
  2023-12-08 10:23 ` [PATCH bpf-next 7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs() Hou Tao
  2023-12-08 22:30   ` Andrii Nakryiko
@ 2023-12-10  2:11   ` Alexei Starovoitov
  2023-12-11  2:07     ` Hou Tao
  1 sibling, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2023-12-10  2:11 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
	Jiri Olsa, John Fastabend, Hou Tao

On Fri, Dec 8, 2023 at 2:22 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> +       /* Wait for any running non-sleepable and sleepable BPF programs to
> +        * complete, so that userspace, when we return to it, knows that all
> +        * programs that could be running use the new map value.

which could be forever... and the user space task doing simple map update
will never know why it got stuck in syscall waiting... forever...
synchronous waiting for tasks_trace is never an option.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 6/7] bpf: Only call maybe_wait_bpf_programs() when at least one map operation succeeds
  2023-12-08 22:55   ` Yonghong Song
@ 2023-12-10  2:11     ` Alexei Starovoitov
  2023-12-11  1:11       ` Hou Tao
  0 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2023-12-10  2:11 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Hou Tao, bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu,
	Hao Luo, Daniel Borkmann, KP Singh, Stanislav Fomichev,
	Jiri Olsa, John Fastabend, Hou Tao

On Fri, Dec 8, 2023 at 2:55 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 12/8/23 2:23 AM, Hou Tao wrote:
> > From: Hou Tao <houtao1@huawei.com>
> >
> > There is no need to call maybe_wait_bpf_programs() if all operations in
> > batched update, deletion, or lookup_and_deletion fail. So only call
> > maybe_wait_bpf_programs() if at least one map operation succeeds.
> >
> > Similar with uattr->batch.count which is used to return the number of
> > succeeded map operations to userspace application, use attr->batch.count
> > to record the number of succeeded map operations in kernel. Sometimes
> > these two number may be different. For example, in
> > __htab_map_lookup_and_delete_batch(do_delete=true), it is possible that
> > 10 items in current bucket have been successfully deleted, but copying
> > the deleted keys to userspace application fails, attr->batch.count will
> > be 10 but uattr->batch.count will be 0 instead.
> >
> > Signed-off-by: Hou Tao <houtao1@huawei.com>
> > ---
> >   include/linux/bpf.h  | 14 +++++++-------
> >   kernel/bpf/hashtab.c | 20 +++++++++++---------
> >   kernel/bpf/syscall.c | 21 ++++++++++++++-------
> >   3 files changed, 32 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f7aa255c634f..a0c4d696a231 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -81,17 +81,17 @@ struct bpf_map_ops {
> >       int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
> >       void (*map_release_uref)(struct bpf_map *map);
> >       void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key);
> > -     int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr,
> > +     int (*map_lookup_batch)(struct bpf_map *map, union bpf_attr *attr,
> >                               union bpf_attr __user *uattr);
> >       int (*map_lookup_and_delete_elem)(struct bpf_map *map, void *key,
> >                                         void *value, u64 flags);
> >       int (*map_lookup_and_delete_batch)(struct bpf_map *map,
> > -                                        const union bpf_attr *attr,
> > +                                        union bpf_attr *attr,
> >                                          union bpf_attr __user *uattr);
> >       int (*map_update_batch)(struct bpf_map *map, struct file *map_file,
> > -                             const union bpf_attr *attr,
> > +                             union bpf_attr *attr,
> >                               union bpf_attr __user *uattr);
> > -     int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr,
> > +     int (*map_delete_batch)(struct bpf_map *map, union bpf_attr *attr,
> >                               union bpf_attr __user *uattr);
> >
> >       /* funcs callable from userspace and from eBPF programs */
> > @@ -2095,13 +2095,13 @@ void bpf_map_area_free(void *base);
> >   bool bpf_map_write_active(const struct bpf_map *map);
> >   void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
> >   int  generic_map_lookup_batch(struct bpf_map *map,
> > -                           const union bpf_attr *attr,
> > +                           union bpf_attr *attr,
> >                             union bpf_attr __user *uattr);
> >   int  generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> > -                           const union bpf_attr *attr,
> > +                           union bpf_attr *attr,
> >                             union bpf_attr __user *uattr);
> >   int  generic_map_delete_batch(struct bpf_map *map,
> > -                           const union bpf_attr *attr,
> > +                           union bpf_attr *attr,
> >                             union bpf_attr __user *uattr);
> >   struct bpf_map *bpf_map_get_curr_or_next(u32 *id);
> >   struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id);
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 5b9146fa825f..b777bd8d4f8d 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -1673,7 +1673,7 @@ static int htab_lru_percpu_map_lookup_and_delete_elem(struct bpf_map *map,
> >
> >   static int
> >   __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> > -                                const union bpf_attr *attr,
> > +                                union bpf_attr *attr,
> >                                  union bpf_attr __user *uattr,
> >                                  bool do_delete, bool is_lru_map,
> >                                  bool is_percpu)
> > @@ -1708,6 +1708,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >       if (!max_count)
> >               return 0;
> >
> > +     attr->batch.count = 0;
> >       if (put_user(0, &uattr->batch.count))
> >               return -EFAULT;
> >
> > @@ -1845,6 +1846,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >               }
> >               dst_key += key_size;
> >               dst_val += value_size;
> > +             attr->batch.count++;
> >       }
> >
> >       htab_unlock_bucket(htab, b, batch, flags);
>
> [...]
>
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index efda2353a7d5..d2641e51a1a7 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -1695,7 +1695,7 @@ static int map_get_next_key(union bpf_attr *attr)
> >   }
> >
> >   int generic_map_delete_batch(struct bpf_map *map,
> > -                          const union bpf_attr *attr,
> > +                          union bpf_attr *attr,
> >                            union bpf_attr __user *uattr)
> >   {
> >       void __user *keys = u64_to_user_ptr(attr->batch.keys);
> > @@ -1715,6 +1715,7 @@ int generic_map_delete_batch(struct bpf_map *map,
> >       if (!max_count)
> >               return 0;
> >
> > +     attr->batch.count = 0;
> >       if (put_user(0, &uattr->batch.count))
> >               return -EFAULT;
> >
> > @@ -1742,6 +1743,8 @@ int generic_map_delete_batch(struct bpf_map *map,
> >                       break;
> >               cond_resched();
> >       }
> > +
> > +     attr->batch.count = cp;
> >       if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
> >               err = -EFAULT;
> >
> > @@ -1751,7 +1754,7 @@ int generic_map_delete_batch(struct bpf_map *map,
> >   }
> >
> >   int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> > -                          const union bpf_attr *attr,
> > +                          union bpf_attr *attr,
> >                            union bpf_attr __user *uattr)
> >   {
> >       void __user *values = u64_to_user_ptr(attr->batch.values);
> > @@ -1774,6 +1777,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> >       if (!max_count)
> >               return 0;
> >
> > +     attr->batch.count = 0;
> >       if (put_user(0, &uattr->batch.count))
> >               return -EFAULT;
> >
> > @@ -1802,6 +1806,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> >               cond_resched();
> >       }
> >
> > +     attr->batch.count = cp;
> >       if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
> >               err = -EFAULT;
> >
> > @@ -1813,9 +1818,8 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> >
> >   #define MAP_LOOKUP_RETRIES 3
> >
> > -int generic_map_lookup_batch(struct bpf_map *map,
> > -                                 const union bpf_attr *attr,
> > -                                 union bpf_attr __user *uattr)
> > +int generic_map_lookup_batch(struct bpf_map *map, union bpf_attr *attr,
> > +                          union bpf_attr __user *uattr)
> >   {
> >       void __user *uobatch = u64_to_user_ptr(attr->batch.out_batch);
> >       void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
> > @@ -1838,6 +1842,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
> >       if (!max_count)
> >               return 0;
> >
> > +     attr->batch.count = 0;
> >       if (put_user(0, &uattr->batch.count))
> >               return -EFAULT;
> >
> > @@ -1903,6 +1908,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
> >       if (err == -EFAULT)
> >               goto free_buf;
> >
> > +     attr->batch.count = cp;
>
> You don't need to change generic_map_lookup_batch() here. It won't trigger
> maybe_wait_bpf_programs().
>
> >       if ((copy_to_user(&uattr->batch.count, &cp, sizeof(cp)) ||
> >                   (cp && copy_to_user(uobatch, prev_key, map->key_size))))
> >               err = -EFAULT;
> > @@ -4926,7 +4932,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
> >               err = fn(__VA_ARGS__);          \
> >       } while (0)
> >
> > -static int bpf_map_do_batch(const union bpf_attr *attr,
> > +static int bpf_map_do_batch(union bpf_attr *attr,
> >                           union bpf_attr __user *uattr,
> >                           int cmd)
> >   {
> > @@ -4966,7 +4972,8 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
> >               BPF_DO_BATCH(map->ops->map_delete_batch, map, attr, uattr);
> >   err_put:
> >       if (has_write) {
> > -             maybe_wait_bpf_programs(map);
> > +             if (attr->batch.count)
> > +                     maybe_wait_bpf_programs(map);
>
> Your code logic sounds correct but I feel you are optimizing for extreme
> corner cases. In really esp production environment, a fault with something
> like copy_to_user() should be extremely rare. So in my opinion, this optimization
> is not needed.

+1
the code is fine as-is.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs()
  2023-12-08 10:23 [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs() Hou Tao
                   ` (6 preceding siblings ...)
  2023-12-08 10:23 ` [PATCH bpf-next 7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs() Hou Tao
@ 2023-12-10  2:40 ` patchwork-bot+netdevbpf
  7 siblings, 0 replies; 18+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-12-10  2:40 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, martin.lau, alexei.starovoitov, andrii, song, haoluo,
	yonghong.song, daniel, kpsingh, sdf, jolsa, john.fastabend,
	houtao1

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Fri,  8 Dec 2023 18:23:48 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Hi,
> 
> The patch set aims to fix the problems found when inspecting the code
> related with maybe_wait_bpf_programs().
> 
> [...]

Here is the summary with links:
  - [bpf-next,1/7] bpf: Remove unnecessary wait from bpf_map_copy_value()
    https://git.kernel.org/bpf/bpf-next/c/c26f2a890139
  - [bpf-next,2/7] bpf: Call maybe_wait_bpf_programs() only once for generic_map_update_batch()
    https://git.kernel.org/bpf/bpf-next/c/37ba5b59d6ad
  - [bpf-next,3/7] bpf: Add missed maybe_wait_bpf_programs() for htab of maps
    https://git.kernel.org/bpf/bpf-next/c/012772581d04
  - [bpf-next,4/7] bpf: Only call maybe_wait_bpf_programs() when map operation succeeds
    https://git.kernel.org/bpf/bpf-next/c/67ad2c73ff29
  - [bpf-next,5/7] bpf: Set uattr->batch.count as zero before batched update or deletion
    https://git.kernel.org/bpf/bpf-next/c/06e5c999f102
  - [bpf-next,6/7] bpf: Only call maybe_wait_bpf_programs() when at least one map operation succeeds
    (no matching commit)
  - [bpf-next,7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs()
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 6/7] bpf: Only call maybe_wait_bpf_programs() when at least one map operation succeeds
  2023-12-10  2:11     ` Alexei Starovoitov
@ 2023-12-11  1:11       ` Hou Tao
  0 siblings, 0 replies; 18+ messages in thread
From: Hou Tao @ 2023-12-11  1:11 UTC (permalink / raw)
  To: Alexei Starovoitov, Yonghong Song
  Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Hou Tao

Hi

On 12/10/2023 10:11 AM, Alexei Starovoitov wrote:
> On Fri, Dec 8, 2023 at 2:55 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> On 12/8/23 2:23 AM, Hou Tao wrote:
>>> From: Hou Tao <houtao1@huawei.com>
>>>
>>> There is no need to call maybe_wait_bpf_programs() if all operations in
>>> batched update, deletion, or lookup_and_deletion fail. So only call
>>> maybe_wait_bpf_programs() if at least one map operation succeeds.
>>>

SNIP
>>> +     attr->batch.count = 0;
>>>       if (put_user(0, &uattr->batch.count))
>>>               return -EFAULT;
>>>
>>> @@ -1903,6 +1908,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
>>>       if (err == -EFAULT)
>>>               goto free_buf;
>>>
>>> +     attr->batch.count = cp;
>> You don't need to change generic_map_lookup_batch() here. It won't trigger
>> maybe_wait_bpf_programs().
>>
>>>       if ((copy_to_user(&uattr->batch.count, &cp, sizeof(cp)) ||
>>>                   (cp && copy_to_user(uobatch, prev_key, map->key_size))))
>>>               err = -EFAULT;
>>> @@ -4926,7 +4932,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>>>               err = fn(__VA_ARGS__);          \
>>>       } while (0)
>>>
>>> -static int bpf_map_do_batch(const union bpf_attr *attr,
>>> +static int bpf_map_do_batch(union bpf_attr *attr,
>>>                           union bpf_attr __user *uattr,
>>>                           int cmd)
>>>   {
>>> @@ -4966,7 +4972,8 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
>>>               BPF_DO_BATCH(map->ops->map_delete_batch, map, attr, uattr);
>>>   err_put:
>>>       if (has_write) {
>>> -             maybe_wait_bpf_programs(map);
>>> +             if (attr->batch.count)
>>> +                     maybe_wait_bpf_programs(map);
>> Your code logic sounds correct, but I feel you are optimizing for extreme
>> corner cases. Especially in a real production environment, a fault in
>> something like copy_to_user() should be extremely rare. So in my opinion,
>> this optimization is not needed.
> +1
> the code is fine as-is.

Thanks for the suggestions. It is indeed an over-optimization for a rare
scenario. Will drop it.
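
As an aside, here is a hedged userspace-side sketch (not part of this
series; the map fd, keys and values are assumed to be set up elsewhere,
and the update_batch() helper name is hypothetical) of how batch.count
surfaces to callers through the libbpf batch API, which is why the
series cares about keeping it accurate even on failure:

#include <bpf/bpf.h>
#include <stdio.h>

/* Sketch only: map_fd, keys and values are assumed to be prepared by
 * the caller; count is an in/out parameter of the batch operation. */
static int update_batch(int map_fd, void *keys, void *values, __u32 n)
{
	__u32 count = n;
	int err = bpf_map_update_batch(map_fd, keys, values, &count, NULL);

	/* Even when the syscall fails part-way, count reports how many
	 * elements were actually processed (uattr->batch.count). */
	if (err)
		fprintf(stderr, "updated %u of %u elements, err %d\n",
			count, n, err);
	return err;
}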


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs()
  2023-12-08 22:30   ` Andrii Nakryiko
@ 2023-12-11  1:17     ` Hou Tao
  0 siblings, 0 replies; 18+ messages in thread
From: Hou Tao @ 2023-12-11  1:17 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1

Hi,

On 12/9/2023 6:30 AM, Andrii Nakryiko wrote:
> On Fri, Dec 8, 2023 at 2:23 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Since commit 638e4b825d52 ("bpf: Allows per-cpu maps and map-in-map in
>> sleepable programs"), sleepable BPF programs can use map-in-map, but
>> maybe_wait_bpf_programs() doesn't take that into account.
>>
>> So check the value of sleepable_refcnt in maybe_wait_bpf_programs():
>> if it is not zero, use synchronize_rcu_mult() to wait for both
>> sleepable and non-sleepable BPF programs. However, a bpf syscall made
>> from a syscall program is special: that bpf syscall is invoked with
>> rcu_read_lock_trace() held, so there would be a dead-lock if
>> synchronize_rcu_mult() were used to wait for the exit of sleepable BPF
>> programs. Therefore, just skip waiting for sleepable BPF programs when
>> the bpf syscall comes from a syscall program.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  kernel/bpf/syscall.c | 28 +++++++++++++++++++---------
>>  1 file changed, 19 insertions(+), 9 deletions(-)
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index d2641e51a1a7..6b9d7990d95f 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -35,6 +35,7 @@
>>  #include <linux/rcupdate_trace.h>
>>  #include <linux/memcontrol.h>
>>  #include <linux/trace_events.h>
>> +#include <linux/rcupdate_wait.h>
>>
>>  #include <net/netfilter/nf_bpf_link.h>
>>  #include <net/netkit.h>
>> @@ -140,15 +141,24 @@ static u32 bpf_map_value_size(const struct bpf_map *map)
>>                 return  map->value_size;
>>  }
>>
>> -static void maybe_wait_bpf_programs(struct bpf_map *map)
>> +static void maybe_wait_bpf_programs(struct bpf_map *map, bool rcu_trace_lock_held)
>>  {
>> -       /* Wait for any running BPF programs to complete so that
>> -        * userspace, when we return to it, knows that all programs
>> -        * that could be running use the new map value.
>> +       /* Wait for any running non-sleepable and sleepable BPF programs to
>> +        * complete, so that userspace, when we return to it, knows that all
>> +        * programs that could be running use the new map value. However, a
>> +        * syscall program can also use the bpf syscall to update or delete an
>> +        * inner map in an outer map, and it holds rcu_read_lock_trace() while
>> +        * doing that bpf syscall. Using synchronize_rcu_mult(call_rcu_tasks_trace)
>> +        * to wait for the exit of running sleepable BPF programs would then
>> +        * dead-lock, so skip the waiting for syscall programs.
>>          */
>>         if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
>> -           map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
>> -               synchronize_rcu();
>> +           map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
>> +               if (atomic64_read(&map->sleepable_refcnt) && !rcu_trace_lock_held)
> why is this correct and non-racy without holding used_maps_mutex under
> which this sleepable_refcnt is incremented?

You mean bpf_prog_bind_map(), right? A program that merely binds the map
through it doesn't actually access the map, so even if the atomic64_inc()
is missed by the lockless read, it is still OK. But if that implementation
changes afterwards, there will be a problem.
>
>> +                       synchronize_rcu_mult(call_rcu, call_rcu_tasks_trace);
>> +               else
>> +                       synchronize_rcu();
>> +       }
>>  }
>>
>>  static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
>> @@ -1561,7 +1571,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
>>
>>         err = bpf_map_update_value(map, f.file, key, value, attr->flags);
>>         if (!err)
>> -               maybe_wait_bpf_programs(map);
>> +               maybe_wait_bpf_programs(map, bpfptr_is_kernel(uattr));
>>
>>         kvfree(value);
>>  free_key:
>> @@ -1618,7 +1628,7 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
>>         rcu_read_unlock();
>>         bpf_enable_instrumentation();
>>         if (!err)
>> -               maybe_wait_bpf_programs(map);
>> +               maybe_wait_bpf_programs(map, bpfptr_is_kernel(uattr));
>>  out:
>>         kvfree(key);
>>  err_put:
>> @@ -4973,7 +4983,7 @@ static int bpf_map_do_batch(union bpf_attr *attr,
>>  err_put:
>>         if (has_write) {
>>                 if (attr->batch.count)
>> -                       maybe_wait_bpf_programs(map);
>> +                       maybe_wait_bpf_programs(map, false);
>>                 bpf_map_write_active_dec(map);
>>         }
>>         fdput(f);
>> --
>> 2.29.2
>>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs()
  2023-12-10  2:11   ` Alexei Starovoitov
@ 2023-12-11  2:07     ` Hou Tao
  2023-12-11  2:56       ` Alexei Starovoitov
  0 siblings, 1 reply; 18+ messages in thread
From: Hou Tao @ 2023-12-11  2:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
	Jiri Olsa, John Fastabend, Hou Tao

Hi Alexei,

On 12/10/2023 10:11 AM, Alexei Starovoitov wrote:
> On Fri, Dec 8, 2023 at 2:22 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> +       /* Wait for any running non-sleepable and sleepable BPF programs to
>> +        * complete, so that userspace, when we return to it, knows that all
>> +        * programs that could be running use the new map value.
> which could be forever... and the user space task doing simple map update
> will never know why it got stuck in syscall waiting... forever...
> synchronous waiting for tasks_trace is never an option.

Could you please elaborate on why there would be a dead-lock problem?
In my naive understanding, synchronize_rcu_tasks_trace() only waits for
the end of rcu_read_lock_trace()/rcu_read_unlock_trace() sections, so if
no rcu_read_lock_trace is held, there will be no dead-lock.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs()
  2023-12-11  2:07     ` Hou Tao
@ 2023-12-11  2:56       ` Alexei Starovoitov
  2023-12-11  3:03         ` Hou Tao
  0 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2023-12-11  2:56 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
	Jiri Olsa, John Fastabend, Hou Tao

On Sun, Dec 10, 2023 at 6:07 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi Alexei,
>
> On 12/10/2023 10:11 AM, Alexei Starovoitov wrote:
> > On Fri, Dec 8, 2023 at 2:22 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >> +       /* Wait for any running non-sleepable and sleepable BPF programs to
> >> +        * complete, so that userspace, when we return to it, knows that all
> >> +        * programs that could be running use the new map value.
> > which could be forever... and the user space task doing simple map update
> > will never know why it got stuck in syscall waiting... forever...
> > synchronous waiting for tasks_trace is never an option.
>
> Could you please elaborate on why there would be a dead-lock problem?
> In my naive understanding, synchronize_rcu_tasks_trace() only waits for
> the end of rcu_read_lock_trace()/rcu_read_unlock_trace() sections, so if
> no rcu_read_lock_trace is held, there will be no dead-lock.

I didn't say it's a dead-lock. An rcu_read_lock_trace() section can last
for a very long time, and user space shouldn't be exposed to such delays.
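
For illustration, a minimal sketch (not from this series; the uprobe
attach target and its string argument are assumptions left to the
loader) of a sleepable program whose tasks-trace read-side section
stays open while a helper sleeps, which is what a synchronous
synchronize_rcu_tasks_trace() in the map-update path would end up
waiting for:

/* sleepable_probe.bpf.c -- sketch only */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* "uprobe.s" marks the program sleepable; it runs under
 * rcu_read_lock_trace() rather than a plain RCU read-side lock. */
SEC("uprobe.s")
int BPF_KPROBE(slow_probe, const char *user_str)
{
	char buf[64];

	/* Sleepable helper: may fault user pages in and sleep, keeping
	 * the tasks-trace read-side section open in the meantime. */
	bpf_copy_from_user(buf, sizeof(buf), user_str);
	return 0;
}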

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs()
  2023-12-11  2:56       ` Alexei Starovoitov
@ 2023-12-11  3:03         ` Hou Tao
  0 siblings, 0 replies; 18+ messages in thread
From: Hou Tao @ 2023-12-11  3:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, Stanislav Fomichev,
	Jiri Olsa, John Fastabend, Hou Tao

Hi,

On 12/11/2023 10:56 AM, Alexei Starovoitov wrote:
> On Sun, Dec 10, 2023 at 6:07 PM Hou Tao <houtao@huaweicloud.com> wrote:
>> Hi Alexei,
>>
>> On 12/10/2023 10:11 AM, Alexei Starovoitov wrote:
>>> On Fri, Dec 8, 2023 at 2:22 AM Hou Tao <houtao@huaweicloud.com> wrote:
>>>> +       /* Wait for any running non-sleepable and sleepable BPF programs to
>>>> +        * complete, so that userspace, when we return to it, knows that all
>>>> +        * programs that could be running use the new map value.
>>> which could be forever... and the user space task doing simple map update
>>> will never know why it got stuck in syscall waiting... forever...
>>> synchronous waiting for tasks_trace is never an option.
>> Could you please elaborate on why there would be a dead-lock problem?
>> In my naive understanding, synchronize_rcu_tasks_trace() only waits for
>> the end of rcu_read_lock_trace()/rcu_read_unlock_trace() sections, so if
>> no rcu_read_lock_trace is held, there will be no dead-lock.
> I didn't say it's a dead-lock. An rcu_read_lock_trace() section can last
> for a very long time, and user space shouldn't be exposed to such delays.

I see. Thanks for the explanation. Will update the comments in
maybe_wait_bpf_programs() in a new patch.
> .


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-12-11  3:03 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-08 10:23 [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs() Hou Tao
2023-12-08 10:23 ` [PATCH bpf-next 1/7] bpf: Remove unnecessary wait from bpf_map_copy_value() Hou Tao
2023-12-08 10:23 ` [PATCH bpf-next 2/7] bpf: Call maybe_wait_bpf_programs() only once for generic_map_update_batch() Hou Tao
2023-12-08 10:23 ` [PATCH bpf-next 3/7] bpf: Add missed maybe_wait_bpf_programs() for htab of maps Hou Tao
2023-12-08 10:23 ` [PATCH bpf-next 4/7] bpf: Only call maybe_wait_bpf_programs() when map operation succeeds Hou Tao
2023-12-08 10:23 ` [PATCH bpf-next 5/7] bpf: Set uattr->batch.count as zero before batched update or deletion Hou Tao
2023-12-08 10:23 ` [PATCH bpf-next 6/7] bpf: Only call maybe_wait_bpf_programs() when at least one map operation succeeds Hou Tao
2023-12-08 22:55   ` Yonghong Song
2023-12-10  2:11     ` Alexei Starovoitov
2023-12-11  1:11       ` Hou Tao
2023-12-08 10:23 ` [PATCH bpf-next 7/7] bpf: Wait for sleepable BPF program in maybe_wait_bpf_programs() Hou Tao
2023-12-08 22:30   ` Andrii Nakryiko
2023-12-11  1:17     ` Hou Tao
2023-12-10  2:11   ` Alexei Starovoitov
2023-12-11  2:07     ` Hou Tao
2023-12-11  2:56       ` Alexei Starovoitov
2023-12-11  3:03         ` Hou Tao
2023-12-10  2:40 ` [PATCH bpf-next 0/7] bpf: Fixes for maybe_wait_bpf_programs() patchwork-bot+netdevbpf
