All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command
@ 2018-07-29 15:51 Alexei Starovoitov
  2018-07-29 20:46 ` Daniel Colascione
  2018-07-31  2:01 ` [PATCH v2] Add BPF_SYNCHRONIZE_MAPS " Joel Fernandes
  0 siblings, 2 replies; 31+ messages in thread
From: Alexei Starovoitov @ 2018-07-29 15:51 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Joel Fernandes, LKML, Tim Murray, Network Development,
	Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers,
	Alexei Starovoitov, Daniel Borkmann

On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> wrote:
> BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF
> map made by a BPF program that is running at the time the
> BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is
> to provide a means for userspace to replace a BPF map with another,
> newer version, then ensure that no component is still using the "old"
> map before manipulating the "old" map in some way.
>
> Signed-off-by: Daniel Colascione <dancol@google.com>
> ---
>  include/uapi/linux/bpf.h |  9 +++++++++
>  kernel/bpf/syscall.c     | 13 +++++++++++++
>  2 files changed, 22 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b7db3261c62d..5b27e9117d3e 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key {
>         __u8    data[0];        /* Arbitrary size */
>  };
>
> +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a
> + * BPF map made by a BPF program that is running at the time the
> + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command

that doesn't sound right to me.
such command won't wait for the release of the references.
in case of map-in-map the program does not hold
the references to inner map (only to outer map).

> + * is to provide a means for userspace to replace a BPF map with
> + * another, newer version, then ensure that no component is still
> + * using the "old" map before manipulating the "old" map in some way.
> + */

that's correct, but the whole paragraph still reads too
'generic' which will lead to confusion,
whereas the use case is map-in-map only.
I think bpf_sync_inner_map or
bpf_sync_map_in_map would be better
choices for the name.

btw i don't have reliable internet access at the moment,
so pls excuse the delays.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command
  2018-07-29 15:51 [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command Alexei Starovoitov
@ 2018-07-29 20:46 ` Daniel Colascione
  2018-07-29 20:58   ` [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES " Daniel Colascione
  2018-07-31  2:01 ` [PATCH v2] Add BPF_SYNCHRONIZE_MAPS " Joel Fernandes
  1 sibling, 1 reply; 31+ messages in thread
From: Daniel Colascione @ 2018-07-29 20:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Joel Fernandes, LKML, Tim Murray, Network Development,
	Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers,
	Alexei Starovoitov, Daniel Borkmann

[-- Attachment #1: Type: text/plain, Size: 2236 bytes --]

On Sun, Jul 29, 2018 at 8:51 AM, Alexei Starovoitov <
alexei.starovoitov@gmail.com> wrote:

> On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com>
> wrote:
> > BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF
> > map made by a BPF program that is running at the time the
> > BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is
> > to provide a means for userspace to replace a BPF map with another,
> > newer version, then ensure that no component is still using the "old"
> > map before manipulating the "old" map in some way.
> >
> > Signed-off-by: Daniel Colascione <dancol@google.com>
> > ---
> >  include/uapi/linux/bpf.h |  9 +++++++++
> >  kernel/bpf/syscall.c     | 13 +++++++++++++
> >  2 files changed, 22 insertions(+)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index b7db3261c62d..5b27e9117d3e 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key {
> >         __u8    data[0];        /* Arbitrary size */
> >  };
> >
> > +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a
> > + * BPF map made by a BPF program that is running at the time the
> > + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command
>
> that doesn't sound right to me.
> such command won't wait for the release of the references.
> in case of map-in-map the program does not hold
> the references to inner map (only to outer map).
>

Well, today that's the guarantee it provides. :-) I figured it'd be simpler
to explain the guarantee this way.

How about "waits for the release of any reference to a map obtained from
another map"?


>
> > + * is to provide a means for userspace to replace a BPF map with
> > + * another, newer version, then ensure that no component is still
> > + * using the "old" map before manipulating the "old" map in some way.
> > + */
>
> that's correct, but the whole paragraph still reads too
> 'generic' which will lead to confusion,
> whereas the use case is map-in-map only.
> I think bpf_sync_inner_map or
> bpf_sync_map_in_map would be better
> choices for the name.


I worry that a name like that would be too specific.

[-- Attachment #2: Type: text/html, Size: 3163 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-07-29 20:46 ` Daniel Colascione
@ 2018-07-29 20:58   ` Daniel Colascione
  2018-07-30 10:04       ` Daniel Borkmann
       [not found]     ` <CAKOZues6SE_c=ix7ap6QaJHqd1TmYpWWMJiu3=TtuqgKuqOUCA@mail.gmail.com>
  0 siblings, 2 replies; 31+ messages in thread
From: Daniel Colascione @ 2018-07-29 20:58 UTC (permalink / raw)
  To: joelaf
  Cc: linux-kernel, timmurray, netdev, Alexei Starovoitov,
	Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers,
	Alexei Starovoitov, Daniel Borkmann, Daniel Colascione

BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all
references to maps active at the instant the
BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is
issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the
expiration of map references obtained by BPF programs from other maps.

The purpose of this command is to provide a means for userspace to
replace a BPF map with another, newer version, then ensure that no
component is still using the "old" map before manipulating the "old"
map in some way.

Signed-off-by: Daniel Colascione <dancol@google.com>
---
 include/uapi/linux/bpf.h | 14 ++++++++++++++
 kernel/bpf/syscall.c     | 13 +++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b7db3261c62d..ca3cfca76edc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -75,6 +75,19 @@ struct bpf_lpm_trie_key {
 	__u8	data[0];	/* Arbitrary size */
 };
 
+/* BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all
+ * references to maps active at the instant the
+ * BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is
+ * issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the
+ * expiration of map references obtained by BPF programs from
+ * other maps.
+ *
+ * The purpose of this command is to provide a means for userspace to
+ * replace a BPF map with another, newer version, then ensure that no
+ * component is still using the "old" map before manipulating the
+ * "old" map in some way.
+ */
+
 /* BPF syscall commands, see bpf(2) man-page for details. */
 enum bpf_cmd {
 	BPF_MAP_CREATE,
@@ -98,6 +111,7 @@ enum bpf_cmd {
 	BPF_BTF_LOAD,
 	BPF_BTF_GET_FD_BY_ID,
 	BPF_TASK_FD_QUERY,
+	BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES,
 };
 
 enum bpf_map_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a31a1ba0f8ea..bc9a0713f47d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2274,6 +2274,19 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (cmd == BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES) {
+		if (uattr != NULL || size != 0)
+			return -EINVAL;
+		err = security_bpf(cmd, NULL, 0);
+		if (err < 0)
+			return err;
+		/* BPF programs always enter a critical section while
+		 * they have a map reference outstanding.
+		 */
+		synchronize_rcu();
+		return 0;
+	}
+
 	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
 	if (err)
 		return err;
-- 
2.18.0.345.g5c9ce644c3-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-07-29 20:58   ` [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES " Daniel Colascione
@ 2018-07-30 10:04       ` Daniel Borkmann
       [not found]     ` <CAKOZues6SE_c=ix7ap6QaJHqd1TmYpWWMJiu3=TtuqgKuqOUCA@mail.gmail.com>
  1 sibling, 0 replies; 31+ messages in thread
From: Daniel Borkmann @ 2018-07-30 10:04 UTC (permalink / raw)
  To: Daniel Colascione, joelaf
  Cc: linux-kernel, timmurray, netdev, Alexei Starovoitov,
	Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers,
	Alexei Starovoitov

On 07/29/2018 10:58 PM, Daniel Colascione wrote:
> BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all
> references to maps active at the instant the
> BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is
> issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the
> expiration of map references obtained by BPF programs from other maps.
> 
> The purpose of this command is to provide a means for userspace to
> replace a BPF map with another, newer version, then ensure that no
> component is still using the "old" map before manipulating the "old"
> map in some way.
> 
> Signed-off-by: Daniel Colascione <dancol@google.com>
> ---
>  include/uapi/linux/bpf.h | 14 ++++++++++++++
>  kernel/bpf/syscall.c     | 13 +++++++++++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b7db3261c62d..ca3cfca76edc 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -75,6 +75,19 @@ struct bpf_lpm_trie_key {
>  	__u8	data[0];	/* Arbitrary size */
>  };
>  
> +/* BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all
> + * references to maps active at the instant the
> + * BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is
> + * issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the
> + * expiration of map references obtained by BPF programs from
> + * other maps.
> + *
> + * The purpose of this command is to provide a means for userspace to
> + * replace a BPF map with another, newer version, then ensure that no
> + * component is still using the "old" map before manipulating the
> + * "old" map in some way.
> + */
> +
>  /* BPF syscall commands, see bpf(2) man-page for details. */
>  enum bpf_cmd {
>  	BPF_MAP_CREATE,
> @@ -98,6 +111,7 @@ enum bpf_cmd {
>  	BPF_BTF_LOAD,
>  	BPF_BTF_GET_FD_BY_ID,
>  	BPF_TASK_FD_QUERY,
> +	BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES,
>  };
>  
>  enum bpf_map_type {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index a31a1ba0f8ea..bc9a0713f47d 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2274,6 +2274,19 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>  	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> +	if (cmd == BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES) {
> +		if (uattr != NULL || size != 0)
> +			return -EINVAL;
> +		err = security_bpf(cmd, NULL, 0);
> +		if (err < 0)
> +			return err;
> +		/* BPF programs always enter a critical section while
> +		 * they have a map reference outstanding.
> +		 */
> +		synchronize_rcu();
> +		return 0;
> +	}
> +
>  	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
>  	if (err)
>  		return err;
> 

Hmm, I don't think such UAPI as above is future-proof. In case we would want
a similar mechanism in future for other maps, we would need a whole new bpf
command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though
the underlying map may not even be a map-to-map. Additionally, we don't have
any map object at hand in the above, so we couldn't make any finer grained
decisions either. Something like below would be more suitable and leaves room
for extending this further in future.

Thanks,
Daniel

From 8dfea71b73fa0d402633b76f78c106e82a7a5007 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Mon, 30 Jul 2018 11:47:37 +0200
Subject: [PATCH] sync map refs

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h      |  1 +
 include/uapi/linux/bpf.h |  1 +
 kernel/bpf/arraymap.c    |  1 +
 kernel/bpf/hashtab.c     |  1 +
 kernel/bpf/map_in_map.c  |  6 ++++++
 kernel/bpf/map_in_map.h  |  1 +
 kernel/bpf/syscall.c     | 24 ++++++++++++++++++++++++
 7 files changed, 35 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b5ad95..7b51f86 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -34,6 +34,7 @@ struct bpf_map_ops {
 	void (*map_free)(struct bpf_map *map);
 	int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
 	void (*map_release_uref)(struct bpf_map *map);
+	int (*map_sync_refs)(struct bpf_map *map);

 	/* funcs callable from userspace and from eBPF programs */
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8701139..e6ec1de 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -98,6 +98,7 @@ enum bpf_cmd {
 	BPF_BTF_LOAD,
 	BPF_BTF_GET_FD_BY_ID,
 	BPF_TASK_FD_QUERY,
+	BPF_MAP_SYNC_REFS,
 };

 enum bpf_map_type {
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 544e58f..ddaf42a 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -748,5 +748,6 @@ const struct bpf_map_ops array_of_maps_map_ops = {
 	.map_fd_get_ptr = bpf_map_fd_get_ptr,
 	.map_fd_put_ptr = bpf_map_fd_put_ptr,
 	.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
+	.map_sync_refs = bpf_map_sync_refs,
 	.map_gen_lookup = array_of_map_gen_lookup,
 };
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 513d9df..05380ea 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1407,5 +1407,6 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
 	.map_fd_get_ptr = bpf_map_fd_get_ptr,
 	.map_fd_put_ptr = bpf_map_fd_put_ptr,
 	.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
+	.map_sync_refs = bpf_map_sync_refs,
 	.map_gen_lookup = htab_of_map_gen_lookup,
 };
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 1da5746..698a50f 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -96,6 +96,12 @@ void bpf_map_fd_put_ptr(void *ptr)
 	bpf_map_put(ptr);
 }

+int bpf_map_sync_refs(struct bpf_map *map)
+{
+	synchronize_rcu();
+	return 0;
+}
+
 u32 bpf_map_fd_sys_lookup_elem(void *ptr)
 {
 	return ((struct bpf_map *)ptr)->id;
diff --git a/kernel/bpf/map_in_map.h b/kernel/bpf/map_in_map.h
index 6183db9..ac02456 100644
--- a/kernel/bpf/map_in_map.h
+++ b/kernel/bpf/map_in_map.h
@@ -19,6 +19,7 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
 void *bpf_map_fd_get_ptr(struct bpf_map *map, struct file *map_file,
 			 int ufd);
 void bpf_map_fd_put_ptr(void *ptr);
+int bpf_map_sync_refs(struct bpf_map *map);
 u32 bpf_map_fd_sys_lookup_elem(void *ptr);

 #endif
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a31a1ba..b1286cc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -896,6 +896,27 @@ static int map_get_next_key(union bpf_attr *attr)
 	return err;
 }

+#define BPF_MAP_SYNC_REFS_LAST_FIELD map_fd
+
+static int map_sync_refs(union bpf_attr *attr)
+{
+	int err = -ENOTSUPP, ufd = attr->map_fd;
+	struct bpf_map *map;
+	struct fd f;
+
+	if (CHECK_ATTR(BPF_MAP_SYNC_REFS))
+		return -EINVAL;
+
+	f = fdget(ufd);
+	map = __bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+	if (map->ops->map_sync_refs)
+		err = map->ops->map_sync_refs(map);
+	fdput(f);
+	return err;
+}
+
 static const struct bpf_prog_ops * const bpf_prog_types[] = {
 #define BPF_PROG_TYPE(_id, _name) \
 	[_id] = & _name ## _prog_ops,
@@ -2303,6 +2324,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_MAP_GET_NEXT_KEY:
 		err = map_get_next_key(&attr);
 		break;
+	case BPF_MAP_SYNC_REFS:
+		err = map_sync_refs(&attr);
+		break;
 	case BPF_PROG_LOAD:
 		err = bpf_prog_load(&attr);
 		break;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
@ 2018-07-30 10:04       ` Daniel Borkmann
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Borkmann @ 2018-07-30 10:04 UTC (permalink / raw)
  To: Daniel Colascione, joelaf
  Cc: linux-kernel, timmurray, netdev, Alexei Starovoitov,
	Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers,
	Alexei Starovoitov

On 07/29/2018 10:58 PM, Daniel Colascione wrote:
> BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all
> references to maps active at the instant the
> BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is
> issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the
> expiration of map references obtained by BPF programs from other maps.
> 
> The purpose of this command is to provide a means for userspace to
> replace a BPF map with another, newer version, then ensure that no
> component is still using the "old" map before manipulating the "old"
> map in some way.
> 
> Signed-off-by: Daniel Colascione <dancol@google.com>
> ---
>  include/uapi/linux/bpf.h | 14 ++++++++++++++
>  kernel/bpf/syscall.c     | 13 +++++++++++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b7db3261c62d..ca3cfca76edc 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -75,6 +75,19 @@ struct bpf_lpm_trie_key {
>  	__u8	data[0];	/* Arbitrary size */
>  };
>  
> +/* BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all
> + * references to maps active at the instant the
> + * BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is
> + * issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the
> + * expiration of map references obtained by BPF programs from
> + * other maps.
> + *
> + * The purpose of this command is to provide a means for userspace to
> + * replace a BPF map with another, newer version, then ensure that no
> + * component is still using the "old" map before manipulating the
> + * "old" map in some way.
> + */
> +
>  /* BPF syscall commands, see bpf(2) man-page for details. */
>  enum bpf_cmd {
>  	BPF_MAP_CREATE,
> @@ -98,6 +111,7 @@ enum bpf_cmd {
>  	BPF_BTF_LOAD,
>  	BPF_BTF_GET_FD_BY_ID,
>  	BPF_TASK_FD_QUERY,
> +	BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES,
>  };
>  
>  enum bpf_map_type {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index a31a1ba0f8ea..bc9a0713f47d 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2274,6 +2274,19 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>  	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> +	if (cmd == BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES) {
> +		if (uattr != NULL || size != 0)
> +			return -EINVAL;
> +		err = security_bpf(cmd, NULL, 0);
> +		if (err < 0)
> +			return err;
> +		/* BPF programs always enter a critical section while
> +		 * they have a map reference outstanding.
> +		 */
> +		synchronize_rcu();
> +		return 0;
> +	}
> +
>  	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
>  	if (err)
>  		return err;
> 

Hmm, I don't think such UAPI as above is future-proof. In case we would want
a similar mechanism in future for other maps, we would need a whole new bpf
command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though
the underlying map may not even be a map-to-map. Additionally, we don't have
any map object at hand in the above, so we couldn't make any finer grained
decisions either. Something like below would be more suitable and leaves room
for extending this further in future.

Thanks,
Daniel

>From 8dfea71b73fa0d402633b76f78c106e82a7a5007 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Mon, 30 Jul 2018 11:47:37 +0200
Subject: [PATCH] sync map refs

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h      |  1 +
 include/uapi/linux/bpf.h |  1 +
 kernel/bpf/arraymap.c    |  1 +
 kernel/bpf/hashtab.c     |  1 +
 kernel/bpf/map_in_map.c  |  6 ++++++
 kernel/bpf/map_in_map.h  |  1 +
 kernel/bpf/syscall.c     | 24 ++++++++++++++++++++++++
 7 files changed, 35 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b5ad95..7b51f86 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -34,6 +34,7 @@ struct bpf_map_ops {
 	void (*map_free)(struct bpf_map *map);
 	int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
 	void (*map_release_uref)(struct bpf_map *map);
+	int (*map_sync_refs)(struct bpf_map *map);

 	/* funcs callable from userspace and from eBPF programs */
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8701139..e6ec1de 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -98,6 +98,7 @@ enum bpf_cmd {
 	BPF_BTF_LOAD,
 	BPF_BTF_GET_FD_BY_ID,
 	BPF_TASK_FD_QUERY,
+	BPF_MAP_SYNC_REFS,
 };

 enum bpf_map_type {
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 544e58f..ddaf42a 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -748,5 +748,6 @@ const struct bpf_map_ops array_of_maps_map_ops = {
 	.map_fd_get_ptr = bpf_map_fd_get_ptr,
 	.map_fd_put_ptr = bpf_map_fd_put_ptr,
 	.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
+	.map_sync_refs = bpf_map_sync_refs,
 	.map_gen_lookup = array_of_map_gen_lookup,
 };
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 513d9df..05380ea 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1407,5 +1407,6 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
 	.map_fd_get_ptr = bpf_map_fd_get_ptr,
 	.map_fd_put_ptr = bpf_map_fd_put_ptr,
 	.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
+	.map_sync_refs = bpf_map_sync_refs,
 	.map_gen_lookup = htab_of_map_gen_lookup,
 };
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 1da5746..698a50f 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -96,6 +96,12 @@ void bpf_map_fd_put_ptr(void *ptr)
 	bpf_map_put(ptr);
 }

+int bpf_map_sync_refs(struct bpf_map *map)
+{
+	synchronize_rcu();
+	return 0;
+}
+
 u32 bpf_map_fd_sys_lookup_elem(void *ptr)
 {
 	return ((struct bpf_map *)ptr)->id;
diff --git a/kernel/bpf/map_in_map.h b/kernel/bpf/map_in_map.h
index 6183db9..ac02456 100644
--- a/kernel/bpf/map_in_map.h
+++ b/kernel/bpf/map_in_map.h
@@ -19,6 +19,7 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
 void *bpf_map_fd_get_ptr(struct bpf_map *map, struct file *map_file,
 			 int ufd);
 void bpf_map_fd_put_ptr(void *ptr);
+int bpf_map_sync_refs(struct bpf_map *map);
 u32 bpf_map_fd_sys_lookup_elem(void *ptr);

 #endif
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a31a1ba..b1286cc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -896,6 +896,27 @@ static int map_get_next_key(union bpf_attr *attr)
 	return err;
 }

+#define BPF_MAP_SYNC_REFS_LAST_FIELD map_fd
+
+static int map_sync_refs(union bpf_attr *attr)
+{
+	int err = -ENOTSUPP, ufd = attr->map_fd;
+	struct bpf_map *map;
+	struct fd f;
+
+	if (CHECK_ATTR(BPF_MAP_SYNC_REFS))
+		return -EINVAL;
+
+	f = fdget(ufd);
+	map = __bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+	if (map->ops->map_sync_refs)
+		err = map->ops->map_sync_refs(map);
+	fdput(f);
+	return err;
+}
+
 static const struct bpf_prog_ops * const bpf_prog_types[] = {
 #define BPF_PROG_TYPE(_id, _name) \
 	[_id] = & _name ## _prog_ops,
@@ -2303,6 +2324,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_MAP_GET_NEXT_KEY:
 		err = map_get_next_key(&attr);
 		break;
+	case BPF_MAP_SYNC_REFS:
+		err = map_sync_refs(&attr);
+		break;
 	case BPF_PROG_LOAD:
 		err = bpf_prog_load(&attr);
 		break;
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-07-30 10:04       ` Daniel Borkmann
  (?)
@ 2018-07-30 10:25       ` Daniel Colascione
  2018-07-31  0:26         ` Jakub Kicinski
  -1 siblings, 1 reply; 31+ messages in thread
From: Daniel Colascione @ 2018-07-30 10:25 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Joel Fernandes, linux-kernel, Tim Murray, netdev,
	Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov

On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> Hmm, I don't think such UAPI as above is future-proof. In case we would want
> a similar mechanism in future for other maps, we would need a whole new bpf
> command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though
> the underlying map may not even be a map-to-map. Additionally, we don't have
> any map object at hand in the above, so we couldn't make any finer grained
> decisions either. Something like below would be more suitable and leaves room
> for extending this further in future.

YAGNI. Your proposed mechanism doesn't add anything under the current
implementation. It's also not clear how a map-specific synchronization
command is supposed to work in cases where we swap multiple map
references. Do we synchronize_rcu multiple times? Why would we impose
that inefficiency just for the sake of some non-specific future
extensibility? Add some kind of batching layer? The current approach
works for the anticipated use cases.

While my preference is not to talk about map-to-maps at all in the
user API and instead spec the thing as talking about map references in
general, I'd rather have something that talks about
references-to-maps-acquired-from-maps than this interface.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-07-30 10:25       ` Daniel Colascione
@ 2018-07-31  0:26         ` Jakub Kicinski
  2018-07-31  0:33           ` Daniel Colascione
  0 siblings, 1 reply; 31+ messages in thread
From: Jakub Kicinski @ 2018-07-31  0:26 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Daniel Borkmann, Joel Fernandes, linux-kernel, Tim Murray,
	netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov

On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote:
> On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> > Hmm, I don't think such UAPI as above is future-proof. In case we would want
> > a similar mechanism in future for other maps, we would need a whole new bpf
> > command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though
> > the underlying map may not even be a map-to-map. Additionally, we don't have
> > any map object at hand in the above, so we couldn't make any finer grained
> > decisions either. Something like below would be more suitable and leaves room
> > for extending this further in future.  
> 
> YAGNI.  Your proposed mechanism doesn't add anything under the current
> implementation. 

FWIW in case of HW offload targeting a particular map may allow users
to avoid a potentially slow sync with all the devices on the system.

> It's also not clear how a map-specific synchronization
> command is supposed to work in cases where we swap multiple map
> references. Do we synchronize_rcu multiple times? Why would we impose
> that inefficiency just for the sake of some non-specific future
> extensibility? Add some kind of batching layer? The current approach
> works for the anticipated use cases.
> 
> While my preference is not to talk about map-to-maps at all in the
> user API and instead spec the thing as talking about map references in
> general, I'd rather have something that talks about
> references-to-maps-acquired-from-maps than this interface.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-07-31  0:26         ` Jakub Kicinski
@ 2018-07-31  0:33           ` Daniel Colascione
  2018-07-31  0:45             ` Jakub Kicinski
  2018-07-31  8:34             ` Daniel Borkmann
  0 siblings, 2 replies; 31+ messages in thread
From: Daniel Colascione @ 2018-07-31  0:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Daniel Borkmann, Joel Fernandes, linux-kernel, Tim Murray,
	netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov

On Mon, Jul 30, 2018 at 5:26 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote:
> > On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> > > Hmm, I don't think such UAPI as above is future-proof. In case we would want
> > > a similar mechanism in future for other maps, we would need a whole new bpf
> > > command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though
> > > the underlying map may not even be a map-to-map. Additionally, we don't have
> > > any map object at hand in the above, so we couldn't make any finer grained
> > > decisions either. Something like below would be more suitable and leaves room
> > > for extending this further in future.
> >
> > YAGNI.  Your proposed mechanism doesn't add anything under the current
> > implementation.
>
> FWIW in case of HW offload targeting a particular map may allow users
> to avoid a potentially slow sync with all the devices on the system.


Sure. But such a thing doesn't exist right now (right?), and we can
add that more-efficient-in-that-one-case BPF interface when it lands.
I'd rather keep things simple for now.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-07-31  0:33           ` Daniel Colascione
@ 2018-07-31  0:45             ` Jakub Kicinski
  2018-07-31  0:50               ` Daniel Colascione
  2018-07-31  8:34             ` Daniel Borkmann
  1 sibling, 1 reply; 31+ messages in thread
From: Jakub Kicinski @ 2018-07-31  0:45 UTC (permalink / raw)
  To: Daniel Colascione, Daniel Borkmann
  Cc: Joel Fernandes, linux-kernel, Tim Murray, netdev,
	Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov

On Mon, 30 Jul 2018 17:33:39 -0700, Daniel Colascione wrote:
> On Mon, Jul 30, 2018 at 5:26 PM, Jakub Kicinski wrote:
> > On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote:  
> > > On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:  
> > > > Hmm, I don't think such UAPI as above is future-proof. In case we would want
> > > > a similar mechanism in future for other maps, we would need a whole new bpf
> > > > command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though
> > > > the underlying map may not even be a map-to-map. Additionally, we don't have
> > > > any map object at hand in the above, so we couldn't make any finer grained
> > > > decisions either. Something like below would be more suitable and leaves room
> > > > for extending this further in future.  
> > >
> > > YAGNI.  Your proposed mechanism doesn't add anything under the current
> > > implementation.  
> >
> > FWIW in case of HW offload targeting a particular map may allow users
> > to avoid a potentially slow sync with all the devices on the system.  
> 
> Sure. But such a thing doesn't exist right now (right?), and we can
> add that more-efficient-in-that-one-case BPF interface when it lands.
> I'd rather keep things simple for now.

You mean map-in-map offload doesn't exist today?  True, but it's on the
roadmap for Netronome.

Tangentially it would be really useful for us to have a "the map has
actually been freed" notification/barrier.  We have complaints of users
creating huge maps and then trying to free and recreate them quickly,
and the creation part failing with -ENOMEM, because the free from a
workqueue didn't run, yet :(  It'd probably require a different API to
solve than what's discussed here, but since we're talking about syncing
things I thought I'd put it out there...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-07-31  0:45             ` Jakub Kicinski
@ 2018-07-31  0:50               ` Daniel Colascione
  2018-07-31  1:14                 ` Jakub Kicinski
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Colascione @ 2018-07-31  0:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Daniel Borkmann, Joel Fernandes, linux-kernel, Tim Murray,
	netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov

On Mon, Jul 30, 2018 at 5:45 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> On Mon, 30 Jul 2018 17:33:39 -0700, Daniel Colascione wrote:
>> On Mon, Jul 30, 2018 at 5:26 PM, Jakub Kicinski wrote:
>> > On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote:
>> > > On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> > > > Hmm, I don't think such UAPI as above is future-proof. In case we would want
>> > > > a similar mechanism in future for other maps, we would need a whole new bpf
>> > > > command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though
>> > > > the underlying map may not even be a map-to-map. Additionally, we don't have
>> > > > any map object at hand in the above, so we couldn't make any finer grained
>> > > > decisions either. Something like below would be more suitable and leaves room
>> > > > for extending this further in future.
>> > >
>> > > YAGNI.  Your proposed mechanism doesn't add anything under the current
>> > > implementation.
>> >
>> > FWIW in case of HW offload targeting a particular map may allow users
>> > to avoid a potentially slow sync with all the devices on the system.
>>
>> Sure. But such a thing doesn't exist right now (right?), and we can
>> add that more-efficient-in-that-one-case BPF interface when it lands.
>> I'd rather keep things simple for now.
>
> You mean map-in-map offload doesn't exist today?  True, but it's on the
> roadmap for Netronome.

Sounds cool. I'd still wait until map-in-map offload lands until
adding the map-specific API though.

> Tangentially it would be really useful for us to have a "the map has
> actually been freed" notification/barrier.  We have complaints of users
> creating huge maps and then trying to free and recreate them quickly,
> and the creation part failing with -ENOMEM, because the free from a
> workqueue didn't run, yet :(  It'd probably require a different API to
> solve than what's discussed here, but since we're talking about syncing
> things I thought I'd put it out there...

Yeah. What you're talking about is what I meant upthread when I
briefly mentioned a "refcount draining approach" --- you'd block until
all references except your own to a map disappeared.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-07-31  0:50               ` Daniel Colascione
@ 2018-07-31  1:14                 ` Jakub Kicinski
  0 siblings, 0 replies; 31+ messages in thread
From: Jakub Kicinski @ 2018-07-31  1:14 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Daniel Borkmann, Joel Fernandes, linux-kernel, Tim Murray,
	netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov

On Mon, 30 Jul 2018 17:50:15 -0700, Daniel Colascione wrote:
> > Tangentially it would be really useful for us to have a "the map has
> > actually been freed" notification/barrier.  We have complaints of users
> > creating huge maps and then trying to free and recreate them quickly,
> > and the creation part failing with -ENOMEM, because the free from a
> > workqueue didn't run, yet :(  It'd probably require a different API to
> > solve than what's discussed here, but since we're talking about syncing
> > things I thought I'd put it out there...  
> 
> Yeah. What you're talking about is what I meant upthread when I
> briefly mentioned a "refcount draining approach" --- you'd block until
> all references except your own to a map disappeared.

I don't think so.  The ref count goes to 0, work gets scheduled to call
free, work runs, free gets called, device memory gets freed.  Now the
memory can be reused.  As long as there are any refs we can't free
memory, unless we want to make the semantics really ugly.

But as I said, only superficially related to this patch.  Perhaps
solution will naturally fall out of an API to query the device
capabilities and resources, if we ever get there.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command
  2018-07-29 15:51 [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command Alexei Starovoitov
  2018-07-29 20:46 ` Daniel Colascione
@ 2018-07-31  2:01 ` Joel Fernandes
  2018-07-31  2:06   ` Joel Fernandes
  1 sibling, 1 reply; 31+ messages in thread
From: Joel Fernandes @ 2018-07-31  2:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Colascione, Joel Fernandes, LKML, Tim Murray,
	Network Development, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann

On Sun, Jul 29, 2018 at 06:51:18PM +0300, Alexei Starovoitov wrote:
> On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> wrote:
> > BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF
> > map made by a BPF program that is running at the time the
> > BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is
> > to provide a means for userspace to replace a BPF map with another,
> > newer version, then ensure that no component is still using the "old"
> > map before manipulating the "old" map in some way.
> >
> > Signed-off-by: Daniel Colascione <dancol@google.com>
> > ---
> >  include/uapi/linux/bpf.h |  9 +++++++++
> >  kernel/bpf/syscall.c     | 13 +++++++++++++
> >  2 files changed, 22 insertions(+)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index b7db3261c62d..5b27e9117d3e 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key {
> >         __u8    data[0];        /* Arbitrary size */
> >  };
> >
> > +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a
> > + * BPF map made by a BPF program that is running at the time the
> > + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command
> 
> that doesn't sound right to me.
> such command won't wait for the release of the references.
> in case of map-in-map the program does not hold
> the references to inner map (only to outer map).

I didn't follow this completely.

The userspace program is using the inner map per your description of the
algorithm for using map-in-map to solve the race conditions that this patch
is trying to address:

If you don't mind, I copy-pasted it below from your netdev post:

if you use map-in-map you don't need extra boolean map.
0. bpf prog can do
   inner_map = lookup(map_in_map, key=0);
   lookup(inner_map, your_real_key);
1. user space writes into map_in_map[0] <- FD of new map
2. some cpus are using old inner map and some a new
3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched()
   which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu()
   which will guarantee that progs finished.
4. scan old inner map

In step 2, as you mentioned there are CPUs using different inner maps. So
could you clarify how the synchronize_rcu mechanism will even work if you're
now saying "program does not hold references to the inner maps"? 

Thanks!

- Joel


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command
  2018-07-31  2:01 ` [PATCH v2] Add BPF_SYNCHRONIZE_MAPS " Joel Fernandes
@ 2018-07-31  2:06   ` Joel Fernandes
  2018-07-31  4:03     ` Y Song
  0 siblings, 1 reply; 31+ messages in thread
From: Joel Fernandes @ 2018-07-31  2:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Colascione, Joel Fernandes, LKML, Tim Murray,
	Network Development, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 30, 2018 at 07:01:22PM -0700, Joel Fernandes wrote:
> On Sun, Jul 29, 2018 at 06:51:18PM +0300, Alexei Starovoitov wrote:
> > On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> wrote:
> > > BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF
> > > map made by a BPF program that is running at the time the
> > > BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is
> > > to provide a means for userspace to replace a BPF map with another,
> > > newer version, then ensure that no component is still using the "old"
> > > map before manipulating the "old" map in some way.
> > >
> > > Signed-off-by: Daniel Colascione <dancol@google.com>
> > > ---
> > >  include/uapi/linux/bpf.h |  9 +++++++++
> > >  kernel/bpf/syscall.c     | 13 +++++++++++++
> > >  2 files changed, 22 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index b7db3261c62d..5b27e9117d3e 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key {
> > >         __u8    data[0];        /* Arbitrary size */
> > >  };
> > >
> > > +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a
> > > + * BPF map made by a BPF program that is running at the time the
> > > + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command
> > 
> > that doesn't sound right to me.
> > such command won't wait for the release of the references.
> > in case of map-in-map the program does not hold
> > the references to inner map (only to outer map).
> 
> I didn't follow this completely.
> 
> The userspace program is using the inner map per your description of the

Sorry just to correct myself, here I meant "The kernel eBPF program is using
the inner map on multiple CPUs" instead of "userspace".

thanks,

- Joel





> algorithm for using map-in-map to solve the race conditions that this patch
> is trying to address:
> 
> If you don't mind, I copy-pasted it below from your netdev post:
> 
> if you use map-in-map you don't need extra boolean map.
> 0. bpf prog can do
>    inner_map = lookup(map_in_map, key=0);
>    lookup(inner_map, your_real_key);
> 1. user space writes into map_in_map[0] <- FD of new map
> 2. some cpus are using old inner map and some a new
> 3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched()
>    which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu()
>    which will guarantee that progs finished.
> 4. scan old inner map
> 
> In step 2, as you mentioned there are CPUs using different inner maps. So
> could you clarify how the synchronize_rcu mechanism will even work if you're
> now saying "program does not hold references to the inner maps"? 
> 
> Thanks!
> 
> - Joel
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command
  2018-07-31  2:06   ` Joel Fernandes
@ 2018-07-31  4:03     ` Y Song
  2018-07-31 21:56       ` Joel Fernandes
  0 siblings, 1 reply; 31+ messages in thread
From: Y Song @ 2018-07-31  4:03 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alexei Starovoitov, Daniel Colascione, Joel Fernandes, LKML,
	Tim Murray, Network Development, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 30, 2018 at 7:06 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
> On Mon, Jul 30, 2018 at 07:01:22PM -0700, Joel Fernandes wrote:
>> On Sun, Jul 29, 2018 at 06:51:18PM +0300, Alexei Starovoitov wrote:
>> > On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> wrote:
>> > > BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF
>> > > map made by a BPF program that is running at the time the
>> > > BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is
>> > > to provide a means for userspace to replace a BPF map with another,
>> > > newer version, then ensure that no component is still using the "old"
>> > > map before manipulating the "old" map in some way.
>> > >
>> > > Signed-off-by: Daniel Colascione <dancol@google.com>
>> > > ---
>> > >  include/uapi/linux/bpf.h |  9 +++++++++
>> > >  kernel/bpf/syscall.c     | 13 +++++++++++++
>> > >  2 files changed, 22 insertions(+)
>> > >
>> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> > > index b7db3261c62d..5b27e9117d3e 100644
>> > > --- a/include/uapi/linux/bpf.h
>> > > +++ b/include/uapi/linux/bpf.h
>> > > @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key {
>> > >         __u8    data[0];        /* Arbitrary size */
>> > >  };
>> > >
>> > > +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a
>> > > + * BPF map made by a BPF program that is running at the time the
>> > > + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command
>> >
>> > that doesn't sound right to me.
>> > such command won't wait for the release of the references.
>> > in case of map-in-map the program does not hold
>> > the references to inner map (only to outer map).
>>
>> I didn't follow this completely.
>>
>> The userspace program is using the inner map per your description of the
>
> Sorry just to correct myself, here I meant "The kernel eBPF program is using
> the inner map on multiple CPUs" instead of "userspace".
>
> thanks,
>
> - Joel
>
>
>
>
>
>> algorithm for using map-in-map to solve the race conditions that this patch
>> is trying to address:
>>
>> If you don't mind, I copy-pasted it below from your netdev post:
>>
>> if you use map-in-map you don't need extra boolean map.
>> 0. bpf prog can do
>>    inner_map = lookup(map_in_map, key=0);
>>    lookup(inner_map, your_real_key);
>> 1. user space writes into map_in_map[0] <- FD of new map
>> 2. some cpus are using old inner map and some a new
>> 3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched()
>>    which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu()
>>    which will guarantee that progs finished.
>> 4. scan old inner map
>>
>> In step 2, as you mentioned there are CPUs using different inner maps. So
>> could you clarify how the synchronize_rcu mechanism will even work if you're
>> now saying "program does not hold references to the inner maps"?

The program only held references to the outer maps, and the outer map
held references to the inner maps. The user space program can add/remove
the inner map for a particular outer map while the prog <-> outer-map
relationship is not changed.

>>
>> Thanks!
>>
>> - Joel
>>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-07-31  0:33           ` Daniel Colascione
  2018-07-31  0:45             ` Jakub Kicinski
@ 2018-07-31  8:34             ` Daniel Borkmann
  2018-07-31  9:36               ` Daniel Colascione
  1 sibling, 1 reply; 31+ messages in thread
From: Daniel Borkmann @ 2018-07-31  8:34 UTC (permalink / raw)
  To: Daniel Colascione, Jakub Kicinski
  Cc: Joel Fernandes, linux-kernel, Tim Murray, netdev,
	Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov

On 07/31/2018 02:33 AM, Daniel Colascione wrote:
> On Mon, Jul 30, 2018 at 5:26 PM, Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
>> On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote:
>>> On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>>> Hmm, I don't think such UAPI as above is future-proof. In case we would want
>>>> a similar mechanism in future for other maps, we would need a whole new bpf
>>>> command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though
>>>> the underlying map may not even be a map-to-map. Additionally, we don't have
>>>> any map object at hand in the above, so we couldn't make any finer grained
>>>> decisions either. Something like below would be more suitable and leaves room
>>>> for extending this further in future.
>>>
>>> YAGNI.  Your proposed mechanism doesn't add anything under the current
>>> implementation.
>>
>> FWIW in case of HW offload targeting a particular map may allow users
>> to avoid a potentially slow sync with all the devices on the system.
> 
> Sure. But such a thing doesn't exist right now (right?), and we can
> add that more-efficient-in-that-one-case BPF interface when it lands.
> I'd rather keep things simple for now.

I don't see a reason why that is even more complicated. An API command name
such as BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is simply non-generic, and
exposes specific map details (here: map-in-map) into the UAPI whereas it
should reside within a specific implementation instead similar to other ops
we have for maps. If in future other maps would be added that would have
similar mechanisms of inner objects they return to the BPF program, we'll
be adding yet another command just for this. Also, union bpf_attr is extensible,
e.g. additional members could be added in future whenever needed for this
subcommand instead of forcing it to NULL as done here. E.g. having a high-level
one like bpf(BPF_MAP_WAIT, { .map_fd = fd }) could later on also cover the
use-case from Jakub by adding an 'event' member into union bpf_attr where we
could add other map specific wakeups for user space while the current default
of 0 would be BPF_MAP_WAIT_PENDING_REFS (or the like). All I'm saying is to
keep it generic so it can be extended later.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-07-31  8:34             ` Daniel Borkmann
@ 2018-07-31  9:36               ` Daniel Colascione
  2018-08-10 22:52                 ` Alexei Starovoitov
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Colascione @ 2018-07-31  9:36 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jakub Kicinski, Joel Fernandes, linux-kernel, Tim Murray, netdev,
	Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov

On Tue, Jul 31, 2018 at 1:34 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 07/31/2018 02:33 AM, Daniel Colascione wrote:
>> On Mon, Jul 30, 2018 at 5:26 PM, Jakub Kicinski
>> <jakub.kicinski@netronome.com> wrote:
>>> On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote:
>>>> On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>>>> Hmm, I don't think such UAPI as above is future-proof. In case we would want
>>>>> a similar mechanism in future for other maps, we would need a whole new bpf
>>>>> command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though
>>>>> the underlying map may not even be a map-to-map. Additionally, we don't have
>>>>> any map object at hand in the above, so we couldn't make any finer grained
>>>>> decisions either. Something like below would be more suitable and leaves room
>>>>> for extending this further in future.
>>>>
>>>> YAGNI.  Your proposed mechanism doesn't add anything under the current
>>>> implementation.
>>>
>>> FWIW in case of HW offload targeting a particular map may allow users
>>> to avoid a potentially slow sync with all the devices on the system.
>>
>> Sure. But such a thing doesn't exist right now (right?), and we can
>> add that more-efficient-in-that-one-case BPF interface when it lands.
>> I'd rather keep things simple for now.
>
> I don't see a reason why that is even more complicated.

Both the API and the implementation are much more complicated in the
per-map ops version: just look at the patch size. The size argument
isn't necessarily a dealbreaker, but I still don't see what the extra
code size and complexity is buying.

> An API command name
> such as BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is simply non-generic, and
> exposes specific map details (here: map-in-map) into the UAPI whereas it
> should reside within a specific implementation instead similar to other ops
> we have for maps.

But synchronize isn't conceptually a command that applies to a
specific map. It waits on all references. Did you address my point
about your proposed map-specific interface requiring redundant
synchronize_rcu calls in the case where we swap multiple maps and want
to wait for all the references to drain? Under my proposal, you'd just
BPF_SYNCHRONIZE_WHATEVER and call schedule_rcu once. Under your
proposal, we'd make it a per-map operation, so we'd issue one
synchronize_rcu per map.

> If in future other maps would be added that would have
> similar mechanisms of inner objects they return to the BPF program, we'll
> be adding yet another command just for this.

And that's why my personal preference is to just calling this thing
BPF_SYNCHRONIZE, which I'd define to wait for all such "inner
objects". Alexei is the one who asked for the very specific naming, I
believe.

Anyway, we have a very simple patch that we could apply today. It
addresses a real need, and it doesn't preclude adding something more
specific later, when we know we need it. Besides, it's not as if
adding a BPF command is particularly expensive.

> Also, union bpf_attr is extensible,
> e.g. additional members could be added in future whenever needed for this
> subcommand instead of forcing it to NULL as done here.

We fail with EINVAL when attr != NULL now, which means that we can
safely accept a non-NULL attr-based subcommand later without breaking
anyone. The interface is already extensible.

> All I'm saying is to
> keep it generic so it can be extended later.

Sure, but no more extensible than it has to be. Prematurely-added
extension points tend to cause trouble later.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command
  2018-07-31  4:03     ` Y Song
@ 2018-07-31 21:56       ` Joel Fernandes
  2018-07-31 22:30         ` Daniel Borkmann
  0 siblings, 1 reply; 31+ messages in thread
From: Joel Fernandes @ 2018-07-31 21:56 UTC (permalink / raw)
  To: Y Song
  Cc: Joel Fernandes, Alexei Starovoitov, Daniel Colascione, LKML,
	Tim Murray, Network Development, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann

On Mon, Jul 30, 2018 at 09:03:18PM -0700, Y Song wrote:
> On Mon, Jul 30, 2018 at 7:06 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
> > On Mon, Jul 30, 2018 at 07:01:22PM -0700, Joel Fernandes wrote:
> >> On Sun, Jul 29, 2018 at 06:51:18PM +0300, Alexei Starovoitov wrote:
> >> > On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> wrote:
> >> > > BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF
> >> > > map made by a BPF program that is running at the time the
> >> > > BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is
> >> > > to provide a means for userspace to replace a BPF map with another,
> >> > > newer version, then ensure that no component is still using the "old"
> >> > > map before manipulating the "old" map in some way.
> >> > >
> >> > > Signed-off-by: Daniel Colascione <dancol@google.com>
> >> > > ---
> >> > >  include/uapi/linux/bpf.h |  9 +++++++++
> >> > >  kernel/bpf/syscall.c     | 13 +++++++++++++
> >> > >  2 files changed, 22 insertions(+)
> >> > >
> >> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> > > index b7db3261c62d..5b27e9117d3e 100644
> >> > > --- a/include/uapi/linux/bpf.h
> >> > > +++ b/include/uapi/linux/bpf.h
> >> > > @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key {
> >> > >         __u8    data[0];        /* Arbitrary size */
> >> > >  };
> >> > >
> >> > > +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a
> >> > > + * BPF map made by a BPF program that is running at the time the
> >> > > + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command
> >> >
> >> > that doesn't sound right to me.
> >> > such command won't wait for the release of the references.
> >> > in case of map-in-map the program does not hold
> >> > the references to inner map (only to outer map).
> >>
> >> I didn't follow this completely.
> >>
> >> The userspace program is using the inner map per your description of the
> >> algorithm for using map-in-map to solve the race conditions that this patch
> >> is trying to address:
> >>
> >> If you don't mind, I copy-pasted it below from your netdev post:
> >>
> >> if you use map-in-map you don't need extra boolean map.
> >> 0. bpf prog can do
> >>    inner_map = lookup(map_in_map, key=0);
> >>    lookup(inner_map, your_real_key);
> >> 1. user space writes into map_in_map[0] <- FD of new map
> >> 2. some cpus are using old inner map and some a new
> >> 3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched()
> >>    which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu()
> >>    which will guarantee that progs finished.
> >> 4. scan old inner map
> >>
> >> In step 2, as you mentioned there are CPUs using different inner maps. So
> >> could you clarify how the synchronize_rcu mechanism will even work if you're
> >> now saying "program does not hold references to the inner maps"?
> 
> The program only held references to the outer maps, and the outer map
> held references to the inner maps. The user space program can add/remove
> the inner map for a particular outer map while the prog <-> outer-map
> relationship is not changed.

My definition of "reference" in this context is protection by rcu_read_lock.

So I was concerned the above map-in-map access isn't protected as such when
Alexei said "program doesn't have reference on inner map" in the above steps.
Maybe I misunderstood what is the meaning of reference here.

To make the map-in-map thing to work for Chenbo/Lorenzo's usecase, both the
access of outer map at key=0 and the inner map have to protect by
rcu_read_lock so that the membarrier call will work.

So basically step 0 in the steps above should be rcu_read_lock protected to
satisfy Chenbo/Lorenzo's usecase.

I know today the entire program is run as preempt disabled (unless something
changed) so this shouldn't be a problem, but in the future if the verifier is
doing similar things at a finer grainer level, then the above has to be
taken into consideration.

Does that make sense or am I missing something?

thanks,

- Joel



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command
  2018-07-31 21:56       ` Joel Fernandes
@ 2018-07-31 22:30         ` Daniel Borkmann
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel Borkmann @ 2018-07-31 22:30 UTC (permalink / raw)
  To: Joel Fernandes, Y Song
  Cc: Joel Fernandes, Alexei Starovoitov, Daniel Colascione, LKML,
	Tim Murray, Network Development, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov

On 07/31/2018 11:56 PM, Joel Fernandes wrote:
> On Mon, Jul 30, 2018 at 09:03:18PM -0700, Y Song wrote:
>> On Mon, Jul 30, 2018 at 7:06 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
>>> On Mon, Jul 30, 2018 at 07:01:22PM -0700, Joel Fernandes wrote:
>>>> On Sun, Jul 29, 2018 at 06:51:18PM +0300, Alexei Starovoitov wrote:
>>>>> On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> wrote:
>>>>>> BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF
>>>>>> map made by a BPF program that is running at the time the
>>>>>> BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is
>>>>>> to provide a means for userspace to replace a BPF map with another,
>>>>>> newer version, then ensure that no component is still using the "old"
>>>>>> map before manipulating the "old" map in some way.
>>>>>>
>>>>>> Signed-off-by: Daniel Colascione <dancol@google.com>
>>>>>> ---
>>>>>>  include/uapi/linux/bpf.h |  9 +++++++++
>>>>>>  kernel/bpf/syscall.c     | 13 +++++++++++++
>>>>>>  2 files changed, 22 insertions(+)
>>>>>>
>>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>>>> index b7db3261c62d..5b27e9117d3e 100644
>>>>>> --- a/include/uapi/linux/bpf.h
>>>>>> +++ b/include/uapi/linux/bpf.h
>>>>>> @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key {
>>>>>>         __u8    data[0];        /* Arbitrary size */
>>>>>>  };
>>>>>>
>>>>>> +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a
>>>>>> + * BPF map made by a BPF program that is running at the time the
>>>>>> + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command
>>>>>
>>>>> that doesn't sound right to me.
>>>>> such command won't wait for the release of the references.
>>>>> in case of map-in-map the program does not hold
>>>>> the references to inner map (only to outer map).
>>>>
>>>> I didn't follow this completely.
>>>>
>>>> The userspace program is using the inner map per your description of the
>>>> algorithm for using map-in-map to solve the race conditions that this patch
>>>> is trying to address:
>>>>
>>>> If you don't mind, I copy-pasted it below from your netdev post:
>>>>
>>>> if you use map-in-map you don't need extra boolean map.
>>>> 0. bpf prog can do
>>>>    inner_map = lookup(map_in_map, key=0);
>>>>    lookup(inner_map, your_real_key);
>>>> 1. user space writes into map_in_map[0] <- FD of new map
>>>> 2. some cpus are using old inner map and some a new
>>>> 3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched()
>>>>    which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu()
>>>>    which will guarantee that progs finished.
>>>> 4. scan old inner map
>>>>
>>>> In step 2, as you mentioned there are CPUs using different inner maps. So
>>>> could you clarify how the synchronize_rcu mechanism will even work if you're
>>>> now saying "program does not hold references to the inner maps"?
>>
>> The program only held references to the outer maps, and the outer map
>> held references to the inner maps. The user space program can add/remove
>> the inner map for a particular outer map while the prog <-> outer-map
>> relationship is not changed.
> 
> My definition of "reference" in this context is protection by rcu_read_lock.
> 
> So I was concerned the above map-in-map access isn't protected as such when
> Alexei said "program doesn't have reference on inner map" in the above steps.
> Maybe I misunderstood what is the meaning of reference here.
> 
> To make the map-in-map thing to work for Chenbo/Lorenzo's usecase, both the
> access of outer map at key=0 and the inner map have to protect by
> rcu_read_lock so that the membarrier call will work.
> 
> So basically step 0 in the steps above should be rcu_read_lock protected to
> satisfy Chenbo/Lorenzo's usecase.
> 
> I know today the entire program is run as preempt disabled (unless something
> changed) so this shouldn't be a problem, but in the future if the verifier is
> doing similar things at a finer grainer level, then the above has to be
> taken into consideration.
> 
> Does that make sense or am I missing something?

All BPF programs are required to run under rcu_read_lock today, so that
assumption holds. Should this ever change in future, then this constraint
of course needs to be taken into consideration.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
       [not found]     ` <CAKOZues6SE_c=ix7ap6QaJHqd1TmYpWWMJiu3=TtuqgKuqOUCA@mail.gmail.com>
@ 2018-08-10 22:29       ` Alexei Starovoitov
  0 siblings, 0 replies; 31+ messages in thread
From: Alexei Starovoitov @ 2018-08-10 22:29 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Joel Fernandes, linux-kernel, Tim Murray, netdev,
	Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers,
	Alexei Starovoitov, Daniel Borkmann

On Thu, Aug 09, 2018 at 10:17:50AM -0700, Daniel Colascione wrote:
> Ping.

sorry for delay. have been off grid. let's continue in the other thread for full context


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-07-31  9:36               ` Daniel Colascione
@ 2018-08-10 22:52                 ` Alexei Starovoitov
  2018-08-14 20:37                   ` Daniel Colascione
  0 siblings, 1 reply; 31+ messages in thread
From: Alexei Starovoitov @ 2018-08-10 22:52 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Daniel Borkmann, Jakub Kicinski, Joel Fernandes, linux-kernel,
	Tim Murray, netdev, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov

On Tue, Jul 31, 2018 at 02:36:39AM -0700, Daniel Colascione wrote:
> 
> > An API command name
> > such as BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is simply non-generic, and
> > exposes specific map details (here: map-in-map) into the UAPI whereas it
> > should reside within a specific implementation instead similar to other ops
> > we have for maps.
> 
> But synchronize isn't conceptually a command that applies to a
> specific map. It waits on all references. Did you address my point
> about your proposed map-specific interface requiring redundant
> synchronize_rcu calls in the case where we swap multiple maps and want
> to wait for all the references to drain? Under my proposal, you'd just
> BPF_SYNCHRONIZE_WHATEVER and call schedule_rcu once. Under your
> proposal, we'd make it a per-map operation, so we'd issue one
> synchronize_rcu per map.

optimizing for multi-map sync sounds like premature optimization.
In general I'd prefer DanielB proposal to make the sync logic map and fd specific,
but before we argue about implementation further let's agree on
the problem we're trying to solve.
I believe the only issue being discussed is user space doesn't know
when it's ok to start draining the inner map when it was replaced
by bpf_map_update syscall command with another map, right?
If we agree on that, should bpf_map_update handle it then?
Wouldn't it be much easier to understand and use from user pov?
No new commands to learn. map_update syscall replaced the map
and old map is no longer accessed by the program via this given map-in-map.
But if the replaced map is used directly or it sits in some other
map-in-map slot the progs can still access it.

My issue with DanielC SYNC cmd that it exposes implementation details
and introduces complex 'synchronization' semantics. To majority of
the users it won't be obvious what is being synchronized.

My issue with DanielB WAIT_REF map_fd cmd that it needs to wait for all refs
to this map to be dropped. I think combination of usercnt and refcnt
can answer that, but feels dangerous to sleep potentially forever
in a syscall until all prog->map references are gone, though such
cmd is useful beyond map-in-map use case.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-08-10 22:52                 ` Alexei Starovoitov
@ 2018-08-14 20:37                   ` Daniel Colascione
  2018-08-16  4:01                     ` Alexei Starovoitov
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel Colascione @ 2018-08-14 20:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Jakub Kicinski, Joel Fernandes, linux-kernel,
	Tim Murray, netdev, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov

On Fri, Aug 10, 2018 at 3:52 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Jul 31, 2018 at 02:36:39AM -0700, Daniel Colascione wrote:
>>
>> > An API command name
>> > such as BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is simply non-generic, and
>> > exposes specific map details (here: map-in-map) into the UAPI whereas it
>> > should reside within a specific implementation instead similar to other ops
>> > we have for maps.
>>
>> But synchronize isn't conceptually a command that applies to a
>> specific map. It waits on all references. Did you address my point
>> about your proposed map-specific interface requiring redundant
>> synchronize_rcu calls in the case where we swap multiple maps and want
>> to wait for all the references to drain? Under my proposal, you'd just
>> BPF_SYNCHRONIZE_WHATEVER and call schedule_rcu once. Under your
>> proposal, we'd make it a per-map operation, so we'd issue one
>> synchronize_rcu per map.
>
> optimizing for multi-map sync sounds like premature optimization.

Maybe, but the per-map proposal is less efficient *and* more
complicated! I don't want to spend more code just to go slower.

> I believe the only issue being discussed is user space doesn't know
> when it's ok to start draining the inner map when it was replaced
> by bpf_map_update syscall command with another map, right?

Yes.

> If we agree on that, should bpf_map_update handle it then?
> Wouldn't it be much easier to understand and use from user pov?
> No new commands to learn. map_update syscall replaced the map
> and old map is no longer accessed by the program via this given map-in-map.

Maybe with a new BPF_SYNCHRONIZE flag for BPF_MAP_UPDATE_ELEM and
BPF_MAP_DELETE_ELEM. Otherwise, it seems wrong to make every user of
these commands pay for synchronization that only a few will need.

> But if the replaced map is used directly or it sits in some other
> map-in-map slot the progs can still access it.
>
> My issue with DanielC SYNC cmd that it exposes implementation details

What implementation details? The command semantics are defined
entirely in terms of existing user-visible primitives.

> and introduces complex 'synchronization' semantics. To majority of
> the users it won't be obvious what is being synchronized.
>
> My issue with DanielB WAIT_REF map_fd cmd that it needs to wait for all refs
> to this map to be dropped. I think combination of usercnt and refcnt
> can answer that, but feels dangerous to sleep potentially forever
> in a syscall until all prog->map references are gone, though such
> cmd is useful beyond map-in-map use case.

In what scenarios?

In any case, can we submit _something_?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command
  2018-08-14 20:37                   ` Daniel Colascione
@ 2018-08-16  4:01                     ` Alexei Starovoitov
  2018-10-12 10:54                       ` [PATCH v4] Wait for running BPF programs when updating map-in-map Daniel Colascione
  0 siblings, 1 reply; 31+ messages in thread
From: Alexei Starovoitov @ 2018-08-16  4:01 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Daniel Borkmann, Jakub Kicinski, Joel Fernandes, linux-kernel,
	Tim Murray, netdev, Lorenzo Colitti, Chenbo Feng,
	Mathieu Desnoyers, Alexei Starovoitov

On Tue, Aug 14, 2018 at 01:37:12PM -0700, Daniel Colascione wrote:
> 
> > If we agree on that, should bpf_map_update handle it then?
> > Wouldn't it be much easier to understand and use from user pov?
> > No new commands to learn. map_update syscall replaced the map
> > and old map is no longer accessed by the program via this given map-in-map.
> 
> Maybe with a new BPF_SYNCHRONIZE flag for BPF_MAP_UPDATE_ELEM and
> BPF_MAP_DELETE_ELEM. Otherwise, it seems wrong to make every user of
> these commands pay for synchronization that only a few will need.

I don't think extra flag is needed. Extra sync_rcu() for map-in-map
is useful for all users. I would consider it a bugfix,
since users that examine deleted map have this race today
and removing the race is always a good thing especially since the cost
is small.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v4] Wait for running BPF programs when updating map-in-map
  2018-08-16  4:01                     ` Alexei Starovoitov
@ 2018-10-12 10:54                       ` Daniel Colascione
  2018-10-12 20:54                         ` Joel Fernandes
  2018-10-13  2:31                         ` Alexei Starovoitov
  0 siblings, 2 replies; 31+ messages in thread
From: Daniel Colascione @ 2018-10-12 10:54 UTC (permalink / raw)
  To: joelaf
  Cc: linux-kernel, timmurray, netdev, Alexei Starovoitov,
	Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers,
	Alexei Starovoitov, Daniel Borkmann, Daniel Colascione

The map-in-map frequently serves as a mechanism for atomic
snapshotting of state that a BPF program might record.  The current
implementation is dangerous to use in this way, however, since
userspace has no way of knowing when all programs that might have
retrieved the "old" value of the map may have completed.

This change ensures that map update operations on map-in-map map types
always wait for all references to the old map to drop before returning
to userspace.

Signed-off-by: Daniel Colascione <dancol@google.com>
---
 kernel/bpf/syscall.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8339d81cba1d..d7c16ae1e85a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr)
 	return err;
 }
 
+static void maybe_wait_bpf_programs(struct bpf_map *map)
+{
+	/* Wait for any running BPF programs to complete so that
+	 * userspace, when we return to it, knows that all programs
+	 * that could be running use the new map value.
+	 */
+	if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
+	    map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
+		synchronize_rcu();
+	}
+}
+
 #define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags
 
 static int map_update_elem(union bpf_attr *attr)
@@ -831,6 +843,7 @@ static int map_update_elem(union bpf_attr *attr)
 	}
 	__this_cpu_dec(bpf_prog_active);
 	preempt_enable();
+	maybe_wait_bpf_programs(map);
 out:
 free_value:
 	kfree(value);
@@ -883,6 +896,7 @@ static int map_delete_elem(union bpf_attr *attr)
 	rcu_read_unlock();
 	__this_cpu_dec(bpf_prog_active);
 	preempt_enable();
+	maybe_wait_bpf_programs(map);
 out:
 	kfree(key);
 err_put:
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map
  2018-10-12 10:54                       ` [PATCH v4] Wait for running BPF programs when updating map-in-map Daniel Colascione
@ 2018-10-12 20:54                         ` Joel Fernandes
  2018-10-13  2:31                         ` Alexei Starovoitov
  1 sibling, 0 replies; 31+ messages in thread
From: Joel Fernandes @ 2018-10-12 20:54 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: linux-kernel, timmurray, netdev, Alexei Starovoitov,
	Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers,
	Alexei Starovoitov, Daniel Borkmann, kernel-team

On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote:
> The map-in-map frequently serves as a mechanism for atomic
> snapshotting of state that a BPF program might record.  The current
> implementation is dangerous to use in this way, however, since
> userspace has no way of knowing when all programs that might have
> retrieved the "old" value of the map may have completed.
> 
> This change ensures that map update operations on map-in-map map types
> always wait for all references to the old map to drop before returning
> to userspace.
> 
> Signed-off-by: Daniel Colascione <dancol@google.com>
> ---
>  kernel/bpf/syscall.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 8339d81cba1d..d7c16ae1e85a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr)
>  	return err;
>  }
>  
> +static void maybe_wait_bpf_programs(struct bpf_map *map)
> +{
> +	/* Wait for any running BPF programs to complete so that
> +	 * userspace, when we return to it, knows that all programs
> +	 * that could be running use the new map value.
> +	 */
> +	if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
> +	    map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
> +		synchronize_rcu();
> +	}
> +}
> +
>  #define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags
>  
>  static int map_update_elem(union bpf_attr *attr)
> @@ -831,6 +843,7 @@ static int map_update_elem(union bpf_attr *attr)
>  	}
>  	__this_cpu_dec(bpf_prog_active);
>  	preempt_enable();
> +	maybe_wait_bpf_programs(map);
>  out:
>  free_value:
>  	kfree(value);
> @@ -883,6 +896,7 @@ static int map_delete_elem(union bpf_attr *attr)
>  	rcu_read_unlock();
>  	__this_cpu_dec(bpf_prog_active);
>  	preempt_enable();
> +	maybe_wait_bpf_programs(map);

Looks good to me,

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Also I believe that those rcu_read_lock() and rcu_read_unlock() calls in the
existing code are useless. preempt_disable()d code is already an RCU
read-side section, and synchronize_rcu and friends work on those type of
read-side sections as well (as of recent kernel releases) however removing it
may make lockdep unhappy, unless we also replace all rcu_dereference() usages
with rcu_dereference_sched(), so lets leave that alone for now I guess.

thanks,

- Joel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map
  2018-10-12 10:54                       ` [PATCH v4] Wait for running BPF programs when updating map-in-map Daniel Colascione
  2018-10-12 20:54                         ` Joel Fernandes
@ 2018-10-13  2:31                         ` Alexei Starovoitov
  2018-10-16 17:39                           ` Joel Fernandes
  1 sibling, 1 reply; 31+ messages in thread
From: Alexei Starovoitov @ 2018-10-13  2:31 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: joelaf, linux-kernel, timmurray, netdev, Lorenzo Colitti,
	Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov,
	Daniel Borkmann

On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote:
> The map-in-map frequently serves as a mechanism for atomic
> snapshotting of state that a BPF program might record.  The current
> implementation is dangerous to use in this way, however, since
> userspace has no way of knowing when all programs that might have
> retrieved the "old" value of the map may have completed.
> 
> This change ensures that map update operations on map-in-map map types
> always wait for all references to the old map to drop before returning
> to userspace.
> 
> Signed-off-by: Daniel Colascione <dancol@google.com>
> ---
>  kernel/bpf/syscall.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 8339d81cba1d..d7c16ae1e85a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr)
>  	return err;
>  }
>  
> +static void maybe_wait_bpf_programs(struct bpf_map *map)
> +{
> +	/* Wait for any running BPF programs to complete so that
> +	 * userspace, when we return to it, knows that all programs
> +	 * that could be running use the new map value.
> +	 */
> +	if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
> +	    map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
> +		synchronize_rcu();
> +	}

extra {} were not necessary. I removed them while applying to bpf-next.
Please run checkpatch.pl next time.
Thanks


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map
  2018-10-13  2:31                         ` Alexei Starovoitov
@ 2018-10-16 17:39                           ` Joel Fernandes
  2018-10-18 15:46                             ` Alexei Starovoitov
  0 siblings, 1 reply; 31+ messages in thread
From: Joel Fernandes @ 2018-10-16 17:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Colascione, Joel Fernandes, LKML, Tim Murray, netdev,
	Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers,
	Alexei Starovoitov, Daniel Borkmann, stable

On Fri, Oct 12, 2018 at 7:31 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote:
>> The map-in-map frequently serves as a mechanism for atomic
>> snapshotting of state that a BPF program might record.  The current
>> implementation is dangerous to use in this way, however, since
>> userspace has no way of knowing when all programs that might have
>> retrieved the "old" value of the map may have completed.
>>
>> This change ensures that map update operations on map-in-map map types
>> always wait for all references to the old map to drop before returning
>> to userspace.
>>
>> Signed-off-by: Daniel Colascione <dancol@google.com>
>> ---
>>  kernel/bpf/syscall.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 8339d81cba1d..d7c16ae1e85a 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr)
>>       return err;
>>  }
>>
>> +static void maybe_wait_bpf_programs(struct bpf_map *map)
>> +{
>> +     /* Wait for any running BPF programs to complete so that
>> +      * userspace, when we return to it, knows that all programs
>> +      * that could be running use the new map value.
>> +      */
>> +     if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
>> +         map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
>> +             synchronize_rcu();
>> +     }
>
> extra {} were not necessary. I removed them while applying to bpf-next.
> Please run checkpatch.pl next time.
> Thanks

Thanks Alexei for taking it. Me and Lorenzo were discussing that not
having this causes incorrect behavior for apps using map-in-map for
this. So I CC'd stable as well.

-Joel

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map
  2018-10-16 17:39                           ` Joel Fernandes
@ 2018-10-18 15:46                             ` Alexei Starovoitov
  2018-10-18 23:36                               ` Joel Fernandes
  0 siblings, 1 reply; 31+ messages in thread
From: Alexei Starovoitov @ 2018-10-18 15:46 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Daniel Colascione, Joel Fernandes, LKML, Tim Murray, netdev,
	Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers,
	Alexei Starovoitov, Daniel Borkmann, stable

On Tue, Oct 16, 2018 at 10:39:57AM -0700, Joel Fernandes wrote:
> On Fri, Oct 12, 2018 at 7:31 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote:
> >> The map-in-map frequently serves as a mechanism for atomic
> >> snapshotting of state that a BPF program might record.  The current
> >> implementation is dangerous to use in this way, however, since
> >> userspace has no way of knowing when all programs that might have
> >> retrieved the "old" value of the map may have completed.
> >>
> >> This change ensures that map update operations on map-in-map map types
> >> always wait for all references to the old map to drop before returning
> >> to userspace.
> >>
> >> Signed-off-by: Daniel Colascione <dancol@google.com>
> >> ---
> >>  kernel/bpf/syscall.c | 14 ++++++++++++++
> >>  1 file changed, 14 insertions(+)
> >>
> >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> >> index 8339d81cba1d..d7c16ae1e85a 100644
> >> --- a/kernel/bpf/syscall.c
> >> +++ b/kernel/bpf/syscall.c
> >> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr)
> >>       return err;
> >>  }
> >>
> >> +static void maybe_wait_bpf_programs(struct bpf_map *map)
> >> +{
> >> +     /* Wait for any running BPF programs to complete so that
> >> +      * userspace, when we return to it, knows that all programs
> >> +      * that could be running use the new map value.
> >> +      */
> >> +     if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
> >> +         map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
> >> +             synchronize_rcu();
> >> +     }
> >
> > extra {} were not necessary. I removed them while applying to bpf-next.
> > Please run checkpatch.pl next time.
> > Thanks
> 
> Thanks Alexei for taking it. Me and Lorenzo were discussing that not
> having this causes incorrect behavior for apps using map-in-map for
> this. So I CC'd stable as well.

It is too late in the release cycle.
We can submit it to stable releases after the merge window.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map
  2018-10-18 15:46                             ` Alexei Starovoitov
@ 2018-10-18 23:36                               ` Joel Fernandes
  2018-11-10  2:01                                 ` Chenbo Feng
  0 siblings, 1 reply; 31+ messages in thread
From: Joel Fernandes @ 2018-10-18 23:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Colascione, Joel Fernandes, LKML, Tim Murray, netdev,
	Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers,
	Alexei Starovoitov, Daniel Borkmann, stable

On Thu, Oct 18, 2018 at 08:46:59AM -0700, Alexei Starovoitov wrote:
> On Tue, Oct 16, 2018 at 10:39:57AM -0700, Joel Fernandes wrote:
> > On Fri, Oct 12, 2018 at 7:31 PM, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > > On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote:
> > >> The map-in-map frequently serves as a mechanism for atomic
> > >> snapshotting of state that a BPF program might record.  The current
> > >> implementation is dangerous to use in this way, however, since
> > >> userspace has no way of knowing when all programs that might have
> > >> retrieved the "old" value of the map may have completed.
> > >>
> > >> This change ensures that map update operations on map-in-map map types
> > >> always wait for all references to the old map to drop before returning
> > >> to userspace.
> > >>
> > >> Signed-off-by: Daniel Colascione <dancol@google.com>
> > >> ---
> > >>  kernel/bpf/syscall.c | 14 ++++++++++++++
> > >>  1 file changed, 14 insertions(+)
> > >>
> > >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > >> index 8339d81cba1d..d7c16ae1e85a 100644
> > >> --- a/kernel/bpf/syscall.c
> > >> +++ b/kernel/bpf/syscall.c
> > >> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr)
> > >>       return err;
> > >>  }
> > >>
> > >> +static void maybe_wait_bpf_programs(struct bpf_map *map)
> > >> +{
> > >> +     /* Wait for any running BPF programs to complete so that
> > >> +      * userspace, when we return to it, knows that all programs
> > >> +      * that could be running use the new map value.
> > >> +      */
> > >> +     if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
> > >> +         map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
> > >> +             synchronize_rcu();
> > >> +     }
> > >
> > > extra {} were not necessary. I removed them while applying to bpf-next.
> > > Please run checkpatch.pl next time.
> > > Thanks
> > 
> > Thanks Alexei for taking it. Me and Lorenzo were discussing that not
> > having this causes incorrect behavior for apps using map-in-map for
> > this. So I CC'd stable as well.
> 
> It is too late in the release cycle.
> We can submit it to stable releases after the merge window.
> 

Sounds good, thanks.

- Joel


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map
  2018-10-18 23:36                               ` Joel Fernandes
@ 2018-11-10  2:01                                 ` Chenbo Feng
  2018-11-10 15:22                                   ` Greg KH
  0 siblings, 1 reply; 31+ messages in thread
From: Chenbo Feng @ 2018-11-10  2:01 UTC (permalink / raw)
  To: joel
  Cc: Alexei Starovoitov, Daniel Colascione, Joel Fernandes,
	linux-kernel, Tim Murray, netdev, Lorenzo Colitti,
	Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, stable

Hi netdev,

Could we queue up this patch to stable 4.14 and stable 4.19? I can
provide a backport patch if needed. I checked it is a clean
cherry-pick for 4.19 but have some minor conflict for 4.14.

Thanks
Chenbo Feng
On Thu, Oct 18, 2018 at 4:36 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Thu, Oct 18, 2018 at 08:46:59AM -0700, Alexei Starovoitov wrote:
> > On Tue, Oct 16, 2018 at 10:39:57AM -0700, Joel Fernandes wrote:
> > > On Fri, Oct 12, 2018 at 7:31 PM, Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > > On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote:
> > > >> The map-in-map frequently serves as a mechanism for atomic
> > > >> snapshotting of state that a BPF program might record.  The current
> > > >> implementation is dangerous to use in this way, however, since
> > > >> userspace has no way of knowing when all programs that might have
> > > >> retrieved the "old" value of the map may have completed.
> > > >>
> > > >> This change ensures that map update operations on map-in-map map types
> > > >> always wait for all references to the old map to drop before returning
> > > >> to userspace.
> > > >>
> > > >> Signed-off-by: Daniel Colascione <dancol@google.com>
> > > >> ---
> > > >>  kernel/bpf/syscall.c | 14 ++++++++++++++
> > > >>  1 file changed, 14 insertions(+)
> > > >>
> > > >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > >> index 8339d81cba1d..d7c16ae1e85a 100644
> > > >> --- a/kernel/bpf/syscall.c
> > > >> +++ b/kernel/bpf/syscall.c
> > > >> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr)
> > > >>       return err;
> > > >>  }
> > > >>
> > > >> +static void maybe_wait_bpf_programs(struct bpf_map *map)
> > > >> +{
> > > >> +     /* Wait for any running BPF programs to complete so that
> > > >> +      * userspace, when we return to it, knows that all programs
> > > >> +      * that could be running use the new map value.
> > > >> +      */
> > > >> +     if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
> > > >> +         map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
> > > >> +             synchronize_rcu();
> > > >> +     }
> > > >
> > > > extra {} were not necessary. I removed them while applying to bpf-next.
> > > > Please run checkpatch.pl next time.
> > > > Thanks
> > >
> > > Thanks Alexei for taking it. Me and Lorenzo were discussing that not
> > > having this causes incorrect behavior for apps using map-in-map for
> > > this. So I CC'd stable as well.
> >
> > It is too late in the release cycle.
> > We can submit it to stable releases after the merge window.
> >
>
> Sounds good, thanks.
>
> - Joel
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map
  2018-11-10  2:01                                 ` Chenbo Feng
@ 2018-11-10 15:22                                   ` Greg KH
  2018-11-10 18:58                                     ` Chenbo Feng
  0 siblings, 1 reply; 31+ messages in thread
From: Greg KH @ 2018-11-10 15:22 UTC (permalink / raw)
  To: Chenbo Feng
  Cc: joel, Alexei Starovoitov, Daniel Colascione, Joel Fernandes,
	linux-kernel, Tim Murray, netdev, Lorenzo Colitti,
	Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, stable

On Fri, Nov 09, 2018 at 06:01:54PM -0800, Chenbo Feng wrote:
> Hi netdev,
> 
> Could we queue up this patch to stable 4.14 and stable 4.19? I can
> provide a backport patch if needed. I checked it is a clean
> cherry-pick for 4.19 but have some minor conflict for 4.14.

What is the git commit id of the patch in Linus's tree?

thanks

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map
  2018-11-10 15:22                                   ` Greg KH
@ 2018-11-10 18:58                                     ` Chenbo Feng
  0 siblings, 0 replies; 31+ messages in thread
From: Chenbo Feng @ 2018-11-10 18:58 UTC (permalink / raw)
  To: gregkh
  Cc: joel, Alexei Starovoitov, Daniel Colascione, Joel Fernandes,
	linux-kernel, Tim Murray, netdev, Lorenzo Colitti,
	Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, stable

The Linus's tree commit SHA is 1ae80cf31938c8f77c37a29bbe29e7f1cd492be8.

I can send the patch to stable directly if needed.

Thanks
Chenbo Feng
On Sat, Nov 10, 2018 at 7:22 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Fri, Nov 09, 2018 at 06:01:54PM -0800, Chenbo Feng wrote:
> > Hi netdev,
> >
> > Could we queue up this patch to stable 4.14 and stable 4.19? I can
> > provide a backport patch if needed. I checked it is a clean
> > cherry-pick for 4.19 but have some minor conflict for 4.14.
>
> What is the git commit id of the patch in Linus's tree?
>
> thanks
>
> greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-11-10 18:58 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-29 15:51 [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command Alexei Starovoitov
2018-07-29 20:46 ` Daniel Colascione
2018-07-29 20:58   ` [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES " Daniel Colascione
2018-07-30 10:04     ` Daniel Borkmann
2018-07-30 10:04       ` Daniel Borkmann
2018-07-30 10:25       ` Daniel Colascione
2018-07-31  0:26         ` Jakub Kicinski
2018-07-31  0:33           ` Daniel Colascione
2018-07-31  0:45             ` Jakub Kicinski
2018-07-31  0:50               ` Daniel Colascione
2018-07-31  1:14                 ` Jakub Kicinski
2018-07-31  8:34             ` Daniel Borkmann
2018-07-31  9:36               ` Daniel Colascione
2018-08-10 22:52                 ` Alexei Starovoitov
2018-08-14 20:37                   ` Daniel Colascione
2018-08-16  4:01                     ` Alexei Starovoitov
2018-10-12 10:54                       ` [PATCH v4] Wait for running BPF programs when updating map-in-map Daniel Colascione
2018-10-12 20:54                         ` Joel Fernandes
2018-10-13  2:31                         ` Alexei Starovoitov
2018-10-16 17:39                           ` Joel Fernandes
2018-10-18 15:46                             ` Alexei Starovoitov
2018-10-18 23:36                               ` Joel Fernandes
2018-11-10  2:01                                 ` Chenbo Feng
2018-11-10 15:22                                   ` Greg KH
2018-11-10 18:58                                     ` Chenbo Feng
     [not found]     ` <CAKOZues6SE_c=ix7ap6QaJHqd1TmYpWWMJiu3=TtuqgKuqOUCA@mail.gmail.com>
2018-08-10 22:29       ` [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command Alexei Starovoitov
2018-07-31  2:01 ` [PATCH v2] Add BPF_SYNCHRONIZE_MAPS " Joel Fernandes
2018-07-31  2:06   ` Joel Fernandes
2018-07-31  4:03     ` Y Song
2018-07-31 21:56       ` Joel Fernandes
2018-07-31 22:30         ` Daniel Borkmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.