* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command @ 2018-07-29 15:51 Alexei Starovoitov 2018-07-29 20:46 ` Daniel Colascione 2018-07-31 2:01 ` [PATCH v2] Add BPF_SYNCHRONIZE_MAPS " Joel Fernandes 0 siblings, 2 replies; 32+ messages in thread From: Alexei Starovoitov @ 2018-07-29 15:51 UTC (permalink / raw) To: Daniel Colascione Cc: Joel Fernandes, LKML, Tim Murray, Network Development, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> wrote: > BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF > map made by a BPF program that is running at the time the > BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is > to provide a means for userspace to replace a BPF map with another, > newer version, then ensure that no component is still using the "old" > map before manipulating the "old" map in some way. > > Signed-off-by: Daniel Colascione <dancol@google.com> > --- > include/uapi/linux/bpf.h | 9 +++++++++ > kernel/bpf/syscall.c | 13 +++++++++++++ > 2 files changed, 22 insertions(+) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index b7db3261c62d..5b27e9117d3e 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key { > __u8 data[0]; /* Arbitrary size */ > }; > > +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a > + * BPF map made by a BPF program that is running at the time the > + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command that doesn't sound right to me. such command won't wait for the release of the references. in case of map-in-map the program does not hold the references to inner map (only to outer map). > + * is to provide a means for userspace to replace a BPF map with > + * another, newer version, then ensure that no component is still > + * using the "old" map before manipulating the "old" map in some way. > + */ that's correct, but the whole paragraph still reads too 'generic' which will lead to confusion, whereas the use case is map-in-map only. I think bpf_sync_inner_map or bpf_sync_map_in_map would be better choices for the name. btw i don't have reliable internet access at the moment, so pls excuse the delays. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command 2018-07-29 15:51 [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command Alexei Starovoitov @ 2018-07-29 20:46 ` Daniel Colascione 2018-07-29 20:58 ` [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES " Daniel Colascione 2018-07-31 2:01 ` [PATCH v2] Add BPF_SYNCHRONIZE_MAPS " Joel Fernandes 1 sibling, 1 reply; 32+ messages in thread From: Daniel Colascione @ 2018-07-29 20:46 UTC (permalink / raw) To: Alexei Starovoitov Cc: Joel Fernandes, LKML, Tim Murray, Network Development, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann [-- Attachment #1: Type: text/plain, Size: 2236 bytes --] On Sun, Jul 29, 2018 at 8:51 AM, Alexei Starovoitov < alexei.starovoitov@gmail.com> wrote: > On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> > wrote: > > BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF > > map made by a BPF program that is running at the time the > > BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is > > to provide a means for userspace to replace a BPF map with another, > > newer version, then ensure that no component is still using the "old" > > map before manipulating the "old" map in some way. > > > > Signed-off-by: Daniel Colascione <dancol@google.com> > > --- > > include/uapi/linux/bpf.h | 9 +++++++++ > > kernel/bpf/syscall.c | 13 +++++++++++++ > > 2 files changed, 22 insertions(+) > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index b7db3261c62d..5b27e9117d3e 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key { > > __u8 data[0]; /* Arbitrary size */ > > }; > > > > +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a > > + * BPF map made by a BPF program that is running at the time the > > + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command > > that doesn't sound right to me. > such command won't wait for the release of the references. > in case of map-in-map the program does not hold > the references to inner map (only to outer map). > Well, today that's the guarantee it provides. :-) I figured it'd be simpler to explain the guarantee this way. How about "waits for the release of any reference to a map obtained from another map"? > > > + * is to provide a means for userspace to replace a BPF map with > > + * another, newer version, then ensure that no component is still > > + * using the "old" map before manipulating the "old" map in some way. > > + */ > > that's correct, but the whole paragraph still reads too > 'generic' which will lead to confusion, > whereas the use case is map-in-map only. > I think bpf_sync_inner_map or > bpf_sync_map_in_map would be better > choices for the name. I worry that a name like that would be too specific. [-- Attachment #2: Type: text/html, Size: 3163 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-07-29 20:46 ` Daniel Colascione @ 2018-07-29 20:58 ` Daniel Colascione 2018-07-30 10:04 ` Daniel Borkmann [not found] ` <CAKOZues6SE_c=ix7ap6QaJHqd1TmYpWWMJiu3=TtuqgKuqOUCA@mail.gmail.com> 0 siblings, 2 replies; 32+ messages in thread From: Daniel Colascione @ 2018-07-29 20:58 UTC (permalink / raw) To: joelaf Cc: linux-kernel, timmurray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, Daniel Colascione BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all references to maps active at the instant the BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the expiration of map references obtained by BPF programs from other maps. The purpose of this command is to provide a means for userspace to replace a BPF map with another, newer version, then ensure that no component is still using the "old" map before manipulating the "old" map in some way. Signed-off-by: Daniel Colascione <dancol@google.com> --- include/uapi/linux/bpf.h | 14 ++++++++++++++ kernel/bpf/syscall.c | 13 +++++++++++++ 2 files changed, 27 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index b7db3261c62d..ca3cfca76edc 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -75,6 +75,19 @@ struct bpf_lpm_trie_key { __u8 data[0]; /* Arbitrary size */ }; +/* BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all + * references to maps active at the instant the + * BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is + * issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the + * expiration of map references obtained by BPF programs from + * other maps. + * + * The purpose of this command is to provide a means for userspace to + * replace a BPF map with another, newer version, then ensure that no + * component is still using the "old" map before manipulating the + * "old" map in some way. + */ + /* BPF syscall commands, see bpf(2) man-page for details. */ enum bpf_cmd { BPF_MAP_CREATE, @@ -98,6 +111,7 @@ enum bpf_cmd { BPF_BTF_LOAD, BPF_BTF_GET_FD_BY_ID, BPF_TASK_FD_QUERY, + BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES, }; enum bpf_map_type { diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index a31a1ba0f8ea..bc9a0713f47d 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2274,6 +2274,19 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN)) return -EPERM; + if (cmd == BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES) { + if (uattr != NULL || size != 0) + return -EINVAL; + err = security_bpf(cmd, NULL, 0); + if (err < 0) + return err; + /* BPF programs always enter a critical section while + * they have a map reference outstanding. + */ + synchronize_rcu(); + return 0; + } + err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size); if (err) return err; -- 2.18.0.345.g5c9ce644c3-goog ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-07-29 20:58 ` [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES " Daniel Colascione @ 2018-07-30 10:04 ` Daniel Borkmann [not found] ` <CAKOZues6SE_c=ix7ap6QaJHqd1TmYpWWMJiu3=TtuqgKuqOUCA@mail.gmail.com> 1 sibling, 0 replies; 32+ messages in thread From: Daniel Borkmann @ 2018-07-30 10:04 UTC (permalink / raw) To: Daniel Colascione, joelaf Cc: linux-kernel, timmurray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On 07/29/2018 10:58 PM, Daniel Colascione wrote: > BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all > references to maps active at the instant the > BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is > issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the > expiration of map references obtained by BPF programs from other maps. > > The purpose of this command is to provide a means for userspace to > replace a BPF map with another, newer version, then ensure that no > component is still using the "old" map before manipulating the "old" > map in some way. > > Signed-off-by: Daniel Colascione <dancol@google.com> > --- > include/uapi/linux/bpf.h | 14 ++++++++++++++ > kernel/bpf/syscall.c | 13 +++++++++++++ > 2 files changed, 27 insertions(+) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index b7db3261c62d..ca3cfca76edc 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -75,6 +75,19 @@ struct bpf_lpm_trie_key { > __u8 data[0]; /* Arbitrary size */ > }; > > +/* BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all > + * references to maps active at the instant the > + * BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is > + * issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the > + * expiration of map references obtained by BPF programs from > + * other maps. > + * > + * The purpose of this command is to provide a means for userspace to > + * replace a BPF map with another, newer version, then ensure that no > + * component is still using the "old" map before manipulating the > + * "old" map in some way. > + */ > + > /* BPF syscall commands, see bpf(2) man-page for details. */ > enum bpf_cmd { > BPF_MAP_CREATE, > @@ -98,6 +111,7 @@ enum bpf_cmd { > BPF_BTF_LOAD, > BPF_BTF_GET_FD_BY_ID, > BPF_TASK_FD_QUERY, > + BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES, > }; > > enum bpf_map_type { > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index a31a1ba0f8ea..bc9a0713f47d 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -2274,6 +2274,19 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz > if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN)) > return -EPERM; > > + if (cmd == BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES) { > + if (uattr != NULL || size != 0) > + return -EINVAL; > + err = security_bpf(cmd, NULL, 0); > + if (err < 0) > + return err; > + /* BPF programs always enter a critical section while > + * they have a map reference outstanding. > + */ > + synchronize_rcu(); > + return 0; > + } > + > err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size); > if (err) > return err; > Hmm, I don't think such UAPI as above is future-proof. In case we would want a similar mechanism in future for other maps, we would need a whole new bpf command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though the underlying map may not even be a map-to-map. Additionally, we don't have any map object at hand in the above, so we couldn't make any finer grained decisions either. Something like below would be more suitable and leaves room for extending this further in future. Thanks, Daniel From 8dfea71b73fa0d402633b76f78c106e82a7a5007 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann <daniel@iogearbox.net> Date: Mon, 30 Jul 2018 11:47:37 +0200 Subject: [PATCH] sync map refs Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> --- include/linux/bpf.h | 1 + include/uapi/linux/bpf.h | 1 + kernel/bpf/arraymap.c | 1 + kernel/bpf/hashtab.c | 1 + kernel/bpf/map_in_map.c | 6 ++++++ kernel/bpf/map_in_map.h | 1 + kernel/bpf/syscall.c | 24 ++++++++++++++++++++++++ 7 files changed, 35 insertions(+) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5b5ad95..7b51f86 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -34,6 +34,7 @@ struct bpf_map_ops { void (*map_free)(struct bpf_map *map); int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key); void (*map_release_uref)(struct bpf_map *map); + int (*map_sync_refs)(struct bpf_map *map); /* funcs callable from userspace and from eBPF programs */ void *(*map_lookup_elem)(struct bpf_map *map, void *key); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 8701139..e6ec1de 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -98,6 +98,7 @@ enum bpf_cmd { BPF_BTF_LOAD, BPF_BTF_GET_FD_BY_ID, BPF_TASK_FD_QUERY, + BPF_MAP_SYNC_REFS, }; enum bpf_map_type { diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 544e58f..ddaf42a 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -748,5 +748,6 @@ const struct bpf_map_ops array_of_maps_map_ops = { .map_fd_get_ptr = bpf_map_fd_get_ptr, .map_fd_put_ptr = bpf_map_fd_put_ptr, .map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem, + .map_sync_refs = bpf_map_sync_refs, .map_gen_lookup = array_of_map_gen_lookup, }; diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 513d9df..05380ea 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -1407,5 +1407,6 @@ const struct bpf_map_ops htab_of_maps_map_ops = { .map_fd_get_ptr = bpf_map_fd_get_ptr, .map_fd_put_ptr = bpf_map_fd_put_ptr, .map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem, + .map_sync_refs = bpf_map_sync_refs, .map_gen_lookup = htab_of_map_gen_lookup, }; diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c index 1da5746..698a50f 100644 --- a/kernel/bpf/map_in_map.c +++ b/kernel/bpf/map_in_map.c @@ -96,6 +96,12 @@ void bpf_map_fd_put_ptr(void *ptr) bpf_map_put(ptr); } +int bpf_map_sync_refs(struct bpf_map *map) +{ + synchronize_rcu(); + return 0; +} + u32 bpf_map_fd_sys_lookup_elem(void *ptr) { return ((struct bpf_map *)ptr)->id; diff --git a/kernel/bpf/map_in_map.h b/kernel/bpf/map_in_map.h index 6183db9..ac02456 100644 --- a/kernel/bpf/map_in_map.h +++ b/kernel/bpf/map_in_map.h @@ -19,6 +19,7 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0, void *bpf_map_fd_get_ptr(struct bpf_map *map, struct file *map_file, int ufd); void bpf_map_fd_put_ptr(void *ptr); +int bpf_map_sync_refs(struct bpf_map *map); u32 bpf_map_fd_sys_lookup_elem(void *ptr); #endif diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index a31a1ba..b1286cc 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -896,6 +896,27 @@ static int map_get_next_key(union bpf_attr *attr) return err; } +#define BPF_MAP_SYNC_REFS_LAST_FIELD map_fd + +static int map_sync_refs(union bpf_attr *attr) +{ + int err = -ENOTSUPP, ufd = attr->map_fd; + struct bpf_map *map; + struct fd f; + + if (CHECK_ATTR(BPF_MAP_SYNC_REFS)) + return -EINVAL; + + f = fdget(ufd); + map = __bpf_map_get(f); + if (IS_ERR(map)) + return PTR_ERR(map); + if (map->ops->map_sync_refs) + err = map->ops->map_sync_refs(map); + fdput(f); + return err; +} + static const struct bpf_prog_ops * const bpf_prog_types[] = { #define BPF_PROG_TYPE(_id, _name) \ [_id] = & _name ## _prog_ops, @@ -2303,6 +2324,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz case BPF_MAP_GET_NEXT_KEY: err = map_get_next_key(&attr); break; + case BPF_MAP_SYNC_REFS: + err = map_sync_refs(&attr); + break; case BPF_PROG_LOAD: err = bpf_prog_load(&attr); break; -- 2.9.5 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command @ 2018-07-30 10:04 ` Daniel Borkmann 0 siblings, 0 replies; 32+ messages in thread From: Daniel Borkmann @ 2018-07-30 10:04 UTC (permalink / raw) To: Daniel Colascione, joelaf Cc: linux-kernel, timmurray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On 07/29/2018 10:58 PM, Daniel Colascione wrote: > BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all > references to maps active at the instant the > BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is > issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the > expiration of map references obtained by BPF programs from other maps. > > The purpose of this command is to provide a means for userspace to > replace a BPF map with another, newer version, then ensure that no > component is still using the "old" map before manipulating the "old" > map in some way. > > Signed-off-by: Daniel Colascione <dancol@google.com> > --- > include/uapi/linux/bpf.h | 14 ++++++++++++++ > kernel/bpf/syscall.c | 13 +++++++++++++ > 2 files changed, 27 insertions(+) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index b7db3261c62d..ca3cfca76edc 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -75,6 +75,19 @@ struct bpf_lpm_trie_key { > __u8 data[0]; /* Arbitrary size */ > }; > > +/* BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all > + * references to maps active at the instant the > + * BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is > + * issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the > + * expiration of map references obtained by BPF programs from > + * other maps. > + * > + * The purpose of this command is to provide a means for userspace to > + * replace a BPF map with another, newer version, then ensure that no > + * component is still using the "old" map before manipulating the > + * "old" map in some way. > + */ > + > /* BPF syscall commands, see bpf(2) man-page for details. */ > enum bpf_cmd { > BPF_MAP_CREATE, > @@ -98,6 +111,7 @@ enum bpf_cmd { > BPF_BTF_LOAD, > BPF_BTF_GET_FD_BY_ID, > BPF_TASK_FD_QUERY, > + BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES, > }; > > enum bpf_map_type { > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index a31a1ba0f8ea..bc9a0713f47d 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -2274,6 +2274,19 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz > if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN)) > return -EPERM; > > + if (cmd == BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES) { > + if (uattr != NULL || size != 0) > + return -EINVAL; > + err = security_bpf(cmd, NULL, 0); > + if (err < 0) > + return err; > + /* BPF programs always enter a critical section while > + * they have a map reference outstanding. > + */ > + synchronize_rcu(); > + return 0; > + } > + > err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size); > if (err) > return err; > Hmm, I don't think such UAPI as above is future-proof. In case we would want a similar mechanism in future for other maps, we would need a whole new bpf command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though the underlying map may not even be a map-to-map. Additionally, we don't have any map object at hand in the above, so we couldn't make any finer grained decisions either. Something like below would be more suitable and leaves room for extending this further in future. Thanks, Daniel >From 8dfea71b73fa0d402633b76f78c106e82a7a5007 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann <daniel@iogearbox.net> Date: Mon, 30 Jul 2018 11:47:37 +0200 Subject: [PATCH] sync map refs Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> --- include/linux/bpf.h | 1 + include/uapi/linux/bpf.h | 1 + kernel/bpf/arraymap.c | 1 + kernel/bpf/hashtab.c | 1 + kernel/bpf/map_in_map.c | 6 ++++++ kernel/bpf/map_in_map.h | 1 + kernel/bpf/syscall.c | 24 ++++++++++++++++++++++++ 7 files changed, 35 insertions(+) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5b5ad95..7b51f86 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -34,6 +34,7 @@ struct bpf_map_ops { void (*map_free)(struct bpf_map *map); int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key); void (*map_release_uref)(struct bpf_map *map); + int (*map_sync_refs)(struct bpf_map *map); /* funcs callable from userspace and from eBPF programs */ void *(*map_lookup_elem)(struct bpf_map *map, void *key); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 8701139..e6ec1de 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -98,6 +98,7 @@ enum bpf_cmd { BPF_BTF_LOAD, BPF_BTF_GET_FD_BY_ID, BPF_TASK_FD_QUERY, + BPF_MAP_SYNC_REFS, }; enum bpf_map_type { diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 544e58f..ddaf42a 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -748,5 +748,6 @@ const struct bpf_map_ops array_of_maps_map_ops = { .map_fd_get_ptr = bpf_map_fd_get_ptr, .map_fd_put_ptr = bpf_map_fd_put_ptr, .map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem, + .map_sync_refs = bpf_map_sync_refs, .map_gen_lookup = array_of_map_gen_lookup, }; diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 513d9df..05380ea 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -1407,5 +1407,6 @@ const struct bpf_map_ops htab_of_maps_map_ops = { .map_fd_get_ptr = bpf_map_fd_get_ptr, .map_fd_put_ptr = bpf_map_fd_put_ptr, .map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem, + .map_sync_refs = bpf_map_sync_refs, .map_gen_lookup = htab_of_map_gen_lookup, }; diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c index 1da5746..698a50f 100644 --- a/kernel/bpf/map_in_map.c +++ b/kernel/bpf/map_in_map.c @@ -96,6 +96,12 @@ void bpf_map_fd_put_ptr(void *ptr) bpf_map_put(ptr); } +int bpf_map_sync_refs(struct bpf_map *map) +{ + synchronize_rcu(); + return 0; +} + u32 bpf_map_fd_sys_lookup_elem(void *ptr) { return ((struct bpf_map *)ptr)->id; diff --git a/kernel/bpf/map_in_map.h b/kernel/bpf/map_in_map.h index 6183db9..ac02456 100644 --- a/kernel/bpf/map_in_map.h +++ b/kernel/bpf/map_in_map.h @@ -19,6 +19,7 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0, void *bpf_map_fd_get_ptr(struct bpf_map *map, struct file *map_file, int ufd); void bpf_map_fd_put_ptr(void *ptr); +int bpf_map_sync_refs(struct bpf_map *map); u32 bpf_map_fd_sys_lookup_elem(void *ptr); #endif diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index a31a1ba..b1286cc 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -896,6 +896,27 @@ static int map_get_next_key(union bpf_attr *attr) return err; } +#define BPF_MAP_SYNC_REFS_LAST_FIELD map_fd + +static int map_sync_refs(union bpf_attr *attr) +{ + int err = -ENOTSUPP, ufd = attr->map_fd; + struct bpf_map *map; + struct fd f; + + if (CHECK_ATTR(BPF_MAP_SYNC_REFS)) + return -EINVAL; + + f = fdget(ufd); + map = __bpf_map_get(f); + if (IS_ERR(map)) + return PTR_ERR(map); + if (map->ops->map_sync_refs) + err = map->ops->map_sync_refs(map); + fdput(f); + return err; +} + static const struct bpf_prog_ops * const bpf_prog_types[] = { #define BPF_PROG_TYPE(_id, _name) \ [_id] = & _name ## _prog_ops, @@ -2303,6 +2324,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz case BPF_MAP_GET_NEXT_KEY: err = map_get_next_key(&attr); break; + case BPF_MAP_SYNC_REFS: + err = map_sync_refs(&attr); + break; case BPF_PROG_LOAD: err = bpf_prog_load(&attr); break; -- 2.9.5 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-07-30 10:04 ` Daniel Borkmann (?) @ 2018-07-30 10:25 ` Daniel Colascione 2018-07-31 0:26 ` Jakub Kicinski -1 siblings, 1 reply; 32+ messages in thread From: Daniel Colascione @ 2018-07-30 10:25 UTC (permalink / raw) To: Daniel Borkmann Cc: Joel Fernandes, linux-kernel, Tim Murray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote: > Hmm, I don't think such UAPI as above is future-proof. In case we would want > a similar mechanism in future for other maps, we would need a whole new bpf > command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though > the underlying map may not even be a map-to-map. Additionally, we don't have > any map object at hand in the above, so we couldn't make any finer grained > decisions either. Something like below would be more suitable and leaves room > for extending this further in future. YAGNI. Your proposed mechanism doesn't add anything under the current implementation. It's also not clear how a map-specific synchronization command is supposed to work in cases where we swap multiple map references. Do we synchronize_rcu multiple times? Why would we impose that inefficiency just for the sake of some non-specific future extensibility? Add some kind of batching layer? The current approach works for the anticipated use cases. While my preference is not to talk about map-to-maps at all in the user API and instead spec the thing as talking about map references in general, I'd rather have something that talks about references-to-maps-acquired-from-maps than this interface. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-07-30 10:25 ` Daniel Colascione @ 2018-07-31 0:26 ` Jakub Kicinski 2018-07-31 0:33 ` Daniel Colascione 0 siblings, 1 reply; 32+ messages in thread From: Jakub Kicinski @ 2018-07-31 0:26 UTC (permalink / raw) To: Daniel Colascione Cc: Daniel Borkmann, Joel Fernandes, linux-kernel, Tim Murray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote: > On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote: > > Hmm, I don't think such UAPI as above is future-proof. In case we would want > > a similar mechanism in future for other maps, we would need a whole new bpf > > command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though > > the underlying map may not even be a map-to-map. Additionally, we don't have > > any map object at hand in the above, so we couldn't make any finer grained > > decisions either. Something like below would be more suitable and leaves room > > for extending this further in future. > > YAGNI. Your proposed mechanism doesn't add anything under the current > implementation. FWIW in case of HW offload targeting a particular map may allow users to avoid a potentially slow sync with all the devices on the system. > It's also not clear how a map-specific synchronization > command is supposed to work in cases where we swap multiple map > references. Do we synchronize_rcu multiple times? Why would we impose > that inefficiency just for the sake of some non-specific future > extensibility? Add some kind of batching layer? The current approach > works for the anticipated use cases. > > While my preference is not to talk about map-to-maps at all in the > user API and instead spec the thing as talking about map references in > general, I'd rather have something that talks about > references-to-maps-acquired-from-maps than this interface. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-07-31 0:26 ` Jakub Kicinski @ 2018-07-31 0:33 ` Daniel Colascione 2018-07-31 0:45 ` Jakub Kicinski 2018-07-31 8:34 ` Daniel Borkmann 0 siblings, 2 replies; 32+ messages in thread From: Daniel Colascione @ 2018-07-31 0:33 UTC (permalink / raw) To: Jakub Kicinski Cc: Daniel Borkmann, Joel Fernandes, linux-kernel, Tim Murray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On Mon, Jul 30, 2018 at 5:26 PM, Jakub Kicinski <jakub.kicinski@netronome.com> wrote: > > On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote: > > On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote: > > > Hmm, I don't think such UAPI as above is future-proof. In case we would want > > > a similar mechanism in future for other maps, we would need a whole new bpf > > > command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though > > > the underlying map may not even be a map-to-map. Additionally, we don't have > > > any map object at hand in the above, so we couldn't make any finer grained > > > decisions either. Something like below would be more suitable and leaves room > > > for extending this further in future. > > > > YAGNI. Your proposed mechanism doesn't add anything under the current > > implementation. > > FWIW in case of HW offload targeting a particular map may allow users > to avoid a potentially slow sync with all the devices on the system. Sure. But such a thing doesn't exist right now (right?), and we can add that more-efficient-in-that-one-case BPF interface when it lands. I'd rather keep things simple for now. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-07-31 0:33 ` Daniel Colascione @ 2018-07-31 0:45 ` Jakub Kicinski 2018-07-31 0:50 ` Daniel Colascione 2018-07-31 8:34 ` Daniel Borkmann 1 sibling, 1 reply; 32+ messages in thread From: Jakub Kicinski @ 2018-07-31 0:45 UTC (permalink / raw) To: Daniel Colascione, Daniel Borkmann Cc: Joel Fernandes, linux-kernel, Tim Murray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On Mon, 30 Jul 2018 17:33:39 -0700, Daniel Colascione wrote: > On Mon, Jul 30, 2018 at 5:26 PM, Jakub Kicinski wrote: > > On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote: > > > On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote: > > > > Hmm, I don't think such UAPI as above is future-proof. In case we would want > > > > a similar mechanism in future for other maps, we would need a whole new bpf > > > > command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though > > > > the underlying map may not even be a map-to-map. Additionally, we don't have > > > > any map object at hand in the above, so we couldn't make any finer grained > > > > decisions either. Something like below would be more suitable and leaves room > > > > for extending this further in future. > > > > > > YAGNI. Your proposed mechanism doesn't add anything under the current > > > implementation. > > > > FWIW in case of HW offload targeting a particular map may allow users > > to avoid a potentially slow sync with all the devices on the system. > > Sure. But such a thing doesn't exist right now (right?), and we can > add that more-efficient-in-that-one-case BPF interface when it lands. > I'd rather keep things simple for now. You mean map-in-map offload doesn't exist today? True, but it's on the roadmap for Netronome. Tangentially it would be really useful for us to have a "the map has actually been freed" notification/barrier. We have complaints of users creating huge maps and then trying to free and recreate them quickly, and the creation part failing with -ENOMEM, because the free from a workqueue didn't run, yet :( It'd probably require a different API to solve than what's discussed here, but since we're talking about syncing things I thought I'd put it out there... ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-07-31 0:45 ` Jakub Kicinski @ 2018-07-31 0:50 ` Daniel Colascione 2018-07-31 1:14 ` Jakub Kicinski 0 siblings, 1 reply; 32+ messages in thread From: Daniel Colascione @ 2018-07-31 0:50 UTC (permalink / raw) To: Jakub Kicinski Cc: Daniel Borkmann, Joel Fernandes, linux-kernel, Tim Murray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On Mon, Jul 30, 2018 at 5:45 PM, Jakub Kicinski <jakub.kicinski@netronome.com> wrote: > On Mon, 30 Jul 2018 17:33:39 -0700, Daniel Colascione wrote: >> On Mon, Jul 30, 2018 at 5:26 PM, Jakub Kicinski wrote: >> > On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote: >> > > On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote: >> > > > Hmm, I don't think such UAPI as above is future-proof. In case we would want >> > > > a similar mechanism in future for other maps, we would need a whole new bpf >> > > > command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though >> > > > the underlying map may not even be a map-to-map. Additionally, we don't have >> > > > any map object at hand in the above, so we couldn't make any finer grained >> > > > decisions either. Something like below would be more suitable and leaves room >> > > > for extending this further in future. >> > > >> > > YAGNI. Your proposed mechanism doesn't add anything under the current >> > > implementation. >> > >> > FWIW in case of HW offload targeting a particular map may allow users >> > to avoid a potentially slow sync with all the devices on the system. >> >> Sure. But such a thing doesn't exist right now (right?), and we can >> add that more-efficient-in-that-one-case BPF interface when it lands. >> I'd rather keep things simple for now. > > You mean map-in-map offload doesn't exist today? True, but it's on the > roadmap for Netronome. Sounds cool. I'd still wait until map-in-map offload lands until adding the map-specific API though. > Tangentially it would be really useful for us to have a "the map has > actually been freed" notification/barrier. We have complaints of users > creating huge maps and then trying to free and recreate them quickly, > and the creation part failing with -ENOMEM, because the free from a > workqueue didn't run, yet :( It'd probably require a different API to > solve than what's discussed here, but since we're talking about syncing > things I thought I'd put it out there... Yeah. What you're talking about is what I meant upthread when I briefly mentioned a "refcount draining approach" --- you'd block until all references except your own to a map disappeared. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-07-31 0:50 ` Daniel Colascione @ 2018-07-31 1:14 ` Jakub Kicinski 0 siblings, 0 replies; 32+ messages in thread From: Jakub Kicinski @ 2018-07-31 1:14 UTC (permalink / raw) To: Daniel Colascione Cc: Daniel Borkmann, Joel Fernandes, linux-kernel, Tim Murray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On Mon, 30 Jul 2018 17:50:15 -0700, Daniel Colascione wrote: > > Tangentially it would be really useful for us to have a "the map has > > actually been freed" notification/barrier. We have complaints of users > > creating huge maps and then trying to free and recreate them quickly, > > and the creation part failing with -ENOMEM, because the free from a > > workqueue didn't run, yet :( It'd probably require a different API to > > solve than what's discussed here, but since we're talking about syncing > > things I thought I'd put it out there... > > Yeah. What you're talking about is what I meant upthread when I > briefly mentioned a "refcount draining approach" --- you'd block until > all references except your own to a map disappeared. I don't think so. The ref count goes to 0, work gets scheduled to call free, work runs, free gets called, device memory gets freed. Now the memory can be reused. As long as there are any refs we can't free memory, unless we want to make the semantics really ugly. But as I said, only superficially related to this patch. Perhaps solution will naturally fall out of an API to query the device capabilities and resources, if we ever get there. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-07-31 0:33 ` Daniel Colascione 2018-07-31 0:45 ` Jakub Kicinski @ 2018-07-31 8:34 ` Daniel Borkmann 2018-07-31 9:36 ` Daniel Colascione 1 sibling, 1 reply; 32+ messages in thread From: Daniel Borkmann @ 2018-07-31 8:34 UTC (permalink / raw) To: Daniel Colascione, Jakub Kicinski Cc: Joel Fernandes, linux-kernel, Tim Murray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On 07/31/2018 02:33 AM, Daniel Colascione wrote: > On Mon, Jul 30, 2018 at 5:26 PM, Jakub Kicinski > <jakub.kicinski@netronome.com> wrote: >> On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote: >>> On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote: >>>> Hmm, I don't think such UAPI as above is future-proof. In case we would want >>>> a similar mechanism in future for other maps, we would need a whole new bpf >>>> command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though >>>> the underlying map may not even be a map-to-map. Additionally, we don't have >>>> any map object at hand in the above, so we couldn't make any finer grained >>>> decisions either. Something like below would be more suitable and leaves room >>>> for extending this further in future. >>> >>> YAGNI. Your proposed mechanism doesn't add anything under the current >>> implementation. >> >> FWIW in case of HW offload targeting a particular map may allow users >> to avoid a potentially slow sync with all the devices on the system. > > Sure. But such a thing doesn't exist right now (right?), and we can > add that more-efficient-in-that-one-case BPF interface when it lands. > I'd rather keep things simple for now. I don't see a reason why that is even more complicated. An API command name such as BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is simply non-generic, and exposes specific map details (here: map-in-map) into the UAPI whereas it should reside within a specific implementation instead similar to other ops we have for maps. If in future other maps would be added that would have similar mechanisms of inner objects they return to the BPF program, we'll be adding yet another command just for this. Also, union bpf_attr is extensible, e.g. additional members could be added in future whenever needed for this subcommand instead of forcing it to NULL as done here. E.g. having a high-level one like bpf(BPF_MAP_WAIT, { .map_fd = fd }) could later on also cover the use-case from Jakub by adding an 'event' member into union bpf_attr where we could add other map specific wakeups for user space while the current default of 0 would be BPF_MAP_WAIT_PENDING_REFS (or the like). All I'm saying is to keep it generic so it can be extended later. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-07-31 8:34 ` Daniel Borkmann @ 2018-07-31 9:36 ` Daniel Colascione 2018-08-10 22:52 ` Alexei Starovoitov 0 siblings, 1 reply; 32+ messages in thread From: Daniel Colascione @ 2018-07-31 9:36 UTC (permalink / raw) To: Daniel Borkmann Cc: Jakub Kicinski, Joel Fernandes, linux-kernel, Tim Murray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On Tue, Jul 31, 2018 at 1:34 AM, Daniel Borkmann <daniel@iogearbox.net> wrote: > On 07/31/2018 02:33 AM, Daniel Colascione wrote: >> On Mon, Jul 30, 2018 at 5:26 PM, Jakub Kicinski >> <jakub.kicinski@netronome.com> wrote: >>> On Mon, 30 Jul 2018 03:25:43 -0700, Daniel Colascione wrote: >>>> On Mon, Jul 30, 2018 at 3:04 AM, Daniel Borkmann <daniel@iogearbox.net> wrote: >>>>> Hmm, I don't think such UAPI as above is future-proof. In case we would want >>>>> a similar mechanism in future for other maps, we would need a whole new bpf >>>>> command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though >>>>> the underlying map may not even be a map-to-map. Additionally, we don't have >>>>> any map object at hand in the above, so we couldn't make any finer grained >>>>> decisions either. Something like below would be more suitable and leaves room >>>>> for extending this further in future. >>>> >>>> YAGNI. Your proposed mechanism doesn't add anything under the current >>>> implementation. >>> >>> FWIW in case of HW offload targeting a particular map may allow users >>> to avoid a potentially slow sync with all the devices on the system. >> >> Sure. But such a thing doesn't exist right now (right?), and we can >> add that more-efficient-in-that-one-case BPF interface when it lands. >> I'd rather keep things simple for now. > > I don't see a reason why that is even more complicated. Both the API and the implementation are much more complicated in the per-map ops version: just look at the patch size. The size argument isn't necessarily a dealbreaker, but I still don't see what the extra code size and complexity is buying. > An API command name > such as BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is simply non-generic, and > exposes specific map details (here: map-in-map) into the UAPI whereas it > should reside within a specific implementation instead similar to other ops > we have for maps. But synchronize isn't conceptually a command that applies to a specific map. It waits on all references. Did you address my point about your proposed map-specific interface requiring redundant synchronize_rcu calls in the case where we swap multiple maps and want to wait for all the references to drain? Under my proposal, you'd just BPF_SYNCHRONIZE_WHATEVER and call schedule_rcu once. Under your proposal, we'd make it a per-map operation, so we'd issue one synchronize_rcu per map. > If in future other maps would be added that would have > similar mechanisms of inner objects they return to the BPF program, we'll > be adding yet another command just for this. And that's why my personal preference is to just calling this thing BPF_SYNCHRONIZE, which I'd define to wait for all such "inner objects". Alexei is the one who asked for the very specific naming, I believe. Anyway, we have a very simple patch that we could apply today. It addresses a real need, and it doesn't preclude adding something more specific later, when we know we need it. Besides, it's not as if adding a BPF command is particularly expensive. > Also, union bpf_attr is extensible, > e.g. additional members could be added in future whenever needed for this > subcommand instead of forcing it to NULL as done here. We fail with EINVAL when attr != NULL now, which means that we can safely accept a non-NULL attr-based subcommand later without breaking anyone. The interface is already extensible. > All I'm saying is to > keep it generic so it can be extended later. Sure, but no more extensible than it has to be. Prematurely-added extension points tend to cause trouble later. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-07-31 9:36 ` Daniel Colascione @ 2018-08-10 22:52 ` Alexei Starovoitov 2018-08-14 20:37 ` Daniel Colascione 0 siblings, 1 reply; 32+ messages in thread From: Alexei Starovoitov @ 2018-08-10 22:52 UTC (permalink / raw) To: Daniel Colascione Cc: Daniel Borkmann, Jakub Kicinski, Joel Fernandes, linux-kernel, Tim Murray, netdev, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On Tue, Jul 31, 2018 at 02:36:39AM -0700, Daniel Colascione wrote: > > > An API command name > > such as BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is simply non-generic, and > > exposes specific map details (here: map-in-map) into the UAPI whereas it > > should reside within a specific implementation instead similar to other ops > > we have for maps. > > But synchronize isn't conceptually a command that applies to a > specific map. It waits on all references. Did you address my point > about your proposed map-specific interface requiring redundant > synchronize_rcu calls in the case where we swap multiple maps and want > to wait for all the references to drain? Under my proposal, you'd just > BPF_SYNCHRONIZE_WHATEVER and call schedule_rcu once. Under your > proposal, we'd make it a per-map operation, so we'd issue one > synchronize_rcu per map. optimizing for multi-map sync sounds like premature optimization. In general I'd prefer DanielB proposal to make the sync logic map and fd specific, but before we argue about implementation further let's agree on the problem we're trying to solve. I believe the only issue being discussed is user space doesn't know when it's ok to start draining the inner map when it was replaced by bpf_map_update syscall command with another map, right? If we agree on that, should bpf_map_update handle it then? Wouldn't it be much easier to understand and use from user pov? No new commands to learn. map_update syscall replaced the map and old map is no longer accessed by the program via this given map-in-map. But if the replaced map is used directly or it sits in some other map-in-map slot the progs can still access it. My issue with DanielC SYNC cmd that it exposes implementation details and introduces complex 'synchronization' semantics. To majority of the users it won't be obvious what is being synchronized. My issue with DanielB WAIT_REF map_fd cmd that it needs to wait for all refs to this map to be dropped. I think combination of usercnt and refcnt can answer that, but feels dangerous to sleep potentially forever in a syscall until all prog->map references are gone, though such cmd is useful beyond map-in-map use case. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-08-10 22:52 ` Alexei Starovoitov @ 2018-08-14 20:37 ` Daniel Colascione 2018-08-16 4:01 ` Alexei Starovoitov 0 siblings, 1 reply; 32+ messages in thread From: Daniel Colascione @ 2018-08-14 20:37 UTC (permalink / raw) To: Alexei Starovoitov Cc: Daniel Borkmann, Jakub Kicinski, Joel Fernandes, linux-kernel, Tim Murray, netdev, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On Fri, Aug 10, 2018 at 3:52 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > On Tue, Jul 31, 2018 at 02:36:39AM -0700, Daniel Colascione wrote: >> >> > An API command name >> > such as BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is simply non-generic, and >> > exposes specific map details (here: map-in-map) into the UAPI whereas it >> > should reside within a specific implementation instead similar to other ops >> > we have for maps. >> >> But synchronize isn't conceptually a command that applies to a >> specific map. It waits on all references. Did you address my point >> about your proposed map-specific interface requiring redundant >> synchronize_rcu calls in the case where we swap multiple maps and want >> to wait for all the references to drain? Under my proposal, you'd just >> BPF_SYNCHRONIZE_WHATEVER and call schedule_rcu once. Under your >> proposal, we'd make it a per-map operation, so we'd issue one >> synchronize_rcu per map. > > optimizing for multi-map sync sounds like premature optimization. Maybe, but the per-map proposal is less efficient *and* more complicated! I don't want to spend more code just to go slower. > I believe the only issue being discussed is user space doesn't know > when it's ok to start draining the inner map when it was replaced > by bpf_map_update syscall command with another map, right? Yes. > If we agree on that, should bpf_map_update handle it then? > Wouldn't it be much easier to understand and use from user pov? > No new commands to learn. map_update syscall replaced the map > and old map is no longer accessed by the program via this given map-in-map. Maybe with a new BPF_SYNCHRONIZE flag for BPF_MAP_UPDATE_ELEM and BPF_MAP_DELETE_ELEM. Otherwise, it seems wrong to make every user of these commands pay for synchronization that only a few will need. > But if the replaced map is used directly or it sits in some other > map-in-map slot the progs can still access it. > > My issue with DanielC SYNC cmd that it exposes implementation details What implementation details? The command semantics are defined entirely in terms of existing user-visible primitives. > and introduces complex 'synchronization' semantics. To majority of > the users it won't be obvious what is being synchronized. > > My issue with DanielB WAIT_REF map_fd cmd that it needs to wait for all refs > to this map to be dropped. I think combination of usercnt and refcnt > can answer that, but feels dangerous to sleep potentially forever > in a syscall until all prog->map references are gone, though such > cmd is useful beyond map-in-map use case. In what scenarios? In any case, can we submit _something_? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command 2018-08-14 20:37 ` Daniel Colascione @ 2018-08-16 4:01 ` Alexei Starovoitov 2018-10-12 10:54 ` [PATCH v4] Wait for running BPF programs when updating map-in-map Daniel Colascione 0 siblings, 1 reply; 32+ messages in thread From: Alexei Starovoitov @ 2018-08-16 4:01 UTC (permalink / raw) To: Daniel Colascione Cc: Daniel Borkmann, Jakub Kicinski, Joel Fernandes, linux-kernel, Tim Murray, netdev, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On Tue, Aug 14, 2018 at 01:37:12PM -0700, Daniel Colascione wrote: > > > If we agree on that, should bpf_map_update handle it then? > > Wouldn't it be much easier to understand and use from user pov? > > No new commands to learn. map_update syscall replaced the map > > and old map is no longer accessed by the program via this given map-in-map. > > Maybe with a new BPF_SYNCHRONIZE flag for BPF_MAP_UPDATE_ELEM and > BPF_MAP_DELETE_ELEM. Otherwise, it seems wrong to make every user of > these commands pay for synchronization that only a few will need. I don't think extra flag is needed. Extra sync_rcu() for map-in-map is useful for all users. I would consider it a bugfix, since users that examine deleted map have this race today and removing the race is always a good thing especially since the cost is small. ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v4] Wait for running BPF programs when updating map-in-map 2018-08-16 4:01 ` Alexei Starovoitov @ 2018-10-12 10:54 ` Daniel Colascione 2018-10-12 20:54 ` Joel Fernandes 2018-10-13 2:31 ` Alexei Starovoitov 0 siblings, 2 replies; 32+ messages in thread From: Daniel Colascione @ 2018-10-12 10:54 UTC (permalink / raw) To: joelaf Cc: linux-kernel, timmurray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, Daniel Colascione The map-in-map frequently serves as a mechanism for atomic snapshotting of state that a BPF program might record. The current implementation is dangerous to use in this way, however, since userspace has no way of knowing when all programs that might have retrieved the "old" value of the map may have completed. This change ensures that map update operations on map-in-map map types always wait for all references to the old map to drop before returning to userspace. Signed-off-by: Daniel Colascione <dancol@google.com> --- kernel/bpf/syscall.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 8339d81cba1d..d7c16ae1e85a 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr) return err; } +static void maybe_wait_bpf_programs(struct bpf_map *map) +{ + /* Wait for any running BPF programs to complete so that + * userspace, when we return to it, knows that all programs + * that could be running use the new map value. + */ + if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS || + map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) { + synchronize_rcu(); + } +} + #define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags static int map_update_elem(union bpf_attr *attr) @@ -831,6 +843,7 @@ static int map_update_elem(union bpf_attr *attr) } __this_cpu_dec(bpf_prog_active); preempt_enable(); + maybe_wait_bpf_programs(map); out: free_value: kfree(value); @@ -883,6 +896,7 @@ static int map_delete_elem(union bpf_attr *attr) rcu_read_unlock(); __this_cpu_dec(bpf_prog_active); preempt_enable(); + maybe_wait_bpf_programs(map); out: kfree(key); err_put: -- 2.19.0.605.g01d371f741-goog ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map 2018-10-12 10:54 ` [PATCH v4] Wait for running BPF programs when updating map-in-map Daniel Colascione @ 2018-10-12 20:54 ` Joel Fernandes 2018-10-13 2:31 ` Alexei Starovoitov 1 sibling, 0 replies; 32+ messages in thread From: Joel Fernandes @ 2018-10-12 20:54 UTC (permalink / raw) To: Daniel Colascione Cc: linux-kernel, timmurray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, kernel-team On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote: > The map-in-map frequently serves as a mechanism for atomic > snapshotting of state that a BPF program might record. The current > implementation is dangerous to use in this way, however, since > userspace has no way of knowing when all programs that might have > retrieved the "old" value of the map may have completed. > > This change ensures that map update operations on map-in-map map types > always wait for all references to the old map to drop before returning > to userspace. > > Signed-off-by: Daniel Colascione <dancol@google.com> > --- > kernel/bpf/syscall.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 8339d81cba1d..d7c16ae1e85a 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr) > return err; > } > > +static void maybe_wait_bpf_programs(struct bpf_map *map) > +{ > + /* Wait for any running BPF programs to complete so that > + * userspace, when we return to it, knows that all programs > + * that could be running use the new map value. > + */ > + if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS || > + map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) { > + synchronize_rcu(); > + } > +} > + > #define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags > > static int map_update_elem(union bpf_attr *attr) > @@ -831,6 +843,7 @@ static int map_update_elem(union bpf_attr *attr) > } > __this_cpu_dec(bpf_prog_active); > preempt_enable(); > + maybe_wait_bpf_programs(map); > out: > free_value: > kfree(value); > @@ -883,6 +896,7 @@ static int map_delete_elem(union bpf_attr *attr) > rcu_read_unlock(); > __this_cpu_dec(bpf_prog_active); > preempt_enable(); > + maybe_wait_bpf_programs(map); Looks good to me, Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Also I believe that those rcu_read_lock() and rcu_read_unlock() calls in the existing code are useless. preempt_disable()d code is already an RCU read-side section, and synchronize_rcu and friends work on those type of read-side sections as well (as of recent kernel releases) however removing it may make lockdep unhappy, unless we also replace all rcu_dereference() usages with rcu_dereference_sched(), so lets leave that alone for now I guess. thanks, - Joel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map 2018-10-12 10:54 ` [PATCH v4] Wait for running BPF programs when updating map-in-map Daniel Colascione 2018-10-12 20:54 ` Joel Fernandes @ 2018-10-13 2:31 ` Alexei Starovoitov 2018-10-16 17:39 ` Joel Fernandes 1 sibling, 1 reply; 32+ messages in thread From: Alexei Starovoitov @ 2018-10-13 2:31 UTC (permalink / raw) To: Daniel Colascione Cc: joelaf, linux-kernel, timmurray, netdev, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote: > The map-in-map frequently serves as a mechanism for atomic > snapshotting of state that a BPF program might record. The current > implementation is dangerous to use in this way, however, since > userspace has no way of knowing when all programs that might have > retrieved the "old" value of the map may have completed. > > This change ensures that map update operations on map-in-map map types > always wait for all references to the old map to drop before returning > to userspace. > > Signed-off-by: Daniel Colascione <dancol@google.com> > --- > kernel/bpf/syscall.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 8339d81cba1d..d7c16ae1e85a 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr) > return err; > } > > +static void maybe_wait_bpf_programs(struct bpf_map *map) > +{ > + /* Wait for any running BPF programs to complete so that > + * userspace, when we return to it, knows that all programs > + * that could be running use the new map value. > + */ > + if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS || > + map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) { > + synchronize_rcu(); > + } extra {} were not necessary. I removed them while applying to bpf-next. Please run checkpatch.pl next time. Thanks ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map 2018-10-13 2:31 ` Alexei Starovoitov @ 2018-10-16 17:39 ` Joel Fernandes 2018-10-18 15:46 ` Alexei Starovoitov 0 siblings, 1 reply; 32+ messages in thread From: Joel Fernandes @ 2018-10-16 17:39 UTC (permalink / raw) To: Alexei Starovoitov Cc: Daniel Colascione, Joel Fernandes, LKML, Tim Murray, netdev, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, stable On Fri, Oct 12, 2018 at 7:31 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote: >> The map-in-map frequently serves as a mechanism for atomic >> snapshotting of state that a BPF program might record. The current >> implementation is dangerous to use in this way, however, since >> userspace has no way of knowing when all programs that might have >> retrieved the "old" value of the map may have completed. >> >> This change ensures that map update operations on map-in-map map types >> always wait for all references to the old map to drop before returning >> to userspace. >> >> Signed-off-by: Daniel Colascione <dancol@google.com> >> --- >> kernel/bpf/syscall.c | 14 ++++++++++++++ >> 1 file changed, 14 insertions(+) >> >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c >> index 8339d81cba1d..d7c16ae1e85a 100644 >> --- a/kernel/bpf/syscall.c >> +++ b/kernel/bpf/syscall.c >> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr) >> return err; >> } >> >> +static void maybe_wait_bpf_programs(struct bpf_map *map) >> +{ >> + /* Wait for any running BPF programs to complete so that >> + * userspace, when we return to it, knows that all programs >> + * that could be running use the new map value. >> + */ >> + if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS || >> + map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) { >> + synchronize_rcu(); >> + } > > extra {} were not necessary. I removed them while applying to bpf-next. > Please run checkpatch.pl next time. > Thanks Thanks Alexei for taking it. Me and Lorenzo were discussing that not having this causes incorrect behavior for apps using map-in-map for this. So I CC'd stable as well. -Joel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map 2018-10-16 17:39 ` Joel Fernandes @ 2018-10-18 15:46 ` Alexei Starovoitov 2018-10-18 23:36 ` Joel Fernandes 0 siblings, 1 reply; 32+ messages in thread From: Alexei Starovoitov @ 2018-10-18 15:46 UTC (permalink / raw) To: Joel Fernandes Cc: Daniel Colascione, Joel Fernandes, LKML, Tim Murray, netdev, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, stable On Tue, Oct 16, 2018 at 10:39:57AM -0700, Joel Fernandes wrote: > On Fri, Oct 12, 2018 at 7:31 PM, Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote: > >> The map-in-map frequently serves as a mechanism for atomic > >> snapshotting of state that a BPF program might record. The current > >> implementation is dangerous to use in this way, however, since > >> userspace has no way of knowing when all programs that might have > >> retrieved the "old" value of the map may have completed. > >> > >> This change ensures that map update operations on map-in-map map types > >> always wait for all references to the old map to drop before returning > >> to userspace. > >> > >> Signed-off-by: Daniel Colascione <dancol@google.com> > >> --- > >> kernel/bpf/syscall.c | 14 ++++++++++++++ > >> 1 file changed, 14 insertions(+) > >> > >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > >> index 8339d81cba1d..d7c16ae1e85a 100644 > >> --- a/kernel/bpf/syscall.c > >> +++ b/kernel/bpf/syscall.c > >> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr) > >> return err; > >> } > >> > >> +static void maybe_wait_bpf_programs(struct bpf_map *map) > >> +{ > >> + /* Wait for any running BPF programs to complete so that > >> + * userspace, when we return to it, knows that all programs > >> + * that could be running use the new map value. > >> + */ > >> + if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS || > >> + map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) { > >> + synchronize_rcu(); > >> + } > > > > extra {} were not necessary. I removed them while applying to bpf-next. > > Please run checkpatch.pl next time. > > Thanks > > Thanks Alexei for taking it. Me and Lorenzo were discussing that not > having this causes incorrect behavior for apps using map-in-map for > this. So I CC'd stable as well. It is too late in the release cycle. We can submit it to stable releases after the merge window. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map 2018-10-18 15:46 ` Alexei Starovoitov @ 2018-10-18 23:36 ` Joel Fernandes 2018-11-10 2:01 ` Chenbo Feng 0 siblings, 1 reply; 32+ messages in thread From: Joel Fernandes @ 2018-10-18 23:36 UTC (permalink / raw) To: Alexei Starovoitov Cc: Daniel Colascione, Joel Fernandes, LKML, Tim Murray, netdev, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, stable On Thu, Oct 18, 2018 at 08:46:59AM -0700, Alexei Starovoitov wrote: > On Tue, Oct 16, 2018 at 10:39:57AM -0700, Joel Fernandes wrote: > > On Fri, Oct 12, 2018 at 7:31 PM, Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote: > > >> The map-in-map frequently serves as a mechanism for atomic > > >> snapshotting of state that a BPF program might record. The current > > >> implementation is dangerous to use in this way, however, since > > >> userspace has no way of knowing when all programs that might have > > >> retrieved the "old" value of the map may have completed. > > >> > > >> This change ensures that map update operations on map-in-map map types > > >> always wait for all references to the old map to drop before returning > > >> to userspace. > > >> > > >> Signed-off-by: Daniel Colascione <dancol@google.com> > > >> --- > > >> kernel/bpf/syscall.c | 14 ++++++++++++++ > > >> 1 file changed, 14 insertions(+) > > >> > > >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > >> index 8339d81cba1d..d7c16ae1e85a 100644 > > >> --- a/kernel/bpf/syscall.c > > >> +++ b/kernel/bpf/syscall.c > > >> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr) > > >> return err; > > >> } > > >> > > >> +static void maybe_wait_bpf_programs(struct bpf_map *map) > > >> +{ > > >> + /* Wait for any running BPF programs to complete so that > > >> + * userspace, when we return to it, knows that all programs > > >> + * that could be running use the new map value. > > >> + */ > > >> + if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS || > > >> + map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) { > > >> + synchronize_rcu(); > > >> + } > > > > > > extra {} were not necessary. I removed them while applying to bpf-next. > > > Please run checkpatch.pl next time. > > > Thanks > > > > Thanks Alexei for taking it. Me and Lorenzo were discussing that not > > having this causes incorrect behavior for apps using map-in-map for > > this. So I CC'd stable as well. > > It is too late in the release cycle. > We can submit it to stable releases after the merge window. > Sounds good, thanks. - Joel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map 2018-10-18 23:36 ` Joel Fernandes @ 2018-11-10 2:01 ` Chenbo Feng 2018-11-10 15:22 ` Greg KH 0 siblings, 1 reply; 32+ messages in thread From: Chenbo Feng @ 2018-11-10 2:01 UTC (permalink / raw) To: joel Cc: Alexei Starovoitov, Daniel Colascione, Joel Fernandes, linux-kernel, Tim Murray, netdev, Lorenzo Colitti, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, stable Hi netdev, Could we queue up this patch to stable 4.14 and stable 4.19? I can provide a backport patch if needed. I checked it is a clean cherry-pick for 4.19 but have some minor conflict for 4.14. Thanks Chenbo Feng On Thu, Oct 18, 2018 at 4:36 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > On Thu, Oct 18, 2018 at 08:46:59AM -0700, Alexei Starovoitov wrote: > > On Tue, Oct 16, 2018 at 10:39:57AM -0700, Joel Fernandes wrote: > > > On Fri, Oct 12, 2018 at 7:31 PM, Alexei Starovoitov > > > <alexei.starovoitov@gmail.com> wrote: > > > > On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote: > > > >> The map-in-map frequently serves as a mechanism for atomic > > > >> snapshotting of state that a BPF program might record. The current > > > >> implementation is dangerous to use in this way, however, since > > > >> userspace has no way of knowing when all programs that might have > > > >> retrieved the "old" value of the map may have completed. > > > >> > > > >> This change ensures that map update operations on map-in-map map types > > > >> always wait for all references to the old map to drop before returning > > > >> to userspace. > > > >> > > > >> Signed-off-by: Daniel Colascione <dancol@google.com> > > > >> --- > > > >> kernel/bpf/syscall.c | 14 ++++++++++++++ > > > >> 1 file changed, 14 insertions(+) > > > >> > > > >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > > >> index 8339d81cba1d..d7c16ae1e85a 100644 > > > >> --- a/kernel/bpf/syscall.c > > > >> +++ b/kernel/bpf/syscall.c > > > >> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr) > > > >> return err; > > > >> } > > > >> > > > >> +static void maybe_wait_bpf_programs(struct bpf_map *map) > > > >> +{ > > > >> + /* Wait for any running BPF programs to complete so that > > > >> + * userspace, when we return to it, knows that all programs > > > >> + * that could be running use the new map value. > > > >> + */ > > > >> + if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS || > > > >> + map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) { > > > >> + synchronize_rcu(); > > > >> + } > > > > > > > > extra {} were not necessary. I removed them while applying to bpf-next. > > > > Please run checkpatch.pl next time. > > > > Thanks > > > > > > Thanks Alexei for taking it. Me and Lorenzo were discussing that not > > > having this causes incorrect behavior for apps using map-in-map for > > > this. So I CC'd stable as well. > > > > It is too late in the release cycle. > > We can submit it to stable releases after the merge window. > > > > Sounds good, thanks. > > - Joel > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map 2018-11-10 2:01 ` Chenbo Feng @ 2018-11-10 15:22 ` Greg KH 2018-11-10 18:58 ` Chenbo Feng 0 siblings, 1 reply; 32+ messages in thread From: Greg KH @ 2018-11-10 15:22 UTC (permalink / raw) To: Chenbo Feng Cc: joel, Alexei Starovoitov, Daniel Colascione, Joel Fernandes, linux-kernel, Tim Murray, netdev, Lorenzo Colitti, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, stable On Fri, Nov 09, 2018 at 06:01:54PM -0800, Chenbo Feng wrote: > Hi netdev, > > Could we queue up this patch to stable 4.14 and stable 4.19? I can > provide a backport patch if needed. I checked it is a clean > cherry-pick for 4.19 but have some minor conflict for 4.14. What is the git commit id of the patch in Linus's tree? thanks greg k-h ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v4] Wait for running BPF programs when updating map-in-map 2018-11-10 15:22 ` Greg KH @ 2018-11-10 18:58 ` Chenbo Feng 0 siblings, 0 replies; 32+ messages in thread From: Chenbo Feng @ 2018-11-10 18:58 UTC (permalink / raw) To: gregkh Cc: joel, Alexei Starovoitov, Daniel Colascione, Joel Fernandes, linux-kernel, Tim Murray, netdev, Lorenzo Colitti, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, stable The Linus's tree commit SHA is 1ae80cf31938c8f77c37a29bbe29e7f1cd492be8. I can send the patch to stable directly if needed. Thanks Chenbo Feng On Sat, Nov 10, 2018 at 7:22 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > On Fri, Nov 09, 2018 at 06:01:54PM -0800, Chenbo Feng wrote: > > Hi netdev, > > > > Could we queue up this patch to stable 4.14 and stable 4.19? I can > > provide a backport patch if needed. I checked it is a clean > > cherry-pick for 4.19 but have some minor conflict for 4.14. > > What is the git commit id of the patch in Linus's tree? > > thanks > > greg k-h ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <CAKOZues6SE_c=ix7ap6QaJHqd1TmYpWWMJiu3=TtuqgKuqOUCA@mail.gmail.com>]
* Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command [not found] ` <CAKOZues6SE_c=ix7ap6QaJHqd1TmYpWWMJiu3=TtuqgKuqOUCA@mail.gmail.com> @ 2018-08-10 22:29 ` Alexei Starovoitov 0 siblings, 0 replies; 32+ messages in thread From: Alexei Starovoitov @ 2018-08-10 22:29 UTC (permalink / raw) To: Daniel Colascione Cc: Joel Fernandes, linux-kernel, Tim Murray, netdev, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann On Thu, Aug 09, 2018 at 10:17:50AM -0700, Daniel Colascione wrote: > Ping. sorry for delay. have been off grid. let's continue in the other thread for full context ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command 2018-07-29 15:51 [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command Alexei Starovoitov 2018-07-29 20:46 ` Daniel Colascione @ 2018-07-31 2:01 ` Joel Fernandes 2018-07-31 2:06 ` Joel Fernandes 1 sibling, 1 reply; 32+ messages in thread From: Joel Fernandes @ 2018-07-31 2:01 UTC (permalink / raw) To: Alexei Starovoitov Cc: Daniel Colascione, Joel Fernandes, LKML, Tim Murray, Network Development, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann On Sun, Jul 29, 2018 at 06:51:18PM +0300, Alexei Starovoitov wrote: > On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> wrote: > > BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF > > map made by a BPF program that is running at the time the > > BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is > > to provide a means for userspace to replace a BPF map with another, > > newer version, then ensure that no component is still using the "old" > > map before manipulating the "old" map in some way. > > > > Signed-off-by: Daniel Colascione <dancol@google.com> > > --- > > include/uapi/linux/bpf.h | 9 +++++++++ > > kernel/bpf/syscall.c | 13 +++++++++++++ > > 2 files changed, 22 insertions(+) > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index b7db3261c62d..5b27e9117d3e 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key { > > __u8 data[0]; /* Arbitrary size */ > > }; > > > > +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a > > + * BPF map made by a BPF program that is running at the time the > > + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command > > that doesn't sound right to me. > such command won't wait for the release of the references. > in case of map-in-map the program does not hold > the references to inner map (only to outer map). I didn't follow this completely. The userspace program is using the inner map per your description of the algorithm for using map-in-map to solve the race conditions that this patch is trying to address: If you don't mind, I copy-pasted it below from your netdev post: if you use map-in-map you don't need extra boolean map. 0. bpf prog can do inner_map = lookup(map_in_map, key=0); lookup(inner_map, your_real_key); 1. user space writes into map_in_map[0] <- FD of new map 2. some cpus are using old inner map and some a new 3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched() which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu() which will guarantee that progs finished. 4. scan old inner map In step 2, as you mentioned there are CPUs using different inner maps. So could you clarify how the synchronize_rcu mechanism will even work if you're now saying "program does not hold references to the inner maps"? Thanks! - Joel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command 2018-07-31 2:01 ` [PATCH v2] Add BPF_SYNCHRONIZE_MAPS " Joel Fernandes @ 2018-07-31 2:06 ` Joel Fernandes 2018-07-31 4:03 ` Y Song 0 siblings, 1 reply; 32+ messages in thread From: Joel Fernandes @ 2018-07-31 2:06 UTC (permalink / raw) To: Alexei Starovoitov Cc: Daniel Colascione, Joel Fernandes, LKML, Tim Murray, Network Development, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann On Mon, Jul 30, 2018 at 07:01:22PM -0700, Joel Fernandes wrote: > On Sun, Jul 29, 2018 at 06:51:18PM +0300, Alexei Starovoitov wrote: > > On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> wrote: > > > BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF > > > map made by a BPF program that is running at the time the > > > BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is > > > to provide a means for userspace to replace a BPF map with another, > > > newer version, then ensure that no component is still using the "old" > > > map before manipulating the "old" map in some way. > > > > > > Signed-off-by: Daniel Colascione <dancol@google.com> > > > --- > > > include/uapi/linux/bpf.h | 9 +++++++++ > > > kernel/bpf/syscall.c | 13 +++++++++++++ > > > 2 files changed, 22 insertions(+) > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > index b7db3261c62d..5b27e9117d3e 100644 > > > --- a/include/uapi/linux/bpf.h > > > +++ b/include/uapi/linux/bpf.h > > > @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key { > > > __u8 data[0]; /* Arbitrary size */ > > > }; > > > > > > +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a > > > + * BPF map made by a BPF program that is running at the time the > > > + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command > > > > that doesn't sound right to me. > > such command won't wait for the release of the references. > > in case of map-in-map the program does not hold > > the references to inner map (only to outer map). > > I didn't follow this completely. > > The userspace program is using the inner map per your description of the Sorry just to correct myself, here I meant "The kernel eBPF program is using the inner map on multiple CPUs" instead of "userspace". thanks, - Joel > algorithm for using map-in-map to solve the race conditions that this patch > is trying to address: > > If you don't mind, I copy-pasted it below from your netdev post: > > if you use map-in-map you don't need extra boolean map. > 0. bpf prog can do > inner_map = lookup(map_in_map, key=0); > lookup(inner_map, your_real_key); > 1. user space writes into map_in_map[0] <- FD of new map > 2. some cpus are using old inner map and some a new > 3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched() > which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu() > which will guarantee that progs finished. > 4. scan old inner map > > In step 2, as you mentioned there are CPUs using different inner maps. So > could you clarify how the synchronize_rcu mechanism will even work if you're > now saying "program does not hold references to the inner maps"? > > Thanks! > > - Joel > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command 2018-07-31 2:06 ` Joel Fernandes @ 2018-07-31 4:03 ` Y Song 2018-07-31 21:56 ` Joel Fernandes 0 siblings, 1 reply; 32+ messages in thread From: Y Song @ 2018-07-31 4:03 UTC (permalink / raw) To: Joel Fernandes Cc: Alexei Starovoitov, Daniel Colascione, Joel Fernandes, LKML, Tim Murray, Network Development, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann On Mon, Jul 30, 2018 at 7:06 PM, Joel Fernandes <joel@joelfernandes.org> wrote: > On Mon, Jul 30, 2018 at 07:01:22PM -0700, Joel Fernandes wrote: >> On Sun, Jul 29, 2018 at 06:51:18PM +0300, Alexei Starovoitov wrote: >> > On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> wrote: >> > > BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF >> > > map made by a BPF program that is running at the time the >> > > BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is >> > > to provide a means for userspace to replace a BPF map with another, >> > > newer version, then ensure that no component is still using the "old" >> > > map before manipulating the "old" map in some way. >> > > >> > > Signed-off-by: Daniel Colascione <dancol@google.com> >> > > --- >> > > include/uapi/linux/bpf.h | 9 +++++++++ >> > > kernel/bpf/syscall.c | 13 +++++++++++++ >> > > 2 files changed, 22 insertions(+) >> > > >> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> > > index b7db3261c62d..5b27e9117d3e 100644 >> > > --- a/include/uapi/linux/bpf.h >> > > +++ b/include/uapi/linux/bpf.h >> > > @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key { >> > > __u8 data[0]; /* Arbitrary size */ >> > > }; >> > > >> > > +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a >> > > + * BPF map made by a BPF program that is running at the time the >> > > + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command >> > >> > that doesn't sound right to me. >> > such command won't wait for the release of the references. >> > in case of map-in-map the program does not hold >> > the references to inner map (only to outer map). >> >> I didn't follow this completely. >> >> The userspace program is using the inner map per your description of the > > Sorry just to correct myself, here I meant "The kernel eBPF program is using > the inner map on multiple CPUs" instead of "userspace". > > thanks, > > - Joel > > > > > >> algorithm for using map-in-map to solve the race conditions that this patch >> is trying to address: >> >> If you don't mind, I copy-pasted it below from your netdev post: >> >> if you use map-in-map you don't need extra boolean map. >> 0. bpf prog can do >> inner_map = lookup(map_in_map, key=0); >> lookup(inner_map, your_real_key); >> 1. user space writes into map_in_map[0] <- FD of new map >> 2. some cpus are using old inner map and some a new >> 3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched() >> which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu() >> which will guarantee that progs finished. >> 4. scan old inner map >> >> In step 2, as you mentioned there are CPUs using different inner maps. So >> could you clarify how the synchronize_rcu mechanism will even work if you're >> now saying "program does not hold references to the inner maps"? The program only held references to the outer maps, and the outer map held references to the inner maps. The user space program can add/remove the inner map for a particular outer map while the prog <-> outer-map relationship is not changed. >> >> Thanks! >> >> - Joel >> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command 2018-07-31 4:03 ` Y Song @ 2018-07-31 21:56 ` Joel Fernandes 2018-07-31 22:30 ` Daniel Borkmann 0 siblings, 1 reply; 32+ messages in thread From: Joel Fernandes @ 2018-07-31 21:56 UTC (permalink / raw) To: Y Song Cc: Joel Fernandes, Alexei Starovoitov, Daniel Colascione, LKML, Tim Murray, Network Development, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann On Mon, Jul 30, 2018 at 09:03:18PM -0700, Y Song wrote: > On Mon, Jul 30, 2018 at 7:06 PM, Joel Fernandes <joel@joelfernandes.org> wrote: > > On Mon, Jul 30, 2018 at 07:01:22PM -0700, Joel Fernandes wrote: > >> On Sun, Jul 29, 2018 at 06:51:18PM +0300, Alexei Starovoitov wrote: > >> > On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> wrote: > >> > > BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF > >> > > map made by a BPF program that is running at the time the > >> > > BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is > >> > > to provide a means for userspace to replace a BPF map with another, > >> > > newer version, then ensure that no component is still using the "old" > >> > > map before manipulating the "old" map in some way. > >> > > > >> > > Signed-off-by: Daniel Colascione <dancol@google.com> > >> > > --- > >> > > include/uapi/linux/bpf.h | 9 +++++++++ > >> > > kernel/bpf/syscall.c | 13 +++++++++++++ > >> > > 2 files changed, 22 insertions(+) > >> > > > >> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >> > > index b7db3261c62d..5b27e9117d3e 100644 > >> > > --- a/include/uapi/linux/bpf.h > >> > > +++ b/include/uapi/linux/bpf.h > >> > > @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key { > >> > > __u8 data[0]; /* Arbitrary size */ > >> > > }; > >> > > > >> > > +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a > >> > > + * BPF map made by a BPF program that is running at the time the > >> > > + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command > >> > > >> > that doesn't sound right to me. > >> > such command won't wait for the release of the references. > >> > in case of map-in-map the program does not hold > >> > the references to inner map (only to outer map). > >> > >> I didn't follow this completely. > >> > >> The userspace program is using the inner map per your description of the > >> algorithm for using map-in-map to solve the race conditions that this patch > >> is trying to address: > >> > >> If you don't mind, I copy-pasted it below from your netdev post: > >> > >> if you use map-in-map you don't need extra boolean map. > >> 0. bpf prog can do > >> inner_map = lookup(map_in_map, key=0); > >> lookup(inner_map, your_real_key); > >> 1. user space writes into map_in_map[0] <- FD of new map > >> 2. some cpus are using old inner map and some a new > >> 3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched() > >> which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu() > >> which will guarantee that progs finished. > >> 4. scan old inner map > >> > >> In step 2, as you mentioned there are CPUs using different inner maps. So > >> could you clarify how the synchronize_rcu mechanism will even work if you're > >> now saying "program does not hold references to the inner maps"? > > The program only held references to the outer maps, and the outer map > held references to the inner maps. The user space program can add/remove > the inner map for a particular outer map while the prog <-> outer-map > relationship is not changed. My definition of "reference" in this context is protection by rcu_read_lock. So I was concerned the above map-in-map access isn't protected as such when Alexei said "program doesn't have reference on inner map" in the above steps. Maybe I misunderstood what is the meaning of reference here. To make the map-in-map thing to work for Chenbo/Lorenzo's usecase, both the access of outer map at key=0 and the inner map have to protect by rcu_read_lock so that the membarrier call will work. So basically step 0 in the steps above should be rcu_read_lock protected to satisfy Chenbo/Lorenzo's usecase. I know today the entire program is run as preempt disabled (unless something changed) so this shouldn't be a problem, but in the future if the verifier is doing similar things at a finer grainer level, then the above has to be taken into consideration. Does that make sense or am I missing something? thanks, - Joel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command 2018-07-31 21:56 ` Joel Fernandes @ 2018-07-31 22:30 ` Daniel Borkmann 0 siblings, 0 replies; 32+ messages in thread From: Daniel Borkmann @ 2018-07-31 22:30 UTC (permalink / raw) To: Joel Fernandes, Y Song Cc: Joel Fernandes, Alexei Starovoitov, Daniel Colascione, LKML, Tim Murray, Network Development, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov On 07/31/2018 11:56 PM, Joel Fernandes wrote: > On Mon, Jul 30, 2018 at 09:03:18PM -0700, Y Song wrote: >> On Mon, Jul 30, 2018 at 7:06 PM, Joel Fernandes <joel@joelfernandes.org> wrote: >>> On Mon, Jul 30, 2018 at 07:01:22PM -0700, Joel Fernandes wrote: >>>> On Sun, Jul 29, 2018 at 06:51:18PM +0300, Alexei Starovoitov wrote: >>>>> On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@google.com> wrote: >>>>>> BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF >>>>>> map made by a BPF program that is running at the time the >>>>>> BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is >>>>>> to provide a means for userspace to replace a BPF map with another, >>>>>> newer version, then ensure that no component is still using the "old" >>>>>> map before manipulating the "old" map in some way. >>>>>> >>>>>> Signed-off-by: Daniel Colascione <dancol@google.com> >>>>>> --- >>>>>> include/uapi/linux/bpf.h | 9 +++++++++ >>>>>> kernel/bpf/syscall.c | 13 +++++++++++++ >>>>>> 2 files changed, 22 insertions(+) >>>>>> >>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >>>>>> index b7db3261c62d..5b27e9117d3e 100644 >>>>>> --- a/include/uapi/linux/bpf.h >>>>>> +++ b/include/uapi/linux/bpf.h >>>>>> @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key { >>>>>> __u8 data[0]; /* Arbitrary size */ >>>>>> }; >>>>>> >>>>>> +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a >>>>>> + * BPF map made by a BPF program that is running at the time the >>>>>> + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command >>>>> >>>>> that doesn't sound right to me. >>>>> such command won't wait for the release of the references. >>>>> in case of map-in-map the program does not hold >>>>> the references to inner map (only to outer map). >>>> >>>> I didn't follow this completely. >>>> >>>> The userspace program is using the inner map per your description of the >>>> algorithm for using map-in-map to solve the race conditions that this patch >>>> is trying to address: >>>> >>>> If you don't mind, I copy-pasted it below from your netdev post: >>>> >>>> if you use map-in-map you don't need extra boolean map. >>>> 0. bpf prog can do >>>> inner_map = lookup(map_in_map, key=0); >>>> lookup(inner_map, your_real_key); >>>> 1. user space writes into map_in_map[0] <- FD of new map >>>> 2. some cpus are using old inner map and some a new >>>> 3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched() >>>> which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu() >>>> which will guarantee that progs finished. >>>> 4. scan old inner map >>>> >>>> In step 2, as you mentioned there are CPUs using different inner maps. So >>>> could you clarify how the synchronize_rcu mechanism will even work if you're >>>> now saying "program does not hold references to the inner maps"? >> >> The program only held references to the outer maps, and the outer map >> held references to the inner maps. The user space program can add/remove >> the inner map for a particular outer map while the prog <-> outer-map >> relationship is not changed. > > My definition of "reference" in this context is protection by rcu_read_lock. > > So I was concerned the above map-in-map access isn't protected as such when > Alexei said "program doesn't have reference on inner map" in the above steps. > Maybe I misunderstood what is the meaning of reference here. > > To make the map-in-map thing to work for Chenbo/Lorenzo's usecase, both the > access of outer map at key=0 and the inner map have to protect by > rcu_read_lock so that the membarrier call will work. > > So basically step 0 in the steps above should be rcu_read_lock protected to > satisfy Chenbo/Lorenzo's usecase. > > I know today the entire program is run as preempt disabled (unless something > changed) so this shouldn't be a problem, but in the future if the verifier is > doing similar things at a finer grainer level, then the above has to be > taken into consideration. > > Does that make sense or am I missing something? All BPF programs are required to run under rcu_read_lock today, so that assumption holds. Should this ever change in future, then this constraint of course needs to be taken into consideration. Thanks, Daniel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC] Add BPF_SYNCHRONIZE bpf(2) command
@ 2018-07-16 20:23 Joel Fernandes
2018-07-26 16:51 ` [PATCH v2] Add BPF_SYNCHRONIZE_MAPS " Daniel Colascione
0 siblings, 1 reply; 32+ messages in thread
From: Joel Fernandes @ 2018-07-16 20:23 UTC (permalink / raw)
To: Daniel Colascione
Cc: Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng,
Mathieu Desnoyers, Joel Fernandes, Alexei Starovoitov, lkml,
Tim Murray, Daniel Borkmann, netdev
On Mon, Jul 16, 2018 at 08:29:47AM -0700, Daniel Colascione wrote:
> On Sat, Jul 14, 2018 at 11:18 AM, Joel Fernandes <joel@joelfernandes.org> wrote:
> > On Tue, Jul 10, 2018 at 08:40:19PM -0700, Alexei Starovoitov wrote:
> > [..]
> >> > The kernel program might do:
> >> >
> >> > =====
> >> > const int current_map_key = 1;
> >> > void *current_map = bpf_map_lookup_elem(outer_map, ¤t_map_key);
> >> >
> >> > int stats_key = 42;
> >> > uint64_t *stats_value = bpf_map_lookup_elem(current_map, &stats_key);
> >> > __sync_fetch_and_add(&stats_value, 1);
> >> > =====
> >> >
> >> > If a userspace does:
> >> >
> >> > 1. Write new fd to outer_map[1].
> >> > 2. Call BPF_SYNC_MAP_ACCESS.
> >> > 3. Start deleting everything in the old map.
> >> >
> >> > How can we guarantee that the __sync_fetch_and_add will not add to the
> >> > old map?
> >>
> >> without any changes to the kernel sys_membarrier will work.
> >> And that's what folks use already.
> >> BPF_SYNC_MAP_ACCESS implemented via synchronize_rcu() will work
> >> as well whether in the current implementation where rcu_lock/unlock
> >> is done outside of the program and in the future when
> >> rcu_lock/unlock are called by the program itself.
> >
> > Cool Alexei and Lorenzo, sounds great to me. Daniel want to send a follow up
> > patch with BPF_SYNC_MAP_ACCESS changes then?
>
> Will do. Mind if I just mine this thread for the doc comment?
Do you mean the changelog? Yes I believe you could use the discussion in this
thread for the rationale as Alexei described.
thanks,
- Joel
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command 2018-07-16 20:23 [RFC] Add BPF_SYNCHRONIZE " Joel Fernandes @ 2018-07-26 16:51 ` Daniel Colascione 0 siblings, 0 replies; 32+ messages in thread From: Daniel Colascione @ 2018-07-26 16:51 UTC (permalink / raw) To: joelaf Cc: linux-kernel, timmurray, netdev, Alexei Starovoitov, Lorenzo Colitti, Chenbo Feng, Mathieu Desnoyers, Alexei Starovoitov, Daniel Borkmann, Daniel Colascione BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF map made by a BPF program that is running at the time the BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is to provide a means for userspace to replace a BPF map with another, newer version, then ensure that no component is still using the "old" map before manipulating the "old" map in some way. Signed-off-by: Daniel Colascione <dancol@google.com> --- include/uapi/linux/bpf.h | 9 +++++++++ kernel/bpf/syscall.c | 13 +++++++++++++ 2 files changed, 22 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index b7db3261c62d..5b27e9117d3e 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key { __u8 data[0]; /* Arbitrary size */ }; +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a + * BPF map made by a BPF program that is running at the time the + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command + * is to provide a means for userspace to replace a BPF map with + * another, newer version, then ensure that no component is still + * using the "old" map before manipulating the "old" map in some way. + */ + /* BPF syscall commands, see bpf(2) man-page for details. */ enum bpf_cmd { BPF_MAP_CREATE, @@ -98,6 +106,7 @@ enum bpf_cmd { BPF_BTF_LOAD, BPF_BTF_GET_FD_BY_ID, BPF_TASK_FD_QUERY, + BPF_SYNCHRONIZE_MAPS, }; enum bpf_map_type { diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index a31a1ba0f8ea..8bbd1a5d01d1 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2274,6 +2274,19 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN)) return -EPERM; + if (cmd == BPF_SYNCHRONIZE_MAPS) { + if (uattr != NULL || size != 0) + return -EINVAL; + err = security_bpf(cmd, NULL, 0); + if (err < 0) + return err; + /* BPF programs always enter a critical section while + * they have a map reference outstanding. + */ + synchronize_rcu(); + return 0; + } + err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size); if (err) return err; -- 2.18.0.233.g985f88cf7e-goog ^ permalink raw reply related [flat|nested] 32+ messages in thread
end of thread, other threads:[~2018-11-10 18:58 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-07-29 15:51 [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command Alexei Starovoitov 2018-07-29 20:46 ` Daniel Colascione 2018-07-29 20:58 ` [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES " Daniel Colascione 2018-07-30 10:04 ` Daniel Borkmann 2018-07-30 10:04 ` Daniel Borkmann 2018-07-30 10:25 ` Daniel Colascione 2018-07-31 0:26 ` Jakub Kicinski 2018-07-31 0:33 ` Daniel Colascione 2018-07-31 0:45 ` Jakub Kicinski 2018-07-31 0:50 ` Daniel Colascione 2018-07-31 1:14 ` Jakub Kicinski 2018-07-31 8:34 ` Daniel Borkmann 2018-07-31 9:36 ` Daniel Colascione 2018-08-10 22:52 ` Alexei Starovoitov 2018-08-14 20:37 ` Daniel Colascione 2018-08-16 4:01 ` Alexei Starovoitov 2018-10-12 10:54 ` [PATCH v4] Wait for running BPF programs when updating map-in-map Daniel Colascione 2018-10-12 20:54 ` Joel Fernandes 2018-10-13 2:31 ` Alexei Starovoitov 2018-10-16 17:39 ` Joel Fernandes 2018-10-18 15:46 ` Alexei Starovoitov 2018-10-18 23:36 ` Joel Fernandes 2018-11-10 2:01 ` Chenbo Feng 2018-11-10 15:22 ` Greg KH 2018-11-10 18:58 ` Chenbo Feng [not found] ` <CAKOZues6SE_c=ix7ap6QaJHqd1TmYpWWMJiu3=TtuqgKuqOUCA@mail.gmail.com> 2018-08-10 22:29 ` [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command Alexei Starovoitov 2018-07-31 2:01 ` [PATCH v2] Add BPF_SYNCHRONIZE_MAPS " Joel Fernandes 2018-07-31 2:06 ` Joel Fernandes 2018-07-31 4:03 ` Y Song 2018-07-31 21:56 ` Joel Fernandes 2018-07-31 22:30 ` Daniel Borkmann -- strict thread matches above, loose matches on Subject: below -- 2018-07-16 20:23 [RFC] Add BPF_SYNCHRONIZE " Joel Fernandes 2018-07-26 16:51 ` [PATCH v2] Add BPF_SYNCHRONIZE_MAPS " Daniel Colascione
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.