bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero
@ 2020-06-08 16:51 Jesper Dangaard Brouer
  2020-06-08 16:51 ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 Jesper Dangaard Brouer
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2020-06-08 16:51 UTC (permalink / raw)
  To: David Ahern, bpf, Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann, Andrii Nakryiko,
	Lorenzo Bianconi

Make it easier to handle UAPI/kABI extensions by avoid BPF using/returning
file descriptor value zero. Use this in recent devmap extension to keep
older applications compatible with newer kernels.

For special type maps (e.g. devmap and cpumap) the map-value data-layout is
a configuration interface. This is a kernel Application Binary Interface
(kABI) that can only be tail extended. Thus, new members (and thus features)
can only be added to the end of this structure, and the kernel uses the
map->value_size from userspace to determine feature set 'version'.

For this kind of kABI to be extensible and backward compatible, is it common
that new members/fields (that represent a new feature) in the struct are
initialised as zero, which indicate that the feature isn't used. This makes
it possible to write userspace applications that are unaware of new kernel
features, but just include latest uapi headers, zero-init struct and
populate features it knows about.

The recent extension of devmap with a bpf_prog.fd requires end-user to
supply the file-descriptor value minus-1 to communicate that the features
isn't used. This isn't compatible with the described kABI extension model.

---

Jesper Dangaard Brouer (3):
      bpf: syscall to start at file-descriptor 1
      bpf: devmap adjust uapi for attach bpf program
      bpf: selftests and tools use struct bpf_devmap_val from uapi


 fs/file.c                                          |    2 +
 include/linux/file.h                               |    1 +
 include/uapi/linux/bpf.h                           |   13 +++++++
 kernel/bpf/devmap.c                                |   17 ++-------
 kernel/bpf/syscall.c                               |   38 +++++++++++++++++---
 tools/include/uapi/linux/bpf.h                     |   13 +++++++
 .../selftests/bpf/prog_tests/xdp_devmap_attach.c   |    8 ----
 .../selftests/bpf/progs/test_xdp_devmap_helpers.c  |    2 +
 .../bpf/progs/test_xdp_with_devmap_helpers.c       |    3 +-
 9 files changed, 66 insertions(+), 31 deletions(-)

--


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1
  2020-06-08 16:51 [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero Jesper Dangaard Brouer
@ 2020-06-08 16:51 ` Jesper Dangaard Brouer
  2020-06-08 18:28   ` Toke Høiland-Jørgensen
                     ` (5 more replies)
  2020-06-08 16:51 ` [PATCH bpf 2/3] bpf: devmap adjust uapi for attach bpf program Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  3 siblings, 6 replies; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2020-06-08 16:51 UTC (permalink / raw)
  To: David Ahern, bpf, Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann, Andrii Nakryiko,
	Lorenzo Bianconi

This patch change BPF syscall to avoid returning file descriptor value zero.

As mentioned in cover letter, it is very impractical when extending kABI
that the file-descriptor value 'zero' is valid, as this requires new fields
must be initialised as minus-1. First step is to change the kernel such that
BPF-syscall simply doesn't return value zero as a FD number.

This patch achieves this by similar code to anon_inode_getfd(), with the
exception of getting unused FD starting from 1. The kernel already supports
starting from a specific FD value, as this is used by f_dupfd(). It seems
simpler to replicate part of anon_inode_getfd() code and use this start from
offset feature, instead of using f_dupfd() handling afterwards.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 fs/file.c            |    2 +-
 include/linux/file.h |    1 +
 kernel/bpf/syscall.c |   38 ++++++++++++++++++++++++++++++++------
 3 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index abb8b7081d7a..122185cb7707 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -535,7 +535,7 @@ int __alloc_fd(struct files_struct *files,
 	return error;
 }
 
-static int alloc_fd(unsigned start, unsigned flags)
+int alloc_fd(unsigned start, unsigned flags)
 {
 	return __alloc_fd(current->files, start, rlimit(RLIMIT_NOFILE), flags);
 }
diff --git a/include/linux/file.h b/include/linux/file.h
index 122f80084a3e..927fb6c2582d 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -85,6 +85,7 @@ extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
 extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
 extern void set_close_on_exec(unsigned int fd, int flag);
 extern bool get_close_on_exec(unsigned int fd);
+extern int alloc_fd(unsigned start, unsigned flags);
 extern int __get_unused_fd_flags(unsigned flags, unsigned long nofile);
 extern int get_unused_fd_flags(unsigned flags);
 extern void put_unused_fd(unsigned int fd);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4d530b1d5683..6eba236aacd1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -688,6 +688,32 @@ const struct file_operations bpf_map_fops = {
 	.poll		= bpf_map_poll,
 };
 
+/* Code is similar to anon_inode_getfd(), except starts at FD 1 */
+int bpf_anon_inode_getfd(const char *name, const struct file_operations *fops,
+			 void *priv, int flags)
+{
+	int error, fd;
+	struct file *file;
+
+	error = alloc_fd(1, flags);
+	if (error < 0)
+		return error;
+	fd = error;
+
+	file = anon_inode_getfile(name, fops, priv, flags);
+	if (IS_ERR(file)) {
+		error = PTR_ERR(file);
+		goto err_put_unused_fd;
+	}
+	fd_install(fd, file);
+
+	return fd;
+
+err_put_unused_fd:
+	put_unused_fd(fd);
+	return error;
+}
+
 int bpf_map_new_fd(struct bpf_map *map, int flags)
 {
 	int ret;
@@ -696,8 +722,8 @@ int bpf_map_new_fd(struct bpf_map *map, int flags)
 	if (ret < 0)
 		return ret;
 
-	return anon_inode_getfd("bpf-map", &bpf_map_fops, map,
-				flags | O_CLOEXEC);
+	return bpf_anon_inode_getfd("bpf-map", &bpf_map_fops, map,
+				    flags | O_CLOEXEC);
 }
 
 int bpf_get_file_flag(int flags)
@@ -1840,8 +1866,8 @@ int bpf_prog_new_fd(struct bpf_prog *prog)
 	if (ret < 0)
 		return ret;
 
-	return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
-				O_RDWR | O_CLOEXEC);
+	return bpf_anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
+				    O_RDWR | O_CLOEXEC);
 }
 
 static struct bpf_prog *____bpf_prog_get(struct fd f)
@@ -2471,7 +2497,7 @@ int bpf_link_settle(struct bpf_link_primer *primer)
 
 int bpf_link_new_fd(struct bpf_link *link)
 {
-	return anon_inode_getfd("bpf-link", &bpf_link_fops, link, O_CLOEXEC);
+	return bpf_anon_inode_getfd("bpf-link", &bpf_link_fops, link, O_CLOEXEC);
 }
 
 struct bpf_link *bpf_link_get_from_fd(u32 ufd)
@@ -4024,7 +4050,7 @@ static int bpf_enable_runtime_stats(void)
 		return -EBUSY;
 	}
 
-	fd = anon_inode_getfd("bpf-stats", &bpf_stats_fops, NULL, O_CLOEXEC);
+	fd = bpf_anon_inode_getfd("bpf-stats", &bpf_stats_fops, NULL, O_CLOEXEC);
 	if (fd >= 0)
 		static_key_slow_inc(&bpf_stats_enabled_key.key);
 



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH bpf 2/3] bpf: devmap adjust uapi for attach bpf program
  2020-06-08 16:51 [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero Jesper Dangaard Brouer
  2020-06-08 16:51 ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 Jesper Dangaard Brouer
@ 2020-06-08 16:51 ` Jesper Dangaard Brouer
  2020-06-08 18:30   ` Toke Høiland-Jørgensen
  2020-06-08 16:51 ` [PATCH bpf 3/3] bpf: selftests and tools use struct bpf_devmap_val from uapi Jesper Dangaard Brouer
  2020-06-09  1:34 ` [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero Alexei Starovoitov
  3 siblings, 1 reply; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2020-06-08 16:51 UTC (permalink / raw)
  To: David Ahern, bpf, Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann, Andrii Nakryiko,
	Lorenzo Bianconi

The recent commit fbee97feed9b ("bpf: Add support to attach bpf program to a
devmap entry"), introduced ability to attach (and run) a separate XDP
bpf_prog for each devmap entry. A bpf_prog is added via a file-descriptor.
As zero were a valid FD, not using the feature requires using value minus-1.
The UAPI is extended via tail-extending struct bpf_devmap_val and using
map->value_size to determine the feature set.

This will break older userspace applications not using the bpf_prog feature.
Consider an old userspace app that is compiled against newer kernel
uapi/bpf.h, it will not know that it need to initialise the member
bpf_prog.fd to minus-1. Thus, users will be forced to update source code to
get program running on newer kernels.

As previous patch changed BPF-syscall to avoid returning file descriptor
value zero, we can remove the minus-1 checks, and have zero mean feature
isn't used.

Fixes: fbee97feed9b ("bpf: Add support to attach bpf program to a devmap entry")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/uapi/linux/bpf.h       |   13 +++++++++++++
 kernel/bpf/devmap.c            |   17 ++++-------------
 tools/include/uapi/linux/bpf.h |    5 -----
 3 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c65b374a5090..19684813faae 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3761,6 +3761,19 @@ struct xdp_md {
 	__u32 egress_ifindex;  /* txq->dev->ifindex */
 };
 
+/* DEVMAP map-value layout
+ *
+ * The struct data-layout of map-value is a configuration interface.
+ * New members can only be added to the end of this structure.
+ */
+struct bpf_devmap_val {
+	__u32 ifindex;   /* device index */
+	union {
+		int   fd;  /* prog fd on map write */
+		__u32 id;  /* prog id on map read */
+	} bpf_prog;
+};
+
 enum sk_action {
 	SK_DROP = 0,
 	SK_PASS,
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 854b09beb16b..d268a8e693cb 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -60,15 +60,6 @@ struct xdp_dev_bulk_queue {
 	unsigned int count;
 };
 
-/* DEVMAP values */
-struct bpf_devmap_val {
-	u32 ifindex;   /* device index */
-	union {
-		int fd;  /* prog fd on map write */
-		u32 id;  /* prog id on map read */
-	} bpf_prog;
-};
-
 struct bpf_dtab_netdev {
 	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct hlist_node index_hlist;
@@ -618,7 +609,7 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
 	if (!dev->dev)
 		goto err_out;
 
-	if (val->bpf_prog.fd >= 0) {
+	if (val->bpf_prog.fd > 0) {
 		prog = bpf_prog_get_type_dev(val->bpf_prog.fd,
 					     BPF_PROG_TYPE_XDP, false);
 		if (IS_ERR(prog))
@@ -652,8 +643,8 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
 				 void *key, void *value, u64 map_flags)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	struct bpf_devmap_val val = { .bpf_prog.fd = -1 };
 	struct bpf_dtab_netdev *dev, *old_dev;
+	struct bpf_devmap_val val = {0};
 	u32 i = *(u32 *)key;
 
 	if (unlikely(map_flags > BPF_EXIST))
@@ -669,7 +660,7 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
 	if (!val.ifindex) {
 		dev = NULL;
 		/* can not specify fd if ifindex is 0 */
-		if (val.bpf_prog.fd != -1)
+		if (val.bpf_prog.fd > 0)
 			return -EINVAL;
 	} else {
 		dev = __dev_map_alloc_node(net, dtab, &val, i);
@@ -699,8 +690,8 @@ static int __dev_map_hash_update_elem(struct net *net, struct bpf_map *map,
 				     void *key, void *value, u64 map_flags)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	struct bpf_devmap_val val = { .bpf_prog.fd = -1 };
 	struct bpf_dtab_netdev *dev, *old_dev;
+	struct bpf_devmap_val val = {0};
 	u32 idx = *(u32 *)key;
 	unsigned long flags;
 	int err = -EEXIST;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c65b374a5090..868e9efe9c09 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3761,11 +3761,6 @@ struct xdp_md {
 	__u32 egress_ifindex;  /* txq->dev->ifindex */
 };
 
-enum sk_action {
-	SK_DROP = 0,
-	SK_PASS,
-};
-
 /* user accessible metadata for SK_MSG packet hook, new fields must
  * be added to the end of this structure
  */



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH bpf 3/3] bpf: selftests and tools use struct bpf_devmap_val from uapi
  2020-06-08 16:51 [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero Jesper Dangaard Brouer
  2020-06-08 16:51 ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 Jesper Dangaard Brouer
  2020-06-08 16:51 ` [PATCH bpf 2/3] bpf: devmap adjust uapi for attach bpf program Jesper Dangaard Brouer
@ 2020-06-08 16:51 ` Jesper Dangaard Brouer
  2020-06-09  1:34 ` [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero Alexei Starovoitov
  3 siblings, 0 replies; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2020-06-08 16:51 UTC (permalink / raw)
  To: David Ahern, bpf, Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann, Andrii Nakryiko,
	Lorenzo Bianconi

Sync tools uapi bpf.h header file and update selftests that use
struct bpf_devmap_val.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 tools/include/uapi/linux/bpf.h                     |   18 ++++++++++++++++++
 .../selftests/bpf/prog_tests/xdp_devmap_attach.c   |    8 --------
 .../selftests/bpf/progs/test_xdp_devmap_helpers.c  |    2 +-
 .../bpf/progs/test_xdp_with_devmap_helpers.c       |    3 +--
 4 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 868e9efe9c09..19684813faae 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3761,6 +3761,24 @@ struct xdp_md {
 	__u32 egress_ifindex;  /* txq->dev->ifindex */
 };
 
+/* DEVMAP map-value layout
+ *
+ * The struct data-layout of map-value is a configuration interface.
+ * New members can only be added to the end of this structure.
+ */
+struct bpf_devmap_val {
+	__u32 ifindex;   /* device index */
+	union {
+		int   fd;  /* prog fd on map write */
+		__u32 id;  /* prog id on map read */
+	} bpf_prog;
+};
+
+enum sk_action {
+	SK_DROP = 0,
+	SK_PASS,
+};
+
 /* user accessible metadata for SK_MSG packet hook, new fields must
  * be added to the end of this structure
  */
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c b/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
index d19dbd668f6a..88ef3ec8ac4c 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
@@ -8,14 +8,6 @@
 
 #define IFINDEX_LO 1
 
-struct bpf_devmap_val {
-	u32 ifindex;   /* device index */
-	union {
-		int fd;  /* prog fd on map write */
-		u32 id;  /* prog id on map read */
-	} bpf_prog;
-};
-
 void test_xdp_with_devmap_helpers(void)
 {
 	struct test_xdp_with_devmap_helpers *skel;
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_devmap_helpers.c b/tools/testing/selftests/bpf/progs/test_xdp_devmap_helpers.c
index e5c0f131c8a7..b360ba2bd441 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_devmap_helpers.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_devmap_helpers.c
@@ -2,7 +2,7 @@
 /* fails to load without expected_attach_type = BPF_XDP_DEVMAP
  * because of access to egress_ifindex
  */
-#include "vmlinux.h"
+#include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
 
 SEC("xdp_dm_log")
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c b/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c
index deef0e050863..330811260123 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c
@@ -1,6 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
-
-#include "vmlinux.h"
+#include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
 
 struct {



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1
  2020-06-08 16:51 ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 Jesper Dangaard Brouer
@ 2020-06-08 18:28   ` Toke Høiland-Jørgensen
  2020-06-08 18:36   ` Andrii Nakryiko
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-08 18:28 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern, bpf, Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann, Andrii Nakryiko,
	Lorenzo Bianconi

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> This patch change BPF syscall to avoid returning file descriptor value zero.
>
> As mentioned in cover letter, it is very impractical when extending kABI
> that the file-descriptor value 'zero' is valid, as this requires new fields
> must be initialised as minus-1. First step is to change the kernel such that
> BPF-syscall simply doesn't return value zero as a FD number.
>
> This patch achieves this by similar code to anon_inode_getfd(), with the
> exception of getting unused FD starting from 1. The kernel already supports
> starting from a specific FD value, as this is used by f_dupfd(). It seems
> simpler to replicate part of anon_inode_getfd() code and use this start from
> offset feature, instead of using f_dupfd() handling afterwards.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  fs/file.c            |    2 +-
>  include/linux/file.h |    1 +
>  kernel/bpf/syscall.c |   38 ++++++++++++++++++++++++++++++++------
>  3 files changed, 34 insertions(+), 7 deletions(-)
>
> diff --git a/fs/file.c b/fs/file.c
> index abb8b7081d7a..122185cb7707 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -535,7 +535,7 @@ int __alloc_fd(struct files_struct *files,
>  	return error;
>  }
>  
> -static int alloc_fd(unsigned start, unsigned flags)
> +int alloc_fd(unsigned start, unsigned flags)

Missing an EXPORT_SYMBOL() to go with this.

>  {
>  	return __alloc_fd(current->files, start, rlimit(RLIMIT_NOFILE), flags);
>  }
> diff --git a/include/linux/file.h b/include/linux/file.h
> index 122f80084a3e..927fb6c2582d 100644
> --- a/include/linux/file.h
> +++ b/include/linux/file.h
> @@ -85,6 +85,7 @@ extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
>  extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
>  extern void set_close_on_exec(unsigned int fd, int flag);
>  extern bool get_close_on_exec(unsigned int fd);
> +extern int alloc_fd(unsigned start, unsigned flags);
>  extern int __get_unused_fd_flags(unsigned flags, unsigned long nofile);
>  extern int get_unused_fd_flags(unsigned flags);
>  extern void put_unused_fd(unsigned int fd);
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 4d530b1d5683..6eba236aacd1 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -688,6 +688,32 @@ const struct file_operations bpf_map_fops = {
>  	.poll		= bpf_map_poll,
>  };
>  
> +/* Code is similar to anon_inode_getfd(), except starts at FD 1 */
> +int bpf_anon_inode_getfd(const char *name, const struct file_operations *fops,
> +			 void *priv, int flags)
> +{

I think it's a little sad that we have to essentially re-implement
anon_inode_getfd(). I guess an alternative could be to add an extra
parameter to the existing function, either with a different name and a
wrapper function, or just change all the callers (by my count there are
only 13 call sites outside of BPF). Not sure if that is better, though?

> +	int error, fd;
> +	struct file *file;
> +
> +	error = alloc_fd(1, flags);
> +	if (error < 0)
> +		return error;
> +	fd = error;
> +
> +	file = anon_inode_getfile(name, fops, priv, flags);
> +	if (IS_ERR(file)) {
> +		error = PTR_ERR(file);
> +		goto err_put_unused_fd;
> +	}
> +	fd_install(fd, file);
> +
> +	return fd;
> +
> +err_put_unused_fd:
> +	put_unused_fd(fd);
> +	return error;
> +}
> +
>  int bpf_map_new_fd(struct bpf_map *map, int flags)
>  {
>  	int ret;
> @@ -696,8 +722,8 @@ int bpf_map_new_fd(struct bpf_map *map, int flags)
>  	if (ret < 0)
>  		return ret;
>  
> -	return anon_inode_getfd("bpf-map", &bpf_map_fops, map,
> -				flags | O_CLOEXEC);
> +	return bpf_anon_inode_getfd("bpf-map", &bpf_map_fops, map,
> +				    flags | O_CLOEXEC);
>  }
>  
>  int bpf_get_file_flag(int flags)
> @@ -1840,8 +1866,8 @@ int bpf_prog_new_fd(struct bpf_prog *prog)
>  	if (ret < 0)
>  		return ret;
>  
> -	return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
> -				O_RDWR | O_CLOEXEC);
> +	return bpf_anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
> +				    O_RDWR | O_CLOEXEC);
>  }
>  
>  static struct bpf_prog *____bpf_prog_get(struct fd f)
> @@ -2471,7 +2497,7 @@ int bpf_link_settle(struct bpf_link_primer *primer)
>  
>  int bpf_link_new_fd(struct bpf_link *link)
>  {
> -	return anon_inode_getfd("bpf-link", &bpf_link_fops, link, O_CLOEXEC);
> +	return bpf_anon_inode_getfd("bpf-link", &bpf_link_fops, link, O_CLOEXEC);
>  }
>  
>  struct bpf_link *bpf_link_get_from_fd(u32 ufd)
> @@ -4024,7 +4050,7 @@ static int bpf_enable_runtime_stats(void)
>  		return -EBUSY;
>  	}
>  
> -	fd = anon_inode_getfd("bpf-stats", &bpf_stats_fops, NULL, O_CLOEXEC);
> +	fd = bpf_anon_inode_getfd("bpf-stats", &bpf_stats_fops, NULL, O_CLOEXEC);
>  	if (fd >= 0)
>  		static_key_slow_inc(&bpf_stats_enabled_key.key);
>  

I guess you should also fix __btf_new_fd() in btf.c?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf 2/3] bpf: devmap adjust uapi for attach bpf program
  2020-06-08 16:51 ` [PATCH bpf 2/3] bpf: devmap adjust uapi for attach bpf program Jesper Dangaard Brouer
@ 2020-06-08 18:30   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 21+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-08 18:30 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern, bpf, Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann, Andrii Nakryiko,
	Lorenzo Bianconi

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> The recent commit fbee97feed9b ("bpf: Add support to attach bpf program to a
> devmap entry"), introduced ability to attach (and run) a separate XDP
> bpf_prog for each devmap entry. A bpf_prog is added via a file-descriptor.
> As zero were a valid FD, not using the feature requires using value minus-1.
> The UAPI is extended via tail-extending struct bpf_devmap_val and using
> map->value_size to determine the feature set.
>
> This will break older userspace applications not using the bpf_prog feature.
> Consider an old userspace app that is compiled against newer kernel
> uapi/bpf.h, it will not know that it need to initialise the member
> bpf_prog.fd to minus-1. Thus, users will be forced to update source code to
> get program running on newer kernels.
>
> As previous patch changed BPF-syscall to avoid returning file descriptor
> value zero, we can remove the minus-1 checks, and have zero mean feature
> isn't used.
>
> Fixes: fbee97feed9b ("bpf: Add support to attach bpf program to a devmap entry")
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  include/uapi/linux/bpf.h       |   13 +++++++++++++
>  kernel/bpf/devmap.c            |   17 ++++-------------
>  tools/include/uapi/linux/bpf.h |    5 -----
>  3 files changed, 17 insertions(+), 18 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c65b374a5090..19684813faae 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3761,6 +3761,19 @@ struct xdp_md {
>  	__u32 egress_ifindex;  /* txq->dev->ifindex */
>  };
>  
> +/* DEVMAP map-value layout
> + *
> + * The struct data-layout of map-value is a configuration interface.
> + * New members can only be added to the end of this structure.
> + */
> +struct bpf_devmap_val {
> +	__u32 ifindex;   /* device index */
> +	union {
> +		int   fd;  /* prog fd on map write */
> +		__u32 id;  /* prog id on map read */
> +	} bpf_prog;
> +};
> +
>  enum sk_action {
>  	SK_DROP = 0,
>  	SK_PASS,
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index 854b09beb16b..d268a8e693cb 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -60,15 +60,6 @@ struct xdp_dev_bulk_queue {
>  	unsigned int count;
>  };
>  
> -/* DEVMAP values */
> -struct bpf_devmap_val {
> -	u32 ifindex;   /* device index */
> -	union {
> -		int fd;  /* prog fd on map write */
> -		u32 id;  /* prog id on map read */
> -	} bpf_prog;
> -};
> -
>  struct bpf_dtab_netdev {
>  	struct net_device *dev; /* must be first member, due to tracepoint */
>  	struct hlist_node index_hlist;
> @@ -618,7 +609,7 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
>  	if (!dev->dev)
>  		goto err_out;
>  
> -	if (val->bpf_prog.fd >= 0) {
> +	if (val->bpf_prog.fd > 0) {
>  		prog = bpf_prog_get_type_dev(val->bpf_prog.fd,
>  					     BPF_PROG_TYPE_XDP, false);
>  		if (IS_ERR(prog))
> @@ -652,8 +643,8 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
>  				 void *key, void *value, u64 map_flags)
>  {
>  	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
> -	struct bpf_devmap_val val = { .bpf_prog.fd = -1 };
>  	struct bpf_dtab_netdev *dev, *old_dev;
> +	struct bpf_devmap_val val = {0};

nit: I prefer {} to {0} - 'git grep' indicates that {} is also the most
commonly used in kernel/bpf/


>  	u32 i = *(u32 *)key;
>  
>  	if (unlikely(map_flags > BPF_EXIST))
> @@ -669,7 +660,7 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
>  	if (!val.ifindex) {
>  		dev = NULL;
>  		/* can not specify fd if ifindex is 0 */
> -		if (val.bpf_prog.fd != -1)
> +		if (val.bpf_prog.fd > 0)
>  			return -EINVAL;
>  	} else {
>  		dev = __dev_map_alloc_node(net, dtab, &val, i);
> @@ -699,8 +690,8 @@ static int __dev_map_hash_update_elem(struct net *net, struct bpf_map *map,
>  				     void *key, void *value, u64 map_flags)
>  {
>  	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
> -	struct bpf_devmap_val val = { .bpf_prog.fd = -1 };
>  	struct bpf_dtab_netdev *dev, *old_dev;
> +	struct bpf_devmap_val val = {0};

Same here.

>  	u32 idx = *(u32 *)key;
>  	unsigned long flags;
>  	int err = -EEXIST;
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index c65b374a5090..868e9efe9c09 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -3761,11 +3761,6 @@ struct xdp_md {
>  	__u32 egress_ifindex;  /* txq->dev->ifindex */
>  };
>  
> -enum sk_action {
> -	SK_DROP = 0,
> -	SK_PASS,
> -};
> -
>  /* user accessible metadata for SK_MSG packet hook, new fields must
>   * be added to the end of this structure
>   */


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1
  2020-06-08 16:51 ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 Jesper Dangaard Brouer
  2020-06-08 18:28   ` Toke Høiland-Jørgensen
@ 2020-06-08 18:36   ` Andrii Nakryiko
  2020-06-08 19:44     ` Jesper Dangaard Brouer
  2020-06-08 19:00   ` kernel test robot
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Andrii Nakryiko @ 2020-06-08 18:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David Ahern, bpf, Alexei Starovoitov, Networking,
	Daniel Borkmann, Lorenzo Bianconi

On Mon, Jun 8, 2020 at 9:51 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
> This patch change BPF syscall to avoid returning file descriptor value zero.
>
> As mentioned in cover letter, it is very impractical when extending kABI
> that the file-descriptor value 'zero' is valid, as this requires new fields
> must be initialised as minus-1. First step is to change the kernel such that
> BPF-syscall simply doesn't return value zero as a FD number.
>
> This patch achieves this by similar code to anon_inode_getfd(), with the
> exception of getting unused FD starting from 1. The kernel already supports
> starting from a specific FD value, as this is used by f_dupfd(). It seems
> simpler to replicate part of anon_inode_getfd() code and use this start from
> offset feature, instead of using f_dupfd() handling afterwards.

Wouldn't it be better to just handle that on libbpf side? That way it
works on all kernels and doesn't require this duplication of logic
inside kernel?

>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  fs/file.c            |    2 +-
>  include/linux/file.h |    1 +
>  kernel/bpf/syscall.c |   38 ++++++++++++++++++++++++++++++++------
>  3 files changed, 34 insertions(+), 7 deletions(-)
>

[...]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1
  2020-06-08 16:51 ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 Jesper Dangaard Brouer
  2020-06-08 18:28   ` Toke Høiland-Jørgensen
  2020-06-08 18:36   ` Andrii Nakryiko
@ 2020-06-08 19:00   ` kernel test robot
  2020-06-08 19:55   ` kernel test robot
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: kernel test robot @ 2020-06-08 19:00 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern, bpf, Alexei Starovoitov
  Cc: kbuild-all, Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Andrii Nakryiko, Lorenzo Bianconi

[-- Attachment #1: Type: text/plain, Size: 4018 bytes --]

Hi Jesper,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf/master]

url:    https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/bpf-avoid-using-returning-file-descriptor-value-zero/20200609-005457
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
config: um-allmodconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
reproduce (this is a W=1 build):
        # save the attached .config to linux build tree
        make W=1 ARCH=um 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

cc1: warning: arch/um/include/uapi: No such file or directory [-Wmissing-include-dirs]
In file included from include/linux/kernel.h:11,
from include/linux/list.h:9,
from include/linux/timer.h:5,
from include/linux/workqueue.h:9,
from include/linux/bpf.h:9,
from kernel/bpf/syscall.c:4:
include/asm-generic/fixmap.h: In function 'fix_to_virt':
include/asm-generic/fixmap.h:32:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
32 |  BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
|                   ^~
include/linux/compiler.h:383:9: note: in definition of macro '__compiletime_assert'
383 |   if (!(condition))              |         ^~~~~~~~~
include/linux/compiler.h:403:2: note: in expansion of macro '_compiletime_assert'
403 |  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
|  ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
|                                     ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
50 |  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
|  ^~~~~~~~~~~~~~~~
include/asm-generic/fixmap.h:32:2: note: in expansion of macro 'BUILD_BUG_ON'
32 |  BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
|  ^~~~~~~~~~~~
In file included from include/linux/uaccess.h:11,
from include/linux/sched/task.h:11,
from include/linux/sched/signal.h:9,
from include/linux/rcuwait.h:6,
from include/linux/percpu-rwsem.h:7,
from include/linux/fs.h:34,
from include/linux/huge_mm.h:8,
from include/linux/mm.h:675,
from include/linux/kallsyms.h:12,
from include/linux/bpf.h:21,
from kernel/bpf/syscall.c:4:
arch/um/include/asm/uaccess.h: In function '__access_ok':
arch/um/include/asm/uaccess.h:17:29: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
17 |    (((unsigned long) (addr) >= FIXADDR_USER_START) &&          |                             ^~
arch/um/include/asm/uaccess.h:45:3: note: in expansion of macro '__access_ok_vsyscall'
45 |   __access_ok_vsyscall(addr, size) ||
|   ^~~~~~~~~~~~~~~~~~~~
kernel/bpf/syscall.c: At top level:
>> kernel/bpf/syscall.c:692:5: warning: no previous prototype for 'bpf_anon_inode_getfd' [-Wmissing-prototypes]
692 | int bpf_anon_inode_getfd(const char *name, const struct file_operations *fops,
|     ^~~~~~~~~~~~~~~~~~~~

vim +/bpf_anon_inode_getfd +692 kernel/bpf/syscall.c

   690	
   691	/* Code is similar to anon_inode_getfd(), except starts at FD 1 */
 > 692	int bpf_anon_inode_getfd(const char *name, const struct file_operations *fops,
   693				 void *priv, int flags)
   694	{
   695		int error, fd;
   696		struct file *file;
   697	
   698		error = alloc_fd(1, flags);
   699		if (error < 0)
   700			return error;
   701		fd = error;
   702	
   703		file = anon_inode_getfile(name, fops, priv, flags);
   704		if (IS_ERR(file)) {
   705			error = PTR_ERR(file);
   706			goto err_put_unused_fd;
   707		}
   708		fd_install(fd, file);
   709	
   710		return fd;
   711	
   712	err_put_unused_fd:
   713		put_unused_fd(fd);
   714		return error;
   715	}
   716	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22683 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1
  2020-06-08 18:36   ` Andrii Nakryiko
@ 2020-06-08 19:44     ` Jesper Dangaard Brouer
  2020-06-08 20:05       ` Andrii Nakryiko
  0 siblings, 1 reply; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2020-06-08 19:44 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David Ahern, bpf, Alexei Starovoitov, Networking,
	Daniel Borkmann, Lorenzo Bianconi, brouer

On Mon, 8 Jun 2020 11:36:33 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Mon, Jun 8, 2020 at 9:51 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >
> > This patch change BPF syscall to avoid returning file descriptor value zero.
> >
> > As mentioned in cover letter, it is very impractical when extending kABI
> > that the file-descriptor value 'zero' is valid, as this requires new fields
> > must be initialised as minus-1. First step is to change the kernel such that
> > BPF-syscall simply doesn't return value zero as a FD number.
> >
> > This patch achieves this by similar code to anon_inode_getfd(), with the
> > exception of getting unused FD starting from 1. The kernel already supports
> > starting from a specific FD value, as this is used by f_dupfd(). It seems
> > simpler to replicate part of anon_inode_getfd() code and use this start from
> > offset feature, instead of using f_dupfd() handling afterwards.  
> 
> Wouldn't it be better to just handle that on libbpf side? That way it
> works on all kernels and doesn't require this duplication of logic
> inside kernel?

IMHO this is needed on the kernel side, to pair it with the API change.
I don't see how doing this in libbpf can cover all cases.

First of all, some users might not be using libbpf.

Second, a userspace application could be using an older version of
libbpf on a newer kernel. (Notice this is not only due to older
distros, but also because projects using git submodule libbpf will
freeze at some point in time that worked).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1
  2020-06-08 16:51 ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 Jesper Dangaard Brouer
                     ` (2 preceding siblings ...)
  2020-06-08 19:00   ` kernel test robot
@ 2020-06-08 19:55   ` kernel test robot
  2020-06-08 19:55   ` [RFC PATCH] bpf: bpf_anon_inode_getfd() can be static kernel test robot
  2020-06-08 20:42   ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 kernel test robot
  5 siblings, 0 replies; 21+ messages in thread
From: kernel test robot @ 2020-06-08 19:55 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern, bpf, Alexei Starovoitov
  Cc: kbuild-all, Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Andrii Nakryiko, Lorenzo Bianconi

[-- Attachment #1: Type: text/plain, Size: 1086 bytes --]

Hi Jesper,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf/master]

url:    https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/bpf-avoid-using-returning-file-descriptor-value-zero/20200609-005457
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
config: i386-randconfig-s001-20200609 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-247-gcadbd124-dirty
        # save the attached .config to linux build tree
        make W=1 C=1 ARCH=i386 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

>> kernel/bpf/syscall.c:692:5: sparse: sparse: symbol 'bpf_anon_inode_getfd' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33961 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH] bpf: bpf_anon_inode_getfd() can be static
  2020-06-08 16:51 ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 Jesper Dangaard Brouer
                     ` (3 preceding siblings ...)
  2020-06-08 19:55   ` kernel test robot
@ 2020-06-08 19:55   ` kernel test robot
  2020-06-08 20:42   ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 kernel test robot
  5 siblings, 0 replies; 21+ messages in thread
From: kernel test robot @ 2020-06-08 19:55 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern, bpf, Alexei Starovoitov
  Cc: kbuild-all, Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Andrii Nakryiko, Lorenzo Bianconi


Signed-off-by: kernel test robot <lkp@intel.com>
---
 syscall.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6eba236aacd1f..fcd9860cdf148 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -689,8 +689,8 @@ const struct file_operations bpf_map_fops = {
 };
 
 /* Code is similar to anon_inode_getfd(), except starts at FD 1 */
-int bpf_anon_inode_getfd(const char *name, const struct file_operations *fops,
-			 void *priv, int flags)
+static int bpf_anon_inode_getfd(const char *name, const struct file_operations *fops,
+				void *priv, int flags)
 {
 	int error, fd;
 	struct file *file;

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1
  2020-06-08 19:44     ` Jesper Dangaard Brouer
@ 2020-06-08 20:05       ` Andrii Nakryiko
  0 siblings, 0 replies; 21+ messages in thread
From: Andrii Nakryiko @ 2020-06-08 20:05 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David Ahern, bpf, Alexei Starovoitov, Networking,
	Daniel Borkmann, Lorenzo Bianconi

-- Andrii

On Mon, Jun 8, 2020 at 12:45 PM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Mon, 8 Jun 2020 11:36:33 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Mon, Jun 8, 2020 at 9:51 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> > >
> > > This patch change BPF syscall to avoid returning file descriptor value zero.
> > >
> > > As mentioned in cover letter, it is very impractical when extending kABI
> > > that the file-descriptor value 'zero' is valid, as this requires new fields
> > > must be initialised as minus-1. First step is to change the kernel such that
> > > BPF-syscall simply doesn't return value zero as a FD number.
> > >
> > > This patch achieves this by similar code to anon_inode_getfd(), with the
> > > exception of getting unused FD starting from 1. The kernel already supports
> > > starting from a specific FD value, as this is used by f_dupfd(). It seems
> > > simpler to replicate part of anon_inode_getfd() code and use this start from
> > > offset feature, instead of using f_dupfd() handling afterwards.
> >
> > Wouldn't it be better to just handle that on libbpf side? That way it
> > works on all kernels and doesn't require this duplication of logic
> > inside kernel?
>
> IMHO this is needed on the kernel side, to pair it with the API change.
> I don't see how doing this in libbpf can cover all cases.
>

In practice, it's going to be very rare that fd=0 is not already
allocated for application. So even not doing anything is going to work
in 99.9% of cases.

Think about this, any realistic BPF program will have a map or global
variable associated with it. So in the rare case where we have FD 0
not used, map will get that FD. Even if not, if you load your program
from ELF file, that ELF file will get FD 0. Even if not, modern BPF
programs will have BTF object associated, which will get FD 0. And so
on... Even daemons will probably already have some FD open for
whatever logging/output it needs to do (it doesn't have to be stdout).
So this FD=0 is very hypothetical case, very. You have to essentially
stage it consciously, to actually hit it.

Second of all, this BPF-specific FD allocation logic is just that --
duplication and extra code to maintain. Other folks in kernel will
eventually just be "huh? bpf needs its own anon_file API, why?!..." Do
we really want more of that?

Third, you already missed anon_inode_getfile use in bpf_link_prime(),
and in the future we'll undoubtedly will miss something else, so this
FD >= 1 from kernel will work only sometimes and no one will notice
until it breaks for someone, which won't happen for a while (because
it works as is most of the time, see above).

I'm not even talking about the fact that we are also subverting
standard Linux promise that a user gets the smallest available FD in
the system...

So I think libbpf can be kind to user and do:

int fd = bpf_whatever();
if (fd == 0) {
    fd = dup(0);
    close(0);
}

But even if it doesn't, not a big deal and probably no one ever will
have this problem with FD 0 for BPF program.

> First of all, some users might not be using libbpf.
>
> Second, a userspace application could be using an older version of
> libbpf on a newer kernel. (Notice this is not only due to older
> distros, but also because projects using git submodule libbpf will
> freeze at some point in time that worked).

Theoretically this is a problem, in practice libbpf is always more
up-to-date compared to kernel... so I don't buy this argument,
honestly :)

>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1
  2020-06-08 16:51 ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 Jesper Dangaard Brouer
                     ` (4 preceding siblings ...)
  2020-06-08 19:55   ` [RFC PATCH] bpf: bpf_anon_inode_getfd() can be static kernel test robot
@ 2020-06-08 20:42   ` kernel test robot
  5 siblings, 0 replies; 21+ messages in thread
From: kernel test robot @ 2020-06-08 20:42 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern, bpf, Alexei Starovoitov
  Cc: kbuild-all, clang-built-linux, Jesper Dangaard Brouer, netdev,
	Daniel Borkmann, Andrii Nakryiko, Lorenzo Bianconi

[-- Attachment #1: Type: text/plain, Size: 2398 bytes --]

Hi Jesper,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf/master]

url:    https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/bpf-avoid-using-returning-file-descriptor-value-zero/20200609-005457
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
config: x86_64-randconfig-r003-20200608 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project e429cffd4f228f70c1d9df0e5d77c08590dd9766)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> kernel/bpf/syscall.c:692:5: warning: no previous prototype for function 'bpf_anon_inode_getfd' [-Wmissing-prototypes]
int bpf_anon_inode_getfd(const char *name, const struct file_operations *fops,
^
kernel/bpf/syscall.c:692:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int bpf_anon_inode_getfd(const char *name, const struct file_operations *fops,
^
static
1 warning generated.

vim +/bpf_anon_inode_getfd +692 kernel/bpf/syscall.c

   690	
   691	/* Code is similar to anon_inode_getfd(), except starts at FD 1 */
 > 692	int bpf_anon_inode_getfd(const char *name, const struct file_operations *fops,
   693				 void *priv, int flags)
   694	{
   695		int error, fd;
   696		struct file *file;
   697	
   698		error = alloc_fd(1, flags);
   699		if (error < 0)
   700			return error;
   701		fd = error;
   702	
   703		file = anon_inode_getfile(name, fops, priv, flags);
   704		if (IS_ERR(file)) {
   705			error = PTR_ERR(file);
   706			goto err_put_unused_fd;
   707		}
   708		fd_install(fd, file);
   709	
   710		return fd;
   711	
   712	err_put_unused_fd:
   713		put_unused_fd(fd);
   714		return error;
   715	}
   716	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 38120 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero
  2020-06-08 16:51 [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero Jesper Dangaard Brouer
                   ` (2 preceding siblings ...)
  2020-06-08 16:51 ` [PATCH bpf 3/3] bpf: selftests and tools use struct bpf_devmap_val from uapi Jesper Dangaard Brouer
@ 2020-06-09  1:34 ` Alexei Starovoitov
  2020-06-09  9:55   ` Jesper Dangaard Brouer
  2020-06-09 13:31   ` [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release Jesper Dangaard Brouer
  3 siblings, 2 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2020-06-09  1:34 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David Ahern, bpf, netdev, Daniel Borkmann, Andrii Nakryiko,
	Lorenzo Bianconi

On Mon, Jun 08, 2020 at 06:51:12PM +0200, Jesper Dangaard Brouer wrote:
> Make it easier to handle UAPI/kABI extensions by avoid BPF using/returning
> file descriptor value zero. Use this in recent devmap extension to keep
> older applications compatible with newer kernels.
> 
> For special type maps (e.g. devmap and cpumap) the map-value data-layout is
> a configuration interface. This is a kernel Application Binary Interface
> (kABI) that can only be tail extended. Thus, new members (and thus features)
> can only be added to the end of this structure, and the kernel uses the
> map->value_size from userspace to determine feature set 'version'.

please drop these kabi references. As far as I know kabi is a redhat invention
and I'm not even sure what exactly it means.
'struct bpf_devmap_val' is uapi. No need to invent new names for existing concept.

> The recent extension of devmap with a bpf_prog.fd requires end-user to
> supply the file-descriptor value minus-1 to communicate that the features
> isn't used. This isn't compatible with the described kABI extension model.

non-zero prog_fd requirement exists already in bpf syscall. It's not recent.
So I don't think patch 1 is appropriate at this point. Certainly not
for bpf tree. We can argue about it usefulness when bpf-next reopens.
For now I think patches 2 and 3 are good to go.
Don't delete 'enum sk_action' in patch 2 though.
The rest looks good to me.
Thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero
  2020-06-09  1:34 ` [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero Alexei Starovoitov
@ 2020-06-09  9:55   ` Jesper Dangaard Brouer
  2020-06-09 13:31   ` [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release Jesper Dangaard Brouer
  1 sibling, 0 replies; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2020-06-09  9:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Ahern, bpf, netdev, Daniel Borkmann, Andrii Nakryiko,
	Lorenzo Bianconi, brouer

On Mon, 8 Jun 2020 18:34:10 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Mon, Jun 08, 2020 at 06:51:12PM +0200, Jesper Dangaard Brouer wrote:
> > Make it easier to handle UAPI/kABI extensions by avoid BPF using/returning
> > file descriptor value zero. Use this in recent devmap extension to keep
> > older applications compatible with newer kernels.
> > 
> > For special type maps (e.g. devmap and cpumap) the map-value data-layout is
> > a configuration interface. This is a kernel Application Binary Interface
> > (kABI) that can only be tail extended. Thus, new members (and thus features)
> > can only be added to the end of this structure, and the kernel uses the
> > map->value_size from userspace to determine feature set 'version'.  
> 
> please drop these kabi references. As far as I know kabi is a redhat invention
> and I'm not even sure what exactly it means.
> 'struct bpf_devmap_val' is uapi. No need to invent new names for existing concept.

Sure I can call it UAPI.

I was alluding to the difference between API and ABI, but it doesn't matter.
For the record, Red Hat didn't invent ABI (Application Binary Interface):
 https://en.wikipedia.org/wiki/Application_binary_interface


> > The recent extension of devmap with a bpf_prog.fd requires end-user to
> > supply the file-descriptor value minus-1 to communicate that the features
> > isn't used. This isn't compatible with the described kABI extension model.  
> 
> non-zero prog_fd requirement exists already in bpf syscall. It's not recent.
> So I don't think patch 1 is appropriate at this point. Certainly not
> for bpf tree. We can argue about it usefulness when bpf-next reopens.
> For now I think patches 2 and 3 are good to go.

Great.

> Don't delete 'enum sk_action' in patch 2 though.

Sorry, yes, that was a mistake.

> The rest looks good to me.

Thanks!
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release
  2020-06-09  1:34 ` [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero Alexei Starovoitov
  2020-06-09  9:55   ` Jesper Dangaard Brouer
@ 2020-06-09 13:31   ` Jesper Dangaard Brouer
  2020-06-09 13:31     ` [PATCH bpf V2 1/2] bpf: devmap adjust uapi for attach bpf program Jesper Dangaard Brouer
                       ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2020-06-09 13:31 UTC (permalink / raw)
  To: David Ahern, bpf, Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann, Andrii Nakryiko,
	Lorenzo Bianconi

For special type maps (e.g. devmap and cpumap) the map-value data-layout is
a configuration interface. This is uapi that can only be tail extended.
Thus, new members (and thus features) can only be added to the end of this
structure, and the kernel uses the map->value_size from userspace to
determine feature set 'version'.

For this kind of uapi to be extensible and backward compatible, is it common
that new members/fields (that represent a new feature) in the struct are
initialized as zero, which indicate that the feature isn't used. This makes
it possible to write userspace applications that are unaware of new kernel
features, but just include latest uapi headers, zero-init struct and
populate features it knows about.

The recent extension of devmap with a bpf_prog.fd requires end-user to
supply the file-descriptor value minus-1 to communicate that the features
isn't used. This isn't compatible with the described kABI extension model.

V2: Drop patch-1 that changed BPF-syscall to start at file-descriptor 1

---

Jesper Dangaard Brouer (2):
      bpf: devmap adjust uapi for attach bpf program
      bpf: selftests and tools use struct bpf_devmap_val from uapi


 include/uapi/linux/bpf.h                           |   13 +++++++++++++
 kernel/bpf/devmap.c                                |   17 ++++-------------
 tools/include/uapi/linux/bpf.h                     |   13 +++++++++++++
 .../selftests/bpf/prog_tests/xdp_devmap_attach.c   |    8 --------
 .../selftests/bpf/progs/test_xdp_devmap_helpers.c  |    2 +-
 .../bpf/progs/test_xdp_with_devmap_helpers.c       |    3 +--
 6 files changed, 32 insertions(+), 24 deletions(-)

--


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH bpf V2 1/2] bpf: devmap adjust uapi for attach bpf program
  2020-06-09 13:31   ` [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release Jesper Dangaard Brouer
@ 2020-06-09 13:31     ` Jesper Dangaard Brouer
  2020-06-09 13:47       ` David Ahern
  2020-06-09 13:31     ` [PATCH bpf V2 2/2] bpf: selftests and tools use struct bpf_devmap_val from uapi Jesper Dangaard Brouer
  2020-06-09 19:03     ` [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release Alexei Starovoitov
  2 siblings, 1 reply; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2020-06-09 13:31 UTC (permalink / raw)
  To: David Ahern, bpf, Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann, Andrii Nakryiko,
	Lorenzo Bianconi

V2:
- Defer changing BPF-syscall to start at file-descriptor 1
- Use {} to zero initialise struct.

The recent commit fbee97feed9b ("bpf: Add support to attach bpf program to a
devmap entry"), introduced ability to attach (and run) a separate XDP
bpf_prog for each devmap entry. A bpf_prog is added via a file-descriptor.
As zero were a valid FD, not using the feature requires using value minus-1.
The UAPI is extended via tail-extending struct bpf_devmap_val and using
map->value_size to determine the feature set.

This will break older userspace applications not using the bpf_prog feature.
Consider an old userspace app that is compiled against newer kernel
uapi/bpf.h, it will not know that it need to initialise the member
bpf_prog.fd to minus-1. Thus, users will be forced to update source code to
get program running on newer kernels.

This patch remove the minus-1 checks, and have zero mean feature isn't used.

Followup patches either for kernel or libbpf should handle and avoid
returning file-descriptor zero in the first place.

Fixes: fbee97feed9b ("bpf: Add support to attach bpf program to a devmap entry")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/uapi/linux/bpf.h |   13 +++++++++++++
 kernel/bpf/devmap.c      |   17 ++++-------------
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c65b374a5090..19684813faae 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3761,6 +3761,19 @@ struct xdp_md {
 	__u32 egress_ifindex;  /* txq->dev->ifindex */
 };
 
+/* DEVMAP map-value layout
+ *
+ * The struct data-layout of map-value is a configuration interface.
+ * New members can only be added to the end of this structure.
+ */
+struct bpf_devmap_val {
+	__u32 ifindex;   /* device index */
+	union {
+		int   fd;  /* prog fd on map write */
+		__u32 id;  /* prog id on map read */
+	} bpf_prog;
+};
+
 enum sk_action {
 	SK_DROP = 0,
 	SK_PASS,
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 854b09beb16b..899a30a67cc1 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -60,15 +60,6 @@ struct xdp_dev_bulk_queue {
 	unsigned int count;
 };
 
-/* DEVMAP values */
-struct bpf_devmap_val {
-	u32 ifindex;   /* device index */
-	union {
-		int fd;  /* prog fd on map write */
-		u32 id;  /* prog id on map read */
-	} bpf_prog;
-};
-
 struct bpf_dtab_netdev {
 	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct hlist_node index_hlist;
@@ -618,7 +609,7 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
 	if (!dev->dev)
 		goto err_out;
 
-	if (val->bpf_prog.fd >= 0) {
+	if (val->bpf_prog.fd > 0) {
 		prog = bpf_prog_get_type_dev(val->bpf_prog.fd,
 					     BPF_PROG_TYPE_XDP, false);
 		if (IS_ERR(prog))
@@ -652,8 +643,8 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
 				 void *key, void *value, u64 map_flags)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	struct bpf_devmap_val val = { .bpf_prog.fd = -1 };
 	struct bpf_dtab_netdev *dev, *old_dev;
+	struct bpf_devmap_val val = {};
 	u32 i = *(u32 *)key;
 
 	if (unlikely(map_flags > BPF_EXIST))
@@ -669,7 +660,7 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
 	if (!val.ifindex) {
 		dev = NULL;
 		/* can not specify fd if ifindex is 0 */
-		if (val.bpf_prog.fd != -1)
+		if (val.bpf_prog.fd > 0)
 			return -EINVAL;
 	} else {
 		dev = __dev_map_alloc_node(net, dtab, &val, i);
@@ -699,8 +690,8 @@ static int __dev_map_hash_update_elem(struct net *net, struct bpf_map *map,
 				     void *key, void *value, u64 map_flags)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	struct bpf_devmap_val val = { .bpf_prog.fd = -1 };
 	struct bpf_dtab_netdev *dev, *old_dev;
+	struct bpf_devmap_val val = {};
 	u32 idx = *(u32 *)key;
 	unsigned long flags;
 	int err = -EEXIST;



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH bpf V2 2/2] bpf: selftests and tools use struct bpf_devmap_val from uapi
  2020-06-09 13:31   ` [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release Jesper Dangaard Brouer
  2020-06-09 13:31     ` [PATCH bpf V2 1/2] bpf: devmap adjust uapi for attach bpf program Jesper Dangaard Brouer
@ 2020-06-09 13:31     ` Jesper Dangaard Brouer
  2020-06-09 19:03     ` [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release Alexei Starovoitov
  2 siblings, 0 replies; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2020-06-09 13:31 UTC (permalink / raw)
  To: David Ahern, bpf, Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann, Andrii Nakryiko,
	Lorenzo Bianconi

Sync tools uapi bpf.h header file and update selftests that use
struct bpf_devmap_val.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 tools/include/uapi/linux/bpf.h                     |   13 +++++++++++++
 .../selftests/bpf/prog_tests/xdp_devmap_attach.c   |    8 --------
 .../selftests/bpf/progs/test_xdp_devmap_helpers.c  |    2 +-
 .../bpf/progs/test_xdp_with_devmap_helpers.c       |    3 +--
 4 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c65b374a5090..19684813faae 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3761,6 +3761,19 @@ struct xdp_md {
 	__u32 egress_ifindex;  /* txq->dev->ifindex */
 };
 
+/* DEVMAP map-value layout
+ *
+ * The struct data-layout of map-value is a configuration interface.
+ * New members can only be added to the end of this structure.
+ */
+struct bpf_devmap_val {
+	__u32 ifindex;   /* device index */
+	union {
+		int   fd;  /* prog fd on map write */
+		__u32 id;  /* prog id on map read */
+	} bpf_prog;
+};
+
 enum sk_action {
 	SK_DROP = 0,
 	SK_PASS,
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c b/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
index d19dbd668f6a..88ef3ec8ac4c 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
@@ -8,14 +8,6 @@
 
 #define IFINDEX_LO 1
 
-struct bpf_devmap_val {
-	u32 ifindex;   /* device index */
-	union {
-		int fd;  /* prog fd on map write */
-		u32 id;  /* prog id on map read */
-	} bpf_prog;
-};
-
 void test_xdp_with_devmap_helpers(void)
 {
 	struct test_xdp_with_devmap_helpers *skel;
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_devmap_helpers.c b/tools/testing/selftests/bpf/progs/test_xdp_devmap_helpers.c
index e5c0f131c8a7..b360ba2bd441 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_devmap_helpers.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_devmap_helpers.c
@@ -2,7 +2,7 @@
 /* fails to load without expected_attach_type = BPF_XDP_DEVMAP
  * because of access to egress_ifindex
  */
-#include "vmlinux.h"
+#include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
 
 SEC("xdp_dm_log")
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c b/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c
index deef0e050863..330811260123 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c
@@ -1,6 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
-
-#include "vmlinux.h"
+#include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
 
 struct {



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf V2 1/2] bpf: devmap adjust uapi for attach bpf program
  2020-06-09 13:31     ` [PATCH bpf V2 1/2] bpf: devmap adjust uapi for attach bpf program Jesper Dangaard Brouer
@ 2020-06-09 13:47       ` David Ahern
  2020-06-09 15:12         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 21+ messages in thread
From: David Ahern @ 2020-06-09 13:47 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, bpf, Alexei Starovoitov
  Cc: netdev, Daniel Borkmann, Andrii Nakryiko, Lorenzo Bianconi

On 6/9/20 7:31 AM, Jesper Dangaard Brouer wrote:
> This patch remove the minus-1 checks, and have zero mean feature isn't used.
> 

For consistency this should apply to other XDP fd uses as well -- like
IFLA_XDP_EXPECTED_FD and IFLA_XDP_FD.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf V2 1/2] bpf: devmap adjust uapi for attach bpf program
  2020-06-09 13:47       ` David Ahern
@ 2020-06-09 15:12         ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2020-06-09 15:12 UTC (permalink / raw)
  To: David Ahern
  Cc: bpf, Alexei Starovoitov, netdev, Daniel Borkmann,
	Andrii Nakryiko, Lorenzo Bianconi, brouer

On Tue, 9 Jun 2020 07:47:06 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 6/9/20 7:31 AM, Jesper Dangaard Brouer wrote:
> > This patch remove the minus-1 checks, and have zero mean feature isn't used.
> >   
> 
> For consistency this should apply to other XDP fd uses as well -- like
> IFLA_XDP_EXPECTED_FD and IFLA_XDP_FD.

I agree, but I choose to limit the scope as this is for bpf-tree.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release
  2020-06-09 13:31   ` [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release Jesper Dangaard Brouer
  2020-06-09 13:31     ` [PATCH bpf V2 1/2] bpf: devmap adjust uapi for attach bpf program Jesper Dangaard Brouer
  2020-06-09 13:31     ` [PATCH bpf V2 2/2] bpf: selftests and tools use struct bpf_devmap_val from uapi Jesper Dangaard Brouer
@ 2020-06-09 19:03     ` Alexei Starovoitov
  2 siblings, 0 replies; 21+ messages in thread
From: Alexei Starovoitov @ 2020-06-09 19:03 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David Ahern, bpf, netdev, Daniel Borkmann, Andrii Nakryiko,
	Lorenzo Bianconi

On Tue, Jun 09, 2020 at 03:31:41PM +0200, Jesper Dangaard Brouer wrote:
> For special type maps (e.g. devmap and cpumap) the map-value data-layout is
> a configuration interface. This is uapi that can only be tail extended.
> Thus, new members (and thus features) can only be added to the end of this
> structure, and the kernel uses the map->value_size from userspace to
> determine feature set 'version'.
> 
> For this kind of uapi to be extensible and backward compatible, is it common
> that new members/fields (that represent a new feature) in the struct are
> initialized as zero, which indicate that the feature isn't used. This makes
> it possible to write userspace applications that are unaware of new kernel
> features, but just include latest uapi headers, zero-init struct and
> populate features it knows about.
> 
> The recent extension of devmap with a bpf_prog.fd requires end-user to
> supply the file-descriptor value minus-1 to communicate that the features
> isn't used. This isn't compatible with the described kABI extension model.

Applied to bpf tree without this cover letter, because I don't want
folks to read above and start using kabi terminology liks this.
I've never seen a definition of kabi. I've heard redhat has something, but
I don't know what it is and really not interested to find out.
Studying amd64 psABI, sparc psABI, gABI was enough of time sink.
When folks use ABI they really mean binary. 
Old binaries that use devmap_val will work as-is with newer kernel.
There is no binary breakage due to devmap_val.
Whereas what you describe above is what will happen if something gets
recompiled. It's an API quirk. And arguable not an UAPI breakage.
UAPI structs have to initialized.
There is a struct and there is initializer for it.
Like if you did 'spinlock_t lock;' and it got broken with new kernel
it's programmers fault. It's not uapi and certainly not abi issue.
DEFINE_SPINLOCK() should have been used.
Same thing with user space.
'struct bpf_devmap_val' would be ok from uapi pov even with -1.
It's just much more convenient to have zero init. Less error prone, etc.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-06-09 19:03 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-08 16:51 [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero Jesper Dangaard Brouer
2020-06-08 16:51 ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 Jesper Dangaard Brouer
2020-06-08 18:28   ` Toke Høiland-Jørgensen
2020-06-08 18:36   ` Andrii Nakryiko
2020-06-08 19:44     ` Jesper Dangaard Brouer
2020-06-08 20:05       ` Andrii Nakryiko
2020-06-08 19:00   ` kernel test robot
2020-06-08 19:55   ` kernel test robot
2020-06-08 19:55   ` [RFC PATCH] bpf: bpf_anon_inode_getfd() can be static kernel test robot
2020-06-08 20:42   ` [PATCH bpf 1/3] bpf: syscall to start at file-descriptor 1 kernel test robot
2020-06-08 16:51 ` [PATCH bpf 2/3] bpf: devmap adjust uapi for attach bpf program Jesper Dangaard Brouer
2020-06-08 18:30   ` Toke Høiland-Jørgensen
2020-06-08 16:51 ` [PATCH bpf 3/3] bpf: selftests and tools use struct bpf_devmap_val from uapi Jesper Dangaard Brouer
2020-06-09  1:34 ` [PATCH bpf 0/3] bpf: avoid using/returning file descriptor value zero Alexei Starovoitov
2020-06-09  9:55   ` Jesper Dangaard Brouer
2020-06-09 13:31   ` [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release Jesper Dangaard Brouer
2020-06-09 13:31     ` [PATCH bpf V2 1/2] bpf: devmap adjust uapi for attach bpf program Jesper Dangaard Brouer
2020-06-09 13:47       ` David Ahern
2020-06-09 15:12         ` Jesper Dangaard Brouer
2020-06-09 13:31     ` [PATCH bpf V2 2/2] bpf: selftests and tools use struct bpf_devmap_val from uapi Jesper Dangaard Brouer
2020-06-09 19:03     ` [PATCH bpf V2 0/2] bpf: adjust uapi for devmap prior to kernel release Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).