All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
@ 2021-06-09 10:04 Andrew Melnychenko
  2021-06-09 10:04 ` [RFC PATCH 1/5] ebpf: Added eBPF initialization by fds Andrew Melnychenko
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Andrew Melnychenko @ 2021-06-09 10:04 UTC (permalink / raw)
  To: mst, yuri.benditovich, jasowang, armbru, eblake, berrange; +Cc: yan, qemu-devel

Libvirt usually launches qemu with strict permissions.
To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.

Added property "ebpf_rss_fds" for "virtio-net" that allows to
initialize eBPF RSS context with passed program & maps fds.

Added qemu-ebpf-rss-helper - simple helper that loads eBPF
context and passes fds through unix socket.
Libvirt should call the helper and pass fds to qemu through
"ebpf_rss_fds" property.

Added explicit target OS check for libbpf dependency in meson.
eBPF RSS works only with Linux TAP, so there is no reason to
build eBPF loader/helper for non-Linux.

Overall, libvirt process should not be aware of the "interface"
of eBPF RSS, it will not be aware of eBPF maps/program "type" and
their quantity. That's why qemu and the helper should be from
the same build and be "synchronized". Technically each qemu may
have its own helper. That's why "query-helper-paths" qmp command
was added. Qemu should return the path to the helper that suits
and libvirt should use "that" helper for "that" emulator.

qmp sample:
C: { "execute": "query-helper-paths" }
S: { "return": [
     {
       "name": "qemu-ebpf-rss-helper",
       "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
     }
    ]
   }

Andrew Melnychenko (5):
  ebpf: Added eBPF initialization by fds.
  virtio-net: Added property to load eBPF RSS with fds.
  ebpf_rss_helper: Added helper for eBPF RSS.
  qmp: Added qemu-ebpf-rss-path command.
  meson: libbpf dependency now exclusively for Linux.

 ebpf/ebpf_rss-stub.c           |   6 ++
 ebpf/ebpf_rss.c                |  31 +++++++-
 ebpf/ebpf_rss.h                |   5 ++
 ebpf/qemu-ebpf-rss-helper.c    | 130 +++++++++++++++++++++++++++++++++
 hw/net/virtio-net.c            |  77 ++++++++++++++++++-
 include/hw/virtio/virtio-net.h |   1 +
 meson.build                    |  37 ++++++----
 monitor/qmp-cmds.c             |  78 ++++++++++++++++++++
 qapi/misc.json                 |  29 ++++++++
 9 files changed, 374 insertions(+), 20 deletions(-)
 create mode 100644 ebpf/qemu-ebpf-rss-helper.c

-- 
2.31.1



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC PATCH 1/5] ebpf: Added eBPF initialization by fds.
  2021-06-09 10:04 [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd Andrew Melnychenko
@ 2021-06-09 10:04 ` Andrew Melnychenko
  2021-06-09 10:04 ` [RFC PATCH 2/5] virtio-net: Added property to load eBPF RSS with fds Andrew Melnychenko
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Andrew Melnychenko @ 2021-06-09 10:04 UTC (permalink / raw)
  To: mst, yuri.benditovich, jasowang, armbru, eblake, berrange; +Cc: yan, qemu-devel

eBPF RSS context may be initialized by program fd and map fds.
virtio-net may provide fds passed by libvirt.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 ebpf/ebpf_rss-stub.c |  6 ++++++
 ebpf/ebpf_rss.c      | 31 ++++++++++++++++++++++++++++---
 ebpf/ebpf_rss.h      |  5 +++++
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/ebpf/ebpf_rss-stub.c b/ebpf/ebpf_rss-stub.c
index e71e229190..8d7fae2ad9 100644
--- a/ebpf/ebpf_rss-stub.c
+++ b/ebpf/ebpf_rss-stub.c
@@ -28,6 +28,12 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
     return false;
 }
 
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+                       int config_fd, int toeplitz_fd, int table_fd)
+{
+    return false;
+}
+
 bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
                       uint16_t *indirections_table, uint8_t *toeplitz_key)
 {
diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
index 118c68da83..6e9b88efed 100644
--- a/ebpf/ebpf_rss.c
+++ b/ebpf/ebpf_rss.c
@@ -27,19 +27,20 @@ void ebpf_rss_init(struct EBPFRSSContext *ctx)
 {
     if (ctx != NULL) {
         ctx->obj = NULL;
+        ctx->program_fd = -1;
     }
 }
 
 bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
 {
-    return ctx != NULL && ctx->obj != NULL;
+    return ctx != NULL && (ctx->obj != NULL || ctx->program_fd != -1);
 }
 
 bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 {
     struct rss_bpf *rss_bpf_ctx;
 
-    if (ctx == NULL) {
+    if (ctx == NULL || ebpf_rss_is_loaded(ctx)) {
         return false;
     }
 
@@ -70,10 +71,26 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 error:
     rss_bpf__destroy(rss_bpf_ctx);
     ctx->obj = NULL;
+    ctx->program_fd = -1;
 
     return false;
 }
 
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+                       int config_fd, int toeplitz_fd, int table_fd)
+{
+    if (ctx == NULL || ebpf_rss_is_loaded(ctx)) {
+        return false;
+    }
+
+    ctx->program_fd = program_fd;
+    ctx->map_configuration = config_fd;
+    ctx->map_toeplitz_key = toeplitz_fd;
+    ctx->map_indirections_table = table_fd;
+
+    return true;
+}
+
 static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
                                 struct EBPFRSSConfig *config)
 {
@@ -160,6 +177,14 @@ void ebpf_rss_unload(struct EBPFRSSContext *ctx)
         return;
     }
 
-    rss_bpf__destroy(ctx->obj);
+    if (ctx->obj != NULL) {
+        rss_bpf__destroy(ctx->obj);
+    } else {
+        close(ctx->program_fd);
+        close(ctx->map_configuration);
+        close(ctx->map_toeplitz_key);
+        close(ctx->map_indirections_table);
+    }
     ctx->obj = NULL;
+    ctx->program_fd = -1;
 }
diff --git a/ebpf/ebpf_rss.h b/ebpf/ebpf_rss.h
index bf3f2572c7..363859c8bb 100644
--- a/ebpf/ebpf_rss.h
+++ b/ebpf/ebpf_rss.h
@@ -14,6 +14,8 @@
 #ifndef QEMU_EBPF_RSS_H
 #define QEMU_EBPF_RSS_H
 
+#define EBPF_RSS_MAX_FDS 4
+
 struct EBPFRSSContext {
     void *obj;
     int program_fd;
@@ -36,6 +38,9 @@ bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx);
 
 bool ebpf_rss_load(struct EBPFRSSContext *ctx);
 
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+                       int config_fd, int toeplitz_fd, int table_fd);
+
 bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
                       uint16_t *indirections_table, uint8_t *toeplitz_key);
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [RFC PATCH 2/5] virtio-net: Added property to load eBPF RSS with fds.
  2021-06-09 10:04 [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd Andrew Melnychenko
  2021-06-09 10:04 ` [RFC PATCH 1/5] ebpf: Added eBPF initialization by fds Andrew Melnychenko
@ 2021-06-09 10:04 ` Andrew Melnychenko
  2021-06-09 10:04 ` [RFC PATCH 3/5] ebpf_rss_helper: Added helper for eBPF RSS Andrew Melnychenko
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Andrew Melnychenko @ 2021-06-09 10:04 UTC (permalink / raw)
  To: mst, yuri.benditovich, jasowang, armbru, eblake, berrange; +Cc: yan, qemu-devel

eBPF RSS program and maps now may be passed during initialization.
Initially was implemented for libvirt to launch qemu without permissions.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 hw/net/virtio-net.c            | 77 ++++++++++++++++++++++++++++++++--
 include/hw/virtio/virtio-net.h |  1 +
 2 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index bd7958b9f0..0602b1772e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -41,6 +41,7 @@
 #include "sysemu/sysemu.h"
 #include "trace.h"
 #include "monitor/qdev.h"
+#include "monitor/monitor.h"
 #include "hw/pci/pci.h"
 #include "net_rx_pkt.h"
 #include "hw/virtio/vhost.h"
@@ -1223,14 +1224,81 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
     virtio_net_attach_ebpf_to_backend(n->nic, -1);
 }
 
-static bool virtio_net_load_ebpf(VirtIONet *n)
+static int virtio_net_get_ebpf_rss_fds(char *str, char *fds[], int nfds)
 {
-    if (!virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
-        /* backend does't support steering ebpf */
+    char *ptr = str;
+    char *cur = NULL;
+    size_t len = strlen(str);
+    int i = 0;
+
+    for (; i < nfds && ptr < str + len;) {
+        cur = strchr(ptr, ':');
+
+        if (cur == NULL) {
+            fds[i] = g_strdup(ptr);
+        } else {
+            fds[i] = g_strndup(ptr, cur - ptr);
+        }
+
+        i++;
+        if (cur == NULL) {
+            break;
+        } else {
+            ptr = cur + 1;
+        }
+    }
+
+    return i;
+}
+
+static bool virtio_net_load_ebpf_fds(VirtIONet *n)
+{
+    char *fds_strs[EBPF_RSS_MAX_FDS];
+    int fds[EBPF_RSS_MAX_FDS];
+    int nfds;
+    int ret = false;
+    Error *errp;
+    int i = 0;
+
+    if (n == NULL || !n->ebpf_rss_fds) {
         return false;
     }
 
-    return ebpf_rss_load(&n->ebpf_rss);
+    nfds = virtio_net_get_ebpf_rss_fds(n->ebpf_rss_fds,
+                                       fds_strs, EBPF_RSS_MAX_FDS);
+    for (i = 0; i < nfds; i++) {
+        fds[i] = monitor_fd_param(monitor_cur(), fds_strs[i], &errp);
+    }
+
+    if (nfds == EBPF_RSS_MAX_FDS) {
+        ret = ebpf_rss_load_fds(&n->ebpf_rss, fds[0], fds[1], fds[2], fds[3]);
+    }
+
+    if (!ret) {
+        for (i = 0; i < nfds; i++) {
+            close(fds[i]);
+        }
+    }
+
+    for (i = 0; i < nfds; i++) {
+        g_free(fds_strs[i]);
+    }
+
+    return ret;
+}
+
+static bool virtio_net_load_ebpf(VirtIONet *n)
+{
+    bool ret = true;
+
+    if (virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
+        if (!(n->ebpf_rss_fds
+                && virtio_net_load_ebpf_fds(n))) {
+            ret = ebpf_rss_load(&n->ebpf_rss);
+        }
+    }
+
+    return ret;
 }
 
 static void virtio_net_unload_ebpf(VirtIONet *n)
@@ -3605,6 +3673,7 @@ static Property virtio_net_properties[] = {
                     VIRTIO_NET_F_RSS, false),
     DEFINE_PROP_BIT64("hash", VirtIONet, host_features,
                     VIRTIO_NET_F_HASH_REPORT, false),
+    DEFINE_PROP_STRING("ebpf_rss_fds", VirtIONet, ebpf_rss_fds),
     DEFINE_PROP_BIT64("guest_rsc_ext", VirtIONet, host_features,
                     VIRTIO_NET_F_RSC_EXT, false),
     DEFINE_PROP_UINT32("rsc_interval", VirtIONet, rsc_timeout,
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 824a69c23f..993f2f3036 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -213,6 +213,7 @@ struct VirtIONet {
     VirtioNetRssData rss_data;
     struct NetRxPkt *rx_pkt;
     struct EBPFRSSContext ebpf_rss;
+    char *ebpf_rss_fds;
 };
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [RFC PATCH 3/5] ebpf_rss_helper: Added helper for eBPF RSS.
  2021-06-09 10:04 [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd Andrew Melnychenko
  2021-06-09 10:04 ` [RFC PATCH 1/5] ebpf: Added eBPF initialization by fds Andrew Melnychenko
  2021-06-09 10:04 ` [RFC PATCH 2/5] virtio-net: Added property to load eBPF RSS with fds Andrew Melnychenko
@ 2021-06-09 10:04 ` Andrew Melnychenko
  2021-06-09 10:04 ` [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command Andrew Melnychenko
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Andrew Melnychenko @ 2021-06-09 10:04 UTC (permalink / raw)
  To: mst, yuri.benditovich, jasowang, armbru, eblake, berrange; +Cc: yan, qemu-devel

Helper program. Loads eBPF RSS program and maps and passes them through unix socket.
Libvirt may launch this helper and pass eBPF fds to qemu virtio-net.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 ebpf/qemu-ebpf-rss-helper.c | 130 ++++++++++++++++++++++++++++++++++++
 meson.build                 |   8 +++
 2 files changed, 138 insertions(+)
 create mode 100644 ebpf/qemu-ebpf-rss-helper.c

diff --git a/ebpf/qemu-ebpf-rss-helper.c b/ebpf/qemu-ebpf-rss-helper.c
new file mode 100644
index 0000000000..f0379dddf2
--- /dev/null
+++ b/ebpf/qemu-ebpf-rss-helper.c
@@ -0,0 +1,130 @@
+/*
+ * eBPF RSS Helper
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko <andrew@daynix.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Description: This is helper program for libvirtd.
+ *              It loads eBPF RSS program and passes fds through unix socket.
+ *              Built by meson, target - 'qemu-ebpf-rss-helper'.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <getopt.h>
+#include <memory.h>
+#include <errno.h>
+#include <sys/socket.h>
+
+#include "ebpf_rss.h"
+
+static int send_fds(int socket, int *fds, int n)
+{
+    struct msghdr msg = {};
+    struct cmsghdr *cmsg = NULL;
+    char buf[CMSG_SPACE(n * sizeof(int))];
+    char dummy_buffer = 0;
+    struct iovec io = { .iov_base = &dummy_buffer,
+                        .iov_len = sizeof(dummy_buffer) };
+
+    memset(buf, 0, sizeof(buf));
+
+    msg.msg_iov = &io;
+    msg.msg_iovlen = 1;
+    msg.msg_control = buf;
+    msg.msg_controllen = sizeof(buf);
+
+    cmsg = CMSG_FIRSTHDR(&msg);
+    cmsg->cmsg_level = SOL_SOCKET;
+    cmsg->cmsg_type = SCM_RIGHTS;
+    cmsg->cmsg_len = CMSG_LEN(n * sizeof(int));
+
+    memcpy(CMSG_DATA(cmsg), fds, n * sizeof(int));
+
+    return sendmsg(socket, &msg, 0);
+}
+
+static void print_help_and_exit(const char *prog, int exitcode)
+{
+    fprintf(stderr, "%s - load eBPF RSS program for qemu and pass eBPF fds"
+            " through unix socket.\n", prog);
+    fprintf(stderr, "\t--fd <num>, -f <num> - unix socket file descriptor"
+            " used to pass eBPF fds.\n");
+    fprintf(stderr, "\t--help, -h - this help.\n");
+    exit(exitcode);
+}
+
+int main(int argc, char **argv)
+{
+    char *fd_string = NULL;
+    int unix_fd = 0;
+    struct EBPFRSSContext ctx = {};
+    int fds[EBPF_RSS_MAX_FDS] = {};
+    int ret = -1;
+
+    for (;;) {
+        int c;
+        static struct option long_options[] = {
+                {"help",  no_argument, 0, 'h'},
+                {"fd",  required_argument, 0, 'f'},
+                {0, 0, 0, 0}
+        };
+        c = getopt_long(argc, argv, "hf:",
+                long_options, NULL);
+
+        if (c == -1) {
+            break;
+        }
+
+        switch (c) {
+        case 'f':
+            fd_string = optarg;
+            break;
+        case 'h':
+        default:
+            print_help_and_exit(argv[0],
+                    c == 'h' ? EXIT_SUCCESS : EXIT_FAILURE);
+        }
+    }
+
+    if (!fd_string) {
+        fprintf(stderr, "Unix file descriptor not present.\n");
+        print_help_and_exit(argv[0], EXIT_FAILURE);
+    }
+
+    fprintf(stderr, "FD %s\n", fd_string);
+
+    unix_fd = atoi(fd_string);
+
+    if (!unix_fd) {
+        fprintf(stderr, "Unix file descriptor is invalid.\n");
+        return EXIT_FAILURE;
+    }
+
+    ebpf_rss_init(&ctx);
+    if (!ebpf_rss_load(&ctx)) {
+        fprintf(stderr, "Can't load ebpf.\n");
+        return EXIT_FAILURE;
+    }
+    fds[0] = ctx.program_fd;
+    fds[1] = ctx.map_configuration;
+    fds[2] = ctx.map_toeplitz_key;
+    fds[3] = ctx.map_indirections_table;
+
+    ret = send_fds(unix_fd, fds, EBPF_RSS_MAX_FDS);
+    if (ret < 0) {
+        fprintf(stderr, "Issue while sending fds: %s.\n", strerror(errno));
+    }
+
+    ebpf_rss_unload(&ctx);
+
+    return ret < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
+}
+
diff --git a/meson.build b/meson.build
index 626cf932c1..ce26bddead 100644
--- a/meson.build
+++ b/meson.build
@@ -2413,6 +2413,14 @@ if have_tools
                dependencies: [authz, crypto, io, qom, qemuutil,
                               libcap_ng, mpathpersist],
                install: true)
+
+    if libbpf.found()
+        executable('qemu-ebpf-rss-helper', files(
+                   'ebpf/qemu-ebpf-rss-helper.c', 'ebpf/ebpf_rss.c'),
+                   dependencies: [qemuutil, libbpf, glib],
+                   install: true,
+                   install_dir: get_option('libexecdir'))
+    endif
   endif
 
   if 'CONFIG_IVSHMEM' in config_host
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command.
  2021-06-09 10:04 [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd Andrew Melnychenko
                   ` (2 preceding siblings ...)
  2021-06-09 10:04 ` [RFC PATCH 3/5] ebpf_rss_helper: Added helper for eBPF RSS Andrew Melnychenko
@ 2021-06-09 10:04 ` Andrew Melnychenko
  2021-06-11 14:15   ` Eric Blake
  2021-06-12  5:28   ` Markus Armbruster
  2021-06-09 10:04 ` [RFC PATCH 5/5] meson: libbpf dependency now exclusively for Linux Andrew Melnychenko
  2021-06-10  6:41 ` [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd Jason Wang
  5 siblings, 2 replies; 31+ messages in thread
From: Andrew Melnychenko @ 2021-06-09 10:04 UTC (permalink / raw)
  To: mst, yuri.benditovich, jasowang, armbru, eblake, berrange; +Cc: yan, qemu-devel

New qmp command to query ebpf helper.
It's crucial that qemu and helper are in sync and in touch.
Technically helper should pass eBPF fds that qemu may accept.
And different qemu's builds may have different eBPF programs and helpers.
Qemu returns helper that should "fit" to virtio-net.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 monitor/qmp-cmds.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++
 qapi/misc.json     | 29 +++++++++++++++++
 2 files changed, 107 insertions(+)

diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index f7d64a6457..5dd2a58ea2 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -351,3 +351,81 @@ void qmp_display_reload(DisplayReloadOptions *arg, Error **errp)
         abort();
     }
 }
+
+#ifdef CONFIG_LINUX
+
+static const char *get_dirname(char *path)
+{
+    char *sep;
+
+    sep = strrchr(path, '/');
+    if (sep == path) {
+        return "/";
+    } else if (sep) {
+        *sep = 0;
+        return path;
+    }
+    return ".";
+}
+
+static char *find_helper(const char *name)
+{
+    char qemu_exec[PATH_MAX];
+    const char *qemu_dir = NULL;
+    char *helper = NULL;
+
+    if (name == NULL) {
+        return NULL;
+    }
+
+    if (readlink("/proc/self/exe", qemu_exec, PATH_MAX) > 0) {
+        qemu_dir = get_dirname(qemu_exec);
+
+        helper = g_strdup_printf("%s/%s", qemu_dir, name);
+        if (access(helper, F_OK) == 0) {
+            return helper;
+        }
+        g_free(helper);
+    }
+
+    helper = g_strdup_printf("%s/%s", CONFIG_QEMU_HELPERDIR, name);
+    if (access(helper, F_OK) == 0) {
+        return helper;
+    }
+    g_free(helper);
+
+    return NULL;
+}
+
+HelperPathList *qmp_query_helper_paths(Error **errp)
+{
+    HelperPathList *ret = NULL;
+    const char *helpers_list[] = {
+#ifdef CONFIG_EBPF
+        "qemu-ebpf-rss-helper",
+#endif
+        NULL
+    };
+    const char **helper_iter = helpers_list;
+
+    for (; *helper_iter != NULL; ++helper_iter) {
+        char *path = find_helper(*helper_iter);
+        if (path) {
+            HelperPath *helper = g_new0(HelperPath, 1);
+            helper->name = g_strdup(*helper_iter);
+            helper->path = path;
+
+            QAPI_LIST_PREPEND(ret, helper);
+        }
+    }
+
+    return ret;
+}
+#else
+
+HelperPathList *qmp_query_helper_paths(Error **errp)
+{
+    return NULL;
+}
+
+#endif
diff --git a/qapi/misc.json b/qapi/misc.json
index 156f98203e..023bd2120d 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -519,3 +519,32 @@
  'data': { '*option': 'str' },
  'returns': ['CommandLineOptionInfo'],
  'allow-preconfig': true }
+
+##
+# @HelperPath:
+#
+# Name of the helper and binary location.
+##
+{ 'struct': 'HelperPath',
+  'data': {'name': 'str', 'path': 'str'} }
+
+##
+# @query-helper-paths:
+#
+# Query specific eBPF RSS helper for current qemu binary.
+#
+# Returns: list of object that contains name and path for helper.
+#
+# Example:
+#
+# -> { "execute": "query-helper-paths" }
+# <- { "return": [
+#        {
+#          "name": "qemu-ebpf-rss-helper",
+#          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
+#        }
+#      ]
+#    }
+#
+##
+{ 'command': 'query-helper-paths', 'returns': ['HelperPath'] }
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [RFC PATCH 5/5] meson: libbpf dependency now exclusively for Linux.
  2021-06-09 10:04 [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd Andrew Melnychenko
                   ` (3 preceding siblings ...)
  2021-06-09 10:04 ` [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command Andrew Melnychenko
@ 2021-06-09 10:04 ` Andrew Melnychenko
  2021-06-10  6:41 ` [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd Jason Wang
  5 siblings, 0 replies; 31+ messages in thread
From: Andrew Melnychenko @ 2021-06-09 10:04 UTC (permalink / raw)
  To: mst, yuri.benditovich, jasowang, armbru, eblake, berrange; +Cc: yan, qemu-devel

Libbpf is used for eBPF RSS steering, which is
supported only by Linux TAP.
There is no reason yet to build eBPF loader and
helper for non Linux systems, even if libbpf is
present.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 meson.build | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/meson.build b/meson.build
index ce26bddead..fa6c48fb86 100644
--- a/meson.build
+++ b/meson.build
@@ -1033,19 +1033,22 @@ if not get_option('fuse_lseek').disabled()
 endif
 
 # libbpf
-libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
-if libbpf.found() and not cc.links('''
-   #include <bpf/libbpf.h>
-   int main(void)
-   {
-     bpf_object__destroy_skeleton(NULL);
-     return 0;
-   }''', dependencies: libbpf)
-  libbpf = not_found
-  if get_option('bpf').enabled()
-    error('libbpf skeleton test failed')
-  else
-    warning('libbpf skeleton test failed, disabling')
+libbpf = not_found
+if targetos == 'linux'
+  libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
+  if libbpf.found() and not cc.links('''
+    #include <bpf/libbpf.h>
+    int main(void)
+    {
+      bpf_object__destroy_skeleton(NULL);
+      return 0;
+    }''', dependencies: libbpf)
+    libbpf = not_found
+    if get_option('bpf').enabled()
+      error('libbpf skeleton test failed')
+    else
+      warning('libbpf skeleton test failed, disabling')
+    endif
   endif
 endif
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-09 10:04 [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd Andrew Melnychenko
                   ` (4 preceding siblings ...)
  2021-06-09 10:04 ` [RFC PATCH 5/5] meson: libbpf dependency now exclusively for Linux Andrew Melnychenko
@ 2021-06-10  6:41 ` Jason Wang
  2021-06-10  6:55   ` Yuri Benditovich
  5 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2021-06-10  6:41 UTC (permalink / raw)
  To: Andrew Melnychenko, mst, yuri.benditovich, armbru, eblake, berrange
  Cc: yan, qemu-devel


在 2021/6/9 下午6:04, Andrew Melnychenko 写道:
> Libvirt usually launches qemu with strict permissions.
> To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.


A silly question:

Kernel had the following permission checks in bpf syscall:

        if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
                 return -EPERM;
...

         err = security_bpf(cmd, &attr, size);
         if (err < 0)
                 return err;

So if I understand the code correctly, bpf syscall can only be done if:

1) unprivileged_bpf is enabled or
2) has the capability  and pass the LSM checks

So I think the series is for unprivileged_bpf disabled. If I'm not 
wrong, I guess the policy is to grant CAP_BPF but do fine grain checks 
via LSM.

If this is correct, need to describe it in the commit log.


>
> Added property "ebpf_rss_fds" for "virtio-net" that allows to
> initialize eBPF RSS context with passed program & maps fds.
>
> Added qemu-ebpf-rss-helper - simple helper that loads eBPF
> context and passes fds through unix socket.
> Libvirt should call the helper and pass fds to qemu through
> "ebpf_rss_fds" property.
>
> Added explicit target OS check for libbpf dependency in meson.
> eBPF RSS works only with Linux TAP, so there is no reason to
> build eBPF loader/helper for non-Linux.
>
> Overall, libvirt process should not be aware of the "interface"
> of eBPF RSS, it will not be aware of eBPF maps/program "type" and
> their quantity.


I'm not sure this is the best. We have several examples that let libvirt 
to involve. Examples:

1) create TAP device (and the TUN_SETIFF)

2) open vhost devices


>   That's why qemu and the helper should be from
> the same build and be "synchronized". Technically each qemu may
> have its own helper. That's why "query-helper-paths" qmp command
> was added. Qemu should return the path to the helper that suits
> and libvirt should use "that" helper for "that" emulator.
>
> qmp sample:
> C: { "execute": "query-helper-paths" }
> S: { "return": [
>       {
>         "name": "qemu-ebpf-rss-helper",
>         "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
>       }
>      ]
>     }


I think we need an example on the detail steps for how libvirt is 
expected to use this.

Thanks


>
> Andrew Melnychenko (5):
>    ebpf: Added eBPF initialization by fds.
>    virtio-net: Added property to load eBPF RSS with fds.
>    ebpf_rss_helper: Added helper for eBPF RSS.
>    qmp: Added qemu-ebpf-rss-path command.
>    meson: libbpf dependency now exclusively for Linux.
>
>   ebpf/ebpf_rss-stub.c           |   6 ++
>   ebpf/ebpf_rss.c                |  31 +++++++-
>   ebpf/ebpf_rss.h                |   5 ++
>   ebpf/qemu-ebpf-rss-helper.c    | 130 +++++++++++++++++++++++++++++++++
>   hw/net/virtio-net.c            |  77 ++++++++++++++++++-
>   include/hw/virtio/virtio-net.h |   1 +
>   meson.build                    |  37 ++++++----
>   monitor/qmp-cmds.c             |  78 ++++++++++++++++++++
>   qapi/misc.json                 |  29 ++++++++
>   9 files changed, 374 insertions(+), 20 deletions(-)
>   create mode 100644 ebpf/qemu-ebpf-rss-helper.c
>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-10  6:41 ` [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd Jason Wang
@ 2021-06-10  6:55   ` Yuri Benditovich
  2021-06-11  5:36     ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Yuri Benditovich @ 2021-06-10  6:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andrew Melnychenko, Daniel P. Berrangé,
	Michael S . Tsirkin, qemu-devel, Markus Armbruster,
	Yan Vugenfirer, Eric Blake

On Thu, Jun 10, 2021 at 9:41 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/6/9 下午6:04, Andrew Melnychenko 写道:
> > Libvirt usually launches qemu with strict permissions.
> > To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.
>
>
> A silly question:
>
> Kernel had the following permission checks in bpf syscall:
>
>         if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
>                  return -EPERM;
> ...
>
>          err = security_bpf(cmd, &attr, size);
>          if (err < 0)
>                  return err;
>
> So if I understand the code correctly, bpf syscall can only be done if:
>
> 1) unprivileged_bpf is enabled or
> 2) has the capability  and pass the LSM checks
>
> So I think the series is for unprivileged_bpf disabled. If I'm not
> wrong, I guess the policy is to grant CAP_BPF but do fine grain checks
> via LSM.
>
> If this is correct, need to describe it in the commit log.
>
>
> >
> > Added property "ebpf_rss_fds" for "virtio-net" that allows to
> > initialize eBPF RSS context with passed program & maps fds.
> >
> > Added qemu-ebpf-rss-helper - simple helper that loads eBPF
> > context and passes fds through unix socket.
> > Libvirt should call the helper and pass fds to qemu through
> > "ebpf_rss_fds" property.
> >
> > Added explicit target OS check for libbpf dependency in meson.
> > eBPF RSS works only with Linux TAP, so there is no reason to
> > build eBPF loader/helper for non-Linux.
> >
> > Overall, libvirt process should not be aware of the "interface"
> > of eBPF RSS, it will not be aware of eBPF maps/program "type" and
> > their quantity.
>
>
> I'm not sure this is the best. We have several examples that let libvirt
> to involve. Examples:
>
> 1) create TAP device (and the TUN_SETIFF)
>
> 2) open vhost devices
>
>
> >   That's why qemu and the helper should be from
> > the same build and be "synchronized". Technically each qemu may
> > have its own helper. That's why "query-helper-paths" qmp command
> > was added. Qemu should return the path to the helper that suits
> > and libvirt should use "that" helper for "that" emulator.
> >
> > qmp sample:
> > C: { "execute": "query-helper-paths" }
> > S: { "return": [
> >       {
> >         "name": "qemu-ebpf-rss-helper",
> >         "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
> >       }
> >      ]
> >     }
>
>
> I think we need an example on the detail steps for how libvirt is
> expected to use this.

The preliminary patches for libvirt are at
https://github.com/daynix/libvirt/tree/RSSv1

>
> Thanks
>
>
> >
> > Andrew Melnychenko (5):
> >    ebpf: Added eBPF initialization by fds.
> >    virtio-net: Added property to load eBPF RSS with fds.
> >    ebpf_rss_helper: Added helper for eBPF RSS.
> >    qmp: Added qemu-ebpf-rss-path command.
> >    meson: libbpf dependency now exclusively for Linux.
> >
> >   ebpf/ebpf_rss-stub.c           |   6 ++
> >   ebpf/ebpf_rss.c                |  31 +++++++-
> >   ebpf/ebpf_rss.h                |   5 ++
> >   ebpf/qemu-ebpf-rss-helper.c    | 130 +++++++++++++++++++++++++++++++++
> >   hw/net/virtio-net.c            |  77 ++++++++++++++++++-
> >   include/hw/virtio/virtio-net.h |   1 +
> >   meson.build                    |  37 ++++++----
> >   monitor/qmp-cmds.c             |  78 ++++++++++++++++++++
> >   qapi/misc.json                 |  29 ++++++++
> >   9 files changed, 374 insertions(+), 20 deletions(-)
> >   create mode 100644 ebpf/qemu-ebpf-rss-helper.c
> >
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-10  6:55   ` Yuri Benditovich
@ 2021-06-11  5:36     ` Jason Wang
  2021-06-11 16:49       ` Andrew Melnichenko
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2021-06-11  5:36 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Andrew Melnychenko, Daniel P. Berrangé,
	Michael S . Tsirkin, qemu-devel, Markus Armbruster,
	Yan Vugenfirer, Eric Blake


在 2021/6/10 下午2:55, Yuri Benditovich 写道:
> On Thu, Jun 10, 2021 at 9:41 AM Jason Wang<jasowang@redhat.com>  wrote:
>> 在 2021/6/9 下午6:04, Andrew Melnychenko 写道:
>>> Libvirt usually launches qemu with strict permissions.
>>> To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.
>> A silly question:
>>
>> Kernel had the following permission checks in bpf syscall:
>>
>>          if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
>>                   return -EPERM;
>> ...
>>
>>           err = security_bpf(cmd, &attr, size);
>>           if (err < 0)
>>                   return err;
>>
>> So if I understand the code correctly, bpf syscall can only be done if:
>>
>> 1) unprivileged_bpf is enabled or
>> 2) has the capability  and pass the LSM checks
>>
>> So I think the series is for unprivileged_bpf disabled. If I'm not
>> wrong, I guess the policy is to grant CAP_BPF but do fine grain checks
>> via LSM.
>>
>> If this is correct, need to describe it in the commit log.
>>
>>
>>> Added property "ebpf_rss_fds" for "virtio-net" that allows to
>>> initialize eBPF RSS context with passed program & maps fds.
>>>
>>> Added qemu-ebpf-rss-helper - simple helper that loads eBPF
>>> context and passes fds through unix socket.
>>> Libvirt should call the helper and pass fds to qemu through
>>> "ebpf_rss_fds" property.
>>>
>>> Added explicit target OS check for libbpf dependency in meson.
>>> eBPF RSS works only with Linux TAP, so there is no reason to
>>> build eBPF loader/helper for non-Linux.
>>>
>>> Overall, libvirt process should not be aware of the "interface"
>>> of eBPF RSS, it will not be aware of eBPF maps/program "type" and
>>> their quantity.
>> I'm not sure this is the best. We have several examples that let libvirt
>> to involve. Examples:
>>
>> 1) create TAP device (and the TUN_SETIFF)
>>
>> 2) open vhost devices
>>
>>
>>>    That's why qemu and the helper should be from
>>> the same build and be "synchronized". Technically each qemu may
>>> have its own helper. That's why "query-helper-paths" qmp command
>>> was added. Qemu should return the path to the helper that suits
>>> and libvirt should use "that" helper for "that" emulator.
>>>
>>> qmp sample:
>>> C: { "execute": "query-helper-paths" }
>>> S: { "return": [
>>>        {
>>>          "name": "qemu-ebpf-rss-helper",
>>>          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
>>>        }
>>>       ]
>>>      }
>> I think we need an example on the detail steps for how libvirt is
>> expected to use this.
> The preliminary patches for libvirt are at
> https://github.com/daynix/libvirt/tree/RSSv1


Will have a look but it would be better if the assumption of the 
management is detailed here to ease the reviewers.

Thanks


>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command.
  2021-06-09 10:04 ` [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command Andrew Melnychenko
@ 2021-06-11 14:15   ` Eric Blake
  2021-06-11 17:21     ` Daniel P. Berrangé
  2021-06-12  5:28   ` Markus Armbruster
  1 sibling, 1 reply; 31+ messages in thread
From: Eric Blake @ 2021-06-11 14:15 UTC (permalink / raw)
  To: Andrew Melnychenko
  Cc: berrange, mst, jasowang, qemu-devel, armbru, yuri.benditovich, yan

On Wed, Jun 09, 2021 at 01:04:56PM +0300, Andrew Melnychenko wrote:
> New qmp command to query ebpf helper.
> It's crucial that qemu and helper are in sync and in touch.
> Technically helper should pass eBPF fds that qemu may accept.
> And different qemu's builds may have different eBPF programs and helpers.
> Qemu returns helper that should "fit" to virtio-net.
> 
> Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> ---
>  monitor/qmp-cmds.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++
>  qapi/misc.json     | 29 +++++++++++++++++
>  2 files changed, 107 insertions(+)
> 
> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> index f7d64a6457..5dd2a58ea2 100644
> --- a/monitor/qmp-cmds.c
> +++ b/monitor/qmp-cmds.c
> @@ -351,3 +351,81 @@ void qmp_display_reload(DisplayReloadOptions *arg, Error **errp)
>          abort();
>      }
>  }
> +
> +#ifdef CONFIG_LINUX
> +
> +static const char *get_dirname(char *path)
> +{
> +    char *sep;
> +
> +    sep = strrchr(path, '/');
> +    if (sep == path) {
> +        return "/";
> +    } else if (sep) {
> +        *sep = 0;
> +        return path;
> +    }
> +    return ".";
> +}

Seems like this function is duplicating what glib should already be
able to do.

> +
> +static char *find_helper(const char *name)
> +{
> +    char qemu_exec[PATH_MAX];

Stack-allocating a PATH_MAX array for readlink() is poor practice.
Better is to use g_file_read_link().

> +    const char *qemu_dir = NULL;
> +    char *helper = NULL;
> +
> +    if (name == NULL) {
> +        return NULL;
> +    }
> +
> +    if (readlink("/proc/self/exe", qemu_exec, PATH_MAX) > 0) {
> +        qemu_dir = get_dirname(qemu_exec);
> +
> +        helper = g_strdup_printf("%s/%s", qemu_dir, name);
> +        if (access(helper, F_OK) == 0) {
> +            return helper;
> +        }
> +        g_free(helper);
> +    }
> +
> +    helper = g_strdup_printf("%s/%s", CONFIG_QEMU_HELPERDIR, name);

Could we use a compile-time determination of where we were (supposed)
to be installed, and therefore where our helper should be installed,
rather than the dynamic /proc/self/exe munging?

> +    if (access(helper, F_OK) == 0) {
> +        return helper;
> +    }
> +    g_free(helper);
> +
> +    return NULL;
> +}
> +
> +HelperPathList *qmp_query_helper_paths(Error **errp)
> +{
> +    HelperPathList *ret = NULL;
> +    const char *helpers_list[] = {
> +#ifdef CONFIG_EBPF
> +        "qemu-ebpf-rss-helper",
> +#endif

So the intent is that we can make this list larger if we write other
helper binaries.  But this code is in an overall #ifdef CONFIG_LINUX,
which means it won't work on other platforms.

> +#else
> +
> +HelperPathList *qmp_query_helper_paths(Error **errp)
> +{
> +    return NULL;
> +}
> +
> +#endif
> diff --git a/qapi/misc.json b/qapi/misc.json
> index 156f98203e..023bd2120d 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -519,3 +519,32 @@
>   'data': { '*option': 'str' },
>   'returns': ['CommandLineOptionInfo'],
>   'allow-preconfig': true }
> +
> +##
> +# @HelperPath:
> +#
> +# Name of the helper and binary location.
> +##

Missing a 'Since: 6.1' line.

> +{ 'struct': 'HelperPath',
> +  'data': {'name': 'str', 'path': 'str'} }
> +
> +##
> +# @query-helper-paths:
> +#
> +# Query specific eBPF RSS helper for current qemu binary.
> +#
> +# Returns: list of object that contains name and path for helper.
> +#
> +# Example:
> +#
> +# -> { "execute": "query-helper-paths" }
> +# <- { "return": [
> +#        {
> +#          "name": "qemu-ebpf-rss-helper",
> +#          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
> +#        }
> +#      ]
> +#    }
> +#
> +##
> +{ 'command': 'query-helper-paths', 'returns': ['HelperPath'] }

Missing an 'if':... to designate that this command is only useful on
Linux.  Or you can make it work on other platforms with my suggestions
above about better utilizing glib functionality at runtime or even
just relying on compile-time use of the configured install locations.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-11  5:36     ` Jason Wang
@ 2021-06-11 16:49       ` Andrew Melnichenko
  2021-06-11 17:24         ` Daniel P. Berrangé
  2021-06-15  9:13         ` Jason Wang
  0 siblings, 2 replies; 31+ messages in thread
From: Andrew Melnichenko @ 2021-06-11 16:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Daniel P. Berrangé,
	Michael S . Tsirkin, qemu-devel, Markus Armbruster,
	Yuri Benditovich, Yan Vugenfirer, Eric Blake

[-- Attachment #1: Type: text/plain, Size: 4686 bytes --]

Hi,

> So I think the series is for unprivileged_bpf disabled. If I'm not
> wrong, I guess the policy is to grant CAP_BPF but do fine grain checks
> via LSM.
>

The main idea is to run eBPF RSS with qemu without any permission.
Libvirt should handle everything and pass proper eBPF file descriptors.
For current eBPF RSS, CAP_SYS_ADMIN(bypass some limitations)
also required, and in the future may be other permissions.

I'm not sure this is the best. We have several examples that let libvirt
> to involve. Examples:
>
> 1) create TAP device (and the TUN_SETIFF)
>
> 2) open vhost devices
>

Technically TAP/vhost not related to a particular qemu emulator. So common
TAP creation should fit any modern qemu. eBPF fds(program and maps) should
suit the interface for current qemu, g.e. some qemu builds may have
different map
structures or their count. It's necessary that the qemu got fds prepared by
the helper
that was built with the qemu.

I think we need an example on the detail steps for how libvirt is
> expected to use this.
>

The simplified workflow looks like this:

   1. Libvirt got "emulator" from domain document.
   2. Libvirt queries for qemu capabilities.
   3. One of the capabilities is "qemu-ebpf-rss-helper" path(if present).
   4. On NIC preparation Libvirt checks for virtio-net + rss configurations.
   5. If required, the "qemu-ebpf-rss-helper" called and fds are received
   through unix fd.
   6. Those fds are for eBPF RSS, which passed to child process - qemu.
   7. Qemu launched with virtio-net-pci property "rss" and "ebpf_rss_fds".


On Fri, Jun 11, 2021 at 8:36 AM Jason Wang <jasowang@redhat.com> wrote:

>
> 在 2021/6/10 下午2:55, Yuri Benditovich 写道:
> > On Thu, Jun 10, 2021 at 9:41 AM Jason Wang<jasowang@redhat.com>  wrote:
> >> 在 2021/6/9 下午6:04, Andrew Melnychenko 写道:
> >>> Libvirt usually launches qemu with strict permissions.
> >>> To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.
> >> A silly question:
> >>
> >> Kernel had the following permission checks in bpf syscall:
> >>
> >>          if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> >>                   return -EPERM;
> >> ...
> >>
> >>           err = security_bpf(cmd, &attr, size);
> >>           if (err < 0)
> >>                   return err;
> >>
> >> So if I understand the code correctly, bpf syscall can only be done if:
> >>
> >> 1) unprivileged_bpf is enabled or
> >> 2) has the capability  and pass the LSM checks
> >>
> >> So I think the series is for unprivileged_bpf disabled. If I'm not
> >> wrong, I guess the policy is to grant CAP_BPF but do fine grain checks
> >> via LSM.
> >>
> >> If this is correct, need to describe it in the commit log.
> >>
> >>
> >>> Added property "ebpf_rss_fds" for "virtio-net" that allows to
> >>> initialize eBPF RSS context with passed program & maps fds.
> >>>
> >>> Added qemu-ebpf-rss-helper - simple helper that loads eBPF
> >>> context and passes fds through unix socket.
> >>> Libvirt should call the helper and pass fds to qemu through
> >>> "ebpf_rss_fds" property.
> >>>
> >>> Added explicit target OS check for libbpf dependency in meson.
> >>> eBPF RSS works only with Linux TAP, so there is no reason to
> >>> build eBPF loader/helper for non-Linux.
> >>>
> >>> Overall, libvirt process should not be aware of the "interface"
> >>> of eBPF RSS, it will not be aware of eBPF maps/program "type" and
> >>> their quantity.
> >> I'm not sure this is the best. We have several examples that let libvirt
> >> to involve. Examples:
> >>
> >> 1) create TAP device (and the TUN_SETIFF)
> >>
> >> 2) open vhost devices
> >>
> >>
> >>>    That's why qemu and the helper should be from
> >>> the same build and be "synchronized". Technically each qemu may
> >>> have its own helper. That's why "query-helper-paths" qmp command
> >>> was added. Qemu should return the path to the helper that suits
> >>> and libvirt should use "that" helper for "that" emulator.
> >>>
> >>> qmp sample:
> >>> C: { "execute": "query-helper-paths" }
> >>> S: { "return": [
> >>>        {
> >>>          "name": "qemu-ebpf-rss-helper",
> >>>          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
> >>>        }
> >>>       ]
> >>>      }
> >> I think we need an example on the detail steps for how libvirt is
> >> expected to use this.
> > The preliminary patches for libvirt are at
> > https://github.com/daynix/libvirt/tree/RSSv1
>
>
> Will have a look but it would be better if the assumption of the
> management is detailed here to ease the reviewers.
>
> Thanks
>
>
> >
>
>

[-- Attachment #2: Type: text/html, Size: 7089 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command.
  2021-06-11 14:15   ` Eric Blake
@ 2021-06-11 17:21     ` Daniel P. Berrangé
  0 siblings, 0 replies; 31+ messages in thread
From: Daniel P. Berrangé @ 2021-06-11 17:21 UTC (permalink / raw)
  To: Eric Blake
  Cc: Andrew Melnychenko, mst, jasowang, qemu-devel, armbru,
	yuri.benditovich, yan

On Fri, Jun 11, 2021 at 09:15:52AM -0500, Eric Blake wrote:
> On Wed, Jun 09, 2021 at 01:04:56PM +0300, Andrew Melnychenko wrote:
> > New qmp command to query ebpf helper.
> > It's crucial that qemu and helper are in sync and in touch.
> > Technically helper should pass eBPF fds that qemu may accept.
> > And different qemu's builds may have different eBPF programs and helpers.
> > Qemu returns helper that should "fit" to virtio-net.
> > 
> > Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> > ---
> >  monitor/qmp-cmds.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++
> >  qapi/misc.json     | 29 +++++++++++++++++
> >  2 files changed, 107 insertions(+)
> > 
> > diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> > index f7d64a6457..5dd2a58ea2 100644
> > --- a/monitor/qmp-cmds.c
> > +++ b/monitor/qmp-cmds.c
> > @@ -351,3 +351,81 @@ void qmp_display_reload(DisplayReloadOptions *arg, Error **errp)
> >          abort();
> >      }
> >  }
> > +
> > +#ifdef CONFIG_LINUX
> > +
> > +static const char *get_dirname(char *path)
> > +{
> > +    char *sep;
> > +
> > +    sep = strrchr(path, '/');
> > +    if (sep == path) {
> > +        return "/";
> > +    } else if (sep) {
> > +        *sep = 0;
> > +        return path;
> > +    }
> > +    return ".";
> > +}
> 
> Seems like this function is duplicating what glib should already be
> able to do.
> 
> > +
> > +static char *find_helper(const char *name)
> > +{
> > +    char qemu_exec[PATH_MAX];
> 
> Stack-allocating a PATH_MAX array for readlink() is poor practice.
> Better is to use g_file_read_link().
> 
> > +    const char *qemu_dir = NULL;
> > +    char *helper = NULL;
> > +
> > +    if (name == NULL) {
> > +        return NULL;
> > +    }
> > +
> > +    if (readlink("/proc/self/exe", qemu_exec, PATH_MAX) > 0) {
> > +        qemu_dir = get_dirname(qemu_exec);
> > +
> > +        helper = g_strdup_printf("%s/%s", qemu_dir, name);
> > +        if (access(helper, F_OK) == 0) {
> > +            return helper;
> > +        }
> > +        g_free(helper);
> > +    }
> > +
> > +    helper = g_strdup_printf("%s/%s", CONFIG_QEMU_HELPERDIR, name);
> 
> Could we use a compile-time determination of where we were (supposed)
> to be installed, and therefore where our helper should be installed,
> rather than the dynamic /proc/self/exe munging?

Yeah I think avoiding /proc/self/exe is desirable, because I can
imagine scenarios where this can lead to picking the wrong helper.
Better to always use the compile time install directory.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-11 16:49       ` Andrew Melnichenko
@ 2021-06-11 17:24         ` Daniel P. Berrangé
  2021-06-15  9:13         ` Jason Wang
  1 sibling, 0 replies; 31+ messages in thread
From: Daniel P. Berrangé @ 2021-06-11 17:24 UTC (permalink / raw)
  To: Andrew Melnichenko
  Cc: Michael S . Tsirkin, Jason Wang, qemu-devel, Markus Armbruster,
	Yuri Benditovich, Yan Vugenfirer, Eric Blake

On Fri, Jun 11, 2021 at 07:49:21PM +0300, Andrew Melnichenko wrote:
> Hi,
> 
> > So I think the series is for unprivileged_bpf disabled. If I'm not
> > wrong, I guess the policy is to grant CAP_BPF but do fine grain checks
> > via LSM.
> >
> 
> The main idea is to run eBPF RSS with qemu without any permission.
> Libvirt should handle everything and pass proper eBPF file descriptors.
> For current eBPF RSS, CAP_SYS_ADMIN(bypass some limitations)
> also required, and in the future may be other permissions.
> 
> I'm not sure this is the best. We have several examples that let libvirt
> > to involve. Examples:
> >
> > 1) create TAP device (and the TUN_SETIFF)
> >
> > 2) open vhost devices
> >
> 
> Technically TAP/vhost not related to a particular qemu emulator. So common
> TAP creation should fit any modern qemu. eBPF fds(program and maps) should
> suit the interface for current qemu, g.e. some qemu builds may have
> different map
> structures or their count. It's necessary that the qemu got fds prepared by
> the helper
> that was built with the qemu.
> 
> I think we need an example on the detail steps for how libvirt is
> > expected to use this.
> >
> 
> The simplified workflow looks like this:
> 
>    1. Libvirt got "emulator" from domain document.
>    2. Libvirt queries for qemu capabilities.
>    3. One of the capabilities is "qemu-ebpf-rss-helper" path(if present).
>    4. On NIC preparation Libvirt checks for virtio-net + rss configurations.
>    5. If required, the "qemu-ebpf-rss-helper" called and fds are received
>    through unix fd.
>    6. Those fds are for eBPF RSS, which passed to child process - qemu.
>    7. Qemu launched with virtio-net-pci property "rss" and "ebpf_rss_fds".

So this basically works in the same way as the qemu bridge
helper, with the extra advantage that we can actually query
QEMU for the right helper instead of libvirt hardcoding te
helper path.  We should make your QMP query command also
return the paths for the existing QEMU helpers (bridge helper,
and pr helper) too.

Anyway, this approach is obviously viable for libvirt, since
it matches what we already do for other features.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command.
  2021-06-09 10:04 ` [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command Andrew Melnychenko
  2021-06-11 14:15   ` Eric Blake
@ 2021-06-12  5:28   ` Markus Armbruster
  2021-06-15 23:16     ` Andrew Melnichenko
  1 sibling, 1 reply; 31+ messages in thread
From: Markus Armbruster @ 2021-06-12  5:28 UTC (permalink / raw)
  To: Andrew Melnychenko
  Cc: berrange, mst, jasowang, qemu-devel, yuri.benditovich, yan, eblake

Andrew Melnychenko <andrew@daynix.com> writes:

> New qmp command to query ebpf helper.
> It's crucial that qemu and helper are in sync and in touch.
> Technically helper should pass eBPF fds that qemu may accept.
> And different qemu's builds may have different eBPF programs and helpers.
> Qemu returns helper that should "fit" to virtio-net.
>
> Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> ---
>  monitor/qmp-cmds.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++
>  qapi/misc.json     | 29 +++++++++++++++++
>  2 files changed, 107 insertions(+)
>
> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> index f7d64a6457..5dd2a58ea2 100644
> --- a/monitor/qmp-cmds.c
> +++ b/monitor/qmp-cmds.c
> @@ -351,3 +351,81 @@ void qmp_display_reload(DisplayReloadOptions *arg, Error **errp)
>          abort();
>      }
>  }
> +
> +#ifdef CONFIG_LINUX
> +
> +static const char *get_dirname(char *path)
> +{
> +    char *sep;
> +
> +    sep = strrchr(path, '/');
> +    if (sep == path) {
> +        return "/";
> +    } else if (sep) {
> +        *sep = 0;
> +        return path;
> +    }
> +    return ".";
> +}
> +
> +static char *find_helper(const char *name)
> +{
> +    char qemu_exec[PATH_MAX];
> +    const char *qemu_dir = NULL;
> +    char *helper = NULL;
> +
> +    if (name == NULL) {
> +        return NULL;
> +    }
> +
> +    if (readlink("/proc/self/exe", qemu_exec, PATH_MAX) > 0) {
> +        qemu_dir = get_dirname(qemu_exec);
> +
> +        helper = g_strdup_printf("%s/%s", qemu_dir, name);
> +        if (access(helper, F_OK) == 0) {
> +            return helper;
> +        }
> +        g_free(helper);
> +    }
> +
> +    helper = g_strdup_printf("%s/%s", CONFIG_QEMU_HELPERDIR, name);
> +    if (access(helper, F_OK) == 0) {
> +        return helper;
> +    }
> +    g_free(helper);
> +
> +    return NULL;
> +}

This returns the helper in the same directory as the running executable,
or as a fallback the helper in CONFIG_QEMU_HELPERDIR.

Checking F_OK (existence) instea of X_OK is odd.

It uses /proc/self/exe to find the running executable's directory.  This
is specific to Linux[*].  You get different behavior on Linux vs. other
systems.

CONFIG_QEMU_HELPERDIR is $prefix/libexec/.

If $prefix is /usr, then qemu-system-FOO is normally installed in
/usr/bin/, and the helper in /usr/libexec/.  We look for the helper in
the wrong place first, and the right one only when it isn't in the wrong
place.  Feels overcomplicated and fragile.

Consider the following scenario:

* The system has a binary package's /usr/bin/qemu-system-x86_64 and
  /usr/libexec/qemu-ebpf-rss-helper installed

* Alice builds her own QEMU with prefix /usr (and no intention to
  install), resulting in bld/qemu-system-x86_64, bld/qemu-ebpf-rss-path,
  and a symlink bld/x86_64-softmmu/qemu-system-x86_64.

Now:

* If Alice runs bld/qemu-system-x86_64, and the host is Linux,
  find_helper() returns bld/qemu-ebpf-rss-path.  Good.

* If the host isn't Linux, it returns /usr/libexec/qemu-ebpf-rss-helper.
  Not good.

* If Alice runs bld/x86_64-softmmu/qemu-system-x86_64, it also returns
  /usr/libexec/qemu-ebpf-rss-helper.  Not good.

> +
> +HelperPathList *qmp_query_helper_paths(Error **errp)
> +{
> +    HelperPathList *ret = NULL;
> +    const char *helpers_list[] = {
> +#ifdef CONFIG_EBPF
> +        "qemu-ebpf-rss-helper",
> +#endif
> +        NULL
> +    };
> +    const char **helper_iter = helpers_list;
> +
> +    for (; *helper_iter != NULL; ++helper_iter) {
> +        char *path = find_helper(*helper_iter);
> +        if (path) {
> +            HelperPath *helper = g_new0(HelperPath, 1);
> +            helper->name = g_strdup(*helper_iter);
> +            helper->path = path;
> +
> +            QAPI_LIST_PREPEND(ret, helper);
> +        }
> +    }
> +
> +    return ret;
> +}
> +#else
> +
> +HelperPathList *qmp_query_helper_paths(Error **errp)
> +{
> +    return NULL;
> +}
> +
> +#endif
> diff --git a/qapi/misc.json b/qapi/misc.json
> index 156f98203e..023bd2120d 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -519,3 +519,32 @@
>   'data': { '*option': 'str' },
>   'returns': ['CommandLineOptionInfo'],
>   'allow-preconfig': true }
> +
> +##
> +# @HelperPath:
> +#
> +# Name of the helper and binary location.
> +##
> +{ 'struct': 'HelperPath',
> +  'data': {'name': 'str', 'path': 'str'} }
> +
> +##
> +# @query-helper-paths:
> +#
> +# Query specific eBPF RSS helper for current qemu binary.
> +#
> +# Returns: list of object that contains name and path for helper.
> +#
> +# Example:
> +#
> +# -> { "execute": "query-helper-paths" }
> +# <- { "return": [
> +#        {
> +#          "name": "qemu-ebpf-rss-helper",
> +#          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
> +#        }
> +#      ]
> +#    }
> +#
> +##
> +{ 'command': 'query-helper-paths', 'returns': ['HelperPath'] }

The name query-helper-paths is generic, the documented purpose "Query
specific eBPF RSS helper" is specific.

qemu-ebpf-rss-helper isn't necessarily the only helper that needs to be
in sync with QEMU.

I doubt a query command is a good way to help with using the right one.
qemu-system-FOO doesn't really know where the right one is.  Only the
person or program that put them where they are does.

If we want to ensure the right helper runs, we should use a build
identifier compiled into the programs, like we do for modules.

For modules, the program loading a module checks the module's build
identifier matches its own.

For programs talking to each other, the peers together check their build
identifiers match.

For programs where that isn't practical, the management application can
check.

This should be a lot more reliable.

Helpers QEMU code runs itself should be run as
CONFIG_QEMU_HELPERDIR/HELPER, with a suitable user override.  This is
how qemu-bridge-helper works.

Helpers some other program runs are that other program's problem.
They'll probably work the same: built-in default that can be overridden
with configuration.


[*] For detailed advice, see
https://stackoverflow.com/questions/1023306/finding-current-executables-path-without-proc-self-exe



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-11 16:49       ` Andrew Melnichenko
  2021-06-11 17:24         ` Daniel P. Berrangé
@ 2021-06-15  9:13         ` Jason Wang
  2021-06-15 22:18           ` Andrew Melnichenko
  1 sibling, 1 reply; 31+ messages in thread
From: Jason Wang @ 2021-06-15  9:13 UTC (permalink / raw)
  To: Andrew Melnichenko
  Cc: Daniel P. Berrangé,
	Michael S . Tsirkin, qemu-devel, Markus Armbruster,
	Yuri Benditovich, Yan Vugenfirer, Eric Blake


在 2021/6/12 上午12:49, Andrew Melnichenko 写道:
> Hi,
>
>     So I think the series is for unprivileged_bpf disabled. If I'm not
>     wrong, I guess the policy is to grant CAP_BPF but do fine grain
>     checks
>     via LSM.
>
>
> The main idea is to run eBPF RSS with qemu without any permission.
> Libvirt should handle everything and pass proper eBPF file descriptors.
> For current eBPF RSS, CAP_SYS_ADMIN(bypass some limitations)
> also required, and in the future may be other permissions.


I may miss something.

But RSS requires to update the map. This won't work if you don't grant 
any permission to qemu.

Thanks


>
>     I'm not sure this is the best. We have several examples that let
>     libvirt
>     to involve. Examples:
>
>     1) create TAP device (and the TUN_SETIFF)
>
>     2) open vhost devices
>
>
> Technically TAP/vhost not related to a particular qemu emulator. So common
> TAP creation should fit any modern qemu. eBPF fds(program and maps) should
> suit the interface for current qemu, g.e. some qemu builds may have 
> different map
> structures or their count. It's necessary that the qemu got fds 
> prepared by the helper
> that was built with the qemu.
>
>     I think we need an example on the detail steps for how libvirt is
>     expected to use this.
>
>
> The simplified workflow looks like this:
>
>  1. Libvirt got "emulator" from domain document.
>  2. Libvirt queries for qemu capabilities.
>  3. One of the capabilities is "qemu-ebpf-rss-helper" path(if present).
>  4. On NIC preparation Libvirt checks for virtio-net + rss configurations.
>  5. If required, the "qemu-ebpf-rss-helper" called and fds are
>     received through unix fd.
>  6. Those fds are for eBPF RSS, which passed to child process - qemu.
>  7. Qemu launched with virtio-net-pci property "rss" and "ebpf_rss_fds".
>
>
> On Fri, Jun 11, 2021 at 8:36 AM Jason Wang <jasowang@redhat.com 
> <mailto:jasowang@redhat.com>> wrote:
>
>
>     在 2021/6/10 下午2:55, Yuri Benditovich 写道:
>     > On Thu, Jun 10, 2021 at 9:41 AM Jason Wang<jasowang@redhat.com
>     <mailto:jasowang@redhat.com>>  wrote:
>     >> 在 2021/6/9 下午6:04, Andrew Melnychenko 写道:
>     >>> Libvirt usually launches qemu with strict permissions.
>     >>> To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.
>     >> A silly question:
>     >>
>     >> Kernel had the following permission checks in bpf syscall:
>     >>
>     >>          if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
>     >>                   return -EPERM;
>     >> ...
>     >>
>     >>           err = security_bpf(cmd, &attr, size);
>     >>           if (err < 0)
>     >>                   return err;
>     >>
>     >> So if I understand the code correctly, bpf syscall can only be
>     done if:
>     >>
>     >> 1) unprivileged_bpf is enabled or
>     >> 2) has the capability  and pass the LSM checks
>     >>
>     >> So I think the series is for unprivileged_bpf disabled. If I'm not
>     >> wrong, I guess the policy is to grant CAP_BPF but do fine grain
>     checks
>     >> via LSM.
>     >>
>     >> If this is correct, need to describe it in the commit log.
>     >>
>     >>
>     >>> Added property "ebpf_rss_fds" for "virtio-net" that allows to
>     >>> initialize eBPF RSS context with passed program & maps fds.
>     >>>
>     >>> Added qemu-ebpf-rss-helper - simple helper that loads eBPF
>     >>> context and passes fds through unix socket.
>     >>> Libvirt should call the helper and pass fds to qemu through
>     >>> "ebpf_rss_fds" property.
>     >>>
>     >>> Added explicit target OS check for libbpf dependency in meson.
>     >>> eBPF RSS works only with Linux TAP, so there is no reason to
>     >>> build eBPF loader/helper for non-Linux.
>     >>>
>     >>> Overall, libvirt process should not be aware of the "interface"
>     >>> of eBPF RSS, it will not be aware of eBPF maps/program "type" and
>     >>> their quantity.
>     >> I'm not sure this is the best. We have several examples that
>     let libvirt
>     >> to involve. Examples:
>     >>
>     >> 1) create TAP device (and the TUN_SETIFF)
>     >>
>     >> 2) open vhost devices
>     >>
>     >>
>     >>>    That's why qemu and the helper should be from
>     >>> the same build and be "synchronized". Technically each qemu may
>     >>> have its own helper. That's why "query-helper-paths" qmp command
>     >>> was added. Qemu should return the path to the helper that suits
>     >>> and libvirt should use "that" helper for "that" emulator.
>     >>>
>     >>> qmp sample:
>     >>> C: { "execute": "query-helper-paths" }
>     >>> S: { "return": [
>     >>>        {
>     >>>          "name": "qemu-ebpf-rss-helper",
>     >>>          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
>     >>>        }
>     >>>       ]
>     >>>      }
>     >> I think we need an example on the detail steps for how libvirt is
>     >> expected to use this.
>     > The preliminary patches for libvirt are at
>     > https://github.com/daynix/libvirt/tree/RSSv1
>     <https://github.com/daynix/libvirt/tree/RSSv1>
>
>
>     Will have a look but it would be better if the assumption of the
>     management is detailed here to ease the reviewers.
>
>     Thanks
>
>
>     >
>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-15  9:13         ` Jason Wang
@ 2021-06-15 22:18           ` Andrew Melnichenko
  2021-06-18 20:03             ` Andrew Melnichenko
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Melnichenko @ 2021-06-15 22:18 UTC (permalink / raw)
  To: Jason Wang
  Cc: Daniel P. Berrangé,
	Michael S . Tsirkin, qemu-devel, Markus Armbruster,
	Yuri Benditovich, Yan Vugenfirer, Eric Blake

[-- Attachment #1: Type: text/plain, Size: 6170 bytes --]

Hi,

> I may miss something.
>
> But RSS requires to update the map. This won't work if you don't grant
> any permission to qemu.
>
> Thanks
>

Partly - with "kernel.unprivileged_bpf_disabled=0" capabilities is not
required to update maps.
With "kernel.unprivileged_bpf_disabled=1" - setting maps will fail(without
CAP_BPF) and "in-qemu" RSS will be used.

On Tue, Jun 15, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com> wrote:

>
> 在 2021/6/12 上午12:49, Andrew Melnichenko 写道:
> > Hi,
> >
> >     So I think the series is for unprivileged_bpf disabled. If I'm not
> >     wrong, I guess the policy is to grant CAP_BPF but do fine grain
> >     checks
> >     via LSM.
> >
> >
> > The main idea is to run eBPF RSS with qemu without any permission.
> > Libvirt should handle everything and pass proper eBPF file descriptors.
> > For current eBPF RSS, CAP_SYS_ADMIN(bypass some limitations)
> > also required, and in the future may be other permissions.
>
>
> I may miss something.
>
> But RSS requires to update the map. This won't work if you don't grant
> any permission to qemu.
>
> Thanks
>
>
> >
> >     I'm not sure this is the best. We have several examples that let
> >     libvirt
> >     to involve. Examples:
> >
> >     1) create TAP device (and the TUN_SETIFF)
> >
> >     2) open vhost devices
> >
> >
> > Technically TAP/vhost not related to a particular qemu emulator. So
> common
> > TAP creation should fit any modern qemu. eBPF fds(program and maps)
> should
> > suit the interface for current qemu, g.e. some qemu builds may have
> > different map
> > structures or their count. It's necessary that the qemu got fds
> > prepared by the helper
> > that was built with the qemu.
> >
> >     I think we need an example on the detail steps for how libvirt is
> >     expected to use this.
> >
> >
> > The simplified workflow looks like this:
> >
> >  1. Libvirt got "emulator" from domain document.
> >  2. Libvirt queries for qemu capabilities.
> >  3. One of the capabilities is "qemu-ebpf-rss-helper" path(if present).
> >  4. On NIC preparation Libvirt checks for virtio-net + rss
> configurations.
> >  5. If required, the "qemu-ebpf-rss-helper" called and fds are
> >     received through unix fd.
> >  6. Those fds are for eBPF RSS, which passed to child process - qemu.
> >  7. Qemu launched with virtio-net-pci property "rss" and "ebpf_rss_fds".
> >
> >
> > On Fri, Jun 11, 2021 at 8:36 AM Jason Wang <jasowang@redhat.com
> > <mailto:jasowang@redhat.com>> wrote:
> >
> >
> >     在 2021/6/10 下午2:55, Yuri Benditovich 写道:
> >     > On Thu, Jun 10, 2021 at 9:41 AM Jason Wang<jasowang@redhat.com
> >     <mailto:jasowang@redhat.com>>  wrote:
> >     >> 在 2021/6/9 下午6:04, Andrew Melnychenko 写道:
> >     >>> Libvirt usually launches qemu with strict permissions.
> >     >>> To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.
> >     >> A silly question:
> >     >>
> >     >> Kernel had the following permission checks in bpf syscall:
> >     >>
> >     >>          if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> >     >>                   return -EPERM;
> >     >> ...
> >     >>
> >     >>           err = security_bpf(cmd, &attr, size);
> >     >>           if (err < 0)
> >     >>                   return err;
> >     >>
> >     >> So if I understand the code correctly, bpf syscall can only be
> >     done if:
> >     >>
> >     >> 1) unprivileged_bpf is enabled or
> >     >> 2) has the capability  and pass the LSM checks
> >     >>
> >     >> So I think the series is for unprivileged_bpf disabled. If I'm not
> >     >> wrong, I guess the policy is to grant CAP_BPF but do fine grain
> >     checks
> >     >> via LSM.
> >     >>
> >     >> If this is correct, need to describe it in the commit log.
> >     >>
> >     >>
> >     >>> Added property "ebpf_rss_fds" for "virtio-net" that allows to
> >     >>> initialize eBPF RSS context with passed program & maps fds.
> >     >>>
> >     >>> Added qemu-ebpf-rss-helper - simple helper that loads eBPF
> >     >>> context and passes fds through unix socket.
> >     >>> Libvirt should call the helper and pass fds to qemu through
> >     >>> "ebpf_rss_fds" property.
> >     >>>
> >     >>> Added explicit target OS check for libbpf dependency in meson.
> >     >>> eBPF RSS works only with Linux TAP, so there is no reason to
> >     >>> build eBPF loader/helper for non-Linux.
> >     >>>
> >     >>> Overall, libvirt process should not be aware of the "interface"
> >     >>> of eBPF RSS, it will not be aware of eBPF maps/program "type" and
> >     >>> their quantity.
> >     >> I'm not sure this is the best. We have several examples that
> >     let libvirt
> >     >> to involve. Examples:
> >     >>
> >     >> 1) create TAP device (and the TUN_SETIFF)
> >     >>
> >     >> 2) open vhost devices
> >     >>
> >     >>
> >     >>>    That's why qemu and the helper should be from
> >     >>> the same build and be "synchronized". Technically each qemu may
> >     >>> have its own helper. That's why "query-helper-paths" qmp command
> >     >>> was added. Qemu should return the path to the helper that suits
> >     >>> and libvirt should use "that" helper for "that" emulator.
> >     >>>
> >     >>> qmp sample:
> >     >>> C: { "execute": "query-helper-paths" }
> >     >>> S: { "return": [
> >     >>>        {
> >     >>>          "name": "qemu-ebpf-rss-helper",
> >     >>>          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
> >     >>>        }
> >     >>>       ]
> >     >>>      }
> >     >> I think we need an example on the detail steps for how libvirt is
> >     >> expected to use this.
> >     > The preliminary patches for libvirt are at
> >     > https://github.com/daynix/libvirt/tree/RSSv1
> >     <https://github.com/daynix/libvirt/tree/RSSv1>
> >
> >
> >     Will have a look but it would be better if the assumption of the
> >     management is detailed here to ease the reviewers.
> >
> >     Thanks
> >
> >
> >     >
> >
>
>

[-- Attachment #2: Type: text/html, Size: 8976 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command.
  2021-06-12  5:28   ` Markus Armbruster
@ 2021-06-15 23:16     ` Andrew Melnichenko
  2021-07-05 13:50       ` Andrew Melnichenko
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Melnichenko @ 2021-06-15 23:16 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Daniel P. Berrangé,
	Michael S. Tsirkin, Jason Wang, qemu-devel, Yuri Benditovich,
	Yan Vugenfirer, Eric Blake

[-- Attachment #1: Type: text/plain, Size: 9810 bytes --]

Hi all,

> Seems like this function is duplicating what glib should already be
> able to do.
>
Yea, but it's required a Linux specific header - without it, qemu builds
but crashes.

Could we use a compile-time determination of where we were (supposed)
> to be installed, and therefore where our helper should be installed,
> rather than the dynamic /proc/self/exe munging?
>
Yes, we can define something like CONFIG_QEMU_HELPERDIR ##
"qemu-ebpf-rss-helper", for RSS helper.
But I've tried to implement generic function for possible other helpers.

Yeah I think avoiding /proc/self/exe is desirable, because I can
> imagine scenarios where this can lead to picking the wrong helper.
> Better to always use the compile time install directory.
>
The main scenario that find_helper() should solve - non installed qemu use
helper from own build.
That's why reading /proc/self/exe is implemented.

So the intent is that we can make this list larger if we write other
> helper binaries.  But this code is in an overall #ifdef CONFIG_LINUX,
> which means it won't work on other platforms.
>
Yes, for now, eBPF RSS is only for virtio-net + Linux TAP.

Checking F_OK (existence) instea of X_OK is odd.
>
Libvirt launches qemu to get different properties. That qemu may not have
permission to launch the helper.

It uses /proc/self/exe to find the running executable's directory.  This
> is specific to Linux[*].  You get different behavior on Linux vs. other
> systems.
>
The code guarded by CONFIG_LINUX.

* If the host isn't Linux, it returns /usr/libexec/qemu-ebpf-rss-helper.
>   Not good.
>
No,  "query-helper-paths" will return an empty list.

* If Alice runs bld/x86_64-softmmu/qemu-system-x86_64, it also returns
>   /usr/libexec/qemu-ebpf-rss-helper.  Not good.
>
No, /proc/self/exe dereferences "bld/x86_64-softmmu/qemu-system-x86_64" to
"bld/qemu-system-x86_64"
and we will get bld/qemu-ebpf-rss-helper.

 The name query-helper-paths is generic, the documented purpose "Query
> specific eBPF RSS helper" is specific.
>
> qemu-ebpf-rss-helper isn't necessarily the only helper that needs to be
> in sync with QEMU.
>
Yea, I'll update the document.

If we want to ensure the right helper runs, we should use a build
> identifier compiled into the programs, like we do for modules.
>
Thanks, I'll check. Overall, current idea was to avoid the use of the helper
from CONFIG_QEMU_HELPERDIR if qemu is not installed(like in your examples).

Helpers QEMU code runs itself should be run as
> CONFIG_QEMU_HELPERDIR/HELPER, with a suitable user override.  This is
> how qemu-bridge-helper works.
>
> Helpers some other program runs are that other program's problem.
> They'll probably work the same: built-in default that can be overridden
> with configuration.
>
Well, for qemu it does not really matter how TAP fd was created. It can be
the helper, Libvirt itself, or a script.
In the end, "netdev" gets its fds and for qemu there is no difference. TAP
fd is TAP fd.
And Libvirt would use the same qemu-bridge-helper(from libvirt/qemu.conf)
for every qemu "emulator".
For eBPF we need to create specific maps(and/or thair quantities) that
contain specific structures and for different
qemu it may be different.



On Sat, Jun 12, 2021 at 8:28 AM Markus Armbruster <armbru@redhat.com> wrote:

> Andrew Melnychenko <andrew@daynix.com> writes:
>
> > New qmp command to query ebpf helper.
> > It's crucial that qemu and helper are in sync and in touch.
> > Technically helper should pass eBPF fds that qemu may accept.
> > And different qemu's builds may have different eBPF programs and helpers.
> > Qemu returns helper that should "fit" to virtio-net.
> >
> > Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> > ---
> >  monitor/qmp-cmds.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++
> >  qapi/misc.json     | 29 +++++++++++++++++
> >  2 files changed, 107 insertions(+)
> >
> > diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> > index f7d64a6457..5dd2a58ea2 100644
> > --- a/monitor/qmp-cmds.c
> > +++ b/monitor/qmp-cmds.c
> > @@ -351,3 +351,81 @@ void qmp_display_reload(DisplayReloadOptions *arg,
> Error **errp)
> >          abort();
> >      }
> >  }
> > +
> > +#ifdef CONFIG_LINUX
> > +
> > +static const char *get_dirname(char *path)
> > +{
> > +    char *sep;
> > +
> > +    sep = strrchr(path, '/');
> > +    if (sep == path) {
> > +        return "/";
> > +    } else if (sep) {
> > +        *sep = 0;
> > +        return path;
> > +    }
> > +    return ".";
> > +}
> > +
> > +static char *find_helper(const char *name)
> > +{
> > +    char qemu_exec[PATH_MAX];
> > +    const char *qemu_dir = NULL;
> > +    char *helper = NULL;
> > +
> > +    if (name == NULL) {
> > +        return NULL;
> > +    }
> > +
> > +    if (readlink("/proc/self/exe", qemu_exec, PATH_MAX) > 0) {
> > +        qemu_dir = get_dirname(qemu_exec);
> > +
> > +        helper = g_strdup_printf("%s/%s", qemu_dir, name);
> > +        if (access(helper, F_OK) == 0) {
> > +            return helper;
> > +        }
> > +        g_free(helper);
> > +    }
> > +
> > +    helper = g_strdup_printf("%s/%s", CONFIG_QEMU_HELPERDIR, name);
> > +    if (access(helper, F_OK) == 0) {
> > +        return helper;
> > +    }
> > +    g_free(helper);
> > +
> > +    return NULL;
> > +}
>
> This returns the helper in the same directory as the running executable,
> or as a fallback the helper in CONFIG_QEMU_HELPERDIR.
>
> Checking F_OK (existence) instea of X_OK is odd.
>
> It uses /proc/self/exe to find the running executable's directory.  This
> is specific to Linux[*].  You get different behavior on Linux vs. other
> systems.
>
> CONFIG_QEMU_HELPERDIR is $prefix/libexec/.
>
> If $prefix is /usr, then qemu-system-FOO is normally installed in
> /usr/bin/, and the helper in /usr/libexec/.  We look for the helper in
> the wrong place first, and the right one only when it isn't in the wrong
> place.  Feels overcomplicated and fragile.
>
> Consider the following scenario:
>
> * The system has a binary package's /usr/bin/qemu-system-x86_64 and
>   /usr/libexec/qemu-ebpf-rss-helper installed
>
> * Alice builds her own QEMU with prefix /usr (and no intention to
>   install), resulting in bld/qemu-system-x86_64, bld/qemu-ebpf-rss-path,
>   and a symlink bld/x86_64-softmmu/qemu-system-x86_64.
>
> Now:
>
> * If Alice runs bld/qemu-system-x86_64, and the host is Linux,
>   find_helper() returns bld/qemu-ebpf-rss-path.  Good.
>
> * If the host isn't Linux, it returns /usr/libexec/qemu-ebpf-rss-helper.
>   Not good.
>
> * If Alice runs bld/x86_64-softmmu/qemu-system-x86_64, it also returns
>   /usr/libexec/qemu-ebpf-rss-helper.  Not good.
>
> > +
> > +HelperPathList *qmp_query_helper_paths(Error **errp)
> > +{
> > +    HelperPathList *ret = NULL;
> > +    const char *helpers_list[] = {
> > +#ifdef CONFIG_EBPF
> > +        "qemu-ebpf-rss-helper",
> > +#endif
> > +        NULL
> > +    };
> > +    const char **helper_iter = helpers_list;
> > +
> > +    for (; *helper_iter != NULL; ++helper_iter) {
> > +        char *path = find_helper(*helper_iter);
> > +        if (path) {
> > +            HelperPath *helper = g_new0(HelperPath, 1);
> > +            helper->name = g_strdup(*helper_iter);
> > +            helper->path = path;
> > +
> > +            QAPI_LIST_PREPEND(ret, helper);
> > +        }
> > +    }
> > +
> > +    return ret;
> > +}
> > +#else
> > +
> > +HelperPathList *qmp_query_helper_paths(Error **errp)
> > +{
> > +    return NULL;
> > +}
> > +
> > +#endif
> > diff --git a/qapi/misc.json b/qapi/misc.json
> > index 156f98203e..023bd2120d 100644
> > --- a/qapi/misc.json
> > +++ b/qapi/misc.json
> > @@ -519,3 +519,32 @@
> >   'data': { '*option': 'str' },
> >   'returns': ['CommandLineOptionInfo'],
> >   'allow-preconfig': true }
> > +
> > +##
> > +# @HelperPath:
> > +#
> > +# Name of the helper and binary location.
> > +##
> > +{ 'struct': 'HelperPath',
> > +  'data': {'name': 'str', 'path': 'str'} }
> > +
> > +##
> > +# @query-helper-paths:
> > +#
> > +# Query specific eBPF RSS helper for current qemu binary.
> > +#
> > +# Returns: list of object that contains name and path for helper.
> > +#
> > +# Example:
> > +#
> > +# -> { "execute": "query-helper-paths" }
> > +# <- { "return": [
> > +#        {
> > +#          "name": "qemu-ebpf-rss-helper",
> > +#          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
> > +#        }
> > +#      ]
> > +#    }
> > +#
> > +##
> > +{ 'command': 'query-helper-paths', 'returns': ['HelperPath'] }
>
> The name query-helper-paths is generic, the documented purpose "Query
> specific eBPF RSS helper" is specific.
>
> qemu-ebpf-rss-helper isn't necessarily the only helper that needs to be
> in sync with QEMU.
>
> I doubt a query command is a good way to help with using the right one.
> qemu-system-FOO doesn't really know where the right one is.  Only the
> person or program that put them where they are does.
>
> If we want to ensure the right helper runs, we should use a build
> identifier compiled into the programs, like we do for modules.
>
> For modules, the program loading a module checks the module's build
> identifier matches its own.
>
> For programs talking to each other, the peers together check their build
> identifiers match.
>
> For programs where that isn't practical, the management application can
> check.
>
> This should be a lot more reliable.
>
> Helpers QEMU code runs itself should be run as
> CONFIG_QEMU_HELPERDIR/HELPER, with a suitable user override.  This is
> how qemu-bridge-helper works.
>
> Helpers some other program runs are that other program's problem.
> They'll probably work the same: built-in default that can be overridden
> with configuration.
>
>
> [*] For detailed advice, see
>
> https://stackoverflow.com/questions/1023306/finding-current-executables-path-without-proc-self-exe
>
>

[-- Attachment #2: Type: text/html, Size: 14910 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-15 22:18           ` Andrew Melnichenko
@ 2021-06-18 20:03             ` Andrew Melnichenko
  2021-06-21  9:20               ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Melnichenko @ 2021-06-18 20:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: Daniel P. Berrangé,
	Michael S . Tsirkin, qemu-devel, Markus Armbruster,
	Yuri Benditovich, Yan Vugenfirer, Eric Blake

[-- Attachment #1: Type: text/plain, Size: 6574 bytes --]

Hi Jason,
I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora,  Ubuntu, and
Debian - no need permissions to update BPF maps.

On Wed, Jun 16, 2021 at 1:18 AM Andrew Melnichenko <andrew@daynix.com>
wrote:

> Hi,
>
>> I may miss something.
>>
>> But RSS requires to update the map. This won't work if you don't grant
>> any permission to qemu.
>>
>> Thanks
>>
>
> Partly - with "kernel.unprivileged_bpf_disabled=0" capabilities is not
> required to update maps.
> With "kernel.unprivileged_bpf_disabled=1" - setting maps will fail(without
> CAP_BPF) and "in-qemu" RSS will be used.
>
> On Tue, Jun 15, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com> wrote:
>
>>
>> 在 2021/6/12 上午12:49, Andrew Melnichenko 写道:
>> > Hi,
>> >
>> >     So I think the series is for unprivileged_bpf disabled. If I'm not
>> >     wrong, I guess the policy is to grant CAP_BPF but do fine grain
>> >     checks
>> >     via LSM.
>> >
>> >
>> > The main idea is to run eBPF RSS with qemu without any permission.
>> > Libvirt should handle everything and pass proper eBPF file descriptors.
>> > For current eBPF RSS, CAP_SYS_ADMIN(bypass some limitations)
>> > also required, and in the future may be other permissions.
>>
>>
>> I may miss something.
>>
>> But RSS requires to update the map. This won't work if you don't grant
>> any permission to qemu.
>>
>> Thanks
>>
>>
>> >
>> >     I'm not sure this is the best. We have several examples that let
>> >     libvirt
>> >     to involve. Examples:
>> >
>> >     1) create TAP device (and the TUN_SETIFF)
>> >
>> >     2) open vhost devices
>> >
>> >
>> > Technically TAP/vhost not related to a particular qemu emulator. So
>> common
>> > TAP creation should fit any modern qemu. eBPF fds(program and maps)
>> should
>> > suit the interface for current qemu, g.e. some qemu builds may have
>> > different map
>> > structures or their count. It's necessary that the qemu got fds
>> > prepared by the helper
>> > that was built with the qemu.
>> >
>> >     I think we need an example on the detail steps for how libvirt is
>> >     expected to use this.
>> >
>> >
>> > The simplified workflow looks like this:
>> >
>> >  1. Libvirt got "emulator" from domain document.
>> >  2. Libvirt queries for qemu capabilities.
>> >  3. One of the capabilities is "qemu-ebpf-rss-helper" path(if present).
>> >  4. On NIC preparation Libvirt checks for virtio-net + rss
>> configurations.
>> >  5. If required, the "qemu-ebpf-rss-helper" called and fds are
>> >     received through unix fd.
>> >  6. Those fds are for eBPF RSS, which passed to child process - qemu.
>> >  7. Qemu launched with virtio-net-pci property "rss" and "ebpf_rss_fds".
>> >
>> >
>> > On Fri, Jun 11, 2021 at 8:36 AM Jason Wang <jasowang@redhat.com
>> > <mailto:jasowang@redhat.com>> wrote:
>> >
>> >
>> >     在 2021/6/10 下午2:55, Yuri Benditovich 写道:
>> >     > On Thu, Jun 10, 2021 at 9:41 AM Jason Wang<jasowang@redhat.com
>> >     <mailto:jasowang@redhat.com>>  wrote:
>> >     >> 在 2021/6/9 下午6:04, Andrew Melnychenko 写道:
>> >     >>> Libvirt usually launches qemu with strict permissions.
>> >     >>> To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.
>> >     >> A silly question:
>> >     >>
>> >     >> Kernel had the following permission checks in bpf syscall:
>> >     >>
>> >     >>          if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
>> >     >>                   return -EPERM;
>> >     >> ...
>> >     >>
>> >     >>           err = security_bpf(cmd, &attr, size);
>> >     >>           if (err < 0)
>> >     >>                   return err;
>> >     >>
>> >     >> So if I understand the code correctly, bpf syscall can only be
>> >     done if:
>> >     >>
>> >     >> 1) unprivileged_bpf is enabled or
>> >     >> 2) has the capability  and pass the LSM checks
>> >     >>
>> >     >> So I think the series is for unprivileged_bpf disabled. If I'm
>> not
>> >     >> wrong, I guess the policy is to grant CAP_BPF but do fine grain
>> >     checks
>> >     >> via LSM.
>> >     >>
>> >     >> If this is correct, need to describe it in the commit log.
>> >     >>
>> >     >>
>> >     >>> Added property "ebpf_rss_fds" for "virtio-net" that allows to
>> >     >>> initialize eBPF RSS context with passed program & maps fds.
>> >     >>>
>> >     >>> Added qemu-ebpf-rss-helper - simple helper that loads eBPF
>> >     >>> context and passes fds through unix socket.
>> >     >>> Libvirt should call the helper and pass fds to qemu through
>> >     >>> "ebpf_rss_fds" property.
>> >     >>>
>> >     >>> Added explicit target OS check for libbpf dependency in meson.
>> >     >>> eBPF RSS works only with Linux TAP, so there is no reason to
>> >     >>> build eBPF loader/helper for non-Linux.
>> >     >>>
>> >     >>> Overall, libvirt process should not be aware of the "interface"
>> >     >>> of eBPF RSS, it will not be aware of eBPF maps/program "type"
>> and
>> >     >>> their quantity.
>> >     >> I'm not sure this is the best. We have several examples that
>> >     let libvirt
>> >     >> to involve. Examples:
>> >     >>
>> >     >> 1) create TAP device (and the TUN_SETIFF)
>> >     >>
>> >     >> 2) open vhost devices
>> >     >>
>> >     >>
>> >     >>>    That's why qemu and the helper should be from
>> >     >>> the same build and be "synchronized". Technically each qemu may
>> >     >>> have its own helper. That's why "query-helper-paths" qmp command
>> >     >>> was added. Qemu should return the path to the helper that suits
>> >     >>> and libvirt should use "that" helper for "that" emulator.
>> >     >>>
>> >     >>> qmp sample:
>> >     >>> C: { "execute": "query-helper-paths" }
>> >     >>> S: { "return": [
>> >     >>>        {
>> >     >>>          "name": "qemu-ebpf-rss-helper",
>> >     >>>          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
>> >     >>>        }
>> >     >>>       ]
>> >     >>>      }
>> >     >> I think we need an example on the detail steps for how libvirt is
>> >     >> expected to use this.
>> >     > The preliminary patches for libvirt are at
>> >     > https://github.com/daynix/libvirt/tree/RSSv1
>> >     <https://github.com/daynix/libvirt/tree/RSSv1>
>> >
>> >
>> >     Will have a look but it would be better if the assumption of the
>> >     management is detailed here to ease the reviewers.
>> >
>> >     Thanks
>> >
>> >
>> >     >
>> >
>>
>>

[-- Attachment #2: Type: text/html, Size: 9518 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-18 20:03             ` Andrew Melnichenko
@ 2021-06-21  9:20               ` Jason Wang
  2021-06-22  3:29                 ` Yuri Benditovich
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2021-06-21  9:20 UTC (permalink / raw)
  To: Andrew Melnichenko
  Cc: Daniel P. Berrangé,
	Michael S . Tsirkin, qemu-devel, Markus Armbruster,
	Yuri Benditovich, Yan Vugenfirer, Eric Blake


在 2021/6/19 上午4:03, Andrew Melnichenko 写道:
> Hi Jason,
> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora,  Ubuntu, 
> and Debian - no need permissions to update BPF maps.


How about RHEL :) ?

Thanks


>
> On Wed, Jun 16, 2021 at 1:18 AM Andrew Melnichenko <andrew@daynix.com 
> <mailto:andrew@daynix.com>> wrote:
>
>     Hi,
>
>         I may miss something.
>
>         But RSS requires to update the map. This won't work if you
>         don't grant
>         any permission to qemu.
>
>         Thanks
>
>
>     Partly - with "kernel.unprivileged_bpf_disabled=0" capabilities is
>     not required to update maps.
>     With "kernel.unprivileged_bpf_disabled=1" - setting maps will
>     fail(without CAP_BPF) and "in-qemu" RSS will be used.
>
>     On Tue, Jun 15, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com
>     <mailto:jasowang@redhat.com>> wrote:
>
>
>         在 2021/6/12 上午12:49, Andrew Melnichenko 写道:
>         > Hi,
>         >
>         >     So I think the series is for unprivileged_bpf disabled.
>         If I'm not
>         >     wrong, I guess the policy is to grant CAP_BPF but do
>         fine grain
>         >     checks
>         >     via LSM.
>         >
>         >
>         > The main idea is to run eBPF RSS with qemu without any
>         permission.
>         > Libvirt should handle everything and pass proper eBPF file
>         descriptors.
>         > For current eBPF RSS, CAP_SYS_ADMIN(bypass some limitations)
>         > also required, and in the future may be other permissions.
>
>
>         I may miss something.
>
>         But RSS requires to update the map. This won't work if you
>         don't grant
>         any permission to qemu.
>
>         Thanks
>
>
>         >
>         >     I'm not sure this is the best. We have several examples
>         that let
>         >     libvirt
>         >     to involve. Examples:
>         >
>         >     1) create TAP device (and the TUN_SETIFF)
>         >
>         >     2) open vhost devices
>         >
>         >
>         > Technically TAP/vhost not related to a particular qemu
>         emulator. So common
>         > TAP creation should fit any modern qemu. eBPF fds(program
>         and maps) should
>         > suit the interface for current qemu, g.e. some qemu builds
>         may have
>         > different map
>         > structures or their count. It's necessary that the qemu got fds
>         > prepared by the helper
>         > that was built with the qemu.
>         >
>         >     I think we need an example on the detail steps for how
>         libvirt is
>         >     expected to use this.
>         >
>         >
>         > The simplified workflow looks like this:
>         >
>         >  1. Libvirt got "emulator" from domain document.
>         >  2. Libvirt queries for qemu capabilities.
>         >  3. One of the capabilities is "qemu-ebpf-rss-helper"
>         path(if present).
>         >  4. On NIC preparation Libvirt checks for virtio-net + rss
>         configurations.
>         >  5. If required, the "qemu-ebpf-rss-helper" called and fds are
>         >     received through unix fd.
>         >  6. Those fds are for eBPF RSS, which passed to child
>         process - qemu.
>         >  7. Qemu launched with virtio-net-pci property "rss" and
>         "ebpf_rss_fds".
>         >
>         >
>         > On Fri, Jun 11, 2021 at 8:36 AM Jason Wang
>         <jasowang@redhat.com <mailto:jasowang@redhat.com>
>         > <mailto:jasowang@redhat.com <mailto:jasowang@redhat.com>>>
>         wrote:
>         >
>         >
>         >     在 2021/6/10 下午2:55, Yuri Benditovich 写道:
>         >     > On Thu, Jun 10, 2021 at 9:41 AM Jason
>         Wang<jasowang@redhat.com <mailto:jasowang@redhat.com>
>         >     <mailto:jasowang@redhat.com
>         <mailto:jasowang@redhat.com>>> wrote:
>         >     >> 在 2021/6/9 下午6:04, Andrew Melnychenko 写道:
>         >     >>> Libvirt usually launches qemu with strict permissions.
>         >     >>> To enable eBPF RSS steering, qemu-ebpf-rss-helper
>         was added.
>         >     >> A silly question:
>         >     >>
>         >     >> Kernel had the following permission checks in bpf
>         syscall:
>         >     >>
>         >     >>          if (sysctl_unprivileged_bpf_disabled &&
>         !bpf_capable())
>         >     >>                   return -EPERM;
>         >     >> ...
>         >     >>
>         >     >>           err = security_bpf(cmd, &attr, size);
>         >     >>           if (err < 0)
>         >     >>                   return err;
>         >     >>
>         >     >> So if I understand the code correctly, bpf syscall
>         can only be
>         >     done if:
>         >     >>
>         >     >> 1) unprivileged_bpf is enabled or
>         >     >> 2) has the capability  and pass the LSM checks
>         >     >>
>         >     >> So I think the series is for unprivileged_bpf
>         disabled. If I'm not
>         >     >> wrong, I guess the policy is to grant CAP_BPF but do
>         fine grain
>         >     checks
>         >     >> via LSM.
>         >     >>
>         >     >> If this is correct, need to describe it in the commit
>         log.
>         >     >>
>         >     >>
>         >     >>> Added property "ebpf_rss_fds" for "virtio-net" that
>         allows to
>         >     >>> initialize eBPF RSS context with passed program &
>         maps fds.
>         >     >>>
>         >     >>> Added qemu-ebpf-rss-helper - simple helper that
>         loads eBPF
>         >     >>> context and passes fds through unix socket.
>         >     >>> Libvirt should call the helper and pass fds to qemu
>         through
>         >     >>> "ebpf_rss_fds" property.
>         >     >>>
>         >     >>> Added explicit target OS check for libbpf dependency
>         in meson.
>         >     >>> eBPF RSS works only with Linux TAP, so there is no
>         reason to
>         >     >>> build eBPF loader/helper for non-Linux.
>         >     >>>
>         >     >>> Overall, libvirt process should not be aware of the
>         "interface"
>         >     >>> of eBPF RSS, it will not be aware of eBPF
>         maps/program "type" and
>         >     >>> their quantity.
>         >     >> I'm not sure this is the best. We have several
>         examples that
>         >     let libvirt
>         >     >> to involve. Examples:
>         >     >>
>         >     >> 1) create TAP device (and the TUN_SETIFF)
>         >     >>
>         >     >> 2) open vhost devices
>         >     >>
>         >     >>
>         >     >>>    That's why qemu and the helper should be from
>         >     >>> the same build and be "synchronized". Technically
>         each qemu may
>         >     >>> have its own helper. That's why "query-helper-paths"
>         qmp command
>         >     >>> was added. Qemu should return the path to the helper
>         that suits
>         >     >>> and libvirt should use "that" helper for "that"
>         emulator.
>         >     >>>
>         >     >>> qmp sample:
>         >     >>> C: { "execute": "query-helper-paths" }
>         >     >>> S: { "return": [
>         >     >>>        {
>         >     >>>          "name": "qemu-ebpf-rss-helper",
>         >     >>>          "path":
>         "/usr/local/libexec/qemu-ebpf-rss-helper"
>         >     >>>        }
>         >     >>>       ]
>         >     >>>      }
>         >     >> I think we need an example on the detail steps for
>         how libvirt is
>         >     >> expected to use this.
>         >     > The preliminary patches for libvirt are at
>         >     > https://github.com/daynix/libvirt/tree/RSSv1
>         <https://github.com/daynix/libvirt/tree/RSSv1>
>         >     <https://github.com/daynix/libvirt/tree/RSSv1
>         <https://github.com/daynix/libvirt/tree/RSSv1>>
>         >
>         >
>         >     Will have a look but it would be better if the
>         assumption of the
>         >     management is detailed here to ease the reviewers.
>         >
>         >     Thanks
>         >
>         >
>         >     >
>         >
>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-21  9:20               ` Jason Wang
@ 2021-06-22  3:29                 ` Yuri Benditovich
  2021-06-22  4:58                   ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Yuri Benditovich @ 2021-06-22  3:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andrew Melnichenko, Daniel P. Berrangé,
	Michael S . Tsirkin, qemu-devel, Markus Armbruster,
	Yan Vugenfirer, Eric Blake

On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/6/19 上午4:03, Andrew Melnichenko 写道:
> > Hi Jason,
> > I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora,  Ubuntu,
> > and Debian - no need permissions to update BPF maps.
>
>
> How about RHEL :) ?

If I'm not mistaken, the RHEL releases do not use modern kernels yet
(for BPF we need 5.8+).
So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.


>
> Thanks
>
>
> >
> > On Wed, Jun 16, 2021 at 1:18 AM Andrew Melnichenko <andrew@daynix.com
> > <mailto:andrew@daynix.com>> wrote:
> >
> >     Hi,
> >
> >         I may miss something.
> >
> >         But RSS requires to update the map. This won't work if you
> >         don't grant
> >         any permission to qemu.
> >
> >         Thanks
> >
> >
> >     Partly - with "kernel.unprivileged_bpf_disabled=0" capabilities is
> >     not required to update maps.
> >     With "kernel.unprivileged_bpf_disabled=1" - setting maps will
> >     fail(without CAP_BPF) and "in-qemu" RSS will be used.
> >
> >     On Tue, Jun 15, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com
> >     <mailto:jasowang@redhat.com>> wrote:
> >
> >
> >         在 2021/6/12 上午12:49, Andrew Melnichenko 写道:
> >         > Hi,
> >         >
> >         >     So I think the series is for unprivileged_bpf disabled.
> >         If I'm not
> >         >     wrong, I guess the policy is to grant CAP_BPF but do
> >         fine grain
> >         >     checks
> >         >     via LSM.
> >         >
> >         >
> >         > The main idea is to run eBPF RSS with qemu without any
> >         permission.
> >         > Libvirt should handle everything and pass proper eBPF file
> >         descriptors.
> >         > For current eBPF RSS, CAP_SYS_ADMIN(bypass some limitations)
> >         > also required, and in the future may be other permissions.
> >
> >
> >         I may miss something.
> >
> >         But RSS requires to update the map. This won't work if you
> >         don't grant
> >         any permission to qemu.
> >
> >         Thanks
> >
> >
> >         >
> >         >     I'm not sure this is the best. We have several examples
> >         that let
> >         >     libvirt
> >         >     to involve. Examples:
> >         >
> >         >     1) create TAP device (and the TUN_SETIFF)
> >         >
> >         >     2) open vhost devices
> >         >
> >         >
> >         > Technically TAP/vhost not related to a particular qemu
> >         emulator. So common
> >         > TAP creation should fit any modern qemu. eBPF fds(program
> >         and maps) should
> >         > suit the interface for current qemu, g.e. some qemu builds
> >         may have
> >         > different map
> >         > structures or their count. It's necessary that the qemu got fds
> >         > prepared by the helper
> >         > that was built with the qemu.
> >         >
> >         >     I think we need an example on the detail steps for how
> >         libvirt is
> >         >     expected to use this.
> >         >
> >         >
> >         > The simplified workflow looks like this:
> >         >
> >         >  1. Libvirt got "emulator" from domain document.
> >         >  2. Libvirt queries for qemu capabilities.
> >         >  3. One of the capabilities is "qemu-ebpf-rss-helper"
> >         path(if present).
> >         >  4. On NIC preparation Libvirt checks for virtio-net + rss
> >         configurations.
> >         >  5. If required, the "qemu-ebpf-rss-helper" called and fds are
> >         >     received through unix fd.
> >         >  6. Those fds are for eBPF RSS, which passed to child
> >         process - qemu.
> >         >  7. Qemu launched with virtio-net-pci property "rss" and
> >         "ebpf_rss_fds".
> >         >
> >         >
> >         > On Fri, Jun 11, 2021 at 8:36 AM Jason Wang
> >         <jasowang@redhat.com <mailto:jasowang@redhat.com>
> >         > <mailto:jasowang@redhat.com <mailto:jasowang@redhat.com>>>
> >         wrote:
> >         >
> >         >
> >         >     在 2021/6/10 下午2:55, Yuri Benditovich 写道:
> >         >     > On Thu, Jun 10, 2021 at 9:41 AM Jason
> >         Wang<jasowang@redhat.com <mailto:jasowang@redhat.com>
> >         >     <mailto:jasowang@redhat.com
> >         <mailto:jasowang@redhat.com>>> wrote:
> >         >     >> 在 2021/6/9 下午6:04, Andrew Melnychenko 写道:
> >         >     >>> Libvirt usually launches qemu with strict permissions.
> >         >     >>> To enable eBPF RSS steering, qemu-ebpf-rss-helper
> >         was added.
> >         >     >> A silly question:
> >         >     >>
> >         >     >> Kernel had the following permission checks in bpf
> >         syscall:
> >         >     >>
> >         >     >>          if (sysctl_unprivileged_bpf_disabled &&
> >         !bpf_capable())
> >         >     >>                   return -EPERM;
> >         >     >> ...
> >         >     >>
> >         >     >>           err = security_bpf(cmd, &attr, size);
> >         >     >>           if (err < 0)
> >         >     >>                   return err;
> >         >     >>
> >         >     >> So if I understand the code correctly, bpf syscall
> >         can only be
> >         >     done if:
> >         >     >>
> >         >     >> 1) unprivileged_bpf is enabled or
> >         >     >> 2) has the capability  and pass the LSM checks
> >         >     >>
> >         >     >> So I think the series is for unprivileged_bpf
> >         disabled. If I'm not
> >         >     >> wrong, I guess the policy is to grant CAP_BPF but do
> >         fine grain
> >         >     checks
> >         >     >> via LSM.
> >         >     >>
> >         >     >> If this is correct, need to describe it in the commit
> >         log.
> >         >     >>
> >         >     >>
> >         >     >>> Added property "ebpf_rss_fds" for "virtio-net" that
> >         allows to
> >         >     >>> initialize eBPF RSS context with passed program &
> >         maps fds.
> >         >     >>>
> >         >     >>> Added qemu-ebpf-rss-helper - simple helper that
> >         loads eBPF
> >         >     >>> context and passes fds through unix socket.
> >         >     >>> Libvirt should call the helper and pass fds to qemu
> >         through
> >         >     >>> "ebpf_rss_fds" property.
> >         >     >>>
> >         >     >>> Added explicit target OS check for libbpf dependency
> >         in meson.
> >         >     >>> eBPF RSS works only with Linux TAP, so there is no
> >         reason to
> >         >     >>> build eBPF loader/helper for non-Linux.
> >         >     >>>
> >         >     >>> Overall, libvirt process should not be aware of the
> >         "interface"
> >         >     >>> of eBPF RSS, it will not be aware of eBPF
> >         maps/program "type" and
> >         >     >>> their quantity.
> >         >     >> I'm not sure this is the best. We have several
> >         examples that
> >         >     let libvirt
> >         >     >> to involve. Examples:
> >         >     >>
> >         >     >> 1) create TAP device (and the TUN_SETIFF)
> >         >     >>
> >         >     >> 2) open vhost devices
> >         >     >>
> >         >     >>
> >         >     >>>    That's why qemu and the helper should be from
> >         >     >>> the same build and be "synchronized". Technically
> >         each qemu may
> >         >     >>> have its own helper. That's why "query-helper-paths"
> >         qmp command
> >         >     >>> was added. Qemu should return the path to the helper
> >         that suits
> >         >     >>> and libvirt should use "that" helper for "that"
> >         emulator.
> >         >     >>>
> >         >     >>> qmp sample:
> >         >     >>> C: { "execute": "query-helper-paths" }
> >         >     >>> S: { "return": [
> >         >     >>>        {
> >         >     >>>          "name": "qemu-ebpf-rss-helper",
> >         >     >>>          "path":
> >         "/usr/local/libexec/qemu-ebpf-rss-helper"
> >         >     >>>        }
> >         >     >>>       ]
> >         >     >>>      }
> >         >     >> I think we need an example on the detail steps for
> >         how libvirt is
> >         >     >> expected to use this.
> >         >     > The preliminary patches for libvirt are at
> >         >     > https://github.com/daynix/libvirt/tree/RSSv1
> >         <https://github.com/daynix/libvirt/tree/RSSv1>
> >         >     <https://github.com/daynix/libvirt/tree/RSSv1
> >         <https://github.com/daynix/libvirt/tree/RSSv1>>
> >         >
> >         >
> >         >     Will have a look but it would be better if the
> >         assumption of the
> >         >     management is detailed here to ease the reviewers.
> >         >
> >         >     Thanks
> >         >
> >         >
> >         >     >
> >         >
> >
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-22  3:29                 ` Yuri Benditovich
@ 2021-06-22  4:58                   ` Jason Wang
  2021-06-22  8:25                     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2021-06-22  4:58 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Andrew Melnichenko, Daniel P. Berrangé,
	Michael S . Tsirkin, toke, qemu-devel, Markus Armbruster,
	Yan Vugenfirer, Eric Blake


在 2021/6/22 上午11:29, Yuri Benditovich 写道:
> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/6/19 上午4:03, Andrew Melnichenko 写道:
>>> Hi Jason,
>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora,  Ubuntu,
>>> and Debian - no need permissions to update BPF maps.
>>
>> How about RHEL :) ?
> If I'm not mistaken, the RHEL releases do not use modern kernels yet
> (for BPF we need 5.8+).
> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.


Adding Toke for more ideas on this.

Thanks


>
>
>> Thanks
>>
>>
>>> On Wed, Jun 16, 2021 at 1:18 AM Andrew Melnichenko <andrew@daynix.com
>>> <mailto:andrew@daynix.com>> wrote:
>>>
>>>      Hi,
>>>
>>>          I may miss something.
>>>
>>>          But RSS requires to update the map. This won't work if you
>>>          don't grant
>>>          any permission to qemu.
>>>
>>>          Thanks
>>>
>>>
>>>      Partly - with "kernel.unprivileged_bpf_disabled=0" capabilities is
>>>      not required to update maps.
>>>      With "kernel.unprivileged_bpf_disabled=1" - setting maps will
>>>      fail(without CAP_BPF) and "in-qemu" RSS will be used.
>>>
>>>      On Tue, Jun 15, 2021 at 12:13 PM Jason Wang <jasowang@redhat.com
>>>      <mailto:jasowang@redhat.com>> wrote:
>>>
>>>
>>>          在 2021/6/12 上午12:49, Andrew Melnichenko 写道:
>>>          > Hi,
>>>          >
>>>          >     So I think the series is for unprivileged_bpf disabled.
>>>          If I'm not
>>>          >     wrong, I guess the policy is to grant CAP_BPF but do
>>>          fine grain
>>>          >     checks
>>>          >     via LSM.
>>>          >
>>>          >
>>>          > The main idea is to run eBPF RSS with qemu without any
>>>          permission.
>>>          > Libvirt should handle everything and pass proper eBPF file
>>>          descriptors.
>>>          > For current eBPF RSS, CAP_SYS_ADMIN(bypass some limitations)
>>>          > also required, and in the future may be other permissions.
>>>
>>>
>>>          I may miss something.
>>>
>>>          But RSS requires to update the map. This won't work if you
>>>          don't grant
>>>          any permission to qemu.
>>>
>>>          Thanks
>>>
>>>
>>>          >
>>>          >     I'm not sure this is the best. We have several examples
>>>          that let
>>>          >     libvirt
>>>          >     to involve. Examples:
>>>          >
>>>          >     1) create TAP device (and the TUN_SETIFF)
>>>          >
>>>          >     2) open vhost devices
>>>          >
>>>          >
>>>          > Technically TAP/vhost not related to a particular qemu
>>>          emulator. So common
>>>          > TAP creation should fit any modern qemu. eBPF fds(program
>>>          and maps) should
>>>          > suit the interface for current qemu, g.e. some qemu builds
>>>          may have
>>>          > different map
>>>          > structures or their count. It's necessary that the qemu got fds
>>>          > prepared by the helper
>>>          > that was built with the qemu.
>>>          >
>>>          >     I think we need an example on the detail steps for how
>>>          libvirt is
>>>          >     expected to use this.
>>>          >
>>>          >
>>>          > The simplified workflow looks like this:
>>>          >
>>>          >  1. Libvirt got "emulator" from domain document.
>>>          >  2. Libvirt queries for qemu capabilities.
>>>          >  3. One of the capabilities is "qemu-ebpf-rss-helper"
>>>          path(if present).
>>>          >  4. On NIC preparation Libvirt checks for virtio-net + rss
>>>          configurations.
>>>          >  5. If required, the "qemu-ebpf-rss-helper" called and fds are
>>>          >     received through unix fd.
>>>          >  6. Those fds are for eBPF RSS, which passed to child
>>>          process - qemu.
>>>          >  7. Qemu launched with virtio-net-pci property "rss" and
>>>          "ebpf_rss_fds".
>>>          >
>>>          >
>>>          > On Fri, Jun 11, 2021 at 8:36 AM Jason Wang
>>>          <jasowang@redhat.com <mailto:jasowang@redhat.com>
>>>          > <mailto:jasowang@redhat.com <mailto:jasowang@redhat.com>>>
>>>          wrote:
>>>          >
>>>          >
>>>          >     在 2021/6/10 下午2:55, Yuri Benditovich 写道:
>>>          >     > On Thu, Jun 10, 2021 at 9:41 AM Jason
>>>          Wang<jasowang@redhat.com <mailto:jasowang@redhat.com>
>>>          >     <mailto:jasowang@redhat.com
>>>          <mailto:jasowang@redhat.com>>> wrote:
>>>          >     >> 在 2021/6/9 下午6:04, Andrew Melnychenko 写道:
>>>          >     >>> Libvirt usually launches qemu with strict permissions.
>>>          >     >>> To enable eBPF RSS steering, qemu-ebpf-rss-helper
>>>          was added.
>>>          >     >> A silly question:
>>>          >     >>
>>>          >     >> Kernel had the following permission checks in bpf
>>>          syscall:
>>>          >     >>
>>>          >     >>          if (sysctl_unprivileged_bpf_disabled &&
>>>          !bpf_capable())
>>>          >     >>                   return -EPERM;
>>>          >     >> ...
>>>          >     >>
>>>          >     >>           err = security_bpf(cmd, &attr, size);
>>>          >     >>           if (err < 0)
>>>          >     >>                   return err;
>>>          >     >>
>>>          >     >> So if I understand the code correctly, bpf syscall
>>>          can only be
>>>          >     done if:
>>>          >     >>
>>>          >     >> 1) unprivileged_bpf is enabled or
>>>          >     >> 2) has the capability  and pass the LSM checks
>>>          >     >>
>>>          >     >> So I think the series is for unprivileged_bpf
>>>          disabled. If I'm not
>>>          >     >> wrong, I guess the policy is to grant CAP_BPF but do
>>>          fine grain
>>>          >     checks
>>>          >     >> via LSM.
>>>          >     >>
>>>          >     >> If this is correct, need to describe it in the commit
>>>          log.
>>>          >     >>
>>>          >     >>
>>>          >     >>> Added property "ebpf_rss_fds" for "virtio-net" that
>>>          allows to
>>>          >     >>> initialize eBPF RSS context with passed program &
>>>          maps fds.
>>>          >     >>>
>>>          >     >>> Added qemu-ebpf-rss-helper - simple helper that
>>>          loads eBPF
>>>          >     >>> context and passes fds through unix socket.
>>>          >     >>> Libvirt should call the helper and pass fds to qemu
>>>          through
>>>          >     >>> "ebpf_rss_fds" property.
>>>          >     >>>
>>>          >     >>> Added explicit target OS check for libbpf dependency
>>>          in meson.
>>>          >     >>> eBPF RSS works only with Linux TAP, so there is no
>>>          reason to
>>>          >     >>> build eBPF loader/helper for non-Linux.
>>>          >     >>>
>>>          >     >>> Overall, libvirt process should not be aware of the
>>>          "interface"
>>>          >     >>> of eBPF RSS, it will not be aware of eBPF
>>>          maps/program "type" and
>>>          >     >>> their quantity.
>>>          >     >> I'm not sure this is the best. We have several
>>>          examples that
>>>          >     let libvirt
>>>          >     >> to involve. Examples:
>>>          >     >>
>>>          >     >> 1) create TAP device (and the TUN_SETIFF)
>>>          >     >>
>>>          >     >> 2) open vhost devices
>>>          >     >>
>>>          >     >>
>>>          >     >>>    That's why qemu and the helper should be from
>>>          >     >>> the same build and be "synchronized". Technically
>>>          each qemu may
>>>          >     >>> have its own helper. That's why "query-helper-paths"
>>>          qmp command
>>>          >     >>> was added. Qemu should return the path to the helper
>>>          that suits
>>>          >     >>> and libvirt should use "that" helper for "that"
>>>          emulator.
>>>          >     >>>
>>>          >     >>> qmp sample:
>>>          >     >>> C: { "execute": "query-helper-paths" }
>>>          >     >>> S: { "return": [
>>>          >     >>>        {
>>>          >     >>>          "name": "qemu-ebpf-rss-helper",
>>>          >     >>>          "path":
>>>          "/usr/local/libexec/qemu-ebpf-rss-helper"
>>>          >     >>>        }
>>>          >     >>>       ]
>>>          >     >>>      }
>>>          >     >> I think we need an example on the detail steps for
>>>          how libvirt is
>>>          >     >> expected to use this.
>>>          >     > The preliminary patches for libvirt are at
>>>          >     > https://github.com/daynix/libvirt/tree/RSSv1
>>>          <https://github.com/daynix/libvirt/tree/RSSv1>
>>>          >     <https://github.com/daynix/libvirt/tree/RSSv1
>>>          <https://github.com/daynix/libvirt/tree/RSSv1>>
>>>          >
>>>          >
>>>          >     Will have a look but it would be better if the
>>>          assumption of the
>>>          >     management is detailed here to ease the reviewers.
>>>          >
>>>          >     Thanks
>>>          >
>>>          >
>>>          >     >
>>>          >
>>>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-22  4:58                   ` Jason Wang
@ 2021-06-22  8:25                     ` Toke Høiland-Jørgensen
  2021-06-22  8:27                       ` Daniel P. Berrangé
  0 siblings, 1 reply; 31+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-22  8:25 UTC (permalink / raw)
  To: Jason Wang, Yuri Benditovich
  Cc: Andrew Melnichenko, Daniel P. Berrangé,
	Michael S . Tsirkin, qemu-devel, Markus Armbruster,
	Yan Vugenfirer, Eric Blake

Jason Wang <jasowang@redhat.com> writes:

> 在 2021/6/22 上午11:29, Yuri Benditovich 写道:
>> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>>>
>>> 在 2021/6/19 上午4:03, Andrew Melnichenko 写道:
>>>> Hi Jason,
>>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora,  Ubuntu,
>>>> and Debian - no need permissions to update BPF maps.
>>>
>>> How about RHEL :) ?
>> If I'm not mistaken, the RHEL releases do not use modern kernels yet
>> (for BPF we need 5.8+).
>> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
>
> Adding Toke for more ideas on this.

Ignore the kernel version number; we backport all of BPF to RHEL,
basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.

However, we completely disable unprivileged BPF on RHEL kernels. Also,
there's upstream commit:
08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")

which adds a new value of '2' to the unprivileged_bpf_disable sysctl. I
believe this may end up being the default on Fedora as well.

So any design relying on unprivileged BPF is likely to break; I'd
suggest you look into how you can get this to work with CAP_BPF :)

-Toke



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-22  8:25                     ` Toke Høiland-Jørgensen
@ 2021-06-22  8:27                       ` Daniel P. Berrangé
  2021-06-22  9:09                         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 31+ messages in thread
From: Daniel P. Berrangé @ 2021-06-22  8:27 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrew Melnichenko, Michael S . Tsirkin, Jason Wang,
	Markus Armbruster, qemu-devel, Yuri Benditovich, Yan Vugenfirer,
	Eric Blake

On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:
> Jason Wang <jasowang@redhat.com> writes:
> 
> > 在 2021/6/22 上午11:29, Yuri Benditovich 写道:
> >> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>
> >>> 在 2021/6/19 上午4:03, Andrew Melnichenko 写道:
> >>>> Hi Jason,
> >>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora,  Ubuntu,
> >>>> and Debian - no need permissions to update BPF maps.
> >>>
> >>> How about RHEL :) ?
> >> If I'm not mistaken, the RHEL releases do not use modern kernels yet
> >> (for BPF we need 5.8+).
> >> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
> >
> > Adding Toke for more ideas on this.
> 
> Ignore the kernel version number; we backport all of BPF to RHEL,
> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
> 
> However, we completely disable unprivileged BPF on RHEL kernels. Also,
> there's upstream commit:
> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")
> 
> which adds a new value of '2' to the unprivileged_bpf_disable sysctl. I
> believe this may end up being the default on Fedora as well.
> 
> So any design relying on unprivileged BPF is likely to break; I'd
> suggest you look into how you can get this to work with CAP_BPF :)

QEMU will never have any capabilities. Any resources that required
privileges have to be opened by a separate privileged helper, and the
open FD then passed across to the QEMU process. This relies on the
capabilities checks only being performed at time of initial opening,
and *not* on operations performed on the already open FD.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-22  8:27                       ` Daniel P. Berrangé
@ 2021-06-22  9:09                         ` Toke Høiland-Jørgensen
  2021-06-22 13:01                           ` Andrew Melnichenko
  2021-06-23  0:47                           ` Jason Wang
  0 siblings, 2 replies; 31+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-22  9:09 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Andrew Melnichenko, Michael S . Tsirkin, Jason Wang,
	Markus Armbruster, qemu-devel, Yuri Benditovich, Yan Vugenfirer,
	Eric Blake

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:
>> Jason Wang <jasowang@redhat.com> writes:
>> 
>> > 在 2021/6/22 上午11:29, Yuri Benditovich 写道:
>> >> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>> >>>
>> >>> 在 2021/6/19 上午4:03, Andrew Melnichenko 写道:
>> >>>> Hi Jason,
>> >>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora,  Ubuntu,
>> >>>> and Debian - no need permissions to update BPF maps.
>> >>>
>> >>> How about RHEL :) ?
>> >> If I'm not mistaken, the RHEL releases do not use modern kernels yet
>> >> (for BPF we need 5.8+).
>> >> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
>> >
>> > Adding Toke for more ideas on this.
>> 
>> Ignore the kernel version number; we backport all of BPF to RHEL,
>> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
>> 
>> However, we completely disable unprivileged BPF on RHEL kernels. Also,
>> there's upstream commit:
>> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")
>> 
>> which adds a new value of '2' to the unprivileged_bpf_disable sysctl. I
>> believe this may end up being the default on Fedora as well.
>> 
>> So any design relying on unprivileged BPF is likely to break; I'd
>> suggest you look into how you can get this to work with CAP_BPF :)
>
> QEMU will never have any capabilities. Any resources that required
> privileges have to be opened by a separate privileged helper, and the
> open FD then passed across to the QEMU process. This relies on the
> capabilities checks only being performed at time of initial opening,
> and *not* on operations performed on the already open FD.

That won't work for regular map updates either, unfortunately: you still
have to perform a bpf() syscall to update an element, and that is a
privileged operation.

You may be able to get around this by using an array map type and
mmap()'ing the map contents, but I'm not sure how well that will work
across process boundaries.

If it doesn't, I only see two possibilities: populate the map
ahead-of-time and leave it in place, or keep the privileged helper
process around to perform map updates on behalf of QEMU...

-Toke



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-22  9:09                         ` Toke Høiland-Jørgensen
@ 2021-06-22 13:01                           ` Andrew Melnichenko
  2021-06-22 13:17                             ` Toke Høiland-Jørgensen
  2021-06-23  0:47                           ` Jason Wang
  1 sibling, 1 reply; 31+ messages in thread
From: Andrew Melnichenko @ 2021-06-22 13:01 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel P. Berrangé,
	Michael S . Tsirkin, Jason Wang, Markus Armbruster, qemu-devel,
	Yuri Benditovich, Yan Vugenfirer, Eric Blake

[-- Attachment #1: Type: text/plain, Size: 2641 bytes --]

Hi,
Thank you for your comments.
I'll play with array type mmap. And later will provide some solution.

On Tue, Jun 22, 2021 at 12:09 PM Toke Høiland-Jørgensen <toke@redhat.com>
wrote:

> Daniel P. Berrangé <berrange@redhat.com> writes:
>
> > On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:
> >> Jason Wang <jasowang@redhat.com> writes:
> >>
> >> > 在 2021/6/22 上午11:29, Yuri Benditovich 写道:
> >> >> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com>
> wrote:
> >> >>>
> >> >>> 在 2021/6/19 上午4:03, Andrew Melnichenko 写道:
> >> >>>> Hi Jason,
> >> >>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora,
> Ubuntu,
> >> >>>> and Debian - no need permissions to update BPF maps.
> >> >>>
> >> >>> How about RHEL :) ?
> >> >> If I'm not mistaken, the RHEL releases do not use modern kernels yet
> >> >> (for BPF we need 5.8+).
> >> >> So this will be (probably) relevant for RHEL 9. Please correct me if
> I'm wrong.
> >> >
> >> > Adding Toke for more ideas on this.
> >>
> >> Ignore the kernel version number; we backport all of BPF to RHEL,
> >> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
> >>
> >> However, we completely disable unprivileged BPF on RHEL kernels. Also,
> >> there's upstream commit:
> >> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by
> default")
> >>
> >> which adds a new value of '2' to the unprivileged_bpf_disable sysctl. I
> >> believe this may end up being the default on Fedora as well.
> >>
> >> So any design relying on unprivileged BPF is likely to break; I'd
> >> suggest you look into how you can get this to work with CAP_BPF :)
> >
> > QEMU will never have any capabilities. Any resources that required
> > privileges have to be opened by a separate privileged helper, and the
> > open FD then passed across to the QEMU process. This relies on the
> > capabilities checks only being performed at time of initial opening,
> > and *not* on operations performed on the already open FD.
>
> That won't work for regular map updates either, unfortunately: you still
> have to perform a bpf() syscall to update an element, and that is a
> privileged operation.
>
> You may be able to get around this by using an array map type and
> mmap()'ing the map contents, but I'm not sure how well that will work
> across process boundaries.
>
> If it doesn't, I only see two possibilities: populate the map
> ahead-of-time and leave it in place, or keep the privileged helper
> process around to perform map updates on behalf of QEMU...
>
> -Toke
>
>

[-- Attachment #2: Type: text/html, Size: 3605 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-22 13:01                           ` Andrew Melnichenko
@ 2021-06-22 13:17                             ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 31+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-22 13:17 UTC (permalink / raw)
  To: Andrew Melnichenko
  Cc: Daniel P. Berrangé,
	Michael S . Tsirkin, Jason Wang, Markus Armbruster, qemu-devel,
	Yuri Benditovich, Yan Vugenfirer, Eric Blake

Andrew Melnichenko <andrew@daynix.com> writes:

> Hi,
> Thank you for your comments.
> I'll play with array type mmap. And later will provide some solution.

Cool - you're welcome! :)

-Toke



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-22  9:09                         ` Toke Høiland-Jørgensen
  2021-06-22 13:01                           ` Andrew Melnichenko
@ 2021-06-23  0:47                           ` Jason Wang
  2021-06-28 11:18                             ` Yuri Benditovich
  1 sibling, 1 reply; 31+ messages in thread
From: Jason Wang @ 2021-06-23  0:47 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Daniel P. Berrangé
  Cc: Andrew Melnichenko, Michael S . Tsirkin, qemu-devel,
	Markus Armbruster, Yuri Benditovich, Yan Vugenfirer, Eric Blake


在 2021/6/22 下午5:09, Toke Høiland-Jørgensen 写道:
> Daniel P. Berrangé <berrange@redhat.com> writes:
>
>> On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:
>>> Jason Wang <jasowang@redhat.com> writes:
>>>
>>>> 在 2021/6/22 上午11:29, Yuri Benditovich 写道:
>>>>> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> 在 2021/6/19 上午4:03, Andrew Melnichenko 写道:
>>>>>>> Hi Jason,
>>>>>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora,  Ubuntu,
>>>>>>> and Debian - no need permissions to update BPF maps.
>>>>>> How about RHEL :) ?
>>>>> If I'm not mistaken, the RHEL releases do not use modern kernels yet
>>>>> (for BPF we need 5.8+).
>>>>> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
>>>> Adding Toke for more ideas on this.
>>> Ignore the kernel version number; we backport all of BPF to RHEL,
>>> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
>>>
>>> However, we completely disable unprivileged BPF on RHEL kernels. Also,
>>> there's upstream commit:
>>> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")
>>>
>>> which adds a new value of '2' to the unprivileged_bpf_disable sysctl. I
>>> believe this may end up being the default on Fedora as well.
>>>
>>> So any design relying on unprivileged BPF is likely to break; I'd
>>> suggest you look into how you can get this to work with CAP_BPF :)
>> QEMU will never have any capabilities. Any resources that required
>> privileges have to be opened by a separate privileged helper, and the
>> open FD then passed across to the QEMU process. This relies on the
>> capabilities checks only being performed at time of initial opening,
>> and *not* on operations performed on the already open FD.
> That won't work for regular map updates either, unfortunately: you still
> have to perform a bpf() syscall to update an element, and that is a
> privileged operation.
>
> You may be able to get around this by using an array map type and
> mmap()'ing the map contents, but I'm not sure how well that will work
> across process boundaries.
>
> If it doesn't, I only see two possibilities: populate the map
> ahead-of-time and leave it in place, or keep the privileged helper
> process around to perform map updates on behalf of QEMU...


Right, and this could be probably done by extending and tracking the RSS 
update via rx filter event.

Thanks


>
> -Toke
>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-23  0:47                           ` Jason Wang
@ 2021-06-28 11:18                             ` Yuri Benditovich
  2021-06-29  3:39                               ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Yuri Benditovich @ 2021-06-28 11:18 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andrew Melnichenko, Daniel P. Berrangé,
	Michael S . Tsirkin, Toke Høiland-Jørgensen,
	Markus Armbruster, qemu-devel, Yan Vugenfirer, Eric Blake

On Wed, Jun 23, 2021 at 3:47 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/6/22 下午5:09, Toke Høiland-Jørgensen 写道:
> > Daniel P. Berrangé <berrange@redhat.com> writes:
> >
> >> On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:
> >>> Jason Wang <jasowang@redhat.com> writes:
> >>>
> >>>> 在 2021/6/22 上午11:29, Yuri Benditovich 写道:
> >>>>> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
> >>>>>> 在 2021/6/19 上午4:03, Andrew Melnichenko 写道:
> >>>>>>> Hi Jason,
> >>>>>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora,  Ubuntu,
> >>>>>>> and Debian - no need permissions to update BPF maps.
> >>>>>> How about RHEL :) ?
> >>>>> If I'm not mistaken, the RHEL releases do not use modern kernels yet
> >>>>> (for BPF we need 5.8+).
> >>>>> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
> >>>> Adding Toke for more ideas on this.
> >>> Ignore the kernel version number; we backport all of BPF to RHEL,
> >>> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
> >>>
> >>> However, we completely disable unprivileged BPF on RHEL kernels. Also,
> >>> there's upstream commit:
> >>> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")
> >>>
> >>> which adds a new value of '2' to the unprivileged_bpf_disable sysctl. I
> >>> believe this may end up being the default on Fedora as well.
> >>>
> >>> So any design relying on unprivileged BPF is likely to break; I'd
> >>> suggest you look into how you can get this to work with CAP_BPF :)
> >> QEMU will never have any capabilities. Any resources that required
> >> privileges have to be opened by a separate privileged helper, and the
> >> open FD then passed across to the QEMU process. This relies on the
> >> capabilities checks only being performed at time of initial opening,
> >> and *not* on operations performed on the already open FD.
> > That won't work for regular map updates either, unfortunately: you still
> > have to perform a bpf() syscall to update an element, and that is a
> > privileged operation.
> >
> > You may be able to get around this by using an array map type and
> > mmap()'ing the map contents, but I'm not sure how well that will work
> > across process boundaries.
> >
> > If it doesn't, I only see two possibilities: populate the map
> > ahead-of-time and leave it in place, or keep the privileged helper
> > process around to perform map updates on behalf of QEMU...
>
>
> Right, and this could be probably done by extending and tracking the RSS
> update via rx filter event.

Jason,
Can you please get a little into details - what you mean by 'extending
and tracking the RSS
> update via rx filter event'?

Thanks,
Yuri

>
> Thanks
>
>
> >
> > -Toke
> >
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-28 11:18                             ` Yuri Benditovich
@ 2021-06-29  3:39                               ` Jason Wang
  2021-06-30 16:40                                 ` Andrew Melnichenko
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2021-06-29  3:39 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Andrew Melnichenko, Daniel P. Berrangé,
	Michael S . Tsirkin, Toke Høiland-Jørgensen,
	Markus Armbruster, qemu-devel, Yan Vugenfirer, Eric Blake


在 2021/6/28 下午7:18, Yuri Benditovich 写道:
> On Wed, Jun 23, 2021 at 3:47 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/6/22 下午5:09, Toke Høiland-Jørgensen 写道:
>>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>>
>>>> On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:
>>>>> Jason Wang <jasowang@redhat.com> writes:
>>>>>
>>>>>> 在 2021/6/22 上午11:29, Yuri Benditovich 写道:
>>>>>>> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>> 在 2021/6/19 上午4:03, Andrew Melnichenko 写道:
>>>>>>>>> Hi Jason,
>>>>>>>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora,  Ubuntu,
>>>>>>>>> and Debian - no need permissions to update BPF maps.
>>>>>>>> How about RHEL :) ?
>>>>>>> If I'm not mistaken, the RHEL releases do not use modern kernels yet
>>>>>>> (for BPF we need 5.8+).
>>>>>>> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
>>>>>> Adding Toke for more ideas on this.
>>>>> Ignore the kernel version number; we backport all of BPF to RHEL,
>>>>> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
>>>>>
>>>>> However, we completely disable unprivileged BPF on RHEL kernels. Also,
>>>>> there's upstream commit:
>>>>> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")
>>>>>
>>>>> which adds a new value of '2' to the unprivileged_bpf_disable sysctl. I
>>>>> believe this may end up being the default on Fedora as well.
>>>>>
>>>>> So any design relying on unprivileged BPF is likely to break; I'd
>>>>> suggest you look into how you can get this to work with CAP_BPF :)
>>>> QEMU will never have any capabilities. Any resources that required
>>>> privileges have to be opened by a separate privileged helper, and the
>>>> open FD then passed across to the QEMU process. This relies on the
>>>> capabilities checks only being performed at time of initial opening,
>>>> and *not* on operations performed on the already open FD.
>>> That won't work for regular map updates either, unfortunately: you still
>>> have to perform a bpf() syscall to update an element, and that is a
>>> privileged operation.
>>>
>>> You may be able to get around this by using an array map type and
>>> mmap()'ing the map contents, but I'm not sure how well that will work
>>> across process boundaries.
>>>
>>> If it doesn't, I only see two possibilities: populate the map
>>> ahead-of-time and leave it in place, or keep the privileged helper
>>> process around to perform map updates on behalf of QEMU...
>>
>> Right, and this could be probably done by extending and tracking the RSS
>> update via rx filter event.
> Jason,
> Can you please get a little into details - what you mean by 'extending
> and tracking the RSS


There's a monitor event which could be used for qemu to notify the 
privileged application (e.g the one has CAP_NET_ADMIN) to update the rx 
filter attributes of the host networking device.

It works like, when the rx filters is updated by guest, qemu will 
generate an rx filter update event (see rxfilter_notify()) which could 
be captured by the privileged application.

Then the privileged application query rx filter information via 
query-rx-filter command and do the proper setups.

This is designed for macvtap but I think it might be used by RSS as well.

The helper can monitor the rx-filter event and update the eBPF maps. But 
I'm not sure if it needs some coordination with libvirt in this case.

Thanks


>> update via rx filter event'?
> Thanks,
> Yuri
>
>> Thanks
>>
>>
>>> -Toke
>>>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
  2021-06-29  3:39                               ` Jason Wang
@ 2021-06-30 16:40                                 ` Andrew Melnichenko
  0 siblings, 0 replies; 31+ messages in thread
From: Andrew Melnichenko @ 2021-06-30 16:40 UTC (permalink / raw)
  To: Jason Wang
  Cc: Daniel P. Berrangé,
	Michael S . Tsirkin, Toke Høiland-Jørgensen,
	Markus Armbruster, qemu-devel, Yuri Benditovich, Yan Vugenfirer,
	Eric Blake

[-- Attachment #1: Type: text/plain, Size: 4088 bytes --]

Hi, all.
Thank you for ur comments.
I've tested few possible solutions and I'll prepare new patches for RFC
with mmap() based eBPF in the near future.

On Tue, Jun 29, 2021 at 6:39 AM Jason Wang <jasowang@redhat.com> wrote:

>
> 在 2021/6/28 下午7:18, Yuri Benditovich 写道:
> > On Wed, Jun 23, 2021 at 3:47 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> 在 2021/6/22 下午5:09, Toke Høiland-Jørgensen 写道:
> >>> Daniel P. Berrangé <berrange@redhat.com> writes:
> >>>
> >>>> On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen
> wrote:
> >>>>> Jason Wang <jasowang@redhat.com> writes:
> >>>>>
> >>>>>> 在 2021/6/22 上午11:29, Yuri Benditovich 写道:
> >>>>>>> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com>
> wrote:
> >>>>>>>> 在 2021/6/19 上午4:03, Andrew Melnichenko 写道:
> >>>>>>>>> Hi Jason,
> >>>>>>>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora,
> Ubuntu,
> >>>>>>>>> and Debian - no need permissions to update BPF maps.
> >>>>>>>> How about RHEL :) ?
> >>>>>>> If I'm not mistaken, the RHEL releases do not use modern kernels
> yet
> >>>>>>> (for BPF we need 5.8+).
> >>>>>>> So this will be (probably) relevant for RHEL 9. Please correct me
> if I'm wrong.
> >>>>>> Adding Toke for more ideas on this.
> >>>>> Ignore the kernel version number; we backport all of BPF to RHEL,
> >>>>> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
> >>>>>
> >>>>> However, we completely disable unprivileged BPF on RHEL kernels.
> Also,
> >>>>> there's upstream commit:
> >>>>> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by
> default")
> >>>>>
> >>>>> which adds a new value of '2' to the unprivileged_bpf_disable
> sysctl. I
> >>>>> believe this may end up being the default on Fedora as well.
> >>>>>
> >>>>> So any design relying on unprivileged BPF is likely to break; I'd
> >>>>> suggest you look into how you can get this to work with CAP_BPF :)
> >>>> QEMU will never have any capabilities. Any resources that required
> >>>> privileges have to be opened by a separate privileged helper, and the
> >>>> open FD then passed across to the QEMU process. This relies on the
> >>>> capabilities checks only being performed at time of initial opening,
> >>>> and *not* on operations performed on the already open FD.
> >>> That won't work for regular map updates either, unfortunately: you
> still
> >>> have to perform a bpf() syscall to update an element, and that is a
> >>> privileged operation.
> >>>
> >>> You may be able to get around this by using an array map type and
> >>> mmap()'ing the map contents, but I'm not sure how well that will work
> >>> across process boundaries.
> >>>
> >>> If it doesn't, I only see two possibilities: populate the map
> >>> ahead-of-time and leave it in place, or keep the privileged helper
> >>> process around to perform map updates on behalf of QEMU...
> >>
> >> Right, and this could be probably done by extending and tracking the RSS
> >> update via rx filter event.
> > Jason,
> > Can you please get a little into details - what you mean by 'extending
> > and tracking the RSS
>
>
> There's a monitor event which could be used for qemu to notify the
> privileged application (e.g the one has CAP_NET_ADMIN) to update the rx
> filter attributes of the host networking device.
>
> It works like, when the rx filters is updated by guest, qemu will
> generate an rx filter update event (see rxfilter_notify()) which could
> be captured by the privileged application.
>
> Then the privileged application query rx filter information via
> query-rx-filter command and do the proper setups.
>
> This is designed for macvtap but I think it might be used by RSS as well.
>
> The helper can monitor the rx-filter event and update the eBPF maps. But
> I'm not sure if it needs some coordination with libvirt in this case.
>
> Thanks
>
>
> >> update via rx filter event'?
> > Thanks,
> > Yuri
> >
> >> Thanks
> >>
> >>
> >>> -Toke
> >>>
>
>

[-- Attachment #2: Type: text/html, Size: 5637 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command.
  2021-06-15 23:16     ` Andrew Melnichenko
@ 2021-07-05 13:50       ` Andrew Melnichenko
  0 siblings, 0 replies; 31+ messages in thread
From: Andrew Melnichenko @ 2021-07-05 13:50 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Daniel P. Berrangé,
	Michael S. Tsirkin, Jason Wang, qemu-devel, Yuri Benditovich,
	Yan Vugenfirer, Eric Blake

[-- Attachment #1: Type: text/plain, Size: 11074 bytes --]

Hi all,
I'm working right now with the helper stamping.
In short - we can't check the helper stamp as it was done in qemu with
modules.
Gmodule can't load ELF executable also gmodule is not present if qemu
module feature is not set.
Also, CONFIG_STAMP is not present if there is no module feature.
There are some possible solutions:

   - Try to load ELF executable with ld.so and check the stamp symbol.
   - Launch potential helper with an argument(something like --get-stamp)
   and check the output.
   - Write ELF parser to check symbols table.

Also, add a stamp for helpers(something like QEMU_HELPER_STAMP) so the
helper should not be dependent on qemu module feature.
I think is more preferable to write ELF parser and also it should be more
secure(no .so constructors or launching unknown helper).
What do you think people?

On Wed, Jun 16, 2021 at 2:16 AM Andrew Melnichenko <andrew@daynix.com>
wrote:

> Hi all,
>
>> Seems like this function is duplicating what glib should already be
>> able to do.
>>
> Yea, but it's required a Linux specific header - without it, qemu builds
> but crashes.
>
> Could we use a compile-time determination of where we were (supposed)
>> to be installed, and therefore where our helper should be installed,
>> rather than the dynamic /proc/self/exe munging?
>>
> Yes, we can define something like CONFIG_QEMU_HELPERDIR ##
> "qemu-ebpf-rss-helper", for RSS helper.
> But I've tried to implement generic function for possible other helpers.
>
> Yeah I think avoiding /proc/self/exe is desirable, because I can
>> imagine scenarios where this can lead to picking the wrong helper.
>> Better to always use the compile time install directory.
>>
> The main scenario that find_helper() should solve - non installed qemu use
> helper from own build.
> That's why reading /proc/self/exe is implemented.
>
> So the intent is that we can make this list larger if we write other
>> helper binaries.  But this code is in an overall #ifdef CONFIG_LINUX,
>> which means it won't work on other platforms.
>>
> Yes, for now, eBPF RSS is only for virtio-net + Linux TAP.
>
> Checking F_OK (existence) instea of X_OK is odd.
>>
> Libvirt launches qemu to get different properties. That qemu may not have
> permission to launch the helper.
>
> It uses /proc/self/exe to find the running executable's directory.  This
>> is specific to Linux[*].  You get different behavior on Linux vs. other
>> systems.
>>
> The code guarded by CONFIG_LINUX.
>
> * If the host isn't Linux, it returns /usr/libexec/qemu-ebpf-rss-helper.
>>   Not good.
>>
> No,  "query-helper-paths" will return an empty list.
>
> * If Alice runs bld/x86_64-softmmu/qemu-system-x86_64, it also returns
>>   /usr/libexec/qemu-ebpf-rss-helper.  Not good.
>>
> No, /proc/self/exe dereferences "bld/x86_64-softmmu/qemu-system-x86_64"
> to "bld/qemu-system-x86_64"
> and we will get bld/qemu-ebpf-rss-helper.
>
>  The name query-helper-paths is generic, the documented purpose "Query
>> specific eBPF RSS helper" is specific.
>>
>> qemu-ebpf-rss-helper isn't necessarily the only helper that needs to be
>> in sync with QEMU.
>>
> Yea, I'll update the document.
>
> If we want to ensure the right helper runs, we should use a build
>> identifier compiled into the programs, like we do for modules.
>>
> Thanks, I'll check. Overall, current idea was to avoid the use of the
> helper
> from CONFIG_QEMU_HELPERDIR if qemu is not installed(like in your examples).
>
> Helpers QEMU code runs itself should be run as
>> CONFIG_QEMU_HELPERDIR/HELPER, with a suitable user override.  This is
>> how qemu-bridge-helper works.
>>
>> Helpers some other program runs are that other program's problem.
>> They'll probably work the same: built-in default that can be overridden
>> with configuration.
>>
> Well, for qemu it does not really matter how TAP fd was created. It can be
> the helper, Libvirt itself, or a script.
> In the end, "netdev" gets its fds and for qemu there is no difference. TAP
> fd is TAP fd.
> And Libvirt would use the same qemu-bridge-helper(from libvirt/qemu.conf)
> for every qemu "emulator".
> For eBPF we need to create specific maps(and/or thair quantities) that
> contain specific structures and for different
> qemu it may be different.
>
>
>
> On Sat, Jun 12, 2021 at 8:28 AM Markus Armbruster <armbru@redhat.com>
> wrote:
>
>> Andrew Melnychenko <andrew@daynix.com> writes:
>>
>> > New qmp command to query ebpf helper.
>> > It's crucial that qemu and helper are in sync and in touch.
>> > Technically helper should pass eBPF fds that qemu may accept.
>> > And different qemu's builds may have different eBPF programs and
>> helpers.
>> > Qemu returns helper that should "fit" to virtio-net.
>> >
>> > Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
>> > ---
>> >  monitor/qmp-cmds.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++
>> >  qapi/misc.json     | 29 +++++++++++++++++
>> >  2 files changed, 107 insertions(+)
>> >
>> > diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
>> > index f7d64a6457..5dd2a58ea2 100644
>> > --- a/monitor/qmp-cmds.c
>> > +++ b/monitor/qmp-cmds.c
>> > @@ -351,3 +351,81 @@ void qmp_display_reload(DisplayReloadOptions *arg,
>> Error **errp)
>> >          abort();
>> >      }
>> >  }
>> > +
>> > +#ifdef CONFIG_LINUX
>> > +
>> > +static const char *get_dirname(char *path)
>> > +{
>> > +    char *sep;
>> > +
>> > +    sep = strrchr(path, '/');
>> > +    if (sep == path) {
>> > +        return "/";
>> > +    } else if (sep) {
>> > +        *sep = 0;
>> > +        return path;
>> > +    }
>> > +    return ".";
>> > +}
>> > +
>> > +static char *find_helper(const char *name)
>> > +{
>> > +    char qemu_exec[PATH_MAX];
>> > +    const char *qemu_dir = NULL;
>> > +    char *helper = NULL;
>> > +
>> > +    if (name == NULL) {
>> > +        return NULL;
>> > +    }
>> > +
>> > +    if (readlink("/proc/self/exe", qemu_exec, PATH_MAX) > 0) {
>> > +        qemu_dir = get_dirname(qemu_exec);
>> > +
>> > +        helper = g_strdup_printf("%s/%s", qemu_dir, name);
>> > +        if (access(helper, F_OK) == 0) {
>> > +            return helper;
>> > +        }
>> > +        g_free(helper);
>> > +    }
>> > +
>> > +    helper = g_strdup_printf("%s/%s", CONFIG_QEMU_HELPERDIR, name);
>> > +    if (access(helper, F_OK) == 0) {
>> > +        return helper;
>> > +    }
>> > +    g_free(helper);
>> > +
>> > +    return NULL;
>> > +}
>>
>> This returns the helper in the same directory as the running executable,
>> or as a fallback the helper in CONFIG_QEMU_HELPERDIR.
>>
>> Checking F_OK (existence) instea of X_OK is odd.
>>
>> It uses /proc/self/exe to find the running executable's directory.  This
>> is specific to Linux[*].  You get different behavior on Linux vs. other
>> systems.
>>
>> CONFIG_QEMU_HELPERDIR is $prefix/libexec/.
>>
>> If $prefix is /usr, then qemu-system-FOO is normally installed in
>> /usr/bin/, and the helper in /usr/libexec/.  We look for the helper in
>> the wrong place first, and the right one only when it isn't in the wrong
>> place.  Feels overcomplicated and fragile.
>>
>> Consider the following scenario:
>>
>> * The system has a binary package's /usr/bin/qemu-system-x86_64 and
>>   /usr/libexec/qemu-ebpf-rss-helper installed
>>
>> * Alice builds her own QEMU with prefix /usr (and no intention to
>>   install), resulting in bld/qemu-system-x86_64, bld/qemu-ebpf-rss-path,
>>   and a symlink bld/x86_64-softmmu/qemu-system-x86_64.
>>
>> Now:
>>
>> * If Alice runs bld/qemu-system-x86_64, and the host is Linux,
>>   find_helper() returns bld/qemu-ebpf-rss-path.  Good.
>>
>> * If the host isn't Linux, it returns /usr/libexec/qemu-ebpf-rss-helper.
>>   Not good.
>>
>> * If Alice runs bld/x86_64-softmmu/qemu-system-x86_64, it also returns
>>   /usr/libexec/qemu-ebpf-rss-helper.  Not good.
>>
>> > +
>> > +HelperPathList *qmp_query_helper_paths(Error **errp)
>> > +{
>> > +    HelperPathList *ret = NULL;
>> > +    const char *helpers_list[] = {
>> > +#ifdef CONFIG_EBPF
>> > +        "qemu-ebpf-rss-helper",
>> > +#endif
>> > +        NULL
>> > +    };
>> > +    const char **helper_iter = helpers_list;
>> > +
>> > +    for (; *helper_iter != NULL; ++helper_iter) {
>> > +        char *path = find_helper(*helper_iter);
>> > +        if (path) {
>> > +            HelperPath *helper = g_new0(HelperPath, 1);
>> > +            helper->name = g_strdup(*helper_iter);
>> > +            helper->path = path;
>> > +
>> > +            QAPI_LIST_PREPEND(ret, helper);
>> > +        }
>> > +    }
>> > +
>> > +    return ret;
>> > +}
>> > +#else
>> > +
>> > +HelperPathList *qmp_query_helper_paths(Error **errp)
>> > +{
>> > +    return NULL;
>> > +}
>> > +
>> > +#endif
>> > diff --git a/qapi/misc.json b/qapi/misc.json
>> > index 156f98203e..023bd2120d 100644
>> > --- a/qapi/misc.json
>> > +++ b/qapi/misc.json
>> > @@ -519,3 +519,32 @@
>> >   'data': { '*option': 'str' },
>> >   'returns': ['CommandLineOptionInfo'],
>> >   'allow-preconfig': true }
>> > +
>> > +##
>> > +# @HelperPath:
>> > +#
>> > +# Name of the helper and binary location.
>> > +##
>> > +{ 'struct': 'HelperPath',
>> > +  'data': {'name': 'str', 'path': 'str'} }
>> > +
>> > +##
>> > +# @query-helper-paths:
>> > +#
>> > +# Query specific eBPF RSS helper for current qemu binary.
>> > +#
>> > +# Returns: list of object that contains name and path for helper.
>> > +#
>> > +# Example:
>> > +#
>> > +# -> { "execute": "query-helper-paths" }
>> > +# <- { "return": [
>> > +#        {
>> > +#          "name": "qemu-ebpf-rss-helper",
>> > +#          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
>> > +#        }
>> > +#      ]
>> > +#    }
>> > +#
>> > +##
>> > +{ 'command': 'query-helper-paths', 'returns': ['HelperPath'] }
>>
>> The name query-helper-paths is generic, the documented purpose "Query
>> specific eBPF RSS helper" is specific.
>>
>> qemu-ebpf-rss-helper isn't necessarily the only helper that needs to be
>> in sync with QEMU.
>>
>> I doubt a query command is a good way to help with using the right one.
>> qemu-system-FOO doesn't really know where the right one is.  Only the
>> person or program that put them where they are does.
>>
>> If we want to ensure the right helper runs, we should use a build
>> identifier compiled into the programs, like we do for modules.
>>
>> For modules, the program loading a module checks the module's build
>> identifier matches its own.
>>
>> For programs talking to each other, the peers together check their build
>> identifiers match.
>>
>> For programs where that isn't practical, the management application can
>> check.
>>
>> This should be a lot more reliable.
>>
>> Helpers QEMU code runs itself should be run as
>> CONFIG_QEMU_HELPERDIR/HELPER, with a suitable user override.  This is
>> how qemu-bridge-helper works.
>>
>> Helpers some other program runs are that other program's problem.
>> They'll probably work the same: built-in default that can be overridden
>> with configuration.
>>
>>
>> [*] For detailed advice, see
>>
>> https://stackoverflow.com/questions/1023306/finding-current-executables-path-without-proc-self-exe
>>
>>

[-- Attachment #2: Type: text/html, Size: 15941 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2021-07-05 13:51 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-09 10:04 [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd Andrew Melnychenko
2021-06-09 10:04 ` [RFC PATCH 1/5] ebpf: Added eBPF initialization by fds Andrew Melnychenko
2021-06-09 10:04 ` [RFC PATCH 2/5] virtio-net: Added property to load eBPF RSS with fds Andrew Melnychenko
2021-06-09 10:04 ` [RFC PATCH 3/5] ebpf_rss_helper: Added helper for eBPF RSS Andrew Melnychenko
2021-06-09 10:04 ` [RFC PATCH 4/5] qmp: Added qemu-ebpf-rss-path command Andrew Melnychenko
2021-06-11 14:15   ` Eric Blake
2021-06-11 17:21     ` Daniel P. Berrangé
2021-06-12  5:28   ` Markus Armbruster
2021-06-15 23:16     ` Andrew Melnichenko
2021-07-05 13:50       ` Andrew Melnichenko
2021-06-09 10:04 ` [RFC PATCH 5/5] meson: libbpf dependency now exclusively for Linux Andrew Melnychenko
2021-06-10  6:41 ` [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd Jason Wang
2021-06-10  6:55   ` Yuri Benditovich
2021-06-11  5:36     ` Jason Wang
2021-06-11 16:49       ` Andrew Melnichenko
2021-06-11 17:24         ` Daniel P. Berrangé
2021-06-15  9:13         ` Jason Wang
2021-06-15 22:18           ` Andrew Melnichenko
2021-06-18 20:03             ` Andrew Melnichenko
2021-06-21  9:20               ` Jason Wang
2021-06-22  3:29                 ` Yuri Benditovich
2021-06-22  4:58                   ` Jason Wang
2021-06-22  8:25                     ` Toke Høiland-Jørgensen
2021-06-22  8:27                       ` Daniel P. Berrangé
2021-06-22  9:09                         ` Toke Høiland-Jørgensen
2021-06-22 13:01                           ` Andrew Melnichenko
2021-06-22 13:17                             ` Toke Høiland-Jørgensen
2021-06-23  0:47                           ` Jason Wang
2021-06-28 11:18                             ` Yuri Benditovich
2021-06-29  3:39                               ` Jason Wang
2021-06-30 16:40                                 ` Andrew Melnichenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.