All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] eBPF RSS Helper support.
@ 2023-02-19 16:20 Andrew Melnychenko
  2023-02-19 16:20 ` [PATCH 1/5] ebpf: Added eBPF initialization by fds and map update Andrew Melnychenko
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Andrew Melnychenko @ 2023-02-19 16:20 UTC (permalink / raw)
  To: jasowang, mst, pbonzini, marcandre.lureau, berrange, thuth,
	philmd, armbru, eblake, qemu-devel, toke, mprivozn
  Cc: yuri.benditovich, yan

This series of patches provides the ability to initialize eBPF RSS steering
with the helper.
Now, virtio-net devices can accept eBPF programs and maps through properties
as external file descriptors. Access to the eBPF map is direct through mmap()
call, so it should not require additional capabilities to bpf* calls.
eBPF file descriptors can be passed to QEMU from parent process or by unix
socket with sendfd() qmp command.
The helper is provided that would load eBPF RSS program/maps and pass fd to
the called process(in future - Libvirtd) through unix socket.
Because of structures stored in the maps, it's crucial that the helper provides a proper eBPF program. That's why the stamp was added to the helper and
QEMU may check the binary. Also, additional qmp command was added for checking
the stamp.
Overall, the basic scenario of using the helper looks like this:
 * Libvirt checks for ebpf_fds property.
 * Libvirt ask QEMU for the proper helper(where is located and proper stamp)
 * Libvirt calls the helper with BPF capabilities and retrieves fds.
 * Libvirt launches the QEMU with eBPF fds passed.

Changes since RFC:
 * refactored/rebased code.
 * changed qmp command.
 * refactored helper.
 
Andrew Melnychenko (5):
  ebpf: Added eBPF initialization by fds and map update.
  virtio-net: Added property to load eBPF RSS with fds.
  qmp: Added the helper stamp check.
  ebpf_rss_helper: Added helper for eBPF RSS.
  qmp: Added find-ebpf-rss-helper command.

 ebpf/ebpf_rss-stub.c                       |   6 +
 ebpf/ebpf_rss.c                            | 120 ++++++--
 ebpf/ebpf_rss.h                            |  10 +
 ebpf/qemu-ebpf-rss-helper.c                | 132 +++++++++
 hw/net/virtio-net.c                        |  77 ++++-
 include/hw/virtio/virtio-net.h             |   1 +
 meson.build                                |  47 ++-
 monitor/meson.build                        |   1 +
 monitor/qemu-ebpf-rss-helper-stamp-utils.c | 322 +++++++++++++++++++++
 monitor/qemu-ebpf-rss-helper-stamp-utils.h |  39 +++
 monitor/qmp-cmds.c                         |  28 ++
 qapi/misc.json                             |  42 +++
 12 files changed, 785 insertions(+), 40 deletions(-)
 create mode 100644 ebpf/qemu-ebpf-rss-helper.c
 create mode 100644 monitor/qemu-ebpf-rss-helper-stamp-utils.c
 create mode 100644 monitor/qemu-ebpf-rss-helper-stamp-utils.h

-- 
2.39.1



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/5] ebpf: Added eBPF initialization by fds and map update.
  2023-02-19 16:20 [PATCH 0/5] eBPF RSS Helper support Andrew Melnychenko
@ 2023-02-19 16:20 ` Andrew Melnychenko
  2023-02-19 16:20 ` [PATCH 2/5] virtio-net: Added property to load eBPF RSS with fds Andrew Melnychenko
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: Andrew Melnychenko @ 2023-02-19 16:20 UTC (permalink / raw)
  To: jasowang, mst, pbonzini, marcandre.lureau, berrange, thuth,
	philmd, armbru, eblake, qemu-devel, toke, mprivozn
  Cc: yuri.benditovich, yan

Changed eBPF map updates through mmaped array.
Mmaped arrays provide direct access to map data.
It should omit using bpf_map_update_elem() call,
which may require capabilities that are not present.
Also, eBPF RSS context may be initialized by file descriptors.
virtio-net may provide fds passed by libvirt.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 ebpf/ebpf_rss-stub.c |   6 +++
 ebpf/ebpf_rss.c      | 120 ++++++++++++++++++++++++++++++++++---------
 ebpf/ebpf_rss.h      |  10 ++++
 3 files changed, 113 insertions(+), 23 deletions(-)

diff --git a/ebpf/ebpf_rss-stub.c b/ebpf/ebpf_rss-stub.c
index e71e229190..8d7fae2ad9 100644
--- a/ebpf/ebpf_rss-stub.c
+++ b/ebpf/ebpf_rss-stub.c
@@ -28,6 +28,12 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
     return false;
 }
 
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+                       int config_fd, int toeplitz_fd, int table_fd)
+{
+    return false;
+}
+
 bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
                       uint16_t *indirections_table, uint8_t *toeplitz_key)
 {
diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
index cee658c158..08015fecb1 100644
--- a/ebpf/ebpf_rss.c
+++ b/ebpf/ebpf_rss.c
@@ -27,19 +27,68 @@ void ebpf_rss_init(struct EBPFRSSContext *ctx)
 {
     if (ctx != NULL) {
         ctx->obj = NULL;
+        ctx->program_fd = -1;
     }
 }
 
 bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
 {
-    return ctx != NULL && ctx->obj != NULL;
+    return ctx != NULL && (ctx->obj != NULL || ctx->program_fd != -1);
+}
+
+static bool ebpf_rss_mmap(struct EBPFRSSContext *ctx)
+{
+    if (!ebpf_rss_is_loaded(ctx)) {
+        return false;
+    }
+
+    ctx->mmap_configuration = mmap(NULL, qemu_real_host_page_size(),
+                                   PROT_READ | PROT_WRITE, MAP_SHARED,
+                                   ctx->map_configuration, 0);
+    if (ctx->mmap_configuration == MAP_FAILED) {
+        trace_ebpf_error("eBPF RSS", "can not mmap eBPF configuration array");
+        return false;
+    }
+    ctx->mmap_toeplitz_key = mmap(NULL, qemu_real_host_page_size(),
+                                   PROT_READ | PROT_WRITE, MAP_SHARED,
+                                   ctx->map_toeplitz_key, 0);
+    if (ctx->mmap_toeplitz_key == MAP_FAILED) {
+        trace_ebpf_error("eBPF RSS", "can not mmap eBPF toeplitz key");
+        goto toeplitz_fail;
+    }
+    ctx->mmap_indirections_table = mmap(NULL, qemu_real_host_page_size(),
+                                   PROT_READ | PROT_WRITE, MAP_SHARED,
+                                   ctx->map_indirections_table, 0);
+    if (ctx->mmap_indirections_table == MAP_FAILED) {
+        trace_ebpf_error("eBPF RSS", "can not mmap eBPF indirection table");
+        goto indirection_fail;
+    }
+
+    return true;
+
+indirection_fail:
+    munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
+toeplitz_fail:
+    munmap(ctx->mmap_configuration, qemu_real_host_page_size());
+    return false;
+}
+
+static void ebpf_rss_munmap(struct EBPFRSSContext *ctx)
+{
+    if (!ebpf_rss_is_loaded(ctx)) {
+        return;
+    }
+
+    munmap(ctx->mmap_indirections_table, qemu_real_host_page_size());
+    munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
+    munmap(ctx->mmap_configuration, qemu_real_host_page_size());
 }
 
 bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 {
     struct rss_bpf *rss_bpf_ctx;
 
-    if (ctx == NULL) {
+    if (ctx == NULL || ebpf_rss_is_loaded(ctx)) {
         return false;
     }
 
@@ -66,26 +115,51 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
     ctx->map_toeplitz_key = bpf_map__fd(
             rss_bpf_ctx->maps.tap_rss_map_toeplitz_key);
 
+    if (!ebpf_rss_mmap(ctx)) {
+        goto error;
+    }
+
     return true;
 error:
     rss_bpf__destroy(rss_bpf_ctx);
     ctx->obj = NULL;
+    ctx->program_fd = -1;
 
     return false;
 }
 
-static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
-                                struct EBPFRSSConfig *config)
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+                       int config_fd, int toeplitz_fd, int table_fd)
 {
-    uint32_t map_key = 0;
+    if (ctx == NULL || ebpf_rss_is_loaded(ctx)) {
+        return false;
+    }
 
-    if (!ebpf_rss_is_loaded(ctx)) {
+    if (program_fd < 0 || config_fd < 0 || toeplitz_fd < 0 || table_fd < 0) {
         return false;
     }
-    if (bpf_map_update_elem(ctx->map_configuration,
-                            &map_key, config, 0) < 0) {
+
+    ctx->program_fd = program_fd;
+    ctx->map_configuration = config_fd;
+    ctx->map_toeplitz_key = toeplitz_fd;
+    ctx->map_indirections_table = table_fd;
+
+    if (!ebpf_rss_mmap(ctx)) {
+        ctx->program_fd = -1;
         return false;
     }
+
+    return true;
+}
+
+static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
+                                struct EBPFRSSConfig *config)
+{
+    if (!ebpf_rss_is_loaded(ctx)) {
+        return false;
+    }
+
+    memcpy(ctx->mmap_configuration, config, sizeof(*config));
     return true;
 }
 
@@ -93,27 +167,19 @@ static bool ebpf_rss_set_indirections_table(struct EBPFRSSContext *ctx,
                                             uint16_t *indirections_table,
                                             size_t len)
 {
-    uint32_t i = 0;
-
     if (!ebpf_rss_is_loaded(ctx) || indirections_table == NULL ||
        len > VIRTIO_NET_RSS_MAX_TABLE_LEN) {
         return false;
     }
 
-    for (; i < len; ++i) {
-        if (bpf_map_update_elem(ctx->map_indirections_table, &i,
-                                indirections_table + i, 0) < 0) {
-            return false;
-        }
-    }
+    memcpy(ctx->mmap_indirections_table, indirections_table,
+            sizeof(*indirections_table) * len);
     return true;
 }
 
 static bool ebpf_rss_set_toepliz_key(struct EBPFRSSContext *ctx,
                                      uint8_t *toeplitz_key)
 {
-    uint32_t map_key = 0;
-
     /* prepare toeplitz key */
     uint8_t toe[VIRTIO_NET_RSS_MAX_KEY_SIZE] = {};
 
@@ -123,10 +189,7 @@ static bool ebpf_rss_set_toepliz_key(struct EBPFRSSContext *ctx,
     memcpy(toe, toeplitz_key, VIRTIO_NET_RSS_MAX_KEY_SIZE);
     *(uint32_t *)toe = ntohl(*(uint32_t *)toe);
 
-    if (bpf_map_update_elem(ctx->map_toeplitz_key, &map_key, toe,
-                            0) < 0) {
-        return false;
-    }
+    memcpy(ctx->mmap_toeplitz_key, toe, VIRTIO_NET_RSS_MAX_KEY_SIZE);
     return true;
 }
 
@@ -160,6 +223,17 @@ void ebpf_rss_unload(struct EBPFRSSContext *ctx)
         return;
     }
 
-    rss_bpf__destroy(ctx->obj);
+    ebpf_rss_munmap(ctx);
+
+    if (ctx->obj) {
+        rss_bpf__destroy(ctx->obj);
+    } else {
+        close(ctx->program_fd);
+        close(ctx->map_configuration);
+        close(ctx->map_toeplitz_key);
+        close(ctx->map_indirections_table);
+    }
+
     ctx->obj = NULL;
+    ctx->program_fd = -1;
 }
diff --git a/ebpf/ebpf_rss.h b/ebpf/ebpf_rss.h
index bf3f2572c7..239242b0d2 100644
--- a/ebpf/ebpf_rss.h
+++ b/ebpf/ebpf_rss.h
@@ -14,12 +14,19 @@
 #ifndef QEMU_EBPF_RSS_H
 #define QEMU_EBPF_RSS_H
 
+#define EBPF_RSS_MAX_FDS 4
+
 struct EBPFRSSContext {
     void *obj;
     int program_fd;
     int map_configuration;
     int map_toeplitz_key;
     int map_indirections_table;
+
+    /* mapped eBPF maps for direct access to omit bpf_map_update_elem() */
+    void *mmap_configuration;
+    void *mmap_toeplitz_key;
+    void *mmap_indirections_table;
 };
 
 struct EBPFRSSConfig {
@@ -36,6 +43,9 @@ bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx);
 
 bool ebpf_rss_load(struct EBPFRSSContext *ctx);
 
+bool ebpf_rss_load_fds(struct EBPFRSSContext *ctx, int program_fd,
+                       int config_fd, int toeplitz_fd, int table_fd);
+
 bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
                       uint16_t *indirections_table, uint8_t *toeplitz_key);
 
-- 
2.39.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/5] virtio-net: Added property to load eBPF RSS with fds.
  2023-02-19 16:20 [PATCH 0/5] eBPF RSS Helper support Andrew Melnychenko
  2023-02-19 16:20 ` [PATCH 1/5] ebpf: Added eBPF initialization by fds and map update Andrew Melnychenko
@ 2023-02-19 16:20 ` Andrew Melnychenko
  2023-02-19 16:20 ` [PATCH 3/5] qmp: Added the helper stamp check Andrew Melnychenko
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: Andrew Melnychenko @ 2023-02-19 16:20 UTC (permalink / raw)
  To: jasowang, mst, pbonzini, marcandre.lureau, berrange, thuth,
	philmd, armbru, eblake, qemu-devel, toke, mprivozn
  Cc: yuri.benditovich, yan

eBPF RSS program and maps may now be passed during initialization.
Initially was implemented for libvirt to launch qemu without permissions,
and initialized eBPF program through the helper.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 hw/net/virtio-net.c            | 77 ++++++++++++++++++++++++++++++++--
 include/hw/virtio/virtio-net.h |  1 +
 2 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 3ae909041a..0ab2cf33f9 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -42,6 +42,7 @@
 #include "sysemu/sysemu.h"
 #include "trace.h"
 #include "monitor/qdev.h"
+#include "monitor/monitor.h"
 #include "hw/pci/pci_device.h"
 #include "net_rx_pkt.h"
 #include "hw/virtio/vhost.h"
@@ -1290,14 +1291,81 @@ static void virtio_net_detach_epbf_rss(VirtIONet *n)
     virtio_net_attach_ebpf_to_backend(n->nic, -1);
 }
 
-static bool virtio_net_load_ebpf(VirtIONet *n)
+static int virtio_net_get_ebpf_rss_fds(char *str, char *fds[], int nfds)
+{
+    char *ptr = str;
+    char *cur = NULL;
+    size_t len = strlen(str);
+    int i = 0;
+
+    for (; i < nfds && ptr < str + len;) {
+        cur = strchr(ptr, ':');
+
+        if (cur == NULL) {
+            fds[i] = g_strdup(ptr);
+        } else {
+            fds[i] = g_strndup(ptr, cur - ptr);
+        }
+
+        i++;
+        if (cur == NULL) {
+            break;
+        } else {
+            ptr = cur + 1;
+        }
+    }
+
+    return i;
+}
+
+static bool virtio_net_load_ebpf_fds(VirtIONet *n)
 {
-    if (!virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
-        /* backend does't support steering ebpf */
+    char *fds_strs[EBPF_RSS_MAX_FDS];
+    int fds[EBPF_RSS_MAX_FDS];
+    int nfds;
+    int ret = false;
+    Error *errp;
+    int i = 0;
+
+    if (n == NULL || !n->ebpf_rss_fds) {
         return false;
     }
 
-    return ebpf_rss_load(&n->ebpf_rss);
+    nfds = virtio_net_get_ebpf_rss_fds(n->ebpf_rss_fds,
+                                       fds_strs, EBPF_RSS_MAX_FDS);
+    for (i = 0; i < nfds; i++) {
+        fds[i] = monitor_fd_param(monitor_cur(), fds_strs[i], &errp);
+    }
+
+    if (nfds == EBPF_RSS_MAX_FDS) {
+        ret = ebpf_rss_load_fds(&n->ebpf_rss, fds[0], fds[1], fds[2], fds[3]);
+    }
+
+    if (!ret) {
+        for (i = 0; i < nfds; i++) {
+            close(fds[i]);
+        }
+    }
+
+    for (i = 0; i < nfds; i++) {
+        g_free(fds_strs[i]);
+    }
+
+    return ret;
+}
+
+static bool virtio_net_load_ebpf(VirtIONet *n)
+{
+    bool ret = true;
+
+    if (virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
+        if (!(n->ebpf_rss_fds
+                && virtio_net_load_ebpf_fds(n))) {
+            ret = ebpf_rss_load(&n->ebpf_rss);
+        }
+    }
+
+    return ret;
 }
 
 static void virtio_net_unload_ebpf(VirtIONet *n)
@@ -3868,6 +3936,7 @@ static Property virtio_net_properties[] = {
                     VIRTIO_NET_F_RSS, false),
     DEFINE_PROP_BIT64("hash", VirtIONet, host_features,
                     VIRTIO_NET_F_HASH_REPORT, false),
+    DEFINE_PROP_STRING("ebpf_rss_fds", VirtIONet, ebpf_rss_fds),
     DEFINE_PROP_BIT64("guest_rsc_ext", VirtIONet, host_features,
                     VIRTIO_NET_F_RSC_EXT, false),
     DEFINE_PROP_UINT32("rsc_interval", VirtIONet, rsc_timeout,
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index ef234ffe7e..e10ce88f91 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -219,6 +219,7 @@ struct VirtIONet {
     VirtioNetRssData rss_data;
     struct NetRxPkt *rx_pkt;
     struct EBPFRSSContext ebpf_rss;
+    char *ebpf_rss_fds;
 };
 
 size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
-- 
2.39.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/5] qmp: Added the helper stamp check.
  2023-02-19 16:20 [PATCH 0/5] eBPF RSS Helper support Andrew Melnychenko
  2023-02-19 16:20 ` [PATCH 1/5] ebpf: Added eBPF initialization by fds and map update Andrew Melnychenko
  2023-02-19 16:20 ` [PATCH 2/5] virtio-net: Added property to load eBPF RSS with fds Andrew Melnychenko
@ 2023-02-19 16:20 ` Andrew Melnychenko
  2023-02-20  9:49   ` Daniel P. Berrangé
  2023-02-19 16:20 ` [PATCH 4/5] ebpf_rss_helper: Added helper for eBPF RSS Andrew Melnychenko
  2023-02-19 16:21 ` [PATCH 5/5] qmp: Added find-ebpf-rss-helper command Andrew Melnychenko
  4 siblings, 1 reply; 26+ messages in thread
From: Andrew Melnychenko @ 2023-02-19 16:20 UTC (permalink / raw)
  To: jasowang, mst, pbonzini, marcandre.lureau, berrange, thuth,
	philmd, armbru, eblake, qemu-devel, toke, mprivozn
  Cc: yuri.benditovich, yan

Added a function to check the stamp in the helper.
eBPF helper should have a special symbol that generates during the build.
QEMU checks the helper and determines that it fits, so the helper
will produce proper output.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 meson.build                                |  10 +
 monitor/meson.build                        |   1 +
 monitor/qemu-ebpf-rss-helper-stamp-utils.c | 322 +++++++++++++++++++++
 monitor/qemu-ebpf-rss-helper-stamp-utils.h |  39 +++
 4 files changed, 372 insertions(+)
 create mode 100644 monitor/qemu-ebpf-rss-helper-stamp-utils.c
 create mode 100644 monitor/qemu-ebpf-rss-helper-stamp-utils.h

diff --git a/meson.build b/meson.build
index a76c855312..b409912aed 100644
--- a/meson.build
+++ b/meson.build
@@ -2868,6 +2868,16 @@ foreach d : hx_headers
 endforeach
 genh += hxdep
 
+ebpf_rss_helper_stamp = custom_target(
+    'qemu-ebpf-rss-helper-stamp.h',
+    output : 'qemu-ebpf-rss-helper-stamp.h',
+    input : 'ebpf/rss.bpf.skeleton.h',
+    command : [python, '-c', 'import hashlib; print(\'#define QEMU_EBPF_RSS_HELPER_STAMP qemuEbpfRssHelperStamp_{}\'.format(hashlib.sha1(open(\'@INPUT@\', \'rb\').read()).hexdigest()))'],
+    capture: true,
+)
+
+genh += ebpf_rss_helper_stamp
+
 ###################
 # Collect sources #
 ###################
diff --git a/monitor/meson.build b/monitor/meson.build
index ccb4d1a8e6..36de73414b 100644
--- a/monitor/meson.build
+++ b/monitor/meson.build
@@ -6,6 +6,7 @@ softmmu_ss.add(files(
   'hmp.c',
 ))
 softmmu_ss.add([spice_headers, files('qmp-cmds.c')])
+softmmu_ss.add(files('qemu-ebpf-rss-helper-stamp-utils.c'))
 
 specific_ss.add(when: 'CONFIG_SOFTMMU',
 		if_true: [files( 'hmp-cmds-target.c', 'hmp-target.c'), spice])
diff --git a/monitor/qemu-ebpf-rss-helper-stamp-utils.c b/monitor/qemu-ebpf-rss-helper-stamp-utils.c
new file mode 100644
index 0000000000..23efc36ef0
--- /dev/null
+++ b/monitor/qemu-ebpf-rss-helper-stamp-utils.c
@@ -0,0 +1,322 @@
+/*
+ * QEMU helper stamp check utils.
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko <andrew@daynix.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Description: This file mostly implements helper stamp checking.
+ *              The stamp is implemented in a similar way as in qemu modules.
+ *              The helper should contain a specific symbol.
+ *              Not in a similar way is symbol checking - here we parse
+ *              the ELF file. For now, only eBPF helper contains
+ *              the stamp, and the stamp is generated from
+ *              sha1 ebpf/rss.bpf.skeleton.h (see meson.build).
+ */
+
+#include "qemu/osdep.h"
+#include "elf.h"
+#include "qemu-ebpf-rss-helper-stamp-utils.h"
+
+#include <glib/gstdio.h>
+
+#ifdef CONFIG_LINUX
+
+static void *file_allocate_and_read(int fd, off_t off, size_t size)
+{
+    void *data;
+    int err;
+
+    if (fd < 0) {
+        return NULL;
+    }
+
+    err = lseek(fd, off, SEEK_SET);
+    if (err < 0) {
+        return NULL;
+    }
+
+    data = g_new0(char, size);
+    if (data == NULL) {
+        return NULL;
+    }
+
+    err = read(fd, data, size);
+    if (err < 0) {
+        g_free(data);
+        return NULL;
+    }
+
+    return data;
+}
+
+static Elf64_Shdr *elf64_get_section_table(int fd, Elf64_Ehdr *elf_header)
+{
+    if (elf_header == NULL) {
+        return NULL;
+    }
+    return (Elf64_Shdr *)file_allocate_and_read(fd, elf_header->e_shoff,
+                             elf_header->e_shnum * elf_header->e_shentsize);
+}
+
+static Elf32_Shdr *elf32_get_section_table(int fd, Elf32_Ehdr *elf_header)
+{
+    if (elf_header == NULL) {
+        return NULL;
+    }
+    return (Elf32_Shdr *)file_allocate_and_read(fd, elf_header->e_shoff,
+                             elf_header->e_shnum * elf_header->e_shentsize);
+}
+
+static void *elf64_get_section_data(int fd, const Elf64_Shdr* section_header)
+{
+    if (fd < 0 || section_header == NULL) {
+        return NULL;
+    }
+    return file_allocate_and_read(fd, section_header->sh_offset,
+                                  section_header->sh_size);
+}
+
+static void *elf32_get_section_data(int fd, const Elf32_Shdr* section_header)
+{
+    if (fd < 0 || section_header == NULL) {
+        return NULL;
+    }
+    return file_allocate_and_read(fd, section_header->sh_offset,
+                                  section_header->sh_size);
+}
+
+static bool elf64_check_symbol_in_symbol_table(int fd,
+                                               Elf64_Shdr *section_table,
+                                               Elf64_Shdr *symbol_section,
+                                               const char *symbol)
+{
+    Elf64_Sym *symbol_table;
+    char *string_table;
+    uint32_t i;
+    bool ret = false;
+
+    symbol_table = (Elf64_Sym *) elf64_get_section_data(fd, symbol_section);
+    if (symbol_table == NULL) {
+        return false;
+    }
+
+    string_table = (char *) elf64_get_section_data(
+            fd, section_table + symbol_section->sh_link);
+    if (string_table == NULL) {
+        g_free(symbol_table);
+        return false;
+    }
+
+    for (i = 0; i < (symbol_section->sh_size / sizeof(Elf64_Sym)); ++i) {
+        if (strncmp((string_table + symbol_table[i].st_name),
+                     symbol, strlen(symbol)) == 0)
+        {
+            ret = true;
+            break;
+        }
+    }
+
+    g_free(string_table);
+    g_free(symbol_table);
+    return ret;
+}
+
+static bool elf32_check_symbol_in_symbol_table(int fd,
+                                               Elf32_Shdr *section_table,
+                                               Elf32_Shdr *symbol_section,
+                                               const char *symbol)
+{
+    Elf32_Sym *symbol_table;
+    char *string_table;
+    uint32_t i;
+    bool ret = false;
+
+    symbol_table = (Elf32_Sym *) elf32_get_section_data(fd, symbol_section);
+    if (symbol_table == NULL) {
+        return false;
+    }
+
+    string_table = (char *) elf32_get_section_data(fd,
+                                       section_table + symbol_section->sh_link);
+    if (string_table == NULL) {
+        g_free(symbol_table);
+        return false;
+    }
+
+    for (i = 0; i < (symbol_section->sh_size / sizeof(Elf32_Sym)); ++i) {
+        if (strncmp((string_table + symbol_table[i].st_name),
+                     symbol, strlen(symbol)) == 0)
+        {
+            ret = true;
+            break;
+        }
+    }
+
+    g_free(string_table);
+    g_free(symbol_table);
+    return ret;
+}
+
+static bool elf64_check_stamp(int fd, Elf64_Ehdr *elf_header, const char *stamp)
+{
+    Elf64_Shdr *section_table;
+    size_t i;
+    bool ret = false;
+
+    section_table = elf64_get_section_table(fd, elf_header);
+    if (section_table == NULL) {
+        return false;
+    }
+
+    for (i = 0; i < elf_header->e_shnum; ++i) {
+        if ((section_table[i].sh_type == SHT_SYMTAB)
+             || (section_table[i].sh_type == SHT_DYNSYM)) {
+            if (elf64_check_symbol_in_symbol_table(fd, section_table,
+                                                   section_table + i, stamp)) {
+                ret = true;
+                break;
+            }
+        }
+    }
+
+    g_free(section_table);
+    return ret;
+}
+
+static bool elf32_check_stamp(int fd, Elf32_Ehdr *elf_header, const char *stamp)
+{
+    Elf32_Shdr *section_table;
+    size_t i;
+    bool ret = false;
+
+    section_table = elf32_get_section_table(fd, elf_header);
+    if (section_table == NULL) {
+        return false;
+    }
+
+    for (i = 0; i < elf_header->e_shnum; ++i) {
+        if ((section_table[i].sh_type == SHT_SYMTAB)
+             || (section_table[i].sh_type == SHT_DYNSYM)) {
+            if (elf32_check_symbol_in_symbol_table(fd, section_table,
+                                                   section_table + i, stamp)) {
+                ret = true;
+                break;
+            }
+        }
+    }
+
+    g_free(section_table);
+    return ret;
+}
+
+static bool qemu_check_helper_stamp(const char *path, const char *stamp)
+{
+    int fd;
+    bool ret = false;
+    Elf64_Ehdr *elf_header;
+
+    fd = open(path, O_RDONLY | O_SYNC);
+    if (fd < 0) {
+        return false;
+    }
+
+    elf_header = (Elf64_Ehdr *)file_allocate_and_read(
+            fd, 0, sizeof(Elf64_Ehdr));
+    if (elf_header == NULL) {
+        goto error;
+    }
+
+    if (strncmp((char *)elf_header->e_ident, ELFMAG, SELFMAG)) {
+        g_free(elf_header);
+        goto error;
+    }
+
+    if (elf_header->e_ident[EI_CLASS] == ELFCLASS64) {
+        ret = elf64_check_stamp(fd, elf_header, stamp);
+    } else if (elf_header->e_ident[EI_CLASS] == ELFCLASS32) {
+        ret = elf32_check_stamp(fd, (Elf32_Ehdr *)elf_header, stamp);
+    }
+
+    g_free(elf_header);
+error:
+    close(fd);
+    return ret;
+}
+
+#else
+
+static bool qemu_check_helper_stamp(const char *path, const char *stamp)
+{
+    return false;
+}
+
+#endif
+
+char *qemu_find_default_ebpf_rss_helper(void)
+{
+    char *qemu_exec = NULL;
+    char *qemu_dir = NULL;
+    char *helper = NULL;
+
+    helper = g_build_filename(CONFIG_QEMU_HELPERDIR,
+            QEMU_DEFAULT_EBPF_RSS_HELPER_BIN_NAME, NULL);
+    if (g_access(helper, X_OK) == 0
+        && qemu_check_helper_stamp(helper, QEMU_EBPF_RSS_HELPER_STAMP_STR)) {
+        return helper;
+    }
+    g_free(helper);
+
+#ifdef CONFIG_LINUX
+    qemu_exec = g_file_read_link("/proc/self/exe", NULL);
+#else
+    qemu_exec = NULL;
+#endif
+    if (qemu_exec != NULL) {
+        qemu_dir = g_path_get_dirname(qemu_exec);
+        g_free(qemu_exec);
+        helper = g_build_filename(qemu_dir,
+                QEMU_DEFAULT_EBPF_RSS_HELPER_BIN_NAME, NULL);
+        g_free(qemu_dir);
+        if (g_access(helper, X_OK) == 0
+           && qemu_check_helper_stamp(helper, QEMU_EBPF_RSS_HELPER_STAMP_STR)) {
+            return helper;
+        }
+        g_free(helper);
+    }
+
+    return NULL;
+}
+
+char *qemu_check_suggested_ebpf_rss_helper(const char *path)
+{
+    char *helperbin = NULL;
+    struct stat statbuf; /* NOTE: use GStatBuf? */
+
+    /* check is dir or file */
+    if (g_stat(path, &statbuf) < 0) {
+        return NULL;
+    }
+
+    if (statbuf.st_mode & S_IFDIR) {
+        /* is dir */
+        helperbin = g_build_filename(path,
+                QEMU_DEFAULT_EBPF_RSS_HELPER_BIN_NAME, NULL);
+
+    } else if (statbuf.st_mode & S_IFREG) {
+        /* is file */
+        helperbin = g_strdup(path);
+    }
+
+    if (qemu_check_helper_stamp(helperbin, QEMU_EBPF_RSS_HELPER_STAMP_STR)) {
+        return helperbin;
+    }
+
+    g_free(helperbin);
+
+    return NULL;
+}
diff --git a/monitor/qemu-ebpf-rss-helper-stamp-utils.h b/monitor/qemu-ebpf-rss-helper-stamp-utils.h
new file mode 100644
index 0000000000..7490568aa1
--- /dev/null
+++ b/monitor/qemu-ebpf-rss-helper-stamp-utils.h
@@ -0,0 +1,39 @@
+/*
+ * QEMU helper stamp check utils.
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko <andrew@daynix.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_QEMU_HELPER_STAMP_UTILS_H
+#define QEMU_QEMU_HELPER_STAMP_UTILS_H
+
+#include "qemu-ebpf-rss-helper-stamp.h" /* generated stamp per build */
+
+#define QEMU_EBPF_RSS_HELPER_STAMP_STR     stringify(QEMU_EBPF_RSS_HELPER_STAMP)
+
+#define QEMU_DEFAULT_EBPF_RSS_HELPER_BIN_NAME "qemu-ebpf-rss-helper"
+
+/**
+ * Trying to find the helper with a valid stamp in HELPERDIR
+ * or next to the QEMU binary.
+ * @return path to the eBPF RSS helper bin or NULL(helper not found).
+ */
+char *qemu_find_default_ebpf_rss_helper(void);
+
+/**
+ * Check the helper by the suggested path. The helper should have a valid stamp.
+ * @param path - it can be either a file or directory path.
+ * For the file - checks the stamp of the file.
+ * For the directory - looks for QEMU_DEFAULT_EBPF_RSS_HELPER_BIN_NAME
+ * and checks the stamp of that file.
+ * @return path to valid eBPF RSS helper bin or NULL.
+ */
+char *qemu_check_suggested_ebpf_rss_helper(const char *path);
+
+#endif /* QEMU_QEMU_HELPER_STAMP_UTILS_H */
-- 
2.39.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/5] ebpf_rss_helper: Added helper for eBPF RSS.
  2023-02-19 16:20 [PATCH 0/5] eBPF RSS Helper support Andrew Melnychenko
                   ` (2 preceding siblings ...)
  2023-02-19 16:20 ` [PATCH 3/5] qmp: Added the helper stamp check Andrew Melnychenko
@ 2023-02-19 16:20 ` Andrew Melnychenko
  2023-02-20  9:54   ` Daniel P. Berrangé
  2023-02-19 16:21 ` [PATCH 5/5] qmp: Added find-ebpf-rss-helper command Andrew Melnychenko
  4 siblings, 1 reply; 26+ messages in thread
From: Andrew Melnychenko @ 2023-02-19 16:20 UTC (permalink / raw)
  To: jasowang, mst, pbonzini, marcandre.lureau, berrange, thuth,
	philmd, armbru, eblake, qemu-devel, toke, mprivozn
  Cc: yuri.benditovich, yan

Helper program. Loads eBPF RSS program and maps and passes them through unix socket.
Libvirt may launch this helper and pass eBPF fds to qemu virtio-net.
Also, libbpf dependency for now is exclusively for Linux.
Libbpf is used for eBPF RSS steering, which is supported only by Linux TAP.
There is no reason yet to build eBPF loader and helper for non-Linux systems,
even if libbpf is present.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 ebpf/qemu-ebpf-rss-helper.c | 132 ++++++++++++++++++++++++++++++++++++
 meson.build                 |  37 ++++++----
 2 files changed, 156 insertions(+), 13 deletions(-)
 create mode 100644 ebpf/qemu-ebpf-rss-helper.c

diff --git a/ebpf/qemu-ebpf-rss-helper.c b/ebpf/qemu-ebpf-rss-helper.c
new file mode 100644
index 0000000000..348d26bcdd
--- /dev/null
+++ b/ebpf/qemu-ebpf-rss-helper.c
@@ -0,0 +1,132 @@
+/*
+ * eBPF RSS Helper
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko <andrew@daynix.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Description: This is helper program for libvirtd.
+ *              It loads eBPF RSS program and passes fds through unix socket.
+ *              Built by meson, target - 'qemu-ebpf-rss-helper'.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <getopt.h>
+#include <memory.h>
+#include <errno.h>
+#include <sys/socket.h>
+
+#include "ebpf_rss.h"
+
+#include "qemu-ebpf-rss-helper-stamp.h"
+
+void QEMU_EBPF_RSS_HELPER_STAMP(void) {}
+
+static int send_fds(int socket, int *fds, int n)
+{
+    struct msghdr msg = {};
+    struct cmsghdr *cmsg = NULL;
+    char buf[CMSG_SPACE(n * sizeof(int))];
+    char dummy_buffer = 0;
+    struct iovec io = { .iov_base = &dummy_buffer,
+                        .iov_len = sizeof(dummy_buffer) };
+
+    memset(buf, 0, sizeof(buf));
+
+    msg.msg_iov = &io;
+    msg.msg_iovlen = 1;
+    msg.msg_control = buf;
+    msg.msg_controllen = sizeof(buf);
+
+    cmsg = CMSG_FIRSTHDR(&msg);
+    cmsg->cmsg_level = SOL_SOCKET;
+    cmsg->cmsg_type = SCM_RIGHTS;
+    cmsg->cmsg_len = CMSG_LEN(n * sizeof(int));
+
+    memcpy(CMSG_DATA(cmsg), fds, n * sizeof(int));
+
+    return sendmsg(socket, &msg, 0);
+}
+
+static void print_help_and_exit(const char *prog, int exitcode)
+{
+    fprintf(stderr, "%s - load eBPF RSS program for qemu and pass eBPF fds"
+            " through unix socket.\n", prog);
+    fprintf(stderr, "\t--fd <num>, -f <num> - unix socket file descriptor"
+            " used to pass eBPF fds.\n");
+    fprintf(stderr, "\t--help, -h - this help.\n");
+    exit(exitcode);
+}
+
+int main(int argc, char **argv)
+{
+    char *fd_string = NULL;
+    int unix_fd = 0;
+    struct EBPFRSSContext ctx = {};
+    int fds[EBPF_RSS_MAX_FDS] = {};
+    int ret = -1;
+
+    for (;;) {
+        int c;
+        static struct option long_options[] = {
+                {"help",  no_argument, 0, 'h'},
+                {"fd",  required_argument, 0, 'f'},
+                {0, 0, 0, 0}
+        };
+        c = getopt_long(argc, argv, "hf:",
+                long_options, NULL);
+
+        if (c == -1) {
+            break;
+        }
+
+        switch (c) {
+        case 'f':
+            fd_string = optarg;
+            break;
+        case 'h':
+        default:
+            print_help_and_exit(argv[0],
+                    c == 'h' ? EXIT_SUCCESS : EXIT_FAILURE);
+        }
+    }
+
+    if (!fd_string) {
+        fprintf(stderr, "Unix file descriptor not present.\n");
+        print_help_and_exit(argv[0], EXIT_FAILURE);
+    }
+
+    unix_fd = atoi(fd_string);
+
+    if (!unix_fd) {
+        fprintf(stderr, "Unix file descriptor is invalid.\n");
+        return EXIT_FAILURE;
+    }
+
+    ebpf_rss_init(&ctx);
+    if (!ebpf_rss_load(&ctx)) {
+        fprintf(stderr, "Can't load ebpf.\n");
+        return EXIT_FAILURE;
+    }
+    fds[0] = ctx.program_fd;
+    fds[1] = ctx.map_configuration;
+    fds[2] = ctx.map_toeplitz_key;
+    fds[3] = ctx.map_indirections_table;
+
+    ret = send_fds(unix_fd, fds, EBPF_RSS_MAX_FDS);
+    if (ret < 0) {
+        fprintf(stderr, "Issue while sending fds: %s.\n", strerror(errno));
+    }
+
+    ebpf_rss_unload(&ctx);
+
+    return ret < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
+}
+
diff --git a/meson.build b/meson.build
index b409912aed..6e6e2f3e40 100644
--- a/meson.build
+++ b/meson.build
@@ -1632,19 +1632,22 @@ elif get_option('vduse_blk_export').disabled()
 endif
 
 # libbpf
-libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
-if libbpf.found() and not cc.links('''
-   #include <bpf/libbpf.h>
-   int main(void)
-   {
-     bpf_object__destroy_skeleton(NULL);
-     return 0;
-   }''', dependencies: libbpf)
-  libbpf = not_found
-  if get_option('bpf').enabled()
-    error('libbpf skeleton test failed')
-  else
-    warning('libbpf skeleton test failed, disabling')
+libbpf = not_found
+if targetos == 'linux'
+  libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
+  if libbpf.found() and not cc.links('''
+    #include <bpf/libbpf.h>
+    int main(void)
+    {
+      bpf_object__destroy_skeleton(NULL);
+      return 0;
+    }''', dependencies: libbpf)
+    libbpf = not_found
+    if get_option('bpf').enabled()
+      error('libbpf skeleton test failed')
+    else
+      warning('libbpf skeleton test failed, disabling')
+    endif
   endif
 endif
 
@@ -3646,6 +3649,14 @@ if have_tools
                dependencies: [authz, crypto, io, qom, qemuutil,
                               libcap_ng, mpathpersist],
                install: true)
+
+    if libbpf.found()
+        executable('qemu-ebpf-rss-helper', files(
+                   'ebpf/qemu-ebpf-rss-helper.c', 'ebpf/ebpf_rss.c'),
+                   dependencies: [qemuutil, libbpf, glib],
+                   install: true,
+                   install_dir: get_option('libexecdir'))
+    endif
   endif
 
   if have_ivshmem
-- 
2.39.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/5] qmp: Added find-ebpf-rss-helper command.
  2023-02-19 16:20 [PATCH 0/5] eBPF RSS Helper support Andrew Melnychenko
                   ` (3 preceding siblings ...)
  2023-02-19 16:20 ` [PATCH 4/5] ebpf_rss_helper: Added helper for eBPF RSS Andrew Melnychenko
@ 2023-02-19 16:21 ` Andrew Melnychenko
  4 siblings, 0 replies; 26+ messages in thread
From: Andrew Melnychenko @ 2023-02-19 16:21 UTC (permalink / raw)
  To: jasowang, mst, pbonzini, marcandre.lureau, berrange, thuth,
	philmd, armbru, eblake, qemu-devel, toke, mprivozn
  Cc: yuri.benditovich, yan

New qmp command to query ebpf helper.
It's crucial that QEMU and helper are in sync.
Technically helper should pass eBPF fds that QEMU may accept.
And different QEMU's builds may have different eBPF programs.
QEMU returns helper that should "fit" to virtio-net.
QEMU would check the stamp of the helper to make sure
that eBPF program is valid.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 monitor/qmp-cmds.c | 28 ++++++++++++++++++++++++++++
 qapi/misc.json     | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+)

diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 859012aef4..2f91c34bbb 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -31,6 +31,7 @@
 #include "hw/mem/memory-device.h"
 #include "hw/intc/intc.h"
 #include "hw/rdma/rdma.h"
+#include "qemu-ebpf-rss-helper-stamp-utils.h"
 
 NameInfo *qmp_query_name(Error **errp)
 {
@@ -202,3 +203,30 @@ static void __attribute__((__constructor__)) monitor_init_qmp_commands(void)
                          qmp_marshal_qmp_capabilities,
                          QCO_ALLOW_PRECONFIG, 0);
 }
+
+HelperPath *qmp_find_ebpf_rss_helper(bool has_path,
+                                     strList *path, Error **errp)
+{
+    HelperPath *ret = NULL;
+    char *helperbin = NULL;
+
+    /* Look for helper in the suggested pathes */
+    if (has_path) {
+        strList *str_list = NULL;
+        for (str_list = path;
+             str_list && !helperbin;
+             str_list = str_list->next) {
+            helperbin = qemu_check_suggested_ebpf_rss_helper(str_list->value);
+        }
+    }
+
+    if (helperbin == NULL) {
+        helperbin = qemu_find_default_ebpf_rss_helper();
+    }
+
+    if (helperbin) {
+        ret = g_new0(HelperPath, 1);
+        ret->path = helperbin;
+    }
+    return ret;
+}
diff --git a/qapi/misc.json b/qapi/misc.json
index 27ef5a2b20..1dfb3c132e 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -584,3 +584,45 @@
 { 'event': 'VFU_CLIENT_HANGUP',
   'data': { 'vfu-id': 'str', 'vfu-qom-path': 'str',
             'dev-id': 'str', 'dev-qom-path': 'str' } }
+
+##
+# @HelperPath:
+#
+# Name of the helper and binary location.
+##
+{ 'struct': 'HelperPath',
+  'data': {'path': 'str'} }
+
+##
+# @find-ebpf-rss-helper:
+#
+# Query helper paths to find qemu-ebpf-rss-helper.
+# The qemu would check "the stamp" and
+# returns the proper helper.
+# It's possible to provide paths where to look for a helper.
+# If the path is provided to a file - qemu would check the file for the stamp.
+# If the path is provided to a directory - qemu would look for
+# a file "qemu-ebpf-rss-helper" and check its stamp.
+#
+# Returns: path to the helper with a valid stamp.
+#
+# Note: Provided path arguments have the highest priority where to look
+#       for a helper. Then, default "helper directory" and then
+#       near the qemu binary.
+#
+# Since: 7.2
+#
+# Example:
+#
+# -> { "execute": "find-ebpf-rss-helper", "arguments": { "path": "/opt/qemu_helpers/" } }
+# <- { "return": [
+#        {
+#          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
+#        }
+#      ]
+#    }
+#
+##
+{ 'command': 'find-ebpf-rss-helper',
+  'data': {'*path': ['str']},
+  'returns': 'HelperPath' }
-- 
2.39.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-02-19 16:20 ` [PATCH 3/5] qmp: Added the helper stamp check Andrew Melnychenko
@ 2023-02-20  9:49   ` Daniel P. Berrangé
  2023-02-27  3:45     ` Andrew Melnichenko
  2023-02-28  9:56     ` Yuri Benditovich
  0 siblings, 2 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-02-20  9:49 UTC (permalink / raw)
  To: Andrew Melnychenko
  Cc: jasowang, mst, pbonzini, marcandre.lureau, thuth, philmd, armbru,
	eblake, qemu-devel, toke, mprivozn, yuri.benditovich, yan

On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> Added a function to check the stamp in the helper.
> eBPF helper should have a special symbol that generates during the build.
> QEMU checks the helper and determines that it fits, so the helper
> will produce proper output.

I think this is quite limiting for in place upgrades.

Consider this scenario

 * Host has QEMU 8.1.0 installed
 * VM is running QEMU 8.1.0
 * QEMU 8.1.1 is released with a bug fix in the EBF program
 * Host is upgraded to QEMU 8.1.1
 * User attempts to hotplug a NIC to the running VM

IIUC this last step is going to fail because we'll be loading
the EBF program from 8.1.1 and so its hash is different from
that expected by the QEMU 8.1.0 that the pre-existing VM is
running.

If some changes to the EBF program are not going to be back
compatible from the POV of the QEMU process, should we instead
be versioning the EBF program. eg so new QEMU will ship both
the old and new versions of the EBF program. 


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/5] ebpf_rss_helper: Added helper for eBPF RSS.
  2023-02-19 16:20 ` [PATCH 4/5] ebpf_rss_helper: Added helper for eBPF RSS Andrew Melnychenko
@ 2023-02-20  9:54   ` Daniel P. Berrangé
  2023-02-27  3:50     ` Andrew Melnichenko
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-02-20  9:54 UTC (permalink / raw)
  To: Andrew Melnychenko
  Cc: jasowang, mst, pbonzini, marcandre.lureau, thuth, philmd, armbru,
	eblake, qemu-devel, toke, mprivozn, yuri.benditovich, yan

On Sun, Feb 19, 2023 at 06:20:59PM +0200, Andrew Melnychenko wrote:
> Helper program. Loads eBPF RSS program and maps and passes them through unix socket.
> Libvirt may launch this helper and pass eBPF fds to qemu virtio-net.
> Also, libbpf dependency for now is exclusively for Linux.
> Libbpf is used for eBPF RSS steering, which is supported only by Linux TAP.
> There is no reason yet to build eBPF loader and helper for non-Linux systems,
> even if libbpf is present.
> 
> Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> ---
>  ebpf/qemu-ebpf-rss-helper.c | 132 ++++++++++++++++++++++++++++++++++++
>  meson.build                 |  37 ++++++----
>  2 files changed, 156 insertions(+), 13 deletions(-)
>  create mode 100644 ebpf/qemu-ebpf-rss-helper.c
> 
> diff --git a/ebpf/qemu-ebpf-rss-helper.c b/ebpf/qemu-ebpf-rss-helper.c
> new file mode 100644
> index 0000000000..348d26bcdd
> --- /dev/null
> +++ b/ebpf/qemu-ebpf-rss-helper.c
> @@ -0,0 +1,132 @@
> +/*
> + * eBPF RSS Helper
> + *
> + * Developed by Daynix Computing LTD (http://www.daynix.com)
> + *
> + * Authors:
> + *  Andrew Melnychenko <andrew@daynix.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.

Is there a reason for specifying GPL version 2 only ?

Unless this has copied code from one of the existing GPLv2-only files
in QEMU, the requirement (listed in LICENSE) is that new contributions
will be GPLv2-or-later, except for a handful of sub-directories.



> + *
> + * Description: This is helper program for libvirtd.
> + *              It loads eBPF RSS program and passes fds through unix socket.
> + *              Built by meson, target - 'qemu-ebpf-rss-helper'.
> + */
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <stdbool.h>
> +#include <getopt.h>
> +#include <memory.h>
> +#include <errno.h>
> +#include <sys/socket.h>
> +
> +#include "ebpf_rss.h"
> +
> +#include "qemu-ebpf-rss-helper-stamp.h"
> +
> +void QEMU_EBPF_RSS_HELPER_STAMP(void) {}
> +
> +static int send_fds(int socket, int *fds, int n)
> +{
> +    struct msghdr msg = {};
> +    struct cmsghdr *cmsg = NULL;
> +    char buf[CMSG_SPACE(n * sizeof(int))];
> +    char dummy_buffer = 0;
> +    struct iovec io = { .iov_base = &dummy_buffer,
> +                        .iov_len = sizeof(dummy_buffer) };
> +
> +    memset(buf, 0, sizeof(buf));
> +
> +    msg.msg_iov = &io;
> +    msg.msg_iovlen = 1;
> +    msg.msg_control = buf;
> +    msg.msg_controllen = sizeof(buf);
> +
> +    cmsg = CMSG_FIRSTHDR(&msg);
> +    cmsg->cmsg_level = SOL_SOCKET;
> +    cmsg->cmsg_type = SCM_RIGHTS;
> +    cmsg->cmsg_len = CMSG_LEN(n * sizeof(int));
> +
> +    memcpy(CMSG_DATA(cmsg), fds, n * sizeof(int));
> +
> +    return sendmsg(socket, &msg, 0);
> +}
> +
> +static void print_help_and_exit(const char *prog, int exitcode)
> +{
> +    fprintf(stderr, "%s - load eBPF RSS program for qemu and pass eBPF fds"
> +            " through unix socket.\n", prog);
> +    fprintf(stderr, "\t--fd <num>, -f <num> - unix socket file descriptor"
> +            " used to pass eBPF fds.\n");
> +    fprintf(stderr, "\t--help, -h - this help.\n");
> +    exit(exitcode);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +    char *fd_string = NULL;
> +    int unix_fd = 0;
> +    struct EBPFRSSContext ctx = {};
> +    int fds[EBPF_RSS_MAX_FDS] = {};
> +    int ret = -1;
> +
> +    for (;;) {
> +        int c;
> +        static struct option long_options[] = {
> +                {"help",  no_argument, 0, 'h'},
> +                {"fd",  required_argument, 0, 'f'},
> +                {0, 0, 0, 0}
> +        };
> +        c = getopt_long(argc, argv, "hf:",
> +                long_options, NULL);
> +
> +        if (c == -1) {
> +            break;
> +        }
> +
> +        switch (c) {
> +        case 'f':
> +            fd_string = optarg;
> +            break;
> +        case 'h':
> +        default:
> +            print_help_and_exit(argv[0],
> +                    c == 'h' ? EXIT_SUCCESS : EXIT_FAILURE);
> +        }
> +    }
> +
> +    if (!fd_string) {
> +        fprintf(stderr, "Unix file descriptor not present.\n");
> +        print_help_and_exit(argv[0], EXIT_FAILURE);
> +    }
> +
> +    unix_fd = atoi(fd_string);
> +
> +    if (!unix_fd) {
> +        fprintf(stderr, "Unix file descriptor is invalid.\n");
> +        return EXIT_FAILURE;
> +    }
> +
> +    ebpf_rss_init(&ctx);
> +    if (!ebpf_rss_load(&ctx)) {
> +        fprintf(stderr, "Can't load ebpf.\n");
> +        return EXIT_FAILURE;
> +    }
> +    fds[0] = ctx.program_fd;
> +    fds[1] = ctx.map_configuration;
> +    fds[2] = ctx.map_toeplitz_key;
> +    fds[3] = ctx.map_indirections_table;
> +
> +    ret = send_fds(unix_fd, fds, EBPF_RSS_MAX_FDS);
> +    if (ret < 0) {
> +        fprintf(stderr, "Issue while sending fds: %s.\n", strerror(errno));
> +    }
> +
> +    ebpf_rss_unload(&ctx);
> +
> +    return ret < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
> +}
> +
> diff --git a/meson.build b/meson.build
> index b409912aed..6e6e2f3e40 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1632,19 +1632,22 @@ elif get_option('vduse_blk_export').disabled()
>  endif
>  
>  # libbpf
> -libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
> -if libbpf.found() and not cc.links('''
> -   #include <bpf/libbpf.h>
> -   int main(void)
> -   {
> -     bpf_object__destroy_skeleton(NULL);
> -     return 0;
> -   }''', dependencies: libbpf)
> -  libbpf = not_found
> -  if get_option('bpf').enabled()
> -    error('libbpf skeleton test failed')
> -  else
> -    warning('libbpf skeleton test failed, disabling')
> +libbpf = not_found
> +if targetos == 'linux'
> +  libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
> +  if libbpf.found() and not cc.links('''
> +    #include <bpf/libbpf.h>
> +    int main(void)
> +    {
> +      bpf_object__destroy_skeleton(NULL);
> +      return 0;
> +    }''', dependencies: libbpf)
> +    libbpf = not_found
> +    if get_option('bpf').enabled()
> +      error('libbpf skeleton test failed')
> +    else
> +      warning('libbpf skeleton test failed, disabling')
> +    endif
>    endif
>  endif
>  
> @@ -3646,6 +3649,14 @@ if have_tools
>                 dependencies: [authz, crypto, io, qom, qemuutil,
>                                libcap_ng, mpathpersist],
>                 install: true)
> +
> +    if libbpf.found()
> +        executable('qemu-ebpf-rss-helper', files(
> +                   'ebpf/qemu-ebpf-rss-helper.c', 'ebpf/ebpf_rss.c'),
> +                   dependencies: [qemuutil, libbpf, glib],
> +                   install: true,
> +                   install_dir: get_option('libexecdir'))
> +    endif
>    endif
>  
>    if have_ivshmem
> -- 
> 2.39.1
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-02-20  9:49   ` Daniel P. Berrangé
@ 2023-02-27  3:45     ` Andrew Melnichenko
  2023-02-27 14:06       ` Toke Høiland-Jørgensen
  2023-02-28  9:56     ` Yuri Benditovich
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Melnichenko @ 2023-02-27  3:45 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: jasowang, mst, pbonzini, marcandre.lureau, thuth, philmd, armbru,
	eblake, qemu-devel, toke, mprivozn, yuri.benditovich, yan

Hi all,

On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > Added a function to check the stamp in the helper.
> > eBPF helper should have a special symbol that generates during the build.
> > QEMU checks the helper and determines that it fits, so the helper
> > will produce proper output.
>
> I think this is quite limiting for in place upgrades.
>
> Consider this scenario
>
>  * Host has QEMU 8.1.0 installed
>  * VM is running QEMU 8.1.0
>  * QEMU 8.1.1 is released with a bug fix in the EBF program
>  * Host is upgraded to QEMU 8.1.1
>  * User attempts to hotplug a NIC to the running VM
>
> IIUC this last step is going to fail because we'll be loading
> the EBF program from 8.1.1 and so its hash is different from
> that expected by the QEMU 8.1.0 that the pre-existing VM is
> running.
>
> If some changes to the EBF program are not going to be back
> compatible from the POV of the QEMU process, should we instead
> be versioning the EBF program. eg so new QEMU will ship both
> the old and new versions of the EBF program.
>

I think it's too complicated to maintain backward compatibility with
eBPF programs in "one package".
As we can see, the eBPF skeleton may be changed not only by bugfix but
with changes in libbpf(g.e. libbpf 1.0.1+).
This may lead to issues when with a newer environment you can't
consistently recreate the "old" skeleton.

To solve the case - we need to "save" the halper from an old package somehow.

>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/5] ebpf_rss_helper: Added helper for eBPF RSS.
  2023-02-20  9:54   ` Daniel P. Berrangé
@ 2023-02-27  3:50     ` Andrew Melnichenko
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Melnichenko @ 2023-02-27  3:50 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: jasowang, mst, pbonzini, marcandre.lureau, thuth, philmd, armbru,
	eblake, qemu-devel, toke, mprivozn, yuri.benditovich, yan

Hi,

On Mon, Feb 20, 2023 at 11:54 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Sun, Feb 19, 2023 at 06:20:59PM +0200, Andrew Melnychenko wrote:
> > Helper program. Loads eBPF RSS program and maps and passes them through unix socket.
> > Libvirt may launch this helper and pass eBPF fds to qemu virtio-net.
> > Also, libbpf dependency for now is exclusively for Linux.
> > Libbpf is used for eBPF RSS steering, which is supported only by Linux TAP.
> > There is no reason yet to build eBPF loader and helper for non-Linux systems,
> > even if libbpf is present.
> >
> > Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> > ---
> >  ebpf/qemu-ebpf-rss-helper.c | 132 ++++++++++++++++++++++++++++++++++++
> >  meson.build                 |  37 ++++++----
> >  2 files changed, 156 insertions(+), 13 deletions(-)
> >  create mode 100644 ebpf/qemu-ebpf-rss-helper.c
> >
> > diff --git a/ebpf/qemu-ebpf-rss-helper.c b/ebpf/qemu-ebpf-rss-helper.c
> > new file mode 100644
> > index 0000000000..348d26bcdd
> > --- /dev/null
> > +++ b/ebpf/qemu-ebpf-rss-helper.c
> > @@ -0,0 +1,132 @@
> > +/*
> > + * eBPF RSS Helper
> > + *
> > + * Developed by Daynix Computing LTD (http://www.daynix.com)
> > + *
> > + * Authors:
> > + *  Andrew Melnychenko <andrew@daynix.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.  See
> > + * the COPYING file in the top-level directory.
>
> Is there a reason for specifying GPL version 2 only ?
>
> Unless this has copied code from one of the existing GPLv2-only files
> in QEMU, the requirement (listed in LICENSE) is that new contributions
> will be GPLv2-or-later, except for a handful of sub-directories.
>

Yeah - I'll change it. Thank you.

>
>
> > + *
> > + * Description: This is helper program for libvirtd.
> > + *              It loads eBPF RSS program and passes fds through unix socket.
> > + *              Built by meson, target - 'qemu-ebpf-rss-helper'.
> > + */
> > +
> > +#include <stdio.h>
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +#include <stdbool.h>
> > +#include <getopt.h>
> > +#include <memory.h>
> > +#include <errno.h>
> > +#include <sys/socket.h>
> > +
> > +#include "ebpf_rss.h"
> > +
> > +#include "qemu-ebpf-rss-helper-stamp.h"
> > +
> > +void QEMU_EBPF_RSS_HELPER_STAMP(void) {}
> > +
> > +static int send_fds(int socket, int *fds, int n)
> > +{
> > +    struct msghdr msg = {};
> > +    struct cmsghdr *cmsg = NULL;
> > +    char buf[CMSG_SPACE(n * sizeof(int))];
> > +    char dummy_buffer = 0;
> > +    struct iovec io = { .iov_base = &dummy_buffer,
> > +                        .iov_len = sizeof(dummy_buffer) };
> > +
> > +    memset(buf, 0, sizeof(buf));
> > +
> > +    msg.msg_iov = &io;
> > +    msg.msg_iovlen = 1;
> > +    msg.msg_control = buf;
> > +    msg.msg_controllen = sizeof(buf);
> > +
> > +    cmsg = CMSG_FIRSTHDR(&msg);
> > +    cmsg->cmsg_level = SOL_SOCKET;
> > +    cmsg->cmsg_type = SCM_RIGHTS;
> > +    cmsg->cmsg_len = CMSG_LEN(n * sizeof(int));
> > +
> > +    memcpy(CMSG_DATA(cmsg), fds, n * sizeof(int));
> > +
> > +    return sendmsg(socket, &msg, 0);
> > +}
> > +
> > +static void print_help_and_exit(const char *prog, int exitcode)
> > +{
> > +    fprintf(stderr, "%s - load eBPF RSS program for qemu and pass eBPF fds"
> > +            " through unix socket.\n", prog);
> > +    fprintf(stderr, "\t--fd <num>, -f <num> - unix socket file descriptor"
> > +            " used to pass eBPF fds.\n");
> > +    fprintf(stderr, "\t--help, -h - this help.\n");
> > +    exit(exitcode);
> > +}
> > +
> > +int main(int argc, char **argv)
> > +{
> > +    char *fd_string = NULL;
> > +    int unix_fd = 0;
> > +    struct EBPFRSSContext ctx = {};
> > +    int fds[EBPF_RSS_MAX_FDS] = {};
> > +    int ret = -1;
> > +
> > +    for (;;) {
> > +        int c;
> > +        static struct option long_options[] = {
> > +                {"help",  no_argument, 0, 'h'},
> > +                {"fd",  required_argument, 0, 'f'},
> > +                {0, 0, 0, 0}
> > +        };
> > +        c = getopt_long(argc, argv, "hf:",
> > +                long_options, NULL);
> > +
> > +        if (c == -1) {
> > +            break;
> > +        }
> > +
> > +        switch (c) {
> > +        case 'f':
> > +            fd_string = optarg;
> > +            break;
> > +        case 'h':
> > +        default:
> > +            print_help_and_exit(argv[0],
> > +                    c == 'h' ? EXIT_SUCCESS : EXIT_FAILURE);
> > +        }
> > +    }
> > +
> > +    if (!fd_string) {
> > +        fprintf(stderr, "Unix file descriptor not present.\n");
> > +        print_help_and_exit(argv[0], EXIT_FAILURE);
> > +    }
> > +
> > +    unix_fd = atoi(fd_string);
> > +
> > +    if (!unix_fd) {
> > +        fprintf(stderr, "Unix file descriptor is invalid.\n");
> > +        return EXIT_FAILURE;
> > +    }
> > +
> > +    ebpf_rss_init(&ctx);
> > +    if (!ebpf_rss_load(&ctx)) {
> > +        fprintf(stderr, "Can't load ebpf.\n");
> > +        return EXIT_FAILURE;
> > +    }
> > +    fds[0] = ctx.program_fd;
> > +    fds[1] = ctx.map_configuration;
> > +    fds[2] = ctx.map_toeplitz_key;
> > +    fds[3] = ctx.map_indirections_table;
> > +
> > +    ret = send_fds(unix_fd, fds, EBPF_RSS_MAX_FDS);
> > +    if (ret < 0) {
> > +        fprintf(stderr, "Issue while sending fds: %s.\n", strerror(errno));
> > +    }
> > +
> > +    ebpf_rss_unload(&ctx);
> > +
> > +    return ret < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
> > +}
> > +
> > diff --git a/meson.build b/meson.build
> > index b409912aed..6e6e2f3e40 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -1632,19 +1632,22 @@ elif get_option('vduse_blk_export').disabled()
> >  endif
> >
> >  # libbpf
> > -libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
> > -if libbpf.found() and not cc.links('''
> > -   #include <bpf/libbpf.h>
> > -   int main(void)
> > -   {
> > -     bpf_object__destroy_skeleton(NULL);
> > -     return 0;
> > -   }''', dependencies: libbpf)
> > -  libbpf = not_found
> > -  if get_option('bpf').enabled()
> > -    error('libbpf skeleton test failed')
> > -  else
> > -    warning('libbpf skeleton test failed, disabling')
> > +libbpf = not_found
> > +if targetos == 'linux'
> > +  libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
> > +  if libbpf.found() and not cc.links('''
> > +    #include <bpf/libbpf.h>
> > +    int main(void)
> > +    {
> > +      bpf_object__destroy_skeleton(NULL);
> > +      return 0;
> > +    }''', dependencies: libbpf)
> > +    libbpf = not_found
> > +    if get_option('bpf').enabled()
> > +      error('libbpf skeleton test failed')
> > +    else
> > +      warning('libbpf skeleton test failed, disabling')
> > +    endif
> >    endif
> >  endif
> >
> > @@ -3646,6 +3649,14 @@ if have_tools
> >                 dependencies: [authz, crypto, io, qom, qemuutil,
> >                                libcap_ng, mpathpersist],
> >                 install: true)
> > +
> > +    if libbpf.found()
> > +        executable('qemu-ebpf-rss-helper', files(
> > +                   'ebpf/qemu-ebpf-rss-helper.c', 'ebpf/ebpf_rss.c'),
> > +                   dependencies: [qemuutil, libbpf, glib],
> > +                   install: true,
> > +                   install_dir: get_option('libexecdir'))
> > +    endif
> >    endif
> >
> >    if have_ivshmem
> > --
> > 2.39.1
> >
>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-02-27  3:45     ` Andrew Melnichenko
@ 2023-02-27 14:06       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-02-27 14:06 UTC (permalink / raw)
  To: Andrew Melnichenko, Daniel P. Berrangé
  Cc: jasowang, mst, pbonzini, marcandre.lureau, thuth, philmd, armbru,
	eblake, qemu-devel, mprivozn, yuri.benditovich, yan

Andrew Melnichenko <andrew@daynix.com> writes:

> Hi all,
>
> On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>>
>> On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
>> > Added a function to check the stamp in the helper.
>> > eBPF helper should have a special symbol that generates during the build.
>> > QEMU checks the helper and determines that it fits, so the helper
>> > will produce proper output.
>>
>> I think this is quite limiting for in place upgrades.
>>
>> Consider this scenario
>>
>>  * Host has QEMU 8.1.0 installed
>>  * VM is running QEMU 8.1.0
>>  * QEMU 8.1.1 is released with a bug fix in the EBF program
>>  * Host is upgraded to QEMU 8.1.1
>>  * User attempts to hotplug a NIC to the running VM
>>
>> IIUC this last step is going to fail because we'll be loading
>> the EBF program from 8.1.1 and so its hash is different from
>> that expected by the QEMU 8.1.0 that the pre-existing VM is
>> running.
>>
>> If some changes to the EBF program are not going to be back
>> compatible from the POV of the QEMU process, should we instead
>> be versioning the EBF program. eg so new QEMU will ship both
>> the old and new versions of the EBF program.
>>
>
> I think it's too complicated to maintain backward compatibility with
> eBPF programs in "one package".
> As we can see, the eBPF skeleton may be changed not only by bugfix but
> with changes in libbpf(g.e. libbpf 1.0.1+).

Hmm, what change in libbpf 1.0.1 affects the skeleton format?

> This may lead to issues when with a newer environment you can't
> consistently recreate the "old" skeleton.

This should be detectable, though? As in, if you know that a new version
breaks compatibility you just bump the version number regardless of the
underlying application version?

-Toke



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-02-20  9:49   ` Daniel P. Berrangé
  2023-02-27  3:45     ` Andrew Melnichenko
@ 2023-02-28  9:56     ` Yuri Benditovich
  2023-02-28 18:04       ` Daniel P. Berrangé
  1 sibling, 1 reply; 26+ messages in thread
From: Yuri Benditovich @ 2023-02-28  9:56 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Andrew Melnychenko, jasowang, mst, pbonzini, marcandre.lureau,
	thuth, philmd, armbru, eblake, qemu-devel, toke, mprivozn, yan

[-- Attachment #1: Type: text/plain, Size: 2395 bytes --]

On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé <berrange@redhat.com>
wrote:

> On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > Added a function to check the stamp in the helper.
> > eBPF helper should have a special symbol that generates during the build.
> > QEMU checks the helper and determines that it fits, so the helper
> > will produce proper output.
>
> I think this is quite limiting for in place upgrades.
>
> Consider this scenario
>
>  * Host has QEMU 8.1.0 installed
>  * VM is running QEMU 8.1.0
>  * QEMU 8.1.1 is released with a bug fix in the EBF program
>  * Host is upgraded to QEMU 8.1.1
>  * User attempts to hotplug a NIC to the running VM
>
> IIUC this last step is going to fail because we'll be loading
> the EBF program from 8.1.1 and so its hash is different from
> that expected by the QEMU 8.1.0 that the pre-existing VM is
> running.
>
>   Indeed we did not take in account the in-place upgrade.



> If some changes to the EBF program are not going to be back
> compatible from the POV of the QEMU process, should we instead
> be versioning the EBF program. eg so new QEMU will ship both
> the old and new versions of the EBF program.
>
>
This does not seem to be an elegant option: QEMU theoretically can include
different eBPF programs but it hardly can interface with each one of them.
The code of QEMU (access to eBPF maps etc) includes header files which eBPF
of the day is being built with them.

I see 2 options to address this issue (of course there are more)
1. Build and install qemu-rss-helper-<hash> executable. Libvirt will always
have a correct name, so for the running instance it will use
qemu-rss-helper-<old-hash>, for the new instance it will use
qemu-rss-helper-<new-hash>
2. Build the helper executable and link it inside qemu as a blob. Libvirt
will always retrieve the executable to the temporary file name and use it.
So the retrieved helper will always be compatible with QEMU. I'm not sure
what is the most portable way to do that.

Daniel,
Does one of these seem suitable?


> With regards,
> Daniel
> --
> |: https://berrange.com      -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-
> https://www.instagram.com/dberrange :|
>
>

[-- Attachment #2: Type: text/html, Size: 3670 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-02-28  9:56     ` Yuri Benditovich
@ 2023-02-28 18:04       ` Daniel P. Berrangé
  2023-02-28 19:01         ` Toke Høiland-Jørgensen
                           ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-02-28 18:04 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Andrew Melnychenko, jasowang, mst, pbonzini, marcandre.lureau,
	thuth, philmd, armbru, eblake, qemu-devel, toke, mprivozn, yan

On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote:
> On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé <berrange@redhat.com>
> wrote:
> 
> > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > > Added a function to check the stamp in the helper.
> > > eBPF helper should have a special symbol that generates during the build.
> > > QEMU checks the helper and determines that it fits, so the helper
> > > will produce proper output.
> >
> > I think this is quite limiting for in place upgrades.
> >
> > Consider this scenario
> >
> >  * Host has QEMU 8.1.0 installed
> >  * VM is running QEMU 8.1.0
> >  * QEMU 8.1.1 is released with a bug fix in the EBF program
> >  * Host is upgraded to QEMU 8.1.1
> >  * User attempts to hotplug a NIC to the running VM
> >
> > IIUC this last step is going to fail because we'll be loading
> > the EBF program from 8.1.1 and so its hash is different from
> > that expected by the QEMU 8.1.0 that the pre-existing VM is
> > running.
> >
> >   Indeed we did not take in account the in-place upgrade.
> 
> 
> 
> > If some changes to the EBF program are not going to be back
> > compatible from the POV of the QEMU process, should we instead
> > be versioning the EBF program. eg so new QEMU will ship both
> > the old and new versions of the EBF program.
> 
> This does not seem to be an elegant option: QEMU theoretically can include
> different eBPF programs but it hardly can interface with each one of them.
> The code of QEMU (access to eBPF maps etc) includes header files which eBPF
> of the day is being built with them.
> 
> I see 2 options to address this issue (of course there are more)
> 1. Build and install qemu-rss-helper-<hash> executable. Libvirt will always
> have a correct name, so for the running instance it will use
> qemu-rss-helper-<old-hash>, for the new instance it will use
> qemu-rss-helper-<new-hash>

We'll get an ever growing number of program variants we need to
build & distribute with each new QEMU release.

> 2. Build the helper executable and link it inside qemu as a blob. Libvirt
> will always retrieve the executable to the temporary file name and use it.
> So the retrieved helper will always be compatible with QEMU. I'm not sure
> what is the most portable way to do that.

QEMU is considered an untrusted process, so there's no way we're going
to ask it to give us an ELF binary and then execute that in privileged
context.

> Does one of these seem suitable?

Neither feels very appealing to me.

I've been trying to understand the eBPF code we're dealing with in a
little more detail.

IIUC, QEMU, or rather the virtio-net  driver needs to receive one FD
for the BPF program, and one or more FDs for the BPF maps that the
program uses. Currently it uses 3 maps, so needs 3 map FDs on top of
the program FD.

The helper program that is proposed here calls ebpf_rss_load() to
load the program and get back a struct which gives access to the
4 FDs, which are then sent to the mgmt app, which forwards them
onto QEMU.

The ebpf_rss_load() method is making use of various structs that
are specific to the RSS program implementation, but does not seems
to do anything especially interesting.  It calls into rss_bpf__open()
which eventually gets around to calling rss_bpf__create_skeleton
which is where the interesting stuff happens.

This rss_bpf__create_skeleton() method is implemented in terms of
totally generic libbpf APIs, and has the actual blob that is the
BPF program.

Looking at what this does, I feel it should be trivial for a mgmt
app to implement equivalent logic to rss_bpf__create_skeleton in a
generic manner, if we could just expose the program blob and the
map names to the mgmt app. eg a simple json file

  {
     "maps": [
        "tap_rss_map_configurations",
	"tap_rss_map_indirection_table",
	"tap_rss_map_toeplitz_key",
     ],
     "program": "....the big blob encoded in base64..."
  }

if we installed that file are /usr/share/qemu/bpf/net-rss.json
then when a QEMU process is started, the mgmt app capture the
data in this JSON file. It now has enough info to create the
EBPF programs needed and pass the FDs over to QEMU. This would
be robust against QEMU software upgrades, and not tied to the
specific EBPF program imlp. We can add or remove maps / change
their names etc any time, as the details in the JSON descriptor
can be updated.  This avoids need for any special helper program
to be provided by QEMU with the problems that is throwing up
for us.

What am I missing ?  This seems pretty straightforward to
achieve from what I see.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-02-28 18:04       ` Daniel P. Berrangé
@ 2023-02-28 19:01         ` Toke Høiland-Jørgensen
  2023-02-28 19:03           ` Daniel P. Berrangé
  2023-02-28 19:01         ` Daniel P. Berrangé
  2023-03-01  6:49         ` Yuri Benditovich
  2 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-02-28 19:01 UTC (permalink / raw)
  To: Daniel P. Berrangé, Yuri Benditovich
  Cc: Andrew Melnychenko, jasowang, mst, pbonzini, marcandre.lureau,
	thuth, philmd, armbru, eblake, qemu-devel, mprivozn, yan

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote:
>> On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé <berrange@redhat.com>
>> wrote:
>> 
>> > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
>> > > Added a function to check the stamp in the helper.
>> > > eBPF helper should have a special symbol that generates during the build.
>> > > QEMU checks the helper and determines that it fits, so the helper
>> > > will produce proper output.
>> >
>> > I think this is quite limiting for in place upgrades.
>> >
>> > Consider this scenario
>> >
>> >  * Host has QEMU 8.1.0 installed
>> >  * VM is running QEMU 8.1.0
>> >  * QEMU 8.1.1 is released with a bug fix in the EBF program
>> >  * Host is upgraded to QEMU 8.1.1
>> >  * User attempts to hotplug a NIC to the running VM
>> >
>> > IIUC this last step is going to fail because we'll be loading
>> > the EBF program from 8.1.1 and so its hash is different from
>> > that expected by the QEMU 8.1.0 that the pre-existing VM is
>> > running.
>> >
>> >   Indeed we did not take in account the in-place upgrade.
>> 
>> 
>> 
>> > If some changes to the EBF program are not going to be back
>> > compatible from the POV of the QEMU process, should we instead
>> > be versioning the EBF program. eg so new QEMU will ship both
>> > the old and new versions of the EBF program.
>> 
>> This does not seem to be an elegant option: QEMU theoretically can include
>> different eBPF programs but it hardly can interface with each one of them.
>> The code of QEMU (access to eBPF maps etc) includes header files which eBPF
>> of the day is being built with them.
>> 
>> I see 2 options to address this issue (of course there are more)
>> 1. Build and install qemu-rss-helper-<hash> executable. Libvirt will always
>> have a correct name, so for the running instance it will use
>> qemu-rss-helper-<old-hash>, for the new instance it will use
>> qemu-rss-helper-<new-hash>
>
> We'll get an ever growing number of program variants we need to
> build & distribute with each new QEMU release.
>
>> 2. Build the helper executable and link it inside qemu as a blob. Libvirt
>> will always retrieve the executable to the temporary file name and use it.
>> So the retrieved helper will always be compatible with QEMU. I'm not sure
>> what is the most portable way to do that.
>
> QEMU is considered an untrusted process, so there's no way we're going
> to ask it to give us an ELF binary and then execute that in privileged
> context.
>
>> Does one of these seem suitable?
>
> Neither feels very appealing to me.
>
> I've been trying to understand the eBPF code we're dealing with in a
> little more detail.
>
> IIUC, QEMU, or rather the virtio-net  driver needs to receive one FD
> for the BPF program, and one or more FDs for the BPF maps that the
> program uses. Currently it uses 3 maps, so needs 3 map FDs on top of
> the program FD.
>
> The helper program that is proposed here calls ebpf_rss_load() to
> load the program and get back a struct which gives access to the
> 4 FDs, which are then sent to the mgmt app, which forwards them
> onto QEMU.
>
> The ebpf_rss_load() method is making use of various structs that
> are specific to the RSS program implementation, but does not seems
> to do anything especially interesting.  It calls into rss_bpf__open()
> which eventually gets around to calling rss_bpf__create_skeleton
> which is where the interesting stuff happens.
>
> This rss_bpf__create_skeleton() method is implemented in terms of
> totally generic libbpf APIs, and has the actual blob that is the
> BPF program.
>
> Looking at what this does, I feel it should be trivial for a mgmt
> app to implement equivalent logic to rss_bpf__create_skeleton in a
> generic manner, if we could just expose the program blob and the
> map names to the mgmt app. eg a simple json file
>
>   {
>      "maps": [
>         "tap_rss_map_configurations",
> 	"tap_rss_map_indirection_table",
> 	"tap_rss_map_toeplitz_key",
>      ],
>      "program": "....the big blob encoded in base64..."
>   }
>
> if we installed that file are /usr/share/qemu/bpf/net-rss.json
> then when a QEMU process is started, the mgmt app capture the
> data in this JSON file. It now has enough info to create the
> EBPF programs needed and pass the FDs over to QEMU. This would
> be robust against QEMU software upgrades, and not tied to the
> specific EBPF program imlp. We can add or remove maps / change
> their names etc any time, as the details in the JSON descriptor
> can be updated.  This avoids need for any special helper program
> to be provided by QEMU with the problems that is throwing up
> for us.

Just to interject a note on this here: the skeleton code is mostly a
convenience feature used to embed BPF programs into the calling binary.
It is perfectly possible to just have the BPF object file itself reside
directly in the file system and just use the regular libbpf APIs to load
it. Some things get a bit more cumbersome (mostly setting values of
global variables, if the BPF program uses those).

So the JSON example above could just be a regular compiled-from-clang
BPF object file, and the management program can load that, inspect its
contents using the libbpf APIs and pass the file descriptors on to Qemu.
It's even possible to embed version information into this so that Qemu
can check if it understands the format and bail out if it doesn't - just
stick a version field in the configuration map as the first entry :)

-Toke



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-02-28 18:04       ` Daniel P. Berrangé
  2023-02-28 19:01         ` Toke Høiland-Jørgensen
@ 2023-02-28 19:01         ` Daniel P. Berrangé
  2023-03-01  6:49         ` Yuri Benditovich
  2 siblings, 0 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-02-28 19:01 UTC (permalink / raw)
  To: Yuri Benditovich, Andrew Melnychenko, jasowang, mst, pbonzini,
	marcandre.lureau, thuth, philmd, armbru, eblake, qemu-devel,
	toke, mprivozn, yan

On Tue, Feb 28, 2023 at 06:04:51PM +0000, Daniel P. Berrangé wrote:
> On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote:
> > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé <berrange@redhat.com>
> > wrote:
> > 
> > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > > > Added a function to check the stamp in the helper.
> > > > eBPF helper should have a special symbol that generates during the build.
> > > > QEMU checks the helper and determines that it fits, so the helper
> > > > will produce proper output.
> > >
> > > I think this is quite limiting for in place upgrades.
> > >
> > > Consider this scenario
> > >
> > >  * Host has QEMU 8.1.0 installed
> > >  * VM is running QEMU 8.1.0
> > >  * QEMU 8.1.1 is released with a bug fix in the EBF program
> > >  * Host is upgraded to QEMU 8.1.1
> > >  * User attempts to hotplug a NIC to the running VM
> > >
> > > IIUC this last step is going to fail because we'll be loading
> > > the EBF program from 8.1.1 and so its hash is different from
> > > that expected by the QEMU 8.1.0 that the pre-existing VM is
> > > running.
> > >
> > >   Indeed we did not take in account the in-place upgrade.
> > 
> > 
> > 
> > > If some changes to the EBF program are not going to be back
> > > compatible from the POV of the QEMU process, should we instead
> > > be versioning the EBF program. eg so new QEMU will ship both
> > > the old and new versions of the EBF program.
> > 
> > This does not seem to be an elegant option: QEMU theoretically can include
> > different eBPF programs but it hardly can interface with each one of them.
> > The code of QEMU (access to eBPF maps etc) includes header files which eBPF
> > of the day is being built with them.
> > 
> > I see 2 options to address this issue (of course there are more)
> > 1. Build and install qemu-rss-helper-<hash> executable. Libvirt will always
> > have a correct name, so for the running instance it will use
> > qemu-rss-helper-<old-hash>, for the new instance it will use
> > qemu-rss-helper-<new-hash>
> 
> We'll get an ever growing number of program variants we need to
> build & distribute with each new QEMU release.
> 
> > 2. Build the helper executable and link it inside qemu as a blob. Libvirt
> > will always retrieve the executable to the temporary file name and use it.
> > So the retrieved helper will always be compatible with QEMU. I'm not sure
> > what is the most portable way to do that.
> 
> QEMU is considered an untrusted process, so there's no way we're going
> to ask it to give us an ELF binary and then execute that in privileged
> context.
> 
> > Does one of these seem suitable?
> 
> Neither feels very appealing to me.
> 
> I've been trying to understand the eBPF code we're dealing with in a
> little more detail.
> 
> IIUC, QEMU, or rather the virtio-net  driver needs to receive one FD
> for the BPF program, and one or more FDs for the BPF maps that the
> program uses. Currently it uses 3 maps, so needs 3 map FDs on top of
> the program FD.
> 
> The helper program that is proposed here calls ebpf_rss_load() to
> load the program and get back a struct which gives access to the
> 4 FDs, which are then sent to the mgmt app, which forwards them
> onto QEMU.
> 
> The ebpf_rss_load() method is making use of various structs that
> are specific to the RSS program implementation, but does not seems
> to do anything especially interesting.  It calls into rss_bpf__open()
> which eventually gets around to calling rss_bpf__create_skeleton
> which is where the interesting stuff happens.
> 
> This rss_bpf__create_skeleton() method is implemented in terms of
> totally generic libbpf APIs, and has the actual blob that is the
> BPF program.
> 
> Looking at what this does, I feel it should be trivial for a mgmt
> app to implement equivalent logic to rss_bpf__create_skeleton in a
> generic manner, if we could just expose the program blob and the
> map names to the mgmt app. eg a simple json file
> 
>   {
>      "maps": [
>         "tap_rss_map_configurations",
> 	"tap_rss_map_indirection_table",
> 	"tap_rss_map_toeplitz_key",
>      ],
>      "program": "....the big blob encoded in base64..."
>   }
> 
> if we installed that file are /usr/share/qemu/bpf/net-rss.json
> then when a QEMU process is started, the mgmt app capture the
> data in this JSON file. It now has enough info to create the
> EBPF programs needed and pass the FDs over to QEMU. This would
> be robust against QEMU software upgrades, and not tied to the
> specific EBPF program imlp. We can add or remove maps / change
> their names etc any time, as the details in the JSON descriptor
> can be updated.  This avoids need for any special helper program
> to be provided by QEMU with the problems that is throwing up
> for us.

It occurrs to me that exposing the BPF program as data rather than
via binary will make more practical to integrate this into KubeVirt's
architecture. In their deployment setup both QEMU and libvirt are
running unprivileged inside a container. For any advanced nmetworking
a completely separate component creates the TAP device and passes it
into the container running QEMU. I don't think that the separate
precisely matched helper binary would be something they can use, but
it might be possible to expose a data file providing the BPF program
blob and describing its maps.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-02-28 19:01         ` Toke Høiland-Jørgensen
@ 2023-02-28 19:03           ` Daniel P. Berrangé
  2023-02-28 22:21             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-02-28 19:03 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Yuri Benditovich, Andrew Melnychenko, jasowang, mst, pbonzini,
	marcandre.lureau, thuth, philmd, armbru, eblake, qemu-devel,
	mprivozn, yan

On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> Just to interject a note on this here: the skeleton code is mostly a
> convenience feature used to embed BPF programs into the calling binary.
> It is perfectly possible to just have the BPF object file itself reside
> directly in the file system and just use the regular libbpf APIs to load
> it. Some things get a bit more cumbersome (mostly setting values of
> global variables, if the BPF program uses those).
> 
> So the JSON example above could just be a regular compiled-from-clang
> BPF object file, and the management program can load that, inspect its
> contents using the libbpf APIs and pass the file descriptors on to Qemu.
> It's even possible to embed version information into this so that Qemu
> can check if it understands the format and bail out if it doesn't - just
> stick a version field in the configuration map as the first entry :)

If all you have is the BPF object file is it possible to interrogate
it to get a list of all the maps, and get FDs associated for them ?
I had a look at the libbpf API and wasn't sure about that, it seemed
like you had to know the required maps upfront ?  If it is possible
to auto-discover everything you need, soley from the BPF object file
as input, then just dealing with that in isolation would feel simpler.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-02-28 19:03           ` Daniel P. Berrangé
@ 2023-02-28 22:21             ` Toke Høiland-Jørgensen
  2023-03-01  9:30               ` Daniel P. Berrangé
  0 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-02-28 22:21 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Yuri Benditovich, Andrew Melnychenko, jasowang, mst, pbonzini,
	marcandre.lureau, thuth, philmd, armbru, eblake, qemu-devel,
	mprivozn, yan

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> Just to interject a note on this here: the skeleton code is mostly a
>> convenience feature used to embed BPF programs into the calling binary.
>> It is perfectly possible to just have the BPF object file itself reside
>> directly in the file system and just use the regular libbpf APIs to load
>> it. Some things get a bit more cumbersome (mostly setting values of
>> global variables, if the BPF program uses those).
>> 
>> So the JSON example above could just be a regular compiled-from-clang
>> BPF object file, and the management program can load that, inspect its
>> contents using the libbpf APIs and pass the file descriptors on to Qemu.
>> It's even possible to embed version information into this so that Qemu
>> can check if it understands the format and bail out if it doesn't - just
>> stick a version field in the configuration map as the first entry :)
>
> If all you have is the BPF object file is it possible to interrogate
> it to get a list of all the maps, and get FDs associated for them ?
> I had a look at the libbpf API and wasn't sure about that, it seemed
> like you had to know the required maps upfront ?  If it is possible
> to auto-discover everything you need, soley from the BPF object file
> as input, then just dealing with that in isolation would feel simpler.

It is. You load the object file, and bpf_object__for_each_map() lets you
discover which maps it contains, with the different bpf_map__*() APIs
telling you the properties of that map (and you can modify them too
before loading the object if needed).

The only thing that's not in the object file is any initial data you
want to put into the map(s). But except for read-only maps that can be
added by userspace after loading the maps, so you could just let Qemu do
that...

> It occurrs to me that exposing the BPF program as data rather than
> via binary will make more practical to integrate this into KubeVirt's
> architecture. In their deployment setup both QEMU and libvirt are
> running unprivileged inside a container. For any advanced nmetworking
> a completely separate component creates the TAP device and passes it
> into the container running QEMU. I don't think that the separate
> precisely matched helper binary would be something they can use, but
> it might be possible to expose a data file providing the BPF program
> blob and describing its maps.

Well, "a data file providing the BPF program blob and describing its
maps" is basically what a BPF .o file is. It just happens to be encoded
in ELF format :)

You can embed it into some other data structure and have libbpf load it
from a blob in memory as well as from the filesystem, though; that is
basically what the skeleton file does (notice the big character string
at the end, that's just the original .o file contents).

-Toke



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-02-28 18:04       ` Daniel P. Berrangé
  2023-02-28 19:01         ` Toke Høiland-Jørgensen
  2023-02-28 19:01         ` Daniel P. Berrangé
@ 2023-03-01  6:49         ` Yuri Benditovich
  2023-03-01  9:31           ` Daniel P. Berrangé
  2 siblings, 1 reply; 26+ messages in thread
From: Yuri Benditovich @ 2023-03-01  6:49 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Andrew Melnychenko, jasowang, mst, pbonzini, marcandre.lureau,
	thuth, philmd, armbru, eblake, qemu-devel, toke, mprivozn, yan

[-- Attachment #1: Type: text/plain, Size: 5870 bytes --]

On Tue, Feb 28, 2023 at 8:05 PM Daniel P. Berrangé <berrange@redhat.com>
wrote:

> On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote:
> > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé <berrange@redhat.com
> >
> > wrote:
> >
> > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > > > Added a function to check the stamp in the helper.
> > > > eBPF helper should have a special symbol that generates during the
> build.
> > > > QEMU checks the helper and determines that it fits, so the helper
> > > > will produce proper output.
> > >
> > > I think this is quite limiting for in place upgrades.
> > >
> > > Consider this scenario
> > >
> > >  * Host has QEMU 8.1.0 installed
> > >  * VM is running QEMU 8.1.0
> > >  * QEMU 8.1.1 is released with a bug fix in the EBF program
> > >  * Host is upgraded to QEMU 8.1.1
> > >  * User attempts to hotplug a NIC to the running VM
> > >
> > > IIUC this last step is going to fail because we'll be loading
> > > the EBF program from 8.1.1 and so its hash is different from
> > > that expected by the QEMU 8.1.0 that the pre-existing VM is
> > > running.
> > >
> > >   Indeed we did not take in account the in-place upgrade.
> >
> >
> >
> > > If some changes to the EBF program are not going to be back
> > > compatible from the POV of the QEMU process, should we instead
> > > be versioning the EBF program. eg so new QEMU will ship both
> > > the old and new versions of the EBF program.
> >
> > This does not seem to be an elegant option: QEMU theoretically can
> include
> > different eBPF programs but it hardly can interface with each one of
> them.
> > The code of QEMU (access to eBPF maps etc) includes header files which
> eBPF
> > of the day is being built with them.
> >
> > I see 2 options to address this issue (of course there are more)
> > 1. Build and install qemu-rss-helper-<hash> executable. Libvirt will
> always
> > have a correct name, so for the running instance it will use
> > qemu-rss-helper-<old-hash>, for the new instance it will use
> > qemu-rss-helper-<new-hash>
>
> We'll get an ever growing number of program variants we need to
> build & distribute with each new QEMU release.
>

New release of the qemu-rss-helper-<new-hash> will be created in fact only
when the eBPF binary is updated.
This does not happen on each release. But yes, this looks like versioning
of all the shared libraries.


>
> > 2. Build the helper executable and link it inside qemu as a blob. Libvirt
> > will always retrieve the executable to the temporary file name and use
> it.
> > So the retrieved helper will always be compatible with QEMU. I'm not sure
> > what is the most portable way to do that.
>
> QEMU is considered an untrusted process, so there's no way we're going
> to ask it to give us an ELF binary and then execute that in privileged
> context.
>
> > Does one of these seem suitable?
>
> Neither feels very appealing to me.
>
> I've been trying to understand the eBPF code we're dealing with in a
> little more detail.
>
> IIUC, QEMU, or rather the virtio-net  driver needs to receive one FD
> for the BPF program, and one or more FDs for the BPF maps that the
> program uses. Currently it uses 3 maps, so needs 3 map FDs on top of
> the program FD.
>
> The helper program that is proposed here calls ebpf_rss_load() to
> load the program and get back a struct which gives access to the
> 4 FDs, which are then sent to the mgmt app, which forwards them
> onto QEMU.
>
> The ebpf_rss_load() method is making use of various structs that
> are specific to the RSS program implementation, but does not seems
> to do anything especially interesting.  It calls into rss_bpf__open()
> which eventually gets around to calling rss_bpf__create_skeleton
> which is where the interesting stuff happens.
>
> This rss_bpf__create_skeleton() method is implemented in terms of
> totally generic libbpf APIs, and has the actual blob that is the
> BPF program.
>
> Looking at what this does, I feel it should be trivial for a mgmt
> app to implement equivalent logic to rss_bpf__create_skeleton in a
> generic manner, if we could just expose the program blob and the
> map names to the mgmt app. eg a simple json file
>
>   {
>      "maps": [
>         "tap_rss_map_configurations",
>         "tap_rss_map_indirection_table",
>         "tap_rss_map_toeplitz_key",
>      ],
>      "program": "....the big blob encoded in base64..."
>   }
>
> if we installed that file are /usr/share/qemu/bpf/net-rss.json
> then when a QEMU process is started, the mgmt app capture the
> data in this JSON file. It now has enough info to create the
> EBPF programs needed and pass the FDs over to QEMU. This would
> be robust against QEMU software upgrades, and not tied to the
> specific EBPF program imlp. We can add or remove maps / change
> their names etc any time, as the details in the JSON descriptor
> can be updated.  This avoids need for any special helper program
> to be provided by QEMU with the problems that is throwing up
> for us.
>

If I understand correctly, the libvirt will have the same problem in the
in-place update scenario with the JSON file
Let's say that there is no virtio-net device at the initial start and the
libvirt does not need to load the JSON file.
On the later hot plug of virtio-net it will go to the JSON file after the
update, correct?


>
> What am I missing ?  This seems pretty straightforward to
> achieve from what I see.
>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-
> https://www.instagram.com/dberrange :|
>
>

[-- Attachment #2: Type: text/html, Size: 7692 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-02-28 22:21             ` Toke Høiland-Jørgensen
@ 2023-03-01  9:30               ` Daniel P. Berrangé
  2023-03-01 14:53                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-03-01  9:30 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Yuri Benditovich, Andrew Melnychenko, jasowang, mst, pbonzini,
	marcandre.lureau, thuth, philmd, armbru, eblake, qemu-devel,
	mprivozn, yan

On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >> 
> >> Just to interject a note on this here: the skeleton code is mostly a
> >> convenience feature used to embed BPF programs into the calling binary.
> >> It is perfectly possible to just have the BPF object file itself reside
> >> directly in the file system and just use the regular libbpf APIs to load
> >> it. Some things get a bit more cumbersome (mostly setting values of
> >> global variables, if the BPF program uses those).
> >> 
> >> So the JSON example above could just be a regular compiled-from-clang
> >> BPF object file, and the management program can load that, inspect its
> >> contents using the libbpf APIs and pass the file descriptors on to Qemu.
> >> It's even possible to embed version information into this so that Qemu
> >> can check if it understands the format and bail out if it doesn't - just
> >> stick a version field in the configuration map as the first entry :)
> >
> > If all you have is the BPF object file is it possible to interrogate
> > it to get a list of all the maps, and get FDs associated for them ?
> > I had a look at the libbpf API and wasn't sure about that, it seemed
> > like you had to know the required maps upfront ?  If it is possible
> > to auto-discover everything you need, soley from the BPF object file
> > as input, then just dealing with that in isolation would feel simpler.
> 
> It is. You load the object file, and bpf_object__for_each_map() lets you
> discover which maps it contains, with the different bpf_map__*() APIs
> telling you the properties of that map (and you can modify them too
> before loading the object if needed).
> 
> The only thing that's not in the object file is any initial data you
> want to put into the map(s). But except for read-only maps that can be
> added by userspace after loading the maps, so you could just let Qemu do
> that...
> 
> > It occurrs to me that exposing the BPF program as data rather than
> > via binary will make more practical to integrate this into KubeVirt's
> > architecture. In their deployment setup both QEMU and libvirt are
> > running unprivileged inside a container. For any advanced nmetworking
> > a completely separate component creates the TAP device and passes it
> > into the container running QEMU. I don't think that the separate
> > precisely matched helper binary would be something they can use, but
> > it might be possible to expose a data file providing the BPF program
> > blob and describing its maps.
> 
> Well, "a data file providing the BPF program blob and describing its
> maps" is basically what a BPF .o file is. It just happens to be encoded
> in ELF format :)
> 
> You can embed it into some other data structure and have libbpf load it
> from a blob in memory as well as from the filesystem, though; that is
> basically what the skeleton file does (notice the big character string
> at the end, that's just the original .o file contents).

Ok, in that case I'm really wondering why any of this helper program
stuff was proposed. I recall the rationale was that it was impossible
for an external program to load the BPF object on behalf of QEMU,
because it would not know how todo that without QEMU specific
knowledge.

It looks like we can simply expose the BPF object blob to mgmt apps
directly and get rid of this helper program entirely.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-03-01  6:49         ` Yuri Benditovich
@ 2023-03-01  9:31           ` Daniel P. Berrangé
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-03-01  9:31 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Andrew Melnychenko, jasowang, mst, pbonzini, marcandre.lureau,
	thuth, philmd, armbru, eblake, qemu-devel, toke, mprivozn, yan

On Wed, Mar 01, 2023 at 08:49:42AM +0200, Yuri Benditovich wrote:
> On Tue, Feb 28, 2023 at 8:05 PM Daniel P. Berrangé <berrange@redhat.com>
> wrote:
> 
> > On Tue, Feb 28, 2023 at 11:56:27AM +0200, Yuri Benditovich wrote:
> > > On Mon, Feb 20, 2023 at 11:50 AM Daniel P. Berrangé <berrange@redhat.com
> > >
> > > wrote:
> > >
> > > > On Sun, Feb 19, 2023 at 06:20:58PM +0200, Andrew Melnychenko wrote:
> > > > > Added a function to check the stamp in the helper.
> > > > > eBPF helper should have a special symbol that generates during the
> > build.
> > > > > QEMU checks the helper and determines that it fits, so the helper
> > > > > will produce proper output.
> > > >
> > > > I think this is quite limiting for in place upgrades.
> > > >
> > > > Consider this scenario
> > > >
> > > >  * Host has QEMU 8.1.0 installed
> > > >  * VM is running QEMU 8.1.0
> > > >  * QEMU 8.1.1 is released with a bug fix in the EBF program
> > > >  * Host is upgraded to QEMU 8.1.1
> > > >  * User attempts to hotplug a NIC to the running VM
> > > >
> > > > IIUC this last step is going to fail because we'll be loading
> > > > the EBF program from 8.1.1 and so its hash is different from
> > > > that expected by the QEMU 8.1.0 that the pre-existing VM is
> > > > running.
> > > >
> > > >   Indeed we did not take in account the in-place upgrade.
> > >
> > >
> > >
> > > > If some changes to the EBF program are not going to be back
> > > > compatible from the POV of the QEMU process, should we instead
> > > > be versioning the EBF program. eg so new QEMU will ship both
> > > > the old and new versions of the EBF program.
> > >
> > > This does not seem to be an elegant option: QEMU theoretically can
> > include
> > > different eBPF programs but it hardly can interface with each one of
> > them.
> > > The code of QEMU (access to eBPF maps etc) includes header files which
> > eBPF
> > > of the day is being built with them.
> > >
> > > I see 2 options to address this issue (of course there are more)
> > > 1. Build and install qemu-rss-helper-<hash> executable. Libvirt will
> > always
> > > have a correct name, so for the running instance it will use
> > > qemu-rss-helper-<old-hash>, for the new instance it will use
> > > qemu-rss-helper-<new-hash>
> >
> > We'll get an ever growing number of program variants we need to
> > build & distribute with each new QEMU release.
> >
> 
> New release of the qemu-rss-helper-<new-hash> will be created in fact only
> when the eBPF binary is updated.
> This does not happen on each release. But yes, this looks like versioning
> of all the shared libraries.
> 
> 
> >
> > > 2. Build the helper executable and link it inside qemu as a blob. Libvirt
> > > will always retrieve the executable to the temporary file name and use
> > it.
> > > So the retrieved helper will always be compatible with QEMU. I'm not sure
> > > what is the most portable way to do that.
> >
> > QEMU is considered an untrusted process, so there's no way we're going
> > to ask it to give us an ELF binary and then execute that in privileged
> > context.
> >
> > > Does one of these seem suitable?
> >
> > Neither feels very appealing to me.
> >
> > I've been trying to understand the eBPF code we're dealing with in a
> > little more detail.
> >
> > IIUC, QEMU, or rather the virtio-net  driver needs to receive one FD
> > for the BPF program, and one or more FDs for the BPF maps that the
> > program uses. Currently it uses 3 maps, so needs 3 map FDs on top of
> > the program FD.
> >
> > The helper program that is proposed here calls ebpf_rss_load() to
> > load the program and get back a struct which gives access to the
> > 4 FDs, which are then sent to the mgmt app, which forwards them
> > onto QEMU.
> >
> > The ebpf_rss_load() method is making use of various structs that
> > are specific to the RSS program implementation, but does not seems
> > to do anything especially interesting.  It calls into rss_bpf__open()
> > which eventually gets around to calling rss_bpf__create_skeleton
> > which is where the interesting stuff happens.
> >
> > This rss_bpf__create_skeleton() method is implemented in terms of
> > totally generic libbpf APIs, and has the actual blob that is the
> > BPF program.
> >
> > Looking at what this does, I feel it should be trivial for a mgmt
> > app to implement equivalent logic to rss_bpf__create_skeleton in a
> > generic manner, if we could just expose the program blob and the
> > map names to the mgmt app. eg a simple json file
> >
> >   {
> >      "maps": [
> >         "tap_rss_map_configurations",
> >         "tap_rss_map_indirection_table",
> >         "tap_rss_map_toeplitz_key",
> >      ],
> >      "program": "....the big blob encoded in base64..."
> >   }
> >
> > if we installed that file are /usr/share/qemu/bpf/net-rss.json
> > then when a QEMU process is started, the mgmt app capture the
> > data in this JSON file. It now has enough info to create the
> > EBPF programs needed and pass the FDs over to QEMU. This would
> > be robust against QEMU software upgrades, and not tied to the
> > specific EBPF program imlp. We can add or remove maps / change
> > their names etc any time, as the details in the JSON descriptor
> > can be updated.  This avoids need for any special helper program
> > to be provided by QEMU with the problems that is throwing up
> > for us.
> >
> 
> If I understand correctly, the libvirt will have the same problem in the
> in-place update scenario with the JSON file
> Let's say that there is no virtio-net device at the initial start and the
> libvirt does not need to load the JSON file.
> On the later hot plug of virtio-net it will go to the JSON file after the
> update, correct?

Libvirt does work to detect capabilities of a QEMU binary before starting
a guest, and caches this info for later use when the VM is running.  I'm
suggesting that libvirt can detect the BPF program info at this time and
cache it, regardless of whether it initially needs to use the BPF program.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-03-01  9:30               ` Daniel P. Berrangé
@ 2023-03-01 14:53                 ` Toke Høiland-Jørgensen
  2023-03-01 15:05                   ` Daniel P. Berrangé
  0 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-03-01 14:53 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Yuri Benditovich, Andrew Melnychenko, jasowang, mst, pbonzini,
	marcandre.lureau, thuth, philmd, armbru, eblake, qemu-devel,
	mprivozn, yan

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
>> >> Daniel P. Berrangé <berrange@redhat.com> writes:
>> >> 
>> >> Just to interject a note on this here: the skeleton code is mostly a
>> >> convenience feature used to embed BPF programs into the calling binary.
>> >> It is perfectly possible to just have the BPF object file itself reside
>> >> directly in the file system and just use the regular libbpf APIs to load
>> >> it. Some things get a bit more cumbersome (mostly setting values of
>> >> global variables, if the BPF program uses those).
>> >> 
>> >> So the JSON example above could just be a regular compiled-from-clang
>> >> BPF object file, and the management program can load that, inspect its
>> >> contents using the libbpf APIs and pass the file descriptors on to Qemu.
>> >> It's even possible to embed version information into this so that Qemu
>> >> can check if it understands the format and bail out if it doesn't - just
>> >> stick a version field in the configuration map as the first entry :)
>> >
>> > If all you have is the BPF object file is it possible to interrogate
>> > it to get a list of all the maps, and get FDs associated for them ?
>> > I had a look at the libbpf API and wasn't sure about that, it seemed
>> > like you had to know the required maps upfront ?  If it is possible
>> > to auto-discover everything you need, soley from the BPF object file
>> > as input, then just dealing with that in isolation would feel simpler.
>> 
>> It is. You load the object file, and bpf_object__for_each_map() lets you
>> discover which maps it contains, with the different bpf_map__*() APIs
>> telling you the properties of that map (and you can modify them too
>> before loading the object if needed).
>> 
>> The only thing that's not in the object file is any initial data you
>> want to put into the map(s). But except for read-only maps that can be
>> added by userspace after loading the maps, so you could just let Qemu do
>> that...
>> 
>> > It occurrs to me that exposing the BPF program as data rather than
>> > via binary will make more practical to integrate this into KubeVirt's
>> > architecture. In their deployment setup both QEMU and libvirt are
>> > running unprivileged inside a container. For any advanced nmetworking
>> > a completely separate component creates the TAP device and passes it
>> > into the container running QEMU. I don't think that the separate
>> > precisely matched helper binary would be something they can use, but
>> > it might be possible to expose a data file providing the BPF program
>> > blob and describing its maps.
>> 
>> Well, "a data file providing the BPF program blob and describing its
>> maps" is basically what a BPF .o file is. It just happens to be encoded
>> in ELF format :)
>> 
>> You can embed it into some other data structure and have libbpf load it
>> from a blob in memory as well as from the filesystem, though; that is
>> basically what the skeleton file does (notice the big character string
>> at the end, that's just the original .o file contents).
>
> Ok, in that case I'm really wondering why any of this helper program
> stuff was proposed. I recall the rationale was that it was impossible
> for an external program to load the BPF object on behalf of QEMU,
> because it would not know how todo that without QEMU specific
> knowledge.

I'm not sure either. Was there some bits that initially needed to be set
before the program was loaded (read-only maps or something)? Also,
upstream does encourage the use of skeletons for embedding into
applications, so it's not an unreasonable thing to start with if you
don't have the kind of deployment constraints that Qemu does in this
case.

> It looks like we can simply expose the BPF object blob to mgmt apps
> directly and get rid of this helper program entirely.

I believe so, yes. You'd still need to be sure that the BPF object file
itself comes from a trusted place, but hopefully it should be enough to
load it from a known filesystem path? (Sorry if this is a stupid
question, I only have a fuzzy idea of how all the pieces fit together
here).

-Toke



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-03-01 14:53                 ` Toke Høiland-Jørgensen
@ 2023-03-01 15:05                   ` Daniel P. Berrangé
  2023-03-01 22:40                     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-03-01 15:05 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Yuri Benditovich, Andrew Melnychenko, jasowang, mst, pbonzini,
	marcandre.lureau, thuth, philmd, armbru, eblake, qemu-devel,
	mprivozn, yan

On Wed, Mar 01, 2023 at 03:53:47PM +0100, Toke Høiland-Jørgensen wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote:
> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >> 
> >> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
> >> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >> >> 
> >> >> Just to interject a note on this here: the skeleton code is mostly a
> >> >> convenience feature used to embed BPF programs into the calling binary.
> >> >> It is perfectly possible to just have the BPF object file itself reside
> >> >> directly in the file system and just use the regular libbpf APIs to load
> >> >> it. Some things get a bit more cumbersome (mostly setting values of
> >> >> global variables, if the BPF program uses those).
> >> >> 
> >> >> So the JSON example above could just be a regular compiled-from-clang
> >> >> BPF object file, and the management program can load that, inspect its
> >> >> contents using the libbpf APIs and pass the file descriptors on to Qemu.
> >> >> It's even possible to embed version information into this so that Qemu
> >> >> can check if it understands the format and bail out if it doesn't - just
> >> >> stick a version field in the configuration map as the first entry :)
> >> >
> >> > If all you have is the BPF object file is it possible to interrogate
> >> > it to get a list of all the maps, and get FDs associated for them ?
> >> > I had a look at the libbpf API and wasn't sure about that, it seemed
> >> > like you had to know the required maps upfront ?  If it is possible
> >> > to auto-discover everything you need, soley from the BPF object file
> >> > as input, then just dealing with that in isolation would feel simpler.
> >> 
> >> It is. You load the object file, and bpf_object__for_each_map() lets you
> >> discover which maps it contains, with the different bpf_map__*() APIs
> >> telling you the properties of that map (and you can modify them too
> >> before loading the object if needed).
> >> 
> >> The only thing that's not in the object file is any initial data you
> >> want to put into the map(s). But except for read-only maps that can be
> >> added by userspace after loading the maps, so you could just let Qemu do
> >> that...
> >> 
> >> > It occurrs to me that exposing the BPF program as data rather than
> >> > via binary will make more practical to integrate this into KubeVirt's
> >> > architecture. In their deployment setup both QEMU and libvirt are
> >> > running unprivileged inside a container. For any advanced nmetworking
> >> > a completely separate component creates the TAP device and passes it
> >> > into the container running QEMU. I don't think that the separate
> >> > precisely matched helper binary would be something they can use, but
> >> > it might be possible to expose a data file providing the BPF program
> >> > blob and describing its maps.
> >> 
> >> Well, "a data file providing the BPF program blob and describing its
> >> maps" is basically what a BPF .o file is. It just happens to be encoded
> >> in ELF format :)
> >> 
> >> You can embed it into some other data structure and have libbpf load it
> >> from a blob in memory as well as from the filesystem, though; that is
> >> basically what the skeleton file does (notice the big character string
> >> at the end, that's just the original .o file contents).
> >
> > Ok, in that case I'm really wondering why any of this helper program
> > stuff was proposed. I recall the rationale was that it was impossible
> > for an external program to load the BPF object on behalf of QEMU,
> > because it would not know how todo that without QEMU specific
> > knowledge.
> 
> I'm not sure either. Was there some bits that initially needed to be set
> before the program was loaded (read-only maps or something)? Also,
> upstream does encourage the use of skeletons for embedding into
> applications, so it's not an unreasonable thing to start with if you
> don't have the kind of deployment constraints that Qemu does in this
> case.
> 
> > It looks like we can simply expose the BPF object blob to mgmt apps
> > directly and get rid of this helper program entirely.
> 
> I believe so, yes. You'd still need to be sure that the BPF object file
> itself comes from a trusted place, but hopefully it should be enough to
> load it from a known filesystem path? (Sorry if this is a stupid
> question, I only have a fuzzy idea of how all the pieces fit together
> here).

It could be from a well known location on the filesystem, but might
be better to make it possible to query it from QMP, which is mostly
safe *provided* you've not yet started guest CPUs running. It could
be queried at startup and then cached for future use.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-03-01 15:05                   ` Daniel P. Berrangé
@ 2023-03-01 22:40                     ` Toke Høiland-Jørgensen
  2023-03-22 13:26                       ` Andrew Melnichenko
  0 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2023-03-01 22:40 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Yuri Benditovich, Andrew Melnychenko, jasowang, mst, pbonzini,
	marcandre.lureau, thuth, philmd, armbru, eblake, qemu-devel,
	mprivozn, yan

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Mar 01, 2023 at 03:53:47PM +0100, Toke Høiland-Jørgensen wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote:
>> >> Daniel P. Berrangé <berrange@redhat.com> writes:
>> >> 
>> >> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
>> >> >> Daniel P. Berrangé <berrange@redhat.com> writes:
>> >> >> 
>> >> >> Just to interject a note on this here: the skeleton code is mostly a
>> >> >> convenience feature used to embed BPF programs into the calling binary.
>> >> >> It is perfectly possible to just have the BPF object file itself reside
>> >> >> directly in the file system and just use the regular libbpf APIs to load
>> >> >> it. Some things get a bit more cumbersome (mostly setting values of
>> >> >> global variables, if the BPF program uses those).
>> >> >> 
>> >> >> So the JSON example above could just be a regular compiled-from-clang
>> >> >> BPF object file, and the management program can load that, inspect its
>> >> >> contents using the libbpf APIs and pass the file descriptors on to Qemu.
>> >> >> It's even possible to embed version information into this so that Qemu
>> >> >> can check if it understands the format and bail out if it doesn't - just
>> >> >> stick a version field in the configuration map as the first entry :)
>> >> >
>> >> > If all you have is the BPF object file is it possible to interrogate
>> >> > it to get a list of all the maps, and get FDs associated for them ?
>> >> > I had a look at the libbpf API and wasn't sure about that, it seemed
>> >> > like you had to know the required maps upfront ?  If it is possible
>> >> > to auto-discover everything you need, soley from the BPF object file
>> >> > as input, then just dealing with that in isolation would feel simpler.
>> >> 
>> >> It is. You load the object file, and bpf_object__for_each_map() lets you
>> >> discover which maps it contains, with the different bpf_map__*() APIs
>> >> telling you the properties of that map (and you can modify them too
>> >> before loading the object if needed).
>> >> 
>> >> The only thing that's not in the object file is any initial data you
>> >> want to put into the map(s). But except for read-only maps that can be
>> >> added by userspace after loading the maps, so you could just let Qemu do
>> >> that...
>> >> 
>> >> > It occurrs to me that exposing the BPF program as data rather than
>> >> > via binary will make more practical to integrate this into KubeVirt's
>> >> > architecture. In their deployment setup both QEMU and libvirt are
>> >> > running unprivileged inside a container. For any advanced nmetworking
>> >> > a completely separate component creates the TAP device and passes it
>> >> > into the container running QEMU. I don't think that the separate
>> >> > precisely matched helper binary would be something they can use, but
>> >> > it might be possible to expose a data file providing the BPF program
>> >> > blob and describing its maps.
>> >> 
>> >> Well, "a data file providing the BPF program blob and describing its
>> >> maps" is basically what a BPF .o file is. It just happens to be encoded
>> >> in ELF format :)
>> >> 
>> >> You can embed it into some other data structure and have libbpf load it
>> >> from a blob in memory as well as from the filesystem, though; that is
>> >> basically what the skeleton file does (notice the big character string
>> >> at the end, that's just the original .o file contents).
>> >
>> > Ok, in that case I'm really wondering why any of this helper program
>> > stuff was proposed. I recall the rationale was that it was impossible
>> > for an external program to load the BPF object on behalf of QEMU,
>> > because it would not know how todo that without QEMU specific
>> > knowledge.
>> 
>> I'm not sure either. Was there some bits that initially needed to be set
>> before the program was loaded (read-only maps or something)? Also,
>> upstream does encourage the use of skeletons for embedding into
>> applications, so it's not an unreasonable thing to start with if you
>> don't have the kind of deployment constraints that Qemu does in this
>> case.
>> 
>> > It looks like we can simply expose the BPF object blob to mgmt apps
>> > directly and get rid of this helper program entirely.
>> 
>> I believe so, yes. You'd still need to be sure that the BPF object file
>> itself comes from a trusted place, but hopefully it should be enough to
>> load it from a known filesystem path? (Sorry if this is a stupid
>> question, I only have a fuzzy idea of how all the pieces fit together
>> here).
>
> It could be from a well known location on the filesystem, but might
> be better to make it possible to query it from QMP, which is mostly
> safe *provided* you've not yet started guest CPUs running. It could
> be queried at startup and then cached for future use.

Right, I don't have a strong opinion about the exact mechanism, just
wanted to convey a general "loading an untrusted BPF program is bad"
kind of vibe ;)

-Toke



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-03-01 22:40                     ` Toke Høiland-Jørgensen
@ 2023-03-22 13:26                       ` Andrew Melnichenko
  2023-03-22 15:59                         ` Daniel P. Berrangé
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Melnichenko @ 2023-03-22 13:26 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel P. Berrangé,
	Yuri Benditovich, jasowang, mst, pbonzini, marcandre.lureau,
	thuth, philmd, armbru, eblake, qemu-devel, mprivozn, yan

Hi all,
I've researched an issue a bit. The solution with passing eBPF blob
and loading in the Libvirt looks promising.
Overall, the possible solution looks like this:
 * Libvirt checks virtio-net properties and understands that eBPF
steering may be required.
 * Libvirt requests eBPF blob through QMP.
 * Libvirt loads blob for virtio-net and passes fds from eBPF to QEMU.

I think that it's a good idea to pass only eBPF blob without
additional metainformation. Most metainfo that we need could be
retrieved from eBPF blob, and the only question is to pass fds
sequence to QEMU.
I propose to pass them as they appear in the blob itself, like
"virtio-net-pci,ebpf_rss_fds=<prog>,<map1>,<map2>,<map3>...".
Also, I think it's a good idea to make a "general" QMP request for
eBPF blobs. Something like "request_ebpf <arg>"(g.e "request_ebpf
virtio-net-rss").

I'll prepare new RFC patches if you have questions or something to
discuss, please let me know.

On Thu, Mar 2, 2023 at 12:40 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Daniel P. Berrangé <berrange@redhat.com> writes:
>
> > On Wed, Mar 01, 2023 at 03:53:47PM +0100, Toke Høiland-Jørgensen wrote:
> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >>
> >> > On Tue, Feb 28, 2023 at 11:21:56PM +0100, Toke Høiland-Jørgensen wrote:
> >> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >> >>
> >> >> > On Tue, Feb 28, 2023 at 08:01:51PM +0100, Toke Høiland-Jørgensen wrote:
> >> >> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >> >> >>
> >> >> >> Just to interject a note on this here: the skeleton code is mostly a
> >> >> >> convenience feature used to embed BPF programs into the calling binary.
> >> >> >> It is perfectly possible to just have the BPF object file itself reside
> >> >> >> directly in the file system and just use the regular libbpf APIs to load
> >> >> >> it. Some things get a bit more cumbersome (mostly setting values of
> >> >> >> global variables, if the BPF program uses those).
> >> >> >>
> >> >> >> So the JSON example above could just be a regular compiled-from-clang
> >> >> >> BPF object file, and the management program can load that, inspect its
> >> >> >> contents using the libbpf APIs and pass the file descriptors on to Qemu.
> >> >> >> It's even possible to embed version information into this so that Qemu
> >> >> >> can check if it understands the format and bail out if it doesn't - just
> >> >> >> stick a version field in the configuration map as the first entry :)
> >> >> >
> >> >> > If all you have is the BPF object file is it possible to interrogate
> >> >> > it to get a list of all the maps, and get FDs associated for them ?
> >> >> > I had a look at the libbpf API and wasn't sure about that, it seemed
> >> >> > like you had to know the required maps upfront ?  If it is possible
> >> >> > to auto-discover everything you need, soley from the BPF object file
> >> >> > as input, then just dealing with that in isolation would feel simpler.
> >> >>
> >> >> It is. You load the object file, and bpf_object__for_each_map() lets you
> >> >> discover which maps it contains, with the different bpf_map__*() APIs
> >> >> telling you the properties of that map (and you can modify them too
> >> >> before loading the object if needed).
> >> >>
> >> >> The only thing that's not in the object file is any initial data you
> >> >> want to put into the map(s). But except for read-only maps that can be
> >> >> added by userspace after loading the maps, so you could just let Qemu do
> >> >> that...
> >> >>
> >> >> > It occurrs to me that exposing the BPF program as data rather than
> >> >> > via binary will make more practical to integrate this into KubeVirt's
> >> >> > architecture. In their deployment setup both QEMU and libvirt are
> >> >> > running unprivileged inside a container. For any advanced nmetworking
> >> >> > a completely separate component creates the TAP device and passes it
> >> >> > into the container running QEMU. I don't think that the separate
> >> >> > precisely matched helper binary would be something they can use, but
> >> >> > it might be possible to expose a data file providing the BPF program
> >> >> > blob and describing its maps.
> >> >>
> >> >> Well, "a data file providing the BPF program blob and describing its
> >> >> maps" is basically what a BPF .o file is. It just happens to be encoded
> >> >> in ELF format :)
> >> >>
> >> >> You can embed it into some other data structure and have libbpf load it
> >> >> from a blob in memory as well as from the filesystem, though; that is
> >> >> basically what the skeleton file does (notice the big character string
> >> >> at the end, that's just the original .o file contents).
> >> >
> >> > Ok, in that case I'm really wondering why any of this helper program
> >> > stuff was proposed. I recall the rationale was that it was impossible
> >> > for an external program to load the BPF object on behalf of QEMU,
> >> > because it would not know how todo that without QEMU specific
> >> > knowledge.
> >>
> >> I'm not sure either. Was there some bits that initially needed to be set
> >> before the program was loaded (read-only maps or something)? Also,
> >> upstream does encourage the use of skeletons for embedding into
> >> applications, so it's not an unreasonable thing to start with if you
> >> don't have the kind of deployment constraints that Qemu does in this
> >> case.
> >>
> >> > It looks like we can simply expose the BPF object blob to mgmt apps
> >> > directly and get rid of this helper program entirely.
> >>
> >> I believe so, yes. You'd still need to be sure that the BPF object file
> >> itself comes from a trusted place, but hopefully it should be enough to
> >> load it from a known filesystem path? (Sorry if this is a stupid
> >> question, I only have a fuzzy idea of how all the pieces fit together
> >> here).
> >
> > It could be from a well known location on the filesystem, but might
> > be better to make it possible to query it from QMP, which is mostly
> > safe *provided* you've not yet started guest CPUs running. It could
> > be queried at startup and then cached for future use.
>
> Right, I don't have a strong opinion about the exact mechanism, just
> wanted to convey a general "loading an untrusted BPF program is bad"
> kind of vibe ;)
>
> -Toke
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/5] qmp: Added the helper stamp check.
  2023-03-22 13:26                       ` Andrew Melnichenko
@ 2023-03-22 15:59                         ` Daniel P. Berrangé
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-03-22 15:59 UTC (permalink / raw)
  To: Andrew Melnichenko
  Cc: Toke Høiland-Jørgensen, Yuri Benditovich, jasowang,
	mst, pbonzini, marcandre.lureau, thuth, philmd, armbru, eblake,
	qemu-devel, mprivozn, yan

On Wed, Mar 22, 2023 at 03:26:59PM +0200, Andrew Melnichenko wrote:
> Hi all,
> I've researched an issue a bit. The solution with passing eBPF blob
> and loading in the Libvirt looks promising.
> Overall, the possible solution looks like this:
>  * Libvirt checks virtio-net properties and understands that eBPF
> steering may be required.
>  * Libvirt requests eBPF blob through QMP.
>  * Libvirt loads blob for virtio-net and passes fds from eBPF to QEMU.
> 
> I think that it's a good idea to pass only eBPF blob without
> additional metainformation. Most metainfo that we need could be
> retrieved from eBPF blob, and the only question is to pass fds
> sequence to QEMU.
> I propose to pass them as they appear in the blob itself, like
> "virtio-net-pci,ebpf_rss_fds=<prog>,<map1>,<map2>,<map3>...".

Using ',' for separating FDs is a bad idea, because ',' is already
used for separating QemuOpts arguments.

With -netdev we use ':' for spearating FDs with vhostfds= and fds=
arguments, so I'd suggest following that practice.

> Also, I think it's a good idea to make a "general" QMP request for
> eBPF blobs. Something like "request_ebpf <arg>"(g.e "request_ebpf
> virtio-net-rss").

That's reasonable as a future proofing idea I think.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 3/5] qmp: Added the helper stamp check.
  2021-07-13 15:37 [PATCH 0/5] ebpf: Added ebpf helper for libvirtd Andrew Melnychenko
@ 2021-07-13 15:37 ` Andrew Melnychenko
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Melnychenko @ 2021-07-13 15:37 UTC (permalink / raw)
  To: mst, yuri.benditovich, jasowang, armbru, eblake, berrange; +Cc: yan, qemu-devel

Added function to check the stamp in the helper.
eBPF helper should have a special symbol that generates during build.
QEMU checks the helper and determinates that it fits, so the helper
will produce proper output.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 meson.build                       |  10 +
 monitor/meson.build               |   1 +
 monitor/qemu-helper-stamp-utils.c | 297 ++++++++++++++++++++++++++++++
 monitor/qemu-helper-stamp-utils.h |  24 +++
 4 files changed, 332 insertions(+)
 create mode 100644 monitor/qemu-helper-stamp-utils.c
 create mode 100644 monitor/qemu-helper-stamp-utils.h

diff --git a/meson.build b/meson.build
index 626cf932c1..257e51d91b 100644
--- a/meson.build
+++ b/meson.build
@@ -1757,6 +1757,16 @@ foreach d : hx_headers
 endforeach
 genh += hxdep
 
+helper_stamp = custom_target(
+    'qemu-helper-stamp.h',
+    output : 'qemu-helper-stamp.h',
+    input : 'ebpf/rss.bpf.skeleton.h',
+    command : [python, '-c', 'import hashlib; print(\'#define QEMU_HELPER_STAMP qemuHelperStamp_{}\'.format(hashlib.sha1(open(\'@INPUT@\', \'rb\').read()).hexdigest()))'],
+    capture: true,
+)
+
+genh += helper_stamp
+
 ###################
 # Collect sources #
 ###################
diff --git a/monitor/meson.build b/monitor/meson.build
index 6d00985ace..2b6b39549b 100644
--- a/monitor/meson.build
+++ b/monitor/meson.build
@@ -5,5 +5,6 @@ softmmu_ss.add(files(
   'hmp.c',
 ))
 softmmu_ss.add([spice_headers, files('qmp-cmds.c')])
+softmmu_ss.add(files('qemu-helper-stamp-utils.c'))
 
 specific_ss.add(when: 'CONFIG_SOFTMMU', if_true: [files('misc.c'), spice])
diff --git a/monitor/qemu-helper-stamp-utils.c b/monitor/qemu-helper-stamp-utils.c
new file mode 100644
index 0000000000..d34c3b94c5
--- /dev/null
+++ b/monitor/qemu-helper-stamp-utils.c
@@ -0,0 +1,297 @@
+/*
+ * QEMU helper stamp check utils.
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko <andrew@daynix.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Description: This file mostly implements helper stamp checking.
+ *              The stamp is implemented in a similar way as in qemu modules.
+ *              The helper should contain a specific symbol.
+ *              Not in a similar way is symbol checking - here we parse
+ *              the ELF file. For now(10.07.2021), only eBPF helper contains
+ *              the stamp, and the stamp is generated from
+ *              sha1 ebpf/rss.bpf.skeleton.h (see meson.build).
+ */
+
+#include "qemu/osdep.h"
+#include "elf.h"
+#include "qemu-helper-stamp-utils.h"
+
+#include <glib/gstdio.h>
+
+#ifdef CONFIG_LINUX
+
+static void *file_allocate_and_read(int fd, off_t off, size_t size)
+{
+    void *data;
+    int err;
+
+    if (fd < 0) {
+        return NULL;
+    }
+
+    err = lseek(fd, off, SEEK_SET);
+    if (err < 0) {
+        return NULL;
+    }
+
+    data = g_new0(char, size);
+    if (data == NULL) {
+        return NULL;
+    }
+
+    err = read(fd, data, size);
+    if (err < 0) {
+        g_free(data);
+        return NULL;
+    }
+
+    return data;
+}
+
+static Elf64_Shdr *elf64_get_section_table(int fd, Elf64_Ehdr *elf_header)
+{
+    if (elf_header == NULL) {
+        return NULL;
+    }
+    return (Elf64_Shdr *)file_allocate_and_read(fd, elf_header->e_shoff,
+                             elf_header->e_shnum * elf_header->e_shentsize);
+}
+
+static Elf32_Shdr *elf32_get_section_table(int fd, Elf32_Ehdr *elf_header)
+{
+    if (elf_header == NULL) {
+        return NULL;
+    }
+    return (Elf32_Shdr *)file_allocate_and_read(fd, elf_header->e_shoff,
+                             elf_header->e_shnum * elf_header->e_shentsize);
+}
+
+static void *elf64_get_section_data(int fd, const Elf64_Shdr* section_header)
+{
+    if (fd < 0 || section_header == NULL) {
+        return NULL;
+    }
+    return file_allocate_and_read(fd, section_header->sh_offset,
+                                  section_header->sh_size);
+}
+
+static void *elf32_get_section_data(int fd, const Elf32_Shdr* section_header)
+{
+    if (fd < 0 || section_header == NULL) {
+        return NULL;
+    }
+    return file_allocate_and_read(fd, section_header->sh_offset,
+                                  section_header->sh_size);
+}
+
+static bool elf64_check_symbol_in_symbol_table(int fd,
+                                               Elf64_Shdr *section_table,
+                                               Elf64_Shdr *symbol_section,
+                                               const char *symbol)
+{
+    Elf64_Sym *symbol_table;
+    char *string_table;
+    uint32_t i;
+    bool ret = false;
+
+    symbol_table = (Elf64_Sym *) elf64_get_section_data(fd, symbol_section);
+    if (symbol_table == NULL) {
+        return false;
+    }
+
+    string_table = (char *) elf64_get_section_data(
+            fd, section_table + symbol_section->sh_link);
+    if (string_table == NULL) {
+        g_free(symbol_table);
+        return false;
+    }
+
+    for (i = 0; i < (symbol_section->sh_size / sizeof(Elf64_Sym)); ++i) {
+        if (strncmp((string_table + symbol_table[i].st_name),
+                     symbol, strlen(symbol)) == 0)
+        {
+            ret = true;
+            break;
+        }
+    }
+
+    g_free(string_table);
+    g_free(symbol_table);
+    return ret;
+}
+
+static bool elf32_check_symbol_in_symbol_table(int fd,
+                                               Elf32_Shdr *section_table,
+                                               Elf32_Shdr *symbol_section,
+                                               const char *symbol)
+{
+    Elf32_Sym *symbol_table;
+    char *string_table;
+    uint32_t i;
+    bool ret = false;
+
+    symbol_table = (Elf32_Sym *) elf32_get_section_data(fd, symbol_section);
+    if (symbol_table == NULL) {
+        return false;
+    }
+
+    string_table = (char *) elf32_get_section_data(fd,
+                                       section_table + symbol_section->sh_link);
+    if (string_table == NULL) {
+        g_free(symbol_table);
+        return false;
+    }
+
+    for (i = 0; i < (symbol_section->sh_size / sizeof(Elf32_Sym)); ++i) {
+        if (strncmp((string_table + symbol_table[i].st_name),
+                     symbol, strlen(symbol)) == 0)
+        {
+            ret = true;
+            break;
+        }
+    }
+
+    g_free(string_table);
+    g_free(symbol_table);
+    return ret;
+}
+
+static bool elf64_check_stamp(int fd, Elf64_Ehdr *elf_header, const char *stamp)
+{
+    Elf64_Shdr *section_table;
+    size_t i;
+    bool ret = false;
+
+    section_table = elf64_get_section_table(fd, elf_header);
+    if (section_table == NULL) {
+        return false;
+    }
+
+    for (i = 0; i < elf_header->e_shnum; ++i) {
+        if ((section_table[i].sh_type == SHT_SYMTAB)
+             || (section_table[i].sh_type == SHT_DYNSYM)) {
+            if (elf64_check_symbol_in_symbol_table(fd, section_table,
+                                                   section_table + i, stamp)) {
+                ret = true;
+                break;
+            }
+        }
+    }
+
+    g_free(section_table);
+    return ret;
+}
+
+static bool elf32_check_stamp(int fd, Elf32_Ehdr *elf_header, const char *stamp)
+{
+    Elf32_Shdr *section_table;
+    size_t i;
+    bool ret = false;
+
+    section_table = elf32_get_section_table(fd, elf_header);
+    if (section_table == NULL) {
+        return false;
+    }
+
+    for (i = 0; i < elf_header->e_shnum; ++i) {
+        if ((section_table[i].sh_type == SHT_SYMTAB)
+             || (section_table[i].sh_type == SHT_DYNSYM)) {
+            if (elf32_check_symbol_in_symbol_table(fd, section_table,
+                                                   section_table + i, stamp)) {
+                ret = true;
+                break;
+            }
+        }
+    }
+
+    g_free(section_table);
+    return ret;
+}
+
+bool qemu_check_helper_stamp(const char *path, const char *stamp)
+{
+    int fd;
+    bool ret = false;
+    Elf64_Ehdr *elf_header;
+
+    fd = open(path, O_RDONLY | O_SYNC);
+    if (fd < 0) {
+        return false;
+    }
+
+    elf_header = (Elf64_Ehdr *)file_allocate_and_read(
+            fd, 0, sizeof(Elf64_Ehdr));
+    if (elf_header == NULL) {
+        goto error;
+    }
+
+    if (strncmp((char *)elf_header->e_ident, ELFMAG, SELFMAG)) {
+        g_free(elf_header);
+        goto error;
+    }
+
+    if (elf_header->e_ident[EI_CLASS] == ELFCLASS64) {
+        ret = elf64_check_stamp(fd, elf_header, stamp);
+    } else if (elf_header->e_ident[EI_CLASS] == ELFCLASS32) {
+        ret = elf32_check_stamp(fd, (Elf32_Ehdr *)elf_header, stamp);
+    }
+
+    g_free(elf_header);
+error:
+    close(fd);
+    return ret;
+}
+
+#else
+
+bool qemu_check_helper_stamp(const char *path, const char *stamp)
+{
+    return false;
+}
+
+#endif
+
+char *qemu_find_helper(const char *name, bool check_stamp)
+{
+    char *qemu_exec = NULL;
+    char *qemu_dir = NULL;
+    char *helper = NULL;
+
+    if (name == NULL) {
+        return NULL;
+    }
+
+    helper = g_build_filename(CONFIG_QEMU_HELPERDIR, name, NULL);
+    if (g_access(helper, F_OK) == 0
+        && (!check_stamp
+            || qemu_check_helper_stamp(helper, QEMU_HELPER_STAMP_STR))) {
+        return helper;
+    }
+    g_free(helper);
+
+#ifdef CONFIG_LINUX
+    qemu_exec = g_file_read_link("/proc/self/exe", NULL);
+#else
+    qemu_exec = NULL;
+#endif
+    if (qemu_exec != NULL) {
+        qemu_dir = g_path_get_dirname(qemu_exec);
+        g_free(qemu_exec);
+        helper = g_build_filename(qemu_dir, name, NULL);
+        g_free(qemu_dir);
+        if (g_access(helper, F_OK) == 0
+           && (!check_stamp
+               || qemu_check_helper_stamp(helper, QEMU_HELPER_STAMP_STR))) {
+            return helper;
+        }
+        g_free(helper);
+    }
+
+    return NULL;
+}
diff --git a/monitor/qemu-helper-stamp-utils.h b/monitor/qemu-helper-stamp-utils.h
new file mode 100644
index 0000000000..e64cf96aa6
--- /dev/null
+++ b/monitor/qemu-helper-stamp-utils.h
@@ -0,0 +1,24 @@
+/*
+ * QEMU helper stamp check utils.
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko <andrew@daynix.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_QEMU_HELPER_STAMP_UTILS_H
+#define QEMU_QEMU_HELPER_STAMP_UTILS_H
+
+#include "qemu-helper-stamp.h" /* generated stamp per build */
+
+#define QEMU_HELPER_STAMP_STR     stringify(QEMU_HELPER_STAMP)
+
+bool qemu_check_helper_stamp(const char *path, const char *stamp);
+
+char *qemu_find_helper(const char *name, bool check_stamp);
+
+#endif /* QEMU_QEMU_HELPER_STAMP_UTILS_H */
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2023-03-22 16:00 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-19 16:20 [PATCH 0/5] eBPF RSS Helper support Andrew Melnychenko
2023-02-19 16:20 ` [PATCH 1/5] ebpf: Added eBPF initialization by fds and map update Andrew Melnychenko
2023-02-19 16:20 ` [PATCH 2/5] virtio-net: Added property to load eBPF RSS with fds Andrew Melnychenko
2023-02-19 16:20 ` [PATCH 3/5] qmp: Added the helper stamp check Andrew Melnychenko
2023-02-20  9:49   ` Daniel P. Berrangé
2023-02-27  3:45     ` Andrew Melnichenko
2023-02-27 14:06       ` Toke Høiland-Jørgensen
2023-02-28  9:56     ` Yuri Benditovich
2023-02-28 18:04       ` Daniel P. Berrangé
2023-02-28 19:01         ` Toke Høiland-Jørgensen
2023-02-28 19:03           ` Daniel P. Berrangé
2023-02-28 22:21             ` Toke Høiland-Jørgensen
2023-03-01  9:30               ` Daniel P. Berrangé
2023-03-01 14:53                 ` Toke Høiland-Jørgensen
2023-03-01 15:05                   ` Daniel P. Berrangé
2023-03-01 22:40                     ` Toke Høiland-Jørgensen
2023-03-22 13:26                       ` Andrew Melnichenko
2023-03-22 15:59                         ` Daniel P. Berrangé
2023-02-28 19:01         ` Daniel P. Berrangé
2023-03-01  6:49         ` Yuri Benditovich
2023-03-01  9:31           ` Daniel P. Berrangé
2023-02-19 16:20 ` [PATCH 4/5] ebpf_rss_helper: Added helper for eBPF RSS Andrew Melnychenko
2023-02-20  9:54   ` Daniel P. Berrangé
2023-02-27  3:50     ` Andrew Melnichenko
2023-02-19 16:21 ` [PATCH 5/5] qmp: Added find-ebpf-rss-helper command Andrew Melnychenko
  -- strict thread matches above, loose matches on Subject: below --
2021-07-13 15:37 [PATCH 0/5] ebpf: Added ebpf helper for libvirtd Andrew Melnychenko
2021-07-13 15:37 ` [PATCH 3/5] qmp: Added the helper stamp check Andrew Melnychenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.