* [RFC PATCH 0/6] eBPF RSS support for virtio-net
@ 2020-11-02 18:51 Andrew Melnychenko
  2020-11-02 18:51 ` [RFC PATCH 1/6] net: Added SetSteeringEBPF method for NetClientState Andrew Melnychenko
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: Andrew Melnychenko @ 2020-11-02 18:51 UTC (permalink / raw)
  To: jasowang, mst; +Cc: yan, yuri.benditovich, Andrew Melnychenko, qemu-devel

The basic idea is to use eBPF to calculate packet hashes and steer packets in TAP.
RSS (Receive Side Scaling) distributes network packets to guest virtqueues
by calculating a packet hash.
eBPF RSS allows us to use RSS together with vhost TAP.

This set of patches introduces the use of eBPF for packet steering
and RSS hash calculation:
* RSS (Receive Side Scaling) distributes network packets to
guest virtqueues by calculating a packet hash
* eBPF RSS is expected to be faster than the existing 'software'
implementation in QEMU
* Additionally, this adds support for using RSS with vhost

Supported kernels: 5.8+

Implementation notes:
The Linux TAP TUNSETSTEERINGEBPF ioctl is used to set the eBPF program.
eBPF support was added to QEMU directly through system calls; see
bpf(2) for details.
The eBPF program is part of QEMU and is embedded as an array of BPF
instructions.
The program can be recompiled with the provided Makefile.ebpf (adjust
'linuxhdrs' as needed),
although that is not required to build QEMU with eBPF support.
virtio-net and vhost were changed so that eBPF RSS is used primarily.
'Software' RSS is used when hash population is needed and as a fallback.
For vhost, the hash population feature is not reported to the guest.

Please also see the documentation in PATCH 6/6.

I am sending these patches as an RFC to initiate discussion and get
feedback on the following points:
* Fallback when eBPF is not supported by the kernel
* Live migration to the kernel that doesn't have eBPF support
* Integration with current QEMU build
* Additional uses of eBPF, e.g. for packet filtering

Known issues:
* hash population is not supported by eBPF RSS: 'software' RSS is used
as a fallback; also, the hash population feature is not reported to guests
with vhost.
* big-endian BPF support: for now, eBPF is disabled for big-endian systems.

Andrew (6):
  Added SetSteeringEBPF method for NetClientState.
  ebpf: Added basic eBPF API.
  ebpf: Added eBPF RSS program.
  ebpf: Added eBPF RSS loader.
  virtio-net: Added eBPF RSS to virtio-net.
  docs: Added eBPF documentation.

 MAINTAINERS                    |   6 +
 configure                      |  36 +++
 docs/ebpf.rst                  |  29 ++
 docs/ebpf_rss.rst              | 129 ++++++++
 ebpf/EbpfElf_to_C.py           |  67 ++++
 ebpf/Makefile.ebpf             |  38 +++
 ebpf/ebpf-stub.c               |  28 ++
 ebpf/ebpf.c                    | 107 +++++++
 ebpf/ebpf.h                    |  35 +++
 ebpf/ebpf_rss.c                | 178 +++++++++++
 ebpf/ebpf_rss.h                |  30 ++
 ebpf/meson.build               |   1 +
 ebpf/rss.bpf.c                 | 470 ++++++++++++++++++++++++++++
 ebpf/trace-events              |   4 +
 ebpf/trace.h                   |   2 +
 ebpf/tun_rss_steering.h        | 556 +++++++++++++++++++++++++++++++++
 hw/net/vhost_net.c             |   2 +
 hw/net/virtio-net.c            | 120 ++++++-
 include/hw/virtio/virtio-net.h |   4 +
 include/net/net.h              |   2 +
 meson.build                    |   3 +
 net/tap-bsd.c                  |   5 +
 net/tap-linux.c                |  19 ++
 net/tap-solaris.c              |   5 +
 net/tap-stub.c                 |   5 +
 net/tap.c                      |   9 +
 net/tap_int.h                  |   1 +
 net/vhost-vdpa.c               |   2 +
 28 files changed, 1889 insertions(+), 4 deletions(-)
 create mode 100644 docs/ebpf.rst
 create mode 100644 docs/ebpf_rss.rst
 create mode 100644 ebpf/EbpfElf_to_C.py
 create mode 100755 ebpf/Makefile.ebpf
 create mode 100644 ebpf/ebpf-stub.c
 create mode 100644 ebpf/ebpf.c
 create mode 100644 ebpf/ebpf.h
 create mode 100644 ebpf/ebpf_rss.c
 create mode 100644 ebpf/ebpf_rss.h
 create mode 100644 ebpf/meson.build
 create mode 100644 ebpf/rss.bpf.c
 create mode 100644 ebpf/trace-events
 create mode 100644 ebpf/trace.h
 create mode 100644 ebpf/tun_rss_steering.h

-- 
2.28.0




* [RFC PATCH 1/6] net: Added SetSteeringEBPF method for NetClientState.
  2020-11-02 18:51 [RFC PATCH 0/6] eBPF RSS support for virtio-net Andrew Melnychenko
@ 2020-11-02 18:51 ` Andrew Melnychenko
  2020-11-04  2:49   ` Jason Wang
  2020-11-02 18:51 ` [RFC PATCH 2/6] ebpf: Added basic eBPF API Andrew Melnychenko
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Andrew Melnychenko @ 2020-11-02 18:51 UTC (permalink / raw)
  To: jasowang, mst; +Cc: yan, yuri.benditovich, Andrew, qemu-devel

From: Andrew <andrew@daynix.com>

For now, this method is supported only by Linux TAP.
Linux TAP uses the TUNSETSTEERINGEBPF ioctl.
TUNSETSTEERINGEBPF was added 3 years ago.
QEMU checks whether it is defined before using it.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 include/net/net.h |  2 ++
 net/tap-bsd.c     |  5 +++++
 net/tap-linux.c   | 19 +++++++++++++++++++
 net/tap-solaris.c |  5 +++++
 net/tap-stub.c    |  5 +++++
 net/tap.c         |  9 +++++++++
 net/tap_int.h     |  1 +
 7 files changed, 46 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 897b2d7595..d8a41fb010 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -60,6 +60,7 @@ typedef int (SetVnetBE)(NetClientState *, bool);
 typedef struct SocketReadState SocketReadState;
 typedef void (SocketReadStateFinalize)(SocketReadState *rs);
 typedef void (NetAnnounce)(NetClientState *);
+typedef bool (SetSteeringEBPF)(NetClientState *, int);
 
 typedef struct NetClientInfo {
     NetClientDriver type;
@@ -81,6 +82,7 @@ typedef struct NetClientInfo {
     SetVnetLE *set_vnet_le;
     SetVnetBE *set_vnet_be;
     NetAnnounce *announce;
+    SetSteeringEBPF *set_steering_ebpf;
 } NetClientInfo;
 
 struct NetClientState {
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 77aaf674b1..4f64f31e98 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -259,3 +259,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
 {
     return -1;
 }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+    return -1;
+}
diff --git a/net/tap-linux.c b/net/tap-linux.c
index b0635e9e32..196373019f 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -31,6 +31,7 @@
 
 #include <net/if.h>
 #include <sys/ioctl.h>
+#include <linux/if_tun.h> /* TUNSETSTEERINGEBPF */
 
 #include "qapi/error.h"
 #include "qemu/error-report.h"
@@ -316,3 +317,21 @@ int tap_fd_get_ifname(int fd, char *ifname)
     pstrcpy(ifname, sizeof(ifr.ifr_name), ifr.ifr_name);
     return 0;
 }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+#ifdef TUNSETSTEERINGEBPF
+    if (ioctl(fd, TUNSETSTEERINGEBPF, (void *) &prog_fd) != 0) {
+        error_report("Issue while setting TUNSETSTEERINGEBPF:"
+                     " %s with fd: %d, prog_fd: %d",
+                     strerror(errno), fd, prog_fd);
+
+        return -1;
+    }
+
+    return 0;
+#else
+    error_report("TUNSETSTEERINGEBPF is not supported");
+    return -1;
+#endif
+}
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 0475a58207..d85224242b 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -255,3 +255,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
 {
     return -1;
 }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+    return -1;
+}
diff --git a/net/tap-stub.c b/net/tap-stub.c
index de525a2e69..a0fa25804b 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -85,3 +85,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
 {
     return -1;
 }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+    return -1;
+}
diff --git a/net/tap.c b/net/tap.c
index c46ff66184..81f50017bd 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -337,6 +337,14 @@ static void tap_poll(NetClientState *nc, bool enable)
     tap_write_poll(s, enable);
 }
 
+static bool tap_set_steering_ebpf(NetClientState *nc, int prog_fd)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+    assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
+
+    return tap_fd_set_steering_ebpf(s->fd, prog_fd) == 0;
+}
+
 int tap_get_fd(NetClientState *nc)
 {
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
@@ -362,6 +370,7 @@ static NetClientInfo net_tap_info = {
     .set_vnet_hdr_len = tap_set_vnet_hdr_len,
     .set_vnet_le = tap_set_vnet_le,
     .set_vnet_be = tap_set_vnet_be,
+    .set_steering_ebpf = tap_set_steering_ebpf,
 };
 
 static TAPState *net_tap_fd_init(NetClientState *peer,
diff --git a/net/tap_int.h b/net/tap_int.h
index 225a49ea48..547f8a5a28 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -44,5 +44,6 @@ int tap_fd_set_vnet_be(int fd, int vnet_is_be);
 int tap_fd_enable(int fd);
 int tap_fd_disable(int fd);
 int tap_fd_get_ifname(int fd, char *ifname);
+int tap_fd_set_steering_ebpf(int fd, int prog_fd);
 
 #endif /* NET_TAP_INT_H */
-- 
2.28.0




* [RFC PATCH 2/6] ebpf: Added basic eBPF API.
  2020-11-02 18:51 [RFC PATCH 0/6] eBPF RSS support for virtio-net Andrew Melnychenko
  2020-11-02 18:51 ` [RFC PATCH 1/6] net: Added SetSteeringEBPF method for NetClientState Andrew Melnychenko
@ 2020-11-02 18:51 ` Andrew Melnychenko
  2020-11-02 18:51 ` [RFC PATCH 3/6] ebpf: Added eBPF RSS program Andrew Melnychenko
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Andrew Melnychenko @ 2020-11-02 18:51 UTC (permalink / raw)
  To: jasowang, mst; +Cc: yan, yuri.benditovich, Andrew, qemu-devel

From: Andrew <andrew@daynix.com>

Added basic functions for creating eBPF maps and loading programs.
Also added a helper function to 'fix up' eBPF map descriptors in programs.
At runtime, the eBPF map file descriptors get different values, and they
must be patched into the eBPF instructions for the program to work properly.
This is similar to processing an ELF relocation table section.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 ebpf/ebpf.c       | 107 ++++++++++++++++++++++++++++++++++++++++++++++
 ebpf/ebpf.h       |  35 +++++++++++++++
 ebpf/trace-events |   4 ++
 ebpf/trace.h      |   2 +
 4 files changed, 148 insertions(+)
 create mode 100644 ebpf/ebpf.c
 create mode 100644 ebpf/ebpf.h
 create mode 100644 ebpf/trace-events
 create mode 100644 ebpf/trace.h

diff --git a/ebpf/ebpf.c b/ebpf/ebpf.c
new file mode 100644
index 0000000000..cec35a484c
--- /dev/null
+++ b/ebpf/ebpf.c
@@ -0,0 +1,107 @@
+#include "ebpf/ebpf.h"
+#include <sys/syscall.h>
+#include "trace.h"
+
+#define ptr_to_u64(x) ((uint64_t)(uintptr_t)x)
+
+static inline int ebpf(enum bpf_cmd cmd, union bpf_attr *attr,
+        unsigned int size)
+{
+    int ret = syscall(__NR_bpf, cmd, attr, size);
+    if (ret < 0) {
+        trace_ebpf_error("eBPF syscall error", strerror(errno));
+    }
+
+    return ret;
+}
+
+int bpf_create_map(enum bpf_map_type map_type,
+                   unsigned int key_size,
+                   unsigned int value_size,
+                   unsigned int max_entries)
+{
+    union bpf_attr attr = {
+            .map_type    = map_type,
+            .key_size    = key_size,
+            .value_size  = value_size,
+            .max_entries = max_entries
+    };
+
+    return ebpf(BPF_MAP_CREATE, &attr, sizeof(attr));
+}
+
+int bpf_lookup_elem(int fd, const void *key, void *value)
+{
+    union bpf_attr attr = {
+            .map_fd = (uint32_t)fd,
+            .key    = ptr_to_u64(key),
+            .value  = ptr_to_u64(value),
+    };
+
+    return ebpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
+}
+
+int bpf_update_elem(int fd, const void *key, const void *value,
+                    uint64_t flags)
+{
+    union bpf_attr attr = {
+            .map_fd = (uint32_t)fd,
+            .key    = ptr_to_u64(key),
+            .value  = ptr_to_u64(value),
+            .flags  = flags,
+    };
+
+    return ebpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
+}
+
+int bpf_delete_elem(int fd, const void *key)
+{
+    union bpf_attr attr = {
+            .map_fd = (uint32_t)fd,
+            .key    = ptr_to_u64(key),
+    };
+
+    return ebpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
+}
+
+#define BPF_LOG_BUF_SIZE (UINT32_MAX >> 8)
+static char bpf_log_buf[BPF_LOG_BUF_SIZE] = {};
+
+int bpf_prog_load(enum bpf_prog_type type,
+                  const struct bpf_insn *insns, int insn_cnt,
+                  const char *license)
+{
+    int ret = 0;
+    union bpf_attr attr = {};
+    attr.prog_type = type;
+    attr.insns     = ptr_to_u64(insns);
+    attr.insn_cnt  = (uint32_t)insn_cnt;
+    attr.license   = ptr_to_u64(license);
+    attr.log_buf   = ptr_to_u64(bpf_log_buf);
+    attr.log_size  = BPF_LOG_BUF_SIZE;
+    attr.log_level = 1;
+
+    ret = ebpf(BPF_PROG_LOAD, &attr, sizeof(attr));
+    if (ret < 0) {
+        trace_ebpf_error("eBPF program load error:", bpf_log_buf);
+    }
+
+    return ret;
+}
+
+unsigned int bpf_fixup_mapfd(struct fixup_mapfd_t *table,
+                             size_t table_size, struct bpf_insn *insn,
+                             size_t insn_len, const char *map_name, int fd) {
+    unsigned int ret = 0;
+    int i = 0;
+
+    for (; i < table_size; ++i) {
+        if (strcmp(table[i].map_name, map_name) == 0) {
+            insn[table[i].instruction_num].src_reg = 1;
+            insn[table[i].instruction_num].imm = fd;
+            ++ret;
+        }
+    }
+
+    return ret;
+}
diff --git a/ebpf/ebpf.h b/ebpf/ebpf.h
new file mode 100644
index 0000000000..511ad0a06f
--- /dev/null
+++ b/ebpf/ebpf.h
@@ -0,0 +1,35 @@
+#ifndef QEMU_EBPF_H
+#define QEMU_EBPF_H
+
+#include "qemu/osdep.h"
+
+#ifdef CONFIG_EBPF
+#include <linux/bpf.h>
+
+int bpf_create_map(enum bpf_map_type map_type,
+                   unsigned int key_size,
+                   unsigned int value_size,
+                   unsigned int max_entries);
+
+int bpf_lookup_elem(int fd, const void *key, void *value);
+
+int bpf_update_elem(int fd, const void *key, const void *value,
+                    uint64_t flags);
+
+int bpf_delete_elem(int fd, const void *key);
+
+int bpf_prog_load(enum bpf_prog_type type,
+                  const struct bpf_insn *insns, int insn_cnt,
+                  const char *license);
+
+struct fixup_mapfd_t {
+    const char *map_name;
+    size_t instruction_num;
+};
+
+unsigned int bpf_fixup_mapfd(struct fixup_mapfd_t *table,
+                             size_t table_size, struct bpf_insn *insn,
+                             size_t insn_len, const char *map_name, int fd);
+
+#endif /* CONFIG_EBPF */
+#endif /* QEMU_EBPF_H */
diff --git a/ebpf/trace-events b/ebpf/trace-events
new file mode 100644
index 0000000000..3c189516e3
--- /dev/null
+++ b/ebpf/trace-events
@@ -0,0 +1,4 @@
+# See docs/devel/tracing.txt for syntax documentation.
+
+# ebpf.c
+ebpf_error(const char *s1, const char *s2) "error in %s: %s"
diff --git a/ebpf/trace.h b/ebpf/trace.h
new file mode 100644
index 0000000000..ad570e6691
--- /dev/null
+++ b/ebpf/trace.h
@@ -0,0 +1,2 @@
+#include "trace/trace-ebpf.h"
+
-- 
2.28.0




* [RFC PATCH 3/6] ebpf: Added eBPF RSS program.
  2020-11-02 18:51 [RFC PATCH 0/6] eBPF RSS support for virtio-net Andrew Melnychenko
  2020-11-02 18:51 ` [RFC PATCH 1/6] net: Added SetSteeringEBPF method for NetClientState Andrew Melnychenko
  2020-11-02 18:51 ` [RFC PATCH 2/6] ebpf: Added basic eBPF API Andrew Melnychenko
@ 2020-11-02 18:51 ` Andrew Melnychenko
  2020-11-03 13:07   ` Daniel P. Berrangé
  2020-11-02 18:51 ` [RFC PATCH 4/6] ebpf: Added eBPF RSS loader Andrew Melnychenko
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Andrew Melnychenko @ 2020-11-02 18:51 UTC (permalink / raw)
  To: jasowang, mst; +Cc: yan, yuri.benditovich, Andrew, qemu-devel

From: Andrew <andrew@daynix.com>

Added the RSS program and a Makefile to build it.
Also added a Python script that generates the '.h' file.
The data in that file can be loaded via the eBPF API.
eBPF compilation is not required for building QEMU.
Use the Makefile if you need to regenerate tun_rss_steering.h.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 ebpf/EbpfElf_to_C.py    |  67 +++++
 ebpf/Makefile.ebpf      |  38 +++
 ebpf/rss.bpf.c          | 470 +++++++++++++++++++++++++++++++++
 ebpf/tun_rss_steering.h | 556 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 1131 insertions(+)
 create mode 100644 ebpf/EbpfElf_to_C.py
 create mode 100755 ebpf/Makefile.ebpf
 create mode 100644 ebpf/rss.bpf.c
 create mode 100644 ebpf/tun_rss_steering.h

diff --git a/ebpf/EbpfElf_to_C.py b/ebpf/EbpfElf_to_C.py
new file mode 100644
index 0000000000..a6e3476f2b
--- /dev/null
+++ b/ebpf/EbpfElf_to_C.py
@@ -0,0 +1,67 @@
+#!/usr/bin/python3
+# pip install pyelftools
+
+import sys
+import argparse
+
+from elftools.elf.elffile import ELFFile
+from elftools.elf.relocation import RelocationSection
+from elftools.elf.sections import Section
+from elftools.elf.sections import SymbolTableSection
+
+def process_file(filename, prog_name):
+    print('Processing file:', filename)
+    with open(filename, 'rb') as f:
+        with open("%s.h" % prog_name, 'w') as w:
+
+            elffile = ELFFile(f)
+
+            symtab = elffile.get_section_by_name(".symtab")
+            if not isinstance(symtab, SymbolTableSection):
+                print('  The file has no %s section' % ".symtab")
+                return -1
+
+            prog_sec = elffile.get_section_by_name(prog_name);
+            if not isinstance(prog_sec, Section):
+                print('  The file has no %s section' % prog_name)
+                return -1
+
+            w.write('#ifndef %s\n' % prog_name.upper())
+            w.write('#define %s\n\n' % prog_name.upper())
+
+            w.write("struct bpf_insn ins%s[] = {\n" % prog_name)
+            insns = [prog_sec.data()[i:i + 8] for i in range(0, prog_sec.data_size, 8)]
+            for x in insns:
+                w.write( \
+                    '    {0x%02x, 0x%02x, 0x%02x, 0x%02x%02x, 0x%02x%02x%02x%02x},\n' \
+                    % (x[0], x[1] & 0x0f, (x[1] >> 4) & 0x0f, \
+                       x[3], x[2], x[7], x[6], x[5], x[4]))
+            w.write('};\n\n')
+
+            reladyn_name = '.rel' + prog_name
+            reladyn = elffile.get_section_by_name(reladyn_name)
+
+            if isinstance(reladyn, RelocationSection):
+                w.write('struct fixup_mapfd_t rel%s[] = {\n' % prog_name)
+                for reloc in reladyn.iter_relocations():
+                    w.write('    {"%s", %i},\n' \
+                        % (symtab.get_symbol(reloc['r_info_sym']).name, \
+                           (reloc['r_offset']/8)))
+                w.write('};\n\n')
+            else:
+               print('  The file has no %s section' % reladyn_name)
+
+            w.write('#endif /* %s */\n' % prog_name.upper())
+
+    return 0
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(
+        description='Convert eBPF ELF to C header. '
+                    'Section name will be used in C namings.')
+    parser.add_argument('--file', '-f', nargs=1, required=True,
+                        help='eBPF ELF file')
+    parser.add_argument('--section', '-s', nargs=1, required=True,
+                        help='section in ELF with eBPF program.')
+    args = parser.parse_args()
+    sys.exit(process_file(args.file[0], args.section[0]))
diff --git a/ebpf/Makefile.ebpf b/ebpf/Makefile.ebpf
new file mode 100755
index 0000000000..f7008d7d32
--- /dev/null
+++ b/ebpf/Makefile.ebpf
@@ -0,0 +1,38 @@
+OBJS = rss.bpf.o
+
+LLC ?= llc
+CLANG ?= clang
+INC_FLAGS = -nostdinc -isystem `$(CLANG) -print-file-name=include`
+EXTRA_CFLAGS ?= -O2 -emit-llvm
+
+linuxhdrs = ~/src/kernel/master
+
+LINUXINCLUDE =  -I $(linuxhdrs)/arch/x86/include/uapi \
+                -I $(linuxhdrs)/arch/x86/include/generated/uapi \
+                -I $(linuxhdrs)/arch/x86/include/generated \
+                -I $(linuxhdrs)/include/generated/uapi \
+                -I $(linuxhdrs)/include/uapi \
+                -I $(linuxhdrs)/include \
+                -I $(linuxhdrs)/tools/lib
+
+all: $(OBJS)
+
+.PHONY: clean
+
+clean:
+	rm -f $(OBJS)
+
+INC_FLAGS = -nostdinc -isystem `$(CLANG) -print-file-name=include`
+
+$(OBJS):  %.o:%.c
+	$(CLANG) $(INC_FLAGS) \
+                -D__KERNEL__ -D__ASM_SYSREG_H \
+                -Wno-unused-value -Wno-pointer-sign \
+                -Wno-compare-distinct-pointer-types \
+                -Wno-gnu-variable-sized-type-not-at-end \
+                -Wno-address-of-packed-member -Wno-tautological-compare \
+                -Wno-unknown-warning-option \
+                -I../include $(LINUXINCLUDE) \
+                $(EXTRA_CFLAGS) -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
+	python3 EbpfElf_to_C.py -f rss.bpf.o -s tun_rss_steering
+
diff --git a/ebpf/rss.bpf.c b/ebpf/rss.bpf.c
new file mode 100644
index 0000000000..084fc33f96
--- /dev/null
+++ b/ebpf/rss.bpf.c
@@ -0,0 +1,470 @@
+#include <stddef.h>
+#include <stdbool.h>
+#include <linux/bpf.h>
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+
+#include <linux/udp.h>
+#include <linux/tcp.h>
+
+#include <bpf/bpf_helpers.h>
+#include <linux/virtio_net.h>
+
+/*
+ * Prepare:
+ * Requires llvm, clang, python3 with pyelftools, linux kernel tree
+ *
+ * Build tun_rss_steering.h:
+ * make -f Makefile.ebpf clean all
+ */
+
+#define INDIRECTION_TABLE_SIZE 128
+#define HASH_CALCULATION_BUFFER_SIZE 36
+
+struct rss_config_t {
+    __u8 redirect;
+    __u8 populate_hash;
+    __u32 hash_types;
+    __u16 indirections_len;
+    __u16 default_queue;
+};
+
+struct toeplitz_key_data_t {
+    __u32 leftmost_32_bits;
+    __u8 next_byte[HASH_CALCULATION_BUFFER_SIZE];
+};
+
+struct packet_hash_info_t {
+    __u8 is_ipv4;
+    __u8 is_ipv6;
+    __u8 is_udp;
+    __u8 is_tcp;
+    __u8 is_ipv6_ext_src;
+    __u8 is_ipv6_ext_dst;
+
+    __u16 src_port;
+    __u16 dst_port;
+
+    union {
+        struct {
+            __be32 in_src;
+            __be32 in_dst;
+        };
+
+        struct {
+            struct in6_addr in6_src;
+            struct in6_addr in6_dst;
+            struct in6_addr in6_ext_src;
+            struct in6_addr in6_ext_dst;
+        };
+    };
+};
+
+struct {
+    __uint(type, BPF_MAP_TYPE_ARRAY);
+    __type(key, __u32);
+    __type(value, struct rss_config_t);
+    __uint(max_entries, 1);
+} tap_rss_map_configurations SEC(".maps");
+
+struct {
+    __uint(type, BPF_MAP_TYPE_ARRAY);
+    __type(key, __u32);
+    __type(value, struct toeplitz_key_data_t);
+    __uint(max_entries, 1);
+} tap_rss_map_toeplitz_key SEC(".maps");
+
+struct {
+    __uint(type, BPF_MAP_TYPE_ARRAY);
+    __type(key, __u32);
+    __type(value, __u16);
+    __uint(max_entries, INDIRECTION_TABLE_SIZE);
+} tap_rss_map_indirection_table SEC(".maps");
+
+
+static inline void net_rx_rss_add_chunk(__u8 *rss_input, size_t *bytes_written,
+                                        const void *ptr, size_t size) {
+    __builtin_memcpy(&rss_input[*bytes_written], ptr, size);
+    *bytes_written += size;
+}
+
+static inline
+void net_toeplitz_add(__u32 *result,
+                      __u8 *input,
+                      __u32 len
+        , struct toeplitz_key_data_t *key) {
+
+    __u32 accumulator = *result;
+    __u32 leftmost_32_bits = key->leftmost_32_bits;
+    __u32 byte;
+
+    for (byte = 0; byte < HASH_CALCULATION_BUFFER_SIZE; byte++) {
+        __u8 input_byte = input[byte];
+        __u8 key_byte = key->next_byte[byte];
+        __u8 bit;
+
+        for (bit = 0; bit < 8; bit++) {
+            if (input_byte & (1 << 7)) {
+                accumulator ^= leftmost_32_bits;
+            }
+
+            leftmost_32_bits =
+                    (leftmost_32_bits << 1) | ((key_byte & (1 << 7)) >> 7);
+
+            input_byte <<= 1;
+            key_byte <<= 1;
+        }
+    }
+
+    *result = accumulator;
+}
+
+
+static inline int ip6_extension_header_type(__u8 hdr_type)
+{
+    switch (hdr_type) {
+    case IPPROTO_HOPOPTS:
+    case IPPROTO_ROUTING:
+    case IPPROTO_FRAGMENT:
+    case IPPROTO_ICMPV6:
+    case IPPROTO_NONE:
+    case IPPROTO_DSTOPTS:
+    case IPPROTO_MH:
+        return 1;
+    default:
+        return 0;
+    }
+}
+/*
+ * According to https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml
+ * we expect that there will be no more than 11 extension headers in an
+ * IPv6 packet, and 27 TLV options for the Destination and Hop-by-Hop
+ * extensions. We need to choose a reasonable maximum number of
+ * extensions/options to check when looking for the ext src/dst.
+ */
+#define IP6_EXTENSIONS_COUNT 11
+#define IP6_OPTIONS_COUNT 30
+
+static inline void parse_ipv6_ext(struct __sk_buff *skb,
+        struct packet_hash_info_t *info,
+        __u8 *l4_protocol, size_t *l4_offset)
+{
+    if (!ip6_extension_header_type(*l4_protocol)) {
+        return;
+    }
+
+    struct ipv6_opt_hdr ext_hdr = {};
+
+    for (unsigned int i = 0; i < IP6_EXTENSIONS_COUNT; ++i) {
+
+        bpf_skb_load_bytes_relative(skb, *l4_offset, &ext_hdr,
+                                    sizeof(ext_hdr), BPF_HDR_START_NET);
+
+        if (*l4_protocol == IPPROTO_ROUTING) {
+            struct ipv6_rt_hdr ext_rt = {};
+
+            bpf_skb_load_bytes_relative(skb, *l4_offset, &ext_rt,
+                                        sizeof(ext_rt), BPF_HDR_START_NET);
+
+            if ((ext_rt.type == IPV6_SRCRT_TYPE_2) &&
+                    (ext_rt.hdrlen == sizeof(struct in6_addr) / 8) &&
+                    (ext_rt.segments_left == 1)) {
+
+                bpf_skb_load_bytes_relative(skb,
+                    *l4_offset + offsetof(struct rt2_hdr, addr),
+                    &info->in6_ext_dst, sizeof(info->in6_ext_dst),
+                    BPF_HDR_START_NET);
+
+                info->is_ipv6_ext_dst = 1;
+            }
+
+        } else if (*l4_protocol == IPPROTO_DSTOPTS) {
+            struct ipv6_opt_t {
+                __u8 type;
+                __u8 length;
+            } __attribute__((packed)) opt = {};
+
+            size_t opt_offset = sizeof(ext_hdr);
+
+            for (unsigned int j = 0; j < IP6_OPTIONS_COUNT; ++j) {
+                bpf_skb_load_bytes_relative(skb, *l4_offset + opt_offset,
+                                        &opt, sizeof(opt), BPF_HDR_START_NET);
+
+                opt_offset += (opt.type == IPV6_TLV_PAD1) ?
+                        1 : opt.length + sizeof(opt);
+
+                if (opt_offset + 1 >= ext_hdr.hdrlen * 8) {
+                    break;
+                }
+
+                if (opt.type == IPV6_TLV_HAO) {
+                    bpf_skb_load_bytes_relative(skb,
+                        *l4_offset + opt_offset + offsetof(struct ipv6_destopt_hao, addr),
+                        &info->in6_ext_src, sizeof(info->in6_ext_src),
+                        BPF_HDR_START_NET);
+
+                    info->is_ipv6_ext_src = 1;
+                    break;
+                }
+            }
+        }
+
+        *l4_protocol = ext_hdr.nexthdr;
+        *l4_offset += (ext_hdr.hdrlen + 1) * 8;
+
+        if (!ip6_extension_header_type(ext_hdr.nexthdr)) {
+            return;
+        }
+    }
+}
+
+static inline void parse_packet(struct __sk_buff *skb,
+        struct packet_hash_info_t *info)
+{
+    if (!info || !skb) {
+        return;
+    }
+
+    size_t l4_offset = 0;
+    __u8 l4_protocol = 0;
+    __u16 l3_protocol = __be16_to_cpu(skb->protocol);
+
+    if (l3_protocol == ETH_P_IP) {
+        info->is_ipv4 = 1;
+
+        struct iphdr ip = {};
+        bpf_skb_load_bytes_relative(skb, 0, &ip, sizeof(ip),
+                                    BPF_HDR_START_NET);
+
+        info->in_src = ip.saddr;
+        info->in_dst = ip.daddr;
+
+        l4_protocol = ip.protocol;
+        l4_offset = ip.ihl * 4;
+    } else if (l3_protocol == ETH_P_IPV6) {
+        info->is_ipv6 = 1;
+
+        struct ipv6hdr ip6 = {};
+        bpf_skb_load_bytes_relative(skb, 0, &ip6, sizeof(ip6),
+                                    BPF_HDR_START_NET);
+
+        info->in6_src = ip6.saddr;
+        info->in6_dst = ip6.daddr;
+
+        l4_protocol = ip6.nexthdr;
+        l4_offset = sizeof(ip6);
+
+        parse_ipv6_ext(skb, info, &l4_protocol, &l4_offset);
+    }
+
+    if (l4_protocol != 0) {
+        if (l4_protocol == IPPROTO_TCP) {
+            info->is_tcp = 1;
+
+            struct tcphdr tcp = {};
+            bpf_skb_load_bytes_relative(skb, l4_offset, &tcp, sizeof(tcp),
+                                        BPF_HDR_START_NET);
+
+            info->src_port = tcp.source;
+            info->dst_port = tcp.dest;
+        } else if (l4_protocol == IPPROTO_UDP) { /* TODO: add udplite? */
+            info->is_udp = 1;
+
+            struct udphdr udp = {};
+            bpf_skb_load_bytes_relative(skb, l4_offset, &udp, sizeof(udp),
+                                        BPF_HDR_START_NET);
+
+            info->src_port = udp.source;
+            info->dst_port = udp.dest;
+        }
+    }
+}
+
+static inline __u32 calculate_rss_hash(struct __sk_buff *skb,
+        struct rss_config_t *config, struct toeplitz_key_data_t *toe)
+{
+    __u8 rss_input[HASH_CALCULATION_BUFFER_SIZE] = {};
+    size_t bytes_written = 0;
+    __u32 result = 0;
+    struct packet_hash_info_t packet_info = {};
+
+    parse_packet(skb, &packet_info);
+
+    if (packet_info.is_ipv4) {
+        if (packet_info.is_tcp &&
+            config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCPv4) {
+
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.in_src,
+                                 sizeof(packet_info.in_src));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.in_dst,
+                                 sizeof(packet_info.in_dst));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.src_port,
+                                 sizeof(packet_info.src_port));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.dst_port,
+                                 sizeof(packet_info.dst_port));
+        } else if (packet_info.is_udp &&
+                   config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDPv4) {
+
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.in_src,
+                                 sizeof(packet_info.in_src));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.in_dst,
+                                 sizeof(packet_info.in_dst));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.src_port,
+                                 sizeof(packet_info.src_port));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.dst_port,
+                                 sizeof(packet_info.dst_port));
+        } else if (config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IPv4) {
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.in_src,
+                                 sizeof(packet_info.in_src));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.in_dst,
+                                 sizeof(packet_info.in_dst));
+        }
+    } else if (packet_info.is_ipv6) {
+        if (packet_info.is_tcp &&
+            config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCPv6) {
+
+            if (packet_info.is_ipv6_ext_src &&
+                config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCP_EX) {
+
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_ext_src,
+                                     sizeof(packet_info.in6_ext_src));
+            } else {
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_src,
+                                     sizeof(packet_info.in6_src));
+            }
+            if (packet_info.is_ipv6_ext_dst &&
+                config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCP_EX) {
+
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_ext_dst,
+                                     sizeof(packet_info.in6_ext_dst));
+            } else {
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_dst,
+                                     sizeof(packet_info.in6_dst));
+            }
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.src_port,
+                                 sizeof(packet_info.src_port));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.dst_port,
+                                 sizeof(packet_info.dst_port));
+        } else if (packet_info.is_udp &&
+                   config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDPv6) {
+
+            if (packet_info.is_ipv6_ext_src &&
+               config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDP_EX) {
+
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_ext_src,
+                                     sizeof(packet_info.in6_ext_src));
+            } else {
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_src,
+                                     sizeof(packet_info.in6_src));
+            }
+            if (packet_info.is_ipv6_ext_dst &&
+               config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDP_EX) {
+
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_ext_dst,
+                                     sizeof(packet_info.in6_ext_dst));
+            } else {
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_dst,
+                                     sizeof(packet_info.in6_dst));
+            }
+
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.src_port,
+                                 sizeof(packet_info.src_port));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.dst_port,
+                                 sizeof(packet_info.dst_port));
+
+        } else if (config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IPv6) {
+            if (packet_info.is_ipv6_ext_src &&
+               config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IP_EX) {
+
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_ext_src,
+                                     sizeof(packet_info.in6_ext_src));
+            } else {
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_src,
+                                     sizeof(packet_info.in6_src));
+            }
+            if (packet_info.is_ipv6_ext_dst &&
+                config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IP_EX) {
+
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_ext_dst,
+                                     sizeof(packet_info.in6_ext_dst));
+            } else {
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_dst,
+                                     sizeof(packet_info.in6_dst));
+            }
+        }
+    }
+
+    if (bytes_written) {
+        net_toeplitz_add(&result, rss_input, bytes_written, toe);
+    }
+
+    return result;
+}
+
+SEC("tun_rss_steering")
+int tun_rss_steering_prog(struct __sk_buff *skb)
+{
+
+    struct rss_config_t *config = 0;
+    struct toeplitz_key_data_t *toe = 0;
+
+    __u32 key = 0;
+    __u32 hash = 0;
+
+    config = bpf_map_lookup_elem(&tap_rss_map_configurations, &key);
+    toe = bpf_map_lookup_elem(&tap_rss_map_toeplitz_key, &key);
+
+    if (config && toe) {
+        if (!config->redirect) {
+            return config->default_queue;
+        }
+
+        hash = calculate_rss_hash(skb, config, toe);
+        if (hash) {
+            __u32 table_idx = hash % config->indirections_len;
+            __u16 *queue = 0;
+
+            queue = bpf_map_lookup_elem(&tap_rss_map_indirection_table,
+                                        &table_idx);
+
+            if (queue) {
+                return *queue;
+            }
+        }
+
+        return config->default_queue;
+    }
+
+    return -1;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/ebpf/tun_rss_steering.h b/ebpf/tun_rss_steering.h
new file mode 100644
index 0000000000..bbf63a109a
--- /dev/null
+++ b/ebpf/tun_rss_steering.h
@@ -0,0 +1,556 @@
+#ifndef TUN_RSS_STEERING
+#define TUN_RSS_STEERING
+
+struct bpf_insn instun_rss_steering[] = {
+    {0xbf, 0x09, 0x01, 0x0000, 0x00000000},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xff4c, 0x00000000},
+    {0xbf, 0x06, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x06, 0x00, 0x0000, 0xffffff4c},
+    {0x18, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x00, 0x00, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x02, 0x06, 0x0000, 0x00000000},
+    {0x85, 0x00, 0x00, 0x0000, 0x00000001},
+    {0xbf, 0x07, 0x00, 0x0000, 0x00000000},
+    {0x18, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x00, 0x00, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x02, 0x06, 0x0000, 0x00000000},
+    {0x85, 0x00, 0x00, 0x0000, 0x00000001},
+    {0xbf, 0x08, 0x00, 0x0000, 0x00000000},
+    {0x18, 0x00, 0x00, 0x0000, 0xffffffff},
+    {0x00, 0x00, 0x00, 0x0000, 0x00000000},
+    {0x15, 0x07, 0x00, 0x016e, 0x00000000},
+    {0xbf, 0x05, 0x08, 0x0000, 0x00000000},
+    {0x15, 0x05, 0x00, 0x016c, 0x00000000},
+    {0x71, 0x01, 0x07, 0x0000, 0x00000000},
+    {0x55, 0x01, 0x00, 0x0001, 0x00000000},
+    {0x05, 0x00, 0x00, 0x0168, 0x00000000},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xffc0, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffb8, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffb0, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffa8, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffa0, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xff98, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xff90, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xff88, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xff80, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xff78, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xff70, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xff68, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xff60, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xff58, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xff50, 0x00000000},
+    {0x15, 0x09, 0x00, 0x007e, 0x00000000},
+    {0x61, 0x01, 0x09, 0x0010, 0x00000000},
+    {0xdc, 0x01, 0x00, 0x0000, 0x00000010},
+    {0x15, 0x01, 0x00, 0x0030, 0x000086dd},
+    {0x55, 0x01, 0x00, 0x007a, 0x00000800},
+    {0x7b, 0x0a, 0x05, 0xff28, 0x00000000},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000001},
+    {0x73, 0x0a, 0x01, 0xff50, 0x00000000},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xffe0, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffd8, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffd0, 0x00000000},
+    {0xbf, 0x03, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x03, 0x00, 0x0000, 0xffffffd0},
+    {0xbf, 0x01, 0x09, 0x0000, 0x00000000},
+    {0xb7, 0x02, 0x00, 0x0000, 0x00000000},
+    {0xb7, 0x04, 0x00, 0x0000, 0x00000014},
+    {0xb7, 0x05, 0x00, 0x0000, 0x00000001},
+    {0x85, 0x00, 0x00, 0x0000, 0x00000044},
+    {0x61, 0x01, 0x0a, 0xffdc, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xff5c, 0x00000000},
+    {0x61, 0x01, 0x0a, 0xffe0, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xff60, 0x00000000},
+    {0x71, 0x06, 0x0a, 0xffd9, 0x00000000},
+    {0x71, 0x01, 0x0a, 0xffd0, 0x00000000},
+    {0x67, 0x01, 0x00, 0x0000, 0x00000002},
+    {0x57, 0x01, 0x00, 0x0000, 0x0000003c},
+    {0x7b, 0x0a, 0x01, 0xff40, 0x00000000},
+    {0x57, 0x06, 0x00, 0x0000, 0x000000ff},
+    {0x15, 0x06, 0x00, 0x0051, 0x00000011},
+    {0x79, 0x05, 0x0a, 0xff28, 0x00000000},
+    {0x55, 0x06, 0x00, 0x005f, 0x00000006},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000001},
+    {0x73, 0x0a, 0x01, 0xff53, 0x00000000},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xffe0, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffd8, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffd0, 0x00000000},
+    {0xbf, 0x03, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x03, 0x00, 0x0000, 0xffffffd0},
+    {0xbf, 0x01, 0x09, 0x0000, 0x00000000},
+    {0x79, 0x02, 0x0a, 0xff40, 0x00000000},
+    {0xb7, 0x04, 0x00, 0x0000, 0x00000014},
+    {0xbf, 0x06, 0x05, 0x0000, 0x00000000},
+    {0xb7, 0x05, 0x00, 0x0000, 0x00000001},
+    {0x85, 0x00, 0x00, 0x0000, 0x00000044},
+    {0xbf, 0x05, 0x06, 0x0000, 0x00000000},
+    {0x69, 0x01, 0x0a, 0xffd0, 0x00000000},
+    {0x6b, 0x0a, 0x01, 0xff56, 0x00000000},
+    {0x69, 0x01, 0x0a, 0xffd2, 0x00000000},
+    {0x6b, 0x0a, 0x01, 0xff58, 0x00000000},
+    {0x05, 0x00, 0x00, 0x004b, 0x00000000},
+    {0x7b, 0x0a, 0x05, 0xff28, 0x00000000},
+    {0x7b, 0x0a, 0x07, 0xff10, 0x00000000},
+    {0xb7, 0x07, 0x00, 0x0000, 0x00000001},
+    {0x73, 0x0a, 0x07, 0xff51, 0x00000000},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xfff0, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffe8, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffe0, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffd8, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffd0, 0x00000000},
+    {0xbf, 0x03, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x03, 0x00, 0x0000, 0xffffffd0},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000028},
+    {0x7b, 0x0a, 0x01, 0xff40, 0x00000000},
+    {0xbf, 0x01, 0x09, 0x0000, 0x00000000},
+    {0xb7, 0x02, 0x00, 0x0000, 0x00000000},
+    {0xb7, 0x04, 0x00, 0x0000, 0x00000028},
+    {0xb7, 0x05, 0x00, 0x0000, 0x00000001},
+    {0x85, 0x00, 0x00, 0x0000, 0x00000044},
+    {0x79, 0x01, 0x0a, 0xffd8, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xff5c, 0x00000000},
+    {0x77, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x63, 0x0a, 0x01, 0xff60, 0x00000000},
+    {0x79, 0x01, 0x0a, 0xffe0, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xff64, 0x00000000},
+    {0x77, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x63, 0x0a, 0x01, 0xff68, 0x00000000},
+    {0x79, 0x01, 0x0a, 0xffe8, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xff6c, 0x00000000},
+    {0x77, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x63, 0x0a, 0x01, 0xff70, 0x00000000},
+    {0x79, 0x01, 0x0a, 0xfff0, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xff74, 0x00000000},
+    {0x77, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x63, 0x0a, 0x01, 0xff78, 0x00000000},
+    {0x71, 0x06, 0x0a, 0xffd6, 0x00000000},
+    {0x25, 0x06, 0x00, 0x0126, 0x0000003c},
+    {0x6f, 0x07, 0x06, 0x0000, 0x00000000},
+    {0x18, 0x01, 0x00, 0x0000, 0x00000001},
+    {0x00, 0x00, 0x00, 0x0000, 0x1c001800},
+    {0x5f, 0x07, 0x01, 0x0000, 0x00000000},
+    {0x55, 0x07, 0x00, 0x0001, 0x00000000},
+    {0x05, 0x00, 0x00, 0x0120, 0x00000000},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x6b, 0x0a, 0x01, 0xfffe, 0x00000000},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000028},
+    {0x7b, 0x0a, 0x01, 0xff40, 0x00000000},
+    {0xbf, 0x01, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x01, 0x00, 0x0000, 0xffffff8c},
+    {0x7b, 0x0a, 0x01, 0xff20, 0x00000000},
+    {0xbf, 0x01, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x01, 0x00, 0x0000, 0xffffff54},
+    {0x7b, 0x0a, 0x01, 0xff18, 0x00000000},
+    {0x18, 0x07, 0x00, 0x0000, 0x00000001},
+    {0x00, 0x00, 0x00, 0x0000, 0x1c001800},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xff38, 0x00000000},
+    {0x7b, 0x0a, 0x08, 0xff30, 0x00000000},
+    {0x05, 0x00, 0x00, 0x0153, 0x00000000},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000001},
+    {0x73, 0x0a, 0x01, 0xff52, 0x00000000},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffd0, 0x00000000},
+    {0xbf, 0x03, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x03, 0x00, 0x0000, 0xffffffd0},
+    {0xbf, 0x01, 0x09, 0x0000, 0x00000000},
+    {0x79, 0x02, 0x0a, 0xff40, 0x00000000},
+    {0xb7, 0x04, 0x00, 0x0000, 0x00000008},
+    {0xb7, 0x05, 0x00, 0x0000, 0x00000001},
+    {0x85, 0x00, 0x00, 0x0000, 0x00000044},
+    {0x69, 0x01, 0x0a, 0xffd0, 0x00000000},
+    {0x6b, 0x0a, 0x01, 0xff56, 0x00000000},
+    {0x69, 0x01, 0x0a, 0xffd2, 0x00000000},
+    {0x6b, 0x0a, 0x01, 0xff58, 0x00000000},
+    {0x79, 0x05, 0x0a, 0xff28, 0x00000000},
+    {0x71, 0x01, 0x0a, 0xff50, 0x00000000},
+    {0x15, 0x01, 0x00, 0x000f, 0x00000000},
+    {0x61, 0x01, 0x07, 0x0004, 0x00000000},
+    {0x71, 0x02, 0x0a, 0xff53, 0x00000000},
+    {0x15, 0x02, 0x00, 0x002b, 0x00000000},
+    {0xbf, 0x02, 0x01, 0x0000, 0x00000000},
+    {0x57, 0x02, 0x00, 0x0000, 0x00000002},
+    {0x15, 0x02, 0x00, 0x0028, 0x00000000},
+    {0x61, 0x01, 0x0a, 0xff5c, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xffa0, 0x00000000},
+    {0x61, 0x01, 0x0a, 0xff60, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xffa4, 0x00000000},
+    {0x69, 0x01, 0x0a, 0xff56, 0x00000000},
+    {0x6b, 0x0a, 0x01, 0xffa8, 0x00000000},
+    {0x69, 0x01, 0x0a, 0xff58, 0x00000000},
+    {0x6b, 0x0a, 0x01, 0xffaa, 0x00000000},
+    {0x05, 0x00, 0x00, 0x005e, 0x00000000},
+    {0x71, 0x01, 0x0a, 0xff51, 0x00000000},
+    {0x15, 0x01, 0x00, 0x00c6, 0x00000000},
+    {0x61, 0x01, 0x07, 0x0004, 0x00000000},
+    {0x71, 0x02, 0x0a, 0xff53, 0x00000000},
+    {0x15, 0x02, 0x00, 0x0027, 0x00000000},
+    {0xbf, 0x02, 0x01, 0x0000, 0x00000000},
+    {0x57, 0x02, 0x00, 0x0000, 0x00000010},
+    {0x15, 0x02, 0x00, 0x0024, 0x00000000},
+    {0xbf, 0x03, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x03, 0x00, 0x0000, 0xffffff5c},
+    {0xbf, 0x02, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x02, 0x00, 0x0000, 0xffffff7c},
+    {0x71, 0x04, 0x0a, 0xff54, 0x00000000},
+    {0x55, 0x04, 0x00, 0x0001, 0x00000000},
+    {0xbf, 0x02, 0x03, 0x0000, 0x00000000},
+    {0xbf, 0x00, 0x05, 0x0000, 0x00000000},
+    {0x67, 0x01, 0x00, 0x0000, 0x00000038},
+    {0xc7, 0x01, 0x00, 0x0000, 0x00000038},
+    {0xb7, 0x04, 0x00, 0x0000, 0x00000000},
+    {0x6d, 0x04, 0x01, 0x0001, 0x00000000},
+    {0xbf, 0x02, 0x03, 0x0000, 0x00000000},
+    {0xbf, 0x03, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x03, 0x00, 0x0000, 0xffffff6c},
+    {0xbf, 0x05, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x05, 0x00, 0x0000, 0xffffff8c},
+    {0x6d, 0x04, 0x01, 0x0001, 0x00000000},
+    {0xbf, 0x05, 0x03, 0x0000, 0x00000000},
+    {0x71, 0x01, 0x0a, 0xff55, 0x00000000},
+    {0x15, 0x01, 0x00, 0x0028, 0x00000000},
+    {0xbf, 0x03, 0x05, 0x0000, 0x00000000},
+    {0x05, 0x00, 0x00, 0x0026, 0x00000000},
+    {0x71, 0x02, 0x0a, 0xff52, 0x00000000},
+    {0x15, 0x02, 0x00, 0x0004, 0x00000000},
+    {0xbf, 0x02, 0x01, 0x0000, 0x00000000},
+    {0x57, 0x02, 0x00, 0x0000, 0x00000004},
+    {0x15, 0x02, 0x00, 0x0001, 0x00000000},
+    {0x05, 0x00, 0x00, 0xffd2, 0x00000000},
+    {0x57, 0x01, 0x00, 0x0000, 0x00000001},
+    {0x15, 0x01, 0x00, 0x00a1, 0x00000000},
+    {0x61, 0x01, 0x0a, 0xff5c, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xffa0, 0x00000000},
+    {0x61, 0x01, 0x0a, 0xff60, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xffa4, 0x00000000},
+    {0x05, 0x00, 0x00, 0x0032, 0x00000000},
+    {0x71, 0x02, 0x0a, 0xff52, 0x00000000},
+    {0x15, 0x02, 0x00, 0x009e, 0x00000000},
+    {0xbf, 0x02, 0x01, 0x0000, 0x00000000},
+    {0x57, 0x02, 0x00, 0x0000, 0x00000020},
+    {0x15, 0x02, 0x00, 0x009b, 0x00000000},
+    {0xbf, 0x02, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x02, 0x00, 0x0000, 0xffffff5c},
+    {0x71, 0x04, 0x0a, 0xff54, 0x00000000},
+    {0xbf, 0x03, 0x02, 0x0000, 0x00000000},
+    {0x15, 0x04, 0x00, 0x0002, 0x00000000},
+    {0xbf, 0x03, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x03, 0x00, 0x0000, 0xffffff7c},
+    {0xbf, 0x00, 0x05, 0x0000, 0x00000000},
+    {0x57, 0x01, 0x00, 0x0000, 0x00000100},
+    {0x15, 0x01, 0x00, 0x0001, 0x00000000},
+    {0xbf, 0x02, 0x03, 0x0000, 0x00000000},
+    {0xbf, 0x03, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x03, 0x00, 0x0000, 0xffffff6c},
+    {0x71, 0x05, 0x0a, 0xff55, 0x00000000},
+    {0xbf, 0x04, 0x03, 0x0000, 0x00000000},
+    {0x15, 0x05, 0x00, 0x0002, 0x00000000},
+    {0xbf, 0x04, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x04, 0x00, 0x0000, 0xffffff8c},
+    {0x15, 0x01, 0x00, 0x0001, 0x00000000},
+    {0xbf, 0x03, 0x04, 0x0000, 0x00000000},
+    {0x61, 0x01, 0x02, 0x0004, 0x00000000},
+    {0x67, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x61, 0x04, 0x02, 0x0000, 0x00000000},
+    {0x4f, 0x01, 0x04, 0x0000, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffa0, 0x00000000},
+    {0x61, 0x01, 0x02, 0x0008, 0x00000000},
+    {0x61, 0x02, 0x02, 0x000c, 0x00000000},
+    {0x67, 0x02, 0x00, 0x0000, 0x00000020},
+    {0x4f, 0x02, 0x01, 0x0000, 0x00000000},
+    {0x7b, 0x0a, 0x02, 0xffa8, 0x00000000},
+    {0x61, 0x01, 0x03, 0x0000, 0x00000000},
+    {0x61, 0x02, 0x03, 0x0004, 0x00000000},
+    {0x61, 0x04, 0x03, 0x0008, 0x00000000},
+    {0x61, 0x03, 0x03, 0x000c, 0x00000000},
+    {0x69, 0x05, 0x0a, 0xff58, 0x00000000},
+    {0x6b, 0x0a, 0x05, 0xffc2, 0x00000000},
+    {0x69, 0x05, 0x0a, 0xff56, 0x00000000},
+    {0x6b, 0x0a, 0x05, 0xffc0, 0x00000000},
+    {0x67, 0x03, 0x00, 0x0000, 0x00000020},
+    {0x4f, 0x03, 0x04, 0x0000, 0x00000000},
+    {0x7b, 0x0a, 0x03, 0xffb8, 0x00000000},
+    {0x67, 0x02, 0x00, 0x0000, 0x00000020},
+    {0x4f, 0x02, 0x01, 0x0000, 0x00000000},
+    {0x7b, 0x0a, 0x02, 0xffb0, 0x00000000},
+    {0xbf, 0x05, 0x00, 0x0000, 0x00000000},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x07, 0x08, 0x00, 0x0000, 0x00000004},
+    {0x61, 0x03, 0x05, 0x0000, 0x00000000},
+    {0xb7, 0x05, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x02, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x02, 0x00, 0x0000, 0xffffffa0},
+    {0x0f, 0x02, 0x01, 0x0000, 0x00000000},
+    {0x71, 0x04, 0x02, 0x0000, 0x00000000},
+    {0xbf, 0x02, 0x04, 0x0000, 0x00000000},
+    {0x67, 0x02, 0x00, 0x0000, 0x00000038},
+    {0xc7, 0x02, 0x00, 0x0000, 0x0000003f},
+    {0x5f, 0x02, 0x03, 0x0000, 0x00000000},
+    {0xaf, 0x02, 0x05, 0x0000, 0x00000000},
+    {0xbf, 0x05, 0x08, 0x0000, 0x00000000},
+    {0x0f, 0x05, 0x01, 0x0000, 0x00000000},
+    {0x71, 0x05, 0x05, 0x0000, 0x00000000},
+    {0x67, 0x03, 0x00, 0x0000, 0x00000001},
+    {0xbf, 0x00, 0x05, 0x0000, 0x00000000},
+    {0x77, 0x00, 0x00, 0x0000, 0x00000007},
+    {0x4f, 0x03, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x00, 0x04, 0x0000, 0x00000000},
+    {0x67, 0x00, 0x00, 0x0000, 0x00000039},
+    {0xc7, 0x00, 0x00, 0x0000, 0x0000003f},
+    {0x5f, 0x00, 0x03, 0x0000, 0x00000000},
+    {0xaf, 0x02, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x00, 0x05, 0x0000, 0x00000000},
+    {0x77, 0x00, 0x00, 0x0000, 0x00000006},
+    {0x57, 0x00, 0x00, 0x0000, 0x00000001},
+    {0x67, 0x03, 0x00, 0x0000, 0x00000001},
+    {0x4f, 0x03, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x00, 0x04, 0x0000, 0x00000000},
+    {0x67, 0x00, 0x00, 0x0000, 0x0000003a},
+    {0xc7, 0x00, 0x00, 0x0000, 0x0000003f},
+    {0x5f, 0x00, 0x03, 0x0000, 0x00000000},
+    {0xaf, 0x02, 0x00, 0x0000, 0x00000000},
+    {0x67, 0x03, 0x00, 0x0000, 0x00000001},
+    {0xbf, 0x00, 0x05, 0x0000, 0x00000000},
+    {0x77, 0x00, 0x00, 0x0000, 0x00000005},
+    {0x57, 0x00, 0x00, 0x0000, 0x00000001},
+    {0x4f, 0x03, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x00, 0x04, 0x0000, 0x00000000},
+    {0x67, 0x00, 0x00, 0x0000, 0x0000003b},
+    {0xc7, 0x00, 0x00, 0x0000, 0x0000003f},
+    {0x5f, 0x00, 0x03, 0x0000, 0x00000000},
+    {0xaf, 0x02, 0x00, 0x0000, 0x00000000},
+    {0x67, 0x03, 0x00, 0x0000, 0x00000001},
+    {0xbf, 0x00, 0x05, 0x0000, 0x00000000},
+    {0x77, 0x00, 0x00, 0x0000, 0x00000004},
+    {0x57, 0x00, 0x00, 0x0000, 0x00000001},
+    {0x4f, 0x03, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x00, 0x04, 0x0000, 0x00000000},
+    {0x67, 0x00, 0x00, 0x0000, 0x0000003c},
+    {0xc7, 0x00, 0x00, 0x0000, 0x0000003f},
+    {0x5f, 0x00, 0x03, 0x0000, 0x00000000},
+    {0xaf, 0x02, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x00, 0x05, 0x0000, 0x00000000},
+    {0x77, 0x00, 0x00, 0x0000, 0x00000003},
+    {0x57, 0x00, 0x00, 0x0000, 0x00000001},
+    {0x67, 0x03, 0x00, 0x0000, 0x00000001},
+    {0x4f, 0x03, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x00, 0x04, 0x0000, 0x00000000},
+    {0x67, 0x00, 0x00, 0x0000, 0x0000003d},
+    {0xc7, 0x00, 0x00, 0x0000, 0x0000003f},
+    {0x5f, 0x00, 0x03, 0x0000, 0x00000000},
+    {0xaf, 0x02, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x00, 0x05, 0x0000, 0x00000000},
+    {0x77, 0x00, 0x00, 0x0000, 0x00000002},
+    {0x57, 0x00, 0x00, 0x0000, 0x00000001},
+    {0x67, 0x03, 0x00, 0x0000, 0x00000001},
+    {0x4f, 0x03, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x00, 0x04, 0x0000, 0x00000000},
+    {0x67, 0x00, 0x00, 0x0000, 0x0000003e},
+    {0xc7, 0x00, 0x00, 0x0000, 0x0000003f},
+    {0x5f, 0x00, 0x03, 0x0000, 0x00000000},
+    {0xaf, 0x02, 0x00, 0x0000, 0x00000000},
+    {0xbf, 0x00, 0x05, 0x0000, 0x00000000},
+    {0x77, 0x00, 0x00, 0x0000, 0x00000001},
+    {0x57, 0x00, 0x00, 0x0000, 0x00000001},
+    {0x67, 0x03, 0x00, 0x0000, 0x00000001},
+    {0x4f, 0x03, 0x00, 0x0000, 0x00000000},
+    {0x57, 0x04, 0x00, 0x0000, 0x00000001},
+    {0x87, 0x04, 0x00, 0x0000, 0x00000000},
+    {0x5f, 0x04, 0x03, 0x0000, 0x00000000},
+    {0xaf, 0x02, 0x04, 0x0000, 0x00000000},
+    {0x57, 0x05, 0x00, 0x0000, 0x00000001},
+    {0x67, 0x03, 0x00, 0x0000, 0x00000001},
+    {0x4f, 0x03, 0x05, 0x0000, 0x00000000},
+    {0x07, 0x01, 0x00, 0x0000, 0x00000001},
+    {0xbf, 0x05, 0x02, 0x0000, 0x00000000},
+    {0x15, 0x01, 0x00, 0x0001, 0x00000024},
+    {0x05, 0x00, 0x00, 0xffa9, 0x00000000},
+    {0xbf, 0x01, 0x02, 0x0000, 0x00000000},
+    {0x67, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x77, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x15, 0x01, 0x00, 0x000b, 0x00000000},
+    {0x69, 0x03, 0x07, 0x0008, 0x00000000},
+    {0x3f, 0x01, 0x03, 0x0000, 0x00000000},
+    {0x2f, 0x01, 0x03, 0x0000, 0x00000000},
+    {0x1f, 0x02, 0x01, 0x0000, 0x00000000},
+    {0x63, 0x0a, 0x02, 0xff50, 0x00000000},
+    {0xbf, 0x02, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x02, 0x00, 0x0000, 0xffffff50},
+    {0x18, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x00, 0x00, 0x00, 0x0000, 0x00000000},
+    {0x85, 0x00, 0x00, 0x0000, 0x00000001},
+    {0x55, 0x00, 0x00, 0x0002, 0x00000000},
+    {0x69, 0x00, 0x07, 0x000a, 0x00000000},
+    {0x95, 0x00, 0x00, 0x0000, 0x00000000},
+    {0x69, 0x00, 0x00, 0x0000, 0x00000000},
+    {0x05, 0x00, 0x00, 0xfffd, 0x00000000},
+    {0xbf, 0x02, 0x01, 0x0000, 0x00000000},
+    {0x57, 0x02, 0x00, 0x0000, 0x00000008},
+    {0x15, 0x02, 0x00, 0xfff9, 0x00000000},
+    {0xbf, 0x02, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x02, 0x00, 0x0000, 0xffffff5c},
+    {0x71, 0x04, 0x0a, 0xff54, 0x00000000},
+    {0xbf, 0x03, 0x02, 0x0000, 0x00000000},
+    {0x15, 0x04, 0x00, 0x0002, 0x00000000},
+    {0xbf, 0x03, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x03, 0x00, 0x0000, 0xffffff7c},
+    {0x57, 0x01, 0x00, 0x0000, 0x00000040},
+    {0x15, 0x01, 0x00, 0x0001, 0x00000000},
+    {0xbf, 0x02, 0x03, 0x0000, 0x00000000},
+    {0x61, 0x03, 0x02, 0x0004, 0x00000000},
+    {0x67, 0x03, 0x00, 0x0000, 0x00000020},
+    {0x61, 0x04, 0x02, 0x0000, 0x00000000},
+    {0x4f, 0x03, 0x04, 0x0000, 0x00000000},
+    {0x7b, 0x0a, 0x03, 0xffa0, 0x00000000},
+    {0x61, 0x03, 0x02, 0x0008, 0x00000000},
+    {0x61, 0x02, 0x02, 0x000c, 0x00000000},
+    {0x67, 0x02, 0x00, 0x0000, 0x00000020},
+    {0x4f, 0x02, 0x03, 0x0000, 0x00000000},
+    {0x7b, 0x0a, 0x02, 0xffa8, 0x00000000},
+    {0x15, 0x01, 0x00, 0x0079, 0x00000000},
+    {0x71, 0x01, 0x0a, 0xff55, 0x00000000},
+    {0x15, 0x01, 0x00, 0x0077, 0x00000000},
+    {0x61, 0x01, 0x0a, 0xff98, 0x00000000},
+    {0x67, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x61, 0x02, 0x0a, 0xff94, 0x00000000},
+    {0x4f, 0x01, 0x02, 0x0000, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffb8, 0x00000000},
+    {0x61, 0x01, 0x0a, 0xff90, 0x00000000},
+    {0x67, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x61, 0x02, 0x0a, 0xff8c, 0x00000000},
+    {0x05, 0x00, 0x00, 0x0076, 0x00000000},
+    {0x15, 0x06, 0x00, 0xfedf, 0x00000087},
+    {0x05, 0x00, 0x00, 0x003f, 0x00000000},
+    {0x0f, 0x06, 0x09, 0x0000, 0x00000000},
+    {0xbf, 0x02, 0x06, 0x0000, 0x00000000},
+    {0x07, 0x02, 0x00, 0x0000, 0x00000001},
+    {0x71, 0x03, 0x0a, 0xffff, 0x00000000},
+    {0x67, 0x03, 0x00, 0x0000, 0x00000003},
+    {0x3d, 0x02, 0x03, 0x0022, 0x00000000},
+    {0x55, 0x01, 0x00, 0x000c, 0x000000c9},
+    {0x79, 0x01, 0x0a, 0xff40, 0x00000000},
+    {0x0f, 0x06, 0x01, 0x0000, 0x00000000},
+    {0x07, 0x06, 0x00, 0x0000, 0x00000002},
+    {0xbf, 0x01, 0x07, 0x0000, 0x00000000},
+    {0xbf, 0x02, 0x06, 0x0000, 0x00000000},
+    {0x79, 0x03, 0x0a, 0xff18, 0x00000000},
+    {0xb7, 0x04, 0x00, 0x0000, 0x00000001},
+    {0xb7, 0x05, 0x00, 0x0000, 0x00000001},
+    {0x85, 0x00, 0x00, 0x0000, 0x00000044},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000001},
+    {0x73, 0x0a, 0x01, 0xff54, 0x00000000},
+    {0x05, 0x00, 0x00, 0x0015, 0x00000000},
+    {0x07, 0x08, 0x00, 0x0000, 0xffffffff},
+    {0xbf, 0x01, 0x08, 0x0000, 0x00000000},
+    {0x67, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x77, 0x01, 0x00, 0x0000, 0x00000020},
+    {0xbf, 0x09, 0x06, 0x0000, 0x00000000},
+    {0x15, 0x01, 0x00, 0x000f, 0x00000000},
+    {0xbf, 0x02, 0x09, 0x0000, 0x00000000},
+    {0x79, 0x01, 0x0a, 0xff40, 0x00000000},
+    {0x0f, 0x02, 0x01, 0x0000, 0x00000000},
+    {0xbf, 0x03, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x03, 0x00, 0x0000, 0xfffffff8},
+    {0xb7, 0x06, 0x00, 0x0000, 0x00000001},
+    {0xbf, 0x01, 0x07, 0x0000, 0x00000000},
+    {0xb7, 0x04, 0x00, 0x0000, 0x00000002},
+    {0xb7, 0x05, 0x00, 0x0000, 0x00000001},
+    {0x85, 0x00, 0x00, 0x0000, 0x00000044},
+    {0x71, 0x01, 0x0a, 0xfff8, 0x00000000},
+    {0x15, 0x01, 0x00, 0xffdb, 0x00000000},
+    {0x71, 0x06, 0x0a, 0xfff9, 0x00000000},
+    {0x07, 0x06, 0x00, 0x0000, 0x00000002},
+    {0x05, 0x00, 0x00, 0xffd8, 0x00000000},
+    {0x79, 0x08, 0x0a, 0xff30, 0x00000000},
+    {0xbf, 0x09, 0x07, 0x0000, 0x00000000},
+    {0x18, 0x07, 0x00, 0x0000, 0x00000001},
+    {0x00, 0x00, 0x00, 0x0000, 0x1c001800},
+    {0x71, 0x01, 0x0a, 0xffff, 0x00000000},
+    {0x67, 0x01, 0x00, 0x0000, 0x00000003},
+    {0x79, 0x02, 0x0a, 0xff40, 0x00000000},
+    {0x0f, 0x02, 0x01, 0x0000, 0x00000000},
+    {0x07, 0x02, 0x00, 0x0000, 0x00000008},
+    {0x7b, 0x0a, 0x02, 0xff40, 0x00000000},
+    {0x71, 0x06, 0x0a, 0xfffe, 0x00000000},
+    {0x25, 0x06, 0x00, 0x0036, 0x0000003c},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000001},
+    {0x6f, 0x01, 0x06, 0x0000, 0x00000000},
+    {0x5f, 0x01, 0x07, 0x0000, 0x00000000},
+    {0x55, 0x01, 0x00, 0x0001, 0x00000000},
+    {0x05, 0x00, 0x00, 0x0031, 0x00000000},
+    {0x79, 0x01, 0x0a, 0xff38, 0x00000000},
+    {0x07, 0x01, 0x00, 0x0000, 0x00000001},
+    {0x7b, 0x0a, 0x01, 0xff38, 0x00000000},
+    {0x67, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x77, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x55, 0x01, 0x00, 0x0002, 0x0000000b},
+    {0x79, 0x07, 0x0a, 0xff10, 0x00000000},
+    {0x05, 0x00, 0x00, 0xfe5a, 0x00000000},
+    {0xbf, 0x03, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x03, 0x00, 0x0000, 0xfffffffe},
+    {0xbf, 0x01, 0x09, 0x0000, 0x00000000},
+    {0x79, 0x02, 0x0a, 0xff40, 0x00000000},
+    {0xb7, 0x04, 0x00, 0x0000, 0x00000002},
+    {0xb7, 0x05, 0x00, 0x0000, 0x00000001},
+    {0x85, 0x00, 0x00, 0x0000, 0x00000044},
+    {0xbf, 0x01, 0x06, 0x0000, 0x00000000},
+    {0x15, 0x01, 0x00, 0x001a, 0x0000003c},
+    {0x55, 0x01, 0x00, 0xffe1, 0x0000002b},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x63, 0x0a, 0x01, 0xfff8, 0x00000000},
+    {0xbf, 0x03, 0x0a, 0x0000, 0x00000000},
+    {0x07, 0x03, 0x00, 0x0000, 0xfffffff8},
+    {0xbf, 0x01, 0x09, 0x0000, 0x00000000},
+    {0x79, 0x02, 0x0a, 0xff40, 0x00000000},
+    {0xb7, 0x04, 0x00, 0x0000, 0x00000004},
+    {0xb7, 0x05, 0x00, 0x0000, 0x00000001},
+    {0x85, 0x00, 0x00, 0x0000, 0x00000044},
+    {0x71, 0x01, 0x0a, 0xfffa, 0x00000000},
+    {0x55, 0x01, 0x00, 0xffd6, 0x00000002},
+    {0x71, 0x01, 0x0a, 0xfff9, 0x00000000},
+    {0x55, 0x01, 0x00, 0xffd4, 0x00000002},
+    {0x71, 0x01, 0x0a, 0xfffb, 0x00000000},
+    {0x55, 0x01, 0x00, 0xffd2, 0x00000001},
+    {0x79, 0x02, 0x0a, 0xff40, 0x00000000},
+    {0x07, 0x02, 0x00, 0x0000, 0x00000008},
+    {0xbf, 0x01, 0x09, 0x0000, 0x00000000},
+    {0x79, 0x03, 0x0a, 0xff20, 0x00000000},
+    {0xb7, 0x04, 0x00, 0x0000, 0x00000010},
+    {0xb7, 0x05, 0x00, 0x0000, 0x00000001},
+    {0x85, 0x00, 0x00, 0x0000, 0x00000044},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000001},
+    {0x73, 0x0a, 0x01, 0xff55, 0x00000000},
+    {0x05, 0x00, 0x00, 0xffc8, 0x00000000},
+    {0xbf, 0x07, 0x09, 0x0000, 0x00000000},
+    {0xb7, 0x01, 0x00, 0x0000, 0x00000000},
+    {0x6b, 0x0a, 0x01, 0xfff8, 0x00000000},
+    {0xb7, 0x09, 0x00, 0x0000, 0x00000002},
+    {0xb7, 0x08, 0x00, 0x0000, 0x0000001e},
+    {0x05, 0x00, 0x00, 0xffaf, 0x00000000},
+    {0x15, 0x06, 0x00, 0xffce, 0x00000087},
+    {0x05, 0x00, 0x00, 0xffd3, 0x00000000},
+    {0x61, 0x01, 0x0a, 0xff78, 0x00000000},
+    {0x67, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x61, 0x02, 0x0a, 0xff74, 0x00000000},
+    {0x4f, 0x01, 0x02, 0x0000, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffb8, 0x00000000},
+    {0x61, 0x01, 0x0a, 0xff70, 0x00000000},
+    {0x67, 0x01, 0x00, 0x0000, 0x00000020},
+    {0x61, 0x02, 0x0a, 0xff6c, 0x00000000},
+    {0x4f, 0x01, 0x02, 0x0000, 0x00000000},
+    {0x7b, 0x0a, 0x01, 0xffb0, 0x00000000},
+    {0x05, 0x00, 0x00, 0xfef6, 0x00000000},
+};
+
+struct fixup_mapfd_t reltun_rss_steering[] = {
+    {"tap_rss_map_configurations", 5},
+    {"tap_rss_map_toeplitz_key", 10},
+    {"tap_rss_map_indirection_table", 379},
+};
+
+#endif /* TUN_RSS_STEERING */
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC PATCH 4/6] ebpf: Added eBPF RSS loader.
  2020-11-02 18:51 [RFC PATCH 0/6] eBPF RSS support for virtio-net Andrew Melnychenko
                   ` (2 preceding siblings ...)
  2020-11-02 18:51 ` [RFC PATCH 3/6] ebpf: Added eBPF RSS program Andrew Melnychenko
@ 2020-11-02 18:51 ` Andrew Melnychenko
  2020-11-02 18:51 ` [RFC PATCH 5/6] virtio-net: Added eBPF RSS to virtio-net Andrew Melnychenko
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Andrew Melnychenko @ 2020-11-02 18:51 UTC (permalink / raw)
  To: jasowang, mst; +Cc: yan, yuri.benditovich, Andrew, Sameeh Jubran, qemu-devel

From: Andrew <andrew@daynix.com>

Added a function that loads the RSS eBPF program.
Added stub functions for RSS eBPF.
Added meson and configure options.

Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 configure        |  36 ++++++++++
 ebpf/ebpf-stub.c |  28 ++++++++
 ebpf/ebpf_rss.c  | 178 +++++++++++++++++++++++++++++++++++++++++++++++
 ebpf/ebpf_rss.h  |  30 ++++++++
 ebpf/meson.build |   1 +
 meson.build      |   3 +
 6 files changed, 276 insertions(+)
 create mode 100644 ebpf/ebpf-stub.c
 create mode 100644 ebpf/ebpf_rss.c
 create mode 100644 ebpf/ebpf_rss.h
 create mode 100644 ebpf/meson.build

diff --git a/configure b/configure
index 6df4306c88..bae4ea54f8 100755
--- a/configure
+++ b/configure
@@ -330,6 +330,7 @@ vhost_scsi=""
 vhost_vsock=""
 vhost_user=""
 vhost_user_fs=""
+bpf=""
 kvm="auto"
 hax="auto"
 hvf="auto"
@@ -1210,6 +1211,10 @@ for opt do
   ;;
   --enable-membarrier) membarrier="yes"
   ;;
+  --disable-bpf) bpf="no"
+  ;;
+  --enable-bpf) bpf="yes"
+  ;;
   --disable-blobs) blobs="false"
   ;;
   --with-pkgversion=*) pkgversion="$optarg"
@@ -1792,6 +1797,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   vhost-kernel    vhost kernel backend support
   vhost-user      vhost-user backend support
   vhost-vdpa      vhost-vdpa kernel backend support
+  bpf             BPF kernel support
   spice           spice
   rbd             rados block device (rbd)
   libiscsi        iscsi support
@@ -5347,6 +5353,33 @@ else
     membarrier=no
 fi
 
+##########################################
+# check for usable bpf system call
+if test "$bpf" = ""; then
+    have_bpf=no
+    if test "$linux" = "yes" ; then
+        cat > $TMPC << EOF
+    #include <sys/syscall.h>
+    #include <linux/bpf.h>
+    #include <unistd.h>
+    #include <stdlib.h>
+    #include <string.h>
+    int main(void) {
+        union bpf_attr * attr = NULL;
+        syscall(__NR_bpf, BPF_PROG_LOAD, attr, sizeof(union bpf_attr));
+        exit(0);
+    }
+EOF
+        if compile_prog "" "" ; then
+            have_bpf=yes
+            bpf=yes
+        fi
+    fi
+    if test "$have_bpf" = "no"; then
+      feature_not_found "bpf" "the bpf system call is not available"
+    fi
+fi
+
 ##########################################
 # check if rtnetlink.h exists and is useful
 have_rtnetlink=no
@@ -6279,6 +6312,9 @@ fi
 if test "$membarrier" = "yes" ; then
   echo "CONFIG_MEMBARRIER=y" >> $config_host_mak
 fi
+if test "$bpf" = "yes" -a "$bigendian" != "yes"; then
+  echo "CONFIG_EBPF=y" >> $config_host_mak
+fi
 if test "$signalfd" = "yes" ; then
   echo "CONFIG_SIGNALFD=y" >> $config_host_mak
 fi
diff --git a/ebpf/ebpf-stub.c b/ebpf/ebpf-stub.c
new file mode 100644
index 0000000000..281dc039d3
--- /dev/null
+++ b/ebpf/ebpf-stub.c
@@ -0,0 +1,28 @@
+#include "qemu/osdep.h"
+#include "ebpf/ebpf_rss.h"
+
+void ebpf_rss_init(struct EBPFRSSContext *ctx)
+{
+
+}
+
+bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
+{
+    return false;
+}
+
+bool ebpf_rss_load(struct EBPFRSSContext *ctx)
+{
+    return false;
+}
+
+bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
+                      uint16_t *indirections_table, uint8_t *toeplitz_key)
+{
+    return false;
+}
+
+void ebpf_rss_unload(struct EBPFRSSContext *ctx)
+{
+
+}
diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
new file mode 100644
index 0000000000..f3c948a7a0
--- /dev/null
+++ b/ebpf/ebpf_rss.c
@@ -0,0 +1,178 @@
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+
+#include "hw/virtio/virtio-net.h" /* VIRTIO_NET_RSS_MAX_TABLE_LEN */
+
+#include "ebpf/ebpf_rss.h"
+#include "ebpf/ebpf.h"
+#include "ebpf/tun_rss_steering.h"
+#include "trace.h"
+
+void ebpf_rss_init(struct EBPFRSSContext *ctx)
+{
+    if (ctx != NULL) {
+        ctx->program_fd = -1;
+    }
+}
+
+bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
+{
+    return ctx != NULL && ctx->program_fd >= 0;
+}
+
+bool ebpf_rss_load(struct EBPFRSSContext *ctx)
+{
+    if (ctx == NULL) {
+        return false;
+    }
+
+    ctx->map_configuration =
+            bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(uint32_t),
+                           sizeof(struct EBPFRSSConfig), 1);
+    if (ctx->map_configuration < 0) {
+        trace_ebpf_error("eBPF RSS", "can not create MAP for configurations");
+        goto l_conf_create;
+    }
+    ctx->map_toeplitz_key =
+            bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(uint32_t),
+                           VIRTIO_NET_RSS_MAX_KEY_SIZE, 1);
+    if (ctx->map_toeplitz_key < 0) {
+        trace_ebpf_error("eBPF RSS", "can not create MAP for toeplitz key");
+        goto l_toe_create;
+    }
+
+    ctx->map_indirections_table =
+            bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(uint32_t),
+                           sizeof(uint16_t), VIRTIO_NET_RSS_MAX_TABLE_LEN);
+    if (ctx->map_indirections_table < 0) {
+        trace_ebpf_error("eBPF RSS", "can not create MAP for indirections table");
+        goto l_table_create;
+    }
+
+    bpf_fixup_mapfd(reltun_rss_steering,
+            sizeof(reltun_rss_steering) / sizeof(struct fixup_mapfd_t),
+            instun_rss_steering,
+            sizeof(instun_rss_steering) / sizeof(struct bpf_insn),
+            "tap_rss_map_configurations", ctx->map_configuration);
+
+    bpf_fixup_mapfd(reltun_rss_steering,
+            sizeof(reltun_rss_steering) / sizeof(struct fixup_mapfd_t),
+            instun_rss_steering,
+            sizeof(instun_rss_steering) / sizeof(struct bpf_insn),
+            "tap_rss_map_toeplitz_key", ctx->map_toeplitz_key);
+
+    bpf_fixup_mapfd(reltun_rss_steering,
+            sizeof(reltun_rss_steering) / sizeof(struct fixup_mapfd_t),
+            instun_rss_steering,
+            sizeof(instun_rss_steering) / sizeof(struct bpf_insn),
+            "tap_rss_map_indirection_table", ctx->map_indirections_table);
+
+    ctx->program_fd =
+            bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, instun_rss_steering,
+                         sizeof(instun_rss_steering) / sizeof(struct bpf_insn),
+                         "GPL");
+    if (ctx->program_fd < 0) {
+        trace_ebpf_error("eBPF RSS", "can not load eBPF program");
+        goto l_prog_load;
+    }
+
+    return true;
+l_prog_load:
+    close(ctx->map_indirections_table);
+l_table_create:
+    close(ctx->map_toeplitz_key);
+l_toe_create:
+    close(ctx->map_configuration);
+l_conf_create:
+    return false;
+}
+
+static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
+                                struct EBPFRSSConfig *config)
+{
+    if (!ebpf_rss_is_loaded(ctx)) {
+        return false;
+    }
+    uint32_t map_key = 0;
+    if (bpf_update_elem(ctx->map_configuration,
+                            &map_key, config, BPF_ANY) < 0) {
+        return false;
+    }
+    return true;
+}
+
+static bool ebpf_rss_set_indirections_table(struct EBPFRSSContext *ctx,
+                                            uint16_t *indirections_table,
+                                            size_t len)
+{
+    if (!ebpf_rss_is_loaded(ctx) || indirections_table == NULL ||
+       len > VIRTIO_NET_RSS_MAX_TABLE_LEN) {
+        return false;
+    }
+    uint32_t i = 0;
+
+    for (; i < len; ++i) {
+        if (bpf_update_elem(ctx->map_indirections_table, &i,
+                                indirections_table + i, BPF_ANY) < 0) {
+            return false;
+        }
+    }
+    return true;
+}
+
+static bool ebpf_rss_set_toepliz_key(struct EBPFRSSContext *ctx,
+                                     uint8_t *toeplitz_key)
+{
+    if (!ebpf_rss_is_loaded(ctx) || toeplitz_key == NULL) {
+        return false;
+    }
+    uint32_t map_key = 0;
+
+    /* prepare toeplitz key */
+    uint8_t toe[VIRTIO_NET_RSS_MAX_KEY_SIZE] = {};
+    memcpy(toe, toeplitz_key, VIRTIO_NET_RSS_MAX_KEY_SIZE);
+    *(uint32_t *)toe = ntohl(*(uint32_t *)toe);
+
+    if (bpf_update_elem(ctx->map_toeplitz_key, &map_key, toe,
+                            BPF_ANY) < 0) {
+        return false;
+    }
+    return true;
+}
+
+bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
+                      uint16_t *indirections_table, uint8_t *toeplitz_key)
+{
+    if (!ebpf_rss_is_loaded(ctx) || config == NULL ||
+        indirections_table == NULL || toeplitz_key == NULL) {
+        return false;
+    }
+
+    if (!ebpf_rss_set_config(ctx, config)) {
+        return false;
+    }
+
+    if (!ebpf_rss_set_indirections_table(ctx, indirections_table,
+                                      config->indirections_len)) {
+        return false;
+    }
+
+    if (!ebpf_rss_set_toepliz_key(ctx, toeplitz_key)) {
+        return false;
+    }
+
+    return true;
+}
+
+void ebpf_rss_unload(struct EBPFRSSContext *ctx)
+{
+    if (!ebpf_rss_is_loaded(ctx)) {
+        return;
+    }
+
+    close(ctx->program_fd);
+    close(ctx->map_configuration);
+    close(ctx->map_toeplitz_key);
+    close(ctx->map_indirections_table);
+    ctx->program_fd = -1;
+}
diff --git a/ebpf/ebpf_rss.h b/ebpf/ebpf_rss.h
new file mode 100644
index 0000000000..ffed7b571a
--- /dev/null
+++ b/ebpf/ebpf_rss.h
@@ -0,0 +1,30 @@
+#ifndef QEMU_EBPF_RSS_H
+#define QEMU_EBPF_RSS_H
+
+struct EBPFRSSContext {
+    int program_fd;
+    int map_configuration;
+    int map_toeplitz_key;
+    int map_indirections_table;
+};
+
+struct EBPFRSSConfig {
+    uint8_t redirect;
+    uint8_t populate_hash;
+    uint32_t hash_types;
+    uint16_t indirections_len;
+    uint16_t default_queue;
+};
+
+void ebpf_rss_init(struct EBPFRSSContext *ctx);
+
+bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx);
+
+bool ebpf_rss_load(struct EBPFRSSContext *ctx);
+
+bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
+                      uint16_t *indirections_table, uint8_t *toeplitz_key);
+
+void ebpf_rss_unload(struct EBPFRSSContext *ctx);
+
+#endif /* QEMU_EBPF_RSS_H */
diff --git a/ebpf/meson.build b/ebpf/meson.build
new file mode 100644
index 0000000000..10f4bc9ca8
--- /dev/null
+++ b/ebpf/meson.build
@@ -0,0 +1 @@
+specific_ss.add(when: 'CONFIG_EBPF', if_true: files('ebpf_rss.c', 'ebpf.c'), if_false: files('ebpf-stub.c'))
diff --git a/meson.build b/meson.build
index 47e32e1fcb..d0ea1a0e9d 100644
--- a/meson.build
+++ b/meson.build
@@ -1368,6 +1368,7 @@ if have_system
     'backends',
     'backends/tpm',
     'chardev',
+    'ebpf',
     'hw/9pfs',
     'hw/acpi',
     'hw/alpha',
@@ -1530,6 +1531,7 @@ subdir('accel')
 subdir('plugins')
 subdir('bsd-user')
 subdir('linux-user')
+subdir('ebpf')
 
 bsd_user_ss.add(files('gdbstub.c'))
 specific_ss.add_all(when: 'CONFIG_BSD_USER', if_true: bsd_user_ss)
@@ -2093,6 +2095,7 @@ summary_info += {'vhost-vsock support': config_host.has_key('CONFIG_VHOST_VSOCK'
 summary_info += {'vhost-user support': config_host.has_key('CONFIG_VHOST_KERNEL')}
 summary_info += {'vhost-user-fs support': config_host.has_key('CONFIG_VHOST_USER_FS')}
 summary_info += {'vhost-vdpa support': config_host.has_key('CONFIG_VHOST_VDPA')}
+summary_info += {'bpf support': config_host.has_key('CONFIG_EBPF')}
 summary_info += {'Trace backends':    config_host['TRACE_BACKENDS']}
 if config_host['TRACE_BACKENDS'].split().contains('simple')
   summary_info += {'Trace output file': config_host['CONFIG_TRACE_FILE'] + '-<pid>'}
-- 
2.28.0
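[Editor's note] The map-fd relocation step that `ebpf_rss_load()` drives through `bpf_fixup_mapfd()` can be illustrated with a standalone sketch. The `insn` and `fixup_mapfd` structs below are simplified stand-ins for `struct bpf_insn` and `struct fixup_mapfd_t` — assumptions for illustration, not the patch's actual definitions:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for struct bpf_insn and struct fixup_mapfd_t
 * (illustrative only; the real definitions live in <linux/bpf.h> and
 * the series' ebpf/ebpf.h). */
struct insn {
    uint8_t code;
    uint8_t regs;    /* dst_reg in the low nibble, src_reg in the high nibble */
    int16_t off;
    int32_t imm;
};

struct fixup_mapfd {
    const char *map_name;  /* eBPF map name taken from the ELF relocations */
    size_t insn_off;       /* index of the 64-bit immediate load to patch */
};

#define LD_DW_IMM      0x18  /* BPF_LD | BPF_IMM | BPF_DW */
#define PSEUDO_MAP_FD  1     /* src_reg value meaning "imm holds a map fd" */

/* Patch a freshly created map's file descriptor into every instruction
 * the relocation table associates with 'name'.  Returns the number of
 * patched instructions. */
static unsigned fixup_mapfd(const struct fixup_mapfd *rel, size_t rel_len,
                            struct insn *prog, size_t prog_len,
                            const char *name, int fd)
{
    unsigned patched = 0;

    for (size_t i = 0; i < rel_len; i++) {
        if (strcmp(rel[i].map_name, name) != 0 || rel[i].insn_off >= prog_len) {
            continue;
        }
        struct insn *ins = &prog[rel[i].insn_off];
        if (ins->code != LD_DW_IMM) {
            continue;
        }
        ins->regs = (uint8_t)((ins->regs & 0x0f) | (PSEUDO_MAP_FD << 4));
        ins->imm = fd;  /* the low 32 bits of the immediate carry the fd */
        patched++;
    }
    return patched;
}
```

In the real eBPF encoding a 64-bit immediate load occupies two consecutive instruction slots; a full implementation patches the fd into the first slot, as this sketch does, leaving the second slot's upper-half immediate at zero.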




* [RFC PATCH 5/6] virtio-net: Added eBPF RSS to virtio-net.
  2020-11-02 18:51 [RFC PATCH 0/6] eBPF RSS support for virtio-net Andrew Melnychenko
                   ` (3 preceding siblings ...)
  2020-11-02 18:51 ` [RFC PATCH 4/6] ebpf: Added eBPF RSS loader Andrew Melnychenko
@ 2020-11-02 18:51 ` Andrew Melnychenko
  2020-11-04  3:09   ` Jason Wang
  2020-11-02 18:51 ` [RFC PATCH 6/6] docs: Added eBPF documentation Andrew Melnychenko
  2020-11-03  9:02 ` [RFC PATCH 0/6] eBPF RSS support for virtio-net Jason Wang
  6 siblings, 1 reply; 36+ messages in thread
From: Andrew Melnychenko @ 2020-11-02 18:51 UTC (permalink / raw)
  To: jasowang, mst; +Cc: yan, yuri.benditovich, Andrew, qemu-devel

From: Andrew <andrew@daynix.com>

When RSS is enabled, the device tries to load the eBPF program
that selects the RX virtqueue in the TUN. If the eBPF program can be
loaded, RSS also functions with vhost (works with kernel 5.8 and later).
Software RSS is used as a fallback with vhost=off when the eBPF program
can't be loaded or when hash population is requested by the guest.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 hw/net/vhost_net.c             |   2 +
 hw/net/virtio-net.c            | 120 +++++++++++++++++++++++++++++++--
 include/hw/virtio/virtio-net.h |   4 ++
 net/vhost-vdpa.c               |   2 +
 4 files changed, 124 insertions(+), 4 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 24d555e764..16124f99c3 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -71,6 +71,8 @@ static const int user_feature_bits[] = {
     VIRTIO_NET_F_MTU,
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_PACKED,
+    VIRTIO_NET_F_RSS,
+    VIRTIO_NET_F_HASH_REPORT,
 
     /* This bit implies RARP isn't sent by QEMU out of band */
     VIRTIO_NET_F_GUEST_ANNOUNCE,
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 277289d56e..afcc3032ec 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -698,6 +698,19 @@ static void virtio_net_set_queues(VirtIONet *n)
 
 static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
 
+static uint64_t fix_ebpf_vhost_features(uint64_t features)
+{
+    /* If vhost=on and CONFIG_EBPF is not set, disable the RSS feature */
+    uint64_t ret = features;
+#ifndef CONFIG_EBPF
+    virtio_clear_feature(&ret, VIRTIO_NET_F_RSS);
+#endif
+    /* for now, there is no solution for populating the hash from eBPF */
+    virtio_clear_feature(&ret, VIRTIO_NET_F_HASH_REPORT);
+
+    return ret;
+}
+
 static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
                                         Error **errp)
 {
@@ -732,9 +745,9 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
         return features;
     }
 
-    virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
-    virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
-    features = vhost_net_get_features(get_vhost_net(nc->peer), features);
+    features = fix_ebpf_vhost_features(
+            vhost_net_get_features(get_vhost_net(nc->peer), features));
+
     vdev->backend_features = features;
 
     if (n->mtu_bypass_backend &&
@@ -1169,12 +1182,75 @@ static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
     }
 }
 
+static void virtio_net_unload_epbf_rss(VirtIONet *n);
+
 static void virtio_net_disable_rss(VirtIONet *n)
 {
     if (n->rss_data.enabled) {
         trace_virtio_net_rss_disable();
     }
     n->rss_data.enabled = false;
+
+    if (!n->rss_data.enabled_software_rss && ebpf_rss_is_loaded(&n->ebpf_rss)) {
+        virtio_net_unload_epbf_rss(n);
+    }
+}
+
+static bool virtio_net_attach_steering_ebpf(NICState *nic, int prog_fd)
+{
+    NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
+    if (nc == NULL || nc->info->set_steering_ebpf == NULL) {
+        return false;
+    }
+
+    return nc->info->set_steering_ebpf(nc, prog_fd);
+}
+
+static void rss_data_to_rss_config(struct VirtioNetRssData *data,
+                                   struct EBPFRSSConfig *config)
+{
+    config->redirect = data->redirect;
+    config->populate_hash = data->populate_hash;
+    config->hash_types = data->hash_types;
+    config->indirections_len = data->indirections_len;
+    config->default_queue = data->default_queue;
+}
+
+static bool virtio_net_load_epbf_rss(VirtIONet *n)
+{
+    struct EBPFRSSConfig config = {};
+
+    if (!n->rss_data.enabled) {
+        if (ebpf_rss_is_loaded(&n->ebpf_rss)) {
+            ebpf_rss_unload(&n->ebpf_rss);
+        }
+        return true;
+    }
+
+    if (!ebpf_rss_is_loaded(&n->ebpf_rss) && !ebpf_rss_load(&n->ebpf_rss)) {
+        return false;
+    }
+
+    rss_data_to_rss_config(&n->rss_data, &config);
+
+    if (!ebpf_rss_set_all(&n->ebpf_rss, &config,
+                          n->rss_data.indirections_table, n->rss_data.key)) {
+        ebpf_rss_unload(&n->ebpf_rss);
+        return false;
+    }
+
+    if (!virtio_net_attach_steering_ebpf(n->nic, n->ebpf_rss.program_fd)) {
+        ebpf_rss_unload(&n->ebpf_rss);
+        return false;
+    }
+
+    return true;
+}
+
+static void virtio_net_unload_epbf_rss(VirtIONet *n)
+{
+    virtio_net_attach_steering_ebpf(n->nic, -1);
+    ebpf_rss_unload(&n->ebpf_rss);
 }
 
 static uint16_t virtio_net_handle_rss(VirtIONet *n,
@@ -1208,6 +1284,7 @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
         err_value = (uint32_t)s;
         goto error;
     }
+    n->rss_data.enabled_software_rss = false;
     n->rss_data.hash_types = virtio_ldl_p(vdev, &cfg.hash_types);
     n->rss_data.indirections_len =
         virtio_lduw_p(vdev, &cfg.indirection_table_mask);
@@ -1289,9 +1366,30 @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
         goto error;
     }
     n->rss_data.enabled = true;
+
+    if (!n->rss_data.populate_hash) {
+        /* load eBPF RSS */
+        if (!virtio_net_load_epbf_rss(n)) {
+            /* eBPF must be loaded for vhost */
+            if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
+                warn_report("Can't load eBPF RSS for vhost");
+                goto error;
+            }
+            /* fallback to software RSS */
+            warn_report("Can't load eBPF RSS - falling back to software RSS");
+            n->rss_data.enabled_software_rss = true;
+        }
+    } else {
+        /* use software RSS for hash population */
+        /* and unload the eBPF program if it was loaded before */
+        virtio_net_unload_epbf_rss(n);
+        n->rss_data.enabled_software_rss = true;
+    }
+
     trace_virtio_net_rss_enable(n->rss_data.hash_types,
                                 n->rss_data.indirections_len,
                                 temp.b);
+
     return queues;
 error:
     trace_virtio_net_rss_error(err_msg, err_value);
@@ -1674,7 +1772,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
         return -1;
     }
 
-    if (!no_rss && n->rss_data.enabled) {
+    if (!no_rss && n->rss_data.enabled && n->rss_data.enabled_software_rss) {
         int index = virtio_net_process_rss(nc, buf, size);
         if (index >= 0) {
             NetClientState *nc2 = qemu_get_subqueue(n->nic, index);
@@ -2780,6 +2878,18 @@ static int virtio_net_post_load_device(void *opaque, int version_id)
     }
 
     if (n->rss_data.enabled) {
+        n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
+        if (!n->rss_data.populate_hash) {
+            if (!virtio_net_load_epbf_rss(n)) {
+                if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
+                    error_report("Can't post-load eBPF RSS for vhost");
+                } else {
+                    warn_report("Can't post-load eBPF RSS - falling back to software RSS");
+                    n->rss_data.enabled_software_rss = true;
+                }
+            }
+        }
+
         trace_virtio_net_rss_enable(n->rss_data.hash_types,
                                     n->rss_data.indirections_len,
                                     sizeof(n->rss_data.key));
@@ -3453,6 +3563,8 @@ static void virtio_net_instance_init(Object *obj)
     device_add_bootindex_property(obj, &n->nic_conf.bootindex,
                                   "bootindex", "/ethernet-phy@0",
                                   DEVICE(n));
+
+    ebpf_rss_init(&n->ebpf_rss);
 }
 
 static int virtio_net_pre_save(void *opaque)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index f4852ac27b..4d29a577eb 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -21,6 +21,8 @@
 #include "qemu/option_int.h"
 #include "qom/object.h"
 
+#include "ebpf/ebpf_rss.h"
+
 #define TYPE_VIRTIO_NET "virtio-net-device"
 OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET)
 
@@ -130,6 +132,7 @@ typedef struct VirtioNetRscChain {
 
 typedef struct VirtioNetRssData {
     bool    enabled;
+    bool    enabled_software_rss;
     bool    redirect;
     bool    populate_hash;
     uint32_t hash_types;
@@ -214,6 +217,7 @@ struct VirtIONet {
     Notifier migration_state;
     VirtioNetRssData rss_data;
     struct NetRxPkt *rx_pkt;
+    struct EBPFRSSContext ebpf_rss;
 };
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 99c476db8c..feb5fa8624 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -54,6 +54,8 @@ const int vdpa_feature_bits[] = {
     VIRTIO_NET_F_MTU,
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_PACKED,
+    VIRTIO_NET_F_RSS,
+    VIRTIO_NET_F_HASH_REPORT,
     VIRTIO_NET_F_GUEST_ANNOUNCE,
     VIRTIO_NET_F_STATUS,
     VHOST_INVALID_FEATURE_BIT
-- 
2.28.0
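[Editor's note] For cross-checking the steering decision described in this patch, the documentation's "Simplified decision formula" can be written out in plain C. Below is a generic Toeplitz hash sketch — not the code from ebpf/rss.bpf.c — together with the `queue_index = indirection_table[hash % len]` lookup:

```c
#include <stdint.h>
#include <stddef.h>

/* Toeplitz hash as used by RSS: for every set bit of the input, XOR in the
 * 32-bit window of the key starting at that bit position.  'key' must be at
 * least len + 4 bytes long (the 40-byte RSS key covers a 36-byte tuple). */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t len)
{
    uint64_t window = 0;  /* bits 63..32 hold the current 32-bit key window */
    uint32_t hash = 0;
    size_t k = 0;

    for (; k < 8 && k < key_len; k++) {
        window |= (uint64_t)key[k] << (56 - 8 * k);
    }
    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            if ((data[i] >> b) & 1) {
                hash ^= (uint32_t)(window >> 32);
            }
            window <<= 1;
        }
        if (k < key_len) {
            window |= key[k++];  /* refill the low byte emptied by the shifts */
        }
    }
    return hash;
}

/* The steering decision from the documentation:
 * queue_index = indirection_table[hash % table_len]. */
static uint16_t rss_queue(uint32_t hash, const uint16_t *table, size_t table_len)
{
    return table[hash % table_len];
}
```

The hash input is the concatenation of source address, destination address, source port, and destination port, in network byte order, matching the standard RSS verification vectors.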




* [RFC PATCH 6/6] docs: Added eBPF documentation.
  2020-11-02 18:51 [RFC PATCH 0/6] eBPF RSS support for virtio-net Andrew Melnychenko
                   ` (4 preceding siblings ...)
  2020-11-02 18:51 ` [RFC PATCH 5/6] virtio-net: Added eBPF RSS to virtio-net Andrew Melnychenko
@ 2020-11-02 18:51 ` Andrew Melnychenko
  2020-11-04  3:15   ` Jason Wang
  2020-11-05  3:56   ` Jason Wang
  2020-11-03  9:02 ` [RFC PATCH 0/6] eBPF RSS support for virtio-net Jason Wang
  6 siblings, 2 replies; 36+ messages in thread
From: Andrew Melnychenko @ 2020-11-02 18:51 UTC (permalink / raw)
  To: jasowang, mst; +Cc: yan, yuri.benditovich, Andrew, qemu-devel

From: Andrew <andrew@daynix.com>

Also, added maintainers information.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
---
 MAINTAINERS       |   6 +++
 docs/ebpf.rst     |  29 +++++++++++
 docs/ebpf_rss.rst | 129 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 164 insertions(+)
 create mode 100644 docs/ebpf.rst
 create mode 100644 docs/ebpf_rss.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 2c22bbca5a..464b3f3c95 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3111,6 +3111,12 @@ S: Maintained
 F: hw/semihosting/
 F: include/hw/semihosting/
 
+EBPF:
+M: Andrew Melnychenko <andrew@daynix.com>
+M: Yuri Benditovich <yuri.benditovich@daynix.com>
+S: Maintained
+F: ebpf/*
+
 Build and test automation
 -------------------------
 Build and test automation
diff --git a/docs/ebpf.rst b/docs/ebpf.rst
new file mode 100644
index 0000000000..e45d085432
--- /dev/null
+++ b/docs/ebpf.rst
@@ -0,0 +1,29 @@
+===========================
+eBPF qemu support
+===========================
+
+eBPF support (CONFIG_EBPF) is enabled automatically by the 'configure' script
+if the 'bpf' system call is available.
+To disable eBPF support, use './configure --disable-bpf'.
+
+Basic eBPF functionality is located in ebpf/ebpf.c and ebpf/ebpf.h.
+There are basic functions to load the eBPF program into the kernel.
+Most function names are self-explanatory:
+
+- `bpf_create_map()`, `bpf_lookup_element()`, `bpf_update_element()`, `bpf_delete_element()` - manage eBPF maps. On error, a basic error message is reported and -1 is returned. On success, 0 is returned (`bpf_create_map()` returns the map's file descriptor).
+- `bpf_prog_load()` - loads the program. The program must contain valid map file descriptors if maps are used. On error, the eBPF verifier log is reported. On success, the program's file descriptor is returned.
+- `bpf_fixup_mapfd()` - places the map file descriptor into the program according to the 'relocate array' of 'struct fixup_mapfd_t'. The function returns the number of instructions that were 'fixed', i.e. how many relocations occurred.
+
+A simplified workflow looks like this:
+
+.. code:: C
+
+    int map1 = bpf_create_map(...);
+    int map2 = bpf_create_map(...);
+
+    bpf_fixup_mapfd(<fixup table>, ARRAY_SIZE(<fixup table>), <instructions pointer>, ARRAY_SIZE(<instructions pointer>), <map1 name>, map1);
+    bpf_fixup_mapfd(<fixup table>, ARRAY_SIZE(<fixup table>), <instructions pointer>, ARRAY_SIZE(<instructions pointer>), <map2 name>, map2);
+
+    int prog = bpf_prog_load(<program type>, <instructions pointer>, ARRAY_SIZE(<instructions pointer>), "GPL");
+
+See bpf(2) for details.
diff --git a/docs/ebpf_rss.rst b/docs/ebpf_rss.rst
new file mode 100644
index 0000000000..96fee391b8
--- /dev/null
+++ b/docs/ebpf_rss.rst
@@ -0,0 +1,129 @@
+===========================
+eBPF RSS virtio-net support
+===========================
+
+RSS (Receive Side Scaling) is used to distribute network packets to guest virtqueues
+by calculating a packet hash. Each queue is then usually processed by a specific guest CPU core.
+
+For now, there are 2 RSS implementations in QEMU:
+- 'software' RSS (works only if QEMU receives the network packets itself, i.e. vhost=off)
+- eBPF RSS (also works with vhost=on)
+
+If no steering BPF is set for the kernel's TUN module, the TUN selects the
+rx virtqueue automatically, based on a lookup table built from the calculated
+symmetric hash of transmitted packets.
+If a steering BPF is set for the TUN, the BPF code calculates the hash of the
+packet header and returns the virtqueue number to place the packet in.
+
+Simplified decision formula:
+
+.. code:: C
+
+    queue_index = indirection_table[hash(<packet data>)%<indirection_table size>]
+
+
+The hash cannot (or should not) be calculated for all packets.
+
+Note: currently, eBPF RSS does not support hash reporting.
+
+eBPF RSS is turned on by different combinations of vhost-net, virtio-net and tap configurations:
+
+- eBPF is used:
+
+        tap,vhost=off & virtio-net-pci,rss=on,hash=off
+
+- eBPF is used:
+
+        tap,vhost=on & virtio-net-pci,rss=on,hash=off
+
+- 'software' RSS is used:
+
+        tap,vhost=off & virtio-net-pci,rss=on,hash=on
+
+- eBPF is used, hash population feature is not reported to the guest:
+
+        tap,vhost=on & virtio-net-pci,rss=on,hash=on
+
+If CONFIG_EBPF is not set, only 'software' RSS is supported.
+'Software' RSS is also used as a fallback if the eBPF program fails to load or to be set on the TUN.
+
+RSS eBPF program
+----------------
+
+The RSS program is located in ebpf/tun_rss_steering.h as an array of 'struct bpf_insn',
+so the program is part of the QEMU binary.
+The eBPF program is compiled by clang from the source code in ebpf/rss.bpf.c.
+Prerequisites for recompiling the eBPF program (regenerating ebpf/tun_rss_steering.h):
+
+        llvm, clang, the kernel source tree, python3 + pyelftools (pip3 install pyelftools)
+        Adjust 'linuxhdrs' in Makefile.ebpf to reflect the location of the kernel source tree
+
+        $ cd ebpf
+        $ make -f Makefile.ebpf
+
+Note the python script that converts the eBPF ELF object into a '.h' file - EbpfElf_to_C.py:
+
+        $ python EbpfElf_to_C.py rss.bpf.o tun_rss_steering
+
+The first argument of the script is the ELF object; the second is the section name where the eBPF program is located.
+The script generates a <section name>.h file with the eBPF instructions and a 'relocate array'.
+The 'relocate array' is an array of 'struct fixup_mapfd_t' holding the name of an eBPF map and the instruction offset where the map's file descriptor should be placed.
+
+The current eBPF RSS implementation uses 'bounded loops' with 'backward jump instructions', which are only supported by recent kernels.
+Overall, eBPF RSS works on kernels 5.8+.
+
+eBPF RSS implementation
+-----------------------
+
+The eBPF RSS loading functionality is located in ebpf/ebpf_rss.c and ebpf/ebpf_rss.h.
+
+The `struct EBPFRSSContext` structure holds 4 file descriptors:
+
+- program_fd - file descriptor of the eBPF RSS program.
+- map_configuration - file descriptor of the 'configuration' map. This map contains one element of 'struct EBPFRSSConfig'. This configuration determines eBPF program behavior.
+- map_toeplitz_key - file descriptor of the 'Toeplitz key' map. Contains one element: the 40-byte key prepared for the hashing algorithm.
+- map_indirections_table - file descriptor of the indirection table map: up to 128 queue indexes.
+
+`struct EBPFRSSConfig` fields:
+
+- redirect - "boolean" value that controls whether the hash should be calculated; if false, `default_queue` is used as the final decision.
+- populate_hash - not used for now; eBPF RSS doesn't support hash reporting.
+- hash_types - bitmask of enabled hash types. See the `VIRTIO_NET_RSS_HASH_TYPE_*` defines. If the hash should not be calculated for a packet, `default_queue` is used.
+- indirections_len - length of the indirection table, at most 128.
+- default_queue - the queue index used for packets that aren't hashed. For some packets the hash can't be calculated (e.g. ARP).
+
+Functions:
+
+- `ebpf_rss_init()` - sets program_fd to -1, which indicates that the EBPFRSSContext is not loaded.
+- `ebpf_rss_load()` - creates the 3 maps and loads the eBPF program from tun_rss_steering.h. Returns 'true' on success. After that, program_fd can be used to set steering for a TAP device.
+- `ebpf_rss_set_all()` - sets the values of the eBPF maps. The `indirections_table` length is taken from the EBPFRSSConfig. `toeplitz_key` is a VIRTIO_NET_RSS_MAX_KEY_SIZE (40-byte) array.
+- `ebpf_rss_unload()` - closes all file descriptors and sets program_fd to -1.
+
+Simplified eBPF RSS workflow:
+
+.. code:: C
+
+    struct EBPFRSSConfig config;
+    config.redirect = 1;
+    config.hash_types = VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | VIRTIO_NET_RSS_HASH_TYPE_TCPv4;
+    config.indirections_len = VIRTIO_NET_RSS_MAX_TABLE_LEN;
+    config.default_queue = 0;
+
+    uint16_t table[VIRTIO_NET_RSS_MAX_TABLE_LEN] = {...};
+    uint8_t key[VIRTIO_NET_RSS_MAX_KEY_SIZE] = {...};
+
+    struct EBPFRSSContext ctx;
+    ebpf_rss_init(&ctx);
+    ebpf_rss_load(&ctx);
+    ebpf_rss_set_all(&ctx, &config, table, key);
+    if (net_client->info->set_steering_ebpf != NULL) {
+        net_client->info->set_steering_ebpf(net_client, ctx.program_fd);
+    }
+    ...
+    ebpf_rss_unload(&ctx);
+
+
+NetClientState SetSteeringEBPF()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For now, the `set_steering_ebpf()` method is only supported by the Linux TAP NetClientState. The method requires an eBPF program file descriptor as an argument.
-- 
2.28.0
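[Editor's note] The `EBPFRSSConfig` semantics documented above (redirect, hash_types, default_queue) boil down to a small decision function. The struct below is an illustrative copy of the patch's `struct EBPFRSSConfig`, and the hash-type bit value used in the test is hypothetical, not a real `VIRTIO_NET_RSS_HASH_TYPE_*` constant:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative copy of the patch's EBPFRSSConfig; the real definition
 * lives in ebpf/ebpf_rss.h. */
struct rss_config {
    uint8_t  redirect;
    uint8_t  populate_hash;
    uint32_t hash_types;
    uint16_t indirections_len;
    uint16_t default_queue;
};

/* Queue selection as described in the documentation: if redirect is off,
 * or the packet's hash type is not enabled, fall back to default_queue;
 * otherwise index the indirection table with the hash. */
static uint16_t select_queue(const struct rss_config *cfg,
                             const uint16_t *table,
                             uint32_t packet_hash_type, /* one hash-type bit */
                             uint32_t hash)
{
    if (!cfg->redirect) {
        return cfg->default_queue;
    }
    if (!(cfg->hash_types & packet_hash_type) || cfg->indirections_len == 0) {
        return cfg->default_queue;
    }
    return table[hash % cfg->indirections_len];
}
```

This matches the fallback behavior described for `default_queue`: un-hashable packets (e.g. ARP) and disabled hash types never touch the indirection table.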




* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-02 18:51 [RFC PATCH 0/6] eBPF RSS support for virtio-net Andrew Melnychenko
                   ` (5 preceding siblings ...)
  2020-11-02 18:51 ` [RFC PATCH 6/6] docs: Added eBPF documentation Andrew Melnychenko
@ 2020-11-03  9:02 ` Jason Wang
  2020-11-03 10:32   ` Yuri Benditovich
  6 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2020-11-03  9:02 UTC (permalink / raw)
  To: Andrew Melnychenko, mst; +Cc: yan, yuri.benditovich, qemu-devel


On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> Basic idea is to use eBPF to calculate and steer packets in TAP.
> RSS(Receive Side Scaling) is used to distribute network packets to guest virtqueues
> by calculating packet hash.
> eBPF RSS allows us to use RSS with vhost TAP.
>
> This set of patches introduces the usage of eBPF for packet steering
> and RSS hash calculation:
> * RSS(Receive Side Scaling) is used to distribute network packets to
> guest virtqueues by calculating packet hash
> * eBPF RSS suppose to be faster than already existing 'software'
> implementation in QEMU
> * Additionally adding support for the usage of RSS with vhost
>
> Supported kernels: 5.8+
>
> Implementation notes:
> Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> Added eBPF support to qemu directly through a system call, see the
> bpf(2) for details.
> The eBPF program is part of the qemu and presented as an array of bpf
> instructions.
> The program can be recompiled by provided Makefile.ebpf(need to adjust
> 'linuxhdrs'),
> although it's not required to build QEMU with eBPF support.
> Added changes to virtio-net and vhost, primary eBPF RSS is used.
> 'Software' RSS used in the case of hash population and as a fallback option.
> For vhost, the hash population feature is not reported to the guest.
>
> Please also see the documentation in PATCH 6/6.
>
> I am sending those patches as RFC to initiate the discussions and get
> feedback on the following points:
> * Fallback when eBPF is not supported by the kernel


Yes, and it could also be a lack of CAP_BPF.


> * Live migration to the kernel that doesn't have eBPF support


Is there anything that needs special treatment here?


> * Integration with current QEMU build


Yes, a question here:

1) Any reason for not using libbpf, e.g. it has been shipped with some 
distros
2) It would be better if we can avoid shipping bytecodes


> * Additional usage for eBPF for packet filtering


Another interesting topic is implementing mac/vlan filters. And in the 
future, I plan to add mac-based steering. All of these could be done via 
eBPF.


>
> Know issues:
> * hash population not supported by eBPF RSS: 'software' RSS used


Is this because there's no way to write to the vnet header in STEERING BPF?


> as a fallback, also, hash population feature is not reported to guests
> with vhost.
> * big-endian BPF support: for now, eBPF is disabled for big-endian systems.


Are there any blockers for this?

Just some quick questions after a glance at the code. I will go through 
it tomorrow.

Thanks


>
> Andrew (6):
>    Added SetSteeringEBPF method for NetClientState.
>    ebpf: Added basic eBPF API.
>    ebpf: Added eBPF RSS program.
>    ebpf: Added eBPF RSS loader.
>    virtio-net: Added eBPF RSS to virtio-net.
>    docs: Added eBPF documentation.
>
>   MAINTAINERS                    |   6 +
>   configure                      |  36 +++
>   docs/ebpf.rst                  |  29 ++
>   docs/ebpf_rss.rst              | 129 ++++++++
>   ebpf/EbpfElf_to_C.py           |  67 ++++
>   ebpf/Makefile.ebpf             |  38 +++
>   ebpf/ebpf-stub.c               |  28 ++
>   ebpf/ebpf.c                    | 107 +++++++
>   ebpf/ebpf.h                    |  35 +++
>   ebpf/ebpf_rss.c                | 178 +++++++++++
>   ebpf/ebpf_rss.h                |  30 ++
>   ebpf/meson.build               |   1 +
>   ebpf/rss.bpf.c                 | 470 ++++++++++++++++++++++++++++
>   ebpf/trace-events              |   4 +
>   ebpf/trace.h                   |   2 +
>   ebpf/tun_rss_steering.h        | 556 +++++++++++++++++++++++++++++++++
>   hw/net/vhost_net.c             |   2 +
>   hw/net/virtio-net.c            | 120 ++++++-
>   include/hw/virtio/virtio-net.h |   4 +
>   include/net/net.h              |   2 +
>   meson.build                    |   3 +
>   net/tap-bsd.c                  |   5 +
>   net/tap-linux.c                |  19 ++
>   net/tap-solaris.c              |   5 +
>   net/tap-stub.c                 |   5 +
>   net/tap.c                      |   9 +
>   net/tap_int.h                  |   1 +
>   net/vhost-vdpa.c               |   2 +
>   28 files changed, 1889 insertions(+), 4 deletions(-)
>   create mode 100644 docs/ebpf.rst
>   create mode 100644 docs/ebpf_rss.rst
>   create mode 100644 ebpf/EbpfElf_to_C.py
>   create mode 100755 ebpf/Makefile.ebpf
>   create mode 100644 ebpf/ebpf-stub.c
>   create mode 100644 ebpf/ebpf.c
>   create mode 100644 ebpf/ebpf.h
>   create mode 100644 ebpf/ebpf_rss.c
>   create mode 100644 ebpf/ebpf_rss.h
>   create mode 100644 ebpf/meson.build
>   create mode 100644 ebpf/rss.bpf.c
>   create mode 100644 ebpf/trace-events
>   create mode 100644 ebpf/trace.h
>   create mode 100644 ebpf/tun_rss_steering.h
>



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-03  9:02 ` [RFC PATCH 0/6] eBPF RSS support for virtio-net Jason Wang
@ 2020-11-03 10:32   ` Yuri Benditovich
  2020-11-03 11:56     ` Daniel P. Berrangé
  2020-11-04  2:07     ` Jason Wang
  0 siblings, 2 replies; 36+ messages in thread
From: Yuri Benditovich @ 2020-11-03 10:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yan Vugenfirer, Andrew Melnychenko, qemu-devel, Michael S . Tsirkin

[-- Attachment #1: Type: text/plain, Size: 5954 bytes --]

On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
> > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > RSS(Receive Side Scaling) is used to distribute network packets to guest
> virtqueues
> > by calculating packet hash.
> > eBPF RSS allows us to use RSS with vhost TAP.
> >
> > This set of patches introduces the usage of eBPF for packet steering
> > and RSS hash calculation:
> > * RSS(Receive Side Scaling) is used to distribute network packets to
> > guest virtqueues by calculating packet hash
> > * eBPF RSS suppose to be faster than already existing 'software'
> > implementation in QEMU
> > * Additionally adding support for the usage of RSS with vhost
> >
> > Supported kernels: 5.8+
> >
> > Implementation notes:
> > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> > Added eBPF support to qemu directly through a system call, see the
> > bpf(2) for details.
> > The eBPF program is part of the qemu and presented as an array of bpf
> > instructions.
> > The program can be recompiled by provided Makefile.ebpf(need to adjust
> > 'linuxhdrs'),
> > although it's not required to build QEMU with eBPF support.
> > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> > 'Software' RSS used in the case of hash population and as a fallback
> option.
> > For vhost, the hash population feature is not reported to the guest.
> >
> > Please also see the documentation in PATCH 6/6.
> >
> > I am sending those patches as RFC to initiate the discussions and get
> > feedback on the following points:
> > * Fallback when eBPF is not supported by the kernel
>
>
> Yes, and it could also be a lack of CAP_BPF.
>
>
> > * Live migration to the kernel that doesn't have eBPF support
>
>
> Is there anything that needs special treatment here?
>
Possible case: rss=on, vhost=on, source system with kernel 5.8 (everything
works) -> dest. system 5.6 (bpf does not work): the adapter functions, but
steering does not use the proper queues.




>
> > * Integration with current QEMU build
>
>
> Yes, a question here:
>
> 1) Any reason for not using libbpf, e.g it has been shipped with some
> distros
>

We intentionally do not use libbpf, as it is present only on some distros.
We can switch to libbpf, but this will disable BPF if libbpf is not
installed.


> 2) It would be better if we can avoid shipping bytecodes
>


This creates new dependencies: llvm + clang + ...
We would prefer bytecode and the ability to regenerate it if the
prerequisites are installed.


>
>
> > * Additional usage for eBPF for packet filtering
>
>
> Another interesting topic is implementing mac/vlan filters. And in the
> future, I plan to add mac-based steering. All of these could be done via
> eBPF.
>
>
No problem, we can cooperate if needed.


>
> >
> > Know issues:
> > * hash population not supported by eBPF RSS: 'software' RSS used
>
>
> Is this because there's no way to write to the vnet header in STEERING BPF?
>
Yes. We plan to submit kernel changes to cooperate with BPF and
populate the hash; this work is in progress.


>
> > as a fallback, also, hash population feature is not reported to guests
> > with vhost.
> > * big-endian BPF support: for now, eBPF is disabled for big-endian
> systems.
>
>
> Are there any blockers for this?
>

No, it can be added in v2.


>
> Just some quick questions after a glance at the code. I will go through
> it tomorrow.
>
> Thanks
>
>
> >
> > Andrew (6):
> >    Added SetSteeringEBPF method for NetClientState.
> >    ebpf: Added basic eBPF API.
> >    ebpf: Added eBPF RSS program.
> >    ebpf: Added eBPF RSS loader.
> >    virtio-net: Added eBPF RSS to virtio-net.
> >    docs: Added eBPF documentation.
> >
> >   MAINTAINERS                    |   6 +
> >   configure                      |  36 +++
> >   docs/ebpf.rst                  |  29 ++
> >   docs/ebpf_rss.rst              | 129 ++++++++
> >   ebpf/EbpfElf_to_C.py           |  67 ++++
> >   ebpf/Makefile.ebpf             |  38 +++
> >   ebpf/ebpf-stub.c               |  28 ++
> >   ebpf/ebpf.c                    | 107 +++++++
> >   ebpf/ebpf.h                    |  35 +++
> >   ebpf/ebpf_rss.c                | 178 +++++++++++
> >   ebpf/ebpf_rss.h                |  30 ++
> >   ebpf/meson.build               |   1 +
> >   ebpf/rss.bpf.c                 | 470 ++++++++++++++++++++++++++++
> >   ebpf/trace-events              |   4 +
> >   ebpf/trace.h                   |   2 +
> >   ebpf/tun_rss_steering.h        | 556 +++++++++++++++++++++++++++++++++
> >   hw/net/vhost_net.c             |   2 +
> >   hw/net/virtio-net.c            | 120 ++++++-
> >   include/hw/virtio/virtio-net.h |   4 +
> >   include/net/net.h              |   2 +
> >   meson.build                    |   3 +
> >   net/tap-bsd.c                  |   5 +
> >   net/tap-linux.c                |  19 ++
> >   net/tap-solaris.c              |   5 +
> >   net/tap-stub.c                 |   5 +
> >   net/tap.c                      |   9 +
> >   net/tap_int.h                  |   1 +
> >   net/vhost-vdpa.c               |   2 +
> >   28 files changed, 1889 insertions(+), 4 deletions(-)
> >   create mode 100644 docs/ebpf.rst
> >   create mode 100644 docs/ebpf_rss.rst
> >   create mode 100644 ebpf/EbpfElf_to_C.py
> >   create mode 100755 ebpf/Makefile.ebpf
> >   create mode 100644 ebpf/ebpf-stub.c
> >   create mode 100644 ebpf/ebpf.c
> >   create mode 100644 ebpf/ebpf.h
> >   create mode 100644 ebpf/ebpf_rss.c
> >   create mode 100644 ebpf/ebpf_rss.h
> >   create mode 100644 ebpf/meson.build
> >   create mode 100644 ebpf/rss.bpf.c
> >   create mode 100644 ebpf/trace-events
> >   create mode 100644 ebpf/trace.h
> >   create mode 100644 ebpf/tun_rss_steering.h
> >
>
>



* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-03 10:32   ` Yuri Benditovich
@ 2020-11-03 11:56     ` Daniel P. Berrangé
  2020-11-04  2:15       ` Jason Wang
  2020-11-04  2:07     ` Jason Wang
  1 sibling, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2020-11-03 11:56 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Yan Vugenfirer, Jason Wang, Michael S . Tsirkin,
	Andrew Melnychenko, qemu-devel

On Tue, Nov 03, 2020 at 12:32:43PM +0200, Yuri Benditovich wrote:
> On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com> wrote:
> 
> >
> > On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
> > > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > > RSS(Receive Side Scaling) is used to distribute network packets to guest
> > virtqueues
> > > by calculating packet hash.
> > > eBPF RSS allows us to use RSS with vhost TAP.
> > >
> > > This set of patches introduces the usage of eBPF for packet steering
> > > and RSS hash calculation:
> > > * RSS(Receive Side Scaling) is used to distribute network packets to
> > > guest virtqueues by calculating packet hash
> > > * eBPF RSS suppose to be faster than already existing 'software'
> > > implementation in QEMU
> > > * Additionally adding support for the usage of RSS with vhost
> > >
> > > Supported kernels: 5.8+
> > >
> > > Implementation notes:
> > > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> > > Added eBPF support to qemu directly through a system call, see the
> > > bpf(2) for details.
> > > The eBPF program is part of the qemu and presented as an array of bpf
> > > instructions.
> > > The program can be recompiled by provided Makefile.ebpf(need to adjust
> > > 'linuxhdrs'),
> > > although it's not required to build QEMU with eBPF support.
> > > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> > > 'Software' RSS used in the case of hash population and as a fallback
> > option.
> > > For vhost, the hash population feature is not reported to the guest.
> > >
> > > Please also see the documentation in PATCH 6/6.
> > >
> > > I am sending those patches as RFC to initiate the discussions and get
> > > feedback on the following points:
> > > * Fallback when eBPF is not supported by the kernel
> >
> >
> > Yes, and it could also be a lack of CAP_BPF.
> >
> >
> > > * Live migration to the kernel that doesn't have eBPF support
> >
> >
> > Is there anything that needs special treatment here?
> >
> Possible case: rss=on, vhost=on, source system with kernel 5.8 (everything
> works) -> dest. system 5.6 (bpf does not work): the adapter functions, but
> steering does not use the proper queues.
> 
> 
> 
> 
> >
> > > * Integration with current QEMU build
> >
> >
> > Yes, a question here:
> >
> > 1) Any reason for not using libbpf, e.g it has been shipped with some
> > distros
> >
> 
> We intentionally do not use libbpf, as it is present only on some distros.
> We can switch to libbpf, but this will disable BPF if libbpf is not
> installed.

If we were modifying existing functionality, then introducing a dependency
on libbpf would be a problem, as you'd be breaking existing QEMU users
on distros without libbpf.

This is brand new functionality though, so it is fine to place a
requirement on libbpf. If distros don't ship that library and they
want BPF features in QEMU, then those distros should take responsibility
for adding libbpf to their package set.

> > 2) It would be better if we can avoid shipping bytecodes
> >
> 
> 
> This creates new dependencies: llvm + clang + ...
> We would prefer bytecode and the ability to regenerate it if the
> prerequisites are installed.

I've double checked with Fedora, and generating the BPF program from
source is a mandatory requirement for QEMU. Pre-generated BPF bytecode
is not permitted.

There was also a question raised about the kernel ABI compatibility
for BPF programs?

  https://lwn.net/Articles/831402/

  "The basic problem is that when BPF is compiled, it uses a set
   of kernel headers that describe various kernel data structures
   for that particular version, which may be different from those
   on the kernel where the program is run. Until relatively recently,
   that was solved by distributing the BPF as C code along with the
   Clang compiler to build the BPF on the system where it was going
   to be run."

Is this not an issue for QEMU's usage of BPF here?

The dependency on llvm is unfortunate for people who build with GCC,
but at least they can opt out via a configure switch if they really
want to. As that LWN article notes, GCC will gain BPF support.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 3/6] ebpf: Added eBPF RSS program.
  2020-11-02 18:51 ` [RFC PATCH 3/6] ebpf: Added eBPF RSS program Andrew Melnychenko
@ 2020-11-03 13:07   ` Daniel P. Berrangé
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel P. Berrangé @ 2020-11-03 13:07 UTC (permalink / raw)
  To: Andrew Melnychenko; +Cc: yan, yuri.benditovich, jasowang, qemu-devel, mst

On Mon, Nov 02, 2020 at 08:51:13PM +0200, Andrew Melnychenko wrote:
> From: Andrew <andrew@daynix.com>
> 
> RSS program and Makefile to build it.
> Also added a Python script that generates the '.h' file.
> The data in that file may be loaded by the eBPF API.
> eBPF compilation is not required for building QEMU.
> You can use Makefile if you need to regenerate tun_rss_steering.h.
> 
> Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> ---
>  ebpf/EbpfElf_to_C.py    |  67 +++++
>  ebpf/Makefile.ebpf      |  38 +++
>  ebpf/rss.bpf.c          | 470 +++++++++++++++++++++++++++++++++
>  ebpf/tun_rss_steering.h | 556 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 1131 insertions(+)
>  create mode 100644 ebpf/EbpfElf_to_C.py
>  create mode 100755 ebpf/Makefile.ebpf
>  create mode 100644 ebpf/rss.bpf.c
>  create mode 100644 ebpf/tun_rss_steering.h
> 


> diff --git a/ebpf/Makefile.ebpf b/ebpf/Makefile.ebpf
> new file mode 100755
> index 0000000000..f7008d7d32
> --- /dev/null
> +++ b/ebpf/Makefile.ebpf
> @@ -0,0 +1,38 @@
> +OBJS = rss.bpf.o
> +
> +LLC ?= llc
> +CLANG ?= clang
> +INC_FLAGS = -nostdinc -isystem `$(CLANG) -print-file-name=include`
> +EXTRA_CFLAGS ?= -O2 -emit-llvm
> +
> +linuxhdrs = ~/src/kernel/master
> +
> +LINUXINCLUDE =  -I $(linuxhdrs)/arch/x86/include/uapi \
> +                -I $(linuxhdrs)/arch/x86/include/generated/uapi \
> +                -I $(linuxhdrs)/arch/x86/include/generated \
> +                -I $(linuxhdrs)/include/generated/uapi \
> +                -I $(linuxhdrs)/include/uapi \
> +                -I $(linuxhdrs)/include \
> +                -I $(linuxhdrs)/tools/lib
> +
> +all: $(OBJS)
> +
> +.PHONY: clean
> +
> +clean:
> +	rm -f $(OBJS)
> +
> +
> +$(OBJS):  %.o:%.c
> +	$(CLANG) $(INC_FLAGS) \
> +                -D__KERNEL__ -D__ASM_SYSREG_H \
> +                -Wno-unused-value -Wno-pointer-sign \
> +                -Wno-compare-distinct-pointer-types \
> +                -Wno-gnu-variable-sized-type-not-at-end \
> +                -Wno-address-of-packed-member -Wno-tautological-compare \
> +                -Wno-unknown-warning-option \
> +                -I../include $(LINUXINCLUDE) \
> +                $(EXTRA_CFLAGS) -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
> +	python3 EbpfElf_to_C.py -f rss.bpf.o -s tun_rss_steering

Note that QEMU has switched to Meson for its build system, so even if
we're not running the rules by default, they should still be defined
with Meson, rather than make.

> diff --git a/ebpf/rss.bpf.c b/ebpf/rss.bpf.c
> new file mode 100644
> index 0000000000..084fc33f96
> --- /dev/null
> +++ b/ebpf/rss.bpf.c
> @@ -0,0 +1,470 @@

Missing license header

> +#include <stddef.h>
> +#include <stdbool.h>
> +#include <linux/bpf.h>
> +
> +#include <linux/in.h>
> +#include <linux/if_ether.h>
> +#include <linux/ip.h>
> +#include <linux/ipv6.h>
> +
> +#include <linux/udp.h>
> +#include <linux/tcp.h>
> +
> +#include <bpf/bpf_helpers.h>
> +#include <linux/virtio_net.h>
> +
> +/*
> + * Prepare:
> + * Requires llvm, clang, python3 with pyelftools, linux kernel tree
> + *
> + * Build tun_rss_steering.h:
> + * make -f Makefile.ebpf clean all
> + */
> +
> +#define INDIRECTION_TABLE_SIZE 128
> +#define HASH_CALCULATION_BUFFER_SIZE 36
> +
> +struct rss_config_t {
> +    __u8 redirect;
> +    __u8 populate_hash;
> +    __u32 hash_types;
> +    __u16 indirections_len;
> +    __u16 default_queue;
> +};
> +
> +struct toeplitz_key_data_t {
> +    __u32 leftmost_32_bits;
> +    __u8 next_byte[HASH_CALCULATION_BUFFER_SIZE];
> +};
> +
> +struct packet_hash_info_t {
> +    __u8 is_ipv4;
> +    __u8 is_ipv6;
> +    __u8 is_udp;
> +    __u8 is_tcp;
> +    __u8 is_ipv6_ext_src;
> +    __u8 is_ipv6_ext_dst;
> +
> +    __u16 src_port;
> +    __u16 dst_port;
> +
> +    union {
> +        struct {
> +            __be32 in_src;
> +            __be32 in_dst;
> +        };
> +
> +        struct {
> +            struct in6_addr in6_src;
> +            struct in6_addr in6_dst;
> +            struct in6_addr in6_ext_src;
> +            struct in6_addr in6_ext_dst;
> +        };
> +    };
> +};
> +
> +struct {
> +    __uint(type, BPF_MAP_TYPE_ARRAY);
> +    __type(key, __u32);
> +    __type(value, struct rss_config_t);
> +    __uint(max_entries, 1);
> +} tap_rss_map_configurations SEC(".maps");
> +
> +struct {
> +    __uint(type, BPF_MAP_TYPE_ARRAY);
> +    __type(key, __u32);
> +    __type(value, struct toeplitz_key_data_t);
> +    __uint(max_entries, 1);
> +} tap_rss_map_toeplitz_key SEC(".maps");
> +
> +struct {
> +    __uint(type, BPF_MAP_TYPE_ARRAY);
> +    __type(key, __u32);
> +    __type(value, __u16);
> +    __uint(max_entries, INDIRECTION_TABLE_SIZE);
> +} tap_rss_map_indirection_table SEC(".maps");
> +
> +
> +static inline void net_rx_rss_add_chunk(__u8 *rss_input, size_t *bytes_written,
> +                                        const void *ptr, size_t size) {
> +    __builtin_memcpy(&rss_input[*bytes_written], ptr, size);
> +    *bytes_written += size;
> +}
> +
> +static inline
> +void net_toeplitz_add(__u32 *result,
> +                      __u8 *input,
> +                      __u32 len
> +        , struct toeplitz_key_data_t *key) {
> +
> +    __u32 accumulator = *result;
> +    __u32 leftmost_32_bits = key->leftmost_32_bits;
> +    __u32 byte;
> +
> +    for (byte = 0; byte < HASH_CALCULATION_BUFFER_SIZE; byte++) {
> +        __u8 input_byte = input[byte];
> +        __u8 key_byte = key->next_byte[byte];
> +        __u8 bit;
> +
> +        for (bit = 0; bit < 8; bit++) {
> +            if (input_byte & (1 << 7)) {
> +                accumulator ^= leftmost_32_bits;
> +            }
> +
> +            leftmost_32_bits =
> +                    (leftmost_32_bits << 1) | ((key_byte & (1 << 7)) >> 7);
> +
> +            input_byte <<= 1;
> +            key_byte <<= 1;
> +        }
> +    }
> +
> +    *result = accumulator;
> +}
> +
> +
> +static inline int ip6_extension_header_type(__u8 hdr_type)
> +{
> +    switch (hdr_type) {
> +    case IPPROTO_HOPOPTS:
> +    case IPPROTO_ROUTING:
> +    case IPPROTO_FRAGMENT:
> +    case IPPROTO_ICMPV6:
> +    case IPPROTO_NONE:
> +    case IPPROTO_DSTOPTS:
> +    case IPPROTO_MH:
> +        return 1;
> +    default:
> +        return 0;
> +    }
> +}
> +/*
> + * According to https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml
> + * we suspect there would be no more than 11 extensions in an IPv6 header;
> + * also, there are 27 TLV options for the Destination and Hop-by-hop
> + * extensions. We need to choose a reasonable maximum number of
> + * extensions/options to check when looking for ext src/dst.
> + */
> +#define IP6_EXTENSIONS_COUNT 11
> +#define IP6_OPTIONS_COUNT 30
> +
> +static inline void parse_ipv6_ext(struct __sk_buff *skb,
> +        struct packet_hash_info_t *info,
> +        __u8 *l4_protocol, size_t *l4_offset)
> +{
> +    if (!ip6_extension_header_type(*l4_protocol)) {
> +        return;
> +    }
> +
> +    struct ipv6_opt_hdr ext_hdr = {};
> +
> +    for (unsigned int i = 0; i < IP6_EXTENSIONS_COUNT; ++i) {
> +
> +        bpf_skb_load_bytes_relative(skb, *l4_offset, &ext_hdr,
> +                                    sizeof(ext_hdr), BPF_HDR_START_NET);
> +
> +        if (*l4_protocol == IPPROTO_ROUTING) {
> +            struct ipv6_rt_hdr ext_rt = {};
> +
> +            bpf_skb_load_bytes_relative(skb, *l4_offset, &ext_rt,
> +                                        sizeof(ext_rt), BPF_HDR_START_NET);
> +
> +            if ((ext_rt.type == IPV6_SRCRT_TYPE_2) &&
> +                    (ext_rt.hdrlen == sizeof(struct in6_addr) / 8) &&
> +                    (ext_rt.segments_left == 1)) {
> +
> +                bpf_skb_load_bytes_relative(skb,
> +                    *l4_offset + offsetof(struct rt2_hdr, addr),
> +                    &info->in6_ext_dst, sizeof(info->in6_ext_dst),
> +                    BPF_HDR_START_NET);
> +
> +                info->is_ipv6_ext_dst = 1;
> +            }
> +
> +        } else if (*l4_protocol == IPPROTO_DSTOPTS) {
> +            struct ipv6_opt_t {
> +                __u8 type;
> +                __u8 length;
> +            } __attribute__((packed)) opt = {};
> +
> +            size_t opt_offset = sizeof(ext_hdr);
> +
> +            for (unsigned int j = 0; j < IP6_OPTIONS_COUNT; ++j) {
> +                bpf_skb_load_bytes_relative(skb, *l4_offset + opt_offset,
> +                                        &opt, sizeof(opt), BPF_HDR_START_NET);
> +
> +                opt_offset += (opt.type == IPV6_TLV_PAD1) ?
> +                        1 : opt.length + sizeof(opt);
> +
> +                if (opt_offset + 1 >= ext_hdr.hdrlen * 8) {
> +                    break;
> +                }
> +
> +                if (opt.type == IPV6_TLV_HAO) {
> +                    bpf_skb_load_bytes_relative(skb,
> +                        *l4_offset + opt_offset + offsetof(struct ipv6_destopt_hao, addr),
> +                        &info->in6_ext_src, sizeof(info->in6_ext_src),
> +                        BPF_HDR_START_NET);
> +
> +                    info->is_ipv6_ext_src = 1;
> +                    break;
> +                }
> +            }
> +        }
> +
> +        *l4_protocol = ext_hdr.nexthdr;
> +        *l4_offset += (ext_hdr.hdrlen + 1) * 8;
> +
> +        if (!ip6_extension_header_type(ext_hdr.nexthdr)) {
> +            return;
> +        }
> +    }
> +}
> +
> +static inline void parse_packet(struct __sk_buff *skb,
> +        struct packet_hash_info_t *info)
> +{
> +    if (!info || !skb) {
> +        return;
> +    }
> +
> +    size_t l4_offset = 0;
> +    __u8 l4_protocol = 0;
> +    __u16 l3_protocol = __be16_to_cpu(skb->protocol);
> +
> +    if (l3_protocol == ETH_P_IP) {
> +        info->is_ipv4 = 1;
> +
> +        struct iphdr ip = {};
> +        bpf_skb_load_bytes_relative(skb, 0, &ip, sizeof(ip),
> +                                    BPF_HDR_START_NET);
> +
> +        info->in_src = ip.saddr;
> +        info->in_dst = ip.daddr;
> +
> +        l4_protocol = ip.protocol;
> +        l4_offset = ip.ihl * 4;
> +    } else if (l3_protocol == ETH_P_IPV6) {
> +        info->is_ipv6 = 1;
> +
> +        struct ipv6hdr ip6 = {};
> +        bpf_skb_load_bytes_relative(skb, 0, &ip6, sizeof(ip6),
> +                                    BPF_HDR_START_NET);
> +
> +        info->in6_src = ip6.saddr;
> +        info->in6_dst = ip6.daddr;
> +
> +        l4_protocol = ip6.nexthdr;
> +        l4_offset = sizeof(ip6);
> +
> +        parse_ipv6_ext(skb, info, &l4_protocol, &l4_offset);
> +    }
> +
> +    if (l4_protocol != 0) {
> +        if (l4_protocol == IPPROTO_TCP) {
> +            info->is_tcp = 1;
> +
> +            struct tcphdr tcp = {};
> +            bpf_skb_load_bytes_relative(skb, l4_offset, &tcp, sizeof(tcp),
> +                                        BPF_HDR_START_NET);
> +
> +            info->src_port = tcp.source;
> +            info->dst_port = tcp.dest;
> +        } else if (l4_protocol == IPPROTO_UDP) { /* TODO: add udplite? */
> +            info->is_udp = 1;
> +
> +            struct udphdr udp = {};
> +            bpf_skb_load_bytes_relative(skb, l4_offset, &udp, sizeof(udp),
> +                                        BPF_HDR_START_NET);
> +
> +            info->src_port = udp.source;
> +            info->dst_port = udp.dest;
> +        }
> +    }
> +}
> +
> +static inline __u32 calculate_rss_hash(struct __sk_buff *skb,
> +        struct rss_config_t *config, struct toeplitz_key_data_t *toe)
> +{
> +    __u8 rss_input[HASH_CALCULATION_BUFFER_SIZE] = {};
> +    size_t bytes_written = 0;
> +    __u32 result = 0;
> +    struct packet_hash_info_t packet_info = {};
> +
> +    parse_packet(skb, &packet_info);
> +
> +    if (packet_info.is_ipv4) {
> +        if (packet_info.is_tcp &&
> +            config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCPv4) {
> +
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.in_src,
> +                                 sizeof(packet_info.in_src));
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.in_dst,
> +                                 sizeof(packet_info.in_dst));
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.src_port,
> +                                 sizeof(packet_info.src_port));
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.dst_port,
> +                                 sizeof(packet_info.dst_port));
> +        } else if (packet_info.is_udp &&
> +                   config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDPv4) {
> +
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.in_src,
> +                                 sizeof(packet_info.in_src));
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.in_dst,
> +                                 sizeof(packet_info.in_dst));
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.src_port,
> +                                 sizeof(packet_info.src_port));
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.dst_port,
> +                                 sizeof(packet_info.dst_port));
> +        } else if (config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IPv4) {
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.in_src,
> +                                 sizeof(packet_info.in_src));
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.in_dst,
> +                                 sizeof(packet_info.in_dst));
> +        }
> +    } else if (packet_info.is_ipv6) {
> +        if (packet_info.is_tcp &&
> +            config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCPv6) {
> +
> +            if (packet_info.is_ipv6_ext_src &&
> +                config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCP_EX) {
> +
> +                net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                     &packet_info.in6_ext_src,
> +                                     sizeof(packet_info.in6_ext_src));
> +            } else {
> +                net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                     &packet_info.in6_src,
> +                                     sizeof(packet_info.in6_src));
> +            }
> +            if (packet_info.is_ipv6_ext_dst &&
> +                config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCP_EX) {
> +
> +                net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                     &packet_info.in6_ext_dst,
> +                                     sizeof(packet_info.in6_ext_dst));
> +            } else {
> +                net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                     &packet_info.in6_dst,
> +                                     sizeof(packet_info.in6_dst));
> +            }
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.src_port,
> +                                 sizeof(packet_info.src_port));
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.dst_port,
> +                                 sizeof(packet_info.dst_port));
> +        } else if (packet_info.is_udp &&
> +                   config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDPv6) {
> +
> +            if (packet_info.is_ipv6_ext_src &&
> +               config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDP_EX) {
> +
> +                net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                     &packet_info.in6_ext_src,
> +                                     sizeof(packet_info.in6_ext_src));
> +            } else {
> +                net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                     &packet_info.in6_src,
> +                                     sizeof(packet_info.in6_src));
> +            }
> +            if (packet_info.is_ipv6_ext_dst &&
> +               config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDP_EX) {
> +
> +                net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                     &packet_info.in6_ext_dst,
> +                                     sizeof(packet_info.in6_ext_dst));
> +            } else {
> +                net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                     &packet_info.in6_dst,
> +                                     sizeof(packet_info.in6_dst));
> +            }
> +
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.src_port,
> +                                 sizeof(packet_info.src_port));
> +            net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                 &packet_info.dst_port,
> +                                 sizeof(packet_info.dst_port));
> +
> +        } else if (config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IPv6) {
> +            if (packet_info.is_ipv6_ext_src &&
> +               config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IP_EX) {
> +
> +                net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                     &packet_info.in6_ext_src,
> +                                     sizeof(packet_info.in6_ext_src));
> +            } else {
> +                net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                     &packet_info.in6_src,
> +                                     sizeof(packet_info.in6_src));
> +            }
> +            if (packet_info.is_ipv6_ext_dst &&
> +                config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IP_EX) {
> +
> +                net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                     &packet_info.in6_ext_dst,
> +                                     sizeof(packet_info.in6_ext_dst));
> +            } else {
> +                net_rx_rss_add_chunk(rss_input, &bytes_written,
> +                                     &packet_info.in6_dst,
> +                                     sizeof(packet_info.in6_dst));
> +            }
> +        }
> +    }
> +
> +    if (bytes_written) {
> +        net_toeplitz_add(&result, rss_input, bytes_written, toe);
> +    }
> +
> +    return result;
> +}
> +
> +SEC("tun_rss_steering")
> +int tun_rss_steering_prog(struct __sk_buff *skb)
> +{
> +
> +    struct rss_config_t *config = 0;
> +    struct toeplitz_key_data_t *toe = 0;
> +
> +    __u32 key = 0;
> +    __u32 hash = 0;
> +
> +    config = bpf_map_lookup_elem(&tap_rss_map_configurations, &key);
> +    toe = bpf_map_lookup_elem(&tap_rss_map_toeplitz_key, &key);
> +
> +    if (config && toe) {
> +        if (!config->redirect) {
> +            return config->default_queue;
> +        }
> +
> +        hash = calculate_rss_hash(skb, config, toe);
> +        if (hash) {
> +            __u32 table_idx = hash % config->indirections_len;
> +            __u16 *queue = 0;
> +
> +            queue = bpf_map_lookup_elem(&tap_rss_map_indirection_table,
> +                                        &table_idx);
> +
> +            if (queue) {
> +                return *queue;
> +            }
> +        }
> +
> +        return config->default_queue;
> +    }
> +
> +    return -1;
> +}
> +
> +char _license[] SEC("license") = "GPL";

This doesn't specify any GPL version


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-03 10:32   ` Yuri Benditovich
  2020-11-03 11:56     ` Daniel P. Berrangé
@ 2020-11-04  2:07     ` Jason Wang
  2020-11-04  9:31       ` Daniel P. Berrangé
  2020-11-04 11:49       ` Yuri Benditovich
  1 sibling, 2 replies; 36+ messages in thread
From: Jason Wang @ 2020-11-04  2:07 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Yan Vugenfirer, Andrew Melnychenko, qemu-devel, Michael S . Tsirkin


On 2020/11/3 6:32 PM, Yuri Benditovich wrote:
>
>
> On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com 
> <mailto:jasowang@redhat.com>> wrote:
>
>
>     On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
>     > Basic idea is to use eBPF to calculate and steer packets in TAP.
>     > RSS(Receive Side Scaling) is used to distribute network packets
>     to guest virtqueues
>     > by calculating packet hash.
>     > eBPF RSS allows us to use RSS with vhost TAP.
>     >
>     > This set of patches introduces the usage of eBPF for packet steering
>     > and RSS hash calculation:
>     > * RSS(Receive Side Scaling) is used to distribute network packets to
>     > guest virtqueues by calculating packet hash
>     > * eBPF RSS suppose to be faster than already existing 'software'
>     > implementation in QEMU
>     > * Additionally adding support for the usage of RSS with vhost
>     >
>     > Supported kernels: 5.8+
>     >
>     > Implementation notes:
>     > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
>     > Added eBPF support to qemu directly through a system call, see the
>     > bpf(2) for details.
>     > The eBPF program is part of the qemu and presented as an array
>     of bpf
>     > instructions.
>     > The program can be recompiled by provided Makefile.ebpf(need to
>     adjust
>     > 'linuxhdrs'),
>     > although it's not required to build QEMU with eBPF support.
>     > Added changes to virtio-net and vhost, primary eBPF RSS is used.
>     > 'Software' RSS used in the case of hash population and as a
>     fallback option.
>     > For vhost, the hash population feature is not reported to the guest.
>     >
>     > Please also see the documentation in PATCH 6/6.
>     >
>     > I am sending those patches as RFC to initiate the discussions
>     and get
>     > feedback on the following points:
>     > * Fallback when eBPF is not supported by the kernel
>
>
>     Yes, and it could also be a lack of CAP_BPF.
>
>
>     > * Live migration to the kernel that doesn't have eBPF support
>
>
>     Is there anything that needs special treatment here?
>
> Possible case: rss=on, vhost=on, source system with kernel 5.8 
> (everything works) -> dest. system 5.6 (bpf does not work), the 
> adapter functions, but all the steering does not use proper queues.


Right, I think we need to disable vhost on dest.


>
>
>
>     > * Integration with current QEMU build
>
>
>     Yes, a question here:
>
>     1) Any reason for not using libbpf, e.g it has been shipped with some
>     distros
>
>
> We intentionally do not use libbpf, as it is present only on some distros.
> We can switch to libbpf, but this will disable bpf if libbpf is not 
> installed


That's better I think.


>     2) It would be better if we can avoid shipping bytecodes
>
>
>
> This creates new dependencies: llvm + clang + ...
> We would prefer bytecode and the ability to generate it if the
> prerequisites are installed.


It's probably ok if we treat the bytecode as a kind of firmware.

But in the long run it's still worth considering, since the QEMU source is
used for development and llvm/clang should be a common requirement for
generating eBPF bytecode for the host.


>
>
>     > * Additional usage for eBPF for packet filtering
>
>
>     Another interesting topic is to implement mac/vlan filters. And
>     in the
>     future, I plan to add mac based steering. All of these could be
>     done via
>     eBPF.
>
>
> No problem, we can cooperate if needed
>
>
>     >
>     > Know issues:
>     > * hash population not supported by eBPF RSS: 'software' RSS used
>
>
>     Is this because there's no way to write to the vnet header in
>     STEERING BPF?
>
> Yes. We plan to submit kernel changes to cooperate with BPF and
> populate the hash; this work is in progress.


That would require a new type of eBPF program and may need some work on
the verifier.

Btw, macvtap still lacks even a steering eBPF program. Would you like
to post a patch to support that?


>
>     > as a fallback, also, hash population feature is not reported to
>     guests
>     > with vhost.
>     > * big-endian BPF support: for now, eBPF is disabled for
>     big-endian systems.
>
>
>     Are there any blocker for this?
>
>
> No, can be added in v2


Cool.

Thanks


>
>     Just some quick questions after a glance of the codes. Will go
>     through
>     them tomorrow.
>
>     Thanks
>
>
>     >
>     > Andrew (6):
>     >    Added SetSteeringEBPF method for NetClientState.
>     >    ebpf: Added basic eBPF API.
>     >    ebpf: Added eBPF RSS program.
>     >    ebpf: Added eBPF RSS loader.
>     >    virtio-net: Added eBPF RSS to virtio-net.
>     >    docs: Added eBPF documentation.
>     >
>     >   MAINTAINERS                    |   6 +
>     >   configure                      |  36 +++
>     >   docs/ebpf.rst                  |  29 ++
>     >   docs/ebpf_rss.rst              | 129 ++++++++
>     >   ebpf/EbpfElf_to_C.py           |  67 ++++
>     >   ebpf/Makefile.ebpf             |  38 +++
>     >   ebpf/ebpf-stub.c               |  28 ++
>     >   ebpf/ebpf.c                    | 107 +++++++
>     >   ebpf/ebpf.h                    |  35 +++
>     >   ebpf/ebpf_rss.c                | 178 +++++++++++
>     >   ebpf/ebpf_rss.h                |  30 ++
>     >   ebpf/meson.build               |   1 +
>     >   ebpf/rss.bpf.c                 | 470 ++++++++++++++++++++++++++++
>     >   ebpf/trace-events              |   4 +
>     >   ebpf/trace.h                   |   2 +
>     >   ebpf/tun_rss_steering.h        | 556
>     +++++++++++++++++++++++++++++++++
>     >   hw/net/vhost_net.c             |   2 +
>     >   hw/net/virtio-net.c            | 120 ++++++-
>     >   include/hw/virtio/virtio-net.h |   4 +
>     >   include/net/net.h              |   2 +
>     >   meson.build                    |   3 +
>     >   net/tap-bsd.c                  |   5 +
>     >   net/tap-linux.c                |  19 ++
>     >   net/tap-solaris.c              |   5 +
>     >   net/tap-stub.c                 |   5 +
>     >   net/tap.c                      |   9 +
>     >   net/tap_int.h                  |   1 +
>     >   net/vhost-vdpa.c               |   2 +
>     >   28 files changed, 1889 insertions(+), 4 deletions(-)
>     >   create mode 100644 docs/ebpf.rst
>     >   create mode 100644 docs/ebpf_rss.rst
>     >   create mode 100644 ebpf/EbpfElf_to_C.py
>     >   create mode 100755 ebpf/Makefile.ebpf
>     >   create mode 100644 ebpf/ebpf-stub.c
>     >   create mode 100644 ebpf/ebpf.c
>     >   create mode 100644 ebpf/ebpf.h
>     >   create mode 100644 ebpf/ebpf_rss.c
>     >   create mode 100644 ebpf/ebpf_rss.h
>     >   create mode 100644 ebpf/meson.build
>     >   create mode 100644 ebpf/rss.bpf.c
>     >   create mode 100644 ebpf/trace-events
>     >   create mode 100644 ebpf/trace.h
>     >   create mode 100644 ebpf/tun_rss_steering.h
>     >
>




* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-03 11:56     ` Daniel P. Berrangé
@ 2020-11-04  2:15       ` Jason Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2020-11-04  2:15 UTC (permalink / raw)
  To: Daniel P. Berrangé, Yuri Benditovich
  Cc: Yan Vugenfirer, Andrew Melnychenko, qemu-devel, Michael S . Tsirkin


On 2020/11/3 7:56 PM, Daniel P. Berrangé wrote:
> On Tue, Nov 03, 2020 at 12:32:43PM +0200, Yuri Benditovich wrote:
>> On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>>> On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
>>>> Basic idea is to use eBPF to calculate and steer packets in TAP.
>>>> RSS(Receive Side Scaling) is used to distribute network packets to guest
>>> virtqueues
>>>> by calculating packet hash.
>>>> eBPF RSS allows us to use RSS with vhost TAP.
>>>>
>>>> This set of patches introduces the usage of eBPF for packet steering
>>>> and RSS hash calculation:
>>>> * RSS(Receive Side Scaling) is used to distribute network packets to
>>>> guest virtqueues by calculating packet hash
>>>> * eBPF RSS suppose to be faster than already existing 'software'
>>>> implementation in QEMU
>>>> * Additionally adding support for the usage of RSS with vhost
>>>>
>>>> Supported kernels: 5.8+
>>>>
>>>> Implementation notes:
>>>> Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
>>>> Added eBPF support to qemu directly through a system call, see the
>>>> bpf(2) for details.
>>>> The eBPF program is part of the qemu and presented as an array of bpf
>>>> instructions.
>>>> The program can be recompiled by provided Makefile.ebpf(need to adjust
>>>> 'linuxhdrs'),
>>>> although it's not required to build QEMU with eBPF support.
>>>> Added changes to virtio-net and vhost, primary eBPF RSS is used.
>>>> 'Software' RSS used in the case of hash population and as a fallback
>>> option.
>>>> For vhost, the hash population feature is not reported to the guest.
>>>>
>>>> Please also see the documentation in PATCH 6/6.
>>>>
>>>> I am sending those patches as RFC to initiate the discussions and get
>>>> feedback on the following points:
>>>> * Fallback when eBPF is not supported by the kernel
>>>
>>> Yes, and it could also be a lack of CAP_BPF.
>>>
>>>
>>>> * Live migration to the kernel that doesn't have eBPF support
>>>
>>> Is there anything that needs special treatment here?
>>>
>>> Possible case: rss=on, vhost=on, source system with kernel 5.8 (everything
>> works) -> dest. system 5.6 (bpf does not work), the adapter functions, but
>> all the steering does not use proper queues.
>>
>>
>>
>>
>>>> * Integration with current QEMU build
>>>
>>> Yes, a question here:
>>>
>>> 1) Any reason for not using libbpf, e.g it has been shipped with some
>>> distros
>>>
>> We intentionally do not use libbpf, as it is present only on some distros.
>> We can switch to libbpf, but this will disable bpf if libbpf is not
>> installed
> If we were modifying existing functionality then introducing a dep on
> libbpf would be a problem as you'd be breaking existing QEMU users
> on distros without libbpf.
>
> This is brand new functionality though, so it is fine to place a
> requirement on libbpf. If distros don't ship that library and they
> want BPF features in QEMU, then those distros should take responsibility
> for adding libbpf to their package set.
>
>>> 2) It would be better if we can avoid shipping bytecodes
>>>
>>
>> This creates new dependencies: llvm + clang + ...
>> We would prefer byte code and ability to generate it if prerequisites are
>> installed.
> I've double checked with Fedora, and generating the BPF program from
> source is a mandatory requirement for QEMU. Pre-generated BPF bytecode
> is not permitted.
>
> There was also a question raised about the kernel ABI compatibility
> for BPF programs ?
>
>    https://lwn.net/Articles/831402/
>
>    "The basic problem is that when BPF is compiled, it uses a set
>     of kernel headers that describe various kernel data structures
>     for that particular version, which may be different from those
>     on the kernel where the program is run. Until relatively recently,
>     that was solved by distributing the BPF as C code along with the
>     Clang compiler to build the BPF on the system where it was going
>     to be run."
>
> Is this not an issue for QEMU's usage of BPF here ?


That's a good point. Actually, DPDK ships RSS bytecode, but I don't know
how it works.

But as mentioned in the link, if we generate the code with BTF that 
would be fine.

Thanks


>
> The dependency on llvm is unfortunate for people who build with GCC,
> but at least they can opt-out via a configure switch if they really
> want to. As that LWN article notes, GCC will gain BPF support
>
>
> Regards,
> Daniel
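On the BTF point: with clang, emitting BTF is just a matter of passing -g when targeting bpf. A fragment in the spirit of the series' Makefile.ebpf (flags and paths are illustrative assumptions, not the actual build rules):

```make
# Illustrative fragment; the real Makefile.ebpf also needs the
# 'linuxhdrs' include paths mentioned in the cover letter.
CLANG ?= clang

rss.bpf.o: rss.bpf.c
	$(CLANG) -O2 -g -target bpf -c $< -o $@  # -g also embeds a .BTF section
```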




* Re: [RFC PATCH 1/6] net: Added SetSteeringEBPF method for NetClientState.
  2020-11-02 18:51 ` [RFC PATCH 1/6] net: Added SetSteeringEBPF method for NetClientState Andrew Melnychenko
@ 2020-11-04  2:49   ` Jason Wang
  2020-11-04  9:34     ` Yuri Benditovich
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2020-11-04  2:49 UTC (permalink / raw)
  To: Andrew Melnychenko, mst; +Cc: yan, yuri.benditovich, qemu-devel


On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
> From: Andrew <andrew@daynix.com>
>
> For now, that method is supported only by Linux TAP.
> Linux TAP uses the TUNSETSTEERINGEBPF ioctl.
> TUNSETSTEERINGEBPF was added 3 years ago.
> QEMU checks that it is defined before using it.
>
> Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> ---
>   include/net/net.h |  2 ++
>   net/tap-bsd.c     |  5 +++++
>   net/tap-linux.c   | 19 +++++++++++++++++++
>   net/tap-solaris.c |  5 +++++
>   net/tap-stub.c    |  5 +++++
>   net/tap.c         |  9 +++++++++
>   net/tap_int.h     |  1 +
>   7 files changed, 46 insertions(+)
>
> diff --git a/include/net/net.h b/include/net/net.h
> index 897b2d7595..d8a41fb010 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -60,6 +60,7 @@ typedef int (SetVnetBE)(NetClientState *, bool);
>   typedef struct SocketReadState SocketReadState;
>   typedef void (SocketReadStateFinalize)(SocketReadState *rs);
>   typedef void (NetAnnounce)(NetClientState *);
> +typedef bool (SetSteeringEBPF)(NetClientState *, int);
>   
>   typedef struct NetClientInfo {
>       NetClientDriver type;
> @@ -81,6 +82,7 @@ typedef struct NetClientInfo {
>       SetVnetLE *set_vnet_le;
>       SetVnetBE *set_vnet_be;
>       NetAnnounce *announce;
> +    SetSteeringEBPF *set_steering_ebpf;
>   } NetClientInfo;
>   
>   struct NetClientState {
> diff --git a/net/tap-bsd.c b/net/tap-bsd.c
> index 77aaf674b1..4f64f31e98 100644
> --- a/net/tap-bsd.c
> +++ b/net/tap-bsd.c
> @@ -259,3 +259,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
>   {
>       return -1;
>   }
> +
> +int tap_fd_set_steering_ebpf(int fd, int prog_fd)
> +{
> +    return -1;
> +}
> diff --git a/net/tap-linux.c b/net/tap-linux.c
> index b0635e9e32..196373019f 100644
> --- a/net/tap-linux.c
> +++ b/net/tap-linux.c
> @@ -31,6 +31,7 @@
>   
>   #include <net/if.h>
>   #include <sys/ioctl.h>
> +#include <linux/if_tun.h> /* TUNSETSTEERINGEBPF */
>   
>   #include "qapi/error.h"
>   #include "qemu/error-report.h"
> @@ -316,3 +317,21 @@ int tap_fd_get_ifname(int fd, char *ifname)
>       pstrcpy(ifname, sizeof(ifr.ifr_name), ifr.ifr_name);
>       return 0;
>   }
> +
> +int tap_fd_set_steering_ebpf(int fd, int prog_fd)
> +{
> +#ifdef TUNSETSTEERINGEBPF


I'm not sure how much this can help.

But looking at tap-linux.h, I wonder whether we need to pull in the
TUN/TAP uapi headers.

Thanks


> +    if (ioctl(fd, TUNSETSTEERINGEBPF, (void *) &prog_fd) != 0) {
> +        error_report("Issue while setting TUNSETSTEERINGEBPF:"
> +                    " %s with fd: %d, prog_fd: %d",
> +                    strerror(errno), fd, prog_fd);
> +
> +       return -1;
> +    }
> +
> +    return 0;
> +#else
> +    error_report("TUNSETSTEERINGEBPF is not supported");
> +    return -1;
> +#endif
> +}
> diff --git a/net/tap-solaris.c b/net/tap-solaris.c
> index 0475a58207..d85224242b 100644
> --- a/net/tap-solaris.c
> +++ b/net/tap-solaris.c
> @@ -255,3 +255,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
>   {
>       return -1;
>   }
> +
> +int tap_fd_set_steering_ebpf(int fd, int prog_fd)
> +{
> +    return -1;
> +}
> diff --git a/net/tap-stub.c b/net/tap-stub.c
> index de525a2e69..a0fa25804b 100644
> --- a/net/tap-stub.c
> +++ b/net/tap-stub.c
> @@ -85,3 +85,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
>   {
>       return -1;
>   }
> +
> +int tap_fd_set_steering_ebpf(int fd, int prog_fd)
> +{
> +    return -1;
> +}
> diff --git a/net/tap.c b/net/tap.c
> index c46ff66184..81f50017bd 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -337,6 +337,14 @@ static void tap_poll(NetClientState *nc, bool enable)
>       tap_write_poll(s, enable);
>   }
>   
> +static bool tap_set_steering_ebpf(NetClientState *nc, int prog_fd)
> +{
> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
> +    assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
> +
> +    return tap_fd_set_steering_ebpf(s->fd, prog_fd) == 0;
> +}
> +
>   int tap_get_fd(NetClientState *nc)
>   {
>       TAPState *s = DO_UPCAST(TAPState, nc, nc);
> @@ -362,6 +370,7 @@ static NetClientInfo net_tap_info = {
>       .set_vnet_hdr_len = tap_set_vnet_hdr_len,
>       .set_vnet_le = tap_set_vnet_le,
>       .set_vnet_be = tap_set_vnet_be,
> +    .set_steering_ebpf = tap_set_steering_ebpf,
>   };
>   
>   static TAPState *net_tap_fd_init(NetClientState *peer,
> diff --git a/net/tap_int.h b/net/tap_int.h
> index 225a49ea48..547f8a5a28 100644
> --- a/net/tap_int.h
> +++ b/net/tap_int.h
> @@ -44,5 +44,6 @@ int tap_fd_set_vnet_be(int fd, int vnet_is_be);
>   int tap_fd_enable(int fd);
>   int tap_fd_disable(int fd);
>   int tap_fd_get_ifname(int fd, char *ifname);
> +int tap_fd_set_steering_ebpf(int fd, int prog_fd);
>   
>   #endif /* NET_TAP_INT_H */




* Re: [RFC PATCH 5/6] virtio-net: Added eBPF RSS to virtio-net.
  2020-11-02 18:51 ` [RFC PATCH 5/6] virtio-net: Added eBPF RSS to virtio-net Andrew Melnychenko
@ 2020-11-04  3:09   ` Jason Wang
  2020-11-04 11:07     ` Yuri Benditovich
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2020-11-04  3:09 UTC (permalink / raw)
  To: Andrew Melnychenko, mst; +Cc: yan, yuri.benditovich, qemu-devel


On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
> From: Andrew <andrew@daynix.com>
>
> When RSS is enabled the device tries to load the eBPF program
> to select RX virtqueue in the TUN. If eBPF can be loaded
> the RSS will function also with vhost (works with kernel 5.8 and later).
> Software RSS is used as a fallback with vhost=off when eBPF can't be loaded
> or when hash population is requested by the guest.
>
> Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> ---
>   hw/net/vhost_net.c             |   2 +
>   hw/net/virtio-net.c            | 120 +++++++++++++++++++++++++++++++--
>   include/hw/virtio/virtio-net.h |   4 ++
>   net/vhost-vdpa.c               |   2 +
>   4 files changed, 124 insertions(+), 4 deletions(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 24d555e764..16124f99c3 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -71,6 +71,8 @@ static const int user_feature_bits[] = {
>       VIRTIO_NET_F_MTU,
>       VIRTIO_F_IOMMU_PLATFORM,
>       VIRTIO_F_RING_PACKED,
> +    VIRTIO_NET_F_RSS,
> +    VIRTIO_NET_F_HASH_REPORT,
>   
>       /* This bit implies RARP isn't sent by QEMU out of band */
>       VIRTIO_NET_F_GUEST_ANNOUNCE,
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 277289d56e..afcc3032ec 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -698,6 +698,19 @@ static void virtio_net_set_queues(VirtIONet *n)
>   
>   static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
>   
> +static uint64_t fix_ebpf_vhost_features(uint64_t features)
> +{
> +    /* If vhost=on & CONFIG_EBPF doesn't set - disable RSS feature */
> +    uint64_t ret = features;
> +#ifndef CONFIG_EBPF
> +    virtio_clear_feature(&ret, VIRTIO_NET_F_RSS);
> +#endif
> +    /* for now, there is no solution for populating the hash from eBPF */
> +    virtio_clear_feature(&ret, VIRTIO_NET_F_HASH_REPORT);


I think we probably need to do it the other way around: since RSS is under
the control of the QEMU command line, disabling features like this may
break migration.

We need to disable vhost instead when:

1) eBPF is not supported but RSS is required from command line

or

2) HASH_REPORT is required from command line


> +
> +    return ret;
> +}
> +
>   static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
>                                           Error **errp)
>   {
> @@ -732,9 +745,9 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
>           return features;
>       }
>   
> -    virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
> -    virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
> -    features = vhost_net_get_features(get_vhost_net(nc->peer), features);
> +    features = fix_ebpf_vhost_features(
> +            vhost_net_get_features(get_vhost_net(nc->peer), features));
> +
>       vdev->backend_features = features;
>   
>       if (n->mtu_bypass_backend &&
> @@ -1169,12 +1182,75 @@ static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
>       }
>   }
>   
> +static void virtio_net_unload_epbf_rss(VirtIONet *n);
> +
>   static void virtio_net_disable_rss(VirtIONet *n)
>   {
>       if (n->rss_data.enabled) {
>           trace_virtio_net_rss_disable();
>       }
>       n->rss_data.enabled = false;
> +
> +    if (!n->rss_data.enabled_software_rss && ebpf_rss_is_loaded(&n->ebpf_rss)) {
> +        virtio_net_unload_epbf_rss(n);
> +    }
> +}
> +
> +static bool virtio_net_attach_steering_ebpf(NICState *nic, int prog_fd)
> +{
> +    NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
> +    if (nc == NULL || nc->info->set_steering_ebpf == NULL) {
> +        return false;
> +    }
> +
> +    return nc->info->set_steering_ebpf(nc, prog_fd);
> +}
> +
> +static void rss_data_to_rss_config(struct VirtioNetRssData *data,
> +                                   struct EBPFRSSConfig *config)
> +{
> +    config->redirect = data->redirect;
> +    config->populate_hash = data->populate_hash;
> +    config->hash_types = data->hash_types;
> +    config->indirections_len = data->indirections_len;
> +    config->default_queue = data->default_queue;
> +}
> +
> +static bool virtio_net_load_epbf_rss(VirtIONet *n)
> +{
> +    struct EBPFRSSConfig config = {};
> +
> +    if (!n->rss_data.enabled) {
> +        if (ebpf_rss_is_loaded(&n->ebpf_rss)) {
> +            ebpf_rss_unload(&n->ebpf_rss);
> +        }
> +        return true;
> +    }
> +
> +    if (!ebpf_rss_is_loaded(&n->ebpf_rss) && !ebpf_rss_load(&n->ebpf_rss)) {
> +        return false;
> +    }
> +
> +    rss_data_to_rss_config(&n->rss_data, &config);
> +
> +    if (!ebpf_rss_set_all(&n->ebpf_rss, &config,
> +                          n->rss_data.indirections_table, n->rss_data.key)) {
> +        ebpf_rss_unload(&n->ebpf_rss);
> +        return false;
> +    }
> +
> +    if (!virtio_net_attach_steering_ebpf(n->nic, n->ebpf_rss.program_fd)) {
> +        ebpf_rss_unload(&n->ebpf_rss);
> +        return false;
> +    }
> +
> +    return true;
> +}
> +
> +static void virtio_net_unload_epbf_rss(VirtIONet *n)
> +{
> +    virtio_net_attach_steering_ebpf(n->nic, -1);
> +    ebpf_rss_unload(&n->ebpf_rss);
>   }
>   
>   static uint16_t virtio_net_handle_rss(VirtIONet *n,
> @@ -1208,6 +1284,7 @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
>           err_value = (uint32_t)s;
>           goto error;
>       }
> +    n->rss_data.enabled_software_rss = false;
>       n->rss_data.hash_types = virtio_ldl_p(vdev, &cfg.hash_types);
>       n->rss_data.indirections_len =
>           virtio_lduw_p(vdev, &cfg.indirection_table_mask);
> @@ -1289,9 +1366,30 @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
>           goto error;
>       }
>       n->rss_data.enabled = true;
> +
> +    if (!n->rss_data.populate_hash) {
> +        /* load EBPF RSS */
> +        if (!virtio_net_load_epbf_rss(n)) {
> +            /* eBPF must be loaded for vhost */
> +            if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
> +                warn_report("Can't load eBPF RSS for vhost");
> +                goto error;
> +            }
> +            /* fallback to software RSS */
> +            warn_report("Can't load eBPF RSS - fallback to software RSS");
> +            n->rss_data.enabled_software_rss = true;
> +        }
> +    } else {
> +        /* use software RSS for hash populating */
> +        /* and unload eBPF if was loaded before */
> +        virtio_net_unload_epbf_rss(n);
> +        n->rss_data.enabled_software_rss = true;
> +    }
> +
>       trace_virtio_net_rss_enable(n->rss_data.hash_types,
>                                   n->rss_data.indirections_len,
>                                   temp.b);
> +
>       return queues;
>   error:
>       trace_virtio_net_rss_error(err_msg, err_value);
> @@ -1674,7 +1772,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
>           return -1;
>       }
>   
> -    if (!no_rss && n->rss_data.enabled) {
> +    if (!no_rss && n->rss_data.enabled && n->rss_data.enabled_software_rss) {
>           int index = virtio_net_process_rss(nc, buf, size);
>           if (index >= 0) {
>               NetClientState *nc2 = qemu_get_subqueue(n->nic, index);
> @@ -2780,6 +2878,18 @@ static int virtio_net_post_load_device(void *opaque, int version_id)
>       }
>   
>       if (n->rss_data.enabled) {
> +        n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
> +        if (!n->rss_data.populate_hash) {
> +            if (!virtio_net_load_epbf_rss(n)) {
> +                if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
> +                    error_report("Can't post-load eBPF RSS for vhost");
> +                } else {
> +                    warn_report("Can't post-load eBPF RSS - fallback to software RSS");
> +                    n->rss_data.enabled_software_rss = true;
> +                }
> +            }
> +        }
> +
>           trace_virtio_net_rss_enable(n->rss_data.hash_types,
>                                       n->rss_data.indirections_len,
>                                       sizeof(n->rss_data.key));
> @@ -3453,6 +3563,8 @@ static void virtio_net_instance_init(Object *obj)
>       device_add_bootindex_property(obj, &n->nic_conf.bootindex,
>                                     "bootindex", "/ethernet-phy@0",
>                                     DEVICE(n));
> +
> +    ebpf_rss_init(&n->ebpf_rss);
>   }
>   
>   static int virtio_net_pre_save(void *opaque)
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index f4852ac27b..4d29a577eb 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -21,6 +21,8 @@
>   #include "qemu/option_int.h"
>   #include "qom/object.h"
>   
> +#include "ebpf/ebpf_rss.h"
> +
>   #define TYPE_VIRTIO_NET "virtio-net-device"
>   OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET)
>   
> @@ -130,6 +132,7 @@ typedef struct VirtioNetRscChain {
>   
>   typedef struct VirtioNetRssData {
>       bool    enabled;
> +    bool    enabled_software_rss;


We probably need a better name for this, since "software" is kind of
confusing.


>       bool    redirect;
>       bool    populate_hash;
>       uint32_t hash_types;
> @@ -214,6 +217,7 @@ struct VirtIONet {
>       Notifier migration_state;
>       VirtioNetRssData rss_data;
>       struct NetRxPkt *rx_pkt;
> +    struct EBPFRSSContext ebpf_rss;
>   };
>   
>   void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 99c476db8c..feb5fa8624 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -54,6 +54,8 @@ const int vdpa_feature_bits[] = {
>       VIRTIO_NET_F_MTU,
>       VIRTIO_F_IOMMU_PLATFORM,
>       VIRTIO_F_RING_PACKED,
> +    VIRTIO_NET_F_RSS,
> +    VIRTIO_NET_F_HASH_REPORT,


This is fine but looks unrelated to eBPF RSS support.

Usually it means the hardware can support RSS/hash reporting.

Thanks


>       VIRTIO_NET_F_GUEST_ANNOUNCE,
>       VIRTIO_NET_F_STATUS,
>       VHOST_INVALID_FEATURE_BIT




* Re: [RFC PATCH 6/6] docs: Added eBPF documentation.
  2020-11-02 18:51 ` [RFC PATCH 6/6] docs: Added eBPF documentation Andrew Melnychenko
@ 2020-11-04  3:15   ` Jason Wang
  2020-11-05  3:56   ` Jason Wang
  1 sibling, 0 replies; 36+ messages in thread
From: Jason Wang @ 2020-11-04  3:15 UTC (permalink / raw)
  To: Andrew Melnychenko, mst; +Cc: yan, yuri.benditovich, qemu-devel


On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
> From: Andrew <andrew@daynix.com>
>
> Also, added maintainers information.
>
> Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> ---
>   MAINTAINERS       |   6 +++
>   docs/ebpf.rst     |  29 +++++++++++
>   docs/ebpf_rss.rst | 129 ++++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 164 insertions(+)
>   create mode 100644 docs/ebpf.rst
>   create mode 100644 docs/ebpf_rss.rst
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2c22bbca5a..464b3f3c95 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3111,6 +3111,12 @@ S: Maintained
>   F: hw/semihosting/
>   F: include/hw/semihosting/
>   
> +EBPF:
> +M: Andrew Melnychenko <andrew@daynix.com>
> +M: Yuri Benditovich <yuri.benditovich@daynix.com>
> +S: Maintained
> +F: ebpf/*
> +
>   Build and test automation
>   -------------------------
>   Build and test automation
> diff --git a/docs/ebpf.rst b/docs/ebpf.rst
> new file mode 100644
> index 0000000000..e45d085432
> --- /dev/null
> +++ b/docs/ebpf.rst
> @@ -0,0 +1,29 @@
> +===========================
> +eBPF qemu support
> +===========================
> +
> +eBPF support (CONFIG_EBPF) is enabled automatically by the 'configure' script
> +if the 'bpf' system call is available.
> +To disable eBPF support, use './configure --disable-bpf'.
> +
> +Basic eBPF functionality is located in ebpf/ebpf.c and ebpf/ebpf.h.
> +There are basic functions to load the eBPF program into the kernel.
> +Mostly, the function names are self-explanatory:
> +
> +- `bpf_create_map()`, `bpf_lookup_element()`, `bpf_update_element()`, `bpf_delete_element()` - manage eBPF maps. On error, a basic error message is reported and -1 is returned. On success, 0 is returned (`bpf_create_map()` returns the map's file descriptor).
> +- `bpf_prog_load()` - loads the program. The program has to have proper map file descriptors if maps are used. On error, the eBPF log is reported. On success, the program's file descriptor is returned.
> +- `bpf_fixup_mapfd()` - places the map file descriptor into the program according to the 'relocate array' of 'struct fixup_mapfd_t'. The function returns how many instructions were 'fixed', i.e. how many relocations occurred.
> +
> +A simplified workflow looks like this:
> +
> +.. code:: C
> +
> +    int map1 = bpf_create_map(...);
> +    int map2 = bpf_create_map(...);
> +
> +    bpf_fixup_mapfd(<fixup table>, ARRAY_SIZE(<fixup table>), <instructions pointer>, ARRAY_SIZE(<instructions pointer>), <map1 name>, map1);
> +    bpf_fixup_mapfd(<fixup table>, ARRAY_SIZE(<fixup table>), <instructions pointer>, ARRAY_SIZE(<instructions pointer>), <map2 name>, map2);
> +
> +    int prog = bpf_prog_load(<program type>, <instructions pointer>, ARRAY_SIZE(<instructions pointer>), "GPL");
> +
> +See bpf(2) for details.
> diff --git a/docs/ebpf_rss.rst b/docs/ebpf_rss.rst
> new file mode 100644
> index 0000000000..96fee391b8
> --- /dev/null
> +++ b/docs/ebpf_rss.rst
> @@ -0,0 +1,129 @@
> +===========================
> +eBPF RSS virtio-net support
> +===========================
> +
> +RSS (Receive Side Scaling) is used to distribute network packets to guest virtqueues
> +by calculating a packet hash. Each queue is then usually processed by a specific guest CPU core.
> +
> +For now there are 2 RSS implementations in qemu:
> +- 'software' RSS (functions only if qemu receives the network packets, i.e. vhost=off)
> +- eBPF RSS (also functions with vhost=on)
> +
> +If a steering BPF program is not set for the kernel's TUN module, TUN uses automatic
> +selection of the rx virtqueue based on a lookup table built from the calculated
> +symmetric hash of transmitted packets.
> +If a steering BPF program is set for TUN, the BPF code calculates the hash of the
> +packet header and returns the virtqueue number to place the packet in.
> +
> +Simplified decision formula:
> +
> +.. code:: C
> +
> +    queue_index = indirection_table[hash(<packet data>)%<indirection_table size>]
> +
> +
> +The hash cannot (or should not) be calculated for all packets.
> +
> +Note: currently, eBPF RSS does not support hash reporting.
> +
> +eBPF RSS is turned on by different combinations of vhost-net, virtio-net and tap configurations:
> +
> +- eBPF is used:
> +
> +        tap,vhost=off & virtio-net-pci,rss=on,hash=off
> +
> +- eBPF is used:
> +
> +        tap,vhost=on & virtio-net-pci,rss=on,hash=off
> +
> +- 'software' RSS is used:
> +
> +        tap,vhost=off & virtio-net-pci,rss=on,hash=on
> +
> +- eBPF is used, hash population feature is not reported to the guest:
> +
> +        tap,vhost=on & virtio-net-pci,rss=on,hash=on
> +
> +If CONFIG_EBPF is not set then only 'software' RSS is supported.
> +'Software' RSS is also used as a fallback if the eBPF program fails to load or to be set on TUN.
> +
> +RSS eBPF program
> +----------------
> +
> +The RSS program is located in ebpf/tun_rss_steering.h as an array of 'struct bpf_insn',
> +so the program is part of the qemu binary.
> +Initially, the eBPF program was compiled with clang; the source code is located at ebpf/rss.bpf.c.
> +Prerequisites to recompile the eBPF program (regenerate ebpf/tun_rss_steering.h):
> +
> +        llvm, clang, kernel source tree, python3 + (pip3 pyelftools)
> +        Adjust 'linuxhdrs' in Makefile.ebpf to reflect the location of the kernel source tree
> +
> +        $ cd ebpf
> +        $ make -f Makefile.ebpf
> +
> +Note the python script for conversion from an eBPF ELF object to a '.h' file - EbpfElf_to_C.py:
> +
> +        $ python EbpfElf_to_C.py rss.bpf.o tun_rss_steering
> +
> +The first argument of the script is the ELF object; the second is the section name where the eBPF program is located.
> +The script generates a <section name>.h file with the eBPF instructions and a 'relocate array'.
> +The 'relocate array' is an array of 'struct fixup_mapfd_t' with the name of the eBPF map and the instruction offset where the file descriptor of the map should be placed.
> +


Do we still need this if we decide to use llvm/clang toolchain? (I guess 
not)

Thanks


> +The current eBPF RSS implementation uses 'bounded loops' with 'backward jump instructions', which are present only in recent kernels.
> +Overall, eBPF RSS works on kernels 5.8+.
> +
> +eBPF RSS implementation
> +-----------------------
> +
> +The eBPF RSS loading functionality is located in ebpf/ebpf_rss.c and ebpf/ebpf_rss.h.
> +
> +The `struct EBPFRSSContext` structure holds 4 file descriptors:
> +
> +- program_fd - file descriptor of the eBPF RSS program.
> +- map_configuration - file descriptor of the 'configuration' map. This map contains one element of 'struct EBPFRSSConfig'. This configuration determines the eBPF program's behavior.
> +- map_toeplitz_key - file descriptor of the 'Toeplitz key' map. Contains one element: the 40-byte key prepared for the hashing algorithm.
> +- map_indirections_table - file descriptor of the indirections table map: 128 elements of queue indexes.
> +
> +`struct EBPFRSSConfig` fields:
> +
> +- redirect - "boolean" value: whether the hash should be calculated; if false, `default_queue` is used as the final decision.
> +- populate_hash - for now, not used. eBPF RSS doesn't support hash reporting.
> +- hash_types - binary mask of different hash types. See the `VIRTIO_NET_RSS_HASH_TYPE_*` defines. If the hash should not be calculated for a packet, `default_queue` is used.
> +- indirections_len - length of the indirections table, maximum 128.
> +- default_queue - the queue index used for packets that shouldn't be hashed. For some packets, the hash can't be calculated (e.g. ARP).
> +
> +Functions:
> +
> +- `ebpf_rss_init()` - sets program_fd to -1, which indicates that the EBPFRSSContext is not loaded.
> +- `ebpf_rss_load()` - creates 3 maps and loads the eBPF program from tun_rss_steering.h. Returns 'true' on success. After that, program_fd can be used to set steering for TAP.
> +- `ebpf_rss_set_all()` - sets values for the eBPF maps. The `indirections_table` length is in EBPFRSSConfig. `toeplitz_key` is a VIRTIO_NET_RSS_MAX_KEY_SIZE (40-byte) array.
> +- `ebpf_rss_unload()` - closes all file descriptors and sets program_fd to -1.
> +
> +Simplified eBPF RSS workflow:
> +
> +.. code:: C
> +
> +    struct EBPFRSSConfig config;
> +    config.redirect = 1;
> +    config.hash_types = VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | VIRTIO_NET_RSS_HASH_TYPE_TCPv4;
> +    config.indirections_len = VIRTIO_NET_RSS_MAX_TABLE_LEN;
> +    config.default_queue = 0;
> +
> +    uint16_t table[VIRTIO_NET_RSS_MAX_TABLE_LEN] = {...};
> +    uint8_t key[VIRTIO_NET_RSS_MAX_KEY_SIZE] = {...};
> +
> +    struct EBPFRSSContext ctx;
> +    ebpf_rss_init(&ctx);
> +    ebpf_rss_load(&ctx);
> +    ebpf_rss_set_all(&ctx, &config, table, key);
> +    if (net_client->info->set_steering_ebpf != NULL) {
> +        net_client->info->set_steering_ebpf(net_client, ctx.program_fd);
> +    }
> +    ...
> +    ebpf_rss_unload(&ctx);
> +
> +
> +NetClientState SetSteeringEBPF()
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For now, the `set_steering_ebpf()` method is supported only by the Linux TAP NetClientState. The method requires an eBPF program file descriptor as an argument.




* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-04  2:07     ` Jason Wang
@ 2020-11-04  9:31       ` Daniel P. Berrangé
  2020-11-05  3:46         ` Jason Wang
  2020-11-04 11:49       ` Yuri Benditovich
  1 sibling, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2020-11-04  9:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yan Vugenfirer, Yuri Benditovich, Andrew Melnychenko, qemu-devel,
	Michael S . Tsirkin

On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
> 
> On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> > 
> > 
> > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
> > <mailto:jasowang@redhat.com>> wrote:
> > 
> > 
> >     On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> >     > Basic idea is to use eBPF to calculate and steer packets in TAP.
> >     > RSS(Receive Side Scaling) is used to distribute network packets
> >     to guest virtqueues
> >     > by calculating packet hash.
> >     > eBPF RSS allows us to use RSS with vhost TAP.
> >     >
> >     > This set of patches introduces the usage of eBPF for packet steering
> >     > and RSS hash calculation:
> >     > * RSS(Receive Side Scaling) is used to distribute network packets to
> >     > guest virtqueues by calculating packet hash
> >     > * eBPF RSS suppose to be faster than already existing 'software'
> >     > implementation in QEMU
> >     > * Additionally adding support for the usage of RSS with vhost
> >     >
> >     > Supported kernels: 5.8+
> >     >
> >     > Implementation notes:
> >     > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> >     > Added eBPF support to qemu directly through a system call, see the
> >     > bpf(2) for details.
> >     > The eBPF program is part of the qemu and presented as an array
> >     of bpf
> >     > instructions.
> >     > The program can be recompiled by provided Makefile.ebpf(need to
> >     adjust
> >     > 'linuxhdrs'),
> >     > although it's not required to build QEMU with eBPF support.
> >     > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> >     > 'Software' RSS used in the case of hash population and as a
> >     fallback option.
> >     > For vhost, the hash population feature is not reported to the guest.
> >     >
> >     > Please also see the documentation in PATCH 6/6.
> >     >
> >     > I am sending those patches as RFC to initiate the discussions
> >     and get
> >     > feedback on the following points:
> >     > * Fallback when eBPF is not supported by the kernel
> > 
> > 
> >     Yes, and it could also be a lack of CAP_BPF.
> > 
> > 
> >     > * Live migration to the kernel that doesn't have eBPF support
> > 
> > 
> >     Is there anything that needs special treatment here?
> > 
> > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > (everything works) -> dest. system with kernel 5.6 (bpf does not work);
> > the adapter functions, but the steering does not use the proper queues.
> 
> 
> Right, I think we need to disable vhost on dest.
> 
> 
> > 
> > 
> > 
> >     > * Integration with current QEMU build
> > 
> > 
> >     Yes, a question here:
> > 
> >     1) Any reason for not using libbpf, e.g. it has been shipped with some
> >     distros
> > 
> > 
> > We intentionally do not use libbpf, as it is present only on some distros.
> > We can switch to libbpf, but this will disable bpf if libbpf is not
> > installed.
> 
> 
> That's better I think.
> 
> 
> >     2) It would be better if we can avoid shipping bytecodes
> > 
> > 
> > 
> > This creates new dependencies: llvm + clang + ...
> > We would prefer bytecode and the ability to generate it if the
> > prerequisites are installed.
> 
> 
> It's probably ok if we treat the bytecode as a kind of firmware.

That is explicitly *not* OK for inclusion in Fedora. They require that
BPF is compiled from source, and rejected my suggestion that it could
be considered a kind of firmware and thus have an exception from building
from source.

> But in the long run, it's still worthwhile to consider that the qemu source
> is used for development, and llvm/clang should be a common requirement for
> generating eBPF bytecode for the host.

So we need to do this right straight away, before this merges.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 1/6] net: Added SetSteeringEBPF method for NetClientState.
  2020-11-04  2:49   ` Jason Wang
@ 2020-11-04  9:34     ` Yuri Benditovich
  0 siblings, 0 replies; 36+ messages in thread
From: Yuri Benditovich @ 2020-11-04  9:34 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yan Vugenfirer, Andrew Melnychenko, qemu-devel, Michael S . Tsirkin


On Wed, Nov 4, 2020 at 4:49 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > From: Andrew <andrew@daynix.com>
> >
> > For now, that method is supported only by Linux TAP.
> > Linux TAP uses the TUNSETSTEERINGEBPF ioctl.
> > TUNSETSTEERINGEBPF was added 3 years ago.
> > QEMU checks whether it is defined before using it.
> >
> > Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> > ---
> >   include/net/net.h |  2 ++
> >   net/tap-bsd.c     |  5 +++++
> >   net/tap-linux.c   | 19 +++++++++++++++++++
> >   net/tap-solaris.c |  5 +++++
> >   net/tap-stub.c    |  5 +++++
> >   net/tap.c         |  9 +++++++++
> >   net/tap_int.h     |  1 +
> >   7 files changed, 46 insertions(+)
> >
> > diff --git a/include/net/net.h b/include/net/net.h
> > index 897b2d7595..d8a41fb010 100644
> > --- a/include/net/net.h
> > +++ b/include/net/net.h
> > @@ -60,6 +60,7 @@ typedef int (SetVnetBE)(NetClientState *, bool);
> >   typedef struct SocketReadState SocketReadState;
> >   typedef void (SocketReadStateFinalize)(SocketReadState *rs);
> >   typedef void (NetAnnounce)(NetClientState *);
> > +typedef bool (SetSteeringEBPF)(NetClientState *, int);
> >
> >   typedef struct NetClientInfo {
> >       NetClientDriver type;
> > @@ -81,6 +82,7 @@ typedef struct NetClientInfo {
> >       SetVnetLE *set_vnet_le;
> >       SetVnetBE *set_vnet_be;
> >       NetAnnounce *announce;
> > +    SetSteeringEBPF *set_steering_ebpf;
> >   } NetClientInfo;
> >
> >   struct NetClientState {
> > diff --git a/net/tap-bsd.c b/net/tap-bsd.c
> > index 77aaf674b1..4f64f31e98 100644
> > --- a/net/tap-bsd.c
> > +++ b/net/tap-bsd.c
> > @@ -259,3 +259,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
> >   {
> >       return -1;
> >   }
> > +
> > +int tap_fd_set_steering_ebpf(int fd, int prog_fd)
> > +{
> > +    return -1;
> > +}
> > diff --git a/net/tap-linux.c b/net/tap-linux.c
> > index b0635e9e32..196373019f 100644
> > --- a/net/tap-linux.c
> > +++ b/net/tap-linux.c
> > @@ -31,6 +31,7 @@
> >
> >   #include <net/if.h>
> >   #include <sys/ioctl.h>
> > +#include <linux/if_tun.h> /* TUNSETSTEERINGEBPF */
> >
> >   #include "qapi/error.h"
> >   #include "qemu/error-report.h"
> > @@ -316,3 +317,21 @@ int tap_fd_get_ifname(int fd, char *ifname)
> >       pstrcpy(ifname, sizeof(ifr.ifr_name), ifr.ifr_name);
> >       return 0;
> >   }
> > +
> > +int tap_fd_set_steering_ebpf(int fd, int prog_fd)
> > +{
> > +#ifdef TUNSETSTEERINGEBPF
>
>
> I'm not sure how much this can help.
>
> But looking at tap-linux.h, I wonder whether we need to pull in the TUN/TAP
> uapi headers.
>
> Thanks
>

Agree, we just need to add this define to tap-linux.h
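
A minimal sketch of such a fallback define, assuming the ioctl number from the
kernel's include/uapi/linux/if_tun.h (where TUNSETSTEERINGEBPF was added in
Linux 4.15):

```c
#include <sys/ioctl.h>      /* _IOR and the _IOC_* decoding macros on Linux */
#include <linux/if_tun.h>   /* may or may not already define TUNSETSTEERINGEBPF */

#ifndef TUNSETSTEERINGEBPF
/* Fallback for older uapi headers; this is the ioctl number from the
 * kernel's include/uapi/linux/if_tun.h. */
#define TUNSETSTEERINGEBPF _IOR('T', 224, int)
#endif
```

With this in tap-linux.h, the #ifdef in tap_fd_set_steering_ebpf() and its
"not supported" error branch become unnecessary.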


>
>
> > +    if (ioctl(fd, TUNSETSTEERINGEBPF, (void *) &prog_fd) != 0) {
> > +        error_report("Issue while setting TUNSETSTEERINGEBPF:"
> > +                    " %s with fd: %d, prog_fd: %d",
> > +                    strerror(errno), fd, prog_fd);
> > +
> > +       return -1;
> > +    }
> > +
> > +    return 0;
> > +#else
> > +    error_report("TUNSETSTEERINGEBPF is not supported");
> > +    return -1;
> > +#endif
> > +}
> > diff --git a/net/tap-solaris.c b/net/tap-solaris.c
> > index 0475a58207..d85224242b 100644
> > --- a/net/tap-solaris.c
> > +++ b/net/tap-solaris.c
> > @@ -255,3 +255,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
> >   {
> >       return -1;
> >   }
> > +
> > +int tap_fd_set_steering_ebpf(int fd, int prog_fd)
> > +{
> > +    return -1;
> > +}
> > diff --git a/net/tap-stub.c b/net/tap-stub.c
> > index de525a2e69..a0fa25804b 100644
> > --- a/net/tap-stub.c
> > +++ b/net/tap-stub.c
> > @@ -85,3 +85,8 @@ int tap_fd_get_ifname(int fd, char *ifname)
> >   {
> >       return -1;
> >   }
> > +
> > +int tap_fd_set_steering_ebpf(int fd, int prog_fd)
> > +{
> > +    return -1;
> > +}
> > diff --git a/net/tap.c b/net/tap.c
> > index c46ff66184..81f50017bd 100644
> > --- a/net/tap.c
> > +++ b/net/tap.c
> > @@ -337,6 +337,14 @@ static void tap_poll(NetClientState *nc, bool
> enable)
> >       tap_write_poll(s, enable);
> >   }
> >
> > +static bool tap_set_steering_ebpf(NetClientState *nc, int prog_fd)
> > +{
> > +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
> > +    assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
> > +
> > +    return tap_fd_set_steering_ebpf(s->fd, prog_fd) == 0;
> > +}
> > +
> >   int tap_get_fd(NetClientState *nc)
> >   {
> >       TAPState *s = DO_UPCAST(TAPState, nc, nc);
> > @@ -362,6 +370,7 @@ static NetClientInfo net_tap_info = {
> >       .set_vnet_hdr_len = tap_set_vnet_hdr_len,
> >       .set_vnet_le = tap_set_vnet_le,
> >       .set_vnet_be = tap_set_vnet_be,
> > +    .set_steering_ebpf = tap_set_steering_ebpf,
> >   };
> >
> >   static TAPState *net_tap_fd_init(NetClientState *peer,
> > diff --git a/net/tap_int.h b/net/tap_int.h
> > index 225a49ea48..547f8a5a28 100644
> > --- a/net/tap_int.h
> > +++ b/net/tap_int.h
> > @@ -44,5 +44,6 @@ int tap_fd_set_vnet_be(int fd, int vnet_is_be);
> >   int tap_fd_enable(int fd);
> >   int tap_fd_disable(int fd);
> >   int tap_fd_get_ifname(int fd, char *ifname);
> > +int tap_fd_set_steering_ebpf(int fd, int prog_fd);
> >
> >   #endif /* NET_TAP_INT_H */
>
>



* Re: [RFC PATCH 5/6] virtio-net: Added eBPF RSS to virtio-net.
  2020-11-04  3:09   ` Jason Wang
@ 2020-11-04 11:07     ` Yuri Benditovich
  2020-11-04 11:13       ` Daniel P. Berrangé
  2020-11-05  3:29       ` Jason Wang
  0 siblings, 2 replies; 36+ messages in thread
From: Yuri Benditovich @ 2020-11-04 11:07 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yan Vugenfirer, Andrew Melnychenko, qemu-devel, Michael S . Tsirkin


On Wed, Nov 4, 2020 at 5:09 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > From: Andrew <andrew@daynix.com>
> >
> > When RSS is enabled, the device tries to load the eBPF program
> > to select the RX virtqueue in the TUN. If eBPF can be loaded,
> > RSS will also function with vhost (works with kernel 5.8 and later).
> > Software RSS is used as a fallback with vhost=off when eBPF can't
> > be loaded or when hash population is requested by the guest.
> >
> > Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> > Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> > ---
> >   hw/net/vhost_net.c             |   2 +
> >   hw/net/virtio-net.c            | 120 +++++++++++++++++++++++++++++++--
> >   include/hw/virtio/virtio-net.h |   4 ++
> >   net/vhost-vdpa.c               |   2 +
> >   4 files changed, 124 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 24d555e764..16124f99c3 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -71,6 +71,8 @@ static const int user_feature_bits[] = {
> >       VIRTIO_NET_F_MTU,
> >       VIRTIO_F_IOMMU_PLATFORM,
> >       VIRTIO_F_RING_PACKED,
> > +    VIRTIO_NET_F_RSS,
> > +    VIRTIO_NET_F_HASH_REPORT,
> >
> >       /* This bit implies RARP isn't sent by QEMU out of band */
> >       VIRTIO_NET_F_GUEST_ANNOUNCE,
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 277289d56e..afcc3032ec 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -698,6 +698,19 @@ static void virtio_net_set_queues(VirtIONet *n)
> >
> >   static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
> >
> > +static uint64_t fix_ebpf_vhost_features(uint64_t features)
> > +{
> > > +    /* If vhost=on & CONFIG_EBPF is not set - disable the RSS feature */
> > +    uint64_t ret = features;
> > +#ifndef CONFIG_EBPF
> > +    virtio_clear_feature(&ret, VIRTIO_NET_F_RSS);
> > +#endif
> > +    /* for now, there is no solution for populating the hash from eBPF
> */
> > +    virtio_clear_feature(&ret, VIRTIO_NET_F_HASH_REPORT);
>
>
> I think we probably need to do something in reverse: since RSS is under the
> control of the qemu cli, disabling features like this may break migration.
>
>
How, by design, do we add new features to qemu in light of possible migration
to an older qemu version, when the destination qemu does not support these
features?


> We need to disable vhost instead when:
>
> 1) eBPF is not supported but RSS is required from command line
>
> or
>
> 2) HASH_REPORT is required from command line
>
> > +
> > +    return ret;
> > +}
> > +
> >   static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t
> features,
> >                                           Error **errp)
> >   {
> > @@ -732,9 +745,9 @@ static uint64_t virtio_net_get_features(VirtIODevice
> *vdev, uint64_t features,
> >           return features;
> >       }
> >
> > -    virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
> > -    virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
> > -    features = vhost_net_get_features(get_vhost_net(nc->peer),
> features);
> > +    features = fix_ebpf_vhost_features(
> > +            vhost_net_get_features(get_vhost_net(nc->peer), features));
> > +
> >       vdev->backend_features = features;
> >
> >       if (n->mtu_bypass_backend &&
> > @@ -1169,12 +1182,75 @@ static int virtio_net_handle_announce(VirtIONet
> *n, uint8_t cmd,
> >       }
> >   }
> >
> > +static void virtio_net_unload_epbf_rss(VirtIONet *n);
> > +
> >   static void virtio_net_disable_rss(VirtIONet *n)
> >   {
> >       if (n->rss_data.enabled) {
> >           trace_virtio_net_rss_disable();
> >       }
> >       n->rss_data.enabled = false;
> > +
> > +    if (!n->rss_data.enabled_software_rss &&
> ebpf_rss_is_loaded(&n->ebpf_rss)) {
> > +        virtio_net_unload_epbf_rss(n);
> > +    }
> > +}
> > +
> > +static bool virtio_net_attach_steering_ebpf(NICState *nic, int prog_fd)
> > +{
> > +    NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
> > +    if (nc == NULL || nc->info->set_steering_ebpf == NULL) {
> > +        return false;
> > +    }
> > +
> > +    return nc->info->set_steering_ebpf(nc, prog_fd);
> > +}
> > +
> > +static void rss_data_to_rss_config(struct VirtioNetRssData *data,
> > +                                   struct EBPFRSSConfig *config)
> > +{
> > +    config->redirect = data->redirect;
> > +    config->populate_hash = data->populate_hash;
> > +    config->hash_types = data->hash_types;
> > +    config->indirections_len = data->indirections_len;
> > +    config->default_queue = data->default_queue;
> > +}
> > +
> > +static bool virtio_net_load_epbf_rss(VirtIONet *n)
> > +{
> > +    struct EBPFRSSConfig config = {};
> > +
> > +    if (!n->rss_data.enabled) {
> > +        if (ebpf_rss_is_loaded(&n->ebpf_rss)) {
> > +            ebpf_rss_unload(&n->ebpf_rss);
> > +        }
> > +        return true;
> > +    }
> > +
> > +    if (!ebpf_rss_is_loaded(&n->ebpf_rss) &&
> !ebpf_rss_load(&n->ebpf_rss)) {
> > +        return false;
> > +    }
> > +
> > +    rss_data_to_rss_config(&n->rss_data, &config);
> > +
> > +    if (!ebpf_rss_set_all(&n->ebpf_rss, &config,
> > +                          n->rss_data.indirections_table,
> n->rss_data.key)) {
> > +        ebpf_rss_unload(&n->ebpf_rss);
> > +        return false;
> > +    }
> > +
> > +    if (!virtio_net_attach_steering_ebpf(n->nic,
> n->ebpf_rss.program_fd)) {
> > +        ebpf_rss_unload(&n->ebpf_rss);
> > +        return false;
> > +    }
> > +
> > +    return true;
> > +}
> > +
> > +static void virtio_net_unload_epbf_rss(VirtIONet *n)
> > +{
> > +    virtio_net_attach_steering_ebpf(n->nic, -1);
> > +    ebpf_rss_unload(&n->ebpf_rss);
> >   }
> >
> >   static uint16_t virtio_net_handle_rss(VirtIONet *n,
> > @@ -1208,6 +1284,7 @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
> >           err_value = (uint32_t)s;
> >           goto error;
> >       }
> > +    n->rss_data.enabled_software_rss = false;
> >       n->rss_data.hash_types = virtio_ldl_p(vdev, &cfg.hash_types);
> >       n->rss_data.indirections_len =
> >           virtio_lduw_p(vdev, &cfg.indirection_table_mask);
> > @@ -1289,9 +1366,30 @@ static uint16_t virtio_net_handle_rss(VirtIONet
> *n,
> >           goto error;
> >       }
> >       n->rss_data.enabled = true;
> > +
> > +    if (!n->rss_data.populate_hash) {
> > +        /* load EBPF RSS */
> > +        if (!virtio_net_load_epbf_rss(n)) {
> > > +            /* eBPF must be loaded for vhost */
> > +            if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
> > +                warn_report("Can't load eBPF RSS for vhost");
> > +                goto error;
> > +            }
> > +            /* fallback to software RSS */
> > +            warn_report("Can't load eBPF RSS - fallback to software
> RSS");
> > +            n->rss_data.enabled_software_rss = true;
> > +        }
> > +    } else {
> > +        /* use software RSS for hash populating */
> > +        /* and unload eBPF if was loaded before */
> > +        virtio_net_unload_epbf_rss(n);
> > +        n->rss_data.enabled_software_rss = true;
> > +    }
> > +
> >       trace_virtio_net_rss_enable(n->rss_data.hash_types,
> >                                   n->rss_data.indirections_len,
> >                                   temp.b);
> > +
> >       return queues;
> >   error:
> >       trace_virtio_net_rss_error(err_msg, err_value);
> > @@ -1674,7 +1772,7 @@ static ssize_t
> virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
> >           return -1;
> >       }
> >
> > -    if (!no_rss && n->rss_data.enabled) {
> > +    if (!no_rss && n->rss_data.enabled &&
> n->rss_data.enabled_software_rss) {
> >           int index = virtio_net_process_rss(nc, buf, size);
> >           if (index >= 0) {
> >               NetClientState *nc2 = qemu_get_subqueue(n->nic, index);
> > @@ -2780,6 +2878,18 @@ static int virtio_net_post_load_device(void
> *opaque, int version_id)
> >       }
> >
> >       if (n->rss_data.enabled) {
> > +        n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
> > +        if (!n->rss_data.populate_hash) {
> > +            if (!virtio_net_load_epbf_rss(n)) {
> > +                if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
> > +                    error_report("Can't post-load eBPF RSS for vhost");
> > +                } else {
> > +                    warn_report("Can't post-load eBPF RSS - fallback to
> software RSS");
> > +                    n->rss_data.enabled_software_rss = true;
> > +                }
> > +            }
> > +        }
> > +
> >           trace_virtio_net_rss_enable(n->rss_data.hash_types,
> >                                       n->rss_data.indirections_len,
> >                                       sizeof(n->rss_data.key));
> > @@ -3453,6 +3563,8 @@ static void virtio_net_instance_init(Object *obj)
> >       device_add_bootindex_property(obj, &n->nic_conf.bootindex,
> >                                     "bootindex", "/ethernet-phy@0",
> >                                     DEVICE(n));
> > +
> > +    ebpf_rss_init(&n->ebpf_rss);
> >   }
> >
> >   static int virtio_net_pre_save(void *opaque)
> > diff --git a/include/hw/virtio/virtio-net.h
> b/include/hw/virtio/virtio-net.h
> > index f4852ac27b..4d29a577eb 100644
> > --- a/include/hw/virtio/virtio-net.h
> > +++ b/include/hw/virtio/virtio-net.h
> > @@ -21,6 +21,8 @@
> >   #include "qemu/option_int.h"
> >   #include "qom/object.h"
> >
> > +#include "ebpf/ebpf_rss.h"
> > +
> >   #define TYPE_VIRTIO_NET "virtio-net-device"
> >   OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET)
> >
> > @@ -130,6 +132,7 @@ typedef struct VirtioNetRscChain {
> >
> >   typedef struct VirtioNetRssData {
> >       bool    enabled;
> > +    bool    enabled_software_rss;
>
>
> We probably need a better name for this, since "software" is kind of
> confusing.
>
> No problem


>
> >       bool    redirect;
> >       bool    populate_hash;
> >       uint32_t hash_types;
> > @@ -214,6 +217,7 @@ struct VirtIONet {
> >       Notifier migration_state;
> >       VirtioNetRssData rss_data;
> >       struct NetRxPkt *rx_pkt;
> > +    struct EBPFRSSContext ebpf_rss;
> >   };
> >
> >   void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 99c476db8c..feb5fa8624 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -54,6 +54,8 @@ const int vdpa_feature_bits[] = {
> >       VIRTIO_NET_F_MTU,
> >       VIRTIO_F_IOMMU_PLATFORM,
> >       VIRTIO_F_RING_PACKED,
> > +    VIRTIO_NET_F_RSS,
> > +    VIRTIO_NET_F_HASH_REPORT,
>
>
> This is fine but looks unrelated to eBPF RSS support.
>
> Usually it means the hardware can support RSS/hash reporting.
>
As I see from
https://github.com/qemu/qemu/blob/master/hw/virtio/vhost.c#L1526:
if we do not add these features, it means they are transparent to the specific
vhost solution, when in fact they are not. We need to ensure these features
are not in use in the case of
vhost-vdpa or vhost-user.
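
For reference, the feature-bit tables discussed here act as a mask: bits not
listed pass through untouched (treated as transparent, i.e. emulated by qemu),
while listed bits survive only if the vhost backend also offers them. A
simplified sketch of that masking logic (illustrative names, not QEMU's actual
vhost_get_features() code):

```c
#include <stdint.h>

/* Sketch of vhost feature negotiation: for every bit in the table,
 * clear it from the requested features unless the backend offers it.
 * Bits absent from the table are left alone, which is exactly why a
 * feature the backend cannot implement must appear in the table. */
static uint64_t vhost_mask_features(const int *bits, int nbits,
                                    uint64_t backend_features,
                                    uint64_t features)
{
    for (int i = 0; i < nbits; i++) {
        uint64_t bit = 1ULL << bits[i];
        if (!(backend_features & bit)) {
            features &= ~bit;   /* listed but not offered by the backend */
        }
    }
    return features;
}
```

So adding VIRTIO_NET_F_RSS / VIRTIO_NET_F_HASH_REPORT to the tables makes them
subject to backend negotiation instead of being silently passed through.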


> Thanks
>
>
> >       VIRTIO_NET_F_GUEST_ANNOUNCE,
> >       VIRTIO_NET_F_STATUS,
> >       VHOST_INVALID_FEATURE_BIT
>
>



* Re: [RFC PATCH 5/6] virtio-net: Added eBPF RSS to virtio-net.
  2020-11-04 11:07     ` Yuri Benditovich
@ 2020-11-04 11:13       ` Daniel P. Berrangé
  2020-11-04 15:51         ` Yuri Benditovich
  2020-11-05  3:29       ` Jason Wang
  1 sibling, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2020-11-04 11:13 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Yan Vugenfirer, Jason Wang, Michael S . Tsirkin,
	Andrew Melnychenko, qemu-devel

On Wed, Nov 04, 2020 at 01:07:41PM +0200, Yuri Benditovich wrote:
> On Wed, Nov 4, 2020 at 5:09 AM Jason Wang <jasowang@redhat.com> wrote:
> 
> >
> > On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > > From: Andrew <andrew@daynix.com>
> > >
> > > When RSS is enabled, the device tries to load the eBPF program
> > > to select the RX virtqueue in the TUN. If eBPF can be loaded,
> > > RSS will also function with vhost (works with kernel 5.8 and later).
> > > Software RSS is used as a fallback with vhost=off when eBPF can't
> > > be loaded or when hash population is requested by the guest.
> > >
> > > Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> > > Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> > > ---
> > >   hw/net/vhost_net.c             |   2 +
> > >   hw/net/virtio-net.c            | 120 +++++++++++++++++++++++++++++++--
> > >   include/hw/virtio/virtio-net.h |   4 ++
> > >   net/vhost-vdpa.c               |   2 +
> > >   4 files changed, 124 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > > index 24d555e764..16124f99c3 100644
> > > --- a/hw/net/vhost_net.c
> > > +++ b/hw/net/vhost_net.c
> > > @@ -71,6 +71,8 @@ static const int user_feature_bits[] = {
> > >       VIRTIO_NET_F_MTU,
> > >       VIRTIO_F_IOMMU_PLATFORM,
> > >       VIRTIO_F_RING_PACKED,
> > > +    VIRTIO_NET_F_RSS,
> > > +    VIRTIO_NET_F_HASH_REPORT,
> > >
> > >       /* This bit implies RARP isn't sent by QEMU out of band */
> > >       VIRTIO_NET_F_GUEST_ANNOUNCE,
> > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > index 277289d56e..afcc3032ec 100644
> > > --- a/hw/net/virtio-net.c
> > > +++ b/hw/net/virtio-net.c
> > > @@ -698,6 +698,19 @@ static void virtio_net_set_queues(VirtIONet *n)
> > >
> > >   static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue);
> > >
> > > +static uint64_t fix_ebpf_vhost_features(uint64_t features)
> > > +{
> > > +    /* If vhost=on & CONFIG_EBPF doesn't set - disable RSS feature */
> > > +    uint64_t ret = features;
> > > +#ifndef CONFIG_EBPF
> > > +    virtio_clear_feature(&ret, VIRTIO_NET_F_RSS);
> > > +#endif
> > > +    /* for now, there is no solution for populating the hash from eBPF
> > */
> > > +    virtio_clear_feature(&ret, VIRTIO_NET_F_HASH_REPORT);
> >
> >
> > I think we probably need to do something reverse since RSS is under the
> > control of the qemu cli; disabling features like this may break migration.
> >
> >
> How, by design, do we add new features to qemu in light of possible
> migration to an older qemu version, when the destination qemu does not
> support these features?

If the feature affects guest ABI, then we don't want to silently/
automatically turn on features that have a dependency on kernel
features existing. They need to be an opt-in by mgmt app/admin.

IOW there needs to be an explicit property that is set to turn on use
of eBPF. If this property is set, then QEMU must use eBPF or fail
with an error. If it is unset, then QEMU must never use eBPF.

The mgmt app controlling QEMU will decide whether to use eBPF and
turn on the property, and will then know not to migrate it to a
host without eBPF support.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-04  2:07     ` Jason Wang
  2020-11-04  9:31       ` Daniel P. Berrangé
@ 2020-11-04 11:49       ` Yuri Benditovich
  2020-11-04 12:04         ` Daniel P. Berrangé
  1 sibling, 1 reply; 36+ messages in thread
From: Yuri Benditovich @ 2020-11-04 11:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yan Vugenfirer, Andrew Melnychenko, qemu-devel, Michael S . Tsirkin


On Wed, Nov 4, 2020 at 4:08 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> >
> >
> > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
> > <mailto:jasowang@redhat.com>> wrote:
> >
> >
> >     On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> >     > Basic idea is to use eBPF to calculate and steer packets in TAP.
> >     > RSS(Receive Side Scaling) is used to distribute network packets
> >     to guest virtqueues
> >     > by calculating packet hash.
> >     > eBPF RSS allows us to use RSS with vhost TAP.
> >     >
> >     > This set of patches introduces the usage of eBPF for packet
> steering
> >     > and RSS hash calculation:
> >     > * RSS(Receive Side Scaling) is used to distribute network packets
> to
> >     > guest virtqueues by calculating packet hash
> >     > * eBPF RSS suppose to be faster than already existing 'software'
> >     > implementation in QEMU
> >     > * Additionally adding support for the usage of RSS with vhost
> >     >
> >     > Supported kernels: 5.8+
> >     >
> >     > Implementation notes:
> >     > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF
> program.
> >     > Added eBPF support to qemu directly through a system call, see the
> >     > bpf(2) for details.
> >     > The eBPF program is part of the qemu and presented as an array
> >     of bpf
> >     > instructions.
> >     > The program can be recompiled by provided Makefile.ebpf(need to
> >     adjust
> >     > 'linuxhdrs'),
> >     > although it's not required to build QEMU with eBPF support.
> >     > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> >     > 'Software' RSS used in the case of hash population and as a
> >     fallback option.
> >     > For vhost, the hash population feature is not reported to the
> guest.
> >     >
> >     > Please also see the documentation in PATCH 6/6.
> >     >
> >     > I am sending those patches as RFC to initiate the discussions
> >     and get
> >     > feedback on the following points:
> >     > * Fallback when eBPF is not supported by the kernel
> >
> >
> >     Yes, and it could also be a lack of CAP_BPF.
> >
> >
> >     > * Live migration to the kernel that doesn't have eBPF support
> >
> >
> >     Is there anything here that needs special treatment?
> >
> > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > (everything works) -> dest. system 5.6 (bpf does not work), the
> > adapter functions, but all the steering does not use proper queues.
>
>
> Right, I think we need to disable vhost on dest.
>
>
Is it acceptable to disable vhost at the time of migration?


> >
> >
> >
> >     > * Integration with current QEMU build
> >
> >
> >     Yes, a question here:
> >
> >     1) Any reason for not using libbpf, e.g it has been shipped with some
> >     distros
> >
> >
> > We intentionally do not use libbpf, as it is present only on some distros.
> > We can switch to libbpf, but this will disable bpf if libbpf is not
> > installed.
>
>
> That's better I think.
>

We think the preferred way is to have the eBPF code built into QEMU (not
distributed as a separate file).

Our initial idea was to not use libbpf because it:
1. Does not create an additional dependency at build time or at run time
2. Gives us a smaller footprint for the loadable eBPF blob inside QEMU
3. Does not add too much code to QEMU

We can switch to libbpf; in this case:
1. The presence of the dynamic library is not guaranteed on the target system
2. The static library is large
3. libbpf uses an eBPF ELF object, which is significantly bigger than just
the array of instructions (maybe we will succeed in reducing the ELF to
some suitable size and still have it built-in)

Please let us know whether you still think libbpf is better and why.

Thanks


>
> >     2) It would be better if we can avoid shipping bytecodes
> >
> >
> >
> > This creates new dependencies: llvm + clang + ...
> > We would prefer bytecode and the ability to generate it if prerequisites
> > are installed.
>
>
> It's probably ok if we treat the bytecode as a kind of firmware.
>
> But in the long run, it's still worthwhile to consider that the qemu
> source is used for development, and llvm/clang should be a common
> requirement for generating eBPF bytecode for the host.
>
>
> >
> >
> >     > * Additional usage for eBPF for packet filtering
> >
> >
> >     Another interesting topic is to implement mac/vlan filters. And
> >     in the
> >     future, I plan to add mac based steering. All of these could be
> >     done via
> >     eBPF.
> >
> >
> > No problem, we can cooperate if needed
> >
> >
> >     >
> >     > Know issues:
> >     > * hash population not supported by eBPF RSS: 'software' RSS used
> >
> >
> >     Is this because there's no way to write to the vnet header in
> >     STEERING BPF?
> >
> > Yes. We plan to submit changes for kernel to cooperate with BPF and
> > populate the hash, this work is in progress
>
>
> That would require a new type of eBPF program and may need some work on
> verifier.
>
>
Maybe we need to allow loading of an additional program type in tun.c, not
only the socket filter (to use bpf_set_hash).
Also, vhost and tun in the kernel need to be aware of the header extension
for hash population.


> Btw, macvtap still lacks even the steering eBPF program. Would you
> want to post a patch to support that?
>
>
Probably after we have fully functioning BPF with TAP/TUN.


>
> >
> >     > as a fallback, also, hash population feature is not reported to
> >     guests
> >     > with vhost.
> >     > * big-endian BPF support: for now, eBPF is disabled for
> >     big-endian systems.
> >
> >
> >     Are there any blocker for this?
> >
> >
> > No, can be added in v2
>
>
> Cool.
>
> Thanks
>
>
> >
> >     Just some quick questions after a glance of the codes. Will go
> >     through
> >     them tomorrow.
> >
> >     Thanks
> >
> >
> >     >
> >     > Andrew (6):
> >     >    Added SetSteeringEBPF method for NetClientState.
> >     >    ebpf: Added basic eBPF API.
> >     >    ebpf: Added eBPF RSS program.
> >     >    ebpf: Added eBPF RSS loader.
> >     >    virtio-net: Added eBPF RSS to virtio-net.
> >     >    docs: Added eBPF documentation.
> >     >
> >     >   MAINTAINERS                    |   6 +
> >     >   configure                      |  36 +++
> >     >   docs/ebpf.rst                  |  29 ++
> >     >   docs/ebpf_rss.rst              | 129 ++++++++
> >     >   ebpf/EbpfElf_to_C.py           |  67 ++++
> >     >   ebpf/Makefile.ebpf             |  38 +++
> >     >   ebpf/ebpf-stub.c               |  28 ++
> >     >   ebpf/ebpf.c                    | 107 +++++++
> >     >   ebpf/ebpf.h                    |  35 +++
> >     >   ebpf/ebpf_rss.c                | 178 +++++++++++
> >     >   ebpf/ebpf_rss.h                |  30 ++
> >     >   ebpf/meson.build               |   1 +
> >     >   ebpf/rss.bpf.c                 | 470 ++++++++++++++++++++++++++++
> >     >   ebpf/trace-events              |   4 +
> >     >   ebpf/trace.h                   |   2 +
> >     >   ebpf/tun_rss_steering.h        | 556
> >     +++++++++++++++++++++++++++++++++
> >     >   hw/net/vhost_net.c             |   2 +
> >     >   hw/net/virtio-net.c            | 120 ++++++-
> >     >   include/hw/virtio/virtio-net.h |   4 +
> >     >   include/net/net.h              |   2 +
> >     >   meson.build                    |   3 +
> >     >   net/tap-bsd.c                  |   5 +
> >     >   net/tap-linux.c                |  19 ++
> >     >   net/tap-solaris.c              |   5 +
> >     >   net/tap-stub.c                 |   5 +
> >     >   net/tap.c                      |   9 +
> >     >   net/tap_int.h                  |   1 +
> >     >   net/vhost-vdpa.c               |   2 +
> >     >   28 files changed, 1889 insertions(+), 4 deletions(-)
> >     >   create mode 100644 docs/ebpf.rst
> >     >   create mode 100644 docs/ebpf_rss.rst
> >     >   create mode 100644 ebpf/EbpfElf_to_C.py
> >     >   create mode 100755 ebpf/Makefile.ebpf
> >     >   create mode 100644 ebpf/ebpf-stub.c
> >     >   create mode 100644 ebpf/ebpf.c
> >     >   create mode 100644 ebpf/ebpf.h
> >     >   create mode 100644 ebpf/ebpf_rss.c
> >     >   create mode 100644 ebpf/ebpf_rss.h
> >     >   create mode 100644 ebpf/meson.build
> >     >   create mode 100644 ebpf/rss.bpf.c
> >     >   create mode 100644 ebpf/trace-events
> >     >   create mode 100644 ebpf/trace.h
> >     >   create mode 100644 ebpf/tun_rss_steering.h
> >     >
> >
>
>



* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-04 11:49       ` Yuri Benditovich
@ 2020-11-04 12:04         ` Daniel P. Berrangé
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel P. Berrangé @ 2020-11-04 12:04 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Yan Vugenfirer, Jason Wang, Michael S . Tsirkin,
	Andrew Melnychenko, qemu-devel

On Wed, Nov 04, 2020 at 01:49:05PM +0200, Yuri Benditovich wrote:
> On Wed, Nov 4, 2020 at 4:08 AM Jason Wang <jasowang@redhat.com> wrote:
> 
> >
> > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> > >
> > >
> > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
> > > <mailto:jasowang@redhat.com>> wrote:
> > >
> > >
> > >     On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > >     > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > >     > RSS(Receive Side Scaling) is used to distribute network packets
> > >     to guest virtqueues
> > >     > by calculating packet hash.
> > >     > eBPF RSS allows us to use RSS with vhost TAP.
> > >     >
> > >     > This set of patches introduces the usage of eBPF for packet
> > steering
> > >     > and RSS hash calculation:
> > >     > * RSS(Receive Side Scaling) is used to distribute network packets
> > to
> > >     > guest virtqueues by calculating packet hash
> > >     > * eBPF RSS suppose to be faster than already existing 'software'
> > >     > implementation in QEMU
> > >     > * Additionally adding support for the usage of RSS with vhost
> > >     >
> > >     > Supported kernels: 5.8+
> > >     >
> > >     > Implementation notes:
> > >     > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF
> > program.
> > >     > Added eBPF support to qemu directly through a system call, see the
> > >     > bpf(2) for details.
> > >     > The eBPF program is part of the qemu and presented as an array
> > >     of bpf
> > >     > instructions.
> > >     > The program can be recompiled by provided Makefile.ebpf(need to
> > >     adjust
> > >     > 'linuxhdrs'),
> > >     > although it's not required to build QEMU with eBPF support.
> > >     > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> > >     > 'Software' RSS used in the case of hash population and as a
> > >     fallback option.
> > >     > For vhost, the hash population feature is not reported to the
> > guest.
> > >     >
> > >     > Please also see the documentation in PATCH 6/6.
> > >     >
> > >     > I am sending those patches as RFC to initiate the discussions
> > >     and get
> > >     > feedback on the following points:
> > >     > * Fallback when eBPF is not supported by the kernel
> > >
> > >
> > >     Yes, and it could also be a lack of CAP_BPF.
> > >
> > >
> > >     > * Live migration to the kernel that doesn't have eBPF support
> > >
> > >
> > >     Is there anything here that needs special treatment?
> > >
> > > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > > (everything works) -> dest. system 5.6 (bpf does not work), the
> > > adapter functions, but all the steering does not use proper queues.
> >
> >
> > Right, I think we need to disable vhost on dest.
> >
> >
> Is it acceptable to disable vhost at the time of migration?
> 
> 
> > >
> > >
> > >
> > >     > * Integration with current QEMU build
> > >
> > >
> > >     Yes, a question here:
> > >
> > >     1) Any reason for not using libbpf, e.g it has been shipped with some
> > >     distros
> > >
> > >
> > > We intentionally do not use libbpf, as it is present only on some distros.
> > > We can switch to libbpf, but this will disable bpf if libbpf is not
> > > installed.
> >
> >
> > That's better I think.
> >
> 
> We think the preferred way is to have the eBPF code built into QEMU (not
> distributed as a separate file).
>
> Our initial idea was to not use libbpf because it:
> 1. Does not create an additional dependency at build time or at run time
> 2. Gives us a smaller footprint for the loadable eBPF blob inside QEMU
> 3. Does not add too much code to QEMU
> 
> We can switch to libbpf; in this case:
> 1. The presence of the dynamic library is not guaranteed on the target system

Again, if a distro or user wants to use this feature in QEMU, they
should be expected to build the library.

> 2. The static library is large

QEMU doesn't support static linking for system emulators.  It may
happen to work at times but there's no expectations in this respect.

> 3. libbpf uses an eBPF ELF object, which is significantly bigger than just
> the array of instructions (maybe we will succeed in reducing the ELF to
> some suitable size and still have it built-in)
> 
> Please let us know whether you still think libbpf is better and why.

It looks like both Clang and GCC compilers for BPF are moving towards
a world where they use BTF to get compile-once, run-everywhere portability
for the compiled bytecode. IIUC, libbpf is what is responsible for
processing the BTF data when loading it into the running kernel. This
all looks like a good thing in general.

If we introduce BPF to QEMU without using libbpf, and then later decide
we absolutely need libbpf features, it creates an upgrade compatibility
issue for existing deployments. It is better to use libbpf right from
the start, so we're set up to take full advantage of what it offers
long term.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 5/6] virtio-net: Added eBPF RSS to virtio-net.
  2020-11-04 11:13       ` Daniel P. Berrangé
@ 2020-11-04 15:51         ` Yuri Benditovich
  0 siblings, 0 replies; 36+ messages in thread
From: Yuri Benditovich @ 2020-11-04 15:51 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Yan Vugenfirer, Jason Wang, Michael S . Tsirkin,
	Andrew Melnychenko, qemu-devel


On Wed, Nov 4, 2020 at 1:13 PM Daniel P. Berrangé <berrange@redhat.com>
wrote:

> On Wed, Nov 04, 2020 at 01:07:41PM +0200, Yuri Benditovich wrote:
> > On Wed, Nov 4, 2020 at 5:09 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > >
> > > On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > > > From: Andrew <andrew@daynix.com>
> > > >
> > > > When RSS is enabled the device tries to load the eBPF program
> > > > to select RX virtqueue in the TUN. If eBPF can be loaded
> > > > the RSS will function also with vhost (works with kernel 5.8 and
> later).
> > > > Software RSS is used as a fallback with vhost=off when eBPF can't be
> > > loaded
> > > > or when hash population requested by the guest.
> > > >
> > > > Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> > > > Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
> > > > ---
> > > >   hw/net/vhost_net.c             |   2 +
> > > >   hw/net/virtio-net.c            | 120
> +++++++++++++++++++++++++++++++--
> > > >   include/hw/virtio/virtio-net.h |   4 ++
> > > >   net/vhost-vdpa.c               |   2 +
> > > >   4 files changed, 124 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > > > index 24d555e764..16124f99c3 100644
> > > > --- a/hw/net/vhost_net.c
> > > > +++ b/hw/net/vhost_net.c
> > > > @@ -71,6 +71,8 @@ static const int user_feature_bits[] = {
> > > >       VIRTIO_NET_F_MTU,
> > > >       VIRTIO_F_IOMMU_PLATFORM,
> > > >       VIRTIO_F_RING_PACKED,
> > > > +    VIRTIO_NET_F_RSS,
> > > > +    VIRTIO_NET_F_HASH_REPORT,
> > > >
> > > >       /* This bit implies RARP isn't sent by QEMU out of band */
> > > >       VIRTIO_NET_F_GUEST_ANNOUNCE,
> > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > index 277289d56e..afcc3032ec 100644
> > > > --- a/hw/net/virtio-net.c
> > > > +++ b/hw/net/virtio-net.c
> > > > @@ -698,6 +698,19 @@ static void virtio_net_set_queues(VirtIONet *n)
> > > >
> > > >   static void virtio_net_set_multiqueue(VirtIONet *n, int
> multiqueue);
> > > >
> > > > +static uint64_t fix_ebpf_vhost_features(uint64_t features)
> > > > +{
> > > > +    /* If vhost=on & CONFIG_EBPF doesn't set - disable RSS feature
> */
> > > > +    uint64_t ret = features;
> > > > +#ifndef CONFIG_EBPF
> > > > +    virtio_clear_feature(&ret, VIRTIO_NET_F_RSS);
> > > > +#endif
> > > > +    /* for now, there is no solution for populating the hash from
> eBPF
> > > */
> > > > +    virtio_clear_feature(&ret, VIRTIO_NET_F_HASH_REPORT);
> > >
> > >
> > > I think we probably need to do something reverse since RSS is under the
> > > control of the qemu cli; disabling features like this may break migration.
> > >
> > >
> > How by design we add new features to qemu in light of possible migration
> to
> > older qemu version when the destination
> > qemu does not support these features?
>
> If the feature affects guest ABI, then we don't want to silently/
> automatically turn on features that have a dependancy on kernel
> features existing. They need to be an opt-in by mgmt app/admin.
>
>
We understand that. But the eBPF itself does not affect the guest ABI.
The 'RSS' feature of the virtio-net device already exists (implemented in
QEMU, so it can work in the case where vhost is off).
eBPF is able to do the same at the TAP/TUN level (so it can do the job
also when vhost is on).
By default it is turned off and requires the explicit command line switch
'rss=on'.


> IOW there needs to be an explicit property that is set to turn on use
> of eBPF. If this property is set, then QEMU must use eBPF or fail
> with an error. If it is unset, then QEMU must never use eBPF.
>
> The mgmt app controlling QEMU will decide whether to use eBPF and
> turn on the property, and will then know not to migrate it to a
> host without eBPF support.
>
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-
> https://www.instagram.com/dberrange :|
>
>



* Re: [RFC PATCH 5/6] virtio-net: Added eBPF RSS to virtio-net.
  2020-11-04 11:07     ` Yuri Benditovich
  2020-11-04 11:13       ` Daniel P. Berrangé
@ 2020-11-05  3:29       ` Jason Wang
  1 sibling, 0 replies; 36+ messages in thread
From: Jason Wang @ 2020-11-05  3:29 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Yan Vugenfirer, Andrew Melnychenko, qemu-devel, Michael S . Tsirkin


On 2020/11/4 下午7:07, Yuri Benditovich wrote:
>
>
> On Wed, Nov 4, 2020 at 5:09 AM Jason Wang <jasowang@redhat.com 
> <mailto:jasowang@redhat.com>> wrote:
>
>
>     On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
>     > From: Andrew <andrew@daynix.com <mailto:andrew@daynix.com>>
>     >
>     > When RSS is enabled the device tries to load the eBPF program
>     > to select RX virtqueue in the TUN. If eBPF can be loaded
>     > the RSS will function also with vhost (works with kernel 5.8 and
>     later).
>     > Software RSS is used as a fallback with vhost=off when eBPF
>     can't be loaded
>     > or when hash population requested by the guest.
>     >
>     > Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com
>     <mailto:yuri.benditovich@daynix.com>>
>     > Signed-off-by: Andrew Melnychenko <andrew@daynix.com
>     <mailto:andrew@daynix.com>>
>     > ---
>     >   hw/net/vhost_net.c             |   2 +
>     >   hw/net/virtio-net.c            | 120
>     +++++++++++++++++++++++++++++++--
>     >   include/hw/virtio/virtio-net.h |   4 ++
>     >   net/vhost-vdpa.c               |   2 +
>     >   4 files changed, 124 insertions(+), 4 deletions(-)
>     >
>     > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>     > index 24d555e764..16124f99c3 100644
>     > --- a/hw/net/vhost_net.c
>     > +++ b/hw/net/vhost_net.c
>     > @@ -71,6 +71,8 @@ static const int user_feature_bits[] = {
>     >       VIRTIO_NET_F_MTU,
>     >       VIRTIO_F_IOMMU_PLATFORM,
>     >       VIRTIO_F_RING_PACKED,
>     > +    VIRTIO_NET_F_RSS,
>     > +    VIRTIO_NET_F_HASH_REPORT,
>     >
>     >       /* This bit implies RARP isn't sent by QEMU out of band */
>     >       VIRTIO_NET_F_GUEST_ANNOUNCE,
>     > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>     > index 277289d56e..afcc3032ec 100644
>     > --- a/hw/net/virtio-net.c
>     > +++ b/hw/net/virtio-net.c
>     > @@ -698,6 +698,19 @@ static void virtio_net_set_queues(VirtIONet *n)
>     >
>     >   static void virtio_net_set_multiqueue(VirtIONet *n, int
>     multiqueue);
>     >
>     > +static uint64_t fix_ebpf_vhost_features(uint64_t features)
>     > +{
>     > +    /* If vhost=on & CONFIG_EBPF doesn't set - disable RSS
>     feature */
>     > +    uint64_t ret = features;
>     > +#ifndef CONFIG_EBPF
>     > +    virtio_clear_feature(&ret, VIRTIO_NET_F_RSS);
>     > +#endif
>     > +    /* for now, there is no solution for populating the hash
>     from eBPF */
>     > +    virtio_clear_feature(&ret, VIRTIO_NET_F_HASH_REPORT);
>
>
>     I think we probably need to do something reverse since RSS is
>     under the
>     control of the qemu cli; disabling features like this may break migration.
>
>
> How by design we add new features to qemu in light of possible 
> migration to older qemu version when the destination
> qemu does not support these features?


There's a machine type definition in qemu, so we only guarantee the
success of migration between the same machine type.

For a new qemu with an old machine type, new features will be disabled.

Thanks


>     We need disable vhost instead when:
>
>     1) eBPF is not supported but RSS is required from command line
>
>     or
>
>     2) HASH_REPORT is required from command line
>
>     > +
>     > +    return ret;
>     > +}
>     > +
>     >   static uint64_t virtio_net_get_features(VirtIODevice *vdev,
>     uint64_t features,
>     >                                           Error **errp)
>     >   {
>     > @@ -732,9 +745,9 @@ static uint64_t
>     virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
>     >           return features;
>     >       }
>     >
>     > -    virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
>     > -    virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
>     > -    features = vhost_net_get_features(get_vhost_net(nc->peer),
>     features);
>     > +    features = fix_ebpf_vhost_features(
>     > + vhost_net_get_features(get_vhost_net(nc->peer), features));
>     > +
>     >       vdev->backend_features = features;
>     >
>     >       if (n->mtu_bypass_backend &&
>     > @@ -1169,12 +1182,75 @@ static int
>     virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
>     >       }
>     >   }
>     >
>     > +static void virtio_net_unload_epbf_rss(VirtIONet *n);
>     > +
>     >   static void virtio_net_disable_rss(VirtIONet *n)
>     >   {
>     >       if (n->rss_data.enabled) {
>     >           trace_virtio_net_rss_disable();
>     >       }
>     >       n->rss_data.enabled = false;
>     > +
>     > +    if (!n->rss_data.enabled_software_rss &&
>     ebpf_rss_is_loaded(&n->ebpf_rss)) {
>     > +        virtio_net_unload_epbf_rss(n);
>     > +    }
>     > +}
>     > +
>     > +static bool virtio_net_attach_steering_ebpf(NICState *nic, int
>     prog_fd)
>     > +{
>     > +    NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
>     > +    if (nc == NULL || nc->info->set_steering_ebpf == NULL) {
>     > +        return false;
>     > +    }
>     > +
>     > +    return nc->info->set_steering_ebpf(nc, prog_fd);
>     > +}
>     > +
>     > +static void rss_data_to_rss_config(struct VirtioNetRssData *data,
>     > +                                   struct EBPFRSSConfig *config)
>     > +{
>     > +    config->redirect = data->redirect;
>     > +    config->populate_hash = data->populate_hash;
>     > +    config->hash_types = data->hash_types;
>     > +    config->indirections_len = data->indirections_len;
>     > +    config->default_queue = data->default_queue;
>     > +}
>     > +
>     > +static bool virtio_net_load_epbf_rss(VirtIONet *n)
>     > +{
>     > +    struct EBPFRSSConfig config = {};
>     > +
>     > +    if (!n->rss_data.enabled) {
>     > +        if (ebpf_rss_is_loaded(&n->ebpf_rss)) {
>     > +            ebpf_rss_unload(&n->ebpf_rss);
>     > +        }
>     > +        return true;
>     > +    }
>     > +
>     > +    if (!ebpf_rss_is_loaded(&n->ebpf_rss) &&
>     !ebpf_rss_load(&n->ebpf_rss)) {
>     > +        return false;
>     > +    }
>     > +
>     > +    rss_data_to_rss_config(&n->rss_data, &config);
>     > +
>     > +    if (!ebpf_rss_set_all(&n->ebpf_rss, &config,
>     > + n->rss_data.indirections_table, n->rss_data.key)) {
>     > +        ebpf_rss_unload(&n->ebpf_rss);
>     > +        return false;
>     > +    }
>     > +
>     > +    if (!virtio_net_attach_steering_ebpf(n->nic,
>     n->ebpf_rss.program_fd)) {
>     > +        ebpf_rss_unload(&n->ebpf_rss);
>     > +        return false;
>     > +    }
>     > +
>     > +    return true;
>     > +}
>     > +
>     > +static void virtio_net_unload_epbf_rss(VirtIONet *n)
>     > +{
>     > +    virtio_net_attach_steering_ebpf(n->nic, -1);
>     > +    ebpf_rss_unload(&n->ebpf_rss);
>     >   }
>     >
>     >   static uint16_t virtio_net_handle_rss(VirtIONet *n,
>     > @@ -1208,6 +1284,7 @@ static uint16_t
>     virtio_net_handle_rss(VirtIONet *n,
>     >           err_value = (uint32_t)s;
>     >           goto error;
>     >       }
>     > +    n->rss_data.enabled_software_rss = false;
>     >       n->rss_data.hash_types = virtio_ldl_p(vdev, &cfg.hash_types);
>     >       n->rss_data.indirections_len =
>     >           virtio_lduw_p(vdev, &cfg.indirection_table_mask);
>     > @@ -1289,9 +1366,30 @@ static uint16_t
>     virtio_net_handle_rss(VirtIONet *n,
>     >           goto error;
>     >       }
>     >       n->rss_data.enabled = true;
>     > +
>     > +    if (!n->rss_data.populate_hash) {
>     > +        /* load EBPF RSS */
>     > +        if (!virtio_net_load_epbf_rss(n)) {
>     > +            /* EBPF mast be loaded for vhost */
>     > +            if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
>     > +                warn_report("Can't load eBPF RSS for vhost");
>     > +                goto error;
>     > +            }
>     > +            /* fallback to software RSS */
>     > +            warn_report("Can't load eBPF RSS - fallback to
>     software RSS");
>     > +            n->rss_data.enabled_software_rss = true;
>     > +        }
>     > +    } else {
>     > +        /* use software RSS for hash populating */
>     > +        /* and unload eBPF if was loaded before */
>     > +        virtio_net_unload_epbf_rss(n);
>     > +        n->rss_data.enabled_software_rss = true;
>     > +    }
>     > +
>     >  trace_virtio_net_rss_enable(n->rss_data.hash_types,
>     >  n->rss_data.indirections_len,
>     >                                   temp.b);
>     > +
>     >       return queues;
>     >   error:
>     >       trace_virtio_net_rss_error(err_msg, err_value);
>     > @@ -1674,7 +1772,7 @@ static ssize_t
>     virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
>     >           return -1;
>     >       }
>     >
>     > -    if (!no_rss && n->rss_data.enabled) {
>     > +    if (!no_rss && n->rss_data.enabled &&
>     n->rss_data.enabled_software_rss) {
>     >           int index = virtio_net_process_rss(nc, buf, size);
>     >           if (index >= 0) {
>     >               NetClientState *nc2 = qemu_get_subqueue(n->nic,
>     index);
>     > @@ -2780,6 +2878,18 @@ static int
>     virtio_net_post_load_device(void *opaque, int version_id)
>     >       }
>     >
>     >       if (n->rss_data.enabled) {
>     > +        n->rss_data.enabled_software_rss =
>     n->rss_data.populate_hash;
>     > +        if (!n->rss_data.populate_hash) {
>     > +            if (!virtio_net_load_epbf_rss(n)) {
>     > +                if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
>     > +                    error_report("Can't post-load eBPF RSS for
>     vhost");
>     > +                } else {
>     > +                    warn_report("Can't post-load eBPF RSS -
>     fallback to software RSS");
>     > + n->rss_data.enabled_software_rss = true;
>     > +                }
>     > +            }
>     > +        }
>     > +
>     >  trace_virtio_net_rss_enable(n->rss_data.hash_types,
>     >  n->rss_data.indirections_len,
>     >  sizeof(n->rss_data.key));
>     > @@ -3453,6 +3563,8 @@ static void
>     virtio_net_instance_init(Object *obj)
>     >       device_add_bootindex_property(obj, &n->nic_conf.bootindex,
>     >                                     "bootindex", "/ethernet-phy@0",
>     >                                     DEVICE(n));
>     > +
>     > +    ebpf_rss_init(&n->ebpf_rss);
>     >   }
>     >
>     >   static int virtio_net_pre_save(void *opaque)
>     > diff --git a/include/hw/virtio/virtio-net.h
>     b/include/hw/virtio/virtio-net.h
>     > index f4852ac27b..4d29a577eb 100644
>     > --- a/include/hw/virtio/virtio-net.h
>     > +++ b/include/hw/virtio/virtio-net.h
>     > @@ -21,6 +21,8 @@
>     >   #include "qemu/option_int.h"
>     >   #include "qom/object.h"
>     >
>     > +#include "ebpf/ebpf_rss.h"
>     > +
>     >   #define TYPE_VIRTIO_NET "virtio-net-device"
>     >   OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET)
>     >
>     > @@ -130,6 +132,7 @@ typedef struct VirtioNetRscChain {
>     >
>     >   typedef struct VirtioNetRssData {
>     >       bool    enabled;
>     > +    bool    enabled_software_rss;
>
>
>     We probably need a better name of this since "software" is kind of
>     confusing.
>
> No problem
>
>
>     >       bool    redirect;
>     >       bool    populate_hash;
>     >       uint32_t hash_types;
>     > @@ -214,6 +217,7 @@ struct VirtIONet {
>     >       Notifier migration_state;
>     >       VirtioNetRssData rss_data;
>     >       struct NetRxPkt *rx_pkt;
>     > +    struct EBPFRSSContext ebpf_rss;
>     >   };
>     >
>     >   void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
>     > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>     > index 99c476db8c..feb5fa8624 100644
>     > --- a/net/vhost-vdpa.c
>     > +++ b/net/vhost-vdpa.c
>     > @@ -54,6 +54,8 @@ const int vdpa_feature_bits[] = {
>     >       VIRTIO_NET_F_MTU,
>     >       VIRTIO_F_IOMMU_PLATFORM,
>     >       VIRTIO_F_RING_PACKED,
>     > +    VIRTIO_NET_F_RSS,
>     > +    VIRTIO_NET_F_HASH_REPORT,
>
>
>     This is fine but looks unrelated to eBPF RSS support.
>
>     Usually it means the hardware can support RSS/hash reporting.
>
> As I see from
> https://github.com/qemu/qemu/blob/master/hw/virtio/vhost.c#L1526:
> if we do not add these features, they are treated as transparent to the
> specific vhost solution, when in fact they are not. We need to ensure
> these features are not in use in the case of vhost-vdpa or vhost-user.
>
>     Thanks
>
>
>     >       VIRTIO_NET_F_GUEST_ANNOUNCE,
>     >       VIRTIO_NET_F_STATUS,
>     >       VHOST_INVALID_FEATURE_BIT
>



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-04  9:31       ` Daniel P. Berrangé
@ 2020-11-05  3:46         ` Jason Wang
  2020-11-05  3:52           ` Jason Wang
  2020-11-05 10:01           ` Daniel P. Berrangé
  0 siblings, 2 replies; 36+ messages in thread
From: Jason Wang @ 2020-11-05  3:46 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Yan Vugenfirer, Yuri Benditovich, Andrew Melnychenko, qemu-devel,
	Michael S . Tsirkin


On 2020/11/4 5:31 PM, Daniel P. Berrangé wrote:
> On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
>> On 2020/11/3 下午6:32, Yuri Benditovich wrote:
>>>
>>> On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
>>> <mailto:jasowang@redhat.com>> wrote:
>>>
>>>
>>>      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
>>>      > Basic idea is to use eBPF to calculate and steer packets in TAP.
>>>      > RSS(Receive Side Scaling) is used to distribute network packets
>>>      to guest virtqueues
>>>      > by calculating packet hash.
>>>      > eBPF RSS allows us to use RSS with vhost TAP.
>>>      >
>>>      > This set of patches introduces the usage of eBPF for packet steering
>>>      > and RSS hash calculation:
>>>      > * RSS(Receive Side Scaling) is used to distribute network packets to
>>>      > guest virtqueues by calculating packet hash
>>>      > * eBPF RSS suppose to be faster than already existing 'software'
>>>      > implementation in QEMU
>>>      > * Additionally adding support for the usage of RSS with vhost
>>>      >
>>>      > Supported kernels: 5.8+
>>>      >
>>>      > Implementation notes:
>>>      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
>>>      > Added eBPF support to qemu directly through a system call, see the
>>>      > bpf(2) for details.
>>>      > The eBPF program is part of the qemu and presented as an array
>>>      of bpf
>>>      > instructions.
>>>      > The program can be recompiled by provided Makefile.ebpf(need to
>>>      adjust
>>>      > 'linuxhdrs'),
>>>      > although it's not required to build QEMU with eBPF support.
>>>      > Added changes to virtio-net and vhost, primary eBPF RSS is used.
>>>      > 'Software' RSS used in the case of hash population and as a
>>>      fallback option.
>>>      > For vhost, the hash population feature is not reported to the guest.
>>>      >
>>>      > Please also see the documentation in PATCH 6/6.
>>>      >
>>>      > I am sending those patches as RFC to initiate the discussions
>>>      and get
>>>      > feedback on the following points:
>>>      > * Fallback when eBPF is not supported by the kernel
>>>
>>>
>>>      Yes, and it could also a lacking of CAP_BPF.
>>>
>>>
>>>      > * Live migration to the kernel that doesn't have eBPF support
>>>
>>>
>>>      Is there anything that we needs special treatment here?
>>>
>>> Possible case: rss=on, vhost=on, source system with kernel 5.8
>>> (everything works) -> dest. system 5.6 (bpf does not work), the adapter
>>> functions, but all the steering does not use proper queues.
>>
>> Right, I think we need to disable vhost on dest.
>>
>>
>>>
>>>
>>>      > * Integration with current QEMU build
>>>
>>>
>>>      Yes, a question here:
>>>
>>>      1) Any reason for not using libbpf, e.g it has been shipped with some
>>>      distros
>>>
>>>
>>> We intentionally do not use libbpf, as it present only on some distros.
>>> We can switch to libbpf, but this will disable bpf if libbpf is not
>>> installed
>>
>> That's better I think.
>>
>>
>>>      2) It would be better if we can avoid shipping bytecodes
>>>
>>>
>>>
>>> This creates new dependencies: llvm + clang + ...
>>> We would prefer byte code and ability to generate it if prerequisites
>>> are installed.
>>
>> It's probably ok if we treat the bytecode as a kind of firmware.
> That is explicitly *not* OK for inclusion in Fedora. They require that
> BPF is compiled from source, and rejected my suggestion that it could
> be considered a kind of firmware and thus have an exception from building
> from source.


Please refer to what was done in DPDK:

http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235

I don't think what is proposed here is any different.

It's still bytecode that lives in an array.


>
>> But in the long run, it's still worthwhile consider the qemu source is used
>> for development and llvm/clang should be a common requirement for generating
>> eBPF bytecode for host.
> So we need to do this right straight way before this merges.


Yes.

Thanks


>
> Regards,
> Daniel




* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-05  3:46         ` Jason Wang
@ 2020-11-05  3:52           ` Jason Wang
  2020-11-05  9:11             ` Yuri Benditovich
  2020-11-05 10:01           ` Daniel P. Berrangé
  1 sibling, 1 reply; 36+ messages in thread
From: Jason Wang @ 2020-11-05  3:52 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Yan Vugenfirer, Yuri Benditovich, Andrew Melnychenko, qemu-devel,
	Michael S . Tsirkin


On 2020/11/5 11:46 AM, Jason Wang wrote:
>>
>> It's probably ok if we treat the bytecode as a kind of firmware.
> That is explicitly *not* OK for inclusion in Fedora. They require that
> BPF is compiled from source, and rejected my suggestion that it could
> be considered a kind of firmware and thus have an exception from building
> from source. 


Actually, there's another advantage. If we treat it as firmware (which 
it actually is), it allows us to upgrade it independently of QEMU.

Thanks




* Re: [RFC PATCH 6/6] docs: Added eBPF documentation.
  2020-11-02 18:51 ` [RFC PATCH 6/6] docs: Added eBPF documentation Andrew Melnychenko
  2020-11-04  3:15   ` Jason Wang
@ 2020-11-05  3:56   ` Jason Wang
  2020-11-05  9:40     ` Yuri Benditovich
  1 sibling, 1 reply; 36+ messages in thread
From: Jason Wang @ 2020-11-05  3:56 UTC (permalink / raw)
  To: Andrew Melnychenko, mst; +Cc: yan, yuri.benditovich, qemu-devel


On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
> From: Andrew<andrew@daynix.com>
>
> Also, added maintainers information.
>
> Signed-off-by: Yuri Benditovich<yuri.benditovich@daynix.com>
> Signed-off-by: Andrew Melnychenko<andrew@daynix.com>
> ---
>   MAINTAINERS       |   6 +++
>   docs/ebpf.rst     |  29 +++++++++++
>   docs/ebpf_rss.rst | 129 ++++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 164 insertions(+)
>   create mode 100644 docs/ebpf.rst
>   create mode 100644 docs/ebpf_rss.rst
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2c22bbca5a..464b3f3c95 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3111,6 +3111,12 @@ S: Maintained
>   F: hw/semihosting/
>   F: include/hw/semihosting/
>   
> +EBPF:
> +M: Andrew Melnychenko<andrew@daynix.com>
> +M: Yuri Benditovich<yuri.benditovich@daynix.com>
> +S: Maintained
> +F: ebpf/*
> +


If it's possible, I would like to be one of the maintainer or at least 
reviewer :)

Thanks




* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-05  3:52           ` Jason Wang
@ 2020-11-05  9:11             ` Yuri Benditovich
  0 siblings, 0 replies; 36+ messages in thread
From: Yuri Benditovich @ 2020-11-05  9:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yan Vugenfirer, Andrew Melnychenko, Daniel P. Berrangé,
	qemu-devel, Michael S . Tsirkin


On Thu, Nov 5, 2020 at 5:52 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/5 上午11:46, Jason Wang wrote:
> >>
> >> It's probably ok if we treat the bytecode as a kind of firmware.
> > That is explicitly *not* OK for inclusion in Fedora. They require that
> > BPF is compiled from source, and rejected my suggestion that it could
> > be considered a kind of firmware and thus have an exception from building
> > from source.
>
>
> Actually, there's another advantages. If we treat it as firmware,
> (actually it is). It allows us to upgrade it independently with qemu.
>
> Hi Jason,
I think having the BPF binary outside of QEMU is a big disadvantage.
It is compiled against common structures (for example, the RSS
configuration) defined in QEMU, and if it is not built into QEMU then
nobody is responsible for the compatibility between the BPF program and
QEMU.
Just the array of instructions (as of today) is ~2K; a full object file
(if we use libbpf) is ~8K, so there is no big problem with the size.
If we keep the entire object in QEMU, it is guaranteed to be 100%
compatible.

Thanks
>
>



* Re: [RFC PATCH 6/6] docs: Added eBPF documentation.
  2020-11-05  3:56   ` Jason Wang
@ 2020-11-05  9:40     ` Yuri Benditovich
  0 siblings, 0 replies; 36+ messages in thread
From: Yuri Benditovich @ 2020-11-05  9:40 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yan Vugenfirer, Andrew Melnychenko, qemu-devel, Michael S . Tsirkin


On Thu, Nov 5, 2020 at 5:56 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > From: Andrew<andrew@daynix.com>
> >
> > Also, added maintainers information.
> >
> > Signed-off-by: Yuri Benditovich<yuri.benditovich@daynix.com>
> > Signed-off-by: Andrew Melnychenko<andrew@daynix.com>
> > ---
> >   MAINTAINERS       |   6 +++
> >   docs/ebpf.rst     |  29 +++++++++++
> >   docs/ebpf_rss.rst | 129 ++++++++++++++++++++++++++++++++++++++++++++++
> >   3 files changed, 164 insertions(+)
> >   create mode 100644 docs/ebpf.rst
> >   create mode 100644 docs/ebpf_rss.rst
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 2c22bbca5a..464b3f3c95 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3111,6 +3111,12 @@ S: Maintained
> >   F: hw/semihosting/
> >   F: include/hw/semihosting/
> >
> > +EBPF:
> > +M: Andrew Melnychenko<andrew@daynix.com>
> > +M: Yuri Benditovich<yuri.benditovich@daynix.com>
> > +S: Maintained
> > +F: ebpf/*
> > +
>
>
> If it's possible, I would like to be one of the maintainer or at least
> reviewer :)
>
> With pleasure. We did not know who would want to maintain eBPF-related
things, so we added ourselves as maintainers.
If you agree, we'll place you as a maintainer and ourselves as reviewers,
so we are informed about changes before they happen.

Thanks



> Thanks
>
>



* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-05  3:46         ` Jason Wang
  2020-11-05  3:52           ` Jason Wang
@ 2020-11-05 10:01           ` Daniel P. Berrangé
  2020-11-05 13:19             ` Daniel P. Berrangé
  1 sibling, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2020-11-05 10:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yan Vugenfirer, Yuri Benditovich, Andrew Melnychenko, qemu-devel,
	Michael S . Tsirkin

On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
> 
> On 2020/11/4 下午5:31, Daniel P. Berrangé wrote:
> > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
> > > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> > > > 
> > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
> > > > <mailto:jasowang@redhat.com>> wrote:
> > > > 
> > > > 
> > > >      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > > >      > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > > >      > RSS(Receive Side Scaling) is used to distribute network packets
> > > >      to guest virtqueues
> > > >      > by calculating packet hash.
> > > >      > eBPF RSS allows us to use RSS with vhost TAP.
> > > >      >
> > > >      > This set of patches introduces the usage of eBPF for packet steering
> > > >      > and RSS hash calculation:
> > > >      > * RSS(Receive Side Scaling) is used to distribute network packets to
> > > >      > guest virtqueues by calculating packet hash
> > > >      > * eBPF RSS suppose to be faster than already existing 'software'
> > > >      > implementation in QEMU
> > > >      > * Additionally adding support for the usage of RSS with vhost
> > > >      >
> > > >      > Supported kernels: 5.8+
> > > >      >
> > > >      > Implementation notes:
> > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> > > >      > Added eBPF support to qemu directly through a system call, see the
> > > >      > bpf(2) for details.
> > > >      > The eBPF program is part of the qemu and presented as an array
> > > >      of bpf
> > > >      > instructions.
> > > >      > The program can be recompiled by provided Makefile.ebpf(need to
> > > >      adjust
> > > >      > 'linuxhdrs'),
> > > >      > although it's not required to build QEMU with eBPF support.
> > > >      > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> > > >      > 'Software' RSS used in the case of hash population and as a
> > > >      fallback option.
> > > >      > For vhost, the hash population feature is not reported to the guest.
> > > >      >
> > > >      > Please also see the documentation in PATCH 6/6.
> > > >      >
> > > >      > I am sending those patches as RFC to initiate the discussions
> > > >      and get
> > > >      > feedback on the following points:
> > > >      > * Fallback when eBPF is not supported by the kernel
> > > > 
> > > > 
> > > >      Yes, and it could also a lacking of CAP_BPF.
> > > > 
> > > > 
> > > >      > * Live migration to the kernel that doesn't have eBPF support
> > > > 
> > > > 
> > > >      Is there anything that we needs special treatment here?
> > > > 
> > > > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > > > (everything works) -> dest. system 5.6 (bpf does not work), the adapter
> > > > functions, but all the steering does not use proper queues.
> > > 
> > > Right, I think we need to disable vhost on dest.
> > > 
> > > 
> > > > 
> > > > 
> > > >      > * Integration with current QEMU build
> > > > 
> > > > 
> > > >      Yes, a question here:
> > > > 
> > > >      1) Any reason for not using libbpf, e.g it has been shipped with some
> > > >      distros
> > > > 
> > > > 
> > > > We intentionally do not use libbpf, as it present only on some distros.
> > > > We can switch to libbpf, but this will disable bpf if libbpf is not
> > > > installed
> > > 
> > > That's better I think.
> > > 
> > > 
> > > >      2) It would be better if we can avoid shipping bytecodes
> > > > 
> > > > 
> > > > 
> > > > This creates new dependencies: llvm + clang + ...
> > > > We would prefer byte code and ability to generate it if prerequisites
> > > > are installed.
> > > 
> > > It's probably ok if we treat the bytecode as a kind of firmware.
> > That is explicitly *not* OK for inclusion in Fedora. They require that
> > BPF is compiled from source, and rejected my suggestion that it could
> > be considered a kind of firmware and thus have an exception from building
> > from source.
> 
> 
> Please refer what it was done in DPDK:
> 
> http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
> 
> I don't think what proposed here makes anything different.

I'm not convinced that what DPDK does is acceptable to Fedora either,
based on the responses I've received when asking about BPF handling
during build.  It wouldn't surprise me, however, if this was simply
missed by reviewers when accepting DPDK into Fedora, because it is
not entirely obvious unless you are looking closely.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-05 10:01           ` Daniel P. Berrangé
@ 2020-11-05 13:19             ` Daniel P. Berrangé
  2020-11-05 15:13               ` Yuri Benditovich
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2020-11-05 13:19 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yan Vugenfirer, Yuri Benditovich, Andrew Melnychenko, qemu-devel,
	Michael S . Tsirkin

On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
> > 
> > On 2020/11/4 下午5:31, Daniel P. Berrangé wrote:
> > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
> > > > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> > > > > 
> > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
> > > > > <mailto:jasowang@redhat.com>> wrote:
> > > > > 
> > > > > 
> > > > >      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > > > >      > Basic idea is to use eBPF to calculate and steer packets in TAP.
> > > > >      > RSS(Receive Side Scaling) is used to distribute network packets
> > > > >      to guest virtqueues
> > > > >      > by calculating packet hash.
> > > > >      > eBPF RSS allows us to use RSS with vhost TAP.
> > > > >      >
> > > > >      > This set of patches introduces the usage of eBPF for packet steering
> > > > >      > and RSS hash calculation:
> > > > >      > * RSS(Receive Side Scaling) is used to distribute network packets to
> > > > >      > guest virtqueues by calculating packet hash
> > > > >      > * eBPF RSS suppose to be faster than already existing 'software'
> > > > >      > implementation in QEMU
> > > > >      > * Additionally adding support for the usage of RSS with vhost
> > > > >      >
> > > > >      > Supported kernels: 5.8+
> > > > >      >
> > > > >      > Implementation notes:
> > > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> > > > >      > Added eBPF support to qemu directly through a system call, see the
> > > > >      > bpf(2) for details.
> > > > >      > The eBPF program is part of the qemu and presented as an array
> > > > >      of bpf
> > > > >      > instructions.
> > > > >      > The program can be recompiled by provided Makefile.ebpf(need to
> > > > >      adjust
> > > > >      > 'linuxhdrs'),
> > > > >      > although it's not required to build QEMU with eBPF support.
> > > > >      > Added changes to virtio-net and vhost, primary eBPF RSS is used.
> > > > >      > 'Software' RSS used in the case of hash population and as a
> > > > >      fallback option.
> > > > >      > For vhost, the hash population feature is not reported to the guest.
> > > > >      >
> > > > >      > Please also see the documentation in PATCH 6/6.
> > > > >      >
> > > > >      > I am sending those patches as RFC to initiate the discussions
> > > > >      and get
> > > > >      > feedback on the following points:
> > > > >      > * Fallback when eBPF is not supported by the kernel
> > > > > 
> > > > > 
> > > > >      Yes, and it could also a lacking of CAP_BPF.
> > > > > 
> > > > > 
> > > > >      > * Live migration to the kernel that doesn't have eBPF support
> > > > > 
> > > > > 
> > > > >      Is there anything that we needs special treatment here?
> > > > > 
> > > > > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > > > > (everything works) -> dest. system 5.6 (bpf does not work), the adapter
> > > > > functions, but all the steering does not use proper queues.
> > > > 
> > > > Right, I think we need to disable vhost on dest.
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > >      > * Integration with current QEMU build
> > > > > 
> > > > > 
> > > > >      Yes, a question here:
> > > > > 
> > > > >      1) Any reason for not using libbpf, e.g it has been shipped with some
> > > > >      distros
> > > > > 
> > > > > 
> > > > > We intentionally do not use libbpf, as it present only on some distros.
> > > > > We can switch to libbpf, but this will disable bpf if libbpf is not
> > > > > installed
> > > > 
> > > > That's better I think.
> > > > 
> > > > 
> > > > >      2) It would be better if we can avoid shipping bytecodes
> > > > > 
> > > > > 
> > > > > 
> > > > > This creates new dependencies: llvm + clang + ...
> > > > > We would prefer byte code and ability to generate it if prerequisites
> > > > > are installed.
> > > > 
> > > > It's probably ok if we treat the bytecode as a kind of firmware.
> > > That is explicitly *not* OK for inclusion in Fedora. They require that
> > > BPF is compiled from source, and rejected my suggestion that it could
> > > be considered a kind of firmware and thus have an exception from building
> > > from source.
> > 
> > 
> > Please refer what it was done in DPDK:
> > 
> > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
> > 
> > I don't think what proposed here makes anything different.
> 
> I'm not convinced that what DPDK does is acceptable to Fedora either
> based on the responses I've received when asking about BPF handling
> during build.  I wouldn't suprise me, however, if this was simply
> missed by reviewers when accepting DPDK into Fedora, because it is
> not entirely obvious unless you are looking closely.

FWIW, I'm pushing back against the idea that we have to compile the
BPF code from master source, as I think it is reasonable to have the
program embedded as a static array in the source code similar to what
DPDK does.  It doesn't feel much different from other places where apps
use generated sources, and don't build them from the original source
every time. eg "configure" is never re-generated from "configure.ac"
by Fedora packagers, they just use the generated "configure" script
as-is.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-05 13:19             ` Daniel P. Berrangé
@ 2020-11-05 15:13               ` Yuri Benditovich
  2020-11-09  2:13                 ` Jason Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Yuri Benditovich @ 2020-11-05 15:13 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Yan Vugenfirer, Jason Wang, Michael S . Tsirkin,
	Andrew Melnychenko, qemu-devel


First of all, thank you for all your feedbacks

Please help me to summarize and let us understand better what we do in v2:
Major questions are:
1. Building eBPF from source during qemu build vs. regenerating it on
demand and keeping in the repository
Solution 1a (~ as in v1): keep the instructions or the ELF in a header
file, generated outside of the QEMU build. In general we'll need both BE
and LE binaries.
Solution 1b: build the ELF or instructions during the QEMU build if llvm +
clang are present. Then we will have only one binary (BE or LE, depending
on the current QEMU build).
We agree with any solution - I believe you know the requirements better.

2. Use libbpf or not
In general we do not see any advantage in using libbpf. It works with
object files (it does the ELF parsing at load time), but it does not do
any magic.
Solution 2a. Switch to libbpf, generate object files (LE and BE) from
source, keep them inside QEMU (~8K each) or outside of it.
Solution 2b. (as in v1) Use a Python script to parse the object file into
instructions (~2K each).
We'd prefer not to use libbpf at the moment.
If for some reason we find it useful in the future, we can switch to it
without creating any incompatibility, but it will create a dependency on
libbpf.so.

3. Keep instructions or ELF inside QEMU or as separate external file
Solution 3a (~ as in v1): a built-in array of instructions or an ELF. If
we generate them outside of the QEMU build, we keep two arrays of
instructions or ELF objects (BE and LE).
Solution 3b: install them as separate files (/usr/share/qemu).
We'd prefer 3a:
 There is then a guarantee that the eBPF is built with exactly the same
config structures as QEMU (QEMU creates a mapping of its structures, and
the eBPF program uses them).
 No need to handle scenarios like 'file not found', 'file is not
suitable', etc.

4. Is there a real demand for eBPF on big-endian systems?
If not, we can enable eBPF only for LE builds.

Jason, Daniel, Michael
Can you please let us know what you think and why?

On Thu, Nov 5, 2020 at 3:19 PM Daniel P. Berrangé <berrange@redhat.com>
wrote:

> On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé wrote:
> > On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
> > >
> > > On 2020/11/4 下午5:31, Daniel P. Berrangé wrote:
> > > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang wrote:
> > > > > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
> > > > > >
> > > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang <jasowang@redhat.com
> > > > > > <mailto:jasowang@redhat.com>> wrote:
> > > > > >
> > > > > >
> > > > > >      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
> > > > > >      > Basic idea is to use eBPF to calculate and steer packets
> in TAP.
> > > > > >      > RSS(Receive Side Scaling) is used to distribute network
> packets
> > > > > >      to guest virtqueues
> > > > > >      > by calculating packet hash.
> > > > > >      > eBPF RSS allows us to use RSS with vhost TAP.
> > > > > >      >
> > > > > >      > This set of patches introduces the usage of eBPF for
> packet steering
> > > > > >      > and RSS hash calculation:
> > > > > >      > * RSS(Receive Side Scaling) is used to distribute network
> packets to
> > > > > >      > guest virtqueues by calculating packet hash
> > > > > >      > * eBPF RSS suppose to be faster than already existing
> 'software'
> > > > > >      > implementation in QEMU
> > > > > >      > * Additionally adding support for the usage of RSS with
> vhost
> > > > > >      >
> > > > > >      > Supported kernels: 5.8+
> > > > > >      >
> > > > > >      > Implementation notes:
> > > > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the
> eBPF program.
> > > > > >      > Added eBPF support to qemu directly through a system
> call, see the
> > > > > >      > bpf(2) for details.
> > > > > >      > The eBPF program is part of the qemu and presented as an
> array
> > > > > >      of bpf
> > > > > >      > instructions.
> > > > > >      > The program can be recompiled by provided
> Makefile.ebpf(need to
> > > > > >      adjust
> > > > > >      > 'linuxhdrs'),
> > > > > >      > although it's not required to build QEMU with eBPF
> support.
> > > > > >      > Added changes to virtio-net and vhost, primary eBPF RSS
> is used.
> > > > > >      > 'Software' RSS used in the case of hash population and as
> a
> > > > > >      fallback option.
> > > > > >      > For vhost, the hash population feature is not reported to
> the guest.
> > > > > >      >
> > > > > >      > Please also see the documentation in PATCH 6/6.
> > > > > >      >
> > > > > >      > I am sending those patches as RFC to initiate the
> discussions
> > > > > >      and get
> > > > > >      > feedback on the following points:
> > > > > >      > * Fallback when eBPF is not supported by the kernel
> > > > > >
> > > > > >
> > > > > >      Yes, and it could also a lacking of CAP_BPF.
> > > > > >
> > > > > >
> > > > > >      > * Live migration to the kernel that doesn't have eBPF
> support
> > > > > >
> > > > > >
> > > > > >      Is there anything that we needs special treatment here?
> > > > > >
> > > > > > Possible case: rss=on, vhost=on, source system with kernel 5.8
> > > > > > (everything works) -> dest. system 5.6 (bpf does not work), the
> adapter
> > > > > > functions, but all the steering does not use proper queues.
> > > > >
> > > > > Right, I think we need to disable vhost on dest.
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >      > * Integration with current QEMU build
> > > > > >
> > > > > >
> > > > > >      Yes, a question here:
> > > > > >
> > > > > >      1) Any reason for not using libbpf, e.g it has been shipped
> with some
> > > > > >      distros
> > > > > >
> > > > > >
> > > > > > We intentionally do not use libbpf, as it present only on some
> distros.
> > > > > > We can switch to libbpf, but this will disable bpf if libbpf is
> not
> > > > > > installed
> > > > >
> > > > > That's better I think.
> > > > >
> > > > >
> > > > > >      2) It would be better if we can avoid shipping bytecodes
> > > > > >
> > > > > >
> > > > > >
> > > > > > This creates new dependencies: llvm + clang + ...
> > > > > > We would prefer byte code and ability to generate it if
> prerequisites
> > > > > > are installed.
> > > > >
> > > > > It's probably ok if we treat the bytecode as a kind of firmware.
> > > > That is explicitly *not* OK for inclusion in Fedora. They require
> that
> > > > BPF is compiled from source, and rejected my suggestion that it could
> > > > be considered a kind of firmware and thus have an exception from
> building
> > > > from source.
> > >
> > >
> > > Please refer what it was done in DPDK:
> > >
> > > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
> > >
> > > I don't think what proposed here makes anything different.
> >
> > I'm not convinced that what DPDK does is acceptable to Fedora either
> > based on the responses I've received when asking about BPF handling
> > during build.  I wouldn't suprise me, however, if this was simply
> > missed by reviewers when accepting DPDK into Fedora, because it is
> > not entirely obvious unless you are looking closely.
>
> FWIW, I'm pushing back against the idea that we have to compile the
> BPF code from master source, as I think it is reasonable to have the
> program embedded as a static array in the source code similar to what
> DPDK does.  It doesn't feel much different from other places where apps
> use generated sources, and don't build them from the original source
> every time. eg "configure" is never re-generated from "configure.ac"
> by Fedora packagers, they just use the generated "configure" script
> as-is.
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-
> https://www.instagram.com/dberrange :|
>
>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-05 15:13               ` Yuri Benditovich
@ 2020-11-09  2:13                 ` Jason Wang
  2020-11-09 13:33                   ` Yuri Benditovich
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2020-11-09  2:13 UTC (permalink / raw)
  To: Yuri Benditovich, Daniel P. Berrangé
  Cc: Yan Vugenfirer, Andrew Melnychenko, qemu-devel, Michael S . Tsirkin


On 2020/11/5 下午11:13, Yuri Benditovich wrote:
> First of all, thank you for all your feedbacks
>
> Please help me to summarize and let us understand better what we do in v2:
> Major questions are:
> 1. Building eBPF from source during qemu build vs. regenerating it on 
> demand and keeping in the repository
> Solution 1a (~ as in v1): keep instructions or ELF in H file, generate 
> it out of qemu build. In general we'll need to have BE and LE binaries.
> Solution 1b: build ELF or instructions during QEMU build if llvm + 
> clang exist. Then we will have only one (BE or LE, depending on 
> current QEMU build)
> We agree with any solution - I believe you know the requirements better.


I think we can go with 1a. (See Daniel's comment)


>
> 2. Use libbpf or not
> In general we do not see any advantage of using libbpf. It works with 
> object files (does ELF parsing at time of loading), but it does not do 
> any magic.
> Solution 2a. Switch to libbpf, generate object files (LE and BE) from 
> source, keep them inside QEMU (~8k each) or aside


Can we simply use dynamic linking here?


> Solution 2b. (as in v1) Use python script to parse object -> 
> instructions (~2k each)
> We'd prefer not to use libbpf at the moment.
> If due to some unknown reason we'll find it useful in future, we can 
> switch to it, this does not create any incompatibility. Then this will 
> create a dependency on libbpf.so


I think we need to care about compatibility. E.g. we may need to enable BTF,
and I don't know how hard it would be to add BTF support to the current
design. It would probably be OK if it's not a lot of effort.


>
> 3. Keep instructions or ELF inside QEMU or as separate external file
> Solution 3a (~as in v1): Built-in array of instructions or ELF. If we 
> generate them out of QEMU build - keep 2 arrays or instructions or ELF 
> (BE and LE),
> Solution 3b: Install them as separate files (/usr/share/qemu).
> We'd prefer 3a:
>  Then there is a guarantee that the eBPF is built with exactly the 
> same config structures as QEMU (qemu creates a mapping of its 
> structures, eBPF uses them).
>  No need to take care on scenarios like 'file not found', 'file is not 
> suitable' etc


Yes, let's go 3a for upstream.


>
> 4. Is there some real request to have the eBPF for big-endian?
> If no, we can enable eBPF only for LE builds


We can go with LE first.

Thanks
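As background for the "calculating packet hash" discussed above: RSS distributes packets using the Toeplitz hash. The sketch below is illustrative only — it is not the QEMU or eBPF implementation, and the 40-byte key is the well-known Microsoft default RSS key, used here purely as an example:

```python
# Toeplitz hash sketch: for every set bit of the input, XOR in the
# 32-bit window of the key starting at that bit's offset.
# Illustrative only -- not QEMU's actual eBPF code.

# Well-known default RSS key from Microsoft's RSS documentation.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)

def toeplitz_hash(data: bytes, key: bytes = RSS_KEY) -> int:
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):                   # walk bits MSB-first
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result
```

A virtqueue is then chosen by indexing an indirection table with the hash, roughly `queue = indirection_table[hash % len(indirection_table)]`.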






* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-09  2:13                 ` Jason Wang
@ 2020-11-09 13:33                   ` Yuri Benditovich
  2020-11-10  2:23                     ` Jason Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Yuri Benditovich @ 2020-11-09 13:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yan Vugenfirer, Andrew Melnychenko, Daniel P. Berrangé,
	qemu-devel, Michael S . Tsirkin

[-- Attachment #1: Type: text/plain, Size: 10267 bytes --]

On Mon, Nov 9, 2020 at 4:14 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/5 下午11:13, Yuri Benditovich wrote:
> > First of all, thank you for all your feedbacks
> >
> > Please help me to summarize and let us understand better what we do in
> v2:
> > Major questions are:
> > 1. Building eBPF from source during qemu build vs. regenerating it on
> > demand and keeping in the repository
> > Solution 1a (~ as in v1): keep instructions or ELF in H file, generate
> > it out of qemu build. In general we'll need to have BE and LE binaries.
> > Solution 1b: build ELF or instructions during QEMU build if llvm +
> > clang exist. Then we will have only one (BE or LE, depending on
> > current QEMU build)
> > We agree with any solution - I believe you know the requirements better.
>
>
> I think we can go with 1a. (See Daniel's comment)
>
>
> >
> > 2. Use libbpf or not
> > In general we do not see any advantage of using libbpf. It works with
> > object files (does ELF parsing at time of loading), but it does not do
> > any magic.
> > Solution 2a. Switch to libbpf, generate object files (LE and BE) from
> > source, keep them inside QEMU (~8k each) or aside
>
>
> Can we simply use dynamic linking here?
>
>
Can you please explain where exactly you suggest using dynamic linking?


>
> > Solution 2b. (as in v1) Use python script to parse object ->
> > instructions (~2k each)
> > We'd prefer not to use libbpf at the moment.
> > If due to some unknown reason we'll find it useful in future, we can
> > switch to it, this does not create any incompatibility. Then this will
> > create a dependency on libbpf.so
>
>
> I think we need to care about compatibility. E.g. we may need to enable BTF,
> and I don't know how hard it would be to add BTF support to the current
> design. It would probably be OK if it's not a lot of effort.
>
>
As far as we understand, BTF helps with BPF debugging, and libbpf supports
it as-is.
Without libbpf, in v1 we load only the BPF instructions.
If you think BTF is mandatory (BTW, why?), I think it is better to
switch to libbpf and keep the entire ELF in the QEMU data.
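For reference, the "BPF instructions" loaded in v1 are fixed 8-byte records. A hedged sketch of decoding one — the field layout follows the kernel's `struct bpf_insn`, and a little-endian build is assumed, matching the LE-only plan discussed below:

```python
import struct

# Each eBPF instruction is 8 bytes:
#   u8 opcode | u8 dst_reg:4,src_reg:4 | s16 offset | s32 immediate
# On little-endian, dst_reg is the low nibble of the register byte.
def decode_insn(raw: bytes) -> dict:
    code, regs, off, imm = struct.unpack("<BBhi", raw)
    return {
        "code": code,
        "dst_reg": regs & 0x0F,
        "src_reg": regs >> 4,
        "off": off,
        "imm": imm,
    }

# Tiny example program: BPF_MOV64_IMM(r0, 0) then BPF_EXIT_INSN()
prog = struct.pack("<BBhi", 0xB7, 0x00, 0, 0) + \
       struct.pack("<BBhi", 0x95, 0x00, 0, 0)
```

This is the kind of raw array a Python script can extract from the compiled ELF object.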





* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-09 13:33                   ` Yuri Benditovich
@ 2020-11-10  2:23                     ` Jason Wang
  2020-11-10  8:00                       ` Yuri Benditovich
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2020-11-10  2:23 UTC (permalink / raw)
  To: Yuri Benditovich
  Cc: Yan Vugenfirer, Andrew Melnychenko, Daniel P. Berrangé,
	qemu-devel, Michael S . Tsirkin


On 2020/11/9 下午9:33, Yuri Benditovich wrote:
>
>
> On Mon, Nov 9, 2020 at 4:14 AM Jason Wang <jasowang@redhat.com 
> <mailto:jasowang@redhat.com>> wrote:
>
>
>     On 2020/11/5 下午11:13, Yuri Benditovich wrote:
>     > First of all, thank you for all your feedbacks
>     >
>     > Please help me to summarize and let us understand better what we
>     do in v2:
>     > Major questions are:
>     > 1. Building eBPF from source during qemu build vs. regenerating
>     it on
>     > demand and keeping in the repository
>     > Solution 1a (~ as in v1): keep instructions or ELF in H file,
>     generate
>     > it out of qemu build. In general we'll need to have BE and LE
>     binaries.
>     > Solution 1b: build ELF or instructions during QEMU build if llvm +
>     > clang exist. Then we will have only one (BE or LE, depending on
>     > current QEMU build)
>     > We agree with any solution - I believe you know the requirements
>     better.
>
>
>     I think we can go with 1a. (See Daniel's comment)
>
>
>     >
>     > 2. Use libbpf or not
>     > In general we do not see any advantage of using libbpf. It works
>     with
>     > object files (does ELF parsing at time of loading), but it does
>     not do
>     > any magic.
>     > Solution 2a. Switch to libbpf, generate object files (LE and BE)
>     from
>     > source, keep them inside QEMU (~8k each) or aside
>
>
>     Can we simply use dynamic linking here?
>
>
> Can you please explain where exactly you suggest using dynamic linking?


Yes. If I understand your 2a properly, you meant static linking of
libbpf. So what I want to ask about is the possibility of dynamically
linking libbpf here.


>
>     > Solution 2b. (as in v1) Use python script to parse object ->
>     > instructions (~2k each)
>     > We'd prefer not to use libbpf at the moment.
>     > If due to some unknown reason we'll find it useful in future, we
>     can
>     > switch to it, this does not create any incompatibility. Then
>     this will
>     > create a dependency on libbpf.so
>
>
>     I think we need to care about compatibility. E.g. we may need to
>     enable BTF, and I don't know how hard it would be to add BTF support
>     to the current design. It would probably be OK if it's not a lot of
>     effort.
>
>
> As far as we understand BTF helps in BPF debugging and libbpf supports 
> it as is.
> Without libbpf we in v1 load the BPF instructions only.
> If you think the BTF is mandatory (BTW, why?) I think it is better to 
> switch to libbpf and keep the entire ELF in the qemu data.


It is used to make sure the BPF program can be compiled once and run
everywhere (CO-RE).

This is explained in detail here:
https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html.

Thanks
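To make the CO-RE/BTF point concrete, here is a toy model of the problem it solves — this is not the libbpf API, and the layouts and offsets are invented for illustration. Bytecode compiled against one kernel embeds struct field offsets that can differ on the kernel it eventually runs on; BTF lets the loader rewrite them at load time:

```python
# Toy illustration of CO-RE-style relocation -- hypothetical layouts,
# not real kernel struct offsets.
build_layout = {"queue_mapping": 4}  # offsets on the kernel compiled against
run_layout   = {"queue_mapping": 8}  # offsets on the kernel actually running

def relocate(compiled_offset: int, field: str) -> int:
    # Without BTF, the program would read at compiled_offset and get the
    # wrong bytes on a kernel with a different layout. With BTF, the
    # loader knows which field the offset refers to and rewrites it to
    # the running kernel's value.
    assert build_layout[field] == compiled_offset
    return run_layout[field]
```

Loading raw instruction arrays without BTF skips this fix-up, which is why the bytecode is tied to the structures it was built against.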


>
>
>     >
>     > 3. Keep instructions or ELF inside QEMU or as separate external file
>     > Solution 3a (~as in v1): Built-in array of instructions or ELF.
>     If we
>     > generate them out of QEMU build - keep 2 arrays or instructions
>     or ELF
>     > (BE and LE),
>     > Solution 3b: Install them as separate files (/usr/share/qemu).
>     > We'd prefer 3a:
>     >  Then there is a guarantee that the eBPF is built with exactly the
>     > same config structures as QEMU (qemu creates a mapping of its
>     > structures, eBPF uses them).
>     >  No need to take care on scenarios like 'file not found', 'file
>     is not
>     > suitable' etc
>
>
>     Yes, let's go 3a for upstream.
>
>
>     >
>     > 4. Is there some real request to have the eBPF for big-endian?
>     > If no, we can enable eBPF only for LE builds
>
>
>     We can go with LE first.
>
>     Thanks
>
>
>     >
>     > Jason, Daniel, Michael
>     > Can you please let us know what you think and why?
>     >
>     > On Thu, Nov 5, 2020 at 3:19 PM Daniel P. Berrangé
>     <berrange@redhat.com <mailto:berrange@redhat.com>
>     > <mailto:berrange@redhat.com <mailto:berrange@redhat.com>>> wrote:
>     >
>     >     On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé
>     wrote:
>     >     > On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
>     >     > >
>     >     > > On 2020/11/4 下午5:31, Daniel P. Berrangé wrote:
>     >     > > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang
>     wrote:
>     >     > > > > On 2020/11/3 下午6:32, Yuri Benditovich wrote:
>     >     > > > > >
>     >     > > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang
>     >     <jasowang@redhat.com <mailto:jasowang@redhat.com>
>     <mailto:jasowang@redhat.com <mailto:jasowang@redhat.com>>
>     >     > > > > > <mailto:jasowang@redhat.com
>     <mailto:jasowang@redhat.com>
>     >     <mailto:jasowang@redhat.com <mailto:jasowang@redhat.com>>>>
>     wrote:
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >      On 2020/11/3 上午2:51, Andrew Melnychenko wrote:
>     >     > > > > >      > Basic idea is to use eBPF to calculate and
>     steer
>     >     packets in TAP.
>     >     > > > > >      > RSS(Receive Side Scaling) is used to distribute
>     >     network packets
>     >     > > > > >      to guest virtqueues
>     >     > > > > >      > by calculating packet hash.
>     >     > > > > >      > eBPF RSS allows us to use RSS with vhost TAP.
>     >     > > > > >      >
>     >     > > > > >      > This set of patches introduces the usage of
>     eBPF
>     >     for packet steering
>     >     > > > > >      > and RSS hash calculation:
>     >     > > > > >      > * RSS(Receive Side Scaling) is used to
>     distribute
>     >     network packets to
>     >     > > > > >      > guest virtqueues by calculating packet hash
>     >     > > > > >      > * eBPF RSS suppose to be faster than already
>     >     existing 'software'
>     >     > > > > >      > implementation in QEMU
>     >     > > > > >      > * Additionally adding support for the usage of
>     >     RSS with vhost
>     >     > > > > >      >
>     >     > > > > >      > Supported kernels: 5.8+
>     >     > > > > >      >
>     >     > > > > >      > Implementation notes:
>     >     > > > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to
>     >     set the eBPF program.
>     >     > > > > >      > Added eBPF support to qemu directly through a
>     >     system call, see the
>     >     > > > > >      > bpf(2) for details.
>     >     > > > > >      > The eBPF program is part of the qemu and
>     >     presented as an array
>     >     > > > > >      of bpf
>     >     > > > > >      > instructions.
>     >     > > > > >      > The program can be recompiled by provided
>     >     Makefile.ebpf(need to
>     >     > > > > >      adjust
>     >     > > > > >      > 'linuxhdrs'),
>     >     > > > > >      > although it's not required to build QEMU with
>     >     eBPF support.
>     >     > > > > >      > Added changes to virtio-net and vhost, primary
>     >     eBPF RSS is used.
>     >     > > > > >      > 'Software' RSS used in the case of hash
>     >     population and as a
>     >     > > > > >      fallback option.
>     >     > > > > >      > For vhost, the hash population feature is not
>     >     reported to the guest.
>     >     > > > > >      >
>     >     > > > > >      > Please also see the documentation in PATCH 6/6.
>     >     > > > > >      >
>     >     > > > > >      > I am sending those patches as RFC to
>     initiate the
>     >     discussions
>     >     > > > > >      and get
>     >     > > > > >      > feedback on the following points:
>     >     > > > > >      > * Fallback when eBPF is not supported by
>     the kernel
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >      Yes, and it could also a lacking of CAP_BPF.
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >      > * Live migration to the kernel that doesn't
>     have
>     >     eBPF support
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >      Is there anything that we needs special
>     treatment here?
>     >     > > > > >
>     >     > > > > > Possible case: rss=on, vhost=on, source system with
>     >     kernel 5.8
>     >     > > > > > (everything works) -> dest. system 5.6 (bpf does not
>     >     work), the adapter
>     >     > > > > > functions, but all the steering does not use
>     proper queues.
>     >     > > > >
>     >     > > > > Right, I think we need to disable vhost on dest.
>     >     > > > >
>     >     > > > >
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >      > * Integration with current QEMU build
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >      Yes, a question here:
>     >     > > > > >
>     >     > > > > >      1) Any reason for not using libbpf, e.g it
>     has been
>     >     shipped with some
>     >     > > > > >      distros
>     >     > > > > >
>     >     > > > > >
>     >     > > > > > We intentionally do not use libbpf, as it present only
>     >     on some distros.
>     >     > > > > > We can switch to libbpf, but this will disable bpf if
>     >     libbpf is not
>     >     > > > > > installed
>     >     > > > >
>     >     > > > > That's better I think.
>     >     > > > >
>     >     > > > >
>     >     > > > > >      2) It would be better if we can avoid shipping
>     >     bytecodes
>     >     > > > > >
>     >     > > > > >
>     >     > > > > >
>     >     > > > > > This creates new dependencies: llvm + clang + ...
>     >     > > > > > We would prefer byte code and ability to generate
>     it if
>     >     prerequisites
>     >     > > > > > are installed.
>     >     > > > >
>     >     > > > > It's probably ok if we treat the bytecode as a kind of
>     >     firmware.
>     >     > > > That is explicitly *not* OK for inclusion in Fedora. They
>     >     require that
>     >     > > > BPF is compiled from source, and rejected my
>     suggestion that
>     >     it could
>     >     > > > be considered a kind of firmware and thus have an
>     exception
>     >     from building
>     >     > > > from source.
>     >     > >
>     >     > >
>     >     > > Please refer what it was done in DPDK:
>     >     > >
>     >     > > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
>     >     > >
>     >     > > I don't think what proposed here makes anything different.
>     >     >
>     >     > I'm not convinced that what DPDK does is acceptable to
>     Fedora either
>     >     > based on the responses I've received when asking about BPF
>     handling
>     >     > during build.  It wouldn't surprise me, however, if this was
>     simply
>     >     > missed by reviewers when accepting DPDK into Fedora,
>     because it is
>     >     > not entirely obvious unless you are looking closely.
>     >
>     >     FWIW, I'm pushing back against the idea that we have to
>     compile the
>     >     BPF code from master source, as I think it is reasonable to
>     have the
>     >     program embedded as a static array in the source code
>     similar to what
>     >     DPDK does.  It doesn't feel much different from other places
>     where
>     >     apps
>     >     use generated sources, and don't build them from the
>     original source
>     >     every time. E.g. "configure" is never re-generated from
>     >     "configure.ac"
>     >     by Fedora packagers, they just use the generated "configure"
>     script
>     >     as-is.
>     >
>     >     Regards,
>     >     Daniel
>     >     --
>     >     |: https://berrange.com     -o-
>     > https://www.flickr.com/photos/dberrange :|
>     >     |: https://libvirt.org        -o-
>     https://fstop138.berrange.com :|
>     >     |: https://entangle-photo.org   -o-
>     > https://www.instagram.com/dberrange :|
>     >
>
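A small hypothetical sketch of the approach Daniel describes (embedding the bytecode as a static array in the source, as DPDK does); the array name and the trivial two-instruction program are illustrative, not taken from the actual patches:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: eBPF bytecode embedded as a static array.
 * Each eBPF instruction is 8 bytes: a 1-byte opcode, 1 byte of
 * dst/src register nibbles, a 16-bit offset and a 32-bit immediate.
 * The two instructions below encode "mov64 r0, 0; exit", i.e. a
 * steering program that sends every packet to queue 0. */
static const uint8_t ebpf_steering_prog[] = {
    0xb7, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* mov64 r0, 0 */
    0x95, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* exit        */
};

enum { EBPF_INSN_SIZE = 8 };

static size_t ebpf_insn_count(void)
{
    return sizeof(ebpf_steering_prog) / EBPF_INSN_SIZE;
}
```

A loader would hand this buffer to bpf(BPF_PROG_LOAD, ...) and then attach the resulting fd to the TAP device via the TUNSETSTEERINGEBPF ioctl mentioned in the cover letter.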



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/6] eBPF RSS support for virtio-net
  2020-11-10  2:23                     ` Jason Wang
@ 2020-11-10  8:00                       ` Yuri Benditovich
  0 siblings, 0 replies; 36+ messages in thread
From: Yuri Benditovich @ 2020-11-10  8:00 UTC (permalink / raw)
  To: Jason Wang
  Cc: Yan Vugenfirer, Andrew Melnychenko, Daniel P. Berrangé,
	qemu-devel, Michael S . Tsirkin

[-- Attachment #1: Type: text/plain, Size: 13160 bytes --]

On Tue, Nov 10, 2020 at 4:23 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2020/11/9 9:33 PM, Yuri Benditovich wrote:
> >
> >
> > On Mon, Nov 9, 2020 at 4:14 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> >     On 2020/11/5 11:13 PM, Yuri Benditovich wrote:
> >     > First of all, thank you for all your feedback
> >     >
> >     > Please help me to summarize and let us understand better what we
> >     do in v2:
> >     > Major questions are:
> >     > 1. Building eBPF from source during qemu build vs. regenerating
> >     it on
> >     > demand and keeping it in the repository
> >     > Solution 1a (~ as in v1): keep instructions or ELF in H file,
> >     generate
> >     > it outside of the qemu build. In general we'll need to have BE and LE
> >     binaries.
> >     > Solution 1b: build ELF or instructions during QEMU build if llvm +
> >     > clang exist. Then we will have only one (BE or LE, depending on
> >     > current QEMU build)
> >     > We agree with any solution - I believe you know the requirements
> >     better.
> >
> >
> >     I think we can go with 1a. (See Daniel's comment)
> >
> >
> >     >
> >     > 2. Use libbpf or not
> >     > In general we do not see any advantage of using libbpf. It works
> >     with
> >     > object files (does ELF parsing at time of loading), but it does
> >     not do
> >     > any magic.
> >     > Solution 2a. Switch to libbpf, generate object files (LE and BE)
> >     from
> >     > source, keep them inside QEMU (~8k each) or aside
> >
> >
> >     Can we simply use dynamic linking here?
> >
> >
> > Can you please explain where exactly you suggest using dynamic linking?
>
>
> Yes. If I understand your 2a properly, you meant static linking of
> libbpf. So what I want to ask is the possibility of dynamic linking of
> libbpf here.
>
>
As Daniel explained above, QEMU is always linked dynamically against libraries.
Also, I see the libbpf package does not even contain a static library.
If the build environment contains libbpf, libbpf.so becomes a runtime
dependency, just as with other libraries.


>
> >
> >     > Solution 2b. (as in v1) Use python script to parse object ->
> >     > instructions (~2k each)
> >     > We'd prefer not to use libbpf at the moment.
> >     > If due to some unknown reason we'll find it useful in future, we
> >     can
> >     > switch to it, this does not create any incompatibility. Then
> >     this will
> >     > create a dependency on libbpf.so
> >
> >
> >     I think we need to care about compatibility. E.g. we need to enable
> >     BTF,
> >     so I don't know how hard it would be to add BTF support in the current
> >     design. It
> >     would probably be OK if it's not a lot of effort.
> >
> >
> > As far as we understand, BTF helps in BPF debugging, and libbpf supports
> > it as is.
> > Without libbpf, in v1 we load only the BPF instructions.
> > If you think BTF is mandatory (BTW, why?) I think it is better to
> > switch to libbpf and keep the entire ELF in the qemu data.
>
>
> It is used to make sure the BPF program can achieve "compile once, run everywhere".
>
> This is explained in detail here:
>
> https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html
> .
>
>
Thank you, then there is no question, we need to use libbpf.
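For context, the hash that both the eBPF program and the 'software' fallback have to compute is the standard Toeplitz hash over the packet tuple. A standalone sketch (function and variable names are illustrative, not from the patches), checked against the documented Microsoft verification-suite vector:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The key from Microsoft's RSS hash verification suite. */
static const uint8_t rss_verification_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Documented IPv4/TCP test vector: source 66.9.149.187:2794 ->
 * destination 161.142.100.80:1766, laid out as source address,
 * destination address, source port, destination port. */
static const uint8_t sample_tuple[12] = {
    66, 9, 149, 187,
    161, 142, 100, 80,
    0x0a, 0xea,   /* source port 2794 */
    0x06, 0xe6,   /* dest   port 1766 */
};

/* Toeplitz hash: for every set bit of the input (MSB first), XOR in
 * the 32-bit window of the key that starts at that bit position.
 * The key must be at least data_len + 4 bytes long. */
static uint32_t toeplitz_hash(const uint8_t *data, size_t data_len,
                              const uint8_t *key)
{
    uint32_t hash = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | key[3];

    for (size_t p = 0; p < data_len * 8; p++) {
        if (data[p / 8] & (0x80u >> (p % 8))) {
            hash ^= window;
        }
        /* Slide the 32-bit window one key bit to the right. */
        size_t q = p + 32;
        window = (window << 1) | ((key[q / 8] >> (7 - q % 8)) & 1u);
    }
    return hash;
}
```

The low bits of the resulting hash then index an indirection table that maps each flow to a virtqueue.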


> Thanks
>
>
> >
> >
> >     >
> >     > 3. Keep instructions or ELF inside QEMU or as separate external
> file
> >     > Solution 3a (~as in v1): Built-in array of instructions or ELF.
> >     If we
> >     > generate them out of QEMU build - keep 2 arrays or instructions
> >     or ELF
> >     > (BE and LE),
> >     > Solution 3b: Install them as separate files (/usr/share/qemu).
> >     > We'd prefer 3a:
> >     >  Then there is a guarantee that the eBPF is built with exactly the
> >     > same config structures as QEMU (qemu creates a mapping of its
> >     > structures, eBPF uses them).
> >     >  No need to handle scenarios like 'file not found', 'file
> >     is not
> >     > suitable', etc.
> >
> >
> >     Yes, let's go 3a for upstream.
> >
> >
> >     >
> >     > 4. Is there some real request to have the eBPF for big-endian?
> >     > If not, we can enable eBPF only for LE builds
> >
> >
> >     We can go with LE first.
> >
> >     Thanks
> >
> >
> >     >
> >     > Jason, Daniel, Michael
> >     > Can you please let us know what you think and why?
> >     >
> >     > On Thu, Nov 5, 2020 at 3:19 PM Daniel P. Berrangé
> >     <berrange@redhat.com> wrote:
> >     >
> >     >     On Thu, Nov 05, 2020 at 10:01:09AM +0000, Daniel P. Berrangé
> >     wrote:
> >     >     > On Thu, Nov 05, 2020 at 11:46:18AM +0800, Jason Wang wrote:
> >     >     > >
> >     >     > > > On 2020/11/4 5:31 PM, Daniel P. Berrangé wrote:
> >     >     > > > On Wed, Nov 04, 2020 at 10:07:52AM +0800, Jason Wang
> >     wrote:
> >     >     > > > > On 2020/11/3 6:32 PM, Yuri Benditovich wrote:
> >     >     > > > > >
> >     >     > > > > > On Tue, Nov 3, 2020 at 11:02 AM Jason Wang
> >     >     <jasowang@redhat.com> wrote:
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      On 2020/11/3 2:51 AM, Andrew Melnychenko wrote:
> >     >     > > > > >      > Basic idea is to use eBPF to calculate and
> >     steer
> >     >     packets in TAP.
> >     >     > > > > >      > RSS(Receive Side Scaling) is used to
> distribute
> >     >     network packets
> >     >     > > > > >      to guest virtqueues
> >     >     > > > > >      > by calculating packet hash.
> >     >     > > > > >      > eBPF RSS allows us to use RSS with vhost TAP.
> >     >     > > > > >      >
> >     >     > > > > >      > This set of patches introduces the usage of
> >     eBPF
> >     >     for packet steering
> >     >     > > > > >      > and RSS hash calculation:
> >     >     > > > > >      > * RSS(Receive Side Scaling) is used to
> >     distribute
> >     >     network packets to
> >     >     > > > > >      > guest virtqueues by calculating packet hash
> >     >     > > > > >      > * eBPF RSS is supposed to be faster than the already
> >     >     existing 'software'
> >     >     > > > > >      > implementation in QEMU
> >     >     > > > > >      > * Additionally adding support for the usage of
> >     >     RSS with vhost
> >     >     > > > > >      >
> >     >     > > > > >      > Supported kernels: 5.8+
> >     >     > > > > >      >
> >     >     > > > > >      > Implementation notes:
> >     >     > > > > >      > Linux TAP TUNSETSTEERINGEBPF ioctl was used to
> >     >     set the eBPF program.
> >     >     > > > > >      > Added eBPF support to qemu directly through a
> >     >     system call; see
> >     >     > > > > >      > bpf(2) for details.
> >     >     > > > > >      > The eBPF program is part of the qemu and
> >     >     presented as an array
> >     >     > > > > >      of bpf
> >     >     > > > > >      > instructions.
> >     >     > > > > >      > The program can be recompiled by the provided
> >     >     Makefile.ebpf (need to
> >     >     > > > > >      adjust
> >     >     > > > > >      > 'linuxhdrs'),
> >     >     > > > > >      > although it's not required to build QEMU with
> >     >     eBPF support.
> >     >     > > > > >      > Added changes to virtio-net and vhost; primarily
> >     >     eBPF RSS is used.
> >     >     > > > > >      > 'Software' RSS is used in the case of hash
> >     >     population and as a
> >     >     > > > > >      fallback option.
> >     >     > > > > >      > For vhost, the hash population feature is not
> >     >     reported to the guest.
> >     >     > > > > >      >
> >     >     > > > > >      > Please also see the documentation in PATCH
> 6/6.
> >     >     > > > > >      >
> >     >     > > > > >      > I am sending those patches as RFC to
> >     initiate the
> >     >     discussions
> >     >     > > > > >      and get
> >     >     > > > > >      > feedback on the following points:
> >     >     > > > > >      > * Fallback when eBPF is not supported by
> >     the kernel
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      Yes, and it could also be a lack of CAP_BPF.
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      > * Live migration to the kernel that doesn't
> >     have
> >     >     eBPF support
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      Is there anything that needs special
> >     treatment here?
> >     >     > > > > >
> >     >     > > > > > Possible case: rss=on, vhost=on, source system with
> >     >     kernel 5.8
> >     >     > > > > > (everything works) -> dest. system 5.6 (bpf does not
> >     >     work), the adapter
> >     >     > > > > > functions, but all the steering does not use
> >     proper queues.
> >     >     > > > >
> >     >     > > > > Right, I think we need to disable vhost on dest.
> >     >     > > > >
> >     >     > > > >
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      > * Integration with current QEMU build
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >      Yes, a question here:
> >     >     > > > > >
> >     >     > > > > >      1) Any reason for not using libbpf, e.g. it
> >     has been
> >     >     shipped with some
> >     >     > > > > >      distros
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > > We intentionally do not use libbpf, as it is present
> only
> >     >     on some distros.
> >     >     > > > > > We can switch to libbpf, but this will disable bpf if
> >     >     libbpf is not
> >     >     > > > > > installed
> >     >     > > > >
> >     >     > > > > That's better I think.
> >     >     > > > >
> >     >     > > > >
> >     >     > > > > >      2) It would be better if we can avoid shipping
> >     >     bytecodes
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > >
> >     >     > > > > > This creates new dependencies: llvm + clang + ...
> >     >     > > > > > We would prefer byte code and ability to generate
> >     it if
> >     >     prerequisites
> >     >     > > > > > are installed.
> >     >     > > > >
> >     >     > > > > It's probably ok if we treat the bytecode as a kind of
> >     >     firmware.
> >     >     > > > That is explicitly *not* OK for inclusion in Fedora. They
> >     >     require that
> >     >     > > > BPF is compiled from source, and rejected my
> >     suggestion that
> >     >     it could
> >     >     > > > be considered a kind of firmware and thus have an
> >     exception
> >     >     from building
> >     >     > > > from source.
> >     >     > >
> >     >     > >
> >     >     > > Please refer to what was done in DPDK:
> >     >     > >
> >     >     > > http://git.dpdk.org/dpdk/tree/doc/guides/nics/tap.rst#n235
> >     >     > >
> >     >     > > I don't think what is proposed here is any different.
> >     >     >
> >     >     > I'm not convinced that what DPDK does is acceptable to
> >     Fedora either
> >     >     > based on the responses I've received when asking about BPF
> >     handling
> >     >     > during build.  It wouldn't surprise me, however, if this was
> >     simply
> >     >     > missed by reviewers when accepting DPDK into Fedora,
> >     because it is
> >     >     > not entirely obvious unless you are looking closely.
> >     >
> >     >     FWIW, I'm pushing back against the idea that we have to
> >     compile the
> >     >     BPF code from master source, as I think it is reasonable to
> >     have the
> >     >     program embedded as a static array in the source code
> >     similar to what
> >     >     DPDK does.  It doesn't feel much different from other places
> >     where
> >     >     apps
> >     >     use generated sources, and don't build them from the
> >     original source
> >     >     every time. E.g. "configure" is never re-generated from
> >     >     "configure.ac"
> >     >     by Fedora packagers, they just use the generated "configure"
> >     script
> >     >     as-is.
> >     >
> >     >     Regards,
> >     >     Daniel
> >     >     --
> >     >     |: https://berrange.com     -o-
> >     > https://www.flickr.com/photos/dberrange :|
> >     >     |: https://libvirt.org        -o-
> >     https://fstop138.berrange.com :|
> >     >     |: https://entangle-photo.org   -o-
> >     > https://www.instagram.com/dberrange :|
> >     >
> >
>
>


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2020-11-10  8:02 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-02 18:51 [RFC PATCH 0/6] eBPF RSS support for virtio-net Andrew Melnychenko
2020-11-02 18:51 ` [RFC PATCH 1/6] net: Added SetSteeringEBPF method for NetClientState Andrew Melnychenko
2020-11-04  2:49   ` Jason Wang
2020-11-04  9:34     ` Yuri Benditovich
2020-11-02 18:51 ` [RFC PATCH 2/6] ebpf: Added basic eBPF API Andrew Melnychenko
2020-11-02 18:51 ` [RFC PATCH 3/6] ebpf: Added eBPF RSS program Andrew Melnychenko
2020-11-03 13:07   ` Daniel P. Berrangé
2020-11-02 18:51 ` [RFC PATCH 4/6] ebpf: Added eBPF RSS loader Andrew Melnychenko
2020-11-02 18:51 ` [RFC PATCH 5/6] virtio-net: Added eBPF RSS to virtio-net Andrew Melnychenko
2020-11-04  3:09   ` Jason Wang
2020-11-04 11:07     ` Yuri Benditovich
2020-11-04 11:13       ` Daniel P. Berrangé
2020-11-04 15:51         ` Yuri Benditovich
2020-11-05  3:29       ` Jason Wang
2020-11-02 18:51 ` [RFC PATCH 6/6] docs: Added eBPF documentation Andrew Melnychenko
2020-11-04  3:15   ` Jason Wang
2020-11-05  3:56   ` Jason Wang
2020-11-05  9:40     ` Yuri Benditovich
2020-11-03  9:02 ` [RFC PATCH 0/6] eBPF RSS support for virtio-net Jason Wang
2020-11-03 10:32   ` Yuri Benditovich
2020-11-03 11:56     ` Daniel P. Berrangé
2020-11-04  2:15       ` Jason Wang
2020-11-04  2:07     ` Jason Wang
2020-11-04  9:31       ` Daniel P. Berrangé
2020-11-05  3:46         ` Jason Wang
2020-11-05  3:52           ` Jason Wang
2020-11-05  9:11             ` Yuri Benditovich
2020-11-05 10:01           ` Daniel P. Berrangé
2020-11-05 13:19             ` Daniel P. Berrangé
2020-11-05 15:13               ` Yuri Benditovich
2020-11-09  2:13                 ` Jason Wang
2020-11-09 13:33                   ` Yuri Benditovich
2020-11-10  2:23                     ` Jason Wang
2020-11-10  8:00                       ` Yuri Benditovich
2020-11-04 11:49       ` Yuri Benditovich
2020-11-04 12:04         ` Daniel P. Berrangé
