* [Qemu-devel] Unified Datagram Socket Transport
@ 2017-07-20 19:12 anton.ivanov
  2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 1/5] Unified Datagram Socket Transports anton.ivanov
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: anton.ivanov @ 2017-07-20 19:12 UTC
  To: qemu-devel; +Cc: jasowang

Hi all,

This version addresses the review comments so far, except Eric's
suggestion to use InetSocketAddressBase. If I understand its intended
use correctly, it will not help with protocols that have no port (raw
sockets: GRE, L2TPv3, etc.).

It also includes a port of the original socket.c transport to
the new UDST backend. The relevant code is ifdef-ed, so there
should be no effect on other systems.

I think this is the appropriate place to stop for this
iteration. I would prefer to have this polished before I start
looking at sendmmsg and bulk send, or at some of the more unpleasant
encapsulations such as Geneve.

A.


* [Qemu-devel] [PATCH v2 1/5] Unified Datagram Socket Transports
  2017-07-20 19:12 [Qemu-devel] Unified Datagram Socket Transport anton.ivanov
@ 2017-07-20 19:12 ` anton.ivanov
  2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 2/5] Migrate l2tpv3 to UDST Backend anton.ivanov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: anton.ivanov @ 2017-07-20 19:12 UTC
  To: qemu-devel; +Cc: jasowang, Anton Ivanov

From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Basic infrastructure for moving datagram-based transports
to a common backend, and for introducing several
additional transports.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 configure         |  12 +-
 net/Makefile.objs |   2 +-
 net/net.c         |   4 +-
 net/udst.c        | 420 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/udst.h        | 121 ++++++++++++++++
 qapi-schema.json  |  19 ++-
 qemu-options.hx   |   2 +-
 7 files changed, 569 insertions(+), 11 deletions(-)
 create mode 100644 net/udst.c
 create mode 100644 net/udst.h

diff --git a/configure b/configure
index bad50f5368..00c911c49b 100755
--- a/configure
+++ b/configure
@@ -1862,7 +1862,9 @@ if ! compile_object -Werror ; then
 fi
 
 ##########################################
-# L2TPV3 probe
+# UDST probe
+# identical to the L2TPv3 probe; used for both
+# during the migration of L2TPv3 to the UDST backend
 
 cat > $TMPC <<EOF
 #include <sys/socket.h>
@@ -1870,9 +1872,9 @@ cat > $TMPC <<EOF
 int main(void) { return sizeof(struct mmsghdr); }
 EOF
 if compile_prog "" "" ; then
-  l2tpv3=yes
+  udst=yes
 else
-  l2tpv3=no
+  udst=no
 fi
 
 ##########################################
@@ -5491,8 +5493,8 @@ fi
 if test "$netmap" = "yes" ; then
   echo "CONFIG_NETMAP=y" >> $config_host_mak
 fi
-if test "$l2tpv3" = "yes" ; then
-  echo "CONFIG_L2TPV3=y" >> $config_host_mak
+if test "$udst" = "yes" ; then
+  echo "CONFIG_UDST=y" >> $config_host_mak
 fi
 if test "$cap_ng" = "yes" ; then
   echo "CONFIG_LIBCAP=y" >> $config_host_mak
diff --git a/net/Makefile.objs b/net/Makefile.objs
index 67ba5e26fb..ffdfb96bd0 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
 common-obj-y += socket.o
 common-obj-y += dump.o
 common-obj-y += eth.o
-common-obj-$(CONFIG_L2TPV3) += l2tpv3.o
+common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o
 common-obj-$(CONFIG_POSIX) += vhost-user.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
diff --git a/net/net.c b/net/net.c
index 0e28099554..723a256260 100644
--- a/net/net.c
+++ b/net/net.c
@@ -960,8 +960,8 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
 #ifdef CONFIG_VHOST_NET_USED
         [NET_CLIENT_DRIVER_VHOST_USER] = net_init_vhost_user,
 #endif
-#ifdef CONFIG_L2TPV3
-        [NET_CLIENT_DRIVER_L2TPV3]    = net_init_l2tpv3,
+#ifdef CONFIG_UDST
+        [NET_CLIENT_DRIVER_L2TPV3] = net_init_l2tpv3,
 #endif
 };
 
diff --git a/net/udst.c b/net/udst.c
new file mode 100644
index 0000000000..612c90cb3a
--- /dev/null
+++ b/net/udst.c
@@ -0,0 +1,420 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2015-2017 Cambridge Greys Limited
+ * Copyright (c) 2012-2014 Cisco Systems
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/*
+ * Unified Datagram Socket Transport (UDST) backend.
+ * This transport is not intended to be instantiated directly by an end user.
+ * It serves as a backend for other transports: RX uses recvmmsg() and
+ * TX uses sendmsg().
+ */
+
+#include "qemu/osdep.h"
+#include <linux/ip.h>
+#include <netdb.h>
+#include "net/net.h"
+#include "clients.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+#include "udst.h"
+
+static void net_udst_send(void *opaque);
+static void udst_writable(void *opaque);
+
+static void udst_update_fd_handler(NetUdstState *s)
+{
+    qemu_set_fd_handler(s->fd,
+                        s->read_poll ? net_udst_send : NULL,
+                        s->write_poll ? udst_writable : NULL,
+                        s);
+}
+
+static void udst_read_poll(NetUdstState *s, bool enable)
+{
+    if (s->read_poll != enable) {
+        s->read_poll = enable;
+        udst_update_fd_handler(s);
+    }
+}
+
+static void udst_write_poll(NetUdstState *s, bool enable)
+{
+    if (s->write_poll != enable) {
+        s->write_poll = enable;
+        udst_update_fd_handler(s);
+    }
+}
+
+static void udst_writable(void *opaque)
+{
+    NetUdstState *s = opaque;
+    udst_write_poll(s, false);
+    qemu_flush_queued_packets(&s->nc);
+}
+
+static void udst_send_completed(NetClientState *nc, ssize_t len)
+{
+    NetUdstState *s = DO_UPCAST(NetUdstState, nc, nc);
+    udst_read_poll(s, true);
+}
+
+static void udst_poll(NetClientState *nc, bool enable)
+{
+    NetUdstState *s = DO_UPCAST(NetUdstState, nc, nc);
+    udst_write_poll(s, enable);
+    udst_read_poll(s, enable);
+}
+
+static ssize_t net_udst_receive_dgram_iov(NetClientState *nc,
+                    const struct iovec *iov,
+                    int iovcnt)
+{
+    NetUdstState *s = DO_UPCAST(NetUdstState, nc, nc);
+
+    struct msghdr message;
+    int ret;
+
+    if (iovcnt > MAX_UNIFIED_IOVCNT - 1) {
+        error_report(
+            "iovec too long %d > %d, change udst.h",
+            iovcnt, MAX_UNIFIED_IOVCNT
+        );
+        return -1;
+    }
+    if (s->offset > 0) {
+        s->form_header(s);
+        memcpy(s->vec + 1, iov, iovcnt * sizeof(struct iovec));
+        s->vec->iov_base = s->header_buf;
+        s->vec->iov_len = s->offset;
+        message.msg_iovlen = iovcnt + 1;
+    } else {
+        memcpy(s->vec, iov, iovcnt * sizeof(struct iovec));
+        message.msg_iovlen = iovcnt;
+    }
+    message.msg_name = s->dgram_dst;
+    message.msg_namelen = s->dst_size;
+    message.msg_iov = s->vec;
+    message.msg_control = NULL;
+    message.msg_controllen = 0;
+    message.msg_flags = 0;
+    do {
+        ret = sendmsg(s->fd, &message, 0);
+    } while ((ret == -1) && (errno == EINTR));
+    if (ret > 0) {
+        ret -= s->offset;
+    } else if (ret == 0) {
+        /* belt and braces - should not occur on DGRAM
+        * we should get an error and never a 0 send
+        */
+        ret = iov_size(iov, iovcnt);
+    } else {
+        /* signal upper layer that socket buffer is full */
+        ret = -errno;
+        if (ret == -EAGAIN || ret == -ENOBUFS) {
+            udst_write_poll(s, true);
+            ret = 0;
+        }
+    }
+    return ret;
+}
+
+static ssize_t net_udst_receive_dgram(NetClientState *nc,
+                    const uint8_t *buf,
+                    size_t size)
+{
+    NetUdstState *s = DO_UPCAST(NetUdstState, nc, nc);
+
+    struct iovec *vec;
+    struct msghdr message;
+    ssize_t ret = 0;
+
+    vec = s->vec;
+    if (s->offset > 0) {
+        s->form_header(s);
+        vec->iov_base = s->header_buf;
+        vec->iov_len = s->offset;
+        message.msg_iovlen = 2;
+        vec++;
+    } else {
+        message.msg_iovlen = 1;
+    }
+    vec->iov_base = (void *) buf;
+    vec->iov_len = size;
+    message.msg_name = s->dgram_dst;
+    message.msg_namelen = s->dst_size;
+    message.msg_iov = s->vec;
+    message.msg_control = NULL;
+    message.msg_controllen = 0;
+    message.msg_flags = 0;
+    do {
+        ret = sendmsg(s->fd, &message, 0);
+    } while ((ret == -1) && (errno == EINTR));
+    if (ret > 0) {
+        ret -= s->offset;
+    } else if (ret == 0) {
+        /* belt and braces - should not occur on DGRAM
+        * we should get an error and never a 0 send
+        */
+        ret = size;
+    } else {
+        ret = -errno;
+        if (ret == -EAGAIN || ret == -ENOBUFS) {
+            /* signal upper layer that socket buffer is full */
+            udst_write_poll(s, true);
+            ret = 0;
+        }
+    }
+    return ret;
+}
+
+
+static void net_udst_process_queue(NetUdstState *s)
+{
+    int size = 0;
+    struct iovec *vec;
+    bool bad_read;
+    int data_size;
+    struct mmsghdr *msgvec;
+
+    /* go into ring mode only if there is a "pending" tail */
+    if (s->queue_depth > 0) {
+        do {
+            msgvec = s->msgvec + s->queue_tail;
+            if (msgvec->msg_len > 0) {
+                data_size = msgvec->msg_len - s->header_size;
+                vec = msgvec->msg_hdr.msg_iov;
+                if ((data_size > 0) &&
+                    (s->verify_header(s, vec->iov_base) == 0)) {
+                    if (s->header_size > 0) {
+                        vec++;
+                    }
+                    /* Use the legacy delivery for now, we will
+                     * switch to using our own ring as a queueing mechanism
+                     * at a later date
+                     */
+                    size = qemu_send_packet_async(
+                            &s->nc,
+                            vec->iov_base,
+                            data_size,
+                            udst_send_completed
+                        );
+                    if (size == 0) {
+                        udst_read_poll(s, false);
+                    }
+                    bad_read = false;
+                } else {
+                    bad_read = true;
+                    if (!s->header_mismatch) {
+                        /* report error only once */
+                        error_report("udst header verification failed");
+                        s->header_mismatch = true;
+                    }
+                }
+            } else {
+                bad_read = true;
+            }
+            s->queue_tail = (s->queue_tail + 1) % MAX_UNIFIED_MSGCNT;
+            s->queue_depth--;
+        } while (
+                (s->queue_depth > 0) &&
+                 qemu_can_send_packet(&s->nc) &&
+                ((size > 0) || bad_read)
+            );
+    }
+}
+
+static void net_udst_send(void *opaque)
+{
+    NetUdstState *s = opaque;
+    int target_count, count;
+    struct mmsghdr *msgvec;
+
+    /* go into ring mode only if there is a "pending" tail */
+
+    if (s->queue_depth) {
+
+        /* The ring buffer we use has variable intake
+         * count of how much we can read varies - adjust accordingly
+         */
+
+        target_count = MAX_UNIFIED_MSGCNT - s->queue_depth;
+
+        /* Ensure we do not overrun the ring when we have
+         * a lot of enqueued packets
+         */
+
+        if (s->queue_head + target_count > MAX_UNIFIED_MSGCNT) {
+            target_count = MAX_UNIFIED_MSGCNT - s->queue_head;
+        }
+    } else {
+
+        /* we do not have any pending packets - we can use
+        * the whole message vector linearly instead of using
+        * it as a ring
+        */
+
+        s->queue_head = 0;
+        s->queue_tail = 0;
+        target_count = MAX_UNIFIED_MSGCNT;
+    }
+
+    msgvec = s->msgvec + s->queue_head;
+    if (target_count > 0) {
+        do {
+            count = recvmmsg(
+                s->fd,
+                msgvec,
+                target_count, MSG_DONTWAIT, NULL);
+        } while ((count == -1) && (errno == EINTR));
+        if (count < 0) {
+            /* Recv error - we still need to flush packets here,
+             * (re)set queue head to current position
+             */
+            count = 0;
+        }
+        s->queue_head = (s->queue_head + count) % MAX_UNIFIED_MSGCNT;
+        s->queue_depth += count;
+    }
+    net_udst_process_queue(s);
+}
+
+static void destroy_vector(struct mmsghdr *msgvec, int count, int iovcount)
+{
+    int i, j;
+    struct iovec *iov;
+    struct mmsghdr *cleanup = msgvec;
+    if (cleanup) {
+        for (i = 0; i < count; i++) {
+            if (cleanup->msg_hdr.msg_iov) {
+                iov = cleanup->msg_hdr.msg_iov;
+                for (j = 0; j < iovcount; j++) {
+                    g_free(iov->iov_base);
+                    iov++;
+                }
+                g_free(cleanup->msg_hdr.msg_iov);
+            }
+            cleanup++;
+        }
+        g_free(msgvec);
+    }
+}
+
+
+
+static struct mmsghdr *build_udst_vector(NetUdstState *s, int count)
+{
+    int i;
+    struct iovec *iov;
+    struct mmsghdr *msgvec, *result;
+
+    msgvec = g_new(struct mmsghdr, count);
+    result = msgvec;
+    for (i = 0; i < count ; i++) {
+        msgvec->msg_hdr.msg_name = NULL;
+        msgvec->msg_hdr.msg_namelen = 0;
+        iov =  g_new(struct iovec, IOVSIZE);
+        msgvec->msg_hdr.msg_iov = iov;
+        if (s->header_size > 0) {
+            iov->iov_base = g_malloc(s->header_size);
+            iov->iov_len = s->header_size;
+            iov++ ;
+        }
+        iov->iov_base = qemu_memalign(BUFFER_ALIGN, BUFFER_SIZE);
+        iov->iov_len = BUFFER_SIZE;
+        msgvec->msg_hdr.msg_iovlen = (s->header_size > 0) ? 2 : 1;
+        msgvec->msg_hdr.msg_control = NULL;
+        msgvec->msg_hdr.msg_controllen = 0;
+        msgvec->msg_hdr.msg_flags = 0;
+        msgvec++;
+    }
+    return result;
+}
+
+static void net_udst_cleanup(NetClientState *nc)
+{
+    NetUdstState *s = DO_UPCAST(NetUdstState, nc, nc);
+    qemu_purge_queued_packets(nc);
+    udst_read_poll(s, false);
+    udst_write_poll(s, false);
+    if (s->fd >= 0) {
+        close(s->fd);
+    }
+    if (s->header_size > 0) {
+        destroy_vector(s->msgvec, MAX_UNIFIED_MSGCNT, IOVSIZE);
+    } else {
+        destroy_vector(s->msgvec, MAX_UNIFIED_MSGCNT, 1);
+    }
+    g_free(s->vec);
+    if (s->header_buf != NULL) {
+        g_free(s->header_buf);
+    }
+    if (s->dgram_dst != NULL) {
+        g_free(s->dgram_dst);
+    }
+}
+
+static NetClientInfo net_udst_info = {
+    /* we share this one for all types for now, wrong I know :) */
+    .type = NET_CLIENT_DRIVER_UDST,
+    .size = sizeof(NetUdstState),
+    .receive = net_udst_receive_dgram,
+    .receive_iov = net_udst_receive_dgram_iov,
+    .poll = udst_poll,
+    .cleanup = net_udst_cleanup,
+};
+
+NetClientState *qemu_new_udst_net_client(const char *name,
+                    NetClientState *peer) {
+    return qemu_new_net_client(&net_udst_info, peer, "udst", name);
+}
+
+void qemu_net_finalize_udst_init(NetUdstState *s,
+        int (*verify_header)(void *s, uint8_t *buf),
+        void (*form_header)(void *s),
+        int fd)
+{
+
+    s->form_header = form_header;
+    s->verify_header = verify_header;
+    s->queue_head = 0;
+    s->queue_tail = 0;
+    s->header_mismatch = false;
+    s->msgvec = build_udst_vector(s, MAX_UNIFIED_MSGCNT);
+    s->vec = g_new(struct iovec, MAX_UNIFIED_IOVCNT);
+    if (s->header_size > 0) {
+        s->header_buf = g_malloc(s->header_size);
+    } else {
+        s->header_buf = NULL;
+    }
+    qemu_set_nonblock(fd);
+
+    s->fd = fd;
+    udst_read_poll(s, true);
+
+}
diff --git a/net/udst.h b/net/udst.h
new file mode 100644
index 0000000000..2a6b44c74d
--- /dev/null
+++ b/net/udst.h
@@ -0,0 +1,121 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2015-2017 Cambridge Greys Limited
+ * Copyright (c) 2012-2014 Cisco Systems
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+
+
+#define BUFFER_ALIGN sysconf(_SC_PAGESIZE)
+#define BUFFER_SIZE 2048
+#define IOVSIZE 2
+#define MAX_UNIFIED_MSGCNT 64
+#define MAX_UNIFIED_IOVCNT (MAX_UNIFIED_MSGCNT * IOVSIZE)
+
+#ifndef QEMU_NET_UNIFIED_H
+#define QEMU_NET_UNIFIED_H
+
+typedef struct NetUdstState {
+    NetClientState nc;
+
+    int fd;
+
+    /*
+     * these are used for xmit - that happens one packet at a time
+     * and for first sign of life packet (easier to parse that once)
+     */
+
+    uint8_t *header_buf;
+    struct iovec *vec;
+
+    /*
+     * these are used for receive - up to MAX_UNIFIED_MSGCNT packets at a time
+     */
+
+    struct mmsghdr *msgvec;
+
+    /*
+     * peer address
+     */
+
+    struct sockaddr_storage *dgram_dst;
+    uint32_t dst_size;
+
+    /*
+     * Internal Queue
+     */
+
+    /*
+    * DOS avoidance in error handling
+    */
+
+    /* Easier to keep l2tpv3 specific */
+
+    bool header_mismatch;
+
+    /*
+     *
+     * Ring buffer handling
+     *
+     */
+
+    int queue_head;
+    int queue_tail;
+    int queue_depth;
+
+    /*
+     * Offset to data - common for all protocols
+     */
+
+    uint32_t offset;
+
+    /*
+     * Header size - common for all protocols
+     */
+
+    uint32_t header_size;
+    /* Poll Control */
+
+    bool read_poll;
+    bool write_poll;
+
+    /* Parameters */
+
+    void *params;
+
+    /* header forming functions */
+
+    int (*verify_header)(void *s, uint8_t *buf);
+    void (*form_header)(void *s);
+
+} NetUdstState;
+
+extern NetClientState *qemu_new_udst_net_client(const char *name,
+                    NetClientState *peer);
+
+extern void qemu_net_finalize_udst_init(NetUdstState *s,
+        int (*verify_header)(void *s, uint8_t *buf),
+        void (*form_header)(void *s),
+        int fd);
+#endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 8b015bee2e..62a044f006 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3854,6 +3854,18 @@
     '*offset':      'uint32' } }
 
 ##
+# @NetdevUdstOptions:
+#
+# Common datagram backend for unified datagram
+# socket transports.
+# Should not be instantiated directly.
+#
+# Since: 2.11
+##
+{ 'struct': 'NetdevUdstOptions',
+  'data': { } }
+
+##
 # @NetdevVdeOptions:
 #
 # Connect the VLAN to a vde switch running on the host.
@@ -3971,7 +3983,7 @@
 ##
 { 'enum': 'NetClientDriver',
   'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'vde', 'dump',
-            'bridge', 'hubport', 'netmap', 'vhost-user' ] }
+            'bridge', 'hubport', 'netmap', 'vhost-user', 'udst' ] }
 
 ##
 # @Netdev:
@@ -3985,6 +3997,8 @@
 # Since: 1.2
 #
 # 'l2tpv3' - since 2.1
+#
+# 'udst' - since 2.11
 ##
 { 'union': 'Netdev',
   'base': { 'id': 'str', 'type': 'NetClientDriver' },
@@ -4001,7 +4015,8 @@
     'bridge':   'NetdevBridgeOptions',
     'hubport':  'NetdevHubPortOptions',
     'netmap':   'NetdevNetmapOptions',
-    'vhost-user': 'NetdevVhostUserOptions' } }
+    'vhost-user': 'NetdevVhostUserOptions',
+    'udst':     'NetdevUdstOptions' } }
 
 ##
 # @NetLegacy:
diff --git a/qemu-options.hx b/qemu-options.hx
index 746b5fa75d..9caf53fd76 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1945,7 +1945,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
     "                connected to a bridge (default=" DEFAULT_BRIDGE_INTERFACE ")\n"
     "                using the program 'helper (default=" DEFAULT_BRIDGE_HELPER ")\n"
 #endif
-#ifdef __linux__
+#ifdef CONFIG_UDST
     "-netdev l2tpv3,id=str,src=srcaddr,dst=dstaddr[,srcport=srcport][,dstport=dstport]\n"
     "         [,rxsession=rxsession],txsession=txsession[,ipv6=on/off][,udp=on/off]\n"
     "         [,cookie64=on/off][,counter][,pincounter][,txcookie=txcookie]\n"
-- 
2.11.0


* [Qemu-devel] [PATCH v2 2/5] Migrate l2tpv3 to UDST Backend
  2017-07-20 19:12 [Qemu-devel] Unified Datagram Socket Transport anton.ivanov
  2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 1/5] Unified Datagram Socket Transports anton.ivanov
@ 2017-07-20 19:12 ` anton.ivanov
  2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 3/5] GRETAP Backend for UDST anton.ivanov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: anton.ivanov @ 2017-07-20 19:12 UTC
  To: qemu-devel; +Cc: jasowang, Anton Ivanov

From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

1. Migrate the L2TPv3 transport to the Unified Datagram Socket
Transport (UDST) backend.

2. Make IPv4/IPv6 behaviour identical to that of all other transports.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 net/l2tpv3.c     | 566 +++++++++++--------------------------------------------
 qapi-schema.json |   1 +
 qemu-options.hx  |   9 +-
 3 files changed, 117 insertions(+), 459 deletions(-)

diff --git a/net/l2tpv3.c b/net/l2tpv3.c
index 6745b78990..f52e0339a4 100644
--- a/net/l2tpv3.c
+++ b/net/l2tpv3.c
@@ -1,6 +1,7 @@
 /*
  * QEMU System Emulator
  *
+ * Copyright (c) 2015-2017 Cambridge Greys Limited
  * Copyright (c) 2003-2008 Fabrice Bellard
  * Copyright (c) 2012-2014 Cisco Systems
  *
@@ -30,23 +31,14 @@
 #include "clients.h"
 #include "qemu-common.h"
 #include "qemu/error-report.h"
+#include "qapi/error.h"
 #include "qemu/option.h"
 #include "qemu/sockets.h"
 #include "qemu/iov.h"
 #include "qemu/main-loop.h"
+#include "udst.h"
 
 
-/* The buffer size needs to be investigated for optimum numbers and
- * optimum means of paging in on different systems. This size is
- * chosen to be sufficient to accommodate one packet with some headers
- */
-
-#define BUFFER_ALIGN sysconf(_SC_PAGESIZE)
-#define BUFFER_SIZE 2048
-#define IOVSIZE 2
-#define MAX_L2TPV3_MSGCNT 64
-#define MAX_L2TPV3_IOVCNT (MAX_L2TPV3_MSGCNT * IOVSIZE)
-
 /* Header set to 0x30000 signifies a data packet */
 
 #define L2TPV3_DATA_PACKET 0x30000
@@ -57,31 +49,7 @@
 #define IPPROTO_L2TP 0x73
 #endif
 
-typedef struct NetL2TPV3State {
-    NetClientState nc;
-    int fd;
-
-    /*
-     * these are used for xmit - that happens packet a time
-     * and for first sign of life packet (easier to parse that once)
-     */
-
-    uint8_t *header_buf;
-    struct iovec *vec;
-
-    /*
-     * these are used for receive - try to "eat" up to 32 packets at a time
-     */
-
-    struct mmsghdr *msgvec;
-
-    /*
-     * peer address
-     */
-
-    struct sockaddr_storage *dgram_dst;
-    uint32_t dst_size;
-
+typedef struct L2TPV3TunnelParams {
     /*
      * L2TPv3 parameters
      */
@@ -90,229 +58,74 @@ typedef struct NetL2TPV3State {
     uint64_t tx_cookie;
     uint32_t rx_session;
     uint32_t tx_session;
-    uint32_t header_size;
     uint32_t counter;
 
-    /*
-    * DOS avoidance in error handling
-    */
-
-    bool header_mismatch;
-
-    /*
-     * Ring buffer handling
-     */
-
-    int queue_head;
-    int queue_tail;
-    int queue_depth;
-
-    /*
-     * Precomputed offsets
-     */
-
-    uint32_t offset;
-    uint32_t cookie_offset;
-    uint32_t counter_offset;
-    uint32_t session_offset;
-
-    /* Poll Control */
-
-    bool read_poll;
-    bool write_poll;
-
     /* Flags */
 
     bool ipv6;
+    bool ipv4;
     bool udp;
     bool has_counter;
     bool pin_counter;
     bool cookie;
     bool cookie_is_64;
 
-} NetL2TPV3State;
-
-static void net_l2tpv3_send(void *opaque);
-static void l2tpv3_writable(void *opaque);
-
-static void l2tpv3_update_fd_handler(NetL2TPV3State *s)
-{
-    qemu_set_fd_handler(s->fd,
-                        s->read_poll ? net_l2tpv3_send : NULL,
-                        s->write_poll ? l2tpv3_writable : NULL,
-                        s);
-}
-
-static void l2tpv3_read_poll(NetL2TPV3State *s, bool enable)
-{
-    if (s->read_poll != enable) {
-        s->read_poll = enable;
-        l2tpv3_update_fd_handler(s);
-    }
-}
+    /* Precomputed L2TPV3 specific offsets */
+    uint32_t cookie_offset;
+    uint32_t counter_offset;
+    uint32_t session_offset;
 
-static void l2tpv3_write_poll(NetL2TPV3State *s, bool enable)
-{
-    if (s->write_poll != enable) {
-        s->write_poll = enable;
-        l2tpv3_update_fd_handler(s);
-    }
-}
+} L2TPV3TunnelParams;
 
-static void l2tpv3_writable(void *opaque)
-{
-    NetL2TPV3State *s = opaque;
-    l2tpv3_write_poll(s, false);
-    qemu_flush_queued_packets(&s->nc);
-}
 
-static void l2tpv3_send_completed(NetClientState *nc, ssize_t len)
-{
-    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
-    l2tpv3_read_poll(s, true);
-}
 
-static void l2tpv3_poll(NetClientState *nc, bool enable)
+static void l2tpv3_form_header(void *us)
 {
-    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
-    l2tpv3_write_poll(s, enable);
-    l2tpv3_read_poll(s, enable);
-}
+    NetUdstState *s = (NetUdstState *) us;
+    L2TPV3TunnelParams *p = (L2TPV3TunnelParams *) s->params;
 
-static void l2tpv3_form_header(NetL2TPV3State *s)
-{
     uint32_t *counter;
 
-    if (s->udp) {
+    if (p->udp) {
         stl_be_p((uint32_t *) s->header_buf, L2TPV3_DATA_PACKET);
     }
     stl_be_p(
-            (uint32_t *) (s->header_buf + s->session_offset),
-            s->tx_session
+            (uint32_t *) (s->header_buf + p->session_offset),
+            p->tx_session
         );
-    if (s->cookie) {
-        if (s->cookie_is_64) {
+    if (p->cookie) {
+        if (p->cookie_is_64) {
             stq_be_p(
-                (uint64_t *)(s->header_buf + s->cookie_offset),
-                s->tx_cookie
+                (uint64_t *)(s->header_buf + p->cookie_offset),
+                p->tx_cookie
             );
         } else {
             stl_be_p(
-                (uint32_t *) (s->header_buf + s->cookie_offset),
-                s->tx_cookie
+                (uint32_t *) (s->header_buf + p->cookie_offset),
+                p->tx_cookie
             );
         }
     }
-    if (s->has_counter) {
-        counter = (uint32_t *)(s->header_buf + s->counter_offset);
-        if (s->pin_counter) {
+    if (p->has_counter) {
+        counter = (uint32_t *)(s->header_buf + p->counter_offset);
+        if (p->pin_counter) {
             *counter = 0;
         } else {
-            stl_be_p(counter, ++s->counter);
+            stl_be_p(counter, ++p->counter);
         }
     }
 }
 
-static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc,
-                    const struct iovec *iov,
-                    int iovcnt)
-{
-    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
 
-    struct msghdr message;
-    int ret;
-
-    if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
-        error_report(
-            "iovec too long %d > %d, change l2tpv3.h",
-            iovcnt, MAX_L2TPV3_IOVCNT
-        );
-        return -1;
-    }
-    l2tpv3_form_header(s);
-    memcpy(s->vec + 1, iov, iovcnt * sizeof(struct iovec));
-    s->vec->iov_base = s->header_buf;
-    s->vec->iov_len = s->offset;
-    message.msg_name = s->dgram_dst;
-    message.msg_namelen = s->dst_size;
-    message.msg_iov = s->vec;
-    message.msg_iovlen = iovcnt + 1;
-    message.msg_control = NULL;
-    message.msg_controllen = 0;
-    message.msg_flags = 0;
-    do {
-        ret = sendmsg(s->fd, &message, 0);
-    } while ((ret == -1) && (errno == EINTR));
-    if (ret > 0) {
-        ret -= s->offset;
-    } else if (ret == 0) {
-        /* belt and braces - should not occur on DGRAM
-        * we should get an error and never a 0 send
-        */
-        ret = iov_size(iov, iovcnt);
-    } else {
-        /* signal upper layer that socket buffer is full */
-        ret = -errno;
-        if (ret == -EAGAIN || ret == -ENOBUFS) {
-            l2tpv3_write_poll(s, true);
-            ret = 0;
-        }
-    }
-    return ret;
-}
-
-static ssize_t net_l2tpv3_receive_dgram(NetClientState *nc,
-                    const uint8_t *buf,
-                    size_t size)
-{
-    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
-
-    struct iovec *vec;
-    struct msghdr message;
-    ssize_t ret = 0;
-
-    l2tpv3_form_header(s);
-    vec = s->vec;
-    vec->iov_base = s->header_buf;
-    vec->iov_len = s->offset;
-    vec++;
-    vec->iov_base = (void *) buf;
-    vec->iov_len = size;
-    message.msg_name = s->dgram_dst;
-    message.msg_namelen = s->dst_size;
-    message.msg_iov = s->vec;
-    message.msg_iovlen = 2;
-    message.msg_control = NULL;
-    message.msg_controllen = 0;
-    message.msg_flags = 0;
-    do {
-        ret = sendmsg(s->fd, &message, 0);
-    } while ((ret == -1) && (errno == EINTR));
-    if (ret > 0) {
-        ret -= s->offset;
-    } else if (ret == 0) {
-        /* belt and braces - should not occur on DGRAM
-        * we should get an error and never a 0 send
-        */
-        ret = size;
-    } else {
-        ret = -errno;
-        if (ret == -EAGAIN || ret == -ENOBUFS) {
-            /* signal upper layer that socket buffer is full */
-            l2tpv3_write_poll(s, true);
-            ret = 0;
-        }
-    }
-    return ret;
-}
-
-static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf)
+static int l2tpv3_verify_header(void *us, uint8_t *buf)
 {
 
+    NetUdstState *s = (NetUdstState *) us;
+    L2TPV3TunnelParams *p = (L2TPV3TunnelParams *) s->params;
     uint32_t *session;
     uint64_t cookie;
 
-    if ((!s->udp) && (!s->ipv6)) {
+    if ((!p->udp) && (!p->ipv6)) {
         buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
     }
 
@@ -321,21 +134,21 @@ static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf)
     * that anyway.
     */
 
-    if (s->cookie) {
-        if (s->cookie_is_64) {
-            cookie = ldq_be_p(buf + s->cookie_offset);
+    if (p->cookie) {
+        if (p->cookie_is_64) {
+            cookie = ldq_be_p(buf + p->cookie_offset);
         } else {
-            cookie = ldl_be_p(buf + s->cookie_offset) & 0xffffffffULL;
+            cookie = ldl_be_p(buf + p->cookie_offset) & 0xffffffffULL;
         }
-        if (cookie != s->rx_cookie) {
+        if (cookie != p->rx_cookie) {
             if (!s->header_mismatch) {
                 error_report("unknown cookie id");
             }
             return -1;
         }
     }
-    session = (uint32_t *) (buf + s->session_offset);
-    if (ldl_be_p(session) != s->rx_session) {
+    session = (uint32_t *) (buf + p->session_offset);
+    if (ldl_be_p(session) != p->rx_session) {
         if (!s->header_mismatch) {
             error_report("session mismatch");
         }
@@ -344,214 +157,47 @@ static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf)
     return 0;
 }
 
-static void net_l2tpv3_process_queue(NetL2TPV3State *s)
-{
-    int size = 0;
-    struct iovec *vec;
-    bool bad_read;
-    int data_size;
-    struct mmsghdr *msgvec;
-
-    /* go into ring mode only if there is a "pending" tail */
-    if (s->queue_depth > 0) {
-        do {
-            msgvec = s->msgvec + s->queue_tail;
-            if (msgvec->msg_len > 0) {
-                data_size = msgvec->msg_len - s->header_size;
-                vec = msgvec->msg_hdr.msg_iov;
-                if ((data_size > 0) &&
-                    (l2tpv3_verify_header(s, vec->iov_base) == 0)) {
-                    vec++;
-                    /* Use the legacy delivery for now, we will
-                     * switch to using our own ring as a queueing mechanism
-                     * at a later date
-                     */
-                    size = qemu_send_packet_async(
-                            &s->nc,
-                            vec->iov_base,
-                            data_size,
-                            l2tpv3_send_completed
-                        );
-                    if (size == 0) {
-                        l2tpv3_read_poll(s, false);
-                    }
-                    bad_read = false;
-                } else {
-                    bad_read = true;
-                    if (!s->header_mismatch) {
-                        /* report error only once */
-                        error_report("l2tpv3 header verification failed");
-                        s->header_mismatch = true;
-                    }
-                }
-            } else {
-                bad_read = true;
-            }
-            s->queue_tail = (s->queue_tail + 1) % MAX_L2TPV3_MSGCNT;
-            s->queue_depth--;
-        } while (
-                (s->queue_depth > 0) &&
-                 qemu_can_send_packet(&s->nc) &&
-                ((size > 0) || bad_read)
-            );
-    }
-}
-
-static void net_l2tpv3_send(void *opaque)
-{
-    NetL2TPV3State *s = opaque;
-    int target_count, count;
-    struct mmsghdr *msgvec;
-
-    /* go into ring mode only if there is a "pending" tail */
-
-    if (s->queue_depth) {
-
-        /* The ring buffer we use has variable intake
-         * count of how much we can read varies - adjust accordingly
-         */
-
-        target_count = MAX_L2TPV3_MSGCNT - s->queue_depth;
-
-        /* Ensure we do not overrun the ring when we have
-         * a lot of enqueued packets
-         */
-
-        if (s->queue_head + target_count > MAX_L2TPV3_MSGCNT) {
-            target_count = MAX_L2TPV3_MSGCNT - s->queue_head;
-        }
-    } else {
-
-        /* we do not have any pending packets - we can use
-        * the whole message vector linearly instead of using
-        * it as a ring
-        */
-
-        s->queue_head = 0;
-        s->queue_tail = 0;
-        target_count = MAX_L2TPV3_MSGCNT;
-    }
-
-    msgvec = s->msgvec + s->queue_head;
-    if (target_count > 0) {
-        do {
-            count = recvmmsg(
-                s->fd,
-                msgvec,
-                target_count, MSG_DONTWAIT, NULL);
-        } while ((count == -1) && (errno == EINTR));
-        if (count < 0) {
-            /* Recv error - we still need to flush packets here,
-             * (re)set queue head to current position
-             */
-            count = 0;
-        }
-        s->queue_head = (s->queue_head + count) % MAX_L2TPV3_MSGCNT;
-        s->queue_depth += count;
-    }
-    net_l2tpv3_process_queue(s);
-}
-
-static void destroy_vector(struct mmsghdr *msgvec, int count, int iovcount)
-{
-    int i, j;
-    struct iovec *iov;
-    struct mmsghdr *cleanup = msgvec;
-    if (cleanup) {
-        for (i = 0; i < count; i++) {
-            if (cleanup->msg_hdr.msg_iov) {
-                iov = cleanup->msg_hdr.msg_iov;
-                for (j = 0; j < iovcount; j++) {
-                    g_free(iov->iov_base);
-                    iov++;
-                }
-                g_free(cleanup->msg_hdr.msg_iov);
-            }
-            cleanup++;
-        }
-        g_free(msgvec);
-    }
-}
-
-static struct mmsghdr *build_l2tpv3_vector(NetL2TPV3State *s, int count)
-{
-    int i;
-    struct iovec *iov;
-    struct mmsghdr *msgvec, *result;
-
-    msgvec = g_new(struct mmsghdr, count);
-    result = msgvec;
-    for (i = 0; i < count ; i++) {
-        msgvec->msg_hdr.msg_name = NULL;
-        msgvec->msg_hdr.msg_namelen = 0;
-        iov =  g_new(struct iovec, IOVSIZE);
-        msgvec->msg_hdr.msg_iov = iov;
-        iov->iov_base = g_malloc(s->header_size);
-        iov->iov_len = s->header_size;
-        iov++ ;
-        iov->iov_base = qemu_memalign(BUFFER_ALIGN, BUFFER_SIZE);
-        iov->iov_len = BUFFER_SIZE;
-        msgvec->msg_hdr.msg_iovlen = 2;
-        msgvec->msg_hdr.msg_control = NULL;
-        msgvec->msg_hdr.msg_controllen = 0;
-        msgvec->msg_hdr.msg_flags = 0;
-        msgvec++;
-    }
-    return result;
-}
-
-static void net_l2tpv3_cleanup(NetClientState *nc)
-{
-    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
-    qemu_purge_queued_packets(nc);
-    l2tpv3_read_poll(s, false);
-    l2tpv3_write_poll(s, false);
-    if (s->fd >= 0) {
-        close(s->fd);
-    }
-    destroy_vector(s->msgvec, MAX_L2TPV3_MSGCNT, IOVSIZE);
-    g_free(s->vec);
-    g_free(s->header_buf);
-    g_free(s->dgram_dst);
-}
-
-static NetClientInfo net_l2tpv3_info = {
-    .type = NET_CLIENT_DRIVER_L2TPV3,
-    .size = sizeof(NetL2TPV3State),
-    .receive = net_l2tpv3_receive_dgram,
-    .receive_iov = net_l2tpv3_receive_dgram_iov,
-    .poll = l2tpv3_poll,
-    .cleanup = net_l2tpv3_cleanup,
-};
-
 int net_init_l2tpv3(const Netdev *netdev,
                     const char *name,
                     NetClientState *peer, Error **errp)
 {
-    /* FIXME error_setg(errp, ...) on failure */
     const NetdevL2TPv3Options *l2tpv3;
-    NetL2TPV3State *s;
+    NetUdstState *s;
     NetClientState *nc;
+    L2TPV3TunnelParams *p;
+
     int fd = -1, gairet;
     struct addrinfo hints;
     struct addrinfo *result = NULL;
     char *srcport, *dstport;
 
-    nc = qemu_new_net_client(&net_l2tpv3_info, peer, "l2tpv3", name);
+    nc = qemu_new_udst_net_client(name, peer);
+
+    s = DO_UPCAST(NetUdstState, nc, nc);
 
-    s = DO_UPCAST(NetL2TPV3State, nc, nc);
+    p = g_malloc(sizeof(L2TPV3TunnelParams));
 
-    s->queue_head = 0;
-    s->queue_tail = 0;
-    s->header_mismatch = false;
+    s->params = p;
 
     assert(netdev->type == NET_CLIENT_DRIVER_L2TPV3);
     l2tpv3 = &netdev->u.l2tpv3;
 
+    if ((l2tpv3->has_ipv4 && l2tpv3->ipv4) &&
+            (l2tpv3->has_ipv6 && l2tpv3->ipv6)) {
+        error_report("please choose either ipv4 or ipv6");
+        goto outerr;
+    }
+
     if (l2tpv3->has_ipv6 && l2tpv3->ipv6) {
-        s->ipv6 = l2tpv3->ipv6;
+        p->ipv6 = l2tpv3->ipv6;
+    } else {
+        p->ipv6 = false;
+    }
+
+    if (l2tpv3->has_ipv4 && l2tpv3->ipv4) {
+        p->ipv4 = l2tpv3->ipv4;
     } else {
-        s->ipv6 = false;
+        p->ipv4 = false;
     }
 
     if ((l2tpv3->has_offset) && (l2tpv3->offset > 256)) {
@@ -561,22 +207,22 @@ int net_init_l2tpv3(const Netdev *netdev,
 
     if (l2tpv3->has_rxcookie || l2tpv3->has_txcookie) {
         if (l2tpv3->has_rxcookie && l2tpv3->has_txcookie) {
-            s->cookie = true;
+            p->cookie = true;
         } else {
             goto outerr;
         }
     } else {
-        s->cookie = false;
+        p->cookie = false;
     }
 
     if (l2tpv3->has_cookie64 || l2tpv3->cookie64) {
-        s->cookie_is_64  = true;
+        p->cookie_is_64  = true;
     } else {
-        s->cookie_is_64  = false;
+        p->cookie_is_64  = false;
     }
 
     if (l2tpv3->has_udp && l2tpv3->udp) {
-        s->udp = true;
+        p->udp = true;
         if (!(l2tpv3->has_srcport && l2tpv3->has_dstport)) {
             error_report("l2tpv3_open : need both src and dst port for udp");
             goto outerr;
@@ -585,52 +231,56 @@ int net_init_l2tpv3(const Netdev *netdev,
             dstport = l2tpv3->dstport;
         }
     } else {
-        s->udp = false;
+        p->udp = false;
         srcport = NULL;
         dstport = NULL;
     }
 
 
     s->offset = 4;
-    s->session_offset = 0;
-    s->cookie_offset = 4;
-    s->counter_offset = 4;
+    p->session_offset = 0;
+    p->cookie_offset = 4;
+    p->counter_offset = 4;
 
-    s->tx_session = l2tpv3->txsession;
+    p->tx_session = l2tpv3->txsession;
     if (l2tpv3->has_rxsession) {
-        s->rx_session = l2tpv3->rxsession;
+        p->rx_session = l2tpv3->rxsession;
     } else {
-        s->rx_session = s->tx_session;
+        p->rx_session = p->tx_session;
     }
 
-    if (s->cookie) {
-        s->rx_cookie = l2tpv3->rxcookie;
-        s->tx_cookie = l2tpv3->txcookie;
-        if (s->cookie_is_64 == true) {
+    if (p->cookie) {
+        p->rx_cookie = l2tpv3->rxcookie;
+        p->tx_cookie = l2tpv3->txcookie;
+        if (p->cookie_is_64 == true) {
             /* 64 bit cookie */
             s->offset += 8;
-            s->counter_offset += 8;
+            p->counter_offset += 8;
         } else {
             /* 32 bit cookie */
             s->offset += 4;
-            s->counter_offset += 4;
+            p->counter_offset += 4;
         }
     }
 
     memset(&hints, 0, sizeof(hints));
 
-    if (s->ipv6) {
+    if (p->ipv6) {
         hints.ai_family = AF_INET6;
     } else {
-        hints.ai_family = AF_INET;
+        if (p->ipv4) {
+            hints.ai_family = AF_INET;
+        } else {
+            hints.ai_family = AF_UNSPEC;
+        }
     }
-    if (s->udp) {
+    if (p->udp) {
         hints.ai_socktype = SOCK_DGRAM;
         hints.ai_protocol = 0;
         s->offset += 4;
-        s->counter_offset += 4;
-        s->session_offset += 4;
-        s->cookie_offset += 4;
+        p->counter_offset += 4;
+        p->session_offset += 4;
+        p->cookie_offset += 4;
     } else {
         hints.ai_socktype = SOCK_RAW;
         hints.ai_protocol = IPPROTO_L2TP;
@@ -645,6 +295,16 @@ int net_init_l2tpv3(const Netdev *netdev,
         );
         goto outerr;
     }
+
+    /* Update flags to match actual result of name resolution */
+
+    if (result->ai_family == AF_INET) {
+        p->ipv4 = true;
+        p->ipv6 = false;
+    } else {
+        p->ipv6 = true;
+        p->ipv4 = false;
+    }
     fd = socket(result->ai_family, result->ai_socktype, result->ai_protocol);
     if (fd == -1) {
         fd = -errno;
@@ -661,12 +321,12 @@ int net_init_l2tpv3(const Netdev *netdev,
 
     memset(&hints, 0, sizeof(hints));
 
-    if (s->ipv6) {
+    if (p->ipv6) {
         hints.ai_family = AF_INET6;
     } else {
         hints.ai_family = AF_INET;
     }
-    if (s->udp) {
+    if (p->udp) {
         hints.ai_socktype = SOCK_DGRAM;
         hints.ai_protocol = 0;
     } else {
@@ -693,17 +353,17 @@ int net_init_l2tpv3(const Netdev *netdev,
     }
 
     if (l2tpv3->has_counter && l2tpv3->counter) {
-        s->has_counter = true;
+        p->has_counter = true;
         s->offset += 4;
     } else {
-        s->has_counter = false;
+        p->has_counter = false;
     }
 
     if (l2tpv3->has_pincounter && l2tpv3->pincounter) {
-        s->has_counter = true;  /* pin counter implies that there is counter */
-        s->pin_counter = true;
+        p->has_counter = true;  /* pin counter implies that there is counter */
+        p->pin_counter = true;
     } else {
-        s->pin_counter = false;
+        p->pin_counter = false;
     }
 
     if (l2tpv3->has_offset) {
@@ -711,27 +371,23 @@ int net_init_l2tpv3(const Netdev *netdev,
         s->offset += l2tpv3->offset;
     }
 
-    if ((s->ipv6) || (s->udp)) {
+    if ((p->ipv6) || (p->udp)) {
         s->header_size = s->offset;
     } else {
         s->header_size = s->offset + sizeof(struct iphdr);
     }
 
-    s->msgvec = build_l2tpv3_vector(s, MAX_L2TPV3_MSGCNT);
-    s->vec = g_new(struct iovec, MAX_L2TPV3_IOVCNT);
-    s->header_buf = g_malloc(s->header_size);
-
-    qemu_set_nonblock(fd);
-
-    s->fd = fd;
-    s->counter = 0;
-
-    l2tpv3_read_poll(s, true);
+    qemu_net_finalize_udst_init(s,
+        &l2tpv3_verify_header,
+        &l2tpv3_form_header,
+        fd);
+    p->counter = 0;
 
     snprintf(s->nc.info_str, sizeof(s->nc.info_str),
              "l2tpv3: connected");
     return 0;
 outerr:
+    error_setg(errp, "Cannot initialize L2TPv3 transport");
     qemu_del_net_client(nc);
     if (fd >= 0) {
         close(fd);
diff --git a/qapi-schema.json b/qapi-schema.json
index 62a044f006..91e27ca2b0 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3842,6 +3842,7 @@
     'dst':          'str',
     '*srcport':     'str',
     '*dstport':     'str',
+    '*ipv4':        'bool',
     '*ipv6':        'bool',
     '*udp':         'bool',
     '*cookie64':    'bool',
diff --git a/qemu-options.hx b/qemu-options.hx
index 9caf53fd76..20e0df6e9c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1947,8 +1947,8 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 #endif
 #ifdef CONFIG_UDST
     "-netdev l2tpv3,id=str,src=srcaddr,dst=dstaddr[,srcport=srcport][,dstport=dstport]\n"
-    "         [,rxsession=rxsession],txsession=txsession[,ipv6=on/off][,udp=on/off]\n"
-    "         [,cookie64=on/off][,counter][,pincounter][,txcookie=txcookie]\n"
+    "         [,rxsession=rxsession],txsession=txsession[,ipv6=on/off][,ipv4=on/off]\n"
+    "         [,udp=on/off][,cookie64=on/off][,counter][,pincounter][,txcookie=txcookie]\n"
     "         [,rxcookie=rxcookie][,offset=offset]\n"
     "                configure a network backend with ID 'str' connected to\n"
     "                an Ethernet over L2TPv3 pseudowire.\n"
@@ -1963,6 +1963,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
     "                use 'srcport=' to specify source udp port\n"
     "                use 'dstport=' to specify destination udp port\n"
     "                use 'ipv6=on' to force v6\n"
+    "                use 'ipv4=on' to force v4\n"
     "                L2TPv3 uses cookies to prevent misconfiguration as\n"
     "                well as a weak security measure\n"
     "                use 'rxcookie=0x012345678' to specify a rxcookie\n"
@@ -2345,8 +2346,8 @@ qemu-system-i386 linux.img \
                  -net socket,mcast=239.192.168.1:1102,localaddr=1.2.3.4
 @end example
 
-@item -netdev l2tpv3,id=@var{id},src=@var{srcaddr},dst=@var{dstaddr}[,srcport=@var{srcport}][,dstport=@var{dstport}],txsession=@var{txsession}[,rxsession=@var{rxsession}][,ipv6][,udp][,cookie64][,counter][,pincounter][,txcookie=@var{txcookie}][,rxcookie=@var{rxcookie}][,offset=@var{offset}]
-@itemx -net l2tpv3[,vlan=@var{n}][,name=@var{name}],src=@var{srcaddr},dst=@var{dstaddr}[,srcport=@var{srcport}][,dstport=@var{dstport}],txsession=@var{txsession}[,rxsession=@var{rxsession}][,ipv6][,udp][,cookie64][,counter][,pincounter][,txcookie=@var{txcookie}][,rxcookie=@var{rxcookie}][,offset=@var{offset}]
+@item -netdev l2tpv3,id=@var{id},src=@var{srcaddr},dst=@var{dstaddr}[,srcport=@var{srcport}][,dstport=@var{dstport}],txsession=@var{txsession}[,rxsession=@var{rxsession}][,ipv6][,ipv4][,udp][,cookie64][,counter][,pincounter][,txcookie=@var{txcookie}][,rxcookie=@var{rxcookie}][,offset=@var{offset}]
+@itemx -net l2tpv3[,vlan=@var{n}][,name=@var{name}],src=@var{srcaddr},dst=@var{dstaddr}[,srcport=@var{srcport}][,dstport=@var{dstport}],txsession=@var{txsession}[,rxsession=@var{rxsession}][,ipv6][,ipv4][,udp][,cookie64][,counter][,pincounter][,txcookie=@var{txcookie}][,rxcookie=@var{rxcookie}][,offset=@var{offset}]
 Connect VLAN @var{n} to L2TPv3 pseudowire. L2TPv3 (RFC3391) is a popular
 protocol to transport Ethernet (and other Layer 2) data frames between
 two systems. It is present in routers, firewalls and the Linux kernel
-- 
2.11.0

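The header offsets that net_init_l2tpv3() precomputes above can be summarised in one small standalone function. This is a minimal sketch and not code from the patch. struct l2tpv3_layout and l2tpv3_compute_layout() are hypothetical names. IPV4_HDR_LEN assumes an IPv4 header without options, which is the value of sizeof(struct iphdr).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: IPv4 header without options, i.e. sizeof(struct iphdr). */
#define IPV4_HDR_LEN 20

/* Hypothetical container for the precomputed L2TPv3 header layout. */
struct l2tpv3_layout {
    uint32_t offset;          /* L2TPv3 header bytes before the frame */
    uint32_t session_offset;
    uint32_t cookie_offset;
    uint32_t counter_offset;
    uint32_t header_size;     /* bytes to strip from each received packet */
};

/* Returns 0 on success, -1 if the extra offset is out of range. */
static int l2tpv3_compute_layout(bool udp, bool ipv6, bool cookie,
                                 bool cookie_is_64, bool counter,
                                 uint32_t extra_offset,
                                 struct l2tpv3_layout *l)
{
    if (extra_offset > 256) {
        return -1;            /* same limit the patch enforces */
    }
    l->offset = 4;            /* 32-bit session ID */
    l->session_offset = 0;
    l->cookie_offset = 4;
    l->counter_offset = 4;

    if (cookie) {
        uint32_t len = cookie_is_64 ? 8 : 4;
        l->offset += len;
        l->counter_offset += len;
    }
    if (udp) {
        /* UDP mode adds a 32-bit header word ahead of the session ID */
        l->offset += 4;
        l->session_offset += 4;
        l->cookie_offset += 4;
        l->counter_offset += 4;
    }
    if (counter) {
        l->offset += 4;
    }
    l->offset += extra_offset;

    /* Raw IPv4 sockets also return the IP header on receive. */
    l->header_size = (ipv6 || udp) ? l->offset : l->offset + IPV4_HDR_LEN;
    return 0;
}
```

For example, UDP encapsulation with a 32-bit cookie and a counter yields a 16-byte header with the session ID at byte 4, the cookie at byte 8 and the counter at byte 12. A raw IPv4 tunnel with neither cookie nor counter yields a 4-byte L2TPv3 header but a 24-byte receive header_size, because raw IPv4 sockets also deliver the IP header.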
* [Qemu-devel] [PATCH v2 3/5] GRETAP Backend for UDST
  2017-07-20 19:12 [Qemu-devel] Unified Datagram Socket Transport anton.ivanov
  2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 1/5] Unified Datagram Socket Transports anton.ivanov
  2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 2/5] Migrate l2tpv3 to UDST Backend anton.ivanov
@ 2017-07-20 19:12 ` anton.ivanov
  2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 4/5] Raw " anton.ivanov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: anton.ivanov @ 2017-07-20 19:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: jasowang, Anton Ivanov

From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

GRETAP Backend for Unified Datagram Socket Transport

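The following standalone sketch shows the on-the-wire layout that gre_form_header() builds and gre_verify_header() checks. It is not the patch code. struct gre_cfg, gre_build_header() and gre_check_header() are hypothetical names. The sketch handles byte order with explicit shifts instead of htons()/stl_be_p(). It assumes that the S flag is set whenever a sequence field is emitted, and that the buffer starts at the GRE header (any IPv4 header has already been skipped).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRE_FLAG_KEY      0x2000  /* K bit, same value as GRE_MODE_KEY */
#define GRE_FLAG_SEQUENCE 0x1000  /* S bit, same value as GRE_MODE_SEQUENCE */
#define GRE_PROTO_TEB     0x6558  /* Transparent Ethernet Bridging */

/* Hypothetical configuration mirroring the GRETunnelParams flags. */
struct gre_cfg {
    bool key;
    bool has_sequence;
    bool pin_sequence;
    uint32_t tx_key;
    uint32_t rx_key;
    uint32_t sequence;            /* last sequence number sent */
};

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint16_t gre_flags(const struct gre_cfg *c)
{
    return (uint16_t)((c->key ? GRE_FLAG_KEY : 0) |
                      (c->has_sequence ? GRE_FLAG_SEQUENCE : 0));
}

/* Writes the header into buf (at least 12 bytes); returns its length. */
static size_t gre_build_header(struct gre_cfg *c, uint8_t *buf)
{
    size_t len = 4;               /* flags word + protocol type */

    put_be16(buf, gre_flags(c));
    put_be16(buf + 2, GRE_PROTO_TEB);
    if (c->key) {                 /* the key precedes the sequence number */
        put_be32(buf + len, c->tx_key);
        len += 4;
    }
    if (c->has_sequence) {        /* pre-increment, or 0 when pinned */
        put_be32(buf + len, c->pin_sequence ? 0 : ++c->sequence);
        len += 4;
    }
    return len;
}

/* Returns 0 if buf carries the expected flags, protocol type and key. */
static int gre_check_header(const struct gre_cfg *c, const uint8_t *buf)
{
    uint8_t expect[4];

    put_be16(expect, gre_flags(c));
    put_be16(expect + 2, GRE_PROTO_TEB);
    for (int i = 0; i < 4; i++) {
        if (buf[i] != expect[i]) {
            return -1;
        }
    }
    if (c->key && get_be32(buf + 4) != c->rx_key) {
        return -1;
    }
    return 0;
}
```

With both a key and a sequence number enabled, the header is 12 bytes long. It contains flags 0x3000, protocol type 0x6558, the 32-bit key, and then the sequence number. The sequence number starts at 1 because it is pre-incremented, as in gre_form_header().
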
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 net/Makefile.objs |   2 +-
 net/clients.h     |   4 +
 net/gre.c         | 340 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/net.c         |   1 +
 qapi-schema.json  |  50 +++++++-
 qemu-options.hx   |  61 +++++++++-
 6 files changed, 453 insertions(+), 5 deletions(-)
 create mode 100644 net/gre.c

diff --git a/net/Makefile.objs b/net/Makefile.objs
index ffdfb96bd0..919bc3d78f 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
 common-obj-y += socket.o
 common-obj-y += dump.o
 common-obj-y += eth.o
-common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o
+common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o gre.o
 common-obj-$(CONFIG_POSIX) += vhost-user.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
diff --git a/net/clients.h b/net/clients.h
index 5cae479730..8f8a59aee3 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -49,6 +49,10 @@ int net_init_bridge(const Netdev *netdev, const char *name,
 
 int net_init_l2tpv3(const Netdev *netdev, const char *name,
                     NetClientState *peer, Error **errp);
+
+int net_init_gre(const Netdev *netdev, const char *name,
+                    NetClientState *peer, Error **errp);
+
 #ifdef CONFIG_VDE
 int net_init_vde(const Netdev *netdev, const char *name,
                  NetClientState *peer, Error **errp);
diff --git a/net/gre.c b/net/gre.c
new file mode 100644
index 0000000000..d2c96db87e
--- /dev/null
+++ b/net/gre.c
@@ -0,0 +1,340 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2015-2017 Cambridge GREys Limited
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2012-2014 Cisco Systems
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include <linux/ip.h>
+#include <netdb.h>
+#include "net/net.h"
+#include "clients.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+#include "udst.h"
+
+/* IANA-assigned IP protocol ID for GRE */
+
+
+#ifndef IPPROTO_GRE
+#define IPPROTO_GRE 0x2F
+#endif
+
+#define GRE_MODE_CHECKSUM     htons(8 << 12)   /* checksum present */
+#define GRE_MODE_RESERVED     htons(4 << 12)   /* routing present, unused */
+#define GRE_MODE_KEY          htons(2 << 12)   /* key present */
+#define GRE_MODE_SEQUENCE     htons(1 << 12)   /* sequence number present */
+
+
+/* GRE TYPE for Ethernet in GRE aka GRETAP */
+
+#define GRE_IRB htons(0x6558)
+
+struct gre_minimal_header {
+    uint16_t header;
+    uint16_t arptype;
+};
+
+typedef struct GRETunnelParams {
+    /*
+     * GRE parameters
+     */
+
+    uint32_t rx_key;
+    uint32_t tx_key;
+    uint32_t sequence;
+
+    /* Flags */
+
+    bool ipv4;
+    bool ipv6;
+    bool udp;
+    bool has_sequence;
+    bool pin_sequence;
+    bool checksum;
+    bool key;
+
+    /* Precomputed GRE specific offsets */
+
+    uint32_t key_offset;
+    uint32_t sequence_offset;
+    uint32_t checksum_offset;
+
+    struct gre_minimal_header header_bits;
+
+} GRETunnelParams;
+
+
+
+static void gre_form_header(void *us)
+{
+    NetUdstState *s = (NetUdstState *) us;
+    GRETunnelParams *p = (GRETunnelParams *) s->params;
+
+    uint32_t *sequence;
+
+    *((uint32_t *) s->header_buf) = *((uint32_t *) &p->header_bits);
+
+    if (p->key) {
+        stl_be_p(
+            (uint32_t *) (s->header_buf + p->key_offset),
+            p->tx_key
+        );
+    }
+    if (p->has_sequence) {
+        sequence = (uint32_t *)(s->header_buf + p->sequence_offset);
+        if (p->pin_sequence) {
+            *sequence = 0;
+        } else {
+            stl_be_p(sequence, ++p->sequence);
+        }
+    }
+}
+
+static int gre_verify_header(void *us, uint8_t *buf)
+{
+
+    NetUdstState *s = (NetUdstState *) us;
+    GRETunnelParams *p = (GRETunnelParams *) s->params;
+    uint32_t key;
+
+
+    if (!p->ipv6) {
+        buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
+    }
+
+    if (*((uint32_t *) buf) != *((uint32_t *) &p->header_bits)) {
+        if (!s->header_mismatch) {
+            error_report("header type disagreement, expecting %0x, got %0x",
+                *((uint32_t *) &p->header_bits), *((uint32_t *) buf));
+        }
+        return -1;
+    }
+
+    if (p->key) {
+        key = ldl_be_p(buf + p->key_offset);
+        if (key != p->rx_key) {
+            if (!s->header_mismatch) {
+                error_report("unknown key id %0x, expecting %0x",
+                    key, p->rx_key);
+            }
+            return -1;
+        }
+    }
+    return 0;
+}
+
+int net_init_gre(const Netdev *netdev,
+                    const char *name,
+                    NetClientState *peer, Error **errp)
+{
+    const NetdevGREOptions *gre;
+    NetUdstState *s;
+    NetClientState *nc;
+    GRETunnelParams *p;
+
+    int fd = -1, gairet;
+    struct addrinfo hints;
+    struct addrinfo *result = NULL;
+
+    nc = qemu_new_udst_net_client(name, peer);
+
+    s = DO_UPCAST(NetUdstState, nc, nc);
+
+    p = g_new0(GRETunnelParams, 1);
+
+    s->params = p;
+    p->header_bits.arptype = GRE_IRB;
+    p->header_bits.header = 0;
+
+    assert(netdev->type == NET_CLIENT_DRIVER_GRE);
+    gre = &netdev->u.gre;
+
+    if ((gre->has_ipv4 && gre->ipv4) &&
+            (gre->has_ipv6 && gre->ipv6)) {
+        error_report("please choose either ipv4 or ipv6");
+        goto outerr;
+    }
+
+    if (gre->has_ipv6 && gre->ipv6) {
+        p->ipv6 = gre->ipv6;
+    } else {
+        p->ipv6 = false;
+    }
+
+    if (gre->has_ipv4 && gre->ipv4) {
+        p->ipv4 = gre->ipv4;
+    } else {
+        p->ipv4 = false;
+    }
+
+
+    s->offset = 4;
+    p->key_offset = 4;
+    p->sequence_offset = 4;
+    p->checksum_offset = 4;
+
+    if (gre->has_rxkey || gre->has_txkey) {
+        if (gre->has_rxkey && gre->has_txkey) {
+            p->key = true;
+            p->header_bits.header |= GRE_MODE_KEY;
+        } else {
+            goto outerr;
+        }
+    } else {
+        p->key = false;
+    }
+
+    if (p->key) {
+        p->rx_key = gre->rxkey;
+        p->tx_key = gre->txkey;
+        s->offset += 4;
+        p->sequence_offset += 4;
+    }
+
+
+    if (gre->has_sequence && gre->sequence) {
+        s->offset += 4;
+        p->has_sequence = true;
+        p->header_bits.header |= GRE_MODE_SEQUENCE;
+    } else {
+        p->has_sequence = false;
+    }
+
+    if (gre->has_pinsequence && gre->pinsequence) {
+        /* pin sequence implies that there is sequence */
+        p->has_sequence = true;
+        p->pin_sequence = true;
+    } else {
+        p->pin_sequence = false;
+    }
+
+    memset(&hints, 0, sizeof(hints));
+
+    if (p->ipv6) {
+        hints.ai_family = AF_INET6;
+    } else {
+        if (p->ipv4) {
+            hints.ai_family = AF_INET;
+        } else {
+            hints.ai_family = AF_UNSPEC;
+        }
+    }
+
+    hints.ai_socktype = SOCK_RAW;
+    hints.ai_protocol = IPPROTO_GRE;
+
+    gairet = getaddrinfo(gre->src, NULL, &hints, &result);
+
+    if ((gairet != 0) || (result == NULL)) {
+        error_report(
+            "gre_open : could not resolve src, error = %s",
+            gai_strerror(gairet)
+        );
+        goto outerr;
+    }
+
+    /* Update flags to match actual result of name resolution */
+
+    if (result->ai_family == AF_INET) {
+        p->ipv4 = true;
+        p->ipv6 = false;
+    } else {
+        p->ipv6 = true;
+        p->ipv4 = false;
+    }
+
+    fd = socket(result->ai_family, result->ai_socktype, result->ai_protocol);
+    if (fd == -1) {
+        fd = -errno;
+        error_report("gre_open : socket creation failed, errno = %d", -fd);
+        goto outerr;
+    }
+    if (bind(fd, (struct sockaddr *) result->ai_addr, result->ai_addrlen)) {
+        error_report("gre_open : could not bind socket, errno = %d", errno);
+        goto outerr;
+    }
+    if (result) {
+        freeaddrinfo(result);
+    }
+
+    memset(&hints, 0, sizeof(hints));
+
+    if (p->ipv6) {
+        hints.ai_family = AF_INET6;
+    } else {
+        hints.ai_family = AF_INET;
+    }
+    hints.ai_socktype = SOCK_RAW;
+    hints.ai_protocol = IPPROTO_GRE;
+
+    result = NULL;
+    gairet = getaddrinfo(gre->dst, NULL, &hints, &result);
+    if ((gairet != 0) || (result == NULL)) {
+        error_report(
+            "gre_open : could not resolve dst, error = %s",
+            gai_strerror(gairet)
+        );
+        goto outerr;
+    }
+
+    s->dgram_dst = g_new0(struct sockaddr_storage, 1);
+    memcpy(s->dgram_dst, result->ai_addr, result->ai_addrlen);
+    s->dst_size = result->ai_addrlen;
+
+    if (result) {
+        freeaddrinfo(result);
+    }
+
+    if ((p->ipv6) || (p->udp)) {
+        s->header_size = s->offset;
+    } else {
+        s->header_size = s->offset + sizeof(struct iphdr);
+    }
+
+    qemu_net_finalize_udst_init(s,
+        &gre_verify_header,
+        &gre_form_header,
+        fd);
+
+    p->sequence = 0;
+
+    snprintf(s->nc.info_str, sizeof(s->nc.info_str),
+             "gre: connected");
+    return 0;
+outerr:
+    error_setg(errp, "Cannot initialize GRE transport");
+    qemu_del_net_client(nc);
+    if (fd >= 0) {
+        close(fd);
+    }
+    if (result) {
+        freeaddrinfo(result);
+    }
+    return -1;
+}
diff --git a/net/net.c b/net/net.c
index 723a256260..6163a8a3af 100644
--- a/net/net.c
+++ b/net/net.c
@@ -962,6 +962,7 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
 #endif
 #ifdef CONFIG_UDST
         [NET_CLIENT_DRIVER_L2TPV3] = net_init_l2tpv3,
+        [NET_CLIENT_DRIVER_GRE] = net_init_gre,
 #endif
 };
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 91e27ca2b0..3f2a9bf8a2 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3853,7 +3853,44 @@
     'txsession':    'uint32',
     '*rxsession':   'uint32',
     '*offset':      'uint32' } }
-
+##
+# @NetdevGREOptions:
+#
+# Connect the VLAN to an Ethernet over GRE (GRETAP) tunnel
+#
+# @src: source address
+#
+# @dst: destination address
+#
+# @ipv4: force the use of ipv4
+#
+# @ipv6: force the use of ipv6
+#
+# @sequence: add a sequence number to each frame
+#
+# @pinsequence: pin sequence counter to zero -
+#              workaround for buggy implementations or
+#              networks with packet reorder
+#
+# @txkey: 32 bit transmit key
+#
+# @rxkey: 32 bit receive key
+#
+# Note: GRE checksums are not supported at present
+#
+#
+# Since: 2.11
+##
+{ 'struct': 'NetdevGREOptions',
+  'data': {
+    'src':          'str',
+    'dst':          'str',
+    '*ipv4':        'bool',
+    '*ipv6':        'bool',
+    '*sequence':    'bool',
+    '*pinsequence': 'bool',
+    '*txkey':       'uint32',
+    '*rxkey':       'uint32' } }
 ##
 # @NetdevUdstOptions:
 #
@@ -3981,10 +4018,14 @@
 # Available netdev drivers.
 #
 # Since: 2.7
+#
+# udst - since: 2.11
+#
+# gre - since: 2.11
 ##
 { 'enum': 'NetClientDriver',
   'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'vde', 'dump',
-            'bridge', 'hubport', 'netmap', 'vhost-user', 'udst' ] }
+            'bridge', 'hubport', 'netmap', 'vhost-user', 'udst', 'gre' ] }
 
 ##
 # @Netdev:
@@ -4000,6 +4041,8 @@
 # 'l2tpv3' - since 2.1
 #
 # 'udst' - since 2.11
+#
+# 'gre' - since 2.11
 ##
 { 'union': 'Netdev',
   'base': { 'id': 'str', 'type': 'NetClientDriver' },
@@ -4017,7 +4060,8 @@
     'hubport':  'NetdevHubPortOptions',
     'netmap':   'NetdevNetmapOptions',
     'vhost-user': 'NetdevVhostUserOptions',
-    'udst':     'NetdevUdstOptions' } }
+    'udst':     'NetdevUdstOptions',
+    'gre':      'NetdevGREOptions' } }
 
 ##
 # @NetLegacy:
diff --git a/qemu-options.hx b/qemu-options.hx
index 20e0df6e9c..2692858d94 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1972,6 +1972,24 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
     "                use 'counter=off' to force a 'cut-down' L2TPv3 with no counter\n"
     "                use 'pincounter=on' to work around broken counter handling in peer\n"
     "                use 'offset=X' to add an extra offset between header and data\n"
+    "-netdev gre,id=str,src=srcaddr,dst=dstaddr[,rxkey=rxkey][,txkey=txkey][,ipv6=on/off]\n"
+    "         [,ipv4=on/off][,sequence][,pinsequence]\n"
+    "                configure a network backend with ID 'str' connected to\n"
+    "                an Ethernet over GRE pseudowire (aka GRETAP).\n"
+    "                Linux kernel 3.3+ as well as most routers and some switches\n"
+    "                support GRETAP. This transport allows connecting a VM to a VM,\n"
+    "                a VM to a router and even a VM to its host. It is a nearly\n"
+    "                universal standard (RFC1701).\n"
+    "                use 'src=' to specify source address\n"
+    "                use 'dst=' to specify destination address\n"
+    "                use 'ipv4=on' to force v4\n"
+    "                use 'ipv6=on' to force v6\n"
+    "                GRE may use keys to prevent misconfiguration as\n"
+    "                well as a weak security measure\n"
+    "                use 'rxkey=0x01234' to specify an rxkey (txkey must also be set)\n"
+    "                use 'txkey=0x01234' to specify a txkey (rxkey must also be set)\n"
+    "                use 'sequence=on' to add frame sequence to each packet\n"
+    "                use 'pinsequence=on' to work around broken sequence handling in peer\n"
 #endif
     "-netdev socket,id=str[,fd=h][,listen=[host]:port][,connect=host:port]\n"
     "                configure a network backend to connect to another network\n"
@@ -2395,12 +2413,53 @@ ip l2tp add session tunnel_id 1 name vmtunnel0 session_id \
 ifconfig vmtunnel0 mtu 1500
 ifconfig vmtunnel0 up
 brctl addif br-lan vmtunnel0
+@end example
+
+Alternatively, it is possible to assign an IP address to vmtunnel0, which allows
+the VM to connect to the host directly without using Linux bridging.
+
+
+@item -netdev gre,id=@var{id},src=@var{srcaddr},dst=@var{dstaddr}[,ipv4][,ipv6][,sequence][,pinsequence][,txkey=@var{txkey}][,rxkey=@var{rxkey}]
+Connect VLAN @var{n} to a GRE pseudowire. GRE (RFC1701) is a popular
+protocol for transporting various kinds of frames between two systems.
+This backend uses the GRE variant in which the transported frames are
+Ethernet frames, usually referred to as GRETAP. GRETAP is supported
+by routers, firewalls, switches and the Linux kernel
+(from version 3.3 onwards).
+
+This transport allows a VM to communicate directly with another VM, a router or a firewall.
+
+@item src=@var{srcaddr}
+    source address (mandatory)
+@item dst=@var{dstaddr}
+    destination address (mandatory)
+@item ipv6
+    force v6, otherwise defaults to v4.
+@item rxkey=@var{rxkey}
+@itemx txkey=@var{txkey}
+    Keys are a weak form of security in the GRE specification.
+Their main function is to prevent misconfiguration.
+@item sequence=on
+    Add a sequence number to each GRE frame.
+@item pinsequence=on
+    Work around broken sequence handling in the peer. This may also help on
+networks that reorder packets.
+
+For example, to attach a VM running on host 4.3.2.1 via GRETAP to the bridge br-lan
+on the remote Linux host 1.2.3.4:
+@example
+# Set up the tunnel on the Linux host, using raw IP (GRE) as encapsulation
+# on 1.2.3.4
+ip link add gt0 type gretap local 1.2.3.4 remote 4.3.2.1
+ifconfig gt0 mtu 1500
+ifconfig gt0 up
+brctl addif br-lan gt0
 
 
 # on 4.3.2.1
 # launch QEMU instance - if your network has reorder or is very lossy add ,pincounter
 
-qemu-system-i386 linux.img -net nic -net l2tpv3,src=4.2.3.1,dst=1.2.3.4,udp,srcport=16384,dstport=16384,rxsession=0xffffffff,txsession=0xffffffff,counter
+qemu-system-i386 linux.img -device virtio-net-pci,netdev=gre0 -netdev gre,id=gre0,src=4.3.2.1,dst=1.2.3.4
 
 
 @end example
-- 
2.11.0
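
The GRE options documented above correspond to optional fields of the GRE header: txkey= sets the K bit and appends a 32-bit key, sequence=on sets the S bit and appends a 32-bit sequence number (both defined in RFC 1701 and refined in RFC 2890). As an illustration only, here is a minimal C sketch of how a sender could lay out a GRETAP header. The helper gretap_form_header() is hypothetical and is not the form-header code in net/gre.c. Checksums are not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* GRE flag bits in the first 16-bit word (RFC 1701 / RFC 2890). */
#define GRE_FLAG_KEY      0x2000  /* K: key field present */
#define GRE_FLAG_SEQUENCE 0x1000  /* S: sequence number field present */

/* EtherType for Transparent Ethernet Bridging, i.e. GRETAP payloads. */
#define GRE_PROTO_TEB     0x6558

/*
 * Hypothetical helper: write a GRETAP header into buf (at least 12 bytes)
 * and return its length. Version 0, no checksum.
 */
static size_t gretap_form_header(uint8_t *buf, int use_key, uint32_t txkey,
                                 int use_seq, uint32_t seq)
{
    uint16_t flags = 0;
    uint16_t proto = htons(GRE_PROTO_TEB);
    uint32_t word;
    size_t len = 4;

    if (use_key) {
        flags |= GRE_FLAG_KEY;
    }
    if (use_seq) {
        flags |= GRE_FLAG_SEQUENCE;
    }
    flags = htons(flags);
    memcpy(buf, &flags, 2);
    memcpy(buf + 2, &proto, 2);

    /* Optional fields follow in a fixed order: (checksum), key, sequence. */
    if (use_key) {
        word = htonl(txkey);
        memcpy(buf + len, &word, 4);
        len += 4;
    }
    if (use_seq) {
        word = htonl(seq);
        memcpy(buf + len, &word, 4);
        len += 4;
    }
    return len;
}
```

With both the key and the sequence number enabled, the header grows from 4 to 12 bytes per frame.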

* [Qemu-devel] [PATCH v2 4/5] Raw Backend for UDST
  2017-07-20 19:12 [Qemu-devel] Unified Datagram Socket Transport anton.ivanov
                   ` (2 preceding siblings ...)
  2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 3/5] GRETAP Backend for UDST anton.ivanov
@ 2017-07-20 19:12 ` anton.ivanov
  2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 5/5] Migrate Datagram operation in socket transport to UDST anton.ivanov
  2017-07-21  3:55 ` [Qemu-devel] Unified Datagram Socket Transport Jason Wang
  5 siblings, 0 replies; 11+ messages in thread
From: anton.ivanov @ 2017-07-20 19:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: jasowang, Anton Ivanov

From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Raw Socket Backend for Unified Datagram Socket Transport

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 net/Makefile.objs |   2 +-
 net/clients.h     |   3 ++
 net/net.c         |   1 +
 net/raw.c         | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json  |  20 ++++++++-
 qemu-options.hx   |  32 ++++++++++++++
 6 files changed, 178 insertions(+), 3 deletions(-)
 create mode 100644 net/raw.c

diff --git a/net/Makefile.objs b/net/Makefile.objs
index 919bc3d78f..457297b5ed 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
 common-obj-y += socket.o
 common-obj-y += dump.o
 common-obj-y += eth.o
-common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o gre.o
+common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o gre.o raw.o
 common-obj-$(CONFIG_POSIX) += vhost-user.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
diff --git a/net/clients.h b/net/clients.h
index 8f8a59aee3..98d8ae59b7 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -53,6 +53,9 @@ int net_init_l2tpv3(const Netdev *netdev, const char *name,
 int net_init_gre(const Netdev *netdev, const char *name,
                     NetClientState *peer, Error **errp);
 
+int net_init_raw(const Netdev *netdev, const char *name,
+                    NetClientState *peer, Error **errp);
+
 #ifdef CONFIG_VDE
 int net_init_vde(const Netdev *netdev, const char *name,
                  NetClientState *peer, Error **errp);
diff --git a/net/net.c b/net/net.c
index 6163a8a3af..8eb0aa2bee 100644
--- a/net/net.c
+++ b/net/net.c
@@ -963,6 +963,7 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
 #ifdef CONFIG_UDST
         [NET_CLIENT_DRIVER_L2TPV3] = net_init_l2tpv3,
         [NET_CLIENT_DRIVER_GRE] = net_init_gre,
+        [NET_CLIENT_DRIVER_RAW] = net_init_raw,
 #endif
 };
 
diff --git a/net/raw.c b/net/raw.c
new file mode 100644
index 0000000000..8f73248095
--- /dev/null
+++ b/net/raw.c
@@ -0,0 +1,123 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2015-2017 Cambridge Greys Limited
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2012-2014 Cisco Systems
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include <linux/ip.h>
+#include <netdb.h>
+#include <sys/ioctl.h>
+#include <net/if.h>
+#include "net/net.h"
+#include <sys/socket.h>
+#include <linux/if_packet.h>
+#include <net/ethernet.h>
+#include "clients.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+#include "udst.h"
+
+static int noop(void *us, uint8_t *buf)
+{
+    return 0;
+}
+
+int net_init_raw(const Netdev *netdev,
+                    const char *name,
+                    NetClientState *peer, Error **errp)
+{
+
+    const NetdevRawOptions *raw;
+    NetUdstState *s;
+    NetClientState *nc;
+
+    int fd = -1;
+    int err;
+
+    struct ifreq ifr;
+    struct sockaddr_ll sock;
+
+
+    nc = qemu_new_udst_net_client(name, peer);
+
+    s = DO_UPCAST(NetUdstState, nc, nc);
+
+    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+    if (fd == -1) {
+        err = -errno;
+        error_report("raw_open : raw socket creation failed, errno = %d", -err);
+        goto outerr;
+    }
+
+
+    s->dgram_dst = NULL;
+    s->dst_size = 0;
+
+    assert(netdev->type == NET_CLIENT_DRIVER_RAW);
+    raw = &netdev->u.raw;
+
+    memset(&ifr, 0, sizeof(struct ifreq));
+    strncpy((char *) &ifr.ifr_name, raw->ifname, sizeof(ifr.ifr_name) - 1);
+
+    if (ioctl(fd, SIOCGIFINDEX, (void *) &ifr) < 0) {
+        err = -errno;
+        error_report("SIOCGIFINDEX, failed to get raw interface index for %s",
+            raw->ifname);
+        goto outerr;
+    }
+
+    sock.sll_family = AF_PACKET;
+    sock.sll_protocol = htons(ETH_P_ALL);
+    sock.sll_ifindex = ifr.ifr_ifindex;
+
+    if (bind(fd, (struct sockaddr *) &sock, sizeof(struct sockaddr_ll)) < 0) {
+        err = -errno;
+        error_report("raw: failed to bind raw socket, errno = %d", -err);
+        goto outerr;
+    }
+
+    s->offset = 0;
+
+    qemu_net_finalize_udst_init(s,
+        &noop,
+        NULL,
+        fd);
+
+    snprintf(s->nc.info_str, sizeof(s->nc.info_str),
+             "raw: connected");
+    return 0;
+outerr:
+    error_setg(errp, "Cannot initialize raw transport");
+    qemu_del_net_client(nc);
+    if (fd >= 0) {
+        close(fd);
+    }
+    return -1;
+}
+
diff --git a/qapi-schema.json b/qapi-schema.json
index 3f2a9bf8a2..6882fb7fc6 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3904,6 +3904,21 @@
   'data': { } }
 
 ##
+# @NetdevRawOptions:
+#
+# Connect the VLAN to a network interface using raw sockets
+#
+# @ifname: network interface name
+#
+#
+# Since 2.9
+##
+{ 'struct': 'NetdevRawOptions',
+  'data': {
+    'ifname':          'str'
+} }
+
+##
 # @NetdevVdeOptions:
 #
 # Connect the VLAN to a vde switch running on the host.
@@ -4025,7 +4040,7 @@
 ##
 { 'enum': 'NetClientDriver',
   'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'vde', 'dump',
-            'bridge', 'hubport', 'netmap', 'vhost-user', 'udst', 'gre' ] }
+            'bridge', 'hubport', 'netmap', 'vhost-user', 'udst', 'gre', 'raw' ] }
 
 ##
 # @Netdev:
@@ -4061,7 +4076,8 @@
     'netmap':   'NetdevNetmapOptions',
     'vhost-user': 'NetdevVhostUserOptions',
     'udst':     'NetdevUdstOptions',
-    'gre':      'NetdevGREOptions' } }
+    'gre':      'NetdevGREOptions',
+    'raw':      'NetdevRawOptions' } }
 
 ##
 # @NetLegacy:
diff --git a/qemu-options.hx b/qemu-options.hx
index 2692858d94..6a24cafdf5 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1990,6 +1990,13 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
     "                use 'txkey=0x01234' to specify a txkey\n"
     "                use 'sequence=on' to add frame sequence to each packet\n"
     "                use 'pinsequence=on' to work around broken sequence handling in peer\n"
+    "-netdev raw,id=str,ifname=ifname\n"
+    "                configure a network backend with ID 'str' connected to\n"
+    "                an Ethernet interface named ifname via raw socket.\n"
+    "                This backend does not change the interface settings.\n"
+    "                Most interfaces need to be put into promiscuous mode,\n"
+    "                and have most offloads (TSO, etc.) turned off.\n"
+    "                Some virtual interfaces like tap support only RX.\n"
 #endif
     "-netdev socket,id=str[,fd=h][,listen=[host]:port][,connect=host:port]\n"
     "                configure a network backend to connect to another network\n"
@@ -2464,6 +2471,31 @@ qemu-system-i386 linux.img -device virtio-net-pci,netdev=gre0 -netdev gre,id=gre
 
 @end example
 
+@item -netdev raw,id=@var{id},ifname=@var{ifname}
+Connect VLAN @var{n} directly to an Ethernet interface using a raw socket.
+
+This transport allows a VM to bypass most of the host network stack, which
+is useful for tapping.
+
+@item ifname=@var{ifname}
+    interface name (mandatory)
+
+@example
+# set up the interface - put it in promiscuous mode and turn off offloads
+ifconfig eth0 up
+ifconfig eth0 promisc
+
+/sbin/ethtool -K eth0 gro off
+/sbin/ethtool -K eth0 tso off
+/sbin/ethtool -K eth0 gso off
+/sbin/ethtool -K eth0 tx off
+
+# launch QEMU instance
+
+qemu-system-i386 linux.img -device virtio-net-pci,netdev=raw0 -netdev raw,id=raw0,ifname=eth0
+
+@end example
+
 @item -netdev vde,id=@var{id}[,sock=@var{socketpath}][,port=@var{n}][,group=@var{groupname}][,mode=@var{octalmode}]
 @itemx -net vde[,vlan=@var{n}][,name=@var{name}][,sock=@var{socketpath}] [,port=@var{n}][,group=@var{groupname}][,mode=@var{octalmode}]
 Connect VLAN @var{n} to PORT @var{n} of a vde switch running on host and
-- 
2.11.0
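
The raw help text above states that the backend does not change interface settings, so promiscuous mode has to be configured beforehand. A minimal pre-flight check could look like the sketch below, assuming Linux and the standard SIOCGIFFLAGS ioctl. iface_is_promisc() is a hypothetical helper and is not part of net/raw.c.

```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: return 1 if ifname is in promiscuous mode,
 * 0 if it is not, and -1 on error (e.g. no such interface).
 */
static int iface_is_promisc(const char *ifname)
{
    struct ifreq ifr;
    int fd, ret;

    /* Any socket can serve as a handle for interface ioctls. */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name) - 1);

    if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0) {
        ret = -1;
    } else {
        ret = (ifr.ifr_flags & IFF_PROMISC) ? 1 : 0;
    }
    close(fd);
    return ret;
}
```

Reading the flags needs no privileges, so a check like this could warn the user before the backend is started, without changing interface settings.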

* [Qemu-devel] [PATCH v2 5/5] Migrate Datagram operation in socket transport to UDST
  2017-07-20 19:12 [Qemu-devel] Unified Datagram Socket Transport anton.ivanov
                   ` (3 preceding siblings ...)
  2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 4/5] Raw " anton.ivanov
@ 2017-07-20 19:12 ` anton.ivanov
  2017-07-21  3:55 ` [Qemu-devel] Unified Datagram Socket Transport Jason Wang
  5 siblings, 0 replies; 11+ messages in thread
From: anton.ivanov @ 2017-07-20 19:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: jasowang, Anton Ivanov

From: Anton Ivanov <anton.ivanov@cambridgegreys.com>

Migrate datagram operation to UDST if UDST is available.

Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
---
 net/socket.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 118 insertions(+), 5 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index f85ef7d61b..0523fe9ac1 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -33,6 +33,9 @@
 #include "qemu/sockets.h"
 #include "qemu/iov.h"
 #include "qemu/main-loop.h"
+#ifdef CONFIG_UDST
+#include "udst.h"
+#endif
 
 typedef struct NetSocketState {
     NetClientState nc;
@@ -49,6 +52,14 @@ typedef struct NetSocketState {
 static void net_socket_accept(void *opaque);
 static void net_socket_writable(void *opaque);
 
+#ifdef CONFIG_UDST
+static int noop(void *us, uint8_t *buf)
+{
+    return 0;
+}
+#endif
+
+
 static void net_socket_update_fd_handler(NetSocketState *s)
 {
     qemu_set_fd_handler(s->fd,
@@ -113,6 +124,8 @@ static ssize_t net_socket_receive(NetClientState *nc, const uint8_t *buf, size_t
     return size;
 }
 
+#ifndef CONFIG_UDST
+
 static ssize_t net_socket_receive_dgram(NetClientState *nc, const uint8_t *buf, size_t size)
 {
     NetSocketState *s = DO_UPCAST(NetSocketState, nc, nc);
@@ -130,6 +143,7 @@ static ssize_t net_socket_receive_dgram(NetClientState *nc, const uint8_t *buf,
     }
     return ret;
 }
+#endif
 
 static void net_socket_send_completed(NetClientState *nc, ssize_t len)
 {
@@ -189,6 +203,8 @@ static void net_socket_send(void *opaque)
     }
 }
 
+#ifndef CONFIG_UDST
+
 static void net_socket_send_dgram(void *opaque)
 {
     NetSocketState *s = opaque;
@@ -208,6 +224,7 @@ static void net_socket_send_dgram(void *opaque)
         net_socket_read_poll(s, false);
     }
 }
+#endif
 
 static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct in_addr *localaddr)
 {
@@ -309,7 +326,85 @@ static void net_socket_cleanup(NetClientState *nc)
         s->listen_fd = -1;
     }
 }
+#ifdef CONFIG_UDST
+static NetUdstState *net_socket_fd_init_dgram(NetClientState *peer,
+                                                const char *model,
+                                                const char *name,
+                                                int fd, int is_connected)
+{
+    struct sockaddr_in saddr;
+    int newfd;
+    socklen_t saddr_len = sizeof(saddr);
+    NetClientState *nc;
+    NetUdstState *s;
 
+    /* fd passed: multicast: "learn" dgram_dst address from bound address and save it
+     * Because this may be "shared" socket from a "master" process, datagrams would be recv()
+     * by ONLY ONE process: we must "clone" this dgram socket --jjo
+     */
+
+    if (is_connected) {
+        if (getsockname(fd, (struct sockaddr *) &saddr, &saddr_len) == 0) {
+            /* must be bound */
+            if (saddr.sin_addr.s_addr == 0) {
+                fprintf(stderr, "qemu: error: init_dgram: fd=%d unbound, "
+                        "cannot setup multicast dst addr\n", fd);
+                goto err;
+            }
+            /* clone dgram socket */
+            newfd = net_socket_mcast_create(&saddr, NULL);
+            if (newfd < 0) {
+                /* error already reported by net_socket_mcast_create() */
+                goto err;
+            }
+            /* clone newfd to fd, close newfd */
+            dup2(newfd, fd);
+            close(newfd);
+
+        } else {
+            fprintf(stderr,
+                    "qemu: error: init_dgram: fd=%d failed getsockname(): %s\n",
+                    fd, strerror(errno));
+            goto err;
+        }
+    }
+
+    fprintf(stderr,
+            "qemu: init udst for fd=%d\n",
+            fd);
+    nc = qemu_new_udst_net_client(name, peer);
+
+    s = DO_UPCAST(NetUdstState, nc, nc);
+
+    s->offset = 0;
+
+    qemu_net_finalize_udst_init(s,
+        &noop,
+        NULL,
+        fd);
+
+    /* mcast: save bound address as dst */
+    if (is_connected) {
+        s->dgram_dst = g_memdup(&saddr, sizeof(struct sockaddr_in));
+        s->dst_size = sizeof(struct sockaddr_in);
+        snprintf(nc->info_str, sizeof(nc->info_str),
+                 "socket: fd=%d (cloned mcast=%s:%d)",
+                 fd, inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
+    } else {
+        /* This will be overwritten later if we have a dst */
+        s->dgram_dst = NULL;
+        s->dst_size = 0;
+        snprintf(nc->info_str, sizeof(nc->info_str),
+                 "socket: fd=%d", fd);
+    }
+
+    return s;
+
+err:
+    closesocket(fd);
+    return NULL;
+}
+#else
 static NetClientInfo net_dgram_socket_info = {
     .type = NET_CLIENT_DRIVER_SOCKET,
     .size = sizeof(NetSocketState),
@@ -386,6 +481,7 @@ err:
     closesocket(fd);
     return NULL;
 }
+#endif
 
 static void net_socket_connect(void *opaque)
 {
@@ -430,7 +526,7 @@ static NetSocketState *net_socket_fd_init_stream(NetClientState *peer,
     return s;
 }
 
-static NetSocketState *net_socket_fd_init(NetClientState *peer,
+static void *net_socket_fd_init(NetClientState *peer,
                                           const char *model, const char *name,
                                           int fd, int is_connected)
 {
@@ -567,7 +663,7 @@ static int net_socket_connect_init(NetClientState *peer,
             break;
         }
     }
-    s = net_socket_fd_init(peer, model, name, fd, connected);
+    s = net_socket_fd_init_stream(peer, model, name, fd, connected);
     if (!s)
         return -1;
     snprintf(s->nc.info_str, sizeof(s->nc.info_str),
@@ -582,7 +678,11 @@ static int net_socket_mcast_init(NetClientState *peer,
                                  const char *host_str,
                                  const char *localaddr_str)
 {
+#ifdef CONFIG_UDST
+    NetUdstState *s;
+#else
     NetSocketState *s;
+#endif
     int fd;
     struct sockaddr_in saddr;
     struct in_addr localaddr, *param_localaddr;
@@ -602,11 +702,15 @@ static int net_socket_mcast_init(NetClientState *peer,
     if (fd < 0)
         return -1;
 
-    s = net_socket_fd_init(peer, model, name, fd, 0);
+    s = net_socket_fd_init_dgram(peer, model, name, fd, 0);
     if (!s)
         return -1;
-
+#ifdef CONFIG_UDST
+    s->dgram_dst = g_memdup(&saddr, sizeof(struct sockaddr_in));
+    s->dst_size = sizeof(struct sockaddr_in);
+#else
     s->dgram_dst = saddr;
+#endif
 
     snprintf(s->nc.info_str, sizeof(s->nc.info_str),
              "socket: mcast=%s:%d",
@@ -621,7 +725,11 @@ static int net_socket_udp_init(NetClientState *peer,
                                  const char *rhost,
                                  const char *lhost)
 {
+#ifdef CONFIG_UDST
+    NetUdstState *s;
+#else
     NetSocketState *s;
+#endif
     int fd, ret;
     struct sockaddr_in laddr, raddr;
 
@@ -652,12 +760,17 @@ static int net_socket_udp_init(NetClientState *peer,
     }
     qemu_set_nonblock(fd);
 
-    s = net_socket_fd_init(peer, model, name, fd, 0);
+    s = net_socket_fd_init_dgram(peer, model, name, fd, 0);
     if (!s) {
         return -1;
     }
 
+#ifdef CONFIG_UDST
+    s->dgram_dst = g_memdup(&raddr, sizeof(struct sockaddr_in));
+    s->dst_size = sizeof(struct sockaddr_in);
+#else
     s->dgram_dst = raddr;
+#endif
 
     snprintf(s->nc.info_str, sizeof(s->nc.info_str),
              "socket: udp=%s:%d",
-- 
2.11.0
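
In the UDST branch, net_socket_fd_init_dgram() learns the multicast destination from the address that the passed-in fd is bound to, then stores a heap copy in dgram_dst/dst_size. Below is a simplified sketch of that step. It uses malloc() instead of g_memdup() so that it stays self-contained. struct dgram_dst and learn_dst_from_bound_fd() are hypothetical names, and the socket cloning that the patch performs is omitted.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the destination fields of NetUdstState. */
struct dgram_dst {
    struct sockaddr_in *addr;   /* heap copy, like g_memdup() in the patch */
    socklen_t size;
};

/* Save the address fd is bound to as the datagram destination. */
static int learn_dst_from_bound_fd(int fd, struct dgram_dst *dst)
{
    struct sockaddr_in saddr;
    socklen_t saddr_len = sizeof(saddr);

    if (getsockname(fd, (struct sockaddr *) &saddr, &saddr_len) < 0) {
        return -1;
    }
    /* As in the patch, reject 0.0.0.0 (unbound or wildcard-bound). */
    if (saddr.sin_addr.s_addr == 0) {
        return -1;
    }
    dst->addr = malloc(sizeof(saddr));
    if (!dst->addr) {
        return -1;
    }
    memcpy(dst->addr, &saddr, sizeof(saddr));
    dst->size = sizeof(saddr);
    return 0;
}
```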

* Re: [Qemu-devel] Unified Datagram Socket Transport
  2017-07-20 19:12 [Qemu-devel] Unified Datagram Socket Transport anton.ivanov
                   ` (4 preceding siblings ...)
  2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 5/5] Migrate Datagram operation in socket transport to UDST anton.ivanov
@ 2017-07-21  3:55 ` Jason Wang
  2017-07-21  4:25   ` Anton Ivanov
  2017-07-21 19:05   ` Anton Ivanov
  5 siblings, 2 replies; 11+ messages in thread
From: Jason Wang @ 2017-07-21  3:55 UTC (permalink / raw)
  To: anton.ivanov, qemu-devel



On 2017年07月21日 03:12, anton.ivanov@cambridgegreys.com wrote:
> Hi all,
>
> This addresses comments so far except Eric's suggestion to use
> InetSocketAddressBase. If I understand correctly its intended use,
> it will not be of help for protocols which have no port (raw
> sockets - GRE, L2TPv3, etc).
>
> It also includes a port of the original socket.c transport to
> the new UDST backend. The relevant code is ifdef-ed so there
> should be no effect on other systems.

This looks sub-optimal. If you want to do this, I would rather suggest 
you just extend the socket dgram backend the way udst does now.

>
> I think that this is would be the appropriate place to stop in this
> iteration. I would prefer to have this polished, before I start
> looking at sendmmsg and bulk send or some of the more unpleasant
> encapsulations like geneve.

Note that we're in soft freeze now, so this feature is for 2.11. If it 
looks good, I can only queue it for 2.11.

Btw, it looks like not all comments on v1 were addressed.

Thanks

>
> A.
>
>

* Re: [Qemu-devel] Unified Datagram Socket Transport
  2017-07-21  3:55 ` [Qemu-devel] Unified Datagram Socket Transport Jason Wang
@ 2017-07-21  4:25   ` Anton Ivanov
  2017-07-24  3:17     ` Jason Wang
  2017-07-21 19:05   ` Anton Ivanov
  1 sibling, 1 reply; 11+ messages in thread
From: Anton Ivanov @ 2017-07-21  4:25 UTC (permalink / raw)
  To: Jason Wang, qemu-devel



On 21/07/17 04:55, Jason Wang wrote:
>
>
> On 2017年07月21日 03:12, anton.ivanov@cambridgegreys.com wrote:
>> Hi all,
>>
>> This addresses comments so far except Eric's suggestion to use
>> InetSocketAddressBase. If I understand correctly its intended use,
>> it will not be of help for protocols which have no port (raw
>> sockets - GRE, L2TPv3, etc).
>>
>> It also includes a port of the original socket.c transport to
>> the new UDST backend. The relevant code is ifdef-ed so there
>> should be no effect on other systems.
>
> This looks sub-optimal. If you want to do this, I would rather suggest 
> you just extend the socket dgram backend like what udst did now.

Apologies, do you mean extending it further to handle the TCP form?

That does not work at present. Sure, you can receive TCP using recvmmsg, 
but you cannot use it to handle a TCP-based variable-length encapsulation 
where frame lengths are set in a header. That can be done only by sequential 
read() or recv() calls: read the frame length first, then the frame.
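
To illustrate the stream-framing point: assume each frame is preceded by a 4-byte length header in network byte order. The receiver then has to read that header before it knows how many bytes belong to the frame. A minimal C sketch follows. recv_exact() and recv_frame() are hypothetical helpers, not code from net/socket.c.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read exactly len bytes, looping over short reads. Returns 0 or -1. */
static int recv_exact(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n <= 0) {
            return -1;  /* error, or peer closed mid-frame */
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/*
 * Receive one length-prefixed frame: first the 4-byte header, then exactly
 * that many payload bytes. Until the header has been read, the receiver
 * cannot tell where this frame ends and the next begins, so each frame
 * needs two sequential reads.
 * Returns the frame length, or -1 on error or if the frame exceeds bufsize.
 */
static ssize_t recv_frame(int fd, uint8_t *buf, size_t bufsize)
{
    uint32_t len_be;
    size_t len;

    if (recv_exact(fd, &len_be, sizeof(len_be)) < 0) {
        return -1;
    }
    len = ntohl(len_be);
    if (len > bufsize) {
        return -1;
    }
    if (recv_exact(fd, buf, len) < 0) {
        return -1;
    }
    return (ssize_t) len;
}
```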

I can in theory add that to a unified socket backend, but this is 
completely different TX and RX logic, so it will become suboptimal (and 
quite ugly). It will need different TX and RX functions and appropriate 
initialization to select them. I'd rather keep it to datagram only - 
what it says on the tin.

>
>>
>> I think that this is would be the appropriate place to stop in this
>> iteration. I would prefer to have this polished, before I start
>> looking at sendmmsg and bulk send or some of the more unpleasant
>> encapsulations like geneve.
>
> Pay attention we're softfreeze now. So the feature is for 2.11, if it 
> looks good, I can only queue it for 2.11.

OK. Understood.

>
> Btw, looks like not all comments of v1 were addressed.

I will go through the comments one more time. I realized I may have 
missed converting malloc to a g_memdup in a couple places.

>
> Thanks

Best Regards,

A.

>
>>
>> A.
>>
>>
>
>

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

* Re: [Qemu-devel] Unified Datagram Socket Transport
  2017-07-21  3:55 ` [Qemu-devel] Unified Datagram Socket Transport Jason Wang
  2017-07-21  4:25   ` Anton Ivanov
@ 2017-07-21 19:05   ` Anton Ivanov
  1 sibling, 0 replies; 11+ messages in thread
From: Anton Ivanov @ 2017-07-21 19:05 UTC (permalink / raw)
  To: Jason Wang, qemu-devel

Hi Jason, Hi Eric, hi list,

I have gone through all comments and addressed everything to which I did 
not reply separately with clarifications.

Before I resubmit I have a couple of architectural questions:

1. Is the current form OK: a UDST client which cannot be instantiated 
directly, with the other transports creating instances of it? I am aware 
that this does not quite match the current semantics, but it keeps the 
per-transport code to the minimum possible: init and (in the newest 
version) optional verify and form-header functions. For example, in the 
next submission raw will consist of its init function and data only.
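
A rough sketch of the structure described in point 1. All names here are hypothetical (DgramBase, verify_header_fn, form_header_fn, KeyTransport). Shared RX/TX code calls optional per-transport hooks, loosely modelled on the verify callback (noop()) that the patches pass to qemu_net_finalize_udst_init(). A raw-style transport supplies no hooks and no header, i.e. only its init data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-transport hooks. */
typedef int (*verify_header_fn)(void *transport, uint8_t *buf);
typedef void (*form_header_fn)(void *transport, uint8_t *buf);

typedef struct DgramBase {
    size_t header_size;          /* encapsulation bytes before the frame */
    verify_header_fn verify;     /* optional; returns 0 to accept a packet */
    form_header_fn form_header;  /* optional; NULL means nothing to write */
    void *transport;             /* transport-specific state */
} DgramBase;

/* Shared RX path: return the payload offset, or -1 to drop the packet. */
static int dgram_rx(const DgramBase *b, uint8_t *pkt, size_t len)
{
    if (len < b->header_size) {
        return -1;
    }
    if (b->verify && b->verify(b->transport, pkt) != 0) {
        return -1;
    }
    return (int) b->header_size;
}

/* Shared TX path: header (if any) followed by the payload. */
static size_t dgram_tx(const DgramBase *b, uint8_t *out,
                       const uint8_t *payload, size_t len)
{
    if (b->form_header) {
        b->form_header(b->transport, out);
    }
    memcpy(out + b->header_size, payload, len);
    return b->header_size + len;
}

/* Example transport: a 4-byte key header (host byte order for brevity). */
typedef struct KeyTransport {
    uint32_t rxkey;
    uint32_t txkey;
} KeyTransport;

static int key_verify(void *t, uint8_t *buf)
{
    uint32_t k;

    memcpy(&k, buf, 4);
    return k == ((KeyTransport *) t)->rxkey ? 0 : -1;
}

static void key_form(void *t, uint8_t *buf)
{
    memcpy(buf, &((KeyTransport *) t)->txkey, 4);
}
```

In this layout, adding a transport means filling in one DgramBase. The RX/TX loops themselves stay shared.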

2. I have updated the help, docs and the API.

3. I did not quite understand your comment on socket.c - what are you 
suggesting there? Do you want to fold stream mode into a common 
backend? I do not think that is possible. I have tried to do surgery only 
on the datagram code. Also, socket.c is quite old and has violations 
of current coding style and conventions. Should I fix those as part of 
this submission, or can that be a separate patch?

A.


On 21/07/17 04:55, Jason Wang wrote:
>
>
> On 2017年07月21日 03:12, anton.ivanov@cambridgegreys.com wrote:
>> Hi all,
>>
>> This addresses comments so far except Eric's suggestion to use
>> InetSocketAddressBase. If I understand correctly its intended use,
>> it will not be of help for protocols which have no port (raw
>> sockets - GRE, L2TPv3, etc).
>>
>> It also includes a port of the original socket.c transport to
>> the new UDST backend. The relevant code is ifdef-ed so there
>> should be no effect on other systems.
>
> This looks sub-optimal. If you want to do this, I would rather suggest 
> you just extend the socket dgram backend like what udst did now.
>
>>
>> I think that this is would be the appropriate place to stop in this
>> iteration. I would prefer to have this polished, before I start
>> looking at sendmmsg and bulk send or some of the more unpleasant
>> encapsulations like geneve.
>
> Pay attention we're softfreeze now. So the feature is for 2.11, if it 
> looks good, I can only queue it for 2.11.
>
> Btw, looks like not all comments of v1 were addressed.
>
> Thanks
>
>>
>> A.
>>
>>
>
>

-- 
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

* Re: [Qemu-devel] Unified Datagram Socket Transport
  2017-07-21  4:25   ` Anton Ivanov
@ 2017-07-24  3:17     ` Jason Wang
  2017-07-24  6:15       ` Anton Ivanov
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Wang @ 2017-07-24  3:17 UTC (permalink / raw)
  To: Anton Ivanov, qemu-devel



On 2017年07月21日 12:25, Anton Ivanov wrote:
>
>
> On 21/07/17 04:55, Jason Wang wrote:
>>
>>
>> On 2017年07月21日 03:12, anton.ivanov@cambridgegreys.com wrote:
>>> Hi all,
>>>
>>> This addresses comments so far except Eric's suggestion to use
>>> InetSocketAddressBase. If I understand correctly its intended use,
>>> it will not be of help for protocols which have no port (raw
>>> sockets - GRE, L2TPv3, etc).
>>>
>>> It also includes a port of the original socket.c transport to
>>> the new UDST backend. The relevant code is ifdef-ed so there
>>> should be no effect on other systems.
>>
>> This looks sub-optimal. If you want to do this, I would rather 
>> suggest you just extend the socket dgram backend like what udst did now.
>
> Apologies, do you mean extend it further to handle the tcp form? 

Not that far. Since you are trying to convert net_dgram_socket_info, I'm 
thinking of just doing all of udst in net_dgram_socket. For recvmmsg you can 
have a transport-specific callback.
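
One possible reading of the recvmmsg suggestion, sketched under stated assumptions: a Linux-only batched receive that drains several datagrams with a single recvmmsg() call and hands each one to a transport-specific callback. dgram_rx_batch() and rx_callback are hypothetical names. A real backend would keep per-client buffers rather than the static array used here.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH   8
#define MAX_PKT 2048

/* Hypothetical per-transport hook: consume one received datagram. */
typedef void (*rx_callback)(void *opaque, const uint8_t *pkt, size_t len);

/*
 * Drain up to BATCH datagrams from fd with one recvmmsg() call and pass
 * each to cb. Returns the number received, or -1 (e.g. EAGAIN when empty).
 */
static int dgram_rx_batch(int fd, rx_callback cb, void *opaque)
{
    static uint8_t bufs[BATCH][MAX_PKT];
    struct mmsghdr msgs[BATCH];
    struct iovec iovs[BATCH];
    int i, n;

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < BATCH; i++) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len = MAX_PKT;
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    for (i = 0; i < n; i++) {
        /* msg_len holds the byte count of the i-th datagram. */
        cb(opaque, bufs[i], msgs[i].msg_len);
    }
    return n;
}
```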

Thanks

* Re: [Qemu-devel] Unified Datagram Socket Transport
  2017-07-24  3:17     ` Jason Wang
@ 2017-07-24  6:15       ` Anton Ivanov
  0 siblings, 0 replies; 11+ messages in thread
From: Anton Ivanov @ 2017-07-24  6:15 UTC (permalink / raw)
  To: Jason Wang, Anton Ivanov, qemu-devel

That is an option as well. 

I am travelling this week and will get back to this first thing next week.

A.

On 24 July 2017 05:17:25 CEST, Jason Wang <jasowang@redhat.com> wrote:
>
>
>On 2017年07月21日 12:25, Anton Ivanov wrote:
>>
>>
>> On 21/07/17 04:55, Jason Wang wrote:
>>>
>>>
>>> On 2017年07月21日 03:12, anton.ivanov@cambridgegreys.com wrote:
>>>> Hi all,
>>>>
>>>> This addresses comments so far except Eric's suggestion to use
>>>> InetSocketAddressBase. If I understand correctly its intended use,
>>>> it will not be of help for protocols which have no port (raw
>>>> sockets - GRE, L2TPv3, etc).
>>>>
>>>> It also includes a port of the original socket.c transport to
>>>> the new UDST backend. The relevant code is ifdef-ed so there
>>>> should be no effect on other systems.
>>>
>>> This looks sub-optimal. If you want to do this, I would rather 
>>> suggest you just extend the socket dgram backend like what udst did now.
>>
>> Apologies, do you mean extend it further to handle the tcp form? 
>
>Not that far, since you try to convert net_dgram_socket_info, I'm 
>thinking just do the all udst in net_dgram_socket. For recvmmsg you can
>have transport specific callback for this.
>
>Thanks

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

end of thread, other threads:[~2017-07-24  6:15 UTC | newest]

Thread overview: 11+ messages
2017-07-20 19:12 [Qemu-devel] Unified Datagram Socket Transport anton.ivanov
2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 1/5] Unified Datagram Socket Transports anton.ivanov
2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 2/5] Migrate l2tpv3 to UDST Backend anton.ivanov
2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 3/5] GRETAP Backend for UDST anton.ivanov
2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 4/5] Raw " anton.ivanov
2017-07-20 19:12 ` [Qemu-devel] [PATCH v2 5/5] Migrate Datagram operation in socket transport to UDST anton.ivanov
2017-07-21  3:55 ` [Qemu-devel] Unified Datagram Socket Transport Jason Wang
2017-07-21  4:25   ` Anton Ivanov
2017-07-24  3:17     ` Jason Wang
2017-07-24  6:15       ` Anton Ivanov
2017-07-21 19:05   ` Anton Ivanov
