All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] Contribution - L2TPv3 transport
@ 2014-02-28  8:28 Anton Ivanov (antivano)
  2014-02-28 10:02 ` Paolo Bonzini
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-02-28  8:28 UTC (permalink / raw)
  To: Qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2372 bytes --]

Hi all,

On behalf of Cisco Systems I am authorized to contribute a new transport
to the network subsystem in qemu.

Specifically, we would like to contribute a new transport:

L2TPv3 static tunnel network transport.

Earlier versions of this (we have patchsets going back to qemu-1.0) have
been used extensively as a testing tool in our group and will be shipped
in product at a later date. We also use its counterpart designed for UML
in a soon to ship product.

The idea is quite simple - instead of inventing "yet another weird and
wonderful" encapsulation of the day for virtual transports, we use a
well established industry standard. Specifically we use L2TPv3 static
tunnels. This is very similar to what qemu has been doing in the storage
area for a long time (the iSCSI storage driver).

This allows:

1. Qemu to communicate with any other qemu, UML, router, firewall or
network device directly with no mediation by a virtual switch. Local and
remote.

2. Qemu to communicate with any modern linux distribution directly
(including communicating with containers). Local and remote.

3. Qemu to communicate with the local host, remote vms, network devices,
etc at speeds which for a number of use cases exceed the speed of the
legacy tap driver.

Our suggestion would be that this over time obsoletes the UDP variety of
the "socket" driver. It has a number of advantages over that driver:

    * Standard compliant - RFC 3931 updated recently by RFC 5641
    * Supported on most network equipment
    * Allows to move virtual switching off-host to a high performance
physical switch if needed
    * Allows to mix virtual and physical switching (well supported on
modern Linuxes and other OSes)
    * Well researched security profile as well as established
interactions with IPSEC allowing to extend virtual networks outside the
datacenter to remote physical devices and/or VMs.

This transport uses a linux specific call to get several GBit RX rate.
This call can be wrapped (at some cost) in a compatibility loop using
posix compliant recvmsg instead for other systems. As I am not familiar
with the fine details on how to add linux specific, windows specific,
etc bits to qemu I have decided to leave that bit out for the time being
and submit it "as we use it".

Patch attached.

Best Regards,

Anton


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: l2tpv3.diff --]
[-- Type: text/x-patch; name="l2tpv3.diff", Size: 34468 bytes --]

diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 45588d7..865f1c2 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -79,6 +79,7 @@ int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp);
 
 /* Old, ipv4 only bits.  Don't use for new code. */
 int parse_host_port(struct sockaddr_in *saddr, const char *str);
+int parse_host_port6(struct sockaddr_in6 *saddr, const char *str);
 int socket_init(void);
 
 #endif /* QEMU_SOCKET_H */
diff --git a/net/Makefile.objs b/net/Makefile.objs
index 4854a14..364c785 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -2,6 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
 common-obj-y += socket.o
 common-obj-y += dump.o
 common-obj-y += eth.o
+common-obj-y += l2tpv3.o
 common-obj-$(CONFIG_POSIX) += tap.o
 common-obj-$(CONFIG_LINUX) += tap-linux.o
 common-obj-$(CONFIG_WIN32) += tap-win32.o
diff --git a/net/clients.h b/net/clients.h
index 7793294..ea2a831 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -47,6 +47,11 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 int net_init_bridge(const NetClientOptions *opts, const char *name,
                     NetClientState *peer);
 
+int net_init_raw(const NetClientOptions *opts, const char *name,
+                    NetClientState *peer);
+
+int net_init_l2tpv3(const NetClientOptions *opts, const char *name,
+                    NetClientState *peer);
 #ifdef CONFIG_VDE
 int net_init_vde(const NetClientOptions *opts, const char *name,
                  NetClientState *peer);
diff --git a/net/l2tpv3.c b/net/l2tpv3.c
new file mode 100644
index 0000000..4b5d8ee
--- /dev/null
+++ b/net/l2tpv3.c
@@ -0,0 +1,630 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2012 Cisco Systems
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "net/l2tpv3.h"
+#include "config-host.h"
+#include "config-host.h"
+#include "net/net.h"
+#include "clients.h"
+#include "monitor/monitor.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+#include <linux/ip.h>
+
+
+
+
+#define PAGE_SIZE 4096
+#define MAX_L2TPV3_IOVCNT 64
+
+typedef struct NetL2TPV3State {
+    NetClientState nc;
+    int fd;
+    int state; /* 0 = getting length, 1 = getting data */
+    unsigned int index;
+    unsigned int packet_len;
+    /* 
+	these are used for xmit - that happens packet a time
+	and for first sign of life packet (easier to parse that once)
+
+    */ 
+    uint8_t * header_buf;
+    uint8_t * buf;
+    struct iovec * vec;
+    /* 
+	these are used for receive - try to "eat" up to 32 packets at a time
+    */
+    struct mmsghdr * msgvec;
+    /*
+      contains inet host and port destination if 
+      connectionless (SOCK_DGRAM). Will be NULL if 
+      in listen mode waiting for first connect
+
+    */
+    struct sockaddr_storage * dgram_dst; 
+    uint64_t rx_cookie;
+    uint64_t tx_cookie;
+    uint32_t rx_session;
+    uint32_t tx_session;
+    uint32_t header_size;
+    uint32_t counter;
+    uint32_t dst_size;
+    uint32_t new_mode;
+   
+    /* offsets */ 
+
+    uint32_t offset;   /* main offset == header offset */
+    uint32_t cookie_offset;
+    uint32_t counter_offset;
+    uint32_t session_offset;
+
+} NetL2TPV3State;
+
+typedef struct NetL2TPV3ListenState {
+    NetClientState nc;
+    char *model;
+    char *name;
+    int fd;
+} NetL2TPV3ListenState;
+
+static int l2tpv3_parse_cookie32(const char *src , void * dst) {
+
+  if (
+	 (src == NULL) || 
+	 (sscanf(src, "%x", (unsigned int *) dst) != 1)
+      ) { 
+	 fprintf(stderr, "cannot parse cookie!!!: %s\n", src);
+	 return -1;
+   } 
+
+   return 0;
+}
+
+static int l2tpv3_parse_cookie64(const char *src , void * dst) {
+
+   struct temphtonl temph;
+   uint32_t temp;
+   const int num = 42;
+    if (
+	 (src == NULL) || 
+	 (sscanf(src, "%lx", (long unsigned int *) &temph) != 1)
+      ) { 
+	 fprintf(stderr, "cannot parse cookie!!!: %s\n", src);
+	 return -1;
+   } 
+
+   if(*(char *)&num == 42) {
+      // why oh why there is no htonll
+      temp = htonl(temph.high);
+      temph.high = htonl(temph.low);
+      temph.low = temp;
+   } else {
+      temph.low = htonl(temph.low); 
+      temph.high = htonl(temph.high);
+   }	
+
+   memcpy(dst, &temph, sizeof (uint64_t));
+   return 0;
+}
+
+
+static int l2tpv3_form_header(NetL2TPV3State *s) {
+   
+   uint32_t *header;
+   uint32_t *session;
+   uint64_t *cookie64;
+   uint32_t *cookie32;
+   uint32_t *counter;
+
+   if (s->new_mode & NEW_MODE_UDP) {
+     header = (uint32_t *) s->header_buf;
+     * header = htonl(0x30000);
+   }
+   session = (uint32_t *) (s->header_buf + s->session_offset);
+   * session = s->tx_session;	
+
+   if (s->new_mode & NEW_MODE_COOKIE) {
+      if (s->new_mode & NEW_MODE_COOKIE_SIZE) {
+	 cookie64 = (uint64_t *)(s->header_buf + s->cookie_offset);
+	 * cookie64 = s->tx_cookie;
+      } else {
+	 cookie32 = (uint32_t *) (s->header_buf + s->cookie_offset);
+	 * cookie32 = * ((uint32_t *) s->tx_cookie);
+      }
+   }
+   
+   if (!(s->new_mode & NEW_MODE_NO_COUNTER)) {
+      counter = (uint32_t *)(s->header_buf + s->counter_offset);
+      * counter = htonl(++ s->counter);
+   }
+
+   return 0;
+}
+
+
+static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
+{
+    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
+
+
+    struct msghdr message;
+    int ret;
+    if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
+	fprintf(stderr, "iovec too long %d > %d, change l2tpv3.h\n", iovcnt, MAX_L2TPV3_IOVCNT);
+	return -1;
+    }
+    l2tpv3_form_header(s);
+    memcpy(s->vec + 1, iov, iovcnt * sizeof(struct iovec));
+    s->vec->iov_base = s->header_buf;
+    s->vec->iov_len = s->offset;
+    if (s->dgram_dst) {
+       message.msg_name = s->dgram_dst;
+       message.msg_namelen = s->dst_size;
+       message.msg_iov = (struct iovec *) s->vec;
+       message.msg_iovlen = iovcnt + 1; 
+       message.msg_control = NULL;
+       message.msg_controllen = 0;
+       message.msg_flags = 0;
+       ret = sendmsg(s->fd, &message, MSG_DONTWAIT) - s->offset;
+       if (ret == 0) {
+	    ret = iov_size (iov, iovcnt);
+       }
+   } else {
+      fprintf(stderr, "no dst yet\n");
+      ret = iov_size (iov, iovcnt); /* silently drop, otherwise it is being stupid */
+   }
+    return ret;
+}
+
+
+static ssize_t net_l2tpv3_receive_dgram(NetClientState *nc, const uint8_t *buf, size_t size)
+{
+    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
+
+    struct iovec * vec;
+
+    struct msghdr message;
+    l2tpv3_form_header(s);
+    vec = s->vec;
+    vec->iov_base = s->header_buf;
+    vec->iov_len = s->offset;
+    vec++;
+    vec->iov_base = (void *) buf;
+    vec->iov_len = size;
+
+    if (s->dgram_dst) {
+       message.msg_name = s->dgram_dst;
+       message.msg_namelen = s->dst_size;
+       message.msg_iov = (struct iovec *) s->vec;
+       message.msg_iovlen = 2;
+       message.msg_control = NULL;
+       message.msg_controllen = 0;
+       message.msg_flags = 0;
+       size = sendmsg(s->fd, &message, 0) - s->offset;
+   } else {
+      fprintf(stderr, "no dst yet\n");
+   }
+   return size; /* silently drop, otherwise it is being stupid */
+}
+
+static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf) {
+   uint64_t *cookie64;
+   uint32_t *cookie32;
+   uint32_t *session;
+
+
+    if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode & NEW_MODE_UDP))){
+	buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
+    } 
+
+
+   if (s->new_mode & NEW_MODE_COOKIE) {
+      if (s->new_mode & NEW_MODE_COOKIE_SIZE) {
+	 /* 64 bit cookie */
+	 cookie64 = (uint64_t *)(buf + s->cookie_offset);
+	 if (*cookie64 != s->rx_cookie) {
+	    fprintf(stderr, "unknown cookie id\n");
+	    return -1; /* we need to return 0, otherwise barfus */
+	 }
+      } else {
+	 cookie32 = (uint32_t *)(buf + s->cookie_offset);
+	 if (*cookie32 != * (uint32_t *) &s->rx_cookie) {
+	    fprintf(stderr,"unknown cookie id\n");
+	    return -1 ; /* we need to return 0, otherwise barfus */
+	 }
+      }
+   }
+   session = (uint32_t *) (buf + s->session_offset);
+   if (* session != s->rx_session) {
+      fprintf(stderr,"session mismatch\n");
+      return -1;
+   }	
+   return 0;
+}
+
+static void net_l2tpv3_send(void *opaque)
+{
+    NetL2TPV3State *s = opaque;
+    
+    int i, count, offset;
+    
+    struct mmsghdr * msgvec;
+    struct iovec * vec;
+    
+    
+    msgvec = s->msgvec;
+    offset = s->offset;
+    if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode & NEW_MODE_UDP))){
+	offset +=   sizeof(struct iphdr);
+    }  
+    count = recvmmsg(s->fd, msgvec, MAX_L2TPV3_IOVCNT/2, MSG_DONTWAIT, NULL);
+    for (i=0;i<count;i++) {
+	if (msgvec->msg_len > 0) {
+	    vec = msgvec->msg_hdr.msg_iov;
+	    if (l2tpv3_verify_header(s, vec->iov_base) == 0) {
+		//vec->iov_len = offset; /* reset size just in case*/
+		vec++;
+		qemu_send_packet(&s->nc, vec->iov_base, msgvec->msg_len - offset);
+	    } else {
+		fprintf(stderr, "l2tpv3 header verification failed\n");
+		vec->iov_len = offset;
+		vec++;
+	    }
+	    //vec->iov_len = PAGE_SIZE; /* reset for next read */
+	} 
+	msgvec++;
+    }
+}
+
+static void net_l2tpv3_cleanup(NetClientState *nc) 
+{
+    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
+    qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
+    close(s->fd);
+}
+
+static NetClientInfo net_l2tpv3_info = {
+    .type = NET_CLIENT_OPTIONS_KIND_L2TPV3,
+    .size = sizeof(NetL2TPV3State),
+    .receive = net_l2tpv3_receive_dgram,
+    .receive_iov = net_l2tpv3_receive_dgram_iov,
+    .cleanup = net_l2tpv3_cleanup,
+};
+
+static int net_l2tpv3_init(NetClientState *peer,
+                                 const char *name,
+                                 const char *src,
+                                 const char *dst,
+                                 const char *new_mode,
+                                 const char *tx_cookie,
+                                 const char *rx_cookie,
+                                 const char *tx_session,
+                                 const char *rx_session
+)
+{
+   NetL2TPV3State *s;
+   NetClientState *nc;
+
+   int fd, i;
+
+   int sock_family, sock_type, sock_proto;
+   unsigned int temp;
+   
+   struct mmsghdr * msgvec;
+   struct iovec * iov;
+
+
+   struct sockaddr_storage LocalSock;
+
+       
+   fprintf(stderr, "l2tpv3 user init mode %s\n", new_mode);
+   memset(&LocalSock, 0 , sizeof(LocalSock));
+
+   nc = qemu_new_net_client(&net_l2tpv3_info, peer, "l2tpv3", name);
+
+   s = DO_UPCAST(NetL2TPV3State, nc, nc);
+
+   s->header_buf = g_malloc(256);    
+
+   if (s->header_buf == NULL) {
+	fprintf(stderr, "Failed to allocate header buffer \n");
+	return -1;
+   } 
+   s->buf = g_malloc(PAGE_SIZE);    
+
+   if (s->buf == NULL) {
+	fprintf(stderr, "Failed to allocate packet buffer \n");
+	return -1;
+   } 
+   s->vec = g_malloc(sizeof(struct iovec) * MAX_L2TPV3_IOVCNT);;    
+   if (s->vec == NULL) {
+	fprintf(stderr, "Failed to allocate packet vec \n");
+	return -1;
+   } 
+
+   
+   s->offset = 4;
+   s->session_offset = 0;
+   s->cookie_offset = 4;
+   s->counter_offset = 4;
+   s->dst_size = sizeof(struct sockaddr_storage);
+
+   sscanf(new_mode, "%x", &s->new_mode);
+
+
+   fprintf(stderr, "type is: %s -> %x\n", new_mode, s->new_mode);
+
+   /* Cookies */ 
+
+  if (
+	 (tx_session == NULL) || 
+	 (sscanf(tx_session, "%x", &temp) != 1)
+      ) { 
+	 fprintf(stderr, "cannot parse tx_session!!!: %s\n", tx_session);
+	 return -1;
+   } else {
+      s->tx_session = htonl(temp);
+      fprintf(stderr, "l2tpv3_open :  parsed tx session 32, %0x\n", s->tx_session);
+   }
+
+  if (
+	 (tx_session == NULL) || 
+	 (sscanf(rx_session, "%x", &temp) != 1)
+      ) { 
+	 fprintf(stderr, "cannot parse rx_session!!!: %s\n", rx_session);
+	 return -1;
+   }  else {
+      s->rx_session = htonl(temp);
+      fprintf(stderr, "l2tpv3_open :  parsed rx session 32, %0x\n", s->rx_session);
+   }
+   if (s->new_mode & NEW_MODE_COOKIE) {
+      if (s->new_mode & NEW_MODE_COOKIE_SIZE) {
+	 /* 64 bit cookie */
+	 s->offset += 8;
+	 s->counter_offset += 8;
+	 if (l2tpv3_parse_cookie64(tx_cookie,&s->tx_cookie) !=0) {
+	    fprintf(stderr, "l2tpv3_open :  failed to parse tx cookie 64\n");
+	    return -1;
+	 } else {
+	    fprintf(stderr, "l2tpv3_open :  parsed tx cookie 64, %0lx\n", s->tx_cookie);
+	 }
+	 if (l2tpv3_parse_cookie64(rx_cookie,&s->rx_cookie) !=0) {
+	    fprintf(stderr, "l2tpv3_open :  failed to parse rx cookie 64\n");
+	    return -1;
+	 } else {
+	    fprintf(stderr, "l2tpv3_open :  parsed rx cookie 64, %0lx\n", s->rx_cookie);
+	 }
+      } else {
+	 /* 32 bit cookie */
+	 s->offset += 4;
+	 s->counter_offset +=4;
+	 s->tx_cookie = 0;
+	 if (l2tpv3_parse_cookie32(tx_cookie,&s->tx_cookie) !=0) {
+	    fprintf(stderr, "l2tpv3_open :  failed to parse tx cookie 32\n");
+	    return -1;
+	 }
+	 s->rx_cookie = 0;
+	 if (l2tpv3_parse_cookie32(rx_cookie,&s->rx_cookie) !=0) {
+	    fprintf(stderr, "l2tpv3_open :  failed to parse rx cookie 32\n");
+	    return -1;
+	 }
+      }
+   }
+
+         
+     
+   if (dst && (strlen(dst) > 0 )) {         
+      /* we now allocate it only if it we are not "listening" */
+      s->dgram_dst = malloc(sizeof(struct sockaddr_storage));
+      memset(s->dgram_dst, 0 , sizeof(struct sockaddr_storage));
+      fprintf(stderr, "l2tpv3_open : setting dst at init\n");
+   } else {
+      s->dgram_dst = NULL;
+   }
+
+   if (s->new_mode & NEW_MODE_IP_VERSION) {
+      /* IPv6 */
+      sock_family = AF_INET6;
+      if (parse_host_port6((struct sockaddr_in6 *) &LocalSock, src) !=0) {
+	 fprintf(stderr, "l2tpv3_open :  failed to parse v6 dst\n");
+	 return -1;
+      };
+   } else {
+      /* IPv4 */
+      sock_family = AF_INET;
+      if (parse_host_port((struct sockaddr_in*) &LocalSock, src) != 0) {
+	 fprintf(stderr, "l2tpv3_open :  failed to parse v4 src\n");
+	 return -1;
+      };
+   }
+   if (s->new_mode & NEW_MODE_UDP) {
+      sock_type = SOCK_DGRAM;
+      sock_proto = 0;
+      /* space for header */
+
+      s->offset += 4;
+      s->counter_offset += 4;
+      s->session_offset += 4;
+      s->cookie_offset += 4;
+    } else {
+      sock_type = SOCK_RAW;
+      sock_proto = 0x73;
+      if (s->new_mode & NEW_MODE_IP_VERSION) {
+	 /* IPv6 */
+	 ((struct sockaddr_in6 *) &LocalSock)->sin6_port = htons(0x73);
+      } else {
+	 /* IPv4 */
+	 ((struct sockaddr_in *) &LocalSock)->sin_port = htons(0x73);
+      }
+   }
+
+   if (!(s->new_mode & NEW_MODE_NO_COUNTER)) {
+      s->offset += 4;
+   }
+   if ((fd = socket(sock_family, sock_type, sock_proto)) == -1) {
+      fd = -errno;
+      fprintf(stderr, "l2tpv3_open : socket creation failed, "
+	       "errno = %d\n", -fd);
+      return fd;
+   } 
+
+
+   if (bind(fd, (struct sockaddr *) &LocalSock, sizeof(LocalSock))) {
+      fprintf(stderr, "l2tpv3_open :  could not bind socket err=%i\n", errno);
+      close(fd);
+      return -1;
+   } else {
+      fprintf(stderr, "l2tpv3_open : socket bound successfully\n");
+   }
+   if (s->dgram_dst) {
+      if (s->new_mode & NEW_MODE_IP_VERSION) {
+	 /* IPv6 */
+	 if (parse_host_port6((struct sockaddr_in6 *) s->dgram_dst, dst) !=0) {
+	    fprintf(stderr, "l2tpv3_open :  failed to parse v6 dst\n");
+	    return -1;
+	 };
+	 s->dst_size = sizeof(struct sockaddr_in6);
+      } else {
+	 /* IPv4 */
+	 if (parse_host_port((struct sockaddr_in *) s->dgram_dst, dst) != 0) {
+	    fprintf(stderr, "l2tpv3_open :  failed to parse v4 dst\n");
+	    return -1;
+	 };
+	 s->dst_size = sizeof(struct sockaddr_in);
+      }
+   }
+
+   s->msgvec = g_malloc(sizeof(struct mmsghdr) * (MAX_L2TPV3_IOVCNT/2));
+   if (s->msgvec == NULL) {
+	fprintf(stderr, "Failed to allocate receive packet vec \n");
+	return -1;
+   }
+   msgvec = s->msgvec;
+    
+   for (i=0;i<MAX_L2TPV3_IOVCNT/2;i++) {
+	/* we need to allocate these only for the first time and first buffer */
+	msgvec->msg_hdr.msg_name = NULL;
+	msgvec->msg_hdr.msg_namelen = 0;
+	iov =  g_malloc(sizeof(struct iovec) * 2);
+	if (iov == NULL) {
+	    fprintf(stderr, "Failed to allocate receive packet vec \n");
+	    return -1;
+	}
+	msgvec->msg_hdr.msg_iov = iov;
+	if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode & NEW_MODE_UDP))){
+	    iov->iov_base = g_malloc(s->offset + sizeof(struct iphdr)); /* fix for ipv4 raw */;
+	} else {
+	    iov->iov_base = g_malloc(s->offset);
+	}
+	
+	if (iov->iov_base == NULL) {
+	    fprintf(stderr, "Failed to allocate receive packet vec \n");
+	    return -1;
+	}
+	if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode & NEW_MODE_UDP))){
+	    iov->iov_len = s->offset + sizeof (struct iphdr);
+	} else {
+	    iov->iov_len = s->offset;
+	}
+	iov++ ;
+	iov->iov_base = qemu_memalign(PAGE_SIZE, PAGE_SIZE);
+	if (iov->iov_base == NULL) {
+	    fprintf(stderr, "Failed to allocate receive packet vec \n");
+	    return -1;
+	}
+	iov->iov_len = PAGE_SIZE;
+	msgvec->msg_hdr.msg_iovlen = 2;
+	msgvec->msg_hdr.msg_control = NULL;
+	msgvec->msg_hdr.msg_controllen = 0;
+	msgvec->msg_hdr.msg_flags = 0;
+	msgvec++;
+   }
+ 
+//   sendbuff = 90000;
+//
+//  printf("sets the send buffer to %d\n", sendbuff);
+// setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sendbuff, sizeof(sendbuff));
+
+   qemu_set_nonblock(fd);
+
+   if (fd < 0)
+	return -1;
+
+   s->fd = fd;
+   s->counter = 0;
+
+   qemu_set_fd_handler(s->fd, net_l2tpv3_send, NULL, s);
+
+   if (!s) {
+      fprintf (stderr, "l2tpv3_open : failed to set fd handler\n");
+      return -1;
+   }
+
+   /* this needs fixing too */
+
+   snprintf(s->nc.info_str, sizeof(s->nc.info_str),
+             "l2tpv3: connected");
+   return 0;
+
+}
+
+int net_init_l2tpv3(const NetClientOptions *opts,
+                    const char *name,
+                    NetClientState *peer) {
+
+
+    const NetdevL2TPv3Options * l2tpv3;
+    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_L2TPV3);
+    l2tpv3 = opts->l2tpv3;
+
+
+  if (!(l2tpv3->has_src && l2tpv3->has_mode && l2tpv3->has_txsession && l2tpv3->has_rxsession)) {
+      error_report("src=, dst=, txsession=, rxsession= and mode= are required for l2tpv3");
+      return -1;
+  }
+
+
+  /* this needs cleaning up for new API */
+
+  if (net_l2tpv3_init(
+	 peer, 
+         name,
+	 l2tpv3->src, 
+	 l2tpv3->dst, 
+	 l2tpv3->mode, 
+	 l2tpv3->txcookie,
+	 l2tpv3->rxcookie,
+	 l2tpv3->txsession,
+	 l2tpv3->rxsession
+      
+   ) == -1) {
+      return -1;
+  }
+  return 0;
+}
diff --git a/net/l2tpv3.h b/net/l2tpv3.h
new file mode 100644
index 0000000..822e0b0
--- /dev/null
+++ b/net/l2tpv3.h
@@ -0,0 +1,65 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2012 Cisco Systems
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu-common.h"
+
+#ifndef QEMU_NET_L2TPV3_H
+#define QEMU_NET_L2TPV3_H
+
+#define L2TPV3_BUFSIZE 2048
+
+#define NEW_MODE_IP_VERSION   1		  /* on for v6, off for v4 */
+#define NEW_MODE_UDP	      2		  /* on for udp, off for raw ip */
+#define NEW_MODE_COOKIE	      4		  /* cookie present */
+#define NEW_MODE_COOKIE_SIZE  8		  /* on for 64 bit */
+#define NEW_MODE_NO_COUNTER   16	  /* DT - no counter */
+
+/* legacy modes */
+
+/* mode 0 */
+
+#define LEGACY_UDP6_64_NO_COUNTER (NEW_MODE_IP_VERSION + NEW_MODE_UDP + NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE + NEW_MODE_NO_COUNTER)
+
+#define LEGACY_MODE0 LEGACY_UDP6_64_NO_COUNTER
+
+/* mode 1 */
+
+#define LEGACY_IP6_64_NO_COUNTER (NEW_MODE_IP_VERSION + NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE + NEW_MODE_NO_COUNTER)
+
+#define LEGACY_MODE1 LEGACY_IP6_64_NO_COUNTER
+
+/* mode 2 */
+
+#define LEGACY_UDP4_64_COUNTER (NEW_MODE_COOKIE + NEW_MODE_UDP + NEW_MODE_COOKIE_SIZE )
+
+#define LEGACY_MODE2 LEGACY_UDP4_64_COUNTER
+
+struct temphtonl {
+   uint32_t low; 
+   uint32_t high;
+};
+
+
+#endif /* QEMU_NET_SOCKET_H */
diff --git a/net/net.c b/net/net.c
index 0a88e68..ba5f51b 100644
--- a/net/net.c
+++ b/net/net.c
@@ -132,6 +132,43 @@ int parse_host_port(struct sockaddr_in *saddr, const char *str)
     return 0;
 }
 
+int parse_host_port6(struct sockaddr_in6 *saddr, const char *str)
+{
+    char buf[512];
+    char *p, *ip, *port;
+
+    strncpy((char *) &buf, str, 511);
+    ip = (char *) &buf;
+    port = NULL;
+    int portint;
+
+
+    for (p = (char *) &buf; (p < (char *) &buf + strlen(str)) || (p < (char *) &buf + 512); p++){
+      if (*p == '[') {
+	 ip = p + 1;
+      }
+      if (*p == ']') {
+	 *p = '\0';
+        if (*(p + 1) == ':') {
+	    port = p + 2;
+	}
+	break;
+      }
+    } 
+
+    saddr->sin6_family = AF_INET6;
+    if (!inet_pton(AF_INET6, ip, &saddr->sin6_addr)) {
+         return -1;
+    } 
+    if (port) {
+      sscanf(port, "%i", &portint);
+      saddr->sin6_port = htons(portint);
+    }
+    return 0;
+}
+
+
+
 void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6])
 {
     snprintf(nc->info_str, sizeof(nc->info_str),
@@ -731,6 +768,7 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
         [NET_CLIENT_OPTIONS_KIND_BRIDGE]    = net_init_bridge,
 #endif
         [NET_CLIENT_OPTIONS_KIND_HUBPORT]   = net_init_hubport,
+        [NET_CLIENT_OPTIONS_KIND_L2TPV3]       = net_init_l2tpv3,
 };
 
 
diff --git a/net/raw.c b/net/raw.c
new file mode 100644
index 0000000..04d244f
--- /dev/null
+++ b/net/raw.c
@@ -0,0 +1,244 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2014 Cisco Systems
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include <netinet/ether.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+#include <net/if.h>
+
+#include "net/raw.h"
+
+#include "config-host.h"
+#include "net/net.h"
+#include "clients.h"
+#include "monitor/monitor.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+
+
+#define RAW_BUFSIZE 2048
+
+
+typedef struct NetRAWState {
+    NetClientState nc;
+    int fd;
+    int state; /* 0 = getting length, 1 = getting data */
+    unsigned int ring_index;
+    unsigned int packet_len;
+    uint8_t * multiread_buffer;
+} NetRAWState;
+
+
+static ssize_t net_raw_receive_dgram(NetClientState *nc, const uint8_t *buf, size_t size)
+{
+    NetRAWState *s = DO_UPCAST(NetRAWState, nc, nc);
+    return write(s->fd, buf, size);
+}
+
+static ssize_t net_raw_receive_dgram_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
+{
+    NetRAWState *s = DO_UPCAST(NetRAWState, nc, nc);
+    return writev(s->fd, iov, iovcnt);
+}
+
+static inline struct tpacket_hdr * current_header (NetRAWState *s) {
+    uint8_t * buffer;
+    buffer = s->multiread_buffer + (s->ring_index * RAW_TP_FRAME_SIZE);
+    return (struct tpacket_hdr *) buffer;
+}
+
+
+static struct tpacket_hdr * raw_advance_ring(NetRAWState *s) {
+
+    struct tpacket_hdr * header = current_header(s);
+    header->tp_status = TP_STATUS_KERNEL; /* mark as free */
+    s->ring_index = (s->ring_index + 1) % RAW_TP_FRAME_NR;
+    return current_header(s);
+}
+
+static void net_raw_send(void *opaque)
+{
+    NetRAWState *s = opaque;
+    struct tpacket_hdr * header;
+    
+    header = current_header(s);	
+    while ((header->tp_status & TP_STATUS_USER) > 0) {
+	qemu_send_packet(&s->nc, ((uint8_t *) header) + header->tp_mac, header->tp_snaplen);
+	header = raw_advance_ring(s);
+    }
+}
+
+
+static void net_raw_cleanup(NetClientState *nc)
+{
+    NetRAWState *s = DO_UPCAST(NetRAWState, nc, nc);
+    qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
+    close(s->fd);
+}
+
+static NetClientInfo net_raw_info = {
+    .type = NET_CLIENT_OPTIONS_KIND_RAW,
+    .size = sizeof(NetRAWState),
+    .receive = net_raw_receive_dgram,
+    .receive_iov = net_raw_receive_dgram_iov,
+    .cleanup = net_raw_cleanup,
+};
+
+static int net_raw_init(NetClientState *peer,
+                                 const char *name,
+                                 const char *host_interface
+                 )
+{
+    NetRAWState *s;
+    NetClientState *nc;
+
+    int fd;
+
+
+
+    struct ifreq ifr;
+    struct sockaddr_ll sock;
+
+    int err;
+    struct tpacket_req tpacket;
+
+
+    nc = qemu_new_net_client(&net_raw_info, peer, "raw", name);
+
+    s = DO_UPCAST(NetRAWState, nc, nc);
+
+    s->ring_index = 0;
+	 
+
+    if ((fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))) == -1) {
+	err = -errno;
+	fprintf(stderr, "raw_open : raw socket creation failed, errno = %d\n", -err);
+	return err;
+    }
+
+    tpacket.tp_block_size = RAW_TP_BLOCK_SIZE; 
+    tpacket.tp_frame_size = RAW_TP_FRAME_SIZE; 
+    tpacket.tp_block_nr = RAW_TP_BLOCK_NR ; 
+    tpacket.tp_frame_nr = RAW_TP_FRAME_NR;
+
+    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &tpacket, sizeof(struct tpacket_req))) {
+	fprintf(stderr, "uml_raw: failed to request packet mmap");
+	return -errno;
+    } else {
+	fprintf(stderr, "uml_raw: requested packet mmap\n");
+    }
+
+    s->multiread_buffer = (uint8_t *) mmap(
+	    NULL, 
+	    RAW_TP_FRAME_SIZE * RAW_TP_FRAME_NR, 
+	    PROT_READ | PROT_WRITE, MAP_SHARED, 
+	    fd, 
+	    0
+    );
+
+    if (!(s->multiread_buffer)) {
+	fprintf(stderr, "raw: failed to map buffer");
+	return -1;
+    } else {
+	fprintf(stderr, "raw: mmmap-ed buffer at %p\n", s->multiread_buffer);
+    }
+
+
+    
+
+    memset(&ifr, 0, sizeof(struct ifreq));
+    strncpy((char *) &ifr.ifr_name, host_interface, sizeof(ifr.ifr_name) - 1);
+    if(ioctl(fd, SIOCGIFINDEX, (void *) &ifr) < 0) {
+	err = -errno;
+	fprintf(stderr, "SIOCGIFINDEX, failed to get raw interface index for %s", host_interface);
+	close(fd);
+	return(-1);
+    } else {
+    }
+
+    sock.sll_family = AF_PACKET;
+    sock.sll_protocol = htons(ETH_P_ALL);
+    sock.sll_ifindex = ifr.ifr_ifindex;
+
+    fprintf(stderr, "raw: binding raw on interface index: %i\n", ifr.ifr_ifindex);
+    if (bind(fd, (struct sockaddr *) &sock, sizeof(struct sockaddr_ll)) < 0) {
+	fprintf(stderr, "raw: failed to bind raw socket");
+	close(fd);
+	return(-1);
+    }
+
+
+    qemu_set_nonblock(fd);
+
+    if (fd < 0)
+	return -1;
+
+    s->ring_index = 0;
+
+    s->fd = fd;
+
+    qemu_set_fd_handler(s->fd, net_raw_send, NULL, s);
+
+    if (!s) {
+      fprintf (stderr, "raw_open : failed to set fd handler\n");
+      return -1;
+    }
+
+/* this needs fixing too */
+
+    snprintf(s->nc.info_str, sizeof(s->nc.info_str), "raw: connected");
+    return 0;
+
+}
+
+int net_init_raw(const NetClientOptions *opts,
+                    const char *name,
+                    NetClientState *peer) {
+
+
+    const NetdevRawOptions * raw;
+    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_RAW);
+    raw = opts->raw;
+
+    if (raw->has_fd) {
+	error_report("passing socket fd not yet supported");
+	return -1 ;
+    }
+    
+    if (!(raw->has_ifname)) {
+      error_report("iface required for raw");
+      return -1;
+    }
+
+    if (net_raw_init( peer, name, raw->ifname) == -1) {
+      return -1;
+    }
+    return 0;
+}
diff --git a/net/raw.h b/net/raw.h
new file mode 100644
index 0000000..80ab43e
--- /dev/null
+++ b/net/raw.h
@@ -0,0 +1,33 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2014 Cisco Systems
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#ifndef QEMU_NET_RAW_H
+#define QEMU_NET_RAW_H
+
+#define RAW_TP_BLOCK_SIZE 4096
+#define RAW_TP_FRAME_SIZE 2048
+#define RAW_TP_BLOCK_NR 32
+#define RAW_TP_FRAME_NR 64
+
+#endif /* QEMU_NET_RAW_H */
diff --git a/qapi-schema.json b/qapi-schema.json
index 83fa485..192d19c 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2941,6 +2941,45 @@
     '*udp':       'str' } }
 
 ##
+# @NetdevL2TPv3Options
+#
+# Connect the VLAN to Ethernet over L2TPv3 Static tunnel
+#
+# @fd: #optional file descriptor of an already opened socket
+#
+# @src: source address
+#
+# @dst: #optional destination address
+#
+# @mode: bitmask mode for the tunnel (see l2tpv3.h for actual definition)
+#
+# @txcookie:#optional 32 or 64 bit tx cookie for the tunnel (set mode for actual type selection)
+#
+# @rxcookie:#optional 32 or 64 bit rx cookie for the tunnel (set mode for actual type selection)
+#
+# @txsession: tx 32 bit session
+#
+# @rxsession: rx 32 bit session
+#
+#
+# Since 1.0
+##
+##
+{ 'type': 'NetdevL2TPv3Options',
+  'data': {
+    '*fd':        'str',
+    '*src':	  'str', 
+    '*dst':	  'str', 
+    '*mode':	  'str', 
+    '*txcookie':  'str',
+    '*rxcookie':  'str',
+    '*txsession': 'str',
+    '*rxsession': 'str'
+
+} }
+
+##
+##
 # @NetdevVdeOptions
 #
 # Connect the VLAN to a vde switch running on the host.
@@ -3021,6 +3060,7 @@
     'nic':      'NetLegacyNicOptions',
     'user':     'NetdevUserOptions',
     'tap':      'NetdevTapOptions',
+    'l2tpv3':   'NetdevL2TPv3Options',
     'socket':   'NetdevSocketOptions',
     'vde':      'NetdevVdeOptions',
     'dump':     'NetdevDumpOptions',


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28  8:28 [Qemu-devel] Contribution - L2TPv3 transport Anton Ivanov (antivano)
@ 2014-02-28 10:02 ` Paolo Bonzini
  2014-02-28 11:17   ` Anton Ivanov (antivano)
                     ` (2 more replies)
  2014-02-28 13:40 ` Eric Blake
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 36+ messages in thread
From: Paolo Bonzini @ 2014-02-28 10:02 UTC (permalink / raw)
  To: Anton Ivanov (antivano), Qemu-devel, Stefan Hajnoczi

> This transport uses a linux specific call to get several GBit RX rate.

Might as well mention that it's recvmmsg. :)

> This call can be wrapped (at some cost) in a compatibility loop using
> posix compliant recvmsg instead for other systems. As I am not familiar
> with the fine details on how to add linux specific, windows specific,
> etc bits to qemu I have decided to leave that bit out for the time being
> and submit it "as we use it".

You can add the compatibility structures in include/qemu/recvmmsg.h and 
util/recvmmsg.c, and only compile recvmmsg.c on non-Linux system with 
something like

     util-obj-$(call lnot, CONFIG_LINUX) += recvmmsg.o

For now, you can just make l2tpv3.c Linux-specific.

> Patch attached.

Thanks for the contribution!  I have quite a few comments.  The largest 
part of the work will be to cleanup the user interface.

Also, I suggest using connected sockets instead of connectionless, but 
that should be comparatively less work.

You also need to document the new options in qemu-options.hx.

Please don't be turned off by the amount of comments.  This is quite 
normal.  I'm CCing the maintainer of the network layer, Stefan Hajnoczi, 
who can help you more with this task.

> Best Regards,
>
> Anton
>
>
> l2tpv3.diff
>
> diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
> index 45588d7..865f1c2 100644
> --- a/include/qemu/sockets.h
> +++ b/include/qemu/sockets.h
> @@ -79,6 +79,7 @@ int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp);
>
>  /* Old, ipv4 only bits.  Don't use for new code. */
>  int parse_host_port(struct sockaddr_in *saddr, const char *str);
> +int parse_host_port6(struct sockaddr_in6 *saddr, const char *str);
>  int socket_init(void);
>
>  #endif /* QEMU_SOCKET_H */
> diff --git a/net/Makefile.objs b/net/Makefile.objs
> index 4854a14..364c785 100644
> --- a/net/Makefile.objs
> +++ b/net/Makefile.objs
> @@ -2,6 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
>  common-obj-y += socket.o
>  common-obj-y += dump.o
>  common-obj-y += eth.o
> +common-obj-y += l2tpv3.o

This should be CONFIG_LINUX or (better, if you implement a compatibility 
layer for recvmmsg) CONFIG_POSIX, because Windows does not have 
sendmsg/recvmsg.

QEMU already has some compatibility code for sendmsg/recvmsg 
(iov_send/iov_recv) but it doesn't let you specify a destination, so for 
now we can make it POSIX only.

>  common-obj-$(CONFIG_POSIX) += tap.o
>  common-obj-$(CONFIG_LINUX) += tap-linux.o
>  common-obj-$(CONFIG_WIN32) += tap-win32.o
> diff --git a/net/clients.h b/net/clients.h
> index 7793294..ea2a831 100644
> --- a/net/clients.h
> +++ b/net/clients.h
> @@ -47,6 +47,11 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
>  int net_init_bridge(const NetClientOptions *opts, const char *name,
>                      NetClientState *peer);
>
> +int net_init_raw(const NetClientOptions *opts, const char *name,
> +                    NetClientState *peer);
> +
> +int net_init_l2tpv3(const NetClientOptions *opts, const char *name,
> +                    NetClientState *peer);
>  #ifdef CONFIG_VDE
>  int net_init_vde(const NetClientOptions *opts, const char *name,
>                   NetClientState *peer);
> diff --git a/net/l2tpv3.c b/net/l2tpv3.c
> new file mode 100644
> index 0000000..4b5d8ee
> --- /dev/null
> +++ b/net/l2tpv3.c
> @@ -0,0 +1,630 @@
> +/*
> + * QEMU System Emulator
> + *
> + * Copyright (c) 2003-2008 Fabrice Bellard
> + * Copyright (c) 2012 Cisco Systems
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "net/l2tpv3.h"
> +#include "config-host.h"
> +#include "config-host.h"
> +#include "net/net.h"
> +#include "clients.h"
> +#include "monitor/monitor.h"
> +#include "qemu-common.h"
> +#include "qemu/error-report.h"
> +#include "qemu/option.h"
> +#include "qemu/sockets.h"
> +#include "qemu/iov.h"
> +#include "qemu/main-loop.h"
> +#include <linux/ip.h>
> +
> +
> +
> +
> +#define PAGE_SIZE 4096
> +#define MAX_L2TPV3_IOVCNT 64
> +
> +typedef struct NetL2TPV3State {
> +    NetClientState nc;
> +    int fd;
> +    int state; /* 0 = getting length, 1 = getting data */
> +    unsigned int index;
> +    unsigned int packet_len;
> +    /*
> +	these are used for xmit - that happens packet a time
> +	and for first sign of life packet (easier to parse that once)
> +
> +    */

Hi Anton,

thanks for the contribution!

The new files do not obey the QEMU coding standards. We use comments like

     /* These are used for xmit - that happens packet a time
      * and for first sign of life packet (easier to parse that once).
      */

and no C++ comments.

> +    uint8_t * header_buf;
> +    uint8_t * buf;
> +    struct iovec * vec;
> +    /*
> +	these are used for receive - try to "eat" up to 32 packets at a time
> +    */
> +    struct mmsghdr * msgvec;
> +    /*
> +      contains inet host and port destination if
> +      connectionless (SOCK_DGRAM). Will be NULL if
> +      in listen mode waiting for first connect
> +
> +    */
> +    struct sockaddr_storage * dgram_dst;

I cannot see any code that handles "listening" mode, so please drop all 
of it and assume dgram_dst is non-NULL throughout the code.

Even if we were to add a "listening" mode later, I'd prefer a boolean 
value ("bool connected;") and not making dgram_dst a pointer.

> +    uint64_t rx_cookie;
> +    uint64_t tx_cookie;
> +    uint32_t rx_session;
> +    uint32_t tx_session;
> +    uint32_t header_size;
> +    uint32_t counter;
> +    uint32_t dst_size;
> +    uint32_t new_mode;
> +
> +    /* offsets */
> +
> +    uint32_t offset;   /* main offset == header offset */
> +    uint32_t cookie_offset;
> +    uint32_t counter_offset;
> +    uint32_t session_offset;
> +

Useless blank line.

> +} NetL2TPV3State;
> +
> +typedef struct NetL2TPV3ListenState {
> +    NetClientState nc;
> +    char *model;
> +    char *name;
> +    int fd;
> +} NetL2TPV3ListenState;
> +
> +static int l2tpv3_parse_cookie32(const char *src , void * dst) {
> +
> +  if (
> +	 (src == NULL) ||
> +	 (sscanf(src, "%x", (unsigned int *) dst) != 1)

Is it really so bad that the user has to prepend "0x" to the cookie? 
Then you can just use the same parsing that is used elsewhere: declare 
the fields as 'int64' in qapi-schema.json and QEMU will do everything 
for you.

> +      ) {
> +	 fprintf(stderr, "cannot parse cookie!!!: %s\n", src);
> +	 return -1;
> +   }
> +
> +   return 0;
> +}
> +
> +static int l2tpv3_parse_cookie64(const char *src , void * dst) {
> +
> +   struct temphtonl temph;
> +   uint32_t temp;
> +   const int num = 42;
> +    if (
> +	 (src == NULL) ||
> +	 (sscanf(src, "%lx", (long unsigned int *) &temph) != 1)
> +      ) {
> +	 fprintf(stderr, "cannot parse cookie!!!: %s\n", src);
> +	 return -1;
> +   }
> +
> +   if(*(char *)&num == 42) {
> +      // why oh why there is no htonll
> +      temp = htonl(temph.high);
> +      temph.high = htonl(temph.low);
> +      temph.low = temp;
> +   } else {
> +      temph.low = htonl(temph.low);
> +      temph.high = htonl(temph.high);
> +   }	

There is no htonll, but QEMU has stq_be_p.  You can use that instead of 
this hack.

BTW, the indentation is off in most of the file.

As mentioned below, I suggest storing the cookies and session ids in 
host order in NetL2TPV3State, and doing the conversion in 
l2tpv3_form_header and friends.

> +   memcpy(dst, &temph, sizeof (uint64_t));
> +   return 0;
> +}
> +
> +
> +static int l2tpv3_form_header(NetL2TPV3State *s) {
> +
> +   uint32_t *header;
> +   uint32_t *session;
> +   uint64_t *cookie64;
> +   uint32_t *cookie32;
> +   uint32_t *counter;
> +
> +   if (s->new_mode & NEW_MODE_UDP) {
> +     header = (uint32_t *) s->header_buf;
> +     * header = htonl(0x30000);

stl_be_p

> +   }
> +   session = (uint32_t *) (s->header_buf + s->session_offset);
> +   * session = s->tx_session;

As mentioned above, better store s->tx_session in host order, and use 
stl_be_p.

> +
> +   if (s->new_mode & NEW_MODE_COOKIE) {
> +      if (s->new_mode & NEW_MODE_COOKIE_SIZE) {
> +	 cookie64 = (uint64_t *)(s->header_buf + s->cookie_offset);
> +	 * cookie64 = s->tx_cookie;
> +      } else {
> +	 cookie32 = (uint32_t *) (s->header_buf + s->cookie_offset);
> +	 * cookie32 = * ((uint32_t *) s->tx_cookie);
> +      }
> +   }
> +
> +   if (!(s->new_mode & NEW_MODE_NO_COUNTER)) {
> +      counter = (uint32_t *)(s->header_buf + s->counter_offset);
> +      * counter = htonl(++ s->counter);
> +   }
> +
> +   return 0;
> +}
> +
> +
> +static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
> +{
> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +
> +

Double blank line.

> +    struct msghdr message;
> +    int ret;

Might as well add a blank line here. :)

> +    if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
> +	fprintf(stderr, "iovec too long %d > %d, change l2tpv3.h\n", iovcnt, MAX_L2TPV3_IOVCNT);
> +	return -1;
> +    }
> +    l2tpv3_form_header(s);
> +    memcpy(s->vec + 1, iov, iovcnt * sizeof(struct iovec));
> +    s->vec->iov_base = s->header_buf;
> +    s->vec->iov_len = s->offset;
> +    if (s->dgram_dst) {
> +       message.msg_name = s->dgram_dst;
> +       message.msg_namelen = s->dst_size;
> +       message.msg_iov = (struct iovec *) s->vec;
> +       message.msg_iovlen = iovcnt + 1;
> +       message.msg_control = NULL;
> +       message.msg_controllen = 0;
> +       message.msg_flags = 0;
> +       ret = sendmsg(s->fd, &message, MSG_DONTWAIT) - s->offset;

You can use iov_send

> +       if (ret == 0) {
> +	    ret = iov_size (iov, iovcnt);
> +       }
> +   } else {
> +      fprintf(stderr, "no dst yet\n");
> +      ret = iov_size (iov, iovcnt); /* silently drop, otherwise it is being stupid */
> +   }
> +    return ret;
> +}
> +
> +
> +static ssize_t net_l2tpv3_receive_dgram(NetClientState *nc, const uint8_t *buf, size_t size)
> +{
> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +
> +    struct iovec * vec;
> +
> +    struct msghdr message;
> +    l2tpv3_form_header(s);
> +    vec = s->vec;
> +    vec->iov_base = s->header_buf;
> +    vec->iov_len = s->offset;
> +    vec++;
> +    vec->iov_base = (void *) buf;
> +    vec->iov_len = size;
> +
> +    if (s->dgram_dst) {
> +       message.msg_name = s->dgram_dst;
> +       message.msg_namelen = s->dst_size;
> +       message.msg_iov = (struct iovec *) s->vec;
> +       message.msg_iovlen = 2;
> +       message.msg_control = NULL;
> +       message.msg_controllen = 0;
> +       message.msg_flags = 0;
> +       size = sendmsg(s->fd, &message, 0) - s->offset;
> +   } else {
> +      fprintf(stderr, "no dst yet\n");
> +   }
> +   return size; /* silently drop, otherwise it is being stupid */
> +}
> +
> +static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf) {
> +   uint64_t *cookie64;
> +   uint32_t *cookie32;
> +   uint32_t *session;
> +
> +
> +    if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode & NEW_MODE_UDP))){
> +	buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
> +    }
> +
> +

Double blank line, and incorrect indentation.

> +   if (s->new_mode & NEW_MODE_COOKIE) {
> +      if (s->new_mode & NEW_MODE_COOKIE_SIZE) {
> +	 /* 64 bit cookie */
> +	 cookie64 = (uint64_t *)(buf + s->cookie_offset);
> +	 if (*cookie64 != s->rx_cookie) {
> +	    fprintf(stderr, "unknown cookie id\n");
> +	    return -1; /* we need to return 0, otherwise barfus */
> +	 }
> +      } else {
> +	 cookie32 = (uint32_t *)(buf + s->cookie_offset);
> +	 if (*cookie32 != * (uint32_t *) &s->rx_cookie) {
> +	    fprintf(stderr,"unknown cookie id\n");
> +	    return -1 ; /* we need to return 0, otherwise barfus */
> +	 }
> +      }
> +   }
> +   session = (uint32_t *) (buf + s->session_offset);
> +   if (* session != s->rx_session) {
> +      fprintf(stderr,"session mismatch\n");
> +      return -1;
> +   }	
> +   return 0;
> +}
> +
> +static void net_l2tpv3_send(void *opaque)
> +{
> +    NetL2TPV3State *s = opaque;
> +
> +    int i, count, offset;
> +
> +    struct mmsghdr * msgvec;
> +    struct iovec * vec;
> +
> +
> +    msgvec = s->msgvec;
> +    offset = s->offset;
> +    if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode & NEW_MODE_UDP))){

Space before the opening brace, and parentheses around !(a & b) are 
unnecessary.  More instances in the rest of the file.

> +	offset +=   sizeof(struct iphdr);
> +    }
> +    count = recvmmsg(s->fd, msgvec, MAX_L2TPV3_IOVCNT/2, MSG_DONTWAIT, NULL);
> +    for (i=0;i<count;i++) {
> +	if (msgvec->msg_len > 0) {

More instances of inconsistent indentation (both within the file, and 
with the rest of QEMU).

> +	    vec = msgvec->msg_hdr.msg_iov;
> +	    if (l2tpv3_verify_header(s, vec->iov_base) == 0) {
> +		//vec->iov_len = offset; /* reset size just in case*/
> +		vec++;
> +		qemu_send_packet(&s->nc, vec->iov_base, msgvec->msg_len - offset);
> +	    } else {
> +		fprintf(stderr, "l2tpv3 header verification failed\n");
> +		vec->iov_len = offset;
> +		vec++;
> +	    }
> +	    //vec->iov_len = PAGE_SIZE; /* reset for next read */

Do not include commented-out code.

> +	}
> +	msgvec++;
> +    }
> +}
> +
> +static void net_l2tpv3_cleanup(NetClientState *nc)
> +{
> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +    qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
> +    close(s->fd);
> +}
> +
> +static NetClientInfo net_l2tpv3_info = {
> +    .type = NET_CLIENT_OPTIONS_KIND_L2TPV3,
> +    .size = sizeof(NetL2TPV3State),
> +    .receive = net_l2tpv3_receive_dgram,
> +    .receive_iov = net_l2tpv3_receive_dgram_iov,
> +    .cleanup = net_l2tpv3_cleanup,
> +};
> +
> +static int net_l2tpv3_init(NetClientState *peer,
> +                                 const char *name,
> +                                 const char *src,
> +                                 const char *dst,
> +                                 const char *new_mode,
> +                                 const char *tx_cookie,
> +                                 const char *rx_cookie,
> +                                 const char *tx_session,
> +                                 const char *rx_session
> +)

This parenthesis goes on the same line as the last argument.

Also, I suggest getting a NetdevL2TPv3Options* instead of eight 
different arguments.

> +{
> +   NetL2TPV3State *s;
> +   NetClientState *nc;
> +
> +   int fd, i;
> +
> +   int sock_family, sock_type, sock_proto;
> +   unsigned int temp;
> +
> +   struct mmsghdr * msgvec;
> +   struct iovec * iov;
> +
> +
> +   struct sockaddr_storage LocalSock;
> +
> +
> +   fprintf(stderr, "l2tpv3 user init mode %s\n", new_mode);
> +   memset(&LocalSock, 0 , sizeof(LocalSock));
> +
> +   nc = qemu_new_net_client(&net_l2tpv3_info, peer, "l2tpv3", name);
> +
> +   s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +
> +   s->header_buf = g_malloc(256);
> +
> +   if (s->header_buf == NULL) {
> +	fprintf(stderr, "Failed to allocate header buffer \n");
> +	return -1;
> +   }
> +   s->buf = g_malloc(PAGE_SIZE);
> +
> +   if (s->buf == NULL) {
> +	fprintf(stderr, "Failed to allocate packet buffer \n");
> +	return -1;
> +   }
> +   s->vec = g_malloc(sizeof(struct iovec) * MAX_L2TPV3_IOVCNT);;

Why do you need separate mallocs for these?

> +   if (s->vec == NULL) {
> +	fprintf(stderr, "Failed to allocate packet vec \n");
> +	return -1;
> +   }
> +
> +
Double blank lines.

> +   s->offset = 4;
> +   s->session_offset = 0;
> +   s->cookie_offset = 4;
> +   s->counter_offset = 4;
> +   s->dst_size = sizeof(struct sockaddr_storage);
> +
> +   sscanf(new_mode, "%x", &s->new_mode);

Please replace this with multiple boolean or integer options, QAPI makes 
it easy.
> +
> +
> +   fprintf(stderr, "type is: %s -> %x\n", new_mode, s->new_mode);

No debugging output.

> +   /* Cookies */
> +
> +  if (
> +	 (tx_session == NULL) ||
> +	 (sscanf(tx_session, "%x", &temp) != 1)
> +      ) {
> +	 fprintf(stderr, "cannot parse tx_session!!!: %s\n", tx_session);
> +	 return -1;
> +   } else {
> +      s->tx_session = htonl(temp);
> +      fprintf(stderr, "l2tpv3_open :  parsed tx session 32, %0x\n", s->tx_session);
> +   }

As above, better force the user to prepend a "0x" and let QAPI do the 
parsing.

> +  if (
> +	 (tx_session == NULL) ||
> +	 (sscanf(rx_session, "%x", &temp) != 1)
> +      ) {
> +	 fprintf(stderr, "cannot parse rx_session!!!: %s\n", rx_session);
> +	 return -1;
> +   }  else {
> +      s->rx_session = htonl(temp);
> +      fprintf(stderr, "l2tpv3_open :  parsed rx session 32, %0x\n", s->rx_session);
> +   }
> +   if (s->new_mode & NEW_MODE_COOKIE) {
> +      if (s->new_mode & NEW_MODE_COOKIE_SIZE) {

Perhaps an optional integer argument like 
"cookie-size=32"/"cookie-size=64"?  Also, please mark 
tx-cookie/rx-cookie as optional and check that they are present if and 
only if cookie-size != 0.

Alternatively, mark tx-cookie/rx-cookie as optional and do all the 
following:

* check that either both are present or none is

* check that the optional integer argument cookie-size, if present, is 
either 32 or 64

* check that cookie-size is not present unless tx-cookie and rx-cookie 
are also present

* if it makes sense, give a default value to cookie-size.  Otherwise, 
check that cookie-size is present if tx-cookie and rx-cookie are

> +	 /* 64 bit cookie */
> +	 s->offset += 8;
> +	 s->counter_offset += 8;
> +	 if (l2tpv3_parse_cookie64(tx_cookie,&s->tx_cookie) !=0) {
> +	    fprintf(stderr, "l2tpv3_open :  failed to parse tx cookie 64\n");
> +	    return -1;
> +	 } else {
> +	    fprintf(stderr, "l2tpv3_open :  parsed tx cookie 64, %0lx\n", s->tx_cookie);
> +	 }
> +	 if (l2tpv3_parse_cookie64(rx_cookie,&s->rx_cookie) !=0) {
> +	    fprintf(stderr, "l2tpv3_open :  failed to parse rx cookie 64\n");
> +	    return -1;
> +	 } else {
> +	    fprintf(stderr, "l2tpv3_open :  parsed rx cookie 64, %0lx\n", s->rx_cookie);
> +	 }
> +      } else {
> +	 /* 32 bit cookie */
> +	 s->offset += 4;
> +	 s->counter_offset +=4;
> +	 s->tx_cookie = 0;
> +	 if (l2tpv3_parse_cookie32(tx_cookie,&s->tx_cookie) !=0) {
> +	    fprintf(stderr, "l2tpv3_open :  failed to parse tx cookie 32\n");
> +	    return -1;
> +	 }
> +	 s->rx_cookie = 0;
> +	 if (l2tpv3_parse_cookie32(rx_cookie,&s->rx_cookie) !=0) {
> +	    fprintf(stderr, "l2tpv3_open :  failed to parse rx cookie 32\n");
> +	    return -1;
> +	 }
> +      }
> +   }
> +
> +
> +
> +   if (dst && (strlen(dst) > 0 )) {
> +      /* we now allocate it only if it we are not "listening" */
> +      s->dgram_dst = malloc(sizeof(struct sockaddr_storage));
> +      memset(s->dgram_dst, 0 , sizeof(struct sockaddr_storage));
> +      fprintf(stderr, "l2tpv3_open : setting dst at init\n");
> +   } else {
> +      s->dgram_dst = NULL;
> +   }
> +
> +   if (s->new_mode & NEW_MODE_IP_VERSION) {
> +      /* IPv6 */
> +      sock_family = AF_INET6;
> +      if (parse_host_port6((struct sockaddr_in6 *) &LocalSock, src) !=0) {

Is the local address mandatory?

> +	 fprintf(stderr, "l2tpv3_open :  failed to parse v6 dst\n");
> +	 return -1;
> +      };
> +   } else {
> +      /* IPv4 */
> +      sock_family = AF_INET;
> +      if (parse_host_port((struct sockaddr_in*) &LocalSock, src) != 0) {

I would try using a connected socket?  This way you can use the 
socket_dgram function to create the UDP socket, resolve host names, use 
getaddrinfo properly, etc.

For raw sockets, you can cut-and-paste relevant bits of socket_dgram 
into a new function that creates raw sockets:

     int socket_raw(const char *host, const char *localaddr,
                    int proto, bool ipv4, bool ipv6, Error **errp)
     {
         ...
     }

Errors from socket_dgram/socket_raw can be printed with the 
qerror_report_err function.

> +	 fprintf(stderr, "l2tpv3_open :  failed to parse v4 src\n");
> +	 return -1;
> +      };
> +   }

The way this is usually done is using two optional booleans named ipv4 
and ipv6.  Also, please use separate options host, port, localaddr, 
localport for consistency with other places where we specify the address 
of a datagram socket.

Add an optional argument instead of the NEW_MODE_UDP bit.  Examples:

* an enum named "transport" and make it accept values "udp" or "ip"

* a boolean named "udp"

etc.  You can decide what the default is.  Make port optional (and 
localport too if making localaddr/localport mandatory is justified); 
check that it isn't specified for the raw protocol.

> +   if (s->new_mode & NEW_MODE_UDP) {
> +      sock_type = SOCK_DGRAM;
> +      sock_proto = 0;
> +      /* space for header */
> +
> +      s->offset += 4;
> +      s->counter_offset += 4;
> +      s->session_offset += 4;
> +      s->cookie_offset += 4;
> +    } else {
> +      sock_type = SOCK_RAW;
> +      sock_proto = 0x73;
> +      if (s->new_mode & NEW_MODE_IP_VERSION) {
> +	 /* IPv6 */
> +	 ((struct sockaddr_in6 *) &LocalSock)->sin6_port = htons(0x73);
> +      } else {
> +	 /* IPv4 */
> +	 ((struct sockaddr_in *) &LocalSock)->sin_port = htons(0x73);

If you move socket creation to a function in util/qemu-sockets.c, like 
the socket_raw I suggest above, note that there is already an 
inet_setport in util/qemu-sockets.c.

> +      }
> +   }
> +
> +   if (!(s->new_mode & NEW_MODE_NO_COUNTER)) {

Another optional boolean, "counter", defaulting (I guess) to 0.

> +      s->offset += 4;
> +   }
> +   if ((fd = socket(sock_family, sock_type, sock_proto)) == -1) {
> +      fd = -errno;
> +      fprintf(stderr, "l2tpv3_open : socket creation failed, "
> +	       "errno = %d\n", -fd);
> +      return fd;
> +   }
> +
> +
> +   if (bind(fd, (struct sockaddr *) &LocalSock, sizeof(LocalSock))) {
> +      fprintf(stderr, "l2tpv3_open :  could not bind socket err=%i\n", errno);
> +      close(fd);
> +      return -1;
> +   } else {
> +      fprintf(stderr, "l2tpv3_open : socket bound successfully\n");
> +   }
> +   if (s->dgram_dst) {
> +      if (s->new_mode & NEW_MODE_IP_VERSION) {
> +	 /* IPv6 */
> +	 if (parse_host_port6((struct sockaddr_in6 *) s->dgram_dst, dst) !=0) {
> +	    fprintf(stderr, "l2tpv3_open :  failed to parse v6 dst\n");
> +	    return -1;
> +	 };
> +	 s->dst_size = sizeof(struct sockaddr_in6);
> +      } else {
> +	 /* IPv4 */
> +	 if (parse_host_port((struct sockaddr_in *) s->dgram_dst, dst) != 0) {
> +	    fprintf(stderr, "l2tpv3_open :  failed to parse v4 dst\n");
> +	    return -1;
> +	 };
> +	 s->dst_size = sizeof(struct sockaddr_in);
> +      }
> +   }
> +
> +   s->msgvec = g_malloc(sizeof(struct mmsghdr) * (MAX_L2TPV3_IOVCNT/2));
> +   if (s->msgvec == NULL) {
> +	fprintf(stderr, "Failed to allocate receive packet vec \n");
> +	return -1;
> +   }
> +   msgvec = s->msgvec;

More allocations that could be left in NetL2TPV3State.

> +   for (i=0;i<MAX_L2TPV3_IOVCNT/2;i++) {
> +	/* we need to allocate these only for the first time and first buffer */
> +	msgvec->msg_hdr.msg_name = NULL;
> +	msgvec->msg_hdr.msg_namelen = 0;
> +	iov =  g_malloc(sizeof(struct iovec) * 2);

This could also be a multidimensional array.

> +	if (iov == NULL) {
> +	    fprintf(stderr, "Failed to allocate receive packet vec \n");
> +	    return -1;
> +	}
> +	msgvec->msg_hdr.msg_iov = iov;
> +	if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode & NEW_MODE_UDP))){

Dropping new_mode will also make this code much more readable:

      "if (!s->ipv6 && !s->udp)"

> +	    iov->iov_base = g_malloc(s->offset + sizeof(struct iphdr)); /* fix for ipv4 raw */;

Extra semicolon at the end of the line, also please try to be more verbose:

     /* We need to allocate blah blah for IPv4 raw sockets, because blah
      * blah.
      */

> +	} else {
> +	    iov->iov_base = g_malloc(s->offset);
> +	}
> +	
> +	if (iov->iov_base == NULL) {
> +	    fprintf(stderr, "Failed to allocate receive packet vec \n");
> +	    return -1;
> +	}

g_malloc can never fail.

> +	if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode & NEW_MODE_UDP))){
> +	    iov->iov_len = s->offset + sizeof (struct iphdr);
> +	} else {
> +	    iov->iov_len = s->offset;
> +	}

Better set iov_len first, and then refer to it in the allocation.

> +	iov++ ;

Just use iov[0] and iov[1].

> +	iov->iov_base = qemu_memalign(PAGE_SIZE, PAGE_SIZE);
> +	if (iov->iov_base == NULL) {
> +	    fprintf(stderr, "Failed to allocate receive packet vec \n");
> +	    return -1;
> +	}
> +	iov->iov_len = PAGE_SIZE;
> +	msgvec->msg_hdr.msg_iovlen = 2;
> +	msgvec->msg_hdr.msg_control = NULL;
> +	msgvec->msg_hdr.msg_controllen = 0;
> +	msgvec->msg_hdr.msg_flags = 0;
> +	msgvec++;
> +   }
> +
> +//   sendbuff = 90000;
> +//
> +//  printf("sets the send buffer to %d\n", sendbuff);
> +// setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sendbuff, sizeof(sendbuff));

No debug code.

> +
> +   qemu_set_nonblock(fd);

Since the socket is nonblocking, MSG_DONTWAIT is unnecessary.

> +
> +   if (fd < 0)
> +	return -1;

You should have checked this much earlier, right after the socket 
function was created.

> +   s->fd = fd;
> +   s->counter = 0;
> +
> +   qemu_set_fd_handler(s->fd, net_l2tpv3_send, NULL, s);
> +
> +   if (!s) {
> +      fprintf (stderr, "l2tpv3_open : failed to set fd handler\n");
> +      return -1;
> +   }

Cannot happen, please delete.

> +
> +   /* this needs fixing too */
> +
> +   snprintf(s->nc.info_str, sizeof(s->nc.info_str),
> +             "l2tpv3: connected");

What needs fixing?

> +   return 0;
> +
> +}
> +
> +int net_init_l2tpv3(const NetClientOptions *opts,
> +                    const char *name,
> +                    NetClientState *peer) {
> +
> +
> +    const NetdevL2TPv3Options * l2tpv3;
> +    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_L2TPV3);
> +    l2tpv3 = opts->l2tpv3;
> +
> +
> +  if (!(l2tpv3->has_src && l2tpv3->has_mode && l2tpv3->has_txsession && l2tpv3->has_rxsession)) {
> +      error_report("src=, dst=, txsession=, rxsession= and mode= are required for l2tpv3");
> +      return -1;
> +  }
> +
> +
> +  /* this needs cleaning up for new API */

Then do it. :)

> +  if (net_l2tpv3_init(
> +	 peer,
> +         name,
> +	 l2tpv3->src,
> +	 l2tpv3->dst,
> +	 l2tpv3->mode,
> +	 l2tpv3->txcookie,
> +	 l2tpv3->rxcookie,
> +	 l2tpv3->txsession,
> +	 l2tpv3->rxsession
> +
> +   ) == -1) {
> +      return -1;
> +  }
> +  return 0;
> +}
> diff --git a/net/l2tpv3.h b/net/l2tpv3.h
> new file mode 100644
> index 0000000..822e0b0
> --- /dev/null
> +++ b/net/l2tpv3.h
> @@ -0,0 +1,65 @@
> +/*
> + * QEMU System Emulator
> + *
> + * Copyright (c) 2003-2008 Fabrice Bellard
> + * Copyright (c) 2012 Cisco Systems
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu-common.h"
> +
> +#ifndef QEMU_NET_L2TPV3_H
> +#define QEMU_NET_L2TPV3_H
> +
> +#define L2TPV3_BUFSIZE 2048
> +
> +#define NEW_MODE_IP_VERSION   1		  /* on for v6, off for v4 */
> +#define NEW_MODE_UDP	      2		  /* on for udp, off for raw ip */
> +#define NEW_MODE_COOKIE	      4		  /* cookie present */
> +#define NEW_MODE_COOKIE_SIZE  8		  /* on for 64 bit */
> +#define NEW_MODE_NO_COUNTER   16	  /* DT - no counter */

Please use bools in NetL2TPV3State

> +/* legacy modes */
> +
> +/* mode 0 */
> +
> +#define LEGACY_UDP6_64_NO_COUNTER (NEW_MODE_IP_VERSION + NEW_MODE_UDP + NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE + NEW_MODE_NO_COUNTER)
> +
> +#define LEGACY_MODE0 LEGACY_UDP6_64_NO_COUNTER
> +
> +/* mode 1 */
> +
> +#define LEGACY_IP6_64_NO_COUNTER (NEW_MODE_IP_VERSION + NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE + NEW_MODE_NO_COUNTER)
> +
> +#define LEGACY_MODE1 LEGACY_IP6_64_NO_COUNTER
> +
> +/* mode 2 */
> +
> +#define LEGACY_UDP4_64_COUNTER (NEW_MODE_COOKIE + NEW_MODE_UDP + NEW_MODE_COOKIE_SIZE )
> +
> +#define LEGACY_MODE2 LEGACY_UDP4_64_COUNTER

All these functions are not used.

> +struct temphtonl {
> +   uint32_t low;
> +   uint32_t high;
> +};

Should not be necessary, but in any case it would belong in net/l2tpv3.c.

> +
> +#endif /* QEMU_NET_SOCKET_H */
> diff --git a/net/net.c b/net/net.c
> index 0a88e68..ba5f51b 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -132,6 +132,43 @@ int parse_host_port(struct sockaddr_in *saddr, const char *str)
>      return 0;
>  }
>
> +int parse_host_port6(struct sockaddr_in6 *saddr, const char *str)
> +{
> +    char buf[512];
> +    char *p, *ip, *port;
> +
> +    strncpy((char *) &buf, str, 511);
> +    ip = (char *) &buf;
> +    port = NULL;
> +    int portint;
> +
> +
> +    for (p = (char *) &buf; (p < (char *) &buf + strlen(str)) || (p < (char *) &buf + 512); p++){
> +      if (*p == '[') {
> +	 ip = p + 1;
> +      }
> +      if (*p == ']') {
> +	 *p = '\0';
> +        if (*(p + 1) == ':') {
> +	    port = p + 2;
> +	}
> +	break;
> +      }
> +    }
> +
> +    saddr->sin6_family = AF_INET6;
> +    if (!inet_pton(AF_INET6, ip, &saddr->sin6_addr)) {
> +         return -1;
> +    }
> +    if (port) {
> +      sscanf(port, "%i", &portint);
> +      saddr->sin6_port = htons(portint);
> +    }
> +    return 0;
> +}

Not needed if you use qemu-sockets.c interfaces.

> +
> +
> +
>  void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6])
>  {
>      snprintf(nc->info_str, sizeof(nc->info_str),
> @@ -731,6 +768,7 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
>          [NET_CLIENT_OPTIONS_KIND_BRIDGE]    = net_init_bridge,
>  #endif
>          [NET_CLIENT_OPTIONS_KIND_HUBPORT]   = net_init_hubport,
> +        [NET_CLIENT_OPTIONS_KIND_L2TPV3]       = net_init_l2tpv3,
>  };
>
>
> diff --git a/net/raw.c b/net/raw.c
> new file mode 100644
> index 0000000..04d244f
> --- /dev/null
> +++ b/net/raw.c

Isn't this file unused?  It is not referred to by the makefiles.

> @@ -0,0 +1,244 @@
> +/*
> + * QEMU System Emulator
> + *
> + * Copyright (c) 2003-2008 Fabrice Bellard
> + * Copyright (c) 2014 Cisco Systems
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +#include <netinet/ether.h>
> +#include <linux/if_ether.h>
> +#include <linux/if_packet.h>
> +#include <sys/mman.h>
> +#include <sys/ioctl.h>
> +#include <net/if.h>
> +
> +#include "net/raw.h"
> +
> +#include "config-host.h"
> +#include "net/net.h"
> +#include "clients.h"
> +#include "monitor/monitor.h"
> +#include "qemu-common.h"
> +#include "qemu/error-report.h"
> +#include "qemu/option.h"
> +#include "qemu/sockets.h"
> +#include "qemu/iov.h"
> +#include "qemu/main-loop.h"
> +
> +
> +#define RAW_BUFSIZE 2048
> +
> +
> +typedef struct NetRAWState {
> +    NetClientState nc;
> +    int fd;
> +    int state; /* 0 = getting length, 1 = getting data */
> +    unsigned int ring_index;
> +    unsigned int packet_len;
> +    uint8_t * multiread_buffer;
> +} NetRAWState;
> +
> +
> +static ssize_t net_raw_receive_dgram(NetClientState *nc, const uint8_t *buf, size_t size)
> +{
> +    NetRAWState *s = DO_UPCAST(NetRAWState, nc, nc);
> +    return write(s->fd, buf, size);
> +}
> +
> +static ssize_t net_raw_receive_dgram_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
> +{
> +    NetRAWState *s = DO_UPCAST(NetRAWState, nc, nc);
> +    return writev(s->fd, iov, iovcnt);
> +}
> +
> +static inline struct tpacket_hdr * current_header (NetRAWState *s) {
> +    uint8_t * buffer;
> +    buffer = s->multiread_buffer + (s->ring_index * RAW_TP_FRAME_SIZE);
> +    return (struct tpacket_hdr *) buffer;
> +}
> +
> +
> +static struct tpacket_hdr * raw_advance_ring(NetRAWState *s) {
> +
> +    struct tpacket_hdr * header = current_header(s);
> +    header->tp_status = TP_STATUS_KERNEL; /* mark as free */
> +    s->ring_index = (s->ring_index + 1) % RAW_TP_FRAME_NR;
> +    return current_header(s);
> +}
> +
> +static void net_raw_send(void *opaque)
> +{
> +    NetRAWState *s = opaque;
> +    struct tpacket_hdr * header;
> +
> +    header = current_header(s);	
> +    while ((header->tp_status & TP_STATUS_USER) > 0) {
> +	qemu_send_packet(&s->nc, ((uint8_t *) header) + header->tp_mac, header->tp_snaplen);
> +	header = raw_advance_ring(s);
> +    }
> +}
> +
> +
> +static void net_raw_cleanup(NetClientState *nc)
> +{
> +    NetRAWState *s = DO_UPCAST(NetRAWState, nc, nc);
> +    qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
> +    close(s->fd);
> +}
> +
> +static NetClientInfo net_raw_info = {
> +    .type = NET_CLIENT_OPTIONS_KIND_RAW,
> +    .size = sizeof(NetRAWState),
> +    .receive = net_raw_receive_dgram,
> +    .receive_iov = net_raw_receive_dgram_iov,
> +    .cleanup = net_raw_cleanup,
> +};
> +
> +static int net_raw_init(NetClientState *peer,
> +                                 const char *name,
> +                                 const char *host_interface
> +                 )
> +{
> +    NetRAWState *s;
> +    NetClientState *nc;
> +
> +    int fd;
> +
> +
> +
> +    struct ifreq ifr;
> +    struct sockaddr_ll sock;
> +
> +    int err;
> +    struct tpacket_req tpacket;
> +
> +
> +    nc = qemu_new_net_client(&net_raw_info, peer, "raw", name);
> +
> +    s = DO_UPCAST(NetRAWState, nc, nc);
> +
> +    s->ring_index = 0;
> +	
> +
> +    if ((fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))) == -1) {
> +	err = -errno;
> +	fprintf(stderr, "raw_open : raw socket creation failed, errno = %d\n", -err);
> +	return err;
> +    }
> +
> +    tpacket.tp_block_size = RAW_TP_BLOCK_SIZE;
> +    tpacket.tp_frame_size = RAW_TP_FRAME_SIZE;
> +    tpacket.tp_block_nr = RAW_TP_BLOCK_NR ;
> +    tpacket.tp_frame_nr = RAW_TP_FRAME_NR;
> +
> +    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &tpacket, sizeof(struct tpacket_req))) {
> +	fprintf(stderr, "uml_raw: failed to request packet mmap");
> +	return -errno;
> +    } else {
> +	fprintf(stderr, "uml_raw: requested packet mmap\n");
> +    }
> +
> +    s->multiread_buffer = (uint8_t *) mmap(
> +	    NULL,
> +	    RAW_TP_FRAME_SIZE * RAW_TP_FRAME_NR,
> +	    PROT_READ | PROT_WRITE, MAP_SHARED,
> +	    fd,
> +	    0
> +    );
> +
> +    if (!(s->multiread_buffer)) {
> +	fprintf(stderr, "raw: failed to map buffer");
> +	return -1;
> +    } else {
> +	fprintf(stderr, "raw: mmmap-ed buffer at %p\n", s->multiread_buffer);
> +    }
> +
> +
> +
> +
> +    memset(&ifr, 0, sizeof(struct ifreq));
> +    strncpy((char *) &ifr.ifr_name, host_interface, sizeof(ifr.ifr_name) - 1);
> +    if(ioctl(fd, SIOCGIFINDEX, (void *) &ifr) < 0) {
> +	err = -errno;
> +	fprintf(stderr, "SIOCGIFINDEX, failed to get raw interface index for %s", host_interface);
> +	close(fd);
> +	return(-1);
> +    } else {
> +    }
> +
> +    sock.sll_family = AF_PACKET;
> +    sock.sll_protocol = htons(ETH_P_ALL);
> +    sock.sll_ifindex = ifr.ifr_ifindex;
> +
> +    fprintf(stderr, "raw: binding raw on interface index: %i\n", ifr.ifr_ifindex);
> +    if (bind(fd, (struct sockaddr *) &sock, sizeof(struct sockaddr_ll)) < 0) {
> +	fprintf(stderr, "raw: failed to bind raw socket");
> +	close(fd);
> +	return(-1);
> +    }
> +
> +
> +    qemu_set_nonblock(fd);
> +
> +    if (fd < 0)
> +	return -1;
> +
> +    s->ring_index = 0;
> +
> +    s->fd = fd;
> +
> +    qemu_set_fd_handler(s->fd, net_raw_send, NULL, s);
> +
> +    if (!s) {
> +      fprintf (stderr, "raw_open : failed to set fd handler\n");
> +      return -1;
> +    }
> +
> +/* this needs fixing too */
> +
> +    snprintf(s->nc.info_str, sizeof(s->nc.info_str), "raw: connected");
> +    return 0;
> +
> +}
> +
> +int net_init_raw(const NetClientOptions *opts,
> +                    const char *name,
> +                    NetClientState *peer) {
> +
> +
> +    const NetdevRawOptions * raw;
> +    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_RAW);
> +    raw = opts->raw;
> +
> +    if (raw->has_fd) {
> +	error_report("passing socket fd not yet supported");
> +	return -1 ;
> +    }
> +
> +    if (!(raw->has_ifname)) {
> +      error_report("iface required for raw");
> +      return -1;
> +    }
> +
> +    if (net_raw_init( peer, name, raw->ifname) == -1) {
> +      return -1;
> +    }
> +    return 0;
> +}
> diff --git a/net/raw.h b/net/raw.h
> new file mode 100644
> index 0000000..80ab43e
> --- /dev/null
> +++ b/net/raw.h
> @@ -0,0 +1,33 @@
> +/*
> + * QEMU System Emulator
> + *
> + * Copyright (c) 2003-2008 Fabrice Bellard
> + * Copyright (c) 2014 Cisco Systems
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +#ifndef QEMU_NET_RAW_H
> +#define QEMU_NET_RAW_H
> +
> +#define RAW_TP_BLOCK_SIZE 4096
> +#define RAW_TP_FRAME_SIZE 2048
> +#define RAW_TP_BLOCK_NR 32
> +#define RAW_TP_FRAME_NR 64
> +
> +#endif /* QEMU_NET_RAW_H */

Also unused.

> diff --git a/qapi-schema.json b/qapi-schema.json
> index 83fa485..192d19c 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2941,6 +2941,45 @@
>      '*udp':       'str' } }
>
>  ##
> +# @NetdevL2TPv3Options
> +#
> +# Connect the VLAN to Ethernet over L2TPv3 Static tunnel
> +#
> +# @fd: #optional file descriptor of an already opened socket
> +#
> +# @src: source address
> +#
> +# @dst: #optional destination address
> +#
> +# @mode: bitmask mode for the tunnel (see l2tpv3.h for actual definition)
> +#
> +# @txcookie:#optional 32 or 64 bit tx cookie for the tunnel (set mode for actual type selection)
> +#
> +# @rxcookie:#optional 32 or 64 bit rx cookie for the tunnel (set mode for actual type selection)
> +#
> +# @txsession: tx 32 bit session
> +#
> +# @rxsession: rx 32 bit session
> +#
> +#
> +# Since 1.0
> +##
> +##
> +{ 'type': 'NetdevL2TPv3Options',
> +  'data': {
> +    '*fd':        'str',

fd is not used anywhere, I think.  Using it to support a pre-connected 
socket, passed by some management layer, would be useful because QEMU 
does not ordinarily have privileges to create raw sockets.  But you can 
add this later.

> +    '*src':	  'str',
> +    '*dst':	  'str',
> +    '*mode':	  'str',
> +    '*txcookie':  'str',
> +    '*rxcookie':  'str',
> +    '*txsession': 'str',
> +    '*rxsession': 'str'
> +
> +} }
> +
> +##
> +##
>  # @NetdevVdeOptions
>  #
>  # Connect the VLAN to a vde switch running on the host.
> @@ -3021,6 +3060,7 @@
>      'nic':      'NetLegacyNicOptions',
>      'user':     'NetdevUserOptions',
>      'tap':      'NetdevTapOptions',
> +    'l2tpv3':   'NetdevL2TPv3Options',
>      'socket':   'NetdevSocketOptions',
>      'vde':      'NetdevVdeOptions',
>      'dump':     'NetdevDumpOptions',
>

Regards,

Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28 10:02 ` Paolo Bonzini
@ 2014-02-28 11:17   ` Anton Ivanov (antivano)
  2014-02-28 11:36     ` Paolo Bonzini
  2014-02-28 13:55   ` Anton Ivanov (antivano)
  2014-03-04 15:19   ` Anton Ivanov (antivano)
  2 siblings, 1 reply; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-02-28 11:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Qemu-devel, Stefan Hajnoczi

On 28/02/14 10:02, Paolo Bonzini wrote:
>> This transport uses a linux specific call to get several GBit RX rate.
>
> Might as well mention that it's recvmmsg. :)

:)

>
>> This call can be wrapped (at some cost) in a compatibility loop using
>> posix compliant recvmsg instead for other systems. As I am not familiar
>> with the fine details on how to add linux specific, windows specific,
>> etc bits to qemu I have decided to leave that bit out for the time being
>> and submit it "as we use it".
>
> You can add the compatibility structures in include/qemu/recvmmsg.h
> and util/recvmmsg.c, and only compile recvmmsg.c on non-Linux system
> with something like
>
>     util-obj-$(call lnot, CONFIG_LINUX) += recvmmsg.o
>
> For now, you can just make l2tpv3.c Linux-specific.

Understood, I will fix it next week, apologies if some responses will be
delayed - I will be quite busy with the IETF next week.

I have read all comments, will sort it out.

On the "raw" file. We have one more transport enqueued - raw socket
using packet_mmap. It is not as fast as L2TPv3, but still very
respectable and with obvious uses - IDS, listening on span ports, etc.

It sneaked in the patchset by mistake at this point. I will update it to
comply with all coding conventions as discussed and re-issue the patch -
probably sometimes in the next couple of weeks.

>
>> Patch attached.
>
> Thanks for the contribution!  I have quite a few comments.  The
> largest part of the work will be to cleanup the user interface.
>
> Also, I suggest using connected sockets instead of connectionless, but
> that should be comparatively less work.
>
> You also need to document the new options in qemu-options.hx.

OK.

>
> Please don't be turned off by the amount of comments.  This is quite
> normal.  I'm CCing the maintainer of the network layer, Stefan
> Hajnoczi, who can help you more with this task.

OK, thanks.

>
>> Best Regards,
>>
>> Anton
>>
>>
>> l2tpv3.diff
>>
>> diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
>> index 45588d7..865f1c2 100644
>> --- a/include/qemu/sockets.h
>> +++ b/include/qemu/sockets.h
>> @@ -79,6 +79,7 @@ int socket_dgram(SocketAddress *remote,
>> SocketAddress *local, Error **errp);
>>
>>  /* Old, ipv4 only bits.  Don't use for new code. */
>>  int parse_host_port(struct sockaddr_in *saddr, const char *str);
>> +int parse_host_port6(struct sockaddr_in6 *saddr, const char *str);
>>  int socket_init(void);
>>
>>  #endif /* QEMU_SOCKET_H */
>> diff --git a/net/Makefile.objs b/net/Makefile.objs
>> index 4854a14..364c785 100644
>> --- a/net/Makefile.objs
>> +++ b/net/Makefile.objs
>> @@ -2,6 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
>>  common-obj-y += socket.o
>>  common-obj-y += dump.o
>>  common-obj-y += eth.o
>> +common-obj-y += l2tpv3.o
>
> This should be CONFIG_LINUX or (better, if you implement a
> compatibility layer for recvmmsg) CONFIG_POSIX, because Windows does
> not have sendmsg/recvmsg.

OK.

>
> QEMU already has some compatibility code for sendmsg/recvmsg
> (iov_send/iov_recv) but it doesn't let you specify a destination, so
> for now we can make it POSIX only.

OK

>
>>  common-obj-$(CONFIG_POSIX) += tap.o
>>  common-obj-$(CONFIG_LINUX) += tap-linux.o
>>  common-obj-$(CONFIG_WIN32) += tap-win32.o
>> diff --git a/net/clients.h b/net/clients.h
>> index 7793294..ea2a831 100644
>> --- a/net/clients.h
>> +++ b/net/clients.h
>> @@ -47,6 +47,11 @@ int net_init_tap(const NetClientOptions *opts,
>> const char *name,
>>  int net_init_bridge(const NetClientOptions *opts, const char *name,
>>                      NetClientState *peer);
>>
>> +int net_init_raw(const NetClientOptions *opts, const char *name,
>> +                    NetClientState *peer);
>> +
>> +int net_init_l2tpv3(const NetClientOptions *opts, const char *name,
>> +                    NetClientState *peer);
>>  #ifdef CONFIG_VDE
>>  int net_init_vde(const NetClientOptions *opts, const char *name,
>>                   NetClientState *peer);
>> diff --git a/net/l2tpv3.c b/net/l2tpv3.c
>> new file mode 100644
>> index 0000000..4b5d8ee
>> --- /dev/null
>> +++ b/net/l2tpv3.c
>> @@ -0,0 +1,630 @@
>> +/*
>> + * QEMU System Emulator
>> + *
>> + * Copyright (c) 2003-2008 Fabrice Bellard
>> + * Copyright (c) 2012 Cisco Systems
>> + *
>> + * Permission is hereby granted, free of charge, to any person
>> obtaining a copy
>> + * of this software and associated documentation files (the
>> "Software"), to deal
>> + * in the Software without restriction, including without limitation
>> the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense,
>> and/or sell
>> + * copies of the Software, and to permit persons to whom the
>> Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
>> OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "net/l2tpv3.h"
>> +#include "config-host.h"
>> +#include "config-host.h"
>> +#include "net/net.h"
>> +#include "clients.h"
>> +#include "monitor/monitor.h"
>> +#include "qemu-common.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/option.h"
>> +#include "qemu/sockets.h"
>> +#include "qemu/iov.h"
>> +#include "qemu/main-loop.h"
>> +#include <linux/ip.h>
>> +
>> +
>> +
>> +
>> +#define PAGE_SIZE 4096
>> +#define MAX_L2TPV3_IOVCNT 64
>> +
>> +typedef struct NetL2TPV3State {
>> +    NetClientState nc;
>> +    int fd;
>> +    int state; /* 0 = getting length, 1 = getting data */
>> +    unsigned int index;
>> +    unsigned int packet_len;
>> +    /*
>> +    these are used for xmit - that happens packet a time
>> +    and for first sign of life packet (easier to parse that once)
>> +
>> +    */
>
> Hi Anton,
>
> thanks for the contribution!
>
> The new files do not obey the QEMU coding standards. We use comments like
>
>     /* These are used for xmit - that happens packet a time
>      * and for first sign of life packet (easier to parse that once).
>      */
>
> and no C++ comments.

Understood, will fix.

>
>> +    uint8_t * header_buf;
>> +    uint8_t * buf;
>> +    struct iovec * vec;
>> +    /*
>> +    these are used for receive - try to "eat" up to 32 packets at a
>> time
>> +    */
>> +    struct mmsghdr * msgvec;
>> +    /*
>> +      contains inet host and port destination if
>> +      connectionless (SOCK_DGRAM). Will be NULL if
>> +      in listen mode waiting for first connect
>> +
>> +    */
>> +    struct sockaddr_storage * dgram_dst;
>
> I cannot see any code that handles "listening" mode, so please drop
> all of it and assume dgram_dst is non-NULL throughout the code.
>
> Even if we were to add a "listening" mode later, I'd prefer a boolean
> value ("bool connected;") and not making dgram_dst a pointer.

Understood.

>
>> +    uint64_t rx_cookie;
>> +    uint64_t tx_cookie;
>> +    uint32_t rx_session;
>> +    uint32_t tx_session;
>> +    uint32_t header_size;
>> +    uint32_t counter;
>> +    uint32_t dst_size;
>> +    uint32_t new_mode;
>> +
>> +    /* offsets */
>> +
>> +    uint32_t offset;   /* main offset == header offset */
>> +    uint32_t cookie_offset;
>> +    uint32_t counter_offset;
>> +    uint32_t session_offset;
>> +
>
> Useless blank line.
>
>> +} NetL2TPV3State;
>> +
>> +typedef struct NetL2TPV3ListenState {
>> +    NetClientState nc;
>> +    char *model;
>> +    char *name;
>> +    int fd;
>> +} NetL2TPV3ListenState;
>> +
>> +static int l2tpv3_parse_cookie32(const char *src , void * dst) {
>> +
>> +  if (
>> +     (src == NULL) ||
>> +     (sscanf(src, "%x", (unsigned int *) dst) != 1)
>
> Is it really so bad that the user has to prepend "0x" to the cookie?
> Then you can just use the same parsing that is used elsewhere: declare
> the fields as 'int64' in qapi-schema.json and QEMU will do everything
> for you.

It is an artefact from porting from UML :)  We originally wrote it for
that and its option parsing is not anywhere as rich as qemu.

>
>> +      ) {
>> +     fprintf(stderr, "cannot parse cookie!!!: %s\n", src);
>> +     return -1;
>> +   }
>> +
>> +   return 0;
>> +}
>> +
>> +static int l2tpv3_parse_cookie64(const char *src , void * dst) {
>> +
>> +   struct temphtonl temph;
>> +   uint32_t temp;
>> +   const int num = 42;
>> +    if (
>> +     (src == NULL) ||
>> +     (sscanf(src, "%lx", (long unsigned int *) &temph) != 1)
>> +      ) {
>> +     fprintf(stderr, "cannot parse cookie!!!: %s\n", src);
>> +     return -1;
>> +   }
>> +
>> +   if(*(char *)&num == 42) {
>> +      // why oh why there is no htonll
>> +      temp = htonl(temph.high);
>> +      temph.high = htonl(temph.low);
>> +      temph.low = temp;
>> +   } else {
>> +      temph.low = htonl(temph.low);
>> +      temph.high = htonl(temph.high);
>> +   }   
>
> There is no htonll, but QEMU has stq_be_p.  You can use that instead
> of this hack.

OK.

>
> BTW, the indentation is off in most of the file.

Will fix to coding standard.

>
> As mentioned below, I suggest storing the cookies and session ids in
> host order in NetL2TPV3State, and doing the conversion in
> l2tpv3_form_header and friends.

I can fix it. I prefer to keep all params in "ready to use" form so that
no cycles are wasted on conversion in the portions which may affect
performance.

>
>> +   memcpy(dst, &temph, sizeof (uint64_t)); 

[snip]

>> +       message.msg_flags = 0;
>> +       ret = sendmsg(s->fd, &message, MSG_DONTWAIT) - s->offset;
>
> You can use iov_send

OK.

>
>> +       if (ret == 0) {
>> +        ret = iov_size (iov, iovcnt); 

[snip]

>> +    offset = s->offset;
>> +    if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode &
>> NEW_MODE_UDP))){
>
> Space before the opening brace, and parentheses around !(a & b) are
> unnecessary.  More instances in the rest of the file.

Bad habits die hard. After being burned by a couple of buggy borland
compilers 20 years ago I brace everything to the hilt. You have a point
though.

>
>> +    offset +=   sizeof(struct iphdr);
>> +    }
>> +    count = recvmmsg(s->fd, msgvec, MAX_L2TPV3_IOVCNT/2,
>> MSG_DONTWAIT, NULL);
>> +    for (i=0;i<count;i++) {
>> +    if (msgvec->msg_len > 0) {
>
> More instances of inconsistent indentation (both within the file, and
> with the rest of QEMU).
>
>> +        vec = msgvec->msg_hdr.msg_iov;
>> +        if (l2tpv3_verify_header(s, vec->iov_base) == 0) {
>> +        //vec->iov_len = offset; /* reset size just in case*/
>> +        vec++;
>> +        qemu_send_packet(&s->nc, vec->iov_base, msgvec->msg_len -
>> offset);
>> +        } else {
>> +        fprintf(stderr, "l2tpv3 header verification failed\n");
>> +        vec->iov_len = offset;
>> +        vec++;
>> +        }
>> +        //vec->iov_len = PAGE_SIZE; /* reset for next read */
>
> Do not include commented-out code.
>
>> +    }
>> +    msgvec++;
>> +    }
>> +}
>> +
>> +static void net_l2tpv3_cleanup(NetClientState *nc)
>> +{
>> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
>> +    qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
>> +    close(s->fd);
>> +}
>> +
>> +static NetClientInfo net_l2tpv3_info = {
>> +    .type = NET_CLIENT_OPTIONS_KIND_L2TPV3,
>> +    .size = sizeof(NetL2TPV3State),
>> +    .receive = net_l2tpv3_receive_dgram,
>> +    .receive_iov = net_l2tpv3_receive_dgram_iov,
>> +    .cleanup = net_l2tpv3_cleanup,
>> +};
>> +
>> +static int net_l2tpv3_init(NetClientState *peer,
>> +                                 const char *name,
>> +                                 const char *src,
>> +                                 const char *dst,
>> +                                 const char *new_mode,
>> +                                 const char *tx_cookie,
>> +                                 const char *rx_cookie,
>> +                                 const char *tx_session,
>> +                                 const char *rx_session
>> +)
>
> This parenthesis goes on the same line as the last argument.
>
> Also, I suggest getting a NetdevL2TPv3Options* instead of eight
> different arguments.

1.0 did not have that form of args. This part dates back to then.

>
>> +{
>> +   NetL2TPV3State *s;
>> +   NetClientState *nc;
>> +
>> +   int fd, i;
>> +
>> +   int sock_family, sock_type, sock_proto;
>> +   unsigned int temp;
>> +
>> +   struct mmsghdr * msgvec;
>> +   struct iovec * iov;
>> +
>> +
>> +   struct sockaddr_storage LocalSock;
>> +
>> +
>> +   fprintf(stderr, "l2tpv3 user init mode %s\n", new_mode);
>> +   memset(&LocalSock, 0 , sizeof(LocalSock));
>> +
>> +   nc = qemu_new_net_client(&net_l2tpv3_info, peer, "l2tpv3", name);
>> +
>> +   s = DO_UPCAST(NetL2TPV3State, nc, nc);
>> +
>> +   s->header_buf = g_malloc(256);
>> +
>> +   if (s->header_buf == NULL) {
>> +    fprintf(stderr, "Failed to allocate header buffer \n");
>> +    return -1;
>> +   }
>> +   s->buf = g_malloc(PAGE_SIZE);
>> +
>> +   if (s->buf == NULL) {
>> +    fprintf(stderr, "Failed to allocate packet buffer \n");
>> +    return -1;
>> +   }
>> +   s->vec = g_malloc(sizeof(struct iovec) * MAX_L2TPV3_IOVCNT);;
>
> Why do you need separate mallocs for these?

You do not really need to use a separate malloc for TX and RX. You can
reuse the first element of the RX vector for TX.

Fair point.

>
>> +   if (s->vec == NULL) {
>> +    fprintf(stderr, "Failed to allocate packet vec \n");
>> +    return -1;
>> +   }
>> +
>> +
> Double blank lines.
>
>> +   s->offset = 4;
>> +   s->session_offset = 0;
>> +   s->cookie_offset = 4;
>> +   s->counter_offset = 4;
>> +   s->dst_size = sizeof(struct sockaddr_storage);
>> +
>> +   sscanf(new_mode, "%x", &s->new_mode);
>
> Please replace this with multiple boolean or integer options, QAPI
> makes it easy.

OK

>> +
>> +
>> +   fprintf(stderr, "type is: %s -> %x\n", new_mode, s->new_mode);
>
> No debugging output.

Guilty as charged :)

>
>> +   /* Cookies */
>> +
>> +  if (
>> +     (tx_session == NULL) ||
>> +     (sscanf(tx_session, "%x", &temp) != 1)
>> +      ) {
>> +     fprintf(stderr, "cannot parse tx_session!!!: %s\n", tx_session);
>> +     return -1;
>> +   } else {
>> +      s->tx_session = htonl(temp);
>> +      fprintf(stderr, "l2tpv3_open :  parsed tx session 32, %0x\n",
>> s->tx_session);
>> +   }
>
> As above, better force the user to prepend a "0x" and let QAPI do the
> parsing.
>
>> +  if (
>> +     (tx_session == NULL) ||
>> +     (sscanf(rx_session, "%x", &temp) != 1)
>> +      ) {
>> +     fprintf(stderr, "cannot parse rx_session!!!: %s\n", rx_session);
>> +     return -1;
>> +   }  else {
>> +      s->rx_session = htonl(temp);
>> +      fprintf(stderr, "l2tpv3_open :  parsed rx session 32, %0x\n",
>> s->rx_session);
>> +   }
>> +   if (s->new_mode & NEW_MODE_COOKIE) {
>> +      if (s->new_mode & NEW_MODE_COOKIE_SIZE) {
>
> Perhaps an optional integer argument like
> "cookie-size=32"/"cookie-size=64"?  Also, please mark
> tx-cookie/rx-cookie as optional and check that they are present if and
> only if cookie-size != 0.
>
> Alternatively, mark tx-cookie/rx-cookie as optional and do all the
> following:
>
> * check that either both are present or none is
>
> * check that the optional integer argument cookie-size, if present, is
> either 32 or 64
>
> * check that cookie-size is not present unless tx-cookie and rx-cookie
> are also present
>
> * if it makes sense, give a default value to cookie-size.  Otherwise,
> check that cookie-size is present if tx-cookie and rx-cookie are
>
>> +     /* 64 bit cookie */
>> +     s->offset += 8;
>> +     s->counter_offset += 8;
>> +     if (l2tpv3_parse_cookie64(tx_cookie,&s->tx_cookie) !=0) {
>> +        fprintf(stderr, "l2tpv3_open :  failed to parse tx cookie
>> 64\n");
>> +        return -1;
>> +     } else {
>> +        fprintf(stderr, "l2tpv3_open :  parsed tx cookie 64,
>> %0lx\n", s->tx_cookie);
>> +     }
>> +     if (l2tpv3_parse_cookie64(rx_cookie,&s->rx_cookie) !=0) {
>> +        fprintf(stderr, "l2tpv3_open :  failed to parse rx cookie
>> 64\n");
>> +        return -1;
>> +     } else {
>> +        fprintf(stderr, "l2tpv3_open :  parsed rx cookie 64,
>> %0lx\n", s->rx_cookie);
>> +     }
>> +      } else {
>> +     /* 32 bit cookie */
>> +     s->offset += 4;
>> +     s->counter_offset +=4;
>> +     s->tx_cookie = 0;
>> +     if (l2tpv3_parse_cookie32(tx_cookie,&s->tx_cookie) !=0) {
>> +        fprintf(stderr, "l2tpv3_open :  failed to parse tx cookie
>> 32\n");
>> +        return -1;
>> +     }
>> +     s->rx_cookie = 0;
>> +     if (l2tpv3_parse_cookie32(rx_cookie,&s->rx_cookie) !=0) {
>> +        fprintf(stderr, "l2tpv3_open :  failed to parse rx cookie
>> 32\n");
>> +        return -1;
>> +     }
>> +      }
>> +   }
>> +
>> +
>> +
>> +   if (dst && (strlen(dst) > 0 )) {
>> +      /* we now allocate it only if it we are not "listening" */
>> +      s->dgram_dst = malloc(sizeof(struct sockaddr_storage));
>> +      memset(s->dgram_dst, 0 , sizeof(struct sockaddr_storage));
>> +      fprintf(stderr, "l2tpv3_open : setting dst at init\n");
>> +   } else {
>> +      s->dgram_dst = NULL;
>> +   }
>> +
>> +   if (s->new_mode & NEW_MODE_IP_VERSION) {
>> +      /* IPv6 */
>> +      sock_family = AF_INET6;
>> +      if (parse_host_port6((struct sockaddr_in6 *) &LocalSock, src)
>> !=0) {
>
> Is the local address mandatory?

In L2TPv3 - yes. In fact so is the remote - our "listen mode" is a hack.

>
>> +     fprintf(stderr, "l2tpv3_open :  failed to parse v6 dst\n");
>> +     return -1;
>> +      };
>> +   } else {
>> +      /* IPv4 */
>> +      sock_family = AF_INET;
>> +      if (parse_host_port((struct sockaddr_in*) &LocalSock, src) !=
>> 0) {
>
> I would try using a connected socket?  This way you can use the
> socket_dgram function to create the UDP socket, resolve host names,
> use getaddrinfo properly, etc.

This will also ensure you cannot inject traffic - at least on Linux.
Good idea.

>
> For raw sockets, you can cut-and-paste relevant bits of socket_dgram
> into a new function that creates raw sockets:
>
>     int socket_raw(const char *host, const char *localaddr,
>                    int proto, bool ipv4, bool ipv6, Error **errp)
>     {
>         ...
>     }
>
> Errors from socket_dgram/socket_raw can be printed with the
> qerror_report_err function.
>
>> +     fprintf(stderr, "l2tpv3_open :  failed to parse v4 src\n");
>> +     return -1;
>> +      };
>> +   }
>
> The way this is usually done is using two optional booleans named ipv4
> and ipv6.  Also, please use separate options host, port, localaddr,
> localport for consistency with other places where we specify the
> address of a datagram socket.
>
> Add an optional argument instead of the NEW_MODE_UDP bit.  Examples:
>
> * an enum named "transport" and make it accept values "udp" or "ip"
>
> * a boolean named "udp"
>
> etc.  You can decide what the default is.  Make port optional (and
> localport too if making localaddr/localport mandatory is justified);
> check that it isn't specified for the raw protocol.
>
>> +   if (s->new_mode & NEW_MODE_UDP) {
>> +      sock_type = SOCK_DGRAM;
>> +      sock_proto = 0;
>> +      /* space for header */
>> +
>> +      s->offset += 4;
>> +      s->counter_offset += 4;
>> +      s->session_offset += 4;
>> +      s->cookie_offset += 4;
>> +    } else {
>> +      sock_type = SOCK_RAW;
>> +      sock_proto = 0x73;
>> +      if (s->new_mode & NEW_MODE_IP_VERSION) {
>> +     /* IPv6 */
>> +     ((struct sockaddr_in6 *) &LocalSock)->sin6_port = htons(0x73);
>> +      } else {
>> +     /* IPv4 */
>> +     ((struct sockaddr_in *) &LocalSock)->sin_port = htons(0x73);
>
> If you move socket creation to a function in util/qemu-sockets.c, like
> the socket_raw I suggest above, note that there is already an
> inet_setport in util/qemu-sockets.c.
>
>> +      }
>> +   }
>> +
>> +   if (!(s->new_mode & NEW_MODE_NO_COUNTER)) {
>
> Another optional boolean, "counter", defaulting (I guess) to 0.
>
>> +      s->offset += 4;
>> +   }
>> +   if ((fd = socket(sock_family, sock_type, sock_proto)) == -1) {
>> +      fd = -errno;
>> +      fprintf(stderr, "l2tpv3_open : socket creation failed, "
>> +           "errno = %d\n", -fd);
>> +      return fd;
>> +   }
>> +
>> +
>> +   if (bind(fd, (struct sockaddr *) &LocalSock, sizeof(LocalSock))) {
>> +      fprintf(stderr, "l2tpv3_open :  could not bind socket
>> err=%i\n", errno);
>> +      close(fd);
>> +      return -1;
>> +   } else {
>> +      fprintf(stderr, "l2tpv3_open : socket bound successfully\n");
>> +   }
>> +   if (s->dgram_dst) {
>> +      if (s->new_mode & NEW_MODE_IP_VERSION) {
>> +     /* IPv6 */
>> +     if (parse_host_port6((struct sockaddr_in6 *) s->dgram_dst, dst)
>> !=0) {
>> +        fprintf(stderr, "l2tpv3_open :  failed to parse v6 dst\n");
>> +        return -1;
>> +     };
>> +     s->dst_size = sizeof(struct sockaddr_in6);
>> +      } else {
>> +     /* IPv4 */
>> +     if (parse_host_port((struct sockaddr_in *) s->dgram_dst, dst)
>> != 0) {
>> +        fprintf(stderr, "l2tpv3_open :  failed to parse v4 dst\n");
>> +        return -1;
>> +     };
>> +     s->dst_size = sizeof(struct sockaddr_in);
>> +      }
>> +   }
>> +
>> +   s->msgvec = g_malloc(sizeof(struct mmsghdr) *
>> (MAX_L2TPV3_IOVCNT/2));
>> +   if (s->msgvec == NULL) {
>> +    fprintf(stderr, "Failed to allocate receive packet vec \n");
>> +    return -1;
>> +   }
>> +   msgvec = s->msgvec;
>
> More allocations that could be left in NetL2TPV3State.
>
>> +   for (i=0;i<MAX_L2TPV3_IOVCNT/2;i++) {
>> +    /* we need to allocate these only for the first time and first
>> buffer */
>> +    msgvec->msg_hdr.msg_name = NULL;
>> +    msgvec->msg_hdr.msg_namelen = 0;
>> +    iov =  g_malloc(sizeof(struct iovec) * 2);
>
> This could also be a multidimensional array.
>
>> +    if (iov == NULL) {
>> +        fprintf(stderr, "Failed to allocate receive packet vec \n");
>> +        return -1;
>> +    }
>> +    msgvec->msg_hdr.msg_iov = iov;
>> +    if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode &
>> NEW_MODE_UDP))){
>
> Dropping new_mode will also make this code much more readable:
>
>      "if (!s->ipv6 && !s->udp)"
>
>> +        iov->iov_base = g_malloc(s->offset + sizeof(struct iphdr));
>> /* fix for ipv4 raw */;
>
> Extra semicolon at the end of the line, also please try to be more
> verbose:
>
>     /* We need to allocate blah blah for IPv4 raw sockets, because blah
>      * blah.
>      */
>
>> +    } else {
>> +        iov->iov_base = g_malloc(s->offset);
>> +    }
>> +   
>> +    if (iov->iov_base == NULL) {
>> +        fprintf(stderr, "Failed to allocate receive packet vec \n");
>> +        return -1;
>> +    }
>
> g_malloc can never fail.
>
>> +    if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode &
>> NEW_MODE_UDP))){
>> +        iov->iov_len = s->offset + sizeof (struct iphdr);
>> +    } else {
>> +        iov->iov_len = s->offset;
>> +    }
>
> Better set iov_len first, and then refer to it in the allocation.
>
>> +    iov++ ;
>
> Just use iov[0] and iov[1].
>
>> +    iov->iov_base = qemu_memalign(PAGE_SIZE, PAGE_SIZE);
>> +    if (iov->iov_base == NULL) {
>> +        fprintf(stderr, "Failed to allocate receive packet vec \n");
>> +        return -1;
>> +    }
>> +    iov->iov_len = PAGE_SIZE;
>> +    msgvec->msg_hdr.msg_iovlen = 2;
>> +    msgvec->msg_hdr.msg_control = NULL;
>> +    msgvec->msg_hdr.msg_controllen = 0;
>> +    msgvec->msg_hdr.msg_flags = 0;
>> +    msgvec++;
>> +   }
>> +
>> +//   sendbuff = 90000;
>> +//
>> +//  printf("sets the send buffer to %d\n", sendbuff);
>> +// setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sendbuff, sizeof(sendbuff));
>
> No debug code.
>
>> +
>> +   qemu_set_nonblock(fd);
>
> Since the socket is nonblocking, MSG_DONTWAIT is unnecessary.
>
>> +
>> +   if (fd < 0)
>> +    return -1;
>
> You should have checked this much earlier, right after the socket
> function was created.
>
>> +   s->fd = fd;
>> +   s->counter = 0;
>> +
>> +   qemu_set_fd_handler(s->fd, net_l2tpv3_send, NULL, s);
>> +
>> +   if (!s) {
>> +      fprintf (stderr, "l2tpv3_open : failed to set fd handler\n");
>> +      return -1;
>> +   }
>
> Cannot happen, please delete.
>
>> +
>> +   /* this needs fixing too */
>> +
>> +   snprintf(s->nc.info_str, sizeof(s->nc.info_str),
>> +             "l2tpv3: connected");
>
> What needs fixing?
>
>> +   return 0;
>> +
>> +}
>> +
>> +int net_init_l2tpv3(const NetClientOptions *opts,
>> +                    const char *name,
>> +                    NetClientState *peer) {
>> +
>> +
>> +    const NetdevL2TPv3Options * l2tpv3;
>> +    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_L2TPV3);
>> +    l2tpv3 = opts->l2tpv3;
>> +
>> +
>> +  if (!(l2tpv3->has_src && l2tpv3->has_mode && l2tpv3->has_txsession
>> && l2tpv3->has_rxsession)) {
>> +      error_report("src=, dst=, txsession=, rxsession= and mode= are
>> required for l2tpv3");
>> +      return -1;
>> +  }
>> +
>> +
>> +  /* this needs cleaning up for new API */
>
> Then do it. :)
>
>> +  if (net_l2tpv3_init(
>> +     peer,
>> +         name,
>> +     l2tpv3->src,
>> +     l2tpv3->dst,
>> +     l2tpv3->mode,
>> +     l2tpv3->txcookie,
>> +     l2tpv3->rxcookie,
>> +     l2tpv3->txsession,
>> +     l2tpv3->rxsession
>> +
>> +   ) == -1) {
>> +      return -1;
>> +  }
>> +  return 0;
>> +}
>> diff --git a/net/l2tpv3.h b/net/l2tpv3.h
>> new file mode 100644
>> index 0000000..822e0b0
>> --- /dev/null
>> +++ b/net/l2tpv3.h
>> @@ -0,0 +1,65 @@
>> +/*
>> + * QEMU System Emulator
>> + *
>> + * Copyright (c) 2003-2008 Fabrice Bellard
>> + * Copyright (c) 2012 Cisco Systems
>> + *
>> + * Permission is hereby granted, free of charge, to any person
>> obtaining a copy
>> + * of this software and associated documentation files (the
>> "Software"), to deal
>> + * in the Software without restriction, including without limitation
>> the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense,
>> and/or sell
>> + * copies of the Software, and to permit persons to whom the
>> Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
>> OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "qemu-common.h"
>> +
>> +#ifndef QEMU_NET_L2TPV3_H
>> +#define QEMU_NET_L2TPV3_H
>> +
>> +#define L2TPV3_BUFSIZE 2048
>> +
>> +#define NEW_MODE_IP_VERSION   1          /* on for v6, off for v4 */
>> +#define NEW_MODE_UDP          2          /* on for udp, off for raw
>> ip */
>> +#define NEW_MODE_COOKIE          4          /* cookie present */
>> +#define NEW_MODE_COOKIE_SIZE  8          /* on for 64 bit */
>> +#define NEW_MODE_NO_COUNTER   16      /* DT - no counter */
>
> Please use bools in NetL2TPV3State
>
>> +/* legacy modes */
>> +
>> +/* mode 0 */
>> +
>> +#define LEGACY_UDP6_64_NO_COUNTER (NEW_MODE_IP_VERSION +
>> NEW_MODE_UDP + NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE +
>> NEW_MODE_NO_COUNTER)
>> +
>> +#define LEGACY_MODE0 LEGACY_UDP6_64_NO_COUNTER
>> +
>> +/* mode 1 */
>> +
>> +#define LEGACY_IP6_64_NO_COUNTER (NEW_MODE_IP_VERSION +
>> NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE + NEW_MODE_NO_COUNTER)
>> +
>> +#define LEGACY_MODE1 LEGACY_IP6_64_NO_COUNTER
>> +
>> +/* mode 2 */
>> +
>> +#define LEGACY_UDP4_64_COUNTER (NEW_MODE_COOKIE + NEW_MODE_UDP +
>> NEW_MODE_COOKIE_SIZE )
>> +
>> +#define LEGACY_MODE2 LEGACY_UDP4_64_COUNTER
>
> All these functions are not used.
>
>> +struct temphtonl {
>> +   uint32_t low;
>> +   uint32_t high;
>> +};
>
> Should not be necessary, but in any case it would belong in net/l2tpv3.c.
>
>> +
>> +#endif /* QEMU_NET_SOCKET_H */
>> diff --git a/net/net.c b/net/net.c
>> index 0a88e68..ba5f51b 100644
>> --- a/net/net.c
>> +++ b/net/net.c
>> @@ -132,6 +132,43 @@ int parse_host_port(struct sockaddr_in *saddr,
>> const char *str)
>>      return 0;
>>  }
>>
>> +int parse_host_port6(struct sockaddr_in6 *saddr, const char *str)
>> +{
>> +    char buf[512];
>> +    char *p, *ip, *port;
>> +
>> +    strncpy((char *) &buf, str, 511);
>> +    ip = (char *) &buf;
>> +    port = NULL;
>> +    int portint;
>> +
>> +
>> +    for (p = (char *) &buf; (p < (char *) &buf + strlen(str)) || (p
>> < (char *) &buf + 512); p++){
>> +      if (*p == '[') {
>> +     ip = p + 1;
>> +      }
>> +      if (*p == ']') {
>> +     *p = '\0';
>> +        if (*(p + 1) == ':') {
>> +        port = p + 2;
>> +    }
>> +    break;
>> +      }
>> +    }
>> +
>> +    saddr->sin6_family = AF_INET6;
>> +    if (!inet_pton(AF_INET6, ip, &saddr->sin6_addr)) {
>> +         return -1;
>> +    }
>> +    if (port) {
>> +      sscanf(port, "%i", &portint);
>> +      saddr->sin6_port = htons(portint);
>> +    }
>> +    return 0;
>> +}
>
> Not needed if you use qemu-sockets.c interfaces.
>
>> +
>> +
>> +
>>  void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6])
>>  {
>>      snprintf(nc->info_str, sizeof(nc->info_str),
>> @@ -731,6 +768,7 @@ static int (* const
>> net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
>>          [NET_CLIENT_OPTIONS_KIND_BRIDGE]    = net_init_bridge,
>>  #endif
>>          [NET_CLIENT_OPTIONS_KIND_HUBPORT]   = net_init_hubport,
>> +        [NET_CLIENT_OPTIONS_KIND_L2TPV3]       = net_init_l2tpv3,
>>  };
>>
>>
>> diff --git a/net/raw.c b/net/raw.c
>> new file mode 100644
>> index 0000000..04d244f
>> --- /dev/null
>> +++ b/net/raw.c
>
> Isn't this file unused?  It is not referred to by the makefiles.
>
>> @@ -0,0 +1,244 @@
>> +/*
>> + * QEMU System Emulator
>> + *
>> + * Copyright (c) 2003-2008 Fabrice Bellard
>> + * Copyright (c) 2014 Cisco Systems
>> + *
>> + * Permission is hereby granted, free of charge, to any person
>> obtaining a copy
>> + * of this software and associated documentation files (the
>> "Software"), to deal
>> + * in the Software without restriction, including without limitation
>> the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense,
>> and/or sell
>> + * copies of the Software, and to permit persons to whom the
>> Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
>> OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +#include <netinet/ether.h>
>> +#include <linux/if_ether.h>
>> +#include <linux/if_packet.h>
>> +#include <sys/mman.h>
>> +#include <sys/ioctl.h>
>> +#include <net/if.h>
>> +
>> +#include "net/raw.h"
>> +
>> +#include "config-host.h"
>> +#include "net/net.h"
>> +#include "clients.h"
>> +#include "monitor/monitor.h"
>> +#include "qemu-common.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/option.h"
>> +#include "qemu/sockets.h"
>> +#include "qemu/iov.h"
>> +#include "qemu/main-loop.h"
>> +
>> +
>> +#define RAW_BUFSIZE 2048
>> +
>> +
>> +typedef struct NetRAWState {
>> +    NetClientState nc;
>> +    int fd;
>> +    int state; /* 0 = getting length, 1 = getting data */
>> +    unsigned int ring_index;
>> +    unsigned int packet_len;
>> +    uint8_t * multiread_buffer;
>> +} NetRAWState;
>> +
>> +
>> +static ssize_t net_raw_receive_dgram(NetClientState *nc, const
>> uint8_t *buf, size_t size)
>> +{
>> +    NetRAWState *s = DO_UPCAST(NetRAWState, nc, nc);
>> +    return write(s->fd, buf, size);
>> +}
>> +
>> +static ssize_t net_raw_receive_dgram_iov(NetClientState *nc, const
>> struct iovec *iov, int iovcnt)
>> +{
>> +    NetRAWState *s = DO_UPCAST(NetRAWState, nc, nc);
>> +    return writev(s->fd, iov, iovcnt);
>> +}
>> +
>> +static inline struct tpacket_hdr * current_header (NetRAWState *s) {
>> +    uint8_t * buffer;
>> +    buffer = s->multiread_buffer + (s->ring_index * RAW_TP_FRAME_SIZE);
>> +    return (struct tpacket_hdr *) buffer;
>> +}
>> +
>> +
>> +static struct tpacket_hdr * raw_advance_ring(NetRAWState *s) {
>> +
>> +    struct tpacket_hdr * header = current_header(s);
>> +    header->tp_status = TP_STATUS_KERNEL; /* mark as free */
>> +    s->ring_index = (s->ring_index + 1) % RAW_TP_FRAME_NR;
>> +    return current_header(s);
>> +}
>> +
>> +static void net_raw_send(void *opaque)
>> +{
>> +    NetRAWState *s = opaque;
>> +    struct tpacket_hdr * header;
>> +
>> +    header = current_header(s);   
>> +    while ((header->tp_status & TP_STATUS_USER) > 0) {
>> +    qemu_send_packet(&s->nc, ((uint8_t *) header) + header->tp_mac,
>> header->tp_snaplen);
>> +    header = raw_advance_ring(s);
>> +    }
>> +}
>> +
>> +
>> +static void net_raw_cleanup(NetClientState *nc)
>> +{
>> +    NetRAWState *s = DO_UPCAST(NetRAWState, nc, nc);
>> +    qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
>> +    close(s->fd);
>> +}
>> +
>> +static NetClientInfo net_raw_info = {
>> +    .type = NET_CLIENT_OPTIONS_KIND_RAW,
>> +    .size = sizeof(NetRAWState),
>> +    .receive = net_raw_receive_dgram,
>> +    .receive_iov = net_raw_receive_dgram_iov,
>> +    .cleanup = net_raw_cleanup,
>> +};
>> +
>> +static int net_raw_init(NetClientState *peer,
>> +                                 const char *name,
>> +                                 const char *host_interface
>> +                 )
>> +{
>> +    NetRAWState *s;
>> +    NetClientState *nc;
>> +
>> +    int fd;
>> +
>> +
>> +
>> +    struct ifreq ifr;
>> +    struct sockaddr_ll sock;
>> +
>> +    int err;
>> +    struct tpacket_req tpacket;
>> +
>> +
>> +    nc = qemu_new_net_client(&net_raw_info, peer, "raw", name);
>> +
>> +    s = DO_UPCAST(NetRAWState, nc, nc);
>> +
>> +    s->ring_index = 0;
>> +   
>> +
>> +    if ((fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))) == -1) {
>> +    err = -errno;
>> +    fprintf(stderr, "raw_open : raw socket creation failed, errno =
>> %d\n", -err);
>> +    return err;
>> +    }
>> +
>> +    tpacket.tp_block_size = RAW_TP_BLOCK_SIZE;
>> +    tpacket.tp_frame_size = RAW_TP_FRAME_SIZE;
>> +    tpacket.tp_block_nr = RAW_TP_BLOCK_NR ;
>> +    tpacket.tp_frame_nr = RAW_TP_FRAME_NR;
>> +
>> +    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *)
>> &tpacket, sizeof(struct tpacket_req))) {
>> +    fprintf(stderr, "uml_raw: failed to request packet mmap");
>> +    return -errno;
>> +    } else {
>> +    fprintf(stderr, "uml_raw: requested packet mmap\n");
>> +    }
>> +
>> +    s->multiread_buffer = (uint8_t *) mmap(
>> +        NULL,
>> +        RAW_TP_FRAME_SIZE * RAW_TP_FRAME_NR,
>> +        PROT_READ | PROT_WRITE, MAP_SHARED,
>> +        fd,
>> +        0
>> +    );
>> +
>> +    if (!(s->multiread_buffer)) {
>> +    fprintf(stderr, "raw: failed to map buffer");
>> +    return -1;
>> +    } else {
>> +    fprintf(stderr, "raw: mmmap-ed buffer at %p\n",
>> s->multiread_buffer);
>> +    }
>> +
>> +
>> +
>> +
>> +    memset(&ifr, 0, sizeof(struct ifreq));
>> +    strncpy((char *) &ifr.ifr_name, host_interface,
>> sizeof(ifr.ifr_name) - 1);
>> +    if(ioctl(fd, SIOCGIFINDEX, (void *) &ifr) < 0) {
>> +    err = -errno;
>> +    fprintf(stderr, "SIOCGIFINDEX, failed to get raw interface index
>> for %s", host_interface);
>> +    close(fd);
>> +    return(-1);
>> +    } else {
>> +    }
>> +
>> +    sock.sll_family = AF_PACKET;
>> +    sock.sll_protocol = htons(ETH_P_ALL);
>> +    sock.sll_ifindex = ifr.ifr_ifindex;
>> +
>> +    fprintf(stderr, "raw: binding raw on interface index: %i\n",
>> ifr.ifr_ifindex);
>> +    if (bind(fd, (struct sockaddr *) &sock, sizeof(struct
>> sockaddr_ll)) < 0) {
>> +    fprintf(stderr, "raw: failed to bind raw socket");
>> +    close(fd);
>> +    return(-1);
>> +    }
>> +
>> +
>> +    qemu_set_nonblock(fd);
>> +
>> +    if (fd < 0)
>> +    return -1;
>> +
>> +    s->ring_index = 0;
>> +
>> +    s->fd = fd;
>> +
>> +    qemu_set_fd_handler(s->fd, net_raw_send, NULL, s);
>> +
>> +    if (!s) {
>> +      fprintf (stderr, "raw_open : failed to set fd handler\n");
>> +      return -1;
>> +    }
>> +
>> +/* this needs fixing too */
>> +
>> +    snprintf(s->nc.info_str, sizeof(s->nc.info_str), "raw: connected");
>> +    return 0;
>> +
>> +}
>> +
>> +int net_init_raw(const NetClientOptions *opts,
>> +                    const char *name,
>> +                    NetClientState *peer) {
>> +
>> +
>> +    const NetdevRawOptions * raw;
>> +    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_RAW);
>> +    raw = opts->raw;
>> +
>> +    if (raw->has_fd) {
>> +    error_report("passing socket fd not yet supported");
>> +    return -1 ;
>> +    }
>> +
>> +    if (!(raw->has_ifname)) {
>> +      error_report("iface required for raw");
>> +      return -1;
>> +    }
>> +
>> +    if (net_raw_init( peer, name, raw->ifname) == -1) {
>> +      return -1;
>> +    }
>> +    return 0;
>> +}
>> diff --git a/net/raw.h b/net/raw.h
>> new file mode 100644
>> index 0000000..80ab43e
>> --- /dev/null
>> +++ b/net/raw.h
>> @@ -0,0 +1,33 @@
>> +/*
>> + * QEMU System Emulator
>> + *
>> + * Copyright (c) 2003-2008 Fabrice Bellard
>> + * Copyright (c) 2014 Cisco Systems
>> + *
>> + * Permission is hereby granted, free of charge, to any person
>> obtaining a copy
>> + * of this software and associated documentation files (the
>> "Software"), to deal
>> + * in the Software without restriction, including without limitation
>> the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense,
>> and/or sell
>> + * copies of the Software, and to permit persons to whom the
>> Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
>> OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +#ifndef QEMU_NET_RAW_H
>> +#define QEMU_NET_RAW_H
>> +
>> +#define RAW_TP_BLOCK_SIZE 4096
>> +#define RAW_TP_FRAME_SIZE 2048
>> +#define RAW_TP_BLOCK_NR 32
>> +#define RAW_TP_FRAME_NR 64
>> +
>> +#endif /* QEMU_NET_RAW_H */
>
> Also unused.
>
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 83fa485..192d19c 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -2941,6 +2941,45 @@
>>      '*udp':       'str' } }
>>
>>  ##
>> +# @NetdevL2TPv3Options
>> +#
>> +# Connect the VLAN to Ethernet over L2TPv3 Static tunnel
>> +#
>> +# @fd: #optional file descriptor of an already opened socket
>> +#
>> +# @src: source address
>> +#
>> +# @dst: #optional destination address
>> +#
>> +# @mode: bitmask mode for the tunnel (see l2tpv3.h for actual
>> definition)
>> +#
>> +# @txcookie:#optional 32 or 64 bit tx cookie for the tunnel (set
>> mode for actual type selection)
>> +#
>> +# @rxcookie:#optional 32 or 64 bit rx cookie for the tunnel (set
>> mode for actual type selection)
>> +#
>> +# @txsession: tx 32 bit session
>> +#
>> +# @rxsession: rx 32 bit session
>> +#
>> +#
>> +# Since 1.0
>> +##
>> +##
>> +{ 'type': 'NetdevL2TPv3Options',
>> +  'data': {
>> +    '*fd':        'str',
>
> fd is not used anywhere, I think.  Using it to support a pre-connected
> socket, passed by some management layer, would be useful because QEMU
> does not ordinarily have privileges to create raw sockets.  But you
> can add this later.
>
>> +    '*src':      'str',
>> +    '*dst':      'str',
>> +    '*mode':      'str',
>> +    '*txcookie':  'str',
>> +    '*rxcookie':  'str',
>> +    '*txsession': 'str',
>> +    '*rxsession': 'str'
>> +
>> +} }
>> +
>> +##
>> +##
>>  # @NetdevVdeOptions
>>  #
>>  # Connect the VLAN to a vde switch running on the host.
>> @@ -3021,6 +3060,7 @@
>>      'nic':      'NetLegacyNicOptions',
>>      'user':     'NetdevUserOptions',
>>      'tap':      'NetdevTapOptions',
>> +    'l2tpv3':   'NetdevL2TPv3Options',
>>      'socket':   'NetdevSocketOptions',
>>      'vde':      'NetdevVdeOptions',
>>      'dump':     'NetdevDumpOptions',
>>
>
> Regards,
>
> Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28 11:17   ` Anton Ivanov (antivano)
@ 2014-02-28 11:36     ` Paolo Bonzini
  2014-02-28 12:59       ` Anton Ivanov (antivano)
  0 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2014-02-28 11:36 UTC (permalink / raw)
  To: Anton Ivanov (antivano); +Cc: Qemu-devel, Stefan Hajnoczi

Il 28/02/2014 12:17, Anton Ivanov (antivano) ha scritto:
>> > As mentioned below, I suggest storing the cookies and session ids in
>> > host order in NetL2TPV3State, and doing the conversion in
>> > l2tpv3_form_header and friends.
> I can fix it. I prefer to keep all params in "ready to use" form so that
> no cycles are wasted on conversion in the portions which may affect
> performance.
>

This is just one instruction (bswap) or zero on some hardware (PPC with 
has lwbrz, Haswell which has movbe), no reason to worry about it.  It 
makes the code simpler by making all accesses use stX_be_p.

>> Space before the opening brace, and parentheses around !(a & b) are
>> unnecessary.  More instances in the rest of the file.
>
> Bad habits die hard. After being burned by a couple of buggy borland
> compilers 20 years ago I brace everything to the hilt. You have a point
> though.

We also brace everything, but we do not parenthesize everything. :)

>> Why do you need separate mallocs for these?
>
> You do not really need to use a separate malloc for TX and RX. You can
> reuse the first element of the RX vector for TX.
>
> Fair point.

No, I mean: why not just use arrays in NetL2TPV3State?  All of them are 
sized statically.  Avoiding pointer chasing also improves performance. 
:)  It also avoids memory leaks; I just noticed that you're not freeing 
the memory you allocate in net_l2tpv3_init (I checked s->header-buf).

>> Is the local address mandatory?
>
> In L2TPv3 - yes.

Ok.

> In fact so is the remote - our "listen mode" is a hack.

The listen mode is not implemented in this patch, is it?

Thanks for the prompt reply.

Note that I posted two small fixes to qemu-sockets.c.  You may want to 
include them.

Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28 11:36     ` Paolo Bonzini
@ 2014-02-28 12:59       ` Anton Ivanov (antivano)
  0 siblings, 0 replies; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-02-28 12:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Qemu-devel, Stefan Hajnoczi

On 28/02/14 11:36, Paolo Bonzini wrote:
> Il 28/02/2014 12:17, Anton Ivanov (antivano) ha scritto:
>>> > As mentioned below, I suggest storing the cookies and session ids in
>>> > host order in NetL2TPV3State, and doing the conversion in
>>> > l2tpv3_form_header and friends.
>> I can fix it. I prefer to keep all params in "ready to use" form so that
>> no cycles are wasted on conversion in the portions which may affect
>> performance.
>>
>
> This is just one instruction (bswap) or zero on some hardware (PPC
> with has lwbrz, Haswell which has movbe), no reason to worry about
> it.  It makes the code simpler by making all accesses use stX_be_p.

OK.

>
>>> Space before the opening brace, and parentheses around !(a & b) are
>>> unnecessary.  More instances in the rest of the file.
>>
>> Bad habits die hard. After being burned by a couple of buggy borland
>> compilers 20 years ago I brace everything to the hilt. You have a point
>> though.
>
> We also brace everything, but we do not parenthesize everything. :)
>
>>> Why do you need separate mallocs for these?
>>
>> You do not really need to use a separate malloc for TX and RX. You can
>> reuse the first element of the RX vector for TX.
>>
>> Fair point.
>
> No, I mean: why not just use arrays in NetL2TPV3State?  All of them
> are sized statically.  Avoiding pointer chasing also improves
> performance. :)  It also avoids memory leaks; I just noticed that
> you're not freeing the memory you allocate in net_l2tpv3_init (I
> checked s->header-buf).

Correct, I have not updated the cleanup procedures to deal with freeing
all the vector bits, it will be in the updated version.

>
>>> Is the local address mandatory?
>>
>> In L2TPv3 - yes.
>
> Ok.
>
>> In fact so is the remote - our "listen mode" is a hack.
>
> The listen mode is not implemented in this patch, is it?

Not yet. I had to remove it to add recvmmsg, I was going to add it.
However it is not mandatory by any means. I am happy for this to be
without it.


>
> Thanks for the prompt reply.
>
> Note that I posted two small fixes to qemu-sockets.c.  You may want to
> include them.

OK.

>
> Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28  8:28 [Qemu-devel] Contribution - L2TPv3 transport Anton Ivanov (antivano)
  2014-02-28 10:02 ` Paolo Bonzini
@ 2014-02-28 13:40 ` Eric Blake
  2014-02-28 13:52   ` Anton Ivanov (antivano)
  2014-02-28 14:00   ` Paolo Bonzini
  2014-03-03 13:27 ` Stefan Hajnoczi
  2014-03-03 14:53 ` Stefan Hajnoczi
  3 siblings, 2 replies; 36+ messages in thread
From: Eric Blake @ 2014-02-28 13:40 UTC (permalink / raw)
  To: Anton Ivanov (antivano), Qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3807 bytes --]

On 02/28/2014 01:28 AM, Anton Ivanov (antivano) wrote:
> Hi all,
> 
> On behalf of Cisco Systems I am authorized to contribute a new transport
> to the network subsystem in qemu.
> 

> 
> Patch attached.

Your patch is huge - it might be better to break it into smaller logical
chunks to make it easier to review.  See this page for more hints:
http://wiki.qemu.org/Contribute/SubmitAPatch

Your patch is lacking a diffstat, and was sent as an attachment rather
than inline.  This makes it harder to review.

I'm going to focus my review on just the qapi interface for now.

> +++ b/net/l2tpv3.c
> @@ -0,0 +1,630 @@
> +/*
> + * QEMU System Emulator
> + *
> + * Copyright (c) 2003-2008 Fabrice Bellard
> + * Copyright (c) 2012 Cisco Systems

It's now 2014.


> +++ b/qapi-schema.json
> @@ -2941,6 +2941,45 @@
>      '*udp':       'str' } }
>  
>  ##
> +# @NetdevL2TPv3Options
> +#
> +# Connect the VLAN to Ethernet over L2TPv3 Static tunnel
> +#
> +# @fd: #optional file descriptor of an already opened socket

Other commands name this parameter 'fdname' (see 'add_client'); it takes
an fd named via the 'getfd' command.  However, this style is old, and
new code should instead prefer file names rather than fd names.  That
is, we want the newer style of using 'add-fd', then referring to the
file by name (/dev/fdset/nnn), and not the older style of 'getfd'; using
a file name also gives you the flexibility of using actual Unix socket
files stored in the file system.

> +#
> +# @src: source address
> +#
> +# @dst: #optional destination address
> +#
> +# @mode: bitmask mode for the tunnel (see l2tpv3.h for actual definition)

Ewww.  This is is gross.  This API should be self-documented here,
without making the end-reader chase down a source file.  And passing raw
numbers for bit ids is not user-friendly.  Better would be making this
parameter be an enum (if you can describe all modes via a single name,
as in 'mode':'cookie') or a series of bool arguments (perhaps optional
with sane defaults, as in 'udp':true,'cookie':false).

> +#
> +# @txcookie:#optional 32 or 64 bit tx cookie for the tunnel (set mode for actual type selection)
> +#
> +# @rxcookie:#optional 32 or 64 bit rx cookie for the tunnel (set mode for actual type selection)
> +#
> +# @txsession: tx 32 bit session
> +#
> +# @rxsession: rx 32 bit session
> +#
> +#
> +# Since 1.0

s/1.0/2.0/

> +##
> +##
> +{ 'type': 'NetdevL2TPv3Options',
> +  'data': {
> +    '*fd':        'str',
> +    '*src':	  'str', 

You didn't list 'src' as optional above.

Trailing whitespace - run your submission through checkpatch.pl.

> +    '*dst':	  'str', 
> +    '*mode':	  'str', 

As mentioned above, 'mode' should not be a string.

> +    '*txcookie':  'str',
> +    '*rxcookie':  'str',
> +    '*txsession': 'str',
> +    '*rxsession': 'str'

These should probably be 'int', not 'str'.  If you have to hand-parse a
string into a numeric value, you encoded the QMP wrong.

> +
> +} }
> +
> +##
> +##
>  # @NetdevVdeOptions
>  #
>  # Connect the VLAN to a vde switch running on the host.
> @@ -3021,6 +3060,7 @@
>      'nic':      'NetLegacyNicOptions',
>      'user':     'NetdevUserOptions',
>      'tap':      'NetdevTapOptions',
> +    'l2tpv3':   'NetdevL2TPv3Options',

Please add documentation that the l2tpv3 arm of this union was added in
2.0 (yes, I know we haven't been very good at documenting when unions
were extended, but it can't hurt to do a better job going forward).

>      'socket':   'NetdevSocketOptions',
>      'vde':      'NetdevVdeOptions',
>      'dump':     'NetdevDumpOptions',
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28 13:40 ` Eric Blake
@ 2014-02-28 13:52   ` Anton Ivanov (antivano)
  2014-02-28 13:57     ` Eric Blake
  2014-02-28 14:00   ` Paolo Bonzini
  1 sibling, 1 reply; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-02-28 13:52 UTC (permalink / raw)
  To: Eric Blake; +Cc: Qemu-devel

On 28/02/14 13:40, Eric Blake wrote:
> On 02/28/2014 01:28 AM, Anton Ivanov (antivano) wrote:
>> Hi all,
>>
>> On behalf of Cisco Systems I am authorized to contribute a new transport
>> to the network subsystem in qemu.
>>
>> Patch attached.
> Your patch is huge - it might be better to break it into smaller logical
> chunks to make it easier to review.  See this page for more hints:
> http://wiki.qemu.org/Contribute/SubmitAPatch
>
> Your patch is lacking a diffstat, and was sent as an attachment rather
> than inline.  This makes it harder to review.
>
> I'm going to focus my review on just the qapi interface for now.
>
>> +++ b/net/l2tpv3.c
>> @@ -0,0 +1,630 @@
>> +/*
>> + * QEMU System Emulator
>> + *
>> + * Copyright (c) 2003-2008 Fabrice Bellard
>> + * Copyright (c) 2012 Cisco Systems
> It's now 2014.


Fair point, will fix.
>
>
>> +++ b/qapi-schema.json
>> @@ -2941,6 +2941,45 @@
>>      '*udp':       'str' } }
>>  
>>  ##
>> +# @NetdevL2TPv3Options
>> +#
>> +# Connect the VLAN to Ethernet over L2TPv3 Static tunnel
>> +#
>> +# @fd: #optional file descriptor of an already opened socket
> Other commands name this parameter 'fdname' (see 'add_client'); it takes
> an fd named via the 'getfd' command.  However, this style is old, and
> new code should instead prefer file names rather than fd names.  That
> is, we want the newer style of using 'add-fd', then referring to the
> file by name (/dev/fdset/nnn), and not the older style of 'getfd'; using
> a file name also gives you the flexibility of using actual Unix socket
> files stored in the file system.
>
>> +#
>> +# @src: source address
>> +#
>> +# @dst: #optional destination address
>> +#
>> +# @mode: bitmask mode for the tunnel (see l2tpv3.h for actual definition)
> Ewww.  This is is gross.  This API should be self-documented here,
> without making the end-reader chase down a source file.  And passing raw
> numbers for bit ids is not user-friendly.  Better would be making this
> parameter be an enum (if you can describe all modes via a single name,
> as in 'mode':'cookie') or a series of bool arguments (perhaps optional
> with sane defaults, as in 'udp':true,'cookie':false).

I discussed this with Paolo, a fixed version compliant to coding style
guidelines and with a user friendly arg list will be made available in a
week or two.
>
>> +#
>> +# @txcookie:#optional 32 or 64 bit tx cookie for the tunnel (set mode for actual type selection)
>> +#
>> +# @rxcookie:#optional 32 or 64 bit rx cookie for the tunnel (set mode for actual type selection)
>> +#
>> +# @txsession: tx 32 bit session
>> +#
>> +# @rxsession: rx 32 bit session
>> +#
>> +#
>> +# Since 1.0
> s/1.0/2.0/

OK - just to clarify, which version is this referring to - qemu, api, etc?

>
>> +##
>> +##
>> +{ 'type': 'NetdevL2TPv3Options',
>> +  'data': {
>> +    '*fd':        'str',
>> +    '*src':	  'str', 
> You didn't list 'src' as optional above.

That is intended. It is not optional for the ip command in Linux either.
For L2TPv3 tunnels you have to specify the source address.

>
> Trailing whitespace - run your submission through checkpatch.pl.
>
>> +    '*dst':	  'str', 
>> +    '*mode':	  'str', 
> As mentioned above, 'mode' should not be a string.
>
>> +    '*txcookie':  'str',
>> +    '*rxcookie':  'str',
>> +    '*txsession': 'str',
>> +    '*rxsession': 'str'
> These should probably be 'int', not 'str'.  If you have to hand-parse a
> string into a numeric value, you encoded the QMP wrong.

These by spec are either 32 bit or 64 bit integers. I will fix them
accordingly.

>
>> +
>> +} }
>> +
>> +##
>> +##
>>  # @NetdevVdeOptions
>>  #
>>  # Connect the VLAN to a vde switch running on the host.
>> @@ -3021,6 +3060,7 @@
>>      'nic':      'NetLegacyNicOptions',
>>      'user':     'NetdevUserOptions',
>>      'tap':      'NetdevTapOptions',
>> +    'l2tpv3':   'NetdevL2TPv3Options',
> Please add documentation that the l2tpv3 arm of this union was added in
> 2.0 (yes, I know we haven't been very good at documenting when unions
> were extended, but it can't hurt to do a better job going forward).
>
>>      'socket':   'NetdevSocketOptions',
>>      'vde':      'NetdevVdeOptions',
>>      'dump':     'NetdevDumpOptions',
>>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28 10:02 ` Paolo Bonzini
  2014-02-28 11:17   ` Anton Ivanov (antivano)
@ 2014-02-28 13:55   ` Anton Ivanov (antivano)
  2014-03-04 15:19   ` Anton Ivanov (antivano)
  2 siblings, 0 replies; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-02-28 13:55 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Qemu-devel, Stefan Hajnoczi

I now remember why I am not using connected sockets.

connect() in UDP used to break source address bind(). I had a bug filed
vs that as far back as early 2.6 which everyone ignored for ages. It was
showing as bug vs hpa-tftpd.

I need to reverify if this is still the case as it has to keep a
particular source address. I will do so before reissuing the patch.

A.


On 28/02/14 10:02, Paolo Bonzini wrote:
>> This transport uses a linux specific call to get several GBit RX rate.
>
> Might as well mention that it's recvmmsg. :)
>
>> This call can be wrapped (at some cost) in a compatibility loop using
>> posix compliant recvmsg instead for other systems. As I am not familiar
>> with the fine details on how to add linux specific, windows specific,
>> etc bits to qemu I have decided to leave that bit out for the time being
>> and submit it "as we use it".
>
> You can add the compatibility structures in include/qemu/recvmmsg.h
> and util/recvmmsg.c, and only compile recvmmsg.c on non-Linux system
> with something like
>
>     util-obj-$(call lnot, CONFIG_LINUX) += recvmmsg.o
>
> For now, you can just make l2tpv3.c Linux-specific.
>
>> Patch attached.
>
> Thanks for the contribution!  I have quite a few comments.  The
> largest part of the work will be to cleanup the user interface.
>
> Also, I suggest using connected sockets instead of connectionless, but
> that should be comparatively less work.
>
> You also need to document the new options in qemu-options.hx.
>
> Please don't be turned off by the amount of comments.  This is quite
> normal.  I'm CCing the maintainer of the network layer, Stefan
> Hajnoczi, who can help you more with this task.
>
>> Best Regards,
>>
>> Anton
>>
>>
>> l2tpv3.diff
>>
>> diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
>> index 45588d7..865f1c2 100644
>> --- a/include/qemu/sockets.h
>> +++ b/include/qemu/sockets.h
>> @@ -79,6 +79,7 @@ int socket_dgram(SocketAddress *remote,
>> SocketAddress *local, Error **errp);
>>
>>  /* Old, ipv4 only bits.  Don't use for new code. */
>>  int parse_host_port(struct sockaddr_in *saddr, const char *str);
>> +int parse_host_port6(struct sockaddr_in6 *saddr, const char *str);
>>  int socket_init(void);
>>
>>  #endif /* QEMU_SOCKET_H */
>> diff --git a/net/Makefile.objs b/net/Makefile.objs
>> index 4854a14..364c785 100644
>> --- a/net/Makefile.objs
>> +++ b/net/Makefile.objs
>> @@ -2,6 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
>>  common-obj-y += socket.o
>>  common-obj-y += dump.o
>>  common-obj-y += eth.o
>> +common-obj-y += l2tpv3.o
>
> This should be CONFIG_LINUX or (better, if you implement a
> compatibility layer for recvmmsg) CONFIG_POSIX, because Windows does
> not have sendmsg/recvmsg.
>
> QEMU already has some compatibility code for sendmsg/recvmsg
> (iov_send/iov_recv) but it doesn't let you specify a destination, so
> for now we can make it POSIX only.
>
>>  common-obj-$(CONFIG_POSIX) += tap.o
>>  common-obj-$(CONFIG_LINUX) += tap-linux.o
>>  common-obj-$(CONFIG_WIN32) += tap-win32.o
>> diff --git a/net/clients.h b/net/clients.h
>> index 7793294..ea2a831 100644
>> --- a/net/clients.h
>> +++ b/net/clients.h
>> @@ -47,6 +47,11 @@ int net_init_tap(const NetClientOptions *opts,
>> const char *name,
>>  int net_init_bridge(const NetClientOptions *opts, const char *name,
>>                      NetClientState *peer);
>>
>> +int net_init_raw(const NetClientOptions *opts, const char *name,
>> +                    NetClientState *peer);
>> +
>> +int net_init_l2tpv3(const NetClientOptions *opts, const char *name,
>> +                    NetClientState *peer);
>>  #ifdef CONFIG_VDE
>>  int net_init_vde(const NetClientOptions *opts, const char *name,
>>                   NetClientState *peer);
>> diff --git a/net/l2tpv3.c b/net/l2tpv3.c
>> new file mode 100644
>> index 0000000..4b5d8ee
>> --- /dev/null
>> +++ b/net/l2tpv3.c
>> @@ -0,0 +1,630 @@
>> +/*
>> + * QEMU System Emulator
>> + *
>> + * Copyright (c) 2003-2008 Fabrice Bellard
>> + * Copyright (c) 2012 Cisco Systems
>> + *
>> + * Permission is hereby granted, free of charge, to any person
>> obtaining a copy
>> + * of this software and associated documentation files (the
>> "Software"), to deal
>> + * in the Software without restriction, including without limitation
>> the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense,
>> and/or sell
>> + * copies of the Software, and to permit persons to whom the
>> Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
>> OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "net/l2tpv3.h"
>> +#include "config-host.h"
>> +#include "config-host.h"
>> +#include "net/net.h"
>> +#include "clients.h"
>> +#include "monitor/monitor.h"
>> +#include "qemu-common.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/option.h"
>> +#include "qemu/sockets.h"
>> +#include "qemu/iov.h"
>> +#include "qemu/main-loop.h"
>> +#include <linux/ip.h>
>> +
>> +
>> +
>> +
>> +#define PAGE_SIZE 4096
>> +#define MAX_L2TPV3_IOVCNT 64
>> +
>> +typedef struct NetL2TPV3State {
>> +    NetClientState nc;
>> +    int fd;
>> +    int state; /* 0 = getting length, 1 = getting data */
>> +    unsigned int index;
>> +    unsigned int packet_len;
>> +    /*
>> +    these are used for xmit - that happens packet a time
>> +    and for first sign of life packet (easier to parse that once)
>> +
>> +    */
>
> Hi Anton,
>
> thanks for the contribution!
>
> The new files do not obey the QEMU coding standards. We use comments like
>
>     /* These are used for xmit - that happens packet a time
>      * and for first sign of life packet (easier to parse that once).
>      */
>
> and no C++ comments.
>
>> +    uint8_t * header_buf;
>> +    uint8_t * buf;
>> +    struct iovec * vec;
>> +    /*
>> +    these are used for receive - try to "eat" up to 32 packets at a
>> time
>> +    */
>> +    struct mmsghdr * msgvec;
>> +    /*
>> +      contains inet host and port destination if
>> +      connectionless (SOCK_DGRAM). Will be NULL if
>> +      in listen mode waiting for first connect
>> +
>> +    */
>> +    struct sockaddr_storage * dgram_dst;
>
> I cannot see any code that handles "listening" mode, so please drop
> all of it and assume dgram_dst is non-NULL throughout the code.
>
> Even if we were to add a "listening" mode later, I'd prefer a boolean
> value ("bool connected;") and not making dgram_dst a pointer.
>
>> +    uint64_t rx_cookie;
>> +    uint64_t tx_cookie;
>> +    uint32_t rx_session;
>> +    uint32_t tx_session;
>> +    uint32_t header_size;
>> +    uint32_t counter;
>> +    uint32_t dst_size;
>> +    uint32_t new_mode;
>> +
>> +    /* offsets */
>> +
>> +    uint32_t offset;   /* main offset == header offset */
>> +    uint32_t cookie_offset;
>> +    uint32_t counter_offset;
>> +    uint32_t session_offset;
>> +
>
> Useless blank line.
>
>> +} NetL2TPV3State;
>> +
>> +typedef struct NetL2TPV3ListenState {
>> +    NetClientState nc;
>> +    char *model;
>> +    char *name;
>> +    int fd;
>> +} NetL2TPV3ListenState;
>> +
>> +static int l2tpv3_parse_cookie32(const char *src , void * dst) {
>> +
>> +  if (
>> +     (src == NULL) ||
>> +     (sscanf(src, "%x", (unsigned int *) dst) != 1)
>
> Is it really so bad that the user has to prepend "0x" to the cookie?
> Then you can just use the same parsing that is used elsewhere: declare
> the fields as 'int64' in qapi-schema.json and QEMU will do everything
> for you.
>
>> +      ) {
>> +     fprintf(stderr, "cannot parse cookie!!!: %s\n", src);
>> +     return -1;
>> +   }
>> +
>> +   return 0;
>> +}
>> +
>> +static int l2tpv3_parse_cookie64(const char *src , void * dst) {
>> +
>> +   struct temphtonl temph;
>> +   uint32_t temp;
>> +   const int num = 42;
>> +    if (
>> +     (src == NULL) ||
>> +     (sscanf(src, "%lx", (long unsigned int *) &temph) != 1)
>> +      ) {
>> +     fprintf(stderr, "cannot parse cookie!!!: %s\n", src);
>> +     return -1;
>> +   }
>> +
>> +   if(*(char *)&num == 42) {
>> +      // why oh why there is no htonll
>> +      temp = htonl(temph.high);
>> +      temph.high = htonl(temph.low);
>> +      temph.low = temp;
>> +   } else {
>> +      temph.low = htonl(temph.low);
>> +      temph.high = htonl(temph.high);
>> +   }   
>
> There is no htonll, but QEMU has stq_be_p.  You can use that instead
> of this hack.
>
> BTW, the indentation is off in most of the file.
>
> As mentioned below, I suggest storing the cookies and session ids in
> host order in NetL2TPV3State, and doing the conversion in
> l2tpv3_form_header and friends.
>
>> +   memcpy(dst, &temph, sizeof (uint64_t));
>> +   return 0;
>> +}
>> +
>> +
>> +static int l2tpv3_form_header(NetL2TPV3State *s) {
>> +
>> +   uint32_t *header;
>> +   uint32_t *session;
>> +   uint64_t *cookie64;
>> +   uint32_t *cookie32;
>> +   uint32_t *counter;
>> +
>> +   if (s->new_mode & NEW_MODE_UDP) {
>> +     header = (uint32_t *) s->header_buf;
>> +     * header = htonl(0x30000);
>
> stl_be_p
>
>> +   }
>> +   session = (uint32_t *) (s->header_buf + s->session_offset);
>> +   * session = s->tx_session;
>
> As mentioned above, better store s->tx_session in host order, and use
> stl_be_p.
>
>> +
>> +   if (s->new_mode & NEW_MODE_COOKIE) {
>> +      if (s->new_mode & NEW_MODE_COOKIE_SIZE) {
>> +     cookie64 = (uint64_t *)(s->header_buf + s->cookie_offset);
>> +     * cookie64 = s->tx_cookie;
>> +      } else {
>> +     cookie32 = (uint32_t *) (s->header_buf + s->cookie_offset);
>> +     * cookie32 = * ((uint32_t *) s->tx_cookie);
>> +      }
>> +   }
>> +
>> +   if (!(s->new_mode & NEW_MODE_NO_COUNTER)) {
>> +      counter = (uint32_t *)(s->header_buf + s->counter_offset);
>> +      * counter = htonl(++ s->counter);
>> +   }
>> +
>> +   return 0;
>> +}
>> +
>> +
>> +static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc,
>> const struct iovec *iov, int iovcnt)
>> +{
>> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
>> +
>> +
>
> Double blank line.
>
>> +    struct msghdr message;
>> +    int ret;
>
> Might as well add a blank line here. :)
>
>> +    if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
>> +    fprintf(stderr, "iovec too long %d > %d, change l2tpv3.h\n",
>> iovcnt, MAX_L2TPV3_IOVCNT);
>> +    return -1;
>> +    }
>> +    l2tpv3_form_header(s);
>> +    memcpy(s->vec + 1, iov, iovcnt * sizeof(struct iovec));
>> +    s->vec->iov_base = s->header_buf;
>> +    s->vec->iov_len = s->offset;
>> +    if (s->dgram_dst) {
>> +       message.msg_name = s->dgram_dst;
>> +       message.msg_namelen = s->dst_size;
>> +       message.msg_iov = (struct iovec *) s->vec;
>> +       message.msg_iovlen = iovcnt + 1;
>> +       message.msg_control = NULL;
>> +       message.msg_controllen = 0;
>> +       message.msg_flags = 0;
>> +       ret = sendmsg(s->fd, &message, MSG_DONTWAIT) - s->offset;
>
> You can use iov_send
>
>> +       if (ret == 0) {
>> +        ret = iov_size (iov, iovcnt);
>> +       }
>> +   } else {
>> +      fprintf(stderr, "no dst yet\n");
>> +      ret = iov_size (iov, iovcnt); /* silently drop, otherwise it
>> is being stupid */
>> +   }
>> +    return ret;
>> +}
>> +
>> +
>> +static ssize_t net_l2tpv3_receive_dgram(NetClientState *nc, const
>> uint8_t *buf, size_t size)
>> +{
>> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
>> +
>> +    struct iovec * vec;
>> +
>> +    struct msghdr message;
>> +    l2tpv3_form_header(s);
>> +    vec = s->vec;
>> +    vec->iov_base = s->header_buf;
>> +    vec->iov_len = s->offset;
>> +    vec++;
>> +    vec->iov_base = (void *) buf;
>> +    vec->iov_len = size;
>> +
>> +    if (s->dgram_dst) {
>> +       message.msg_name = s->dgram_dst;
>> +       message.msg_namelen = s->dst_size;
>> +       message.msg_iov = (struct iovec *) s->vec;
>> +       message.msg_iovlen = 2;
>> +       message.msg_control = NULL;
>> +       message.msg_controllen = 0;
>> +       message.msg_flags = 0;
>> +       size = sendmsg(s->fd, &message, 0) - s->offset;
>> +   } else {
>> +      fprintf(stderr, "no dst yet\n");
>> +   }
>> +   return size; /* silently drop, otherwise it is being stupid */
>> +}
>> +
>> +static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf) {
>> +   uint64_t *cookie64;
>> +   uint32_t *cookie32;
>> +   uint32_t *session;
>> +
>> +
>> +    if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode &
>> NEW_MODE_UDP))){
>> +    buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
>> +    }
>> +
>> +
>
> Double blank line, and incorrect indentation.
>
>> +   if (s->new_mode & NEW_MODE_COOKIE) {
>> +      if (s->new_mode & NEW_MODE_COOKIE_SIZE) {
>> +     /* 64 bit cookie */
>> +     cookie64 = (uint64_t *)(buf + s->cookie_offset);
>> +     if (*cookie64 != s->rx_cookie) {
>> +        fprintf(stderr, "unknown cookie id\n");
>> +        return -1; /* we need to return 0, otherwise barfus */
>> +     }
>> +      } else {
>> +     cookie32 = (uint32_t *)(buf + s->cookie_offset);
>> +     if (*cookie32 != * (uint32_t *) &s->rx_cookie) {
>> +        fprintf(stderr,"unknown cookie id\n");
>> +        return -1 ; /* we need to return 0, otherwise barfus */
>> +     }
>> +      }
>> +   }
>> +   session = (uint32_t *) (buf + s->session_offset);
>> +   if (* session != s->rx_session) {
>> +      fprintf(stderr,"session mismatch\n");
>> +      return -1;
>> +   }   
>> +   return 0;
>> +}
>> +
>> +static void net_l2tpv3_send(void *opaque)
>> +{
>> +    NetL2TPV3State *s = opaque;
>> +
>> +    int i, count, offset;
>> +
>> +    struct mmsghdr * msgvec;
>> +    struct iovec * vec;
>> +
>> +
>> +    msgvec = s->msgvec;
>> +    offset = s->offset;
>> +    if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode &
>> NEW_MODE_UDP))){
>
> Space before the opening brace, and parentheses around !(a & b) are
> unnecessary.  More instances in the rest of the file.
>
>> +    offset +=   sizeof(struct iphdr);
>> +    }
>> +    count = recvmmsg(s->fd, msgvec, MAX_L2TPV3_IOVCNT/2,
>> MSG_DONTWAIT, NULL);
>> +    for (i=0;i<count;i++) {
>> +    if (msgvec->msg_len > 0) {
>
> More instances of inconsistent indentation (both within the file, and
> with the rest of QEMU).
>
>> +        vec = msgvec->msg_hdr.msg_iov;
>> +        if (l2tpv3_verify_header(s, vec->iov_base) == 0) {
>> +        //vec->iov_len = offset; /* reset size just in case*/
>> +        vec++;
>> +        qemu_send_packet(&s->nc, vec->iov_base, msgvec->msg_len -
>> offset);
>> +        } else {
>> +        fprintf(stderr, "l2tpv3 header verification failed\n");
>> +        vec->iov_len = offset;
>> +        vec++;
>> +        }
>> +        //vec->iov_len = PAGE_SIZE; /* reset for next read */
>
> Do not include commented-out code.
>
>> +    }
>> +    msgvec++;
>> +    }
>> +}
>> +
>> +static void net_l2tpv3_cleanup(NetClientState *nc)
>> +{
>> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
>> +    qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
>> +    close(s->fd);
>> +}
>> +
>> +static NetClientInfo net_l2tpv3_info = {
>> +    .type = NET_CLIENT_OPTIONS_KIND_L2TPV3,
>> +    .size = sizeof(NetL2TPV3State),
>> +    .receive = net_l2tpv3_receive_dgram,
>> +    .receive_iov = net_l2tpv3_receive_dgram_iov,
>> +    .cleanup = net_l2tpv3_cleanup,
>> +};
>> +
>> +static int net_l2tpv3_init(NetClientState *peer,
>> +                                 const char *name,
>> +                                 const char *src,
>> +                                 const char *dst,
>> +                                 const char *new_mode,
>> +                                 const char *tx_cookie,
>> +                                 const char *rx_cookie,
>> +                                 const char *tx_session,
>> +                                 const char *rx_session
>> +)
>
> This parenthesis goes on the same line as the last argument.
>
> Also, I suggest getting a NetdevL2TPv3Options* instead of eight
> different arguments.
>
>> +{
>> +   NetL2TPV3State *s;
>> +   NetClientState *nc;
>> +
>> +   int fd, i;
>> +
>> +   int sock_family, sock_type, sock_proto;
>> +   unsigned int temp;
>> +
>> +   struct mmsghdr * msgvec;
>> +   struct iovec * iov;
>> +
>> +
>> +   struct sockaddr_storage LocalSock;
>> +
>> +
>> +   fprintf(stderr, "l2tpv3 user init mode %s\n", new_mode);
>> +   memset(&LocalSock, 0 , sizeof(LocalSock));
>> +
>> +   nc = qemu_new_net_client(&net_l2tpv3_info, peer, "l2tpv3", name);
>> +
>> +   s = DO_UPCAST(NetL2TPV3State, nc, nc);
>> +
>> +   s->header_buf = g_malloc(256);
>> +
>> +   if (s->header_buf == NULL) {
>> +    fprintf(stderr, "Failed to allocate header buffer \n");
>> +    return -1;
>> +   }
>> +   s->buf = g_malloc(PAGE_SIZE);
>> +
>> +   if (s->buf == NULL) {
>> +    fprintf(stderr, "Failed to allocate packet buffer \n");
>> +    return -1;
>> +   }
>> +   s->vec = g_malloc(sizeof(struct iovec) * MAX_L2TPV3_IOVCNT);;
>
> Why do you need separate mallocs for these?
>
>> +   if (s->vec == NULL) {
>> +    fprintf(stderr, "Failed to allocate packet vec \n");
>> +    return -1;
>> +   }
>> +
>> +
> Double blank lines.
>
>> +   s->offset = 4;
>> +   s->session_offset = 0;
>> +   s->cookie_offset = 4;
>> +   s->counter_offset = 4;
>> +   s->dst_size = sizeof(struct sockaddr_storage);
>> +
>> +   sscanf(new_mode, "%x", &s->new_mode);
>
> Please replace this with multiple boolean or integer options, QAPI
> makes it easy.
>> +
>> +
>> +   fprintf(stderr, "type is: %s -> %x\n", new_mode, s->new_mode);
>
> No debugging output.
>
>> +   /* Cookies */
>> +
>> +  if (
>> +     (tx_session == NULL) ||
>> +     (sscanf(tx_session, "%x", &temp) != 1)
>> +      ) {
>> +     fprintf(stderr, "cannot parse tx_session!!!: %s\n", tx_session);
>> +     return -1;
>> +   } else {
>> +      s->tx_session = htonl(temp);
>> +      fprintf(stderr, "l2tpv3_open :  parsed tx session 32, %0x\n",
>> s->tx_session);
>> +   }
>
> As above, better force the user to prepend a "0x" and let QAPI do the
> parsing.
>
>> +  if (
>> +     (tx_session == NULL) ||
>> +     (sscanf(rx_session, "%x", &temp) != 1)
>> +      ) {
>> +     fprintf(stderr, "cannot parse rx_session!!!: %s\n", rx_session);
>> +     return -1;
>> +   }  else {
>> +      s->rx_session = htonl(temp);
>> +      fprintf(stderr, "l2tpv3_open :  parsed rx session 32, %0x\n",
>> s->rx_session);
>> +   }
>> +   if (s->new_mode & NEW_MODE_COOKIE) {
>> +      if (s->new_mode & NEW_MODE_COOKIE_SIZE) {
>
> Perhaps an optional integer argument like
> "cookie-size=32"/"cookie-size=64"?  Also, please mark
> tx-cookie/rx-cookie as optional and check that they are present if and
> only if cookie-size != 0.
>
> Alternatively, mark tx-cookie/rx-cookie as optional and do all the
> following:
>
> * check that either both are present or none is
>
> * check that the optional integer argument cookie-size, if present, is
> either 32 or 64
>
> * check that cookie-size is not present unless tx-cookie and rx-cookie
> are also present
>
> * if it makes sense, give a default value to cookie-size.  Otherwise,
> check that cookie-size is present if tx-cookie and rx-cookie are
>
>> +     /* 64 bit cookie */
>> +     s->offset += 8;
>> +     s->counter_offset += 8;
>> +     if (l2tpv3_parse_cookie64(tx_cookie,&s->tx_cookie) !=0) {
>> +        fprintf(stderr, "l2tpv3_open :  failed to parse tx cookie
>> 64\n");
>> +        return -1;
>> +     } else {
>> +        fprintf(stderr, "l2tpv3_open :  parsed tx cookie 64,
>> %0lx\n", s->tx_cookie);
>> +     }
>> +     if (l2tpv3_parse_cookie64(rx_cookie,&s->rx_cookie) !=0) {
>> +        fprintf(stderr, "l2tpv3_open :  failed to parse rx cookie
>> 64\n");
>> +        return -1;
>> +     } else {
>> +        fprintf(stderr, "l2tpv3_open :  parsed rx cookie 64,
>> %0lx\n", s->rx_cookie);
>> +     }
>> +      } else {
>> +     /* 32 bit cookie */
>> +     s->offset += 4;
>> +     s->counter_offset +=4;
>> +     s->tx_cookie = 0;
>> +     if (l2tpv3_parse_cookie32(tx_cookie,&s->tx_cookie) !=0) {
>> +        fprintf(stderr, "l2tpv3_open :  failed to parse tx cookie
>> 32\n");
>> +        return -1;
>> +     }
>> +     s->rx_cookie = 0;
>> +     if (l2tpv3_parse_cookie32(rx_cookie,&s->rx_cookie) !=0) {
>> +        fprintf(stderr, "l2tpv3_open :  failed to parse rx cookie
>> 32\n");
>> +        return -1;
>> +     }
>> +      }
>> +   }
>> +
>> +
>> +
>> +   if (dst && (strlen(dst) > 0 )) {
>> +      /* we now allocate it only if it we are not "listening" */
>> +      s->dgram_dst = malloc(sizeof(struct sockaddr_storage));
>> +      memset(s->dgram_dst, 0 , sizeof(struct sockaddr_storage));
>> +      fprintf(stderr, "l2tpv3_open : setting dst at init\n");
>> +   } else {
>> +      s->dgram_dst = NULL;
>> +   }
>> +
>> +   if (s->new_mode & NEW_MODE_IP_VERSION) {
>> +      /* IPv6 */
>> +      sock_family = AF_INET6;
>> +      if (parse_host_port6((struct sockaddr_in6 *) &LocalSock, src)
>> !=0) {
>
> Is the local address mandatory?
>
>> +     fprintf(stderr, "l2tpv3_open :  failed to parse v6 dst\n");
>> +     return -1;
>> +      };
>> +   } else {
>> +      /* IPv4 */
>> +      sock_family = AF_INET;
>> +      if (parse_host_port((struct sockaddr_in*) &LocalSock, src) !=
>> 0) {
>
> I would try using a connected socket?  This way you can use the
> socket_dgram function to create the UDP socket, resolve host names,
> use getaddrinfo properly, etc.
>
> For raw sockets, you can cut-and-paste relevant bits of socket_dgram
> into a new function that creates raw sockets:
>
>     int socket_raw(const char *host, const char *localaddr,
>                    int proto, bool ipv4, bool ipv6, Error **errp)
>     {
>         ...
>     }
>
> Errors from socket_dgram/socket_raw can be printed with the
> qerror_report_err function.
>
>> +     fprintf(stderr, "l2tpv3_open :  failed to parse v4 src\n");
>> +     return -1;
>> +      };
>> +   }
>
> The way this is usually done is using two optional booleans named ipv4
> and ipv6.  Also, please use separate options host, port, localaddr,
> localport for consistency with other places where we specify the
> address of a datagram socket.
>
> Add an optional argument instead of the NEW_MODE_UDP bit.  Examples:
>
> * an enum named "transport" and make it accept values "udp" or "ip"
>
> * a boolean named "udp"
>
> etc.  You can decide what the default is.  Make port optional (and
> localport too if making localaddr/localport mandatory is justified);
> check that it isn't specified for the raw protocol.
>
>> +   if (s->new_mode & NEW_MODE_UDP) {
>> +      sock_type = SOCK_DGRAM;
>> +      sock_proto = 0;
>> +      /* space for header */
>> +
>> +      s->offset += 4;
>> +      s->counter_offset += 4;
>> +      s->session_offset += 4;
>> +      s->cookie_offset += 4;
>> +    } else {
>> +      sock_type = SOCK_RAW;
>> +      sock_proto = 0x73;
>> +      if (s->new_mode & NEW_MODE_IP_VERSION) {
>> +     /* IPv6 */
>> +     ((struct sockaddr_in6 *) &LocalSock)->sin6_port = htons(0x73);
>> +      } else {
>> +     /* IPv4 */
>> +     ((struct sockaddr_in *) &LocalSock)->sin_port = htons(0x73);
>
> If you move socket creation to a function in util/qemu-sockets.c, like
> the socket_raw I suggest above, note that there is already an
> inet_setport in util/qemu-sockets.c.
>
>> +      }
>> +   }
>> +
>> +   if (!(s->new_mode & NEW_MODE_NO_COUNTER)) {
>
> Another optional boolean, "counter", defaulting (I guess) to 0.
>
>> +      s->offset += 4;
>> +   }
>> +   if ((fd = socket(sock_family, sock_type, sock_proto)) == -1) {
>> +      fd = -errno;
>> +      fprintf(stderr, "l2tpv3_open : socket creation failed, "
>> +           "errno = %d\n", -fd);
>> +      return fd;
>> +   }
>> +
>> +
>> +   if (bind(fd, (struct sockaddr *) &LocalSock, sizeof(LocalSock))) {
>> +      fprintf(stderr, "l2tpv3_open :  could not bind socket
>> err=%i\n", errno);
>> +      close(fd);
>> +      return -1;
>> +   } else {
>> +      fprintf(stderr, "l2tpv3_open : socket bound successfully\n");
>> +   }
>> +   if (s->dgram_dst) {
>> +      if (s->new_mode & NEW_MODE_IP_VERSION) {
>> +     /* IPv6 */
>> +     if (parse_host_port6((struct sockaddr_in6 *) s->dgram_dst, dst)
>> !=0) {
>> +        fprintf(stderr, "l2tpv3_open :  failed to parse v6 dst\n");
>> +        return -1;
>> +     };
>> +     s->dst_size = sizeof(struct sockaddr_in6);
>> +      } else {
>> +     /* IPv4 */
>> +     if (parse_host_port((struct sockaddr_in *) s->dgram_dst, dst)
>> != 0) {
>> +        fprintf(stderr, "l2tpv3_open :  failed to parse v4 dst\n");
>> +        return -1;
>> +     };
>> +     s->dst_size = sizeof(struct sockaddr_in);
>> +      }
>> +   }
>> +
>> +   s->msgvec = g_malloc(sizeof(struct mmsghdr) *
>> (MAX_L2TPV3_IOVCNT/2));
>> +   if (s->msgvec == NULL) {
>> +    fprintf(stderr, "Failed to allocate receive packet vec \n");
>> +    return -1;
>> +   }
>> +   msgvec = s->msgvec;
>
> More allocations that could be left in NetL2TPV3State.
>
>> +   for (i=0;i<MAX_L2TPV3_IOVCNT/2;i++) {
>> +    /* we need to allocate these only for the first time and first
>> buffer */
>> +    msgvec->msg_hdr.msg_name = NULL;
>> +    msgvec->msg_hdr.msg_namelen = 0;
>> +    iov =  g_malloc(sizeof(struct iovec) * 2);
>
> This could also be a multidimensional array.
>
>> +    if (iov == NULL) {
>> +        fprintf(stderr, "Failed to allocate receive packet vec \n");
>> +        return -1;
>> +    }
>> +    msgvec->msg_hdr.msg_iov = iov;
>> +    if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode &
>> NEW_MODE_UDP))){
>
> Dropping new_mode will also make this code much more readable:
>
>      "if (!s->ipv6 && !s->udp)"
>
>> +        iov->iov_base = g_malloc(s->offset + sizeof(struct iphdr));
>> /* fix for ipv4 raw */;
>
> Extra semicolon at the end of the line, also please try to be more
> verbose:
>
>     /* We need to allocate blah blah for IPv4 raw sockets, because blah
>      * blah.
>      */
>
>> +    } else {
>> +        iov->iov_base = g_malloc(s->offset);
>> +    }
>> +   
>> +    if (iov->iov_base == NULL) {
>> +        fprintf(stderr, "Failed to allocate receive packet vec \n");
>> +        return -1;
>> +    }
>
> g_malloc can never fail.
>
>> +    if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode &
>> NEW_MODE_UDP))){
>> +        iov->iov_len = s->offset + sizeof (struct iphdr);
>> +    } else {
>> +        iov->iov_len = s->offset;
>> +    }
>
> Better set iov_len first, and then refer to it in the allocation.
>
>> +    iov++ ;
>
> Just use iov[0] and iov[1].
>
>> +    iov->iov_base = qemu_memalign(PAGE_SIZE, PAGE_SIZE);
>> +    if (iov->iov_base == NULL) {
>> +        fprintf(stderr, "Failed to allocate receive packet vec \n");
>> +        return -1;
>> +    }
>> +    iov->iov_len = PAGE_SIZE;
>> +    msgvec->msg_hdr.msg_iovlen = 2;
>> +    msgvec->msg_hdr.msg_control = NULL;
>> +    msgvec->msg_hdr.msg_controllen = 0;
>> +    msgvec->msg_hdr.msg_flags = 0;
>> +    msgvec++;
>> +   }
>> +
>> +//   sendbuff = 90000;
>> +//
>> +//  printf("sets the send buffer to %d\n", sendbuff);
>> +// setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sendbuff, sizeof(sendbuff));
>
> No debug code.
>
>> +
>> +   qemu_set_nonblock(fd);
>
> Since the socket is nonblocking, MSG_DONTWAIT is unnecessary.
>
>> +
>> +   if (fd < 0)
>> +    return -1;
>
> You should have checked this much earlier, right after the socket
> function was created.
>
>> +   s->fd = fd;
>> +   s->counter = 0;
>> +
>> +   qemu_set_fd_handler(s->fd, net_l2tpv3_send, NULL, s);
>> +
>> +   if (!s) {
>> +      fprintf (stderr, "l2tpv3_open : failed to set fd handler\n");
>> +      return -1;
>> +   }
>
> Cannot happen, please delete.
>
>> +
>> +   /* this needs fixing too */
>> +
>> +   snprintf(s->nc.info_str, sizeof(s->nc.info_str),
>> +             "l2tpv3: connected");
>
> What needs fixing?
>
>> +   return 0;
>> +
>> +}
>> +
>> +int net_init_l2tpv3(const NetClientOptions *opts,
>> +                    const char *name,
>> +                    NetClientState *peer) {
>> +
>> +
>> +    const NetdevL2TPv3Options * l2tpv3;
>> +    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_L2TPV3);
>> +    l2tpv3 = opts->l2tpv3;
>> +
>> +
>> +  if (!(l2tpv3->has_src && l2tpv3->has_mode && l2tpv3->has_txsession
>> && l2tpv3->has_rxsession)) {
>> +      error_report("src=, dst=, txsession=, rxsession= and mode= are
>> required for l2tpv3");
>> +      return -1;
>> +  }
>> +
>> +
>> +  /* this needs cleaning up for new API */
>
> Then do it. :)
>
>> +  if (net_l2tpv3_init(
>> +     peer,
>> +         name,
>> +     l2tpv3->src,
>> +     l2tpv3->dst,
>> +     l2tpv3->mode,
>> +     l2tpv3->txcookie,
>> +     l2tpv3->rxcookie,
>> +     l2tpv3->txsession,
>> +     l2tpv3->rxsession
>> +
>> +   ) == -1) {
>> +      return -1;
>> +  }
>> +  return 0;
>> +}
>> diff --git a/net/l2tpv3.h b/net/l2tpv3.h
>> new file mode 100644
>> index 0000000..822e0b0
>> --- /dev/null
>> +++ b/net/l2tpv3.h
>> @@ -0,0 +1,65 @@
>> +/*
>> + * QEMU System Emulator
>> + *
>> + * Copyright (c) 2003-2008 Fabrice Bellard
>> + * Copyright (c) 2012 Cisco Systems
>> + *
>> + * Permission is hereby granted, free of charge, to any person
>> obtaining a copy
>> + * of this software and associated documentation files (the
>> "Software"), to deal
>> + * in the Software without restriction, including without limitation
>> the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense,
>> and/or sell
>> + * copies of the Software, and to permit persons to whom the
>> Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
>> OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "qemu-common.h"
>> +
>> +#ifndef QEMU_NET_L2TPV3_H
>> +#define QEMU_NET_L2TPV3_H
>> +
>> +#define L2TPV3_BUFSIZE 2048
>> +
>> +#define NEW_MODE_IP_VERSION   1          /* on for v6, off for v4 */
>> +#define NEW_MODE_UDP          2          /* on for udp, off for raw
>> ip */
>> +#define NEW_MODE_COOKIE          4          /* cookie present */
>> +#define NEW_MODE_COOKIE_SIZE  8          /* on for 64 bit */
>> +#define NEW_MODE_NO_COUNTER   16      /* DT - no counter */
>
> Please use bools in NetL2TPV3State
>
>> +/* legacy modes */
>> +
>> +/* mode 0 */
>> +
>> +#define LEGACY_UDP6_64_NO_COUNTER (NEW_MODE_IP_VERSION +
>> NEW_MODE_UDP + NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE +
>> NEW_MODE_NO_COUNTER)
>> +
>> +#define LEGACY_MODE0 LEGACY_UDP6_64_NO_COUNTER
>> +
>> +/* mode 1 */
>> +
>> +#define LEGACY_IP6_64_NO_COUNTER (NEW_MODE_IP_VERSION +
>> NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE + NEW_MODE_NO_COUNTER)
>> +
>> +#define LEGACY_MODE1 LEGACY_IP6_64_NO_COUNTER
>> +
>> +/* mode 2 */
>> +
>> +#define LEGACY_UDP4_64_COUNTER (NEW_MODE_COOKIE + NEW_MODE_UDP +
>> NEW_MODE_COOKIE_SIZE )
>> +
>> +#define LEGACY_MODE2 LEGACY_UDP4_64_COUNTER
>
> All these functions are not used.
>
>> +struct temphtonl {
>> +   uint32_t low;
>> +   uint32_t high;
>> +};
>
> Should not be necessary, but in any case it would belong in net/l2tpv3.c.
>
>> +
>> +#endif /* QEMU_NET_SOCKET_H */
>> diff --git a/net/net.c b/net/net.c
>> index 0a88e68..ba5f51b 100644
>> --- a/net/net.c
>> +++ b/net/net.c
>> @@ -132,6 +132,43 @@ int parse_host_port(struct sockaddr_in *saddr,
>> const char *str)
>>      return 0;
>>  }
>>
>> +int parse_host_port6(struct sockaddr_in6 *saddr, const char *str)
>> +{
>> +    char buf[512];
>> +    char *p, *ip, *port;
>> +
>> +    strncpy((char *) &buf, str, 511);
>> +    ip = (char *) &buf;
>> +    port = NULL;
>> +    int portint;
>> +
>> +
>> +    for (p = (char *) &buf; (p < (char *) &buf + strlen(str)) || (p
>> < (char *) &buf + 512); p++){
>> +      if (*p == '[') {
>> +     ip = p + 1;
>> +      }
>> +      if (*p == ']') {
>> +     *p = '\0';
>> +        if (*(p + 1) == ':') {
>> +        port = p + 2;
>> +    }
>> +    break;
>> +      }
>> +    }
>> +
>> +    saddr->sin6_family = AF_INET6;
>> +    if (!inet_pton(AF_INET6, ip, &saddr->sin6_addr)) {
>> +         return -1;
>> +    }
>> +    if (port) {
>> +      sscanf(port, "%i", &portint);
>> +      saddr->sin6_port = htons(portint);
>> +    }
>> +    return 0;
>> +}
>
> Not needed if you use qemu-sockets.c interfaces.
>
>> +
>> +
>> +
>>  void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6])
>>  {
>>      snprintf(nc->info_str, sizeof(nc->info_str),
>> @@ -731,6 +768,7 @@ static int (* const
>> net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
>>          [NET_CLIENT_OPTIONS_KIND_BRIDGE]    = net_init_bridge,
>>  #endif
>>          [NET_CLIENT_OPTIONS_KIND_HUBPORT]   = net_init_hubport,
>> +        [NET_CLIENT_OPTIONS_KIND_L2TPV3]       = net_init_l2tpv3,
>>  };
>>
>>
>> diff --git a/net/raw.c b/net/raw.c
>> new file mode 100644
>> index 0000000..04d244f
>> --- /dev/null
>> +++ b/net/raw.c
>
> Isn't this file unused?  It is not referred to by the makefiles.
>
>> @@ -0,0 +1,244 @@
>> +/*
>> + * QEMU System Emulator
>> + *
>> + * Copyright (c) 2003-2008 Fabrice Bellard
>> + * Copyright (c) 2014 Cisco Systems
>> + *
>> + * Permission is hereby granted, free of charge, to any person
>> obtaining a copy
>> + * of this software and associated documentation files (the
>> "Software"), to deal
>> + * in the Software without restriction, including without limitation
>> the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense,
>> and/or sell
>> + * copies of the Software, and to permit persons to whom the
>> Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
>> OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +#include <netinet/ether.h>
>> +#include <linux/if_ether.h>
>> +#include <linux/if_packet.h>
>> +#include <sys/mman.h>
>> +#include <sys/ioctl.h>
>> +#include <net/if.h>
>> +
>> +#include "net/raw.h"
>> +
>> +#include "config-host.h"
>> +#include "net/net.h"
>> +#include "clients.h"
>> +#include "monitor/monitor.h"
>> +#include "qemu-common.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/option.h"
>> +#include "qemu/sockets.h"
>> +#include "qemu/iov.h"
>> +#include "qemu/main-loop.h"
>> +
>> +
>> +#define RAW_BUFSIZE 2048
>> +
>> +
>> +typedef struct NetRAWState {
>> +    NetClientState nc;
>> +    int fd;
>> +    int state; /* 0 = getting length, 1 = getting data */
>> +    unsigned int ring_index;
>> +    unsigned int packet_len;
>> +    uint8_t * multiread_buffer;
>> +} NetRAWState;
>> +
>> +
>> +static ssize_t net_raw_receive_dgram(NetClientState *nc, const
>> uint8_t *buf, size_t size)
>> +{
>> +    NetRAWState *s = DO_UPCAST(NetRAWState, nc, nc);
>> +    return write(s->fd, buf, size);
>> +}
>> +
>> +static ssize_t net_raw_receive_dgram_iov(NetClientState *nc, const
>> struct iovec *iov, int iovcnt)
>> +{
>> +    NetRAWState *s = DO_UPCAST(NetRAWState, nc, nc);
>> +    return writev(s->fd, iov, iovcnt);
>> +}
>> +
>> +static inline struct tpacket_hdr * current_header (NetRAWState *s) {
>> +    uint8_t * buffer;
>> +    buffer = s->multiread_buffer + (s->ring_index * RAW_TP_FRAME_SIZE);
>> +    return (struct tpacket_hdr *) buffer;
>> +}
>> +
>> +
>> +static struct tpacket_hdr * raw_advance_ring(NetRAWState *s) {
>> +
>> +    struct tpacket_hdr * header = current_header(s);
>> +    header->tp_status = TP_STATUS_KERNEL; /* mark as free */
>> +    s->ring_index = (s->ring_index + 1) % RAW_TP_FRAME_NR;
>> +    return current_header(s);
>> +}
>> +
>> +static void net_raw_send(void *opaque)
>> +{
>> +    NetRAWState *s = opaque;
>> +    struct tpacket_hdr * header;
>> +
>> +    header = current_header(s);   
>> +    while ((header->tp_status & TP_STATUS_USER) > 0) {
>> +    qemu_send_packet(&s->nc, ((uint8_t *) header) + header->tp_mac,
>> header->tp_snaplen);
>> +    header = raw_advance_ring(s);
>> +    }
>> +}
>> +
>> +
>> +static void net_raw_cleanup(NetClientState *nc)
>> +{
>> +    NetRAWState *s = DO_UPCAST(NetRAWState, nc, nc);
>> +    qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
>> +    close(s->fd);
>> +}
>> +
>> +static NetClientInfo net_raw_info = {
>> +    .type = NET_CLIENT_OPTIONS_KIND_RAW,
>> +    .size = sizeof(NetRAWState),
>> +    .receive = net_raw_receive_dgram,
>> +    .receive_iov = net_raw_receive_dgram_iov,
>> +    .cleanup = net_raw_cleanup,
>> +};
>> +
>> +static int net_raw_init(NetClientState *peer,
>> +                                 const char *name,
>> +                                 const char *host_interface
>> +                 )
>> +{
>> +    NetRAWState *s;
>> +    NetClientState *nc;
>> +
>> +    int fd;
>> +
>> +
>> +
>> +    struct ifreq ifr;
>> +    struct sockaddr_ll sock;
>> +
>> +    int err;
>> +    struct tpacket_req tpacket;
>> +
>> +
>> +    nc = qemu_new_net_client(&net_raw_info, peer, "raw", name);
>> +
>> +    s = DO_UPCAST(NetRAWState, nc, nc);
>> +
>> +    s->ring_index = 0;
>> +   
>> +
>> +    if ((fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))) == -1) {
>> +    err = -errno;
>> +    fprintf(stderr, "raw_open : raw socket creation failed, errno =
>> %d\n", -err);
>> +    return err;
>> +    }
>> +
>> +    tpacket.tp_block_size = RAW_TP_BLOCK_SIZE;
>> +    tpacket.tp_frame_size = RAW_TP_FRAME_SIZE;
>> +    tpacket.tp_block_nr = RAW_TP_BLOCK_NR ;
>> +    tpacket.tp_frame_nr = RAW_TP_FRAME_NR;
>> +
>> +    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *)
>> &tpacket, sizeof(struct tpacket_req))) {
>> +    fprintf(stderr, "uml_raw: failed to request packet mmap");
>> +    return -errno;
>> +    } else {
>> +    fprintf(stderr, "uml_raw: requested packet mmap\n");
>> +    }
>> +
>> +    s->multiread_buffer = (uint8_t *) mmap(
>> +        NULL,
>> +        RAW_TP_FRAME_SIZE * RAW_TP_FRAME_NR,
>> +        PROT_READ | PROT_WRITE, MAP_SHARED,
>> +        fd,
>> +        0
>> +    );
>> +
>> +    if (!(s->multiread_buffer)) {
>> +    fprintf(stderr, "raw: failed to map buffer");
>> +    return -1;
>> +    } else {
>> +    fprintf(stderr, "raw: mmmap-ed buffer at %p\n",
>> s->multiread_buffer);
>> +    }
>> +
>> +
>> +
>> +
>> +    memset(&ifr, 0, sizeof(struct ifreq));
>> +    strncpy((char *) &ifr.ifr_name, host_interface,
>> sizeof(ifr.ifr_name) - 1);
>> +    if(ioctl(fd, SIOCGIFINDEX, (void *) &ifr) < 0) {
>> +    err = -errno;
>> +    fprintf(stderr, "SIOCGIFINDEX, failed to get raw interface index
>> for %s", host_interface);
>> +    close(fd);
>> +    return(-1);
>> +    } else {
>> +    }
>> +
>> +    sock.sll_family = AF_PACKET;
>> +    sock.sll_protocol = htons(ETH_P_ALL);
>> +    sock.sll_ifindex = ifr.ifr_ifindex;
>> +
>> +    fprintf(stderr, "raw: binding raw on interface index: %i\n",
>> ifr.ifr_ifindex);
>> +    if (bind(fd, (struct sockaddr *) &sock, sizeof(struct
>> sockaddr_ll)) < 0) {
>> +    fprintf(stderr, "raw: failed to bind raw socket");
>> +    close(fd);
>> +    return(-1);
>> +    }
>> +
>> +
>> +    qemu_set_nonblock(fd);
>> +
>> +    if (fd < 0)
>> +    return -1;
>> +
>> +    s->ring_index = 0;
>> +
>> +    s->fd = fd;
>> +
>> +    qemu_set_fd_handler(s->fd, net_raw_send, NULL, s);
>> +
>> +    if (!s) {
>> +      fprintf (stderr, "raw_open : failed to set fd handler\n");
>> +      return -1;
>> +    }
>> +
>> +/* this needs fixing too */
>> +
>> +    snprintf(s->nc.info_str, sizeof(s->nc.info_str), "raw: connected");
>> +    return 0;
>> +
>> +}
>> +
>> +int net_init_raw(const NetClientOptions *opts,
>> +                    const char *name,
>> +                    NetClientState *peer) {
>> +
>> +
>> +    const NetdevRawOptions * raw;
>> +    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_RAW);
>> +    raw = opts->raw;
>> +
>> +    if (raw->has_fd) {
>> +    error_report("passing socket fd not yet supported");
>> +    return -1 ;
>> +    }
>> +
>> +    if (!(raw->has_ifname)) {
>> +      error_report("iface required for raw");
>> +      return -1;
>> +    }
>> +
>> +    if (net_raw_init( peer, name, raw->ifname) == -1) {
>> +      return -1;
>> +    }
>> +    return 0;
>> +}
>> diff --git a/net/raw.h b/net/raw.h
>> new file mode 100644
>> index 0000000..80ab43e
>> --- /dev/null
>> +++ b/net/raw.h
>> @@ -0,0 +1,33 @@
>> +/*
>> + * QEMU System Emulator
>> + *
>> + * Copyright (c) 2003-2008 Fabrice Bellard
>> + * Copyright (c) 2014 Cisco Systems
>> + *
>> + * Permission is hereby granted, free of charge, to any person
>> obtaining a copy
>> + * of this software and associated documentation files (the
>> "Software"), to deal
>> + * in the Software without restriction, including without limitation
>> the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense,
>> and/or sell
>> + * copies of the Software, and to permit persons to whom the
>> Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
>> OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +#ifndef QEMU_NET_RAW_H
>> +#define QEMU_NET_RAW_H
>> +
>> +#define RAW_TP_BLOCK_SIZE 4096
>> +#define RAW_TP_FRAME_SIZE 2048
>> +#define RAW_TP_BLOCK_NR 32
>> +#define RAW_TP_FRAME_NR 64
>> +
>> +#endif /* QEMU_NET_RAW_H */
>
> Also unused.
>
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 83fa485..192d19c 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -2941,6 +2941,45 @@
>>      '*udp':       'str' } }
>>
>>  ##
>> +# @NetdevL2TPv3Options
>> +#
>> +# Connect the VLAN to Ethernet over L2TPv3 Static tunnel
>> +#
>> +# @fd: #optional file descriptor of an already opened socket
>> +#
>> +# @src: source address
>> +#
>> +# @dst: #optional destination address
>> +#
>> +# @mode: bitmask mode for the tunnel (see l2tpv3.h for actual
>> definition)
>> +#
>> +# @txcookie:#optional 32 or 64 bit tx cookie for the tunnel (set
>> mode for actual type selection)
>> +#
>> +# @rxcookie:#optional 32 or 64 bit rx cookie for the tunnel (set
>> mode for actual type selection)
>> +#
>> +# @txsession: tx 32 bit session
>> +#
>> +# @rxsession: rx 32 bit session
>> +#
>> +#
>> +# Since 1.0
>> +##
>> +##
>> +{ 'type': 'NetdevL2TPv3Options',
>> +  'data': {
>> +    '*fd':        'str',
>
> fd is not used anywhere, I think.  Using it to support a pre-connected
> socket, passed by some management layer, would be useful because QEMU
> does not ordinarily have privileges to create raw sockets.  But you
> can add this later.
>
>> +    '*src':      'str',
>> +    '*dst':      'str',
>> +    '*mode':      'str',
>> +    '*txcookie':  'str',
>> +    '*rxcookie':  'str',
>> +    '*txsession': 'str',
>> +    '*rxsession': 'str'
>> +
>> +} }
>> +
>> +##
>> +##
>>  # @NetdevVdeOptions
>>  #
>>  # Connect the VLAN to a vde switch running on the host.
>> @@ -3021,6 +3060,7 @@
>>      'nic':      'NetLegacyNicOptions',
>>      'user':     'NetdevUserOptions',
>>      'tap':      'NetdevTapOptions',
>> +    'l2tpv3':   'NetdevL2TPv3Options',
>>      'socket':   'NetdevSocketOptions',
>>      'vde':      'NetdevVdeOptions',
>>      'dump':     'NetdevDumpOptions',
>>
>
> Regards,
>
> Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28 13:52   ` Anton Ivanov (antivano)
@ 2014-02-28 13:57     ` Eric Blake
  2014-02-28 14:03       ` Anton Ivanov (antivano)
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2014-02-28 13:57 UTC (permalink / raw)
  To: Anton Ivanov (antivano); +Cc: Qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1923 bytes --]

On 02/28/2014 06:52 AM, Anton Ivanov (antivano) wrote:
> On 28/02/14 13:40, Eric Blake wrote:
>> On 02/28/2014 01:28 AM, Anton Ivanov (antivano) wrote:
>>> Hi all,
>>>
>>> On behalf of Cisco Systems I am authorized to contribute a new transport
>>> to the network subsystem in qemu.
>>>

>>> +#
>>> +#
>>> +# Since 1.0
>> s/1.0/2.0/
> 
> OK - just to clarify, which version is this referring to - qemu, api, etc?

This field is the version of qemu that first contains the release.  If
your patch is on time to make the qemu 2.0 release, then 2.0 is
appropriate; but we're close to feature freeze so it may end up being 2.1.

> 
>>
>>> +##
>>> +##
>>> +{ 'type': 'NetdevL2TPv3Options',
>>> +  'data': {
>>> +    '*fd':        'str',
>>> +    '*src':	  'str', 
>> You didn't list 'src' as optional above.
> 
> That is intended. It is not optional for the ip command in Linux either.
> For L2TPv3 tunnels you have to specify the source address.

But listing it as '*src' means it is optional.  If it is mandatory, list
it as 'src'.

> 
>>
>> Trailing whitespace - run your submission through checkpatch.pl.
>>
>>> +    '*dst':	  'str', 
>>> +    '*mode':	  'str', 
>> As mentioned above, 'mode' should not be a string.
>>
>>> +    '*txcookie':  'str',
>>> +    '*rxcookie':  'str',
>>> +    '*txsession': 'str',
>>> +    '*rxsession': 'str'
>> These should probably be 'int', not 'str'.  If you have to hand-parse a
>> string into a numeric value, you encoded the QMP wrong.
> 
> These by spec are either 32 bit or 64 bit integers. I will fix them
> accordingly.

The qapi type 'int' is 64-bit.  You can further range check the values
in your C code and fail the command if the user passed values that don't
fit in 32 bits when they requested a 32-bit mode.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28 13:40 ` Eric Blake
  2014-02-28 13:52   ` Anton Ivanov (antivano)
@ 2014-02-28 14:00   ` Paolo Bonzini
  2014-02-28 15:06     ` Eric Blake
  1 sibling, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2014-02-28 14:00 UTC (permalink / raw)
  To: Eric Blake, Anton Ivanov (antivano), Qemu-devel

Il 28/02/2014 14:40, Eric Blake ha scritto:
> These should probably be 'int', not 'str'.  If you have to hand-parse a
> string into a numeric value, you encoded the QMP wrong.

Eric, note that this is not QMP.  It is only used for option parsing.

Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28 13:57     ` Eric Blake
@ 2014-02-28 14:03       ` Anton Ivanov (antivano)
  0 siblings, 0 replies; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-02-28 14:03 UTC (permalink / raw)
  To: qemu-devel

[snip]

>>> s/1.0/2.0/
>> OK - just to clarify, which version is this referring to - qemu, api, etc?
> This field is the version of qemu that first contains the release.  If
> your patch is on time to make the qemu 2.0 release, then 2.0 is
> appropriate; but we're close to feature freeze so it may end up being 2.1.

Understood - we have used an earlier version as an internal testing tool
since 1.0, that is why it was marked 1.0 in the patchset.

I will do my best to turn around on the comments as quick as I can so it
can be considered for 2.0 or 2.1

>
>>>> +##
>>>> +##
>>>> +{ 'type': 'NetdevL2TPv3Options',
>>>> +  'data': {
>>>> +    '*fd':        'str',
>>>> +    '*src':	  'str', 
>>> You didn't list 'src' as optional above.
>> That is intended. It is not optional for the ip command in Linux either.
>> For L2TPv3 tunnels you have to specify the source address.
> But listing it as '*src' means it is optional.  If it is mandatory, list
> it as 'src'.

Understood.

>
>>> Trailing whitespace - run your submission through checkpatch.pl.
>>>
>>>> +    '*dst':	  'str', 
>>>> +    '*mode':	  'str', 
>>> As mentioned above, 'mode' should not be a string.
>>>
>>>> +    '*txcookie':  'str',
>>>> +    '*rxcookie':  'str',
>>>> +    '*txsession': 'str',
>>>> +    '*rxsession': 'str'
>>> These should probably be 'int', not 'str'.  If you have to hand-parse a
>>> string into a numeric value, you encoded the QMP wrong.
>> These by spec are either 32 bit or 64 bit integers. I will fix them
>> accordingly.
> The qapi type 'int' is 64-bit.  You can further range check the values
> in your C code and fail the command if the user passed values that don't
> fit in 32 bits when they requested a 32-bit mode.

OK, thanks.

A.

>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28 14:00   ` Paolo Bonzini
@ 2014-02-28 15:06     ` Eric Blake
  2014-02-28 15:20       ` Paolo Bonzini
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2014-02-28 15:06 UTC (permalink / raw)
  To: Paolo Bonzini, Anton Ivanov (antivano), Qemu-devel

[-- Attachment #1: Type: text/plain, Size: 552 bytes --]

On 02/28/2014 07:00 AM, Paolo Bonzini wrote:
> Il 28/02/2014 14:40, Eric Blake ha scritto:
>> These should probably be 'int', not 'str'.  If you have to hand-parse a
>> string into a numeric value, you encoded the QMP wrong.
> 
> Eric, note that this is not QMP.  It is only used for option parsing.

But still, shouldn't option parsing be able to take advantage of
non-'str' typing to ease the task of handling numeric parameters?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28 15:06     ` Eric Blake
@ 2014-02-28 15:20       ` Paolo Bonzini
  0 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2014-02-28 15:20 UTC (permalink / raw)
  To: Eric Blake, Anton Ivanov (antivano), Qemu-devel

Il 28/02/2014 16:06, Eric Blake ha scritto:
> On 02/28/2014 07:00 AM, Paolo Bonzini wrote:
>> Il 28/02/2014 14:40, Eric Blake ha scritto:
>>> These should probably be 'int', not 'str'.  If you have to hand-parse a
>>> string into a numeric value, you encoded the QMP wrong.
>>
>> Eric, note that this is not QMP.  It is only used for option parsing.
>
> But still, shouldn't option parsing be able to take advantage of
> non-'str' typing to ease the task of handling numeric parameters?

Indeed, and I pointed this out in my review as well.  But the tradeoffs 
are a bit different, as you have already found out in the past.

Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28  8:28 [Qemu-devel] Contribution - L2TPv3 transport Anton Ivanov (antivano)
  2014-02-28 10:02 ` Paolo Bonzini
  2014-02-28 13:40 ` Eric Blake
@ 2014-03-03 13:27 ` Stefan Hajnoczi
  2014-03-03 14:01   ` Anton Ivanov (antivano)
  2014-03-03 14:53 ` Stefan Hajnoczi
  3 siblings, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2014-03-03 13:27 UTC (permalink / raw)
  To: Anton Ivanov (antivano); +Cc: Qemu-devel

On Fri, Feb 28, 2014 at 08:28:11AM +0000, Anton Ivanov (antivano) wrote:
> 3. Qemu to communicate with the local host, remote vms, network devices,
> etc at speeds which for a number of use cases exceed the speed of the
> legacy tap driver.

This surprises me.  It's odd that tap performs significantly worse.

I guess you have two options for using kernel L2TPv3 support:

1. Use a bridge and tap to forward between the L2TPv3 interface and the
   QEMU.

2. Use macvtap on top of the L2TPv3 interface (if possible).

Option #2 should be faster.

Now about the tap userspace ABI, is the performance bottleneck that the
read(2) system call only receives one packet at a time?  The tap file
descriptor is not a socket so recvmmsg(2) cannot be used on it directly.

I have wondered in the past whether something like packet mmap would be
possible on a tap device.  At that point userspace just waits for
notifications and the actual packets don't require any syscalls.

> Our suggestion would be that this over time obsoletes the UDP variety of
> the "socket" driver.

Yes, thank you!  L2TPv3 looks like a good replacement for the "socket"
driver.

Will review and respond to your patch in detail.

Stefan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-03 13:27 ` Stefan Hajnoczi
@ 2014-03-03 14:01   ` Anton Ivanov (antivano)
  2014-03-04  9:36     ` Stefan Hajnoczi
  0 siblings, 1 reply; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-03-03 14:01 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Qemu-devel

On 03/03/14 13:27, Stefan Hajnoczi wrote:
> On Fri, Feb 28, 2014 at 08:28:11AM +0000, Anton Ivanov (antivano) wrote:
>> 3. Qemu to communicate with the local host, remote vms, network devices,
>> etc at speeds which for a number of use cases exceed the speed of the
>> legacy tap driver.
> This surprises me.  It's odd that tap performs significantly worse.


Multipacket RX can go a very long way and it does not work on tap's 
emulation of a raw socket. At least in 3.2 :)

I could have put multipacket-TX in too, but that means breaking QoS for 
my use cases as well as using 2+ thread IO, improving current qemu timer 
API, etc. I have looked into it - it is doable. I would not try that in 
first instance at this point.

Tap at present can beat l2tpv3 on use cases where offloads have a 
significant contribution. Tap is slower on anything per-packet and it is 
slower end-to-end when combined with a bridge and/or OVS.

>
> I guess you have two options for using kernel L2TPv3 support:
>
> 1. Use a bridge and tap to forward between the L2TPv3 interface and the
>     QEMU.

Correct. It ends up being slower on per-packet use cases. Also, in a 
system it introduces one more touch point - the bridge. It needs to be 
configured and kept up to date. For our key use case (one which we will 
ship as a product) we have the following topology:

[Customer LAN] <-> [Physical CPE] <->... network... <-> [VM running a 
service]

An example of this would be putting a media server or a NAS on a VM and 
joining it to a customer network.

We can connect the VM via a switch. Nothing wrong with that and it may 
have the same performance end-to-end. However, it will introduce an 
extra touch point to deal with in terms control plane, orchestration and 
provisioning.

>
> 2. Use macvtap on top of the L2TPv3 interface (if possible).

Did not try that so cannot really say.

>
> Option #2 should be faster.
>
> Now about the tap userspace ABI, is the performance bottleneck that the
> read(2) system call only receives one packet at a time?  The tap file
> descriptor is not a socket so recvmmsg(2) cannot be used on it directly.

If I read the kernel source correctly the tap fd can emulate a socket 
for some calls. However, when I try recvmmsg I get an ENOTSOCKET.

>
> I have wondered in the past whether something like packet mmap would be
> possible on a tap device.

I have done it on raw. I have it approved for submission, mmap works 
fine on RX (once again). Packet mmap does not work on TX - you end up 
having to filter your own frames leading to an overall drop in 
efficiency. So TX still has to be a write to socket.

We will be contributing that one shortly after I clean it up to the 
required coding standards (in fact parts of the source sneaked into the 
original diff file by mistake).

Theoretically, it has very little advantages compared to recvmmsg as 
there is a copy involved in both cases. I am happy to rewrite that for 
recvmmsg instead of packet mmap so we can reuse the vector IO code 
across both drivers.

> At that point userspace just waits for
> notifications and the actual packets don't require any syscalls.

Indeed. That driver will be contributed shortly. We have done that one 
too :)

A.

>
>> Our suggestion would be that this over time obsoletes the UDP variety of
>> the "socket" driver.
> Yes, thank you!  L2TPv3 looks like a good replacement for the "socket"
> driver.
>
> Will review and respond to your patch in detail.
>
> Stefan

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28  8:28 [Qemu-devel] Contribution - L2TPv3 transport Anton Ivanov (antivano)
                   ` (2 preceding siblings ...)
  2014-03-03 13:27 ` Stefan Hajnoczi
@ 2014-03-03 14:53 ` Stefan Hajnoczi
  2014-03-04 11:32   ` Anton Ivanov (antivano)
  3 siblings, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2014-03-03 14:53 UTC (permalink / raw)
  To: Anton Ivanov (antivano); +Cc: Qemu-devel

On Fri, Feb 28, 2014 at 08:28:11AM +0000, Anton Ivanov (antivano) wrote:

I didn't check all of Paolo and Eric's comments.  Feel free to skip
comments that have already been made.

> diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
> index 45588d7..865f1c2 100644
> --- a/include/qemu/sockets.h
> +++ b/include/qemu/sockets.h
> @@ -79,6 +79,7 @@ int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp);
>  
>  /* Old, ipv4 only bits.  Don't use for new code. */
>  int parse_host_port(struct sockaddr_in *saddr, const char *str);
> +int parse_host_port6(struct sockaddr_in6 *saddr, const char *str);
>  int socket_init(void);
>  
>  #endif /* QEMU_SOCKET_H */

Paolo posted ipv4/ipv6 follow-ups and suggestions, so maybe you won't
need this in the next version.

But right off the bat, adding a utility function to a common source file
is a good indicator that this should be a separate patch.  Splitting the
code up into multiple patches gives you a chance to explain the
rationale for one change at a time, aids bisectability, and makes code
review easier.

> diff --git a/net/Makefile.objs b/net/Makefile.objs
> index 4854a14..364c785 100644
> --- a/net/Makefile.objs
> +++ b/net/Makefile.objs
> @@ -2,6 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
>  common-obj-y += socket.o
>  common-obj-y += dump.o
>  common-obj-y += eth.o
> +common-obj-y += l2tpv3.o
>  common-obj-$(CONFIG_POSIX) += tap.o
>  common-obj-$(CONFIG_LINUX) += tap-linux.o
>  common-obj-$(CONFIG_WIN32) += tap-win32.o
> diff --git a/net/clients.h b/net/clients.h
> index 7793294..ea2a831 100644
> --- a/net/clients.h
> +++ b/net/clients.h
> @@ -47,6 +47,11 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
>  int net_init_bridge(const NetClientOptions *opts, const char *name,
>                      NetClientState *peer);
>  
> +int net_init_raw(const NetClientOptions *opts, const char *name,
> +                    NetClientState *peer);
> +

Think you already noticed, but the 'raw' net client code slipped into
this patch.  Please split it into a separate patch since it's not
directly part of the L2TPv3 code changes.

I suggest making the 'raw' net client a completely separate patch
series.  Post a separate email thread so it can be reviewed and merged
independently of L2TPv3.

> +int net_init_l2tpv3(const NetClientOptions *opts, const char *name,
> +                    NetClientState *peer);
>  #ifdef CONFIG_VDE
>  int net_init_vde(const NetClientOptions *opts, const char *name,
>                   NetClientState *peer);
> diff --git a/net/l2tpv3.c b/net/l2tpv3.c
> new file mode 100644
> index 0000000..4b5d8ee
> --- /dev/null
> +++ b/net/l2tpv3.c
> @@ -0,0 +1,630 @@
> +/*
> + * QEMU System Emulator
> + *
> + * Copyright (c) 2003-2008 Fabrice Bellard
> + * Copyright (c) 2012 Cisco Systems
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "net/l2tpv3.h"
> +#include "config-host.h"
> +#include "config-host.h"
> +#include "net/net.h"
> +#include "clients.h"
> +#include "monitor/monitor.h"
> +#include "qemu-common.h"
> +#include "qemu/error-report.h"
> +#include "qemu/option.h"
> +#include "qemu/sockets.h"
> +#include "qemu/iov.h"
> +#include "qemu/main-loop.h"
> +#include <linux/ip.h>

Please #include <system/headers.h> before #include "user/headers.h" just
to make sure our user headers don't interfere with system headers.  For
example by defining MIN()/MAX() or other macros.

> +
> +
> +
> +
> +#define PAGE_SIZE 4096

If you really *need* the page size, please use sysconf(_SC_PAGESIZE).

If you just need a big buffer size, please rename this to L2TPV3_BUFSIZE
or similar to avoid linking the name to page size.

> +#define MAX_L2TPV3_IOVCNT 64
> +
> +typedef struct NetL2TPV3State {
> +    NetClientState nc;
> +    int fd;
> +    int state; /* 0 = getting length, 1 = getting data */
> +    unsigned int index;
> +    unsigned int packet_len;

state, index, and packet_len are unused.  Please remove.

> +    /* 
> +	these are used for xmit - that happens packet a time
> +	and for first sign of life packet (easier to parse that once)
> +
> +    */ 
> +    uint8_t * header_buf;
> +    uint8_t * buf;
> +    struct iovec * vec;
> +    /* 
> +	these are used for receive - try to "eat" up to 32 packets at a time

It's best to avoid referring explicitly to 32 here.  It can easily get
out-of-sync with the code.

> +    */
> +    struct mmsghdr * msgvec;
> +    /*
> +      contains inet host and port destination if 
> +      connectionless (SOCK_DGRAM). Will be NULL if 
> +      in listen mode waiting for first connect
> +
> +    */
> +    struct sockaddr_storage * dgram_dst; 
> +    uint64_t rx_cookie;
> +    uint64_t tx_cookie;
> +    uint32_t rx_session;
> +    uint32_t tx_session;
> +    uint32_t header_size;
> +    uint32_t counter;
> +    uint32_t dst_size;
> +    uint32_t new_mode;

Why is it called "new" mode?

> +   
> +    /* offsets */ 
> +
> +    uint32_t offset;   /* main offset == header offset */

I suggest using a descriptive name.  header_offset?

> +    uint32_t cookie_offset;
> +    uint32_t counter_offset;
> +    uint32_t session_offset;
> +
> +} NetL2TPV3State;
> +
> +typedef struct NetL2TPV3ListenState {
> +    NetClientState nc;
> +    char *model;
> +    char *name;
> +    int fd;
> +} NetL2TPV3ListenState;

Unused, please remove.

> +static ssize_t net_l2tpv3_receive_dgram(NetClientState *nc, const uint8_t *buf, size_t size)
> +{
> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +
> +    struct iovec * vec;
> +
> +    struct msghdr message;
> +    l2tpv3_form_header(s);
> +    vec = s->vec;
> +    vec->iov_base = s->header_buf;
> +    vec->iov_len = s->offset;
> +    vec++;
> +    vec->iov_base = (void *) buf;
> +    vec->iov_len = size;
> +
> +    if (s->dgram_dst) {
> +       message.msg_name = s->dgram_dst;
> +       message.msg_namelen = s->dst_size;
> +       message.msg_iov = (struct iovec *) s->vec;
> +       message.msg_iovlen = 2;
> +       message.msg_control = NULL;
> +       message.msg_controllen = 0;
> +       message.msg_flags = 0;
> +       size = sendmsg(s->fd, &message, 0) - s->offset;

Explicit sendmesg() -1 return handling would be nice.  Also, please use
ssize_t (signed) for manipulating the return value instead of size_t
(unsigned).

What happens if the error was EINTR we can retry.  If the error was
EAGAIN we should return 0 and wait for the fd to become writable.  When
it becomes writable we should flush the queue (see
qemu_net_queue_flush()).  Other net clients already do this, so there is
example code to look at.

> +static void net_l2tpv3_send(void *opaque)
> +{
> +    NetL2TPV3State *s = opaque;
> +    
> +    int i, count, offset;
> +    
> +    struct mmsghdr * msgvec;
> +    struct iovec * vec;
> +    
> +    
> +    msgvec = s->msgvec;
> +    offset = s->offset;
> +    if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode & NEW_MODE_UDP))){
> +	offset +=   sizeof(struct iphdr);
> +    }  
> +    count = recvmmsg(s->fd, msgvec, MAX_L2TPV3_IOVCNT/2, MSG_DONTWAIT, NULL);
> +    for (i=0;i<count;i++) {
> +	if (msgvec->msg_len > 0) {
> +	    vec = msgvec->msg_hdr.msg_iov;
> +	    if (l2tpv3_verify_header(s, vec->iov_base) == 0) {

Header verification should check the size of the packet too.  It must be
large enough for the L2TPv3 header, otherwise we'd access stale data
that happened to be in vec->iov_base...

> +		//vec->iov_len = offset; /* reset size just in case*/
> +		vec++;
> +		qemu_send_packet(&s->nc, vec->iov_base, msgvec->msg_len - offset);

...and if the header check happened to pass, the msgvec->msg_len -
offset argument value will be negative (int)!

> +	    } else {
> +		fprintf(stderr, "l2tpv3 header verification failed\n");
> +		vec->iov_len = offset;
> +		vec++;
> +	    }
> +	    //vec->iov_len = PAGE_SIZE; /* reset for next read */

I think it *is* necessary to reset ->iov_len for both msgvec iovecs.

> +static int net_l2tpv3_init(NetClientState *peer,
> +                                 const char *name,
> +                                 const char *src,
> +                                 const char *dst,
> +                                 const char *new_mode,
> +                                 const char *tx_cookie,
> +                                 const char *rx_cookie,
> +                                 const char *tx_session,
> +                                 const char *rx_session
> +)
> +{
> +   NetL2TPV3State *s;
> +   NetClientState *nc;
> +
> +   int fd, i;
> +
> +   int sock_family, sock_type, sock_proto;
> +   unsigned int temp;
> +   
> +   struct mmsghdr * msgvec;
> +   struct iovec * iov;
> +
> +
> +   struct sockaddr_storage LocalSock;
> +
> +       
> +   fprintf(stderr, "l2tpv3 user init mode %s\n", new_mode);

If you want observability, please use trace events: docs/tracing.txt.
You can define trace events in ./trace-events and they can be used with
DTrace/SystemTap/LTTng/ftrace/stderr.

It's often useful to define trace events for important state transitions
in the code (e.g. creation, starting a connection, completing a
connection, sending a packet, receiving a packet, disconnecting, etc).

> +   memset(&LocalSock, 0 , sizeof(LocalSock));
> +
> +   nc = qemu_new_net_client(&net_l2tpv3_info, peer, "l2tpv3", name);
> +
> +   s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +
> +   s->header_buf = g_malloc(256);    

Can you use C structs and unions instead of choosing an arbitrary
256-byte size and calculating offsets at runtime?

typedef struct {
    uint16_t flags;
    uint16_t reserved;
    uint32_t session_id;
    union {
    	uint32_t cookie32;
	uint64_t cookie64;
    };
} QEMU_PACKED L2TPv3SessionHeader;

Or perhaps something like:

typedef union {
    L2TPv3SessionHeader32 session32;
    L2TPv3SessionHeader64 session64;
    L2TPv3DataHeader data;
} L2TPv3SessionHeader;

(I haven't checked which approach is simplest but it would make the code
clearer and avoid byte manipulation.)

> +   if (s->header_buf == NULL) {
> +	fprintf(stderr, "Failed to allocate header buffer \n");
> +	return -1;
> +   } 

g_malloc() never fails, it calls abort(3) on out-of-memory.  The same
applies to other g_malloc() calls in this patch.

> +   s->buf = g_malloc(PAGE_SIZE);    
> +
> +   if (s->buf == NULL) {
> +	fprintf(stderr, "Failed to allocate packet buffer \n");
> +	return -1;
> +   } 

This buffer isn't used?

> +   s->vec = g_malloc(sizeof(struct iovec) * MAX_L2TPV3_IOVCNT);;    

Slightly more concise way to say the same thing:
s->vec = g_new(struct iovec, MAX_L2TPV3_IOVCNT);

> +   if (dst && (strlen(dst) > 0 )) {         
> +      /* we now allocate it only if it we are not "listening" */
> +      s->dgram_dst = malloc(sizeof(struct sockaddr_storage));

Please use g_malloc()/g_new() instead of malloc(3).

> +    } else {
> +      sock_type = SOCK_RAW;
> +      sock_proto = 0x73;

Please use the IPPROTO_L2TP constant instead of a magic number.

> +   for (i=0;i<MAX_L2TPV3_IOVCNT/2;i++) {
> +	/* we need to allocate these only for the first time and first buffer */
> +	msgvec->msg_hdr.msg_name = NULL;
> +	msgvec->msg_hdr.msg_namelen = 0;
> +	iov =  g_malloc(sizeof(struct iovec) * 2);
> +	if (iov == NULL) {
> +	    fprintf(stderr, "Failed to allocate receive packet vec \n");
> +	    return -1;
> +	}
> +	msgvec->msg_hdr.msg_iov = iov;
> +	if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode & NEW_MODE_UDP))){
> +	    iov->iov_base = g_malloc(s->offset + sizeof(struct iphdr)); /* fix for ipv4 raw */;

Is there a way to disable the IP header included in received packets?  I
haven't looked into how IP_HDRINCL works...

> +	} else {
> +	    iov->iov_base = g_malloc(s->offset);
> +	}
> +	
> +	if (iov->iov_base == NULL) {
> +	    fprintf(stderr, "Failed to allocate receive packet vec \n");
> +	    return -1;
> +	}
> +	if ((!(s->new_mode & NEW_MODE_IP_VERSION)) && (!(s->new_mode & NEW_MODE_UDP))){
> +	    iov->iov_len = s->offset + sizeof (struct iphdr);
> +	} else {
> +	    iov->iov_len = s->offset;
> +	}

I suggest setting iov_len first, and then allocating iov_base using that
size.  This way you don't duplicate the IPv4 raw check.

> +   qemu_set_fd_handler(s->fd, net_l2tpv3_send, NULL, s);
> +
> +   if (!s) {
> +      fprintf (stderr, "l2tpv3_open : failed to set fd handler\n");
> +      return -1;
> +   }
> +
> +   /* this needs fixing too */

Please fix it :)

> +
> +   snprintf(s->nc.info_str, sizeof(s->nc.info_str),
> +             "l2tpv3: connected");
> +   return 0;
> +
> +}
> +
> +int net_init_l2tpv3(const NetClientOptions *opts,
> +                    const char *name,
> +                    NetClientState *peer) {
> +
> +
> +    const NetdevL2TPv3Options * l2tpv3;
> +    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_L2TPV3);
> +    l2tpv3 = opts->l2tpv3;
> +
> +
> +  if (!(l2tpv3->has_src && l2tpv3->has_mode && l2tpv3->has_txsession && l2tpv3->has_rxsession)) {
> +      error_report("src=, dst=, txsession=, rxsession= and mode= are required for l2tpv3");
> +      return -1;
> +  }

If they are required then I'm not sure why they were declared optional
in qapi-schema.json.  Drop the '*' in the schema and these fields become
mandatory.

> +
> +
> +  /* this needs cleaning up for new API */
> +
> +  if (net_l2tpv3_init(
> +	 peer, 
> +         name,
> +	 l2tpv3->src, 
> +	 l2tpv3->dst, 
> +	 l2tpv3->mode, 
> +	 l2tpv3->txcookie,
> +	 l2tpv3->rxcookie,
> +	 l2tpv3->txsession,
> +	 l2tpv3->rxsession
> +      
> +   ) == -1) {
> +      return -1;
> +  }
> +  return 0;
> +}
> diff --git a/net/l2tpv3.h b/net/l2tpv3.h
> new file mode 100644
> index 0000000..822e0b0
> --- /dev/null
> +++ b/net/l2tpv3.h
> @@ -0,0 +1,65 @@

A header file is not necessary.  These constants are internal to the
L2TPv3 implementation, please put them in the .c file.

> +/*
> + * QEMU System Emulator
> + *
> + * Copyright (c) 2003-2008 Fabrice Bellard
> + * Copyright (c) 2012 Cisco Systems
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu-common.h"
> +
> +#ifndef QEMU_NET_L2TPV3_H
> +#define QEMU_NET_L2TPV3_H
> +
> +#define L2TPV3_BUFSIZE 2048

Hey, I see you did define an L2TPV3_BUFSIZE at one point in time after
all ;-).  But it's unused.

> +
> +#define NEW_MODE_IP_VERSION   1		  /* on for v6, off for v4 */
> +#define NEW_MODE_UDP	      2		  /* on for udp, off for raw ip */
> +#define NEW_MODE_COOKIE	      4		  /* cookie present */
> +#define NEW_MODE_COOKIE_SIZE  8		  /* on for 64 bit */
> +#define NEW_MODE_NO_COUNTER   16	  /* DT - no counter */
> +
> +/* legacy modes */
> +
> +/* mode 0 */
> +
> +#define LEGACY_UDP6_64_NO_COUNTER (NEW_MODE_IP_VERSION + NEW_MODE_UDP + NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE + NEW_MODE_NO_COUNTER)
> +
> +#define LEGACY_MODE0 LEGACY_UDP6_64_NO_COUNTER
> +
> +/* mode 1 */
> +
> +#define LEGACY_IP6_64_NO_COUNTER (NEW_MODE_IP_VERSION + NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE + NEW_MODE_NO_COUNTER)
> +
> +#define LEGACY_MODE1 LEGACY_IP6_64_NO_COUNTER
> +
> +/* mode 2 */
> +
> +#define LEGACY_UDP4_64_COUNTER (NEW_MODE_COOKIE + NEW_MODE_UDP + NEW_MODE_COOKIE_SIZE )
> +
> +#define LEGACY_MODE2 LEGACY_UDP4_64_COUNTER

The legacy mode constants are not used, please remove.

> diff --git a/qapi-schema.json b/qapi-schema.json
> index 83fa485..192d19c 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2941,6 +2941,45 @@
>      '*udp':       'str' } }
>  
>  ##
> +# @NetdevL2TPv3Options
> +#
> +# Connect the VLAN to Ethernet over L2TPv3 Static tunnel

The term "VLAN" in QEMU is outdated.  QEMU now uses a peer-to-peer
topology of NetClientState instances.

I suggest "Connect to an Ethernet-over-L2TPv3 static tunnel"

> +#
> +# @fd: #optional file descriptor of an already opened socket

This argument isn't used.

> +#
> +# @src: source address
> +#
> +# @dst: #optional destination address
> +#
> +# @mode: bitmask mode for the tunnel (see l2tpv3.h for actual definition)
> +#
> +# @txcookie:#optional 32 or 64 bit tx cookie for the tunnel (set mode for actual type selection)
> +#
> +# @rxcookie:#optional 32 or 64 bit rx cookie for the tunnel (set mode for actual type selection)
> +#
> +# @txsession: tx 32 bit session
> +#
> +# @rxsession: rx 32 bit session
> +#
> +#
> +# Since 1.0
> +##
> +##
> +{ 'type': 'NetdevL2TPv3Options',
> +  'data': {
> +    '*fd':        'str',
> +    '*src':	  'str', 
> +    '*dst':	  'str', 
> +    '*mode':	  'str', 
> +    '*txcookie':  'str',
> +    '*rxcookie':  'str',
> +    '*txsession': 'str',
> +    '*rxsession': 'str'
> +
> +} }

I guess you already got plenty of feedback from Eric and Paolo on how to
type these parameters and avoid the bitmask :-).

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-03 14:01   ` Anton Ivanov (antivano)
@ 2014-03-04  9:36     ` Stefan Hajnoczi
  2014-03-04  9:47       ` Anton Ivanov (antivano)
  0 siblings, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2014-03-04  9:36 UTC (permalink / raw)
  To: Anton Ivanov (antivano); +Cc: Luigi Rizzo, Qemu-devel, Vincenzo Maffione

On Mon, Mar 03, 2014 at 02:01:00PM +0000, Anton Ivanov (antivano) wrote:
> On 03/03/14 13:27, Stefan Hajnoczi wrote:
> > On Fri, Feb 28, 2014 at 08:28:11AM +0000, Anton Ivanov (antivano) wrote:
> >> 3. Qemu to communicate with the local host, remote vms, network devices,
> >> etc at speeds which for a number of use cases exceed the speed of the
> >> legacy tap driver.
> > This surprises me.  It's odd that tap performs significantly worse.
> 
> 
> Multipacket RX can go a very long way and it does not work on tap's 
> emulation of a raw socket. At least in 3.2 :)

Luigi and Vincenzo had ideas on making QEMU's net layer support
multipacket tx using something like TCP_CORK.  This would map to
sendmmsg(2).

Basically the net client gets multiple .receive() calls but is told to
hold off on submitting the packets.  Then, when it finally gets
uncorked, it can sendmmsg(2).  The only issue is we need to hold on to
the tx buffers longer than normal.

> > Now about the tap userspace ABI, is the performance bottleneck that the
> > read(2) system call only receives one packet at a time?  The tap file
> > descriptor is not a socket so recvmmsg(2) cannot be used on it directly.
> 
> If I read the kernel source correctly the tap fd can emulate a socket 
> for some calls. However, when I try recvmmsg I get an ENOTSOCKET.

The fd is not a real socket.  Confusingly, inside the kernel the tun.c
driver has a "socket" which is used for zero-copy tx by vhost_net.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04  9:36     ` Stefan Hajnoczi
@ 2014-03-04  9:47       ` Anton Ivanov (antivano)
  2014-03-05  8:59         ` Stefan Hajnoczi
  0 siblings, 1 reply; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-03-04  9:47 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Luigi Rizzo, Qemu-devel, Vincenzo Maffione

On 04/03/14 09:36, Stefan Hajnoczi wrote:
> On Mon, Mar 03, 2014 at 02:01:00PM +0000, Anton Ivanov (antivano) wrote:
>> On 03/03/14 13:27, Stefan Hajnoczi wrote:
>>> On Fri, Feb 28, 2014 at 08:28:11AM +0000, Anton Ivanov (antivano) wrote:
>>>> 3. Qemu to communicate with the local host, remote vms, network devices,
>>>> etc at speeds which for a number of use cases exceed the speed of the
>>>> legacy tap driver.
>>> This surprises me.  It's odd that tap performs significantly worse.
>>
>> Multipacket RX can go a very long way and it does not work on tap's
>> emulation of a raw socket. At least in 3.2 :)
> Luigi and Vincenzo had ideas on making QEMU's net layer support
> multipacket tx using something like TCP_CORK.  This would map to
> sendmmsg(2).
>
> Basically the net client gets multiple .receive() calls but is told to
> hold off on submitting the packets.  Then, when it finally gets
> uncorked, it can sendmmsg(2).  The only issue is we need to hold on to
> the tx buffers longer than normal.

Cool, I will be happy to give a hand with that.

My main problem so far trying to implement it has been the timers - the 
qemu internal timer API has no relative timers, only absolute. So you 
end up with a very high cost of setting and checking a delayed xmit timer.

>
>>> Now about the tap userspace ABI, is the performance bottleneck that the
>>> read(2) system call only receives one packet at a time?  The tap file
>>> descriptor is not a socket so recvmmsg(2) cannot be used on it directly.
>> If I read the kernel source correctly the tap fd can emulate a socket
>> for some calls. However, when I try recvmmsg I get an ENOTSOCKET.
> The fd is not a real socket.  Confusingly, inside the kernel the tun.c
> driver has a "socket" which is used for zero-copy tx by vhost_net.
That explains it.

Otherwise I am nearly done incorporating all comments. An updated 
version should be available some time this week.

A.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-03 14:53 ` Stefan Hajnoczi
@ 2014-03-04 11:32   ` Anton Ivanov (antivano)
  2014-03-05  9:07     ` Stefan Hajnoczi
  0 siblings, 1 reply; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-03-04 11:32 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Qemu-devel

Hi Stefan,

I am addressing a few more comments which I missed on first pass.

> If you really *need* the page size, please use sysconf(_SC_PAGESIZE).

I like to have it page aligned and if possible page sized so I can later 
extend to do jumbo frame support via a vector. If this is the wrong way 
of doing it, I am happy to fix.

>
> If you just need a big buffer size, please rename this to L2TPV3_BUFSIZE
> or similar to avoid linking the name to page size.

At this stage ~ 1580bytes will do. However, later I would like to extend 
this so it works with jumbo frames so you can pass 8K packets if one 
wants to. My current ideas are that in order for that to happen 
efficiently the vectors should have 2-3 buffers page sized each 
(allocated only if the user request jumbo frames).

[snip]


> +    count = recvmmsg(s->fd, msgvec, MAX_L2TPV3_IOVCNT/2, MSG_DONTWAIT, NULL);
> +    for (i=0;i<count;i++) {
> +	if (msgvec->msg_len > 0) {
> +	    vec = msgvec->msg_hdr.msg_iov;
> +	    if (l2tpv3_verify_header(s, vec->iov_base) == 0) {
> Header verification should check the size of the packet too.  It must be
> large enough for the L2TPv3 header, otherwise we'd access stale data
> that happened to be in vec->iov_base...

That should be

msgvec->msg_len > offset


Thanks for noting it.


>
> +	    //vec->iov_len = PAGE_SIZE; /* reset for next read */
> I think it *is* necessary to reset ->iov_len for both msgvec iovecs.

mmsgsend does not return these modified. However better be safe than 
sorry - I am uncommenting these in the next revision.


> Can you use C structs and unions instead of choosing an arbitrary
> 256-byte size and calculating offsets at runtime?

It is has now updated to be the correct size for the actual config.

As far as structs - not really.

I tried that once upon a time in an early version, I ended up with 8+ 
different structs (cookies can vary in size so you cannot union-ize 
them, compiler will allocate the size for the "biggest option"). In 
addition to that the standard has slightly different headers on raw and 
udp. The linux kernel people have done the same - header offsets. It is 
an unfortunate necessity for code like this.

Also, there is one nearly universal non-standard feature which I would 
like to put back. It is present in the linux kernel implementation and 
it is the "arbitrary offset after header" so you can stick extra 
metadata between header and packet. That will necessitate offset 
calculations anyway.

>
> typedef struct {
>      uint16_t flags;
>      uint16_t reserved;
>      uint32_t session_id;
>      union {
>      	uint32_t cookie32;

[snip]


> This buffer isn't used?

Not any more, removing.


> Is there a way to disable the IP header included in received packets? 
> I haven't looked into how IP_HDRINCL works... 

It works the other way - you can get headers on v6 using that option, 
but v6 does not give you headers by default. AFAIK v4 raw always gives 
you the header do you like it or not. Makes the code very ugly 
unfortunately.

[snip]

A.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-02-28 10:02 ` Paolo Bonzini
  2014-02-28 11:17   ` Anton Ivanov (antivano)
  2014-02-28 13:55   ` Anton Ivanov (antivano)
@ 2014-03-04 15:19   ` Anton Ivanov (antivano)
  2014-03-04 15:22     ` Anton Ivanov (antivano)
                       ` (2 more replies)
  2 siblings, 3 replies; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-03-04 15:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2138 bytes --]

Attached is a revised version.

     * I am still keeping the sendmsg instead of iov_send because I have 
not had the time to review udp.c in the kernel and do the relevant 
regression testing to see if they connect() still breaks bind() on 
multihomed hosts as it did in 2.6. We can revisit that at a later date, 
without doing the proper investigation I am not comfortable trying this.

     * I have killed completely parse6 and replaced that by a common 
getaddrinfo()  with a selectable address family based on the boolean 
flags. This also allows to force a specific v4 or v6 address choice for dst.

     * addresses now are specified separately from ports and ports are 
deliberately strings so you can specify them as a protocol.

     * mode bitmask is gone it is all booleans now - both for option 
invocation and internally in the code.

     * I have added the extra offset back in so it is feature by feature 
compatible with linux kernel implementation.

     * All mallocs are now to exact size and

     * Indentation, style, debug, etc are all as requested now.

     * l2tpv3.h has become surplus to requirements and is gone now.

     * it now has proper cleanup.

     * I have tested it for a few setups (mostly v4) and it works as 
advertised. I need to rewrite my tests scripts for the new options names 
to give it a full test. I would not expect that to show any problems 
though - core send/receive logic is unchanged from the original version.

     * I have not yet addressed the page size and page alignment of 
buffers items - leaving that open for an RFC on how to get the max 
performance there and how to make it expandable lately so one can 
re-configure it for large MTU/jumbo frames.

Example magic incantation to invoke it with the new options:

#!/bin/sh

kvm -hda kvm.img  -m 1024 \
    -net nic,vlan=0,model=virtio,macaddr=0a:98:fc:96:83:01 \
    -net l2tpv3,vlan=0,udp,src=192.168.63.1,srcport=1700,dst=192.168.63.1,dstport=1701,cookie64,txcookie=0x0123456789abcdef,rxcookie=0xfedcba9876543210,rxsession=0xffffffff,txsession=0xffffffff,counter

A.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: l2tpv3.diff --]
[-- Type: text/x-patch; name="l2tpv3.diff", Size: 16637 bytes --]

diff --git a/net/Makefile.objs b/net/Makefile.objs
index 4854a14..160214e 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -2,6 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
 common-obj-y += socket.o
 common-obj-y += dump.o
 common-obj-y += eth.o
+common-obj-$(CONFIG_LINUX) += l2tpv3.o
 common-obj-$(CONFIG_POSIX) += tap.o
 common-obj-$(CONFIG_LINUX) += tap-linux.o
 common-obj-$(CONFIG_WIN32) += tap-win32.o
diff --git a/net/clients.h b/net/clients.h
index 7793294..bbf177c 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -47,6 +47,8 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 int net_init_bridge(const NetClientOptions *opts, const char *name,
                     NetClientState *peer);
 
+int net_init_l2tpv3(const NetClientOptions *opts, const char *name,
+                    NetClientState *peer);
 #ifdef CONFIG_VDE
 int net_init_vde(const NetClientOptions *opts, const char *name,
                  NetClientState *peer);
diff --git a/net/l2tpv3.c b/net/l2tpv3.c
new file mode 100644
index 0000000..302fc1d
--- /dev/null
+++ b/net/l2tpv3.c
@@ -0,0 +1,541 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2012-2014 Cisco Systems
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <linux/ip.h>
+#include <netdb.h>
+#include "config-host.h"
+#include "net/net.h"
+#include "clients.h"
+#include "monitor/monitor.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+
+
+
+
+#define PAGE_SIZE 4096
+#define IOVSIZE 2
+#define MAX_L2TPV3_MSGCNT 32 
+#define MAX_L2TPV3_IOVCNT (MAX_L2TPV3_MSGCNT * IOVSIZE)
+
+#ifndef IPPROTO_L2TP
+#define IPPROTO_L2TP 0x73
+#endif
+
+typedef struct NetL2TPV3State {
+    NetClientState nc;
+    int fd;
+    int state; 
+    unsigned int index;
+    unsigned int packet_len;
+
+    /* 
+     *	these are used for xmit - that happens packet a time
+     *	and for first sign of life packet (easier to parse that once)
+     */ 
+
+    uint8_t * header_buf;
+    struct iovec * vec;
+
+    /* 
+     * these are used for receive - try to "eat" up to 32 packets at a time
+     */
+
+    struct mmsghdr * msgvec;
+
+    /*
+     * peer address
+     */
+
+    struct sockaddr_storage * dgram_dst; 
+    uint32_t dst_size;
+
+    /*
+     * L2TPv3 parameters
+     */
+
+    uint64_t rx_cookie;
+    uint64_t tx_cookie;
+    uint32_t rx_session;
+    uint32_t tx_session;
+    uint32_t header_size;
+    uint32_t counter;
+
+    /*
+     * Bitmask mode determining encaps behaviour
+     */
+
+    uint32_t offset;   
+    uint32_t cookie_offset;
+    uint32_t counter_offset;
+    uint32_t session_offset;
+
+    /* Flags */
+
+    bool ipv6;
+    bool udp;
+    bool nocounter;
+    bool cookie;
+    bool cookie_is_64;
+
+} NetL2TPV3State;
+
+typedef struct NetL2TPV3ListenState {
+    NetClientState nc;
+    char *model;
+    char *name;
+    int fd;
+} NetL2TPV3ListenState;
+
+static int l2tpv3_form_header(NetL2TPV3State *s) {
+    uint32_t *header;
+    uint32_t *session;
+    uint64_t *cookie64;
+    uint32_t *cookie32;
+    uint32_t *counter;
+
+    if (s->udp == TRUE) {
+	header = (uint32_t *) s->header_buf;
+	stl_be_p(header, 0x30000);
+    }
+    session = (uint32_t *) (s->header_buf + s->session_offset);
+    stl_be_p(session, s->tx_session);
+
+    if (s->cookie == TRUE ) {
+	if (s->cookie_is_64 == TRUE) {
+	    cookie64 = (uint64_t *)(s->header_buf + s->cookie_offset);
+	    stq_be_p(cookie64, s->tx_cookie);
+	} else {
+	    cookie32 = (uint32_t *) (s->header_buf + s->cookie_offset);
+	    stl_be_p(cookie32, s->tx_cookie);
+	}
+    }
+
+    if (s->nocounter == FALSE) {
+	counter = (uint32_t *)(s->header_buf + s->counter_offset);
+	stl_be_p(counter, ++ s->counter);
+    }
+    return 0;
+}
+
+static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
+{
+    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
+
+    struct msghdr message;
+    int ret;
+
+    if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
+	fprintf(stderr, "iovec too long %d > %d, change l2tpv3.h\n", iovcnt, MAX_L2TPV3_IOVCNT);
+	return -1;
+    }
+    l2tpv3_form_header(s);
+    memcpy(s->vec + 1, iov, iovcnt * sizeof(struct iovec));
+    s->vec->iov_base = s->header_buf;
+    s->vec->iov_len = s->offset;
+    message.msg_name = s->dgram_dst;
+    message.msg_namelen = s->dst_size;
+    message.msg_iov = (struct iovec *) s->vec;
+    message.msg_iovlen = iovcnt + 1; 
+    message.msg_control = NULL;
+    message.msg_controllen = 0;
+    message.msg_flags = 0;
+    ret = sendmsg(s->fd, &message, MSG_DONTWAIT) - s->offset;
+    if (ret < 0) {
+	ret = - errno;
+    } else if (ret == 0) {
+	ret = iov_size (iov, iovcnt);
+    } else {
+	ret =- s->offset; 
+    }
+    return ret;
+}
+
+static ssize_t net_l2tpv3_receive_dgram(NetClientState *nc, const uint8_t *buf, size_t size)
+{
+    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
+
+    struct iovec * vec;
+    struct msghdr message;
+    ssize_t ret = 0;
+
+    l2tpv3_form_header(s);
+    vec = s->vec;
+    vec->iov_base = s->header_buf;
+    vec->iov_len = s->offset;
+    vec++;
+    vec->iov_base = (void *) buf;
+    vec->iov_len = size;
+    message.msg_name = s->dgram_dst;
+    message.msg_namelen = s->dst_size;
+    message.msg_iov = (struct iovec *) s->vec;
+    message.msg_iovlen = 2;
+    message.msg_control = NULL;
+    message.msg_controllen = 0;
+    message.msg_flags = 0;
+    ret = sendmsg(s->fd, &message, 0);
+    if (ret < 0) {
+	ret = - errno;
+    } else if (ret == 0) {
+	ret = size;
+    } else {
+	ret =- s->offset; 
+    }
+    return ret;
+}
+
+static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf) {
+
+    uint64_t *cookie64;
+    uint32_t *cookie32;
+    uint32_t *session;
+
+    if ((s->udp == FALSE) && (s->ipv6 == FALSE)){
+	buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
+    } 
+    if (s->cookie == TRUE) {
+	if (s->cookie_is_64 == TRUE) {
+	    /* 64 bit cookie */
+	    cookie64 = (uint64_t *)(buf + s->cookie_offset);
+	    if ( ldq_be_p(cookie64) != s->rx_cookie) {
+		fprintf(stderr, "unknown cookie id\n");
+		return -1; /* we need to return 0, otherwise barfus */
+	    }
+	} else {
+	    cookie32 = (uint32_t *)(buf + s->cookie_offset);
+	    if (ldl_be_p(cookie32) != * (uint32_t *) &s->rx_cookie) {
+		fprintf(stderr,"unknown cookie id\n");
+		return -1 ; /* we need to return 0, otherwise barfus */
+	    }
+	}
+    }
+    session = (uint32_t *) (buf + s->session_offset);
+    if (ldl_be_p(session) != s->rx_session) {
+	fprintf(stderr,"session mismatch\n");
+	return -1;
+    }	
+    return 0;
+}
+
+static void net_l2tpv3_send(void *opaque)
+{
+    NetL2TPV3State *s = opaque;
+
+    int i, count, offset;
+    struct mmsghdr * msgvec;
+    struct iovec * vec;
+
+    msgvec = s->msgvec;
+    offset = s->offset;
+    if ((s->udp == FALSE) && (s->ipv6 == FALSE)){
+	offset +=   sizeof(struct iphdr);
+    }  
+    count = recvmmsg(s->fd, msgvec, MAX_L2TPV3_MSGCNT, MSG_DONTWAIT, NULL);
+    for (i=0;i<count;i++) {
+	if (msgvec->msg_len > 0) {
+	    vec = msgvec->msg_hdr.msg_iov;
+	    vec->iov_len = offset; /* belt and braces - restore iov size */
+	    if ((msgvec->msg_len > 0) && (l2tpv3_verify_header(s, vec->iov_base) == 0)) {
+		vec++;
+		qemu_send_packet(&s->nc, vec->iov_base, msgvec->msg_len - offset);
+	    } else {
+		fprintf(stderr, "l2tpv3 header verification failed\n");
+		vec++; 
+	    }
+	    vec->iov_len = PAGE_SIZE; /* belt and braces - restore iov size */
+	} 
+	msgvec++;
+    }
+}
+
+static void destroy_vector(struct mmsghdr * msgvec, int count, int iovcount) {
+    int i, j;
+    struct iovec * iov;
+    struct mmsghdr * cleanup = msgvec;
+    if (cleanup) {
+	for (i=0;i<count;i++) {
+	    if (cleanup->msg_hdr.msg_iov) {
+		iov = cleanup->msg_hdr.msg_iov;
+		for (j=0;j<iovcount;j++) {
+		    if (iov->iov_base) {
+			g_free(iov->iov_base);
+		    }
+		    iov++;
+		}
+		g_free(cleanup->msg_hdr.msg_iov);
+	    }
+	    cleanup++;
+	}
+	g_free(msgvec);
+    }
+}
+
+static struct mmsghdr * build_l2tpv3_vector(NetL2TPV3State *s, int count) {
+    int i;
+    struct iovec * iov;
+    struct mmsghdr * msgvec, *result;
+
+    msgvec = g_malloc(sizeof(struct mmsghdr) * count);
+    result = msgvec;
+    for (i=0;i < count ;i++) {
+	msgvec->msg_hdr.msg_name = NULL;
+	msgvec->msg_hdr.msg_namelen = 0;
+	iov =  g_malloc(sizeof(struct iovec) * IOVSIZE);
+	msgvec->msg_hdr.msg_iov = iov;
+	if ((s->udp == FALSE) && (s->ipv6 == FALSE)){
+	    iov->iov_base = g_malloc(s->offset + sizeof(struct iphdr)); /* fix for ipv4 raw */;
+	    iov->iov_len = s->offset + sizeof (struct iphdr);
+	} else {
+	    iov->iov_base = g_malloc(s->offset);
+	    iov->iov_len = s->offset;
+	}
+	iov++ ;
+	iov->iov_base = qemu_memalign(PAGE_SIZE, PAGE_SIZE);
+	iov->iov_len = PAGE_SIZE;
+	msgvec->msg_hdr.msg_iovlen = 2;
+	msgvec->msg_hdr.msg_control = NULL;
+	msgvec->msg_hdr.msg_controllen = 0;
+	msgvec->msg_hdr.msg_flags = 0;
+	msgvec++;
+    }
+    return result;
+}
+
+static void net_l2tpv3_cleanup(NetClientState *nc) 
+{
+    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
+    qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
+    close(s->fd);
+    destroy_vector(s->msgvec, MAX_L2TPV3_MSGCNT, IOVSIZE);
+    g_free(s->header_buf);
+    g_free(s->dgram_dst);
+}
+
+static NetClientInfo net_l2tpv3_info = {
+    .type = NET_CLIENT_OPTIONS_KIND_L2TPV3,
+    .size = sizeof(NetL2TPV3State),
+    .receive = net_l2tpv3_receive_dgram,
+    .receive_iov = net_l2tpv3_receive_dgram_iov,
+    .cleanup = net_l2tpv3_cleanup,
+};
+
+int net_init_l2tpv3(const NetClientOptions *opts,
+                    const char *name,
+                    NetClientState *peer) {
+
+
+    const NetdevL2TPv3Options * l2tpv3;
+    NetL2TPV3State *s;
+    NetClientState *nc;
+    int fd;
+    struct addrinfo hints;
+    struct addrinfo * result = NULL;
+    char * srcport, * dstport;
+
+    nc = qemu_new_net_client(&net_l2tpv3_info, peer, "l2tpv3", name);
+
+    s = DO_UPCAST(NetL2TPV3State, nc, nc);
+
+    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_L2TPV3);
+    l2tpv3 = opts->l2tpv3;
+
+        /* Form mode bitmask */
+
+    if ((l2tpv3->has_ipv6) && (l2tpv3->ipv6 == TRUE)) {
+	s->ipv6 = l2tpv3->ipv6;
+    } else {
+	s->ipv6 = FALSE;
+    }
+
+    if ((l2tpv3->has_rxcookie) || (l2tpv3->has_txcookie)) {
+	if ((l2tpv3->has_rxcookie) && (l2tpv3->has_txcookie)) {
+	    s->cookie = TRUE;
+	} else {
+	    return -1;
+	}
+    } else {
+	s->cookie = FALSE;
+    }
+
+    if ((l2tpv3->has_cookie64) || (l2tpv3->cookie64 == TRUE)) {
+	s->cookie_is_64  = TRUE;
+    } else {
+	s->cookie_is_64  = FALSE;
+    }
+
+    if ((l2tpv3->has_udp) && (l2tpv3->udp == TRUE)) {
+	s->udp = TRUE;
+	if (!(l2tpv3->has_srcport && l2tpv3->has_dstport)) {
+	    fprintf(stderr, "l2tpv3_open : need both src and dst port for udp\n");
+	    return -1;
+	} else {
+	    srcport = l2tpv3->srcport;
+	    dstport = l2tpv3->dstport;
+	}
+    } else {
+	s->udp = FALSE;
+	srcport = NULL;
+	dstport = NULL;
+    }
+
+    if ((l2tpv3->has_counter) && (l2tpv3->counter == FALSE)) {
+	s->nocounter = TRUE;
+    } else {
+	s->nocounter = FALSE;
+    }
+
+    s->offset = 4;
+    s->session_offset = 0;
+    s->cookie_offset = 4;
+    s->counter_offset = 4;
+
+    s->tx_session = l2tpv3->txsession;
+    if (l2tpv3->has_rxsession) {
+	s->rx_session = l2tpv3->rxsession;
+    } else {
+	s->rx_session = s->tx_session;
+    }
+
+    if (s->cookie == TRUE) {
+	s->rx_cookie = l2tpv3->rxcookie;
+	s->tx_cookie = l2tpv3->txcookie;
+	if (s->cookie_is_64 == TRUE) {
+	    /* 64 bit cookie */
+	    s->offset += 8;
+	    s->counter_offset += 8;
+	} else {
+	    /* 32 bit cookie */
+	    s->offset += 4;
+	    s->counter_offset +=4;
+	}
+    }
+
+    memset(&hints, 0, sizeof(hints));
+    
+    if (s->ipv6 == TRUE) {
+	hints.ai_family = AF_INET6;
+    } else {
+	hints.ai_family = AF_INET;
+    }
+    if (s->udp == TRUE) {
+	hints.ai_socktype = SOCK_DGRAM;
+	hints.ai_protocol = 0;
+	s->offset += 4;
+	s->counter_offset += 4;
+	s->session_offset += 4;
+	s->cookie_offset += 4;
+    } else {
+	hints.ai_socktype = SOCK_DGRAM;
+	hints.ai_protocol = IPPROTO_L2TP;
+    }
+
+    if ((getaddrinfo(l2tpv3->src, srcport, &hints, &result) !=0) || (result == NULL)) {
+	fd = -errno;
+	fprintf(stderr, "l2tpv3_open : could not resolve src, " "errno = %d\n", fd);
+	return -1;
+    }
+
+
+    if ((fd = socket(result->ai_family, result->ai_socktype, result->ai_protocol)) == -1) {
+	fd = -errno;
+	fprintf(stderr, "l2tpv3_open : socket creation failed, " "errno = %d\n", -fd);
+	freeaddrinfo(result);
+	return fd;
+    } 
+    if (bind(fd, (struct sockaddr *) result->ai_addr, result->ai_addrlen)) {
+	fprintf(stderr, "l2tpv3_open :  could not bind socket err=%i\n", errno);
+	close(fd);
+	return -1;
+    }
+
+    freeaddrinfo(result);
+
+    memset(&hints, 0, sizeof(hints));
+    
+    if (s->ipv6 == TRUE) {
+	hints.ai_family = AF_INET6;
+    } else {
+	hints.ai_family = AF_INET;
+    }
+    if (s->udp == TRUE) {
+	hints.ai_socktype = SOCK_DGRAM;
+	hints.ai_protocol = 0;
+    } else {
+	hints.ai_socktype = SOCK_DGRAM;
+	hints.ai_protocol = IPPROTO_L2TP;
+    }
+
+    if ((getaddrinfo(l2tpv3->dst, dstport, &hints, &result) !=0) || (result == NULL)) {
+	fprintf(stderr, "l2tpv3_open : could not resolve dst, " "errno = %d\n", -fd);
+	return -1;
+    }
+
+    s->dgram_dst = g_malloc(sizeof(struct sockaddr_storage));
+    memset(s->dgram_dst, '\0' , sizeof(struct sockaddr_storage));
+    memcpy(s->dgram_dst, result->ai_addr, result->ai_addrlen);
+    s->dst_size = result->ai_addrlen;
+
+    freeaddrinfo(result);
+
+    if (s->nocounter == FALSE) {
+	s->offset += 4;
+    }
+     
+    if (l2tpv3->has_offset) {
+	/* extra offset */
+	s->offset += l2tpv3->offset;
+    }
+
+    s->msgvec = build_l2tpv3_vector(s, MAX_L2TPV3_MSGCNT);
+    s->vec = g_malloc(sizeof(struct iovec) * MAX_L2TPV3_IOVCNT);    
+    if ((s->udp == FALSE) && (s->ipv6 == FALSE)){
+	s->header_buf = g_malloc(s->offset + sizeof (struct iphdr));    
+    } else {
+	s->header_buf = g_malloc(s->offset);    
+    }
+
+    qemu_set_nonblock(fd);
+
+    if (fd < 0)
+	return -1;
+
+    s->fd = fd;
+    s->counter = 0;
+
+    qemu_set_fd_handler(s->fd, net_l2tpv3_send, NULL, s);
+
+    if (!s) {
+	fprintf (stderr, "l2tpv3_open : failed to set fd handler\n");
+	return -1;
+    }
+    snprintf(s->nc.info_str, sizeof(s->nc.info_str),
+             "l2tpv3: connected");
+    return 0;
+}
+
diff --git a/net/net.c b/net/net.c
index 0a88e68..d03f64d 100644
--- a/net/net.c
+++ b/net/net.c
@@ -731,6 +731,9 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
         [NET_CLIENT_OPTIONS_KIND_BRIDGE]    = net_init_bridge,
 #endif
         [NET_CLIENT_OPTIONS_KIND_HUBPORT]   = net_init_hubport,
+#ifdef CONFIG_LINUX
+        [NET_CLIENT_OPTIONS_KIND_L2TPV3]       = net_init_l2tpv3,
+#endif
 };
 
 

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 15:19   ` Anton Ivanov (antivano)
@ 2014-03-04 15:22     ` Anton Ivanov (antivano)
  2014-03-04 15:53       ` Eric Blake
  2014-03-04 15:41     ` Eric Blake
  2014-03-04 16:33     ` Eric Blake
  2 siblings, 1 reply; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-03-04 15:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2328 bytes --]

Apologies, missed to diff the json definitions.

Attached.

On 04/03/14 15:19, Anton Ivanov wrote:
> Attached is a revised version.
>
>     * I am still keeping the sendmsg instead of iov_send because I 
> have not had the time to review udp.c in the kernel and do the 
> relevant regression testing to see if they connect() still breaks 
> bind() on multihomed hosts as it did in 2.6. We can revisit that at a 
> later date, without doing the proper investigation I am not 
> comfortable trying this.
>
>     * I have killed completely parse6 and replaced that by a common 
> getaddrinfo()  with a selectable address family based on the boolean 
> flags. This also allows to force a specific v4 or v6 address choice 
> for dst.
>
>     * addresses now are specified separately from ports and ports are 
> deliberately strings so you can specify them as a protocol.
>
>     * mode bitmask is gone it is all booleans now - both for option 
> invocation and internally in the code.
>
>     * I have added the extra offset back in so it is feature by 
> feature compatible with linux kernel implementation.
>
>     * All mallocs are now to exact size and
>
>     * Indentation, style, debug, etc are all as requested now.
>
>     * l2tpv3.h has become surplus to requirements and is gone now.
>
>     * it now has proper cleanup.
>
>     * I have tested it for a few setups (mostly v4) and it works as 
> advertised. I need to rewrite my tests scripts for the new options 
> names to give it a full test. I would not expect that to show any 
> problems though - core send/receive logic is unchanged from the 
> original version.
>
>     * I have not yet addressed the page size and page alignment of 
> buffers items - leaving that open for an RFC on how to get the max 
> performance there and how to make it expandable lately so one can 
> re-configure it for large MTU/jumbo frames.
>
> Example magic incantation to invoke it with the new options:
>
> #!/bin/sh
>
> kvm -hda kvm.img  -m 1024 \
>    -net nic,vlan=0,model=virtio,macaddr=0a:98:fc:96:83:01 \
>    -net 
> l2tpv3,vlan=0,udp,src=192.168.63.1,srcport=1700,dst=192.168.63.1,dstport=1701,cookie64,txcookie=0x0123456789abcdef,rxcookie=0xfedcba9876543210,rxsession=0xffffffff,txsession=0xffffffff,counter
>
> A.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: l2tpv3-json.diff --]
[-- Type: text/x-patch; name="l2tpv3-json.diff", Size: 2053 bytes --]

diff --git a/qapi-schema.json b/qapi-schema.json
index 83fa485..56eac6d 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2940,6 +2940,62 @@
     '*localaddr': 'str',
     '*udp':       'str' } }
 
+# @NetdevL2TPv3Options
+#
+# Connect the VLAN to Ethernet over L2TPv3 Static tunnel
+#
+# @ipv6: #bool, use ipv6 
+#
+# @udp: #bool use the udp version of the L2TPv3 encapsulation
+#
+# @cookie64 : #use 64 bit cookies
+#
+# @offset : #extra offset
+#
+# @counter : #have sequence counter
+#
+# @fd: #optional file descriptor of an already opened socket
+#
+# @src: #source address
+#
+# @srcport: #source port - mandatory for udp, optional for ip
+#
+# @dst: #destination address
+#
+# @dstport: #destination port - mandatory for udp, optional for ip
+#
+# @txcookie: #optional 32 or 64 bit tx cookie for the tunnel 
+#
+# @rxcookie: #optional 32 or 64 bit rx cookie for the tunnel
+#
+# @txsession: #tx 32 bit session
+#
+# @rxsession: #rx 32 bit session - if unset value for txsession is used
+#
+#
+# Since 1.2
+##
+##
+{ 'type': 'NetdevL2TPv3Options',
+  'data': {
+    '*fd':        'str',
+    'src':	  'str', 
+    'dst':	  'str', 
+    '*srcport':	  'str', 
+    '*dstport':	  'str', 
+    '*ipv6':	  'bool', 
+    '*udp':	  'bool', 
+    '*cookie64':  'bool', 
+    '*counter':   'bool', 
+    '*txcookie':  'uint64',
+    '*rxcookie':  'uint64',
+    'txsession': 'uint32',
+    '*rxsession': 'uint32',
+    '*offset':  'uint32' 
+
+} }
+
+##
 ##
 # @NetdevVdeOptions
 #
@@ -3014,13 +3070,16 @@
 # A discriminated record of network device traits.
 #
 # Since 1.2
-##
+#
+# Added in 2.0 - l2tpv3 
+#
 { 'union': 'NetClientOptions',
   'data': {
     'none':     'NetdevNoneOptions',
     'nic':      'NetLegacyNicOptions',
     'user':     'NetdevUserOptions',
     'tap':      'NetdevTapOptions',
+    'l2tpv3':   'NetdevL2TPv3Options',
     'socket':   'NetdevSocketOptions',
     'vde':      'NetdevVdeOptions',
     'dump':     'NetdevDumpOptions',

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 15:19   ` Anton Ivanov (antivano)
  2014-03-04 15:22     ` Anton Ivanov (antivano)
@ 2014-03-04 15:41     ` Eric Blake
  2014-03-04 15:58       ` Anton Ivanov (antivano)
  2014-03-04 16:33     ` Eric Blake
  2 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2014-03-04 15:41 UTC (permalink / raw)
  To: Anton Ivanov (antivano), Paolo Bonzini; +Cc: Qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 496 bytes --]

On 03/04/2014 08:19 AM, Anton Ivanov (antivano) wrote:
> Attached is a revised version.

Better is to post a new top-level thread, with [PATCHv2] in the title.
Also, one of the v1 review comments was to break this into smaller
self-contained patches - that review request still applies.

For more hints on patch submission, see:
http://wiki.qemu.org/Contribute/SubmitAPatch


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 15:22     ` Anton Ivanov (antivano)
@ 2014-03-04 15:53       ` Eric Blake
  2014-03-04 16:05         ` Anton Ivanov (antivano)
  2014-03-05  8:49         ` Anton Ivanov (antivano)
  0 siblings, 2 replies; 36+ messages in thread
From: Eric Blake @ 2014-03-04 15:53 UTC (permalink / raw)
  To: Anton Ivanov (antivano), Paolo Bonzini; +Cc: Qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 3678 bytes --]

On 03/04/2014 08:22 AM, Anton Ivanov (antivano) wrote:
> Apologies, missed to diff the json definitions.
> 
> Attached.
> 

Missing a commit message and a Signed-off-by line, so it can't be
applied as-is.  Also, we prefer inline patches (as sent by 'git
send-email') over attached patches; I suggest using 'git send-email' to
first send the patches to yourself to make sure your settings are correct.


> diff --git a/qapi-schema.json b/qapi-schema.json
> index 83fa485..56eac6d 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2940,6 +2940,62 @@
>      '*localaddr': 'str',
>      '*udp':       'str' } }
>  
> +# @NetdevL2TPv3Options
> +#
> +# Connect the VLAN to Ethernet over L2TPv3 Static tunnel
> +#
> +# @ipv6: #bool, use ipv6 
> +#

This should be:

# @ipv6: #optional true to use ipv6, default false

That is, we have a marking '#optional', but do not have a marking
'#bool' (you can read the actual definition below to learn that 'ipv6'
is a bool type).

Also, it makes it easier if you document options in the same order as
they appear in the struct below (there, you have 'fd' first).

> +# @udp: #bool use the udp version of the L2TPv3 encapsulation

Again, #optional, not #bool, and mention the default value

> +#
> +# @cookie64 : #use 64 bit cookies

#optional

> +#
> +# @offset : #extra offset

#optional

> +#
> +# @counter : #have sequence counter

#optional

> +#
> +# @fd: #optional file descriptor of an already opened socket

This doesn't seem to take into account my earlier comments - is the goal
to allow opening both files from the file system and the magic string
'/dev/fdset/...' supported by use of 'add-fd'?

> +#
> +# @src: #source address

Should be:

# @src: source address

The only use of # inside the docs has been for the tag '#optional'

> +#
> +# @srcport: #source port - mandatory for udp, optional for ip
> +#
> +# @dst: #destination address
> +#
> +# @dstport: #destination port - mandatory for udp, optional for ip
> +#
> +# @txcookie: #optional 32 or 64 bit tx cookie for the tunnel 
> +#
> +# @rxcookie: #optional 32 or 64 bit rx cookie for the tunnel
> +#
> +# @txsession: #tx 32 bit session
> +#
> +# @rxsession: #rx 32 bit session - if unset value for txsession is used

Should be:

# @rxsession: #optional rx 32 bit session, defaults to @txsession

> +#
> +#
> +# Since 1.2

Should be:

Since 2.0

> +##
> +##
> +{ 'type': 'NetdevL2TPv3Options',
> +  'data': {
> +    '*fd':        'str',
> +    'src':	  'str', 
> +    'dst':	  'str', 
> +    '*srcport':	  'str', 
> +    '*dstport':	  'str', 
> +    '*ipv6':	  'bool', 
> +    '*udp':	  'bool', 
> +    '*cookie64':  'bool', 
> +    '*counter':   'bool', 
> +    '*txcookie':  'uint64',
> +    '*rxcookie':  'uint64',
> +    'txsession': 'uint32',
> +    '*rxsession': 'uint32',
> +    '*offset':  'uint32' 
> +

Why the blank line?
> +} }
> +
> +##
>  ##
>  # @NetdevVdeOptions
>  #
> @@ -3014,13 +3070,16 @@
>  # A discriminated record of network device traits.
>  #
>  # Since 1.2
> -##
> +#
> +# Added in 2.0 - l2tpv3 
> +#
>  { 'union': 'NetClientOptions',
>    'data': {
>      'none':     'NetdevNoneOptions',
>      'nic':      'NetLegacyNicOptions',
>      'user':     'NetdevUserOptions',
>      'tap':      'NetdevTapOptions',
> +    'l2tpv3':   'NetdevL2TPv3Options',
>      'socket':   'NetdevSocketOptions',
>      'vde':      'NetdevVdeOptions',
>      'dump':     'NetdevDumpOptions',
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 15:41     ` Eric Blake
@ 2014-03-04 15:58       ` Anton Ivanov (antivano)
  2014-03-04 16:04         ` Paolo Bonzini
  0 siblings, 1 reply; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-03-04 15:58 UTC (permalink / raw)
  To: Eric Blake; +Cc: Paolo Bonzini, Qemu-devel, Stefan Hajnoczi

On 04/03/14 15:41, Eric Blake wrote:
> On 03/04/2014 08:19 AM, Anton Ivanov (antivano) wrote:
>> Attached is a revised version.
> Better is to post a new top-level thread, with [PATCHv2] in the title.
> Also, one of the v1 review comments was to break this into smaller
> self-contained patches - that review request still applies.

I will re-post in a minute.
>
> For more hints on patch submission, see:
> http://wiki.qemu.org/Contribute/SubmitAPatch
>
>
It is the smaller self-contained patch.

It is only the L2TPv3 transport - that is what it takes to implement it. 
It is smaller in size than let's say tap or socket (half the size of tap).

There is nothing else in it. It no longer touches any other common files 
either (besides adding its client init to net.c and clients.h).

A.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 15:58       ` Anton Ivanov (antivano)
@ 2014-03-04 16:04         ` Paolo Bonzini
  0 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2014-03-04 16:04 UTC (permalink / raw)
  To: Anton Ivanov (antivano), Eric Blake; +Cc: Qemu-devel, Stefan Hajnoczi

Il 04/03/2014 16:58, Anton Ivanov (antivano) ha scritto:
> On 04/03/14 15:41, Eric Blake wrote:
>> On 03/04/2014 08:19 AM, Anton Ivanov (antivano) wrote:
>>> Attached is a revised version.
>> Better is to post a new top-level thread, with [PATCHv2] in the title.
>> Also, one of the v1 review comments was to break this into smaller
>> self-contained patches - that review request still applies.
>
> I will re-post in a minute.
>>
>> For more hints on patch submission, see:
>> http://wiki.qemu.org/Contribute/SubmitAPatch
>>
>>
> It is the smaller self-contained patch.
>
> It is only the L2TPv3 transport - that is what it takes to implement it.
> It is smaller in size than let's say tap or socket (half the size of tap).
>
> There is nothing else in it. It no longer touches any other common files
> either (besides adding its client init to net.c and clients.h).

I agree that there's not much to split here.

Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 15:53       ` Eric Blake
@ 2014-03-04 16:05         ` Anton Ivanov (antivano)
  2014-03-05  8:49         ` Anton Ivanov (antivano)
  1 sibling, 0 replies; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-03-04 16:05 UTC (permalink / raw)
  To: Eric Blake; +Cc: Paolo Bonzini, Qemu-devel, Stefan Hajnoczi

On 04/03/14 15:53, Eric Blake wrote:
> On 03/04/2014 08:22 AM, Anton Ivanov (antivano) wrote:
>> Apologies, missed to diff the json definitions.
>>
>> Attached.
>>
> Missing a commit message and a Signed-off-by line, so it can't be
> applied as-is.  Also, we prefer inline patches (as sent by 'git
> send-email') over attached patches; I suggest using 'git send-email' to
> first send the patches to yourself to make sure your settings are correct.

Will do. Can you review the actual transport patch please.

>
>
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 83fa485..56eac6d 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -2940,6 +2940,62 @@
>>       '*localaddr': 'str',
>>       '*udp':       'str' } }
>>   
>> +# @NetdevL2TPv3Options
>> +#
>> +# Connect the VLAN to Ethernet over L2TPv3 Static tunnel
>> +#
>> +# @ipv6: #bool, use ipv6
>> +#
> This should be:
>
> # @ipv6: #optional true to use ipv6, default false
>
> That is, we have a marking '#optional', but do not have a marking
> '#bool' (you can read the actual definition below to learn that 'ipv6'
> is a bool type).
>
> Also, it makes it easier if you document options in the same order as
> they appear in the struct below (there, you have 'fd' first).
>
>> +# @udp: #bool use the udp version of the L2TPv3 encapsulation
> Again, #optional, not #bool, and mention the default value
>
>> +#
>> +# @cookie64 : #use 64 bit cookies
> #optional
>
>> +#
>> +# @offset : #extra offset
> #optional
>
>> +#
>> +# @counter : #have sequence counter
> #optional
>
>> +#
>> +# @fd: #optional file descriptor of an already opened socket
> This doesn't seem to take into account my earlier comments - is the goal
> to allow opening both files from the file system and the magic string
> '/dev/fdset/...' supported by use of 'add-fd'?

Yes, but not yet supported in this version. I will cut it out for now.

>
>> +#
>> +# @src: #source address
> Should be:
>
> # @src: source address
>
> The only use of # inside the docs has been for the tag '#optional'
>
>> +#
>> +# @srcport: #source port - mandatory for udp, optional for ip
>> +#
>> +# @dst: #destination address
>> +#
>> +# @dstport: #destination port - mandatory for udp, optional for ip
>> +#
>> +# @txcookie: #optional 32 or 64 bit tx cookie for the tunnel
>> +#
>> +# @rxcookie: #optional 32 or 64 bit rx cookie for the tunnel
>> +#
>> +# @txsession: #tx 32 bit session
>> +#
>> +# @rxsession: #rx 32 bit session - if unset value for txsession is used
> Should be:
>
> # @rxsession: #optional rx 32 bit session, defaults to @txsession
>
>> +#
>> +#
>> +# Since 1.2
> Should be:
>
> Since 2.0
>
>> +##
>> +##
>> +{ 'type': 'NetdevL2TPv3Options',
>> +  'data': {
>> +    '*fd':        'str',
>> +    'src':	  'str',
>> +    'dst':	  'str',
>> +    '*srcport':	  'str',
>> +    '*dstport':	  'str',
>> +    '*ipv6':	  'bool',
>> +    '*udp':	  'bool',
>> +    '*cookie64':  'bool',
>> +    '*counter':   'bool',
>> +    '*txcookie':  'uint64',
>> +    '*rxcookie':  'uint64',
>> +    'txsession': 'uint32',
>> +    '*rxsession': 'uint32',
>> +    '*offset':  'uint32'
>> +
> Why the blank line?
>> +} }
>> +
>> +##
>>   ##
>>   # @NetdevVdeOptions
>>   #
>> @@ -3014,13 +3070,16 @@
>>   # A discriminated record of network device traits.
>>   #
>>   # Since 1.2
>> -##
>> +#
>> +# Added in 2.0 - l2tpv3
>> +#
>>   { 'union': 'NetClientOptions',
>>     'data': {
>>       'none':     'NetdevNoneOptions',
>>       'nic':      'NetLegacyNicOptions',
>>       'user':     'NetdevUserOptions',
>>       'tap':      'NetdevTapOptions',
>> +    'l2tpv3':   'NetdevL2TPv3Options',
>>       'socket':   'NetdevSocketOptions',
>>       'vde':      'NetdevVdeOptions',
>>       'dump':     'NetdevDumpOptions',
>>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 15:19   ` Anton Ivanov (antivano)
  2014-03-04 15:22     ` Anton Ivanov (antivano)
  2014-03-04 15:41     ` Eric Blake
@ 2014-03-04 16:33     ` Eric Blake
  2014-03-04 16:48       ` Anton Ivanov (antivano)
  2 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2014-03-04 16:33 UTC (permalink / raw)
  To: Anton Ivanov (antivano), Paolo Bonzini; +Cc: Qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 6176 bytes --]

On 03/04/2014 08:19 AM, Anton Ivanov (antivano) wrote:
> Attached is a revised version.
> 

> +
> +
> +#define PAGE_SIZE 4096

One of your earlier review comments suggested using sysconf or else
renaming this, as not all systems have a page size of 4096.

> +#define IOVSIZE 2
> +#define MAX_L2TPV3_MSGCNT 32 
> +#define MAX_L2TPV3_IOVCNT (MAX_L2TPV3_MSGCNT * IOVSIZE)
> +
> +#ifndef IPPROTO_L2TP
> +#define IPPROTO_L2TP 0x73
> +#endif
> +
> +typedef struct NetL2TPV3State {
> +    NetClientState nc;
> +    int fd;
> +    int state; 
> +    unsigned int index;
> +    unsigned int packet_len;
> +
> +    /* 
> +     *	these are used for xmit - that happens packet a time
> +     *	and for first sign of life packet (easier to parse that once)
> +     */ 
> +
> +    uint8_t * header_buf;

Most code uses this style:

uint8_t *header_buf;

with no space between a pointer designation and the variable name it
applies to (several instances in your patch).

> +
> +    /*
> +     * Bitmask mode determining encaps behaviour
> +     */
> +
> +    uint32_t offset;   
> +    uint32_t cookie_offset;
> +    uint32_t counter_offset;
> +    uint32_t session_offset;

Comment is off, as there is no bitmask here.

> +static int l2tpv3_form_header(NetL2TPV3State *s) {
> +    uint32_t *header;
> +    uint32_t *session;
> +    uint64_t *cookie64;
> +    uint32_t *cookie32;
> +    uint32_t *counter;
> +
> +    if (s->udp == TRUE) {

We require a C99 compiler; use 'true' not 'TRUE' (glib's TRUE caters
mainly to C89 compilers, and isn't necessarily a true boolean).  For
that matter, comparison against true or false is verbose; it's fine to
just use:

if (s->udp) {

> +	header = (uint32_t *) s->header_buf;
> +	stl_be_p(header, 0x30000);
> +    }
> +    session = (uint32_t *) (s->header_buf + s->session_offset);
> +    stl_be_p(session, s->tx_session);
> +
> +    if (s->cookie == TRUE ) {
> +	if (s->cookie_is_64 == TRUE) {

More cases of TRUE that should be fixed.  Also, no space before ).


> +    if (s->nocounter == FALSE) {
> +	counter = (uint32_t *)(s->header_buf + s->counter_offset);
> +	stl_be_p(counter, ++ s->counter);
> +    }

TAB damage - we indent with spaces.  No space after preincrement ++.

> +    return 0;
> +}
> +
> +static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)

Long line; you can split after , to fit within 80 columns.

> +{
> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
> +
> +    struct msghdr message;
> +    int ret;
> +
> +    if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
> +	fprintf(stderr, "iovec too long %d > %d, change l2tpv3.h\n", iovcnt, MAX_L2TPV3_IOVCNT);
> +	return -1;

Is printing to stderr always the right thing to do?  It seems to me that
you should look into using QError.


> +    if (ret < 0) {
> +	ret = - errno;
> +    } else if (ret == 0) {
> +	ret = iov_size (iov, iovcnt);

No space before ( in function calls.


> +    vec++;
> +    vec->iov_base = (void *) buf;

This is C, not C++ - no need to cast to (void*).

> +    if (ret < 0) {
> +	ret = - errno;

No space after unary '-'.

> +    } else if (ret == 0) {
> +	ret = size;
> +    } else {
> +	ret =- s->offset; 

Trailing whitespace.  Please run your submission through
scripts/checkpatch.pl, and address all warnings.

'=-' looks odd; did you mean '= -'?


> +
> +static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf) {

Opening { on its own line.

> +
> +    uint64_t *cookie64;
> +    uint32_t *cookie32;
> +    uint32_t *session;
> +
> +    if ((s->udp == FALSE) && (s->ipv6 == FALSE)){

Too many (), missing space before { - this should be:

if (!s->udp && !s->ipv6) {

> +	buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
> +    } 
> +    if (s->cookie == TRUE) {
> +	if (s->cookie_is_64 == TRUE) {
> +	    /* 64 bit cookie */
> +	    cookie64 = (uint64_t *)(buf + s->cookie_offset);
> +	    if ( ldq_be_p(cookie64) != s->rx_cookie) {
> +		fprintf(stderr, "unknown cookie id\n");
> +		return -1; /* we need to return 0, otherwise barfus */

What's up with that comment being different from the code?

> +	    }
> +	} else {
> +	    cookie32 = (uint32_t *)(buf + s->cookie_offset);
> +	    if (ldl_be_p(cookie32) != * (uint32_t *) &s->rx_cookie) {
> +		fprintf(stderr,"unknown cookie id\n");

Space after ','.  Again, I think QError is better than directly printing
to stderr.

> +		return -1 ; /* we need to return 0, otherwise barfus */
> +	    }
> +	}
> +    }
> +    session = (uint32_t *) (buf + s->session_offset);
> +    if (ldl_be_p(session) != s->rx_session) {

Are you risking bus errors on platforms where you cannot dereference a
wide int pointer that gets set to a misaligned offset?

> +    msgvec = s->msgvec;
> +    offset = s->offset;
> +    if ((s->udp == FALSE) && (s->ipv6 == FALSE)){
> +	offset +=   sizeof(struct iphdr);
> +    }  

Whitespace damage.

> +    count = recvmmsg(s->fd, msgvec, MAX_L2TPV3_MSGCNT, MSG_DONTWAIT, NULL);
> +    for (i=0;i<count;i++) {

Should be:

for (i = 0; i < count; i++) {

Lots of other sites need similar fixes.


> +
> +    if ((getaddrinfo(l2tpv3->src, srcport, &hints, &result) !=0) || (result == NULL)) {
> +	fd = -errno;
> +	fprintf(stderr, "l2tpv3_open : could not resolve src, " "errno = %d\n", fd);

What's up with the string concatenation in the format string?

getaddrinfo() does NOT set errno.  Rather, it returns a code that you
decipher with gai_strerror().  Your error reporting here is very suspect.

> +++ b/net/net.c
> @@ -731,6 +731,9 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
>          [NET_CLIENT_OPTIONS_KIND_BRIDGE]    = net_init_bridge,
>  #endif
>          [NET_CLIENT_OPTIONS_KIND_HUBPORT]   = net_init_hubport,
> +#ifdef CONFIG_LINUX
> +        [NET_CLIENT_OPTIONS_KIND_L2TPV3]       = net_init_l2tpv3,
> +#endif

Alignment looks off.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 16:33     ` Eric Blake
@ 2014-03-04 16:48       ` Anton Ivanov (antivano)
  2014-03-04 16:55         ` Paolo Bonzini
  0 siblings, 1 reply; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-03-04 16:48 UTC (permalink / raw)
  To: Eric Blake; +Cc: Paolo Bonzini, Qemu-devel, Stefan Hajnoczi

On 04/03/14 16:33, Eric Blake wrote:
>
> +
> +
> +#define PAGE_SIZE 4096
> One of your earlier review comments suggested using sysconf or else
> renaming this, as not all systems have a page size of 4096.

OK.

>
>> +#define IOVSIZE 2
>> +#define MAX_L2TPV3_MSGCNT 32
>> +#define MAX_L2TPV3_IOVCNT (MAX_L2TPV3_MSGCNT * IOVSIZE)
>> +
>> +#ifndef IPPROTO_L2TP
>> +#define IPPROTO_L2TP 0x73
>> +#endif
>> +
>> +typedef struct NetL2TPV3State {
>> +    NetClientState nc;
>> +    int fd;
>> +    int state;
>> +    unsigned int index;
>> +    unsigned int packet_len;
>> +
>> +    /*
>> +     *	these are used for xmit - that happens packet a time
>> +     *	and for first sign of life packet (easier to parse that once)
>> +     */
>> +
>> +    uint8_t * header_buf;
> Most code uses this style:
>
> uint8_t *header_buf;
>
> with no space between a pointer designation and the variable name it
> applies to (several instances in your patch).

OK, will fix.

>
>> +
>> +    /*
>> +     * Bitmask mode determining encaps behaviour
>> +     */
>> +
>> +    uint32_t offset;
>> +    uint32_t cookie_offset;
>> +    uint32_t counter_offset;
>> +    uint32_t session_offset;
> Comment is off, as there is no bitmask here.

Forgot to clean it :)

>
>> +static int l2tpv3_form_header(NetL2TPV3State *s) {
>> +    uint32_t *header;
>> +    uint32_t *session;
>> +    uint64_t *cookie64;
>> +    uint32_t *cookie32;
>> +    uint32_t *counter;
>> +
>> +    if (s->udp == TRUE) {
> We require a C99 compiler; use 'true' not 'TRUE' (glib's TRUE caters
> mainly to C89 compilers, and isn't necessarily a true boolean).  For
> that matter, comparison against true or false is verbose; it's fine to
> just use:
>
> if (s->udp) {

OK.

>
>> +	header = (uint32_t *) s->header_buf;
>> +	stl_be_p(header, 0x30000);
>> +    }
>> +    session = (uint32_t *) (s->header_buf + s->session_offset);
>> +    stl_be_p(session, s->tx_session);
>> +
>> +    if (s->cookie == TRUE ) {
>> +	if (s->cookie_is_64 == TRUE) {
> More cases of TRUE that should be fixed.  Also, no space before ).
>
>
>> +    if (s->nocounter == FALSE) {
>> +	counter = (uint32_t *)(s->header_buf + s->counter_offset);
>> +	stl_be_p(counter, ++ s->counter);
>> +    }
> TAB damage - we indent with spaces.  No space after preincrement ++.
>
>> +    return 0;
>> +}
>> +
>> +static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
> Long line; you can split after , to fit within 80 columns.

OK
>
>> +{
>> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
>> +
>> +    struct msghdr message;
>> +    int ret;
>> +
>> +    if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
>> +	fprintf(stderr, "iovec too long %d > %d, change l2tpv3.h\n", iovcnt, MAX_L2TPV3_IOVCNT);
>> +	return -1;
> Is printing to stderr always the right thing to do?  It seems to me that
> you should look into using QError.

Thanks, will look into it.

>
>> +    if (ret < 0) {
>> +	ret = - errno;
>> +    } else if (ret == 0) {
>> +	ret = iov_size (iov, iovcnt);
> No space before ( in function calls.
>
>
>> +    vec++;
>> +    vec->iov_base = (void *) buf;
> This is C, not C++ - no need to cast to (void*).



>
>> +    if (ret < 0) {
>> +	ret = - errno;
> No space after unary '-'.
>
>> +    } else if (ret == 0) {
>> +	ret = size;
>> +    } else {
>> +	ret =- s->offset;
> Trailing whitespace.  Please run your submission through
> scripts/checkpatch.pl, and address all warnings.
>
> '=-' looks odd; did you mean '= -'?

ret - s->offset /* you need to adjust the ret so that it reflects the 
data and not data + header */

>
>
>> +
>> +static int l2tpv3_verify_header(NetL2TPV3State *s, uint8_t *buf) {
> Opening { on its own line.
>
>> +
>> +    uint64_t *cookie64;
>> +    uint32_t *cookie32;
>> +    uint32_t *session;
>> +
>> +    if ((s->udp == FALSE) && (s->ipv6 == FALSE)){
> Too many (), missing space before { - this should be:
>
> if (!s->udp && !s->ipv6) {

OK

>
>> +	buf += sizeof(struct iphdr) /* fix for ipv4 raw */;
>> +    }
>> +    if (s->cookie == TRUE) {
>> +	if (s->cookie_is_64 == TRUE) {
>> +	    /* 64 bit cookie */
>> +	    cookie64 = (uint64_t *)(buf + s->cookie_offset);
>> +	    if ( ldq_be_p(cookie64) != s->rx_cookie) {
>> +		fprintf(stderr, "unknown cookie id\n");
>> +		return -1; /* we need to return 0, otherwise barfus */
> What's up with that comment being different from the code?

Carryover from UML - you have different semantics. There you have to 
return 0.

>
>> +	    }
>> +	} else {
>> +	    cookie32 = (uint32_t *)(buf + s->cookie_offset);
>> +	    if (ldl_be_p(cookie32) != * (uint32_t *) &s->rx_cookie) {
>> +		fprintf(stderr,"unknown cookie id\n");
> Space after ','.  Again, I think QError is better than directly printing
> to stderr.

OK.

>
>> +		return -1 ; /* we need to return 0, otherwise barfus */
>> +	    }
>> +	}
>> +    }
>> +    session = (uint32_t *) (buf + s->session_offset);
>> +    if (ldl_be_p(session) != s->rx_session) {
> Are you risking bus errors on platforms where you cannot dereference a
> wide int pointer that gets set to a misaligned offset?

Maybe - I am not on such a platform so it is very difficult for me to 
assess the impact of this.

>
>> +    msgvec = s->msgvec;
>> +    offset = s->offset;
>> +    if ((s->udp == FALSE) && (s->ipv6 == FALSE)){
>> +	offset +=   sizeof(struct iphdr);
>> +    }
> Whitespace damage.
>
>> +    count = recvmmsg(s->fd, msgvec, MAX_L2TPV3_MSGCNT, MSG_DONTWAIT, NULL);
>> +    for (i=0;i<count;i++) {
> Should be:
>
> for (i = 0; i < count; i++) {
>
> Lots of other sites need similar fixes.
>
>
>> +
>> +    if ((getaddrinfo(l2tpv3->src, srcport, &hints, &result) !=0) || (result == NULL)) {
>> +	fd = -errno;
>> +	fprintf(stderr, "l2tpv3_open : could not resolve src, " "errno = %d\n", fd);
> What's up with the string concatenation in the format string?

Once upon a time it was on two lines :)

>
> getaddrinfo() does NOT set errno.  Rather, it returns a code that you
> decipher with gai_strerror().  Your error reporting here is very suspect.

Thanks, broke that when converting and accounting for your other comments.

>
>> +++ b/net/net.c
>> @@ -731,6 +731,9 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
>>           [NET_CLIENT_OPTIONS_KIND_BRIDGE]    = net_init_bridge,
>>   #endif
>>           [NET_CLIENT_OPTIONS_KIND_HUBPORT]   = net_init_hubport,
>> +#ifdef CONFIG_LINUX
>> +        [NET_CLIENT_OPTIONS_KIND_L2TPV3]       = net_init_l2tpv3,
>> +#endif
> Alignment looks off.
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 16:48       ` Anton Ivanov (antivano)
@ 2014-03-04 16:55         ` Paolo Bonzini
  2014-03-04 17:28           ` Anton Ivanov (antivano)
  0 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2014-03-04 16:55 UTC (permalink / raw)
  To: Anton Ivanov (antivano), Eric Blake; +Cc: Qemu-devel, Stefan Hajnoczi

Il 04/03/2014 17:48, Anton Ivanov (antivano) ha scritto:
>>> >> +static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
>> > Long line; you can split after , to fit within 80 columns.
> OK
>> >
>>> >> +{
>>> >> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
>>> >> +
>>> >> +    struct msghdr message;
>>> >> +    int ret;
>>> >> +
>>> >> +    if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
>>> >> +	fprintf(stderr, "iovec too long %d > %d, change l2tpv3.h\n", iovcnt, MAX_L2TPV3_IOVCNT);
>>> >> +	return -1;
>> > Is printing to stderr always the right thing to do?  It seems to me that
>> > you should look into using QError.
> Thanks, will look into it.
>

Actually no, this does not need to use QError.  You just need 
"error_report", which is the same as fprintf(stderr) but will add nice 
timestamps in front of the error message if enabled.

Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 16:55         ` Paolo Bonzini
@ 2014-03-04 17:28           ` Anton Ivanov (antivano)
  2014-03-04 17:30             ` Paolo Bonzini
  0 siblings, 1 reply; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-03-04 17:28 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Qemu-devel, Stefan Hajnoczi

On 04/03/14 16:55, Paolo Bonzini wrote:
> Il 04/03/2014 17:48, Anton Ivanov (antivano) ha scritto:
>>>> >> +static ssize_t net_l2tpv3_receive_dgram_iov(NetClientState *nc, 
>>>> const struct iovec *iov, int iovcnt)
>>> > Long line; you can split after , to fit within 80 columns.
>> OK
>>> >
>>>> >> +{
>>>> >> +    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
>>>> >> +
>>>> >> +    struct msghdr message;
>>>> >> +    int ret;
>>>> >> +
>>>> >> +    if (iovcnt > MAX_L2TPV3_IOVCNT - 1) {
>>>> >> +    fprintf(stderr, "iovec too long %d > %d, change 
>>>> l2tpv3.h\n", iovcnt, MAX_L2TPV3_IOVCNT);
>>>> >> +    return -1;
>>> > Is printing to stderr always the right thing to do?  It seems to 
>>> me that
>>> > you should look into using QError.
>> Thanks, will look into it.
>>
>
> Actually no, this does not need to use QError.  You just need 
> "error_report", which is the same as fprintf(stderr) but will add nice 
> timestamps in front of the error message if enabled.

Everything updated to that, Eric's comments (except QError and bus 
alignment) taken in to account too.

I have done a compromise on the buffer at the moment and set it to 2048. 
I need to understand fully how this gets paged in and how to make that 
process optimal, then we will adjust it. The idea is to expand on this 
in the future to allow jumbo frames (up to 8k).

I will rebuild it, retest it to make sure it works and provide an 
updated version tomorrow.

One question - what git client are you using to do git send-mail. While 
debian does have a reference to git send-mail in the main git manpage, 
it does not seem to have it as a part of git.

By the way - apologies for the code quality, as you have probably 
guessed, I am not a "proper" developer - I am a network 
engineer/sysadmin by trade.

A.

>
> Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 17:28           ` Anton Ivanov (antivano)
@ 2014-03-04 17:30             ` Paolo Bonzini
  0 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2014-03-04 17:30 UTC (permalink / raw)
  To: Anton Ivanov (antivano); +Cc: Qemu-devel, Stefan Hajnoczi

Il 04/03/2014 18:28, Anton Ivanov (antivano) ha scritto:
>
> One question - what git client are you using to do git send-mail. While
> debian does have a reference to git send-mail in the main git manpage,
> it does not seem to have it as a part of git.

It's "git send-email".  Sometimes it is in a separate package.  You may 
want to look for the package that has 
/usr/libexec/git-core/git-send-email or 
/usr/share/man/man1/git-send-email.1.gz since it may not be the main git 
package.

> By the way - apologies for the code quality, as you have probably
> guessed, I am not a "proper" developer - I am a network
> engineer/sysadmin by trade.

You're doing great, no worries!

Paolo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 15:53       ` Eric Blake
  2014-03-04 16:05         ` Anton Ivanov (antivano)
@ 2014-03-05  8:49         ` Anton Ivanov (antivano)
  2014-03-05 11:38           ` Peter Maydell
  1 sibling, 1 reply; 36+ messages in thread
From: Anton Ivanov (antivano) @ 2014-03-05  8:49 UTC (permalink / raw)
  To: Eric Blake; +Cc: Paolo Bonzini, Qemu-devel, Stefan Hajnoczi

On 04/03/14 15:53, Eric Blake wrote:
> On 03/04/2014 08:22 AM, Anton Ivanov (antivano) wrote:
>> Apologies, missed to diff the json definitions.
>>
>> Attached.
>>
> Missing a commit message and a Signed-off-by line, so it can't be
> applied as-is.  Also, we prefer inline patches (as sent by 'git
> send-email') over attached patches; I suggest using 'git send-email' to
> first send the patches to yourself to make sure your settings are correct.
>

Hi Eric,

I had a look at that - we are happy to use it once and after the 
contribution is accepted.

Once that happens we will re-import the final accepted version on a new 
branch and orphan our old branch(es). From there onwards we can submit 
inline patches generated using git-send-email.

Prior to that it is inappropriate - this tool discloses revision history 
which is prior to the version which has been approved for external 
release. There is no way I can get an approval for that.

I do not quite see the value for an initial contribution as well. We are 
not providing bugfixes or an incremental patch on top of an existing 
qemu feature. We are providing something which is new and self-contained.

Brdgs,

A.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04  9:47       ` Anton Ivanov (antivano)
@ 2014-03-05  8:59         ` Stefan Hajnoczi
  2014-03-05  9:13           ` Vincenzo Maffione
  0 siblings, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2014-03-05  8:59 UTC (permalink / raw)
  To: Anton Ivanov (antivano); +Cc: Luigi Rizzo, Qemu-devel, Vincenzo Maffione

On Tue, Mar 04, 2014 at 09:47:09AM +0000, Anton Ivanov (antivano) wrote:
> On 04/03/14 09:36, Stefan Hajnoczi wrote:
> > On Mon, Mar 03, 2014 at 02:01:00PM +0000, Anton Ivanov (antivano) wrote:
> >> On 03/03/14 13:27, Stefan Hajnoczi wrote:
> >>> On Fri, Feb 28, 2014 at 08:28:11AM +0000, Anton Ivanov (antivano) wrote:
> >>>> 3. Qemu to communicate with the local host, remote vms, network devices,
> >>>> etc at speeds which for a number of use cases exceed the speed of the
> >>>> legacy tap driver.
> >>> This surprises me.  It's odd that tap performs significantly worse.
> >>
> >> Multipacket RX can go a very long way and it does not work on tap's
> >> emulation of a raw socket. At least in 3.2 :)
> > Luigi and Vincenzo had ideas on making QEMU's net layer support
> > multipacket tx using something like TCP_CORK.  This would map to
> > sendmmsg(2).
> >
> > Basically the net client gets multiple .receive() calls but is told to
> > hold off on submitting the packets.  Then, when it finally gets
> > uncorked, it can sendmmsg(2).  The only issue is we need to hold on to
> > the tx buffers longer than normal.
> 
> Cool, I will be happy to give a hand with that.
> 
> My main problem so far trying to implement it has been the timers - the 
> qemu internal timer API has no relative timers, only absolute. So you 
> end up with a very high cost of setting and checking a delayed xmit timer.

I'm thinking about something simpler that doesn't use a timer:

Rely on the guest to submit a batch of packets for tx.  When processing
the descriptor ring in the device emulation code (virtio-net, etc), use
the CORK flag on all packets except the final one.

This essentially hands the contents of the tx ring to the netdev (tap,
L2TPv3, etc) and then lets them submit the entire batch using
sendmmsg(2).

When we discussed this previously there was concern about the latency
added by CORK.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-04 11:32   ` Anton Ivanov (antivano)
@ 2014-03-05  9:07     ` Stefan Hajnoczi
  0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2014-03-05  9:07 UTC (permalink / raw)
  To: Anton Ivanov (antivano); +Cc: Qemu-devel

On Tue, Mar 04, 2014 at 11:32:26AM +0000, Anton Ivanov (antivano) wrote:
> > If you really *need* the page size, please use sysconf(_SC_PAGESIZE).
> 
> I like to have it page aligned and if possible page sized so I can later 
> extend to do jumbo frame support via a vector. If this is the wrong way 
> of doing it, I am happy to fix.

Page size may be the best size unit but you need to express it in a
portable way.  Not all host architectures use 4 KB pages so hardcoding a
constant is wrong.

> > +	    //vec->iov_len = PAGE_SIZE; /* reset for next read */
> > I think it *is* necessary to reset ->iov_len for both msgvec iovecs.
> 
> mmsgsend does not return these modified. However better be safe than 
> sorry - I am uncommenting these in the next revision.

Oh, I see.  In that case feel free to drop the assignments.  I now see
that msg_len contains the size when recvmmsg(2) returns.

> > Can you use C structs and unions instead of choosing an arbitrary
> > 256-byte size and calculating offsets at runtime?
> 
> It is has now updated to be the correct size for the actual config.
> 
> As far as structs - not really.
> 
> I tried that once upon a time in an early version, I ended up with 8+ 
> different structs (cookies can vary in size so you cannot union-ize 
> them, compiler will allocate the size for the "biggest option"). In 
> addition to that the standard has slightly different headers on raw and 
> udp. The linux kernel people have done the same - header offsets. It is 
> an unfortunate necessity for code like this.
> 
> Also, there is one nearly universal non-standard feature which I would 
> like to put back. It is present in the linux kernel implementation and 
> it is the "arbitrary offset after header" so you can stick extra 
> metadata between header and packet. That will necessitate offset 
> calculations anyway.

Okay, that's fine.  I was hoping but agree it cannot be represented
properly with C structs.

> > Is there a way to disable the IP header included in received packets? 
> > I haven't looked into how IP_HDRINCL works... 
> 
> It works the other way - you can get headers on v6 using that option, 
> but v6 does not give you headers by default. AFAIK v4 raw always gives 
> you the header do you like it or not. Makes the code very ugly 
> unfortunately.

Okay.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-05  8:59         ` Stefan Hajnoczi
@ 2014-03-05  9:13           ` Vincenzo Maffione
  0 siblings, 0 replies; 36+ messages in thread
From: Vincenzo Maffione @ 2014-03-05  9:13 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Anton Ivanov (antivano), Luigi Rizzo, Qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2626 bytes --]

I think the would be nice if we add a command-line option (or maybe even an
option that can be changed dinamically) for the net backend that
enables/disables the use of the CORK flag, so that the device emulation can
use it anyways, but backend does really use it (and so there are real
effects) only if the user specifies it.

In this way we leave standard QEMU setups unchanged and we enable special
applications (e.g. high packet rate middleboxes) to use the CORK flag to
get higher packet rate, possibly sacrifying a bit of latency.

Cheers,
  Vincenzo


2014-03-05 9:59 GMT+01:00 Stefan Hajnoczi <stefanha@gmail.com>:

> On Tue, Mar 04, 2014 at 09:47:09AM +0000, Anton Ivanov (antivano) wrote:
> > On 04/03/14 09:36, Stefan Hajnoczi wrote:
> > > On Mon, Mar 03, 2014 at 02:01:00PM +0000, Anton Ivanov (antivano)
> wrote:
> > >> On 03/03/14 13:27, Stefan Hajnoczi wrote:
> > >>> On Fri, Feb 28, 2014 at 08:28:11AM +0000, Anton Ivanov (antivano)
> wrote:
> > >>>> 3. Qemu to communicate with the local host, remote vms, network
> devices,
> > >>>> etc at speeds which for a number of use cases exceed the speed of
> the
> > >>>> legacy tap driver.
> > >>> This surprises me.  It's odd that tap performs significantly worse.
> > >>
> > >> Multipacket RX can go a very long way and it does not work on tap's
> > >> emulation of a raw socket. At least in 3.2 :)
> > > Luigi and Vincenzo had ideas on making QEMU's net layer support
> > > multipacket tx using something like TCP_CORK.  This would map to
> > > sendmmsg(2).
> > >
> > > Basically the net client gets multiple .receive() calls but is told to
> > > hold off on submitting the packets.  Then, when it finally gets
> > > uncorked, it can sendmmsg(2).  The only issue is we need to hold on to
> > > the tx buffers longer than normal.
> >
> > Cool, I will be happy to give a hand with that.
> >
> > My main problem so far trying to implement it has been the timers - the
> > qemu internal timer API has no relative timers, only absolute. So you
> > end up with a very high cost of setting and checking a delayed xmit
> timer.
>
> I'm thinking about something simpler that doesn't use a timer:
>
> Rely on the guest to submit a batch of packets for tx.  When processing
> the descriptor ring in the device emulation code (virtio-net, etc), use
> the CORK flag on all packets except the final one.
>
> This essentially hands the contents of the tx ring to the netdev (tap,
> L2TPv3, etc) and then lets them submit the entire batch using
> sendmmsg(2).
>
> When we discussed this previously there was concern about the latency
> added by CORK.
>



-- 
Vincenzo Maffione

[-- Attachment #2: Type: text/html, Size: 3334 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] Contribution - L2TPv3 transport
  2014-03-05  8:49         ` Anton Ivanov (antivano)
@ 2014-03-05 11:38           ` Peter Maydell
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Maydell @ 2014-03-05 11:38 UTC (permalink / raw)
  To: Anton Ivanov (antivano); +Cc: Paolo Bonzini, Qemu-devel, Stefan Hajnoczi

On 5 March 2014 08:49, Anton Ivanov (antivano) <antivano@cisco.com> wrote:
> On 04/03/14 15:53, Eric Blake wrote:
>> Missing a commit message and a Signed-off-by line, so it can't be
>> applied as-is.  Also, we prefer inline patches (as sent by 'git
>> send-email') over attached patches; I suggest using 'git send-email' to
>> first send the patches to yourself to make sure your settings are correct.

> I had a look at that - we are happy to use it once and after the
> contribution is accepted.
>
> Once that happens we will re-import the final accepted version on a new
> branch and orphan our old branch(es). From there onwards we can submit
> inline patches generated using git-send-email.
>
> Prior to that it is inappropriate - this tool discloses revision history
> which is prior to the version which has been approved for external
> release. There is no way I can get an approval for that.

No, you're mistaken here. git send-email simply produces a
correctly formatted patch for whichever git commits you want to
send. So you just make sure the git tree you're sending from has
a single commit (or a set of commits if you want to split it into
multiple patches that build up to the feature you want), and then
run git send-email on that. We absolutely do not want the revision
history of your internal work on this code -- we want to see the
cleaned up final patches. You just need to construct within git
a set of commits which correspond to what you want to send out.

I see you've just sent another version which is still not in the correct
format, which still doesn't have a signed-off-by line and which still
doesn't have a commit message. Please don't do that, it is a waste
of everybody's time.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2014-03-05 11:38 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-28  8:28 [Qemu-devel] Contribution - L2TPv3 transport Anton Ivanov (antivano)
2014-02-28 10:02 ` Paolo Bonzini
2014-02-28 11:17   ` Anton Ivanov (antivano)
2014-02-28 11:36     ` Paolo Bonzini
2014-02-28 12:59       ` Anton Ivanov (antivano)
2014-02-28 13:55   ` Anton Ivanov (antivano)
2014-03-04 15:19   ` Anton Ivanov (antivano)
2014-03-04 15:22     ` Anton Ivanov (antivano)
2014-03-04 15:53       ` Eric Blake
2014-03-04 16:05         ` Anton Ivanov (antivano)
2014-03-05  8:49         ` Anton Ivanov (antivano)
2014-03-05 11:38           ` Peter Maydell
2014-03-04 15:41     ` Eric Blake
2014-03-04 15:58       ` Anton Ivanov (antivano)
2014-03-04 16:04         ` Paolo Bonzini
2014-03-04 16:33     ` Eric Blake
2014-03-04 16:48       ` Anton Ivanov (antivano)
2014-03-04 16:55         ` Paolo Bonzini
2014-03-04 17:28           ` Anton Ivanov (antivano)
2014-03-04 17:30             ` Paolo Bonzini
2014-02-28 13:40 ` Eric Blake
2014-02-28 13:52   ` Anton Ivanov (antivano)
2014-02-28 13:57     ` Eric Blake
2014-02-28 14:03       ` Anton Ivanov (antivano)
2014-02-28 14:00   ` Paolo Bonzini
2014-02-28 15:06     ` Eric Blake
2014-02-28 15:20       ` Paolo Bonzini
2014-03-03 13:27 ` Stefan Hajnoczi
2014-03-03 14:01   ` Anton Ivanov (antivano)
2014-03-04  9:36     ` Stefan Hajnoczi
2014-03-04  9:47       ` Anton Ivanov (antivano)
2014-03-05  8:59         ` Stefan Hajnoczi
2014-03-05  9:13           ` Vincenzo Maffione
2014-03-03 14:53 ` Stefan Hajnoczi
2014-03-04 11:32   ` Anton Ivanov (antivano)
2014-03-05  9:07     ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.