All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [ RFC Patch v4 0/3] Support Receive-Segment-Offload(RSC) for WHQL
@ 2016-04-03 19:25 wexu
  2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 1/3] virtio-net rsc: add a new host offload(rsc) feature bit wexu
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: wexu @ 2016-04-03 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: victork, mst, jasowang, yvugenfi, Wei Xu, marcel, dfleytma

From: Wei Xu <wexu@redhat.com>

Changes in V4:
- Add new host feature bit
- Replace using fixed header lenght with dynamic header lenght in VirtIONet 
- Change ip/ip6 header union in NetRscUnit to void* pointer
- Add macro prefix, adjust code indent, etc.

Changes in V3:
- Removed big param list, replace it with 'NetRscUnit' 
- Different virtio header size
- Modify callback function to direct call.
- Needn't check the failure of g_malloc()
- Other code format adjustment, macro naming, etc 

Changes in V2:
- Add detailed commit log

This patch is to support WHQL test for Windows guest, while this feature also
benifits other guest works as a kernel 'gro' like feature with userspace 
implementation.
Feature information:
  http://msdn.microsoft.com/en-us/library/windows/hardware/jj853324

Both IPv4 and IPv6 are supported, though performance with userspace virtio
is slow than vhost-net, there is about 1.5x to 2x performance improvement to
userspace virtio, this is done by turning this feature on and disable
'tso/gso/gro' on corresponding tap interface and guest interface, while get
less improment with all these feature on.

Linux guest performance data(Netperf):
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.101 () port 0 AF_INET : nodelay
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384     64    6.00     1221.20   
 87380  16384     64    6.00     1260.30   

 87380  16384    128    6.00     1978.51   
 87380  16384    128    6.00     2286.05   

 87380  16384    256    6.00     2677.94   
 87380  16384    256    6.00     4615.42   

 87380  16384    512    6.00     2956.54   
 87380  16384    512    6.00     5356.39   

 87380  16384   1024    6.00     2798.17   
 87380  16384   1024    6.00     4943.30   

 87380  16384   2048    6.00     2681.09   
 87380  16384   2048    6.00     4835.81   

 87380  16384   4096    6.00     3390.14   
 87380  16384   4096    6.00     5391.54   

 87380  16384   8092    6.00     3008.27   
 87380  16384   8092    6.00     5381.68   

 87380  16384  10240    6.00     2999.89   
 87380  16384  10240    6.00     5393.11 

Test steps:
Although this feature is mainly used for window guest, i used linux guest to 
help test
the feature, to make things simple, i used 3 steps to test the patch as i moved 
on.
1. With a tcp socket client/server pair running on 2 linux guest, thus i can 
control
the traffic and debugging the code as i want.
2. Netperf on linux guest test the throughput.
3. WHQL test with 2 Windows guests.

Current status:
IPv4 pass all the above tests.
IPv6 just passed test step 1 and 2 as described ahead, the virtio nic cannot
receive any packet in WHQL test, trying with a win debug binary

Note:
A 'MessageDevice' nic chose as 'Realtek' will panic the system sometimes during 
setup, this can be figured out by replacing it with an 'e1000' nic.

Todo:
More sanity check and tcp 'ecn' and 'window' scale test.

Wei Xu (3):
  virtio-net rsc: add a new host offload(rsc) feature bit
  virtio-net rsc: support coalescing ipv4 tcp traffic
  virtio-net rsc: support coalescing ipv6 tcp traffic

 hw/net/virtio-net.c                         | 642 +++++++++++++++++++++++++++-
 include/hw/virtio/virtio-net.h              |   1 +
 include/hw/virtio/virtio.h                  |  72 ++++
 include/standard-headers/linux/virtio_net.h |   1 +
 4 files changed, 714 insertions(+), 2 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [ RFC Patch v4 1/3] virtio-net rsc: add a new host offload(rsc) feature bit
  2016-04-03 19:25 [Qemu-devel] [ RFC Patch v4 0/3] Support Receive-Segment-Offload(RSC) for WHQL wexu
@ 2016-04-03 19:25 ` wexu
  2016-04-05  2:05   ` Jason Wang
  2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic wexu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: wexu @ 2016-04-03 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: victork, mst, jasowang, yvugenfi, Wei Xu, marcel, dfleytma

From: Wei Xu <wexu@redhat.com>

A new feature bit 'VIRTIO_NET_F_GUEST_RSC' is introduced to support WHQL
Receive-Segment-Offload test, this feature will coalesce tcp packets in
IPv4/6 for userspace virtio-net driver.

This feature can be enabled either by 'ACK' the feature when loading
the driver in the guest, or by sending the 'VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET'
command to the host via control queue.

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/net/virtio-net.c                         | 29 +++++++++++++++++++++++++++--
 include/standard-headers/linux/virtio_net.h |  1 +
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5798f87..bd91a4b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -537,6 +537,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
         virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
         virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
         virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ECN);
+        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_RSC);
     }
 
     if (!peer_has_vnet_hdr(n) || !peer_has_ufo(n)) {
@@ -582,7 +583,8 @@ static uint64_t virtio_net_guest_offloads_by_features(uint32_t features)
         (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
         (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
         (1ULL << VIRTIO_NET_F_GUEST_ECN)  |
-        (1ULL << VIRTIO_NET_F_GUEST_UFO);
+        (1ULL << VIRTIO_NET_F_GUEST_UFO)  |
+        (1ULL << VIRTIO_NET_F_GUEST_RSC);
 
     return guest_offloads_mask & features;
 }
@@ -1089,7 +1091,8 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
     return 0;
 }
 
-static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t virtio_net_do_receive(NetClientState *nc,
+                                     const uint8_t *buf, size_t size)
 {
     VirtIONet *n = qemu_get_nic_opaque(nc);
     VirtIONetQueue *q = virtio_net_get_subqueue(nc);
@@ -1685,6 +1688,26 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
     return 0;
 }
 
+
+static ssize_t virtio_net_rsc_receive(NetClientState *nc,
+                                      const uint8_t *buf, size_t size)
+{
+    return virtio_net_do_receive(nc, buf, size);
+}
+
+static ssize_t virtio_net_receive(NetClientState *nc,
+                                  const uint8_t *buf, size_t size)
+{
+    VirtIONet *n;
+
+    n = qemu_get_nic_opaque(nc);
+    if (n->curr_guest_offloads & VIRTIO_NET_F_GUEST_RSC) {
+        return virtio_net_rsc_receive(nc, buf, size);
+    } else {
+        return virtio_net_do_receive(nc, buf, size);
+    }
+}
+
 static NetClientInfo net_virtio_info = {
     .type = NET_CLIENT_OPTIONS_KIND_NIC,
     .size = sizeof(NICState),
@@ -1909,6 +1932,8 @@ static Property virtio_net_properties[] = {
                        TX_TIMER_INTERVAL),
     DEFINE_PROP_INT32("x-txburst", VirtIONet, net_conf.txburst, TX_BURST),
     DEFINE_PROP_STRING("tx", VirtIONet, net_conf.tx),
+    DEFINE_PROP_BIT("guest_rsc", VirtIONet, host_features,
+                    VIRTIO_NET_F_GUEST_RSC, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
index a78f33e..5b95762 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -55,6 +55,7 @@
 #define VIRTIO_NET_F_MQ	22	/* Device supports Receive Flow
 					 * Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
+#define VIRTIO_NET_F_GUEST_RSC  24      /* Guest can coalesce tcp packets */
 
 #ifndef VIRTIO_NET_NO_LEGACY
 #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic
  2016-04-03 19:25 [Qemu-devel] [ RFC Patch v4 0/3] Support Receive-Segment-Offload(RSC) for WHQL wexu
  2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 1/3] virtio-net rsc: add a new host offload(rsc) feature bit wexu
@ 2016-04-03 19:25 ` wexu
  2016-04-05  2:47   ` Jason Wang
  2016-04-12  8:23   ` Michael S. Tsirkin
  2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 3/3] virtio-net rsc: support coalescing ipv6 " wexu
  2016-04-05  2:52 ` [Qemu-devel] [ RFC Patch v4 0/3] Support Receive-Segment-Offload(RSC) for WHQL Jason Wang
  3 siblings, 2 replies; 17+ messages in thread
From: wexu @ 2016-04-03 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: victork, mst, jasowang, yvugenfi, Wei Xu, marcel, dfleytma

From: Wei Xu <wexu@redhat.com>

All the data packets in a tcp connection will be cached to a big buffer
in every receive interval, and will be sent out via a timer, the
'virtio_net_rsc_timeout' controls the interval, the value will influent the
performance and response of tcp connection extremely, 50000(50us) is a
experience value to gain a performance improvement, since the whql test
sends packets every 100us, so '300000(300us)' can pass the test case,
this is also the default value, it's gonna to be tunable.

The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets

'NetRscChain' is used to save the segments of different protocols in a
VirtIONet device.

The main handler of TCP includes TCP window update, duplicated ACK check
and the real data coalescing if the new segment passed sanity check
and is identified as an 'wanted' one.

An 'wanted' segment means:
1. Segment is within current window and the sequence is the expected one.
2. ACK of the segment is in the valid window.
3. If the ACK in the segment is a duplicated one, then it must less than 2,
   this is to notify upper layer TCP starting retransmission due to the spec.

Sanity check includes:
1. Incorrect version in IP header
2. IP options & IP fragment
3. Not a TCP packets
4. Sanity size check to prevent buffer overflow attack.

There maybe more cases should be considered such as ip identification other
flags, while it broke the test because windows set it to the same even it's
not a fragment.

Normally it includes 2 typical ways to handle a TCP control flag, 'bypass'
and 'finalize', 'bypass' means should be sent out directly, and 'finalize'
means the packets should also be bypassed, and this should be done
after searching for the same connection packets in the pool and sending
all of them out, this is to avoid out of data.

All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'FIN/RST' will trigger a finalization, because
this normally happens upon a connection is going to be closed, an 'URG' packet
also finalize current coalescing unit.

Statistics can be used to monitor the basic coalescing status, the 'out of order'
and 'out of window' means how many retransmitting packets, thus describe the
performance intuitively.

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/net/virtio-net.c            | 480 ++++++++++++++++++++++++++++++++++++++++-
 include/hw/virtio/virtio-net.h |   1 +
 include/hw/virtio/virtio.h     |  72 +++++++
 3 files changed, 552 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index bd91a4b..81e8e71 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -15,10 +15,12 @@
 #include "qemu/iov.h"
 #include "hw/virtio/virtio.h"
 #include "net/net.h"
+#include "net/eth.h"
 #include "net/checksum.h"
 #include "net/tap.h"
 #include "qemu/error-report.h"
 #include "qemu/timer.h"
+#include "qemu/sockets.h"
 #include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "hw/virtio/virtio-bus.h"
@@ -38,6 +40,24 @@
 #define endof(container, field) \
     (offsetof(container, field) + sizeof(((container *)0)->field))
 
+#define VIRTIO_NET_IP4_ADDR_SIZE   8        /* ipv4 saddr + daddr */
+#define VIRTIO_NET_TCP_PORT_SIZE   4        /* sport + dport */
+
+/* IPv4 max payload, 16 bits in the header */
+#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
+#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
+
+/* header lenght value in ip header without option */
+#define VIRTIO_NET_IP4_HEADER_LENGTH 5
+
+/* Purge coalesced packets timer interval */
+#define VIRTIO_NET_RSC_INTERVAL  300000
+
+/* This value affects the performance a lot, and should be tuned carefully,
+   '300000'(300us) is the recommended value to pass the WHQL test, '50000' can
+   gain 2x netperf throughput with tso/gso/gro 'off'. */
+static uint32_t virtio_net_rsc_timeout = VIRTIO_NET_RSC_INTERVAL;
+
 typedef struct VirtIOFeature {
     uint32_t flags;
     size_t end;
@@ -1688,11 +1708,467 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
     return 0;
 }
 
+static void virtio_net_rsc_extract_unit4(NetRscChain *chain,
+                                         const uint8_t *buf, NetRscUnit* unit)
+{
+    uint16_t hdr_len;
+    uint16_t ip_hdrlen;
+    struct ip_header *ip;
+
+    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
+    ip = (struct ip_header *)(buf + hdr_len + sizeof(struct eth_header));
+    unit->ip = (void *)ip;
+    ip_hdrlen = (ip->ip_ver_len & 0xF) << 2;
+    unit->ip_plen = &ip->ip_len;
+    unit->tcp = (struct tcp_header *)(((uint8_t *)unit->ip) + ip_hdrlen);
+    unit->tcp_hdrlen = (htons(unit->tcp->th_offset_flags) & 0xF000) >> 10;
+    unit->payload = htons(*unit->ip_plen) - ip_hdrlen - unit->tcp_hdrlen;
+}
+
+static void virtio_net_rsc_ipv4_checksum(struct ip_header *ip)
+{
+    uint32_t sum;
+
+    ip->ip_sum = 0;
+    sum = net_checksum_add_cont(sizeof(struct ip_header), (uint8_t *)ip, 0);
+    ip->ip_sum = cpu_to_be16(net_checksum_finish(sum));
+}
+
+static size_t virtio_net_rsc_drain_seg(NetRscChain *chain, NetRscSeg *seg)
+{
+    int ret;
+
+    virtio_net_rsc_ipv4_checksum(seg->unit.ip);
+    ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
+    QTAILQ_REMOVE(&chain->buffers, seg, next);
+    g_free(seg->buf);
+    g_free(seg);
+
+    return ret;
+}
+
+static void virtio_net_rsc_purge(void *opq)
+{
+    NetRscChain *chain = (NetRscChain *)opq;
+    NetRscSeg *seg, *rn;
+
+    QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
+        if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
+            chain->stat.purge_failed++;
+            continue;
+        }
+    }
+
+    chain->stat.timer++;
+    if (!QTAILQ_EMPTY(&chain->buffers)) {
+        timer_mod(chain->drain_timer,
+              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + virtio_net_rsc_timeout);
+    }
+}
+
+static void virtio_net_rsc_cleanup(VirtIONet *n)
+{
+    NetRscChain *chain, *rn_chain;
+    NetRscSeg *seg, *rn_seg;
+
+    QTAILQ_FOREACH_SAFE(chain, &n->rsc_chains, next, rn_chain) {
+        QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn_seg) {
+            QTAILQ_REMOVE(&chain->buffers, seg, next);
+            g_free(seg->buf);
+            g_free(seg);
+        }
+
+        timer_del(chain->drain_timer);
+        timer_free(chain->drain_timer);
+        QTAILQ_REMOVE(&n->rsc_chains, chain, next);
+        g_free(chain);
+    }
+}
+
+static void virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,
+                                     const uint8_t *buf, size_t size)
+{
+    uint16_t hdr_len;
+    NetRscSeg *seg;
+
+    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
+    seg = g_malloc(sizeof(NetRscSeg));
+    seg->buf = g_malloc(hdr_len + sizeof(struct eth_header)\
+                   + sizeof(struct ip6_header) + VIRTIO_NET_MAX_TCP_PAYLOAD);
+    memcpy(seg->buf, buf, size);
+    seg->size = size;
+    seg->dup_ack_count = 0;
+    seg->is_coalesced = 0;
+    seg->nc = nc;
+
+    QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
+    chain->stat.cache++;
+
+    virtio_net_rsc_extract_unit4(chain, seg->buf, &seg->unit);
+}
+
+static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain, NetRscSeg *seg,
+                                 const uint8_t *buf, struct tcp_header *n_tcp,
+                                 struct tcp_header *o_tcp)
+{
+    uint32_t nack, oack;
+    uint16_t nwin, owin;
+
+    nack = htonl(n_tcp->th_ack);
+    nwin = htons(n_tcp->th_win);
+    oack = htonl(o_tcp->th_ack);
+    owin = htons(o_tcp->th_win);
+
+    if ((nack - oack) >= VIRTIO_NET_MAX_TCP_PAYLOAD) {
+        chain->stat.ack_out_of_win++;
+        return RSC_FINAL;
+    } else if (nack == oack) {
+        /* duplicated ack or window probe */
+        if (nwin == owin) {
+            /* duplicated ack, add dup ack count due to whql test up to 1 */
+            chain->stat.dup_ack++;
+
+            if (seg->dup_ack_count == 0) {
+                seg->dup_ack_count++;
+                chain->stat.dup_ack1++;
+                return RSC_COALESCE;
+            } else {
+                /* Spec says should send it directly */
+                chain->stat.dup_ack2++;
+                return RSC_FINAL;
+            }
+        } else {
+            /* Coalesce window update */
+            o_tcp->th_win = n_tcp->th_win;
+            chain->stat.win_update++;
+            return RSC_COALESCE;
+        }
+    } else {
+        /* pure ack, update ack */
+        o_tcp->th_ack = n_tcp->th_ack;
+        chain->stat.pure_ack++;
+        return RSC_COALESCE;
+    }
+}
+
+static int32_t virtio_net_rsc_coalesce_data(NetRscChain *chain, NetRscSeg *seg,
+                                    const uint8_t *buf, NetRscUnit *n_unit)
+{
+    void *data;
+    uint16_t o_ip_len;
+    uint32_t nseq, oseq;
+    NetRscUnit *o_unit;
+
+    o_unit = &seg->unit;
+    o_ip_len = htons(*o_unit->ip_plen);
+    nseq = htonl(n_unit->tcp->th_seq);
+    oseq = htonl(o_unit->tcp->th_seq);
+
+    if (n_unit->tcp_hdrlen > sizeof(struct tcp_header)) {
+        /* Log this only for debugging observation */
+        chain->stat.tcp_option++;
+    }
+
+    /* Ignore packet with more/larger tcp options */
+    if (n_unit->tcp_hdrlen > o_unit->tcp_hdrlen) {
+        chain->stat.tcp_larger_option++;
+        return RSC_FINAL;
+    }
+
+    /* Remember larger/smaller options may behave different,so keep a
+       separate statitics for each of them for debugging convenience. */
+    if (n_unit->tcp_hdrlen < o_unit->tcp_hdrlen) {
+        chain->stat.tcp_smaller_option++;
+        return RSC_FINAL;
+    }
+
+    /* out of order or retransmitted. */
+    if ((nseq - oseq) > VIRTIO_NET_MAX_TCP_PAYLOAD) {
+        chain->stat.data_out_of_win++;
+        return RSC_FINAL;
+    }
+
+    data = ((uint8_t *)n_unit->tcp) + n_unit->tcp_hdrlen;
+    if (nseq == oseq) {
+        if ((0 == o_unit->payload) && n_unit->payload) {
+            /* From no payload to payload, normal case, not a dup ack or etc */
+            chain->stat.data_after_pure_ack++;
+            goto coalesce;
+        } else {
+            return virtio_net_rsc_handle_ack(chain, seg, buf,
+                                             n_unit->tcp, o_unit->tcp);
+        }
+    } else if ((nseq - oseq) != o_unit->payload) {
+        /* Not a consistent packet, out of order */
+        chain->stat.data_out_of_order++;
+        return RSC_FINAL;
+    } else {
+coalesce:
+        if ((o_ip_len + n_unit->payload) > chain->max_payload) {
+            chain->stat.over_size++;
+            return RSC_FINAL;
+        }
+
+        /* Here comes the right data, the payload lengh in v4/v6 is different,
+           so use the field value to update and record the new data len */
+        o_unit->payload += n_unit->payload; /* update new data len */
+
+        /* update field in ip header */
+        *o_unit->ip_plen = htons(o_ip_len + n_unit->payload);
+
+        /* Bring 'PUSH' big, the whql test guide says 'PUSH' can be coalesced
+           for windows guest, while this may change the behavior for linux
+           guest. */
+        o_unit->tcp->th_offset_flags = n_unit->tcp->th_offset_flags;
+
+        o_unit->tcp->th_ack = n_unit->tcp->th_ack;
+        o_unit->tcp->th_win = n_unit->tcp->th_win;
+
+        memmove(seg->buf + seg->size, data, n_unit->payload);
+        seg->size += n_unit->payload;
+        chain->stat.coalesced++;
+        return RSC_COALESCE;
+    }
+}
+
+static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain, NetRscSeg *seg,
+                        const uint8_t *buf, size_t size, NetRscUnit *unit)
+{
+    struct ip_header *ip1, *ip2;
+
+    ip1 = (struct ip_header *)(unit->ip);
+    ip2 = (struct ip_header *)(seg->unit.ip);
+    if ((ip1->ip_src ^ ip2->ip_src) || (ip1->ip_dst ^ ip2->ip_dst)
+        || (unit->tcp->th_sport ^ seg->unit.tcp->th_sport)
+        || (unit->tcp->th_dport ^ seg->unit.tcp->th_dport)) {
+        chain->stat.no_match++;
+        return RSC_NO_MATCH;
+    }
+
+    return virtio_net_rsc_coalesce_data(chain, seg, buf, unit);
+}
+
+/* Pakcets with 'SYN' should bypass, other flag should be sent after drain
+ * to prevent out of order */
+static int virtio_net_rsc_tcp_ctrl_check(NetRscChain *chain,
+                                         struct tcp_header *tcp)
+{
+    uint16_t tcp_flag;
+
+    tcp_flag = htons(tcp->th_offset_flags) & 0x3F;
+    if (tcp_flag & TH_SYN) {
+        chain->stat.tcp_syn++;
+        return RSC_BYPASS;
+    }
+
+    if (tcp_flag & (TH_FIN | TH_URG | TH_RST)) {
+        chain->stat.tcp_ctrl_drain++;
+        return RSC_FINAL;
+    }
+
+    return RSC_WANT;
+}
+
+static bool virtio_net_rsc_empty_cache(NetRscChain *chain, NetClientState *nc,
+                                       const uint8_t *buf, size_t size)
+{
+    if (QTAILQ_EMPTY(&chain->buffers)) {
+        chain->stat.empty_cache++;
+        virtio_net_rsc_cache_buf(chain, nc, buf, size);
+        timer_mod(chain->drain_timer,
+              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + virtio_net_rsc_timeout);
+        return 1;
+    }
+
+    return 0;
+}
+
+static size_t virtio_net_rsc_do_coalesce(NetRscChain *chain, NetClientState *nc,
+                                         const uint8_t *buf, size_t size,
+                                         NetRscUnit *unit)
+{
+    int ret;
+    NetRscSeg *seg, *nseg;
+
+    QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
+        ret = virtio_net_rsc_coalesce4(chain, seg, buf, size, unit);
+
+        if (ret == RSC_FINAL) {
+            if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
+                /* Send failed */
+                chain->stat.final_failed++;
+                return 0;
+            }
+
+            /* Send current packet */
+            return virtio_net_do_receive(nc, buf, size);
+        } else if (ret == RSC_NO_MATCH) {
+            continue;
+        } else {
+            /* Coalesced, mark coalesced flag to tell calc cksum for ipv4 */
+            seg->is_coalesced = 1;
+            return size;
+        }
+    }
+
+    chain->stat.no_match_cache++;
+    virtio_net_rsc_cache_buf(chain, nc, buf, size);
+    return size;
+}
+
+/* Drain a connection data, this is to avoid out of order segments */
+static size_t virtio_net_rsc_drain_flow(NetRscChain *chain, NetClientState *nc,
+                        const uint8_t *buf, size_t size, uint16_t ip_start,
+                        uint16_t ip_size, uint16_t tcp_port, uint16_t port_size)
+{
+    NetRscSeg *seg, *nseg;
+
+    QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
+        if (memcmp(buf + ip_start, seg->buf + ip_start, ip_size)
+            || memcmp(buf + tcp_port, seg->buf + tcp_port, port_size)) {
+            continue;
+        }
+        if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
+            chain->stat.drain_failed++;
+        }
+
+        break;
+    }
+
+    return virtio_net_do_receive(nc, buf, size);
+}
+
+static int32_t virtio_net_rsc_sanity_check4(NetRscChain *chain,
+                        struct ip_header *ip, const uint8_t *buf, size_t size)
+{
+    uint16_t ip_len;
+    uint16_t hdr_len;
+
+    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
+    if (size < (hdr_len + sizeof(struct eth_header) + sizeof(struct ip_header)
+        + sizeof(struct tcp_header))) {
+        return RSC_BYPASS;
+    }
+
+    /* Not an ipv4 one */
+    if (((ip->ip_ver_len & 0xF0) >> 4) != IP_HEADER_VERSION_4) {
+        chain->stat.ip_option++;
+        return RSC_BYPASS;
+    }
+
+    /* Don't handle packets with ip option */
+    if (VIRTIO_NET_IP4_HEADER_LENGTH != (ip->ip_ver_len & 0xF)) {
+        chain->stat.ip_option++;
+        return RSC_BYPASS;
+    }
+
+    /* Don't handle packets with ip fragment */
+    if (!(htons(ip->ip_off) & IP_DF)) {
+        chain->stat.ip_frag++;
+        return RSC_BYPASS;
+    }
+
+    if (ip->ip_p != IPPROTO_TCP) {
+        chain->stat.bypass_not_tcp++;
+        return RSC_BYPASS;
+    }
+
+    ip_len = htons(ip->ip_len);
+    if (ip_len < (sizeof(struct ip_header) + sizeof(struct tcp_header))
+        || ip_len > (size - hdr_len - sizeof(struct eth_header))) {
+        chain->stat.ip_hacked++;
+        return RSC_BYPASS;
+    }
+
+    return RSC_WANT;
+}
+
+static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
+                                      const uint8_t *buf, size_t size)
+{
+    int32_t ret;
+    uint16_t hdr_len;
+    NetRscChain *chain;
+    NetRscUnit unit;
+
+    chain = (NetRscChain *)opq;
+    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
+    virtio_net_rsc_extract_unit4(chain, buf, &unit);
+    if (RSC_WANT != virtio_net_rsc_sanity_check4(chain, unit.ip, buf, size)) {
+        return virtio_net_do_receive(nc, buf, size);
+    }
+
+    ret = virtio_net_rsc_tcp_ctrl_check(chain, unit.tcp);
+    if (ret == RSC_BYPASS) {
+        return virtio_net_do_receive(nc, buf, size);
+    } else if (ret == RSC_FINAL) {
+        return virtio_net_rsc_drain_flow(chain, nc, buf, size,
+                ((hdr_len + sizeof(struct eth_header)) + 12),
+                VIRTIO_NET_IP4_ADDR_SIZE,
+                hdr_len + sizeof(struct eth_header) + sizeof(struct ip_header),
+                VIRTIO_NET_TCP_PORT_SIZE);
+    }
+
+    if (virtio_net_rsc_empty_cache(chain, nc, buf, size)) {
+        return size;
+    }
+
+    return virtio_net_rsc_do_coalesce(chain, nc, buf, size, &unit);
+}
+
+static NetRscChain *virtio_net_rsc_lookup_chain(VirtIONet * n,
+                                                NetClientState *nc,
+                                                uint16_t proto)
+{
+    NetRscChain *chain;
+
+    if (proto != (uint16_t)ETH_P_IP) {
+        return NULL;
+    }
+
+    QTAILQ_FOREACH(chain, &n->rsc_chains, next) {
+        if (chain->proto == proto) {
+            return chain;
+        }
+    }
+
+    chain = g_malloc(sizeof(*chain));
+    chain->n = n;
+    chain->proto = proto;
+    chain->max_payload = VIRTIO_NET_MAX_IP4_PAYLOAD;
+    chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+                                      virtio_net_rsc_purge, chain);
+    memset(&chain->stat, 0, sizeof(chain->stat));
+
+    QTAILQ_INIT(&chain->buffers);
+    QTAILQ_INSERT_TAIL(&n->rsc_chains, chain, next);
+
+    return chain;
+}
 
 static ssize_t virtio_net_rsc_receive(NetClientState *nc,
                                       const uint8_t *buf, size_t size)
 {
-    return virtio_net_do_receive(nc, buf, size);
+    uint16_t proto;
+    NetRscChain *chain;
+    struct eth_header *eth;
+    VirtIONet *n;
+
+    n = qemu_get_nic_opaque(nc);
+    if (size < (n->guest_hdr_len + sizeof(struct eth_header))) {
+        return virtio_net_do_receive(nc, buf, size);
+    }
+
+    eth = (struct eth_header *)(buf + n->guest_hdr_len);
+    proto = htons(eth->h_proto);
+
+    chain = virtio_net_rsc_lookup_chain(n, nc, proto);
+    if (!chain) {
+        return virtio_net_do_receive(nc, buf, size);
+    } else {
+        chain->stat.received++;
+        return virtio_net_rsc_receive4(chain, nc, buf, size);
+    }
 }
 
 static ssize_t virtio_net_receive(NetClientState *nc,
@@ -1837,6 +2313,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
     nc = qemu_get_queue(n->nic);
     nc->rxfilter_notify_enabled = 1;
 
+    QTAILQ_INIT(&n->rsc_chains);
     n->qdev = dev;
     register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
                     virtio_net_save, virtio_net_load, n);
@@ -1871,6 +2348,7 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
     g_free(n->vqs);
     qemu_del_nic(n->nic);
     virtio_cleanup(vdev);
+    virtio_net_rsc_cleanup(n);
 }
 
 static void virtio_net_instance_init(Object *obj)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 0cabdb6..6939e92 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -59,6 +59,7 @@ typedef struct VirtIONet {
     VirtIONetQueue *vqs;
     VirtQueue *ctrl_vq;
     NICState *nic;
+    QTAILQ_HEAD(, NetRscChain) rsc_chains;
     uint32_t tx_timeout;
     int32_t tx_burst;
     uint32_t has_vnet_hdr;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 2b5b248..6be1463 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -128,6 +128,78 @@ typedef struct VirtioDeviceClass {
     int (*load)(VirtIODevice *vdev, QEMUFile *f, int version_id);
 } VirtioDeviceClass;
 
+/* Coalesced packets type & status */
+typedef enum {
+    RSC_COALESCE,           /* Data been coalesced */
+    RSC_FINAL,              /* Will terminate current connection */
+    RSC_NO_MATCH,           /* No matched in the buffer pool */
+    RSC_BYPASS,             /* Packet to be bypass, not tcp, tcp ctrl, etc */
+    RSC_WANT                /* Data want to be coalesced */
+} COALESCE_STATUS;
+
+typedef struct NetRscStat {
+    uint32_t received;
+    uint32_t coalesced;
+    uint32_t over_size;
+    uint32_t cache;
+    uint32_t empty_cache;
+    uint32_t no_match_cache;
+    uint32_t win_update;
+    uint32_t no_match;
+    uint32_t tcp_syn;
+    uint32_t tcp_ctrl_drain;
+    uint32_t dup_ack;
+    uint32_t dup_ack1;
+    uint32_t dup_ack2;
+    uint32_t pure_ack;
+    uint32_t ack_out_of_win;
+    uint32_t data_out_of_win;
+    uint32_t data_out_of_order;
+    uint32_t data_after_pure_ack;
+    uint32_t bypass_not_tcp;
+    uint32_t tcp_option;
+    uint32_t tcp_larger_option;
+    uint32_t tcp_smaller_option;
+    uint32_t ip_frag;
+    uint32_t ip_hacked;
+    uint32_t ip_option;
+    uint32_t purge_failed;
+    uint32_t drain_failed;
+    uint32_t final_failed;
+    int64_t  timer;
+} NetRscStat;
+
+/* Rsc unit general info used to checking if can coalescing */
+typedef struct NetRscUnit {
+    void *ip;   /* ip header */
+    uint16_t *ip_plen;      /* data len pointer in ip header field */
+    struct tcp_header *tcp; /* tcp header */
+    uint16_t tcp_hdrlen;    /* tcp header len */
+    uint16_t payload;       /* pure payload without virtio/eth/ip/tcp */
+} NetRscUnit;
+
+/* Coalesced segmant */
+typedef struct NetRscSeg {
+    QTAILQ_ENTRY(NetRscSeg) next;
+    void *buf;
+    size_t size;
+    uint32_t dup_ack_count;
+    bool is_coalesced;      /* need recal ipv4 header checksum, mark here */
+    NetRscUnit unit;
+    NetClientState *nc;
+} NetRscSeg;
+
+/* Chain is divided by protocol(ipv4/v6) and NetClientInfo */
+typedef struct NetRscChain {
+    QTAILQ_ENTRY(NetRscChain) next;
+    void *n;                            /* VirtIONet */
+    uint16_t proto;
+    uint16_t max_payload;
+    QEMUTimer *drain_timer;
+    QTAILQ_HEAD(, NetRscSeg) buffers;
+    NetRscStat stat;
+} NetRscChain;
+
 void virtio_instance_init_common(Object *proxy_obj, void *data,
                                  size_t vdev_size, const char *vdev_name);
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [ RFC Patch v4 3/3] virtio-net rsc: support coalescing ipv6 tcp traffic
  2016-04-03 19:25 [Qemu-devel] [ RFC Patch v4 0/3] Support Receive-Segment-Offload(RSC) for WHQL wexu
  2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 1/3] virtio-net rsc: add a new host offload(rsc) feature bit wexu
  2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic wexu
@ 2016-04-03 19:25 ` wexu
  2016-04-05  2:50   ` Jason Wang
  2016-04-05  2:52 ` [Qemu-devel] [ RFC Patch v4 0/3] Support Receive-Segment-Offload(RSC) for WHQL Jason Wang
  3 siblings, 1 reply; 17+ messages in thread
From: wexu @ 2016-04-03 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: victork, mst, jasowang, yvugenfi, Wei Xu, marcel, dfleytma

From: Wei Xu <wexu@redhat.com>

Most things like ipv4 except there is a significant difference between ipv4
and ipv6, the fragment lenght in ipv4 header includes itself, while it's not
included for ipv6, thus means ipv6 can carry a real '65535' payload.

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/net/virtio-net.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 141 insertions(+), 6 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 81e8e71..2d09352 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -50,6 +50,10 @@
 /* header lenght value in ip header without option */
 #define VIRTIO_NET_IP4_HEADER_LENGTH 5
 
+#define ETH_IP6_HDR_SZ (ETH_HDR_SZ + IP6_HDR_SZ)
+#define VIRTIO_NET_IP6_ADDR_SIZE   32      /* ipv6 saddr + daddr */
+#define VIRTIO_NET_MAX_IP6_PAYLOAD VIRTIO_NET_MAX_TCP_PAYLOAD
+
 /* Purge coalesced packets timer interval */
 #define VIRTIO_NET_RSC_INTERVAL  300000
 
@@ -1725,6 +1729,25 @@ static void virtio_net_rsc_extract_unit4(NetRscChain *chain,
     unit->payload = htons(*unit->ip_plen) - ip_hdrlen - unit->tcp_hdrlen;
 }
 
+static void virtio_net_rsc_extract_unit6(NetRscChain *chain,
+                                         const uint8_t *buf, NetRscUnit* unit)
+{
+    uint16_t hdr_len;
+    struct ip6_header *ip6;
+
+    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
+    ip6 = (struct ip6_header *)(buf + hdr_len + sizeof(struct eth_header));
+    unit->ip = ip6;
+    unit->ip_plen = &(ip6->ip6_ctlun.ip6_un1.ip6_un1_plen);
+    unit->tcp = (struct tcp_header *)(((uint8_t *)unit->ip)\
+                                        + sizeof(struct ip6_header));
+    unit->tcp_hdrlen = (htons(unit->tcp->th_offset_flags) & 0xF000) >> 10;
+
+    /* There is a difference between payload lenght in ipv4 and v6,
+       ip header is excluded in ipv6 */
+    unit->payload = htons(*unit->ip_plen) - unit->tcp_hdrlen;
+}
+
 static void virtio_net_rsc_ipv4_checksum(struct ip_header *ip)
 {
     uint32_t sum;
@@ -1738,7 +1761,9 @@ static size_t virtio_net_rsc_drain_seg(NetRscChain *chain, NetRscSeg *seg)
 {
     int ret;
 
-    virtio_net_rsc_ipv4_checksum(seg->unit.ip);
+    if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
+        virtio_net_rsc_ipv4_checksum(seg->unit.ip);
+    }
     ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
     QTAILQ_REMOVE(&chain->buffers, seg, next);
     g_free(seg->buf);
@@ -1804,7 +1829,18 @@ static void virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,
     QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
     chain->stat.cache++;
 
-    virtio_net_rsc_extract_unit4(chain, seg->buf, &seg->unit);
+    switch (chain->proto) {
+    case ETH_P_IP:
+        virtio_net_rsc_extract_unit4(chain, seg->buf, &seg->unit);
+        break;
+
+    case ETH_P_IPV6:
+        virtio_net_rsc_extract_unit6(chain, seg->buf, &seg->unit);
+        break;
+
+    default:
+        g_assert_not_reached();
+    }
 }
 
 static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain, NetRscSeg *seg,
@@ -1948,6 +1984,24 @@ static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain, NetRscSeg *seg,
     return virtio_net_rsc_coalesce_data(chain, seg, buf, unit);
 }
 
+static int32_t virtio_net_rsc_coalesce6(NetRscChain *chain, NetRscSeg *seg,
+                        const uint8_t *buf, size_t size, NetRscUnit *unit)
+{
+    struct ip6_header *ip1, *ip2;
+
+    ip1 = (struct ip6_header *)(unit->ip);
+    ip2 = (struct ip6_header *)(seg->unit.ip);
+    if (memcmp(&ip1->ip6_src, &ip2->ip6_src, sizeof(struct in6_address))
+        || memcmp(&ip1->ip6_dst, &ip2->ip6_dst, sizeof(struct in6_address))
+        || (unit->tcp->th_sport ^ seg->unit.tcp->th_sport)
+        || (unit->tcp->th_dport ^ seg->unit.tcp->th_dport)) {
+            chain->stat.no_match++;
+            return RSC_NO_MATCH;
+    }
+
+    return virtio_net_rsc_coalesce_data(chain, seg, buf, unit);
+}
+
 /* Pakcets with 'SYN' should bypass, other flag should be sent after drain
  * to prevent out of order */
 static int virtio_net_rsc_tcp_ctrl_check(NetRscChain *chain,
@@ -1991,7 +2045,11 @@ static size_t virtio_net_rsc_do_coalesce(NetRscChain *chain, NetClientState *nc,
     NetRscSeg *seg, *nseg;
 
     QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
-        ret = virtio_net_rsc_coalesce4(chain, seg, buf, size, unit);
+        if (chain->proto == ETH_P_IP) {
+            ret = virtio_net_rsc_coalesce4(chain, seg, buf, size, unit);
+        } else {
+            ret = virtio_net_rsc_coalesce6(chain, seg, buf, size, unit);
+        }
 
         if (ret == RSC_FINAL) {
             if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
@@ -2116,13 +2174,82 @@ static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
     return virtio_net_rsc_do_coalesce(chain, nc, buf, size, &unit);
 }
 
+static int32_t virtio_net_rsc_sanity_check6(NetRscChain *chain,
+                                            struct ip6_header *ip6,
+                                            const uint8_t *buf, size_t size)
+{
+    uint16_t ip_len;
+    uint16_t hdr_len;
+
+    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
+    if (size < (hdr_len + sizeof(struct eth_header) + sizeof(struct ip6_header)
+        + sizeof(tcp_header))) {
+        return RSC_BYPASS;
+    }
+
+    if (((ip6->ip6_ctlun.ip6_un1.ip6_un1_flow & 0xF0) >> 4)
+        != IP_HEADER_VERSION_6) {
+        return RSC_BYPASS;
+    }
+
+    /* Both option and protocol is checked in this */
+    if (ip6->ip6_ctlun.ip6_un1.ip6_un1_nxt != IPPROTO_TCP) {
+        chain->stat.bypass_not_tcp++;
+        return RSC_BYPASS;
+    }
+
+    ip_len = htons(ip6->ip6_ctlun.ip6_un1.ip6_un1_plen);
+    if (ip_len < sizeof(struct tcp_header)
+        || ip_len > (size - hdr_len - sizeof(struct eth_header)
+                     - sizeof(struct ip6_header))) {
+        chain->stat.ip_hacked++;
+        return RSC_BYPASS;
+    }
+
+    return RSC_WANT;
+}
+
+static size_t virtio_net_rsc_receive6(void *opq, NetClientState* nc,
+                                      const uint8_t *buf, size_t size)
+{
+    int32_t ret;
+    uint16_t hdr_len;
+    NetRscChain *chain;
+    NetRscUnit unit;
+
+    chain = (NetRscChain *)opq;
+    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
+    virtio_net_rsc_extract_unit6(chain, buf, &unit);
+    if (RSC_WANT != virtio_net_rsc_sanity_check6(chain,
+                                                 unit.ip, buf, size)) {
+        return virtio_net_do_receive(nc, buf, size);
+    }
+
+    ret = virtio_net_rsc_tcp_ctrl_check(chain, unit.tcp);
+    if (ret == RSC_BYPASS) {
+        return virtio_net_do_receive(nc, buf, size);
+    } else if (ret == RSC_FINAL) {
+        return virtio_net_rsc_drain_flow(chain, nc, buf, size,
+                ((hdr_len + sizeof(struct eth_header)) + 8),
+                VIRTIO_NET_IP6_ADDR_SIZE,
+                hdr_len + sizeof(struct eth_header) + sizeof(struct ip6_header),
+                VIRTIO_NET_TCP_PORT_SIZE);
+    }
+
+    if (virtio_net_rsc_empty_cache(chain, nc, buf, size)) {
+        return size;
+    }
+
+    return virtio_net_rsc_do_coalesce(chain, nc, buf, size, &unit);
+}
+
 static NetRscChain *virtio_net_rsc_lookup_chain(VirtIONet * n,
                                                 NetClientState *nc,
                                                 uint16_t proto)
 {
     NetRscChain *chain;
 
-    if (proto != (uint16_t)ETH_P_IP) {
+    if ((proto != (uint16_t)ETH_P_IP) && (proto != (uint16_t)ETH_P_IPV6)) {
         return NULL;
     }
 
@@ -2135,7 +2262,11 @@ static NetRscChain *virtio_net_rsc_lookup_chain(VirtIONet * n,
     chain = g_malloc(sizeof(*chain));
     chain->n = n;
     chain->proto = proto;
-    chain->max_payload = VIRTIO_NET_MAX_IP4_PAYLOAD;
+    if (proto == (uint16_t)ETH_P_IP) {
+        chain->max_payload = VIRTIO_NET_MAX_IP4_PAYLOAD;
+    } else {
+        chain->max_payload = VIRTIO_NET_MAX_IP6_PAYLOAD;
+    }
     chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
                                       virtio_net_rsc_purge, chain);
     memset(&chain->stat, 0, sizeof(chain->stat));
@@ -2167,7 +2298,11 @@ static ssize_t virtio_net_rsc_receive(NetClientState *nc,
         return virtio_net_do_receive(nc, buf, size);
     } else {
         chain->stat.received++;
-        return virtio_net_rsc_receive4(chain, nc, buf, size);
+        if (proto == (uint16_t)ETH_P_IP) {
+            return virtio_net_rsc_receive4(chain, nc, buf, size);
+        } else  {
+            return virtio_net_rsc_receive6(chain, nc, buf, size);
+        }
     }
 }
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 1/3] virtio-net rsc: add a new host offload(rsc) feature bit
  2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 1/3] virtio-net rsc: add a new host offload(rsc) feature bit wexu
@ 2016-04-05  2:05   ` Jason Wang
  2016-04-05  8:17     ` Michael S. Tsirkin
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Wang @ 2016-04-05  2:05 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: marcel, victork, dfleytma, yvugenfi, mst



On 04/04/2016 03:25 AM, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> A new feature bit 'VIRTIO_NET_F_GUEST_RSC' is introduced to support WHQL
> Receive-Segment-Offload test, this feature will coalesce tcp packets in
> IPv4/6 for userspace virtio-net driver.
>
> This feature can be enabled either by 'ACK' the feature when loading
> the driver in the guest, or by sending the 'VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET'
> command to the host via control queue.
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>  hw/net/virtio-net.c                         | 29 +++++++++++++++++++++++++++--
>  include/standard-headers/linux/virtio_net.h |  1 +
>  2 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 5798f87..bd91a4b 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -537,6 +537,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
>          virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
>          virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
>          virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ECN);
> +        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_RSC);

Several questions here:

- I think RSC should work even without vnet_hdr?
- Need we differentiate ipv4 and ipv6 like TSO here?
- And looks like this patch should be squash to following patches.

>      }
>  
>      if (!peer_has_vnet_hdr(n) || !peer_has_ufo(n)) {
> @@ -582,7 +583,8 @@ static uint64_t virtio_net_guest_offloads_by_features(uint32_t features)
>          (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
>          (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
>          (1ULL << VIRTIO_NET_F_GUEST_ECN)  |
> -        (1ULL << VIRTIO_NET_F_GUEST_UFO);
> +        (1ULL << VIRTIO_NET_F_GUEST_UFO)  |
> +        (1ULL << VIRTIO_NET_F_GUEST_RSC);

Looks like this is unnecessary since we won't set peer offload based on
GUEST_RSC.

>  
>      return guest_offloads_mask & features;
>  }
> @@ -1089,7 +1091,8 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
>      return 0;
>  }
>  
> -static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t virtio_net_do_receive(NetClientState *nc,
> +                                     const uint8_t *buf, size_t size)
>  {
>      VirtIONet *n = qemu_get_nic_opaque(nc);
>      VirtIONetQueue *q = virtio_net_get_subqueue(nc);
> @@ -1685,6 +1688,26 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>      return 0;
>  }
>  
> +
> +static ssize_t virtio_net_rsc_receive(NetClientState *nc,
> +                                      const uint8_t *buf, size_t size)
> +{
> +    return virtio_net_do_receive(nc, buf, size);
> +}
> +
> +static ssize_t virtio_net_receive(NetClientState *nc,
> +                                  const uint8_t *buf, size_t size)
> +{
> +    VirtIONet *n;
> +
> +    n = qemu_get_nic_opaque(nc);
> +    if (n->curr_guest_offloads & VIRTIO_NET_F_GUEST_RSC) {
> +        return virtio_net_rsc_receive(nc, buf, size);
> +    } else {
> +        return virtio_net_do_receive(nc, buf, size);
> +    }
> +}

The changes here looks odd since it did nothing. Like I've mentioned,
better merge the patch into following ones.

> +
>  static NetClientInfo net_virtio_info = {
>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
>      .size = sizeof(NICState),
> @@ -1909,6 +1932,8 @@ static Property virtio_net_properties[] = {
>                         TX_TIMER_INTERVAL),
>      DEFINE_PROP_INT32("x-txburst", VirtIONet, net_conf.txburst, TX_BURST),
>      DEFINE_PROP_STRING("tx", VirtIONet, net_conf.tx),
> +    DEFINE_PROP_BIT("guest_rsc", VirtIONet, host_features,
> +                    VIRTIO_NET_F_GUEST_RSC, true),

Need to compat the bit for old machine type to unbreak migration I believe?

Btw, also need a patch for virtio spec.

Thanks

>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
> index a78f33e..5b95762 100644
> --- a/include/standard-headers/linux/virtio_net.h
> +++ b/include/standard-headers/linux/virtio_net.h
> @@ -55,6 +55,7 @@
>  #define VIRTIO_NET_F_MQ	22	/* Device supports Receive Flow
>  					 * Steering */
>  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
> +#define VIRTIO_NET_F_GUEST_RSC  24      /* Guest can coalesce tcp packets */
>  
>  #ifndef VIRTIO_NET_NO_LEGACY
>  #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic
  2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic wexu
@ 2016-04-05  2:47   ` Jason Wang
  2016-04-08  7:47     ` Wei Xu
  2016-04-12  8:23   ` Michael S. Tsirkin
  1 sibling, 1 reply; 17+ messages in thread
From: Jason Wang @ 2016-04-05  2:47 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: marcel, victork, dfleytma, yvugenfi, mst



On 04/04/2016 03:25 AM, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> All the data packets in a tcp connection will be cached to a big buffer
> in every receive interval, and will be sent out via a timer, the
> 'virtio_net_rsc_timeout' controls the interval, the value will influent the
> performance and response of tcp connection extremely, 50000(50us) is a
> experience value to gain a performance improvement, since the whql test
> sends packets every 100us, so '300000(300us)' can pass the test case,
> this is also the default value, it's gonna to be tunable.
>
> The timer will only be triggered if the packets pool is not empty,
> and it'll drain off all the cached packets
>
> 'NetRscChain' is used to save the segments of different protocols in a
> VirtIONet device.
>
> The main handler of TCP includes TCP window update, duplicated ACK check
> and the real data coalescing if the new segment passed sanity check
> and is identified as an 'wanted' one.
>
> An 'wanted' segment means:
> 1. Segment is within current window and the sequence is the expected one.
> 2. ACK of the segment is in the valid window.
> 3. If the ACK in the segment is a duplicated one, then it must less than 2,
>    this is to notify upper layer TCP starting retransmission due to the spec.
>
> Sanity check includes:
> 1. Incorrect version in IP header
> 2. IP options & IP fragment
> 3. Not a TCP packets
> 4. Sanity size check to prevent buffer overflow attack.
>
> There maybe more cases should be considered such as ip identification other
> flags, while it broke the test because windows set it to the same even it's
> not a fragment.
>
> Normally it includes 2 typical ways to handle a TCP control flag, 'bypass'
> and 'finalize', 'bypass' means should be sent out directly, and 'finalize'
> means the packets should also be bypassed, and this should be done
> after searching for the same connection packets in the pool and sending
> all of them out, this is to avoid out of data.
>
> All the 'SYN' packets will be bypassed since this always begin a new'
> connection, other flags such 'FIN/RST' will trigger a finalization, because
> this normally happens upon a connection is going to be closed, an 'URG' packet
> also finalize current coalescing unit.
>
> Statistics can be used to monitor the basic coalescing status, the 'out of order'
> and 'out of window' means how many retransmitting packets, thus describe the
> performance intuitively.
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>  hw/net/virtio-net.c            | 480 ++++++++++++++++++++++++++++++++++++++++-
>  include/hw/virtio/virtio-net.h |   1 +
>  include/hw/virtio/virtio.h     |  72 +++++++
>  3 files changed, 552 insertions(+), 1 deletion(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index bd91a4b..81e8e71 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -15,10 +15,12 @@
>  #include "qemu/iov.h"
>  #include "hw/virtio/virtio.h"
>  #include "net/net.h"
> +#include "net/eth.h"
>  #include "net/checksum.h"
>  #include "net/tap.h"
>  #include "qemu/error-report.h"
>  #include "qemu/timer.h"
> +#include "qemu/sockets.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "net/vhost_net.h"
>  #include "hw/virtio/virtio-bus.h"
> @@ -38,6 +40,24 @@
>  #define endof(container, field) \
>      (offsetof(container, field) + sizeof(((container *)0)->field))
>  
> +#define VIRTIO_NET_IP4_ADDR_SIZE   8        /* ipv4 saddr + daddr */
> +#define VIRTIO_NET_TCP_PORT_SIZE   4        /* sport + dport */
> +
> +/* IPv4 max payload, 16 bits in the header */
> +#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
> +#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
> +
> +/* header lenght value in ip header without option */

typo here.

> +#define VIRTIO_NET_IP4_HEADER_LENGTH 5
> +
> +/* Purge coalesced packets timer interval */
> +#define VIRTIO_NET_RSC_INTERVAL  300000
> +
> +/* This value affects the performance a lot, and should be tuned carefully,
> +   '300000'(300us) is the recommended value to pass the WHQL test, '50000' can
> +   gain 2x netperf throughput with tso/gso/gro 'off'. */
> +static uint32_t virtio_net_rsc_timeout = VIRTIO_NET_RSC_INTERVAL;

Like we've discussed in previous versions, need we add another property
for this?

> +
>  typedef struct VirtIOFeature {
>      uint32_t flags;
>      size_t end;
> @@ -1688,11 +1708,467 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>      return 0;
>  }
>  
> +static void virtio_net_rsc_extract_unit4(NetRscChain *chain,
> +                                         const uint8_t *buf, NetRscUnit* unit)
> +{
> +    uint16_t hdr_len;
> +    uint16_t ip_hdrlen;
> +    struct ip_header *ip;
> +
> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;

The case seems odd. Why not just use VirtIONet * for chain->n?

> +    ip = (struct ip_header *)(buf + hdr_len + sizeof(struct eth_header));
> +    unit->ip = (void *)ip;
> +    ip_hdrlen = (ip->ip_ver_len & 0xF) << 2;
> +    unit->ip_plen = &ip->ip_len;
> +    unit->tcp = (struct tcp_header *)(((uint8_t *)unit->ip) + ip_hdrlen);
> +    unit->tcp_hdrlen = (htons(unit->tcp->th_offset_flags) & 0xF000) >> 10;
> +    unit->payload = htons(*unit->ip_plen) - ip_hdrlen - unit->tcp_hdrlen;
> +}
> +
> +static void virtio_net_rsc_ipv4_checksum(struct ip_header *ip)
> +{
> +    uint32_t sum;
> +
> +    ip->ip_sum = 0;
> +    sum = net_checksum_add_cont(sizeof(struct ip_header), (uint8_t *)ip, 0);
> +    ip->ip_sum = cpu_to_be16(net_checksum_finish(sum));
 
Should we set VIRTIO_NET_HDR_F_DATA_VALID for hdr.flags here?

> +}
> +
> +static size_t virtio_net_rsc_drain_seg(NetRscChain *chain, NetRscSeg *seg)
> +{
> +    int ret;
> +
> +    virtio_net_rsc_ipv4_checksum(seg->unit.ip);
> +    ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
> +    QTAILQ_REMOVE(&chain->buffers, seg, next);
> +    g_free(seg->buf);
> +    g_free(seg);
> +
> +    return ret;
> +}
> +
> +static void virtio_net_rsc_purge(void *opq)
> +{
> +    NetRscChain *chain = (NetRscChain *)opq;
> +    NetRscSeg *seg, *rn;
> +
> +    QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
> +        if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
> +            chain->stat.purge_failed++;
> +            continue;

Is it better to break here, consider we fail to do the receive?

> +        }
> +    }
> +
> +    chain->stat.timer++;
> +    if (!QTAILQ_EMPTY(&chain->buffers)) {
> +        timer_mod(chain->drain_timer,
> +              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + virtio_net_rsc_timeout);
> +    }
> +}
> +
> +static void virtio_net_rsc_cleanup(VirtIONet *n)
> +{
> +    NetRscChain *chain, *rn_chain;
> +    NetRscSeg *seg, *rn_seg;
> +
> +    QTAILQ_FOREACH_SAFE(chain, &n->rsc_chains, next, rn_chain) {
> +        QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn_seg) {
> +            QTAILQ_REMOVE(&chain->buffers, seg, next);
> +            g_free(seg->buf);
> +            g_free(seg);
> +        }
> +
> +        timer_del(chain->drain_timer);
> +        timer_free(chain->drain_timer);
> +        QTAILQ_REMOVE(&n->rsc_chains, chain, next);
> +        g_free(chain);
> +    }
> +}
> +
> +static void virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,
> +                                     const uint8_t *buf, size_t size)
> +{
> +    uint16_t hdr_len;
> +    NetRscSeg *seg;
> +
> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
> +    seg = g_malloc(sizeof(NetRscSeg));
> +    seg->buf = g_malloc(hdr_len + sizeof(struct eth_header)\
> +                   + sizeof(struct ip6_header) + VIRTIO_NET_MAX_TCP_PAYLOAD);

Why ip6_header here?

> +    memcpy(seg->buf, buf, size);
> +    seg->size = size;
> +    seg->dup_ack_count = 0;
> +    seg->is_coalesced = 0;
> +    seg->nc = nc;
> +
> +    QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
> +    chain->stat.cache++;
> +
> +    virtio_net_rsc_extract_unit4(chain, seg->buf, &seg->unit);
> +}
> +
> +static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain, NetRscSeg *seg,
> +                                 const uint8_t *buf, struct tcp_header *n_tcp,
> +                                 struct tcp_header *o_tcp)
> +{
> +    uint32_t nack, oack;
> +    uint16_t nwin, owin;
> +
> +    nack = htonl(n_tcp->th_ack);
> +    nwin = htons(n_tcp->th_win);
> +    oack = htonl(o_tcp->th_ack);
> +    owin = htons(o_tcp->th_win);
> +
> +    if ((nack - oack) >= VIRTIO_NET_MAX_TCP_PAYLOAD) {
> +        chain->stat.ack_out_of_win++;
> +        return RSC_FINAL;
> +    } else if (nack == oack) {
> +        /* duplicated ack or window probe */
> +        if (nwin == owin) {
> +            /* duplicated ack, add dup ack count due to whql test up to 1 */
> +            chain->stat.dup_ack++;
> +
> +            if (seg->dup_ack_count == 0) {
> +                seg->dup_ack_count++;
> +                chain->stat.dup_ack1++;
> +                return RSC_COALESCE;
> +            } else {
> +                /* Spec says should send it directly */

Better quote the what spec said here or just remove the above comment.

> +                chain->stat.dup_ack2++;
> +                return RSC_FINAL;
> +            }
> +        } else {
> +            /* Coalesce window update */
> +            o_tcp->th_win = n_tcp->th_win;
> +            chain->stat.win_update++;
> +            return RSC_COALESCE;
> +        }
> +    } else {
> +        /* pure ack, update ack */
> +        o_tcp->th_ack = n_tcp->th_ack;
> +        chain->stat.pure_ack++;
> +        return RSC_COALESCE;
> +    }
> +}
> +
> +static int32_t virtio_net_rsc_coalesce_data(NetRscChain *chain, NetRscSeg *seg,
> +                                    const uint8_t *buf, NetRscUnit *n_unit)
> +{
> +    void *data;
> +    uint16_t o_ip_len;
> +    uint32_t nseq, oseq;
> +    NetRscUnit *o_unit;
> +
> +    o_unit = &seg->unit;
> +    o_ip_len = htons(*o_unit->ip_plen);
> +    nseq = htonl(n_unit->tcp->th_seq);
> +    oseq = htonl(o_unit->tcp->th_seq);
> +
> +    if (n_unit->tcp_hdrlen > sizeof(struct tcp_header)) {
> +        /* Log this only for debugging observation */
> +        chain->stat.tcp_option++;
> +    }
> +
> +    /* Ignore packet with more/larger tcp options */
> +    if (n_unit->tcp_hdrlen > o_unit->tcp_hdrlen) {
> +        chain->stat.tcp_larger_option++;
> +        return RSC_FINAL;
> +    }
> +
> +    /* Remember larger/smaller options may behave different,so keep a
> +       separate statitics for each of them for debugging convenience. */
> +    if (n_unit->tcp_hdrlen < o_unit->tcp_hdrlen) {
> +        chain->stat.tcp_smaller_option++;
> +        return RSC_FINAL;
> +    }
> +
> +    /* out of order or retransmitted. */
> +    if ((nseq - oseq) > VIRTIO_NET_MAX_TCP_PAYLOAD) {
> +        chain->stat.data_out_of_win++;
> +        return RSC_FINAL;
> +    }
> +
> +    data = ((uint8_t *)n_unit->tcp) + n_unit->tcp_hdrlen;
> +    if (nseq == oseq) {
> +        if ((0 == o_unit->payload) && n_unit->payload) {

o_unit->payload == 0 please.

> +            /* From no payload to payload, normal case, not a dup ack or etc */
> +            chain->stat.data_after_pure_ack++;
> +            goto coalesce;
> +        } else {
> +            return virtio_net_rsc_handle_ack(chain, seg, buf,
> +                                             n_unit->tcp, o_unit->tcp);
> +        }
> +    } else if ((nseq - oseq) != o_unit->payload) {
> +        /* Not a consistent packet, out of order */
> +        chain->stat.data_out_of_order++;
> +        return RSC_FINAL;
> +    } else {
> +coalesce:
> +        if ((o_ip_len + n_unit->payload) > chain->max_payload) {
> +            chain->stat.over_size++;
> +            return RSC_FINAL;
> +        }
> +
> +        /* Here comes the right data, the payload lengh in v4/v6 is different,
> +           so use the field value to update and record the new data len */
> +        o_unit->payload += n_unit->payload; /* update new data len */
> +
> +        /* update field in ip header */
> +        *o_unit->ip_plen = htons(o_ip_len + n_unit->payload);
> +
> +        /* Bring 'PUSH' big, the whql test guide says 'PUSH' can be coalesced
> +           for windows guest, while this may change the behavior for linux
> +           guest. */

In the future, need to measure TCP_RR performance in this case to see
the impact on Linux guests.

> +        o_unit->tcp->th_offset_flags = n_unit->tcp->th_offset_flags;
> +
> +        o_unit->tcp->th_ack = n_unit->tcp->th_ack;
> +        o_unit->tcp->th_win = n_unit->tcp->th_win;
> +
> +        memmove(seg->buf + seg->size, data, n_unit->payload);
> +        seg->size += n_unit->payload;
> +        chain->stat.coalesced++;
> +        return RSC_COALESCE;
> +    }
> +}
> +
> +static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain, NetRscSeg *seg,
> +                        const uint8_t *buf, size_t size, NetRscUnit *unit)

indentation is wrong here.

> +{
> +    struct ip_header *ip1, *ip2;
> +
> +    ip1 = (struct ip_header *)(unit->ip);
> +    ip2 = (struct ip_header *)(seg->unit.ip);
> +    if ((ip1->ip_src ^ ip2->ip_src) || (ip1->ip_dst ^ ip2->ip_dst)
> +        || (unit->tcp->th_sport ^ seg->unit.tcp->th_sport)
> +        || (unit->tcp->th_dport ^ seg->unit.tcp->th_dport)) {
> +        chain->stat.no_match++;
> +        return RSC_NO_MATCH;
> +    }
> +
> +    return virtio_net_rsc_coalesce_data(chain, seg, buf, unit);
> +}
> +
> +/* Pakcets with 'SYN' should bypass, other flag should be sent after drain
> + * to prevent out of order */
> +static int virtio_net_rsc_tcp_ctrl_check(NetRscChain *chain,
> +                                         struct tcp_header *tcp)
> +{
> +    uint16_t tcp_flag;
> +
> +    tcp_flag = htons(tcp->th_offset_flags) & 0x3F;
> +    if (tcp_flag & TH_SYN) {
> +        chain->stat.tcp_syn++;
> +        return RSC_BYPASS;
> +    }
> +
> +    if (tcp_flag & (TH_FIN | TH_URG | TH_RST)) {
> +        chain->stat.tcp_ctrl_drain++;
> +        return RSC_FINAL;
> +    }
> +
> +    return RSC_WANT;
> +}
> +
> +static bool virtio_net_rsc_empty_cache(NetRscChain *chain, NetClientState *nc,
> +                                       const uint8_t *buf, size_t size)
> +{
> +    if (QTAILQ_EMPTY(&chain->buffers)) {
> +        chain->stat.empty_cache++;
> +        virtio_net_rsc_cache_buf(chain, nc, buf, size);
> +        timer_mod(chain->drain_timer,
> +              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + virtio_net_rsc_timeout);
> +        return 1;
> +    }
> +
> +    return 0;
> +}
> +
> +static size_t virtio_net_rsc_do_coalesce(NetRscChain *chain, NetClientState *nc,
> +                                         const uint8_t *buf, size_t size,
> +                                         NetRscUnit *unit)
> +{
> +    int ret;
> +    NetRscSeg *seg, *nseg;
> +
> +    QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
> +        ret = virtio_net_rsc_coalesce4(chain, seg, buf, size, unit);
> +
> +        if (ret == RSC_FINAL) {
> +            if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
> +                /* Send failed */
> +                chain->stat.final_failed++;
> +                return 0;
> +            }
> +
> +            /* Send current packet */
> +            return virtio_net_do_receive(nc, buf, size);
> +        } else if (ret == RSC_NO_MATCH) {
> +            continue;
> +        } else {
> +            /* Coalesced, mark coalesced flag to tell calc cksum for ipv4 */
> +            seg->is_coalesced = 1;
> +            return size;
> +        }
> +    }
> +
> +    chain->stat.no_match_cache++;
> +    virtio_net_rsc_cache_buf(chain, nc, buf, size);

Can we rework the code to make it a little bit more readable here?

RSC_BYPASS, RSC_WANT was checked by the caller
(virtio_net_rsc_receive4()). and the rest were checked here. Can we do
all the check in one function with a switch...case?

Maybe you can introduce sanity check and tcp check callback for each chain.


> +    return size;
> +}
> +
> +/* Drain a connection data, this is to avoid out of order segments */
> +static size_t virtio_net_rsc_drain_flow(NetRscChain *chain, NetClientState *nc,
> +                        const uint8_t *buf, size_t size, uint16_t ip_start,
> +                        uint16_t ip_size, uint16_t tcp_port, uint16_t port_size)

indentation is wrong.

> +{
> +    NetRscSeg *seg, *nseg;
> +
> +    QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
> +        if (memcmp(buf + ip_start, seg->buf + ip_start, ip_size)
> +            || memcmp(buf + tcp_port, seg->buf + tcp_port, port_size)) {
> +            continue;

May consider to optimize (e.g not using linear search) in the future.

> +        }
> +        if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
> +            chain->stat.drain_failed++;
> +        }
> +
> +        break;
> +    }
> +
> +    return virtio_net_do_receive(nc, buf, size);
> +}
> +
> +static int32_t virtio_net_rsc_sanity_check4(NetRscChain *chain,
> +                        struct ip_header *ip, const uint8_t *buf, size_t size)+        chain->stat.ip_option++;

indentation.

> +{
> +    uint16_t ip_len;
> +    uint16_t hdr_len;
> +
> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
> +    if (size < (hdr_len + sizeof(struct eth_header) + sizeof(struct ip_header)
> +        + sizeof(struct tcp_header))) {
> +        return RSC_BYPASS;
> +    }
> +
> +    /* Not an ipv4 one */
> +    if (((ip->ip_ver_len & 0xF0) >> 4) != IP_HEADER_VERSION_4) {
> +        chain->stat.ip_option++;
> +        return RSC_BYPASS;
> +    }
> +
> +    /* Don't handle packets with ip option */
> +    if (VIRTIO_NET_IP4_HEADER_LENGTH != (ip->ip_ver_len & 0xF)) {
> +        chain->stat.ip_option++;
> +        return RSC_BYPASS;
> +    }
> +
> +    /* Don't handle packets with ip fragment */
> +    if (!(htons(ip->ip_off) & IP_DF)) {
> +        chain->stat.ip_frag++;
> +        return RSC_BYPASS;
> +    }
> +
> +    if (ip->ip_p != IPPROTO_TCP) {
> +        chain->stat.bypass_not_tcp++;
> +        return RSC_BYPASS;
> +    }
> +
> +    ip_len = htons(ip->ip_len);
> +    if (ip_len < (sizeof(struct ip_header) + sizeof(struct tcp_header))
> +        || ip_len > (size - hdr_len - sizeof(struct eth_header))) {
> +        chain->stat.ip_hacked++;
> +        return RSC_BYPASS;
> +    }
> +
> +    return RSC_WANT;
> +}
> +
> +static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,

Why use void * here?

> +                                      const uint8_t *buf, size_t size)
> +{
> +    int32_t ret;
> +    uint16_t hdr_len;
> +    NetRscChain *chain;
> +    NetRscUnit unit;
> +
> +    chain = (NetRscChain *)opq;
> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
> +    virtio_net_rsc_extract_unit4(chain, buf, &unit);
> +    if (RSC_WANT != virtio_net_rsc_sanity_check4(chain, unit.ip, buf, size)) {
> +        return virtio_net_do_receive(nc, buf, size);
> +    }
> +
> +    ret = virtio_net_rsc_tcp_ctrl_check(chain, unit.tcp);
> +    if (ret == RSC_BYPASS) {
> +        return virtio_net_do_receive(nc, buf, size);
> +    } else if (ret == RSC_FINAL) {
> +        return virtio_net_rsc_drain_flow(chain, nc, buf, size,
> +                ((hdr_len + sizeof(struct eth_header)) + 12),
> +                VIRTIO_NET_IP4_ADDR_SIZE,
> +                hdr_len + sizeof(struct eth_header) + sizeof(struct ip_header),
> +                VIRTIO_NET_TCP_PORT_SIZE);
> +    }
> +
> +    if (virtio_net_rsc_empty_cache(chain, nc, buf, size)) {
> +        return size;
> +    }
> +
> +    return virtio_net_rsc_do_coalesce(chain, nc, buf, size, &unit);
> +}
> +
> +static NetRscChain *virtio_net_rsc_lookup_chain(VirtIONet * n,
> +                                                NetClientState *nc,
> +                                                uint16_t proto)
> +{
> +    NetRscChain *chain;
> +
> +    if (proto != (uint16_t)ETH_P_IP) {
> +        return NULL;
> +    }
> +
> +    QTAILQ_FOREACH(chain, &n->rsc_chains, next) {
> +        if (chain->proto == proto) {
> +            return chain;
> +        }
> +    }
> +
> +    chain = g_malloc(sizeof(*chain));
> +    chain->n = n;
> +    chain->proto = proto;
> +    chain->max_payload = VIRTIO_NET_MAX_IP4_PAYLOAD;
> +    chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
> +                                      virtio_net_rsc_purge, chain);
> +    memset(&chain->stat, 0, sizeof(chain->stat));
> +
> +    QTAILQ_INIT(&chain->buffers);
> +    QTAILQ_INSERT_TAIL(&n->rsc_chains, chain, next);
> +
> +    return chain;
> +}
>  
>  static ssize_t virtio_net_rsc_receive(NetClientState *nc,
>                                        const uint8_t *buf, size_t size)
>  {
> -    return virtio_net_do_receive(nc, buf, size);
> +    uint16_t proto;
> +    NetRscChain *chain;
> +    struct eth_header *eth;
> +    VirtIONet *n;
> +
> +    n = qemu_get_nic_opaque(nc);
> +    if (size < (n->guest_hdr_len + sizeof(struct eth_header))) {
> +        return virtio_net_do_receive(nc, buf, size);
> +    }
> +
> +    eth = (struct eth_header *)(buf + n->guest_hdr_len);
> +    proto = htons(eth->h_proto);
> +
> +    chain = virtio_net_rsc_lookup_chain(n, nc, proto);
> +    if (!chain) {
> +        return virtio_net_do_receive(nc, buf, size);
> +    } else {
> +        chain->stat.received++;
> +        return virtio_net_rsc_receive4(chain, nc, buf, size);
> +    }
>  }
>  
>  static ssize_t virtio_net_receive(NetClientState *nc,
> @@ -1837,6 +2313,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>      nc = qemu_get_queue(n->nic);
>      nc->rxfilter_notify_enabled = 1;
>  
> +    QTAILQ_INIT(&n->rsc_chains);
>      n->qdev = dev;
>      register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
>                      virtio_net_save, virtio_net_load, n);
> @@ -1871,6 +2348,7 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
>      g_free(n->vqs);
>      qemu_del_nic(n->nic);
>      virtio_cleanup(vdev);
> +    virtio_net_rsc_cleanup(n);
>  }
>  
>  static void virtio_net_instance_init(Object *obj)
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index 0cabdb6..6939e92 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -59,6 +59,7 @@ typedef struct VirtIONet {
>      VirtIONetQueue *vqs;
>      VirtQueue *ctrl_vq;
>      NICState *nic;
> +    QTAILQ_HEAD(, NetRscChain) rsc_chains;
>      uint32_t tx_timeout;
>      int32_t tx_burst;
>      uint32_t has_vnet_hdr;
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 2b5b248..6be1463 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -128,6 +128,78 @@ typedef struct VirtioDeviceClass {
>      int (*load)(VirtIODevice *vdev, QEMUFile *f, int version_id);
>  } VirtioDeviceClass;
>  
> +/* Coalesced packets type & status */
> +typedef enum {
> +    RSC_COALESCE,           /* Data been coalesced */
> +    RSC_FINAL,              /* Will terminate current connection */
> +    RSC_NO_MATCH,           /* No matched in the buffer pool */
> +    RSC_BYPASS,             /* Packet to be bypass, not tcp, tcp ctrl, etc */
> +    RSC_WANT                /* Data want to be coalesced */
> +} COALESCE_STATUS;
> +
> +typedef struct NetRscStat {
> +    uint32_t received;
> +    uint32_t coalesced;
> +    uint32_t over_size;
> +    uint32_t cache;
> +    uint32_t empty_cache;
> +    uint32_t no_match_cache;
> +    uint32_t win_update;
> +    uint32_t no_match;
> +    uint32_t tcp_syn;
> +    uint32_t tcp_ctrl_drain;
> +    uint32_t dup_ack;
> +    uint32_t dup_ack1;
> +    uint32_t dup_ack2;
> +    uint32_t pure_ack;
> +    uint32_t ack_out_of_win;
> +    uint32_t data_out_of_win;
> +    uint32_t data_out_of_order;
> +    uint32_t data_after_pure_ack;
> +    uint32_t bypass_not_tcp;
> +    uint32_t tcp_option;
> +    uint32_t tcp_larger_option;
> +    uint32_t tcp_smaller_option;
> +    uint32_t ip_frag;
> +    uint32_t ip_hacked;
> +    uint32_t ip_option;
> +    uint32_t purge_failed;
> +    uint32_t drain_failed;
> +    uint32_t final_failed;
> +    int64_t  timer;
> +} NetRscStat;
> +
> +/* Rsc unit general info used to checking if can coalescing */
> +typedef struct NetRscUnit {
> +    void *ip;   /* ip header */
> +    uint16_t *ip_plen;      /* data len pointer in ip header field */
> +    struct tcp_header *tcp; /* tcp header */
> +    uint16_t tcp_hdrlen;    /* tcp header len */
> +    uint16_t payload;       /* pure payload without virtio/eth/ip/tcp */
> +} NetRscUnit;
> +
> +/* Coalesced segmant */
> +typedef struct NetRscSeg {
> +    QTAILQ_ENTRY(NetRscSeg) next;
> +    void *buf;
> +    size_t size;
> +    uint32_t dup_ack_count;
> +    bool is_coalesced;      /* need recal ipv4 header checksum, mark here */
> +    NetRscUnit unit;
> +    NetClientState *nc;
> +} NetRscSeg;
> +
> +/* Chain is divided by protocol(ipv4/v6) and NetClientInfo */
> +typedef struct NetRscChain {
> +    QTAILQ_ENTRY(NetRscChain) next;
> +    void *n;                            /* VirtIONet */
> +    uint16_t proto;
> +    uint16_t max_payload;
> +    QEMUTimer *drain_timer;
> +    QTAILQ_HEAD(, NetRscSeg) buffers;
> +    NetRscStat stat;
> +} NetRscChain;
> +
>  void virtio_instance_init_common(Object *proxy_obj, void *data,
>                                   size_t vdev_size, const char *vdev_name);
>  

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 3/3] virtio-net rsc: support coalescing ipv6 tcp traffic
  2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 3/3] virtio-net rsc: support coalescing ipv6 " wexu
@ 2016-04-05  2:50   ` Jason Wang
  2016-04-08  7:06     ` Wei Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Wang @ 2016-04-05  2:50 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: marcel, victork, dfleytma, yvugenfi, mst



On 04/04/2016 03:25 AM, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Most things like ipv4 except there is a significant difference between ipv4
> and ipv6, the fragment lenght in ipv4 header includes itself, while it's not

typo

> included for ipv6, thus means ipv6 can carry a real '65535' payload.
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>  hw/net/virtio-net.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 141 insertions(+), 6 deletions(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 81e8e71..2d09352 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -50,6 +50,10 @@
>  /* header lenght value in ip header without option */
>  #define VIRTIO_NET_IP4_HEADER_LENGTH 5
>  
> +#define ETH_IP6_HDR_SZ (ETH_HDR_SZ + IP6_HDR_SZ)
> +#define VIRTIO_NET_IP6_ADDR_SIZE   32      /* ipv6 saddr + daddr */
> +#define VIRTIO_NET_MAX_IP6_PAYLOAD VIRTIO_NET_MAX_TCP_PAYLOAD
> +
>  /* Purge coalesced packets timer interval */
>  #define VIRTIO_NET_RSC_INTERVAL  300000
>  
> @@ -1725,6 +1729,25 @@ static void virtio_net_rsc_extract_unit4(NetRscChain *chain,
>      unit->payload = htons(*unit->ip_plen) - ip_hdrlen - unit->tcp_hdrlen;
>  }
>  
> +static void virtio_net_rsc_extract_unit6(NetRscChain *chain,
> +                                         const uint8_t *buf, NetRscUnit* unit)
> +{
> +    uint16_t hdr_len;
> +    struct ip6_header *ip6;
> +
> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
> +    ip6 = (struct ip6_header *)(buf + hdr_len + sizeof(struct eth_header));
> +    unit->ip = ip6;
> +    unit->ip_plen = &(ip6->ip6_ctlun.ip6_un1.ip6_un1_plen);
> +    unit->tcp = (struct tcp_header *)(((uint8_t *)unit->ip)\
> +                                        + sizeof(struct ip6_header));
> +    unit->tcp_hdrlen = (htons(unit->tcp->th_offset_flags) & 0xF000) >> 10;
> +
> +    /* There is a difference between payload lenght in ipv4 and v6,
> +       ip header is excluded in ipv6 */
> +    unit->payload = htons(*unit->ip_plen) - unit->tcp_hdrlen;
> +}
> +
>  static void virtio_net_rsc_ipv4_checksum(struct ip_header *ip)
>  {
>      uint32_t sum;
> @@ -1738,7 +1761,9 @@ static size_t virtio_net_rsc_drain_seg(NetRscChain *chain, NetRscSeg *seg)
>  {
>      int ret;
>  
> -    virtio_net_rsc_ipv4_checksum(seg->unit.ip);
> +    if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
> +        virtio_net_rsc_ipv4_checksum(seg->unit.ip);
> +    }

Why not introduce proto specific checksum function for chain?

>      ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
>      QTAILQ_REMOVE(&chain->buffers, seg, next);
>      g_free(seg->buf);
> @@ -1804,7 +1829,18 @@ static void virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,
>      QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
>      chain->stat.cache++;
>  
> -    virtio_net_rsc_extract_unit4(chain, seg->buf, &seg->unit);
> +    switch (chain->proto) {
> +    case ETH_P_IP:
> +        virtio_net_rsc_extract_unit4(chain, seg->buf, &seg->unit);

Another call for proto specific callbacks maybe?

> +        break;
> +
> +    case ETH_P_IPV6:
> +        virtio_net_rsc_extract_unit6(chain, seg->buf, &seg->unit);
> +        break;
> +
> +    default:
> +        g_assert_not_reached();
> +    }
>  }
>  
>  static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain, NetRscSeg *seg,
> @@ -1948,6 +1984,24 @@ static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain, NetRscSeg *seg,
>      return virtio_net_rsc_coalesce_data(chain, seg, buf, unit);
>  }
>  
> +static int32_t virtio_net_rsc_coalesce6(NetRscChain *chain, NetRscSeg *seg,
> +                        const uint8_t *buf, size_t size, NetRscUnit *unit)
> +{
> +    struct ip6_header *ip1, *ip2;
> +
> +    ip1 = (struct ip6_header *)(unit->ip);
> +    ip2 = (struct ip6_header *)(seg->unit.ip);
> +    if (memcmp(&ip1->ip6_src, &ip2->ip6_src, sizeof(struct in6_address))
> +        || memcmp(&ip1->ip6_dst, &ip2->ip6_dst, sizeof(struct in6_address))
> +        || (unit->tcp->th_sport ^ seg->unit.tcp->th_sport)
> +        || (unit->tcp->th_dport ^ seg->unit.tcp->th_dport)) {
> +            chain->stat.no_match++;
> +            return RSC_NO_MATCH;
> +    }
> +
> +    return virtio_net_rsc_coalesce_data(chain, seg, buf, unit);
> +}
> +
>  /* Pakcets with 'SYN' should bypass, other flag should be sent after drain
>   * to prevent out of order */
>  static int virtio_net_rsc_tcp_ctrl_check(NetRscChain *chain,
> @@ -1991,7 +2045,11 @@ static size_t virtio_net_rsc_do_coalesce(NetRscChain *chain, NetClientState *nc,
>      NetRscSeg *seg, *nseg;
>  
>      QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
> -        ret = virtio_net_rsc_coalesce4(chain, seg, buf, size, unit);
> +        if (chain->proto == ETH_P_IP) {
> +            ret = virtio_net_rsc_coalesce4(chain, seg, buf, size, unit);
> +        } else {
> +            ret = virtio_net_rsc_coalesce6(chain, seg, buf, size, unit);

Ditto.

> +        }
>  
>          if (ret == RSC_FINAL) {
>              if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
> @@ -2116,13 +2174,82 @@ static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
>      return virtio_net_rsc_do_coalesce(chain, nc, buf, size, &unit);
>  }
>  
> +static int32_t virtio_net_rsc_sanity_check6(NetRscChain *chain,
> +                                            struct ip6_header *ip6,
> +                                            const uint8_t *buf, size_t size)
> +{
> +    uint16_t ip_len;
> +    uint16_t hdr_len;
> +
> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
> +    if (size < (hdr_len + sizeof(struct eth_header) + sizeof(struct ip6_header)
> +        + sizeof(tcp_header))) {
> +        return RSC_BYPASS;
> +    }
> +
> +    if (((ip6->ip6_ctlun.ip6_un1.ip6_un1_flow & 0xF0) >> 4)
> +        != IP_HEADER_VERSION_6) {
> +        return RSC_BYPASS;
> +    }
> +
> +    /* Both option and protocol is checked in this */
> +    if (ip6->ip6_ctlun.ip6_un1.ip6_un1_nxt != IPPROTO_TCP) {
> +        chain->stat.bypass_not_tcp++;
> +        return RSC_BYPASS;
> +    }
> +
> +    ip_len = htons(ip6->ip6_ctlun.ip6_un1.ip6_un1_plen);
> +    if (ip_len < sizeof(struct tcp_header)
> +        || ip_len > (size - hdr_len - sizeof(struct eth_header)
> +                     - sizeof(struct ip6_header))) {
> +        chain->stat.ip_hacked++;
> +        return RSC_BYPASS;
> +    }
> +
> +    return RSC_WANT;
> +}
> +
> +static size_t virtio_net_rsc_receive6(void *opq, NetClientState* nc,
> +                                      const uint8_t *buf, size_t size)
> +{
> +    int32_t ret;
> +    uint16_t hdr_len;
> +    NetRscChain *chain;
> +    NetRscUnit unit;
> +
> +    chain = (NetRscChain *)opq;
> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
> +    virtio_net_rsc_extract_unit6(chain, buf, &unit);
> +    if (RSC_WANT != virtio_net_rsc_sanity_check6(chain,
> +                                                 unit.ip, buf, size)) {
> +        return virtio_net_do_receive(nc, buf, size);
> +    }
> +
> +    ret = virtio_net_rsc_tcp_ctrl_check(chain, unit.tcp);
> +    if (ret == RSC_BYPASS) {
> +        return virtio_net_do_receive(nc, buf, size);
> +    } else if (ret == RSC_FINAL) {
> +        return virtio_net_rsc_drain_flow(chain, nc, buf, size,
> +                ((hdr_len + sizeof(struct eth_header)) + 8),
> +                VIRTIO_NET_IP6_ADDR_SIZE,
> +                hdr_len + sizeof(struct eth_header) + sizeof(struct ip6_header),
> +                VIRTIO_NET_TCP_PORT_SIZE);
> +    }
> +
> +    if (virtio_net_rsc_empty_cache(chain, nc, buf, size)) {
> +        return size;
> +    }
> +
> +    return virtio_net_rsc_do_coalesce(chain, nc, buf, size, &unit);
> +}
> +
>  static NetRscChain *virtio_net_rsc_lookup_chain(VirtIONet * n,
>                                                  NetClientState *nc,
>                                                  uint16_t proto)
>  {
>      NetRscChain *chain;
>  
> -    if (proto != (uint16_t)ETH_P_IP) {
> +    if ((proto != (uint16_t)ETH_P_IP) && (proto != (uint16_t)ETH_P_IPV6)) {
>          return NULL;
>      }
>  
> @@ -2135,7 +2262,11 @@ static NetRscChain *virtio_net_rsc_lookup_chain(VirtIONet * n,
>      chain = g_malloc(sizeof(*chain));
>      chain->n = n;
>      chain->proto = proto;
> -    chain->max_payload = VIRTIO_NET_MAX_IP4_PAYLOAD;
> +    if (proto == (uint16_t)ETH_P_IP) {
> +        chain->max_payload = VIRTIO_NET_MAX_IP4_PAYLOAD;
> +    } else {
> +        chain->max_payload = VIRTIO_NET_MAX_IP6_PAYLOAD;
> +    }
>      chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>                                        virtio_net_rsc_purge, chain);
>      memset(&chain->stat, 0, sizeof(chain->stat));
> @@ -2167,7 +2298,11 @@ static ssize_t virtio_net_rsc_receive(NetClientState *nc,
>          return virtio_net_do_receive(nc, buf, size);
>      } else {
>          chain->stat.received++;
> -        return virtio_net_rsc_receive4(chain, nc, buf, size);
> +        if (proto == (uint16_t)ETH_P_IP) {
> +            return virtio_net_rsc_receive4(chain, nc, buf, size);
> +        } else  {
> +            return virtio_net_rsc_receive6(chain, nc, buf, size);
> +        }
>      }
>  }
>  

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 0/3] Support Receive-Segment-Offload(RSC) for WHQL
  2016-04-03 19:25 [Qemu-devel] [ RFC Patch v4 0/3] Support Receive-Segment-Offload(RSC) for WHQL wexu
                   ` (2 preceding siblings ...)
  2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 3/3] virtio-net rsc: support coalescing ipv6 " wexu
@ 2016-04-05  2:52 ` Jason Wang
  3 siblings, 0 replies; 17+ messages in thread
From: Jason Wang @ 2016-04-05  2:52 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: marcel, victork, dfleytma, yvugenfi, mst



On 04/04/2016 03:25 AM, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Changes in V4:
> - Add new host feature bit
> - Replace using fixed header lenght with dynamic header lenght in VirtIONet 
> - Change ip/ip6 header union in NetRscUnit to void* pointer
> - Add macro prefix, adjust code indent, etc.

Looks good, just few comments.

It's also better to have a unit test for this.

>
> Changes in V3:
> - Removed big param list, replace it with 'NetRscUnit' 
> - Different virtio header size
> - Modify callback function to direct call.
> - Needn't check the failure of g_malloc()
> - Other code format adjustment, macro naming, etc 
>
> Changes in V2:
> - Add detailed commit log
>
> This patch is to support WHQL test for Windows guest, while this feature also
> benifits other guest works as a kernel 'gro' like feature with userspace 
> implementation.
> Feature information:
>   http://msdn.microsoft.com/en-us/library/windows/hardware/jj853324
>
> Both IPv4 and IPv6 are supported, though performance with userspace virtio
> is slow than vhost-net, there is about 1.5x to 2x performance improvement to
> userspace virtio, this is done by turning this feature on and disable
> 'tso/gso/gro' on corresponding tap interface and guest interface, while get
> less improment with all these feature on.
>
> Linux guest performance data(Netperf):
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.101 () port 0 AF_INET : nodelay
> Size   Size    Size     Time     Throughput  
> bytes  bytes   bytes    secs.    10^6bits/sec  
>
>  87380  16384     64    6.00     1221.20   
>  87380  16384     64    6.00     1260.30   
>
>  87380  16384    128    6.00     1978.51   
>  87380  16384    128    6.00     2286.05   
>
>  87380  16384    256    6.00     2677.94   
>  87380  16384    256    6.00     4615.42   
>
>  87380  16384    512    6.00     2956.54   
>  87380  16384    512    6.00     5356.39   
>
>  87380  16384   1024    6.00     2798.17   
>  87380  16384   1024    6.00     4943.30   
>
>  87380  16384   2048    6.00     2681.09   
>  87380  16384   2048    6.00     4835.81   
>
>  87380  16384   4096    6.00     3390.14   
>  87380  16384   4096    6.00     5391.54   
>
>  87380  16384   8092    6.00     3008.27   
>  87380  16384   8092    6.00     5381.68   
>
>  87380  16384  10240    6.00     2999.89   
>  87380  16384  10240    6.00     5393.11 
>
> Test steps:
> Although this feature is mainly used for window guest, i used linux guest to 
> help test
> the feature, to make things simple, i used 3 steps to test the patch as i moved 
> on.
> 1. With a tcp socket client/server pair running on 2 linux guest, thus i can 
> control
> the traffic and debugging the code as i want.
> 2. Netperf on linux guest test the throughput.
> 3. WHQL test with 2 Windows guests.
>
> Current status:
> IPv4 pass all the above tests.
> IPv6 just passed test step 1 and 2 as described ahead, the virtio nic cannot
> receive any packet in WHQL test, trying with a win debug binary
>
> Note:
> A 'MessageDevice' nic chose as 'Realtek' will panic the system sometimes during 
> setup, this can be figured out by replacing it with an 'e1000' nic.
>
> Todo:
> More sanity check and tcp 'ecn' and 'window' scale test.
>
> Wei Xu (3):
>   virtio-net rsc: add a new host offload(rsc) feature bit
>   virtio-net rsc: support coalescing ipv4 tcp traffic
>   virtio-net rsc: support coalescing ipv6 tcp traffic
>
>  hw/net/virtio-net.c                         | 642 +++++++++++++++++++++++++++-
>  include/hw/virtio/virtio-net.h              |   1 +
>  include/hw/virtio/virtio.h                  |  72 ++++
>  include/standard-headers/linux/virtio_net.h |   1 +
>  4 files changed, 714 insertions(+), 2 deletions(-)
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 1/3] virtio-net rsc: add a new host offload(rsc) feature bit
  2016-04-05  2:05   ` Jason Wang
@ 2016-04-05  8:17     ` Michael S. Tsirkin
  2016-04-10 15:23       ` Wei Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2016-04-05  8:17 UTC (permalink / raw)
  To: Jason Wang; +Cc: victork, yvugenfi, qemu-devel, wexu, marcel, dfleytma

On Tue, Apr 05, 2016 at 10:05:17AM +0800, Jason Wang wrote:
> 
> 
> On 04/04/2016 03:25 AM, wexu@redhat.com wrote:
> > From: Wei Xu <wexu@redhat.com>
> >
> > A new feature bit 'VIRTIO_NET_F_GUEST_RSC' is introduced to support WHQL
> > Receive-Segment-Offload test, this feature will coalesce tcp packets in
> > IPv4/6 for userspace virtio-net driver.
> >
> > This feature can be enabled either by 'ACK' the feature when loading
> > the driver in the guest, or by sending the 'VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET'
> > command to the host via control queue.
> >
> > Signed-off-by: Wei Xu <wexu@redhat.com>
> > ---
> >  hw/net/virtio-net.c                         | 29 +++++++++++++++++++++++++++--
> >  include/standard-headers/linux/virtio_net.h |  1 +
> >  2 files changed, 28 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 5798f87..bd91a4b 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -537,6 +537,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
> >          virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
> >          virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
> >          virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ECN);
> > +        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_RSC);
> 
> Several questions here:
> 
> - I think RSC should work even without vnet_hdr?
> - Need we differentiate ipv4 and ipv6 like TSO here?
> - And looks like this patch should be squash to following patches.
> 
> >      }
> >  
> >      if (!peer_has_vnet_hdr(n) || !peer_has_ufo(n)) {
> > @@ -582,7 +583,8 @@ static uint64_t virtio_net_guest_offloads_by_features(uint32_t features)
> >          (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
> >          (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
> >          (1ULL << VIRTIO_NET_F_GUEST_ECN)  |
> > -        (1ULL << VIRTIO_NET_F_GUEST_UFO);
> > +        (1ULL << VIRTIO_NET_F_GUEST_UFO)  |
> > +        (1ULL << VIRTIO_NET_F_GUEST_RSC);
> 
> Looks like this is unnecessary since we won't set peer offload based on
> GUEST_RSC.
> 
> >  
> >      return guest_offloads_mask & features;
> >  }
> > @@ -1089,7 +1091,8 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
> >      return 0;
> >  }
> >  
> > -static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> > +static ssize_t virtio_net_do_receive(NetClientState *nc,
> > +                                     const uint8_t *buf, size_t size)
> >  {
> >      VirtIONet *n = qemu_get_nic_opaque(nc);
> >      VirtIONetQueue *q = virtio_net_get_subqueue(nc);
> > @@ -1685,6 +1688,26 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
> >      return 0;
> >  }
> >  
> > +
> > +static ssize_t virtio_net_rsc_receive(NetClientState *nc,
> > +                                      const uint8_t *buf, size_t size)
> > +{
> > +    return virtio_net_do_receive(nc, buf, size);
> > +}
> > +
> > +static ssize_t virtio_net_receive(NetClientState *nc,
> > +                                  const uint8_t *buf, size_t size)
> > +{
> > +    VirtIONet *n;
> > +
> > +    n = qemu_get_nic_opaque(nc);
> > +    if (n->curr_guest_offloads & VIRTIO_NET_F_GUEST_RSC) {
> > +        return virtio_net_rsc_receive(nc, buf, size);
> > +    } else {
> > +        return virtio_net_do_receive(nc, buf, size);
> > +    }
> > +}
> 
> The changes here looks odd since it did nothing. Like I've mentioned,
> better merge the patch into following ones.
> 
> > +
> >  static NetClientInfo net_virtio_info = {
> >      .type = NET_CLIENT_OPTIONS_KIND_NIC,
> >      .size = sizeof(NICState),
> > @@ -1909,6 +1932,8 @@ static Property virtio_net_properties[] = {
> >                         TX_TIMER_INTERVAL),
> >      DEFINE_PROP_INT32("x-txburst", VirtIONet, net_conf.txburst, TX_BURST),
> >      DEFINE_PROP_STRING("tx", VirtIONet, net_conf.tx),
> > +    DEFINE_PROP_BIT("guest_rsc", VirtIONet, host_features,
> > +                    VIRTIO_NET_F_GUEST_RSC, true),
> 
> Need to compat the bit for old machine type to unbreak migration I believe?

And definitely disable by default.

> Btw, also need a patch for virtio spec.
> 
> Thanks
> 
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
> > index a78f33e..5b95762 100644
> > --- a/include/standard-headers/linux/virtio_net.h
> > +++ b/include/standard-headers/linux/virtio_net.h
> > @@ -55,6 +55,7 @@
> >  #define VIRTIO_NET_F_MQ	22	/* Device supports Receive Flow
> >  					 * Steering */
> >  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
> > +#define VIRTIO_NET_F_GUEST_RSC  24      /* Guest can coalesce tcp packets */
> >  
> >  #ifndef VIRTIO_NET_NO_LEGACY
> >  #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 3/3] virtio-net rsc: support coalescing ipv6 tcp traffic
  2016-04-05  2:50   ` Jason Wang
@ 2016-04-08  7:06     ` Wei Xu
  2016-04-08  7:27       ` Jason Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Wei Xu @ 2016-04-08  7:06 UTC (permalink / raw)
  To: qemu-devel



On 2016年04月05日 10:50, Jason Wang wrote:
>
> On 04/04/2016 03:25 AM, wexu@redhat.com wrote:
>> From: Wei Xu <wexu@redhat.com>
>>
>> Most things like ipv4 except there is a significant difference between ipv4
>> and ipv6, the fragment lenght in ipv4 header includes itself, while it's not
> typo
Thanks.
>
>> included for ipv6, thus means ipv6 can carry a real '65535' payload.
>>
>> Signed-off-by: Wei Xu <wexu@redhat.com>
>> ---
>>   hw/net/virtio-net.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 141 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 81e8e71..2d09352 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -50,6 +50,10 @@
>>   /* header lenght value in ip header without option */
>>   #define VIRTIO_NET_IP4_HEADER_LENGTH 5
>>   
>> +#define ETH_IP6_HDR_SZ (ETH_HDR_SZ + IP6_HDR_SZ)
>> +#define VIRTIO_NET_IP6_ADDR_SIZE   32      /* ipv6 saddr + daddr */
>> +#define VIRTIO_NET_MAX_IP6_PAYLOAD VIRTIO_NET_MAX_TCP_PAYLOAD
>> +
>>   /* Purge coalesced packets timer interval */
>>   #define VIRTIO_NET_RSC_INTERVAL  300000
>>   
>> @@ -1725,6 +1729,25 @@ static void virtio_net_rsc_extract_unit4(NetRscChain *chain,
>>       unit->payload = htons(*unit->ip_plen) - ip_hdrlen - unit->tcp_hdrlen;
>>   }
>>   
>> +static void virtio_net_rsc_extract_unit6(NetRscChain *chain,
>> +                                         const uint8_t *buf, NetRscUnit* unit)
>> +{
>> +    uint16_t hdr_len;
>> +    struct ip6_header *ip6;
>> +
>> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
>> +    ip6 = (struct ip6_header *)(buf + hdr_len + sizeof(struct eth_header));
>> +    unit->ip = ip6;
>> +    unit->ip_plen = &(ip6->ip6_ctlun.ip6_un1.ip6_un1_plen);
>> +    unit->tcp = (struct tcp_header *)(((uint8_t *)unit->ip)\
>> +                                        + sizeof(struct ip6_header));
>> +    unit->tcp_hdrlen = (htons(unit->tcp->th_offset_flags) & 0xF000) >> 10;
>> +
>> +    /* There is a difference between payload lenght in ipv4 and v6,
>> +       ip header is excluded in ipv6 */
>> +    unit->payload = htons(*unit->ip_plen) - unit->tcp_hdrlen;
>> +}
>> +
>>   static void virtio_net_rsc_ipv4_checksum(struct ip_header *ip)
>>   {
>>       uint32_t sum;
>> @@ -1738,7 +1761,9 @@ static size_t virtio_net_rsc_drain_seg(NetRscChain *chain, NetRscSeg *seg)
>>   {
>>       int ret;
>>   
>> -    virtio_net_rsc_ipv4_checksum(seg->unit.ip);
>> +    if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
>> +        virtio_net_rsc_ipv4_checksum(seg->unit.ip);
>> +    }
> Why not introduce proto specific checksum function for chain?
Since there are only 2 protocols to be supported, and very limited 
extension for this feature, mst suggest to use direct call in v2 patch
to make things simple, and i took it.
>
>>       ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
>>       QTAILQ_REMOVE(&chain->buffers, seg, next);
>>       g_free(seg->buf);
>> @@ -1804,7 +1829,18 @@ static void virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,
>>       QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
>>       chain->stat.cache++;
>>   
>> -    virtio_net_rsc_extract_unit4(chain, seg->buf, &seg->unit);
>> +    switch (chain->proto) {
>> +    case ETH_P_IP:
>> +        virtio_net_rsc_extract_unit4(chain, seg->buf, &seg->unit);
> Another call for proto specific callbacks maybe?
Same as above.
>
>> +        break;
>> +
>> +    case ETH_P_IPV6:
>> +        virtio_net_rsc_extract_unit6(chain, seg->buf, &seg->unit);
>> +        break;
>> +
>> +    default:
>> +        g_assert_not_reached();
>> +    }
>>   }
>>   
>>   static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain, NetRscSeg *seg,
>> @@ -1948,6 +1984,24 @@ static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain, NetRscSeg *seg,
>>       return virtio_net_rsc_coalesce_data(chain, seg, buf, unit);
>>   }
>>   
>> +static int32_t virtio_net_rsc_coalesce6(NetRscChain *chain, NetRscSeg *seg,
>> +                        const uint8_t *buf, size_t size, NetRscUnit *unit)
>> +{
>> +    struct ip6_header *ip1, *ip2;
>> +
>> +    ip1 = (struct ip6_header *)(unit->ip);
>> +    ip2 = (struct ip6_header *)(seg->unit.ip);
>> +    if (memcmp(&ip1->ip6_src, &ip2->ip6_src, sizeof(struct in6_address))
>> +        || memcmp(&ip1->ip6_dst, &ip2->ip6_dst, sizeof(struct in6_address))
>> +        || (unit->tcp->th_sport ^ seg->unit.tcp->th_sport)
>> +        || (unit->tcp->th_dport ^ seg->unit.tcp->th_dport)) {
>> +            chain->stat.no_match++;
>> +            return RSC_NO_MATCH;
>> +    }
>> +
>> +    return virtio_net_rsc_coalesce_data(chain, seg, buf, unit);
>> +}
>> +
>>   /* Pakcets with 'SYN' should bypass, other flag should be sent after drain
>>    * to prevent out of order */
>>   static int virtio_net_rsc_tcp_ctrl_check(NetRscChain *chain,
>> @@ -1991,7 +2045,11 @@ static size_t virtio_net_rsc_do_coalesce(NetRscChain *chain, NetClientState *nc,
>>       NetRscSeg *seg, *nseg;
>>   
>>       QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
>> -        ret = virtio_net_rsc_coalesce4(chain, seg, buf, size, unit);
>> +        if (chain->proto == ETH_P_IP) {
>> +            ret = virtio_net_rsc_coalesce4(chain, seg, buf, size, unit);
>> +        } else {
>> +            ret = virtio_net_rsc_coalesce6(chain, seg, buf, size, unit);
> Ditto.
Ditto too:)
>
>> +        }
>>   
>>           if (ret == RSC_FINAL) {
>>               if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
>> @@ -2116,13 +2174,82 @@ static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
>>       return virtio_net_rsc_do_coalesce(chain, nc, buf, size, &unit);
>>   }
>>   
>> +static int32_t virtio_net_rsc_sanity_check6(NetRscChain *chain,
>> +                                            struct ip6_header *ip6,
>> +                                            const uint8_t *buf, size_t size)
>> +{
>> +    uint16_t ip_len;
>> +    uint16_t hdr_len;
>> +
>> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
>> +    if (size < (hdr_len + sizeof(struct eth_header) + sizeof(struct ip6_header)
>> +        + sizeof(tcp_header))) {
>> +        return RSC_BYPASS;
>> +    }
>> +
>> +    if (((ip6->ip6_ctlun.ip6_un1.ip6_un1_flow & 0xF0) >> 4)
>> +        != IP_HEADER_VERSION_6) {
>> +        return RSC_BYPASS;
>> +    }
>> +
>> +    /* Both option and protocol is checked in this */
>> +    if (ip6->ip6_ctlun.ip6_un1.ip6_un1_nxt != IPPROTO_TCP) {
>> +        chain->stat.bypass_not_tcp++;
>> +        return RSC_BYPASS;
>> +    }
>> +
>> +    ip_len = htons(ip6->ip6_ctlun.ip6_un1.ip6_un1_plen);
>> +    if (ip_len < sizeof(struct tcp_header)
>> +        || ip_len > (size - hdr_len - sizeof(struct eth_header)
>> +                     - sizeof(struct ip6_header))) {
>> +        chain->stat.ip_hacked++;
>> +        return RSC_BYPASS;
>> +    }
>> +
>> +    return RSC_WANT;
>> +}
>> +
>> +static size_t virtio_net_rsc_receive6(void *opq, NetClientState* nc,
>> +                                      const uint8_t *buf, size_t size)
>> +{
>> +    int32_t ret;
>> +    uint16_t hdr_len;
>> +    NetRscChain *chain;
>> +    NetRscUnit unit;
>> +
>> +    chain = (NetRscChain *)opq;
>> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
>> +    virtio_net_rsc_extract_unit6(chain, buf, &unit);
>> +    if (RSC_WANT != virtio_net_rsc_sanity_check6(chain,
>> +                                                 unit.ip, buf, size)) {
>> +        return virtio_net_do_receive(nc, buf, size);
>> +    }
>> +
>> +    ret = virtio_net_rsc_tcp_ctrl_check(chain, unit.tcp);
>> +    if (ret == RSC_BYPASS) {
>> +        return virtio_net_do_receive(nc, buf, size);
>> +    } else if (ret == RSC_FINAL) {
>> +        return virtio_net_rsc_drain_flow(chain, nc, buf, size,
>> +                ((hdr_len + sizeof(struct eth_header)) + 8),
>> +                VIRTIO_NET_IP6_ADDR_SIZE,
>> +                hdr_len + sizeof(struct eth_header) + sizeof(struct ip6_header),
>> +                VIRTIO_NET_TCP_PORT_SIZE);
>> +    }
>> +
>> +    if (virtio_net_rsc_empty_cache(chain, nc, buf, size)) {
>> +        return size;
>> +    }
>> +
>> +    return virtio_net_rsc_do_coalesce(chain, nc, buf, size, &unit);
>> +}
>> +
>>   static NetRscChain *virtio_net_rsc_lookup_chain(VirtIONet * n,
>>                                                   NetClientState *nc,
>>                                                   uint16_t proto)
>>   {
>>       NetRscChain *chain;
>>   
>> -    if (proto != (uint16_t)ETH_P_IP) {
>> +    if ((proto != (uint16_t)ETH_P_IP) && (proto != (uint16_t)ETH_P_IPV6)) {
>>           return NULL;
>>       }
>>   
>> @@ -2135,7 +2262,11 @@ static NetRscChain *virtio_net_rsc_lookup_chain(VirtIONet * n,
>>       chain = g_malloc(sizeof(*chain));
>>       chain->n = n;
>>       chain->proto = proto;
>> -    chain->max_payload = VIRTIO_NET_MAX_IP4_PAYLOAD;
>> +    if (proto == (uint16_t)ETH_P_IP) {
>> +        chain->max_payload = VIRTIO_NET_MAX_IP4_PAYLOAD;
>> +    } else {
>> +        chain->max_payload = VIRTIO_NET_MAX_IP6_PAYLOAD;
>> +    }
>>       chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>>                                         virtio_net_rsc_purge, chain);
>>       memset(&chain->stat, 0, sizeof(chain->stat));
>> @@ -2167,7 +2298,11 @@ static ssize_t virtio_net_rsc_receive(NetClientState *nc,
>>           return virtio_net_do_receive(nc, buf, size);
>>       } else {
>>           chain->stat.received++;
>> -        return virtio_net_rsc_receive4(chain, nc, buf, size);
>> +        if (proto == (uint16_t)ETH_P_IP) {
>> +            return virtio_net_rsc_receive4(chain, nc, buf, size);
>> +        } else  {
>> +            return virtio_net_rsc_receive6(chain, nc, buf, size);
>> +        }
>>       }
>>   }
>>   
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 3/3] virtio-net rsc: support coalescing ipv6 tcp traffic
  2016-04-08  7:06     ` Wei Xu
@ 2016-04-08  7:27       ` Jason Wang
  2016-04-08  7:51         ` Wei Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Wang @ 2016-04-08  7:27 UTC (permalink / raw)
  To: Wei Xu, qemu-devel



On 04/08/2016 03:06 PM, Wei Xu wrote:
>
>
> On 2016年04月05日 10:50, Jason Wang wrote:
>>
>> On 04/04/2016 03:25 AM, wexu@redhat.com wrote:
>>> From: Wei Xu <wexu@redhat.com>
>>>
>>> Most things like ipv4 except there is a significant difference
>>> between ipv4
>>> and ipv6, the fragment lenght in ipv4 header includes itself, while
>>> it's not
>> typo
> Thanks.
>>
>>> included for ipv6, thus means ipv6 can carry a real '65535' payload.
>>>
>>> Signed-off-by: Wei Xu <wexu@redhat.com>
>>> ---
>>>   hw/net/virtio-net.c | 147
>>> +++++++++++++++++++++++++++++++++++++++++++++++++---
>>>   1 file changed, 141 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index 81e8e71..2d09352 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -50,6 +50,10 @@
>>>   /* header lenght value in ip header without option */
>>>   #define VIRTIO_NET_IP4_HEADER_LENGTH 5
>>>   +#define ETH_IP6_HDR_SZ (ETH_HDR_SZ + IP6_HDR_SZ)
>>> +#define VIRTIO_NET_IP6_ADDR_SIZE   32      /* ipv6 saddr + daddr */
>>> +#define VIRTIO_NET_MAX_IP6_PAYLOAD VIRTIO_NET_MAX_TCP_PAYLOAD
>>> +
>>>   /* Purge coalesced packets timer interval */
>>>   #define VIRTIO_NET_RSC_INTERVAL  300000
>>>   @@ -1725,6 +1729,25 @@ static void
>>> virtio_net_rsc_extract_unit4(NetRscChain *chain,
>>>       unit->payload = htons(*unit->ip_plen) - ip_hdrlen -
>>> unit->tcp_hdrlen;
>>>   }
>>>   +static void virtio_net_rsc_extract_unit6(NetRscChain *chain,
>>> +                                         const uint8_t *buf,
>>> NetRscUnit* unit)
>>> +{
>>> +    uint16_t hdr_len;
>>> +    struct ip6_header *ip6;
>>> +
>>> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
>>> +    ip6 = (struct ip6_header *)(buf + hdr_len + sizeof(struct
>>> eth_header));
>>> +    unit->ip = ip6;
>>> +    unit->ip_plen = &(ip6->ip6_ctlun.ip6_un1.ip6_un1_plen);
>>> +    unit->tcp = (struct tcp_header *)(((uint8_t *)unit->ip)\
>>> +                                        + sizeof(struct ip6_header));
>>> +    unit->tcp_hdrlen = (htons(unit->tcp->th_offset_flags) & 0xF000)
>>> >> 10;
>>> +
>>> +    /* There is a difference between payload lenght in ipv4 and v6,
>>> +       ip header is excluded in ipv6 */
>>> +    unit->payload = htons(*unit->ip_plen) - unit->tcp_hdrlen;
>>> +}
>>> +
>>>   static void virtio_net_rsc_ipv4_checksum(struct ip_header *ip)
>>>   {
>>>       uint32_t sum;
>>> @@ -1738,7 +1761,9 @@ static size_t
>>> virtio_net_rsc_drain_seg(NetRscChain *chain, NetRscSeg *seg)
>>>   {
>>>       int ret;
>>>   -    virtio_net_rsc_ipv4_checksum(seg->unit.ip);
>>> +    if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
>>> +        virtio_net_rsc_ipv4_checksum(seg->unit.ip);
>>> +    }
>> Why not introduce proto specific checksum function for chain?
> Since there are only 2 protocols to be supported, and very limited
> extension for this feature, mst suggest to use direct call in v2 patch
> to make things simple, and i took it.

Have you tried with my suggestion? I think it will actually simplify the
current code (at least several lines of codes).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic
  2016-04-05  2:47   ` Jason Wang
@ 2016-04-08  7:47     ` Wei Xu
  2016-04-08  8:31       ` Jason Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Wei Xu @ 2016-04-08  7:47 UTC (permalink / raw)
  To: qemu-devel



On 2016年04月05日 10:47, Jason Wang wrote:
>
> On 04/04/2016 03:25 AM, wexu@redhat.com wrote:
>> From: Wei Xu <wexu@redhat.com>
>>
>> All the data packets in a tcp connection will be cached to a big buffer
>> in every receive interval, and will be sent out via a timer, the
>> 'virtio_net_rsc_timeout' controls the interval, the value will influent the
>> performance and response of tcp connection extremely, 50000(50us) is a
>> experience value to gain a performance improvement, since the whql test
>> sends packets every 100us, so '300000(300us)' can pass the test case,
>> this is also the default value, it's gonna to be tunable.
>>
>> The timer will only be triggered if the packets pool is not empty,
>> and it'll drain off all the cached packets
>>
>> 'NetRscChain' is used to save the segments of different protocols in a
>> VirtIONet device.
>>
>> The main handler of TCP includes TCP window update, duplicated ACK check
>> and the real data coalescing if the new segment passed sanity check
>> and is identified as an 'wanted' one.
>>
>> An 'wanted' segment means:
>> 1. Segment is within current window and the sequence is the expected one.
>> 2. ACK of the segment is in the valid window.
>> 3. If the ACK in the segment is a duplicated one, then it must less than 2,
>>     this is to notify upper layer TCP starting retransmission due to the spec.
>>
>> Sanity check includes:
>> 1. Incorrect version in IP header
>> 2. IP options & IP fragment
>> 3. Not a TCP packets
>> 4. Sanity size check to prevent buffer overflow attack.
>>
>> There maybe more cases should be considered such as ip identification other
>> flags, while it broke the test because windows set it to the same even it's
>> not a fragment.
>>
>> Normally it includes 2 typical ways to handle a TCP control flag, 'bypass'
>> and 'finalize', 'bypass' means should be sent out directly, and 'finalize'
>> means the packets should also be bypassed, and this should be done
>> after searching for the same connection packets in the pool and sending
>> all of them out, this is to avoid out of data.
>>
>> All the 'SYN' packets will be bypassed since this always begin a new'
>> connection, other flags such 'FIN/RST' will trigger a finalization, because
>> this normally happens upon a connection is going to be closed, an 'URG' packet
>> also finalize current coalescing unit.
>>
>> Statistics can be used to monitor the basic coalescing status, the 'out of order'
>> and 'out of window' means how many retransmitting packets, thus describe the
>> performance intuitively.
>>
>> Signed-off-by: Wei Xu <wexu@redhat.com>
>> ---
>>   hw/net/virtio-net.c            | 480 ++++++++++++++++++++++++++++++++++++++++-
>>   include/hw/virtio/virtio-net.h |   1 +
>>   include/hw/virtio/virtio.h     |  72 +++++++
>>   3 files changed, 552 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index bd91a4b..81e8e71 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -15,10 +15,12 @@
>>   #include "qemu/iov.h"
>>   #include "hw/virtio/virtio.h"
>>   #include "net/net.h"
>> +#include "net/eth.h"
>>   #include "net/checksum.h"
>>   #include "net/tap.h"
>>   #include "qemu/error-report.h"
>>   #include "qemu/timer.h"
>> +#include "qemu/sockets.h"
>>   #include "hw/virtio/virtio-net.h"
>>   #include "net/vhost_net.h"
>>   #include "hw/virtio/virtio-bus.h"
>> @@ -38,6 +40,24 @@
>>   #define endof(container, field) \
>>       (offsetof(container, field) + sizeof(((container *)0)->field))
>>   
>> +#define VIRTIO_NET_IP4_ADDR_SIZE   8        /* ipv4 saddr + daddr */
>> +#define VIRTIO_NET_TCP_PORT_SIZE   4        /* sport + dport */
>> +
>> +/* IPv4 max payload, 16 bits in the header */
>> +#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
>> +#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
>> +
>> +/* header lenght value in ip header without option */
> typo here.
Thanks.
>
>> +#define VIRTIO_NET_IP4_HEADER_LENGTH 5
>> +
>> +/* Purge coalesced packets timer interval */
>> +#define VIRTIO_NET_RSC_INTERVAL  300000
>> +
>> +/* This value affects the performance a lot, and should be tuned carefully,
>> +   '300000'(300us) is the recommended value to pass the WHQL test, '50000' can
>> +   gain 2x netperf throughput with tso/gso/gro 'off'. */
>> +static uint32_t virtio_net_rsc_timeout = VIRTIO_NET_RSC_INTERVAL;
> Like we've discussed in previous versions, need we add another property
> for this?
Do you know how to make this a tunable parameter to guest? can this 
parameter be set via control queue?
>
>> +
>>   typedef struct VirtIOFeature {
>>       uint32_t flags;
>>       size_t end;
>> @@ -1688,11 +1708,467 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>>       return 0;
>>   }
>>   
>> +static void virtio_net_rsc_extract_unit4(NetRscChain *chain,
>> +                                         const uint8_t *buf, NetRscUnit* unit)
>> +{
>> +    uint16_t hdr_len;
>> +    uint16_t ip_hdrlen;
>> +    struct ip_header *ip;
>> +
>> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
> The case seems odd. Why not just use VirtIONet * for chain->n?
OK, will take it.
>
>> +    ip = (struct ip_header *)(buf + hdr_len + sizeof(struct eth_header));
>> +    unit->ip = (void *)ip;
>> +    ip_hdrlen = (ip->ip_ver_len & 0xF) << 2;
>> +    unit->ip_plen = &ip->ip_len;
>> +    unit->tcp = (struct tcp_header *)(((uint8_t *)unit->ip) + ip_hdrlen);
>> +    unit->tcp_hdrlen = (htons(unit->tcp->th_offset_flags) & 0xF000) >> 10;
>> +    unit->payload = htons(*unit->ip_plen) - ip_hdrlen - unit->tcp_hdrlen;
>> +}
>> +
>> +static void virtio_net_rsc_ipv4_checksum(struct ip_header *ip)
>> +{
>> +    uint32_t sum;
>> +
>> +    ip->ip_sum = 0;
>> +    sum = net_checksum_add_cont(sizeof(struct ip_header), (uint8_t *)ip, 0);
>> +    ip->ip_sum = cpu_to_be16(net_checksum_finish(sum));
>   
> Should we set VIRTIO_NET_HDR_F_DATA_VALID for hdr.flags here?
have a look at the code, seems should set it, packets will be dropped by 
guest if don't calculate checksum here with this patch,
looks there is an inexplicit logical check, will check it out and test 
again, thanks.
>
>> +}
>> +
>> +static size_t virtio_net_rsc_drain_seg(NetRscChain *chain, NetRscSeg *seg)
>> +{
>> +    int ret;
>> +
>> +    virtio_net_rsc_ipv4_checksum(seg->unit.ip);
>> +    ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
>> +    QTAILQ_REMOVE(&chain->buffers, seg, next);
>> +    g_free(seg->buf);
>> +    g_free(seg);
>> +
>> +    return ret;
>> +}
>> +
>> +static void virtio_net_rsc_purge(void *opq)
>> +{
>> +    NetRscChain *chain = (NetRscChain *)opq;
>> +    NetRscSeg *seg, *rn;
>> +
>> +    QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
>> +        if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
>> +            chain->stat.purge_failed++;
>> +            continue;
> Is it better to break here, consider we fail to do the receive?
Actually this fails only when receive fails according to the test, but 
should drain all the segments here,
if break here, then will have to free all the buffers, is it possible a 
successful receiving followed by a failed one?
if that does happens, i preferred give it a shot other than free it 
directly.
>
>> +        }
>> +    }
>> +
>> +    chain->stat.timer++;
>> +    if (!QTAILQ_EMPTY(&chain->buffers)) {
>> +        timer_mod(chain->drain_timer,
>> +              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + virtio_net_rsc_timeout);
>> +    }
>> +}
>> +
>> +static void virtio_net_rsc_cleanup(VirtIONet *n)
>> +{
>> +    NetRscChain *chain, *rn_chain;
>> +    NetRscSeg *seg, *rn_seg;
>> +
>> +    QTAILQ_FOREACH_SAFE(chain, &n->rsc_chains, next, rn_chain) {
>> +        QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn_seg) {
>> +            QTAILQ_REMOVE(&chain->buffers, seg, next);
>> +            g_free(seg->buf);
>> +            g_free(seg);
>> +        }
>> +
>> +        timer_del(chain->drain_timer);
>> +        timer_free(chain->drain_timer);
>> +        QTAILQ_REMOVE(&n->rsc_chains, chain, next);
>> +        g_free(chain);
>> +    }
>> +}
>> +
>> +static void virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,
>> +                                     const uint8_t *buf, size_t size)
>> +{
>> +    uint16_t hdr_len;
>> +    NetRscSeg *seg;
>> +
>> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
>> +    seg = g_malloc(sizeof(NetRscSeg));
>> +    seg->buf = g_malloc(hdr_len + sizeof(struct eth_header)\
>> +                   + sizeof(struct ip6_header) + VIRTIO_NET_MAX_TCP_PAYLOAD);
> Why ip6_header here?
ipv6 packets can carry 65535(2^16) bytes pure data payload, that means 
the header is not included.
but ipv4 is included, i choose the maximum here.
>
>> +    memcpy(seg->buf, buf, size);
>> +    seg->size = size;
>> +    seg->dup_ack_count = 0;
>> +    seg->is_coalesced = 0;
>> +    seg->nc = nc;
>> +
>> +    QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
>> +    chain->stat.cache++;
>> +
>> +    virtio_net_rsc_extract_unit4(chain, seg->buf, &seg->unit);
>> +}
>> +
>> +static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain, NetRscSeg *seg,
>> +                                 const uint8_t *buf, struct tcp_header *n_tcp,
>> +                                 struct tcp_header *o_tcp)
>> +{
>> +    uint32_t nack, oack;
>> +    uint16_t nwin, owin;
>> +
>> +    nack = htonl(n_tcp->th_ack);
>> +    nwin = htons(n_tcp->th_win);
>> +    oack = htonl(o_tcp->th_ack);
>> +    owin = htons(o_tcp->th_win);
>> +
>> +    if ((nack - oack) >= VIRTIO_NET_MAX_TCP_PAYLOAD) {
>> +        chain->stat.ack_out_of_win++;
>> +        return RSC_FINAL;
>> +    } else if (nack == oack) {
>> +        /* duplicated ack or window probe */
>> +        if (nwin == owin) {
>> +            /* duplicated ack, add dup ack count due to whql test up to 1 */
>> +            chain->stat.dup_ack++;
>> +
>> +            if (seg->dup_ack_count == 0) {
>> +                seg->dup_ack_count++;
>> +                chain->stat.dup_ack1++;
>> +                return RSC_COALESCE;
>> +            } else {
>> +                /* Spec says should send it directly */
> Better quote the what spec said here or just remove the above comment.
OK.
>
>> +                chain->stat.dup_ack2++;
>> +                return RSC_FINAL;
>> +            }
>> +        } else {
>> +            /* Coalesce window update */
>> +            o_tcp->th_win = n_tcp->th_win;
>> +            chain->stat.win_update++;
>> +            return RSC_COALESCE;
>> +        }
>> +    } else {
>> +        /* pure ack, update ack */
>> +        o_tcp->th_ack = n_tcp->th_ack;
>> +        chain->stat.pure_ack++;
>> +        return RSC_COALESCE;
>> +    }
>> +}
>> +
>> +static int32_t virtio_net_rsc_coalesce_data(NetRscChain *chain, NetRscSeg *seg,
>> +                                    const uint8_t *buf, NetRscUnit *n_unit)
>> +{
>> +    void *data;
>> +    uint16_t o_ip_len;
>> +    uint32_t nseq, oseq;
>> +    NetRscUnit *o_unit;
>> +
>> +    o_unit = &seg->unit;
>> +    o_ip_len = htons(*o_unit->ip_plen);
>> +    nseq = htonl(n_unit->tcp->th_seq);
>> +    oseq = htonl(o_unit->tcp->th_seq);
>> +
>> +    if (n_unit->tcp_hdrlen > sizeof(struct tcp_header)) {
>> +        /* Log this only for debugging observation */
>> +        chain->stat.tcp_option++;
>> +    }
>> +
>> +    /* Ignore packet with more/larger tcp options */
>> +    if (n_unit->tcp_hdrlen > o_unit->tcp_hdrlen) {
>> +        chain->stat.tcp_larger_option++;
>> +        return RSC_FINAL;
>> +    }
>> +
>> +    /* Remember larger/smaller options may behave different,so keep a
>> +       separate statitics for each of them for debugging convenience. */
>> +    if (n_unit->tcp_hdrlen < o_unit->tcp_hdrlen) {
>> +        chain->stat.tcp_smaller_option++;
>> +        return RSC_FINAL;
>> +    }
>> +
>> +    /* out of order or retransmitted. */
>> +    if ((nseq - oseq) > VIRTIO_NET_MAX_TCP_PAYLOAD) {
>> +        chain->stat.data_out_of_win++;
>> +        return RSC_FINAL;
>> +    }
>> +
>> +    data = ((uint8_t *)n_unit->tcp) + n_unit->tcp_hdrlen;
>> +    if (nseq == oseq) {
>> +        if ((0 == o_unit->payload) && n_unit->payload) {
> o_unit->payload == 0 please.
ok, sorry.
>
>> +            /* From no payload to payload, normal case, not a dup ack or etc */
>> +            chain->stat.data_after_pure_ack++;
>> +            goto coalesce;
>> +        } else {
>> +            return virtio_net_rsc_handle_ack(chain, seg, buf,
>> +                                             n_unit->tcp, o_unit->tcp);
>> +        }
>> +    } else if ((nseq - oseq) != o_unit->payload) {
>> +        /* Not a consistent packet, out of order */
>> +        chain->stat.data_out_of_order++;
>> +        return RSC_FINAL;
>> +    } else {
>> +coalesce:
>> +        if ((o_ip_len + n_unit->payload) > chain->max_payload) {
>> +            chain->stat.over_size++;
>> +            return RSC_FINAL;
>> +        }
>> +
>> +        /* Here comes the right data, the payload lengh in v4/v6 is different,
>> +           so use the field value to update and record the new data len */
>> +        o_unit->payload += n_unit->payload; /* update new data len */
>> +
>> +        /* update field in ip header */
>> +        *o_unit->ip_plen = htons(o_ip_len + n_unit->payload);
>> +
>> +        /* Bring 'PUSH' big, the whql test guide says 'PUSH' can be coalesced
>> +           for windows guest, while this may change the behavior for linux
>> +           guest. */
> In the future, need to measure TCP_RR performance in this case to see
> the impact on Linux guests.
OK.
>
>> +        o_unit->tcp->th_offset_flags = n_unit->tcp->th_offset_flags;
>> +
>> +        o_unit->tcp->th_ack = n_unit->tcp->th_ack;
>> +        o_unit->tcp->th_win = n_unit->tcp->th_win;
>> +
>> +        memmove(seg->buf + seg->size, data, n_unit->payload);
>> +        seg->size += n_unit->payload;
>> +        chain->stat.coalesced++;
>> +        return RSC_COALESCE;
>> +    }
>> +}
>> +
>> +static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain, NetRscSeg *seg,
>> +                        const uint8_t *buf, size_t size, NetRscUnit *unit)
> indentation is wrong here.
Should it be aligned from parameter begin? i was going to save an extra 
line parameter.
>
>> +{
>> +    struct ip_header *ip1, *ip2;
>> +
>> +    ip1 = (struct ip_header *)(unit->ip);
>> +    ip2 = (struct ip_header *)(seg->unit.ip);
>> +    if ((ip1->ip_src ^ ip2->ip_src) || (ip1->ip_dst ^ ip2->ip_dst)
>> +        || (unit->tcp->th_sport ^ seg->unit.tcp->th_sport)
>> +        || (unit->tcp->th_dport ^ seg->unit.tcp->th_dport)) {
>> +        chain->stat.no_match++;
>> +        return RSC_NO_MATCH;
>> +    }
>> +
>> +    return virtio_net_rsc_coalesce_data(chain, seg, buf, unit);
>> +}
>> +
>> +/* Pakcets with 'SYN' should bypass, other flag should be sent after drain
>> + * to prevent out of order */
>> +static int virtio_net_rsc_tcp_ctrl_check(NetRscChain *chain,
>> +                                         struct tcp_header *tcp)
>> +{
>> +    uint16_t tcp_flag;
>> +
>> +    tcp_flag = htons(tcp->th_offset_flags) & 0x3F;
>> +    if (tcp_flag & TH_SYN) {
>> +        chain->stat.tcp_syn++;
>> +        return RSC_BYPASS;
>> +    }
>> +
>> +    if (tcp_flag & (TH_FIN | TH_URG | TH_RST)) {
>> +        chain->stat.tcp_ctrl_drain++;
>> +        return RSC_FINAL;
>> +    }
>> +
>> +    return RSC_WANT;
>> +}
>> +
>> +static bool virtio_net_rsc_empty_cache(NetRscChain *chain, NetClientState *nc,
>> +                                       const uint8_t *buf, size_t size)
>> +{
>> +    if (QTAILQ_EMPTY(&chain->buffers)) {
>> +        chain->stat.empty_cache++;
>> +        virtio_net_rsc_cache_buf(chain, nc, buf, size);
>> +        timer_mod(chain->drain_timer,
>> +              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + virtio_net_rsc_timeout);
>> +        return 1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static size_t virtio_net_rsc_do_coalesce(NetRscChain *chain, NetClientState *nc,
>> +                                         const uint8_t *buf, size_t size,
>> +                                         NetRscUnit *unit)
>> +{
>> +    int ret;
>> +    NetRscSeg *seg, *nseg;
>> +
>> +    QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
>> +        ret = virtio_net_rsc_coalesce4(chain, seg, buf, size, unit);
>> +
>> +        if (ret == RSC_FINAL) {
>> +            if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
>> +                /* Send failed */
>> +                chain->stat.final_failed++;
>> +                return 0;
>> +            }
>> +
>> +            /* Send current packet */
>> +            return virtio_net_do_receive(nc, buf, size);
>> +        } else if (ret == RSC_NO_MATCH) {
>> +            continue;
>> +        } else {
>> +            /* Coalesced, mark coalesced flag to tell calc cksum for ipv4 */
>> +            seg->is_coalesced = 1;
>> +            return size;
>> +        }
>> +    }
>> +
>> +    chain->stat.no_match_cache++;
>> +    virtio_net_rsc_cache_buf(chain, nc, buf, size);
> Can we rework the code to make it a little bit more readable here?
>
> RSC_BYPASS, RSC_WANT was checked by the caller
> (virtio_net_rsc_receive4()). and the rest were checked here. Can we do
> all the check in one function with a switch...case?
>
> Maybe you can introduce sanity check and tcp check callback for each chain.
OK, will check it out.
>
>
>> +    return size;
>> +}
>> +
>> +/* Drain a connection data, this is to avoid out of order segments */
>> +static size_t virtio_net_rsc_drain_flow(NetRscChain *chain, NetClientState *nc,
>> +                        const uint8_t *buf, size_t size, uint16_t ip_start,
>> +                        uint16_t ip_size, uint16_t tcp_port, uint16_t port_size)
> indentation is wrong.
OK.
>> +{
>> +    NetRscSeg *seg, *nseg;
>> +
>> +    QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
>> +        if (memcmp(buf + ip_start, seg->buf + ip_start, ip_size)
>> +            || memcmp(buf + tcp_port, seg->buf + tcp_port, port_size)) {
>> +            continue;
> May consider to optimize (e.g not using linear search) in the future.
Yes.
>
>> +        }
>> +        if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
>> +            chain->stat.drain_failed++;
>> +        }
>> +
>> +        break;
>> +    }
>> +
>> +    return virtio_net_do_receive(nc, buf, size);
>> +}
>> +
>> +static int32_t virtio_net_rsc_sanity_check4(NetRscChain *chain,
>> +                        struct ip_header *ip, const uint8_t *buf, size_t size)+        chain->stat.ip_option++;
> indentation.
OK.
>
>> +{
>> +    uint16_t ip_len;
>> +    uint16_t hdr_len;
>> +
>> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
>> +    if (size < (hdr_len + sizeof(struct eth_header) + sizeof(struct ip_header)
>> +        + sizeof(struct tcp_header))) {
>> +        return RSC_BYPASS;
>> +    }
>> +
>> +    /* Not an ipv4 one */
>> +    if (((ip->ip_ver_len & 0xF0) >> 4) != IP_HEADER_VERSION_4) {
>> +        chain->stat.ip_option++;
>> +        return RSC_BYPASS;
>> +    }
>> +
>> +    /* Don't handle packets with ip option */
>> +    if (VIRTIO_NET_IP4_HEADER_LENGTH != (ip->ip_ver_len & 0xF)) {
>> +        chain->stat.ip_option++;
>> +        return RSC_BYPASS;
>> +    }
>> +
>> +    /* Don't handle packets with ip fragment */
>> +    if (!(htons(ip->ip_off) & IP_DF)) {
>> +        chain->stat.ip_frag++;
>> +        return RSC_BYPASS;
>> +    }
>> +
>> +    if (ip->ip_p != IPPROTO_TCP) {
>> +        chain->stat.bypass_not_tcp++;
>> +        return RSC_BYPASS;
>> +    }
>> +
>> +    ip_len = htons(ip->ip_len);
>> +    if (ip_len < (sizeof(struct ip_header) + sizeof(struct tcp_header))
>> +        || ip_len > (size - hdr_len - sizeof(struct eth_header))) {
>> +        chain->stat.ip_hacked++;
>> +        return RSC_BYPASS;
>> +    }
>> +
>> +    return RSC_WANT;
>> +}
>> +
>> +static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
> Why use void * here?
Seems there is a type definition error before, will check it out.
>
>> +                                      const uint8_t *buf, size_t size)
>> +{
>> +    int32_t ret;
>> +    uint16_t hdr_len;
>> +    NetRscChain *chain;
>> +    NetRscUnit unit;
>> +
>> +    chain = (NetRscChain *)opq;
>> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
>> +    virtio_net_rsc_extract_unit4(chain, buf, &unit);
>> +    if (RSC_WANT != virtio_net_rsc_sanity_check4(chain, unit.ip, buf, size)) {
>> +        return virtio_net_do_receive(nc, buf, size);
>> +    }
>> +
>> +    ret = virtio_net_rsc_tcp_ctrl_check(chain, unit.tcp);
>> +    if (ret == RSC_BYPASS) {
>> +        return virtio_net_do_receive(nc, buf, size);
>> +    } else if (ret == RSC_FINAL) {
>> +        return virtio_net_rsc_drain_flow(chain, nc, buf, size,
>> +                ((hdr_len + sizeof(struct eth_header)) + 12),
>> +                VIRTIO_NET_IP4_ADDR_SIZE,
>> +                hdr_len + sizeof(struct eth_header) + sizeof(struct ip_header),
>> +                VIRTIO_NET_TCP_PORT_SIZE);
>> +    }
>> +
>> +    if (virtio_net_rsc_empty_cache(chain, nc, buf, size)) {
>> +        return size;r
>> +    }
>> +
>> +    return virtio_net_rsc_do_coalesce(chain, nc, buf, size, &unit);
>> +}
>> +
>> +static NetRscChain *virtio_net_rsc_lookup_chain(VirtIONet * n,
>> +                                                NetClientState *nc,
>> +                                                uint16_t proto)
>> +{
>> +    NetRscChain *chain;
>> +
>> +    if (proto != (uint16_t)ETH_P_IP) {
>> +        return NULL;
>> +    }
>> +
>> +    QTAILQ_FOREACH(chain, &n->rsc_chains, next) {
>> +        if (chain->proto == proto) {
>> +            return chain;
>> +        }
>> +    }
>> +
>> +    chain = g_malloc(sizeof(*chain));
>> +    chain->n = n;
>> +    chain->proto = proto;
>> +    chain->max_payload = VIRTIO_NET_MAX_IP4_PAYLOAD;
>> +    chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>> +                                      virtio_net_rsc_purge, chain);
>> +    memset(&chain->stat, 0, sizeof(chain->stat));
>> +
>> +    QTAILQ_INIT(&chain->buffers);
>> +    QTAILQ_INSERT_TAIL(&n->rsc_chains, chain, next);
>> +
>> +    return chain;
>> +}
>>   
>>   static ssize_t virtio_net_rsc_receive(NetClientState *nc,
>>                                         const uint8_t *buf, size_t size)
>>   {
>> -    return virtio_net_do_receive(nc, buf, size);
>> +    uint16_t proto;
>> +    NetRscChain *chain;
>> +    struct eth_header *eth;
>> +    VirtIONet *n;
>> +
>> +    n = qemu_get_nic_opaque(nc);
>> +    if (size < (n->guest_hdr_len + sizeof(struct eth_header))) {
>> +        return virtio_net_do_receive(nc, buf, size);
>> +    }
>> +
>> +    eth = (struct eth_header *)(buf + n->guest_hdr_len);
>> +    proto = htons(eth->h_proto);
>> +
>> +    chain = virtio_net_rsc_lookup_chain(n, nc, proto);
>> +    if (!chain) {
>> +        return virtio_net_do_receive(nc, buf, size);
>> +    } else {
>> +        chain->stat.received++;
>> +        return virtio_net_rsc_receive4(chain, nc, buf, size);
>> +    }
>>   }
>>   
>>   static ssize_t virtio_net_receive(NetClientState *nc,
>> @@ -1837,6 +2313,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>>       nc = qemu_get_queue(n->nic);
>>       nc->rxfilter_notify_enabled = 1;
>>   
>> +    QTAILQ_INIT(&n->rsc_chains);
>>       n->qdev = dev;
>>       register_savevm(dev, "virtio-net", -1, VIRTIO_NET_VM_VERSION,
>>                       virtio_net_save, virtio_net_load, n);
>> @@ -1871,6 +2348,7 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
>>       g_free(n->vqs);
>>       qemu_del_nic(n->nic);
>>       virtio_cleanup(vdev);
>> +    virtio_net_rsc_cleanup(n);
>>   }
>>   
>>   static void virtio_net_instance_init(Object *obj)
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index 0cabdb6..6939e92 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -59,6 +59,7 @@ typedef struct VirtIONet {
>>       VirtIONetQueue *vqs;
>>       VirtQueue *ctrl_vq;
>>       NICState *nic;
>> +    QTAILQ_HEAD(, NetRscChain) rsc_chains;
>>       uint32_t tx_timeout;
>>       int32_t tx_burst;
>>       uint32_t has_vnet_hdr;
>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>> index 2b5b248..6be1463 100644
>> --- a/include/hw/virtio/virtio.h
>> +++ b/include/hw/virtio/virtio.h
>> @@ -128,6 +128,78 @@ typedef struct VirtioDeviceClass {
>>       int (*load)(VirtIODevice *vdev, QEMUFile *f, int version_id);
>>   } VirtioDeviceClass;
>>   
>> +/* Coalesced packets type & status */
>> +typedef enum {
>> +    RSC_COALESCE,           /* Data been coalesced */
>> +    RSC_FINAL,              /* Will terminate current connection */
>> +    RSC_NO_MATCH,           /* No matched in the buffer pool */
>> +    RSC_BYPASS,             /* Packet to be bypass, not tcp, tcp ctrl, etc */
>> +    RSC_WANT                /* Data want to be coalesced */
>> +} COALESCE_STATUS;
>> +
>> +typedef struct NetRscStat {
>> +    uint32_t received;
>> +    uint32_t coalesced;
>> +    uint32_t over_size;
>> +    uint32_t cache;
>> +    uint32_t empty_cache;
>> +    uint32_t no_match_cache;
>> +    uint32_t win_update;
>> +    uint32_t no_match;
>> +    uint32_t tcp_syn;
>> +    uint32_t tcp_ctrl_drain;
>> +    uint32_t dup_ack;
>> +    uint32_t dup_ack1;
>> +    uint32_t dup_ack2;
>> +    uint32_t pure_ack;
>> +    uint32_t ack_out_of_win;
>> +    uint32_t data_out_of_win;
>> +    uint32_t data_out_of_order;
>> +    uint32_t data_after_pure_ack;
>> +    uint32_t bypass_not_tcp;
>> +    uint32_t tcp_option;
>> +    uint32_t tcp_larger_option;
>> +    uint32_t tcp_smaller_option;
>> +    uint32_t ip_frag;
>> +    uint32_t ip_hacked;
>> +    uint32_t ip_option;
>> +    uint32_t purge_failed;
>> +    uint32_t drain_failed;
>> +    uint32_t final_failed;
>> +    int64_t  timer;
>> +} NetRscStat;
>> +
>> +/* Rsc unit general info used to checking if can coalescing */
>> +typedef struct NetRscUnit {
>> +    void *ip;   /* ip header */
>> +    uint16_t *ip_plen;      /* data len pointer in ip header field */
>> +    struct tcp_header *tcp; /* tcp header */
>> +    uint16_t tcp_hdrlen;    /* tcp header len */
>> +    uint16_t payload;       /* pure payload without virtio/eth/ip/tcp */
>> +} NetRscUnit;
>> +
>> +/* Coalesced segmant */
>> +typedef struct NetRscSeg {
>> +    QTAILQ_ENTRY(NetRscSeg) next;
>> +    void *buf;
>> +    size_t size;
>> +    uint32_t dup_ack_count;
>> +    bool is_coalesced;      /* need recal ipv4 header checksum, mark here */
>> +    NetRscUnit unit;
>> +    NetClientState *nc;
>> +} NetRscSeg;
>> +
>> +/* Chain is divided by protocol(ipv4/v6) and NetClientInfo */
>> +typedef struct NetRscChain {
>> +    QTAILQ_ENTRY(NetRscChain) next;
>> +    void *n;                            /* VirtIONet */
>> +    uint16_t proto;
>> +    uint16_t max_payload;
>> +    QEMUTimer *drain_timer;
>> +    QTAILQ_HEAD(, NetRscSeg) buffers;
>> +    NetRscStat stat;
>> +} NetRscChain;
>> +
>>   void virtio_instance_init_common(Object *proxy_obj, void *data,
>>                                    size_t vdev_size, const char *vdev_name);
>>   
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 3/3] virtio-net rsc: support coalescing ipv6 tcp traffic
  2016-04-08  7:27       ` Jason Wang
@ 2016-04-08  7:51         ` Wei Xu
  0 siblings, 0 replies; 17+ messages in thread
From: Wei Xu @ 2016-04-08  7:51 UTC (permalink / raw)
  To: Jason Wang, qemu-devel



On 2016年04月08日 15:27, Jason Wang wrote:
>
> On 04/08/2016 03:06 PM, Wei Xu wrote:
>>
>> On 2016年04月05日 10:50, Jason Wang wrote:
>>> On 04/04/2016 03:25 AM, wexu@redhat.com wrote:
>>>> From: Wei Xu <wexu@redhat.com>
>>>>
>>>> Most things like ipv4 except there is a significant difference
>>>> between ipv4
>>>> and ipv6, the fragment lenght in ipv4 header includes itself, while
>>>> it's not
>>> typo
>> Thanks.
>>>> included for ipv6, thus means ipv6 can carry a real '65535' payload.
>>>>
>>>> Signed-off-by: Wei Xu <wexu@redhat.com>
>>>> ---
>>>>    hw/net/virtio-net.c | 147
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++---
>>>>    1 file changed, 141 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>> index 81e8e71..2d09352 100644
>>>> --- a/hw/net/virtio-net.c
>>>> +++ b/hw/net/virtio-net.c
>>>> @@ -50,6 +50,10 @@
>>>>    /* header lenght value in ip header without option */
>>>>    #define VIRTIO_NET_IP4_HEADER_LENGTH 5
>>>>    +#define ETH_IP6_HDR_SZ (ETH_HDR_SZ + IP6_HDR_SZ)
>>>> +#define VIRTIO_NET_IP6_ADDR_SIZE   32      /* ipv6 saddr + daddr */
>>>> +#define VIRTIO_NET_MAX_IP6_PAYLOAD VIRTIO_NET_MAX_TCP_PAYLOAD
>>>> +
>>>>    /* Purge coalesced packets timer interval */
>>>>    #define VIRTIO_NET_RSC_INTERVAL  300000
>>>>    @@ -1725,6 +1729,25 @@ static void
>>>> virtio_net_rsc_extract_unit4(NetRscChain *chain,
>>>>        unit->payload = htons(*unit->ip_plen) - ip_hdrlen -
>>>> unit->tcp_hdrlen;
>>>>    }
>>>>    +static void virtio_net_rsc_extract_unit6(NetRscChain *chain,
>>>> +                                         const uint8_t *buf,
>>>> NetRscUnit* unit)
>>>> +{
>>>> +    uint16_t hdr_len;
>>>> +    struct ip6_header *ip6;
>>>> +
>>>> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
>>>> +    ip6 = (struct ip6_header *)(buf + hdr_len + sizeof(struct
>>>> eth_header));
>>>> +    unit->ip = ip6;
>>>> +    unit->ip_plen = &(ip6->ip6_ctlun.ip6_un1.ip6_un1_plen);
>>>> +    unit->tcp = (struct tcp_header *)(((uint8_t *)unit->ip)\
>>>> +                                        + sizeof(struct ip6_header));
>>>> +    unit->tcp_hdrlen = (htons(unit->tcp->th_offset_flags) & 0xF000)
>>>>>> 10;
>>>> +
>>>> +    /* There is a difference between payload lenght in ipv4 and v6,
>>>> +       ip header is excluded in ipv6 */
>>>> +    unit->payload = htons(*unit->ip_plen) - unit->tcp_hdrlen;
>>>> +}
>>>> +
>>>>    static void virtio_net_rsc_ipv4_checksum(struct ip_header *ip)
>>>>    {
>>>>        uint32_t sum;
>>>> @@ -1738,7 +1761,9 @@ static size_t
>>>> virtio_net_rsc_drain_seg(NetRscChain *chain, NetRscSeg *seg)
>>>>    {
>>>>        int ret;
>>>>    -    virtio_net_rsc_ipv4_checksum(seg->unit.ip);
>>>> +    if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
>>>> +        virtio_net_rsc_ipv4_checksum(seg->unit.ip);
>>>> +    }
>>> Why not introduce proto specific checksum function for chain?
>> Since there are only 2 protocols to be supported, and very limited
>> extension for this feature, mst suggest to use direct call in v2 patch
>> to make things simple, and i took it.
> Have you tried with my suggestion? I think it will actually simplify the
> current code (at least several lines of codes).
ok, will give it a try.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic
  2016-04-08  7:47     ` Wei Xu
@ 2016-04-08  8:31       ` Jason Wang
  2016-04-08  9:15         ` Wei Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Wang @ 2016-04-08  8:31 UTC (permalink / raw)
  To: Wei Xu, qemu-devel



On 04/08/2016 03:47 PM, Wei Xu wrote:
>
>
> On 2016年04月05日 10:47, Jason Wang wrote:
>>
>> On 04/04/2016 03:25 AM, wexu@redhat.com wrote:
>>> From: Wei Xu <wexu@redhat.com>
>>>
>>> All the data packets in a tcp connection will be cached to a big buffer
>>> in every receive interval, and will be sent out via a timer, the
>>> 'virtio_net_rsc_timeout' controls the interval, the value will
>>> influent the
>>> performance and response of tcp connection extremely, 50000(50us) is a
>>> experience value to gain a performance improvement, since the whql test
>>> sends packets every 100us, so '300000(300us)' can pass the test case,
>>> this is also the default value, it's gonna to be tunable.
>>>
>>> The timer will only be triggered if the packets pool is not empty,
>>> and it'll drain off all the cached packets
>>>
>>> 'NetRscChain' is used to save the segments of different protocols in a
>>> VirtIONet device.
>>>
>>> The main handler of TCP includes TCP window update, duplicated ACK
>>> check
>>> and the real data coalescing if the new segment passed sanity check
>>> and is identified as an 'wanted' one.
>>>
>>> An 'wanted' segment means:
>>> 1. Segment is within current window and the sequence is the expected
>>> one.
>>> 2. ACK of the segment is in the valid window.
>>> 3. If the ACK in the segment is a duplicated one, then it must less
>>> than 2,
>>>     this is to notify upper layer TCP starting retransmission due to
>>> the spec.
>>>
>>> Sanity check includes:
>>> 1. Incorrect version in IP header
>>> 2. IP options & IP fragment
>>> 3. Not a TCP packets
>>> 4. Sanity size check to prevent buffer overflow attack.
>>>
>>> There maybe more cases should be considered such as ip
>>> identification other
>>> flags, while it broke the test because windows set it to the same
>>> even it's
>>> not a fragment.
>>>
>>> Normally it includes 2 typical ways to handle a TCP control flag,
>>> 'bypass'
>>> and 'finalize', 'bypass' means should be sent out directly, and
>>> 'finalize'
>>> means the packets should also be bypassed, and this should be done
>>> after searching for the same connection packets in the pool and sending
>>> all of them out, this is to avoid out of data.
>>>
>>> All the 'SYN' packets will be bypassed since this always begin a new'
>>> connection, other flags such 'FIN/RST' will trigger a finalization,
>>> because
>>> this normally happens upon a connection is going to be closed, an
>>> 'URG' packet
>>> also finalize current coalescing unit.
>>>
>>> Statistics can be used to monitor the basic coalescing status, the
>>> 'out of order'
>>> and 'out of window' means how many retransmitting packets, thus
>>> describe the
>>> performance intuitively.
>>>
>>> Signed-off-by: Wei Xu <wexu@redhat.com>
>>> ---
>>>   hw/net/virtio-net.c            | 480
>>> ++++++++++++++++++++++++++++++++++++++++-
>>>   include/hw/virtio/virtio-net.h |   1 +
>>>   include/hw/virtio/virtio.h     |  72 +++++++
>>>   3 files changed, 552 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index bd91a4b..81e8e71 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -15,10 +15,12 @@
>>>   #include "qemu/iov.h"
>>>   #include "hw/virtio/virtio.h"
>>>   #include "net/net.h"
>>> +#include "net/eth.h"
>>>   #include "net/checksum.h"
>>>   #include "net/tap.h"
>>>   #include "qemu/error-report.h"
>>>   #include "qemu/timer.h"
>>> +#include "qemu/sockets.h"
>>>   #include "hw/virtio/virtio-net.h"
>>>   #include "net/vhost_net.h"
>>>   #include "hw/virtio/virtio-bus.h"
>>> @@ -38,6 +40,24 @@
>>>   #define endof(container, field) \
>>>       (offsetof(container, field) + sizeof(((container *)0)->field))
>>>   +#define VIRTIO_NET_IP4_ADDR_SIZE   8        /* ipv4 saddr + daddr */
>>> +#define VIRTIO_NET_TCP_PORT_SIZE   4        /* sport + dport */
>>> +
>>> +/* IPv4 max payload, 16 bits in the header */
>>> +#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
>>> +#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
>>> +
>>> +/* header lenght value in ip header without option */
>> typo here.
> Thanks.
>>
>>> +#define VIRTIO_NET_IP4_HEADER_LENGTH 5
>>> +
>>> +/* Purge coalesced packets timer interval */
>>> +#define VIRTIO_NET_RSC_INTERVAL  300000
>>> +
>>> +/* This value affects the performance a lot, and should be tuned
>>> carefully,
>>> +   '300000'(300us) is the recommended value to pass the WHQL test,
>>> '50000' can
>>> +   gain 2x netperf throughput with tso/gso/gro 'off'. */
>>> +static uint32_t virtio_net_rsc_timeout = VIRTIO_NET_RSC_INTERVAL;
>> Like we've discussed in previous versions, need we add another property
>> for this?
> Do you know how to make this a tunable parameter to guest? can this
> parameter be set via control queue?

It's possible I think.


[...]

>>> +
>>> +static void virtio_net_rsc_purge(void *opq)
>>> +{
>>> +    NetRscChain *chain = (NetRscChain *)opq;
>>> +    NetRscSeg *seg, *rn;
>>> +
>>> +    QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
>>> +        if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
>>> +            chain->stat.purge_failed++;
>>> +            continue;
>> Is it better to break here, consider we fail to do the receive?
> Actually this fails only when receive fails according to the test, but
> should drain all the segments here,
> if break here, then will have to free all the buffers, is it possible
> a successful receiving followed by a failed one?

I'd say it's possible, e.g guest just finish rx buffer refills.

> if that does happens, i preferred give it a shot other than free it
> directly.

Ok.

[...]

>>> +
>>> +static void virtio_net_rsc_cache_buf(NetRscChain *chain,
>>> NetClientState *nc,
>>> +                                     const uint8_t *buf, size_t size)
>>> +{
>>> +    uint16_t hdr_len;
>>> +    NetRscSeg *seg;
>>> +
>>> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
>>> +    seg = g_malloc(sizeof(NetRscSeg));
>>> +    seg->buf = g_malloc(hdr_len + sizeof(struct eth_header)\
>>> +                   + sizeof(struct ip6_header) +
>>> VIRTIO_NET_MAX_TCP_PAYLOAD);
>> Why ip6_header here?
> ipv6 packets can carry 65535(2^16) bytes pure data payload, that means
> the header is not included.
> but ipv4 is included, i choose the maximum here.

Let's add a comment here, since the patch's title is to implement ipv4
coalescing.

[...]

>>> +
>>> +static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain,
>>> NetRscSeg *seg,
>>> +                        const uint8_t *buf, size_t size, NetRscUnit
>>> *unit)
>> indentation is wrong here.
> Should it be aligned from parameter begin?

Looks like we'd better do this.

> i was going to save an extra line parameter.

[...]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic
  2016-04-08  8:31       ` Jason Wang
@ 2016-04-08  9:15         ` Wei Xu
  0 siblings, 0 replies; 17+ messages in thread
From: Wei Xu @ 2016-04-08  9:15 UTC (permalink / raw)
  To: qemu-devel



On 2016年04月08日 16:31, Jason Wang wrote:
>
> On 04/08/2016 03:47 PM, Wei Xu wrote:
>>
>> On 2016年04月05日 10:47, Jason Wang wrote:
>>> On 04/04/2016 03:25 AM, wexu@redhat.com wrote:
>>>> From: Wei Xu <wexu@redhat.com>
>>>>
>>>> All the data packets in a tcp connection will be cached to a big buffer
>>>> in every receive interval, and will be sent out via a timer, the
>>>> 'virtio_net_rsc_timeout' controls the interval, the value will
>>>> influent the
>>>> performance and response of tcp connection extremely, 50000(50us) is a
>>>> experience value to gain a performance improvement, since the whql test
>>>> sends packets every 100us, so '300000(300us)' can pass the test case,
>>>> this is also the default value, it's gonna to be tunable.
>>>>
>>>> The timer will only be triggered if the packets pool is not empty,
>>>> and it'll drain off all the cached packets
>>>>
>>>> 'NetRscChain' is used to save the segments of different protocols in a
>>>> VirtIONet device.
>>>>
>>>> The main handler of TCP includes TCP window update, duplicated ACK
>>>> check
>>>> and the real data coalescing if the new segment passed sanity check
>>>> and is identified as an 'wanted' one.
>>>>
>>>> An 'wanted' segment means:
>>>> 1. Segment is within current window and the sequence is the expected
>>>> one.
>>>> 2. ACK of the segment is in the valid window.
>>>> 3. If the ACK in the segment is a duplicated one, then it must less
>>>> than 2,
>>>>      this is to notify upper layer TCP starting retransmission due to
>>>> the spec.
>>>>
>>>> Sanity check includes:
>>>> 1. Incorrect version in IP header
>>>> 2. IP options & IP fragment
>>>> 3. Not a TCP packets
>>>> 4. Sanity size check to prevent buffer overflow attack.
>>>>
>>>> There maybe more cases should be considered such as ip
>>>> identification other
>>>> flags, while it broke the test because windows set it to the same
>>>> even it's
>>>> not a fragment.
>>>>
>>>> Normally it includes 2 typical ways to handle a TCP control flag,
>>>> 'bypass'
>>>> and 'finalize', 'bypass' means should be sent out directly, and
>>>> 'finalize'
>>>> means the packets should also be bypassed, and this should be done
>>>> after searching for the same connection packets in the pool and sending
>>>> all of them out, this is to avoid out of data.
>>>>
>>>> All the 'SYN' packets will be bypassed since this always begin a new'
>>>> connection, other flags such 'FIN/RST' will trigger a finalization,
>>>> because
>>>> this normally happens upon a connection is going to be closed, an
>>>> 'URG' packet
>>>> also finalize current coalescing unit.
>>>>
>>>> Statistics can be used to monitor the basic coalescing status, the
>>>> 'out of order'
>>>> and 'out of window' means how many retransmitting packets, thus
>>>> describe the
>>>> performance intuitively.
>>>>
>>>> Signed-off-by: Wei Xu <wexu@redhat.com>
>>>> ---
>>>>    hw/net/virtio-net.c            | 480
>>>> ++++++++++++++++++++++++++++++++++++++++-
>>>>    include/hw/virtio/virtio-net.h |   1 +
>>>>    include/hw/virtio/virtio.h     |  72 +++++++
>>>>    3 files changed, 552 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>> index bd91a4b..81e8e71 100644
>>>> --- a/hw/net/virtio-net.c
>>>> +++ b/hw/net/virtio-net.c
>>>> @@ -15,10 +15,12 @@
>>>>    #include "qemu/iov.h"
>>>>    #include "hw/virtio/virtio.h"
>>>>    #include "net/net.h"
>>>> +#include "net/eth.h"
>>>>    #include "net/checksum.h"
>>>>    #include "net/tap.h"
>>>>    #include "qemu/error-report.h"
>>>>    #include "qemu/timer.h"
>>>> +#include "qemu/sockets.h"
>>>>    #include "hw/virtio/virtio-net.h"
>>>>    #include "net/vhost_net.h"
>>>>    #include "hw/virtio/virtio-bus.h"
>>>> @@ -38,6 +40,24 @@
>>>>    #define endof(container, field) \
>>>>        (offsetof(container, field) + sizeof(((container *)0)->field))
>>>>    +#define VIRTIO_NET_IP4_ADDR_SIZE   8        /* ipv4 saddr + daddr */
>>>> +#define VIRTIO_NET_TCP_PORT_SIZE   4        /* sport + dport */
>>>> +
>>>> +/* IPv4 max payload, 16 bits in the header */
>>>> +#define VIRTIO_NET_MAX_IP4_PAYLOAD (65535 - sizeof(struct ip_header))
>>>> +#define VIRTIO_NET_MAX_TCP_PAYLOAD 65535
>>>> +
>>>> +/* header lenght value in ip header without option */
>>> typo here.
>> Thanks.
>>>> +#define VIRTIO_NET_IP4_HEADER_LENGTH 5
>>>> +
>>>> +/* Purge coalesced packets timer interval */
>>>> +#define VIRTIO_NET_RSC_INTERVAL  300000
>>>> +
>>>> +/* This value affects the performance a lot, and should be tuned
>>>> carefully,
>>>> +   '300000'(300us) is the recommended value to pass the WHQL test,
>>>> '50000' can
>>>> +   gain 2x netperf throughput with tso/gso/gro 'off'. */
>>>> +static uint32_t virtio_net_rsc_timeout = VIRTIO_NET_RSC_INTERVAL;
>>> Like we've discussed in previous versions, need we add another property
>>> for this?
>> Do you know how to make this a tunable parameter to guest? can this
>> parameter be set via control queue?
> It's possible I think.
>
>
> [...]
>
>>>> +
>>>> +static void virtio_net_rsc_purge(void *opq)
>>>> +{
>>>> +    NetRscChain *chain = (NetRscChain *)opq;
>>>> +    NetRscSeg *seg, *rn;
>>>> +
>>>> +    QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
>>>> +        if (virtio_net_rsc_drain_seg(chain, seg) == 0) {
>>>> +            chain->stat.purge_failed++;
>>>> +            continue;
>>> Is it better to break here, consider we fail to do the receive?
>> Actually this fails only when receive fails according to the test, but
>> should drain all the segments here,
>> if break here, then will have to free all the buffers, is it possible
>> a successful receiving followed by a failed one?
> I'd say it's possible, e.g guest just finish rx buffer refills.
>
>> if that does happens, i preferred give it a shot other than free it
>> directly.
> Ok.
>
> [...]
>
>>>> +
>>>> +static void virtio_net_rsc_cache_buf(NetRscChain *chain,
>>>> NetClientState *nc,
>>>> +                                     const uint8_t *buf, size_t size)
>>>> +{
>>>> +    uint16_t hdr_len;
>>>> +    NetRscSeg *seg;
>>>> +
>>>> +    hdr_len = ((VirtIONet *)(chain->n))->guest_hdr_len;
>>>> +    seg = g_malloc(sizeof(NetRscSeg));
>>>> +    seg->buf = g_malloc(hdr_len + sizeof(struct eth_header)\
>>>> +                   + sizeof(struct ip6_header) +
>>>> VIRTIO_NET_MAX_TCP_PAYLOAD);
>>> Why ip6_header here?
>> ipv6 packets can carry 65535(2^16) bytes pure data payload, that means
>> the header is not included.
>> but ipv4 is included, i choose the maximum here.
> Let's add a comment here, since the patch's title is to implement ipv4
> coalescing.
ok.
> [...]
>
>>>> +
>>>> +static int32_t virtio_net_rsc_coalesce4(NetRscChain *chain,
>>>> NetRscSeg *seg,
>>>> +                        const uint8_t *buf, size_t size, NetRscUnit
>>>> *unit)
>>> indentation is wrong here.
>> Should it be aligned from parameter begin?
> Looks like we'd better do this.
ok.
>
>> i was going to save an extra line parameter.
> [...]
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 1/3] virtio-net rsc: add a new host offload(rsc) feature bit
  2016-04-05  8:17     ` Michael S. Tsirkin
@ 2016-04-10 15:23       ` Wei Xu
  0 siblings, 0 replies; 17+ messages in thread
From: Wei Xu @ 2016-04-10 15:23 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: qemu-devel, victork, yvugenfi, marcel, dfleytma



On 2016年04月05日 16:17, Michael S. Tsirkin wrote:
> On Tue, Apr 05, 2016 at 10:05:17AM +0800, Jason Wang wrote:
>>
>> On 04/04/2016 03:25 AM, wexu@redhat.com wrote:
>>> From: Wei Xu <wexu@redhat.com>
>>>
>>> A new feature bit 'VIRTIO_NET_F_GUEST_RSC' is introduced to support WHQL
>>> Receive-Segment-Offload test, this feature will coalesce tcp packets in
>>> IPv4/6 for userspace virtio-net driver.
>>>
>>> This feature can be enabled either by 'ACK' the feature when loading
>>> the driver in the guest, or by sending the 'VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET'
>>> command to the host via control queue.
>>>
>>> Signed-off-by: Wei Xu <wexu@redhat.com>
>>> ---
>>>   hw/net/virtio-net.c                         | 29 +++++++++++++++++++++++++++--
>>>   include/standard-headers/linux/virtio_net.h |  1 +
>>>   2 files changed, 28 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index 5798f87..bd91a4b 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -537,6 +537,7 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
>>>           virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
>>>           virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
>>>           virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ECN);
>>> +        virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_RSC);
>> Several questions here:
>>
>> - I think RSC should work even without vnet_hdr?
That's interesting, but i'm wondering how to test this? could you please 
point me out?
>> - Need we differentiate ipv4 and ipv6 like TSO here?
Sure, thanks.
>> - And looks like this patch should be squash to following patches.
OK.
>>
>>>       }
>>>   
>>>       if (!peer_has_vnet_hdr(n) || !peer_has_ufo(n)) {
>>> @@ -582,7 +583,8 @@ static uint64_t virtio_net_guest_offloads_by_features(uint32_t features)
>>>           (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
>>>           (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
>>>           (1ULL << VIRTIO_NET_F_GUEST_ECN)  |
>>> -        (1ULL << VIRTIO_NET_F_GUEST_UFO);
>>> +        (1ULL << VIRTIO_NET_F_GUEST_UFO)  |
>>> +        (1ULL << VIRTIO_NET_F_GUEST_RSC);
>> Looks like this is unnecessary since we won't set peer offload based on
>> GUEST_RSC.
there is an exclusive check when handling set feature command from 
control queue, so looks it will broke the check if don't include this bit.
>>
>>>   
>>>       return guest_offloads_mask & features;
>>>   }
>>> @@ -1089,7 +1091,8 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
>>>       return 0;
>>>   }
>>>   
>>> -static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
>>> +static ssize_t virtio_net_do_receive(NetClientState *nc,
>>> +                                     const uint8_t *buf, size_t size)
>>>   {
>>>       VirtIONet *n = qemu_get_nic_opaque(nc);
>>>       VirtIONetQueue *q = virtio_net_get_subqueue(nc);
>>> @@ -1685,6 +1688,26 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>>>       return 0;
>>>   }
>>>   
>>> +
>>> +static ssize_t virtio_net_rsc_receive(NetClientState *nc,
>>> +                                      const uint8_t *buf, size_t size)
>>> +{
>>> +    return virtio_net_do_receive(nc, buf, size);
>>> +}
>>> +
>>> +static ssize_t virtio_net_receive(NetClientState *nc,
>>> +                                  const uint8_t *buf, size_t size)
>>> +{
>>> +    VirtIONet *n;
>>> +
>>> +    n = qemu_get_nic_opaque(nc);
>>> +    if (n->curr_guest_offloads & VIRTIO_NET_F_GUEST_RSC) {
>>> +        return virtio_net_rsc_receive(nc, buf, size);
>>> +    } else {
>>> +        return virtio_net_do_receive(nc, buf, size);
>>> +    }
>>> +}
>> The changes here looks odd since it did nothing. Like I've mentioned,
>> better merge the patch into following ones.
OK.
>>
>>> +
>>>   static NetClientInfo net_virtio_info = {
>>>       .type = NET_CLIENT_OPTIONS_KIND_NIC,
>>>       .size = sizeof(NICState),
>>> @@ -1909,6 +1932,8 @@ static Property virtio_net_properties[] = {
>>>                          TX_TIMER_INTERVAL),
>>>       DEFINE_PROP_INT32("x-txburst", VirtIONet, net_conf.txburst, TX_BURST),
>>>       DEFINE_PROP_STRING("tx", VirtIONet, net_conf.tx),
>>> +    DEFINE_PROP_BIT("guest_rsc", VirtIONet, host_features,
>>> +                    VIRTIO_NET_F_GUEST_RSC, true),
>> Need to compat the bit for old machine type to unbreak migration I believe?
> And definitely disable by default.
There maybe some windows specific details about this, i'll discuss with 
Yan and update.
>
>> Btw, also need a patch for virtio spec.
Sure.
>>
>> Thanks
>>
>>>       DEFINE_PROP_END_OF_LIST(),
>>>   };
>>>   
>>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
>>> index a78f33e..5b95762 100644
>>> --- a/include/standard-headers/linux/virtio_net.h
>>> +++ b/include/standard-headers/linux/virtio_net.h
>>> @@ -55,6 +55,7 @@
>>>   #define VIRTIO_NET_F_MQ	22	/* Device supports Receive Flow
>>>   					 * Steering */
>>>   #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
>>> +#define VIRTIO_NET_F_GUEST_RSC  24      /* Guest can coalesce tcp packets */
>>>   
>>>   #ifndef VIRTIO_NET_NO_LEGACY
>>>   #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic
  2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic wexu
  2016-04-05  2:47   ` Jason Wang
@ 2016-04-12  8:23   ` Michael S. Tsirkin
  1 sibling, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2016-04-12  8:23 UTC (permalink / raw)
  To: wexu; +Cc: qemu-devel, jasowang, marcel, victork, dfleytma, yvugenfi

On Mon, Apr 04, 2016 at 03:25:55AM +0800, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
> 
> All the data packets in a tcp connection will be cached to a big buffer
> in every receive interval, and will be sent out via a timer, the
> 'virtio_net_rsc_timeout' controls the interval, the value will influent the
> performance and response of tcp connection extremely, 50000(50us) is a
> experience value to gain a performance improvement, since the whql test
> sends packets every 100us, so '300000(300us)' can pass the test case,
> this is also the default value, it's gonna to be tunable.
> 
> The timer will only be triggered if the packets pool is not empty,
> and it'll drain off all the cached packets
> 
> 'NetRscChain' is used to save the segments of different protocols in a
> VirtIONet device.
> 
> The main handler of TCP includes TCP window update, duplicated ACK check
> and the real data coalescing if the new segment passed sanity check
> and is identified as an 'wanted' one.
> 
> An 'wanted' segment means:
> 1. Segment is within current window and the sequence is the expected one.
> 2. ACK of the segment is in the valid window.
> 3. If the ACK in the segment is a duplicated one, then it must less than 2,
>    this is to notify upper layer TCP starting retransmission due to the spec.
> 
> Sanity check includes:
> 1. Incorrect version in IP header
> 2. IP options & IP fragment
> 3. Not a TCP packets
> 4. Sanity size check to prevent buffer overflow attack.
> 
> There maybe more cases should be considered such as ip identification other
> flags, while it broke the test because windows set it to the same even it's
> not a fragment.
> 
> Normally it includes 2 typical ways to handle a TCP control flag, 'bypass'
> and 'finalize', 'bypass' means should be sent out directly, and 'finalize'
> means the packets should also be bypassed, and this should be done
> after searching for the same connection packets in the pool and sending
> all of them out, this is to avoid out of data.
> 
> All the 'SYN' packets will be bypassed since this always begin a new'
> connection, other flags such 'FIN/RST' will trigger a finalization, because
> this normally happens upon a connection is going to be closed, an 'URG' packet
> also finalize current coalescing unit.
> 
> Statistics can be used to monitor the basic coalescing status, the 'out of order'
> and 'out of window' means how many retransmitting packets, thus describe the
> performance intuitively.
> 
> Signed-off-by: Wei Xu <wexu@redhat.com>

Yan pointed out that VIRTIO_NET_HDR_GSO_* is never used in this patch,
nor is gso_type ever set in virtio net header.

This seems strange.

-- 
MST

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-04-12  8:23 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-03 19:25 [Qemu-devel] [ RFC Patch v4 0/3] Support Receive-Segment-Offload(RSC) for WHQL wexu
2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 1/3] virtio-net rsc: add a new host offload(rsc) feature bit wexu
2016-04-05  2:05   ` Jason Wang
2016-04-05  8:17     ` Michael S. Tsirkin
2016-04-10 15:23       ` Wei Xu
2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 2/3] virtio-net rsc: support coalescing ipv4 tcp traffic wexu
2016-04-05  2:47   ` Jason Wang
2016-04-08  7:47     ` Wei Xu
2016-04-08  8:31       ` Jason Wang
2016-04-08  9:15         ` Wei Xu
2016-04-12  8:23   ` Michael S. Tsirkin
2016-04-03 19:25 ` [Qemu-devel] [ RFC Patch v4 3/3] virtio-net rsc: support coalescing ipv6 " wexu
2016-04-05  2:50   ` Jason Wang
2016-04-08  7:06     ` Wei Xu
2016-04-08  7:27       ` Jason Wang
2016-04-08  7:51         ` Wei Xu
2016-04-05  2:52 ` [Qemu-devel] [ RFC Patch v4 0/3] Support Receive-Segment-Offload(RSC) for WHQL Jason Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.