* [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
@ 2015-12-22 10:42 Zhang Chen
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 01/10] Init colo-proxy object " Zhang Chen
                   ` (13 more replies)
  0 siblings, 14 replies; 75+ messages in thread
From: Zhang Chen @ 2015-12-22 10:42 UTC (permalink / raw)
  To: qemu devel, Jason Wang, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Zhang Chen, Yang Hongyang,
	zhanghailiang

From: zhangchen <zhangchen.fnst@cn.fujitsu.com>

Hi all,

This series adds a colo-proxy object. COLO-Proxy is a part of COLO; it is
based on qemu netfilter and is implemented as a netfilter plugin. Its job is
to keep the Secondary VM connected normally to the Primary VM and to compare
the packets sent by the PVM with those sent by the SVM. If the packets
differ, it notifies COLO to do a checkpoint and sends all queued primary
packets.

You can also get the series from:

https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2

Usage:

primary:
-netdev tap,id=bn0 -device e1000,netdev=bn0
-object colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port

secondary:
-netdev tap,id=bn0 -device e1000,netdev=bn0
-object colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port 

NOTE:
queue must be set to "all". See enum NetFilterDirection for details.
colo-proxy needs to queue all packets.
colo-proxy V2 can only compare IP packets.


## Background

COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
project is a high availability solution. Both Primary VM (PVM) and Secondary VM
(SVM) run in parallel. They receive the same request from client, and generate
responses in parallel too. If the response packets from PVM and SVM are
identical, they are released immediately. Otherwise, a VM checkpoint (on
demand) is conducted.

Paper:
http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0

COLO on Xen:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping

COLO on Qemu/KVM:
http://wiki.qemu.org/Features/COLO

To capture the response packets from the PVM and SVM and find out whether
they are identical, we introduce a new qemu networking module called
colo-proxy.
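
The release-or-checkpoint decision described above can be sketched roughly as
follows. This is only an illustration under simple assumptions: it compares
whole packets byte for byte, while the comparison in the series itself works on
IP packets and may differ in detail. The names CompareVerdict and
compare_packets are hypothetical and do not appear in the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical verdicts; these names are not taken from the series. */
typedef enum {
    COMPARE_RELEASE,    /* responses identical: release the primary packet */
    COMPARE_CHECKPOINT  /* responses differ: ask COLO for a checkpoint */
} CompareVerdict;

/* Compare one PVM response with the matching SVM response byte for byte. */
CompareVerdict compare_packets(const unsigned char *pri, size_t pri_len,
                               const unsigned char *sec, size_t sec_len)
{
    assert(pri != NULL && sec != NULL);

    if (pri_len == sec_len && memcmp(pri, sec, pri_len) == 0) {
        return COMPARE_RELEASE;
    }
    return COMPARE_CHECKPOINT;
}
```

A caller would release the queued primary packet on COMPARE_RELEASE and set a
checkpoint request flag on COMPARE_CHECKPOINT.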

V2:
  rebase colo-proxy with qemu-colo-v2.2-periodic-mode
  fix dave's comments
  fix wency's comments
  fix zhanghailiang's comments

v1:
  initial patch.



zhangchen (10):
  Init colo-proxy object based on netfilter
  Jhash: add linux kernel jhashtable in qemu
  Colo-proxy: add colo-proxy framework
  Colo-proxy: add data structure and jhash func
  net/colo-proxy: Add colo interface to use proxy
  net/colo-proxy: add socket used by forward func
  net/colo-proxy: Add packet enqueue & handle func
  net/colo-proxy: Handle packet and connection
  net/colo-proxy: Compare pri pkt to sec pkt
  net/colo-proxy: Colo-proxy do checkpoint and clear

 include/qemu/jhash.h |  61 ++++
 net/Makefile.objs    |   1 +
 net/colo-proxy.c     | 939 +++++++++++++++++++++++++++++++++++++++++++++++++++
 net/colo-proxy.h     |  24 ++
 qemu-options.hx      |   6 +
 trace-events         |   8 +
 vl.c                 |   3 +-
 7 files changed, 1041 insertions(+), 1 deletion(-)
 create mode 100644 include/qemu/jhash.h
 create mode 100644 net/colo-proxy.c
 create mode 100644 net/colo-proxy.h

-- 
1.9.1

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [Qemu-devel] [RFC PATCH v2 01/10] Init colo-proxy object based on netfilter
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
@ 2015-12-22 10:42 ` Zhang Chen
  2016-01-15 18:21   ` Dr. David Alan Gilbert
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 02/10] Jhash: add linux kernel jhashtable in qemu Zhang Chen
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2015-12-22 10:42 UTC (permalink / raw)
  To: qemu devel, Jason Wang, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Zhang Chen, Yang Hongyang,
	zhanghailiang

From: zhangchen <zhangchen.fnst@cn.fujitsu.com>

Add colo-proxy to vl.c and qemu-options.hx.
Add the trace events used by colo-proxy.

Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 qemu-options.hx | 6 ++++++
 trace-events    | 8 ++++++++
 vl.c            | 3 ++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 0eea4ee..6daa3f0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3670,6 +3670,12 @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
 @option{tx}: the filter is attached to the transmit queue of the netdev,
              where it will receive packets sent by the netdev.
 
+@item -object colo-proxy,id=@var{id},netdev=@var{netdevid},addr=@var{host:port},mode=@var{primary|secondary}[,queue=@var{all}]
+
+Attach colo-proxy to netdev @var{netdevid} and set the colo mode to
+@var{primary|secondary}. The proxy connects to its peer through
+addr=@var{host:port}. Colo needs to queue all packets, so use queue=@var{all}.
+
 @item -object filter-dump,id=@var{id},netdev=@var{dev},file=@var{filename}][,maxlen=@var{len}]
 
 Dump the network traffic on netdev @var{dev} to the file specified by
diff --git a/trace-events b/trace-events
index 5f95b3c..a957fb3 100644
--- a/trace-events
+++ b/trace-events
@@ -1586,6 +1586,14 @@ colo_failover_set_state(int new_state) "new state %d"
 colo_start_block_replication(void) "Block replication is started"
 colo_stop_block_replication(const char *reason) "Block replication is stopped(reason: '%s')"
 
+# net/colo-proxy.c
+colo_proxy(const char *sta) ": %s"
+colo_proxy_with_ret(const char *sta, ssize_t ret) ": %s ret = %zd"
+colo_proxy_packet_src(const char *src) ":ipsrc = %s"
+colo_proxy_packet_dst(const char *dst) ":ipdst = %s"
+colo_proxy_packet_size(int size) ": %d"
+colo_proxy_queue_size(int size) ": %d"
+
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
 kvm_vm_ioctl(int type, void *arg) "type 0x%x, arg %p"
diff --git a/vl.c b/vl.c
index 8dc34ce..dcfb3a9 100644
--- a/vl.c
+++ b/vl.c
@@ -2838,7 +2838,8 @@ static bool object_create_initial(const char *type)
      * they depend on netdevs already existing
      */
     if (g_str_equal(type, "filter-buffer") ||
-        g_str_equal(type, "filter-dump")) {
+        g_str_equal(type, "filter-dump") ||
+        g_str_equal(type, "colo-proxy")) {
         return false;
     }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [Qemu-devel] [RFC PATCH v2 02/10] Jhash: add linux kernel jhashtable in qemu
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 01/10] Init colo-proxy object " Zhang Chen
@ 2015-12-22 10:42 ` Zhang Chen
  2016-01-08 12:08   ` Dr. David Alan Gilbert
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 03/10] Colo-proxy: add colo-proxy framework Zhang Chen
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2015-12-22 10:42 UTC (permalink / raw)
  To: qemu devel, Jason Wang, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Zhang Chen, Yang Hongyang,
	zhanghailiang

From: zhangchen <zhangchen.fnst@cn.fujitsu.com>

Jhash is used by colo-proxy to save and look up
network connection info.

Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 include/qemu/jhash.h | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 include/qemu/jhash.h

diff --git a/include/qemu/jhash.h b/include/qemu/jhash.h
new file mode 100644
index 0000000..5b82d02
--- /dev/null
+++ b/include/qemu/jhash.h
@@ -0,0 +1,61 @@
+/* jhash.h: Jenkins hash support.
+  *
+  * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net)
+  *
+  * http://burtleburtle.net/bob/hash/
+  *
+  * These are the credits from Bob's sources:
+  *
+  * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+  *
+  * These are functions for producing 32-bit hashes for hash table lookup.
+  * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+  * are externally useful functions.  Routines to test the hash are
+  * included
+  * if SELF_TEST is defined.  You can use this free for any purpose.
+  * It's in
+  * the public domain.  It has no warranty.
+  *
+  * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu)
+  *
+  * I've modified Bob's hash to be useful in the Linux kernel, and
+  * any bugs present are my fault.
+  * Jozsef
+  */
+
+#ifndef QEMU_JHASH_H__
+#define QEMU_JHASH_H__
+
+#include "qemu/bitops.h"
+
+/*
+ * hash table helpers copied from the Linux kernel's jhash
+ */
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c)                \
+{                                           \
+    a -= c;  a ^= rol32(c, 4);  c += b;     \
+    b -= a;  b ^= rol32(a, 6);  a += c;     \
+    c -= b;  c ^= rol32(b, 8);  b += a;     \
+    a -= c;  a ^= rol32(c, 16); c += b;     \
+    b -= a;  b ^= rol32(a, 19); a += c;     \
+    c -= b;  c ^= rol32(b, 4);  b += a;     \
+}
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c)  \
+{                               \
+    c ^= b; c -= rol32(b, 14);  \
+    a ^= c; a -= rol32(c, 11);  \
+    b ^= a; b -= rol32(a, 25);  \
+    c ^= b; c -= rol32(b, 16);  \
+    a ^= c; a -= rol32(c, 4);   \
+    b ^= a; b -= rol32(a, 14);  \
+    c ^= b; c -= rol32(b, 24);  \
+}
+
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL           0xdeadbeef
+
+#endif /* QEMU_JHASH_H__ */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread
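
For readers who want to try the mixing macros from the jhash patch outside the
QEMU tree, here is a minimal standalone sketch. The mixing rounds are copied
from the patch; everything else is an assumption made for illustration: rol32
is written out locally instead of coming from a QEMU header, the macros are
renamed sketch_mix and sketch_final and wrapped in do/while (0), and
sketch_hash4 is a hypothetical helper that is not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by shift bits (valid for 0 < shift < 32). */
uint32_t rol32(uint32_t word, unsigned int shift)
{
    assert(shift > 0 && shift < 32);
    return (word << shift) | (word >> (32 - shift));
}

/* Same arbitrary seed value as JHASH_INITVAL in the patch. */
#define SKETCH_INITVAL 0xdeadbeefu

/* Mix three 32-bit values reversibly (rounds copied from the patch). */
#define sketch_mix(a, b, c)                 \
do {                                        \
    a -= c;  a ^= rol32(c, 4);  c += b;     \
    b -= a;  b ^= rol32(a, 6);  a += c;     \
    c -= b;  c ^= rol32(b, 8);  b += a;     \
    a -= c;  a ^= rol32(c, 16); c += b;     \
    b -= a;  b ^= rol32(a, 19); a += c;     \
    c -= b;  c ^= rol32(b, 4);  b += a;     \
} while (0)

/* Final mixing of three 32-bit values into c (rounds copied from the patch). */
#define sketch_final(a, b, c)       \
do {                                \
    c ^= b; c -= rol32(b, 14);      \
    a ^= c; a -= rol32(c, 11);      \
    b ^= a; b -= rol32(a, 25);      \
    c ^= b; c -= rol32(b, 16);      \
    a ^= c; a -= rol32(c, 4);       \
    b ^= a; b -= rol32(a, 14);      \
    c ^= b; c -= rol32(b, 24);      \
} while (0)

/* Hypothetical helper: reduce four 32-bit words to one 32-bit hash. */
uint32_t sketch_hash4(uint32_t w0, uint32_t w1, uint32_t w2, uint32_t w3)
{
    uint32_t a, b, c;

    /* Seed all three lanes with the init value plus the input size. */
    a = b = c = SKETCH_INITVAL + 4 * (uint32_t)sizeof(uint32_t);
    a += w0;
    b += w1;
    c += w2;
    sketch_mix(a, b, c);

    a += w3;
    sketch_final(a, b, c);
    return c;
}
```

Wrapping the macro bodies in do/while (0) lets them be used as a single
statement, for example inside an unbraced if, which the bare-brace form in the
patch does not allow safely.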

* [Qemu-devel] [RFC PATCH v2 03/10] Colo-proxy: add colo-proxy framework
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 01/10] Init colo-proxy object " Zhang Chen
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 02/10] Jhash: add linux kernel jhashtable in qemu Zhang Chen
@ 2015-12-22 10:42 ` Zhang Chen
  2016-02-19 19:57   ` Dr. David Alan Gilbert
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 04/10] Colo-proxy: add data structure and jhash func Zhang Chen
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2015-12-22 10:42 UTC (permalink / raw)
  To: qemu devel, Jason Wang, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Zhang Chen, Yang Hongyang,
	zhanghailiang

From: zhangchen <zhangchen.fnst@cn.fujitsu.com>

Colo-proxy is a qemu netfilter plugin,
like filter-buffer and filter-dump.

Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 net/Makefile.objs |   1 +
 net/colo-proxy.c  | 240 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/colo-proxy.h  |  24 ++++++
 3 files changed, 265 insertions(+)
 create mode 100644 net/colo-proxy.c
 create mode 100644 net/colo-proxy.h

diff --git a/net/Makefile.objs b/net/Makefile.objs
index 5fa2f97..95670f2 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -15,3 +15,4 @@ common-obj-$(CONFIG_VDE) += vde.o
 common-obj-$(CONFIG_NETMAP) += netmap.o
 common-obj-y += filter.o
 common-obj-y += filter-buffer.o
+common-obj-y += colo-proxy.o
diff --git a/net/colo-proxy.c b/net/colo-proxy.c
new file mode 100644
index 0000000..2e37c45
--- /dev/null
+++ b/net/colo-proxy.c
@@ -0,0 +1,240 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Author: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "net/filter.h"
+#include "net/queue.h"
+#include "qemu-common.h"
+#include "qemu/iov.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi-visit.h"
+#include "qom/object.h"
+#include "qemu/sockets.h"
+#include "qemu/main-loop.h"
+#include "qemu/jhash.h"
+#include "qemu/coroutine.h"
+#include "net/eth.h"
+#include "slirp/slirp.h"
+#include "slirp/slirp_config.h"
+#include "slirp/ip.h"
+#include "net/net.h"
+#include "qemu/error-report.h"
+#include "net/colo-proxy.h"
+#include "trace.h"
+#include <sys/sysinfo.h>
+
+#define FILTER_COLO_PROXY(obj) \
+    OBJECT_CHECK(COLOProxyState, (obj), TYPE_FILTER_COLO_PROXY)
+
+#define TYPE_FILTER_COLO_PROXY "colo-proxy"
+#define PRIMARY_MODE "primary"
+#define SECONDARY_MODE "secondary"
+
+/*
+
+  |COLOProxyState++
+  |               |
+  +---------------+   +---------------+         +---------------+
+  |conn list      +--->conn           +--------->conn           |
+  +---------------+   +---------------+         +---------------+
+  |               |     |           |             |           |
+  +---------------+ +---v----+  +---v----+    +---v----+  +---v----+
+                    |primary |  |secondary    |primary |  |secondary
+                    |packet  |  |packet  +    |packet  |  |packet  +
+                    +--------+  +--------+    +--------+  +--------+
+                        |           |             |           |
+                    +---v----+  +---v----+    +---v----+  +---v----+
+                    |primary |  |secondary    |primary |  |secondary
+                    |packet  |  |packet  +    |packet  |  |packet  +
+                    +--------+  +--------+    +--------+  +--------+
+                        |           |             |           |
+                    +---v----+  +---v----+    +---v----+  +---v----+
+                    |primary |  |secondary    |primary |  |secondary
+                    |packet  |  |packet  +    |packet  |  |packet  +
+                    +--------+  +--------+    +--------+  +--------+
+
+
+*/
+
+typedef struct COLOProxyState {
+    NetFilterState parent_obj;
+    NetQueue *incoming_queue;/* guest normal net queue */
+    NetFilterDirection direction; /* packet direction */
+    /* colo mode (primary or secondary) */
+    int colo_mode;
+    /* primary colo connect address(192.168.0.100:12345)
+     * or secondary listening address(:12345)
+     */
+    char *addr;
+    int sockfd;
+
+     /* connection list: the packet belonged to this NIC
+     * could be found in this list.
+     * element type: Connection
+     */
+    GQueue conn_list;
+    int status; /* proxy is running or not */
+    ssize_t hashtable_size; /* proxy current hash size */
+    QemuEvent need_compare_ev;  /* notify compare thread */
+    QemuThread thread; /* compare thread, a thread for each NIC */
+
+} COLOProxyState;
+
+enum {
+    COLO_PROXY_NONE,     /* colo proxy is not started */
+    COLO_PROXY_RUNNING,  /* colo proxy is running */
+    COLO_PROXY_DONE,     /* colo proxy is done (failover) */
+};
+
+/* save all the connections of a vm instance in this table */
+GHashTable *colo_conn_hash;
+static bool colo_do_checkpoint;
+static ssize_t hashtable_max_size;
+
+static ssize_t colo_proxy_receive_iov(NetFilterState *nf,
+                                         NetClientState *sender,
+                                         unsigned flags,
+                                         const struct iovec *iov,
+                                         int iovcnt,
+                                         NetPacketSent *sent_cb)
+{
+    /*
+     * We return the size when we buffer a packet; the sender then treats
+     * it as already sent, so sent_cb should not be called later.
+     *
+     */
+    COLOProxyState *s = FILTER_COLO_PROXY(nf);
+    ssize_t ret = 0;
+
+    if (s->status != COLO_PROXY_RUNNING) {
+        /* proxy is not started or has failed over */
+        return 0;
+    }
+
+    if (s->colo_mode == COLO_MODE_PRIMARY) {
+        /* colo_proxy_primary_handler */
+    } else {
+        /* colo_proxy_secondary_handler */
+    }
+    return iov_size(iov, iovcnt);
+}
+
+static void colo_proxy_cleanup(NetFilterState *nf)
+{
+    COLOProxyState *s = FILTER_COLO_PROXY(nf);
+    close(s->sockfd);
+    s->sockfd = -1;
+    qemu_event_destroy(&s->need_compare_ev);
+}
+
+static void colo_proxy_setup(NetFilterState *nf, Error **errp)
+{
+    COLOProxyState *s = FILTER_COLO_PROXY(nf);
+
+    if (!s->addr) {
+        error_setg(errp, "filter colo_proxy needs 'addr' property set!");
+        return;
+    }
+
+    if (nf->direction != NET_FILTER_DIRECTION_ALL) {
+        error_setg(errp, "colo needs to queue all packets, "
+                        "please start colo-proxy with queue=all");
+        return;
+    }
+
+    s->sockfd = -1;
+    s->hashtable_size = 0;
+    colo_do_checkpoint = false;
+    qemu_event_init(&s->need_compare_ev, false);
+
+    s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
+    colo_conn_hash = g_hash_table_new_full(connection_key_hash,
+                                           connection_key_equal,
+                                           g_free,
+                                           connection_destroy);
+    g_queue_init(&s->conn_list);
+}
+
+static void colo_proxy_class_init(ObjectClass *oc, void *data)
+{
+    NetFilterClass *nfc = NETFILTER_CLASS(oc);
+
+    nfc->setup = colo_proxy_setup;
+    nfc->cleanup = colo_proxy_cleanup;
+    nfc->receive_iov = colo_proxy_receive_iov;
+}
+
+static int colo_proxy_get_mode(Object *obj, Error **errp)
+{
+    COLOProxyState *s = FILTER_COLO_PROXY(obj);
+
+    return s->colo_mode;
+}
+
+static void
+colo_proxy_set_mode(Object *obj, int mode, Error **errp)
+{
+    COLOProxyState *s = FILTER_COLO_PROXY(obj);
+
+    s->colo_mode = mode;
+}
+
+static char *colo_proxy_get_addr(Object *obj, Error **errp)
+{
+    COLOProxyState *s = FILTER_COLO_PROXY(obj);
+
+    return g_strdup(s->addr);
+}
+
+static void
+colo_proxy_set_addr(Object *obj, const char *value, Error **errp)
+{
+    COLOProxyState *s = FILTER_COLO_PROXY(obj);
+    g_free(s->addr);
+    s->addr = g_strdup(value);
+    if (!s->addr) {
+        error_setg(errp, "colo_proxy needs 'addr' "
+                     "property set!");
+        return;
+    }
+}
+
+static void colo_proxy_init(Object *obj)
+{
+    object_property_add_enum(obj, "mode", "COLOMode", COLOMode_lookup,
+                             colo_proxy_get_mode, colo_proxy_set_mode, NULL);
+    object_property_add_str(obj, "addr", colo_proxy_get_addr,
+                            colo_proxy_set_addr, NULL);
+}
+
+static void colo_proxy_fini(Object *obj)
+{
+    COLOProxyState *s = FILTER_COLO_PROXY(obj);
+    g_free(s->addr);
+}
+
+static const TypeInfo colo_proxy_info = {
+    .name = TYPE_FILTER_COLO_PROXY,
+    .parent = TYPE_NETFILTER,
+    .class_init = colo_proxy_class_init,
+    .instance_init = colo_proxy_init,
+    .instance_finalize = colo_proxy_fini,
+    .instance_size = sizeof(COLOProxyState),
+};
+
+static void register_types(void)
+{
+    type_register_static(&colo_proxy_info);
+}
+
+type_init(register_types);
diff --git a/net/colo-proxy.h b/net/colo-proxy.h
new file mode 100644
index 0000000..affc117
--- /dev/null
+++ b/net/colo-proxy.h
@@ -0,0 +1,24 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Author: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+
+#ifndef QEMU_COLO_PROXY_H
+#define QEMU_COLO_PROXY_H
+
+int colo_proxy_start(int mode);
+void colo_proxy_stop(int mode);
+int colo_proxy_do_checkpoint(int mode);
+bool colo_proxy_query_checkpoint(void);
+
+#endif /* QEMU_COLO_PROXY_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [Qemu-devel] [RFC PATCH v2 04/10] Colo-proxy: add data structure and jhash func
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
                   ` (2 preceding siblings ...)
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 03/10] Colo-proxy: add colo-proxy framework Zhang Chen
@ 2015-12-22 10:42 ` Zhang Chen
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 05/10] net/colo-proxy: Add colo interface to use proxy Zhang Chen
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 75+ messages in thread
From: Zhang Chen @ 2015-12-22 10:42 UTC (permalink / raw)
  To: qemu devel, Jason Wang, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Zhang Chen, Yang Hongyang,
	zhanghailiang

From: zhangchen <zhangchen.fnst@cn.fujitsu.com>

Add the data structures and hash functions that will be used.

Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 net/colo-proxy.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/net/colo-proxy.c b/net/colo-proxy.c
index 2e37c45..f448ee1 100644
--- a/net/colo-proxy.c
+++ b/net/colo-proxy.c
@@ -90,6 +90,40 @@ typedef struct COLOProxyState {
 
 } COLOProxyState;
 
+typedef struct Packet {
+    void *data;
+    union {
+        uint8_t *network_layer;
+        struct ip *ip;
+    };
+    uint8_t *transport_layer;
+    int size;
+    COLOProxyState *s;
+    NetClientState *sender;
+} Packet;
+
+typedef struct ConnectionKey {
+    /* (src, dst) must be grouped in the same order as in the IP header */
+    struct in_addr src;
+    struct in_addr dst;
+    uint16_t src_port;
+    uint16_t dst_port;
+    uint8_t ip_proto;
+} QEMU_PACKED ConnectionKey;
+
+/* define one connection */
+typedef struct Connection {
+    /* connection primary send queue: element type: Packet */
+    GQueue primary_list;
+    /* connection secondary send queue: element type: Packet */
+    GQueue secondary_list;
+     /* flag to enqueue unprocessed_connections */
+    bool processing;
+    int ip_proto;
+
+    void *proto; /* tcp only now */
+} Connection;
+
 enum {
     COLO_PROXY_NONE,     /* colo proxy is not started */
     COLO_PROXY_RUNNING,  /* colo proxy is running */
@@ -101,6 +135,38 @@ GHashTable *colo_conn_hash;
 static bool colo_do_checkpoint;
 static ssize_t hashtable_max_size;
 
+static inline void colo_proxy_dump_packet(Packet *pkt)
+{
+    int i;
+    for (i = 0; i < pkt->size; i++) {
+        printf("%02x ", ((uint8_t *)pkt->data)[i]);
+    }
+    printf("\n");
+}
+
+static uint32_t connection_key_hash(const void *opaque)
+{
+    const ConnectionKey *key = opaque;
+    uint32_t a, b, c;
+
+    /* Jenkins hash */
+    a = b = c = JHASH_INITVAL + sizeof(*key);
+    a += key->src.s_addr;
+    b += key->dst.s_addr;
+    c += (key->src_port | ((uint32_t)key->dst_port << 16));
+    __jhash_mix(a, b, c);
+
+    a += key->ip_proto;
+    __jhash_final(a, b, c);
+
+    return c;
+}
+
+static int connection_key_equal(const void *opaque1, const void *opaque2)
+{
+    return memcmp(opaque1, opaque2, sizeof(ConnectionKey)) == 0;
+}
+
 static ssize_t colo_proxy_receive_iov(NetFilterState *nf,
                                          NetClientState *sender,
                                          unsigned flags,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [Qemu-devel] [RFC PATCH v2 05/10] net/colo-proxy: Add colo interface to use proxy
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
                   ` (3 preceding siblings ...)
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 04/10] Colo-proxy: add data structure and jhash func Zhang Chen
@ 2015-12-22 10:42 ` Zhang Chen
  2016-02-19 19:58   ` Dr. David Alan Gilbert
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 06/10] net/colo-proxy: add socket used by forward func Zhang Chen
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2015-12-22 10:42 UTC (permalink / raw)
  To: qemu devel, Jason Wang, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Zhang Chen, Yang Hongyang,
	zhanghailiang

From: zhangchen <zhangchen.fnst@cn.fujitsu.com>

Add the interface used by migration/colo.c
so that the colo framework can work with the proxy.

Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 net/colo-proxy.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)

diff --git a/net/colo-proxy.c b/net/colo-proxy.c
index f448ee1..ba2bbe7 100644
--- a/net/colo-proxy.c
+++ b/net/colo-proxy.c
@@ -167,6 +167,11 @@ static int connection_key_equal(const void *opaque1, const void *opaque2)
     return memcmp(opaque1, opaque2, sizeof(ConnectionKey)) == 0;
 }
 
+bool colo_proxy_query_checkpoint(void)
+{
+    return colo_do_checkpoint;
+}
+
 static ssize_t colo_proxy_receive_iov(NetFilterState *nf,
                                          NetClientState *sender,
                                          unsigned flags,
@@ -203,6 +208,94 @@ static void colo_proxy_cleanup(NetFilterState *nf)
     qemu_event_destroy(&s->need_compare_ev);
 }
 
+static void colo_proxy_notify_checkpoint(void)
+{
+    trace_colo_proxy("colo_proxy_notify_checkpoint");
+    colo_do_checkpoint = true;
+}
+
+static void colo_proxy_start_one(NetFilterState *nf,
+                                      void *opaque, Error **errp)
+{
+    COLOProxyState *s;
+    int mode, ret;
+
+    if (strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_COLO_PROXY)) {
+        return;
+    }
+
+    mode = *(int *)opaque;
+    s = FILTER_COLO_PROXY(nf);
+    assert(s->colo_mode == mode);
+
+    if (s->colo_mode == COLO_MODE_PRIMARY) {
+        char thread_name[1024];
+
+        ret = colo_proxy_connect(s);
+        if (ret) {
+            error_setg(errp, "colo proxy connect failed");
+            return ;
+        }
+
+        s->status = COLO_PROXY_RUNNING;
+        sprintf(thread_name, "proxy compare %s", nf->netdev_id);
+        qemu_thread_create(&s->thread, thread_name,
+                                colo_proxy_compare_thread, s,
+                                QEMU_THREAD_JOINABLE);
+    } else {
+        ret = colo_wait_incoming(s);
+        if (ret) {
+            error_setg(errp, "colo proxy wait incoming failed");
+            return ;
+        }
+        s->status = COLO_PROXY_RUNNING;
+    }
+}
+
+int colo_proxy_start(int mode)
+{
+    Error *err = NULL;
+    qemu_foreach_netfilter(colo_proxy_start_one, &mode, &err);
+    if (err) {
+        return -1;
+    }
+    return 0;
+}
+
+static void colo_proxy_stop_one(NetFilterState *nf,
+                                      void *opaque, Error **errp)
+{
+    COLOProxyState *s;
+    int mode;
+
+    if (strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_COLO_PROXY)) {
+        return;
+    }
+
+    s = FILTER_COLO_PROXY(nf);
+    mode = *(int *)opaque;
+    assert(s->colo_mode == mode);
+
+    s->status = COLO_PROXY_DONE;
+    if (s->sockfd >= 0) {
+        qemu_set_fd_handler(s->sockfd, NULL, NULL, NULL);
+        closesocket(s->sockfd);
+    }
+    if (s->colo_mode == COLO_MODE_PRIMARY) {
+        colo_proxy_primary_checkpoint(s);
+        qemu_event_set(&s->need_compare_ev);
+        qemu_thread_join(&s->thread);
+    } else {
+        colo_proxy_secondary_checkpoint(s);
+    }
+}
+
+void colo_proxy_stop(int mode)
+{
+    Error *err = NULL;
+    qemu_foreach_netfilter(colo_proxy_stop_one, &mode, &err);
+}
+
 static void colo_proxy_setup(NetFilterState *nf, Error **errp)
 {
     COLOProxyState *s = FILTER_COLO_PROXY(nf);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [Qemu-devel] [RFC PATCH v2 06/10] net/colo-proxy: add socket used by forward func
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
                   ` (4 preceding siblings ...)
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 05/10] net/colo-proxy: Add colo interface to use proxy Zhang Chen
@ 2015-12-22 10:42 ` Zhang Chen
  2016-02-19 20:01   ` Dr. David Alan Gilbert
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 07/10] net/colo-proxy: Add packet enqueue & handle func Zhang Chen
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2015-12-22 10:42 UTC (permalink / raw)
  To: qemu devel, Jason Wang, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Zhang Chen, Yang Hongyang,
	zhanghailiang

From: zhangchen <zhangchen.fnst@cn.fujitsu.com>

Colo needs to forward packets between the two sides.
We start a socket server on the secondary, and the primary
connects to it at startup.
Packets received by the primary are forwarded to the secondary.
Packets sent by the secondary are forwarded to the primary.

Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 net/colo-proxy.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)

diff --git a/net/colo-proxy.c b/net/colo-proxy.c
index ba2bbe7..2347bbf 100644
--- a/net/colo-proxy.c
+++ b/net/colo-proxy.c
@@ -172,6 +172,69 @@ bool colo_proxy_query_checkpoint(void)
     return colo_do_checkpoint;
 }
 
+/*
+ * send a packet to peer
+ * >=0: success
+ * <0: fail
+ */
+static ssize_t colo_proxy_sock_send(NetFilterState *nf,
+                                         const struct iovec *iov,
+                                         int iovcnt)
+{
+    COLOProxyState *s = FILTER_COLO_PROXY(nf);
+    ssize_t ret = 0;
+    ssize_t size = 0;
+    struct iovec sizeiov = {
+        .iov_base = &size,
+        .iov_len = sizeof(size)
+    };
+    size = iov_size(iov, iovcnt);
+    if (!size) {
+        return 0;
+    }
+
+    ret = iov_send(s->sockfd, &sizeiov, 1, 0, sizeof(size));
+    if (ret < 0) {
+        return ret;
+    }
+    ret = iov_send(s->sockfd, iov, iovcnt, 0, size);
+    return ret;
+}
+
+/*
+ * receive a packet from peer
+ * in primary: enqueue packet to secondary_list
+ * in secondary: pass packet to next
+ */
+static void colo_proxy_sock_receive(void *opaque)
+{
+    NetFilterState *nf = opaque;
+    COLOProxyState *s = FILTER_COLO_PROXY(nf);
+    ssize_t len = 0;
+    struct iovec sizeiov = {
+        .iov_base = &len,
+        .iov_len = sizeof(len)
+    };
+
+    iov_recv(s->sockfd, &sizeiov, 1, 0, sizeof(len));
+    if (len > 0 && len < NET_BUFSIZE) {
+        char *buf = g_malloc0(len);
+        struct iovec iov = {
+            .iov_base = buf,
+            .iov_len = len
+        };
+
+        iov_recv(s->sockfd, &iov, 1, 0, len);
+        if (s->colo_mode == COLO_MODE_PRIMARY) {
+            colo_proxy_enqueue_secondary_packet(nf, buf, len);
+            /* buf will be released when the packet is destroyed */
+        } else {
+            qemu_net_queue_send(s->incoming_queue, nf->netdev,
+                            0, (const uint8_t *)buf, len, NULL);
+        }
+    }
+}
+
 static ssize_t colo_proxy_receive_iov(NetFilterState *nf,
                                          NetClientState *sender,
                                          unsigned flags,
@@ -208,6 +271,57 @@ static void colo_proxy_cleanup(NetFilterState *nf)
     qemu_event_destroy(&s->need_compare_ev);
 }
 
+/* Wait for the peer to connect.
+ * NOTE: this function blocks the caller.
+ * Returns 0 on success, -1 on failure.
+ */
+static int colo_wait_incoming(COLOProxyState *s)
+{
+    struct sockaddr_in addr;
+    socklen_t addrlen = sizeof(addr);
+    int accept_sock, err;
+    int fd = inet_listen(s->addr, NULL, 256, SOCK_STREAM, 0, NULL);
+
+    if (fd < 0) {
+        error_report("colo proxy listen failed");
+        return -1;
+    }
+
+    do {
+        accept_sock = qemu_accept(fd, (struct sockaddr *)&addr, &addrlen);
+        err = socket_error();
+    } while (accept_sock < 0 && err == EINTR);
+    closesocket(fd);
+
+    if (accept_sock < 0) {
+        error_report("colo proxy accept failed(%s)", strerror(err));
+        return -1;
+    }
+    s->sockfd = accept_sock;
+
+    qemu_set_fd_handler(s->sockfd, colo_proxy_sock_receive, NULL, (void *)s);
+
+    return 0;
+}
+
+/* Try to connect to the listening peer.
+ * Returns 0 on success, -1 on failure.
+ */
+static ssize_t colo_proxy_connect(COLOProxyState *s)
+{
+    int sock;
+    sock = inet_connect(s->addr, NULL);
+
+    if (sock < 0) {
+        error_report("colo proxy inet_connect failed");
+        return -1;
+    }
+    s->sockfd = sock;
+    qemu_set_fd_handler(s->sockfd, colo_proxy_sock_receive, NULL, (void *)s);
+
+    return 0;
+}
+
 static void colo_proxy_notify_checkpoint(void)
 {
     trace_colo_proxy("colo_proxy_notify_checkpoint");
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [Qemu-devel] [RFC PATCH v2 07/10] net/colo-proxy: Add packet enqueue & handle func
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
                   ` (5 preceding siblings ...)
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 06/10] net/colo-proxy: add socket used by forward func Zhang Chen
@ 2015-12-22 10:42 ` Zhang Chen
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 08/10] net/colo-proxy: Handle packet and connection Zhang Chen
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 75+ messages in thread
From: Zhang Chen @ 2015-12-22 10:42 UTC (permalink / raw)
  To: qemu devel, Jason Wang, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Zhang Chen, Yang Hongyang,
	zhanghailiang

From: zhangchen <zhangchen.fnst@cn.fujitsu.com>

Add the common packet handling functions and enqueue each packet
on its connection, so that the packets of one connection can be
looked up and compared.

Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 net/colo-proxy.c | 148 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 146 insertions(+), 2 deletions(-)
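
Note (illustration only, not part of the patch): the enqueue functions in
this patch file each packet under its connection, with one list for packets
from the primary and one for packets from the secondary. A minimal sketch of
that idea follows, using a small fixed table instead of the GHashTable and
GQueue used in the series. All names here (Key, Conn, Fifo, get_conn,
enqueue, MAX_CONNS) are hypothetical, and an int id stands in for the packet
payload.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the series' ConnectionKey. */
typedef struct {
    uint32_t src, dst;
    uint16_t src_port, dst_port;
    uint8_t ip_proto;
} Key;

typedef struct Node {
    struct Node *next;
    int id;                     /* stands in for the packet payload */
} Node;

typedef struct {
    Node *head, *tail;
    int len;
} Fifo;

/* Hypothetical stand-in for the series' Connection. */
typedef struct {
    Key key;
    Fifo primary, secondary;
    int used;
} Conn;

#define MAX_CONNS 16
static Conn table[MAX_CONNS];

/* Compare field by field so struct padding never affects the result. */
static int key_equal(const Key *a, const Key *b)
{
    return a->src == b->src && a->dst == b->dst &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->ip_proto == b->ip_proto;
}

/* Find the connection for key, creating it on first use.
 * Returns NULL when the table is full. */
static Conn *get_conn(const Key *key)
{
    Conn *free_slot = NULL;
    int i;

    for (i = 0; i < MAX_CONNS; i++) {
        if (table[i].used && key_equal(&table[i].key, key)) {
            return &table[i];
        }
        if (!table[i].used && !free_slot) {
            free_slot = &table[i];
        }
    }
    if (free_slot) {
        memset(free_slot, 0, sizeof(*free_slot));
        free_slot->key = *key;
        free_slot->used = 1;
    }
    return free_slot;
}

/* Append one packet id to the tail of a FIFO. Returns 0, or -1 on OOM. */
static int fifo_push(Fifo *q, int id)
{
    Node *n = malloc(sizeof(*n));

    if (!n) {
        return -1;
    }
    n->next = NULL;
    n->id = id;
    if (q->tail) {
        q->tail->next = n;
    } else {
        q->head = n;
    }
    q->tail = n;
    q->len++;
    return 0;
}

/* File a packet under its connection, on the primary or secondary list. */
static int enqueue(const Key *key, int from_primary, int id)
{
    Conn *c = get_conn(key);

    if (!c) {
        return -1;
    }
    return fifo_push(from_primary ? &c->primary : &c->secondary, id);
}
```

Packets from two different 4-tuples end up in two different Conn slots, so a
later compare step only ever has to look at the two lists of one connection.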

diff --git a/net/colo-proxy.c b/net/colo-proxy.c
index 2347bbf..5e5c72e 100644
--- a/net/colo-proxy.c
+++ b/net/colo-proxy.c
@@ -172,6 +172,73 @@ bool colo_proxy_query_checkpoint(void)
     return colo_do_checkpoint;
 }
 
+static ssize_t colo_proxy_enqueue_primary_packet(NetFilterState *nf,
+                                         NetClientState *sender,
+                                         unsigned flags,
+                                         const struct iovec *iov,
+                                         int iovcnt,
+                                         NetPacketSent *sent_cb)
+{
+    /*
+     * 1. parse the packet to get the connection tuple
+     *    (src_ip, src_port, dest_ip, dest_port)
+     * 2. enqueue the packet on that connection's primary_list
+     */
+    COLOProxyState *s = FILTER_COLO_PROXY(nf);
+    ssize_t size = iov_size(iov, iovcnt);
+    char *buf = g_malloc0(size); /* freed by packet_destroy() */
+    ConnectionKey key = {{ 0 } };
+    Packet *pkt;
+    Connection *conn;
+
+    iov_to_buf(iov, iovcnt, 0, buf, size);
+    pkt = packet_new(s, buf, size, &key, sender);
+    if (!pkt) {
+        return 0;
+    }
+
+    conn = colo_proxy_get_conn(s, &key);
+    if (!conn->processing) {
+        g_queue_push_tail(&s->conn_list, conn);
+        conn->processing = true;
+    }
+
+    g_queue_push_tail(&conn->primary_list, pkt);
+    qemu_event_set(&s->need_compare_ev);
+    return 1;
+}
+
+static ssize_t
+colo_proxy_enqueue_secondary_packet(NetFilterState *nf,
+                                    char *buf, int len)
+{
+    /*
+     * 1. parse the packet to get the connection tuple
+     *    (src_ip, src_port, dest_ip, dest_port)
+     * 2. enqueue the packet on that connection's secondary_list
+     */
+    COLOProxyState *s = FILTER_COLO_PROXY(nf);
+    Connection *conn;
+    ConnectionKey key = {{ 0 } };
+    Packet *pkt = packet_new(s, buf, len, &key, NULL);
+
+    if (!pkt) {
+        error_report("%s packet_new failed", __func__);
+        return -1;
+    }
+
+    conn = colo_proxy_get_conn(s, &key);
+    if (!conn->processing) {
+        g_queue_push_tail(&s->conn_list, conn);
+        conn->processing = true;
+    }
+
+    /* On the primary, notify the compare thread */
+    g_queue_push_tail(&conn->secondary_list, pkt);
+    qemu_event_set(&s->need_compare_ev);
+    return 0;
+}
+
 /*
  * send a packet to peer
  * >=0: success
@@ -235,6 +302,75 @@ static void colo_proxy_sock_receive(void *opaque)
     }
 }
 
+/*
+ * colo primary handle host's normal send and
+ * recv packets to primary guest
+ * return:          >= 0      success
+ *                  < 0       failed
+ */
+static ssize_t colo_proxy_primary_handler(NetFilterState *nf,
+                                         NetClientState *sender,
+                                         unsigned flags,
+                                         const struct iovec *iov,
+                                         int iovcnt,
+                                         NetPacketSent *sent_cb)
+{
+    ssize_t ret = 0;
+
+    /*
+     * if the packet's direction is rx:
+     *   enqueue it on the primary queue and wait for
+     *   the secondary queue in order to compare
+     * if the packet's direction is tx:
+     *   enqueue it, then send it to the secondary
+     *   and flush the queued packets
+     */
+    if (sender == nf->netdev) {
+        /* This packet is sent by netdev itself */
+        ret = colo_proxy_sock_send(nf, iov, iovcnt);
+        if (ret > 0) {
+            ret = 0;
+        }
+    } else {
+        ret = colo_proxy_enqueue_primary_packet(nf, sender, flags, iov,
+                    iovcnt, sent_cb);
+    }
+
+    return ret;
+}
+
+/*
+ * colo secondary handle host's normal send and
+ * recv packets to secondary guest
+ * return:          >= 0      success
+ *                  < 0       failed
+ */
+static ssize_t colo_proxy_secondary_handler(NetFilterState *nf,
+                                         NetClientState *sender,
+                                         unsigned flags,
+                                         const struct iovec *iov,
+                                         int iovcnt,
+                                         NetPacketSent *sent_cb)
+{
+    ssize_t ret = 0;
+
+    /*
+     * if the packet's direction is rx:
+     *   enqueue it and send it to the primary QEMU
+     * if the packet's direction is tx:
+     *   record the PVM packet's initial seq and adjust the
+     *   client's ack, then send the adjusted packet to the SVM
+     *   (to be done in the next version)
+     */
+    if (sender == nf->netdev) {
+        /* This packet is sent by netdev itself */
+    } else {
+        ret = colo_proxy_sock_send(nf, iov, iovcnt);
+    }
+
+    return ret;
+}
+
 static ssize_t colo_proxy_receive_iov(NetFilterState *nf,
                                          NetClientState *sender,
                                          unsigned flags,
@@ -256,9 +392,17 @@ static ssize_t colo_proxy_receive_iov(NetFilterState *nf,
     }
 
     if (s->colo_mode == COLO_MODE_PRIMARY) {
-        /* colo_proxy_primary_handler */
+        ret = colo_proxy_primary_handler(nf, sender, flags,
+                    iov, iovcnt, sent_cb);
+        if (ret == 0) {
+            return 0;
+        }
     } else {
-        /* colo_proxy_secondary_handler */
+        ret = colo_proxy_secondary_handler(nf, sender, flags,
+                    iov, iovcnt, sent_cb);
+    }
+    if (ret < 0) {
+        trace_colo_proxy("colo_proxy_receive_iov running failed");
     }
     return iov_size(iov, iovcnt);
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [Qemu-devel] [RFC PATCH v2 08/10] net/colo-proxy: Handle packet and connection
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
                   ` (6 preceding siblings ...)
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 07/10] net/colo-proxy: Add packet enqueue & handle func Zhang Chen
@ 2015-12-22 10:42 ` Zhang Chen
  2016-02-19 20:04   ` Dr. David Alan Gilbert
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 09/10] net/colo-proxy: Compare pri pkt to sec pkt Zhang Chen
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2015-12-22 10:42 UTC (permalink / raw)
  To: qemu devel, Jason Wang, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Zhang Chen, Yang Hongyang,
	zhanghailiang

From: zhangchen <zhangchen.fnst@cn.fujitsu.com>

Handle IP packets and connections: parse each packet to build its
connection key and track connections in a hash table.

Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 net/colo-proxy.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)
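
Note (illustration only, not part of the patch): parse_packet_early() in
this patch derives the connection key from the IP header and the first four
bytes of the transport header. The sketch below shows one way to do that
with a length check before every read and with ports converted from network
byte order. It assumes untagged Ethernet II frames carrying IPv4, and only
handles TCP and UDP ports. FlowKey, parse_flow_key and the SK_* constants
are hypothetical names, not identifiers from the series.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the series' ConnectionKey. */
typedef struct {
    uint32_t src, dst;            /* kept in network byte order */
    uint16_t src_port, dst_port;  /* converted to host byte order */
    uint8_t ip_proto;
} FlowKey;

#define SK_ETH_HLEN   14      /* untagged Ethernet II header */
#define SK_ETH_P_IP   0x0800
#define SK_IP_MIN_HL  20      /* IPv4 header without options */
#define SK_PROTO_TCP  6
#define SK_PROTO_UDP  17

/*
 * Fill *key from an Ethernet frame.
 * Returns 0 on success, -1 if the frame is too short or not IPv4.
 */
static int parse_flow_key(const uint8_t *frame, size_t len, FlowKey *key)
{
    uint16_t ethertype, port;
    const uint8_t *ip, *l4;
    size_t ihl;

    memset(key, 0, sizeof(*key));
    if (len < SK_ETH_HLEN + SK_IP_MIN_HL) {
        return -1;
    }
    memcpy(&ethertype, frame + 12, sizeof(ethertype));
    if (ntohs(ethertype) != SK_ETH_P_IP) {
        return -1;
    }

    ip = frame + SK_ETH_HLEN;
    if ((ip[0] >> 4) != 4) {
        return -1;                      /* not IPv4 */
    }
    ihl = (size_t)(ip[0] & 0x0f) * 4;   /* header length in bytes */
    if (ihl < SK_IP_MIN_HL || len < SK_ETH_HLEN + ihl) {
        return -1;
    }

    key->ip_proto = ip[9];
    memcpy(&key->src, ip + 12, 4);
    memcpy(&key->dst, ip + 16, 4);

    if (key->ip_proto == SK_PROTO_TCP || key->ip_proto == SK_PROTO_UDP) {
        if (len < SK_ETH_HLEN + ihl + 4) {
            return -1;                  /* ports would be out of bounds */
        }
        l4 = ip + ihl;
        memcpy(&port, l4, sizeof(port));
        key->src_port = ntohs(port);
        memcpy(&port, l4 + 2, sizeof(port));
        key->dst_port = ntohs(port);
    }
    return 0;
}
```

Using memcpy plus ntohs avoids both unaligned loads and host-endianness
assumptions when splitting the two 16-bit ports.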

diff --git a/net/colo-proxy.c b/net/colo-proxy.c
index 5e5c72e..06bab80 100644
--- a/net/colo-proxy.c
+++ b/net/colo-proxy.c
@@ -167,11 +167,141 @@ static int connection_key_equal(const void *opaque1, const void *opaque2)
     return memcmp(opaque1, opaque2, sizeof(ConnectionKey)) == 0;
 }
 
+static void connection_destroy(void *opaque)
+{
+    Connection *conn = opaque;
+
+    g_queue_foreach(&conn->primary_list, packet_destroy, NULL);
+    g_queue_clear(&conn->primary_list);
+    g_queue_foreach(&conn->secondary_list, packet_destroy, NULL);
+    g_queue_clear(&conn->secondary_list);
+    g_slice_free(Connection, conn);
+}
+
+static Connection *connection_new(ConnectionKey *key)
+{
+    Connection *conn = g_slice_new(Connection);
+
+    conn->ip_proto = key->ip_proto;
+    conn->processing = false;
+    g_queue_init(&conn->primary_list);
+    g_queue_init(&conn->secondary_list);
+
+    return conn;
+}
+
+/*
+ * Clear hashtable, stop this hash growing really huge
+ */
+static void clear_connection_hashtable(COLOProxyState *s)
+{
+    s->hashtable_size = 0;
+    g_hash_table_remove_all(colo_conn_hash);
+    trace_colo_proxy("clear_connection_hashtable");
+}
+
 bool colo_proxy_query_checkpoint(void)
 {
     return colo_do_checkpoint;
 }
 
+/* Return 0 on success, or return -1 if the pkt is corrupted */
+static int parse_packet_early(Packet *pkt, ConnectionKey *key)
+{
+    int network_length;
+    uint8_t *data = pkt->data;
+    uint16_t l3_proto;
+    uint32_t tmp_ports;
+    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
+
+    pkt->network_layer = data + l2hdr_len;
+    l3_proto = eth_get_l3_proto(data, l2hdr_len);
+    if (l3_proto != ETH_P_IP) {
+        if (l3_proto == ETH_P_ARP) {
+            return -1;
+        }
+        return 0;
+    }
+
+    network_length = pkt->ip->ip_hl * 4;
+    pkt->transport_layer = pkt->network_layer + network_length;
+    key->ip_proto = pkt->ip->ip_p;
+    key->src = pkt->ip->ip_src;
+    key->dst = pkt->ip->ip_dst;
+
+    switch (key->ip_proto) {
+    case IPPROTO_TCP:
+    case IPPROTO_UDP:
+    case IPPROTO_DCCP:
+    case IPPROTO_ESP:
+    case IPPROTO_SCTP:
+    case IPPROTO_UDPLITE:
+        tmp_ports = *(uint32_t *)(pkt->transport_layer);
+        key->src_port = tmp_ports & 0xffff;
+        key->dst_port = tmp_ports >> 16;
+        break;
+    case IPPROTO_AH:
+        tmp_ports = *(uint32_t *)(pkt->transport_layer + 4);
+        key->src_port = tmp_ports & 0xffff;
+        key->dst_port = tmp_ports >> 16;
+        break;
+    default:
+        break;
+    }
+
+    return 0;
+}
+
+static Packet *packet_new(COLOProxyState *s, void *data,
+                          int size, ConnectionKey *key, NetClientState *sender)
+{
+    Packet *pkt = g_slice_new(Packet);
+
+    pkt->data = data;
+    pkt->size = size;
+    pkt->s = s;
+    pkt->sender = sender;
+
+    if (parse_packet_early(pkt, key)) {
+        packet_destroy(pkt, NULL);
+        pkt = NULL;
+    }
+
+    return pkt;
+}
+
+static void packet_destroy(void *opaque, void *user_data)
+{
+    Packet *pkt = opaque;
+    g_free(pkt->data);
+    g_slice_free(Packet, pkt);
+}
+
+/* if not found, create a new connection and add it to the hash table */
+static Connection *colo_proxy_get_conn(COLOProxyState *s,
+            ConnectionKey *key)
+{
+    /* FIXME: protect colo_conn_hash */
+    Connection *conn = g_hash_table_lookup(colo_conn_hash, key);
+
+    if (conn == NULL) {
+        ConnectionKey *new_key = g_malloc(sizeof(*key));
+
+        conn = connection_new(key);
+        memcpy(new_key, key, sizeof(*key));
+
+        s->hashtable_size++;
+        if (s->hashtable_size > hashtable_max_size) {
+            trace_colo_proxy("colo proxy connection hashtable full, clear it");
+            clear_connection_hashtable(s);
+        } else {
+            g_hash_table_insert(colo_conn_hash, new_key, conn);
+        }
+    }
+
+     return conn;
+}
+
 static ssize_t colo_proxy_enqueue_primary_packet(NetFilterState *nf,
                                          NetClientState *sender,
                                          unsigned flags,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [Qemu-devel] [RFC PATCH v2 09/10] net/colo-proxy: Compare pri pkt to sec pkt
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
                   ` (7 preceding siblings ...)
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 08/10] net/colo-proxy: Handle packet and connection Zhang Chen
@ 2015-12-22 10:42 ` Zhang Chen
  2016-02-19 20:07   ` Dr. David Alan Gilbert
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 10/10] net/colo-proxy: Colo-proxy do checkpoint and clear Zhang Chen
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2015-12-22 10:42 UTC (permalink / raw)
  To: qemu devel, Jason Wang, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Zhang Chen, Yang Hongyang,
	zhanghailiang

From: zhangchen <zhangchen.fnst@cn.fujitsu.com>

Compare the packets sent by the primary guest with those sent by the
secondary guest. If they are the same, send the primary packet;
otherwise notify COLO to do a checkpoint so that the secondary guest
runs in the same state as the primary.

Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 net/colo-proxy.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
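
Note (illustration only, not part of the patch): the compare step takes
primary packets in order and looks for a byte-identical packet on the
secondary list; a match releases the primary packet, a miss requests a
checkpoint. The sketch below shows that loop on plain arrays. Unlike the
patch as posted, which as far as the diff shows leaves the matched secondary
packet queued, the sketch also removes the matched secondary packet. Pkt,
pkt_equal and compare_queues are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const unsigned char *data;
    size_t size;
} Pkt;

/* Returns 1 when both packets have the same size and the same bytes. */
static int pkt_equal(const Pkt *a, const Pkt *b)
{
    return a->size == b->size &&
           (a->size == 0 || memcmp(a->data, b->data, a->size) == 0);
}

/*
 * Walk the primary packets in order. For each one, search the secondary
 * array for an identical packet. On a match, drop the secondary copy and
 * count the primary packet as released. On the first miss, set
 * *need_checkpoint and stop.
 * Returns the number of primary packets that can be released.
 */
static size_t compare_queues(const Pkt *pri, size_t npri,
                             Pkt *sec, size_t *nsec, int *need_checkpoint)
{
    size_t released = 0;

    *need_checkpoint = 0;
    while (released < npri && *nsec > 0) {
        size_t i;

        for (i = 0; i < *nsec; i++) {
            if (pkt_equal(&pri[released], &sec[i])) {
                break;
            }
        }
        if (i == *nsec) {
            *need_checkpoint = 1;   /* outputs diverged */
            break;
        }
        /* Remove sec[i] while keeping the remaining order. */
        memmove(&sec[i], &sec[i + 1], (*nsec - i - 1) * sizeof(sec[0]));
        (*nsec)--;
        released++;
    }
    return released;
}
```

Like the loop in the patch, this stops as soon as either side runs out of
packets, so a primary packet with no secondary packet to compare against
stays queued rather than triggering a checkpoint.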

diff --git a/net/colo-proxy.c b/net/colo-proxy.c
index 06bab80..abb289f 100644
--- a/net/colo-proxy.c
+++ b/net/colo-proxy.c
@@ -602,6 +602,70 @@ static void colo_proxy_notify_checkpoint(void)
     colo_do_checkpoint = true;
 }
 
+/*
+ * The IP packets sent by the primary and the secondary
+ * are compared here.
+ * TODO: support IP fragments
+ * return:    0    means the packets are the same
+ *            != 0 means the packets differ
+ */
+static int colo_packet_compare(Packet *ppkt, Packet *spkt)
+{
+    trace_colo_proxy("colo_packet_compare data   ppkt");
+    trace_colo_proxy_packet_size(ppkt->size);
+    trace_colo_proxy_packet_src(inet_ntoa(ppkt->ip->ip_src));
+    trace_colo_proxy_packet_dst(inet_ntoa(ppkt->ip->ip_dst));
+    colo_proxy_dump_packet(ppkt);
+    trace_colo_proxy("colo_packet_compare data   spkt");
+    trace_colo_proxy_packet_size(spkt->size);
+    trace_colo_proxy_packet_src(inet_ntoa(spkt->ip->ip_src));
+    trace_colo_proxy_packet_dst(inet_ntoa(spkt->ip->ip_dst));
+    colo_proxy_dump_packet(spkt);
+
+    if (ppkt->size == spkt->size) {
+        return memcmp(ppkt->data, spkt->data, spkt->size);
+    } else {
+        trace_colo_proxy("colo_packet_compare size not same");
+        return -1;
+    }
+}
+
+static void colo_compare_connection(void *opaque, void *user_data)
+{
+    Connection *conn = opaque;
+    Packet *pkt = NULL;
+    GList *result = NULL;
+
+    while (!g_queue_is_empty(&conn->primary_list) &&
+                !g_queue_is_empty(&conn->secondary_list)) {
+        pkt = g_queue_pop_head(&conn->primary_list);
+        result = g_queue_find_custom(&conn->secondary_list,
+                    pkt, (GCompareFunc)colo_packet_compare);
+        if (result) {
+            colo_send_primary_packet(pkt, NULL);
+            trace_colo_proxy("packet same and release packet");
+        } else {
+            g_queue_push_tail(&conn->primary_list, pkt);
+            trace_colo_proxy("packet different");
+            colo_proxy_notify_checkpoint();
+            break;
+        }
+    }
+}
+
+static void *colo_proxy_compare_thread(void *opaque)
+{
+    COLOProxyState *s = opaque;
+
+    while (s->status == COLO_PROXY_RUNNING) {
+        qemu_event_wait(&s->need_compare_ev);
+        qemu_event_reset(&s->need_compare_ev);
+        g_queue_foreach(&s->conn_list, colo_compare_connection, NULL);
+    }
+
+    return NULL;
+}
+
 static void colo_proxy_start_one(NetFilterState *nf,
                                       void *opaque, Error **errp)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [Qemu-devel] [RFC PATCH v2 10/10] net/colo-proxy: Colo-proxy do checkpoint and clear
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
                   ` (8 preceding siblings ...)
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 09/10] net/colo-proxy: Compare pri pkt to sec pkt Zhang Chen
@ 2015-12-22 10:42 ` Zhang Chen
  2015-12-29  6:31 ` [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 75+ messages in thread
From: Zhang Chen @ 2015-12-22 10:42 UTC (permalink / raw)
  To: qemu devel, Jason Wang, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Zhang Chen, Yang Hongyang,
	zhanghailiang

From: zhangchen <zhangchen.fnst@cn.fujitsu.com>

Do a checkpoint: flush the queued primary packets, drop the queued
secondary packets and clear the checkpoint flag.

Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 net/colo-proxy.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)
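
Note (illustration only, not part of the patch): the checkpoint path is a
small handshake. The compare side raises a flag, COLO polls it, and doing
the checkpoint releases everything held on the primary list, discards the
secondary list and lowers the flag. The sketch below models only that state
change with counters. ProxySketch and its functions are hypothetical names
that loosely mirror colo_proxy_notify_checkpoint(),
colo_proxy_query_checkpoint() and colo_proxy_do_checkpoint().

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t primary_queued;    /* packets held back from the network */
    size_t secondary_queued;  /* packets waiting to be compared */
    size_t released;          /* packets already sent on */
    bool need_checkpoint;
} ProxySketch;

/* Compare side: a mismatch was seen, ask for a checkpoint. */
static void notify_checkpoint(ProxySketch *p)
{
    p->need_checkpoint = true;
}

/* COLO side: poll whether a checkpoint has been requested. */
static bool query_checkpoint(const ProxySketch *p)
{
    return p->need_checkpoint;
}

/* COLO side: release primary packets, drop secondary ones, clear the flag. */
static void do_checkpoint(ProxySketch *p)
{
    p->released += p->primary_queued;
    p->primary_queued = 0;
    p->secondary_queued = 0;
    p->need_checkpoint = false;
}
```

After the checkpoint both VMs are in the same state again, which is why the
held primary output can be released and the secondary output discarded.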

diff --git a/net/colo-proxy.c b/net/colo-proxy.c
index abb289f..79b1b1b 100644
--- a/net/colo-proxy.c
+++ b/net/colo-proxy.c
@@ -144,6 +144,8 @@ static inline void colo_proxy_dump_packet(Packet *pkt)
     printf("\n");
 }
 
+static void packet_destroy(void *opaque, void *user_data);
+
 static uint32_t connection_key_hash(const void *opaque)
 {
     const ConnectionKey *key = opaque;
@@ -190,6 +192,28 @@ static Connection *connection_new(ConnectionKey *key)
     return conn;
 }
 
+static void colo_send_primary_packet(void *opaque, void *user_data)
+{
+    Packet *pkt = opaque;
+    qemu_net_queue_send(pkt->s->incoming_queue, pkt->sender, 0,
+                    (const uint8_t *)pkt->data, pkt->size, NULL);
+}
+
+static void colo_flush_connection(void *opaque, void *user_data)
+{
+    Connection *conn = opaque;
+    Packet *pkt = NULL;
+
+    while (!g_queue_is_empty(&conn->primary_list)) {
+        pkt = g_queue_pop_head(&conn->primary_list);
+        colo_send_primary_packet(pkt, NULL);
+    }
+    while (!g_queue_is_empty(&conn->secondary_list)) {
+        pkt = g_queue_pop_head(&conn->secondary_list);
+        packet_destroy(pkt, NULL);
+    }
+}
+
 /*
  * Clear hashtable, stop this hash growing really huge
  */
@@ -205,6 +229,52 @@ bool colo_proxy_query_checkpoint(void)
     return colo_do_checkpoint;
 }
 
+static int colo_proxy_primary_checkpoint(COLOProxyState *s)
+{
+    g_queue_foreach(&s->conn_list, colo_flush_connection, NULL);
+    return 0;
+}
+
+static int colo_proxy_secondary_checkpoint(COLOProxyState *s)
+{
+    return 0;
+}
+
+static void colo_proxy_checkpoint_one(NetFilterState *nf,
+                                             void *opaque, Error **errp)
+{
+    COLOProxyState *s;
+    int mode;
+
+    if (strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_COLO_PROXY)) {
+        return;
+    }
+
+    s = FILTER_COLO_PROXY(nf);
+    mode = *(int *)opaque;
+    assert(s->colo_mode == mode);
+
+    if (s->colo_mode == COLO_MODE_PRIMARY) {
+        colo_proxy_primary_checkpoint(s);
+    } else {
+        /* secondary do checkpoint */
+        colo_proxy_secondary_checkpoint(s);
+    }
+}
+
+int colo_proxy_do_checkpoint(int mode)
+{
+    Error *err = NULL;
+    qemu_foreach_netfilter(colo_proxy_checkpoint_one, &mode, &err);
+    if (err) {
+        error_report("colo proxy do checkpoint failed");
+        return -1;
+    }
+
+    colo_do_checkpoint = false;
+    return 0;
+}
+
 /* Return 0 on success, or return -1 if the pkt is corrupted */
 static int parse_packet_early(Packet *pkt, ConnectionKey *key)
 {
@@ -294,6 +364,7 @@ static Connection *colo_proxy_get_conn(COLOProxyState *s,
         if (s->hashtable_size > hashtable_max_size) {
             trace_colo_proxy("colo proxy connection hashtable full, clear it");
             clear_connection_hashtable(s);
+            /* TODO:clear conn_list */
         } else {
             g_hash_table_insert(colo_conn_hash, new_key, conn);
         }
@@ -751,6 +822,8 @@ void colo_proxy_stop(int mode)
 static void colo_proxy_setup(NetFilterState *nf, Error **errp)
 {
     COLOProxyState *s = FILTER_COLO_PROXY(nf);
+    ssize_t factor = 8;
+    struct sysinfo si;
 
     if (!s->addr) {
         error_setg(errp, "filter colo_proxy needs 'addr' property set!");
@@ -768,6 +841,21 @@ static void colo_proxy_setup(NetFilterState *nf, Error **errp)
     colo_do_checkpoint = false;
     qemu_event_init(&s->need_compare_ev, false);
 
+    /*
+     *  Idea from kernel tcp.c: use 1/16384 of memory.  On i386: 32MB
+     * machine has 512 buckets. >= 1GB machines have 16384 buckets.
+     * default factor = 8
+     */
+    sysinfo(&si);
+    hashtable_max_size = 16384;
+    if (si.totalram > (1024 * 1024 * 1024)) {
+        hashtable_max_size = 16384;
+    }
+    if (hashtable_max_size < 32) {
+        hashtable_max_size = 32;
+    }
+
+    hashtable_max_size = hashtable_max_size * factor;
     s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
     colo_conn_hash = g_hash_table_new_full(connection_key_hash,
                                            connection_key_equal,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
                   ` (9 preceding siblings ...)
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 10/10] net/colo-proxy: Colo-proxy do checkpoint and clear Zhang Chen
@ 2015-12-29  6:31 ` Zhang Chen
  2015-12-29  6:58   ` Jason Wang
  2015-12-31  2:36 ` Jason Wang
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2015-12-29  6:31 UTC (permalink / raw)
  To: qemu devel, Jason Wang, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Yang Hongyang, zhanghailiang

Hi~
Just a small ping...
No news for a week.
Colo proxy is a part of COLO project, we need review and comments.


Thanks
zhangchen


-- 
Thanks
zhangchen

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2015-12-29  6:31 ` [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
@ 2015-12-29  6:58   ` Jason Wang
  2015-12-29  7:08     ` Zhang Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2015-12-29  6:58 UTC (permalink / raw)
  To: Zhang Chen, qemu devel, Stefan Hajnoczi
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, jan.kiszka,
	Yang Hongyang



On 12/29/2015 02:31 PM, Zhang Chen wrote:
> Hi~
> Just a small ping...
> No news for a week.
> Colo proxy is a part of COLO project, we need review and comments.
>
>
> Thanks
> zhangchen

Hi, I will find some time to review this this week.

Thanks


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2015-12-29  6:58   ` Jason Wang
@ 2015-12-29  7:08     ` Zhang Chen
  0 siblings, 0 replies; 75+ messages in thread
From: Zhang Chen @ 2015-12-29  7:08 UTC (permalink / raw)
  To: Jason Wang, qemu devel, Stefan Hajnoczi
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, jan.kiszka,
	Yang Hongyang



On 12/29/2015 02:58 PM, Jason Wang wrote:
>
> On 12/29/2015 02:31 PM, Zhang Chen wrote:
>> Hi~
>> Just a small ping...
>> No news for a week.
>> Colo proxy is a part of COLO project, we need review and comments.
>>
>>
>> Thanks
>> zhangchen
> Hi, will find sometime to review this this week.
>
> Thanks

Thanks very much for your help~~


-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
                   ` (10 preceding siblings ...)
  2015-12-29  6:31 ` [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
@ 2015-12-31  2:36 ` Jason Wang
  2015-12-31  8:02   ` Li Zhijian
  2015-12-31  8:40   ` Zhang Chen
  2016-01-08 11:19 ` Dr. David Alan Gilbert
  2016-02-29 20:04 ` Dr. David Alan Gilbert
  13 siblings, 2 replies; 75+ messages in thread
From: Jason Wang @ 2015-12-31  2:36 UTC (permalink / raw)
  To: Zhang Chen, qemu devel, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Yang Hongyang, zhanghailiang



On 12/22/2015 06:42 PM, Zhang Chen wrote:
> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>
> Hi,all
>
> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
> based on qemu netfilter and it's a plugin for qemu netfilter. the function
> keep Secondary VM connect normal to Primary VM and compare packets
> sent by PVM to sent by SVM.if the packet difference,notify COLO do
> checkpoint and send all primary packet has queued.

Thanks for the work. I don't object to this method, but I'm still not
convinced that qemu is the best place for it.

As was raised in past discussions, it's almost impossible to make this
cooperate with vhost backends. If we want this to be used in a production
environment, we need to think of a solution for vhost. There's no such
worry if we decouple this from qemu.

>
> You can also get the series from:
>
> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
>
> Usage:
>
> primary:
> -netdev tap,id=bn0 -device e1000,netdev=bn0
> -object colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
>
> secondary:
> -netdev tap,id=bn0 -device e1000,netdev=bn0
> -object colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port 

I had a quick glance at how secondary mode works. All it does is
forward packets between a nic and a socket, and the qemu socket backend
does exactly the same job. You could even use a socket backend on the
primary node and let the packet compare module talk to both the primary
and secondary nodes.
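
As a rough illustration of the stand-alone compare module suggested
above (a sketch only, not code from this series): it receives each
node's outgoing packets, releases a primary packet when the secondary
produced an identical one, and on a mismatch requests a checkpoint and
flushes everything the primary has queued, as the cover letter
describes. The class and method names are hypothetical, and comparing
strictly in arrival order is a simplifying assumption.

```python
from collections import deque


class PacketComparer:
    """Hypothetical stand-alone compare module; not code from this series."""

    def __init__(self):
        self.primary = deque()    # packets received from the primary node
        self.secondary = deque()  # packets received from the secondary node
        self.released = []        # packets released towards the client
        self.checkpoints = 0      # number of checkpoint requests raised

    def _drain(self):
        # Compare packets pairwise, in arrival order.
        while self.primary and self.secondary:
            pri = self.primary.popleft()
            sec = self.secondary.popleft()
            if pri == sec:
                self.released.append(pri)
            else:
                # Mismatch: request a checkpoint, release every packet the
                # primary has queued, and drop the secondary's output.
                self.checkpoints += 1
                self.released.append(pri)
                self.released.extend(self.primary)
                self.primary.clear()
                self.secondary.clear()

    def from_primary(self, pkt: bytes):
        self.primary.append(pkt)
        self._drain()

    def from_secondary(self, pkt: bytes):
        self.secondary.append(pkt)
        self._drain()
```

Keeping the comparer free of any qemu state is what would let it take
traffic from a socket on either node.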



* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2015-12-31  2:36 ` Jason Wang
@ 2015-12-31  8:02   ` Li Zhijian
  2016-01-04  2:08     ` Jason Wang
  2015-12-31  8:40   ` Zhang Chen
  1 sibling, 1 reply; 75+ messages in thread
From: Li Zhijian @ 2015-12-31  8:02 UTC (permalink / raw)
  To: Jason Wang, Zhang Chen, qemu devel, Stefan Hajnoczi
  Cc: zhanghailiang, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Yang Hongyang



On 12/31/2015 10:36 AM, Jason Wang wrote:
>
>
> On 12/22/2015 06:42 PM, Zhang Chen wrote:
>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>
>> Hi,all
>>
>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
>> based on qemu netfilter and it's a plugin for qemu netfilter. the function
>> keep Secondary VM connect normal to Primary VM and compare packets
>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
>> checkpoint and send all primary packet has queued.
>
> Thanks for the work. I don't object this method but still not convinced
> that qemu is the best place for this.
>
> As been raised in the past discussion, it's almost impossible to
> cooperate with vhost backends. If we want this to be used in production
> environment, need to think of a solution for vhost. There's no such
> worry if we decouple this from qemu.

Yes, I agree with you. But not everything is perfect at the beginning.
If we implement the proxy as kernel modules, someone may ask about the
packets (e.g. vhost-user) that the host kernel cannot touch.

As you said, colo-proxy in qemu will not support the vhost case, but it
makes colo easier to use, so that more and more people will join us to
test it. I'm looking forward to that day.

If everything goes fine, I think colo can support the vhost case in
another way (such as introducing an extra proxy module in the kernel) in
the future.

So, I think colo-proxy in qemu is a good choice for the current COLO.

Thanks
Li


-- 
Best regards.
Li Zhijian (8555)


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2015-12-31  2:36 ` Jason Wang
  2015-12-31  8:02   ` Li Zhijian
@ 2015-12-31  8:40   ` Zhang Chen
  2016-01-04  5:37     ` Jason Wang
  1 sibling, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2015-12-31  8:40 UTC (permalink / raw)
  To: Jason Wang, qemu devel, Stefan Hajnoczi
  Cc: Li Zhijian, Gui jianfeng, eddie.dong, Dr. David Alan Gilbert,
	Huang peng, Gong lei, jan.kiszka, Yang Hongyang, zhanghailiang



On 12/31/2015 10:36 AM, Jason Wang wrote:
>
> On 12/22/2015 06:42 PM, Zhang Chen wrote:
>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>
>> Hi,all
>>
>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
>> based on qemu netfilter and it's a plugin for qemu netfilter. the function
>> keep Secondary VM connect normal to Primary VM and compare packets
>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
>> checkpoint and send all primary packet has queued.
> Thanks for the work. I don't object this method but still not convinced
> that qemu is the best place for this.
>
> As been raised in the past discussion, it's almost impossible to
> cooperate with vhost backends. If we want this to be used in production
> environment, need to think of a solution for vhost. There's no such
> worry if we decouple this from qemu.
>
>> You can also get the series from:
>>
>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
>>
>> Usage:
>>
>> primary:
>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>> -object colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
>>
>> secondary:
>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>> -object colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
> Have a quick glance at how secondary mode work. What it does is just
> forwarding packets between a nic and a socket, qemu socket backend did
> exact the same job. You could even use socket in primary node and let
> packet compare module talk to both primary and secondary node.

If we use the qemu socket backend, the same netdev will be used by both
the qemu socket and the qemu netfilter. This goes against the qemu net
design. Also, when colo does failover, the secondary has no backend to
use. That's the real problem.


Thanks
zhangchen


-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2015-12-31  8:02   ` Li Zhijian
@ 2016-01-04  2:08     ` Jason Wang
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Wang @ 2016-01-04  2:08 UTC (permalink / raw)
  To: Li Zhijian, Zhang Chen, qemu devel, Stefan Hajnoczi
  Cc: zhanghailiang, Gui jianfeng, eddie.dong, Huang peng,
	Dr. David Alan Gilbert, Gong lei, jan.kiszka, Yang Hongyang



On 12/31/2015 04:02 PM, Li Zhijian wrote:
>
>
> On 12/31/2015 10:36 AM, Jason Wang wrote:
>>
>>
>> On 12/22/2015 06:42 PM, Zhang Chen wrote:
>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>
>>> Hi,all
>>>
>>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
>>> based on qemu netfilter and it's a plugin for qemu netfilter. the
>>> function
>>> keep Secondary VM connect normal to Primary VM and compare packets
>>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
>>> checkpoint and send all primary packet has queued.
>>
>> Thanks for the work. I don't object this method but still not convinced
>> that qemu is the best place for this.
>>
>> As been raised in the past discussion, it's almost impossible to
>> cooperate with vhost backends. If we want this to be used in production
>> environment, need to think of a solution for vhost. There's no such
>> worry if we decouple this from qemu.
>
> Yes, I agree with you. But not everything is perfect at the beginning.
> If we implement proxy as kernel modules, maybe some one will ask how
> about the packet(e.g. vhost-user) that host kernel can not touch.

Then I think the best place is still userspace, but not qemu. With
this, the packet comparing module can easily accept the mirrored traffic
from both kernel and userspace, and the qemu part can focus on the
infrastructure to support it (e.g. mirroring traffic to a socket or
elsewhere).
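
To make "mirroring traffic to a socket" concrete: a stream socket has
no packet boundaries, so the mirrored packets need some framing before
an external compare module can split them apart again. The sketch below
assumes a 4-byte big-endian length prefix per packet; that framing and
the function names are illustrative assumptions, not something defined
in this thread.

```python
import struct


def frame(pkt: bytes) -> bytes:
    """Prefix a packet with its length (assumed 4-byte big-endian)."""
    return struct.pack("!I", len(pkt)) + pkt


def deframe(buf: bytes):
    """Split a received byte stream into (complete_packets, leftover)."""
    packets = []
    pos = 0
    while len(buf) - pos >= 4:
        (size,) = struct.unpack_from("!I", buf, pos)
        if len(buf) - pos - 4 < size:
            break  # the last packet has not fully arrived yet
        packets.append(buf[pos + 4:pos + 4 + size])
        pos += 4 + size
    return packets, buf[pos:]
```

The receiver keeps the leftover bytes and prepends them to the next
chunk it reads from the socket.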

>
> As you said, colo-proxy in qemu will not support vhost scene, but it
> can make
> colo more easier to be used so that more and more pepole will join us
> to test it.
> I'm looking forward to that day.

Agreed, so moving this out of qemu can greatly reduce the complexity and
make it easier to merge.

>
> if everything goes fine, I think colo can support vhost scene in other
> way(such as
> introduce extra proxy module in kernel) in the feature.

I believe we don't want duplicated code and bugs (and fixes) in two places.

>
> So, I think colo-proxy in qemu is a good choice for current COLO.
>
> Thanks
> Li 


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2015-12-31  8:40   ` Zhang Chen
@ 2016-01-04  5:37     ` Jason Wang
  2016-01-04  8:16       ` Zhang Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2016-01-04  5:37 UTC (permalink / raw)
  To: Zhang Chen, qemu devel, Stefan Hajnoczi
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, jan.kiszka,
	Yang Hongyang



On 12/31/2015 04:40 PM, Zhang Chen wrote:
>
>
> On 12/31/2015 10:36 AM, Jason Wang wrote:
>>
>> On 12/22/2015 06:42 PM, Zhang Chen wrote:
>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>
>>> Hi,all
>>>
>>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
>>> based on qemu netfilter and it's a plugin for qemu netfilter. the
>>> function
>>> keep Secondary VM connect normal to Primary VM and compare packets
>>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
>>> checkpoint and send all primary packet has queued.
>> Thanks for the work. I don't object this method but still not convinced
>> that qemu is the best place for this.
>>
>> As been raised in the past discussion, it's almost impossible to
>> cooperate with vhost backends. If we want this to be used in production
>> environment, need to think of a solution for vhost. There's no such
>> worry if we decouple this from qemu.
>>
>>> You can also get the series from:
>>>
>>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
>>>
>>>
>>> Usage:
>>>
>>> primary:
>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>> -object
>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
>>>
>>> secondary:
>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>> -object
>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
>> Have a quick glance at how secondary mode work. What it does is just
>> forwarding packets between a nic and a socket, qemu socket backend did
>> exact the same job. You could even use socket in primary node and let
>> packet compare module talk to both primary and secondary node.
>
> If we use qemu socket backend , the same netdev will used by qemu
> socket and
> qemu netfilter. this will against qemu net design. and then, when colo
> do failover,
> secondary do not have backend to use. that's the real problem.

Then, maybe it's time to implement changing the netdev of a nic. The
point here is that what secondary mode does is in fact the job of a
netdev backend rather than a filter ...



* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-04  5:37     ` Jason Wang
@ 2016-01-04  8:16       ` Zhang Chen
  2016-01-04  9:46         ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2016-01-04  8:16 UTC (permalink / raw)
  To: Jason Wang, qemu devel, Stefan Hajnoczi
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong, Huang peng,
	Dr. David Alan Gilbert, Gong lei, jan.kiszka, Yang Hongyang



On 01/04/2016 01:37 PM, Jason Wang wrote:
>
> On 12/31/2015 04:40 PM, Zhang Chen wrote:
>>
>> On 12/31/2015 10:36 AM, Jason Wang wrote:
>>> On 12/22/2015 06:42 PM, Zhang Chen wrote:
>>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>>
>>>> Hi,all
>>>>
>>>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
>>>> based on qemu netfilter and it's a plugin for qemu netfilter. the
>>>> function
>>>> keep Secondary VM connect normal to Primary VM and compare packets
>>>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
>>>> checkpoint and send all primary packet has queued.
>>> Thanks for the work. I don't object this method but still not convinced
>>> that qemu is the best place for this.
>>>
>>> As been raised in the past discussion, it's almost impossible to
>>> cooperate with vhost backends. If we want this to be used in production
>>> environment, need to think of a solution for vhost. There's no such
>>> worry if we decouple this from qemu.
>>>
>>>> You can also get the series from:
>>>>
>>>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
>>>>
>>>>
>>>> Usage:
>>>>
>>>> primary:
>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>> -object
>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
>>>>
>>>> secondary:
>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>> -object
>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
>>> Have a quick glance at how secondary mode work. What it does is just
>>> forwarding packets between a nic and a socket, qemu socket backend did
>>> exact the same job. You could even use socket in primary node and let
>>> packet compare module talk to both primary and secondary node.
>> If we use qemu socket backend , the same netdev will used by qemu
>> socket and
>> qemu netfilter. this will against qemu net design. and then, when colo
>> do failover,
>> secondary do not have backend to use. that's the real problem.
> Then, maybe it's time to implement changing the netdev of a nic. The
> point here is that what secondary mode did is in fact a netdev backend
> instead of a filter ...

Currently, you are right. In the colo-proxy V2 code, I only compare IP
packets to decide whether to do a checkpoint.
But in colo-proxy V3 I will compare tcp, icmp and udp packets to decide
it, because that can reduce the frequency of checkpoints and improve
performance. To keep tcp connections working, the colo secondary needs to
record the primary guest's initial seq and adjust the secondary guest's
ack. If colo does failover, the secondary also needs to do this for old
tcp connections. The qemu socket backend can't do this job.
Another problem is failover: if we use the qemu socket as the backend on
the secondary, I don't know how to turn the secondary into a normal qemu
when colo does failover. If you know, please tell me.
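
The seq/ack adjustment mentioned above could look roughly like the
sketch below. This is only one reading of "record the primary guest's
init seq and adjust the secondary guest's ack"; the V3 code is not
shown in this thread, and the class and method names are hypothetical.
The idea is that the client only ever sees the primary's sequence
space, so values crossing to or from the secondary guest are shifted by
the difference between the two guests' initial sequence numbers.

```python
MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


class SeqTranslator:
    """Hypothetical per-connection state; names are illustrative only."""

    def __init__(self, primary_isn: int, secondary_isn: int):
        # Offset between the two guests' sequence spaces, recorded when
        # each guest's initial sequence number is seen.
        self.offset = (primary_isn - secondary_isn) % MOD

    def ack_to_secondary(self, ack: int) -> int:
        # The client acknowledges the primary's sequence space; shift the
        # ack into the secondary's space before the secondary guest sees it.
        return (ack - self.offset) % MOD

    def seq_from_secondary(self, seq: int) -> int:
        # Shift the secondary's outgoing seq into the primary's space, e.g.
        # so that the two guests' packets can be compared.
        return (seq + self.offset) % MOD
```

After failover the same translation would still be needed for
connections opened before the failover, since the client keeps using
the primary's sequence space.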


Thanks for your review
zhangchen


-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-04  8:16       ` Zhang Chen
@ 2016-01-04  9:46         ` Jason Wang
  2016-01-04 11:17           ` Zhang Chen
  2016-01-04 16:52           ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 75+ messages in thread
From: Jason Wang @ 2016-01-04  9:46 UTC (permalink / raw)
  To: Zhang Chen, qemu devel, Stefan Hajnoczi
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, jan.kiszka,
	Yang Hongyang



On 01/04/2016 04:16 PM, Zhang Chen wrote:
>
>
> On 01/04/2016 01:37 PM, Jason Wang wrote:
>>
>> On 12/31/2015 04:40 PM, Zhang Chen wrote:
>>>
>>> On 12/31/2015 10:36 AM, Jason Wang wrote:
>>>> On 12/22/2015 06:42 PM, Zhang Chen wrote:
>>>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>>>
>>>>> Hi,all
>>>>>
>>>>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
>>>>> based on qemu netfilter and it's a plugin for qemu netfilter. the
>>>>> function
>>>>> keep Secondary VM connect normal to Primary VM and compare packets
>>>>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
>>>>> checkpoint and send all primary packet has queued.
>>>> Thanks for the work. I don't object this method but still not
>>>> convinced
>>>> that qemu is the best place for this.
>>>>
>>>> As been raised in the past discussion, it's almost impossible to
>>>> cooperate with vhost backends. If we want this to be used in
>>>> production
>>>> environment, need to think of a solution for vhost. There's no such
>>>> worry if we decouple this from qemu.
>>>>
>>>>> You can also get the series from:
>>>>>
>>>>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
>>>>>
>>>>>
>>>>>
>>>>> Usage:
>>>>>
>>>>> primary:
>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>>> -object
>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
>>>>>
>>>>> secondary:
>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>>> -object
>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
>>>> Have a quick glance at how secondary mode work. What it does is just
>>>> forwarding packets between a nic and a socket, qemu socket backend did
>>>> exact the same job. You could even use socket in primary node and let
>>>> packet compare module talk to both primary and secondary node.
>>> If we use qemu socket backend , the same netdev will used by qemu
>>> socket and
>>> qemu netfilter. this will against qemu net design. and then, when colo
>>> do failover,
>>> secondary do not have backend to use. that's the real problem.
>> Then, maybe it's time to implement changing the netdev of a nic. The
>> point here is that what secondary mode did is in fact a netdev backend
>> instead of a filter ...
>
> Currently, you are right. in colo-proxy V2 code, I just compare IP
> packet to
> decide whether to do checkpoint.
> But, in colo-proxy V3 I will compare tcp,icmp,udp packet to decide it.
> because that can reduce frequency of checkpoint and improve
> performance. To keep tcp connection well, colo secondary need to record
> primary guest's init seq and adjust secondary guest's ack. if colo do
> failover,
> secondary also need do this to old tcp connection. qemu socket
> can't do this job.

So a question here: is it a must to do things (e.g. TCP analysis) on the
secondary? It looks like we could do this on the primary node. And I saw
you're doing the packet comparison on the primary node; are there any
advantages to doing this on the primary instead of the secondary?

> and another problem is do failover, if we use qemu socket
> to be backend in secondary, when colo do failover, I don't know how to
> change
> secondary be a normal qemu, if you know, please tell me.

Current qemu can't do this, but I mean we could implement something like
nic_change_backend, which can change a nic's peer(s). With this, on the
secondary, we can replace the socket backend with whatever you want (e.g.
tap or other).
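
To make the nic_change_backend idea concrete, here is a minimal sketch of
swapping a nic's peer. It is a toy model under stated assumptions: ToyNic,
ToyBackend and toy_nic_change_backend are hypothetical names invented for
illustration, not QEMU's real net client structures or API, and a real
implementation would also have to deal with queued packets, multiqueue
peers and filters attached to the old backend.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT QEMU's net client types. */
struct ToyNic;

typedef struct ToyBackend {
    const char *name;       /* e.g. "socket" or "tap" */
    struct ToyNic *peer;    /* nic currently attached, or NULL */
} ToyBackend;

typedef struct ToyNic {
    const char *name;       /* e.g. "e1000" */
    ToyBackend *peer;       /* backend currently attached, or NULL */
} ToyNic;

/*
 * Hypothetical helper named after the nic_change_backend idea.
 * Detaches the nic from its current backend, attaches new_be instead,
 * and returns the old backend so the caller can tear it down.
 */
static ToyBackend *toy_nic_change_backend(ToyNic *nic, ToyBackend *new_be)
{
    assert(nic != NULL && new_be != NULL);
    assert(new_be->peer == NULL);   /* the new backend must not be in use */

    ToyBackend *old = nic->peer;
    if (old != NULL) {
        old->peer = NULL;           /* break the old link in both directions */
    }
    nic->peer = new_be;
    new_be->peer = nic;
    return old;
}
```

On failover the secondary would call such a helper once, handing over a
freshly created tap backend and then destroying the returned socket backend.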

Thanks

>
>
> Thanks for your revew
> zhangchen 


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-04  9:46         ` Jason Wang
@ 2016-01-04 11:17           ` Zhang Chen
  2016-01-06  5:16             ` Jason Wang
  2016-01-04 16:52           ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2016-01-04 11:17 UTC (permalink / raw)
  To: Jason Wang, qemu devel, Stefan Hajnoczi
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, jan.kiszka,
	Yang Hongyang



On 01/04/2016 05:46 PM, Jason Wang wrote:
>
> On 01/04/2016 04:16 PM, Zhang Chen wrote:
>>
>> On 01/04/2016 01:37 PM, Jason Wang wrote:
>>> On 12/31/2015 04:40 PM, Zhang Chen wrote:
>>>> On 12/31/2015 10:36 AM, Jason Wang wrote:
>>>>> On 12/22/2015 06:42 PM, Zhang Chen wrote:
>>>>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>>>>
>>>>>> Hi,all
>>>>>>
>>>>>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
>>>>>> based on qemu netfilter and it's a plugin for qemu netfilter. the
>>>>>> function
>>>>>> keep Secondary VM connect normal to Primary VM and compare packets
>>>>>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
>>>>>> checkpoint and send all primary packet has queued.
>>>>> Thanks for the work. I don't object this method but still not
>>>>> convinced
>>>>> that qemu is the best place for this.
>>>>>
>>>>> As been raised in the past discussion, it's almost impossible to
>>>>> cooperate with vhost backends. If we want this to be used in
>>>>> production
>>>>> environment, need to think of a solution for vhost. There's no such
>>>>> worry if we decouple this from qemu.
>>>>>
>>>>>> You can also get the series from:
>>>>>>
>>>>>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
>>>>>>
>>>>>>
>>>>>>
>>>>>> Usage:
>>>>>>
>>>>>> primary:
>>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>>>> -object
>>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
>>>>>>
>>>>>> secondary:
>>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>>>> -object
>>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
>>>>> Have a quick glance at how secondary mode work. What it does is just
>>>>> forwarding packets between a nic and a socket, qemu socket backend did
>>>>> exact the same job. You could even use socket in primary node and let
>>>>> packet compare module talk to both primary and secondary node.
>>>> If we use qemu socket backend , the same netdev will used by qemu
>>>> socket and
>>>> qemu netfilter. this will against qemu net design. and then, when colo
>>>> do failover,
>>>> secondary do not have backend to use. that's the real problem.
>>> Then, maybe it's time to implement changing the netdev of a nic. The
>>> point here is that what secondary mode did is in fact a netdev backend
>>> instead of a filter ...
>> Currently, you are right. in colo-proxy V2 code, I just compare IP
>> packet to
>> decide whether to do checkpoint.
>> But, in colo-proxy V3 I will compare tcp,icmp,udp packet to decide it.
>> because that can reduce frequency of checkpoint and improve
>> performance. To keep tcp connection well, colo secondary need to record
>> primary guest's init seq and adjust secondary guest's ack. if colo do
>> failover,
>> secondary also need do this to old tcp connection. qemu socket
>> can't do this job.
> So a question here: is it a must to do things (e.g TCP analysis stuffs)
> at secondary? Looks like we could do this at primary node. And I saw
> you're doing packet comparing in primary node, any advantages of doing
> this in primary instead of secondary?

We think this must be done on the secondary. After a COLO failover, the
secondary must keep doing the TCP analysis for the connections that
existed before the failover (if not, those TCP connections will break at
that point). By then the primary is already down or disconnected from the
secondary, so we can't make the primary do the TCP analysis; that would
not guarantee the FT function.

Thanks
zhangchen

>> and another problem is do failover, if we use qemu socket
>> to be backend in secondary, when colo do failover, I don't know how to
>> change
>> secondary be a normal qemu, if you know, please tell me.
> Current qemu couldn't do this, but I mean we implement something like
> nic_change_backend which can change nic's peer(s). With this, in
> secondary, we can replace the socket backend with whatever you want (e.g
> tap or other).
>
> Thanks
>
>>
>> Thanks for your revew
>> zhangchen
>
>
> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-04  9:46         ` Jason Wang
  2016-01-04 11:17           ` Zhang Chen
@ 2016-01-04 16:52           ` Dr. David Alan Gilbert
  2016-01-06  5:20             ` Jason Wang
  1 sibling, 1 reply; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-04 16:52 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhang Chen, Li Zhijian, Gui jianfeng, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang

* Jason Wang (jasowang@redhat.com) wrote:
> 
> 
> On 01/04/2016 04:16 PM, Zhang Chen wrote:
> >
> >
> > On 01/04/2016 01:37 PM, Jason Wang wrote:
> >>
> >> On 12/31/2015 04:40 PM, Zhang Chen wrote:
> >>>
> >>> On 12/31/2015 10:36 AM, Jason Wang wrote:
> >>>> On 12/22/2015 06:42 PM, Zhang Chen wrote:
> >>>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> >>>>>
> >>>>> Hi,all
> >>>>>
> >>>>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
> >>>>> based on qemu netfilter and it's a plugin for qemu netfilter. the
> >>>>> function
> >>>>> keep Secondary VM connect normal to Primary VM and compare packets
> >>>>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
> >>>>> checkpoint and send all primary packet has queued.
> >>>> Thanks for the work. I don't object this method but still not
> >>>> convinced
> >>>> that qemu is the best place for this.
> >>>>
> >>>> As been raised in the past discussion, it's almost impossible to
> >>>> cooperate with vhost backends. If we want this to be used in
> >>>> production
> >>>> environment, need to think of a solution for vhost. There's no such
> >>>> worry if we decouple this from qemu.
> >>>>
> >>>>> You can also get the series from:
> >>>>>
> >>>>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
> >>>>>
> >>>>>
> >>>>>
> >>>>> Usage:
> >>>>>
> >>>>> primary:
> >>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
> >>>>> -object
> >>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
> >>>>>
> >>>>> secondary:
> >>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
> >>>>> -object
> >>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
> >>>> Have a quick glance at how secondary mode work. What it does is just
> >>>> forwarding packets between a nic and a socket, qemu socket backend did
> >>>> exact the same job. You could even use socket in primary node and let
> >>>> packet compare module talk to both primary and secondary node.
> >>> If we use qemu socket backend , the same netdev will used by qemu
> >>> socket and
> >>> qemu netfilter. this will against qemu net design. and then, when colo
> >>> do failover,
> >>> secondary do not have backend to use. that's the real problem.
> >> Then, maybe it's time to implement changing the netdev of a nic. The
> >> point here is that what secondary mode did is in fact a netdev backend
> >> instead of a filter ...
> >
> > Currently, you are right. in colo-proxy V2 code, I just compare IP
> > packet to
> > decide whether to do checkpoint.
> > But, in colo-proxy V3 I will compare tcp,icmp,udp packet to decide it.
> > because that can reduce frequency of checkpoint and improve
> > performance. To keep tcp connection well, colo secondary need to record
> > primary guest's init seq and adjust secondary guest's ack. if colo do
> > failover,
> > secondary also need do this to old tcp connection. qemu socket
> > can't do this job.
> 
> So a question here: is it a must to do things (e.g TCP analysis stuffs)
> at secondary? Looks like we could do this at primary node. And I saw
> you're doing packet comparing in primary node, any advantages of doing
> this in primary instead of secondary?

It needs to do this on the secondary; the trick is that things like TCP sequence
numbers are likely to be different on the primary and secondary. The kernel colo-proxy
implementation solved this problem by rewriting the sequence numbers on
the secondary to match the primary; after a failover, the secondary has
to keep doing that rewrite to ensure existing connections are OK.
Thus it's holding some state about the current connections.
I think also, to be able to do a 2nd failover (i.e. recover from the 1st failure
and then sometime later have another) you'd have to sync this
state over to a new host, so again that says the state needs to be part of
qemu or at least easily available to it.

Dave

> > and another problem is do failover, if we use qemu socket
> > to be backend in secondary, when colo do failover, I don't know how to
> > change
> > secondary be a normal qemu, if you know, please tell me.
> 
> Current qemu couldn't do this, but I mean we implement something like
> nic_change_backend which can change nic's peer(s). With this, in
> secondary, we can replace the socket backend with whatever you want (e.g
> tap or other).
> 
> Thanks
> 
> >
> >
> > Thanks for your revew
> > zhangchen 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-04 11:17           ` Zhang Chen
@ 2016-01-06  5:16             ` Jason Wang
  2016-01-18  7:05               ` Zhang Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2016-01-06  5:16 UTC (permalink / raw)
  To: Zhang Chen, qemu devel, Stefan Hajnoczi
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong, Huang peng,
	Dr. David Alan Gilbert, Gong lei, jan.kiszka, Yang Hongyang



On 01/04/2016 07:17 PM, Zhang Chen wrote:
>
>
> On 01/04/2016 05:46 PM, Jason Wang wrote:
>>
>> On 01/04/2016 04:16 PM, Zhang Chen wrote:
>>>
>>> On 01/04/2016 01:37 PM, Jason Wang wrote:
>>>> On 12/31/2015 04:40 PM, Zhang Chen wrote:
>>>>> On 12/31/2015 10:36 AM, Jason Wang wrote:
>>>>>> On 12/22/2015 06:42 PM, Zhang Chen wrote:
>>>>>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>>>>>
>>>>>>> Hi,all
>>>>>>>
>>>>>>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
>>>>>>> based on qemu netfilter and it's a plugin for qemu netfilter. the
>>>>>>> function
>>>>>>> keep Secondary VM connect normal to Primary VM and compare packets
>>>>>>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
>>>>>>> checkpoint and send all primary packet has queued.
>>>>>> Thanks for the work. I don't object this method but still not
>>>>>> convinced
>>>>>> that qemu is the best place for this.
>>>>>>
>>>>>> As been raised in the past discussion, it's almost impossible to
>>>>>> cooperate with vhost backends. If we want this to be used in
>>>>>> production
>>>>>> environment, need to think of a solution for vhost. There's no such
>>>>>> worry if we decouple this from qemu.
>>>>>>
>>>>>>> You can also get the series from:
>>>>>>>
>>>>>>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Usage:
>>>>>>>
>>>>>>> primary:
>>>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>>>>> -object
>>>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
>>>>>>>
>>>>>>> secondary:
>>>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>>>>> -object
>>>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
>>>>>> Have a quick glance at how secondary mode work. What it does is just
>>>>>> forwarding packets between a nic and a socket, qemu socket
>>>>>> backend did
>>>>>> exact the same job. You could even use socket in primary node and
>>>>>> let
>>>>>> packet compare module talk to both primary and secondary node.
>>>>> If we use qemu socket backend , the same netdev will used by qemu
>>>>> socket and
>>>>> qemu netfilter. this will against qemu net design. and then, when
>>>>> colo
>>>>> do failover,
>>>>> secondary do not have backend to use. that's the real problem.
>>>> Then, maybe it's time to implement changing the netdev of a nic. The
>>>> point here is that what secondary mode did is in fact a netdev backend
>>>> instead of a filter ...
>>> Currently, you are right. in colo-proxy V2 code, I just compare IP
>>> packet to
>>> decide whether to do checkpoint.
>>> But, in colo-proxy V3 I will compare tcp,icmp,udp packet to decide it.
>>> because that can reduce frequency of checkpoint and improve
>>> performance. To keep tcp connection well, colo secondary need to record
>>> primary guest's init seq and adjust secondary guest's ack. if colo do
>>> failover,
>>> secondary also need do this to old tcp connection. qemu socket
>>> can't do this job.
>> So a question here: is it a must to do things (e.g TCP analysis stuffs)
>> at secondary? Looks like we could do this at primary node. And I saw
>> you're doing packet comparing in primary node, any advantages of doing
>> this in primary instead of secondary?
>
> We think must  to do this in secondary, because if colo do
> failover,secondary
> must continues do TCP analysis stuffs to before tcp connection(if not,
> tcp connection
> will disconnect in that time), in this time primary already down or
> disconnect to
> secondary.so we can't make primary do this  TCP analysis stuffs.it can
> not ensure
> FT function.
>
> Thanks
> zhangchen

Makes sense.

Thanks

>
>>> and another problem is do failover, if we use qemu socket
>>> to be backend in secondary, when colo do failover, I don't know how to
>>> change
>>> secondary be a normal qemu, if you know, please tell me.
>> Current qemu couldn't do this, but I mean we implement something like
>> nic_change_backend which can change nic's peer(s). With this, in
>> secondary, we can replace the socket backend with whatever you want (e.g
>> tap or other).
>>
>> Thanks
>>
>>>
>>> Thanks for your revew
>>> zhangchen
>>
>>
>> .
>>
>


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-04 16:52           ` Dr. David Alan Gilbert
@ 2016-01-06  5:20             ` Jason Wang
  2016-01-06  9:10               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2016-01-06  5:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Zhang Chen, Li Zhijian, Gui jianfeng, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang



On 01/05/2016 12:52 AM, Dr. David Alan Gilbert wrote:
> * Jason Wang (jasowang@redhat.com) wrote:
>>
>> On 01/04/2016 04:16 PM, Zhang Chen wrote:
>>>
>>> On 01/04/2016 01:37 PM, Jason Wang wrote:
>>>> On 12/31/2015 04:40 PM, Zhang Chen wrote:
>>>>> On 12/31/2015 10:36 AM, Jason Wang wrote:
>>>>>> On 12/22/2015 06:42 PM, Zhang Chen wrote:
>>>>>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>>>>>
>>>>>>> Hi,all
>>>>>>>
>>>>>>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
>>>>>>> based on qemu netfilter and it's a plugin for qemu netfilter. the
>>>>>>> function
>>>>>>> keep Secondary VM connect normal to Primary VM and compare packets
>>>>>>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
>>>>>>> checkpoint and send all primary packet has queued.
>>>>>> Thanks for the work. I don't object this method but still not
>>>>>> convinced
>>>>>> that qemu is the best place for this.
>>>>>>
>>>>>> As been raised in the past discussion, it's almost impossible to
>>>>>> cooperate with vhost backends. If we want this to be used in
>>>>>> production
>>>>>> environment, need to think of a solution for vhost. There's no such
>>>>>> worry if we decouple this from qemu.
>>>>>>
>>>>>>> You can also get the series from:
>>>>>>>
>>>>>>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Usage:
>>>>>>>
>>>>>>> primary:
>>>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>>>>> -object
>>>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
>>>>>>>
>>>>>>> secondary:
>>>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>>>>> -object
>>>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
>>>>>> Have a quick glance at how secondary mode work. What it does is just
>>>>>> forwarding packets between a nic and a socket, qemu socket backend did
>>>>>> exact the same job. You could even use socket in primary node and let
>>>>>> packet compare module talk to both primary and secondary node.
>>>>> If we use qemu socket backend , the same netdev will used by qemu
>>>>> socket and
>>>>> qemu netfilter. this will against qemu net design. and then, when colo
>>>>> do failover,
>>>>> secondary do not have backend to use. that's the real problem.
>>>> Then, maybe it's time to implement changing the netdev of a nic. The
>>>> point here is that what secondary mode did is in fact a netdev backend
>>>> instead of a filter ...
>>> Currently, you are right. in colo-proxy V2 code, I just compare IP
>>> packet to
>>> decide whether to do checkpoint.
>>> But, in colo-proxy V3 I will compare tcp,icmp,udp packet to decide it.
>>> because that can reduce frequency of checkpoint and improve
>>> performance. To keep tcp connection well, colo secondary need to record
>>> primary guest's init seq and adjust secondary guest's ack. if colo do
>>> failover,
>>> secondary also need do this to old tcp connection. qemu socket
>>> can't do this job.
>> So a question here: is it a must to do things (e.g TCP analysis stuffs)
>> at secondary? Looks like we could do this at primary node. And I saw
>> you're doing packet comparing in primary node, any advantages of doing
>> this in primary instead of secondary?
> It needs to do this on the secondary; the trick is that things like TCP sequence
> numbers are likely to be different on the primary and secondary; the kernel colo-proxy
> implementation solved this problem by rewriting the sequence numbers on
> the secondary to match the primary, after a failover, the secondary has
> to keep doing that rewrite to ensure existing connections are OK.
> Thus it's holding some state about the current connections.

I see.

> I think also, to be able to do a 2nd failover (i.e. recover from the 1st failure
> and then sometime later have another) you'd have to sync this
> state over to a new host, so again that says the state needs to be part of
> qemu or at least easily available to it.
>
> Dave

Right, if it does things like tcp seq rewrite (which is missing in the
current version), it works much more like a netfilter. I wonder if the
function is generic enough for users other than colo.

Thanks

>
>>> and another problem is do failover, if we use qemu socket
>>> to be backend in secondary, when colo do failover, I don't know how to
>>> change
>>> secondary be a normal qemu, if you know, please tell me.
>> Current qemu couldn't do this, but I mean we implement something like
>> nic_change_backend which can change nic's peer(s). With this, in
>> secondary, we can replace the socket backend with whatever you want (e.g
>> tap or other).
>>
>> Thanks
>>
>>>
>>> Thanks for your revew
>>> zhangchen 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-06  5:20             ` Jason Wang
@ 2016-01-06  9:10               ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-06  9:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhang Chen, Li Zhijian, Gui jianfeng, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang

* Jason Wang (jasowang@redhat.com) wrote:
> 
> 
> On 01/05/2016 12:52 AM, Dr. David Alan Gilbert wrote:
> > * Jason Wang (jasowang@redhat.com) wrote:
> >>
> >> On 01/04/2016 04:16 PM, Zhang Chen wrote:
> >>>
> >>> On 01/04/2016 01:37 PM, Jason Wang wrote:
> >>>> On 12/31/2015 04:40 PM, Zhang Chen wrote:
> >>>>> On 12/31/2015 10:36 AM, Jason Wang wrote:
> >>>>>> On 12/22/2015 06:42 PM, Zhang Chen wrote:
> >>>>>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> >>>>>>>
> >>>>>>> Hi,all
> >>>>>>>
> >>>>>>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
> >>>>>>> based on qemu netfilter and it's a plugin for qemu netfilter. the
> >>>>>>> function
> >>>>>>> keep Secondary VM connect normal to Primary VM and compare packets
> >>>>>>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
> >>>>>>> checkpoint and send all primary packet has queued.
> >>>>>> Thanks for the work. I don't object this method but still not
> >>>>>> convinced
> >>>>>> that qemu is the best place for this.
> >>>>>>
> >>>>>> As been raised in the past discussion, it's almost impossible to
> >>>>>> cooperate with vhost backends. If we want this to be used in
> >>>>>> production
> >>>>>> environment, need to think of a solution for vhost. There's no such
> >>>>>> worry if we decouple this from qemu.
> >>>>>>
> >>>>>>> You can also get the series from:
> >>>>>>>
> >>>>>>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Usage:
> >>>>>>>
> >>>>>>> primary:
> >>>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
> >>>>>>> -object
> >>>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
> >>>>>>>
> >>>>>>> secondary:
> >>>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
> >>>>>>> -object
> >>>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
> >>>>>> Have a quick glance at how secondary mode work. What it does is just
> >>>>>> forwarding packets between a nic and a socket, qemu socket backend did
> >>>>>> exact the same job. You could even use socket in primary node and let
> >>>>>> packet compare module talk to both primary and secondary node.
> >>>>> If we use qemu socket backend , the same netdev will used by qemu
> >>>>> socket and
> >>>>> qemu netfilter. this will against qemu net design. and then, when colo
> >>>>> do failover,
> >>>>> secondary do not have backend to use. that's the real problem.
> >>>> Then, maybe it's time to implement changing the netdev of a nic. The
> >>>> point here is that what secondary mode did is in fact a netdev backend
> >>>> instead of a filter ...
> >>> Currently, you are right. in colo-proxy V2 code, I just compare IP
> >>> packet to
> >>> decide whether to do checkpoint.
> >>> But, in colo-proxy V3 I will compare tcp,icmp,udp packet to decide it.
> >>> because that can reduce frequency of checkpoint and improve
> >>> performance. To keep tcp connection well, colo secondary need to record
> >>> primary guest's init seq and adjust secondary guest's ack. if colo do
> >>> failover,
> >>> secondary also need do this to old tcp connection. qemu socket
> >>> can't do this job.
> >> So a question here: is it a must to do things (e.g TCP analysis stuffs)
> >> at secondary? Looks like we could do this at primary node. And I saw
> >> you're doing packet comparing in primary node, any advantages of doing
> >> this in primary instead of secondary?
> > It needs to do this on the secondary; the trick is that things like TCP sequence
> > numbers are likely to be different on the primary and secondary; the kernel colo-proxy
> > implementation solved this problem by rewriting the sequence numbers on
> > the secondary to match the primary, after a failover, the secondary has
> > to keep doing that rewrite to ensure existing connections are OK.
> > Thus it's holding some state about the current connections.
> 
> I see.
> 
> > I think also, to be able to do a 2nd failover (i.e. recover from the 1st failure
> > and then sometime later have another) you'd have to sync this
> > state over to a new host, so again that says the state needs to be part of
> > qemu or at least easily available to it.
> >
> > Dave
> 
> Right, if it does thing like tcp seq rewrite (which is missed in current
> version), it works much more like a netfilter. Wonder if the function is
> generic enough for users other than colo.

I can imagine the sequence number rework might be, but I doubt the packet
comparison is.

Dave

> Thanks
> 
> >
> >>> and another problem is do failover, if we use qemu socket
> >>> to be backend in secondary, when colo do failover, I don't know how to
> >>> change
> >>> secondary be a normal qemu, if you know, please tell me.
> >> Current qemu couldn't do this, but I mean we implement something like
> >> nic_change_backend which can change nic's peer(s). With this, in
> >> secondary, we can replace the socket backend with whatever you want (e.g
> >> tap or other).
> >>
> >> Thanks
> >>
> >>>
> >>> Thanks for your revew
> >>> zhangchen 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
                   ` (11 preceding siblings ...)
  2015-12-31  2:36 ` Jason Wang
@ 2016-01-08 11:19 ` Dr. David Alan Gilbert
  2016-01-11  1:30   ` Zhang Chen
  2016-02-29 20:04 ` Dr. David Alan Gilbert
  13 siblings, 1 reply; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-08 11:19 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> 
> Hi,all
> 
> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
> based on qemu netfilter and it's a plugin for qemu netfilter. the function
> keep Secondary VM connect normal to Primary VM and compare packets
> sent by PVM to sent by SVM.if the packet difference,notify COLO do
> checkpoint and send all primary packet has queued.
> 
> You can also get the series from:
> 
> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2

Are you sure that tag is correct? The series of commits doesn't seem to match
up with the set of commits posted.

Dave

> 
> Usage:
> 
> primary:
> -netdev tap,id=bn0 -device e1000,netdev=bn0
> -object colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
> 
> secondary:
> -netdev tap,id=bn0 -device e1000,netdev=bn0
> -object colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port 
> 
> NOTE:
> queue must set "all". See enum NetFilterDirection for detail.
> colo-proxy need queue all packets
> colo-proxy V2 just can compare ip packet
> 
> 
> ## Background
> 
> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
> project is a high availability solution. Both Primary VM (PVM) and Secondary VM
> (SVM) run in parallel. They receive the same request from client, and generate
> responses in parallel too. If the response packets from PVM and SVM are
> identical, they are released immediately. Otherwise, a VM checkpoint (on
> demand)is conducted.
> 
> Paper:
> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
> 
> COLO on Xen:
> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> 
> COLO on Qemu/KVM:
> http://wiki.qemu.org/Features/COLO
> 
> By the needs of capturing response packets from PVM and SVM and finding out
> whether they are identical, we introduce a new module to qemu networking
> called colo-proxy.
> 
> V2:
>   rebase colo-proxy with qemu-colo-v2.2-periodic-mode
>   fix dave's comments
>   fix wency's comments
>   fix zhanghailiang's comments
> 
> v1:
>   initial patch.
> 
> 
> 
> zhangchen (10):
>   Init colo-proxy object based on netfilter
>   Jhash: add linux kernel jhashtable in qemu
>   Colo-proxy: add colo-proxy framework
>   Colo-proxy: add data structure and jhash func
>   net/colo-proxy: Add colo interface to use proxy
>   net/colo-proxy: add socket used by forward func
>   net/colo-proxy: Add packet enqueue & handle func
>   net/colo-proxy: Handle packet and connection
>   net/colo-proxy: Compare pri pkt to sec pkt
>   net/colo-proxy: Colo-proxy do checkpoint and clear
> 
>  include/qemu/jhash.h |  61 ++++
>  net/Makefile.objs    |   1 +
>  net/colo-proxy.c     | 939 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  net/colo-proxy.h     |  24 ++
>  qemu-options.hx      |   6 +
>  trace-events         |   8 +
>  vl.c                 |   3 +-
>  7 files changed, 1041 insertions(+), 1 deletion(-)
>  create mode 100644 include/qemu/jhash.h
>  create mode 100644 net/colo-proxy.c
>  create mode 100644 net/colo-proxy.h
> 
> -- 
> 1.9.1
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 02/10] Jhash: add linux kernel jhashtable in qemu
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 02/10] Jhash: add linux kernel jhashtable in qemu Zhang Chen
@ 2016-01-08 12:08   ` Dr. David Alan Gilbert
  2016-01-11  1:49     ` Zhang Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-08 12:08 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> 
> Jhash used by colo-proxy to save and lookup
> net connection info
> 
> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  include/qemu/jhash.h | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
>  create mode 100644 include/qemu/jhash.h
> 
> diff --git a/include/qemu/jhash.h b/include/qemu/jhash.h
> new file mode 100644
> index 0000000..5b82d02
> --- /dev/null
> +++ b/include/qemu/jhash.h
> @@ -0,0 +1,61 @@
> +/* jhash.h: Jenkins hash support.
> +  *
> +  * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net)
> +  *
> +  * http://burtleburtle.net/bob/hash/
> +  *
> +  * These are the credits from Bob's sources:
> +  *
> +  * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
> +  *
> +  * These are functions for producing 32-bit hashes for hash table lookup.
> +  * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
> +  * are externally useful functions.  Routines to test the hash are
> +included
> +  * if SELF_TEST is defined.  You can use this free for any purpose.
> +It's in
> +  * the public domain.  It has no warranty.
> +  *
> +  * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu)
> +  *
> +  * I've modified Bob's hash to be useful in the Linux kernel, and
> +  * any bugs present are my fault.
> +  * Jozsef
> +  */
> +
> +#ifndef QEMU_JHASH_H__
> +#define QEMU_JHASH_H__
> +
> +#include "qemu/bitopt.h"

That does not build, the header in qemu is bitop*s*.h.

Dave

> +
> +/*
> + * hashtable relation copy from linux kernel jhash
> + */
> +
> +/* __jhash_mix -- mix 3 32-bit values reversibly. */
> +#define __jhash_mix(a, b, c)                \
> +{                                           \
> +    a -= c;  a ^= rol32(c, 4);  c += b;     \
> +    b -= a;  b ^= rol32(a, 6);  a += c;     \
> +    c -= b;  c ^= rol32(b, 8);  b += a;     \
> +    a -= c;  a ^= rol32(c, 16); c += b;     \
> +    b -= a;  b ^= rol32(a, 19); a += c;     \
> +    c -= b;  c ^= rol32(b, 4);  b += a;     \
> +}
> +
> +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
> +#define __jhash_final(a, b, c)  \
> +{                               \
> +    c ^= b; c -= rol32(b, 14);  \
> +    a ^= c; a -= rol32(c, 11);  \
> +    b ^= a; b -= rol32(a, 25);  \
> +    c ^= b; c -= rol32(b, 16);  \
> +    a ^= c; a -= rol32(c, 4);   \
> +    b ^= a; b -= rol32(a, 14);  \
> +    c ^= b; c -= rol32(b, 24);  \
> +}
> +
> +/* An arbitrary initial parameter */
> +#define JHASH_INITVAL           0xdeadbeef
> +
> +#endif /* QEMU_JHASH_H__ */
> -- 
> 1.9.1
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-08 11:19 ` Dr. David Alan Gilbert
@ 2016-01-11  1:30   ` Zhang Chen
  2016-01-11 12:59     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2016-01-11  1:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang



On 01/08/2016 07:19 PM, Dr. David Alan Gilbert wrote:
> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>
>> Hi,all
>>
>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
>> based on qemu netfilter and it's a plugin for qemu netfilter. the function
>> keep Secondary VM connect normal to Primary VM and compare packets
>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
>> checkpoint and send all primary packet has queued.
>>
>> You can also get the series from:
>>
>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
> Are you sure that tag is correct? The series of commits doesn't seem to match
> up with the set of commits posted.
>
> Dave

Yes, it is. The tree also has some code fixes in other files that are
unrelated to COLO. In the email I only sent the COLO-related part.

Thanks for the review
zhangchen

>> Usage:
>>
>> primary:
>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>> -object colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
>>
>> secondary:
>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>> -object colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
>>
>> NOTE:
>> queue must set "all". See enum NetFilterDirection for detail.
>> colo-proxy need queue all packets
>> colo-proxy V2 just can compare ip packet
>>
>>
>> ## Background
>>
>> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
>> project is a high availability solution. Both Primary VM (PVM) and Secondary VM
>> (SVM) run in parallel. They receive the same request from client, and generate
>> responses in parallel too. If the response packets from PVM and SVM are
>> identical, they are released immediately. Otherwise, a VM checkpoint (on
>> demand)is conducted.
>>
>> Paper:
>> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
>>
>> COLO on Xen:
>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>>
>> COLO on Qemu/KVM:
>> http://wiki.qemu.org/Features/COLO
>>
>> By the needs of capturing response packets from PVM and SVM and finding out
>> whether they are identical, we introduce a new module to qemu networking
>> called colo-proxy.
>>
>> V2:
>>    rebase colo-proxy with qemu-colo-v2.2-periodic-mode
>>    fix dave's comments
>>    fix wency's comments
>>    fix zhanghailiang's comments
>>
>> v1:
>>    initial patch.
>>
>>
>>
>> zhangchen (10):
>>    Init colo-proxy object based on netfilter
>>    Jhash: add linux kernel jhashtable in qemu
>>    Colo-proxy: add colo-proxy framework
>>    Colo-proxy: add data structure and jhash func
>>    net/colo-proxy: Add colo interface to use proxy
>>    net/colo-proxy: add socket used by forward func
>>    net/colo-proxy: Add packet enqueue & handle func
>>    net/colo-proxy: Handle packet and connection
>>    net/colo-proxy: Compare pri pkt to sec pkt
>>    net/colo-proxy: Colo-proxy do checkpoint and clear
>>
>>   include/qemu/jhash.h |  61 ++++
>>   net/Makefile.objs    |   1 +
>>   net/colo-proxy.c     | 939 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   net/colo-proxy.h     |  24 ++
>>   qemu-options.hx      |   6 +
>>   trace-events         |   8 +
>>   vl.c                 |   3 +-
>>   7 files changed, 1041 insertions(+), 1 deletion(-)
>>   create mode 100644 include/qemu/jhash.h
>>   create mode 100644 net/colo-proxy.c
>>   create mode 100644 net/colo-proxy.h
>>
>> -- 
>> 1.9.1
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 02/10] Jhash: add linux kernel jhashtable in qemu
  2016-01-08 12:08   ` Dr. David Alan Gilbert
@ 2016-01-11  1:49     ` Zhang Chen
  2016-01-11 12:50       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2016-01-11  1:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang



On 01/08/2016 08:08 PM, Dr. David Alan Gilbert wrote:
> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>
>> Jhash used by colo-proxy to save and lookup
>> net connection info
>>
>> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   include/qemu/jhash.h | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 61 insertions(+)
>>   create mode 100644 include/qemu/jhash.h
>>
>> diff --git a/include/qemu/jhash.h b/include/qemu/jhash.h
>> new file mode 100644
>> index 0000000..5b82d02
>> --- /dev/null
>> +++ b/include/qemu/jhash.h
>> @@ -0,0 +1,61 @@
>> +/* jhash.h: Jenkins hash support.
>> +  *
>> +  * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net)
>> +  *
>> +  * http://burtleburtle.net/bob/hash/
>> +  *
>> +  * These are the credits from Bob's sources:
>> +  *
>> +  * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
>> +  *
>> +  * These are functions for producing 32-bit hashes for hash table lookup.
>> +  * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
>> +  * are externally useful functions.  Routines to test the hash are
>> +included
>> +  * if SELF_TEST is defined.  You can use this free for any purpose.
>> +It's in
>> +  * the public domain.  It has no warranty.
>> +  *
>> +  * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu)
>> +  *
>> +  * I've modified Bob's hash to be useful in the Linux kernel, and
>> +  * any bugs present are my fault.
>> +  * Jozsef
>> +  */
>> +
>> +#ifndef QEMU_JHASH_H__
>> +#define QEMU_JHASH_H__
>> +
>> +#include "qemu/bitopt.h"
> That does not build, the header in qemu is bitop*s*.h.
>
> Dave

I'm very sorry about that. I will fix it to:

#include "qemu/bitopts.h"

Thanks
zhangchen


>> +
>> +/*
>> + * hashtable relation copy from linux kernel jhash
>> + */
>> +
>> +/* __jhash_mix -- mix 3 32-bit values reversibly. */
>> +#define __jhash_mix(a, b, c)                \
>> +{                                           \
>> +    a -= c;  a ^= rol32(c, 4);  c += b;     \
>> +    b -= a;  b ^= rol32(a, 6);  a += c;     \
>> +    c -= b;  c ^= rol32(b, 8);  b += a;     \
>> +    a -= c;  a ^= rol32(c, 16); c += b;     \
>> +    b -= a;  b ^= rol32(a, 19); a += c;     \
>> +    c -= b;  c ^= rol32(b, 4);  b += a;     \
>> +}
>> +
>> +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
>> +#define __jhash_final(a, b, c)  \
>> +{                               \
>> +    c ^= b; c -= rol32(b, 14);  \
>> +    a ^= c; a -= rol32(c, 11);  \
>> +    b ^= a; b -= rol32(a, 25);  \
>> +    c ^= b; c -= rol32(b, 16);  \
>> +    a ^= c; a -= rol32(c, 4);   \
>> +    b ^= a; b -= rol32(a, 14);  \
>> +    c ^= b; c -= rol32(b, 24);  \
>> +}
>> +
>> +/* An arbitrary initial parameter */
>> +#define JHASH_INITVAL           0xdeadbeef
>> +
>> +#endif /* QEMU_JHASH_H__ */
>> -- 
>> 1.9.1
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 02/10] Jhash: add linux kernel jhashtable in qemu
  2016-01-11  1:49     ` Zhang Chen
@ 2016-01-11 12:50       ` Dr. David Alan Gilbert
  2016-01-12  1:58         ` Zhang Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-11 12:50 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> 
> 
> On 01/08/2016 08:08 PM, Dr. David Alan Gilbert wrote:
> >* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> >>From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> >>
> >>Jhash used by colo-proxy to save and lookup
> >>net connection info
> >>
> >>Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> >>Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >>---
> >>  include/qemu/jhash.h | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 61 insertions(+)
> >>  create mode 100644 include/qemu/jhash.h
> >>
> >>diff --git a/include/qemu/jhash.h b/include/qemu/jhash.h
> >>new file mode 100644
> >>index 0000000..5b82d02
> >>--- /dev/null
> >>+++ b/include/qemu/jhash.h
> >>@@ -0,0 +1,61 @@
> >>+/* jhash.h: Jenkins hash support.
> >>+  *
> >>+  * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net)
> >>+  *
> >>+  * http://burtleburtle.net/bob/hash/
> >>+  *
> >>+  * These are the credits from Bob's sources:
> >>+  *
> >>+  * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
> >>+  *
> >>+  * These are functions for producing 32-bit hashes for hash table lookup.
> >>+  * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
> >>+  * are externally useful functions.  Routines to test the hash are
> >>+included
> >>+  * if SELF_TEST is defined.  You can use this free for any purpose.
> >>+It's in
> >>+  * the public domain.  It has no warranty.
> >>+  *
> >>+  * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu)
> >>+  *
> >>+  * I've modified Bob's hash to be useful in the Linux kernel, and
> >>+  * any bugs present are my fault.
> >>+  * Jozsef
> >>+  */
> >>+
> >>+#ifndef QEMU_JHASH_H__
> >>+#define QEMU_JHASH_H__
> >>+
> >>+#include "qemu/bitopt.h"
> >That does not build, the header in qemu is bitop*s*.h.
> >
> >Dave
> 
> I'm very sorry for it, fix it to
> 
> #include "qemu/bitopts.h"

No! It's:

#include "qemu/bitops.h"

Please at least build test this code!

Dave

> 
> Thanks
> zhangchen
> 
> 
> >>+
> >>+/*
> >>+ * hashtable relation copy from linux kernel jhash
> >>+ */
> >>+
> >>+/* __jhash_mix -- mix 3 32-bit values reversibly. */
> >>+#define __jhash_mix(a, b, c)                \
> >>+{                                           \
> >>+    a -= c;  a ^= rol32(c, 4);  c += b;     \
> >>+    b -= a;  b ^= rol32(a, 6);  a += c;     \
> >>+    c -= b;  c ^= rol32(b, 8);  b += a;     \
> >>+    a -= c;  a ^= rol32(c, 16); c += b;     \
> >>+    b -= a;  b ^= rol32(a, 19); a += c;     \
> >>+    c -= b;  c ^= rol32(b, 4);  b += a;     \
> >>+}
> >>+
> >>+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
> >>+#define __jhash_final(a, b, c)  \
> >>+{                               \
> >>+    c ^= b; c -= rol32(b, 14);  \
> >>+    a ^= c; a -= rol32(c, 11);  \
> >>+    b ^= a; b -= rol32(a, 25);  \
> >>+    c ^= b; c -= rol32(b, 16);  \
> >>+    a ^= c; a -= rol32(c, 4);   \
> >>+    b ^= a; b -= rol32(a, 14);  \
> >>+    c ^= b; c -= rol32(b, 24);  \
> >>+}
> >>+
> >>+/* An arbitrary initial parameter */
> >>+#define JHASH_INITVAL           0xdeadbeef
> >>+
> >>+#endif /* QEMU_JHASH_H__ */
> >>-- 
> >>1.9.1
> >>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >
> >.
> >
> 
> -- 
> Thanks
> zhangchen
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-11  1:30   ` Zhang Chen
@ 2016-01-11 12:59     ` Dr. David Alan Gilbert
  2016-01-12  7:32       ` Zhang Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-11 12:59 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> 
> 
> On 01/08/2016 07:19 PM, Dr. David Alan Gilbert wrote:
> >* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> >>From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> >>
> >>Hi,all
> >>
> >>This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
> >>based on qemu netfilter and it's a plugin for qemu netfilter. the function
> >>keep Secondary VM connect normal to Primary VM and compare packets
> >>sent by PVM to sent by SVM.if the packet difference,notify COLO do
> >>checkpoint and send all primary packet has queued.
> >>
> >>You can also get the series from:
> >>
> >>https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
> >Are you sure that tag is correct? The series of commits doesn't seem to match
> >up with the set of commits posted.
> >
> >Dave
> 
> Yes, it is. we have some code fix in other colo unrelated file.
> in email, I just send colo related part.

That doesn't seem to be what's happening in that git tree.
For example, your patch '[RFC PATCH v2 01/10] Init colo-proxy object based on netfilter'
adds the colo-proxy object to qemu-options.hx, but in that git tree
it comes from Li Zhijian's 'add proxy prototype' patch.
If you're going to include a git link with a series then please
make sure it contains exactly the patches posted.
It's OK to add some more patches somewhere, e.g. on a different tag or
branch, but make sure the one that you post for the series matches
the series posted.

Dave

> 
> zhangchen
> Thanks for review
> 
> >>Usage:
> >>
> >>primary:
> >>-netdev tap,id=bn0 -device e1000,netdev=bn0
> >>-object colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
> >>
> >>secondary:
> >>-netdev tap,id=bn0 -device e1000,netdev=bn0
> >>-object colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
> >>
> >>NOTE:
> >>queue must set "all". See enum NetFilterDirection for detail.
> >>colo-proxy need queue all packets
> >>colo-proxy V2 just can compare ip packet
> >>
> >>
> >>## Background
> >>
> >>COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
> >>project is a high availability solution. Both Primary VM (PVM) and Secondary VM
> >>(SVM) run in parallel. They receive the same request from client, and generate
> >>responses in parallel too. If the response packets from PVM and SVM are
> >>identical, they are released immediately. Otherwise, a VM checkpoint (on
> >>demand)is conducted.
> >>
> >>Paper:
> >>http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
> >>
> >>COLO on Xen:
> >>http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> >>
> >>COLO on Qemu/KVM:
> >>http://wiki.qemu.org/Features/COLO
> >>
> >>By the needs of capturing response packets from PVM and SVM and finding out
> >>whether they are identical, we introduce a new module to qemu networking
> >>called colo-proxy.
> >>
> >>V2:
> >>   rebase colo-proxy with qemu-colo-v2.2-periodic-mode
> >>   fix dave's comments
> >>   fix wency's comments
> >>   fix zhanghailiang's comments
> >>
> >>v1:
> >>   initial patch.
> >>
> >>
> >>
> >>zhangchen (10):
> >>   Init colo-proxy object based on netfilter
> >>   Jhash: add linux kernel jhashtable in qemu
> >>   Colo-proxy: add colo-proxy framework
> >>   Colo-proxy: add data structure and jhash func
> >>   net/colo-proxy: Add colo interface to use proxy
> >>   net/colo-proxy: add socket used by forward func
> >>   net/colo-proxy: Add packet enqueue & handle func
> >>   net/colo-proxy: Handle packet and connection
> >>   net/colo-proxy: Compare pri pkt to sec pkt
> >>   net/colo-proxy: Colo-proxy do checkpoint and clear
> >>
> >>  include/qemu/jhash.h |  61 ++++
> >>  net/Makefile.objs    |   1 +
> >>  net/colo-proxy.c     | 939 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  net/colo-proxy.h     |  24 ++
> >>  qemu-options.hx      |   6 +
> >>  trace-events         |   8 +
> >>  vl.c                 |   3 +-
> >>  7 files changed, 1041 insertions(+), 1 deletion(-)
> >>  create mode 100644 include/qemu/jhash.h
> >>  create mode 100644 net/colo-proxy.c
> >>  create mode 100644 net/colo-proxy.h
> >>
> >>-- 
> >>1.9.1
> >>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >
> >.
> >
> 
> -- 
> Thanks
> zhangchen
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 02/10] Jhash: add linux kernel jhashtable in qemu
  2016-01-11 12:50       ` Dr. David Alan Gilbert
@ 2016-01-12  1:58         ` Zhang Chen
  2016-01-12  8:58           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2016-01-12  1:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang



On 01/11/2016 08:50 PM, Dr. David Alan Gilbert wrote:
> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>>
>> On 01/08/2016 08:08 PM, Dr. David Alan Gilbert wrote:
>>> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>>
>>>> Jhash used by colo-proxy to save and lookup
>>>> net connection info
>>>>
>>>> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> ---
>>>>   include/qemu/jhash.h | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>   1 file changed, 61 insertions(+)
>>>>   create mode 100644 include/qemu/jhash.h
>>>>
>>>> diff --git a/include/qemu/jhash.h b/include/qemu/jhash.h
>>>> new file mode 100644
>>>> index 0000000..5b82d02
>>>> --- /dev/null
>>>> +++ b/include/qemu/jhash.h
>>>> @@ -0,0 +1,61 @@
>>>> +/* jhash.h: Jenkins hash support.
>>>> +  *
>>>> +  * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net)
>>>> +  *
>>>> +  * http://burtleburtle.net/bob/hash/
>>>> +  *
>>>> +  * These are the credits from Bob's sources:
>>>> +  *
>>>> +  * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
>>>> +  *
>>>> +  * These are functions for producing 32-bit hashes for hash table lookup.
>>>> +  * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
>>>> +  * are externally useful functions.  Routines to test the hash are
>>>> +included
>>>> +  * if SELF_TEST is defined.  You can use this free for any purpose.
>>>> +It's in
>>>> +  * the public domain.  It has no warranty.
>>>> +  *
>>>> +  * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu)
>>>> +  *
>>>> +  * I've modified Bob's hash to be useful in the Linux kernel, and
>>>> +  * any bugs present are my fault.
>>>> +  * Jozsef
>>>> +  */
>>>> +
>>>> +#ifndef QEMU_JHASH_H__
>>>> +#define QEMU_JHASH_H__
>>>> +
>>>> +#include "qemu/bitopt.h"
>>> That does not build, the header in qemu is bitop*s*.h.
>>>
>>> Dave
>> I'm very sorry for it, fix it to
>>
>> #include "qemu/bitopts.h"
> No! It's:
>
> #include "qemu/bitops.h"
>
> Please at least build test this code!
>
> Dave

I will fix it to #include "qemu/bitops.h".
I did rebuild this code, but the qemu makefile didn't check the .h file.
I don't know whether that is a qemu bug.
You can try changing it to #include "qemu/bitops.h" and running make,
then changing it to #include "qemu/bitopts.h" and running make again.
Repeat that twice; you can even change it to #include "everything"
in jhash.h. gcc doesn't check the .h file and doesn't report an error.


Thanks
zhangchen


>> Thanks
>> zhangchen
>>
>>
>>>> +
>>>> +/*
>>>> + * hashtable relation copy from linux kernel jhash
>>>> + */
>>>> +
>>>> +/* __jhash_mix -- mix 3 32-bit values reversibly. */
>>>> +#define __jhash_mix(a, b, c)                \
>>>> +{                                           \
>>>> +    a -= c;  a ^= rol32(c, 4);  c += b;     \
>>>> +    b -= a;  b ^= rol32(a, 6);  a += c;     \
>>>> +    c -= b;  c ^= rol32(b, 8);  b += a;     \
>>>> +    a -= c;  a ^= rol32(c, 16); c += b;     \
>>>> +    b -= a;  b ^= rol32(a, 19); a += c;     \
>>>> +    c -= b;  c ^= rol32(b, 4);  b += a;     \
>>>> +}
>>>> +
>>>> +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
>>>> +#define __jhash_final(a, b, c)  \
>>>> +{                               \
>>>> +    c ^= b; c -= rol32(b, 14);  \
>>>> +    a ^= c; a -= rol32(c, 11);  \
>>>> +    b ^= a; b -= rol32(a, 25);  \
>>>> +    c ^= b; c -= rol32(b, 16);  \
>>>> +    a ^= c; a -= rol32(c, 4);   \
>>>> +    b ^= a; b -= rol32(a, 14);  \
>>>> +    c ^= b; c -= rol32(b, 24);  \
>>>> +}
>>>> +
>>>> +/* An arbitrary initial parameter */
>>>> +#define JHASH_INITVAL           0xdeadbeef
>>>> +
>>>> +#endif /* QEMU_JHASH_H__ */
>>>> -- 
>>>> 1.9.1
>>>>
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>>
>>> .
>>>
>> -- 
>> Thanks
>> zhangchen
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-11 12:59     ` Dr. David Alan Gilbert
@ 2016-01-12  7:32       ` Zhang Chen
  0 siblings, 0 replies; 75+ messages in thread
From: Zhang Chen @ 2016-01-12  7:32 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang



On 01/11/2016 08:59 PM, Dr. David Alan Gilbert wrote:
> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>>
>> On 01/08/2016 07:19 PM, Dr. David Alan Gilbert wrote:
>>> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>>
>>>> Hi,all
>>>>
>>>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
>>>> based on qemu netfilter and it's a plugin for qemu netfilter. the function
>>>> keep Secondary VM connect normal to Primary VM and compare packets
>>>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
>>>> checkpoint and send all primary packet has queued.
>>>>
>>>> You can also get the series from:
>>>>
>>>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
>>> Are you sure that tag is correct? The series of commits doesn't seem to match
>>> up with the set of commits posted.
>>>
>>> Dave
>> Yes, it is. we have some code fix in other colo unrelated file.
>> in email, I just send colo related part.
> That doesn't seem to be what's happening in that git tree.
> For example, your patch '[RFC PATCH v2 01/10] Init colo-proxy object based on netfilter'
> adds the colo-proxy object to qemu-options.hx, but in that git tree
> it comes from Li Zhijian's 'add proxy prototype' patch.
> If you're going to include a git link with a series then please
> make sure it contains exactly the patches posted.
> It's OK to add some more patches somewhere, e.g. on a different tag or
> branch, but make sure the one that you post for the series matches
> the series posted.
>
> Dave
>

Makes sense.
I have fixed it.

https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2

Thanks
zhangchen

>> zhangchen
>> Thanks for review
>>
>>>> Usage:
>>>>
>>>> primary:
>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>> -object colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
>>>>
>>>> secondary:
>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>> -object colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
>>>>
>>>> NOTE:
>>>> queue must set "all". See enum NetFilterDirection for detail.
>>>> colo-proxy need queue all packets
>>>> colo-proxy V2 just can compare ip packet
>>>>
>>>>
>>>> ## Background
>>>>
>>>> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
>>>> project is a high availability solution. Both Primary VM (PVM) and Secondary VM
>>>> (SVM) run in parallel. They receive the same request from client, and generate
>>>> responses in parallel too. If the response packets from PVM and SVM are
>>>> identical, they are released immediately. Otherwise, a VM checkpoint (on
>>>> demand)is conducted.
>>>>
>>>> Paper:
>>>> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
>>>>
>>>> COLO on Xen:
>>>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>>>>
>>>> COLO on Qemu/KVM:
>>>> http://wiki.qemu.org/Features/COLO
>>>>
>>>> By the needs of capturing response packets from PVM and SVM and finding out
>>>> whether they are identical, we introduce a new module to qemu networking
>>>> called colo-proxy.
>>>>
>>>> V2:
>>>>    rebase colo-proxy with qemu-colo-v2.2-periodic-mode
>>>>    fix dave's comments
>>>>    fix wency's comments
>>>>    fix zhanghailiang's comments
>>>>
>>>> v1:
>>>>    initial patch.
>>>>
>>>>
>>>>
>>>> zhangchen (10):
>>>>    Init colo-proxy object based on netfilter
>>>>    Jhash: add linux kernel jhashtable in qemu
>>>>    Colo-proxy: add colo-proxy framework
>>>>    Colo-proxy: add data structure and jhash func
>>>>    net/colo-proxy: Add colo interface to use proxy
>>>>    net/colo-proxy: add socket used by forward func
>>>>    net/colo-proxy: Add packet enqueue & handle func
>>>>    net/colo-proxy: Handle packet and connection
>>>>    net/colo-proxy: Compare pri pkt to sec pkt
>>>>    net/colo-proxy: Colo-proxy do checkpoint and clear
>>>>
>>>>   include/qemu/jhash.h |  61 ++++
>>>>   net/Makefile.objs    |   1 +
>>>>   net/colo-proxy.c     | 939 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>   net/colo-proxy.h     |  24 ++
>>>>   qemu-options.hx      |   6 +
>>>>   trace-events         |   8 +
>>>>   vl.c                 |   3 +-
>>>>   7 files changed, 1041 insertions(+), 1 deletion(-)
>>>>   create mode 100644 include/qemu/jhash.h
>>>>   create mode 100644 net/colo-proxy.c
>>>>   create mode 100644 net/colo-proxy.h
>>>>
>>>> -- 
>>>> 1.9.1
>>>>
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>>
>>> .
>>>
>> -- 
>> Thanks
>> zhangchen
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
>
> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 02/10] Jhash: add linux kernel jhashtable in qemu
  2016-01-12  1:58         ` Zhang Chen
@ 2016-01-12  8:58           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-12  8:58 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> 
> 
> On 01/11/2016 08:50 PM, Dr. David Alan Gilbert wrote:
> >* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> >>
> >>On 01/08/2016 08:08 PM, Dr. David Alan Gilbert wrote:
> >>>* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> >>>>From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> >>>>
> >>>>Jhash used by colo-proxy to save and lookup
> >>>>net connection info
> >>>>
> >>>>Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> >>>>Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >>>>---
> >>>>  include/qemu/jhash.h | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 61 insertions(+)
> >>>>  create mode 100644 include/qemu/jhash.h
> >>>>
> >>>>diff --git a/include/qemu/jhash.h b/include/qemu/jhash.h
> >>>>new file mode 100644
> >>>>index 0000000..5b82d02
> >>>>--- /dev/null
> >>>>+++ b/include/qemu/jhash.h
> >>>>@@ -0,0 +1,61 @@
> >>>>+/* jhash.h: Jenkins hash support.
> >>>>+  *
> >>>>+  * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net)
> >>>>+  *
> >>>>+  * http://burtleburtle.net/bob/hash/
> >>>>+  *
> >>>>+  * These are the credits from Bob's sources:
> >>>>+  *
> >>>>+  * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
> >>>>+  *
> >>>>+  * These are functions for producing 32-bit hashes for hash table lookup.
> >>>>+  * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
> >>>>+  * are externally useful functions.  Routines to test the hash are
> >>>>+included
> >>>>+  * if SELF_TEST is defined.  You can use this free for any purpose.
> >>>>+It's in
> >>>>+  * the public domain.  It has no warranty.
> >>>>+  *
> >>>>+  * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu)
> >>>>+  *
> >>>>+  * I've modified Bob's hash to be useful in the Linux kernel, and
> >>>>+  * any bugs present are my fault.
> >>>>+  * Jozsef
> >>>>+  */
> >>>>+
> >>>>+#ifndef QEMU_JHASH_H__
> >>>>+#define QEMU_JHASH_H__
> >>>>+
> >>>>+#include "qemu/bitopt.h"
> >>>That does not build, the header in qemu is bitop*s*.h.
> >>>
> >>>Dave
> >>I'm very sorry for it, fix it to
> >>
> >>#include "qemu/bitopts.h"
> >No! It's:
> >
> >#include "qemu/bitops.h"
> >
> >Please at least build test this code!
> >
> >Dave
> 
> Fixed: it is now #include "qemu/bitops.h".
> I did rebuild this code, but the qemu makefile didn't check the .h file.
> I don't know whether that is a qemu bug.
> You can try changing it to #include "qemu/bitops.h" and running make,
> then changing it to #include "qemu/bitopts.h" and running make again.
> Repeat that twice; you can even change it to #include "everything"
> in jhash.h. gcc doesn't check the .h file or report an error.

gcc and the makefile don't check headers on their own; a header is only
compiled once some .c file includes it. The build will break when the
next patch in your series adds #include "qemu/jhash.h" to colo-proxy.c.
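
As a rough illustration of why the include matters: the macros in
jhash.h expand to calls to rol32(), which comes from qemu/bitops.h, so
the first .c file that includes jhash.h needs that header to resolve.
The sketch below is standalone and only an assumption about how the
macros might be used: rol32() here is a local stand-in, and
JHASH_FINAL_SKETCH and hash_3words_sketch are hypothetical names that
are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for rol32() from qemu/bitops.h: rotate left by 1..31 bits. */
static inline uint32_t rol32(uint32_t word, unsigned int shift)
{
    assert(shift > 0 && shift < 32);
    return (word << shift) | (word >> (32 - shift));
}

/* Same arithmetic as __jhash_final in the patch; renamed (hypothetical
 * name) to avoid the reserved double-underscore prefix. */
#define JHASH_FINAL_SKETCH(a, b, c) \
{                                   \
    c ^= b; c -= rol32(b, 14);      \
    a ^= c; a -= rol32(c, 11);      \
    b ^= a; b -= rol32(a, 25);      \
    c ^= b; c -= rol32(b, 16);      \
    a ^= c; a -= rol32(c, 4);       \
    b ^= a; b -= rol32(a, 14);      \
    c ^= b; c -= rol32(b, 24);      \
}

/* An arbitrary initial parameter, as in the patch. */
#define JHASH_INITVAL 0xdeadbeef

/* Hypothetical helper: hash three 32-bit words (for example source
 * address, destination address and a port pair) into one 32-bit value. */
static uint32_t hash_3words_sketch(uint32_t a, uint32_t b, uint32_t c,
                                   uint32_t initval)
{
    a += JHASH_INITVAL;
    b += JHASH_INITVAL;
    c += initval;
    JHASH_FINAL_SKETCH(a, b, c);
    return c;
}
```

A connection table could then pick a bucket with something like
hash_3words_sketch(src, dst, ports, 0) % table_size.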

Dave

> 
> 
> Thanks
> zhangchen
> 
> 
> >>Thanks
> >>zhangchen
> >>
> >>
> >>>>+
> >>>>+/*
> >>>>+ * hashtable relation copy from linux kernel jhash
> >>>>+ */
> >>>>+
> >>>>+/* __jhash_mix -- mix 3 32-bit values reversibly. */
> >>>>+#define __jhash_mix(a, b, c)                \
> >>>>+{                                           \
> >>>>+    a -= c;  a ^= rol32(c, 4);  c += b;     \
> >>>>+    b -= a;  b ^= rol32(a, 6);  a += c;     \
> >>>>+    c -= b;  c ^= rol32(b, 8);  b += a;     \
> >>>>+    a -= c;  a ^= rol32(c, 16); c += b;     \
> >>>>+    b -= a;  b ^= rol32(a, 19); a += c;     \
> >>>>+    c -= b;  c ^= rol32(b, 4);  b += a;     \
> >>>>+}
> >>>>+
> >>>>+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
> >>>>+#define __jhash_final(a, b, c)  \
> >>>>+{                               \
> >>>>+    c ^= b; c -= rol32(b, 14);  \
> >>>>+    a ^= c; a -= rol32(c, 11);  \
> >>>>+    b ^= a; b -= rol32(a, 25);  \
> >>>>+    c ^= b; c -= rol32(b, 16);  \
> >>>>+    a ^= c; a -= rol32(c, 4);   \
> >>>>+    b ^= a; b -= rol32(a, 14);  \
> >>>>+    c ^= b; c -= rol32(b, 24);  \
> >>>>+}
> >>>>+
> >>>>+/* An arbitrary initial parameter */
> >>>>+#define JHASH_INITVAL           0xdeadbeef
> >>>>+
> >>>>+#endif /* QEMU_JHASH_H__ */
> >>>>-- 
> >>>>1.9.1
> >>>>
> >>>>
> >>>>
> >>>--
> >>>Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>
> >>>
> >>>.
> >>>
> >>-- 
> >>Thanks
> >>zhangchen
> >>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >
> >.
> >
> 
> -- 
> Thanks
> zhangchen
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 01/10] Init colo-proxy object based on netfilter
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 01/10] Init colo-proxy object " Zhang Chen
@ 2016-01-15 18:21   ` Dr. David Alan Gilbert
  2016-01-18  7:08     ` Zhang Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-15 18:21 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> 
> add colo-proxy to vl.c and qemu-options.hx
> add trace-colo-proxy relation
> 
> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  qemu-options.hx | 6 ++++++
>  trace-events    | 8 ++++++++
>  vl.c            | 3 ++-
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 0eea4ee..6daa3f0 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3670,6 +3670,12 @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
>  @option{tx}: the filter is attached to the transmit queue of the netdev,
>               where it will receive packets sent by the netdev.
>  
> +@item -object colo-proxy,id=@var{id},netdev=@var{netdevid},addr=@var{host:port},mode=@var{primary|secondary}[,queue=@var{all}]
> +
> +Colo-proxy on netdev @var{netdevid},set colo mode @var{primary|secondary}
> +connect other colo through addr@var{host:port},and colo needs queue all
> +packet arriving in queue=@var{all}
> +
>  @item -object filter-dump,id=@var{id},netdev=@var{dev},file=@var{filename}][,maxlen=@var{len}]
>  
>  Dump the network traffic on netdev @var{dev} to the file specified by
> diff --git a/trace-events b/trace-events
> index 5f95b3c..a957fb3 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1586,6 +1586,14 @@ colo_failover_set_state(int new_state) "new state %d"
>  colo_start_block_replication(void) "Block replication is started"
>  colo_stop_block_replication(const char *reason) "Block replication is stopped(reason: '%s')"
>  
> +# net/colo-proxy.c
> +colo_proxy(const char *sta) ": %s"

You use the 'colo_proxy' trace in a lot of different places;  it would
be better to use individual trace entries, so for example you could
just trace miscompares.

Dave

> +colo_proxy_with_ret(const char *sta, ssize_t ret) ": %s ret = %zu"
> +colo_proxy_packet_src(const char *src) ":ipsrc = %s"
> +colo_proxy_packet_dst(const char *dst) ":ipdst = %s"
> +colo_proxy_packet_size(int size) ": %d"
> +colo_proxy_queue_size(int size) ": %d"
> +
>  # kvm-all.c
>  kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>  kvm_vm_ioctl(int type, void *arg) "type 0x%x, arg %p"
> diff --git a/vl.c b/vl.c
> index 8dc34ce..dcfb3a9 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2838,7 +2838,8 @@ static bool object_create_initial(const char *type)
>       * they depend on netdevs already existing
>       */
>      if (g_str_equal(type, "filter-buffer") ||
> -        g_str_equal(type, "filter-dump")) {
> +        g_str_equal(type, "filter-dump") ||
> +        g_str_equal(type, "colo-proxy")) {
>          return false;
>      }
>  
> -- 
> 1.9.1
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-06  5:16             ` Jason Wang
@ 2016-01-18  7:05               ` Zhang Chen
  2016-01-18  9:29                 ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2016-01-18  7:05 UTC (permalink / raw)
  To: Jason Wang, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang



On 01/06/2016 01:16 PM, Jason Wang wrote:
>
> On 01/04/2016 07:17 PM, Zhang Chen wrote:
>>
>> On 01/04/2016 05:46 PM, Jason Wang wrote:
>>> On 01/04/2016 04:16 PM, Zhang Chen wrote:
>>>> On 01/04/2016 01:37 PM, Jason Wang wrote:
>>>>> On 12/31/2015 04:40 PM, Zhang Chen wrote:
>>>>>> On 12/31/2015 10:36 AM, Jason Wang wrote:
>>>>>>> On 12/22/2015 06:42 PM, Zhang Chen wrote:
>>>>>>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>>>>>>
>>>>>>>> Hi,all
>>>>>>>>
>>>>>>>> This patch add an colo-proxy object, COLO-Proxy is a part of COLO,
>>>>>>>> based on qemu netfilter and it's a plugin for qemu netfilter. the
>>>>>>>> function
>>>>>>>> keep Secondary VM connect normal to Primary VM and compare packets
>>>>>>>> sent by PVM to sent by SVM.if the packet difference,notify COLO do
>>>>>>>> checkpoint and send all primary packet has queued.
>>>>>>> Thanks for the work. I don't object this method but still not
>>>>>>> convinced
>>>>>>> that qemu is the best place for this.
>>>>>>>
>>>>>>> As been raised in the past discussion, it's almost impossible to
>>>>>>> cooperate with vhost backends. If we want this to be used in
>>>>>>> production
>>>>>>> environment, need to think of a solution for vhost. There's no such
>>>>>>> worry if we decouple this from qemu.
>>>>>>>
>>>>>>>> You can also get the series from:
>>>>>>>>
>>>>>>>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Usage:
>>>>>>>>
>>>>>>>> primary:
>>>>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>>>>>> -object
>>>>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
>>>>>>>>
>>>>>>>> secondary:
>>>>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>>>>>> -object
>>>>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
>>>>>>> Have a quick glance at how secondary mode work. What it does is just
>>>>>>> forwarding packets between a nic and a socket, qemu socket
>>>>>>> backend did
>>>>>>> exact the same job. You could even use socket in primary node and
>>>>>>> let
>>>>>>> packet compare module talk to both primary and secondary node.
>>>>>> If we use qemu socket backend , the same netdev will used by qemu
>>>>>> socket and
>>>>>> qemu netfilter. this will against qemu net design. and then, when
>>>>>> colo
>>>>>> do failover,
>>>>>> secondary do not have backend to use. that's the real problem.
>>>>> Then, maybe it's time to implement changing the netdev of a nic. The
>>>>> point here is that what secondary mode did is in fact a netdev backend
>>>>> instead of a filter ...
>>>> Currently, you are right. in colo-proxy V2 code, I just compare IP
>>>> packet to
>>>> decide whether to do checkpoint.
>>>> But, in colo-proxy V3 I will compare tcp,icmp,udp packet to decide it.
>>>> because that can reduce frequency of checkpoint and improve
>>>> performance. To keep tcp connection well, colo secondary need to record
>>>> primary guest's init seq and adjust secondary guest's ack. if colo do
>>>> failover,
>>>> secondary also need do this to old tcp connection. qemu socket
>>>> can't do this job.
>>> So a question here: is it a must to do things (e.g TCP analysis stuffs)
>>> at secondary? Looks like we could do this at primary node. And I saw
>>> you're doing packet comparing in primary node, any advantages of doing
>>> this in primary instead of secondary?
>> We think must  to do this in secondary, because if colo do
>> failover,secondary
>> must continues do TCP analysis stuffs to before tcp connection(if not,
>> tcp connection
>> will disconnect in that time), in this time primary already down or
>> disconnect to
>> secondary.so we can't make primary do this  TCP analysis stuffs.it can
>> not ensure
>> FT function.
>>
>> Thanks
>> zhangchen
> Makes sense.
>
> Thanks

Hi Jason,
There has been no news for a week.
Could you give me some comments on the code?
Let's make colo-proxy work well.

Thanks
zhangchen

>>>> and another problem is do failover, if we use qemu socket
>>>> to be backend in secondary, when colo do failover, I don't know how to
>>>> change
>>>> secondary be a normal qemu, if you know, please tell me.
>>> Current qemu couldn't do this, but I mean we implement something like
>>> nic_change_backend which can change nic's peer(s). With this, in
>>> secondary, we can replace the socket backend with whatever you want (e.g
>>> tap or other).
>>>
>>> Thanks
>>>
>>>> Thanks for your revew
>>>> zhangchen
>>>
>>> .
>>>
>
>
> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 01/10] Init colo-proxy object based on netfilter
  2016-01-15 18:21   ` Dr. David Alan Gilbert
@ 2016-01-18  7:08     ` Zhang Chen
  0 siblings, 0 replies; 75+ messages in thread
From: Zhang Chen @ 2016-01-18  7:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang



On 01/16/2016 02:21 AM, Dr. David Alan Gilbert wrote:
> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>
>> add colo-proxy to vl.c and qemu-options.hx
>> add trace-colo-proxy relation
>>
>> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   qemu-options.hx | 6 ++++++
>>   trace-events    | 8 ++++++++
>>   vl.c            | 3 ++-
>>   3 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 0eea4ee..6daa3f0 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -3670,6 +3670,12 @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
>>   @option{tx}: the filter is attached to the transmit queue of the netdev,
>>                where it will receive packets sent by the netdev.
>>   
>> +@item -object colo-proxy,id=@var{id},netdev=@var{netdevid},addr=@var{host:port},mode=@var{primary|secondary}[,queue=@var{all}]
>> +
>> +Colo-proxy on netdev @var{netdevid},set colo mode @var{primary|secondary}
>> +connect other colo through addr@var{host:port},and colo needs queue all
>> +packet arriving in queue=@var{all}
>> +
>>   @item -object filter-dump,id=@var{id},netdev=@var{dev},file=@var{filename}][,maxlen=@var{len}]
>>   
>>   Dump the network traffic on netdev @var{dev} to the file specified by
>> diff --git a/trace-events b/trace-events
>> index 5f95b3c..a957fb3 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -1586,6 +1586,14 @@ colo_failover_set_state(int new_state) "new state %d"
>>   colo_start_block_replication(void) "Block replication is started"
>>   colo_stop_block_replication(const char *reason) "Block replication is stopped(reason: '%s')"
>>   
>> +# net/colo-proxy.c
>> +colo_proxy(const char *sta) ": %s"
> You use the 'colo_proxy' trace in a lot of different places;  it would
> be better to use individual trace entries, so for example you could
> just trace miscompares.
>
> Dave

I will fix it in the next version.

Thanks
zhangchen

>
>> +colo_proxy_with_ret(const char *sta, ssize_t ret) ": %s ret = %zu"
>> +colo_proxy_packet_src(const char *src) ":ipsrc = %s"
>> +colo_proxy_packet_dst(const char *dst) ":ipdst = %s"
>> +colo_proxy_packet_size(int size) ": %d"
>> +colo_proxy_queue_size(int size) ": %d"
>> +
>>   # kvm-all.c
>>   kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>>   kvm_vm_ioctl(int type, void *arg) "type 0x%x, arg %p"
>> diff --git a/vl.c b/vl.c
>> index 8dc34ce..dcfb3a9 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -2838,7 +2838,8 @@ static bool object_create_initial(const char *type)
>>        * they depend on netdevs already existing
>>        */
>>       if (g_str_equal(type, "filter-buffer") ||
>> -        g_str_equal(type, "filter-dump")) {
>> +        g_str_equal(type, "filter-dump") ||
>> +        g_str_equal(type, "colo-proxy")) {
>>           return false;
>>       }
>>   
>> -- 
>> 1.9.1
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-18  7:05               ` Zhang Chen
@ 2016-01-18  9:29                 ` Jason Wang
  2016-01-20  3:29                   ` Zhang Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2016-01-18  9:29 UTC (permalink / raw)
  To: Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang



On 01/18/2016 03:05 PM, Zhang Chen wrote:
>
>
> On 01/06/2016 01:16 PM, Jason Wang wrote:
>>
>> On 01/04/2016 07:17 PM, Zhang Chen wrote:
>>>
>>> On 01/04/2016 05:46 PM, Jason Wang wrote:
>>>> On 01/04/2016 04:16 PM, Zhang Chen wrote:
>>>>> On 01/04/2016 01:37 PM, Jason Wang wrote:
>>>>>> On 12/31/2015 04:40 PM, Zhang Chen wrote:
>>>>>>> On 12/31/2015 10:36 AM, Jason Wang wrote:
>>>>>>>> On 12/22/2015 06:42 PM, Zhang Chen wrote:
>>>>>>>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>>>>>>>
>>>>>>>>> Hi,all
>>>>>>>>>
>>>>>>>>> This patch add an colo-proxy object, COLO-Proxy is a part of
>>>>>>>>> COLO,
>>>>>>>>> based on qemu netfilter and it's a plugin for qemu netfilter. the
>>>>>>>>> function
>>>>>>>>> keep Secondary VM connect normal to Primary VM and compare
>>>>>>>>> packets
>>>>>>>>> sent by PVM to sent by SVM.if the packet difference,notify
>>>>>>>>> COLO do
>>>>>>>>> checkpoint and send all primary packet has queued.
>>>>>>>> Thanks for the work. I don't object this method but still not
>>>>>>>> convinced
>>>>>>>> that qemu is the best place for this.
>>>>>>>>
>>>>>>>> As been raised in the past discussion, it's almost impossible to
>>>>>>>> cooperate with vhost backends. If we want this to be used in
>>>>>>>> production
>>>>>>>> environment, need to think of a solution for vhost. There's no
>>>>>>>> such
>>>>>>>> worry if we decouple this from qemu.
>>>>>>>>
>>>>>>>>> You can also get the series from:
>>>>>>>>>
>>>>>>>>> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Usage:
>>>>>>>>>
>>>>>>>>> primary:
>>>>>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>>>>>>> -object
>>>>>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
>>>>>>>>>
>>>>>>>>> secondary:
>>>>>>>>> -netdev tap,id=bn0 -device e1000,netdev=bn0
>>>>>>>>> -object
>>>>>>>>> colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port
>>>>>>>>>
>>>>>>>> Have a quick glance at how secondary mode work. What it does is
>>>>>>>> just
>>>>>>>> forwarding packets between a nic and a socket, qemu socket
>>>>>>>> backend did
>>>>>>>> exact the same job. You could even use socket in primary node and
>>>>>>>> let
>>>>>>>> packet compare module talk to both primary and secondary node.
>>>>>>> If we use qemu socket backend , the same netdev will used by qemu
>>>>>>> socket and
>>>>>>> qemu netfilter. this will against qemu net design. and then, when
>>>>>>> colo
>>>>>>> do failover,
>>>>>>> secondary do not have backend to use. that's the real problem.
>>>>>> Then, maybe it's time to implement changing the netdev of a nic. The
>>>>>> point here is that what secondary mode did is in fact a netdev
>>>>>> backend
>>>>>> instead of a filter ...
>>>>> Currently, you are right. in colo-proxy V2 code, I just compare IP
>>>>> packet to
>>>>> decide whether to do checkpoint.
>>>>> But, in colo-proxy V3 I will compare tcp,icmp,udp packet to decide
>>>>> it.
>>>>> because that can reduce frequency of checkpoint and improve
>>>>> performance. To keep tcp connection well, colo secondary need to
>>>>> record
>>>>> primary guest's init seq and adjust secondary guest's ack. if colo do
>>>>> failover,
>>>>> secondary also need do this to old tcp connection. qemu socket
>>>>> can't do this job.
>>>> So a question here: is it a must to do things (e.g TCP analysis
>>>> stuffs)
>>>> at secondary? Looks like we could do this at primary node. And I saw
>>>> you're doing packet comparing in primary node, any advantages of doing
>>>> this in primary instead of secondary?
>>> We think must  to do this in secondary, because if colo do
>>> failover,secondary
>>> must continues do TCP analysis stuffs to before tcp connection(if not,
>>> tcp connection
>>> will disconnect in that time), in this time primary already down or
>>> disconnect to
>>> secondary.so we can't make primary do this  TCP analysis stuffs.it can
>>> not ensure
>>> FT function.
>>>
>>> Thanks
>>> zhangchen
>> Makes sense.
>>
>> Thanks
>
> Hi~, Jason.
> No news for a week.
> Can you give me some comments for code.
> Let's make colo-proxy work well.

Sure.

Two main comments/suggestions:

- TCP analysis is missing from the current version. Maybe you could point
me to a git tree (or another version of the RFC) for a better understanding
of the design. (Just a skeleton for TCP should be sufficient to discuss.)
- I prefer to make the code as reusable as possible, so it's better to
split/decouple the reusable parts from the rest of the code. A vague idea is:

1) Decouple the packet comparing from the netfilter. You've achieved
this 99% since the work is already done in a thread. Just let the thread
poll sockets directly; then the comparing can be reused by other kinds
of dataplane.
2) Implement the traffic mirror/redirector as a filter.
3) Implement TCP seq rewriting as a filter.

Then, in the primary node, you need just a traffic mirror, which would:
- mirror ingress traffic to the secondary node
- mirror egress traffic to the packet comparing thread

And in the secondary node, you need two filters:
- A TCP seq rewriter which adjusts the TCP sequence number.
- A traffic redirector which redirects packets from a socket as ingress
traffic, and redirects egress traffic to the socket which could be
polled by the remote packet comparing thread.
 
Thoughts?

Thanks

>
> Thanks
> zhangchen 


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-18  9:29                 ` Jason Wang
@ 2016-01-20  3:29                   ` Zhang Chen
  2016-01-20  6:54                     ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Zhang Chen @ 2016-01-20  3:29 UTC (permalink / raw)
  To: Jason Wang, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang


> Sure.
>
> Two main comments/suggestions:
>
> - TCP analysis is missed in current version, maybe you point a git tree
> (or another version of RFC) to me for a better understanding of the
> design. (Just a skeleton for TCP should be sufficient to discuss).
> - I prefer to make the code as reusable as possible. So it's better to
> split/decouple the reusable parts from the codes. So a vague idea is:
>
> 1) Decouple the packet comparing from the netfilter. You've achieved
> this 99% since the work has been done in a thread. Just let the thread
> poll sockets directly, then the comparing have the possibility to be
> reused by other kinds of dataplane.
> 2) Implement traffic mirror/redirector as filter.
> 3) Implement TCP seq rewriting as a filter.
>
> Then, in primary node, you need just a traffic mirror, which did:
> - mirror ingress traffic to secondary node
> - mirror outgress traffic to packet comparing thread
>
> And in secondary node, you need two filters:
> - A TCP seq rewriter which adjust tcp sequence number.
> - A traffic redirector which redirect packet from a socket as ingress
> traffic, and redirect outgress traffic to the socket which could be
> polled by remote packet comparing thread.
>   
> Thoughts?
>
> Thanks
>
>> Thanks
>> zhangchen
>


Hi, Jason.
We have considered your suggestion to split/decouple
the reusable parts from the code.
Because filter plugins are traversed one by one, in order,
we will split colo-proxy into three filters on each side.

But in this plan both primary and secondary have a socket
server, so startup is a problem.


  Primary qemu                                                       Secondary qemu
+----------------------------------------------------------+       +-----------------------------------------------------------+
| +-----------------------------------------------------+  |       |  +------------------------------------------------------+ |
| |                                                     |  |       |  |                                                      | |
| |                        guest                        |  |       |  |                        guest                         | |
| |                                                     |  |       |  |                                                      | |
| +-----------^--------------+--------------------------+  |       |  +---------------------+--------+-----------------------+ |
|             |              |                             |       |                        ^        |                         |
|             |              |                             |       |                        |        |                         |
|             +-------------------------------------------------+  |                        |        |                         |
|  netfilter  |              |                             |    |  |   netfilter            |        |                         |
| +-----------------------------------------------------+  |    |  |  +------------------------------------------------------+ |
| |           |              |     filter execute order |  |    |  |  |                     |        |  filter execute order | |
| |           |              |    +-------------------> |  |    |  |  |                     |        | +-------------------> | |
| |           |              |                          |  |    |  |  |                     |        |   TCP                 | |
| | +---------+-+     +------v-----+    +----+ +-----+  |  |    |  |  | +-----------+   +---+----+---v+rewriter+  +--------+ | |
| | |           |     |            |    |            |  |  |    |  |  | |           |   |        |             |  |        | | |
| | |  mirror   |     |  redirect  +---->  compare   |  |  |    +--------> mirror   +---> adjust |   adjust    +-->redirect| | |
| | |  client   |     |  server    |    |            |  |  |       |  | |  server   |   | ack    |   seq       |  |client  | | |
| | |           |     |            |    |            |  |  |       |  | |           |   |        |             |  |        | | |
| | +----^------+     +----^-------+    +-----+------+  |  |       |  | +-----------+   +--------+-------------+  +----+---+ | |
| |      |     tx          |      rx          |     rx  |  |       |  |            tx                        all       |  rx | |
| +-----------------------------------------------------+  |       |  +------------------------------------------------------+ |
|        |                 +-------------------------------------------------------------------------------------------+       |
|        |                                    |            |       |                                                           |
+----------------------------------------------------------+       +-----------------------------------------------------------+
          |                                    |
          |guest receive                       |guest send
          |                                    |
+--------+------------------------------------v------------+
|                                                          |
|                                                          |
|                         tap                              |                              NOTE: filter direction is rx/tx/all
|                                                          |                              rx:receive packets sent to the netdev
|                                                          |                              tx:receive packets sent by the netdev
+----------------------------------------------------------+







Guest receive packet route

primary
tap --> mirror client filter
The mirror client sends the packet to the guest and, at the
same time, copies and forwards it to the secondary
mirror server.

secondary
mirror server filter --> TCP rewriter
If the received packet is a TCP packet, we adjust the ack
and update the TCP checksum, then send it to the secondary
guest. Otherwise we send it directly to the guest.
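
The "adjust and update the TCP checksum" step could look roughly like
the sketch below. It is only an illustration, not code from the series:
the header is modelled as ten host-order 16-bit words, the checksum is
patched incrementally in the RFC 1624 style instead of being recomputed,
and the pseudo-header and payload are left out of the toy full checksum
(they do not change when only seq or ack changes). All names here
(csum_fold, csum_compute, csum_update32, tcp_adjust32, the TCP_* indices)
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Word indices inside a TCP header viewed as 16-bit words. */
enum { TCP_SEQ_HI = 2, TCP_ACK_HI = 4, TCP_CSUM = 8 };

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t)sum;
}

/* Toy full checksum over the header words, skipping the checksum field. */
static uint16_t csum_compute(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i != TCP_CSUM) {
            sum += w[i];
        }
    }
    return (uint16_t)~csum_fold(sum);
}

/* Incremental update, HC' = ~(~HC + ~m + m'), for one 32-bit field. */
static uint16_t csum_update32(uint16_t check, uint32_t old_val,
                              uint32_t new_val)
{
    uint32_t sum = (uint16_t)~check;
    sum += (uint16_t)~(old_val >> 16);
    sum += (uint16_t)~(old_val & 0xffff);
    sum += new_val >> 16;
    sum += new_val & 0xffff;
    return (uint16_t)~csum_fold(sum);
}

/* Add delta to the 32-bit field starting at word hi (seq or ack) and
 * patch the checksum to match. The addition wraps modulo 2^32. */
static void tcp_adjust32(uint16_t *w, size_t hi, uint32_t delta)
{
    uint32_t old_val = ((uint32_t)w[hi] << 16) | w[hi + 1];
    uint32_t new_val = old_val + delta;

    w[hi] = (uint16_t)(new_val >> 16);
    w[hi + 1] = (uint16_t)(new_val & 0xffff);
    w[TCP_CSUM] = csum_update32(w[TCP_CSUM], old_val, new_val);
}
```

Under these assumptions the receive path would call
tcp_adjust32(w, TCP_ACK_HI, delta) and the send path
tcp_adjust32(w, TCP_SEQ_HI, delta), where delta is derived from the
recorded initial sequence numbers of the primary and secondary guests.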


Guest send packet route

primary
guest --> redirect server filter
The redirect server filter receives the primary guest's packet
but does nothing with it; it just passes it to the next filter.

redirect server filter --> compare filter
The compare filter receives the primary guest's packet, then
waits for the redirected secondary packet and compares the two.
If the packets are the same, it sends the primary packet and
drops the secondary packet; otherwise it sends the primary packet
and does a checkpoint.

secondary
guest --> TCP rewriter filter
If the packet is a TCP packet, we adjust the seq
and update the TCP checksum, then send it to the
redirect client filter. Otherwise we send it directly to the
redirect client filter.

redirect client filter --> redirect server filter
Forward the packet to the primary.
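
The decision made by the compare filter can be sketched as below. This
is a minimal illustration assuming a plain byte-for-byte comparison;
Packet, ColoVerdict and colo_compare are hypothetical names, and a real
filter would also have to queue packets and match each primary packet
with the corresponding secondary one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one captured packet. */
typedef struct Packet {
    const unsigned char *data;
    size_t len;
} Packet;

/* In both cases the primary packet is sent; the verdict only says
 * whether a checkpoint has to be triggered as well. */
typedef enum {
    COLO_SAME,        /* send primary packet, drop secondary packet */
    COLO_CHECKPOINT   /* send primary packet, then do a checkpoint */
} ColoVerdict;

static ColoVerdict colo_compare(const Packet *pri, const Packet *sec)
{
    if (pri->len != sec->len) {
        return COLO_CHECKPOINT;
    }
    if (pri->len == 0 || memcmp(pri->data, sec->data, pri->len) == 0) {
        return COLO_SAME;
    }
    return COLO_CHECKPOINT;
}
```

Comparing at the TCP/UDP/ICMP level, as planned for V3, would replace
the memcmp() with protocol-aware checks so that fewer harmless
differences force a checkpoint.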


In the failover case (primary is down), the TCP rewriter will keep servicing
the TCP connections which were established after the last checkpoint.



How about this plan?


> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-20  3:29                   ` Zhang Chen
@ 2016-01-20  6:54                     ` Jason Wang
  2016-01-20  7:44                       ` Wen Congyang
  2016-01-20 10:01                       ` Wen Congyang
  0 siblings, 2 replies; 75+ messages in thread
From: Jason Wang @ 2016-01-20  6:54 UTC (permalink / raw)
  To: Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong, Huang peng,
	Dr. David Alan Gilbert, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang



On 01/20/2016 11:29 AM, Zhang Chen wrote:
>
>> Sure.
>>
>> Two main comments/suggestions:
>>
>> - TCP analysis is missed in current version, maybe you point a git tree
>> (or another version of RFC) to me for a better understanding of the
>> design. (Just a skeleton for TCP should be sufficient to discuss).
>> - I prefer to make the code as reusable as possible. So it's better to
>> split/decouple the reusable parts from the codes. So a vague idea is:
>>
>> 1) Decouple the packet comparing from the netfilter. You've achieved
>> this 99% since the work has been done in a thread. Just let the thread
>> poll sockets directly, then the comparing have the possibility to be
>> reused by other kinds of dataplane.
>> 2) Implement traffic mirror/redirector as filter.
>> 3) Implement TCP seq rewriting as a filter.
>>
>> Then, in primary node, you need just a traffic mirror, which did:
>> - mirror ingress traffic to secondary node
>> - mirror outgress traffic to packet comparing thread
>>
>> And in secondary node, you need two filters:
>> - A TCP seq rewriter which adjust tcp sequence number.
>> - A traffic redirector which redirect packet from a socket as ingress
>> traffic, and redirect outgress traffic to the socket which could be
>> polled by remote packet comparing thread.
>>   Thoughts?
>>
>> Thanks
>>
>>> Thanks
>>> zhangchen
>>
>
>
> Hi, Jason.
> We consider your suggestion to split/decouple
> the reusable parts from the codes.
> Due to filter plugin are traversed one by one in order
> we will split colo-proxy to three filters in each side.
>
> But in this plan,primary and secondary both have socket
> server,startup is a problem.

I believe this issue could be solved by reusing socket chardev.

>
>
>   Primary qemu                                                       Secondary qemu
> +----------------------------------------------------------+       +-----------------------------------------------------------+
> | +-----------------------------------------------------+  |       |  +------------------------------------------------------+ |
> | |                                                     |  |       |  |                                                      | |
> | |                        guest                        |  |       |  |                        guest                         | |
> | |                                                     |  |       |  |                                                      | |
> | +-----------^--------------+--------------------------+  |       |  +---------------------+--------+-----------------------+ |
> |             |              |                             |       |                        ^        |                         |
> |             |              |                             |       |                        |        |                         |
> |             +-------------------------------------------------+  |                        |        |                         |
> |  netfilter  |              |                             |    |  |   netfilter            |        |                         |
> | +-----------------------------------------------------+  |    |  |  +------------------------------------------------------+ |
> | |           |              |     filter execute order |  |    |  |  |                     |        |  filter execute order | |
> | |           |              |    +-------------------> |  |    |  |  |                     |        | +-------------------> | |
> | |           |              |                          |  |    |  |  |                     |        |   TCP                 | |
> | | +---------+-+     +------v-----+    +----+ +-----+  |  |    |  |  | +-----------+   +---+----+---v+rewriter+  +--------+ | |
> | | |           |     |            |    |            |  |  |    |  |  | |           |   |        |             |  |        | | |
> | | |  mirror   |     |  redirect  +---->  compare   |  |  |    +--------> mirror   +---> adjust |   adjust    +-->redirect| | |
> | | |  client   |     |  server    |    |            |  |  |       |  | |  server   |   | ack    |   seq       |  |client  | | |
> | | |           |     |            |    |            |  |  |       |  | |           |   |        |             |  |        | | |
> | | +----^------+     +----^-------+    +-----+------+  |  |       |  | +-----------+   +--------+-------------+  +----+---+ | |
> | |      |     tx          |      rx          |     rx  |  |       |  |            tx                        all       |  rx | |
> | +-----------------------------------------------------+  |       |  +------------------------------------------------------+ |
> |        |                 +-------------------------------------------------------------------------------------------+       |
> |        |                                    |            |       |                                                           |
> +----------------------------------------------------------+       +-----------------------------------------------------------+
>          |                                    |
>          |guest receive                       |guest send
>          |                                    |
> +--------+------------------------------------v------------+
> |                                                          |
> |                                                          |
> |                         tap                             
> |                              NOTE: filter direction is rx/tx/all
> |                                                         
> |                              rx:receive packets sent to the netdev
> |                                                         
> |                              tx:receive packets sent by the netdev
> +----------------------------------------------------------+
>
>
>

I would still like to decouple the comparer from netfilter. This has two
obvious advantages:

- It can be reused by other dataplanes (e.g. vhost).
- The secondary redirector could redirect rx directly to the comparer on the
primary node, which simplifies the design.

>
>
>
>
> guest recv packet route
>
> primary
> tap --> mirror client filter
> mirror client will send packet to guest,at the
> same time, copy and forward packet to secondary
> mirror server.
>
> secondary
> mirror server filter --> TCP rewriter
> if recv packet is TCP packet,we will adjust ack
> and update TCP checksum, then send to secondary
> guest. else directly send to guest.
>
>
> guest send packet route
>
> primary
> guest --> redirect server filter
> redirect server filter recv primary guest packet
> but do nothing, just pass to next filter.
>
> redirect server filter --> compare filter
> compare filter recv primary guest packet then
> waiting secondary redirect packet to compare it.
> if packet same,send primary packet and clear secondary
> packet, else send primary packet and do
> checkpoint.
>
> secondary
> guest --> TCP rewriter filter
> if the packet is TCP packet,we will adjust seq
> and update TCP checksum. then send it to
> redirect client filter. else directly send to
> redirect client filter.
>
> redirect client filter --> redirect server filter
> forward packet to primary
>
>
> In failover scene(primary is down), the TCP rewriter will keep
> servicing
> for the TCP connection which is established after the last checkpoint.
>
>
>
> How about this plan?

Sounds good.

And indeed, by reusing the socket chardev there is no need to distinguish
client from server. E.g.:

In primary node:

...
-chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
-chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
-chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
-netdev tap,id=hn0
-traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
-colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
...

The packet comparer compares the packets from two chardevs: comparer0 and
comparer1.
The traffic-mirrorer mirrors tx to the secondary node through chardev mirrorer0,
and mirrors rx to the packet comparer through chardev comparer0.
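
The comparing step can be sketched roughly as follows. This is only an
illustration in Python, not the proposed QEMU code: the class name, the two
callbacks and the strict in-order pairing of packets are assumptions made for
the example. It follows the rule quoted earlier in the thread: if the packets
match, release the primary packet and drop the secondary one; otherwise
release the primary packet and request a checkpoint.

```python
from collections import deque


class PacketComparer:
    """Toy comparer (hypothetical): pairs packets from two input streams."""

    def __init__(self, send, checkpoint):
        self.primary = deque()        # packets read from the primary stream
        self.secondary = deque()      # packets read from the secondary stream
        self.send = send              # callback: release a primary packet
        self.checkpoint = checkpoint  # callback: request a VM checkpoint

    def on_primary(self, pkt):
        self.primary.append(pkt)
        self._compare()

    def on_secondary(self, pkt):
        self.secondary.append(pkt)
        self._compare()

    def _compare(self):
        # Pair packets in arrival order while both sides have one queued.
        while self.primary and self.secondary:
            p = self.primary.popleft()
            s = self.secondary.popleft()  # the secondary copy is never sent
            if p != s:
                self.checkpoint()
            # The primary packet is released in both cases.
            self.send(p)
```

Because the comparer only consumes two packet streams, nothing in it depends
on netfilter, which is the point of decoupling it.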

In secondary node:

...
-chardev socket,id=redirector0,host=ip_primary,port=Y
-chardev socket,id=redirector1,host=ip_primary,port=Z
-netdev tap,id=hn0
-traffic-redirector netdev=hn0,id=r0,indev=redirector0,outdev=redirector1
-colo-rewriter netdev=hn0,id=c0
...

traffic-redirector redirects the rx traffic from the primary node through
redirector0 and redirects the tx traffic to the primary node through redirector1.
colo-rewriter rewrites the seq number as a normal netfilter.
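
As a rough illustration of what "adjust seq and update TCP checksum" involves,
here is a minimal Python sketch. It is not the colo-rewriter implementation:
the function names are made up, the pseudo-header is passed in by the caller,
and how the offset is derived is not covered. It shifts the sequence number
of a raw TCP segment and recomputes the checksum from scratch.

```python
import struct


def checksum16(data: bytes) -> int:
    """Internet checksum (RFC 1071) over data, padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def rewrite_seq(tcp: bytes, pseudo: bytes, offset: int) -> bytes:
    """Return a copy of the TCP segment with seq shifted by offset."""
    seg = bytearray(tcp)
    (seq,) = struct.unpack_from("!I", seg, 4)    # seq number: bytes 4..7
    struct.pack_into("!I", seg, 4, (seq + offset) & 0xFFFFFFFF)
    struct.pack_into("!H", seg, 16, 0)           # clear checksum: bytes 16..17
    struct.pack_into("!H", seg, 16, checksum16(pseudo + bytes(seg)))
    return bytes(seg)
```

Adjusting the ack number in the other direction would work the same way on
bytes 8..11 of the header.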





* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-20  6:54                     ` Jason Wang
@ 2016-01-20  7:44                       ` Wen Congyang
  2016-01-20  9:20                         ` Jason Wang
  2016-01-20 10:01                       ` Wen Congyang
  1 sibling, 1 reply; 75+ messages in thread
From: Wen Congyang @ 2016-01-20  7:44 UTC (permalink / raw)
  To: Jason Wang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang

On 01/20/2016 02:54 PM, Jason Wang wrote:
> 
> 
> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>
>>> Sure.
>>>
>>> Two main comments/suggestions:
>>>
>>> - TCP analysis is missed in current version, maybe you point a git tree
>>> (or another version of RFC) to me for a better understanding of the
>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>> - I prefer to make the code as reusable as possible. So it's better to
>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>
>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>> this 99% since the work has been done in a thread. Just let the thread
>>> poll sockets directly, then the comparing have the possibility to be
>>> reused by other kinds of dataplane.
>>> 2) Implement traffic mirror/redirector as filter.
>>> 3) Implement TCP seq rewriting as a filter.
>>>
>>> Then, in primary node, you need just a traffic mirror, which did:
>>> - mirror ingress traffic to secondary node
>>> - mirror outgress traffic to packet comparing thread
>>>
>>> And in secondary node, you need two filters:
>>> - A TCP seq rewriter which adjust tcp sequence number.
>>> - A traffic redirector which redirect packet from a socket as ingress
>>> traffic, and redirect outgress traffic to the socket which could be
>>> polled by remote packet comparing thread.
>>>   Thoughts?
>>>
>>> Thanks
>>>
>>>> Thanks
>>>> zhangchen
>>>
>>
>>
>> Hi, Jason.
>> We consider your suggestion to split/decouple
>> the reusable parts from the codes.
>> Due to filter plugin are traversed one by one in order
>> we will split colo-proxy to three filters in each side.
>>
>> But in this plan,primary and secondary both have socket
>> server,startup is a problem.
> 
> I believe this issue could be solved by reusing socket chardev.
> 
>>
>>
>>  Primary qemu                                                     Secondary qemu
>> +----------------------------------------------------------+      +-----------------------------------------------------------+
>> | +-----------------------------------------------------+  |       | +------------------------------------------------------+ |
>> | |                                                     |  |       | |                                                      | |
>> | |                        guest                        |  |       | |                        guest                         | |
>> | |                                                     |  |       | |                                                      | |
>> | +-----------^--------------+--------------------------+  |       | +---------------------+--------+-----------------------+ |
>> |             |              |                             |      |                        ^        |                         |
>> |             |              |                             |      |                        |        |                         |
>> |             +-------------------------------------------------+ |                        |        |                         |
>> |  netfilter  |              |                             |    |  |  netfilter            |        |                         |
>> | +-----------------------------------------------------+  |    |  | +------------------------------------------------------+ |
>> | |           |              |     filter execute order |  |    |  | |                     |        |  filter execute order | |
>> | |           |              |    +-------------------> |  |    |  | |                     |        | +-------------------> | |
>> | |           |              |                          |  |    |  | |                     |        |   TCP                 | |
>> | | +---------+-+     +------v-----+    +----+ +-----+  |  |    |  | | +-----------+   +---+----+---v+rewriter+  +--------+ | |
>> | | |           |     |            |    |            |  |  |    |  | | |           |   |        |             |  |        | | |
>> | | |  mirror   |     |  redirect  +---->  compare   |  |  |   +--------> mirror   +---> adjust |   adjust    +-->redirect| | |
>> | | |  client   |     |  server    |    |            |  |  |       | | |  server   |   | ack    |   seq       |  |client  | | |
>> | | |           |     |            |    |            |  |  |       | | |           |   |        |             |  |        | | |
>> | | +----^------+     +----^-------+    +-----+------+  |  |       | | +-----------+   +--------+-------------+  +----+---+ | |
>> | |      |     tx          |      rx          |     rx  |  |       | |            tx                        all       |  rx | |
>> | +-----------------------------------------------------+  |       | +------------------------------------------------------+ |
>> |        |                +-------------------------------------------------------------------------------------------+      |
>> |        |                                    |            |      |                                                           |
>> +----------------------------------------------------------+      +-----------------------------------------------------------+
>>          |                                    |
>>          |guest receive                       |guest send
>>          |                                    |
>> +--------+------------------------------------v------------+
>> |                                                          |
>> |                                                          |
>> |                         tap                              |                              NOTE: filter direction is rx/tx/all
>> |                                                          |                              rx:receive packets sent to the netdev
>> |                                                          |                              tx:receive packets sent by the netdev
>> +----------------------------------------------------------+
>>
>>
>>
> 
> I still like to decouple comparer from netfilter. It have two obvious
> advantages:
> 
> - make it can be reused by other dataplane (e.g vhost)
> - secondary redirector could redirect rx to comparer on primary node
> directly which simplify the design.
> 
>>
>>
>>
>>
>> guest recv packet route
>>
>> primary
>> tap --> mirror client filter
>> mirror client will send packet to guest,at the
>> same time, copy and forward packet to secondary
>> mirror server.
>>
>> secondary
>> mirror server filter --> TCP rewriter
>> if recv packet is TCP packet,we will adjust ack
>> and update TCP checksum, then send to secondary
>> guest. else directly send to guest.
>>
>>
>> guest send packet route
>>
>> primary
>> guest --> redirect server filter
>> redirect server filter recv primary guest packet
>> but do nothing, just pass to next filter.
>>
>> redirect server filter --> compare filter
>> compare filter recv primary guest packet then
>> waiting secondary redirect packet to compare it.
>> if packet same,send primary packet and clear secondary
>> packet, else send primary packet and do
>> checkpoint.
>>
>> secondary
>> guest --> TCP rewriter filter
>> if the packet is TCP packet,we will adjust seq
>> and update TCP checksum. then send it to
>> redirect client filter. else directly send to
>> redirect client filter.
>>
>> redirect client filter --> redirect server filter
>> forward packet to primary
>>
>>
>> In failover scene(primary is down), the TCP rewriter will keep
>> servicing
>> for the TCP connection which is established after the last checkpoint.
>>
>>
>>
>> How about this plan?
> 
> Sounds good.
> 
> And there's indeed no need to differ client/server by reusing the socket
> chardev. E.g:
> 
> In primary node:

Thanks for your suggestion.

> 
> ...
> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
> -netdev tap,id=hn0
> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
> ...
> 
> packet comparer compares the packets from two chardev: comparer0 and
> comparer1.
> traffic-mirrorer mirror tx to secondary node through chardev mirrorer0,
> and mirror rx to packet comparer through chardev comparer0.
> 
> In secondary node:
> 
> ...
> -chardev socket,id=redirector0,host=ip_primary,port=Y
> -chardev socket,id=redirector1,host=ip_primary,port=Z
> -netdev tap,id=hn0
> -traffic-redirector netdev=hn0,id=r0,indev=redirector0,outdev=redirector1
> -colo-rewriter netdev=hn0,id=c0
> ...
> 
> traffic-redirector redirect the rx traffic from primary node through
> redirector0 and redirect the tx traffic to primary node through redirector1.
> colo-rewriter rewrite seq number as a normal netfilter.

What are traffic-mirrorer, colo-comparer, traffic-redirector and colo-rewriter?
Netfilter drivers?

If not, how do they get packets from the netdev and send packets back to
the netdev?

Thanks
Wen Congyang



* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-20  7:44                       ` Wen Congyang
@ 2016-01-20  9:20                         ` Jason Wang
  2016-01-20  9:49                           ` Wen Congyang
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2016-01-20  9:20 UTC (permalink / raw)
  To: Wen Congyang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang



On 01/20/2016 03:44 PM, Wen Congyang wrote:
>> > 
>> > ...
>> > -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>> > -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>> > -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>> > -netdev tap,id=hn0
>> > -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>> > -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>> > ...
>> > 
>> > packet comparer compares the packets from two chardev: comparer0 and
>> > comparer1.
>> > traffic-mirrorer mirror tx to secondary node through chardev mirrorer0,
>> > and mirror rx to packet comparer through chardev comparer0.
>> > 
>> > In secondary node:
>> > 
>> > ...
>> > -chardev socket,id=redirector0,host=ip_primary,port=Y
>> > -chardev socket,id=redirector1,host=ip_primary,port=Z
>> > -netdev tap,id=hn0
>> > -traffic-redirector netdev=hn0,id=r0,indev=redirector0,outdev=redirector1
>> > -colo-rewriter netdev=hn0,id=c0
>> > ...
>> > 
>> > traffic-redirector redirect the rx traffic from primary node through
>> > redirector0 and redirect the tx traffic to primary node through redirector1.
>> > colo-rewriter rewrite seq number as a normal netfilter.
> What are traffic-mirrorer and colo-comparer, traffic-redirector, colo-rewriter?
> A netfilter driver?

traffic-mirrorer/redirector is a type of netfilter that just
mirrors/redirects packets between netdev and chardev (just the mirror
client/server and redirect client/server in the above graph).
colo-rewriter is a type of netfilter that does the ack/seq adjustment (just
the TCP rewriter in the above graph).
colo-comparer is a thread object that does the packet comparing (similar to
"compare" in the above graph, but not a netfilter).
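
To make the mirror/redirect distinction concrete, here is a small Python
sketch. It is only a model: the class names are hypothetical, a plain list
stands in for the chardev, and the assumption that a redirected packet does
not continue down the filter chain is mine rather than something stated in
this thread.

```python
class MirrorFilter:
    """Copies each packet to outdev and lets the original continue."""

    def __init__(self, outdev):
        self.outdev = outdev  # a list standing in for a chardev

    def receive(self, pkt):
        self.outdev.append(pkt)
        return pkt            # the original packet goes on to the next filter


class RedirectFilter:
    """Sends each packet to outdev and stops it there (assumed behaviour)."""

    def __init__(self, outdev):
        self.outdev = outdev

    def receive(self, pkt):
        self.outdev.append(pkt)
        return None           # the packet is consumed by the redirect


def run_chain(filters, pkt):
    """Pass pkt through the filters in order; None means it was consumed."""
    for f in filters:
        pkt = f.receive(pkt)
        if pkt is None:
            return None
    return pkt
```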

>
> If not, how to get the packet from the netdev, and send back the packet to
> the netdev?
>
> Thanks
> Wen Congyang
>


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-20  9:20                         ` Jason Wang
@ 2016-01-20  9:49                           ` Wen Congyang
  2016-01-20 10:03                             ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Wen Congyang @ 2016-01-20  9:49 UTC (permalink / raw)
  To: Jason Wang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang

On 01/20/2016 05:20 PM, Jason Wang wrote:
> 
> 
> On 01/20/2016 03:44 PM, Wen Congyang wrote:
>>>>
>>>> ...
>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>> -netdev tap,id=hn0
>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>> ...
>>>>
>>>> packet comparer compares the packets from two chardev: comparer0 and
>>>> comparer1.
>>>> traffic-mirrorer mirror tx to secondary node through chardev mirrorer0,
>>>> and mirror rx to packet comparer through chardev comparer0.
>>>>
>>>> In secondary node:
>>>>
>>>> ...
>>>> -chardev socket,id=redirector0,host=ip_primary,port=Y
>>>> -chardev socket,id=redirector1,host=ip_primary,port=Z
>>>> -netdev tap,id=hn0
>>>> -traffic-redirector netdev=hn0,id=r0,indev=redirector0,outdev=redirector1
>>>> -colo-rewriter netdev=hn0,id=c0
>>>> ...
>>>>
>>>> traffic-redirector redirect the rx traffic from primary node through
>>>> redirector0 and redirect the tx traffic to primary node through redirector1.
>>>> colo-rewriter rewrite seq number as a normal netfilter.
>> What are traffic-mirrorer and colo-comparer, traffic-redirector, colo-rewriter?
>> A netfilter driver?
> 
> traffic-mirrorer/redirector is a type of netfilter that just
> mirror/redirect packets between netdev and chardev (just the mirror
> client/server and redirect client/server in the above graph)
> colo-rewriter is a type of netfilter that did ack/seq adjust (just the
> TCP rewriter in the above graph)
> colo-comparer is a thread object that did packet comparing (similar to
> "compare" in the above graph but not a netfilter)

Thanks. I have another question:
IIRC, both rx and tx packets walk through all netfilter objects in the same order.

tx packet (sent to the guest): we want the redirector to handle it first
rx packet (sent from the guest): we want the colo-rewriter to handle it first
Should we change the order or use two traffic-redirectors?

Thanks
Wen Congyang

> 
>>
>> If not, how to get the packet from the netdev, and send back the packet to
>> the netdev?
>>
>> Thanks
>> Wen Congyang
>>
> 
> 
> 
> .
> 


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-20  6:54                     ` Jason Wang
  2016-01-20  7:44                       ` Wen Congyang
@ 2016-01-20 10:01                       ` Wen Congyang
  2016-01-20 10:19                         ` Jason Wang
  1 sibling, 1 reply; 75+ messages in thread
From: Wen Congyang @ 2016-01-20 10:01 UTC (permalink / raw)
  To: Jason Wang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang

On 01/20/2016 02:54 PM, Jason Wang wrote:
> 
> 
> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>
>>> Sure.
>>>
>>> Two main comments/suggestions:
>>>
>>> - TCP analysis is missed in current version, maybe you point a git tree
>>> (or another version of RFC) to me for a better understanding of the
>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>> - I prefer to make the code as reusable as possible. So it's better to
>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>
>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>> this 99% since the work has been done in a thread. Just let the thread
>>> poll sockets directly, then the comparing have the possibility to be
>>> reused by other kinds of dataplane.
>>> 2) Implement traffic mirror/redirector as filter.
>>> 3) Implement TCP seq rewriting as a filter.
>>>
>>> Then, in primary node, you need just a traffic mirror, which did:
>>> - mirror ingress traffic to secondary node
>>> - mirror outgress traffic to packet comparing thread
>>>
>>> And in secondary node, you need two filters:
>>> - A TCP seq rewriter which adjust tcp sequence number.
>>> - A traffic redirector which redirect packet from a socket as ingress
>>> traffic, and redirect outgress traffic to the socket which could be
>>> polled by remote packet comparing thread.
>>>   Thoughts?
>>>
>>> Thanks
>>>
>>>> Thanks
>>>> zhangchen
>>>
>>
>>
>> Hi, Jason.
>> We consider your suggestion to split/decouple
>> the reusable parts from the codes.
>> Due to filter plugin are traversed one by one in order
>> we will split colo-proxy to three filters in each side.
>>
>> But in this plan,primary and secondary both have socket
>> server,startup is a problem.
> 
> I believe this issue could be solved by reusing socket chardev.
> 
>>
>>
>>  Primary qemu                                                     Secondary qemu
>> +----------------------------------------------------------+      +-----------------------------------------------------------+
>> | +-----------------------------------------------------+  |       | +------------------------------------------------------+ |
>> | |                                                     |  |       | |                                                      | |
>> | |                        guest                        |  |       | |                        guest                         | |
>> | |                                                     |  |       | |                                                      | |
>> | +-----------^--------------+--------------------------+  |       | +---------------------+--------+-----------------------+ |
>> |             |              |                             |      |                        ^        |                         |
>> |             |              |                             |      |                        |        |                         |
>> |             +-------------------------------------------------+ |                        |        |                         |
>> |  netfilter  |              |                             |    |  |  netfilter            |        |                         |
>> | +-----------------------------------------------------+  |    |  | +------------------------------------------------------+ |
>> | |           |              |     filter execute order |  |    |  | |                     |        |  filter execute order | |
>> | |           |              |    +-------------------> |  |    |  | |                     |        | +-------------------> | |
>> | |           |              |                          |  |    |  | |                     |        |   TCP                 | |
>> | | +---------+-+     +------v-----+    +----+ +-----+  |  |    |  | | +-----------+   +---+----+---v+rewriter+  +--------+ | |
>> | | |           |     |            |    |            |  |  |    |  | | |           |   |        |             |  |        | | |
>> | | |  mirror   |     |  redirect  +---->  compare   |  |  |   +--------> mirror   +---> adjust |   adjust    +-->redirect| | |
>> | | |  client   |     |  server    |    |            |  |  |       | | |  server   |   | ack    |   seq       |  |client  | | |
>> | | |           |     |            |    |            |  |  |       | | |           |   |        |             |  |        | | |
>> | | +----^------+     +----^-------+    +-----+------+  |  |       | | +-----------+   +--------+-------------+  +----+---+ | |
>> | |      |     tx          |      rx          |     rx  |  |       | |            tx                        all       |  rx | |
>> | +-----------------------------------------------------+  |       | +------------------------------------------------------+ |
>> |        |                +-------------------------------------------------------------------------------------------+      |
>> |        |                                    |            |      |                                                           |
>> +----------------------------------------------------------+      +-----------------------------------------------------------+
>>          |                                    |
>>          |guest receive                       |guest send
>>          |                                    |
>> +--------+------------------------------------v------------+
>> |                                                          |
>> |                                                          |
>> |                         tap                              |                              NOTE: filter direction is rx/tx/all
>> |                                                          |                              rx:receive packets sent to the netdev
>> |                                                          |                              tx:receive packets sent by the netdev
>> +----------------------------------------------------------+
>>
>>
>>
> 
> I still like to decouple comparer from netfilter. It have two obvious
> advantages:
> 
> - make it can be reused by other dataplane (e.g vhost)
> - secondary redirector could redirect rx to comparer on primary node
> directly which simplify the design.
> 
>>
>>
>>
>>
>> guest recv packet route
>>
>> primary
>> tap --> mirror client filter
>> mirror client will send packet to guest,at the
>> same time, copy and forward packet to secondary
>> mirror server.
>>
>> secondary
>> mirror server filter --> TCP rewriter
>> if recv packet is TCP packet,we will adjust ack
>> and update TCP checksum, then send to secondary
>> guest. else directly send to guest.
>>
>>
>> guest send packet route
>>
>> primary
>> guest --> redirect server filter
>> redirect server filter recv primary guest packet
>> but do nothing, just pass to next filter.
>>
>> redirect server filter --> compare filter
>> compare filter recv primary guest packet then
>> waiting secondary redirect packet to compare it.
>> if packet same,send primary packet and clear secondary
>> packet, else send primary packet and do
>> checkpoint.
>>
>> secondary
>> guest --> TCP rewriter filter
>> if the packet is TCP packet,we will adjust seq
>> and update TCP checksum. then send it to
>> redirect client filter. else directly send to
>> redirect client filter.
>>
>> redirect client filter --> redirect server filter
>> forward packet to primary
>>
>>
>> In failover scene(primary is down), the TCP rewriter will keep
>> servicing
>> for the TCP connection which is established after the last checkpoint.
>>
>>
>>
>> How about this plan?
> 
> Sounds good.
> 
> And there's indeed no need to differ client/server by reusing the socket
> chardev. E.g:
> 
> In primary node:
> 
> ...
> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
> -netdev tap,id=hn0
> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1

Why does the mirrorer have an indev? I think we can use a traffic-redirector to do it.
The command line is:
-netdev tap,id=hn0
-object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
-object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
-colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
In the comparer thread, we can use qemu_net_queue_send_iov() to send
out the packet.

Also, we can merge the socket chardevs comparer1 and mirrorer0.

Thanks
Wen Congyang

> ...
> 
> packet comparer compares the packets from two chardev: comparer0 and
> comparer1.
> traffic-mirrorer mirror tx to secondary node through chardev mirrorer0,
> and mirror rx to packet comparer through chardev comparer0.
> 
> In secondary node:
> 
> ...
> -chardev socket,id=redirector0,host=ip_primary,port=Y
> -chardev socket,id=redirector1,host=ip_primary,port=Z
> -netdev tap,id=hn0
> -traffic-redirector netdev=hn0,id=r0,indev=redirector0,outdev=redirector1
> -colo-rewriter netdev=hn0,id=c0
> ...
> 
> traffic-redirector redirect the rx traffic from primary node through
> redirector0 and redirect the tx traffic to primary node through redirector1.
> colo-rewriter rewrite seq number as a normal netfilter.
> 
> 
> 


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-20  9:49                           ` Wen Congyang
@ 2016-01-20 10:03                             ` Jason Wang
  2016-01-20 10:34                               ` Wen Congyang
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2016-01-20 10:03 UTC (permalink / raw)
  To: Wen Congyang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong, Huang peng,
	Dr. David Alan Gilbert, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang



On 01/20/2016 05:49 PM, Wen Congyang wrote:
> On 01/20/2016 05:20 PM, Jason Wang wrote:
>>
>> On 01/20/2016 03:44 PM, Wen Congyang wrote:
>>>>> ...
>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>> -netdev tap,id=hn0
>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>> ...
>>>>>
>>>>> packet comparer compares the packets from two chardev: comparer0 and
>>>>> comparer1.
>>>>> traffic-mirrorer mirror tx to secondary node through chardev mirrorer0,
>>>>> and mirror rx to packet comparer through chardev comparer0.
>>>>>
>>>>> In secondary node:
>>>>>
>>>>> ...
>>>>> -chardev socket,id=redirector0,host=ip_primary,port=Y
>>>>> -chardev socket,id=redirector1,host=ip_primary,port=Z
>>>>> -netdev tap,id=hn0
>>>>> -traffic-redirector netdev=hn0,id=r0,indev=redirector0,outdev=redirector1
>>>>> -colo-rewriter netdev=hn0,id=c0
>>>>> ...
>>>>>
>>>>> traffic-redirector redirect the rx traffic from primary node through
>>>>> redirector0 and redirect the tx traffic to primary node through redirector1.
>>>>> colo-rewriter rewrite seq number as a normal netfilter.
>>> What are traffic-mirrorer and colo-comparer, traffic-redirector, colo-rewriter?
>>> A netfilter driver?
>> traffic-mirrorer/redirector is a type of netfilter that just
>> mirrors/redirects packets between netdev and chardev (just the mirror
>> client/server and redirect client/server in the above graph)
>> colo-rewriter is a type of netfilter that does ack/seq adjustment (just the
>> TCP rewriter in the above graph)
>> colo-comparer is a thread object that does packet comparing (similar to
>> "compare" in the above graph but not a netfilter)
> Thanks. I have another question:
> IIRC, both rx and tx packets walk through all netfilter objects in the same order.
>
> tx packet (sent to the guest): we want the redirector to handle it first
> rx packet (sent from the guest): we want colo-rewriter to handle it first
> Should we change the order or use two traffic-redirectors?
>
> Thanks
> Wen Congyang

Interesting question.

Two redirectors sound OK, or maybe we can go through the rx filters in
reverse order?


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-20 10:01                       ` Wen Congyang
@ 2016-01-20 10:19                         ` Jason Wang
  2016-01-20 10:30                           ` Wen Congyang
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2016-01-20 10:19 UTC (permalink / raw)
  To: Wen Congyang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang



On 01/20/2016 06:01 PM, Wen Congyang wrote:
> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>
>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>> Sure.
>>>>
>>>> Two main comments/suggestions:
>>>>
>>>> - TCP analysis is missed in current version, maybe you point a git tree
>>>> (or another version of RFC) to me for a better understanding of the
>>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>>
>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>> this 99% since the work has been done in a thread. Just let the thread
>>>> poll sockets directly, then the comparing have the possibility to be
>>>> reused by other kinds of dataplane.
>>>> 2) Implement traffic mirror/redirector as filter.
>>>> 3) Implement TCP seq rewriting as a filter.
>>>>
>>>> Then, in primary node, you need just a traffic mirror, which did:
>>>> - mirror ingress traffic to secondary node
>>>> - mirror outgress traffic to packet comparing thread
>>>>
>>>> And in secondary node, you need two filters:
>>>> - A TCP seq rewriter which adjust tcp sequence number.
>>>> - A traffic redirector which redirect packet from a socket as ingress
>>>> traffic, and redirect outgress traffic to the socket which could be
>>>> polled by remote packet comparing thread.
>>>>   Thoughts?
>>>>
>>>> Thanks
>>>>
>>>>> Thanks
>>>>> zhangchen
>>>
>>> Hi, Jason.
>>> We consider your suggestion to split/decouple
>>> the reusable parts from the codes.
>>> Due to filter plugin are traversed one by one in order
>>> we will split colo-proxy to three filters in each side.
>>>
>>> But in this plan,primary and secondary both have socket
>>> server,startup is a problem.
>> I believe this issue could be solved by reusing socket chardev.
>>
>>>
>>>  Primary qemu                                                      
>>> Secondary qemu
>>> +----------------------------------------------------------+      
>>> +-----------------------------------------------------------+
>>> | +-----------------------------------------------------+  |       | 
>>> +------------------------------------------------------+ |
>>> | |                                                     |  |       | 
>>> |                                                      | |
>>> | |                        guest                        |  |       | 
>>> |                        guest                         | |
>>> | |                                                     |  |       | 
>>> |                                                      | |
>>> | +-----------^--------------+--------------------------+  |       | 
>>> +---------------------+--------+-----------------------+ |
>>> |             |              |                             |      
>>> |                        ^        |                         |
>>> |             |              |                             |      
>>> |                        |        |                         |
>>> |             +-------------------------------------------------+ 
>>> |                        |        |                         |
>>> |  netfilter  |              |                             |    |  |  
>>> netfilter            |        |                         |
>>> | +-----------------------------------------------------+  |    |  | 
>>> +------------------------------------------------------+ |
>>> | |           |              |     filter excute order  |  |    |  | 
>>> |                     |        |  filter excute order  | |
>>> | |           |              |    +-------------------> |  |    |  | 
>>> |                     |        | +-------------------> | |
>>> | |           |              |                          |  |    |  | 
>>> |                     |        |   TCP                 | |
>>> | | +---------+-+     +------v-----+    +----+ +-----+  |  |    |  | 
>>> | +-----------+   +---+----+---v+rewriter+  +--------+ | |
>>> | | |           |     |            |    |            |  |  |    |  | 
>>> | |           |   |        |             |  |        | | |
>>> | | |  mirror   |     |  redirect  +---->  compare   |  |  |   
>>> +--------> mirror   +---> adjust |   adjust    +-->redirect| | |
>>> | | |  client   |     |  server    |    |            |  |  |       | 
>>> | |  server   |   | ack    |   seq       |  |client  | | |
>>> | | |           |     |            |    |            |  |  |       | 
>>> | |           |   |        |             |  |        | | |
>>> | | +----^------+     +----^-------+    +-----+------+  |  |       | 
>>> | +-----------+   +--------+-------------+  +----+---+ | |
>>> | |      |     tx          |      rx          |     rx  |  |       | 
>>> |            tx                        all       |  rx | |
>>> | +-----------------------------------------------------+  |       | 
>>> +------------------------------------------------------+ |
>>> |        |                
>>> +-------------------------------------------------------------------------------------------+      
>>> |
>>> |        |                                    |            |      
>>> |                                                           |
>>> +----------------------------------------------------------+      
>>> +-----------------------------------------------------------+
>>>          |                                    |
>>>          |guest receive                       |guest send
>>>          |                                    |
>>> +--------+------------------------------------v------------+
>>> |                                                          |
>>> |                                                          |
>>> |                         tap                             
>>> |                              NOTE: filter direction is rx/tx/all
>>> |                                                         
>>> |                              rx:receive packets sent to the netdev
>>> |                                                         
>>> |                              tx:receive packets sent by the netdev
>>> +----------------------------------------------------------+
>>>
>>>
>>>
>> I still like to decouple comparer from netfilter. It have two obvious
>> advantages:
>>
>> - make it can be reused by other dataplane (e.g vhost)
>> - secondary redirector could redirect rx to comparer on primary node
>> directly which simplify the design.
>>
>>>
>>>
>>>
>>> guest recv packet route
>>>
>>> primary
>>> tap --> mirror client filter
>>> mirror client will send packet to guest,at the
>>> same time, copy and forward packet to secondary
>>> mirror server.
>>>
>>> secondary
>>> mirror server filter --> TCP rewriter
>>> if recv packet is TCP packet,we will adjust ack
>>> and update TCP checksum, then send to secondary
>>> guest. else directly send to guest.
>>>
>>>
>>> guest send packet route
>>>
>>> primary
>>> guest --> redirect server filter
>>> redirect server filter recv primary guest packet
>>> but do nothing, just pass to next filter.
>>>
>>> redirect server filter --> compare filter
>>> compare filter recv primary guest packet then
>>> waiting secondary redirect packet to compare it.
>>> if packet same,send primary packet and clear secondary
>>> packet, else send primary packet and do
>>> checkpoint.
>>>
>>> secondary
>>> guest --> TCP rewriter filter
>>> if the packet is TCP packet,we will adjust seq
>>> and update TCP checksum. then send it to
>>> redirect client filter. else directly send to
>>> redirect client filter.
>>>
>>> redirect client filter --> redirect server filter
>>> forward packet to primary
>>>
>>>
>>> In the failover case (primary is down), the TCP rewriter will keep
>>> servicing the TCP connections that were established after the last
>>> checkpoint.
>>>
>>>
>>>
>>> How about this plan?
>> Sounds good.
>>
>> And there's indeed no need to differ client/server by reusing the socket
>> chardev. E.g:
>>
>> In primary node:
>>
>> ...
>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>> -netdev tap,id=hn0
>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
> Why mirrorer has indev? 


As I said in the previous mails, I would like to decouple packet
comparing from netfilter. You've already done most of this, since the
comparing is done in an independent thread. So the indev here is there to
mirror the packets sent by the guest to the packet comparing thread.
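To make the mirror/redirect distinction concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the names `make_mirror`, `make_redirector` and `run_chain` are made up for illustration and are not QEMU APIs, and a plain list stands in for a chardev. A mirror copies the packet and lets it continue; a redirector hands it over so later filters never see it.

```python
from typing import Callable, List, Optional

# Toy model only: a "filter" takes a packet and returns it to pass it on,
# or None to consume it. These names are illustrative, not QEMU APIs.
PacketFilter = Callable[[bytes], Optional[bytes]]


def make_mirror(outdev: List[bytes]) -> PacketFilter:
    """Copy every packet to outdev and let the original continue."""
    def handle(pkt: bytes) -> Optional[bytes]:
        outdev.append(bytes(pkt))
        return pkt
    return handle


def make_redirector(outdev: List[bytes]) -> PacketFilter:
    """Hand every packet to outdev; later filters never see it."""
    def handle(pkt: bytes) -> Optional[bytes]:
        outdev.append(pkt)
        return None
    return handle


def run_chain(chain: List[PacketFilter], pkt: bytes) -> Optional[bytes]:
    """Pass pkt through the chain, stopping once a filter consumes it."""
    current: Optional[bytes] = pkt
    for f in chain:
        current = f(current)
        if current is None:
            return None
    return current
```

With a mirror the packet still reaches the end of the chain, while a redirector placed first hides it from everything behind it.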

> I think we can use traffic-redirector to do it.
> The command line is:
> -netdev tap,id=hn0
> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
> In the comparer thread, we can use qemu_net_queue_send_iov() to send
> out the packet.
>
> Also, we can merge the socketdev comparer1 and mirrorer0.

It depends on whether or not packet comparing is done in a net filter
(which I would prefer to avoid).

>
> Thanks
> Wen Congyang
>
>> ...
>>
>> packet comparer compares the packets from two chardev: comparer0 and
>> comparer1.
>> traffic-mirrorer mirror tx to secondary node through chardev mirrorer0,
>> and mirror rx to packet comparer through chardev comparer0.
>>
>> In secondary node:
>>
>> ...
>> -chardev socket,id=redirector0,host=ip_primary,port=Y
>> -chardev socket,id=redirector1,host=ip_primary,port=Z
>> -netdev tap,id=hn0
>> -traffic-redirector netdev=hn0,id=r0,indev=redirector0,outdev=redirector1
>> -colo-rewriter netdev=hn0,id=c0
>> ...
>>
>> traffic-redirector redirect the rx traffic from primary node through
>> redirector0 and redirect the tx traffic to primary node through redirector1.
>> colo-rewriter rewrite seq number as a normal netfilter.


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-20 10:19                         ` Jason Wang
@ 2016-01-20 10:30                           ` Wen Congyang
  2016-01-22  3:15                             ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Wen Congyang @ 2016-01-20 10:30 UTC (permalink / raw)
  To: Jason Wang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang

On 01/20/2016 06:19 PM, Jason Wang wrote:
> 
> 
> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>
>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>> Sure.
>>>>>
>>>>> Two main comments/suggestions:
>>>>>
>>>>> - TCP analysis is missed in current version, maybe you point a git tree
>>>>> (or another version of RFC) to me for a better understanding of the
>>>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>>>
>>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>> this 99% since the work has been done in a thread. Just let the thread
>>>>> poll sockets directly, then the comparing have the possibility to be
>>>>> reused by other kinds of dataplane.
>>>>> 2) Implement traffic mirror/redirector as filter.
>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>
>>>>> Then, in primary node, you need just a traffic mirror, which did:
>>>>> - mirror ingress traffic to secondary node
>>>>> - mirror outgress traffic to packet comparing thread
>>>>>
>>>>> And in secondary node, you need two filters:
>>>>> - A TCP seq rewriter which adjust tcp sequence number.
>>>>> - A traffic redirector which redirect packet from a socket as ingress
>>>>> traffic, and redirect outgress traffic to the socket which could be
>>>>> polled by remote packet comparing thread.
>>>>>   Thoughts?
>>>>>
>>>>> Thanks
>>>>>
>>>>>> Thanks
>>>>>> zhangchen
>>>>
>>>> Hi, Jason.
>>>> We consider your suggestion to split/decouple
>>>> the reusable parts from the codes.
>>>> Due to filter plugin are traversed one by one in order
>>>> we will split colo-proxy to three filters in each side.
>>>>
>>>> But in this plan,primary and secondary both have socket
>>>> server,startup is a problem.
>>> I believe this issue could be solved by reusing socket chardev.
>>>
>>>>
>>>>  Primary qemu                                                      
>>>> Secondary qemu
>>>> +----------------------------------------------------------+      
>>>> +-----------------------------------------------------------+
>>>> | +-----------------------------------------------------+  |       | 
>>>> +------------------------------------------------------+ |
>>>> | |                                                     |  |       | 
>>>> |                                                      | |
>>>> | |                        guest                        |  |       | 
>>>> |                        guest                         | |
>>>> | |                                                     |  |       | 
>>>> |                                                      | |
>>>> | +-----------^--------------+--------------------------+  |       | 
>>>> +---------------------+--------+-----------------------+ |
>>>> |             |              |                             |      
>>>> |                        ^        |                         |
>>>> |             |              |                             |      
>>>> |                        |        |                         |
>>>> |             +-------------------------------------------------+ 
>>>> |                        |        |                         |
>>>> |  netfilter  |              |                             |    |  |  
>>>> netfilter            |        |                         |
>>>> | +-----------------------------------------------------+  |    |  | 
>>>> +------------------------------------------------------+ |
>>>> | |           |              |     filter excute order  |  |    |  | 
>>>> |                     |        |  filter excute order  | |
>>>> | |           |              |    +-------------------> |  |    |  | 
>>>> |                     |        | +-------------------> | |
>>>> | |           |              |                          |  |    |  | 
>>>> |                     |        |   TCP                 | |
>>>> | | +---------+-+     +------v-----+    +----+ +-----+  |  |    |  | 
>>>> | +-----------+   +---+----+---v+rewriter+  +--------+ | |
>>>> | | |           |     |            |    |            |  |  |    |  | 
>>>> | |           |   |        |             |  |        | | |
>>>> | | |  mirror   |     |  redirect  +---->  compare   |  |  |   
>>>> +--------> mirror   +---> adjust |   adjust    +-->redirect| | |
>>>> | | |  client   |     |  server    |    |            |  |  |       | 
>>>> | |  server   |   | ack    |   seq       |  |client  | | |
>>>> | | |           |     |            |    |            |  |  |       | 
>>>> | |           |   |        |             |  |        | | |
>>>> | | +----^------+     +----^-------+    +-----+------+  |  |       | 
>>>> | +-----------+   +--------+-------------+  +----+---+ | |
>>>> | |      |     tx          |      rx          |     rx  |  |       | 
>>>> |            tx                        all       |  rx | |
>>>> | +-----------------------------------------------------+  |       | 
>>>> +------------------------------------------------------+ |
>>>> |        |                
>>>> +-------------------------------------------------------------------------------------------+      
>>>> |
>>>> |        |                                    |            |      
>>>> |                                                           |
>>>> +----------------------------------------------------------+      
>>>> +-----------------------------------------------------------+
>>>>          |                                    |
>>>>          |guest receive                       |guest send
>>>>          |                                    |
>>>> +--------+------------------------------------v------------+
>>>> |                                                          |
>>>> |                                                          |
>>>> |                         tap                             
>>>> |                              NOTE: filter direction is rx/tx/all
>>>> |                                                         
>>>> |                              rx:receive packets sent to the netdev
>>>> |                                                         
>>>> |                              tx:receive packets sent by the netdev
>>>> +----------------------------------------------------------+
>>>>
>>>>
>>>>
>>> I still like to decouple comparer from netfilter. It have two obvious
>>> advantages:
>>>
>>> - make it can be reused by other dataplane (e.g vhost)
>>> - secondary redirector could redirect rx to comparer on primary node
>>> directly which simplify the design.
>>>
>>>>
>>>>
>>>>
>>>> guest recv packet route
>>>>
>>>> primary
>>>> tap --> mirror client filter
>>>> mirror client will send packet to guest,at the
>>>> same time, copy and forward packet to secondary
>>>> mirror server.
>>>>
>>>> secondary
>>>> mirror server filter --> TCP rewriter
>>>> if recv packet is TCP packet,we will adjust ack
>>>> and update TCP checksum, then send to secondary
>>>> guest. else directly send to guest.
>>>>
>>>>
>>>> guest send packet route
>>>>
>>>> primary
>>>> guest --> redirect server filter
>>>> redirect server filter recv primary guest packet
>>>> but do nothing, just pass to next filter.
>>>>
>>>> redirect server filter --> compare filter
>>>> compare filter recv primary guest packet then
>>>> waiting secondary redirect packet to compare it.
>>>> if packet same,send primary packet and clear secondary
>>>> packet, else send primary packet and do
>>>> checkpoint.
>>>>
>>>> secondary
>>>> guest --> TCP rewriter filter
>>>> if the packet is TCP packet,we will adjust seq
>>>> and update TCP checksum. then send it to
>>>> redirect client filter. else directly send to
>>>> redirect client filter.
>>>>
>>>> redirect client filter --> redirect server filter
>>>> forward packet to primary
>>>>
>>>>
>>>> In the failover case (primary is down), the TCP rewriter will keep
>>>> servicing the TCP connections that were established after the last
>>>> checkpoint.
>>>>
>>>>
>>>>
>>>> How about this plan?
>>> Sounds good.
>>>
>>> And there's indeed no need to differ client/server by reusing the socket
>>> chardev. E.g:
>>>
>>> In primary node:
>>>
>>> ...
>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>> -netdev tap,id=hn0
>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>> Why mirrorer has indev? 
> 
> 
> As I said in the previous mails, I would like to decouple packet
> comparing from netfilter. You've already done most of this, since the
> comparing is done in an independent thread. So the indev here is there to
> mirror the packets sent by the guest to the packet comparing thread.
> 
>> I think we can use traffic-redirector to do it.
>> The command line is:
>> -netdev tap,id=hn0
>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>> out the packet.
>>
>> Also, we can merge the socketdev comparer1 and mirrorer0.
> 
> It depends on whether or not packet comparing is done in a net filter
> (which I would prefer to avoid).

I mean that packet comparing is done in a thread, not in a net filter.
The flow of a packet sent from the guest:
1. traffic-redirector: we redirect the packet to comparer0, so the next
   filter will never see it.
2. comparing thread: reads it from the socket chardev comparer0.
3. The comparing thread calls qemu_net_queue_send_iov() to send it back
   to the netdev.
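The three steps above can be sketched as a small Python model of the comparing thread. This is an illustration, not the patch's implementation: the `Comparer` class, its callbacks and the in-memory queues are all hypothetical stand-ins for the socket chardevs and for qemu_net_queue_send_iov(). The mismatch policy (do a checkpoint, release every queued primary packet, drop the queued secondary packets) is my reading of the behavior described earlier in this thread, so treat it as an assumption.

```python
from collections import deque
from typing import Callable, Deque


class Comparer:
    """Toy model of the packet comparing thread (not QEMU code)."""

    def __init__(self,
                 send_to_netdev: Callable[[bytes], None],
                 do_checkpoint: Callable[[], None]) -> None:
        self.primary: Deque[bytes] = deque()    # packets read from comparer0
        self.secondary: Deque[bytes] = deque()  # packets read from comparer1
        self.send_to_netdev = send_to_netdev    # stands in for re-injection
        self.do_checkpoint = do_checkpoint      # stands in for COLO checkpoint

    def on_primary(self, pkt: bytes) -> None:
        self.primary.append(pkt)
        self._drain()

    def on_secondary(self, pkt: bytes) -> None:
        self.secondary.append(pkt)
        self._drain()

    def _drain(self) -> None:
        # Compare packets pairwise while both sides have something queued.
        while self.primary and self.secondary:
            p = self.primary.popleft()
            s = self.secondary.popleft()
            if p == s:
                # Identical output: release the primary packet.
                self.send_to_netdev(p)
                continue
            # Divergence (assumed policy): checkpoint, then release all
            # queued primary packets and discard the secondary ones.
            self.do_checkpoint()
            self.send_to_netdev(p)
            while self.primary:
                self.send_to_netdev(self.primary.popleft())
            self.secondary.clear()
```

A primary packet is held back until the matching secondary packet arrives, which is why the redirector has to keep it away from the rest of the filter chain in step 1.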

Thanks
Wen Congyang

> 
>>
>> Thanks
>> Wen Congyang
>>
>>> ...
>>>
>>> packet comparer compares the packets from two chardev: comparer0 and
>>> comparer1.
>>> traffic-mirrorer mirror tx to secondary node through chardev mirrorer0,
>>> and mirror rx to packet comparer through chardev comparer0.
>>>
>>> In secondary node:
>>>
>>> ...
>>> -chardev socket,id=redirector0,host=ip_primary,port=Y
>>> -chardev socket,id=redirector1,host=ip_primary,port=Z
>>> -netdev tap,id=hn0
>>> -traffic-redirector netdev=hn0,id=r0,indev=redirector0,outdev=redirector1
>>> -colo-rewriter netdev=hn0,id=c0
>>> ...
>>>
>>> traffic-redirector redirect the rx traffic from primary node through
>>> redirector0 and redirect the tx traffic to primary node through redirector1.
>>> colo-rewriter rewrite seq number as a normal netfilter.


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-20 10:03                             ` Jason Wang
@ 2016-01-20 10:34                               ` Wen Congyang
  2016-01-22  5:33                                 ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Wen Congyang @ 2016-01-20 10:34 UTC (permalink / raw)
  To: Jason Wang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong, Huang peng,
	Dr. David Alan Gilbert, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang

On 01/20/2016 06:03 PM, Jason Wang wrote:
> 
> 
> On 01/20/2016 05:49 PM, Wen Congyang wrote:
>> On 01/20/2016 05:20 PM, Jason Wang wrote:
>>>
>>> On 01/20/2016 03:44 PM, Wen Congyang wrote:
>>>>>> ...
>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>> -netdev tap,id=hn0
>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>> ...
>>>>>>
>>>>>> packet comparer compares the packets from two chardev: comparer0 and
>>>>>> comparer1.
>>>>>> traffic-mirrorer mirror tx to secondary node through chardev mirrorer0,
>>>>>> and mirror rx to packet comparer through chardev comparer0.
>>>>>>
>>>>>> In secondary node:
>>>>>>
>>>>>> ...
>>>>>> -chardev socket,id=redirector0,host=ip_primary,port=Y
>>>>>> -chardev socket,id=redirector1,host=ip_primary,port=Z
>>>>>> -netdev tap,id=hn0
>>>>>> -traffic-redirector netdev=hn0,id=r0,indev=redirector0,outdev=redirector1
>>>>>> -colo-rewriter netdev=hn0,id=c0
>>>>>> ...
>>>>>>
>>>>>> traffic-redirector redirect the rx traffic from primary node through
>>>>>> redirector0 and redirect the tx traffic to primary node through redirector1.
>>>>>> colo-rewriter rewrite seq number as a normal netfilter.
>>>> What are traffic-mirrorer and colo-comparer, traffic-redirector, colo-rewriter?
>>>> A netfilter driver?
>>> traffic-mirrorer/redirector is a type of netfilter that just
>>> mirrors/redirects packets between netdev and chardev (just the mirror
>>> client/server and redirect client/server in the above graph)
>>> colo-rewriter is a type of netfilter that does ack/seq adjustment (just the
>>> TCP rewriter in the above graph)
>>> colo-comparer is a thread object that does packet comparing (similar to
>>> "compare" in the above graph but not a netfilter)
>> Thanks. I have another question:
>> IIRC, both rx and tx packets walk through all netfilter objects in the same order.
>>
>> tx packet (sent to the guest): we want the redirector to handle it first
>> rx packet (sent from the guest): we want colo-rewriter to handle it first
>> Should we change the order or use two traffic-redirectors?
>>
>> Thanks
>> Wen Congyang
> 
> Interesting question.
> 
> Two redirectors sound OK, or maybe we can go through the rx filters in
> reverse order?

netdev <---> filter1 <----> filter2 <----> .... <----> emulated device <----> guest
So I think we can go through the rx filters in reverse order. But that changes
the existing behavior, so I am not sure we can do it.
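To illustrate the ordering question, here is a small Python sketch. It assumes the secondary-side chain is listed from the netdev side to the guest side as traffic-redirector then colo-rewriter. The function `walk_order` and its `reverse_rx` flag are hypothetical names for this example and do not exist in QEMU. "tx" and "rx" follow the usage in this thread (tx is toward the guest, rx is from the guest).

```python
from typing import List


def walk_order(chain: List[str], direction: str, reverse_rx: bool) -> List[str]:
    """Return the order in which filters would see a packet.

    chain is listed from the netdev side to the guest side.
    direction is "tx" (toward the guest) or "rx" (from the guest),
    following the usage in this thread.
    """
    if direction not in ("tx", "rx"):
        raise ValueError("direction must be 'tx' or 'rx'")
    if direction == "rx" and reverse_rx:
        # Proposed behavior: rx packets traverse the chain backwards.
        return list(reversed(chain))
    # Behavior as described in this thread: same order both ways.
    return list(chain)


secondary_chain = ["traffic-redirector", "colo-rewriter"]
```

With `reverse_rx=True` a single redirector is enough: tx packets reach the redirector first and rx packets reach colo-rewriter first. With `reverse_rx=False` the rx packets hit the redirector before the rewriter, which is the case that would need a second redirector.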

Thanks
Wen Congyang



* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-20 10:30                           ` Wen Congyang
@ 2016-01-22  3:15                             ` Jason Wang
  2016-01-22  3:28                               ` Wen Congyang
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2016-01-22  3:15 UTC (permalink / raw)
  To: Wen Congyang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang



On 01/20/2016 06:30 PM, Wen Congyang wrote:
> On 01/20/2016 06:19 PM, Jason Wang wrote:
>> > 
>> > 
>> > On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>> >> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>> >>>
>>>> >>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>> >>>>> Sure.
>>>>>> >>>>>
>>>>>> >>>>> Two main comments/suggestions:
>>>>>> >>>>>
>>>>>> >>>>> - TCP analysis is missed in current version, maybe you point a git tree
>>>>>> >>>>> (or another version of RFC) to me for a better understanding of the
>>>>>> >>>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>>>>> >>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>>>> >>>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>>>> >>>>>
>>>>>> >>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>> >>>>> this 99% since the work has been done in a thread. Just let the thread
>>>>>> >>>>> poll sockets directly, then the comparing have the possibility to be
>>>>>> >>>>> reused by other kinds of dataplane.
>>>>>> >>>>> 2) Implement traffic mirror/redirector as filter.
>>>>>> >>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>> >>>>>
>>>>>> >>>>> Then, in primary node, you need just a traffic mirror, which did:
>>>>>> >>>>> - mirror ingress traffic to secondary node
>>>>>> >>>>> - mirror outgress traffic to packet comparing thread
>>>>>> >>>>>
>>>>>> >>>>> And in secondary node, you need two filters:
>>>>>> >>>>> - A TCP seq rewriter which adjust tcp sequence number.
>>>>>> >>>>> - A traffic redirector which redirect packet from a socket as ingress
>>>>>> >>>>> traffic, and redirect outgress traffic to the socket which could be
>>>>>> >>>>> polled by remote packet comparing thread.
>>>>>> >>>>>   Thoughts?
>>>>>> >>>>>
>>>>>> >>>>> Thanks
>>>>>> >>>>>
>>>>>>> >>>>>> Thanks
>>>>>>> >>>>>> zhangchen
>>>>> >>>>
>>>>> >>>> Hi, Jason.
>>>>> >>>> We consider your suggestion to split/decouple
>>>>> >>>> the reusable parts from the codes.
>>>>> >>>> Due to filter plugin are traversed one by one in order
>>>>> >>>> we will split colo-proxy to three filters in each side.
>>>>> >>>>
>>>>> >>>> But in this plan,primary and secondary both have socket
>>>>> >>>> server,startup is a problem.
>>>> >>> I believe this issue could be solved by reusing socket chardev.
>>>> >>>
>>>>> >>>>
>>>>> >>>>  Primary qemu                                                      
>>>>> >>>> Secondary qemu
>>>>> >>>> +----------------------------------------------------------+      
>>>>> >>>> +-----------------------------------------------------------+
>>>>> >>>> | +-----------------------------------------------------+  |       | 
>>>>> >>>> +------------------------------------------------------+ |
>>>>> >>>> | |                                                     |  |       | 
>>>>> >>>> |                                                      | |
>>>>> >>>> | |                        guest                        |  |       | 
>>>>> >>>> |                        guest                         | |
>>>>> >>>> | |                                                     |  |       | 
>>>>> >>>> |                                                      | |
>>>>> >>>> | +-----------^--------------+--------------------------+  |       | 
>>>>> >>>> +---------------------+--------+-----------------------+ |
>>>>> >>>> |             |              |                             |      
>>>>> >>>> |                        ^        |                         |
>>>>> >>>> |             |              |                             |      
>>>>> >>>> |                        |        |                         |
>>>>> >>>> |             +-------------------------------------------------+ 
>>>>> >>>> |                        |        |                         |
>>>>> >>>> |  netfilter  |              |                             |    |  |  
>>>>> >>>> netfilter            |        |                         |
>>>>> >>>> | +-----------------------------------------------------+  |    |  | 
>>>>> >>>> +------------------------------------------------------+ |
>>>>> >>>> | |           |              |     filter excute order  |  |    |  | 
>>>>> >>>> |                     |        |  filter excute order  | |
>>>>> >>>> | |           |              |    +-------------------> |  |    |  | 
>>>>> >>>> |                     |        | +-------------------> | |
>>>>> >>>> | |           |              |                          |  |    |  | 
>>>>> >>>> |                     |        |   TCP                 | |
>>>>> >>>> | | +---------+-+     +------v-----+    +----+ +-----+  |  |    |  | 
>>>>> >>>> | +-----------+   +---+----+---v+rewriter+  +--------+ | |
>>>>> >>>> | | |           |     |            |    |            |  |  |    |  | 
>>>>> >>>> | |           |   |        |             |  |        | | |
>>>>> >>>> | | |  mirror   |     |  redirect  +---->  compare   |  |  |   
>>>>> >>>> +--------> mirror   +---> adjust |   adjust    +-->redirect| | |
>>>>> >>>> | | |  client   |     |  server    |    |            |  |  |       | 
>>>>> >>>> | |  server   |   | ack    |   seq       |  |client  | | |
>>>>> >>>> | | |           |     |            |    |            |  |  |       | 
>>>>> >>>> | |           |   |        |             |  |        | | |
>>>>> >>>> | | +----^------+     +----^-------+    +-----+------+  |  |       | 
>>>>> >>>> | +-----------+   +--------+-------------+  +----+---+ | |
>>>>> >>>> | |      |     tx          |      rx          |     rx  |  |       | 
>>>>> >>>> |            tx                        all       |  rx | |
>>>>> >>>> | +-----------------------------------------------------+  |       | 
>>>>> >>>> +------------------------------------------------------+ |
>>>>> >>>> |        |                
>>>>> >>>> +-------------------------------------------------------------------------------------------+      
>>>>> >>>> |
>>>>> >>>> |        |                                    |            |      
>>>>> >>>> |                                                           |
>>>>> >>>> +----------------------------------------------------------+      
>>>>> >>>> +-----------------------------------------------------------+
>>>>> >>>>          |                                    |
>>>>> >>>>          |guest receive                       |guest send
>>>>> >>>>          |                                    |
>>>>> >>>> +--------+------------------------------------v------------+
>>>>> >>>> |                                                          |
>>>>> >>>> |                                                          |
>>>>> >>>> |                         tap                             
>>>>> >>>> |                              NOTE: filter direction is rx/tx/all
>>>>> >>>> |                                                         
>>>>> >>>> |                              rx:receive packets sent to the netdev
>>>>> >>>> |                                                         
>>>>> >>>> |                              tx:receive packets sent by the netdev
>>>>> >>>> +----------------------------------------------------------+
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>> >>> I still like to decouple comparer from netfilter. It have two obvious
>>>> >>> advantages:
>>>> >>>
>>>> >>> - make it can be reused by other dataplane (e.g vhost)
>>>> >>> - secondary redirector could redirect rx to comparer on primary node
>>>> >>> directly which simplify the design.
>>>> >>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> guest recv packet route
>>>>> >>>>
>>>>> >>>> primary
>>>>> >>>> tap --> mirror client filter
>>>>> >>>> mirror client will send packet to guest,at the
>>>>> >>>> same time, copy and forward packet to secondary
>>>>> >>>> mirror server.
>>>>> >>>>
>>>>> >>>> secondary
>>>>> >>>> mirror server filter --> TCP rewriter
>>>>> >>>> if recv packet is TCP packet,we will adjust ack
>>>>> >>>> and update TCP checksum, then send to secondary
>>>>> >>>> guest. else directly send to guest.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> guest send packet route
>>>>> >>>>
>>>>> >>>> primary
>>>>> >>>> guest --> redirect server filter
>>>>> >>>> redirect server filter recv primary guest packet
>>>>> >>>> but do nothing, just pass to next filter.
>>>>> >>>>
>>>>> >>>> redirect server filter --> compare filter
>>>>> >>>> compare filter recv primary guest packet then
>>>>> >>>> waiting secondary redirect packet to compare it.
>>>>> >>>> if packet same,send primary packet and clear secondary
>>>>> >>>> packet, else send primary packet and do
>>>>> >>>> checkpoint.
>>>>> >>>>
>>>>> >>>> secondary
>>>>> >>>> guest --> TCP rewriter filter
>>>>> >>>> if the packet is TCP packet,we will adjust seq
>>>>> >>>> and update TCP checksum. then send it to
>>>>> >>>> redirect client filter. else directly send to
>>>>> >>>> redirect client filter.
>>>>> >>>>
>>>>> >>>> redirect client filter --> redirect server filter
>>>>> >>>> forward packet to primary
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> In failover scene(primary is down), the TCP rewriter will keep
>>>>> >>>> servicing
>>>>> >>>> for the TCP connection which is established after the last checkpoint.
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> How about this plan?
>>>> >>> Sounds good.
>>>> >>>
>>>> >>> And there's indeed no need to differ client/server by reusing the socket
>>>> >>> chardev. E.g:
>>>> >>>
>>>> >>> In primary node:
>>>> >>>
>>>> >>> ...
>>>> >>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>> >>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>> >>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>> >>> -netdev tap,id=hn0
>>>> >>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>> >>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>> >> Why mirrorer has indev? 
>> > 
>> > 
>> > As I said in the previous mails. I would like to decouple packet
>> > comparing from netfilter. You've already done most of this since the
>> > comparing is done in an independent thread. So the indev here is to
>> > mirror the packet sent by guest to the packet comparing thread.
>> > 
>>> >> I think we can use traffic-redirector to do it.
>>> >> The command line is:
>>> >> -netdev tap,id=hn0
>>> >> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>> >> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>> >> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>> >> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>> >> out the packet.
>>> >>
>>> >> Also, we can merge the socketdev comparer1 and mirrorer0.
>> > 
>> > It depends on whether or not packet comparing was done in a net filter
>> > (which I prefer not).
> I mean that: packet comparing is done in a thread, not a net filter.
> The flow of the packet sent from guest:
> 1. traffic-redirector: we will redirect the packet to comparer0, the next
>    filter will never see it.
> 2. comparing thread: read it from socket chardev comparer0
> 3. call qemu_net_queue_send_iov() to send it back to the netdev.

Ok, looks like I missed something.

My suggestion tries its best not to tie the packet comparing to a filter
or netdev. But your suggestion still needs it to be coupled with a
netdev. Are there any advantages to doing this (or is there a reason the
packet must be sent to the netdev after comparing)? If not, why not just
mirror (duplicate the packet and forward it to a chardev, and pass the
original packet to the next filter or netdev)? And doing
qemu_net_queue_send_iov() on a netdev from another thread may need some
synchronization with the iothread.
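
To make the difference concrete, here is a minimal Python sketch of
mirroring versus redirecting. It is not QEMU code: the function names are
made up for illustration, a plain list stands in for the socket chardev,
and a callback stands in for "the next filter or netdev".

```python
from typing import Callable, List

# Hypothetical stand-ins: a list plays the role of the socket chardev and
# a callback plays the role of "the next filter or netdev".
Chardev = List[bytes]
NextHop = Callable[[bytes], None]


def mirror(packet: bytes, chardev: Chardev, next_hop: NextHop) -> None:
    """Duplicate the packet to the chardev and still pass the original on."""
    chardev.append(bytes(packet))
    next_hop(packet)


def redirect(packet: bytes, chardev: Chardev, next_hop: NextHop) -> None:
    """Hand the packet to the chardev only; the next hop never sees it."""
    chardev.append(packet)


mirrored: Chardev = []
seen_after_mirror: List[bytes] = []
mirror(b"pkt", mirrored, seen_after_mirror.append)

redirected: Chardev = []
seen_after_redirect: List[bytes] = []
redirect(b"pkt", redirected, seen_after_redirect.append)
```

With redirect, whoever reads the chardev (here the comparing thread)
becomes responsible for sending the packet onward, which is where the
synchronization question comes from.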

>
> Thanks
> Wen Congyang
>
>> > 


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-22  3:15                             ` Jason Wang
@ 2016-01-22  3:28                               ` Wen Congyang
  2016-01-22  5:41                                 ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Wen Congyang @ 2016-01-22  3:28 UTC (permalink / raw)
  To: Jason Wang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang

On 01/22/2016 11:15 AM, Jason Wang wrote:
> 
> 
> On 01/20/2016 06:30 PM, Wen Congyang wrote:
>> On 01/20/2016 06:19 PM, Jason Wang wrote:
>>>>
>>>>
>>>> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>>>>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>>>>>>
>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>> Sure.
>>>>>>>>>>>>
>>>>>>>>>>>> Two main comments/suggestions:
>>>>>>>>>>>>
>>>>>>>>>>>> - TCP analysis is missed in current version, maybe you point a git tree
>>>>>>>>>>>> (or another version of RFC) to me for a better understanding of the
>>>>>>>>>>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>>>>>>>>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>>>>>>>>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>>>>>>>> this 99% since the work has been done in a thread. Just let the thread
>>>>>>>>>>>> poll sockets directly, then the comparing have the possibility to be
>>>>>>>>>>>> reused by other kinds of dataplane.
>>>>>>>>>>>> 2) Implement traffic mirror/redirector as filter.
>>>>>>>>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>>>>>>>>
>>>>>>>>>>>> Then, in primary node, you need just a traffic mirror, which did:
>>>>>>>>>>>> - mirror ingress traffic to secondary node
>>>>>>>>>>>> - mirror outgress traffic to packet comparing thread
>>>>>>>>>>>>
>>>>>>>>>>>> And in secondary node, you need two filters:
>>>>>>>>>>>> - A TCP seq rewriter which adjust tcp sequence number.
>>>>>>>>>>>> - A traffic redirector which redirect packet from a socket as ingress
>>>>>>>>>>>> traffic, and redirect outgress traffic to the socket which could be
>>>>>>>>>>>> polled by remote packet comparing thread.
>>>>>>>>>>>>   Thoughts?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> zhangchen
>>>>>>>>>>
>>>>>>>>>> Hi, Jason.
>>>>>>>>>> We consider your suggestion to split/decouple
>>>>>>>>>> the reusable parts from the codes.
>>>>>>>>>> Due to filter plugin are traversed one by one in order
>>>>>>>>>> we will split colo-proxy to three filters in each side.
>>>>>>>>>>
>>>>>>>>>> But in this plan,primary and secondary both have socket
>>>>>>>>>> server,startup is a problem.
>>>>>>>> I believe this issue could be solved by reusing socket chardev.
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  Primary qemu                                                      
>>>>>>>>>> Secondary qemu
>>>>>>>>>> +----------------------------------------------------------+      
>>>>>>>>>> +-----------------------------------------------------------+
>>>>>>>>>> | +-----------------------------------------------------+  |       | 
>>>>>>>>>> +------------------------------------------------------+ |
>>>>>>>>>> | |                                                     |  |       | 
>>>>>>>>>> |                                                      | |
>>>>>>>>>> | |                        guest                        |  |       | 
>>>>>>>>>> |                        guest                         | |
>>>>>>>>>> | |                                                     |  |       | 
>>>>>>>>>> |                                                      | |
>>>>>>>>>> | +-----------^--------------+--------------------------+  |       | 
>>>>>>>>>> +---------------------+--------+-----------------------+ |
>>>>>>>>>> |             |              |                             |      
>>>>>>>>>> |                        ^        |                         |
>>>>>>>>>> |             |              |                             |      
>>>>>>>>>> |                        |        |                         |
>>>>>>>>>> |             +-------------------------------------------------+ 
>>>>>>>>>> |                        |        |                         |
>>>>>>>>>> |  netfilter  |              |                             |    |  |  
>>>>>>>>>> netfilter            |        |                         |
>>>>>>>>>> | +-----------------------------------------------------+  |    |  | 
>>>>>>>>>> +------------------------------------------------------+ |
>>>>>>>>>> | |           |              |     filter excute order  |  |    |  | 
>>>>>>>>>> |                     |        |  filter excute order  | |
>>>>>>>>>> | |           |              |    +-------------------> |  |    |  | 
>>>>>>>>>> |                     |        | +-------------------> | |
>>>>>>>>>> | |           |              |                          |  |    |  | 
>>>>>>>>>> |                     |        |   TCP                 | |
>>>>>>>>>> | | +---------+-+     +------v-----+    +----+ +-----+  |  |    |  | 
>>>>>>>>>> | +-----------+   +---+----+---v+rewriter+  +--------+ | |
>>>>>>>>>> | | |           |     |            |    |            |  |  |    |  | 
>>>>>>>>>> | |           |   |        |             |  |        | | |
>>>>>>>>>> | | |  mirror   |     |  redirect  +---->  compare   |  |  |   
>>>>>>>>>> +--------> mirror   +---> adjust |   adjust    +-->redirect| | |
>>>>>>>>>> | | |  client   |     |  server    |    |            |  |  |       | 
>>>>>>>>>> | |  server   |   | ack    |   seq       |  |client  | | |
>>>>>>>>>> | | |           |     |            |    |            |  |  |       | 
>>>>>>>>>> | |           |   |        |             |  |        | | |
>>>>>>>>>> | | +----^------+     +----^-------+    +-----+------+  |  |       | 
>>>>>>>>>> | +-----------+   +--------+-------------+  +----+---+ | |
>>>>>>>>>> | |      |     tx          |      rx          |     rx  |  |       | 
>>>>>>>>>> |            tx                        all       |  rx | |
>>>>>>>>>> | +-----------------------------------------------------+  |       | 
>>>>>>>>>> +------------------------------------------------------+ |
>>>>>>>>>> |        |                
>>>>>>>>>> +-------------------------------------------------------------------------------------------+      
>>>>>>>>>> |
>>>>>>>>>> |        |                                    |            |      
>>>>>>>>>> |                                                           |
>>>>>>>>>> +----------------------------------------------------------+      
>>>>>>>>>> +-----------------------------------------------------------+
>>>>>>>>>>          |                                    |
>>>>>>>>>>          |guest receive                       |guest send
>>>>>>>>>>          |                                    |
>>>>>>>>>> +--------+------------------------------------v------------+
>>>>>>>>>> |                                                          |
>>>>>>>>>> |                                                          |
>>>>>>>>>> |                         tap                             
>>>>>>>>>> |                              NOTE: filter direction is rx/tx/all
>>>>>>>>>> |                                                         
>>>>>>>>>> |                              rx:receive packets sent to the netdev
>>>>>>>>>> |                                                         
>>>>>>>>>> |                              tx:receive packets sent by the netdev
>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> I still like to decouple comparer from netfilter. It have two obvious
>>>>>>>> advantages:
>>>>>>>>
>>>>>>>> - make it can be reused by other dataplane (e.g vhost)
>>>>>>>> - secondary redirector could redirect rx to comparer on primary node
>>>>>>>> directly which simplify the design.
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> guest recv packet route
>>>>>>>>>>
>>>>>>>>>> primary
>>>>>>>>>> tap --> mirror client filter
>>>>>>>>>> mirror client will send packet to guest,at the
>>>>>>>>>> same time, copy and forward packet to secondary
>>>>>>>>>> mirror server.
>>>>>>>>>>
>>>>>>>>>> secondary
>>>>>>>>>> mirror server filter --> TCP rewriter
>>>>>>>>>> if recv packet is TCP packet,we will adjust ack
>>>>>>>>>> and update TCP checksum, then send to secondary
>>>>>>>>>> guest. else directly send to guest.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> guest send packet route
>>>>>>>>>>
>>>>>>>>>> primary
>>>>>>>>>> guest --> redirect server filter
>>>>>>>>>> redirect server filter recv primary guest packet
>>>>>>>>>> but do nothing, just pass to next filter.
>>>>>>>>>>
>>>>>>>>>> redirect server filter --> compare filter
>>>>>>>>>> compare filter recv primary guest packet then
>>>>>>>>>> waiting secondary redirect packet to compare it.
>>>>>>>>>> if packet same,send primary packet and clear secondary
>>>>>>>>>> packet, else send primary packet and do
>>>>>>>>>> checkpoint.
>>>>>>>>>>
>>>>>>>>>> secondary
>>>>>>>>>> guest --> TCP rewriter filter
>>>>>>>>>> if the packet is TCP packet,we will adjust seq
>>>>>>>>>> and update TCP checksum. then send it to
>>>>>>>>>> redirect client filter. else directly send to
>>>>>>>>>> redirect client filter.
>>>>>>>>>>
>>>>>>>>>> redirect client filter --> redirect server filter
>>>>>>>>>> forward packet to primary
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In failover scene(primary is down), the TCP rewriter will keep
>>>>>>>>>> servicing
>>>>>>>>>> for the TCP connection which is established after the last checkpoint.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> How about this plan?
>>>>>>>> Sounds good.
>>>>>>>>
>>>>>>>> And there's indeed no need to differ client/server by reusing the socket
>>>>>>>> chardev. E.g:
>>>>>>>>
>>>>>>>> In primary node:
>>>>>>>>
>>>>>>>> ...
>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>> -netdev tap,id=hn0
>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>> Why mirrorer has indev? 
>>>>
>>>>
>>>> As I said in the previous mails. I would like to decouple packet
>>>> comparing from netfilter. You've already done most of this since the
>>>> comparing is done in an independent thread. So the indev here is to
>>>> mirror the packet sent by guest to the packet comparing thread.
>>>>
>>>>>> I think we can use traffic-redirector to do it.
>>>>>> The command line is:
>>>>>> -netdev tap,id=hn0
>>>>>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>>>>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>>>>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>>>>> out the packet.
>>>>>>
>>>>>> Also, we can merge the socketdev comparer1 and mirrorer0.
>>>>
>>>> It depends on whether or not packet comparing was done in a net filter
>>>> (which I prefer not).
>> I mean that: packet comparing is done in a thread, not a net filter.
>> The flow of the packet sent from guest:
>> 1. traffic-redirector: we will redirect the packet to comparer0, the next
>>    filter will never see it.
>> 2. comparing thread: read it from socket chardev comparer0
>> 3. call qemu_net_queue_send_iov() to send it back to the netdev.
> 
> Ok, looks like I missed something.
> 
> My suggestion tries its best not to tie the packet comparing to a filter
> or netdev. But your suggestion still needs it to be coupled with a
> netdev. Are there any advantages to doing this (or is there a reason the
> packet must be sent to the netdev after comparing)? If not, why not just

Yes, the packet must be sent to the netdev after comparing. If the
primary packet and the secondary packet are the same (they contain the
same application-level data), we drop the secondary packet and send the
primary packet to the netdev. Otherwise, we sync the state.

> mirror (duplicate the packet and forward it to a chardev, and pass the
> original packet to the next filter or netdev)? And doing

We cannot send the packet to the netdev before comparing. We need to keep
the connection after failover.
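
A rough sketch of that rule in Python. This is not the actual colo-proxy
code: the class and method names are hypothetical, payloads are compared
as raw bytes, and a counter stands in for the state sync. Flushing every
queued primary packet on a mismatch follows the description in the cover
letter; the exact ordering of sync and flush is an assumption.

```python
from collections import deque
from typing import Deque, List


class ComparerSketch:
    """Hold primary packets until the matching secondary packet arrives."""

    def __init__(self) -> None:
        self.primary: Deque[bytes] = deque()
        self.secondary: Deque[bytes] = deque()
        self.released: List[bytes] = []  # packets handed to the netdev
        self.checkpoints = 0             # stand-in for "sync the state"

    def on_primary(self, payload: bytes) -> None:
        self.primary.append(payload)
        self._drain()

    def on_secondary(self, payload: bytes) -> None:
        self.secondary.append(payload)
        self._drain()

    def _drain(self) -> None:
        # Nothing is released until both sides have produced a packet.
        while self.primary and self.secondary:
            p = self.primary.popleft()
            s = self.secondary.popleft()
            if p == s:
                # Same data: drop the secondary copy, send the primary one.
                self.released.append(p)
                continue
            # Different data: sync the state, then flush what is queued
            # on the primary side and forget the secondary packets.
            self.checkpoints += 1
            self.released.append(p)
            self.released.extend(self.primary)
            self.primary.clear()
            self.secondary.clear()


c = ComparerSketch()
c.on_primary(b"a")
held_before_compare = list(c.released)
c.on_secondary(b"a")
c.on_primary(b"b")
c.on_primary(b"c")
c.on_secondary(b"x")
```

The point of the sketch is that a primary packet never reaches the netdev
before it has been compared (or a sync has happened).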

Thanks
Wen Congyang

> qemu_net_queue_send_iov() on a netdev from another thread may need some
> synchronization with the iothread.
> 
>>
>> Thanks
>> Wen Congyang
>>
>>>>
> 
> 
> 
> .
> 


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-20 10:34                               ` Wen Congyang
@ 2016-01-22  5:33                                 ` Jason Wang
  2016-01-22  5:57                                   ` Wen Congyang
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2016-01-22  5:33 UTC (permalink / raw)
  To: Wen Congyang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong, Huang peng,
	Dr. David Alan Gilbert, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang



On 01/20/2016 06:34 PM, Wen Congyang wrote:
> On 01/20/2016 06:03 PM, Jason Wang wrote:
>>
>> On 01/20/2016 05:49 PM, Wen Congyang wrote:
>>> On 01/20/2016 05:20 PM, Jason Wang wrote:
>>>> On 01/20/2016 03:44 PM, Wen Congyang wrote:
>>>>>>> ...
>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>> -netdev tap,id=hn0
>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>> ...
>>>>>>>
>>>>>>> packet comparer compares the packets from two chardev: comparer0 and
>>>>>>> comparer1.
>>>>>>> traffic-mirrorer mirror tx to secondary node through chardev mirrorer0,
>>>>>>> and mirror rx to packet comparer through chardev comparer0.
>>>>>>>
>>>>>>> In secondary node:
>>>>>>>
>>>>>>> ...
>>>>>>> -chardev socket,id=redirector0,host=ip_primary,port=Y
>>>>>>> -chardev socket,id=redirector1,host=ip_primary,port=Z
>>>>>>> -netdev tap,id=hn0
>>>>>>> -traffic-redirector netdev=hn0,id=r0,indev=redirector0,outdev=redirector1
>>>>>>> -colo-rewriter netdev=hn0,id=c0
>>>>>>> ...
>>>>>>>
>>>>>>> traffic-redirector redirect the rx traffic from primary node through
>>>>>>> redirector0 and redirect the tx traffic to primary node through redirector1.
>>>>>>> colo-rewriter rewrite seq number as a normal netfilter.
>>>>> What are traffic-mirrorer and colo-comparer, traffic-redirector, colo-rewriter?
>>>>> A netfilter driver?
>>>> traffic-mirrorer/redirector is a type of netfilter that just
>>>> mirror/redirect packets between netdev and chardev (just the mirror
>>>> client/server and redirect client/server in the above graph)
>>>> colo-rewriter is a type of netfilter that did ack/seq adjust (just the
>>>> TCP rewriter in the above graph)
>>>> colo-comparer is a thread object that did packet comparing (similar to
>>>> "compare" in the above graph but not a netfilter)
>>> Thanks. I have another question:
>>> IIRC, both rx and tx packets walk through all netfilter objects in the same order.
>>>
>>> tx packet(sent to the guest): we want that redirector handles it first
>>> rx packet(sent from the guest): we want that colo-rewriter handles it first
>>> Change the order or use two traffic-redirectors?
>>>
>>> Thanks
>>> Wen Congyang
>> Interesting question.
>>
>> Two redirectors sounds ok or maybe we can go through rx filters in a
>> reverse order?
> netdev <---> filter1 <----> filter2 <----> .... <----> emulated device <----> guest
> So I think we can go through rx filters in a reverse order. But it changes
> the behavior. So I am not sure if we can do it.

I think we can. Neither the dump filter nor the buffer filter requires a
strict order, so this is a good time and chance to do it.
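
As a sketch of the idea (plain Python, not QEMU's net code; the function
name is hypothetical and the filter names are the ones proposed earlier in
this thread), the filter list is kept in a single order, netdev side
first, and is walked forward for tx and backward for rx:

```python
from typing import List


def traversal_order(filters: List[str], direction: str) -> List[str]:
    """Return the order in which filters would see a packet.

    `filters` is listed netdev side first. "tx" is a packet sent by the
    netdev (towards the guest); "rx" is a packet sent to the netdev
    (from the guest).
    """
    if direction == "tx":
        return list(filters)
    if direction == "rx":
        return list(reversed(filters))
    raise ValueError("direction must be 'tx' or 'rx'")


secondary_filters = ["traffic-redirector", "colo-rewriter"]
to_guest = traversal_order(secondary_filters, "tx")
from_guest = traversal_order(secondary_filters, "rx")
```

With this, the redirector handles packets going to the guest first and
the rewriter handles packets coming from the guest first, without needing
two redirectors.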

>
> Thanks
> Wen Congyang
>
>>
>> .
>>
>
>


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-22  3:28                               ` Wen Congyang
@ 2016-01-22  5:41                                 ` Jason Wang
  2016-01-22  5:56                                   ` Wen Congyang
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2016-01-22  5:41 UTC (permalink / raw)
  To: Wen Congyang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang



On 01/22/2016 11:28 AM, Wen Congyang wrote:
> On 01/22/2016 11:15 AM, Jason Wang wrote:
>>
>> On 01/20/2016 06:30 PM, Wen Congyang wrote:
>>> On 01/20/2016 06:19 PM, Jason Wang wrote:
>>>>>
>>>>> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>>>>>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Two main comments/suggestions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - TCP analysis is missed in current version, maybe you point a git tree
>>>>>>>>>>>>> (or another version of RFC) to me for a better understanding of the
>>>>>>>>>>>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>>>>>>>>>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>>>>>>>>>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>>>>>>>>> this 99% since the work has been done in a thread. Just let the thread
>>>>>>>>>>>>> poll sockets directly, then the comparing have the possibility to be
>>>>>>>>>>>>> reused by other kinds of dataplane.
>>>>>>>>>>>>> 2) Implement traffic mirror/redirector as filter.
>>>>>>>>>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then, in primary node, you need just a traffic mirror, which did:
>>>>>>>>>>>>> - mirror ingress traffic to secondary node
>>>>>>>>>>>>> - mirror outgress traffic to packet comparing thread
>>>>>>>>>>>>>
>>>>>>>>>>>>> And in secondary node, you need two filters:
>>>>>>>>>>>>> - A TCP seq rewriter which adjust tcp sequence number.
>>>>>>>>>>>>> - A traffic redirector which redirect packet from a socket as ingress
>>>>>>>>>>>>> traffic, and redirect outgress traffic to the socket which could be
>>>>>>>>>>>>> polled by remote packet comparing thread.
>>>>>>>>>>>>>   Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> zhangchen
>>>>>>>>>>> Hi, Jason.
>>>>>>>>>>> We consider your suggestion to split/decouple
>>>>>>>>>>> the reusable parts from the codes.
>>>>>>>>>>> Due to filter plugin are traversed one by one in order
>>>>>>>>>>> we will split colo-proxy to three filters in each side.
>>>>>>>>>>>
>>>>>>>>>>> But in this plan,primary and secondary both have socket
>>>>>>>>>>> server,startup is a problem.
>>>>>>>>> I believe this issue could be solved by reusing socket chardev.
>>>>>>>>>
>>>>>>>>>>>  Primary qemu                                                      
>>>>>>>>>>> Secondary qemu
>>>>>>>>>>> +----------------------------------------------------------+      
>>>>>>>>>>> +-----------------------------------------------------------+
>>>>>>>>>>> | +-----------------------------------------------------+  |       | 
>>>>>>>>>>> +------------------------------------------------------+ |
>>>>>>>>>>> | |                                                     |  |       | 
>>>>>>>>>>> |                                                      | |
>>>>>>>>>>> | |                        guest                        |  |       | 
>>>>>>>>>>> |                        guest                         | |
>>>>>>>>>>> | |                                                     |  |       | 
>>>>>>>>>>> |                                                      | |
>>>>>>>>>>> | +-----------^--------------+--------------------------+  |       | 
>>>>>>>>>>> +---------------------+--------+-----------------------+ |
>>>>>>>>>>> |             |              |                             |      
>>>>>>>>>>> |                        ^        |                         |
>>>>>>>>>>> |             |              |                             |      
>>>>>>>>>>> |                        |        |                         |
>>>>>>>>>>> |             +-------------------------------------------------+ 
>>>>>>>>>>> |                        |        |                         |
>>>>>>>>>>> |  netfilter  |              |                             |    |  |  
>>>>>>>>>>> netfilter            |        |                         |
>>>>>>>>>>> | +-----------------------------------------------------+  |    |  | 
>>>>>>>>>>> +------------------------------------------------------+ |
>>>>>>>>>>> | |           |              |     filter excute order  |  |    |  | 
>>>>>>>>>>> |                     |        |  filter excute order  | |
>>>>>>>>>>> | |           |              |    +-------------------> |  |    |  | 
>>>>>>>>>>> |                     |        | +-------------------> | |
>>>>>>>>>>> | |           |              |                          |  |    |  | 
>>>>>>>>>>> |                     |        |   TCP                 | |
>>>>>>>>>>> | | +---------+-+     +------v-----+    +----+ +-----+  |  |    |  | 
>>>>>>>>>>> | +-----------+   +---+----+---v+rewriter+  +--------+ | |
>>>>>>>>>>> | | |           |     |            |    |            |  |  |    |  | 
>>>>>>>>>>> | |           |   |        |             |  |        | | |
>>>>>>>>>>> | | |  mirror   |     |  redirect  +---->  compare   |  |  |   
>>>>>>>>>>> +--------> mirror   +---> adjust |   adjust    +-->redirect| | |
>>>>>>>>>>> | | |  client   |     |  server    |    |            |  |  |       | 
>>>>>>>>>>> | |  server   |   | ack    |   seq       |  |client  | | |
>>>>>>>>>>> | | |           |     |            |    |            |  |  |       | 
>>>>>>>>>>> | |           |   |        |             |  |        | | |
>>>>>>>>>>> | | +----^------+     +----^-------+    +-----+------+  |  |       | 
>>>>>>>>>>> | +-----------+   +--------+-------------+  +----+---+ | |
>>>>>>>>>>> | |      |     tx          |      rx          |     rx  |  |       | 
>>>>>>>>>>> |            tx                        all       |  rx | |
>>>>>>>>>>> | +-----------------------------------------------------+  |       | 
>>>>>>>>>>> +------------------------------------------------------+ |
>>>>>>>>>>> |        |                
>>>>>>>>>>> +-------------------------------------------------------------------------------------------+      
>>>>>>>>>>> |
>>>>>>>>>>> |        |                                    |            |      
>>>>>>>>>>> |                                                           |
>>>>>>>>>>> +----------------------------------------------------------+      
>>>>>>>>>>> +-----------------------------------------------------------+
>>>>>>>>>>>          |                                    |
>>>>>>>>>>>          |guest receive                       |guest send
>>>>>>>>>>>          |                                    |
>>>>>>>>>>> +--------+------------------------------------v------------+
>>>>>>>>>>> |                                                          |
>>>>>>>>>>> |                                                          |
>>>>>>>>>>> |                         tap                             
>>>>>>>>>>> |                              NOTE: filter direction is rx/tx/all
>>>>>>>>>>> |                                                         
>>>>>>>>>>> |                              rx:receive packets sent to the netdev
>>>>>>>>>>> |                                                         
>>>>>>>>>>> |                              tx:receive packets sent by the netdev
>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> I still like to decouple comparer from netfilter. It have two obvious
>>>>>>>>> advantages:
>>>>>>>>>
>>>>>>>>> - make it can be reused by other dataplane (e.g vhost)
>>>>>>>>> - secondary redirector could redirect rx to comparer on primary node
>>>>>>>>> directly which simplify the design.
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> guest recv packet route
>>>>>>>>>>>
>>>>>>>>>>> primary
>>>>>>>>>>> tap --> mirror client filter
>>>>>>>>>>> mirror client will send packet to guest,at the
>>>>>>>>>>> same time, copy and forward packet to secondary
>>>>>>>>>>> mirror server.
>>>>>>>>>>>
>>>>>>>>>>> secondary
>>>>>>>>>>> mirror server filter --> TCP rewriter
>>>>>>>>>>> if recv packet is TCP packet,we will adjust ack
>>>>>>>>>>> and update TCP checksum, then send to secondary
>>>>>>>>>>> guest. else directly send to guest.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> guest send packet route
>>>>>>>>>>>
>>>>>>>>>>> primary
>>>>>>>>>>> guest --> redirect server filter
>>>>>>>>>>> redirect server filter recv primary guest packet
>>>>>>>>>>> but do nothing, just pass to next filter.
>>>>>>>>>>>
>>>>>>>>>>> redirect server filter --> compare filter
>>>>>>>>>>> compare filter recv primary guest packet then
>>>>>>>>>>> waiting scondary redirect packet to compare it.
>>>>>>>>>>> if packet same,send primary packet and clear secondary
>>>>>>>>>>> packet, else send primary packet and do
>>>>>>>>>>> checkpoint.
>>>>>>>>>>>
>>>>>>>>>>> secondary
>>>>>>>>>>> guest --> TCP rewriter filter
>>>>>>>>>>> if the packet is TCP packet,we will adjust seq
>>>>>>>>>>> and update TCP checksum. then send it to
>>>>>>>>>>> redirect client filter. else directly send to
>>>>>>>>>>> redirect client filter.
>>>>>>>>>>>
>>>>>>>>>>> redirect client filter --> redirect server filter
>>>>>>>>>>> forward packet to primary
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In failover scene(primary is down), the TCP rewriter will keep
>>>>>>>>>>> servicing
>>>>>>>>>>> for the TCP connection which is established after the last checkpoint.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> How about this plan?
>>>>>>>>> Sounds good.
>>>>>>>>>
>>>>>>>>> And there's indeed no need to differ client/server by reusing the socket
>>>>>>>>> chardev. E.g:
>>>>>>>>>
>>>>>>>>> In primary node:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>> Why mirrorer has indev? 
>>>>>
>>>>> As I said in the previous mails. I would like to decouple packet
>>>>> comparing from netfilter. You've already done most of this since the
>>>>> comparing is done in an independent thread. So the indev here is to
>>>>> mirror the packet sent by guest to the packet comparing thread.
>>>>>
>>>>>>> I think we can use traffic-redirector to do it.
>>>>>>> The command line is:
>>>>>>> -netdev tap,id=hn0
>>>>>>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>>>>>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>>>>>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>>>>>> out the packet.
>>>>>>>
>>>>>>> Also, we can merge the socketdev comparer1 and mirrorer0.
>>>>> It depends on whether or not packet comparing was done in a net filter
>>>>> (which I prefer not).
>>> I mean that packet comparing is done in a thread, not in a net filter.
>>> The flow of a packet sent from the guest:
>>> 1. traffic-redirector: we redirect the packet to comparer0, so the next
>>>    filter will never see it.
>>> 2. comparing thread: read it from the socket chardev comparer0
>>> 3. call qemu_net_queue_send_iov() to send it back to the netdev.
>> Ok, looks like I miss something.
>>
>> My suggestion tries its best to keep packet comparing from being tied to a
>> filter or netdev. But your suggestion still needs it to be coupled with a
>> netdev. Are there any advantages to doing this (or is there a reason that the
>> packet must be sent to the netdev after comparing)? If not, why not just
> Yes, the packet must be sent to the netdev after comparing. If the
> primary packet and the secondary packet are the same (contain the same
> application-level data), we will drop the secondary packet and send the
> primary packet to the netdev. Otherwise, we will sync the state.

And is the primary packet dropped here as well?

>
>> mirror (duplicate the packet and forward it to a chardev, and pass the
>> original packet to the next filter or netdev)? And doing
> We cannot send the packet to the netdev before comparing. We need to keep
> the connection after failover.
>
> Thanks
> Wen Congyang
>
>> qemu_net_queue_send_iov() to a netdev in another thread may need some
>> synchronization with iothread.
>>
>>> Thanks
>>> Wen Congyang
>>>


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-22  5:41                                 ` Jason Wang
@ 2016-01-22  5:56                                   ` Wen Congyang
  2016-01-22  6:21                                     ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Wen Congyang @ 2016-01-22  5:56 UTC (permalink / raw)
  To: Jason Wang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang

On 01/22/2016 01:41 PM, Jason Wang wrote:
> 
> 
> On 01/22/2016 11:28 AM, Wen Congyang wrote:
>> On 01/22/2016 11:15 AM, Jason Wang wrote:
>>>
>>> On 01/20/2016 06:30 PM, Wen Congyang wrote:
>>>> On 01/20/2016 06:19 PM, Jason Wang wrote:
>>>>>>
>>>>>> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>>>>>>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Two main comments/suggestions:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - TCP analysis is missed in current version, maybe you point a git tree
>>>>>>>>>>>>>> (or another version of RFC) to me for a better understanding of the
>>>>>>>>>>>>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>>>>>>>>>>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>>>>>>>>>>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>>>>>>>>>> this 99% since the work has been done in a thread. Just let the thread
>>>>>>>>>>>>>> poll sockets directly, then the comparing have the possibility to be
>>>>>>>>>>>>>> reused by other kinds of dataplane.
>>>>>>>>>>>>>> 2) Implement traffic mirror/redirector as filter.
>>>>>>>>>>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then, in primary node, you need just a traffic mirror, which did:
>>>>>>>>>>>>>> - mirror ingress traffic to secondary node
>>>>>>>>>>>>>> - mirror outgress traffic to packet comparing thread
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And in secondadry node, you need two filters:
>>>>>>>>>>>>>> - A TCP seq rewriter which adjust tcp sequence number.
>>>>>>>>>>>>>> - A traffic redirector which redirect packet from a socket as ingress
>>>>>>>>>>>>>> traffic, and redirect outgress traffic to the socket which could be
>>>>>>>>>>>>>> polled by remote packet comparing thread.
>>>>>>>>>>>>>>   Thoughts?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> zhangchen
>>>>>>>>>>>> Hi, Jason.
>>>>>>>>>>>> We consider your suggestion to split/decouple
>>>>>>>>>>>> the reusable parts from the codes.
>>>>>>>>>>>> Due to filter plugin are traversed one by one in order
>>>>>>>>>>>> we will split colo-proxy to three filters in each side.
>>>>>>>>>>>>
>>>>>>>>>>>> But in this plan,primary and secondary both have socket
>>>>>>>>>>>> server,startup is a problem.
>>>>>>>>>> I believe this issue could be solved by reusing socket chardev.
>>>>>>>>>>
>>>>>>>>>>>>  Primary qemu                                                      
>>>>>>>>>>>> Secondary qemu
>>>>>>>>>>>> +----------------------------------------------------------+      
>>>>>>>>>>>> +-----------------------------------------------------------+
>>>>>>>>>>>> | +-----------------------------------------------------+  |       | 
>>>>>>>>>>>> +------------------------------------------------------+ |
>>>>>>>>>>>> | |                                                     |  |       | 
>>>>>>>>>>>> |                                                      | |
>>>>>>>>>>>> | |                        guest                        |  |       | 
>>>>>>>>>>>> |                        guest                         | |
>>>>>>>>>>>> | |                                                     |  |       | 
>>>>>>>>>>>> |                                                      | |
>>>>>>>>>>>> | +-----------^--------------+--------------------------+  |       | 
>>>>>>>>>>>> +---------------------+--------+-----------------------+ |
>>>>>>>>>>>> |             |              |                             |      
>>>>>>>>>>>> |                        ^        |                         |
>>>>>>>>>>>> |             |              |                             |      
>>>>>>>>>>>> |                        |        |                         |
>>>>>>>>>>>> |             +-------------------------------------------------+ 
>>>>>>>>>>>> |                        |        |                         |
>>>>>>>>>>>> |  netfilter  |              |                             |    |  |  
>>>>>>>>>>>> netfilter            |        |                         |
>>>>>>>>>>>> | +-----------------------------------------------------+  |    |  | 
>>>>>>>>>>>> +------------------------------------------------------+ |
>>>>>>>>>>>> | |           |              |     filter excute order  |  |    |  | 
>>>>>>>>>>>> |                     |        |  filter excute order  | |
>>>>>>>>>>>> | |           |              |    +-------------------> |  |    |  | 
>>>>>>>>>>>> |                     |        | +-------------------> | |
>>>>>>>>>>>> | |           |              |                          |  |    |  | 
>>>>>>>>>>>> |                     |        |   TCP                 | |
>>>>>>>>>>>> | | +---------+-+     +------v-----+    +----+ +-----+  |  |    |  | 
>>>>>>>>>>>> | +-----------+   +---+----+---v+rewriter+  +--------+ | |
>>>>>>>>>>>> | | |           |     |            |    |            |  |  |    |  | 
>>>>>>>>>>>> | |           |   |        |             |  |        | | |
>>>>>>>>>>>> | | |  mirror   |     |  redirect  +---->  compare   |  |  |   
>>>>>>>>>>>> +--------> mirror   +---> adjust |   adjust    +-->redirect| | |
>>>>>>>>>>>> | | |  client   |     |  server    |    |            |  |  |       | 
>>>>>>>>>>>> | |  server   |   | ack    |   seq       |  |client  | | |
>>>>>>>>>>>> | | |           |     |            |    |            |  |  |       | 
>>>>>>>>>>>> | |           |   |        |             |  |        | | |
>>>>>>>>>>>> | | +----^------+     +----^-------+    +-----+------+  |  |       | 
>>>>>>>>>>>> | +-----------+   +--------+-------------+  +----+---+ | |
>>>>>>>>>>>> | |      |     tx          |      rx          |     rx  |  |       | 
>>>>>>>>>>>> |            tx                        all       |  rx | |
>>>>>>>>>>>> | +-----------------------------------------------------+  |       | 
>>>>>>>>>>>> +------------------------------------------------------+ |
>>>>>>>>>>>> |        |                
>>>>>>>>>>>> +-------------------------------------------------------------------------------------------+      
>>>>>>>>>>>> |
>>>>>>>>>>>> |        |                                    |            |      
>>>>>>>>>>>> |                                                           |
>>>>>>>>>>>> +----------------------------------------------------------+      
>>>>>>>>>>>> +-----------------------------------------------------------+
>>>>>>>>>>>>          |                                    |
>>>>>>>>>>>>          |guest receive                       |guest send
>>>>>>>>>>>>          |                                    |
>>>>>>>>>>>> +--------+------------------------------------v------------+
>>>>>>>>>>>> |                                                          |
>>>>>>>>>>>> |                                                          |
>>>>>>>>>>>> |                         tap                             
>>>>>>>>>>>> |                              NOTE: filter direction is rx/tx/all
>>>>>>>>>>>> |                                                         
>>>>>>>>>>>> |                              rx:receive packets sent to the netdev
>>>>>>>>>>>> |                                                         
>>>>>>>>>>>> |                              tx:receive packets sent by the netdev
>>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>> I still like to decouple comparer from netfilter. It have two obvious
>>>>>>>>>> advantages:
>>>>>>>>>>
>>>>>>>>>> - make it can be reused by other dataplane (e.g vhost)
>>>>>>>>>> - secondary redirector could redirect rx to comparer on primary node
>>>>>>>>>> directly which simplify the design.
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> guest recv packet route
>>>>>>>>>>>>
>>>>>>>>>>>> primary
>>>>>>>>>>>> tap --> mirror client filter
>>>>>>>>>>>> mirror client will send packet to guest,at the
>>>>>>>>>>>> same time, copy and forward packet to secondary
>>>>>>>>>>>> mirror server.
>>>>>>>>>>>>
>>>>>>>>>>>> secondary
>>>>>>>>>>>> mirror server filter --> TCP rewriter
>>>>>>>>>>>> if recv packet is TCP packet,we will adjust ack
>>>>>>>>>>>> and update TCP checksum, then send to secondary
>>>>>>>>>>>> guest. else directly send to guest.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> guest send packet route
>>>>>>>>>>>>
>>>>>>>>>>>> primary
>>>>>>>>>>>> guest --> redirect server filter
>>>>>>>>>>>> redirect server filter recv primary guest packet
>>>>>>>>>>>> but do nothing, just pass to next filter.
>>>>>>>>>>>>
>>>>>>>>>>>> redirect server filter --> compare filter
>>>>>>>>>>>> compare filter recv primary guest packet then
>>>>>>>>>>>> waiting scondary redirect packet to compare it.
>>>>>>>>>>>> if packet same,send primary packet and clear secondary
>>>>>>>>>>>> packet, else send primary packet and do
>>>>>>>>>>>> checkpoint.
>>>>>>>>>>>>
>>>>>>>>>>>> secondary
>>>>>>>>>>>> guest --> TCP rewriter filter
>>>>>>>>>>>> if the packet is TCP packet,we will adjust seq
>>>>>>>>>>>> and update TCP checksum. then send it to
>>>>>>>>>>>> redirect client filter. else directly send to
>>>>>>>>>>>> redirect client filter.
>>>>>>>>>>>>
>>>>>>>>>>>> redirect client filter --> redirect server filter
>>>>>>>>>>>> forward packet to primary
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In failover scene(primary is down), the TCP rewriter will keep
>>>>>>>>>>>> servicing
>>>>>>>>>>>> for the TCP connection which is established after the last checkpoint.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> How about this plan?
>>>>>>>>>> Sounds good.
>>>>>>>>>>
>>>>>>>>>> And there's indeed no need to differ client/server by reusing the socket
>>>>>>>>>> chardev. E.g:
>>>>>>>>>>
>>>>>>>>>> In primary node:
>>>>>>>>>>
>>>>>>>>>> ...
>>>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>>> Why mirrorer has indev? 
>>>>>>
>>>>>> As I said in the previous mails. I would like to decouple packet
>>>>>> comparing from netfilter. You've already done most of this since the
>>>>>> comparing is done in an independent thread. So the indev here is to
>>>>>> mirror the packet sent by guest to the packet comparing thread.
>>>>>>
>>>>>>>> I think we can use traffic-redirector to do it.
>>>>>>>> The command line is:
>>>>>>>> -netdev tap,id=hn0
>>>>>>>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>>>>>>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>>>>>>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>>>>>>> out the packet.
>>>>>>>>
>>>>>>>> Also, we can merge the socketdev comparer1 and mirrorer0.
>>>>>> It depends on whether or not packet comparing was done in a net filter
>>>>>> (which I prefer not).
>>>> I mean that packet comparing is done in a thread, not in a net filter.
>>>> The flow of a packet sent from the guest:
>>>> 1. traffic-redirector: we redirect the packet to comparer0, so the next
>>>>    filter will never see it.
>>>> 2. comparing thread: read it from the socket chardev comparer0
>>>> 3. call qemu_net_queue_send_iov() to send it back to the netdev.
>>> Ok, looks like I miss something.
>>>
>>> My suggestion tries its best to keep packet comparing from being tied to a
>>> filter or netdev. But your suggestion still needs it to be coupled with a
>>> netdev. Are there any advantages to doing this (or is there a reason that the
>>> packet must be sent to the netdev after comparing)? If not, why not just
>> Yes, the packet must be sent to the netdev after comparing. If the
>> primary packet and the secondary packet are the same (contain the same
>> application-level data), we will drop the secondary packet and send the
>> primary packet to the netdev. Otherwise, we will sync the state.
> 
> And is the primary packet dropped here as well?

No, the primary packet must be sent back to the netdev, so the client can receive
the response.

For example:
1. The guest runs an FTP server.
2. We connect to the FTP server over the network.
3. Both the primary guest and the secondary guest receive this request.
4. Both the primary guest and the secondary guest ack it.
5. We compare these two ack packets in the comparing thread.
6. They are the same (the seqno is different, but that is not important; we can
   modify it in colo-rewriter). So we drop the secondary packet and send the
   primary packet back to the netdev.
7. The primary ack packet is sent to the FTP client via the netdev.

The FTP client only cares about the packets it receives. So if the packets from
the primary and secondary guests contain the same data, we can say they are in
the "same" state.
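
To make the decision in steps 5 and 6 concrete, here is a rough sketch in Python. It is only an illustration, not the colo-proxy code: TcpSegment and compare() are hypothetical names, and comparing just the payload while ignoring seq is a simplifying assumption (a real comparer would look at more header fields).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpSegment:
    """Hypothetical, simplified view of a TCP segment (not a QEMU type)."""
    seq: int
    payload: bytes


def same_for_client(primary: TcpSegment, secondary: TcpSegment) -> bool:
    # Assumption: only application-level data matters to the client.
    # The seq numbers may differ; the rewriter is expected to fix them up.
    return primary.payload == secondary.payload


def compare(primary: TcpSegment, secondary: TcpSegment):
    """Return (packets_to_send_to_netdev, need_checkpoint).

    The primary packet is always sent back to the netdev and the
    secondary packet is always dropped. A mismatch additionally
    requests a checkpoint (state sync).
    """
    need_checkpoint = not same_for_client(primary, secondary)
    return [primary], need_checkpoint
```

With two acks carrying the same data but different seq values, compare() returns the primary packet and no checkpoint request; if the data differs, it still returns the primary packet but asks for a checkpoint.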

Thanks
Wen Congyang

> 
>>
>>> mirror (duplicate the packet and forward it to a chardev, and pass the
>>> original packet to the next filter or netdev)? And doing
>> We cannot send the packet to the netdev before comparing. We need to keep
>> the connection after failover.
>>
>> Thanks
>> Wen Congyang
>>
>>> qemu_net_queue_send_iov() to a netdev in another thread may need some
>>> synchronization with iothread.
>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
> 


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-22  5:33                                 ` Jason Wang
@ 2016-01-22  5:57                                   ` Wen Congyang
  0 siblings, 0 replies; 75+ messages in thread
From: Wen Congyang @ 2016-01-22  5:57 UTC (permalink / raw)
  To: Jason Wang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong, Huang peng,
	Dr. David Alan Gilbert, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang

On 01/22/2016 01:33 PM, Jason Wang wrote:
> 
> 
> On 01/20/2016 06:34 PM, Wen Congyang wrote:
>> On 01/20/2016 06:03 PM, Jason Wang wrote:
>>>
>>> On 01/20/2016 05:49 PM, Wen Congyang wrote:
>>>> On 01/20/2016 05:20 PM, Jason Wang wrote:
>>>>> On 01/20/2016 03:44 PM, Wen Congyang wrote:
>>>>>>>> ...
>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>> -netdev tap,id=hn0
>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>>> ...
>>>>>>>>
>>>>>>>> packet comparer compares the packets from two chardev: comparer0 and
>>>>>>>> comparer1.
>>>>>>>> traffic-mirrorer mirror tx to secondary node through chardev mirrorer0,
>>>>>>>> and mirror rx to packet comparer through chardev comparer0.
>>>>>>>>
>>>>>>>> In secondary node:
>>>>>>>>
>>>>>>>> ...
>>>>>>>> -chardev socket,id=redirector0,host=ip_primary,port=Y
>>>>>>>> -chardev socket,id=redirector1,host=ip_primary,port=Z
>>>>>>>> -netdev tap,id=hn0
>>>>>>>> -traffic-redirector netdev=hn0,id=r0,indev=redirector0,outdev=redirector1
>>>>>>>> -colo-rewriter netdev=hn0,id=c0
>>>>>>>> ...
>>>>>>>>
>>>>>>>> traffic-redirector redirects the rx traffic from the primary node through
>>>>>>>> redirector0 and redirects the tx traffic to the primary node through redirector1.
>>>>>>>> colo-rewriter rewrites the seq number as a normal netfilter.
>>>>>> What are traffic-mirrorer and colo-comparer, traffic-redirector, colo-rewriter?
>>>>>> A netfilter driver?
>>>>> traffic-mirrorer/redirector is a type of netfilter that just
>>>>> mirrors/redirects packets between netdev and chardev (just the mirror
>>>>> client/server and redirect client/server in the above graph)
>>>>> colo-rewriter is a type of netfilter that does ack/seq adjustment (just the
>>>>> TCP rewriter in the above graph)
>>>>> colo-comparer is a thread object that does packet comparing (similar to
>>>>> "compare" in the above graph but not a netfilter)
>>>> Thanks. I have another question:
>>>> IIRC, both rx and tx packets walk through all netfilter objects in the same order.
>>>>
>>>> tx packet (sent to the guest): we want the redirector to handle it first
>>>> rx packet (sent from the guest): we want colo-rewriter to handle it first
>>>> Change the order, or use two traffic-redirectors?
>>>>
>>>> Thanks
>>>> Wen Congyang
>>> Interesting question.
>>>
>>> Two redirectors sounds ok or maybe we can go through rx filters in a
>>> reverse order?
>> netdev <---> filter1 <----> filter2 <----> .... <----> emulated device <----> guest
>> So I think we can go through rx filters in a reverse order. But it changes
>> the behavior. So I am not sure if we can do it.
> 
> I think we can. Neither the dump filter nor the buffer filter requires strict
> order, so it's a good time and chance to do this.

OK, we will do it.
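
To check that we mean the same thing, here is a small Python sketch of the ordering. It is only an illustration of the idea, not QEMU code: traverse() is a hypothetical helper, and tx/rx are used as in my question above (tx = sent to the guest, rx = sent from the guest).

```python
def traverse(filters, direction):
    """Return the order in which the filters see a packet.

    tx (sent to the guest): walk the filter list front to back.
    rx (sent from the guest): walk the same list back to front.
    """
    if direction == "tx":
        return list(filters)
    if direction == "rx":
        return list(reversed(filters))
    raise ValueError("direction must be 'tx' or 'rx'")


# Secondary node: a single filter list serves both directions.
secondary_filters = ["traffic-redirector", "colo-rewriter"]

tx_order = traverse(secondary_filters, "tx")  # redirector handles it first
rx_order = traverse(secondary_filters, "rx")  # colo-rewriter handles it first
```

With the reversed rx walk, one traffic-redirector is enough on the secondary node: it sees packets for the guest before colo-rewriter does, and packets from the guest after colo-rewriter has adjusted them.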

Thanks
Wen Congyang

> 
>>
>> Thanks
>> Wen Congyang
>>


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-22  5:56                                   ` Wen Congyang
@ 2016-01-22  6:21                                     ` Jason Wang
  2016-01-22  6:47                                       ` Wen Congyang
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2016-01-22  6:21 UTC (permalink / raw)
  To: Wen Congyang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang



On 01/22/2016 01:56 PM, Wen Congyang wrote:
> On 01/22/2016 01:41 PM, Jason Wang wrote:
>> > 
>> > 
>> > On 01/22/2016 11:28 AM, Wen Congyang wrote:
>>> >> On 01/22/2016 11:15 AM, Jason Wang wrote:
>>>> >>>
>>>> >>> On 01/20/2016 06:30 PM, Wen Congyang wrote:
>>>>> >>>> On 01/20/2016 06:19 PM, Jason Wang wrote:
>>>>>>> >>>>>>
>>>>>>> >>>>>> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>>>>>>>> >>>>>>>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>>>>>>>>> >>>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Two main comments/suggestions:
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> - TCP analysis is missed in current version, maybe you point a git tree
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> (or another version of RFC) to me for a better understanding of the
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> this 99% since the work has been done in a thread. Just let the thread
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> poll sockets directly, then the comparing have the possibility to be
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> reused by other kinds of dataplane.
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> 2) Implement traffic mirror/redirector as filter.
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Then, in primary node, you need just a traffic mirror, which did:
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> - mirror ingress traffic to secondary node
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> - mirror outgress traffic to packet comparing thread
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> And in secondadry node, you need two filters:
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> - A TCP seq rewriter which adjust tcp sequence number.
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> - A traffic redirector which redirect packet from a socket as ingress
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> traffic, and redirect outgress traffic to the socket which could be
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> polled by remote packet comparing thread.
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>   Thoughts?
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> zhangchen
>>>>>>>>>>>>> >>>>>>>>>>>> Hi, Jason.
>>>>>>>>>>>>> >>>>>>>>>>>> We consider your suggestion to split/decouple
>>>>>>>>>>>>> >>>>>>>>>>>> the reusable parts from the codes.
>>>>>>>>>>>>> >>>>>>>>>>>> Due to filter plugin are traversed one by one in order
>>>>>>>>>>>>> >>>>>>>>>>>> we will split colo-proxy to three filters in each side.
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>> But in this plan,primary and secondary both have socket
>>>>>>>>>>>>> >>>>>>>>>>>> server,startup is a problem.
>>>>>>>>>>> >>>>>>>>>> I believe this issue could be solved by reusing socket chardev.
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>>  Primary qemu                                                      
>>>>>>>>>>>>> >>>>>>>>>>>> Secondary qemu
>>>>>>>>>>>>> >>>>>>>>>>>> +----------------------------------------------------------+      
>>>>>>>>>>>>> >>>>>>>>>>>> +-----------------------------------------------------------+
>>>>>>>>>>>>> >>>>>>>>>>>> | +-----------------------------------------------------+  |       | 
>>>>>>>>>>>>> >>>>>>>>>>>> +------------------------------------------------------+ |
>>>>>>>>>>>>> >>>>>>>>>>>> | |                                                     |  |       | 
>>>>>>>>>>>>> >>>>>>>>>>>> |                                                      | |
>>>>>>>>>>>>> >>>>>>>>>>>> | |                        guest                        |  |       | 
>>>>>>>>>>>>> >>>>>>>>>>>> |                        guest                         | |
>>>>>>>>>>>>> >>>>>>>>>>>> | |                                                     |  |       | 
>>>>>>>>>>>>> >>>>>>>>>>>> |                                                      | |
>>>>>>>>>>>>> >>>>>>>>>>>> | +-----------^--------------+--------------------------+  |       | 
>>>>>>>>>>>>> >>>>>>>>>>>> +---------------------+--------+-----------------------+ |
>>>>>>>>>>>>> >>>>>>>>>>>> |             |              |                             |      
>>>>>>>>>>>>> >>>>>>>>>>>> |                        ^        |                         |
>>>>>>>>>>>>> >>>>>>>>>>>> |             |              |                             |      
>>>>>>>>>>>>> >>>>>>>>>>>> |                        |        |                         |
>>>>>>>>>>>>> >>>>>>>>>>>> |             +-------------------------------------------------+ 
>>>>>>>>>>>>> >>>>>>>>>>>> |                        |        |                         |
>>>>>>>>>>>>> >>>>>>>>>>>> |  netfilter  |              |                             |    |  |  
>>>>>>>>>>>>> >>>>>>>>>>>> netfilter            |        |                         |
>>>>>>>>>>>>> >>>>>>>>>>>> | +-----------------------------------------------------+  |    |  | 
>>>>>>>>>>>>> >>>>>>>>>>>> +------------------------------------------------------+ |
>>>>>>>>>>>>> >>>>>>>>>>>> | |           |              |     filter excute order  |  |    |  | 
>>>>>>>>>>>>> >>>>>>>>>>>> |                     |        |  filter excute order  | |
>>>>>>>>>>>>> >>>>>>>>>>>> | |           |              |    +-------------------> |  |    |  | 
>>>>>>>>>>>>> >>>>>>>>>>>> |                     |        | +-------------------> | |
>>>>>>>>>>>>> >>>>>>>>>>>> | |           |              |                          |  |    |  | 
>>>>>>>>>>>>> >>>>>>>>>>>> |                     |        |   TCP                 | |
>>>>>>>>>>>>> >>>>>>>>>>>> | | +---------+-+     +------v-----+    +----+ +-----+  |  |    |  | 
>>>>>>>>>>>>> >>>>>>>>>>>> | +-----------+   +---+----+---v+rewriter+  +--------+ | |
>>>>>>>>>>>>> >>>>>>>>>>>> | | |           |     |            |    |            |  |  |    |  | 
>>>>>>>>>>>>> >>>>>>>>>>>> | |           |   |        |             |  |        | | |
>>>>>>>>>>>>> >>>>>>>>>>>> | | |  mirror   |     |  redirect  +---->  compare   |  |  |   
>>>>>>>>>>>>> >>>>>>>>>>>> +--------> mirror   +---> adjust |   adjust    +-->redirect| | |
>>>>>>>>>>>>> >>>>>>>>>>>> | | |  client   |     |  server    |    |            |  |  |       | 
>>>>>>>>>>>>> >>>>>>>>>>>> | |  server   |   | ack    |   seq       |  |client  | | |
>>>>>>>>>>>>> >>>>>>>>>>>> | | |           |     |            |    |            |  |  |       | 
>>>>>>>>>>>>> >>>>>>>>>>>> | |           |   |        |             |  |        | | |
>>>>>>>>>>>>> >>>>>>>>>>>> | | +----^------+     +----^-------+    +-----+------+  |  |       | 
>>>>>>>>>>>>> >>>>>>>>>>>> | +-----------+   +--------+-------------+  +----+---+ | |
>>>>>>>>>>>>> >>>>>>>>>>>> | |      |     tx          |      rx          |     rx  |  |       | 
>>>>>>>>>>>>> >>>>>>>>>>>> |            tx                        all       |  rx | |
>>>>>>>>>>>>> >>>>>>>>>>>> | +-----------------------------------------------------+  |       | 
>>>>>>>>>>>>> >>>>>>>>>>>> +------------------------------------------------------+ |
>>>>>>>>>>>>> >>>>>>>>>>>> |        |                
>>>>>>>>>>>>> >>>>>>>>>>>> +-------------------------------------------------------------------------------------------+      
>>>>>>>>>>>>> >>>>>>>>>>>> |
>>>>>>>>>>>>> >>>>>>>>>>>> |        |                                    |            |      
>>>>>>>>>>>>> >>>>>>>>>>>> |                                                           |
>>>>>>>>>>>>> >>>>>>>>>>>> +----------------------------------------------------------+      
>>>>>>>>>>>>> >>>>>>>>>>>> +-----------------------------------------------------------+
>>>>>>>>>>>>> >>>>>>>>>>>>          |                                    |
>>>>>>>>>>>>> >>>>>>>>>>>>          |guest receive                       |guest send
>>>>>>>>>>>>> >>>>>>>>>>>>          |                                    |
>>>>>>>>>>>>> >>>>>>>>>>>> +--------+------------------------------------v------------+
>>>>>>>>>>>>> >>>>>>>>>>>> |                                                          |
>>>>>>>>>>>>> >>>>>>>>>>>> |                                                          |
>>>>>>>>>>>>> >>>>>>>>>>>> |                         tap                             
>>>>>>>>>>>>> >>>>>>>>>>>> |                              NOTE: filter direction is rx/tx/all
>>>>>>>>>>>>> >>>>>>>>>>>> |                                                         
>>>>>>>>>>>>> >>>>>>>>>>>> |                              rx:receive packets sent to the netdev
>>>>>>>>>>>>> >>>>>>>>>>>> |                                                         
>>>>>>>>>>>>> >>>>>>>>>>>> |                              tx:receive packets sent by the netdev
>>>>>>>>>>>>> >>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> I still like to decouple comparer from netfilter. It have two obvious
>>>>>>>>>>> >>>>>>>>>> advantages:
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> - make it can be reused by other dataplane (e.g vhost)
>>>>>>>>>>> >>>>>>>>>> - secondary redirector could redirect rx to comparer on primary node
>>>>>>>>>>> >>>>>>>>>> directly which simplify the design.
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>> guest recv packet route
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>> primary
>>>>>>>>>>>>> >>>>>>>>>>>> tap --> mirror client filter
>>>>>>>>>>>>> >>>>>>>>>>>> mirror client will send packet to guest,at the
>>>>>>>>>>>>> >>>>>>>>>>>> same time, copy and forward packet to secondary
>>>>>>>>>>>>> >>>>>>>>>>>> mirror server.
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>> secondary
>>>>>>>>>>>>> >>>>>>>>>>>> mirror server filter --> TCP rewriter
>>>>>>>>>>>>> >>>>>>>>>>>> if recv packet is TCP packet,we will adjust ack
>>>>>>>>>>>>> >>>>>>>>>>>> and update TCP checksum, then send to secondary
>>>>>>>>>>>>> >>>>>>>>>>>> guest. else directly send to guest.
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>> guest send packet route
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>> primary
>>>>>>>>>>>>> >>>>>>>>>>>> guest --> redirect server filter
>>>>>>>>>>>>> >>>>>>>>>>>> redirect server filter recv primary guest packet
>>>>>>>>>>>>> >>>>>>>>>>>> but do nothing, just pass to next filter.
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>> redirect server filter --> compare filter
>>>>>>>>>>>>> >>>>>>>>>>>> compare filter recv primary guest packet then
>>>>>>>>>>>>> >>>>>>>>>>>> waiting secondary redirect packet to compare it.
>>>>>>>>>>>>> >>>>>>>>>>>> if packet same,send primary packet and clear secondary
>>>>>>>>>>>>> >>>>>>>>>>>> packet, else send primary packet and do
>>>>>>>>>>>>> >>>>>>>>>>>> checkpoint.
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>> secondary
>>>>>>>>>>>>> >>>>>>>>>>>> guest --> TCP rewriter filter
>>>>>>>>>>>>> >>>>>>>>>>>> if the packet is TCP packet,we will adjust seq
>>>>>>>>>>>>> >>>>>>>>>>>> and update TCP checksum. then send it to
>>>>>>>>>>>>> >>>>>>>>>>>> redirect client filter. else directly send to
>>>>>>>>>>>>> >>>>>>>>>>>> redirect client filter.
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>> redirect client filter --> redirect server filter
>>>>>>>>>>>>> >>>>>>>>>>>> forward packet to primary
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>> In failover scene(primary is down), the TCP rewriter will keep
>>>>>>>>>>>>> >>>>>>>>>>>> servicing
>>>>>>>>>>>>> >>>>>>>>>>>> for the TCP connection which is established after the last checkpoint.
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>> >>>>>>>>>>>> How about this plan?
>>>>>>>>>>> >>>>>>>>>> Sounds good.
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> And there's indeed no need to differ client/server by reusing the socket
>>>>>>>>>>> >>>>>>>>>> chardev. E.g:
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> In primary node:
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> ...
>>>>>>>>>>> >>>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>>>>> >>>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>>>>> >>>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>>>>> >>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>>>> >>>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>>>>> >>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>>>> >>>>>>>> Why mirrorer has indev? 
>>>>>>> >>>>>>
>>>>>>> >>>>>> As I said in the previous mails. I would like to decouple packet
>>>>>>> >>>>>> comparing from netfilter. You've already done most of this since the
>>>>>>> >>>>>> comparing is done in an independent thread. So the indev here is to
>>>>>>> >>>>>> mirror the packet sent by guest to the packet comparing thread.
>>>>>>> >>>>>>
>>>>>>>>> >>>>>>>> I think we can use traffic-redirector to do it.
>>>>>>>>> >>>>>>>> The command line is:
>>>>>>>>> >>>>>>>> -netdev tap,id=hn0
>>>>>>>>> >>>>>>>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>>>>>>>> >>>>>>>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>>>>>>>> >>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>>>>>>>> >>>>>>>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>>>>>>>> >>>>>>>> out the packet.
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Also, we can merge the socketdev comparer1 and mirrorer0.
>>>>>>> >>>>>> It depends on whether or not packet comparing was done in a net filter
>>>>>>> >>>>>> (which I prefer not).
>>>>> >>>> I mean that: packet comparing is done in a thread, not a net filter.
>>>>> >>>> The flow of the packet sent from guest:
>>>>> >>>> 1. traffic-redirector: we will redirect the packet to comparer0, the next
>>>>> >>>>    filter will never see it.
>>>>> >>>> 2. comparing thread: read it from socket chardev comparer0
>>>>> >>>> 3. call qemu_net_queue_send_iov() to send it back to the netdev.
>>>> >>> Ok, looks like I miss something.
>>>> >>>
>>>> >>> My suggestion tries best to let the packet comparing not tie to filter
>>>> >>> or netdev. But your suggestion still need it to be coupled with a
>>>> >>> netdev. Any advantages of doing this (or is there a reason that packet
>>>> >>> must be sent to netdev after doing comparing?). If not, why not just
>>> >> Yes, the packet must be sent to netdev after doing comparing. If both
>>> >> the primary packet and secondary packet are the same(contains the same
>>> >> application level data), we will drop the secondary packet, and send the
>>> >> primary packet to the netdev. Otherwise, we will sync the state.
>> > 
>> > And drop primary packet also here?
> No, the primary packet must be sent back to the netdev, so the client can receive
> the response.
>
> For example:
> 1. guest has a ftp server
> 2. we connect to the ftp server via the network
> 3. both primary guest and secondary guest receive this request
> 4. both primary guest and secondary guest ack it
> 5. we compare these two ack packets in the comparing thread
> 6. it is the same (the seqno is different, but it is not important, we can modify it in
>    colo-rewriter). So we drop the secondary packets, and send back the primary packet
>    to netdev
> 7. The primary ack packet is sent to the ftp client via netdev.
>
> The ftp client only cares about the received packet. So if the packets from primary
> and secondary guest contain the same data, we can say they are in the "same" state.
>
> Thanks
> Wen Congyang
>

Thanks for the example. But I still don't get why the packet must be held until
the comparison is done, considering it will always be sent regardless of the result
of the comparison?


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-22  6:21                                     ` Jason Wang
@ 2016-01-22  6:47                                       ` Wen Congyang
  2016-01-22  7:42                                         ` Jason Wang
  0 siblings, 1 reply; 75+ messages in thread
From: Wen Congyang @ 2016-01-22  6:47 UTC (permalink / raw)
  To: Jason Wang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang

On 01/22/2016 02:21 PM, Jason Wang wrote:
> 
> 
> On 01/22/2016 01:56 PM, Wen Congyang wrote:
>> On 01/22/2016 01:41 PM, Jason Wang wrote:
>>>>
>>>>
>>>> On 01/22/2016 11:28 AM, Wen Congyang wrote:
>>>>>> On 01/22/2016 11:15 AM, Jason Wang wrote:
>>>>>>>>
>>>>>>>> On 01/20/2016 06:30 PM, Wen Congyang wrote:
>>>>>>>>>> On 01/20/2016 06:19 PM, Jason Wang wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>>>>>>>>>>>>>>>>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>>>>>>>>>>>>>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Two main comments/suggestions:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - TCP analysis is missed in current version, maybe you point a git tree
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (or another version of RFC) to me for a better understanding of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this 99% since the work has been done in a thread. Just let the thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poll sockets directly, then the comparing have the possibility to be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reused by other kinds of dataplane.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Implement traffic mirror/redirector as filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Then, in primary node, you need just a traffic mirror, which did:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - mirror ingress traffic to secondary node
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - mirror outgress traffic to packet comparing thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> And in secondary node, you need two filters:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - A TCP seq rewriter which adjust tcp sequence number.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - A traffic redirector which redirect packet from a socket as ingress
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traffic, and redirect outgress traffic to the socket which could be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> polled by remote packet comparing thread.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   Thoughts?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zhangchen
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Jason.
>>>>>>>>>>>>>>>>>>>>>>>>>> We consider your suggestion to split/decouple
>>>>>>>>>>>>>>>>>>>>>>>>>> the reusable parts from the codes.
>>>>>>>>>>>>>>>>>>>>>>>>>> Due to filter plugin are traversed one by one in order
>>>>>>>>>>>>>>>>>>>>>>>>>> we will split colo-proxy to three filters in each side.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> But in this plan,primary and secondary both have socket
>>>>>>>>>>>>>>>>>>>>>>>>>> server,startup is a problem.
>>>>>>>>>>>>>>>>>>>>>> I believe this issue could be solved by reusing socket chardev.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> [Diagram: Primary qemu / Secondary qemu netfilter layout]
>>>>>>>>>>>>>>>>>>>>>>>>>> Primary qemu: guest; netfilter, filter execute order:
>>>>>>>>>>>>>>>>>>>>>>>>>>   mirror client (tx) --> redirect server (rx) --> compare (rx)
>>>>>>>>>>>>>>>>>>>>>>>>>> Secondary qemu: guest; netfilter, filter execute order:
>>>>>>>>>>>>>>>>>>>>>>>>>>   mirror server (tx) --> TCP rewriter (adjust ack, adjust seq; all) --> redirect client (rx)
>>>>>>>>>>>>>>>>>>>>>>>>>> tap: guest receive / guest send
>>>>>>>>>>>>>>>>>>>>>>>>>> NOTE: filter direction is rx/tx/all
>>>>>>>>>>>>>>>>>>>>>>>>>> rx: receive packets sent to the netdev
>>>>>>>>>>>>>>>>>>>>>>>>>> tx: receive packets sent by the netdev
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I still like to decouple comparer from netfilter. It have two obvious
>>>>>>>>>>>>>>>>>>>>>> advantages:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> - make it can be reused by other dataplane (e.g vhost)
>>>>>>>>>>>>>>>>>>>>>> - secondary redirector could redirect rx to comparer on primary node
>>>>>>>>>>>>>>>>>>>>>> directly which simplify the design.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> guest recv packet route
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> primary
>>>>>>>>>>>>>>>>>>>>>>>>>> tap --> mirror client filter
>>>>>>>>>>>>>>>>>>>>>>>>>> mirror client will send packet to guest,at the
>>>>>>>>>>>>>>>>>>>>>>>>>> same time, copy and forward packet to secondary
>>>>>>>>>>>>>>>>>>>>>>>>>> mirror server.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> secondary
>>>>>>>>>>>>>>>>>>>>>>>>>> mirror server filter --> TCP rewriter
>>>>>>>>>>>>>>>>>>>>>>>>>> if recv packet is TCP packet,we will adjust ack
>>>>>>>>>>>>>>>>>>>>>>>>>> and update TCP checksum, then send to secondary
>>>>>>>>>>>>>>>>>>>>>>>>>> guest. else directly send to guest.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> guest send packet route
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> primary
>>>>>>>>>>>>>>>>>>>>>>>>>> guest --> redirect server filter
>>>>>>>>>>>>>>>>>>>>>>>>>> redirect server filter recv primary guest packet
>>>>>>>>>>>>>>>>>>>>>>>>>> but do nothing, just pass to next filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> redirect server filter --> compare filter
>>>>>>>>>>>>>>>>>>>>>>>>>> compare filter recv primary guest packet then
>>>>>>>>>>>>>>>>>>>>>>>>>> waiting secondary redirect packet to compare it.
>>>>>>>>>>>>>>>>>>>>>>>>>> if packet same,send primary packet and clear secondary
>>>>>>>>>>>>>>>>>>>>>>>>>> packet, else send primary packet and do
>>>>>>>>>>>>>>>>>>>>>>>>>> checkpoint.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> secondary
>>>>>>>>>>>>>>>>>>>>>>>>>> guest --> TCP rewriter filter
>>>>>>>>>>>>>>>>>>>>>>>>>> if the packet is TCP packet,we will adjust seq
>>>>>>>>>>>>>>>>>>>>>>>>>> and update TCP checksum. then send it to
>>>>>>>>>>>>>>>>>>>>>>>>>> redirect client filter. else directly send to
>>>>>>>>>>>>>>>>>>>>>>>>>> redirect client filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> redirect client filter --> redirect server filter
>>>>>>>>>>>>>>>>>>>>>>>>>> forward packet to primary
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> In failover scene(primary is down), the TCP rewriter will keep
>>>>>>>>>>>>>>>>>>>>>>>>>> servicing
>>>>>>>>>>>>>>>>>>>>>>>>>> for the TCP connection which is established after the last checkpoint.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> How about this plan?
>>>>>>>>>>>>>>>>>>>>>> Sounds good.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> And there's indeed no need to differ client/server by reusing the socket
>>>>>>>>>>>>>>>>>>>>>> chardev. E.g:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> In primary node:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>>>>>>>>>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>>>>>>>>>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>>>>>>>>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>>>>>>>>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>>>>>>>>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>>>>>>>>>>>>> Why mirrorer has indev? 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As I said in the previous mails. I would like to decouple packet
>>>>>>>>>>>>>> comparing from netfilter. You've already done most of this since the
>>>>>>>>>>>>>> comparing is done in an independent thread. So the indev here is to
>>>>>>>>>>>>>> mirror the packet sent by guest to the packet comparing thread.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think we can use traffic-redirector to do it.
>>>>>>>>>>>>>>>>>> The command line is:
>>>>>>>>>>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>>>>>>>>>>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>>>>>>>>>>>>>>>>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>>>>>>>>>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>>>>>>>>>>>>>>>>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>>>>>>>>>>>>>>>>> out the packet.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Also, we can merge the socketdev comparer1 and mirrorer0.
>>>>>>>>>>>>>> It depends on whether or not packet comparing was done in a net filter
>>>>>>>>>>>>>> (which I prefer not).
>>>>>>>>>> I mean that: packet comparing is done in a thread, not a net filter.
>>>>>>>>>> The flow of the packet sent from guest:
>>>>>>>>>> 1. traffic-redirector: we will redirect the packet to comparer0, the next
>>>>>>>>>>    filter will never see it.
>>>>>>>>>> 2. comparing thread: read it from socket chardev comparer0
>>>>>>>>>> 3. call qemu_net_queue_send_iov() to send it back to the netdev.
>>>>>>>> Ok, looks like I miss something.
>>>>>>>>
>>>>>>>> My suggestion tries best to let the packet comparing not tie to filter
>>>>>>>> or netdev. But your suggestion still need it to be coupled with a
>>>>>>>> netdev. Any advantages of doing this (or is there a reason that packet
>>>>>>>> must be sent to netdev after doing comparing?). If not, why not just
>>>>>> Yes, the packet must be sent to netdev after doing comparing. If both
>>>>>> the primary packet and secondary packet are the same(contains the same
>>>>>> application level data), we will drop the secondary packet, and send the
>>>>>> primary packet to the netdev. Otherwise, we will sync the state.
>>>>
>>>> And drop primary packet also here?
>> No, the primary packet must be sent back to the netdev, so the client can receive
>> the response.
>>
>> For example:
>> 1. guest has a ftp server
>> 2. we connect to the ftp server via the network
>> 3. both primary guest and secondary guest receive this request
>> 4. both primary guest and secondary guest ack it
>> 5. we compare these two ack packets in the comparing thread
>> 6. it is the same (the seqno is different, but it is not important, we can modify it in
>>    colo-rewriter). So we drop the secondary packets, and send back the primary packet
>>    to netdev
>> 7. The primary ack packet is sent to the ftp client via netdev.
>>
>> The ftp client only cares about the received packet. So if the packets from primary
>> and secondary guest contain the same data, we can say they are in the "same" state.
>>
>> Thanks
>> Wen Congyang
>>
> 
> Thanks for the example. But still don't get why it must be done before
> comparing consider it will always be sent regardless the result of
> comparing?

Our goal is that the connection stays OK after failover, and the user doesn't notice that
one of the hosts crashed.

If the packet is sent out regardless of the result of the comparison and the primary host
then crashes, the connection may be corrupted after failover. For example: the packets from
the primary and secondary hosts contain different data, and we send the primary packet before
comparing. The primary host crashes before these two packets are compared. After failover,
the connection may be reset, the client may not receive the correct data, or some other
unexpected problem may occur.

Another example (TCP):
1. The primary guest acks 100, and the secondary guest only acks 95 (some packets were lost in the guest).
2. The client doesn't resend the lost packets.
3. The connection will be recovered after the next checkpoint.
If we do failover before the next checkpoint, there is no way to recover this connection.
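
To make the TCP example concrete, here is a rough toy model in Python. It is only an illustration, not QEMU or COLO code: the `Client` class, `failover_recoverable()` and the byte offsets are hypothetical names and numbers taken from the example. It assumes a sender that forgets data once it has been cumulatively acked, and that on a mismatch the primary ack is held back until a checkpoint has been done.

```python
class Client:
    """Toy TCP sender: keeps unacked bytes so they can be retransmitted."""

    def __init__(self, data: bytes):
        self.data = data
        self.acked = 0  # highest cumulative ack seen so far

    def on_ack(self, ack: int) -> None:
        # A cumulative ack lets the sender forget everything below it.
        self.acked = max(self.acked, ack)

    def can_retransmit(self, start: int) -> bool:
        # Bytes below the acked offset are no longer kept by the sender.
        return start >= self.acked


def failover_recoverable(primary_ack: int, secondary_ack: int,
                         release_before_compare: bool) -> bool:
    """Return True if the secondary can still get the bytes it is missing
    when the primary crashes before the next checkpoint (hypothetical model)."""
    client = Client(b"x" * 100)
    if release_before_compare:
        # The primary ack reaches the client although the secondary is behind.
        client.on_ack(primary_ack)
    elif primary_ack == secondary_ack:
        # Compared first: the ack is released because both guests agree.
        client.on_ack(primary_ack)
    # Otherwise the ack is held until a checkpoint, so the client sees nothing.
    # The primary crashes here; the secondary needs data from secondary_ack on.
    return client.can_retransmit(secondary_ack)
```

With these assumptions, `failover_recoverable(100, 95, release_before_compare=True)` is False (the client has already dropped bytes 95..99), while the same call with `release_before_compare=False` is True.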

If we send out the packet after comparing, we can assume that the client always receives the
same data.
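
A minimal sketch of this "compare first, then send" rule, again in Python and only as an illustration. The `Comparer` class and its fields are hypothetical and do not correspond to the actual colo-proxy code; payloads are compared as plain bytes, and a checkpoint is modelled as a counter.

```python
from collections import deque


class Comparer:
    """Holds primary packets until the matching secondary packet arrives."""

    def __init__(self):
        self.primary = deque()    # packets from the primary guest, not yet sent
        self.secondary = deque()  # packets redirected from the secondary guest
        self.sent = []            # packets released to the netdev
        self.checkpoints = 0      # number of state syncs requested

    def on_primary(self, payload: bytes) -> None:
        self.primary.append(payload)
        self._compare()

    def on_secondary(self, payload: bytes) -> None:
        self.secondary.append(payload)
        self._compare()

    def _compare(self) -> None:
        # Nothing is released until both sides have produced a packet.
        while self.primary and self.secondary:
            p = self.primary.popleft()
            s = self.secondary.popleft()  # the secondary copy is always dropped
            if p != s:
                # Outputs diverged: sync the state before the client sees it.
                self.checkpoints += 1
            self.sent.append(p)
```

In this model the primary packet is always sent in the end, but never before the secondary packet has been seen and, on a mismatch, a checkpoint has been requested.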

Thanks
Wen Congyang



* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-22  6:47                                       ` Wen Congyang
@ 2016-01-22  7:42                                         ` Jason Wang
  2016-01-22  7:46                                           ` Wen Congyang
  0 siblings, 1 reply; 75+ messages in thread
From: Jason Wang @ 2016-01-22  7:42 UTC (permalink / raw)
  To: Wen Congyang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang



On 01/22/2016 02:47 PM, Wen Congyang wrote:
> On 01/22/2016 02:21 PM, Jason Wang wrote:
>>
>> On 01/22/2016 01:56 PM, Wen Congyang wrote:
>>> On 01/22/2016 01:41 PM, Jason Wang wrote:
>>>>>
>>>>> On 01/22/2016 11:28 AM, Wen Congyang wrote:
>>>>>>> On 01/22/2016 11:15 AM, Jason Wang wrote:
>>>>>>>>> On 01/20/2016 06:30 PM, Wen Congyang wrote:
>>>>>>>>>>> On 01/20/2016 06:19 PM, Jason Wang wrote:
>>>>>>>>>>>>>>> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>>>>>>>>>>>>>>>>>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Two main comments/suggestions:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - TCP analysis is missed in current version, maybe you point a git tree
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (or another version of RFC) to me for a better understanding of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this 99% since the work has been done in a thread. Just let the thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poll sockets directly, then the comparing have the possibility to be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reused by other kinds of dataplane.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Implement traffic mirror/redirector as filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Then, in primary node, you need just a traffic mirror, which did:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - mirror ingress traffic to secondary node
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - mirror outgress traffic to packet comparing thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> And in secondary node, you need two filters:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - A TCP seq rewriter which adjust tcp sequence number.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - A traffic redirector which redirect packet from a socket as ingress
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traffic, and redirect outgress traffic to the socket which could be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> polled by remote packet comparing thread.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   Thoughts?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zhangchen
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Jason.
>>>>>>>>>>>>>>>>>>>>>>>>>>> We consider your suggestion to split/decouple
>>>>>>>>>>>>>>>>>>>>>>>>>>> the reusable parts from the codes.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Due to filter plugin are traversed one by one in order
>>>>>>>>>>>>>>>>>>>>>>>>>>> we will split colo-proxy to three filters in each side.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> But in this plan,primary and secondary both have socket
>>>>>>>>>>>>>>>>>>>>>>>>>>> server,startup is a problem.
>>>>>>>>>>>>>>>>>>>>>>> I believe this issue could be solved by reusing socket chardev.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Primary qemu
>>>>>>>>>>>>>>>>>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>>>>>>>>>>>>>>>>> | +-----------------------------------------------------+  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                                                     |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                        guest                        |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                                                     |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | +-----------^--------------+--------------------------+  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> |             |              |                             |
>>>>>>>>>>>>>>>>>>>>>>>>>>> |             |              |                             |
>>>>>>>>>>>>>>>>>>>>>>>>>>> |             +-------------------------------------------------> to mirror server (secondary)
>>>>>>>>>>>>>>>>>>>>>>>>>>> |  netfilter  |              |                             |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | +-----------------------------------------------------+  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |           |              |     filter execute order |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |           |              |    +-------------------> |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |           |              |                          |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | | +---------+-+     +------v-----+    +------------+  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |           |     |            |    |            |  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |  mirror   |     |  redirect  +---->  compare   |  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |  client   |     |  server    |    |            |  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |           |     |            |    |            |  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | | +----^------+     +----^-------+    +-----+------+  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |      |     tx          |      rx          |     rx  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | +-----------------------------------------------------+  |
>>>>>>>>>>>>>>>>>>>>>>>>>>> |        |                 +------------------------------------< from redirect client (secondary)
>>>>>>>>>>>>>>>>>>>>>>>>>>> |        |                                    |            |
>>>>>>>>>>>>>>>>>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>>>>>>>>>>>>>>>>>          |                                    |
>>>>>>>>>>>>>>>>>>>>>>>>>>>          |guest receive                       |guest send
>>>>>>>>>>>>>>>>>>>>>>>>>>>          |                                    |
>>>>>>>>>>>>>>>>>>>>>>>>>>> +--------+------------------------------------v------------+
>>>>>>>>>>>>>>>>>>>>>>>>>>> |                                                          |
>>>>>>>>>>>>>>>>>>>>>>>>>>> |                         tap                              |
>>>>>>>>>>>>>>>>>>>>>>>>>>> |                                                          |
>>>>>>>>>>>>>>>>>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Secondary qemu
>>>>>>>>>>>>>>>>>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>>>>>>>>>>>>>>>>> | +------------------------------------------------------+ |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                                                      | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                        guest                         | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                                                      | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | +---------------------+--------+-----------------------+ |
>>>>>>>>>>>>>>>>>>>>>>>>>>> |                       ^        |                         |
>>>>>>>>>>>>>>>>>>>>>>>>>>> |                       |        |                         |
>>>>>>>>>>>>>>>>>>>>>>>>>>> |                       |        |                         |
>>>>>>>>>>>>>>>>>>>>>>>>>>> |  netfilter            |        |                         |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | +------------------------------------------------------+ |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                     |        |  filter execute order | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                     |        | +-------------------> | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                     |        |   TCP                 | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | | +-----------+   +---+----+---v+rewriter+  +--------+ | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |           |   |        |             |  |        | | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> --->|  mirror   +---> adjust |   adjust    +-->redirect| | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |  server   |   | ack    |   seq       |  |client  | | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |           |   |        |             |  |        | | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | | +-----------+   +--------+-------------+  +----+---+ | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | |            tx                        all       |  rx | |
>>>>>>>>>>>>>>>>>>>>>>>>>>> | +------------------------------------------------------+ |
>>>>>>>>>>>>>>>>>>>>>>>>>>> <--------------------------------------------------+       |
>>>>>>>>>>>>>>>>>>>>>>>>>>> |                                                          |
>>>>>>>>>>>>>>>>>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Links between the hosts: the primary mirror client feeds the secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>> mirror server; the secondary redirect client feeds the primary redirect
>>>>>>>>>>>>>>>>>>>>>>>>>>> server.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> NOTE: filter direction is rx/tx/all
>>>>>>>>>>>>>>>>>>>>>>>>>>> rx: receive packets sent to the netdev
>>>>>>>>>>>>>>>>>>>>>>>>>>> tx: receive packets sent by the netdev
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I would still like to decouple the comparer from netfilter. That has two
>>>>>>>>>>>>>>>>>>>>>>> obvious advantages:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> - it can be reused by other dataplanes (e.g. vhost)
>>>>>>>>>>>>>>>>>>>>>>> - the secondary redirector could redirect rx to the comparer on the
>>>>>>>>>>>>>>>>>>>>>>> primary node directly, which simplifies the design.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> guest recv packet route
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> primary
>>>>>>>>>>>>>>>>>>>>>>>>>>> tap --> mirror client filter
>>>>>>>>>>>>>>>>>>>>>>>>>>> the mirror client sends the packet to the guest and, at the
>>>>>>>>>>>>>>>>>>>>>>>>>>> same time, copies and forwards it to the secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>> mirror server.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>> mirror server filter --> TCP rewriter
>>>>>>>>>>>>>>>>>>>>>>>>>>> if the received packet is a TCP packet, we adjust the ack
>>>>>>>>>>>>>>>>>>>>>>>>>>> and update the TCP checksum, then send it to the secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>> guest. Otherwise we send it directly to the guest.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> guest send packet route
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> primary
>>>>>>>>>>>>>>>>>>>>>>>>>>> guest --> redirect server filter
>>>>>>>>>>>>>>>>>>>>>>>>>>> the redirect server filter receives the primary guest packet
>>>>>>>>>>>>>>>>>>>>>>>>>>> but does nothing with it; it just passes it to the next filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect server filter --> compare filter
>>>>>>>>>>>>>>>>>>>>>>>>>>> the compare filter receives the primary guest packet, then
>>>>>>>>>>>>>>>>>>>>>>>>>>> waits for the redirected secondary packet to compare it with.
>>>>>>>>>>>>>>>>>>>>>>>>>>> If the packets are the same, send the primary packet and drop the
>>>>>>>>>>>>>>>>>>>>>>>>>>> secondary packet; otherwise send the primary packet and do a
>>>>>>>>>>>>>>>>>>>>>>>>>>> checkpoint.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>> guest --> TCP rewriter filter
>>>>>>>>>>>>>>>>>>>>>>>>>>> if the packet is a TCP packet, we adjust the seq
>>>>>>>>>>>>>>>>>>>>>>>>>>> and update the TCP checksum, then send it to the
>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect client filter. Otherwise we send it directly to the
>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect client filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect client filter --> redirect server filter
>>>>>>>>>>>>>>>>>>>>>>>>>>> forward the packet to the primary
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> In the failover case (primary is down), the TCP rewriter will keep
>>>>>>>>>>>>>>>>>>>>>>>>>>> servicing the TCP connections that were established after the last
>>>>>>>>>>>>>>>>>>>>>>>>>>> checkpoint.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> How about this plan?
>>>>>>>>>>>>>>>>>>>>>>> Sounds good.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> And there's indeed no need to differ client/server by reusing the socket
>>>>>>>>>>>>>>>>>>>>>>> chardev. E.g:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> In primary node:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>>>>>>>>>>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>>>>>>>>>>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>>>>>>>>>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>>>>>>>>>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>>>>>>>>>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>>>>>>>>>>>>>> Why mirrorer has indev? 
>>>>>>>>>>>>>>> As I said in the previous mails. I would like to decouple packet
>>>>>>>>>>>>>>> comparing from netfilter. You've already done most of this since the
>>>>>>>>>>>>>>> comparing is done in an independent thread. So the indev here is to
>>>>>>>>>>>>>>> mirror the packet sent by guest to the packet comparing thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think we can use traffic-redirector to do it.
>>>>>>>>>>>>>>>>>>> The command line is:
>>>>>>>>>>>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>>>>>>>>>>>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>>>>>>>>>>>>>>>>>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>>>>>>>>>>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>>>>>>>>>>>>>>>>>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>>>>>>>>>>>>>>>>>> out the packet.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Also, we can merge the socketdev comparer1 and mirrorer0.
>>>>>>>>>>>>>>> It depends on whether or not packet comparing was done in a net filter
>>>>>>>>>>>>>>> (which I prefer not).
>>>>>>>>>>> I mean that packet comparing is done in a thread, not in a net filter.
>>>>>>>>>>> The flow of a packet sent from the guest:
>>>>>>>>>>> 1. traffic-redirector: we redirect the packet to comparer0, so the next
>>>>>>>>>>>    filter will never see it.
>>>>>>>>>>> 2. comparing thread: read it from the socket chardev comparer0
>>>>>>>>>>> 3. call qemu_net_queue_send_iov() to send it back to the netdev.
>>>>>>>>> OK, looks like I missed something.
>>>>>>>>>
>>>>>>>>> My suggestion tries its best not to tie the packet comparing to a filter
>>>>>>>>> or netdev. But your suggestion still needs it to be coupled with a
>>>>>>>>> netdev. Are there any advantages to doing this (or is there a reason that
>>>>>>>>> the packet must be sent to the netdev after comparing)? If not, why not just
>>>>>>> Yes, the packet must be sent to netdev after doing comparing. If both
>>>>>>> the primary packet and secondary packet are the same(contains the same
>>>>>>> application level data), we will drop the secondary packet, and send the
>>>>>>> primary packet to the netdev. Otherwise, we will sync the state.
>>>>> And drop primary packet also here?
>>> No, the primary packet must be sent back to the netdev, so the client can receive
>>> the response.
>>>
>>> For example:
>>> 1. guest has a ftp server
>>> 2. we connect to the ftp server via the network
>>> 3. both primary guest and secondary guest receive this request
>>> 4. both primary guest and secondary guest ack it
>>> 5. we compare these two ack packets in the comparing thread
>>> 6. they are the same (the seqno is different, but that is not important; we can modify it in
>>>    colo-rewriter). So we drop the secondary packet, and send the primary packet back
>>>    to the netdev
>>> 7. The primary ack packet is sent to the ftp client via the netdev.
>>>
>>> The ftp client only cares about the packets it receives. So if the packets from the primary
>>> and secondary guests contain the same data, we can say they are in the "same" state.
>>>
>>> Thanks
>>> Wen Congyang
>>>
>> Thanks for the example. But I still don't get why it must be done before
>> comparing, considering it will always be sent regardless of the result of
>> the comparison?
> Our goal is that: the connection is OK after failover, and the user doesn't know one of
> the hosts crashed.
>
> If it is sent out regardless of the result of the comparison and the primary host crashes, the
> connection may be corrupted after failover. For example: the packets from the primary and
> secondary hosts contain different data, and we send the primary packet before comparing. The
> primary host crashes before these two packets are compared. After failover, the connection may
> be reset, the client may not receive the correct data, or some other unexpected problem may occur.
>
> Another example (TCP):
> 1. the primary guest acks 100, and the secondary guest only acks 95 (some packets are lost in the guest)
> 2. the client doesn't resend the lost packets
> 3. the connection will be recovered after the next checkpoint
> If we do failover before the next checkpoint, there is no way to recover this connection.
>
> If we send out the packet after comparing, we can assume that the client always receives the
> same data.

Thanks. I think I get the point. So if there's a difference, primary
packet will only be sent after checkpoint and we could not assume the
checkpoint itself is reliable.
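
To check that I have the rule right, here is a minimal sketch in Python.
It is only a toy model, not QEMU code: the Comparer class, its two queues
and the send_to_netdev/do_checkpoint callbacks are names made up for
illustration. Primary packets are held until the matching secondary packet
arrives; on a mismatch a checkpoint is taken first and only then are the
queued primary packets released.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass
class Comparer:
    """Toy model of compare-before-release; all names are hypothetical."""
    send_to_netdev: Callable[[bytes], None]  # releases a packet towards the client
    do_checkpoint: Callable[[], None]        # syncs the secondary to the primary
    primary_q: Deque[bytes] = field(default_factory=deque)
    secondary_q: Deque[bytes] = field(default_factory=deque)

    def on_primary(self, payload: bytes) -> None:
        self.primary_q.append(payload)
        self._drain()

    def on_secondary(self, payload: bytes) -> None:
        self.secondary_q.append(payload)
        self._drain()

    def _drain(self) -> None:
        # Nothing is released until both sides have produced a packet.
        while self.primary_q and self.secondary_q:
            pri = self.primary_q.popleft()
            sec = self.secondary_q.popleft()
            if pri == sec:
                # Same data: release the primary packet, drop the secondary one.
                self.send_to_netdev(pri)
                continue
            # Different data: checkpoint first, then release everything the
            # primary has queued; stale secondary packets are discarded.
            self.do_checkpoint()
            self.secondary_q.clear()
            self.send_to_netdev(pri)
            while self.primary_q:
                self.send_to_netdev(self.primary_q.popleft())
```

The point of the ordering is that the client never sees a primary packet
whose state the secondary could not reproduce after a failover.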

Back to the filter design. We'd still better decouple packet comparing
from the netdev. Maybe a little more tweaking of what you've suggested:

-netdev tap,id=hn0
-object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
-object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0,indev=comparer2
-colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,outdev=comparer2

Just add one more socket for the comparer to send the primary packet, and
let f1 redirect its output to the netdev?
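
To spell out the path a guest packet would take with this wiring, here is
a small Python sketch. It is only an illustration of the proposal: the
dicts mirror the option names used in this thread, and trace_guest_packet
is a made-up helper, not an existing QEMU interface.

```python
# Proposed wiring, written as plain dicts. The option names follow the
# command line discussed in this thread; nothing here is real QEMU API.
redirector = {"id": "f1", "netdev": "hn0",
              "outdev": "comparer0", "indev": "comparer2"}
comparer = {"primary_traffic": "comparer0",
            "secondary_traffic": "comparer1",
            "outdev": "comparer2"}


def trace_guest_packet(redirector, comparer):
    """Return the hops a packet sent by the primary guest would take."""
    hops = ["guest", f"filter {redirector['id']}"]
    # The redirector takes the packet off the filter chain and writes it
    # to its outdev socket.
    hops.append(f"chardev {redirector['outdev']}")
    if comparer["primary_traffic"] != redirector["outdev"]:
        raise ValueError("comparer does not read the redirector's outdev")
    hops.append("comparer thread")
    # After comparing, the comparer writes the primary packet to its outdev.
    hops.append(f"chardev {comparer['outdev']}")
    if redirector["indev"] != comparer["outdev"]:
        raise ValueError("redirector does not read the comparer's outdev")
    # The redirector reads the packet back and hands it to the netdev.
    hops.append(f"filter {redirector['id']}")
    hops.append(f"netdev {redirector['netdev']}")
    return hops
```

Note that the comparer dict carries no netdev key at all: it only knows
about sockets, which is what would let other dataplanes reuse it.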

>
> Thanks
> Wen Congyang
>


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-22  7:42                                         ` Jason Wang
@ 2016-01-22  7:46                                           ` Wen Congyang
  2016-01-27 15:22                                             ` Eric Blake
  0 siblings, 1 reply; 75+ messages in thread
From: Wen Congyang @ 2016-01-22  7:46 UTC (permalink / raw)
  To: Jason Wang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong,
	Dr. David Alan Gilbert, Huang peng, Gong lei, Stefan Hajnoczi,
	jan.kiszka, Yang Hongyang

On 01/22/2016 03:42 PM, Jason Wang wrote:
> 
> 
> On 01/22/2016 02:47 PM, Wen Congyang wrote:
>> On 01/22/2016 02:21 PM, Jason Wang wrote:
>>>
>>> On 01/22/2016 01:56 PM, Wen Congyang wrote:
>>>> On 01/22/2016 01:41 PM, Jason Wang wrote:
>>>>>>
>>>>>> On 01/22/2016 11:28 AM, Wen Congyang wrote:
>>>>>>>> On 01/22/2016 11:15 AM, Jason Wang wrote:
>>>>>>>>>> On 01/20/2016 06:30 PM, Wen Congyang wrote:
>>>>>>>>>>>> On 01/20/2016 06:19 PM, Jason Wang wrote:
>>>>>>>>>>>>>>>> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>>>>>>>>>>>>>>>>>>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Two main comments/suggestions:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - TCP analysis is missed in current version, maybe you point a git tree
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (or another version of RFC) to me for a better understanding of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> design. (Just a skeleton for TCP should be sufficient to discuss).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - I prefer to make the code as reusable as possible. So it's better to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> split/decouple the reusable parts from the codes. So a vague idea is:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this 99% since the work has been done in a thread. Just let the thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> poll sockets directly, then the comparing have the possibility to be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reused by other kinds of dataplane.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Implement traffic mirror/redirector as filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Then, in primary node, you need just a traffic mirror, which did:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - mirror ingress traffic to secondary node
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - mirror outgress traffic to packet comparing thread
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> And in the secondary node, you need two filters:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - A TCP seq rewriter which adjust tcp sequence number.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - A traffic redirector which redirect packet from a socket as ingress
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traffic, and redirect outgress traffic to the socket which could be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> polled by remote packet comparing thread.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   Thoughts?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> zhangchen
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Jason.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> We consider your suggestion to split/decouple
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the reusable parts from the codes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because filter plugins are traversed one by one in order,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> we will split colo-proxy into three filters on each side.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> But in this plan, the primary and secondary both have a socket
>>>>>>>>>>>>>>>>>>>>>>>>>>>> server, so startup is a problem.
>>>>>>>>>>>>>>>>>>>>>>>> I believe this issue could be solved by reusing socket chardev.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Primary qemu
>>>>>>>>>>>>>>>>>>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | +-----------------------------------------------------+  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                                                     |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                        guest                        |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                                                     |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | +-----------^--------------+--------------------------+  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |             |              |                             |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |             |              |                             |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |             +-------------------------------------------------> to mirror server (secondary)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |  netfilter  |              |                             |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | +-----------------------------------------------------+  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |           |              |     filter execute order |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |           |              |    +-------------------> |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |           |              |                          |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | | +---------+-+     +------v-----+    +------------+  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |           |     |            |    |            |  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |  mirror   |     |  redirect  +---->  compare   |  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |  client   |     |  server    |    |            |  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |           |     |            |    |            |  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | | +----^------+     +----^-------+    +-----+------+  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |      |     tx          |      rx          |     rx  |  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | +-----------------------------------------------------+  |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |        |                 +------------------------------------< from redirect client (secondary)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |        |                                    |            |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>>>>>>>>>>>>>>>>>>          |                                    |
>>>>>>>>>>>>>>>>>>>>>>>>>>>>          |guest receive                       |guest send
>>>>>>>>>>>>>>>>>>>>>>>>>>>>          |                                    |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> +--------+------------------------------------v------------+
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |                                                          |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |                         tap                              |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |                                                          |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Secondary qemu
>>>>>>>>>>>>>>>>>>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | +------------------------------------------------------+ |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                                                      | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                        guest                         | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                                                      | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | +---------------------+--------+-----------------------+ |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |                       ^        |                         |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |                       |        |                         |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |                       |        |                         |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |  netfilter            |        |                         |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | +------------------------------------------------------+ |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                     |        |  filter execute order | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                     |        | +-------------------> | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |                     |        |   TCP                 | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | | +-----------+   +---+----+---v+rewriter+  +--------+ | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |           |   |        |             |  |        | | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> --->|  mirror   +---> adjust |   adjust    +-->redirect| | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |  server   |   | ack    |   seq       |  |client  | | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | | |           |   |        |             |  |        | | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | | +-----------+   +--------+-------------+  +----+---+ | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | |            tx                        all       |  rx | |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> | +------------------------------------------------------+ |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <--------------------------------------------------+       |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> |                                                          |
>>>>>>>>>>>>>>>>>>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Links between the hosts: the primary mirror client feeds the secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> mirror server; the secondary redirect client feeds the primary redirect
>>>>>>>>>>>>>>>>>>>>>>>>>>>> server.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> NOTE: filter direction is rx/tx/all
>>>>>>>>>>>>>>>>>>>>>>>>>>>> rx: receive packets sent to the netdev
>>>>>>>>>>>>>>>>>>>>>>>>>>>> tx: receive packets sent by the netdev
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I would still like to decouple the comparer from netfilter. That has two
>>>>>>>>>>>>>>>>>>>>>>>> obvious advantages:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> - it can be reused by other dataplanes (e.g. vhost)
>>>>>>>>>>>>>>>>>>>>>>>> - the secondary redirector could redirect rx to the comparer on the
>>>>>>>>>>>>>>>>>>>>>>>> primary node directly, which simplifies the design.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> guest recv packet route
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> primary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> tap --> mirror client filter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the mirror client sends the packet to the guest and, at the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> same time, copies and forwards it to the secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> mirror server.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> mirror server filter --> TCP rewriter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> if the received packet is a TCP packet, we adjust the ack
>>>>>>>>>>>>>>>>>>>>>>>>>>>> and update the TCP checksum, then send it to the secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> guest. Otherwise we send it directly to the guest.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> guest send packet route
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> primary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> guest --> redirect server filter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the redirect server filter receives the primary guest packet
>>>>>>>>>>>>>>>>>>>>>>>>>>>> but does nothing with it; it just passes it to the next filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect server filter --> compare filter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the compare filter receives the primary guest packet, then
>>>>>>>>>>>>>>>>>>>>>>>>>>>> waits for the redirected secondary packet to compare it with.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> If the packets are the same, send the primary packet and drop the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> secondary packet; otherwise send the primary packet and do a
>>>>>>>>>>>>>>>>>>>>>>>>>>>> checkpoint.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> secondary
>>>>>>>>>>>>>>>>>>>>>>>>>>>> guest --> TCP rewriter filter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> if the packet is a TCP packet, we adjust the seq
>>>>>>>>>>>>>>>>>>>>>>>>>>>> and update the TCP checksum, then send it to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect client filter. Otherwise we send it directly to the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect client filter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> redirect client filter --> redirect server filter
>>>>>>>>>>>>>>>>>>>>>>>>>>>> forward the packet to the primary
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the failover case (primary is down), the TCP rewriter will keep
>>>>>>>>>>>>>>>>>>>>>>>>>>>> servicing the TCP connections that were established after the last
>>>>>>>>>>>>>>>>>>>>>>>>>>>> checkpoint.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> How about this plan?
>>>>>>>>>>>>>>>>>>>>>>>> Sounds good.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> And there's indeed no need to differ client/server by reusing the socket
>>>>>>>>>>>>>>>>>>>>>>>> chardev. E.g:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> In primary node:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>>>>>>>>>>>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>>>>>>>>>>>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>>>>>>>>>>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>>>>>>>>>>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>>>>>>>>>>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>>>>>>>>>>>>>>> Why mirrorer has indev? 
>>>>>>>>>>>>>>>> As I said in the previous mails. I would like to decouple packet
>>>>>>>>>>>>>>>> comparing from netfilter. You've already done most of this since the
>>>>>>>>>>>>>>>> comparing is done in an independent thread. So the indev here is to
>>>>>>>>>>>>>>>> mirror the packet sent by guest to the packet comparing thread.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think we can use traffic-redirector to do it.
>>>>>>>>>>>>>>>>>>>> The command line is:
>>>>>>>>>>>>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>>>>>>>>>>>>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>>>>>>>>>>>>>>>>>>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>>>>>>>>>>>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>>>>>>>>>>>>>>>>>>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>>>>>>>>>>>>>>>>>>> out the packet.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Also, we can merge the socketdev comparer1 and mirrorer0.
>>>>>>>>>>>>>>>> It depends on whether or not packet comparing was done in a net filter
>>>>>>>>>>>>>>>> (which I prefer not).
>>>>>>>>>>>> I mean that packet comparing is done in a thread, not in a net filter.
>>>>>>>>>>>> The flow of a packet sent from the guest:
>>>>>>>>>>>> 1. traffic-redirector: we redirect the packet to comparer0, so the next
>>>>>>>>>>>>    filter will never see it.
>>>>>>>>>>>> 2. comparing thread: read it from the socket chardev comparer0
>>>>>>>>>>>> 3. call qemu_net_queue_send_iov() to send it back to the netdev.
>>>>>>>>>> OK, looks like I missed something.
>>>>>>>>>>
>>>>>>>>>> My suggestion tries its best not to tie the packet comparing to a filter
>>>>>>>>>> or netdev. But your suggestion still needs it to be coupled with a
>>>>>>>>>> netdev. Are there any advantages to doing this (or is there a reason that
>>>>>>>>>> the packet must be sent to the netdev after comparing)? If not, why not just
>>>>>>>> Yes, the packet must be sent to netdev after doing comparing. If both
>>>>>>>> the primary packet and secondary packet are the same(contains the same
>>>>>>>> application level data), we will drop the secondary packet, and send the
>>>>>>>> primary packet to the netdev. Otherwise, we will sync the state.
>>>>>> And drop primary packet also here?
>>>> No, the primary packet must be sent back to the netdev, so the client can receive
>>>> the response.
>>>>
>>>> For example:
>>>> 1. guest has a ftp server
>>>> 2. we connect to the ftp server via the network
>>>> 3. both primary guest and secondary guest receive this request
>>>> 4. both primary guest and secondary guest ack it
>>>> 5. we compare these two ack packets in the comparing thread
>>>> 6. they are the same (the seqno is different, but that is not important; we can modify it in
>>>>    colo-rewriter). So we drop the secondary packet, and send the primary packet back
>>>>    to the netdev
>>>> 7. The primary ack packet is sent to the ftp client via the netdev.
>>>>
>>>> The ftp client only cares about the packets it receives. So if the packets from the primary
>>>> and secondary guests contain the same data, we can say they are in the "same" state.
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>> Thanks for the example. But I still don't get why it must be done before
>>> comparing, considering it will always be sent regardless of the result of
>>> the comparison?
>> Our goal is that: the connection is OK after failover, and the user doesn't know one of
>> the hosts crashed.
>>
>> If it is sent out regardless of the result of the comparison and the primary host crashes, the
>> connection may be corrupted after failover. For example: the packets from the primary and
>> secondary hosts contain different data, and we send the primary packet before comparing. The
>> primary host crashes before these two packets are compared. After failover, the connection may
>> be reset, the client may not receive the correct data, or some other unexpected problem may occur.
>>
>> Another example (TCP):
>> 1. the primary guest acks 100, and the secondary guest only acks 95 (some packets are lost in the guest)
>> 2. the client doesn't resend the lost packets
>> 3. the connection will be recovered after the next checkpoint
>> If we do failover before the next checkpoint, there is no way to recover this connection.
>>
>> If we send out the packet after comparing, we can assume that the client always receives the
>> same data.
> 
> Thanks. I think I get the point. So if there's a difference, primary
> packet will only be sent after checkpoint and we could not assume the
> checkpoint itself is reliable.

Yes.

> 
> Back to the filter design. We'd still better decouple packet comparing
> from the netdev. Maybe a little more tweaking of what you've suggested:
>
> -netdev tap,id=hn0
> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0,indev=comparer2
> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,outdev=comparer2
>
> Just add one more socket for the comparer to send the primary packet, and
> let f1 redirect its output to the netdev?

OK, I understand it now.
Thanks for your suggestion.

Wen Congyang



* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-01-22  7:46                                           ` Wen Congyang
@ 2016-01-27 15:22                                             ` Eric Blake
  0 siblings, 0 replies; 75+ messages in thread
From: Eric Blake @ 2016-01-27 15:22 UTC (permalink / raw)
  To: Wen Congyang, Jason Wang, Zhang Chen, qemu devel
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, eddie.dong, Huang peng,
	Dr. David Alan Gilbert, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang


On 01/22/2016 12:46 AM, Wen Congyang wrote:
...

>>>>>>>>>>>>>>>>>>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sure.

Wow, that's a lot of wasted quoting.  Your mail weighed in at 24k, even
though...


>> Thanks. I think I get the point. So if there's a difference, primary
>> packet will only be sent after checkpoint and we could not assume the
>> checkpoint itself is reliable.
> 
> Yes.
> 
>>
>> Back to the filter design. We'd better still decouple packet comparing
>> from the netdev. Maybe a small tweak on what you've suggested:
>>
>> -netdev tap,id=hn0
>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0,indev=comparer2
>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,outdev=comparer2
>>
>> Just add one more socket for the comparer to send the primary packet, and
>> let the f1 redirector redirect its output to the netdev?
> 
> OK, I understand it now.
> Thanks for your suggestion.

...content-wise, you only added about 100 bytes.  It's okay to trim
replies down to relevant portions, to make it easier for readers to get
to the meat of your message.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [RFC PATCH v2 03/10] Colo-proxy: add colo-proxy framework
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 03/10] Colo-proxy: add colo-proxy framework Zhang Chen
@ 2016-02-19 19:57   ` Dr. David Alan Gilbert
  2016-02-22  3:04     ` Zhang Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-19 19:57 UTC (permalink / raw)
  To: Zhang Chen
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> 

> +static void colo_proxy_setup(NetFilterState *nf, Error **errp)
> +{
> +    COLOProxyState *s = FILTER_COLO_PROXY(nf);
> +
> +    if (!s->addr) {
> +        error_setg(errp, "filter colo_proxy needs 'addr' property set!");
> +        return;
> +    }
> +
> +    if (nf->direction != NET_FILTER_DIRECTION_ALL) {
> +        error_setg(errp, "colo need queue all packet,"
> +                        "please startup colo-proxy with queue=all\n");
> +        return;
> +    }
> +
> +    s->sockfd = -1;
> +    s->hashtable_size = 0;
> +    colo_do_checkpoint = false;
> +    qemu_event_init(&s->need_compare_ev, false);
> +
> +    s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);

I found that I had to be careful that this queue got flushed.  If the packet
can't be sent immediately, then the packet only gets sent if another
packet is added to the queue later.  I added a state change notifier to
flush it when the VM started running (this is more of a problem in my hybrid
mode case).
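
As a rough illustration of the flush problem, the sketch below models a queue
that parks packets while the receiver cannot take them and drains them from a
state-change hook. SketchQueue and the sketch_* functions are hypothetical
stand-ins made up for this example; they are not QEMU's NetQueue or its
notifier API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_QUEUE_MAX 8

/* Hypothetical stand-in for a net queue; not QEMU's NetQueue. */
typedef struct {
    int pending[SKETCH_QUEUE_MAX]; /* packet ids waiting to be delivered */
    size_t count;
    bool receiver_ready;           /* false while the VM is stopped */
    int delivered;                 /* number of packets that reached the receiver */
} SketchQueue;

/* Deliver now if possible, otherwise park the packet. Returns false if full. */
static bool sketch_queue_send(SketchQueue *q, int pkt)
{
    assert(q);
    if (q->receiver_ready && q->count == 0) {
        q->delivered++;
        return true;
    }
    if (q->count == SKETCH_QUEUE_MAX) {
        return false;
    }
    q->pending[q->count++] = pkt;
    return true;
}

/* Drain everything that was parked; a no-op while the receiver is not ready. */
static void sketch_queue_flush(SketchQueue *q)
{
    assert(q);
    if (!q->receiver_ready) {
        return;
    }
    q->delivered += (int)q->count;
    q->count = 0;
}

/* Hypothetical state-change hook: called when the VM starts or stops. */
static void sketch_vm_state_changed(SketchQueue *q, bool running)
{
    q->receiver_ready = running;
    if (running) {
        sketch_queue_flush(q);
    }
}
```

Without the flush in the hook, a packet parked while the VM was stopped would
sit in the queue until some later packet happened to trigger delivery.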

Note also that the queue is not protected by locks; so take care since packets
are sent from both the comparison thread and the colo thread (when it flushes)
and I think it's read by the main thread as well potentially as packets are sent.

Dave


> +    colo_conn_hash = g_hash_table_new_full(connection_key_hash,
> +                                           connection_key_equal,
> +                                           g_free,
> +                                           connection_destroy);
> +    g_queue_init(&s->conn_list);
> +}
> +
> +static void colo_proxy_class_init(ObjectClass *oc, void *data)
> +{
> +    NetFilterClass *nfc = NETFILTER_CLASS(oc);
> +
> +    nfc->setup = colo_proxy_setup;
> +    nfc->cleanup = colo_proxy_cleanup;
> +    nfc->receive_iov = colo_proxy_receive_iov;
> +}
> +
> +static int colo_proxy_get_mode(Object *obj, Error **errp)
> +{
> +    COLOProxyState *s = FILTER_COLO_PROXY(obj);
> +
> +    return s->colo_mode;
> +}
> +
> +static void
> +colo_proxy_set_mode(Object *obj, int mode, Error **errp)
> +{
> +    COLOProxyState *s = FILTER_COLO_PROXY(obj);
> +
> +    s->colo_mode = mode;
> +}
> +
> +static char *colo_proxy_get_addr(Object *obj, Error **errp)
> +{
> +    COLOProxyState *s = FILTER_COLO_PROXY(obj);
> +
> +    return g_strdup(s->addr);
> +}
> +
> +static void
> +colo_proxy_set_addr(Object *obj, const char *value, Error **errp)
> +{
> +    COLOProxyState *s = FILTER_COLO_PROXY(obj);
> +    g_free(s->addr);
> +    s->addr = g_strdup(value);
> +    if (!s->addr) {
> +        error_setg(errp, "colo_proxy needs 'addr'"
> +                     "property set!");
> +        return;
> +    }
> +}
> +
> +static void colo_proxy_init(Object *obj)
> +{
> +    object_property_add_enum(obj, "mode", "COLOMode", COLOMode_lookup,
> +                             colo_proxy_get_mode, colo_proxy_set_mode, NULL);
> +    object_property_add_str(obj, "addr", colo_proxy_get_addr,
> +                            colo_proxy_set_addr, NULL);
> +}
> +
> +static void colo_proxy_fini(Object *obj)
> +{
> +    COLOProxyState *s = FILTER_COLO_PROXY(obj);
> +    g_free(s->addr);
> +}
> +
> +static const TypeInfo colo_proxy_info = {
> +    .name = TYPE_FILTER_COLO_PROXY,
> +    .parent = TYPE_NETFILTER,
> +    .class_init = colo_proxy_class_init,
> +    .instance_init = colo_proxy_init,
> +    .instance_finalize = colo_proxy_fini,
> +    .instance_size = sizeof(COLOProxyState),
> +};
> +
> +static void register_types(void)
> +{
> +    type_register_static(&colo_proxy_info);
> +}
> +
> +type_init(register_types);
> diff --git a/net/colo-proxy.h b/net/colo-proxy.h
> new file mode 100644
> index 0000000..affc117
> --- /dev/null
> +++ b/net/colo-proxy.h
> @@ -0,0 +1,24 @@
> +/*
> + * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
> + * (a.k.a. Fault Tolerance or Continuous Replication)
> + *
> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
> + * Copyright (c) 2015 FUJITSU LIMITED
> + * Copyright (c) 2015 Intel Corporation
> + *
> + * Author: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * later.  See the COPYING file in the top-level directory.
> + */
> +
> +
> +#ifndef QEMU_COLO_PROXY_H
> +#define QEMU_COLO_PROXY_H
> +
> +int colo_proxy_start(int mode);
> +void colo_proxy_stop(int mode);
> +int colo_proxy_do_checkpoint(int mode);
> +bool colo_proxy_query_checkpoint(void);
> +
> +#endif /* QEMU_COLO_PROXY_H */
> -- 
> 1.9.1
> 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 05/10] net/colo-proxy: Add colo interface to use proxy
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 05/10] net/colo-proxy: Add colo interface to use proxy Zhang Chen
@ 2016-02-19 19:58   ` Dr. David Alan Gilbert
  2016-02-22  3:08     ` Zhang Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-19 19:58 UTC (permalink / raw)
  To: Zhang Chen
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> 
> Add interface used by migration/colo.c
> so colo framework can work with proxy
> 
> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  net/colo-proxy.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 93 insertions(+)
> 
> diff --git a/net/colo-proxy.c b/net/colo-proxy.c
> index f448ee1..ba2bbe7 100644
> --- a/net/colo-proxy.c
> +++ b/net/colo-proxy.c
> @@ -167,6 +167,11 @@ static int connection_key_equal(const void *opaque1, const void *opaque2)
>      return memcmp(opaque1, opaque2, sizeof(ConnectionKey)) == 0;
>  }
>  
> +bool colo_proxy_query_checkpoint(void)
> +{
> +    return colo_do_checkpoint;
> +}
> +
>  static ssize_t colo_proxy_receive_iov(NetFilterState *nf,
>                                           NetClientState *sender,
>                                           unsigned flags,
> @@ -203,6 +208,94 @@ static void colo_proxy_cleanup(NetFilterState *nf)
>      qemu_event_destroy(&s->need_compare_ev);
>  }
>  
> +static void colo_proxy_notify_checkpoint(void)
> +{
> +    trace_colo_proxy("colo_proxy_notify_checkpoint");
> +    colo_do_checkpoint = true;
> +}
> +
> +static void colo_proxy_start_one(NetFilterState *nf,
> +                                      void *opaque, Error **errp)
> +{
> +    COLOProxyState *s;
> +    int mode, ret;
> +
> +    if (strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_COLO_PROXY)) {
> +        return;
> +    }
> +
> +    mode = *(int *)opaque;
> +    s = FILTER_COLO_PROXY(nf);
> +    assert(s->colo_mode == mode);
> +
> +    if (s->colo_mode == COLO_MODE_PRIMARY) {
> +        char thread_name[1024];
> +
> +        ret = colo_proxy_connect(s);
> +        if (ret) {
> +            error_setg(errp, "colo proxy connect failed");
> +            return ;
> +        }
> +
> +        s->status = COLO_PROXY_RUNNING;
> +        sprintf(thread_name, "proxy compare %s", nf->netdev_id);
> +        qemu_thread_create(&s->thread, thread_name,
> +                                colo_proxy_compare_thread, s,
> +                                QEMU_THREAD_JOINABLE);

Note that most OSs have a ~14 character limit on the size of the thread
name; beyond that they ignore the request to set the name (and the
thread shows up as 'migration'), so I suggest keeping it as "proxy:%s".
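
A small sketch of building the shorter name with a bounded buffer follows. It
assumes the Linux limit of 16 bytes including the terminating NUL (so 15
visible characters); other systems may differ. sketch_thread_name and
SKETCH_THREAD_NAME_LEN are names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed limit: Linux stores thread names in 16 bytes including the NUL. */
#define SKETCH_THREAD_NAME_LEN 16

/*
 * Build "proxy:<netdev_id>" into out. Returns true if the whole name fit,
 * false if snprintf had to truncate it to the buffer size.
 */
static bool sketch_thread_name(char out[SKETCH_THREAD_NAME_LEN],
                               const char *netdev_id)
{
    int n;

    assert(out && netdev_id);
    n = snprintf(out, SKETCH_THREAD_NAME_LEN, "proxy:%s", netdev_id);
    return n >= 0 && n < SKETCH_THREAD_NAME_LEN;
}
```

Using snprintf instead of sprintf also removes the need for the 1024-byte
buffer, and the return value tells the caller when a long netdev id was cut.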

Dave

> +    } else {
> +        ret = colo_wait_incoming(s);
> +        if (ret) {
> +            error_setg(errp, "colo proxy wait incoming failed");
> +            return ;
> +        }
> +        s->status = COLO_PROXY_RUNNING;
> +    }
> +}
> +
> +int colo_proxy_start(int mode)
> +{
> +    Error *err = NULL;
> +    qemu_foreach_netfilter(colo_proxy_start_one, &mode, &err);
> +    if (err) {
> +        return -1;
> +    }
> +    return 0;
> +}
> +
> +static void colo_proxy_stop_one(NetFilterState *nf,
> +                                      void *opaque, Error **errp)
> +{
> +    COLOProxyState *s;
> +    int mode;
> +
> +    if (strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_COLO_PROXY)) {
> +        return;
> +    }
> +
> +    s = FILTER_COLO_PROXY(nf);
> +    mode = *(int *)opaque;
> +    assert(s->colo_mode == mode);
> +
> +    s->status = COLO_PROXY_DONE;
> +    if (s->sockfd >= 0) {
> +        qemu_set_fd_handler(s->sockfd, NULL, NULL, NULL);
> +        closesocket(s->sockfd);
> +    }
> +    if (s->colo_mode == COLO_MODE_PRIMARY) {
> +        colo_proxy_primary_checkpoint(s);
> +        qemu_event_set(&s->need_compare_ev);
> +        qemu_thread_join(&s->thread);
> +    } else {
> +        colo_proxy_secondary_checkpoint(s);
> +    }
> +}
> +
> +void colo_proxy_stop(int mode)
> +{
> +    Error *err = NULL;
> +    qemu_foreach_netfilter(colo_proxy_stop_one, &mode, &err);
> +}
> +
>  static void colo_proxy_setup(NetFilterState *nf, Error **errp)
>  {
>      COLOProxyState *s = FILTER_COLO_PROXY(nf);
> -- 
> 1.9.1
> 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 06/10] net/colo-proxy: add socket used by forward func
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 06/10] net/colo-proxy: add socket used by forward func Zhang Chen
@ 2016-02-19 20:01   ` Dr. David Alan Gilbert
  2016-02-22  5:51     ` Zhang Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-19 20:01 UTC (permalink / raw)
  To: Zhang Chen
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> 
> COLO needs to forward packets.
> We start a socket server on the secondary, and the primary
> connects to the secondary at startup.
> Packets received by the primary are forwarded to the secondary;
> packets sent by the secondary are forwarded to the primary.
> 
> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>

One problem I found with the socket setup is that the
packets from the primary and secondary aren't tied to the
checkpoint they are part of; so for example a packet from the secondary
may reach the primary at the start of the next checkpoint, causing a
miscomparison.
I added a counter to discard old packets.
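
One possible shape for such a counter is sketched below: each forwarded packet
carries the checkpoint number it was produced in, and the receiver drops
anything from an older checkpoint. This is an assumption about how it could
look, not the actual change; SketchHeader, SketchReceiver and the sketch_*
functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header prepended to each forwarded packet. */
typedef struct {
    uint32_t epoch;  /* checkpoint number the packet was produced in */
    uint32_t len;    /* payload length in bytes */
} SketchHeader;

typedef struct {
    uint32_t current_epoch;
    unsigned accepted;
    unsigned discarded;
} SketchReceiver;

/* Called once each checkpoint completes. */
static void sketch_checkpoint_done(SketchReceiver *r)
{
    r->current_epoch++;
}

/* Returns true if the packet belongs to the current checkpoint epoch. */
static bool sketch_accept(SketchReceiver *r, const SketchHeader *h)
{
    assert(r && h);
    if (h->epoch != r->current_epoch) {
        r->discarded++;   /* stale packet from an earlier checkpoint */
        return false;
    }
    r->accepted++;
    return true;
}
```

With the tag in place, a late secondary packet is rejected on arrival instead
of being compared against primary packets from the next checkpoint.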

Dave

> ---
>  net/colo-proxy.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 114 insertions(+)
> 
> diff --git a/net/colo-proxy.c b/net/colo-proxy.c
> index ba2bbe7..2347bbf 100644
> --- a/net/colo-proxy.c
> +++ b/net/colo-proxy.c
> @@ -172,6 +172,69 @@ bool colo_proxy_query_checkpoint(void)
>      return colo_do_checkpoint;
>  }
>  
> +/*
> + * send a packet to peer
> + * >=0: success
> + * <0: fail
> + */
> +static ssize_t colo_proxy_sock_send(NetFilterState *nf,
> +                                         const struct iovec *iov,
> +                                         int iovcnt)
> +{
> +    COLOProxyState *s = FILTER_COLO_PROXY(nf);
> +    ssize_t ret = 0;
> +    ssize_t size = 0;
> +    struct iovec sizeiov = {
> +        .iov_base = &size,
> +        .iov_len = sizeof(size)
> +    };
> +    size = iov_size(iov, iovcnt);
> +    if (!size) {
> +        return 0;
> +    }
> +
> +    ret = iov_send(s->sockfd, &sizeiov, 1, 0, sizeof(size));
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    ret = iov_send(s->sockfd, iov, iovcnt, 0, size);
> +    return ret;
> +}
> +
> +/*
> + * receive a packet from peer
> + * in primary: enqueue packet to secondary_list
> + * in secondary: pass packet to next
> + */
> +static void colo_proxy_sock_receive(void *opaque)
> +{
> +    NetFilterState *nf = opaque;
> +    COLOProxyState *s = FILTER_COLO_PROXY(nf);
> +    ssize_t len = 0;
> +    struct iovec sizeiov = {
> +        .iov_base = &len,
> +        .iov_len = sizeof(len)
> +    };
> +
> +    iov_recv(s->sockfd, &sizeiov, 1, 0, sizeof(len));
> +    if (len > 0 && len < NET_BUFSIZE) {
> +        char *buf = g_malloc0(len);
> +        struct iovec iov = {
> +            .iov_base = buf,
> +            .iov_len = len
> +        };
> +
> +        iov_recv(s->sockfd, &iov, 1, 0, len);
> +        if (s->colo_mode == COLO_MODE_PRIMARY) {
> +            colo_proxy_enqueue_secondary_packet(nf, buf, len);
> +            /* buf will be released when the packet is destroyed */
> +        } else {
> +            qemu_net_queue_send(s->incoming_queue, nf->netdev,
> +                            0, (const uint8_t *)buf, len, NULL);
> +        }
> +    }
> +}
> +
>  static ssize_t colo_proxy_receive_iov(NetFilterState *nf,
>                                           NetClientState *sender,
>                                           unsigned flags,
> @@ -208,6 +271,57 @@ static void colo_proxy_cleanup(NetFilterState *nf)
>      qemu_event_destroy(&s->need_compare_ev);
>  }
>  
> +/* wait for peer connecting
> + * NOTE: this function will block the caller
> + * 0 on success, otherwise returns -1
> + */
> +static int colo_wait_incoming(COLOProxyState *s)
> +{
> +    struct sockaddr_in addr;
> +    socklen_t addrlen = sizeof(addr);
> +    int accept_sock, err;
> +    int fd = inet_listen(s->addr, NULL, 256, SOCK_STREAM, 0, NULL);
> +
> +    if (fd < 0) {
> +        error_report("colo proxy listen failed");
> +        return -1;
> +    }
> +
> +    do {
> +        accept_sock = qemu_accept(fd, (struct sockaddr *)&addr, &addrlen);
> +        err = socket_error();
> +    } while (accept_sock < 0 && err == EINTR);
> +    closesocket(fd);
> +
> +    if (accept_sock < 0) {
> +        error_report("colo proxy accept failed(%s)", strerror(err));
> +        return -1;
> +    }
> +    s->sockfd = accept_sock;
> +
> +    qemu_set_fd_handler(s->sockfd, colo_proxy_sock_receive, NULL, (void *)s);
> +
> +    return 0;
> +}
> +
> +/* try to connect listening server
> + * 0 on success, otherwise something wrong
> + */
> +static ssize_t colo_proxy_connect(COLOProxyState *s)
> +{
> +    int sock;
> +    sock = inet_connect(s->addr, NULL);
> +
> +    if (sock < 0) {
> +        error_report("colo proxy inet_connect failed");
> +        return -1;
> +    }
> +    s->sockfd = sock;
> +    qemu_set_fd_handler(s->sockfd, colo_proxy_sock_receive, NULL, (void *)s);
> +
> +    return 0;
> +}
> +
>  static void colo_proxy_notify_checkpoint(void)
>  {
>      trace_colo_proxy("colo_proxy_notify_checkpoint");
> -- 
> 1.9.1
> 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 08/10] net/colo-proxy: Handle packet and connection
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 08/10] net/colo-proxy: Handle packet and connection Zhang Chen
@ 2016-02-19 20:04   ` Dr. David Alan Gilbert
  2016-02-22  6:41     ` Zhang Chen
  0 siblings, 1 reply; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-19 20:04 UTC (permalink / raw)
  To: Zhang Chen
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> 
> Handle IP packets and connections here.
> 
> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  net/colo-proxy.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 130 insertions(+)
> 
> diff --git a/net/colo-proxy.c b/net/colo-proxy.c
> index 5e5c72e..06bab80 100644
> --- a/net/colo-proxy.c
> +++ b/net/colo-proxy.c
> @@ -167,11 +167,141 @@ static int connection_key_equal(const void *opaque1, const void *opaque2)
>      return memcmp(opaque1, opaque2, sizeof(ConnectionKey)) == 0;
>  }
>  
> +static void connection_destroy(void *opaque)
> +{
> +    Connection *conn = opaque;
> +
> +    g_queue_foreach(&conn->primary_list, packet_destroy, NULL);
> +    g_queue_free(&conn->primary_list);
> +    g_queue_foreach(&conn->secondary_list, packet_destroy, NULL);

Be careful about these lists and which threads access them;
I found I could occasionally trigger a seg fault as two
threads tried to manipulate them at once; I just put a 'list_lock'
in the connection, which seems to fix it, but I might have to be
more careful with deadlocks.
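
For reference, a per-connection lock of this kind could look roughly like the
sketch below. It uses a plain pthread mutex and fixed-size arrays to stay
self-contained; SketchConn and the sketch_* functions are hypothetical and do
not reflect the real Connection struct or the actual fix.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_LIST_MAX 16

/* Hypothetical connection with one lock covering both packet lists. */
typedef struct {
    pthread_mutex_t list_lock;
    int primary[SKETCH_LIST_MAX];
    size_t primary_len;
    int secondary[SKETCH_LIST_MAX];
    size_t secondary_len;
} SketchConn;

static void sketch_conn_init(SketchConn *c)
{
    assert(c);
    pthread_mutex_init(&c->list_lock, NULL);
    c->primary_len = 0;
    c->secondary_len = 0;
}

/* Append to the primary list; returns 0 on success, -1 if the list is full. */
static int sketch_push_primary(SketchConn *c, int pkt)
{
    int ret = -1;

    pthread_mutex_lock(&c->list_lock);
    if (c->primary_len < SKETCH_LIST_MAX) {
        c->primary[c->primary_len++] = pkt;
        ret = 0;
    }
    pthread_mutex_unlock(&c->list_lock);
    return ret;
}

/* Drop every queued packet, e.g. at teardown; returns how many were dropped. */
static size_t sketch_conn_clear(SketchConn *c)
{
    size_t dropped;

    pthread_mutex_lock(&c->list_lock);
    dropped = c->primary_len + c->secondary_len;
    c->primary_len = 0;
    c->secondary_len = 0;
    pthread_mutex_unlock(&c->list_lock);
    return dropped;
}
```

Every function takes the lock for the shortest possible span and never calls
out while holding it, which is one simple way to keep deadlocks unlikely.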

> +    g_queue_free(&conn->secondary_list);
> +    g_slice_free(Connection, conn);
> +}
> +
> +static Connection *connection_new(ConnectionKey *key)
> +{
> +    Connection *conn = g_slice_new(Connection);
> +
> +    conn->ip_proto = key->ip_proto;
> +    conn->processing = false;
> +    g_queue_init(&conn->primary_list);
> +    g_queue_init(&conn->secondary_list);
> +
> +    return conn;
> +}
> +
> +/*
> + * Clear hashtable, stop this hash growing really huge
> + */
> +static void clear_connection_hashtable(COLOProxyState *s)
> +{
> +    s->hashtable_size = 0;
> +    g_hash_table_remove_all(colo_conn_hash);
> +    trace_colo_proxy("clear_connection_hashtable");
> +}
> +
>  bool colo_proxy_query_checkpoint(void)
>  {
>      return colo_do_checkpoint;
>  }
>  
> +/* Return 0 on success, or return -1 if the pkt is corrupted */
> +static int parse_packet_early(Packet *pkt, ConnectionKey *key)
> +{
> +    int network_length;
> +    uint8_t *data = pkt->data;
> +    uint16_t l3_proto;
> +    uint32_t tmp_ports;
> +    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
> +
> +    pkt->network_layer = data + ETH_HLEN;
> +    l3_proto = eth_get_l3_proto(data, l2hdr_len);
> +    if (l3_proto != ETH_P_IP) {
> +        if (l3_proto == ETH_P_ARP) {
> +            return -1;
> +        }
> +        return 0;
> +    }
> +
> +    network_length = pkt->ip->ip_hl * 4;
> +    pkt->transport_layer = pkt->network_layer + network_length;
> +    key->ip_proto = pkt->ip->ip_p;
> +    key->src = pkt->ip->ip_src;
> +    key->dst = pkt->ip->ip_dst;
> +
> +    switch (key->ip_proto) {
> +    case IPPROTO_TCP:
> +    case IPPROTO_UDP:
> +    case IPPROTO_DCCP:
> +    case IPPROTO_ESP:
> +    case IPPROTO_SCTP:
> +    case IPPROTO_UDPLITE:
> +        tmp_ports = *(uint32_t *)(pkt->transport_layer);
> +        key->src_port = tmp_ports & 0xffff;
> +        key->dst_port = tmp_ports >> 16;

These fields are not byteswapped; it makes it very confusing
when printing them for debug;  I added htons around every
reading of the ports from the packets.
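
A minimal sketch of reading the ports in host byte order is shown below. It
uses ntohs, the conventional name for values coming off the wire, which
performs the same 16-bit swap as htons. sketch_read_ports is a hypothetical
helper, and the sketch assumes the ports sit in the first four bytes of the
transport header, as they do for TCP and UDP.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read the source and destination ports from the first four bytes of a
 * TCP or UDP header and convert them to host byte order.
 */
static void sketch_read_ports(const uint8_t *transport,
                              uint16_t *src_port, uint16_t *dst_port)
{
    uint16_t raw_src, raw_dst;

    assert(transport && src_port && dst_port);
    /* memcpy avoids unaligned loads through a cast pointer. */
    memcpy(&raw_src, transport, sizeof(raw_src));
    memcpy(&raw_dst, transport + 2, sizeof(raw_dst));
    *src_port = ntohs(raw_src);
    *dst_port = ntohs(raw_dst);
}
```

Reading each port separately also avoids depending on host endianness to
decide which half of a 32-bit load is the source port.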

Dave

> +        break;
> +    case IPPROTO_AH:
> +        tmp_ports = *(uint32_t *)(pkt->transport_layer + 4);
> +        key->src_port = tmp_ports & 0xffff;
> +        key->dst_port = tmp_ports >> 16;
> +        break;
> +    default:
> +        break;
> +    }
> +
> +    return 0;
> +}
> +
> +static Packet *packet_new(COLOProxyState *s, void *data,
> +                          int size, ConnectionKey *key, NetClientState *sender)
> +{
> +    Packet *pkt = g_slice_new(Packet);
> +
> +    pkt->data = data;
> +    pkt->size = size;
> +    pkt->s = s;
> +    pkt->sender = sender;
> +
> +    if (parse_packet_early(pkt, key)) {
> +        packet_destroy(pkt, NULL);
> +        pkt = NULL;
> +    }
> +
> +    return pkt;
> +}
> +
> +static void packet_destroy(void *opaque, void *user_data)
> +{
> +    Packet *pkt = opaque;
> +    g_free(pkt->data);
> +    g_slice_free(Packet, pkt);
> +}
> +
> +/* if not found, create a new connection and add it to the hash table */
> +static Connection *colo_proxy_get_conn(COLOProxyState *s,
> +            ConnectionKey *key)
> +{
> +    /* FIXME: protect colo_conn_hash */
> +    Connection *conn = g_hash_table_lookup(colo_conn_hash, key);
> +
> +    if (conn == NULL) {
> +        ConnectionKey *new_key = g_malloc(sizeof(*key));
> +
> +        conn = connection_new(key);
> +        memcpy(new_key, key, sizeof(*key));
> +
> +        s->hashtable_size++;
> +        if (s->hashtable_size > hashtable_max_size) {
> +            trace_colo_proxy("colo proxy connection hashtable full, clear it");
> +            clear_connection_hashtable(s);
> +        } else {
> +            g_hash_table_insert(colo_conn_hash, new_key, conn);
> +        }
> +    }
> +
> +     return conn;
> +}
> +
>  static ssize_t colo_proxy_enqueue_primary_packet(NetFilterState *nf,
>                                           NetClientState *sender,
>                                           unsigned flags,
> -- 
> 1.9.1
> 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 09/10] net/colo-proxy: Compare pri pkt to sec pkt
  2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 09/10] net/colo-proxy: Compare pri pkt to sec pkt Zhang Chen
@ 2016-02-19 20:07   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-19 20:07 UTC (permalink / raw)
  To: Zhang Chen
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> 
> We compare the packets sent by the primary guest
> with those sent by the secondary guest; if they are the same,
> we send the primary packet. Otherwise we notify COLO to do a
> checkpoint so that the secondary guest runs the same as the primary.
> 
> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  net/colo-proxy.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/net/colo-proxy.c b/net/colo-proxy.c
> index 06bab80..abb289f 100644
> --- a/net/colo-proxy.c
> +++ b/net/colo-proxy.c
> @@ -602,6 +602,70 @@ static void colo_proxy_notify_checkpoint(void)
>      colo_do_checkpoint = true;
>  }
>  
> +/*
> + * The IP packets sent by primary and secondary
> + * are compared here
> + * TODO: support ip fragment
> + * return:    0  means packet same
> + *            > 0 || < 0 means packet different
> + */
> +static int colo_packet_compare(Packet *ppkt, Packet *spkt)
> +{
> +    trace_colo_proxy("colo_packet_compare data   ppkt");
> +    trace_colo_proxy_packet_size(ppkt->size);
> +    trace_colo_proxy_packet_src(inet_ntoa(ppkt->ip->ip_src));
> +    trace_colo_proxy_packet_dst(inet_ntoa(ppkt->ip->ip_dst));
> +    colo_proxy_dump_packet(ppkt);
> +    trace_colo_proxy("colo_packet_compare data   spkt");
> +    trace_colo_proxy_packet_size(spkt->size);
> +    trace_colo_proxy_packet_src(inet_ntoa(spkt->ip->ip_src));
> +    trace_colo_proxy_packet_dst(inet_ntoa(spkt->ip->ip_dst));
> +    colo_proxy_dump_packet(spkt);
> +
> +    if (ppkt->size == spkt->size) {
> +        return memcmp(ppkt->data, spkt->data, spkt->size);
> +    } else {
> +        trace_colo_proxy("colo_packet_compare size not same");
> +        return -1;
> +    }
> +}
> +
> +static void colo_compare_connection(void *opaque, void *user_data)
> +{
> +    Connection *conn = opaque;
> +    Packet *pkt = NULL;
> +    GList *result = NULL;
> +
> +    while (!g_queue_is_empty(&conn->primary_list) &&
> +                !g_queue_is_empty(&conn->secondary_list)) {
> +        pkt = g_queue_pop_head(&conn->primary_list);
> +        result = g_queue_find_custom(&conn->secondary_list,
> +                    pkt, (GCompareFunc)colo_packet_compare);

Are you sure the 'ppkt' and 'spkt' are the right way around in colo_packet_compare?
(Not that in this simple version it makes much difference).
My reading of g_queue_find_custom's man page is that the first parameter
of the compare function comes from the list, which is the secondary.
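
The argument order can be illustrated with a small stand-in for a find_custom
style search, where the callback receives the list element first and the
caller's data second and returns 0 on a match. sketch_find_custom, SketchPkt
and sketch_compare are hypothetical names for this example only; GLib itself
is not used here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet carrying only a size and its origin. */
typedef struct {
    int size;
    int from_secondary;
} SketchPkt;

typedef int (*SketchCompareFunc)(const SketchPkt *element,
                                 const SketchPkt *user_data);

/*
 * Stand-in for a find_custom style search: func is called with the list
 * element first and the caller's data second, and returns 0 on a match.
 */
static const SketchPkt *sketch_find_custom(const SketchPkt *const *list,
                                           size_t n,
                                           const SketchPkt *user_data,
                                           SketchCompareFunc func)
{
    for (size_t i = 0; i < n; i++) {
        if (func(list[i], user_data) == 0) {
            return list[i];
        }
    }
    return NULL;
}

/* Parameters named for what they really are: secondary element first. */
static int sketch_compare(const SketchPkt *spkt, const SketchPkt *ppkt)
{
    assert(spkt->from_secondary && !ppkt->from_secondary);
    return spkt->size - ppkt->size;
}
```

When searching the secondary list for a primary packet, the first callback
parameter is therefore the secondary packet, which matters as soon as the
comparison stops being symmetric.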

> +        if (result) {
> +            colo_send_primary_packet(pkt, NULL);
> +            trace_colo_proxy("packet same and release packet");
> +        } else {
> +            g_queue_push_tail(&conn->primary_list, pkt);

You pop the packets off the head of the primary list above,
but push them back to the tail here; why do you reorder?
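
If the reordering is not intended, pushing the unmatched packet back onto the
head keeps the original order. The sketch below shows that on a tiny array
based FIFO; SketchFifo and the sketch_* helpers are hypothetical and only
stand in for the GQueue used in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CAP 8

/* Hypothetical fixed-size FIFO of packet ids; index 0 is the head. */
typedef struct {
    int items[SKETCH_CAP];
    size_t len;
} SketchFifo;

/* Remove and return the oldest packet. */
static int sketch_pop_head(SketchFifo *q)
{
    int head;

    assert(q->len > 0);
    head = q->items[0];
    for (size_t i = 1; i < q->len; i++) {
        q->items[i - 1] = q->items[i];
    }
    q->len--;
    return head;
}

/* Put a packet back at the front so it is the next one popped. */
static void sketch_push_head(SketchFifo *q, int pkt)
{
    assert(q->len < SKETCH_CAP);
    for (size_t i = q->len; i > 0; i--) {
        q->items[i] = q->items[i - 1];
    }
    q->items[0] = pkt;
    q->len++;
}
```

Popping packet 1 from the list 1, 2, 3 and pushing it back onto the head
leaves 1, 2, 3, whereas pushing it onto the tail would leave 2, 3, 1.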

Dave

> +            trace_colo_proxy("packet different");
> +            colo_proxy_notify_checkpoint();
> +            break;
> +        }
> +    }
> +}
> +
> +static void *colo_proxy_compare_thread(void *opaque)
> +{
> +    COLOProxyState *s = opaque;
> +
> +    while (s->status == COLO_PROXY_RUNNING) {
> +        qemu_event_wait(&s->need_compare_ev);
> +        qemu_event_reset(&s->need_compare_ev);
> +        g_queue_foreach(&s->conn_list, colo_compare_connection, NULL);
> +    }
> +
> +    return NULL;
> +}
> +
>  static void colo_proxy_start_one(NetFilterState *nf,
>                                        void *opaque, Error **errp)
>  {
> -- 
> 1.9.1
> 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 03/10] Colo-proxy: add colo-proxy framework
  2016-02-19 19:57   ` Dr. David Alan Gilbert
@ 2016-02-22  3:04     ` Zhang Chen
  0 siblings, 0 replies; 75+ messages in thread
From: Zhang Chen @ 2016-02-22  3:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang



On 02/20/2016 03:57 AM, Dr. David Alan Gilbert wrote:
> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>
>> +static void colo_proxy_setup(NetFilterState *nf, Error **errp)
>> +{
>> +    COLOProxyState *s = FILTER_COLO_PROXY(nf);
>> +
>> +    if (!s->addr) {
>> +        error_setg(errp, "filter colo_proxy needs 'addr' property set!");
>> +        return;
>> +    }
>> +
>> +    if (nf->direction != NET_FILTER_DIRECTION_ALL) {
>> +        error_setg(errp, "colo need queue all packet,"
>> +                        "please startup colo-proxy with queue=all\n");
>> +        return;
>> +    }
>> +
>> +    s->sockfd = -1;
>> +    s->hashtable_size = 0;
>> +    colo_do_checkpoint = false;
>> +    qemu_event_init(&s->need_compare_ev, false);
>> +
>> +    s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> I found that I had to be careful that this queue got flushed.  If the packet
> can't be sent immediately, then the packet only gets sent if another
> packet is added to the queue later.  I added a state change notifier to
> flush it when the VM started running (this is more of a problem in my hybrid
> mode case).
>
> Note also that the queue is not protected by locks; so take care since packets
> are sent from both the comparison thread and the colo thread (when it flushes)
> and I think it's read by the main thread as well potentially as packets are sent.
>
> Dave
>

Hi, Dave.
Thanks for your review. I will pay attention to this problem in the
following modules.
Following Jason's comments, we have split colo-proxy into filter-mirror,
filter-redirector, filter-rewriter and colo-compare. For details, please
see the discussion in "[RFC PATCH v2 00/10] Add colo-proxy based on
netfilter". If you have time, please review it.

Thanks
zhangchen

>> +    colo_conn_hash = g_hash_table_new_full(connection_key_hash,
>> +                                           connection_key_equal,
>> +                                           g_free,
>> +                                           connection_destroy);
>> +    g_queue_init(&s->conn_list);
>> +}
>> +
>> +static void colo_proxy_class_init(ObjectClass *oc, void *data)
>> +{
>> +    NetFilterClass *nfc = NETFILTER_CLASS(oc);
>> +
>> +    nfc->setup = colo_proxy_setup;
>> +    nfc->cleanup = colo_proxy_cleanup;
>> +    nfc->receive_iov = colo_proxy_receive_iov;
>> +}
>> +
>> +static int colo_proxy_get_mode(Object *obj, Error **errp)
>> +{
>> +    COLOProxyState *s = FILTER_COLO_PROXY(obj);
>> +
>> +    return s->colo_mode;
>> +}
>> +
>> +static void
>> +colo_proxy_set_mode(Object *obj, int mode, Error **errp)
>> +{
>> +    COLOProxyState *s = FILTER_COLO_PROXY(obj);
>> +
>> +    s->colo_mode = mode;
>> +}
>> +
>> +static char *colo_proxy_get_addr(Object *obj, Error **errp)
>> +{
>> +    COLOProxyState *s = FILTER_COLO_PROXY(obj);
>> +
>> +    return g_strdup(s->addr);
>> +}
>> +
>> +static void
>> +colo_proxy_set_addr(Object *obj, const char *value, Error **errp)
>> +{
>> +    COLOProxyState *s = FILTER_COLO_PROXY(obj);
>> +    g_free(s->addr);
>> +    s->addr = g_strdup(value);
>> +    if (!s->addr) {
>> +        error_setg(errp, "colo_proxy needs 'addr'"
>> +                     "property set!");
>> +        return;
>> +    }
>> +}
>> +
>> +static void colo_proxy_init(Object *obj)
>> +{
>> +    object_property_add_enum(obj, "mode", "COLOMode", COLOMode_lookup,
>> +                             colo_proxy_get_mode, colo_proxy_set_mode, NULL);
>> +    object_property_add_str(obj, "addr", colo_proxy_get_addr,
>> +                            colo_proxy_set_addr, NULL);
>> +}
>> +
>> +static void colo_proxy_fini(Object *obj)
>> +{
>> +    COLOProxyState *s = FILTER_COLO_PROXY(obj);
>> +    g_free(s->addr);
>> +}
>> +
>> +static const TypeInfo colo_proxy_info = {
>> +    .name = TYPE_FILTER_COLO_PROXY,
>> +    .parent = TYPE_NETFILTER,
>> +    .class_init = colo_proxy_class_init,
>> +    .instance_init = colo_proxy_init,
>> +    .instance_finalize = colo_proxy_fini,
>> +    .instance_size = sizeof(COLOProxyState),
>> +};
>> +
>> +static void register_types(void)
>> +{
>> +    type_register_static(&colo_proxy_info);
>> +}
>> +
>> +type_init(register_types);
>> diff --git a/net/colo-proxy.h b/net/colo-proxy.h
>> new file mode 100644
>> index 0000000..affc117
>> --- /dev/null
>> +++ b/net/colo-proxy.h
>> @@ -0,0 +1,24 @@
>> +/*
>> + * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
>> + * (a.k.a. Fault Tolerance or Continuous Replication)
>> + *
>> + * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
>> + * Copyright (c) 2015 FUJITSU LIMITED
>> + * Copyright (c) 2015 Intel Corporation
>> + *
>> + * Author: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> + * later.  See the COPYING file in the top-level directory.
>> + */
>> +
>> +
>> +#ifndef QEMU_COLO_PROXY_H
>> +#define QEMU_COLO_PROXY_H
>> +
>> +int colo_proxy_start(int mode);
>> +void colo_proxy_stop(int mode);
>> +int colo_proxy_do_checkpoint(int mode);
>> +bool colo_proxy_query_checkpoint(void);
>> +
>> +#endif /* QEMU_COLO_PROXY_H */
>> -- 
>> 1.9.1
>>
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 05/10] net/colo-proxy: Add colo interface to use proxy
  2016-02-19 19:58   ` Dr. David Alan Gilbert
@ 2016-02-22  3:08     ` Zhang Chen
  0 siblings, 0 replies; 75+ messages in thread
From: Zhang Chen @ 2016-02-22  3:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang



On 02/20/2016 03:58 AM, Dr. David Alan Gilbert wrote:
> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>
>> Add interface used by migration/colo.c
>> so colo framework can work with proxy
>>
>> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   net/colo-proxy.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 93 insertions(+)
>>
>> diff --git a/net/colo-proxy.c b/net/colo-proxy.c
>> index f448ee1..ba2bbe7 100644
>> --- a/net/colo-proxy.c
>> +++ b/net/colo-proxy.c
>> @@ -167,6 +167,11 @@ static int connection_key_equal(const void *opaque1, const void *opaque2)
>>       return memcmp(opaque1, opaque2, sizeof(ConnectionKey)) == 0;
>>   }
>>   
>> +bool colo_proxy_query_checkpoint(void)
>> +{
>> +    return colo_do_checkpoint;
>> +}
>> +
>>   static ssize_t colo_proxy_receive_iov(NetFilterState *nf,
>>                                            NetClientState *sender,
>>                                            unsigned flags,
>> @@ -203,6 +208,94 @@ static void colo_proxy_cleanup(NetFilterState *nf)
>>       qemu_event_destroy(&s->need_compare_ev);
>>   }
>>   
>> +static void colo_proxy_notify_checkpoint(void)
>> +{
>> +    trace_colo_proxy("colo_proxy_notify_checkpoint");
>> +    colo_do_checkpoint = true;
>> +}
>> +
>> +static void colo_proxy_start_one(NetFilterState *nf,
>> +                                      void *opaque, Error **errp)
>> +{
>> +    COLOProxyState *s;
>> +    int mode, ret;
>> +
>> +    if (strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_COLO_PROXY)) {
>> +        return;
>> +    }
>> +
>> +    mode = *(int *)opaque;
>> +    s = FILTER_COLO_PROXY(nf);
>> +    assert(s->colo_mode == mode);
>> +
>> +    if (s->colo_mode == COLO_MODE_PRIMARY) {
>> +        char thread_name[1024];
>> +
>> +        ret = colo_proxy_connect(s);
>> +        if (ret) {
>> +            error_setg(errp, "colo proxy connect failed");
>> +            return ;
>> +        }
>> +
>> +        s->status = COLO_PROXY_RUNNING;
>> +        sprintf(thread_name, "proxy compare %s", nf->netdev_id);
>> +        qemu_thread_create(&s->thread, thread_name,
>> +                                colo_proxy_compare_thread, s,
>> +                                QEMU_THREAD_JOINABLE);
> Note most OSs have a ~14 character limit on the size of the thread
> name, otherwise they ignore the request to set the name (and the
> thread shows up as 'migration'), so I suggest keep it as "proxy:%s".
>
> Dave

I will fix it in the colo-compare module.
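
A minimal sketch of the shorter naming scheme Dave suggests, assuming the Linux limit of 16 bytes including the trailing NUL (15 visible characters). `make_thread_name` and `THREAD_NAME_MAX` are hypothetical names for illustration, not QEMU API. The point is only that a short prefix plus a bounded snprintf() keeps the name within the limit instead of having the request ignored:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 16 bytes including the trailing NUL, as on Linux. */
#define THREAD_NAME_MAX 16

/*
 * Build a short thread name of the form "proxy:<netdev_id>".
 * snprintf() truncates instead of overflowing, so the result always
 * fits in buf. Returns the length of the name actually stored.
 */
static size_t make_thread_name(char *buf, size_t buflen, const char *netdev_id)
{
    assert(buf != NULL && buflen > 0 && netdev_id != NULL);
    snprintf(buf, buflen, "proxy:%s", netdev_id);
    return strlen(buf);
}
```

With a buffer of THREAD_NAME_MAX bytes, a netdev id such as "bn0" yields "proxy:bn0", and longer ids are cut off at 15 characters.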

Thanks
zhangchen

>> +    } else {
>> +        ret = colo_wait_incoming(s);
>> +        if (ret) {
>> +            error_setg(errp, "colo proxy wait incoming failed");
>> +            return ;
>> +        }
>> +        s->status = COLO_PROXY_RUNNING;
>> +    }
>> +}
>> +
>> +int colo_proxy_start(int mode)
>> +{
>> +    Error *err = NULL;
>> +    qemu_foreach_netfilter(colo_proxy_start_one, &mode, &err);
>> +    if (err) {
>> +        return -1;
>> +    }
>> +    return 0;
>> +}
>> +
>> +static void colo_proxy_stop_one(NetFilterState *nf,
>> +                                      void *opaque, Error **errp)
>> +{
>> +    COLOProxyState *s;
>> +    int mode;
>> +
>> +    if (strcmp(object_get_typename(OBJECT(nf)), TYPE_FILTER_COLO_PROXY)) {
>> +        return;
>> +    }
>> +
>> +    s = FILTER_COLO_PROXY(nf);
>> +    mode = *(int *)opaque;
>> +    assert(s->colo_mode == mode);
>> +
>> +    s->status = COLO_PROXY_DONE;
>> +    if (s->sockfd >= 0) {
>> +        qemu_set_fd_handler(s->sockfd, NULL, NULL, NULL);
>> +        closesocket(s->sockfd);
>> +    }
>> +    if (s->colo_mode == COLO_MODE_PRIMARY) {
>> +        colo_proxy_primary_checkpoint(s);
>> +        qemu_event_set(&s->need_compare_ev);
>> +        qemu_thread_join(&s->thread);
>> +    } else {
>> +        colo_proxy_secondary_checkpoint(s);
>> +    }
>> +}
>> +
>> +void colo_proxy_stop(int mode)
>> +{
>> +    Error *err = NULL;
>> +    qemu_foreach_netfilter(colo_proxy_stop_one, &mode, &err);
>> +}
>> +
>>   static void colo_proxy_setup(NetFilterState *nf, Error **errp)
>>   {
>>       COLOProxyState *s = FILTER_COLO_PROXY(nf);
>> -- 
>> 1.9.1
>>
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 06/10] net/colo-proxy: add socket used by forward func
  2016-02-19 20:01   ` Dr. David Alan Gilbert
@ 2016-02-22  5:51     ` Zhang Chen
  0 siblings, 0 replies; 75+ messages in thread
From: Zhang Chen @ 2016-02-22  5:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang



On 02/20/2016 04:01 AM, Dr. David Alan Gilbert wrote:
> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>
>> Colo need to forward packets
>> we start socket server in secondary and primary
>> connect to secondary in startup
>> the packet recv by primary forward to secondary
>> the packet send by secondary forward to primary
>>
>> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> I found one problem with the socket setup is that the
> packets from the primary and secondary aren't tied to the
> checkpoint they are part of; so for example a packet from the secondary
> may reach the primary at the start of the next checkpoint, causing a
> miscomparison.
> I added a counter to discard old packets.
>
> Dave

I will fix it in the colo-compare module.

Thanks
zhangchen

>
>> ---
>>   net/colo-proxy.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 114 insertions(+)
>>
>> diff --git a/net/colo-proxy.c b/net/colo-proxy.c
>> index ba2bbe7..2347bbf 100644
>> --- a/net/colo-proxy.c
>> +++ b/net/colo-proxy.c
>> @@ -172,6 +172,69 @@ bool colo_proxy_query_checkpoint(void)
>>       return colo_do_checkpoint;
>>   }
>>   
>> +/*
>> + * send a packet to peer
>> + * >=0: success
>> + * <0: fail
>> + */
>> +static ssize_t colo_proxy_sock_send(NetFilterState *nf,
>> +                                         const struct iovec *iov,
>> +                                         int iovcnt)
>> +{
>> +    COLOProxyState *s = FILTER_COLO_PROXY(nf);
>> +    ssize_t ret = 0;
>> +    ssize_t size = 0;
>> +    struct iovec sizeiov = {
>> +        .iov_base = &size,
>> +        .iov_len = sizeof(size)
>> +    };
>> +    size = iov_size(iov, iovcnt);
>> +    if (!size) {
>> +        return 0;
>> +    }
>> +
>> +    ret = iov_send(s->sockfd, &sizeiov, 1, 0, sizeof(size));
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    ret = iov_send(s->sockfd, iov, iovcnt, 0, size);
>> +    return ret;
>> +}
>> +
>> +/*
>> + * receive a packet from peer
>> + * in primary: enqueue packet to secondary_list
>> + * in secondary: pass packet to next
>> + */
>> +static void colo_proxy_sock_receive(void *opaque)
>> +{
>> +    NetFilterState *nf = opaque;
>> +    COLOProxyState *s = FILTER_COLO_PROXY(nf);
>> +    ssize_t len = 0;
>> +    struct iovec sizeiov = {
>> +        .iov_base = &len,
>> +        .iov_len = sizeof(len)
>> +    };
>> +
>> +    iov_recv(s->sockfd, &sizeiov, 1, 0, sizeof(len));
>> +    if (len > 0 && len < NET_BUFSIZE) {
>> +        char *buf = g_malloc0(len);
>> +        struct iovec iov = {
>> +            .iov_base = buf,
>> +            .iov_len = len
>> +        };
>> +
>> +        iov_recv(s->sockfd, &iov, 1, 0, len);
>> +        if (s->colo_mode == COLO_MODE_PRIMARY) {
>> +            colo_proxy_enqueue_secondary_packet(nf, buf, len);
>> +            /* buf will be release when pakcet destroy */
>> +        } else {
>> +            qemu_net_queue_send(s->incoming_queue, nf->netdev,
>> +                            0, (const uint8_t *)buf, len, NULL);
>> +        }
>> +    }
>> +}
>> +
>>   static ssize_t colo_proxy_receive_iov(NetFilterState *nf,
>>                                            NetClientState *sender,
>>                                            unsigned flags,
>> @@ -208,6 +271,57 @@ static void colo_proxy_cleanup(NetFilterState *nf)
>>       qemu_event_destroy(&s->need_compare_ev);
>>   }
>>   
>> +/* wait for peer connecting
>> + * NOTE: this function will block the caller
>> + * 0 on success, otherwise returns -1
>> + */
>> +static int colo_wait_incoming(COLOProxyState *s)
>> +{
>> +    struct sockaddr_in addr;
>> +    socklen_t addrlen = sizeof(addr);
>> +    int accept_sock, err;
>> +    int fd = inet_listen(s->addr, NULL, 256, SOCK_STREAM, 0, NULL);
>> +
>> +    if (fd < 0) {
>> +        error_report("colo proxy listen failed");
>> +        return -1;
>> +    }
>> +
>> +    do {
>> +        accept_sock = qemu_accept(fd, (struct sockaddr *)&addr, &addrlen);
>> +        err = socket_error();
>> +    } while (accept_sock < 0 && err == EINTR);
>> +    closesocket(fd);
>> +
>> +    if (accept_sock < 0) {
>> +        error_report("colo proxy accept failed(%s)", strerror(err));
>> +        return -1;
>> +    }
>> +    s->sockfd = accept_sock;
>> +
>> +    qemu_set_fd_handler(s->sockfd, colo_proxy_sock_receive, NULL, (void *)s);
>> +
>> +    return 0;
>> +}
>> +
>> +/* try to connect listening server
>> + * 0 on success, otherwise something wrong
>> + */
>> +static ssize_t colo_proxy_connect(COLOProxyState *s)
>> +{
>> +    int sock;
>> +    sock = inet_connect(s->addr, NULL);
>> +
>> +    if (sock < 0) {
>> +        error_report("colo proxy inet_connect failed");
>> +        return -1;
>> +    }
>> +    s->sockfd = sock;
>> +    qemu_set_fd_handler(s->sockfd, colo_proxy_sock_receive, NULL, (void *)s);
>> +
>> +    return 0;
>> +}
>> +
>>   static void colo_proxy_notify_checkpoint(void)
>>   {
>>       trace_colo_proxy("colo_proxy_notify_checkpoint");
>> -- 
>> 1.9.1
>>
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 08/10] net/colo-proxy: Handle packet and connection
  2016-02-19 20:04   ` Dr. David Alan Gilbert
@ 2016-02-22  6:41     ` Zhang Chen
  2016-02-22 19:54       ` Dr. David Alan Gilbert
  2016-02-23 17:58       ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 75+ messages in thread
From: Zhang Chen @ 2016-02-22  6:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang



On 02/20/2016 04:04 AM, Dr. David Alan Gilbert wrote:
> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>
>> In here we will handle ip packet and connection
>>
>> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   net/colo-proxy.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 130 insertions(+)
>>
>> diff --git a/net/colo-proxy.c b/net/colo-proxy.c
>> index 5e5c72e..06bab80 100644
>> --- a/net/colo-proxy.c
>> +++ b/net/colo-proxy.c
>> @@ -167,11 +167,141 @@ static int connection_key_equal(const void *opaque1, const void *opaque2)
>>       return memcmp(opaque1, opaque2, sizeof(ConnectionKey)) == 0;
>>   }
>>   
>> +static void connection_destroy(void *opaque)
>> +{
>> +    Connection *conn = opaque;
>> +
>> +    g_queue_foreach(&conn->primary_list, packet_destroy, NULL);
>> +    g_queue_free(&conn->primary_list);
>> +    g_queue_foreach(&conn->secondary_list, packet_destroy, NULL);
> Be careful about these lists and which threads access them;
> I found I could occasionally trigger a seg fault as two
> threads tried to manipulate them at once; I just put a 'list_lock'
> in the connection, which seems to fix it, but I might have to be
> more careful with deadlocks.

Thanks for your work on COLO.
Where can I see your code for colo-proxy?
I may need it to improve my own code.
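
For reference, a minimal sketch of the per-connection 'list_lock' idea, assuming plain pthreads rather than QEMU's own mutex wrappers. `Pkt`, `Conn`, `conn_init`, `conn_push` and `conn_pop` are hypothetical stand-ins for the patch's Packet and Connection handling, not the code Dave wrote:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the patch's Packet and Connection. */
typedef struct Pkt {
    struct Pkt *next;
    int id;
} Pkt;

typedef struct Conn {
    pthread_mutex_t list_lock;  /* protects head and tail */
    Pkt *head;
    Pkt *tail;
} Conn;

static void conn_init(Conn *c)
{
    assert(c != NULL);
    pthread_mutex_init(&c->list_lock, NULL);
    c->head = c->tail = NULL;
}

/* Append a packet; intended for the thread that receives packets. */
static void conn_push(Conn *c, Pkt *p)
{
    pthread_mutex_lock(&c->list_lock);
    p->next = NULL;
    if (c->tail) {
        c->tail->next = p;
    } else {
        c->head = p;
    }
    c->tail = p;
    pthread_mutex_unlock(&c->list_lock);
}

/* Detach the oldest packet, or return NULL if the list is empty. */
static Pkt *conn_pop(Conn *c)
{
    Pkt *p;

    pthread_mutex_lock(&c->list_lock);
    p = c->head;
    if (p) {
        c->head = p->next;
        if (!c->head) {
            c->tail = NULL;
        }
    }
    pthread_mutex_unlock(&c->list_lock);
    return p;
}
```

Holding the lock only for the pointer updates, and never taking a second lock while it is held, is one way to limit the deadlock risk mentioned above.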

>
>> +    g_queue_free(&conn->secondary_list);
>> +    g_slice_free(Connection, conn);
>> +}
>> +
>> +static Connection *connection_new(ConnectionKey *key)
>> +{
>> +    Connection *conn = g_slice_new(Connection);
>> +
>> +    conn->ip_proto = key->ip_proto;
>> +    conn->processing = false;
>> +    g_queue_init(&conn->primary_list);
>> +    g_queue_init(&conn->secondary_list);
>> +
>> +    return conn;
>> +}
>> +
>> +/*
>> + * Clear hashtable, stop this hash growing really huge
>> + */
>> +static void clear_connection_hashtable(COLOProxyState *s)
>> +{
>> +    s->hashtable_size = 0;
>> +    g_hash_table_remove_all(colo_conn_hash);
>> +    trace_colo_proxy("clear_connection_hashtable");
>> +}
>> +
>>   bool colo_proxy_query_checkpoint(void)
>>   {
>>       return colo_do_checkpoint;
>>   }
>>   
>> +/* Return 0 on success, or return -1 if the pkt is corrupted */
>> +static int parse_packet_early(Packet *pkt, ConnectionKey *key)
>> +{
>> +    int network_length;
>> +    uint8_t *data = pkt->data;
>> +    uint16_t l3_proto;
>> +    uint32_t tmp_ports;
>> +    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
>> +
>> +    pkt->network_layer = data + ETH_HLEN;
>> +    l3_proto = eth_get_l3_proto(data, l2hdr_len);
>> +    if (l3_proto != ETH_P_IP) {
>> +        if (l3_proto == ETH_P_ARP) {
>> +            return -1;
>> +        }
>> +        return 0;
>> +    }
>> +
>> +    network_length = pkt->ip->ip_hl * 4;
>> +    pkt->transport_layer = pkt->network_layer + network_length;
>> +    key->ip_proto = pkt->ip->ip_p;
>> +    key->src = pkt->ip->ip_src;
>> +    key->dst = pkt->ip->ip_dst;
>> +
>> +    switch (key->ip_proto) {
>> +    case IPPROTO_TCP:
>> +    case IPPROTO_UDP:
>> +    case IPPROTO_DCCP:
>> +    case IPPROTO_ESP:
>> +    case IPPROTO_SCTP:
>> +    case IPPROTO_UDPLITE:
>> +        tmp_ports = *(uint32_t *)(pkt->transport_layer);
>> +        key->src_port = tmp_ports & 0xffff;
>> +        key->dst_port = tmp_ports >> 16;
> These fields are not byteswapped; it makes it very confusing
> when printing them for debug;  I added htons around every
> reading of the ports from the packets.
>
> Dave

I will fix it in the colo-compare module.
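
A sketch of reading the ports in host byte order, assuming a POSIX system for ntohs(). `read_ports` is a hypothetical helper, not part of the patch. It uses ntohs() because the data comes off the wire (it performs the same swap as the htons() Dave mentions), and memcpy() to avoid an unaligned load from the packet buffer:

```c
#include <arpa/inet.h>  /* ntohs() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read source and destination ports from the first four bytes of a
 * TCP/UDP header and convert them from network to host byte order.
 */
static void read_ports(const uint8_t *transport, uint16_t *src, uint16_t *dst)
{
    uint16_t raw_src, raw_dst;

    assert(transport != NULL && src != NULL && dst != NULL);
    memcpy(&raw_src, transport, sizeof(raw_src));
    memcpy(&raw_dst, transport + 2, sizeof(raw_dst));
    *src = ntohs(raw_src);
    *dst = ntohs(raw_dst);
}
```

For header bytes 00 50 1f 90 this gives source port 80 and destination port 8080 on any host, which is what one expects to see in debug output.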

Thanks
zhangchen

>> +        break;
>> +    case IPPROTO_AH:
>> +        tmp_ports = *(uint32_t *)(pkt->transport_layer + 4);
>> +        key->src_port = tmp_ports & 0xffff;
>> +        key->dst_port = tmp_ports >> 16;
>> +        break;
>> +    default:
>> +        break;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static Packet *packet_new(COLOProxyState *s, void *data,
>> +                          int size, ConnectionKey *key, NetClientState *sender)
>> +{
>> +    Packet *pkt = g_slice_new(Packet);
>> +
>> +    pkt->data = data;
>> +    pkt->size = size;
>> +    pkt->s = s;
>> +    pkt->sender = sender;
>> +
>> +    if (parse_packet_early(pkt, key)) {
>> +        packet_destroy(pkt, NULL);
>> +        pkt = NULL;
>> +    }
>> +
>> +    return pkt;
>> +}
>> +
>> +static void packet_destroy(void *opaque, void *user_data)
>> +{
>> +    Packet *pkt = opaque;
>> +    g_free(pkt->data);
>> +    g_slice_free(Packet, pkt);
>> +}
>> +
>> +/* if not found, creata a new connection and add to hash table */
>> +static Connection *colo_proxy_get_conn(COLOProxyState *s,
>> +            ConnectionKey *key)
>> +{
>> +    /* FIXME: protect colo_conn_hash */
>> +    Connection *conn = g_hash_table_lookup(colo_conn_hash, key);
>> +
>> +    if (conn == NULL) {
>> +        ConnectionKey *new_key = g_malloc(sizeof(*key));
>> +
>> +        conn = connection_new(key);
>> +        memcpy(new_key, key, sizeof(*key));
>> +
>> +        s->hashtable_size++;
>> +        if (s->hashtable_size > hashtable_max_size) {
>> +            trace_colo_proxy("colo proxy connection hashtable full, clear it");
>> +            clear_connection_hashtable(s);
>> +        } else {
>> +            g_hash_table_insert(colo_conn_hash, new_key, conn);
>> +        }
>> +    }
>> +
>> +     return conn;
>> +}
>> +
>>   static ssize_t colo_proxy_enqueue_primary_packet(NetFilterState *nf,
>>                                            NetClientState *sender,
>>                                            unsigned flags,
>> -- 
>> 1.9.1
>>
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 08/10] net/colo-proxy: Handle packet and connection
  2016-02-22  6:41     ` Zhang Chen
@ 2016-02-22 19:54       ` Dr. David Alan Gilbert
  2016-02-23 17:58       ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-22 19:54 UTC (permalink / raw)
  To: Zhang Chen
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> 
> 
> On 02/20/2016 04:04 AM, Dr. David Alan Gilbert wrote:
> >* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> >>From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> >>
> >>In here we will handle ip packet and connection
> >>
> >>Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> >>Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >>---
> >>  net/colo-proxy.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 130 insertions(+)
> >>
> >>diff --git a/net/colo-proxy.c b/net/colo-proxy.c
> >>index 5e5c72e..06bab80 100644
> >>--- a/net/colo-proxy.c
> >>+++ b/net/colo-proxy.c
> >>@@ -167,11 +167,141 @@ static int connection_key_equal(const void *opaque1, const void *opaque2)
> >>      return memcmp(opaque1, opaque2, sizeof(ConnectionKey)) == 0;
> >>  }
> >>+static void connection_destroy(void *opaque)
> >>+{
> >>+    Connection *conn = opaque;
> >>+
> >>+    g_queue_foreach(&conn->primary_list, packet_destroy, NULL);
> >>+    g_queue_free(&conn->primary_list);
> >>+    g_queue_foreach(&conn->secondary_list, packet_destroy, NULL);
> >Be careful about these lists and which threads access them;
> >I found I could occasionally trigger a seg fault as two
> >threads tried to manipulate them at once; I just put a 'list_lock'
> >in the connection, which seems to fix it, but I might have to be
> >more careful with deadlocks.
> 
> Thanks for your work to colo.
> and where can I  see your code for colo-proxy?

I'll clean it up and post it in the next couple of days.

> maybe I need it to make my code better.

Maybe, but it's a bit hacky at the moment. I added
sequence number compensation, like in the old kernel proxy,
but I've only done it for inbound connections, and the code
doesn't yet:
   a) handle sequence numbers after failover
   b) deal with socket shutdown properly
   c) try to deal with TCP fragmentation.

There are lots of different places we get random data from that
throw off the comparison.

Dave

> 
> >
> >>+    g_queue_free(&conn->secondary_list);
> >>+    g_slice_free(Connection, conn);
> >>+}
> >>+
> >>+static Connection *connection_new(ConnectionKey *key)
> >>+{
> >>+    Connection *conn = g_slice_new(Connection);
> >>+
> >>+    conn->ip_proto = key->ip_proto;
> >>+    conn->processing = false;
> >>+    g_queue_init(&conn->primary_list);
> >>+    g_queue_init(&conn->secondary_list);
> >>+
> >>+    return conn;
> >>+}
> >>+
> >>+/*
> >>+ * Clear hashtable, stop this hash growing really huge
> >>+ */
> >>+static void clear_connection_hashtable(COLOProxyState *s)
> >>+{
> >>+    s->hashtable_size = 0;
> >>+    g_hash_table_remove_all(colo_conn_hash);
> >>+    trace_colo_proxy("clear_connection_hashtable");
> >>+}
> >>+
> >>  bool colo_proxy_query_checkpoint(void)
> >>  {
> >>      return colo_do_checkpoint;
> >>  }
> >>+/* Return 0 on success, or return -1 if the pkt is corrupted */
> >>+static int parse_packet_early(Packet *pkt, ConnectionKey *key)
> >>+{
> >>+    int network_length;
> >>+    uint8_t *data = pkt->data;
> >>+    uint16_t l3_proto;
> >>+    uint32_t tmp_ports;
> >>+    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
> >>+
> >>+    pkt->network_layer = data + ETH_HLEN;
> >>+    l3_proto = eth_get_l3_proto(data, l2hdr_len);
> >>+    if (l3_proto != ETH_P_IP) {
> >>+        if (l3_proto == ETH_P_ARP) {
> >>+            return -1;
> >>+        }
> >>+        return 0;
> >>+    }
> >>+
> >>+    network_length = pkt->ip->ip_hl * 4;
> >>+    pkt->transport_layer = pkt->network_layer + network_length;
> >>+    key->ip_proto = pkt->ip->ip_p;
> >>+    key->src = pkt->ip->ip_src;
> >>+    key->dst = pkt->ip->ip_dst;
> >>+
> >>+    switch (key->ip_proto) {
> >>+    case IPPROTO_TCP:
> >>+    case IPPROTO_UDP:
> >>+    case IPPROTO_DCCP:
> >>+    case IPPROTO_ESP:
> >>+    case IPPROTO_SCTP:
> >>+    case IPPROTO_UDPLITE:
> >>+        tmp_ports = *(uint32_t *)(pkt->transport_layer);
> >>+        key->src_port = tmp_ports & 0xffff;
> >>+        key->dst_port = tmp_ports >> 16;
> >These fields are not byteswapped; it makes it very confusing
> >when printing them for debug;  I added htons around every
> >reading of the ports from the packets.
> >
> >Dave
> 
> I will fix it in colo-compare module.
> 
> Thanks
> zhangchen
> 
> >>+        break;
> >>+    case IPPROTO_AH:
> >>+        tmp_ports = *(uint32_t *)(pkt->transport_layer + 4);
> >>+        key->src_port = tmp_ports & 0xffff;
> >>+        key->dst_port = tmp_ports >> 16;
> >>+        break;
> >>+    default:
> >>+        break;
> >>+    }
> >>+
> >>+    return 0;
> >>+}
> >>+
> >>+static Packet *packet_new(COLOProxyState *s, void *data,
> >>+                          int size, ConnectionKey *key, NetClientState *sender)
> >>+{
> >>+    Packet *pkt = g_slice_new(Packet);
> >>+
> >>+    pkt->data = data;
> >>+    pkt->size = size;
> >>+    pkt->s = s;
> >>+    pkt->sender = sender;
> >>+
> >>+    if (parse_packet_early(pkt, key)) {
> >>+        packet_destroy(pkt, NULL);
> >>+        pkt = NULL;
> >>+    }
> >>+
> >>+    return pkt;
> >>+}
> >>+
> >>+static void packet_destroy(void *opaque, void *user_data)
> >>+{
> >>+    Packet *pkt = opaque;
> >>+    g_free(pkt->data);
> >>+    g_slice_free(Packet, pkt);
> >>+}
> >>+
> >>+/* if not found, creata a new connection and add to hash table */
> >>+static Connection *colo_proxy_get_conn(COLOProxyState *s,
> >>+            ConnectionKey *key)
> >>+{
> >>+    /* FIXME: protect colo_conn_hash */
> >>+    Connection *conn = g_hash_table_lookup(colo_conn_hash, key);
> >>+
> >>+    if (conn == NULL) {
> >>+        ConnectionKey *new_key = g_malloc(sizeof(*key));
> >>+
> >>+        conn = connection_new(key);
> >>+        memcpy(new_key, key, sizeof(*key));
> >>+
> >>+        s->hashtable_size++;
> >>+        if (s->hashtable_size > hashtable_max_size) {
> >>+            trace_colo_proxy("colo proxy connection hashtable full, clear it");
> >>+            clear_connection_hashtable(s);
> >>+        } else {
> >>+            g_hash_table_insert(colo_conn_hash, new_key, conn);
> >>+        }
> >>+    }
> >>+
> >>+     return conn;
> >>+}
> >>+
> >>  static ssize_t colo_proxy_enqueue_primary_packet(NetFilterState *nf,
> >>                                           NetClientState *sender,
> >>                                           unsigned flags,
> >>-- 
> >>1.9.1
> >>
> >>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >
> >.
> >
> 
> -- 
> Thanks
> zhangchen
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 08/10] net/colo-proxy: Handle packet and connection
  2016-02-22  6:41     ` Zhang Chen
  2016-02-22 19:54       ` Dr. David Alan Gilbert
@ 2016-02-23 17:58       ` Dr. David Alan Gilbert
  2016-02-24  2:01         ` Zhang Chen
  1 sibling, 1 reply; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-23 17:58 UTC (permalink / raw)
  To: Zhang Chen
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> 
> 
> On 02/20/2016 04:04 AM, Dr. David Alan Gilbert wrote:
> >* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> >>From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> >>
> >>In here we will handle ip packet and connection
> >>
> >>Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> >>Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >>---
> >>  net/colo-proxy.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 130 insertions(+)
> >>
> >>diff --git a/net/colo-proxy.c b/net/colo-proxy.c
> >>index 5e5c72e..06bab80 100644
> >>--- a/net/colo-proxy.c
> >>+++ b/net/colo-proxy.c
> >>@@ -167,11 +167,141 @@ static int connection_key_equal(const void *opaque1, const void *opaque2)
> >>      return memcmp(opaque1, opaque2, sizeof(ConnectionKey)) == 0;
> >>  }
> >>+static void connection_destroy(void *opaque)
> >>+{
> >>+    Connection *conn = opaque;
> >>+
> >>+    g_queue_foreach(&conn->primary_list, packet_destroy, NULL);
> >>+    g_queue_free(&conn->primary_list);
> >>+    g_queue_foreach(&conn->secondary_list, packet_destroy, NULL);
> >Be careful about these lists and which threads access them;
> >I found I could occasionally trigger a seg fault as two
> >threads tried to manipulate them at once; I just put a 'list_lock'
> >in the connection, which seems to fix it, but I might have to be
> >more careful with deadlocks.
> 
> Thanks for your work to colo.
> and where can I  see your code for colo-proxy?
> maybe I need it to make my code better.

Here is my latest version; it seems to just about work, but I'm
still very much working on it:

   https://github.com/orbitfp7/qemu/tree/orbit-wp4-colo-jan16
 with the wp4-colo-rdma-2016-02-23 tag.

Dave
> 
> >
> >>+    g_queue_free(&conn->secondary_list);
> >>+    g_slice_free(Connection, conn);
> >>+}
> >>+
> >>+static Connection *connection_new(ConnectionKey *key)
> >>+{
> >>+    Connection *conn = g_slice_new(Connection);
> >>+
> >>+    conn->ip_proto = key->ip_proto;
> >>+    conn->processing = false;
> >>+    g_queue_init(&conn->primary_list);
> >>+    g_queue_init(&conn->secondary_list);
> >>+
> >>+    return conn;
> >>+}
> >>+
> >>+/*
> >>+ * Clear hashtable, stop this hash growing really huge
> >>+ */
> >>+static void clear_connection_hashtable(COLOProxyState *s)
> >>+{
> >>+    s->hashtable_size = 0;
> >>+    g_hash_table_remove_all(colo_conn_hash);
> >>+    trace_colo_proxy("clear_connection_hashtable");
> >>+}
> >>+
> >>  bool colo_proxy_query_checkpoint(void)
> >>  {
> >>      return colo_do_checkpoint;
> >>  }
> >>+/* Return 0 on success, or return -1 if the pkt is corrupted */
> >>+static int parse_packet_early(Packet *pkt, ConnectionKey *key)
> >>+{
> >>+    int network_length;
> >>+    uint8_t *data = pkt->data;
> >>+    uint16_t l3_proto;
> >>+    uint32_t tmp_ports;
> >>+    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
> >>+
> >>+    pkt->network_layer = data + ETH_HLEN;
> >>+    l3_proto = eth_get_l3_proto(data, l2hdr_len);
> >>+    if (l3_proto != ETH_P_IP) {
> >>+        if (l3_proto == ETH_P_ARP) {
> >>+            return -1;
> >>+        }
> >>+        return 0;
> >>+    }
> >>+
> >>+    network_length = pkt->ip->ip_hl * 4;
> >>+    pkt->transport_layer = pkt->network_layer + network_length;
> >>+    key->ip_proto = pkt->ip->ip_p;
> >>+    key->src = pkt->ip->ip_src;
> >>+    key->dst = pkt->ip->ip_dst;
> >>+
> >>+    switch (key->ip_proto) {
> >>+    case IPPROTO_TCP:
> >>+    case IPPROTO_UDP:
> >>+    case IPPROTO_DCCP:
> >>+    case IPPROTO_ESP:
> >>+    case IPPROTO_SCTP:
> >>+    case IPPROTO_UDPLITE:
> >>+        tmp_ports = *(uint32_t *)(pkt->transport_layer);
> >>+        key->src_port = tmp_ports & 0xffff;
> >>+        key->dst_port = tmp_ports >> 16;
> >These fields are not byteswapped; it makes it very confusing
> >when printing them for debug;  I added htons around every
> >reading of the ports from the packets.
> >
> >Dave
> 
> I will fix it in colo-compare module.
> 
> Thanks
> zhangchen
> 
> >>+        break;
> >>+    case IPPROTO_AH:
> >>+        tmp_ports = *(uint32_t *)(pkt->transport_layer + 4);
> >>+        key->src_port = tmp_ports & 0xffff;
> >>+        key->dst_port = tmp_ports >> 16;
> >>+        break;
> >>+    default:
> >>+        break;
> >>+    }
> >>+
> >>+    return 0;
> >>+}
> >>+
> >>+static Packet *packet_new(COLOProxyState *s, void *data,
> >>+                          int size, ConnectionKey *key, NetClientState *sender)
> >>+{
> >>+    Packet *pkt = g_slice_new(Packet);
> >>+
> >>+    pkt->data = data;
> >>+    pkt->size = size;
> >>+    pkt->s = s;
> >>+    pkt->sender = sender;
> >>+
> >>+    if (parse_packet_early(pkt, key)) {
> >>+        packet_destroy(pkt, NULL);
> >>+        pkt = NULL;
> >>+    }
> >>+
> >>+    return pkt;
> >>+}
> >>+
> >>+static void packet_destroy(void *opaque, void *user_data)
> >>+{
> >>+    Packet *pkt = opaque;
> >>+    g_free(pkt->data);
> >>+    g_slice_free(Packet, pkt);
> >>+}
> >>+
> >>+/* if not found, creata a new connection and add to hash table */
> >>+static Connection *colo_proxy_get_conn(COLOProxyState *s,
> >>+            ConnectionKey *key)
> >>+{
> >>+    /* FIXME: protect colo_conn_hash */
> >>+    Connection *conn = g_hash_table_lookup(colo_conn_hash, key);
> >>+
> >>+    if (conn == NULL) {
> >>+        ConnectionKey *new_key = g_malloc(sizeof(*key));
> >>+
> >>+        conn = connection_new(key);
> >>+        memcpy(new_key, key, sizeof(*key));
> >>+
> >>+        s->hashtable_size++;
> >>+        if (s->hashtable_size > hashtable_max_size) {
> >>+            trace_colo_proxy("colo proxy connection hashtable full, clear it");
> >>+            clear_connection_hashtable(s);
> >>+        } else {
> >>+            g_hash_table_insert(colo_conn_hash, new_key, conn);
> >>+        }
> >>+    }
> >>+
> >>+     return conn;
> >>+}
> >>+
> >>  static ssize_t colo_proxy_enqueue_primary_packet(NetFilterState *nf,
> >>                                           NetClientState *sender,
> >>                                           unsigned flags,
> >>-- 
> >>1.9.1
> >>
> >>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >
> >.
> >
> 
> -- 
> Thanks
> zhangchen
> 
> 
> 
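
On the byte-order point quoted above, a minimal sketch in C of reading the
two 16-bit port fields at the start of a TCP/UDP-style header and converting
them to host order. The helper names read_be16() and parse_ports() are
hypothetical and not part of the patch. The sketch assembles each value byte
by byte, which gives the same result as ntohs() on a raw 16-bit load and also
avoids the unaligned 32-bit read through a cast pointer.

```c
#include <stdint.h>

/* Hypothetical helper: read a big-endian (network order) 16-bit value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: for protocols whose header starts with the source
 * port followed by the destination port (TCP, UDP, ...), return both in
 * host byte order so they print as the familiar port numbers.
 */
static void parse_ports(const uint8_t *transport_layer,
                        uint16_t *src_port, uint16_t *dst_port)
{
    *src_port = read_be16(transport_layer);
    *dst_port = read_be16(transport_layer + 2);
}
```

Byte-wise assembly behaves the same on little-endian and big-endian hosts, so
a debug trace shows the same port numbers on both.
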
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 08/10] net/colo-proxy: Handle packet and connection
  2016-02-23 17:58       ` Dr. David Alan Gilbert
@ 2016-02-24  2:01         ` Zhang Chen
  0 siblings, 0 replies; 75+ messages in thread
From: Zhang Chen @ 2016-02-24  2:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhanghailiang, Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong,
	qemu devel, Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka,
	Yang Hongyang



On 02/24/2016 01:58 AM, Dr. David Alan Gilbert wrote:
> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>>
>> On 02/20/2016 04:04 AM, Dr. David Alan Gilbert wrote:
>>> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>>>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>>
>>>> In here we will handle ip packet and connection
>>>>
>>>> Signed-off-by: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> ---
>>>>   net/colo-proxy.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>   1 file changed, 130 insertions(+)
>>>>
>>>> diff --git a/net/colo-proxy.c b/net/colo-proxy.c
>>>> index 5e5c72e..06bab80 100644
>>>> --- a/net/colo-proxy.c
>>>> +++ b/net/colo-proxy.c
>>>> @@ -167,11 +167,141 @@ static int connection_key_equal(const void *opaque1, const void *opaque2)
>>>>       return memcmp(opaque1, opaque2, sizeof(ConnectionKey)) == 0;
>>>>   }
>>>> +static void connection_destroy(void *opaque)
>>>> +{
>>>> +    Connection *conn = opaque;
>>>> +
>>>> +    g_queue_foreach(&conn->primary_list, packet_destroy, NULL);
>>>> +    g_queue_free(&conn->primary_list);
>>>> +    g_queue_foreach(&conn->secondary_list, packet_destroy, NULL);
>>> Be careful about these lists and which threads access them;
>>> I found I could occasionally trigger a seg fault as two
>>> threads tried to manipulate them at once; I just put a 'list_lock'
>>> in the connection, which seems to fix it, but I might have to be
>>> more careful with deadlocks.
>> Thanks for your work on COLO.
>> Where can I see your code for colo-proxy?
>> I may need it to improve my code.
> Here is my latest version; it seems to just about work; but very
> much still working on it:
>
>     https://github.com/orbitfp7/qemu/tree/orbit-wp4-colo-jan16
>   with the wp4-colo-rdma-2016-02-23 tag.
>
> Dave

Thanks for your work on COLO.
zhangchen
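
For reference, a minimal sketch of the per-connection 'list_lock' idea
mentioned above. It is not the code from either tree: the types
LockedConnection and PacketNode and the two helpers are hypothetical, and a
plain pthread mutex with a singly linked LIFO list stands in for QEMU's own
mutex type and the GQueue lists used in the patch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a queued packet. */
typedef struct PacketNode {
    struct PacketNode *next;
    int id;
} PacketNode;

/* Hypothetical connection: one lock guards both packet lists. */
typedef struct {
    pthread_mutex_t list_lock;
    PacketNode *primary_list;
    PacketNode *secondary_list;
} LockedConnection;

static void locked_conn_init(LockedConnection *c)
{
    pthread_mutex_init(&c->list_lock, NULL);
    c->primary_list = NULL;
    c->secondary_list = NULL;
}

/* Push a node onto the head of one of the lists while holding the lock. */
static void locked_push(LockedConnection *c, PacketNode **list, PacketNode *n)
{
    pthread_mutex_lock(&c->list_lock);
    n->next = *list;
    *list = n;
    pthread_mutex_unlock(&c->list_lock);
}

/* Pop the head of one of the lists; returns NULL if the list is empty. */
static PacketNode *locked_pop(LockedConnection *c, PacketNode **list)
{
    PacketNode *n;

    pthread_mutex_lock(&c->list_lock);
    n = *list;
    if (n != NULL) {
        *list = n->next;
        n->next = NULL;
    }
    pthread_mutex_unlock(&c->list_lock);
    return n;
}
```

Each helper takes and releases the lock itself and never calls out while
holding it, which is one way to keep the deadlock risk low.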

>>>> +    g_queue_free(&conn->secondary_list);
>>>> +    g_slice_free(Connection, conn);
>>>> +}
>>>> +
>>>> +static Connection *connection_new(ConnectionKey *key)
>>>> +{
>>>> +    Connection *conn = g_slice_new(Connection);
>>>> +
>>>> +    conn->ip_proto = key->ip_proto;
>>>> +    conn->processing = false;
>>>> +    g_queue_init(&conn->primary_list);
>>>> +    g_queue_init(&conn->secondary_list);
>>>> +
>>>> +    return conn;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Clear hashtable, stop this hash growing really huge
>>>> + */
>>>> +static void clear_connection_hashtable(COLOProxyState *s)
>>>> +{
>>>> +    s->hashtable_size = 0;
>>>> +    g_hash_table_remove_all(colo_conn_hash);
>>>> +    trace_colo_proxy("clear_connection_hashtable");
>>>> +}
>>>> +
>>>>   bool colo_proxy_query_checkpoint(void)
>>>>   {
>>>>       return colo_do_checkpoint;
>>>>   }
>>>> +/* Return 0 on success, or return -1 if the pkt is corrupted */
>>>> +static int parse_packet_early(Packet *pkt, ConnectionKey *key)
>>>> +{
>>>> +    int network_length;
>>>> +    uint8_t *data = pkt->data;
>>>> +    uint16_t l3_proto;
>>>> +    uint32_t tmp_ports;
>>>> +    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
>>>> +
>>>> +    pkt->network_layer = data + ETH_HLEN;
>>>> +    l3_proto = eth_get_l3_proto(data, l2hdr_len);
>>>> +    if (l3_proto != ETH_P_IP) {
>>>> +        if (l3_proto == ETH_P_ARP) {
>>>> +            return -1;
>>>> +        }
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    network_length = pkt->ip->ip_hl * 4;
>>>> +    pkt->transport_layer = pkt->network_layer + network_length;
>>>> +    key->ip_proto = pkt->ip->ip_p;
>>>> +    key->src = pkt->ip->ip_src;
>>>> +    key->dst = pkt->ip->ip_dst;
>>>> +
>>>> +    switch (key->ip_proto) {
>>>> +    case IPPROTO_TCP:
>>>> +    case IPPROTO_UDP:
>>>> +    case IPPROTO_DCCP:
>>>> +    case IPPROTO_ESP:
>>>> +    case IPPROTO_SCTP:
>>>> +    case IPPROTO_UDPLITE:
>>>> +        tmp_ports = *(uint32_t *)(pkt->transport_layer);
>>>> +        key->src_port = tmp_ports & 0xffff;
>>>> +        key->dst_port = tmp_ports >> 16;
>>> These fields are not byteswapped; it makes it very confusing
>>> when printing them for debug;  I added htons around every
>>> reading of the ports from the packets.
>>>
>>> Dave
>> I will fix it in colo-compare module.
>>
>> Thanks
>> zhangchen
>>
>>>> +        break;
>>>> +    case IPPROTO_AH:
>>>> +        tmp_ports = *(uint32_t *)(pkt->transport_layer + 4);
>>>> +        key->src_port = tmp_ports & 0xffff;
>>>> +        key->dst_port = tmp_ports >> 16;
>>>> +        break;
>>>> +    default:
>>>> +        break;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static Packet *packet_new(COLOProxyState *s, void *data,
>>>> +                          int size, ConnectionKey *key, NetClientState *sender)
>>>> +{
>>>> +    Packet *pkt = g_slice_new(Packet);
>>>> +
>>>> +    pkt->data = data;
>>>> +    pkt->size = size;
>>>> +    pkt->s = s;
>>>> +    pkt->sender = sender;
>>>> +
>>>> +    if (parse_packet_early(pkt, key)) {
>>>> +        packet_destroy(pkt, NULL);
>>>> +        pkt = NULL;
>>>> +    }
>>>> +
>>>> +    return pkt;
>>>> +}
>>>> +
>>>> +static void packet_destroy(void *opaque, void *user_data)
>>>> +{
>>>> +    Packet *pkt = opaque;
>>>> +    g_free(pkt->data);
>>>> +    g_slice_free(Packet, pkt);
>>>> +}
>>>> +
>>>> +/* if not found, creata a new connection and add to hash table */
>>>> +static Connection *colo_proxy_get_conn(COLOProxyState *s,
>>>> +            ConnectionKey *key)
>>>> +{
>>>> +    /* FIXME: protect colo_conn_hash */
>>>> +    Connection *conn = g_hash_table_lookup(colo_conn_hash, key);
>>>> +
>>>> +    if (conn == NULL) {
>>>> +        ConnectionKey *new_key = g_malloc(sizeof(*key));
>>>> +
>>>> +        conn = connection_new(key);
>>>> +        memcpy(new_key, key, sizeof(*key));
>>>> +
>>>> +        s->hashtable_size++;
>>>> +        if (s->hashtable_size > hashtable_max_size) {
>>>> +            trace_colo_proxy("colo proxy connection hashtable full, clear it");
>>>> +            clear_connection_hashtable(s);
>>>> +        } else {
>>>> +            g_hash_table_insert(colo_conn_hash, new_key, conn);
>>>> +        }
>>>> +    }
>>>> +
>>>> +     return conn;
>>>> +}
>>>> +
>>>>   static ssize_t colo_proxy_enqueue_primary_packet(NetFilterState *nf,
>>>>                                            NetClientState *sender,
>>>>                                            unsigned flags,
>>>> -- 
>>>> 1.9.1
>>>>
>>>>
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>>
>>> .
>>>
>> -- 
>> Thanks
>> zhangchen
>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
> .
>

-- 
Thanks
zhangchen


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
                   ` (12 preceding siblings ...)
  2016-01-08 11:19 ` Dr. David Alan Gilbert
@ 2016-02-29 20:04 ` Dr. David Alan Gilbert
  2016-03-01  2:39   ` Li Zhijian
  13 siblings, 1 reply; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-29 20:04 UTC (permalink / raw)
  To: Zhang Chen
  Cc: Li Zhijian, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> 
> Hi,all
> 
> This patch adds a colo-proxy object. COLO-Proxy is a part of COLO; it is
> based on the qemu netfilter and works as a netfilter plugin. Its job is to
> keep the Secondary VM's connection to the Primary VM working normally and to
> compare packets sent by the PVM with those sent by the SVM. If the packets
> differ, it notifies COLO to do a checkpoint and sends all queued primary packets.

Hi Zhangchen,
  How are you dealing with the IP 'identification' field?
It's a very very random field, and not just the initial value in the connection.
I can't see how the kernel colo-proxy dealt with it either; but I think its
comparison was after defragmentation so probably ignored the identification
field - wouldn't that confuse a client at failover?

Dave

> You can also get the series from:
> 
> https://github.com/zhangckid/qemu/tree/colo-v2.2-periodic-mode-with-colo-proxyV2
> 
> Usage:
> 
> primary:
> -netdev tap,id=bn0 -device e1000,netdev=bn0
> -object colo-proxy,id=f0,netdev=bn0,queue=all,mode=primary,addr=host:port
> 
> secondary:
> -netdev tap,id=bn0 -device e1000,netdev=bn0
> -object colo-proxy,id=f0,netdev=bn0,queue=all,mode=secondary,addr=host:port 
> 
> NOTE:
> queue must set "all". See enum NetFilterDirection for detail.
> colo-proxy need queue all packets
> colo-proxy V2 just can compare ip packet
> 
> 
> ## Background
> 
> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
> project is a high availability solution. Both Primary VM (PVM) and Secondary VM
> (SVM) run in parallel. They receive the same request from client, and generate
> responses in parallel too. If the response packets from PVM and SVM are
> identical, they are released immediately. Otherwise, a VM checkpoint (on
> demand)is conducted.
> 
> Paper:
> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
> 
> COLO on Xen:
> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> 
> COLO on Qemu/KVM:
> http://wiki.qemu.org/Features/COLO
> 
> By the needs of capturing response packets from PVM and SVM and finding out
> whether they are identical, we introduce a new module to qemu networking
> called colo-proxy.
> 
> V2:
>   rebase colo-proxy with qemu-colo-v2.2-periodic-mode
>   fix dave's comments
>   fix wency's comments
>   fix zhanghailiang's comments
> 
> v1:
>   initial patch.
> 
> 
> 
> zhangchen (10):
>   Init colo-proxy object based on netfilter
>   Jhash: add linux kernel jhashtable in qemu
>   Colo-proxy: add colo-proxy framework
>   Colo-proxy: add data structure and jhash func
>   net/colo-proxy: Add colo interface to use proxy
>   net/colo-proxy: add socket used by forward func
>   net/colo-proxy: Add packet enqueue & handle func
>   net/colo-proxy: Handle packet and connection
>   net/colo-proxy: Compare pri pkt to sec pkt
>   net/colo-proxy: Colo-proxy do checkpoint and clear
> 
>  include/qemu/jhash.h |  61 ++++
>  net/Makefile.objs    |   1 +
>  net/colo-proxy.c     | 939 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  net/colo-proxy.h     |  24 ++
>  qemu-options.hx      |   6 +
>  trace-events         |   8 +
>  vl.c                 |   3 +-
>  7 files changed, 1041 insertions(+), 1 deletion(-)
>  create mode 100644 include/qemu/jhash.h
>  create mode 100644 net/colo-proxy.c
>  create mode 100644 net/colo-proxy.h
> 
> -- 
> 1.9.1
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-02-29 20:04 ` Dr. David Alan Gilbert
@ 2016-03-01  2:39   ` Li Zhijian
  2016-03-01 10:48     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 75+ messages in thread
From: Li Zhijian @ 2016-03-01  2:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Zhang Chen
  Cc: zhanghailiang, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang



On 03/01/2016 04:04 AM, Dr. David Alan Gilbert wrote:
> * Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
>> From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
>>
>> Hi,all
>>
>> This patch adds a colo-proxy object. COLO-Proxy is a part of COLO; it is
>> based on the qemu netfilter and works as a netfilter plugin. Its job is to
>> keep the Secondary VM's connection to the Primary VM working normally and to
>> compare packets sent by the PVM with those sent by the SVM. If the packets
>> differ, it notifies COLO to do a checkpoint and sends all queued primary packets.
>
> Hi Zhangchen,
>    How are you dealing with the IP 'identification' field?
> It's a very very random field, and not just the initial value in the connection.
> I can't see how the kernel colo-proxy dealt with it either; but I think its
> comparison was after defragmentation so probably ignored the identification
> field
You are right, most of the kernel colo-proxy code works at the mangle table (after defrag),
and colo-proxy only compares the contents of L4 (TCP/UDP), excluding the IP identification.

>  - wouldn't that confuse a client at failover?
Err..., interesting question.

For example, take a COLO setup with a primary (PVM) and a secondary (SVM):
1. The primary is sending an already-compared P_IP packet (identification=0x12345, split into IP_s1, IP_s2..IP_s100) to the client.
2. The client is receiving the IP fragments (but IP_s2, IP_s50, IP_s80..IP_s99 are missing)
    and the primary host goes down.
3. The secondary VM takes over and sends an S_IP packet (the IP contents are always the same as on the PVM).

In step 3:
If the S_IP identification isn't 0x12345, the client will drop the IP fragments from step 2 after the defrag timeout.
If the S_IP identification is 0x12345, the client may mix fragments from the PVM and the SVM (just as if the fragments
came from different routers), but that's okay, because we have ensured the IP contents are identical.

so IMO, it will not confuse the client at failover.
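
To make the "compare only the L4 contents" point concrete, here is a minimal
sketch in C. It is not the kernel colo-proxy code: the helper name
l4_payload_equal() is hypothetical, and it assumes each buffer starts at the
IPv4 header and holds a complete (already defragmented) datagram. The whole IP
header, including the identification and header checksum fields, is skipped
before comparing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV4_MIN_HDR_LEN 20

/*
 * Hypothetical helper: true if two IPv4 datagrams carry the same protocol
 * and byte-identical L4 contents. Differences inside the IP header
 * (identification, checksum, TTL, ...) are ignored.
 */
static bool l4_payload_equal(const uint8_t *a, size_t a_len,
                             const uint8_t *b, size_t b_len)
{
    size_t a_hl, b_hl;

    if (a_len < IPV4_MIN_HDR_LEN || b_len < IPV4_MIN_HDR_LEN) {
        return false;
    }

    /* IHL is the low nibble of byte 0, counted in 32-bit words. */
    a_hl = (size_t)(a[0] & 0x0f) * 4;
    b_hl = (size_t)(b[0] & 0x0f) * 4;
    if (a_hl < IPV4_MIN_HDR_LEN || b_hl < IPV4_MIN_HDR_LEN ||
        a_hl > a_len || b_hl > b_len) {
        return false;
    }

    /* Byte 9 of the IPv4 header is the protocol number. */
    if (a[9] != b[9]) {
        return false;
    }

    if (a_len - a_hl != b_len - b_hl) {
        return false;
    }
    return memcmp(a + a_hl, b + b_hl, a_len - a_hl) == 0;
}
```

With this kind of comparison, two packets that differ only in the
identification field (bytes 4 and 5 of the header) count as identical.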

>
> Dave
>

-- 
Best regards.
Li Zhijian (8555)


* Re: [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter
  2016-03-01  2:39   ` Li Zhijian
@ 2016-03-01 10:48     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 75+ messages in thread
From: Dr. David Alan Gilbert @ 2016-03-01 10:48 UTC (permalink / raw)
  To: Li Zhijian
  Cc: Zhang Chen, Gui jianfeng, Jason Wang, eddie.dong, qemu devel,
	Huang peng, Gong lei, Stefan Hajnoczi, jan.kiszka, Yang Hongyang,
	zhanghailiang

* Li Zhijian (lizhijian@cn.fujitsu.com) wrote:
> 
> 
> On 03/01/2016 04:04 AM, Dr. David Alan Gilbert wrote:
> >* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> >>From: zhangchen <zhangchen.fnst@cn.fujitsu.com>
> >>
> >>Hi,all
> >>
> >>This patch adds a colo-proxy object. COLO-Proxy is a part of COLO; it is
> >>based on the qemu netfilter and works as a netfilter plugin. Its job is to
> >>keep the Secondary VM's connection to the Primary VM working normally and to
> >>compare packets sent by the PVM with those sent by the SVM. If the packets
> >>differ, it notifies COLO to do a checkpoint and sends all queued primary packets.
> >
> >Hi Zhangchen,
> >   How are you dealing with the IP 'identification' field?
> >It's a very very random field, and not just the initial value in the connection.
> >I can't see how the kernel colo-proxy dealt with it either; but I think its
> >comparison was after defragmentation so probably ignored the identification
> >field
> You are right, most of the kernel colo-proxy code works at the mangle table (after defrag),
> and colo-proxy only compares the contents of L4 (TCP/UDP), excluding the IP identification.
> 
> > - wouldn't that confuse a client at failover?
> Err..., interesting question.
> 
> For example, take a COLO setup with a primary (PVM) and a secondary (SVM):
> 1. The primary is sending an already-compared P_IP packet (identification=0x12345, split into IP_s1, IP_s2..IP_s100) to the client.
> 2. The client is receiving the IP fragments (but IP_s2, IP_s50, IP_s80..IP_s99 are missing)
>    and the primary host goes down.
> 3. The secondary VM takes over and sends an S_IP packet (the IP contents are always the same as on the PVM).
> 
> In step 3:
> If the S_IP identification isn't 0x12345, the client will drop the IP fragments from step 2 after the defrag timeout.

So that triggers a timeout (30 seconds?) - hmm OK, a bit slow but OK.

> If the S_IP identification is 0x12345, the client may mix fragments from the PVM and the SVM (just as if the fragments
> came from different routers), but that's okay, because we have ensured the IP contents are identical.

Could the S_IP identification match a later/earlier fragment?

Dave
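
For context on that question: RFC 791 has the receiver group fragments for
reassembly by source address, destination address, protocol and
identification. The sketch below, in C, shows that key. The FragKey type and
both helpers are hypothetical names used only for illustration, and the input
is assumed to point at an IPv4 header of at least 20 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reassembly key, following the RFC 791 description. */
typedef struct {
    uint32_t src;    /* source address, host byte order */
    uint32_t dst;    /* destination address, host byte order */
    uint16_t id;     /* identification field */
    uint8_t  proto;  /* protocol number */
} FragKey;

/* Read a big-endian 32-bit value one byte at a time. */
static uint32_t frag_read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Build the key from the fixed offsets of the IPv4 header. */
static FragKey frag_key_from_header(const uint8_t *ip)
{
    FragKey k;

    k.id = (uint16_t)((ip[4] << 8) | ip[5]);
    k.proto = ip[9];
    k.src = frag_read_be32(ip + 12);
    k.dst = frag_read_be32(ip + 16);
    return k;
}

/* Fragments with equal keys land in the same reassembly buffer. */
static bool frag_key_equal(const FragKey *a, const FragKey *b)
{
    return a->src == b->src && a->dst == b->dst &&
           a->id == b->id && a->proto == b->proto;
}
```

Under this rule, an S_IP fragment whose key equals that of a different
datagram still waiting in the client's reassembly queue would be placed in
that datagram's buffer, which is the collision being asked about.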

> so IMO, it will not confuse the client at failover.
> 
> >
> >Dave
> >
> 
> -- 
> Best regards.
> Li Zhijian (8555)
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


end of thread, other threads:[~2016-03-01 10:48 UTC | newest]

Thread overview: 75+ messages
2015-12-22 10:42 [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 01/10] Init colo-proxy object " Zhang Chen
2016-01-15 18:21   ` Dr. David Alan Gilbert
2016-01-18  7:08     ` Zhang Chen
2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 02/10] Jhash: add linux kernel jhashtable in qemu Zhang Chen
2016-01-08 12:08   ` Dr. David Alan Gilbert
2016-01-11  1:49     ` Zhang Chen
2016-01-11 12:50       ` Dr. David Alan Gilbert
2016-01-12  1:58         ` Zhang Chen
2016-01-12  8:58           ` Dr. David Alan Gilbert
2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 03/10] Colo-proxy: add colo-proxy framework Zhang Chen
2016-02-19 19:57   ` Dr. David Alan Gilbert
2016-02-22  3:04     ` Zhang Chen
2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 04/10] Colo-proxy: add data structure and jhash func Zhang Chen
2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 05/10] net/colo-proxy: Add colo interface to use proxy Zhang Chen
2016-02-19 19:58   ` Dr. David Alan Gilbert
2016-02-22  3:08     ` Zhang Chen
2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 06/10] net/colo-proxy: add socket used by forward func Zhang Chen
2016-02-19 20:01   ` Dr. David Alan Gilbert
2016-02-22  5:51     ` Zhang Chen
2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 07/10] net/colo-proxy: Add packet enqueue & handle func Zhang Chen
2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 08/10] net/colo-proxy: Handle packet and connection Zhang Chen
2016-02-19 20:04   ` Dr. David Alan Gilbert
2016-02-22  6:41     ` Zhang Chen
2016-02-22 19:54       ` Dr. David Alan Gilbert
2016-02-23 17:58       ` Dr. David Alan Gilbert
2016-02-24  2:01         ` Zhang Chen
2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 09/10] net/colo-proxy: Compare pri pkt to sec pkt Zhang Chen
2016-02-19 20:07   ` Dr. David Alan Gilbert
2015-12-22 10:42 ` [Qemu-devel] [RFC PATCH v2 10/10] net/colo-proxy: Colo-proxy do checkpoint and clear Zhang Chen
2015-12-29  6:31 ` [Qemu-devel] [RFC PATCH v2 00/10] Add colo-proxy based on netfilter Zhang Chen
2015-12-29  6:58   ` Jason Wang
2015-12-29  7:08     ` Zhang Chen
2015-12-31  2:36 ` Jason Wang
2015-12-31  8:02   ` Li Zhijian
2016-01-04  2:08     ` Jason Wang
2015-12-31  8:40   ` Zhang Chen
2016-01-04  5:37     ` Jason Wang
2016-01-04  8:16       ` Zhang Chen
2016-01-04  9:46         ` Jason Wang
2016-01-04 11:17           ` Zhang Chen
2016-01-06  5:16             ` Jason Wang
2016-01-18  7:05               ` Zhang Chen
2016-01-18  9:29                 ` Jason Wang
2016-01-20  3:29                   ` Zhang Chen
2016-01-20  6:54                     ` Jason Wang
2016-01-20  7:44                       ` Wen Congyang
2016-01-20  9:20                         ` Jason Wang
2016-01-20  9:49                           ` Wen Congyang
2016-01-20 10:03                             ` Jason Wang
2016-01-20 10:34                               ` Wen Congyang
2016-01-22  5:33                                 ` Jason Wang
2016-01-22  5:57                                   ` Wen Congyang
2016-01-20 10:01                       ` Wen Congyang
2016-01-20 10:19                         ` Jason Wang
2016-01-20 10:30                           ` Wen Congyang
2016-01-22  3:15                             ` Jason Wang
2016-01-22  3:28                               ` Wen Congyang
2016-01-22  5:41                                 ` Jason Wang
2016-01-22  5:56                                   ` Wen Congyang
2016-01-22  6:21                                     ` Jason Wang
2016-01-22  6:47                                       ` Wen Congyang
2016-01-22  7:42                                         ` Jason Wang
2016-01-22  7:46                                           ` Wen Congyang
2016-01-27 15:22                                             ` Eric Blake
2016-01-04 16:52           ` Dr. David Alan Gilbert
2016-01-06  5:20             ` Jason Wang
2016-01-06  9:10               ` Dr. David Alan Gilbert
2016-01-08 11:19 ` Dr. David Alan Gilbert
2016-01-11  1:30   ` Zhang Chen
2016-01-11 12:59     ` Dr. David Alan Gilbert
2016-01-12  7:32       ` Zhang Chen
2016-02-29 20:04 ` Dr. David Alan Gilbert
2016-03-01  2:39   ` Li Zhijian
2016-03-01 10:48     ` Dr. David Alan Gilbert
