* [Qemu-devel] [POC]colo-proxy in qemu
@ 2015-11-10  5:26 Tkid
  2015-11-10  7:35 ` Jason Wang
  2015-11-10 10:54 ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 63+ messages in thread
From: Tkid @ 2015-11-10  5:26 UTC (permalink / raw)
  To: qemu-devel, stefanha, jasowang
  Cc: zhang.zhanghailiang, lizhijian, jan.kiszka, eddie.dong, dgilbert,
	peter.huangpeng, arei.gonglei, guijianfeng, zhangchen.fnst


Hi all,

We are planning to reimplement the COLO proxy in userspace (here, in QEMU) to
cache and compare network packets. This module is one of the important
components of the COLO project and is still at an early stage, so any comments
and feedback are warmly welcomed. Thanks in advance.

## Background
COLO FT/HA (COarse-grained LOck-stepping Virtual Machines for Non-stop
Service) is a high-availability solution. The Primary VM (PVM) and the
Secondary VM (SVM) run in parallel. They receive the same requests from the
client and generate responses in parallel too. If the response packets from
the PVM and SVM are identical, they are released immediately; otherwise, an
on-demand VM checkpoint is performed.
Paper:
http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
COLO on Xen:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
COLO on Qemu/KVM:
http://wiki.qemu.org/Features/COLO

Driven by the need to capture response packets from the PVM and the SVM and
to determine whether they are identical, we introduce a new module to QEMU
networking called colo-proxy.

This document describes the design of the colo-proxy module.

## Glossary
   PVM - Primary VM, which provides services to clients.
   SVM - Secondary VM, a hot standby and replica of the PVM.
   PN - Primary Node, the host the PVM runs on.
   SN - Secondary Node, the host the SVM runs on.

## Our Idea ##

COLO-Proxy is part of COLO. It is built on the QEMU net filter and acts as a
net-filter plugin. Its job is to keep the SVM's connections consistent with
the PVM's, and to compare the PVM's packets with the SVM's; if they differ,
it notifies COLO to perform a checkpoint.

== Workflow ==


+--+                                      +--+
|PN|                                      |SN|
+-----------------------+                 +-----------------------+
| +-------------------+ |                 | +-------------------+ |
| |                   | |                 | |                   | |
| |        PVM        | |                 | |        SVM        | |
| |                   | |                 | |                   | |
| +--+-^--------------+ |                 | +-------------^----++ |
|    | |                |                 |               |    |  |
|    | | +------------+ |                 | +-----------+ |    |  |
|    | | |    COLO    | |    (socket)     | |    COLO   | |    |  |
|    | | | CheckPoint +---------------------> CheckPoint| |    |  |
|    | | |            | |      (6)        | |           | |    |  |
|    | | +-----^------+ |                 | +-----------+ |    |  |
|    | |   (5) |        |                 |               |    |  |
|    | |       |        |                 |               |    |  |
| +--v-+--------------+ | Forward(socket) | +-------------+----v+ |
| |COLO Proxy  |      +-------+(1)+--------->seq&ack adjust(2)| | |
| |      +-----+------+ |                 | +-----------------+ | |
| |      | Compare(4) <-------+(3)+---------+     COLO Proxy    | |
| +-------------------+ | Forward(socket) | +-------------------+ |
++Qemu+-----------------+                 ++Qemu+-----------------+
            | ^
            | |
            | |
   +--------v-+--------+
   |                   |
   |      Client       |
   |                   |
   +-------------------+





(1) When the PN receives client packets, the PN COLO-Proxy copies them and
forwards the copies to the SN COLO-Proxy.
(2) The SN COLO-Proxy records the initial seq of the PVM's packets, adjusts
the client's ack accordingly, and sends the adjusted packets to the SVM.
(3) The SN QEMU COLO-Proxy receives the SVM's packets and forwards them to
the PN QEMU COLO-Proxy.
(4) The PN QEMU COLO-Proxy enqueues the SVM's packets and the PVM's packets,
then compares the PVM's packet data with the SVM's. If the packets differ,
the compare module notifies the COLO CheckPoint module to perform a
checkpoint, then sends the PVM's packets to the client and drops the SVM's
packets; otherwise, it just sends the PVM's packets to the client and drops
the SVM's packets.
(5) Notify the COLO-Checkpoint module that a checkpoint is needed.
(6) Perform the COLO checkpoint.
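Step (2) above can be sketched roughly as follows. This is purely
illustrative: the struct and function names are hypothetical, not the actual
QEMU implementation, and a real proxy would track this per connection and
rewrite the TCP checksum as well.

```c
#include <stdint.h>

/* Sketch of step (2): the SN proxy records the offset between the PVM's
 * and the SVM's initial TCP sequence numbers for a connection, then
 * rewrites the client-side ack so the SVM's stack accepts it.
 * All names here are illustrative, not QEMU APIs. */
typedef struct ColoConn {
    uint32_t pvm_isn;   /* initial seq seen on the PVM side        */
    uint32_t svm_isn;   /* initial seq chosen by the SVM's stack   */
} ColoConn;

/* Offset translating a PVM-relative number into an SVM-relative one;
 * unsigned wrap-around gives the right answer across the 2^32 boundary. */
static uint32_t colo_seq_offset(const ColoConn *c)
{
    return c->svm_isn - c->pvm_isn;
}

/* Rewrite the ack field of a packet travelling client -> SVM. */
static uint32_t colo_adjust_ack(const ColoConn *c, uint32_t client_ack)
{
    return client_ack + colo_seq_offset(c);
}
```

The same offset, subtracted instead of added, would translate the SVM's seq
numbers back before forwarding its packets to the PN for comparison.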

### QEMU space TCP/IP stack (based on SLIRP) ###
We need a QEMU-space TCP/IP stack to help us analyze packets. After looking
into QEMU, we found that SLIRP

http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29

is a good choice for us. SLIRP provides a full TCP/IP stack within QEMU; it
can help us handle the packets written to/read from the backend (tap) device,
which are essentially link-layer (L2) packets.
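As a rough illustration of the parsing a user-space stack saves us from doing
by hand, here is a minimal, hypothetical helper that digs the TCP sequence
number out of a raw L2 frame read from the tap backend. It assumes untagged
Ethernet and IPv4; a real stack such as SLIRP additionally handles
fragmentation, TCP options, and reassembly.

```c
#include <stdint.h>
#include <stddef.h>

/* Extract the TCP sequence number from a raw Ethernet frame.
 * Returns 0 on success, -1 if the frame is not an IPv4/TCP packet
 * or is too short.  Illustrative only. */
static int frame_tcp_seq(const uint8_t *frame, size_t len, uint32_t *seq)
{
    if (len < 14 + 20 + 20) {                 /* eth + min IP + min TCP */
        return -1;
    }
    if (frame[12] != 0x08 || frame[13] != 0x00) {
        return -1;                            /* EtherType is not IPv4 */
    }
    const uint8_t *ip = frame + 14;
    size_t ihl = (ip[0] & 0x0f) * 4;          /* IP header length */
    if (ip[9] != 6) {
        return -1;                            /* protocol is not TCP */
    }
    const uint8_t *tcp = ip + ihl;
    if ((size_t)(tcp + 8 - frame) > len) {
        return -1;                            /* truncated TCP header */
    }
    /* seq number is bytes 4..7 of the TCP header, network byte order */
    *seq = ((uint32_t)tcp[4] << 24) | ((uint32_t)tcp[5] << 16) |
           ((uint32_t)tcp[6] << 8)  |  (uint32_t)tcp[7];
    return 0;
}
```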

### Packet enqueue and compare ###
Together with the QEMU-space TCP/IP stack, we enqueue all packets sent by the
PVM and the SVM on the primary QEMU, and then compare the packet payloads for
each connection.
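The enqueue-and-compare step could look roughly like this sketch. All names
are hypothetical; the real module would also handle per-connection lookup,
packet ordering, and timeouts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch of step (4): payloads from the PVM and the SVM for one
 * connection sit in two queues; the head packets are compared
 * byte-for-byte, and any mismatch means a checkpoint is needed.
 * Structures are illustrative, not the actual QEMU code. */
typedef struct Pkt {
    const unsigned char *data;  /* payload bytes           */
    size_t len;                 /* payload length          */
    struct Pkt *next;           /* next packet in queue    */
} Pkt;

typedef struct ConnQueue {
    Pkt *pvm_head;              /* oldest unreleased PVM packet */
    Pkt *svm_head;              /* oldest unreleased SVM packet */
} ConnQueue;

/* Returns true when both heads are present and their payloads differ,
 * i.e. when the caller should trigger an on-demand checkpoint. */
static bool colo_compare_head(const ConnQueue *q)
{
    if (!q->pvm_head || !q->svm_head) {
        return false;           /* nothing to compare yet */
    }
    if (q->pvm_head->len != q->svm_head->len) {
        return true;
    }
    return memcmp(q->pvm_head->data, q->svm_head->data,
                  q->pvm_head->len) != 0;
}
```

When the heads match, both packets can be dequeued, the PVM's copy released
to the client, and the SVM's copy dropped, as described in step (4).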




* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-10  5:26 [Qemu-devel] [POC]colo-proxy in qemu Tkid
@ 2015-11-10  7:35 ` Jason Wang
  2015-11-10  8:30   ` zhanghailiang
                     ` (3 more replies)
  2015-11-10 10:54 ` Dr. David Alan Gilbert
  1 sibling, 4 replies; 63+ messages in thread
From: Jason Wang @ 2015-11-10  7:35 UTC (permalink / raw)
  To: Tkid, qemu-devel, stefanha
  Cc: zhang.zhanghailiang, lizhijian, jan.kiszka, eddie.dong, dgilbert,
	peter.huangpeng, arei.gonglei, guijianfeng



On 11/10/2015 01:26 PM, Tkid wrote:
> Hi,all
>
> We are planning to reimplement colo proxy in userspace (Here is in
> qemu) to
> cache and compare net packets.This module is one of the important
> components
> of COLO project and now it is still in early stage, so any comments and
> feedback are warmly welcomed,thanks in advance.
>
> ## Background
> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
> Service)
> project is a high availability solution. Both Primary VM (PVM) and
> Secondary VM
> (SVM) run in parallel. They receive the same request from client, and
> generate
> responses in parallel too. If the response packets from PVM and SVM are
> identical, they are released immediately. Otherwise, a VM checkpoint
> (on demand)
> is conducted.
> Paper:
> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
> COLO on Xen:
> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> COLO on Qemu/KVM:
> http://wiki.qemu.org/Features/COLO
>
> By the needs of capturing response packets from PVM and SVM and
> finding out
> whether they are identical, we introduce a new module to qemu
> networking called
> colo-proxy.
>
> This document describes the design of the colo-proxy module
>
> ## Glossary
>   PVM - Primary VM, which provides services to clients.
>   SVM - Secondary VM, a hot standby and replication of PVM.
>   PN - Primary Node, the host which PVM runs on
>   SN - Secondary Node, the host which SVM runs on
>
> ## Our Idea ##
>
> COLO-Proxy
> COLO-Proxy is a part of COLO,based on qemu net filter and it's a
> plugin for
> qemu net filter.the function keep SVM connect normal to PVM and compare
> PVM's packets to SVM's packets.if difference,notify COLO do checkpoint.
>
> == Workflow ==
>
>
> +--+                                      +--+
> |PN|                                      |SN|
> +-----------------------+                 +-----------------------+
> | +-------------------+ |                 | +-------------------+ |
> | |                   | |                 | |                   | |
> | |        PVM        | |                 | |        SVM        | |
> | |                   | |                 | |                   | |
> | +--+-^--------------+ |                 | +-------------^----++ |
> |    | |                |                 |               |    |  |
> |    | | +------------+ |                 | +-----------+ |    |  |
> |    | | |    COLO    | |    (socket)     | |    COLO   | |    |  |
> |    | | | CheckPoint +---------------------> CheckPoint| |    |  |
> |    | | |            | |      (6)        | |           | |    |  |
> |    | | +-----^------+ |                 | +-----------+ |    |  |
> |    | |   (5) |        |                 |               |    |  |
> |    | |       |        |                 |               |    |  |
> | +--v-+--------------+ | Forward(socket) | +-------------+----v+ |
> | |COLO Proxy  |      +-------+(1)+--------->seq&ack adjust(2)| | |
> | |      +-----+------+ |                 | +-----------------+ | |
> | |      | Compare(4) <-------+(3)+---------+     COLO Proxy    | |
> | +-------------------+ | Forward(socket) | +-------------------+ |
> ++Qemu+-----------------+                 ++Qemu+-----------------+
>            | ^
>            | |
>            | |
>   +--------v-+--------+
>   |                   |
>   |      Client       |
>   |                   |
>   +-------------------+
>
>
>
>
> (1)When PN receive client packets,PN COLO-Proxy copy and forward
> packets to
> SN COLO-Proxy.
> (2)SN COLO-Proxy record PVM's packet inital seq & adjust client's ack,send
> adjusted packets to SVM
> (3)SN Qemu COLO-Proxy recieve SVM's packets and forward to PN Qemu
> COLO-Proxy.
> (4)PN Qemu COLO-Proxy enqueue SVM's packets and enqueue PVM's packets,then
> compare PVM's packets data with SVM's packets data. If packets is
> different, compare
> module notify COLO CheckPoint module to do a checkpoint then send
> PVM's packets to
> client and drop SVM's packets, otherwise, just send PVM's packets to
> client and
> drop SVM's packets.
> (5)notify COLO-Checkpoint module checkpoint is needed
> (6)Do COLO-Checkpoint
>
> ### QEMU space TCP/IP stack(Based on SLIRP) ###
> We need a QEMU space TCP/IP stack to help us to analysis packet. After
> looking
> into QEMU, we found that SLIRP
>
> http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
>
> is a good choice for us. SLIRP proivdes a full TCP/IP stack within
> QEMU, it can
> help use to handle the packet written to/read from backend(tap) device
> which is
> just like a link layer(L2) packet.
>
> ### Packet enqueue and compare ###
> Together with QEMU space TCP/IP stack, we enqueue all packets sent by
> PVM and
> SVM on Primary QEMU, and then compare the packet payload for each
> connection.
>

Hi:

Just have the following questions in mind (some have been raised in
previous rounds of discussion without a conclusion):

- What's the plan for the management layer? The setup seems complicated, so
we cannot simply depend on the user to do each step. (And for security
reasons, qemu is usually run as an unprivileged user.)
- What's the plan for vhost? Userspace networking in qemu is rather slow;
most users will choose vhost.
- What if an application generates packets based on a hwrng device? This
will always produce different packets.
- Not sure SLIRP is a perfect match for this task. As has been raised,
another method is to decouple the packet comparison from qemu. That way,
lots of open-source userspace stacks could be used.
- Haven't read the packet-comparing code, but if it needs to keep track of
the state of each connection, it could easily be DoSed from the guest.

Thanks


* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-10  7:35 ` Jason Wang
@ 2015-11-10  8:30   ` zhanghailiang
  2015-11-11  2:28     ` Jason Wang
  2015-11-10  9:35   ` Tkid
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 63+ messages in thread
From: zhanghailiang @ 2015-11-10  8:30 UTC (permalink / raw)
  To: Jason Wang, Tkid, qemu-devel, stefanha
  Cc: lizhijian, jan.kiszka, eddie.dong, peter.huangpeng, dgilbert,
	arei.gonglei, guijianfeng

On 2015/11/10 15:35, Jason Wang wrote:
>
>
> On 11/10/2015 01:26 PM, Tkid wrote:
>> Hi,all
>>
>> We are planning to reimplement colo proxy in userspace (Here is in
>> qemu) to
>> cache and compare net packets.This module is one of the important
>> components
>> of COLO project and now it is still in early stage, so any comments and
>> feedback are warmly welcomed,thanks in advance.
>>
>> ## Background
>> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
>> Service)
>> project is a high availability solution. Both Primary VM (PVM) and
>> Secondary VM
>> (SVM) run in parallel. They receive the same request from client, and
>> generate
>> responses in parallel too. If the response packets from PVM and SVM are
>> identical, they are released immediately. Otherwise, a VM checkpoint
>> (on demand)
>> is conducted.
>> Paper:
>> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
>> COLO on Xen:
>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>> COLO on Qemu/KVM:
>> http://wiki.qemu.org/Features/COLO
>>
>> By the needs of capturing response packets from PVM and SVM and
>> finding out
>> whether they are identical, we introduce a new module to qemu
>> networking called
>> colo-proxy.
>>
>> This document describes the design of the colo-proxy module
>>
>> ## Glossary
>>    PVM - Primary VM, which provides services to clients.
>>    SVM - Secondary VM, a hot standby and replication of PVM.
>>    PN - Primary Node, the host which PVM runs on
>>    SN - Secondary Node, the host which SVM runs on
>>
>> ## Our Idea ##
>>
>> COLO-Proxy
>> COLO-Proxy is a part of COLO,based on qemu net filter and it's a
>> plugin for
>> qemu net filter.the function keep SVM connect normal to PVM and compare
>> PVM's packets to SVM's packets.if difference,notify COLO do checkpoint.
>>
>> == Workflow ==
>>
>>
>> +--+                                      +--+
>> |PN|                                      |SN|
>> +-----------------------+                 +-----------------------+
>> | +-------------------+ |                 | +-------------------+ |
>> | |                   | |                 | |                   | |
>> | |        PVM        | |                 | |        SVM        | |
>> | |                   | |                 | |                   | |
>> | +--+-^--------------+ |                 | +-------------^----++ |
>> |    | |                |                 |               |    |  |
>> |    | | +------------+ |                 | +-----------+ |    |  |
>> |    | | |    COLO    | |    (socket)     | |    COLO   | |    |  |
>> |    | | | CheckPoint +---------------------> CheckPoint| |    |  |
>> |    | | |            | |      (6)        | |           | |    |  |
>> |    | | +-----^------+ |                 | +-----------+ |    |  |
>> |    | |   (5) |        |                 |               |    |  |
>> |    | |       |        |                 |               |    |  |
>> | +--v-+--------------+ | Forward(socket) | +-------------+----v+ |
>> | |COLO Proxy  |      +-------+(1)+--------->seq&ack adjust(2)| | |
>> | |      +-----+------+ |                 | +-----------------+ | |
>> | |      | Compare(4) <-------+(3)+---------+     COLO Proxy    | |
>> | +-------------------+ | Forward(socket) | +-------------------+ |
>> ++Qemu+-----------------+                 ++Qemu+-----------------+
>>             | ^
>>             | |
>>             | |
>>    +--------v-+--------+
>>    |                   |
>>    |      Client       |
>>    |                   |
>>    +-------------------+
>>
>>
>>
>>
>> (1)When PN receive client packets,PN COLO-Proxy copy and forward
>> packets to
>> SN COLO-Proxy.
>> (2)SN COLO-Proxy record PVM's packet inital seq & adjust client's ack,send
>> adjusted packets to SVM
>> (3)SN Qemu COLO-Proxy recieve SVM's packets and forward to PN Qemu
>> COLO-Proxy.
>> (4)PN Qemu COLO-Proxy enqueue SVM's packets and enqueue PVM's packets,then
>> compare PVM's packets data with SVM's packets data. If packets is
>> different, compare
>> module notify COLO CheckPoint module to do a checkpoint then send
>> PVM's packets to
>> client and drop SVM's packets, otherwise, just send PVM's packets to
>> client and
>> drop SVM's packets.
>> (5)notify COLO-Checkpoint module checkpoint is needed
>> (6)Do COLO-Checkpoint
>>
>> ### QEMU space TCP/IP stack(Based on SLIRP) ###
>> We need a QEMU space TCP/IP stack to help us to analysis packet. After
>> looking
>> into QEMU, we found that SLIRP
>>
>> http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
>>
>> is a good choice for us. SLIRP proivdes a full TCP/IP stack within
>> QEMU, it can
>> help use to handle the packet written to/read from backend(tap) device
>> which is
>> just like a link layer(L2) packet.
>>
>> ### Packet enqueue and compare ###
>> Together with QEMU space TCP/IP stack, we enqueue all packets sent by
>> PVM and
>> SVM on Primary QEMU, and then compare the packet payload for each
>> connection.
>>
>
> Hi:
>
> Just have the following questions in my mind (some has been raised in
> the previous rounds of discussion without a conclusion):
>
> - What's the plan for management layer? The setup seems complicated so
> we could not simply depend on user to do each step. (And for security
> reason, qemu was usually run as unprivileged user)

We will do as much of the setup work automatically in qemu as we can.
Compared with the kernel proxy scheme, it is not a big deal. :)

> - What's the plan for vhost? Userspace network in qemu is rather slow,
> most user will choose vhost.
> - What if application generate packet based on hwrng device? This will
> produce always different packets.

Yes, that is really a big problem. Actually, we have discussed it many
times, and it seems there is no perfect way to solve it. :(
We have a compromise approach: when we find there are too many consecutive
checkpoint requests, we switch COLO from normal mode to periodic mode, in
which the SVM stops running.
(Dave has implemented this before as hybrid mode; the patches are
"[RFC/COLO: 0/3] Hybrid mode and parameterisation".)

> - Not sure SLIRP is perfect matched for this task. As has been raised,
> another method is to decouple the packet comparing from qemu. In this
> way, lots of open source userspace stack could be used.

Hmm, that seems to be a good idea; maybe we can add a checkpoint-request
command to COLO to support more packet-comparing schemes...

Thanks,
zhanghailiang

> - Haven't read the code of packet comparing, but if it needs to keep
> track the state of each connection, it could be easily DOS from guest.
>
> Thanks
>
> .
>


* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-10  7:35 ` Jason Wang
  2015-11-10  8:30   ` zhanghailiang
@ 2015-11-10  9:35   ` Tkid
  2015-11-11  3:04     ` Jason Wang
  2015-11-10  9:41   ` Dr. David Alan Gilbert
  2015-11-11  1:23   ` Dong, Eddie
  3 siblings, 1 reply; 63+ messages in thread
From: Tkid @ 2015-11-10  9:35 UTC (permalink / raw)
  To: Jason Wang, qemu-devel, stefanha
  Cc: zhang.zhanghailiang, lizhijian, jan.kiszka, eddie.dong, dgilbert,
	peter.huangpeng, arei.gonglei, guijianfeng



On 11/10/2015 03:35 PM, Jason Wang wrote:
> On 11/10/2015 01:26 PM, Tkid wrote:
>> Hi,all
>>
>> We are planning to reimplement colo proxy in userspace (Here is in
>> qemu) to
>> cache and compare net packets.This module is one of the important
>> components
>> of COLO project and now it is still in early stage, so any comments and
>> feedback are warmly welcomed,thanks in advance.
>>
>> ## Background
>> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
>> Service)
>> project is a high availability solution. Both Primary VM (PVM) and
>> Secondary VM
>> (SVM) run in parallel. They receive the same request from client, and
>> generate
>> responses in parallel too. If the response packets from PVM and SVM are
>> identical, they are released immediately. Otherwise, a VM checkpoint
>> (on demand)
>> is conducted.
>> Paper:
>> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
>> COLO on Xen:
>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>> COLO on Qemu/KVM:
>> http://wiki.qemu.org/Features/COLO
>>
>> By the needs of capturing response packets from PVM and SVM and
>> finding out
>> whether they are identical, we introduce a new module to qemu
>> networking called
>> colo-proxy.
>>
>> This document describes the design of the colo-proxy module
>>
>> ## Glossary
>>    PVM - Primary VM, which provides services to clients.
>>    SVM - Secondary VM, a hot standby and replication of PVM.
>>    PN - Primary Node, the host which PVM runs on
>>    SN - Secondary Node, the host which SVM runs on
>>
>> ## Our Idea ##
>>
>> COLO-Proxy
>> COLO-Proxy is a part of COLO,based on qemu net filter and it's a
>> plugin for
>> qemu net filter.the function keep SVM connect normal to PVM and compare
>> PVM's packets to SVM's packets.if difference,notify COLO do checkpoint.
>>
>> == Workflow ==
>>
>> +--+                                      +--+
>> |PN|                                      |SN|
>> +-----------------------+                 +-----------------------+
>> | +-------------------+ |                 | +-------------------+ |
>> | |                   | |                 | |                   | |
>> | |        PVM        | |                 | |        SVM        | |
>> | |                   | |                 | |                   | |
>> | +--+-^--------------+ |                 | +-------------^----++ |
>> |    | |                |                 |               |    |  |
>> |    | | +------------+ |                 | +-----------+ |    |  |
>> |    | | |    COLO    | |    (socket)     | |    COLO   | |    |  |
>> |    | | | CheckPoint +---------------------> CheckPoint| |    |  |
>> |    | | |            | |      (6)        | |           | |    |  |
>> |    | | +-----^------+ |                 | +-----------+ |    |  |
>> |    | |   (5) |        |                 |               |    |  |
>> |    | |       |        |                 |               |    |  |
>> | +--v-+--------------+ | Forward(socket) | +-------------+----v+ |
>> | |COLO Proxy  |      +-------+(1)+--------->seq&ack adjust(2)| | |
>> | |      +-----+------+ |                 | +-----------------+ | |
>> | |      | Compare(4) <-------+(3)+---------+     COLO Proxy    | |
>> | +-------------------+ | Forward(socket) | +-------------------+ |
>> ++Qemu+-----------------+                 ++Qemu+-----------------+
>>             | ^
>>             | |
>>             | |
>>    +--------v-+--------+
>>    |                   |
>>    |      Client       |
>>    |                   |
>>    +-------------------+
>>
>>
>> (1)When PN receive client packets,PN COLO-Proxy copy and forward
>> packets to
>> SN COLO-Proxy.
>> (2)SN COLO-Proxy record PVM's packet inital seq & adjust client's ack,send
>> adjusted packets to SVM
>> (3)SN Qemu COLO-Proxy recieve SVM's packets and forward to PN Qemu
>> COLO-Proxy.
>> (4)PN Qemu COLO-Proxy enqueue SVM's packets and enqueue PVM's packets,then
>> compare PVM's packets data with SVM's packets data. If packets is
>> different, compare
>> module notify COLO CheckPoint module to do a checkpoint then send
>> PVM's packets to
>> client and drop SVM's packets, otherwise, just send PVM's packets to
>> client and
>> drop SVM's packets.
>> (5)notify COLO-Checkpoint module checkpoint is needed
>> (6)Do COLO-Checkpoint
>>
>> ### QEMU space TCP/IP stack(Based on SLIRP) ###
>> We need a QEMU space TCP/IP stack to help us to analysis packet. After
>> looking
>> into QEMU, we found that SLIRP
>>
>> http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
>>
>> is a good choice for us. SLIRP proivdes a full TCP/IP stack within
>> QEMU, it can
>> help use to handle the packet written to/read from backend(tap) device
>> which is
>> just like a link layer(L2) packet.
>>
>> ### Packet enqueue and compare ###
>> Together with QEMU space TCP/IP stack, we enqueue all packets sent by
>> PVM and
>> SVM on Primary QEMU, and then compare the packet payload for each
>> connection.
>>
Thanks for the review ~
> Hi:
>
> Just have the following questions in my mind (some has been raised in
> the previous rounds of discussion without a conclusion):
>
> - What's the plan for management layer? The setup seems complicated so
> we could not simply depend on user to do each step. (And for security
> reason, qemu was usually run as unprivileged user)
- We don't need to run as a privileged user; colo-proxy runs just like
filter-buffer. Usage:
primary:
  -netdev tap,id=bn0 -device e1000,netdev=bn0
  -object colo-proxy,id=f0,netdev=bn0,queue=all,side=primary,host=3.3.3.8,port=xxx
secondary:
  -netdev tap,id=bn0 -device e1000,netdev=bn0
  -object colo-proxy,id=f0,netdev=bn0,queue=all,side=secondary,server=tcp:xxxx:port
> - What's the plan for vhost? Userspace network in qemu is rather slow,
> most user will choose vhost.
The colo-proxy in qemu space doesn't support vhost. People who want to use
COLO must disable vhost, but virtio-net is another choice and is enough in
most cases.
> - What if application generate packet based on hwrng device? This will
> produce always different packets.
Just as hailiang said.
> - Not sure SLIRP is perfect matched for this task. As has been raised,
> another method is to decouple the packet comparing from qemu. In this
> way, lots of open source userspace stack could be used.
- We just need some of SLIRP's capabilities (such as IP frag/defrag).
We have investigated some open-source userspace stacks but have not found
one better than SLIRP. If you know of one, please tell me.
> - Haven't read the code of packet comparing, but if it needs to keep
> track the state of each connection, it could be easily DOS from guest.

- We think preventing DoS from the guest is out of our scope; that should be
the firewall's concern.

> Thanks
> .
>


* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-10  7:35 ` Jason Wang
  2015-11-10  8:30   ` zhanghailiang
  2015-11-10  9:35   ` Tkid
@ 2015-11-10  9:41   ` Dr. David Alan Gilbert
  2015-11-11  3:09     ` Jason Wang
  2015-11-11  1:23   ` Dong, Eddie
  3 siblings, 1 reply; 63+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-10  9:41 UTC (permalink / raw)
  To: Jason Wang
  Cc: Tkid, lizhijian, jan.kiszka, eddie.dong, qemu-devel,
	peter.huangpeng, arei.gonglei, luis, stefanha, guijianfeng,
	zhang.zhanghailiang

* Jason Wang (jasowang@redhat.com) wrote:
> 
> 
> On 11/10/2015 01:26 PM, Tkid wrote:
> > Hi,all
> >
> > We are planning to reimplement colo proxy in userspace (Here is in
> > qemu) to
> > cache and compare net packets.This module is one of the important
> > components
> > of COLO project and now it is still in early stage, so any comments and
> > feedback are warmly welcomed,thanks in advance.
> >
> > ## Background
> > COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
> > Service)
> > project is a high availability solution. Both Primary VM (PVM) and
> > Secondary VM
> > (SVM) run in parallel. They receive the same request from client, and
> > generate
> > responses in parallel too. If the response packets from PVM and SVM are
> > identical, they are released immediately. Otherwise, a VM checkpoint
> > (on demand)
> > is conducted.
> > Paper:
> > http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
> > COLO on Xen:
> > http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> > COLO on Qemu/KVM:
> > http://wiki.qemu.org/Features/COLO
> >
> > By the needs of capturing response packets from PVM and SVM and
> > finding out
> > whether they are identical, we introduce a new module to qemu
> > networking called
> > colo-proxy.
> >
> > This document describes the design of the colo-proxy module
> >
> > ## Glossary
> >   PVM - Primary VM, which provides services to clients.
> >   SVM - Secondary VM, a hot standby and replication of PVM.
> >   PN - Primary Node, the host which PVM runs on
> >   SN - Secondary Node, the host which SVM runs on
> >
> > ## Our Idea ##
> >
> > COLO-Proxy
> > COLO-Proxy is a part of COLO,based on qemu net filter and it's a
> > plugin for
> > qemu net filter.the function keep SVM connect normal to PVM and compare
> > PVM's packets to SVM's packets.if difference,notify COLO do checkpoint.
> >
> > == Workflow ==
> >
> >
> > +--+                                      +--+
> > |PN|                                      |SN|
> > +-----------------------+                 +-----------------------+
> > | +-------------------+ |                 | +-------------------+ |
> > | |                   | |                 | |                   | |
> > | |        PVM        | |                 | |        SVM        | |
> > | |                   | |                 | |                   | |
> > | +--+-^--------------+ |                 | +-------------^----++ |
> > |    | |                |                 |               |    |  |
> > |    | | +------------+ |                 | +-----------+ |    |  |
> > |    | | |    COLO    | |    (socket)     | |    COLO   | |    |  |
> > |    | | | CheckPoint +---------------------> CheckPoint| |    |  |
> > |    | | |            | |      (6)        | |           | |    |  |
> > |    | | +-----^------+ |                 | +-----------+ |    |  |
> > |    | |   (5) |        |                 |               |    |  |
> > |    | |       |        |                 |               |    |  |
> > | +--v-+--------------+ | Forward(socket) | +-------------+----v+ |
> > | |COLO Proxy  |      +-------+(1)+--------->seq&ack adjust(2)| | |
> > | |      +-----+------+ |                 | +-----------------+ | |
> > | |      | Compare(4) <-------+(3)+---------+     COLO Proxy    | |
> > | +-------------------+ | Forward(socket) | +-------------------+ |
> > ++Qemu+-----------------+                 ++Qemu+-----------------+
> >            | ^
> >            | |
> >            | |
> >   +--------v-+--------+
> >   |                   |
> >   |      Client       |
> >   |                   |
> >   +-------------------+
> >
> >
> >
> >
> > (1)When PN receive client packets,PN COLO-Proxy copy and forward
> > packets to
> > SN COLO-Proxy.
> > (2)SN COLO-Proxy record PVM's packet inital seq & adjust client's ack,send
> > adjusted packets to SVM
> > (3)SN Qemu COLO-Proxy recieve SVM's packets and forward to PN Qemu
> > COLO-Proxy.
> > (4)PN Qemu COLO-Proxy enqueue SVM's packets and enqueue PVM's packets,then
> > compare PVM's packets data with SVM's packets data. If packets is
> > different, compare
> > module notify COLO CheckPoint module to do a checkpoint then send
> > PVM's packets to
> > client and drop SVM's packets, otherwise, just send PVM's packets to
> > client and
> > drop SVM's packets.
> > (5)notify COLO-Checkpoint module checkpoint is needed
> > (6)Do COLO-Checkpoint
> >
> > ### QEMU space TCP/IP stack(Based on SLIRP) ###
> > We need a QEMU space TCP/IP stack to help us to analysis packet. After
> > looking
> > into QEMU, we found that SLIRP
> >
> > http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
> >
> > is a good choice for us. SLIRP proivdes a full TCP/IP stack within
> > QEMU, it can
> > help use to handle the packet written to/read from backend(tap) device
> > which is
> > just like a link layer(L2) packet.
> >
> > ### Packet enqueue and compare ###
> > Together with QEMU space TCP/IP stack, we enqueue all packets sent by
> > PVM and
> > SVM on Primary QEMU, and then compare the packet payload for each
> > connection.
> >
> 
> Hi:
> 
> Just have the following questions in my mind (some has been raised in
> the previous rounds of discussion without a conclusion):
> 
> - What's the plan for management layer? The setup seems complicated so
> we could not simply depend on user to do each step. (And for security
> reason, qemu was usually run as unprivileged user)

It's certainly easier than the current COLO code that relies on a very
complex set of bridges, extra network interfaces and kernel modules.
UMU (cc'd) has been working on a libvirt setup that starts COLO, although
one bit that's very messy is the current kernel-based network comparison
code.

> - What's the plan for vhost? Userspace network in qemu is rather slow,
> most users will choose vhost.
> - What if an application generates packets based on a hwrng device? This
> will always produce different packets.

Yes, there are cases where this happens - COLO's worst case is similar to
simple checkpointing (because it has a limit on the smallest checkpoint),
but its best case is much better: on a compute-heavy load, it ends up
taking a checkpoint very rarely.
Actually the big problem is where randomness occurs in unexpected places,
e.g. where things like Perl's hash randomisation mean that the two
hosts produce the same data in different orders.

> - Not sure SLIRP is a perfect match for this task. As has been raised,
> another method is to decouple the packet comparing from qemu. In this
> way, lots of open source userspace stacks could be used.
> - Haven't read the packet-comparing code, but if it needs to keep
> track of the state of each connection, it could easily be DOSed from the guest.

The guest can only break its own networking, so shooting itself in the foot
is no big deal.

Dave

> 
> Thanks
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-10  5:26 [Qemu-devel] [POC]colo-proxy in qemu Tkid
  2015-11-10  7:35 ` Jason Wang
@ 2015-11-10 10:54 ` Dr. David Alan Gilbert
  2015-11-11  2:46   ` Zhang Chen
  1 sibling, 1 reply; 63+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-10 10:54 UTC (permalink / raw)
  To: Tkid
  Cc: zhang.zhanghailiang, lizhijian, jan.kiszka, jasowang, eddie.dong,
	qemu-devel, peter.huangpeng, arei.gonglei, stefanha, guijianfeng

* Tkid (zhangchen.fnst@cn.fujitsu.com) wrote:
> Hi,all
> 
> We are planning to reimplement the colo proxy in userspace (here, in qemu)
> to cache and compare network packets. This module is one of the important
> components of the COLO project and is still at an early stage, so any
> comments and feedback are warmly welcome. Thanks in advance.
> 
> ## Background
> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
> Service)
> project is a high availability solution. Both Primary VM (PVM) and Secondary
> VM
> (SVM) run in parallel. They receive the same request from client, and
> generate
> responses in parallel too. If the response packets from PVM and SVM are
> identical, they are released immediately. Otherwise, a VM checkpoint (on
> demand)
> is conducted.
> Paper:
> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
> COLO on Xen:
> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> COLO on Qemu/KVM:
> http://wiki.qemu.org/Features/COLO
> 
> By the needs of capturing response packets from PVM and SVM and finding out
> whether they are identical, we introduce a new module to qemu networking
> called
> colo-proxy.
> 
> This document describes the design of the colo-proxy module
> 
> ## Glossary
>   PVM - Primary VM, which provides services to clients.
>   SVM - Secondary VM, a hot standby and replication of PVM.
>   PN - Primary Node, the host which PVM runs on
>   SN - Secondary Node, the host which SVM runs on
> 
> ## Our Idea ##
> 
> COLO-Proxy
> COLO-Proxy is a part of COLO. It is based on the qemu net filter and is a
> plugin for it. Its function is to keep the SVM's connections working
> normally, in step with the PVM, and to compare the PVM's packets with the
> SVM's packets; if they differ, it notifies COLO to do a checkpoint.
> 
> == Workflow ==
> 
> 
> +--+                                      +--+
> |PN|                                      |SN|
> +-----------------------+                 +-----------------------+
> | +-------------------+ |                 | +-------------------+ |
> | |                   | |                 | |                   | |
> | |        PVM        | |                 | |        SVM        | |
> | |                   | |                 | |                   | |
> | +--+-^--------------+ |                 | +-------------^----++ |
> |    | |                |                 |               |    |  |
> |    | | +------------+ |                 | +-----------+ |    |  |
> |    | | |    COLO    | |    (socket)     | |    COLO   | |    |  |
> |    | | | CheckPoint +---------------------> CheckPoint| |    |  |
> |    | | |            | |      (6)        | |           | |    |  |
> |    | | +-----^------+ |                 | +-----------+ |    |  |
> |    | |   (5) |        |                 |               |    |  |
> |    | |       |        |                 |               |    |  |
> | +--v-+--------------+ | Forward(socket) | +-------------+----v+ |
> | |COLO Proxy  |      +-------+(1)+--------->seq&ack adjust(2)| | |
> | |      +-----+------+ |                 | +-----------------+ | |
> | |      | Compare(4) <-------+(3)+---------+     COLO Proxy    | |
> | +-------------------+ | Forward(socket) | +-------------------+ |
> ++Qemu+-----------------+                 ++Qemu+-----------------+
>            | ^
>            | |
>            | |
>   +--------v-+--------+
>   |                   |
>   |      Client       |
>   |                   |
>   +-------------------+
> 
> 
> (1) When the PN receives client packets, the PN COLO-Proxy copies and
> forwards them to the SN COLO-Proxy.
> (2) The SN COLO-Proxy records the PVM's initial packet seq, adjusts the
> client's ack, and sends the adjusted packets to the SVM.
> (3) The SN Qemu COLO-Proxy receives the SVM's packets and forwards them
> to the PN Qemu COLO-Proxy.

What protocol are you using for the data carried over the Forward(socket)?
I'm just wondering if there's an existing layer2 tunneling protocol that
it would be best to use.

> (4) The PN Qemu COLO-Proxy enqueues both the SVM's and the PVM's packets,
> then compares the PVM's packet data with the SVM's. If the packets
> differ, the compare module notifies the COLO CheckPoint module to do a
> checkpoint, then sends the PVM's packets to the client and drops the
> SVM's packets; otherwise it just sends the PVM's packets to the client
> and drops the SVM's packets.
> (5) Notify the COLO-Checkpoint module that a checkpoint is needed.
> (6) Do the COLO checkpoint.
> 
> ### QEMU userspace TCP/IP stack (based on SLIRP) ###
> We need a QEMU userspace TCP/IP stack to help us analyze packets. After
> looking into QEMU, we found that SLIRP
>
> http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
>
> is a good choice for us. SLIRP provides a full TCP/IP stack within QEMU;
> it can help us handle the packets written to/read from the backend (tap)
> device, which are just link-layer (L2) packets.

I still think SLIRP might be painful; but it might be an easy one to start
with.

> ### Packet enqueue and compare ###
> Together with the QEMU userspace TCP/IP stack, we enqueue all packets
> sent by the PVM and the SVM on the primary QEMU, and then compare the
> packet payloads for each connection.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-10  7:35 ` Jason Wang
                     ` (2 preceding siblings ...)
  2015-11-10  9:41   ` Dr. David Alan Gilbert
@ 2015-11-11  1:23   ` Dong, Eddie
  2015-11-11  3:26     ` Jason Wang
  3 siblings, 1 reply; 63+ messages in thread
From: Dong, Eddie @ 2015-11-11  1:23 UTC (permalink / raw)
  To: Jason Wang, Tkid, qemu-devel, stefanha
  Cc: zhang.zhanghailiang, lizhijian, jan.kiszka, dgilbert,
	peter.huangpeng, arei.gonglei, guijianfeng

> - What's the plan for vhost? Userspace network in qemu is rather slow, most
> users will choose vhost.
[Dong, Eddie] Hi Jason:
	How about we take a staged approach? In general, COLO opens the door to a high performance HA solution, but it will take a very long time to make everything perfect. As for network virtualization, I think we may start from usage with moderate network bandwidth, like 1 Gbps. Otherwise, the performance of COLO may not be that good (of course, as David mentioned, the worst case is the same as periodic checkpointing). For the moment, how about we start from the in-Qemu virtio network, and enhance the vhost case in the future?

	The good thing is that we have more people working on the patch series, and we are glad to see UMU has also joined the effort.  Thanks and welcome...

Thx Eddie


* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-10  8:30   ` zhanghailiang
@ 2015-11-11  2:28     ` Jason Wang
  0 siblings, 0 replies; 63+ messages in thread
From: Jason Wang @ 2015-11-11  2:28 UTC (permalink / raw)
  To: zhanghailiang, Tkid, qemu-devel, stefanha
  Cc: lizhijian, jan.kiszka, eddie.dong, peter.huangpeng, dgilbert,
	arei.gonglei, guijianfeng



On 11/10/2015 04:30 PM, zhanghailiang wrote:
> On 2015/11/10 15:35, Jason Wang wrote:
>>
>>
>> On 11/10/2015 01:26 PM, Tkid wrote:
>>> Hi,all
>>>
>>> We are planning to reimplement the colo proxy in userspace (here, in
>>> qemu) to cache and compare network packets. This module is one of the
>>> important components of the COLO project and is still at an early stage,
>>> so any comments and feedback are warmly welcome. Thanks in advance.
>>>
>>> ## Background
>>> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
>>> Service)
>>> project is a high availability solution. Both Primary VM (PVM) and
>>> Secondary VM
>>> (SVM) run in parallel. They receive the same request from client, and
>>> generate
>>> responses in parallel too. If the response packets from PVM and SVM are
>>> identical, they are released immediately. Otherwise, a VM checkpoint
>>> (on demand)
>>> is conducted.
>>> Paper:
>>> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
>>> COLO on Xen:
>>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>>> COLO on Qemu/KVM:
>>> http://wiki.qemu.org/Features/COLO
>>>
>>> By the needs of capturing response packets from PVM and SVM and
>>> finding out
>>> whether they are identical, we introduce a new module to qemu
>>> networking called
>>> colo-proxy.
>>>
>>> This document describes the design of the colo-proxy module
>>>
>>> ## Glossary
>>>    PVM - Primary VM, which provides services to clients.
>>>    SVM - Secondary VM, a hot standby and replication of PVM.
>>>    PN - Primary Node, the host which PVM runs on
>>>    SN - Secondary Node, the host which SVM runs on
>>>
>>> ## Our Idea ##
>>>
>>> COLO-Proxy
>>> COLO-Proxy is a part of COLO. It is based on the qemu net filter and is
>>> a plugin for it. Its function is to keep the SVM's connections working
>>> normally, in step with the PVM, and to compare the PVM's packets with
>>> the SVM's packets; if they differ, it notifies COLO to do a checkpoint.
>>>
>>> == Workflow ==
>>>
>>>
>>> +--+                                      +--+
>>> |PN|                                      |SN|
>>> +-----------------------+                 +-----------------------+
>>> | +-------------------+ |                 | +-------------------+ |
>>> | |                   | |                 | |                   | |
>>> | |        PVM        | |                 | |        SVM        | |
>>> | |                   | |                 | |                   | |
>>> | +--+-^--------------+ |                 | +-------------^----++ |
>>> |    | |                |                 |               |    |  |
>>> |    | | +------------+ |                 | +-----------+ |    |  |
>>> |    | | |    COLO    | |    (socket)     | |    COLO   | |    |  |
>>> |    | | | CheckPoint +---------------------> CheckPoint| |    |  |
>>> |    | | |            | |      (6)        | |           | |    |  |
>>> |    | | +-----^------+ |                 | +-----------+ |    |  |
>>> |    | |   (5) |        |                 |               |    |  |
>>> |    | |       |        |                 |               |    |  |
>>> | +--v-+--------------+ | Forward(socket) | +-------------+----v+ |
>>> | |COLO Proxy  |      +-------+(1)+--------->seq&ack adjust(2)| | |
>>> | |      +-----+------+ |                 | +-----------------+ | |
>>> | |      | Compare(4) <-------+(3)+---------+     COLO Proxy    | |
>>> | +-------------------+ | Forward(socket) | +-------------------+ |
>>> ++Qemu+-----------------+                 ++Qemu+-----------------+
>>>             | ^
>>>             | |
>>>             | |
>>>    +--------v-+--------+
>>>    |                   |
>>>    |      Client       |
>>>    |                   |
>>>    +-------------------+
>>>
>>>
>>>
>>>
>>> (1) When the PN receives client packets, the PN COLO-Proxy copies and
>>> forwards them to the SN COLO-Proxy.
>>> (2) The SN COLO-Proxy records the PVM's initial packet seq, adjusts the
>>> client's ack, and sends the adjusted packets to the SVM.
>>> (3) The SN Qemu COLO-Proxy receives the SVM's packets and forwards them
>>> to the PN Qemu COLO-Proxy.
>>> (4) The PN Qemu COLO-Proxy enqueues both the SVM's and the PVM's
>>> packets, then compares the PVM's packet data with the SVM's. If the
>>> packets differ, the compare module notifies the COLO CheckPoint module
>>> to do a checkpoint, then sends the PVM's packets to the client and drops
>>> the SVM's packets; otherwise it just sends the PVM's packets to the
>>> client and drops the SVM's packets.
>>> (5) Notify the COLO-Checkpoint module that a checkpoint is needed.
>>> (6) Do the COLO checkpoint.
>>>
>>> ### QEMU userspace TCP/IP stack (based on SLIRP) ###
>>> We need a QEMU userspace TCP/IP stack to help us analyze packets. After
>>> looking into QEMU, we found that SLIRP
>>>
>>> http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
>>>
>>> is a good choice for us. SLIRP provides a full TCP/IP stack within QEMU;
>>> it can help us handle the packets written to/read from the backend (tap)
>>> device, which are just link-layer (L2) packets.
>>>
>>> ### Packet enqueue and compare ###
>>> Together with the QEMU userspace TCP/IP stack, we enqueue all packets
>>> sent by the PVM and the SVM on the primary QEMU, and then compare the
>>> packet payloads for each connection.
>>>
>>
>> Hi:
>>
>> Just have the following questions in my mind (some have been raised in
>> the previous rounds of discussion without a conclusion):
>>
>> - What's the plan for management layer? The setup seems complicated so
>> we could not simply depend on the user to do each step. (And for security
>> reasons, qemu is usually run as an unprivileged user)
>
> We will do as much of the setup work automatically in qemu as we can.
> Compared with the kernel proxy scheme, it is not a big deal. :)

Yes, but what I mean is the host side, e.g. setting up the network, like
bridges and so on.
>
>> - What's the plan for vhost? Userspace network in qemu is rather slow,
>> most users will choose vhost.
>> - What if an application generates packets based on a hwrng device? This
>> will always produce different packets.
>
> Yes, that is really a big problem. Actually, we have discussed it many
> times, and it seems that there is no perfect way to solve it. :(
> We have a compromise approach: when we find there are too many consecutive
> checkpoint requests, we switch COLO from normal mode to periodic mode,
> in which the SVM stops running.
> (Dave has implemented this before, calling it hybrid mode. The patches are
> "[RFC/COLO:  0/3] Hybrid mode and parameterisation".)

Aha, I see. Thanks for the pointer.
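The fallback heuristic described above (dropping from COLO's compare-driven normal mode to periodic checkpointing when miscompares come too frequently) could be sketched roughly as follows. This is only an illustration: the names, the threshold, and the reset-on-match policy are all assumptions, not the actual COLO code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch of a hybrid-mode switch: after too many consecutive
 * miscompare-triggered checkpoints, fall back to periodic-checkpoint mode.
 * All names and the threshold are assumptions for illustration. */

enum colo_mode { COLO_MODE_NORMAL, COLO_MODE_PERIODIC };

#define MISCOMPARE_LIMIT 5   /* assumed tunable threshold */

struct colo_state {
    enum colo_mode mode;
    unsigned consecutive_miscompares;
};

/* Called after each packet comparison; returns true if a mode switch happened. */
bool colo_account_compare(struct colo_state *s, bool packets_identical)
{
    if (packets_identical) {
        s->consecutive_miscompares = 0;   /* outputs agree again, stay put */
        return false;
    }
    if (++s->consecutive_miscompares >= MISCOMPARE_LIMIT &&
        s->mode == COLO_MODE_NORMAL) {
        s->mode = COLO_MODE_PERIODIC;     /* stop comparing, checkpoint on a timer */
        return true;
    }
    return false;
}
```

A real implementation would presumably also need a path back to normal mode once the divergence subsides.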

>
>> - Not sure SLIRP is a perfect match for this task. As has been raised,
>> another method is to decouple the packet comparing from qemu. In this
>> way, lots of open source userspace stacks could be used.
>
> Hmm, it seems to be a good idea; maybe we can add a checkpoint request
> command in COLO to support more packet-comparing schemes ...
>

Yes, then when to synchronize could be determined by an external program.
(Just an idea, FYI.)

> Thanks,
> zhanghailiang
>
>> - Haven't read the packet-comparing code, but if it needs to keep
>> track of the state of each connection, it could easily be DOSed from the guest.
>>
>> Thanks
>>
>> .
>>
>
>
>


* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-10 10:54 ` Dr. David Alan Gilbert
@ 2015-11-11  2:46   ` Zhang Chen
  2015-11-13 12:33     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 63+ messages in thread
From: Zhang Chen @ 2015-11-11  2:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, lizhijian, jan.kiszka, jasowang, eddie.dong,
	qemu-devel, peter.huangpeng, arei.gonglei, stefanha, guijianfeng



On 11/10/2015 06:54 PM, Dr. David Alan Gilbert wrote:
> * Tkid (zhangchen.fnst@cn.fujitsu.com) wrote:
>> Hi,all
>>
>> We are planning to reimplement the colo proxy in userspace (here, in qemu)
>> to cache and compare network packets. This module is one of the important
>> components of the COLO project and is still at an early stage, so any
>> comments and feedback are warmly welcome. Thanks in advance.
>>
>> ## Background
>> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
>> Service)
>> project is a high availability solution. Both Primary VM (PVM) and Secondary
>> VM
>> (SVM) run in parallel. They receive the same request from client, and
>> generate
>> responses in parallel too. If the response packets from PVM and SVM are
>> identical, they are released immediately. Otherwise, a VM checkpoint (on
>> demand)
>> is conducted.
>> Paper:
>> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
>> COLO on Xen:
>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>> COLO on Qemu/KVM:
>> http://wiki.qemu.org/Features/COLO
>>
>> By the needs of capturing response packets from PVM and SVM and finding out
>> whether they are identical, we introduce a new module to qemu networking
>> called
>> colo-proxy.
>>
>> This document describes the design of the colo-proxy module
>>
>> ## Glossary
>>    PVM - Primary VM, which provides services to clients.
>>    SVM - Secondary VM, a hot standby and replication of PVM.
>>    PN - Primary Node, the host which PVM runs on
>>    SN - Secondary Node, the host which SVM runs on
>>
>> ## Our Idea ##
>>
>> COLO-Proxy
>> COLO-Proxy is a part of COLO. It is based on the qemu net filter and is a
>> plugin for it. Its function is to keep the SVM's connections working
>> normally, in step with the PVM, and to compare the PVM's packets with the
>> SVM's packets; if they differ, it notifies COLO to do a checkpoint.
>>
>> == Workflow ==
>>
>>
>> +--+                                      +--+
>> |PN|                                      |SN|
>> +-----------------------+                 +-----------------------+
>> | +-------------------+ |                 | +-------------------+ |
>> | |                   | |                 | |                   | |
>> | |        PVM        | |                 | |        SVM        | |
>> | |                   | |                 | |                   | |
>> | +--+-^--------------+ |                 | +-------------^----++ |
>> |    | |                |                 |               |    |  |
>> |    | | +------------+ |                 | +-----------+ |    |  |
>> |    | | |    COLO    | |    (socket)     | |    COLO   | |    |  |
>> |    | | | CheckPoint +---------------------> CheckPoint| |    |  |
>> |    | | |            | |      (6)        | |           | |    |  |
>> |    | | +-----^------+ |                 | +-----------+ |    |  |
>> |    | |   (5) |        |                 |               |    |  |
>> |    | |       |        |                 |               |    |  |
>> | +--v-+--------------+ | Forward(socket) | +-------------+----v+ |
>> | |COLO Proxy  |      +-------+(1)+--------->seq&ack adjust(2)| | |
>> | |      +-----+------+ |                 | +-----------------+ | |
>> | |      | Compare(4) <-------+(3)+---------+     COLO Proxy    | |
>> | +-------------------+ | Forward(socket) | +-------------------+ |
>> ++Qemu+-----------------+                 ++Qemu+-----------------+
>>             | ^
>>             | |
>>             | |
>>    +--------v-+--------+
>>    |                   |
>>    |      Client       |
>>    |                   |
>>    +-------------------+
>>
>>
>> (1) When the PN receives client packets, the PN COLO-Proxy copies and
>> forwards them to the SN COLO-Proxy.
>> (2) The SN COLO-Proxy records the PVM's initial packet seq, adjusts the
>> client's ack, and sends the adjusted packets to the SVM.
>> (3) The SN Qemu COLO-Proxy receives the SVM's packets and forwards them
>> to the PN Qemu COLO-Proxy.
> What protocol are you using for the data carried over the Forward(socket)?
> I'm just wondering if there's an existing layer2 tunneling protocol that
> it would be best to use.
Currently, we use a raw TCP socket: we send a packet's length first, and
then the packet itself.
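That framing could look roughly like the sketch below: a 4-byte network-order length header followed by the raw L2 frame. The function names and header layout here are assumptions for illustration, not the actual colo-proxy wire format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <arpa/inet.h>   /* htonl/ntohl */

/* Write exactly len bytes, retrying on short writes. */
static int write_all(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= n;
    }
    return 0;
}

/* Read exactly len bytes, retrying on short reads. */
static int read_all(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= n;
    }
    return 0;
}

/* Send one frame: 4-byte network-order length, then the payload. */
int send_frame(int fd, const void *pkt, uint32_t len)
{
    uint32_t hdr = htonl(len);
    if (write_all(fd, &hdr, sizeof(hdr)) < 0)
        return -1;
    return write_all(fd, pkt, len);
}

/* Receive one frame into buf (capacity cap); returns payload length or -1. */
int recv_frame(int fd, void *buf, uint32_t cap)
{
    uint32_t hdr;
    if (read_all(fd, &hdr, sizeof(hdr)) < 0)
        return -1;
    uint32_t len = ntohl(hdr);
    if (len > cap)
        return -1;   /* oversized frame: refuse rather than overflow */
    if (read_all(fd, buf, len) < 0)
        return -1;
    return (int)len;
}
```

The length prefix matters because TCP is a byte stream: without it the receiver cannot recover the original frame boundaries.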
>> (4) The PN Qemu COLO-Proxy enqueues both the SVM's and the PVM's packets,
>> then compares the PVM's packet data with the SVM's. If the packets
>> differ, the compare module notifies the COLO CheckPoint module to do a
>> checkpoint, then sends the PVM's packets to the client and drops the
>> SVM's packets; otherwise it just sends the PVM's packets to the client
>> and drops the SVM's packets.
>> (5) Notify the COLO-Checkpoint module that a checkpoint is needed.
>> (6) Do the COLO checkpoint.
>>
>> ### QEMU userspace TCP/IP stack (based on SLIRP) ###
>> We need a QEMU userspace TCP/IP stack to help us analyze packets. After
>> looking into QEMU, we found that SLIRP
>>
>> http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
>>
>> is a good choice for us. SLIRP provides a full TCP/IP stack within QEMU;
>> it can help us handle the packets written to/read from the backend (tap)
>> device, which are just link-layer (L2) packets.
> I still think SLIRP might be painful; but it might be an easy one to start
> with.
>
>> ### Packet enqueue and compare ###
>> Together with the QEMU userspace TCP/IP stack, we enqueue all packets
>> sent by the PVM and the SVM on the primary QEMU, and then compare the
>> packet payloads for each connection.
> Dave
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
>


* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-10  9:35   ` Tkid
@ 2015-11-11  3:04     ` Jason Wang
  0 siblings, 0 replies; 63+ messages in thread
From: Jason Wang @ 2015-11-11  3:04 UTC (permalink / raw)
  To: Tkid, qemu-devel, stefanha
  Cc: zhang.zhanghailiang, lizhijian, jan.kiszka, eddie.dong,
	peter.huangpeng, dgilbert, arei.gonglei, guijianfeng



On 11/10/2015 05:35 PM, Tkid wrote:
>
>
> On 11/10/2015 03:35 PM, Jason Wang wrote:
>> On 11/10/2015 01:26 PM, Tkid wrote:
>>> Hi,all
>>>
>>> We are planning to reimplement the colo proxy in userspace (here, in
>>> qemu) to cache and compare network packets. This module is one of the
>>> important components of the COLO project and is still at an early stage,
>>> so any comments and feedback are warmly welcome. Thanks in advance.
>>>
>>> ## Background
>>> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
>>> Service)
>>> project is a high availability solution. Both Primary VM (PVM) and
>>> Secondary VM
>>> (SVM) run in parallel. They receive the same request from client, and
>>> generate
>>> responses in parallel too. If the response packets from PVM and SVM are
>>> identical, they are released immediately. Otherwise, a VM checkpoint
>>> (on demand)
>>> is conducted.
>>> Paper:
>>> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
>>> COLO on Xen:
>>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>>> COLO on Qemu/KVM:
>>> http://wiki.qemu.org/Features/COLO
>>>
>>> By the needs of capturing response packets from PVM and SVM and
>>> finding out
>>> whether they are identical, we introduce a new module to qemu
>>> networking called
>>> colo-proxy.
>>>
>>> This document describes the design of the colo-proxy module
>>>
>>> ## Glossary
>>>    PVM - Primary VM, which provides services to clients.
>>>    SVM - Secondary VM, a hot standby and replication of PVM.
>>>    PN - Primary Node, the host which PVM runs on
>>>    SN - Secondary Node, the host which SVM runs on
>>>
>>> ## Our Idea ##
>>>
>>> COLO-Proxy
>>> COLO-Proxy is a part of COLO. It is based on the qemu net filter and is
>>> a plugin for it. Its function is to keep the SVM's connections working
>>> normally, in step with the PVM, and to compare the PVM's packets with
>>> the SVM's packets; if they differ, it notifies COLO to do a checkpoint.
>>>
>>> == Workflow ==
>>>
>>> +--+                                      +--+
>>> |PN|                                      |SN|
>>> +-----------------------+                 +-----------------------+
>>> | +-------------------+ |                 | +-------------------+ |
>>> | |                   | |                 | |                   | |
>>> | |        PVM        | |                 | |        SVM        | |
>>> | |                   | |                 | |                   | |
>>> | +--+-^--------------+ |                 | +-------------^----++ |
>>> |    | |                |                 |               |    |  |
>>> |    | | +------------+ |                 | +-----------+ |    |  |
>>> |    | | |    COLO    | |    (socket)     | |    COLO   | |    |  |
>>> |    | | | CheckPoint +---------------------> CheckPoint| |    |  |
>>> |    | | |            | |      (6)        | |           | |    |  |
>>> |    | | +-----^------+ |                 | +-----------+ |    |  |
>>> |    | |   (5) |        |                 |               |    |  |
>>> |    | |       |        |                 |               |    |  |
>>> | +--v-+--------------+ | Forward(socket) | +-------------+----v+ |
>>> | |COLO Proxy  |      +-------+(1)+--------->seq&ack adjust(2)| | |
>>> | |      +-----+------+ |                 | +-----------------+ | |
>>> | |      | Compare(4) <-------+(3)+---------+     COLO Proxy    | |
>>> | +-------------------+ | Forward(socket) | +-------------------+ |
>>> ++Qemu+-----------------+                 ++Qemu+-----------------+
>>>             | ^
>>>             | |
>>>             | |
>>>    +--------v-+--------+
>>>    |                   |
>>>    |      Client       |
>>>    |                   |
>>>    +-------------------+
>>>
>>>
>>> (1) When the PN receives client packets, the PN COLO-Proxy copies and
>>> forwards them to the SN COLO-Proxy.
>>> (2) The SN COLO-Proxy records the PVM's initial packet seq, adjusts the
>>> client's ack, and sends the adjusted packets to the SVM.
>>> (3) The SN Qemu COLO-Proxy receives the SVM's packets and forwards them
>>> to the PN Qemu COLO-Proxy.
>>> (4) The PN Qemu COLO-Proxy enqueues both the SVM's and the PVM's
>>> packets, then compares the PVM's packet data with the SVM's. If the
>>> packets differ, the compare module notifies the COLO CheckPoint module
>>> to do a checkpoint, then sends the PVM's packets to the client and drops
>>> the SVM's packets; otherwise it just sends the PVM's packets to the
>>> client and drops the SVM's packets.
>>> (5) Notify the COLO-Checkpoint module that a checkpoint is needed.
>>> (6) Do the COLO checkpoint.
>>>
>>> ### QEMU userspace TCP/IP stack (based on SLIRP) ###
>>> We need a QEMU userspace TCP/IP stack to help us analyze packets. After
>>> looking into QEMU, we found that SLIRP
>>>
>>> http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
>>>
>>> is a good choice for us. SLIRP provides a full TCP/IP stack within QEMU;
>>> it can help us handle the packets written to/read from the backend (tap)
>>> device, which are just link-layer (L2) packets.
>>>
>>> ### Packet enqueue and compare ###
>>> Together with the QEMU userspace TCP/IP stack, we enqueue all packets
>>> sent by the PVM and the SVM on the primary QEMU, and then compare the
>>> packet payloads for each connection.
>>>
> Thanks for review ~
>> Hi:
>>
>> Just have the following questions in my mind (some have been raised in
>> the previous rounds of discussion without a conclusion):
>>
>> - What's the plan for management layer? The setup seems complicated so
>> we could not simply depend on the user to do each step. (And for security
>> reasons, qemu is usually run as an unprivileged user)
> - We don't need to run as a privileged user; colo-proxy runs just like
> filter-buffer. Usage:
> primary: -netdev tap,id=bn0 -device e1000,netdev=bn0 -object
> colo-proxy,id=f0,netdev=bn0,queue=all,side=primary,host=3.3.3.8,port=xxx
> secondary: -netdev tap,id=bn0 -device e1000,netdev=bn0 -object
> colo-proxy,id=f0,netdev=bn0,queue=all,side=secondary,server=tcp:xxxx:port

Ok.

>> - What's the plan for vhost? Userspace network in qemu is rather slow,
>> most users will choose vhost.
> The colo-proxy in qemu userspace doesn't support vhost. People who want to
> use COLO must disable vhost, but virtio-net is another choice, which is
> enough in most cases.

OK for functionality but not for performance :) There are lots of users that
care about performance.

>> - What if an application generates packets based on a hwrng device? This
>> will always produce different packets.
> just like hailiang said.
>> - Not sure SLIRP is a perfect match for this task. As has been raised,
>> another method is to decouple the packet comparing from qemu. In this
>> way, lots of open source userspace stacks could be used.
> - We just need some of SLIRP's capabilities (such as IP frag/defrag).
> We have investigated some open source userspace stacks, but did not find
> one better than SLIRP. If you know of one, please tell me.

I think it's ok to use SLIRP, but it has some drawbacks:

- Lacking IPv6 support, which means you would need to implement it (there
are RFCs posted on the list) even if you only want the (de)fragmentation.
- Not used in any production environment AFAIK, so it may be buggy and you
would need to fix the bugs.

So if possible, choosing a proven IP stack may save a lot of attention.

For userspace IP implementations I'm not very familiar, but google gives me
things like uip, lwip and libuinet.

>> - Haven't read the packet-comparing code, but if it needs to keep
>> track of the state of each connection, it could easily be DOSed from the guest.
>
> - We think preventing DOS from the guest is out of our focus; it should be
> the firewall's concern.

Maybe I wasn't clear with the question. I mean, e.g., if you need to track
the state of each connection, is there a limit on the maximum number of
connections accepted?

If yes, what happens when the guest has more connections than this? Switch
to periodic mode?
If no, the guest could exhaust host memory by faking connections in the guest.
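One illustrative way to bound the per-connection state being asked about here: cap the tracking table and let the caller fall back (for example, to periodic mode) when it is full. The structure, the names, and the CONN_MAX policy below are assumptions for illustration, not COLO's actual design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CONN_MAX 4   /* tiny for illustration; a real limit would be much larger */

struct conn_key {
    uint32_t saddr, daddr;   /* IPv4 source/destination address */
    uint16_t sport, dport;   /* TCP source/destination port */
};

struct conn_table {
    struct conn_key keys[CONN_MAX];
    unsigned n;
};

/* Find or create tracking state for a connection.  Returns false when the
 * table is full, at which point the caller could stop comparing and fall
 * back to periodic checkpointing instead of allocating more memory. */
bool conn_track(struct conn_table *t, const struct conn_key *k)
{
    for (unsigned i = 0; i < t->n; i++)
        if (memcmp(&t->keys[i], k, sizeof(*k)) == 0)
            return true;              /* already tracked */
    if (t->n == CONN_MAX)
        return false;                 /* bounded: refuse new state */
    t->keys[t->n++] = *k;
    return true;
}
```

An alternative policy would be to evict the least recently used entry instead of refusing; either way, the memory the guest can pin stays bounded.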

>
>> Thanks
>> .
>>
>


* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-10  9:41   ` Dr. David Alan Gilbert
@ 2015-11-11  3:09     ` Jason Wang
  2015-11-11  9:03       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 63+ messages in thread
From: Jason Wang @ 2015-11-11  3:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Tkid, lizhijian, jan.kiszka, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, luis, stefanha, guijianfeng,
	zhang.zhanghailiang



On 11/10/2015 05:41 PM, Dr. David Alan Gilbert wrote:
> * Jason Wang (jasowang@redhat.com) wrote:
>>
>> On 11/10/2015 01:26 PM, Tkid wrote:
>>> Hi,all
>>>
>>> We are planning to reimplement the colo proxy in userspace (here, in
>>> qemu) to cache and compare network packets. This module is one of the
>>> important components of the COLO project and is still at an early stage,
>>> so any comments and feedback are warmly welcome. Thanks in advance.
>>>
>>> ## Background
>>> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
>>> Service)
>>> project is a high availability solution. Both Primary VM (PVM) and
>>> Secondary VM
>>> (SVM) run in parallel. They receive the same request from client, and
>>> generate
>>> responses in parallel too. If the response packets from PVM and SVM are
>>> identical, they are released immediately. Otherwise, a VM checkpoint
>>> (on demand)
>>> is conducted.
>>> Paper:
>>> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
>>> COLO on Xen:
>>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>>> COLO on Qemu/KVM:
>>> http://wiki.qemu.org/Features/COLO
>>>
>>> By the needs of capturing response packets from PVM and SVM and
>>> finding out
>>> whether they are identical, we introduce a new module to qemu
>>> networking called
>>> colo-proxy.
>>>
>>> This document describes the design of the colo-proxy module
>>>
>>> ## Glossary
>>>   PVM - Primary VM, which provides services to clients.
>>>   SVM - Secondary VM, a hot standby and replication of PVM.
>>>   PN - Primary Node, the host which PVM runs on
>>>   SN - Secondary Node, the host which SVM runs on
>>>
>>> ## Our Idea ##
>>>
>>> COLO-Proxy
>>> COLO-Proxy is a part of COLO. It is based on the qemu net filter and is
>>> implemented as a
>>> qemu net filter plugin. Its function is to keep the SVM's connections
>>> consistent with the PVM's, and to compare
>>> the PVM's packets with the SVM's packets; if they differ, it notifies
>>> COLO to do a checkpoint.
>>>
>>> == Workflow ==
>>>
>>>
>>> +--+                                      +--+
>>> |PN|                                      |SN|
>>> +-----------------------+                 +-----------------------+
>>> | +-------------------+ |                 | +-------------------+ |
>>> | |                   | |                 | |                   | |
>>> | |        PVM        | |                 | |        SVM        | |
>>> | |                   | |                 | |                   | |
>>> | +--+-^--------------+ |                 | +-------------^----++ |
>>> |    | |                |                 |               |    |  |
>>> |    | | +------------+ |                 | +-----------+ |    |  |
>>> |    | | |    COLO    | |    (socket)     | |    COLO   | |    |  |
>>> |    | | | CheckPoint +---------------------> CheckPoint| |    |  |
>>> |    | | |            | |      (6)        | |           | |    |  |
>>> |    | | +-----^------+ |                 | +-----------+ |    |  |
>>> |    | |   (5) |        |                 |               |    |  |
>>> |    | |       |        |                 |               |    |  |
>>> | +--v-+--------------+ | Forward(socket) | +-------------+----v+ |
>>> | |COLO Proxy  |      +-------+(1)+--------->seq&ack adjust(2)| | |
>>> | |      +-----+------+ |                 | +-----------------+ | |
>>> | |      | Compare(4) <-------+(3)+---------+     COLO Proxy    | |
>>> | +-------------------+ | Forward(socket) | +-------------------+ |
>>> ++Qemu+-----------------+                 ++Qemu+-----------------+
>>>            | ^
>>>            | |
>>>            | |
>>>   +--------v-+--------+
>>>   |                   |
>>>   |      Client       |
>>>   |                   |
>>>   +-------------------+
>>>
>>>
>>>
>>>
>>> (1) When the PN receives client packets, the PN COLO-Proxy copies and
>>> forwards them to
>>> the SN COLO-Proxy.
>>> (2) The SN COLO-Proxy records the PVM packet's initial seq, adjusts the
>>> client's ack, and sends the adjusted packets to the SVM.
>>> (3) The SN Qemu COLO-Proxy receives the SVM's packets and forwards them
>>> to the PN Qemu COLO-Proxy.
>>> (4) The PN Qemu COLO-Proxy enqueues both the SVM's and the PVM's
>>> packets, then compares the PVM's packet data with the SVM's. If the
>>> packets differ, the compare
>>> module notifies the COLO CheckPoint module to do a checkpoint, then
>>> sends the PVM's packets to the
>>> client and drops the SVM's packets; otherwise, it just sends the PVM's
>>> packets to the client and
>>> drops the SVM's packets.
>>> (5) Notify the COLO-Checkpoint module that a checkpoint is needed.
>>> (6) Do the COLO-Checkpoint.
>>>
>>> ### QEMU space TCP/IP stack (based on SLIRP) ###
>>> We need a QEMU userspace TCP/IP stack to help us analyse packets. After
>>> looking
>>> into QEMU, we found that SLIRP
>>>
>>> http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
>>>
>>> is a good choice for us. SLIRP provides a full TCP/IP stack within
>>> QEMU; it can
>>> help us handle the packets written to/read from the backend (tap)
>>> device, which are
>>> just like link layer (L2) packets.
>>>
>>> ### Packet enqueue and compare ###
>>> Together with QEMU space TCP/IP stack, we enqueue all packets sent by
>>> PVM and
>>> SVM on Primary QEMU, and then compare the packet payload for each
>>> connection.
>>>
>> Hi:
>>
>> Just have the following questions in my mind (some have been raised in
>> the previous rounds of discussion without a conclusion):
>>
>> - What's the plan for the management layer? The setup seems complicated,
>> so we cannot simply depend on the user to do each step. (And for security
>> reasons, qemu is usually run as an unprivileged user.)
> It's certainly easier than the current COLO code that relies on a very
> complex set of bridges, extra network interfaces and kernel modules.
> UMU  (cc'd) have been working on a libvirt set that starts COLO up, although
> one bit that's very messy is the current kernel-based network comparison
> code.

Ok.

>> - What's the plan for vhost? Userspace networking in qemu is rather slow,
>> so most users will choose vhost.
>> - What if an application generates packets based on the hwrng device?
>> This will always produce different packets.
> Yes, there are cases where this happens. COLO's worst case is similar to simple
> checkpointing (because it has a limit on the smallest checkpoint), but its
> best case is much better: on a compute-heavy load, it ends up taking
> a checkpoint very rarely.
> Actually the big problem is where randomness occurs in unexpected places,
> e.g. where things like Perl's hash randomisation mean that the two
> hosts produce the same data in different orders. 

I'm not familiar with this, but unlike the hwrng case, if the random data
is computed by software, then after a synchronization it may still
produce the same results for a while.

>
>> - Not sure SLIRP is perfectly matched for this task. As has been raised, 
>> another method is to decouple the packet comparing from qemu. In this
>> way, lots of open-source userspace stacks could be used.
>> - I haven't read the code of the packet comparing, but if it needs to
>> keep track of the state of each connection, it could easily be DoSed
>> from the guest.
> The guest can only break its own networking; so shooting itself in the foot
> is no big deal.
>
> Dave

The question is about the packet comparing: if the number of connections
in the guest exceeds the maximum it can track, what will it do? 

>
>> Thanks
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>


* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-11  1:23   ` Dong, Eddie
@ 2015-11-11  3:26     ` Jason Wang
  0 siblings, 0 replies; 63+ messages in thread
From: Jason Wang @ 2015-11-11  3:26 UTC (permalink / raw)
  To: Dong, Eddie, Tkid, qemu-devel, stefanha
  Cc: zhang.zhanghailiang, lizhijian, jan.kiszka, peter.huangpeng,
	dgilbert, arei.gonglei, guijianfeng



On 11/11/2015 09:23 AM, Dong, Eddie wrote:
>> - What's the plan for vhost? Userspace network in qemu is rather slow, most
>> user will choose vhost.
> [Dong, Eddie] Hi Jason:
> 	How about we take staging approach? In general, COLO opens a door of high performance HA solution, but it will take very long time to make everything perfect. As for the network virtualization, I think we may start from usage with moderate network bandwidth, like 1Gbps. Otherwise, the performance of COLO may be not that good (of course, like David mentioned, the worst case is same with periodic checkpoint). At the moment, how about we start from in Qemu virtio network, and enhance for vhost case in future? 

Yes, of course and it makes sense.

Mentioning vhost here is to avoid re-inventing or abandoning some existing
infrastructure. For example, thinking about how netfilter can work with
vhost from the beginning does no harm ...

>
> 	The good thing is that we get more people working on the patch series, and glad to see UMU also joined the effort.  Thanks and welcome...
>
> Thx Eddie

Thanks


* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-11  3:09     ` Jason Wang
@ 2015-11-11  9:03       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 63+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-11  9:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: Tkid, lizhijian, jan.kiszka, eddie.dong, peter.huangpeng,
	qemu-devel, arei.gonglei, luis, stefanha, guijianfeng,
	zhang.zhanghailiang

* Jason Wang (jasowang@redhat.com) wrote:
> 
> 
> On 11/10/2015 05:41 PM, Dr. David Alan Gilbert wrote:
> > * Jason Wang (jasowang@redhat.com) wrote:
> >>
> >> On 11/10/2015 01:26 PM, Tkid wrote:
> >>> Hi,all
> >>>
> >>> We are planning to reimplement colo proxy in userspace (Here is in
> >>> qemu) to
> >>> cache and compare net packets.This module is one of the important
> >>> components
> >>> of COLO project and now it is still in early stage, so any comments and
> >>> feedback are warmly welcomed,thanks in advance.
> >>>
> >>> ## Background
> >>> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
> >>> Service)
> >>> project is a high availability solution. Both Primary VM (PVM) and
> >>> Secondary VM
> >>> (SVM) run in parallel. They receive the same request from client, and
> >>> generate
> >>> responses in parallel too. If the response packets from PVM and SVM are
> >>> identical, they are released immediately. Otherwise, a VM checkpoint
> >>> (on demand)
> >>> is conducted.
> >>> Paper:
> >>> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
> >>> COLO on Xen:
> >>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> >>> COLO on Qemu/KVM:
> >>> http://wiki.qemu.org/Features/COLO
> >>>
> >>> By the needs of capturing response packets from PVM and SVM and
> >>> finding out
> >>> whether they are identical, we introduce a new module to qemu
> >>> networking called
> >>> colo-proxy.
> >>>
> >>> This document describes the design of the colo-proxy module
> >>>
> >>> ## Glossary
> >>>   PVM - Primary VM, which provides services to clients.
> >>>   SVM - Secondary VM, a hot standby and replication of PVM.
> >>>   PN - Primary Node, the host which PVM runs on
> >>>   SN - Secondary Node, the host which SVM runs on
> >>>
> >>> ## Our Idea ##
> >>>
> >>> COLO-Proxy
> >>> COLO-Proxy is a part of COLO,based on qemu net filter and it's a
> >>> plugin for
> >>> qemu net filter.the function keep SVM connect normal to PVM and compare
> >>> PVM's packets to SVM's packets.if difference,notify COLO do checkpoint.
> >>>
> >>> == Workflow ==
> >>>
> >>>
> >>> +--+                                      +--+
> >>> |PN|                                      |SN|
> >>> +-----------------------+                 +-----------------------+
> >>> | +-------------------+ |                 | +-------------------+ |
> >>> | |                   | |                 | |                   | |
> >>> | |        PVM        | |                 | |        SVM        | |
> >>> | |                   | |                 | |                   | |
> >>> | +--+-^--------------+ |                 | +-------------^----++ |
> >>> |    | |                |                 |               |    |  |
> >>> |    | | +------------+ |                 | +-----------+ |    |  |
> >>> |    | | |    COLO    | |    (socket)     | |    COLO   | |    |  |
> >>> |    | | | CheckPoint +---------------------> CheckPoint| |    |  |
> >>> |    | | |            | |      (6)        | |           | |    |  |
> >>> |    | | +-----^------+ |                 | +-----------+ |    |  |
> >>> |    | |   (5) |        |                 |               |    |  |
> >>> |    | |       |        |                 |               |    |  |
> >>> | +--v-+--------------+ | Forward(socket) | +-------------+----v+ |
> >>> | |COLO Proxy  |      +-------+(1)+--------->seq&ack adjust(2)| | |
> >>> | |      +-----+------+ |                 | +-----------------+ | |
> >>> | |      | Compare(4) <-------+(3)+---------+     COLO Proxy    | |
> >>> | +-------------------+ | Forward(socket) | +-------------------+ |
> >>> ++Qemu+-----------------+                 ++Qemu+-----------------+
> >>>            | ^
> >>>            | |
> >>>            | |
> >>>   +--------v-+--------+
> >>>   |                   |
> >>>   |      Client       |
> >>>   |                   |
> >>>   +-------------------+
> >>>
> >>>
> >>>
> >>>
> >>> (1)When PN receive client packets,PN COLO-Proxy copy and forward
> >>> packets to
> >>> SN COLO-Proxy.
> >>> (2)SN COLO-Proxy record PVM's packet inital seq & adjust client's ack,send
> >>> adjusted packets to SVM
> >>> (3)SN Qemu COLO-Proxy recieve SVM's packets and forward to PN Qemu
> >>> COLO-Proxy.
> >>> (4)PN Qemu COLO-Proxy enqueue SVM's packets and enqueue PVM's packets,then
> >>> compare PVM's packets data with SVM's packets data. If packets is
> >>> different, compare
> >>> module notify COLO CheckPoint module to do a checkpoint then send
> >>> PVM's packets to
> >>> client and drop SVM's packets, otherwise, just send PVM's packets to
> >>> client and
> >>> drop SVM's packets.
> >>> (5)notify COLO-Checkpoint module checkpoint is needed
> >>> (6)Do COLO-Checkpoint
> >>>
> >>> ### QEMU space TCP/IP stack(Based on SLIRP) ###
> >>> We need a QEMU space TCP/IP stack to help us to analysis packet. After
> >>> looking
> >>> into QEMU, we found that SLIRP
> >>>
> >>> http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
> >>>
> >>> is a good choice for us. SLIRP proivdes a full TCP/IP stack within
> >>> QEMU, it can
> >>> help use to handle the packet written to/read from backend(tap) device
> >>> which is
> >>> just like a link layer(L2) packet.
> >>>
> >>> ### Packet enqueue and compare ###
> >>> Together with QEMU space TCP/IP stack, we enqueue all packets sent by
> >>> PVM and
> >>> SVM on Primary QEMU, and then compare the packet payload for each
> >>> connection.
> >>>
> >> Hi:
> >>
> >> Just have the following questions in my mind (some has been raised in
> >> the previous rounds of discussion without a conclusion):
> >>
> >> - What's the plan for management layer? The setup seems complicated so
> >> we could not simply depend on user to do each step. (And for security
> >> reason, qemu was usually run as unprivileged user)
> > It's certainly easier than the current COLO code that relies on a very
> > complex set of bridges, extra network interfaces and kernel modules.
> > UMU  (cc'd) have been working on a libvirt set that starts COLO up, although
> > one bit that's very messy is the curretn kernel based network comparison
> > code.
> 
> Ok.
> 
> >> - What's the plan for vhost? Userspace network in qemu is rather slow,
> >> most user will choose vhost.
> >> - What if application generate packet based on hwrng device? This will
> >> produce always different packets.
> > Yes, there are cases this happens - COLO's worst case is similar to simple
> > checkpointing (because it has a limit to the smallest checkpoint), but it's
> > best case is much better, on a compute heavy load, it ends up taking
> > a checkpoint very rarely.
> > Actually the big problem is where randomness occurs in unexpected places,
> > e.g. where things like Perl's hash randomisation means that the two
> > hosts produce the same data in different orders. 
> 
> Not familiar with this, but unlike the hwrng, if the random data was
> computed by software, after a synchronization, it still has the
> possibility to produce the same result for a while.

The hwrng, variation in the TSC, or anything else that feeds the entropy
pool can cause the divergence.

> >> - Not sure SLIRP is perfect matched for this task. As has been raised, 
> >> another method is to decouple the packet comparing from qemu. In this
> >> way, lots of open source userspace stack could be used.
> >> - Haven't read the code of packet comparing, but if it needs to keep
> >> track the state of each connection, it could be easily DOS from guest.
> > The guest can only break it's own networking; so shooting itself in the foot
> > is no big deal.
> >
> > Dave
> 
> The question is for the packet comparing, if the number of connections
> in guest exceed the maximum connections it could track, what will it do? 

The choice in the slow case is to just force simple checkpointing mode.
However, I don't think this case is any different from the kernel's stateful
connection tracking used in iptables for firewalling.
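[Editor's note] The fallback described above can be sketched as a bounded
connection table that gives up per-connection comparison when it fills. This
is a hypothetical illustration, not actual COLO or colo-proxy code; the class
and field names are invented for the sketch:

```python
# Hypothetical sketch: a bounded connection table for the packet-comparing
# module. When the guest opens more connections than the table can track,
# the proxy stops comparing and falls back to simple checkpointing mode,
# so a guest cannot exhaust host memory by faking connections.

class ConnTracker:
    def __init__(self, max_conns=1024):
        self.max_conns = max_conns      # hard cap; bounds host memory use
        self.conns = {}                 # (src, dst, sport, dport) -> queues
        self.fallback = False           # True => periodic checkpoint mode

    def track(self, flow_key):
        """Return the per-flow queues, or None once forced into fallback."""
        if self.fallback:
            return None
        if flow_key not in self.conns:
            if len(self.conns) >= self.max_conns:
                # Table full: abandon per-connection comparison entirely.
                self.fallback = True
                self.conns.clear()
                return None
            self.conns[flow_key] = {'pvm': [], 'svm': []}
        return self.conns[flow_key]


tracker = ConnTracker(max_conns=2)
assert tracker.track(('10.0.0.1', '10.0.0.2', 1234, 80)) is not None
assert tracker.track(('10.0.0.1', '10.0.0.2', 1235, 80)) is not None
# A third connection exceeds the cap: switch to periodic checkpoint mode.
assert tracker.track(('10.0.0.1', '10.0.0.2', 1236, 80)) is None
```

As Dave notes, this mirrors how nf_conntrack in iptables behaves: the table
has a fixed maximum, and exceeding it triggers a degraded mode rather than
unbounded memory growth.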

Dave

> 
> >
> >> Thanks
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [POC]colo-proxy in qemu
  2015-11-11  2:46   ` Zhang Chen
@ 2015-11-13 12:33     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 63+ messages in thread
From: Dr. David Alan Gilbert @ 2015-11-13 12:33 UTC (permalink / raw)
  To: Zhang Chen
  Cc: zhang.zhanghailiang, lizhijian, jan.kiszka, jasowang, eddie.dong,
	qemu-devel, peter.huangpeng, arei.gonglei, stefanha, guijianfeng

* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote:
> 
> 
> On 11/10/2015 06:54 PM, Dr. David Alan Gilbert wrote:
> >* Tkid (zhangchen.fnst@cn.fujitsu.com) wrote:


> >>(3)SN Qemu COLO-Proxy recieve SVM's packets and forward to PN Qemu
> >>COLO-Proxy.
> >What protocol are you using for the data carried over the Forward(socket)?
> >I'm just wondering if there's an existing layer2 tunneling protocol that
> >it would be best to use.
> Currently, we use a raw TCP socket: we send a packet's length, then send
> the packet.
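[Editor's note] The length-prefixed framing Zhang Chen describes can be
sketched as follows. This is an illustration only, not the actual colo-proxy
wire format; in particular the 4-byte big-endian length field is an assumption:

```python
# Illustrative sketch of length-prefixed framing over a raw TCP socket:
# each forwarded L2 packet is preceded by its length, so the receiver
# knows where one packet ends and the next begins.

import struct

def frame(packet: bytes) -> bytes:
    """Prefix a raw packet with its 4-byte big-endian length (assumed width)."""
    return struct.pack('!I', len(packet)) + packet

def unframe(stream: bytes):
    """Split a byte stream of framed packets back into individual packets."""
    packets, off = [], 0
    while off + 4 <= len(stream):
        (length,) = struct.unpack_from('!I', stream, off)
        off += 4
        packets.append(stream[off:off + length])
        off += length
    return packets

# Two packets sent back-to-back on the same TCP stream stay separable.
wire = frame(b'\x00\x01packet-one') + frame(b'packet-two')
assert unframe(wire) == [b'\x00\x01packet-one', b'packet-two']
```

TCP itself provides no message boundaries, which is why some framing like
this is needed for forwarding discrete L2 packets over a stream socket.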

Yes OK (I think there's a qemu -net socket option that does something similar).

Dave

> >
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-31  1:28                           ` zhanghailiang
@ 2015-07-31  1:31                             ` Yang Hongyang
  0 siblings, 0 replies; 63+ messages in thread
From: Yang Hongyang @ 2015-07-31  1:31 UTC (permalink / raw)
  To: zhanghailiang, Dr. David Alan Gilbert
  Cc: Li Zhijian, jan.kiszka, Jason Wang, Dong, Eddie, peter.huangpeng,
	qemu-devel, Gonglei, stefanha

On 07/31/2015 09:28 AM, zhanghailiang wrote:
> On 2015/7/31 9:08, Yang Hongyang wrote:
>>
>>
>> On 07/31/2015 01:53 AM, Dr. David Alan Gilbert wrote:
>>> * Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
>>>>
>>>>
>>>> On 07/30/2015 09:59 PM, Dr. David Alan Gilbert wrote:
>>>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>>>> On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:
>>>>>>> * Gonglei (arei.gonglei@huawei.com) wrote:
>>>>>>>> On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
>>>>>>>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>>>> * Dong, Eddie (eddie.dong@intel.com) wrote:
>>>>>>>>>>>>>> A question here, the packet comparing may be very tricky. For
>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>> some protocol use random data to generate unpredictable id or
>>>>>>>>>>>>>> something else. One example is ipv6_select_ident() in Linux. So COLO
>>>>>>>>>>>>>> needs a mechanism to make sure PVM and SVM can generate same random
>>>>>>>>>>>>> data?
>>>>>>>>>>>>> Good question, the random data connection is a big problem for
>>>>>>>>>>>>> COLO. At
>>>>>>>>>>>>> present, it will trigger checkpoint processing because of the
>>>>>>>>>>>>> different random
>>>>>>>>>>>>> data.
>>>>>>>>>>>>> I don't think any mechanisms can assure two different machines
>>>>>>>>>>>>> generate the
>>>>>>>>>>>>> same random data. If you have any ideas, pls tell us :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Frequent checkpoint can handle this scenario, but maybe will cause the
>>>>>>>>>>>>> performance poor. :(
>>>>>>>>>>>>>
>>>>>>>>>>>> The assumption is that, after VM checkpoint, SVM and PVM have
>>>>>>>>>>>> identical internal state, so the pattern used to generate random
>>>>>>>>>>>> data has high possibility to generate identical data at short time,
>>>>>>>>>>>> at least...
>>>>>>>>>>> They do diverge pretty quickly though; I have simple examples which
>>>>>>>>>>> reliably cause a checkpoint because of simple randomness in
>>>>>>>>>>> applications.
>>>>>>>>>>>
>>>>>>>>>>> Dave
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And it will become even worse if hwrng is used in guest.
>>>>>>>>>
>>>>>>>>> Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
>>>>>>>>> once established, tends to work well without triggering checkpoints;
>>>>>>>>> and static web pages also work well.  Examples of things that do cause
>>>>>>>>> more checkpoints are, displaying guest statistics (e.g. running top
>>>>>>>>> in that ssh) which is timing dependent, and dynamically generated
>>>>>>>>> web pages that include a unique ID (bugzilla's password reset link in
>>>>>>>>> it's front page was a fun one), I think also establishing
>>>>>>>>> new encrypted connections cause the same randomness.
>>>>>>>>>
>>>>>>>>> However, it's worth remembering that COLO is trying to reduce the
>>>>>>>>> number of checkpoints compared to a simple checkpointing world
>>>>>>>>> which would be aiming to do a checkpoint ~100 times a second,
>>>>>>>>> and for compute bound workloads, or ones that don't expose
>>>>>>>>> the randomness that much, it can get checkpoints of a few seconds
>>>>>>>>> in length which greatly reduces the overhead.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes. That's the truth.
>>>>>>>> We can set two different modes for different scenarios. Maybe Named
>>>>>>>> 1) frequent checkpoint mode for multi-connections and randomness scenarios
>>>>>>>> and 2) non-frequent checkpoint mode for other scenarios.
>>>>>>>>
>>>>>>>> But that's the next plan, we are thinking about that.
>>>>>>>
>>>>>>> I have some code that tries to automatically switch between those;
>>>>>>> it measures the checkpoint lengths, and if they're consistently short
>>>>>>> it sends a different message byte to the secondary at the start of the
>>>>>>> checkpoint, so that it doesn't bother running.   Every so often it
>>>>>>> then flips back to a COLO checkpoint to see if the checkpoints
>>>>>>> are still really fast.
>>>>>>>
>>>>>>
>>>>>> Do you mean if there are consistent checkpoint requests, not do checkpoint
>>>>>> but just send a special message to SVM?
>>>>>> Resume to common COLO mode until the checkpoint lengths is so not short ?
>>>>>
>>>>>    We still have to do checkpoints, but we send a special message to the
>>>>> SVM so that
>>>>> the SVM just takes the checkpoint but does not run.
>>>>>
>>>>>    I'll send the code after I've updated it to your current version; but it's
>>>>> quite rough/experimental.
>>>>>
>>>>> It works something like
>>>>>
>>>>>   -----------run PVM     run SVM
>>>>>       COLO     <long gap>
>>>>>       mode       miscompare
>>>>>                  checkpoint
>>>>>   -----------run PVM     run SVM
>>>>>       COLO     <short gap>
>>>>>       mode       miscompare
>>>>>                  checkpoint
>>>>>   -----------run PVM     run SVM
>>>>>       COLO     <short gap>
>>>>>       mode       miscompare         < After a few short runs
>>>>>                  checkpoint
>>>>>   -----------run PVM     SVM idle   \
>>>>>     Passive    <fixed delay>        |  - repeat 'n' times
>>>>>       mode       checkpoint         /
>>>>>   -----------run PVM     run SVM
>>>>>       COLO     <short gap>          < Still a short gap
>>>>>       mode       miscompare
>>>>>   -----------run PVM     SVM idle   \
>>>>>     Passive    <fixed delay>        |  - repeat 'n' times
>>>>>       mode       checkpoint         /
>>>>>   -----------run PVM     run SVM
>>>>>       COLO     <long gap>          < long gap now, stay in COLO
>>>>>       mode       miscompare
>>>>>                  checkpoint
>>>>>   -----------run PVM     run SVM
>>>>>       COLO     <long gap>
>>>>>       mode       miscompare
>>>>>                  checkpoint
>>>>>
>>>>> So it saves the CPU time on the SVM, and the comparison traffic, and is
>>>>> automatic at switching into the passive mode.
>>>>>
>>>>> It used to be more useful, but your minimum COLO run time that you
>>>>> added a few versions ago helps a lot in the cases where there are miscompares,
>>>>> and the delay after the miscompare before you take the checkpoint also helps
>>>>> in the case where the data is very random.
>>>>
>>>> This is great! This is exactly what we were thinking about, when random
>>>> scenario will fallback to MC/Remus like FT. Thank you very much!
>>>> I have a question, do you also modify colo-proxy kernel module? because
>>>> in the fixed checkpoint mode, I think we need to buffer the network
>>>> packets, and release them at checkpoint.
>>>
>>> Yes, we do need to buffer and release them at the end, but I've not modified
>>> colo-proxy so far.  Doesn't the current code on PMY already need to buffer
>>> packets
>>> that are generated after the first miscompare
>>
>> Yes, they are buffered,
>>
>>> and before the checkpoint and
>>> then release them at the checkpoint?
>>
>> but will be release only if the packets compare returns identical. so in order
>> to support this fallback mode, we need to modify it to release the packets at
>> the checkpoint, there won't be too much code though.
>>
>
> No, when we do a checkpoint, we send all the residual queued packets.
> So it is already supported.

Great, my memory is wrong, sorry...

>
>>>
>>> Dave
>>>
>>>>
>>>>>
>>>>> Dave
>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> -Gonglei
>>>>>>>>
>>>>>>> --
>>>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>>>>
>>>>>>> .
>>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>> .
>>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> Yang.
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> .
>>>
>>
>
>
> .
>

-- 
Thanks,
Yang.


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-31  1:08                         ` Yang Hongyang
@ 2015-07-31  1:28                           ` zhanghailiang
  2015-07-31  1:31                             ` Yang Hongyang
  0 siblings, 1 reply; 63+ messages in thread
From: zhanghailiang @ 2015-07-31  1:28 UTC (permalink / raw)
  To: Yang Hongyang, Dr. David Alan Gilbert
  Cc: Li Zhijian, jan.kiszka, Jason Wang, Dong, Eddie, peter.huangpeng,
	qemu-devel, Gonglei, stefanha

On 2015/7/31 9:08, Yang Hongyang wrote:
>
>
> On 07/31/2015 01:53 AM, Dr. David Alan Gilbert wrote:
>> * Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
>>>
>>>
>>> On 07/30/2015 09:59 PM, Dr. David Alan Gilbert wrote:
>>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>>> On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:
>>>>>> * Gonglei (arei.gonglei@huawei.com) wrote:
>>>>>>> On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
>>>>>>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>>> * Dong, Eddie (eddie.dong@intel.com) wrote:
>>>>>>>>>>>>> A question here, the packet comparing may be very tricky. For example,
>>>>>>>>>>>>> some protocol use random data to generate unpredictable id or
>>>>>>>>>>>>> something else. One example is ipv6_select_ident() in Linux. So COLO
>>>>>>>>>>>>> needs a mechanism to make sure PVM and SVM can generate same random
>>>>>>>>>>>> data?
>>>>>>>>>>>> Good question, the random data connection is a big problem for COLO. At
>>>>>>>>>>>> present, it will trigger checkpoint processing because of the different random
>>>>>>>>>>>> data.
>>>>>>>>>>>> I don't think any mechanisms can assure two different machines generate the
>>>>>>>>>>>> same random data. If you have any ideas, pls tell us :)
>>>>>>>>>>>>
>>>>>>>>>>>> Frequent checkpoint can handle this scenario, but maybe will cause the
>>>>>>>>>>>> performance poor. :(
>>>>>>>>>>>>
>>>>>>>>>>> The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...
>>>>>>>>>> They do diverge pretty quickly though; I have simple examples which
>>>>>>>>>> reliably cause a checkpoint because of simple randomness in applications.
>>>>>>>>>>
>>>>>>>>>> Dave
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And it will become even worse if hwrng is used in guest.
>>>>>>>>
>>>>>>>> Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
>>>>>>>> once established, tends to work well without triggering checkpoints;
>>>>>>>> and static web pages also work well.  Examples of things that do cause
>>>>>>>> more checkpoints are, displaying guest statistics (e.g. running top
>>>>>>>> in that ssh) which is timing dependent, and dynamically generated
>>>>>>>> web pages that include a unique ID (bugzilla's password reset link in
>>>>>>>> it's front page was a fun one), I think also establishing
>>>>>>>> new encrypted connections cause the same randomness.
>>>>>>>>
>>>>>>>> However, it's worth remembering that COLO is trying to reduce the
>>>>>>>> number of checkpoints compared to a simple checkpointing world
>>>>>>>> which would be aiming to do a checkpoint ~100 times a second,
>>>>>>>> and for compute bound workloads, or ones that don't expose
>>>>>>>> the randomness that much, it can get checkpoints of a few seconds
>>>>>>>> in length which greatly reduces the overhead.
>>>>>>>>
>>>>>>>
>>>>>>> Yes. That's the truth.
>>>>>>> We can set two different modes for different scenarios. Maybe Named
>>>>>>> 1) frequent checkpoint mode for multi-connections and randomness scenarios
>>>>>>> and 2) non-frequent checkpoint mode for other scenarios.
>>>>>>>
>>>>>>> But that's the next plan, we are thinking about that.
>>>>>>
>>>>>> I have some code that tries to automatically switch between those;
>>>>>> it measures the checkpoint lengths, and if they're consistently short
>>>>>> it sends a different message byte to the secondary at the start of the
>>>>>> checkpoint, so that it doesn't bother running.   Every so often it
>>>>>> then flips back to a COLO checkpoint to see if the checkpoints
>>>>>> are still really fast.
>>>>>>
>>>>>
>>>>> Do you mean if there are consistent checkpoint requests, not do checkpoint but just send a special message to SVM?
>>>>> Resume to common COLO mode until the checkpoint lengths is so not short ?
>>>>
>>>>    We still have to do checkpoints, but we send a special message to the SVM so that
>>>> the SVM just takes the checkpoint but does not run.
>>>>
>>>>    I'll send the code after I've updated it to your current version; but it's
>>>> quite rough/experimental.
>>>>
>>>> It works something like
>>>>
>>>>   -----------run PVM     run SVM
>>>>       COLO     <long gap>
>>>>       mode       miscompare
>>>>                  checkpoint
>>>>   -----------run PVM     run SVM
>>>>       COLO     <short gap>
>>>>       mode       miscompare
>>>>                  checkpoint
>>>>   -----------run PVM     run SVM
>>>>       COLO     <short gap>
>>>>       mode       miscompare         < After a few short runs
>>>>                  checkpoint
>>>>   -----------run PVM     SVM idle   \
>>>>     Passive    <fixed delay>        |  - repeat 'n' times
>>>>       mode       checkpoint         /
>>>>   -----------run PVM     run SVM
>>>>       COLO     <short gap>          < Still a short gap
>>>>       mode       miscompare
>>>>   -----------run PVM     SVM idle   \
>>>>     Passive    <fixed delay>        |  - repeat 'n' times
>>>>       mode       checkpoint         /
>>>>   -----------run PVM     run SVM
>>>>       COLO     <long gap>          < long gap now, stay in COLO
>>>>       mode       miscompare
>>>>                  checkpoint
>>>>   -----------run PVM     run SVM
>>>>       COLO     <long gap>
>>>>       mode       miscompare
>>>>                  checkpoint
>>>>
>>>> So it saves the CPU time on the SVM, and the comparison traffic, and is
>>>> automatic at switching into the passive mode.
>>>>
>>>> It used to be more useful, but your minimum COLO run time that you
>>>> added a few versions ago helps a lot in the cases where there are miscompares,
>>>> and the delay after the miscompare before you take the checkpoint also helps
>>>> in the case where the data is very random.
>>>
>>> This is great! This is exactly what we were thinking about, when random
>>> scenario will fallback to MC/Remus like FT. Thank you very much!
>>> I have a question, do you also modify colo-proxy kernel module? because
>>> in the fixed checkpoint mode, I think we need to buffer the network
>>> packets, and release them at checkpoint.
>>
>> Yes, we do need to buffer and release them at the end, but I've not modified
>> colo-proxy so far.  Doesn't the current code on PMY already need to buffer packets
>> that are generated after the first miscompare
>
> Yes, they are buffered,
>
>> and before the checkpoint and
>> then release them at the checkpoint?
>
> but will be release only if the packets compare returns identical. so in order
> to support this fallback mode, we need to modify it to release the packets at
> the checkpoint, there won't be too much code though.
>

No, when we do a checkpoint, we send all of the residual queued packets.
So this is already supported.
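For reference, this behaviour can be sketched roughly as follows. This is a minimal illustration with hypothetical names and structure, not the actual colo-proxy code: the primary side keeps a queue of outbound packets, and a checkpoint releases every residual queued packet unconditionally, with no further comparison.

```python
from collections import deque

class PacketQueue:
    """Toy model of the primary-side packet queue (hypothetical)."""

    def __init__(self):
        self.queued = deque()   # packets awaiting comparison
        self.sent = []          # packets released to the client

    def enqueue(self, pkt):
        self.queued.append(pkt)

    def flush_on_checkpoint(self):
        """At checkpoint time, release all residual queued packets."""
        while self.queued:
            self.sent.append(self.queued.popleft())
        return len(self.sent)

q = PacketQueue()
q.enqueue(b"pkt-a")
q.enqueue(b"pkt-b")
print(q.flush_on_checkpoint())  # -> 2
```

The key point is only that the flush is unconditional: nothing stays queued across a checkpoint boundary.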

>>
>> Dave
>>
>>>
>>>>
>>>> Dave
>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>>> Dave
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> -Gonglei
>>>>>>>
>>>>>> --
>>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>> --
>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>> .
>>>>
>>>
>>> --
>>> Thanks,
>>> Yang.
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>> .
>>
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30 17:53                       ` Dr. David Alan Gilbert
  2015-07-31  1:08                         ` Yang Hongyang
@ 2015-07-31  1:26                         ` zhanghailiang
  1 sibling, 0 replies; 63+ messages in thread
From: zhanghailiang @ 2015-07-31  1:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Yang Hongyang
  Cc: Li Zhijian, jan.kiszka, Jason Wang, Dong, Eddie, peter.huangpeng,
	qemu-devel, Gonglei, stefanha

On 2015/7/31 1:53, Dr. David Alan Gilbert wrote:
> * Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
>>
>>
>> On 07/30/2015 09:59 PM, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:
>>>>> * Gonglei (arei.gonglei@huawei.com) wrote:
>>>>>> On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
>>>>>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>> * Dong, Eddie (eddie.dong@intel.com) wrote:
>>>>>>>>>>>> A question here, the packet comparing may be very tricky. For example,
>>>>>>>>>>>> some protocol use random data to generate unpredictable id or
>>>>>>>>>>>> something else. One example is ipv6_select_ident() in Linux. So COLO
>>>>>>>>>>>> needs a mechanism to make sure PVM and SVM can generate same random
>>>>>>>>>>> data?
>>>>>>>>>>> Good question, the random data connection is a big problem for COLO. At
>>>>>>>>>>> present, it will trigger checkpoint processing because of the different random
>>>>>>>>>>> data.
>>>>>>>>>>> I don't think any mechanisms can assure two different machines generate the
>>>>>>>>>>> same random data. If you have any ideas, pls tell us :)
>>>>>>>>>>>
>>>>>>>>>>> Frequent checkpoint can handle this scenario, but maybe will cause the
>>>>>>>>>>> performance poor. :(
>>>>>>>>>>>
>>>>>>>>>> The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...
>>>>>>>>> They do diverge pretty quickly though; I have simple examples which
>>>>>>>>> reliably cause a checkpoint because of simple randomness in applications.
>>>>>>>>>
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>
>>>>>>>> And it will become even worse if hwrng is used in guest.
>>>>>>>
>>>>>>> Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
>>>>>>> once established, tends to work well without triggering checkpoints;
>>>>>>> and static web pages also work well.  Examples of things that do cause
>>>>>>> more checkpoints are, displaying guest statistics (e.g. running top
>>>>>>> in that ssh) which is timing dependent, and dynamically generated
>>>>>>> web pages that include a unique ID (bugzilla's password reset link in
>>>>>>> it's front page was a fun one), I think also establishing
>>>>>>> new encrypted connections cause the same randomness.
>>>>>>>
>>>>>>> However, it's worth remembering that COLO is trying to reduce the
>>>>>>> number of checkpoints compared to a simple checkpointing world
>>>>>>> which would be aiming to do a checkpoint ~100 times a second,
>>>>>>> and for compute bound workloads, or ones that don't expose
>>>>>>> the randomness that much, it can get checkpoints of a few seconds
>>>>>>> in length which greatly reduces the overhead.
>>>>>>>
>>>>>>
>>>>>> Yes. That's the truth.
>>>>>> We can set two different modes for different scenarios. Maybe Named
>>>>>> 1) frequent checkpoint mode for multi-connections and randomness scenarios
>>>>>> and 2) non-frequent checkpoint mode for other scenarios.
>>>>>>
>>>>>> But that's the next plan, we are thinking about that.
>>>>>
>>>>> I have some code that tries to automatically switch between those;
>>>>> it measures the checkpoint lengths, and if they're consistently short
>>>>> it sends a different message byte to the secondary at the start of the
>>>>> checkpoint, so that it doesn't bother running.   Every so often it
>>>>> then flips back to a COLO checkpoint to see if the checkpoints
>>>>> are still really fast.
>>>>>
>>>>
>>>> Do you mean that if there are consecutive checkpoint requests, we do not checkpoint but just send a special message to the SVM?
>>>> And resume to normal COLO mode once the checkpoint lengths are not so short?
>>>
>>>    We still have to do checkpoints, but we send a special message to the SVM so that
>>> the SVM just takes the checkpoint but does not run.
>>>
>>>    I'll send the code after I've updated it to your current version; but it's
>>> quite rough/experimental.
>>>

Yes, please do; we can merge it into our branch. ;)

>>> It works something like
>>>
>>>   -----------run PVM     run SVM
>>>       COLO     <long gap>
>>>       mode       miscompare
>>>                  checkpoint
>>>   -----------run PVM     run SVM
>>>       COLO     <short gap>
>>>       mode       miscompare
>>>                  checkpoint
>>>   -----------run PVM     run SVM
>>>       COLO     <short gap>
>>>       mode       miscompare         < After a few short runs
>>>                  checkpoint
>>>   -----------run PVM     SVM idle   \
>>>     Passive    <fixed delay>        |  - repeat 'n' times
>>>       mode       checkpoint         /
>>>   -----------run PVM     run SVM
>>>       COLO     <short gap>          < Still a short gap
>>>       mode       miscompare
>>>   -----------run PVM     SVM idle   \
>>>     Passive    <fixed delay>        |  - repeat 'n' times
>>>       mode       checkpoint         /
>>>   -----------run PVM     run SVM
>>>       COLO     <long gap>          < long gap now, stay in COLO
>>>       mode       miscompare
>>>                  checkpoint
>>>   -----------run PVM     run SVM
>>>       COLO     <long gap>
>>>       mode       miscompare
>>>                  checkpoint
>>>
>>> So it saves the CPU time on the SVM, and the comparison traffic, and is
>>> automatic at switching into the passive mode.
>>>

That's a good solution. Actually, we have a plan to implement a checkpoint strategy that can
automatically adapt to different situations, covering periodic checkpoint (MC/Remus mode),
COLO mode, and a mixed mode (just like your method above). It would also be a good idea to provide a command for users to
choose the mode~
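The switching heuristic in Dave's diagram above can be sketched like this. The thresholds, names, and the fixed passive round count are all assumptions for illustration, not values from any patch: track the recent checkpoint gap lengths; once the last few gaps are consistently short, fall back to passive fixed-delay checkpoints for 'n' rounds, then probe with a COLO round to see whether the gaps have lengthened.

```python
from collections import deque

SHORT_GAP = 0.1      # seconds; gaps below this count as "short" (assumed)
WINDOW = 3           # consecutive short gaps before falling back (assumed)
PASSIVE_ROUNDS = 5   # 'n' fixed-delay checkpoints before probing again (assumed)

class CheckpointPolicy:
    def __init__(self):
        self.recent = deque(maxlen=WINDOW)
        self.mode = "colo"
        self.passive_left = 0

    def record_gap(self, gap):
        """Called after each COLO checkpoint with the measured run length."""
        self.recent.append(gap)
        if len(self.recent) == WINDOW and all(g < SHORT_GAP for g in self.recent):
            # Consistently short gaps: stop running the SVM, checkpoint passively.
            self.mode = "passive"
            self.passive_left = PASSIVE_ROUNDS
            self.recent.clear()

    def tick_passive(self):
        """Called after each fixed-delay passive checkpoint."""
        self.passive_left -= 1
        if self.passive_left == 0:
            self.mode = "colo"   # probe with a COLO round again

p = CheckpointPolicy()
for gap in (2.0, 0.05, 0.04, 0.05):   # one long gap, then three short ones
    p.record_gap(gap)
print(p.mode)  # -> passive
```

After the 'n' passive rounds, the policy flips back to COLO; if the probe round again ends in a quick miscompare, the short-gap window refills and it drops back to passive mode.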

>>> It used to be more useful, but your minimum COLO run time that you
>>> added a few versions ago helps a lot in the cases where there are miscompares,
>>> and the delay after the miscompare before you take the checkpoint also helps
>>> in the case where the data is very random.
>>
>> This is great! This is exactly what we were thinking about, when random
>> scenario will fallback to MC/Remus like FT. Thank you very much!
>> I have a question, do you also modify colo-proxy kernel module? because
>> in the fixed checkpoint mode, I think we need to buffer the network
>> packets, and release them at checkpoint.
>
> Yes, we do need to buffer and release them at the end, but I've not modified
> colo-proxy so far.  Doesn't the current code on PMY already need to buffer packets
> that are generated after the first miscompare and before the checkpoint and
> then release them at the checkpoint?
>

Yes, we already support this. On PMY, the proxy queues all packets sent from the PVM
and waits for the corresponding packets sent from the SVM. After comparison, a packet is
sent out if it is identical; if not, a mis-compare checkpoint request is raised. After a checkpoint,
we release all of the queued packets. So if the SVM is not running, we will queue all packets
without any comparison, and just release them after the next checkpoint.
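The compare-and-release flow just described can be modelled like this. Again a toy sketch with assumed structure, not the kernel module's code: PVM packets wait in a queue for the corresponding SVM packets; an identical pair is released immediately, a mismatch raises a checkpoint request, and a checkpoint releases everything still queued without comparison.

```python
from collections import deque

class Comparer:
    def __init__(self):
        self.pvm_queue = deque()          # PVM packets awaiting comparison
        self.released = []                # packets sent to the client
        self.checkpoint_requested = False

    def pvm_packet(self, pkt):
        self.pvm_queue.append(pkt)

    def svm_packet(self, pkt):
        if not self.pvm_queue:
            return
        if self.pvm_queue[0] == pkt:      # identical: release immediately
            self.released.append(self.pvm_queue.popleft())
        else:                             # miscompare: request a checkpoint
            self.checkpoint_requested = True

    def on_checkpoint(self):
        while self.pvm_queue:             # release the residue, no comparison
            self.released.append(self.pvm_queue.popleft())
        self.checkpoint_requested = False

c = Comparer()
c.pvm_packet(b"A"); c.pvm_packet(b"B")
c.svm_packet(b"A")              # match: b"A" goes out at once
c.svm_packet(b"X")              # mismatch: checkpoint requested
print(c.checkpoint_requested)   # -> True
c.on_checkpoint()
print(c.released)               # -> [b'A', b'B']
```

With the SVM idle (passive mode), `svm_packet` is simply never called, so everything sits in the queue until `on_checkpoint` releases it.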

Thanks,
zhanghailiang

>>
>>>
>>> Dave
>>>
>>>>
>>>> Thanks.
>>>>
>>>>> Dave
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> -Gonglei
>>>>>>
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>>
>>>>> .
>>>>>
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> .
>>>
>>
>> --
>> Thanks,
>> Yang.
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30 17:53                       ` Dr. David Alan Gilbert
@ 2015-07-31  1:08                         ` Yang Hongyang
  2015-07-31  1:28                           ` zhanghailiang
  2015-07-31  1:26                         ` zhanghailiang
  1 sibling, 1 reply; 63+ messages in thread
From: Yang Hongyang @ 2015-07-31  1:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhanghailiang, Li Zhijian, jan.kiszka, Jason Wang, Dong, Eddie,
	peter.huangpeng, qemu-devel, Gonglei, stefanha



On 07/31/2015 01:53 AM, Dr. David Alan Gilbert wrote:
> * Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
>>
>>
>> On 07/30/2015 09:59 PM, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:
>>>>> * Gonglei (arei.gonglei@huawei.com) wrote:
>>>>>> On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
>>>>>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>> * Dong, Eddie (eddie.dong@intel.com) wrote:
>>>>>>>>>>>> A question here, the packet comparing may be very tricky. For example,
>>>>>>>>>>>> some protocol use random data to generate unpredictable id or
>>>>>>>>>>>> something else. One example is ipv6_select_ident() in Linux. So COLO
>>>>>>>>>>>> needs a mechanism to make sure PVM and SVM can generate same random
>>>>>>>>>>> data?
>>>>>>>>>>> Good question, the random data connection is a big problem for COLO. At
>>>>>>>>>>> present, it will trigger checkpoint processing because of the different random
>>>>>>>>>>> data.
>>>>>>>>>>> I don't think any mechanisms can assure two different machines generate the
>>>>>>>>>>> same random data. If you have any ideas, pls tell us :)
>>>>>>>>>>>
>>>>>>>>>>> Frequent checkpoint can handle this scenario, but maybe will cause the
>>>>>>>>>>> performance poor. :(
>>>>>>>>>>>
>>>>>>>>>> The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...
>>>>>>>>> They do diverge pretty quickly though; I have simple examples which
>>>>>>>>> reliably cause a checkpoint because of simple randomness in applications.
>>>>>>>>>
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>
>>>>>>>> And it will become even worse if hwrng is used in guest.
>>>>>>>
>>>>>>> Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
>>>>>>> once established, tends to work well without triggering checkpoints;
>>>>>>> and static web pages also work well.  Examples of things that do cause
>>>>>>> more checkpoints are, displaying guest statistics (e.g. running top
>>>>>>> in that ssh) which is timing dependent, and dynamically generated
>>>>>>> web pages that include a unique ID (bugzilla's password reset link in
>>>>>>> it's front page was a fun one), I think also establishing
>>>>>>> new encrypted connections cause the same randomness.
>>>>>>>
>>>>>>> However, it's worth remembering that COLO is trying to reduce the
>>>>>>> number of checkpoints compared to a simple checkpointing world
>>>>>>> which would be aiming to do a checkpoint ~100 times a second,
>>>>>>> and for compute bound workloads, or ones that don't expose
>>>>>>> the randomness that much, it can get checkpoints of a few seconds
>>>>>>> in length which greatly reduces the overhead.
>>>>>>>
>>>>>>
>>>>>> Yes. That's the truth.
>>>>>> We can set two different modes for different scenarios. Maybe Named
>>>>>> 1) frequent checkpoint mode for multi-connections and randomness scenarios
>>>>>> and 2) non-frequent checkpoint mode for other scenarios.
>>>>>>
>>>>>> But that's the next plan, we are thinking about that.
>>>>>
>>>>> I have some code that tries to automatically switch between those;
>>>>> it measures the checkpoint lengths, and if they're consistently short
>>>>> it sends a different message byte to the secondary at the start of the
>>>>> checkpoint, so that it doesn't bother running.   Every so often it
>>>>> then flips back to a COLO checkpoint to see if the checkpoints
>>>>> are still really fast.
>>>>>
>>>>
>>>> Do you mean that if there are consecutive checkpoint requests, we do not checkpoint but just send a special message to the SVM?
>>>> And resume to normal COLO mode once the checkpoint lengths are not so short?
>>>
>>>    We still have to do checkpoints, but we send a special message to the SVM so that
>>> the SVM just takes the checkpoint but does not run.
>>>
>>>    I'll send the code after I've updated it to your current version; but it's
>>> quite rough/experimental.
>>>
>>> It works something like
>>>
>>>   -----------run PVM     run SVM
>>>       COLO     <long gap>
>>>       mode       miscompare
>>>                  checkpoint
>>>   -----------run PVM     run SVM
>>>       COLO     <short gap>
>>>       mode       miscompare
>>>                  checkpoint
>>>   -----------run PVM     run SVM
>>>       COLO     <short gap>
>>>       mode       miscompare         < After a few short runs
>>>                  checkpoint
>>>   -----------run PVM     SVM idle   \
>>>     Passive    <fixed delay>        |  - repeat 'n' times
>>>       mode       checkpoint         /
>>>   -----------run PVM     run SVM
>>>       COLO     <short gap>          < Still a short gap
>>>       mode       miscompare
>>>   -----------run PVM     SVM idle   \
>>>     Passive    <fixed delay>        |  - repeat 'n' times
>>>       mode       checkpoint         /
>>>   -----------run PVM     run SVM
>>>       COLO     <long gap>          < long gap now, stay in COLO
>>>       mode       miscompare
>>>                  checkpoint
>>>   -----------run PVM     run SVM
>>>       COLO     <long gap>
>>>       mode       miscompare
>>>                  checkpoint
>>>
>>> So it saves the CPU time on the SVM, and the comparison traffic, and is
>>> automatic at switching into the passive mode.
>>>
>>> It used to be more useful, but your minimum COLO run time that you
>>> added a few versions ago helps a lot in the cases where there are miscompares,
>>> and the delay after the miscompare before you take the checkpoint also helps
>>> in the case where the data is very random.
>>
>> This is great! This is exactly what we were thinking about, when random
>> scenario will fallback to MC/Remus like FT. Thank you very much!
>> I have a question, do you also modify colo-proxy kernel module? because
>> in the fixed checkpoint mode, I think we need to buffer the network
>> packets, and release them at checkpoint.
>
> Yes, we do need to buffer and release them at the end, but I've not modified
> colo-proxy so far.  Doesn't the current code on PMY already need to buffer packets
> that are generated after the first miscompare

Yes, they are buffered,

> and before the checkpoint and
> then release them at the checkpoint?

but they will be released only if the packet comparison returns identical. So in order
to support this fallback mode, we need to modify it to release the packets at
the checkpoint; that won't take much code though.

>
> Dave
>
>>
>>>
>>> Dave
>>>
>>>>
>>>> Thanks.
>>>>
>>>>> Dave
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> -Gonglei
>>>>>>
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>>
>>>>> .
>>>>>
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> .
>>>
>>
>> --
>> Thanks,
>> Yang.
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
>

-- 
Thanks,
Yang.


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30 15:17                     ` Yang Hongyang
@ 2015-07-30 17:53                       ` Dr. David Alan Gilbert
  2015-07-31  1:08                         ` Yang Hongyang
  2015-07-31  1:26                         ` zhanghailiang
  0 siblings, 2 replies; 63+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-30 17:53 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: zhanghailiang, Li Zhijian, jan.kiszka, Jason Wang, Dong, Eddie,
	peter.huangpeng, qemu-devel, Gonglei, stefanha

* Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
> 
> 
> On 07/30/2015 09:59 PM, Dr. David Alan Gilbert wrote:
> >* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:
> >>>* Gonglei (arei.gonglei@huawei.com) wrote:
> >>>>On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
> >>>>>* Jason Wang (jasowang@redhat.com) wrote:
> >>>>>>
> >>>>>>
> >>>>>>On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
> >>>>>>>* Dong, Eddie (eddie.dong@intel.com) wrote:
> >>>>>>>>>>A question here, the packet comparing may be very tricky. For example,
> >>>>>>>>>>some protocol use random data to generate unpredictable id or
> >>>>>>>>>>something else. One example is ipv6_select_ident() in Linux. So COLO
> >>>>>>>>>>needs a mechanism to make sure PVM and SVM can generate same random
> >>>>>>>>>data?
> >>>>>>>>>Good question, the random data connection is a big problem for COLO. At
> >>>>>>>>>present, it will trigger checkpoint processing because of the different random
> >>>>>>>>>data.
> >>>>>>>>>I don't think any mechanisms can assure two different machines generate the
> >>>>>>>>>same random data. If you have any ideas, pls tell us :)
> >>>>>>>>>
> >>>>>>>>>Frequent checkpoint can handle this scenario, but maybe will cause the
> >>>>>>>>>performance poor. :(
> >>>>>>>>>
> >>>>>>>>The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...
> >>>>>>>They do diverge pretty quickly though; I have simple examples which
> >>>>>>>reliably cause a checkpoint because of simple randomness in applications.
> >>>>>>>
> >>>>>>>Dave
> >>>>>>>
> >>>>>>
> >>>>>>And it will become even worse if hwrng is used in guest.
> >>>>>
> >>>>>Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
> >>>>>once established, tends to work well without triggering checkpoints;
> >>>>>and static web pages also work well.  Examples of things that do cause
> >>>>>more checkpoints are, displaying guest statistics (e.g. running top
> >>>>>in that ssh) which is timing dependent, and dynamically generated
> >>>>>web pages that include a unique ID (bugzilla's password reset link in
> >>>>>it's front page was a fun one), I think also establishing
> >>>>>new encrypted connections cause the same randomness.
> >>>>>
> >>>>>However, it's worth remembering that COLO is trying to reduce the
> >>>>>number of checkpoints compared to a simple checkpointing world
> >>>>>which would be aiming to do a checkpoint ~100 times a second,
> >>>>>and for compute bound workloads, or ones that don't expose
> >>>>>the randomness that much, it can get checkpoints of a few seconds
> >>>>>in length which greatly reduces the overhead.
> >>>>>
> >>>>
> >>>>Yes. That's the truth.
> >>>>We can set two different modes for different scenarios. Maybe Named
> >>>>1) frequent checkpoint mode for multi-connections and randomness scenarios
> >>>>and 2) non-frequent checkpoint mode for other scenarios.
> >>>>
> >>>>But that's the next plan, we are thinking about that.
> >>>
> >>>I have some code that tries to automatically switch between those;
> >>>it measures the checkpoint lengths, and if they're consistently short
> >>>it sends a different message byte to the secondary at the start of the
> >>>checkpoint, so that it doesn't bother running.   Every so often it
> >>>then flips back to a COLO checkpoint to see if the checkpoints
> >>>are still really fast.
> >>>
> >>
> >>Do you mean that if there are consecutive checkpoint requests, we do not checkpoint but just send a special message to the SVM?
> >>And resume to normal COLO mode once the checkpoint lengths are not so short?
> >
> >   We still have to do checkpoints, but we send a special message to the SVM so that
> >the SVM just takes the checkpoint but does not run.
> >
> >   I'll send the code after I've updated it to your current version; but it's
> >quite rough/experimental.
> >
> >It works something like
> >
> >  -----------run PVM     run SVM
> >      COLO     <long gap>
> >      mode       miscompare
> >                 checkpoint
> >  -----------run PVM     run SVM
> >      COLO     <short gap>
> >      mode       miscompare
> >                 checkpoint
> >  -----------run PVM     run SVM
> >      COLO     <short gap>
> >      mode       miscompare         < After a few short runs
> >                 checkpoint
> >  -----------run PVM     SVM idle   \
> >    Passive    <fixed delay>        |  - repeat 'n' times
> >      mode       checkpoint         /
> >  -----------run PVM     run SVM
> >      COLO     <short gap>          < Still a short gap
> >      mode       miscompare
> >  -----------run PVM     SVM idle   \
> >    Passive    <fixed delay>        |  - repeat 'n' times
> >      mode       checkpoint         /
> >  -----------run PVM     run SVM
> >      COLO     <long gap>          < long gap now, stay in COLO
> >      mode       miscompare
> >                 checkpoint
> >  -----------run PVM     run SVM
> >      COLO     <long gap>
> >      mode       miscompare
> >                 checkpoint
> >
> >So it saves the CPU time on the SVM, and the comparison traffic, and is
> >automatic at switching into the passive mode.
> >
> >It used to be more useful, but your minimum COLO run time that you
> >added a few versions ago helps a lot in the cases where there are miscompares,
> >and the delay after the miscompare before you take the checkpoint also helps
> >in the case where the data is very random.
> 
> This is great! This is exactly what we were thinking about, when random
> scenario will fallback to MC/Remus like FT. Thank you very much!
> I have a question, do you also modify colo-proxy kernel module? because
> in the fixed checkpoint mode, I think we need to buffer the network
> packets, and release them at checkpoint.

Yes, we do need to buffer and release them at the end, but I've not modified
colo-proxy so far.  Doesn't the current code on PMY already need to buffer packets
that are generated after the first miscompare and before the checkpoint and
then release them at the checkpoint?

Dave

> 
> >
> >Dave
> >
> >>
> >>Thanks.
> >>
> >>>Dave
> >>>
> >>>>
> >>>>Regards,
> >>>>-Gonglei
> >>>>
> >>>--
> >>>Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>
> >>>.
> >>>
> >>
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >.
> >
> 
> -- 
> Thanks,
> Yang.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30 13:59                   ` Dr. David Alan Gilbert
@ 2015-07-30 15:17                     ` Yang Hongyang
  2015-07-30 17:53                       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 63+ messages in thread
From: Yang Hongyang @ 2015-07-30 15:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, zhanghailiang
  Cc: Li Zhijian, jan.kiszka, Jason Wang, Dong, Eddie, peter.huangpeng,
	qemu-devel, Gonglei, stefanha



On 07/30/2015 09:59 PM, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:
>>> * Gonglei (arei.gonglei@huawei.com) wrote:
>>>> On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
>>>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>>>
>>>>>>
>>>>>> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>> * Dong, Eddie (eddie.dong@intel.com) wrote:
>>>>>>>>>> A question here, the packet comparing may be very tricky. For example,
>>>>>>>>>> some protocol use random data to generate unpredictable id or
>>>>>>>>>> something else. One example is ipv6_select_ident() in Linux. So COLO
>>>>>>>>>> needs a mechanism to make sure PVM and SVM can generate same random
>>>>>>>>> data?
>>>>>>>>> Good question, the random data connection is a big problem for COLO. At
>>>>>>>>> present, it will trigger checkpoint processing because of the different random
>>>>>>>>> data.
>>>>>>>>> I don't think any mechanisms can assure two different machines generate the
>>>>>>>>> same random data. If you have any ideas, pls tell us :)
>>>>>>>>>
>>>>>>>>> Frequent checkpoint can handle this scenario, but maybe will cause the
>>>>>>>>> performance poor. :(
>>>>>>>>>
>>>>>>>> The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...
>>>>>>> They do diverge pretty quickly though; I have simple examples which
>>>>>>> reliably cause a checkpoint because of simple randomness in applications.
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>
>>>>>> And it will become even worse if hwrng is used in guest.
>>>>>
>>>>> Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
>>>>> once established, tends to work well without triggering checkpoints;
>>>>> and static web pages also work well.  Examples of things that do cause
>>>>> more checkpoints are, displaying guest statistics (e.g. running top
>>>>> in that ssh) which is timing dependent, and dynamically generated
>>>>> web pages that include a unique ID (bugzilla's password reset link in
>>>>> it's front page was a fun one), I think also establishing
>>>>> new encrypted connections cause the same randomness.
>>>>>
>>>>> However, it's worth remembering that COLO is trying to reduce the
>>>>> number of checkpoints compared to a simple checkpointing world
>>>>> which would be aiming to do a checkpoint ~100 times a second,
>>>>> and for compute bound workloads, or ones that don't expose
>>>>> the randomness that much, it can get checkpoints of a few seconds
>>>>> in length which greatly reduces the overhead.
>>>>>
>>>>
>>>> Yes. That's the truth.
>>>> We can set two different modes for different scenarios. Maybe Named
>>>> 1) frequent checkpoint mode for multi-connections and randomness scenarios
>>>> and 2) non-frequent checkpoint mode for other scenarios.
>>>>
>>>> But that's the next plan, we are thinking about that.
>>>
>>> I have some code that tries to automatically switch between those;
>>> it measures the checkpoint lengths, and if they're consistently short
>>> it sends a different message byte to the secondary at the start of the
>>> checkpoint, so that it doesn't bother running.   Every so often it
>>> then flips back to a COLO checkpoint to see if the checkpoints
>>> are still really fast.
>>>
>>
>> Do you mean that if there are consecutive checkpoint requests, we do not checkpoint but just send a special message to the SVM?
>> And resume to normal COLO mode once the checkpoint lengths are not so short?
>
>    We still have to do checkpoints, but we send a special message to the SVM so that
> the SVM just takes the checkpoint but does not run.
>
>    I'll send the code after I've updated it to your current version; but it's
> quite rough/experimental.
>
> It works something like
>
>   -----------run PVM     run SVM
>       COLO     <long gap>
>       mode       miscompare
>                  checkpoint
>   -----------run PVM     run SVM
>       COLO     <short gap>
>       mode       miscompare
>                  checkpoint
>   -----------run PVM     run SVM
>       COLO     <short gap>
>       mode       miscompare         < After a few short runs
>                  checkpoint
>   -----------run PVM     SVM idle   \
>     Passive    <fixed delay>        |  - repeat 'n' times
>       mode       checkpoint         /
>   -----------run PVM     run SVM
>       COLO     <short gap>          < Still a short gap
>       mode       miscompare
>   -----------run PVM     SVM idle   \
>     Passive    <fixed delay>        |  - repeat 'n' times
>       mode       checkpoint         /
>   -----------run PVM     run SVM
>       COLO     <long gap>          < long gap now, stay in COLO
>       mode       miscompare
>                  checkpoint
>   -----------run PVM     run SVM
>       COLO     <long gap>
>       mode       miscompare
>                  checkpoint
>
> So it saves the CPU time on the SVM, and the comparison traffic, and is
> automatic at switching into the passive mode.
>
> It used to be more useful, but your minimum COLO run time that you
> added a few versions ago helps a lot in the cases where there are miscompares,
> and the delay after the miscompare before you take the checkpoint also helps
> in the case where the data is very random.

This is great! This is exactly what we were thinking about, when random
scenario will fallback to MC/Remus like FT. Thank you very much!
I have a question, do you also modify colo-proxy kernel module? because
in the fixed checkpoint mode, I think we need to buffer the network
packets, and release them at checkpoint.

>
> Dave
>
>>
>> Thanks.
>>
>>> Dave
>>>
>>>>
>>>> Regards,
>>>> -Gonglei
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
>

-- 
Thanks,
Yang.


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30 12:42                 ` zhanghailiang
@ 2015-07-30 13:59                   ` Dr. David Alan Gilbert
  2015-07-30 15:17                     ` Yang Hongyang
  0 siblings, 1 reply; 63+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-30 13:59 UTC (permalink / raw)
  To: zhanghailiang
  Cc: Li Zhijian, jan.kiszka, Jason Wang, Dong, Eddie, peter.huangpeng,
	qemu-devel, Gonglei, stefanha, Yang Hongyang

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:
> >* Gonglei (arei.gonglei@huawei.com) wrote:
> >>On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
> >>>* Jason Wang (jasowang@redhat.com) wrote:
> >>>>
> >>>>
> >>>>On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
> >>>>>* Dong, Eddie (eddie.dong@intel.com) wrote:
> >>>>>>>>A question here, the packet comparing may be very tricky. For example,
> >>>>>>>>some protocol use random data to generate unpredictable id or
> >>>>>>>>something else. One example is ipv6_select_ident() in Linux. So COLO
> >>>>>>>>needs a mechanism to make sure PVM and SVM can generate same random
> >>>>>>>data?
> >>>>>>>Good question, the random data connection is a big problem for COLO. At
> >>>>>>>present, it will trigger checkpoint processing because of the different random
> >>>>>>>data.
> >>>>>>>I don't think any mechanisms can assure two different machines generate the
> >>>>>>>same random data. If you have any ideas, pls tell us :)
> >>>>>>>
> >>>>>>>Frequent checkpoint can handle this scenario, but maybe will cause the
> >>>>>>>performance poor. :(
> >>>>>>>
> >>>>>>The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...
> >>>>>They do diverge pretty quickly though; I have simple examples which
> >>>>>reliably cause a checkpoint because of simple randomness in applications.
> >>>>>
> >>>>>Dave
> >>>>>
> >>>>
> >>>>And it will become even worse if hwrng is used in guest.
> >>>
> >>>Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
> >>>once established, tends to work well without triggering checkpoints;
> >>>and static web pages also work well.  Examples of things that do cause
> >>>more checkpoints are, displaying guest statistics (e.g. running top
> >>>in that ssh) which is timing dependent, and dynamically generated
> >>>web pages that include a unique ID (bugzilla's password reset link in
> >>>it's front page was a fun one), I think also establishing
> >>>new encrypted connections cause the same randomness.
> >>>
> >>>However, it's worth remembering that COLO is trying to reduce the
> >>>number of checkpoints compared to a simple checkpointing world
> >>>which would be aiming to do a checkpoint ~100 times a second,
> >>>and for compute bound workloads, or ones that don't expose
> >>>the randomness that much, it can get checkpoints of a few seconds
> >>>in length which greatly reduces the overhead.
> >>>
> >>
> >>Yes. That's the truth.
> >>We can set two different modes for different scenarios. Maybe Named
> >>1) frequent checkpoint mode for multi-connections and randomness scenarios
> >>and 2) non-frequent checkpoint mode for other scenarios.
> >>
> >>But that's the next plan, we are thinking about that.
> >
> >I have some code that tries to automatically switch between those;
> >it measures the checkpoint lengths, and if they're consistently short
> >it sends a different message byte to the secondary at the start of the
> >checkpoint, so that it doesn't bother running.   Every so often it
> >then flips back to a COLO checkpoint to see if the checkpoints
> >are still really fast.
> >
> 
> Do you mean if there are consistent checkpoint requests, not do checkpoint but just send a special message to SVM?
> Resume to common COLO mode until the checkpoint lengths is so not short ?

  We still have to do checkpoints, but we send a special message to the SVM so that
the SVM just takes the checkpoint but does not run.

  I'll send the code after I've updated it to your current version; but it's
quite rough/experimental.

It works something like

 -----------run PVM     run SVM
     COLO     <long gap>
     mode       miscompare
                checkpoint
 -----------run PVM     run SVM
     COLO     <short gap>
     mode       miscompare
                checkpoint
 -----------run PVM     run SVM
     COLO     <short gap>
     mode       miscompare         < After a few short runs
                checkpoint
 -----------run PVM     SVM idle   \
   Passive    <fixed delay>        |  - repeat 'n' times
     mode       checkpoint         /
 -----------run PVM     run SVM
     COLO     <short gap>          < Still a short gap
     mode       miscompare
 -----------run PVM     SVM idle   \
   Passive    <fixed delay>        |  - repeat 'n' times
     mode       checkpoint         /
 -----------run PVM     run SVM
     COLO     <long gap>          < long gap now, stay in COLO
     mode       miscompare
                checkpoint
 -----------run PVM     run SVM
     COLO     <long gap>
     mode       miscompare
                checkpoint
     
So it saves the CPU time on the SVM and the comparison traffic, and it
switches into passive mode automatically.

It used to be more useful, but your minimum COLO run time that you
added a few versions ago helps a lot in the cases where there are miscompares,
and the delay after the miscompare before you take the checkpoint also helps
in the case where the data is very random.
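The switching heuristic described here might be sketched roughly like this (invented threshold and names; the actual patch was not posted in this thread):

```python
def next_mode(recent_gaps, short_gap=1.0, history=3):
    """Pick the mode for the next run based on recent COLO run lengths.

    recent_gaps: durations in seconds of the most recent COLO runs
    (time between checkpoints).  If the last `history` runs were all
    shorter than `short_gap`, fall back to passive (Remus-like) mode,
    where the SVM takes the checkpoint but does not run; otherwise
    stay in COLO mode.  Thresholds are invented for illustration.
    """
    if len(recent_gaps) >= history and all(
        g < short_gap for g in recent_gaps[-history:]
    ):
        return "passive"
    return "colo"
```

After 'n' passive-mode checkpoints the controller flips back to a full COLO run, re-measures the gap, and a function like the one above decides again.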

Dave

> 
> Thanks.
> 
> >Dave
> >
> >>
> >>Regards,
> >>-Gonglei
> >>
> >--
> >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30 12:30               ` Dr. David Alan Gilbert
@ 2015-07-30 12:42                 ` zhanghailiang
  2015-07-30 13:59                   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 63+ messages in thread
From: zhanghailiang @ 2015-07-30 12:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Gonglei
  Cc: Li Zhijian, jan.kiszka, Jason Wang, Dong, Eddie, peter.huangpeng,
	qemu-devel, stefanha, Yang Hongyang

On 2015/7/30 20:30, Dr. David Alan Gilbert wrote:
> * Gonglei (arei.gonglei@huawei.com) wrote:
>> On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
>>> * Jason Wang (jasowang@redhat.com) wrote:
>>>>
>>>>
>>>> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
>>>>> * Dong, Eddie (eddie.dong@intel.com) wrote:
>>>>>>>> A question here, the packet comparing may be very tricky. For example,
>>>>>>>> some protocol use random data to generate unpredictable id or
>>>>>>>> something else. One example is ipv6_select_ident() in Linux. So COLO
>>>>>>>> needs a mechanism to make sure PVM and SVM can generate same random
>>>>>>> data?
>>>>>>> Good question, the random data connection is a big problem for COLO. At
>>>>>>> present, it will trigger checkpoint processing because of the different random
>>>>>>> data.
>>>>>>> I don't think any mechanisms can assure two different machines generate the
>>>>>>> same random data. If you have any ideas, pls tell us :)
>>>>>>>
>>>>>>> Frequent checkpoint can handle this scenario, but maybe will cause the
>>>>>>> performance poor. :(
>>>>>>>
>>>>>> The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...
>>>>> They do diverge pretty quickly though; I have simple examples which
>>>>> reliably cause a checkpoint because of simple randomness in applications.
>>>>>
>>>>> Dave
>>>>>
>>>>
>>>> And it will become even worse if hwrng is used in guest.
>>>
>>> Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
>>> once established, tends to work well without triggering checkpoints;
>>> and static web pages also work well.  Examples of things that do cause
>>> more checkpoints are, displaying guest statistics (e.g. running top
>>> in that ssh) which is timing dependent, and dynamically generated
>>> web pages that include a unique ID (bugzilla's password reset link in
>>> it's front page was a fun one), I think also establishing
>>> new encrypted connections cause the same randomness.
>>>
>>> However, it's worth remembering that COLO is trying to reduce the
>>> number of checkpoints compared to a simple checkpointing world
>>> which would be aiming to do a checkpoint ~100 times a second,
>>> and for compute bound workloads, or ones that don't expose
>>> the randomness that much, it can get checkpoints of a few seconds
>>> in length which greatly reduces the overhead.
>>>
>>
>> Yes. That's the truth.
>> We can set two different modes for different scenarios. Maybe Named
>> 1) frequent checkpoint mode for multi-connections and randomness scenarios
>> and 2) non-frequent checkpoint mode for other scenarios.
>>
>> But that's the next plan, we are thinking about that.
>
> I have some code that tries to automatically switch between those;
> it measures the checkpoint lengths, and if they're consistently short
> it sends a different message byte to the secondary at the start of the
> checkpoint, so that it doesn't bother running.   Every so often it
> then flips back to a COLO checkpoint to see if the checkpoints
> are still really fast.
>

Do you mean that if checkpoint requests are consistently frequent, you don't do a checkpoint but just send a special message to the SVM?
And resume to normal COLO mode once the checkpoint lengths are no longer short?

Thanks.

> Dave
>
>>
>> Regards,
>> -Gonglei
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30 12:10             ` Gonglei
@ 2015-07-30 12:30               ` Dr. David Alan Gilbert
  2015-07-30 12:42                 ` zhanghailiang
  0 siblings, 1 reply; 63+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-30 12:30 UTC (permalink / raw)
  To: Gonglei
  Cc: zhanghailiang, Li Zhijian, jan.kiszka, Jason Wang, Dong, Eddie,
	qemu-devel, peter.huangpeng, stefanha, Yang Hongyang

* Gonglei (arei.gonglei@huawei.com) wrote:
> On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
> > * Jason Wang (jasowang@redhat.com) wrote:
> >>
> >>
> >> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
> >>> * Dong, Eddie (eddie.dong@intel.com) wrote:
> >>>>>> A question here, the packet comparing may be very tricky. For example,
> >>>>>> some protocol use random data to generate unpredictable id or
> >>>>>> something else. One example is ipv6_select_ident() in Linux. So COLO
> >>>>>> needs a mechanism to make sure PVM and SVM can generate same random
> >>>>> data?
> >>>>> Good question, the random data connection is a big problem for COLO. At
> >>>>> present, it will trigger checkpoint processing because of the different random
> >>>>> data.
> >>>>> I don't think any mechanisms can assure two different machines generate the
> >>>>> same random data. If you have any ideas, pls tell us :)
> >>>>>
> >>>>> Frequent checkpoint can handle this scenario, but maybe will cause the
> >>>>> performance poor. :(
> >>>>>
> >>>> The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...
> >>> They do diverge pretty quickly though; I have simple examples which
> >>> reliably cause a checkpoint because of simple randomness in applications.
> >>>
> >>> Dave
> >>>
> >>
> >> And it will become even worse if hwrng is used in guest.
> > 
> > Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
> > once established, tends to work well without triggering checkpoints;
> > and static web pages also work well.  Examples of things that do cause
> > more checkpoints are, displaying guest statistics (e.g. running top
> > in that ssh) which is timing dependent, and dynamically generated
> > web pages that include a unique ID (bugzilla's password reset link in
> > it's front page was a fun one), I think also establishing
> > new encrypted connections cause the same randomness.
> > 
> > However, it's worth remembering that COLO is trying to reduce the
> > number of checkpoints compared to a simple checkpointing world
> > which would be aiming to do a checkpoint ~100 times a second,
> > and for compute bound workloads, or ones that don't expose
> > the randomness that much, it can get checkpoints of a few seconds
> > in length which greatly reduces the overhead.
> > 
> 
> Yes. That's the truth.
> We can set two different modes for different scenarios. Maybe Named
> 1) frequent checkpoint mode for multi-connections and randomness scenarios
> and 2) non-frequent checkpoint mode for other scenarios.
> 
> But that's the next plan, we are thinking about that.

I have some code that tries to automatically switch between those;
it measures the checkpoint lengths, and if they're consistently short
it sends a different message byte to the secondary at the start of the
checkpoint, so that it doesn't bother running.   Every so often it
then flips back to a COLO checkpoint to see if the checkpoints
are still really fast.

Dave

> 
> Regards,
> -Gonglei
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30 11:56           ` Dr. David Alan Gilbert
@ 2015-07-30 12:10             ` Gonglei
  2015-07-30 12:30               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 63+ messages in thread
From: Gonglei @ 2015-07-30 12:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Jason Wang
  Cc: zhanghailiang, Li Zhijian, jan.kiszka, Dong, Eddie, qemu-devel,
	peter.huangpeng, stefanha, Yang Hongyang

On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
> * Jason Wang (jasowang@redhat.com) wrote:
>>
>>
>> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
>>> * Dong, Eddie (eddie.dong@intel.com) wrote:
>>>>>> A question here, the packet comparing may be very tricky. For example,
>>>>>> some protocol use random data to generate unpredictable id or
>>>>>> something else. One example is ipv6_select_ident() in Linux. So COLO
>>>>>> needs a mechanism to make sure PVM and SVM can generate same random
>>>>> data?
>>>>> Good question, the random data connection is a big problem for COLO. At
>>>>> present, it will trigger checkpoint processing because of the different random
>>>>> data.
>>>>> I don't think any mechanisms can assure two different machines generate the
>>>>> same random data. If you have any ideas, pls tell us :)
>>>>>
>>>>> Frequent checkpoint can handle this scenario, but maybe will cause the
>>>>> performance poor. :(
>>>>>
>>>> The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...
>>> They do diverge pretty quickly though; I have simple examples which
>>> reliably cause a checkpoint because of simple randomness in applications.
>>>
>>> Dave
>>>
>>
>> And it will become even worse if hwrng is used in guest.
> 
> Yes; it seems quite application dependent;  (on IPv4) an ssh connection,
> once established, tends to work well without triggering checkpoints;
> and static web pages also work well.  Examples of things that do cause
> more checkpoints are, displaying guest statistics (e.g. running top
> in that ssh) which is timing dependent, and dynamically generated
> web pages that include a unique ID (bugzilla's password reset link in
> it's front page was a fun one), I think also establishing
> new encrypted connections cause the same randomness.
> 
> However, it's worth remembering that COLO is trying to reduce the
> number of checkpoints compared to a simple checkpointing world
> which would be aiming to do a checkpoint ~100 times a second,
> and for compute bound workloads, or ones that don't expose
> the randomness that much, it can get checkpoints of a few seconds
> in length which greatly reduces the overhead.
> 

Yes, that's true.
We can set two different modes for different scenarios, maybe named
1) frequent-checkpoint mode, for multi-connection and randomness scenarios,
and 2) non-frequent-checkpoint mode, for other scenarios.

But that's the next step; we are still thinking about that.

Regards,
-Gonglei


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30  8:15         ` Jason Wang
@ 2015-07-30 11:56           ` Dr. David Alan Gilbert
  2015-07-30 12:10             ` Gonglei
  0 siblings, 1 reply; 63+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-30 11:56 UTC (permalink / raw)
  To: Jason Wang
  Cc: zhanghailiang, Li Zhijian, jan.kiszka, Dong, Eddie, qemu-devel,
	peter.huangpeng, Gonglei, stefanha, Yang Hongyang

* Jason Wang (jasowang@redhat.com) wrote:
> 
> 
> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
> > * Dong, Eddie (eddie.dong@intel.com) wrote:
> >>>> A question here, the packet comparing may be very tricky. For example,
> >>>> some protocol use random data to generate unpredictable id or
> >>>> something else. One example is ipv6_select_ident() in Linux. So COLO
> >>>> needs a mechanism to make sure PVM and SVM can generate same random
> >>> data?
> >>> Good question, the random data connection is a big problem for COLO. At
> >>> present, it will trigger checkpoint processing because of the different random
> >>> data.
> >>> I don't think any mechanisms can assure two different machines generate the
> >>> same random data. If you have any ideas, pls tell us :)
> >>>
> >>> Frequent checkpoint can handle this scenario, but maybe will cause the
> >>> performance poor. :(
> >>>
> >> The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...
> > They do diverge pretty quickly though; I have simple examples which
> > reliably cause a checkpoint because of simple randomness in applications.
> >
> > Dave
> >
> 
> And it will become even worse if hwrng is used in guest.

Yes; it seems quite application dependent.  (On IPv4) an ssh connection,
once established, tends to work well without triggering checkpoints,
and static web pages also work well.  Examples of things that do cause
more checkpoints are: displaying guest statistics (e.g. running top
in that ssh session), which is timing dependent, and dynamically generated
web pages that include a unique ID (bugzilla's password reset link on
its front page was a fun one); I think establishing
new encrypted connections also causes the same randomness.

However, it's worth remembering that COLO is trying to reduce the
number of checkpoints compared to a simple checkpointing world
which would be aiming to do a checkpoint ~100 times a second,
and for compute bound workloads, or ones that don't expose
the randomness that much, it can get checkpoints of a few seconds
in length which greatly reduces the overhead.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30  8:03       ` Dr. David Alan Gilbert
@ 2015-07-30  8:15         ` Jason Wang
  2015-07-30 11:56           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 63+ messages in thread
From: Jason Wang @ 2015-07-30  8:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Dong, Eddie
  Cc: zhanghailiang, Li Zhijian, jan.kiszka, qemu-devel,
	peter.huangpeng, Gonglei, stefanha, Yang Hongyang



On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
> * Dong, Eddie (eddie.dong@intel.com) wrote:
>>>> A question here, the packet comparing may be very tricky. For example,
>>>> some protocol use random data to generate unpredictable id or
>>>> something else. One example is ipv6_select_ident() in Linux. So COLO
>>>> needs a mechanism to make sure PVM and SVM can generate same random
>>> data?
>>> Good question, the random data connection is a big problem for COLO. At
>>> present, it will trigger checkpoint processing because of the different random
>>> data.
>>> I don't think any mechanisms can assure two different machines generate the
>>> same random data. If you have any ideas, pls tell us :)
>>>
>>> Frequent checkpoint can handle this scenario, but maybe will cause the
>>> performance poor. :(
>>>
>> The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...
> They do diverge pretty quickly though; I have simple examples which
> reliably cause a checkpoint because of simple randomness in applications.
>
> Dave
>

And it will become even worse if a hwrng is used in the guest.


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30  7:47     ` Dong, Eddie
@ 2015-07-30  8:03       ` Dr. David Alan Gilbert
  2015-07-30  8:15         ` Jason Wang
  0 siblings, 1 reply; 63+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-30  8:03 UTC (permalink / raw)
  To: Dong, Eddie
  Cc: zhanghailiang, Li Zhijian, jan.kiszka, Jason Wang, qemu-devel,
	peter.huangpeng, Gonglei, stefanha, Yang Hongyang

* Dong, Eddie (eddie.dong@intel.com) wrote:
> > >
> > > A question here, the packet comparing may be very tricky. For example,
> > > some protocol use random data to generate unpredictable id or
> > > something else. One example is ipv6_select_ident() in Linux. So COLO
> > > needs a mechanism to make sure PVM and SVM can generate same random
> > data?
> > >
> > Good question, the random data connection is a big problem for COLO. At
> > present, it will trigger checkpoint processing because of the different random
> > data.
> > I don't think any mechanisms can assure two different machines generate the
> > same random data. If you have any ideas, pls tell us :)
> > 
> > Frequent checkpoint can handle this scenario, but maybe will cause the
> > performance poor. :(
> > 
> The assumption is that, after VM checkpoint, SVM and PVM have identical internal state, so the pattern used to generate random data has high possibility to generate identical data at short time, at least...

They do diverge pretty quickly though; I have simple examples which
reliably cause a checkpoint because of simple randomness in applications.

Dave

> Thx Eddie
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30  7:16   ` Gonglei
@ 2015-07-30  7:47     ` Dong, Eddie
  2015-07-30  8:03       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 63+ messages in thread
From: Dong, Eddie @ 2015-07-30  7:47 UTC (permalink / raw)
  To: Gonglei, Jason Wang, Li Zhijian, qemu-devel, stefanha
  Cc: zhanghailiang, jan.kiszka, Dong, Eddie, peter.huangpeng,
	dgilbert, Yang Hongyang

> >
> > A question here, the packet comparing may be very tricky. For example,
> > some protocol use random data to generate unpredictable id or
> > something else. One example is ipv6_select_ident() in Linux. So COLO
> > needs a mechanism to make sure PVM and SVM can generate same random
> data?
> >
> Good question, the random data connection is a big problem for COLO. At
> present, it will trigger checkpoint processing because of the different random
> data.
> I don't think any mechanisms can assure two different machines generate the
> same random data. If you have any ideas, pls tell us :)
> 
> Frequent checkpoint can handle this scenario, but maybe will cause the
> performance poor. :(
> 
The assumption is that, after a VM checkpoint, the SVM and PVM have identical internal state, so the pattern used to generate random data has a high probability of generating identical data for a short time, at least...

Thx Eddie



* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-30  4:23 ` Jason Wang
@ 2015-07-30  7:16   ` Gonglei
  2015-07-30  7:47     ` Dong, Eddie
  0 siblings, 1 reply; 63+ messages in thread
From: Gonglei @ 2015-07-30  7:16 UTC (permalink / raw)
  To: Jason Wang, Li Zhijian, qemu-devel, stefanha
  Cc: jan.kiszka, Yang Hongyang, zhanghailiang, dgilbert, peter.huangpeng

On 2015/7/30 12:23, Jason Wang wrote:
> 
> 
> On 07/20/2015 02:42 PM, Li Zhijian wrote:
>> Hi, all
>>
>> We are planning to implement colo-proxy in qemu to cache and compare
>> packets.
>> This module is one of the important component of COLO project and now
>> it is
>> still in early stage, so any comments and feedback are warmly welcomed,
>> thanks in advance.
>>
>> ## Background
>> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
>> Service)
>> project is a high availability solution. Both Primary VM (PVM) and
>> Secondary VM
>> (SVM) run in parallel. They receive the same request from client, and
>> generate
>> responses in parallel too. If the response packets from PVM and SVM are
>> identical, they are released immediately. Otherwise, a VM checkpoint
>> (on demand)
>> is conducted.
>> Paper:
>> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
>> COLO on Xen:
>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
>> COLO on Qemu/KVM:
>> http://wiki.qemu.org/Features/COLO
>>
>> By the needs of capturing response packets from PVM and SVM and
>> finding out
>> whether they are identical, we introduce a new module to qemu
>> networking called
>> colo-proxy.
> 
> A question here, the packet comparing may be very tricky. For example,
> some protocol use random data to generate unpredictable id or something
> else. One example is ipv6_select_ident() in Linux. So COLO needs a
> mechanism to make sure PVM and SVM can generate same random data?
> 
Good question; random data in connections is a big problem for COLO. At
present, it will trigger checkpoint processing because of the differing random data.
I don't think any mechanism can assure that two different machines generate
the same random data. If you have any ideas, please tell us :)

Frequent checkpoints can handle this scenario, but may cause poor
performance. :(

Regards,
-Gonglei


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-20  6:42 [Qemu-devel] [POC] colo-proxy " Li Zhijian
  2015-07-20 10:32 ` Stefan Hajnoczi
  2015-07-24  2:05 ` Dong, Eddie
@ 2015-07-30  4:23 ` Jason Wang
  2015-07-30  7:16   ` Gonglei
  2 siblings, 1 reply; 63+ messages in thread
From: Jason Wang @ 2015-07-30  4:23 UTC (permalink / raw)
  To: Li Zhijian, qemu-devel, stefanha
  Cc: zhanghailiang, jan.kiszka, dgilbert, peter.huangpeng,
	Gonglei (Arei),
	Yang Hongyang



On 07/20/2015 02:42 PM, Li Zhijian wrote:
> Hi, all
>
> We are planning to implement colo-proxy in qemu to cache and compare
> packets.
> This module is one of the important component of COLO project and now
> it is
> still in early stage, so any comments and feedback are warmly welcomed,
> thanks in advance.
>
> ## Background
> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
> Service)
> project is a high availability solution. Both Primary VM (PVM) and
> Secondary VM
> (SVM) run in parallel. They receive the same request from client, and
> generate
> responses in parallel too. If the response packets from PVM and SVM are
> identical, they are released immediately. Otherwise, a VM checkpoint
> (on demand)
> is conducted.
> Paper:
> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
> COLO on Xen:
> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> COLO on Qemu/KVM:
> http://wiki.qemu.org/Features/COLO
>
> By the needs of capturing response packets from PVM and SVM and
> finding out
> whether they are identical, we introduce a new module to qemu
> networking called
> colo-proxy.

A question here: the packet comparing may be very tricky. For example,
some protocols use random data to generate an unpredictable ID or something
else. One example is ipv6_select_ident() in Linux. So does COLO need a
mechanism to make sure the PVM and SVM can generate the same random data?
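To make the concern concrete, here is a toy illustration (assumed 2-byte field layout; not real colo-proxy code) of why a nondeterministic header field, such as the one chosen by ipv6_select_ident(), defeats naive byte-for-byte comparison even when the payloads agree:

```python
import os


def build_packet(payload: bytes) -> bytes:
    # Toy "header": 2 random bytes standing in for a kernel-chosen
    # field like the IPv6 fragment ID, followed by the payload.
    return os.urandom(2) + payload


def naive_compare(p1: bytes, p2: bytes) -> bool:
    # Byte-for-byte comparison: almost always miscompares here,
    # because the random field differs between PVM and SVM.
    return p1 == p2


def payload_compare(p1: bytes, p2: bytes) -> bool:
    # Skip the 2-byte nondeterministic field before comparing.
    return p1[2:] == p2[2:]


pvm_pkt = build_packet(b"HTTP/1.1 200 OK")  # from the primary VM
svm_pkt = build_packet(b"HTTP/1.1 200 OK")  # from the secondary VM
```

Of course, masking only works for fields known to be nondeterministic and semantically irrelevant; random data inside an encrypted payload cannot be skipped this way, which is why such traffic still forces a checkpoint.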


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-28 22:12                 ` Samuel Thibault
@ 2015-07-29  7:36                   ` Jan Kiszka
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Kiszka @ 2015-07-29  7:36 UTC (permalink / raw)
  To: Samuel Thibault
  Cc: zhanghailiang, Li Zhijian, Stefan Hajnoczi, Jason Wang,
	qemu-devel, Vasiliy Tolstov, Dave Gilbert, Gonglei (Arei),
	Stefan Hajnoczi, Huangpeng (Peter),
	Yang Hongyang

On 2015-07-29 00:12, Samuel Thibault wrote:
> Hello,
> 
> Jan Kiszka, le Mon 27 Jul 2015 15:33:27 +0200, a écrit :
>> Of course, I'm fine with handing this over to someone who'd like to
>> pick up. Do we have volunteers?
>>
>> Samuel, would you like to do this? As a subsystem maintainer, you are
>> already familiar with QEMU processes.
> 
> I can help with maintenance, yes.

"Help with" will easily mean "be the one and only". ;) If you prefer,
send a patch which only adds you as a maintainer, but I would also ack
one that drops me from the list as well.

> 
>> Well, this still wouldn't resolve the independent review need for
>> slirp-ipv6.
> 
> Well, actually I didn't write slirp-ipv6, Guillaume Subiron did, and I
> reviewed it (and we iterated quite a bit) before we submit the patch
> series to qemu-devel.

Perfect!

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-21  1:59         ` zhanghailiang
@ 2015-07-28 22:13           ` Samuel Thibault
  0 siblings, 0 replies; 63+ messages in thread
From: Samuel Thibault @ 2015-07-28 22:13 UTC (permalink / raw)
  To: zhanghailiang
  Cc: Li Zhijian, Stefan Hajnoczi, Jason Wang, qemu-devel,
	Vasiliy Tolstov, peter.huangpeng, Gonglei (Arei),
	Stefan Hajnoczi, J. Kiszka, Yang Hongyang, Dave Gilbert

zhanghailiang, on Tue 21 Jul 2015 09:59:22 +0800, wrote:
> I didn't find any news since that version, are you still trying to
> push them to qemu upstream ?

I'd still be trying if I had any actual answer other than "we need to
find time to deal with it" :)

I can rebase the patch series over the current master and submit again
the patches.

Samuel


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27 13:33               ` Jan Kiszka
@ 2015-07-28 22:12                 ` Samuel Thibault
  2015-07-29  7:36                   ` Jan Kiszka
  0 siblings, 1 reply; 63+ messages in thread
From: Samuel Thibault @ 2015-07-28 22:12 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: zhanghailiang, Li Zhijian, Stefan Hajnoczi, Jason Wang,
	qemu-devel, Vasiliy Tolstov, Dave Gilbert, Gonglei (Arei),
	Stefan Hajnoczi, Huangpeng (Peter),
	Yang Hongyang

Hello,

Jan Kiszka, on Mon 27 Jul 2015 15:33:27 +0200, wrote:
> Of course, I'm fine with handing this over to someone who'd like to
> pick up. Do we have volunteers?
> 
> Samuel, would you like to do this? As a subsystem maintainer, you are
> already familiar with QEMU processes.

I can help with maintenance, yes.

> Well, this still wouldn't resolve the independent review need for
> slirp-ipv6.

Well, actually I didn't write slirp-ipv6, Guillaume Subiron did, and I
reviewed it (and we iterated quite a bit) before we submitted the patch
series to qemu-devel.

Samuel


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27  7:53                 ` Jason Wang
  2015-07-27  8:17                   ` Yang Hongyang
@ 2015-07-27 18:33                   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 63+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-27 18:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: zhanghailiang, Li Zhijian, Stefan Hajnoczi, Dong, Eddie,
	peter.huangpeng, qemu-devel, Gonglei (Arei),
	stefanha, jan.kiszka, Yang Hongyang, dgilbert

* Jason Wang (jasowang@redhat.com) wrote:
> 
> 
> On 07/27/2015 01:51 PM, Yang Hongyang wrote:
> > On 07/27/2015 12:49 PM, Jason Wang wrote:
> >>
> >>
> >> On 07/27/2015 11:54 AM, Yang Hongyang wrote:
> >>>
> >>>
> >>> On 07/27/2015 11:24 AM, Jason Wang wrote:
> >>>>
> >>>>
> >>>> On 07/24/2015 04:04 PM, Yang Hongyang wrote:
> >>>>> Hi Jason,
> >>>>>
> >>>>> On 07/24/2015 10:12 AM, Jason Wang wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
> >>>>>>> Hi Stefan:
> >>>>>>>       Thanks for your comments!
> >>>>>>>
> >>>>>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
> >>>>>>>>> We are planning to implement colo-proxy in qemu to cache and
> >>>>>>>>> compare
> >>>>>>>> packets.
> >>>>>>>>
> >>>>>>>> I thought there is a kernel module to do that?
> >>>>>>>       Yes, that is the previous solution the COLO sub-community
> >>>>>>> choose
> >>>>>>> to go, but we realized it might be not the best choices, and
> >>>>>>> thus we
> >>>>>>> want to bring discussion back here :)  More comments are welcome.
> >>>>>>>
> >>>>>>
> >>>>>> Hi:
> >>>>>>
> >>>>>> Could you pls describe more details on this decision? What's the
> >>>>>> reason
> >>>>>> that you realize it was not the best choice?
> >>>>>
> >>>>> Below is my opinion:
> >>>>>
> >>>>> We realized that there're disadvantages do it in kernel spaces:
> >>>>> 1. We need to recompile kernel: the colo-proxy kernel module is
> >>>>>      implemented as a nf conntrack extension. Adding a extension
> >>>>> need to
> >>>>>      modify the extension struct in-kernel, so recompile kernel is
> >>>>> needed.
> >>>>
> >>>> There's no need to do all in kernel, you can use a separate process to
> >>>> do the comparing and trigger the state sync through monitor.
> >>>
> >>> I don't get it, colo-proxy kernel module using a kthread do the
> >>> comparing and
> >>> trigger the state sync. We implemented it as a nf conntrack extension
> >>> module,
> >>> so we need to extend the extension struct in-kernel, although it just
> >>> needs
> >>> few lines changes to kernel, but a recompile of kernel is needed.
> >>> Are you
> >>> talking about not implement it as a nf conntrack extension?
> >>
> >> Yes, I mean implement the comparing in userspace but not in qemu.
> >
> > Yes, it is an alternative, that requires other components such as
> > netfilter userspace tools, it will add the complexity I think, we
> > wanted to implement a simple solution in QEMU.
> 
> I didn't get the point that why netfilter is needed? Do you mean the
> packet comparing needs to be stateful?

The current kernel world does a few things that take advantage
of the netfilter code:
   1) It's stateful, hanging its state off conntrack
   2) It modifies sequence numbers from the secondary to match what the
      primary did when it created the stream.
   3) Comparison is on a per-stream basis so that the order of unrelated
      packets doesn't cause a miscompare.
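To make (2) concrete, here is a minimal sketch of per-stream sequence-number rewriting; the struct and function names are illustrative, not taken from the actual colo-proxy kernel module:

```c
#include <stdint.h>

/* Hypothetical per-stream state: the proxy records the difference between
 * the initial sequence numbers chosen by the primary and the secondary,
 * then rewrites the secondary's packets into the primary's numbering so
 * the two streams can be compared byte-for-byte. */
struct stream_state {
    uint32_t seq_offset;   /* primary_isn - secondary_isn, mod 2^32 */
};

/* Record the offset once both SYNs have been seen. */
static void stream_learn_offset(struct stream_state *s,
                                uint32_t primary_isn, uint32_t secondary_isn)
{
    /* Unsigned subtraction wraps correctly mod 2^32. */
    s->seq_offset = primary_isn - secondary_isn;
}

/* Map a sequence number from the secondary into the primary's numbering. */
static uint32_t stream_adjust_seq(const struct stream_state *s, uint32_t seq)
{
    return seq + s->seq_offset;
}
```

The unsigned 32-bit arithmetic handles sequence-number wraparound for free, which is one reason doing this next to conntrack is convenient.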

Dave

> > Another reason is
> > that using other userspace tools will affect the performance, the
> > context switch between kernel and userspace may be an overhead.
> 
> We can use 100% time of this process but looks like your RFC of filter
> just did it in iothread?
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27 10:40         ` Dr. David Alan Gilbert
@ 2015-07-27 13:39           ` Yang Hongyang
  0 siblings, 0 replies; 63+ messages in thread
From: Yang Hongyang @ 2015-07-27 13:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhanghailiang, Li Zhijian, Stefan Hajnoczi, Jason Wang, Dong,
	Eddie, peter.huangpeng, qemu-devel, Gonglei (Arei),
	stefanha, jan.kiszka

Hi Dave,

   Thanks for the comments!

On 07/27/2015 06:40 PM, Dr. David Alan Gilbert wrote:
> * Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
>> Hi Jason,
>>
>> On 07/24/2015 10:12 AM, Jason Wang wrote:
>>>
>>>
>>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>>>> Hi Stefan:
>>>> 	Thanks for your comments!
>>>>
>>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>>>> We are planning to implement colo-proxy in qemu to cache and compare
>>>>> packets.
>>>>>
>>>>> I thought there is a kernel module to do that?
>>>> 	Yes, that is the previous solution the COLO sub-community choose to go, but we realized it might be not the best choices, and thus we want to bring discussion back here :)  More comments are welcome.
>>>>
>>>
>>> Hi:
>>>
>>> Could you pls describe more details on this decision? What's the reason
>>> that you realize it was not the best choice?
>>
>> Below is my opinion:
>>
>> We realized that there're disadvantages do it in kernel spaces:
>> 1. We need to recompile kernel: the colo-proxy kernel module is
>>     implemented as a nf conntrack extension. Adding a extension need to
>>     modify the extension struct in-kernel, so recompile kernel is needed.
>
> That change is tiny though, so I don't think the change to the kernel
> is a big issue (but I'm not a netfilter guy).
>
> (For those following, the patch is:
> https://github.com/coloft/colo-proxy/blob/master/patch4kernel/0001-colo-patch-for-kernel.patch
> )
> The comparison modules are bigger though, but still not massive.
>
>> 2. We need to recompile iptables/nftables to use together with the colo-proxy
>>     kernel module.
>
> Again, the changes to iptables are small; so I don't think this should
> influence it too much.

Yes, these changes are small, but even a small change requires recompiling
and reinstalling the component, which is not friendly for users...

>
> The bigger problem shown by 1&2 is that these changes are single-use - just for
> COLO, which does make it a little harder to justify.

That's true.

>
>> 3. Need to configure primary host to forward input packets to secondary as
>>     well as configure secondary to forward output packets to primary host, the
>>     network topology and configuration is too complex for a regular user.
>
> Yes, and that bit is HARD - it took me quite a while to get it right; however,
> we'll still need to forward packets between primary and secondary,

If we forward in qemu using a socket connection, a separate forwarding nic
will not be needed, and none of the tc setup will be needed either; I think
that will make configuration easier.
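Since TCP is a byte stream, forwarding guest packets over a socket between the two QEMUs needs some framing. A minimal sketch under the assumption of a 4-byte big-endian length prefix (the names and the framing scheme are illustrative, not a proposed wire format):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>  /* htonl/ntohl */

/* Write "4-byte length + payload" into out; returns total bytes framed.
 * out must have room for len + 4 bytes. */
static size_t frame_packet(uint8_t *out, const uint8_t *pkt, uint32_t len)
{
    uint32_t be = htonl(len);
    memcpy(out, &be, 4);
    memcpy(out + 4, pkt, len);
    return 4 + (size_t)len;
}

/* Parse one frame from in (caller guarantees the full frame is present);
 * sets *payload to the packet body and returns its length. */
static uint32_t deframe_packet(const uint8_t *in, const uint8_t **payload)
{
    uint32_t be;
    memcpy(&be, in, 4);
    *payload = in + 4;
    return ntohl(be);
}
```

With framing like this, the receiving QEMU can recover packet boundaries from the stream and inject each payload into its filter chain.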

> and all that
> hard setup should get rolled into something like libvirt, so perhaps it's not really
> that bad for the user in the end.
>
>> You can refer to http://wiki.qemu.org/Features/COLO
>> to see the network topology and the steps to setup an env.
>>
>> Setup a test env is too complex. The usability is so important to a feature
>> like COLO which provide VM FT solution, if fewer people can/willing to
>> setup the env, the feature is useless. So we decide to develop user space
>> colo-proxy.
>>
>> The advantage is obvious,
>> 1. we do not need to recompile kernel.
>> 2. No need to recompile iptables/nftables.
>> 3. we do not need to deal with the network configuration, we just using a
>>     socket connection between 2 QEMUs to forward packets.
>> 4. A complete VM FT solution in one go, we have already developed the block
>>     replication in QEMU, so with the network replication in QEMU, all
>>     components we needed are within QEMU, this is very important, it greatly
>>     improves the usability of COLO feature! We hope it will gain more testers,
>>     users and developers.
>> 5. QEMU will gain a complete VM FT solution and the most advantage FT solution
>>     so far!
>>
>> Overall, usability is the most important factor that impact our choice.
>
> My biggest worry is your reliance on SLIRP for the TCP/IP stack; it
> doesn't get much development work and I worry about its reliability
> when used at the level of complexity you need.
>
> Your current kernel implementation gets all the nf_conntrack stuff for free
> which is very powerful.
>
> However, I can see some advantages from doing it in user space; it would
> be easier to debug, and possibly easier to configure, and might also be easier
> to handle continuous FT (i.e. transferring the state of the proxy to a new COLO
> connection).
>
> I think at the moment I'd still prefer kernel space (especially since your kernel
> code now works pretty reliably!)
>
> Another thought; if your main worry is the complexity of the kernel
> changes, had you considered looking at the bpf-jit? I'm not sure if it
> can do what you need, but perhaps it's worth a look.

Will have a look, thank you!

>
> Dave
> P.S. I think 'proxy' is still the right word to describe it rather than 'agency'.
>
>>
>>
>>>
>>> Thanks
>>> .
>>>
>>
>> --
>> Thanks,
>> Yang.
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27 10:13             ` Stefan Hajnoczi
  2015-07-27 11:24               ` zhanghailiang
@ 2015-07-27 13:33               ` Jan Kiszka
  2015-07-28 22:12                 ` Samuel Thibault
  1 sibling, 1 reply; 63+ messages in thread
From: Jan Kiszka @ 2015-07-27 13:33 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefan Hajnoczi, Samuel Thibault
  Cc: zhanghailiang, Li Zhijian, Jason Wang, Dave Gilbert,
	Vasiliy Tolstov, qemu-devel, Gonglei (Arei), Huangpeng (Peter),
	Yang Hongyang

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2015-07-27 12:13, Stefan Hajnoczi wrote:
> On Tue, Jul 21, 2015 at 10:49:29AM +0100, Stefan Hajnoczi wrote:
>> On Tue, Jul 21, 2015 at 08:13:42AM +0200, Jan Kiszka wrote:
>>> On 2015-07-20 17:01, Stefan Hajnoczi wrote:
>>>> On Mon, Jul 20, 2015 at 2:12 PM, Vasiliy Tolstov
>>>> <v.tolstov@selfip.ru> wrote:
>>>>> 2015-07-20 14:55 GMT+03:00 zhanghailiang
>>>>> <zhang.zhanghailiang@huawei.com>:
>>>>>> Agreed, besides, it is seemed that slirp is not
>>>>>> supporting ipv6, we also have to supplement it.
>>>>> 
>>>>> 
>>>>> patch for ipv6 slirp support some times ago sended to qemu
>>>>> list, but i don't know why in not accepted.
>>>> 
>>>> I think no one reviewed it but there was no objection against
>>>> IPv6 support in principle.
>>>> 
>>>> Jan: Can we merge slirp IPv6 support for QEMU 2.5?
>>> 
>>> Sorry, as I pointed out some time back, I don't have the
>>> bandwidth to look into slirp. Someone need to do a review, then
>>> send a pull request.
>> 
>> Do you want to remove yourself from the slirp section of the
>> MAINTAINERS file?
>> 
>> Going forward we'll need to find someone familiar with the QEMU 
>> development process and with enough time to review slirp
>> patches.
> 
> Ping?
> 
> I hoped this would raise some discussion and that maybe we could
> find a new maintainer or co-maintainer to get slirp moving.
> 
> Any thoughts?

Of course, I'm fine with handing this over to someone who'd like to
pick up. Do we have volunteers?

Samuel, would you like to do this? As a subsystem maintainer, you are
already familiar with QEMU processes. Well, this still wouldn't
resolve the independent review need for slirp-ipv6.

Jan

- -- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlW2MycACgkQitSsb3rl5xSCaACePNubKPBkrdxQkcThUGD7w56B
Q6oAoIgCzT9qVRzDf5IhY2eKFXgTZ+Ul
=yk6R
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27 11:24               ` zhanghailiang
@ 2015-07-27 11:31                 ` Samuel Thibault
  0 siblings, 0 replies; 63+ messages in thread
From: Samuel Thibault @ 2015-07-27 11:31 UTC (permalink / raw)
  To: zhanghailiang
  Cc: Li Zhijian, Stefan Hajnoczi, Jason Wang, peter.huangpeng,
	Vasiliy Tolstov, qemu-devel, Gonglei (Arei),
	Stefan Hajnoczi, Jan Kiszka, Yang Hongyang, Dave Gilbert

Hello,

I'm just back from vacation with no Internet access, so will answer
shortly :)

Samuel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27 10:13             ` Stefan Hajnoczi
@ 2015-07-27 11:24               ` zhanghailiang
  2015-07-27 11:31                 ` Samuel Thibault
  2015-07-27 13:33               ` Jan Kiszka
  1 sibling, 1 reply; 63+ messages in thread
From: zhanghailiang @ 2015-07-27 11:24 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefan Hajnoczi
  Cc: Li Zhijian, Jan Kiszka, Jason Wang, peter.huangpeng,
	Vasiliy Tolstov, qemu-devel, Gonglei (Arei),
	samuel.thibault, samuel.thibault, Yang Hongyang, Dave Gilbert

On 2015/7/27 18:13, Stefan Hajnoczi wrote:
> On Tue, Jul 21, 2015 at 10:49:29AM +0100, Stefan Hajnoczi wrote:
>> On Tue, Jul 21, 2015 at 08:13:42AM +0200, Jan Kiszka wrote:
>>> On 2015-07-20 17:01, Stefan Hajnoczi wrote:
>>>> On Mon, Jul 20, 2015 at 2:12 PM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
>>>>> 2015-07-20 14:55 GMT+03:00 zhanghailiang <zhang.zhanghailiang@huawei.com>:
>>>>>> Agreed, besides, it is seemed that slirp is not supporting ipv6, we also
>>>>>> have to supplement it.
>>>>>
>>>>>
>>>>> patch for ipv6 slirp support some times ago sended to qemu list, but i
>>>>> don't know why in not accepted.
>>>>
>>>> I think no one reviewed it but there was no objection against IPv6
>>>> support in principle.
>>>>
>>>> Jan: Can we merge slirp IPv6 support for QEMU 2.5?
>>>
>>> Sorry, as I pointed out some time back, I don't have the bandwidth to
>>> look into slirp. Someone need to do a review, then send a pull request.
>>
>> Do you want to remove yourself from the slirp section of the MAINTAINERS
>> file?
>>
>> Going forward we'll need to find someone familiar with the QEMU
>> development process and with enough time to review slirp patches.
>
> Ping?
>
> I hoped this would raise some discussion and that maybe we could find a
> new maintainer or co-maintainer to get slirp moving.
>

Yes, please, this is important. We need slirp's maintainer to help review the COLO proxy
patches that will be implemented based on slirp (if we finally come to an agreement on realizing it in qemu).

We also need to support ipv6 in slirp. I have emailed Samuel, who sent ipv6 patches for slirp before,
but got no response. (I would like to respin and test Samuel's ipv6 slirp patch if he doesn't have
time to do this, but first I need to get his permission :) )

Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> <samuel.thibault@gnu.org>.

Thanks,
zhanghailiang

> Any thoughts?
>
> Stefan
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-24  8:04       ` Yang Hongyang
  2015-07-27  3:24         ` Jason Wang
@ 2015-07-27 10:40         ` Dr. David Alan Gilbert
  2015-07-27 13:39           ` Yang Hongyang
  1 sibling, 1 reply; 63+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-27 10:40 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: zhanghailiang, Li Zhijian, Stefan Hajnoczi, Jason Wang, Dong,
	Eddie, peter.huangpeng, qemu-devel, Gonglei (Arei),
	stefanha, jan.kiszka

* Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
> Hi Jason,
> 
> On 07/24/2015 10:12 AM, Jason Wang wrote:
> >
> >
> >On 07/24/2015 10:04 AM, Dong, Eddie wrote:
> >>Hi Stefan:
> >>	Thanks for your comments!
> >>
> >>>On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
> >>>>We are planning to implement colo-proxy in qemu to cache and compare
> >>>packets.
> >>>
> >>>I thought there is a kernel module to do that?
> >>	Yes, that is the previous solution the COLO sub-community choose to go, but we realized it might be not the best choices, and thus we want to bring discussion back here :)  More comments are welcome.
> >>
> >
> >Hi:
> >
> >Could you pls describe more details on this decision? What's the reason
> >that you realize it was not the best choice?
> 
> Below is my opinion:
> 
> We realized that there're disadvantages do it in kernel spaces:
> 1. We need to recompile kernel: the colo-proxy kernel module is
>    implemented as a nf conntrack extension. Adding a extension need to
>    modify the extension struct in-kernel, so recompile kernel is needed.

That change is tiny though, so I don't think the change to the kernel
is a big issue (but I'm not a netfilter guy).

(For those following, the patch is:
https://github.com/coloft/colo-proxy/blob/master/patch4kernel/0001-colo-patch-for-kernel.patch
)
The comparison modules are bigger though, but still not massive.

> 2. We need to recompile iptables/nftables to use together with the colo-proxy
>    kernel module.

Again, the changes to iptables are small; so I don't think this should
influence it too much.

The bigger problem shown by 1&2 is that these changes are single-use - just for
COLO, which does make it a little harder to justify.

> 3. Need to configure primary host to forward input packets to secondary as
>    well as configure secondary to forward output packets to primary host, the
>    network topology and configuration is too complex for a regular user.

Yes, and that bit is HARD - it took me quite a while to get it right; however,
we'll still need to forward packets between primary and secondary, and all that
hard setup should get rolled into something like libvirt, so perhaps it's not really
that bad for the user in the end.

> You can refer to http://wiki.qemu.org/Features/COLO
> to see the network topology and the steps to setup an env.
> 
> Setup a test env is too complex. The usability is so important to a feature
> like COLO which provide VM FT solution, if fewer people can/willing to
> setup the env, the feature is useless. So we decide to develop user space
> colo-proxy.
> 
> The advantage is obvious,
> 1. we do not need to recompile kernel.
> 2. No need to recompile iptables/nftables.
> 3. we do not need to deal with the network configuration, we just using a
>    socket connection between 2 QEMUs to forward packets.
> 4. A complete VM FT solution in one go, we have already developed the block
>    replication in QEMU, so with the network replication in QEMU, all
>    components we needed are within QEMU, this is very important, it greatly
>    improves the usability of COLO feature! We hope it will gain more testers,
>    users and developers.
> 5. QEMU will gain a complete VM FT solution and the most advantage FT solution
>    so far!
> 
> Overall, usability is the most important factor that impact our choice.

My biggest worry is your reliance on SLIRP for the TCP/IP stack; it
doesn't get much development work and I worry about its reliability
when used at the level of complexity you need.

Your current kernel implementation gets all the nf_conntrack stuff for free
which is very powerful.

However, I can see some advantages from doing it in user space; it would
be easier to debug, and possibly easier to configure, and might also be easier
to handle continuous FT (i.e. transferring the state of the proxy to a new COLO
connection).

I think at the moment I'd still prefer kernel space (especially since your kernel
code now works pretty reliably!)

Another thought; if your main worry is the complexity of the kernel
changes, had you considered looking at the bpf-jit? I'm not sure if it
can do what you need, but perhaps it's worth a look.

Dave
P.S. I think 'proxy' is still the right word to describe it rather than 'agency'.

> 
> 
> >
> >Thanks
> >.
> >
> 
> -- 
> Thanks,
> Yang.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-21  9:49           ` Stefan Hajnoczi
@ 2015-07-27 10:13             ` Stefan Hajnoczi
  2015-07-27 11:24               ` zhanghailiang
  2015-07-27 13:33               ` Jan Kiszka
  0 siblings, 2 replies; 63+ messages in thread
From: Stefan Hajnoczi @ 2015-07-27 10:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: zhanghailiang, Li Zhijian, Jan Kiszka, Jason Wang, qemu-devel,
	Vasiliy Tolstov, Dave Gilbert, Gonglei (Arei), Huangpeng (Peter),
	Yang Hongyang

[-- Attachment #1: Type: text/plain, Size: 1324 bytes --]

On Tue, Jul 21, 2015 at 10:49:29AM +0100, Stefan Hajnoczi wrote:
> On Tue, Jul 21, 2015 at 08:13:42AM +0200, Jan Kiszka wrote:
> > On 2015-07-20 17:01, Stefan Hajnoczi wrote:
> > > On Mon, Jul 20, 2015 at 2:12 PM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
> > >> 2015-07-20 14:55 GMT+03:00 zhanghailiang <zhang.zhanghailiang@huawei.com>:
> > >>> Agreed, besides, it is seemed that slirp is not supporting ipv6, we also
> > >>> have to supplement it.
> > >>
> > >>
> > >> patch for ipv6 slirp support some times ago sended to qemu list, but i
> > >> don't know why in not accepted.
> > > 
> > > I think no one reviewed it but there was no objection against IPv6
> > > support in principle.
> > > 
> > > Jan: Can we merge slirp IPv6 support for QEMU 2.5?
> > 
> > Sorry, as I pointed out some time back, I don't have the bandwidth to
> > look into slirp. Someone need to do a review, then send a pull request.
> 
> Do you want to remove yourself from the slirp section of the MAINTAINERS
> file?
> 
> Going forward we'll need to find someone familiar with the QEMU
> development process and with enough time to review slirp patches.

Ping?

I hoped this would raise some discussion and that maybe we could find a
new maintainer or co-maintainer to get slirp moving.

Any thoughts?

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27  8:06                     ` Jason Wang
@ 2015-07-27  8:22                       ` Yang Hongyang
  0 siblings, 0 replies; 63+ messages in thread
From: Yang Hongyang @ 2015-07-27  8:22 UTC (permalink / raw)
  To: Jason Wang, Dong, Eddie, Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, peter.huangpeng, qemu-devel,
	Gonglei (Arei),
	stefanha, dgilbert

On 07/27/2015 04:06 PM, Jason Wang wrote:
>
>
> On 07/27/2015 03:49 PM, Yang Hongyang wrote:
>> On 07/27/2015 03:37 PM, Jason Wang wrote:
>>>
>>>
>>> On 07/27/2015 01:51 PM, Yang Hongyang wrote:
>>>> On 07/27/2015 12:49 PM, Jason Wang wrote:
>>>>>
>>>>>
>>>>> On 07/27/2015 11:54 AM, Yang Hongyang wrote:
>>>>>>
>>>>>>
>>>>>> On 07/27/2015 11:24 AM, Jason Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 07/24/2015 04:04 PM, Yang Hongyang wrote:
>>>>>>>> Hi Jason,
>>>>>>>>
>>>>>>>> On 07/24/2015 10:12 AM, Jason Wang wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>>>>>>>>>> Hi Stefan:
>>>>>>>>>>         Thanks for your comments!
>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>>>>>>>>>> We are planning to implement colo-proxy in qemu to cache and
>>>>>>>>>>>> compare
>>>>>>>>>>> packets.
>>>>>>>>>>>
>>>>>>>>>>> I thought there is a kernel module to do that?
>>>>>>>>>>         Yes, that is the previous solution the COLO sub-community
>>>>>>>>>> choose
>>>>>>>>>> to go, but we realized it might be not the best choices, and
>>>>>>>>>> thus we
>>>>>>>>>> want to bring discussion back here :)  More comments are welcome.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi:
>>>>>>>>>
>>>>>>>>> Could you pls describe more details on this decision? What's the
>>>>>>>>> reason
>>>>>>>>> that you realize it was not the best choice?
>>>>>>>>
>>>>>>>> Below is my opinion:
>>>>>>>>
>>>>>>>> We realized that there're disadvantages do it in kernel spaces:
>>>>>>>> 1. We need to recompile kernel: the colo-proxy kernel module is
>>>>>>>>        implemented as a nf conntrack extension. Adding a extension
>>>>>>>> need to
>>>>>>>>        modify the extension struct in-kernel, so recompile kernel is
>>>>>>>> needed.
>>>>>>>
>>>>>>> There's no need to do all in kernel, you can use a separate
>>>>>>> process to
>>>>>>> do the comparing and trigger the state sync through monitor.
>>>>>>
>>>>>> I don't get it, colo-proxy kernel module using a kthread do the
>>>>>> comparing and
>>>>>> trigger the state sync. We implemented it as a nf conntrack extension
>>>>>> module,
>>>>>> so we need to extend the extension struct in-kernel, although it just
>>>>>> needs
>>>>>> few lines changes to kernel, but a recompile of kernel is needed.
>>>>>> Are you
>>>>>> talking about not implement it as a nf conntrack extension?
>>>>>
>>>>> Yes, I mean implement the comparing in userspace but not in qemu.
>>>>
>>>> Yes, it is an alternative, that requires other components such as
>>>> netfilter userspace tools, it will add the complexity I think, we
>>>> wanted to implement a simple solution in QEMU. Another reason is
>>>> that using other userspace tools will affect the performance, the
>>>> context switch between kernel and userspace may be an overhead.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> 2. We need to recompile iptables/nftables to use together with the
>>>>>>>> colo-proxy
>>>>>>>>        kernel module.
>>>>>>>> 3. Need to configure primary host to forward input packets to
>>>>>>>> secondary as
>>>>>>>>        well as configure secondary to forward output packets to
>>>>>>>> primary
>>>>>>>> host, the
>>>>>>>>        network topology and configuration is too complex for a
>>>>>>>> regular
>>>>>>>> user.
>>>>>>>>
>>>>>>>
>>>>>>> You can use current kernel primitives to mirror the traffic of both
>>>>>>> PVM
>>>>>>> and SVM to another process without any modification of kernel. And
>>>>>>> qemu
>>>>>>> can offload all network configuration to management in this
>>>>>>> case.  And
>>>>>>> what's more import, this works for vhost. Filtering in qemu won't
>>>>>>> work
>>>>>>> for vhost.
>>>>>>
>>>>>> We are using tc to mirror/forward packets now. Implement in QEMU do
>>>>>> have some
>>>>>> limits, but there're also limits in kernel, if the packet do not pass
>>>>>> the host kernel TCP/IP stack, such as vhost-user.
>>>>>
>>>>> But the limits are much less than userspace, no? For vhost-user, maybe
>>>>> we could extend the backed to mirror the traffic also.
>>>>
>>>> IMO the limits are more or less. Besides, for mirror/forward packets,
>>>> using tc requires a separate physical nic or a vlan, the nic should not
>>>> be used for other purpose. if we implement it in QEMU, using an socket
>>>> connection to forward packets, we no longer need an separate nic, it
>>>> will
>>>> reduce the network topology complexity.
>>>
>>> It depends on how do you design your user space. If you want using
>>> userspace to forward the packet, you can 1) use packet socket to capture
>>> all traffic on the tap that is used by VM 2) mirror the traffic to a new
>>> tap device, the user space can then read all traffic from this new tap
>>> device.
>>
>> Yes, but we can also do it in QEMU space, right?
>
> Right.
>
>> This will make life easier
>> because we do all in one solution within QEMU.
>
> But I'm not sure qemu is the right place to do this as you mention that
> it needs userspace protocol stack support.

We only need some simple features like defragmenting TCP packets and
analyzing TCP headers; since QEMU has the slirp userspace protocol stack,
that should not be a big deal.
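For the "analyze TCP headers" part, the comparer only needs a few fields out of each segment. A minimal sketch, assuming the IP layer has already been stripped (the struct and its field names are illustrative):

```c
#include <stdint.h>
#include <stddef.h>

/* Fields a per-stream comparer would key on. */
struct tcp_key {
    uint16_t src_port, dst_port;
    uint32_t seq;
    uint8_t  header_len;   /* in bytes, from the data-offset field */
};

/* Parse a raw TCP segment; returns 0 on success, -1 if truncated. */
static int parse_tcp_header(const uint8_t *seg, size_t len, struct tcp_key *k)
{
    if (len < 20)          /* minimum TCP header size */
        return -1;
    k->src_port   = (uint16_t)(seg[0] << 8 | seg[1]);
    k->dst_port   = (uint16_t)(seg[2] << 8 | seg[3]);
    k->seq        = (uint32_t)seg[4] << 24 | (uint32_t)seg[5] << 16 |
                    (uint32_t)seg[6] << 8  | (uint32_t)seg[7];
    k->header_len = (uint8_t)((seg[12] >> 4) * 4);  /* data offset in words */
    return 0;
}
```

The (src_port, dst_port, seq) tuple is enough to pair up a primary packet with its secondary counterpart before comparing payloads.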

>
>>
>>>
>>> .
>>>
>>
>
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27  7:53                 ` Jason Wang
@ 2015-07-27  8:17                   ` Yang Hongyang
  2015-07-27 18:33                   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 63+ messages in thread
From: Yang Hongyang @ 2015-07-27  8:17 UTC (permalink / raw)
  To: Jason Wang, Dong, Eddie, Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, qemu-devel, peter.huangpeng,
	Gonglei (Arei),
	stefanha, dgilbert

On 07/27/2015 03:53 PM, Jason Wang wrote:
>
>
> On 07/27/2015 01:51 PM, Yang Hongyang wrote:
>> On 07/27/2015 12:49 PM, Jason Wang wrote:
>>>
>>>
>>> On 07/27/2015 11:54 AM, Yang Hongyang wrote:
>>>>
>>>>
>>>> On 07/27/2015 11:24 AM, Jason Wang wrote:
>>>>>
>>>>>
>>>>> On 07/24/2015 04:04 PM, Yang Hongyang wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 07/24/2015 10:12 AM, Jason Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>>>>>>>> Hi Stefan:
>>>>>>>>        Thanks for your comments!
>>>>>>>>
>>>>>>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>>>>>>>> We are planning to implement colo-proxy in qemu to cache and
>>>>>>>>>> compare
>>>>>>>>> packets.
>>>>>>>>>
>>>>>>>>> I thought there is a kernel module to do that?
>>>>>>>>        Yes, that is the previous solution the COLO sub-community
>>>>>>>> choose
>>>>>>>> to go, but we realized it might be not the best choices, and
>>>>>>>> thus we
>>>>>>>> want to bring discussion back here :)  More comments are welcome.
>>>>>>>>
>>>>>>>
>>>>>>> Hi:
>>>>>>>
>>>>>>> Could you pls describe more details on this decision? What's the
>>>>>>> reason
>>>>>>> that you realize it was not the best choice?
>>>>>>
>>>>>> Below is my opinion:
>>>>>>
>>>>>> We realized that there're disadvantages do it in kernel spaces:
>>>>>> 1. We need to recompile kernel: the colo-proxy kernel module is
>>>>>>       implemented as a nf conntrack extension. Adding a extension
>>>>>> need to
>>>>>>       modify the extension struct in-kernel, so recompile kernel is
>>>>>> needed.
>>>>>
>>>>> There's no need to do all in kernel, you can use a separate process to
>>>>> do the comparing and trigger the state sync through monitor.
>>>>
>>>> I don't get it, colo-proxy kernel module using a kthread do the
>>>> comparing and
>>>> trigger the state sync. We implemented it as a nf conntrack extension
>>>> module,
>>>> so we need to extend the extension struct in-kernel, although it just
>>>> needs
>>>> few lines changes to kernel, but a recompile of kernel is needed.
>>>> Are you
>>>> talking about not implement it as a nf conntrack extension?
>>>
>>> Yes, I mean implement the comparing in userspace but not in qemu.
>>
>> Yes, it is an alternative, that requires other components such as
>> netfilter userspace tools, it will add the complexity I think, we
>> wanted to implement a simple solution in QEMU.
>
> I didn't get the point that why netfilter is needed? Do you mean the
> packet comparing needs to be stateful?

Yes.

>
>> Another reason is
>> that using other userspace tools will affect the performance, the
>> context switch between kernel and userspace may be an overhead.
>
> We can use 100% time of this process but looks like your RFC of filter
> just did it in iothread?

That's not the case for colo-proxy; colo-proxy will require a separate
thread to do the comparison work.
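The separate comparison thread can be sketched roughly as below; the synchronization and the names are illustrative only, not colo-proxy's actual design (in practice the worker would loop over a queue of pairs and trigger a checkpoint on mismatch rather than just return a flag):

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* One packet pair handed from the iothread to the comparer. */
struct pkt_pair {
    const uint8_t *primary, *secondary;
    size_t plen, slen;
    int mismatch;          /* set by the comparer thread */
};

static void *compare_worker(void *arg)
{
    struct pkt_pair *p = arg;
    /* Packets differ if their lengths or payloads differ; on a real
     * mismatch the worker would request a VM checkpoint. */
    p->mismatch = (p->plen != p->slen) ||
                  memcmp(p->primary, p->secondary, p->plen) != 0;
    return NULL;
}

/* Run one comparison on a worker thread; returns 1 on mismatch. */
static int compare_on_thread(struct pkt_pair *p)
{
    pthread_t tid;
    pthread_create(&tid, NULL, compare_worker, p);
    pthread_join(tid, NULL);
    return p->mismatch;
}
```

Keeping the memcmp off the iothread is the point: the comparer can burn a full core without stalling packet I/O.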

>
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27  7:49                   ` Yang Hongyang
@ 2015-07-27  8:06                     ` Jason Wang
  2015-07-27  8:22                       ` Yang Hongyang
  0 siblings, 1 reply; 63+ messages in thread
From: Jason Wang @ 2015-07-27  8:06 UTC (permalink / raw)
  To: Yang Hongyang, Dong, Eddie, Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, peter.huangpeng, qemu-devel,
	Gonglei (Arei),
	stefanha, dgilbert



On 07/27/2015 03:49 PM, Yang Hongyang wrote:
> On 07/27/2015 03:37 PM, Jason Wang wrote:
>>
>>
>> On 07/27/2015 01:51 PM, Yang Hongyang wrote:
>>> On 07/27/2015 12:49 PM, Jason Wang wrote:
>>>>
>>>>
>>>> On 07/27/2015 11:54 AM, Yang Hongyang wrote:
>>>>>
>>>>>
>>>>> On 07/27/2015 11:24 AM, Jason Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 07/24/2015 04:04 PM, Yang Hongyang wrote:
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> On 07/24/2015 10:12 AM, Jason Wang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>>>>>>>>> Hi Stefan:
>>>>>>>>>        Thanks for your comments!
>>>>>>>>>
>>>>>>>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>>>>>>>>> We are planning to implement colo-proxy in qemu to cache and
>>>>>>>>>>> compare
>>>>>>>>>> packets.
>>>>>>>>>>
>>>>>>>>>> I thought there is a kernel module to do that?
>>>>>>>>>        Yes, that is the previous solution the COLO sub-community
>>>>>>>>> choose
>>>>>>>>> to go, but we realized it might be not the best choices, and
>>>>>>>>> thus we
>>>>>>>>> want to bring discussion back here :)  More comments are welcome.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi:
>>>>>>>>
>>>>>>>> Could you pls describe more details on this decision? What's the
>>>>>>>> reason
>>>>>>>> that you realize it was not the best choice?
>>>>>>>
>>>>>>> Below is my opinion:
>>>>>>>
>>>>>>> We realized that there're disadvantages do it in kernel spaces:
>>>>>>> 1. We need to recompile kernel: the colo-proxy kernel module is
>>>>>>>       implemented as a nf conntrack extension. Adding a extension
>>>>>>> need to
>>>>>>>       modify the extension struct in-kernel, so recompile kernel is
>>>>>>> needed.
>>>>>>
>>>>>> There's no need to do all in kernel, you can use a separate
>>>>>> process to
>>>>>> do the comparing and trigger the state sync through monitor.
>>>>>
>>>>> I don't get it, colo-proxy kernel module using a kthread do the
>>>>> comparing and
>>>>> trigger the state sync. We implemented it as a nf conntrack extension
>>>>> module,
>>>>> so we need to extend the extension struct in-kernel, although it just
>>>>> needs
>>>>> few lines changes to kernel, but a recompile of kernel is needed.
>>>>> Are you
>>>>> talking about not implement it as a nf conntrack extension?
>>>>
>>>> Yes, I mean implement the comparing in userspace but not in qemu.
>>>
>>> Yes, it is an alternative, that requires other components such as
>>> netfilter userspace tools, it will add the complexity I think, we
>>> wanted to implement a simple solution in QEMU. Another reason is
>>> that using other userspace tools will affect the performance, the
>>> context switch between kernel and userspace may be an overhead.
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> 2. We need to recompile iptables/nftables to use together with the
>>>>>>> colo-proxy
>>>>>>>       kernel module.
>>>>>>> 3. Need to configure primary host to forward input packets to
>>>>>>> secondary as
>>>>>>>       well as configure secondary to forward output packets to
>>>>>>> primary
>>>>>>> host, the
>>>>>>>       network topology and configuration is too complex for a
>>>>>>> regular
>>>>>>> user.
>>>>>>>
>>>>>>
>>>>>> You can use current kernel primitives to mirror the traffic of both
>>>>>> PVM
>>>>>> and SVM to another process without any modification of kernel. And
>>>>>> qemu
>>>>>> can offload all network configuration to management in this
>>>>>> case.  And
>>>>>> what's more import, this works for vhost. Filtering in qemu won't
>>>>>> work
>>>>>> for vhost.
>>>>>
>>>>> We are using tc to mirror/forward packets now. Implement in QEMU do
>>>>> have some
>>>>> limits, but there're also limits in kernel, if the packet do not pass
>>>>> the host kernel TCP/IP stack, such as vhost-user.
>>>>
>>>> But the limits are much less than userspace, no? For vhost-user, maybe
>>>> we could extend the backed to mirror the traffic also.
>>>
>>> IMO the limits are more or less. Besides, for mirror/forward packets,
>>> using tc requires a separate physical nic or a vlan, the nic should not
>>> be used for other purpose. if we implement it in QEMU, using an socket
>>> connection to forward packets, we no longer need an separate nic, it
>>> will
>>> reduce the network topology complexity.
>>
>> It depends on how do you design your user space. If you want using
>> userspace to forward the packet, you can 1) use packet socket to capture
>> all traffic on the tap that is used by VM 2) mirror the traffic to a new
>> tap device, the user space can then read all traffic from this new tap
>> device.
>
> Yes, but we can also do it in QEMU space, right? 

Right.

> This will make life easier
> because we do all in one solution within QEMU.

But I'm not sure QEMU is the right place to do this, since, as you mention, it
needs userspace protocol stack support.

>
>>
>> .
>>
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27  5:51               ` Yang Hongyang
  2015-07-27  7:37                 ` Jason Wang
@ 2015-07-27  7:53                 ` Jason Wang
  2015-07-27  8:17                   ` Yang Hongyang
  2015-07-27 18:33                   ` Dr. David Alan Gilbert
  1 sibling, 2 replies; 63+ messages in thread
From: Jason Wang @ 2015-07-27  7:53 UTC (permalink / raw)
  To: Yang Hongyang, Dong, Eddie, Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, qemu-devel, peter.huangpeng,
	Gonglei (Arei),
	stefanha, dgilbert



On 07/27/2015 01:51 PM, Yang Hongyang wrote:
> On 07/27/2015 12:49 PM, Jason Wang wrote:
>>
>>
>> On 07/27/2015 11:54 AM, Yang Hongyang wrote:
>>>
>>>
>>> On 07/27/2015 11:24 AM, Jason Wang wrote:
>>>>
>>>>
>>>> On 07/24/2015 04:04 PM, Yang Hongyang wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 07/24/2015 10:12 AM, Jason Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>>>>>>> Hi Stefan:
>>>>>>>       Thanks for your comments!
>>>>>>>
>>>>>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>>>>>>> We are planning to implement colo-proxy in qemu to cache and
>>>>>>>>> compare
>>>>>>>> packets.
>>>>>>>>
>>>>>>>> I thought there is a kernel module to do that?
>>>>>>>       Yes, that is the previous solution the COLO sub-community
>>>>>>> choose
>>>>>>> to go, but we realized it might be not the best choices, and
>>>>>>> thus we
>>>>>>> want to bring discussion back here :)  More comments are welcome.
>>>>>>>
>>>>>>
>>>>>> Hi:
>>>>>>
>>>>>> Could you pls describe more details on this decision? What's the
>>>>>> reason
>>>>>> that you realize it was not the best choice?
>>>>>
>>>>> Below is my opinion:
>>>>>
>>>>> We realized that there're disadvantages do it in kernel spaces:
>>>>> 1. We need to recompile kernel: the colo-proxy kernel module is
>>>>>      implemented as a nf conntrack extension. Adding a extension
>>>>> need to
>>>>>      modify the extension struct in-kernel, so recompile kernel is
>>>>> needed.
>>>>
>>>> There's no need to do all in kernel, you can use a separate process to
>>>> do the comparing and trigger the state sync through monitor.
>>>
>>> I don't get it, colo-proxy kernel module using a kthread do the
>>> comparing and
>>> trigger the state sync. We implemented it as a nf conntrack extension
>>> module,
>>> so we need to extend the extension struct in-kernel, although it just
>>> needs
>>> few lines changes to kernel, but a recompile of kernel is needed.
>>> Are you
>>> talking about not implement it as a nf conntrack extension?
>>
>> Yes, I mean implement the comparing in userspace but not in qemu.
>
> Yes, it is an alternative, that requires other components such as
> netfilter userspace tools, it will add the complexity I think, we
> wanted to implement a simple solution in QEMU.

I didn't get the point of why netfilter is needed. Do you mean the
packet comparing needs to be stateful?

> Another reason is
> that using other userspace tools will affect the performance, the
> context switch between kernel and userspace may be an overhead.

We could dedicate 100% of a separate process's time to this, but it looks
like your filter RFC just did it in the iothread?


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27  7:37                 ` Jason Wang
@ 2015-07-27  7:49                   ` Yang Hongyang
  2015-07-27  8:06                     ` Jason Wang
  0 siblings, 1 reply; 63+ messages in thread
From: Yang Hongyang @ 2015-07-27  7:49 UTC (permalink / raw)
  To: Jason Wang, Dong, Eddie, Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, qemu-devel, peter.huangpeng,
	Gonglei (Arei),
	stefanha, dgilbert

On 07/27/2015 03:37 PM, Jason Wang wrote:
>
>
> On 07/27/2015 01:51 PM, Yang Hongyang wrote:
>> On 07/27/2015 12:49 PM, Jason Wang wrote:
>>>
>>>
>>> On 07/27/2015 11:54 AM, Yang Hongyang wrote:
>>>>
>>>>
>>>> On 07/27/2015 11:24 AM, Jason Wang wrote:
>>>>>
>>>>>
>>>>> On 07/24/2015 04:04 PM, Yang Hongyang wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 07/24/2015 10:12 AM, Jason Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>>>>>>>> Hi Stefan:
>>>>>>>>        Thanks for your comments!
>>>>>>>>
>>>>>>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>>>>>>>> We are planning to implement colo-proxy in qemu to cache and
>>>>>>>>>> compare
>>>>>>>>> packets.
>>>>>>>>>
>>>>>>>>> I thought there is a kernel module to do that?
>>>>>>>>        Yes, that is the previous solution the COLO sub-community
>>>>>>>> choose
>>>>>>>> to go, but we realized it might be not the best choices, and
>>>>>>>> thus we
>>>>>>>> want to bring discussion back here :)  More comments are welcome.
>>>>>>>>
>>>>>>>
>>>>>>> Hi:
>>>>>>>
>>>>>>> Could you pls describe more details on this decision? What's the
>>>>>>> reason
>>>>>>> that you realize it was not the best choice?
>>>>>>
>>>>>> Below is my opinion:
>>>>>>
>>>>>> We realized that there're disadvantages do it in kernel spaces:
>>>>>> 1. We need to recompile kernel: the colo-proxy kernel module is
>>>>>>       implemented as a nf conntrack extension. Adding a extension
>>>>>> need to
>>>>>>       modify the extension struct in-kernel, so recompile kernel is
>>>>>> needed.
>>>>>
>>>>> There's no need to do all in kernel, you can use a separate process to
>>>>> do the comparing and trigger the state sync through monitor.
>>>>
>>>> I don't get it, colo-proxy kernel module using a kthread do the
>>>> comparing and
>>>> trigger the state sync. We implemented it as a nf conntrack extension
>>>> module,
>>>> so we need to extend the extension struct in-kernel, although it just
>>>> needs
>>>> few lines changes to kernel, but a recompile of kernel is needed.
>>>> Are you
>>>> talking about not implement it as a nf conntrack extension?
>>>
>>> Yes, I mean implement the comparing in userspace but not in qemu.
>>
>> Yes, it is an alternative, that requires other components such as
>> netfilter userspace tools, it will add the complexity I think, we
>> wanted to implement a simple solution in QEMU. Another reason is
>> that using other userspace tools will affect the performance, the
>> context switch between kernel and userspace may be an overhead.
>>
>>>
>>>>
>>>>>
>>>>>> 2. We need to recompile iptables/nftables to use together with the
>>>>>> colo-proxy
>>>>>>       kernel module.
>>>>>> 3. Need to configure primary host to forward input packets to
>>>>>> secondary as
>>>>>>       well as configure secondary to forward output packets to primary
>>>>>> host, the
>>>>>>       network topology and configuration is too complex for a regular
>>>>>> user.
>>>>>>
>>>>>
>>>>> You can use current kernel primitives to mirror the traffic of both
>>>>> PVM
>>>>> and SVM to another process without any modification of kernel. And
>>>>> qemu
>>>>> can offload all network configuration to management in this case.  And
>>>>> what's more import, this works for vhost. Filtering in qemu won't work
>>>>> for vhost.
>>>>
>>>> We are using tc to mirror/forward packets now. Implement in QEMU do
>>>> have some
>>>> limits, but there're also limits in kernel, if the packet do not pass
>>>> the host kernel TCP/IP stack, such as vhost-user.
>>>
>>> But the limits are much less than userspace, no? For vhost-user, maybe
>>> we could extend the backed to mirror the traffic also.
>>
>> IMO the limits are more or less. Besides, for mirror/forward packets,
>> using tc requires a separate physical nic or a vlan, the nic should not
>> be used for other purpose. if we implement it in QEMU, using an socket
>> connection to forward packets, we no longer need an separate nic, it will
>> reduce the network topology complexity.
>
> It depends on how do you design your user space. If you want using
> userspace to forward the packet, you can 1) use packet socket to capture
> all traffic on the tap that is used by VM 2) mirror the traffic to a new
> tap device, the user space can then read all traffic from this new tap
> device.

Yes, but we can also do it in QEMU, right? That would make life easier,
because we do everything in one solution within QEMU.

>
> .
>

-- 
Thanks,
Yang.


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27  5:51               ` Yang Hongyang
@ 2015-07-27  7:37                 ` Jason Wang
  2015-07-27  7:49                   ` Yang Hongyang
  2015-07-27  7:53                 ` Jason Wang
  1 sibling, 1 reply; 63+ messages in thread
From: Jason Wang @ 2015-07-27  7:37 UTC (permalink / raw)
  To: Yang Hongyang, Dong, Eddie, Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, qemu-devel, peter.huangpeng,
	Gonglei (Arei),
	stefanha, dgilbert



On 07/27/2015 01:51 PM, Yang Hongyang wrote:
> On 07/27/2015 12:49 PM, Jason Wang wrote:
>>
>>
>> On 07/27/2015 11:54 AM, Yang Hongyang wrote:
>>>
>>>
>>> On 07/27/2015 11:24 AM, Jason Wang wrote:
>>>>
>>>>
>>>> On 07/24/2015 04:04 PM, Yang Hongyang wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 07/24/2015 10:12 AM, Jason Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>>>>>>> Hi Stefan:
>>>>>>>       Thanks for your comments!
>>>>>>>
>>>>>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>>>>>>> We are planning to implement colo-proxy in qemu to cache and
>>>>>>>>> compare
>>>>>>>> packets.
>>>>>>>>
>>>>>>>> I thought there is a kernel module to do that?
>>>>>>>       Yes, that is the previous solution the COLO sub-community
>>>>>>> choose
>>>>>>> to go, but we realized it might be not the best choices, and
>>>>>>> thus we
>>>>>>> want to bring discussion back here :)  More comments are welcome.
>>>>>>>
>>>>>>
>>>>>> Hi:
>>>>>>
>>>>>> Could you pls describe more details on this decision? What's the
>>>>>> reason
>>>>>> that you realize it was not the best choice?
>>>>>
>>>>> Below is my opinion:
>>>>>
>>>>> We realized that there're disadvantages do it in kernel spaces:
>>>>> 1. We need to recompile kernel: the colo-proxy kernel module is
>>>>>      implemented as a nf conntrack extension. Adding a extension
>>>>> need to
>>>>>      modify the extension struct in-kernel, so recompile kernel is
>>>>> needed.
>>>>
>>>> There's no need to do all in kernel, you can use a separate process to
>>>> do the comparing and trigger the state sync through monitor.
>>>
>>> I don't get it, colo-proxy kernel module using a kthread do the
>>> comparing and
>>> trigger the state sync. We implemented it as a nf conntrack extension
>>> module,
>>> so we need to extend the extension struct in-kernel, although it just
>>> needs
>>> few lines changes to kernel, but a recompile of kernel is needed.
>>> Are you
>>> talking about not implement it as a nf conntrack extension?
>>
>> Yes, I mean implement the comparing in userspace but not in qemu.
>
> Yes, it is an alternative, that requires other components such as
> netfilter userspace tools, it will add the complexity I think, we
> wanted to implement a simple solution in QEMU. Another reason is
> that using other userspace tools will affect the performance, the
> context switch between kernel and userspace may be an overhead.
>
>>
>>>
>>>>
>>>>> 2. We need to recompile iptables/nftables to use together with the
>>>>> colo-proxy
>>>>>      kernel module.
>>>>> 3. Need to configure primary host to forward input packets to
>>>>> secondary as
>>>>>      well as configure secondary to forward output packets to primary
>>>>> host, the
>>>>>      network topology and configuration is too complex for a regular
>>>>> user.
>>>>>
>>>>
>>>> You can use current kernel primitives to mirror the traffic of both
>>>> PVM
>>>> and SVM to another process without any modification of kernel. And
>>>> qemu
>>>> can offload all network configuration to management in this case.  And
>>>> what's more import, this works for vhost. Filtering in qemu won't work
>>>> for vhost.
>>>
>>> We are using tc to mirror/forward packets now. Implement in QEMU do
>>> have some
>>> limits, but there're also limits in kernel, if the packet do not pass
>>> the host kernel TCP/IP stack, such as vhost-user.
>>
>> But the limits are much less than userspace, no? For vhost-user, maybe
>> we could extend the backed to mirror the traffic also.
>
> IMO the limits are more or less. Besides, for mirror/forward packets,
> using tc requires a separate physical nic or a vlan, the nic should not
> be used for other purpose. if we implement it in QEMU, using an socket
> connection to forward packets, we no longer need an separate nic, it will
> reduce the network topology complexity.

It depends on how you design your userspace. If you want userspace to
forward the packets, you can 1) use a packet socket to capture all traffic
on the tap device used by the VM, or 2) mirror the traffic to a new tap
device; the userspace process can then read all traffic from that new tap
device.
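[Option 2 above can be sketched with standard iproute2/tc commands; the
device names (tap0 for the VM's tap, tap1 for the comparer) are assumptions,
and the commands need root:

```shell
# Create a second tap device for the userspace comparer to read from.
ip tuntap add dev tap1 mode tap
ip link set tap1 up

# Mirror every packet arriving on the VM's tap (tap0) to tap1 using
# tc's mirred action; "match u8 0 0" is a catch-all filter.
tc qdisc add dev tap0 ingress
tc filter add dev tap0 parent ffff: protocol all u32 match u8 0 0 \
    action mirred egress mirror dev tap1
```

Option 1 would instead open an AF_PACKET socket bound to tap0 and read the
frames directly, with no extra device.]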


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27  4:49             ` Jason Wang
@ 2015-07-27  5:51               ` Yang Hongyang
  2015-07-27  7:37                 ` Jason Wang
  2015-07-27  7:53                 ` Jason Wang
  0 siblings, 2 replies; 63+ messages in thread
From: Yang Hongyang @ 2015-07-27  5:51 UTC (permalink / raw)
  To: Jason Wang, Dong, Eddie, Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, qemu-devel, peter.huangpeng,
	Gonglei (Arei),
	stefanha, dgilbert

On 07/27/2015 12:49 PM, Jason Wang wrote:
>
>
> On 07/27/2015 11:54 AM, Yang Hongyang wrote:
>>
>>
>> On 07/27/2015 11:24 AM, Jason Wang wrote:
>>>
>>>
>>> On 07/24/2015 04:04 PM, Yang Hongyang wrote:
>>>> Hi Jason,
>>>>
>>>> On 07/24/2015 10:12 AM, Jason Wang wrote:
>>>>>
>>>>>
>>>>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>>>>>> Hi Stefan:
>>>>>>       Thanks for your comments!
>>>>>>
>>>>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>>>>>> We are planning to implement colo-proxy in qemu to cache and
>>>>>>>> compare
>>>>>>> packets.
>>>>>>>
>>>>>>> I thought there is a kernel module to do that?
>>>>>>       Yes, that is the previous solution the COLO sub-community choose
>>>>>> to go, but we realized it might be not the best choices, and thus we
>>>>>> want to bring discussion back here :)  More comments are welcome.
>>>>>>
>>>>>
>>>>> Hi:
>>>>>
>>>>> Could you pls describe more details on this decision? What's the
>>>>> reason
>>>>> that you realize it was not the best choice?
>>>>
>>>> Below is my opinion:
>>>>
>>>> We realized that there're disadvantages do it in kernel spaces:
>>>> 1. We need to recompile kernel: the colo-proxy kernel module is
>>>>      implemented as a nf conntrack extension. Adding a extension need to
>>>>      modify the extension struct in-kernel, so recompile kernel is
>>>> needed.
>>>
>>> There's no need to do all in kernel, you can use a separate process to
>>> do the comparing and trigger the state sync through monitor.
>>
>> I don't get it, colo-proxy kernel module using a kthread do the
>> comparing and
>> trigger the state sync. We implemented it as a nf conntrack extension
>> module,
>> so we need to extend the extension struct in-kernel, although it just
>> needs
>> few lines changes to kernel, but a recompile of kernel is needed. Are you
>> talking about not implement it as a nf conntrack extension?
>
> Yes, I mean implement the comparing in userspace but not in qemu.

Yes, that is an alternative, but it requires other components such as the
netfilter userspace tools, which I think adds complexity; we wanted to
implement a simple solution in QEMU. Another reason is that using other
userspace tools will affect performance: the context switches between
kernel and userspace may be an overhead.
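[Wherever the comparer lives, the decision loop it runs is small. A minimal,
hypothetical sketch of COLO's release-or-checkpoint logic follows; real COLO
compares per TCP connection with protocol awareness, and all names here are
invented for illustration:

```c
#include <stdbool.h>
#include <string.h>

/* A queued response packet from one of the VMs. */
typedef struct {
    const char *data;
    size_t len;
} packet_t;

/* Compare queued response packets from the PVM and SVM pairwise.
 * Returns the number of leading packets that are identical and therefore
 * safe to release to the client; *checkpoint is set to true if a
 * divergence was found, which forces an on-demand VM checkpoint. */
size_t compare_and_release(const packet_t *pvm, size_t npvm,
                           const packet_t *svm, size_t nsvm,
                           bool *checkpoint)
{
    size_t n = npvm < nsvm ? npvm : nsvm;
    *checkpoint = false;
    for (size_t i = 0; i < n; i++) {
        if (pvm[i].len != svm[i].len ||
            memcmp(pvm[i].data, svm[i].data, pvm[i].len) != 0) {
            *checkpoint = true;  /* divergence: release only the first i */
            return i;
        }
    }
    return n;  /* all compared packets identical: release them all */
}
```

After a checkpoint both queues would be flushed and comparison restarted.]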

>
>>
>>>
>>>> 2. We need to recompile iptables/nftables to use together with the
>>>> colo-proxy
>>>>      kernel module.
>>>> 3. Need to configure primary host to forward input packets to
>>>> secondary as
>>>>      well as configure secondary to forward output packets to primary
>>>> host, the
>>>>      network topology and configuration is too complex for a regular
>>>> user.
>>>>
>>>
>>> You can use current kernel primitives to mirror the traffic of both PVM
>>> and SVM to another process without any modification of kernel. And qemu
>>> can offload all network configuration to management in this case.  And
>>> what's more import, this works for vhost. Filtering in qemu won't work
>>> for vhost.
>>
>> We are using tc to mirror/forward packets now. Implement in QEMU do
>> have some
>> limits, but there're also limits in kernel, if the packet do not pass
>> the host kernel TCP/IP stack, such as vhost-user.
>
> But the limits are much less than userspace, no? For vhost-user, maybe
> we could extend the backed to mirror the traffic also.

IMO the limits are comparable. Besides, to mirror/forward packets, using tc
requires a separate physical NIC or a VLAN, and that NIC cannot be used for
any other purpose. If we implement it in QEMU, using a socket connection to
forward packets, we no longer need a separate NIC, which reduces the
complexity of the network topology.
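[For illustration only: QEMU later gained a filter-mirror netdev filter that
implements this kind of socket-based forwarding. A sketch of a primary-side
invocation, assuming a QEMU new enough to have filter-mirror; the ids, port
and device names are placeholders:

```shell
# Primary node: mirror the guest NIC's outgoing traffic to a TCP socket
# that the peer/comparer side connects to.
qemu-system-x86_64 \
    -netdev tap,id=hn0 \
    -device virtio-net-pci,netdev=hn0 \
    -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server,nowait \
    -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0
```
]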


>
>>
>>>
>>>
>>>> You can refer to http://wiki.qemu.org/Features/COLO
>>>> to see the network topology and the steps to setup an env.
>>>
>>> The figure "COLO Framework" shows there's a proxy kernel module in
>>> primary node but in secondary node this is done through a process? This
>>> will complicate the environment a bit more.
>>
>> proxy kernel module also works for secondary node.
>>
>>>
>>>>
>>>> Setup a test env is too complex. The usability is so important to a
>>>> feature
>>>> like COLO which provide VM FT solution, if fewer people can/willing to
>>>> setup the env, the feature is useless. So we decide to develop user
>>>> space
>>>> colo-proxy.
>>>
>>> If the setup is too complex, need to consider to simplify or reuse codes
>>> and designs. Otherwise you probably introduce something new that needs
>>> fault tolerance.
>>>
>>>>
>>>> The advantage is obvious,
>>>> 1. we do not need to recompile kernel.
>>>> 2. No need to recompile iptables/nftables.
>>>
>>> As I descried above, looks like there's no need to modify kernel.
>>>
>>>> 3. we do not need to deal with the network configuration, we just
>>>> using a
>>>>      socket connection between 2 QEMUs to forward packets.
>>>
>>> All network configurations should be offloaded to management. And you
>>> still need a dedicated topology according to the wiki.
>>>
>>>> 4. A complete VM FT solution in one go, we have already developed the
>>>> block
>>>>      replication in QEMU, so with the network replication in QEMU, all
>>>>      components we needed are within QEMU, this is very important, it
>>>> greatly
>>>>      improves the usability of COLO feature! We hope it will gain more
>>>> testers,
>>>>      users and developers.
>>>
>>> Is your block solution works for vhost?
>>
>> No, it can't works for vhost and dataplane, migration also won't work
>> for dataplane IIRC.
>>
>>>
>>>> 5. QEMU will gain a complete VM FT solution and the most advantage FT
>>>> solution
>>>>      so far!
>>>>
>>>> Overall, usability is the most important factor that impact our choice.
>>>>
>>>>
>>>
>>> Usability will be improved if you can use exist primitives and decouple
>>> unnecessary codes from qemu.
>>>
>>> Thanks
>>>
>>>>>
>>>>> Thanks
>>>>> .
>>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
>
> .
>

-- 
Thanks,
Yang.


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27  3:54           ` Yang Hongyang
@ 2015-07-27  4:49             ` Jason Wang
  2015-07-27  5:51               ` Yang Hongyang
  0 siblings, 1 reply; 63+ messages in thread
From: Jason Wang @ 2015-07-27  4:49 UTC (permalink / raw)
  To: Yang Hongyang, Dong, Eddie, Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, qemu-devel, peter.huangpeng,
	Gonglei (Arei),
	stefanha, dgilbert



On 07/27/2015 11:54 AM, Yang Hongyang wrote:
>
>
> On 07/27/2015 11:24 AM, Jason Wang wrote:
>>
>>
>> On 07/24/2015 04:04 PM, Yang Hongyang wrote:
>>> Hi Jason,
>>>
>>> On 07/24/2015 10:12 AM, Jason Wang wrote:
>>>>
>>>>
>>>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>>>>> Hi Stefan:
>>>>>      Thanks for your comments!
>>>>>
>>>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>>>>> We are planning to implement colo-proxy in qemu to cache and
>>>>>>> compare
>>>>>> packets.
>>>>>>
>>>>>> I thought there is a kernel module to do that?
>>>>>      Yes, that is the previous solution the COLO sub-community choose
>>>>> to go, but we realized it might be not the best choices, and thus we
>>>>> want to bring discussion back here :)  More comments are welcome.
>>>>>
>>>>
>>>> Hi:
>>>>
>>>> Could you pls describe more details on this decision? What's the
>>>> reason
>>>> that you realize it was not the best choice?
>>>
>>> Below is my opinion:
>>>
>>> We realized that there're disadvantages do it in kernel spaces:
>>> 1. We need to recompile kernel: the colo-proxy kernel module is
>>>     implemented as a nf conntrack extension. Adding a extension need to
>>>     modify the extension struct in-kernel, so recompile kernel is
>>> needed.
>>
>> There's no need to do all in kernel, you can use a separate process to
>> do the comparing and trigger the state sync through monitor.
>
> I don't get it, colo-proxy kernel module using a kthread do the
> comparing and
> trigger the state sync. We implemented it as a nf conntrack extension
> module,
> so we need to extend the extension struct in-kernel, although it just
> needs
> few lines changes to kernel, but a recompile of kernel is needed. Are you
> talking about not implement it as a nf conntrack extension?

Yes, I mean implementing the comparing in userspace, but not in QEMU.

>
>>
>>> 2. We need to recompile iptables/nftables to use together with the
>>> colo-proxy
>>>     kernel module.
>>> 3. Need to configure primary host to forward input packets to
>>> secondary as
>>>     well as configure secondary to forward output packets to primary
>>> host, the
>>>     network topology and configuration is too complex for a regular
>>> user.
>>>
>>
>> You can use current kernel primitives to mirror the traffic of both PVM
>> and SVM to another process without any modification of kernel. And qemu
>> can offload all network configuration to management in this case.  And
>> what's more import, this works for vhost. Filtering in qemu won't work
>> for vhost.
>
> We are using tc to mirror/forward packets now. Implement in QEMU do
> have some
> limits, but there're also limits in kernel, if the packet do not pass
> the host kernel TCP/IP stack, such as vhost-user.

But the limits are much less than in userspace, no? For vhost-user, maybe
we could extend the backend to mirror the traffic as well.

>
>>
>>
>>> You can refer to http://wiki.qemu.org/Features/COLO
>>> to see the network topology and the steps to setup an env.
>>
>> The figure "COLO Framework" shows there's a proxy kernel module in
>> primary node but in secondary node this is done through a process? This
>> will complicate the environment a bit more.
>
> proxy kernel module also works for secondary node.
>
>>
>>>
>>> Setup a test env is too complex. The usability is so important to a
>>> feature
>>> like COLO which provide VM FT solution, if fewer people can/willing to
>>> setup the env, the feature is useless. So we decide to develop user
>>> space
>>> colo-proxy.
>>
>> If the setup is too complex, need to consider to simplify or reuse codes
>> and designs. Otherwise you probably introduce something new that needs
>> fault tolerance.
>>
>>>
>>> The advantage is obvious,
>>> 1. we do not need to recompile kernel.
>>> 2. No need to recompile iptables/nftables.
>>
>> As I descried above, looks like there's no need to modify kernel.
>>
>>> 3. we do not need to deal with the network configuration, we just
>>> using a
>>>     socket connection between 2 QEMUs to forward packets.
>>
>> All network configurations should be offloaded to management. And you
>> still need a dedicated topology according to the wiki.
>>
>>> 4. A complete VM FT solution in one go, we have already developed the
>>> block
>>>     replication in QEMU, so with the network replication in QEMU, all
>>>     components we needed are within QEMU, this is very important, it
>>> greatly
>>>     improves the usability of COLO feature! We hope it will gain more
>>> testers,
>>>     users and developers.
>>
>> Is your block solution works for vhost?
>
> No, it can't works for vhost and dataplane, migration also won't work
> for dataplane IIRC.
>
>>
>>> 5. QEMU will gain a complete VM FT solution and the most advantage FT
>>> solution
>>>     so far!
>>>
>>> Overall, usability is the most important factor that impact our choice.
>>>
>>>
>>
>> Usability will be improved if you can use exist primitives and decouple
>> unnecessary codes from qemu.
>>
>> Thanks
>>
>>>>
>>>> Thanks
>>>> .
>>>>
>>>
>>
>>
>> .
>>
>


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-27  3:24         ` Jason Wang
@ 2015-07-27  3:54           ` Yang Hongyang
  2015-07-27  4:49             ` Jason Wang
  0 siblings, 1 reply; 63+ messages in thread
From: Yang Hongyang @ 2015-07-27  3:54 UTC (permalink / raw)
  To: Jason Wang, Dong, Eddie, Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, qemu-devel, peter.huangpeng,
	Gonglei (Arei),
	stefanha, dgilbert



On 07/27/2015 11:24 AM, Jason Wang wrote:
>
>
> On 07/24/2015 04:04 PM, Yang Hongyang wrote:
>> Hi Jason,
>>
>> On 07/24/2015 10:12 AM, Jason Wang wrote:
>>>
>>>
>>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>>>> Hi Stefan:
>>>>      Thanks for your comments!
>>>>
>>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>>>> We are planning to implement colo-proxy in qemu to cache and compare
>>>>> packets.
>>>>>
>>>>> I thought there is a kernel module to do that?
>>>>      Yes, that is the previous solution the COLO sub-community choose
>>>> to go, but we realized it might be not the best choices, and thus we
>>>> want to bring discussion back here :)  More comments are welcome.
>>>>
>>>
>>> Hi:
>>>
>>> Could you pls describe more details on this decision? What's the reason
>>> that you realize it was not the best choice?
>>
>> Below is my opinion:
>>
>> We realized that there're disadvantages do it in kernel spaces:
>> 1. We need to recompile kernel: the colo-proxy kernel module is
>>     implemented as a nf conntrack extension. Adding a extension need to
>>     modify the extension struct in-kernel, so recompile kernel is needed.
>
> There's no need to do all in kernel, you can use a separate process to
> do the comparing and trigger the state sync through monitor.

I don't get it; the colo-proxy kernel module uses a kthread to do the comparing
and trigger the state sync. We implemented it as an nf conntrack extension
module, so we need to extend the extension struct in-kernel. Although that only
needs a few lines of changes to the kernel, a kernel recompile is still
required. Are you talking about not implementing it as an nf conntrack extension?
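[For context: conntrack extensions are registered through a fixed enum in a
kernel header, which is why even a tiny extension forces a kernel rebuild.
An illustrative fragment; the NF_CT_EXT_COLO entry is hypothetical, the
other ids exist in include/net/netfilter/nf_conntrack_extend.h:

```c
/* include/net/netfilter/nf_conntrack_extend.h (illustrative excerpt) */
enum nf_ct_ext_id {
	NF_CT_EXT_HELPER,
	NF_CT_EXT_NAT,
	NF_CT_EXT_SEQADJ,
	NF_CT_EXT_ACCT,
	/* ... */
	NF_CT_EXT_COLO,	/* hypothetical: per-connection COLO compare state */
	NF_CT_EXT_NUM,	/* must remain last: sizes the extension table */
};
```
]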

>
>> 2. We need to recompile iptables/nftables to use together with the
>> colo-proxy
>>     kernel module.
>> 3. Need to configure primary host to forward input packets to
>> secondary as
>>     well as configure secondary to forward output packets to primary
>> host, the
>>     network topology and configuration is too complex for a regular user.
>>
>
> You can use current kernel primitives to mirror the traffic of both PVM
> and SVM to another process without any modification of kernel. And qemu
> can offload all network configuration to management in this case.  And
> what's more import, this works for vhost. Filtering in qemu won't work
> for vhost.

We are using tc to mirror/forward packets now. Implementing it in QEMU does
have some limits, but there are also limits in the kernel when packets do not
pass through the host kernel TCP/IP stack, as with vhost-user.

>
>
>> You can refer to http://wiki.qemu.org/Features/COLO
>> to see the network topology and the steps to setup an env.
>
> The figure "COLO Framework" shows there's a proxy kernel module in
> primary node but in secondary node this is done through a process? This
> will complicate the environment a bit more.

The proxy kernel module also works on the secondary node.

>
>>
>> Setup a test env is too complex. The usability is so important to a
>> feature
>> like COLO which provide VM FT solution, if fewer people can/willing to
>> setup the env, the feature is useless. So we decide to develop user space
>> colo-proxy.
>
> If the setup is too complex, you need to consider simplifying it or reusing
> existing code and designs. Otherwise you will probably introduce something
> new that itself needs fault tolerance.
>
>>
>> The advantages are obvious:
>> 1. We do not need to recompile the kernel.
>> 2. No need to recompile iptables/nftables.
>
> As I described above, it looks like there's no need to modify the kernel.
>
>> 3. We do not need to deal with the network configuration; we just use a
>>     socket connection between the 2 QEMUs to forward packets.
>
> All network configurations should be offloaded to management. And you
> still need a dedicated topology according to the wiki.
>
>> 4. A complete VM FT solution in one go: we have already developed the block
>>     replication in QEMU, so with the network replication in QEMU, all the
>>     components we need are within QEMU. This is very important, as it
>>     greatly improves the usability of the COLO feature! We hope it will
>>     gain more testers, users and developers.
>
> Does your block solution work for vhost?

No, it can't work for vhost or dataplane; migration also won't work
for dataplane, IIRC.

>
>> 5. QEMU will gain a complete VM FT solution, and the most advanced FT
>>     solution so far!
>>
>> Overall, usability is the most important factor that impacts our choice.
>>
>>
>
> Usability will be improved if you can use existing primitives and decouple
> unnecessary code from qemu.
>
> Thanks
>
>>>
>>> Thanks
>>> .
>>>
>>
>
>
> .
>

-- 
Thanks,
Yang.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-24  8:04       ` Yang Hongyang
@ 2015-07-27  3:24         ` Jason Wang
  2015-07-27  3:54           ` Yang Hongyang
  2015-07-27 10:40         ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 63+ messages in thread
From: Jason Wang @ 2015-07-27  3:24 UTC (permalink / raw)
  To: Yang Hongyang, Dong, Eddie, Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, peter.huangpeng, qemu-devel,
	Gonglei (Arei),
	stefanha, dgilbert



On 07/24/2015 04:04 PM, Yang Hongyang wrote:
> Hi Jason,
>
> On 07/24/2015 10:12 AM, Jason Wang wrote:
>>
>>
>> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>>> Hi Stefan:
>>>     Thanks for your comments!
>>>
>>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>>> We are planning to implement colo-proxy in qemu to cache and compare
>>>> packets.
>>>>
>>>> I thought there is a kernel module to do that?
>>>     Yes, that is the previous solution the COLO sub-community chose to
>>> go with, but we realized it might not be the best choice, and thus we
>>> want to bring the discussion back here :)  More comments are welcome.
>>>
>>
>> Hi:
>>
>> Could you pls give more details on this decision? What's the reason
>> that you realized it was not the best choice?
>
> Below is my opinion:
>
> We realized that there are disadvantages to doing it in kernel space:
> 1. We need to recompile the kernel: the colo-proxy kernel module is
>    implemented as an nf conntrack extension. Adding an extension needs
>    modifying the extension struct in-kernel, so recompiling the kernel
>    is needed.

There's no need to do it all in the kernel; you can use a separate process to
do the comparing and trigger the state sync through the monitor.

> 2. We need to recompile iptables/nftables to use them together with the
>    colo-proxy kernel module.
> 3. We need to configure the primary host to forward input packets to the
>    secondary, as well as configure the secondary to forward output packets
>    to the primary host; the network topology and configuration are too
>    complex for a regular user.
>

You can use current kernel primitives to mirror the traffic of both PVM
and SVM to another process without any modification of the kernel. And qemu
can offload all network configuration to management in this case. And
what's more important, this works for vhost. Filtering in qemu won't work
for vhost.


> You can refer to http://wiki.qemu.org/Features/COLO
> to see the network topology and the steps to setup an env.

The figure "COLO Framework" shows there's a proxy kernel module in the
primary node, but on the secondary node this is done through a process? This
will complicate the environment a bit more.

>
> Setting up a test env is too complex. Usability is very important to a
> feature like COLO which provides a VM FT solution; if few people can or
> are willing to set up the env, the feature is useless. So we decided to
> develop a user-space colo-proxy.

If the setup is too complex, you need to consider simplifying it or reusing
existing code and designs. Otherwise you will probably introduce something
new that itself needs fault tolerance.

>
> The advantages are obvious:
> 1. We do not need to recompile the kernel.
> 2. No need to recompile iptables/nftables.

As I described above, it looks like there's no need to modify the kernel.

> 3. We do not need to deal with the network configuration; we just use a
>    socket connection between the 2 QEMUs to forward packets.

All network configurations should be offloaded to management. And you
still need a dedicated topology according to the wiki.

> 4. A complete VM FT solution in one go: we have already developed the block
>    replication in QEMU, so with the network replication in QEMU, all the
>    components we need are within QEMU. This is very important, as it greatly
>    improves the usability of the COLO feature! We hope it will gain more
>    testers, users and developers.

Does your block solution work for vhost?

> 5. QEMU will gain a complete VM FT solution, and the most advanced FT
>    solution so far!
>
> Overall, usability is the most important factor that impacts our choice.
>
>

Usability will be improved if you can use existing primitives and decouple
unnecessary code from qemu.

Thanks

>>
>> Thanks
>> .
>>
>


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-24  2:12     ` Jason Wang
@ 2015-07-24  8:04       ` Yang Hongyang
  2015-07-27  3:24         ` Jason Wang
  2015-07-27 10:40         ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 63+ messages in thread
From: Yang Hongyang @ 2015-07-24  8:04 UTC (permalink / raw)
  To: Jason Wang, Dong, Eddie, Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, qemu-devel, peter.huangpeng,
	Gonglei (Arei),
	stefanha, dgilbert

Hi Jason,

On 07/24/2015 10:12 AM, Jason Wang wrote:
>
>
> On 07/24/2015 10:04 AM, Dong, Eddie wrote:
>> Hi Stefan:
>> 	Thanks for your comments!
>>
>>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>>> We are planning to implement colo-proxy in qemu to cache and compare
>>> packets.
>>>
>>> I thought there is a kernel module to do that?
>> 	Yes, that is the previous solution the COLO sub-community chose to go with, but we realized it might not be the best choice, and thus we want to bring the discussion back here :)  More comments are welcome.
>>
>
> Hi:
>
> Could you pls give more details on this decision? What's the reason
> you realized it was not the best choice?

Below is my opinion:

We realized that there are disadvantages to doing it in kernel space:
1. We need to recompile the kernel: the colo-proxy kernel module is
    implemented as an nf conntrack extension. Adding an extension needs
    modifying the extension struct in-kernel, so recompiling the kernel is
    needed.
2. We need to recompile iptables/nftables to use them together with the
    colo-proxy kernel module.
3. We need to configure the primary host to forward input packets to the
    secondary, as well as configure the secondary to forward output packets
    to the primary host; the network topology and configuration are too
    complex for a regular user.

You can refer to http://wiki.qemu.org/Features/COLO
to see the network topology and the steps to setup an env.

Setting up a test env is too complex. Usability is very important to a feature
like COLO which provides a VM FT solution; if few people can or are willing to
set up the env, the feature is useless. So we decided to develop a user-space
colo-proxy.

The advantages are obvious:
1. We do not need to recompile the kernel.
2. No need to recompile iptables/nftables.
3. We do not need to deal with the network configuration; we just use a
    socket connection between the 2 QEMUs to forward packets.
4. A complete VM FT solution in one go: we have already developed the block
    replication in QEMU, so with the network replication in QEMU, all the
    components we need are within QEMU. This is very important, as it greatly
    improves the usability of the COLO feature! We hope it will gain more
    testers, users and developers.
5. QEMU will gain a complete VM FT solution, and the most advanced FT solution
    so far!

Overall, usability is the most important factor that impacts our choice.


>
> Thanks
> .
>

-- 
Thanks,
Yang.


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-24  2:04   ` Dong, Eddie
@ 2015-07-24  2:12     ` Jason Wang
  2015-07-24  8:04       ` Yang Hongyang
  0 siblings, 1 reply; 63+ messages in thread
From: Jason Wang @ 2015-07-24  2:12 UTC (permalink / raw)
  To: Dong, Eddie, Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, qemu-devel, peter.huangpeng,
	Gonglei (Arei),
	stefanha, Yang Hongyang, dgilbert



On 07/24/2015 10:04 AM, Dong, Eddie wrote:
> Hi Stefan:
> 	Thanks for your comments!
>
>> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>>> We are planning to implement colo-proxy in qemu to cache and compare
>> packets.
>>
>> I thought there is a kernel module to do that?
> 	Yes, that is the previous solution the COLO sub-community chose to go with, but we realized it might not be the best choice, and thus we want to bring the discussion back here :)  More comments are welcome.
>

Hi:

Could you pls give more details on this decision? What's the reason
you realized it was not the best choice?

Thanks


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-20  6:42 [Qemu-devel] [POC] colo-proxy " Li Zhijian
  2015-07-20 10:32 ` Stefan Hajnoczi
@ 2015-07-24  2:05 ` Dong, Eddie
  2015-07-30  4:23 ` Jason Wang
  2 siblings, 0 replies; 63+ messages in thread
From: Dong, Eddie @ 2015-07-24  2:05 UTC (permalink / raw)
  To: Li Zhijian, qemu-devel, stefanha, jasowang
  Cc: zhanghailiang, jan.kiszka, Dong, Eddie, dgilbert,
	peter.huangpeng, Gonglei (Arei),
	Yang Hongyang

BTW, I felt it would be better to call this an agency rather than a proxy. Any comments from a native speaker?

Thx Eddie



> Hi, all
> 
> We are planning to implement colo-proxy in qemu to cache and compare
> packets.
> This module is one of the important components of the COLO project and it is
> still in an early stage, so any comments and feedback are warmly welcomed;
> thanks in advance.
> 
> ## Background
> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
> project is a high availability solution. Both Primary VM (PVM) and Secondary
> VM
> (SVM) run in parallel. They receive the same request from client, and generate
> responses in parallel too. If the response packets from PVM and SVM are
> identical, they are released immediately. Otherwise, a VM checkpoint (on
> demand) is conducted.
> Paper:
> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
> COLO on Xen:
> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> COLO on Qemu/KVM:
> http://wiki.qemu.org/Features/COLO
> 
> Driven by the need to capture response packets from PVM and SVM and find out
> whether they are identical, we introduce a new module to qemu networking
> called colo-proxy.
> 
> This document describes the design of the colo-proxy module.
> 
> ## Glossary
>    PVM - Primary VM, which provides services to clients.
>    SVM - Secondary VM, a hot standby and replication of PVM.
>    PN - Primary Node, the host which PVM runs on
>    SN - Secondary Node, the host which SVM runs on
> 
> ## Workflow ##
> The following image show the qemu networking packet datapath between
> guest's NIC and qemu's backend in colo-proxy.
> 
> +---+                                        +---+
> |PN |                                        |SN |
> +---+--------------------------+             +------------------------------+
> |               +-------+      |             |   +-------+                  |
> +--------+      |chkpoint<--------[socket]------->chkpoint         +--------+
> |PVM     |      +---^---+      |             |   +---+---+         |SVM     |
> |        |  +proxy--v--------+ |             |       |             |        |
> |        |  |                | |             |       |             |        |
> | +---+  |  | +TCP/IP stack+ | |             | +-----v-------proxy | +---+  |
> +-|NIC|--+  | |            | | |             | |                 | +-|NIC|--+
> | +^-++     | | +--------+ | | |             | | +TCP/IP stack-+ |   +^--+  |
> |  | +------> | | compare| | <-[socket]-forward- | +--------+  | |    |     |
> |  |        | | +---+----+ | | |             | | | |seq&ack |  | <----+     |
> |  |        | +-----|------+ | |             | | | |adjust  |  | |          |
> |  |        |       |        | |             | | | +--------+  | |          |
> |  +-----------<+>-----copy&forward-[socket]---> +-------------+ |          |
> |           +---|---|--------+ |             | +------------^----+          |
> |               |   |          |             |              |               |
> |               |   |          |             |              x               |
> |            +--+---v----+     |             |            +-v---------+     |
> | QEMU       |  backend  |     |             | QEMU       |  backend  |     |
> +------------+  (tap)    +-----+             +------------+  (tap)    +-----+
>               +-----------+                                +-----------+
> 
> ## Our Idea ##
> 
> ### Net filter
> In current QEMU, a packet is transported between the networking backend (tap)
> and the qemu network adapter (NIC) directly. Backend and adapter are linked by
> NetClientState->peer in qemu as follows:
>         +----------------------------------------+
>         v                                        |
> +NetClientState+   +------->+NetClientState+    |
> |info->type=TAP|   |        |info->type=NIC|    |
> +--------------+   |        +--------------+    |
> |   *peer      +---+        |   *peer      +----+
> +--------------+            +--------------+
> |name="tap0"   |            |name="e1000"  |
> +--------------+            +--------------+
> | ...          |            | ...          |
> +--------------+            +--------------+
> 
> In COLO QEMU, we insert a net filter named colo-proxy between backend and
> adapter like below:
> typedef struct COLOState {
>      NetClientState nc;
>      NetClientState *peer;
> } COLOState;
>     +------->+NetClientState+            +NetClientState+<--------+
>     |        |info->type=TAP|            |info->type=NIC|         |
>     |        +--------------+            +--------------+         |
> +-----------+   *peer      |            |   *peer      +------------+
> |  |        +--------------+            +--------------+         |  |
> |  |        |name="tap0"   |            |name="e1000"  |         |  |
> |  |        +--------------+            +--------------+         |  |
> |  |        | ...          |            | ...          |         |  |
> |  |        +--------------+            +--------------+         |  |
> |  |                                                             |  |
> |  |   +-COLOState------------+       +-COLOState------------+   |  |
> +--------->+NetClientState+<- - +   +---->+NetClientState+<---------+
>     |   |   |info->type=COLO   | |   | |   |info->type=COLO   |   |
>     |   |   +--------------+   | |   | |   +--------------+   |   |
>     +-------+   *peer      |   | |   | |   |   *peer      +-------+
>         |   +--------------+   | |   | |   +--------------+   |
>         |   |name="colo1"  |   | |   | |   |name="colo2"  |   |
>         |   +--------------+   | |   | |   +--------------+   |
>         |                      | |   | |                      |
>         |   +--------------+---------+ |   +--------------+   |
>         |   |   *peer      |   | |     |   |   *peer      |   |
>         |   +--------------+   | +---------+--------------+   |
>         +----------------------+       +----------------------+
> 
> After we insert the colo-proxy filter, all packets will pass through this
> filter and, more importantly, we can analyze the packets ourselves.
> 
> ### QEMU space TCP/IP stack (re-use SLIRP) ###
> We need a QEMU space TCP/IP stack to help us analyze packets. After looking
> into QEMU, we found that SLIRP
> http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
> is a good choice for us. SLIRP provides a full TCP/IP stack within QEMU; it
> can help us handle the packets written to/read from the backend (tap)
> device, which are just like link layer (L2) packets.
> 
> ### packet enqueue and compare ###
> Together with the QEMU space TCP/IP stack, we enqueue all packets sent by
> PVM and SVM on the Primary QEMU, and then compare the packet payloads for
> each connection.
> 
> ### Net filter Usage ###
> On both the Primary and Secondary hosts, invoke QEMU with the following
> parameters to insert a net filter (colo-proxy):
>    "-netdev tap,id=hn0 -device e1000,netdev=hn0 \
>     -netdev colo,id=colo,backend=hn0"
> 



* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-20 10:32 ` Stefan Hajnoczi
  2015-07-20 11:55   ` zhanghailiang
  2015-07-20 12:02   ` Li Zhijian
@ 2015-07-24  2:04   ` Dong, Eddie
  2015-07-24  2:12     ` Jason Wang
  2 siblings, 1 reply; 63+ messages in thread
From: Dong, Eddie @ 2015-07-24  2:04 UTC (permalink / raw)
  To: Stefan Hajnoczi, Li Zhijian
  Cc: zhanghailiang, jan.kiszka, jasowang, Dong, Eddie, qemu-devel,
	peter.huangpeng, Gonglei (Arei),
	stefanha, Yang Hongyang, dgilbert

Hi Stefan:
	Thanks for your comments!

> 
> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
> > We are planning to implement colo-proxy in qemu to cache and compare
> packets.
> 
> I thought there is a kernel module to do that?

	Yes, that is the previous solution the COLO sub-community chose to go with, but we realized it might not be the best choice, and thus we want to bring the discussion back here :)  More comments are welcome.

> 
> Why does the proxy need to be part of the QEMU process?  -netdev socket or
> host network stack features allow you to process packets in a separate process.

	Hailiang did a very good summary, and we don't need privileged ops so far. The main thing that motivated us to revisit this is that the former kernel-land driver would be a pure virtualization (and high-availability) component. It might have only a limited number of users at the very beginning and thus be slower to be accepted. And people typically prefer to put a component in userland when it can be.

	In addition, as a pure virtualization feature, we guess people on the Qemu mailing list may be much more interested in supporting all the VMMs, such as pure Qemu VMM, KVM, etc.


Thx Eddie 


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-21  6:13         ` Jan Kiszka
@ 2015-07-21  9:49           ` Stefan Hajnoczi
  2015-07-27 10:13             ` Stefan Hajnoczi
  0 siblings, 1 reply; 63+ messages in thread
From: Stefan Hajnoczi @ 2015-07-21  9:49 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: zhanghailiang, Li Zhijian, Stefan Hajnoczi, Jason Wang,
	qemu-devel, Vasiliy Tolstov, Dave Gilbert, Gonglei (Arei),
	Huangpeng (Peter),
	Yang Hongyang


On Tue, Jul 21, 2015 at 08:13:42AM +0200, Jan Kiszka wrote:
> On 2015-07-20 17:01, Stefan Hajnoczi wrote:
> > On Mon, Jul 20, 2015 at 2:12 PM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
> >> 2015-07-20 14:55 GMT+03:00 zhanghailiang <zhang.zhanghailiang@huawei.com>:
> >>> Agreed. Besides, it seems that slirp does not support ipv6; we also
> >>> have to add support for it.
> >>
> >>
> >> A patch for ipv6 slirp support was sent to the qemu list some time ago,
> >> but I don't know why it was not accepted.
> > 
> > I think no one reviewed it but there was no objection against IPv6
> > support in principle.
> > 
> > Jan: Can we merge slirp IPv6 support for QEMU 2.5?
> 
> Sorry, as I pointed out some time back, I don't have the bandwidth to
> look into slirp. Someone needs to do a review, then send a pull request.

Do you want to remove yourself from the slirp section of the MAINTAINERS
file?

Going forward we'll need to find someone familiar with the QEMU
development process and with enough time to review slirp patches.

Stefan



* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-20 15:01       ` Stefan Hajnoczi
  2015-07-21  1:59         ` zhanghailiang
@ 2015-07-21  6:13         ` Jan Kiszka
  2015-07-21  9:49           ` Stefan Hajnoczi
  1 sibling, 1 reply; 63+ messages in thread
From: Jan Kiszka @ 2015-07-21  6:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: zhanghailiang, Li Zhijian, Jason Wang, Dave Gilbert,
	Vasiliy Tolstov, qemu-devel, Gonglei (Arei),
	Stefan Hajnoczi, Huangpeng (Peter),
	Yang Hongyang

On 2015-07-20 17:01, Stefan Hajnoczi wrote:
> On Mon, Jul 20, 2015 at 2:12 PM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
>> 2015-07-20 14:55 GMT+03:00 zhanghailiang <zhang.zhanghailiang@huawei.com>:
>>> Agreed. Besides, it seems that slirp does not support ipv6; we also
>>> have to add support for it.
>>
>>
>> A patch for ipv6 slirp support was sent to the qemu list some time ago,
>> but I don't know why it was not accepted.
> 
> I think no one reviewed it but there was no objection against IPv6
> support in principle.
> 
> Jan: Can we merge slirp IPv6 support for QEMU 2.5?

Sorry, as I pointed out some time back, I don't have the bandwidth to
look into slirp. Someone needs to do a review, then send a pull request.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-20 15:01       ` Stefan Hajnoczi
@ 2015-07-21  1:59         ` zhanghailiang
  2015-07-28 22:13           ` Samuel Thibault
  2015-07-21  6:13         ` Jan Kiszka
  1 sibling, 1 reply; 63+ messages in thread
From: zhanghailiang @ 2015-07-21  1:59 UTC (permalink / raw)
  To: Stefan Hajnoczi, J. Kiszka
  Cc: Li Zhijian, qemu-devel, Jason Wang, Dave Gilbert,
	Vasiliy Tolstov, peter.huangpeng, Gonglei (Arei),
	Stefan Hajnoczi, samuel.thibault, Yang Hongyang

On 2015/7/20 23:01, Stefan Hajnoczi wrote:
> On Mon, Jul 20, 2015 at 2:12 PM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
>> 2015-07-20 14:55 GMT+03:00 zhanghailiang <zhang.zhanghailiang@huawei.com>:
>>> Agreed. Besides, it seems that slirp does not support ipv6; we also
>>> have to add support for it.
>>
>>
>> A patch for ipv6 slirp support was sent to the qemu list some time ago,
>> but I don't know why it was not accepted.
>
> I think no one reviewed it but there was no objection against IPv6
> support in principle.
>
> Jan: Can we merge slirp IPv6 support for QEMU 2.5?
>

I have found the corresponding patch series:
https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg01832.html

Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>

Hi Samuel,

What's the status of the 'slirp: Adding IPv6 support to Qemu -net use' series?
I haven't found any news since that version; are you still trying to push it to qemu upstream?

Thanks,
zhanghailiang


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-20 13:12     ` Vasiliy Tolstov
@ 2015-07-20 15:01       ` Stefan Hajnoczi
  2015-07-21  1:59         ` zhanghailiang
  2015-07-21  6:13         ` Jan Kiszka
  0 siblings, 2 replies; 63+ messages in thread
From: Stefan Hajnoczi @ 2015-07-20 15:01 UTC (permalink / raw)
  To: J. Kiszka
  Cc: zhanghailiang, Li Zhijian, Jason Wang, Dave Gilbert,
	Vasiliy Tolstov, qemu-devel, Gonglei (Arei),
	Stefan Hajnoczi, Huangpeng (Peter),
	Yang Hongyang

On Mon, Jul 20, 2015 at 2:12 PM, Vasiliy Tolstov <v.tolstov@selfip.ru> wrote:
> 2015-07-20 14:55 GMT+03:00 zhanghailiang <zhang.zhanghailiang@huawei.com>:
>> Agreed. Besides, it seems that slirp does not support ipv6; we also
>> have to add support for it.
>
>
> A patch for ipv6 slirp support was sent to the qemu list some time ago,
> but I don't know why it was not accepted.

I think no one reviewed it but there was no objection against IPv6
support in principle.

Jan: Can we merge slirp IPv6 support for QEMU 2.5?

Stefan


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-20 11:55   ` zhanghailiang
@ 2015-07-20 13:12     ` Vasiliy Tolstov
  2015-07-20 15:01       ` Stefan Hajnoczi
  0 siblings, 1 reply; 63+ messages in thread
From: Vasiliy Tolstov @ 2015-07-20 13:12 UTC (permalink / raw)
  To: zhanghailiang
  Cc: Li Zhijian, Stefan Hajnoczi, jasowang, dgilbert, qemu-devel,
	Gonglei (Arei),
	stefanha, jan.kiszka, peter.huangpeng, Yang Hongyang

2015-07-20 14:55 GMT+03:00 zhanghailiang <zhang.zhanghailiang@huawei.com>:
> Agreed. Besides, it seems that slirp does not support ipv6; we also
> have to add support for it.


A patch for ipv6 slirp support was sent to the qemu list some time ago, but I
don't know why it was not accepted.

-- 
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-20 10:32 ` Stefan Hajnoczi
  2015-07-20 11:55   ` zhanghailiang
@ 2015-07-20 12:02   ` Li Zhijian
  2015-07-24  2:04   ` Dong, Eddie
  2 siblings, 0 replies; 63+ messages in thread
From: Li Zhijian @ 2015-07-20 12:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: zhanghailiang, jan.kiszka, jasowang, peter.huangpeng, qemu-devel,
	Gonglei (Arei),
	stefanha, Yang Hongyang, dgilbert

CC Wen Congyang

On 07/20/2015 06:32 PM, Stefan Hajnoczi wrote:
> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>> We are planning to implement colo-proxy in qemu to cache and compare packets.
> I thought there is a kernel module to do that?
>
> Why does the proxy need to be part of the QEMU process?  -netdev socket
> or host network stack features allow you to process packets in a
> separate process.
Yes, it used to be a kernel module.
We plan to re-implement a QEMU space colo-proxy for the following reasons:
1. The colo-proxy in kernel was based on netfilter; it was
implemented by adding a new nf_ct_ext_id, but this touches
the existing kernel code and we must re-build the kernel before
we install the colo-proxy modules. For this reason, fewer people
are willing to test colo-proxy and it becomes harder to post to the kernel.
2. COLO is the only use case of the colo-proxy in kernel.
3. The colo-proxy in kernel only works in the case where packets are
delivered to the kernel tcp/ip stack.

The COLO project mainly includes 3 components: COLO-Frame, COLO-Block and
COLO-Proxy.
The first two components are being posted to QEMU; if we integrate the proxy
into QEMU, it will become more convenient to manage the whole COLO project.
Furthermore, COLO will become easier to configure without depending on the
kernel.


> Without details on what the proxy does it's hard to discuss this.  What
> happens in the non-TCP case?  What happens in the TCP case?
More details will be posted soon.


> Does the proxy need to perform privileged operations, create sockets,
> open files, etc?
IMO, we just need to create a new socket like the migration socket to
forward packets between PVM and SVM.

Best regards.
Li Zhijian

> The slirp code is not actively developed or used much in production.  It
> might be a good idea to audit the code for bugs if you want to use it.
>
> Stefan


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-20 10:32 ` Stefan Hajnoczi
@ 2015-07-20 11:55   ` zhanghailiang
  2015-07-20 13:12     ` Vasiliy Tolstov
  2015-07-20 12:02   ` Li Zhijian
  2015-07-24  2:04   ` Dong, Eddie
  2 siblings, 1 reply; 63+ messages in thread
From: zhanghailiang @ 2015-07-20 11:55 UTC (permalink / raw)
  To: Stefan Hajnoczi, Li Zhijian
  Cc: qemu-devel, jan.kiszka, jasowang, dgilbert, peter.huangpeng,
	Gonglei (Arei),
	stefanha, Yang Hongyang

On 2015/7/20 18:32, Stefan Hajnoczi wrote:
> On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
>> We are planning to implement colo-proxy in qemu to cache and compare packets.
>
> I thought there is a kernel module to do that?
>

Yes, but we decided to re-implement it in userspace (here, in qemu).
There are mainly two reasons we made this change. One is that the colo-proxy
in kernel is narrowly used: it can only be used for COLO FT, and besides, we
have to modify iptables and nftables to support this capability. IMHO, it
would hardly be accepted by the kernel community.
The other reason is that the kernel proxy scheme can't be used in all
situations, for example evs + vhost-user + dpdk; it can't work if the VM's
network packets don't go through the host's network stack. (For the new
userspace colo proxy scheme, we also can't use it with vhost-net; we have to
use virtio-net instead.)

> Why does the proxy need to be part of the QEMU process?  -netdev socket
> or host network stack features allow you to process packets in a
> separate process.
>
> Without details on what the proxy does it's hard to discuss this.  What
> happens in the non-TCP case?  What happens in the TCP case?
>
> Does the proxy need to perform privileged operations, create sockets,
> open files, etc?
>
> The slirp code is not actively developed or used much in production.  It
> might be a good idea to audit the code for bugs if you want to use it.
>

Agreed. Besides, it seems that slirp does not support ipv6; we also
have to add support for it.

Thanks,
zhanghailiang


* Re: [Qemu-devel] [POC] colo-proxy in qemu
  2015-07-20  6:42 [Qemu-devel] [POC] colo-proxy " Li Zhijian
@ 2015-07-20 10:32 ` Stefan Hajnoczi
  2015-07-20 11:55   ` zhanghailiang
                     ` (2 more replies)
  2015-07-24  2:05 ` Dong, Eddie
  2015-07-30  4:23 ` Jason Wang
  2 siblings, 3 replies; 63+ messages in thread
From: Stefan Hajnoczi @ 2015-07-20 10:32 UTC (permalink / raw)
  To: Li Zhijian
  Cc: zhanghailiang, jan.kiszka, jasowang, peter.huangpeng, qemu-devel,
	Gonglei (Arei),
	stefanha, Yang Hongyang, dgilbert


On Mon, Jul 20, 2015 at 02:42:33PM +0800, Li Zhijian wrote:
> We are planning to implement colo-proxy in qemu to cache and compare packets.

I thought there is a kernel module to do that?

Why does the proxy need to be part of the QEMU process?  -netdev socket
or host network stack features allow you to process packets in a
separate process.

Without details on what the proxy does it's hard to discuss this.  What
happens in the non-TCP case?  What happens in the TCP case?

Does the proxy need to perform privileged operations, create sockets,
open files, etc?

The slirp code is not actively developed or used much in production.  It
might be a good idea to audit the code for bugs if you want to use it.

Stefan



* [Qemu-devel] [POC] colo-proxy in qemu
@ 2015-07-20  6:42 Li Zhijian
  2015-07-20 10:32 ` Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Li Zhijian @ 2015-07-20  6:42 UTC (permalink / raw)
  To: qemu-devel, stefanha, jasowang
  Cc: zhanghailiang, jan.kiszka, peter.huangpeng, dgilbert,
	Gonglei (Arei),
	Yang Hongyang

Hi, all

We are planning to implement colo-proxy in qemu to cache and compare packets.
This module is one of the important components of the COLO project and is still
in an early stage, so any comments and feedback are warmly welcomed.
Thanks in advance.

## Background
COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
project is a high availability solution. Both Primary VM (PVM) and Secondary VM
(SVM) run in parallel. They receive the same request from client, and generate
responses in parallel too. If the response packets from PVM and SVM are
identical, they are released immediately. Otherwise, a VM checkpoint (on demand)
is conducted.
Paper:
http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
COLO on Xen:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
COLO on Qemu/KVM:
http://wiki.qemu.org/Features/COLO

Driven by the need to capture response packets from the PVM and SVM and find
out whether they are identical, we introduce a new module, called colo-proxy,
into qemu networking.

This document describes the design of the colo-proxy module.

## Glossary
   PVM - Primary VM, which provides services to clients.
   SVM - Secondary VM, a hot standby and replica of PVM.
   PN - Primary Node, the host on which PVM runs.
   SN - Secondary Node, the host on which SVM runs.

## Workflow ##
The following diagram shows the qemu networking packet datapath between the
guest's NIC and qemu's backend with colo-proxy in place.

+---+                                        +---+
|PN |                                        |SN |
+---+--------------------------+             +------------------------------+
|               +-------+      |             |   +-------+                  |
+--------+      |chkpoint<--------[socket]------->chkpoint         +--------+
|PVM     |      +---^---+      |             |   +---+---+         |SVM     |
|        |  +proxy--v--------+ |             |       |             |        |
|        |  |                | |             |       |             |        |
| +---+  |  | +TCP/IP stack+ | |             | +-----v-------proxy | +---+  |
+-|NIC|--+  | |            | | |             | |                 | +-|NIC|--+
| +^-++     | | +--------+ | | |             | | +TCP/IP stack-+ |   +^--+  |
|  | +------> | | compare| | <-[socket]-forward- | +--------+  | |    |     |
|  |        | | +---+----+ | | |             | | | |seq&ack |  | <----+     |
|  |        | +-----|------+ | |             | | | |adjust  |  | |          |
|  |        |       |        | |             | | | +--------+  | |          |
|  +-----------<+>-----copy&forward-[socket]---> +-------------+ |          |
|           +---|---|--------+ |             | +------------^----+          |
|               |   |          |             |              |               |
|               |   |          |             |              x               |
|            +--+---v----+     |             |            +-v---------+     |
| QEMU       |  backend  |     |             | QEMU       |  backend  |     |
+------------+  (tap)    +-----+             +------------+  (tap)    +-----+
              +-----------+                                +-----------+

## Our Idea ##

### Net filter
In current QEMU, a packet is transported directly between the networking
backend (tap) and the qemu network adapter (NIC). Backend and adapter are
linked by NetClientState->peer in qemu, as shown below:
        +----------------------------------------+
        v                                        |
+NetClientState+   +------->+NetClientState+    |
|info->type=TAP|   |        |info->type=NIC|    |
+--------------+   |        +--------------+    |
|   *peer      +---+        |   *peer      +----+
+--------------+            +--------------+
|name="tap0"   |            |name="e1000"  |
+--------------+            +--------------+
| ...          |            | ...          |
+--------------+            +--------------+

In COLO QEMU, we insert a net filter named colo-proxy between backend and
adapter, as shown below:
typedef struct COLOState {
     NetClientState nc;
     NetClientState *peer;
} COLOState;
    +------->+NetClientState+            +NetClientState+<--------+
    |        |info->type=TAP|            |info->type=NIC|         |
    |        +--------------+            +--------------+         |
+-----------+   *peer      |            |   *peer      +------------+
|  |        +--------------+            +--------------+         |  |
|  |        |name="tap0"   |            |name="e1000"  |         |  |
|  |        +--------------+            +--------------+         |  |
|  |        | ...          |            | ...          |         |  |
|  |        +--------------+            +--------------+         |  |
|  |                                                             |  |
|  |   +-COLOState------------+       +-COLOState------------+   |  |
+--------->+NetClientState+<- - +   +---->+NetClientState+<---------+
    |   |   |info->type=COLO   | |   | |   |info->type=COLO   |   |
    |   |   +--------------+   | |   | |   +--------------+   |   |
    +-------+   *peer      |   | |   | |   |   *peer      +-------+
        |   +--------------+   | |   | |   +--------------+   |
        |   |name="colo1"  |   | |   | |   |name="colo2"  |   |
        |   +--------------+   | |   | |   +--------------+   |
        |                      | |   | |                      |
        |   +--------------+---------+ |   +--------------+   |
        |   |   *peer      |   | |     |   |   *peer      |   |
        |   +--------------+   | +---------+--------------+   |
        +----------------------+       +----------------------+

After we insert the colo-proxy filter, all packets will pass through it, and
more importantly, we can analyze the packets ourselves.
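The filter idea can be sketched in C. This is an illustrative sketch only: the hook and type names below are hypothetical, not the actual QEMU net filter API; it just shows the shape of a filter that sees every packet travelling between backend and adapter.

```c
/* Illustrative sketch only: the hook and type names here are
 * hypothetical, not the actual QEMU net filter API. A filter object
 * sits between backend and adapter, gets every packet in its receive
 * hook, and may analyze it before forwarding it to the peer. */
#include <stddef.h>
#include <stdint.h>

typedef void (*packet_fn)(const uint8_t *buf, size_t len);

typedef struct ColoFilter {
    packet_fn inspect;   /* colo-proxy analysis (copy, compare, ...) */
    packet_fn forward;   /* hand the packet on to the peer           */
} ColoFilter;

static void colo_filter_receive(ColoFilter *f,
                                const uint8_t *buf, size_t len)
{
    f->inspect(buf, len);   /* we get to look at every packet ...   */
    f->forward(buf, len);   /* ... and the normal datapath goes on. */
}

/* Demo hooks: just count the bytes seen on each side. */
static size_t inspected, forwarded;
static void demo_inspect(const uint8_t *b, size_t n) { (void)b; inspected += n; }
static void demo_forward(const uint8_t *b, size_t n) { (void)b; forwarded += n; }
```

The real filter would, of course, do more than count bytes in `inspect` (copy-and-forward to the SN, enqueue for comparison), but the datapath shape is the same.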

### QEMU space TCP/IP stack (re-use SLIRP) ###
We need a QEMU space TCP/IP stack to help us analyze packets. After looking
into QEMU, we found that SLIRP
http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
is a good choice for us. SLIRP provides a full TCP/IP stack within QEMU; it can
help us handle the packets written to/read from the backend (tap) device, which
are raw link-layer (L2) packets.
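As a concrete illustration of the kind of parsing such a stack does (e.g. for the "seq&ack adjust" step in the workflow diagram), here is a minimal, self-contained sketch, not SLIRP code, that pulls the TCP sequence and acknowledgment numbers out of a raw L2 frame read from tap, assuming untagged Ethernet and IPv4:

```c
/* Sketch (not SLIRP code): extract TCP seq/ack from a raw L2 frame
 * as read from the tap backend. Assumes untagged Ethernet + IPv4. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Returns 0 and fills seq/ack on success; -1 if not IPv4/TCP. */
static int parse_tcp_seq_ack(const uint8_t *frame, size_t len,
                             uint32_t *seq, uint32_t *ack)
{
    if (len < 14 + 20 + 20)
        return -1;                               /* too short        */
    uint16_t ethertype = (frame[12] << 8) | frame[13];
    if (ethertype != 0x0800)
        return -1;                               /* not IPv4         */
    const uint8_t *ip = frame + 14;
    size_t ihl = (ip[0] & 0x0f) * 4;             /* IP header length */
    if (ip[9] != 6 || len < 14 + ihl + 20)
        return -1;                               /* not TCP          */
    const uint8_t *tcp = ip + ihl;
    uint32_t v;
    memcpy(&v, tcp + 4, 4); *seq = ntohl(v);     /* sequence number  */
    memcpy(&v, tcp + 8, 4); *ack = ntohl(v);     /* ack number       */
    return 0;
}
```

On the SN this is the point where the proxy can rewrite seq/ack so that the SVM's view of the connection stays consistent with the PVM's.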

### packet enqueue and compare ###
Together with the QEMU space TCP/IP stack, we enqueue all packets sent by the
PVM and SVM on the Primary QEMU, and then compare the packet payloads for each
connection.
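The compare step itself reduces to a byte-for-byte payload comparison per connection; a minimal sketch follows (types and names are illustrative, not the actual implementation):

```c
/* Sketch of the per-connection compare step: payloads dequeued from
 * the PVM and SVM sides of one connection are compared byte-for-byte.
 * A mismatch means the outputs diverged and an on-demand checkpoint
 * is needed. Types and names are illustrative only. */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct Packet {
    const unsigned char *payload;
    size_t len;
} Packet;

/* true  -> identical: release the primary packet to the client
 * false -> divergence: trigger an on-demand VM checkpoint        */
static bool colo_compare(const Packet *pvm, const Packet *svm)
{
    return pvm->len == svm->len &&
           memcmp(pvm->payload, svm->payload, pvm->len) == 0;
}
```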

### Net filter Usage ###
On both the Primary and Secondary hosts, invoke QEMU with the following
parameters to insert the net filter (colo-proxy):
   "-netdev tap,id=hn0 -device e1000,netdev=hn0 \
    -netdev colo,id=colo,backend=hn0"


end of thread, other threads:[~2015-11-13 12:33 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-10  5:26 [Qemu-devel] [POC]colo-proxy in qemu Tkid
2015-11-10  7:35 ` Jason Wang
2015-11-10  8:30   ` zhanghailiang
2015-11-11  2:28     ` Jason Wang
2015-11-10  9:35   ` Tkid
2015-11-11  3:04     ` Jason Wang
2015-11-10  9:41   ` Dr. David Alan Gilbert
2015-11-11  3:09     ` Jason Wang
2015-11-11  9:03       ` Dr. David Alan Gilbert
2015-11-11  1:23   ` Dong, Eddie
2015-11-11  3:26     ` Jason Wang
2015-11-10 10:54 ` Dr. David Alan Gilbert
2015-11-11  2:46   ` Zhang Chen
2015-11-13 12:33     ` Dr. David Alan Gilbert
  -- strict thread matches above, loose matches on Subject: below --
2015-07-20  6:42 [Qemu-devel] [POC] colo-proxy " Li Zhijian
2015-07-20 10:32 ` Stefan Hajnoczi
2015-07-20 11:55   ` zhanghailiang
2015-07-20 13:12     ` Vasiliy Tolstov
2015-07-20 15:01       ` Stefan Hajnoczi
2015-07-21  1:59         ` zhanghailiang
2015-07-28 22:13           ` Samuel Thibault
2015-07-21  6:13         ` Jan Kiszka
2015-07-21  9:49           ` Stefan Hajnoczi
2015-07-27 10:13             ` Stefan Hajnoczi
2015-07-27 11:24               ` zhanghailiang
2015-07-27 11:31                 ` Samuel Thibault
2015-07-27 13:33               ` Jan Kiszka
2015-07-28 22:12                 ` Samuel Thibault
2015-07-29  7:36                   ` Jan Kiszka
2015-07-20 12:02   ` Li Zhijian
2015-07-24  2:04   ` Dong, Eddie
2015-07-24  2:12     ` Jason Wang
2015-07-24  8:04       ` Yang Hongyang
2015-07-27  3:24         ` Jason Wang
2015-07-27  3:54           ` Yang Hongyang
2015-07-27  4:49             ` Jason Wang
2015-07-27  5:51               ` Yang Hongyang
2015-07-27  7:37                 ` Jason Wang
2015-07-27  7:49                   ` Yang Hongyang
2015-07-27  8:06                     ` Jason Wang
2015-07-27  8:22                       ` Yang Hongyang
2015-07-27  7:53                 ` Jason Wang
2015-07-27  8:17                   ` Yang Hongyang
2015-07-27 18:33                   ` Dr. David Alan Gilbert
2015-07-27 10:40         ` Dr. David Alan Gilbert
2015-07-27 13:39           ` Yang Hongyang
2015-07-24  2:05 ` Dong, Eddie
2015-07-30  4:23 ` Jason Wang
2015-07-30  7:16   ` Gonglei
2015-07-30  7:47     ` Dong, Eddie
2015-07-30  8:03       ` Dr. David Alan Gilbert
2015-07-30  8:15         ` Jason Wang
2015-07-30 11:56           ` Dr. David Alan Gilbert
2015-07-30 12:10             ` Gonglei
2015-07-30 12:30               ` Dr. David Alan Gilbert
2015-07-30 12:42                 ` zhanghailiang
2015-07-30 13:59                   ` Dr. David Alan Gilbert
2015-07-30 15:17                     ` Yang Hongyang
2015-07-30 17:53                       ` Dr. David Alan Gilbert
2015-07-31  1:08                         ` Yang Hongyang
2015-07-31  1:28                           ` zhanghailiang
2015-07-31  1:31                             ` Yang Hongyang
2015-07-31  1:26                         ` zhanghailiang
