From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50614) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bHmai-0007Yv-Qq for qemu-devel@nongnu.org; Tue, 28 Jun 2016 02:33:50 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bHmaf-0000f3-Hc for qemu-devel@nongnu.org; Tue, 28 Jun 2016 02:33:48 -0400 Received: from [59.151.112.132] (port=3866 helo=heian.cn.fujitsu.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bHmae-0000Z8-73 for qemu-devel@nongnu.org; Tue, 28 Jun 2016 02:33:45 -0400 References: <1465902912-27527-1-git-send-email-zhangchen.fnst@cn.fujitsu.com> <1465902912-27527-4-git-send-email-zhangchen.fnst@cn.fujitsu.com> <57678CDA.7020700@redhat.com> <20160620121402.GC2891@work-vm> <576A020E.8040804@cn.fujitsu.com> <576A3174.2010905@redhat.com> <576BBE8D.1030009@cn.fujitsu.com> <576CCE68.90309@redhat.com> From: Zhang Chen Message-ID: <57721A55.8020906@cn.fujitsu.com> Date: Tue, 28 Jun 2016 14:33:57 +0800 MIME-Version: 1.0 In-Reply-To: <576CCE68.90309@redhat.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit Subject: Re: [Qemu-devel] [RFC PATCH 3/3] filter-rewriter: rewrite tcp packet to keep secondary connection List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jason Wang , "Dr. David Alan Gilbert" Cc: Yang Hongyang , "eddie . dong" , qemu devel , Li Zhijian , zhanghailiang On 06/24/2016 02:08 PM, Jason Wang wrote: > > > On 2016年06月23日 18:48, Zhang Chen wrote: >> >> >> On 06/22/2016 02:34 PM, Jason Wang wrote: >>> >>> >>> On 2016年06月22日 11:12, Zhang Chen wrote: >>>> >>>> >>>> On 06/20/2016 08:14 PM, Dr. David Alan Gilbert wrote: >>>>> * Jason Wang (jasowang@redhat.com) wrote: >>>>>> >>>>>> On 2016年06月14日 19:15, Zhang Chen wrote: >>>>>>> We will rewrite tcp packet secondary received and sent. >>>>>> More verbose please. E.g which fields were rewrote and why. >>>> >>>> OK. >>>> >>>>>>> Signed-off-by: Zhang Chen >>>>>>> Signed-off-by: Li Zhijian >>>>>>> Signed-off-by: Wen Congyang >>>>>>> --- >>>>>>> net/filter-rewriter.c | 94 >>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++-- >>>>>>> trace-events | 3 ++ >>>>>>> 2 files changed, 95 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c >>>>>>> index 12f88c5..86a2f53 100644 >>>>>>> --- a/net/filter-rewriter.c >>>>>>> +++ b/net/filter-rewriter.c >>>>>>> @@ -21,6 +21,7 @@ >>>>>>> #include "qemu/main-loop.h" >>>>>>> #include "qemu/iov.h" >>>>>>> #include "net/checksum.h" >>>>>>> +#include "trace.h" >>>>>>> #define FILTER_COLO_REWRITER(obj) \ >>>>>>> OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER) >>>>>>> @@ -64,6 +65,75 @@ static int is_tcp_packet(Packet *pkt) >>>>>>> } >>>>>>> } >>>>>>> +static int handle_primary_tcp_pkt(NetFilterState *nf, >>>>>>> + Connection *conn, >>>>>>> + Packet *pkt) >>>>>>> +{ >>>>>>> + struct tcphdr *tcp_pkt; >>>>>>> + >>>>>>> + tcp_pkt = (struct tcphdr *)pkt->transport_layer; >>>>>>> + >>>>>>> + if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) { >>>>>> Why not use tracepoints directly? >>>>> Because trace can't cope with you having to do an allocation/free. >>>>> >>>>>>> + char *sdebug, *ddebug; >>>>>>> + sdebug = strdup(inet_ntoa(pkt->ip->ip_src)); >>>>>>> + ddebug = strdup(inet_ntoa(pkt->ip->ip_dst)); >>>>>>> + fprintf(stderr, "%s: src/dst: %s/%s p: seq/ack=%u/%u" >>>>>>> + " flags=%x\n", __func__, sdebug, ddebug, >>>>>>> + ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack), >>>>>>> + tcp_pkt->th_flags); >>>>> However, this should use the trace_ call to write the result even >>>>> if it's >>>>> using trace_event_get_state to switch the whole block on/off. >>>> >>>> I will fix it in next version. >>>> >>>>> >>>>>>> + g_free(sdebug); >>>>>>> + g_free(ddebug); >>>>>>> + } >>>>>>> + >>>>>>> + if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK)) { >>>>>>> + /* save primary colo tcp packet seq */ >>>>>>> + conn->primary_seq = ntohl(tcp_pkt->th_ack) - 1; >>>>>> Looks like primary_seq will only be updated during handshake, I >>>>>> wonder how >>>>>> this works. >>>> >>>> OK. >>>> We assume that colo guest is a tcp server. >>>> >>>> Firstly, client start a tcp handshake. the packet's seq=client_seq, >>>> ack=0,flag=SYN. COLO primary guest get this pkt and >>>> mirror(filter-mirror) >>>> to secondary guest, secondary get it use filter-redirector. >>>> Then,primary guest response >>>> pkt(seq=primary_seq,ack=client_seq+1,flag=ACK|SYN). >>>> secondary guest response >>>> pkt(seq=secondary_seq,ack=client_seq+1,flag=ACK|SYN). >>>> In here,we use filter-rewriter save the secondary_seq to it's tcp >>>> connection. >>>> Finally handshake,client send >>>> pkt(seq=client_seq+1,ack=primary_seq+1,flag=ACK). >>>> Here,filter-rewriter can get primary_seq, and rewrite ack from >>>> primary_seq+1 >>>> to secondary_seq+1, recalculate checksum. So the secondary tcp >>>> connection >>>> kept good. >>>> >>>> When we send/recv packet. >>>> client send >>>> pkt(seq=client_seq+1+data_len,ack=primary_seq+1,flag=ACK|PSH). >>>> filter-rewriter rewrite ack and send to secondary guest. >>> >>> If I read your code correctly, secondary_seq will only be updated >>> during handshake. So the ack seq will always be same for each packet >>> received by secondary? >> >> Yes. I don't know why kernel do this. But I dump the packet hex >> found that, >> the ack packet flag=ACK means only ack enabled.and the seq will >> affect tcp checksum >> make connection failed. >> > > Not sure I get your meaning, but basically the code here should not > have any assumptions on guest behaviors. Yes. I get your point. > >> >>> >>>> primary guest response >>>> pkt(seq=primary_seq+1,ack=client_seq+1+data_len,flag=ACK) >>>> secondary guest response >>>> pkt(seq=secondary_seq+1,ack=client_seq+1+data_len,flag=ACK) >>> >>> Is ACK a must here? >> >> Yes. >> > > Looks not, e.g what happens if guest does not use piggybacking acks? > > If guest does not use piggybacking acks, it will send a independent packet for ack. we will get this packet. like: pkt(seq=xxxx,ack=xxx,flag=ACK). Thanks Zhang Chen > . > -- Thanks zhangchen