From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:56088) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1alZ8F-0002WH-FH for qemu-devel@nongnu.org; Thu, 31 Mar 2016 05:43:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1alZ8C-0001ev-77 for qemu-devel@nongnu.org; Thu, 31 Mar 2016 05:43:15 -0400 Received: from mx1.redhat.com ([209.132.183.28]:52642) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1alZ8B-0001ei-W2 for qemu-devel@nongnu.org; Thu, 31 Mar 2016 05:43:12 -0400 Date: Thu, 31 Mar 2016 10:43:06 +0100 From: "Dr. David Alan Gilbert" Message-ID: <20160331094305.GE2265@work-vm> References: <1459326950-17708-1-git-send-email-zhangchen.fnst@cn.fujitsu.com> <20160330120544.GB7580@work-vm> <56FC92ED.8000401@cn.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: <56FC92ED.8000401@cn.fujitsu.com> Subject: Re: [Qemu-devel] [PATCH V2 0/3] Introduce COLO-compare List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Li Zhijian Cc: Zhang Chen , Gui jianfeng , Jason Wang , "eddie.dong" , qemu devel , Yang Hongyang , zhanghailiang * Li Zhijian (lizhijian@cn.fujitsu.com) wrote: >=20 >=20 > On 03/30/2016 08:05 PM, Dr. David Alan Gilbert wrote: > >* Zhang Chen (zhangchen.fnst@cn.fujitsu.com) wrote: > >>COLO-compare is a part of COLO project. It is used > >>to compare the network package to help COLO decide > >>whether to do checkpoint. > > > >Hi Zhang Chen, > > I've put comments on the individual patches, but some more general th= ings: > > > > 1) Please add a coment giving the example of the command line for the= primary > > and secondary use of this module - it helps make it easier to under= stand the patches. > > > > 2) There's no tracing in here - please add some; I found when I tried= to get > > COLO working I needed to use lots of tracing and debugging to under= stand the > > packet flow. > > > > 3) Add comments; e.g. for each function say which thread is using it = and where > > the packets are coming from; e.g. > > called from the main thread on the primary for packets arriving= over the socket > > from the secondary. > > > > There's just so many packets going in so many directions it would = make it > > easier to follow. > > > > 4) A more fundamental problem is what happens if the secondary never = sends anything > > on the socket, the result is you end up running until the end of t= he long COLO > > checkpoint without triggering a discompare - in my world I added a= timeout (400ms) > > for an unmatched packet from the primary, where if no matching pac= ket was received > > a checkpoint would be triggered. > > > > 5) I see the packet comparison is still the simple memcmpy that you h= ad in December; > > are you planning on doing anything more complicated; you must be s= eing most packets > > miscompare? > > > >You can see my current world at; https://github.com/orbitfp7/qemu/commit= s/orbit-wp4-colo-mar16 > >which has my basic TCP comparison (it's only tracking incoming connectio= ns) and I know it's > >not complete either. It mostly works OK, although I've got an occasiona= l seg > >(which makes me wonder if I need to add the conn_list_lock I see you add= ed). I'm also > >not doing any TCP reassembly which is probably needed. > > > Thank you very much for your comments. > I just see you tree, you put in a lot of work(tcp comparison enhance, seq= uence/acknowledge > number re-write, timeout...) >=20 > Actually, this compare module is just in a RFC stage(only including compa= re frame), there are > many works to be done: >=20 > 1) Integrate to COLO frame(and Let COLO primary and secondary at running = state) Yes; although I think you've had most of the code for that, also feel free = to use any of the bits I've changed. > 2) ip segment defrag Yes, this seems the hardest bit to me. It is needed for some workloads; for= example a simple case if just an ssh connection, the differences in timing between = the primary and secondary you see that the primary might send two short packets and the= secondary might send one long packet with the same data. > 3) comparison base on the sequence number(tcp and udp) if packet has > Because tcp re-transmission is quit common. IRC, your code will compar= e the whole tcp > packet(sequence number will be compare) Yes; once the secondary reworks the sequence number I found the seuence num= ber matched most of the time on the primary. One problem (depending on the tr= affic) is that the ack number might not match and I'm not sure of the best fix, fo= r example, consider: a) Send packet 1 b) Send packet 2 c) Receive response The order of b & c is random - if the response on the same socket arrives= (c) before (b) then the ack number in packet 2 is different; on one workload this caused a= lot of miscompares but only if the timing is just wrong. (I also had to turn off TCP timestamping to get useful comparisons) > 4) packet belongs to the same connection is sort by sequence number >=20 > 5) Out-Of-Oder packet handle I think 4 & 5 both happen as part of the defrag; out of order packets were = a problem for me on some of the workloads; I had to turn off multiqueue in one case. > 6) cleanup the un-active conn_list which maybe closed. the simple way is = to introduce a > timer to record whether a connection have packet come within a timeout= , connection gone > beyond this timeout should be cleanup. At the moment that isn't a big problem because when you receive the next ch= eckpoint you can flush the list. The only case where you have to deal with this is for continuous failover w= hen the original secondary is promoted to primary, then it's connection list has to= live on longer for any connections it created prior to the failover. Perhaps this gets more complex with defrag? > 7) Dave point out above (4) >=20 > 8) something I miss... >=20 > For Various reasons, not all the works can be done immediately, So we hop= e to discuss and > decide which function have the high priority. > Any comments and suggestions are welcome. Yes, there's a lot of work; as I say, feel free to use any of my patches =66rom the world above, I wasn't planning on doing much more work on that s= et. > IMO, a compare frame and a COLO frame hack patch could be simple enough. I think you'd have to show that you got some useful comparison matches; if it almost always failed the comparison then I can't see the point. Dave >=20 > Thanks > Li >=20 > >Dave > > > >>v2: > >> - add jhash.h > >> > >>v1: > >> - initial patch > >> > >> > >>Zhang Chen (3): > >> colo-compare: introduce colo compare initlization > >> colo-compare: track connection and enqueue packet > >> colo-compare: introduce packet comparison thread > >> > >> include/qemu/jhash.h | 59 ++++ > >> net/Makefile.objs | 1 + > >> net/colo-compare.c | 782 ++++++++++++++++++++++++++++++++++++++++++= +++++++++ > >> vl.c | 3 +- > >> 4 files changed, 844 insertions(+), 1 deletion(-) > >> create mode 100644 include/qemu/jhash.h > >> create mode 100644 net/colo-compare.c > >> > >>-- > >>1.9.1 > >> > >> > >> > >-- > >Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > > > >. > > >=20 > --=20 > Best regards. > Li Zhijian (8555) >=20 >=20 -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK