From: Daniel Cho
Date: Tue, 3 Dec 2019 17:08:39 +0800
Subject: Re: Network connection with COLO VM
To: "Dr. David Alan Gilbert"
Cc: "Zhang, Chen", "lukasstraub2@web.de", "qemu-devel@nongnu.org"
In-Reply-To: <20191202095806.GA2904@work-vm>

Hi Dave,

We could use the existing interface to add the netfilter and chardev; that might avoid the problem you described.

However, if the netfilter and chardev must be set up on the primary at the start, does that mean we cannot enable the COLO feature on a VM dynamically?

We tried changing this chardev so that the primary VM does not get stuck waiting for the secondary VM, from
-chardev socket,id=compare1,host=127.0.0.1,port=9004,server,wait \
to
-chardev socket,id=compare1,host=127.0.0.1,port=9004,server,nowait \
However, this leaves the primary VM's network not working (it can't get an IP address) until the secondary VM connects.
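The difference between the two chardev lines is whether QEMU blocks on the listening socket until a client connects (`wait`) or carries on immediately (`nowait`). As a rough illustration only, not QEMU code, the Python sketch below models the two modes with a plain TCP listener. A hypothetical `connect_later` thread stands in for the secondary side attaching after a delay. With `wait`, start-up is held until that peer attaches. With `nowait`, start-up proceeds at once but nothing is connected to the socket yet. All function names are made up for the sketch.

```python
import socket
import threading
import time


def listener() -> socket.socket:
    """Open a listening TCP socket on 127.0.0.1 (port 0: the OS picks a free port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    return srv


def connect_later(port: int, delay: float) -> None:
    """Hypothetical stand-in for the secondary side attaching after `delay` seconds."""
    time.sleep(delay)
    try:
        socket.create_connection(("127.0.0.1", port), timeout=1).close()
    except OSError:
        pass  # listener already closed: nobody was waiting for this peer


def start_guest(srv: socket.socket, wait: bool):
    """Model 'server,wait' vs 'server,nowait' on a listening chardev.

    Returns (peer socket or None, seconds spent before the guest could start).
    """
    t0 = time.monotonic()
    peer = None
    if wait:
        peer, _ = srv.accept()  # blocks until a client connects
    return peer, time.monotonic() - t0


if __name__ == "__main__":
    for wait in (True, False):
        srv = listener()
        port = srv.getsockname()[1]
        threading.Thread(target=connect_later, args=(port, 0.3), daemon=True).start()
        peer, elapsed = start_guest(srv, wait)
        print(f"wait={wait}: started after {elapsed:.2f}s, peer attached: {peer is not None}")
        if peer is not None:
            peer.close()
        srv.close()
```

The sketch shows only the blocking behaviour of the socket. It does not model what the COLO filters do with packets while no peer is attached.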


Also, the primary VM's boot time differs greatly depending on whether the netfilter / chardev are configured:
Without netfilter / chardev: about 1 minute
With netfilter / chardev: about 5 minutes
Is this an issue?

Best regards,
Daniel Cho


On Mon, Dec 2, 2019 at 5:58 PM, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
* Daniel Cho (danielcho@qnap.com) wrote:
> Hi Zhang,
>
> We use the qemu-4.1.0 release in this case.
>
> I think we need to use block mirror to sync the disk to the secondary node first,
> then stop the primary VM and build the COLO system.
>
> At the moment the primary VM is stopped, you need to add some netfilter and chardev socket nodes for
> COLO; you may need to re-check this part.
>
>
> Our test already follows those steps. Let me describe the details
> of the test flow and the issues.
>
>
> Step 1:
>
> Create the primary VM without any netfilter or chardev for COLO, and use
> another host to ping the primary VM continually.
>
>
> Step 2:
>
> Create the secondary VM (with the same devices/drives as the primary VM), and do a block
> mirror sync (pinging the primary VM works normally).
>
>
> Step 3:
>
> After the block mirror sync finishes, add the netfilter and chardev for COLO to the primary
> and secondary VMs (we *can't* ping the primary VM, but those packets
> are received later).
>
>
> Step 4:
>
> Start migrating the primary VM to the secondary VM; the primary and secondary VMs are then
> running (pinging the primary VM works, and the packets sent during step 3
> are received).
>
>
>
>
> Going from Step 3 to Step 4 takes 10~20 seconds in our environment.
>
> I imagine this issue (delayed reply packets) is caused by the COLO
> proxy being in a temporary state,
>
> but we think 10~20 seconds is a little long. (If the primary VM is already
> doing some work, it might lose data.)
>
>
> Could we reduce this time? Or does the delay depend on the VM?

I think you need to set up the netfilter and chardev on the primary at
the start; the filter contains the state of the TCP connections working
with the VM, so adding it later can't gain that state for existing
connections.
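Dave's point about missing connection state can be illustrated with a toy model. The Python sketch below is hypothetical and is not how QEMU's COLO filters are implemented. It assumes only what Dave describes: the filter keeps per-TCP-connection state, and that state is created when the filter sees a connection start. A tracker attached after a connection was set up therefore has nothing to look up for it. `Packet` and `ToyConnTracker` are made-up names.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

FlowKey = Tuple[str, int, str, int]  # (ip_a, port_a, ip_b, port_b), direction-independent


@dataclass
class Packet:
    src: str
    sport: int
    dst: str
    dport: int
    syn: bool = False      # True for the packet that opens a TCP connection
    payload: bytes = b""


@dataclass
class ToyConnTracker:
    """Hypothetical per-connection state table; NOT QEMU's filter code.

    An entry is created only when a connection's opening SYN is seen, so a
    tracker attached after the connection was established has no state for it.
    """
    table: Dict[FlowKey, int] = field(default_factory=dict)  # flow -> payload bytes seen

    @staticmethod
    def key(p: Packet) -> FlowKey:
        # Sort the two endpoints so both directions map to the same entry.
        lo, hi = sorted([(p.src, p.sport), (p.dst, p.dport)])
        return (lo[0], lo[1], hi[0], hi[1])

    def observe(self, p: Packet) -> bool:
        """Record the packet; return False if its connection is unknown."""
        k = self.key(p)
        if p.syn:
            self.table.setdefault(k, 0)
        if k not in self.table:
            return False
        self.table[k] += len(p.payload)
        return True


if __name__ == "__main__":
    syn = Packet("10.0.0.2", 40000, "10.0.0.10", 22, syn=True)
    reply = Packet("10.0.0.10", 22, "10.0.0.2", 40000, payload=b"hello")

    at_start = ToyConnTracker()     # filter present from the start
    added_later = ToyConnTracker()  # filter added after the SYN went by
    at_start.observe(syn)
    print(at_start.observe(reply))     # True
    print(added_later.observe(reply))  # False
```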

Dave

>
> Best regards,
>
> Daniel Cho.
>
>
>
> On Sat, Nov 30, 2019 at 2:04 AM, Zhang, Chen <chen.zhang@intel.com> wrote:
> >
> >
> >
> >
> > *From:* Daniel Cho <danielcho@qnap.com>
> > *Sent:* Friday, November 29, 2019 10:43 AM
> > *To:* Zhang, Chen <chen.zhang@intel.com>
> > *Cc:* Dr. David Alan Gilbert <dgilbert@redhat.com>; lukasstraub2@web.de;
> > qemu-devel@nongnu.org
> > *Subject:* Re: Network connection with COLO VM
> >
> >
> >
> > Hi David, Zhang,
> >
> >
> >
> > Thanks for replying to my question.
> >
> > We now understand why this issue occurs.
> >
> > As you said, the COLO VM's network needs the
> > colo-proxy to control packets, so the filter should be set
> > on the guest's interface to solve the problem.
> >
> >
> >
> > But we found another problem: when we enable the
> > fault-tolerance feature on the guest (primary VM running,
> > secondary VM paused), the guest's network does not
> > respond to any request for a while (about 20~30 secs in our
> > environment) after the secondary VM starts running.
> >
> >
> >
> > Is this normal, or is it a known issue?
> >
> >
> >
> > Our test creates the primary VM, runs it for a while, then creates
> > the secondary VM to enable the COLO feature.
> >
> >
> >
> > Hi Daniel,
> >
> >
> >
> > Happy to hear you have solved the ssh disconnection issue.
> >
> >
> >
> > Are you using Lukas’s patch in this case?
> >
> > I think we need to use block mirror to sync the disk to the secondary node first,
> > then stop the primary VM and build the COLO system.
> >
> > At the moment the primary VM is stopped, you need to add some netfilter and chardev socket nodes
> > for COLO; you may need to re-check this part.
> >
> >
> >
> > Best regards,
> >
> > Daniel Cho
> >
> >
> >
> > On Thu, Nov 28, 2019 at 9:26 AM, Zhang, Chen <chen.zhang@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > Sent: Wednesday, November 27, 2019 6:51 PM
> > > To: Daniel Cho <danielcho@qnap.com>; Zhang, Chen
> > > <chen.zhang@intel.com>; lukasstraub2@web.de
> > > Cc: qemu-devel@nongnu.org
> > > Subject: Re: Network connection with COLO VM
> > >
> > > * Daniel Cho (danielcho@qnap.com) wrote:
> > > > Hello everyone,
> > > >
> > > > Could we ssh to a COLO VM (meaning the PVM & SVM have been started)?
> > > >
> > >
> > > Let's cc in Zhang Chen and Lukas Straub.
> >
> > Thanks Dave.
> >
> > >
> > > > SSH connects to the COLO VM for a while, but then disconnects with the
> > > > error
> > > > *client_loop: send disconnect: Broken pipe*
> > > >
> > > > It seems the COLO VM cannot keep the network session alive.
> > > >
> > > > Is this a known issue?
> > >
> > > That sounds like the COLO proxy is getting upset; it's supposed to compare
> > > packets sent by the primary and secondary and only send one to the outside
> > > - you shouldn't be talking directly to the guest, but always via the proxy.
> > > See docs/colo-proxy.txt
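The compare step Dave describes can be pictured with a deliberately simplified model. The Python sketch below is hypothetical and is not QEMU's colo-compare, which compares per connection and has logic this sketch leaves out. Here each guest's outgoing packets are queued and paired in arrival order. A primary packet is released only once the secondary has produced its counterpart, and only one copy reaches the outside. A mismatch is counted; in COLO, a mismatch is what leads to a checkpoint. `ToyComparer` is a made-up name.

```python
from collections import deque
from typing import Deque, List


class ToyComparer:
    """Hypothetical model of the compare step; NOT QEMU's colo-compare.

    Outgoing packets from each side are queued; a primary packet leaves only
    after the secondary has produced its counterpart, and only one copy goes
    to the outside.
    """

    def __init__(self) -> None:
        self.primary: Deque[bytes] = deque()
        self.secondary: Deque[bytes] = deque()
        self.released: List[bytes] = []  # what the outside world sees
        self.mismatches = 0              # in COLO, a mismatch leads to a checkpoint

    def from_primary(self, pkt: bytes) -> None:
        self.primary.append(pkt)
        self._compare()

    def from_secondary(self, pkt: bytes) -> None:
        self.secondary.append(pkt)
        self._compare()

    def _compare(self) -> None:
        # Pair packets in arrival order; unmatched primary packets keep waiting.
        while self.primary and self.secondary:
            p, s = self.primary.popleft(), self.secondary.popleft()
            if p != s:
                self.mismatches += 1  # a real setup would resynchronise here
            self.released.append(p)   # the primary's copy is the one sent out


if __name__ == "__main__":
    proxy = ToyComparer()
    proxy.from_primary(b"reply-1")
    print(proxy.released)    # [] : no secondary packet yet, so nothing leaves
    proxy.from_secondary(b"reply-1")
    print(proxy.released)    # [b'reply-1']
    proxy.from_primary(b"reply-2a")
    proxy.from_secondary(b"reply-2b")
    print(proxy.mismatches)  # 1
```

In this model, a primary packet with no secondary counterpart simply stays queued. That is one way to picture why traffic stalls when no secondary is attached, but the real proxy's behaviour should be checked against docs/colo-proxy.txt.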
> > >
> >
> > Hi Daniel,
> >
> > I have tried ssh to the COLO guest for 8 hours and did not hit this issue.
> > Please check your network/qemu configuration.
> > But I found another problem that may be related to this issue: if there is no network
> > communication for a period of time (maybe 10 min), the first message sent to the
> > guest may be delayed (by maybe 1-5 sec). I will try to fix it when I
> > have time.
> >
> > Thanks
> > Zhang Chen
> >
> > > Dave
> > >
> > > > Best regards,
> > > > Daniel Cho
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
