From: Jonathan Kirsch
Subject: Re: Remus blktap2 issue
Date: Wed, 8 Sep 2010 11:33:18 -0700
References: <20100908060101.GE2804@reaktio.net>
To: Xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Hi,

Thanks a lot for the patch.  Unfortunately, this did not solve the problem for me (after applying the patch on both primary and backup, rebuilding and installing xen/tools/stubdom, and then rebooting both hosts).  The backup is still unable to create the disk device when the fail-over occurs.  Thus, although I see checkpoint traffic flowing from primary to backup, the state of the backup's disk image is never modified (as judged by the image's last-modified time).  The backup does switch from "paused" to "running," but it consumes 100% CPU and when I connect to its vnc console it is as if the VM is frozen.  So *something* is being transferred, because I do see the screen from the primary, but obviously all is not right, because I can't interact with it at all.

Out of curiosity, in your working Remus deployment, which dom0 kernel are you running (and which version of Xen)?  I'm running Xen 4.0.1 and the pvops 2.6.31.14 dom0 kernel.  My understanding was that Remus supported pvops dom0 2.6.31.x.

Any other ideas regarding what this might be a symptom of?  My naive interpretation is that it is not a networking configuration problem (since state is being transferred), but that it has something to do with setting up the tapdisk via tapdisk2.

Thanks,
Jon

On Wed, Sep 8, 2010 at 1:50 AM, Shriram Rajagopalan <rshriram@gmail.com> wrote:
It's not just the tap2:remus:....

There is a bug lurking in tools/python/xen/remus/device.py, in the ReplicatedDisk class. The regular expression scans the domU config only for tap:tapdisk:remus... or tap:remus.. disk types. I was able to get it working by fixing that regexp.
This applies to Xen 4.0.1 only. I am not sure about xen-unstable.
Here is a patch that might be of help to you (it's rather crude, but heck, I was too lazy :) )
diff -r b536ebfba183 tools/python/xen/remus/device.py
--- a/tools/python/xen/remus/device.py  Wed Aug 25 09:22:42 2010 +0100
+++ b/tools/python/xen/remus/device.py  Fri Sep 03 08:47:13 2010 -0700
@@ -36,10 +36,13 @@
         # to request commits.
         self.ctlfd = None

-        if not disk.uname.startswith('tap:remus:') and not disk.uname.startswith('tap:tapdisk:remus:'):
+        if not disk.uname.startswith('tap2:remus:') and not disk.uname.startswith('tap:remus:') and not disk.uname.startswith('tap:tapdisk:remus:'):
             raise ReplicatedDiskException('Disk is not replicated: %s' %
                                         str(disk))
-        fifo = re.match("tap:.*(remus.*)\|", disk.uname).group(1).replace(':', '_')
+        if disk.uname.startswith('tap2:remus:'):
+            fifo = re.match("tap2:.*(remus.*)\|", disk.uname).group(1).replace(':', '_')
+        else:
+            fifo = re.match("tap:.*(remus.*)\|", disk.uname).group(1).replace(':', '_')
         absfifo = os.path.join(self.FIFODIR, fifo)
         absmsgfifo = absfifo + '.msg'
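
For anyone reading the patch outside the Xen tree, here is a rough standalone sketch of the two things it does: accept the three disk prefixes, and derive the FIFO name from the part of the uname before the "|". The helper name fifo_name, the use of ValueError instead of ReplicatedDiskException, and the address, port and image path in the usage lines are made up for illustration and are not part of device.py. Folding the tap/tap2 branches into a single "tap2?" pattern is also just one possible simplification, not what the patch itself does.

```python
import re

# Prefixes accepted after the patch (copied from the diff).
REPLICATED_PREFIXES = ('tap2:remus:', 'tap:remus:', 'tap:tapdisk:remus:')


def fifo_name(uname):
    """Hypothetical helper: derive the Remus FIFO name from a disk uname."""
    # str.startswith accepts a tuple, so one call covers all three prefixes.
    if not uname.startswith(REPLICATED_PREFIXES):
        raise ValueError('Disk is not replicated: %s' % uname)
    # Same pattern as the patch, with both branches folded into "tap2?".
    match = re.match(r"tap2?:.*(remus.*)\|", uname)
    if match is None:
        # The original code would fail with AttributeError on .group() here.
        raise ValueError('No "|" separator in uname: %s' % uname)
    return match.group(1).replace(':', '_')


# Illustrative values only: the host, port and path are assumptions.
print(fifo_name('tap2:remus:10.0.0.2:9000|aio:/images/domU.img'))
print(fifo_name('tap:tapdisk:remus:10.0.0.2:9000|aio:/images/domU.img'))
```

Both calls should give the same name, since only the text from "remus" up to the "|" ends up in the FIFO name.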


On Tue, Sep 7, 2010 at 11:01 PM, Pasi Kärkkäinen <pasik@iki.fi> wrote:
On Tue, Sep 07, 2010 at 03:28:32PM -0700, Jonathan Kirsch wrote:
>    Hello,
>
>    I have been playing around with Remus on Xen 4.0.1, attempting to
>    fail-over for an HVM domU.
>
>    I've run into some problems that I think could be related to tapdisk2 and
>    its interaction with how one sets up Remus disk replication in the domU
>    config file.
>
>    A few things I've noticed:
>
>    -The tap:remus:backupHostIP:port|aio:imagePath notation does not work for
>    me, although this is what is written in the Remus documentation.  However,
>    I have found the following to work (i.e., not complain when starting
>    domU), so this is what I've been using:
>
>    tap2:remus:backupHostIP:port|aio:imagePath...
>

Yeah, this stuff was changed in Xen 4.0.1:
http://wiki.xensource.com/xenwiki/blktap2

I guess someone should update the remus wiki page.

-- Pasi


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel



--
perception is but an offspring of its own self
