Hi,

Thanks a lot for the patch.

Unfortunately, this did not solve the problem for me (after applying the
patch on both primary and backup, rebuilding and installing
xen/tools/stubdom, and then rebooting both hosts). The backup is still
unable to create the disk device when the fail-over occurs. Thus, although
I see checkpoint traffic flowing from primary to backup, the state of the
backup's disk image is never modified (as judged by the image's
last-modified time).

The backup does switch from "paused" to "running," but it consumes 100%
CPU, and when I connect to its VNC console the VM appears to be frozen. So
*something* is being transferred, because I do see the screen from the
primary, but obviously all is not right, because I can't interact with it
at all.

Out of curiosity, in your working Remus deployment, which dom0 kernel are
you running (and which version of Xen)? I'm running Xen 4.0.1 and the
pvops 2.6.31.14 dom0 kernel. My understanding was that Remus supported
pvops dom0 2.6.31.x.

Any other ideas regarding what this might be a symptom of? My naive
interpretation is that it is not a networking configuration problem (since
state is being transferred), but that it has something to do with setting
up the tapdisk via tapdisk2.

Thanks,
Jon

On Wed, Sep 8, 2010 at 1:50 AM, Shriram Rajagopalan wrote:

> It's not just the tap2:remus:....
>
> There is a bug lurking in tools/python/xen/remus/device.py, in the
> ReplicatedDisk class. The regular expression scans the domU config for
> tap:tapdisk:remus... or tap:remus.. disk types only. I was able to get it
> working by fixing that regexp.
> This applies to Xen 4.0.1 only. I am not sure about xen-unstable.
> Here is a patch that might be of help to you (it's rather crude but heck I
> was too lazy :) )
>
> diff -r b536ebfba183 tools/python/xen/remus/device.py
> --- a/tools/python/xen/remus/device.py  Wed Aug 25 09:22:42 2010 +0100
> +++ b/tools/python/xen/remus/device.py  Fri Sep 03 08:47:13 2010 -0700
> @@ -36,10 +36,13 @@
>          # to request commits.
>          self.ctlfd = None
>
> -        if not disk.uname.startswith('tap:remus:') and not disk.uname.startswith('tap:tapdisk:remus:'):
> +        if not disk.uname.startswith('tap2:remus:') and not disk.uname.startswith('tap:remus:') and not disk.uname.startswith('tap:tapdisk:remus:'):
>              raise ReplicatedDiskException('Disk is not replicated: %s' %
>                                          str(disk))
> -        fifo = re.match("tap:.*(remus.*)\|", disk.uname).group(1).replace(':', '_')
> +        if disk.uname.startswith('tap2:remus:'):
> +            fifo = re.match("tap2:.*(remus.*)\|", disk.uname).group(1).replace(':', '_')
> +        else:
> +            fifo = re.match("tap:.*(remus.*)\|", disk.uname).group(1).replace(':', '_')
>          absfifo = os.path.join(self.FIFODIR, fifo)
>          absmsgfifo = absfifo + '.msg'
>
>
> On Tue, Sep 7, 2010 at 11:01 PM, Pasi Kärkkäinen wrote:
>
>> On Tue, Sep 07, 2010 at 03:28:32PM -0700, Jonathan Kirsch wrote:
>> > Hello,
>> >
>> > I have been playing around with Remus on Xen 4.0.1, attempting to
>> > fail-over for an HVM domU.
>> >
>> > I've run into some problems that I think could be related to tapdisk2
>> > and its interaction with how one sets up Remus disk replication in the
>> > domU config file.
>> >
>> > A few things I've noticed:
>> >
>> > -The tap:remus:backupHostIP:port|aio:imagePath notation does not work
>> > for me, although this is what is written in the Remus documentation.
>> > However, I have found the following to work (i.e., not complain when
>> > starting domU), so this is what I've been using:
>> >
>> > tap2:remus:backupHostIP:port|aio:imagePath...
>> >
>>
>> Yeah, this stuff was changed in Xen 4.0.1:
>> http://wiki.xensource.com/xenwiki/blktap2
>>
>> I guess someone should update the remus wiki page.
>>
>> -- Pasi
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
>>
>
> --
> perception is but an offspring of its own self
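
For anyone who wants to check their own disk string against the logic in the
quoted patch without a running Xen host, here is a minimal standalone sketch.
It is not the code in device.py: the function name remus_fifo_name, the
constant REMUS_PREFIXES, and the sample address, port and image path in the
usage note are made up for illustration, and the two regexp branches of the
patch are folded into a single "tap2?:" pattern, which should behave the same
for the three accepted prefixes.

```python
import re

# Prefixes accepted by the patched check in the quoted diff.
REMUS_PREFIXES = ('tap2:remus:', 'tap:remus:', 'tap:tapdisk:remus:')


def remus_fifo_name(uname):
    """Derive the control FIFO name from a Remus disk uname.

    Hypothetical helper mirroring the prefix check and regexp from the
    quoted patch; raises ValueError where the original raises
    ReplicatedDiskException.
    """
    # str.startswith accepts a tuple, which replaces the chained "and not".
    if not uname.startswith(REMUS_PREFIXES):
        raise ValueError('Disk is not replicated: %s' % uname)

    # "tap2?:" covers both the tap: and tap2: branches of the patch.
    match = re.match(r"tap2?:.*(remus.*)\|", uname)
    if match is None:
        # The patch would fail with AttributeError here (no "|" in uname).
        raise ValueError('Malformed Remus disk uname: %s' % uname)

    # Colons are replaced so the result is usable as a file name.
    return match.group(1).replace(':', '_')
```

With a made-up uname such as 'tap2:remus:192.168.0.2:9000|aio:/tmp/disk.img',
the function returns 'remus_192.168.0.2_9000'. A string without one of the
three prefixes raises ValueError, which corresponds to the "Disk is not
replicated" error in the original code.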