* hanging tapdisk2 processes and improper udev rules
@ 2011-07-22 9:18 Andreas Olsowski
2011-07-22 9:28 ` Ian Campbell
2011-07-22 9:31 ` Daniel Stodden
0 siblings, 2 replies; 14+ messages in thread
From: Andreas Olsowski @ 2011-07-22 9:18 UTC (permalink / raw)
To: xen-devel
When I xl-create a guest, I get one udev message per assigned block device:
root@xenturio1:/var/log# xl create /etc/xen/domains/x1test.sxp
Parsing config file /etc/xen/domains/x1test.sxp
Daemon running with PID 8704
root@xenturio1:/var/log# tail -10 error |grep SYMLINK
syslog:Jul 22 10:58:05 xenturio1 udevd[8658]: kernel-provided name 'blktap2' and NAME= 'xen/blktap-2/blktap2' disagree, please use SYMLINK+= or change the kernel to provide the proper name
syslog:Jul 22 10:58:05 xenturio1 udevd[8664]: kernel-provided name 'blktap3' and NAME= 'xen/blktap-2/blktap3' disagree, please use SYMLINK+= or change the kernel to provide the proper name
The guest works fine at that point.
root 8975 1.0 0.0 21664 3292 ? SLs 11:00 0:00 tapdisk2
root 8978 0.0 0.0 21008 916 ? S 11:00 0:00 udevd --daemon
root 8981 0.0 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
root 8983 0.0 0.0 21008 796 ? S 11:00 0:00 udevd --daemon
root 9002 0.0 0.0 21008 800 ? S 11:00 0:00 udevd --daemon
root 9020 0.0 0.0 35500 952 ? Ssl 11:00 0:00 xl create /etc/xen/domains/x1test2.sxp
root 9067 0.0 0.0 0 0 ? S 11:00 0:00 [blkback.3.xvda1]
root 9068 0.0 0.0 0 0 ? S 11:00 0:00 [blkback.3.xvda2]
Then I shut down the guest:
root@xenturio1:/var/log# xl shutdown x1test
And I am left with tapdisk2 and udevd processes, one for each
block device that was assigned to the guest:
root 8975 0.1 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
root 8981 0.0 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
root 8983 0.0 0.0 21008 796 ? S 11:00 0:00 udevd --daemon
root 9002 0.0 0.0 21008 800 ? S 11:00 0:00 udevd --daemon
I am using Xen 4.1.1 with the 2.6.32.43-pvops kernel from Jeremy.
My distro is Debian 6.0.2, which uses udev 164-3.
I updated udev to 171-3 on a different machine, but that did not help.
My xen-backend.rules contains the default:
SUBSYSTEM=="xen", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k", MODE="0600"
SUBSYSTEM=="blktap2", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k", MODE="0600"
My questions are:
- Are the two issues related?
- How can I fix them?
I think that eventually this will cause the host to run out of free
process IDs, RAM, or both.
--
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg
Tel: ++49 4131 677 1309
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* Re: hanging tapdisk2 processes and improper udev rules
2011-07-22 9:18 hanging tapdisk2 processes and improper udev rules Andreas Olsowski
@ 2011-07-22 9:28 ` Ian Campbell
2011-07-22 11:36 ` Andreas Olsowski
2011-07-22 9:31 ` Daniel Stodden
1 sibling, 1 reply; 14+ messages in thread
From: Ian Campbell @ 2011-07-22 9:28 UTC (permalink / raw)
To: Andreas Olsowski; +Cc: xen-devel, Ian Jackson
On Fri, 2011-07-22 at 10:18 +0100, Andreas Olsowski wrote:
> When i xl-create a guest, i get one message per assigned block device:
>
> root@xenturio1:/var/log# xl create /etc/xen/domains/x1test.sxp
> Parsing config file /etc/xen/domains/x1test.sxp
> Daemon running with PID 8704
>
> root@xenturio1:/var/log# tail -10 error |grep SYMLINK
> syslog:Jul 22 10:58:05 xenturio1 udevd[8658]: kernel-provided name
> 'blktap2' and NAME= 'xen/blktap-2/blktap2' disagree, please use
> SYMLINK+= or change the kernel to provide the proper name
> syslog:Jul 22 10:58:05 xenturio1 udevd[8664]: kernel-provided name
> 'blktap3' and NAME= 'xen/blktap-2/blktap3' disagree, please use
> SYMLINK+= or change the kernel to provide the proper name
This is because udev and forward/backward compatibility are strangers
passing in the night. I presume if you make the recommended change to
SYMLINK+= instead of NAME= in your udev script this goes away?
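For reference, a minimal sketch of that change, assuming each blktap rule sits on a single line as in the default rules quoted above. The function name convert_blktap_rules is illustrative and not part of Xen or udev; the location of the rules file is distribution-specific and deliberately not hard-coded here.

```shell
#!/bin/sh
# Rewrite NAME= to SYMLINK+= on blktap rules only, so that udev keeps the
# kernel-provided node name and adds xen/blktap-2/<name> as a symlink.
# Reads rules on stdin and writes the converted rules to stdout.
# (convert_blktap_rules is a hypothetical helper name.)
convert_blktap_rules() {
    sed '/KERNEL=="blktap/ s/, *NAME="/, SYMLINK+="/'
}

# Example: convert one of the default rules.
printf '%s\n' 'SUBSYSTEM=="blktap2", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k", MODE="0600"' |
    convert_blktap_rules
```

Rules that do not match KERNEL=="blktap..." pass through unchanged, so the filter can be run over a copy of the whole rules file before replacing the original.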
> Then i shutdown the guest:
> root@xenturio1:/var/log# xl shutdown x1test
>
> And i am left with remaining tapdisk2 and udev processes, one for each
> block device that was assigned to the guest:
> root 8975 0.1 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
> root 8981 0.0 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
> root 8983 0.0 0.0 21008 796 ? S 11:00 0:00 udevd
> --daemon
> root 9002 0.0 0.0 21008 800 ? S 11:00 0:00 udevd
> --daemon
I posted a patch to fix this, "libxl: attempt to cleanup tapdisk
processes on disk backend destroy", a couple of times, most recently at
http://marc.info/?l=xen-devel&m=131066210526755, but it hasn't been
applied yet. Can you try it?
> I am using Xen 4.1.1 with the 2.6.32.43-pvops kernel from jeremy.
> My distro is debian 6.0.2. that uses udev 164-3.
> I did update it on a different machine to 171-3, but that did not help.
>
>
> My xen-backend.rules contains the default:
> SUBSYSTEM=="xen", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k",
> MODE="0600"
> SUBSYSTEM=="blktap2", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k",
> MODE="0600
>
>
> My questions are:
> - Are the two issues related?
> - How can i fix them?
>
>
> I think that eventually this will cause the host to run out of either
> free process IDs and/or RAM.
>
>
* Re: hanging tapdisk2 processes and improper udev rules
2011-07-22 9:18 hanging tapdisk2 processes and improper udev rules Andreas Olsowski
2011-07-22 9:28 ` Ian Campbell
@ 2011-07-22 9:31 ` Daniel Stodden
2011-07-22 9:32 ` Daniel Stodden
2011-07-22 9:34 ` Sébastien RICCIO
1 sibling, 2 replies; 14+ messages in thread
From: Daniel Stodden @ 2011-07-22 9:31 UTC (permalink / raw)
To: Andreas Olsowski; +Cc: xen-devel
On Fri, 2011-07-22 at 05:18 -0400, Andreas Olsowski wrote:
> When i xl-create a guest, i get one message per assigned block device:
>
> root@xenturio1:/var/log# xl create /etc/xen/domains/x1test.sxp
> Parsing config file /etc/xen/domains/x1test.sxp
> Daemon running with PID 8704
Can you try if it gets better when removing that file?
Thanks,
Daniel
> root@xenturio1:/var/log# tail -10 error |grep SYMLINK
> syslog:Jul 22 10:58:05 xenturio1 udevd[8658]: kernel-provided name
> 'blktap2' and NAME= 'xen/blktap-2/blktap2' disagree, please use
> SYMLINK+= or change the kernel to provide the proper name
> syslog:Jul 22 10:58:05 xenturio1 udevd[8664]: kernel-provided name
> 'blktap3' and NAME= 'xen/blktap-2/blktap3' disagree, please use
> SYMLINK+= or change the kernel to provide the proper name
>
>
> The guest works fine at that point.
> root 8975 1.0 0.0 21664 3292 ? SLs 11:00 0:00 tapdisk2
> root 8978 0.0 0.0 21008 916 ? S 11:00 0:00 udevd
> --daemon
> root 8981 0.0 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
> root 8983 0.0 0.0 21008 796 ? S 11:00 0:00 udevd
> --daemon
> root 9002 0.0 0.0 21008 800 ? S 11:00 0:00 udevd
> --daemon
> root 9020 0.0 0.0 35500 952 ? Ssl 11:00 0:00 xl
> create /etc/xen/domains/x1test2.sxp
> root 9067 0.0 0.0 0 0 ? S 11:00 0:00
> [blkback.3.xvda1]
> root 9068 0.0 0.0 0 0 ? S 11:00 0:00
> [blkback.3.xvda2]
>
>
>
> Then i shutdown the guest:
> root@xenturio1:/var/log# xl shutdown x1test
>
> And i am left with remaining tapdisk2 and udev processes, one for each
> block device that was assigned to the guest:
> root 8975 0.1 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
> root 8981 0.0 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
> root 8983 0.0 0.0 21008 796 ? S 11:00 0:00 udevd
> --daemon
> root 9002 0.0 0.0 21008 800 ? S 11:00 0:00 udevd
> --daemon
>
> I am using Xen 4.1.1 with the 2.6.32.43-pvops kernel from jeremy.
> My distro is debian 6.0.2. that uses udev 164-3.
> I did update it on a different machine to 171-3, but that did not help.
>
>
> My xen-backend.rules contains the default:
> SUBSYSTEM=="xen", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k",
> MODE="0600"
> SUBSYSTEM=="blktap2", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k",
> MODE="0600
>
>
> My questions are:
> - Are the two issues related?
> - How can i fix them?
>
>
> I think that eventually this will cause the host to run out of either
> free process IDs and/or RAM.
>
>
* Re: hanging tapdisk2 processes and improper udev rules
2011-07-22 9:31 ` Daniel Stodden
@ 2011-07-22 9:32 ` Daniel Stodden
2011-07-22 9:34 ` Sébastien RICCIO
1 sibling, 0 replies; 14+ messages in thread
From: Daniel Stodden @ 2011-07-22 9:32 UTC (permalink / raw)
To: Andreas Olsowski; +Cc: xen-devel
On Fri, 2011-07-22 at 05:31 -0400, Daniel Stodden wrote:
> On Fri, 2011-07-22 at 05:18 -0400, Andreas Olsowski wrote:
> > When i xl-create a guest, i get one message per assigned block device:
> >
> > root@xenturio1:/var/log# xl create /etc/xen/domains/x1test.sxp
> > Parsing config file /etc/xen/domains/x1test.sxp
> > Daemon running with PID 8704
>
> Can you try if it gets better when removing that file?
The udev rules, in case it isn't clear :)
Daniel
> Thanks,
> Daniel
>
> > root@xenturio1:/var/log# tail -10 error |grep SYMLINK
> > syslog:Jul 22 10:58:05 xenturio1 udevd[8658]: kernel-provided name
> > 'blktap2' and NAME= 'xen/blktap-2/blktap2' disagree, please use
> > SYMLINK+= or change the kernel to provide the proper name
> > syslog:Jul 22 10:58:05 xenturio1 udevd[8664]: kernel-provided name
> > 'blktap3' and NAME= 'xen/blktap-2/blktap3' disagree, please use
> > SYMLINK+= or change the kernel to provide the proper name
> >
> >
> > The guest works fine at that point.
> > root 8975 1.0 0.0 21664 3292 ? SLs 11:00 0:00 tapdisk2
> > root 8978 0.0 0.0 21008 916 ? S 11:00 0:00 udevd
> > --daemon
> > root 8981 0.0 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
> > root 8983 0.0 0.0 21008 796 ? S 11:00 0:00 udevd
> > --daemon
> > root 9002 0.0 0.0 21008 800 ? S 11:00 0:00 udevd
> > --daemon
> > root 9020 0.0 0.0 35500 952 ? Ssl 11:00 0:00 xl
> > create /etc/xen/domains/x1test2.sxp
> > root 9067 0.0 0.0 0 0 ? S 11:00 0:00
> > [blkback.3.xvda1]
> > root 9068 0.0 0.0 0 0 ? S 11:00 0:00
> > [blkback.3.xvda2]
> >
> >
> >
> > Then i shutdown the guest:
> > root@xenturio1:/var/log# xl shutdown x1test
> >
> > And i am left with remaining tapdisk2 and udev processes, one for each
> > block device that was assigned to the guest:
> > root 8975 0.1 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
> > root 8981 0.0 0.0 21664 3256 ? SLs 11:00 0:00 tapdisk2
> > root 8983 0.0 0.0 21008 796 ? S 11:00 0:00 udevd
> > --daemon
> > root 9002 0.0 0.0 21008 800 ? S 11:00 0:00 udevd
> > --daemon
> >
> > I am using Xen 4.1.1 with the 2.6.32.43-pvops kernel from jeremy.
> > My distro is debian 6.0.2. that uses udev 164-3.
> > I did update it on a different machine to 171-3, but that did not help.
> >
> >
> > My xen-backend.rules contains the default:
> > SUBSYSTEM=="xen", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k",
> > MODE="0600"
> > SUBSYSTEM=="blktap2", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k",
> > MODE="0600
> >
> >
> > My questions are:
> > - Are the two issues related?
> > - How can i fix them?
> >
> >
> > I think that eventually this will cause the host to run out of either
> > free process IDs and/or RAM.
> >
> >
>
>
* Re: hanging tapdisk2 processes and improper udev rules
2011-07-22 9:31 ` Daniel Stodden
2011-07-22 9:32 ` Daniel Stodden
@ 2011-07-22 9:34 ` Sébastien RICCIO
2011-07-22 9:50 ` Daniel Stodden
1 sibling, 1 reply; 14+ messages in thread
From: Sébastien RICCIO @ 2011-07-22 9:34 UTC (permalink / raw)
To: Daniel Stodden; +Cc: Andreas Olsowski, xen-devel
>> I am using Xen 4.1.1 with the 2.6.32.43-pvops kernel from jeremy.
>> My distro is debian 6.0.2. that uses udev 164-3.
>> I did update it on a different machine to 171-3, but that did not help.
>>
Hi,
Just out of curiosity, are you running multipathd on that box? I had
(and in fact still have) an issue with tapdisk processes hanging
while the multipathd process is running.
Sébastien
* Re: hanging tapdisk2 processes and improper udev rules
2011-07-22 9:34 ` Sébastien RICCIO
@ 2011-07-22 9:50 ` Daniel Stodden
2011-07-22 10:01 ` Sébastien RICCIO
0 siblings, 1 reply; 14+ messages in thread
From: Daniel Stodden @ 2011-07-22 9:50 UTC (permalink / raw)
To: Sébastien RICCIO; +Cc: Andreas Olsowski, xen-devel
On Fri, 2011-07-22 at 05:34 -0400, Sébastien RICCIO wrote:
> >> I am using Xen 4.1.1 with the 2.6.32.43-pvops kernel from jeremy.
> >> My distro is debian 6.0.2. that uses udev 164-3.
> >> I did update it on a different machine to 171-3, but that did not help.
> >>
> Hi,
> Just for curiosity, are you running multipathd on that box ? I had
> (still have in fact) an issue with tapdisk processes hanging
> while multipathd process running.
The processes, really? Where do they hang? (check out the wait state --
ps -eopid,wchan:25,cmd or so).
Or do you mean they're stuck waiting for I/Os?
Daniel
* Re: hanging tapdisk2 processes and improper udev rules
2011-07-22 9:50 ` Daniel Stodden
@ 2011-07-22 10:01 ` Sébastien RICCIO
2011-07-22 18:55 ` hanging tapdisk2 processes and multipathing Daniel Stodden
0 siblings, 1 reply; 14+ messages in thread
From: Sébastien RICCIO @ 2011-07-22 10:01 UTC (permalink / raw)
To: Daniel Stodden; +Cc: Andreas Olsowski, xen-devel
> The processes, really? Where do they hang? (check out the wait state --
> ps -eopid,wchan:25,cmd or so).
>
> Or do you mean they're stuck waiting for I/Os?
>
> Daniel
>
>
They seem to work and do their job, but they are in a strange state.
For example, a ps -aux on dom0 hangs when it reaches
the line for the tapdisk process. It also cannot be detached from the
VM, and a reboot of the host hangs too (the process can't be killed,
so the host doesn't reboot).
I fought with this quite a lot on a Debian 6 + Xen 4.1.x box and found
that disabling multipath-tools and multipath-tools-boot
fixed the problem (but I need them). I thought that maybe it was
because multipathd tries to "multipath" the block device
handled by blktap2 and somehow locks it. But that's speculation :)
I don't have access to the box at the moment to give you more
information, and I don't want to hijack this thread. It just
looked like the problem I encountered; I will send you more
information when I am at the box.
Thanks,
Sébastien
* Re: hanging tapdisk2 processes and improper udev rules
2011-07-22 9:28 ` Ian Campbell
@ 2011-07-22 11:36 ` Andreas Olsowski
2011-07-22 14:07 ` Ian Campbell
0 siblings, 1 reply; 14+ messages in thread
From: Andreas Olsowski @ 2011-07-22 11:36 UTC (permalink / raw)
To: Ian Campbell, xen-devel
On 07/22/2011 11:28 AM, Ian Campbell wrote:
> This is because udev and forward/backward compatibility are strangers
> passing in the night. I presume if you make the recommended change to
> SYMLINK+= instead of NAME= in your udev script this goes away?
You assume correctly.
> I posted a patch to fix this "libxl: attempt to cleanup tapdisk
> processes on disk backend destroy" a couple of times, most recently at
> http://marc.info/?l=xen-devel&m=131066210526755 but it hasn't been
> applied yet. Can you try it?
I tried it:
make -j7 tools:
...
libxl_device.c: In function ‘libxl__device_destroy’:
libxl_device.c:253: error: incompatible type for argument 1 of ‘libxl__device_destroy_tapdisk’
libxl_internal.h:321: note: expected ‘struct libxl__gc *’ but argument is of type ‘libxl__gc’
libxl_device.c:274: error: incompatible type for argument 1 of ‘libxl__device_destroy_tapdisk’
libxl_internal.h:321: note: expected ‘struct libxl__gc *’ but argument is of type ‘libxl__gc’
My expertise with C is barely existent, but I took a look at
tools/libxl/libxl_device.c
and changed your
libxl__device_destroy_tapdisk(gc, be_path);
into
libxl__device_destroy_tapdisk(&gc, be_path);
as I have seen some &gc on other lines of code.
And it compiled.
I then created a guest and shut it down.
At first it stayed in a -ps--- state. I wanted to take a look at the
running processes with "ps auxww", but the ps process itself hung.
I could no longer run "ps" successfully after that point.
syslog showed:
Jul 22 13:00:07 xenturio1 xl: tap-err:tap_ctl_read_message: failure reading message
Jul 22 13:00:07 xenturio1 xl: tap-err:tap_ctl_send_and_receive: failed to receive 'unknown' message
Either my hack to get your code to compile was no good, or your patch has
some unforeseen side effects.
I have now rebooted the server.
To check whether multipath had any effect on this, I added
devnode "^td" to the blacklist.
Now when I xl create a VM, it only boots up to a certain point and then
does nothing.
If that point were the login prompt, everything would be fine, but it
isn't:
http://pastebin.com/Lmie6KwY
This is how it should look:
http://pastebin.com/CsgYypbk
I will try to retrace my steps and see what I did to break my system.
In the meantime, I have other systems I can test on.
With best regards
--
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg
Tel: ++49 4131 677 1309
* Re: hanging tapdisk2 processes and improper udev rules
2011-07-22 11:36 ` Andreas Olsowski
@ 2011-07-22 14:07 ` Ian Campbell
2011-07-22 14:28 ` Andreas Olsowski
0 siblings, 1 reply; 14+ messages in thread
From: Ian Campbell @ 2011-07-22 14:07 UTC (permalink / raw)
To: Andreas Olsowski, Daniel Stodden; +Cc: xen-devel
On Fri, 2011-07-22 at 12:36 +0100, Andreas Olsowski wrote:
> On 07/22/2011 11:28 AM, Ian Campbell wrote:
>
> > This is because udev and forward/backward compatibility are strangers
> > passing in the night. I presume if you make the recommended change to
> > SYMLINK+= instead of NAME= in your udev script this goes away?
> You assume correctly.
>
> > I posted a patch to fix this "libxl: attempt to cleanup tapdisk
> > processes on disk backend destroy" a couple of times, most recently at
> > http://marc.info/?l=xen-devel&m=131066210526755 but it hasn't been
> > applied yet. Can you try it?
>
> I tried it:
>
> make -j7 tools:
> ...
> libxl_device.c: In function ‘libxl__device_destroy’:
> libxl_device.c:253: error: incompatible type for argument 1 of
> ‘libxl__device_destroy_tapdisk’
> libxl_internal.h:321: note: expected ‘struct libxl__gc *’ but argument
> is of type ‘libxl__gc’
> libxl_device.c:274: error: incompatible type for argument 1 of
> ‘libxl__device_destroy_tapdisk’
> libxl_internal.h:321: note: expected ‘struct libxl__gc *’ but argument
> is of type ‘libxl__gc’
>
> My expertise with C is barely existant, but i took a look at
> tools/libxl/libxl_device.c
>
> and changed your
> libxl__device_destroy_tapdisk(gc, be_path);
> into
> libxl__device_destroy_tapdisk(&gc, be_path);
>
> as i have seen some &gc on other lines of code.
That looks right. I think this is just a difference between current
xen-unstable and xen-4.1 (due to 23045:c426a7140c99 FWIW).
> And it compiled.
>
> I then created a guest, shut it down.
> First it kept beeing in a -ps--- state, i wanted to take a look at the
> runing processes with "ps auxww" but the ps process hung itself.
> I could no longer run "ps" successfully after this point.
Uh. That really shouldn't happen :-/ In fact, barring a bug in the host OS
itself, I'm not sure how ps can ever get into that state...
> syslog showed:
> ul 22 13:00:07 xenturio1 xl: tap-err:tap_ctl_read_message: failure
> reading message
> Jul 22 13:00:07 xenturio1 xl: tap-err:tap_ctl_send_and_receive: failed
> to receive 'unknown' message
>
> Either my hack to get your code to compile was no good or your patch has
> some unforseen side effects.
It's possible that it relies on something in xen-unstable that I'm not
aware of. Would it be possible for you to try and repro this issue with
xen-unstable.hg and this patch?
Daniel, have you got any idea what might be going on here?
Ian.
>
>
>
> I have now rebooted the server.
>
>
> As i went on to check if multipath had any effect on it i added
> devnode "^td" to the blacklist.
>
> Now when i xl create a vm it only boots up to a certain point and then
> does nothing.
> If that certain point were to be the login prompt everything would be
> fine, but it isnt:
> http://pastebin.com/Lmie6KwY
>
> This is how it should look like:
>
> http://pastebin.com/CsgYypbk
>
> I will try to backtrace my steps and see what i did do to break my system.
>
> In the meantime i have other systems i can test stuff on.
>
>
>
> -with best regards
>
>
* Re: hanging tapdisk2 processes and improper udev rules
2011-07-22 14:07 ` Ian Campbell
@ 2011-07-22 14:28 ` Andreas Olsowski
2011-07-22 14:32 ` Ian Campbell
0 siblings, 1 reply; 14+ messages in thread
From: Andreas Olsowski @ 2011-07-22 14:28 UTC (permalink / raw)
To: Ian Campbell, xen-devel
>> My expertise with C is barely existant, but i took a look at
>> tools/libxl/libxl_device.c
>>
>> and changed your
>> libxl__device_destroy_tapdisk(gc, be_path);
>> into
>> libxl__device_destroy_tapdisk(&gc, be_path);
>>
>> as i have seen some&gc on other lines of code.
>
> That looks right. I think this is just a difference between current
> xen-unstable and xen-4.1 (due to 23045:c426a7140c99 FWIW).
What do you mean by "looks right": the compilation errors or my
shot-in-the-dark adjustment?
> Uh. That really shouldn't happen :-/ In fact baring a bug in the host OS
> itself I'm not sure how ps can ever get into that state...
I had this happen before on two occasions (one of them using xm to
create a guest, whereas xl worked fine), and Sébastien Riccio wrote in
this thread that he encountered it too.
If it returns during "normal operation", I'll write some more.
> It's possible that it relies on something in xen-unstable that I'm not
> aware of. Would it be possible for you to try and repro this issue with
> xen-unstable.hg and this patch?
Yes, I can and will do that.
Probably later this evening (4 PM here now), but definitely this weekend.
I will reply to this thread with the results.
--
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg
Tel: ++49 4131 677 1309
* Re: hanging tapdisk2 processes and improper udev rules
2011-07-22 14:28 ` Andreas Olsowski
@ 2011-07-22 14:32 ` Ian Campbell
2011-07-25 8:55 ` Andreas Olsowski
2011-08-11 13:32 ` Andreas Olsowski
0 siblings, 2 replies; 14+ messages in thread
From: Ian Campbell @ 2011-07-22 14:32 UTC (permalink / raw)
To: Andreas Olsowski; +Cc: xen-devel
On Fri, 2011-07-22 at 15:28 +0100, Andreas Olsowski wrote:
> >> My expertise with C is barely existant, but i took a look at
> >> tools/libxl/libxl_device.c
> >>
> >> and changed your
> >> libxl__device_destroy_tapdisk(gc, be_path);
> >> into
> >> libxl__device_destroy_tapdisk(&gc, be_path);
> >>
> >> as i have seen some&gc on other lines of code.
> >
> > That looks right. I think this is just a difference between current
> > xen-unstable and xen-4.1 (due to 23045:c426a7140c99 FWIW).
> What do you mean looks right, the compilation errors or my
> shot-in-the-dark adjustment?
Your fix looked sensible.
>
> > Uh. That really shouldn't happen :-/ In fact baring a bug in the host OS
> > itself I'm not sure how ps can ever get into that state...
> I had this happen before on two occasions (one of them using xm to
> create a guest, whereas xl worked fine) and Sébastien Riccio wrote in
> this thread, that he encountered it too.
> If this one returns during "normal operation", ill write some more.
>
> > It's possible that it relies on something in xen-unstable that I'm not
> > aware of. Would it be possible for you to try and repro this issue with
> > xen-unstable.hg and this patch?
> Yes, i can and will do that.
> Probably later this evening (4PM here now), but definitely this weekend.
> I will reply to this thread with the results.
Thanks.
Ian.
* Re: hanging tapdisk2 processes and multipathing
2011-07-22 10:01 ` Sébastien RICCIO
@ 2011-07-22 18:55 ` Daniel Stodden
0 siblings, 0 replies; 14+ messages in thread
From: Daniel Stodden @ 2011-07-22 18:55 UTC (permalink / raw)
To: Sébastien RICCIO; +Cc: Andreas Olsowski, xen-devel
On Fri, 2011-07-22 at 06:01 -0400, Sébastien RICCIO wrote:
> > The processes, really? Where do they hang? (check out the wait state --
> > ps -eopid,wchan:25,cmd or so).
> >
> > Or do you mean they're stuck waiting for I/Os?
> >
> > Daniel
> >
> >
>
> They seems to work and to do their job, but they are in a strange state.
> For example a ps -aux on dom0 hangs when processing
> the line about the tapdisk process, also it cannot be detached from the
> vm, and issuing a reboot of the host hangs too (can't kill the process
> so it doesn't reboot).
>
> I fighted quite a lot with this on a debian6 + xen 4.1.x box and found
> out that disabling the multipath-tools and multipath-tools-boot
> corrected the problem (but I need them). I thought that maybe it was
> beacause multipathd try to "multipath" the block device
> handled by blktap2 and somehow locks it. But it's speculations :)
The multipathing is in a dm node to which tapdisk issues I/O. There's no
special handling involved in there whatsoever. It's completely
transparent, to blktap and tapdisk, as it should be.
I could imagine tapdisk wedging in dm code, during some I/O operations.
These should be fully asynchronous, but for some storage types under
special conditions that's sometimes wishful thinking. That applies if
you find a tap-ctl call (even just a list command) blocking.
The blktap module does not do anything unusual to the tapdisk task.
Anyway, it'd initially be a matter of figuring out where exactly it
blocks. If ps is borked, try to get another shell and
cat /proc/<pid>/wchan. Makes sense with both the ps and tapdisk2 tasks.
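A small sketch of that check which avoids ps entirely by walking /proc, assuming a Linux /proc with per-task status and wchan files. The function name show_wchan and the PROC_ROOT override are illustrative only, and whether reading these files is itself safe from blocking on a wedged task is not something this sketch can guarantee.

```shell
#!/bin/sh
# Print "<pid> <name> <wchan>" for every task whose name matches one of the
# arguments, reading /proc directly instead of calling ps.
# PROC_ROOT is a hypothetical override, used only to make the sketch testable.
show_wchan() {
    proc_root=${PROC_ROOT:-/proc}
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/status" ] || continue
        # The task name is the "Name:" line of the status file.
        name=$(sed -n 's/^Name:[[:space:]]*//p' "$dir/status" 2>/dev/null)
        for want in "$@"; do
            if [ "$name" = "$want" ]; then
                wchan=$(cat "$dir/wchan" 2>/dev/null)
                printf '%s %s %s\n' "${dir##*/}" "$name" "${wchan:-?}"
            fi
        done
    done
}

# Look at both the tapdisk2 tasks and any stuck ps tasks.
show_wchan tapdisk2 ps
```

A task that is running or not blocked in the kernel typically shows 0 or an empty wchan; a symbol name there indicates the kernel function the task is sleeping in.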
You say from the guest I/O perspective it still makes progress? If not,
that would explain why you're unable to detach: Blkback won't be able to
release the device before all pending I/O is flushed.
To check tapdev I/O state from the host side, do a
cat /sys/class/blktap2/tapdisk<n>/debug
That will dump some task stuff and a list of outstanding requests, if
there are any.
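To dump the debug file for every tap device in one go, a loop along the following lines may help. It assumes the sysfs layout matches the path given above; the function name dump_tapdev_debug and the SYSFS_ROOT override are illustrative, not part of blktap.

```shell
#!/bin/sh
# Dump the per-device debug file for every blktap2 tap device.
# SYSFS_ROOT is a hypothetical override, used only to make the sketch testable;
# the default directory follows the path quoted above.
dump_tapdev_debug() {
    sysfs_root=${SYSFS_ROOT:-/sys/class/blktap2}
    for f in "$sysfs_root"/*/debug; do
        # Skip silently if the glob matched nothing or the file is unreadable.
        [ -r "$f" ] || continue
        printf '== %s ==\n' "$f"
        cat "$f"
    done
}

dump_tapdev_debug
```

On a host without any tap devices the function prints nothing.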
> I do not have the the hands on the box at the moment to give you more
> informations and do not want to hijack this thread. It's just that it
> looked like the problem I encountered, but I will send you more
> informations when I am on the box.
Thanks!
Daniel
* Re: hanging tapdisk2 processes and improper udev rules
2011-07-22 14:32 ` Ian Campbell
@ 2011-07-25 8:55 ` Andreas Olsowski
2011-08-11 13:32 ` Andreas Olsowski
1 sibling, 0 replies; 14+ messages in thread
From: Andreas Olsowski @ 2011-07-25 8:55 UTC (permalink / raw)
To: xen-devel, Ian Campbell
Well, I did some testing this morning, as my VPN connection was borked all
weekend.
__xen-unstable does not leave any tapdisk processes running.__
In fact, it would seem that tapdisk is only started to spawn the block
device and then exits.
I may be misreading normal behavior here, however: the udevd processes
that are started when a guest is created stick around even after the
guest is shut down, but are replaced by different udevd processes
for the next created guest.
Nevertheless, I applied your patch and tried again.
The gc/&gc fix wasn't necessary for the patch this time.
The patch had no visible effect.
I hope this info helps in creating a patch for 4.1.
--
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg
Tel: ++49 4131 677 1309
* Re: hanging tapdisk2 processes and improper udev rules
2011-07-22 14:32 ` Ian Campbell
2011-07-25 8:55 ` Andreas Olsowski
@ 2011-08-11 13:32 ` Andreas Olsowski
1 sibling, 0 replies; 14+ messages in thread
From: Andreas Olsowski @ 2011-08-11 13:32 UTC (permalink / raw)
To: xen-devel; +Cc: Ian Campbell
Hi,
I was wondering whether anything has happened in the last few weeks
regarding this issue.
For now I am using Xen 4.2, which either already has some kind of patch
applied or does not need one.
With best regards
--
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg
Tel: ++49 4131 677 1309
Thread overview: 14+ messages
2011-07-22 9:18 hanging tapdisk2 processes and improper udev rules Andreas Olsowski
2011-07-22 9:28 ` Ian Campbell
2011-07-22 11:36 ` Andreas Olsowski
2011-07-22 14:07 ` Ian Campbell
2011-07-22 14:28 ` Andreas Olsowski
2011-07-22 14:32 ` Ian Campbell
2011-07-25 8:55 ` Andreas Olsowski
2011-08-11 13:32 ` Andreas Olsowski
2011-07-22 9:31 ` Daniel Stodden
2011-07-22 9:32 ` Daniel Stodden
2011-07-22 9:34 ` Sébastien RICCIO
2011-07-22 9:50 ` Daniel Stodden
2011-07-22 10:01 ` Sébastien RICCIO
2011-07-22 18:55 ` hanging tapdisk2 processes and multipathing Daniel Stodden