* hanging tapdisk2 processes and improper udev rules
@ 2011-07-22  9:18 Andreas Olsowski
  2011-07-22  9:28 ` Ian Campbell
  2011-07-22  9:31 ` Daniel Stodden
  0 siblings, 2 replies; 14+ messages in thread
From: Andreas Olsowski @ 2011-07-22  9:18 UTC (permalink / raw)
  To: xen-devel



When I create a guest with xl, I get one udevd warning per assigned block device:

root@xenturio1:/var/log# xl create /etc/xen/domains/x1test.sxp
Parsing config file /etc/xen/domains/x1test.sxp
Daemon running with PID 8704

root@xenturio1:/var/log# tail -10 error |grep SYMLINK
syslog:Jul 22 10:58:05 xenturio1 udevd[8658]: kernel-provided name 'blktap2' and NAME= 'xen/blktap-2/blktap2' disagree, please use SYMLINK+= or change the kernel to provide the proper name
syslog:Jul 22 10:58:05 xenturio1 udevd[8664]: kernel-provided name 'blktap3' and NAME= 'xen/blktap-2/blktap3' disagree, please use SYMLINK+= or change the kernel to provide the proper name


The guest works fine at that point.
root      8975  1.0  0.0  21664  3292 ?        SLs  11:00   0:00 tapdisk2
root      8978  0.0  0.0  21008   916 ?        S    11:00   0:00 udevd --daemon
root      8981  0.0  0.0  21664  3256 ?        SLs  11:00   0:00 tapdisk2
root      8983  0.0  0.0  21008   796 ?        S    11:00   0:00 udevd --daemon
root      9002  0.0  0.0  21008   800 ?        S    11:00   0:00 udevd --daemon
root      9020  0.0  0.0  35500   952 ?        Ssl  11:00   0:00 xl create /etc/xen/domains/x1test2.sxp
root      9067  0.0  0.0      0     0 ?        S    11:00   0:00 [blkback.3.xvda1]
root      9068  0.0  0.0      0     0 ?        S    11:00   0:00 [blkback.3.xvda2]



Then I shut down the guest:
root@xenturio1:/var/log# xl shutdown x1test

I am then left with lingering tapdisk2 and udevd processes, one for each
block device that was assigned to the guest:
root      8975  0.1  0.0  21664  3256 ?        SLs  11:00   0:00 tapdisk2
root      8981  0.0  0.0  21664  3256 ?        SLs  11:00   0:00 tapdisk2
root      8983  0.0  0.0  21008   796 ?        S    11:00   0:00 udevd --daemon
root      9002  0.0  0.0  21008   800 ?        S    11:00   0:00 udevd --daemon

I am using Xen 4.1.1 with the 2.6.32.43-pvops kernel from Jeremy.
My distro is Debian 6.0.2, which uses udev 164-3.
I updated udev to 171-3 on a different machine, but that did not help.


My xen-backend.rules contains the default:
SUBSYSTEM=="xen", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k", MODE="0600"
SUBSYSTEM=="blktap2", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k", MODE="0600"


My questions are:
- Are the two issues related?
- How can I fix them?


I think that eventually this will cause the host to run out of free
process IDs, RAM, or both.


-- 
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg

Tel: ++49 4131 677 1309



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and improper udev rules
  2011-07-22  9:18 hanging tapdisk2 processes and improper udev rules Andreas Olsowski
@ 2011-07-22  9:28 ` Ian Campbell
  2011-07-22 11:36   ` Andreas Olsowski
  2011-07-22  9:31 ` Daniel Stodden
  1 sibling, 1 reply; 14+ messages in thread
From: Ian Campbell @ 2011-07-22  9:28 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel, Ian Jackson

On Fri, 2011-07-22 at 10:18 +0100, Andreas Olsowski wrote:
> When i xl-create a guest, i get one message per assigned block device:
> 
> root@xenturio1:/var/log# xl create /etc/xen/domains/x1test.sxp
> Parsing config file /etc/xen/domains/x1test.sxp
> Daemon running with PID 8704
> 
> root@xenturio1:/var/log# tail -10 error |grep SYMLINK
> syslog:Jul 22 10:58:05 xenturio1 udevd[8658]: kernel-provided name 
> 'blktap2' and NAME= 'xen/blktap-2/blktap2' disagree, please use 
> SYMLINK+= or change the kernel to provide the proper name
> syslog:Jul 22 10:58:05 xenturio1 udevd[8664]: kernel-provided name 
> 'blktap3' and NAME= 'xen/blktap-2/blktap3' disagree, please use 
> SYMLINK+= or change the kernel to provide the proper name

This is because udev and forward/backward compatibility are strangers
passing in the night. I presume if you make the recommended change to
SYMLINK+= instead of NAME= in your udev script this goes away?
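
For reference, the change the udevd warning asks for could be scripted
roughly as follows. This is only a sketch: the function name
name_to_symlink is made up for illustration, and it assumes every rule
to be changed uses a plain NAME="..." assignment like the rules quoted
in this thread.

```python
import re

# Matches an assignment such as NAME="xen/blktap-2/%k", but not a
# comparison such as NAME=="..." or a longer key that ends in NAME.
_NAME_ASSIGN = re.compile(r'(?<![A-Za-z_])NAME="([^"]*)"')


def name_to_symlink(rule: str) -> str:
    """Return the rule with NAME="x" rewritten as SYMLINK+="x"."""
    return _NAME_ASSIGN.sub(r'SYMLINK+="\1"', rule)


if __name__ == "__main__":
    rule = ('SUBSYSTEM=="blktap2", KERNEL=="blktap[0-9]*", '
            'NAME="xen/blktap-2/%k", MODE="0600"')
    print(name_to_symlink(rule))
```

With SYMLINK+= the kernel-provided node name is kept and the
xen/blktap-2/ path is added as a symlink to it, which is what the
warning recommends.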

> Then i shutdown the guest:
> root@xenturio1:/var/log# xl shutdown x1test
> 
> And i am left with remaining tapdisk2 and udev processes, one for each 
> block device that was assigned to the guest:
> root      8975  0.1  0.0  21664  3256 ?        SLs  11:00   0:00 tapdisk2
> root      8981  0.0  0.0  21664  3256 ?        SLs  11:00   0:00 tapdisk2
> root      8983  0.0  0.0  21008   796 ?        S    11:00   0:00 udevd 
> --daemon
> root      9002  0.0  0.0  21008   800 ?        S    11:00   0:00 udevd 
> --daemon

I posted a patch to fix this "libxl: attempt to cleanup tapdisk
processes on disk backend destroy" a couple of times, most recently at 
http://marc.info/?l=xen-devel&m=131066210526755 but it hasn't been
applied yet. Can you try it?

> I am using Xen 4.1.1 with the 2.6.32.43-pvops kernel from jeremy.
> My distro is debian 6.0.2. that uses udev 164-3.
> I did update it on a different machine to 171-3, but that did not help.
> 
> 
> My xen-backend.rules contains the default:
> SUBSYSTEM=="xen", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k", 
> MODE="0600"
> SUBSYSTEM=="blktap2", KERNEL=="blktap[0-9]*", NAME="xen/blktap-2/%k", 
> MODE="0600
> 
> 
> My questions are:
> - Are the two issues related?
> - How can i fix them?
> 
> 
> I think that eventually this will cause the host to run out of either 
> free process IDs and/or RAM.
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and improper udev rules
  2011-07-22  9:18 hanging tapdisk2 processes and improper udev rules Andreas Olsowski
  2011-07-22  9:28 ` Ian Campbell
@ 2011-07-22  9:31 ` Daniel Stodden
  2011-07-22  9:32   ` Daniel Stodden
  2011-07-22  9:34   ` Sébastien RICCIO
  1 sibling, 2 replies; 14+ messages in thread
From: Daniel Stodden @ 2011-07-22  9:31 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel

On Fri, 2011-07-22 at 05:18 -0400, Andreas Olsowski wrote:
> When i xl-create a guest, i get one message per assigned block device:
> 
> root@xenturio1:/var/log# xl create /etc/xen/domains/x1test.sxp
> Parsing config file /etc/xen/domains/x1test.sxp
> Daemon running with PID 8704

Can you try if it gets better when removing that file?

Thanks,
Daniel


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and improper udev rules
  2011-07-22  9:31 ` Daniel Stodden
@ 2011-07-22  9:32   ` Daniel Stodden
  2011-07-22  9:34   ` Sébastien RICCIO
  1 sibling, 0 replies; 14+ messages in thread
From: Daniel Stodden @ 2011-07-22  9:32 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel

On Fri, 2011-07-22 at 05:31 -0400, Daniel Stodden wrote:
> On Fri, 2011-07-22 at 05:18 -0400, Andreas Olsowski wrote:
> > When i xl-create a guest, i get one message per assigned block device:
> > 
> > root@xenturio1:/var/log# xl create /etc/xen/domains/x1test.sxp
> > Parsing config file /etc/xen/domains/x1test.sxp
> > Daemon running with PID 8704
> 
> Can you try if it gets better when removing that file?

The udev rules, in case it isn't clear :)

Daniel



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and improper udev rules
  2011-07-22  9:31 ` Daniel Stodden
  2011-07-22  9:32   ` Daniel Stodden
@ 2011-07-22  9:34   ` Sébastien RICCIO
  2011-07-22  9:50     ` Daniel Stodden
  1 sibling, 1 reply; 14+ messages in thread
From: Sébastien RICCIO @ 2011-07-22  9:34 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: Andreas Olsowski, xen-devel


>> I am using Xen 4.1.1 with the 2.6.32.43-pvops kernel from jeremy.
>> My distro is debian 6.0.2. that uses udev 164-3.
>> I did update it on a different machine to 171-3, but that did not help.
>>
Hi,
Just out of curiosity, are you running multipathd on that box? I had
(and in fact still have) an issue with tapdisk processes hanging
while the multipathd process is running.

Sébastien

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and improper udev rules
  2011-07-22  9:34   ` Sébastien RICCIO
@ 2011-07-22  9:50     ` Daniel Stodden
  2011-07-22 10:01       ` Sébastien RICCIO
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Stodden @ 2011-07-22  9:50 UTC (permalink / raw)
  To: Sébastien RICCIO; +Cc: Andreas Olsowski, xen-devel

On Fri, 2011-07-22 at 05:34 -0400, Sébastien RICCIO wrote:
> >> I am using Xen 4.1.1 with the 2.6.32.43-pvops kernel from jeremy.
> >> My distro is debian 6.0.2. that uses udev 164-3.
> >> I did update it on a different machine to 171-3, but that did not help.
> >>
> Hi,
> Just for curiosity, are you running multipathd on that box ? I had 
> (still have in fact) an issue with tapdisk processes hanging
> while multipathd process running.

The processes, really? Where do they hang? (check out the wait state --
ps -eopid,wchan:25,cmd or so).

Or do you mean they're stuck waiting for I/Os?

Daniel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and improper udev rules
  2011-07-22  9:50     ` Daniel Stodden
@ 2011-07-22 10:01       ` Sébastien RICCIO
  2011-07-22 18:55         ` hanging tapdisk2 processes and multipathing Daniel Stodden
  0 siblings, 1 reply; 14+ messages in thread
From: Sébastien RICCIO @ 2011-07-22 10:01 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: Andreas Olsowski, xen-devel


> The processes, really? Where do they hang? (check out the wait state --
> ps -eopid,wchan:25,cmd or so).
>
> Or do you mean they're stuck waiting for I/Os?
>
> Daniel
>
>

They seem to work and do their job, but they are in a strange state.
For example, a ps -aux on dom0 hangs when it reaches the line for the
tapdisk process. It also cannot be detached from the VM, and a reboot
of the host hangs too (the process can't be killed, so the host doesn't
reboot).

I fought with this quite a lot on a debian6 + xen 4.1.x box and found
that disabling multipath-tools and multipath-tools-boot fixed the
problem (but I need them). I thought that maybe multipathd tries to
"multipath" the block device handled by blktap2 and somehow locks it.
But that is speculation :)

I do not have access to the box at the moment to give you more
information, and I do not want to hijack this thread. It just looked
like the problem I encountered; I will send you more information when
I am at the box.

Thanks,
Sébastien

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and improper udev rules
  2011-07-22  9:28 ` Ian Campbell
@ 2011-07-22 11:36   ` Andreas Olsowski
  2011-07-22 14:07     ` Ian Campbell
  0 siblings, 1 reply; 14+ messages in thread
From: Andreas Olsowski @ 2011-07-22 11:36 UTC (permalink / raw)
  To: Ian Campbell, xen-devel




On 07/22/2011 11:28 AM, Ian Campbell wrote:

> This is because udev and forward/backward compatibility are strangers
> passing in the night. I presume if you make the recommended change to
> SYMLINK+= instead of NAME= in your udev script this goes away?
You assume correctly.

> I posted a patch to fix this "libxl: attempt to cleanup tapdisk
> processes on disk backend destroy" a couple of times, most recently at
> http://marc.info/?l=xen-devel&m=131066210526755 but it hasn't been
> applied yet. Can you try it?

I tried it:

make -j7 tools:
...
libxl_device.c: In function ‘libxl__device_destroy’:
libxl_device.c:253: error: incompatible type for argument 1 of ‘libxl__device_destroy_tapdisk’
libxl_internal.h:321: note: expected ‘struct libxl__gc *’ but argument is of type ‘libxl__gc’
libxl_device.c:274: error: incompatible type for argument 1 of ‘libxl__device_destroy_tapdisk’
libxl_internal.h:321: note: expected ‘struct libxl__gc *’ but argument is of type ‘libxl__gc’

My expertise with C is barely existent, but I took a look at
tools/libxl/libxl_device.c

and changed your
libxl__device_destroy_tapdisk(gc, be_path);
into
libxl__device_destroy_tapdisk(&gc, be_path);

as I have seen &gc on other lines of the code.

And it compiled.

I then created a guest and shut it down.
At first it stayed in the -ps--- state. I wanted to look at the
running processes with "ps auxww", but the ps process itself hung.
From that point on I could no longer run "ps" successfully.
syslog showed:
Jul 22 13:00:07 xenturio1 xl: tap-err:tap_ctl_read_message: failure reading message
Jul 22 13:00:07 xenturio1 xl: tap-err:tap_ctl_send_and_receive: failed to receive 'unknown' message

Either my hack to get your code to compile was no good, or your patch
has some unforeseen side effects.



I have now rebooted the server.


While checking whether multipath had any effect on this, I added
devnode "^td" to the multipath blacklist.
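
The effect of such a blacklist entry can be checked offline. multipath
treats the devnode value as a regular expression matched against device
node names; the sketch below applies the same pattern with Python's re
module. The helper name is_blacklisted and the device names are
illustrative assumptions, not taken from this host.

```python
import re

# Pattern from the blacklist entry: devnode "^td"
BLACKLIST_DEVNODE = re.compile(r"^td")


def is_blacklisted(devnode: str) -> bool:
    """True if the device node name matches the blacklist pattern."""
    return BLACKLIST_DEVNODE.search(devnode) is not None


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    for name in ["tda", "tdb", "sda", "dm-0", "xvda"]:
        print(name, is_blacklisted(name))
```

Because the pattern is anchored at the start of the name, it only
excludes nodes whose names begin with "td" and leaves everything else
to multipath.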

Now when I xl create a VM, it only boots up to a certain point and then
does nothing.
If that point were the login prompt everything would be fine, but it
isn't:
http://pastebin.com/Lmie6KwY

This is how it should look:

http://pastebin.com/CsgYypbk

I will try to retrace my steps and see what I did to break my system.

In the meantime i have other systems i can test stuff on.



-with best regards


-- 
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg

Tel: ++49 4131 677 1309



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and improper udev rules
  2011-07-22 11:36   ` Andreas Olsowski
@ 2011-07-22 14:07     ` Ian Campbell
  2011-07-22 14:28       ` Andreas Olsowski
  0 siblings, 1 reply; 14+ messages in thread
From: Ian Campbell @ 2011-07-22 14:07 UTC (permalink / raw)
  To: Andreas Olsowski, Daniel Stodden; +Cc: xen-devel

On Fri, 2011-07-22 at 12:36 +0100, Andreas Olsowski wrote:
> On 07/22/2011 11:28 AM, Ian Campbell wrote:
> 
> > This is because udev and forward/backward compatibility are strangers
> > passing in the night. I presume if you make the recommended change to
> > SYMLINK+= instead of NAME= in your udev script this goes away?
> You assume correctly.
> 
> > I posted a patch to fix this "libxl: attempt to cleanup tapdisk
> > processes on disk backend destroy" a couple of times, most recently at
> > http://marc.info/?l=xen-devel&m=131066210526755 but it hasn't been
> > applied yet. Can you try it?
> 
> I tried it:
> 
> make -j7 tools:
> ...
> libxl_device.c: In function ‘libxl__device_destroy’:
> libxl_device.c:253: error: incompatible type for argument 1 of 
> ‘libxl__device_destroy_tapdisk’
> libxl_internal.h:321: note: expected ‘struct libxl__gc *’ but argument 
> is of type ‘libxl__gc’
> libxl_device.c:274: error: incompatible type for argument 1 of 
> ‘libxl__device_destroy_tapdisk’
> libxl_internal.h:321: note: expected ‘struct libxl__gc *’ but argument 
> is of type ‘libxl__gc’
> 
> My expertise with C is barely existant, but i took a look at 
> tools/libxl/libxl_device.c
> 
> and changed your
> libxl__device_destroy_tapdisk(gc, be_path);
> into
> libxl__device_destroy_tapdisk(&gc, be_path);
> 
> as i have seen some &gc on other lines of code.

That looks right. I think this is just a difference between current
xen-unstable and xen-4.1 (due to 23045:c426a7140c99 FWIW).

> And it compiled.
> 
> I then created a guest, shut it down.
> First it kept beeing in a -ps--- state, i wanted to take a look at the 
> runing processes with "ps auxww" but the ps process hung itself.
> I could no longer run "ps" successfully after this point.

Uh. That really shouldn't happen :-/ In fact, barring a bug in the host OS
itself, I'm not sure how ps can ever get into that state...

> syslog showed:
> ul 22 13:00:07 xenturio1 xl: tap-err:tap_ctl_read_message: failure 
> reading message
> Jul 22 13:00:07 xenturio1 xl: tap-err:tap_ctl_send_and_receive: failed 
> to receive 'unknown' message
> 
> Either my hack to get your code to compile was no good or your patch has 
> some unforseen side effects.

It's possible that it relies on something in xen-unstable that I'm not
aware of. Would it be possible for you to try and repro this issue with
xen-unstable.hg and this patch?

Daniel, have you got any idea what might be going on here?

Ian.
> 
> 
> 
> I have now rebooted the server.
> 
> 
> As i went on to check if multipath had any effect on it i added
> devnode "^td" to the blacklist.
> 
> Now when i xl create a vm it only boots up to a certain point and then 
> does nothing.
> If that certain point were to be the login prompt everything would be 
> fine, but it isnt:
> http://pastebin.com/Lmie6KwY
> 
> This is how it should look like:
> 
> http://pastebin.com/CsgYypbk
> 
> I will try to backtrace my steps and see what i did do to break my system.
> 
> In the meantime i have other systems i can test stuff on.
> 
> 
> 
> -with best regards
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and improper udev rules
  2011-07-22 14:07     ` Ian Campbell
@ 2011-07-22 14:28       ` Andreas Olsowski
  2011-07-22 14:32         ` Ian Campbell
  0 siblings, 1 reply; 14+ messages in thread
From: Andreas Olsowski @ 2011-07-22 14:28 UTC (permalink / raw)
  To: Ian Campbell, xen-devel



>> My expertise with C is barely existant, but i took a look at
>> tools/libxl/libxl_device.c
>>
>> and changed your
>> libxl__device_destroy_tapdisk(gc, be_path);
>> into
>> libxl__device_destroy_tapdisk(&gc, be_path);
>>
>> as i have seen some&gc on other lines of code.
>
> That looks right. I think this is just a difference between current
> xen-unstable and xen-4.1 (due to 23045:c426a7140c99 FWIW).
What do you mean by "looks right": the compilation errors or my
shot-in-the-dark adjustment?

> Uh. That really shouldn't happen :-/ In fact baring a bug in the host OS
> itself I'm not sure how ps can ever get into that state...
I have had this happen on two earlier occasions (one of them when using
xm to create a guest, whereas xl worked fine), and Sébastien Riccio
wrote in this thread that he has encountered it too.
If it comes back during "normal operation", I'll write some more.

> It's possible that it relies on something in xen-unstable that I'm not
> aware of. Would it be possible for you to try and repro this issue with
> xen-unstable.hg and this patch?
Yes, I can and will do that.
Probably later this evening (4 PM here now), but definitely this weekend.
I will reply to this thread with the results.




-- 
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg

Tel: ++49 4131 677 1309



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and improper udev rules
  2011-07-22 14:28       ` Andreas Olsowski
@ 2011-07-22 14:32         ` Ian Campbell
  2011-07-25  8:55           ` Andreas Olsowski
  2011-08-11 13:32           ` Andreas Olsowski
  0 siblings, 2 replies; 14+ messages in thread
From: Ian Campbell @ 2011-07-22 14:32 UTC (permalink / raw)
  To: Andreas Olsowski; +Cc: xen-devel

On Fri, 2011-07-22 at 15:28 +0100, Andreas Olsowski wrote:
> >> My expertise with C is barely existant, but i took a look at
> >> tools/libxl/libxl_device.c
> >>
> >> and changed your
> >> libxl__device_destroy_tapdisk(gc, be_path);
> >> into
> >> libxl__device_destroy_tapdisk(&gc, be_path);
> >>
> >> as i have seen some&gc on other lines of code.
> >
> > That looks right. I think this is just a difference between current
> > xen-unstable and xen-4.1 (due to 23045:c426a7140c99 FWIW).
> What do you mean looks right, the compilation errors or my 
> shot-in-the-dark adjustment?

Your fix looked sensible.

> 
> > Uh. That really shouldn't happen :-/ In fact baring a bug in the host OS
> > itself I'm not sure how ps can ever get into that state...
> I had this happen before on two occasions (one of them using xm to 
> create a guest, whereas xl worked fine) and Sébastien Riccio wrote in 
> this thread, that he encountered it too.
> If this one returns during "normal operation", ill write some more.
> 
> > It's possible that it relies on something in xen-unstable that I'm not
> > aware of. Would it be possible for you to try and repro this issue with
> > xen-unstable.hg and this patch?
> Yes, i can and will do that.
> Probably later this evening (4PM here now), but definitely this weekend.
> I will reply to this thread with the results.

Thanks.

Ian.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and multipathing
  2011-07-22 10:01       ` Sébastien RICCIO
@ 2011-07-22 18:55         ` Daniel Stodden
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Stodden @ 2011-07-22 18:55 UTC (permalink / raw)
  To: Sébastien RICCIO; +Cc: Andreas Olsowski, xen-devel

On Fri, 2011-07-22 at 06:01 -0400, Sébastien RICCIO wrote:
> > The processes, really? Where do they hang? (check out the wait state --
> > ps -eopid,wchan:25,cmd or so).
> >
> > Or do you mean they're stuck waiting for I/Os?
> >
> > Daniel
> >
> >
> 
> They seems to work and to do their job, but they are in a strange state. 
> For example a ps -aux on dom0 hangs when processing
> the line about the tapdisk process, also it cannot be detached from the 
> vm, and issuing a reboot of the host hangs too (can't kill the process 
> so it doesn't reboot).
> 
> I fighted quite a lot with this on a debian6 + xen 4.1.x  box and found 
> out that disabling the  multipath-tools and multipath-tools-boot 
> corrected the problem (but I need them). I thought that maybe it was 
> beacause multipathd try to "multipath" the block device
> handled by blktap2 and somehow locks it. But it's speculations :)

The multipathing is in a dm node to which tapdisk issues I/O. There's no
special handling involved in there whatsoever. It's completely
transparent, to blktap and tapdisk, as it should be.

I could imagine tapdisk wedging in dm code, during some I/O operations.
These should be fully asynchronous, but for some storage types under
special conditions that's sometimes wishful thinking. That applies if
you find a tap-ctl call (even just a list command) blocking.

The blktap module does not do anything unusual to the tapdisk task.

Anyway, it'd initially be a matter of figuring out where exactly it
blocks. If ps is borked, try to get another shell and
cat /proc/<pid>/wchan. Makes sense with both the ps and tapdisk2 tasks.
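
When ps itself is wedged, the same information can be collected without
it. The following is a minimal sketch, assuming a Linux /proc that
exposes per-process stat and wchan files; the function name
wait_channels and its proc_root parameter are hypothetical, and the
command name is taken from /proc/<pid>/stat.

```python
import os


def wait_channels(comm_names, proc_root="/proc"):
    """Map pid -> (comm, wchan) for processes whose comm is in comm_names."""
    result = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return result
    for entry in entries:
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            # The command name is the parenthesised second field of stat.
            comm = stat[stat.index("(") + 1:stat.rindex(")")]
            if comm not in comm_names:
                continue
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip()
        except (OSError, ValueError):
            # Process exited, permission denied, or unexpected stat format.
            continue
        result[int(entry)] = (comm, wchan)
    return result


if __name__ == "__main__":
    found = wait_channels({"tapdisk2", "ps"})
    for pid, (comm, wchan) in sorted(found.items()):
        print(pid, comm, wchan or "-")
```

Reading wchan usually requires root, and whether any of these reads can
block on an already wedged task is not something this sketch can rule
out.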

You say from the guest I/O perspective it still makes progress? If not,
that would explain why you're unable to detach: Blkback won't be able to
release the device before all pending I/O is flushed.

To check tapdev I/O state from the host side, do a
cat /sys/class/blktap2/tapdisk<n>/debug

That will dump some task stuff and a list of outstanding requests, if
there are any.
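
To collect those dumps for all tap devices at once, something along
these lines may help. It is a sketch only: the glob pattern simply
generalizes the path given above, the function name dump_tapdev_debug is
hypothetical, and the format of the debug output is not interpreted.

```python
import glob


def dump_tapdev_debug(pattern="/sys/class/blktap2/tapdisk*/debug"):
    """Return {path: contents} for every debug file matching pattern."""
    dumps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                dumps[path] = f.read()
        except OSError as exc:
            # Keep going; one unreadable node should not hide the others.
            dumps[path] = "<unreadable: %s>" % exc
    return dumps


if __name__ == "__main__":
    for path, text in dump_tapdev_debug().items():
        print("==", path)
        print(text)
```

If the sysfs layout differs on a given kernel, pass a different pattern
instead of editing the function.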

> I do not have the the hands on the box at the moment to give you more 
> informations and do not want to hijack this thread. It's just that it 
> looked like the problem I encountered, but I will send you more 
> informations when I am on the box.

Thanks!

Daniel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and improper udev rules
  2011-07-22 14:32         ` Ian Campbell
@ 2011-07-25  8:55           ` Andreas Olsowski
  2011-08-11 13:32           ` Andreas Olsowski
  1 sibling, 0 replies; 14+ messages in thread
From: Andreas Olsowski @ 2011-07-25  8:55 UTC (permalink / raw)
  To: xen-devel, Ian Campbell



Well, I did some testing this morning, as my VPN connection was broken
all weekend.

__xen-unstable does not leave any tapdisk processes running.__

In fact, it seems that tapdisk is only started to create the block
device and then exits.

I may be misreading normal behavior here, but the udevd processes that
are started when a guest is created stick around even after the guest
is shut down, and are then replaced by different udevd processes when
the next guest is created.


Nevertheless, I applied your patch and tried again.

The gc to &gc fix was not necessary this time.

The patch had no visible effect.

I hope this info helps in creating a patch for 4.1.



-- 
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg

Tel: ++49 4131 677 1309



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hanging tapdisk2 processes and improper udev rules
  2011-07-22 14:32         ` Ian Campbell
  2011-07-25  8:55           ` Andreas Olsowski
@ 2011-08-11 13:32           ` Andreas Olsowski
  1 sibling, 0 replies; 14+ messages in thread
From: Andreas Olsowski @ 2011-08-11 13:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Campbell



Hi,

I was wondering whether anything has happened in the last few weeks
regarding this issue.

For now I am using Xen 4.2, which either already has some kind of patch
applied or does not need one.

With best regards

-- 
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg

Tel: ++49 4131 677 1309



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-08-11 13:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-22  9:18 hanging tapdisk2 processes and improper udev rules Andreas Olsowski
2011-07-22  9:28 ` Ian Campbell
2011-07-22 11:36   ` Andreas Olsowski
2011-07-22 14:07     ` Ian Campbell
2011-07-22 14:28       ` Andreas Olsowski
2011-07-22 14:32         ` Ian Campbell
2011-07-25  8:55           ` Andreas Olsowski
2011-08-11 13:32           ` Andreas Olsowski
2011-07-22  9:31 ` Daniel Stodden
2011-07-22  9:32   ` Daniel Stodden
2011-07-22  9:34   ` Sébastien RICCIO
2011-07-22  9:50     ` Daniel Stodden
2011-07-22 10:01       ` Sébastien RICCIO
2011-07-22 18:55         ` hanging tapdisk2 processes and multipathing Daniel Stodden
