From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Stodden
Subject: Re: hanging tapdisk2 processes and multipathing
Date: Fri, 22 Jul 2011 11:55:15 -0700
Message-ID: <1311360915.14071.290.camel@agari.van.xensource.com>
References: <4E294068.2030700@leuphana.de> <1311327074.2360.14.camel@ramone> <4E294430.7090805@swisscenter.com> <1311328223.2360.23.camel@ramone> <4E294A75.9040106@swisscenter.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Return-path:
In-Reply-To: <4E294A75.9040106@swisscenter.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: =?ISO-8859-1?Q?S=E9bastien?= RICCIO
Cc: Andreas Olsowski , "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

On Fri, 2011-07-22 at 06:01 -0400, Sébastien RICCIO wrote:
> > The processes, really? Where do they hang? (check out the wait state --
> > ps -eopid,wchan:25,cmd or so).
> >
> > Or do you mean they're stuck waiting for I/Os?
> >
> > Daniel
> >
> >
>
> They seems to work and to do their job, but they are in a strange state.
> For example a ps -aux on dom0 hangs when processing
> the line about the tapdisk process, also it cannot be detached from the
> vm, and issuing a reboot of the host hangs too (can't kill the process
> so it doesn't reboot).
>
> I fighted quite a lot with this on a debian6 + xen 4.1.x box and found
> out that disabling the multipath-tools and multipath-tools-boot
> corrected the problem (but I need them). I thought that maybe it was
> beacause multipathd try to "multipath" the block device
> handled by blktap2 and somehow locks it. But it's speculations :)

The multipathing is in a dm node to which tapdisk issues I/O. There's no
special handling involved in there whatsoever. It's completely
transparent, to blktap and tapdisk, as it should be.

I could imagine tapdisk wedging in dm code, during some I/O operations.
These should be fully asynchronous, but for some storage types under
special conditions that's sometimes wishful thinking. That applies if
you find a tap-ctl call (even just a list command) blocking. The blktap
module does not do anything unusual to the tapdisk task.

Anyway, it'd initially be a matter of figuring out where exactly it
blocks. If ps is borked, try to get another shell and cat
/proc//wchan. Makes sense with both the ps and tapdisk2 tasks.

You say from the guest I/O perspective it still makes progress? If not,
that would explain why you're unable to detach: Blkback won't be able to
release the device before all pending I/O is flushed.

To check tapdev I/O state from the host side, do a

  cat /sys/class/blktap2/tapdisk/debug

That will dump some task stuff and a list of outstanding requests, if
there are any.

> I do not have the the hands on the box at the moment to give you more
> informations and do not want to hijack this thread. It's just that it
> looked like the problem I encountered, but I will send you more
> informations when I am on the box.

Thanks!

Daniel
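
P.S. In case ps itself hangs, the wchan check above can be done without
it. What follows is only a rough sketch in Python, not anything shipped
with blktap: the function names (read_text, parse_stat, wait_channels)
are made up for illustration. It walks the per-task directories under
/proc and reads only the stat and wchan files, on the assumption that
those are less likely to block than whatever ps is stuck on. It matches
tasks by the short command name found in stat.

```python
#!/usr/bin/env python3
"""Print state and kernel wait channel of tasks matching a command name."""
import os
import sys


def read_text(path):
    """Return the stripped contents of a proc file, or None if unreadable."""
    try:
        with open(path, "rb") as f:
            return f.read().decode("ascii", "replace").strip()
    except OSError:
        return None


def parse_stat(text):
    """Split a per-task stat line into (comm, state).

    The command name sits between the first '(' and the last ')';
    the single-letter task state is the field right after it.
    """
    start = text.index("(")
    end = text.rindex(")")
    return text[start + 1:end], text[end + 1:].split()[0]


def wait_channels(name, proc_root="/proc"):
    """Yield (pid, comm, state, wchan) for tasks whose comm contains name."""
    pids = sorted(int(e) for e in os.listdir(proc_root) if e.isdigit())
    for pid in pids:
        stat = read_text(os.path.join(proc_root, str(pid), "stat"))
        if stat is None:
            continue  # task exited while we were scanning
        try:
            comm, state = parse_stat(stat)
        except (ValueError, IndexError):
            continue  # unexpected stat format, skip it
        if name not in comm:
            continue
        wchan = read_text(os.path.join(proc_root, str(pid), "wchan"))
        yield pid, comm, state, wchan or "-"


if __name__ == "__main__":
    # Default to the tapdisk2 tasks; pass e.g. "ps" to inspect a stuck ps.
    target = sys.argv[1] if len(sys.argv) > 1 else "tapdisk2"
    for pid, comm, state, wchan in wait_channels(target):
        print(f"{pid:>7}  {comm:<16}  {state}  {wchan}")
```

Run it once with no argument for the tapdisk2 tasks and once with "ps"
as the argument for the stuck ps. A task sitting in state D with the
same wait channel across repeated runs would be the one to look at.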