From: Dave Chinner <david@fromorbit.com>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, xfs@oss.sgi.com, Alan Piszcz <ap@solarrain.com>
Subject: Re: 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state after 24-48 hours (sysrq-t+w available)
Date: Tue, 20 Oct 2009 11:33:58 +1100
Message-ID: <20091020003358.GW9464@discord.disaster>
In-Reply-To: <alpine.DEB.2.00.0910190431180.23395@p34.internal.lan>

On Mon, Oct 19, 2009 at 06:18:58AM -0400, Justin Piszcz wrote:
> On Mon, 19 Oct 2009, Dave Chinner wrote:
>> On Sun, Oct 18, 2009 at 04:17:42PM -0400, Justin Piszcz wrote:
>>> It has happened again, all sysrq-X output was saved this time.
>> .....
>>
>> All pointing to log IO not completing.
>> ....
> So far I do not have a reproducible test case,

Ok. What sort of load is being placed on the machine?

> the only other thing not posted was the output of ps auxww during
> the time of the lockup, not sure if it will help, but here it is:
>
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> root         1  0.0  0.0  10320   684 ?        Ss   Oct16   0:00 init [2]
....
> root       371  0.0  0.0      0     0 ?        R<   Oct16   0:01 [xfslogd/0]
> root       372  0.0  0.0      0     0 ?        S<   Oct16   0:00 [xfslogd/1]
> root       373  0.0  0.0      0     0 ?        S<   Oct16   0:00 [xfslogd/2]
> root       374  0.0  0.0      0     0 ?        S<   Oct16   0:00 [xfslogd/3]
> root       375  0.0  0.0      0     0 ?        R<   Oct16   0:00 [xfsdatad/0]
> root       376  0.0  0.0      0     0 ?        S<   Oct16   0:00 [xfsdatad/1]
> root       377  0.0  0.0      0     0 ?        S<   Oct16   0:03 [xfsdatad/2]
> root       378  0.0  0.0      0     0 ?        S<   Oct16   0:01 [xfsdatad/3]
> root       379  0.0  0.0      0     0 ?        S<   Oct16   0:00 [xfsconvertd/0]
> root       380  0.0  0.0      0     0 ?        S<   Oct16   0:00 [xfsconvertd/1]
> root       381  0.0  0.0      0     0 ?        S<   Oct16   0:00 [xfsconvertd/2]
> root       382  0.0  0.0      0     0 ?        S<   Oct16   0:00 [xfsconvertd/3]
.....

It appears that both the xfslogd and the xfsdatad on CPU 0 are in
the running state but don't appear to be consuming any significant
CPU time. If they remain like this, then I think that means they
are stuck waiting on the run queue. Do these XFS threads always
appear like this when the hang occurs? If so, is there something
else hogging CPU 0 and preventing these threads from getting the
CPU?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
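[Editor's note: the filtering step Dave describes — spotting the kernel threads stuck in the running (R) state in the quoted `ps auxww` output — can be sketched mechanically. The snippet below is an illustration, not part of the original mail; the sample lines are a subset copied from the report above, and the `$8` field number assumes the standard `ps auxww` column layout where STAT is the eighth column.]

```shell
# Sample lines copied verbatim from the quoted `ps auxww` output above.
ps_output='root       371  0.0  0.0      0     0 ?        R<   Oct16   0:01 [xfslogd/0]
root       372  0.0  0.0      0     0 ?        S<   Oct16   0:00 [xfslogd/1]
root       375  0.0  0.0      0     0 ?        R<   Oct16   0:00 [xfsdatad/0]
root       376  0.0  0.0      0     0 ?        S<   Oct16   0:00 [xfsdatad/1]'

# Field 8 is STAT; "R<" means runnable at raised priority. Print the
# PID and command of every runnable task -- these are the threads that
# should be getting CPU time but (per the 0:01 / 0:00 TIME columns)
# are barely accumulating any, i.e. stuck on the run queue.
printf '%s\n' "$ps_output" | awk '$8 ~ /^R/ { print $2, $11 }'
```

On a live system, one hedged way to answer the follow-up question ("is something else hogging CPU 0?") is `ps -eLo psr,stat,pid,comm | awk '$1 == 0'`, which lists the threads last scheduled on CPU 0 (the `psr` output field is procps-specific).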