All of lore.kernel.org
 help / color / mirror / Atom feed
* 2.4.18-14 kernel stuck during ext3 umount with ping still responding
@ 2003-01-09  9:38 yuval yeret
  2003-01-09  9:58 ` Arjan van de Ven
  2003-01-09 10:06 ` Andrew Morton
  0 siblings, 2 replies; 5+ messages in thread
From: yuval yeret @ 2003-01-09  9:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: yuval

Hi,

I'm running a 2.4.18-14 kernel with a heavy IO profile using ext3 over RAID 
0+1 volumes.

From time to time I get a black-screen, stuck machine while trying to umount 
a volume during an IO workload (as part of a failback solution, but after 
killing all IO processes). Ping still responds, but everything else is 
mostly dead.

I tried using the forcedumount patch to solve this problem, to no avail. 
I also tried upgrading the qlogic drivers to the latest ones from QLogic.

After one of the occurrences I managed to get some output using the sysrq 
keys.

This seems similar to what is described in 
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=77508 but with a 
different call trace.

What follows is what I managed to copy down (for some reason pgup/pgdown 
didn't work, so the information isn't complete), together with a manual 
lookup of the call trace in /proc/ksyms:

process umount
EIP c01190b8			(set_running_and_schedule)
call trace:
c01144c9	f25f9ec0	IO_APIC_get_PCI_irq_vector
c010a8b0	f25f9ed0	enable_irq
c014200c	f25f9ef0	fsync_buffers_list
c0155595	f25f9efc	clear_inode
c015553d	f25f9f2c	invalidate_inodes
c01461d8	f25f9f78	get_super
c014a629	f25f9f94	path_release
c0157c58	f25f9fc0	sys_umount
c0108cab			sys_sigaltstack
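
For reference, the manual /proc/ksyms lookup described above amounts to finding, for each address in the trace, the nearest exported symbol at or below it. Here is a minimal C sketch of that lookup; the symbol table is a hypothetical extract, not the real 2.4.18-14 symbol layout, and resolve() is a made-up helper name:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical extract of /proc/ksyms entries, sorted by address.
 * The real table on this kernel would be much larger and differ. */
struct ksym {
	unsigned long addr;
	const char *name;
};

static const struct ksym syms[] = {
	{ 0xc0108c50UL, "sys_sigaltstack" },
	{ 0xc0141200UL, "__wait_on_buffer" },
	{ 0xc0141f80UL, "fsync_buffers_list" },
	{ 0xc0157c00UL, "sys_umount" },
};

/* Resolve a trace address to the nearest symbol at or below it --
 * the same lookup done by hand against /proc/ksyms (or by ksymoops). */
static const char *resolve(unsigned long addr)
{
	const char *best = "?";
	for (int i = 0; i < (int)(sizeof(syms) / sizeof(syms[0])); i++) {
		if (syms[i].addr <= addr)
			best = syms[i].name;	/* still at or below addr */
		else
			break;			/* table is sorted: done */
	}
	return best;
}
```

This is what ksymoops automates; doing it by hand is error-prone, which may explain the slightly garbled addresses in the later traces in this thread.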

Any idea what can cause this?

I'm hoping the ext3fix.patch will solve this problem... am trying that now.


Thanks,
Yuval

P.S. please CC me for questions/replies as I'm not currently subscribed to 
the list.

--
Yuval Yeret
Exanet
http://www.exanet.com
Tel.  972-9-9717782
Fax. 972-9-9717778










^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 2.4.18-14 kernel stuck during ext3 umount with ping still responding
  2003-01-09  9:38 2.4.18-14 kernel stuck during ext3 umount with ping still responding yuval yeret
@ 2003-01-09  9:58 ` Arjan van de Ven
  2003-01-09 10:06 ` Andrew Morton
  1 sibling, 0 replies; 5+ messages in thread
From: Arjan van de Ven @ 2003-01-09  9:58 UTC (permalink / raw)
  To: yuval yeret; +Cc: linux-kernel, yuval

[-- Attachment #1: Type: text/plain, Size: 562 bytes --]

On Thu, 2003-01-09 at 10:38, yuval yeret wrote:
> Hi,
> 
> I'm running a 2.4.18-14 kernel with a heavy IO profile using ext3 over RAID 
> 0+1 volumes.
> 
> From time to time I get a black screen stuck machine while trying to umount 
> a volume during an IO workload (as part of a failback solution - but after 
> killing all IO processes ), with ping still responding, but everything else 
> mostly dead.
> 

> I'm hoping the ext3fix.patch will solve this problem... am trying that now.

This was fixed in the recent errata kernel, 2.4.18-19.8.0.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 2.4.18-14 kernel stuck during ext3 umount with ping still responding
  2003-01-09  9:38 2.4.18-14 kernel stuck during ext3 umount with ping still responding yuval yeret
  2003-01-09  9:58 ` Arjan van de Ven
@ 2003-01-09 10:06 ` Andrew Morton
  1 sibling, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2003-01-09 10:06 UTC (permalink / raw)
  To: yuval yeret; +Cc: linux-kernel, yuval

yuval yeret wrote:
> 
> Hi,
> 
> I'm running a 2.4.18-14 kernel with a heavy IO profile using ext3 over RAID
> 0+1 volumes.
> 
> From time to time I get a black screen stuck machine while trying to umount
> a volume during an IO workload (as part of a failback solution - but after
> killing all IO processes ), with ping still responding, but everything else
> mostly dead.
> 
> I tried using the forcedumount patch to solve this problem - to no avail.
> Also tried upgrading the qlogic drivers to the latest drivers from Qlogic.
> 
> After one of the occurrences I managed to get some output using the sysrq
> keys.
> 
> This seems similar to what is described in
> http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=77508 but with a
> different call trace
> 
> What I have here is what I managed to copy down (for some reason pgup/pgdown
> didn't work so not all information is full...) together with a manual lookup
> of the call trace
> from /proc/ksyms :
> 
> process umount
> EIP c01190b8                    (set_running_and_schedule)
> call trace:
> c01144c9        f25f9ec0        IO_APIC_get_PCI_irq_vector
> c010a8b0        f25f9ed0        enable_irq
> c014200c        f25f9ef0        fsync_buffers_list
> c0155595        f25f9efc        clear_inode
> c015553d        f25f9f2c        invalidate_inodes
> c01461d8        f25f9f78        get_super
> c014a629        f25f9f94        path_release
> c0157c58        f25f9fc0        sys_umount
> c0108cab                        sys_sigaltstack
> 
> Any idea what can cause this ?
> 

If you have a large amount of data against two or more filesystems
and you try to unmount one of them, the kernel can seize up for a
very long time in the fsync_dev()->sync_buffers() function.  Under
these circumstances that function has O(n*n) search complexity,
and n is quite large.

However your backtrace shows neither of those functions.

Still, as an experiment it would be interesting to see if the below
patch fixes it up.  It converts O(n*n) to O(m), where m > n.



 fs/buffer.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

--- 2420/fs/buffer.c~a	Thu Jan  9 02:03:06 2003
+++ 2420-akpm/fs/buffer.c	Thu Jan  9 02:04:02 2003
@@ -307,11 +307,11 @@ int sync_buffers(kdev_t dev, int wait)
 	 * 2) write out all dirty, unlocked buffers;
 	 * 2) wait for completion by waiting for all buffers to unlock.
 	 */
-	write_unlocked_buffers(dev);
+	write_unlocked_buffers(NODEV);
 	if (wait) {
-		err = wait_for_locked_buffers(dev, BUF_DIRTY, 0);
+		err = wait_for_locked_buffers(NODEV, BUF_DIRTY, 0);
 		write_unlocked_buffers(dev);
-		err |= wait_for_locked_buffers(dev, BUF_LOCKED, 1);
+		err |= wait_for_locked_buffers(NODEV, BUF_LOCKED, 1);
 	}
 	return err;
 }

_
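
For illustration, the complexity difference described above can be sketched outside the kernel: the per-device flush must restart its scan from the head of the buffer list after every write (the list lock is dropped while the buffer goes out), whereas the NODEV variant writes everything dirty in one linear pass. This is a toy model with made-up names (flush_dev_restart, flush_all), not the actual fs/buffer.c code:

```c
#include <assert.h>

/* Old behaviour: flush only the target device's buffers, but restart
 * the scan from the head of the list after each write, as the real
 * code must after dropping the list lock.  Worst case O(n*n) steps. */
static long flush_dev_restart(int dev[], int dirty[], int n, int target)
{
	long steps = 0;
restart:
	for (int i = 0; i < n; i++) {
		steps++;
		if (dirty[i] && dev[i] == target) {
			dirty[i] = 0;	/* "write out" the buffer */
			goto restart;	/* lock dropped: rescan from head */
		}
	}
	return steps;
}

/* Patched behaviour: NODEV flushes every dirty buffer regardless of
 * device, so a single linear pass suffices: O(m) steps, m >= n. */
static long flush_all(int dev[], int dirty[], int n)
{
	long steps = 0;
	(void)dev;	/* device ignored: NODEV semantics */
	for (int i = 0; i < n; i++) {
		steps++;
		if (dirty[i])
			dirty[i] = 0;
	}
	return steps;
}
```

With 1000 buffers alternating between two devices, the restart variant takes roughly 250x as many list steps as the linear pass, which is the trade the patch makes: more buffers written (other devices' too), far fewer traversals.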

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 2.4.18-14 kernel stuck during ext3 umount with ping still responding
@ 2003-01-22 18:02 yuval yeret
  0 siblings, 0 replies; 5+ messages in thread
From: yuval yeret @ 2003-01-22 18:02 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, yuval

Well, it took some time to reproduce the environment with a kernel that 
includes the patch, but it seems it didn't help after all.

The ctrl-scroll-lock output shows umount stuck in the R state, with a call 
trace similar to before, though the inner calls are a little different:

c0141295                        __wait_on_buffer
c014200c                        fsync_buffers_list
c0155595        f25f9efc        clear_inode
c015553d        f25f9f2c        invalidate_inodes
c01461d8        f25f9f78        get_super
c014a629        f25f9f94        path_release
c0157c58        f25f9fc0        sys_umount
c0108cab                        sys_sigaltstack

This is a 2.4.18-19.7 kernel patched as per the suggestion quoted below.

Any pointers/suggestions are welcome.

Thanks,
Yuval








>From: Andrew Morton <akpm@digeo.com>
>To: yuval yeret <yuval_yeret@hotmail.com>
>CC: linux-kernel@vger.kernel.org, yuval@exanet.com
>Subject: Re: 2.4.18-14 kernel stuck during ext3 umount with ping still 
>responding
>Date: Thu, 09 Jan 2003 02:06:55 -0800
>
>yuval yeret wrote:
> >
> > Hi,
> >
> > I'm running a 2.4.18-14 kernel with a heavy IO profile using ext3 over 
>RAID
> > 0+1 volumes.
> >
> > From time to time I get a black screen stuck machine while trying to 
>umount
> > a volume during an IO workload (as part of a failback solution - but 
>after
> > killing all IO processes ), with ping still responding, but everything 
>else
> > mostly dead.
> >
> > I tried using the forcedumount patch to solve this problem - to no 
>avail.
> > Also tried upgrading the qlogic drivers to the latest drivers from 
>Qlogic.
> >
> > After one of the occurrences I managed to get some output using the sysrq
> > keys.
> >
> > This seems similar to what is described in
> > http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=77508 but with a
> > different call trace
> >
> > What I have here is what I managed to copy down (for some reason 
>pgup/pgdown
> > didn't work so not all information is full...) together with a manual 
>lookup
> > of the call trace
> > from /proc/ksyms :
> >
> > process umount
> > EIP c01190b8                    (set_running_and_schedule)
> > call trace:
> > c01144c9        f25f9ec0        IO_APIC_get_PCI_irq_vector
> > c010a8b0        f25f9ed0        enable_irq
> > c014200c        f25f9ef0        fsync_buffers_list
> > c0155595        f25f9efc        clear_inode
> > c015553d        f25f9f2c        invalidate_inodes
> > c01461d8        f25f9f78        get_super
> > c014a629        f25f9f94        path_release
> > c0157c58        f25f9fc0        sys_umount
> > c0108cab                        sys_sigaltstack
> >
> > Any idea what can cause this ?
> >
>
>If you have a large amount of data against two or more filesystems,
>and you try to unmount one of them the kernel can seize up for a
>very long time in the fsync_dev()->sync_buffers() function.  Under
>these circumstances that function has O(n*n) search complexity
>and n is quite large.
>
>However your backtrace shows neither of those functions.
>
>Still, as an experiment it would be interesting to see if the below
>patch fixes it up.  It converts O(n*n) to O(m), where m > n.
>
>
>
>  fs/buffer.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
>--- 2420/fs/buffer.c~a	Thu Jan  9 02:03:06 2003
>+++ 2420-akpm/fs/buffer.c	Thu Jan  9 02:04:02 2003
>@@ -307,11 +307,11 @@ int sync_buffers(kdev_t dev, int wait)
>  	 * 2) write out all dirty, unlocked buffers;
>  	 * 2) wait for completion by waiting for all buffers to unlock.
>  	 */
>-	write_unlocked_buffers(dev);
>+	write_unlocked_buffers(NODEV);
>  	if (wait) {
>-		err = wait_for_locked_buffers(dev, BUF_DIRTY, 0);
>+		err = wait_for_locked_buffers(NODEV, BUF_DIRTY, 0);
>  		write_unlocked_buffers(dev);
>-		err |= wait_for_locked_buffers(dev, BUF_LOCKED, 1);
>+		err |= wait_for_locked_buffers(NODEV, BUF_LOCKED, 1);
>  	}
>  	return err;
>  }
>
>_




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 2.4.18-14 kernel stuck during ext3 umount with ping still responding
@ 2003-01-15 19:13 yuval yeret
  0 siblings, 0 replies; 5+ messages in thread
From: yuval yeret @ 2003-01-15 19:13 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, yuval

Hi,

The problem reproduces on a 2.4.18-19 kernel as well. It took some more 
time, but it finally reared its ugly head.

This is the stack trace from the new kernel:

c01190b8	f3791eb4	set_running_and_schedule
c010a8b0	f3791ed0	enable_irq
c014200c	f3791f0c	IO_APIC_get_PCI_irq_vector
c0155595	f3791f0c	clear_inode
c01556639	f3791f60	invalidate_inodes
c0149629	f3791f8	set_binfmt
c0157c58	f3791f94	sys_umount
c0108cab	0f3791fc0	sys_sigaltstack


Andrew Morton suggested a buffer.c patch to reduce the search complexity, 
which I will try next.

Any further comments/suggestions are welcome.

Thanks,
Yuval







>From: Andrew Morton <akpm@digeo.com>
>To: yuval yeret <yuval_yeret@hotmail.com>
>CC: linux-kernel@vger.kernel.org, yuval@exanet.com
>Subject: Re: 2.4.18-14 kernel stuck during ext3 umount with ping still 
>responding
>Date: Thu, 09 Jan 2003 02:06:55 -0800
>
>yuval yeret wrote:
> >
> > Hi,
> >
> > I'm running a 2.4.18-14 kernel with a heavy IO profile using ext3 over 
>RAID
> > 0+1 volumes.
> >
> > From time to time I get a black screen stuck machine while trying to 
>umount
> > a volume during an IO workload (as part of a failback solution - but 
>after
> > killing all IO processes ), with ping still responding, but everything 
>else
> > mostly dead.
> >
> > I tried using the forcedumount patch to solve this problem - to no 
>avail.
> > Also tried upgrading the qlogic drivers to the latest drivers from 
>Qlogic.
> >
> > After one of the occurrences I managed to get some output using the sysrq
> > keys.
> >
> > This seems similar to what is described in
> > http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=77508 but with a
> > different call trace
> >
> > What I have here is what I managed to copy down (for some reason 
>pgup/pgdown
> > didn't work so not all information is full...) together with a manual 
>lookup
> > of the call trace
> > from /proc/ksyms :
> >
> > process umount
> > EIP c01190b8                    (set_running_and_schedule)
> > call trace:
> > c01144c9        f25f9ec0        IO_APIC_get_PCI_irq_vector
> > c010a8b0        f25f9ed0        enable_irq
> > c014200c        f25f9ef0        fsync_buffers_list
> > c0155595        f25f9efc        clear_inode
> > c015553d        f25f9f2c        invalidate_inodes
> > c01461d8        f25f9f78        get_super
> > c014a629        f25f9f94        path_release
> > c0157c58        f25f9fc0        sys_umount
> > c0108cab                        sys_sigaltstack
> >
> > Any idea what can cause this ?
> >
>
>If you have a large amount of data against two or more filesystems,
>and you try to unmount one of them the kernel can seize up for a
>very long time in the fsync_dev()->sync_buffers() function.  Under
>these circumstances that function has O(n*n) search complexity
>and n is quite large.
>
>However your backtrace shows neither of those functions.
>
>Still, as an experiment it would be interesting to see if the below
>patch fixes it up.  It converts O(n*n) to O(m), where m > n.
>
>
>
>  fs/buffer.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
>--- 2420/fs/buffer.c~a	Thu Jan  9 02:03:06 2003
>+++ 2420-akpm/fs/buffer.c	Thu Jan  9 02:04:02 2003
>@@ -307,11 +307,11 @@ int sync_buffers(kdev_t dev, int wait)
>  	 * 2) write out all dirty, unlocked buffers;
>  	 * 2) wait for completion by waiting for all buffers to unlock.
>  	 */
>-	write_unlocked_buffers(dev);
>+	write_unlocked_buffers(NODEV);
>  	if (wait) {
>-		err = wait_for_locked_buffers(dev, BUF_DIRTY, 0);
>+		err = wait_for_locked_buffers(NODEV, BUF_DIRTY, 0);
>  		write_unlocked_buffers(dev);
>-		err |= wait_for_locked_buffers(dev, BUF_LOCKED, 1);
>+		err |= wait_for_locked_buffers(NODEV, BUF_LOCKED, 1);
>  	}
>  	return err;
>  }
>
>_




^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2003-01-22 17:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-01-09  9:38 2.4.18-14 kernel stuck during ext3 umount with ping still responding yuval yeret
2003-01-09  9:58 ` Arjan van de Ven
2003-01-09 10:06 ` Andrew Morton
2003-01-15 19:13 yuval yeret
2003-01-22 18:02 yuval yeret

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.