linux-kernel.vger.kernel.org archive mirror
* How to send a break?
@ 2006-05-27 12:58 Haar János
  2006-05-27 23:43 ` Jim Crilly
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Haar János @ 2006-05-27 12:58 UTC (permalink / raw)
  To: linux-kernel

Hello, list,

I would like to know how to send a "BREAK" to trigger the SysRq functions on the
serial line, using echo.

I mean like this:

#!/bin/bash
echo "?BREAK?" >/dev/ttyS0
sleep 2
echo "m" >/dev/ttyS0

Thanks,
Janos


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: How to send a break?
  2006-05-27 12:58 How to send a break? Haar János
@ 2006-05-27 23:43 ` Jim Crilly
  2006-05-28  7:04   ` How to send a break? - dump from frozen 64bit linux Haar János
  2006-05-28 23:06 ` How to send a break? H. Peter Anvin
  2006-05-29 15:08 ` linux-os (Dick Johnson)
  2 siblings, 1 reply; 26+ messages in thread
From: Jim Crilly @ 2006-05-27 23:43 UTC (permalink / raw)
  To: Haar János; +Cc: linux-kernel

On 05/27/06 02:58:44PM +0200, Haar János wrote:
> Hello, list,
> 
> I wish to know, how to send a "BREAK" to trigger the sysreq functions on the
> serial line, using echo.
> 
> I mean like this:
> 
> #!/bin/bash
> echo "?BREAK?" >/dev/ttyS0
> sleep 2
> echo "m" >/dev/ttyS0
> 

Is there a reason you can't use "echo -n m > /proc/sysrq-trigger"?
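For reference, Jim's `echo -n m > /proc/sysrq-trigger` can be sketched in C as below. This is a minimal illustration, not code from the thread: the function name `sysrq_write_key()` is hypothetical, and the trigger path is a parameter only so the function can be exercised against an ordinary file. On a live system it would be called with `/proc/sysrq-trigger`, which needs root and only works while the machine can still run a program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): write one SysRq command key
 * to a trigger file, like `echo -n m > FILE`.
 * Returns 0 on success, -1 on error (with a message on stderr). */
int sysrq_write_key(const char *path, char key)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY);
    if (fd < 0) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }

    /* Exactly one byte and no trailing newline, as with `echo -n`. */
    ssize_t n = write(fd, &key, 1);
    int saved_errno = errno;
    close(fd);

    if (n != 1) {
        fprintf(stderr, "write %s: %s\n", path,
                n < 0 ? strerror(saved_errno) : "short write");
        return -1;
    }
    return 0;
}
```

For example, `sysrq_write_key("/proc/sysrq-trigger", 'm')` asks for the same memory report as the shell one-liner. It cannot help once the box is too wedged to run anything, which is the case Janos is dealing with.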

Jim.


* Re: How to send a break? - dump from frozen 64bit linux
  2006-05-27 23:43 ` Jim Crilly
@ 2006-05-28  7:04   ` Haar János
  2006-05-28 16:17     ` Jesper Juhl
  0 siblings, 1 reply; 26+ messages in thread
From: Haar János @ 2006-05-28  7:04 UTC (permalink / raw)
  To: Jim Crilly; +Cc: linux-kernel


----- Original Message ----- 
From: "Jim Crilly" <jim@why.dont.jablowme.net>
To: "Haar János" <djani22@netcenter.hu>
Cc: <linux-kernel@vger.kernel.org>
Sent: Sunday, May 28, 2006 1:43 AM
Subject: Re: How to send a break?


> On 05/27/06 02:58:44PM +0200, Haar János wrote:
> > Hello, list,
> >
> > I wish to know, how to send a "BREAK" to trigger the sysreq functions on the
> > serial line, using echo.
> >
> > I mean like this:
> >
> > #!/bin/bash
> > echo "?BREAK?" >/dev/ttyS0
> > sleep 2
> > echo "m" >/dev/ttyS0
> >
>
> Is there a reason you can't use "echo -n m > /proc/sysrq-trigger"?

Yes: I want to dump my frequently frozen remote server automatically, if
possible (using a null-modem cable and another server).
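Automating the dump means the second server has to log everything the frozen box prints on its serial console. A minimal C sketch of that receiving side follows; it is an illustration, not something posted in the thread. The function names are hypothetical, and it assumes the caller has already opened the port (for example `/dev/ttyS0` on the watching server), that the speed passed in matches the monitored kernel's `console=ttyS0,<speed>` setting, and that the line runs 8N1 without flow control.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: put an already-open serial port into raw 8N1 mode
 * at `speed`. Returns 0 on success, -1 on error (e.g. fd is not a tty). */
int serial_setup_raw(int fd, speed_t speed)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) != 0)
        return -1;
    /* Turn off line editing, echo, signal keys and CR/NL translation by
     * hand, since cfmakeraw() is not part of POSIX. */
    tio.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR | ICRNL | IXON);
    tio.c_oflag &= ~OPOST;
    tio.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    tio.c_cflag &= ~(CSIZE | PARENB);
    tio.c_cflag |= CS8 | CLOCAL | CREAD;
    tio.c_cc[VMIN] = 1;   /* block until at least one byte arrives */
    tio.c_cc[VTIME] = 0;
    if (cfsetispeed(&tio, speed) != 0 || cfsetospeed(&tio, speed) != 0)
        return -1;
    return tcsetattr(fd, TCSANOW, &tio);
}

/* Hypothetical helper: copy everything read from `fd` into `log` until EOF
 * or a read error. Returns the number of bytes copied. */
long serial_capture(int fd, FILE *log)
{
    char buf[4096];
    long total = 0;

    assert(log != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;
        fwrite(buf, 1, (size_t)n, log);
        fflush(log);  /* keep the log current: the box may die mid-dump */
        total += (long)n;
    }
    return total;
}
```

On a real port, `read()` blocks waiting for data, so `serial_capture()` simply runs until it is killed or the port goes away.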

Anyway, this time I did it by hand.

Here is the dump:
http://download.netcenter.hu/bughunt/20060528/log.txt  (400KB!)

Can somebody tell me what exactly is wrong?

Also, how can I, as a single user, interpret these dumps to make my error
reports more useful?


The problem:
I had a stable system on 32-bit, but I need to switch to x86_64, because
nbd cannot use >2TB devices on 32-bit.
I have reinstalled, moving from RH 9.0 to FC 5, and recompiled everything I
use for serving. (The OS is only for external tasks.)

But on 64-bit, my system has become unstable.
Sometimes it freezes, with no error message at all!
I can see that the cron jobs hang too; syslog can still post messages, but
nothing useful.
(I cannot log in with ssh.)

That's why I need to dump it over the serial console.

Can somebody help me? :-)

Here is the "normal" dmesg message:
http://download.netcenter.hu/bughunt/20060528/dmesg.txt  (21KB)

Thanks,
Janos


>
> Jim.



* Re: How to send a break? - dump from frozen 64bit linux
  2006-05-28  7:04   ` How to send a break? - dump from frozen 64bit linux Haar János
@ 2006-05-28 16:17     ` Jesper Juhl
  2006-05-28 17:34       ` Haar János
  0 siblings, 1 reply; 26+ messages in thread
From: Jesper Juhl @ 2006-05-28 16:17 UTC (permalink / raw)
  To: Haar János; +Cc: linux-kernel

On 28/05/06, Haar János <djani22@netcenter.hu> wrote:
>
> Can somebody tell me, what is wrong exactly?
>
I can't tell you exactly what's wrong unfortunately, but after
looking at your dump & dmesg I notice two things that might be worth
trying to change:

1) You seem to be running without any swap space at all. It's usually a
good idea to always have some swap configured - try adding a swap
file.
(note: I don't think this will help with your current problem, it's
just a good thing to do generally).

2) You should try the latest stable kernel. Currently that's 2.6.16.18
(http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.16.18.tar.bz2).
There have been lots of fixes added since 2.6.15.x, and perhaps you'll be
lucky and whatever is giving you trouble has already been fixed in
that kernel.


> Anyway, i interested about, how can i -a single user- interpret these dump
> to made error reporting more useful?
>
You can find some info in Documentation/sysrq.txt &
Documentation/oops-tracing.txt .
As for posting good error/bug reports, please read the REPORTING-BUGS
file in the root of the kernel source dir.


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html


* Re: How to send a break? - dump from frozen 64bit linux
  2006-05-28 16:17     ` Jesper Juhl
@ 2006-05-28 17:34       ` Haar János
  2006-05-29  4:37         ` Jesper Juhl
  2006-05-30 10:22         ` Janos Haar
  0 siblings, 2 replies; 26+ messages in thread
From: Haar János @ 2006-05-28 17:34 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: linux-kernel

----- Original Message ----- 
From: "Jesper Juhl" <jesper.juhl@gmail.com>
To: "Haar János" <djani22@netcenter.hu>
Cc: <linux-kernel@vger.kernel.org>
Sent: Sunday, May 28, 2006 6:17 PM
Subject: Re: How to send a break? - dump from frozen 64bit linux


> On 28/05/06, Haar János <djani22@netcenter.hu> wrote:
> >
> > Can somebody tell me, what is wrong exactly?
> >
> I can't tell you exactely what's wrong unfortunately, but after
> looking at your dump & dmesg I notice two things that might be worth
> trying to change :
>
> 1) You seem to be running without any swap space at all. I't usually a
> good idea always to have some swap configured - try adding a swap
> file.
> (note: I don't think this will help with your current problem, it's
> just a good thing to do generally).

Thanks for the idea!

I had already thought of it, but dropped the idea because:
I can only use a swap _file_ in this configuration, and swapping to a file is
relatively slow.
I am afraid it would slow this system down in some cases.
The system (the programs) is relatively small compared to the buffers/caches
in use, and the kernel would swap out rarely used programs to free up memory
for caching.
I don't think that is a good idea on this system, which already has 4GB of
memory.
The minimum amount of free memory can be tuned through the VM settings.

Are you sure?

>
> 2) You should try the latest stable kernel. Currently that's 2.6.16.18
> (http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.16.18.tar.bz2).
> There have been lots of fixes added since 2.6.15.x and perhaps you are
> lucky that whatever is giving you trouble  has already been fixed in
> that kernel.

Hmm.
Last time I tried 2.6.x.16, I lost close to 4000 users' home directories
and documents on an XFS filesystem!
(A lot of directories were renamed to "/*"-style names like "/ost+found" in
the root.)
I don't want to try it again! :-)

>
>
> > Anyway, i interested about, how can i -a single user- interpret these dump
> > to made error reporting more useful?
> >
> You can find some info in Documentation/sysrq.txt &
> Documentation/oops-tracing.txt .
> As for posting good error/bug reports, please read the REPORTING-BUGS
> file in the root of the kernel source dir.

Thanks, I will read these.

Anyway, this 64-bit hang is reproducible on my system (normally about once a
day, but if I try to trigger it, about 3-4 times a day).
If somebody is interested, please let me know, and I will send any info
useful for debugging this! :-)

Cheers,
Janos


>
>
> -- 
> Jesper Juhl <jesper.juhl@gmail.com>
> Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
> Plain text mails only, please      http://www.expita.com/nomime.html



* Re: How to send a break?
  2006-05-27 12:58 How to send a break? Haar János
  2006-05-27 23:43 ` Jim Crilly
@ 2006-05-28 23:06 ` H. Peter Anvin
  2006-05-29 15:08 ` linux-os (Dick Johnson)
  2 siblings, 0 replies; 26+ messages in thread
From: H. Peter Anvin @ 2006-05-28 23:06 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <01b701c6818d$4bcd37b0$1800a8c0@dcccs>
By author:    Haar János <djani22@netcenter.hu>
In newsgroup: linux.dev.kernel
>
> Hello, list,
> 
> I wish to know, how to send a "BREAK" to trigger the sysreq functions on the
> serial line, using echo.
> 
> I mean like this:
> 
> #!/bin/bash
> echo "?BREAK?" >/dev/ttyS0
> sleep 2
> echo "m" >/dev/ttyS0
> 

You can't do it with echo; however, you can do it with Perl:

perl -e 'use POSIX; tcsendbreak(1,0);' > /dev/ttyS0
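The Perl one-liner works because the POSIX module exposes the C `tcsendbreak()` call. The same thing can be written directly in C, which also makes it easy to send the SysRq key afterwards as in the original script. This is a sketch, not code from the thread: `send_break_then_key()` is a hypothetical name, and it assumes the port is already set to the remote console's speed (for instance with `stty`). A duration of 0 requests the standard break, which POSIX defines as 0.25 to 0.5 seconds of zero bits. If I read the 2.6 serial core correctly, only the first character after a break, arriving within about five seconds, is taken as the SysRq key, so the delay should stay short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: send a BREAK on serial device `dev`, wait
 * `delay_sec` seconds, then send the SysRq command character `key`.
 * Returns 0 on success, -1 on error. */
int send_break_then_key(const char *dev, char key, unsigned int delay_sec)
{
    assert(dev != NULL);

    /* O_NOCTTY: don't let the port become our controlling terminal. */
    int fd = open(dev, O_WRONLY | O_NOCTTY);
    if (fd < 0) {
        fprintf(stderr, "open %s: %s\n", dev, strerror(errno));
        return -1;
    }

    /* Duration 0: a standard-length break. Fails with ENOTTY on non-ttys. */
    if (tcsendbreak(fd, 0) != 0) {
        fprintf(stderr, "tcsendbreak %s: %s\n", dev, strerror(errno));
        close(fd);
        return -1;
    }

    sleep(delay_sec);

    /* Assumed: the remote kernel reads this next byte as the SysRq command. */
    if (write(fd, &key, 1) != 1) {
        fprintf(stderr, "write %s: %s\n", dev, strerror(errno));
        close(fd);
        return -1;
    }
    tcdrain(fd);  /* wait until the byte has actually been transmitted */
    close(fd);
    return 0;
}
```

Janos's script then corresponds to `send_break_then_key("/dev/ttyS0", 'm', 2)`, with the resulting output coming back on the same serial line.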

	-hpa


* Re: How to send a break? - dump from frozen 64bit linux
  2006-05-28 17:34       ` Haar János
@ 2006-05-29  4:37         ` Jesper Juhl
  2007-08-20  7:44           ` Andev Debi
  2006-05-30 10:22         ` Janos Haar
  1 sibling, 1 reply; 26+ messages in thread
From: Jesper Juhl @ 2006-05-29  4:37 UTC (permalink / raw)
  To: Haar János; +Cc: linux-kernel

On 28/05/06, Haar János <djani22@netcenter.hu> wrote:
[snip]
> I can only use swap _file_ in this config, and swapping into file is
> relatively slow.

Not so. With a 2.4.x kernel swap files were slower than swap
partitions, but with the 2.6 kernel a swap file is just as fast as a
swap partition.
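If Janos does try a swap file, the usual recipe is to create a fully allocated file, run `mkswap` on it and enable it with `swapon`. The first step is sketched in C below as an illustration; `create_zero_file()` is a hypothetical name. It writes real zeros instead of creating a sparse file, because swap files must not contain holes, and it uses mode 0600 so swapped-out memory is not readable by other users.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create `path` (it must not exist yet) and fill it
 * with `size_mb` MiB of zeros, roughly like
 * `dd if=/dev/zero of=PATH bs=1M count=N`.
 * Returns 0 on success; on failure removes the partial file, returns -1. */
int create_zero_file(const char *path, unsigned int size_mb)
{
    static char block[1 << 20];  /* 1 MiB, zero-initialised (static storage) */
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0) {
        perror(path);
        return -1;
    }
    for (unsigned int i = 0; i < size_mb; i++) {
        size_t done = 0;
        while (done < sizeof block) {
            ssize_t n = write(fd, block + done, sizeof block - done);
            if (n < 0 && errno == EINTR)
                continue;
            if (n <= 0) {
                perror("write");
                close(fd);
                unlink(path);
                return -1;
            }
            done += (size_t)n;
        }
    }
    /* Flush the data to disk before the file is handed to mkswap. */
    if (fsync(fd) != 0) {
        perror("fsync");
        close(fd);
        unlink(path);
        return -1;
    }
    return close(fd);
}
```

After that, `mkswap PATH` writes the swap signature and `swapon PATH` enables it.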


[snip]
> >
> > 2) You should try the latest stable kernel. Currently that's 2.6.16.18
> > (http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.16.18.tar.bz2).
> > There have been lots of fixes added since 2.6.15.x and perhaps you are
> > lucky that whatever is giving you trouble  has already been fixed in
> > that kernel.
>
> Hmm.
> Last time, when i try the 2.6.16.x, i have lost close to 4000 users home,
> and documents on XFS filesystem!
> (a lot of directory have renamed to "/*" like this one: "/ost+found" in the
> root.)
> I don't want to try it again! :-)
>

That sounds like a pretty serious bug.
Are you sure it was caused by the kernel?
Did you report the bug to LKML & the XFS maintainers so it can get fixed?


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html


* Re: How to send a break?
  2006-05-27 12:58 How to send a break? Haar János
  2006-05-27 23:43 ` Jim Crilly
  2006-05-28 23:06 ` How to send a break? H. Peter Anvin
@ 2006-05-29 15:08 ` linux-os (Dick Johnson)
  2006-05-29 15:35   ` Valdis.Kletnieks
  2 siblings, 1 reply; 26+ messages in thread
From: linux-os (Dick Johnson) @ 2006-05-29 15:08 UTC (permalink / raw)
  To: Haar János; +Cc: linux-kernel


On Sat, 27 May 2006, Haar János wrote:

> Hello, list,
>
> I wish to know, how to send a "BREAK" to trigger the sysreq functions on the
> serial line, using echo.
>
> I mean like this:
>
> #!/bin/bash
> echo "?BREAK?" >/dev/ttyS0
> sleep 2
> echo "m" >/dev/ttyS0
>
> Thanks,
> Janos
>

Can't you use /proc/sysrq-trigger?

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.73 BogoMips).
New book: http://www.AbominableFirebug.com/


* Re: How to send a break?
  2006-05-29 15:08 ` linux-os (Dick Johnson)
@ 2006-05-29 15:35   ` Valdis.Kletnieks
  2006-05-29 17:32     ` Haar János
  0 siblings, 1 reply; 26+ messages in thread
From: Valdis.Kletnieks @ 2006-05-29 15:35 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: Haar János, linux-kernel

On Mon, 29 May 2006 11:08:15 EDT, "linux-os (Dick Johnson)" said:
> 
> On Sat, 27 May 2006, [iso-8859-2] Haar János wrote:
> 
> > Hello, list,
> >
> > I wish to know, how to send a "BREAK" to trigger the sysreq functions on the
> > serial line, using echo.
> >
> > I mean like this:
> >
> > #!/bin/bash
> > echo "?BREAK?" >/dev/ttyS0
> > sleep 2
> > echo "m" >/dev/ttyS0
> >
> > Thanks,
> > Janos
> >
> 
> Can't you use /proc/sysrq-trigger?

That can be tricky if the other end of /dev/ttyS0 is plugged into a debugging
serial port on an embedded system where you don't have easy access to a shell.

Or for that matter, if you're trying to talk to the serial port on a non-embedded
system, which is too far into OOM thrashing for you to be able to get a
usable shell prompt.....


* Re: How to send a break?
  2006-05-29 15:35   ` Valdis.Kletnieks
@ 2006-05-29 17:32     ` Haar János
  0 siblings, 0 replies; 26+ messages in thread
From: Haar János @ 2006-05-29 17:32 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: linux-kernel


----- Original Message ----- 
From: <Valdis.Kletnieks@vt.edu>
To: "linux-os (Dick Johnson)" <linux-os@analogic.com>
Cc: "Haar János" <djani22@netcenter.hu>; <linux-kernel@vger.kernel.org>
Sent: Monday, May 29, 2006 5:35 PM
Subject: Re: How to send a break?

On Mon, 29 May 2006 11:08:15 EDT, "linux-os (Dick Johnson)" said:
>
> On Sat, 27 May 2006, [iso-8859-2] Haar János wrote:
>
> > Hello, list,
> >
> > I wish to know, how to send a "BREAK" to trigger the sysreq functions on the
> > serial line, using echo.
> >
> > I mean like this:
> >
> > #!/bin/bash
> > echo "?BREAK?" >/dev/ttyS0
> > sleep 2
> > echo "m" >/dev/ttyS0
> >
> > Thanks,
> > Janos
> >
>
> Can't you use /proc/sysrq-trigger?

> That can be tricky if the other end of /dev/ttyS0 is plugged into a debugging
> serial port on an embedded system where you don't have easy access to a shell.

> Or for that matter, if you're trying to talk to the serial port on a non-embedded
> system, which is too far into OOM thrashing for you to be able to get a
> usable shell prompt.....

This is for debugging a frozen x86_64 system! :-)

Thanks,
Janos



* Re: How to send a break? - dump from frozen 64bit linux
  2006-05-28 17:34       ` Haar János
  2006-05-29  4:37         ` Jesper Juhl
@ 2006-05-30 10:22         ` Janos Haar
  2006-05-30 19:03           ` Valdis.Kletnieks
  1 sibling, 1 reply; 26+ messages in thread
From: Janos Haar @ 2006-05-30 10:22 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: linux-kernel

[cut]

>
> >
> > 2) You should try the latest stable kernel. Currently that's 2.6.16.18
> > (http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.16.18.tar.bz2).
> > There have been lots of fixes added since 2.6.15.x and perhaps you are
> > lucky that whatever is giving you trouble  has already been fixed in
> > that kernel.
>

This time I tried the 2.6.16.18 kernel, but the issue is the same!

Here are the logs:
http://download.netcenter.hu/bughunt/20060530/dump.txt  (the frozen system, 540KB)
http://download.netcenter.hu/bughunt/20060530/261618-good.txt  (the working system after reboot, 300KB, uptime 54 min)
http://download.netcenter.hu/bughunt/20060530/dmesg.txt  (the boot dmesg file)

Can somebody tell me what's wrong?

It seems like some part of the filesystem layer died.
("top" and "watch df" hang in the ssh window; in "mc" the line still moves,
but if I try to step into or out of a directory, it hangs too. Ping replies
still work.)

I use only three filesystems:
- the root FS on NFS.
- one XFS mount point from sata drive (200GB)
- one huge XFS mount point from NBD. (14TB)

Cheers,
Janos



* Re: How to send a break? - dump from frozen 64bit linux
  2006-05-30 10:22         ` Janos Haar
@ 2006-05-30 19:03           ` Valdis.Kletnieks
  2006-05-30 21:44             ` Janos Haar
  2006-05-31  1:20             ` Steven Rostedt
  0 siblings, 2 replies; 26+ messages in thread
From: Valdis.Kletnieks @ 2006-05-30 19:03 UTC (permalink / raw)
  To: Janos Haar; +Cc: Jesper Juhl, linux-kernel

On Tue, 30 May 2006 12:22:01 +0200, Janos Haar said:

> http://download.netcenter.hu/bughunt/20060530/dump.txt  (The frozen system,
> 540KB)

> Can somebody tell me, whats wrong?

kblockd/1     D ffff81011f641778     0    25     19            26    24 (L-TLB)
ffff81011f641778 0000000000000000 0000000000000009 ffff81011f735358 
       ffff81011f735140 ffff81011fc79100 000014a00f9a0ef2 00000000000410dd 
       0000000102866d40 ffff810003900280 
Call Trace: <ffffffff8026d72a>{xfs_qm_shake+135} <ffffffff804e6046>{__mutex_lock_slowpath+424}
       <ffffffff804e62e4>{mutex_lock+41} <ffffffff8026d72a>{xfs_qm_shake+135}
       <ffffffff80157cfd>{shrink_slab+100} <ffffffff801584d9>{try_to_free_pages+372}
       <ffffffff80153c3f>{__alloc_pages+432} <ffffffff8046aef3>{tcp_sendmsg+1373}
       <ffffffff804848ad>{inet_sendmsg+70} <ffffffff8043f619>{sock_sendmsg+270}
       <ffffffff8013d3e0>{autoremove_wake_function+0} <ffffffff80440db3>{kernel_sendmsg+61}
       <ffffffff8802c111>{:nbd:sock_xmit+273} <ffffffff8015195d>{mempool_alloc_slab+17}
       <ffffffff80169b1b>{poison_obj+39} <ffffffff8015195d>{mempool_alloc_slab+17}
       <ffffffff80169c11>{cache_alloc_debugcheck_after+235}
       <ffffffff8015195d>{mempool_alloc_slab+17} <ffffffff802da471>{as_remove_queued_request+267}
       <ffffffff8802c472>{:nbd:nbd_send_req+517} <ffffffff8802c712>{:nbd:do_nbd_request+329}
       <ffffffff802d9b45>{as_work_handler+46} <ffffffff80139d30>{run_workqueue+168}
       <ffffffff802d9b17>{as_work_handler+0} <ffffffff8013a27f>{worker_thread+0}
       <ffffffff8013a383>{worker_thread+260} <ffffffff80123fa4>{default_wake_function+0}
       <ffffffff8013a27f>{worker_thread+0} <ffffffff8013d29f>{kthread+219}
       <ffffffff8012590d>{schedule_tail+70} <ffffffff8010bba6>{child_rip+8}
       <ffffffff8013d1c4>{kthread+0} <ffffffff8010bb9e>{child_rip+0}

Half the processes on the box seem wedged at that same mutex_lock. I can't
seem to find an xfs_qm_shake in my source tree though.
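Janos asked earlier how a single user can read these dumps. One mechanical first step, sketched in C below, is to list the tasks in state `D` (uninterruptible sleep), which is how "half the processes wedged" shows up in a SysRq-T dump. The sketch is hypothetical, not a kernel tool, and assumes task header lines start at column 0 as `name state address ...`, like the `kblockd/1` line above, while call-trace and register continuation lines are indented or fail to parse. A log with timestamps or other prefixes would need adjusting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: list every task shown in state D in a SysRq-T dump
 * read from `in`, writing one task name per line to `out` (may be NULL).
 * Returns the number of such tasks. */
int count_d_state_tasks(FILE *in, FILE *out)
{
    char line[1024];
    int count = 0;

    assert(in != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        char name[64], state[16];
        unsigned long long addr;

        /* Call-trace and register continuation lines are indented. */
        if (line[0] == ' ' || line[0] == '\t')
            continue;
        /* Expect "name state address ..."; anything else is skipped. */
        if (sscanf(line, "%63s %15s %llx", name, state, &addr) != 3)
            continue;
        if (strcmp(state, "D") == 0) {
            if (out != NULL)
                fprintf(out, "%s\n", name);
            count++;
        }
    }
    return count;
}
```

Called as `count_d_state_tasks(stdin, stdout)` with a dump on standard input, it prints the blocked tasks; comparing the first few frames of their call traces by hand is the natural next step.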


* Re: How to send a break? - dump from frozen 64bit linux
  2006-05-30 19:03           ` Valdis.Kletnieks
@ 2006-05-30 21:44             ` Janos Haar
  2006-05-31  1:20             ` Steven Rostedt
  1 sibling, 0 replies; 26+ messages in thread
From: Janos Haar @ 2006-05-30 21:44 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: jesper.juhl, linux-kernel

> On Tue, 30 May 2006 12:22:01 +0200, Janos Haar said:
>
> > http://download.netcenter.hu/bughunt/20060530/dump.txt  (The frozen system,
> > 540KB)
>
> > Can somebody tell me, whats wrong?

> kblockd/1     D ffff81011f641778     0    25     19            26    24 (L-TLB)
> ffff81011f641778 0000000000000000 0000000000000009 ffff81011f735358
>        ffff81011f735140 ffff81011fc79100 000014a00f9a0ef2 00000000000410dd
>        0000000102866d40 ffff810003900280
> Call Trace: <ffffffff8026d72a>{xfs_qm_shake+135} <ffffffff804e6046>{__mutex_lock_slowpath+424}
>        <ffffffff804e62e4>{mutex_lock+41} <ffffffff8026d72a>{xfs_qm_shake+135}
>        <ffffffff80157cfd>{shrink_slab+100} <ffffffff801584d9>{try_to_free_pages+372}
>        <ffffffff80153c3f>{__alloc_pages+432} <ffffffff8046aef3>{tcp_sendmsg+1373}
>        <ffffffff804848ad>{inet_sendmsg+70} <ffffffff8043f619>{sock_sendmsg+270}
>        <ffffffff8013d3e0>{autoremove_wake_function+0} <ffffffff80440db3>{kernel_sendmsg+61}
>        <ffffffff8802c111>{:nbd:sock_xmit+273} <ffffffff8015195d>{mempool_alloc_slab+17}
>        <ffffffff80169b1b>{poison_obj+39} <ffffffff8015195d>{mempool_alloc_slab+17}
>        <ffffffff80169c11>{cache_alloc_debugcheck_after+235}
>        <ffffffff8015195d>{mempool_alloc_slab+17} <ffffffff802da471>{as_remove_queued_request+267}
>        <ffffffff8802c472>{:nbd:nbd_send_req+517} <ffffffff8802c712>{:nbd:do_nbd_request+329}
>        <ffffffff802d9b45>{as_work_handler+46} <ffffffff80139d30>{run_workqueue+168}
>        <ffffffff802d9b17>{as_work_handler+0} <ffffffff8013a27f>{worker_thread+0}
>        <ffffffff8013a383>{worker_thread+260} <ffffffff80123fa4>{default_wake_function+0}
>        <ffffffff8013a27f>{worker_thread+0} <ffffffff8013d29f>{kthread+219}
>        <ffffffff8012590d>{schedule_tail+70} <ffffffff8010bba6>{child_rip+8}
>        <ffffffff8013d1c4>{kthread+0} <ffffffff8010bb9e>{child_rip+0}
>
> Half the processes on the box seem wedged at that same mutex_lock. I can't
> seem to find an xfs_qm_shake in my source tree though.

The XFS I use is the one included in 2.6.16.18.
The 2.6.16.18 kernel is unpatched; I use it straight from the original
source, since it already supports everything I need.
The only external module I use is:
e1000-7.0.33

The XFS-related packages:

acl-2.2.34
attr-2.4.28
dmapi-2.2.3
xfsdump-2.2.33
xfsprogs-2.7.11


Sorry, but I don't understand the mutex and lock part. Is that the bad thing
in this dump? :-)

Anyway, this issue started when I moved from i686 to x86_64!
I upgraded the OS from RH 9.0 to FC 5 and recompiled ALL of the services I
need for 64-bit.
(Kernel, apache, mysql+lib+client, pure-ftpd, nbd-client, xfs, php+libs)

Why did I do this?
Because I need to use >2TB nbd devices, and nbd-client refused to use them on
32-bit. :-(
After upgrading to x86_64, I grew my huge device from 8TB to 14TB.
(And XFS offers no way to shrink it back...)

Another useful piece of info:
When this issue happens, I always use the reset button or a SysRq reboot over
the serial cable.
During the reboot, the rc script runs xfs_repair on my two main devices, and
both turn out to be clean!
This shows that the kernel can still flush its buffers through nbd and SATA
(libata)!
The NFS root cannot be left unclean. :-)

Cheers,
Janos



* Re: How to send a break? - dump from frozen 64bit linux
  2006-05-30 19:03           ` Valdis.Kletnieks
  2006-05-30 21:44             ` Janos Haar
@ 2006-05-31  1:20             ` Steven Rostedt
  2006-05-31  4:38               ` XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux) Nathan Scott
  1 sibling, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2006-05-31  1:20 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Janos Haar, Jesper Juhl, linux-kernel, xfs-masters, nathans, linux-xfs

Added all those listed in the MAINTAINERS file for XFS.

On Tue, 2006-05-30 at 15:03 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Tue, 30 May 2006 12:22:01 +0200, Janos Haar said:
> 
> > http://download.netcenter.hu/bughunt/20060530/dump.txt  (The frozen system,
> > 540KB)
> 
> > Can somebody tell me, whats wrong?
> 
> kblockd/1     D ffff81011f641778     0    25     19            26    24 (L-TLB)
> ffff81011f641778 0000000000000000 0000000000000009 ffff81011f735358 
>        ffff81011f735140 ffff81011fc79100 000014a00f9a0ef2 00000000000410dd 
>        0000000102866d40 ffff810003900280 
> Call Trace: <ffffffff8026d72a>{xfs_qm_shake+135} <ffffffff804e6046>{__mutex_lock_slowpath+424}
>        <ffffffff804e62e4>{mutex_lock+41} <ffffffff8026d72a>{xfs_qm_shake+135}
>        <ffffffff80157cfd>{shrink_slab+100} <ffffffff801584d9>{try_to_free_pages+372}
>        <ffffffff80153c3f>{__alloc_pages+432} <ffffffff8046aef3>{tcp_sendmsg+1373}
>        <ffffffff804848ad>{inet_sendmsg+70} <ffffffff8043f619>{sock_sendmsg+270}
>        <ffffffff8013d3e0>{autoremove_wake_function+0} <ffffffff80440db3>{kernel_sendmsg+61}
>        <ffffffff8802c111>{:nbd:sock_xmit+273} <ffffffff8015195d>{mempool_alloc_slab+17}
>        <ffffffff80169b1b>{poison_obj+39} <ffffffff8015195d>{mempool_alloc_slab+17}
>        <ffffffff80169c11>{cache_alloc_debugcheck_after+235}
>        <ffffffff8015195d>{mempool_alloc_slab+17} <ffffffff802da471>{as_remove_queued_request+267}
>        <ffffffff8802c472>{:nbd:nbd_send_req+517} <ffffffff8802c712>{:nbd:do_nbd_request+329}
>        <ffffffff802d9b45>{as_work_handler+46} <ffffffff80139d30>{run_workqueue+168}
>        <ffffffff802d9b17>{as_work_handler+0} <ffffffff8013a27f>{worker_thread+0}
>        <ffffffff8013a383>{worker_thread+260} <ffffffff80123fa4>{default_wake_function+0}
>        <ffffffff8013a27f>{worker_thread+0} <ffffffff8013d29f>{kthread+219}
>        <ffffffff8012590d>{schedule_tail+70} <ffffffff8010bba6>{child_rip+8}
>        <ffffffff8013d1c4>{kthread+0} <ffffffff8010bb9e>{child_rip+0}
> 
> Half the processes on the box seem wedged at that same mutex_lock. I can't
> seem to find an xfs_qm_shake in my source tree though.

What everyone is waiting for is being blocked here:

kswapd0       D ffff81011fe03c38     0   297      1          1287    19 (L-TLB)
ffff81011fe03c38 0000000000000004 000000000000000a ffff81011f92ba68
       ffff81011f92b850 ffffffff805a23a0 0000149f99fa7d7c 000000000003bcde
       000000002f2c46e0 ffff81008bc37180
Call Trace: <ffffffff804e5522>{schedule_timeout+34}
       <ffffffff80269f87>{xfs_qm_dqunpin_wait+220} <ffffffff80140e74>{debug_mutex_free_waiter+141}
       <ffffffff80123fa4>{default_wake_function+0} <ffffffff80268ca5>{xfs_qm_dqflush+70}
       <ffffffff8026d7a7>{xfs_qm_shake+260} <ffffffff80157cfd>{shrink_slab+100}
       <ffffffff8015801e>{balance_pgdat+559} <ffffffff801582e8>{kswapd+283}
       <ffffffff8013d3e0>{autoremove_wake_function+0} <ffffffff804e6a80>{_spin_unlock_irq+9}
       <ffffffff8012590d>{schedule_tail+70} <ffffffff8010bba6>{child_rip+8}
       <ffffffff801581cd>{kswapd+0} <ffffffff8010bb9e>{child_rip+0}


Seems that the kswapd0 has the lock in question and has put itself to
sleep waiting to be woken up.  I don't know the xfs code very well, but
the kswapd0 seems to be in this function:

xfs_qm_dqunpin_wait(
	xfs_dquot_t	*dqp)
{
	SPLDECL(s);

	ASSERT(XFS_DQ_IS_LOCKED(dqp));
	if (dqp->q_pincount == 0) {
		return;
	}

	/*
	 * Give the log a push so we don't wait here too long.
	 */
	xfs_log_force(dqp->q_mount, (xfs_lsn_t)0, XFS_LOG_FORCE);
	s = XFS_DQ_PINLOCK(dqp);
	if (dqp->q_pincount == 0) {
		XFS_DQ_PINUNLOCK(dqp, s);
		return;
	}
	sv_wait(&(dqp->q_pinwait), PINOD,
		&(XFS_DQ_TO_QINF(dqp)->qi_pinlock), s);
}


Where sv_wait is:

#define sv_wait(sv, pri, lock, s) \
	_sv_wait(sv, lock, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT)

And our macro hell goes further ...

static inline void _sv_wait(sv_t *sv, spinlock_t *lock, int state,
			     unsigned long timeout)
{
	DECLARE_WAITQUEUE(wait, current);

	add_wait_queue_exclusive(&sv->waiters, &wait);
	__set_current_state(state);
	spin_unlock(lock);

	schedule_timeout(timeout);

	remove_wait_queue(&sv->waiters, &wait);
}
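Assuming the unpin side takes the same pin lock before it drops the count and wakes waiters, the check-then-sleep sequence above cannot lose a wakeup: the count is re-checked and the task is queued before the lock is released. The userspace model below illustrates that with POSIX threads. It is an analogy, not XFS code; the struct and function names are hypothetical, and a mutex and condition variable stand in for `qi_pinlock` and `q_pinwait`. Unlike `_sv_wait()`, the model loops on the condition, because pthread condition variables allow spurious wakeups.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a dquot's pin state. */
struct dquot_model {
    pthread_mutex_t pinlock;   /* plays the role of qi_pinlock */
    pthread_cond_t  pinwait;   /* plays the role of q_pinwait  */
    int             pincount;
};

/* Like xfs_qm_dqunpin_wait(): sleep until the pin count drops to zero.
 * The check and the sleep both happen under pinlock. */
void model_unpin_wait(struct dquot_model *dq)
{
    pthread_mutex_lock(&dq->pinlock);
    while (dq->pincount != 0)   /* re-check after every wakeup */
        pthread_cond_wait(&dq->pinwait, &dq->pinlock);
    pthread_mutex_unlock(&dq->pinlock);
}

/* Like the unpin at I/O completion: drop one pin, wake all waiters at zero. */
void model_unpin(struct dquot_model *dq)
{
    pthread_mutex_lock(&dq->pinlock);
    assert(dq->pincount > 0);
    if (--dq->pincount == 0)
        pthread_cond_broadcast(&dq->pinwait);
    pthread_mutex_unlock(&dq->pinlock);
}

/* Thread body simulating a late I/O completion for one pinned dquot. */
void *model_io_completion(void *arg)
{
    struct timespec delay = { 0, 50 * 1000 * 1000 };  /* 50 ms */
    nanosleep(&delay, NULL);
    model_unpin(arg);
    return NULL;
}
```

If `model_io_completion()` never runs, `model_unpin_wait()` blocks forever. That is the userspace analogue of a stuck kswapd0: the wait itself is sound, the wakeup simply never comes.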


So it is now waiting to be woken up by something that calls:

xfs_qm_dquot_logitem_unpin  which seems to be the function to wake it
up.

And deciphering all the macro crap it seems that the function that wakes
it up is xfs_trans_chunk_committed, or xfs_trans_uncommit.


The above xfs_qm_dqunpin_wait still looks awfully racy, and the
xfs_log_force, which I'm assuming wakes up whoever is supposed to wake up
kswapd0, doesn't have a return code check.  So if it failed to do
whatever the hell it's doing (that code gives me a headache), it looks
like this guy might sleep forever holding a lock that will prevent
others from freeing kernel memory.

Well that's about all I can figure out.

Good luck,

-- Steve




* XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)
  2006-05-31  1:20             ` Steven Rostedt
@ 2006-05-31  4:38               ` Nathan Scott
  2006-05-31  8:00                 ` Janos Haar
  0 siblings, 1 reply; 26+ messages in thread
From: Nathan Scott @ 2006-05-31  4:38 UTC (permalink / raw)
  To: Janos Haar; +Cc: linux-kernel, linux-xfs

On Tue, May 30, 2006 at 09:20:31PM -0400, Steven Rostedt wrote:
> Added all those listed in the MAINTAINERS file for XFS.

Thanks Steve.

> On Tue, 2006-05-30 at 15:03 -0400, Valdis.Kletnieks@vt.edu wrote:
> > On Tue, 30 May 2006 12:22:01 +0200, Janos Haar said:
> > Half the processes on the box seem wedged at that same mutex_lock. I can't
> > seem to find an xfs_qm_shake in my source tree though.

It's in fs/xfs/quota/xfs_qm.c.

> kswapd0       D ffff81011fe03c38     0   297      1          1287    19 (L-TLB)
> ffff81011fe03c38 0000000000000004 000000000000000a ffff81011f92ba68
>        ffff81011f92b850 ffffffff805a23a0 0000149f99fa7d7c 000000000003bcde
>        000000002f2c46e0 ffff81008bc37180
> Call Trace: <ffffffff804e5522>{schedule_timeout+34}
>        <ffffffff80269f87>{xfs_qm_dqunpin_wait+220} <ffffffff80140e74>{debug_mutex_free_waiter+141}

So, we're waiting here on a synchronisation variable that'll
be released once the dquot metadata buffer write completes.

> So it is now waiting to be woken up by something that calls:
> 
> xfs_qm_dquot_logitem_unpin  which seems to be the function to wake it
> up.

Mhmm, that'd be called by the I/O completion handler on the buffer
containing that dquot.

> And decyphering all the macro crap it seems that the function that wakes
> it up is xfs_trans_chunk_committed, or xfs_trans_uncommit.

Right (the former, at this point in the code).

> The above xfs_qm_dqunpin_wait still looks awfully racy, and the
> xfs_log_force, which I'm assuming wakes up whoever is suppose to wake up
> kswapd0, doesn't have a return code check.  So if it failed to do

The logforce isn't race-critical here - it's ensuring writeout
of previously logged buffers is started before we go to sleep
waiting for the driver to wake us up when it's done.

An earlier I/O error on the journal is the only thing the log
force can return as an error there, which isn't useful at that
point anyway (we're in a kernel thread trying to free mem).

> whatever the hell it's doing (that code gives me a headache), it looks

Heh, likewise.  I have voodoo dolls of one or two of the early
XFS folks that I like to poke with needles occasionally.. :)

> like this guy might sleep forever holding a lock that will prevent
> others from freeing kernel memory.

It will sleep until the previously initiated buffer write is done.
AFAICT, we aren't seeing the I/O completion here for some reason...
which points more to a possible device driver or h/ware issue (that
is the usual root cause of this sort of hang, anyway).
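Since `sv_wait()` sleeps in `TASK_UNINTERRUPTIBLE` with `MAX_SCHEDULE_TIMEOUT`, a completion that never arrives leaves the task in state D indefinitely, with nothing logged. The userspace sketch below (hypothetical names, not XFS code and not a proposed fix) performs the same pin-count wait with a deadline, which turns the missing completion into an observable `ETIMEDOUT`. It is only meant to make the failure mode concrete.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical pin state: waiters sleep on `cond` until pincount is zero. */
struct pin_state {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             pincount;
};

/* Wait up to timeout_ms for pincount to reach zero.
 * Returns 0 if it did, ETIMEDOUT if no completion arrived in time. */
int pin_wait_timeout(struct pin_state *ps, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    assert(timeout_ms >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline);  /* default condvar clock */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ps->lock);
    while (ps->pincount != 0 && rc == 0)
        rc = pthread_cond_timedwait(&ps->cond, &ps->lock, &deadline);
    if (ps->pincount == 0)
        rc = 0;  /* unpinned, even if the deadline also passed */
    pthread_mutex_unlock(&ps->lock);
    return rc;
}
```

In this thread the equivalent information came from the SysRq-T dump instead: kswapd0 sitting in `xfs_qm_dqunpin_wait` with the other tasks queued behind it.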

cheers.

-- 
Nathan


* Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)
  2006-05-31  4:38               ` XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux) Nathan Scott
@ 2006-05-31  8:00                 ` Janos Haar
  2006-05-31 21:54                   ` Jan Engelhardt
  2006-06-01 21:58                   ` Nathan Scott
  0 siblings, 2 replies; 26+ messages in thread
From: Janos Haar @ 2006-05-31  8:00 UTC (permalink / raw)
  To: Nathan Scott; +Cc: linux-kernel, linux-xfs


----- Original Message ----- 
From: "Nathan Scott" <nathans@sgi.com>
To: "Janos Haar" <djani22@netcenter.hu>
Cc: <linux-kernel@vger.kernel.org>; <linux-xfs@oss.sgi.com>
Sent: Wednesday, May 31, 2006 6:38 AM
Subject: XFS related hang (was Re: How to send a break? - dump from frozen
64bit linux)


> On Tue, May 30, 2006 at 09:20:31PM -0400, Steven Rostedt wrote:
> > Added all those listed in the MAINTAINERS file for XFS.
>
> Thanks Steve.
>
> > On Tue, 2006-05-30 at 15:03 -0400, Valdis.Kletnieks@vt.edu wrote:
> > > On Tue, 30 May 2006 12:22:01 +0200, Janos Haar said:
> > > Half the processes on the box seem wedged at that same mutex_lock. I can't
> > > seem to find an xfs_qm_shake in my source tree though.
>
> Its in fs/xfs/quota/xfs_qm.c.
>
> > kswapd0       D ffff81011fe03c38     0   297      1          1287    19 (L-TLB)
> > ffff81011fe03c38 0000000000000004 000000000000000a ffff81011f92ba68
> >        ffff81011f92b850 ffffffff805a23a0 0000149f99fa7d7c 000000000003bcde
> >        000000002f2c46e0 ffff81008bc37180
> > Call Trace: <ffffffff804e5522>{schedule_timeout+34}
> >        <ffffffff80269f87>{xfs_qm_dqunpin_wait+220} <ffffffff80140e74>{debug_mutex_free_waiter+141}
>
> So, we're waiting here on a synchronisation variable that'll
> be released once the dquot metadata buffer write completes.
>
> > So it is now waiting to be woken up by something that calls:
> >
> > xfs_qm_dquot_logitem_unpin  which seems to be the function to wake it
> > up.
>
> Mhmm, that'd be called by the I/O completion handler on the buffer
> containing that dquot.
>
> > And decyphering all the macro crap it seems that the function that wakes
> > it up is xfs_trans_chunk_committed, or xfs_trans_uncommit.
>
> Right (the former, at this point in the code).
>
> > The above xfs_qm_dqunpin_wait still looks awfully racy, and the
> > xfs_log_force, which I'm assuming wakes up whoever is suppose to wake up
> > kswapd0, doesn't have a return code check.  So if it failed to do
>
> The logforce isn't race-critical here - its ensuring writeout
> of previously logged buffers is started before we go to sleep
> waiting for the driver to wake us up when its done.
>
> An earlier I/O error on the journal is the only thing the log
> force can return as an error there, which isnt useful at that
> point anyway (we're in a kernel thread trying to free mem).
>
> > whatever the hell it's doing (that code gives me a headache), it looks
>
> Heh, likewise.  I have voodoo dolls of one or two of the early
> XFS folks that I like to poke with needles occasionally.. :)
>
> > like this guy might sleep forever holding a lock that will prevent
> > others from freeing kernel memory.
>
> It will sleep until the previously initiated buffer write is done.
> AFAICT, we aren't seeing the I/O completion here for some reason...
> which points more to a possible device driver or h/ware issue (that
> is the usual root cause of this sort of hang, anyway).
>
> cheers.

Hey, I think I found something.
The quota on my huge device is broken:
(inferno   -- 18014398504855404       0       0        18446744073709551519 0     0)
I can't find a way to re-initialize it.
But anyway, I don't need it at this point, so I'm trying to disable quota usage.
We will see....

Thanks a lot!

Janos

>
> -- 
> Nathan
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)
  2006-05-31  8:00                 ` Janos Haar
@ 2006-05-31 21:54                   ` Jan Engelhardt
  2006-06-01  7:29                     ` Janos Haar
  2006-06-01 21:58                   ` Nathan Scott
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Engelhardt @ 2006-05-31 21:54 UTC (permalink / raw)
  To: Janos Haar; +Cc: Nathan Scott, linux-kernel, linux-xfs

>
>Hey, i think i found something.
>My quota on my huge device is broken.

That should not be a problem. I ran into that "problem" too but had no 
lockups back then (2.6.16-rc1).

>(inferno   -- 18014398504855404       0       0        18446744073709551519
>0     0)
>I cant found a way to re-initialize it.

Reinit:

quotaoff /mntpt
umount /mntpt
mount /mntpt

>But anyway, at this point i dont need it, trying to disable the quota usage.
>We will see....


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)
  2006-05-31 21:54                   ` Jan Engelhardt
@ 2006-06-01  7:29                     ` Janos Haar
  2006-06-01  9:44                       ` Jan Engelhardt
  0 siblings, 1 reply; 26+ messages in thread
From: Janos Haar @ 2006-06-01  7:29 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: nathans, linux-kernel, linux-xfs


----- Original Message ----- 
From: "Jan Engelhardt" <jengelh@linux01.gwdg.de>
To: "Janos Haar" <djani22@netcenter.hu>
Cc: "Nathan Scott" <nathans@sgi.com>; <linux-kernel@vger.kernel.org>;
<linux-xfs@oss.sgi.com>
Sent: Wednesday, May 31, 2006 11:54 PM
Subject: Re: XFS related hang (was Re: How to send a break? - dump from
frozen 64bit linux)


> >
> >Hey, i think i found something.
> >My quota on my huge device is broken.
>
> That should not be a problem. I ran into that "problem" too but had no
> lockups back then (2.6.16-rc1).

 09:21:36 up 23:05,  1 user,  load average: 13.45, 13.14, 13.11
Disabling quota usage looks like it fixed this.

The system hangs more often when I run a script that heavily uses chown,
chgrp and chmod.
That's why I thought of disabling quota.
At this point it looks fixed.


>
> >(inferno   -- 18014398504855404       0       0        18446744073709551519
> >0     0)
> >I cant found a way to re-initialize it.
>
> Reinit:
>
> quotaoff /mntpt
> umount /mntpt
> mount /mntpt

Thanks! :-)

Janos


>
> >But anyway, at this point i dont need it, trying to disable the quota usage.
> >We will see....
>
>
> Jan Engelhardt
> -- 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)
  2006-06-01  7:29                     ` Janos Haar
@ 2006-06-01  9:44                       ` Jan Engelhardt
  2006-06-01 22:04                         ` Nathan Scott
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Engelhardt @ 2006-06-01  9:44 UTC (permalink / raw)
  To: Janos Haar; +Cc: nathans, linux-kernel, linux-xfs

>> Reinit:
>>
>> quotaoff /mntpt
>> umount /mntpt
>> mount /mntpt
>
>Thanks! :-)
>
Too bad XFS does not reinit quota on these commands:

quotaoff /mp
quotaon /mp

Yes, it would lock the filesystem for a moment, but that's better than 
trying to unmount it under someone having inodes open!


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)
  2006-05-31  8:00                 ` Janos Haar
  2006-05-31 21:54                   ` Jan Engelhardt
@ 2006-06-01 21:58                   ` Nathan Scott
  2006-06-01 22:14                     ` Janos Haar
  1 sibling, 1 reply; 26+ messages in thread
From: Nathan Scott @ 2006-06-01 21:58 UTC (permalink / raw)
  To: Janos Haar; +Cc: linux-kernel, linux-xfs

On Wed, May 31, 2006 at 10:00:33AM +0200, Janos Haar wrote:
> 
> Hey, i think i found something.
> My quota on my huge device is broken.
> (inferno   -- 18014398504855404       0       0        18446744073709551519
> 0     0)

Hmm, that is interesting.  I guess you don't know whether this
accounting problem happened before you rebooted or whether it
only just got this way (after journal recovery)?

> I cant found a way to re-initialize it.
> But anyway, at this point i dont need it, trying to disable the quota usage.
> We will see....

Jan's recipe was spot on, do that.

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)
  2006-06-01  9:44                       ` Jan Engelhardt
@ 2006-06-01 22:04                         ` Nathan Scott
  2006-06-02  5:11                           ` Jan Engelhardt
  0 siblings, 1 reply; 26+ messages in thread
From: Nathan Scott @ 2006-06-01 22:04 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Janos Haar, linux-kernel, linux-xfs

On Thu, Jun 01, 2006 at 11:44:46AM +0200, Jan Engelhardt wrote:
> >> Reinit:
> >>
> >> quotaoff /mntpt
> >> umount /mntpt
> >> mount /mntpt
> >
> >Thanks! :-)
> >
> Too bad XFS does not reinit quota on these commands:
> 
> quotaoff /mp
> quotaon /mp

Hmm, remount would be saner if we wanted to take that approach...

> Yes, it would lock the filesystem for a moment, but that's better than 
> trying to unmount it under someone having inodes open!

But it's not just a moment, a quotacheck needs to scan every inode
in the filesystem (on disk) to correctly account for all space/inode
usage.  It's not something to be encouraging people to do frequently,
and it would also be very difficult to correctly implement (while the
filesystem is actively being modified I mean).

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)
  2006-06-01 21:58                   ` Nathan Scott
@ 2006-06-01 22:14                     ` Janos Haar
  2006-06-01 23:43                       ` Nathan Scott
  0 siblings, 1 reply; 26+ messages in thread
From: Janos Haar @ 2006-06-01 22:14 UTC (permalink / raw)
  To: Nathan Scott; +Cc: linux-kernel, linux-xfs

---- Original Message ----- 
From: "Nathan Scott" <nathans@sgi.com>
To: "Janos Haar" <djani22@netcenter.hu>
Cc: <linux-kernel@vger.kernel.org>; <linux-xfs@oss.sgi.com>
Sent: Thursday, June 01, 2006 11:58 PM
Subject: Re: XFS related hang (was Re: How to send a break? - dump from
frozen 64bit linux)


> On Wed, May 31, 2006 at 10:00:33AM +0200, Janos Haar wrote:
> >
> > Hey, i think i found something.
> > My quota on my huge device is broken.
> > (inferno   -- 18014398504855404       0       0        18446744073709551519
> > 0     0)
>
> Hmm, that is interesting.  I guess you don't know whether this
> accounting problem happened before you rebooted or whether it
> only just got this way (after journal recovery)?

On my system, this huge device is troublesome.
I often need to reboot and run xfs_repair to make it clean (nodes hang,
reboot, etc...).
In the beginning I ran xfs_repair without any options, but it requires
a mount/umount of the mount point first.
The problem is that I often get an error message (a dump) during journal
recovery, and after that I cannot run xfs_repair from a script, because it
needs the log to have been replayed by mount.
Now my default reboot procedure is xfs_repair -L, so I don't know whether
this happened before or after, sorry.


>
> > I cant found a way to re-initialize it.
> > But anyway, at this point i dont need it, trying to disable the quota usage.
> > We will see....
>
> Jan's recipe was spot on, do that.

Turning quota off solved the hang problem.
Is this a bug?

Cheers,
Janos

>
> cheers.
>
> -- 
> Nathan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)
  2006-06-01 22:14                     ` Janos Haar
@ 2006-06-01 23:43                       ` Nathan Scott
  2006-06-02  8:01                         ` Janos Haar
  0 siblings, 1 reply; 26+ messages in thread
From: Nathan Scott @ 2006-06-01 23:43 UTC (permalink / raw)
  To: Janos Haar; +Cc: linux-kernel, linux-xfs

On Fri, Jun 02, 2006 at 12:14:04AM +0200, Janos Haar wrote:
> ---- Original Message ----- 
> > On Wed, May 31, 2006 at 10:00:33AM +0200, Janos Haar wrote:
> > >
> > > Hey, i think i found something.
> > > My quota on my huge device is broken.
> > > (inferno   -- 18014398504855404       0       0        18446744073709551519
> > > 0     0)
> >
> > Hmm, that is interesting.  I guess you don't know whether this
> > accounting problem happened before you rebooted or whether it
> > only just got this way (after journal recovery)?
> 
> In my system, this huge device is difficult.

Can you describe your hardware a bit more?  (and send xfs_info
output too please).

> I often need to reboot, and run xfs_repair, to make it clean. (nodes hangs,
> reboots, etc...)

Ehrm, hmm, that smells fishy... does this device have a write
cache enabled by any chance?

> Now is my default reboot option is xfs_repair -L, so i dont know, this
> happens before, or after, sorry.

Oh, that's bad, all bets are off then - you really can't go doing
that routinely, that's an "in emergency only" big red button -
it throws away the contents of the journal, and will pretty much
guarantee filesystem corruption.

But, it sounds a lot like you may have a big hardware reliability
issue there, which is going to make it difficult to distinguish
any software problems.  However, if you find a way to reproduce
that quota accounting problem (above), I'm all ears.

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)
  2006-06-01 22:04                         ` Nathan Scott
@ 2006-06-02  5:11                           ` Jan Engelhardt
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Engelhardt @ 2006-06-02  5:11 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Janos Haar, linux-kernel, linux-xfs

>> Too bad XFS does not reinit quota on these commands:
>> 
>> quotaoff /mp
>> quotaon /mp
>
>Hmm, remount would be saner if we wanted to take that approach...
>
quotacheck would be sanest :) But the filesystem's remount hook
(sb->s_op->remount_fs) is probably the best idea in kernel space.

>> Yes, it would lock the filesystem for a moment, but that's better than 
>> trying to unmount it under someone having inodes open!
>
>But its not just a moment, a quotacheck needs to scan every inode
>in the filesystem (on disk) to correctly account for all space/inode
>usage.

Yeah right, XFS was designed for large systems rather than for just 
my 262188 files. (The latter of which completes in an "adequate" time of 
a few secs.)

>Its not something to be encouraging people to do frequently,
>
Certainly not, but XFS has the advantage of bulkstat for quota scanning.
`quotacheck` on vfsv0 quota databases always takes longer IMO.

>and it would also be very difficult to correctly implement (while the
>filesystem is actively being modified I mean).
>
Noted.


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)
  2006-06-01 23:43                       ` Nathan Scott
@ 2006-06-02  8:01                         ` Janos Haar
  0 siblings, 0 replies; 26+ messages in thread
From: Janos Haar @ 2006-06-02  8:01 UTC (permalink / raw)
  To: Nathan Scott; +Cc: linux-kernel, linux-xfs


----- Original Message ----- 
From: "Nathan Scott" <nathans@sgi.com>
To: "Janos Haar" <djani22@netcenter.hu>
Cc: <linux-kernel@vger.kernel.org>; <linux-xfs@oss.sgi.com>
Sent: Friday, June 02, 2006 1:43 AM
Subject: Re: XFS related hang (was Re: How to send a break? - dump from
frozen 64bit linux)


> On Fri, Jun 02, 2006 at 12:14:04AM +0200, Janos Haar wrote:
> > ---- Original Message ----- 
> > > On Wed, May 31, 2006 at 10:00:33AM +0200, Janos Haar wrote:
> > > >
> > > > Hey, i think i found something.
> > > > My quota on my huge device is broken.
> > > > (inferno   -- 18014398504855404       0       0        18446744073709551519
> > > > 0     0)
> > >
> > > Hmm, that is interesting.  I guess you don't know whether this
> > > accounting problem happened before you rebooted or whether it
> > > only just got this way (after journal recovery)?
> >
> > In my system, this huge device is difficult.
>
> Can you describe your hardware a bit more?  (and send xfs_info
> output too please).

[root@X64 ~]# xfs_info /mnt/md0
meta-data=/dev/md31              isize=256    agcount=2600, agsize=1240024 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=3223457536, imaxpct=25
         =                       sunit=1      swidth=4 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=16384  blocks=0, rtextents=0

(I grew the filesystem twice with xfs_growfs.)

The hardware:
I use 4 nodes, each with a 3.3TiB RAID4 array, serving it over NBD.

On the concentrator, RAID0 combines the 4x3.3TiB NBD devices into one 12.9TiB device.
The stripe size is 4KB (tested; it is optimal for performance).


>
> > I often need to reboot, and run xfs_repair, to make it clean. (nodes hangs,
> > reboots, etc...)
>
> Ehrm, hmm, that smells fishy... does this device have a write
> cache enabled by any chance?

Yes, you're right!
I know this carries a big risk of corrupting the fs, but I really need the
write caching!

This quota corruption comes from that too....

>
> > Now is my default reboot option is xfs_repair -L, so i dont know, this
> > happens before, or after, sorry.
>
> Oh, thats bad, all bets are off then - you really cant go doing
> that routinely, thats an "in emergency only" big red button -

:-) Yes, I know.
But in my case, the service is much more important than the data inside the
fs.
I run a huge "free web storage" service, and I hate it when I get up and
see that the system stopped during the automatic reboot and has been down
for a few hours... >8-(
If it can reboot normally and drop some MB or GB, that is a "little loss"
for me.

> it throws away the contents of the journal, and will pretty much
> guarantee filesystem corruption.

Anyway, if I drop the -L, the boot hangs on mount about 8 times out of 10.
Losing ~1GB of 4K stripes can damage the journal pretty badly too.....
If it can repair the fs (2 times out of 10), the repair is often incomplete,
and some minutes or hours later I get the XFS_FORCE_SHUTDOWN message because
of the corruption....
(I planned to use an external log, but at this time I don't trust the
journal recovery much....
And even if the concentrator finishes the flush and the journal records it
as completed, a node can still hang during the write and drop the data anyway.)


>
> But, it sounds alot like you may have a big hardware reliability
> issue there, which is going to make it difficult to distinguish
> any software problems.  However, if you find a way to reproduce
> that quota accounting problem (above), I'm all ears.

Sorry, but I can't.
Additionally, I have already turned quota off, and I cannot reproduce
the "bad quota related hang" problem.

Thanks a lot!

Janos

>
> cheers.
>
> -- 
> Nathan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: How to send a break? - dump from frozen 64bit linux
  2006-05-29  4:37         ` Jesper Juhl
@ 2007-08-20  7:44           ` Andev Debi
  0 siblings, 0 replies; 26+ messages in thread
From: Andev Debi @ 2007-08-20  7:44 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Haar János, linux-kernel

On 5/29/06, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> On 28/05/06, Haar János <djani22@netcenter.hu> wrote:
> [snip]
> > I can only use swap _file_ in this config, and swapping into file is
> > relatively slow.
>
> Not so. With a 2.4.x kernel swap files were slower than swap
> partitions, but with the 2.6 kernel a swap file is just as fast as a
> swap partition.
>

What made this possible? Any pointers, please?

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2007-08-20  7:44 UTC | newest]

Thread overview: 26+ messages
2006-05-27 12:58 How to send a break? Haar János
2006-05-27 23:43 ` Jim Crilly
2006-05-28  7:04   ` How to send a break? - dump from frozen 64bit linux Haar János
2006-05-28 16:17     ` Jesper Juhl
2006-05-28 17:34       ` Haar János
2006-05-29  4:37         ` Jesper Juhl
2007-08-20  7:44           ` Andev Debi
2006-05-30 10:22         ` Janos Haar
2006-05-30 19:03           ` Valdis.Kletnieks
2006-05-30 21:44             ` Janos Haar
2006-05-31  1:20             ` Steven Rostedt
2006-05-31  4:38               ` XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux) Nathan Scott
2006-05-31  8:00                 ` Janos Haar
2006-05-31 21:54                   ` Jan Engelhardt
2006-06-01  7:29                     ` Janos Haar
2006-06-01  9:44                       ` Jan Engelhardt
2006-06-01 22:04                         ` Nathan Scott
2006-06-02  5:11                           ` Jan Engelhardt
2006-06-01 21:58                   ` Nathan Scott
2006-06-01 22:14                     ` Janos Haar
2006-06-01 23:43                       ` Nathan Scott
2006-06-02  8:01                         ` Janos Haar
2006-05-28 23:06 ` How to send a break? H. Peter Anvin
2006-05-29 15:08 ` linux-os (Dick Johnson)
2006-05-29 15:35   ` Valdis.Kletnieks
2006-05-29 17:32     ` Haar János
