linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.5.66: slow to friggin slow journal recover
@ 2003-04-01 10:02 CaT
  2003-04-01 10:28 ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: CaT @ 2003-04-01 10:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: sct, akpm, adilger

The journal recovery rangers from slow to really friggin slow under
2.5.66 with definate pauses in disk io stretching for 10s of seconds.
This does not happen with 2.5.63 and if I hit ^c on fsck and let the
kernel handle the journal recover for all partitions  on mountime
the recovery under 2.5.66 is either so fast that you don't notice
it or just a buttload faster. Very objective measurements of time but
the slowness of a journal recover as done by fsck is so noticible it's
not funny.

At the time of fsck journal recover or moiunt-time kernel journal
recover DMA is turned on the drive.

e2fsprogs 1.27 is in use. (1.27-2 from debian woody)

-- 
"Other countries of course, bear the same risk. But there's no doubt his
hatred is mainly directed at us. After all this is the guy who tried to
kill my dad."
        - George W. Bush Jr, Leader of the United States Regime
          September 26, 2002 (from a political fundraiser in Houston, Texas)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.66: slow to friggin slow journal recover
  2003-04-01 10:02 2.5.66: slow to friggin slow journal recover CaT
@ 2003-04-01 10:28 ` Andrew Morton
  2003-04-11 17:26   ` CaT
  2003-04-12  2:16   ` CaT
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2003-04-01 10:28 UTC (permalink / raw)
  To: CaT; +Cc: linux-kernel, sct, akpm, adilger

CaT <cat@zip.com.au> wrote:
>
> The journal recovery rangers from slow to really friggin slow under
> 2.5.66 with definate pauses in disk io stretching for 10s of seconds.
> This does not happen with 2.5.63 and if I hit ^c on fsck and let the
> kernel handle the journal recover for all partitions  on mountime
> the recovery under 2.5.66 is either so fast that you don't notice
> it or just a buttload faster. Very objective measurements of time but
> the slowness of a journal recover as done by fsck is so noticible it's
> not funny.
> 
> At the time of fsck journal recover or moiunt-time kernel journal
> recover DMA is turned on the drive.
> 
> e2fsprogs 1.27 is in use. (1.27-2 from debian woody)
> 

e2fsck 1.32 seems to work fine here.

Try arranging for a partition to _not_ be mounted at boot (disable it in
/etc/fstab).  Then do a `reboot -f' and when you get a login prompt, run
e2fsck against that partition.

If the journal recovery is still slow then try capturing the state when it is
stuck with sysrq-T.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.66: slow to friggin slow journal recover
  2003-04-01 10:28 ` Andrew Morton
@ 2003-04-11 17:26   ` CaT
  2003-04-12  2:16   ` CaT
  1 sibling, 0 replies; 10+ messages in thread
From: CaT @ 2003-04-11 17:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, sct, akpm, adilger

On Tue, Apr 01, 2003 at 02:28:44AM -0800, Andrew Morton wrote:
> CaT <cat@zip.com.au> wrote:
> > The journal recovery rangers from slow to really friggin slow under
> > 2.5.66 with definate pauses in disk io stretching for 10s of seconds.
> > This does not happen with 2.5.63 and if I hit ^c on fsck and let the
> > kernel handle the journal recover for all partitions  on mountime
> > the recovery under 2.5.66 is either so fast that you don't notice
> > it or just a buttload faster. Very objective measurements of time but
> > the slowness of a journal recover as done by fsck is so noticible it's
> > not funny.
> 
> e2fsck 1.32 seems to work fine here.
> 
> Try arranging for a partition to _not_ be mounted at boot (disable it in
> /etc/fstab).  Then do a `reboot -f' and when you get a login prompt, run
> e2fsck against that partition.
> 
> If the journal recovery is still slow then try capturing the state when it is
> stuck with sysrq-T.

Any other thing I can do? Every time I try alt-sysrq-t the fscking fsck
terminates and I don't get a list of tasks whilst it is running but only
straight after (which I'm assuming is less then useful). Most annoying.

-- 
Martin's distress was in contrast to the bitter satisfaction of some
of his fellow marines as they surveyed the scene. "The Iraqis are sick
people and we are the chemotherapy," said Corporal Ryan Dupre. "I am
starting to hate this country. Wait till I get hold of a friggin' Iraqi.
No, I won't get hold of one. I'll just kill him."
	- http://www.informationclearinghouse.info/article2479.htm

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.66: slow to friggin slow journal recover
  2003-04-01 10:28 ` Andrew Morton
  2003-04-11 17:26   ` CaT
@ 2003-04-12  2:16   ` CaT
  2003-04-12  2:24     ` Andrew Morton
  1 sibling, 1 reply; 10+ messages in thread
From: CaT @ 2003-04-12  2:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, sct, akpm, adilger

On Tue, Apr 01, 2003 at 02:28:44AM -0800, Andrew Morton wrote:
> If the journal recovery is still slow then try capturing the state when it is
> stuck with sysrq-T.

It finally didn't kill it. This is what it's like when it's not doing
anything (ie no disk access). Once Again, after this I had to ^c it to
get some progress on the booting (ie let mount handle journal recovery).

Something I realised last night though is that this happens when I get
sudden power off. Not everytime but most of the time and as such reboot
-f wont help as I think that lets everything finish nicely.

SysRq : Show State

                         free                        sibling
  task             PC    stack   pid father child younger older
init          S FFFBB2FE 39392     1      0     2               (NOTLB)
Call Trace:
 [<c0120b5e>] schedule_timeout+0x7a/0xa0
 [<c0120ad4>] process_timeout+0x0/0x10
 [<c0151227>] do_select+0x1a3/0x1dc
 [<c0150f38>] __pollwait+0x0/0x9c
 [<c0151590>] sys_select+0x304/0x424
 [<c0108cf3>] syscall_call+0x7/0xb

ksoftirqd/0   S C12AC000 4294955984     2      1             3       (L-TLB)
Call Trace:
 [<c011d93c>] ksoftirqd+0x24/0x9c
 [<c011d974>] ksoftirqd+0x5c/0x9c
 [<c011d918>] ksoftirqd+0x0/0x9c
 [<c0107211>] kernel_thread_helper+0x5/0xc

events/0      S CFFE889C 4294946240     3      1             4     2 (L-TLB)
Call Trace:
 [<c0125bd9>] worker_thread+0x105/0x238
 [<c0125ad4>] worker_thread+0x0/0x238
 [<c0287b60>] fb_flashcursor+0x0/0x34
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0107211>] kernel_thread_helper+0x5/0xc

khubd         S C1396000 944048     4      1             5     3 (L-TLB)
Call Trace:
 [<c029ca10>] usb_hub_thread+0x0/0xe0
 [<c029ca97>] usb_hub_thread+0x87/0xe0
 [<c029ca10>] usb_hub_thread+0x0/0xe0
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0107211>] kernel_thread_helper+0x5/0xc

pdflush       S C1387FE0 876960     5      1             6     4 (L-TLB)
Call Trace:
 [<c013156b>] __pdflush+0x7b/0x160
 [<c0131650>] pdflush+0x0/0x14
 [<c013165f>] pdflush+0xf/0x14
 [<c0107211>] kernel_thread_helper+0x5/0xc

pdflush       S C1385FE0  6592     6      1             7     5 (L-TLB)
Call Trace:
 [<c013156b>] __pdflush+0x7b/0x160
 [<c0131650>] pdflush+0x0/0x14
 [<c013165f>] pdflush+0xf/0x14
 [<c0107211>] kernel_thread_helper+0x5/0xc

kswapd0       S C1381F44 4294951296     7      1             8     6 (L-TLB)
Call Trace:
 [<c0135e8e>] kswapd+0xd2/0xf8
 [<c0135dbc>] kswapd+0x0/0xf8
 [<c0108d1a>] work_resched+0x5/0x16
 [<c0119088>] autoremove_wake_function+0x0/0x3c
 [<c0119088>] autoremove_wake_function+0x0/0x3c
 [<c0107211>] kernel_thread_helper+0x5/0xc

aio/0         S C13BB9DC 4294944656     8      1             9     7 (L-TLB)
Call Trace:
 [<c0125bd9>] worker_thread+0x105/0x238
 [<c0125ad4>] worker_thread+0x0/0x238
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0107211>] kernel_thread_helper+0x5/0xc

kpnpbiosd     S FFFBB59F 4294823328     9      1            10     8 (L-TLB)
Call Trace:
 [<c0120b5e>] schedule_timeout+0x7a/0xa0
 [<c0120ad4>] process_timeout+0x0/0x10
 [<c0219767>] pnp_dock_thread+0x53/0xf0
 [<c0219714>] pnp_dock_thread+0x0/0xf0
 [<c0107211>] kernel_thread_helper+0x5/0xc

kseriod       S CFDC0000 245617584    10      1            11     9 (L-TLB)
Call Trace:
 [<c02c051c>] serio_thread+0x0/0xf0
 [<c02c05a7>] serio_thread+0x8b/0xf0
 [<c02c051c>] serio_thread+0x0/0xf0
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0107211>] kernel_thread_helper+0x5/0xc

kjournald     S CFD12000   352    11      1            12    10 (L-TLB)
Call Trace:
 [<c0118218>] interruptible_sleep_on+0x48/0x5c
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c017dc58>] kjournald+0x130/0x1b4
 [<c017db28>] kjournald+0x0/0x1b4
 [<c017db10>] commit_timeout+0x0/0x10
 [<c0107211>] kernel_thread_helper+0x5/0xc

init          S CFD1134C 4293920624    12      1    13            11 (NOTLB)
Call Trace:
 [<c011c814>] sys_wait4+0xa8/0x218
 [<c011c94d>] sys_wait4+0x1e1/0x218
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0108cf3>] syscall_call+0x7/0xb

rcS           S CFD10D3C 4293856640    13     12    34               (NOTLB)
Call Trace:
 [<c011c814>] sys_wait4+0xa8/0x218
 [<c011c94d>] sys_wait4+0x1e1/0x218
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0108cf3>] syscall_call+0x7/0xb

rcS           S CFD1072C 4292703120    34     13    37               (NOTLB)
Call Trace:
 [<c011c814>] sys_wait4+0xa8/0x218
 [<c011c94d>] sys_wait4+0x1e1/0x218
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0108cf3>] syscall_call+0x7/0xb

fsck          S current  4291623328    37     34    40               (NOTLB)
Call Trace:
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0118040>] default_wake_function+0x0/0x1c
 [<c0108cf3>] syscall_call+0x7/0xb

fsck.ext3     R CF971E28 4293902656    40     37                     (NOTLB)
Call Trace:
 [<c0118a2e>] io_schedule+0xe/0x18
 [<c012cde8>] wait_on_page_bit+0x9c/0xb8
 [<c0119088>] autoremove_wake_function+0x0/0x3c
 [<c0119088>] autoremove_wake_function+0x0/0x3c
 [<c012d3a2>] do_generic_mapping_read+0x1be/0x328
 [<c012d7a0>] __generic_file_aio_read+0x184/0x1a0
 [<c012d50c>] file_read_actor+0x0/0x110
 [<c012d887>] generic_file_read+0x7f/0x9c
 [<c0138618>] handle_mm_fault+0x68/0xfc
 [<c0116960>] do_page_fault+0x0/0x3fe
 [<c01494ee>] blkdev_file_write+0x26/0x30
 [<c0142f4e>] vfs_read+0xa2/0xd4
 [<c014312a>] sys_read+0x2a/0x40
 [<c0108cf3>] syscall_call+0x7/0xb

-- 
Martin's distress was in contrast to the bitter satisfaction of some
of his fellow marines as they surveyed the scene. "The Iraqis are sick
people and we are the chemotherapy," said Corporal Ryan Dupre. "I am
starting to hate this country. Wait till I get hold of a friggin' Iraqi.
No, I won't get hold of one. I'll just kill him."
	- http://www.informationclearinghouse.info/article2479.htm

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.66: slow to friggin slow journal recover
  2003-04-12  2:16   ` CaT
@ 2003-04-12  2:24     ` Andrew Morton
  2003-04-12  2:38       ` CaT
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2003-04-12  2:24 UTC (permalink / raw)
  To: CaT; +Cc: linux-kernel, sct, adilger

CaT <cat@zip.com.au> wrote:
>
> On Tue, Apr 01, 2003 at 02:28:44AM -0800, Andrew Morton wrote:
> > If the journal recovery is still slow then try capturing the state when it is
> > stuck with sysrq-T.
> 
> It finally didn't kill it. This is what it's like when it's not doing
> anything (ie no disk access).
> ...
> fsck.ext3     R CF971E28 4293902656    40     37                     (NOTLB)
> Call Trace:
>  [<c0118a2e>] io_schedule+0xe/0x18
>  [<c012cde8>] wait_on_page_bit+0x9c/0xb8
>  [<c0119088>] autoremove_wake_function+0x0/0x3c
>  [<c0119088>] autoremove_wake_function+0x0/0x3c
>  [<c012d3a2>] do_generic_mapping_read+0x1be/0x328
>  [<c012d7a0>] __generic_file_aio_read+0x184/0x1a0
>  [<c012d50c>] file_read_actor+0x0/0x110
>  [<c012d887>] generic_file_read+0x7f/0x9c
>  [<c0138618>] handle_mm_fault+0x68/0xfc
>  [<c0116960>] do_page_fault+0x0/0x3fe
>  [<c01494ee>] blkdev_file_write+0x26/0x30
>  [<c0142f4e>] vfs_read+0xa2/0xd4
>  [<c014312a>] sys_read+0x2a/0x40
>  [<c0108cf3>] syscall_call+0x7/0xb

OK, you may be thrashing the VM.  fsck can use a lot of memory.

How large is the filesystem?

How many files are on the filesystem?

How much physical memory does the machine have?

Run ALT-sysrq-M during the fsck to get some stats.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.66: slow to friggin slow journal recover
  2003-04-12  2:24     ` Andrew Morton
@ 2003-04-12  2:38       ` CaT
  2003-04-12  2:53         ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: CaT @ 2003-04-12  2:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, sct, adilger

On Fri, Apr 11, 2003 at 07:24:13PM -0700, Andrew Morton wrote:
> CaT <cat@zip.com.au> wrote:
> >  [<c0138618>] handle_mm_fault+0x68/0xfc
> >  [<c0116960>] do_page_fault+0x0/0x3fe
> >  [<c01494ee>] blkdev_file_write+0x26/0x30
> >  [<c0142f4e>] vfs_read+0xa2/0xd4
> >  [<c014312a>] sys_read+0x2a/0x40
> >  [<c0108cf3>] syscall_call+0x7/0xb
> 
> OK, you may be thrashing the VM.  fsck can use a lot of memory.

*nod* Woiuld this also cause the kernel to hang? Last time this happened
it was on a small 256MB partition that holds 13 files (it's /var/mail)

> How large is the filesystem?

Not sure which one it was this time but the two possibilities are 1.2GB
and 197MB. It does this on either partition.

> How many files are on the filesystem?

197MB: 2728 files
1.2GB: 53707 files

> How much physical memory does the machine have?

256MB

> Run ALT-sysrq-M during the fsck to get some stats.

Ok. I'll try and get it to work for me again (and not interrupt the fsck
:/).

-- 
Martin's distress was in contrast to the bitter satisfaction of some
of his fellow marines as they surveyed the scene. "The Iraqis are sick
people and we are the chemotherapy," said Corporal Ryan Dupre. "I am
starting to hate this country. Wait till I get hold of a friggin' Iraqi.
No, I won't get hold of one. I'll just kill him."
	- http://www.informationclearinghouse.info/article2479.htm

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.66: slow to friggin slow journal recover
  2003-04-12  2:38       ` CaT
@ 2003-04-12  2:53         ` Andrew Morton
  2003-04-12  4:48           ` CaT
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2003-04-12  2:53 UTC (permalink / raw)
  To: CaT; +Cc: linux-kernel, sct, adilger

CaT <cat@zip.com.au> wrote:
>
> > How many files are on the filesystem?
> 
> 197MB: 2728 files
> 1.2GB: 53707 files
> 
> > How much physical memory does the machine have?
> 
> 256MB

scrub that theory then.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.66: slow to friggin slow journal recover
  2003-04-12  2:53         ` Andrew Morton
@ 2003-04-12  4:48           ` CaT
  2003-04-12  5:01             ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: CaT @ 2003-04-12  4:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, sct, adilger

On Fri, Apr 11, 2003 at 07:53:08PM -0700, Andrew Morton wrote:
> CaT <cat@zip.com.au> wrote:
> > > How many files are on the filesystem?
> > 
> > 197MB: 2728 files
> > 1.2GB: 53707 files
> > 
> > > How much physical memory does the machine have?
> > 
> > 256MB
> 
> scrub that theory then.

Do you still want the alt-sysrq-m stuff? And is there anything else I
can do? profiling? apply a patch with debugging stuff that'd give you a
clue? etc...

-- 
Martin's distress was in contrast to the bitter satisfaction of some
of his fellow marines as they surveyed the scene. "The Iraqis are sick
people and we are the chemotherapy," said Corporal Ryan Dupre. "I am
starting to hate this country. Wait till I get hold of a friggin' Iraqi.
No, I won't get hold of one. I'll just kill him."
	- http://www.informationclearinghouse.info/article2479.htm

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.66: slow to friggin slow journal recover
  2003-04-12  4:48           ` CaT
@ 2003-04-12  5:01             ` Andrew Morton
  2003-04-12  5:13               ` CaT
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2003-04-12  5:01 UTC (permalink / raw)
  To: CaT; +Cc: linux-kernel, sct, adilger

CaT <cat@zip.com.au> wrote:
>
> Do you still want the alt-sysrq-m stuff? And is there anything else I
> can do? profiling? apply a patch with debugging stuff that'd give you a
> clue? etc...

Can't think of a lot really.

Are you sure that fsck is only running journal recovery?  Is it possible that
it is performing a full fsck for some reason?

Make sure you're running current e2fsprogs?

Boot with `init=/bin/sh', run fsck by hand in the background, poke around
with `ps' to see what it's up to, etc.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.66: slow to friggin slow journal recover
  2003-04-12  5:01             ` Andrew Morton
@ 2003-04-12  5:13               ` CaT
  0 siblings, 0 replies; 10+ messages in thread
From: CaT @ 2003-04-12  5:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, sct, adilger

On Fri, Apr 11, 2003 at 10:01:58PM -0700, Andrew Morton wrote:
> CaT <cat@zip.com.au> wrote:
> >
> > Do you still want the alt-sysrq-m stuff? And is there anything else I
> > can do? profiling? apply a patch with debugging stuff that'd give you a
> > clue? etc...
> 
> Can't think of a lot really.

doh.

> Are you sure that fsck is only running journal recovery?  Is it possible that
> it is performing a full fsck for some reason?

It's still in the journal stage. 2 reasons lead me to believe this:

1. if I hit ^c then mount does the journal recovery (and does it
   quick-smart - the hd light is on full-force during this)
2. the progress bar that comes up during a full fsck is not displayed.

> Make sure you're running current e2fsprogs?

I've compiled 1.32+1.33-WIP-2003.03.30 (debian e2fsprogs from sid). I'll
install that now.

> Boot with `init=/bin/sh', run fsck by hand in the background, poke around
> with `ps' to see what it's up to, etc.

I'll try to remember to do this when my laptop loses power again (it
happens often - power connector is wonky). Don't wanna do it right now
cos I have the connector in just the right spot so that the laptop is
actually getting power.

-- 
Martin's distress was in contrast to the bitter satisfaction of some
of his fellow marines as they surveyed the scene. "The Iraqis are sick
people and we are the chemotherapy," said Corporal Ryan Dupre. "I am
starting to hate this country. Wait till I get hold of a friggin' Iraqi.
No, I won't get hold of one. I'll just kill him."
	- http://www.informationclearinghouse.info/article2479.htm

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2003-04-12  5:02 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-01 10:02 2.5.66: slow to friggin slow journal recover CaT
2003-04-01 10:28 ` Andrew Morton
2003-04-11 17:26   ` CaT
2003-04-12  2:16   ` CaT
2003-04-12  2:24     ` Andrew Morton
2003-04-12  2:38       ` CaT
2003-04-12  2:53         ` Andrew Morton
2003-04-12  4:48           ` CaT
2003-04-12  5:01             ` Andrew Morton
2003-04-12  5:13               ` CaT

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).