reiserfs blocks long on getdents64() during concurrent write

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* reiserfs blocks long on getdents64() during concurrent write
@ 2002-08-04 16:09 Roland Kuhn
  2002-08-05  5:44 ` Oleg Drokin
  0 siblings, 1 reply; 16+ messages in thread
From: Roland Kuhn @ 2002-08-04 16:09 UTC (permalink / raw)
  To: linux-kernel

Dear kernel hackers!

Yesterday, Linus kindly directed me towards the filesystem implementation 
concerning this, so I produced some diagnostics in that direction. The 
task is to write a big file with dd to a hardware RAID on a dual Athlon MP 
machine, while in the meantime checking the progress with ls -l. Here's 
the relevant part of 'strace -T ls -l' (it is not always so slow)

open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3 <0.000013>
fstat64(3, {st_mode=S_IFDIR|0755, st_size=160, ...}) = 0 <0.000007>
fcntl64(0x3, 0x2, 0x1, 0x3)             = 0 <0.000011>
brk(0x805b000)                          = 0x805b000 <0.000008>
getdents64(0x3, 0x8058ac8, 0x1000, 0x8058ac8) = 176 <4.726011>
lstat64("bigfile_2", {st_mode=S_IFREG|0644, st_size=2147483648, ...}) = 0 <0.000023>
lstat64("bigfile_3", {st_mode=S_IFREG|0644, st_size=726581248, ...}) = 0 <0.000012>
lstat64("writer", {st_mode=S_IFREG|0755, st_size=58, ...}) = 0 <0.000012>
lstat64("bigfile_", {st_mode=S_IFREG|0644, st_size=2147483648, ...}) = 0 <0.000012>
getdents64(0x3, 0x8058ac8, 0x1000, 0x8058ac8) = 0 <0.000033>
close(3)                                = 0 <0.000013>

The first call to getdents64 takes 4.7s! I captured the task status and 
got this:

writer        S 00000018  4732  1652      1  1653          1242 (NOTLB)
Call Trace:    [sys_wait4+952/1008] [system_call+51/56]
dd            D 0000001A     0  1653   1652                     (NOTLB)
Call Trace:    [sleep_on+74/112]
        [do_journal_begin_r+87/544]
        [unmap_underlying_metadata+26/96]
        [__block_prepare_write+231/752]
        [call_reschedule_interrupt+5/11]
        [journal_begin+19/32]
        [reiserfs_commit_write+136/336]
        [block_prepare_write+29/64]
        [generic_file_write+1256/1776]
        [sys_write+149/272]
        [do_IRQ+197/240]
        [system_call+51/56]
ls            D 080541A4  5084  1659   1562                     (NOTLB)
Call Trace:    [sleep_on+74/112]
        [do_journal_begin_r+87/544]
        [search_by_key+2338/3552]
        [journal_begin+19/32]
        [reiserfs_dirty_inode+88/160]
        [__mark_inode_dirty+46/160]
        [update_atime+81/96]
        [reiserfs_readdir+1112/1136]
        [filemap_nopage+233/528]
        [rb_insert_color+112/240]
        [vfs_readdir+124/192]
        [filldir64+0/320]
        [sys_getdents64+79/186]
        [filldir64+0/320]
        [system_call+51/56]

writer is a shell script calling dd with a blocksize of 1M. The problem of 
course vanishes when using noatime, but it still makes me wonder why a 
single write request is delayed so long. The other possibility to avoid 
this is to use a sync mount, and there I discovered something really 
strange: 2.4.19 gives me about 17MB/s while 2.4.18-3 (RedHat) was creeping 
slow with 10kB/s!

If you have any thoughts on the cause of this behaviour and/or on how to 
fix it, I would be glad to hear them. If it's not too complicated I could 
even code something up myself, and I for sure can do any testing needed.

Thanks in advance,
					Roland

+---------------------------+-------------------------+
|    TU Muenchen            |                         |
|    Physik-Department E18  |  Raum    3558           |
|    James-Franck-Str.      |  Telefon 089/289-12592  |
|    85747 Garching         |                         |
+---------------------------+-------------------------+


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-04 16:09 reiserfs blocks long on getdents64() during concurrent write Roland Kuhn
@ 2002-08-05  5:44 ` Oleg Drokin
  2002-08-05  9:22   ` Roland Kuhn
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Drokin @ 2002-08-05  5:44 UTC (permalink / raw)
  To: Roland Kuhn; +Cc: linux-kernel

Hello!

On Sun, Aug 04, 2002 at 06:09:34PM +0200, Roland Kuhn wrote:

> The first call to getdents64 takes 4.7s! I captured the task status and 
> got this:

[call traces skipped]

> writer is a shell script calling dd with a blocksize of 1M. The problem of 
> course vanishes when using noatime, but it still makes me wonder why a 
> single write request is delayed so long. The other possibility to avoid 

It seems that your kernel performs writes slowly.

> this is to use a sync mount, and there I discovered something really 
> strange: 2.4.19 gives me about 17MB/s while 2.4.18-3 (RedHat) was creeping 
> slow with 10kB/s!

2.4.18-3 is particularly bad know for this exact problem (slow write speed),
you should consider upgrading to at least 2.4.18-5 kernel from redhat updates.

For me on plain 2.4.18 the problem is not visible that bad as for you.
I.e. ls on a directory where I write this big file finishes in under
half a second.

> If you have any thoughts on the cause of this behaviour and/or on how to 
> fix it, I would be glad to hear them. If it's not too complicated I could 
> even code something up myself, and I for sure can do any testing needed.

You said 2.4.19 writes stuff faster for you, how about testing ls on that
kernel?

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05  5:44 ` Oleg Drokin
@ 2002-08-05  9:22   ` Roland Kuhn
  2002-08-05  9:54     ` Oleg Drokin
  0 siblings, 1 reply; 16+ messages in thread
From: Roland Kuhn @ 2002-08-05  9:22 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-kernel

Hi!

On Mon, 5 Aug 2002, Oleg Drokin wrote:

> Hello!
> 
> On Sun, Aug 04, 2002 at 06:09:34PM +0200, Roland Kuhn wrote:
> 
> > The first call to getdents64 takes 4.7s! I captured the task status and 
> > got this:
> 
> It seems that your kernel performs writes slowly.
> 
That's true only in the case of a "filled write queue" (I don't know 
exactly how it's working). However, I verified that the average SCSI 
command takes less than two seconds to complete in this case (time between 
send to card and receive response interrupt).

BTW: was it directly clear for you for what the do_journal_begin_r() was 
waiting? I'm not so familiar with fs coding, especially the locking...

While we're at it: it looks like e.g. the 3ware driver is busy waiting in
tw_scsi_queue() until (by virtue of an interrupt?) a request_id becomes
free. Is this common practice? Or is there code somewhere checking for the
availability of these request slots before calling scsi_queue? I would
think that the CPU can do more useful things while waiting for an
interrupt...

> > this is to use a sync mount, and there I discovered something really 
> > strange: 2.4.19 gives me about 17MB/s while 2.4.18-3 (RedHat) was creeping 
> > slow with 10kB/s!
> 
> 2.4.18-3 is particularly bad know for this exact problem (slow write speed),
> you should consider upgrading to at least 2.4.18-5 kernel from redhat updates.
> 
> For me on plain 2.4.18 the problem is not visible that bad as for you.
> I.e. ls on a directory where I write this big file finishes in under
> half a second.
> 
Do you use IDE or SCSI? It seems like the SCSI drivers can have up to 
255*256 sectors pending on the controller. If that's the problem, how can 
I decrease this buffering?

> > If you have any thoughts on the cause of this behaviour and/or on how to 
> > fix it, I would be glad to hear them. If it's not too complicated I could 
> > even code something up myself, and I for sure can do any testing needed.
> 
> You said 2.4.19 writes stuff faster for you, how about testing ls on that
> kernel?
> 
The call trace was from 2.4.19, and the situation actually was much better 
than in the 2.4.18-3 case, where it took about 1 minute for ls to come 
back. 

Ciao,
					Roland

+---------------------------+-------------------------+
|    TU Muenchen            |                         |
|    Physik-Department E18  |  Raum    3558           |
|    James-Franck-Str.      |  Telefon 089/289-12592  |
|    85747 Garching         |                         |
+---------------------------+-------------------------+


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05  9:22   ` Roland Kuhn
@ 2002-08-05  9:54     ` Oleg Drokin
  2002-08-05 10:51       ` Roland Kuhn
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Drokin @ 2002-08-05  9:54 UTC (permalink / raw)
  To: Roland Kuhn; +Cc: linux-kernel

Hello!

On Mon, Aug 05, 2002 at 11:22:49AM +0200, Roland Kuhn wrote:

> BTW: was it directly clear for you for what the do_journal_begin_r() was 
> waiting? I'm not so familiar with fs coding, especially the locking...

I'd say that it's waiting for journal lock that was captured by kupdated
(or something like that) doing periodic write_super() calls, when it thinks
memory is low. Our tests some time ago shown that once memory is lower than
some value, write_super() "method" is called for every superblock that
have "dirty" flag set. In reiserfs that flushes the journal, and while journal
is being flushed, nobody can create new transactions obviously, and since
ls changes directory's atime in normal case, it want to open new transaction.
Unfortunatelly dirty flag for reiserfs gets set way to often than necessary,
we have some patches that should help this (from Chris Mason).
You can try these for yourself too, for example from here:
ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz 

> > > this is to use a sync mount, and there I discovered something really 
> > > strange: 2.4.19 gives me about 17MB/s while 2.4.18-3 (RedHat) was creeping 
> > > slow with 10kB/s!
> > 2.4.18-3 is particularly bad know for this exact problem (slow write speed),
> > you should consider upgrading to at least 2.4.18-5 kernel from redhat updates.
> > For me on plain 2.4.18 the problem is not visible that bad as for you.
> > I.e. ls on a directory where I write this big file finishes in under
> > half a second.
> Do you use IDE or SCSI? It seems like the SCSI drivers can have up to 

I tested on IDE.

> > You said 2.4.19 writes stuff faster for you, how about testing ls on that
> > kernel?
> The call trace was from 2.4.19, and the situation actually was much better 

Ah, I thought call trace was from 2.4.18-3.

> than in the 2.4.18-3 case, where it took about 1 minute for ls to come 
> back. 

So you might try the patch I mentioned or you can mount with nodiratime mount
option to prevent updqting of directory atimes, but having atimes still to be
updated on regular files at the same time.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05  9:54     ` Oleg Drokin
@ 2002-08-05 10:51       ` Roland Kuhn
  2002-08-05 11:04         ` Oleg Drokin
  0 siblings, 1 reply; 16+ messages in thread
From: Roland Kuhn @ 2002-08-05 10:51 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-kernel

Hi!

On Mon, 5 Aug 2002, Oleg Drokin wrote:

> Hello!
> 
> On Mon, Aug 05, 2002 at 11:22:49AM +0200, Roland Kuhn wrote:
> 
> > BTW: was it directly clear for you for what the do_journal_begin_r() was 
> > waiting? I'm not so familiar with fs coding, especially the locking...
> 
> I'd say that it's waiting for journal lock that was captured by kupdated
> (or something like that) doing periodic write_super() calls, when it thinks
> memory is low. Our tests some time ago shown that once memory is lower than
> some value, write_super() "method" is called for every superblock that
> have "dirty" flag set. In reiserfs that flushes the journal, and while journal
> is being flushed, nobody can create new transactions obviously, and since
> ls changes directory's atime in normal case, it want to open new transaction.
> Unfortunatelly dirty flag for reiserfs gets set way to often than necessary,
> we have some patches that should help this (from Chris Mason).
> You can try these for yourself too, for example from here:
> ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz 
> 
I will try it immediately. And I will try to find documentation on how 
this all is working, so I can understand the implications. Does the 
"flushing" also happen, when the journal is full? Or is it possible to 
begin a new journal while the old one is written out?

> > than in the 2.4.18-3 case, where it took about 1 minute for ls to come 
> > back. 
> 
> So you might try the patch I mentioned or you can mount with nodiratime mount
> option to prevent updqting of directory atimes, but having atimes still to be
> updated on regular files at the same time.
> 
Interesting mount option, though I think very few application actually use 
atime anyway, isn't it?

Ciao,
					Roland

+---------------------------+-------------------------+
|    TU Muenchen            |                         |
|    Physik-Department E18  |  Raum    3558           |
|    James-Franck-Str.      |  Telefon 089/289-12592  |
|    85747 Garching         |                         |
+---------------------------+-------------------------+


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05 10:51       ` Roland Kuhn
@ 2002-08-05 11:04         ` Oleg Drokin
  2002-08-05 18:19           ` Roland Kuhn
  0 siblings, 1 reply; 16+ messages in thread
From: Oleg Drokin @ 2002-08-05 11:04 UTC (permalink / raw)
  To: Roland Kuhn; +Cc: linux-kernel

Hello!

On Mon, Aug 05, 2002 at 12:51:35PM +0200, Roland Kuhn wrote:

> > Unfortunatelly dirty flag for reiserfs gets set way to often than necessary,
> > we have some patches that should help this (from Chris Mason).
> > You can try these for yourself too, for example from here:
> > ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz 
> I will try it immediately. And I will try to find documentation on how 
> this all is working, so I can understand the implications. Does the 
> "flushing" also happen, when the journal is full? Or is it possible to 

Yes, if journal is full it is also flushed.

> begin a new journal while the old one is written out?

No, it is not possible to begin "new journal" in reiser3.

> > So you might try the patch I mentioned or you can mount with nodiratime mount
> > option to prevent updqting of directory atimes, but having atimes still to be
> > updated on regular files at the same time.
> Interesting mount option, though I think very few application actually use 
> atime anyway, isn't it?

I know for sure that atime on mbox-style files used to determine which messages
are new, and which are not. (N and O status in mutt).
There may be other users as well.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05 11:04         ` Oleg Drokin
@ 2002-08-05 18:19           ` Roland Kuhn
  2002-08-05 18:30             ` Oleg Drokin
  0 siblings, 1 reply; 16+ messages in thread
From: Roland Kuhn @ 2002-08-05 18:19 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-kernel

Hi Oleg!

On Mon, 5 Aug 2002, Oleg Drokin wrote:

> Hello!
> 
> On Mon, Aug 05, 2002 at 12:51:35PM +0200, Roland Kuhn wrote:
> 
> > > Unfortunatelly dirty flag for reiserfs gets set way to often than necessary,
> > > we have some patches that should help this (from Chris Mason).
> > > You can try these for yourself too, for example from here:
> > > ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz 

>From there I get 'permission denied', but I got it somewhere else (google 
is great).

However, it does not apply cleanly to 2.4.19. It is already partly in, as 
it seems, but there are some rejects that are not obvious to fix for me. 
If this patch still makes sense, it would be great if someone with more 
knowledge/experience than me could have a look...

> No, it is not possible to begin "new journal" in reiser3.
> 
Is this going to change in reiser4?

> I know for sure that atime on mbox-style files used to determine which messages
> are new, and which are not. (N and O status in mutt).
> There may be other users as well.
> 
Yes, this kind of stuff. But we use it "only" to buffer some 600GB of 
valuable physics data before it can go to tape ;-)

Ciao,
					Roland

+---------------------------+-------------------------+
|    TU Muenchen            |                         |
|    Physik-Department E18  |  Raum    3558           |
|    James-Franck-Str.      |  Telefon 089/289-12592  |
|    85747 Garching         |                         |
+---------------------------+-------------------------+


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05 18:19           ` Roland Kuhn
@ 2002-08-05 18:30             ` Oleg Drokin
  2002-08-05 18:54               ` Chris Mason
  2002-08-05 19:11               ` Roland Kuhn
  0 siblings, 2 replies; 16+ messages in thread
From: Oleg Drokin @ 2002-08-05 18:30 UTC (permalink / raw)
  To: Roland Kuhn; +Cc: linux-kernel

Hello!

On Mon, Aug 05, 2002 at 08:19:05PM +0200, Roland Kuhn wrote:
> > > > ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz 
> >From there I get 'permission denied', but I got it somewhere else (google 
> is great).
> However, it does not apply cleanly to 2.4.19. It is already partly in, as 
> it seems, but there are some rejects that are not obvious to fix for me. 
> If this patch still makes sense, it would be great if someone with more 
> knowledge/experience than me could have a look...

In the same dir there is 
03-data-logging-24.diff.gz
It contains this patch and more stuff, you can try it.

> > No, it is not possible to begin "new journal" in reiser3.
> Is this going to change in reiser4?

Sort of, there would be no static journal.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05 18:30             ` Oleg Drokin
@ 2002-08-05 18:54               ` Chris Mason
  2002-08-05 19:48                 ` Roland Kuhn
  2002-08-05 19:11               ` Roland Kuhn
  1 sibling, 1 reply; 16+ messages in thread
From: Chris Mason @ 2002-08-05 18:54 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Roland Kuhn, linux-kernel

On Mon, 2002-08-05 at 14:30, Oleg Drokin wrote:
> Hello!
> 
> On Mon, Aug 05, 2002 at 08:19:05PM +0200, Roland Kuhn wrote:
> > > > > ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz 
> > >From there I get 'permission denied', but I got it somewhere else (google 
> > is great).
> > However, it does not apply cleanly to 2.4.19. It is already partly in, as 
> > it seems, but there are some rejects that are not obvious to fix for me. 
> > If this patch still makes sense, it would be great if someone with more 
> > knowledge/experience than me could have a look...

The stack traces you sent earlier show a few procs stuck waiting for the
transaction to begin, but they don't show which process is currently in
a transaction (this is who they are waiting on).

Oleg is right, they are probably waiting on kupdated, since the FS might
get marked dirty faster than it can clear it.

Another possibility is ctime/mtime updates during write.

So, on ftp.suse.com/pub/people/mason/patches/data-logging

Apply:
01-relocation-4.diff
02-commit_super-8.diff # this is the one you want, but it depends on 01.

And try again.  If that doesn't do it, try 04-write_times.diff (which
doesn't depend on anything).

Or, send a few fully decoded sysrq-t outputs during the run, so we can
see what all the procs are up to.

-chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05 18:30             ` Oleg Drokin
  2002-08-05 18:54               ` Chris Mason
@ 2002-08-05 19:11               ` Roland Kuhn
  1 sibling, 0 replies; 16+ messages in thread
From: Roland Kuhn @ 2002-08-05 19:11 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: linux-kernel

Hi!

On Mon, 5 Aug 2002, Oleg Drokin wrote:

> Hello!
> 
> On Mon, Aug 05, 2002 at 08:19:05PM +0200, Roland Kuhn wrote:
> > > > > ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz 
> > >From there I get 'permission denied', but I got it somewhere else (google 
> > is great).
> > However, it does not apply cleanly to 2.4.19. It is already partly in, as 
> > it seems, but there are some rejects that are not obvious to fix for me. 
> > If this patch still makes sense, it would be great if someone with more 
> > knowledge/experience than me could have a look...
> 
> In the same dir there is 
> 03-data-logging-24.diff.gz
> It contains this patch and more stuff, you can try it.
> 
The same comment applies: I get 26 rejected hunks, most in journal.c. For 
some it could be a whitespace problem, because I could not immediately see 
the reason. Are some of these things already in 2.4.19?

Ciao,
					Roland

+---------------------------+-------------------------+
|    TU Muenchen            |                         |
|    Physik-Department E18  |  Raum    3558           |
|    James-Franck-Str.      |  Telefon 089/289-12592  |
|    85747 Garching         |                         |
+---------------------------+-------------------------+


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05 18:54               ` Chris Mason
@ 2002-08-05 19:48                 ` Roland Kuhn
  2002-08-05 20:02                   ` Roland Kuhn
  0 siblings, 1 reply; 16+ messages in thread
From: Roland Kuhn @ 2002-08-05 19:48 UTC (permalink / raw)
  To: Chris Mason; +Cc: Oleg Drokin, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2243 bytes --]

Hi!

sorry, I read this mail after Oleg's, so disregard my previous post.

On 5 Aug 2002, Chris Mason wrote:

> On Mon, 2002-08-05 at 14:30, Oleg Drokin wrote:
> > Hello!
> > 
> > On Mon, Aug 05, 2002 at 08:19:05PM +0200, Roland Kuhn wrote:
> > > > > > ftp://ftp.suse.com/pub/people/mason/patches/data-logging/02-commit_super-8-relocation.diff.gz 
> > > >From there I get 'permission denied', but I got it somewhere else (google 
> > > is great).
> > > However, it does not apply cleanly to 2.4.19. It is already partly in, as 
> > > it seems, but there are some rejects that are not obvious to fix for me. 
> > > If this patch still makes sense, it would be great if someone with more 
> > > knowledge/experience than me could have a look...
> 
> The stack traces you sent earlier show a few procs stuck waiting for the
> transaction to begin, but they don't show which process is currently in
> a transaction (this is who they are waiting on).
> 
The ShowTasks output is attached (it's a RedHat patched klogd which
resolves symbols on the fly). To me it looks like kupdated is indeed
inside do_journal_end().

> Oleg is right, they are probably waiting on kupdated, since the FS might
> get marked dirty faster than it can clear it.
> 
> Another possibility is ctime/mtime updates during write.
> 
In our application there are always at least one writer and one reader 
active in one directory of the partition, both with data rates up to 
10MB/s. The reader uses a block size of 128kB, but I don't know about the 
writer.

> So, on ftp.suse.com/pub/people/mason/patches/data-logging
> 
> Apply:
> 01-relocation-4.diff
> 02-commit_super-8.diff # this is the one you want, but it depends on 01.
> 
Okay, will try.

> And try again.  If that doesn't do it, try 04-write_times.diff (which
> doesn't depend on anything).
> 
Is there a documentation about what this patch does as a whole?

Ciao,
					Roland

+---------------------------+-------------------------+
|    TU Muenchen            |                         |
|    Physik-Department E18  |  Raum    3558           |
|    James-Franck-Str.      |  Telefon 089/289-12592  |
|    85747 Garching         |                         |
+---------------------------+-------------------------+


[-- Attachment #2: Type: APPLICATION/x-gzip, Size: 0 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05 19:48                 ` Roland Kuhn
@ 2002-08-05 20:02                   ` Roland Kuhn
  2002-08-05 22:14                     ` Chris Mason
  0 siblings, 1 reply; 16+ messages in thread
From: Roland Kuhn @ 2002-08-05 20:02 UTC (permalink / raw)
  To: Chris Mason; +Cc: Oleg Drokin, linux-kernel

Hi!

On Mon, 5 Aug 2002, Roland Kuhn wrote:

> > So, on ftp.suse.com/pub/people/mason/patches/data-logging
> > 
> > Apply:
> > 01-relocation-4.diff
> > 02-commit_super-8.diff # this is the one you want, but it depends on 01.
> > 
> Okay, will try.
> 
> > And try again.  If that doesn't do it, try 04-write_times.diff (which
> > doesn't depend on anything).
> > 
> Is there a documentation about what this patch does as a whole?
> 
Sorry, stupid question for the 04 one. What my brain wanted to say: The 
patches 01 and 02 seem to aim at dirtying the super block less often. If 
there is serious writing activity, will this lead to fewer but longer 
commits? The problem with our current (kinda stupid) software is that 
lower write() latency is more important than a few percent more 
throughput.

Ciao,
					Roland

+---------------------------+-------------------------+
|    TU Muenchen            |                         |
|    Physik-Department E18  |  Raum    3558           |
|    James-Franck-Str.      |  Telefon 089/289-12592  |
|    85747 Garching         |                         |
+---------------------------+-------------------------+


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05 20:02                   ` Roland Kuhn
@ 2002-08-05 22:14                     ` Chris Mason
  2002-08-05 22:46                       ` Roland Kuhn
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2002-08-05 22:14 UTC (permalink / raw)
  To: Roland Kuhn; +Cc: Oleg Drokin, linux-kernel

On Mon, 2002-08-05 at 16:02, Roland Kuhn wrote:
> Hi!
> 
> On Mon, 5 Aug 2002, Roland Kuhn wrote:
> 
> > > So, on ftp.suse.com/pub/people/mason/patches/data-logging
> > > 
> > > Apply:
> > > 01-relocation-4.diff
> > > 02-commit_super-8.diff # this is the one you want, but it depends on 01.
> > > 
> > Okay, will try.
> > 
> > > And try again.  If that doesn't do it, try 04-write_times.diff (which
> > > doesn't depend on anything).
> > > 
> > Is there a documentation about what this patch does as a whole?
> > 
> Sorry, stupid question for the 04 one. What my brain wanted to say: The 
> patches 01 and 02 seem to aim at dirtying the super block less often. If 
> there is serious writing activity, will this lead to fewer but longer 
> commits? The problem with our current (kinda stupid) software is that 
> lower write() latency is more important than a few percent more 
> throughput.

01-relocation-4 deals with allowing reiserfs to use an external logging
device.  It isn't related to your problem, but 02-commit_super-8 is
diffed against it.

02-commit_super-8 does two things.  First it changes sync_supers() so
that it won't loop on a single filesystem while it's super is dirty. 
Before, if kupdate triggered a write_super call, and another FS writer
redirtied the super after write_super cleared it (but before it
returned), write_super gets called a second time.  Since a commit was
done for each write_super call, that gets expensive quickly.

Second, the patch adds a commit_super call, and changes sync() to use
that instead of write_super.  This allows the FS to skip the commit when
write_super is called.

This does lead to fewer commits and longer running transactions, but
does not increase the amount of time it takes the write() call to
complete.  It does increase the time between when you make a metadata
change and when that change actually goes to the disk.  

-chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05 22:14                     ` Chris Mason
@ 2002-08-05 22:46                       ` Roland Kuhn
  2002-08-06  0:47                         ` Chris Mason
  0 siblings, 1 reply; 16+ messages in thread
From: Roland Kuhn @ 2002-08-05 22:46 UTC (permalink / raw)
  To: Chris Mason; +Cc: Oleg Drokin, linux-kernel

On 5 Aug 2002, Chris Mason wrote:

> 01-relocation-4 deals with allowing reiserfs to use an external logging
> device.  It isn't related to your problem, but 02-commit_super-8 is
> diffed against it.
> 
> 02-commit_super-8 does two things.  First it changes sync_supers() so
> that it won't loop on a single filesystem while it's super is dirty. 
> Before, if kupdate triggered a write_super call, and another FS writer
> redirtied the super after write_super cleared it (but before it
> returned), write_super gets called a second time.  Since a commit was
> done for each write_super call, that gets expensive quickly.
> 
> Second, the patch adds a commit_super call, and changes sync() to use
> that instead of write_super.  This allows the FS to skip the commit when
> write_super is called.
> 
> This does lead to fewer commits and longer running transactions, but
> does not increase the amount of time it takes the write() call to
> complete.  It does increase the time between when you make a metadata
> change and when that change actually goes to the disk.  
> 
Ahh, thanks! This sounds like a good idea to me, hopefully your patch will 
be accepted despite the fact that Alan is busy doing other things ;-)

Coming back to the issue: applying these patches increased the throughput
by about 20% :-) Now it takes about 100sec instead of 120sec to write a
2GB file. Tomorrow I will try it without the write_times part, to see how 
much that does.

But more important: the hiccups are more seldom and sometimes shorter than 
before. With plain 2.4.19 I would hit it about twice per minute (I have 
not measured it), now it happens only after two minutes when writing 1M 
chunks at 20MB/s. The longest seen so far was also about 4 seconds, 
though.

Ciao,
					Roland

+---------------------------+-------------------------+
|    TU Muenchen            |                         |
|    Physik-Department E18  |  Raum    3558           |
|    James-Franck-Str.      |  Telefon 089/289-12592  |
|    85747 Garching         |                         |
+---------------------------+-------------------------+


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-05 22:46                       ` Roland Kuhn
@ 2002-08-06  0:47                         ` Chris Mason
  2002-08-06 10:01                           ` Roland Kuhn
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2002-08-06  0:47 UTC (permalink / raw)
  To: Roland Kuhn; +Cc: Oleg Drokin, linux-kernel

On Mon, 2002-08-05 at 18:46, Roland Kuhn wrote:
> > 
> Ahh, thanks! This sounds like a good idea to me, hopefully your patch will 
> be accepted despite the fact that Alan is busy doing other things ;-)
> 
> Coming back to the issue: applying these patches increased the throughput
> by about 20% :-) Now it takes about 100sec instead of 120sec to write a
> 2GB file. Tomorrow I will try it without the write_times part, to see how 
> much that does.

The write_times patch fixes a lot of latency problems, but all 5 of my
patches kind of build on each other to solve problems in different
workloads.

> 
> But more important: the hiccups are more seldom and sometimes shorter than 
> before. With plain 2.4.19 I would hit it about twice per minute (I have 
> not measured it), now it happens only after two minutes when writing 1M 
> chunks at 20MB/s. The longest seen so far was also about 4 seconds, 
> though.

This is harder to guess at ;-)  But I'll try 2 things I know I've fixed.

#1 I've made the metadata writeback code much more efficient, especially
for smaller transactions.  A dd to a large file won't generate lots of
log traffic, writing 700M only changes about 160 blocks in 30 seconds. 
Anyway, 03-data-logging-24 might make a big difference.

#2 During an ls, the current code ends up reading the directory more or
less one block at a time.  Try 05-search_reada-4.diff, which will read
more tree nodes at once.

If that still doesn't work, I'm porting forward some reiserfs
low-latency work that andrew morton did, we can give it a try.

-chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs blocks long on getdents64() during concurrent write
  2002-08-06  0:47                         ` Chris Mason
@ 2002-08-06 10:01                           ` Roland Kuhn
  0 siblings, 0 replies; 16+ messages in thread
From: Roland Kuhn @ 2002-08-06 10:01 UTC (permalink / raw)
  To: Chris Mason; +Cc: Oleg Drokin, linux-kernel

On 5 Aug 2002, Chris Mason wrote:

> On Mon, 2002-08-05 at 18:46, Roland Kuhn wrote:
> > 
> > But more important: the hiccups are more seldom and sometimes shorter than 
> > before. With plain 2.4.19 I would hit it about twice per minute (I have 
> > not measured it), now it happens only after two minutes when writing 1M 
> > chunks at 20MB/s. The longest seen so far was also about 4 seconds, 
> > though.
> 
> This is harder to guess at ;-)  But I'll try 2 things I know I've fixed.
> 
> #1 I've made the metadata writeback code much more efficient, especially
> for smaller transactions.  A dd to a large file won't generate lots of
> log traffic, writing 700M only changes about 160 blocks in 30 seconds. 
> Anyway, 03-data-logging-24 might make a big difference.
> 
It is hard to get precise numbers, but I have the feeling that with atime 
enabled the ls-hiccups have at least not decreased, and there has been a 
rather long one of about 13 seconds, frequency is about one per minute, 
usually taking 2-4 seconds.

With noatime this of course doesn't happen, but with the patch the file 
size does not increase smoothly any more, it sometimes stays the same for 
some ten seconds and then jumps a few hundred MBs. However, the throughput 
is roughly the same as without the patch (compatible within the errors of 
about 10%).

> #2 During an ls, the current code ends up reading the directory more or
> less one block at a time.  Try 05-search_reada-4.diff, which will read
> more tree nodes at once.
> 
Unfortunately I'm ordered to do something now, so I won't have the time to
check the performance of this one during the next few hours. The
combination of 01+02+04 seemed more stable (concerning sustained
throughput), so I will test it on all 12 machines and tell you what
happens.

Thanks very much for your help! I appreciate that!

Ciao,
					Roland

+---------------------------+-------------------------+
|    TU Muenchen            |                         |
|    Physik-Department E18  |  Raum    3558           |
|    James-Franck-Str.      |  Telefon 089/289-12592  |
|    85747 Garching         |                         |
+---------------------------+-------------------------+


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2002-08-06  9:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-08-04 16:09 reiserfs blocks long on getdents64() during concurrent write Roland Kuhn
2002-08-05  5:44 ` Oleg Drokin
2002-08-05  9:22   ` Roland Kuhn
2002-08-05  9:54     ` Oleg Drokin
2002-08-05 10:51       ` Roland Kuhn
2002-08-05 11:04         ` Oleg Drokin
2002-08-05 18:19           ` Roland Kuhn
2002-08-05 18:30             ` Oleg Drokin
2002-08-05 18:54               ` Chris Mason
2002-08-05 19:48                 ` Roland Kuhn
2002-08-05 20:02                   ` Roland Kuhn
2002-08-05 22:14                     ` Chris Mason
2002-08-05 22:46                       ` Roland Kuhn
2002-08-06  0:47                         ` Chris Mason
2002-08-06 10:01                           ` Roland Kuhn
2002-08-05 19:11               ` Roland Kuhn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).