linux-kernel.vger.kernel.org archive mirror
* ext3 metadata performace
@ 2006-05-11 14:11 Dieter Stüken
  2006-05-11 15:43 ` Avi Kivity
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dieter Stüken @ 2006-05-11 14:11 UTC (permalink / raw)
  To: linux-kernel

After I switched from ext2 to ext3 I observed severe
performance degradation. Most discussion of this topic deals
with tuning data I/O performance. My problem, however, is related to
metadata updates. When cloning (cp -al) or deleting directory trees I
find that about 7200 files are created/deleted per minute. It seems
this is related to some ext3 strategy of waiting for each metadata update
to be written to disk. Interestingly, this occurs with my new hardware RAID
controller (3ware 9500S), which even has a battery-backed disk cache.
Thus there is no need for synchronous I/O anyway. If I disable the
disk cache on my plain SATA disk using ext3, I also get this behavior.

Would it make sense for ext3 to disable synchronous writes even
for metadata (similar to the "data=writeback" option)? This would mean that
ext3 no longer protects the (meta)data currently being written. That protection
is needed when running a database or an email server, where the process
performing the I/O must be sure the data is definitely on disk when it
returns from the system call. In most cases, however, you choose ext3 to
ensure the consistency of your file system after a crash and to avoid an fsck.
If some files created just before the crash vanish, that does not hurt
me too much.

Dieter.
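
The figure quoted above, about 7200 files per minute, works out to 120
operations per second, or roughly 8.3 ms per operation. That happens to equal
one revolution of a 7200 rpm disk (60 s / 7200), although the thread does not
establish that this is the cause. A minimal sketch for measuring the same kind
of rate on another machine follows. It assumes hard-link creation and removal
is a fair stand-in for "cp -al" and "rm -r"; the function names and the default
file count are hypothetical, and page-cache effects are not controlled for.

```python
import os
import tempfile
import time


def seconds_per_op(ops_per_minute):
    """Convert an operations-per-minute rate into seconds per operation."""
    return 60.0 / ops_per_minute


def measure_link_rate(n_files=2000, base_dir=None):
    """Create and then remove n_files hard links to one file.

    Returns (creates_per_minute, deletes_per_minute). base_dir selects the
    file system under test; by default the system temp directory is used.
    """
    with tempfile.TemporaryDirectory(dir=base_dir) as tmp:
        source = os.path.join(tmp, "source")
        with open(source, "w") as f:
            f.write("x")

        # Time the metadata-only creation of hard links (what cp -al does).
        start = time.perf_counter()
        for i in range(n_files):
            os.link(source, os.path.join(tmp, f"link{i}"))
        create_secs = max(time.perf_counter() - start, 1e-9)

        # Time the removal of the same links.
        start = time.perf_counter()
        for i in range(n_files):
            os.unlink(os.path.join(tmp, f"link{i}"))
        delete_secs = max(time.perf_counter() - start, 1e-9)

    return n_files / create_secs * 60, n_files / delete_secs * 60
```

Calling measure_link_rate(base_dir="/mnt/test") (a hypothetical mount point)
on an ext2 and an ext3 file system in turn gives two numbers that can be
compared directly with the 7200 per minute reported here.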


* Re: ext3 metadata performace
  2006-05-11 14:11 ext3 metadata performace Dieter Stüken
@ 2006-05-11 15:43 ` Avi Kivity
  2006-05-11 18:46   ` Miquel van Smoorenburg
  2006-05-12  0:36 ` Hua Zhong
  2006-05-12  6:35 ` Helge Hafting
  2 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2006-05-11 15:43 UTC (permalink / raw)
  To: Dieter Stüken; +Cc: linux-kernel

Dieter Stüken wrote:
> After I switched from ext2 to ext3 I observed severe
> performance degradation. Most discussion of this topic deals
> with tuning data I/O performance. My problem, however, is related to
> metadata updates. When cloning (cp -al) or deleting directory trees I
> find that about 7200 files are created/deleted per minute. It seems
> this is related to some ext3 strategy of waiting for each metadata update
> to be written to disk. Interestingly, this occurs with my new hardware RAID
> controller (3ware 9500S), which even has a battery-backed disk cache.
> Thus there is no need for synchronous I/O anyway. If I disable the
> disk cache on my plain SATA disk using ext3, I also get this behavior.
>
Try increasing the journal size (mkfs -t ext3 -J size=20000) and see if 
that improves things.

-- 
error compiling committee.c: too many arguments to function



* Re: ext3 metadata performace
  2006-05-11 15:43 ` Avi Kivity
@ 2006-05-11 18:46   ` Miquel van Smoorenburg
  0 siblings, 0 replies; 7+ messages in thread
From: Miquel van Smoorenburg @ 2006-05-11 18:46 UTC (permalink / raw)
  To: linux-kernel

In article <44635BA8.9060002@argo.co.il>,
Avi Kivity  <avi@argo.co.il> wrote:
>Dieter Stüken wrote:
>> After I switched from ext2 to ext3 I observed severe
>> performance degradation. Most discussion of this topic deals
>> with tuning data I/O performance. My problem, however, is related to
>> metadata updates. When cloning (cp -al) or deleting directory trees I
>> find that about 7200 files are created/deleted per minute. It seems
>> this is related to some ext3 strategy of waiting for each metadata update
>> to be written to disk. Interestingly, this occurs with my new hardware RAID
>> controller (3ware 9500S), which even has a battery-backed disk cache.
>> Thus there is no need for synchronous I/O anyway. If I disable the
>> disk cache on my plain SATA disk using ext3, I also get this behavior.
>>
>Try increasing the journal size (mkfs -t ext3 -J size=20000) and see if 
>that improves things.

Also, with 3ware, look in /sys/block/sd* and set queue_depth to
254/(nr_arrays), and nr_requests to at least 2*queue_depth. Also
try another I/O scheduler (deadline instead of as, the anticipatory
scheduler).

Mike.
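
The arithmetic in the suggestion above can be sketched as follows. The message
only says to look under /sys/block/sd*, so the exact attribute paths used here
(device/queue_depth, queue/nr_requests, queue/scheduler) are assumptions about
the usual sysfs layout, and the function names are hypothetical. Nothing is
written unless apply() is called explicitly, which requires root.

```python
from pathlib import Path


def suggested_queue_settings(nr_arrays, controller_depth=254):
    """Return (queue_depth, nr_requests) following the rule of thumb above.

    queue_depth is 254 divided among the arrays; nr_requests is the minimum
    suggested value, twice the queue depth.
    """
    if nr_arrays < 1:
        raise ValueError("nr_arrays must be at least 1")
    queue_depth = controller_depth // nr_arrays
    return queue_depth, 2 * queue_depth


def sysfs_writes(device, nr_arrays, scheduler="deadline"):
    """Build the (path, value) pairs for one block device, e.g. "sda".

    The paths are assumed, not taken from the thread. Nothing is written here.
    """
    queue_depth, nr_requests = suggested_queue_settings(nr_arrays)
    base = Path("/sys/block") / device
    return [
        (base / "device" / "queue_depth", str(queue_depth)),
        (base / "queue" / "nr_requests", str(nr_requests)),
        (base / "queue" / "scheduler", scheduler),
    ]


def apply(writes):
    """Write each value to its sysfs file (needs root privileges)."""
    for path, value in writes:
        path.write_text(value)
```

With two arrays on the controller this yields a queue depth of 127 and an
nr_requests value of 254 per device.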



* RE: ext3 metadata performace
  2006-05-11 14:11 ext3 metadata performace Dieter Stüken
  2006-05-11 15:43 ` Avi Kivity
@ 2006-05-12  0:36 ` Hua Zhong
  2006-05-12  6:35 ` Helge Hafting
  2 siblings, 0 replies; 7+ messages in thread
From: Hua Zhong @ 2006-05-12  0:36 UTC (permalink / raw)
  To: 'Dieter Stüken', linux-kernel

> (3ware 9500S), which even has a battery-backed disk cache.
> Thus there is no need for synchronous I/O anyway. If I disable
> the disk cache on my plain SATA disk using ext3, I also get
> this behavior.

If you mean the disk cache is reliable thanks to the battery, then the
block layer should make sure that a write barrier doesn't translate into
a SYNC (or whatever it is called). Instead, data should be considered
synced to disk as soon as it hits the cache.

This really has nothing to do with EXT3. It's doing the right thing.

Hua





* Re: ext3 metadata performace
  2006-05-11 14:11 ext3 metadata performace Dieter Stüken
  2006-05-11 15:43 ` Avi Kivity
  2006-05-12  0:36 ` Hua Zhong
@ 2006-05-12  6:35 ` Helge Hafting
  2006-05-12 10:11   ` Dieter Stüken
  2 siblings, 1 reply; 7+ messages in thread
From: Helge Hafting @ 2006-05-12  6:35 UTC (permalink / raw)
  To: Dieter Stüken; +Cc: linux-kernel

Dieter Stüken wrote:

> After I switched from ext2 to ext3 I observed severe
> performance degradation. Most discussion of this topic deals
> with tuning data I/O performance. My problem, however, is related to
> metadata updates. When cloning (cp -al) or deleting directory trees I
> find that about 7200 files are created/deleted per minute. It seems
> this is related to some ext3 strategy of waiting for each metadata update
> to be written to disk. Interestingly, this occurs with my new hardware RAID
> controller (3ware 9500S), which even has a battery-backed disk cache.
> Thus there is no need for synchronous I/O anyway. If I disable the
> disk cache on my plain SATA disk using ext3, I also get this behavior.
>
> Would it make sense for ext3 to disable synchronous writes even
> for metadata (similar to the "data=writeback" option)? This would mean that
> ext3 no longer protects the (meta)data currently being written. That protection
> is needed when running a database or an email server, where the process
> performing the I/O must be sure the data is definitely on disk when it
> returns from the system call. In most cases, however, you choose ext3 to
> ensure the consistency of your file system after a crash and to avoid an fsck.
> If some files created just before the crash vanish, that does not hurt
> me too much.

Turning off synchronous writes like this won't work!
The battery-backed cache can help you in that you can consider
data "written" once it is transferred to that cache.  Metadata must still
go synchronously into the cache though, or you get a broken fs
if your machine ever crashes in the middle of a transaction. (Leaving
an update halfway in that battery cache, and halfway in main memory.
Then main memory dies from the power cut / reboot.)

The caching controller should report back to the linux device driver
that "data is committed" as soon as it hits the cache - no need to
wait for it to actually hit the platters.  This can help performance with
bursty writes tremendously - but it won't help you with long-lasting writes
as you will then be limited by platter speed as soon as the battery cache
is completely full.

Helge Hafting
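
The point about bursty versus long-lasting writes can be put into a rough
model: the cache absorbs the difference between the incoming write rate and
the rate at which the platters drain it, and once it is full the sustained
rate falls back to platter speed. This is a simplified sketch with constant
rates; the function names and all numbers used with it are hypothetical.

```python
def seconds_until_cache_full(cache_mb, incoming_mb_s, drain_mb_s):
    """Time until a write-back cache fills, or None if it never does.

    cache_mb:      size of the battery-backed cache in MB
    incoming_mb_s: rate at which the host writes into the cache
    drain_mb_s:    rate at which the controller flushes to the platters
    """
    if incoming_mb_s <= drain_mb_s:
        # The disks keep up, so the cache never fills.
        return None
    return cache_mb / (incoming_mb_s - drain_mb_s)


def sustained_rate(incoming_mb_s, drain_mb_s):
    """Long-run throughput once the cache is full: limited by the platters."""
    return min(incoming_mb_s, drain_mb_s)
```

For example, a 100 MB cache fed at 150 MB/s and drained at 50 MB/s would fill
in one second under this model, after which writes proceed at 50 MB/s.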





* Re: ext3 metadata performace
  2006-05-12  6:35 ` Helge Hafting
@ 2006-05-12 10:11   ` Dieter Stüken
  0 siblings, 0 replies; 7+ messages in thread
From: Dieter Stüken @ 2006-05-12 10:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Helge Hafting, hzhong

Helge Hafting wrote:
> Dieter Stüken wrote:
>> Would it make sense for ext3 to disable synchronous writes even
>> for metadata (similar to the "data=writeback" option)?
> 
> Turning off synchronous writes like this won't work!
> The battery-backed cache can help you in that you can consider
> data "written" once it is transferred to that cache.  Metadata must still
> go synchronously into the cache though, or you get a broken fs
> if your machine ever crashes in the middle of a transaction. (Leaving
> an update halfway in that battery cache, and halfway in main memory.
> Then main memory dies from the power cut / reboot.)
> 
> The caching controller should report back to the linux device driver
> that "data is committed" as soon as it hits the cache - no need to
> wait for it to actually hit the platters.  This can help performance with
> bursty writes tremendously - but it won't help you with long-lasting writes
> as you will then be limited by platter speed as soon as the battery cache
> is completely full.

The battery-backed cache is about 100 MB, compared to 8k or 16k of the
disk buffer cache itself. So it won't become full that fast...

I just tested the same with my other controller (a 3ware 9550SX), which
has an option to configure explicitly whether a write is acknowledged as
soon as the data is saved to the (buffered) memory or whether the
acknowledgement is delayed until the data has been written to disk. So
this is similar to enabling/disabling the disk cache on a plain disk. I
did not find a way to configure this on my older 3ware 9500S controller,
even though it has a battery backup too (I will ask 3ware about this).

Hua Zhong wrote:
>> If you mean the disk cache is reliable thanks to the battery, then the
>> block layer should make sure that a write barrier doesn't translate into
>> a SYNC (or whatever it is called). Instead, data should be considered
>> synced to disk as soon as it hits the cache.
>>
>> This really has nothing to do with EXT3. It's doing the right thing.

I read something about "write barriers", but I don't know if these are
already used by my current 2.6.15 kernel (I may try the latest kernel
tomorrow). Is there a difference between a SATA disk and a SCSI disk
(which is what my 3ware controllers emulate)?

Dieter.
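
One way to start answering the barrier question is to look at the mount
options of the ext3 file systems. ext3 accepts a "barrier=1" mount option to
request write barriers. The sketch below parses text in /proc/mounts format
and reports which ext3 mount points list that option. Whether a given kernel
shows the option in /proc/mounts is an assumption that should be checked, so
an absent option in the output is not proof that barriers are off. The
function names and the sample line are hypothetical.

```python
def mount_options(mounts_text, fstype="ext3"):
    """Map each mount point of the given type to its list of mount options.

    mounts_text uses the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != fstype:
            continue
        result[fields[1]] = fields[3].split(",")
    return result


def mounts_with_barrier(mounts_text, fstype="ext3"):
    """Return the mount points whose option list contains barrier=1."""
    return [
        mount_point
        for mount_point, options in mount_options(mounts_text, fstype).items()
        if "barrier=1" in options
    ]


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        print(mounts_with_barrier(f.read()))
```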


* Re: ext3 metadata performace
       [not found] <6bkbC-4V9-27@gated-at.bofh.it>
@ 2006-05-12  0:12 ` Robert Hancock
  0 siblings, 0 replies; 7+ messages in thread
From: Robert Hancock @ 2006-05-12  0:12 UTC (permalink / raw)
  To: Dieter Stüken, linux-kernel

Dieter Stüken wrote:
> After I switched from ext2 to ext3 I observed severe
> performance degradation. Most discussion of this topic deals
> with tuning data I/O performance. My problem, however, is related to
> metadata updates. When cloning (cp -al) or deleting directory trees I
> find that about 7200 files are created/deleted per minute. It seems
> this is related to some ext3 strategy of waiting for each metadata update
> to be written to disk. Interestingly, this occurs with my new hardware RAID
> controller (3ware 9500S), which even has a battery-backed disk cache.
> Thus there is no need for synchronous I/O anyway. If I disable the
> disk cache on my plain SATA disk using ext3, I also get this behavior.
>
> Would it make sense for ext3 to disable synchronous writes even
> for metadata (similar to the "data=writeback" option)? This would mean that
> ext3 no longer protects the (meta)data currently being written. That protection
> is needed when running a database or an email server, where the process
> performing the I/O must be sure the data is definitely on disk when it
> returns from the system call. In most cases, however, you choose ext3 to
> ensure the consistency of your file system after a crash and to avoid an fsck.
> If some files created just before the crash vanish, that does not hurt
> me too much.

I think that doing this would destroy all filesystem consistency 
guarantees provided by ext3. In this case you might as well use ext2. In 
order for the journalling to work, the metadata updates must be written 
to the journal before any of them start modifying the actual disk 
metadata, otherwise there is no way to recover in the event of a crash.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
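
The ordering requirement described above (journal first, in-place metadata
second) can be illustrated with a toy model. This is not how ext3 or its
journalling layer is implemented; it is a minimal in-memory sketch, with
hypothetical class and method names, of why a committed journal record makes a
half-finished in-place update recoverable.

```python
class ToyJournaledStore:
    """In-memory model of write-ahead journalling for metadata updates."""

    def __init__(self):
        self.disk = {}     # stands in for the in-place on-disk metadata
        self.journal = []  # ("update", txid, key, value) or ("commit", txid)

    def log_transaction(self, txid, updates, commit=True):
        """Write all updates of a transaction to the journal, then commit."""
        for key, value in updates.items():
            self.journal.append(("update", txid, key, value))
        if commit:
            self.journal.append(("commit", txid))

    def checkpoint(self, updates, crash_after=None):
        """Apply updates in place; optionally simulate a crash part-way.

        Returns True if all updates were applied, False if the simulated
        crash interrupted the process after crash_after updates.
        """
        for n, (key, value) in enumerate(updates.items()):
            if crash_after is not None and n >= crash_after:
                return False
            self.disk[key] = value
        return True

    def recover(self):
        """Replay committed transactions; discard uncommitted ones."""
        committed = {rec[1] for rec in self.journal if rec[0] == "commit"}
        for rec in self.journal:
            if rec[0] == "update" and rec[1] in committed:
                _, _, key, value = rec
                self.disk[key] = value
        self.journal.clear()


if __name__ == "__main__":
    store = ToyJournaledStore()
    updates = {"dir_entry:newfile": "inode 42", "inode 42": "allocated"}
    store.log_transaction(1, updates)          # journal first
    store.checkpoint(updates, crash_after=1)   # crash half-way through
    store.recover()                            # replay restores consistency
    print(store.disk)
```

If the in-place writes were allowed to start before the journal records were
complete, recover() would have nothing reliable to replay, which is the
situation the message above warns about.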

