linux-kernel.vger.kernel.org archive mirror
* Re: Filesystem performance on 2.4.28-pre3 on hardware RAID5.
@ 2004-10-29 11:10 mmokrejs
  2004-10-31 23:24 ` Nathan Scott
  0 siblings, 1 reply; 5+ messages in thread
From: mmokrejs @ 2004-10-29 11:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: nathans

Hi Nathan, Marcello and others,
  the collected meminfo, slabinfo, and vmstat output is at
http://www.natur.cuni.cz/~mmokrejs/crash/

The precrash-* files contain output collected since the machine
was freshly booted; every second I appended the current stats
to each of them. I believe some data was not flushed to disk
before the problem occurred.
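
The collection loop was roughly of this shape (a sketch only -- the
real script is not shown here):

  while true; do
      cat /proc/meminfo  >> precrash-meminfo
      cat /proc/slabinfo >> precrash-slabinfo
      vmstat | tail -1   >> precrash-vmstat
      sleep 1
  done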

On STDERR I get "fork: Cannot allocate memory".

Using another open console session and running df gives me:

start_pipeline: Too many open files in system
fork: Cannot allocate memory

Fortunately I had xdm to kill, so I could then run
sync and collect some stats (although some resources
had been freed by killing xdm/X11). Those files are named
with the crash-* prefix.

After that, I decided to continue the suspended job;
the files collected from that point are prefixed precrash2-*.

/var/log/messages is included there as well.

If you tell me what kind of memory/xfs debugging I should turn
on and *how*, I can do it immediately. I don't have access
to the machine daily, and it already had to go into production. :(

Martin
P.S.: It is hardware RAID5. I use mkfs.xfs version 2.6.13.


* Re: Filesystem performance on 2.4.28-pre3 on hardware RAID5.
  2004-10-29 11:10 Filesystem performance on 2.4.28-pre3 on hardware RAID5 mmokrejs
@ 2004-10-31 23:24 ` Nathan Scott
       [not found]   ` <418574FB.2020907@ribosome.natur.cuni.cz>
  0 siblings, 1 reply; 5+ messages in thread
From: Nathan Scott @ 2004-10-31 23:24 UTC (permalink / raw)
  To: mmokrejs; +Cc: linux-kernel, linux-xfs

On Fri, Oct 29, 2004 at 01:10:49PM +0200, mmokrejs@ribosome.natur.cuni.cz wrote:
> Hi Nathan, Marcello and others,
>   the collected meminfo, slabinfo, and vmstat output is at
> http://www.natur.cuni.cz/~mmokrejs/crash/

On Sun, Oct 31, 2004 at 11:20:35PM +0100, Martin MOKREJŠ wrote:
> Sorry, fixed by a symlink. It was actually http://www.natur.cuni.cz/~mmokrejs/tmp/crash/

OK, well there's your problem - see the slabinfo output - you
have over 700MB of buffer_head structures that are not being
reclaimed.  Everything else seems to be fine.
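
Something like this against your slabinfo dump should show the scale
of it (assuming the usual 2.4 "slabinfo - version: 1.1" columns of
name, active objs, total objs, objsize, ...):

  # memory tied up in buffer_head objects = total objects * object size
  awk '/^buffer_head/ { printf "%.1f MB in buffer_head\n", $3 * $4 / 1048576 }' /proc/slabinfo
  # (or point it at one of the saved crash-*/precrash-* slabinfo files)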

> If you tell me what kind of memory/xfs debugging I should turn
> on and *how*, I can do it immediately. I don't have access

I think turning on debugging is going to hinder more than it
will help here.

> to the machine daily, and it already had to go into production. :(
> 
> P.S.: It is hardware RAID5. I use mkfs.xfs version 2.6.13.

Hmm.  Did that patch I sent you help at all?  That should help
free up buffer_heads more effectively in low memory situations
like this.  You may also have some luck with tweaking bdflush
parameters so that flushing out of dirty buffers is started
earlier and/or done more often.  I can't remember off the top
of my head what all the bdflush tunables are - see the bdflush
section in Documentation/filesystems/proc.txt.
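
From memory the knobs live in /proc/sys/vm/bdflush and can be poked
like this -- the numbers are purely illustrative, so check the field
meanings in proc.txt on your kernel before writing anything:

  # show the current nine bdflush fields
  cat /proc/sys/vm/bdflush
  # illustrative only: lower nfract (1st field) and age_buffer (6th field)
  # so writeback starts earlier; carry over the other values reported above
  echo "20 500 0 0 500 1500 60 20 0" > /proc/sys/vm/bdflush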

Alternatively, pick a filesystem blocksize that matches your
pagesize (4K instead of 512 bytes) to minimise the number of
buffer_heads you end up using.
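
e.g. something like (the device name is just an example, and this
recreates the filesystem from scratch):

  # remake with a 4K block size so there is one buffer_head per page
  mkfs.xfs -f -b size=4096 /dev/sdb1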

cheers.

-- 
Nathan


* Re: Filesystem performance on 2.4.28-pre3 on hardware RAID5.
       [not found]       ` <41878432.5060904@ribosome.natur.cuni.cz>
@ 2004-11-03  0:17         ` Nathan Scott
  0 siblings, 0 replies; 5+ messages in thread
From: Nathan Scott @ 2004-11-03  0:17 UTC (permalink / raw)
  To: Martin MOKREJŠ; +Cc: linux-xfs, linux-kernel

On Tue, Nov 02, 2004 at 01:57:22PM +0100, Martin MOKREJŠ wrote:
> I retested with a blocksize of 1024 instead of 512 (the default), which causes problems:

4K is the default blocksize, not 1024 or 512 bytes.  From going
through all your notes, the default mkfs parameters are working
fine, and changing to a 512 byte blocksize (-blog=9 / -bsize=512)
is where the VM starts to see problems.
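
You can double-check what block size an existing filesystem was made
with via xfs_info, e.g.:

  # the data section of the output reports bsize=...
  xfs_info /scratch | grep bsize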

I don't have a device the size of yours handy on my test box, nor
do I have as much memory as you -- but I ran similar bonnie++
commands with -bsize=512 filesystems on a machine with very little
memory, and a filesystem and file size far larger than
available memory, and it ran to completion without problems.
I did see vastly more buffer_heads being created than with the
default mkfs parameters (as we'd expect with that blocksize) but
it didn't cause me any VM problems.

> How can I free the buffer_heads without rebooting? I'm trying to help myself with

AFAICT, there is no way to do this without a reboot.  They are
meant to be reclaimed (and were reclaimed on my test box) as
needed, but they don't seem to be for you.

This looks a lot like a VM balancing sort of problem to me (that
6G of memory you have is a bit unusual - probably not a widely
tested configuration on i386... maybe try booting with mem=1G
and see if that changes anything?). So far it doesn't seem like
XFS is at fault here, at least.
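
If it helps, a lilo.conf entry along these lines (kernel image and
root device are only examples) is one way to run that mem=1G test;
with grub the same parameter just goes on the kernel line:

  # example /etc/lilo.conf stanza -- paths and devices are placeholders
  image=/boot/vmlinuz-2.4.28-pre3
      label=mem1g
      root=/dev/sda1
      append="mem=1G"
  # rerun /sbin/lilo afterwards and boot the "mem1g" entry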

cheers.

-- 
Nathan


* Re: Filesystem performance on 2.4.28-pre3 on hardware RAID5.
  2004-10-28 22:43 Martin MOKREJŠ
@ 2004-10-29  7:31 ` Nathan Scott
  0 siblings, 0 replies; 5+ messages in thread
From: Nathan Scott @ 2004-10-29  7:31 UTC (permalink / raw)
  To: Martin MOKREJŠ; +Cc: linux-xfs, linux-kernel

Hi there,

On Fri, Oct 29, 2004 at 12:43:30AM +0200, Martin MOKREJ? wrote:
> "mount -t xfs -o async" unexpectedly kills random seek performance,
> but is still a bit better than with "-o sync". ;) Maybe it has to do
> with the dramatic jump in CPU consumption of this operation,
> as it in both cases it takes about 21-26% instead of usual 3%.
> Why? Isn't actually async default mode?

That's odd.  Actually, I'm not sure what the "async" option is meant
to do; it isn't seen by the fs AFAICT (XFS isn't looking for it)...
we also use the generic_file_llseek code in XFS, so we're not
doing anything special there either -- some profiling data showing
where that CPU time is spent would be insightful.

> Sequential create /Create
> Random create /Create
> XFS             60-120 ms

You may get better results using a version 2 log (mkfs option)
with large in-core log buffers (mount option) for these (which
mkfs version are you using atm?)
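
Roughly like this (device and mount point are just examples; logbsize
above 32k needs the version 2 log):

  # version 2 log at mkfs time...
  mkfs.xfs -f -l version=2 /dev/sdb1
  # ...then more / larger in-core log buffers at mount time
  mount -t xfs -o logbufs=8,logbsize=262144 /dev/sdb1 /scratch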

cheers.

-- 
Nathan


* Filesystem performance on 2.4.28-pre3 on hardware RAID5.
@ 2004-10-28 22:43 Martin MOKREJŠ
  2004-10-29  7:31 ` Nathan Scott
  0 siblings, 1 reply; 5+ messages in thread
From: Martin MOKREJŠ @ 2004-10-28 22:43 UTC (permalink / raw)
  To: linux-kernel

Hi,
  I have finished evaluating my filesystem tests. For your
interest, I have put the results at http://www.natur.cuni.cz/~mmokrejs/linux-performance.
I have some questions for the developers:

1.
ext3 has lost a lot of performance compared to ext2 in the "Random create /Delete"
test, whatever that does (see the bonnie++ 1.93c docs), and also in the "Sequential output /Char"
test, in the peak performance of the "Sequential output /Block" test, in the
"Sequential create /Create" test, and in "Random seek" performance.

2.
"mount -t xfs -o async" unexpectedly kills random seek performance,
but is still a bit better than with "-o sync". ;) Maybe it has to do
with the dramatic jump in CPU consumption of this operation,
as it in both cases it takes about 21-26% instead of usual 3%.
Why? Isn't actually async default mode?

3.
"mke2fs -O dir_index" decreased Sequential output /Block perf
by 25%. It seems the only positive effect of this option
is in Random create /Delete test and "Random seek" test.

4.
At least on this RAID5, "mke2fs -T largefile" and "-T largefile4"
should be avoided, as they kill "Sequential create /Create" performance.

5. Based on your experience, would you prefer better random I/O
or sequential I/O? The server should house many huge files of around 800 MB or more,
and in general there should be more reads than writes on the partition.
As the RAID5 controller splits data into chunks across the disks, sequential reads or writes
get broken up anyway (I cannot tell whether they are also randomly scattered over each disk platter).

Any feedback? Please Cc: me. Thanks.
Martin


For the archive, here are some minimal results from the tests.
There is a lot more on that page, including raw data, further comments,
and some comparisons; briefly (more comments on the web):

----------------------------------------------------
Sequential output /Char
ReiserFS3       255 K/sec
XFS             425 K/sec
ext3            122 K/sec
ext2            540 K/sec (560?)
----------------------------------------------------
Sequential output /Block
ReiserFS3       53400 K/sec
XFS             56500 K/sec
ext3            48000-51500 K/sec
ext2            36000-53000 K/sec
----------------------------------------------------
Sequential create /Create
ReiserFS3       18-23 ms
ReiserFS3       55 us (when "mkreiserfs -t 128 or 256 or 512 or 768" is used)
XFS             60-120 ms
ext3            2-3 ms 
ext2            25-30 us (with exception of "mke2fs -T largefile or largefile4")
----------------------------------------------------
Random seeks consume very different amounts of CPU and achieve
different numbers of seeks per second:
ReiserFS3       520 seeks/sec and consumes 60% (with 2 weird exceptions below)
ReiserFS3       1290 seeks/sec and consumes 3%
                (mkreiserfs -s 1024 or 2048 or 16384)
                (mkreiserfs -t 768)
XFS             804 seeks/sec and consumes 3%
XFS             540-660 seeks/sec and consumes 21-26%
                (worse values for mount -o sync,
                better values for -o async, but still
                worse than if the switch is omitted).
ext3            770 seeks/sec and consumes 30%
ext2            790-800 seeks/sec and consumes 30%
ext2            815  seeks/sec and consumes 30%
                (mke2fs -O dir_index)
----------------------------------------------------
Random create /Create
ReiserFS3       50-55 us
XFS             3000 us
ext3            1400-7000 us
ext2            24-30 us
----------------------------------------------------
Random create /Read
ReiserFS3       mostly 8-10 us (-o notail doubles the time,
                increases Sequential create /Create time by 30%,
                and decreases Random seeks per second by 60%)
XFS             9-13 or 19 us
ext3            5 us
ext2            10 us
----------------------------------------------------
Random create /Delete:
ReiserFS3       80-90 us
XFS             3000-3500 us
ext3            43-66 us
ext2            23-38 us


How did I test?
I used "bonnie++ -n 1 -s 12G -d /scratch -u apache -q"
on an external 1 TB RAID5 logical drive. Data should be split
by the RAID controller into 128 kB chunks. The server has 6 GB RAM,
an SMP- and HIGHMEM-enabled 2.4.28-pre3 kernel, 12 GB of swap
on the same RAID5, and 1 GB of ECC RAM on the RAID controller. However,
only 500 MB of that cache are used (it is a dual-controller setup, so each controller
uses just half; the other half mirrors the other controller's cache).

