linux-kernel.vger.kernel.org archive mirror
* 2.6.4 ext3fs half order of magnitude slower than xfs - bulk write
@ 2004-03-15 21:47 Matthias Andree
  2004-03-15 23:23 ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Matthias Andree @ 2004-03-15 21:47 UTC (permalink / raw)
  To: Linux-Kernel mailing list

Hi,

I have an application that writes bulk data files in a short time. The
file sizes are 7, 13, 75 and 95 MB (190 MB total); each file is fsynced
after writing, and the test partitions are otherwise idle.  Think gunzip
if you wish, but it isn't gunzip.

The destination partition is on a Maxtor 4K060H3 ATA drive (5400/min, 12
ms seek, UDMA/100 enabled), with the write cache disabled for one test
set and enabled for the other. The ext3 partition (hda5) lies closer to
the outer edge of the disk than the xfs partition (hda7).
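For anyone reproducing the setup: the drive's write cache can be toggled with hdparm's -W flag. A sketch using this machine's device name (run as root; adjust the device to your own system):

```shell
# Toggle the ATA drive's write cache (device name from this setup)
hdparm -W0 /dev/hda   # write cache off
hdparm -W1 /dev/hda   # write cache on
```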

Both tests use the default scheduler settings (anticipatory I/O
scheduler). The ATA controller is a VIA 8237; the machine has 256 MB
RAM, of which around 130 MB are used by X11, an idle squid, and some
idle Perl processes.

For comparison purposes, I have also checked a reiserfs partition on a
7200/min SCSI drive with write cache and tagged command queueing on.

ext3fs runs in the default data=ordered mode for one test and
data=writeback for another. xfs runs in default mode without special
realtime tricks or the like. XFS is at least a factor of three faster
than even ext3 -o data=writeback.
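For clarity, the two ext3 journaling modes compared here are selected at mount time. A sketch with this machine's partition (the mountpoint is illustrative):

```shell
# data=ordered (the default): file data is flushed to disk before the
# corresponding metadata is committed to the journal
mount -t ext3 -o data=ordered /dev/hda5 /mnt/test

# data=writeback: only metadata is journaled; data ordering is not guaranteed
mount -t ext3 -o data=writeback /dev/hda5 /mnt/test
```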

                          real usr system time in seconds
                             |   |   |  throughput MB/s approx.
ATA ext3 ordered WC on      52 6.3 1.2  3.6
ATA ext3 writeback WC on    46 6.4 1.3  4.1
ATA xfs WC on               15 6.3  .9 12.7

ATA ext3 ordered WC off    172 6.3 1.2  1.1
ATA ext3 writeback WC off  128 6.2 1.3  1.5
ATA xfs WC off              32 6.3  .9  5.9
-----------------------------------------------------------------------------
SCSI reiserfs TCQ on WC on  38 6.8 1.2  5.0

READ performance with hdparm -tT (above is write performance):
 Timing buffer-cache reads:  1180 MB in  2.01 seconds = 588.03 MB/sec
 Timing buffered disk reads:   98 MB in  3.03 seconds =  32.38 MB/sec

I'd think we can't reach such a value for writes to a real file system.

Watching "vmstat 1" reveals that ext3fs hangs at >= 98% I/O wait for a
long time, something the other filesystems don't do; the block-out rate
is considerably lower for ext3fs as well. XFS does not hesitate to push
out 26,000 blocks in a single second, while ext3fs hardly exceeds 700.

What makes ext3fs so much slower than xfs? (Note that even XFS with
write cache on doesn't get close to even half of what one would expect of
such a system.)

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


* Re: 2.6.4 ext3fs half order of magnitude slower than xfs - bulk write
  2004-03-15 21:47 2.6.4 ext3fs half order of magnitude slower than xfs - bulk write Matthias Andree
@ 2004-03-15 23:23 ` Andrew Morton
  2004-03-16  1:54   ` Matthias Andree
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2004-03-15 23:23 UTC (permalink / raw)
  To: Matthias Andree; +Cc: linux-kernel

Matthias Andree <matthias.andree@gmx.de> wrote:
>
> ext3fs runs in the default data=ordered mode for one test and
> data=writeback for another. xfs runs in default mode without special
> realtime tricks or such. XFS is at least by a factor three faster than
> even ext3 -o data=writeback.

It should be possible to generate a simple testcase which demonstrates this
problem on that machine.  Is that something you can do?

From your description, write-and-fsync.c from

	http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz

would be a good starting point.
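A rough shell equivalent of what such a testcase does (write a large file sequentially, then force it to disk before exiting) can be put together with GNU dd's conv=fsync flag. This is a sketch, not the write-and-fsync.c tool itself; the file name and 256 MB size are arbitrary:

```shell
# conv=fsync makes dd call fsync() on the output file before it exits,
# so the elapsed time includes flushing the data to the platter
time dd if=/dev/zero of=foo bs=1M count=256 conv=fsync
```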


* Re: 2.6.4 ext3fs half order of magnitude slower than xfs - bulk write
  2004-03-15 23:23 ` Andrew Morton
@ 2004-03-16  1:54   ` Matthias Andree
  2004-03-18  5:43     ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Matthias Andree @ 2004-03-16  1:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm

Andrew Morton:

> It should be possible to generate a simple testcase which demonstrates this
> problem on that machine.  Is that something you can do?
> 
> From your description, write-and-fsync.c from
> 
> 	http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz
> 
> would be a good starting point.

I've run "write-and-fsync -m SIZE -f SOMEFILE", where SIZE is given in
each section and SOMEFILE was chosen on a real-life but otherwise idle
file system. I hope this is what you meant. I did one run per test.

Benchmarks from three different machines are below (test sets 1 and 2
are from the same machine). Times are as reported by GNU time; user time
is always below timer resolution.

I am aware that bulk write performance is just one of the possible load
profiles and there are many others involving seeks.

I am also aware that most of the Linux kernel developers run their ATA
data tombs with write caches turned on. Given that there is no proper
documentation on write barrier/ordered tag behaviour of file systems and
block device drivers, I feel uncomfortable with that for production
systems. (No UPS, we don't have difficulties with power outages... yet.)

The 2.6.4 kernel was taken from BK rather than the tarball, but shortly
after the announcement, with the TAG being the most recent change. Test
sets #3 and #4 just give some additional figures from Linux 2.4; they
may be irrelevant to the case, but I was curious and thought I'd share
the numbers.

========================================================================
Test set #1: file size 512 MB
AMD XP 2500+ Kernel 2.6.4 Maxtor 4K060H3 (5400/min ATA UDMA/100 2 MB cache)
VIA 8237   256 MB RAM  hdparm -tT 586 MB/s buffer 32.24 MB/s disk
               sys/s real/s   throughput/(MB/s)
WC=1 -------------------
XFS              1.8  24.18   21.2
ext3 writeback   2.6  74.21    6.9
ext3 ordered     2.4  74.70    6.9
ext3 journal     2.6  58.65    8.7
WC=0 -------------------
XFS              1.8  67.11    7.6
ext3 writeback   2.4 115.21    4.4
ext3 ordered     2.3 114.55    4.5
ext3 journal     2.8 323.30    1.6
               sys/s real/s   throughput/(MB/s)

========================================================================
Test set #2: file size 512 MB for ext3, 400 MB for reiserfs
(same motherboard as in test #1)
AMD XP 2500+ Kernel 2.6.4 Fujitsu MAH3182MP (7200/min Ultra 160 SCSI 4 MB cache)
SYM53C875   256 MB RAM  hdparm -tT 580 MB/s buffer 29.18 MB/s disk
The drive likely saturates the SCSI host bus.

/sys/block/sda/queue/iosched/antic_expire was set to 0, as the
anticipatory scheduler documentation recommends for database workloads
and for drives with tagged command queueing (which mine has).

               sys/s real/s   throughput/(MB/s)
ext3 ordered     1.4  55.27    9.3   512 MB file
reiserfs 400 MB  1.4  52.74    7.6   400 MB file
reiserfs NO TCQ  1.4  74.64    5.4   tagged command queueing off, 400 MB
========================================================================
Test set #3: file size 1024 MB
Xeon 2.8 GHz  Kernel 2.4.20 (SuSE 9.0) RAID5 of 3 pcs Fujitsu MAP3367NC
512 MB RAM                       (10,025/min Ultra 320 SCSI 8 MB Cache)
LSI MegaRAID 320-1 w/ 64 MB Cache, write-through cache
This is server-class hardware with PCI-X hotplug, ServerWorks chips,
monitoring, remote management and such.
hdparm -tT 1331 MB/s buffer 135 MB/s "disk"

ext3 ordered         116.37s elapsed  8.8 MB/s (sys unknown)

This looks unbelievably low, but may be attributable to RAID5 and
write-through caching - the system is tuned for reliability.

========================================================================
Test set #4: file size 1024 MB
AMD XP 1700+ Kernel 2.4.21 (SuSE 8.2)
SCSI: Fujitsu MAP3367NP Adaptec 2940UW (10,025/min Ultra320 8 MB Cache)
ATA:  IBM IC35L060AVV207-0 VIA 8233    ( 7,200/min UDMA100  2 MB Cache)
write caches off on all drives
hdparm -tT ATA: buffer 320/disk 47.76, SCSI: buffer 328/disk 28.57
The SCSI drive definitely saturates the SCSI bus.

times in s     sys  real  throughput MB/s
SCSI reiserfs 6.7  43.44  23.6
ATA ext3fs    6.7 162.53   6.3
ATA jfs       3.6 171.15   6.0
========================================================================

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


* Re: 2.6.4 ext3fs half order of magnitude slower than xfs - bulk write
  2004-03-16  1:54   ` Matthias Andree
@ 2004-03-18  5:43     ` Andrew Morton
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2004-03-18  5:43 UTC (permalink / raw)
  To: Matthias Andree; +Cc: linux-kernel

Matthias Andree <matthias.andree@gmx.de> wrote:
>
> Andrew Morton:
> 
> > It should be possible to generate a simple testcase which demonstrates this
> > problem on that machine.  Is that something you can do?
> > 
> > From your description, write-and-fsync.c from
> > 
> > 	http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz
> > 
> > would be a good starting point.
> 
> I've run "write-and-fsync -m SIZE -f SOMEFILE" where size is given in
> each section and SOMEFILE was chosen for some real-life but idle file
> system. I hope this is what you meant. I did one run per test.

Cannot reproduce, sorry.  Kernel is 2.6.5-rc1, disk is a "MAXTOR 6L080J4"


time write-and-fsync -m 256 -f foo

writeback caching off, ext3/ordered:
  write-and-fsync -m 256 -f foo  0.00s user 1.19s system 4% cpu 24.247 total

writeback caching off, XFS:
  write-and-fsync -m 256 -f foo  0.00s user 0.57s system 2% cpu 24.041 total

writeback caching on, ext3/ordered:
  write-and-fsync -m 256 -f foo  0.00s user 1.16s system 14% cpu 8.169 total

writeback caching on, XFS:
  write-and-fsync -m 256 -f foo  0.00s user 0.58s system 8% cpu 6.950 total
  write-and-fsync -m 256 -f foo  0.00s user 0.54s system 6% cpu 8.109 total
  write-and-fsync -m 256 -f foo  0.00s user 0.55s system 5% cpu 10.057 total
  write-and-fsync -m 256 -f foo  0.00s user 0.56s system 8% cpu 6.870 total

(quite some variability in XFS)


So...   Maybe you could test some other disks or something?

