linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@sun.com>
To: Tomasz Chmielewski <mangoo@wpkg.org>
Cc: Theodore Tso <tytso@mit.edu>, Andi Kleen <andi@firstfloor.org>,
	LKML <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-raid@vger.kernel.org
Subject: Re: very poor ext3 write performance on big filesystems?
Date: Wed, 27 Feb 2008 12:03:35 -0800	[thread overview]
Message-ID: <20080227200335.GB9331@webber.adilger.int> (raw)
In-Reply-To: <47C54773.4040402@wpkg.org>

I'm CCing the linux-raid mailing list, since I suspect they will be
interested in this result.

I would suspect that the "journal guided RAID recovery" mechanism
developed by U.Wisconsin may significantly benefit this workload
because the filesystem journal is already recording all of these
block numbers and the MD bitmap mechanism is pure overhead.

On Feb 27, 2008  12:20 +0100, Tomasz Chmielewski wrote:
> Theodore Tso schrieb:
>> On Mon, Feb 18, 2008 at 03:03:44PM +0100, Andi Kleen wrote:
>>> Tomasz Chmielewski <mangoo@wpkg.org> writes:
>>>> Is it normal to expect the write speed go down to only few dozens of
>>>> kilobytes/s? Is it because of that many seeks? Can it be somehow
>>>> optimized? 
>>>
>>> I have similar problems on my linux source partition which also
>>> has a lot of hard linked files (although probably not quite
>>> as many as you do). It seems like hard linking prevents
>>> some of the heuristics ext* uses to generate non fragmented
>>> disk layouts and the resulting seeking makes things slow.
>
> A follow-up to this thread.
> Using small optimizations like playing with /proc/sys/vm/* didn't help 
> much, increasing "commit=" ext3 mount option helped only a tiny bit.
>
> What *did* help a lot was... disabling the internal bitmap of the RAID-5 
> array. "rm -rf" doesn't "pause" for several seconds any more.
>
> If md and dm supported barriers, it would be even better I guess (I could 
> enable write cache with some degree of confidence).
>
> This is "iostat sda -d 10" output without the internal bitmap.
> The system mostly tries to read (Blk_read/s), and once in a while it
> does a big commit (Blk_wrtn/s):
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             164,67      2088,62         0,00      20928          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             180,12      1999,60         0,00      20016          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             172,63      2587,01         0,00      25896          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             156,53      2054,64         0,00      20608          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             170,20      3013,60         0,00      30136          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             119,46      1377,25      5264,67      13800      52752
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             154,05      1897,10         0,00      18952          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             197,70      2177,02         0,00      21792          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             166,47      1805,19         0,00      18088          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             150,95      1552,05         0,00      15536          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             158,44      1792,61         0,00      17944          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             132,47      1399,40      3781,82      14008      37856
>
>
>
> With the bitmap enabled, it sometimes behave similarly, but mostly, I
> can see as reads compete with writes, and both have very low numbers then:
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             112,57       946,11      5837,13       9480      58488
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             157,24      1858,94         0,00      18608          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             116,90      1173,60        44,00      11736        440
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda              24,05        85,43       172,46        856       1728
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda              25,60        90,40       165,60        904       1656
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda              25,05       276,25       180,44       2768       1808
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda              22,70        65,60       229,60        656       2296
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda              21,66       202,79       786,43       2032       7880
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda              20,90        83,20      1800,00        832      18000
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda              51,75       237,36       479,52       2376       4800
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda              35,43       129,34       245,91       1296       2464
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda              34,50        88,00       270,40        880       2704
>
>
> Now, let's disable the bitmap in the RAID-5 array:
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             110,59       536,26       973,43       5368       9744
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             119,68       533,07      1574,43       5336      15760
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             123,78       368,43      2335,26       3688      23376
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             122,48       315,68      1990,01       3160      19920
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             117,08       580,22      1009,39       5808      10104
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             119,50       324,00      1080,80       3240      10808
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             118,36       353,69      1926,55       3544      19304
>
>
> And let's enable it again - after a while, it degrades again, and I can
> see "rm -rf" stops for longer periods:
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             162,70      2213,60         0,00      22136          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             165,73      1639,16         0,00      16408          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             119,76      1192,81      3722,16      11952      37296
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             178,70      1855,20         0,00      18552          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             162,64      1528,07         0,80      15296          8
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             182,87      2082,07         0,00      20904          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             168,93      1692,71         0,00      16944          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             177,45      1572,06         0,00      15752          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             123,10      1436,00      4941,60      14360      49416
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             201,30      1984,03         0,00      19880          0
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             165,50      1555,20        22,40      15552        224
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda              25,35       273,05       189,22       2736       1896
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda              22,58        63,94       165,43        640       1656
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda              69,40       435,20       262,40       4352       2624
>
>
>
> There is a related thread (although not much kernel-related) on a BackupPC 
> mailing list:
>
> http://thread.gmane.org/gmane.comp.sysutils.backup.backuppc.general/14009
>
> As it's BackupPC software which makes this amount of hardlinks (but hey, I 
> can keep ~14 TB of data on a 1.2 TB filesystem which is not even 65% full).
>
>
> -- 
> Tomasz Chmielewski
> http://wpkg.org
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


  reply	other threads:[~2008-02-27 20:04 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-18 12:57 very poor ext3 write performance on big filesystems? Tomasz Chmielewski
2008-02-18 14:03 ` Andi Kleen
2008-02-18 14:16   ` Theodore Tso
2008-02-18 15:02     ` Tomasz Chmielewski
2008-02-18 15:16       ` Theodore Tso
2008-02-18 15:57         ` Andi Kleen
2008-02-18 15:35           ` Theodore Tso
2008-02-20 10:57             ` Jan Engelhardt
2008-02-20 17:44               ` David Rees
2008-02-20 18:08                 ` Jan Engelhardt
2008-02-18 16:16         ` Tomasz Chmielewski
2008-02-18 18:45           ` Theodore Tso
2008-02-18 15:18     ` Andi Kleen
2008-02-18 15:03       ` Theodore Tso
2008-02-19 14:54     ` Tomasz Chmielewski
2008-02-19 15:06       ` Chris Mason
2008-02-19 15:21         ` Tomasz Chmielewski
2008-02-19 16:04           ` Chris Mason
2008-02-19 18:29     ` Mark Lord
2008-02-19 18:41       ` Mark Lord
2008-02-19 18:58       ` Paulo Marques
2008-02-19 22:33         ` Mark Lord
2008-02-27 11:20     ` Tomasz Chmielewski
2008-02-27 20:03       ` Andreas Dilger [this message]
2008-02-27 20:25         ` Tomasz Chmielewski
2008-03-01 20:04         ` Bill Davidsen
2008-02-19  9:24 ` Vladislav Bolkhovitin
     [not found] <9YdLC-75W-51@gated-at.bofh.it>
     [not found] ` <9YeRh-Gq-39@gated-at.bofh.it>
     [not found]   ` <9Yf0W-SX-19@gated-at.bofh.it>
     [not found]     ` <9YfNi-2da-23@gated-at.bofh.it>
     [not found]       ` <9YfWL-2pZ-1@gated-at.bofh.it>
     [not found]         ` <9Yg6H-2DJ-23@gated-at.bofh.it>
2008-02-19 13:14           ` Paul Slootman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080227200335.GB9331@webber.adilger.int \
    --to=adilger@sun.com \
    --cc=andi@firstfloor.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=mangoo@wpkg.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).