From: Stan Hoeppner <stan@hardwarefreak.com>
To: Stefan Ring <stefanrin@gmail.com>
Cc: Linux fs XFS <xfs@oss.sgi.com>
Subject: Re: XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?)
Date: Mon, 09 Apr 2012 18:38:04 -0500
Message-ID: <4F8372DC.7030405@hardwarefreak.com>
In-Reply-To: <CAAxjCEz8TpRvjvbuYPp1xf9X2HwskN5AuPak62R5Jhkg+mmFHA@mail.gmail.com>

On 4/9/2012 6:02 AM, Stefan Ring wrote:
>> Not at all.  You can achieve this performance with the 6 300GB spindles
>> you currently have, as Christoph and I both mentioned.  You simply lose
>> one spindle of capacity, 300GB, vs your current RAID6 setup.  Make 3
>> RAID1 pairs in the p400 and concatenate them.  If the p400 can't do this,
>> concat the mirror pair devices with md --linear.  Format the resulting
>> Linux block device with the following and mount with inode64.
>>
>> $ mkfs.xfs -d agcount=3 /dev/[device]
>>
>> That will give you 1 AG per spindle, 3 horizontal AGs total instead of 4
>> vertical AGs as you get with default striping setup.  This is optimal
>> for your high IOPS workload as it eliminates all 'extraneous' seeks
>> yielding a per disk access pattern nearly identical to EXT4.  And it
>> will almost certainly outrun EXT4 on your RAID6 due mostly to the
>> eliminated seeks, but also to elimination of parity calculations.
>> You've wiped the array a few times in your testing already, right?  So
>> one or two more test setups should be no sweat.  Give it a go.  The
>> results will be pleasantly surprising.
> 
> Well I had to move around quite a bit of data, but for the sake of
> completeness, I had to give it a try.
> 
> With a nice and tidy fresh XFS file system, performance is indeed
> impressive – about 16 sec for the same task that would take 2 min 25
> before. So that’s about 150 MB/sec, which is not great, but for many
> tiny files it would perhaps be a bit unreasonable to expect more. A

150 MB/s isn't what I'd expect; it should be closer to 450 MB/s.  This
makes it appear that you're writing all these files to a single directory.
If you're writing them fairly evenly across 3 directories, or a multiple
of 3, you should see close to 450 MB/s if using mdraid linear over the 3
P400 RAID1 pairs.  If that is what you're doing, then something is wrong
somewhere.  Try unpacking a kernel tarball: lots of subdirectories to
exercise all 3 AGs and thus all 3 spindles.
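
To make this concrete, here is a rough sketch of the layout and the
tarball test (the device names, mount point, and tarball path are
placeholders, not taken from your setup):

$ mdadm --create /dev/md0 --level=linear --raid-devices=3 \
      /dev/sdb /dev/sdc /dev/sdd        # the 3 P400 RAID1 pair devices
$ mkfs.xfs -d agcount=3 /dev/md0        # 1 AG per effective spindle
$ mount -o inode64 /dev/md0 /mnt/test
$ time tar xf /path/to/linux-3.x.tar -C /mnt/test   # many subdirectories

With inode64, new directories get spread across the AGs, so the kernel
tree's subdirectories should exercise all 3 RAID1 pairs at once.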

> simple copy of the tar onto the XFS file system yields the same linear
> performance, the same as with ext4, btw. So 150 MB/sec seems to be the
> best these disks can do, meaning that theoretically, with 3 AGs, it
> should be able to reach 450 MB/sec under optimal conditions.

The optimal condition, again, requires writing 3 copies of this file to 3
different directories to hit ~450 MB/s, which you should get close to if
using mdraid linear over the RAID1 pairs.  XFS is a filesystem after all,
so its parallelism must come from how you spread your usage across the
filesystem structures.  I thought I explained all of this previously when
I introduced the "XFS concat" into this thread.

> I will still do a test with the free space fragmentation priming on
> the concatenated AG=3 volume, because it seems to be rather slow as
> well.

> But then I guess I’m back to ext4 land. XFS just doesn’t offer enough
> benefits in this case to justify the hassle.

If you were writing to only one directory, I can understand this
sentiment.  Again, if you were writing to 3 directories fairly evenly,
with the md concat, then your sentiment here should be quite different.

-- 
Stan

