From: Stan Hoeppner <stan@hardwarefreak.com>
To: Stefan Ring <stefanrin@gmail.com>
Cc: Linux fs XFS <xfs@oss.sgi.com>
Subject: Re: XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?)
Date: Mon, 09 Apr 2012 18:38:04 -0500
Message-ID: <4F8372DC.7030405@hardwarefreak.com>
In-Reply-To: <CAAxjCEz8TpRvjvbuYPp1xf9X2HwskN5AuPak62R5Jhkg+mmFHA@mail.gmail.com>

On 4/9/2012 6:02 AM, Stefan Ring wrote:
>> Not at all.  You can achieve this performance with the 6 300GB spindles
>> you currently have, as Christoph and I both mentioned.  You simply lose
>> one spindle of capacity, 300GB, vs your current RAID6 setup.  Make 3
>> RAID1 pairs in the p400 and concatenate them.  If the p400 can't do this,
>> concat the mirror pair devices with md --linear.  Format the resulting
>> Linux block device with the following and mount with inode64.
>>
>> $ mkfs.xfs -d agcount=3 /dev/[device]
>>
>> That will give you 1 AG per spindle, 3 horizontal AGs total instead of 4
>> vertical AGs as you get with default striping setup.  This is optimal
>> for your high IOPS workload as it eliminates all 'extraneous' seeks
>> yielding a per disk access pattern nearly identical to EXT4.  And it
>> will almost certainly outrun EXT4 on your RAID6 due mostly to the
>> eliminated seeks, but also to elimination of parity calculations.
>> You've wiped the array a few times in your testing already, right?  So
>> one or two more test setups should be no sweat.  Give it a go.  The
>> results will be pleasantly surprising.
> 
> Well I had to move around quite a bit of data, but for the sake of
> completeness, I had to give it a try.
> 
> With a nice and tidy fresh XFS file system, performance is indeed
> impressive – about 16 sec for the same task that would take 2 min 25
> before. So that’s about 150 MB/sec, which is not great, but for many
> tiny files it would perhaps be a bit unreasonable to expect more. A

150 MB/s isn't what I'd expect; it should be closer to 450 MB/s.  This
makes it appear that you're writing all these files to a single directory.
If you're writing them fairly evenly across 3 directories, or a multiple
of 3, you should see close to 450 MB/s if using mdraid linear over the 3
P400 RAID1 pairs.  If that is what you're doing, then something is wrong
somewhere.  Try unpacking a kernel tarball: lots of subdirectories to
exercise all 3 AGs and thus all 3 spindles.
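
To make this concrete, here is a rough sketch of the layout and the
tarball test (the device names, mount point, and tarball path are
placeholders, not taken from your setup):

$ mdadm --create /dev/md0 --level=linear --raid-devices=3 \
      /dev/sdb /dev/sdc /dev/sdd        # the 3 P400 RAID1 pair devices
$ mkfs.xfs -d agcount=3 /dev/md0        # 1 AG per effective spindle
$ mount -o inode64 /dev/md0 /mnt/test
$ time tar xf /path/to/linux-3.x.tar -C /mnt/test   # many subdirectories

With inode64, new directories get spread across the AGs, so the kernel
tree's subdirectories should exercise all 3 RAID1 pairs at once.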

> simple copy of the tar onto the XFS file system yields the same linear
> performance, the same as with ext4, btw. So 150 MB/sec seems to be the
> best these disks can do, meaning that theoretically, with 3 AGs, it
> should be able to reach 450 MB/sec under optimal conditions.

The optimal condition, again, requires writing 3 copies of this file to 3
different directories to hit ~450 MB/s, which you should get close to if
using mdraid linear over the RAID1 pairs.  XFS is a filesystem after all,
so its parallelism must come from how you spread your usage across the
filesystem structures.  I thought I explained all of this previously when
I introduced the "XFS concat" into this thread.

> I will still do a test with the free space fragmentation priming on
> the concatenated AG=3 volume, because it seems to be rather slow as
> well.

> But then I guess I’m back to ext4 land. XFS just doesn’t offer enough
> benefits in this case to justify the hassle.

If you were writing to only one directory, I can understand this
sentiment.  Again, if you were writing to 3 directories fairly evenly,
with the md concat, then your sentiment here should be quite different.

-- 
Stan

