From: pg_xf2@xf2.for.to.sabi.co.UK (Peter Grandi)
To: Linux fs XFS <xfs@oss.sgi.com>
Subject: Re: XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?)
Date: Fri, 6 Apr 2012 00:07:23 +0100
Message-ID: <20350.9643.379841.771496@tree.ty.sabi.co.UK>
In-Reply-To: <CAAxjCEwBMbd0x7WQmFELM8JyFu6Kv_b+KDe3XFqJE6shfSAfyQ@mail.gmail.com>

[ ... ]

> [ ... ] tarball of a finished IcedTea6 build, about 2.5 GB in
> size. It contains roughly 200,000 files in 20,000 directories.
> [ ... ] given that the entire write set -- all 2.5 GB of it --
> is "known" to the file system, that is, in memory, wouldn't it
> be possible to write it out to disk in a somewhat more
> reasonable fashion?  [ ... ] The disk hardware used was a
> SmartArray p400 controller with 6x 10k rpm 300GB SAS disks in
> RAID 6. The server has plenty of RAM (64 GB).

On reflection this triggers an aside for me: traditional
filesystem types are designed for the case where the ratio is
the opposite, something like a 64GB data collection to process
and 2.5GB of RAM, and where the issue is therefore minimizing
ongoing disk accesses, not the bulk upload of a large, sparse
set of data from memory to disk.

The Sprite Log-structured File System was a design targeted at
large-memory systems, on the assumption that writes then become
the issue (especially as Sprite was network-based) and that
reads would mostly be satisfied from RAM, as in your (euphemism)
insipid test.

I suspect that if the fundamental tradeoffs are inverted, then a
completely different design like an LFS might be appropriate.

But the above has a bearing on your (euphemism) unwise concerns:
the case where 200,000 files totalling 2.5GB are written
entirely to RAM and then flushed as a whole to disk is not only
"untraditional", it is also (euphemism) peculiar: try setting
the flusher to run rather often, so that no more than 100-300MB
of dirty pages are outstanding at any one time.
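
As a sketch (assuming a reasonably recent kernel that has the
'vm.dirty_*_bytes' sysctls; the exact numbers are only a
starting point, not a recommendation):

  # start background writeback once ~100MB of pages are dirty, and
  # block writers outright at ~300MB outstanding
  sysctl -w vm.dirty_background_bytes=$((100*1024*1024))
  sysctl -w vm.dirty_bytes=$((300*1024*1024))
  # wake the flusher threads more often (values are in centiseconds)
  sysctl -w vm.dirty_writeback_centisecs=200
  sysctl -w vm.dirty_expire_centisecs=500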

Which brings up another subject: usually hw RAID host adapters
have a cache, and firmware that cleverly rearranges writes.

Looking at the specs of the P400:

  http://h18004.www1.hp.com/products/servers/proliantstorage/arraycontrollers/smartarrayp400/

it seems to me that it has 256MB of cache as standard, and only
supports RAID6 with a battery-backed write cache (wise!).

Which means that your Linux-level seek graphs may not be so
useful, because the host adapter may be drastically rearranging
the seek patterns, and you may need to tweak the P400's own
elevator, rather than or in addition to the Linux elevator.
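
The P400 side has to be done with HP's own tools, and I won't
guess at those, but the Linux side can be inspected and changed
at runtime; a sketch, assuming the array shows up as 'sda'
(substitute your device):

  # see which elevator is in use and which are available
  cat /sys/block/sda/queue/scheduler
  # e.g. switch to 'deadline', or 'noop' to leave reordering to the P400
  echo deadline > /sys/block/sda/queue/scheduler
  # optionally let more requests queue up so the adapter sees a bigger window
  echo 512 > /sys/block/sda/queue/nr_requests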

Unless, possibly, barriers are enabled and the P400 writes
through on receiving a barrier request even with a BBWC. IIRC
XFS is rather stricter than 'ext4' in issuing barrier requests,
and you may be seeing more the effect of that than the effect of
splitting the access patterns between 4 AGs to improve the
potential for multithreading (which you negate anyway, because
you are using what is most likely a large RAID6 stripe size with
a small-IO-intensive write workload, as previously noted).
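
One way to tell the two effects apart is to compare a run with
barriers off, which is only tolerable here because the P400
cache is battery backed; a sketch, assuming the filesystem is
mounted at /mnt/test:

  # XFS may report barrier status at mount time; look for it in the kernel log
  dmesg | grep -i barrier
  # temporarily remount without barriers (only with a working BBWC!)
  mount -o remount,nobarrier /mnt/test
  # ... rerun the untar test, then restore the default
  mount -o remount,barrier /mnt/test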

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 64+ messages
2012-04-05 18:10 XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?) Stefan Ring
2012-04-05 19:56 ` Peter Grandi
2012-04-05 22:41   ` Peter Grandi
2012-04-06 14:36   ` Peter Grandi
2012-04-06 15:37     ` Stefan Ring
2012-04-07 13:33       ` Peter Grandi
2012-04-05 21:37 ` Christoph Hellwig
2012-04-06  1:09   ` Peter Grandi
2012-04-06  8:25   ` Stefan Ring
2012-04-07 18:57     ` Martin Steigerwald
2012-04-10 14:02       ` Stefan Ring
2012-04-10 14:32         ` Joe Landman
2012-04-10 15:56           ` Stefan Ring
2012-04-10 18:13         ` Martin Steigerwald
2012-04-10 20:44         ` Stan Hoeppner
2012-04-10 21:00           ` Stefan Ring
2012-04-05 22:32 ` Roger Willcocks
2012-04-06  7:11   ` Stefan Ring
2012-04-06  8:24     ` Stefan Ring
2012-04-05 23:07 ` Peter Grandi [this message]
2012-04-06  0:13   ` Peter Grandi
2012-04-06  7:27     ` Stefan Ring
2012-04-06 23:28       ` Stan Hoeppner
2012-04-07  7:27         ` Stefan Ring
2012-04-07  8:53           ` Emmanuel Florac
2012-04-07 14:57           ` Stan Hoeppner
2012-04-09 11:02             ` Stefan Ring
2012-04-09 12:48               ` Emmanuel Florac
2012-04-09 12:53                 ` Stefan Ring
2012-04-09 13:03                   ` Emmanuel Florac
2012-04-09 23:38               ` Stan Hoeppner
2012-04-10  6:11                 ` Stefan Ring
2012-04-10 20:29                   ` Stan Hoeppner
2012-04-10 20:43                     ` Stefan Ring
2012-04-10 21:29                       ` Stan Hoeppner
2012-04-09  0:19           ` Dave Chinner
2012-04-09 11:39             ` Emmanuel Florac
2012-04-09 21:47               ` Dave Chinner
2012-04-07  8:49         ` Emmanuel Florac
2012-04-08 20:33           ` Stan Hoeppner
2012-04-08 21:45             ` Emmanuel Florac
2012-04-09  5:27               ` Stan Hoeppner
2012-04-09 12:45                 ` Emmanuel Florac
2012-04-13 19:36                   ` Stefan Ring
2012-04-14  7:32                     ` Stan Hoeppner
2012-04-14 11:30                       ` Stefan Ring
2012-04-09 14:21         ` Geoffrey Wehrman
2012-04-10 19:30           ` Stan Hoeppner
2012-04-11 22:19             ` Geoffrey Wehrman
2012-04-07 16:50       ` Peter Grandi
2012-04-07 17:10         ` Joe Landman
2012-04-08 21:42           ` Stan Hoeppner
2012-04-09  5:13             ` Stan Hoeppner
2012-04-09 11:52               ` Stefan Ring
2012-04-10  7:34                 ` Stan Hoeppner
2012-04-10 13:59                   ` Stefan Ring
2012-04-09  9:23             ` Stefan Ring
2012-04-09 23:06               ` Stan Hoeppner
2012-04-06  0:53   ` Peter Grandi
2012-04-06  7:32     ` Stefan Ring
2012-04-06  5:53   ` Stefan Ring
2012-04-06 15:35     ` Peter Grandi
2012-04-10 14:05       ` Stefan Ring
2012-04-07 19:11     ` Peter Grandi
