From: Dave Chinner <david@fromorbit.com>
To: Martin Steigerwald <martin@lichtvoll.de>
Cc: "Richard W.M. Jones" <rjones@redhat.com>, linux-xfs@vger.kernel.org
Subject: Re: mkfs.xfs options suitable for creating absurdly large XFS filesystems?
Date: Wed, 5 Sep 2018 17:43:49 +1000
Message-ID: <20180905074349.GX5631@dastard>
In-Reply-To: <4003432.r53nhXgZDq@merkaba>

On Wed, Sep 05, 2018 at 09:09:28AM +0200, Martin Steigerwald wrote:
> Dave Chinner - 05.09.18, 00:23:
> > On Tue, Sep 04, 2018 at 05:36:43PM +0200, Martin Steigerwald wrote:
> > > Dave Chinner - 04.09.18, 02:49:
> > > > On Mon, Sep 03, 2018 at 11:49:19PM +0100, Richard W.M. Jones wrote:
> > > > > [This is silly and has no real purpose except to explore the
> > > > > limits.
> > > > > If that offends you, don't read the rest of this email.]
> > > > 
> > > > We do this quite frequently ourselves, even if it is just to
> > > > remind
> > > > ourselves how long it takes to wait for millions of IOs to be
> > > > done.
> > > 
> > > Just for the fun of it, during a Linux performance analysis &
> > > tuning course I held, I created a 1 EiB XFS filesystem on a
> > > sparse file on another XFS filesystem on an SSD of a ThinkPad
> > > T520. It took several hours to create, but then it was there and
> > > mountable. AFAIR the sparse file was a bit less than 20 GiB.
> > 
> > Yup, 20GB of single sector IOs takes a long time.
> 
> Yeah. It was interesting to see that neither the CPU nor the SSD was
> fully utilized during that time, though.

Right - it's not CPU bound because it's always waiting on a single
IO, and it's not IO bound because it's only issuing a single IO at a
time.
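
For the record, the serialized pattern boils down to something like
this - a standalone sketch of the behaviour, not the actual mkfs
code; the 512 byte writes and 4k stride are just illustrative:

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        char buf[512] __attribute__((aligned(512))) = { 0 };
        off_t off = 0;
        int fd;

        if (argc < 2 || (fd = open(argv[1], O_WRONLY | O_DIRECT)) < 0)
                return 1;
        for (long i = 0; i < (1L << 20); i++) {
                /* blocks until the write completes, so there is
                 * never more than one IO in flight */
                if (pwrite(fd, buf, 512, off) != 512)
                        return 1;
                off += 4096;
        }
        close(fd);
        return 0;
}

The loop is bound by per-IO latency, not by bandwidth or CPU, so
nothing ever looks busy.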

Speaking of which, I just hacked a delayed write buffer list
construct similar to the kernel code into mkfs/libxfs to batch
writeback, then added a hacky AIO ring on top to drive deep IO
queues. I'm seeing sustained request queue depths of ~100, and the
SSDs are about 80% busy at 100,000 write IOPS, but mkfs is only
consuming about 60% of a single CPU.
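
The AIO ring amounts to something like this - a simplified,
self-contained sketch using libaio (build with -laio), not the
actual mkfs/libxfs code; QDEPTH, BUFSZ and the zero-filled buffers
are placeholders:

#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define QDEPTH  128             /* target request queue depth */
#define BUFSZ   4096            /* one fs block per IO */
#define NWRITES (1L << 20)      /* total IOs to issue */

int main(int argc, char **argv)
{
        io_context_t ctx = 0;
        struct io_event events[QDEPTH];
        struct iocb cbs[QDEPTH], *cbp[QDEPTH];
        void *bufs[QDEPTH];
        int freeslot[QDEPTH], nfree = QDEPTH;
        long long off = 0;
        long issued = 0, inflight = 0;
        int fd, i;

        if (argc < 2 || (fd = open(argv[1], O_WRONLY | O_DIRECT)) < 0)
                return 1;
        if (io_setup(QDEPTH, &ctx))
                return 1;
        for (i = 0; i < QDEPTH; i++) {
                if (posix_memalign(&bufs[i], BUFSZ, BUFSZ))
                        return 1;
                memset(bufs[i], 0, BUFSZ);
                freeslot[i] = i;
        }

        while (issued < NWRITES || inflight) {
                /* refill: submit a batch to keep the queue deep */
                int n = 0;

                while (nfree && issued + n < NWRITES) {
                        int slot = freeslot[--nfree];

                        io_prep_pwrite(&cbs[slot], fd, bufs[slot],
                                        BUFSZ, off);
                        off += BUFSZ;
                        cbp[n++] = &cbs[slot];
                }
                if (n && io_submit(ctx, n, cbp) != n)
                        return 1;
                issued += n;
                inflight += n;

                /* reap completions and recycle their buffer slots */
                if (inflight) {
                        int got = io_getevents(ctx, 1, QDEPTH, events,
                                        NULL);
                        if (got < 0)
                                return 1;
                        for (i = 0; i < got; i++)
                                freeslot[nfree++] =
                                        (int)(events[i].obj - cbs);
                        inflight -= got;
                }
        }
        io_destroy(ctx);
        close(fd);
        return 0;
}

Same amount of IO in total, but the device always has ~QDEPTH
requests queued up to chew on, which is where the 100,000 IOPS
comes from.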

Which means that, instead of 7-8 hours to make an 8EB filesystem, we
can get it down to:

$ time sudo ~/packages/mkfs.xfs -K  -d size=8191p /dev/vdd
meta-data=/dev/vdd               isize=512    agcount=8387585, agsize=268435455 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=2251524935778304, imaxpct=1
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

real    15m18.090s
user    5m54.162s
sys     3m49.518s

Around 15 minutes on a couple of cheap consumer NVMe SSDs.

xfs_repair is going to need some help to scale up to this many AGs,
though - phase 1 is doing a huge amount of IO just to verify the
primary superblock...
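
For scale, some quick arithmetic cross-checking the agcount mkfs
reported above (a sketch; the round-up just reflects the final
partial AG):

#include <stdio.h>

int main(void)
{
        /* from the mkfs output above */
        unsigned long long blocks = 2251524935778304ULL; /* data blocks */
        unsigned long long agsize = 268435455ULL;        /* blocks per AG */

        /* the last AG is partial, so round up */
        unsigned long long agcount = (blocks + agsize - 1) / agsize;

        /* one superblock per AG: ~8.4 million secondary superblocks
         * that repair can cross-check the primary against */
        printf("agcount = %llu\n", agcount);             /* 8387585 */
        return 0;
}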

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
