From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2 00/11] xfs: rework extent allocation
Date: Fri, 7 Jun 2019 08:13:01 +1000
Message-ID: <20190606221301.GC14308@dread.disaster.area>
In-Reply-To: <20190606152101.GA2791@bfoster>

On Thu, Jun 06, 2019 at 11:21:04AM -0400, Brian Foster wrote:
> On Fri, May 31, 2019 at 01:11:36PM -0400, Brian Foster wrote:
> > On Sun, May 26, 2019 at 08:43:17AM +1000, Dave Chinner wrote:
> > > On Fri, May 24, 2019 at 08:00:18AM -0400, Brian Foster wrote:
> > > > On Fri, May 24, 2019 at 08:15:52AM +1000, Dave Chinner wrote:
> > > > > On Thu, May 23, 2019 at 08:55:35AM -0400, Brian Foster wrote:
> > > > > > Hmmm.. I suppose if I had a script that
> > > > > > just dumped every applicable stride/delta value for an inode, I could
> > > > > > dump all of those numbers into a file and we can process it from there..
> > > > > 
> > > > > See how the freesp commands work in xfs_db - they just generate a
> > > > > set of {offset, size} tuples that are then bucketted appropriately.
> > > > > This is probably the best way to do this at the moment - have xfs_db
> > > > > walk the inode BMBTs outputting something like {extent size,
> > > > > distance to next extent} tuples and everything else falls out from
> > > > > how we bucket that information.
> > > > > 
> > > > 
> > > > That sounds plausible. A bit more involved than what I'm currently
> > > > working with, but we do already have a blueprint for the scanning
> > > > implementation required to collect this data via the frag command.
> > > > Perhaps some of this code between the frag/freesp can be generalized and
> > > > reused. I'll take a closer look at it.
> > > > 
> > > > My only concern is I'd prefer to only go down this path as long as we
> > > > plan to land the associated command in xfs_db. So this approach suggests
> > > > to me that we add a "locality" command similar to frag/freesp that
> > > > presents the locality state of the fs. For now I'm only really concerned
> > > > with the data associated with known near mode allocations (i.e., such as
> > > > the extent size:distance relationship you've outlined above) so we can
> > > > evaluate these algorithmic changes, but this would be for fs devs only
> > > > so we could always expand on it down the road if we want to assess
> > > > different allocations. Hm?
> > > 
> > > Yup, I'm needing to do similar analysis myself to determine how
> > > quickly I'm aging the filesystem, so having the tool in xfs_db or
> > > xfs_spaceman would be very useful.
> > > 
> > > FWIW, the tool I've just started writing will just use fallocate and
> > > truncate to hammer the allocation code as hard and as quickly as
> > > possible - I want to do accelerated aging of the filesystem, and so
> > > being able to run tens to hundreds of thousands of free space
> > > manipulations a second is the goal here....
> > > 
> > 
> > Ok. FWIW, from playing with this so far (before getting distracted for
> > much of this week) the most straightforward place to add this kind of
> > functionality turns out to be the frag command itself. It does 99% of
> > the work required to process data extents already, including pulling the
> > on-disk records of each inode in-core for processing. I basically just
> > had to update that code to include all of the record data and add the
> > locality tracking logic (I haven't got to actually presenting it yet..).
> > 
> 
> I managed to collect some preliminary data based on this strategy. I
....
> 
> Comparison of the baseline and test data shows a generally similar
> breakdown between the two.

Which is the result I wanted to see :)

> Thoughts on any of this data or presentation?

I think it's useful for comparing whether an allocator change has
affected the overall locality of allocation. If it's working as we
expect, you should get vastly different results for inode32 vs
inode64 mount options, with inode32 showing much, much higher
distances for most allocations, so it might be worth running a quick
test to confirm that it does, indeed, demonstrate the results we'd
expect from such a change.
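
As a rough illustration of the kind of comparison meant here, the
following is a minimal sketch, not the xfs_db implementation. It assumes
the locality data has already been dumped as {extent size, distance to
next extent} tuples, and it buckets the distances by power of two in the
same spirit as the freesp histogram. The function names and the sample
values are hypothetical and exist only for this example.

```python
from collections import Counter


def bucket_floor(value):
    """Return the power-of-two lower bound of the bucket holding value.

    Zero (and anything negative) is kept in its own bucket, 0.
    """
    return 0 if value <= 0 else 1 << (value.bit_length() - 1)


def distance_histogram(tuples):
    """Count (extent_size, distance) tuples per power-of-two distance bucket."""
    hist = Counter()
    for _size, distance in tuples:
        hist[bucket_floor(distance)] += 1
    return hist


def percentages(hist):
    """Convert bucket counts to percentages so two runs can be compared."""
    total = sum(hist.values())
    if total == 0:
        return {}
    return {bucket: 100.0 * count / total
            for bucket, count in sorted(hist.items())}


if __name__ == "__main__":
    # Made-up sample data, in filesystem blocks: (extent size, distance).
    baseline = [(8, 0), (8, 3), (16, 2), (4, 100)]
    test = [(8, 0), (8, 1), (16, 2), (4, 4096)]
    for name, data in (("baseline", baseline), ("test", test)):
        print(name)
        for bucket, pct in percentages(distance_histogram(data)).items():
            print(f"  >= {bucket:>6}: {pct:5.1f}%")
```

Running the same bucketing over an inode32 dataset and an inode64
dataset and comparing the per-bucket percentages would be one way to
check whether the two distributions diverge as expected.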

> I could dig further into
> details or alternatively base the histogram on something like extent
> size and show the average delta for each extent size bucket, but I'm not
> sure that will tell us anything profound with respect to this patchset.

*nod*

> One thing I noticed while processing this data is that the current
> dataset skews heavily towards smaller allocations. I still think it's a
> useful comparison because smaller allocations are more likely to stress
> either algorithm via a larger locality search space, but I may try to
> repeat this test with a workload with fewer files and larger allocations
> and see how that changes things.

From the testing I've been doing, I think the file count of around
10k isn't sufficient to really cause severe allocation issues.
Directory and inode metadata are great for fragmenting free space,
so dramatically increasing the number of smaller files might actually
produce worse behaviour....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 29+ messages
2019-05-22 18:05 [PATCH v2 00/11] xfs: rework extent allocation Brian Foster
2019-05-22 18:05 ` [PATCH v2 01/11] xfs: clean up small allocation helper Brian Foster
2019-06-21 23:57   ` Darrick J. Wong
2019-05-22 18:05 ` [PATCH v2 02/11] xfs: move " Brian Foster
2019-06-21 23:58   ` Darrick J. Wong
2019-05-22 18:05 ` [PATCH v2 03/11] xfs: skip small alloc cntbt logic on NULL cursor Brian Foster
2019-06-21 23:58   ` Darrick J. Wong
2019-05-22 18:05 ` [PATCH v2 04/11] xfs: always update params on small allocation Brian Foster
2019-06-21 23:59   ` Darrick J. Wong
2019-05-22 18:05 ` [PATCH v2 05/11] xfs: track active state of allocation btree cursors Brian Foster
2019-05-22 18:05 ` [PATCH v2 06/11] xfs: use locality optimized cntbt lookups for near mode allocations Brian Foster
2019-05-22 18:05 ` [PATCH v2 07/11] xfs: refactor exact extent allocation mode Brian Foster
2019-05-22 18:05 ` [PATCH v2 08/11] xfs: refactor by-size " Brian Foster
2019-05-22 18:05 ` [PATCH v2 09/11] xfs: replace small allocation logic with agfl only logic Brian Foster
2019-05-22 18:05 ` [PATCH v2 10/11] xfs: refactor successful AG allocation accounting code Brian Foster
2019-05-22 18:05 ` [PATCH v2 11/11] xfs: condense high level AG allocation functions Brian Foster
2019-05-23  1:56 ` [PATCH v2 00/11] xfs: rework extent allocation Dave Chinner
2019-05-23 12:55   ` Brian Foster
2019-05-23 22:15     ` Dave Chinner
2019-05-24 12:00       ` Brian Foster
2019-05-25 22:43         ` Dave Chinner
2019-05-31 17:11           ` Brian Foster
2019-06-06 15:21             ` Brian Foster
2019-06-06 22:13               ` Dave Chinner [this message]
2019-06-07 12:57                 ` Brian Foster
2019-06-06 22:05             ` Dave Chinner
2019-06-07 12:56               ` Brian Foster
2019-06-21 15:18 ` Darrick J. Wong
2019-07-01 19:12   ` Brian Foster
