From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
	Ian Kent <raven@themaw.net>,
	rcu@vger.kernel.org
Subject: Re: [PATCH] xfs: require an rcu grace period before inode recycle
Date: Tue, 25 Jan 2022 11:31:20 +1100	[thread overview]
Message-ID: <20220125003120.GO59729@dread.disaster.area> (raw)
In-Reply-To: <Ye82TgBY0VmtTjMc@bfoster>

On Mon, Jan 24, 2022 at 06:29:18PM -0500, Brian Foster wrote:
> On Tue, Jan 25, 2022 at 09:08:53AM +1100, Dave Chinner wrote:
> > > FYI, I modified my repeated alloc/free test to do some batching and form
> > > it into something more able to measure the potential side effect / cost
> > > of the grace period sync. The test is a single threaded, file alloc/free
> > > loop using a variable per iteration batch size. The test runs for ~60s
> > > and reports how many total files were allocated/freed in that period
> > > with the specified batch size. Note that this particular test ran
> > > without any background workload. Results are as follows:
> > > 
> > > 	files		baseline	test
> > > 
> > > 	1		38480		38437
> > > 	4		126055		111080
> > > 	8		218299		134469
> > > 	16		306619		141968
> > > 	32		397909		152267
> > > 	64		418603		200875
> > > 	128		469077		289365
> > > 	256		684117		566016
> > > 	512		931328		878933
> > > 	1024		1126741		1118891
> > 
> > Can you post the test code, because 38,000 alloc/unlinks in 60s is
> > extremely slow for a single tight open-unlink-close loop. I'd be
> > expecting at least ~10,000 alloc/unlink iterations per second, not
> > 650/second.
> > 
> 
> Hm, Ok. My test was just a bash script doing a 'touch <files>; rm
> <files>' loop. I know there was application overhead because if I
> tweaked the script to open an fd directly rather than use touch, the
> single file performance jumped up a bit, but it seemed to wash away as I
> increased the file count so I kept running it with larger sizes. This
> seems off so I'll port it over to C code and see how much the numbers
> change.

Yeah, using touch/rm becomes fork/exec bound very quickly. You'll
find that using "echo > <file>" is much faster than "touch <file>"
because it runs a shell built-in operation without fork/exec
overhead to create the file. But you can't play tricks like that to
replace rm:

$ time for ((i=0;i<1000;i++)); do touch /mnt/scratch/foo; rm /mnt/scratch/foo ; done

real    0m2.653s
user    0m0.910s
sys     0m2.051s
$ time for ((i=0;i<1000;i++)); do echo > /mnt/scratch/foo; rm /mnt/scratch/foo ; done

real    0m1.260s
user    0m0.452s
sys     0m0.913s
$ time ./open-unlink 1000 /mnt/scratch/foo

real    0m0.037s
user    0m0.001s
sys     0m0.030s
$

Note the difference in system time between the three runs -
almost all of the difference in system CPU time is the overhead of
fork/exec to run the touch/rm binaries, not the filesystem
operations themselves....

> > > That's just a test of a quick hack, however. Since there is no real
> > > urgency to inactivate an unlinked inode (it has no potential users until
> > > it's freed),
> > 
> > On the contrary, there is extreme urgency to inactivate inodes
> > quickly.
> > 
> 
> Ok, I think we're talking about slightly different things. What I mean
> above is that if a task removes a file and goes off doing unrelated
> $work, that inode will just sit on the percpu queue indefinitely. That's
> fine, as there's no functional need for us to process it immediately
> unless we're around -ENOSPC thresholds or some such that demand reclaim
> of the inode.

Yup, an occasional unlink sitting around for a while on an unlinked
list isn't going to cause a performance problem.  Indeed, such
workloads are more likely to benefit from the reduced unlink()
syscall overhead and won't even notice the increase in background
CPU overhead for inactivation of those occasional inodes.

> It sounds like what you're talking about is specifically
> the behavior/performance of sustained file removal (which is important
> obviously), where apparently there is a notable degradation if the
> queues become deep enough to push the inode batches out of CPU cache. So
> that makes sense...

Yup, sustained bulk throughput is where cache residency really
matters. And sustained unlink workloads are quite common; they are
often something people wait on at the command line, or they make up
a performance-critical component of a highly concurrent workload, so
it's pretty important to get this part right.

> > Darrick made the original assumption that we could delay
> > inactivation indefinitely and so he allowed really deep queues of up
> > to 64k deferred inactivations. But with queues this deep, we could
> > never get that background inactivation code to perform anywhere near
> > the original synchronous background inactivation code. e.g. I
> > measured 60-70% performance degradations on my scalability tests,
> > and nothing stood out in the profiles until I started looking at
> > CPU data cache misses.
> > 
> 
> ... but could you elaborate on the scalability tests involved here so I
> can get a better sense of it in practice and perhaps observe the impact
> of changes in this path?

The same concurrent fsmark create/traverse/unlink workloads I've
been running for the past decade+ demonstrate it pretty simply. I
also saw regressions with dbench (both op latency and throughput) as
the client count (concurrency) increased, and with compilebench.  I
didn't look much further because all the common benchmarks I ran
showed perf degradations with arbitrary delays that went away with
the current code we have.  ISTR that parts of aim7/reaim scalability
workloads that the intel zero-day infrastructure runs are quite
sensitive to background inactivation delays as well because that's a
CPU bound workload and hence any reduction in cache residency
results in a reduction of the number of concurrent jobs that can be
run.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 38+ messages
2022-01-21 14:24 [PATCH] xfs: require an rcu grace period before inode recycle Brian Foster
2022-01-21 17:26 ` Darrick J. Wong
2022-01-21 18:33   ` Brian Foster
2022-01-22  5:30     ` Paul E. McKenney
2022-01-22 16:55       ` Paul E. McKenney
2022-01-24 15:12       ` Brian Foster
2022-01-24 16:40         ` Paul E. McKenney
2022-01-23 22:43 ` Dave Chinner
2022-01-24 15:06   ` Brian Foster
2022-01-24 15:02 ` Brian Foster
2022-01-24 22:08   ` Dave Chinner
2022-01-24 23:29     ` Brian Foster
2022-01-25  0:31       ` Dave Chinner [this message]
2022-01-25 14:40         ` Paul E. McKenney
2022-01-25 22:36           ` Dave Chinner
2022-01-26  5:29             ` Paul E. McKenney
2022-01-26 13:21               ` Brian Foster
2022-01-25 18:30         ` Brian Foster
2022-01-25 20:07           ` Brian Foster
2022-01-25 22:45           ` Dave Chinner
2022-01-27  4:19             ` Al Viro
2022-01-27  5:26               ` Dave Chinner
2022-01-27 19:01                 ` Brian Foster
2022-01-27 22:18                   ` Dave Chinner
2022-01-28 14:11                     ` Brian Foster
2022-01-28 23:53                       ` Dave Chinner
2022-01-31 13:28                         ` Brian Foster
2022-01-28 21:39                   ` Paul E. McKenney
2022-01-31 13:22                     ` Brian Foster
2022-02-01 22:00                       ` Paul E. McKenney
2022-02-03 18:49                         ` Paul E. McKenney
2022-02-07 13:30                         ` Brian Foster
2022-02-07 16:36                           ` Paul E. McKenney
2022-02-10  4:09                             ` Dave Chinner
2022-02-10  5:45                               ` Paul E. McKenney
2022-02-10 20:47                                 ` Brian Foster
2022-01-25  8:16 ` [xfs] a7f4e88080: aim7.jobs-per-min -62.2% regression kernel test robot
