All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-xfs@vger.kernel.org, Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH 02/10] xfs: convert quotacheck to use the new iwalk functions
Date: Thu, 13 Jun 2019 09:33:02 +1000	[thread overview]
Message-ID: <20190612233302.GG14363@dread.disaster.area> (raw)
In-Reply-To: <20190612125506.GE12395@bfoster>

On Wed, Jun 12, 2019 at 08:55:06AM -0400, Brian Foster wrote:
> On Tue, Jun 11, 2019 at 05:32:19PM -0700, Darrick J. Wong wrote:
> > On Wed, Jun 12, 2019 at 09:23:47AM +1000, Dave Chinner wrote:
> Since we're already discussing tweaks to readahead, another approach to
> this problem could be to try and readahead all the way into the inode
> cache. For example, consider a mechanism where a cluster buffer
> readahead sets a flag on the buffer that effectively triggers an iget of
> each allocated inode in the buffer. Darrick has already shown that the
> inode memory allocation and iget itself has considerable overhead even
> when the cluster buffer is already cached. We know that's not due to
> btree lookups because quotacheck isn't using IGET_UNTRUSTED, so perhaps
> we could amortize more of this cost via readahead.

The DONTCACHE inode caching semantics of bulkstat tend to conflict
with "readahead all the way to the inode cache".

> The caveats are that would probably be more involved than something that
> just caches the current cluster buffer and passes it into the iget path.
> We'd have to rectify readahead in-core inodes against DONTCACHE inodes
> used by bulkstat, for example, though I don't think that would be too
> difficult to address via a new inode readahead flag or some such
> preserve existing DONTCACHE behavior.

I did try that once, the cache thrashing was .... difficult to
contain under memory pressure. bulkstat pushes hundreds of thousands
of inodes a second through the inode cache, and under memory
pressure it will still cause working set perturbation with DONTCACHE
being set. Holding DONTCACHE inodes for some time in the cache kinda
defeats the simple purpose it has, and relying on cache hits to
convert "readahead" to "dont cache" becomes really nasty when we
try to use inode readahead for other things (like speeding up
directory traversals by having xfs_readdir() issue inode readahead).

The largest delay in bulkstat is the inode cluster IO latency.
Getting rid of that is where the biggest win is (hence cluster
read-ahead). The second largest overhead is the CPU burnt doing
inode lookups, and on filesystems with lots of inodes, a significant
amount of that is in the IGET_UNTRUSTED inobt lookup. IOWs, avoiding
GET_UNTRUSTED is relatively low hanging fruit.

The next limitation for bulkstat is the superblock inode list lock
contention. Getting rid of the IGET_UNTRUSTED overhead is likely to
push the lock contention into the severe range (the lock is already
the largest CPU consumer at 16 threads bulkstating 600,000 inodes/s
on a 16p machine) so until we get rid of that lock contention, there
isn't much point in doing major rework to the bulkstat algorithm as
it doesn't address the limitations that the current algorithm has.

> It's also likely that passing the buffer into iget would already address
> most of the overhead associated with the buffer lookup, so there might
> not be enough tangible benefit at that point. The positive is that it's
> probably an incremental step on top of an "iget from an existing cluster
> buffer" mechanism and so could be easily prototyped by hacking in a read
> side b_iodone handler or something.

We don't want to put inode cache insertion into a IO completion
routine. Tried it, caused horrible problems with metadata read IO
latency and substantially increased inode cache lock contention by
bouncing the radix trees around both submission and completion CPU
contexts...

/me has spent many, many years trying lots of different ways to make
the inode cache in XFS go faster and has failed most of the time....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2019-06-12 23:34 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-04 21:49 [PATCH v2 00/10] xfs: refactor and improve inode iteration Darrick J. Wong
2019-06-04 21:49 ` [PATCH 01/10] xfs: create simplified inode walk function Darrick J. Wong
2019-06-10 13:58   ` Brian Foster
2019-06-10 16:59     ` Darrick J. Wong
2019-06-10 17:55       ` Brian Foster
2019-06-10 23:11         ` Darrick J. Wong
2019-06-11 22:33           ` Dave Chinner
2019-06-11 23:05             ` Darrick J. Wong
2019-06-12 12:13               ` Brian Foster
2019-06-12 16:53                 ` Darrick J. Wong
2019-06-12 17:54               ` Darrick J. Wong
2019-06-04 21:49 ` [PATCH 02/10] xfs: convert quotacheck to use the new iwalk functions Darrick J. Wong
2019-06-10 13:58   ` Brian Foster
2019-06-10 17:10     ` Darrick J. Wong
2019-06-11 23:23     ` Dave Chinner
2019-06-12  0:32       ` Darrick J. Wong
2019-06-12 12:55         ` Brian Foster
2019-06-12 23:33           ` Dave Chinner [this message]
2019-06-13 18:34             ` Brian Foster
2019-06-04 21:49 ` [PATCH 03/10] xfs: bulkstat should copy lastip whenever userspace supplies one Darrick J. Wong
2019-06-10 13:59   ` Brian Foster
2019-06-04 21:49 ` [PATCH 04/10] xfs: convert bulkstat to new iwalk infrastructure Darrick J. Wong
2019-06-10 14:02   ` Brian Foster
2019-06-10 17:38     ` Darrick J. Wong
2019-06-10 18:29       ` Brian Foster
2019-06-10 23:42         ` Darrick J. Wong
2019-06-04 21:49 ` [PATCH 05/10] xfs: move bulkstat ichunk helpers to iwalk code Darrick J. Wong
2019-06-10 14:02   ` Brian Foster
2019-06-04 21:50 ` [PATCH 06/10] xfs: change xfs_iwalk_grab_ichunk to use startino, not lastino Darrick J. Wong
2019-06-10 19:32   ` Brian Foster
2019-06-04 21:50 ` [PATCH 07/10] xfs: clean up long conditionals in xfs_iwalk_ichunk_ra Darrick J. Wong
2019-06-10 19:32   ` Brian Foster
2019-06-04 21:50 ` [PATCH 08/10] xfs: multithreaded iwalk implementation Darrick J. Wong
2019-06-10 19:40   ` Brian Foster
2019-06-11  1:10     ` Darrick J. Wong
2019-06-11 13:13       ` Brian Foster
2019-06-11 15:29         ` Darrick J. Wong
2019-06-11 17:00           ` Brian Foster
2019-06-04 21:50 ` [PATCH 09/10] xfs: poll waiting for quotacheck Darrick J. Wong
2019-06-11 15:07   ` Brian Foster
2019-06-11 16:06     ` Darrick J. Wong
2019-06-11 17:01       ` Brian Foster
2019-06-04 21:50 ` [PATCH 10/10] xfs: refactor INUMBERS to use iwalk functions Darrick J. Wong
2019-06-11 15:08   ` Brian Foster
2019-06-11 16:21     ` Darrick J. Wong
2019-06-11 17:01       ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190612233302.GG14363@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.