From: Dave Chinner <david@fromorbit.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Waiman Long <longman@redhat.com>,
	Michal Hocko <mhocko@kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Jonathan Corbet <corbet@lwn.net>,
	"Luis R. Rodriguez" <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	Jan Kara <jack@suse.cz>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	Miklos Szeredi <mszeredi@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	"Wangkai (Kevin,C)" <wangkai86@huawei.com>
Subject: Re: [PATCH v6 0/7] fs/dcache: Track & limit # of negative dentries
Date: Fri, 13 Jul 2018 10:36:14 +1000	[thread overview]
Message-ID: <20180713003614.GW2234@dastard> (raw)
In-Reply-To: <1531425435.18255.17.camel@HansenPartnership.com>

On Thu, Jul 12, 2018 at 12:57:15PM -0700, James Bottomley wrote:
> What surprises me most about this behaviour is the steadiness of the
> page cache ... I would have thought we'd have shrunk it somewhat given
> the intense call on the dcache.

Oh, good, the page cache vs superblock shrinker balancing still
protects the working set of each cache the way it's supposed to
under heavy single cache pressure. :)

Keep in mind that the amount of work slab cache shrinkers perform is
directly proportional to the amount of page cache reclaim that is
performed and the size of the slab cache being reclaimed.  IOWs,
under a "single cache pressure" workload we should be directing
reclaim work to the huge cache creating the pressure and doing very
little reclaim from other caches....
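
Schematically, the scan target calculation looks something like
this (a simplified sketch, not the exact kernel code - the real
thing lives in do_shrink_slab() in mm/vmscan.c and has extra
scaling factors and clamping):

	static unsigned long shrinker_scan_target(unsigned long freeable,
						  unsigned long pages_scanned,
						  unsigned long pages_eligible)
	{
		/* scan the same fraction of this slab cache as page
		 * reclaim scanned of the page LRUs this pass; the +1
		 * avoids a divide-by-zero */
		return freeable * pages_scanned / (pages_eligible + 1);
	}

So a cache that isn't generating the memory pressure only ever gets
asked to scan a tiny fraction of its objects.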

[ What follows from here is conjecture, but is based on what I've
seen in the past 10+ years on systems with large numbers of negative
dentries and fragmented dentry/inode caches. ]

However, this only reaches steady state if the reclaim rate can keep
ahead of the allocation rate. This single-threaded micro-workload
won't result in an internally fragmented dentry slab cache, so
reclaim is going to be as efficient as possible and have the CPU to
keep up with the allocation rate. i.e. bulk negative dentry reclaim
is cheap, runs in LRU order, and frees slab pages quickly and
efficiently in large batches, so steady state is easily reached.

Problems arise when the slab *page* reclaim rate drops below the
allocation rate, i.e. when you have short term (negative) dentries
mixed into the same slab pages as long term stable dentries. This
causes the dentry cache to fragment internally - reclaim hits the
negative dentries and creates large numbers of partial pages - and
so reclaim of negative dentries will fail to free memory. Creating
new negative dentries then fills these partial pages first, and so
the alloc/reclaim cycles on negative dentries only ever produce
partial pages and never free slab cache pages. IOWs, the cost of
reclaiming slab *pages* goes way up despite the fact that the cost
of reclaiming individual dentries has remained the same.
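
You can see the effect with a userspace toy model (illustrative
only - the 21 objects per page assumes a ~190 byte dentry in a 4KiB
slab page, and the 10% pinned ratio is made up):

	#include <stdio.h>
	#include <stdlib.h>

	#define NPAGES	10000
	#define OBJS	21	/* ~4096 / sizeof(struct dentry) */

	int main(void)
	{
		int page, obj, partial = 0, freed = 0;

		srand(1);
		for (page = 0; page < NPAGES; page++) {
			int pinned = 0;

			/* each slot: ~10% chance of a long term dentry */
			for (obj = 0; obj < OBJS; obj++)
				if (rand() % 10 == 0)
					pinned++;

			/*
			 * reclaim frees every negative dentry on the
			 * page, but the page only returns to the page
			 * allocator if nothing is left pinning it
			 */
			if (pinned)
				partial++;
			else
				freed++;
		}
		printf("freed %d pages, %d left partial\n", freed, partial);
		return 0;
	}

Most pages survive reclaim as partial pages, and the next round of
negative dentry allocation fills them straight back up because the
slab allocator allocates from partial pages first.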

That's the underlying problem here - the cost of reclaiming dentries
is constant but the cost of reclaiming *slab pages* is not.  It is
not uncommon to have to trash 90% of the dentry or inode caches to
reduce internal fragmentation down to the point where pages start to
get freed and the per-slab-page reclaim cost drops back below the
allocation cost. Then we see the system return to normal steady
state behaviour.
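
Back of the envelope, using the same made-up numbers as above: with
~21 dentries per 4KiB slab page and 10% of them pinned long term at
random, a page only frees when all 21 slots are unpinned, which
happens with probability 0.9^21 = ~0.11. Reclaiming all ~19
unpinned dentries on every page therefore frees only ~11% of the
pages, i.e. roughly 19 / 0.11 = ~170 dentries reclaimed per page
freed, versus 21 per page when the cache is unfragmented. That's
most of an order of magnitude more reclaim work per page returned.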

In situations where lots of negative dentries are created by
production workloads, that "90%" of the cache that needs to be
reclaimed to fix the internal fragmentation issue is all negative
dentries plus just enough of the real dentries to free useful
quantities of partial pages in the slab. Hence negative dentries are
seen as the problem because they make up the vast majority of the
dentries that get reclaimed when the problem goes away.

By limiting the number of negative dentries in this case, internal
slab fragmentation is reduced such that reclaim cost never gets out
of control. While it appears to "fix" the symptoms, it doesn't
address the underlying problem. It is a partial solution at best;
at worst it's another opaque knob that nobody knows how or when to
tune.

Very few microbenchmarks expose this internal slab fragmentation
problem because they either don't run long enough, don't create
memory pressure, or don't have access patterns that mix long and
short term slab objects together in a way that causes slab
fragmentation. Run some cold-cache directory traversals (git
status?) at the same time as you are creating negative dentries, so
that you create pinned partial pages in the slab cache, and see how
the behaviour changes....
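
For the negative dentry side, something like this does the job
(illustrative only - the path and count are arbitrary, and any
stream of lookups of names that don't exist in a directory that
does exist behaves the same way):

	#include <stdio.h>
	#include <sys/stat.h>

	int main(void)
	{
		struct stat st;
		char path[64];
		long i;

		/* every failed lookup leaves a negative dentry behind */
		for (i = 0; i < 10000000; i++) {
			snprintf(path, sizeof(path), "/tmp/negdent-%ld", i);
			stat(path, &st);
		}
		return 0;
	}

Run that while a cold-cache 'git status' or 'find' walks a big tree
on the same filesystem and the dentry slab will mix short and long
term objects just like production workloads do.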

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 49+ messages
2018-07-06 19:32 [PATCH v6 0/7] fs/dcache: Track & limit # of negative dentries Waiman Long
2018-07-06 19:32 ` [PATCH v6 1/7] fs/dcache: Track & report number " Waiman Long
2018-07-06 19:32 ` [PATCH v6 2/7] fs/dcache: Add sysctl parameter neg-dentry-pc as a soft limit on " Waiman Long
2018-07-06 19:32 ` [PATCH v6 3/7] fs/dcache: Enable automatic pruning of " Waiman Long
2018-07-06 19:32 ` [PATCH v6 4/7] fs/dcache: Spread negative dentry pruning across multiple CPUs Waiman Long
2018-07-06 19:32 ` [PATCH v6 5/7] fs/dcache: Add negative dentries to LRU head initially Waiman Long
2018-07-06 19:32 ` [PATCH v6 6/7] fs/dcache: Allow optional enforcement of negative dentry limit Waiman Long
2018-07-06 19:32 ` [PATCH v6 7/7] fs/dcache: Allow deconfiguration of negative dentry code to reduce kernel size Waiman Long
2018-07-06 21:54   ` Eric Biggers
2018-07-06 22:28 ` [PATCH v6 0/7] fs/dcache: Track & limit # of negative dentries Al Viro
2018-07-07  3:02   ` Waiman Long
2018-07-09  8:19 ` Michal Hocko
2018-07-09 16:01   ` Waiman Long
2018-07-10 14:27     ` Michal Hocko
2018-07-10 16:09       ` Waiman Long
2018-07-11 10:21         ` Michal Hocko
2018-07-11 15:13           ` Waiman Long
2018-07-11 17:42             ` James Bottomley
2018-07-11 19:07               ` Waiman Long
2018-07-11 19:21                 ` James Bottomley
2018-07-12 15:54                   ` Waiman Long
2018-07-12 16:04                     ` James Bottomley
2018-07-12 16:26                       ` Waiman Long
2018-07-12 17:33                         ` James Bottomley
2018-07-13 15:32                           ` Waiman Long
2018-07-12 16:49                       ` Matthew Wilcox
2018-07-12 17:21                         ` James Bottomley
2018-07-12 18:06                           ` Linus Torvalds
2018-07-12 19:57                             ` James Bottomley
2018-07-13  0:36                               ` Dave Chinner [this message]
2018-07-13 15:46                                 ` James Bottomley
2018-07-13 23:17                                   ` Dave Chinner
2018-07-16  9:10                                   ` Michal Hocko
2018-07-16 14:42                                     ` James Bottomley
2018-07-16  9:09                                 ` Michal Hocko
2018-07-16  9:12                                   ` Michal Hocko
2018-07-16 12:41                                   ` Matthew Wilcox
2018-07-16 23:40                                     ` Andrew Morton
2018-07-17  1:30                                       ` Matthew Wilcox
2018-07-17  8:33                                       ` Michal Hocko
2018-07-19  0:33                                         ` Dave Chinner
2018-07-19  8:45                                           ` Michal Hocko
2018-07-19  9:13                                             ` Jan Kara
2018-07-18 18:39                                       ` Waiman Long
2018-07-18 16:17                                   ` Waiman Long
2018-07-19  8:48                                     ` Michal Hocko
2018-07-12  8:48             ` Michal Hocko
2018-07-12 16:12               ` Waiman Long
2018-07-12 23:16                 ` Andrew Morton
