From: Roman Gushchin <guro@fb.com>
To: "Tobin C. Harding" <me@tobin.cc>
Cc: "Tobin C. Harding" <tobin@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	"Alexander Viro" <viro@ftp.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	"Pekka Enberg" <penberg@cs.helsinki.fi>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Christopher Lameter <cl@linux.com>,
	Miklos Szeredi <mszeredi@redhat.com>,
	Andreas Dilger <adilger@dilger.ca>,
	Waiman Long <longman@redhat.com>, Tycho Andersen <tycho@tycho.ws>,
	"Theodore Ts'o" <tytso@mit.edu>, Andi Kleen <ak@linux.intel.com>,
	David Chinner <david@fromorbit.com>,
	Nick Piggin <npiggin@gmail.com>, Rik van Riel <riel@redhat.com>,
	Hugh Dickins <hughd@google.com>, Jonathan Corbet <corbet@lwn.net>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH v5 04/16] slub: Slab defrag core
Date: Tue, 21 May 2019 01:25:34 +0000
Message-ID: <20190521012525.GA15348@tower.DHCP.thefacebook.com>
In-Reply-To: <20190521011525.GA25898@eros.localdomain>

On Tue, May 21, 2019 at 11:15:25AM +1000, Tobin C. Harding wrote:
> On Tue, May 21, 2019 at 12:51:57AM +0000, Roman Gushchin wrote:
> > On Mon, May 20, 2019 at 03:40:05PM +1000, Tobin C. Harding wrote:
> > > Internal fragmentation can occur within pages used by the slub
> > > allocator.  Under some workloads large numbers of pages can be used by
> > > partial slab pages.  This under-utilisation is bad simply because it
> > > wastes memory but also because if the system is under memory pressure
> > > higher order allocations may become difficult to satisfy.  If we can
> > > defrag slab caches we can alleviate these problems.
> > > 
> > > Implement Slab Movable Objects in order to defragment slab caches.
> > > 
> > > Slab defragmentation may occur:
> > > 
> > > 1. Unconditionally when __kmem_cache_shrink() is called on a slab cache
> > >    by the kernel calling kmem_cache_shrink().
> > > 
> > > 2. Unconditionally through the use of the slabinfo command.
> > > 
> > > 	slabinfo <cache> -s
> > > 
> > > 3. Conditionally via the use of kmem_defrag_slabs().
> > > 
> > > - Use Slab Movable Objects when shrinking cache.
> > > 
> > > Currently, when the kernel calls kmem_cache_shrink() we curate the
> > > partial slabs list.  We still do this if object migration is not
> > > enabled for the cache.  If SMO is enabled, however, we attempt to move
> > > objects in partially full slabs in order to defragment the cache.
> > > Shrink attempts to move all objects in order to reduce the cache to a
> > > single partial slab for each node.
> > > 
> > > - Add conditional per node defrag via new function:
> > > 
> > > 	kmem_defrag_slabs(int node).
> > > 
> > > kmem_defrag_slabs() attempts to defragment all slab caches for the
> > > given node.  Defragmentation is conditional on both MAX_PARTIAL
> > > _and_ defrag_used_ratio.
> > > 
> > >    Caches are only considered for defragmentation if the number of
> > >    partial slabs exceeds MAX_PARTIAL (per node).
> > > 
> > >    Also, defragmentation only occurs if the usage ratio of the slab is
> > >    lower than the configured percentage (sysfs field added in this
> > >    patch).  Fragmentation ratios are measured by calculating the
> > >    percentage of objects in use compared to the total number of objects
> > >    that the slab page can accommodate.
> > > 
> > >    The scanning of slab caches is optimized because the defragmentable
> > >    slabs come first on the list. Thus we can terminate scans on the
> > >    first slab encountered that does not support defragmentation.
> > > 
> > >    kmem_defrag_slabs() takes a node parameter. This can either be -1 if
> > >    defragmentation should be performed on all nodes, or a node number.
> > > 
> > >    Defragmentation may be disabled by setting defrag_used_ratio to 0:
> > > 
> > > 	echo 0 > /sys/kernel/slab/<cache>/defrag_used_ratio
> > > 
> > > - Add a defrag ratio sysfs field and set it to 30% by default. A limit
> > > of 30% specifies that more than 3 out of 10 available slots for objects
> > > need to be in use; otherwise slab defragmentation will be attempted on
> > > the remaining objects.
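
The eligibility rules quoted above can be modelled in user space roughly
as follows.  This is only a sketch under stated assumptions, not the code
from mm/slub.c: struct slab_model and every function name below are
hypothetical, max_partial stands in for the kernel's MAX_PARTIAL, and the
strict less-than comparison follows the "lower than the configured
percentage" wording, so the behaviour at exactly the limit is an
assumption.

```c
#include <stdbool.h>

/* Hypothetical user-space model; these names do not come from mm/slub.c. */
struct slab_model {
	unsigned int inuse;	/* objects currently allocated in the slab page */
	unsigned int objects;	/* objects the slab page can accommodate */
};

/* Percentage of object slots in use, rounded down. */
static inline unsigned int used_ratio(const struct slab_model *slab)
{
	if (slab->objects == 0)
		return 0;
	return slab->inuse * 100 / slab->objects;
}

/* A cache is only considered when its partial slab count exceeds the limit. */
static inline bool cache_is_candidate(unsigned long nr_partial,
				      unsigned long max_partial)
{
	return nr_partial > max_partial;
}

/*
 * A slab is a defrag candidate when its usage is below the configured
 * percentage.  A ratio of 0 disables defragmentation.  Treating "exactly
 * at the limit" as not eligible is an assumption of this sketch.
 */
static inline bool slab_wants_defrag(const struct slab_model *slab,
				     unsigned int defrag_used_ratio)
{
	if (defrag_used_ratio == 0)
		return false;
	return used_ratio(slab) < defrag_used_ratio;
}

/* Node argument semantics: -1 selects every node, otherwise one node. */
static inline bool node_selected(int requested, int node)
{
	return requested == -1 || requested == node;
}
```

With the default ratio of 30, a slab holding 2 of 10 objects would be
selected in this model, and writing 0 to defrag_used_ratio switches the
check off entirely.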
> > > 
> > > In order for a cache to be defragmentable the cache must support object
> > > migration (SMO).  Enabling SMO for a cache is done via a call to the
> > > recently added function:
> > > 
> > > 	void kmem_cache_setup_mobility(struct kmem_cache *,
> > > 				       kmem_cache_isolate_func,
> > > 			               kmem_cache_migrate_func);
> > > 
> > > Co-developed-by: Christoph Lameter <cl@linux.com>
> > > Signed-off-by: Tobin C. Harding <tobin@kernel.org>
> > > ---
> > >  Documentation/ABI/testing/sysfs-kernel-slab |  14 +
> > >  include/linux/slab.h                        |   1 +
> > >  include/linux/slub_def.h                    |   7 +
> > >  mm/slub.c                                   | 385 ++++++++++++++++----
> > >  4 files changed, 334 insertions(+), 73 deletions(-)
> > 
> > Hi Tobin!
> > 
> > Overall this looks very good to me! I'll take another look when you post
> > a non-RFC version, but so far I can't find any issues.
> 
> Thanks for the reviews.
> 
> > A generic question: as I understand it, you only support root kmem_caches
> > for now. Is kmemcg support planned?
> 
> I know very little about cgroups, so I have no plans for this work.
> However, I'm not the architect behind this - Christoph is guiding the
> direction on this one.  Perhaps he will comment.
> 
> > Without it, the patchset isn't as attractive as it could be to anyone
> > using cgroups. Also, I hope it can solve (or mitigate) the memcg-specific
> > problem of scattering the vfs cache working set over multiple generations
> > of the same cgroup (their kmem_caches).
> 
> I'm keen to work on anything that makes this more useful so I'll do some
> research.  Thanks for the idea.

You're welcome! I'm happy to help, or even to do it myself, once
your patches are merged.

Thanks!
