All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@fb.com
Subject: Re: [PATCH 4/4] mm: zero-seek shrinkers
Date: Fri, 12 Oct 2018 15:48:52 +0200	[thread overview]
Message-ID: <ce593e78-e245-6c89-eb91-2ba61b1be855@suse.cz> (raw)
In-Reply-To: <20181009184732.762-5-hannes@cmpxchg.org>

On 10/9/18 8:47 PM, Johannes Weiner wrote:
> The page cache and most shrinkable slab caches hold data that has been
> read from disk, but there are some caches that only cache CPU work,
> such as the dentry and inode caches of procfs and sysfs, as well as
> the subset of radix tree nodes that track non-resident page cache.
> 
> Currently, all these are shrunk at the same rate: using DEFAULT_SEEKS
> for the shrinker's seeks setting tells the reclaim algorithm that for
> every two page cache pages scanned it should scan one slab object.
> 
> This is a bogus setting. A virtual inode that required no IO to create
> is not twice as valuable as a page cache page; shadow cache entries
> with eviction distances beyond the size of memory aren't either.
> 
> In most cases, the behavior in practice is still fine. Such virtual
> caches don't tend to grow and assert themselves aggressively, and
> usually get picked up before they cause problems. But there are
> scenarios where that's not true.
> 
> Our database workloads suffer from two of those. For one, their file
> workingset is several times bigger than available memory, which has
> the kernel aggressively create shadow page cache entries for the
> non-resident parts of it. The workingset code does tell the VM that
> most of these are expendable, but the VM ends up balancing them 2:1 to
> cache pages as per the seeks setting. This is a huge waste of memory.
> 
> These workloads also deal with tens of thousands of open files and use
> /proc for introspection, which ends up growing the proc_inode_cache to
> absurdly large sizes - again at the cost of valuable cache space,
> which isn't a reasonable trade-off, given that proc inodes can be
> re-created without involving the disk.
> 
> This patch implements a "zero-seek" setting for shrinkers that results
> in a target ratio of 0:1 between their objects and IO-backed
> caches. This allows such virtual caches to grow when memory is
> available (they do cache/avoid CPU work after all), but effectively
> disables them as soon as IO-backed objects are under pressure.
> 
> It then switches the shrinkers for procfs and sysfs metadata, as well
> as excess page cache shadow nodes, to the new zero-seek setting.

AFAIU procfs and sysfs metadata have exclusive slab caches, while the
shadow nodes share 'radix_tree_node' cache with non-shadow ones, right?
To avoid fragmentation, it should be better if they had also separate
cache, since their lifetime becomes different. In case that's feasible
(are non-shadow nodes changing to shadow nodes and vice versa? I guess
they do? That would require reallocation in the other cache.).

Vlastimil

      parent reply	other threads:[~2018-10-12 13:48 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-09 18:47 [PATCH 0/4] mm: workingset & shrinker fixes Johannes Weiner
2018-10-09 18:47 ` [PATCH 1/4] mm: workingset: don't drop refault information prematurely fix Johannes Weiner
2018-10-10  0:55   ` Rik van Riel
2018-10-09 18:47 ` [PATCH 2/4] mm: workingset: use cheaper __inc_lruvec_state in irqsafe node reclaim Johannes Weiner
2018-10-10  0:55   ` Rik van Riel
2018-10-09 18:47 ` [PATCH 3/4] mm: workingset: add vmstat counter for shadow nodes Johannes Weiner
2018-10-09 22:04   ` Andrew Morton
2018-10-10 14:02     ` Johannes Weiner
2018-10-09 22:08   ` Andrew Morton
2018-10-10 15:05     ` Johannes Weiner
2018-10-16  8:49     ` Mel Gorman
2018-10-16 22:27       ` Andrew Morton
2018-10-09 18:47 ` [PATCH 4/4] mm: zero-seek shrinkers Johannes Weiner
2018-10-09 22:15   ` Andrew Morton
2018-10-09 22:17     ` Andrew Morton
2018-10-10  1:03   ` Rik van Riel
2018-10-10 15:15     ` Johannes Weiner
2018-10-12 13:48   ` Vlastimil Babka [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ce593e78-e245-6c89-eb91-2ba61b1be855@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.