From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>,
	linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan.kim@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 0/5] refault distance-based file cache sizing
Date: Thu, 3 May 2012 15:15:31 +0200
Message-ID: <20120503131531.GC31780@cmpxchg.org>
In-Reply-To: <20120501142656.c9160d96.akpm@linux-foundation.org>

On Tue, May 01, 2012 at 02:26:56PM -0700, Andrew Morton wrote:
> On Tue, 01 May 2012 17:19:16 -0400
> Rik van Riel <riel@redhat.com> wrote:
> 
> > On 05/01/2012 03:08 PM, Andrew Morton wrote:
> > > On Tue,  1 May 2012 10:41:48 +0200
> > > Johannes Weiner<hannes@cmpxchg.org>  wrote:
> > >
> > >> This series stores file cache eviction information in the vacated page
> > >> cache radix tree slots and uses it on refault to see if the pages
> > >> currently on the active list need to have their status challenged.
> > >
> > > So we no longer free the radix-tree node when everything under it has
> > > been reclaimed?  One could create workloads which would result in a
> > > tremendous amount of memory used by radix_tree_node_cachep objects.
> > >
> > > So I assume these things get thrown away at some point.  Some
> > > discussion about the life-cycle here would be useful.
> > 
> > I assume that in the current codebase Johannes has, we would
> > have to rely on the inode cache shrinker to reclaim the inode
> > and throw out the radix tree nodes.
> > 
> > Having a better way to get radix tree nodes that contain
> > stale entries (where the evicted pages would no longer receive
> > special treatment on re-fault, because it has been so long)
> > reclaimed would be nice for a future version.
> > 
> 
> Well, think of a stupid workload which creates a large number of very
> large but sparse files (populated with one page in each 64, for
> example).  Get them all in cache, then sit there touching the inodes to
> keep them fresh.  What's the worst case here?

With 8G of RAM, it takes a minimally populated file (one page per leaf
node) of 3.5TB to consume all memory for radix tree nodes.
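
As a rough cross-check of that figure, the arithmetic can be sketched
in Python.  The constants are assumptions, not taken from the series:
4 KiB pages, 64 slots per radix tree node, and about 560 bytes per
node (the real size depends on kernel version and configuration).  The
helper names are hypothetical, and the model is simplified: it assumes
every leaf node stays allocated and counts the interior nodes above
them.

```python
PAGE_SIZE = 4096        # assumed page size in bytes
SLOTS_PER_NODE = 64     # assumed fan-out of one radix tree node
NODE_BYTES = 560        # assumed size of one node; varies by kernel config


def radix_tree_nodes(file_bytes):
    """Count nodes needed to cover a file when every leaf node stays allocated.

    Simplified model: each level needs ceil(entries / SLOTS_PER_NODE) nodes,
    repeated until a single root node covers the whole index range.
    """
    entries = -(-file_bytes // PAGE_SIZE)   # pages in the file, rounded up
    total = 0
    while True:
        entries = -(-entries // SLOTS_PER_NODE)  # nodes at this level
        total += entries
        if entries <= 1:
            break
    return total


def node_memory_bytes(file_bytes):
    """Memory consumed by the radix tree nodes alone, under the assumptions above."""
    return radix_tree_nodes(file_bytes) * NODE_BYTES


if __name__ == "__main__":
    size = 7 * 2**39  # 3.5 TiB
    gib = node_memory_bytes(size) / 2**30
    print(f"{radix_tree_nodes(size)} nodes, about {gib:.2f} GiB")
```

Under these assumed constants, a 3.5 TiB file with every leaf node
populated needs just under 8 GiB of nodes, which is consistent with
the figure above.  Nearly all of that is leaf nodes; the interior
levels add less than two percent.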

The worst case is going OOM without someone to blame as the objects
are owned by the kernel.

Is this a use case we should worry about?  A realistic one, I mean.
It wouldn't be the first way to take down a machine maliciously, and
it could be prevented by rlimiting the maximum file size.
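
For illustration, a minimal sketch of what rlimiting the file size
looks like from userspace, using Python's standard resource module.
The helper name fits_under_fsize_limit is hypothetical, and the sketch
assumes a Unix system where ftruncate() honours RLIMIT_FSIZE.  It
lowers only the soft limit and restores it afterwards.

```python
import errno
import os
import resource
import tempfile


def fits_under_fsize_limit(limit_bytes, size_bytes):
    """Return True if a sparse file of size_bytes can be created while
    the soft RLIMIT_FSIZE is set to limit_bytes, False if the kernel
    refuses with EFBIG."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    try:
        # Lower only the soft limit so it can be restored afterwards.
        resource.setrlimit(resource.RLIMIT_FSIZE, (limit_bytes, hard))
        with tempfile.TemporaryFile() as f:
            try:
                # Extending the file creates a sparse file without writing data.
                os.ftruncate(f.fileno(), size_bytes)
            except OSError as e:
                # CPython ignores SIGXFSZ by default, so exceeding the
                # limit surfaces as an EFBIG error instead of killing us.
                if e.errno == errno.EFBIG:
                    return False
                raise
            return True
    finally:
        resource.setrlimit(resource.RLIMIT_FSIZE, (old_soft, hard))


if __name__ == "__main__":
    print(fits_under_fsize_limit(1 << 20, 4096))     # small file, under the limit
    print(fits_under_fsize_limit(1 << 20, 1 << 30))  # 1 GiB sparse file, over the limit
```

A limit like this caps the index range of any single file, and with it
the number of radix tree nodes that one file can pin.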

That aside, entries that are past the point where they would mean
anything, as Rik described above, are a waste of memory, the severity
of which depends on how much of its previously faulted data an inode
has evicted while still being in active use.

For me it's not a question of whether we want a mechanism to reclaim
old shadow pages of inodes that are still in use, but how critical
this is, and then how accurate it needs to be etc.
