All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, Rik van Riel <riel@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan.kim@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 2/5] mm + fs: prepare for non-page entries in page cache
Date: Tue, 1 May 2012 23:29:58 +0200	[thread overview]
Message-ID: <20120501212958.GC2112@cmpxchg.org> (raw)
In-Reply-To: <20120501132449.30485966.akpm@linux-foundation.org>

On Tue, May 01, 2012 at 01:24:49PM -0700, Andrew Morton wrote:
> On Tue, 1 May 2012 22:15:04 +0200
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > On Tue, May 01, 2012 at 12:02:46PM -0700, Andrew Morton wrote:
> > > On Tue,  1 May 2012 10:41:50 +0200
> > > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > 
> > > > --- a/fs/inode.c
> > > > +++ b/fs/inode.c
> > > > @@ -544,8 +544,7 @@ static void evict(struct inode *inode)
> > > >  	if (op->evict_inode) {
> > > >  		op->evict_inode(inode);
> > > >  	} else {
> > > > -		if (inode->i_data.nrpages)
> > > > -			truncate_inode_pages(&inode->i_data, 0);
> > > > +		truncate_inode_pages(&inode->i_data, 0);
> > > 
> > > Why did we lose this optimisation?
> > 
> > For inodes with only shadow pages remaining in the tree, because there
> > is no separate counter for them.  Otherwise, we'd leak the tree nodes.
> > 
> > I had mapping->nrshadows at first to keep truncation conditional, but
> > thought that using an extra word per cached inode would be worse than
> > removing this optimization.  There is not too much being done when the
> > tree is empty.
> > 
> > Another solution would be to include the shadows count in ->nrpages,
> > but filesystems use this counter for various other purposes.
> > 
> > Do you think it's worth reconsidering?
> 
> It doesn't sound like it's worth adding ->nrshadows for only that
> reason.
> 
> That's a pretty significant alteration in the meaning of ->nrpages. 
> Did this not have any other effects?

It still means "number of page entries in radix tree", just that the
radix tree can be non-empty when this count drops to zero.  AFAICS,
it's used when writing/syncing the inode or when gathering statistics,
like nr_blockdev_pages().  They only care about actual pages.  It's
just the final truncate that has to make sure to remove the non-pages
as well.

> What does truncate do?  I assume it invalidates shadow page entries in
> the radix tree?  And frees the radix-tree nodes?

Yes, it just does a radix_tree_delete() on these shadow page entries,
see clear_exceptional_entry() in mm/truncate.c.  This garbage-collects
empty nodes.

> The patchset will make lookups slower in some (probably obscure)
> circumstances, due to the additional radix-tree nodes.
> 
> I assume that if a pagecache lookup encounters a radix-tree node which
> contains no real pages, the search will terminate at that point?  We
> don't pointlessly go all the way down to the leaf nodes?

When reading/instantiating it's not pointless.  Empty slots or shadow
slots are faults and we have to retrieve the shadow entries to
calculate the refault distance.

When writing: dirtied pages are tagged, and tags are propagated
upwards in the tree, so we don't check any more nodes than before.
For leaf nodes (64 slots) where shadow entries and dirty pages mix,
the cost of skipping shadow entries is a bit bigger than that of empty
slots (see the radix_tree_exception branch in find_get_pages(), which
Hugh added to handle shmem's swap entries).

Then there are filesystems that do page cache lookups for population
analysis/heuristics, they could indeed pointlessly descend to leaf
nodes that only contain non-page entries.  I haven't investigated yet
how hot these paths actuallly are.  If this turns out to be a problem,
we could add another tag and trade tree size for performance.

WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, Rik van Riel <riel@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan.kim@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 2/5] mm + fs: prepare for non-page entries in page cache
Date: Tue, 1 May 2012 23:29:58 +0200	[thread overview]
Message-ID: <20120501212958.GC2112@cmpxchg.org> (raw)
In-Reply-To: <20120501132449.30485966.akpm@linux-foundation.org>

On Tue, May 01, 2012 at 01:24:49PM -0700, Andrew Morton wrote:
> On Tue, 1 May 2012 22:15:04 +0200
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > On Tue, May 01, 2012 at 12:02:46PM -0700, Andrew Morton wrote:
> > > On Tue,  1 May 2012 10:41:50 +0200
> > > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > 
> > > > --- a/fs/inode.c
> > > > +++ b/fs/inode.c
> > > > @@ -544,8 +544,7 @@ static void evict(struct inode *inode)
> > > >  	if (op->evict_inode) {
> > > >  		op->evict_inode(inode);
> > > >  	} else {
> > > > -		if (inode->i_data.nrpages)
> > > > -			truncate_inode_pages(&inode->i_data, 0);
> > > > +		truncate_inode_pages(&inode->i_data, 0);
> > > 
> > > Why did we lose this optimisation?
> > 
> > For inodes with only shadow pages remaining in the tree, because there
> > is no separate counter for them.  Otherwise, we'd leak the tree nodes.
> > 
> > I had mapping->nrshadows at first to keep truncation conditional, but
> > thought that using an extra word per cached inode would be worse than
> > removing this optimization.  There is not too much being done when the
> > tree is empty.
> > 
> > Another solution would be to include the shadows count in ->nrpages,
> > but filesystems use this counter for various other purposes.
> > 
> > Do you think it's worth reconsidering?
> 
> It doesn't sound like it's worth adding ->nrshadows for only that
> reason.
> 
> That's a pretty significant alteration in the meaning of ->nrpages. 
> Did this not have any other effects?

It still means "number of page entries in radix tree", just that the
radix tree can be non-empty when this count drops to zero.  AFAICS,
it's used when writing/syncing the inode or when gathering statistics,
like nr_blockdev_pages().  They only care about actual pages.  It's
just the final truncate that has to make sure to remove the non-pages
as well.

> What does truncate do?  I assume it invalidates shadow page entries in
> the radix tree?  And frees the radix-tree nodes?

Yes, it just does a radix_tree_delete() on these shadow page entries,
see clear_exceptional_entry() in mm/truncate.c.  This garbage-collects
empty nodes.

> The patchset will make lookups slower in some (probably obscure)
> circumstances, due to the additional radix-tree nodes.
> 
> I assume that if a pagecache lookup encounters a radix-tree node which
> contains no real pages, the search will terminate at that point?  We
> don't pointlessly go all the way down to the leaf nodes?

When reading/instantiating it's not pointless.  Empty slots or shadow
slots are faults and we have to retrieve the shadow entries to
calculate the refault distance.

When writing: dirtied pages are tagged, and tags are propagated
upwards in the tree, so we don't check any more nodes than before.
For leaf nodes (64 slots) where shadow entries and dirty pages mix,
the cost of skipping shadow entries is a bit bigger than that of empty
slots (see the radix_tree_exception branch in find_get_pages(), which
Hugh added to handle shmem's swap entries).

Then there are filesystems that do page cache lookups for population
analysis/heuristics, they could indeed pointlessly descend to leaf
nodes that only contain non-page entries.  I haven't investigated yet
how hot these paths actuallly are.  If this turns out to be a problem,
we could add another tag and trade tree size for performance.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2012-05-01 21:30 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-01  8:41 [patch 0/5] refault distance-based file cache sizing Johannes Weiner
2012-05-01  8:41 ` Johannes Weiner
2012-05-01  8:41 ` [patch 1/5] mm: readahead: move radix tree hole searching here Johannes Weiner
2012-05-01  8:41   ` Johannes Weiner
2012-05-01 21:06   ` Rik van Riel
2012-05-01 21:06     ` Rik van Riel
2012-05-01  8:41 ` [patch 2/5] mm + fs: prepare for non-page entries in page cache Johannes Weiner
2012-05-01  8:41   ` Johannes Weiner
2012-05-01 19:02   ` Andrew Morton
2012-05-01 19:02     ` Andrew Morton
2012-05-01 20:15     ` Johannes Weiner
2012-05-01 20:15       ` Johannes Weiner
2012-05-01 20:24       ` Andrew Morton
2012-05-01 20:24         ` Andrew Morton
2012-05-01 21:14         ` Rik van Riel
2012-05-01 21:14           ` Rik van Riel
2012-05-01 21:29         ` Johannes Weiner [this message]
2012-05-01 21:29           ` Johannes Weiner
2012-05-01  8:41 ` [patch 3/5] mm + fs: store shadow pages " Johannes Weiner
2012-05-01  8:41   ` Johannes Weiner
2012-05-01  8:41 ` [patch 4/5] mm + fs: provide refault distance to page cache instantiations Johannes Weiner
2012-05-01  8:41   ` Johannes Weiner
2012-05-01  9:30   ` Peter Zijlstra
2012-05-01  9:30     ` Peter Zijlstra
2012-05-01  9:30     ` Peter Zijlstra
2012-05-01  9:55     ` Johannes Weiner
2012-05-01  9:55       ` Johannes Weiner
2012-05-01  9:58       ` Peter Zijlstra
2012-05-01  9:58         ` Peter Zijlstra
2012-05-01  9:58         ` Peter Zijlstra
2012-05-01  8:41 ` [patch 5/5] mm: refault distance-based file cache sizing Johannes Weiner
2012-05-01  8:41   ` Johannes Weiner
2012-05-01 14:13   ` Minchan Kim
2012-05-01 14:13     ` Minchan Kim
2012-05-01 15:38     ` Johannes Weiner
2012-05-01 15:38       ` Johannes Weiner
2012-05-02  5:21       ` Minchan Kim
2012-05-02  5:21         ` Minchan Kim
2012-05-02  1:57   ` Andrea Arcangeli
2012-05-02  1:57     ` Andrea Arcangeli
2012-05-02  6:23     ` Johannes Weiner
2012-05-02  6:23       ` Johannes Weiner
2012-05-02 15:11       ` Andrea Arcangeli
2012-05-02 15:11         ` Andrea Arcangeli
2012-05-01 19:08 ` [patch 0/5] " Andrew Morton
2012-05-01 19:08   ` Andrew Morton
2012-05-01 21:19   ` Rik van Riel
2012-05-01 21:19     ` Rik van Riel
2012-05-01 21:26     ` Andrew Morton
2012-05-01 21:26       ` Andrew Morton
2012-05-02  1:10       ` Andrea Arcangeli
2012-05-02  1:10         ` Andrea Arcangeli
2012-05-03 13:15       ` Johannes Weiner
2012-05-03 13:15         ` Johannes Weiner
2012-05-16  5:25 ` nai.xia
2012-05-16  5:25   ` nai.xia
2012-05-16  6:51   ` Johannes Weiner
2012-05-16  6:51     ` Johannes Weiner
2012-05-16 12:56     ` nai.xia
2012-05-16 12:56       ` nai.xia
2012-05-17 21:08       ` Johannes Weiner
2012-05-17 21:08         ` Johannes Weiner
2012-05-18  3:44         ` Nai Xia
2012-05-18  3:44           ` Nai Xia
2012-05-18 15:07           ` Rik van Riel
2012-05-18 15:07             ` Rik van Riel
2012-05-18 15:30             ` Nai Xia
2012-05-18 15:30               ` Nai Xia
2012-05-18 15:30               ` Nai Xia
2012-05-17 13:11   ` Rik van Riel
2012-05-17 13:11     ` Rik van Riel
2012-05-18  5:03     ` Nai Xia
2012-05-18  5:03       ` Nai Xia
2012-05-18  5:03       ` Nai Xia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120501212958.GC2112@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=minchan.kim@gmail.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.