linux-kernel.vger.kernel.org archive mirror
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Rik van Riel <riel@redhat.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Konrad Wilk <konrad.wilk@oracle.com>,
	Seth Jennings <sjenning@linux.vnet.ibm.com>,
	Nitin Gupta <ngupta@vflare.org>,
	Nebojsa Trpkovic <trx.lists@gmail.com>,
	minchan@kernel.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Chris Mason <chris.mason@oracle.com>,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [PATCH] mm: implement WasActive page flag (for improving cleancache)
Date: Sun, 29 Jan 2012 16:25:21 -0600
Message-ID: <1327875921.21193.11.camel@dabdike.int.hansenpartnership.com>
In-Reply-To: <4F2497DC.2040405@redhat.com>

On Sat, 2012-01-28 at 19:50 -0500, Rik van Riel wrote:
> On 01/27/2012 04:49 PM, James Bottomley wrote:
> 
> > So here, I was just saying your desire to store more data in the page
> > table and expand the page flags looks complex.
> >
> > Perhaps we do have a fundamental misunderstanding:  For readahead, I
> > don't really care about the referenced part.  referenced just means
> > pointed to by one or more vmas and active means pointed to by two or
> > more vmas (unless executable in which case it's one).
> 
> That is not at all what "referenced" means everywhere
> else in the VM.

I'm aware there's more subtlety, but I think it's a reasonable
generality: your one sentence summary of page_referenced() seems
conspicuously absent; care to provide it ... or would you prefer the VM
internals remain inaccessible to mere mortals? 

> If you write theories on what Dan should use, it would
> help if you limited yourself to stuff the VM provides
> and/or could provide :)

I didn't give any theories at all about what he should or shouldn't do.
I'm trying to think out loud about whether what he wants and what I
think would help readahead are the same thing (I started off thinking
they were and I talked myself out of it by the end of the previous
email).

> > What I think we care about for readahead is accessed.  This means a page
> > that got touched regardless of how many references it has.  An
> > unaccessed unaged RA page is a less good candidate for reclaim because
> > it should soon be accessed (under the RA heuristics) than an accessed RA
> > page.  Obviously if the heuristics misfire, we end up with futile RA
> > pages, which we read in expecting to be accessed, but which in fact
> > never were (so an unaccessed aged RA page) and need to be evicted.
> >
> > But for me, perhaps it's enough to put unaccessed RA pages into the
> > active list on instantiation and then actually put them in the inactive
> > list when they're accessed
> 
> That is an absolutely terrible idea for many obvious reasons.
> 
> Having readahead pages displace the working set wholesale
> is the absolute last thing we want.

Um, only if you assume you place them at the most recently used head of
the active list ... for obvious reasons, that's not what I was thinking.
I'm still not sure it's more feasible than having separate lists, though:
inserting at the most recently used tail is nasty because it reverse
orders the pages and probably doesn't provide sufficient boost, and
middle insertion looks just plain wrong.

> > I'm less clear on why you think a WasActive() flag is needed.  I think
> > you mean a member of the inactive list that was at some point previously
> > active.
> 
> > Um, that's complex.  Doesn't your inactive-C list really just identify
> > pages that were shared but have sunk in the LRU lists due to lack of
> > use?
> 
> Nope. Pages that are not mapped can still end up on the active
> list, by virtue of getting accessed multiple times in a "short"
> period of time (the residence on the inactive list).
> 
> We want to cache frequently accessed pages with preference over
> streaming IO data that gets accessed infrequently.

Well, no, that's what I'm trying to argue against.  The chances are that
streaming RA I/O gets accessed once (the classic movie scenario).  So
the idea is that if you can identify RA as streaming, it should be kept
while unaccessed but discarded after it's been accessed.  To get the LRU
lists to identify this, we want to give a boost to unaccessed unaged RA,
a suppression to accessed-once RA, and standard heuristics if RA gets
accessed more than once.

James




Thread overview: 20+ messages
2012-01-25 21:58 [PATCH] mm: implement WasActive page flag (for improving cleancache) Dan Magenheimer
2012-01-26 17:28 ` Dave Hansen
2012-01-26 21:28   ` Dan Magenheimer
2012-01-27  0:31     ` Andrew Morton
2012-01-27  0:56       ` Dan Magenheimer
2012-01-27  1:15         ` Andrew Morton
2012-01-27  2:43           ` Dan Magenheimer
2012-01-27  3:33             ` Rik van Riel
2012-01-27  5:15               ` Dan Magenheimer
2012-01-30  8:57                 ` KAMEZAWA Hiroyuki
2012-01-30 22:03                   ` Dan Magenheimer
2012-01-27 13:43             ` James Bottomley
2012-01-27 17:32               ` Dan Magenheimer
2012-01-27 17:54                 ` James Bottomley
2012-01-27 18:46                   ` Dan Magenheimer
2012-01-27 21:49                     ` James Bottomley
2012-01-29  0:50                       ` Rik van Riel
2012-01-29 22:25                         ` James Bottomley [this message]
2012-01-27  3:28         ` Rik van Riel
2012-01-27  5:11           ` Dan Magenheimer
