linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hugh@veritas.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Andrea Arcangeli <andrea@suse.de>,
	nickpiggin@yahoo.com.au, linux-kernel@vger.kernel.org
Subject: Re: smp race fix between invalidate_inode_pages* and do_no_page
Date: Fri, 7 Apr 2006 20:18:18 +0100 (BST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0604071945200.16361@blonde.wat.veritas.com> (raw)
In-Reply-To: <20060401212138.3b48f634.akpm@osdl.org>

On Sat, 1 Apr 2006, Andrew Morton wrote:
> Andrea Arcangeli <andrea@suse.de> wrote:
> > 
> > The new PG_truncate bitflag can be used as well to eliminate the
> > truncate_count from all vmas etc... that would save substantial memory
> > and remove some complexity, truncate_count grown a lot since the time we
> > introduced it.

truncate_count is playing a useful role with the vma tree, allowing
us to find our place again after allowing preemption in.  Though I'm
sometimes wondering whether just to make the vma _bigger_ and add a
list instead of that truncate_count: construct a list of vmas from
the prio_tree for unmap_mapping_range, so it can keep place in that,
instead of having to restart the tree whenever preempted.  But I
think NFS doesn't hold i_sem across unmapping, so it wouldn't work.

> > The PG_truncate is needed as well because we can't know in do_no_page if
> > page->mapping is legitimate null or not (think bttv and other device
> > drivers returning page->mapping null because they're private but not
> > reserved pages etc..)

That part is easily dealt with, as Nick and I have suggested,
just by marking vmas liable to having their pages truncated.
But that's certainly no more than one part of a larger solution.

> The patch will pretty clearly fix that.  But it really would be better to
> place all the cost over on the invalidate side, rather than adding overhead
> to the pagefault path.

I see ~2% slowdown in relevant faulting tests e.g. the lmbench ones.
I'd certainly prefer not to be locking pages here: or not without
word from the high-scalability people that Andrea's patch really in
practice doesn't hurt them (but the ~2% I see is on UP as much as MP).

> Could we perhaps check the page_count(page) and/or page_mapped(page) in
> invalidate_complete_page(), when we have tree_lock?

I'm thinking on it, but not finding it easy.  And I've just realized
that invalidate_inode_pages2 isn't the only problematic case here
(though agreed the more urgent one): MADV_REMOVE likewise needs to
invalidate pages without adjusting i_size, so also cannot rely on
the truncate_count/i_size method.

> Do you have handy any test code which others can use to recreate this bug?

I'd be glad of that too.

Hugh

  reply	other threads:[~2006-04-07 19:18 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-12-13 19:37 smp race fix between invalidate_inode_pages* and do_no_page Andrea Arcangeli
2005-12-13 21:02 ` Andrew Morton
2005-12-13 21:14   ` Andrea Arcangeli
2005-12-16 13:51     ` Andrea Arcangeli
2006-01-10  6:24       ` Andrea Arcangeli
2006-01-10  6:48         ` Andrea Arcangeli
2006-01-11  4:08         ` Nick Piggin
2006-01-11  8:23           ` Andrea Arcangeli
2006-01-11  8:51             ` Andrew Morton
2006-01-11  9:02               ` Andrea Arcangeli
2006-01-11  9:06                 ` Andrew Morton
2006-01-11  9:13                   ` Andrea Arcangeli
2006-01-11 20:49                     ` Hugh Dickins
2006-01-11 21:05                       ` Andrew Morton
2006-01-13  7:35                       ` Nick Piggin
2006-01-13  7:47                         ` Andrew Morton
2006-01-13 10:37                           ` Nick Piggin
2006-03-31 12:36                             ` Andrea Arcangeli
2006-04-02  5:17                               ` Nick Piggin
2006-04-02  5:21                               ` Andrew Morton
2006-04-07 19:18                                 ` Hugh Dickins [this message]
2006-01-11  9:39                 ` Nick Piggin
2006-01-11  9:34             ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0604071945200.16361@blonde.wat.veritas.com \
    --to=hugh@veritas.com \
    --cc=akpm@osdl.org \
    --cc=andrea@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nickpiggin@yahoo.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).