From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, hugh@veritas.com
Subject: Re: smp race fix between invalidate_inode_pages* and do_no_page
Date: Tue, 10 Jan 2006 07:48:12 +0100
Message-ID: <20060110064812.GB15897@opteron.random>
In-Reply-To: <20060110062425.GA15897@opteron.random>

On Tue, Jan 10, 2006 at 07:24:25AM +0100, Andrea Arcangeli wrote:
> On Fri, Dec 16, 2005 at 02:51:47PM +0100, Andrea Arcangeli wrote:
> > There was a minor buglet in the previous patch an update is here:
> > 
> > 	http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.15-rc5/seqschedlock-2
> 
> JFYI: I got a few hours ago positive confirmation of the fix from the
> customer that was capable of reproducing this. I guess this is good
> enough for production use (it's at the very least certainly better than
> the previous code and it's guaranteed not to hurt the scalability of the
> fast path in smp, so it's the least intrusive fix I could imagine).
> 
> So we can start to think if we should using this new primitive I
> created, and if to replace the yield() with a proper waitqueue (and
> how). Or if to take the risk of hitting a bit of scalability in the
> nopage page faults of processes, by rewriting the fix with a
> find_lock_page in the do_no_page handler, that would avoid the need of
> my new locking primitive.

Another possible way to fix this is to put the page_count check back in
invalidate_*_* under the tree_lock (exactly as the VM does in
shrink_caches, and exactly as 2.4 does too!), and to stop removing pages
from pagecache when their page_count is > 1 (for those we would go back
to clearing PG_uptodate). But then we'd once again have the problem of
PG_uptodate being cleared by invalidate_*_* while do_no_page is running,
so a mapped page could end up with PG_uptodate clear 8). The invariant
that a mapped page is always PG_uptodate is what led us to start zapping
the ptes and dropping pagecache in the first place. So with this
approach we could also stop zapping the ptes, like 2.4, and allow a
mapped page to have PG_uptodate clear (there's nothing fundamentally
wrong with that, as long as the buffers' BH_uptodate bits are clear too).
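
To make the idea above concrete, here is a minimal userspace sketch in C
of the "page_count check under the tree_lock" approach. It is only a
model, not kernel code: struct page_model, get_page_model,
invalidate_page_model and the spinlock helpers are hypothetical names
invented for illustration, and an atomic_flag spinlock stands in for the
real tree_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page. */
struct page_model {
    int  count;        /* models page_count(); 1 == pagecache is the only holder */
    bool in_pagecache; /* models membership in the mapping's radix tree */
    bool uptodate;     /* models PG_uptodate */
};

/* Stand-in for the mapping's tree_lock (assumption: a simple spinlock). */
static atomic_flag tree_lock = ATOMIC_FLAG_INIT;

static void tree_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&tree_lock, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void tree_lock_release(void)
{
    atomic_flag_clear_explicit(&tree_lock, memory_order_release);
}

/* Lookup side (what do_no_page relies on): pin the page under the lock. */
static bool get_page_model(struct page_model *page)
{
    bool found;

    tree_lock_acquire();
    found = page->in_pagecache;
    if (found)
        page->count++; /* the extra reference is visible to the invalidate side */
    tree_lock_release();
    return found;
}

/*
 * Invalidate side: remove the page only if nobody else holds a reference.
 * Otherwise leave it in the pagecache and just clear the uptodate flag.
 * Returns true if the page was removed.
 */
static bool invalidate_page_model(struct page_model *page)
{
    bool removed = false;

    tree_lock_acquire();
    if (page->in_pagecache) {
        assert(page->count >= 1); /* the pagecache itself holds one reference */
        if (page->count == 1) {
            page->in_pagecache = false;
            page->count = 0;      /* drop the pagecache reference */
            removed = true;
        } else {
            page->uptodate = false;
        }
    }
    tree_lock_release();
    return removed;
}
```

Because both the lookup and the count check run under the same lock, a
page that a fault handler has already pinned can never be removed from
the pagecache behind its back; in this model it can only lose its
uptodate flag, which is exactly the trade-off discussed above.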

Side note: in 2.6 invalidate_mapping_pages has been smp racy at least
since 2.6.5; it basically broke long ago, when the page_count check was
replaced with a page_mapping check. The race is probably so tiny and so
infrequent that it never triggered there. The first bugchecks in the
sles VM (bugchecks that were unfortunately removed when merging into
mainline) started to trigger only very recently, when we started zapping
ptes and dropping pages from invalidate_inode_pages2 like mainline does.
The bug stayed invisible in invalidate_mapping_pages (apparently only
jffs2 uses invalidate_mapping_pages in a way that could oops; even nfs
uses invalidate_inode_pages2).


Thread overview: 23+ messages
2005-12-13 19:37 smp race fix between invalidate_inode_pages* and do_no_page Andrea Arcangeli
2005-12-13 21:02 ` Andrew Morton
2005-12-13 21:14   ` Andrea Arcangeli
2005-12-16 13:51     ` Andrea Arcangeli
2006-01-10  6:24       ` Andrea Arcangeli
2006-01-10  6:48         ` Andrea Arcangeli [this message]
2006-01-11  4:08         ` Nick Piggin
2006-01-11  8:23           ` Andrea Arcangeli
2006-01-11  8:51             ` Andrew Morton
2006-01-11  9:02               ` Andrea Arcangeli
2006-01-11  9:06                 ` Andrew Morton
2006-01-11  9:13                   ` Andrea Arcangeli
2006-01-11 20:49                     ` Hugh Dickins
2006-01-11 21:05                       ` Andrew Morton
2006-01-13  7:35                       ` Nick Piggin
2006-01-13  7:47                         ` Andrew Morton
2006-01-13 10:37                           ` Nick Piggin
2006-03-31 12:36                             ` Andrea Arcangeli
2006-04-02  5:17                               ` Nick Piggin
2006-04-02  5:21                               ` Andrew Morton
2006-04-07 19:18                                 ` Hugh Dickins
2006-01-11  9:39                 ` Nick Piggin
2006-01-11  9:34             ` Nick Piggin
