From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrea Arcangeli <andrea@suse.de>
Cc: Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org, hugh@veritas.com
Subject: Re: smp race fix between invalidate_inode_pages* and do_no_page
Date: Wed, 11 Jan 2006 20:34:11 +1100
Message-ID: <43C4D113.4060705@yahoo.com.au>
In-Reply-To: <20060111082359.GV15897@opteron.random>
Andrea Arcangeli wrote:
> On Wed, Jan 11, 2006 at 03:08:31PM +1100, Nick Piggin wrote:
>
>>I'd be inclined to think a lock_page is not a big SMP scalability
>>problem because the struct page's cacheline(s) will be written to
>>several times in the process of refcounting anyway. Such a workload
>>would also be running into tree_lock as well.
>
>
> I seem to recall you wanted to make the tree_lock a readonly lock for
> readers for the exact same scalability reason? do_no_page is quite a
I think Bill Irwin or Peter Chubb made the tree_lock a reader-writer
lock back in the day.
I have some patches (ref: lockless pagecache) that completely remove
the tree_lock from read-side operations like find_get_page and
find_lock_page, and turns the write side back into a regular spinlock.
You must be thinking of that?
> fast path for the tree lock too. But I totally agree that the unavoidable
> part is the atomic_inc, good point, so it's worth more to remove the
> tree_lock than to remove the page lock: the tree_lock can be avoided, the
> atomic_inc on page->_count cannot.
>
Yep, my thinking as well.
> The other bonus that makes this attractive is that then we can drop the
> *whole* vm_truncate_count mess... vm_truncate_count and
> inode->truncate_count exist for the single reason that do_no_page
> must not map into the pte a page that is under truncation. We can
> provide the same guarantee with the page lock, doing as
> invalidate_inode_pages2_range does (that is, checking page_mapping under
> the page lock and executing unmap_mapping_range with the page lock held
> if needed). That will free 4 bytes per vma (without even counting the
> truncate_count on every inode out there! that could be an even larger
> gain). On my system I have 9191 vmas in use, so that's 36K of ram saved
> on x86; on x86-64 it's 72K of physical ram saved, since it's an unsigned
> long after a pointer, and vma must not be hw aligned (and in fact it
> isn't, so the saving is real). On the inodes side it saves 4 bytes * 1384
> on my current system; on a busy nfs server it can save a lot more. The
> inode also must not be hw aligned and correctly it isn't. On a server
> with a lot more vmas and a lot more inodes it'll save more ram.
>
> So if I make this change this could give me a grant for lifetime
> guarantee of seccomp in the kernel that takes less than 1kbyte on a x86,
> right? (on a normal desktop I'll save at minimum 30 times more than what
> I cost to the kernel users ;) Just kidding of course...
>
Sounds like a good idea (and your proposed implementation, lock_page
and recheck mapping in do_no_page, sounds sane).
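
To make the idea concrete, here is a rough userspace sketch of "take the
page lock, then recheck the mapping before mapping the page". It is not
kernel code: struct page and struct mapping below are hypothetical
stand-ins, a pthread mutex models the page lock, and fault_map_page() and
truncate_page() are made-up names for the do_no_page side and the
truncate/invalidate side.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
struct mapping { int dummy; };

struct page {
    pthread_mutex_t lock;     /* models the page lock */
    struct mapping *mapping;  /* NULL once truncated or invalidated */
    bool mapped;              /* models "a pte points at this page" */
};

/* Fault side: install the page only if it still belongs to the mapping. */
static bool fault_map_page(struct page *page, struct mapping *expected)
{
    bool ok = false;

    pthread_mutex_lock(&page->lock);
    if (page->mapping == expected) {  /* recheck under the page lock */
        page->mapped = true;
        ok = true;
    }
    pthread_mutex_unlock(&page->lock);
    return ok;  /* false: the page was truncated, caller retries the fault */
}

/* Truncate/invalidate side: unmap and detach under the same lock. */
static void truncate_page(struct page *page)
{
    pthread_mutex_lock(&page->lock);
    page->mapped = false;   /* models unmapping this page from ptes */
    page->mapping = NULL;   /* page no longer belongs to the file */
    pthread_mutex_unlock(&page->lock);
}
```

Because both sides serialize on the same per-page lock, the fault side
either sees the page before truncation starts (and truncation then unmaps
it) or sees mapping == NULL and backs off, so no separate truncate counter
is needed in this model.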
Thanks,
Nick
--
SUSE Labs, Novell Inc.