From: Linus Torvalds <torvalds@osdl.org>
To: Rik van Riel <riel@redhat.com>, Andrea Arcangeli <andrea@suse.de>
Cc: William Lee Irwin III <wli@holomorphy.com>,
	Hugh Dickins <hugh@veritas.com>, Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@osdl.org>,
	Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: anon_vma RFC2
Date: Sat, 13 Mar 2004 08:18:48 -0800 (PST)
Message-ID: <Pine.LNX.4.58.0403130759150.1045@ppc970.osdl.org>
In-Reply-To: <Pine.LNX.4.44.0403130942200.15971-100000@chimarrao.boston.redhat.com>



Ok, guys, how about this anon-page suggestion?

I'm a bit nervous about the complexity issues in Andrea's current setup, 
so I've been thinking about Rik's per-mm thing. And I think that there is 
one very simple approach, which should work fine, and should have minimal 
impact on the existing setup exactly because it is so simple.

Basic setup:
 - each anonymous page is associated with exactly _one_ virtual address, 
   in an "anon memory group". 

   We put the virtual address (shifted down by PAGE_SHIFT) into 
   "page->index". We put the "anon memory group" pointer into 
   "page->mapping". We have a PAGE_ANONYMOUS flag to tell the
   rest of the world about this.

 - the anon memory group has a list of all mm's that it is associated 
   with.

 - an "execve()" creates a new "anon memory group" and drops the old one.

 - a mm copy operation just increments the reference count and adds the 
   new mm to the mm list for that anon memory group.
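To make the bookkeeping concrete, here is a toy user-space model of the setup described above. It is a sketch under stated assumptions, not kernel code: struct mm, struct anongroup, struct page and the helpers group_add(), group_put(), anon_execve(), anon_fork() and page_add_anon() are hypothetical stand-ins, the group's mm list is a plain singly linked list, and PAGE_SHIFT is assumed to be 12 (4 KiB pages).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define PAGE_SHIFT 12			/* assumption: 4 KiB pages */

struct mm;

/* Hypothetical "anon memory group": a refcount plus a list of mms. */
struct anongroup {
	int refcount;
	struct mm *mms;			/* singly linked list of member mms */
};

/* Toy stand-in for mm_struct: only what this sketch needs. */
struct mm {
	struct anongroup *group;
	struct mm *next_in_group;
};

/* Toy stand-in for struct page. */
struct page {
	bool anonymous;			/* models the PAGE_ANONYMOUS flag */
	void *mapping;			/* anon pages: the anongroup */
	unsigned long index;		/* anon pages: address >> PAGE_SHIFT */
};

/* Add mm to the group's list and take a reference. */
static void group_add(struct anongroup *g, struct mm *mm)
{
	mm->group = g;
	mm->next_in_group = g->mms;
	g->mms = mm;
	g->refcount++;
}

/* Unlink mm from the group and drop its reference; free on last put. */
static void group_put(struct anongroup *g, struct mm *mm)
{
	struct mm **pp = &g->mms;

	assert(g->refcount > 0);
	while (*pp != mm)		/* mm is assumed to be on the list */
		pp = &(*pp)->next_in_group;
	*pp = mm->next_in_group;
	mm->group = NULL;
	if (--g->refcount == 0)
		free(g);
}

/* execve(): drop the old group (if any) and start a fresh one. */
static void anon_execve(struct mm *mm)
{
	struct anongroup *g = calloc(1, sizeof(*g));

	if (!g)
		abort();
	if (mm->group)
		group_put(mm->group, mm);
	group_add(g, mm);
}

/* fork(): the child joins the parent's group; no new group is made. */
static void anon_fork(struct mm *parent, struct mm *child)
{
	group_add(parent->group, child);
}

/* Tie an anonymous page to its group and its single virtual address. */
static void page_add_anon(struct page *page, struct mm *mm,
			  unsigned long address)
{
	page->anonymous = true;
	page->mapping = mm->group;
	page->index = address >> PAGE_SHIFT;
}
```

For example, after anon_execve(&parent) and anon_fork(&parent, &child), both mms point at the same group with a reference count of 2; a later anon_execve(&child) moves the child into a fresh group and leaves the parent's group with one reference.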

So now to do reverse mapping, we can take a page, and do

	if (PageAnonymous(page)) {
		struct anongroup *mmlist = (struct anongroup *)page->mapping;
		unsigned long address = page->index << PAGE_SHIFT;
		struct mm_struct *mm;

		list_for_each_entry(mm, &mmlist->anon_mms, anon_mm) {
			/*
			 * Look up the page in the page tables of "mm" at
			 * "address". Most of the time we may not even need
			 * to look up the vma at all, just walk the page tables.
			 */
		}
	} else {
		/* Shared page: look it up using the inode's vma list */
	}

The above all works 99% of the time.
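The reverse-mapping loop can be modelled in user space roughly as follows. This is a sketch, assuming a flat per-mm array in place of real page tables (NR_VPAGES slots) and hypothetical names (rmap_anon_page(), struct mm, struct anongroup); it only covers the PageAnonymous() branch. What it illustrates is that each mm in the group costs exactly one page-table probe, at page->index, and no vma lookup.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define NR_VPAGES  16			/* assumption: tiny toy address space */

struct page;

/* Toy mm: a flat "page table" plus a link in its anon group's list. */
struct mm {
	struct page *pte[NR_VPAGES];	/* virtual page number -> page */
	struct mm *next_in_group;
};

struct anongroup {
	struct mm *mms;			/* all mms sharing this group */
};

struct page {
	bool anonymous;
	void *mapping;			/* anon: struct anongroup * */
	unsigned long index;		/* anon: address >> PAGE_SHIFT */
};

/*
 * Reverse-map an anonymous page: visit every mm in its group and probe
 * exactly one page-table slot, the one for page->index. Returns how many
 * mms actually map the page there; *probes counts page-table lookups.
 */
static int rmap_anon_page(const struct page *page, int *probes)
{
	const struct anongroup *group = page->mapping;
	unsigned long vpn = page->index;	/* == address >> PAGE_SHIFT */
	int mapped = 0;

	assert(page->anonymous);
	*probes = 0;
	for (const struct mm *mm = group->mms; mm; mm = mm->next_in_group) {
		(*probes)++;
		if (vpn < NR_VPAGES && mm->pte[vpn] == page)
			mapped++;	/* e.g. where an unmap would act */
	}
	return mapped;
}
```

With a parent and two forked children in one group, where one child has already COW'ed the page, the walk makes three probes and finds two mappings; the COW'ed child is probed but simply does not match.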

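The only problem is mremap() after a fork(): a page that parent and child
still share can record only one virtual address in page->index, so if one
of them moves it, the other one's address no longer matches. Hell, we know
mremap() is a special case anyway, so let's just add a few lines to
copy_one_pte(), which basically does: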

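	if (PageAnonymous(page) && page_count(page) > 1) {
		/* Still mapped by another mm (e.g. after fork): copy it */
		newpage = alloc_page(GFP_HIGHUSER);
		/* (NULL check and dropping the old page's ref omitted) */
		copy_user_highpage(newpage, page, address);
		page = newpage;
	}
	/* Move the page to the new address */
	page->index = address >> PAGE_SHIFT;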

and now we have zero special cases.
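A toy model of the mremap() case shows why the copy keeps the scheme consistent. It is a sketch, not the actual copy_one_pte() change: move_one_page() and index_consistent() are hypothetical helpers, page->count here counts only pte mappings (the real page_count() also includes other references), and a flat array stands in for the page tables. The invariant checked is the one the rmap walk relies on: every pte that maps a page sits at the slot given by that page's index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define NR_VPAGES  16			/* assumption: tiny toy address space */

/* Toy page: contents, a mapping count, and its single anon address. */
struct page {
	int count;			/* number of ptes mapping it */
	unsigned long index;		/* address >> PAGE_SHIFT */
	unsigned char data[PAGE_SIZE];
};

struct mm {
	struct page *pte[NR_VPAGES];
};

/*
 * Toy mremap of one page in one mm: if another mm still maps the page,
 * give this mm a private copy first, so the shared page keeps its index.
 */
static void move_one_page(struct mm *mm, unsigned long old_addr,
			  unsigned long new_addr)
{
	struct page *page = mm->pte[old_addr >> PAGE_SHIFT];

	mm->pte[old_addr >> PAGE_SHIFT] = NULL;
	if (page->count > 1) {
		struct page *newpage = malloc(sizeof(*newpage));

		if (!newpage)
			abort();
		memcpy(newpage->data, page->data, PAGE_SIZE);
		newpage->count = 1;
		page->count--;		/* this mm no longer maps the old page */
		page = newpage;
	}
	/* Move the page to the new address */
	page->index = new_addr >> PAGE_SHIFT;
	mm->pte[new_addr >> PAGE_SHIFT] = page;
}

/* rmap invariant: each mapped page's index equals its pte slot. */
static bool index_consistent(const struct mm *mm)
{
	for (unsigned long vpn = 0; vpn < NR_VPAGES; vpn++)
		if (mm->pte[vpn] && mm->pte[vpn]->index != vpn)
			return false;
	return true;
}
```

If the child moves a page it still shares with the parent, it gets a private copy at the new address and the parent's page keeps its old index; a second move of that now-private page just updates its index without copying.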

The above should work very well. In most cases the "anongroup" will be 
very small, and even when it's large (if somebody does a ton of forks 
without any execve's), we only have _one_ address to check, and that is 
pretty fast. A high-performance server would use threads, anyway. (And 
quite frankly, _any_ algorithm will have this issue. Even rmap will have 
exactly the same loop, although rmap skips any vm's where the page might 
have been COW'ed or removed).

The extra COW in mremap() seems benign. Again, it should usually not even 
trigger.

What do you think? To me, this seems to be a really simple approach..

		Linus
