From: Linus Torvalds <torvalds@transmeta.com>
To: Andrew Morton <akpm@digeo.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Rusty Russell <rusty@rustcorp.com.au>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [patch] 'sticky pages' support in the VM, futex-2.5.38-C5
Date: Thu, 26 Sep 2002 15:45:43 -0700 (PDT)
Message-ID: <Pine.LNX.4.33.0209261533230.1345-100000@penguin.transmeta.com>
In-Reply-To: <3D938588.4508FDF@digeo.com>


Actually, thinking more on this..

Ingo Molnar wrote: 
>
>  - if the faulting context is a non-owner (ie. the fork()-ed child), then
>    the normal COW path is taken - new page allocated and installed.
> 
>  - if the faulting context is the owner, then the pte chain is walked, and
>    the new page is installed into every 'other' pte. This needs a
>    cross-CPU single-page TLB flush though. The TLB flush could be
>    optimized if we had a way to get to the mapping MM's of the individual
>    pte chain entries - is this possible?

Actually, we don't have to do it this way. My preferred solution would be 
to make the pinning data structure be a special one with a callback (which 
also means that you do _not_ have to re-use the LRU list), and what we do 
is that when we're getting called back the futex code just updates to the 
new physical page instead.

So the data structures would look something like this:

	struct page_change_struct {
		unsigned long address;
		struct mm_struct *vm;
		struct list_head list;
		void (*callback)(struct page_change_struct *data, struct page *new);
	};

	struct list_head page_change_struct_hash[HASHSIZE];

and then when we pin a page, we do

	/* This is part of the 
	struct page_change_struct pinned_data;

	pinned_data.address = virtual_address;
	pinned_data.vm = current->mm;
	pinned_data.callback = futex_cow_callback;

	insert_pin_page(page, &pinned_data);
		.. this does a hash on address, inserts it into the
		   page_change_struct_hash table, and is done..

unpinning does:

	remove_pin_page(page, &pinned_data);
		.. this just does a "list_del(&pinned_data.list);" ..

and COW does:

	.. hash the COW address, look up the page_change_struct_hash,
	   search if the page/vm tuple exists in the index, and if it
	   does, call the callback()..

and then the "callback" function just updates the page information in the 
futex block directly - as if it was looked up anew.
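
The pin / unpin / COW steps above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not kernel code: HASHSIZE and PAGE_SHIFT are picked arbitrarily, "struct page" and "struct mm_struct" are dummy stand-ins, a singly linked chain replaces struct list_head, and notify_cow() and struct futex_like are hypothetical names that do not appear in the proposal.

```c
#include <assert.h>
#include <stddef.h>

#define HASHSIZE   64  /* assumed: the mail leaves HASHSIZE unspecified */
#define PAGE_SHIFT 12  /* assumed: 4 KiB pages */

/* Userspace stand-ins for the kernel types; not the real definitions. */
struct page { unsigned long pfn; };
struct mm_struct { int id; };

struct page_change_struct {
	unsigned long address;
	struct mm_struct *vm;
	struct page_change_struct *next;  /* stands in for struct list_head */
	void (*callback)(struct page_change_struct *data, struct page *new_page);
};

static struct page_change_struct *page_change_struct_hash[HASHSIZE];

static unsigned int hash_address(unsigned long address)
{
	return (unsigned int)((address >> PAGE_SHIFT) % HASHSIZE);
}

void insert_pin_page(struct page *page, struct page_change_struct *pin)
{
	unsigned int h;

	(void)page;  /* this sketch hashes on the address only */
	assert(pin != NULL && pin->callback != NULL);
	h = hash_address(pin->address);
	pin->next = page_change_struct_hash[h];
	page_change_struct_hash[h] = pin;
}

void remove_pin_page(struct page *page, struct page_change_struct *pin)
{
	struct page_change_struct **pp;

	(void)page;
	pp = &page_change_struct_hash[hash_address(pin->address)];
	for (; *pp; pp = &(*pp)->next) {
		if (*pp == pin) {
			*pp = pin->next;  /* unlink, like list_del() */
			pin->next = NULL;
			return;
		}
	}
}

/* Hypothetical hook for the COW path; returns the number of callbacks run. */
int notify_cow(struct mm_struct *vm, unsigned long address, struct page *new_page)
{
	struct page_change_struct *p, *next;
	int called = 0;

	for (p = page_change_struct_hash[hash_address(address)]; p; p = next) {
		next = p->next;  /* read first, in case the callback unpins p */
		if (p->vm == vm &&
		    (p->address >> PAGE_SHIFT) == (address >> PAGE_SHIFT)) {
			p->callback(p, new_page);
			called++;
		}
	}
	return called;
}

/* Hypothetical futex-side user: caches the physical page behind a pin. */
struct futex_like {
	struct page_change_struct pin;  /* first member, so the cast below is valid */
	struct page *page;
};

void futex_cow_callback(struct page_change_struct *data, struct page *new_page)
{
	struct futex_like *f = (struct futex_like *)data;

	f->page = new_page;  /* as if the page had been looked up anew */
}
```

A caller would fill in pin.address, pin.vm and pin.callback, call insert_pin_page(), and from then on a notify_cow() for the same mm and the same page-aligned address moves the cached page pointer over to the new page. A COW in a different mm (the fork()-ed child, say) leaves it alone.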

This has the advantage that it works without any cross-CPU TLB flushing, and 
that other users (not just futexes) can also register themselves for 
callbacks if somebody COWs a page they had pinned.

We could extend it to work for unmapping etc too if we wanted (ie anybody 
who caches a virtual->physical translation for a specific page can always 
ask for an "invalidate this particular page mapping" event).

I really like this approach. 

[ Of course I do, since I thought it up. All my ideas are absolutely 
  brilliant, until somebody points out why they can't work. The locking
  might be interesting, but the most obvious locking seems to be to have 
  some per-hash thing. ]
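
One way to read "some per-hash thing" is a lock per hash bucket, so that pin, unpin and COW lookups on pages that hash differently never contend. The following is a rough userspace sketch of that idea only, using a C11 atomic_flag as a spinlock. HASHSIZE, PAGE_SHIFT, struct pin_bucket and the function names are all assumptions made for illustration, and the npins counter merely stands in for the bucket's list of pins.

```c
#include <assert.h>
#include <stdatomic.h>

#define HASHSIZE   64  /* assumed size */
#define PAGE_SHIFT 12  /* assumed: 4 KiB pages */

struct pin_bucket {
	atomic_flag lock;  /* per-bucket spinlock */
	int npins;         /* stand-in for the list of pins in this bucket */
};

static struct pin_bucket buckets[HASHSIZE];

void pin_buckets_init(void)
{
	int i;

	for (i = 0; i < HASHSIZE; i++) {
		atomic_flag_clear(&buckets[i].lock);
		buckets[i].npins = 0;
	}
}

/* Lock and return the bucket that covers this address. */
struct pin_bucket *lock_bucket(unsigned long address)
{
	struct pin_bucket *b = &buckets[(address >> PAGE_SHIFT) % HASHSIZE];

	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;  /* spin until the current holder releases the bucket */
	return b;
}

void unlock_bucket(struct pin_bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}
```

Each of the insert, remove and COW-lookup paths would take lock_bucket() on the address it hashes, touch only that bucket's list, and call unlock_bucket(). Whether callbacks may run with the bucket lock held is exactly the "interesting" part that this sketch does not try to answer.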

			Linus


Thread overview: 14+ messages
2002-09-26 11:30 [patch] 'sticky pages' support in the VM, futex-2.5.38-C5 Ingo Molnar
2002-09-26 15:01 ` Linus Torvalds
2002-09-26 22:09 ` Andrew Morton
2002-09-26 22:32   ` Linus Torvalds
2002-09-27  7:53     ` Ingo Molnar
2002-09-26 22:45   ` Linus Torvalds [this message]
2002-09-26 22:56     ` Linus Torvalds
2002-09-27 11:11     ` Ingo Molnar
2002-10-04 22:47 ` Jamie Lokier
2002-10-04 23:20   ` Linus Torvalds
     [not found] <200209261501.g8QF1pc02251@penguin.transmeta.com>
2002-09-26 15:27 ` Ingo Molnar
2002-09-26 16:48   ` Alan Cox
2002-09-27  8:05 Martin Wirth
2002-09-27  9:27 ` Ingo Molnar
