From: Hugh Dickins <hugh@veritas.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	Rik van Riel <riel@redhat.com>,
	Ray Fucillo <fucillo@intersystems.com>,
	linux-kernel@vger.kernel.org
Subject: Re: process creation time increases linearly with shmem
Date: Fri, 26 Aug 2005 12:49:35 +0100 (BST)
Message-ID: <Pine.LNX.4.61.0508261220230.4697@goblin.wat.veritas.com>
In-Reply-To: <Pine.LNX.4.58.0508252055370.3317@g5.osdl.org>

On Thu, 25 Aug 2005, Linus Torvalds wrote:
> On Fri, 26 Aug 2005, Nick Piggin wrote:
> > 
> > > Skipping MAP_SHARED in fork() sounds like a good idea to me...
> > 
> > Indeed. Linus, can you remember why we haven't done this before?
> 
> Hmm. Historical reasons. Also, if the child ends up needing it, it will 
> now have to fault them in.
> 
> That said, I think it's a valid optimization. Especially as the child 
> _probably_ doesn't need it (ie there's at least some likelihood of an 
> execve() or similar).

I agree, seems a great idea to me (sulking because I was too dumb
to get it, even when Nick and Andi first posted their patches).

It won't just save on the copying at fork time, it'll save on
undoing it all again when the child mm is torn down for exec.

The refaulting will hurt the performance of something: let's
just hope that something doesn't turn out to be a show-stopper.

I see some flaws in the various patches posted, including Rik's.
Here's another version - doing it inside copy_page_range, so this
kind of vma special-casing is over in mm/ rather than kernel/.

No point in testing vm_file: the vm_flags cover the cases.
Test VM_MAYSHARE rather than VM_SHARED, to include the
never-can-be-written MAP_SHARED cases too.  Must exclude VM_NONLINEAR:
their ptes are essential for defining the file offsets.  Must exclude
VM_RESERVED: faults on remap_pfn_range areas would usually put in anon
zeroed pages instead of the driver pages.  Or perhaps that would be
better as a test against VM_IO, or vma->vm_ops->nopage?
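
The flag test described above can be tried out in user space.  What
follows is only a sketch: the DEMO_* bit values and the function name
demo_skips_pte_copy() are hypothetical stand-ins, not taken from any
kernel header.  It shows the "mask, then compare" idiom: the vma
qualifies only if the may-share bit is set and none of the three
excluded bits is.

```c
#include <stdbool.h>

/* Hypothetical stand-in flag bits for this sketch only; the real
 * VM_* values live in the kernel headers and are not reproduced here. */
#define DEMO_VM_SHARED    0x01UL
#define DEMO_VM_MAYSHARE  0x02UL
#define DEMO_VM_HUGETLB   0x04UL
#define DEMO_VM_NONLINEAR 0x08UL
#define DEMO_VM_RESERVED  0x10UL

/*
 * Return true when pte copying could be skipped: the may-share bit is
 * set and none of the hugetlb, nonlinear or reserved bits is set.
 * Bits outside the mask (such as DEMO_VM_SHARED) do not affect the result.
 */
static bool demo_skips_pte_copy(unsigned long vm_flags)
{
	const unsigned long mask = DEMO_VM_MAYSHARE | DEMO_VM_HUGETLB |
				   DEMO_VM_NONLINEAR | DEMO_VM_RESERVED;

	return (vm_flags & mask) == DEMO_VM_MAYSHARE;
}
```

Comparing the masked value against the single wanted bit does both
checks at once: one expression confirms the required bit is present
and every excluded bit is absent.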

Having to exclude the VM_NONLINEAR seems rather a shame, since those
are always shared and likely enormous.  The InfiniBand people's idea 
of a way for the app to set VM_DONTCOPY (to avoid rdma get_user_pages
problems) becomes attractive as a way for apps to speed their forks.
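
For illustration only, here is how an app might ask for that behaviour
from user space.  This sketch assumes the madvise(MADV_DONTFORK)
interface found in later kernels (Linux 2.6.16 onwards), which is not
part of this patch or of the kernel under discussion; the helper name
demo_map_shared_dontfork() is hypothetical.

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS and MADV_DONTFORK on glibc */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map a shared anonymous region and mark it so that fork() does not
 * give it to the child at all.  Returns the mapping, or NULL if
 * either mmap() or madvise() fails.
 * Assumption: a Linux kernel that supports MADV_DONTFORK.
 */
static void *demo_map_shared_dontfork(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	if (madvise(p, len, MADV_DONTFORK) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}
```

Note the trade-off: a child forked afterwards has no mapping at that
address, so touching it there gets a SIGSEGV rather than a refault.
This only suits apps whose children never use the region.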

Hugh

--- 2.6.13-rc7/mm/memory.c	2005-08-24 11:13:41.000000000 +0100
+++ linux/mm/memory.c	2005-08-26 10:09:50.000000000 +0100
@@ -498,6 +498,14 @@ int copy_page_range(struct mm_struct *ds
 	unsigned long addr = vma->vm_start;
 	unsigned long end = vma->vm_end;
 
+	/*
+	 * Assume the fork will probably exec: don't waste time copying
+	 * ptes where a page fault will fill them correctly afterwards.
+	 */
+	if ((vma->vm_flags & (VM_MAYSHARE|VM_HUGETLB|VM_NONLINEAR|VM_RESERVED))
+								== VM_MAYSHARE)
+		return 0;
+
 	if (is_vm_hugetlb_page(vma))
 		return copy_hugetlb_page_range(dst_mm, src_mm, vma);
 
