linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* "heuristic overcommit" and fork()
@ 2009-02-11 19:26 David CHAMPELOVIER
  2009-02-13  1:36 ` [PATCH] fix vmaccnt at fork (Was Re: "heuristic overcommit" and fork()) KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 4+ messages in thread
From: David CHAMPELOVIER @ 2009-02-11 19:26 UTC (permalink / raw)
  To: linux-kernel

Hi,

Recently, I was unable to fork() a 38 GB process on a system with 64 GB RAM
and no swap.
Having a look at the kernel source, I surprisingly found that in "heuristic
overcommit" mode, fork() always checks that there is enough memory to
duplicate process memory.

As far as I know, overcommit was introduced in the kernel for several
reasons, and fork() was one of them, since applications often exec() just
after fork(). I know fork() is not the most judicious choice in this case,
but well, this is the way many applications are written.

Moreover, I can read in the proc man page that in "heuristic overcommit
mode", "obvious overcommits of address space are refused". I do not think
fork() is an obvious overcommit, that's why I would expect fork() to be
always accepted in this mode.

So, is there a reason why fork() checks for available memory in "heuristic
mode" ?

Thanks in advance.

--
David Champelovier


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] fix vmaccnt at fork (Was Re: "heuristic overcommit" and fork())
  2009-02-11 19:26 "heuristic overcommit" and fork() David CHAMPELOVIER
@ 2009-02-13  1:36 ` KAMEZAWA Hiroyuki
  2009-02-16 14:32   ` Mel Gorman
  0 siblings, 1 reply; 4+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-02-13  1:36 UTC (permalink / raw)
  To: mel; +Cc: linux-kernel, David CHAMPELOVIER, linux-mm, akpm

On Wed, 11 Feb 2009 20:26:32 +0100
"David CHAMPELOVIER" <david@champelovier.com> wrote:

> Hi,
> 
> Recently, I was unable to fork() a 38 GB process on a system with 64 GB RAM
> and no swap.
> Having a look at the kernel source, I surprisingly found that in "heuristic
> overcommit" mode, fork() always checks that there is enough memory to
> duplicate process memory.
> 
> As far as I know, overcommit was introduced in the kernel for several
> reasons, and fork() was one of them, since applications often exec() just
> after fork(). I know fork() is not the most judicious choice in this case,
> but well, this is the way many applications are written.
> 
> Moreover, I can read in the proc man page that in "heuristic overcommit
> mode", "obvious overcommits of address space are refused". I do not think
> fork() is an obvious overcommit, that's why I would expect fork() to be
> always accepted in this mode.
> 
> So, is there a reason why fork() checks for available memory in "heuristic
> mode" ?
> 

fork() is used for duplicate process and it means to duplicate memory space.
Because of Copy-On-Write, the page will not be used acutally. But, it's not
different from mmap() case. In that case, overcommit_guess compares
requested size and size of free memory for all that we use demand paging.
So, the behavior is not surprizing.  For notifing the kernel can assume
exec-is-called-after-fork, we may need some flags or paramater.

But, hmm.., there is something strange, following. Mel, how do you think ?
==

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Vm accounting at fork() should use the same logic as mmap().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/mm.h |    2 ++
 kernel/fork.c      |    3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

Index: mmotm-2.6.29-Feb11/kernel/fork.c
===================================================================
--- mmotm-2.6.29-Feb11.orig/kernel/fork.c
+++ mmotm-2.6.29-Feb11/kernel/fork.c
@@ -301,7 +301,8 @@ static int dup_mmap(struct mm_struct *mm
 			continue;
 		}
 		charge = 0;
-		if (mpnt->vm_flags & VM_ACCOUNT) {
+		if (accountable_mapping(mpnt->vm_file, mpnt->vm_flags) &&
+			mpnt->vm_flags & VM_ACCOUNT) {
 			unsigned int len = (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT;
 			if (security_vm_enough_memory(len))
 				goto fail_nomem;
Index: mmotm-2.6.29-Feb11/include/linux/mm.h
===================================================================
--- mmotm-2.6.29-Feb11.orig/include/linux/mm.h
+++ mmotm-2.6.29-Feb11/include/linux/mm.h
@@ -1047,6 +1047,8 @@ extern void free_bootmem_with_active_reg
 typedef int (*work_fn_t)(unsigned long, unsigned long, void *);
 extern void work_with_active_regions(int nid, work_fn_t work_fn, void *data);
 extern void sparse_memory_present_with_active_regions(int nid);
+extern int accountable_mapping(struct file *file, unsigned int vmflags);
+
 #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
 
 #if !defined(CONFIG_ARCH_POPULATES_NODE_MAP) && \




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] fix vmaccnt at fork (Was Re: "heuristic overcommit" and fork())
  2009-02-13  1:36 ` [PATCH] fix vmaccnt at fork (Was Re: "heuristic overcommit" and fork()) KAMEZAWA Hiroyuki
@ 2009-02-16 14:32   ` Mel Gorman
  2009-02-17  0:08     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 4+ messages in thread
From: Mel Gorman @ 2009-02-16 14:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-kernel, David CHAMPELOVIER, linux-mm, akpm

On Fri, Feb 13, 2009 at 10:36:55AM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 11 Feb 2009 20:26:32 +0100
> "David CHAMPELOVIER" <david@champelovier.com> wrote:
> 
> > Hi,
> > 
> > Recently, I was unable to fork() a 38 GB process on a system with 64 GB RAM
> > and no swap.
> > Having a look at the kernel source, I surprisingly found that in "heuristic
> > overcommit" mode, fork() always checks that there is enough memory to
> > duplicate process memory.
> > 
> > As far as I know, overcommit was introduced in the kernel for several
> > reasons, and fork() was one of them, since applications often exec() just
> > after fork(). I know fork() is not the most judicious choice in this case,
> > but well, this is the way many applications are written.
> > 
> > Moreover, I can read in the proc man page that in "heuristic overcommit
> > mode", "obvious overcommits of address space are refused". I do not think
> > fork() is an obvious overcommit, that's why I would expect fork() to be
> > always accepted in this mode.
> > 
> > So, is there a reason why fork() checks for available memory in "heuristic
> > mode" ?
> > 
> 
> fork() is used for duplicate process and it means to duplicate memory space.
> Because of Copy-On-Write, the page will not be used acutally. But, it's not
> different from mmap() case.

Pretty much. At fork() time, you cannot know if the process is going to
exec or not.

> In that case, overcommit_guess compares
> requested size and size of free memory for all that we use demand paging.
> So, the behavior is not surprizing.  For notifing the kernel can assume
> exec-is-called-after-fork, we may need some flags or paramater.
> 

There already is one of sorts. Use madvise(MADV_DONTFORK) on the large
memory regions so they don't get copied. If that doesn't work, it means
the accounting for the VMAs is being done in the wrong order.

> But, hmm.., there is something strange, following. Mel, how do you think ?
> ==
> 
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Vm accounting at fork() should use the same logic as mmap().
> 

This alters semantics in a fairly subtle manner and I think would break
counters as well.

accountable_mapping() is used to determine if VM_ACCOUNT is set or
not. Once set, it gets accounted after that. even if the overcommit settings
change. Somewhat weirdly, the overcommit decisions at the time of mmap()
are reused at fork() even if the overcommit settings change.  This is odd
behaviour and arguably your patch could fix this anomoly. However, it
would make more sense to me to recalculate if VM_ACCOUNT should be set
or not rather than what you do here.

I think this patch also has subtle breakage. We could do something like;

1. mmap(), VM_ACCOUNT not set due to overcommit settings
2. overcommit set to strict
3. fork()
4. check flags, note that VM_ACCOUNT would have been used, account but
   VM_ACCOUNT is still not set
5. child exits, VM_ACCOUNT not set so reserves are not given back

So reserves can constantly go up and never down. It would require root but
a bad program could eventually push the reserves up to the size of physical
memory without any of the memory actually being used.


> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/mm.h |    2 ++
>  kernel/fork.c      |    3 ++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> Index: mmotm-2.6.29-Feb11/kernel/fork.c
> ===================================================================
> --- mmotm-2.6.29-Feb11.orig/kernel/fork.c
> +++ mmotm-2.6.29-Feb11/kernel/fork.c
> @@ -301,7 +301,8 @@ static int dup_mmap(struct mm_struct *mm
>  			continue;
>  		}
>  		charge = 0;
> -		if (mpnt->vm_flags & VM_ACCOUNT) {
> +		if (accountable_mapping(mpnt->vm_file, mpnt->vm_flags) &&
> +			mpnt->vm_flags & VM_ACCOUNT) {
>  			unsigned int len = (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT;
>  			if (security_vm_enough_memory(len))
>  				goto fail_nomem;
> Index: mmotm-2.6.29-Feb11/include/linux/mm.h
> ===================================================================
> --- mmotm-2.6.29-Feb11.orig/include/linux/mm.h
> +++ mmotm-2.6.29-Feb11/include/linux/mm.h
> @@ -1047,6 +1047,8 @@ extern void free_bootmem_with_active_reg
>  typedef int (*work_fn_t)(unsigned long, unsigned long, void *);
>  extern void work_with_active_regions(int nid, work_fn_t work_fn, void *data);
>  extern void sparse_memory_present_with_active_regions(int nid);
> +extern int accountable_mapping(struct file *file, unsigned int vmflags);
> +

accountable_mapping() is a static inline in mm/mmap.c so I'd be
surprised if this compiles.

>  #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
>  
>  #if !defined(CONFIG_ARCH_POPULATES_NODE_MAP) && \
> 
> 

NAK.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] fix vmaccnt at fork (Was Re: "heuristic overcommit" and fork())
  2009-02-16 14:32   ` Mel Gorman
@ 2009-02-17  0:08     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 4+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-02-17  0:08 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-kernel, David CHAMPELOVIER, linux-mm, akpm

On Mon, 16 Feb 2009 14:32:32 +0000
Mel Gorman <mel@csn.ul.ie> wrote:

> >  #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
> >  
> >  #if !defined(CONFIG_ARCH_POPULATES_NODE_MAP) && \
> > 
> > 
> 
> NAK.
> 
Ok, I'm wrong. thanks.

-Kame


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-02-17  0:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-11 19:26 "heuristic overcommit" and fork() David CHAMPELOVIER
2009-02-13  1:36 ` [PATCH] fix vmaccnt at fork (Was Re: "heuristic overcommit" and fork()) KAMEZAWA Hiroyuki
2009-02-16 14:32   ` Mel Gorman
2009-02-17  0:08     ` KAMEZAWA Hiroyuki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).