* [PATCH] VM system in 2.4.16 doesn't try hard enough for user memory...
@ 2001-12-06  1:54 S. Parker
  2001-12-06 12:48   ` Stephen C. Tweedie
  2001-12-07 22:47   ` S. Parker
  0 siblings, 2 replies; 5+ messages in thread
From: S. Parker @ 2001-12-06  1:54 UTC (permalink / raw)
  To: linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1773 bytes --]

Hi,

I'm interested in comments and feedback on this patch.


Attached below is "memstride.c", a simple program that grows its process
to the largest amount of VM the system can make available and scribbles
in all of it.  Actually, it scribbles in all of it several times.

Under at least 2.4.14 -> 2.4.16, the VM system *always* over-commits to
memstride, even on an otherwise idle system, and ends up killing it.
This is wrong.  It should be possible for memstride to be told when
it has over-stepped the size of the system's total VM resources, by
having sbrk() return -1 (out of memory).


Also attached is my proposed fix for this problem.  It has the following
changes:

1.  Do a better job estimating how much VM is available
         vm_enough_memory() was changed to take the sum of all free RAM
         and all free swap, subtract up to 1/8th of physical RAM (but not
         more than 16MB) as a reserve for system buffers to prevent deadlock,
         and compare this to the request.  If the VM request is <= the
         available free stuff, then we're set.

2.  Be willing to sleep for memory chunks larger than 8 pages.
         __alloc_pages had an uncommented piece of code, that I couldn't
         see any reason to have.  It doesn't matter how big the piece of
         memory is--if we're low, and it's a sleepable request, we should
         sleep.  Now it does.  (Can anyone explain to me why this code was
         added originally?)

The combination of these two changes makes it so that memstride passes.
Although memstride is a contrived example, it was contrived to parallel
behavior that was seen in many situations, with many different real
processes in normal use.  This fix allows those uncontrived programs
to run reliably.





[-- Attachment #2: memstride.c --]
[-- Type: text/plain, Size: 3034 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/user.h>
#include <sys/resource.h>
#include <unistd.h>

int
scan_mem(int *base, int size)
{
	int sum = 0;

	while (size > 0) {
		sum += *base++;
		size -= sizeof (int);
	}
	return sum;	/* return the sum so the scan isn't optimized away */
}

void
usage_report(struct rusage *prev_ru, int npages)
{
	struct rusage ru;
	float fltim;
	float ofltim;
	int newline = 0;

	getrusage(RUSAGE_SELF, &ru);
	fltim = ru.ru_utime.tv_sec;
	fltim += ((float)ru.ru_utime.tv_usec)/1.0e06;
	ofltim = prev_ru->ru_utime.tv_sec;
	ofltim += ((float)prev_ru->ru_utime.tv_usec)/1.0e06;
	printf("user %.2f", fltim - ofltim);
	fltim = ru.ru_stime.tv_sec;
	fltim += ((float)ru.ru_stime.tv_usec)/1.0e06;
	ofltim = prev_ru->ru_stime.tv_sec;
	ofltim += ((float)prev_ru->ru_stime.tv_usec)/1.0e06;
	printf(" sys %.2f", fltim - ofltim);
	printf(" reclaims: %d faults %d swaps: %d in/out %d/%d csw %d/%d\n",
	    ru.ru_minflt - prev_ru->ru_minflt,
	    ru.ru_majflt - prev_ru->ru_majflt,
	    ru.ru_nswap - prev_ru->ru_nswap,
	    ru.ru_inblock - prev_ru->ru_inblock,
	    ru.ru_oublock - prev_ru->ru_oublock,
	    ru.ru_nvcsw - prev_ru->ru_nvcsw,
	    ru.ru_nivcsw - prev_ru->ru_nivcsw);
	if (npages == 0)
		return; /* should not happen */
	if (ru.ru_minflt - prev_ru->ru_minflt > 0) {
		printf("minor flts/pg: %d ",
		    (ru.ru_minflt - prev_ru->ru_minflt)/npages);
		newline++;
	}
	if (ru.ru_majflt - prev_ru->ru_majflt > 0) {
		printf("major flts/pg: %d ",
		    (ru.ru_majflt - prev_ru->ru_majflt)/npages);
		newline++;
	}
	if (newline) {
		printf("\n");
	}
}

#define SZ2PG(x)	(((x) + PAGE_SIZE - 1)/PAGE_SIZE)
int
main(int argc, char *argv[])
{
	int size = 512;
	int tot_size = 0;
	int loops = 0;
	int *p;
	int *base = (int *)sbrk(1);
	int max_size;
	struct rusage ru;

	if (argc > 1) {
		max_size = atoi(argv[1]);
		printf("Stop growing after crossing %d bytes\n", max_size);
	} else {
		max_size = 2000000000;
	}
	while ((p = sbrk(size)) != (int *)-1) {
		printf("Touching %d newly allocated bytes.  Size: %d pages/%d "
		    "bytes\n", size, SZ2PG(tot_size), tot_size);
		memset(p, loops, size);
		tot_size += size;
		scan_mem(base, tot_size);
		size <<= 1;
		loops++;
		if (tot_size + size > max_size)
			break;
	}
	printf("Begin doing smaller chunks to finish the job...\n");
	while (size > 4096) {
		size >>= 1;
		p = sbrk(size);
		if (p == (int *)-1)
			continue;
		printf("Touching %d newly allocated bytes.  Size: %d pages/%d "
		    "bytes\n", size, SZ2PG(tot_size), tot_size);
		memset(p, loops, size);	/* actually touch the new pages */
		tot_size += size;
	}
	printf("Finished growing memory.  %d pages\n", SZ2PG(tot_size));
	memset(&ru, 0, sizeof (ru));
	usage_report(&ru, SZ2PG(tot_size));
	getrusage(RUSAGE_SELF, &ru);
	printf("Now scan memory... total size: %d pages/%d bytes\n", SZ2PG(tot_size), tot_size);
	for (loops = 5; loops > 0; loops--) {
		scan_mem(base, tot_size);
		/*usage_report(&ru, SZ2PG(tot_size));*/
		/*getrusage(RUSAGE_SELF, &ru);*/
		printf("%d scans remaining...\n", loops);
	}
	printf("Overall usage:  (%d pages in this run)\n", SZ2PG(tot_size));
	memset(&ru, 0, sizeof (ru));
	usage_report(&ru, SZ2PG(tot_size));
	return 0;
}

[-- Attachment #3: DiffsVm2.txt --]
[-- Type: text/plain, Size: 3328 bytes --]

diff -r -c generic-2.4.16/mm/mmap.c vm-lapdog/mm/mmap.c
*** generic-2.4.16/mm/mmap.c	Sun Nov  4 10:17:20 2001
--- vm-lapdog/mm/mmap.c	Sun Dec  2 22:10:46 2001
***************
*** 62,93 ****
  	 */
  
  	unsigned long free;
  	
          /* Sometimes we want to use more memory than we have. */
  	if (sysctl_overcommit_memory)
  	    return 1;
  
- 	/* The page cache contains buffer pages these days.. */
- 	free = atomic_read(&page_cache_size);
- 	free += nr_free_pages();
- 	free += nr_swap_pages;
- 
- 	/*
- 	 * This double-counts: the nrpages are both in the page-cache
- 	 * and in the swapper space. At the same time, this compensates
- 	 * for the swap-space over-allocation (ie "nr_swap_pages" being
- 	 * too small.
- 	 */
- 	free += swapper_space.nrpages;
- 
  	/*
! 	 * The code below doesn't account for free space in the inode
! 	 * and dentry slab cache, slab cache fragmentation, inodes and
! 	 * dentries which will become freeable under VM load, etc.
! 	 * Lets just hope all these (complex) factors balance out...
  	 */
! 	free += (dentry_stat.nr_unused * sizeof(struct dentry)) >> PAGE_SHIFT;
! 	free += (inodes_stat.nr_unused * sizeof(struct inode)) >> PAGE_SHIFT;
  
  	return free > pages;
  }
--- 62,98 ----
  	 */
  
  	unsigned long free;
+ 	unsigned long free_pages;
  	
          /* Sometimes we want to use more memory than we have. */
  	if (sysctl_overcommit_memory)
  	    return 1;
  
  	/*
! 	 * Um... how about a better algorithm yet.
! 	 * Start with the theoretical maximum--size of RAM + size of swap.
! 	 * Reduce the RAM by 1/8, up to a maximum of MAX_OS_RESERVED_MEM
! 	 * (16MB) to allow for the OS to have *some* work to get stuff
! 	 * done with.
  	 */
! #define	MAX_OS_RESERVED_MEM	4096 /* in pages */
! 	free_pages = nr_free_pages();
! 	if (free_pages < (max_mapnr >> 3)) { /* free_pages < 1/8 physical */
! 		free = nr_swap_pages + swapper_space.nrpages; /* take swap free */
! 		/*
! 		 * Now compensate for the RAM short-fall...
! 		 */
! 		if (free > ((max_mapnr >> 3) - free_pages)) { /* swap > RAM shortfall */
! 			free -= (max_mapnr >> 3) - free_pages;
! 		} else {
! 			free = 0;
! 		}
! 	} else {
! 		free = free_pages - (max_mapnr >> 3);
! 		if (free > MAX_OS_RESERVED_MEM)
! 			free = MAX_OS_RESERVED_MEM;
! 		free += nr_swap_pages + swapper_space.nrpages;
! 	}
  
  	return free > pages;
  }
diff -r -c generic-2.4.16/mm/page_alloc.c vm-lapdog/mm/page_alloc.c
*** generic-2.4.16/mm/page_alloc.c	Mon Nov 19 16:35:40 2001
--- vm-lapdog/mm/page_alloc.c	Sun Dec  2 18:50:29 2001
***************
*** 389,397 ****
  		}
  	}
  
! 	/* Don't let big-order allocations loop */
! 	if (order > 3)
! 		return NULL;
  
  	/* Yield for kswapd, and try again */
  	current->policy |= SCHED_YIELD;
--- 389,405 ----
  		}
  	}
  
! 	/*
! 	 * So this is a sleep-able request for memory--we're presuming
! 	 * that vm_enough_memory() hasn't indicated more available VM
! 	 * than we actually have, or this process could end up looping
! 	 * here.  On the other hand, that's far less detrimental than
! 	 * killing processes!
! 	 *
! 	 * More ideal would be if vm_enough_memory() decreased the
! 	 * available memory counts, but allowed *that* request to
! 	 * claim those pages as if they were free.
! 	 */
  
  	/* Yield for kswapd, and try again */
  	current->policy |= SCHED_YIELD;

[-- Attachment #4: Type: text/plain, Size: 19 bytes --]

Cheers,

	~sparker


* Re: [PATCH] VM system in 2.4.16 doesn't try hard enough for user memory...
  2001-12-06  1:54 [PATCH] VM system in 2.4.16 doesn't try hard enough for user memory S. Parker
@ 2001-12-06 12:48   ` Stephen C. Tweedie
  2001-12-07 22:47   ` S. Parker
  1 sibling, 0 replies; 5+ messages in thread
From: Stephen C. Tweedie @ 2001-12-06 12:48 UTC (permalink / raw)
  To: S. Parker; +Cc: linux-mm, linux-kernel

Hi,

On Wed, Dec 05, 2001 at 05:54:44PM -0800, S. Parker wrote:
 
> Attached below is "memstride.c", a simple program that grows its process
> to the largest amount of VM the system can make available and scribbles
> in all of it.  Actually, it scribbles in all of it several times.
> 
> Under at least 2.4.14 -> 2.4.16, the VM system *always* over-commits to
> memstride, even on an otherwise idle system, and ends up killing it.
> This is wrong.  It should be possible for memstride to be told when
> it has over-stepped the size of the system's total VM resources, by
> having sbrk() return -1 (out of memory).

Yes, over-commit protection is far from perfect.  However, it's a
difficult problem to get right.

> Also attached is my proposed fix for this problem.  It has the following
> changes:
> 
> 1.  Do a better job estimating how much VM is available
>          vm_enough_memory() was changed to take the sum of all free RAM
>          and all free swap, subtract up to 1/8th of physical RAM (but not
>          more than 16MB) as a reserve for system buffers to prevent deadlock,
>          and compare this to the request.  If the VM request is <= the
>          available free stuff, then we're set.

That's still just a guesstimate: do you have any hard data to back
up the magic numbers here?

> 2.  Be willing to sleep for memory chunks larger than 8 pages.
>          __alloc_pages had an uncommented piece of code, that I couldn't
>          see any reason to have.  It doesn't matter how big the piece of
>          memory is--if we're low, and it's a sleepable request, we should
>          sleep.  Now it does.  (Can anyone explain to me why this code was
>          added originally?)

That's totally separate: *all* user VM allocations are done with
order-0 allocations, so this can't have any effect on VM overcommit.

Ultimately, your patch still doesn't protect against overcommit: if
you run two large, lazy-memory-using applications in parallel, you'll
still get each of them being told there's enough VM left at the time
of sbrk/mmap, and they will both later on find out at page fault time
that there's not enough memory to go round.

Cheers,
 Stephen


* Re: [PATCH] VM system in 2.4.16 doesn't try hard enough for user memory...
  2001-12-06 12:48   ` Stephen C. Tweedie
@ 2001-12-07 22:47   ` S. Parker
  -1 siblings, 0 replies; 5+ messages in thread
From: S. Parker @ 2001-12-07 22:47 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-mm, linux-kernel

Hi Stephen,

At 04:48 AM 12/6/2001 , Stephen C. Tweedie wrote:
>Yes, over-commit protection is far from perfect.  However, it's a
>difficult problem to get right.

Um, at what percentage is it a solution?  As long as it doesn't under-commit
by more than a small percentage of pages, IMHO it's an acceptable solution.


> > Also attached is my proposed fix for this problem.  It has the following
> > changes:
> >
> > 1.  Do a better job estimating how much VM is available
> >          vm_enough_memory() was changed to take the sum of all free RAM
> >          and all free swap, subtract up to 1/8th of physical RAM (but not
> >          more than 16MB) as a reserve for system buffers to prevent deadlock,
> >          and compare this to the request.  If the VM request is <= the
> >          available free stuff, then we're set.
>
>That's still just a guesstimate: do you have any hard data to back
>up the magic numbers here?

A guesstimate of what exactly?  What I did was verify, using top, that
most of the available VM got put to use.  (95%+ on my systems.)  Would
hard data such as the percentage of potential VM resources that went
unused impress you?

While you're right that I haven't done a concrete, deductive, and provably
correct sort of thing, I've done something which seems to me moderately
intuitive and empirically effective.  I recognize that some amount of memory
resources must remain available for the operating system to page in/out, and
otherwise do I/O with, and refuse to commit that to user processes.

Admittedly, better solutions are possible.  Although I haven't looked in
detail at the code in Eduardo Horvath's March 2000 patch proposal
["Really disabling overcommit."], I'd certainly agree that a patch with
an approach like that would be technically superior.  Is such a patch
being seriously considered for inclusion in 2.4?

For me this problem is a serious one.  It's trivial to hit in real life,
and get your process killed.  And I can't have that.  The OS needs to err
on the conservative side, or it's not a usable system for me.


> > 2.  Be willing to sleep for memory chunks larger than 8 pages.
> >          __alloc_pages had an uncommented piece of code, that I couldn't
> >          see any reason to have.  It doesn't matter how big the piece of
> >          memory is--if we're low, and it's a sleepable request, we should
> >          sleep.  Now it does.  (Can anyone explain to me why this code was
> >          added originally?)
>
>That's totally separate: *all* user VM allocations are done with
>order-0 allocations, so this can't have any effect on VM overcommit.

Well, I'll go back and check where they were coming from, but I *was* seeing
12-page allocation requests when running the test program I posted, on an
otherwise idle system.

But I still think my change, and my question, remain valid:  these are not
GFP_ATOMIC requests, so the kernel is allowed to sleep--why doesn't it?
Getting memory late is always better than being refused, it seems to me.
The code that prevents sleeping was touched recently (it had been checking
for order > 1; now it's order > 3, if my memory is correct).  My question
is:  why was this done, and why keep it at all?


>Ultimately, your patch still doesn't protect against overcommit: if
>you run two large, lazy memory using applications in parallel, you'll
>still get each of them being told there's enough VM left at the time
>of sbrk/mmap, and they will both later on find out at page fault time
>that there's not enough memory to go round.

Interestingly, even if I run many of my test program in parallel, it does
*not* over-commit.  (At least, it certainly never calls on the OOM killer.
Perhaps it is over-committing on some level, but if it is, I don't notice it...)

Do you have a test case which causes over-commit (and therefore OOM kill)
against my patch?

Thanks,

         ~sparker


