linux-kernel.vger.kernel.org archive mirror
* Re: Repost: could ia32 mmap() allocations grow downward?
@ 2001-12-13 18:36 Petr Vandrovec
  2001-12-13 18:03 ` Wayne Whitney
From: Petr Vandrovec @ 2001-12-13 18:36 UTC
  To: Wayne Whitney; +Cc: LKML

On 13 Dec 01 at 8:22, Wayne Whitney wrote:
> > So maybe MAGMA uses some API which it should not use under any
> > circumstances... Such as that you linked it with libc6 stdio.
> 
> Indeed.  How can I avoid the map at 0x40000000?  Must I avoid using
> certain glibc2 functions, and then link the executable carefully to leave
> out their initialization routines?  Or can I set some magic environment

It is caused by (in my opinion, stupid) code in
glibc-2.2.4/libio/libioP.h:ALLOC_BUF(), which unconditionally does
'mmap(0, ROUND_TO_PAGE(size), PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0)' instead of 'malloc(size)' when it finds that the underlying system
supports mmap.

If you linked Magma yourself, try adding:
---
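#include <sys/mman.h>   /* MAP_FAILED */
#include <unistd.h>     /* sbrk(), off_t */

/* Take every allocation straight from brk(); there is no matching free(). */
void* malloc(size_t len) {
    void *p = sbrk(len);
    return p == (void *) -1 ? NULL : p;
}
/* Serve anonymous mmap(0, ...) requests (such as stdio's) from the heap. */
void* __mmap(void* start, size_t len, int prot, int flags, int fd,
        off_t offset) {
    if (start == 0 && fd == -1) {
        void *p = malloc(len);
        return p ? p : MAP_FAILED;
    }
    return MAP_FAILED;   /* refuse all other mappings */
}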
---
into your project. It forces my 'int main(void) { printf("X\n"); pause(); return 0; }'
to use brk() instead of mmap() for its stdio buffers. Maybe we should move
to bug-glibc instead, as there is no way to make stdio honor mallopt()
parameters: it still insists on using mmap(), and I think that is a
glibc 2.2 bug.
                                                Petr Vandrovec
                                                vandrove@vc.cvut.cz

P.S.: I did some testing, and about 95% of mremap() calls target the
last VMA, so no VMA move is needed for them. But no Java was involved,
only the C/C++ programs I use: gcc, mc, perl.
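A small Linux-only sketch of why growing the last VMA is cheap: without MREMAP_MAYMOVE, mremap() may only extend a mapping where it already sits, which works when the pages just above it are free and fails when another VMA is in the way. The helper name grows_in_place() and the read-only "blocker" page are illustrative assumptions, not code from this thread.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if a one-page anonymous mapping grew to two
   pages in place, 0 if it could not grow without moving, -1 on setup failure.
   `blocked` decides whether the page right above it is occupied. */
int grows_in_place(int blocked)
{
    size_t pg = (size_t) sysconf(_SC_PAGESIZE);
    /* Reserve two pages, then release the upper one so it is free again. */
    char *p = mmap(NULL, 2 * pg, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    munmap(p + pg, pg);
    if (blocked) {
        /* Different protection keeps the blocker a separate VMA
           instead of letting it merge with p. */
        if (mmap(p + pg, pg, PROT_READ,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {
            munmap(p, pg);
            return -1;
        }
    }
    /* flags == 0: no MREMAP_MAYMOVE, so grow in place or fail. */
    void *q = mremap(p, pg, 2 * pg, 0);
    int in_place = (q != MAP_FAILED);
    munmap(p, 2 * pg);   /* drops p (grown or not) and any blocker page */
    return in_place;
}
```

On a typical Linux system grows_in_place(0) should return 1 and grows_in_place(1) should return 0; passing MREMAP_MAYMOVE in the second case would make it succeed by relocating the area instead.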

* Re: Repost: could ia32 mmap() allocations grow downward?
@ 2001-12-13 20:13 Petr Vandrovec
From: Petr Vandrovec @ 2001-12-13 20:13 UTC
  To: Wayne Whitney; +Cc: LKML

On 13 Dec 01 at 10:03, Wayne Whitney wrote:
> On Thu, 13 Dec 2001, Petr Vandrovec wrote:
> 
> > Maybe we should move to bug-glibc instead, as there is no way to force
> > stdio to not ignore mallopt() parameters, it still insist on using
> > mmap, and I think that it is a glibc2.2 bug.
> 
> OK, that makes sense for the glibc2 subthread of this discussion.  Would
> you mind submitting the bug report, as you have a better command of the
> issues than I do?  Or if you want, I can do it and just quote you.  :-)

If you can, please send the complaint yourself...
 
> > P.S.: I did some testing, and about 95% of mremap() allocations is
> > targeted to last VMA, so no VMA move is needed for them. But no Java
> > was part of picture, only c/c++ programs I use - gcc, mc, perl.
> 
> Ah, so this is important data.  It shows that the mmap() grows downward
> strategy will hurt the common case.  I don't have any handle on the
> magnitude of this effect, but if it is significant, then I would have to
> agree that supporting the legacy brk() apps is not as important as keeping
> mremap() of the last VMA cheap.  How expensive is moving a VMA, and how
> often do programs mremap()?

It is not that bad, as only the PTEs are moved, but... Currently, when code
calls mremap(), the same address is returned in 95% of cases, while after
the change mremap() would return a new address in 100% of cases, so a couple
of latent bugs could surface because of this change.
 
> How about the idea of modifying brk() (or adding an alternative) to move
> VMAs out of the way as necessary?  This way the negative impact (of moving
> VMAs) is only borne by the legacy brk() using app.  Or is there some other
> downside that I am missing?

You cannot move VMAs when the app does not request it through mremap(), as
you must tell the app about the new location of the area: the app can hold
a couple of pointers into this memory, so you cannot move it around without
the app being informed.
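When an app does ask for a move with MREMAP_MAYMOVE, the fix-up is its own job. A minimal sketch, assuming the app tracks its interior pointers itself (grow_and_fix() and rebase() are hypothetical helpers): each pointer is translated by its offset from the old base, and the old addresses are never dereferenced.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>

/* Translate a pointer into the old region to the same offset in the new one.
   Pure address arithmetic; the old pointer is not dereferenced. */
static void *rebase(void *ptr, void *old_base, void *new_base)
{
    return (char *) new_base + ((uintptr_t) ptr - (uintptr_t) old_base);
}

/* Grow an anonymous mapping, allowing the kernel to move it, and update one
   interior pointer. Returns the new base, or MAP_FAILED on error. */
void *grow_and_fix(void *base, size_t old_len, size_t new_len, char **interior)
{
    void *nb = mremap(base, old_len, new_len, MREMAP_MAYMOVE);
    if (nb != MAP_FAILED && nb != base)
        *interior = rebase(*interior, base, nb);
    return nb;
}
```

A real program would have to do this for every pointer it keeps into the area, which is why the kernel cannot move a mapping behind the app's back.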

And unfortunately brk() cannot simply skip over existing VMAs either, as
userspace remembers the latest value returned by brk(), adds the size to
it, and calls brk() with the result to grow the data segment. Since the app
decides the new brk() value, and the app does not know that some VMA sits
in the way, the kernel cannot do anything about it either, unfortunately.
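The user-space side of this can be sketched as follows (grow_data_segment() is a hypothetical name): the caller, not the kernel, computes the new break from the last value it saw, so a VMA sitting in that range can only make brk() fail.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <unistd.h>

/* Extend the data segment by `size` bytes the way a brk()-based allocator
   does. Returns the start of the new bytes, or NULL if brk() refuses. */
void *grow_data_segment(size_t size)
{
    void *old = sbrk(0);                   /* last known program break */
    if (old == (void *) -1)
        return NULL;
    void *want = (char *) old + size;      /* the app picks the new break */
    if (brk(want) != 0)                    /* kernel cannot pick another address */
        return NULL;
    return old;
}
```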
                                                Petr Vandrovec
                                                vandrove@vc.cvut.cz
                                                

* Re: Repost: could ia32 mmap() allocations grow downward?
@ 2001-12-13 11:27 Petr Vandrovec
  2001-12-13 16:22 ` Wayne Whitney
From: Petr Vandrovec @ 2001-12-13 11:27 UTC
  To: Wayne Whitney; +Cc: LKML

On 12 Dec 01 at 22:28, Wayne Whitney wrote:

> BTW, if one were trying to port some code that uses brk() directly and
> even frees memory that way, then it seems that with glibc's malloc(), one
> could make it work by instructing malloc() always to use mmap().
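Under glibc, the usual knob for this is mallopt() with M_MMAP_THRESHOLD. A sketch, assuming glibc (prefer_mmap_for_malloc() is a hypothetical wrapper): a threshold of 0 should make malloc() use mmap() whenever it needs fresh memory from the kernel, up to the M_MMAP_MAX limit. It does not affect code that calls brk() or mmap() itself, and glibc 2.2 stdio buffers ignored it, as Petr points out in this thread.

```c
#include <malloc.h>   /* glibc: mallopt(), M_MMAP_THRESHOLD */
#include <stdlib.h>

/* Ask glibc malloc() to prefer mmap() over extending the brk() heap.
   Returns mallopt()'s result: 1 on success, 0 on error. */
int prefer_mmap_for_malloc(void)
{
    return mallopt(M_MMAP_THRESHOLD, 0);
}
```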

> P.S.  I am 100% sure that the particular application of mine that started
> me thinking about this, MAGMA, uses its own allocator built on top of
> brk() and never calls malloc() itself.

If you have a legacy app, how does it come to use mmap? If I do not use
mmap, I have nothing mapped at 1GB:

#include <unistd.h>
int main(void) { sleep(10); brk((void *)0xBF000000); pause(); return 0; }

/proc/`pidof x`/maps says during sleep(10):

08048000-080a1000 r-xp 00000000 03:03 230941   /usr/src/linus/x
080a1000-080a5000 rw-p 00058000 03:03 230941   /usr/src/linus/x
080a5000-080a6000 rwxp 00000000 00:00 0
bffff000-c0000000 rwxp 00000000 00:00 0

and after brk() (which succeeded after I did 'ulimit -d unlimited'
and 'echo 1 >/proc/sys/vm/overcommit_memory') I see:

08048000-080a1000 r-xp 00000000 03:03 230941   /usr/src/linus/x
080a1000-080a5000 rw-p 00058000 03:03 230941   /usr/src/linus/x
080a5000-bf000000 rwxp 00000000 00:00 0
bffff000-c0000000 rwxp 00000000 00:00 0

So maybe MAGMA uses some API which it should not use under any
circumstances... For example, maybe you linked it with libc6 stdio.
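The same check can be done from inside a program. A Linux-only sketch (is_mapped() is a hypothetical helper) that parses the start-end column of /proc/self/maps; on a 2.4-era ia32 box, is_mapped(0x40000000) would show whether something, such as a stdio buffer, already sits at 1GB. Modern kernels use a different layout, so the addresses will differ.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if addr lies inside any mapping of this process, 0 if not,
   -1 if /proc/self/maps cannot be opened. */
int is_mapped(uintptr_t addr)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    char line[512];
    int found = 0, at_line_start = 1;
    while (!found && fgets(line, sizeof line, f)) {
        unsigned long lo, hi;
        /* Each line starts with "start-end" in hex; end is exclusive. */
        if (at_line_start && sscanf(line, "%lx-%lx", &lo, &hi) == 2 &&
            addr >= lo && addr < hi)
            found = 1;
        /* A line longer than the buffer arrives in pieces; only the
           first piece carries the address range. */
        at_line_start = strchr(line, '\n') != NULL;
    }
    fclose(f);
    return found;
}
```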
                                                    Best regards,
                                                        Petr Vandrovec
                                                        vandrove@vc.cvut.cz
                                                        

* Re: Repost: could ia32 mmap() allocations grow downward?
@ 2001-12-12 21:47 Petr Vandrovec
  2001-12-13  6:28 ` Wayne Whitney
From: Petr Vandrovec @ 2001-12-12 21:47 UTC
  To: Wayne Whitney; +Cc: linux-kernel

On 12 Dec 01 at 12:02, Wayne Whitney wrote:

> o Pick a maximum stack size S and change the kernel so the "mmap()
>   without MAP_FIXED" region starts at 0xC0000000 - S and grows downwards. 

How will you pick S? 8MB? 128MB? Today you can have 1GB of brk plus 2GB of
stack+mmap; after the change you would have 2.9GB of brk+mmap, but only a
128MB stack. And if you change your malloc implementation, today you can
have up to a 2GB stack, or up to 3GB of mmap. After your change the stack
is limited to 128MB, and you cannot do anything about that except move the
stack somewhere else during libc startup, and in that case a couple of
argv[] assumptions that setproctitle and others make are no longer valid.
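The trade-off above in rough numbers, as a small model (all names are illustrative; sizes are in MiB and counted from address 0 like the figures above, so the roughly 128MB below the executable is ignored):

```c
#include <stdint.h>

/* Rough ia32 user address space, in MiB. */
enum {
    USER_TOP_MIB      = 3072,  /* 0xC0000000: end of user space */
    UNMAPPED_BASE_MIB = 1024   /* 0x40000000: TASK_UNMAPPED_BASE today */
};

/* Today: brk() stops at 1GB; stack and mmap() share the 2GB above it. */
uint32_t today_brk_mib(void)        { return UNMAPPED_BASE_MIB; }
uint32_t today_stack_mmap_mib(void) { return USER_TOP_MIB - UNMAPPED_BASE_MIB; }

/* Proposal: the stack gets a fixed s_mib at the top, and brk() (growing up)
   and mmap() (growing down) split whatever lies below it. */
uint32_t proposed_stack_mib(uint32_t s_mib)    { return s_mib; }
uint32_t proposed_brk_mmap_mib(uint32_t s_mib) { return USER_TOP_MIB - s_mib; }
```

With S = 128, proposed_brk_mmap_mib(128) is 2944 MiB, the "2.9GB" above, while the stack can never exceed the chosen S.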

Another problem is mremap(). Because of the way apps work, you will have
to move VMAs around much more often, since you cannot grow your last VMA
upward without moving it. And if you shrink your last block, you will
leave a gap.
 
> This seems ideal, as it allows the balance between the mmap() region and
> the brk() region to vary for each process, automatically.  What changes
> would be required to the kernel to implement this properly and
> efficiently?  Is there some downside I am missing?

An app cannot safely call brk() directly, as libc may use brk() to
implement malloc(), and libraries can call malloc(). So you have to build
your own allocator on top of what brk() returns, and this allocator must
never release memory back to the system, as that could also release chunks
you do not own. Writing your allocator on top of malloc()ed areas is a much
better idea.
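A minimal sketch of that recommendation, assuming a simple bump ("arena") design (struct arena, arena_alloc() and arena_release() are hypothetical names): memory comes from malloc() in blocks of at least 64KiB, so releasing the arena only frees blocks the allocator itself obtained.

```c
#include <stddef.h>
#include <stdlib.h>

struct arena_block {
    struct arena_block *next;
    size_t used, cap;               /* bytes handed out / bytes available */
    max_align_t data[];             /* payload, suitably aligned */
};

struct arena { struct arena_block *head; };

void *arena_alloc(struct arena *a, size_t n)
{
    const size_t align = sizeof(max_align_t);
    n = (n + align - 1) / align * align;         /* keep results aligned */
    struct arena_block *b = a->head;
    if (!b || b->cap - b->used < n) {
        size_t cap = n > 65536 ? n : 65536;      /* at least 64KiB per block */
        b = malloc(sizeof *b + cap);
        if (!b)
            return NULL;
        b->next = a->head;
        b->used = 0;
        b->cap = cap;
        a->head = b;
    }
    void *p = (char *) b->data + b->used;
    b->used += n;
    return p;
}

/* Frees only blocks this arena malloc()ed; nothing else is touched. */
void arena_release(struct arena *a)
{
    while (a->head) {
        struct arena_block *next = a->head->next;
        free(a->head);
        a->head = next;
    }
}
```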
                                                Best regards,
                                                    Petr Vandrovec
                                                    vandrove@vc.cvut.cz
                                                    
P.S.: I do not think that your app calls brk() directly. I think that
your app calls malloc() with some small size, and libc decides to use
brk() instead of mmap(). In that case it is a bug in your libc that it
does not fall back to mmap() after brk() fails.
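The fallback described above, sketched with a hypothetical helper get_core(): try to extend the brk() heap first and, if the kernel refuses (for example because another mapping sits right above the heap), take the memory from an anonymous mmap() instead of failing the allocation.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Get `size` fresh bytes; *used_mmap reports which path succeeded.
   Returns NULL only if both brk() and mmap() fail. */
void *get_core(size_t size, int *used_mmap)
{
    void *p = sbrk((intptr_t) size);
    if (p != (void *) -1) {
        *used_mmap = 0;
        return p;
    }
    /* brk() refused: fall back to an anonymous mapping anywhere. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *used_mmap = 1;
    return p;
}
```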

* Repost: could ia32 mmap() allocations grow downward?
@ 2001-12-12 20:02 Wayne Whitney
From: Wayne Whitney @ 2001-12-12 20:02 UTC
  To: LKML

Hello,

I posted this message five days ago to nary a comment, so perhaps I did
something wrong.  Any comments would be appreciated, other than "go buy
64-bit hardware."  :-)

Cheers,
Wayne


Although pretty much a kernel newbie, I wanted to bring up an idea that I
first saw here a year ago but which received no commentary. 

Namely, from time to time an ia32 user will write about running out of
user address space in one way or another.  The standard answer is that
under ia32 Linux, the 32-bit address space for a program of size P is
carved up as follows:

Start Address	Map Contents			Growth Direction

0x08000000	the executable's code segment	upwards
0x08000000 + P	the executable's data segment	upwards
0x08000000 + 2P	the program's heap		upwards
0x40000000	mmap() without MAP_FIXED	upwards
0xBFFFFFFF	the stack			downwards
0xC0000000	kernel space			upwards
0xFFFFFFFF	top of the address space

Thus a typical problem is that a program that wants to manage its own heap
(using the brk() system call instead of malloc() from libc) will have a
maximum heap size of 0x38000000 - 2P.  Or a program that heavily uses
mmap() will only have 0x80000000 of mmap() address space.
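Both limits follow directly from the table. A small model (the names mirror the kernel constants, but this is only user-space arithmetic, with code and data each assumed to be P bytes as in the table):

```c
#include <stdint.h>

/* Values from the table above. */
#define EXE_BASE            0x08000000u  /* executable's code segment */
#define TASK_UNMAPPED_BASE  0x40000000u  /* mmap() without MAP_FIXED */
#define TASK_SIZE           0xC0000000u  /* start of kernel space */

/* Largest brk() heap for a program whose code and data segments are each
   p bytes: the heap starts at EXE_BASE + 2p and must stay below
   TASK_UNMAPPED_BASE, giving 0x38000000 - 2p. */
uint32_t max_brk_heap(uint32_t p)
{
    uint32_t heap_start = EXE_BASE + 2 * p;
    return heap_start < TASK_UNMAPPED_BASE ? TASK_UNMAPPED_BASE - heap_start : 0;
}

/* Room between TASK_UNMAPPED_BASE and the kernel, shared by mmap() and
   the downward-growing stack. */
uint32_t mmap_and_stack_space(void)
{
    return TASK_SIZE - TASK_UNMAPPED_BASE;
}
```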

Various workarounds are usually proposed, such as:

o Modify the program to use malloc(), or tune the malloc() allocation
  strategy parameters, as malloc() knows about the two distinct memory
  allocation mechanisms, brk() below 0x40000000 and mmap() above it.

o Change the value of TASK_UNMAPPED_BASE in the kernel from its default 
  of 0x40000000.

o Change __PAGE_OFFSET (and the associated value in vmlinux.lds) to 
  0xE0000000 to reduce the kernel space to 512MB.

The alternative idea (not mine) which I'm curious about is:

o Pick a maximum stack size S and change the kernel so the "mmap()
  without MAP_FIXED" region starts at 0xC0000000 - S and grows downwards. 

This seems ideal, as it allows the balance between the mmap() region and
the brk() region to vary for each process, automatically.  What changes
would be required to the kernel to implement this properly and
efficiently?  Is there some downside I am missing?

FWIW, I made a very simple, very naive attempt at doing this about a year
ago, against 2.2.19-prex.  The patch is included below, and it booted OK
for me at the time.  I'm sure I made various poor choices in the patch,
though, having not had the Big Picture.


diff -ru linux-2.2.19-pre7/include/asm-i386/processor.h linux-2.2.19-pre7-hack2/include/asm-i386/processor.h
--- linux-2.2.19-pre7/include/asm-i386/processor.h	Tue Jan  9 20:26:35 2001
+++ linux-2.2.19-pre7-hack2/include/asm-i386/processor.h	Sat Jan 13 11:58:00 2001
@@ -163,10 +163,22 @@
  */
 #define TASK_SIZE	(PAGE_OFFSET)
 
-/* This decides where the kernel will search for a free chunk of vm
- * space during mmap's.
+/* 
+ * When looking for a free chunk of vm space during mmap's, the kernel
+ * will search upwards from TASK_UNMAPPED_BASE (the usual algorithm),
+ * unless TASK_UNMAPPED_CEILING is defined, in which case it will
+ * search downwards from TASK_UNMAPPED_CEILING to TASK_UNMAPPED_FLOOR.
  */
 #define TASK_UNMAPPED_BASE	(TASK_SIZE / 3)
+
+/* 
+ * We need to allow room for the stack to grow downard from TASK_SIZE,
+ * I really have no idea how large it can get, so I arbitrarily picked
+ * 128MB.  Also, I'm not so sure where to stop searching and give up,
+ * so I pick 128MB, which seems to be where exectuables get loaded.
+ */
+#define TASK_UNMAPPED_CEILING   (TASK_SIZE - 128 * 1024 * 1024)
+#define TASK_UNMAPPED_FLOOR     (128 * 1024 * 1024)
 
 /*
  * Size of io_bitmap in longwords: 32 is ports 0-0x3ff.
diff -ru linux-2.2.19-pre7/mm/mmap.c linux-2.2.19-pre7-hack2/mm/mmap.c
--- linux-2.2.19-pre7/mm/mmap.c	Sat Dec  9 21:29:39 2000
+++ linux-2.2.19-pre7-hack2/mm/mmap.c	Sat Jan 13 11:58:00 2001
@@ -365,6 +365,22 @@
 
 	if (len > TASK_SIZE)
 		return 0;
+#ifdef TASK_UNMAPPED_CEILING
+	if (!addr)
+		addr = TASK_UNMAPPED_CEILING - len;
+
+	do { 
+		/* align addr downards; PAGE_ALIGN aligns it upwards */ 
+		addr = addr&PAGE_MASK; 
+		vmm = find_vma(current->mm,addr);
+		/* At this point:  (!vmm || addr < vmm->vm_end). */	  
+		if (!vmm || addr + len <= vmm->vm_start)
+			return addr;
+		addr = vmm->vm_start - len;
+	} while (addr >= TASK_UNMAPPED_FLOOR);
+
+	return 0;
+#else
 	if (!addr)
 		addr = TASK_UNMAPPED_BASE;
 	addr = PAGE_ALIGN(addr);
@@ -377,6 +393,7 @@
 			return addr;
 		addr = vmm->vm_end;
 	}
+#endif
 }
 
 #define vm_avl_empty	(struct vm_area_struct *) NULL
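The search loop in the patch can be exercised outside the kernel. Below is a user-space model of the same top-down first-fit search over a hypothetical sorted array of ranges standing in for VMAs (struct range, find_range() and find_gap_topdown() are illustrative names, and a 4KB page is assumed). One deliberate difference: before computing vm_start - len it checks that the result cannot drop below the floor, since with unsigned arithmetic that subtraction would wrap if a mapping started below len.

```c
#include <stddef.h>

#define SIM_PAGE_SIZE 4096UL                 /* assumed ia32 page size */
#define SIM_PAGE_MASK (~(SIM_PAGE_SIZE - 1))

/* A mapped range [start, end), standing in for a VMA. */
struct range { unsigned long start, end; };

/* First range whose end lies above addr, like find_vma();
   `r` must be sorted by address and non-overlapping. */
static const struct range *find_range(const struct range *r, size_t n,
                                      unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr < r[i].end)
            return &r[i];
    return NULL;
}

/* Top-down first fit in [floor_addr, ceil_addr), following the patch.
   Returns 0 when no gap of len bytes exists; assumes floor_addr <= ceil_addr. */
unsigned long find_gap_topdown(const struct range *r, size_t n,
                               unsigned long len,
                               unsigned long floor_addr, unsigned long ceil_addr)
{
    if (len > ceil_addr - floor_addr)
        return 0;
    unsigned long addr = ceil_addr - len;
    for (;;) {
        addr &= SIM_PAGE_MASK;             /* align downwards */
        if (addr < floor_addr)
            return 0;
        const struct range *v = find_range(r, n, addr);
        if (!v || addr + len <= v->start)
            return addr;                   /* [addr, addr + len) is free */
        /* Next candidate ends where v starts; stop before it would fall
           below floor_addr (this also avoids unsigned wrap-around). */
        if (v->start < floor_addr + len)
            return 0;
        addr = v->start - len;
    }
}
```

With the executable and stack mapped as in the /proc maps output earlier in the thread, and the patch's bounds of 0x08000000 and 0xB8000000, a one-page request lands just below the ceiling; each mapping in the way pushes the candidate below that mapping's start.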








Thread overview: 12+ messages
2001-12-13 18:36 Repost: could ia32 mmap() allocations grow downward? Petr Vandrovec
2001-12-13 18:03 ` Wayne Whitney
  -- strict thread matches above, loose matches on Subject: below --
2001-12-13 20:13 Petr Vandrovec
2001-12-13 11:27 Petr Vandrovec
2001-12-13 16:22 ` Wayne Whitney
2001-12-13 16:54   ` Wayne Whitney
2001-12-13 17:10   ` Hugh Dickins
2001-12-13 17:38     ` Wayne Whitney
2001-12-13 18:02       ` Hugh Dickins
2001-12-12 21:47 Petr Vandrovec
2001-12-13  6:28 ` Wayne Whitney
2001-12-12 20:02 Wayne Whitney
