linux-kernel.vger.kernel.org archive mirror
* Memory allocation
@ 2001-02-27 16:55 Ivan Stepnikov
  2001-02-27 17:51 ` Wayne Whitney
  2001-02-27 17:55 ` Peter Samuelson
  0 siblings, 2 replies; 8+ messages in thread
From: Ivan Stepnikov @ 2001-02-27 16:55 UTC (permalink / raw)
  To: linux-kernel

    Hello!
I have run into a problem: a single process cannot allocate more than 2GB of
memory. The kernel is 2.4.0, compiled with CONFIG_HIGHMEM4G=y and
CONFIG_HIGHMEM=y.

As far as I know, a Linux process on i386 has a 32-bit address space, which
means that about 3GB of memory should actually be available.

I tried calling getrlimit(). It reports only 2GB of available memory, and there
is no way to increase it.
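
For reference, a minimal sketch of such a query. The message does not say which resource was passed to getrlimit(); RLIMIT_AS (the address-space limit) is an assumption here, and the helper name is hypothetical.

```cpp
#include <sys/resource.h>

#include <string>

// Hypothetical helper: describe the soft address-space limit of the
// calling process. RLIMIT_AS is an assumption; the post does not name
// the resource that was queried.
std::string describe_as_limit() {
    struct rlimit lim = {};
    if (getrlimit(RLIMIT_AS, &lim) != 0) {
        return "getrlimit failed";
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        return "unlimited";
    }
    return std::to_string(static_cast<unsigned long long>(lim.rlim_cur)) +
           " bytes";
}
```

Printing the returned string from main() shows either "unlimited" or a byte count. A resource limit is separate from the address-space layout discussed in the replies, so raising it would not by itself add usable address space.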

Can you tell me whether there is any solution? Or is it that a Linux process on
i386 cannot use more than 2GB of memory in practice? I don't see a reason for
that: the kernel sources use unsigned long everywhere for memory
allocation.


--
Regards,
Ivan Stepnikov.



* Re: Memory allocation
  2001-02-27 16:55 Memory allocation Ivan Stepnikov
@ 2001-02-27 17:51 ` Wayne Whitney
  2001-02-27 17:55 ` Peter Samuelson
  1 sibling, 0 replies; 8+ messages in thread
From: Wayne Whitney @ 2001-02-27 17:51 UTC (permalink / raw)
  To: iv, linux-kernel

In mailing-lists.linux-kernel, you wrote:

>I have run into a problem: a single process cannot allocate more than 2GB of
>memory.

This is a problem that I have run into myself.  I am no kernel expert,
but I think I understand this issue.  Here is how the standard
kernel maps the 4GB 32-bit address space from the process's point of
view:

128MB	 program executable mapped (twice)
128MB +	 program heap
1GB	 mmap() starts here
3GB	 kernel

You can see this for yourself by looking at /proc/pid/maps, where pid
is the PID of the process in question.  

Now glibc's malloc() uses mmap() for 'large' allocations, and the mmap()
region runs from 1GB up to the kernel at 3GB, so you get
2GB maximum memory.  The way around this is to change the various
numbers in the left-hand column.

For example, you can try the patch per-process-3.5G-IA32-no-PAE-1, at
/pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test11-pre5/ on
ftp.kernel.org.  This will make the kernel space start at 3.5G (so
that CONFIG_4G is required to use more than 384MB of physical RAM) and
the mmap() space start at 224MB, giving 3G288MB of address space for
mmap().  Note that only 96MB is then available for {two copies of your
executable plus your program heap}.
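
The figures above can be checked with a little arithmetic. This is only a sketch of the numbers quoted in this message; the names (mmap_window, MB, GB) are made up for illustration and do not come from the patch.

```cpp
#include <cstdint>

constexpr std::uint64_t MB = 1024ull * 1024;
constexpr std::uint64_t GB = 1024 * MB;

// Size of the window between the address where mmap() starts and the
// address where kernel space starts.
constexpr std::uint64_t mmap_window(std::uint64_t mmap_base,
                                    std::uint64_t kernel_base) {
    return kernel_base - mmap_base;
}

// Standard layout from the table: mmap() at 1GB, kernel at 3GB.
static_assert(mmap_window(1 * GB, 3 * GB) == 2 * GB,
              "standard layout leaves 2GB for mmap()");

// Patched layout: mmap() at 224MB, kernel at 3.5GB.
static_assert(mmap_window(224 * MB, 3 * GB + 512 * MB) == 3 * GB + 288 * MB,
              "patched layout leaves 3G288MB for mmap()");

// Executable and heap start at 128MB and must end before mmap() at 224MB.
static_assert(224 * MB - 128 * MB == 96 * MB,
              "96MB left for executable plus heap");
```

All three checks are evaluated at compile time, so the file compiling is the result.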

This is more or less the most you can do, but your needs may be best
suited by something in between. The above patch is quite short, so it
is easy to figure out how to do that.  The only hidden restriction is
that the size of the kernel space must be a power of 2: 2GB, 1GB,
512MB, etc.  As explained to me, this is so that the kernel can easily
test a pointer to see whether it is to kernel space or user space.
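
To illustrate why a power-of-2 size makes such a test cheap: if kernel space is the top `size` bytes of the 32-bit address space, the test reduces to a single mask comparison. This is only an illustration of the property, not code from the kernel sources, and the actual check in the kernel may differ. The function names are hypothetical.

```cpp
#include <cstdint>

// True if `size` is a non-zero power of two.
constexpr bool is_power_of_two(std::uint32_t size) {
    return size != 0 && (size & (size - 1)) == 0;
}

// Assumes kernel space is the top `size` bytes of a 32-bit address space
// and that `size` is a power of two. An address then lies in kernel space
// exactly when all of its bits above log2(size) are set.
constexpr bool in_kernel_space(std::uint32_t addr, std::uint32_t size) {
    const std::uint32_t mask = ~(size - 1);
    return (addr & mask) == mask;
}

// 1GB of kernel space: the boundary is 0xC0000000.
static_assert(is_power_of_two(0x40000000u), "1GB is a power of two");
static_assert(in_kernel_space(0xC0000000u, 0x40000000u), "first kernel byte");
static_assert(!in_kernel_space(0xBFFFFFFFu, 0x40000000u), "last user byte");

// 512MB of kernel space: the boundary is 0xE0000000.
static_assert(in_kernel_space(0xE0000000u, 0x20000000u), "first kernel byte");
static_assert(!in_kernel_space(0xDFFFFFFFu, 0x20000000u), "last user byte");
```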

Cheers,
Wayne






* Re: Memory allocation
  2001-02-27 16:55 Memory allocation Ivan Stepnikov
  2001-02-27 17:51 ` Wayne Whitney
@ 2001-02-27 17:55 ` Peter Samuelson
  1 sibling, 0 replies; 8+ messages in thread
From: Peter Samuelson @ 2001-02-27 17:55 UTC (permalink / raw)
  To: Ivan Stepnikov; +Cc: linux-kernel


[Ivan Stepnikov]
> I tried to call getrlimit(). It shows only 2G available memory and
> there is no way to increase it.

Right.  Architectural limit.  There needs to be some room in the
address space for kernel stuff, I/O, etc -- in Linux at least, having
to play with your page tables every single time you enter a system call
or IRQ handler would be considered a Bad Thing.

> Can you tell me whether there is any solution?

a) If you have that much memory, maybe you need a 64-bit CPU.
b) fork() and do IPC.  It's the Unix Way.

Peter


* RE: Memory Allocation
  2007-04-17  2:17 Memory Allocation Brian D. McGrew
  2007-04-17  7:06 ` Eric Dumazet
@ 2007-04-18  4:47 ` David Schwartz
  1 sibling, 0 replies; 8+ messages in thread
From: David Schwartz @ 2007-04-18  4:47 UTC (permalink / raw)
  To: linux-kernel


> My test machine is a Dell Precision 490 with dual 5140 processors and
> 3GB of RAM.  If I reduce kMaxSize to (2048 * 2048 * 236) it works.
> However, I need to allocate an array of char that is (2048 * 2048 * 256)
> and maybe even as large as (2048 * 2048 * 512).
>
> Obviously I have enough physical memory in the box to do this.  However,
> I suspect that I'm running out of page table entries.  Please correct
> me if I'm wrong, but if I allocate (2048 * 2048 * 236) it works.  When I
> increase it to 256 or 512 it fails, and it is my suspicion that I just
> don't have enough kernel memory to allocate this much memory in
> user space.

It is unreasonable to expect single allocations this large to succeed on a
32-bit OS. Either get a 64-bit OS or use a number of smaller allocations.

You may want to use mmap'ed files instead of malloc'ed memory. You can then
mmap however many files you can at once, and unmap and remap them as needed.

DS





* Re: Memory Allocation
  2007-04-17  2:17 Memory Allocation Brian D. McGrew
@ 2007-04-17  7:06 ` Eric Dumazet
  2007-04-18  4:47 ` David Schwartz
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2007-04-17  7:06 UTC (permalink / raw)
  To: Brian D. McGrew; +Cc: linux-kernel

Brian D. McGrew wrote:
> Good evening gents!
> 
> I need some help in allocating memory and understanding how the system
> allocates memory with physical versus virtual page tables.  Please
> consider the following snippet of code.  Please, no wisecracks about bad
> code; it was written in 30 seconds in haste :-)
> 
> #include <atomic>
> #include <iostream>
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <pthread.h>
> #include <sys/types.h>	// u_long
> #include <unistd.h>	// sleep()
> 
> const static u_long kMaxSize = (2048 * 2048 * 256);	// 1GB per thread
> 
> void *msg(void *ptr);
> // Incremented by both worker threads, so it has to be atomic.
> static std::atomic<u_long> threads_done(0);
> 
> int
> main(int argc, char *argv[])
> {
>      pthread_t thread1;
>      pthread_t thread2;
> 
>      const char *message1 = "Thread 1";
>      const char *message2 = "Thread 2";
> 
>      int iret1;
>      int iret2;
> 
>      iret1 = pthread_create(&thread1, NULL, msg, (void *) message1);
>      iret2 = pthread_create(&thread2, NULL, msg, (void *) message2);
> 
>      if (iret1 != 0 || iret2 != 0) {
> 	std::cerr << "pthread_create failed" << std::endl;
> 	exit(1);
>      }
> 
>     //pthread_join(thread1, NULL);
>     //pthread_join(thread2, NULL);
> 
>     while (threads_done.load() < 2) {
> 	std::cout << "Threads complete: " << threads_done.load() << std::endl;
> 	sleep(3);
>     }
> 
>     exit(0);
> }
> 
> void *
> msg(void *ptr)
> {
>     const char *message = (const char *) ptr;
> 
>     //
>     // One bank per thread of 256 4MP image buffers: 1GB per thread,
>     // 2GB in total.
>     //
>     char *buffer = new char[kMaxSize];
> 
>     u_long max = kMaxSize;
> 
>     //
>     // Init each buffer to 'something'.
>     //
>     for (u_long inx = 0; inx < max; inx++) {
> 	if (inx % 102400000 == 0) {
> 	    std::cout << message << ": Index: " << inx << std::endl;
> 	}
> 
>     	buffer[inx] = inx;
>     }
> 
>     delete[] buffer;	// allocated with new[], so not free()
>     threads_done++;
>     return NULL;
> }
> 
> My test machine is a Dell Precision 490 with dual 5140 processors and
> 3GB of RAM.  If I reduce kMaxSize to (2048 * 2048 * 236) it works.
> However, I need to allocate an array of char that is (2048 * 2048 * 256)
> and maybe even as large as (2048 * 2048 * 512).
> 
> Obviously I have enough physical memory in the box to do this.  However,
> I suspect that I'm running out of page table entries.  Please correct
> me if I'm wrong, but if I allocate (2048 * 2048 * 236) it works.  When I
> increase it to 256 or 512 it fails, and it is my suspicion that I just
> don't have enough kernel memory to allocate this much memory in
> user space.
> 
> Because of a piece of 3rd party hardware, I'm forced to run the kernel
> in the 4GB memory model.  What I need to be able to do is allocate an
> array of char (2048 * 2048 * (up to 512)) in user space *** AND *** I
> need the addresses that I get back to be contiguous; that's just the way
> my 3rd party hardware works.
> 
> I'm inclined to believe that this is not specifically a Linux problem
> but maybe an architecture problem???  But maybe there is some kind of
> workaround in the kernel for it???  I'd find it hard to believe that
> I'm the first one who ever needed to use this much memory.
> 
> I ran this same code on two different Macs.  One of them, a PowerBook G4
> with 4GB of RAM, was successful.  The other, a MacBook Pro with
> 4GB of RAM, failed.  Both are running OS 10.4.9.  And of course it
> runs just lovely on my Sun workstation with Solaris.  Thus, I'm thinking
> it's an Intel/x86 issue!
> 
> How the heck do I get past this problem in Linux on the x86 platform???
> 
> Thanks,

Hi Brian

Add these lines at the beginning of your msg() function:

char cmd[128];
snprintf(cmd, sizeof(cmd), "cat /proc/%d/maps", (int) getpid());	/* getpid() needs <unistd.h> */
system(cmd);

You'll see :

08048000-08049000 r-xp 00000000 08:07 23         /tmp/test1
08049000-0804a000 rw-p 00000000 08:07 23         /tmp/test1
0804a000-0806b000 rw-p 0804a000 00:00 0
40000000-40015000 r-xp 00000000 08:02 31309      /lib/ld-2.3.6.so
40015000-40017000 rw-p 00014000 08:02 31309      /lib/ld-2.3.6.so
40017000-40019000 rw-p 40017000 00:00 0
4001d000-4002b000 r-xp 00000000 08:02 31349      /lib/tls/libpthread-2.3.6.so
4002b000-4002d000 rw-p 0000d000 08:02 31349      /lib/tls/libpthread-2.3.6.so
4002d000-4002f000 rw-p 4002d000 00:00 0
4002f000-40109000 r-xp 00000000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
40109000-4010c000 r--p 000d9000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
4010c000-4010e000 rw-p 000dc000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
4010e000-40114000 rw-p 4010e000 00:00 0
40114000-40137000 r-xp 00000000 08:02 31339      /lib/tls/libm-2.3.6.so
40137000-40139000 rw-p 00022000 08:02 31339      /lib/tls/libm-2.3.6.so
40139000-40143000 r-xp 00000000 08:02 31871      /lib/libgcc_s.so.1
40143000-40144000 rw-p 00009000 08:02 31871      /lib/libgcc_s.so.1
40144000-4026c000 r-xp 00000000 08:02 31335      /lib/tls/libc-2.3.6.so
4026c000-40271000 r--p 00127000 08:02 31335      /lib/tls/libc-2.3.6.so
40271000-40273000 rw-p 0012c000 08:02 31335      /lib/tls/libc-2.3.6.so
40273000-40278000 rw-p 40273000 00:00 0
40278000-40279000 ---p 40278000 00:00 0
40279000-40a78000 rw-p 40279000 00:00 0
bffff000-c0000000 rw-p bffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
Thread 1: Index: 0
08048000-08049000 r-xp 00000000 08:07 23         /tmp/test1
08049000-0804a000 rw-p 00000000 08:07 23         /tmp/test1
0804a000-0806b000 rw-p 0804a000 00:00 0
40000000-40015000 r-xp 00000000 08:02 31309      /lib/ld-2.3.6.so
40015000-40017000 rw-p 00014000 08:02 31309      /lib/ld-2.3.6.so
40017000-40019000 rw-p 40017000 00:00 0
4001d000-4002b000 r-xp 00000000 08:02 31349      /lib/tls/libpthread-2.3.6.so
4002b000-4002d000 rw-p 0000d000 08:02 31349      /lib/tls/libpthread-2.3.6.so
4002d000-4002f000 rw-p 4002d000 00:00 0
4002f000-40109000 r-xp 00000000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
40109000-4010c000 r--p 000d9000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
4010c000-4010e000 rw-p 000dc000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
4010e000-40114000 rw-p 4010e000 00:00 0
40114000-40137000 r-xp 00000000 08:02 31339      /lib/tls/libm-2.3.6.so
40137000-40139000 rw-p 00022000 08:02 31339      /lib/tls/libm-2.3.6.so
40139000-40143000 r-xp 00000000 08:02 31871      /lib/libgcc_s.so.1
40143000-40144000 rw-p 00009000 08:02 31871      /lib/libgcc_s.so.1
40144000-4026c000 r-xp 00000000 08:02 31335      /lib/tls/libc-2.3.6.so
4026c000-40271000 r--p 00127000 08:02 31335      /lib/tls/libc-2.3.6.so
40271000-40273000 rw-p 0012c000 08:02 31335      /lib/tls/libc-2.3.6.so
40273000-40278000 rw-p 40273000 00:00 0
40278000-40279000 ---p 40278000 00:00 0
40279000-80a7a000 rw-p 40279000 00:00 0
80a7a000-80a7b000 ---p 80a7a000 00:00 0
80a7b000-8127a000 rw-p 80a7b000 00:00 0
bffff000-c0000000 rw-p bffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
terminate called after throwing an instance of 'std::bad_alloc'
   what():  St9bad_alloc
Aborted



The problem is the dynamic libraries and thread stacks, which get mapped
around 0x40000000. The zone available to a user program is 3GB, from
0x00000000 to 0xC0000000, but it is not contiguous, so your program cannot
find room for both 1GB buffers (2GB in total).
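
To make the fragmentation concrete, here is a small sketch that computes the largest free hole in a list of mapped ranges. The Range type and largest_gap() are hypothetical names. The sample ranges in the usage note are a simplified reading of the second dump above, with neighbouring mappings merged and the few-page holes between libraries ignored.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One mapped region, [start, end), as listed in /proc/<pid>/maps.
struct Range {
    std::uint64_t start;
    std::uint64_t end;
};

// Largest unmapped hole inside [lo, hi), given the mapped ranges.
std::uint64_t largest_gap(std::vector<Range> maps,
                          std::uint64_t lo, std::uint64_t hi) {
    std::sort(maps.begin(), maps.end(),
              [](const Range& a, const Range& b) { return a.start < b.start; });
    std::uint64_t best = 0;
    std::uint64_t cursor = lo;  // end of the mapped area seen so far
    for (const Range& r : maps) {
        if (r.start >= hi) {
            break;  // mapping lies above the window (e.g. the vsyscall page)
        }
        if (r.start > cursor) {
            best = std::max(best, r.start - cursor);
        }
        cursor = std::max(cursor, std::min(r.end, hi));
    }
    if (hi > cursor) {
        best = std::max(best, hi - cursor);
    }
    return best;
}
```

Fed with the ranges 08048000-0806b000, 40000000-80a7a000, 80a7a000-8127a000 and bffff000-c0000000, and the window 0 to 0xC0000000, the largest hole is 0x3ED85000 bytes, roughly 1005MB. That is below the 1GB (0x40000000) the second thread asks for, but above the 944MB (0x3B000000) of the 236-image case that works.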


Now if you compile your program with static libraries, it's a little bit better:

g++ -o test1 -static test1.c -lpthread
# ./test1
Threads complete: 0
08048000-08137000 r-xp 00000000 08:07 23         /tmp/test1
08137000-08139000 rw-p 000ee000 08:07 23         /tmp/test1
08139000-081a4000 rw-p 08139000 00:00 0
40000000-40001000 ---p 40000000 00:00 0
40001000-40800000 rwxp 40001000 00:00 0
40800000-40801000 ---p 40800000 00:00 0
40801000-41000000 rwxp 40801000 00:00 0
41000000-41001000 rw-p 41000000 00:00 0
bffff000-c0000000 rw-p bffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
Thread 1: Index: 0
08048000-08137000 r-xp 00000000 08:07 23         /tmp/test1
08137000-08139000 rw-p 000ee000 08:07 23         /tmp/test1
08139000-081a4000 rw-p 08139000 00:00 0
40000000-40001000 ---p 40000000 00:00 0
40001000-40800000 rwxp 40001000 00:00 0
40800000-40801000 ---p 40800000 00:00 0
40801000-41000000 rwxp 40801000 00:00 0
41000000-81002000 rw-p 41000000 00:00 0
bffff000-c0000000 rw-p bffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
terminate called after throwing an instance of 'std::bad_alloc'
   what():  St9bad_alloc
Killed

Some mappings (the thread stacks) are still biting you.

If you want to use this much memory on a 32-bit kernel, you might tune your 
program to:

- Avoid dynamic libraries
- Allocate the thread stacks yourself, so that they won't be in the middle of 
your address space (using the malloc() zone, in the 08139000-08xxxxxx range)
...
- Use a smarter kernel that can map the other way (from the top 
down) (check /proc/sys/vm/legacy_va_layout)
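
As a sketch of the second item, a thread can be started on a caller-supplied stack with pthread_attr_setstack(). The helper name below is hypothetical. Note the assumption: whether the allocator takes this block from the brk heap (the 08139000 zone) or from an mmap() region depends on its thresholds (for glibc, see mallopt() and M_MMAP_THRESHOLD), so the placement is not guaranteed by this code alone.

```cpp
#include <pthread.h>
#include <unistd.h>

#include <cstddef>
#include <cstdlib>

// Hypothetical helper: start fn(arg) on a stack obtained from the
// allocator instead of the default stack that the thread library maps.
// Returns 0 on success and stores the stack in *stack_out; the caller
// releases it with std::free() after joining the thread.
int start_thread_on_own_stack(pthread_t* thread, void* (*fn)(void*),
                              void* arg, std::size_t stack_size,
                              void** stack_out) {
    const long page = sysconf(_SC_PAGESIZE);
    void* stack = nullptr;
    // The stack address passed to pthread_attr_setstack() should be
    // suitably aligned; page alignment is a safe choice.
    if (page <= 0 ||
        posix_memalign(&stack, static_cast<std::size_t>(page),
                       stack_size) != 0) {
        return -1;
    }
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc == 0) {
        rc = pthread_attr_setstack(&attr, stack, stack_size);
        if (rc == 0) {
            rc = pthread_create(thread, &attr, fn, arg);
        }
        pthread_attr_destroy(&attr);
    }
    if (rc != 0) {
        std::free(stack);
        return rc;
    }
    *stack_out = stack;
    return 0;
}
```

In the test program this would replace the two plain pthread_create() calls, with a stack size of, say, 1MB per thread. No guard page is set up here, so a stack overflow would silently corrupt neighbouring memory.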

Of course, switching to a 64-bit kernel makes this problem go away :)


* Re: Memory Allocation
       [not found] <fa.BkbRyK9fXhgijRwEDN4ejyJhMRQ@ifi.uio.no>
@ 2007-04-17  5:08 ` Robert Hancock
  0 siblings, 0 replies; 8+ messages in thread
From: Robert Hancock @ 2007-04-17  5:08 UTC (permalink / raw)
  To: Brian D. McGrew; +Cc: linux-kernel

Brian D. McGrew wrote:
> Good evening gents!
> 
> I need some help in allocating memory and understanding how the system
> allocates memory with physical versus virtual page tables.  Please
> consider the following snippet of code.  Please, no wisecracks about bad
> code; it was written in 30 seconds in haste :-)

(snip)

> My test machine is a Dell Precision 490 with dual 5140 processors and
> 3GB of RAM.  If I reduce kMaxSize to (2048 * 2048 * 236) it works.
> However, I need to allocate an array of char that is (2048 * 2048 * 256)
> and maybe even as large as (2048 * 2048 * 512).
> 
> Obviously I have enough physical memory in the box to do this.  However,
> I suspect that I'm running out of page table entries.  Please correct
> me if I'm wrong, but if I allocate (2048 * 2048 * 236) it works.  When I

Pretty sure you're wrong.

> increase it to 256 or 512 it fails, and it is my suspicion that I just
> don't have enough kernel memory to allocate this much memory in
> user space.

Are you using a 32-bit kernel? If so, most likely you're hitting a limit 
of the address space layout - there's just not enough room in the 
address space for an allocation of this size.

> 
> Because of a piece of 3rd party hardware, I'm forced to run the kernel
> in the 4GB memory model.  What I need to be able to do is allocate an
> array of char (2048 * 2048 * (up to 512)) in user space *** AND *** I
> need the addresses that I get back to be contiguous; that's just the way
> my 3rd party hardware works.
> 
> I'm inclined to believe that this is not specifically a Linux problem
> but maybe an architecture problem???  But maybe there is some kind of
> workaround in the kernel for it???  I'd find it hard to believe that
> I'm the first one who ever needed to use this much memory.
> 
> I ran this same code on two different Macs.  One of them, a PowerBook G4
> with 4GB of RAM, was successful.  The other, a MacBook Pro with
> 4GB of RAM, failed.  Both are running OS 10.4.9.  And of course it
> runs just lovely on my Sun workstation with Solaris.  Thus, I'm thinking
> it's an Intel/x86 issue!
> 
> How the heck do I get past this problem in Linux on the x86 platform???

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Memory Allocation
@ 2007-04-17  2:17 Brian D. McGrew
  2007-04-17  7:06 ` Eric Dumazet
  2007-04-18  4:47 ` David Schwartz
  0 siblings, 2 replies; 8+ messages in thread
From: Brian D. McGrew @ 2007-04-17  2:17 UTC (permalink / raw)
  To: linux-kernel, Brian D. McGrew

Good evening gents!

I need some help in allocating memory and understanding how the system
allocates memory with physical versus virtual page tables.  Please
consider the following snippet of code.  Please, no wisecracks about bad
code; it was written in 30 seconds in haste :-)

#include <atomic>
#include <iostream>

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/types.h>	// u_long
#include <unistd.h>	// sleep()

const static u_long kMaxSize = (2048 * 2048 * 256);	// 1GB per thread

void *msg(void *ptr);
// Incremented by both worker threads, so it has to be atomic.
static std::atomic<u_long> threads_done(0);

int
main(int argc, char *argv[])
{
     pthread_t thread1;
     pthread_t thread2;

     const char *message1 = "Thread 1";
     const char *message2 = "Thread 2";

     int iret1;
     int iret2;

     iret1 = pthread_create(&thread1, NULL, msg, (void *) message1);
     iret2 = pthread_create(&thread2, NULL, msg, (void *) message2);

     if (iret1 != 0 || iret2 != 0) {
	std::cerr << "pthread_create failed" << std::endl;
	exit(1);
     }

    //pthread_join(thread1, NULL);
    //pthread_join(thread2, NULL);

    while (threads_done.load() < 2) {
	std::cout << "Threads complete: " << threads_done.load() << std::endl;
	sleep(3);
    }

    exit(0);
}

void *
msg(void *ptr)
{
    const char *message = (const char *) ptr;

    //
    // One bank per thread of 256 4MP image buffers: 1GB per thread,
    // 2GB in total.
    //
    char *buffer = new char[kMaxSize];

    u_long max = kMaxSize;

    //
    // Init each buffer to 'something'.
    //
    for (u_long inx = 0; inx < max; inx++) {
	if (inx % 102400000 == 0) {
	    std::cout << message << ": Index: " << inx << std::endl;
	}

    	buffer[inx] = inx;
    }

    delete[] buffer;	// allocated with new[], so not free()
    threads_done++;
    return NULL;
}

My test machine is a Dell Precision 490 with dual 5140 processors and
3GB of RAM.  If I reduce kMaxSize to (2048 * 2048 * 236) it works.
However, I need to allocate an array of char that is (2048 * 2048 * 256)
and maybe even as large as (2048 * 2048 * 512).
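
For scale, the sizes in question work out as follows. This is a sketch of the arithmetic only; bank_bytes() is a hypothetical name, and the overflow remark assumes a 32-bit int, as on x86 Linux.

```cpp
#include <cstdint>

constexpr std::uint64_t MiB = 1024ull * 1024;
constexpr std::uint64_t GiB = 1024 * MiB;

// Bytes in one bank of `images` buffers of 2048 x 2048 one-byte pixels.
constexpr std::uint64_t bank_bytes(std::uint64_t images) {
    return 2048ull * 2048ull * images;
}

static_assert(bank_bytes(236) == 944 * MiB, "the size that works");
static_assert(bank_bytes(256) == 1 * GiB, "the size that fails");
static_assert(bank_bytes(512) == 2 * GiB, "the largest size wanted");

// 2^31 does not fit in a 32-bit signed int, so the literal expression
// (2048 * 2048 * 512), which is evaluated in int, would overflow.
static_assert(bank_bytes(512) > static_cast<std::uint64_t>(INT32_MAX),
              "needs a 64-bit or unsigned type");

// Two threads with one 512-image bank each would need every byte of a
// 32-bit address space, leaving nothing for code, stacks or the kernel.
static_assert(2 * bank_bytes(512) == 4 * GiB, "whole 32-bit address space");
```

So each thread in the test program asks for 1GiB, and the 512-image case cannot fit twice into any 32-bit process regardless of layout.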

Obviously I have enough physical memory in the box to do this.  However,
I suspect that I'm running out of page table entries.  Please correct
me if I'm wrong, but if I allocate (2048 * 2048 * 236) it works.  When I
increase it to 256 or 512 it fails, and it is my suspicion that I just
don't have enough kernel memory to allocate this much memory in
user space.

Because of a piece of 3rd party hardware, I'm forced to run the kernel
in the 4GB memory model.  What I need to be able to do is allocate an
array of char (2048 * 2048 * (up to 512)) in user space *** AND *** I
need the addresses that I get back to be contiguous; that's just the way
my 3rd party hardware works.

I'm inclined to believe that this is not specifically a Linux problem
but maybe an architecture problem???  But maybe there is some kind of
workaround in the kernel for it???  I'd find it hard to believe that
I'm the first one who ever needed to use this much memory.

I ran this same code on two different Macs.  One of them, a PowerBook G4
with 4GB of RAM, was successful.  The other, a MacBook Pro with
4GB of RAM, failed.  Both are running OS 10.4.9.  And of course it
runs just lovely on my Sun workstation with Solaris.  Thus, I'm thinking
it's an Intel/x86 issue!

How the heck do I get past this problem in Linux on the x86 platform???

Thanks,

-brian
brian@visionpro.com


* Memory allocation
  2003-09-14 19:11 ` Robert Love
@ 2003-09-16 15:04   ` Breno
  0 siblings, 0 replies; 8+ messages in thread
From: Breno @ 2003-09-16 15:04 UTC (permalink / raw)
  To: Robert Love; +Cc: Kernel List

Hi,

When I use kmem_cache_alloc() to create a vma and return the address of this
vma, have the pages already been allocated and mapped?  Or must I use
mk_pte/set_pte to map the pages into my vma?

Regards,
Breno


