linux-kernel.vger.kernel.org archive mirror
* hammer: MAP_32BIT
@ 2003-05-09  7:35 Ulrich Drepper
  2003-05-09  9:20 ` Andi Kleen
  0 siblings, 1 reply; 29+ messages in thread
From: Ulrich Drepper @ 2003-05-09  7:35 UTC (permalink / raw)
  To: linux-kernel, Andi Kleen

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

To allocate stacks for the threads in nptl we currently use MAP_32BIT to
make sure we get <4GB addresses for faster context switching.  But once
that part of the address space is used up we have to fall back to not
using the flag.  This means we have to make 2 mmap() calls, one with
MAP_32BIT and, if it fails, another one without.

It would be much better if there were also a MAP_32PREFER flag with
the appropriate semantics.  The failing mmap() calls seem to be quite
expensive, so programs with many threads are punished heavily.

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+u1pF2ijCOnn/RHQRAk2IAKDAzXZUOsxMPAKkK9ivOz8o6zAaHQCeMC24
ysih3QB/I1w5MNXEIxNs284=
=2cet
-----END PGP SIGNATURE-----
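
The two-call pattern described in the message above can be sketched
roughly as follows. This is only an illustration, not nptl's actual
code: the helper name alloc_stack_prefer_low is hypothetical, and
since MAP_32BIT is only defined on x86-64 Linux the sketch guards it
with #ifdef.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not taken from nptl): try to get a low mapping
 * first, then fall back to an unrestricted one.  Returns NULL if both
 * attempts fail. */
static void *alloc_stack_prefer_low(size_t size)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *p;

#ifdef MAP_32BIT   /* only available on x86-64 Linux */
    /* First call: insist on a low address. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags | MAP_32BIT, -1, 0);
    if (p != MAP_FAILED)
        return p;
#endif
    /* Second call: the low range is exhausted, take any address. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

The cost complained about above is the first call in the failure case:
it has to search the low range before the second call is even issued.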



* Re: hammer: MAP_32BIT
  2003-05-09  7:35 hammer: MAP_32BIT Ulrich Drepper
@ 2003-05-09  9:20 ` Andi Kleen
  2003-05-09 11:28   ` mikpe
                     ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Andi Kleen @ 2003-05-09  9:20 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel, Andi Kleen

On Fri, May 09, 2003 at 09:35:32AM +0200, Ulrich Drepper wrote:
> It would be much better if there were also a MAP_32PREFER flag with
> the appropriate semantics.  The failing mmap() calls seem to be quite
> expensive, so programs with many threads are punished heavily.

That's just an inadequate data structure. It does a linear search of the
VMAs and you probably have a lot of them. Before you add kludges like this,
better fix the data structure for fast free space lookup.

MAP_32BIT currently limits to the first 2GB only. That's needed because
most programs use it to allocate modules for the small code model, and that
only supports 2GB (poster child for that is the X server). But for your
application 4GB would be better. But adding another MAP_32BIT_4GB or so
would be quite ugly. I considered making the address where mmap starts searching
(TASK_UNMAPPED_BASE) settable using a prctl.

In some vendor kernels it's already in /proc/pid/mapped_base, but that is
quite costly to change. That would probably give you the best of both: just
set it to a low value for the thread stacks and then reset it to the default.

I guess that would be the better solution for your stacks.

-Andi


* Re: hammer: MAP_32BIT
  2003-05-09  9:20 ` Andi Kleen
@ 2003-05-09 11:28   ` mikpe
  2003-05-09 11:38     ` Andi Kleen
  2003-05-09 17:36   ` H. Peter Anvin
  2003-05-09 17:39   ` Ulrich Drepper
  2 siblings, 1 reply; 29+ messages in thread
From: mikpe @ 2003-05-09 11:28 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ulrich Drepper, linux-kernel

Andi Kleen writes:
 > On Fri, May 09, 2003 at 09:35:32AM +0200, Ulrich Drepper wrote:
 > > It would be much better if there were also a MAP_32PREFER flag with
 > > the appropriate semantics.  The failing mmap() calls seem to be quite
 > > expensive, so programs with many threads are punished heavily.
 >
 > That's just an inadequate data structure. It does a linear search of the
 > VMAs and you probably have a lot of them. Before you add kludges like this,
 > better fix the data structure for fast free space lookup.
 >
 > MAP_32BIT currently limits to the first 2GB only. That's needed because
 > most programs use it to allocate modules for the small code model, and that
 > only supports 2GB (poster child for that is the X server). But for your
 > application 4GB would be better. But adding another MAP_32BIT_4GB or so
 > would be quite ugly. I considered making the address where mmap starts searching
 > (TASK_UNMAPPED_BASE) settable using a prctl.

I have a potential use for mmap()ing in the low 4GB on x86_64.
Sounds like your MAP_32BIT really is MAP_31BIT :-( which is too limiting.
What about a more generic way of indicating which parts of the address
space one wants? The simplest that would work for me is a single byte
'nrbits' specifying the target address space as [0 .. 2^nrbits-1].
This could be specified on a per-mmap() basis or as a settable process attribute.

/Mikael
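
To make the proposed 'nrbits' range concrete, here is a small sketch of
the check it implies. The function name range_fits_nrbits is
hypothetical and nothing like it exists in the kernel interface; it
only tests whether a mapping lies inside [0 .. 2^nrbits-1].

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does the mapping [addr, addr + len) lie entirely
 * inside the address range [0 .. 2^nrbits - 1]? */
static bool range_fits_nrbits(uint64_t addr, uint64_t len, unsigned nrbits)
{
    uint64_t limit;   /* highest permissible address */

    if (nrbits == 0 || len == 0)
        return false;
    limit = nrbits >= 64 ? UINT64_MAX : (UINT64_C(1) << nrbits) - 1;

    /* The last byte is addr + len - 1; compare without overflowing. */
    return addr <= limit && len - 1 <= limit - addr;
}
```

With nrbits = 31 a page at 0x80000000 is rejected, while nrbits = 32
accepts it, which is the difference between the current 2GB limit and
the 4GB range asked for here.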


* Re: hammer: MAP_32BIT
  2003-05-09 11:28   ` mikpe
@ 2003-05-09 11:38     ` Andi Kleen
  2003-05-09 11:52       ` mikpe
  2003-05-09 18:11       ` H. Peter Anvin
  0 siblings, 2 replies; 29+ messages in thread
From: Andi Kleen @ 2003-05-09 11:38 UTC (permalink / raw)
  To: mikpe; +Cc: Andi Kleen, Ulrich Drepper, linux-kernel


On Fri, May 09, 2003 at 01:28:11PM +0200, mikpe@csd.uu.se wrote:
> I have a potential use for mmap()ing in the low 4GB on x86_64.

Just use MAP_32BIT

> Sounds like your MAP_32BIT really is MAP_31BIT :-( which is too limiting.
> What about a more generic way of indicating which parts of the address
> space one wants? The simplest that would work for me is a single byte
> 'nrbits' specifying the target address space as [0 .. 2^nrbits-1].
> This could be specified on a per-mmap() basis or as a settable process attribute.

On x86-64 an mmap extension for that would be fine, but on i386 you get
problems because mmap64() already maxes out the argument limit and you 
cannot add more.
 
You could only implement it with a structure in memory pointed to by an
argument, which would be ugly.

prctl is probably better. You really want [start; end], right?

Pity that task_struct is already so bloated, so every new entry hurts.

-Andi



* Re: hammer: MAP_32BIT
  2003-05-09 11:38     ` Andi Kleen
@ 2003-05-09 11:52       ` mikpe
  2003-05-09 12:16         ` Andi Kleen
  2003-05-09 18:11       ` H. Peter Anvin
  1 sibling, 1 reply; 29+ messages in thread
From: mikpe @ 2003-05-09 11:52 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ulrich Drepper, linux-kernel

Andi Kleen writes:
 > 
 > On Fri, May 09, 2003 at 01:28:11PM +0200, mikpe@csd.uu.se wrote:
 > > I have a potential use for mmap()ing in the low 4GB on x86_64.
 > 
 > Just use MAP_32BIT

Will that be corrected to use the full 4GB space? 2GB is too small.

 > > Sounds like your MAP_32BIT really is MAP_31BIT :-( which is too limiting.
 > > What about a more generic way of indicating which parts of the address
 > > space one wants? The simplest that would work for me is a single byte
 > > 'nrbits' specifying the target address space as [0 .. 2^nrbits-1].
 > > This could be specified on a per-mmap() basis or as a settable process attribute.
 > 
 > On x86-64 an mmap extension for that would be fine, but on i386 you get
 > problems because mmap64() already maxes out the argument limit and you 
 > cannot add more.

This would only be used on x86_64. i386 compat is a non-issue.
(This is for runtime systems stuff, not applications.)

 > prctl is probably better. You really want [start; end], right?

I just want mmap() to return addresses that fit in 32 bits.

MAP_32BIT would do nicely, if it wasn't limited to 2GB.

/Mikael


* Re: hammer: MAP_32BIT
  2003-05-09 11:52       ` mikpe
@ 2003-05-09 12:16         ` Andi Kleen
  0 siblings, 0 replies; 29+ messages in thread
From: Andi Kleen @ 2003-05-09 12:16 UTC (permalink / raw)
  To: mikpe; +Cc: Andi Kleen, Ulrich Drepper, linux-kernel

On Fri, May 09, 2003 at 01:52:17PM +0200, mikpe@csd.uu.se wrote:
> Andi Kleen writes:
>  > 
>  > On Fri, May 09, 2003 at 01:28:11PM +0200, mikpe@csd.uu.se wrote:
>  > > I have a potential use for mmap()ing in the low 4GB on x86_64.
>  > 
>  > Just use MAP_32BIT
> 
> Will that be corrected to use the full 4GB space? 2GB is too small.

That would break the X server.

But what you can do is to use mmap(0x1000, ....) and free the memory
again if the result is above 4GB. If you pass a non-zero value
as first argument but not MAP_FIXED, it'll use the address argument
as the starting point for the search.

-Andi
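
A minimal sketch of this workaround, assuming an anonymous private
mapping; the helper name mmap_below_4gb is hypothetical. How the kernel
treats a non-zero hint without MAP_FIXED depends on the kernel version,
and it may adjust a hint this low, so the result has to be checked
either way.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: pass a low, non-zero hint without MAP_FIXED and
 * unmap the result again if it does not fit below 4GB.  Returns NULL if
 * no suitable mapping was obtained. */
static void *mmap_below_4gb(size_t size)
{
    void *hint = (void *)0x1000;   /* low hint, as suggested above */
    void *p = mmap(hint, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;

    /* Reject mappings whose end lies above the 4GB boundary. */
    if ((uint64_t)(uintptr_t)p + size > (UINT64_C(1) << 32)) {
        munmap(p, size);
        return NULL;
    }
    return p;
}
```

A caller that only prefers a low address would retry with a plain
mmap(NULL, ...) when this returns NULL.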


* Re: hammer: MAP_32BIT
  2003-05-09  9:20 ` Andi Kleen
  2003-05-09 11:28   ` mikpe
@ 2003-05-09 17:36   ` H. Peter Anvin
  2003-05-09 17:39   ` Ulrich Drepper
  2 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2003-05-09 17:36 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20030509092026.GA11012@averell>
By author:    Andi Kleen <ak@muc.de>
In newsgroup: linux.dev.kernel
> 
> MAP_32BIT currently limits to the first 2GB only. That's needed because
> most programs use it to allocate modules for the small code model, and that
> only supports 2GB (poster child for that is the X server). But for your
> application 4GB would be better. But adding another MAP_32BIT_4GB or so
> would be quite ugly. I considered making the address where mmap starts searching
> (TASK_UNMAPPED_BASE) settable using a prctl.
> 

MAP_31BIT would have been a better name...

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64


* Re: hammer: MAP_32BIT
  2003-05-09  9:20 ` Andi Kleen
  2003-05-09 11:28   ` mikpe
  2003-05-09 17:36   ` H. Peter Anvin
@ 2003-05-09 17:39   ` Ulrich Drepper
  2003-05-10  1:48     ` Andi Kleen
  2 siblings, 1 reply; 29+ messages in thread
From: Ulrich Drepper @ 2003-05-09 17:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andi Kleen wrote:

> That's just an inadequate data structure. It does a linear search of the
> VMAs and you probably have a lot of them. Before you add kludges like this,
> better fix the data structure for fast free space lookup.

If you mean the code in arch_get_unmapped_area(), yes, this needs
fixing.  In fact, Ingo already has a patch which brings the
performance of thread creation back to what we had in September/October.


> In some vendor kernels it's already in /proc/pid/mapped_base, but that is
> quite costly to change. That would probably give you the best of both: just
> set it to a low value for the thread stacks and then reset it to the default.
>
> I guess that would be the better solution for your stacks.

Are you sure this is the best solution?  It means the mmap regions for
restricted 31/32 bit addresses and for the normal, unrestricted
mappings are contiguous.  This removes a lot of freedom in deciding where
the unrestricted mappings are best located and it would make programs
using threads have a very different memory layout.  Not that it should
make any difference; but I can already hear /them/ scream that this
breaks applications.

My kernel-uninformed opinion would be to keep the settings separate.

Oh, and please rename MAP_32BIT to MAP_31BIT.  This will save nerves on
all sides.

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+u+fi2ijCOnn/RHQRAqeBAKC3ZlSCNcw3f7SXahvxRc0WMupYgwCgyBGy
fMqzCxWcx90e002CNUQqwgM=
=LDJf
-----END PGP SIGNATURE-----



* Re: hammer: MAP_32BIT
  2003-05-09 11:38     ` Andi Kleen
  2003-05-09 11:52       ` mikpe
@ 2003-05-09 18:11       ` H. Peter Anvin
  2003-05-09 19:24         ` Ulrich Drepper
  1 sibling, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2003-05-09 18:11 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20030509113845.GA4586@averell>
By author:    Andi Kleen <ak@muc.de>
In newsgroup: linux.dev.kernel
>
> 
> On Fri, May 09, 2003 at 01:28:11PM +0200, mikpe@csd.uu.se wrote:
> > I have a potential use for mmap()ing in the low 4GB on x86_64.
> 
> Just use MAP_32BIT
> 
> > Sounds like your MAP_32BIT really is MAP_31BIT :-( which is too limiting.
> > What about a more generic way of indicating which parts of the address
> > space one wants? The simplest that would work for me is a single byte
> > 'nrbits' specifying the target address space as [0 .. 2^nrbits-1].
> > This could be specified on a per-mmap() basis or as a settable process attribute.
> 
> On x86-64 an mmap extension for that would be fine, but on i386 you get
> problems because mmap64() already maxes out the argument limit and you 
> cannot add more.
>  

How about this: since the address argument is basically unused anyway
unless MAP_FIXED is set, how about a MAP_MAXADDR which interprets the
address argument as the highest permissible address (or lowest
nonpermissible address)?

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64


* Re: hammer: MAP_32BIT
  2003-05-09 18:11       ` H. Peter Anvin
@ 2003-05-09 19:24         ` Ulrich Drepper
  2003-05-09 20:55           ` H. Peter Anvin
  0 siblings, 1 reply; 29+ messages in thread
From: Ulrich Drepper @ 2003-05-09 19:24 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

H. Peter Anvin wrote:

> How about this: since the address argument is basically unused anyway
> unless MAP_FIXED is set, how about a MAP_MAXADDR which interprets the
> address argument as the highest permissible address (or lowest
> nonpermissible address)?

You miss the point of my initial mail: I need a way to say "preferably
32bit address, otherwise give me what you have".  MAP_32BIT already
provides a way to require 32 bit addresses.

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+vACE2ijCOnn/RHQRAl3rAKCYgj3LqvIDJ8Ny3pnii8bBvsbwrQCdGkg4
pnFnBmubkRnnsVfBSjDBBWQ=
=P8SV
-----END PGP SIGNATURE-----



* Re: hammer: MAP_32BIT
  2003-05-09 19:24         ` Ulrich Drepper
@ 2003-05-09 20:55           ` H. Peter Anvin
  2003-05-09 21:45             ` Ulrich Drepper
  0 siblings, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2003-05-09 20:55 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> H. Peter Anvin wrote:
> 
> 
>>How about this: since the address argument is basically unused anyway
>>unless MAP_FIXED is set, how about a MAP_MAXADDR which interprets the
>>address argument as the highest permissible address (or lowest
>>nonpermissible address)?
> 
> 
> You miss the point of my initial mail: I need a way to say "preferably
> 32bit address, otherwise give me what you have".  MAP_32BIT already
> provides a way to require 32 bit addresses.
> 

No, it requires 31-bit addresses, and there was a discussion about how
some things need 31-bit and some 32-bit addresses.  There might also be
a need for 39-bit addresses, to be compatible with Linux 2.4.

MAP_MAXADDR_ADVISORY?

	-hpa




* Re: hammer: MAP_32BIT
  2003-05-09 20:55           ` H. Peter Anvin
@ 2003-05-09 21:45             ` Ulrich Drepper
  2003-05-09 22:07               ` H. Peter Anvin
  2003-05-09 22:20               ` Timothy Miller
  0 siblings, 2 replies; 29+ messages in thread
From: Ulrich Drepper @ 2003-05-09 21:45 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

H. Peter Anvin wrote:

> No, it requires 31-bit addresses, and there was a discussion about how
> some things need 31-bit and some 32-bit addresses.

That's completely irrelevant to my point.  Whether MAP_32BIT actually
has a 31 bit limit or not doesn't matter, it's limited as well in the
possible mmap blocks it can return.

The only thing I care about is to have a hint and not a fixed
requirement for mmap().  All your proposals completely ignored this.

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+vCFk2ijCOnn/RHQRAnw1AKChzyuZ3g9iXAX5wH088rhko/s8YgCgku12
CayuZsLJGzPO//WCJVWyLxk=
=rkBk
-----END PGP SIGNATURE-----



* Re: hammer: MAP_32BIT
  2003-05-09 21:45             ` Ulrich Drepper
@ 2003-05-09 22:07               ` H. Peter Anvin
  2003-05-09 22:20                 ` Ulrich Drepper
  2003-05-09 22:20               ` Timothy Miller
  1 sibling, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2003-05-09 22:07 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> H. Peter Anvin wrote:
> 
> 
>>No, it requires 31-bit addresses, and there was a discussion about how
>>some things need 31-bit and some 32-bit addresses.
> 
> 
> That's completely irrelevant to my point.  Whether MAP_32BIT actually
> has a 31 bit limit or not doesn't matter, it's limited as well in the
> possible mmap blocks it can return.
> 
> The only thing I care about is to have a hint and not a fixed
> requirement for mmap().  All your proposals completely ignored this.
> 

Yes, but this is irrelevant to *MY* point... this discussion spawned a
side discussion, and somehow you're upset that it's not addressing your
concern but a different one... seems a bit ridiculous!

Anyway, I already posted that if we're adding MAP_MAXADDR we could also
add MAP_MAXADDR_ADVISORY or something similar to that.  On the other
hand, how big of a performance issue is it really to call mmap() again
in the failure scenario *only*?

	-hpa




* Re: hammer: MAP_32BIT
  2003-05-09 22:20               ` Timothy Miller
@ 2003-05-09 22:20                 ` H. Peter Anvin
  2003-05-09 22:46                   ` Timothy Miller
  2003-05-09 22:22                 ` Ulrich Drepper
  1 sibling, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2003-05-09 22:20 UTC (permalink / raw)
  To: Timothy Miller; +Cc: Ulrich Drepper, linux-kernel

Timothy Miller wrote:
> 
> If your program is capable of handling an address with more than 32
> bits, what point is there giving a hint?  Either your program can handle
> 64-bit pointers or it cannot.  Any program flexible enough to handle
> either size dynamically would expend enough overhead checking that it
> would be worse than if it just made a hard choice.
>

The purpose is that there is a slight task-switching speed advantage if
the address is in the bottom 4 GB.  Since this affects every process,
and most processes use very little TLS, this is worthwhile.

This is fundamentally due to a K8 design flaw.

	-hpa



* Re: hammer: MAP_32BIT
  2003-05-09 21:45             ` Ulrich Drepper
  2003-05-09 22:07               ` H. Peter Anvin
@ 2003-05-09 22:20               ` Timothy Miller
  2003-05-09 22:20                 ` H. Peter Anvin
  2003-05-09 22:22                 ` Ulrich Drepper
  1 sibling, 2 replies; 29+ messages in thread
From: Timothy Miller @ 2003-05-09 22:20 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: H. Peter Anvin, linux-kernel



Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> H. Peter Anvin wrote:
> 
> 
>>No, it requires 31-bit addresses, and there was a discussion about how
>>some things need 31-bit and some 32-bit addresses.
> 
> 
> That's completely irrelevant to my point.  Whether MAP_32BIT actually
> has a 31 bit limit or not doesn't matter, it's limited as well in the
> possible mmap blocks it can return.
> 
> The only thing I care about is to have a hint and not a fixed
> requirement for mmap().  All your proposals completely ignored this.
> 

If your program is capable of handling an address with more than 32 
bits, what point is there giving a hint?  Either your program can handle 
64-bit pointers or it cannot.  Any program flexible enough to handle 
either size dynamically would expend enough overhead checking that it 
would be worse than if it just made a hard choice.



* Re: hammer: MAP_32BIT
  2003-05-09 22:07               ` H. Peter Anvin
@ 2003-05-09 22:20                 ` Ulrich Drepper
  2003-05-09 22:21                   ` H. Peter Anvin
  0 siblings, 1 reply; 29+ messages in thread
From: Ulrich Drepper @ 2003-05-09 22:20 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

H. Peter Anvin wrote:
> On the other
> hand, how big of a performance issue is it really to call mmap() again
> in the failure scenario *only*?

Just look at the code, it's very expensive.  At the moment the mmap code
has to look sequentially at the VMAs in question.  If it fails it means
it walked the entire data structure without success.  Ingo's patch does
not address this, it just makes successful allocation usually fast again.

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+vCmt2ijCOnn/RHQRAsUeAJ9gGIwIK+QKpSz15YDEaB5aISBwowCgjReV
WSvgiDRcLX5bpla/Agikmj0=
=NSIn
-----END PGP SIGNATURE-----



* Re: hammer: MAP_32BIT
  2003-05-09 22:20                 ` Ulrich Drepper
@ 2003-05-09 22:21                   ` H. Peter Anvin
  0 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2003-05-09 22:21 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> H. Peter Anvin wrote:
> 
>>On the other
>>hand, how big of a performance issue is it really to call mmap() again
>>in the failure scenario *only*?
> 
> 
> Just look at the code, it's very expensive.  At the moment the mmap code
> has to look sequentially at the VMAs in question.  If it fails it means
> it walked the entire data structure without success.  Ingo's patch does
> not address this, it just makes successful allocation usually fast again.
> 

OK, maybe we should fix that instead :-/

	-hpa




* Re: hammer: MAP_32BIT
  2003-05-09 22:20               ` Timothy Miller
  2003-05-09 22:20                 ` H. Peter Anvin
@ 2003-05-09 22:22                 ` Ulrich Drepper
  2003-05-09 22:53                   ` Timothy Miller
  1 sibling, 1 reply; 29+ messages in thread
From: Ulrich Drepper @ 2003-05-09 22:22 UTC (permalink / raw)
  To: Timothy Miller; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Timothy Miller wrote:

> If your program is capable of handling an address with more than 32
> bits, what point is there giving a hint?  Either your program can handle
> 64-bit pointers or it cannot.  Any program flexible enough to handle
> either size dynamically would expend enough overhead checking that it
> would be worse than if it just made a hard choice.

Look at the x86-64 context switching code.  If memory addressed by the
GDT entries has a 32-bit address it uses a different method than for
cases where the virtual address has more than 32 bits.  This way of
handling GDT entries is faster according to ak.  So, it's not a
correctness thing, it's a performance thing.

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+vCo82ijCOnn/RHQRAlGzAJ9Ti80kJMeecyxGikowWcfCAq0stwCfRVcQ
Clui3Z6yKNSy3mu+phrY2FQ=
=GFwi
-----END PGP SIGNATURE-----



* Re: hammer: MAP_32BIT
  2003-05-09 22:20                 ` H. Peter Anvin
@ 2003-05-09 22:46                   ` Timothy Miller
  2003-05-09 23:24                     ` H. Peter Anvin
  0 siblings, 1 reply; 29+ messages in thread
From: Timothy Miller @ 2003-05-09 22:46 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ulrich Drepper, linux-kernel



H. Peter Anvin wrote:
> Timothy Miller wrote:
> 
>>If your program is capable of handling an address with more than 32
>>bits, what point is there giving a hint?  Either your program can handle
>>64-bit pointers or it cannot.  Any program flexible enough to handle
>>either size dynamically would expend enough overhead checking that it
>>would be worse than if it just made a hard choice.
>>
> 
> 
> The purpose is that there is a slight task-switching speed advantage if
> the address is in the bottom 4 GB.  Since this affects every process,
> and most processes use very little TLS, this is worthwhile.
> 
> This is fundamentally due to a K8 design flaw.

Is there an explicit check somewhere for this?  Are the page tables laid 
out differently?



* Re: hammer: MAP_32BIT
  2003-05-09 22:22                 ` Ulrich Drepper
@ 2003-05-09 22:53                   ` Timothy Miller
  2003-05-09 23:24                     ` Ulrich Drepper
  0 siblings, 1 reply; 29+ messages in thread
From: Timothy Miller @ 2003-05-09 22:53 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel



Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Timothy Miller wrote:
> 
> 
>>If your program is capable of handling an address with more than 32
>>bits, what point is there giving a hint?  Either your program can handle
>>64-bit pointers or it cannot.  Any program flexible enough to handle
>>either size dynamically would expend enough overhead checking that it
>>would be worse than if it just made a hard choice.
> 
> 
> Look at the x86-64 context switching code.  If memory addressed by the
> GDT entries has a 32-bit address it uses a different method than for
> cases where the virtual address has more than 32 bits.  This way of
> handling GDT entries is faster according to ak.  So, it's not a
> correctness thing, it's a performance thing.
> 

Alright.  Sounds great.  So my next question is this:

Why does there ever need to be an explicit HINT that you would prefer a 
<32 bit address, when it's known a priori that <32 is better?  Why 
doesn't the mapping code ALWAYS try to use 32-bit addresses before 
resorting to 64-bit?



* Re: hammer: MAP_32BIT
  2003-05-09 22:46                   ` Timothy Miller
@ 2003-05-09 23:24                     ` H. Peter Anvin
  2003-05-13 14:25                       ` Timothy Miller
  0 siblings, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2003-05-09 23:24 UTC (permalink / raw)
  To: Timothy Miller; +Cc: Ulrich Drepper, linux-kernel

Timothy Miller wrote:
>>
>> The purpose is that there is a slight task-switching speed advantage if
>> the address is in the bottom 4 GB.  Since this affects every process,
>> and most processes use very little TLS, this is worthwhile.
>>
>> This is fundamentally due to a K8 design flaw.
> 
> Is there an explicit check somewhere for this?  Are the page tables laid
> out differently?
>

No, there are two ways to load the FS base register: use a descriptor,
which is limited to 4 GB but is faster, or WRMSR, which is slower, but
unlimited.

	-hpa
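
The choice between the two methods can be sketched as a simple
decision on the base address. This is an illustration with hypothetical
names, not the kernel's context-switch code; it only shows why a base
below 4GB qualifies for the faster path.

```c
#include <stdint.h>

/* Hypothetical names for the two ways of loading the FS base. */
enum fs_base_method {
    FS_BASE_DESCRIPTOR,  /* faster, but the base must fit in 32 bits */
    FS_BASE_WRMSR        /* slower, works for any base address */
};

/* Pick the descriptor path whenever the base address fits in 32 bits,
 * otherwise fall back to the MSR write. */
static enum fs_base_method pick_fs_base_method(uint64_t base)
{
    return base <= UINT32_MAX ? FS_BASE_DESCRIPTOR : FS_BASE_WRMSR;
}
```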



* Re: hammer: MAP_32BIT
  2003-05-09 22:53                   ` Timothy Miller
@ 2003-05-09 23:24                     ` Ulrich Drepper
  2003-05-10  0:00                       ` Edgar Toernig
  0 siblings, 1 reply; 29+ messages in thread
From: Ulrich Drepper @ 2003-05-09 23:24 UTC (permalink / raw)
  To: Timothy Miller; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Timothy Miller wrote:

> Why does there ever need to be an explicit HINT that you would prefer a
> <32 bit address, when it's known a priori that <32 is better?  Why
> doesn't the mapping code ALWAYS try to use 32-bit addresses before
> resorting to 64-bit?

Because not all memory is addressed via GDT entries.  In fact, almost
none is, only thread stacks and similar gimmicks.  If all mmap memory
were served from the low memory pool by default you would soon run out of
it, and without any good reason.

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+vDjB2ijCOnn/RHQRAnHmAJ9V3BwxGTAUs7hw1YXowv0K0cEFFACePj6t
vLI+B5BlYG4ox5WcyFrwg8A=
=IGO2
-----END PGP SIGNATURE-----



* Re: hammer: MAP_32BIT
  2003-05-09 23:24                     ` Ulrich Drepper
@ 2003-05-10  0:00                       ` Edgar Toernig
  2003-05-10  0:58                         ` Ulrich Drepper
  0 siblings, 1 reply; 29+ messages in thread
From: Edgar Toernig @ 2003-05-10  0:00 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel, H. Peter Anvin

Ulrich Drepper wrote:
> > Why does there ever need to be an explicit HINT that you would prefer a
> > <32 bit address, when it's known a priori that <32 is better?  Why
> > doesn't the mapping code ALWAYS try to use 32-bit addresses before
> > resorting to 64-bit?
> 
> Because not all memory is addressed via GDT entries.  In fact, almost
> none is, only thread stacks and similar gimmicks.  If all mmap memory
> were served from the low memory pool by default you would soon run out of
> it, and without any good reason.

As if there are so many apps that would suffer from that...

Anyway, what's so bad about the idea someone (Linus?) suggested?
Without MAP_FIXED the address given to mmap is already taken as a
hint where to start looking for free memory.  So use mmap(4GB,...)
for regular memory and mmap(4kB, ...) for stacks.  What's wrong
with that?  And if you are really frightened of running out of "low"
memory, make the above-4GB allocation the default for addr==0.

Ciao, ET.


* Re: hammer: MAP_32BIT
  2003-05-10  0:00                       ` Edgar Toernig
@ 2003-05-10  0:58                         ` Ulrich Drepper
  2003-05-10  2:51                           ` Edgar Toernig
  0 siblings, 1 reply; 29+ messages in thread
From: Ulrich Drepper @ 2003-05-10  0:58 UTC (permalink / raw)
  To: Edgar Toernig; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Edgar Toernig wrote:

> Anyway, what's so bad about the idea someone (Linus?) suggested?
> Without MAP_FIXED the address given to mmap is already taken as a
> hint where to start looking for free memory.

The kernel fortunately already defines some semantics for using a
non-NULL first parameter without MAP_FIXED.  It means: I prefer
*exactly* this address.  If it's not available, give me anything else.
This is used and needed, for instance, when loading prelinked DSOs.

Now you want to give this different semantics.  It would need at least one
more MAP_* flag.

Anyway, I don't care what the solution looks like.  Changing existing
semantics should be out, that's the only requirement.  Since I don't
plan on doing the work I have nothing to decide.

- -- 
- --------------.                        ,-.            444 Castro Street
Ulrich Drepper \    ,-----------------'   \ Mountain View, CA 94041 USA
Red Hat         `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+vE6a2ijCOnn/RHQRAnxgAJ9ptrA6XRvLveB+xZyXZVTz4W8KjgCgkyUp
BwOWiMQys/z8b6HZpneawJs=
=Ra9K
-----END PGP SIGNATURE-----



* Re: hammer: MAP_32BIT
  2003-05-09 17:39   ` Ulrich Drepper
@ 2003-05-10  1:48     ` Andi Kleen
  2003-05-10 20:10       ` David Woodhouse
  0 siblings, 1 reply; 29+ messages in thread
From: Andi Kleen @ 2003-05-10  1:48 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Andi Kleen, linux-kernel

On Fri, May 09, 2003 at 07:39:46PM +0200, Ulrich Drepper wrote:
> 
> > In some vendor kernels it's already in /proc/pid/mapped_base, but that is 
> > quite costly to change. That would probably give you the best of both. Just 
> > set it to a low value for the thread stacks and then reset it to the default.
> > 
> > I guess that would be the better solution for your stacks. 
> 
> Are you sure this is the best solution?  It means the mmap regions for

No, I'm not sure.

On further thought, mapped_base would not be useful for you currently,
because at least in the SuSE/AMD64 kernel it only applies to 32bit processes.

The real solution is probably to pass in the search start hint in mmap's
address argument and not use MAP_32BIT.

e.g. use something like

	/*
	 * Current gcc still needs PROT_EXEC because it doesn't call
	 * __enable_execute_stack for trampolines yet.
	 */
	stack = mmap((void *)0x1000, stack_size,
		     PROT_READ|PROT_WRITE|PROT_EXEC,
		     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

This will give you memory at the beginning of the address space and 
beyond 4GB if needed.

This may still be slow, but fixing the search algorithm is a different
problem that can be tackled separately.

> Oh, and please rename MAP_32BIT to MAP_31BIT.  This will save nerves on
> all sides.

I bet changing it will cost more nerves in supporting all these people
whose software doesn't compile anymore. And it's not really a lie. 2GB 
is 32bit too.

-Andi


* Re: hammer: MAP_32BIT
  2003-05-10  0:58                         ` Ulrich Drepper
@ 2003-05-10  2:51                           ` Edgar Toernig
  0 siblings, 0 replies; 29+ messages in thread
From: Edgar Toernig @ 2003-05-10  2:51 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel

> > Anyway, what's so bad about the idea someone (Linus?) suggested?
[it was Andi]
> > Without MAP_FIXED the address given to mmap is already taken as a
> > hint where to start looking for free memory.
> 
> The kernel fortunately already defines some semantics for using a
> non-NULL first parameter without MAP_FIXED.  It means: I prefer
> *exactly* this address.

Yeah, ok.

>  If it's not available, give me anything else.

And at least on older kernels (don't know about 2.5) it gives you
not "anything" but the next free memory region above that address.

POSIX-draft6 about that topic:

    "A non-zero value of addr is taken to be a suggestion of a
     process address near which the mapping should be placed."


> Now you want to give this different semantics.  It would need at least one
> more MAP_* flag.

No new flag.  No new semantics.  Everything's already there...

Ciao, ET.


* Re: hammer: MAP_32BIT
  2003-05-10  1:48     ` Andi Kleen
@ 2003-05-10 20:10       ` David Woodhouse
  2003-05-13 18:54         ` H. Peter Anvin
  0 siblings, 1 reply; 29+ messages in thread
From: David Woodhouse @ 2003-05-10 20:10 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ulrich Drepper, linux-kernel

On Sat, 2003-05-10 at 02:48, Andi Kleen wrote:
> > Oh, and please rename MAP_32BIT to MAP_31BIT.  This will save nerves on
> > all sides.
> 
> I bet changing it will cost more nerves in supporting all these people
> whose software doesn't compile anymore. And it's not really a lie. 2GB 
> is 32bit too.

If that's _really_ an issue, then also provide MAP_32BIT which does what
its name implies. 

Anyone who was using MAP_32BIT in the knowledge that it really limits to
31 bits gets the breakage they deserve for not reporting and fixing the
problem at the time.
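
The 31-bit versus 32-bit distinction comes down to simple arithmetic, which
the following sketch spells out. The helper name fits_in_bits is
hypothetical; it only checks whether a region ends at or below 2^bits and
says nothing about how the kernel implements the flag.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the region [start, start + len) lies
 * entirely below 2^bits.  bits = 31 gives the 2GB limit, bits = 32 the
 * 4GB limit.
 */
static int fits_in_bits(uint64_t start, uint64_t len, unsigned bits)
{
	uint64_t limit;

	assert(bits >= 1 && bits <= 63);
	limit = UINT64_C(1) << bits;

	/* Written this way so that start + len cannot overflow. */
	return len <= limit && start <= limit - len;
}
```

A page starting at 0x80000000 passes the check for 32 bits but fails it for
31, which is exactly the range a flag limited to 2GB cannot hand out.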

-- 
dwmw2



* Re: hammer: MAP_32BIT
  2003-05-09 23:24                     ` H. Peter Anvin
@ 2003-05-13 14:25                       ` Timothy Miller
  0 siblings, 0 replies; 29+ messages in thread
From: Timothy Miller @ 2003-05-13 14:25 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ulrich Drepper, linux-kernel



H. Peter Anvin wrote:
> Timothy Miller wrote:
> 
>>>The purpose is that there is a slight task-switching speed advantage if
>>>the address is in the bottom 4 GB.  Since this affects every process,
>>>and most processes use very little TLS, this is worthwhile.
>>>
>>>This is fundamentally due to a K8 design flaw.
>>
>>Is there an explicit check somewhere for this?  Are the page tables laid
>>out differently?
>>
> 
> 
> No, there are two ways to load the FS base register: use a descriptor,
> which is limited to 4 GB but is faster, or WRMSR, which is slower, but
> unlimited.
> 


Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Timothy Miller wrote:
> 
> 
>>Why does there ever need to be an explicit HINT that you would prefer a
>><32 bit address, when it's known a priori that <32 is better?  Why
>>doesn't the mapping code ALWAYS try to use 32-bit addresses before
>>resorting to 64-bit?
> 
> 
> Because not all memory is addressed via GDT entries.  In fact, almost
> none is, only thread stacks and similar gimmicks.  If all mmap memory
> were served from the low memory pool by default, you would soon run out
> of it, and for no good reason.


All I have to say is... I appreciate your patience with my ignorant 
questions.  :)




* Re: hammer: MAP_32BIT
  2003-05-10 20:10       ` David Woodhouse
@ 2003-05-13 18:54         ` H. Peter Anvin
  0 siblings, 0 replies; 29+ messages in thread
From: H. Peter Anvin @ 2003-05-13 18:54 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <1052597418.1881.2.camel@lapdancer.baythorne.internal>
By author:    David Woodhouse <dwmw2@infradead.org>
In newsgroup: linux.dev.kernel
>
> On Sat, 2003-05-10 at 02:48, Andi Kleen wrote:
> > > Oh, and please rename MAP_32BIT to MAP_31BIT.  This will save nerves on
> > > all sides.
> > 
> > I bet changing it will cost more nerves in supporting all these people
> > whose software doesn't compile anymore. And it's not really a lie. 2GB 
> > is 32bit too.
> 
> If that's _really_ an issue, then also provide MAP_32BIT which does what
> its name implies. 
> 
> Anyone who was using MAP_32BIT in the knowledge that it really limits to
> 31 bits gets the breakage they deserve for not reporting and fixing the
> problem at the time.
> 

Agreed.

That being said, I think a more flexible scheme is called for; I still
would like to suggest the MAP_MAXADDR and MAP_MAXADDR_ADVISORY flags
that I mentioned earlier.

If people really want to retain the (rarely used) suggestion address,
I'd suggest making the address argument a pointer to a structure:

struct map_maxaddr {
	void *search;	/* Suggestion address */
	void *min;	/* Lowest acceptable address */
	void *max;	/* Maximum acceptable address */
};

... however, it seems like overkill to me.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64


Thread overview: 29+ messages
2003-05-09  7:35 hammer: MAP_32BIT Ulrich Drepper
2003-05-09  9:20 ` Andi Kleen
2003-05-09 11:28   ` mikpe
2003-05-09 11:38     ` Andi Kleen
2003-05-09 11:52       ` mikpe
2003-05-09 12:16         ` Andi Kleen
2003-05-09 18:11       ` H. Peter Anvin
2003-05-09 19:24         ` Ulrich Drepper
2003-05-09 20:55           ` H. Peter Anvin
2003-05-09 21:45             ` Ulrich Drepper
2003-05-09 22:07               ` H. Peter Anvin
2003-05-09 22:20                 ` Ulrich Drepper
2003-05-09 22:21                   ` H. Peter Anvin
2003-05-09 22:20               ` Timothy Miller
2003-05-09 22:20                 ` H. Peter Anvin
2003-05-09 22:46                   ` Timothy Miller
2003-05-09 23:24                     ` H. Peter Anvin
2003-05-13 14:25                       ` Timothy Miller
2003-05-09 22:22                 ` Ulrich Drepper
2003-05-09 22:53                   ` Timothy Miller
2003-05-09 23:24                     ` Ulrich Drepper
2003-05-10  0:00                       ` Edgar Toernig
2003-05-10  0:58                         ` Ulrich Drepper
2003-05-10  2:51                           ` Edgar Toernig
2003-05-09 17:36   ` H. Peter Anvin
2003-05-09 17:39   ` Ulrich Drepper
2003-05-10  1:48     ` Andi Kleen
2003-05-10 20:10       ` David Woodhouse
2003-05-13 18:54         ` H. Peter Anvin
