linux-kernel.vger.kernel.org archive mirror
* /proc/<n>/maps getting _VERY_ long
@ 2001-08-04 15:43 Chris Wedgwood
  2001-08-05  2:17 ` Rik van Riel
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Wedgwood @ 2001-08-04 15:43 UTC (permalink / raw)
  To: linux-kernel

Some time ago, the logic for merging VMAs was changed (simplified).
I have noticed that a couple of applications seem a bit sluggish,
specifically those that either grow slowly or use lots of
shared libraries:

cw:tty5@tapu(cw)$ wc -l /proc/1368/maps
   5287 /proc/1368/maps

it's totally unusual.

Can anyone tell me why we don't merge such entries anymore?




  --cw




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-04 15:43 /proc/<n>/maps getting _VERY_ long Chris Wedgwood
@ 2001-08-05  2:17 ` Rik van Riel
  2001-08-05  5:12   ` Chris Wedgwood
  0 siblings, 1 reply; 26+ messages in thread
From: Rik van Riel @ 2001-08-05  2:17 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-kernel

On Sun, 5 Aug 2001, Chris Wedgwood wrote:

> Some time ago, the logic for merging VMAs was changed (simplified).
> I have noticed that a couple of applications seem a bit sluggish,
> specifically those that either grow slowly or use lots of
> shared libraries:
>
> cw:tty5@tapu(cw)$ wc -l /proc/1368/maps
>    5287 /proc/1368/maps

Ouch, what kind of application is this happening with ?

regards,

Rik
--
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-05  2:17 ` Rik van Riel
@ 2001-08-05  5:12   ` Chris Wedgwood
  2001-08-05 13:06     ` Alan Cox
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Wedgwood @ 2001-08-05  5:12 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel

On Sat, Aug 04, 2001 at 11:17:26PM -0300, Rik van Riel wrote:

    > cw:tty5@tapu(cw)$ wc -l /proc/1368/maps
    >    5287 /proc/1368/maps
    
    Ouch, what kind of application is this happening with ?

Mozilla.  Presumably some of the Gnome applications might be the same
as they use lots and lots of shared libraries (anyone out there Gnome
inflicted and can check?).

Why do we no longer merge? Is it too expensive?  If so, perhaps we
could defer merging until some value is reached?

    IA64: a worthy successor to i860.

Interrupts aside it wasn't a bad little processor :)



  --cw


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-05  5:12   ` Chris Wedgwood
@ 2001-08-05 13:06     ` Alan Cox
  2001-08-05 13:18       ` Chris Wedgwood
                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Alan Cox @ 2001-08-05 13:06 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Rik van Riel, linux-kernel

>     Ouch, what kind of application is this happening with ?
> 
> Mozilla.  Presumably some of the Gnome applications might be the same
> as they use lots and lots of shared libraries (anyone out there Gnome
> inflicted and can check?).
> 
> Why do we no longer merge? Is it too expensive?  If so, perhaps we

Linus took it out because it was quite complex and nobody seemed to have
cases that triggered it or made it useful

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-05 13:06     ` Alan Cox
@ 2001-08-05 13:18       ` Chris Wedgwood
  2001-08-05 23:07       ` Jakob Østergaard
  2001-08-05 23:41       ` Linus Torvalds
  2 siblings, 0 replies; 26+ messages in thread
From: Chris Wedgwood @ 2001-08-05 13:18 UTC (permalink / raw)
  To: Alan Cox; +Cc: Rik van Riel, linux-kernel

On Sun, Aug 05, 2001 at 02:06:16PM +0100, Alan Cox wrote:

    Linus took it out because it was quite complex and nobody seemed to
    have cases that triggered it or made it useful

Hmm... well it seems there are cases which trigger this, mozilla and
vmware being quite common.

Is a less heavy-handed approach than the original code possible?
Something like: when inserting into a process's vmas, if there are more
than <n> entries, we lock/scan/coalesce/unlock --- or would this
locking be too gross?
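
To make the scan/coalesce step concrete, here is a minimal userspace
sketch of such a pass over a sorted array. The struct and function names
(region, coalesce_regions) are made up for illustration and are not
kernel code; a real merge of file-backed mappings would also have to
check that the file offsets are contiguous, which this ignores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for a vma; not the kernel's struct. */
struct region {
    unsigned long start, end;   /* covers [start, end) */
    char flags[8];              /* e.g. "rw-p", as printed in /proc/<n>/maps */
    char dev[16];               /* e.g. "00:00" for anonymous memory */
};

/*
 * Merge adjacent entries that have identical flags and device, in place.
 * Assumes r[] is sorted by start address. Returns the new entry count.
 */
size_t coalesce_regions(struct region *r, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (out > 0 &&
            r[out - 1].end == r[i].start &&
            strcmp(r[out - 1].flags, r[i].flags) == 0 &&
            strcmp(r[out - 1].dev, r[i].dev) == 0) {
            r[out - 1].end = r[i].end;   /* extend the previous entry */
        } else {
            r[out++] = r[i];             /* keep as a separate entry */
        }
    }
    return out;
}
```

The pass is linear in the number of entries, so the cost being asked
about is mostly the time the lock would be held while it runs.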



  --cw

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-05 13:06     ` Alan Cox
  2001-08-05 13:18       ` Chris Wedgwood
@ 2001-08-05 23:07       ` Jakob Østergaard
  2001-08-05 23:41       ` Linus Torvalds
  2 siblings, 0 replies; 26+ messages in thread
From: Jakob Østergaard @ 2001-08-05 23:07 UTC (permalink / raw)
  To: Alan Cox; +Cc: Chris Wedgwood, Rik van Riel, linux-kernel

On Sun, Aug 05, 2001 at 02:06:16PM +0100, Alan Cox wrote:
> >     Ouch, what kind of application is this happening with ?
> > 
> > Mozilla.  Presumably some of the Gnome applications might be the same
> > as they use lots and lots of shared libraries (anyone out there Gnome
> > inflicted and can check?).
> > 
> > Why do we no longer merge? Is it too expensive?  If so, perhaps we
> 
> Linus took it out because it was quite complex and nobody seemed to have
> cases that triggered it or made it useful

What ??

It was put back in because RH GCC-2.96 triggers this too.  There was a thread
about this some months ago.

Did it get re-removed ?

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-05 13:06     ` Alan Cox
  2001-08-05 13:18       ` Chris Wedgwood
  2001-08-05 23:07       ` Jakob Østergaard
@ 2001-08-05 23:41       ` Linus Torvalds
  2001-08-06  0:41         ` Michael H. Warfield
  2001-08-06  9:43         ` [LONGish] Brief analysis of VMAs (was: /proc/<n>/maps getting _VERY_ long) Chris Wedgwood
  2 siblings, 2 replies; 26+ messages in thread
From: Linus Torvalds @ 2001-08-05 23:41 UTC (permalink / raw)
  To: jakob, linux-kernel

In article <20010806010738.B11372@unthought.net> you write:
>> 
>> Linus took it out because it was quite complex and nobody seemed to have
>> cases that triggered it or made it useful
>
>What ??
>
>It was put back in because RH GCC-2.96 triggers this too.  There was a thread
>about this some months ago.

Strictly speaking, it wasn't put back. 

What recent kernels will do is merge a certain subset of mergeable
areas: this speeds up anonymous page allocation, whether by
mmap(MAP_ANONYMOUS) or by brk(). That subset was just made a bit larger
(and no, the subset hasn't been shrunk).

However, it doesn't merge in the generic case (it does not merge
mappings with backing store, for example), and it also does not merge
the case of the user actively changing the memory protections, for
example. 

So we certainly used to do more aggressive merging.

We could merge more, but I'm not interested in working around broken
applications. Right now we sanely merge the cases of consecutive
anonymous mmaps, but we do _not_ merge cases where the app plays silly
games, for example.

I'd like to know more than just the app that shows problems - I'd like
to know what it is doing.

		Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-05 23:41       ` Linus Torvalds
@ 2001-08-06  0:41         ` Michael H. Warfield
  2001-08-06  1:01           ` Linus Torvalds
  2001-08-06  9:43         ` [LONGish] Brief analysis of VMAs (was: /proc/<n>/maps getting _VERY_ long) Chris Wedgwood
  1 sibling, 1 reply; 26+ messages in thread
From: Michael H. Warfield @ 2001-08-06  0:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: jakob, linux-kernel

On Sun, Aug 05, 2001 at 04:41:43PM -0700, Linus Torvalds wrote:

	[...]

	I haven't been following this thread previously so I may be
way off base on this, but this caught my attention...

> So we certainly used to do more aggressive merging.

> We could merge more, but I'm not interested in working around broken
> applications. Right now we sanely merge the cases of consecutive
> anonymous mmaps, but we do _not_ merge cases where the app plays silly
> games, for example.

	Hmmm...  Apps that play silly games (intentionally) and
(deliberately) broken apps begin to fall into my territory.  Does
it become possible for a user application to create a system wide
denial of service by playing silly games or does this only affect
the application itself?  Yes, I know there are always ways of creating
denial of service attacks ala fork bombs and such, and I'm coming in on
this thread late, I'm just wondering about the scope of impact of "a
broken application" and does it give some leverage that can be
exploited by some misbehaving individual on a system?

> I'd like to know more than just the app that shows problems - I'd like
> to know what it is doing.

	Bruce Schneier put it best...  Fighting with broken applications
and classical "QA" and testing is programming for Murphy's computer.
Stuff goes bump in the night and broken apps cause bad things to happen.
In the security realm, we are programming for Satan's computer and have
to consider "apps that show problems" in the face of malicious intent.
What if what it is doing is trying to bring the system to its knees?

	If it only causes problems for the broken app, that's fine.  If it
causes problems for the rest of the system, that could be bad.

> 		Linus

	Mike
-- 
 Michael H. Warfield    |  (770) 985-6132   |  mhw@WittsEnd.com
  (The Mad Wizard)      |  (678) 463-0932   |  http://www.wittsend.com/mhw/
  NIC whois:  MHW9      |  An optimist believes we live in the best of all
 PGP Key: 0xDF1DD471    |  possible worlds.  A pessimist is sure of it!


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-06  0:41         ` Michael H. Warfield
@ 2001-08-06  1:01           ` Linus Torvalds
  2001-08-06  1:17             ` H. Peter Anvin
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2001-08-06  1:01 UTC (permalink / raw)
  To: linux-kernel

In article <20010805204143.A18899@alcove.wittsend.com>,
Michael H. Warfield <mhw@wittsend.com> wrote:
>On Sun, Aug 05, 2001 at 04:41:43PM -0700, Linus Torvalds wrote:
>
>> We could merge more, but I'm not interested in working around broken
>
>	If it only causes problems for the broken app, that's fine.  If it
>causes problems for the rest of the system, that could be bad.

It only causes problems for the broken app. Even then, the problem is a
(likely undetectable) slowdown, or in the extreme case the kernel will
just tell it that "Ok, you've allocated enough, no more soup for you".

		Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-06  1:01           ` Linus Torvalds
@ 2001-08-06  1:17             ` H. Peter Anvin
  2001-08-06  4:26               ` Linus Torvalds
  2001-08-06 11:52               ` Alan Cox
  0 siblings, 2 replies; 26+ messages in thread
From: H. Peter Anvin @ 2001-08-06  1:17 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <9kkq9k$829$1@penguin.transmeta.com>
By author:    torvalds@transmeta.com (Linus Torvalds)
In newsgroup: linux.dev.kernel
>
> In article <20010805204143.A18899@alcove.wittsend.com>,
> Michael H. Warfield <mhw@wittsend.com> wrote:
> >On Sun, Aug 05, 2001 at 04:41:43PM -0700, Linus Torvalds wrote:
> >
> >> We could merge more, but I'm not interested in working around broken
> >
> >	If it only causes problems for the broken app, that's fine.  If it
> >causes problems for the rest of the system, that could be bad.
> 
> It only causes problems for the broken app. Even then, the problem is a
> (likely undetectable) slowdown, or in the extreme case the kernel will
> just tell it that "Ok, you've allocated enough, no more soup for you".
> 

Do you count applications which selectively mprotect()'s memory (to
trap SIGSEGV and maintain coherency with on-disk data structures) as
"broken applications"?

Such applications *can* use large amounts of mprotect()'s.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-06  1:17             ` H. Peter Anvin
@ 2001-08-06  4:26               ` Linus Torvalds
  2001-08-06  6:30                 ` H. Peter Anvin
  2001-08-06 18:41                 ` Jamie Lokier
  2001-08-06 11:52               ` Alan Cox
  1 sibling, 2 replies; 26+ messages in thread
From: Linus Torvalds @ 2001-08-06  4:26 UTC (permalink / raw)
  To: linux-kernel

In article <9kkr7r$mov$1@cesium.transmeta.com>,
H. Peter Anvin <hpa@zytor.com> wrote:
>
>Do you count applications which selectively mprotect()'s memory (to
>trap SIGSEGV and maintain coherency with on-disk data structures) as
>"broken applications"?
>
>Such applications *can* use large amounts of mprotect()'s.

Note that such applications tend to not get any advantage from merging -
it does in fact only slow things down (because then the next mprotect
just has to split the thing again).

No, they aren't broken, but they should know that the use of lots of
small memory segments (even if it is a design goal) can and will slow
down page faulting, and use more memory for MM management for example. 

Linux does have a log(n) vma lookup, so the slowdown isn't huge.
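
As a rough illustration of why the lookup cost grows only
logarithmically, here is a userspace sketch with the same contract as
find_vma() (return the first area whose end lies above the address). A
sorted array with binary search stands in for the kernel's balanced
tree; the names range and find_range are invented for this example.

```c
#include <stddef.h>

struct range {
    unsigned long start, end;   /* covers [start, end) */
};

/*
 * Return the first range whose end is above addr, or NULL if there is
 * none. Assumes r[] is sorted by address and non-overlapping. As with
 * find_vma(), the result need not contain addr: addr may lie in the gap
 * below the returned range.
 */
const struct range *find_range(const struct range *r, size_t n,
                               unsigned long addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (r[mid].end > addr)
            hi = mid;        /* candidate; look for an earlier one */
        else
            lo = mid + 1;    /* entirely below addr; discard */
    }
    return lo < n ? &r[lo] : NULL;
}
```

Each step halves the search interval, so the 65746 entries mentioned
elsewhere in this thread would take about 17 comparisons, against about
4 for 13 entries.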

		Linus


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-06  4:26               ` Linus Torvalds
@ 2001-08-06  6:30                 ` H. Peter Anvin
  2001-08-06 18:41                 ` Jamie Lokier
  1 sibling, 0 replies; 26+ messages in thread
From: H. Peter Anvin @ 2001-08-06  6:30 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <9kl6aa$87l$1@penguin.transmeta.com>
By author:    torvalds@transmeta.com (Linus Torvalds)
In newsgroup: linux.dev.kernel
>
> In article <9kkr7r$mov$1@cesium.transmeta.com>,
> H. Peter Anvin <hpa@zytor.com> wrote:
> >
> >Do you count applications which selectively mprotect()'s memory (to
> >trap SIGSEGV and maintain coherency with on-disk data structures) as
> >"broken applications"?
> >
> >Such applications *can* use large amounts of mprotect()'s.
> 
> Note that such applications tend to not get any advantage from merging -
> it does in fact only slow things down (because then the next mprotect
> just has to split the thing again).
> 

Unless you're doing a sequential access in the data space, for example
while accessing a large object.  If a single large object (usually
called a BLOB) covers N pages, and is accessed in its entirety, you
will typically have N pagefaults, each of which brings in/unprotects the
page and then mprotect()s it accordingly.  Those could all be merged
back into a single vma.

Now, I don't know how frequently this actually happens, but I do think
it is at least a possibility.
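
For anyone who wants to try the sequential case, a minimal sketch of the
technique follows: a region mapped PROT_NONE, a SIGSEGV handler that
unprotects the faulting page, and a loop touching every page in order.
All names here (touch_sequentially, on_segv, the trap_* variables) are
invented for the example. Note that mprotect() is not on the POSIX list
of async-signal-safe functions, so calling it from a handler is an
assumption that holds in practice on Linux rather than a guarantee.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count;
static char *trap_base;
static size_t trap_len;
static long trap_pagesz;

/* Unprotect the page that faulted; the faulting access is then retried. */
static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)trap_base;

    (void)sig;
    (void)ctx;
    if (addr < base || addr >= base + trap_len) {
        signal(SIGSEGV, SIG_DFL);   /* not ours: let the retry crash */
        return;
    }
    mprotect((void *)(addr & ~((uintptr_t)trap_pagesz - 1)),
             (size_t)trap_pagesz, PROT_READ | PROT_WRITE);
    fault_count++;
}

/*
 * Map npages with PROT_NONE, write to each page in ascending order and
 * return the number of faults taken, or -1 if setup failed.
 */
long touch_sequentially(size_t npages)
{
    struct sigaction sa, old;
    volatile char *p;
    long faults;

    trap_pagesz = sysconf(_SC_PAGESIZE);
    trap_len = npages * (size_t)trap_pagesz;
    trap_base = mmap(NULL, trap_len, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (trap_base == MAP_FAILED)
        return -1;

    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(trap_base, trap_len);
        return -1;
    }

    fault_count = 0;
    p = trap_base;
    for (size_t i = 0; i < npages; i++)
        p[i * (size_t)trap_pagesz] = 1;   /* first touch faults once */

    faults = fault_count;
    sigaction(SIGSEGV, &old, NULL);
    munmap(trap_base, trap_len);
    return faults;
}
```

After the loop every page carries the same protection, which is exactly
the situation where the per-page areas could be merged back into one.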

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [LONGish] Brief analysis of VMAs (was: /proc/<n>/maps getting _VERY_ long)
  2001-08-05 23:41       ` Linus Torvalds
  2001-08-06  0:41         ` Michael H. Warfield
@ 2001-08-06  9:43         ` Chris Wedgwood
  1 sibling, 0 replies; 26+ messages in thread
From: Chris Wedgwood @ 2001-08-06  9:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: jakob, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5727 bytes --]

On Sun, Aug 05, 2001 at 04:41:43PM -0700, Linus Torvalds wrote:

    I'd like to know more than just the app that shows problems - I'd
    like to know what it is doing.

Well, since I initially complained (this time).... I thought I would
try and quantify things a little.

Attached is a program I use for reading 'maps' files and showing what
levels of aggregation are possible, vma-merge-test.c (barf).



Now, I wrote a small proglet to malloc (I'm interested in testing how
glibc behaves, since pretty much everything uses glibc) memory in
chunks until it can do so no longer, this means getting pretty close
to 3G in my machine (not all pages need be resident).
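
The proglet itself is not attached; the following is only a sketch of
what such a test might look like. The function names (count_map_lines,
alloc_chunks) and the max_chunks cap are additions made for this example
so that it can be run safely; the original test simply ran until malloc
failed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Keeps the compiler from optimising the unused allocations away. */
static void *volatile sink;

/* Count the lines in /proc/self/maps; returns -1 if it cannot be read. */
long count_map_lines(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    long lines = 0;
    int c;

    if (!f)
        return -1;
    while ((c = getc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}

/*
 * malloc chunk-sized blocks until malloc fails or max_chunks is reached.
 * The memory is deliberately never freed or touched, so that the effect
 * on the map count can be inspected afterwards. Returns the number of
 * successful allocations.
 */
size_t alloc_chunks(size_t chunk, size_t max_chunks)
{
    size_t n = 0;

    while (n < max_chunks) {
        sink = malloc(chunk);
        if (!sink)
            break;
        n++;
    }
    return n;
}
```

Calling count_map_lines() before and after alloc_chunks() with different
chunk sizes gives figures comparable in kind to the table below, though
the numbers will depend heavily on the kernel and glibc versions in use.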

Now, if I malloc 1 megabyte chunks, things coalesce very nicely; in
fact, they coalesce about as much as you reasonably can coalesce things,
to about 13 vmas or so, which is about as good as you can hope for if
using shared libraries.

If I allocate 4K chunks, I get 65746 vmas! Values in between obviously
have varying effects:

     13 alloc-1M
   3069 alloc-512K
   7151 alloc-256K
  32731 alloc-64K
  65746 alloc-4K

like so.

the 4K allocations could actually be coalesced into only 11 vmas (the
fact it does better than 1M is because we have better granularity so it
fills in gaps where 1M chunks simply won't fit)!

alloc-512K can't be coalesced at all, alloc-256K can be by about 50%,
alloc-128K by 25% and alloc-64K by 12.5% --- no points for spotting
the pattern.

Using strace....

... for 1M allocations, I can see there are 2038 mmap's for 257*4k to
allocate the 2G or so, and brk is used to 'allocate' 257*4k chunks of
memory 897 times, which pretty much gives us our 3G.  FWIW, mmap is
used for the first 1G or so, brk for the next 1G, and mmap for the
last 1G, with a call to mprotect hidden in there.  The mmaps are
PROT_READ|PROT_WRITE.

... for 4K allocations, I can see brk is used to allocate 8K chunks,
114000 times or so, getting 1G, then mmap is used to allocate 2M
chunks, of which about 1M is munmapp'ed and for the remaining 1M
mprotect is called (page by page!) making about 250 odd mprotect
calls.  This appears to happen until 3G is allocated. The mmaps are
PROT_NONE and the mprotect's change this to PROT_READ|PROT_WRITE.

... for 128K allocations, mmap is used to grab 33*4k about 1024 times,
netting 128M, brk is used to allocate 32(+/-1)*4k pages about 7000 times,
netting around 1G, and then there is a pattern of mmap 2M, munmap 1M,
mprotect {33,32,32,32,32,32}*4k of the still-mapped 1M --- this is the
bit that sucks.  The mmap is done PROT_NONE and the mprotects change
this to PROT_READ|PROT_WRITE, but not for all of the 1M, so the ability
to coalesce here is thwarted (you can coalesce the 32*4k mprotect
regions, but the remaining region has the wrong protection).



What does this have to do with reality?

IT DEPENDS ON WHAT APPLICATION(S) YOU ARE RUNNING.

It appears mozilla, that super lean, super fast and very stable
web-browser, mostly grows using brk with fairly small increments (under
64K) as it reads data in from various places --- and from several
threads at a time.... and lots of small allocations appear to be a
"Very Bad Thing".  A couple of people sent me examples of other
applications that cause problems too, for example David Luyer sent me
the map for evolution-mail which is some new-fangled "pointy-clicky
Gnome super-widget-enhanced" mail application --- perhaps that also
grows memory slowly (I don't have an strace of it, so this is just
speculation).

VMware (capitalisation?) also causes large numbers of vmas, but my
attempts to get XFree86, gimp or gcc (when compiling C code) to do so
were unsuccessful; all showed little if any ability to merge vmas.
Compiling a large c++ application might show some gains here, but I
don't have anything large enough to try.


In linux/mm/mmap.c:do_brk I see:

        /* Can we just expand an old anonymous mapping? */
        if (addr) {
                struct vm_area_struct * vma = find_vma(mm, addr-1);
                if (vma && vma->vm_end == addr && !vma->vm_file &&
                    vma->vm_flags == flags) {
                        vma->vm_end = addr + len;
                        goto out;
                }
        }

which explains why allocations from increments of brk do coalesce
well.  Elsewhere in linux/mm/mmap.c:do_mmap_pgoff we have:

        /* Can we just expand an old anonymous mapping? */
        if (addr && !file && !(vm_flags & VM_SHARED)) {
                struct vm_area_struct * vma = find_vma(mm, addr-1);
                if (vma && vma->vm_end == addr && !vma->vm_file &&
                    vma->vm_flags == vm_flags) {
                        vma->vm_end = addr + len;
                        goto out;
                }
        }

so I assume consistent use of mmap will produce good results too.
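
The condition in those two snippets can be modelled in userspace to see
when it applies and when it does not. This is only a model: model_vma
and try_expand are names invented here, and the struct keeps just the
four fields the quoted checks look at.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced model of a vma: only the fields the check reads. */
struct model_vma {
    unsigned long vm_start, vm_end;   /* covers [vm_start, vm_end) */
    unsigned long vm_flags;
    const void *vm_file;              /* NULL for anonymous memory */
};

/*
 * Mirror of the quoted test: grow prev to cover [addr, addr + len) only
 * if prev ends exactly at addr, is anonymous, and has identical flags.
 * Returns true if prev was expanded, false if a new vma would be needed.
 */
bool try_expand(struct model_vma *prev, unsigned long addr,
                unsigned long len, unsigned long flags)
{
    if (prev && prev->vm_end == addr && !prev->vm_file &&
        prev->vm_flags == flags) {
        prev->vm_end = addr + len;
        return true;
    }
    return false;
}
```

The flags comparison is an exact equality, so a neighbouring request
that differs in even one protection bit gets its own vma.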


BUT, glibc doesn't always have consistent use; as I mentioned above,
it will often do

        mmap( .... PROT_FOO ... )
        munmap ( some of the above )           [optional]
        for( ... )
                mprotect ( PROT_BAR ... )

which means the simple logic above cannot coalesce things.
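
Written out as real calls, that sequence looks roughly like the sketch
below. It is a reconstruction from the strace description, not glibc's
actual code; the function name reserve_then_enable is made up, and
PROT_NONE and PROT_READ|PROT_WRITE stand in for PROT_FOO and PROT_BAR.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Reserve twice the wanted space with PROT_NONE, give half of it back,
 * then enable the remainder one page at a time. Returns the number of
 * successful mprotect() calls, or -1 if the setup failed.
 */
long reserve_then_enable(size_t pages)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    size_t len = pages * (size_t)pagesz;
    long ok = 0;
    char *base;

    base = mmap(NULL, 2 * len, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    if (munmap(base + len, len) != 0) {      /* the optional munmap */
        munmap(base, 2 * len);
        return -1;
    }

    for (size_t i = 0; i < pages; i++)       /* page-by-page mprotect */
        if (mprotect(base + i * (size_t)pagesz, (size_t)pagesz,
                     PROT_READ | PROT_WRITE) == 0)
            ok++;

    munmap(base, len);   /* drop this line to inspect /proc/<n>/maps */
    return ok;
}
```

Each mprotect() call changes the protection of one page in the middle of
a larger PROT_NONE area, which is a case the expand-on-insert checks
never see.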




This leaves three (four) possibilities:

   (1) change glibc to avoid the above behavior

   (2) fiddle with mprotect to expand/coalesce regions

   (3) declare problematic applications borked

or maybe

   (4) have more complex vma logic all over the place in the kernel


Anyhow, that's my very brief rather unscientific hand-waving
explanation that seems to make sense to me!

Incidentally, the algorithm in linux/fs/proc/array.c:proc_pid_read_maps
is _terribly_ slow for reading /proc/<n>/maps when there are many
vmas.  We could possibly hack around this by assuming a constant
line-length or something (too gross?).
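
One reading of the constant line-length idea, sketched in userspace: if
every line were padded to the same width, a file position could be
turned into a vma index by division instead of by walking earlier
entries. Both helper names and the fixed width are hypothetical; nothing
like this exists in the code being discussed.

```c
#include <stddef.h>

/*
 * Hypothetical: every maps line is padded to exactly reclen bytes.
 * Index of the record that contains byte offset pos.
 */
size_t record_index(unsigned long long pos, size_t reclen)
{
    return (size_t)(pos / reclen);
}

/* Number of bytes of that record which lie before pos. */
size_t record_skip(unsigned long long pos, size_t reclen)
{
    return (size_t)(pos % reclen);
}
```

With a 100-byte record, for example, a read starting at offset 250 would
begin 50 bytes into the third entry. The obvious cost is that long path
names would have to be truncated or the width made very large.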




  --cw

[-- Attachment #2: vma-merge-test.c --]
[-- Type: text/x-csrc, Size: 1948 bytes --]

/*
 * vma-merge-test.c --- count # entries in /proc/<n>/maps as well as
 * indication of mergability (is that a word?)
 *
 * technically this is buggy --- pass the paper bag :)
 *
 * cw@f00f.org, 6 Aug 2001
 *
 */

#include <errno.h>
#include <stdio.h>
#include <string.h>     /* strerror, strcmp, strcpy */

int main(int argc, char *argv[])
{
    FILE *f;
    char linebuf[128];
    long st, en, len, dunno;
    long pen = 0, ms = 0;
    char flags[32], indev[32], path[256];
    char pflags[32], pindev[32];
    int cu = 0, cm = 0, mc = 0;

    if(argc != 2) {
        fprintf(stderr, "Please supply one argument, the 'maps' file\n");
        return 1;
    }

    if(!(f = fopen(argv[1],"r"))) {
        fprintf(stderr, "error (%s) trying to open '%s'\n", strerror(errno), argv[1]);
        return 2;
    }

    pflags[0] = '\000';
    pindev[0] = '\000';

    while(!feof(f)) {
        if(!fgets(linebuf, sizeof linebuf, f))
            break;

        if(sscanf(linebuf, "%lx-%lx %s %8lx %s %ld %s\n",
                  &st, &en, flags, &len, indev, &dunno, path) < 5) {
            fprintf(stderr, "Bad line\n\t%s\nAborting\n", linebuf);
            break;
        }

        cu++;

        /* same as previous mapping and adjacent, then merge is
           possible */

        if(!strcmp(indev, pindev) && !strcmp(flags, pflags) && (st == pen)) {
            cm++;
            mc++;
        } else {
            if(mc) {
                /* show merged results, using the flags/device of the
                   run that just ended, not of the current line */
                printf("%08lx-%08lx %s %s (%d)\n", ms, pen, pflags, pindev, mc);
            }

            /* show these results */
            printf("%08lx-%08lx %s %s\n", st, en, flags, indev);

            strcpy(pindev, indev);
            strcpy(pflags, flags);
            ms = st;
            mc = 0;
        }
        pen = en;
    }

    printf("\n%d entries, %d merges\n", cu, cm);
    printf("%d with merging, %4.1f%% of original\n", cu - cm,
           (double)(100.0 * (cu - cm) / cu));

    fclose(f);

    return 0;
}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-06  1:17             ` H. Peter Anvin
  2001-08-06  4:26               ` Linus Torvalds
@ 2001-08-06 11:52               ` Alan Cox
  2001-08-06 12:23                 ` Chris Wedgwood
  1 sibling, 1 reply; 26+ messages in thread
From: Alan Cox @ 2001-08-06 11:52 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

> Do you count applications which selectively mprotect()'s memory (to
> trap SIGSEGV and maintain coherency with on-disk data structures) as
> "broken applications"?
> 
> Such applications *can* use large amounts of mprotect()'s.

That would explain a lot since mprotect currently doesn't seem to do
merging, and worse it also seems to not be doing rlimit checking right

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-06 11:52               ` Alan Cox
@ 2001-08-06 12:23                 ` Chris Wedgwood
  2001-08-06 13:17                   ` Alan Cox
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Wedgwood @ 2001-08-06 12:23 UTC (permalink / raw)
  To: Alan Cox; +Cc: H. Peter Anvin, linux-kernel

On Mon, Aug 06, 2001 at 12:52:37PM +0100, Alan Cox wrote:

    That would explain a lot since mprotect currently doesn't seem to do
    merging, and worse it also seems to not be doing rlimit checking right

Err stupid question, but why does it need to do rlimit checking?


  --cw

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-06 12:23                 ` Chris Wedgwood
@ 2001-08-06 13:17                   ` Alan Cox
  2001-08-06 13:55                     ` Chris Wedgwood
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Cox @ 2001-08-06 13:17 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Alan Cox, H. Peter Anvin, linux-kernel

> On Mon, Aug 06, 2001 at 12:52:37PM +0100, Alan Cox wrote:
> 
>     That would explain a lot since mprotect currently doesn't seem to do
>     merging, and worse it also seems to not be doing rlimit checking right
> 
> Err stupid question, but why does it need to do rlimit checking?

mmap nothing over a large space
mprotect it read/write


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-06 13:17                   ` Alan Cox
@ 2001-08-06 13:55                     ` Chris Wedgwood
  0 siblings, 0 replies; 26+ messages in thread
From: Chris Wedgwood @ 2001-08-06 13:55 UTC (permalink / raw)
  To: Alan Cox; +Cc: H. Peter Anvin, linux-kernel

On Mon, Aug 06, 2001 at 02:17:32PM +0100, Alan Cox wrote:

    mmap nothing over a large space

shouldn't the rlimit be in the mmap?
(or are sparse mappings not supposed to count towards the rlimit?)



  --cw

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-06  4:26               ` Linus Torvalds
  2001-08-06  6:30                 ` H. Peter Anvin
@ 2001-08-06 18:41                 ` Jamie Lokier
  2001-08-10 21:55                   ` Linus Torvalds
  1 sibling, 1 reply; 26+ messages in thread
From: Jamie Lokier @ 2001-08-06 18:41 UTC (permalink / raw)
  To: Linus Torvalds, H. Peter Anvin; +Cc: linux-kernel

Linus Torvalds wrote:
> >Do you count applications which selectively mprotect()'s memory (to
> >trap SIGSEGV and maintain coherency with on-disk data structures) as
> >"broken applications"?
> >
> >Such applications *can* use large amounts of mprotect()'s.
> 
> Note that such applications tend to not get any advantage from merging -
> it does in fact only slow things down (because then the next mprotect
> just has to split the thing again).
> 
> No, they aren't broken, but they should know that the use of lots of
> small memory segments (even if it is a design goal) can and will slow
> down page faulting, and use more memory for MM management for example. 
> 
> Linux does have a log(n) vma lookup, so the slowdown isn't huge.

There are garbage collectors that use mprotect() and SEGV trapping per
page.  It would be nice if there was a way to change the protections per
page without requiring a VMA for each one.

Btw, Linux has pretty fast SIGSEGV handling (the fastest of any
OS/machine combination that I measured), so it's a good platform for
this sort of thing.  I measured 7.75 microseconds per page for SEGV
trapping followed by mprotect() in the handler, on a particular test on
a 600MHz Pentium III.

-- Jamie

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-06 18:41                 ` Jamie Lokier
@ 2001-08-10 21:55                   ` Linus Torvalds
  2001-08-10 22:00                     ` H. Peter Anvin
  2001-08-11  1:04                     ` Pavel Machek
  0 siblings, 2 replies; 26+ messages in thread
From: Linus Torvalds @ 2001-08-10 21:55 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: H. Peter Anvin, linux-kernel


On Mon, 6 Aug 2001, Jamie Lokier wrote:
>
> There are garbage collectors that use mprotect() and SEGV trapping per
> page.  It would be nice if there was a way to change the protections per
> page without requiring a VMA for each one.

This is actually how Linux used to work a long long time ago - all
protection information was in the page tables, and you could do per-page
things without having to worry about piddling details like vma's.

It does work, but it had major downsides. Trivial things like re-creating
the permission after throwing a page out or swapping it out.

We used to have these "this is a COW page" and "this is shared writable"
bits in the page table etc - there are two sw bits on x86, and I think we
used them both.

These days, the vma's just have too much information, and the page tables
can't be counted on to have enough bits.

So on one level I basically agree with you, but at the same time it's just
not feasible any more. The VM got a lot better, and got ported to other
architectures. And it started needing more information - it used to be
enough to know whether a page was shared writable or privately writable or
not writable at all, but back then we didn't really support the full
semantics of shared memory or mprotect, so we didn't need all the
information we have to have now.

They were "the good old days", but trust me, you really don't want them
back. The vma's have some overhead, but it is not excessive, and they
really make things like a portable VM layer possible..

It's very hard to actually see any performance impact of the VMA handling.
It's a small structure, with reasonable lookup algorithms, and the common
case is still to not have all that many of them.

		Linus


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-10 21:55                   ` Linus Torvalds
@ 2001-08-10 22:00                     ` H. Peter Anvin
  2001-08-10 23:03                       ` Nicolas Pitre
  2001-08-10 23:26                       ` Linus Torvalds
  2001-08-11  1:04                     ` Pavel Machek
  1 sibling, 2 replies; 26+ messages in thread
From: H. Peter Anvin @ 2001-08-10 22:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jamie Lokier, linux-kernel

Linus Torvalds wrote:
> 
> These days, the vma's just have too much information, and the
> page tables
> can't be counted on to have enough bits.
> 

Note that it isn't very hard to deal with *that* problem, *if you want 
to*... you just need to maintain a shadow data structure in the same 
format as the page tables and stuff your software bits in there.

Whether or not that is a good idea is another issue entirely, however, 
on some level it would make sense to separate protection from all the 
other VM things...

	-hpa


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-10 22:00                     ` H. Peter Anvin
@ 2001-08-10 23:03                       ` Nicolas Pitre
  2001-08-10 23:26                       ` Linus Torvalds
  1 sibling, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2001-08-10 23:03 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linus Torvalds, Jamie Lokier, lkml



On Fri, 10 Aug 2001, H. Peter Anvin wrote:

> Linus Torvalds wrote:
> >
> > These days, the vma's just have too much information, and the page tables
> > can't be counted on to have enough bits.
> >
>
> Note that it isn't very hard to deal with *that* problem, *if you want
> to*... you just need to maintain a shadow data structure in the same
> format as the page tables and stuff your software bits in there.

This technique is already used on ARM.


Nicolas



* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-10 22:00                     ` H. Peter Anvin
  2001-08-10 23:03                       ` Nicolas Pitre
@ 2001-08-10 23:26                       ` Linus Torvalds
  2001-08-10 23:55                         ` Rik van Riel
  1 sibling, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2001-08-10 23:26 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Jamie Lokier, linux-kernel


On Fri, 10 Aug 2001, H. Peter Anvin wrote:
>
> Note that it isn't very hard to deal with *that* problem, *if you want
> to*... you just need to maintain a shadow data structure in the same
> format as the page tables and stuff your software bits in there.

Actually, this is what Linux already does.

The Linux page tables _are_ a "shadow data structure", and are
conceptually independent from the hardware page tables (or hash table, or
whatever the actual hardware uses to actually fill in the TLB).

This is most clearly seen on CPU's that don't have traditional page table
trees, but use software fill TLB's, hashes, or other things in hardware.

> Whether or not that is a good idea is another issue entirely, however,
> on some level it would make sense to separate protection from all the
> other VM things...

I think that the current Linux approach is much superior - the page tables
are conceptually a separate shadow data structure, but the way things are
set up, you can choose to make the mapping from the shadow data structure
to the actual hardware data structures be a 1:1 mapping.

This does mean that we do NOT want to make the Linux shadow page tables
contain stuff that is not easy to translate to hardware page tables.
Tough. It's a trade-off: either you overspecify the kernel page tables
(and take the hit of having to keep two separate page tables), or you say
"the kernel page tables are weaker than we could make them", and you get
the optimization of being able to "fold" them on top of the hardware page
tables.

I'm 100% convinced that the Linux VM makes the right choice - we optimize
for the important case, and I will claim that it is _really_ hard for
anybody to make a VM that is as efficient and as fast as the Linux one.

Proof: show me a full-fledged VM setup that even comes _close_ in
performance, and gives the protection and the flexibility that the Linux
one does.

And yes, we do have _another_ shadow data structure too. It's called the
vm_area_struct, aka "vma", and we do not artificially limit ourselves to
trying to look like hardware on that one.

Which brings us back to the original question, and answers it: we already
do all of this, and we do it RIGHT. We optimize for the right things.

		Linus



* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-10 23:26                       ` Linus Torvalds
@ 2001-08-10 23:55                         ` Rik van Riel
  0 siblings, 0 replies; 26+ messages in thread
From: Rik van Riel @ 2001-08-10 23:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Jamie Lokier, linux-kernel

On Fri, 10 Aug 2001, Linus Torvalds wrote:

> Which brings us back to the original question, and answers it: we already
> do all of this, and we do it RIGHT. We optimize for the right things.

... and die under load.

There still are a whole number of things outstanding:

1) true low-memory deadlock prevention (memory reservations?)
2) load control, so we won't die from thrashing
3) better IO clustering, to push the thrashing point out further

regards,

Rik
--
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-10 21:55                   ` Linus Torvalds
  2001-08-10 22:00                     ` H. Peter Anvin
@ 2001-08-11  1:04                     ` Pavel Machek
  1 sibling, 0 replies; 26+ messages in thread
From: Pavel Machek @ 2001-08-11  1:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jamie Lokier, H. Peter Anvin, linux-kernel

Hi!

> > There are garbage collectors that use mprotect() and SEGV trapping per
> > page.  It would be nice if there was a way to change the protections per
> > page without requiring a VMA for each one.
> 
> This is actually how Linux used to work a long long time ago - all
> protection information was in the page tables, and you could do per-page
> things without having to worry about piddling details like vma's.
> 
> It does work, but it had major downsides. Trivial things like re-creating
> the permission after throwing a page out or swapping it out.

For some uses, spurious SEGV after swap-in might be okay ;-). Garbage
collector might be that example.				Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
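
To make the garbage-collector case above concrete, here is a toy model of why per-page mprotect() costs one region per page when protection lives in the vma. It is a sketch under stated assumptions, not the kernel's algorithm: regions are plain (start, end, perms) tuples, the function only mimics the splitting and merging effect, and the real merge criteria cover more than permissions.

```python
from typing import List, Tuple

PAGE = 0x1000
Vma = Tuple[int, int, str]   # (start, end, perms), end exclusive


def mprotect(vmas: List[Vma], start: int, end: int, perms: str) -> List[Vma]:
    """Return a new sorted region list with [start, end) set to perms.

    Regions are split at the range boundaries, then neighbours that
    touch and carry equal permissions are merged again.
    """
    pieces: List[Vma] = []
    for s, e, p in vmas:
        if e <= start or s >= end:
            pieces.append((s, e, p))           # untouched region
            continue
        if s < start:
            pieces.append((s, start, p))       # part below the range
        pieces.append((max(s, start), min(e, end), perms))
        if e > end:
            pieces.append((end, e, p))         # part above the range
    merged: List[Vma] = []
    for s, e, p in pieces:
        if merged and merged[-1][1] == s and merged[-1][2] == p:
            merged[-1] = (merged[-1][0], e, p)
        else:
            merged.append((s, e, p))
    return merged


def protect_every_other_page(pages: int) -> List[Vma]:
    """Start from one read-write region and write-protect alternate pages."""
    vmas: List[Vma] = [(0, pages * PAGE, "rw-")]
    for i in range(0, pages, 2):
        vmas = mprotect(vmas, i * PAGE, (i + 1) * PAGE, "r--")
    return vmas
```

In this model a 16-page heap with alternating protections ends up as 16 regions, and restoring a uniform protection over the whole range collapses it back to one.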



* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-05  6:44 /proc/<n>/maps getting _VERY_ long David Luyer
@ 2001-08-05  7:21 ` Anders Eriksson
  0 siblings, 0 replies; 26+ messages in thread
From: Anders Eriksson @ 2001-08-05  7:21 UTC (permalink / raw)
  To: David Luyer; +Cc: linux-kernel, Chris Wedgwood, riel


My current winner is vmware (latest version) with a freshly booted w98:

90 /proc/21582/maps
1015 /proc/14395/maps
3909 total
[ander@milou ander]$ ps 14395
PID TTY      STAT   TIME COMMAND
14395 ttyp2    S    469:40 vmware


/A



* Re: /proc/<n>/maps getting _VERY_ long
@ 2001-08-05  6:44 David Luyer
  2001-08-05  7:21 ` Anders Eriksson
  0 siblings, 1 reply; 26+ messages in thread
From: David Luyer @ 2001-08-05  6:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: Chris Wedgwood, riel


I wrote (off-list):
> On 05 Aug 2001 17:12:02 +1200, Chris Wedgwood wrote:
> > On Sat, Aug 04, 2001 at 11:17:26PM -0300, Rik van Riel wrote:
> >
> >     > cw:tty5@tapu(cw)$ wc -l /proc/1368/maps
> >     >    5287 /proc/1368/maps
> >
> >     Ouch, what kind of application is this happening with ?
> >
> > Mozilla.  Presumably some of the Gnome applications might be the same
> > as they use lots and lots of shared libraries (anyone out there Gnome
> > inflicted and can check?).
>
> FYI: Linux 2.2.14 (yes, I know, it's old but I've had no cause to update
> the machine in question):
>
> Mozilla: 215 lines in /proc/$$/maps
> StarOffice opening a small PowerPoint: 209 lines in /proc/$$/maps
> Evolution Mail Component: 193 lines in /proc/$$/maps
>
> Those are the current 'winners' on my wc -l /proc/*/maps | sort -n but
> I'm not exactly doing anything to stress the machine.  Hard to know if
> the 2.2.x number of mappings will have any correlation with 2.4.x (as
> if 2.4.x isn't aggressively combining ranges but both allocate initially
> as well as each other, it might get a lot worse with long-running
> processes on 2.4.x but not on 2.2.x, for example).

And the same machine, 2.4.7ac5:

Mozilla: 222 lines in /proc/$$/maps on startup... and growing
StarOffice opening a small PowerPoint: 209 lines in /proc/$$/maps
Evolution Mail Component: 181 lines in /proc/$$/maps

But after visiting a few web pages Mozilla has already grown to 265 mappings;
302 mappings; growing... (whereas playing around in Evolution Mail only
increased its number to 185.. actually as I finish off this mail and have
done a few other things it's up to 222 now).
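
For anyone who wants to track these counts over time, here is a minimal sketch that does the same job as wc -l /proc/*/maps. The function name and the proc_root parameter are hypothetical; the parameter exists only so the function can be pointed at a directory other than /proc.

```python
import os


def count_mappings(proc_root="/proc"):
    """Return {pid: number of lines in <proc_root>/<pid>/maps}."""
    counts = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue                      # skip non-process entries
        path = os.path.join(proc_root, name, "maps")
        try:
            with open(path) as f:
                counts[int(name)] = sum(1 for _ in f)
        except OSError:
            # The process exited in the meantime, or we may not read it.
            continue
    return counts
```

Sorting the result by value, for example with sorted(counts.items(), key=lambda kv: kv[1]), gives the same ordering as piping the wc output through sort -n.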

So the problem is something which Mozilla is particularly good at triggering.
Under 2.2.14 the number of mappings for Mozilla wasn't growing significantly
with use.  But that doesn't say that it isn't some kind of 'bad' behaviour
from Mozilla.

Here's some sample mappings for evolution-mail:

40f10000-40f11000 rw-p 000cf000 00:00 0
40f11000-40f12000 rw-p 000d0000 00:00 0
40f12000-40f13000 rw-p 000d1000 00:00 0
40f13000-40f14000 rw-p 000d2000 00:00 0
40f14000-40f15000 rw-p 000d3000 00:00 0
40f15000-40f16000 rw-p 000d4000 00:00 0
40f16000-40f17000 rw-p 000d5000 00:00 0
40f17000-40f19000 rw-p 000d6000 00:00 0
40f19000-40f1a000 rw-p 000d8000 00:00 0
40f1a000-40f1d000 rw-p 000d9000 00:00 0
40f1d000-40f25000 rw-p 000dc000 00:00 0
40f25000-40f26000 rw-p 000e4000 00:00 0
40f26000-40f27000 rw-p 000e5000 00:00 0
[...]

Now I would naively assume those adjacent contiguous mappings with equal
permissions could pretty easily be merged.
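
That assumption can be checked mechanically against the sample. The sketch below parses maps lines and coalesces neighbours that look mergeable from the outside. The names and the merge rule are assumptions for illustration: it requires touching addresses, equal permissions, and equal device and inode, and it checks offset continuity only for file-backed entries. The kernel looks at more state than /proc/<n>/maps exposes, so this gives an upper bound on what could be merged, not a prediction.

```python
from typing import List, NamedTuple

SAMPLE = """\
40f10000-40f11000 rw-p 000cf000 00:00 0
40f11000-40f12000 rw-p 000d0000 00:00 0
40f12000-40f13000 rw-p 000d1000 00:00 0
40f13000-40f14000 rw-p 000d2000 00:00 0
40f14000-40f15000 rw-p 000d3000 00:00 0
40f15000-40f16000 rw-p 000d4000 00:00 0
40f16000-40f17000 rw-p 000d5000 00:00 0
40f17000-40f19000 rw-p 000d6000 00:00 0
40f19000-40f1a000 rw-p 000d8000 00:00 0
40f1a000-40f1d000 rw-p 000d9000 00:00 0
40f1d000-40f25000 rw-p 000dc000 00:00 0
40f25000-40f26000 rw-p 000e4000 00:00 0
40f26000-40f27000 rw-p 000e5000 00:00 0
"""


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int


def parse_maps(text: str) -> List[Mapping]:
    """Parse /proc/<n>/maps text, skipping lines that are not entries."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or "-" not in fields[0]:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        out.append(Mapping(start, end, fields[1], int(fields[2], 16),
                           fields[3], int(fields[4])))
    return out


def mergeable(a: Mapping, b: Mapping) -> bool:
    """Assumed rule: what looks mergeable from the maps output alone."""
    if a.end != b.start or a.perms != b.perms:
        return False
    if a.dev != b.dev or a.inode != b.inode:
        return False
    if a.inode != 0:                      # file-backed: offsets must line up
        return a.offset + (a.end - a.start) == b.offset
    return True


def coalesce(maps: List[Mapping]) -> List[Mapping]:
    merged: List[Mapping] = []
    for m in maps:
        if merged and mergeable(merged[-1], m):
            merged[-1] = merged[-1]._replace(end=m.end)
        else:
            merged.append(m)
    return merged
```

Run on the thirteen sample lines, coalesce(parse_maps(SAMPLE)) leaves a single range, 40f10000-40f27000.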

David.
-- 
David Luyer                                     Phone:   +61 3 9674 7525
Engineering Projects Manager   P A C I F I C    Fax:     +61 3 9699 8693
Pacific Internet (Australia)  I N T E R N E T   Mobile:  +61 4 1111 2983
http://www.pacific.net.au/                      NASDAQ:  PCNTF


end of thread, other threads:[~2001-08-14 11:49 UTC | newest]

Thread overview: 26+ messages
2001-08-04 15:43 /proc/<n>/maps getting _VERY_ long Chris Wedgwood
2001-08-05  2:17 ` Rik van Riel
2001-08-05  5:12   ` Chris Wedgwood
2001-08-05 13:06     ` Alan Cox
2001-08-05 13:18       ` Chris Wedgwood
2001-08-05 23:07       ` Jakob Østergaard
2001-08-05 23:41       ` Linus Torvalds
2001-08-06  0:41         ` Michael H. Warfield
2001-08-06  1:01           ` Linus Torvalds
2001-08-06  1:17             ` H. Peter Anvin
2001-08-06  4:26               ` Linus Torvalds
2001-08-06  6:30                 ` H. Peter Anvin
2001-08-06 18:41                 ` Jamie Lokier
2001-08-10 21:55                   ` Linus Torvalds
2001-08-10 22:00                     ` H. Peter Anvin
2001-08-10 23:03                       ` Nicolas Pitre
2001-08-10 23:26                       ` Linus Torvalds
2001-08-10 23:55                         ` Rik van Riel
2001-08-11  1:04                     ` Pavel Machek
2001-08-06 11:52               ` Alan Cox
2001-08-06 12:23                 ` Chris Wedgwood
2001-08-06 13:17                   ` Alan Cox
2001-08-06 13:55                     ` Chris Wedgwood
2001-08-06  9:43         ` [LONGish] Brief analysis of VMAs (was: /proc/<n>/maps getting _VERY_ long) Chris Wedgwood
2001-08-05  6:44 /proc/<n>/maps getting _VERY_ long David Luyer
2001-08-05  7:21 ` Anders Eriksson
