* /proc/<n>/maps getting _VERY_ long @ 2001-08-04 15:43 Chris Wedgwood 2001-08-05 2:17 ` Rik van Riel 0 siblings, 1 reply; 26+ messages in thread
From: Chris Wedgwood @ 2001-08-04 15:43 UTC (permalink / raw)
To: linux-kernel

Some time ago, the logic for merging VMAs was changed (simplified). I noticed a couple of applications seemed a bit sluggish, specifically when running things that either grow slowly or use lots of shared libraries:

cw:tty5@tapu(cw)$ wc -l /proc/1368/maps
5287 /proc/1368/maps

This seems completely abnormal. Can anyone tell me why we don't merge such entries anymore?

--cw

^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-04 15:43 /proc/<n>/maps getting _VERY_ long Chris Wedgwood @ 2001-08-05 2:17 ` Rik van Riel 2001-08-05 5:12 ` Chris Wedgwood 0 siblings, 1 reply; 26+ messages in thread
From: Rik van Riel @ 2001-08-05 2:17 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: linux-kernel

On Sun, 5 Aug 2001, Chris Wedgwood wrote:
> Some time ago, the logic for merging VMAs was changed (simplified).
> I noticed a couple of applications seemed a bit sluggish, specifically
> when running things that either grow slowly or use lots of shared
> libraries:
>
> cw:tty5@tapu(cw)$ wc -l /proc/1368/maps
> 5287 /proc/1368/maps

Ouch, what kind of application is this happening with?

regards, Rik
--
IA64: a worthy successor to i860.
http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-05 2:17 ` Rik van Riel @ 2001-08-05 5:12 ` Chris Wedgwood 2001-08-05 13:06 ` Alan Cox 0 siblings, 1 reply; 26+ messages in thread
From: Chris Wedgwood @ 2001-08-05 5:12 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-kernel

On Sat, Aug 04, 2001 at 11:17:26PM -0300, Rik van Riel wrote:
> > cw:tty5@tapu(cw)$ wc -l /proc/1368/maps
> > 5287 /proc/1368/maps
>
> Ouch, what kind of application is this happening with?

Mozilla. Presumably some of the Gnome applications might be the same as they use lots and lots of shared libraries (anyone out there Gnome inflicted and can check?).

Why do we no longer merge? Is it too expensive? If so, perhaps we defer merging until some value is reached?

> IA64: a worthy successor to i860.

Interrupts aside it wasn't a bad little processor :)

--cw
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-05 5:12 ` Chris Wedgwood @ 2001-08-05 13:06 ` Alan Cox 2001-08-05 13:18 ` Chris Wedgwood ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread
From: Alan Cox @ 2001-08-05 13:06 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Rik van Riel, linux-kernel

> > Ouch, what kind of application is this happening with?
>
> Mozilla. Presumably some of the Gnome applications might be the same
> as they use lots and lots of shared libraries (anyone out there Gnome
> inflicted and can check?).
>
> Why do we no longer merge? Is it too expensive? If so, perhaps we

Linus took it out because it was quite complex and nobody seemed to have cases that triggered it or made it useful.
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-05 13:06 ` Alan Cox @ 2001-08-05 13:18 ` Chris Wedgwood 2001-08-05 23:07 ` Jakob Østergaard 2001-08-05 23:41 ` Linus Torvalds 2 siblings, 0 replies; 26+ messages in thread
From: Chris Wedgwood @ 2001-08-05 13:18 UTC (permalink / raw)
To: Alan Cox; +Cc: Rik van Riel, linux-kernel

On Sun, Aug 05, 2001 at 02:06:16PM +0100, Alan Cox wrote:
> Linus took it out because it was quite complex and nobody seemed to
> have cases that triggered it or made it useful

Hmm... well it seems there are cases which trigger this, mozilla and vmware being quite common.

Is a less heavy-handed approach than the original code possible? Something like: when inserting into a process's vma list, if there are more than <n> entries, we lock/scan/coalesce/unlock --- or would this locking be too gross?

--cw
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-05 13:06 ` Alan Cox 2001-08-05 13:18 ` Chris Wedgwood @ 2001-08-05 23:07 ` Jakob Østergaard 2001-08-05 23:41 ` Linus Torvalds 2 siblings, 0 replies; 26+ messages in thread
From: Jakob Østergaard @ 2001-08-05 23:07 UTC (permalink / raw)
To: Alan Cox; +Cc: Chris Wedgwood, Rik van Riel, linux-kernel

On Sun, Aug 05, 2001 at 02:06:16PM +0100, Alan Cox wrote:
> > > Ouch, what kind of application is this happening with?
> >
> > Mozilla. Presumably some of the Gnome applications might be the same
> > as they use lots and lots of shared libraries (anyone out there Gnome
> > inflicted and can check?).
> >
> > Why do we no longer merge? Is it too expensive? If so, perhaps we
>
> Linus took it out because it was quite complex and nobody seemed to have
> cases that triggered it or made it useful

What?? It was put back in because RH GCC-2.96 triggers this too. There was a thread about this some months ago.

Did it get re-removed?

--
................................................................
: jakob@unthought.net : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-05 13:06 ` Alan Cox 2001-08-05 13:18 ` Chris Wedgwood 2001-08-05 23:07 ` Jakob Østergaard @ 2001-08-05 23:41 ` Linus Torvalds 2001-08-06 0:41 ` Michael H. Warfield 2001-08-06 9:43 ` [LONGish] Brief analysis of VMAs (was: /proc/<n>/maps getting _VERY_ long) Chris Wedgwood 2 siblings, 2 replies; 26+ messages in thread
From: Linus Torvalds @ 2001-08-05 23:41 UTC (permalink / raw)
To: jakob, linux-kernel

In article <20010806010738.B11372@unthought.net> you write:
>> Linus took it out because it was quite complex and nobody seemed to have
>> cases that triggered it or made it useful
>
> What??
>
> It was put back in because RH GCC-2.96 triggers this too. There was a thread
> about this some months ago.

Strictly speaking, it wasn't put back.

What recent kernels will do is merge a certain subset of mergeable areas: this speeds up anonymous page allocation, whether by mmap(MAP_ANONYMOUS) or by brk(). That subset was just made a bit larger (and no, the subset hasn't been shrunk).

However, it doesn't merge in the generic case (it does not merge mappings with backing store, for example), and it also does not merge the case of the user actively changing the memory protections, for example.

So we certainly used to do more aggressive merging.

We could merge more, but I'm not interested in working around broken applications. Right now we sanely merge the cases of consecutive anonymous mmaps, but we do _not_ merge cases where the app plays silly games, for example.

I'd like to know more than just the app that shows problems - I'd like to know what it is doing.

Linus
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-05 23:41 ` Linus Torvalds @ 2001-08-06 0:41 ` Michael H. Warfield 2001-08-06 1:01 ` Linus Torvalds 2001-08-06 9:43 ` [LONGish] Brief analysis of VMAs (was: /proc/<n>/maps getting _VERY_ long) Chris Wedgwood 1 sibling, 1 reply; 26+ messages in thread From: Michael H. Warfield @ 2001-08-06 0:41 UTC (permalink / raw) To: Linus Torvalds; +Cc: jakob, linux-kernel On Sun, Aug 05, 2001 at 04:41:43PM -0700, Linus Torvalds wrote: [...] I haven't been following this thread previously so I may be way off base on this, but this caught my attention... > So we certainly used to do more aggressive merging. > We could merge more, but I'm not interested in working around broken > applications. Right now we sanely merge the cases of consecutive > anonymous mmaps, but we do _not_ merge cases where the app plays silly > games, for example. Hmmm... Apps that play silly games (intentionally) and (deliberately) broken apps begin to fall into my territory. Does it become possible for a user application to create a system wide denial of service by playing silly games or does this only affect the application itself? Yes, I know there are always ways of creating denial of service attacks ala fork bombs and such, and I'm coming in on this thread late, I'm just wondering about the scope of impact of "a broken application" and does it give some leverage that can be exploited by some misbehaving individual on a system? > I'd like to know more than just the app that shows problems - I'd like > to know what it is doing. Bruce Schneier put it best... Fighting with broken applications and classical "QA" and testing is programming for Murphy's computer. Stuff goes bump in the night and broken apps cause bad things to happen. In the security realm, we are programming for Satan's computer and have to consider "apps that show problems" in the face of malicious intent. What if what it is doing is trying to bring the system to its knees? 
If it only causes problems for the broken app, that's fine. If it causes problems for the rest of the system, that could be bad. > Linus Mike -- Michael H. Warfield | (770) 985-6132 | mhw@WittsEnd.com (The Mad Wizard) | (678) 463-0932 | http://www.wittsend.com/mhw/ NIC whois: MHW9 | An optimist believes we live in the best of all PGP Key: 0xDF1DD471 | possible worlds. A pessimist is sure of it!
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-06 0:41 ` Michael H. Warfield @ 2001-08-06 1:01 ` Linus Torvalds 2001-08-06 1:17 ` H. Peter Anvin 0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2001-08-06 1:01 UTC (permalink / raw)
To: linux-kernel

In article <20010805204143.A18899@alcove.wittsend.com>, Michael H. Warfield <mhw@wittsend.com> wrote:
> On Sun, Aug 05, 2001 at 04:41:43PM -0700, Linus Torvalds wrote:
>
> > We could merge more, but I'm not interested in working around broken
>
> If it only causes problems for the broken app, that's fine. If it
> causes problems for the rest of the system, that could be bad.

It only causes problems for the broken app. Even then, the problem is a (likely undetectable) slowdown, or in the extreme case the kernel will just tell it that "Ok, you've allocated enough, no more soup for you".

Linus
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-06 1:01 ` Linus Torvalds @ 2001-08-06 1:17 ` H. Peter Anvin 2001-08-06 4:26 ` Linus Torvalds 2001-08-06 11:52 ` Alan Cox 0 siblings, 2 replies; 26+ messages in thread From: H. Peter Anvin @ 2001-08-06 1:17 UTC (permalink / raw) To: linux-kernel Followup to: <9kkq9k$829$1@penguin.transmeta.com> By author: torvalds@transmeta.com (Linus Torvalds) In newsgroup: linux.dev.kernel > > In article <20010805204143.A18899@alcove.wittsend.com>, > Michael H. Warfield <mhw@wittsend.com> wrote: > >On Sun, Aug 05, 2001 at 04:41:43PM -0700, Linus Torvalds wrote: > > > >> We could merge more, but I'm not interested in working around broken > > > > If it only causes problems for the broken app, that's fine. If it > >causes problems for the rest of the system, that could be bad. > > It only causes problem for the broken app. Even then, the problem is a > (likely undetectable) slowdown, or in the extreme case the kernel will > just tell it that "Ok, you've allocated enough, no more soup for you". > Do you count applications which selectively mprotect()'s memory (to trap SIGSEGV and maintain coherency with on-disk data structures) as "broken applications"? Such applications *can* use large amounts of mprotect()'s. -hpa -- <hpa@transmeta.com> at work, <hpa@zytor.com> in private! "Unix gives you enough rope to shoot yourself in the foot." http://www.zytor.com/~hpa/puzzle.txt <amsp@zytor.com> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-06 1:17 ` H. Peter Anvin @ 2001-08-06 4:26 ` Linus Torvalds 2001-08-06 6:30 ` H. Peter Anvin 2001-08-06 18:41 ` Jamie Lokier 2001-08-06 11:52 ` Alan Cox 1 sibling, 2 replies; 26+ messages in thread From: Linus Torvalds @ 2001-08-06 4:26 UTC (permalink / raw) To: linux-kernel In article <9kkr7r$mov$1@cesium.transmeta.com>, H. Peter Anvin <hpa@zytor.com> wrote: > >Do you count applications which selectively mprotect()'s memory (to >trap SIGSEGV and maintain coherency with on-disk data structures) as >"broken applications"? > >Such applications *can* use large amounts of mprotect()'s. Note that such applications tend to not get any advantage from merging - it does in fact only slow things down (because then the next mprotect just has to split the thing again). No, they aren't broken, but they should know that the use of lots of small memory segments (even if it is a design goal) can and will slow down page faulting, and use more memory for MM management for example. Linux does have a log(n) vma lookup, so the slowdown isn't huge. Linus ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-06 4:26 ` Linus Torvalds @ 2001-08-06 6:30 ` H. Peter Anvin 2001-08-06 18:41 ` Jamie Lokier 1 sibling, 0 replies; 26+ messages in thread
From: H. Peter Anvin @ 2001-08-06 6:30 UTC (permalink / raw)
To: linux-kernel

Followup to: <9kl6aa$87l$1@penguin.transmeta.com>
By author: torvalds@transmeta.com (Linus Torvalds)
In newsgroup: linux.dev.kernel
>
> In article <9kkr7r$mov$1@cesium.transmeta.com>,
> H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > Do you count applications which selectively mprotect()'s memory (to
> > trap SIGSEGV and maintain coherency with on-disk data structures) as
> > "broken applications"?
> >
> > Such applications *can* use large amounts of mprotect()'s.
>
> Note that such applications tend to not get any advantage from merging -
> it does in fact only slow things down (because then the next mprotect
> just has to split the thing again).

Unless you're doing a sequential access in the data space, for example while accessing a large object. If a single large object (usually called a BLOB) covers N pages, and is accessed in its entirety, you will typically have N pagefaults, each of which brings in/unprotects the page and then mprotect()s it accordingly. Those could all be merged back into a single vma.

Now, I don't know how frequently this actually happens, but I do think it is at least a possibility.

-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <amsp@zytor.com>
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-06 4:26 ` Linus Torvalds 2001-08-06 6:30 ` H. Peter Anvin @ 2001-08-06 18:41 ` Jamie Lokier 2001-08-10 21:55 ` Linus Torvalds 1 sibling, 1 reply; 26+ messages in thread From: Jamie Lokier @ 2001-08-06 18:41 UTC (permalink / raw) To: Linus Torvalds, H. Peter Anvin; +Cc: linux-kernel Linus Torvalds wrote: > >Do you count applications which selectively mprotect()'s memory (to > >trap SIGSEGV and maintain coherency with on-disk data structures) as > >"broken applications"? > > > >Such applications *can* use large amounts of mprotect()'s. > > Note that such applications tend to not get any advantage from merging - > it does in fact only slow things down (because then the next mprotect > just has to split the thing again). > > No, they aren't broken, but they should know that the use of lots of > small memory segments (even if it is a design goal) can and will slow > down page faulting, and use more memory for MM management for example. > > Linux does have a log(n) vma lookup, so the slowdown isn't huge. There are garbage collectors that use mprotect() and SEGV trapping per page. It would be nice if there was a way to change the protections per page without requiring a VMA for each one. Btw, Linux has pretty fast SIGSEGV handling (the fastest of any OS/machine combination that I measured), so it's a good platform for this sort of thing. I measured 7.75 microseconds per page for SEGV trapping followed by mprotect() in the handler, on a particular test on a 600MHz Pentium III. -- Jamie ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-06 18:41 ` Jamie Lokier @ 2001-08-10 21:55 ` Linus Torvalds 2001-08-10 22:00 ` H. Peter Anvin 2001-08-11 1:04 ` Pavel Machek 0 siblings, 2 replies; 26+ messages in thread From: Linus Torvalds @ 2001-08-10 21:55 UTC (permalink / raw) To: Jamie Lokier; +Cc: H. Peter Anvin, linux-kernel On Mon, 6 Aug 2001, Jamie Lokier wrote: > > There are garbage collectors that use mprotect() and SEGV trapping per > page. It would be nice if there was a way to change the protections per > page without requiring a VMA for each one. This is actually how Linux used to work a long long time ago - all protection information was in the page tables, and you could do per-page things without having to worry about piddling details like vma's. It does work, but it had major downsides. Trivial things like re-creating the permission after throwing a page out or swapping it out. We used to have these "this is a COW page" and "this is shared writable" bits in the page table etc - there are two sw bits on x86, and I think we used them both. These days, the vma's just have too much information, and the page tables can't be counted on to have enough bits. So on one level I basically agree with you, but at the same time it's just not feasible any more. The VM got a lot better, and got ported to other architectures. And it started needing more information - it used to be enough to know whether a page was shared writable or privately writable or not writable at all, but back then we didn't really support the full semantics of shared memory or mprotect, so we didn't need all the information we have to have now. They were "the good old days", but trust me, you really don't want them back. The vma's have some overhead, but it is not excessive, and they really make things like a portable VM layer possible.. It's very hard to actually see any performance impact of the VMA handling. 
It's a small structure, with reasonable lookup algorithms, and the common case is still to not have all that many of them. Linus ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-10 21:55 ` Linus Torvalds @ 2001-08-10 22:00 ` H. Peter Anvin 2001-08-10 23:03 ` Nicolas Pitre 2001-08-10 23:26 ` Linus Torvalds 2001-08-11 1:04 ` Pavel Machek 1 sibling, 2 replies; 26+ messages in thread From: H. Peter Anvin @ 2001-08-10 22:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jamie Lokier, linux-kernel Linus Torvalds wrote: > > These days, the vma's just have too much information, and the > page tables > can't be counted on to have enough bits. > Note that it isn't very hard to deal with *that* problem, *if you want to*... you just need to maintain a shadow data structure in the same format as the page tables and stuff your software bits in there. Whether or not that is a good idea is another issue entirely, however, on some level it would make sense to separate protection from all the other VM things... -hpa ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-10 22:00 ` H. Peter Anvin @ 2001-08-10 23:03 ` Nicolas Pitre 2001-08-10 23:26 ` Linus Torvalds 1 sibling, 0 replies; 26+ messages in thread From: Nicolas Pitre @ 2001-08-10 23:03 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Linus Torvalds, Jamie Lokier, lkml On Fri, 10 Aug 2001, H. Peter Anvin wrote: > Linus Torvalds wrote: > > > > These days, the vma's just have too much information, and the > > page tables > > can't be counted on to have enough bits. > > > > Note that it isn't very hard to deal with *that* problem, *if you want > to*... you just need to maintain a shadow data structure in the same > format as the page tables and stuff your software bits in there. This technique is already used on ARM. Nicolas ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-10 22:00 ` H. Peter Anvin 2001-08-10 23:03 ` Nicolas Pitre @ 2001-08-10 23:26 ` Linus Torvalds 2001-08-10 23:55 ` Rik van Riel 1 sibling, 1 reply; 26+ messages in thread From: Linus Torvalds @ 2001-08-10 23:26 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Jamie Lokier, linux-kernel On Fri, 10 Aug 2001, H. Peter Anvin wrote: > > Note that it isn't very hard to deal with *that* problem, *if you want > to*... you just need to maintain a shadow data structure in the same > format as the page tables and stuff your software bits in there. Actually, this is what Linux already does. The Linux page tables _are_ a "shadow data structure", and are conceptually independent from the hardware page tables (or hash table, or whatever the actual hardware uses to actually fill in the TLB). This is most clearly seen on CPU's that don't have traditional page table trees, but use software fill TLB's, hashes, or other things in hardware. > Whether or not that is a good idea is another issue entirely, however, > on some level it would make sense to separate protection from all the > other VM things... I think that the current Linux approach is much superior - the page tables are conceptually a separate shadow data structure, but the way things are set up, you can choose to make the mapping from the shadow data structure to the actual hardware data structures be a 1:1 mapping. This does mean that we do NOT want to make the Linux shadow page tables contain stuff that is not easy to translate to hardware page tables. Tough. It's a trade-off: either you overspecify the kernel page tables (and take the hit of having to keep two separate page tables), or you say "the kernel page tables are weaker than we could make them", and you get the optimization of being able to "fold" them on top of the hardware page tables. 
I'm 100% convinced that the Linux VM does the right choice - we optimize for the important case, and I will claim that it is _really_ hard for anybody to make a VM that is as efficient and as fast as the Linux one. Proof: show me a full-fledged VM setup that even comes _close_ in performance, and gives the protection and the flexibility that the Linux one does. And yes, we do have _another_ shadow data structure too. It's called the vm_area_struct, aka "vma", and we do not artificially limit ourself to trying to look like hardware on that one. Which brings us back to the original question, and answers it: we already do all of this, and we do it RIGHT. We optimize for the right things. Linus ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-10 23:26 ` Linus Torvalds @ 2001-08-10 23:55 ` Rik van Riel 0 siblings, 0 replies; 26+ messages in thread From: Rik van Riel @ 2001-08-10 23:55 UTC (permalink / raw) To: Linus Torvalds; +Cc: H. Peter Anvin, Jamie Lokier, linux-kernel On Fri, 10 Aug 2001, Linus Torvalds wrote: > Which brings us back to the original question, and answers it: we already > do all of this, and we do it RIGHT. We optimize for the right things. ... and die under load. There still are a whole number of things outstanding: 1) true low-memory deadlock prevention (memory reservations?) 2) load control, so we won't die from thrashing 3) better IO clustering, to push the thrashing point out further regards, Rik -- IA64: a worthy successor to i860. http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to aardvark@nl.linux.org (spam digging piggy) ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-10 21:55 ` Linus Torvalds 2001-08-10 22:00 ` H. Peter Anvin @ 2001-08-11 1:04 ` Pavel Machek 1 sibling, 0 replies; 26+ messages in thread From: Pavel Machek @ 2001-08-11 1:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jamie Lokier, H. Peter Anvin, linux-kernel Hi! > > There are garbage collectors that use mprotect() and SEGV trapping per > > page. It would be nice if there was a way to change the protections per > > page without requiring a VMA for each one. > > This is actually how Linux used to work a long long time ago - all > protection information was in the page tables, and you could do per-page > things without having to worry about piddling details like vma's. > > It does work, but it had major downsides. Trivial things like re-creating > the permission after throwing a page out or swapping it out. For some uses, spurious SEGV after swap-in might be okay ;-). Garbage collector might be that example. Pavel -- Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt, details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-06 1:17 ` H. Peter Anvin 2001-08-06 4:26 ` Linus Torvalds @ 2001-08-06 11:52 ` Alan Cox 2001-08-06 12:23 ` Chris Wedgwood 1 sibling, 1 reply; 26+ messages in thread From: Alan Cox @ 2001-08-06 11:52 UTC (permalink / raw) To: H. Peter Anvin; +Cc: linux-kernel > Do you count applications which selectively mprotect()'s memory (to > trap SIGSEGV and maintain coherency with on-disk data structures) as > "broken applications"? > > Such applications *can* use large amounts of mprotect()'s. That would explain a lot since mprotect currently doesn't seem to do merging, and worse it also seems to not be doing rlimit checking right ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-06 11:52 ` Alan Cox @ 2001-08-06 12:23 ` Chris Wedgwood 2001-08-06 13:17 ` Alan Cox 0 siblings, 1 reply; 26+ messages in thread
From: Chris Wedgwood @ 2001-08-06 12:23 UTC (permalink / raw)
To: Alan Cox; +Cc: H. Peter Anvin, linux-kernel

On Mon, Aug 06, 2001 at 12:52:37PM +0100, Alan Cox wrote:
> That would explain a lot since mprotect currently doesn't seem to do
> merging, and worse it also seems to not be doing rlimit checking right

Err, stupid question, but why does it need to do rlimit checking?

--cw
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-06 12:23 ` Chris Wedgwood @ 2001-08-06 13:17 ` Alan Cox 2001-08-06 13:55 ` Chris Wedgwood 0 siblings, 1 reply; 26+ messages in thread
From: Alan Cox @ 2001-08-06 13:17 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Alan Cox, H. Peter Anvin, linux-kernel

> On Mon, Aug 06, 2001 at 12:52:37PM +0100, Alan Cox wrote:
>
> That would explain a lot since mprotect currently doesn't seem to do
> merging, and worse it also seems to not be doing rlimit checking right
>
> Err, stupid question, but why does it need to do rlimit checking?

mmap nothing over a large space, mprotect it read/write.
* Re: /proc/<n>/maps getting _VERY_ long 2001-08-06 13:17 ` Alan Cox @ 2001-08-06 13:55 ` Chris Wedgwood 0 siblings, 0 replies; 26+ messages in thread
From: Chris Wedgwood @ 2001-08-06 13:55 UTC (permalink / raw)
To: Alan Cox; +Cc: H. Peter Anvin, linux-kernel

On Mon, Aug 06, 2001 at 02:17:32PM +0100, Alan Cox wrote:
> mmap nothing over a large space

Shouldn't the rlimit be in the mmap? (Or are sparse mappings not supposed to count towards the rlimit?)

--cw
* [LONGish] Brief analysis of VMAs (was: /proc/<n>/maps getting _VERY_ long) 2001-08-05 23:41 ` Linus Torvalds 2001-08-06 0:41 ` Michael H. Warfield @ 2001-08-06 9:43 ` Chris Wedgwood 1 sibling, 0 replies; 26+ messages in thread
From: Chris Wedgwood @ 2001-08-06 9:43 UTC (permalink / raw)
To: Linus Torvalds; +Cc: jakob, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5727 bytes --]

On Sun, Aug 05, 2001 at 04:41:43PM -0700, Linus Torvalds wrote:
> I'd like to know more than just the app that shows problems - I'd like
> to know what it is doing.

Well, since I initially complained (this time).... I thought I would try and quantify things a little. Attached is a program I use for reading 'maps' files and showing what levels of aggregation are possible, vma-merge-test.c (barf).

Now, I wrote a small proglet to malloc (I'm interested in testing how glibc behaves, since pretty much everything uses glibc) memory in chunks until it can do so no longer; this means getting pretty close to 3G in my machine (not all pages need be resident).

Now, if I malloc 1 megabyte chunks, things coalesce very nicely; in fact, they coalesce as much as you reasonably can coalesce things, to about 13 vmas or so, which is about as good as you can hope for if using shared libraries. If I allocate 4K chunks, I get 65746 vmas! Values in between obviously have varying effects:

     13  alloc-1M
   3069  alloc-512K
   7151  alloc-256K
  32731  alloc-64K
  65746  alloc-4K

like so. The 4K allocations will actually coalesce into only 11 vmas (the fact it does better than 1M is because we have better granularity, so it fills in gaps where 1M chunks simply won't fit)! alloc-512K can't be coalesced at all, alloc-256K can be by about 50%, alloc-128K by 25% and alloc-64K by 12.5% --- no points for spotting the pattern.

Using strace....

... for 1M allocations, I can see there are 2038 mmap's for 257*4k to allocate the 2G or so, and brk is used to 'allocate' 257*4k chunks of memory 897 times, which pretty much gives us our 3G.
FWIW, mmap is used for the first 1G or so, brk for the next 1G, and mmap for the last 1G, with a call to mprotect hidden in there. The mmaps are PROT_READ|PROT_WRITE.

... for 4K allocations, I can see brk is used to allocate 8K chunks, 114000 times or so, getting 1G, then mmap is used to allocate 2M chunks, of which about 1M is munmapp'ed and for the remaining 1M mprotect is called (page by page!) making about 250 odd mprotect calls. This appears to happen until 3G is allocated. The mmaps are PROT_NONE and the mprotect's change this to PROT_READ|PROT_WRITE.

... for 128K allocations, mmap is used to grab 33*4k about 1024 times, netting 128M, brk is used to allocate 32(+/-1)*4k pages about 7000 times netting around 1G, and then a pattern of mmap 2M, munmap 1M, mprotect {33,32,32,32,32,32}*4k of the still mapped 1M --- this is the bit that sucks. The mmap was done PROT_NONE, the mprotect's change this to PROT_READ|PROT_WRITE, but not all of the 1M, so the ability to coalesce here is thwarted (you can coalesce the 32*4k mprotect regions, but the remaining region has the wrong protection).

What does this have to do with reality? IT DEPENDS ON WHAT APPLICATION(S) YOU ARE RUNNING.

It appears mozilla, that super lean, super fast and very stable web-browser, mostly grows using brk with fairly small increments (under 64K) as it reads data in from various places --- and from several threads at a time.... and lots of small allocations appears to be a "Very Bad Thing".

A couple of people sent me examples of other applications that cause problems too; for example, David Luyer sent me the map for evolution-mail, which is some new "fangled pointy-clicky Gnome super-widget-enhanced" mail application --- perhaps that also grows memory slowly (I don't have an strace of it, so this is just speculation). VMware (capitalisation?)
also causes large numbers of vmas, but my attempts to get XFree86, gimp or gcc (when compiling C code) to do so were unsuccessful; all showed little if any ability to merge vmas. Compiling a large C++ application might show some gains here, but I don't have anything large enough to try.

In linux/mm/mmap.c:do_brk I see:

	/* Can we just expand an old anonymous mapping? */
	if (addr) {
		struct vm_area_struct * vma = find_vma(mm, addr-1);
		if (vma && vma->vm_end == addr && !vma->vm_file &&
		    vma->vm_flags == flags) {
			vma->vm_end = addr + len;
			goto out;
		}
	}

which explains why allocations from increments of brk do coalesce well. Elsewhere, in linux/mm/mmap.c:do_mmap_pgoff we have:

	/* Can we just expand an old anonymous mapping? */
	if (addr && !file && !(vm_flags & VM_SHARED)) {
		struct vm_area_struct * vma = find_vma(mm, addr-1);
		if (vma && vma->vm_end == addr && !vma->vm_file &&
		    vma->vm_flags == vm_flags) {
			vma->vm_end = addr + len;
			goto out;
		}
	}

so I assume consistent use of mmap will produce good results too. BUT, glibc doesn't always have consistent use; as I mentioned above, it will often do

	mmap( .... PROT_FOO ... )
	munmap ( some of the above )        [optional]
	for( ... )
		mprotect ( PROT_BAR ... )

which means the simple logic above cannot coalesce things. This leaves three (four) possibilities:

  (1) change glibc to avoid the above behavior
  (2) fiddle with mprotect to expand/coalesce regions
  (3) declare problematic applications borked

or maybe

  (4) have more complex vma logic all over the place in the kernel

Anyhow, that's my very brief, rather unscientific hand-waving explanation that seems to make sense to me!

Incidentally, the algorithm in linux/fs/proc/array.c:proc_pid_read_maps is _terribly_ slow for reading /proc/<n>/maps when there are many vmas. We could possibly hack around this by assuming a constant line-length or something (too gross?).
--cw

[-- Attachment #2: vma-merge-test.c --]
[-- Type: text/x-csrc, Size: 1948 bytes --]

/*
 * vma-merge-test.c --- count # entries in /proc/<n>/maps as well as
 * an indication of mergability (is that a word?)
 *
 * technically this is buggy --- pass the paper bag :)
 *
 * cw@f00f.org, 6 Aug 2001
 */

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
	FILE *f;
	char linebuf[128];
	long st, en, len, dunno;
	long pen = 0, ms = 0;
	char flags[32], indev[32], path[256];
	char pflags[32], pindev[32];
	int cu = 0, cm = 0, mc = 0;

	if (argc != 2) {
		fprintf(stderr, "Please supply one argument, the 'maps' file\n");
		return 1;
	}

	if (!(f = fopen(argv[1], "r"))) {
		fprintf(stderr, "error (%s) trying to open '%s'\n",
			strerror(errno), argv[1]);
		return 2;
	}

	pflags[0] = '\000';
	pindev[0] = '\000';

	while (!feof(f)) {
		if (!fgets(linebuf, sizeof linebuf, f))
			break;

		if (sscanf(linebuf, "%lx-%lx %s %8lx %s %ld %s\n",
			   &st, &en, flags, &len, indev, &dunno, path) < 5) {
			fprintf(stderr, "Bad line\n\t%s\nAborting\n", linebuf);
			break;
		}

		cu++;

		/* same as previous mapping and adjacent, then merge is possible */
		if (!strcmp(indev, pindev) && !strcmp(flags, pflags) &&
		    (st == pen)) {
			cm++;
			mc++;
		} else {
			if (mc) {
				/* show merged results */
				printf("%08lx-%08lx %s %s (%d)\n",
				       ms, pen, flags, indev, mc);
			}
			/* show these results */
			printf("%08lx-%08lx %s %s\n", st, en, flags, indev);
			strcpy(pindev, indev);
			strcpy(pflags, flags);
			ms = st;
			mc = 0;
		}

		pen = en;
	}

	printf("\n%d entries, %d merges\n", cu, cm);
	printf("%d with merging, %4.1f%% of original\n",
	       cu - cm, (double)(100.0 * (cu - cm) / cu));

	fclose(f);
	return 0;
}
* Re: /proc/<n>/maps getting _VERY_ long
@ 2001-08-05 6:44 David Luyer
2001-08-05 7:21 ` Anders Eriksson
0 siblings, 1 reply; 26+ messages in thread
From: David Luyer @ 2001-08-05 6:44 UTC (permalink / raw)
To: linux-kernel; +Cc: Chris Wedgwood, riel
I wrote (off-list):
> On 05 Aug 2001 17:12:02 +1200, Chris Wedgwood wrote:
> > On Sat, Aug 04, 2001 at 11:17:26PM -0300, Rik van Riel wrote:
> >
> > > cw:tty5@tapu(cw)$ wc -l /proc/1368/maps
> > > 5287 /proc/1368/maps
> >
> > Ouch, what kind of application is this happening with ?
> >
> > Mozilla. Presumably some of the Gnome applications might be the same
> > as they use lots and lots of shared libraries (anyone out there Gnome
> > inflicted and can check?).
>
> FYI: Linux 2.2.14 (yes, I know, it's old but I've had no cause to update
> the machine in question):
>
> Mozilla: 215 lines in /proc/$$/maps
> StarOffice opening a small PowerPoint: 209 lines in /proc/$$/maps
> Evolution Mail Component: 193 lines in /proc/$$/maps
>
> Those are the current 'winners' on my wc -l /proc/*/maps | sort -n but
> I'm not exactly doing anything to stress the machine. Hard to know if
> the 2.2.x number of mappings will have any correlation with 2.4.x (as
> if 2.4.x isn't aggressive combining ranges but both allocate initially
> as well as each other, it might get a lot worse with long-running
> processes on 2.4.x but not on 2.2.x, for example).
And the same machine, 2.4.7ac5:
Mozilla: 222 lines in /proc/$$/maps on startup... and growing
StarOffice opening a small PowerPoint: 209 lines in /proc/$$/maps
Evolution Mail Component: 181 lines in /proc/$$/maps
But after visiting a few web pages Mozilla has already grown to 265 mappings;
302 mappings; growing... (whereas playing around in Evolution Mail only
increased its number to 185... actually, as I finish off this mail and have
done a few other things, it's up to 222 now).
So the problem is something which Mozilla is particularly good at triggering.
Under 2.2.14 the number of mappings for Mozilla wasn't growing significantly
with use. But that doesn't say that it isn't some kind of 'bad' behaviour
from Mozilla.
Here's some sample mappings for evolution-mail:
40f10000-40f11000 rw-p 000cf000 00:00 0
40f11000-40f12000 rw-p 000d0000 00:00 0
40f12000-40f13000 rw-p 000d1000 00:00 0
40f13000-40f14000 rw-p 000d2000 00:00 0
40f14000-40f15000 rw-p 000d3000 00:00 0
40f15000-40f16000 rw-p 000d4000 00:00 0
40f16000-40f17000 rw-p 000d5000 00:00 0
40f17000-40f19000 rw-p 000d6000 00:00 0
40f19000-40f1a000 rw-p 000d8000 00:00 0
40f1a000-40f1d000 rw-p 000d9000 00:00 0
40f1d000-40f25000 rw-p 000dc000 00:00 0
40f25000-40f26000 rw-p 000e4000 00:00 0
40f26000-40f27000 rw-p 000e5000 00:00 0
[...]
Now I would naively assume those adjacent contiguous mappings with equal
permissions could pretty easily be merged.
David.
--
David Luyer Phone: +61 3 9674 7525
Engineering Projects Manager P A C I F I C Fax: +61 3 9699 8693
Pacific Internet (Australia) I N T E R N E T Mobile: +61 4 1111 2983
http://www.pacific.net.au/ NASDAQ: PCNTF
* Re: /proc/<n>/maps getting _VERY_ long
  2001-08-05  6:44 /proc/<n>/maps getting _VERY_ long David Luyer
@ 2001-08-05  7:21 ` Anders Eriksson
  0 siblings, 0 replies; 26+ messages in thread
From: Anders Eriksson @ 2001-08-05 7:21 UTC (permalink / raw)
To: David Luyer; +Cc: linux-kernel, Chris Wedgwood, riel
My current winner is vmware (latest version) with a freshly booted w98:
   90 /proc/21582/maps
 1015 /proc/14395/maps
 3909 total
[ander@milou ander]$ ps 14395
  PID TTY STAT TIME COMMAND
14395 ttyp2 S 469:40 vmware /A
end of thread, other threads:[~2001-08-14 11:49 UTC | newest]
Thread overview: 26+ messages -- links below jump to the message on this page --
2001-08-04 15:43 /proc/<n>/maps getting _VERY_ long Chris Wedgwood
2001-08-05  2:17 ` Rik van Riel
2001-08-05  5:12   ` Chris Wedgwood
2001-08-05 13:06     ` Alan Cox
2001-08-05 13:18       ` Chris Wedgwood
2001-08-05 23:07         ` Jakob Østergaard
2001-08-05 23:41           ` Linus Torvalds
2001-08-06  0:41             ` Michael H. Warfield
2001-08-06  1:01               ` Linus Torvalds
2001-08-06  1:17                 ` H. Peter Anvin
2001-08-06  4:26                   ` Linus Torvalds
2001-08-06  6:30                     ` H. Peter Anvin
2001-08-06 18:41                       ` Jamie Lokier
2001-08-10 21:55                         ` Linus Torvalds
2001-08-10 22:00                           ` H. Peter Anvin
2001-08-10 23:03                           ` Nicolas Pitre
2001-08-10 23:26                             ` Linus Torvalds
2001-08-10 23:55                               ` Rik van Riel
2001-08-11  1:04                               ` Pavel Machek
2001-08-06 11:52         ` Alan Cox
2001-08-06 12:23           ` Chris Wedgwood
2001-08-06 13:17             ` Alan Cox
2001-08-06 13:55               ` Chris Wedgwood
2001-08-06  9:43 ` [LONGish] Brief analysis of VMAs (was: /proc/<n>/maps getting _VERY_ long) Chris Wedgwood
2001-08-05  6:44 /proc/<n>/maps getting _VERY_ long David Luyer
2001-08-05  7:21 ` Anders Eriksson