* broken VM in 2.4.10-pre9
From: Peter Magnusson @ 2001-09-15 22:43 UTC
To: linux-kernel

2.4.7: good VM
2.4.8: not good
2.4.9: not good!!!++
2.4.10-pre4: quite ok VM, but puts a little more on the swap than 2.4.7
2.4.10-pre8: not good
2.4.10-pre9: not good ... Linux hadn't used any swap at all, then I
             unrared two very large files at the same time. And now 104
             Mbyte of swap is used! :-( 2.4.7 didn't do this.
             It is best to use the swap as little as possible.

My cfg:

Real mem: 512684K (512 Mbyte)
Swap    : 257032K
compiled with: gcc version 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)

!! remove "nothanksok." from my email if you want to reply to me !!

* Re: broken VM in 2.4.10-pre9
From: Jan Harkes @ 2001-09-15 23:50 UTC
To: linux-kernel

What do you consider a good VM?

Because pages aren't 'aged' until there is swap allocated for them, your
kernel should actually work better if it has a lot of pages backed by
swap. The only thing is, we don't really make the right decision about
which pages to swap out, but that's just a detail. IMHO.

A large number of cached/active pages == good.

Jan

On Sun, Sep 16, 2001 at 12:43:35AM +0200, Peter Magnusson wrote:
> 2.4.7: good VM
> 2.4.8: not good
> 2.4.9: not good!!!++
> 2.4.10-pre4: quite ok VM, but puts a little more on the swap than 2.4.7
> 2.4.10-pre8: not good
> 2.4.10-pre9: not good ... Linux hadn't used any swap at all, then I
>              unrared two very large files at the same time. And now 104
>              Mbyte of swap is used! :-( 2.4.7 didn't do this.
>              It is best to use the swap as little as possible.
>
> My cfg:
>
> Real mem: 512684K (512 Mbyte)
> Swap    : 257032K
> compiled with: gcc version 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)

* Re: broken VM in 2.4.10-pre9
From: Linus Torvalds @ 2001-09-16 5:31 UTC
To: linux-kernel

In article <Pine.LNX.4.33L2.0109160031500.7740-100000@flashdance>,
Peter Magnusson <iocc@flashdance.nothanksok.cx> wrote:
>
>2.4.10-pre4: quite ok VM, but puts a little more on the swap than 2.4.7
>2.4.10-pre8: not good

Ehh..

There are _no_ VM changes that I can see between pre4 and pre8.

>2.4.10-pre9: not good ... Linux hadn't used any swap at all, then I
>             unrared two very large files at the same time. And now 104
>             Mbyte of swap is used! :-( 2.4.7 didn't do this.
>             It is best to use the swap as little as possible.

.. and there are none between pre8 and pre9.

Basically, it sounds like you have tested different loads on different
kernels, and some loads are nice and others are not.

Also note that the amount of "swap used" is totally meaningless in
2.4.x. The 2.4.x kernel will _allocate_ the swap backing store much
earlier than 2.2.x, but that doesn't actually mean that it does any of
the IO. Indeed, allocating the swap backing store just means that the
swap pages are then kept track of, so that they can be aged along with
other stores.

So whether Linux uses swap or not is a 100% meaningless indicator of
"goodness". The only thing that matters is how well the job gets done,
ie was it reasonably responsive, and did the big untars finish quickly..

Don't look at how many pages of swap were used. That's a statistic,
nothing more.

		Linus

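The difference between swap that is merely allocated and swap that is actually doing I/O can be made concrete with a small sketch. It assumes a later kernel that exports `SwapTotal`/`SwapFree` in /proc/meminfo and the `pswpin`/`pswpout` counters in /proc/vmstat (the 2.4 kernels discussed in this thread did not have /proc/vmstat). The helper names and the sample values below are illustrative, not taken from any real tool; the meminfo sample is built from the swap size and the "104 Mbyte used" figure in the original report.

```python
"""Sketch: separate 'swap allocated' from 'swap I/O actually performed'.

Assumes /proc/meminfo-style and /proc/vmstat-style text as found on
later Linux kernels. Function names are hypothetical.
"""


def parse_counters(text):
    """Parse 'name value' or 'name: value kB' lines into {name: int}."""
    counters = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_report(meminfo_text, vmstat_before, vmstat_after):
    """Return swap in use (kB) and pages swapped in/out between two samples."""
    mem = parse_counters(meminfo_text)
    before = parse_counters(vmstat_before)
    after = parse_counters(vmstat_after)
    return {
        "swap_used_kb": mem["SwapTotal"] - mem["SwapFree"],
        "pages_swapped_in": after["pswpin"] - before["pswpin"],
        "pages_swapped_out": after["pswpout"] - before["pswpout"],
    }


# Illustrative input: 257032 kB of swap with 104 MB (106496 kB) allocated.
SAMPLE_MEMINFO = "SwapTotal:     257032 kB\nSwapFree:      150536 kB\n"
# Made-up counter samples taken some seconds apart.
SAMPLE_VMSTAT_T0 = "pswpin 10\npswpout 20\n"
SAMPLE_VMSTAT_T1 = "pswpin 10\npswpout 25\n"

report = swap_report(SAMPLE_MEMINFO, SAMPLE_VMSTAT_T0, SAMPLE_VMSTAT_T1)
print(report)
```

On a live system the three strings would come from reading /proc/meminfo once and /proc/vmstat twice. A large `swap_used_kb` together with near-zero deltas in the two page counters is the "allocated but idle" case described above.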
* Re: broken VM in 2.4.10-pre9
From: Eric W. Biederman @ 2001-09-16 8:45 UTC
To: Linus Torvalds
Cc: linux-kernel

torvalds@transmeta.com (Linus Torvalds) writes:

> Don't look at how many pages of swap were used. That's a statistic,
> nothing more.

It is a statistic until you run out of them. Obviously that isn't the
problem here, or we'd hear complaints about the OOM killer. But the
number of pages used can make a difference.

Eric

* Re: broken VM in 2.4.10-pre9
From: Tonu Samuel @ 2001-09-17 10:25 UTC
To: Linus Torvalds
Cc: linux-kernel

On 16 Sep 2001 05:31:11 +0000, Linus Torvalds wrote:

> Also note that the amount of "swap used" is totally meaningless in
> 2.4.x. The 2.4.x kernel will _allocate_ the swap backing store much
> earlier than 2.2.x, but that doesn't actually mean that it does any of
> the IO. Indeed, allocating the swap backing store just means that the
> swap pages are then kept track of, so that they can be aged along with
> other stores.

The problem still exists and persists. Not long ago a man from Yahoo
described a case where the change from 2.2.19 to 2.4.x caused
performance problems. On 2.2.19 everything ran fine. They have MySQL
running and did backups from disk. After the upgrade to 2.4.x, MySQL
performance dropped during backups. They investigated and found that
the MySQL daemon gets swapped out in the middle of use to make room
for buffers. In summary: this made both SQL and the backup twice as
slow. Even increasing memory from 1G to 2G didn't help. Finally they
disabled swap entirely and the problem went away.

If you do not want to change it back to how it was in 2.2.x, then it
would be good if this were tunable somehow.

--
For technical support contracts, goto https://order.mysql.com/
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /    Mr. Tonu Samuel <tonu@mysql.com>
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Security Administrator
/_/  /_/\_, /___/\___\_\___/   Hong Kong, China
       <___/   www.mysql.com

* Re: broken VM in 2.4.10-pre9
From: Jeremy Zawodny @ 2001-09-16 16:47 UTC
To: Tonu Samuel
Cc: Linus Torvalds, linux-kernel

On Mon, Sep 17, 2001 at 06:25:38PM +0800, Tonu Samuel wrote:
> On 16 Sep 2001 05:31:11 +0000, Linus Torvalds wrote:
>
> > Also note that the amount of "swap used" is totally meaningless in
> > 2.4.x. The 2.4.x kernel will _allocate_ the swap backing store much
> > earlier than 2.2.x, but that doesn't actually mean that it does any of
> > the IO. Indeed, allocating the swap backing store just means that the
> > swap pages are then kept track of, so that they can be aged along with
> > other stores.
>
> The problem still exists and persists. Not long ago a man from Yahoo
> described a case where the change from 2.2.19 to 2.4.x caused
> performance problems. On 2.2.19 everything ran fine. They have MySQL
> running and did backups from disk. After the upgrade to 2.4.x, MySQL
> performance dropped during backups. They investigated and found that
> the MySQL daemon gets swapped out in the middle of use to make room
> for buffers. In summary: this made both SQL and the backup twice as
> slow. Even increasing memory from 1G to 2G didn't help. Finally they
> disabled swap entirely and the problem went away.

Yep, that was me. It was frustrating to have to double the RAM in the
machine and then turn off swap. The extra RAM did help, but it really
only delayed the problem.

> If you do not want to change it back to how it was in 2.2.x, then it
> would be good if this were tunable somehow.

Agreed. It'd be great if there was an option to say "Don't swap out
memory that was allocated by these programs. If you run out of disk
buffers, toss the oldest ones and start re-using them."

Jeremy
--
Jeremy D. Zawodny     |  Perl, Web, MySQL, Linux Magazine, Yahoo!
<Jeremy@Zawodny.com>  |  http://jeremy.zawodny.com/

* Re: broken VM in 2.4.10-pre9
From: Alan Cox @ 2001-09-16 18:36 UTC
To: Jeremy Zawodny
Cc: Tonu Samuel, Linus Torvalds, linux-kernel

> Yep, that was me. It was frustrating to have to double the RAM in the
> machine and then turn off swap. The extra RAM did help, but it really
> only delayed the problem.

That shouldn't be needed with at least the later -ac kernels - nor is the
swap > twice ram rule present in those

* Re: broken VM in 2.4.10-pre9
From: Linus Torvalds @ 2001-09-16 19:38 UTC
To: linux-kernel

In article <E15igmC-0005bs-00@the-village.bc.nu>,
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> Yep, that was me. It was frustrating to have to double the RAM in the
>> machine and then turn off swap. The extra RAM did help, but it really
>> only delayed the problem.
>
>That shouldn't be needed with at least the later -ac kernels - nor is the
>swap > twice ram rule present in those

Nor has it been present in the standard kernels since 2.4.8.

		Linus

* vm rewrite ready [Re: broken VM in 2.4.10-pre9]
From: Andrea Arcangeli @ 2001-09-16 18:34 UTC
To: Tonu Samuel
Cc: Linus Torvalds, linux-kernel

On Mon, Sep 17, 2001 at 06:25:38PM +0800, Tonu Samuel wrote:
> On 16 Sep 2001 05:31:11 +0000, Linus Torvalds wrote:
>
> > Also note that the amount of "swap used" is totally meaningless in
> > 2.4.x. The 2.4.x kernel will _allocate_ the swap backing store much
> > earlier than 2.2.x, but that doesn't actually mean that it does any of
> > the IO. Indeed, allocating the swap backing store just means that the
> > swap pages are then kept track of, so that they can be aged along with
> > other stores.
>
> The problem still exists and persists. Not long ago a man from Yahoo
> described a case where the change from 2.2.19 to 2.4.x caused performance
> problems. On 2.2.19 everything ran fine. They have MySQL running and did

After a few days of development I think I'm ready to release the VM
rewrite I did.

The alternate vm will be included in 2.4.10pre9aa1 (or anyway the very
next -aa release) and I'll maintain it in the -aa tree. It is supposed
to provide:

1) stable kswapd, avoid the kswapd 100% load of the cpu problem (this is
   provided by the classzone design, btw I improved the implementation a
   little bit compared to the 2.3/2.4.0-test patches, now I try to do
   things as lazily as possible without the bookkeeping in the
   pagealloc/pagefreeing)
2) optimal performance, avoid slowdowns after multiple runs of workloads
   and avoid swapout storms (for databases not using O_DIRECT)
3) you will get swap+ram of available virtual memory

At the moment it's of course still a bit experimental and subject to
changes but I'm writing this email on top of it and it's perfectly
usable.

This isn't a hack/band-aid or a small set of changes, it's a complete
rewrite from scratch of the whole memory balancing including garbage
collections, lru lists, kswapd etc... (only the swap_out() path is
almost unchanged)

The only benchmark I have run so far is `dbench`.

Without the vm patch applied dbench says:

andrea@laser:/mnt > dbench 40
40 clients started
[dbench progress dots trimmed]
Throughput 9.40112 MB/sec (NB=11.7514 MB/sec  94.0112 MBit/sec)
andrea@laser:/mnt > dbench 40
40 clients started
[dbench progress dots trimmed]
Throughput 9.56469 MB/sec (NB=11.9559 MB/sec  95.6469 MBit/sec)
andrea@laser:/mnt >

After I apply my vm patch dbench constantly says:

andrea@laser:/mnt > for i in 1 2 3 4 ; do dbench 40; done
40 clients started
[dbench progress dots trimmed]
Throughput 20.353 MB/sec (NB=25.4412 MB/sec  203.53 MBit/sec)
40 clients started
[dbench progress dots trimmed]
Throughput 20.9269 MB/sec (NB=26.1586 MB/sec  209.269 MBit/sec)
40 clients started
[dbench progress dots trimmed]
Throughput 21.0787 MB/sec (NB=26.3483 MB/sec  210.787 MBit/sec)
40 clients started
[dbench progress dots trimmed]
Throughput 21.6167 MB/sec (NB=27.0208 MB/sec  216.167 MBit/sec)
andrea@laser:/mnt >

Andrea

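For readers who want the dbench figures summarised, the following sketch averages the reported throughput before and after the patch and computes the ratio. The numbers are copied from the runs in this message; the function name is made up for illustration, and with only two and four runs on one machine the result is an indication rather than a measurement with error bars.

```python
from statistics import mean

# Throughput in MB/sec as reported by "dbench 40" in the message above.
before_patch = [9.40112, 9.56469]
after_patch = [20.353, 20.9269, 21.0787, 21.6167]


def speedup(baseline, patched):
    """Ratio of mean patched throughput to mean baseline throughput."""
    return mean(patched) / mean(baseline)


print(f"mean before: {mean(before_patch):.2f} MB/sec")
print(f"mean after:  {mean(after_patch):.2f} MB/sec")
print(f"speedup:     {speedup(before_patch, after_patch):.2f}x")
```

The ratio comes out at roughly 2.2x for this workload.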
* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
From: Rik van Riel @ 2001-09-16 19:07 UTC
To: Andrea Arcangeli
Cc: Tonu Samuel, Linus Torvalds, linux-kernel

On Sun, 16 Sep 2001, Andrea Arcangeli wrote:

> The alternate vm will be included in 2.4.10pre9aa1 (or anyway the
> very next -aa release) and I'll maintain it in the -aa tree.

Cool, I'll definitely take a look to see if there are any
good ideas ready to be integrated into the -linus or -ac
kernels.

> It is supposed to provide:

[snip holy grail]

I doubt you'll be able to achieve all of those without
really major changes, but I'll take a look at your code
when you make it public ;)

cheers,

Rik
--
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)

* Re: broken VM in 2.4.10-pre9
From: Phillip Susi @ 2001-09-16 15:19 UTC
To: linux-kernel

Maybe I'm missing something here, but it seems to me that these problems
are due to the cache putting pressure on VM, so process pages get swapped
out. The obvious solution to this is to limit the size of the cache, or
implement some sort of algorithm to slow its growth and reduce the
pressure on VM. It also seems that one of the causes for the cache
expanding is large bulk file copies, or reads for say, mp3 playing.
Wasn't there a flag to disable caching on file IO that these programs
could use, to keep from polluting the cache?

Am I way off base here?

--
--> Phill Susi

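One way a bulk reader can avoid polluting the cache on later kernels is to tell the kernel it no longer needs the pages it has just read. The sketch below uses `posix_fadvise` with `POSIX_FADV_DONTNEED`, an interface that postdates this thread and is advisory only; it is shown as an illustration of the idea, not as the flag the question had in mind (the thread itself mentions O_DIRECT). The function names and the 1 MB chunk size are arbitrary choices for the example.

```python
import os
import tempfile


def read_without_caching(path, chunk_size=1 << 20):
    """Read a file sequentially and advise the kernel to drop each chunk
    from the page cache once it has been consumed. Returns bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            data = os.read(fd, chunk_size)
            if not data:
                break
            total += len(data)
            # posix_fadvise is not available on every platform; skip if absent.
            if hasattr(os, "posix_fadvise"):
                os.posix_fadvise(fd, offset, len(data),
                                 os.POSIX_FADV_DONTNEED)
            offset += len(data)
    finally:
        os.close(fd)
    return total


def demo(size=3 * (1 << 20)):
    """Write a temporary file of `size` bytes and read it back uncached."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * size)
        path = tmp.name
    try:
        return read_without_caching(path)
    finally:
        os.remove(path)
```

Because the hint is advisory, the kernel is free to keep the pages; the sketch only shows where such a hint would go in a streaming read loop.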
* Re: broken VM in 2.4.10-pre9
From: Jeremy Zawodny @ 2001-09-16 19:33 UTC
To: Phillip Susi
Cc: linux-kernel

On Sun, Sep 16, 2001 at 03:19:29PM +0000, Phillip Susi wrote:

> Maybe I'm missing something here, but it seems to me that these
> problems are due to the cache putting pressure on VM, so process
> pages get swapped out.

That's what it felt like in the cases that I ran into it. It was
trying to treat all memory equally, when it probably shouldn't have.

Jeremy
--
Jeremy D. Zawodny     |  Perl, Web, MySQL, Linux Magazine, Yahoo!
<Jeremy@Zawodny.com>  |  http://jeremy.zawodny.com/

* [PATCH] Re: broken VM in 2.4.10-pre9
From: Rik van Riel @ 2001-09-16 19:54 UTC
To: Jeremy Zawodny
Cc: Phillip Susi, linux-kernel

On Sun, 16 Sep 2001, Jeremy Zawodny wrote:
> On Sun, Sep 16, 2001 at 03:19:29PM +0000, Phillip Susi wrote:
>
> > Maybe I'm missing something here, but it seems to me that these
> > problems are due to the cache putting pressure on VM, so process
> > pages get swapped out.
>
> That's what it felt like in the cases that I ran into it. It was
> trying to treat all memory equally, when it probably shouldn't have.

Indeed, it should treat all memory equally, except when
we really have far too much cache.

I'll resend the patch with the subject clearly marked
since this trivial thing really does need testers ;)

regards,

Rik
--
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/


--- mm/vmscan.c.orig	Sun Sep 16 16:44:14 2001
+++ mm/vmscan.c	Sun Sep 16 16:49:09 2001
@@ -731,6 +731,8 @@
  */
 #define too_many_buffers (atomic_read(&buffermem_pages) > \
 		(num_physpages * buffer_mem.borrow_percent / 100))
+#define too_much_cache ((page_cache_size - swapper_space.nrpages) > \
+		(num_physpages * page_cache.borrow_percent / 100))
 int refill_inactive_scan(unsigned int priority)
 {
 	struct list_head * page_lru;
@@ -793,6 +795,18 @@
 		 * be reclaimed there...
 		 */
 		if (page->buffers && !page->mapping && too_many_buffers) {
+			deactivate_page_nolock(page);
+			page_active = 0;
+		}
+
+		/*
+		 * If the page cache is too large, move the page
+		 * to the inactive list. If it is really accessed
+		 * it'll be referenced before it reaches the point
+		 * where we'll reclaim it.
+		 */
+		if (page->mapping && too_much_cache && page_count(page) <=
+					(page->buffers ? 2 : 1)) {
 			deactivate_page_nolock(page);
 			page_active = 0;
 		}
--- mm/swap.c.orig	Sun Sep 16 16:50:43 2001
+++ mm/swap.c	Sun Sep 16 16:50:58 2001
@@ -64,7 +64,7 @@
 
 buffer_mem_t page_cache = {
 	2,	/* minimum percent page cache */
-	15,	/* borrow percent page cache */
+	60,	/* borrow percent page cache */
 	75	/* maximum */
 };
 

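To see what the two changes in the patch do together, here is a simplified model of the cache-limit test and of the added deactivation condition. It is a sketch, not kernel code: the function names are invented, the arithmetic mirrors the C integer division in the macro, and the machine size (512 MB with an assumed 4 KB page size, so 131072 pages) and the 50000-page cache are example values only.

```python
def too_much_cache(page_cache_size, swap_cache_pages,
                   num_physpages, borrow_percent):
    """Model of the patch's macro: page cache (excluding swap cache)
    exceeds borrow_percent of physical pages, using integer division."""
    limit = num_physpages * borrow_percent // 100
    return (page_cache_size - swap_cache_pages) > limit


def should_deactivate(has_mapping, has_buffers, page_count, cache_over_limit):
    """Model of the added 'if': a page-cache page is moved to the inactive
    list when the cache is over its limit and nobody else holds a
    reference (one reference for the cache, one more if it has buffers)."""
    max_refs = 2 if has_buffers else 1
    return has_mapping and cache_over_limit and page_count <= max_refs


# Example: 512 MB of RAM, assumed 4 KB pages.
NUM_PHYSPAGES = 512 * 1024 // 4   # 131072 pages
CACHE_PAGES = 50000               # hypothetical page cache size

for percent in (15, 60):
    over = too_much_cache(CACHE_PAGES, 0, NUM_PHYSPAGES, percent)
    print(f"borrow_percent={percent}: over limit = {over}")
```

With the old borrow percentage of 15 the example cache already counts as "too much"; with 60 it does not, so the new deactivation path only kicks in once the cache is much larger.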
* Re: broken VM in 2.4.10-pre9
From: Rik van Riel @ 2001-09-16 19:52 UTC
To: Phillip Susi
Cc: linux-kernel

On Sun, 16 Sep 2001, Phillip Susi wrote:

> Maybe I'm missing something here, but it seems to me that these
> problems are due to the cache putting pressure on VM, so process pages
> get swapped out. The obvious solution to this is to limit the size of
> the cache, or implement some sort of algorithm to slow its growth and
> reduce the pressure on VM.

> Am I way off base here?

You're absolutely right and it's only a tiny patch to
implement this thing. I've attached a completely untested
(I haven't even compiled this thing) patch which implements
this thing.

I suspect it'll apply to any recent -ac kernel, porting it
to -linus should be easy.

regards,

Rik
--
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


--- mm/vmscan.c.orig	Sun Sep 16 16:44:14 2001
+++ mm/vmscan.c	Sun Sep 16 16:49:09 2001
@@ -731,6 +731,8 @@
  */
 #define too_many_buffers (atomic_read(&buffermem_pages) > \
 		(num_physpages * buffer_mem.borrow_percent / 100))
+#define too_much_cache ((page_cache_size - swapper_space.nrpages) > \
+		(num_physpages * page_cache.borrow_percent / 100))
 int refill_inactive_scan(unsigned int priority)
 {
 	struct list_head * page_lru;
@@ -793,6 +795,18 @@
 		 * be reclaimed there...
 		 */
 		if (page->buffers && !page->mapping && too_many_buffers) {
+			deactivate_page_nolock(page);
+			page_active = 0;
+		}
+
+		/*
+		 * If the page cache is too large, move the page
+		 * to the inactive list. If it is really accessed
+		 * it'll be referenced before it reaches the point
+		 * where we'll reclaim it.
+		 */
+		if (page->mapping && too_much_cache && page_count(page) <=
+					(page->buffers ? 2 : 1)) {
 			deactivate_page_nolock(page);
 			page_active = 0;
 		}
--- mm/swap.c.orig	Sun Sep 16 16:50:43 2001
+++ mm/swap.c	Sun Sep 16 16:50:58 2001
@@ -64,7 +64,7 @@
 
 buffer_mem_t page_cache = {
 	2,	/* minimum percent page cache */
-	15,	/* borrow percent page cache */
+	60,	/* borrow percent page cache */
 	75	/* maximum */
 };
 

* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
From: Alan Cox @ 2001-09-16 19:17 UTC
To: Rik van Riel
Cc: Andrea Arcangeli, Tonu Samuel, Linus Torvalds, linux-kernel

> [snip holy grail]
>
> I doubt you'll be able to achieve all of those without
> really major changes, but I'll take a look at your code
> when you make it public ;)

Andrea made 2.2 finally stable under really high VM loads. I'm certainly
interested to see what comes out of this.

Alan

* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
From: Rik van Riel @ 2001-09-16 19:15 UTC
To: Alan Cox
Cc: Andrea Arcangeli, Tonu Samuel, Linus Torvalds, linux-kernel

On Sun, 16 Sep 2001, Alan Cox wrote:

> > [snip holy grail]
> >
> > I doubt you'll be able to achieve all of those without
> > really major changes, but I'll take a look at your code
> > when you make it public ;)
>
> Andrea made 2.2 finally stable under really high VM loads. I'm
> certainly interested to see what comes out of this.

Definitely, I have no doubt he'll achieve some good results.
It's the overly wild claims I'm having doubts about.

I'm looking forward to seeing his patch...

regards,

Rik
--
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)

* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
From: Andrea Arcangeli @ 2001-09-16 19:19 UTC
To: Rik van Riel
Cc: Tonu Samuel, Linus Torvalds, linux-kernel

On Sun, Sep 16, 2001 at 04:07:16PM -0300, Rik van Riel wrote:
> I doubt you'll be able to achieve all of those without
> really major changes, but I'll take a look at your code
> when you make it public ;)

as said it is quite a major change, it discards most of the 2.4 vm
that I don't agree with, it is basically an evolution of the classzone
patch.

andrea@athlon:~/remote/kernel.org/kernels/v2.4/2.4.10pre9aa1 > diffstat 80_vm-aa-1
 ID                      |binary
 arch/alpha/mm/fault.c   |    7
 arch/i386/mm/fault.c    |   25 +
 fs/buffer.c             |   68 +--
 fs/dcache.c             |    2
 fs/inode.c              |   59 +--
 fs/proc/proc_misc.c     |    8
 include/linux/fs.h      |    2
 include/linux/highmem.h |    2
 include/linux/list.h    |    1
 include/linux/mm.h      |   50 +-
 include/linux/mmzone.h  |    9
 include/linux/pagemap.h |    1
 include/linux/sched.h   |    3
 include/linux/slab.h    |    2
 include/linux/swap.h    |  148 ++-----
 include/linux/swapctl.h |   22 -
 kernel/fork.c           |    2
 kernel/signal.c         |    2
 kernel/sysctl.c         |    6
 mm/filemap.c            |   38 -
 mm/memory.c             |   12
 mm/numa.c               |    8
 mm/oom_kill.c           |   40 --
 mm/page_alloc.c         |  501 +++++++++--------------------------
 mm/shmem.c              |    2
 mm/slab.c               |    8
 mm/swap.c               |  105 -----
 mm/swap_state.c         |   14
 mm/swapfile.c           |   21 -
 mm/vmscan.c             |  913 +++++++++++++++------------------------------
 31 files changed, 699 insertions(+), 1382 deletions(-)
andrea@athlon:~/remote/kernel.org/kernels/v2.4/2.4.10pre9aa1 >

Andrea

* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
From: Linus Torvalds @ 2001-09-16 19:30 UTC
To: Andrea Arcangeli; +Cc: Rik van Riel, Tonu Samuel, linux-kernel

On Sun, 16 Sep 2001, Andrea Arcangeli wrote:
>
> as said it is quite a major change, it discards most of the the 2.4 vm
> that I don't agree with, it is basically an evolution of the classzone
> patch.

That is the wrong direction to go into.

We'll be completely screwed on NUMA with the classzone patch. I've said
so before, I'll say so again. The basic approach of the classzone patch
is _wrong_, in making global decisions where no "globality" exists.

I bet that the improvements are from other things, not from classzone
itself. And I will bet that if we start doing classzones, we'll regret it
a LOT in a few years.

		Linus
* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
From: Andrea Arcangeli @ 2001-09-17 22:41 UTC
To: Stephan von Krawczynski; +Cc: linux-kernel

[ CC'ed to l-k with Stephan's approval ]

On Mon, Sep 17, 2001 at 07:12:56PM +0200, Stephan von Krawczynski wrote:
> On Mon, 17 Sep 2001 18:10:40 +0200 Andrea Arcangeli <andrea@suse.de> wrote:
>
> > On Mon, Sep 17, 2001 at 05:40:37PM +0200, Stephan von Krawczynski wrote:
> > > On Sun, 16 Sep 2001 20:34:14 +0200 Andrea Arcangeli <andrea@suse.de> wrote:
> > >
> > > > After a few days of development I think I'm ready to release the VM
> > > > rewrite I did.
> > > >
> > > > The alternate vm will be included in 2.4.10pre9aa1 (or anyway the very
> > > > next -aa release) and I'll maintain it in the -aa tree. It is supposed
> > > > to provide:
> > >
> > > Where can I get a patch working on 2.4.9 (possibly pre9 or pre10)?
> > > Didn't find it on ftp.kernel.org.
> >
> > I uploaded it now. You can apply the whole 2.4.10pre10 patch
> >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.10pre10aa1.bz2
> >
> > Any feedback is welcome so I can make it better if it swapouts too much
> > etc...
>
> Hello Andrea,
>
> my first impression: very high performance compared to all other versions I
> tested so far.
>
> - cpu average load is low, during whole test sometimes even below 3 (never saw
> this before)

Good.

I also had another report with a very vfs-intensive operation going on, and
I suspect this patch will be a good idea (even if it can lead to the
usual excessive growth of the vfs caches in the long run, the current
way is probably too aggressive).
--- 2.4.10pre10aa2/mm/vmscan.c.~1~	Mon Sep 17 19:17:27 2001
+++ 2.4.10pre10aa2/mm/vmscan.c	Tue Sep 18 00:09:33 2001
@@ -518,12 +518,12 @@
 	if (nr_pages <= 0)
 		return 0;
 
-	shrink_dcache_memory(priority, gfp_mask);
-	shrink_icache_memory(priority, gfp_mask);
-
 	nr_pages = shrink_cache(&active_list, &max_scan, nr_pages, classzone, gfp_mask);
 	if (nr_pages <= 0)
 		return 0;
+
+	shrink_dcache_memory(priority, gfp_mask);
+	shrink_icache_memory(priority, gfp_mask);
 
 	return nr_pages;
 }

> - meminfo during test:
>         total:    used:    free:  shared: buffers:  cached:
> Mem:  923574272 920178688  3395584        0 73883648 741076992
> Swap: 271392768        0 271392768
> MemTotal:       901928 kB
> MemFree:          3316 kB
> MemShared:           0 kB
> Buffers:         72152 kB
> Cached:         723708 kB
> SwapCached:          0 kB
> Active:         116172 kB
> Inactive:       679688 kB
> HighTotal:           0 kB
> HighFree:            0 kB
> LowTotal:       901928 kB
> LowFree:          3316 kB
> SwapTotal:      265032 kB
> SwapFree:       265032 kB

Fine.

> Doesn't change that much (once all mem is eaten up from free).

That's expected, I didn't claim to have added the defragmenter yet ;)
The architecture for the defrag is already there though; I just ignored
solving the order > 1 case for now, that will probably be the next step.

> - Has same alloc problems as other versions:
> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 2-order allocation failed (gfp=0x20/0) from c012de72
> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 1-order allocation failed (gfp=0x20/0) from c012de72
> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 0-order allocation failed (gfp=0x20/0) from c012de72

While this one is order 0, it is a GFP_ATOMIC allocation, so it's sane
that it failed too. Can you symbol-resolve the address "c012de72" so we
know who's doing this GFP_ATOMIC allocation? Thanks. We could
theoretically shrink the cache from GFP_ATOMIC as well, but we would have
to make a few spinlocks irq spinlocks.

> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 3-order allocation failed (gfp=0x20/0) from c012de72
> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 3-order allocation failed (gfp=0x20/0) from c012de72
> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 2-order allocation failed (gfp=0x20/0) from c012de72
> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 3-order allocation failed (gfp=0x20/0) from c012de72
> I cut out only a few to give you a hint. I patched the current->comm in myself,
> thats where the cdda2wav comes from.

Ok.

> Is it possible for you to make something like always at least one free page in
> every zone->order? If not try to "refill" the order queue? There must be some
> way to get rid of those alloc-failures.

Of course, as said that's probably the next step, but it won't be a free
page in every zone order; we'll do the work lazily as usual (only when
necessary). Order > 0 allocations should be very unlikely, and order > 0
allocations with GFP_ATOMIC even more unlikely. I believe the right fix
is to fix the caller that is allocating memory that way. Of course in the
long run we'll also try to defrag the ram, but that is not a good reason
for not fixing the drivers! :)

> I do an overnight test right now and have a look tomorrow morning how things
> went.

Ok.

> I'll be back,

Thanks for the feedback. As said, right now all kinds of feedback are
welcome. I have a few other changes pending that look attractive, but
they hurt the wonderful dbench numbers, so I haven't made them yet in the
hope that dbench has some relation to real life too. (I can pretty much
see why the current algorithm works better than the other changes. It
wasn't developed to run well in dbench of course, it just incidentally
happened to get the best score in dbench.)
Another detail: I happened to be talking with David Mosemberg about the
ptrace races while working on the vm, so due to an "editing in the wrong
tree" error there is now a leftover in signal.c in the 80_vm-aa-1 patch.
This patch should be backed out:

diff -urN vm-ref/kernel/signal.c vm/kernel/signal.c
--- vm-ref/kernel/signal.c	Mon Sep 17 01:26:12 2001
+++ vm/kernel/signal.c	Mon Sep 17 01:26:25 2001
@@ -382,7 +382,7 @@
 	switch (sig) {
 	case SIGKILL: case SIGCONT:
 		/* Wake up the process if stopped.  */
-		if (t->state == TASK_STOPPED)
+		if (t->state == TASK_STOPPED && !(t->ptrace & PT_PTRACED))
 			wake_up_process(t);
 		t->exit_code = 0;
 		rm_sig_from_queue(SIGSTOP, t);

but it's not fatal, just ignore it for now; it will be fixed in the next
-aa. The only downside is that any SIGKILL or SIGCONT won't arrive at the
task while it's being ptraced.

Andrea
* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
From: Stephan von Krawczynski @ 2001-09-18 9:00 UTC
To: Andrea Arcangeli; +Cc: linux-kernel

On Tue, 18 Sep 2001 00:41:16 +0200 Andrea Arcangeli <andrea@suse.de> wrote:

> [ CC'ed to l-k with Stephan approval ]
> > - cpu average load is low, during whole test sometimes even below 3 (never saw
> > this before)
>
> Good.
>
> I also had another report with very vfs intensive operation going on and
> I suspect this patch will be a good idea (even if it can lead to the
> usual excessive grow of the vfs caches on the long run but the current
> way is probably too aggressive).

Hm, are you sure about this? Here is /proc/meminfo after a night of heavy
nfs action (we are at the server side):

        total:    used:    free:  shared: buffers:  cached:
Mem:  923574272 919187456  4386816        0 39723008 793706496
Swap: 271392768  1417216 269975552
MemTotal:       901928 kB
MemFree:          4284 kB
MemShared:           0 kB
Buffers:         38792 kB
Cached:         775052 kB
SwapCached:         52 kB
Active:         811464 kB
Inactive:         2432 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       901928 kB
LowFree:          4284 kB
SwapTotal:      265032 kB
SwapFree:       263648 kB

You see most mem found its way in the active queue. If you talk about
"aggressive" meaning aggressively aged or even freed, I cannot see it.
I will go on for another day without additional patching and see how
things evolve and how the system behaves in interactive situation.

Ah, another thing to mention.
I got some _new_ alloc failures:

Sep 18 04:16:49 admin kernel: nfsd __alloc_pages: 1-order allocation failed (gfp=0x20/0) from c012de72
Sep 18 04:17:27 admin kernel: nfsd __alloc_pages: 0-order allocation failed (gfp=0x1d2/0) from c012de72
Sep 18 04:21:18 admin kernel: gzip __alloc_pages: 0-order allocation failed (gfp=0x1d2/0) from c012de72

c012de5c T _alloc_pages
c012de74 t balance_classzone

Hope this helps,
Stephan
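A rough sketch of the symbol lookup Andrea asked for, using only the two
System.map lines Stephan quoted. It assumes the usual "address type name"
layout of System.map; parse_system_map and resolve are hypothetical helper
names chosen for illustration, not part of any kernel tool. With just two
entries it can say no more than that c012de72 lies inside _alloc_pages,
a few bytes before balance_classzone starts.

```python
import bisect

# The two System.map entries quoted in this thread (address, type, name).
SYSTEM_MAP = """\
c012de5c T _alloc_pages
c012de74 t balance_classzone
"""


def parse_system_map(text):
    """Return a list of (address, name) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _sym_type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(address, symbols):
    """Return (name, offset) of the last symbol starting at or below
    address, or None if address is below every known symbol."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return name, address - start


symbols = parse_system_map(SYSTEM_MAP)
name, offset = resolve(0xC012DE72, symbols)
print(f"{name}+{offset:#x}")  # prints _alloc_pages+0x16
```

On a real system the complete System.map of the running kernel would be
passed in instead of the two-line excerpt; the lookup itself stays the same.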
* Re: broken VM in 2.4.10-pre9
From: Linus Torvalds @ 2001-09-16 19:37 UTC
To: linux-kernel

In article <1000722338.14005.0.camel@x153.internalnet>,
Tonu Samuel <tonu@please.do.not.remove.this.spam.ee> wrote:
>
>Problem still exists and persists. Not long time ago man from Yahoo
>described well case when change from 2.2.19 to 2.4.x caused performance
>problems. On 2.2.19 everything ran fine. They have MySQL running+did
>backups from disk. After upgrade to 2.4.x MySQL performance fell down on
>backup time. They investigated stuff and found that MySQL daemon gets
>swapped out in the middle of usage to make room for buffers.

Note that if you're using a raw device backup strategy (ie "e2dump" or
similar), that is expected: 2.4.x up until about 2.4.7 gave _much_ too
much preference to the buffer cache.

That should actually have been fixed in 2.4.8. We used to mark buffer
pages much too active.

> In summary:
>this made both sql and backup double slow. Even increasing memory from
>1G->2G didn't help. Finally they disabled swap at all and problem
>lost.

You just hid the problem - by disabling swap the buffer cache couldn't
grow without bounds any more, and the proper buffer cache shrinking
couldn't happen.

Try 2.4.8 or later.

>If you do not want to change it back as it was in 2.2.x then would be
>good if this is tunable somehow.

Tuning for bugs?

What do you want to happen? You want to have an interface like

	echo 0 > /proc/bugs/mm

that makes mm bugs go away?

		Linus
* Re: broken VM in 2.4.10-pre9
From: Olaf Zaplinski @ 2001-09-17 14:04 UTC
To: linux-kernel

Linus Torvalds wrote:
> [...]
> What do you want to happen? You want to have an interface like
>
>         echo 0 > /proc/bugs/mm
>
> that makes mm bugs go away?

Good idea! ;-)

Well, I had similar problems and went back to 2.2.19... but isn't there a
tuneable yet? On http://www.badtux.org/eric/editorial/mindcraft.html I
found this one:

'Tuning the file buffer size so that more than 60% of memory can be used
(90% in this example) can be accomplished by issuing the following command:

	echo "2 10 90" >/proc/sys/vm/buffermem

This is documented in the file /usr/src/linux/Documentation/sysctl/vm.txt
along with many other tuning parameters, such as the 'bdflush' parameter.'

But vm.txt from 2.4.9ac10 and 2.2.19 says:

buffermem:

The three values in this file correspond to the values in
the struct buffer_mem. It controls how much memory should
be used for buffer memory. The percentage is calculated
as a percentage of total system memory.

The values are:
min_percent    -- this is the minimum percentage of memory
                  that should be spent on buffer memory
borrow_percent -- UNUSED
max_percent    -- UNUSED

Is vm.txt out of date, or is there really no tuneable, neither in 2.2.x
nor in 2.4.x?

Olaf