From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx190.postini.com [74.125.245.190])
	by kanga.kvack.org (Postfix) with SMTP id 7647E6B0070
	for ; Wed, 28 Nov 2012 19:31:23 -0500 (EST)
Received: by mail-qc0-f169.google.com with SMTP id t2so12850521qcq.14
	for ; Wed, 28 Nov 2012 16:31:22 -0800 (PST)
MIME-Version: 1.0
Date: Wed, 28 Nov 2012 16:31:22 -0800
Message-ID:
Subject: zram, OOM, and speed of allocation
From: Luigi Semenzato
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-mm@kvack.org, Minchan Kim, Dan Magenheimer, Sonny Rao, Bryan Freed, Hugh Dickins

I am beginning to understand why zram appears to work fine on our x86
systems but not on our ARM systems. The bottom line is that swapping
doesn't work as I would expect when allocation is "too fast".

In one of my tests, opening 50 tabs simultaneously in a Chrome browser
on devices with 2 GB of RAM and a zram-disk of 3 GB (uncompressed), I
was observing that on the x86 device all of the zram swap space was
used before OOM kills happened, but on the ARM device I would see OOM
kills when only about 1 GB (out of 3) was swapped out.

I wrote a simple program to understand this behavior. The program
(called "hog") allocates memory and fills it with a mix of
incompressible data (from /dev/urandom) and highly compressible data
(1's, just to avoid zero pages) in a given ratio. The memory is never
touched again.

It turns out that if I don't limit the allocation speed, I see
premature OOM kills also on the x86 device. If I limit the allocation
to 10 MB/s, the premature OOM kills stop happening on the x86 device,
but still happen on the ARM device. If I further limit the allocation
speed to 5 MB/s, the premature OOM kills disappear also from the ARM
device.

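The source of "hog" is not included in this thread, so the following is only a minimal sketch of a program of that kind, written under stated assumptions: memory is taken in 1 MiB chunks, the mix is decided per 4 KiB page from a percentage, "1's" is read as bytes with value 1, and the rate limit is a fixed pause after each chunk. The names fill_pages, pause_usec and hog, and all of their parameters, are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES  4096u            /* assumed page size */
#define CHUNK_BYTES (1024u * 1024u)  /* assumed allocation unit: 1 MiB */

/*
 * Fill `pages` pages starting at buf. Out of every 100 consecutive pages
 * (counted from first_page), the first random_percent are filled from rnd
 * (incompressible); the others are filled with bytes of value 1
 * (compressible, but not zero pages).
 * Returns the number of random pages, or -1 if reading rnd fails.
 */
long fill_pages(unsigned char *buf, size_t first_page, size_t pages,
		int random_percent, FILE *rnd)
{
	long random_pages = 0;

	assert(random_percent >= 0 && random_percent <= 100);
	for (size_t i = 0; i < pages; i++) {
		unsigned char *page = buf + i * PAGE_BYTES;

		if ((int)((first_page + i) % 100) < random_percent) {
			if (fread(page, 1, PAGE_BYTES, rnd) != PAGE_BYTES)
				return -1;
			random_pages++;
		} else {
			memset(page, 1, PAGE_BYTES);
		}
	}
	return random_pages;
}

/* Microseconds to pause per chunk for a target rate; 0 means no limit. */
unsigned long pause_usec(size_t chunk_bytes, unsigned int mb_per_sec)
{
	if (mb_per_sec == 0)
		return 0;
	return (unsigned long)(chunk_bytes * 1000000ULL /
			       (mb_per_sec * 1048576ULL));
}

/*
 * Allocate and fill total_mb MiB. The chunks are deliberately never
 * freed and never touched again. The pause ignores the time spent
 * filling, so the real rate is somewhat below mb_per_sec.
 */
int hog(unsigned int total_mb, unsigned int mb_per_sec, int random_percent)
{
	const size_t pages_per_chunk = CHUNK_BYTES / PAGE_BYTES;
	FILE *rnd = fopen("/dev/urandom", "rb");

	if (!rnd)
		return -1;
	for (unsigned int i = 0; i < total_mb; i++) {
		unsigned char *chunk = malloc(CHUNK_BYTES);
		unsigned long us = pause_usec(CHUNK_BYTES, mb_per_sec);
		struct timespec ts = {
			.tv_sec = us / 1000000,
			.tv_nsec = (us % 1000000) * 1000,
		};

		if (!chunk || fill_pages(chunk, i * pages_per_chunk,
					 pages_per_chunk, random_percent,
					 rnd) < 0) {
			fclose(rnd);
			return -1;
		}
		nanosleep(&ts, NULL);
	}
	fclose(rnd);
	return 0;
}
```

With this sketch, hog(20, 10, 50) would allocate 20 MiB at roughly 10 MiB/s with half of the pages incompressible, and a rate of 0 would allocate as fast as possible.
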
I have noticed a few time constants in the MM whose value is not well explained, and I am wondering if the code is tuned for some ideal system that doesn't behave like ours (considering, for instance, that zram is much faster than swapping to a disk device, but it also uses more CPU). If this is plausible, I am wondering if anybody has suggestions for changes that I could try out to obtain a better behavior with a higher allocation speed. Thanks! Luigi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx114.postini.com [74.125.245.114]) by kanga.kvack.org (Postfix) with SMTP id 46B766B0072 for ; Thu, 29 Nov 2012 13:46:03 -0500 (EST) Received: by mail-qc0-f169.google.com with SMTP id t2so13642970qcq.14 for ; Thu, 29 Nov 2012 10:46:02 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: Date: Thu, 29 Nov 2012 10:46:02 -0800 Message-ID: Subject: Re: zram, OOM, and speed of allocation From: Luigi Semenzato Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, Minchan Kim , Dan Magenheimer , Sonny Rao , Bryan Freed , Hugh Dickins Minchan: I tried your suggestion to move the call to wake_all_kswapd from after "restart:" to after "rebalance:". The behavior is still similar, but slightly improved. Here's what I see. Allocating as fast as I can: 1.5 GB of the 3 GB of zram swap are used, then OOM kills happen, and the system ends up with 1 GB swap used, 2 unused. Allocating 10 MB/s: some kills happen when only 1 to 1.5 GB are used, and continue happening while swap fills up. Eventually swap fills up completely. This is better than before (could not go past about 1 GB of swap used), but there are too many kills too early. I would like to see no OOM kills until swap is full or almost full. 
Allocating 20 MB/s: almost as good as with 10 MB/s, but more kills happen earlier, and not all swap space is used (400 MB free at the end). This is with 200 processes using 20 MB each, and 2:1 compression ratio. So it looks like kswapd is still not aggressive enough in pushing pages out. What's the best way of changing that? Play around with the watermarks? Incidentally, I also tried removing the min_filelist_kbytes hacky patch, but, as usual, the system thrashes so badly that it's impossible to complete any experiment. I set it to a lower minimum amount of free file pages, 10 MB instead of the 50 MB which we use normally, and I could run with some thrashing, but I got the same results. Thanks! Luigi On Wed, Nov 28, 2012 at 4:31 PM, Luigi Semenzato wrote: > I am beginning to understand why zram appears to work fine on our x86 > systems but not on our ARM systems. The bottom line is that swapping > doesn't work as I would expect when allocation is "too fast". > > In one of my tests, opening 50 tabs simultaneously in a Chrome browser > on devices with 2 GB of RAM and a zram-disk of 3 GB (uncompressed), I > was observing that on the x86 device all of the zram swap space was > used before OOM kills happened, but on the ARM device I would see OOM > kills when only about 1 GB (out of 3) was swapped out. > > I wrote a simple program to understand this behavior. The program > (called "hog") allocates memory and fills it with a mix of > incompressible data (from /dev/urandom) and highly compressible data > (1's, just to avoid zero pages) in a given ratio. The memory is never > touched again. > > It turns out that if I don't limit the allocation speed, I see > premature OOM kills also on the x86 device. If I limit the allocation > to 10 MB/s, the premature OOM kills stop happening on the x86 device, > but still happen on the ARM device. If I further limit the allocation > speed to 5 Mb/s, the premature OOM kills disappear also from the ARM > device. 
> > I have noticed a few time constants in the MM whose value is not well > explained, and I am wondering if the code is tuned for some ideal > system that doesn't behave like ours (considering, for instance, that > zram is much faster than swapping to a disk device, but it also uses > more CPU). If this is plausible, I am wondering if anybody has > suggestions for changes that I could try out to obtain a better > behavior with a higher allocation speed. > > Thanks! > Luigi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx155.postini.com [74.125.245.155]) by kanga.kvack.org (Postfix) with SMTP id 271406B008C for ; Thu, 29 Nov 2012 14:31:47 -0500 (EST) Received: by mail-qa0-f48.google.com with SMTP id o19so770985qap.14 for ; Thu, 29 Nov 2012 11:31:46 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: Date: Thu, 29 Nov 2012 11:31:46 -0800 Message-ID: Subject: Re: zram, OOM, and speed of allocation From: Luigi Semenzato Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, Minchan Kim , Dan Magenheimer , Sonny Rao , Bryan Freed , Hugh Dickins Oh well, I found the problem, it's laptop_mode. We keep it on by default. When I turn it off, I can allocate as fast as I can, and no OOMs happen until swap is exhausted. I don't think this is a desirable behavior even for laptop_mode, so if anybody wants to help me debug it (or wants my help in debugging it) do let me know. Thanks! Luigi On Thu, Nov 29, 2012 at 10:46 AM, Luigi Semenzato wrote: > Minchan: > > I tried your suggestion to move the call to wake_all_kswapd from after > "restart:" to after "rebalance:". The behavior is still similar, but > slightly improved. Here's what I see. 
> > Allocating as fast as I can: 1.5 GB of the 3 GB of zram swap are used, > then OOM kills happen, and the system ends up with 1 GB swap used, 2 > unused. > > Allocating 10 MB/s: some kills happen when only 1 to 1.5 GB are used, > and continue happening while swap fills up. Eventually swap fills up > completely. This is better than before (could not go past about 1 GB > of swap used), but there are too many kills too early. I would like > to see no OOM kills until swap is full or almost full. > > Allocating 20 MB/s: almost as good as with 10 MB/s, but more kills > happen earlier, and not all swap space is used (400 MB free at the > end). > > This is with 200 processes using 20 MB each, and 2:1 compression ratio. > > So it looks like kswapd is still not aggressive enough in pushing > pages out. What's the best way of changing that? Play around with > the watermarks? > > Incidentally, I also tried removing the min_filelist_kbytes hacky > patch, but, as usual, the system thrashes so badly that it's > impossible to complete any experiment. I set it to a lower minimum > amount of free file pages, 10 MB instead of the 50 MB which we use > normally, and I could run with some thrashing, but I got the same > results. > > Thanks! > Luigi > > > On Wed, Nov 28, 2012 at 4:31 PM, Luigi Semenzato wrote: >> I am beginning to understand why zram appears to work fine on our x86 >> systems but not on our ARM systems. The bottom line is that swapping >> doesn't work as I would expect when allocation is "too fast". >> >> In one of my tests, opening 50 tabs simultaneously in a Chrome browser >> on devices with 2 GB of RAM and a zram-disk of 3 GB (uncompressed), I >> was observing that on the x86 device all of the zram swap space was >> used before OOM kills happened, but on the ARM device I would see OOM >> kills when only about 1 GB (out of 3) was swapped out. >> >> I wrote a simple program to understand this behavior. 
The program >> (called "hog") allocates memory and fills it with a mix of >> incompressible data (from /dev/urandom) and highly compressible data >> (1's, just to avoid zero pages) in a given ratio. The memory is never >> touched again. >> >> It turns out that if I don't limit the allocation speed, I see >> premature OOM kills also on the x86 device. If I limit the allocation >> to 10 MB/s, the premature OOM kills stop happening on the x86 device, >> but still happen on the ARM device. If I further limit the allocation >> speed to 5 Mb/s, the premature OOM kills disappear also from the ARM >> device. >> >> I have noticed a few time constants in the MM whose value is not well >> explained, and I am wondering if the code is tuned for some ideal >> system that doesn't behave like ours (considering, for instance, that >> zram is much faster than swapping to a disk device, but it also uses >> more CPU). If this is plausible, I am wondering if anybody has >> suggestions for changes that I could try out to obtain a better >> behavior with a higher allocation speed. >> >> Thanks! >> Luigi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx206.postini.com [74.125.245.206]) by kanga.kvack.org (Postfix) with SMTP id 6CCF76B004D for ; Thu, 29 Nov 2012 15:55:50 -0500 (EST) Received: by mail-qc0-f169.google.com with SMTP id t2so13731698qcq.14 for ; Thu, 29 Nov 2012 12:55:49 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: From: Sonny Rao Date: Thu, 29 Nov 2012 12:55:29 -0800 Message-ID: Subject: Re: zram, OOM, and speed of allocation Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Luigi Semenzato Cc: linux-mm@kvack.org, Minchan Kim , Dan Magenheimer , Bryan Freed , Hugh Dickins On Thu, Nov 29, 2012 at 11:31 AM, Luigi Semenzato wrote: > Oh well, I found the problem, it's laptop_mode. We keep it on by > default. When I turn it off, I can allocate as fast as I can, and no > OOMs happen until swap is exhausted. > > I don't think this is a desirable behavior even for laptop_mode, so if > anybody wants to help me debug it (or wants my help in debugging it) > do let me know. > Luigi, I thought we disabled Laptop mode a few weeks ago -- due to undesirable behavior with respect to too many writes happening. Are you sure it's on? > Thanks! > Luigi > > On Thu, Nov 29, 2012 at 10:46 AM, Luigi Semenzato wrote: >> Minchan: >> >> I tried your suggestion to move the call to wake_all_kswapd from after >> "restart:" to after "rebalance:". The behavior is still similar, but >> slightly improved. Here's what I see. >> >> Allocating as fast as I can: 1.5 GB of the 3 GB of zram swap are used, >> then OOM kills happen, and the system ends up with 1 GB swap used, 2 >> unused. >> >> Allocating 10 MB/s: some kills happen when only 1 to 1.5 GB are used, >> and continue happening while swap fills up. Eventually swap fills up >> completely. This is better than before (could not go past about 1 GB >> of swap used), but there are too many kills too early. 
I would like >> to see no OOM kills until swap is full or almost full. >> >> Allocating 20 MB/s: almost as good as with 10 MB/s, but more kills >> happen earlier, and not all swap space is used (400 MB free at the >> end). >> >> This is with 200 processes using 20 MB each, and 2:1 compression ratio. >> >> So it looks like kswapd is still not aggressive enough in pushing >> pages out. What's the best way of changing that? Play around with >> the watermarks? >> >> Incidentally, I also tried removing the min_filelist_kbytes hacky >> patch, but, as usual, the system thrashes so badly that it's >> impossible to complete any experiment. I set it to a lower minimum >> amount of free file pages, 10 MB instead of the 50 MB which we use >> normally, and I could run with some thrashing, but I got the same >> results. >> >> Thanks! >> Luigi >> >> >> On Wed, Nov 28, 2012 at 4:31 PM, Luigi Semenzato wrote: >>> I am beginning to understand why zram appears to work fine on our x86 >>> systems but not on our ARM systems. The bottom line is that swapping >>> doesn't work as I would expect when allocation is "too fast". >>> >>> In one of my tests, opening 50 tabs simultaneously in a Chrome browser >>> on devices with 2 GB of RAM and a zram-disk of 3 GB (uncompressed), I >>> was observing that on the x86 device all of the zram swap space was >>> used before OOM kills happened, but on the ARM device I would see OOM >>> kills when only about 1 GB (out of 3) was swapped out. >>> >>> I wrote a simple program to understand this behavior. The program >>> (called "hog") allocates memory and fills it with a mix of >>> incompressible data (from /dev/urandom) and highly compressible data >>> (1's, just to avoid zero pages) in a given ratio. The memory is never >>> touched again. >>> >>> It turns out that if I don't limit the allocation speed, I see >>> premature OOM kills also on the x86 device. 
If I limit the allocation >>> to 10 MB/s, the premature OOM kills stop happening on the x86 device, >>> but still happen on the ARM device. If I further limit the allocation >>> speed to 5 Mb/s, the premature OOM kills disappear also from the ARM >>> device. >>> >>> I have noticed a few time constants in the MM whose value is not well >>> explained, and I am wondering if the code is tuned for some ideal >>> system that doesn't behave like ours (considering, for instance, that >>> zram is much faster than swapping to a disk device, but it also uses >>> more CPU). If this is plausible, I am wondering if anybody has >>> suggestions for changes that I could try out to obtain a better >>> behavior with a higher allocation speed. >>> >>> Thanks! >>> Luigi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx173.postini.com [74.125.245.173]) by kanga.kvack.org (Postfix) with SMTP id A83EE6B0075 for ; Thu, 29 Nov 2012 16:33:58 -0500 (EST) Received: by mail-qa0-f48.google.com with SMTP id o19so898315qap.14 for ; Thu, 29 Nov 2012 13:33:57 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: Date: Thu, 29 Nov 2012 13:33:57 -0800 Message-ID: Subject: Re: zram, OOM, and speed of allocation From: Luigi Semenzato Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Sonny Rao Cc: linux-mm@kvack.org, Minchan Kim , Dan Magenheimer , Bryan Freed , Hugh Dickins On Thu, Nov 29, 2012 at 12:55 PM, Sonny Rao wrote: > On Thu, Nov 29, 2012 at 11:31 AM, Luigi Semenzato wrote: >> Oh well, I found the problem, it's laptop_mode. We keep it on by >> default. When I turn it off, I can allocate as fast as I can, and no >> OOMs happen until swap is exhausted. 
>> >> I don't think this is a desirable behavior even for laptop_mode, so if >> anybody wants to help me debug it (or wants my help in debugging it) >> do let me know. >> > > Luigi, I thought we disabled Laptop mode a few weeks ago -- due to > undesirable behavior with respect to too many writes happening. > Are you sure it's on? Yes. The change happened a month ago, but I hadn't updated my testing image since then. So I suppose we aren't really too interested in fixing the laptop_mode behavior, but I'll be happy to test fixes if anybody would like me to. > >> Thanks! >> Luigi >> >> On Thu, Nov 29, 2012 at 10:46 AM, Luigi Semenzato wrote: >>> Minchan: >>> >>> I tried your suggestion to move the call to wake_all_kswapd from after >>> "restart:" to after "rebalance:". The behavior is still similar, but >>> slightly improved. Here's what I see. >>> >>> Allocating as fast as I can: 1.5 GB of the 3 GB of zram swap are used, >>> then OOM kills happen, and the system ends up with 1 GB swap used, 2 >>> unused. >>> >>> Allocating 10 MB/s: some kills happen when only 1 to 1.5 GB are used, >>> and continue happening while swap fills up. Eventually swap fills up >>> completely. This is better than before (could not go past about 1 GB >>> of swap used), but there are too many kills too early. I would like >>> to see no OOM kills until swap is full or almost full. >>> >>> Allocating 20 MB/s: almost as good as with 10 MB/s, but more kills >>> happen earlier, and not all swap space is used (400 MB free at the >>> end). >>> >>> This is with 200 processes using 20 MB each, and 2:1 compression ratio. >>> >>> So it looks like kswapd is still not aggressive enough in pushing >>> pages out. What's the best way of changing that? Play around with >>> the watermarks? >>> >>> Incidentally, I also tried removing the min_filelist_kbytes hacky >>> patch, but, as usual, the system thrashes so badly that it's >>> impossible to complete any experiment. 
I set it to a lower minimum >>> amount of free file pages, 10 MB instead of the 50 MB which we use >>> normally, and I could run with some thrashing, but I got the same >>> results. >>> >>> Thanks! >>> Luigi >>> >>> >>> On Wed, Nov 28, 2012 at 4:31 PM, Luigi Semenzato wrote: >>>> I am beginning to understand why zram appears to work fine on our x86 >>>> systems but not on our ARM systems. The bottom line is that swapping >>>> doesn't work as I would expect when allocation is "too fast". >>>> >>>> In one of my tests, opening 50 tabs simultaneously in a Chrome browser >>>> on devices with 2 GB of RAM and a zram-disk of 3 GB (uncompressed), I >>>> was observing that on the x86 device all of the zram swap space was >>>> used before OOM kills happened, but on the ARM device I would see OOM >>>> kills when only about 1 GB (out of 3) was swapped out. >>>> >>>> I wrote a simple program to understand this behavior. The program >>>> (called "hog") allocates memory and fills it with a mix of >>>> incompressible data (from /dev/urandom) and highly compressible data >>>> (1's, just to avoid zero pages) in a given ratio. The memory is never >>>> touched again. >>>> >>>> It turns out that if I don't limit the allocation speed, I see >>>> premature OOM kills also on the x86 device. If I limit the allocation >>>> to 10 MB/s, the premature OOM kills stop happening on the x86 device, >>>> but still happen on the ARM device. If I further limit the allocation >>>> speed to 5 Mb/s, the premature OOM kills disappear also from the ARM >>>> device. >>>> >>>> I have noticed a few time constants in the MM whose value is not well >>>> explained, and I am wondering if the code is tuned for some ideal >>>> system that doesn't behave like ours (considering, for instance, that >>>> zram is much faster than swapping to a disk device, but it also uses >>>> more CPU). 
If this is plausible, I am wondering if anybody has >>>> suggestions for changes that I could try out to obtain a better >>>> behavior with a higher allocation speed. >>>> >>>> Thanks! >>>> Luigi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx184.postini.com [74.125.245.184]) by kanga.kvack.org (Postfix) with SMTP id 758AB6B0078 for ; Thu, 29 Nov 2012 17:57:40 -0500 (EST) Received: by mail-qa0-f41.google.com with SMTP id o19so955228qap.14 for ; Thu, 29 Nov 2012 14:57:39 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: From: Sonny Rao Date: Thu, 29 Nov 2012 14:57:19 -0800 Message-ID: Subject: Re: zram, OOM, and speed of allocation Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Luigi Semenzato Cc: linux-mm@kvack.org, Minchan Kim , Dan Magenheimer , Bryan Freed , Hugh Dickins On Thu, Nov 29, 2012 at 1:33 PM, Luigi Semenzato wrote: > On Thu, Nov 29, 2012 at 12:55 PM, Sonny Rao wrote: >> On Thu, Nov 29, 2012 at 11:31 AM, Luigi Semenzato wrote: >>> Oh well, I found the problem, it's laptop_mode. We keep it on by >>> default. When I turn it off, I can allocate as fast as I can, and no >>> OOMs happen until swap is exhausted. >>> >>> I don't think this is a desirable behavior even for laptop_mode, so if >>> anybody wants to help me debug it (or wants my help in debugging it) >>> do let me know. >>> >> >> Luigi, I thought we disabled Laptop mode a few weeks ago -- due to >> undesirable behavior with respect to too many writes happening. >> Are you sure it's on? > > Yes. The change happened a month ago, but I hadn't updated my testing > image since then. > > So I suppose we aren't really too interested in fixing the laptop_mode > behavior, but I'll be happy to test fixes if anybody would like me to. 
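
For anyone who wants to confirm which setting a test image is actually running with: laptop mode is exposed as the vm.laptop_mode sysctl, readable at /proc/sys/vm/laptop_mode, where 0 means off and a nonzero value means on. A minimal C sketch of such a check follows; the function names are illustrative only and are not part of any kernel or library API.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text read from /proc/sys/vm/laptop_mode.
 * Returns the value (0 = off, nonzero = on), or -1 if the text
 * does not start with a non-negative decimal number.
 */
long parse_laptop_mode(const char *text)
{
	char *end;
	long value = strtol(text, &end, 10);

	if (end == text || value < 0)
		return -1;
	return value;
}

/* Read the current setting; returns -1 if the file cannot be read. */
long read_laptop_mode(void)
{
	char buf[32];
	FILE *f = fopen("/proc/sys/vm/laptop_mode", "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_laptop_mode(buf);
}
```

A test harness could call read_laptop_mode() before each run and record the result next to the swap numbers, so that a stale image is noticed early.
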
>

Yeah, the big problem that led us to disable laptop_mode is some
pathological behavior with disk writes. Laptop mode sets a timer
after each write, presumably to see if any data got dirtied; when the
timer expires it checks for dirty data, writes it out, *and* sets the
timer again. We saw a pattern where things were dirtied often enough
that there was almost always new dirty data when the timer expired, so
we would keep the disk up and burning power for a very long time,
which is clearly not what laptop mode is trying to do.

Maybe we should work on trying to fix laptop_mode at some point. If
it just did a single flush of dirty data when we woke up the disk and
didn't try to wait for more dirty data, it would work better.

Your case here is a different example of a bad interaction with
laptop_mode. It seems to come from this code in balance_pgdat:

loop_again:
	total_scanned = 0;
	sc.nr_reclaimed = 0;
	sc.may_writepage = !laptop_mode;	<-----------
	count_vm_event(PAGEOUTRUN);

With may_writepage cleared, reclaim starts each pass without writing
dirty pages out, and that includes pages headed for swap. This code
assumes that swap is on a disk that is subject to laptop mode, but in
the case of zram (and NFS) that assumption is incorrect.

>>
>>> Thanks!
>>> Luigi
>>>
>>> On Thu, Nov 29, 2012 at 10:46 AM, Luigi Semenzato wrote:
>>>> Minchan:
>>>>
>>>> I tried your suggestion to move the call to wake_all_kswapd from after
>>>> "restart:" to after "rebalance:". The behavior is still similar, but
>>>> slightly improved. Here's what I see.
>>>>
>>>> Allocating as fast as I can: 1.5 GB of the 3 GB of zram swap are used,
>>>> then OOM kills happen, and the system ends up with 1 GB swap used, 2
>>>> unused.
>>>>
>>>> Allocating 10 MB/s: some kills happen when only 1 to 1.5 GB are used,
>>>> and continue happening while swap fills up. Eventually swap fills up
>>>> completely. This is better than before (could not go past about 1 GB
>>>> of swap used), but there are too many kills too early. I would like
>>>> to see no OOM kills until swap is full or almost full.
>>>> >>>> Allocating 20 MB/s: almost as good as with 10 MB/s, but more kills >>>> happen earlier, and not all swap space is used (400 MB free at the >>>> end). >>>> >>>> This is with 200 processes using 20 MB each, and 2:1 compression ratio. >>>> >>>> So it looks like kswapd is still not aggressive enough in pushing >>>> pages out. What's the best way of changing that? Play around with >>>> the watermarks? >>>> >>>> Incidentally, I also tried removing the min_filelist_kbytes hacky >>>> patch, but, as usual, the system thrashes so badly that it's >>>> impossible to complete any experiment. I set it to a lower minimum >>>> amount of free file pages, 10 MB instead of the 50 MB which we use >>>> normally, and I could run with some thrashing, but I got the same >>>> results. >>>> >>>> Thanks! >>>> Luigi >>>> >>>> >>>> On Wed, Nov 28, 2012 at 4:31 PM, Luigi Semenzato wrote: >>>>> I am beginning to understand why zram appears to work fine on our x86 >>>>> systems but not on our ARM systems. The bottom line is that swapping >>>>> doesn't work as I would expect when allocation is "too fast". >>>>> >>>>> In one of my tests, opening 50 tabs simultaneously in a Chrome browser >>>>> on devices with 2 GB of RAM and a zram-disk of 3 GB (uncompressed), I >>>>> was observing that on the x86 device all of the zram swap space was >>>>> used before OOM kills happened, but on the ARM device I would see OOM >>>>> kills when only about 1 GB (out of 3) was swapped out. >>>>> >>>>> I wrote a simple program to understand this behavior. The program >>>>> (called "hog") allocates memory and fills it with a mix of >>>>> incompressible data (from /dev/urandom) and highly compressible data >>>>> (1's, just to avoid zero pages) in a given ratio. The memory is never >>>>> touched again. >>>>> >>>>> It turns out that if I don't limit the allocation speed, I see >>>>> premature OOM kills also on the x86 device. 
If I limit the allocation >>>>> to 10 MB/s, the premature OOM kills stop happening on the x86 device, >>>>> but still happen on the ARM device. If I further limit the allocation >>>>> speed to 5 Mb/s, the premature OOM kills disappear also from the ARM >>>>> device. >>>>> >>>>> I have noticed a few time constants in the MM whose value is not well >>>>> explained, and I am wondering if the code is tuned for some ideal >>>>> system that doesn't behave like ours (considering, for instance, that >>>>> zram is much faster than swapping to a disk device, but it also uses >>>>> more CPU). If this is plausible, I am wondering if anybody has >>>>> suggestions for changes that I could try out to obtain a better >>>>> behavior with a higher allocation speed. >>>>> >>>>> Thanks! >>>>> Luigi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx180.postini.com [74.125.245.180]) by kanga.kvack.org (Postfix) with SMTP id 5A3DF6B005D for ; Mon, 3 Dec 2012 01:42:15 -0500 (EST) Date: Mon, 3 Dec 2012 15:42:12 +0900 From: Minchan Kim Subject: Re: zram, OOM, and speed of allocation Message-ID: <20121203064212.GA4569@blaptop> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Luigi Semenzato Cc: linux-mm@kvack.org, Dan Magenheimer , Sonny Rao , Bryan Freed , Hugh Dickins Hi Luigi, On Thu, Nov 29, 2012 at 11:31:46AM -0800, Luigi Semenzato wrote: > Oh well, I found the problem, it's laptop_mode. We keep it on by > default. When I turn it off, I can allocate as fast as I can, and no > OOMs happen until swap is exhausted. 
>
> I don't think this is a desirable behavior even for laptop_mode, so if
> anybody wants to help me debug it (or wants my help in debugging it)
> do let me know.

Interesting.
Here is a quick trial patch. Could you try it on your kernel without
my previous patch, "wakeup kswapd in direct reclaim path"?
If you still have trouble with a stopped kswapd, please apply both patches.

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32bc955..4a7fe5d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -725,6 +725,7 @@ typedef struct pglist_data {
 	struct task_struct *kswapd;	/* Protected by lock_memory_hotplug() */
 	int kswapd_max_order;
 	enum zone_type classzone_idx;
+	bool may_writepage;
 } pg_data_t;
 
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 53dcde9..1952420 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -68,6 +68,11 @@ struct scan_control {
 	/* This context's GFP mask */
 	gfp_t gfp_mask;
 
+	/*
+	 * If laptop_mode is true, you don't need to set may_writepage.
+	 * Otherwise, you should set may_writepage explicitly.
+	 */
+	bool laptop_mode;
 	int may_writepage;
 
 	/* Can mapped pages be reclaimed? */
@@ -1846,6 +1851,15 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 	unsigned long nr_reclaimed, nr_scanned;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	struct blk_plug plug;
+	struct zone *zone = lruvec_zone(lruvec);
+	pg_data_t *pgdat = zone->zone_pgdat;
+
+	if (sc->laptop_mode) {
+		if (pgdat->may_writepage)
+			sc->may_writepage = 1;
+		else
+			sc->may_writepage = 0;
+	}
 
 restart:
 	nr_reclaimed = 0;
@@ -2145,11 +2159,9 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
 		 * writeout. So in laptop mode, write out the whole world.
 		 */
 		writeback_threshold = sc->nr_to_reclaim + sc->nr_to_reclaim / 2;
-		if (total_scanned > writeback_threshold) {
-			wakeup_flusher_threads(laptop_mode ? 0 : total_scanned,
+		if (total_scanned > writeback_threshold)
+			wakeup_flusher_threads(sc->laptop_mode ? 0 : total_scanned,
 						WB_REASON_TRY_TO_FREE_PAGES);
-			sc->may_writepage = 1;
-		}
 
 		/* Take a nap, wait for some writeback to complete */
 		if (!sc->hibernation_mode && sc->nr_scanned &&
@@ -2289,7 +2301,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 	unsigned long nr_reclaimed;
 	struct scan_control sc = {
 		.gfp_mask = gfp_mask,
-		.may_writepage = !laptop_mode,
+		.laptop_mode = laptop_mode,
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.may_unmap = 1,
 		.may_swap = 1,
@@ -2331,7 +2343,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *memcg,
 	struct scan_control sc = {
 		.nr_scanned = 0,
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
-		.may_writepage = !laptop_mode,
+		.laptop_mode = laptop_mode,
 		.may_unmap = 1,
 		.may_swap = !noswap,
 		.order = 0,
@@ -2370,7 +2382,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 	unsigned long nr_reclaimed;
 	int nid;
 	struct scan_control sc = {
-		.may_writepage = !laptop_mode,
+		.laptop_mode = laptop_mode,
 		.may_unmap = 1,
 		.may_swap = !noswap,
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
@@ -2585,7 +2597,7 @@ loop_again:
 	total_scanned = 0;
 	sc.priority = DEF_PRIORITY;
 	sc.nr_reclaimed = 0;
-	sc.may_writepage = !laptop_mode;
+	sc.laptop_mode = laptop_mode;
 	count_vm_event(PAGEOUTRUN);
 
 	do {
@@ -2722,7 +2734,7 @@ loop_again:
 			 */
 			if (total_scanned > SWAP_CLUSTER_MAX * 2 &&
 			    total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
-				sc.may_writepage = 1;
+				zone->zone_pgdat->may_writepage = true;
 
 			if (zone->all_unreclaimable) {
 				if (end_zone && end_zone == i)
@@ -2749,6 +2761,7 @@ loop_again:
 				 * speculatively avoid congestion waits
 				 */
 				zone_clear_flag(zone, ZONE_CONGESTED);
+				zone->zone_pgdat->may_writepage = false;
 				if (i <= *classzone_idx)
 					balanced += zone->present_pages;
 			}
@@ -3112,6 +3125,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 		.gfp_mask = GFP_HIGHUSER_MOVABLE,
 		.may_swap = 1,
 		.may_unmap = 1,
+		.laptop_mode = false,
 		.may_writepage = 1,
 		.nr_to_reclaim = nr_to_reclaim,
 		.hibernation_mode = 1,
@@ -3299,6 +3313,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	struct task_struct *p = current;
 	struct reclaim_state reclaim_state;
 	struct scan_control sc = {
+		.laptop_mode = false,
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
 		.may_swap = 1,

>
> Thanks!
> Luigi
>
> On Thu, Nov 29, 2012 at 10:46 AM, Luigi Semenzato wrote:
> > Minchan:
> >
> > I tried your suggestion to move the call to wake_all_kswapd from after
> > "restart:" to after "rebalance:". The behavior is still similar, but
> > slightly improved. Here's what I see.
> >
> > Allocating as fast as I can: 1.5 GB of the 3 GB of zram swap are used,
> > then OOM kills happen, and the system ends up with 1 GB swap used, 2
> > unused.
> >
> > Allocating 10 MB/s: some kills happen when only 1 to 1.5 GB are used,
> > and continue happening while swap fills up. Eventually swap fills up
> > completely. This is better than before (could not go past about 1 GB
> > of swap used), but there are too many kills too early. I would like
> > to see no OOM kills until swap is full or almost full.
> >
> > Allocating 20 MB/s: almost as good as with 10 MB/s, but more kills
> > happen earlier, and not all swap space is used (400 MB free at the
> > end).
> >
> > This is with 200 processes using 20 MB each, and 2:1 compression ratio.
> >
> > So it looks like kswapd is still not aggressive enough in pushing
> > pages out. What's the best way of changing that? Play around with
> > the watermarks?
> >
> > Incidentally, I also tried removing the min_filelist_kbytes hacky
> > patch, but, as usual, the system thrashes so badly that it's
> > impossible to complete any experiment. I set it to a lower minimum
> > amount of free file pages, 10 MB instead of the 50 MB which we use
> > normally, and I could run with some thrashing, but I got the same
> > results.
-- 
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx158.postini.com [74.125.245.158])
	by kanga.kvack.org (Postfix) with SMTP id B2D816B005D
	for ; Mon, 3 Dec 2012 02:38:26 -0500 (EST)
Date: Mon, 3 Dec 2012 16:38:24 +0900
From: Minchan Kim
Subject: Re: zram, OOM, and speed of allocation
Message-ID: <20121203073824.GB4569@blaptop>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
List-ID:
To: Luigi Semenzato
Cc: linux-mm@kvack.org, Dan Magenheimer, Sonny Rao, Bryan Freed, Hugh Dickins

Hi Luigi,

Here is another patch; it does not depend on my previous patches.
With it you can set /proc/sys/vm/swappiness up to 200 (which means the
VM reclaimer reclaims only anonymous pages), so I hope it lets the swap
device fill up while file-backed pages (i.e., code pages) are protected
from eviction.

I hope this patch makes your hacky min_filelist_kbytes unnecessary.
Could you try it and send feedback?
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx119.postini.com [74.125.245.119])
	by kanga.kvack.org (Postfix) with SMTP id 4FCB06B00BD
	for ; Sat, 16 Feb 2013 21:49:40 -0500 (EST)
Received: by mail-da0-f47.google.com with SMTP id s35so2005065dak.34
	for ; Sat, 16 Feb 2013 18:49:39 -0800 (PST)
Message-ID: <5120453C.7050408@gmail.com>
Date: Sun, 17 Feb 2013 10:49:32 +0800
From: Jaegeuk Hanse
MIME-Version: 1.0
Subject: Re: zram, OOM, and speed of allocation
References:
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Sonny Rao
Cc: Luigi Semenzato, linux-mm@kvack.org, Minchan Kim, Dan Magenheimer, Bryan Freed, Hugh Dickins

On 11/30/2012 06:57 AM, Sonny Rao wrote:
> On Thu, Nov 29, 2012 at 1:33 PM, Luigi Semenzato wrote:
>> On Thu, Nov 29, 2012 at 12:55 PM, Sonny Rao wrote:
>>> On Thu, Nov 29, 2012 at 11:31 AM, Luigi Semenzato wrote:
>>>> Oh well, I found the problem, it's laptop_mode.  We keep it on by
>>>> default.  When I turn it off, I can allocate as fast as I can, and no
>>>> OOMs happen until swap is exhausted.
>>>>
>>>> I don't think this is a desirable behavior even for laptop_mode, so if
>>>> anybody wants to help me debug it (or wants my help in debugging it)
>>>> do let me know.
>>>>
>>> Luigi, I thought we disabled Laptop mode a few weeks ago -- due to
>>> undesirable behavior with respect to too many writes happening.
>>> Are you sure it's on?
>> Yes.  The change happened a month ago, but I hadn't updated my testing
>> image since then.
>>
>> So I suppose we aren't really too interested in fixing the laptop_mode
>> behavior, but I'll be happy to test fixes if anybody would like me to.
>>
> Yeah, the big problem that led us to disable laptop_mode is some
> pathological behavior with disk writes.
>
> Laptop mode sets a timer after each write, presumably to see if any
> data got dirtied, and checks for dirty data after the timer expires
> and then writes it out *and* sets the timer again.  So we saw a
> pattern where things were being dirtied often enough that there was
> almost always new dirty data when the timer expired, and then we'd keep
> the disk up and burning power for a very long time, which is clearly
> not what laptop mode is trying to do.

Do you mean that writes happen only after the timer expires in laptop mode?

>
> Maybe we should work on trying to fix laptop_mode at some point.  If
> it just did a single flush of dirty data when we woke up the disk and
> didn't try to wait for more dirty data, it would work better.
>
> Your case here is a different example of bad interactions with
> laptop_mode; it seems to come from code in balance_pgdat:
>
> loop_again:
>         total_scanned = 0;
>         sc.nr_reclaimed = 0;
>         sc.may_writepage = !laptop_mode;  <-----------
>         count_vm_event(PAGEOUTRUN);
>
>
> this code is assuming that swap is on a disk which is subject to
> laptop mode, but in the case of zram (and NFS), this is an incorrect
> assumption
>
>
>>>> Thanks!
>>>> Luigi
>>>>
>>>> On Thu, Nov 29, 2012 at 10:46 AM, Luigi Semenzato wrote:
>>>>> Minchan:
>>>>>
>>>>> I tried your suggestion to move the call to wake_all_kswapd from after
>>>>> "restart:" to after "rebalance:".  The behavior is still similar, but
>>>>> slightly improved.  Here's what I see.
>>>>>
>>>>> Allocating as fast as I can: 1.5 GB of the 3 GB of zram swap are used,
>>>>> then OOM kills happen, and the system ends up with 1 GB swap used, 2
>>>>> unused.
>>>>>
>>>>> Allocating 10 MB/s: some kills happen when only 1 to 1.5 GB are used,
>>>>> and continue happening while swap fills up.  Eventually swap fills up
>>>>> completely.  This is better than before (could not go past about 1 GB
>>>>> of swap used), but there are too many kills too early.  I would like
>>>>> to see no OOM kills until swap is full or almost full.