linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH] avoid swapping out with swappiness==0
@ 2012-03-02 17:36 Satoru Moriya
  2012-03-02 22:47 ` Rik van Riel
                   ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Satoru Moriya @ 2012-03-02 17:36 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Rik van Riel, lwoodman, jweiner, shaohua.li,
	KOSAKI Motohiro, dle-develop, Seiji Aguchi

Sometimes we'd like to avoid swapping out anonymous memory. In
particular, we'd like to avoid swapping out pages of important processes
or process groups while there is a reasonable amount of pagecache in
RAM, so that we can satisfy our customers' requirements.

OTOH, we can control how aggressively the kernel swaps out memory pages
with /proc/sys/vm/swappiness globally and
/sys/fs/cgroup/memory/memory.swappiness for each memcg.

But with the current reclaim implementation, the kernel may swap out
even if we set swappiness==0 and there is pagecache in RAM.

This patch changes the behavior with swappiness==0. If we set
swappiness==0, the kernel does not swap out at all
(for global reclaim, until the amount of free pages and filebacked
pages in a zone has been reduced to something very small
(nr_free + nr_filebacked < high watermark)).

Any comments are welcome.

Regards,
Satoru Moriya

Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
---
 mm/vmscan.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c52b235..27dc3e8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1983,10 +1983,10 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
 	 * proportional to the fraction of recently scanned pages on
 	 * each list that were recently referenced and in active use.
 	 */
-	ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
+	ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
 	ap /= reclaim_stat->recent_rotated[0] + 1;
 
-	fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
+	fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
 	fp /= reclaim_stat->recent_rotated[1] + 1;
 	spin_unlock_irq(&mz->zone->lru_lock);
 
@@ -1999,7 +1999,7 @@ out:
 		unsigned long scan;
 
 		scan = zone_nr_lru_pages(mz, lru);
-		if (priority || noswap) {
+		if (priority || noswap || !vmscan_swappiness(mz, sc)) {
 			scan >>= priority;
 			if (!scan && force_scan)
 				scan = SWAP_CLUSTER_MAX;
-- 
1.7.6.4
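
The effect of the first hunk can be illustrated with a small userspace
model. This is only a sketch under assumptions, not kernel code:
struct scan_stats, weight_old(), weight_new() and anon_percent() are
hypothetical names, and the comments assume that anon_prio equals
swappiness and file_prio equals 200 - swappiness, which the hunk itself
does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-LRU recent_scanned/recent_rotated pair. */
struct scan_stats {
	uint64_t recent_scanned;
	uint64_t recent_rotated;
};

/* Pre-patch weighting: the "+ 1" on prio keeps the weight nonzero at prio 0. */
static uint64_t weight_old(uint64_t prio, struct scan_stats s)
{
	return (prio + 1) * (s.recent_scanned + 1) / (s.recent_rotated + 1);
}

/* Patched weighting: a prio of 0 (assumed: swappiness==0) gives weight 0. */
static uint64_t weight_new(uint64_t prio, struct scan_stats s)
{
	return prio * (s.recent_scanned + 1) / (s.recent_rotated + 1);
}

/* Integer percentage of scan pressure aimed at anon pages: ap / (ap + fp). */
static unsigned anon_percent(uint64_t ap, uint64_t fp)
{
	assert(ap + fp != 0);
	return (unsigned)(ap * 100 / (ap + fp));
}
```

With a prio of 0, weight_old() can still return a large value when
recent_scanned is high and recent_rotated is low, so anon pages may
receive most of the scan pressure; weight_new() returns 0 in that case.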



* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-02 17:36 [RFC][PATCH] avoid swapping out with swappiness==0 Satoru Moriya
@ 2012-03-02 22:47 ` Rik van Riel
  2012-03-02 23:43   ` Satoru Moriya
  2012-03-03  2:29   ` Hillf Danton
  2012-03-04  6:57 ` Minchan Kim
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 32+ messages in thread
From: Rik van Riel @ 2012-03-02 22:47 UTC (permalink / raw)
  To: Satoru Moriya
  Cc: linux-mm, linux-kernel, lwoodman, jweiner, shaohua.li,
	KOSAKI Motohiro, dle-develop, Seiji Aguchi

On 03/02/2012 12:36 PM, Satoru Moriya wrote:
> Sometimes we'd like to avoid swapping out anonymous memory
> in particular, avoid swapping out pages of important process or
> process groups while there is a reasonable amount of pagecache
> on RAM so that we can satisfy our customers' requirements.
>
> OTOH, we can control how aggressive the kernel will swap memory pages
> with /proc/sys/vm/swappiness for global and
> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>
> But with current reclaim implementation, the kernel may swap out
> even if we set swappiness==0 and there is pagecache on RAM.
>
> This patch changes the behavior with swappiness==0. If we set
> swappiness==0, the kernel does not swap out completely
> (for global reclaim until the amount of free pages and filebacked
> pages in a zone has been reduced to something very very small
> (nr_free + nr_filebacked < high watermark)).
>
> Any comments are welcome.
>
> Regards,
> Satoru Moriya
>
> Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
> ---
>   mm/vmscan.c |    6 +++---
>   1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c52b235..27dc3e8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1983,10 +1983,10 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
>   	 * proportional to the fraction of recently scanned pages on
>   	 * each list that were recently referenced and in active use.
>   	 */
> -	ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
> +	ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
>   	ap /= reclaim_stat->recent_rotated[0] + 1;
>
> -	fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
> +	fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
>   	fp /= reclaim_stat->recent_rotated[1] + 1;
>   	spin_unlock_irq(&mz->zone->lru_lock);

ACK on this bit of the patch.

> @@ -1999,7 +1999,7 @@ out:
>   		unsigned long scan;
>
>   		scan = zone_nr_lru_pages(mz, lru);
> -		if (priority || noswap) {
> +		if (priority || noswap || !vmscan_swappiness(mz, sc)) {
>   			scan >>= priority;
>   			if (!scan && force_scan)
>   				scan = SWAP_CLUSTER_MAX;

However, I do not understand why we fail to scale
the number of pages we want to scan with priority
if "noswap".

For that matter, surely if we do not want to swap
out anonymous pages, we WANT to go into this if
branch, in order to make sure we set "scan" to 0?

scan = div64_u64(scan * fraction[file], denominator);

With your patch and swappiness=0, or no swap space, it
looks like we do not zero out "scan" and may end up
scanning anonymous pages.

Am I overlooking something?  Is this correct?

I mean, it is Friday and my brain is very full...

-- 
All rights reversed


* RE: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-02 22:47 ` Rik van Riel
@ 2012-03-02 23:43   ` Satoru Moriya
  2012-03-03  2:29   ` Hillf Danton
  1 sibling, 0 replies; 32+ messages in thread
From: Satoru Moriya @ 2012-03-02 23:43 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, lwoodman, jweiner, shaohua.li,
	KOSAKI Motohiro, dle-develop, Seiji Aguchi


Hi Rik,

Thank you for reviewing.

On 03/02/2012 05:47 PM, Rik van Riel wrote:
> On 03/02/2012 12:36 PM, Satoru Moriya wrote:
>> @@ -1999,7 +1999,7 @@ out:
>>           unsigned long scan;
>>
>>           scan = zone_nr_lru_pages(mz, lru);
>> -        if (priority || noswap) {
>> +        if (priority || noswap || !vmscan_swappiness(mz, sc)) {
>>               scan >>= priority;
>>               if (!scan && force_scan)
>>                   scan = SWAP_CLUSTER_MAX;
> 
> However, I do not understand why we fail to scale the number of pages 
> we want to scan with priority if "noswap".
> 
> For that matter, surely if we do not want to swap out anonymous pages, 
> we WANT to go into this if branch, in order to make sure we set "scan" 
> to 0?
> 
> scan = div64_u64(scan * fraction[file], denominator);
> 
> With your patch and swappiness=0, or no swap space, it looks like we 
> do not zero out "scan" and may end up scanning anonymous pages.

With my patch, if swappiness==0 or noswap==1, the anon fraction
(fraction[file] with file == 0) is set to 0. As a result, scan will be
set to 0 for the anon lists, too.
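
The reasoning above can be checked with a small userspace model of the
loop body in the second hunk. This is a sketch under assumptions rather
than the kernel code: scan_target() and its "patched" flag are
hypothetical, plain 64-bit division stands in for div64_u64(), and the
value 32 for SWAP_CLUSTER_MAX is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SWAP_CLUSTER_MAX 32UL	/* assumed value, for illustration only */

/*
 * Model of the per-LRU scan target. "patched" selects whether the
 * extra !swappiness test from the second hunk is part of the condition.
 */
static unsigned long scan_target(unsigned long lru_pages, int priority,
				 bool noswap, unsigned swappiness,
				 bool force_scan, uint64_t fraction,
				 uint64_t denominator, bool patched)
{
	unsigned long scan = lru_pages;

	assert(denominator != 0);
	if (priority || noswap || (patched && !swappiness)) {
		scan >>= priority;
		if (!scan && force_scan)
			scan = SWAP_CLUSTER_MAX;
		/* A fraction of 0 forces the target to 0 here. */
		scan = (unsigned long)(scan * fraction / denominator);
	}
	return scan;
}
```

In this model the anon target is 0 whenever the branch is entered with
an anon fraction of 0. The case the extra test covers is priority == 0
with swap available: without it the branch is skipped and the whole
list size is returned.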

> Am I overlooking something?  Is this correct?
> 
> I mean, it is Friday and my brain is very full...

Have a nice weekend ;)

Regards,
Satoru


* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-02 22:47 ` Rik van Riel
  2012-03-02 23:43   ` Satoru Moriya
@ 2012-03-03  2:29   ` Hillf Danton
  1 sibling, 0 replies; 32+ messages in thread
From: Hillf Danton @ 2012-03-03  2:29 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Satoru Moriya, linux-mm, linux-kernel, lwoodman, jweiner, shaohua.li

On Sat, Mar 3, 2012 at 6:47 AM, Rik van Riel <riel@redhat.com> wrote:
> On 03/02/2012 12:36 PM, Satoru Moriya wrote:
>>
>> Sometimes we'd like to avoid swapping out anonymous memory
>> in particular, avoid swapping out pages of important process or
>> process groups while there is a reasonable amount of pagecache
>> on RAM so that we can satisfy our customers' requirements.
>>
>> OTOH, we can control how aggressive the kernel will swap memory pages
>> with /proc/sys/vm/swappiness for global and
>> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>>
>> But with current reclaim implementation, the kernel may swap out
>> even if we set swappiness==0 and there is pagecache on RAM.
>>
>> This patch changes the behavior with swappiness==0. If we set
>> swappiness==0, the kernel does not swap out completely
>> (for global reclaim until the amount of free pages and filebacked
>> pages in a zone has been reduced to something very very small
>> (nr_free + nr_filebacked < high watermark)).
>>
>> Any comments are welcome.
>>
>> Regards,
>> Satoru Moriya
>>
>> Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
>> ---
>>  mm/vmscan.c |    6 +++---
>>  1 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index c52b235..27dc3e8 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1983,10 +1983,10 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
>>         * proportional to the fraction of recently scanned pages on
>>         * each list that were recently referenced and in active use.
>>         */
>> -       ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
>> +       ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
>>        ap /= reclaim_stat->recent_rotated[0] + 1;
>>
>> -       fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
>> +       fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
>>        fp /= reclaim_stat->recent_rotated[1] + 1;
>>        spin_unlock_irq(&mz->zone->lru_lock);
>
>
> ACK on this bit of the patch.
>
>> @@ -1999,7 +1999,7 @@ out:
>>                unsigned long scan;
>>
>>                scan = zone_nr_lru_pages(mz, lru);
>> -               if (priority || noswap) {
>> +               if (priority || noswap || !vmscan_swappiness(mz, sc)) {
>>                        scan >>= priority;
>>                        if (!scan && force_scan)
>>                                scan = SWAP_CLUSTER_MAX;
>
>
> However, I do not understand why we fail to scale
> the number of pages we want to scan with priority
> if "noswap".
>
> For that matter, surely if we do not want to swap
> out anonymous pages, we WANT to go into this if
> branch, in order to make sure we set "scan" to 0?
>
> scan = div64_u64(scan * fraction[file], denominator);
>
> With your patch and swappiness=0, or no swap space, it
> looks like we do not zero out "scan" and may end up
> scanning anonymous pages.
>
> Am I overlooking something?  Is this correct?
>

An attempt to simplify the complexity a bit :)

Good weekend
-hd

--- a/mm/vmscan.c	Wed Feb  8 20:10:14 2012
+++ b/mm/vmscan.c	Sat Mar  3 10:02:10 2012
@@ -1997,15 +1997,23 @@ static void get_scan_count(struct mem_cg
 out:
 	for_each_evictable_lru(lru) {
 		int file = is_file_lru(lru);
-		unsigned long scan;
+		unsigned long scan = 0;

-		scan = zone_nr_lru_pages(mz, lru);
-		if (priority || noswap) {
-			scan >>= priority;
-			if (!scan && force_scan)
-				scan = SWAP_CLUSTER_MAX;
+		/* First, check noswap */
+		if (noswap && !file)
+			goto set;
+
+		/* Second, apply priority */
+		scan = zone_nr_lru_pages(mz, lru) >> priority;
+
+		/* Third, check force */
+		if (!scan && force_scan)
+			scan = SWAP_CLUSTER_MAX;
+
+		/* Finally, try to avoid div64 */
+		if (scan)
 			scan = div64_u64(scan * fraction[file], denominator);
-		}
+set:
 		nr[lru] = scan;
 	}
 }
--
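
For readers following the control flow, the refactored loop body in the
diff above can be modelled in userspace roughly as follows. This is a
sketch as I read the diff, not the kernel code: scan_target_refactored()
is a hypothetical name, the goto is expressed as an early return, plain
division stands in for div64_u64(), and the value 32 for
SWAP_CLUSTER_MAX is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SWAP_CLUSTER_MAX 32UL	/* assumed value, for illustration only */

static unsigned long scan_target_refactored(unsigned long lru_pages,
					    int priority, bool noswap,
					    bool is_file, bool force_scan,
					    uint64_t fraction,
					    uint64_t denominator)
{
	unsigned long scan;

	assert(denominator != 0);

	/* First, check noswap: anon lists get a target of 0. */
	if (noswap && !is_file)
		return 0;

	/* Second, apply priority. */
	scan = lru_pages >> priority;

	/* Third, check force. */
	if (!scan && force_scan)
		scan = SWAP_CLUSTER_MAX;

	/* Finally, skip the division when there is nothing to scale. */
	if (scan)
		scan = (unsigned long)(scan * fraction / denominator);

	return scan;
}
```

One thing the model makes visible: the fraction is applied whenever scan
is nonzero, including at priority 0, whereas the code being replaced
only applied it inside the "priority || noswap" branch.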


* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-02 17:36 [RFC][PATCH] avoid swapping out with swappiness==0 Satoru Moriya
  2012-03-02 22:47 ` Rik van Riel
@ 2012-03-04  6:57 ` Minchan Kim
  2012-03-05 21:38   ` Satoru Moriya
  2012-03-05 13:49 ` Rik van Riel
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 32+ messages in thread
From: Minchan Kim @ 2012-03-04  6:57 UTC (permalink / raw)
  To: Satoru Moriya
  Cc: linux-mm, linux-kernel, Rik van Riel, lwoodman, jweiner,
	shaohua.li, KOSAKI Motohiro, dle-develop, Seiji Aguchi

Hi Satoru,

On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
> Sometimes we'd like to avoid swapping out anonymous memory
> in particular, avoid swapping out pages of important process or
> process groups while there is a reasonable amount of pagecache
> on RAM so that we can satisfy our customers' requirements.
> 
> OTOH, we can control how aggressive the kernel will swap memory pages
> with /proc/sys/vm/swappiness for global and
> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
> 
> But with current reclaim implementation, the kernel may swap out
> even if we set swappiness==0 and there is pagecache on RAM.
> 
> This patch changes the behavior with swappiness==0. If we set
> swappiness==0, the kernel does not swap out completely
> (for global reclaim until the amount of free pages and filebacked
> pages in a zone has been reduced to something very very small
> (nr_free + nr_filebacked < high watermark)).
> 
> Any comments are welcome.
> 
> Regards,
> Satoru Moriya
> 
> Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>

Acked-by: Minchan Kim <minchan@kernel.org>

I agree with this feature, but the current code is rather ugly in terms
of readability. That's not your fault; it is caused by the 'noswap' flag
that was added to avoid scanning anon pages when priority is 0. You just
used that code. :)

Hillf's version looks like a much cleaner refactoring, so after we merge
your patch, we can tidy it up with Hillf's patch.



* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-02 17:36 [RFC][PATCH] avoid swapping out with swappiness==0 Satoru Moriya
  2012-03-02 22:47 ` Rik van Riel
  2012-03-04  6:57 ` Minchan Kim
@ 2012-03-05 13:49 ` Rik van Riel
  2012-03-05 21:56 ` Johannes Weiner
  2012-05-07 20:11 ` Rik van Riel
  4 siblings, 0 replies; 32+ messages in thread
From: Rik van Riel @ 2012-03-05 13:49 UTC (permalink / raw)
  To: Satoru Moriya
  Cc: linux-mm, linux-kernel, lwoodman, jweiner, shaohua.li,
	KOSAKI Motohiro, dle-develop, Seiji Aguchi

On 03/02/2012 12:36 PM, Satoru Moriya wrote:
> Sometimes we'd like to avoid swapping out anonymous memory
> in particular, avoid swapping out pages of important process or
> process groups while there is a reasonable amount of pagecache
> on RAM so that we can satisfy our customers' requirements.
>
> OTOH, we can control how aggressive the kernel will swap memory pages
> with /proc/sys/vm/swappiness for global and
> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>
> But with current reclaim implementation, the kernel may swap out
> even if we set swappiness==0 and there is pagecache on RAM.
>
> This patch changes the behavior with swappiness==0. If we set
> swappiness==0, the kernel does not swap out completely
> (for global reclaim until the amount of free pages and filebacked
> pages in a zone has been reduced to something very very small
> (nr_free + nr_filebacked < high watermark)).
>
> Any comments are welcome.

My mind is now rested by doing a nice 10 mile hike :)

> Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* RE: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-04  6:57 ` Minchan Kim
@ 2012-03-05 21:38   ` Satoru Moriya
  0 siblings, 0 replies; 32+ messages in thread
From: Satoru Moriya @ 2012-03-05 21:38 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Rik van Riel, lwoodman, jweiner,
	shaohua.li, KOSAKI Motohiro, dle-develop, Seiji Aguchi

Hi Minchan,

Thank you for reviewing.

On 03/04/2012 01:57 AM, Minchan Kim wrote:
> On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
> 
> I agree this feature but current code is rather ugly on readbility.

I agree with you.

> Hillf's version looks to be much clean refactoring so after we merge 
> your patch, we can tidy it up with Hillf's patch.

Thanks. No problem.

Regards,
Satoru


* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-02 17:36 [RFC][PATCH] avoid swapping out with swappiness==0 Satoru Moriya
                   ` (2 preceding siblings ...)
  2012-03-05 13:49 ` Rik van Riel
@ 2012-03-05 21:56 ` Johannes Weiner
  2012-03-07 17:19   ` KOSAKI Motohiro
  2012-05-07 20:11 ` Rik van Riel
  4 siblings, 1 reply; 32+ messages in thread
From: Johannes Weiner @ 2012-03-05 21:56 UTC (permalink / raw)
  To: Satoru Moriya
  Cc: linux-mm, linux-kernel, Rik van Riel, lwoodman, shaohua.li,
	KOSAKI Motohiro, dle-develop, Seiji Aguchi

On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
> Sometimes we'd like to avoid swapping out anonymous memory
> in particular, avoid swapping out pages of important process or
> process groups while there is a reasonable amount of pagecache
> on RAM so that we can satisfy our customers' requirements.
> 
> OTOH, we can control how aggressive the kernel will swap memory pages
> with /proc/sys/vm/swappiness for global and
> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
> 
> But with current reclaim implementation, the kernel may swap out
> even if we set swappiness==0 and there is pagecache on RAM.
> 
> This patch changes the behavior with swappiness==0. If we set
> swappiness==0, the kernel does not swap out completely
> (for global reclaim until the amount of free pages and filebacked
> pages in a zone has been reduced to something very very small
> (nr_free + nr_filebacked < high watermark)).
> 
> Any comments are welcome.

Last time I tried that (getting rid of sc->may_swap, using
!swappiness), it was rejected, as there were users who relied on
swapping very slowly with this setting.

KOSAKI-san, do I remember correctly?  Do you still think it's an
issue?

Personally, I still think it's illogical that !swappiness allows
swapping and would love to see this patch go in.


* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-05 21:56 ` Johannes Weiner
@ 2012-03-07 17:19   ` KOSAKI Motohiro
  2012-03-07 18:18     ` Satoru Moriya
  0 siblings, 1 reply; 32+ messages in thread
From: KOSAKI Motohiro @ 2012-03-07 17:19 UTC (permalink / raw)
  To: jweiner
  Cc: satoru.moriya, linux-mm, linux-kernel, riel, lwoodman,
	shaohua.li, kosaki.motohiro, dle-develop, seiji.aguchi

On 3/5/2012 4:56 PM, Johannes Weiner wrote:
> On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
>> Sometimes we'd like to avoid swapping out anonymous memory
>> in particular, avoid swapping out pages of important process or
>> process groups while there is a reasonable amount of pagecache
>> on RAM so that we can satisfy our customers' requirements.
>>
>> OTOH, we can control how aggressive the kernel will swap memory pages
>> with /proc/sys/vm/swappiness for global and
>> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>>
>> But with current reclaim implementation, the kernel may swap out
>> even if we set swappiness==0 and there is pagecache on RAM.
>>
>> This patch changes the behavior with swappiness==0. If we set
>> swappiness==0, the kernel does not swap out completely
>> (for global reclaim until the amount of free pages and filebacked
>> pages in a zone has been reduced to something very very small
>> (nr_free + nr_filebacked < high watermark)).
>>
>> Any comments are welcome.
> 
> Last time I tried that (getting rid of sc->may_swap, using
> !swappiness), it was rejected it as there were users who relied on
> swapping very slowly with this setting.
> 
> KOSAKI-san, do I remember correctly?  Do you still think it's an
> issue?
>
> Personally, I still think it's illogical that !swappiness allows
> swapping and would love to see this patch go in.

Thank you. Now I remember. Unfortunately, DB folks are still mainly
using RHEL5-generation distros. In that generation, swappiness=0 doesn't
mean disabling swap.

They want "don't swap as long as the kernel has any file cache pages",
but Linux doesn't have such a feature, so they used swappiness to
emulate it. So I think this patch clearly harms userland, because we
don't have an alternative.



* RE: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-07 17:19   ` KOSAKI Motohiro
@ 2012-03-07 18:18     ` Satoru Moriya
  2012-03-30 22:44       ` Satoru Moriya
  2012-04-24  8:20       ` Richard Davies
  0 siblings, 2 replies; 32+ messages in thread
From: Satoru Moriya @ 2012-03-07 18:18 UTC (permalink / raw)
  To: KOSAKI Motohiro, jweiner
  Cc: linux-mm, linux-kernel, riel, lwoodman, shaohua.li, dle-develop,
	Seiji Aguchi

On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
> On 3/5/2012 4:56 PM, Johannes Weiner wrote:
>> On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
>>>
>>> This patch changes the behavior with swappiness==0. If we set 
>>> swappiness==0, the kernel does not swap out completely (for global 
>>> reclaim until the amount of free pages and filebacked pages in a 
>>> zone has been reduced to something very very small (nr_free + 
>>> nr_filebacked < high watermark)).
>>>
>>> Any comments are welcome.
>>
>> Last time I tried that (getting rid of sc->may_swap, using 
>> !swappiness), it was rejected it as there were users who relied on 
>> swapping very slowly with this setting.
>>
>> KOSAKI-san, do I remember correctly?  Do you still think it's an 
>> issue?
>>
>> Personally, I still think it's illogical that !swappiness allows 
>> swapping and would love to see this patch go in.
> 
> Thank you. I brought back to memory it. Unfortunately DB folks are 
> still mainly using RHEL5 generation distros. At that time, swapiness=0 
> doesn't mean disabling swap.
> 
> They want, "don't swap as far as kernel has any file cache page". but 
> linux don't have such feature. then they used swappiness for emulate 
> it. So, I think this patch clearly make userland harm. Because of, we 
> don't have an alternative way.

If they expect the behavior "don't swap as long as the kernel
has any file cache pages", this patch definitely helps them,
because if we set swappiness==0, the kernel does not swap out
*until* nr_free + nr_filebacked < high watermark in the zone.
That is, the kernel begins to swap out only when nr_free + nr_filebacked
becomes less than the high watermark.

But, yes, this patch does change the behavior with
swappiness==0, and so it may harm userland.

How about introducing a new value, e.g. -1, to disable swapping while
maintaining compatibility?

Regards,
Satoru


* RE: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-07 18:18     ` Satoru Moriya
@ 2012-03-30 22:44       ` Satoru Moriya
  2012-04-02 17:10         ` KOSAKI Motohiro
  2012-04-24  8:20       ` Richard Davies
  1 sibling, 1 reply; 32+ messages in thread
From: Satoru Moriya @ 2012-03-30 22:44 UTC (permalink / raw)
  To: Satoru Moriya, KOSAKI Motohiro, jweiner
  Cc: linux-mm, linux-kernel, riel, lwoodman, shaohua.li, dle-develop,
	Seiji Aguchi

Hello Kosaki-san,

On 03/07/2012 01:18 PM, Satoru Moriya wrote:
> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>> Thank you. I brought back to memory it. Unfortunately DB folks are 
>> still mainly using RHEL5 generation distros. At that time, 
>> swapiness=0 doesn't mean disabling swap.
>>
>> They want, "don't swap as far as kernel has any file cache page". but 
>> linux don't have such feature. then they used swappiness for emulate 
>> it. So, I think this patch clearly make userland harm. Because of, we 
>> don't have an alternative way.

As I wrote in the previous mail (see below), with this patch
the kernel begins to swap out when the sum of free pages and
filebacked pages drops below watermark_high.

So the kernel reclaims pages as follows.

nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
nr_free + nr_filebacked <  watermark_high: reclaim only anonymous pages

Do you think this behavior satisfies DB users' requirement?


> If they expect the behavior that "don't swap as far as kernel has any 
> file cache page", this patch definitely helps them because if we set 
> swappiness==0, kernel does not swap out
> *until* nr_free + nr_filebacked < high watermark in the zone.
> It means kernel begins to swap out when nr_free + nr_filebacked 
> becomes less than high watermark.
> 
> But, yes, this patch actually changes the behavior with swappiness==0 
> and so it may make userland harm.
> 
> How about introducing new value e.g -1 to avoid swap and maintain 
> compatibility?

Regards,
Satoru


* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-30 22:44       ` Satoru Moriya
@ 2012-04-02 17:10         ` KOSAKI Motohiro
  2012-04-03 11:25           ` Jerome Marchand
  0 siblings, 1 reply; 32+ messages in thread
From: KOSAKI Motohiro @ 2012-04-02 17:10 UTC (permalink / raw)
  To: Satoru Moriya
  Cc: jweiner, linux-mm, linux-kernel, riel, lwoodman, shaohua.li,
	dle-develop, Seiji Aguchi

2012/3/30 Satoru Moriya <satoru.moriya@hds.com>:
> Hello Kosaki-san,
>
> On 03/07/2012 01:18 PM, Satoru Moriya wrote:
>> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>>> Thank you. I brought back to memory it. Unfortunately DB folks are
>>> still mainly using RHEL5 generation distros. At that time,
>>> swapiness=0 doesn't mean disabling swap.
>>>
>>> They want, "don't swap as far as kernel has any file cache page". but
>>> linux don't have such feature. then they used swappiness for emulate
>>> it. So, I think this patch clearly make userland harm. Because of, we
>>> don't have an alternative way.
>
> As I wrote in the previous mail(see below), with this patch
> the kernel begins to swap out when the sum of free pages and
> filebacked pages reduces less than watermark_high.
>
> So the kernel reclaims pages like following.
>
> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
> nr_free + nr_filebacked <  watermark_high: reclaim only anonymous pages

How?


* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-02 17:10         ` KOSAKI Motohiro
@ 2012-04-03 11:25           ` Jerome Marchand
  2012-04-03 15:15             ` Satoru Moriya
  2012-04-04 17:38             ` KOSAKI Motohiro
  0 siblings, 2 replies; 32+ messages in thread
From: Jerome Marchand @ 2012-04-03 11:25 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Satoru Moriya, jweiner, linux-mm, linux-kernel, riel, lwoodman,
	shaohua.li, dle-develop, Seiji Aguchi

On 04/02/2012 07:10 PM, KOSAKI Motohiro wrote:
> 2012/3/30 Satoru Moriya <satoru.moriya@hds.com>:
>> Hello Kosaki-san,
>>
>> On 03/07/2012 01:18 PM, Satoru Moriya wrote:
>>> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>>>> Thank you. I brought back to memory it. Unfortunately DB folks are
>>>> still mainly using RHEL5 generation distros. At that time,
>>>> swapiness=0 doesn't mean disabling swap.
>>>>
>>>> They want, "don't swap as far as kernel has any file cache page". but
>>>> linux don't have such feature. then they used swappiness for emulate
>>>> it. So, I think this patch clearly make userland harm. Because of, we
>>>> don't have an alternative way.
>>
>> As I wrote in the previous mail(see below), with this patch
>> the kernel begins to swap out when the sum of free pages and
>> filebacked pages reduces less than watermark_high.

Actually, this is true only for global reclaim. Reclaim in a cgroup can
fail in this case.

>>
>> So the kernel reclaims pages like following.
>>
>> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
>> nr_free + nr_filebacked <  watermark_high: reclaim only anonymous pages
> 
> How?

get_scan_count() checks that case explicitly:

	if (global_reclaim(sc)) {
		free  = zone_page_state(mz->zone, NR_FREE_PAGES);
		/* If we have very few page cache pages,
		   force-scan anon pages. */
		if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
			fraction[0] = 1;
			fraction[1] = 0;
			denominator = 1;
			goto out;
		}
	}
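
Reduced to its decision, and assuming swappiness==0 with the patch
applied so that the anon weight is 0 above the threshold, the check can
be modelled as below. This is a hedged userspace sketch: pick_target()
and enum reclaim_target are hypothetical names, and the page counts are
passed in directly instead of being read from the zone.

```c
#include <assert.h>
#include <limits.h>

enum reclaim_target { RECLAIM_FILE_ONLY, RECLAIM_ANON_ONLY };

/*
 * Global reclaim only: once file + free pages drop to the high
 * watermark or below, anon pages are force-scanned
 * (fraction[0] = 1, fraction[1] = 0 in the snippet above).
 */
static enum reclaim_target pick_target(unsigned long nr_file,
				       unsigned long nr_free,
				       unsigned long high_wmark)
{
	assert(nr_file <= ULONG_MAX - nr_free);	/* guard the sum */
	if (nr_file + nr_free <= high_wmark)
		return RECLAIM_ANON_ONLY;
	return RECLAIM_FILE_ONLY;
}
```

Note that the comparison is "<=", so the switch to anon reclaim happens
when the sum is equal to the high watermark as well.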

Regards,
Jerome




* RE: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-03 11:25           ` Jerome Marchand
@ 2012-04-03 15:15             ` Satoru Moriya
  2012-04-04 17:38             ` KOSAKI Motohiro
  1 sibling, 0 replies; 32+ messages in thread
From: Satoru Moriya @ 2012-04-03 15:15 UTC (permalink / raw)
  To: Jerome Marchand, KOSAKI Motohiro
  Cc: jweiner, linux-mm, linux-kernel, riel, lwoodman, dle-develop,
	Seiji Aguchi

On 04/03/2012 07:25 AM, Jerome Marchand wrote:
> On 04/02/2012 07:10 PM, KOSAKI Motohiro wrote:
>> 2012/3/30 Satoru Moriya <satoru.moriya@hds.com>:
>>> Hello Kosaki-san,
>>>
>>> On 03/07/2012 01:18 PM, Satoru Moriya wrote:
>>>> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>>>>> Thank you. I brought back to memory it. Unfortunately DB folks are
>>>>> still mainly using RHEL5 generation distros. At that time,
>>>>> swapiness=0 doesn't mean disabling swap.
>>>>>
>>>>> They want, "don't swap as far as kernel has any file cache page". but
>>>>> linux don't have such feature. then they used swappiness for emulate
>>>>> it. So, I think this patch clearly make userland harm. Because of, we
>>>>> don't have an alternative way.
>>>
>>> As I wrote in the previous mail(see below), with this patch
>>> the kernel begins to swap out when the sum of free pages and
>>> filebacked pages reduces less than watermark_high.
>
> Actually, this is true only for global reclaims. Reclaims in cgroup can fail
> in this case.

Right.
As far as the RHEL5 users mentioned above are concerned, I believe they
don't care about the cgroup case.

>>>
>>> So the kernel reclaims pages like following.
>>>
>>> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
>>> nr_free + nr_filebacked <  watermark_high: reclaim only anonymous pages

I made a tiny mistake.
The correct version is as follows ;p

nr_free + nr_filebacked >  watermark_high: reclaim only filebacked pages
nr_free + nr_filebacked <= watermark_high: reclaim only anonymous pages

>> How?
>
> get_scan_count() checks that case explicitly:
>
>        if (global_reclaim(sc)) {
>                free  = zone_page_state(mz->zone, NR_FREE_PAGES);
>                /* If we have very few page cache pages,
>                   force-scan anon pages. */
>                if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
>                        fraction[0] = 1;
>                        fraction[1] = 0;
>                        denominator = 1;
>                        goto out;
>                }
>        }

Regards,
Satoru


* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-03 11:25           ` Jerome Marchand
  2012-04-03 15:15             ` Satoru Moriya
@ 2012-04-04 17:38             ` KOSAKI Motohiro
  2012-04-21  0:21               ` Satoru Moriya
  1 sibling, 1 reply; 32+ messages in thread
From: KOSAKI Motohiro @ 2012-04-04 17:38 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: KOSAKI Motohiro, Satoru Moriya, jweiner, linux-mm, linux-kernel,
	riel, lwoodman, shaohua.li, dle-develop, Seiji Aguchi,
	kosaki.motohiro

(4/3/12 4:25 AM), Jerome Marchand wrote:
> On 04/02/2012 07:10 PM, KOSAKI Motohiro wrote:
>> 2012/3/30 Satoru Moriya <satoru.moriya@hds.com>:
>>> Hello Kosaki-san,
>>>
>>> On 03/07/2012 01:18 PM, Satoru Moriya wrote:
>>>> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>>>>> Thank you. I remember it now. Unfortunately, DB folks are
>>>>> still mainly using RHEL5-generation distros. At that time,
>>>>> swappiness=0 didn't mean disabling swap.
>>>>>
>>>>> They want "don't swap as long as the kernel has any file cache pages", but
>>>>> Linux doesn't have such a feature, so they used swappiness to emulate
>>>>> it. So I think this patch clearly harms userland, because we
>>>>> don't have an alternative.
>>>
>>> As I wrote in the previous mail (see below), with this patch
>>> the kernel begins to swap out when the sum of free pages and
>>> filebacked pages falls below watermark_high.
>
> Actually, this is true only for global reclaims. Reclaims in cgroup can fail
> in this case.
>
>>>
>>> So the kernel reclaims pages as follows.
>>>
>>> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
>>> nr_free + nr_filebacked <  watermark_high: reclaim only anonymous pages
>>
>> How?
>
> get_scan_count() checks that case explicitly:
>
> 	if (global_reclaim(sc)) {
> 		free  = zone_page_state(mz->zone, NR_FREE_PAGES);
> 		/* If we have very few page cache pages,
> 		   force-scan anon pages. */
> 		if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
> 			fraction[0] = 1;
> 			fraction[1] = 0;
> 			denominator = 1;
> 			goto out;
> 		}
> 	}

Eek. This is silly. Nowadays many people enable THP, and it increases the zone watermarks,
so the high watermark is not a good threshold anymore.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-04 17:38             ` KOSAKI Motohiro
@ 2012-04-21  0:21               ` Satoru Moriya
  2012-05-11 21:11                 ` Satoru Moriya
  0 siblings, 1 reply; 32+ messages in thread
From: Satoru Moriya @ 2012-04-21  0:21 UTC (permalink / raw)
  To: KOSAKI Motohiro, Jerome Marchand
  Cc: KOSAKI Motohiro, jweiner, linux-mm, linux-kernel, riel, lwoodman,
	shaohua.li, dle-develop, Seiji Aguchi

Hi,

Sorry for my late reply.

On 04/04/2012 01:38 PM, KOSAKI Motohiro wrote:
> (4/3/12 4:25 AM), Jerome Marchand wrote:
>> On 04/02/2012 07:10 PM, KOSAKI Motohiro wrote:
>>> 2012/3/30 Satoru Moriya <satoru.moriya@hds.com>:
>>>> So the kernel reclaims pages as follows.
>>>>
>>>> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
>>>> nr_free + nr_filebacked <  watermark_high: reclaim only anonymous pages
>>>
>>> How?
>>
>> get_scan_count() checks that case explicitly:
>>
>>     if (global_reclaim(sc)) {
>>         free  = zone_page_state(mz->zone, NR_FREE_PAGES);
>>         /* If we have very few page cache pages,
>>            force-scan anon pages. */
>>         if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
>>             fraction[0] = 1;
>>             fraction[1] = 0;
>>             denominator = 1;
>>             goto out;
>>         }
>>     }
> 
> Eek. This is silly. Nowadays many people enable THP, and it increases the zone watermarks,
> so the high watermark is not a good threshold anymore.

Ah yes, it is not so small now.
On a 4GB server, min_free_kbytes is 8113 without THP but
67584 with THP.

How about using the low watermark or the min watermark?
Are they still big?

...or should we use some other value?
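
To put rough numbers on this question, here is a sketch under stated
assumptions. As far as I recall from mm/page_alloc.c and mm/huge_memory.c
of this era (please check against the actual source), the low and high
watermarks are derived from the min watermark as min + min/4 and
min + min/2, and THP raises min_free_kbytes to a value based on pageblock
size and the number of populated zones, capped at 5% of lowmem. All
function and parameter names are hypothetical. With 3 zones, 512-page
pageblocks, 3 per-cpu migrate types and 4kB pages, the sketch reproduces
the 67584 figure.

```c
/* Low watermark as recalled: min plus a quarter of min. */
unsigned long wmark_low_kbytes(unsigned long min_kbytes)
{
	return min_kbytes + (min_kbytes >> 2);
}

/* High watermark as recalled: min plus half of min. */
unsigned long wmark_high_kbytes(unsigned long min_kbytes)
{
	return min_kbytes + (min_kbytes >> 1);
}

/*
 * Recalled shape of the THP recommendation for min_free_kbytes:
 * two pageblocks per zone, plus pageblocks for the fallback lists,
 * never more than 5% of lowmem. Every parameter is an assumption
 * supplied by the caller, not a value read from a real system.
 */
unsigned long thp_recommended_min_kbytes(unsigned long nr_zones,
					 unsigned long pageblock_pages,
					 unsigned long pcp_types,
					 unsigned long page_kbytes,
					 unsigned long lowmem_pages)
{
	unsigned long pages = pageblock_pages * nr_zones * 2;
	unsigned long cap = lowmem_pages / 20;

	pages += pageblock_pages * nr_zones * pcp_types * pcp_types;
	if (pages > cap)
		pages = cap;

	return pages * page_kbytes;
}
```

Under these assumptions, a min of 8113 kB gives low = 10141 kB and
high = 12169 kB, while a min of 67584 kB gives low = 84480 kB and
high = 101376 kB. So only the min watermark is substantially smaller
than the high one; the low watermark scales up with THP almost as much.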

Regards,
Satoru

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-07 18:18     ` Satoru Moriya
  2012-03-30 22:44       ` Satoru Moriya
@ 2012-04-24  8:20       ` Richard Davies
  2012-04-24 22:14         ` Satoru Moriya
  2012-04-26 14:50         ` Christoph Lameter
  1 sibling, 2 replies; 32+ messages in thread
From: Richard Davies @ 2012-04-24  8:20 UTC (permalink / raw)
  To: Satoru Moriya
  Cc: KOSAKI Motohiro, Jerome Marchand, KOSAKI Motohiro, jweiner,
	linux-mm, linux-kernel, riel, lwoodman, shaohua.li, dle-develop,
	Seiji Aguchi, Minchan Kim

On 03/07/2012 18:18 PM, Satoru Moriya wrote:
> On 03/07/2012 12:19 PM, KOSAKI Motohiro wrote:
>> On 3/5/2012 4:56 PM, Johannes Weiner wrote:
>>> On Fri, Mar 02, 2012 at 12:36:40PM -0500, Satoru Moriya wrote:
>>>>
>>>> This patch changes the behavior with swappiness==0. If we set 
>>>> swappiness==0, the kernel does not swap out completely (for global 
>>>> reclaim until the amount of free pages and filebacked pages in a 
>>>> zone has been reduced to something very very small (nr_free + 
>>>> nr_filebacked < high watermark)).
>>>>
>>>> Any comments are welcome.
>>>
>>> Last time I tried that (getting rid of sc->may_swap, using
>>> !swappiness), it was rejected, as there were users who relied on
>>> swapping very slowly with this setting.
>>>
>>> KOSAKI-san, do I remember correctly?  Do you still think it's an 
>>> issue?
>>>
>>> Personally, I still think it's illogical that !swappiness allows 
>>> swapping and would love to see this patch go in.
>> 
>> Thank you. I remember it now. Unfortunately, DB folks are
>> still mainly using RHEL5-generation distros. At that time, swappiness=0
>> didn't mean disabling swap.
>>
>> They want "don't swap as long as the kernel has any file cache pages", but
>> Linux doesn't have such a feature, so they used swappiness to emulate
>> it. So I think this patch clearly harms userland, because we
>> don't have an alternative.
>
> If they expect the behavior "don't swap as long as the kernel
> has any file cache pages", this patch definitely helps them,
> because if we set swappiness==0, the kernel does not swap out
> *until* nr_free + nr_filebacked < high watermark in the zone.
> It means the kernel begins to swap out when nr_free + nr_filebacked
> becomes less than the high watermark.
>
> But, yes, this patch actually changes the behavior with
> swappiness==0, and so it may harm userland.
>
> How about introducing a new value, e.g. -1, to avoid swap and
> maintain compatibility?

I have run into problems with heavy swapping with swappiness==0 and was
pointed to this thread ( http://marc.info/?l=linux-mm&m=133522782307215 )

I strongly believe that Linux should have a way to turn off swapping unless
absolutely necessary. This means that users like us can run with swap
present for emergency use, rather than having to disable it because of the
side effects.

Personally, I feel that swappiness==0 should have this (intuitive) meaning,
and that people running RHEL5 are extremely unlikely to run 3.5 kernels(!)

However, swappiness==-1 or some other hack is definitely better than no
patch.

Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-24  8:20       ` Richard Davies
@ 2012-04-24 22:14         ` Satoru Moriya
  2012-04-26 14:26           ` Richard Davies
  2012-04-26 14:50         ` Christoph Lameter
  1 sibling, 1 reply; 32+ messages in thread
From: Satoru Moriya @ 2012-04-24 22:14 UTC (permalink / raw)
  To: Richard Davies
  Cc: KOSAKI Motohiro, Jerome Marchand, KOSAKI Motohiro, jweiner,
	linux-mm, linux-kernel, riel, lwoodman, shaohua.li, dle-develop,
	Seiji Aguchi, Minchan Kim

On 04/24/2012 04:20 AM, Richard Davies wrote:
> 
> I have run into problems with heavy swapping with swappiness==0 and 
> was pointed to this thread ( 
> http://marc.info/?l=linux-mm&m=133522782307215 )

Did you test this patch with your workload?
If yes, how did it come out?

> I strongly believe that Linux should have a way to turn off swapping 
> unless absolutely necessary. This means that users like us can run 
> with swap present for emergency use, rather than having to disable it 
> because of the side effects.

Agreed. That is why I proposed the patch.

> Personally, I feel that swappiness==0 should have this (intuitive) 
> meaning, and that people running RHEL5 are extremely unlikely to run 
> 3.5 kernels(!)
> 
> However, swappiness==-1 or some other hack is definitely better than 
> no patch.


Regards,
Satoru

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-24 22:14         ` Satoru Moriya
@ 2012-04-26 14:26           ` Richard Davies
  2012-04-26 15:41             ` KOSAKI Motohiro
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Davies @ 2012-04-26 14:26 UTC (permalink / raw)
  To: Satoru Moriya
  Cc: KOSAKI Motohiro, Jerome Marchand, KOSAKI Motohiro, jweiner,
	linux-mm, linux-kernel, riel, lwoodman, shaohua.li, dle-develop,
	Seiji Aguchi, Minchan Kim

Satoru Moriya wrote:
> > I have run into problems with heavy swapping with swappiness==0 and 
> > was pointed to this thread ( 
> > http://marc.info/?l=linux-mm&m=133522782307215 )
> 
> Did you test this patch with your workload?

I haven't yet tested this patch. It takes a long time since these are
production machines, and the bug itself takes several weeks of production
use to really show up.

Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
http://marc.info/?l=linux-mm&m=133536506926326

My intention is to reboot half of our machines into plain 3.4 once it is
out, and half onto 3.4 + your patch.

Then we can compare behaviour.


Will your patch apply cleanly on 3.4?

Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-24  8:20       ` Richard Davies
  2012-04-24 22:14         ` Satoru Moriya
@ 2012-04-26 14:50         ` Christoph Lameter
  2012-04-26 15:37           ` KOSAKI Motohiro
  2012-04-27 13:55           ` Rik van Riel
  1 sibling, 2 replies; 32+ messages in thread
From: Christoph Lameter @ 2012-04-26 14:50 UTC (permalink / raw)
  To: Richard Davies
  Cc: Satoru Moriya, KOSAKI Motohiro, Jerome Marchand, KOSAKI Motohiro,
	jweiner, linux-mm, linux-kernel, riel, lwoodman, shaohua.li,
	dle-develop, Seiji Aguchi, Minchan Kim

On Tue, 24 Apr 2012, Richard Davies wrote:

> I strongly believe that Linux should have a way to turn off swapping unless
> absolutely necessary. This means that users like us can run with swap
> present for emergency use, rather than having to disable it because of the
> side effects.

Agree. And this operation mode should be the default behavior given that
swapping is a very slow and tedious process these days.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-26 14:50         ` Christoph Lameter
@ 2012-04-26 15:37           ` KOSAKI Motohiro
  2012-04-26 16:08             ` Richard Davies
  2012-04-26 18:20             ` Christoph Lameter
  2012-04-27 13:55           ` Rik van Riel
  1 sibling, 2 replies; 32+ messages in thread
From: KOSAKI Motohiro @ 2012-04-26 15:37 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Richard Davies, Satoru Moriya, KOSAKI Motohiro, Jerome Marchand,
	KOSAKI Motohiro, jweiner, linux-mm, linux-kernel, riel, lwoodman,
	shaohua.li, dle-develop, Seiji Aguchi, Minchan Kim

(4/26/12 10:50 AM), Christoph Lameter wrote:
> On Tue, 24 Apr 2012, Richard Davies wrote:
>
>> I strongly believe that Linux should have a way to turn off swapping unless
>> absolutely necessary. This means that users like us can run with swap
>> present for emergency use, rather than having to disable it because of the
>> side effects.
>
> Agree. And this operation mode should be the default behavior given that
> swapping is a very slow and tedious process these days.

Even though the current patch is not optimal, I don't disagree with this opinion. Can
you please explain your use case? Why don't you use swapoff?

Off topic: I hope Linux will aim for good clustered swap I/O in the future. Especially
when using THP, 4k-sized I/O is not really good.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-26 14:26           ` Richard Davies
@ 2012-04-26 15:41             ` KOSAKI Motohiro
  2012-05-07 20:09               ` Rik van Riel
  0 siblings, 1 reply; 32+ messages in thread
From: KOSAKI Motohiro @ 2012-04-26 15:41 UTC (permalink / raw)
  To: Richard Davies
  Cc: Satoru Moriya, Jerome Marchand, jweiner, linux-mm, linux-kernel,
	riel, lwoodman, shaohua.li, dle-develop, Seiji Aguchi,
	Minchan Kim

On Thu, Apr 26, 2012 at 10:26 AM, Richard Davies
<richard.davies@elastichosts.com> wrote:
> Satoru Moriya wrote:
>> > I have run into problems with heavy swapping with swappiness==0 and
>> > was pointed to this thread (
>> > http://marc.info/?l=linux-mm&m=133522782307215 )
>>
>> Did you test this patch with your workload?
>
> I haven't yet tested this patch. It takes a long time since these are
> production machines, and the bug itself takes several weeks of production
> use to really show up.
>
> Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
> http://marc.info/?l=linux-mm&m=133536506926326
>
> My intention is to reboot half of our machines into plain 3.4 once it is
> out, and half onto 3.4 + your patch.
>
> Then we can compare behaviour.
>
> Will your patch apply cleanly on 3.4?

Note: this patch doesn't solve your issue. What this patch does is,
where very little swap I/O occurs, reduce it to zero. But you said
you are seeing eager swap I/O. As Dave already pointed out, your
machine has a buffer head issue.

So, this thread is pointless.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-26 15:37           ` KOSAKI Motohiro
@ 2012-04-26 16:08             ` Richard Davies
  2012-04-26 18:20             ` Christoph Lameter
  1 sibling, 0 replies; 32+ messages in thread
From: Richard Davies @ 2012-04-26 16:08 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Satoru Moriya, Jerome Marchand,
	KOSAKI Motohiro, jweiner, linux-mm, linux-kernel, riel, lwoodman,
	shaohua.li, dle-develop, Seiji Aguchi, Minchan Kim

KOSAKI Motohiro wrote:
> Christoph Lameter wrote:
> > Richard Davies wrote:
> >
> > > I strongly believe that Linux should have a way to turn off swapping unless
> > > absolutely necessary. This means that users like us can run with swap
> > > present for emergency use, rather than having to disable it because of the
> > > side effects.
> >
> > Agree. And this operation mode should be the default behavior given that
> > swapping is a very slow and tedious process these days.
> 
> Even though the current patch is not optimal, I don't disagree with this opinion. Can
> you please explain your use case? Why don't you use swapoff?

My use case is that I have large (64 or 128GB RAM) qemu-kvm virtualization
hosts, running many (20-50) VMs.

Typically the total memory in use is less than physical memory. In these
cases I would like the virtualization host to run without any swapping. I
have set swappiness==0, but in practice I get big load spikes from swapping.
See http://marc.info/?l=linux-mm&m=133517452117581

I don't want to run swapoff, because sometimes I will need to provision
slightly more VMs than physical memory, and in these cases I would rather
that the system runs with a little swap in use rather than the OOM killer
occurring.

Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-26 15:37           ` KOSAKI Motohiro
  2012-04-26 16:08             ` Richard Davies
@ 2012-04-26 18:20             ` Christoph Lameter
  1 sibling, 0 replies; 32+ messages in thread
From: Christoph Lameter @ 2012-04-26 18:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Richard Davies, Satoru Moriya, Jerome Marchand, KOSAKI Motohiro,
	jweiner, linux-mm, linux-kernel, riel, lwoodman, shaohua.li,
	dle-develop, Seiji Aguchi, Minchan Kim

On Thu, 26 Apr 2012, KOSAKI Motohiro wrote:

> (4/26/12 10:50 AM), Christoph Lameter wrote:
> > On Tue, 24 Apr 2012, Richard Davies wrote:
> >
> > > I strongly believe that Linux should have a way to turn off swapping unless
> > > absolutely necessary. This means that users like us can run with swap
> > > present for emergency use, rather than having to disable it because of the
> > > side effects.
> >
> > Agree. And this operation mode should be the default behavior given that
> > swapping is a very slow and tedious process these days.
>
> Even though the current patch is not optimal, I don't disagree with this opinion. Can
> you please explain your use case? Why don't you use swapoff?

Because I do not want to have systems go OOM. In an emergency, let's use
swap (and maybe generate some sort of alert if that happens).

> Off topic: I hope Linux will aim for good clustered swap I/O in the future.
> Especially when using THP, 4k-sized I/O is not really good.

Swap to regular disks is going to be an ever greater problem since
the access speed of rotational media has not changed much whereas the
processing performance of the cpu has increased significantly. There is an
ever increasing gap in speed.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-26 14:50         ` Christoph Lameter
  2012-04-26 15:37           ` KOSAKI Motohiro
@ 2012-04-27 13:55           ` Rik van Riel
  1 sibling, 0 replies; 32+ messages in thread
From: Rik van Riel @ 2012-04-27 13:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Richard Davies, Satoru Moriya, KOSAKI Motohiro, Jerome Marchand,
	KOSAKI Motohiro, jweiner, linux-mm, linux-kernel, lwoodman,
	shaohua.li, dle-develop, Seiji Aguchi, Minchan Kim

On 04/26/2012 10:50 AM, Christoph Lameter wrote:
> On Tue, 24 Apr 2012, Richard Davies wrote:
>
>> I strongly believe that Linux should have a way to turn off swapping unless
>> absolutely necessary. This means that users like us can run with swap
>> present for emergency use, rather than having to disable it because of the
>> side effects.
>
> Agree. And this operation mode should be the default behavior given that
> swapping is a very slow and tedious process these days.

I believe that is a bad idea.

With cgroups, the situation is a whole lot less obvious than with
the simple test done in this patch.  Let's see how the 3.4 code
behaves, and if we need any additional changes to reduce swapping
and step up reclaiming of page cache...

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-26 15:41             ` KOSAKI Motohiro
@ 2012-05-07 20:09               ` Rik van Riel
  2012-05-08  0:05                 ` Minchan Kim
  2012-05-21  7:12                 ` Richard Davies
  0 siblings, 2 replies; 32+ messages in thread
From: Rik van Riel @ 2012-05-07 20:09 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Richard Davies, Satoru Moriya, Jerome Marchand, jweiner,
	linux-mm, linux-kernel, lwoodman, shaohua.li, dle-develop,
	Seiji Aguchi, Minchan Kim

On 04/26/2012 11:41 AM, KOSAKI Motohiro wrote:
> On Thu, Apr 26, 2012 at 10:26 AM, Richard Davies
> <richard.davies@elastichosts.com>  wrote:
>> Satoru Moriya wrote:
>>>> I have run into problems with heavy swapping with swappiness==0 and
>>>> was pointed to this thread (
>>>> http://marc.info/?l=linux-mm&m=133522782307215 )
>>>
>>> Did you test this patch with your workload?
>>
>> I haven't yet tested this patch. It takes a long time since these are
>> production machines, and the bug itself takes several weeks of production
>> use to really show up.
>>
>> Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
>> http://marc.info/?l=linux-mm&m=133536506926326
>>
>> My intention is to reboot half of our machines into plain 3.4 once it is
>> out, and half onto 3.4 + your patch.
>>
>> Then we can compare behaviour.
>>
>> Will your patch apply cleanly on 3.4?
>
> Note: this patch doesn't solve your issue. What this patch does is,
> where very little swap I/O occurs, reduce it to zero. But you said
> you are seeing eager swap I/O. As Dave already pointed out, your
> machine has a buffer head issue.
>
> So, this thread is pointless.

Running KVM guests directly off block devices results in a lot
of buffer cache.

I suspect that this patch will in fact fix Richard's issue.

The patch is small, fairly simple and looks like it will fix
people's problems.  It also makes swappiness=0 behave the way
most people seem to imagine it would work.

If it works for a few people (test results), I believe we
might as well merge it.

Yes, for cgroups we may need additional logic, but we can
sort that out as we go along.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-03-02 17:36 [RFC][PATCH] avoid swapping out with swappiness==0 Satoru Moriya
                   ` (3 preceding siblings ...)
  2012-03-05 21:56 ` Johannes Weiner
@ 2012-05-07 20:11 ` Rik van Riel
  4 siblings, 0 replies; 32+ messages in thread
From: Rik van Riel @ 2012-05-07 20:11 UTC (permalink / raw)
  To: Satoru Moriya
  Cc: linux-mm, linux-kernel, lwoodman, jweiner, shaohua.li,
	KOSAKI Motohiro, dle-develop, Seiji Aguchi

On 03/02/2012 12:36 PM, Satoru Moriya wrote:
> Sometimes we'd like to avoid swapping out anonymous memory
> in particular, avoid swapping out pages of important process or
> process groups while there is a reasonable amount of pagecache
> on RAM so that we can satisfy our customers' requirements.
>
> OTOH, we can control how aggressive the kernel will swap memory pages
> with /proc/sys/vm/swappiness for global and
> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>
> But with current reclaim implementation, the kernel may swap out
> even if we set swappiness==0 and there is pagecache on RAM.
>
> This patch changes the behavior with swappiness==0. If we set
> swappiness==0, the kernel does not swap out completely
> (for global reclaim until the amount of free pages and filebacked
> pages in a zone has been reduced to something very very small
> (nr_free + nr_filebacked < high watermark)).
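
The effect described in the quoted changelog can be illustrated with the
anon half of the scan-balance calculation from the patch hunk. This is a
minimal sketch, not the kernel function: struct anon_stat, anon_pressure()
and the patched flag are hypothetical names for illustration, and anon_prio
is assumed to equal the swappiness value. Only the two "ap" statements
mirror the diff.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the anon counters of the reclaim statistics. */
struct anon_stat {
	unsigned long long recent_scanned;
	unsigned long long recent_rotated;
};

/*
 * Relative scan pressure on the anonymous LRU lists.
 * patched == false models the old "(anon_prio + 1)" form,
 * patched == true models the proposed "anon_prio" form.
 */
unsigned long long anon_pressure(unsigned int swappiness,
				 const struct anon_stat *st,
				 bool patched)
{
	unsigned long long anon_prio = swappiness; /* assumption, see above */
	unsigned long long ap;

	if (!patched)
		anon_prio += 1;

	ap = anon_prio * (st->recent_scanned + 1);
	ap /= st->recent_rotated + 1;

	return ap;
}
```

With swappiness==0 the old form still yields a small non-zero pressure, so
some anonymous pages get scanned, while the patched form yields exactly 0
regardless of the scanned/rotated history.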

> Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-05-07 20:09               ` Rik van Riel
@ 2012-05-08  0:05                 ` Minchan Kim
  2012-05-21  7:12                 ` Richard Davies
  1 sibling, 0 replies; 32+ messages in thread
From: Minchan Kim @ 2012-05-08  0:05 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KOSAKI Motohiro, Richard Davies, Satoru Moriya, Jerome Marchand,
	jweiner, linux-mm, linux-kernel, lwoodman, shaohua.li,
	dle-develop, Seiji Aguchi, Minchan Kim

On 05/08/2012 05:09 AM, Rik van Riel wrote:

> On 04/26/2012 11:41 AM, KOSAKI Motohiro wrote:
>> On Thu, Apr 26, 2012 at 10:26 AM, Richard Davies
>> <richard.davies@elastichosts.com>  wrote:
>>> Satoru Moriya wrote:
>>>>> I have run into problems with heavy swapping with swappiness==0 and
>>>>> was pointed to this thread (
>>>>> http://marc.info/?l=linux-mm&m=133522782307215 )
>>>>
>>>> Did you test this patch with your workload?
>>>
>>> I haven't yet tested this patch. It takes a long time since these are
>>> production machines, and the bug itself takes several weeks of production
>>> use to really show up.
>>>
>>> Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
>>> http://marc.info/?l=linux-mm&m=133536506926326
>>>
>>> My intention is to reboot half of our machines into plain 3.4 once it is
>>> out, and half onto 3.4 + your patch.
>>>
>>> Then we can compare behaviour.
>>>
>>> Will your patch apply cleanly on 3.4?
>>
>> Note: this patch doesn't solve your issue. What this patch does is,
>> where very little swap I/O occurs, reduce it to zero. But you said
>> you are seeing eager swap I/O. As Dave already pointed out, your
>> machine has a buffer head issue.
>>
>> So, this thread is pointless.
> 
> Running KVM guests directly off block devices results in a lot
> of buffer cache.
> 
> I suspect that this patch will in fact fix Richard's issue.
> 
> The patch is small, fairly simple and looks like it will fix
> people's problems.  It also makes swappiness=0 behave the way
> most people seem to imagine it would work.
> 
> If it works for a few people (test results), I believe we
> might as well merge it.
> 
> Yes, for cgroups we may need additional logic, but we can
> sort that out as we go along.
> 


I absolutely agree with Rik's opinion.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-04-21  0:21               ` Satoru Moriya
@ 2012-05-11 21:11                 ` Satoru Moriya
  2012-05-12 22:21                   ` Rik van Riel
  0 siblings, 1 reply; 32+ messages in thread
From: Satoru Moriya @ 2012-05-11 21:11 UTC (permalink / raw)
  To: Satoru Moriya, KOSAKI Motohiro, Jerome Marchand
  Cc: KOSAKI Motohiro, jweiner, linux-mm, linux-kernel, riel, lwoodman,
	dle-develop, Seiji Aguchi

On 04/20/2012 08:21 PM, Satoru Moriya wrote:
> On 04/04/2012 01:38 PM, KOSAKI Motohiro wrote:
>> (4/3/12 4:25 AM), Jerome Marchand wrote:
>>> On 04/02/2012 07:10 PM, KOSAKI Motohiro wrote:
>>>> 2012/3/30 Satoru Moriya <satoru.moriya@hds.com>:
>>>>> So the kernel reclaims pages as follows.
>>>>>
>>>>> nr_free + nr_filebacked >= watermark_high: reclaim only filebacked pages
>>>>> nr_free + nr_filebacked <  watermark_high: reclaim only anonymous pages
>>>>
>>>> How?
>>>
>>> get_scan_count() checks that case explicitly:
>>>
>>>     if (global_reclaim(sc)) {
>>>         free  = zone_page_state(mz->zone, NR_FREE_PAGES);
>>>         /* If we have very few page cache pages,
>>>            force-scan anon pages. */
>>>         if (unlikely(file + free <= high_wmark_pages(mz->zone))) {
>>>             fraction[0] = 1;
>>>             fraction[1] = 0;
>>>             denominator = 1;
>>>             goto out;
>>>         }
>>>     }
>>
>> Eek. This is silly. Nowadays many people enable THP, and it increases the zone watermarks,
>> so the high watermark is not a good threshold anymore.
>
> Ah yes, it is not so small now.
> On a 4GB server, min_free_kbytes is 8113 without THP but 67584 with
> THP.
>
> How about using the low watermark or the min watermark?
> Are they still big?
>
> ...or should we use some other value?

What do you think of the idea above?

By the way, I'd like to discuss this topic in another thread,
because the optimal threshold at which the kernel changes its
reclaim policy affects not only the swappiness==0 case but also
all other settings.

So, I propose that we start by applying this patch first
and then discuss/improve the threshold.

The patch may not be perfect but, at least, it improves
the kernel behavior in the case where there is enough filebacked
memory. I believe it's better than nothing.

Regards,
Satoru

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-05-11 21:11                 ` Satoru Moriya
@ 2012-05-12 22:21                   ` Rik van Riel
  0 siblings, 0 replies; 32+ messages in thread
From: Rik van Riel @ 2012-05-12 22:21 UTC (permalink / raw)
  To: Satoru Moriya
  Cc: KOSAKI Motohiro, Jerome Marchand, KOSAKI Motohiro, jweiner,
	linux-mm, linux-kernel, lwoodman, dle-develop, Seiji Aguchi

On 05/11/2012 05:11 PM, Satoru Moriya wrote:
> On 04/20/2012 08:21 PM, Satoru Moriya wrote:
>> Ah yes, it is not so small now.
>> On a 4GB server, min_free_kbytes is 8113 without THP but 67584 with
>> THP.
>>
>> How about using the low watermark or the min watermark?
>> Are they still big?
>>
>> ...or should we use some other value?
>
> What do you think of the idea above?

I believe that using the high watermark is just fine.

We want to start swapping, before the page cache is so
small that we start thrashing from that.

> So, I propose that we start by applying this patch first
> and then discuss/improve the threshold.
>
> The patch may not be perfect but, at least, it improves
> the kernel behavior in the case where there is enough filebacked
> memory. I believe it's better than nothing.

Agreed.

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-05-07 20:09               ` Rik van Riel
  2012-05-08  0:05                 ` Minchan Kim
@ 2012-05-21  7:12                 ` Richard Davies
  2012-05-21 13:39                   ` Satoru Moriya
  1 sibling, 1 reply; 32+ messages in thread
From: Richard Davies @ 2012-05-21  7:12 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KOSAKI Motohiro, Satoru Moriya, Jerome Marchand, jweiner,
	linux-mm, linux-kernel, lwoodman, shaohua.li, dle-develop,
	Seiji Aguchi, Minchan Kim

Hi Satoru,

Rik van Riel wrote:
> KOSAKI Motohiro wrote:
> > Richard Davies wrote:
> > > Satoru Moriya wrote:
> > > > > I have run into problems with heavy swapping with swappiness==0 and
> > > > > was pointed to this thread (
> > > > > http://marc.info/?l=linux-mm&m=133522782307215 )
> > > >
> > > > Did you test this patch with your workload?
> > >
> > > I haven't yet tested this patch. It takes a long time since these are
> > > production machines, and the bug itself takes several weeks of
> > > production use to really show up.
> > >
> > > Rik van Riel has pointed out a lot of VM tweaks that he put into 3.4:
> > > http://marc.info/?l=linux-mm&m=133536506926326
> > >
> > > My intention is to reboot half of our machines into plain 3.4 once it
> > > is out, and half onto 3.4 + your patch.
> > >
> > > Then we can compare behaviour.
> > >
> > > Will your patch apply cleanly on 3.4?
> >
> > Note: this patch doesn't solve your issue. What this patch does is,
> > where very little swap I/O occurs, reduce it to zero. But you said
> > you are seeing eager swap I/O. As Dave already pointed out, your
> > machine has a buffer head issue.
> >
> > So, this thread is pointless.
>
> Running KVM guests directly off block devices results in a lot
> of buffer cache.
>
> I suspect that this patch will in fact fix Richard's issue.
>
> The patch is small, fairly simple and looks like it will fix
> people's problems. It also makes swappiness=0 behave the way
> most people seem to imagine it would work.
>
> If it works for a few people (test results), I believe we
> might as well merge it.
>
> Yes, for cgroups we may need additional logic, but we can
> sort that out as we go along.

Now that 3.4 is out with Rik's fixes, I'm keen to start testing with and
without this extra patch.

Satoru - should I just apply your original patch (most likely), or do you
need to update for the final released kernel?

Thanks,

Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [RFC][PATCH] avoid swapping out with swappiness==0
  2012-05-21  7:12                 ` Richard Davies
@ 2012-05-21 13:39                   ` Satoru Moriya
  0 siblings, 0 replies; 32+ messages in thread
From: Satoru Moriya @ 2012-05-21 13:39 UTC (permalink / raw)
  To: Richard Davies, Rik van Riel
  Cc: KOSAKI Motohiro, Jerome Marchand, jweiner, linux-mm,
	linux-kernel, lwoodman, dle-develop, Seiji Aguchi, Minchan Kim

Hi Richard,

On 05/21/2012 03:12 AM, Richard Davies wrote:
> Now that 3.4 is out with Rik's fixes, I'm keen to start testing with 
> and without this extra patch.
> 
> Satoru - should I just apply your original patch (most likely), or do 
> you need to update for the final released kernel?

Thank you for testing!
I believe you can apply the patch without any updates.

Regards,
Satoru

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2012-05-21 13:39 UTC | newest]

Thread overview: 32+ messages
2012-03-02 17:36 [RFC][PATCH] avoid swapping out with swappiness==0 Satoru Moriya
2012-03-02 22:47 ` Rik van Riel
2012-03-02 23:43   ` Satoru Moriya
2012-03-03  2:29   ` Hillf Danton
2012-03-04  6:57 ` Minchan Kim
2012-03-05 21:38   ` Satoru Moriya
2012-03-05 13:49 ` Rik van Riel
2012-03-05 21:56 ` Johannes Weiner
2012-03-07 17:19   ` KOSAKI Motohiro
2012-03-07 18:18     ` Satoru Moriya
2012-03-30 22:44       ` Satoru Moriya
2012-04-02 17:10         ` KOSAKI Motohiro
2012-04-03 11:25           ` Jerome Marchand
2012-04-03 15:15             ` Satoru Moriya
2012-04-04 17:38             ` KOSAKI Motohiro
2012-04-21  0:21               ` Satoru Moriya
2012-05-11 21:11                 ` Satoru Moriya
2012-05-12 22:21                   ` Rik van Riel
2012-04-24  8:20       ` Richard Davies
2012-04-24 22:14         ` Satoru Moriya
2012-04-26 14:26           ` Richard Davies
2012-04-26 15:41             ` KOSAKI Motohiro
2012-05-07 20:09               ` Rik van Riel
2012-05-08  0:05                 ` Minchan Kim
2012-05-21  7:12                 ` Richard Davies
2012-05-21 13:39                   ` Satoru Moriya
2012-04-26 14:50         ` Christoph Lameter
2012-04-26 15:37           ` KOSAKI Motohiro
2012-04-26 16:08             ` Richard Davies
2012-04-26 18:20             ` Christoph Lameter
2012-04-27 13:55           ` Rik van Riel
2012-05-07 20:11 ` Rik van Riel
