From: Hillf Danton <dhillf@gmail.com>
To: Rik van Riel <riel@redhat.com>
Cc: Satoru Moriya <satoru.moriya@hds.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"lwoodman@redhat.com" <lwoodman@redhat.com>,
	"jweiner@redhat.com" <jweiner@redhat.com>,
	"shaohua.li@intel.com" <shaohua.li@intel.com>
Subject: Re: [RFC][PATCH] avoid swapping out with swappiness==0
Date: Sat, 3 Mar 2012 10:29:04 +0800
Message-ID: <CAJd=RBBdnA-gCXo8w5afng_v+AgfQF797pKW0eDdVJbszULvhg@mail.gmail.com>
In-Reply-To: <4F514E09.5060801@redhat.com>

On Sat, Mar 3, 2012 at 6:47 AM, Rik van Riel <riel@redhat.com> wrote:
> On 03/02/2012 12:36 PM, Satoru Moriya wrote:
>>
>> Sometimes we'd like to avoid swapping out anonymous memory
>> in particular, avoid swapping out pages of important process or
>> process groups while there is a reasonable amount of pagecache
>> on RAM so that we can satisfy our customers' requirements.
>>
>> OTOH, we can control how aggressive the kernel will swap memory pages
>> with /proc/sys/vm/swappiness for global and
>> /sys/fs/cgroup/memory/memory.swappiness for each memcg.
>>
>> But with current reclaim implementation, the kernel may swap out
>> even if we set swappiness==0 and there is pagecache on RAM.
>>
>> This patch changes the behavior with swappiness==0. If we set
>> swappiness==0, the kernel does not swap out completely
>> (for global reclaim until the amount of free pages and filebacked
>> pages in a zone has been reduced to something very very small
>> (nr_free + nr_filebacked < high watermark)).
>>
>> Any comments are welcome.
>>
>> Regards,
>> Satoru Moriya
>>
>> Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
>> ---
>>  mm/vmscan.c |    6 +++---
>>  1 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index c52b235..27dc3e8 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1983,10 +1983,10 @@ static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
>>         * proportional to the fraction of recently scanned pages on
>>         * each list that were recently referenced and in active use.
>>         */
>> -       ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
>> +       ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
>>        ap /= reclaim_stat->recent_rotated[0] + 1;
>>
>> -       fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
>> +       fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
>>        fp /= reclaim_stat->recent_rotated[1] + 1;
>>        spin_unlock_irq(&mz->zone->lru_lock);
>
>
> ACK on this bit of the patch.
>
>> @@ -1999,7 +1999,7 @@ out:
>>                unsigned long scan;
>>
>>                scan = zone_nr_lru_pages(mz, lru);
>> -               if (priority || noswap) {
>> +               if (priority || noswap || !vmscan_swappiness(mz, sc)) {
>>                        scan >>= priority;
>>                        if (!scan && force_scan)
>>                                scan = SWAP_CLUSTER_MAX;
>
>
> However, I do not understand why we fail to scale
> the number of pages we want to scan with priority
> if "noswap".
>
> For that matter, surely if we do not want to swap
> out anonymous pages, we WANT to go into this if
> branch, in order to make sure we set "scan" to 0?
>
> scan = div64_u64(scan * fraction[file], denominator);
>
> With your patch and swappiness=0, or no swap space, it
> looks like we do not zero out "scan" and may end up
> scanning anonymous pages.
>
> Am I overlooking something?  Is this correct?
>

Let's try to simplify this logic a bit :)

Have a good weekend
-hd

--- a/mm/vmscan.c	Wed Feb  8 20:10:14 2012
+++ b/mm/vmscan.c	Sat Mar  3 10:02:10 2012
@@ -1997,15 +1997,23 @@ static void get_scan_count(struct mem_cg
 out:
 	for_each_evictable_lru(lru) {
 		int file = is_file_lru(lru);
-		unsigned long scan;
+		unsigned long scan = 0;

-		scan = zone_nr_lru_pages(mz, lru);
-		if (priority || noswap) {
-			scan >>= priority;
-			if (!scan && force_scan)
-				scan = SWAP_CLUSTER_MAX;
+		/* First, check noswap */
+		if (noswap && !file)
+			goto set;
+
+		/* Second, apply priority */
+		scan = zone_nr_lru_pages(mz, lru) >> priority;
+
+		/* Third, check force */
+		if (!scan && force_scan)
+			scan = SWAP_CLUSTER_MAX;
+
+		/* Finally, try to avoid div64 */
+		if (scan)
 			scan = div64_u64(scan * fraction[file], denominator);
-		}
+set:
 		nr[lru] = scan;
 	}
 }
--
