From: Johannes Weiner <hannes@cmpxchg.org>
To: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>, Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Nick Piggin <npiggin@suse.de>,
	Chris Mason <chris.mason@oracle.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	linux-kernel@vger.kernel.org, gregkh@novell.com,
	Corrado Zoccolo <czoccolo@gmail.com>
Subject: Re: [RFC PATCH 0/3] Avoid the use of congestion_wait under zone pressure
Date: Mon, 19 Apr 2010 23:44:12 +0200
Message-ID: <20100419214412.GB5336@cmpxchg.org>
In-Reply-To: <4BCC4B0C.8000602@linux.vnet.ibm.com>

On Mon, Apr 19, 2010 at 02:22:36PM +0200, Christian Ehrhardt wrote:
> So now coming to what is probably the most critical part - the evict-once discussion in this thread.
> I'll try to explain what I found in the meantime - let me know what's unclear and I'll add data etc.
> 
> In the past we identified that "echo 3 > /proc/sys/vm/drop_caches" helps to improve the accuracy of the testcase used by lowering the noise from 5-8% to <1%.
> Therefore I ran all tests and verifications with those cache drops.
> In the meantime I unfortunately discovered that Mel's fix only helps for the cases when the caches are dropped.
> Without dropping caches, it seems to be bad all the time. So don't cast the patch away due to that discovery :-)
> 
> On the good side I was also able to analyze a few more things due to that insight - and it might give us new data to debug the root cause.
> Like Mel, I had also identified "56e49d21 vmscan: evict use-once pages first" as related in the past. But without the watermark wait fix, reverting 56e49d21 didn't change much for my case, so I dropped this analysis path.
> 
> But now that I have found that dropping caches is the key to "get back good performance", and subsequent writes the key to "bad performance", even with the watermark wait fix applied, I checked what else changes:
> - first write/read load after reboot or dropping caches -> read TP good
> - second write/read load after reboot or dropping caches -> read TP bad
> => so what changed?
> 
> I went through all kinds of logs and found something in the system activity report which is very probably related to 56e49d21.
> When issuing subsequent writes after I dropped caches to get a clean start, I get this for Buffers/Cached in meminfo:
> 
> pre write 1
> Buffers:             484 kB
> Cached:             5664 kB
> pre write 2
> Buffers:           33500 kB
> Cached:           149856 kB
> pre write 3
> Buffers:           65564 kB
> Cached:           115888 kB
> pre write 4
> Buffers:           85556 kB
> Cached:            97184 kB
> 
> It stays at ~85M with more writes, which is approx. 50% of my 160M of free memory.

Ok, so I am the idiot that got quoted on 'the active set is not too big, so
buffer heads are not a problem when avoiding to scan it' in eternal history.

But the threshold inactive/active ratio for skipping active file pages is
actually 1:1.

The easiest 'fix' is probably to change that ratio; 2:1 (or even 3:1?) appears
to be a bit more natural anyway.  Below is a patch that changes it to 2:1.
Christian, can you check if it fixes your regression?
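
To make the effect of the ratio change concrete, here is a minimal userspace
sketch (not kernel code) of the check before and after the patch below.  The
function names with the _1to1 and _2to1 suffixes and the page counts used
with them are made up for illustration; only the two comparisons are taken
from the patch.

```c
#include <stdbool.h>

/*
 * Userspace model of the "inactive file list is low" check.
 * A true result means reclaim should also scan the active file list;
 * false means the active file list is skipped.
 * The function names are hypothetical; only the comparisons mirror the patch.
 */

/* Old behaviour: scan active file pages once active outnumbers inactive. */
static bool inactive_file_is_low_1to1(unsigned long active,
				      unsigned long inactive)
{
	return active > inactive;
}

/*
 * Patched behaviour: scan active file pages unless the inactive list is
 * at least twice as large as the active list (integer division truncates).
 */
static bool inactive_file_is_low_2to1(unsigned long active,
				      unsigned long inactive)
{
	return active > inactive / 2;
}
```

With active = 100 and inactive = 150 pages, the 1:1 check returns false (the
active list is skipped), while the 2:1 check returns true.  Because the
division truncates, inactive = 199 is treated like 198; the 2:1 check only
returns false for active = 100 once inactive reaches 200.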

Additionally, we can always scan active file pages but only deactivate them
when the ratio is off, and otherwise strip buffers from clean pages.
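
One possible reading of that second idea, as a rough userspace sketch only:
the struct, the enum and scan_active_page() are hypothetical names, not
existing kernel interfaces, and the sketch ignores everything else the real
active list scan considers (page references, mapped pages, locking).

```c
#include <stdbool.h>

/* Hypothetical outcome of looking at one active file page. */
enum active_page_action {
	ACTIVE_PAGE_DEACTIVATE,		/* move to the inactive list */
	ACTIVE_PAGE_STRIP_BUFFERS,	/* keep active, drop buffer heads */
	ACTIVE_PAGE_KEEP,		/* keep active, leave untouched */
};

/* Hypothetical, heavily simplified stand-in for a page. */
struct page_model {
	bool dirty;		/* page has unwritten data */
	bool has_buffers;	/* buffer heads are attached */
};

/*
 * Always look at the page.  Deactivate it only when the inactive list
 * is low (the ratio is off); otherwise just strip buffers from clean
 * pages and leave dirty or buffer-less pages alone.
 */
static enum active_page_action
scan_active_page(const struct page_model *page, bool inactive_is_low)
{
	if (inactive_is_low)
		return ACTIVE_PAGE_DEACTIVATE;
	if (page->has_buffers && !page->dirty)
		return ACTIVE_PAGE_STRIP_BUFFERS;
	return ACTIVE_PAGE_KEEP;
}
```

The point of the split is that buffer heads on active pages could still be
released even while the active list itself is protected from deactivation.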

What do people think?

	Hannes

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f4ede99..a4aea76 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -898,7 +898,7 @@ int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg)
 	inactive = mem_cgroup_get_local_zonestat(memcg, LRU_INACTIVE_FILE);
 	active = mem_cgroup_get_local_zonestat(memcg, LRU_ACTIVE_FILE);
 
-	return (active > inactive);
+	return (active > inactive / 2);
 }
 
 unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3ff3311..8f1a846 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1466,7 +1466,7 @@ static int inactive_file_is_low_global(struct zone *zone)
 	active = zone_page_state(zone, NR_ACTIVE_FILE);
 	inactive = zone_page_state(zone, NR_INACTIVE_FILE);
 
-	return (active > inactive);
+	return (active > inactive / 2);
 }
 
 /**