* Re: [patch 20/22] vmscan: avoid multiplication overflow in shrink_zone()
From: Wu Fengguang @ 2009-05-01 1:22 UTC (permalink / raw)
To: akpm
Cc: torvalds, kosaki.motohiro, lee.schermerhorn, peterz, riel,
linux-mm, LKML
On Fri, May 01, 2009 at 06:08:55AM +0800, Andrew Morton wrote:
>
> Local variable `scan' can overflow on zones which are larger than
>
> (2G * 4k) / 100 = 80GB.
>
> Making it 64-bit on 64-bit will fix that up.
A side note about the "one HUGE scan inside shrink_zone":
Isn't this low-level scan granularity way too large?
It makes things a lot worse under memory pressure:
- the over-reclaim, somehow worked around by Rik's early bail-out patch
- the throttle_vm_writeout()/congestion_wait() guards can only fire in a
  very sparse manner and hence are useless: imagine stopping to wait
  only after shooting away every 1GB of memory.
The long-term fix could be to move the granularity control up to the
shrink_zones() level: there it can bail out early without hurting the
balanced zone aging.
Thanks,
Fengguang
> --- a/mm/vmscan.c~vmscan-avoid-multiplication-overflow-in-shrink_zone
> +++ a/mm/vmscan.c
> @@ -1471,7 +1471,7 @@ static void shrink_zone(int priority, st
>
> for_each_evictable_lru(l) {
> int file = is_file_lru(l);
> - int scan;
> + unsigned long scan;
>
> scan = zone_nr_pages(zone, sc, l);
> if (priority) {
> _
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
* Re: [patch 20/22] vmscan: avoid multiplication overflow in shrink_zone()
From: Andrew Morton @ 2009-05-01 2:49 UTC (permalink / raw)
To: Wu Fengguang
Cc: torvalds, kosaki.motohiro, lee.schermerhorn, peterz, riel,
linux-mm, LKML
On Fri, 1 May 2009 09:22:12 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Fri, May 01, 2009 at 06:08:55AM +0800, Andrew Morton wrote:
> >
> > Local variable `scan' can overflow on zones which are larger than
> >
> > (2G * 4k) / 100 = 80GB.
> >
> > Making it 64-bit on 64-bit will fix that up.
>
> A side note about the "one HUGE scan inside shrink_zone":
>
> Isn't this low level scan granularity way tooooo large?
>
> It makes things a lot worse on memory pressure:
> - the over reclaim, somehow workarounded by Rik's early bail out patch
> - the throttle_vm_writeout()/congestion_wait() guards could work in a
> very sparse manner and hence is useless: imagine to stop and wait
> after shooting away every 1GB memory.
>
> The long term fix could be to move the granularity control up to the
> shrink_zones() level: there it can bail out early without hurting the
> balanced zone aging.
>
I guess it could be bad in some circumstances. Normally we'll bail out
way early because (nr_reclaimed > swap_cluster_max) comes true. If it
_doesn't_ come true, we have little choice but to keep scanning.
The code is mystifying:
: for_each_evictable_lru(l) {
: int file = is_file_lru(l);
: unsigned long scan;
:
: scan = zone_nr_pages(zone, sc, l);
: if (priority) {
: scan >>= priority;
: scan = (scan * percent[file]) / 100;
: }
: if (scanning_global_lru(sc)) {
: zone->lru[l].nr_scan += scan;
Here we increase zone->lru[l].nr_scan by (say) 1000000.
: nr[l] = zone->lru[l].nr_scan;
locally save away the number of pages to scan
: if (nr[l] >= swap_cluster_max)
: zone->lru[l].nr_scan = 0;
err, wot? This makes no sense at all afaict.
: else
: nr[l] = 0;
ok, this is doing some batching I think.
: } else
: nr[l] = scan;
so we didn't update the zone's nr_scan at all here. But we display
nr_scan in /proc/zoneinfo as "scanned". So we're failing to inform
userspace about scanning on this zone which is due to memcgroup
constraints. I think.
: }
:
: while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
: nr[LRU_INACTIVE_FILE]) {
: for_each_evictable_lru(l) {
: if (nr[l]) {
: nr_to_scan = min(nr[l], swap_cluster_max);
: nr[l] -= nr_to_scan;
:
: nr_reclaimed += shrink_list(l, nr_to_scan,
: zone, sc, priority);
: }
: }
: /*
: * On large memory systems, scan >> priority can become
: * really large. This is fine for the starting priority;
: * we want to put equal scanning pressure on each zone.
: * However, if the VM has a harder time of freeing pages,
: * with multiple processes reclaiming pages, the total
: * freeing target can get unreasonably large.
: */
: if (nr_reclaimed > swap_cluster_max &&
: priority < DEF_PRIORITY && !current_is_kswapd())
: break;
here we bail out after scanning 32 pages, without updating ->nr_scan.
: }
What on earth does zone->lru[l].nr_scan mean after wending through all
this stuff?
afaict this will muck up /proc/zoneinfo, but nothing else.
* Re: [patch 20/22] vmscan: avoid multiplication overflow in shrink_zone()
From: Wu Fengguang @ 2009-05-01 6:29 UTC (permalink / raw)
To: Andrew Morton
Cc: torvalds, kosaki.motohiro, lee.schermerhorn, peterz, riel,
linux-mm, LKML
On Fri, May 01, 2009 at 10:49:07AM +0800, Andrew Morton wrote:
> On Fri, 1 May 2009 09:22:12 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > On Fri, May 01, 2009 at 06:08:55AM +0800, Andrew Morton wrote:
> > >
> > > Local variable `scan' can overflow on zones which are larger than
> > >
> > > (2G * 4k) / 100 = 80GB.
> > >
> > > Making it 64-bit on 64-bit will fix that up.
> >
> > A side note about the "one HUGE scan inside shrink_zone":
> >
> > Isn't this low level scan granularity way tooooo large?
> >
> > It makes things a lot worse on memory pressure:
> > - the over reclaim, somehow workarounded by Rik's early bail out patch
> > - the throttle_vm_writeout()/congestion_wait() guards could work in a
> > very sparse manner and hence is useless: imagine to stop and wait
> > after shooting away every 1GB memory.
> >
> > The long term fix could be to move the granularity control up to the
> > shrink_zones() level: there it can bail out early without hurting the
> > balanced zone aging.
> >
>
> I guess it could be bad in some circumstances. Normally we'll bail out
> way early because (nr_reclaimed > swap_cluster_max) comes true. If it
> _doesn't_ come true, we have little choice but to keep scanning.
Right. The main concern with the proposed granularity-control lifting
could be the trickiness of the scan code - the transition won't be easy.
> The code is mystifying:
>
> : for_each_evictable_lru(l) {
> : int file = is_file_lru(l);
> : unsigned long scan;
> :
> : scan = zone_nr_pages(zone, sc, l);
> : if (priority) {
> : scan >>= priority;
> : scan = (scan * percent[file]) / 100;
> : }
> : if (scanning_global_lru(sc)) {
> : zone->lru[l].nr_scan += scan;
>
> Here we increase zone->lru[l].nr_scan by (say) 1000000.
>
> : nr[l] = zone->lru[l].nr_scan;
>
> locally save away the number of pages to scan
>
> : if (nr[l] >= swap_cluster_max)
> : zone->lru[l].nr_scan = 0;
>
> err, wot? This makes no sense at all afacit.
>
> : else
> : nr[l] = 0;
>
> ok, this is doing some batching I think.
Yes, it's batching, so that smallish (<32) scans can be delayed and batched.
I was lost too (twice! first in 2006 and once more in 2009), so
we'd better add a simple comment to record this fact 8-)
> : } else
> : nr[l] = scan;
>
> so we didn't update the zone's nr_scan at all here. But we display
> nr_scan in /proc/zoneinfo as "scanned". So we're filing to inform
> userspace about scanning on this zone which is due to memcgroup
> constraints. I think.
$ grep scanned /proc/zoneinfo
scanned 0 (aa: 0 ia: 0 af: 0 if: 0)
scanned 0 (aa: 0 ia: 0 af: 0 if: 0)
scanned 0 (aa: 0 ia: 0 af: 0 if: 0)
They are all dynamic values. The first field shows pages scanned since
the last reclaim - so a large value indicates we are having trouble
reclaiming enough pages. The following 4 fields are the useless
nr_scan[]s: they never exceed SWAP_CLUSTER_MAX=32, and are typically 0
for large lists.
> : }
> :
> : while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
> : nr[LRU_INACTIVE_FILE]) {
> : for_each_evictable_lru(l) {
> : if (nr[l]) {
> : nr_to_scan = min(nr[l], swap_cluster_max);
> : nr[l] -= nr_to_scan;
> :
> : nr_reclaimed += shrink_list(l, nr_to_scan,
> : zone, sc, priority);
> : }
> : }
> : /*
> : * On large memory systems, scan >> priority can become
> : * really large. This is fine for the starting priority;
> : * we want to put equal scanning pressure on each zone.
> : * However, if the VM has a harder time of freeing pages,
> : * with multiple processes reclaiming pages, the total
> : * freeing target can get unreasonably large.
> : */
> : if (nr_reclaimed > swap_cluster_max &&
> : priority < DEF_PRIORITY && !current_is_kswapd())
> : break;
>
> here we bale out after scanning 32 pages, without updating ->nr_scan.
This is fine, because (nr_reclaimed > swap_cluster_max) implies
(nr_scan == 0). You know nr_scan is not a regular accounting number ;-)
> : }
>
>
> What on earth does zone->lru[l].nr_scan mean after wending through all
> this stuff?
>
> afacit this will muck up /proc/zoneinfo, but nothing else.
Exactly. The nr_scan[] values are not accounting numbers and mean
nothing to the user. They should either be removed from /proc/zoneinfo,
or be replaced with meaningful _accumulated_ scan numbers.
Thanks,
Fengguang
* [PATCH] vmscan: cleanup the scan batching code
From: Wu Fengguang @ 2009-05-02 2:31 UTC (permalink / raw)
To: Andrew Morton
Cc: torvalds, kosaki.motohiro, lee.schermerhorn, peterz, riel,
linux-mm, LKML, Nick Piggin, Christoph Lameter
The vmscan batching logic is convoluted. Move it into a standalone
function nr_scan_try_batch() and document it. No behavior change.
CC: Nick Piggin <npiggin@suse.de>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
include/linux/mmzone.h | 4 ++--
mm/page_alloc.c | 2 +-
mm/vmscan.c | 39 ++++++++++++++++++++++++++++-----------
mm/vmstat.c | 8 ++++----
4 files changed, 35 insertions(+), 18 deletions(-)
--- mm.orig/include/linux/mmzone.h
+++ mm/include/linux/mmzone.h
@@ -323,9 +323,9 @@ struct zone {
/* Fields commonly accessed by the page reclaim scanner */
spinlock_t lru_lock;
- struct {
+ struct zone_lru {
struct list_head list;
- unsigned long nr_scan;
+ unsigned long nr_saved_scan; /* accumulated for batching */
} lru[NR_LRU_LISTS];
struct zone_reclaim_stat reclaim_stat;
--- mm.orig/mm/vmscan.c
+++ mm/mm/vmscan.c
@@ -1450,6 +1450,26 @@ static void get_scan_ratio(struct zone *
percent[1] = 100 - percent[0];
}
+/*
+ * Smallish @nr_to_scan's are deposited in @nr_saved_scan,
+ * until we collected @swap_cluster_max pages to scan.
+ */
+static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
+ unsigned long *nr_saved_scan,
+ unsigned long swap_cluster_max)
+{
+ unsigned long nr;
+
+ *nr_saved_scan += nr_to_scan;
+ nr = *nr_saved_scan;
+
+ if (nr >= swap_cluster_max)
+ *nr_saved_scan = 0;
+ else
+ nr = 0;
+
+ return nr;
+}
/*
* This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
@@ -1475,14 +1495,11 @@ static void shrink_zone(int priority, st
scan >>= priority;
scan = (scan * percent[file]) / 100;
}
- if (scanning_global_lru(sc)) {
- zone->lru[l].nr_scan += scan;
- nr[l] = zone->lru[l].nr_scan;
- if (nr[l] >= swap_cluster_max)
- zone->lru[l].nr_scan = 0;
- else
- nr[l] = 0;
- } else
+ if (scanning_global_lru(sc))
+ nr[l] = nr_scan_try_batch(scan,
+ &zone->lru[l].nr_saved_scan,
+ swap_cluster_max);
+ else
nr[l] = scan;
}
@@ -2079,11 +2096,11 @@ static void shrink_all_zones(unsigned lo
l == LRU_ACTIVE_FILE))
continue;
- zone->lru[l].nr_scan += (lru_pages >> prio) + 1;
- if (zone->lru[l].nr_scan >= nr_pages || pass > 3) {
+ zone->lru[l].nr_saved_scan += (lru_pages >> prio) + 1;
+ if (zone->lru[l].nr_saved_scan >= nr_pages || pass > 3) {
unsigned long nr_to_scan;
- zone->lru[l].nr_scan = 0;
+ zone->lru[l].nr_saved_scan = 0;
nr_to_scan = min(nr_pages, lru_pages);
nr_reclaimed += shrink_list(l, nr_to_scan, zone,
sc, prio);
--- mm.orig/mm/vmstat.c
+++ mm/mm/vmstat.c
@@ -729,10 +729,10 @@ static void zoneinfo_show_print(struct s
zone->pages_low,
zone->pages_high,
zone->pages_scanned,
- zone->lru[LRU_ACTIVE_ANON].nr_scan,
- zone->lru[LRU_INACTIVE_ANON].nr_scan,
- zone->lru[LRU_ACTIVE_FILE].nr_scan,
- zone->lru[LRU_INACTIVE_FILE].nr_scan,
+ zone->lru[LRU_ACTIVE_ANON].nr_saved_scan,
+ zone->lru[LRU_INACTIVE_ANON].nr_saved_scan,
+ zone->lru[LRU_ACTIVE_FILE].nr_saved_scan,
+ zone->lru[LRU_INACTIVE_FILE].nr_saved_scan,
zone->spanned_pages,
zone->present_pages);
--- mm.orig/mm/page_alloc.c
+++ mm/mm/page_alloc.c
@@ -3544,7 +3544,7 @@ static void __paginginit free_area_init_
zone_pcp_init(zone);
for_each_lru(l) {
INIT_LIST_HEAD(&zone->lru[l].list);
- zone->lru[l].nr_scan = 0;
+ zone->lru[l].nr_saved_scan = 0;
}
zone->reclaim_stat.recent_rotated[0] = 0;
zone->reclaim_stat.recent_rotated[1] = 0;
* [RFC][PATCH] vmscan: don't export nr_saved_scan in /proc/zoneinfo
From: Wu Fengguang @ 2009-05-02 2:47 UTC (permalink / raw)
To: Andrew Morton
Cc: torvalds, kosaki.motohiro, lee.schermerhorn, peterz, riel,
linux-mm, LKML, Nick Piggin, Christoph Lameter
The lru->nr_saved_scan's are not meaningful counters even for kernel
developers. They are typically smaller than 32 and always 0 for
large lists. So remove them from /proc/zoneinfo.
Hopefully this interface change won't break too many scripts.
/proc/zoneinfo is too unstructured to be script-friendly, and I suspect
the affected scripts - if there are any - are still bleeding from the
not-long-ago commit "vmscan: split LRU lists into anon & file sets",
which also touched the "scanned" line :)
If we are to re-export accumulated vmscan counts in the future, they
can go to new lines in /proc/zoneinfo instead of the current form, or
to /sys/devices/system/node/node0/meminfo?
CC: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/vmstat.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
--- mm.orig/mm/vmstat.c
+++ mm/mm/vmstat.c
@@ -721,7 +721,7 @@ static void zoneinfo_show_print(struct s
"\n min %lu"
"\n low %lu"
"\n high %lu"
- "\n scanned %lu (aa: %lu ia: %lu af: %lu if: %lu)"
+ "\n scanned %lu"
"\n spanned %lu"
"\n present %lu",
zone_page_state(zone, NR_FREE_PAGES),
@@ -729,10 +729,6 @@ static void zoneinfo_show_print(struct s
zone->pages_low,
zone->pages_high,
zone->pages_scanned,
- zone->lru[LRU_ACTIVE_ANON].nr_saved_scan,
- zone->lru[LRU_INACTIVE_ANON].nr_saved_scan,
- zone->lru[LRU_ACTIVE_FILE].nr_saved_scan,
- zone->lru[LRU_INACTIVE_FILE].nr_saved_scan,
zone->spanned_pages,
zone->present_pages);
* Re: [PATCH] vmscan: cleanup the scan batching code
From: Rik van Riel @ 2009-05-02 14:14 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, torvalds, kosaki.motohiro, lee.schermerhorn,
peterz, linux-mm, LKML, Nick Piggin, Christoph Lameter
Wu Fengguang wrote:
> The vmscan batching logic is twisting. Move it into a standalone
> function nr_scan_try_batch() and document it. No behavior change.
>
> CC: Nick Piggin <npiggin@suse.de>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Christoph Lameter <cl@linux-foundation.org>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
--
All rights reversed.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH] vmscan: don't export nr_saved_scan in /proc/zoneinfo
2009-05-02 2:47 ` Wu Fengguang
@ 2009-05-02 14:21 ` Rik van Riel
-1 siblings, 0 replies; 28+ messages in thread
From: Rik van Riel @ 2009-05-02 14:21 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, torvalds, kosaki.motohiro, lee.schermerhorn,
peterz, linux-mm, LKML, Nick Piggin, Christoph Lameter
Wu Fengguang wrote:
> The lru->nr_saved_scan values are not meaningful counters even for
> kernel developers. They are typically smaller than 32 and are always 0
> for large lists. So remove them from /proc/zoneinfo.
>
> Hopefully this interface change won't break too many scripts.
> /proc/zoneinfo is too unstructured to be script friendly, and I suspect
> the affected scripts - if there are any - have been broken since the
> recent commit "vmscan: split LRU lists into anon & file sets", which
> also touched the "scanned" line :)
>
> If we are to re-export accumulated vmscan counts in the future, they
> can go to new lines in /proc/zoneinfo instead of the current form, or
> to /sys/devices/system/node/node0/meminfo?
>
> CC: Christoph Lameter <cl@linux-foundation.org>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
--
All rights reversed.
* Re: [PATCH] vmscan: cleanup the scan batching code
2009-05-02 2:31 ` Wu Fengguang
@ 2009-05-04 8:26 ` Peter Zijlstra
-1 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2009-05-04 8:26 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, torvalds, kosaki.motohiro, lee.schermerhorn, riel,
linux-mm, LKML, Nick Piggin, Christoph Lameter
On Sat, 2009-05-02 at 10:31 +0800, Wu Fengguang wrote:
> The vmscan batching logic is convoluted. Move it into a standalone
> function, nr_scan_try_batch(), and document it. No behavior change.
>
> CC: Nick Piggin <npiggin@suse.de>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Christoph Lameter <cl@linux-foundation.org>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> include/linux/mmzone.h | 4 ++--
> mm/page_alloc.c | 2 +-
> mm/vmscan.c | 39 ++++++++++++++++++++++++++++-----------
> mm/vmstat.c | 8 ++++----
> 4 files changed, 35 insertions(+), 18 deletions(-)
>
> --- mm.orig/include/linux/mmzone.h
> +++ mm/include/linux/mmzone.h
> @@ -323,9 +323,9 @@ struct zone {
>
> /* Fields commonly accessed by the page reclaim scanner */
> spinlock_t lru_lock;
> - struct {
> + struct zone_lru {
> struct list_head list;
> - unsigned long nr_scan;
> + unsigned long nr_saved_scan; /* accumulated for batching */
> } lru[NR_LRU_LISTS];
>
> struct zone_reclaim_stat reclaim_stat;
> --- mm.orig/mm/vmscan.c
> +++ mm/mm/vmscan.c
> @@ -1450,6 +1450,26 @@ static void get_scan_ratio(struct zone *
> percent[1] = 100 - percent[0];
> }
>
> +/*
> + * Smallish @nr_to_scan's are deposited in @nr_saved_scan,
> + * until we collected @swap_cluster_max pages to scan.
> + */
> +static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
> + unsigned long *nr_saved_scan,
> + unsigned long swap_cluster_max)
> +{
> + unsigned long nr;
> +
> + *nr_saved_scan += nr_to_scan;
> + nr = *nr_saved_scan;
> +
> + if (nr >= swap_cluster_max)
> + *nr_saved_scan = 0;
> + else
> + nr = 0;
> +
> + return nr;
> +}
>
> /*
> * This is a basic per-zone page freer. Used by both kswapd and direct reclaim.
> @@ -1475,14 +1495,11 @@ static void shrink_zone(int priority, st
> scan >>= priority;
> scan = (scan * percent[file]) / 100;
> }
> - if (scanning_global_lru(sc)) {
> - zone->lru[l].nr_scan += scan;
> - nr[l] = zone->lru[l].nr_scan;
> - if (nr[l] >= swap_cluster_max)
> - zone->lru[l].nr_scan = 0;
> - else
> - nr[l] = 0;
> - } else
> + if (scanning_global_lru(sc))
> + nr[l] = nr_scan_try_batch(scan,
> + &zone->lru[l].nr_saved_scan,
> + swap_cluster_max);
> + else
> nr[l] = scan;
> }
>
> @@ -2079,11 +2096,11 @@ static void shrink_all_zones(unsigned lo
> l == LRU_ACTIVE_FILE))
> continue;
>
> - zone->lru[l].nr_scan += (lru_pages >> prio) + 1;
> - if (zone->lru[l].nr_scan >= nr_pages || pass > 3) {
> + zone->lru[l].nr_saved_scan += (lru_pages >> prio) + 1;
> + if (zone->lru[l].nr_saved_scan >= nr_pages || pass > 3) {
> unsigned long nr_to_scan;
>
> - zone->lru[l].nr_scan = 0;
> + zone->lru[l].nr_saved_scan = 0;
> nr_to_scan = min(nr_pages, lru_pages);
> nr_reclaimed += shrink_list(l, nr_to_scan, zone,
> sc, prio);
> --- mm.orig/mm/vmstat.c
> +++ mm/mm/vmstat.c
> @@ -729,10 +729,10 @@ static void zoneinfo_show_print(struct s
> zone->pages_low,
> zone->pages_high,
> zone->pages_scanned,
> - zone->lru[LRU_ACTIVE_ANON].nr_scan,
> - zone->lru[LRU_INACTIVE_ANON].nr_scan,
> - zone->lru[LRU_ACTIVE_FILE].nr_scan,
> - zone->lru[LRU_INACTIVE_FILE].nr_scan,
> + zone->lru[LRU_ACTIVE_ANON].nr_saved_scan,
> + zone->lru[LRU_INACTIVE_ANON].nr_saved_scan,
> + zone->lru[LRU_ACTIVE_FILE].nr_saved_scan,
> + zone->lru[LRU_INACTIVE_FILE].nr_saved_scan,
> zone->spanned_pages,
> zone->present_pages);
>
> --- mm.orig/mm/page_alloc.c
> +++ mm/mm/page_alloc.c
> @@ -3544,7 +3544,7 @@ static void __paginginit free_area_init_
> zone_pcp_init(zone);
> for_each_lru(l) {
> INIT_LIST_HEAD(&zone->lru[l].list);
> - zone->lru[l].nr_scan = 0;
> + zone->lru[l].nr_saved_scan = 0;
> }
> zone->reclaim_stat.recent_rotated[0] = 0;
> zone->reclaim_stat.recent_rotated[1] = 0;
* Re: [RFC][PATCH] vmscan: don't export nr_saved_scan in /proc/zoneinfo
2009-05-02 2:47 ` Wu Fengguang
@ 2009-05-04 13:50 ` Christoph Lameter
-1 siblings, 0 replies; 28+ messages in thread
From: Christoph Lameter @ 2009-05-04 13:50 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, torvalds, kosaki.motohiro, lee.schermerhorn,
peterz, riel, linux-mm, LKML, Nick Piggin
Acked-by: Christoph Lameter <cl@linux-foundation.org>
* Re: [RFC][PATCH] vmscan: don't export nr_saved_scan in /proc/zoneinfo
2009-05-02 2:47 ` Wu Fengguang
@ 2009-05-04 14:40 ` KOSAKI Motohiro
-1 siblings, 0 replies; 28+ messages in thread
From: KOSAKI Motohiro @ 2009-05-04 14:40 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, torvalds, lee.schermerhorn, peterz, riel,
linux-mm, LKML, Nick Piggin, Christoph Lameter
> The lru->nr_saved_scan values are not meaningful counters even for
> kernel developers. They are typically smaller than 32 and are always 0
> for large lists. So remove them from /proc/zoneinfo.
>
> Hopefully this interface change won't break too many scripts.
> /proc/zoneinfo is too unstructured to be script friendly, and I suspect
> the affected scripts - if there are any - have been broken since the
> recent commit "vmscan: split LRU lists into anon & file sets", which
> also touched the "scanned" line :)
>
> If we are to re-export accumulated vmscan counts in the future, they
> can go to new lines in /proc/zoneinfo instead of the current form, or
> to /sys/devices/system/node/node0/meminfo?
>
> CC: Christoph Lameter <cl@linux-foundation.org>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
* Re: [PATCH] vmscan: cleanup the scan batching code
2009-05-02 2:31 ` Wu Fengguang
@ 2009-05-04 14:41 ` KOSAKI Motohiro
-1 siblings, 0 replies; 28+ messages in thread
From: KOSAKI Motohiro @ 2009-05-04 14:41 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, torvalds, lee.schermerhorn, peterz, riel,
linux-mm, LKML, Nick Piggin, Christoph Lameter
> The vmscan batching logic is convoluted. Move it into a standalone
> function, nr_scan_try_batch(), and document it. No behavior change.
>
> CC: Nick Piggin <npiggin@suse.de>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> CC: Christoph Lameter <cl@linux-foundation.org>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
* Re: [RFC][PATCH] vmscan: don't export nr_saved_scan in /proc/zoneinfo
2009-05-02 2:47 ` Wu Fengguang
@ 2009-05-04 21:49 ` Andrew Morton
-1 siblings, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2009-05-04 21:49 UTC (permalink / raw)
To: Wu Fengguang
Cc: torvalds, kosaki.motohiro, lee.schermerhorn, peterz, riel,
linux-mm, linux-kernel, npiggin, cl
On Sat, 2 May 2009 10:47:19 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:
> The lru->nr_saved_scan values are not meaningful counters even for
> kernel developers. They are typically smaller than 32 and are always 0
> for large lists. So remove them from /proc/zoneinfo.
>
> Hopefully this interface change won't break too many scripts.
> /proc/zoneinfo is too unstructured to be script friendly, and I suspect
> the affected scripts - if there are any - have been broken since the
> recent commit "vmscan: split LRU lists into anon & file sets", which
> also touched the "scanned" line :)
>
> If we are to re-export accumulated vmscan counts in the future, they
> can go to new lines in /proc/zoneinfo instead of the current form, or
> to /sys/devices/system/node/node0/meminfo?
>
/proc/zoneinfo is unsalvageable :( Shifting future work over to
/sys/devices/system/node/nodeN/meminfo and deprecating /proc/zoneinfo
sounds good to me.
* Re: [RFC][PATCH] vmscan: don't export nr_saved_scan in /proc/zoneinfo
2009-05-04 21:49 ` Andrew Morton
@ 2009-05-05 7:38 ` Peter Zijlstra
-1 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2009-05-05 7:38 UTC (permalink / raw)
To: Andrew Morton
Cc: Wu Fengguang, torvalds, kosaki.motohiro, lee.schermerhorn, riel,
linux-mm, linux-kernel, npiggin, cl
On Mon, 2009-05-04 at 14:49 -0700, Andrew Morton wrote:
> On Sat, 2 May 2009 10:47:19 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > The lru->nr_saved_scan values are not meaningful counters even for
> > kernel developers. They are typically smaller than 32 and are always 0
> > for large lists. So remove them from /proc/zoneinfo.
> >
> > Hopefully this interface change won't break too many scripts.
> > /proc/zoneinfo is too unstructured to be script friendly, and I suspect
> > the affected scripts - if there are any - have been broken since the
> > recent commit "vmscan: split LRU lists into anon & file sets", which
> > also touched the "scanned" line :)
> >
> > If we are to re-export accumulated vmscan counts in the future, they
> > can go to new lines in /proc/zoneinfo instead of the current form, or
> > to /sys/devices/system/node/node0/meminfo?
> >
>
> /proc/zoneinfo is unsalvageable :( Shifting future work over to
> /sys/devices/system/node/nodeN/meminfo and deprecating /proc/zoneinfo
> sounds good to me.
If only one could find things put in sysfs :-)
* Re: [RFC][PATCH] vmscan: don't export nr_saved_scan in /proc/zoneinfo
2009-05-05 7:38 ` Peter Zijlstra
@ 2009-05-05 15:43 ` Christoph Lameter
-1 siblings, 0 replies; 28+ messages in thread
From: Christoph Lameter @ 2009-05-05 15:43 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andrew Morton, Wu Fengguang, torvalds, kosaki.motohiro,
lee.schermerhorn, riel, linux-mm, linux-kernel, npiggin
On Tue, 5 May 2009, Peter Zijlstra wrote:
> > /proc/zoneinfo is unsalvageable :( Shifting future work over to
> > /sys/devices/system/node/nodeN/meminfo and deprecating /proc/zoneinfo
> > sounds good to me.
>
> If only one could find things put in sysfs :-)
Write a "zoneinfo" command line tool to show this information? If we
cannot output large texts via /proc or /sys then we need small tools for
all sorts of statistics.
end of thread, other threads:[~2009-05-05 15:55 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <200904302208.n3UM8t9R016687@imap1.linux-foundation.org>
2009-05-01 1:22 ` [patch 20/22] vmscan: avoid multiplication overflow in shrink_zone() Wu Fengguang
2009-05-01 1:22 ` Wu Fengguang
2009-05-01 2:49 ` Andrew Morton
2009-05-01 2:49 ` Andrew Morton
2009-05-01 6:29 ` Wu Fengguang
2009-05-01 6:29 ` Wu Fengguang
2009-05-02 2:31 ` [PATCH] vmscan: cleanup the scan batching code Wu Fengguang
2009-05-02 2:31 ` Wu Fengguang
2009-05-02 2:47 ` [RFC][PATCH] vmscan: don't export nr_saved_scan in /proc/zoneinfo Wu Fengguang
2009-05-02 2:47 ` Wu Fengguang
2009-05-02 14:21 ` Rik van Riel
2009-05-02 14:21 ` Rik van Riel
2009-05-04 13:50 ` Christoph Lameter
2009-05-04 13:50 ` Christoph Lameter
2009-05-04 14:40 ` KOSAKI Motohiro
2009-05-04 14:40 ` KOSAKI Motohiro
2009-05-04 21:49 ` Andrew Morton
2009-05-04 21:49 ` Andrew Morton
2009-05-05 7:38 ` Peter Zijlstra
2009-05-05 7:38 ` Peter Zijlstra
2009-05-05 15:43 ` Christoph Lameter
2009-05-05 15:43 ` Christoph Lameter
2009-05-02 14:14 ` [PATCH] vmscan: cleanup the scan batching code Rik van Riel
2009-05-02 14:14 ` Rik van Riel
2009-05-04 8:26 ` Peter Zijlstra
2009-05-04 8:26 ` Peter Zijlstra
2009-05-04 14:41 ` KOSAKI Motohiro
2009-05-04 14:41 ` KOSAKI Motohiro
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.