linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v6
@ 2007-09-12 21:04 Mel Gorman
  2007-09-12 21:05 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Mel Gorman @ 2007-09-12 21:04 UTC
  To: Lee.Schermerhorn, kamezawa.hiroyu, clameter
  Cc: Mel Gorman, linux-kernel, linux-mm

Kamezawa-san,

This version implements your idea for storing a zone pointer and zone_idx
in a structure within the zonelist instead of encoding information in a
pointer. It has worked out quite well. The performance is comparable on
the tests I've run, with gains/losses similar to those I've seen with pointer
packing, but this code may be easier to understand. However, the zonelist
has doubled in size and consumes more cache lines.
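As a rough illustration of the size trade-off, here is a minimal user-space
sketch, not the code from the patch. All sk_-prefixed names and the value of
SK_MAX_ZONES_PER_ZONELIST are assumptions made for the example. It compares
a zonelist of bare pointers with one whose entries carry the zone index next
to the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct zone; only its address matters here. */
struct sk_zone { int dummy; };

#define SK_MAX_ZONES_PER_ZONELIST 8	/* illustrative value only */

/* Pointer-only layout: one machine word per entry. */
struct sk_zonelist_ptrs {
	struct sk_zone *zones[SK_MAX_ZONES_PER_ZONELIST + 1];
};

/* Structure layout: each entry caches the zone index beside the pointer. */
struct sk_zoneref {
	struct sk_zone *zone;	/* the zone itself */
	int zone_idx;		/* index of that zone within its node */
};

struct sk_zonelist_refs {
	struct sk_zoneref zrefs[SK_MAX_ZONES_PER_ZONELIST + 1];
};

/* Fill in one entry of the structure-based zonelist. */
static void sk_zoneref_set(struct sk_zoneref *ref, struct sk_zone *zone,
			   int zone_idx)
{
	assert(ref != NULL && zone != NULL);
	ref->zone = zone;
	ref->zone_idx = zone_idx;
}

/* Size of the structure layout as a percentage of the pointer layout. */
static size_t sk_zonelist_growth_percent(void)
{
	return sizeof(struct sk_zonelist_refs) * 100 /
	       sizeof(struct sk_zonelist_ptrs);
}
```

On common ABIs, where an int is four bytes and a pointer is four or eight
bytes, sk_zonelist_growth_percent() is expected to report 200, which matches
the doubling described above.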

I did not put the node_idx into the structure as it was not clear that there
was a real gain from doing that as the node ID is so rarely used. However,
it would be trivial to add if it could be demonstrated to be of real benefit
on workloads that make heavy use of nodemasks. I do not have an appropriate
test environment for measuring that but perhaps someone else does. If they are
willing to check it out, I'll roll a suitable patch.

Any opinions on whether the slight gain in apparent performance in kernbench
is worth the cacheline? It's very difficult to craft a benchmark that notices
the extra line being used so this could be a hand-waving issue.

Changelog since V6
  o Instead of encoding zone index information in a pointer, this version
    introduces a structure that stores a zone pointer and its index 

Changelog since V5
  o Rebase to 2.6.23-rc4-mm1
  o Drop patch that replaces inline functions with macros

Changelog since V4
  o Rebase to -mm kernel. Host of memoryless patches collisions dealt with
  o Do not call wakeup_kswapd() for every zone in a zonelist
  o Dropped the FASTCALL removal
  o Have cursor in iterator advance earlier
  o Use nodes_and in cpuset_nodes_valid_mems_allowed()
  o Use defines instead of inlines, noticeably better performance on gcc-3.4
    No difference on later compilers such as gcc 4.1
  o Dropped gfp_skip patch until it is proven to be of benefit. Tests are
    currently inconclusive but it definitely consumes at least one cache
    line

Changelog since V3
  o Fix compile error in the parisc change
  o Calculate gfp_zone only once in __alloc_pages
  o Calculate classzone_idx properly in get_page_from_freelist
  o Alter check so that zone id embedded may still be used on UP
  o Use Kamezawa-san's suggestion for skipping zones in zonelist
  o Add __alloc_pages_nodemask() to filter zonelist based on a nodemask. This
    removes the need for MPOL_BIND to have a custom zonelist
  o Move zonelist iterators and helpers to mm.h
  o Change _zones from struct zone * to unsigned long
  
Changelog since V2
  o shrink_zones() uses zonelist instead of zonelist->zones
  o hugetlb uses zonelist iterator
  o zone_idx information is embedded in zonelist pointers
  o replace NODE_DATA(nid)->node_zonelist with node_zonelist(nid)

Changelog since V1
  o Break up the patch into 3 patches
  o Introduce iterators for zonelists
  o Performance regression test

The following patches replace multiple zonelists per node with one zonelist
that is filtered based on the GFP flags. The patches as a set fix a bug
with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset,
the MPOL_BIND will apply to the two highest zones when the highest zone
is ZONE_MOVABLE. This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that
filters only custom zonelists. As a bonus, the patchset reduces the cache
footprint of the kernel and should improve performance in a number of cases.
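
To make the idea of filtering one zonelist by GFP flags more concrete, the
following is a minimal user-space sketch, not the kernel implementation. The
flag bits, the sk_ prefix and the helper names are hypothetical; the real GFP
encoding and iterators differ. The point is only that a single list can serve
every allocation type if entries above the highest allowed zone are skipped.

```c
#include <assert.h>
#include <stddef.h>

enum sk_zone_type { SK_ZONE_DMA, SK_ZONE_NORMAL, SK_ZONE_HIGHMEM, SK_ZONE_MOVABLE };

/* Hypothetical flag bits; the real GFP flags are encoded differently. */
#define SK_GFP_DMA	0x1u
#define SK_GFP_HIGHMEM	0x2u
#define SK_GFP_MOVABLE	0x4u

struct sk_zoneref {
	int node;			/* node the zone belongs to */
	enum sk_zone_type zone_idx;	/* type of the zone */
};

/* Highest zone type an allocation with these flags may use. */
static enum sk_zone_type sk_gfp_zone(unsigned int flags)
{
	unsigned int both = SK_GFP_HIGHMEM | SK_GFP_MOVABLE;

	if (flags & SK_GFP_DMA)
		return SK_ZONE_DMA;
	if ((flags & both) == both)
		return SK_ZONE_MOVABLE;
	if (flags & SK_GFP_HIGHMEM)
		return SK_ZONE_HIGHMEM;
	return SK_ZONE_NORMAL;
}

/*
 * Walk one shared zonelist and copy the usable entries to out[], which must
 * have room for n entries. Returns the number of entries copied.
 */
static size_t sk_filter_zonelist(const struct sk_zoneref *list, size_t n,
				 unsigned int flags, struct sk_zoneref *out)
{
	enum sk_zone_type highest = sk_gfp_zone(flags);
	size_t i, used = 0;

	assert(list != NULL && out != NULL);
	for (i = 0; i < n; i++) {
		if (list[i].zone_idx > highest)
			continue;	/* zone is too high for this allocation */
		out[used++] = list[i];
	}
	return used;
}
```

A real iterator would skip entries in place rather than copy them; the copy
is used here only to keep the sketch easy to check.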

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist. The second patch introduces
a helper function node_zonelist() for looking up the appropriate zonelist
for a GFP mask which simplifies patches later in the set.

The third patch replaces multiple zonelists with two zonelists that are
filtered. The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.

The fourth patch introduces filtering of the zonelists based on a nodemask.

The final patch replaces the two zonelists with one zonelist. A nodemask is
created when __GFP_THISNODE is specified to filter the list. The nodelists
could be pre-allocated with one-per-node but it's not clear that __GFP_THISNODE
is used often enough to be worth the effort.
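
One way to picture the nodemask-based handling of __GFP_THISNODE is the
user-space sketch below. It is an assumption-laden illustration, not the
patch: sk_nodemask_t is a single unsigned long rather than the kernel's
nodemask_t, and every sk_-prefixed name is hypothetical. A one-node mask is
built on the fly and used to filter the single shared zonelist.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* One bit per node; enough for this sketch, unlike the real nodemask_t. */
typedef unsigned long sk_nodemask_t;

struct sk_zoneref {
	int node;	/* node the zone belongs to */
	int zone_idx;	/* zone type index */
};

/* Mask with only the given node set. */
static sk_nodemask_t sk_nodemask_of(int node)
{
	assert(node >= 0 && node < (int)(sizeof(sk_nodemask_t) * CHAR_BIT));
	return 1UL << node;
}

/*
 * Index of the first entry at or after start whose node is in the allowed
 * mask, or n if there is none.
 */
static size_t sk_next_allowed(const struct sk_zoneref *list, size_t n,
			      size_t start, sk_nodemask_t allowed)
{
	size_t i;

	assert(list != NULL);
	for (i = start; i < n; i++)
		if (allowed & sk_nodemask_of(list[i].node))
			return i;
	return n;
}

/* THISNODE-style lookup: filter the shared list with a one-node mask. */
static size_t sk_first_on_node(const struct sk_zoneref *list, size_t n,
			       int node)
{
	return sk_next_allowed(list, n, 0, sk_nodemask_of(node));
}
```

The same sk_next_allowed() helper would also serve a wider mask, such as
one supplied by a memory policy, which is what makes a single zonelist
plausible in this sketch.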

Performance results varied depending on the machine configuration but were
usually small performance gains. In real workloads the gain/loss will depend
on how much the userspace portion of the benchmark benefits from having more
cache available due to reduced referencing of zonelists.

This is the range of performance losses/gains when running against
2.6.23-rc3-mm1. The machines tested are a mix of i386, x86_64 and
ppc64, both NUMA and non-NUMA.

Total CPU time on Kernbench: -0.67% to  3.05%
Elapsed   time on Kernbench: -0.25% to  2.96%
page_test from aim9:         -6.98% to  5.60%
brk_test  from aim9:         -3.94% to  4.11%
fork_test from aim9:         -5.72% to  4.14%
exec_test from aim9:         -1.02% to  1.56%

The TBench figures were too variable between runs to draw conclusions from but
there didn't appear to be any regressions there. The hackbench results for both
sockets and pipes were within noise.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* [PATCH 0/6] Use two zonelists per node instead of multiple zonelists v11r2
@ 2007-12-11 20:21 Mel Gorman
  2007-12-11 20:22 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2007-12-11 20:21 UTC
  To: akpm
  Cc: Lee.Schermerhorn, Mel Gorman, linux-kernel, linux-mm, rientjes,
	kamezawa.hiroyu, clameter

This is a rebase of the two-zonelist patchset to 2.6.24-rc4-mm1 and some
warnings cleared up. The warnings were not picked up before as they were
introduced early in the set and cleared up by the end. This might have hurt
bisecting so they were worth fixing even if the end result was correct. Tests
looked good, both numactltest (slightly modified) and performance tests.

I believe Lee has been testing heavily with a version of the patchset
almost identical to this and hasn't complained. If Lee is happy enough,
can you merge these to -mm for wider testing please Andrew?

Changelog since V10
  o Rebase to 2.6.24-rc4-mm1
  o Clear up warnings in fs/buffer.c early in the patchset

Changelog since V9
  o Rebase to 2.6.24-rc2-mm1
  o Lookup the nodemask for each allocator callsite in mempolicy.c
  o Update NUMA statistics based on preferred zone, not first zonelist entry
  o When __GFP_THISNODE is specified with MPOL_BIND and the current node is
    not in the allowed nodemask, the first node in the mask will be used
  o Stick with using two zonelists instead of one because of excessive
    complexity with corner cases
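
The __GFP_THISNODE plus MPOL_BIND rule in the list above can be sketched in
user-space C as follows. This is only an illustration of the stated rule
under simplifying assumptions: the nodemask is a single unsigned long and
the function names are hypothetical, not taken from the patches.

```c
#include <assert.h>
#include <limits.h>

/* One bit per node; a simplification of the kernel's nodemask_t. */
typedef unsigned long sk_nodemask_t;

#define SK_NODE_BITS ((int)(sizeof(sk_nodemask_t) * CHAR_BIT))

/* Lowest-numbered node in the mask, or -1 if the mask is empty. */
static int sk_first_node(sk_nodemask_t mask)
{
	int node;

	for (node = 0; node < SK_NODE_BITS; node++)
		if (mask & (1UL << node))
			return node;
	return -1;
}

/*
 * Node to allocate from when a THISNODE-style request is made under a
 * bind-style policy: the current node if the policy allows it, otherwise
 * the first node in the allowed mask.
 */
static int sk_thisnode_under_bind(int current_node, sk_nodemask_t allowed)
{
	assert(current_node >= 0 && current_node < SK_NODE_BITS);
	if (allowed & (1UL << current_node))
		return current_node;
	return sk_first_node(allowed);
}
```

For example, with nodes 1 and 2 allowed (mask 0x6), a caller running on
node 0 would be directed to node 1 in this sketch.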

Changelog since V8
  o Rebase to 2.6.24-rc2
  o Added ack for the OOM changes
  o Behave correctly when GFP_THISNODE and a node ID are specified
  o Clear up warning over type of nodes_intersects() function

Changelog since V7
  o Rebase to 2.6.23-rc8-mm2

Changelog since V6
  o Fix build bug in relation to memory controller combined with one-zonelist
  o Use while() instead of a stupid looking for()
  o Instead of encoding zone index information in a pointer, this version
    introduces a structure that stores a zone pointer and its index 

Changelog since V5
  o Rebase to 2.6.23-rc4-mm1
  o Drop patch that replaces inline functions with macros

Changelog since V4
  o Rebase to -mm kernel. Host of memoryless patches collisions dealt with
  o Do not call wakeup_kswapd() for every zone in a zonelist
  o Dropped the FASTCALL removal
  o Have cursor in iterator advance earlier
  o Use nodes_and in cpuset_nodes_valid_mems_allowed()
  o Use defines instead of inlines, noticeably better performance on gcc-3.4
    No difference on later compilers such as gcc 4.1
  o Dropped gfp_skip patch until it is proven to be of benefit. Tests are
    currently inconclusive but it definitely consumes at least one cache
    line

Changelog since V3
  o Fix compile error in the parisc change
  o Calculate gfp_zone only once in __alloc_pages
  o Calculate classzone_idx properly in get_page_from_freelist
  o Alter check so that zone id embedded may still be used on UP
  o Use Kamezawa-san's suggestion for skipping zones in zonelist
  o Add __alloc_pages_nodemask() to filter zonelist based on a nodemask. This
    removes the need for MPOL_BIND to have a custom zonelist
  o Move zonelist iterators and helpers to mm.h
  o Change _zones from struct zone * to unsigned long
  
Changelog since V2
  o shrink_zones() uses zonelist instead of zonelist->zones
  o hugetlb uses zonelist iterator
  o zone_idx information is embedded in zonelist pointers
  o replace NODE_DATA(nid)->node_zonelist with node_zonelist(nid)

Changelog since V1
  o Break up the patch into 3 patches
  o Introduce iterators for zonelists
  o Performance regression test

The following patches replace multiple zonelists per node with two zonelists
that are filtered based on the GFP flags. The patches as a set fix a bug
with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset,
the MPOL_BIND will apply to the two highest zones when the highest zone
is ZONE_MOVABLE. This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that
filters only custom zonelists.

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist.

The second patch introduces a helper function node_zonelist() for looking
up the appropriate zonelist for a GFP mask which simplifies patches later
in the set.

The third patch replaces multiple zonelists with two zonelists that are
filtered. The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.
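
A lookup helper over two zonelists per node might look roughly like the
user-space sketch below. The structure layout, the flag value and all
sk_-prefixed names are assumptions made for illustration; only the idea of
choosing between a general list and a __GFP_THISNODE list comes from the
description above.

```c
#include <assert.h>
#include <stddef.h>

#define SK_GFP_THISNODE	0x1u	/* hypothetical flag bit */
#define SK_MAX_NODES	4	/* illustrative node count */

/* Placeholder for a zonelist; its contents are irrelevant to the lookup. */
struct sk_zonelist { int id; };

struct sk_pgdat {
	/* [0]: fallback list spanning all nodes, [1]: this node's zones only */
	struct sk_zonelist node_zonelists[2];
};

static struct sk_pgdat sk_node_data[SK_MAX_NODES];

/* Which of the two per-node zonelists a set of flags selects. */
static int sk_gfp_zonelist(unsigned int flags)
{
	return (flags & SK_GFP_THISNODE) ? 1 : 0;
}

/* Look up the zonelist for a node and a set of allocation flags. */
static struct sk_zonelist *sk_node_zonelist(int nid, unsigned int flags)
{
	assert(nid >= 0 && nid < SK_MAX_NODES);
	return &sk_node_data[nid].node_zonelists[sk_gfp_zonelist(flags)];
}
```

Callers that go through such a helper do not need to know how many
zonelists exist, which is presumably why it simplifies later patches.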

The fourth patch introduces helper macros for retrieving the zone and node indices of entries in a zonelist.
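
Helper macros of this kind could take a shape like the following sketch. It
assumes a zonelist entry that holds a zone pointer and a cached zone index,
and a zone that records its node; the sk_-prefixed names are hypothetical
and are not the macros from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a zone: only the owning node is modelled. */
struct sk_zone { int node; };

/* Zonelist entry: zone pointer plus its cached zone index. */
struct sk_zoneref {
	struct sk_zone *zone;
	int zone_idx;
};

/* Accessors that hide how an entry stores its information. */
#define sk_zonelist_zone(zref)		((zref)->zone)
#define sk_zonelist_zone_idx(zref)	((zref)->zone_idx)
#define sk_zonelist_node_idx(zref)	((zref)->zone->node)

/* Initialise an entry from a zone and its index. */
static void sk_zoneref_set(struct sk_zoneref *zref, struct sk_zone *zone,
			   int zone_idx)
{
	assert(zref != NULL && zone != NULL);
	zref->zone = zone;
	zref->zone_idx = zone_idx;
}
```

Because callers only use the accessors, the storage behind them could later
change, for example to cache the node ID as well, without touching the call
sites.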

The final patch introduces filtering of the zonelists based on a nodemask. Two
zonelists exist per node, one for normal allocations and one for __GFP_THISNODE.

Performance results varied depending on the machine configuration. In real
workloads the gain/loss will depend on how much the userspace portion of
the benchmark benefits from having more cache available due to reduced
referencing of zonelists.

This is the range of performance losses/gains when running against
2.6.24-rc4-mm1. The machines tested are a mix of i386, x86_64 and
ppc64, both NUMA and non-NUMA.

			     loss   to  gain
Total CPU time on Kernbench: -0.86% to  1.13%
Elapsed   time on Kernbench: -0.79% to  0.76%
page_test from aim9:         -4.37% to  0.79%
brk_test  from aim9:         -0.71% to  4.07%
fork_test from aim9:         -1.84% to  4.60%
exec_test from aim9:         -0.71% to  1.08%

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* [PATCH 0/6] Use two zonelists per node instead of multiple zonelists v10
@ 2007-11-21  0:38 Mel Gorman
  2007-11-21  0:39 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2007-11-21  0:38 UTC
  To: Lee.Schermerhorn, clameter
  Cc: Mel Gorman, linux-kernel, kamezawa.hiroyu, linux-mm

This release brings the number of zonelists to two instead of one. Getting
all the corner cases right for __GFP_THISNODE and one zonelist was turning
into a complicated mess. Not only was it affecting too many paths but it
reached the point where it should be reviewed as a standalone change.

Many of the aims of the earlier sets are met by having two zonelists. The
hack is still removed, the number of zonelists is reduced and the MPOL_BIND
policy still behaves sensibly. I believe this to be a reasonable starting
point leaving the full one-zonelist approach to be tackled later.

There were a few bugs and issues highlighted by reviews that have been fixed
up; they are briefly described in the changelog.

There are concerns over the stability of mainline and -mm at the moment
and the evidence is on http://test.kernel.org so we should verify for sure
it is still ok. The set passes a slightly modified numactl regression test
on x86_64. The slight modification was required because numastat behaves
differently than the regression test expects (nodes in reverse order). Lee,
can you confirm it still hasn't regressed with your tests before another
attempt is made to push it please?

Changelog since V9
  o Rebase to 2.6.24-rc2-mm1
  o Lookup the nodemask for each allocator callsite in mempolicy.c
  o Update NUMA statistics based on preferred zone, not first zonelist entry
  o When __GFP_THISNODE is specified with MPOL_BIND and the current node is
    not in the allowed nodemask, the first node in the mask will be used
  o Stick with using two zonelists instead of one because of excessive
    complexity with corner cases

Changelog since V8
  o Rebase to 2.6.24-rc2
  o Added ack for the OOM changes
  o Behave correctly when GFP_THISNODE and a node ID are specified
  o Clear up warning over type of nodes_intersects() function

Changelog since V7
  o Rebase to 2.6.23-rc8-mm2

Changelog since V6
  o Fix build bug in relation to memory controller combined with one-zonelist
  o Use while() instead of a stupid looking for()
  o Instead of encoding zone index information in a pointer, this version
    introduces a structure that stores a zone pointer and its index 

Changelog since V5
  o Rebase to 2.6.23-rc4-mm1
  o Drop patch that replaces inline functions with macros

Changelog since V4
  o Rebase to -mm kernel. Host of memoryless patches collisions dealt with
  o Do not call wakeup_kswapd() for every zone in a zonelist
  o Dropped the FASTCALL removal
  o Have cursor in iterator advance earlier
  o Use nodes_and in cpuset_nodes_valid_mems_allowed()
  o Use defines instead of inlines, noticeably better performance on gcc-3.4
    No difference on later compilers such as gcc 4.1
  o Dropped gfp_skip patch until it is proven to be of benefit. Tests are
    currently inconclusive but it definitely consumes at least one cache
    line

Changelog since V3
  o Fix compile error in the parisc change
  o Calculate gfp_zone only once in __alloc_pages
  o Calculate classzone_idx properly in get_page_from_freelist
  o Alter check so that zone id embedded may still be used on UP
  o Use Kamezawa-san's suggestion for skipping zones in zonelist
  o Add __alloc_pages_nodemask() to filter zonelist based on a nodemask. This
    removes the need for MPOL_BIND to have a custom zonelist
  o Move zonelist iterators and helpers to mm.h
  o Change _zones from struct zone * to unsigned long
  
Changelog since V2
  o shrink_zones() uses zonelist instead of zonelist->zones
  o hugetlb uses zonelist iterator
  o zone_idx information is embedded in zonelist pointers
  o replace NODE_DATA(nid)->node_zonelist with node_zonelist(nid)

Changelog since V1
  o Break up the patch into 3 patches
  o Introduce iterators for zonelists
  o Performance regression test

The following patches replace multiple zonelists per node with two zonelists
that are filtered based on the GFP flags. The patches as a set fix a bug
with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset,
the MPOL_BIND will apply to the two highest zones when the highest zone
is ZONE_MOVABLE. This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that
filters only custom zonelists.

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist.

The second patch introduces a helper function node_zonelist() for looking
up the appropriate zonelist for a GFP mask which simplifies patches later
in the set.

The third patch replaces multiple zonelists with two zonelists that are
filtered. The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.

The fourth patch introduces helper macros for retrieving the zone and node indices of entries in a zonelist.

The final patch introduces filtering of the zonelists based on a nodemask. Two
zonelists exist per node, one for normal allocations and one for __GFP_THISNODE.

Performance results varied depending on the machine configuration but were
usually small performance gains. In real workloads the gain/loss will depend
on how much the userspace portion of the benchmark benefits from having more
cache available due to reduced referencing of zonelists.

This is the range of performance losses/gains when running against
2.6.24-rc2-mm1. The machines tested are a mix of i386, x86_64 and
ppc64, both NUMA and non-NUMA.

			     loss   to  gain
Total CPU time on Kernbench: -1.54% to  0.54%
Elapsed   time on Kernbench: -0.75% to  0.42%
page_test from aim9:         -8.23% to 10.71%
brk_test  from aim9:         -3.32% to  4.78%
fork_test from aim9:         -0.44% to  0.38%
exec_test from aim9:         -0.95% to  1.11%

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v9
@ 2007-11-09 14:32 Mel Gorman
  2007-11-09 14:32 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2007-11-09 14:32 UTC
  To: akpm
  Cc: Lee.Schermerhorn, Mel Gorman, linux-kernel, linux-mm, rientjes,
	nacc, kamezawa.hiroyu, clameter

This is basically a rebase to the broken-out -mm tree. Since v8, two fixes
have been applied that showed up during testing. Most machines I test -mm
on are failing to boot for a variety of reasons but on the two machines
that did work, they appeared to work fine.

Changelog since V8
  o Rebase to 2.6.24-rc2
  o Added ack for the OOM changes
  o Behave correctly when GFP_THISNODE and a node ID are specified
  o Clear up warning over type of nodes_intersects() function

Changelog since V7
  o Rebase to 2.6.23-rc8-mm2

Changelog since V6
  o Fix build bug in relation to memory controller combined with one-zonelist
  o Use while() instead of a stupid looking for()
  o Instead of encoding zone index information in a pointer, this version
    introduces a structure that stores a zone pointer and its index 

Changelog since V5
  o Rebase to 2.6.23-rc4-mm1
  o Drop patch that replaces inline functions with macros

Changelog since V4
  o Rebase to -mm kernel. Host of memoryless patches collisions dealt with
  o Do not call wakeup_kswapd() for every zone in a zonelist
  o Dropped the FASTCALL removal
  o Have cursor in iterator advance earlier
  o Use nodes_and in cpuset_nodes_valid_mems_allowed()
  o Use defines instead of inlines, noticeably better performance on gcc-3.4
    No difference on later compilers such as gcc 4.1
  o Dropped gfp_skip patch until it is proven to be of benefit. Tests are
    currently inconclusive but it definitely consumes at least one cache
    line

Changelog since V3
  o Fix compile error in the parisc change
  o Calculate gfp_zone only once in __alloc_pages
  o Calculate classzone_idx properly in get_page_from_freelist
  o Alter check so that zone id embedded may still be used on UP
  o Use Kamezawa-san's suggestion for skipping zones in zonelist
  o Add __alloc_pages_nodemask() to filter zonelist based on a nodemask. This
    removes the need for MPOL_BIND to have a custom zonelist
  o Move zonelist iterators and helpers to mm.h
  o Change _zones from struct zone * to unsigned long
  
Changelog since V2
  o shrink_zones() uses zonelist instead of zonelist->zones
  o hugetlb uses zonelist iterator
  o zone_idx information is embedded in zonelist pointers
  o replace NODE_DATA(nid)->node_zonelist with node_zonelist(nid)

Changelog since V1
  o Break up the patch into 3 patches
  o Introduce iterators for zonelists
  o Performance regression test

The following patches replace multiple zonelists per node with one zonelist
that is filtered based on the GFP flags. The patches as a set fix a bug
with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset,
the MPOL_BIND will apply to the two highest zones when the highest zone
is ZONE_MOVABLE. This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that
filters only custom zonelists. As a bonus, the patchset reduces the cache
footprint of the kernel and should improve performance in a number of cases.

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist. The second patch introduces
a helper function node_zonelist() for looking up the appropriate zonelist
for a GFP mask which simplifies patches later in the set.

The third patch replaces multiple zonelists with two zonelists that are
filtered. The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.

The fourth patch introduces helper macros for retrieving the zone and node indices of entries in a zonelist.

The fifth patch introduces filtering of the zonelists based on a nodemask.

The final patch replaces the two zonelists with one zonelist. A nodemask is
created when __GFP_THISNODE is specified to filter the list. The nodelists
could be pre-allocated with one-per-node but it's not clear that __GFP_THISNODE
is used often enough to be worth the effort.

Performance results varied depending on the machine configuration but were
usually small performance gains. In real workloads the gain/loss will depend
on how much the userspace portion of the benchmark benefits from having more
cache available due to reduced referencing of zonelists.

This is the range of performance losses/gains when running against
2.6.23-rc3-mm1. The machines tested are a mix of i386, x86_64 and
ppc64, both NUMA and non-NUMA.

Total CPU time on Kernbench: -0.67% to  3.05%
Elapsed   time on Kernbench: -0.25% to  2.96%
page_test from aim9:         -6.98% to  5.60%
brk_test  from aim9:         -3.94% to  4.11%
fork_test from aim9:         -5.72% to  4.14%
exec_test from aim9:         -1.02% to  1.56%

The TBench figures were too variable between runs to draw conclusions from but
there didn't appear to be any regressions there. The hackbench results for both
sockets and pipes were within noise.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v8
@ 2007-09-28 14:23 Mel Gorman
  2007-09-28 14:23 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2007-09-28 14:23 UTC
  To: akpm
  Cc: Lee.Schermerhorn, Mel Gorman, linux-kernel, linux-mm, rientjes,
	kamezawa.hiroyu, clameter

Hi Andrew,

This is the one-zonelist patchset again. There were multiple collisions
with patches in -mm like the policy cleanups, policy refcounting, the memory
controller patches and OOM killer changes. The functionality of the code has
not changed since the last release. I'm still hoping to merge this to -mm
when it is considered a bit more stable.

I've added David Rientjes to the cc as the OOM-zone-locking code is affected
by this patchset now and I want to be sure I didn't accidentally break it. The
changes to try_set_zone_oom() are the most important here. I believe the
code is equivalent but a second opinion would not hurt.
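
For reviewers who want a picture of the behaviour that needs to be
preserved, here is a single-threaded user-space sketch. It assumes that the
job of try_set_zone_oom() is to mark every zone in a zonelist as OOM-locked
unless one of them is already locked. The sk_-prefixed names are
hypothetical and the locking that the real code needs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a zone: only the OOM lock flag is modelled. */
struct sk_zone { bool oom_locked; };

/* Zonelist entry: zone pointer plus its cached zone index. */
struct sk_zoneref {
	struct sk_zone *zone;
	int zone_idx;
};

/*
 * Mark all zones in the list as OOM-locked and return true, unless any
 * zone is already locked, in which case nothing is changed.
 */
static bool sk_try_set_zone_oom(struct sk_zoneref *zrefs, size_t n)
{
	size_t i;

	assert(zrefs != NULL);
	for (i = 0; i < n; i++)
		if (zrefs[i].zone->oom_locked)
			return false;	/* someone else holds a zone */
	for (i = 0; i < n; i++)
		zrefs[i].zone->oom_locked = true;
	return true;
}

/* Release the OOM lock on every zone in the list. */
static void sk_clear_zonelist_oom(struct sk_zoneref *zrefs, size_t n)
{
	size_t i;

	assert(zrefs != NULL);
	for (i = 0; i < n; i++)
		zrefs[i].zone->oom_locked = false;
}
```

The walk over zoneref entries should visit the same zones in the same order
as a walk over an array of zone pointers would, which is the sense in which
the two forms are meant to be equivalent in this sketch.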

Changelog since V7
  o Rebase to 2.6.23-rc8-mm2

Changelog since V6
  o Fix build bug in relation to memory controller combined with one-zonelist
  o Use while() instead of a stupid looking for()
  o Instead of encoding zone index information in a pointer, this version
    introduces a structure that stores a zone pointer and its index 

Changelog since V5
  o Rebase to 2.6.23-rc4-mm1
  o Drop patch that replaces inline functions with macros

Changelog since V4
  o Rebase to -mm kernel. Host of memoryless patches collisions dealt with
  o Do not call wakeup_kswapd() for every zone in a zonelist
  o Dropped the FASTCALL removal
  o Have cursor in iterator advance earlier
  o Use nodes_and in cpuset_nodes_valid_mems_allowed()
  o Use defines instead of inlines, noticeably better performance on gcc-3.4
    No difference on later compilers such as gcc 4.1
  o Dropped gfp_skip patch until it is proven to be of benefit. Tests are
    currently inconclusive but it definitely consumes at least one cache
    line

Changelog since V3
  o Fix compile error in the parisc change
  o Calculate gfp_zone only once in __alloc_pages
  o Calculate classzone_idx properly in get_page_from_freelist
  o Alter check so that zone id embedded may still be used on UP
  o Use Kamezawa-san's suggestion for skipping zones in zonelist
  o Add __alloc_pages_nodemask() to filter zonelist based on a nodemask. This
    removes the need for MPOL_BIND to have a custom zonelist
  o Move zonelist iterators and helpers to mm.h
  o Change _zones from struct zone * to unsigned long
  
Changelog since V2
  o shrink_zones() uses zonelist instead of zonelist->zones
  o hugetlb uses zonelist iterator
  o zone_idx information is embedded in zonelist pointers
  o replace NODE_DATA(nid)->node_zonelist with node_zonelist(nid)

Changelog since V1
  o Break up the patch into 3 patches
  o Introduce iterators for zonelists
  o Performance regression test

The following patches replace multiple zonelists per node with one zonelist
that is filtered based on the GFP flags. The patches as a set fix a bug
with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset,
the MPOL_BIND will apply to the two highest zones when the highest zone
is ZONE_MOVABLE. This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that
filters only custom zonelists. As a bonus, the patchset reduces the cache
footprint of the kernel and should improve performance in a number of cases.

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist. The second patch introduces
a helper function node_zonelist() for looking up the appropriate zonelist
for a GFP mask which simplifies patches later in the set.

The third patch replaces multiple zonelists with two zonelists that are
filtered. The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.

The fourth patch introduces helper macros for retrieving the zone and node indices of entries in a zonelist.

The fifth patch introduces filtering of the zonelists based on a nodemask.

The final patch replaces the two zonelists with one zonelist. A nodemask is
created when __GFP_THISNODE is specified to filter the list. The nodelists
could be pre-allocated with one-per-node but it's not clear that __GFP_THISNODE
is used often enough to be worth the effort.

Performance results varied depending on the machine configuration but were
usually small performance gains. In real workloads the gain/loss will depend
on how much the userspace portion of the benchmark benefits from having more
cache available due to reduced referencing of zonelists.

This is the range of performance losses/gains when running against
2.6.23-rc3-mm1. The machines tested are a mix of i386, x86_64 and
ppc64, both NUMA and non-NUMA.

Total CPU time on Kernbench: -0.67% to  3.05%
Elapsed   time on Kernbench: -0.25% to  2.96%
page_test from aim9:         -6.98% to  5.60%
brk_test  from aim9:         -3.94% to  4.11%
fork_test from aim9:         -5.72% to  4.14%
exec_test from aim9:         -1.02% to  1.56%

The TBench figures were too variable between runs to draw conclusions from but
there didn't appear to be any regressions there. The hackbench results for both
sockets and pipes were within noise.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v7
@ 2007-09-13 17:52 Mel Gorman
  2007-09-13 17:52 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2007-09-13 17:52 UTC
  To: Lee.Schermerhorn
  Cc: Lee.Schermerhorn, Mel Gorman, linux-kernel, linux-mm,
	kamezawa.hiroyu, clameter

Hi Lee,

This is the patchset I would like tested. It has Kamezawa-san's approach for
using a structure instead of pointer packing. While it consumes more cache
as Christoph pointed out, it should be an easier starting point to optimise
once workloads are identified that can show performance gains/regressions. The
pointer packing is a potential optimisation but once in place, it's difficult
to alter again.
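
For comparison, pointer packing of the kind mentioned above can be sketched
in user-space C as follows. This is an illustration under assumptions, not
the earlier patch: it relies on zone structures being aligned so that the
low two bits of their address are zero, and all sk_-prefixed names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_ZONE_IDX_BITS 2
#define SK_ZONE_IDX_MASK ((uintptr_t)((1u << SK_ZONE_IDX_BITS) - 1))

/* The long member gives the structure at least four-byte alignment. */
struct sk_zone { long pad; };

/* Store the zone index in the unused low bits of the zone's address. */
static uintptr_t sk_encode_zone(struct sk_zone *zone, unsigned int zone_idx)
{
	assert(((uintptr_t)zone & SK_ZONE_IDX_MASK) == 0);	/* bits free? */
	assert(zone_idx <= SK_ZONE_IDX_MASK);			/* index fits? */
	return (uintptr_t)zone | zone_idx;
}

/* Recover the zone pointer by masking off the index bits. */
static struct sk_zone *sk_decode_zone(uintptr_t packed)
{
	return (struct sk_zone *)(packed & ~SK_ZONE_IDX_MASK);
}

/* Recover the zone index from the low bits. */
static unsigned int sk_decode_zone_idx(uintptr_t packed)
{
	return (unsigned int)(packed & SK_ZONE_IDX_MASK);
}
```

The packed form keeps each entry at one word, but every user has to go
through the encode and decode helpers, which hints at why it is harder to
change once in place.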

Please let me know how it works out for you.

Changelog since V7
  o Fix build bug in relation to memory controller combined with one-zonelist
  o Use while() instead of a stupid looking for()

Changelog since V6
  o Instead of encoding zone index information in a pointer, this version
    introduces a structure that stores a zone pointer and its index 

Changelog since V5
  o Rebase to 2.6.23-rc4-mm1
  o Drop patch that replaces inline functions with macros

Changelog since V4
  o Rebase to -mm kernel. Host of memoryless patches collisions dealt with
  o Do not call wakeup_kswapd() for every zone in a zonelist
  o Dropped the FASTCALL removal
  o Have cursor in iterator advance earlier
  o Use nodes_and in cpuset_nodes_valid_mems_allowed()
  o Use defines instead of inlines, noticeably better performance on gcc-3.4
    No difference on later compilers such as gcc 4.1
  o Dropped gfp_skip patch until it is proven to be of benefit. Tests are
    currently inconclusive but it definitely consumes at least one cache
    line

Changelog since V3
  o Fix compile error in the parisc change
  o Calculate gfp_zone only once in __alloc_pages
  o Calculate classzone_idx properly in get_page_from_freelist
  o Alter check so that zone id embedded may still be used on UP
  o Use Kamezawa-san's suggestion for skipping zones in zonelist
  o Add __alloc_pages_nodemask() to filter zonelist based on a nodemask. This
    removes the need for MPOL_BIND to have a custom zonelist
  o Move zonelist iterators and helpers to mm.h
  o Change _zones from struct zone * to unsigned long
  
Changelog since V2
  o shrink_zones() uses zonelist instead of zonelist->zones
  o hugetlb uses zonelist iterator
  o zone_idx information is embedded in zonelist pointers
  o replace NODE_DATA(nid)->node_zonelist with node_zonelist(nid)

Changelog since V1
  o Break up the patch into 3 patches
  o Introduce iterators for zonelists
  o Performance regression test

The following patches replace multiple zonelists per node with one zonelist
that is filtered based on the GFP flags. The patches as a set fix a bug
with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset,
the MPOL_BIND will apply to the two highest zones when the highest zone
is ZONE_MOVABLE. This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that
filters only custom zonelists. As a bonus, the patchset reduces the cache
footprint of the kernel and should improve performance in a number of cases.
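
To make "one zonelist that is filtered based on the GFP flags" concrete,
here is a rough userspace sketch. It assumes the list is ordered from
most to least preferred zone and that the flags reduce to a highest
usable zone index. The SK_GFP_* values, the flag-to-zone mapping,
highest_zone_idx() and next_usable() are all assumptions for
illustration, not the kernel's definitions.

```c
#include <stddef.h>

/* Zone indices for the sketch; a real kernel's set depends on config. */
enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

/* Hypothetical flag bits, not the kernel's real __GFP_* values. */
#define SK_GFP_DMA	0x1u
#define SK_GFP_HIGHMEM	0x2u
#define SK_GFP_MOVABLE	0x4u

struct zone_entry {
	int nid;	/* node the zone belongs to */
	int zone_idx;	/* index of the zone; -1 terminates the list */
};

/* Highest zone index an allocation with these flags may use. */
static int highest_zone_idx(unsigned int gfp)
{
	if (gfp & SK_GFP_DMA)
		return ZONE_DMA;
	if ((gfp & SK_GFP_HIGHMEM) && (gfp & SK_GFP_MOVABLE))
		return ZONE_MOVABLE;
	if (gfp & SK_GFP_HIGHMEM)
		return ZONE_HIGHMEM;
	return ZONE_NORMAL;
}

/*
 * Return the first entry at or after z whose zone index does not
 * exceed the limit, or NULL once the terminator is reached.
 */
static const struct zone_entry *next_usable(const struct zone_entry *z,
					    int highest)
{
	while (z->zone_idx != -1 && z->zone_idx > highest)
		z++;
	return z->zone_idx == -1 ? NULL : z;
}
```

Every allocation walks the same array; the flags only decide which
entries get skipped.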

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist. The second patch introduces
a helper function node_zonelist() for looking up the appropriate zonelist
for a GFP mask which simplifies patches later in the set.
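
As a sketch of what a helper like this buys callers, assume each node
still carries an array of zonelists indexed by a value derived from
the GFP mask. The sk_ names, the table layout and the two-argument
signature below are assumptions for illustration; only the idea of
hiding the lookup behind one function comes from the description.

```c
#define SK_MAX_NODES	4
#define SK_NR_ZONELISTS	3

/* Hypothetical flag bits, not the kernel's real __GFP_* values. */
#define SK_GFP_DMA	0x1u
#define SK_GFP_HIGHMEM	0x2u

struct zonelist_sketch {
	int placeholder;	/* contents are irrelevant to the lookup */
};

struct node_data_sketch {
	struct zonelist_sketch node_zonelists[SK_NR_ZONELISTS];
};

static struct node_data_sketch sk_node_data[SK_MAX_NODES];

/* Map the flags to the index of the zonelist to use. */
static int sk_gfp_zone(unsigned int gfp)
{
	if (gfp & SK_GFP_DMA)
		return 0;
	if (gfp & SK_GFP_HIGHMEM)
		return 2;
	return 1;
}

/* Single place that knows how zonelists are laid out per node. */
static struct zonelist_sketch *sk_node_zonelist(int nid, unsigned int gfp)
{
	return &sk_node_data[nid].node_zonelists[sk_gfp_zone(gfp)];
}
```

Once callers stop indexing the per-node array themselves, later patches
can change the number of zonelists by touching only the helper.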

The third patch replaces multiple zonelists with two zonelists that are
filtered. The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.

The fourth patch introduces filtering of the zonelists based on a nodemask.
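
A rough sketch of nodemask filtering layered on top of the zone index
check. It assumes a nodemask that fits in one word and a NULL mask
meaning "any node"; sk_nodemask_t, sk_node_isset() and next_allowed()
are invented names, not the patchset's.

```c
#include <stddef.h>

struct zone_entry {
	int nid;	/* node the zone belongs to */
	int zone_idx;	/* index of the zone; -1 terminates the list */
};

/* Hypothetical one-word nodemask: bit n set means node n is allowed. */
typedef unsigned long sk_nodemask_t;

static int sk_node_isset(int nid, sk_nodemask_t mask)
{
	return (int)((mask >> nid) & 1UL);
}

/*
 * Return the first entry at or after z that passes both filters:
 * zone index no higher than "highest" and, when a mask is supplied,
 * a node that is set in the mask. NULL means nothing is left.
 */
static const struct zone_entry *
next_allowed(const struct zone_entry *z, int highest,
	     const sk_nodemask_t *nodes)
{
	for (; z->zone_idx != -1; z++) {
		if (z->zone_idx > highest)
			continue;
		if (nodes && !sk_node_isset(z->nid, *nodes))
			continue;
		return z;
	}
	return NULL;
}
```

With this shape a bound policy only has to supply a mask; it does not
need its own copy of the zonelist.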

The final patch replaces the two zonelists with one zonelist. A nodemask is
created when __GFP_THISNODE is specified to filter the list. The nodelists
could be pre-allocated with one-per-node but it's not clear that __GFP_THISNODE
is used often enough to be worth the effort.
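
The "nodemask is created when __GFP_THISNODE is specified" step might
look roughly like this. It is a sketch under assumptions: a one-word
nodemask, a made-up flag value, and a hypothetical helper
effective_nodemask() that builds the mask in caller-provided storage
rather than using a pre-allocated per-node mask.

```c
#include <stddef.h>

/* Hypothetical one-word nodemask: bit n set means node n is allowed. */
typedef unsigned long sk_nodemask_t;

/* Hypothetical flag bit, not the kernel's real __GFP_THISNODE value. */
#define SK_GFP_THISNODE	0x8u

/*
 * Pick the nodemask to filter the single zonelist with.
 * With THISNODE set, a one-node mask is built in *scratch and returned.
 * Otherwise the caller's mask is passed through (NULL means any node).
 */
static const sk_nodemask_t *
effective_nodemask(unsigned int gfp, int nid,
		   const sk_nodemask_t *requested, sk_nodemask_t *scratch)
{
	if (gfp & SK_GFP_THISNODE) {
		*scratch = 1UL << nid;
		return scratch;
	}
	return requested;
}
```

Pre-allocating one mask per node would remove the per-call setup, which
is the trade-off the paragraph above leaves open.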

Performance results varied depending on the machine configuration but were
usually small performance gains. In real workloads the gain/loss will depend
on how much the userspace portion of the benchmark benefits from having more
cache available due to reduced referencing of zonelists.

This is the range of performance losses/gains when running against
2.6.23-rc3-mm1. The test machines are a mix of i386, x86_64 and
ppc64, both NUMA and non-NUMA.

Total CPU time on Kernbench: -0.67% to  3.05%
Elapsed   time on Kernbench: -0.25% to  2.96%
page_test from aim9:         -6.98% to  5.60%
brk_test  from aim9:         -3.94% to  4.11%
fork_test from aim9:         -5.72% to  4.14%
exec_test from aim9:         -1.02% to  1.56%

The TBench figures were too variable between runs to draw conclusions from but
there didn't appear to be any regressions there. The hackbench results for both
sockets and pipes were within noise.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v5 (resend)
@ 2007-09-11 21:30 Mel Gorman
  2007-09-11 21:30 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2007-09-11 21:30 UTC (permalink / raw)
  To: Lee.Schermerhorn, akpm, ak, clameter; +Cc: Mel Gorman, linux-kernel, linux-mm

(Sorry for the resend, I mucked up the TO: line in the earlier sending)

This is the latest version of one-zonelist and it should be solid enough
for wider testing. To briefly summarise, the patchset replaces multiple
zonelists-per-node with one zonelist that is filtered based on nodemask and
GFP flags. I've dropped the patch that replaces inline functions with macros
from the end as it obscures the code for something that may or may not be a
performance benefit on older compilers. If we see performance regressions that
might have something to do with it, the patch is trivial to bring forward.

Andrew, please merge to -mm for wider testing and consideration for merging
to mainline. Minimally, it gets rid of the hack in relation to ZONE_MOVABLE
and MPOL_BIND.

Changelog since V5
  o Rebase to 2.6.23-rc4-mm1
  o Drop patch that replaces inline functions with macros

Changelog since V4
  o Rebase to -mm kernel. Host of memoryless patches collisions dealt with
  o Do not call wakeup_kswapd() for every zone in a zonelist
  o Dropped the FASTCALL removal
  o Have cursor in iterator advance earlier
  o Use nodes_and in cpuset_nodes_valid_mems_allowed()
  o Use defines instead of inlines, noticeably better performance on gcc-3.4
    No difference on later compilers such as gcc 4.1
  o Dropped gfp_skip patch until it is proven to be of benefit. Tests are
    currently inconclusive but it definitely consumes at least one cache
    line

Changelog since V3
  o Fix compile error in the parisc change
  o Calculate gfp_zone only once in __alloc_pages
  o Calculate classzone_idx properly in get_page_from_freelist
  o Alter check so that zone id embedded may still be used on UP
  o Use Kamezawa-san's suggestion for skipping zones in zonelist
  o Add __alloc_pages_nodemask() to filter zonelist based on a nodemask. This
    removes the need for MPOL_BIND to have a custom zonelist
  o Move zonelist iterators and helpers to mm.h
  o Change _zones from struct zone * to unsigned long
  
Changelog since V2
  o shrink_zones() uses zonelist instead of zonelist->zones
  o hugetlb uses zonelist iterator
  o zone_idx information is embedded in zonelist pointers
  o replace NODE_DATA(nid)->node_zonelist with node_zonelist(nid)

Changelog since V1
  o Break up the patch into 3 patches
  o Introduce iterators for zonelists
  o Performance regression test

The following patches replace multiple zonelists per node with one zonelist
that is filtered based on the GFP flags. The patches as a set fix a bug
with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset,
the MPOL_BIND will apply to the two highest zones when the highest zone
is ZONE_MOVABLE. This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that
filters only custom zonelists. As a bonus, the patchset reduces the cache
footprint of the kernel and should improve performance in a number of cases.

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist. The second patch introduces
a helper function node_zonelist() for looking up the appropriate zonelist
for a GFP mask which simplifies patches later in the set.

The third patch replaces multiple zonelists with two zonelists that are
filtered. The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.

The fourth patch introduces filtering of the zonelists based on a nodemask.

The final patch replaces the two zonelists with one zonelist. A nodemask is
created when __GFP_THISNODE is specified to filter the list. The nodelists
could be pre-allocated with one-per-node but it's not clear that __GFP_THISNODE
is used often enough to be worth the effort.

Performance results varied depending on the machine configuration but were
usually small performance gains. In real workloads the gain/loss will depend
on how much the userspace portion of the benchmark benefits from having more
cache available due to reduced referencing of zonelists.

This is the range of performance losses/gains when running against
2.6.23-rc3-mm1. The test machines are a mix of i386, x86_64 and
ppc64, both NUMA and non-NUMA.

Total CPU time on Kernbench: -0.67% to  3.05%
Elapsed   time on Kernbench: -0.25% to  2.96%
page_test from aim9:         -6.98% to  5.60%
brk_test  from aim9:         -3.94% to  4.11%
fork_test from aim9:         -5.72% to  4.14%
exec_test from aim9:         -1.02% to  1.56%

The TBench figures were too variable between runs to draw conclusions from but
there didn't appear to be any regressions there. The hackbench results for both
sockets and pipes were within noise.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v5
@ 2007-09-11 15:19 Mel Gorman
  2007-09-11 15:19 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2007-09-11 15:19 UTC (permalink / raw)
  To: apw; +Cc: Mel Gorman, linux-kernel, linux-mm

This is the latest version of one-zonelist and it should be solid enough
for wider testing. To briefly summarise, the patchset replaces multiple
zonelists-per-node with one zonelist that is filtered based on nodemask and
GFP flags. I've dropped the patch that replaces inline functions with macros
from the end as it obscures the code for something that may or may not be a
performance benefit on older compilers. If we see performance regressions that
might have something to do with it, the patch is trivial to bring forward.

Andrew, please merge to -mm for wider testing and consideration for merging
to mainline. Minimally, it gets rid of the hack in relation to ZONE_MOVABLE
and MPOL_BIND.

Changelog since V5
  o Rebase to 2.6.23-rc4-mm1
  o Drop patch that replaces inline functions with macros

Changelog since V4
  o Rebase to -mm kernel. Host of memoryless patches collisions dealt with
  o Do not call wakeup_kswapd() for every zone in a zonelist
  o Dropped the FASTCALL removal
  o Have cursor in iterator advance earlier
  o Use nodes_and in cpuset_nodes_valid_mems_allowed()
  o Use defines instead of inlines, noticeably better performance on gcc-3.4
    No difference on later compilers such as gcc 4.1
  o Dropped gfp_skip patch until it is proven to be of benefit. Tests are
    currently inconclusive but it definitely consumes at least one cache
    line

Changelog since V3
  o Fix compile error in the parisc change
  o Calculate gfp_zone only once in __alloc_pages
  o Calculate classzone_idx properly in get_page_from_freelist
  o Alter check so that zone id embedded may still be used on UP
  o Use Kamezawa-san's suggestion for skipping zones in zonelist
  o Add __alloc_pages_nodemask() to filter zonelist based on a nodemask. This
    removes the need for MPOL_BIND to have a custom zonelist
  o Move zonelist iterators and helpers to mm.h
  o Change _zones from struct zone * to unsigned long
  
Changelog since V2
  o shrink_zones() uses zonelist instead of zonelist->zones
  o hugetlb uses zonelist iterator
  o zone_idx information is embedded in zonelist pointers
  o replace NODE_DATA(nid)->node_zonelist with node_zonelist(nid)

Changelog since V1
  o Break up the patch into 3 patches
  o Introduce iterators for zonelists
  o Performance regression test

The following patches replace multiple zonelists per node with one zonelist
that is filtered based on the GFP flags. The patches as a set fix a bug
with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset,
the MPOL_BIND will apply to the two highest zones when the highest zone
is ZONE_MOVABLE. This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that
filters only custom zonelists. As a bonus, the patchset reduces the cache
footprint of the kernel and should improve performance in a number of cases.

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist. The second patch introduces
a helper function node_zonelist() for looking up the appropriate zonelist
for a GFP mask which simplifies patches later in the set.

The third patch replaces multiple zonelists with two zonelists that are
filtered. The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.

The fourth patch introduces filtering of the zonelists based on a nodemask.

The final patch replaces the two zonelists with one zonelist. A nodemask is
created when __GFP_THISNODE is specified to filter the list. The nodelists
could be pre-allocated with one-per-node but it's not clear that __GFP_THISNODE
is used often enough to be worth the effort.

Performance results varied depending on the machine configuration but were
usually small performance gains. In real workloads the gain/loss will depend
on how much the userspace portion of the benchmark benefits from having more
cache available due to reduced referencing of zonelists.

This is the range of performance losses/gains when running against
2.6.23-rc3-mm1. The test machines are a mix of i386, x86_64 and
ppc64, both NUMA and non-NUMA.

Total CPU time on Kernbench: -0.67% to  3.05%
Elapsed   time on Kernbench: -0.25% to  2.96%
page_test from aim9:         -6.98% to  5.60%
brk_test  from aim9:         -3.94% to  4.11%
fork_test from aim9:         -5.72% to  4.14%
exec_test from aim9:         -1.02% to  1.56%

The TBench figures were too variable between runs to draw conclusions from but
there didn't appear to be any regressions there. The hackbench results for both
sockets and pipes were within noise.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v5
@ 2007-08-31 20:51 Mel Gorman
  2007-08-31 20:51 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2007-08-31 20:51 UTC (permalink / raw)
  To: Lee.Schermerhorn, ak, clameter; +Cc: Mel Gorman, linux-kernel, linux-mm

The main changes here are a changeover to -mm and the dropping of gfp_skip until
it has been proven to be of performance benefit to scanning. The -mm switch is
not straightforward as these patches collide heavily with the memoryless
patches. This set has the memoryless patches as a prerequisite for smooth merging.

Node ID embedding in the zonelist->_zones was implemented but it was
ineffectual. Only the VSMP sub-architecture on x86_64 has enough space to
store the node ID so I dropped the patch again.

If there are no major objections to this, I'll push these patches towards
Andrew for -mm and wider testing. The full description of the patchset is after
the changelog.

Changelog since V4
  o Rebase to -mm kernel. Host of memoryless patches collisions dealt with
  o Do not call wakeup_kswapd() for every zone in a zonelist
  o Dropped the FASTCALL removal
  o Have cursor in iterator advance earlier
  o Use nodes_and in cpuset_nodes_valid_mems_allowed()
  o Use defines instead of inlines, noticeably better performance on gcc-3.4
    No difference on later compilers such as gcc 4.1
  o Dropped gfp_skip patch until it is proven to be of benefit. Tests are
    currently inconclusive but it definitely consumes at least one cache
    line

Changelog since V3
  o Fix compile error in the parisc change
  o Calculate gfp_zone only once in __alloc_pages
  o Calculate classzone_idx properly in get_page_from_freelist
  o Alter check so that zone id embedded may still be used on UP
  o Use Kamezawa-san's suggestion for skipping zones in zonelist
  o Add __alloc_pages_nodemask() to filter zonelist based on a nodemask. This
    removes the need for MPOL_BIND to have a custom zonelist
  o Move zonelist iterators and helpers to mm.h
  o Change _zones from struct zone * to unsigned long
  
Changelog since V2
  o shrink_zones() uses zonelist instead of zonelist->zones
  o hugetlb uses zonelist iterator
  o zone_idx information is embedded in zonelist pointers
  o replace NODE_DATA(nid)->node_zonelist with node_zonelist(nid)

Changelog since V1
  o Break up the patch into 3 patches
  o Introduce iterators for zonelists
  o Performance regression test

The following patches replace multiple zonelists per node with one zonelist
that is filtered based on the GFP flags. The patches as a set fix a bug
with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset,
the MPOL_BIND will apply to the two highest zones when the highest zone
is ZONE_MOVABLE. This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that
filters only custom zonelists. As a bonus, the patchset reduces the cache
footprint of the kernel and should improve performance in a number of cases.

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist.

The second patch replaces multiple zonelists with two zonelists that are
filtered. The two zonelists are due to the fact that the memoryless patchset
introduces a second set of zonelists for __GFP_THISNODE.

The third patch introduces filtering of the zonelists based on a nodemask.

The fourth patch replaces the two zonelists with one zonelist. A nodemask is
created when __GFP_THISNODE is specified to filter the list. The nodelists
could be pre-allocated with one-per-node but it's not clear that __GFP_THISNODE
is used often enough to be worth the effort.

The final patch replaces some static inline functions with macros. This
is purely for gcc 3.4 and possibly older versions that produce inferior
code. For readability, the patch can be dropped but if performance problems
are discovered, the compiler version and this final patch should be considered.

Performance results varied depending on the machine configuration but were
usually small performance gains. In real workloads the gain/loss will depend
on how much the userspace portion of the benchmark benefits from having more
cache available due to reduced referencing of zonelists.

This is the range of performance losses/gains when running against
2.6.23-rc3-mm1. The test machines are a mix of i386, x86_64 and
ppc64, both NUMA and non-NUMA.

Total CPU time on Kernbench: -0.67% to  3.05%
Elapsed   time on Kernbench: -0.25% to  2.96%
page_test from aim9:         -6.98% to  5.60%
brk_test  from aim9:         -3.94% to  4.11%
fork_test from aim9:         -5.72% to  4.14%
exec_test from aim9:         -1.02% to  1.56%

The TBench figures were too variable between runs to draw conclusions from but
there didn't appear to be any regressions there. The hackbench results for both
sockets and pipes were within noise.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v4
@ 2007-08-17 20:16 Mel Gorman
  2007-08-17 20:17 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2007-08-17 20:16 UTC (permalink / raw)
  To: Lee.Schermerhorn, ak, clameter; +Cc: Mel Gorman, linux-kernel, linux-mm

Biggest changes are altering the embedding of zone IDs so that the type is
unsigned long instead of struct zone * and the removal of MPOL_BIND-specific
zonelists and filtering based on node data instead. The biggest concern is the
last patch where FASTCALL doesn't appear to do the right thing in all cases.
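
For anyone unfamiliar with embedding a zone ID in a zonelist pointer,
the general technique can be sketched as below. It assumes zone
structures are aligned so that the low two bits of their address are
always zero, and it uses uintptr_t where the patches use unsigned long
so the sketch stays portable in userspace. The macro and function
names are invented for illustration.

```c
#include <stdint.h>

/* Number of low pointer bits borrowed for the zone ID (assumption). */
#define SK_ZONE_ID_BITS	2u
#define SK_ZONE_ID_MASK	(((uintptr_t)1 << SK_ZONE_ID_BITS) - 1)

/*
 * Hypothetical stand-in for struct zone. The long member gives it at
 * least 4-byte alignment on common ABIs, so the low two bits of any
 * valid pointer to it are zero.
 */
struct zone {
	long pad;
};

/* Pack the zone index into the unused low bits of the pointer. */
static uintptr_t encode_zone(struct zone *z, unsigned int idx)
{
	return (uintptr_t)z | ((uintptr_t)idx & SK_ZONE_ID_MASK);
}

/* Recover the pointer by clearing the borrowed bits. */
static struct zone *decode_zone(uintptr_t entry)
{
	return (struct zone *)(entry & ~SK_ZONE_ID_MASK);
}

/* Recover the zone index from the borrowed bits. */
static unsigned int decode_idx(uintptr_t entry)
{
	return (unsigned int)(entry & SK_ZONE_ID_MASK);
}
```

Storing the entries as an integer type rather than struct zone * means
an entry cannot be dereferenced without going through the decode step.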

Changelog since V3
  o Fix compile error in the parisc change
  o Calculate gfp_zone only once in __alloc_pages
  o Calculate classzone_idx properly in get_page_from_freelist
  o Alter check so that zone id embedded may still be used on UP
  o Use Kamezawa-san's suggestion for skipping zones in zonelist
  o Add __alloc_pages_nodemask() to filter zonelist based on a nodemask. This
    removes the need for MPOL_BIND to have a custom zonelist
  o Move zonelist iterators and helpers to mm.h
  o Change _zones from struct zone * to unsigned long
  
Changelog since V2
  o shrink_zones() uses zonelist instead of zonelist->zones
  o hugetlb uses zonelist iterator
  o zone_idx information is embedded in zonelist pointers
  o replace NODE_DATA(nid)->node_zonelist with node_zonelist(nid)

Changelog since V1
  o Break up the patch into 3 patches
  o Introduce iterators for zonelists
  o Performance regression test

The following patches replace multiple zonelists per node with one zonelist
that is filtered based on the GFP flags. The patches as a set fix a bug
with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset,
the MPOL_BIND will apply to the two highest zones when the highest zone
is ZONE_MOVABLE. This should be considered as an alternative fix for the
MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that
filters only custom zonelists. As a bonus, the patchset reduces the cache
footprint of the kernel and should improve performance in a number of cases.

The first patch cleans up an inconsistency where direct reclaim uses
zonelist->zones where other places use zonelist.

The second patch replaces multiple zonelists with one zonelist that is
filtered.

The final patch is a fix that depends on the previous two patches. The
patch changes policy zone so that the MPOL_BIND policy gets applied
to the two highest populated zones when the highest populated zone is
ZONE_MOVABLE. Otherwise, MPOL_BIND only applies to the highest populated zone.
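
The rule in the paragraph above can be written down as a small
decision function. This is a sketch of the stated rule only, assuming
a four-zone layout and a populated[] array; lowest_policy_zone() and
policy_applies() are hypothetical names, not functions from the patch.

```c
enum zone_type {
	ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE, SK_NR_ZONES
};

/*
 * Return the lowest zone index the bind policy applies to, or -1 if
 * no zone is populated. Normally that is the highest populated zone;
 * when the highest populated zone is ZONE_MOVABLE, the next populated
 * zone below it is covered as well.
 */
static int lowest_policy_zone(const int populated[SK_NR_ZONES])
{
	int highest = -1, second = -1;
	int i;

	for (i = SK_NR_ZONES - 1; i >= 0; i--) {
		if (!populated[i])
			continue;
		if (highest < 0) {
			highest = i;
		} else {
			second = i;
			break;
		}
	}
	if (highest == ZONE_MOVABLE && second >= 0)
		return second;
	return highest;
}

/* Non-zero if the policy applies to allocations from this zone. */
static int policy_applies(int zone_idx, const int populated[SK_NR_ZONES])
{
	int low = lowest_policy_zone(populated);

	return low >= 0 && zone_idx >= low && populated[zone_idx];
}
```

For example, on a machine with DMA, Normal and Movable populated, the
policy covers Normal and Movable but not DMA.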

The patches passed regression tests with numactltest. Performance results
varied depending on the machine configuration but were usually small
performance gains. The new algorithm relies heavily on the implementation
of zone_idx, which is currently pretty expensive. Experiments to optimise
this have shown significant improvements for this algorithm, but that work is
beyond the scope of this patchset. Due to the nature of the change, the results
for other people are likely to vary - it'll usually win but occasionally lose.

In real workloads the gain/loss will depend on how much the userspace
portion of the benchmark benefits from having more cache available due
to reduced referencing of zonelists. I expect it'll be more noticeable on
x86_64 with many zones than on IA64 which typically would only have one
active zonelist-per-node.

This is the range of performance losses/gains I found when running against
2.6.23-rc1-mm2. The test machines are a mix of i386, x86_64 and
ppc64, both NUMA and non-NUMA.

Total CPU time on Kernbench: -0.02% to  0.27%
Elapsed   time on Kernbench: -0.21% to  1.26%
page_test from aim9:         -3.41% to  3.90%
brk_test  from aim9:         -0.20% to 40.94%
fork_test from aim9:         -0.42% to  4.59%
exec_test from aim9:         -0.78% to  1.95%
Size reduction of pg_data_t:  0     to  7808 bytes (depends on alignment)

The TBench figures were too variable between runs to draw conclusions from but
there didn't appear to be any regressions there. The hackbench results for both
sockets and pipes were within noise. I haven't gone through lmbench.

These three patches are a standalone set which address the MPOL_BIND problem
with ZONE_MOVABLE as well as reducing memory usage and in many cases the
cache footprint of the kernel.  They should be considered as a bug fix due to
the MPOL_BIND fixup.

If these patches are accepted, the follow-on work would entail:

o Encode zone_id in the zonelist pointers to avoid zone_idx() (Christoph's idea)
o If zone_id works out, eliminate z_to_n from the zonelist cache as unnecessary
o Remove bind_zonelist() (Patch in progress, very messy right now)
o Eliminate policy_zone (Trickier)

Comments?
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


Thread overview: 20+ messages
2007-09-12 21:04 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v6 Mel Gorman
2007-09-12 21:05 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
2007-09-12 21:05 ` [PATCH 2/6] Introduce node_zonelist() for accessing the zonelist for a GFP mask Mel Gorman
2007-09-12 21:05 ` [PATCH 3/6] Use two zonelist that are filtered by " Mel Gorman
2007-09-12 21:06 ` [PATCH 4/6] Have zonelist contains structs with both a zone pointer and zone_idx Mel Gorman
2007-09-12 21:17   ` Christoph Lameter
2007-09-12 21:06 ` [PATCH 5/6] Filter based on a nodemask as well as a gfp_mask Mel Gorman
2007-09-12 21:23   ` Christoph Lameter
2007-09-13 10:25     ` Mel Gorman
2007-09-13 15:49   ` Lee Schermerhorn
2007-09-12 21:06 ` [PATCH 6/6] Use one zonelist that is filtered by nodemask Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2007-12-11 20:21 [PATCH 0/6] Use two zonelists per node instead of multiple zonelists v11r2 Mel Gorman
2007-12-11 20:22 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
2007-11-21  0:38 [PATCH 0/6] Use two zonelists per node instead of multiple zonelists v10 Mel Gorman
2007-11-21  0:39 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
2007-11-09 14:32 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v9 Mel Gorman
2007-11-09 14:32 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
2007-09-28 14:23 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v8 Mel Gorman
2007-09-28 14:23 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
2007-09-13 17:52 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v7 Mel Gorman
2007-09-13 17:52 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
2007-09-11 21:30 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v5 (resend) Mel Gorman
2007-09-11 21:30 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
2007-09-11 15:19 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v5 Mel Gorman
2007-09-11 15:19 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
2007-08-31 20:51 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v5 Mel Gorman
2007-08-31 20:51 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
2007-08-17 20:16 [PATCH 0/6] Use one zonelist per node instead of multiple zonelists v4 Mel Gorman
2007-08-17 20:17 ` [PATCH 1/6] Use zonelists instead of zones when direct reclaiming pages Mel Gorman
