From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Mel Gorman <mel@skynet.ie>
Cc: Christoph Lameter <clameter@sgi.com>,
linux-mm@kvack.org, ak@suse.de,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
akpm@linux-foundation.org, pj@sgi.com
Subject: Re: NUMA policy issues with ZONE_MOVABLE
Date: Wed, 01 Aug 2007 14:59:39 -0400 [thread overview]
Message-ID: <1185994779.5059.87.camel@localhost> (raw)
In-Reply-To: <20070726225920.GA10225@skynet.ie>
<snip>
> This patch filters only when MPOL_BIND is in use. In non-numa, the
> checks do not exist and in NUMA cases, the filtering usually does not
> take place. I'd like this to be the bug fix for policy + ZONE_MOVABLE
> and then deal with reducing zonelists to see if there is any performance
> gain as well as a simplification in how policies and cpusets are
> implemented.
>
> Testing shows no difference on non-numa as you'd expect and on NUMA machines,
> there are very small differences on NUMA (kernbench figures range from -0.02%
> to 0.15% differences on machines). Lee, can you test this patch in relation
> to MPOL_BIND? I'll look at the numactl tests tomorrow as well.
>
The patches look OK to me. I got around to testing it today.
Both atop the Memoryless Nodes series, and directly on 23-rc1-mm1.
Test System: 32GB 4-node ia64, booted with kernelcore=24G.
Yields, about 2GB Movable, and 6G Normal per node.
Filtered zoneinfo:
Node 0, zone Normal
pages free 416464
spanned 425984
present 424528
Node 0, zone Movable
pages free 47195
spanned 60416
present 60210
Node 1, zone Normal
pages free 388011
spanned 393216
present 391871
Node 1, zone Movable
pages free 125940
spanned 126976
present 126542
Node 2, zone Normal
pages free 387849
spanned 393216
present 391872
Node 2, zone Movable
pages free 126285
spanned 126976
present 126542
Node 3, zone Normal
pages free 388256
spanned 393216
present 391872
Node 3, zone Movable
pages free 126575
spanned 126966
present 126490
Node 4, zone DMA
pages free 31689
spanned 32767
present 32656
---
Attempt to allocate a 12G--i.e., > 4*2G--segment interleaved
across nodes 0-3 with memtoy. I figured this would use up
all of ZONE_MOVABLE on each node and then dip into NORMAL.
root@gwydyr(root):memtoy
memtoy pid: 6558
memtoy>anon a1 12g
memtoy>map a1
memtoy>mbind a1 interleave 0,1,2,3
memtoy>touch a1 w
memtoy: touched 786432 pages in 10.542 secs
Yields:
Node 0, zone Normal
pages free 328392
spanned 425984
present 424528
Node 0, zone Movable
pages free 37
spanned 60416
present 60210
Node 1, zone Normal
pages free 300293
spanned 393216
present 391871
Node 1, zone Movable
pages free 91
spanned 126976
present 126542
Node 2, zone Normal
pages free 300193
spanned 393216
present 391872
Node 2, zone Movable
pages free 49
spanned 126976
present 126542
Node 3, zone Normal
pages free 300448
spanned 393216
present 391872
Node 3, zone Movable
pages free 56
spanned 126966
present 126490
Node 4, zone DMA
pages free 31689
spanned 32767
present 32656
Looks like most of the movable zone in each node [~8G]
and remainder from normal zones. Should be ~1G from
zone normal of each node. However, memtoy shows something
weird, looking at the location of the 1st 64 pages at each
1G boundary. Most pages are located as I "expect" [well, I'm
not sure why we start with node 2 at offset 0, instead of
node 0].
memtoy>where a1
a 0x2000000003c08000 0x000300000000 0x000000000000 rw- private a1
page offset +00 +01 +02 +03 +04 +05 +06 +07
0: 2 3 0 1 2 3 0 1
8: 2 3 0 1 2 3 0 1
10: 2 3 0 1 2 3 0 1
18: 2 3 0 1 2 3 0 1
20: 2 3 0 1 2 3 0 1
28: 2 3 0 1 2 3 0 1
30: 2 3 0 1 2 3 0 1
38: 2 3 0 1 2 3 0 1
Same at 1G, 2G and 3G
But, between ~4G through 6+G [I didn't check any finer
granuality and didn't want to watch > 780K pages scroll
by] show:
memtoy>where a1 4g 64p
a 0x2000000003c08000 0x000300000000 0x000000000000 rw- private a1
page offset +00 +01 +02 +03 +04 +05 +06 +07
40000: 2 3 1 1 2 3 1 1
40008: 2 3 1 1 2 3 1 1
40010: 2 3 1 1 2 3 1 1
40018: 2 3 1 1 2 3 1 1
40020: 2 3 1 1 2 3 1 1
40028: 2 3 1 1 2 3 1 1
40030: 2 3 1 1 2 3 1 1
40038: 2 3 1 1 2 3 1 1
Same at 5G, then:
memtoy>where a1 6g 64p
a 0x2000000003c08000 0x000300000000 0x000000000000 rw- private a1
page offset +00 +01 +02 +03 +04 +05 +06 +07
60000: 2 3 2 2 2 3 2 2
60008: 2 3 2 2 2 3 2 2
60010: 2 3 2 2 2 3 2 2
60018: 2 3 2 2 2 3 2 2
60020: 2 3 2 2 2 3 2 2
60028: 2 3 2 2 2 3 2 2
60030: 2 3 2 2 2 3 2 2
60038: 2 3 2 2 2 3 2 2
7G, 8G, ... 11G back to expected pattern.
Thought this might be due to interaction with memoryless node patches,
so I backed those out and tested Mel's patch again. This time I
ran memtoy in batch mode and dumped the entire segment page locations
to a file. Did this twice. Both looked pretty much the same--i.e.,
the change in pattern occurs at around the same offset into the
segment. Note that here, the interleave starts at node 3 at offset
zero.
memtoy>where a1 0 0
a 0x200000000047c000 0x000300000000 0x000000000000 rw- private a1
page offset +00 +01 +02 +03 +04 +05 +06 +07
0: 3 0 1 2 3 0 1 2
8: 3 0 1 2 3 0 1 2
10: 3 0 1 2 3 0 1 2
...
38c20: 3 0 1 2 3 0 1 2
38c28: 3 0 1 2 3 0 1 2
38c30: 3 1 1 2 3 1 1 2
38c38: 3 1 1 2 3 1 1 2
38c40: 3 1 1 2 3 1 1 2
...
5a0c0: 3 1 1 2 3 1 1 2
5a0c8: 3 1 1 2 3 1 1 2
5a0d0: 3 1 1 2 3 2 2 2
5a0d8: 3 2 2 2 3 2 2 2
5a0e0: 3 2 2 2 3 2 2 2
...
65230: 3 2 2 2 3 2 2 2
65238: 3 2 2 2 3 2 2 2
65240: 3 2 2 2 3 3 3 3
65248: 3 3 3 3 3 3 3 3
65250: 3 3 3 3 3 3 3 3
...
6ab60: 3 3 3 3 3 3 3 3
6ab68: 3 3 3 3 3 3 3 3
6ab70: 3 3 3 2 3 0 1 2
6ab78: 3 0 1 2 3 0 1 2
6ab80: 3 0 1 2 3 0 1 2
...
and so on to the end of the segment:
bffe8: 3 0 1 2 3 0 1 2
bfff0: 3 0 1 2 3 0 1 2
bfff8: 3 0 1 2 3 0 1 2
The pattern changes occur at about page offsets:
0x38800 = ~ 3.6G
0x5a000 = ~ 5.8G
0x65000 = ~ 6.4G
0x6aa00 = ~ 6.8G
Then I checked zonelist order:
Built 5 zonelists in Zone order, mobility grouping on. Total pages: 2072583
Looks like we're falling back to ZONE_MOVABLE on the next node when ZONE_MOVABLE
on target node overflows.
Rebooted to "Node order" [numa_zonelist_order sysctl missing in 23-rc1-mm1]
and tried again. Saw "expected" interleave pattern across entire 12G segment.
Kame-san's patch to just exclude the DMA zones from the zonelists is looking
better--better than changing zonelist order when zone_movable is populated!
But, Mel's patch seems to work OK. I'll keep it in my stack for later
stress testing.
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-08-01 18:59 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-07-25 4:20 NUMA policy issues with ZONE_MOVABLE Christoph Lameter
2007-07-25 4:47 ` Nick Piggin
2007-07-25 5:05 ` Christoph Lameter
2007-07-25 5:24 ` Nick Piggin
2007-07-25 6:00 ` Christoph Lameter
2007-07-25 6:09 ` Nick Piggin
2007-07-25 9:32 ` Andi Kleen
2007-07-25 6:36 ` KAMEZAWA Hiroyuki
2007-07-25 11:16 ` Mel Gorman
2007-07-25 14:30 ` Lee Schermerhorn
2007-07-25 19:31 ` Christoph Lameter
2007-07-26 4:15 ` KAMEZAWA Hiroyuki
2007-07-26 4:53 ` Christoph Lameter
2007-07-26 7:41 ` KAMEZAWA Hiroyuki
2007-07-26 16:16 ` Mel Gorman
2007-07-26 18:03 ` Christoph Lameter
2007-07-26 18:26 ` Mel Gorman
2007-07-26 13:23 ` Mel Gorman
2007-07-26 18:07 ` Christoph Lameter
2007-07-26 22:59 ` Mel Gorman
2007-07-27 1:22 ` Christoph Lameter
2007-07-27 8:20 ` Mel Gorman
2007-07-27 15:45 ` Mel Gorman
2007-07-27 17:35 ` Christoph Lameter
2007-07-27 17:46 ` Mel Gorman
2007-07-27 18:38 ` Christoph Lameter
2007-07-27 18:00 ` [PATCH] Document Linux Memory Policy - V2 Lee Schermerhorn
2007-07-27 18:38 ` Randy Dunlap
2007-07-27 19:01 ` Lee Schermerhorn
2007-07-27 19:21 ` Randy Dunlap
2007-07-27 18:55 ` Christoph Lameter
2007-07-27 19:24 ` Lee Schermerhorn
2007-07-31 15:14 ` Mel Gorman
2007-07-31 16:34 ` Lee Schermerhorn
2007-07-31 19:10 ` Christoph Lameter
2007-07-31 19:46 ` Lee Schermerhorn
2007-07-31 19:58 ` Christoph Lameter
2007-07-31 20:23 ` Lee Schermerhorn
2007-07-31 20:48 ` [PATCH] Document Linux Memory Policy - V3 Lee Schermerhorn
2007-08-03 13:52 ` Mel Gorman
2007-07-28 7:28 ` NUMA policy issues with ZONE_MOVABLE KAMEZAWA Hiroyuki
2007-07-28 11:57 ` Mel Gorman
2007-07-28 14:10 ` KAMEZAWA Hiroyuki
2007-07-28 14:21 ` KAMEZAWA Hiroyuki
2007-07-30 12:41 ` Mel Gorman
2007-07-30 18:06 ` Christoph Lameter
2007-07-27 14:24 ` Lee Schermerhorn
2007-08-01 18:59 ` Lee Schermerhorn [this message]
2007-08-02 0:36 ` KAMEZAWA Hiroyuki
2007-08-02 17:10 ` Mel Gorman
2007-08-02 17:51 ` Lee Schermerhorn
2007-07-26 18:09 ` Lee Schermerhorn
2007-08-02 14:09 ` Mel Gorman
2007-08-02 18:56 ` Christoph Lameter
2007-08-02 19:42 ` Mel Gorman
2007-08-02 19:52 ` Christoph Lameter
2007-08-03 9:32 ` Mel Gorman
2007-08-03 16:36 ` Christoph Lameter
2007-07-25 14:27 ` Lee Schermerhorn
2007-07-25 17:39 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1185994779.5059.87.camel@localhost \
--to=lee.schermerhorn@hp.com \
--cc=ak@suse.de \
--cc=akpm@linux-foundation.org \
--cc=clameter@sgi.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=mel@skynet.ie \
--cc=pj@sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.