From: Jack Steiner <steiner@sgi.com>
To: Jes Sorensen <jes@wildopensource.com>
Cc: Andrew Morton <akpm@osdl.org>, Jesse Barnes <jbarnes@sgi.com>,
viro@math.psu.edu, wli@holomorphy.com,
linux-kernel@vger.kernel.org
Subject: Re: hash table sizes
Date: Fri, 28 Nov 2003 08:52:55 -0600
Message-ID: <20031128145255.GA26853@sgi.com>
In-Reply-To: <yq0d6bcmvfd.fsf@wildopensource.com>

On Fri, Nov 28, 2003 at 09:15:02AM -0500, Jes Sorensen wrote:
> >>>>> "Andrew" == Andrew Morton <akpm@osdl.org> writes:
>
> Andrew> jbarnes@sgi.com (Jesse Barnes) wrote:
> >> Something like that might be ok, but on our system, all memory is
> >> in ZONE_DMA...
>
> Andrew> Well yes, we'd want
>
> Andrew> vfs_caches_init(min(num_physpages,
> Andrew> some_platform_limit()));
>
> Andrew> which on ia32 would evaluate to nr_free_buffer_pages() and on
> Andrew> ia64 would evaluate to the size of one of those zones.
>
> What about something like this? I believe node_present_pages should be
> the same as num_physpages on a non-NUMA machine. If not we can make it
> min(num_physpages, NODE_DATA(0)->node_present_pages).

I may be missing something, but I don't see how this fixes the original
problem that we started with.

The system has a large number of nodes. Physically, each node has the same
amount of memory. After boot, we observe that several nodes have
substantially less memory than other nodes. Some of the imbalance is
due to the kernel data/text being on node 0, but by far the major source
of the imbalance is the 3 (in 2.4.x) large hash tables that are being
allocated.

I suspect the size of the hash tables is a lot bigger than is needed.
That is certainly the first problem to be fixed, but unless the
required size is a very small percentage (5-10%) of the amount of memory
on a node (2GB to 32GB per node & 256 nodes), we still have a problem.

We run large MPI applications that place threads onto each node. Each
thread needs the same amount of local memory. The maximum problem size
that can be efficiently solved is limited by the amount of free memory
on the smallest node. We need an allocation scheme that doesn't deplete
a significant amount of memory from any single node.
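
To make the point concrete, here is a rough userspace sketch (not kernel
code) of why placement matters more than total size. It models each node
as a count of free pages and compares putting a whole table on one node
with striping it across all nodes. The function names and the page counts
used with them are made up for illustration; nothing here corresponds to
an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: free_pages[i] is the free memory left on node i. */

/* Take the whole table from a single node. */
static void place_on_one_node(unsigned long *free_pages, size_t nr_nodes,
                              size_t node, unsigned long table_pages)
{
    assert(node < nr_nodes);
    assert(free_pages[node] >= table_pages);
    free_pages[node] -= table_pages;
}

/* Stripe the table: every node gives up an (almost) equal share. */
static void stripe_across_nodes(unsigned long *free_pages, size_t nr_nodes,
                                unsigned long table_pages)
{
    assert(nr_nodes > 0);
    unsigned long share = table_pages / nr_nodes;
    unsigned long extra = table_pages % nr_nodes; /* leftover pages */

    for (size_t i = 0; i < nr_nodes; i++) {
        /* The first 'extra' nodes each absorb one leftover page. */
        unsigned long take = share + (i < extra ? 1 : 0);
        assert(free_pages[i] >= take);
        free_pages[i] -= take;
    }
}

/* Free memory on the smallest node, which bounds the MPI problem size. */
static unsigned long smallest_node(const unsigned long *free_pages,
                                   size_t nr_nodes)
{
    assert(nr_nodes > 0);
    unsigned long min = free_pages[0];

    for (size_t i = 1; i < nr_nodes; i++)
        if (free_pages[i] < min)
            min = free_pages[i];
    return min;
}
```

With 4 nodes of 1000 free pages each and a 400-page table, placing the
table on node 0 leaves the smallest node at 600 pages, while striping
leaves every node at 900. The total consumed is the same in both cases;
only the smallest node changes.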
>
> Of course this might not work perfectly if one has multiple nodes and
> node 0 has no or very little memory. It would also be nice if one
> could spread the various caches onto various nodes, but we can leave
> that for stage 2 ;-)
>
> Cheers,
> Jes
>
> --- orig/linux-2.6.0-test10/init/main.c Sun Nov 23 17:31:14 2003
> +++ linux-2.6.0-test10/init/main.c Fri Nov 28 07:06:45 2003
> @@ -447,7 +447,7 @@
>  	proc_caches_init();
>  	buffer_init();
>  	security_scaffolding_startup();
> -	vfs_caches_init(num_physpages);
> +	vfs_caches_init(NODE_DATA(0)->node_present_pages);
>  	radix_tree_init();
>  	signals_init();
>  	/* rootfs populating might need page-writeback */
--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.