linux-kernel.vger.kernel.org archive mirror
From: Jack Steiner <steiner@sgi.com>
To: Jes Sorensen <jes@wildopensource.com>
Cc: Andrew Morton <akpm@osdl.org>, Jesse Barnes <jbarnes@sgi.com>,
	viro@math.psu.edu, wli@holomorphy.com,
	linux-kernel@vger.kernel.org
Subject: Re: hash table sizes
Date: Fri, 28 Nov 2003 08:52:55 -0600
Message-ID: <20031128145255.GA26853@sgi.com>
In-Reply-To: <yq0d6bcmvfd.fsf@wildopensource.com>

On Fri, Nov 28, 2003 at 09:15:02AM -0500, Jes Sorensen wrote:
> >>>>> "Andrew" == Andrew Morton <akpm@osdl.org> writes:
> 
> Andrew> jbarnes@sgi.com (Jesse Barnes) wrote:
> >> Something like that might be ok, but on our system, all memory is
> >> in ZONE_DMA...
> 
> Andrew> Well yes, we'd want
> 
> Andrew> 	vfs_caches_init(min(num_physpages,
> Andrew> some_platform_limit()));
> 
> Andrew> which on ia32 would evaluate to nr_free_buffer_pages() and on
> Andrew> ia64 would evaluate to the size of one of those zones.
> 
> What about something like this? I believe node_present_pages should be
> the same as num_physpages on a non-NUMA machine. If not we can make it
> min(num_physpages, NODE_DATA(0)->node_present_pages).


I may be missing something, but I don't see how this fixes the original
problem that we started with.

The system has a large number of nodes. Physically, each node has the same
amount of memory. After boot, we observe that several nodes have
substantially less memory than other nodes. Some of the imbalance is
due to the kernel data/text being on node 0, but by far the major source
of the imbalance is the three large hash tables (in 2.4.x) that are
allocated at boot.

I suspect the hash tables are much larger than needed. Shrinking
them is certainly the first problem to fix, but unless the required
size is a very small percentage (5-10%) of the memory on a single node
(2GB to 32GB per node, 256 nodes), we still have a problem.
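To put the 5-10% figure in perspective, the arithmetic can be sketched in C. This is an illustrative calculation only, assuming every node holds the same amount of memory; the function names are hypothetical and not kernel APIs. It shows that a budget of 5-10% of one node works out to only about 0.02-0.04% of the whole 256-node machine, so a sizing rule that scales with total memory (num_physpages) quickly exceeds what any single node can give up.

```c
#include <stdint.h>

/*
 * Hypothetical helper: hash table budget on one node, in MB, if the
 * tables may consume at most `pct` percent of that node's memory.
 * Integer division truncates the result.
 */
uint64_t node_budget_mb(uint64_t node_mb, unsigned int pct)
{
	return node_mb * pct / 100;
}

/*
 * Hypothetical helper: the same budget expressed in parts per million
 * of the whole machine, assuming all `nr_nodes` nodes are equally sized.
 * pct% of one node = pct / (100 * nr_nodes) of the total
 *                  = pct * 10000 / nr_nodes ppm.
 */
uint64_t budget_ppm_of_total(unsigned int pct, unsigned int nr_nodes)
{
	return (uint64_t)pct * 10000 / nr_nodes;
}
```

For example, node_budget_mb(2048, 5) gives 102 MB and node_budget_mb(32768, 10) gives 3276 MB, while budget_ppm_of_total(5, 256) gives 195 ppm (about 0.02% of total memory).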

We run large MPI applications that place threads onto each node. Each
thread needs the same amount of local memory. The maximum problem size 
that can be efficiently solved is limited by the amount of free memory 
on the smallest node. We need an allocation scheme that doesn't deplete
a significant amount of memory from any single node.
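The effect of table placement on the largest usable problem size can be modeled with a small C sketch. This is a simplified model, not kernel code: free memory is tracked per node in arbitrary units, the helper names are hypothetical, and round-robin interleaving in fixed-size chunks is assumed here as one possible way to spread an allocation across nodes.

```c
#include <stddef.h>
#include <stdint.h>

/* Charge the whole allocation to node 0, as happens with boot-time tables. */
void place_on_node0(uint64_t *free_mem, size_t nr_nodes, uint64_t bytes)
{
	(void)nr_nodes; /* unused: everything lands on node 0 */
	free_mem[0] -= bytes;
}

/*
 * Assumed alternative: spread the allocation round-robin across all
 * nodes in `chunk`-sized pieces. `chunk` must be nonzero.
 */
void place_interleaved(uint64_t *free_mem, size_t nr_nodes,
		       uint64_t bytes, uint64_t chunk)
{
	size_t node = 0;

	while (bytes > 0) {
		uint64_t piece = bytes < chunk ? bytes : chunk;

		free_mem[node] -= piece;
		bytes -= piece;
		node = (node + 1) % nr_nodes;
	}
}

/* Free memory on the smallest node: the per-thread limit for the job. */
uint64_t min_free(const uint64_t *free_mem, size_t nr_nodes)
{
	uint64_t m = free_mem[0];

	for (size_t i = 1; i < nr_nodes; i++)
		if (free_mem[i] < m)
			m = free_mem[i];
	return m;
}

/* One equally sized thread per node, so every node is limited by the smallest. */
uint64_t max_job_size(const uint64_t *free_mem, size_t nr_nodes)
{
	return min_free(free_mem, nr_nodes) * nr_nodes;
}
```

With four nodes of 2048 MB each and a hypothetical 768 MB of tables, placing everything on node 0 leaves a minimum of 1280 MB per node (a 5120 MB job), while 16 MB round-robin chunks leave 1856 MB on every node (a 7424 MB job).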



> 
> Of course this might not work perfectly if one has multiple nodes and
> node 0 has no or very little memory. It would also be nice if one
> could spread the various caches onto various nodes, but we can leave
> that for stage 2 ;-)
> 
> Cheers,
> Jes
> 
> --- orig/linux-2.6.0-test10/init/main.c	Sun Nov 23 17:31:14 2003
> +++ linux-2.6.0-test10/init/main.c	Fri Nov 28 07:06:45 2003
> @@ -447,7 +447,7 @@
>  	proc_caches_init();
>  	buffer_init();
>  	security_scaffolding_startup();
> -	vfs_caches_init(num_physpages);
> +	vfs_caches_init(NODE_DATA(0)->node_present_pages);
>  	radix_tree_init();
>  	signals_init();
>  	/* rootfs populating might need page-writeback */
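Jes's fallback, min(num_physpages, NODE_DATA(0)->node_present_pages), can be modeled in user space as a sketch. The struct below is a hypothetical stand-in for the kernel's per-node data, reduced to the single field the patch uses, and hash_sizing_pages() is an illustrative name, not a kernel function. Note that this only caps how large the tables are; it does not change which node the memory is taken from.

```c
/*
 * Hypothetical stand-in for the kernel's per-node data; only the field
 * referenced by the patch is modeled.
 */
struct node_data_stub {
	unsigned long node_present_pages;
};

/*
 * Page count that would be passed to vfs_caches_init(): the machine-wide
 * page count, capped by the pages actually present on node 0.
 */
unsigned long hash_sizing_pages(unsigned long num_physpages,
				const struct node_data_stub *node0)
{
	unsigned long node0_pages = node0->node_present_pages;

	return num_physpages < node0_pages ? num_physpages : node0_pages;
}
```

On a single-node machine both inputs should agree. If node 0 has no memory, the result is 0, which is the degenerate case Jes mentions.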

-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.




Thread overview: 32+ messages
2003-11-25 13:35 hash table sizes Jes Sorensen
2003-11-25 13:42 ` William Lee Irwin III
2003-11-25 13:54   ` Jes Sorensen
2003-11-25 16:25     ` Thomas Schlichter
2003-11-25 17:52       ` Antonio Vargas
2003-11-25 17:54         ` William Lee Irwin III
2003-11-25 20:48 ` Jack Steiner
2003-11-25 21:07   ` Andrew Morton
2003-11-25 21:14     ` Jesse Barnes
2003-11-25 21:24       ` Andrew Morton
2003-11-26  2:14         ` David S. Miller
2003-11-26  5:27         ` Matt Mackall
2003-11-28 14:15         ` Jes Sorensen
2003-11-28 14:52           ` Jack Steiner [this message]
2003-11-28 16:22             ` Jes Sorensen
2003-11-28 19:35               ` Jack Steiner
2003-11-28 21:18                 ` Jörn Engel
2003-12-01  9:46                   ` Jes Sorensen
2003-12-01 21:06     ` Anton Blanchard
2003-12-01 22:57       ` Martin J. Bligh
2003-11-25 21:16   ` Anton Blanchard
2003-11-25 23:11     ` Jack Steiner
2003-11-26  3:39       ` Rik van Riel
2003-11-26  3:59         ` William Lee Irwin III
2003-11-26  4:25           ` Andrew Morton
2003-11-26  4:23             ` William Lee Irwin III
2003-11-26  5:14           ` Martin J. Bligh
2003-11-26  9:51             ` William Lee Irwin III
2003-11-26 16:17               ` Martin J. Bligh
2003-11-26  7:25       ` Anton Blanchard
2003-11-26  5:53 Zhang, Yanmin
2003-11-29 10:39 Manfred Spraul
