From: Lee Schermerhorn <lee.schermerhorn@hp.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, Mel Gorman <mel@csn.ul.ie>,
	Nishanth Aravamudan <nacc@us.ibm.com>,
	Adam Litke <agl@us.ibm.com>, Andy Whitcroft <apw@canonical.com>,
	eric.whitney@hp.com
Subject: [PATCH 0/5] Huge Pages Nodes Allowed
Date: Tue, 16 Jun 2009 09:52:28 -0400
Message-ID: <20090616135228.25248.22018.sendpatchset@lts-notebook>

Because of asymmetries in some NUMA platforms, and "interesting"
topologies emerging in the "scale up x86" world, we need better
control over the placement of "fresh huge pages".  A while back
Nish Aravamudan floated a series of patches to add per node
controls for allocating pages to the hugepage pool and removing
them.  Nish apparently moved on to other tasks before those patches
were accepted.  I have kept a copy of Nish's patches and have
intended to rebase, test, and resubmit them.

In an [off-list] exchange with Mel Gorman, who admits to knowledge
in the huge pages area, I asked his opinion of per node controls
for huge pages and he suggested another approach:  using the mempolicy
of the task that changes nr_hugepages to constrain the fresh huge
page allocations.  I considered this approach but it seemed to me
to be a misuse of mempolicy for populating the huge pages free
pool.  Interleave policy doesn't have the same "this node" semantics
that we want, and bind policy would require constructing a custom
node mask for each node as well as addressing OOM, which we don't
want during fresh huge page allocation.  One could derive a node mask
of allowed nodes for huge pages from the mempolicy of the task
that is modifying nr_hugepages and use that for fresh huge pages
with GFP_THISNODE.  However, if we're not going to use mempolicy
directly--e.g., via alloc_page_current() or alloc_page_vma() [with
yet another on-stack pseudo-vma :(]--I thought it cleaner to
define a "nodes allowed" nodemask for populating the [persistent]
huge pages free pool.

This patch series introduces a [per hugepage size] "sysctl",
hugepages_nodes_allowed, that specifies a nodemask to constrain
the allocation of persistent, fresh huge pages.  The nodemask
may be specified via a sysctl, a sysfs huge pages attribute, or
the kernel boot command line.

The series includes a patch to free hugepages from the pool in a
"round robin" fashion, interleaved across all on-line nodes to
balance the hugepage pool across nodes.  Nish had a patch to do
this, too.

Together, these changes don't provide the fine grain of control
that per node attributes would.  Specifically, there is no easy
way to reduce the persistent huge page count for a specific node.
I think the degree of control provided by these patches is the
minimum necessary and sufficient for managing the persistent
huge page pool.  However, with a bit more reorganization, we
could implement per node controls if others would find that
useful.

For more info, see the patch descriptions and the updated kernel
hugepages documentation.



Thread overview: 31+ messages
2009-06-16 13:52 Lee Schermerhorn [this message]
2009-06-16 13:52 ` [PATCH 1/5] Free huge pages round robin to balance across nodes Lee Schermerhorn
2009-06-17 13:18   ` Mel Gorman
2009-06-17 17:16     ` Lee Schermerhorn
2009-06-18 19:08       ` David Rientjes
2009-06-16 13:52 ` [PATCH 2/5] Add nodes_allowed members to hugepages hstate struct Lee Schermerhorn
2009-06-17 13:35   ` Mel Gorman
2009-06-17 17:38     ` Lee Schermerhorn
2009-06-18  9:17       ` Mel Gorman
2009-06-16 13:53 ` [PATCH 3/5] Use per hstate nodes_allowed to constrain huge page allocation Lee Schermerhorn
2009-06-17 13:39   ` Mel Gorman
2009-06-17 17:47     ` Lee Schermerhorn
2009-06-18  9:18       ` Mel Gorman
2009-06-16 13:53 ` [PATCH 4/5] Add sysctl for default hstate nodes_allowed Lee Schermerhorn
2009-06-17 13:41   ` Mel Gorman
2009-06-17 17:52     ` Lee Schermerhorn
2009-06-18  9:19       ` Mel Gorman
2009-06-16 13:53 ` [PATCH 5/5] Update huge pages kernel documentation Lee Schermerhorn
2009-06-18 18:49   ` David Rientjes
2009-06-18 19:06     ` Lee Schermerhorn
2009-06-17 13:02 ` [PATCH 0/5] Huge Pages Nodes Allowed Mel Gorman
2009-06-17 17:15   ` Lee Schermerhorn
2009-06-18  9:33     ` Mel Gorman
2009-06-18 14:46       ` Lee Schermerhorn
2009-06-18 15:00         ` Mel Gorman
2009-06-18 19:08     ` David Rientjes
2009-06-24  7:11       ` David Rientjes
2009-06-24 11:25         ` Lee Schermerhorn
2009-06-24 22:26           ` David Rientjes
2009-06-25  2:14             ` Lee Schermerhorn
2009-06-25 19:22               ` David Rientjes
