All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-mm@kvack.org, linux-numa@vger.kernel.org,
	akpm@linux-foundation.org, Nishanth Aravamudan <nacc@us.ibm.com>,
	David Rientjes <rientjes@google.com>, Adam Litke <agl@us.ibm.com>,
	Andy Whitcroft <apw@canonical.com>,
	eric.whitney@hp.com
Subject: Re: [PATCH 4/5] hugetlb:  add per node hstate attributes
Date: Thu, 27 Aug 2009 12:52:10 -0400	[thread overview]
Message-ID: <1251391930.4374.89.camel@useless.americas.hpqcorp.net> (raw)
In-Reply-To: <20090827102338.GC21183@csn.ul.ie>

On Thu, 2009-08-27 at 11:23 +0100, Mel Gorman wrote:
> On Wed, Aug 26, 2009 at 02:04:03PM -0400, Lee Schermerhorn wrote:
> > <SNIP>
> > This revised patch also removes the include of hugetlb.h from node.h.
> > 
> > Lee
> > 
> > ---
> > 
> > PATCH 5/6 hugetlb:  register per node hugepages attributes
> > 
> > Against: 2.6.31-rc6-mmotm-090820-1918
> > 
> > V2:  remove dependency on kobject private bitfield.  Search
> >      global hstates then all per node hstates for kobject
> >      match in attribute show/store functions.
> > 
> > V3:  rebase atop the mempolicy-based hugepage alloc/free;
> >      use custom "nodes_allowed" to restrict alloc/free to
> >      a specific node via per node attributes.  Per node
> >      attribute overrides mempolicy.  I.e., mempolicy only
> >      applies to global attributes.
> > 
> > V4:  Fix issues raised by Mel Gorman:
> >      + add !NUMA versions of hugetlb_[un]register_node()
> >      + rename 'hi' to 'i' in kobj_to_node_hstate()
> >      + rename (count, input) to (len, count) in nr_hugepages_store()
> >      + moved per node hugepages_kobj and hstate_kobjs[] from the
> >        struct node [sysdev] to hugetlb.c private arrays.
> >      + changed registration mechanism so that hugetlbfs [a module]
> >        register its attributes registration callbacks with the node
> >        driver, eliminating the dependency between the node driver
> >        and hugetlbfs.  From it's init func, hugetlbfs will register
> >        all on-line nodes' hugepage sysfs attributes along with
> >        hugetlbfs' attributes register/unregister functions.  The
> >        node driver will use these functions to [un]register nodes
> >        with hugetlbfs on node hot-plug.
> >      + replaced hugetlb.c private "nodes_allowed_from_node()" with
> >        generic "alloc_nodemask_of_node()".
> > 
> > This patch adds the per huge page size control/query attributes
> > to the per node sysdevs:
> > 
> > /sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/
> > 	nr_hugepages       - r/w
> > 	free_huge_pages    - r/o
> > 	surplus_huge_pages - r/o
> > 
> > The patch attempts to re-use/share as much of the existing
> > global hstate attribute initialization and handling, and the
> > "nodes_allowed" constraint processing as possible.
> > Calling set_max_huge_pages() with no node indicates a change to
> > global hstate parameters.  In this case, any non-default task
> > mempolicy will be used to generate the nodes_allowed mask.  A
> > valid node id indicates an update to that node's hstate 
> > parameters, and the count argument specifies the target count
> > for the specified node.  From this info, we compute the target
> > global count for the hstate and construct a nodes_allowed node
> > mask contain only the specified node.
> > 
> > Setting the node specific nr_hugepages via the per node attribute
> > effectively ignores any task mempolicy or cpuset constraints.
> > 
<snip>
> > Index: linux-2.6.31-rc6-mmotm-090820-1918/drivers/base/node.c
> > ===================================================================
> > --- linux-2.6.31-rc6-mmotm-090820-1918.orig/drivers/base/node.c	2009-08-26 12:37:03.000000000 -0400
> > +++ linux-2.6.31-rc6-mmotm-090820-1918/drivers/base/node.c	2009-08-26 13:01:54.000000000 -0400
> > @@ -177,6 +177,31 @@ static ssize_t node_read_distance(struct
> >  }
> >  static SYSDEV_ATTR(distance, S_IRUGO, node_read_distance, NULL);
> >  
> > +/*
> > + * hugetlbfs per node attributes registration interface
> > + */
> > +NODE_REGISTRATION_FUNC __hugetlb_register_node;
> > +NODE_REGISTRATION_FUNC __hugetlb_unregister_node;
> > +
> > +static inline void hugetlb_register_node(struct node *node)
> > +{
> > +	if (__hugetlb_register_node)
> > +		__hugetlb_register_node(node);
> > +}
> > +
> > +static inline void hugetlb_unregister_node(struct node *node)
> > +{
> > +	if (__hugetlb_unregister_node)
> > +		__hugetlb_unregister_node(node);
> > +}
> > +
> > +void register_hugetlbfs_with_node(NODE_REGISTRATION_FUNC doregister,
> > +                                  NODE_REGISTRATION_FUNC unregister)
> > +{
> > +	__hugetlb_register_node   = doregister;
> > +	__hugetlb_unregister_node = unregister;
> > +}
> > +
> >  
> 
> I think I get this. Basically, you want to avoid the functions being
> called too early before sysfs is initialised and still work with hotplug
> later. So early in boot, no registeration happens. sysfs and hugetlbfs
> get initialised and at that point, these hooks become active, all nodes
> registered and hotplug later continues to work.
> 
> Is that accurate? Can it get a comment?

Yes, you got it, and yes, I'll add a comment.  I had explained it in the
patch description [V4], but that's not too useful to someone coming
along later...

<snip>

> > @@ -1253,7 +1255,21 @@ static unsigned long set_max_huge_pages(
> >  	if (h->order >= MAX_ORDER)
> >  		return h->max_huge_pages;
> >  
> > -	nodes_allowed = huge_mpol_nodes_allowed();
> > +	if (nid == NO_NODEID_SPECIFIED)
> > +		nodes_allowed = huge_mpol_nodes_allowed();
> > +	else {
> > +		/*
> > +		 * incoming 'count' is for node 'nid' only, so
> > +		 * adjust count to global, but restrict alloc/free
> > +		 * to the specified node.
> > +		 */
> > +		count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
> > +		nodes_allowed = alloc_nodemask_of_node(nid);
> 
> alloc_nodemask_of_node() isn't defined anywhere.


Well, that's because the patch that defines it is in a message that I
meant to send before this one.  I see it's in my Drafts folder.  I'll
attach that patch below.  I'm rebasing against the 0827 mmotm, and I'll
resend the rebased series.  However, I wanted to get your opinion of the
nodemask patch below.

<snip>
> >  
> > +#ifdef CONFIG_NUMA
> > +
> > +struct node_hstate {
> > +	struct kobject		*hugepages_kobj;
> > +	struct kobject		*hstate_kobjs[HUGE_MAX_HSTATE];
> > +};
> > +struct node_hstate node_hstates[MAX_NUMNODES];
> > +
> > +static struct attribute *per_node_hstate_attrs[] = {
> > +	&nr_hugepages_attr.attr,
> > +	&free_hugepages_attr.attr,
> > +	&surplus_hugepages_attr.attr,
> > +	NULL,
> > +};
> > +
> > +static struct attribute_group per_node_hstate_attr_group = {
> > +	.attrs = per_node_hstate_attrs,
> > +};
> > +
> > +static struct hstate *kobj_to_node_hstate(struct kobject *kobj, int *nidp)
> > +{
> > +	int nid;
> > +
> > +	for (nid = 0; nid < nr_node_ids; nid++) {
> > +		struct node_hstate *nhs = &node_hstates[nid];
> > +		int i;
> > +		for (i = 0; i < HUGE_MAX_HSTATE; i++)
> > +			if (nhs->hstate_kobjs[i] == kobj) {
> > +				if (nidp)
> > +					*nidp = nid;
> > +				return &hstates[i];
> > +			}
> > +	}
> > +
> > +	BUG();
> > +	return NULL;
> > +}
> 
> Ok, this looks nicer in that the dependencies between hugetlbfs and base
> node support are going the right direction.

Agreed.  I removed that "issue" from the patch description.

<snip>
> > Index: linux-2.6.31-rc6-mmotm-090820-1918/include/linux/node.h
> > ===================================================================
> > --- linux-2.6.31-rc6-mmotm-090820-1918.orig/include/linux/node.h	2009-08-26 12:37:03.000000000 -0400
> > +++ linux-2.6.31-rc6-mmotm-090820-1918/include/linux/node.h	2009-08-26 12:40:19.000000000 -0400
> > @@ -28,6 +28,7 @@ struct node {
> >  
> >  struct memory_block;
> >  extern struct node node_devices[];
> > +typedef  void (*NODE_REGISTRATION_FUNC)(struct node *);
> >  
> >  extern int register_node(struct node *, int, struct node *);
> >  extern void unregister_node(struct node *node);
> > @@ -39,6 +40,8 @@ extern int unregister_cpu_under_node(uns
> >  extern int register_mem_sect_under_node(struct memory_block *mem_blk,
> >  						int nid);
> >  extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk);
> > +extern void register_hugetlbfs_with_node(NODE_REGISTRATION_FUNC doregister,
> > +                                         NODE_REGISTRATION_FUNC unregister);
> >  #else
> >  static inline int register_one_node(int nid)
> >  {
> > @@ -65,6 +68,9 @@ static inline int unregister_mem_sect_un
> >  {
> >  	return 0;
> >  }
> > +
> > +static inline void register_hugetlbfs_with_node(NODE_REGISTRATION_FUNC do,
> > +                                                NODE_REGISTRATION_FUNC un) { }
> 
> "do" is a keyword. This won't compile on !NUMA. needs to be called
> doregister and unregister or basically anything other than "do"

Sorry.  Last minute, obviously untested, addition.  I have built the
reworked code with and without NUMA.

Here's my current "alloc_nodemask_of_node()" patch.  What do you think
about going with this? 

PATCH 4/6 - hugetlb:  introduce alloc_nodemask_of_node()

Against: 2.6.31-rc6-mmotm-090820-1918

Introduce nodemask macro to allocate a nodemask and 
initialize it to contain a single node, using existing
nodemask_of_node() macro.  Coded as a macro to avoid header
dependency hell.

This will be used to construct the huge pages "nodes_allowed"
nodemask for a single node when a persistent huge page
pool page count is modified via a per node sysfs attribute.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 include/linux/nodemask.h |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

Index: linux-2.6.31-rc6-mmotm-090820-1918/include/linux/nodemask.h
===================================================================
--- linux-2.6.31-rc6-mmotm-090820-1918.orig/include/linux/nodemask.h	2009-08-27 09:16:39.000000000 -0400
+++ linux-2.6.31-rc6-mmotm-090820-1918/include/linux/nodemask.h	2009-08-27 09:52:21.000000000 -0400
@@ -245,18 +245,31 @@ static inline int __next_node(int n, con
 	return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1));
 }
 
+#define init_nodemask_of_nodes(mask, node)				\
+	nodes_clear(*(mask));						\
+	node_set((node), *(mask));
+
 #define nodemask_of_node(node)						\
 ({									\
 	typeof(_unused_nodemask_arg_) m;				\
 	if (sizeof(m) == sizeof(unsigned long)) {			\
 		m.bits[0] = 1UL<<(node);				\
 	} else {							\
-		nodes_clear(m);						\
-		node_set((node), m);					\
+		init_nodemask_of_nodes(&m, (node));			\
 	}								\
 	m;								\
 })
 
+#define alloc_nodemask_of_node(node)					\
+({									\
+	typeof(_unused_nodemask_arg_) *nmp;				\
+	nmp = kmalloc(sizeof(*nmp), GFP_KERNEL);			\
+	if (nmp)							\
+		init_nodemask_of_nodes(nmp, (node));			\
+	nmp;								\
+})
+
+
 #define first_unset_node(mask) __first_unset_node(&(mask))
 static inline int __first_unset_node(const nodemask_t *maskp)
 {


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2009-08-27 16:52 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-24 19:24 [PATCH 0/5] hugetlb: numa control of persistent huge pages alloc/free Lee Schermerhorn
2009-08-24 19:25 ` [PATCH 1/5] hugetlb: rework hstate_next_node_* functions Lee Schermerhorn
2009-08-25  8:10   ` David Rientjes
2009-08-25  8:10     ` David Rientjes
2009-08-24 19:26 ` [PATCH 2/5] hugetlb: add nodemask arg to huge page alloc, free and surplus adjust fcns Lee Schermerhorn
2009-08-25  8:16   ` David Rientjes
2009-08-25  8:16     ` David Rientjes
2009-08-25 20:49     ` Lee Schermerhorn
2009-08-25 20:49       ` Lee Schermerhorn
2009-08-25 21:59       ` David Rientjes
2009-08-25 21:59         ` David Rientjes
2009-08-26  9:58       ` Mel Gorman
2009-08-26  9:58         ` Mel Gorman
2009-08-24 19:27 ` [PATCH 3/5] hugetlb: derive huge pages nodes allowed from task mempolicy Lee Schermerhorn
2009-08-25  8:47   ` David Rientjes
2009-08-25  8:47     ` David Rientjes
2009-08-25 20:49     ` Lee Schermerhorn
2009-08-25 20:49       ` Lee Schermerhorn
2009-08-27 19:40       ` David Rientjes
2009-08-27 19:40         ` David Rientjes
2009-08-25 10:22   ` Mel Gorman
2009-08-25 10:22     ` Mel Gorman
2009-08-24 19:29 ` [PATCH 4/5] hugetlb: add per node hstate attributes Lee Schermerhorn
2009-08-25 10:19   ` Mel Gorman
2009-08-25 10:19     ` Mel Gorman
2009-08-25 20:49     ` Lee Schermerhorn
2009-08-25 20:49       ` Lee Schermerhorn
2009-08-26 10:11       ` Mel Gorman
2009-08-26 10:11         ` Mel Gorman
2009-08-26 18:02         ` Lee Schermerhorn
2009-08-26 18:02           ` Lee Schermerhorn
2009-08-26 19:47           ` David Rientjes
2009-08-26 19:47             ` David Rientjes
2009-08-26 20:46             ` Lee Schermerhorn
2009-08-26 20:46               ` Lee Schermerhorn
2009-08-27  9:52               ` Mel Gorman
2009-08-27  9:52                 ` Mel Gorman
2009-08-27 19:35               ` David Rientjes
2009-08-28 12:56                 ` Lee Schermerhorn
2009-08-26 18:04         ` Lee Schermerhorn
2009-08-27 10:23           ` Mel Gorman
2009-08-27 16:52             ` Lee Schermerhorn [this message]
2009-08-28 10:09               ` Mel Gorman
2009-08-28 10:09                 ` Mel Gorman
2009-08-25 13:35   ` Mel Gorman
2009-08-25 13:35     ` Mel Gorman
2009-08-25 20:49     ` Lee Schermerhorn
2009-08-25 20:49       ` Lee Schermerhorn
2009-08-26 10:12       ` Mel Gorman
2009-08-26 10:12         ` Mel Gorman
2009-08-24 19:30 ` [PATCH 5/5] hugetlb: update hugetlb documentation for mempolicy based management Lee Schermerhorn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1251391930.4374.89.camel@useless.americas.hpqcorp.net \
    --to=lee.schermerhorn@hp.com \
    --cc=agl@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=apw@canonical.com \
    --cc=eric.whitney@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=linux-numa@vger.kernel.org \
    --cc=mel@csn.ul.ie \
    --cc=nacc@us.ibm.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.