From: David Rientjes <rientjes@google.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Nishanth Aravamudan <nacc@us.ibm.com>,
	linux-numa@vger.kernel.org, Adam Litke <agl@us.ibm.com>,
	Andy Whitcroft <apw@canonical.com>,
	Eric Whitney <eric.whitney@hp.com>,
	Randy Dunlap <randy.dunlap@oracle.com>
Subject: Re: [PATCH 6/6] hugetlb:  update hugetlb documentation for mempolicy based management.
Date: Fri, 11 Sep 2009 15:27:30 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.1.00.0909111507540.22083@chino.kir.corp.google.com> (raw)
In-Reply-To: <20090910122641.GA31153@csn.ul.ie>

On Thu, 10 Sep 2009, Mel Gorman wrote:

> > Would you explain why introducing a new mempolicy flag, MPOL_F_HUGEPAGES, 
> > and only using the new behavior when this is set would be inconsistent or 
> > inadvisable?
> 
> I already explained this. The interface in numactl would look weird. There
> would be an --interleave switch and a --hugepages-interleave that only
> applies to nr_hugepages. The smarts could be in hugeadm to apply the mask
> when --pool-pages-min is specified but that wouldn't help scripts that are
> still using echo.
> 

I don't think we need to address the scripts that are currently using echo 
since they're (hopefully) written to the kernel implementation, i.e. no 
mempolicy restriction on writing to nr_hugepages.

> I hate to have to do this, but how about nr_hugepages which acts
> system-wide as it did traditionally and nr_hugepages_mempolicy that obeys
> policies? Something like the following untested patch. It would be fairly
> trivial for me to implement a --obey-mempolicies switch for hugeadm which
> works in conjunction with --pool-pages-min and would be less likely to cause
> confusion than --hugepages-interleave in numactl.
> 

I like it.

> Sorry the patch is untested. I can't get hold of a NUMA machine at the moment
> and fake NUMA support sucks far worse than I expected it to.
> 

Hmm, I rewrote most of fake NUMA a couple years ago.  What problems are 
you having with it?

> ==== BEGIN PATCH ====
> 
> [PATCH] Optionally use a memory policy when tuning the size of the static hugepage pool
> 
> Patch "derive huge pages nodes allowed from task mempolicy" brought
> huge page support more in line with the core VM in that tuning the size
> of the static huge page pool would obey memory policies. Using this,
> administrators could interleave allocation of huge pages from a subset
> of nodes. This is consistent with how dynamic hugepage pool resizing
> works and how hugepages get allocated to applications at run-time.
> 
> However, it was pointed out that scripts may exist that depend on being
> able to drain all hugepages via /proc/sys/vm/nr_hugepages from processes
> that are running within a memory policy. This patch adds
> /proc/sys/vm/nr_hugepages_mempolicy which when written to will obey
> memory policies. /proc/sys/vm/nr_hugepages continues then to be a
> system-wide tunable regardless of memory policy.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> --- 
>  include/linux/hugetlb.h |    1 +
>  kernel/sysctl.c         |   11 +++++++++++
>  mm/hugetlb.c            |   35 ++++++++++++++++++++++++++++++++---
>  3 files changed, 44 insertions(+), 3 deletions(-)
> 

It'll need an update to Documentation/vm/hugetlb.txt, but this can 
probably be done in one of Lee's patches that edits the same file when he 
reposts.

> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index fcb1677..fc3a659 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -21,6 +21,7 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
>  
>  void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
>  int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
> +int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
>  int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
>  int hugetlb_treat_movable_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *);
>  int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 8bac3f5..0637655 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1171,6 +1171,17 @@ static struct ctl_table vm_table[] = {
>  		.extra1		= (void *)&hugetlb_zero,
>  		.extra2		= (void *)&hugetlb_infinity,
>  	 },
> +#ifdef CONFIG_NUMA
> +	 {
> +		.procname	= "nr_hugepages_mempolicy",
> +		.data		= NULL,
> +		.maxlen		= sizeof(unsigned long),
> +		.mode		= 0644,
> +		.proc_handler	= &hugetlb_mempolicy_sysctl_handler,
> +		.extra1		= (void *)&hugetlb_zero,
> +		.extra2		= (void *)&hugetlb_infinity,
> +	 },
> +#endif
>  	 {
>  		.ctl_name	= VM_HUGETLB_GROUP,
>  		.procname	= "hugetlb_shm_group",
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 83decd6..68abef0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1244,6 +1244,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
>  	return ret;
>  }
>  
> +#define NUMA_NO_NODE_OBEY_MEMPOLICY (-2)
>  #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages)
>  static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
>  								int nid)

I think it would be possible to avoid adding NUMA_NO_NODE_OBEY_MEMPOLICY 
if the nodemask were allocated in the sysctl handler and passed into 
set_max_huge_pages() instead of a nid.  Lee, what do you think?


Other than that, I like this approach because it avoids the potential for 
userspace breakage while adding the new feature in a way that avoids 
confusion.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
