From: David Rientjes <rientjes@google.com> To: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>, linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>, Nishanth Aravamudan <nacc@us.ibm.com>, linux-numa@vger.kernel.org, Adam Litke <agl@us.ibm.com>, Andy Whitcroft <apw@canonical.com>, Eric Whitney <eric.whitney@hp.com>, Randy Dunlap <randy.dunlap@oracle.com> Subject: Re: [PATCH 6/6] hugetlb: update hugetlb documentation for mempolicy based management. Date: Fri, 11 Sep 2009 15:27:30 -0700 (PDT) [thread overview] Message-ID: <alpine.DEB.1.00.0909111507540.22083@chino.kir.corp.google.com> (raw) In-Reply-To: <20090910122641.GA31153@csn.ul.ie> On Thu, 10 Sep 2009, Mel Gorman wrote: > > Would you explain why introducing a new mempolicy flag, MPOL_F_HUGEPAGES, > > and only using the new behavior when this is set would be inconsistent or > > inadvisible? > > I already explained this. The interface in numactl would look weird. There > would be an --interleave switch and a --hugepages-interleave that only > applies to nr_hugepages. The smarts could be in hugeadm to apply the mask > when --pool-pages-min is specified but that wouldn't help scripts that are > still using echo. > I don't think we need to address the scripts that are currently using echo since they're (hopefully) written to the kernel implementation, i.e. no mempolicy restriction on writing to nr_hugepages. > I hate to have to do this, but how about nr_hugepages which acts > system-wide as it did traditionally and nr_hugepages_mempolicy that obeys > policies? Something like the following untested patch. It would be fairly > trivial for me to implement a --obey-mempolicies switch for hugeadm which > works in conjunction with --pool--pages-min and less likely to cause confusion > than --hugepages-interleave in numactl. > I like it. > Sorry the patch is untested. I can't hold of a NUMA machine at the moment > and fake NUMA support sucks far worse than I expected it to. > Hmm, I rewrote most of fake NUMA a couple years ago. What problems are you having with it? > ==== BEGIN PATCH ==== > > [PATCH] Optionally use a memory policy when tuning the size of the static hugepage pool > > Patch "derive huge pages nodes allowed from task mempolicy" brought > huge page support more in line with the core VM in that tuning the size > of the static huge page pool would obey memory policies. Using this, > administrators could interleave allocation of huge pages from a subset > of nodes. This is consistent with how dynamic hugepage pool resizing > works and how hugepages get allocated to applications at run-time. > > However, it was pointed out that scripts may exist that depend on being > able to drain all hugepages via /proc/sys/vm/nr_hugepages from processes > that are running within a memory policy. This patch adds > /proc/sys/vm/nr_hugepages_mempolicy which when written to will obey > memory policies. /proc/sys/vm/nr_hugepages continues then to be a > system-wide tunable regardless of memory policy. > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > --- > include/linux/hugetlb.h | 1 + > kernel/sysctl.c | 11 +++++++++++ > mm/hugetlb.c | 35 ++++++++++++++++++++++++++++++++--- > 3 files changed, 44 insertions(+), 3 deletions(-) > It'll need an update to Documentation/vm/hugetlb.txt, but this can probably be done in one of Lee's patches that edits the same file when he reposts. > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index fcb1677..fc3a659 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -21,6 +21,7 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) > > void reset_vma_resv_huge_pages(struct vm_area_struct *vma); > int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > +int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > int hugetlb_treat_movable_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *); > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 8bac3f5..0637655 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -1171,6 +1171,17 @@ static struct ctl_table vm_table[] = { > .extra1 = (void *)&hugetlb_zero, > .extra2 = (void *)&hugetlb_infinity, > }, > +#ifdef CONFIG_NUMA > + { > + .procname = "nr_hugepages_mempolicy", > + .data = NULL, > + .maxlen = sizeof(unsigned long), > + .mode = 0644, > + .proc_handler = &hugetlb_mempolicy_sysctl_handler, > + .extra1 = (void *)&hugetlb_zero, > + .extra2 = (void *)&hugetlb_infinity, > + }, > +#endif > { > .ctl_name = VM_HUGETLB_GROUP, > .procname = "hugetlb_shm_group", > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 83decd6..68abef0 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1244,6 +1244,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed, > return ret; > } > > +#define NUMA_NO_NODE_OBEY_MEMPOLICY (-2) > #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages) > static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count, > int nid) I think it would be possible to avoid adding NUMA_NO_NODE_OBEY_MEMPOLICY if the nodemask was allocated in the sysctl handler instead and passing it into set_max_huge_pages() instead of a nid. Lee, what do you think? Other than that, I like this approach because it avoids the potential for userspace breakage while adding the new feature in way that avoids confusion. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: David Rientjes <rientjes@google.com> To: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>, linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>, Nishanth Aravamudan <nacc@us.ibm.com>, linux-numa@vger.kernel.org, Adam Litke <agl@us.ibm.com>, Andy Whitcroft <apw@canonical.com>, Eric Whitney <eric.whitney@hp.com>, Randy Dunlap <randy.dunlap@oracle.com> Subject: Re: [PATCH 6/6] hugetlb: update hugetlb documentation for mempolicy based management. Date: Fri, 11 Sep 2009 15:27:30 -0700 (PDT) [thread overview] Message-ID: <alpine.DEB.1.00.0909111507540.22083@chino.kir.corp.google.com> (raw) In-Reply-To: <20090910122641.GA31153@csn.ul.ie> On Thu, 10 Sep 2009, Mel Gorman wrote: > > Would you explain why introducing a new mempolicy flag, MPOL_F_HUGEPAGES, > > and only using the new behavior when this is set would be inconsistent or > > inadvisible? > > I already explained this. The interface in numactl would look weird. There > would be an --interleave switch and a --hugepages-interleave that only > applies to nr_hugepages. The smarts could be in hugeadm to apply the mask > when --pool-pages-min is specified but that wouldn't help scripts that are > still using echo. > I don't think we need to address the scripts that are currently using echo since they're (hopefully) written to the kernel implementation, i.e. no mempolicy restriction on writing to nr_hugepages. > I hate to have to do this, but how about nr_hugepages which acts > system-wide as it did traditionally and nr_hugepages_mempolicy that obeys > policies? Something like the following untested patch. It would be fairly > trivial for me to implement a --obey-mempolicies switch for hugeadm which > works in conjunction with --pool--pages-min and less likely to cause confusion > than --hugepages-interleave in numactl. > I like it. > Sorry the patch is untested. I can't hold of a NUMA machine at the moment > and fake NUMA support sucks far worse than I expected it to. > Hmm, I rewrote most of fake NUMA a couple years ago. What problems are you having with it? > ==== BEGIN PATCH ==== > > [PATCH] Optionally use a memory policy when tuning the size of the static hugepage pool > > Patch "derive huge pages nodes allowed from task mempolicy" brought > huge page support more in line with the core VM in that tuning the size > of the static huge page pool would obey memory policies. Using this, > administrators could interleave allocation of huge pages from a subset > of nodes. This is consistent with how dynamic hugepage pool resizing > works and how hugepages get allocated to applications at run-time. > > However, it was pointed out that scripts may exist that depend on being > able to drain all hugepages via /proc/sys/vm/nr_hugepages from processes > that are running within a memory policy. This patch adds > /proc/sys/vm/nr_hugepages_mempolicy which when written to will obey > memory policies. /proc/sys/vm/nr_hugepages continues then to be a > system-wide tunable regardless of memory policy. > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > --- > include/linux/hugetlb.h | 1 + > kernel/sysctl.c | 11 +++++++++++ > mm/hugetlb.c | 35 ++++++++++++++++++++++++++++++++--- > 3 files changed, 44 insertions(+), 3 deletions(-) > It'll need an update to Documentation/vm/hugetlb.txt, but this can probably be done in one of Lee's patches that edits the same file when he reposts. > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > index fcb1677..fc3a659 100644 > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -21,6 +21,7 @@ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) > > void reset_vma_resv_huge_pages(struct vm_area_struct *vma); > int hugetlb_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > +int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > int hugetlb_overcommit_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > int hugetlb_treat_movable_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); > int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *); > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 8bac3f5..0637655 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -1171,6 +1171,17 @@ static struct ctl_table vm_table[] = { > .extra1 = (void *)&hugetlb_zero, > .extra2 = (void *)&hugetlb_infinity, > }, > +#ifdef CONFIG_NUMA > + { > + .procname = "nr_hugepages_mempolicy", > + .data = NULL, > + .maxlen = sizeof(unsigned long), > + .mode = 0644, > + .proc_handler = &hugetlb_mempolicy_sysctl_handler, > + .extra1 = (void *)&hugetlb_zero, > + .extra2 = (void *)&hugetlb_infinity, > + }, > +#endif > { > .ctl_name = VM_HUGETLB_GROUP, > .procname = "hugetlb_shm_group", > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 83decd6..68abef0 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1244,6 +1244,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed, > return ret; > } > > +#define NUMA_NO_NODE_OBEY_MEMPOLICY (-2) > #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages) > static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count, > int nid) I think it would be possible to avoid adding NUMA_NO_NODE_OBEY_MEMPOLICY if the nodemask was allocated in the sysctl handler instead and passing it into set_max_huge_pages() instead of a nid. Lee, what do you think? Other than that, I like this approach because it avoids the potential for userspace breakage while adding the new feature in way that avoids confusion.
next prev parent reply other threads:[~2009-09-11 22:27 UTC|newest] Thread overview: 81+ messages / expand[flat|nested] mbox.gz Atom feed top 2009-08-28 16:03 [PATCH 0/6] hugetlb: V5 constrain allocation/free based on task mempolicy Lee Schermerhorn 2009-08-28 16:03 ` Lee Schermerhorn 2009-08-28 16:03 ` [PATCH 1/6] hugetlb: rework hstate_next_node_* functions Lee Schermerhorn 2009-08-28 16:03 ` Lee Schermerhorn 2009-08-28 16:03 ` [PATCH 2/6] hugetlb: add nodemask arg to huge page alloc, free and surplus adjust fcns Lee Schermerhorn 2009-08-28 16:03 ` Lee Schermerhorn 2009-09-03 18:39 ` David Rientjes 2009-09-03 18:39 ` David Rientjes 2009-08-28 16:03 ` [PATCH 3/6] hugetlb: derive huge pages nodes allowed from task mempolicy Lee Schermerhorn 2009-08-28 16:03 ` Lee Schermerhorn 2009-09-01 14:47 ` Mel Gorman 2009-09-01 14:47 ` Mel Gorman 2009-09-03 19:22 ` David Rientjes 2009-09-03 19:22 ` David Rientjes 2009-09-03 20:15 ` Lee Schermerhorn 2009-09-03 20:15 ` Lee Schermerhorn 2009-09-03 20:49 ` David Rientjes 2009-09-03 20:49 ` David Rientjes 2009-08-28 16:03 ` [PATCH 4/6] hugetlb: introduce alloc_nodemask_of_node Lee Schermerhorn 2009-08-28 16:03 ` Lee Schermerhorn 2009-09-01 14:49 ` Mel Gorman 2009-09-01 14:49 ` Mel Gorman 2009-09-01 16:42 ` Lee Schermerhorn 2009-09-01 16:42 ` Lee Schermerhorn 2009-09-03 18:34 ` David Rientjes 2009-09-03 18:34 ` David Rientjes 2009-09-03 20:49 ` Lee Schermerhorn 2009-09-03 21:03 ` David Rientjes 2009-09-03 21:03 ` David Rientjes 2009-08-28 16:03 ` [PATCH 5/6] hugetlb: add per node hstate attributes Lee Schermerhorn 2009-08-28 16:03 ` Lee Schermerhorn 2009-09-01 15:20 ` Mel Gorman 2009-09-01 15:20 ` Mel Gorman 2009-09-03 19:52 ` David Rientjes 2009-09-03 19:52 ` David Rientjes 2009-09-03 20:41 ` Lee Schermerhorn 2009-09-03 20:41 ` Lee Schermerhorn 2009-09-03 21:02 ` David Rientjes 2009-09-03 21:02 ` David Rientjes 2009-09-04 14:30 ` Lee Schermerhorn 2009-09-04 14:30 ` Lee Schermerhorn 2009-08-28 16:03 ` [PATCH 6/6] hugetlb: update hugetlb documentation for mempolicy based management Lee Schermerhorn 2009-08-28 16:03 ` Lee Schermerhorn 2009-09-03 20:07 ` David Rientjes 2009-09-03 20:07 ` David Rientjes 2009-09-03 21:09 ` Lee Schermerhorn 2009-09-03 21:09 ` Lee Schermerhorn 2009-09-03 21:25 ` David Rientjes 2009-09-08 10:44 ` Mel Gorman 2009-09-08 10:44 ` Mel Gorman 2009-09-08 19:51 ` David Rientjes 2009-09-08 20:04 ` Mel Gorman 2009-09-08 20:04 ` Mel Gorman 2009-09-08 20:18 ` David Rientjes 2009-09-08 21:41 ` Mel Gorman 2009-09-08 21:41 ` Mel Gorman 2009-09-08 22:54 ` David Rientjes 2009-09-09 8:16 ` Mel Gorman 2009-09-09 8:16 ` Mel Gorman 2009-09-09 20:44 ` David Rientjes 2009-09-10 12:26 ` Mel Gorman 2009-09-10 12:26 ` Mel Gorman 2009-09-11 22:27 ` David Rientjes [this message] 2009-09-11 22:27 ` David Rientjes 2009-09-14 13:33 ` Mel Gorman 2009-09-14 14:15 ` Lee Schermerhorn 2009-09-14 14:15 ` Lee Schermerhorn 2009-09-14 15:41 ` Mel Gorman 2009-09-14 15:41 ` Mel Gorman 2009-09-14 19:15 ` David Rientjes 2009-09-14 19:15 ` David Rientjes 2009-09-15 11:48 ` Mel Gorman 2009-09-15 11:48 ` Mel Gorman 2009-09-14 19:14 ` David Rientjes 2009-09-14 19:14 ` David Rientjes 2009-09-14 21:28 ` David Rientjes 2009-09-16 10:21 ` Mel Gorman 2009-09-03 20:42 ` Randy Dunlap 2009-09-04 15:23 ` Lee Schermerhorn 2009-09-09 16:31 [PATCH 0/6] hugetlb: V6 constrain allocation/free based on task mempolicy Lee Schermerhorn 2009-09-09 16:32 ` [PATCH 6/6] hugetlb: update hugetlb documentation for mempolicy based management Lee Schermerhorn 2009-09-09 16:32 ` Lee Schermerhorn
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=alpine.DEB.1.00.0909111507540.22083@chino.kir.corp.google.com \ --to=rientjes@google.com \ --cc=Lee.Schermerhorn@hp.com \ --cc=agl@us.ibm.com \ --cc=akpm@linux-foundation.org \ --cc=apw@canonical.com \ --cc=eric.whitney@hp.com \ --cc=linux-mm@kvack.org \ --cc=linux-numa@vger.kernel.org \ --cc=mel@csn.ul.ie \ --cc=nacc@us.ibm.com \ --cc=randy.dunlap@oracle.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.