* [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() @ 2019-02-23 1:32 Jing Xiangfeng 2019-02-25 0:45 ` Mike Kravetz 0 siblings, 1 reply; 20+ messages in thread From: Jing Xiangfeng @ 2019-02-23 1:32 UTC (permalink / raw) To: mike.kravetz, mhocko Cc: akpm, hughd, linux-mm, n-horiguchi, aarcange, kirill.shutemov, linux-kernel, Jing Xiangfeng A user can change the node specific hugetlb count, e.g. via /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages. The calculated value of count is the total number of huge pages. It can overflow when a user enters an extremely high value; if so, the total number of huge pages ends up as a small value the user does not expect. Fix this simply by setting count to ULONG_MAX and continuing. This is more in line with the user's intention of allocating as many huge pages as possible. Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> --- mm/hugetlb.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index afef616..6688894 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2423,7 +2423,14 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy, * per node hstate attribute: adjust count to global, * but restrict alloc/free to the specified node. */ + unsigned long old_count = count; count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; + /* + * If user specified count causes overflow, set to + * largest possible value. + */ + if (count < old_count) + count = ULONG_MAX; init_nodemask_of_node(nodes_allowed, nid); } else nodes_allowed = &node_states[N_MEMORY]; -- 2.7.4 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-23 1:32 [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() Jing Xiangfeng @ 2019-02-25 0:45 ` Mike Kravetz 2019-02-25 3:17 ` David Rientjes 0 siblings, 1 reply; 20+ messages in thread From: Mike Kravetz @ 2019-02-25 0:45 UTC (permalink / raw) To: Jing Xiangfeng, mhocko Cc: akpm, hughd, linux-mm, n-horiguchi, aarcange, kirill.shutemov, linux-kernel On 2/22/19 5:32 PM, Jing Xiangfeng wrote: > User can change a node specific hugetlb count. i.e. > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages > the calculated value of count is a total number of huge pages. It could > be overflow when a user entering a crazy high value. If so, the total > number of huge pages could be a small value which is not user expect. > We can simply fix it by setting count to ULONG_MAX, then it goes on. This > may be more in line with user's intention of allocating as many huge pages > as possible. > > Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Thank you. Acked-by: Mike Kravetz <mike.kravetz@oracle.com> > --- > mm/hugetlb.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index afef616..6688894 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -2423,7 +2423,14 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy, > * per node hstate attribute: adjust count to global, > * but restrict alloc/free to the specified node. > */ > + unsigned long old_count = count; > count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; > + /* > + * If user specified count causes overflow, set to > + * largest possible value. > + */ > + if (count < old_count) > + count = ULONG_MAX; > init_nodemask_of_node(nodes_allowed, nid); > } else > nodes_allowed = &node_states[N_MEMORY]; > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-25 0:45 ` Mike Kravetz @ 2019-02-25 3:17 ` David Rientjes 2019-02-25 16:49 ` Mike Kravetz 0 siblings, 1 reply; 20+ messages in thread From: David Rientjes @ 2019-02-25 3:17 UTC (permalink / raw) To: Mike Kravetz Cc: Jing Xiangfeng, mhocko, akpm, hughd, linux-mm, n-horiguchi, aarcange, kirill.shutemov, linux-kernel On Sun, 24 Feb 2019, Mike Kravetz wrote: > > User can change a node specific hugetlb count. i.e. > > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages > > the calculated value of count is a total number of huge pages. It could > > be overflow when a user entering a crazy high value. If so, the total > > number of huge pages could be a small value which is not user expect. > > We can simply fix it by setting count to ULONG_MAX, then it goes on. This > > may be more in line with user's intention of allocating as many huge pages > > as possible. > > > > Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> > > Thank you. > > Acked-by: Mike Kravetz <mike.kravetz@oracle.com> > > > --- > > mm/hugetlb.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index afef616..6688894 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -2423,7 +2423,14 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy, > > * per node hstate attribute: adjust count to global, > > * but restrict alloc/free to the specified node. > > */ > > + unsigned long old_count = count; > > count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; > > + /* > > + * If user specified count causes overflow, set to > > + * largest possible value. 
> > + */ > > + if (count < old_count) > > + count = ULONG_MAX; > > init_nodemask_of_node(nodes_allowed, nid); > > } else > > nodes_allowed = &node_states[N_MEMORY]; > > Looks like this fixes the overflow issue, but isn't there already a possible underflow since we don't hold hugetlb_lock? Even if count == 0, what prevents h->nr_huge_pages_node[nid] being greater than h->nr_huge_pages here? I think the per hstate values need to be read with READ_ONCE() and stored on the stack to do any sane bounds checking. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-25 3:17 ` David Rientjes @ 2019-02-25 16:49 ` Mike Kravetz 2019-02-25 18:19 ` Mike Kravetz 0 siblings, 1 reply; 20+ messages in thread From: Mike Kravetz @ 2019-02-25 16:49 UTC (permalink / raw) To: David Rientjes Cc: Jing Xiangfeng, mhocko, akpm, hughd, linux-mm, n-horiguchi, aarcange, kirill.shutemov, linux-kernel On 2/24/19 7:17 PM, David Rientjes wrote: > On Sun, 24 Feb 2019, Mike Kravetz wrote: > >>> User can change a node specific hugetlb count. i.e. >>> /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages >>> the calculated value of count is a total number of huge pages. It could >>> be overflow when a user entering a crazy high value. If so, the total >>> number of huge pages could be a small value which is not user expect. >>> We can simply fix it by setting count to ULONG_MAX, then it goes on. This >>> may be more in line with user's intention of allocating as many huge pages >>> as possible. >>> >>> Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> >> >> Thank you. >> >> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> >> >>> --- >>> mm/hugetlb.c | 7 +++++++ >>> 1 file changed, 7 insertions(+) >>> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >>> index afef616..6688894 100644 >>> --- a/mm/hugetlb.c >>> +++ b/mm/hugetlb.c >>> @@ -2423,7 +2423,14 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy, >>> * per node hstate attribute: adjust count to global, >>> * but restrict alloc/free to the specified node. >>> */ >>> + unsigned long old_count = count; >>> count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; >>> + /* >>> + * If user specified count causes overflow, set to >>> + * largest possible value. 
>>> + */ >>> + if (count < old_count) >>> + count = ULONG_MAX; >>> init_nodemask_of_node(nodes_allowed, nid); >>> } else >>> nodes_allowed = &node_states[N_MEMORY]; >>> > > Looks like this fixes the overflow issue, but isn't there already a > possible underflow since we don't hold hugetlb_lock? Even if > count == 0, what prevents h->nr_huge_pages_node[nid] being greater than > h->nr_huge_pages here? I think the per hstate values need to be read with > READ_ONCE() and stored on the stack to do any sane bounds checking. Yes, without holding the lock there is the potential for issues. Looking back to when the node specific code was added there is a comment about "re-use/share as much of the existing global hstate attribute initialization and handling". I suspect that is the reason for these calculations outside the lock. As you mention above, nr_huge_pages_node[nid] could be greater than nr_huge_pages. This is true even if we do READ_ONCE(). So, the code would need to test for this condition and 'fix up' values or re-read. It is just racy without holding the lock. If that is too ugly, then we could just add code for the node specific adjustments. set_max_huge_pages() is only called from here. It would be pretty easy to modify set_max_huge_pages() to take the node specific value and do calculations/adjustments under the lock. -- Mike Kravetz ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-25 16:49 ` Mike Kravetz @ 2019-02-25 18:19 ` Mike Kravetz 2019-02-25 19:17 ` David Rientjes 0 siblings, 1 reply; 20+ messages in thread From: Mike Kravetz @ 2019-02-25 18:19 UTC (permalink / raw) To: David Rientjes Cc: Jing Xiangfeng, mhocko, akpm, hughd, linux-mm, n-horiguchi, aarcange, kirill.shutemov, linux-kernel > On 2/24/19 7:17 PM, David Rientjes wrote: >> On Sun, 24 Feb 2019, Mike Kravetz wrote: >>>> @@ -2423,7 +2423,14 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy, >>>> * per node hstate attribute: adjust count to global, >>>> * but restrict alloc/free to the specified node. >>>> */ >>>> + unsigned long old_count = count; >>>> count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; >>>> + /* >>>> + * If user specified count causes overflow, set to >>>> + * largest possible value. >>>> + */ >>>> + if (count < old_count) >>>> + count = ULONG_MAX; >>>> init_nodemask_of_node(nodes_allowed, nid); >>>> } else >>>> nodes_allowed = &node_states[N_MEMORY]; >>>> >> >> Looks like this fixes the overflow issue, but isn't there already a >> possible underflow since we don't hold hugetlb_lock? Even if >> count == 0, what prevents h->nr_huge_pages_node[nid] being greater than >> h->nr_huge_pages here? I think the per hstate values need to be read with >> READ_ONCE() and stored on the stack to do any sane bounds checking. > > Yes, without holding the lock there is the potential for issues. Looking > back to when the node specific code was added there is a comment about > "re-use/share as much of the existing global hstate attribute initialization > and handling". I suspect that is the reason for these calculations outside > the lock. > > As you mention above, nr_huge_pages_node[nid] could be greater than > nr_huge_pages. This is true even if we do READ_ONCE(). So, the code would > need to test for this condition and 'fix up' values or re-read. 
It is just > racy without holding the lock. > > If that is too ugly, then we could just add code for the node specific > adjustments. set_max_huge_pages() is only called from here. It would be > pretty easy to modify set_max_huge_pages() to take the node specific value > and do calculations/adjustments under the lock. Ok, what about just moving the calculation/check inside the lock as in the untested patch below? Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- mm/hugetlb.c | 34 ++++++++++++++++++++++++++-------- 1 file changed, 26 insertions(+), 8 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 1c5219193b9e..5afa77dc7bc8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2274,7 +2274,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed, } #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages) -static int set_max_huge_pages(struct hstate *h, unsigned long count, +static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid, nodemask_t *nodes_allowed) { unsigned long min_count, ret; @@ -2289,6 +2289,23 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, goto decrease_pool; } + spin_lock(&hugetlb_lock); + + /* + * Check for a node specific request. Adjust global count, but + * restrict alloc/free to the specified node. + */ + if (nid != NUMA_NO_NODE) { + unsigned long old_count = count; + count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; + /* + * If user specified count causes overflow, set to + * largest possible value. + */ + if (count < old_count) + count = ULONG_MAX; + } + /* * Increase the pool size * First take pages out of surplus state. Then make up the @@ -2300,7 +2317,6 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, * pool might be one hugepage larger than it needs to be, but * within all the constraints specified by the sysctls. 
*/ - spin_lock(&hugetlb_lock); while (h->surplus_huge_pages && count > persistent_huge_pages(h)) { if (!adjust_pool_surplus(h, nodes_allowed, -1)) break; @@ -2421,16 +2437,18 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy, nodes_allowed = &node_states[N_MEMORY]; } } else if (nodes_allowed) { + /* Node specific request */ + init_nodemask_of_node(nodes_allowed, nid); + } else { /* - * per node hstate attribute: adjust count to global, - * but restrict alloc/free to the specified node. + * Node specific request, but we could not allocate + * node mask. Pass in ALL nodes, and clear nid. */ - count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; - init_nodemask_of_node(nodes_allowed, nid); - } else + nid = NUMA_NO_NODE; nodes_allowed = &node_states[N_MEMORY]; + } - err = set_max_huge_pages(h, count, nodes_allowed); + err = set_max_huge_pages(h, count, nid, nodes_allowed); if (err) goto out; -- 2.17.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-25 18:19 ` Mike Kravetz @ 2019-02-25 19:17 ` David Rientjes 2019-02-26 2:22 ` Jing Xiangfeng 0 siblings, 1 reply; 20+ messages in thread From: David Rientjes @ 2019-02-25 19:17 UTC (permalink / raw) To: Mike Kravetz Cc: Jing Xiangfeng, mhocko, akpm, hughd, linux-mm, n-horiguchi, aarcange, kirill.shutemov, linux-kernel On Mon, 25 Feb 2019, Mike Kravetz wrote: > Ok, what about just moving the calculation/check inside the lock as in the > untested patch below? > > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> > --- > mm/hugetlb.c | 34 ++++++++++++++++++++++++++-------- > 1 file changed, 26 insertions(+), 8 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 1c5219193b9e..5afa77dc7bc8 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -2274,7 +2274,7 @@ static int adjust_pool_surplus(struct hstate *h, > nodemask_t *nodes_allowed, > } > > #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages) > -static int set_max_huge_pages(struct hstate *h, unsigned long count, > +static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid, > nodemask_t *nodes_allowed) > { > unsigned long min_count, ret; > @@ -2289,6 +2289,23 @@ static int set_max_huge_pages(struct hstate *h, unsigned > long count, > goto decrease_pool; > } > > + spin_lock(&hugetlb_lock); > + > + /* > + * Check for a node specific request. Adjust global count, but > + * restrict alloc/free to the specified node. > + */ > + if (nid != NUMA_NO_NODE) { > + unsigned long old_count = count; > + count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; > + /* > + * If user specified count causes overflow, set to > + * largest possible value. > + */ > + if (count < old_count) > + count = ULONG_MAX; > + } > + > /* > * Increase the pool size > * First take pages out of surplus state. 
Then make up the > @@ -2300,7 +2317,6 @@ static int set_max_huge_pages(struct hstate *h, unsigned > long count, > * pool might be one hugepage larger than it needs to be, but > * within all the constraints specified by the sysctls. > */ > - spin_lock(&hugetlb_lock); > while (h->surplus_huge_pages && count > persistent_huge_pages(h)) { > if (!adjust_pool_surplus(h, nodes_allowed, -1)) > break; > @@ -2421,16 +2437,18 @@ static ssize_t __nr_hugepages_store_common(bool > obey_mempolicy, > nodes_allowed = &node_states[N_MEMORY]; > } > } else if (nodes_allowed) { > + /* Node specific request */ > + init_nodemask_of_node(nodes_allowed, nid); > + } else { > /* > - * per node hstate attribute: adjust count to global, > - * but restrict alloc/free to the specified node. > + * Node specific request, but we could not allocate > + * node mask. Pass in ALL nodes, and clear nid. > */ > - count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; > - init_nodemask_of_node(nodes_allowed, nid); > - } else > + nid = NUMA_NO_NODE; > nodes_allowed = &node_states[N_MEMORY]; > + } > > - err = set_max_huge_pages(h, count, nodes_allowed); > + err = set_max_huge_pages(h, count, nid, nodes_allowed); > if (err) > goto out; > Looks good; Jing, could you test that this fixes your case? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-25 19:17 ` David Rientjes @ 2019-02-26 2:22 ` Jing Xiangfeng 2019-02-26 6:21 ` David Rientjes 0 siblings, 1 reply; 20+ messages in thread From: Jing Xiangfeng @ 2019-02-26 2:22 UTC (permalink / raw) To: David Rientjes, Mike Kravetz Cc: mhocko, akpm, hughd, linux-mm, n-horiguchi, aarcange, kirill.shutemov, linux-kernel On 2019/2/26 3:17, David Rientjes wrote: > On Mon, 25 Feb 2019, Mike Kravetz wrote: > >> Ok, what about just moving the calculation/check inside the lock as in the >> untested patch below? >> >> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> >> --- >> mm/hugetlb.c | 34 ++++++++++++++++++++++++++-------- >> 1 file changed, 26 insertions(+), 8 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index 1c5219193b9e..5afa77dc7bc8 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -2274,7 +2274,7 @@ static int adjust_pool_surplus(struct hstate *h, >> nodemask_t *nodes_allowed, >> } >> >> #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages) >> -static int set_max_huge_pages(struct hstate *h, unsigned long count, >> +static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid, >> nodemask_t *nodes_allowed) >> { >> unsigned long min_count, ret; >> @@ -2289,6 +2289,23 @@ static int set_max_huge_pages(struct hstate *h, unsigned >> long count, >> goto decrease_pool; >> } >> >> + spin_lock(&hugetlb_lock); >> + >> + /* >> + * Check for a node specific request. Adjust global count, but >> + * restrict alloc/free to the specified node. >> + */ >> + if (nid != NUMA_NO_NODE) { >> + unsigned long old_count = count; >> + count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; >> + /* >> + * If user specified count causes overflow, set to >> + * largest possible value. >> + */ >> + if (count < old_count) >> + count = ULONG_MAX; >> + } >> + >> /* >> * Increase the pool size >> * First take pages out of surplus state. 
Then make up the >> @@ -2300,7 +2317,6 @@ static int set_max_huge_pages(struct hstate *h, unsigned >> long count, >> * pool might be one hugepage larger than it needs to be, but >> * within all the constraints specified by the sysctls. >> */ >> - spin_lock(&hugetlb_lock); >> while (h->surplus_huge_pages && count > persistent_huge_pages(h)) { >> if (!adjust_pool_surplus(h, nodes_allowed, -1)) >> break; >> @@ -2421,16 +2437,18 @@ static ssize_t __nr_hugepages_store_common(bool >> obey_mempolicy, >> nodes_allowed = &node_states[N_MEMORY]; >> } >> } else if (nodes_allowed) { >> + /* Node specific request */ >> + init_nodemask_of_node(nodes_allowed, nid); >> + } else { >> /* >> - * per node hstate attribute: adjust count to global, >> - * but restrict alloc/free to the specified node. >> + * Node specific request, but we could not allocate >> + * node mask. Pass in ALL nodes, and clear nid. >> */ >> - count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; >> - init_nodemask_of_node(nodes_allowed, nid); >> - } else >> + nid = NUMA_NO_NODE; >> nodes_allowed = &node_states[N_MEMORY]; >> + } >> >> - err = set_max_huge_pages(h, count, nodes_allowed); >> + err = set_max_huge_pages(h, count, nid, nodes_allowed); >> if (err) >> goto out; >> > > Looks good; Jing, could you test that this fixes your case? Yes, I have tested this patch, it can also fix my case. > > . > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-26 2:22 ` Jing Xiangfeng @ 2019-02-26 6:21 ` David Rientjes 2019-02-26 19:32 ` Mike Kravetz 0 siblings, 1 reply; 20+ messages in thread From: David Rientjes @ 2019-02-26 6:21 UTC (permalink / raw) To: Jing Xiangfeng Cc: Mike Kravetz, mhocko, Andrew Morton, hughd, linux-mm, n-horiguchi, Andrea Arcangeli, kirill.shutemov, linux-kernel On Tue, 26 Feb 2019, Jing Xiangfeng wrote: > On 2019/2/26 3:17, David Rientjes wrote: > > On Mon, 25 Feb 2019, Mike Kravetz wrote: > > > >> Ok, what about just moving the calculation/check inside the lock as in the > >> untested patch below? > >> > >> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> > >> --- > >> mm/hugetlb.c | 34 ++++++++++++++++++++++++++-------- > >> 1 file changed, 26 insertions(+), 8 deletions(-) > >> > >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c > >> index 1c5219193b9e..5afa77dc7bc8 100644 > >> --- a/mm/hugetlb.c > >> +++ b/mm/hugetlb.c > >> @@ -2274,7 +2274,7 @@ static int adjust_pool_surplus(struct hstate *h, > >> nodemask_t *nodes_allowed, > >> } > >> > >> #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages) > >> -static int set_max_huge_pages(struct hstate *h, unsigned long count, > >> +static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid, > >> nodemask_t *nodes_allowed) > >> { > >> unsigned long min_count, ret; > >> @@ -2289,6 +2289,23 @@ static int set_max_huge_pages(struct hstate *h, unsigned > >> long count, > >> goto decrease_pool; > >> } > >> > >> + spin_lock(&hugetlb_lock); > >> + > >> + /* > >> + * Check for a node specific request. Adjust global count, but > >> + * restrict alloc/free to the specified node. > >> + */ > >> + if (nid != NUMA_NO_NODE) { > >> + unsigned long old_count = count; > >> + count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; > >> + /* > >> + * If user specified count causes overflow, set to > >> + * largest possible value. 
> >> + */ > >> + if (count < old_count) > >> + count = ULONG_MAX; > >> + } > >> + > >> /* > >> * Increase the pool size > >> * First take pages out of surplus state. Then make up the > >> @@ -2300,7 +2317,6 @@ static int set_max_huge_pages(struct hstate *h, unsigned > >> long count, > >> * pool might be one hugepage larger than it needs to be, but > >> * within all the constraints specified by the sysctls. > >> */ > >> - spin_lock(&hugetlb_lock); > >> while (h->surplus_huge_pages && count > persistent_huge_pages(h)) { > >> if (!adjust_pool_surplus(h, nodes_allowed, -1)) > >> break; > >> @@ -2421,16 +2437,18 @@ static ssize_t __nr_hugepages_store_common(bool > >> obey_mempolicy, > >> nodes_allowed = &node_states[N_MEMORY]; > >> } > >> } else if (nodes_allowed) { > >> + /* Node specific request */ > >> + init_nodemask_of_node(nodes_allowed, nid); > >> + } else { > >> /* > >> - * per node hstate attribute: adjust count to global, > >> - * but restrict alloc/free to the specified node. > >> + * Node specific request, but we could not allocate > >> + * node mask. Pass in ALL nodes, and clear nid. > >> */ > >> - count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; > >> - init_nodemask_of_node(nodes_allowed, nid); > >> - } else > >> + nid = NUMA_NO_NODE; > >> nodes_allowed = &node_states[N_MEMORY]; > >> + } > >> > >> - err = set_max_huge_pages(h, count, nodes_allowed); > >> + err = set_max_huge_pages(h, count, nid, nodes_allowed); > >> if (err) > >> goto out; > >> > > > > Looks good; Jing, could you test that this fixes your case? > > Yes, I have tested this patch, it can also fix my case. Great! Reported-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Tested-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Acked-by: David Rientjes <rientjes@google.com> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-26 6:21 ` David Rientjes @ 2019-02-26 19:32 ` Mike Kravetz 2019-02-26 22:36 ` Andrew Morton 2019-03-04 6:00 ` Naoya Horiguchi 0 siblings, 2 replies; 20+ messages in thread From: Mike Kravetz @ 2019-02-26 19:32 UTC (permalink / raw) To: David Rientjes, Jing Xiangfeng, Andrew Morton Cc: mhocko, hughd, linux-mm, n-horiguchi, Andrea Arcangeli, kirill.shutemov, linux-kernel On 2/25/19 10:21 PM, David Rientjes wrote: > On Tue, 26 Feb 2019, Jing Xiangfeng wrote: >> On 2019/2/26 3:17, David Rientjes wrote: >>> On Mon, 25 Feb 2019, Mike Kravetz wrote: >>> >>>> Ok, what about just moving the calculation/check inside the lock as in the >>>> untested patch below? >>>> >>>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> <snip> >>> >>> Looks good; Jing, could you test that this fixes your case? >> >> Yes, I have tested this patch, it can also fix my case. > > Great! > > Reported-by: Jing Xiangfeng <jingxiangfeng@huawei.com> > Tested-by: Jing Xiangfeng <jingxiangfeng@huawei.com> > Acked-by: David Rientjes <rientjes@google.com> Thanks Jing and David! Here is the patch with an updated commit message and above tags: From: Mike Kravetz <mike.kravetz@oracle.com> Date: Tue, 26 Feb 2019 10:43:24 -0800 Subject: [PATCH] hugetlbfs: fix potential over/underflow setting node specific nr_hugepages The number of node specific huge pages can be set via a file such as: /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages When a node specific value is specified, the global number of huge pages must also be adjusted. This adjustment is calculated as the specified node specific value + (global value - current node value). If the node specific value provided by the user is large enough, this calculation could overflow an unsigned long leading to a smaller than expected number of huge pages. To fix, check the calculation for overflow. 
If overflow is detected, use ULONG_MAX as the requested value. This is in line with the user request to allocate as many huge pages as possible. It was also noticed that the above calculation was done outside the hugetlb_lock. Therefore, the values could be inconsistent and result in underflow. To fix, the calculation is moved to within the routine set_max_huge_pages() where the lock is held. Reported-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Tested-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Acked-by: David Rientjes <rientjes@google.com> --- mm/hugetlb.c | 34 ++++++++++++++++++++++++++-------- 1 file changed, 26 insertions(+), 8 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index b37e3100b7cc..a7e4223d2df5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2274,7 +2274,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed, } #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages) -static int set_max_huge_pages(struct hstate *h, unsigned long count, +static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid, nodemask_t *nodes_allowed) { unsigned long min_count, ret; @@ -2289,6 +2289,23 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, goto decrease_pool; } + spin_lock(&hugetlb_lock); + + /* + * Check for a node specific request. Adjust global count, but + * restrict alloc/free to the specified node. + */ + if (nid != NUMA_NO_NODE) { + unsigned long old_count = count; + count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; + /* + * If user specified count causes overflow, set to + * largest possible value. + */ + if (count < old_count) + count = ULONG_MAX; + } + /* * Increase the pool size * First take pages out of surplus state.
Then make up the @@ -2300,7 +2317,6 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, * pool might be one hugepage larger than it needs to be, but * within all the constraints specified by the sysctls. */ - spin_lock(&hugetlb_lock); while (h->surplus_huge_pages && count > persistent_huge_pages(h)) { if (!adjust_pool_surplus(h, nodes_allowed, -1)) break; @@ -2421,16 +2437,18 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy, nodes_allowed = &node_states[N_MEMORY]; } } else if (nodes_allowed) { + /* Node specific request */ + init_nodemask_of_node(nodes_allowed, nid); + } else { /* - * per node hstate attribute: adjust count to global, - * but restrict alloc/free to the specified node. + * Node specific request, but we could not allocate + * node mask. Pass in ALL nodes, and clear nid. */ - count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; - init_nodemask_of_node(nodes_allowed, nid); - } else + nid = NUMA_NO_NODE; nodes_allowed = &node_states[N_MEMORY]; + } - err = set_max_huge_pages(h, count, nodes_allowed); + err = set_max_huge_pages(h, count, nid, nodes_allowed); if (err) goto out; -- 2.17.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-26 19:32 ` Mike Kravetz @ 2019-02-26 22:36 ` Andrew Morton 2019-02-27 0:03 ` Mike Kravetz 2019-03-04 6:00 ` Naoya Horiguchi 1 sibling, 1 reply; 20+ messages in thread From: Andrew Morton @ 2019-02-26 22:36 UTC (permalink / raw) To: Mike Kravetz Cc: David Rientjes, Jing Xiangfeng, mhocko, hughd, linux-mm, n-horiguchi, Andrea Arcangeli, kirill.shutemov, linux-kernel > > The number of node specific huge pages can be set via a file such as: > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages > When a node specific value is specified, the global number of huge > pages must also be adjusted. This adjustment is calculated as the > specified node specific value + (global value - current node value). > If the node specific value provided by the user is large enough, this > calculation could overflow an unsigned long leading to a smaller > than expected number of huge pages. > > To fix, check the calculation for overflow. If overflow is detected, > use ULONG_MAX as the requested value. This is inline with the user > request to allocate as many huge pages as possible. > > It was also noticed that the above calculation was done outside the > hugetlb_lock. Therefore, the values could be inconsistent and result > in underflow. To fix, the calculation is moved to within the routine > set_max_huge_pages() where the lock is held. > > ... > > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -2274,7 +2274,7 @@ static int adjust_pool_surplus(struct hstate *h, > nodemask_t *nodes_allowed, Please tweak that email client to prevent the wordwraps. > + /* > + * Check for a node specific request. Adjust global count, but > + * restrict alloc/free to the specified node. 
> + */ > + if (nid != NUMA_NO_NODE) { > + unsigned long old_count = count; > + count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; > + /* > + * If user specified count causes overflow, set to > + * largest possible value. > + */ > + if (count < old_count) > + count = ULONG_MAX; > + } The above two comments explain the code, but do not reveal the reasoning behind the policy decisions which that code implements. > ... > > + } else { > /* > - * per node hstate attribute: adjust count to global, > - * but restrict alloc/free to the specified node. > + * Node specific request, but we could not allocate > + * node mask. Pass in ALL nodes, and clear nid. > */ Ditto here, somewhat. The old mantra: comments should explain "why", not "what". Reading the code tells us the "what". Thanks. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-26 22:36 ` Andrew Morton @ 2019-02-27 0:03 ` Mike Kravetz 2019-03-04 13:48 ` Oscar Salvador 2019-03-05 0:03 ` Naoya Horiguchi 0 siblings, 2 replies; 20+ messages in thread From: Mike Kravetz @ 2019-02-27 0:03 UTC (permalink / raw) To: Andrew Morton, David Rientjes Cc: Jing Xiangfeng, mhocko, hughd, linux-mm, n-horiguchi, Andrea Arcangeli, kirill.shutemov, linux-kernel On 2/26/19 2:36 PM, Andrew Morton wrote: >> ... >> >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -2274,7 +2274,7 @@ static int adjust_pool_surplus(struct hstate *h, >> nodemask_t *nodes_allowed, > > Please tweak that email client to prevent the wordwraps. Sorry about that. >> + /* >> + * Check for a node specific request. Adjust global count, but >> + * restrict alloc/free to the specified node. >> + */ Better comment might be: /* * Check for a node specific request. * Changing node specific huge page count may require a corresponding * change to the global count. In any case, the passed node mask * (nodes_allowed) will restrict alloc/free to the specified node. */ >> + if (nid != NUMA_NO_NODE) { >> + unsigned long old_count = count; >> + count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; >> + /* >> + * If user specified count causes overflow, set to >> + * largest possible value. >> + */ Updated comment: /* * User may have specified a large count value which caused the * above calculation to overflow. In this case, they wanted * to allocate as many huge pages as possible. Set count to * largest possible value to align with their intention. */ >> + if (count < old_count) >> + count = ULONG_MAX; >> + } > > The above two comments explain the code, but do not reveal the > reasoning behind the policy decisions which that code implements. > >> ... >> >> + } else { >> /* >> - * per node hstate attribute: adjust count to global, >> - * but restrict alloc/free to the specified node. 
>> + * Node specific request, but we could not allocate >> + * node mask. Pass in ALL nodes, and clear nid. >> */ > > Ditto here, somewhat. I was just going to update the comments and send you a new patch, but your comment got me thinking about this situation. I did not really change the way this code operates. As a reminder, the original code is like: NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY); if (nid == NUMA_NO_NODE) { /* do something */ } else if (nodes_allowed) { /* do something else */ } else { nodes_allowed = &node_states[N_MEMORY]; } So, the only way we get to that final else is if we cannot allocate a node mask (kmalloc a few words). Right? I wonder why we should even try to continue in this case. Why not just return right there? The specified count value is either a request to increase or decrease the number of huge pages. If we can't allocate a few words, we certainly are not going to find memory to create huge pages. There 'might' be surplus pages which can be converted to permanent pages. But remember this is a 'node specific' request and we can't allocate a mask to pass down to the conversion routines. So, chances are good we would operate on the wrong node. The same goes for a request to 'free' huge pages. Since we can't allocate a node mask, we are likely to free them from the wrong node. Unless my reasoning above is incorrect, I think that final else block in __nr_hugepages_store_common() is wrong. Any additional thoughts? -- Mike Kravetz ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-27 0:03 ` Mike Kravetz @ 2019-03-04 13:48 ` Oscar Salvador 2019-03-05 0:03 ` Naoya Horiguchi 0 siblings, 0 replies; 20+ messages in thread From: Oscar Salvador @ 2019-03-04 13:48 UTC (permalink / raw) To: Mike Kravetz Cc: Andrew Morton, David Rientjes, Jing Xiangfeng, mhocko, hughd, linux-mm, n-horiguchi, Andrea Arcangeli, kirill.shutemov, linux-kernel On Tue, Feb 26, 2019 at 04:03:23PM -0800, Mike Kravetz wrote: > I was just going to update the comments and send you a new patch, but > but your comment got me thinking about this situation. I did not really > change the way this code operates. As a reminder, the original code is like: > > NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY); > > if (nid == NUMA_NO_NODE) { > /* do something */ > } else if (nodes_allowed) { > /* do something else */ > } else { > nodes_allowed = &node_states[N_MEMORY]; > } > > So, the only way we get to that final else if if we can not allocate > a node mask (kmalloc a few words). Right? I wonder why we should > even try to continue in this case. Why not just return right there? > > The specified count value is either a request to increase number of > huge pages or decrease. If we can't allocate a few words, we certainly > are not going to find memory to create huge pages. There 'might' be > surplus pages which can be converted to permanent pages. But remember > this is a 'node specific' request and we can't allocate a mask to pass > down to the conversion routines. So, chances are good we would operate > on the wrong node. The same goes for a request to 'free' huge pages. > Since, we can't allocate a node mask we are likely to free them from > the wrong node. > > Unless my reasoning above is incorrect, I think that final else block > in __nr_hugepages_store_common() is wrong. > > Any additional thoughts? Could we not just kill the NODEMASK_ALLOC there?
__nr_hugepages_store_common() should be called from a rather shallow stack, and unless I am wrong, the maximum size a nodemask_t can reach is 128 bytes (1024 NUMA nodes). -- Oscar Salvador SUSE L3 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-27 0:03 ` Mike Kravetz 2019-03-04 13:48 ` Oscar Salvador @ 2019-03-05 0:03 ` Naoya Horiguchi 2019-03-05 4:15 ` Mike Kravetz 1 sibling, 1 reply; 20+ messages in thread From: Naoya Horiguchi @ 2019-03-05 0:03 UTC (permalink / raw) To: Mike Kravetz Cc: Andrew Morton, David Rientjes, Jing Xiangfeng, mhocko, hughd, linux-mm, Andrea Arcangeli, kirill.shutemov, linux-kernel On Tue, Feb 26, 2019 at 04:03:23PM -0800, Mike Kravetz wrote: > On 2/26/19 2:36 PM, Andrew Morton wrote: ... > >> > >> + } else { > >> /* > >> - * per node hstate attribute: adjust count to global, > >> - * but restrict alloc/free to the specified node. > >> + * Node specific request, but we could not allocate > >> + * node mask. Pass in ALL nodes, and clear nid. > >> */ > > > > Ditto here, somewhat. # I missed this part when reviewing yesterday for some reason, sorry. > > I was just going to update the comments and send you a new patch, but > but your comment got me thinking about this situation. I did not really > change the way this code operates. As a reminder, the original code is like: > > NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY); > > if (nid == NUMA_NO_NODE) { > /* do something */ > } else if (nodes_allowed) { > /* do something else */ > } else { > nodes_allowed = &node_states[N_MEMORY]; > } > > So, the only way we get to that final else if if we can not allocate > a node mask (kmalloc a few words). Right? I wonder why we should > even try to continue in this case. Why not just return right there? Simply returning on allocation failure looks better to me. As you mentioned below, current behavior for this 'else' case is not helpful for anyone. Thanks, Naoya Horiguchi > > The specified count value is either a request to increase number of > huge pages or decrease. If we can't allocate a few words, we certainly > are not going to find memory to create huge pages. 
There 'might' be > surplus pages which can be converted to permanent pages. But remember > this is a 'node specific' request and we can't allocate a mask to pass > down to the conversion routines. So, chances are good we would operate > on the wrong node. The same goes for a request to 'free' huge pages. > Since, we can't allocate a node mask we are likely to free them from > the wrong node. > > Unless my reasoning above is incorrect, I think that final else block > in __nr_hugepages_store_common() is wrong. > > Any additional thoughts? > -- > Mike Kravetz > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-03-05 0:03 ` Naoya Horiguchi @ 2019-03-05 4:15 ` Mike Kravetz 2019-03-05 21:16 ` Andrew Morton 2019-03-06 9:41 ` Oscar Salvador 0 siblings, 2 replies; 20+ messages in thread From: Mike Kravetz @ 2019-03-05 4:15 UTC (permalink / raw) To: Naoya Horiguchi, Andrew Morton, Oscar Salvador Cc: David Rientjes, Jing Xiangfeng, mhocko, hughd, linux-mm, Andrea Arcangeli, kirill.shutemov, linux-kernel, Alexandre Ghiti On 3/4/19 4:03 PM, Naoya Horiguchi wrote: > On Tue, Feb 26, 2019 at 04:03:23PM -0800, Mike Kravetz wrote: >> On 2/26/19 2:36 PM, Andrew Morton wrote: > ... >>>> >>>> + } else { >>>> /* >>>> - * per node hstate attribute: adjust count to global, >>>> - * but restrict alloc/free to the specified node. >>>> + * Node specific request, but we could not allocate >>>> + * node mask. Pass in ALL nodes, and clear nid. >>>> */ >>> >>> Ditto here, somewhat. > > # I missed this part when reviewing yesterday for some reason, sorry. > >> >> I was just going to update the comments and send you a new patch, but >> but your comment got me thinking about this situation. I did not really >> change the way this code operates. As a reminder, the original code is like: >> >> NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY); >> >> if (nid == NUMA_NO_NODE) { >> /* do something */ >> } else if (nodes_allowed) { >> /* do something else */ >> } else { >> nodes_allowed = &node_states[N_MEMORY]; >> } >> >> So, the only way we get to that final else if if we can not allocate >> a node mask (kmalloc a few words). Right? I wonder why we should >> even try to continue in this case. Why not just return right there? > > Simply returning on allocation failure looks better to me. > As you mentioned below, current behavior for this 'else' case is not > helpful for anyone. Thanks Naoya. And, thank you Oscar for your suggestion. 
I think the simplest thing to do would be to simply return in this case. In practice, we will likely never hit the condition. If we do, the system is really low on memory and there will be other more important issues. The revised patch below updates comments as suggested, and returns -ENOMEM if we cannot allocate a node mask. Andrew, this is on top of Alexandre Ghiti's "hugetlb: allow to free gigantic pages regardless of the configuration" patch. Both patches modify __nr_hugepages_store_common(). Alex's patch is going to change slightly in this area. Let me know if there is something I can do to help make merging easier. From: Mike Kravetz <mike.kravetz@oracle.com> Date: Mon, 4 Mar 2019 17:45:11 -0800 Subject: [PATCH] hugetlbfs: fix potential over/underflow setting node specific nr_hugepages The number of node specific huge pages can be set via a file such as: /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages When a node specific value is specified, the global number of huge pages must also be adjusted. This adjustment is calculated as the specified node specific value + (global value - current node value). If the node specific value provided by the user is large enough, this calculation could overflow an unsigned long leading to a smaller than expected number of huge pages. To fix, check the calculation for overflow. If overflow is detected, use ULONG_MAX as the requested value. This is in line with the user request to allocate as many huge pages as possible. It was also noticed that the above calculation was done outside the hugetlb_lock. Therefore, the values could be inconsistent and result in underflow. To fix, the calculation is moved within the routine set_max_huge_pages() where the lock is held. In addition, the code in __nr_hugepages_store_common() which tries to handle the case of not being able to allocate a node mask would likely result in incorrect behavior. Luckily, it is very unlikely we will ever take this path.
If we do, simply return ENOMEM. Reported-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- mm/hugetlb.c | 42 +++++++++++++++++++++++++++++++++--------- 1 file changed, 33 insertions(+), 9 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c5c4558e4a79..5a190a652cac 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2274,7 +2274,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed, } #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages) -static int set_max_huge_pages(struct hstate *h, unsigned long count, +static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid, nodemask_t *nodes_allowed) { unsigned long min_count, ret; @@ -2289,6 +2289,28 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, goto decrease_pool; } + spin_lock(&hugetlb_lock); + + /* + * Check for a node specific request. + * Changing node specific huge page count may require a corresponding + * change to the global count. In any case, the passed node mask + * (nodes_allowed) will restrict alloc/free to the specified node. + */ + if (nid != NUMA_NO_NODE) { + unsigned long old_count = count; + + count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; + /* + * User may have specified a large count value which caused the + * above calculation to overflow. In this case, they wanted + * to allocate as many huge pages as possible. Set count to + * largest possible value to align with their intention. + */ + if (count < old_count) + count = ULONG_MAX; + } + /* * Increase the pool size * First take pages out of surplus state. Then make up the @@ -2300,7 +2322,6 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, * pool might be one hugepage larger than it needs to be, but * within all the constraints specified by the sysctls. 
*/ - spin_lock(&hugetlb_lock); while (h->surplus_huge_pages && count > persistent_huge_pages(h)) { if (!adjust_pool_surplus(h, nodes_allowed, -1)) break; @@ -2421,16 +2442,19 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy, nodes_allowed = &node_states[N_MEMORY]; } } else if (nodes_allowed) { + /* Node specific request */ + init_nodemask_of_node(nodes_allowed, nid); + } else { /* - * per node hstate attribute: adjust count to global, - * but restrict alloc/free to the specified node. + * Node specific request, but we could not allocate the few + * words required for a node mask. We are unlikely to hit + * this condition. Since we can not pass down the appropriate + * node mask, just return ENOMEM. */ - count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; - init_nodemask_of_node(nodes_allowed, nid); - } else - nodes_allowed = &node_states[N_MEMORY]; + return -ENOMEM; + } - err = set_max_huge_pages(h, count, nodes_allowed); + err = set_max_huge_pages(h, count, nid, nodes_allowed); if (err) goto out; -- 2.17.2 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-03-05 4:15 ` Mike Kravetz @ 2019-03-05 21:16 ` Andrew Morton 2019-03-05 21:35 ` Mike Kravetz 2019-03-06 9:41 ` Oscar Salvador 1 sibling, 1 reply; 20+ messages in thread From: Andrew Morton @ 2019-03-05 21:16 UTC (permalink / raw) To: Mike Kravetz Cc: Naoya Horiguchi, Oscar Salvador, David Rientjes, Jing Xiangfeng, mhocko, hughd, linux-mm, Andrea Arcangeli, kirill.shutemov, linux-kernel, Alexandre Ghiti On Mon, 4 Mar 2019 20:15:40 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote: > Andrew, this is on top of Alexandre Ghiti's "hugetlb: allow to free gigantic > pages regardless of the configuration" patch. Both patches modify > __nr_hugepages_store_common(). Alex's patch is going to change slightly > in this area. OK, thanks, I missed that. Are the changes significant? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-03-05 21:16 ` Andrew Morton @ 2019-03-05 21:35 ` Mike Kravetz 2019-03-05 21:41 ` Alex Ghiti 0 siblings, 1 reply; 20+ messages in thread From: Mike Kravetz @ 2019-03-05 21:35 UTC (permalink / raw) To: Andrew Morton Cc: Naoya Horiguchi, Oscar Salvador, David Rientjes, Jing Xiangfeng, mhocko, hughd, linux-mm, Andrea Arcangeli, kirill.shutemov, linux-kernel, Alexandre Ghiti On 3/5/19 1:16 PM, Andrew Morton wrote: > On Mon, 4 Mar 2019 20:15:40 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote: > >> Andrew, this is on top of Alexandre Ghiti's "hugetlb: allow to free gigantic >> pages regardless of the configuration" patch. Both patches modify >> __nr_hugepages_store_common(). Alex's patch is going to change slightly >> in this area. > > OK, thanks, I missed that. Are the changes significant? > No, changes should be minor. IIRC, just checking for a condition in an error path. -- Mike Kravetz ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-03-05 21:35 ` Mike Kravetz @ 2019-03-05 21:41 ` Alex Ghiti 0 siblings, 0 replies; 20+ messages in thread From: Alex Ghiti @ 2019-03-05 21:41 UTC (permalink / raw) To: Mike Kravetz Cc: Naoya Horiguchi, Oscar Salvador, David Rientjes, Jing Xiangfeng, mhocko, hughd, linux-mm, Andrea Arcangeli, kirill.shutemov, linux-kernel, Andrew Morton On 3/5/19 4:35 PM, Mike Kravetz wrote: > On 3/5/19 1:16 PM, Andrew Morton wrote: >> On Mon, 4 Mar 2019 20:15:40 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote: >> >>> Andrew, this is on top of Alexandre Ghiti's "hugetlb: allow to free gigantic >>> pages regardless of the configuration" patch. Both patches modify >>> __nr_hugepages_store_common(). Alex's patch is going to change slightly >>> in this area. >> OK, thanks, I missed that. Are the changes significant? >> > No, changes should be minor. IIRC, just checking for a condition in an > error path. I will send v5 of this patch tomorrow; I was waiting for remarks from the architecture maintainers. I am still missing sh, but I'm confident in this change. Alex ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-03-05 4:15 ` Mike Kravetz 2019-03-05 21:16 ` Andrew Morton @ 2019-03-06 9:41 ` Oscar Salvador 2019-03-07 0:17 ` Mike Kravetz 1 sibling, 1 reply; 20+ messages in thread From: Oscar Salvador @ 2019-03-06 9:41 UTC (permalink / raw) To: Mike Kravetz Cc: Naoya Horiguchi, Andrew Morton, David Rientjes, Jing Xiangfeng, mhocko, hughd, linux-mm, Andrea Arcangeli, kirill.shutemov, linux-kernel, Alexandre Ghiti On Mon, Mar 04, 2019 at 08:15:40PM -0800, Mike Kravetz wrote: > In addition, the code in __nr_hugepages_store_common() which tries to > handle the case of not being able to allocate a node mask would likely > result in incorrect behavior. Luckily, it is very unlikely we will > ever take this path. If we do, simply return ENOMEM. Hi Mike, I still think that we could just get rid of the NODEMASK_ALLOC machinery here, as it adds needless complexity IMHO. Note that before "(5df66d306ec9: mm: fix comment for NODEMASK_ALLOC)", the comment about the size was wrong, showing a much bigger size than it actually was, and I would not be surprised if people started to add NODEMASK_ALLOC here and there because of that. Actually, there was a little talk about removing NODEMASK_ALLOC altogether, but some further checks must be done first.
> Reported-by: Jing Xiangfeng <jingxiangfeng@huawei.com> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> But the overall change looks good to me: Reviewed-by: Oscar Salvador <osalvador@suse.de> > --- > mm/hugetlb.c | 42 +++++++++++++++++++++++++++++++++--------- > 1 file changed, 33 insertions(+), 9 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index c5c4558e4a79..5a190a652cac 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -2274,7 +2274,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed, > } > > #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages) > -static int set_max_huge_pages(struct hstate *h, unsigned long count, > +static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid, > nodemask_t *nodes_allowed) > { > unsigned long min_count, ret; > @@ -2289,6 +2289,28 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, > goto decrease_pool; > } > > + spin_lock(&hugetlb_lock); > + > + /* > + * Check for a node specific request. > + * Changing node specific huge page count may require a corresponding > + * change to the global count. In any case, the passed node mask > + * (nodes_allowed) will restrict alloc/free to the specified node. > + */ > + if (nid != NUMA_NO_NODE) { > + unsigned long old_count = count; > + > + count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; > + /* > + * User may have specified a large count value which caused the > + * above calculation to overflow. In this case, they wanted > + * to allocate as many huge pages as possible. Set count to > + * largest possible value to align with their intention. > + */ > + if (count < old_count) > + count = ULONG_MAX; > + } > + > /* > * Increase the pool size > * First take pages out of surplus state. 
Then make up the > @@ -2300,7 +2322,6 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, > * pool might be one hugepage larger than it needs to be, but > * within all the constraints specified by the sysctls. > */ > - spin_lock(&hugetlb_lock); > while (h->surplus_huge_pages && count > persistent_huge_pages(h)) { > if (!adjust_pool_surplus(h, nodes_allowed, -1)) > break; > @@ -2421,16 +2442,19 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy, > nodes_allowed = &node_states[N_MEMORY]; > } > } else if (nodes_allowed) { > + /* Node specific request */ > + init_nodemask_of_node(nodes_allowed, nid); > + } else { > /* > - * per node hstate attribute: adjust count to global, > - * but restrict alloc/free to the specified node. > + * Node specific request, but we could not allocate the few > + * words required for a node mask. We are unlikely to hit > + * this condition. Since we can not pass down the appropriate > + * node mask, just return ENOMEM. > */ > - count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; > - init_nodemask_of_node(nodes_allowed, nid); > - } else > - nodes_allowed = &node_states[N_MEMORY]; > + return -ENOMEM; > + } > > - err = set_max_huge_pages(h, count, nodes_allowed); > + err = set_max_huge_pages(h, count, nid, nodes_allowed); > if (err) > goto out; > > -- > 2.17.2 > -- Oscar Salvador SUSE L3 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-03-06 9:41 ` Oscar Salvador @ 2019-03-07 0:17 ` Mike Kravetz 0 siblings, 0 replies; 20+ messages in thread From: Mike Kravetz @ 2019-03-07 0:17 UTC (permalink / raw) To: Oscar Salvador Cc: Naoya Horiguchi, Andrew Morton, David Rientjes, Jing Xiangfeng, mhocko, hughd, linux-mm, Andrea Arcangeli, kirill.shutemov, linux-kernel, Alexandre Ghiti On 3/6/19 1:41 AM, Oscar Salvador wrote: > On Mon, Mar 04, 2019 at 08:15:40PM -0800, Mike Kravetz wrote: >> In addition, the code in __nr_hugepages_store_common() which tries to >> handle the case of not being able to allocate a node mask would likely >> result in incorrect behavior. Luckily, it is very unlikely we will >> ever take this path. If we do, simply return ENOMEM. > > Hi Mike, > > I still thnk that we could just get rid of the NODEMASK_ALLOC machinery > here, it adds a needlessly complexity IMHO. > Note that before "(5df66d306ec9: mm: fix comment for NODEMASK_ALLOC)", > the comment about the size was wrong, showing a much bigger size that it > actually was, and I would not be surprised if people started to add > NODEMASK_ALLOC here and there because of that. > > Actually, there was a little talk about removing NODEMASK_ALLOC altogether, > but some further checks must be done before. Thanks for the information. I too saw or remembered a large byte value. :( A quick grep doesn't reveal any configurable way to get NODES_SHIFT larger than 10. Of course, that could change. So, it does seem a bit funny that NODEMASK_ALLOC() kicks into dynamic allocation mode with NODES_SHIFT > 8. Although, my desktop distro has NODES_SHIFT set to 10.
I'm going to leave it as is for now and put off removal of the dynamic allocation until a later time. Unless you get around to removing NODEMASK_ALLOC altogether. :) -- Mike Kravetz ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() 2019-02-26 19:32 ` Mike Kravetz 2019-02-26 22:36 ` Andrew Morton @ 2019-03-04 6:00 ` Naoya Horiguchi 1 sibling, 0 replies; 20+ messages in thread From: Naoya Horiguchi @ 2019-03-04 6:00 UTC (permalink / raw) To: Mike Kravetz Cc: David Rientjes, Jing Xiangfeng, Andrew Morton, mhocko, hughd, linux-mm, Andrea Arcangeli, kirill.shutemov, linux-kernel On Tue, Feb 26, 2019 at 11:32:24AM -0800, Mike Kravetz wrote: > On 2/25/19 10:21 PM, David Rientjes wrote: > > On Tue, 26 Feb 2019, Jing Xiangfeng wrote: > >> On 2019/2/26 3:17, David Rientjes wrote: > >>> On Mon, 25 Feb 2019, Mike Kravetz wrote: > >>> > >>>> Ok, what about just moving the calculation/check inside the lock as in the > >>>> untested patch below? > >>>> > >>>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> > > <snip> > > >>> > >>> Looks good; Jing, could you test that this fixes your case? > >> > >> Yes, I have tested this patch, it can also fix my case. > > > > Great! > > > > Reported-by: Jing Xiangfeng <jingxiangfeng@huawei.com> > > Tested-by: Jing Xiangfeng <jingxiangfeng@huawei.com> > > Acked-by: David Rientjes <rientjes@google.com> > > Thanks Jing and David! > > Here is the patch with an updated commit message and above tags: > > From: Mike Kravetz <mike.kravetz@oracle.com> > Date: Tue, 26 Feb 2019 10:43:24 -0800 > Subject: [PATCH] hugetlbfs: fix potential over/underflow setting node specific > nr_hugepages > > The number of node specific huge pages can be set via a file such as: > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages > When a node specific value is specified, the global number of huge > pages must also be adjusted. This adjustment is calculated as the > specified node specific value + (global value - current node value). 
> If the node specific value provided by the user is large enough, this > calculation could overflow an unsigned long leading to a smaller > than expected number of huge pages. > > To fix, check the calculation for overflow. If overflow is detected, > use ULONG_MAX as the requested value. This is inline with the user > request to allocate as many huge pages as possible. > > It was also noticed that the above calculation was done outside the > hugetlb_lock. Therefore, the values could be inconsistent and result > in underflow. To fix, the calculation is moved to within the routine > set_max_huge_pages() where the lock is held. > > Reported-by: Jing Xiangfeng <jingxiangfeng@huawei.com> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> > Tested-by: Jing Xiangfeng <jingxiangfeng@huawei.com> > Acked-by: David Rientjes <rientjes@google.com> Looks good to me with improved comments. Thanks everyone. Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > --- > mm/hugetlb.c | 34 ++++++++++++++++++++++++++-------- > 1 file changed, 26 insertions(+), 8 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index b37e3100b7cc..a7e4223d2df5 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -2274,7 +2274,7 @@ static int adjust_pool_surplus(struct hstate *h, > nodemask_t *nodes_allowed, > } > > #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages) > -static int set_max_huge_pages(struct hstate *h, unsigned long count, > +static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid, > nodemask_t *nodes_allowed) > { > unsigned long min_count, ret; > @@ -2289,6 +2289,23 @@ static int set_max_huge_pages(struct hstate *h, unsigned > long count, > goto decrease_pool; > } > > + spin_lock(&hugetlb_lock); > + > + /* > + * Check for a node specific request. Adjust global count, but > + * restrict alloc/free to the specified node. 
> + */ > + if (nid != NUMA_NO_NODE) { > + unsigned long old_count = count; > + count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; > + /* > + * If user specified count causes overflow, set to > + * largest possible value. > + */ > + if (count < old_count) > + count = ULONG_MAX; > + } > + > /* > * Increase the pool size > * First take pages out of surplus state. Then make up the > @@ -2300,7 +2317,6 @@ static int set_max_huge_pages(struct hstate *h, unsigned > long count, > * pool might be one hugepage larger than it needs to be, but > * within all the constraints specified by the sysctls. > */ > - spin_lock(&hugetlb_lock); > while (h->surplus_huge_pages && count > persistent_huge_pages(h)) { > if (!adjust_pool_surplus(h, nodes_allowed, -1)) > break; > @@ -2421,16 +2437,18 @@ static ssize_t __nr_hugepages_store_common(bool > obey_mempolicy, > nodes_allowed = &node_states[N_MEMORY]; > } > } else if (nodes_allowed) { > + /* Node specific request */ > + init_nodemask_of_node(nodes_allowed, nid); > + } else { > /* > - * per node hstate attribute: adjust count to global, > - * but restrict alloc/free to the specified node. > + * Node specific request, but we could not allocate > + * node mask. Pass in ALL nodes, and clear nid. > */ > - count += h->nr_huge_pages - h->nr_huge_pages_node[nid]; > - init_nodemask_of_node(nodes_allowed, nid); > - } else > + nid = NUMA_NO_NODE; > nodes_allowed = &node_states[N_MEMORY]; > + } > > - err = set_max_huge_pages(h, count, nodes_allowed); > + err = set_max_huge_pages(h, count, nid, nodes_allowed); > if (err) > goto out; > > -- > 2.17.2 > > ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2019-03-07 0:18 UTC | newest] Thread overview: 20+ messages -- 2019-02-23 1:32 [PATCH v4] mm/hugetlb: Fix unsigned overflow in __nr_hugepages_store_common() Jing Xiangfeng 2019-02-25 0:45 ` Mike Kravetz 2019-02-25 3:17 ` David Rientjes 2019-02-25 16:49 ` Mike Kravetz 2019-02-25 18:19 ` Mike Kravetz 2019-02-25 19:17 ` David Rientjes 2019-02-26 2:22 ` Jing Xiangfeng 2019-02-26 6:21 ` David Rientjes 2019-02-26 19:32 ` Mike Kravetz 2019-02-26 22:36 ` Andrew Morton 2019-02-27 0:03 ` Mike Kravetz 2019-03-04 13:48 ` Oscar Salvador 2019-03-05 0:03 ` Naoya Horiguchi 2019-03-05 4:15 ` Mike Kravetz 2019-03-05 21:16 ` Andrew Morton 2019-03-05 21:35 ` Mike Kravetz 2019-03-05 21:41 ` Alex Ghiti 2019-03-06 9:41 ` Oscar Salvador 2019-03-07 0:17 ` Mike Kravetz 2019-03-04 6:00 ` Naoya Horiguchi