* Re: [PATCH] mm: hugetlb: fix hugetlb_cma_reserve() if CONFIG_NUMA isn't set
From: Michal Hocko @ 2020-03-18 16:16 UTC
To: Roman Gushchin
Cc: Andrew Morton, linux-mm, kernel-team, linux-kernel, Rik van Riel,
    Andreas Schaufler, Mike Kravetz, Guido Günther, Naresh Kamboju

On Wed 18-03-20 08:34:24, Roman Gushchin wrote:
> If CONFIG_NUMA isn't set, there is no need to ensure that
> the hugetlb cma area belongs to a specific numa node.
>
> min/max_low_pfn can be used for limiting the maximum size
> of the hugetlb_cma area.
>
> Also for_each_mem_pfn_range() is defined only if
> CONFIG_HAVE_MEMBLOCK_NODE_MAP is set, and on arm (unlike most
> other architectures) it depends on CONFIG_NUMA. This makes the
> build fail if CONFIG_NUMA isn't set.

CONFIG_HAVE_MEMBLOCK_NODE_MAP has popped out as a problem several times
already. Is there any real reason we cannot make it unconditional?
Essentially make the functionality always enabled and drop the config?

The code below is ugly as hell. Just look at it. You have
for_each_node_state without any ifdefery, but then an ifdef
CONFIG_NUMA inside the loop. That just doesn't make any sense.
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Reported-by: Andreas Schaufler <andreas.schaufler@gmx.de>
> ---
>  mm/hugetlb.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 7a20cae7c77a..a6161239abde 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5439,16 +5439,21 @@ void __init hugetlb_cma_reserve(int order)
>
>  	reserved = 0;
>  	for_each_node_state(nid, N_ONLINE) {
> -		unsigned long start_pfn, end_pfn;
>  		unsigned long min_pfn = 0, max_pfn = 0;
> -		int res, i;
> +		int res;
> +#ifdef CONFIG_NUMA
> +		unsigned long start_pfn, end_pfn;
> +		int i;
>
>  		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
>  			if (!min_pfn)
>  				min_pfn = start_pfn;
>  			max_pfn = end_pfn;
>  		}
> -
> +#else
> +		min_pfn = min_low_pfn;
> +		max_pfn = max_low_pfn;
> +#endif
>  		size = max(per_node, hugetlb_cma_size - reserved);
>  		size = round_up(size, PAGE_SIZE << order);
>
> --
> 2.24.1

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH] mm: hugetlb: fix hugetlb_cma_reserve() if CONFIG_NUMA isn't set
From: Roman Gushchin @ 2020-03-18 17:55 UTC
To: Michal Hocko
Cc: Andrew Morton, linux-mm, kernel-team, linux-kernel, Rik van Riel,
    Andreas Schaufler, Mike Kravetz, Guido Günther, Naresh Kamboju

On Wed, Mar 18, 2020 at 05:16:25PM +0100, Michal Hocko wrote:
> On Wed 18-03-20 08:34:24, Roman Gushchin wrote:
> > If CONFIG_NUMA isn't set, there is no need to ensure that
> > the hugetlb cma area belongs to a specific numa node.
> >
> > min/max_low_pfn can be used for limiting the maximum size
> > of the hugetlb_cma area.
> >
> > Also for_each_mem_pfn_range() is defined only if
> > CONFIG_HAVE_MEMBLOCK_NODE_MAP is set, and on arm (unlike most
> > other architectures) it depends on CONFIG_NUMA. This makes the
> > build fail if CONFIG_NUMA isn't set.
>
> CONFIG_HAVE_MEMBLOCK_NODE_MAP has popped out as a problem several times
> already. Is there any real reason we cannot make it unconditional?
> Essentially make the functionality always enabled and drop the config?

It depends on CONFIG_NUMA only on arm, and I really don't know
if there is a good justification for it. If not, that will be a much
simpler fix.

> The code below is ugly as hell. Just look at it. You have
> for_each_node_state without any ifdefery, but then an ifdef
> CONFIG_NUMA inside the loop. That just doesn't make any sense.

I don't think it's senseless: it tries to reserve a cma area on each
node (hence for_each_node_state()), and it uses for_each_mem_pfn_range()
to get the min and max pfn for each node.
I agree that for_each_mem_pfn_range() here looks quite ugly, but I don't
know of a better way to get min/max pfns for a node so early in the boot
process. If somebody has any ideas here, I'd appreciate it a lot.

I know Rik plans some further improvements here, so the goal for now is
to fix the build. If you think that enabling CONFIG_HAVE_MEMBLOCK_NODE_MAP
unconditionally is the way to go, I'm fine with it too. Rik also posted
a different fix for the build problem, but from what I've seen it didn't
fix it completely. I'm fine with either option here.

Thanks!

> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Reported-by: Andreas Schaufler <andreas.schaufler@gmx.de>
> > ---
> >  mm/hugetlb.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 7a20cae7c77a..a6161239abde 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -5439,16 +5439,21 @@ void __init hugetlb_cma_reserve(int order)
> >
> >  	reserved = 0;
> >  	for_each_node_state(nid, N_ONLINE) {
> > -		unsigned long start_pfn, end_pfn;
> >  		unsigned long min_pfn = 0, max_pfn = 0;
> > -		int res, i;
> > +		int res;
> > +#ifdef CONFIG_NUMA
> > +		unsigned long start_pfn, end_pfn;
> > +		int i;
> >
> >  		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
> >  			if (!min_pfn)
> >  				min_pfn = start_pfn;
> >  			max_pfn = end_pfn;
> >  		}
> > -
> > +#else
> > +		min_pfn = min_low_pfn;
> > +		max_pfn = max_low_pfn;
> > +#endif
> >  		size = max(per_node, hugetlb_cma_size - reserved);
> >  		size = round_up(size, PAGE_SIZE << order);
> >
> > --
> > 2.24.1
>
> --
> Michal Hocko
> SUSE Labs
* Re: [PATCH] mm: hugetlb: fix hugetlb_cma_reserve() if CONFIG_NUMA isn't set
From: Michal Hocko @ 2020-03-19 16:16 UTC
To: Roman Gushchin
Cc: Andrew Morton, linux-mm, kernel-team, linux-kernel, Rik van Riel,
    Andreas Schaufler, Mike Kravetz, Guido Günther, Naresh Kamboju

On Wed 18-03-20 10:55:29, Roman Gushchin wrote:
> On Wed, Mar 18, 2020 at 05:16:25PM +0100, Michal Hocko wrote:
> > On Wed 18-03-20 08:34:24, Roman Gushchin wrote:
> > > If CONFIG_NUMA isn't set, there is no need to ensure that
> > > the hugetlb cma area belongs to a specific numa node.
> > >
> > > min/max_low_pfn can be used for limiting the maximum size
> > > of the hugetlb_cma area.
> > >
> > > Also for_each_mem_pfn_range() is defined only if
> > > CONFIG_HAVE_MEMBLOCK_NODE_MAP is set, and on arm (unlike most
> > > other architectures) it depends on CONFIG_NUMA. This makes the
> > > build fail if CONFIG_NUMA isn't set.
> >
> > CONFIG_HAVE_MEMBLOCK_NODE_MAP has popped out as a problem several times
> > already. Is there any real reason we cannot make it unconditional?
> > Essentially make the functionality always enabled and drop the config?
>
> It depends on CONFIG_NUMA only on arm, and I really don't know
> if there is a good justification for it. If not, that will be a much
> simpler fix.

I have checked the history and the dependency has been there since NUMA
was introduced on arm64. So it would be great to double check with the
arch maintainers.

> > The code below is ugly as hell. Just look at it. You have
> > for_each_node_state without any ifdefery, but then an ifdef
> > CONFIG_NUMA inside the loop. That just doesn't make any sense.
>
> I don't think it's senseless: it tries to reserve a cma area on each
> node (hence for_each_node_state()), and it uses for_each_mem_pfn_range()
> to get the min and max pfn for each node.
> With !CONFIG_NUMA the first part is reduced to one iteration and the
> second part is not required at all.

Sure, the resulting code logic makes sense. I meant that it doesn't make
much sense wrt readability: there is a loop over all existing numa nodes,
only to have an ifdef for NUMA inside the loop. See?

> I agree that for_each_mem_pfn_range() here looks quite ugly, but I don't
> know of a better way to get min/max pfns for a node so early in the boot
> process. If somebody has any ideas here, I'd appreciate it a lot.

The loop is OK. Maybe we have another memblock API that would be better,
but I am not aware of one off the top of my head. I would stick with it.
It just sucks that this API depends on HAVE_MEMBLOCK_NODE_MAP and is not
generally available. This is what I am complaining about. Just look at
the kind of dirty hack it made you create ;)

> I know Rik plans some further improvements here, so the goal for now
> is to fix the build. If you think that enabling CONFIG_HAVE_MEMBLOCK_NODE_MAP
> unconditionally is the way to go, I'm fine with it too.

This is not the first time HAVE_MEMBLOCK_NODE_MAP has been problematic.
I might be missing something, but I really do not get why we still need
it these days. As for !NUMA, I suspect we can make it generate the right
thing when !NUMA.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH] mm: hugetlb: fix hugetlb_cma_reserve() if CONFIG_NUMA isn't set
From: Rik van Riel @ 2020-03-19 16:56 UTC
To: Michal Hocko, Roman Gushchin, Aslan Bakirov
Cc: Andrew Morton, linux-mm, kernel-team, linux-kernel,
    Andreas Schaufler, Mike Kravetz, Guido Günther, Naresh Kamboju

On Thu, 2020-03-19 at 17:16 +0100, Michal Hocko wrote:
>
> This is not the first time HAVE_MEMBLOCK_NODE_MAP has been problematic.
> I might be missing something, but I really do not get why we still need
> it these days. As for !NUMA, I suspect we can make it generate the right
> thing when !NUMA.

We're working on a different fix now.

It looks like cma_declare_contiguous() calls memblock_phys_alloc_range(),
which calls memblock_alloc_range_nid(), which takes a NUMA node as one
of its arguments.

Aslan is looking at simply adding a cma_declare_contiguous_nid(), which
also takes a NUMA node ID as an argument. At that point we can simply
leave CMA free to allocate from anywhere in each NUMA node, which by
default already happens from the top down.

That should be the nicer long term fix to this issue.

-- 
All Rights Reversed.
* Re: [PATCH] mm: hugetlb: fix hugetlb_cma_reserve() if CONFIG_NUMA isn't set
From: Michal Hocko @ 2020-03-19 17:01 UTC
To: Rik van Riel
Cc: Roman Gushchin, Aslan Bakirov, Andrew Morton, linux-mm, kernel-team,
    linux-kernel, Andreas Schaufler, Mike Kravetz, Guido Günther,
    Naresh Kamboju

On Thu 19-03-20 12:56:34, Rik van Riel wrote:
> On Thu, 2020-03-19 at 17:16 +0100, Michal Hocko wrote:
> >
> > This is not the first time HAVE_MEMBLOCK_NODE_MAP has been problematic.
> > I might be missing something, but I really do not get why we still need
> > it these days. As for !NUMA, I suspect we can make it generate the right
> > thing when !NUMA.
>
> We're working on a different fix now.
>
> It looks like cma_declare_contiguous() calls memblock_phys_alloc_range(),
> which calls memblock_alloc_range_nid(), which takes a NUMA node as one
> of its arguments.
>
> Aslan is looking at simply adding a cma_declare_contiguous_nid(), which
> also takes a NUMA node ID as an argument. At that point we can simply
> leave CMA free to allocate from anywhere in each NUMA node, which by
> default already happens from the top down.
>
> That should be the nicer long term fix to this issue.

Yes, that sounds like a better solution. Not that I wouldn't be happier
to have HAVE_MEMBLOCK_NODE_MAP off the table as well ;) Ohh well, it
will have to remain on my todo list for a while longer.

-- 
Michal Hocko
SUSE Labs