* [PATCH] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
@ 2019-03-14  9:39 Vlastimil Babka
  2019-03-14  9:42 ` [PATCH v2] " Vlastimil Babka
  0 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2019-03-14  9:39 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, Kirill A. Shutemov, Mel Gorman, Takashi Iwai,
	Vlastimil Babka

alloc_pages_exact*() allocates a page of sufficient order and then splits it
to return only the number of pages requested. That makes it incompatible with
__GFP_COMP, because compound pages cannot be split.

As shown by [1] things may silently work until the requested size (possibly
depending on user) stops being a power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
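
For illustration only, the size arithmetic behind the split can be modelled in
userspace as below. This is a sketch, not the code in mm/page_alloc.c:
MODEL_PAGE_SIZE and the model_*() helpers are made-up names, a 4 KiB page is
assumed, and nothing here models what split_page() does to a compound page. It
only shows how many surplus pages exist for a given request.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this userspace model; real kernels vary by arch. */
#define MODEL_PAGE_SIZE 4096UL

/* Number of pages needed to hold size bytes, rounded up. */
static unsigned long model_pages_needed(size_t size)
{
	return (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* Smallest order such that (1 << order) pages cover size bytes. */
static unsigned int model_get_order(size_t size)
{
	unsigned long pages = model_pages_needed(size);
	unsigned int order = 0;

	assert(size > 0); /* a zero-sized request is outside this model */
	while ((1UL << order) < pages)
		order++;
	return order;
}

/* Pages in the order-N block beyond what the caller asked for. */
static unsigned long model_surplus_pages(size_t size)
{
	return (1UL << model_get_order(size)) - model_pages_needed(size);
}
```

Under these assumptions a 3-page request comes from an order-2 block with one
surplus page, while a 4-page request has no surplus at all.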

There are several options here, none of them great:

1) Don't do the splitting when __GFP_COMP is passed, and return the whole
compound page. However if caller then returns it via free_pages_exact(),
that will be unexpected and the freeing actions there will be wrong.

2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
things may break later somewhere.

3) Warn and return NULL. However NULL may be unexpected, especially for
small sizes.

This patch picks option 3, as it's best defined.

[1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0b9f577b1a2a..3127d47afaa7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4752,7 +4752,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
 /**
  * alloc_pages_exact - allocate an exact number physically-contiguous pages.
  * @size: the number of bytes to allocate
- * @gfp_mask: GFP flags for the allocation
+ * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
  *
  * This function is similar to alloc_pages(), except that it allocates the
  * minimum number of pages to satisfy the request.  alloc_pages() can only
@@ -4768,6 +4768,10 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
 	unsigned long addr;
 
 	addr = __get_free_pages(gfp_mask, order);
+
+	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
+		gfp_mask &= ~__GFP_COMP;
+
 	return make_alloc_exact(addr, order, size);
 }
 EXPORT_SYMBOL(alloc_pages_exact);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14  9:39 [PATCH] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact() Vlastimil Babka
@ 2019-03-14  9:42 ` Vlastimil Babka
  2019-03-14 10:15   ` Michal Hocko
                     ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Vlastimil Babka @ 2019-03-14  9:42 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, Kirill A. Shutemov, Mel Gorman, Takashi Iwai,
	Vlastimil Babka

alloc_pages_exact*() allocates a page of sufficient order and then splits it
to return only the number of pages requested. That makes it incompatible with
__GFP_COMP, because compound pages cannot be split.

As shown by [1] things may silently work until the requested size (possibly
depending on user) stops being a power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.

There are several options here, none of them great:

1) Don't do the splitting when __GFP_COMP is passed, and return the whole
compound page. However if caller then returns it via free_pages_exact(),
that will be unexpected and the freeing actions there will be wrong.

2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
things may break later somewhere.

3) Warn and return NULL. However NULL may be unexpected, especially for
small sizes.

This patch picks option 3, as it's best defined.
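
The difference between options 2 and 3 can be sketched as a small userspace
model. Everything below is hypothetical: the MODEL_GFP_* values are not the
kernel's real bit layout, and model_check_flags() is a made-up helper, not a
kernel function. The model assumes the flags are inspected before any memory
is allocated, so that neither policy has anything to undo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model; not the kernel's real bits. */
#define MODEL_GFP_KERNEL 0x1u
#define MODEL_GFP_COMP   0x2u

enum model_policy {
	MODEL_STRIP_FLAG,  /* option 2: warn, clear the flag, carry on */
	MODEL_RETURN_NULL, /* option 3: warn, fail the allocation */
};

struct model_result {
	bool warned;        /* a warning would have been printed */
	bool proceed;       /* the allocation would be attempted */
	unsigned int flags; /* flags the allocator would actually see */
};

static struct model_result model_check_flags(unsigned int flags,
					     enum model_policy policy)
{
	struct model_result r = { .warned = false, .proceed = true,
				  .flags = flags };

	assert(policy == MODEL_STRIP_FLAG || policy == MODEL_RETURN_NULL);

	/* Nothing to do when the caller did not ask for a compound page. */
	if (!(flags & MODEL_GFP_COMP))
		return r;

	r.warned = true;
	if (policy == MODEL_STRIP_FLAG)
		r.flags = flags & ~MODEL_GFP_COMP;
	else
		r.proceed = false;
	return r;
}
```

In this model both policies warn; they differ only in whether the caller gets
memory without the compound setup it asked for, or gets nothing.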

[1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Sent v1 before amending commit, sorry.

 mm/page_alloc.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0b9f577b1a2a..dd3f89e8f88d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4752,7 +4752,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
 /**
  * alloc_pages_exact - allocate an exact number physically-contiguous pages.
  * @size: the number of bytes to allocate
- * @gfp_mask: GFP flags for the allocation
+ * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
  *
  * This function is similar to alloc_pages(), except that it allocates the
  * minimum number of pages to satisfy the request.  alloc_pages() can only
@@ -4768,6 +4768,10 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
 	unsigned long addr;
 
 	addr = __get_free_pages(gfp_mask, order);
+
+	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
+		return NULL;
+
 	return make_alloc_exact(addr, order, size);
 }
 EXPORT_SYMBOL(alloc_pages_exact);
@@ -4777,7 +4781,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
  *			   pages on a node.
  * @nid: the preferred node ID where memory should be allocated
  * @size: the number of bytes to allocate
- * @gfp_mask: GFP flags for the allocation
+ * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
  *
  * Like alloc_pages_exact(), but try to allocate on node nid first before falling
  * back.
@@ -4785,7 +4789,12 @@ EXPORT_SYMBOL(alloc_pages_exact);
 void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
 {
 	unsigned int order = get_order(size);
-	struct page *p = alloc_pages_node(nid, gfp_mask, order);
+	struct page *p;
+
+	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
+		return NULL;
+
+	p = alloc_pages_node(nid, gfp_mask, order);
 	if (!p)
 		return NULL;
 	return make_alloc_exact((unsigned long)page_address(p), order, size);
-- 
2.20.1



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14  9:42 ` [PATCH v2] " Vlastimil Babka
@ 2019-03-14 10:15   ` Michal Hocko
  2019-03-14 10:30     ` Vlastimil Babka
  2019-03-14 18:51   ` Kirill A. Shutemov
  2019-03-18 12:21   ` [PATCH v3] " Vlastimil Babka
  2 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-03-14 10:15 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: linux-mm, Kirill A. Shutemov, Mel Gorman, Takashi Iwai

On Thu 14-03-19 10:42:49, Vlastimil Babka wrote:
> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> to return only the number of pages requested. That makes it incompatible with
> __GFP_COMP, because compound pages cannot be split.
> 
> As shown by [1] things may silently work until the requested size (possibly
> depending on user) stops being a power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
> triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
> 
> There are several options here, none of them great:
> 
> 1) Don't do the splitting when __GFP_COMP is passed, and return the whole
> compound page. However if caller then returns it via free_pages_exact(),
> that will be unexpected and the freeing actions there will be wrong.
> 
> 2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
> things may break later somewhere.
> 
> 3) Warn and return NULL. However NULL may be unexpected, especially for
> small sizes.
> 
> This patch picks option 3, as it's best defined.

The question is whether callers of alloc_pages_exact do have any
fallback because if they don't then this is forcing an always fail path
and I strongly suspect this is not really what users want. I would
rather go with 2) because "callers wanted it" is much less probable than
"caller is simply confused and more gfp flags is surely better than
fewer".

> [1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> Sent v1 before amending commit, sorry.
> 
>  mm/page_alloc.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0b9f577b1a2a..dd3f89e8f88d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4752,7 +4752,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
>  /**
>   * alloc_pages_exact - allocate an exact number physically-contiguous pages.
>   * @size: the number of bytes to allocate
> - * @gfp_mask: GFP flags for the allocation
> + * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
>   *
>   * This function is similar to alloc_pages(), except that it allocates the
>   * minimum number of pages to satisfy the request.  alloc_pages() can only
> @@ -4768,6 +4768,10 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
>  	unsigned long addr;
>  
>  	addr = __get_free_pages(gfp_mask, order);
> +
> +	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
> +		return NULL;
> +
>  	return make_alloc_exact(addr, order, size);
>  }
>  EXPORT_SYMBOL(alloc_pages_exact);
> @@ -4777,7 +4781,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
>   *			   pages on a node.
>   * @nid: the preferred node ID where memory should be allocated
>   * @size: the number of bytes to allocate
> - * @gfp_mask: GFP flags for the allocation
> + * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
>   *
>   * Like alloc_pages_exact(), but try to allocate on node nid first before falling
>   * back.
> @@ -4785,7 +4789,12 @@ EXPORT_SYMBOL(alloc_pages_exact);
>  void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
>  {
>  	unsigned int order = get_order(size);
> -	struct page *p = alloc_pages_node(nid, gfp_mask, order);
> +	struct page *p;
> +
> +	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
> +		return NULL;
> +
> +	p = alloc_pages_node(nid, gfp_mask, order);
>  	if (!p)
>  		return NULL;
>  	return make_alloc_exact((unsigned long)page_address(p), order, size);
> -- 
> 2.20.1

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14 10:15   ` Michal Hocko
@ 2019-03-14 10:30     ` Vlastimil Babka
  2019-03-14 11:36       ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2019-03-14 10:30 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Kirill A. Shutemov, Mel Gorman, Takashi Iwai

On 3/14/19 11:15 AM, Michal Hocko wrote:
> On Thu 14-03-19 10:42:49, Vlastimil Babka wrote:
>> alloc_pages_exact*() allocates a page of sufficient order and then splits it
>> to return only the number of pages requested. That makes it incompatible with
>> __GFP_COMP, because compound pages cannot be split.
>> 
>> As shown by [1] things may silently work until the requested size (possibly
>> depending on user) stops being a power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
>> triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
>> 
>> There are several options here, none of them great:
>> 
>> 1) Don't do the splitting when __GFP_COMP is passed, and return the whole
>> compound page. However if caller then returns it via free_pages_exact(),
>> that will be unexpected and the freeing actions there will be wrong.
>> 
>> 2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
>> things may break later somewhere.
>> 
>> 3) Warn and return NULL. However NULL may be unexpected, especially for
>> small sizes.
>> 
>> This patch picks option 3, as it's best defined.
> 
> The question is whether callers of alloc_pages_exact do have any
> fallback because if they don't then this is forcing an always fail path
> and I strongly suspect this is not really what users want. I would
> rather go with 2) because "callers wanted it" is much less probable than
> "caller is simply confused and more gfp flags is surely better than
> fewer".

I initially went with 2 as well, as you can see from v1 :) but then I looked at
the commit [2] mentioned in [1] and I think ALSA legitimately uses __GFP_COMP so
that the pages are then mapped to userspace. Breaking that didn't seem good.

The point is that with the warning in place, a developer will immediately know
that they did something wrong, regardless if the size is power-of-two or not.
But yeah, if it's adding of __GFP_COMP that is not deterministic, a bug can
still sit silently for a while.

But maybe we could go with 1) if free_pages_exact() is also adjusted to check
for CompoundPage and free it properly?
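
A rough userspace model of what such an adjusted free path would have to
distinguish is sketched below. The struct and the model_*() helpers are
invented for illustration and do not correspond to kernel code; the model only
counts free operations and pages released for the two cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a block handed out by the allocator. */
struct model_block {
	bool compound;          /* kept whole as one compound page */
	unsigned int order;     /* order of the underlying allocation */
	unsigned long nr_pages; /* pages the caller actually asked for */
};

/* Number of free operations the free path would issue in this model. */
static unsigned long model_free_calls(const struct model_block *b)
{
	assert(b->nr_pages <= (1UL << b->order));

	if (b->compound)
		return 1;        /* one free of the whole compound page */
	return b->nr_pages;      /* one free per split-out page */
}

/* Number of pages that go back to the allocator in this model. */
static unsigned long model_pages_released(const struct model_block *b)
{
	assert(b->nr_pages <= (1UL << b->order));

	/* An unsplit compound page still spans the full 2^order pages. */
	return b->compound ? (1UL << b->order) : b->nr_pages;
}
```

For a 3-page request backed by an order-2 block, the split case releases three
pages in three steps, while the compound case releases all four pages at once.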

>> [1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u

[2]
https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=3a6d1980fe96dbbfe3ae58db0048867f5319cdbf

>> 
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>> Sent v1 before amending commit, sorry.
>> 
>>  mm/page_alloc.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 0b9f577b1a2a..dd3f89e8f88d 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4752,7 +4752,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
>>  /**
>>   * alloc_pages_exact - allocate an exact number physically-contiguous pages.
>>   * @size: the number of bytes to allocate
>> - * @gfp_mask: GFP flags for the allocation
>> + * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
>>   *
>>   * This function is similar to alloc_pages(), except that it allocates the
>>   * minimum number of pages to satisfy the request.  alloc_pages() can only
>> @@ -4768,6 +4768,10 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
>>  	unsigned long addr;
>>  
>>  	addr = __get_free_pages(gfp_mask, order);
>> +
>> +	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
>> +		return NULL;
>> +
>>  	return make_alloc_exact(addr, order, size);
>>  }
>>  EXPORT_SYMBOL(alloc_pages_exact);
>> @@ -4777,7 +4781,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
>>   *			   pages on a node.
>>   * @nid: the preferred node ID where memory should be allocated
>>   * @size: the number of bytes to allocate
>> - * @gfp_mask: GFP flags for the allocation
>> + * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
>>   *
>>   * Like alloc_pages_exact(), but try to allocate on node nid first before falling
>>   * back.
>> @@ -4785,7 +4789,12 @@ EXPORT_SYMBOL(alloc_pages_exact);
>>  void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
>>  {
>>  	unsigned int order = get_order(size);
>> -	struct page *p = alloc_pages_node(nid, gfp_mask, order);
>> +	struct page *p;
>> +
>> +	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
>> +		return NULL;
>> +
>> +	p = alloc_pages_node(nid, gfp_mask, order);
>>  	if (!p)
>>  		return NULL;
>>  	return make_alloc_exact((unsigned long)page_address(p), order, size);
>> -- 
>> 2.20.1
> 



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14 10:30     ` Vlastimil Babka
@ 2019-03-14 11:36       ` Michal Hocko
  2019-03-14 11:56         ` Takashi Iwai
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-03-14 11:36 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: linux-mm, Kirill A. Shutemov, Mel Gorman, Takashi Iwai

On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> On 3/14/19 11:15 AM, Michal Hocko wrote:
> > On Thu 14-03-19 10:42:49, Vlastimil Babka wrote:
> >> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> >> to return only the number of pages requested. That makes it incompatible with
> >> __GFP_COMP, because compound pages cannot be split.
> >> 
> >> As shown by [1] things may silently work until the requested size (possibly
> >> depending on user) stops being a power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
> >> triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
> >> 
> >> There are several options here, none of them great:
> >> 
> >> 1) Don't do the splitting when __GFP_COMP is passed, and return the whole
> >> compound page. However if caller then returns it via free_pages_exact(),
> >> that will be unexpected and the freeing actions there will be wrong.
> >> 
> >> 2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
> >> things may break later somewhere.
> >> 
> >> 3) Warn and return NULL. However NULL may be unexpected, especially for
> >> small sizes.
> >> 
> >> This patch picks option 3, as it's best defined.
> > 
> > The question is whether callers of alloc_pages_exact do have any
> > fallback because if they don't then this is forcing an always fail path
> > and I strongly suspect this is not really what users want. I would
> > rather go with 2) because "callers wanted it" is much less probable than
> > "caller is simply confused and more gfp flags is surely better than
> > fewer".
> 
> I initially went with 2 as well, as you can see from v1 :) but then I looked at
> the commit [2] mentioned in [1] and I think ALSA legitimately uses __GFP_COMP so
> that the pages are then mapped to userspace. Breaking that didn't seem good.

It used the flag legitimately before because they were allocating
compound pages but now they don't so this is just a conversion bug.
Why should we screw up the helper for that reason? Or put in other words
why a silent fix up adds any risk?

> The point is that with the warning in place, a developer will immediately know
> that they did something wrong, regardless if the size is power-of-two or not.
> But yeah, if it's adding of __GFP_COMP that is not deterministic, a bug can
> still sit silently for a while.
> 
> But maybe we could go with 1) if free_pages_exact() is also adjusted to check
> for CompoundPage and free it properly?

I dunno, it sounds like it adds even more confusion.

> >> [1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u
> 
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=3a6d1980fe96dbbfe3ae58db0048867f5319cdbf
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14 11:36       ` Michal Hocko
@ 2019-03-14 11:56         ` Takashi Iwai
  2019-03-14 12:09           ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Takashi Iwai @ 2019-03-14 11:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, linux-mm, Kirill A. Shutemov, Mel Gorman, Takashi Iwai

On Thu, 14 Mar 2019 12:36:26 +0100,
Michal Hocko wrote:
> 
> On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> > On 3/14/19 11:15 AM, Michal Hocko wrote:
> > > On Thu 14-03-19 10:42:49, Vlastimil Babka wrote:
> > >> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> > >> to return only the number of pages requested. That makes it incompatible with
> > >> __GFP_COMP, because compound pages cannot be split.
> > >> 
> > >> As shown by [1] things may silently work until the requested size (possibly
> > >> depending on user) stops being a power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
> > >> triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
> > >> 
> > >> There are several options here, none of them great:
> > >> 
> > >> 1) Don't do the splitting when __GFP_COMP is passed, and return the whole
> > >> compound page. However if caller then returns it via free_pages_exact(),
> > >> that will be unexpected and the freeing actions there will be wrong.
> > >> 
> > >> 2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
> > >> things may break later somewhere.
> > >> 
> > >> 3) Warn and return NULL. However NULL may be unexpected, especially for
> > >> small sizes.
> > >> 
> > >> This patch picks option 3, as it's best defined.
> > > 
> > > The question is whether callers of alloc_pages_exact do have any
> > > fallback because if they don't then this is forcing an always fail path
> > > and I strongly suspect this is not really what users want. I would
> > > rather go with 2) because "callers wanted it" is much less probable than
> > > "caller is simply confused and more gfp flags is surely better than
> > > fewer".
> > 
> > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > the commit [2] mentioned in [1] and I think ALSA legitimately uses __GFP_COMP so
> > that the pages are then mapped to userspace. Breaking that didn't seem good.
> 
> It used the flag legitimately before because they were allocating
> compound pages but now they don't so this is just a conversion bug.

We still use __GFP_COMP for allocation of the sound buffers that are
also mmapped to user-space.  The mentioned commit above [2] was
reverted later.

But honestly speaking, I'm not sure whether we still need the compound
pages.  The change was introduced long time ago (commit f3d48f0373c1
in 2005).  Is it superfluous nowadays...?

> Why should we screw up the helper for that reason? Or put in other words
> why a silent fix up adds any risk?

IMO, it's good to catch the incompatible usage as early as possible,
so that others won't hit the same failure again like I did.  There
aren't so many users of __GFP_COMP in the whole tree, after all.


thanks,

Takashi

> > The point is that with the warning in place, a developer will immediately know
> > that they did something wrong, regardless if the size is power-of-two or not.
> > But yeah, if it's adding of __GFP_COMP that is not deterministic, a bug can
> > still sit silently for a while.
> > 
> > But maybe we could go with 1) if free_pages_exact() is also adjusted to check
> > for CompoundPage and free it properly?
> 
> I dunno, it sounds like it adds even more confusion.
> 
> > >> [1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u
> > 
> > [2]
> > https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=3a6d1980fe96dbbfe3ae58db0048867f5319cdbf
> -- 
> Michal Hocko
> SUSE Labs
> 



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14 11:56         ` Takashi Iwai
@ 2019-03-14 12:09           ` Michal Hocko
  2019-03-14 13:15             ` Takashi Iwai
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-03-14 12:09 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Vlastimil Babka, linux-mm, Kirill A. Shutemov, Mel Gorman

On Thu 14-03-19 12:56:43, Takashi Iwai wrote:
> On Thu, 14 Mar 2019 12:36:26 +0100,
> Michal Hocko wrote:
> > 
> > On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> > > On 3/14/19 11:15 AM, Michal Hocko wrote:
> > > > On Thu 14-03-19 10:42:49, Vlastimil Babka wrote:
> > > >> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> > > >> to return only the number of pages requested. That makes it incompatible with
> > > >> __GFP_COMP, because compound pages cannot be split.
> > > >> 
> > > >> As shown by [1] things may silently work until the requested size (possibly
> > > >> depending on user) stops being a power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
> > > >> triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
> > > >> 
> > > >> There are several options here, none of them great:
> > > >> 
> > > >> 1) Don't do the splitting when __GFP_COMP is passed, and return the whole
> > > >> compound page. However if caller then returns it via free_pages_exact(),
> > > >> that will be unexpected and the freeing actions there will be wrong.
> > > >> 
> > > >> 2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
> > > >> things may break later somewhere.
> > > >> 
> > > >> 3) Warn and return NULL. However NULL may be unexpected, especially for
> > > >> small sizes.
> > > >> 
> > > >> This patch picks option 3, as it's best defined.
> > > > 
> > > > The question is whether callers of alloc_pages_exact do have any
> > > > fallback because if they don't then this is forcing an always fail path
> > > > and I strongly suspect this is not really what users want. I would
> > > > rather go with 2) because "callers wanted it" is much less probable than
> > > > "caller is simply confused and more gfp flags is surely better than
> > > > fewer".
> > > 
> > > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > > the commit [2] mentioned in [1] and I think ALSA legitimately uses __GFP_COMP so
> > > that the pages are then mapped to userspace. Breaking that didn't seem good.
> > 
> > It used the flag legitimately before because they were allocating
> > compound pages but now they don't so this is just a conversion bug.
> 
> We still use __GFP_COMP for allocation of the sound buffers that are
> also mmapped to user-space.  The mentioned commit above [2] was
> reverted later.

Yes, I understand that part. __GFP_COMP makes sense on a compound page.
But if you are using alloc_pages_exact then the flag doesn't make sense
because split out should already do what you want. Unless I am missing
something.

> But honestly speaking, I'm not sure whether we still need the compound
> pages.  The change was introduced long time ago (commit f3d48f0373c1
> in 2005).  Is it superfluous nowadays...?

AFAIU alloc_pages_exact should do what you need.

> > Why should we screw up the helper for that reason? Or put in other words
> > why a silent fix up adds any risk?
> 
> IMO, it's good to catch the incompatible usage as early as possible,
> so that others won't hit the same failure again like I did.  There
> aren't so many users of __GFP_COMP in the whole tree, after all.

Yes, completely agreed and warning with a fixup sounds like the safest
option to me. Returning NULL is risky because it essentially introduces a
permanent failure mode as already pointed out.

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14 12:09           ` Michal Hocko
@ 2019-03-14 13:15             ` Takashi Iwai
  2019-03-14 13:29               ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Takashi Iwai @ 2019-03-14 13:15 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Vlastimil Babka, linux-mm, Kirill A. Shutemov, Mel Gorman

On Thu, 14 Mar 2019 13:09:39 +0100,
Michal Hocko wrote:
> 
> On Thu 14-03-19 12:56:43, Takashi Iwai wrote:
> > On Thu, 14 Mar 2019 12:36:26 +0100,
> > Michal Hocko wrote:
> > > 
> > > On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> > > > On 3/14/19 11:15 AM, Michal Hocko wrote:
> > > > > On Thu 14-03-19 10:42:49, Vlastimil Babka wrote:
> > > > >> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> > > > >> to return only the number of pages requested. That makes it incompatible with
> > > > >> __GFP_COMP, because compound pages cannot be split.
> > > > >> 
> > > > >> As shown by [1] things may silently work until the requested size (possibly
> > > > >> depending on user) stops being a power of two. Then for CONFIG_DEBUG_VM, BUG_ON()
> > > > >> triggers in split_page(). Without CONFIG_DEBUG_VM, consequences are unclear.
> > > > >> 
> > > > >> There are several options here, none of them great:
> > > > >> 
> > > > >> 1) Don't do the splitting when __GFP_COMP is passed, and return the whole
> > > > >> compound page. However if caller then returns it via free_pages_exact(),
> > > > >> that will be unexpected and the freeing actions there will be wrong.
> > > > >> 
> > > > >> 2) Warn and remove __GFP_COMP from the flags. But the caller wanted it, so
> > > > >> things may break later somewhere.
> > > > >> 
> > > > >> 3) Warn and return NULL. However NULL may be unexpected, especially for
> > > > >> small sizes.
> > > > >> 
> > > > >> This patch picks option 3, as it's best defined.
> > > > > 
> > > > > The question is whether callers of alloc_pages_exact do have any
> > > > > fallback because if they don't then this is forcing an always fail path
> > > > > and I strongly suspect this is not really what users want. I would
> > > > > rather go with 2) because "callers wanted it" is much less probable than
> > > > > "caller is simply confused and more gfp flags is surely better than
> > > > > fewer".
> > > > 
> > > > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > > > the commit [2] mentioned in [1] and I think ALSA legitimately uses __GFP_COMP so
> > > > that the pages are then mapped to userspace. Breaking that didn't seem good.
> > > 
> > > It used the flag legitimately before because they were allocating
> > > compound pages but now they don't so this is just a conversion bug.
> > 
> > We still use __GFP_COMP for allocation of the sound buffers that are
> > also mmapped to user-space.  The mentioned commit above [2] was
> > reverted later.
> 
> Yes, I understand that part. __GFP_COMP makes sense on a compound page.
> But if you are using alloc_pages_exact then the flag doesn't make sense
> because split out should already do what you want. Unless I am missing
> something.

The __GFP_COMP was taken as a sort of workaround for the problem wrt
mmap I already forgot.  If it can be eliminated, it's all good.

> > But honestly speaking, I'm not sure whether we still need the compound
> > pages.  The change was introduced long time ago (commit f3d48f0373c1
> > in 2005).  Is it superfluous nowadays...?
> 
> AFAIU alloc_pages_exact should do what you need.

OK, I'll try whether it works with alloc_pages_exact() and dropping
__GFP_COMP.


Thanks!

Takashi



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14 13:15             ` Takashi Iwai
@ 2019-03-14 13:29               ` Michal Hocko
  2019-03-14 16:52                 ` Takashi Iwai
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2019-03-14 13:29 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Vlastimil Babka, linux-mm, Kirill A. Shutemov, Mel Gorman

On Thu 14-03-19 14:15:38, Takashi Iwai wrote:
> On Thu, 14 Mar 2019 13:09:39 +0100,
> Michal Hocko wrote:
> > 
> > On Thu 14-03-19 12:56:43, Takashi Iwai wrote:
> > > On Thu, 14 Mar 2019 12:36:26 +0100,
> > > Michal Hocko wrote:
> > > > 
> > > > On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
[...]
> > > > > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > > > > the commit [2] mentioned in [1] and I think ALSA legitimately uses __GFP_COMP so
> > > > > that the pages are then mapped to userspace. Breaking that didn't seem good.
> > > > 
> > > > It used the flag legitimately before because they were allocating
> > > > compound pages but now they don't so this is just a conversion bug.
> > > 
> > > We still use __GFP_COMP for allocation of the sound buffers that are
> > > also mmapped to user-space.  The mentioned commit above [2] was
> > > reverted later.
> > 
> > Yes, I understand that part. __GFP_COMP makes sense on a compound page.
> > But if you are using alloc_pages_exact then the flag doesn't make sense
> > because split out should already do what you want. Unless I am missing
> > something.
> 
> The __GFP_COMP was taken as a sort of workaround for the problem wrt
> mmap I already forgot.  If it can be eliminated, it's all good.

Without __GFP_COMP you would get tail pages which are not setup properly
AFAIU. With alloc_pages_exact you should get an "array" of head pages
which are properly reference counted. But I might have misunderstood the
original problem which __GFP_COMP tried to solve.
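
To make the reference counting point concrete, here is a small userspace
model. It is only a sketch of my understanding: struct model_page and the
model_*() functions are invented stand-ins, and the only thing modelled is the
per-page reference count before and after the split.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the refcount is modelled. */
struct model_page {
	int refcount;
};

/*
 * Model of a non-compound order-N allocation: only the first page carries
 * a reference, the remaining pages are left with a zero count.
 */
static void model_alloc_high_order(struct model_page *pages,
				   unsigned int order)
{
	size_t i, n = (size_t)1 << order;

	pages[0].refcount = 1;
	for (i = 1; i < n; i++)
		pages[i].refcount = 0;
}

/* Model of the split: every page becomes an independently counted page. */
static void model_split_page(struct model_page *pages, unsigned int order)
{
	size_t i, n = (size_t)1 << order;

	assert(pages[0].refcount == 1);
	for (i = 1; i < n; i++)
		pages[i].refcount = 1;
}
```

After model_split_page() each element can be taken and dropped on its own,
which is the "array of head pages" picture described above.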

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14 13:29               ` Michal Hocko
@ 2019-03-14 16:52                 ` Takashi Iwai
  2019-03-14 17:37                   ` Hugh Dickins
  0 siblings, 1 reply; 19+ messages in thread
From: Takashi Iwai @ 2019-03-14 16:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, linux-mm, Kirill A. Shutemov, Mel Gorman, Hugh Dickins

On Thu, 14 Mar 2019 14:29:33 +0100,
Michal Hocko wrote:
> 
> On Thu 14-03-19 14:15:38, Takashi Iwai wrote:
> > On Thu, 14 Mar 2019 13:09:39 +0100,
> > Michal Hocko wrote:
> > > 
> > > On Thu 14-03-19 12:56:43, Takashi Iwai wrote:
> > > > On Thu, 14 Mar 2019 12:36:26 +0100,
> > > > Michal Hocko wrote:
> > > > > 
> > > > > On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> [...]
> > > > > > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > > > > > the commit [2] mentioned in [1] and I think ALSA legitimately uses __GFP_COMP so
> > > > > > that the pages are then mapped to userspace. Breaking that didn't seem good.
> > > > > 
> > > > > It used the flag legitimately before because they were allocating
> > > > > compound pages but now they don't so this is just a conversion bug.
> > > > 
> > > > We still use __GFP_COMP for allocation of the sound buffers that are
> > > > also mmapped to user-space.  The mentioned commit above [2] was
> > > > reverted later.
> > > 
> > > Yes, I understand that part. __GFP_COMP makes sense on a compound page.
> > > But if you are using alloc_pages_exact then the flag doesn't make sense
> > > because split out should already do what you want. Unless I am missing
> > > something.
> > 
> > The __GFP_COMP was taken as a sort of workaround for the problem wrt
> > mmap I already forgot.  If it can be eliminated, it's all good.
> 
> Without __GFP_COMP you would get tail pages which are not set up properly
> AFAIU. With alloc_pages_exact you should get an "array" of head pages
> which are properly reference counted. But I might have misunderstood the
> original problem which __GFP_COMP tried to solve.

I only vaguely remember that it was about a Bad Page error for the
reserved pages, but forgot all the details, sorry.

Hugh, could you confirm whether we still need __GFP_COMP in the sound
buffer allocations?  FWIW, it's the change introduced by the ancient
commit f3d48f0373c1.


thanks,

Takashi



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14 16:52                 ` Takashi Iwai
@ 2019-03-14 17:37                   ` Hugh Dickins
  2019-03-14 18:00                     ` Takashi Iwai
  0 siblings, 1 reply; 19+ messages in thread
From: Hugh Dickins @ 2019-03-14 17:37 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Michal Hocko, Vlastimil Babka, linux-mm, Kirill A. Shutemov,
	Mel Gorman, Hugh Dickins

On Thu, 14 Mar 2019, Takashi Iwai wrote:
> On Thu, 14 Mar 2019 14:29:33 +0100,
> Michal Hocko wrote:
> > 
> > On Thu 14-03-19 14:15:38, Takashi Iwai wrote:
> > > On Thu, 14 Mar 2019 13:09:39 +0100,
> > > Michal Hocko wrote:
> > > > 
> > > > On Thu 14-03-19 12:56:43, Takashi Iwai wrote:
> > > > > On Thu, 14 Mar 2019 12:36:26 +0100,
> > > > > Michal Hocko wrote:
> > > > > > 
> > > > > > On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> > [...]
> > > > > > > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > > > > > > the commit [2] mentioned in [1] and I think ALSA legitimately uses __GFP_COMP so
> > > > > > > that the pages are then mapped to userspace. Breaking that didn't seem good.
> > > > > > 
> > > > > > It used the flag legitimately before because they were allocating
> > > > > > compound pages but now they don't so this is just a conversion bug.
> > > > > 
> > > > > We still use __GFP_COMP for allocation of the sound buffers that are
> > > > > also mmapped to user-space.  The mentioned commit above [2] was
> > > > > reverted later.
> > > > 
> > > > Yes, I understand that part. __GFP_COMP makes sense on a compound page.
> > > > But if you are using alloc_pages_exact then the flag doesn't make sense
> > > > because split out should already do what you want. Unless I am missing
> > > > something.
> > > 
> > > The __GFP_COMP was taken as a sort of workaround for the problem wrt
> > > mmap I already forgot.  If it can be eliminated, it's all good.
> > 
> > Without __GFP_COMP you would get tail pages which are not set up properly
> > AFAIU. With alloc_pages_exact you should get an "array" of head pages
> > which are properly reference counted. But I might have misunderstood the
> > original problem which __GFP_COMP tried to solve.
> 
> I only vaguely remember that it was about a Bad Page error for the
> reserved pages, but forgot all the details, sorry.
> 
> Hugh, could you confirm whether we still need __GFP_COMP in the sound
> buffer allocations?  FWIW, it's the change introduced by the ancient
> commit f3d48f0373c1.

I'm not confident in finding all "the sound buffer allocations".
Where you're using alloc_pages_exact() for them, you do not need
__GFP_COMP, and should not pass it.  But if there are other places
where you use one of those page allocators with an "order" argument
non-zero, and map that buffer into userspace (without any split_page()),
there you would still need the __GFP_COMP - zap_pte_range() and others
do the wrong thing on tail ptes if the non-zero-order page has neither
been set up as compound nor split into zero-order pages.

Hugh
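
A rough userspace model of the rule described above, for illustration
only. This is not kernel code: safe_to_map_to_userspace() is a
hypothetical name, and its three parameters merely stand in for the
allocation order, for __GFP_COMP having been passed, and for split_page()
having been applied (which alloc_pages_exact() does itself). Treating
order-0 pages as always fine is an assumption read from the wording
"order argument non-zero".

```c
#include <stdbool.h>

/*
 * Model only: decide whether a buffer may be mapped to userspace.
 *   order    - allocation order passed to the page allocator
 *   compound - true if __GFP_COMP was passed
 *   split    - true if split_page() was applied to the allocation
 */
static bool safe_to_map_to_userspace(unsigned int order, bool compound,
				     bool split)
{
	/* A single page has no tail pages to get wrong. */
	if (order == 0)
		return true;

	/*
	 * A higher-order page must be either compound or split into
	 * order-0 pages. Both at once is the invalid combination this
	 * thread is about, since a compound page cannot be split.
	 */
	return compound != split;
}
```

In this model an order-2 buffer with neither property is rejected, one
with exactly one of them is accepted, and one with both is rejected.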



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14 17:37                   ` Hugh Dickins
@ 2019-03-14 18:00                     ` Takashi Iwai
  2019-03-14 18:15                       ` Hugh Dickins
  0 siblings, 1 reply; 19+ messages in thread
From: Takashi Iwai @ 2019-03-14 18:00 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Michal Hocko, Vlastimil Babka, linux-mm, Kirill A. Shutemov, Mel Gorman

On Thu, 14 Mar 2019 18:37:06 +0100,
Hugh Dickins wrote:
> 
> On Thu, 14 Mar 2019, Takashi Iwai wrote:
> > On Thu, 14 Mar 2019 14:29:33 +0100,
> > Michal Hocko wrote:
> > > 
> > > On Thu 14-03-19 14:15:38, Takashi Iwai wrote:
> > > > On Thu, 14 Mar 2019 13:09:39 +0100,
> > > > Michal Hocko wrote:
> > > > > 
> > > > > On Thu 14-03-19 12:56:43, Takashi Iwai wrote:
> > > > > > On Thu, 14 Mar 2019 12:36:26 +0100,
> > > > > > Michal Hocko wrote:
> > > > > > > 
> > > > > > > On Thu 14-03-19 11:30:03, Vlastimil Babka wrote:
> > > [...]
> > > > > > > > I initially went with 2 as well, as you can see from v1 :) but then I looked at
> > > > > > > > the commit [2] mentioned in [1] and I think ALSA legitimately uses __GFP_COMP so
> > > > > > > > that the pages are then mapped to userspace. Breaking that didn't seem good.
> > > > > > > 
> > > > > > > It used the flag legitimately before because they were allocating
> > > > > > > compound pages but now they don't so this is just a conversion bug.
> > > > > > 
> > > > > > We still use __GFP_COMP for allocation of the sound buffers that are
> > > > > > also mmapped to user-space.  The mentioned commit above [2] was
> > > > > > reverted later.
> > > > > 
> > > > > Yes, I understand that part. __GFP_COMP makes sense on a compound page.
> > > > > But if you are using alloc_pages_exact then the flag doesn't make sense
> > > > > because split out should already do what you want. Unless I am missing
> > > > > something.
> > > > 
> > > > The __GFP_COMP was taken as a sort of workaround for the problem wrt
> > > > mmap I already forgot.  If it can be eliminated, it's all good.
> > > 
> > > Without __GFP_COMP you would get tail pages which are not set up properly
> > > AFAIU. With alloc_pages_exact you should get an "array" of head pages
> > > which are properly reference counted. But I might have misunderstood the
> > > original problem which __GFP_COMP tried to solve.
> > 
> > I only vaguely remember that it was about a Bad Page error for the
> > reserved pages, but forgot all the details, sorry.
> > 
> > Hugh, could you confirm whether we still need __GFP_COMP in the sound
> > buffer allocations?  FWIW, it's the change introduced by the ancient
> > commit f3d48f0373c1.
> 
> I'm not confident in finding all "the sound buffer allocations".
> Where you're using alloc_pages_exact() for them, you do not need
> __GFP_COMP, and should not pass it.

It was my faulty attempt to convert to alloc_pages_exact() that hit
the incompatibility with __GFP_COMP, so it was reverted in the end.

> But if there are other places
> where you use one of those page allocators with an "order" argument
> non-zero, and map that buffer into userspace (without any split_page()),
> there you would still need the __GFP_COMP - zap_pte_range() and others
> do the wrong thing on tail ptes if the non-zero-order page has neither
> been set up as compound nor split into zero-order pages.

Hm, what if we allocate the whole pages via alloc_pages_exact() (but
without __GFP_COMP)?  Can we mmap them properly to user-space like
before, or it won't work as-is?


thanks,

Takashi



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14 18:00                     ` Takashi Iwai
@ 2019-03-14 18:15                       ` Hugh Dickins
  2019-03-14 20:13                         ` Takashi Iwai
  0 siblings, 1 reply; 19+ messages in thread
From: Hugh Dickins @ 2019-03-14 18:15 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Hugh Dickins, Michal Hocko, Vlastimil Babka, linux-mm,
	Kirill A. Shutemov, Mel Gorman

On Thu, 14 Mar 2019, Takashi Iwai wrote:
> On Thu, 14 Mar 2019 18:37:06 +0100, Hugh Dickins wrote:
> > On Thu, 14 Mar 2019, Takashi Iwai wrote:
> > > 
> > > Hugh, could you confirm whether we still need __GFP_COMP in the sound
> > > buffer allocations?  FWIW, it's the change introduced by the ancient
> > > commit f3d48f0373c1.
> > 
> > I'm not confident in finding all "the sound buffer allocations".
> > Where you're using alloc_pages_exact() for them, you do not need
> > __GFP_COMP, and should not pass it.
> 
> It was my faulty attempt to convert to alloc_pages_exact() that hit
> the incompatibility with __GFP_COMP, so it was reverted in the end.
> 
> > But if there are other places
> > where you use one of those page allocators with an "order" argument
> > non-zero, and map that buffer into userspace (without any split_page()),
> > there you would still need the __GFP_COMP - zap_pte_range() and others
> > do the wrong thing on tail ptes if the non-zero-order page has neither
> > been set up as compound nor split into zero-order pages.
> 
> Hm, what if we allocate the whole pages via alloc_pages_exact() (but
> without __GFP_COMP)?  Can we mmap them properly to user-space like
> before, or it won't work as-is?

Yes, you can map the alloc_pages_exact() pages to user-space as
before, whether or not it ended up using a whole non-zero-order page:
alloc_pages_exact() does a split_page(), so the subpages end up all just
ordinary order-zero pages (and need to be freed individually, which
free_pages_exact() does for you).

Hugh
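
The size arithmetic behind this can be sketched in userspace as follows.
It is only a model under stated assumptions: 4 KiB pages, and the
hypothetical helpers model_get_order(), model_pages_kept() and
model_pages_trimmed(), which imitate the idea of rounding up to a
power-of-two block, keeping the pages the caller asked for, and handing
the rest back after the split. None of these names exist in the kernel.

```c
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  ((size_t)1 << MODEL_PAGE_SHIFT)

/* Smallest order such that (MODEL_PAGE_SIZE << order) covers size. */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	while ((MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Pages the caller keeps: size rounded up to whole pages. */
static size_t model_pages_kept(size_t size)
{
	return (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* Pages returned to the allocator once the block has been split. */
static size_t model_pages_trimmed(size_t size)
{
	return ((size_t)1 << model_get_order(size)) - model_pages_kept(size);
}
```

For a three-page request this model gives order 2, three pages kept and
one page trimmed; a two-page request gives order 1 with nothing trimmed.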



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14  9:42 ` [PATCH v2] " Vlastimil Babka
  2019-03-14 10:15   ` Michal Hocko
@ 2019-03-14 18:51   ` Kirill A. Shutemov
  2019-03-18 12:21   ` [PATCH v3] " Vlastimil Babka
  2 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2019-03-14 18:51 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: linux-mm, Michal Hocko, Mel Gorman, Takashi Iwai

On Thu, Mar 14, 2019 at 09:42:49AM +0000, Vlastimil Babka wrote:
> @@ -4752,7 +4752,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
>  /**
>   * alloc_pages_exact - allocate an exact number physically-contiguous pages.
>   * @size: the number of bytes to allocate
> - * @gfp_mask: GFP flags for the allocation
> + * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
>   *
>   * This function is similar to alloc_pages(), except that it allocates the
>   * minimum number of pages to satisfy the request.  alloc_pages() can only
> @@ -4768,6 +4768,10 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
>  	unsigned long addr;
>  
>  	addr = __get_free_pages(gfp_mask, order);
> +
> +	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
> +		return NULL;
> +

Shouldn't it be before __get_free_pages() call? :P

>  	return make_alloc_exact(addr, order, size);
>  }
>  EXPORT_SYMBOL(alloc_pages_exact);

-- 
 Kirill A. Shutemov
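
Why the ordering matters can be shown with a small userspace model. It
is a sketch under assumptions, not the kernel code: MODEL_GFP_COMP,
model_get_free_pages() and the two alloc_exact_*() functions are
hypothetical stand-ins, and model_outstanding simply counts allocations
that were made and never handed to anyone who could free them.

```c
#include <stddef.h>

#define MODEL_GFP_COMP 0x1u	/* hypothetical stand-in for __GFP_COMP */

static int model_outstanding;	/* allocations made so far */

/* Stand-in allocator: always succeeds and counts the allocation. */
static void *model_get_free_pages(void)
{
	static char backing[1];

	model_outstanding++;
	return backing;
}

/* Ordering as in the quoted hunk: allocate first, then bail out. */
static void *alloc_exact_v2_order(unsigned int gfp_mask)
{
	void *addr = model_get_free_pages();

	if (gfp_mask & MODEL_GFP_COMP)
		return NULL;	/* addr is dropped here, nobody can free it */
	return addr;
}

/* Ordering suggested in the reply: check the flags before allocating. */
static void *alloc_exact_check_first(unsigned int gfp_mask)
{
	if (gfp_mask & MODEL_GFP_COMP)
		return NULL;	/* nothing was allocated, nothing to leak */
	return model_get_free_pages();
}
```

In the model, rejecting a request with the first ordering leaves
model_outstanding incremented although the caller received NULL, while
the second ordering leaves it untouched.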



* Re: [PATCH v2] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14 18:15                       ` Hugh Dickins
@ 2019-03-14 20:13                         ` Takashi Iwai
  0 siblings, 0 replies; 19+ messages in thread
From: Takashi Iwai @ 2019-03-14 20:13 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Michal Hocko, Vlastimil Babka, linux-mm, Kirill A. Shutemov, Mel Gorman

On Thu, 14 Mar 2019 19:15:22 +0100,
Hugh Dickins wrote:
> 
> On Thu, 14 Mar 2019, Takashi Iwai wrote:
> > On Thu, 14 Mar 2019 18:37:06 +0100, Hugh Dickins wrote:
> > > On Thu, 14 Mar 2019, Takashi Iwai wrote:
> > > > 
> > > > Hugh, could you confirm whether we still need __GFP_COMP in the sound
> > > > buffer allocations?  FWIW, it's the change introduced by the ancient
> > > > commit f3d48f0373c1.
> > > 
> > > I'm not confident in finding all "the sound buffer allocations".
> > > Where you're using alloc_pages_exact() for them, you do not need
> > > __GFP_COMP, and should not pass it.
> > 
> > > It was my faulty attempt to convert to alloc_pages_exact() that hit
> > > the incompatibility with __GFP_COMP, so it was reverted in the end.
> > 
> > > But if there are other places
> > > where you use one of those page allocators with an "order" argument
> > > non-zero, and map that buffer into userspace (without any split_page()),
> > > there you would still need the __GFP_COMP - zap_pte_range() and others
> > > do the wrong thing on tail ptes if the non-zero-order page has neither
> > > been set up as compound nor split into zero-order pages.
> > 
> > Hm, what if we allocate the whole pages via alloc_pages_exact() (but
> > without __GFP_COMP)?  Can we mmap them properly to user-space like
> > before, or it won't work as-is?
> 
> Yes, you can map the alloc_pages_exact() pages to user-space as
> before, whether or not it ended up using a whole non-zero-order page:
> alloc_pages_exact() does a split_page(), so the subpages end up all just
> ordinary order-zero pages (and need to be freed individually, which
> free_pages_exact() does for you).

Great, thanks for clarification!


Takashi



* [PATCH v3] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-14  9:42 ` [PATCH v2] " Vlastimil Babka
  2019-03-14 10:15   ` Michal Hocko
  2019-03-14 18:51   ` Kirill A. Shutemov
@ 2019-03-18 12:21   ` Vlastimil Babka
  2019-03-18 12:43     ` Michal Hocko
                       ` (2 more replies)
  2 siblings, 3 replies; 19+ messages in thread
From: Vlastimil Babka @ 2019-03-18 12:21 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Michal Hocko, Kirill A. Shutemov, Mel Gorman, Takashi Iwai, Hugh Dickins

OK here's a new version that changes the patch to remove __GFP_COMP per
the v2 discussion, and also fixes the bug Kirill spotted (thanks!).

----8<----
From 1fbc84c208573b885f51818ed823f89b3aa1e0ae Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 14 Mar 2019 10:19:30 +0100
Subject: [PATCH v3] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()

alloc_pages_exact*() allocates a page of sufficient order and then splits it
to return only the number of pages requested. That makes it incompatible with
__GFP_COMP, because compound pages cannot be split.

As shown by [1], things may silently work until the requested size (possibly
depending on the user) stops being a power of two. Then with CONFIG_DEBUG_VM,
BUG_ON() triggers in split_page(). Without CONFIG_DEBUG_VM, the consequences
are unclear.

There are several options here, none of them great:

1) Don't do the splitting when __GFP_COMP is passed, and return the whole
compound page. However if caller then returns it via free_pages_exact(),
that will be unexpected and the freeing actions there will be wrong.

2) Warn and remove __GFP_COMP from the flags. But the caller may have really
wanted it, so things may break later somewhere.

3) Warn and return NULL. However NULL may be unexpected, especially for
small sizes.

This patch picks option 2, because as Michal Hocko put it: "callers wanted it"
is much less probable than "caller is simply confused and more gfp flags is
surely better than fewer".

[1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0b9f577b1a2a..123d9a407599 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4752,7 +4752,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
 /**
  * alloc_pages_exact - allocate an exact number physically-contiguous pages.
  * @size: the number of bytes to allocate
- * @gfp_mask: GFP flags for the allocation
+ * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
  *
  * This function is similar to alloc_pages(), except that it allocates the
  * minimum number of pages to satisfy the request.  alloc_pages() can only
@@ -4767,6 +4767,9 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
 	unsigned int order = get_order(size);
 	unsigned long addr;
 
+	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
+		gfp_mask &= ~__GFP_COMP;
+
 	addr = __get_free_pages(gfp_mask, order);
 	return make_alloc_exact(addr, order, size);
 }
@@ -4777,7 +4780,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
  *			   pages on a node.
  * @nid: the preferred node ID where memory should be allocated
  * @size: the number of bytes to allocate
- * @gfp_mask: GFP flags for the allocation
+ * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
  *
  * Like alloc_pages_exact(), but try to allocate on node nid first before falling
  * back.
@@ -4785,7 +4788,12 @@ EXPORT_SYMBOL(alloc_pages_exact);
 void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
 {
 	unsigned int order = get_order(size);
-	struct page *p = alloc_pages_node(nid, gfp_mask, order);
+	struct page *p;
+
+	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
+		gfp_mask &= ~__GFP_COMP;
+
+	p = alloc_pages_node(nid, gfp_mask, order);
 	if (!p)
 		return NULL;
 	return make_alloc_exact((unsigned long)page_address(p), order, size);
-- 
2.21.0
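
The warn-once-and-strip behaviour chosen in this version can be modelled
in userspace roughly as below. This is a sketch, not the kernel
implementation: MODEL_GFP_COMP is a made-up bit value, model_strip_comp()
is a hypothetical name, and the fprintf() call only imitates the fact
that WARN_ON_ONCE() reports the first hit while still evaluating to true
on every hit.

```c
#include <stdio.h>

#define MODEL_GFP_COMP 0x4000u	/* hypothetical bit, not the real value */

static unsigned int model_warn_count;	/* warnings actually printed */

/* Return gfp_mask with the compound bit cleared, warning only once. */
static unsigned int model_strip_comp(unsigned int gfp_mask)
{
	if (gfp_mask & MODEL_GFP_COMP) {
		if (model_warn_count == 0) {
			fprintf(stderr, "model: compound flag ignored\n");
			model_warn_count++;
		}
		/* The flag is stripped on every call, not just the first. */
		gfp_mask &= ~MODEL_GFP_COMP;
	}
	return gfp_mask;
}
```

Other bits in the mask pass through unchanged, so a caller that passed
the flag by mistake still gets its allocation, just without the
compound setup.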




* Re: [PATCH v3] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-18 12:21   ` [PATCH v3] " Vlastimil Babka
@ 2019-03-18 12:43     ` Michal Hocko
  2019-03-19  8:45     ` Kirill A. Shutemov
  2019-03-19  9:47     ` Mel Gorman
  2 siblings, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2019-03-18 12:43 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, Kirill A. Shutemov, Mel Gorman,
	Takashi Iwai, Hugh Dickins

On Mon 18-03-19 13:21:59, Vlastimil Babka wrote:
> OK here's a new version that changes the patch to remove __GFP_COMP per
> the v2 discussion, and also fixes the bug Kirill spotted (thanks!).
> 
> ----8<----
> From 1fbc84c208573b885f51818ed823f89b3aa1e0ae Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Thu, 14 Mar 2019 10:19:30 +0100
> Subject: [PATCH v3] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
> 
> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> to return only the number of pages requested. That makes it incompatible with
> __GFP_COMP, because compound pages cannot be split.
> 
> As shown by [1], things may silently work until the requested size (possibly
> depending on the user) stops being a power of two. Then with CONFIG_DEBUG_VM,
> BUG_ON() triggers in split_page(). Without CONFIG_DEBUG_VM, the consequences
> are unclear.
> 
> There are several options here, none of them great:
> 
> 1) Don't do the splitting when __GFP_COMP is passed, and return the whole
> compound page. However if caller then returns it via free_pages_exact(),
> that will be unexpected and the freeing actions there will be wrong.
> 
> 2) Warn and remove __GFP_COMP from the flags. But the caller may have really
> wanted it, so things may break later somewhere.
> 
> 3) Warn and return NULL. However NULL may be unexpected, especially for
> small sizes.
> 
> This patch picks option 2, because as Michal Hocko put it: "callers wanted it"
> is much less probable than "caller is simply confused and more gfp flags is
> surely better than fewer".
> 
> [1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/page_alloc.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0b9f577b1a2a..123d9a407599 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4752,7 +4752,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
>  /**
>   * alloc_pages_exact - allocate an exact number physically-contiguous pages.
>   * @size: the number of bytes to allocate
> - * @gfp_mask: GFP flags for the allocation
> + * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
>   *
>   * This function is similar to alloc_pages(), except that it allocates the
>   * minimum number of pages to satisfy the request.  alloc_pages() can only
> @@ -4767,6 +4767,9 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
>  	unsigned int order = get_order(size);
>  	unsigned long addr;
>  
> +	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
> +		gfp_mask &= ~__GFP_COMP;
> +
>  	addr = __get_free_pages(gfp_mask, order);
>  	return make_alloc_exact(addr, order, size);
>  }
> @@ -4777,7 +4780,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
>   *			   pages on a node.
>   * @nid: the preferred node ID where memory should be allocated
>   * @size: the number of bytes to allocate
> - * @gfp_mask: GFP flags for the allocation
> + * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
>   *
>   * Like alloc_pages_exact(), but try to allocate on node nid first before falling
>   * back.
> @@ -4785,7 +4788,12 @@ EXPORT_SYMBOL(alloc_pages_exact);
>  void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
>  {
>  	unsigned int order = get_order(size);
> -	struct page *p = alloc_pages_node(nid, gfp_mask, order);
> +	struct page *p;
> +
> +	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
> +		gfp_mask &= ~__GFP_COMP;
> +
> +	p = alloc_pages_node(nid, gfp_mask, order);
>  	if (!p)
>  		return NULL;
>  	return make_alloc_exact((unsigned long)page_address(p), order, size);
> -- 
> 2.21.0
> 

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-18 12:21   ` [PATCH v3] " Vlastimil Babka
  2019-03-18 12:43     ` Michal Hocko
@ 2019-03-19  8:45     ` Kirill A. Shutemov
  2019-03-19  9:47     ` Mel Gorman
  2 siblings, 0 replies; 19+ messages in thread
From: Kirill A. Shutemov @ 2019-03-19  8:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, Michal Hocko, Kirill A. Shutemov,
	Mel Gorman, Takashi Iwai, Hugh Dickins

On Mon, Mar 18, 2019 at 01:21:59PM +0100, Vlastimil Babka wrote:
> OK here's a new version that changes the patch to remove __GFP_COMP per
> the v2 discussion, and also fixes the bug Kirill spotted (thanks!).
> 
> ----8<----
> From 1fbc84c208573b885f51818ed823f89b3aa1e0ae Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Thu, 14 Mar 2019 10:19:30 +0100
> Subject: [PATCH v3] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
> 
> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> to return only the number of pages requested. That makes it incompatible with
> __GFP_COMP, because compound pages cannot be split.
> 
> As shown by [1], things may silently work until the requested size (possibly
> depending on the user) stops being a power of two. Then with CONFIG_DEBUG_VM,
> BUG_ON() triggers in split_page(). Without CONFIG_DEBUG_VM, the consequences
> are unclear.
> 
> There are several options here, none of them great:
> 
> 1) Don't do the splitting when __GFP_COMP is passed, and return the whole
> compound page. However if caller then returns it via free_pages_exact(),
> that will be unexpected and the freeing actions there will be wrong.
> 
> 2) Warn and remove __GFP_COMP from the flags. But the caller may have really
> wanted it, so things may break later somewhere.
> 
> 3) Warn and return NULL. However NULL may be unexpected, especially for
> small sizes.
> 
> This patch picks option 2, because as Michal Hocko put it: "callers wanted it"
> is much less probable than "caller is simply confused and more gfp flags is
> surely better than fewer".
> 
> [1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov



* Re: [PATCH v3] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
  2019-03-18 12:21   ` [PATCH v3] " Vlastimil Babka
  2019-03-18 12:43     ` Michal Hocko
  2019-03-19  8:45     ` Kirill A. Shutemov
@ 2019-03-19  9:47     ` Mel Gorman
  2 siblings, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2019-03-19  9:47 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Andrew Morton, Michal Hocko, Kirill A. Shutemov,
	Takashi Iwai, Hugh Dickins

On Mon, Mar 18, 2019 at 01:21:59PM +0100, Vlastimil Babka wrote:
> OK here's a new version that changes the patch to remove __GFP_COMP per
> the v2 discussion, and also fixes the bug Kirill spotted (thanks!).
> 
> ----8<----
> From 1fbc84c208573b885f51818ed823f89b3aa1e0ae Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Thu, 14 Mar 2019 10:19:30 +0100
> Subject: [PATCH v3] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
> 
> alloc_pages_exact*() allocates a page of sufficient order and then splits it
> to return only the number of pages requested. That makes it incompatible with
> __GFP_COMP, because compound pages cannot be split.
> 
> As shown by [1], things may silently work until the requested size (possibly
> depending on the user) stops being a power of two. Then with CONFIG_DEBUG_VM,
> BUG_ON() triggers in split_page(). Without CONFIG_DEBUG_VM, the consequences
> are unclear.
> 
> There are several options here, none of them great:
> 
> 1) Don't do the splitting when __GFP_COMP is passed, and return the whole
> compound page. However if caller then returns it via free_pages_exact(),
> that will be unexpected and the freeing actions there will be wrong.
> 
> 2) Warn and remove __GFP_COMP from the flags. But the caller may have really
> wanted it, so things may break later somewhere.
> 
> 3) Warn and return NULL. However NULL may be unexpected, especially for
> small sizes.
> 
> This patch picks option 2, because as Michal Hocko put it: "callers wanted it"
> is much less probable than "caller is simply confused and more gfp flags is
> surely better than fewer".
> 
> [1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mel Gorman <mgorman@techsingularity.net>




Thread overview: 19+ messages
2019-03-14  9:39 [PATCH] mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact() Vlastimil Babka
2019-03-14  9:42 ` [PATCH v2] " Vlastimil Babka
2019-03-14 10:15   ` Michal Hocko
2019-03-14 10:30     ` Vlastimil Babka
2019-03-14 11:36       ` Michal Hocko
2019-03-14 11:56         ` Takashi Iwai
2019-03-14 12:09           ` Michal Hocko
2019-03-14 13:15             ` Takashi Iwai
2019-03-14 13:29               ` Michal Hocko
2019-03-14 16:52                 ` Takashi Iwai
2019-03-14 17:37                   ` Hugh Dickins
2019-03-14 18:00                     ` Takashi Iwai
2019-03-14 18:15                       ` Hugh Dickins
2019-03-14 20:13                         ` Takashi Iwai
2019-03-14 18:51   ` Kirill A. Shutemov
2019-03-18 12:21   ` [PATCH v3] " Vlastimil Babka
2019-03-18 12:43     ` Michal Hocko
2019-03-19  8:45     ` Kirill A. Shutemov
2019-03-19  9:47     ` Mel Gorman
