[PATCH] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
@ 2022-05-28 15:39 Chen Lin
  2022-05-29 23:30 ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Chen Lin @ 2022-05-28 15:39 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, Chen Lin

netdev_alloc_frag->page_frag_alloc may cause memory corruption in 
the following process:

1. A netdev_alloc_frag function call need alloc 200 Bytes to build a skb.

2. Insufficient memory to alloc PAGE_FRAG_CACHE_MAX_ORDER(32K) in 
__page_frag_cache_refill to fill frag cache, then one page(eg:4K) 
is allocated, now current frag cache is 4K, alloc is success, 
nc->pagecnt_bias--.

3. Then this 200 bytes skb in step 1 is freed, page->_refcount--.

4. Another netdev_alloc_frag function call need alloc 5k, page->_refcount 
is equal to nc->pagecnt_bias, reset page count bias and offset to 
start of new frag. page_frag_alloc will return the 4K memory for a 
5K memory request.

5. The caller write on the extra 1k memory which is not actual allocated 
will cause memory corruption.

page_frag_alloc is for fragmented allocation. We should warn the caller 
to avoid memory corruption.

Signed-off-by: Chen Lin <chen45464546@163.com>
---
 mm/page_alloc.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e008a3d..6c0db52 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5574,6 +5574,11 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
 	struct page *page;
 	int offset;

+	/* frag_alloc is not suitable for memory alloc which fragsz
+	 * is bigger than PAGE_SIZE, use kmalloc or alloc_pages instead.
+	 */
+	WARN_ON(fragsz > PAGE_SIZE);
+
 	if (unlikely(!nc->va)) {
 refill:
 		page = __page_frag_cache_refill(nc, gfp_mask);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-28 15:39 [PATCH] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE Chen Lin
@ 2022-05-29 23:30 ` Andrew Morton
  2022-05-30 13:39   ` [PATCH v2] " Chen Lin
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2022-05-29 23:30 UTC (permalink / raw)
  To: Chen Lin; +Cc: linux-mm, linux-kernel, Alexander Duyck, netdev

On Sat, 28 May 2022 23:39:33 +0800 Chen Lin <chen45464546@163.com> wrote:

> netdev_alloc_frag->page_frag_alloc may cause memory corruption in 
> the following process:
> 
> 1. A netdev_alloc_frag function call need alloc 200 Bytes to build a skb.
> 
> 2. Insufficient memory to alloc PAGE_FRAG_CACHE_MAX_ORDER(32K) in 
> __page_frag_cache_refill to fill frag cache, then one page(eg:4K) 
> is allocated, now current frag cache is 4K, alloc is success, 
> nc->pagecnt_bias--.
> 
> 3. Then this 200 bytes skb in step 1 is freed, page->_refcount--.
> 
> 4. Another netdev_alloc_frag function call need alloc 5k, page->_refcount 
> is equal to nc->pagecnt_bias, reset page count bias and offset to 
> start of new frag. page_frag_alloc will return the 4K memory for a 
> 5K memory request.
> 
> 5. The caller write on the extra 1k memory which is not actual allocated 
> will cause memory corruption.
> 
> page_frag_alloc is for fragmented allocation. We should warn the caller 
> to avoid memory corruption.
> 

Let's cc Alexander and the networking developers.

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5574,6 +5574,11 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
>  	struct page *page;
>  	int offset;
>  
> +	/* frag_alloc is not suitable for memory alloc which fragsz
> +	 * is bigger than PAGE_SIZE, use kmalloc or alloc_pages instead.
> +	 */
> +	WARN_ON(fragsz > PAGE_SIZE);
> +
>  	if (unlikely(!nc->va)) {
>  refill:
>  		page = __page_frag_cache_refill(nc, gfp_mask);

Odd.  All this does is generate a warning.  If the kernel is corrupting
memory, that's a bug which needs fixing?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-29 23:30 ` Andrew Morton
@ 2022-05-30 13:39   ` Chen Lin
  2022-05-30 19:27     ` Jakub Kicinski
  2022-05-30 20:07     ` Andrew Morton
  0 siblings, 2 replies; 17+ messages in thread
From: Chen Lin @ 2022-05-30 13:39 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-kernel, alexander.h.duyck, netdev, Chen Lin

netdev_alloc_frag->page_frag_alloc may cause memory corruption in 
the following process:

1. A netdev_alloc_frag function call need alloc 200 Bytes to build a skb.

2. Insufficient memory to alloc PAGE_FRAG_CACHE_MAX_ORDER(32K) in 
__page_frag_cache_refill to fill frag cache, then one page(eg:4K) 
is allocated, now current frag cache is 4K, alloc is success, 
nc->pagecnt_bias--.

3. Then this 200 bytes skb in step 1 is freed, page->_refcount--.

4. Another netdev_alloc_frag function call need alloc 5k, page->_refcount 
is equal to nc->pagecnt_bias, reset page count bias and offset to 
start of new frag. page_frag_alloc will return the 4K memory for a 
5K memory request.

5. The caller write on the extra 1k memory which is not actual allocated 
will cause memory corruption.

page_frag_alloc is for fragmented allocation. We should warn the caller 
to avoid memory corruption.

When fragsz is larger than one page, we report the failure and return.
I don't think it is a good idea to make efforts to support the
allocation of more than one page in this function because the total
frag cache size(PAGE_FRAG_CACHE_MAX_SIZE 32768) is relatively small.
When the request is larger than one page, the caller should switch to
use other kernel interfaces, such as kmalloc and alloc_Pages.

This bug is mainly caused by the reuse of the previously allocated
frag cache memory by the following LARGER allocations. This bug existed
before page_frag_alloc was ported from __netdev_alloc_frag in 
net/core/skbuff.c, so most Linux versions have this problem.

Signed-off-by: Chen Lin <chen45464546@163.com>
---
 mm/page_alloc.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e008a3d..1e9e2c4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5574,6 +5574,16 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
 	struct page *page;
 	int offset;

+	/* frag_alloc is not suitable for memory alloc which fragsz
+	 * is bigger than PAGE_SIZE, use kmalloc or alloc_pages instead.
+	 */
+	if (unlikely(fragsz > PAGE_SIZE)) {
+		WARN(1, "alloc fragsz(%d) > PAGE_SIZE(%ld) not supported,
+			alloc fail\n", fragsz, PAGE_SIZE);
+
+		return NULL;
+	}
+
 	if (unlikely(!nc->va)) {
 refill:
 		page = __page_frag_cache_refill(nc, gfp_mask);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-30 13:39   ` [PATCH v2] " Chen Lin
@ 2022-05-30 19:27     ` Jakub Kicinski
  2022-05-30 19:29       ` Jakub Kicinski
  2022-05-30 20:07     ` Andrew Morton
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2022-05-30 19:27 UTC (permalink / raw)
  To: Chen Lin; +Cc: akpm, linux-mm, linux-kernel, Alexander Duyck, netdev

On Mon, 30 May 2022 21:39:02 +0800 Chen Lin wrote:
> netdev_alloc_frag->page_frag_alloc may cause memory corruption in 
> the following process:
> 
> 1. A netdev_alloc_frag function call need alloc 200 Bytes to build a skb.
> 
> 2. Insufficient memory to alloc PAGE_FRAG_CACHE_MAX_ORDER(32K) in 
> __page_frag_cache_refill to fill frag cache, then one page(eg:4K) 
> is allocated, now current frag cache is 4K, alloc is success, 
> nc->pagecnt_bias--.
> 
> 3. Then this 200 bytes skb in step 1 is freed, page->_refcount--.
> 
> 4. Another netdev_alloc_frag function call need alloc 5k, page->_refcount 
> is equal to nc->pagecnt_bias, reset page count bias and offset to 
> start of new frag. page_frag_alloc will return the 4K memory for a 
> 5K memory request.
> 
> 5. The caller write on the extra 1k memory which is not actual allocated 
> will cause memory corruption.
> 
> page_frag_alloc is for fragmented allocation. We should warn the caller 
> to avoid memory corruption.
> 
> When fragsz is larger than one page, we report the failure and return.
> I don't think it is a good idea to make efforts to support the
> allocation of more than one page in this function because the total
> frag cache size(PAGE_FRAG_CACHE_MAX_SIZE 32768) is relatively small.
> When the request is larger than one page, the caller should switch to
> use other kernel interfaces, such as kmalloc and alloc_Pages.
> 
> This bug is mainly caused by the reuse of the previously allocated
> frag cache memory by the following LARGER allocations. This bug existed
> before page_frag_alloc was ported from __netdev_alloc_frag in 
> net/core/skbuff.c, so most Linux versions have this problem.
> 
> Signed-off-by: Chen Lin <chen45464546@163.com>
> ---
>  mm/page_alloc.c |   10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e008a3d..1e9e2c4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5574,6 +5574,16 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
>  	struct page *page;
>  	int offset;
>  
> +	/* frag_alloc is not suitable for memory alloc which fragsz
> +	 * is bigger than PAGE_SIZE, use kmalloc or alloc_pages instead.
> +	 */
> +	if (unlikely(fragsz > PAGE_SIZE)) {
> +		WARN(1, "alloc fragsz(%d) > PAGE_SIZE(%ld) not supported,
> +			alloc fail\n", fragsz, PAGE_SIZE);
> +
> +		return NULL;
> +	}
> +
>  	if (unlikely(!nc->va)) {
>  refill:
>  		page = __page_frag_cache_refill(nc, gfp_mask);

Let's see what Alex says (fixing his email now). It seems a little too
drastic to me. I'd go with something like:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e008a3df0485..360a545ee5e8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5537,6 +5537,7 @@ EXPORT_SYMBOL(free_pages);
  * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
  */
 static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
+					     unsigned int fragsz,
 					     gfp_t gfp_mask)
 {
 	struct page *page = NULL;
@@ -5549,7 +5550,7 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
 				PAGE_FRAG_CACHE_MAX_ORDER);
 	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
 #endif
-	if (unlikely(!page))
+	if (unlikely(!page && fragsz <= PAGE_SIZE))
 		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
 
 	nc->va = page ? page_address(page) : NULL;
@@ -5576,7 +5577,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
 
 	if (unlikely(!nc->va)) {
 refill:
-		page = __page_frag_cache_refill(nc, gfp_mask);
+		page = __page_frag_cache_refill(nc, fragsz, gfp_mask);
 		if (!page)
 			return NULL;
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-30 19:27     ` Jakub Kicinski
@ 2022-05-30 19:29       ` Jakub Kicinski
  2022-05-31 14:41         ` Chen Lin
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2022-05-30 19:29 UTC (permalink / raw)
  To: Chen Lin; +Cc: akpm, linux-mm, linux-kernel, Alexander Duyck, netdev

On Mon, 30 May 2022 12:27:05 -0700 Jakub Kicinski wrote:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e008a3df0485..360a545ee5e8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5537,6 +5537,7 @@ EXPORT_SYMBOL(free_pages);
>   * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
>   */
>  static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
> +					     unsigned int fragsz,
>  					     gfp_t gfp_mask)
>  {
>  	struct page *page = NULL;
> @@ -5549,7 +5550,7 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>  				PAGE_FRAG_CACHE_MAX_ORDER);
>  	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
>  #endif
> -	if (unlikely(!page))
> +	if (unlikely(!page && fragsz <= PAGE_SIZE))
>  		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
>  
>  	nc->va = page ? page_address(page) : NULL;
> @@ -5576,7 +5577,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
>  
>  	if (unlikely(!nc->va)) {
>  refill:
> -		page = __page_frag_cache_refill(nc, gfp_mask);
> +		page = __page_frag_cache_refill(nc, fragsz, gfp_mask);
>  		if (!page)
>  			return NULL;

Oh, well, the reuse also needs an update. We can slap a similar
condition next to the pfmemalloc check.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-30 13:39   ` [PATCH v2] " Chen Lin
  2022-05-30 19:27     ` Jakub Kicinski
@ 2022-05-30 20:07     ` Andrew Morton
  2022-05-31 14:43       ` [PATCH v3] " Chen Lin
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2022-05-30 20:07 UTC (permalink / raw)
  To: Chen Lin; +Cc: linux-mm, linux-kernel, alexander.h.duyck, netdev

On Mon, 30 May 2022 21:39:02 +0800 Chen Lin <chen45464546@163.com> wrote:

> netdev_alloc_frag->page_frag_alloc may cause memory corruption in 
> the following process:
> 
> 1. A netdev_alloc_frag function call need alloc 200 Bytes to build a skb.
> 
> 2. Insufficient memory to alloc PAGE_FRAG_CACHE_MAX_ORDER(32K) in 
> __page_frag_cache_refill to fill frag cache, then one page(eg:4K) 
> is allocated, now current frag cache is 4K, alloc is success, 
> nc->pagecnt_bias--.
> 
> 3. Then this 200 bytes skb in step 1 is freed, page->_refcount--.
> 
> 4. Another netdev_alloc_frag function call need alloc 5k, page->_refcount 
> is equal to nc->pagecnt_bias, reset page count bias and offset to 
> start of new frag. page_frag_alloc will return the 4K memory for a 
> 5K memory request.
> 
> 5. The caller write on the extra 1k memory which is not actual allocated 
> will cause memory corruption.
> 
> page_frag_alloc is for fragmented allocation. We should warn the caller 
> to avoid memory corruption.
> 
> When fragsz is larger than one page, we report the failure and return.
> I don't think it is a good idea to make efforts to support the
> allocation of more than one page in this function because the total
> frag cache size(PAGE_FRAG_CACHE_MAX_SIZE 32768) is relatively small.
> When the request is larger than one page, the caller should switch to
> use other kernel interfaces, such as kmalloc and alloc_Pages.
> 
> This bug is mainly caused by the reuse of the previously allocated
> frag cache memory by the following LARGER allocations. This bug existed
> before page_frag_alloc was ported from __netdev_alloc_frag in 
> net/core/skbuff.c, so most Linux versions have this problem.
> 

I won't attempt to address the large issues here (like, should
networking be changed to support this).  But I can nitpick :)

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5574,6 +5574,16 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
>  	struct page *page;
>  	int offset;
>  
> +	/* frag_alloc is not suitable for memory alloc which fragsz

Like this please:

	/*
	 * frag_alloc...

> +	 * is bigger than PAGE_SIZE, use kmalloc or alloc_pages instead.
> +	 */
> +	if (unlikely(fragsz > PAGE_SIZE)) {
> +		WARN(1, "alloc fragsz(%d) > PAGE_SIZE(%ld) not supported,
> +			alloc fail\n", fragsz, PAGE_SIZE);

It's neater to do

	if (WARN(fragsz > PAGE_SIZE, "alloc fragsz(%d...", ...))
		return NULL;

Also, you have a newline and a bunch of tabs in that string.

Also, please consider WARN_ONCE.  We don't want to provide misbehaved
or malicious userspace with the ability to flood the logs with
warnings.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re:Re: [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-30 19:29       ` Jakub Kicinski
@ 2022-05-31 14:41         ` Chen Lin
  2022-05-31 15:14           ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Chen Lin @ 2022-05-31 14:41 UTC (permalink / raw)
  To: kuba; +Cc: akpm, linux-mm, linux-kernel, alexander.duyck, netdev, Chen Lin

At 2022-05-31 02:29:18, "Jakub Kicinski" <kuba@kernel.org> wrote:
>On Mon, 30 May 2022 12:27:05 -0700 Jakub Kicinski wrote:
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index e008a3df0485..360a545ee5e8 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5537,6 +5537,7 @@ EXPORT_SYMBOL(free_pages);
>>   * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
>>   */
>>  static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>> +					     unsigned int fragsz,
>>  					     gfp_t gfp_mask)
>>  {
>>  	struct page *page = NULL;
>> @@ -5549,7 +5550,7 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>>  				PAGE_FRAG_CACHE_MAX_ORDER);
>>  	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
>>  #endif
>> -	if (unlikely(!page))
>> +	if (unlikely(!page && fragsz <= PAGE_SIZE))
>>  		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
>>  
>>  	nc->va = page ? page_address(page) : NULL;
>> @@ -5576,7 +5577,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
>>  
>>  	if (unlikely(!nc->va)) {
>>  refill:
>> -		page = __page_frag_cache_refill(nc, gfp_mask);
>> +		page = __page_frag_cache_refill(nc, fragsz, gfp_mask);
>>  		if (!page)
>>  			return NULL;
>
>Oh, well, the reuse also needs an update. We can slap a similar
>condition next to the pfmemalloc check.

The sample code above cannot completely solve the current problem.
For example, when fragsz is greater than PAGE_FRAG_CACHE_MAX_SIZE(32768),
__page_frag_cache_refill will return a memory of only 32768 bytes, so 
should we continue to expand the PAGE_FRAG_CACHE_MAX_SIZE? Maybe more 
work needs to be done


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-30 20:07     ` Andrew Morton
@ 2022-05-31 14:43       ` Chen Lin
  2022-05-31 23:45         ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Chen Lin @ 2022-05-31 14:43 UTC (permalink / raw)
  To: akpm; +Cc: kuba, linux-mm, linux-kernel, alexander.duyck, netdev, Chen Lin

netdev_alloc_frag->page_frag_alloc may cause memory corruption in 
the following process:

1. A netdev_alloc_frag function call need alloc 200 Bytes to build a skb.

2. Insufficient memory to alloc PAGE_FRAG_CACHE_MAX_ORDER(32K) in 
__page_frag_cache_refill to fill frag cache, then one page(eg:4K) 
is allocated, now current frag cache is 4K, alloc is success, 
nc->pagecnt_bias--.

3. Then this 200 bytes skb in step 1 is freed, page->_refcount--.

4. Another netdev_alloc_frag function call need alloc 5k, page->_refcount 
is equal to nc->pagecnt_bias, reset page count bias and offset to 
start of new frag. page_frag_alloc will return the 4K memory for a 
5K memory request.

5. The caller write on the extra 1k memory which is not actual allocated 
will cause memory corruption.

page_frag_alloc is for fragmented allocation. We should warn the caller 
to avoid memory corruption.

When fragsz is larger than one page, we report the failure and return.
I don't think it is a good idea to make efforts to support the
allocation of more than one page in this function because the total
frag cache size(PAGE_FRAG_CACHE_MAX_SIZE 32768) is relatively small.
When the request is larger than one page, the caller should switch to
use other kernel interfaces, such as kmalloc and alloc_Pages.

This bug is mainly caused by the reuse of the previously allocated
frag cache memory by the following LARGER allocations. This bug existed
before page_frag_alloc was ported from __netdev_alloc_frag in 
net/core/skbuff.c, so most Linux versions have this problem.

Signed-off-by: Chen Lin <chen45464546@163.com>
---
 mm/page_alloc.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e008a3d..ffc42b5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5574,6 +5574,15 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
 	struct page *page;
 	int offset;

+	/*
+	 * frag_alloc is not suitable for memory alloc which fragsz
+	 * is bigger than PAGE_SIZE, use kmalloc or alloc_pages instead.
+	 */
+	if (WARN_ONCE(fragz > PAGE_SIZE,
+		      "alloc fragsz(%d) > PAGE_SIZE(%ld) not supported, alloc fail\n",
+		      fragsz, PAGE_SIZE))
+		return NULL;
+
 	if (unlikely(!nc->va)) {
 refill:
 		page = __page_frag_cache_refill(nc, gfp_mask);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-31 14:41         ` Chen Lin
@ 2022-05-31 15:14           ` Jakub Kicinski
  2022-05-31 15:36             ` Chen Lin
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2022-05-31 15:14 UTC (permalink / raw)
  To: Chen Lin; +Cc: akpm, linux-mm, linux-kernel, alexander.duyck, netdev

On Tue, 31 May 2022 22:41:12 +0800 Chen Lin wrote:
> At 2022-05-31 02:29:18, "Jakub Kicinski" <kuba@kernel.org> wrote:
> >Oh, well, the reuse also needs an update. We can slap a similar
> >condition next to the pfmemalloc check.  
> 
> The sample code above cannot completely solve the current problem.
> For example, when fragsz is greater than PAGE_FRAG_CACHE_MAX_SIZE(32768),
> __page_frag_cache_refill will return a memory of only 32768 bytes, so 
> should we continue to expand the PAGE_FRAG_CACHE_MAX_SIZE? Maybe more 
> work needs to be done

Right, but I can think of two drivers off the top of my head which will
allocate <=32k frags but none which will allocate more.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re:Re: [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-31 15:14           ` Jakub Kicinski
@ 2022-05-31 15:36             ` Chen Lin
  2022-05-31 15:47               ` Jakub Kicinski
  0 siblings, 1 reply; 17+ messages in thread
From: Chen Lin @ 2022-05-31 15:36 UTC (permalink / raw)
  To: kuba; +Cc: akpm, linux-mm, linux-kernel, alexander.duyck, netdev, Chen Lin

At 2022-05-31 22:14:12, "Jakub Kicinski" <kuba@kernel.org> wrote:
>On Tue, 31 May 2022 22:41:12 +0800 Chen Lin wrote:
>> At 2022-05-31 02:29:18, "Jakub Kicinski" <kuba@kernel.org> wrote:
>> >Oh, well, the reuse also needs an update. We can slap a similar
>> >condition next to the pfmemalloc check.  
>> 
>> The sample code above cannot completely solve the current problem.
>> For example, when fragsz is greater than PAGE_FRAG_CACHE_MAX_SIZE(32768),
>> __page_frag_cache_refill will return a memory of only 32768 bytes, so 
>> should we continue to expand the PAGE_FRAG_CACHE_MAX_SIZE? Maybe more 
>> work needs to be done
>
>Right, but I can think of two drivers off the top of my head which will
>allocate <=32k frags but none which will allocate more.

In fact, it is rare to apply for more than one page, so is it necessary to 
change it to support? 
we can just warning and return, also it is easy to synchronize this simple 
protective measures to lower Linux versions.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-31 15:36             ` Chen Lin
@ 2022-05-31 15:47               ` Jakub Kicinski
  2022-05-31 18:28                 ` Alexander Duyck
  0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2022-05-31 15:47 UTC (permalink / raw)
  To: Chen Lin; +Cc: akpm, linux-mm, linux-kernel, alexander.duyck, netdev

On Tue, 31 May 2022 23:36:22 +0800 Chen Lin wrote:
> At 2022-05-31 22:14:12, "Jakub Kicinski" <kuba@kernel.org> wrote:
> >On Tue, 31 May 2022 22:41:12 +0800 Chen Lin wrote:  
> >> The sample code above cannot completely solve the current problem.
> >> For example, when fragsz is greater than PAGE_FRAG_CACHE_MAX_SIZE(32768),
> >> __page_frag_cache_refill will return a memory of only 32768 bytes, so 
> >> should we continue to expand the PAGE_FRAG_CACHE_MAX_SIZE? Maybe more 
> >> work needs to be done  
> >
> >Right, but I can think of two drivers off the top of my head which will
> >allocate <=32k frags but none which will allocate more.  
> 
> In fact, it is rare to apply for more than one page, so is it necessary to 
> change it to support? 

I don't really care if it's supported TBH, but I dislike adding 
a branch to the fast path just to catch one or two esoteric bad 
callers.

Maybe you can wrap the check with some debug CONFIG_ so it won't
run on production builds?

> we can just warning and return, also it is easy to synchronize this simple 
> protective measures to lower Linux versions.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-31 15:47               ` Jakub Kicinski
@ 2022-05-31 18:28                 ` Alexander Duyck
  2022-06-01 12:32                   ` 愚树
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Duyck @ 2022-05-31 18:28 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Chen Lin, Andrew Morton, linux-mm, LKML, Netdev

On Tue, May 31, 2022 at 8:47 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 31 May 2022 23:36:22 +0800 Chen Lin wrote:
> > At 2022-05-31 22:14:12, "Jakub Kicinski" <kuba@kernel.org> wrote:
> > >On Tue, 31 May 2022 22:41:12 +0800 Chen Lin wrote:
> > >> The sample code above cannot completely solve the current problem.
> > >> For example, when fragsz is greater than PAGE_FRAG_CACHE_MAX_SIZE(32768),
> > >> __page_frag_cache_refill will return a memory of only 32768 bytes, so
> > >> should we continue to expand the PAGE_FRAG_CACHE_MAX_SIZE? Maybe more
> > >> work needs to be done
> > >
> > >Right, but I can think of two drivers off the top of my head which will
> > >allocate <=32k frags but none which will allocate more.
> >
> > In fact, it is rare to apply for more than one page, so is it necessary to
> > change it to support?
>
> I don't really care if it's supported TBH, but I dislike adding
> a branch to the fast path just to catch one or two esoteric bad
> callers.
>
> Maybe you can wrap the check with some debug CONFIG_ so it won't
> run on production builds?

Also the example used here to define what is triggering the behavior
is seriously flawed. The code itself is meant to allow for order0 page
reuse, and the 32K page was just an optimization. So the assumption
that you could request more than 4k is a bad assumption in the driver
that is making this call.

So I am in agreement with Kuba. We shouldn't be needing to add code in
the fast path to tell users not to shoot themselves in the foot.

We already have code in place in __netdev_alloc_skb that is calling
the slab allocator if "len > SKB_WITH_OVERHEAD(PAGE_SIZE)". We could
probably just add a DEBUG wrapped BUG_ON to capture those cases where
a driver is making that mistake with __netdev_alloc_frag_align.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-31 14:43       ` [PATCH v3] " Chen Lin
@ 2022-05-31 23:45         ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2022-05-31 23:45 UTC (permalink / raw)
  To: Chen Lin, akpm; +Cc: kuba, linux-mm, linux-kernel, alexander.duyck, netdev


On 5/31/22 07:43, Chen Lin wrote:
> netdev_alloc_frag->page_frag_alloc may cause memory corruption in
> the following process:
>
> 1. A netdev_alloc_frag function call need alloc 200 Bytes to build a skb.
>
> 2. Insufficient memory to alloc PAGE_FRAG_CACHE_MAX_ORDER(32K) in
> __page_frag_cache_refill to fill frag cache, then one page(eg:4K)
> is allocated, now current frag cache is 4K, alloc is success,
> nc->pagecnt_bias--.
>
> 3. Then this 200 bytes skb in step 1 is freed, page->_refcount--.
>
> 4. Another netdev_alloc_frag function call need alloc 5k, page->_refcount
> is equal to nc->pagecnt_bias, reset page count bias and offset to
> start of new frag. page_frag_alloc will return the 4K memory for a
> 5K memory request.
>
> 5. The caller write on the extra 1k memory which is not actual allocated
> will cause memory corruption.
>
> page_frag_alloc is for fragmented allocation. We should warn the caller
> to avoid memory corruption.
>
> When fragsz is larger than one page, we report the failure and return.
> I don't think it is a good idea to make efforts to support the
> allocation of more than one page in this function because the total
> frag cache size(PAGE_FRAG_CACHE_MAX_SIZE 32768) is relatively small.
> When the request is larger than one page, the caller should switch to
> use other kernel interfaces, such as kmalloc and alloc_Pages.
>
> This bug is mainly caused by the reuse of the previously allocated
> frag cache memory by the following LARGER allocations. This bug existed
> before page_frag_alloc was ported from __netdev_alloc_frag in
> net/core/skbuff.c, so most Linux versions have this problem.
>
> Signed-off-by: Chen Lin <chen45464546@163.com>
> ---
>   mm/page_alloc.c |    9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e008a3d..ffc42b5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5574,6 +5574,15 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
>   	struct page *page;
>   	int offset;
>   
> +	/*
> +	 * frag_alloc is not suitable for memory alloc which fragsz
> +	 * is bigger than PAGE_SIZE, use kmalloc or alloc_pages instead.
> +	 */
> +	if (WARN_ONCE(fragz > PAGE_SIZE,
> +		      "alloc fragsz(%d) > PAGE_SIZE(%ld) not supported, alloc fail\n",
> +		      fragsz, PAGE_SIZE))
> +		return NULL;
> +
>   	if (unlikely(!nc->va)) {
>   refill:
>   		page = __page_frag_cache_refill(nc, gfp_mask);


I do not think this patch is needed, nor correct. (panic_on_warn=1 will 
panic the box)

Or provide a stack trace ?

Please fix the caller (presumably a network driver ?), and provide an 
appropriate Fixes: tag.

Thanks.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re:Re: [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-05-31 18:28                 ` Alexander Duyck
@ 2022-06-01 12:32                   ` 愚树
  2022-06-01 15:04                     ` Alexander Duyck
  2022-07-08  8:06                     ` Maurizio Lombardi
  0 siblings, 2 replies; 17+ messages in thread
From: 愚树 @ 2022-06-01 12:32 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Jakub Kicinski, Andrew Morton, linux-mm, LKML, Netdev

At 2022-06-01 01:28:59, "Alexander Duyck" <alexander.duyck@gmail.com> wrote:
>On Tue, May 31, 2022 at 8:47 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Tue, 31 May 2022 23:36:22 +0800 Chen Lin wrote:
>> > At 2022-05-31 22:14:12, "Jakub Kicinski" <kuba@kernel.org> wrote:
>> > >On Tue, 31 May 2022 22:41:12 +0800 Chen Lin wrote:
>> > >> The sample code above cannot completely solve the current problem.
>> > >> For example, when fragsz is greater than PAGE_FRAG_CACHE_MAX_SIZE(32768),
>> > >> __page_frag_cache_refill will return a memory of only 32768 bytes, so
>> > >> should we continue to expand the PAGE_FRAG_CACHE_MAX_SIZE? Maybe more
>> > >> work needs to be done
>> > >
>> > >Right, but I can think of two drivers off the top of my head which will
>> > >allocate <=32k frags but none which will allocate more.
>> >
>> > In fact, it is rare to apply for more than one page, so is it necessary to
>> > change it to support?
>>
>> I don't really care if it's supported TBH, but I dislike adding
>> a branch to the fast path just to catch one or two esoteric bad
>> callers.
>>
>> Maybe you can wrap the check with some debug CONFIG_ so it won't
>> run on production builds?
>
>Also the example used here to define what is triggering the behavior
>is seriously flawed. The code itself is meant to allow for order0 page
>reuse, and the 32K page was just an optimization. So the assumption
>that you could request more than 4k is a bad assumption in the driver
>that is making this call.
>
>So I am in agreement with Kuba. We shouldn't be needing to add code in
>the fast path to tell users not to shoot themselves in the foot.
>
>We already have code in place in __netdev_alloc_skb that is calling
>the slab allocator if "len > SKB_WITH_OVERHEAD(PAGE_SIZE)". We could
>probably just add a DEBUG wrapped BUG_ON to capture those cases where
>a driver is making that mistake with __netdev_alloc_frag_align.

Thanks for the clear explanation. 
The reality is that it is not easy to capture the drivers that make such mistake.
Because memory corruption usually leads to errors on other unrelated modules. 
Not long ago, we have spent a lot of time and effort to locate a issue that 
occasionally occurs in different kernel modules, and finally find the root cause is
the improper use of this netdev_alloc_frag interface in DPAA net driver from NXP. 
It's a miserable process.

I also found that some net drivers in the latest Linux version have this issue.
Like:
1. netdev_alloc_frag "len" may larger than PAGE_SIZE
#elif (PAGE_SIZE >= E1000_RXBUFFER_4096)
                adapter->rx_buffer_len = PAGE_SIZE;
#endif

static unsigned int e1000_frag_len(const struct e1000_adapter *a)
{
        return SKB_DATA_ALIGN(a->rx_buffer_len + E1000_HEADROOM) +
                SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
}

static void *e1000_alloc_frag(const struct e1000_adapter *a)
{
        unsigned int len = e1000_frag_len(a);
        u8 *data = netdev_alloc_frag(len);
}
"./drivers/net/ethernet/intel/e1000/e1000_main.c" 5316  --38%-- 

2. netdev_alloc_frag "ring->frag_size" may larger than (4096 * 3)

#define MTK_MAX_LRO_RX_LENGTH           (4096 * 3)
        if (rx_flag == MTK_RX_FLAGS_HWLRO) {
                rx_data_len = MTK_MAX_LRO_RX_LENGTH;
                rx_dma_size = MTK_HW_LRO_DMA_SIZE;
        } else {
                rx_data_len = ETH_DATA_LEN;
                rx_dma_size = MTK_DMA_SIZE;
        }

        ring->frag_size = mtk_max_frag_size(rx_data_len);
        
        for (i = 0; i < rx_dma_size; i++) {
                ring->data[i] = netdev_alloc_frag(ring->frag_size);
                if (!ring->data[i])
                        return -ENOMEM;
        }
"drivers/net/ethernet/mediatek/mtk_eth_soc.c" 3344  --50%-- 

I will try to fix these drivers later.

Even experienced driver engineers may use this netdev_alloc_frag 
interface incorrectly. 
So I thought it is best to provide some prompt information of usage 
error inside the netdev_alloc_frag, or it's OK to report such mistake 
during system running which may caused by fragsz varies(exceeded page size).

Now, as you and Kuba mentioned earlier, "do not add code in fast path".

Can we just add code to the relatively slow path to capture the mistake
before it lead to memory corruption? 
Like:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e6f211d..ac60a97 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5580,6 +5580,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
                /* reset page count bias and offset to start of new frag */
                nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
                offset = size - fragsz;
+               BUG_ON(offset < 0);
        }

        nc->pagecnt_bias--;


Additional, we may modify document to clearly indicate the limits of the 
input parameter fragsz.
Like:
diff --git a/Documentation/vm/page_frags.rst b/Documentation/vm/page_frags.rst
index 7d6f938..61b2805 100644
--- a/Documentation/vm/page_frags.rst
+++ b/Documentation/vm/page_frags.rst
@@ -4,7 +4,7 @@
 Page fragments
 ==============

-A page fragment is an arbitrary-length arbitrary-offset area of memory
+A page fragment is an arbitrary-length(must <= PAGE_SIZE) arbitrary-offset area of memory
 which resides within a 0 or higher order compound page. 

Thanks

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Re: [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-06-01 12:32                   ` 愚树
@ 2022-06-01 15:04                     ` Alexander Duyck
  2022-07-06 15:21                       ` Maurizio Lombardi
  2022-07-08  8:06                     ` Maurizio Lombardi
  1 sibling, 1 reply; 17+ messages in thread
From: Alexander Duyck @ 2022-06-01 15:04 UTC (permalink / raw)
  To: 愚树; +Cc: Jakub Kicinski, Andrew Morton, linux-mm, LKML, Netdev

On Wed, Jun 1, 2022 at 5:33 AM 愚树 <chen45464546@163.com> wrote:
>
> At 2022-06-01 01:28:59, "Alexander Duyck" <alexander.duyck@gmail.com> wrote:
> >On Tue, May 31, 2022 at 8:47 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> On Tue, 31 May 2022 23:36:22 +0800 Chen Lin wrote:
> >> > At 2022-05-31 22:14:12, "Jakub Kicinski" <kuba@kernel.org> wrote:
> >> > >On Tue, 31 May 2022 22:41:12 +0800 Chen Lin wrote:
> >> > >> The sample code above cannot completely solve the current problem.
> >> > >> For example, when fragsz is greater than PAGE_FRAG_CACHE_MAX_SIZE(32768),
> >> > >> __page_frag_cache_refill will return a memory of only 32768 bytes, so
> >> > >> should we continue to expand the PAGE_FRAG_CACHE_MAX_SIZE? Maybe more
> >> > >> work needs to be done
> >> > >
> >> > >Right, but I can think of two drivers off the top of my head which will
> >> > >allocate <=32k frags but none which will allocate more.
> >> >
> >> > In fact, it is rare to apply for more than one page, so is it necessary to
> >> > change it to support?
> >>
> >> I don't really care if it's supported TBH, but I dislike adding
> >> a branch to the fast path just to catch one or two esoteric bad
> >> callers.
> >>
> >> Maybe you can wrap the check with some debug CONFIG_ so it won't
> >> run on production builds?
> >
> >Also the example used here to define what is triggering the behavior
> >is seriously flawed. The code itself is meant to allow for order0 page
> >reuse, and the 32K page was just an optimization. So the assumption
> >that you could request more than 4k is a bad assumption in the driver
> >that is making this call.
> >
> >So I am in agreement with Kuba. We shouldn't be needing to add code in
> >the fast path to tell users not to shoot themselves in the foot.
> >
> >We already have code in place in __netdev_alloc_skb that is calling
> >the slab allocator if "len > SKB_WITH_OVERHEAD(PAGE_SIZE)". We could
> >probably just add a DEBUG wrapped BUG_ON to capture those cases where
> >a driver is making that mistake with __netdev_alloc_frag_align.
>
> Thanks for the clear explanation.
> The reality is that it is not easy to capture the drivers that make such mistake.
> Because memory corruption usually leads to errors on other unrelated modules.
> Not long ago, we have spent a lot of time and effort to locate a issue that
> occasionally occurs in different kernel modules, and finally find the root cause is
> the improper use of this netdev_alloc_frag interface in DPAA net driver from NXP.
> It's a miserable process.
>
> I also found that some net drivers in the latest Linux version have this issue.
> Like:
> 1. netdev_alloc_frag "len" may larger than PAGE_SIZE
> #elif (PAGE_SIZE >= E1000_RXBUFFER_4096)
>                 adapter->rx_buffer_len = PAGE_SIZE;
> #endif
>
> static unsigned int e1000_frag_len(const struct e1000_adapter *a)
> {
>         return SKB_DATA_ALIGN(a->rx_buffer_len + E1000_HEADROOM) +
>                 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> }
>
> static void *e1000_alloc_frag(const struct e1000_adapter *a)
> {
>         unsigned int len = e1000_frag_len(a);
>         u8 *data = netdev_alloc_frag(len);
> }
> "./drivers/net/ethernet/intel/e1000/e1000_main.c" 5316  --38%--

So there isn't actually a bug in this code. Specifically the code is
split up between two paths. The first code block comes from the jumbo
frames path which creates a fraglist skb and will memcpy the header
out if I recall correctly. The code from the other two functions is
from the non-jumbo frames path which has restricted the length to
MAXIMUM_ETHERNET_VLAN_SIZE.

> 2. netdev_alloc_frag "ring->frag_size" may larger than (4096 * 3)
>
> #define MTK_MAX_LRO_RX_LENGTH           (4096 * 3)
>         if (rx_flag == MTK_RX_FLAGS_HWLRO) {
>                 rx_data_len = MTK_MAX_LRO_RX_LENGTH;
>                 rx_dma_size = MTK_HW_LRO_DMA_SIZE;
>         } else {
>                 rx_data_len = ETH_DATA_LEN;
>                 rx_dma_size = MTK_DMA_SIZE;
>         }
>
>         ring->frag_size = mtk_max_frag_size(rx_data_len);
>
>         for (i = 0; i < rx_dma_size; i++) {
>                 ring->data[i] = netdev_alloc_frag(ring->frag_size);
>                 if (!ring->data[i])
>                         return -ENOMEM;
>         }
> "drivers/net/ethernet/mediatek/mtk_eth_soc.c" 3344  --50%--
>
> I will try to fix these drivers later.

This one I don't know as much about, and it does appear to contain a
bug. What it should be doing is a check before doing the
netdev_alloc_frag call to verify if it is less than 4K then it uses
netdev_alloc_frag, if it is greater then it needs to use alloc_pages.

> Even experienced driver engineers may use this netdev_alloc_frag
> interface incorrectly.
> So I thought it is best to provide some prompt information of usage
> error inside the netdev_alloc_frag, or it's OK to report such mistake
> during system running which may caused by fragsz varies(exceeded page size).
>
> Now, as you and Kuba mentioned earlier, "do not add code in fast path".
>
> Can we just add code to the relatively slow path to capture the mistake
> before it lead to memory corruption?
> Like:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e6f211d..ac60a97 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5580,6 +5580,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
>                 /* reset page count bias and offset to start of new frag */
>                 nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>                 offset = size - fragsz;
> +               BUG_ON(offset < 0);
>         }
>
>         nc->pagecnt_bias--;
>


I think I could be onboard with a patch like this. The test shouldn't
add more than 1 instruction since it is essentially just a jump if
signed test which will be performed after the size - fragsz check.

> Additional, we may modify document to clearly indicate the limits of the
> input parameter fragsz.
> Like:
> diff --git a/Documentation/vm/page_frags.rst b/Documentation/vm/page_frags.rst
> index 7d6f938..61b2805 100644
> --- a/Documentation/vm/page_frags.rst
> +++ b/Documentation/vm/page_frags.rst
> @@ -4,7 +4,7 @@
>  Page fragments
>  ==============
>
> -A page fragment is an arbitrary-length arbitrary-offset area of memory
> +A page fragment is an arbitrary-length(must <= PAGE_SIZE) arbitrary-offset area of memory
>  which resides within a 0 or higher order compound page.

The main thing I would call out about the page fragment is that it
should be less than an order 0 page in size, ideally at least half a
page to allow for reuse even in the case of order 0 pages. Otherwise
it is really an abuse of the interface as it isn't really meant to be
allocating 1 fragment per page since the efficiency will drop pretty
significantly as memory becomes fragmented and it becomes harder to
allocate higher order pages. It would essentially just become
alloc_page with more overhead.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-06-01 15:04                     ` Alexander Duyck
@ 2022-07-06 15:21                       ` Maurizio Lombardi
  0 siblings, 0 replies; 17+ messages in thread
From: Maurizio Lombardi @ 2022-07-06 15:21 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: 愚树, Jakub Kicinski, Andrew Morton, linux-mm, LKML, Netdev

st 1. 6. 2022 v 17:05 odesílatel Alexander Duyck
<alexander.duyck@gmail.com> napsal:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index e6f211d..ac60a97 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5580,6 +5580,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
> >                 /* reset page count bias and offset to start of new frag */
> >                 nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> >                 offset = size - fragsz;
> > +               BUG_ON(offset < 0);
> >         }
> >
> >         nc->pagecnt_bias--;
> >
>
>
> I think I could be onboard with a patch like this. The test shouldn't
> add more than 1 instruction since it is essentially just a jump if
> signed test which will be performed after the size - fragsz check.

FYI, I hit this problem a few days ago with the nfp network driver, it uses
page_frag_alloc() with a frag size larger than PAGE_SIZE when MTU is
set to 9000,
this may result in memory corruptions when the system runs out of memory.

The solution I was working on was something like the following, this
makes the allocation
fail if fragsz is greater than the cache size.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4dc0d333279f..c6b40b85c55d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5544,12 +5544,17 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
                /* if size can vary use size else just use PAGE_SIZE */
                size = nc->size;
 #endif
-               /* OK, page count is 0, we can safely set it */
-               set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
-
                /* reset page count bias and offset to start of new frag */
                nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
                offset = size - fragsz;
+               if (unlikely(offset < 0)) {
+                       free_the_page(page, compound_order(page));
+                       nc->va = NULL;
+                       return NULL;
+               }
+
+               /* OK, page count is 0, we can safely set it */
+               set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
        }

        nc->pagecnt_bias--;


Maurizio


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Re: [PATCH v2] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE
  2022-06-01 12:32                   ` 愚树
  2022-06-01 15:04                     ` Alexander Duyck
@ 2022-07-08  8:06                     ` Maurizio Lombardi
  1 sibling, 0 replies; 17+ messages in thread
From: Maurizio Lombardi @ 2022-07-08  8:06 UTC (permalink / raw)
  To: 愚树
  Cc: Alexander Duyck, Jakub Kicinski, Andrew Morton, linux-mm, LKML, Netdev

st 1. 6. 2022 v 14:49 odesílatel 愚树 <chen45464546@163.com> napsal:
> Can we just add code to the relatively slow path to capture the mistake
> before it lead to memory corruption?
> Like:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e6f211d..ac60a97 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5580,6 +5580,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
>                 /* reset page count bias and offset to start of new frag */
>                 nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>                 offset = size - fragsz;
> +               BUG_ON(offset < 0);
>         }
>

Personally, I'm not really convinced this is the best solution.
The next time a driver abuses  the page_frag_alloc() interface, the
bug may go unnoticed for a long time...
until a server in production runs into OOM and crashes because it hits
the BUG_ON().

And why should the kernel panic? It's perfectly able to handle this
condition by failing
the allocation and returning NULL, and printing a warning maybe.

Maurizio


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-07-08  8:07 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-28 15:39 [PATCH] mm: page_frag: Warn_on when frag_alloc size is bigger than PAGE_SIZE Chen Lin
2022-05-29 23:30 ` Andrew Morton
2022-05-30 13:39   ` [PATCH v2] " Chen Lin
2022-05-30 19:27     ` Jakub Kicinski
2022-05-30 19:29       ` Jakub Kicinski
2022-05-31 14:41         ` Chen Lin
2022-05-31 15:14           ` Jakub Kicinski
2022-05-31 15:36             ` Chen Lin
2022-05-31 15:47               ` Jakub Kicinski
2022-05-31 18:28                 ` Alexander Duyck
2022-06-01 12:32                   ` 愚树
2022-06-01 15:04                     ` Alexander Duyck
2022-07-06 15:21                       ` Maurizio Lombardi
2022-07-08  8:06                     ` Maurizio Lombardi
2022-05-30 20:07     ` Andrew Morton
2022-05-31 14:43       ` [PATCH v3] " Chen Lin
2022-05-31 23:45         ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).