linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: prevent page_frag_alloc() from corrupting the memory
@ 2022-07-13 14:58 Maurizio Lombardi
  2022-07-13 15:01 ` Maurizio Lombardi
  0 siblings, 1 reply; 8+ messages in thread
From: Maurizio Lombardi @ 2022-07-13 14:58 UTC (permalink / raw)
  To: alexander.duyck; +Cc: kuba, akpm, linux-mm, linux-kernel, netdev, chen45464546

A number of drivers call page_frag_alloc() with a
fragment's size > PAGE_SIZE.
In low memory conditions, __page_frag_cache_refill() may fail the order 3
cache allocation and fall back to order 0;
In this case, the cache will be smaller than the fragment, causing
memory corruptions.

Prevent this from happening by checking if the newly allocated cache
is large enough for the fragment; if not, the allocation will fail
and page_frag_alloc() will return NULL.

V2: do not free the cache page because this could make memory pressure
even worse, just return NULL.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 mm/page_alloc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e008a3df0485..b1407254a826 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5617,6 +5617,8 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
 		/* reset page count bias and offset to start of new frag */
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
 		offset = size - fragsz;
+		if (unlikely(offset < 0))
+			return NULL;
 	}
 
 	nc->pagecnt_bias--;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: prevent page_frag_alloc() from corrupting the memory
  2022-07-13 14:58 [PATCH] mm: prevent page_frag_alloc() from corrupting the memory Maurizio Lombardi
@ 2022-07-13 15:01 ` Maurizio Lombardi
  0 siblings, 0 replies; 8+ messages in thread
From: Maurizio Lombardi @ 2022-07-13 15:01 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, Andrew Morton, linux-mm, LKML, Netdev, 愚树

st 13. 7. 2022 v 16:59 odesílatel Maurizio Lombardi
<mlombard@redhat.com> napsal:
>
> A number of drivers call page_frag_alloc() with a
> fragment's size > PAGE_SIZE.
> In low memory conditions, __page_frag_cache_refill() may fail the order 3
> cache allocation and fall back to order 0;
> In this case, the cache will be smaller than the fragment, causing
> memory corruptions.

Oops, I didn't modify the subject, I'm going to resend it.

Maurizio

>
> Prevent this from happening by checking if the newly allocated cache
> is large enough for the fragment; if not, the allocation will fail
> and page_frag_alloc() will return NULL.
>
> V2: do not free the cache page because this could make memory pressure
> even worse, just return NULL.
>
> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
> ---
>  mm/page_alloc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e008a3df0485..b1407254a826 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5617,6 +5617,8 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
>                 /* reset page count bias and offset to start of new frag */
>                 nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>                 offset = size - fragsz;
> +               if (unlikely(offset < 0))
> +                       return NULL;
>         }
>
>         nc->pagecnt_bias--;
> --
> 2.31.1
>



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: prevent page_frag_alloc() from corrupting the memory
  2022-07-11 18:23     ` Alexander Duyck
@ 2022-07-11 18:36       ` Maurizio Lombardi
  0 siblings, 0 replies; 8+ messages in thread
From: Maurizio Lombardi @ 2022-07-11 18:36 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Andrew Morton, Chen Lin, Jakub Kicinski, LKML, Netdev, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1896 bytes --]

po 11. 7. 2022 v 20:23 odesílatel Alexander Duyck <alexander.duyck@gmail.com>
napsal:

> On Mon, Jul 11, 2022 at 9:17 AM Maurizio Lombardi <mlombard@redhat.com>
> wrote:
> >
> > po 11. 7. 2022 v 17:34 odesílatel Alexander Duyck
> > <alexander.duyck@gmail.com> napsal:
> > >
> > > Rather than forcing us to free the page it might be better to move the
> > > lines getting the size and computing the offset to the top of the "if
> > > (unlikely(offset < 0)) {" block. Then instead of freeing the page we
> > > could just return NULL and don't have to change the value of any
> > > fields in the page_frag_cache.
> > >
> > > That way a driver performing bad requests can't force us to start
> > > allocating and freeing pages like mad by repeatedly flushing the
> > > cache.
> > >
> >
> > I understand. On the other hand, if we free the cache page then the
> > next time __page_frag_cache_refill() runs it may be successful
> > at allocating the order=3 cache, the normal page_frag_alloc() behaviour
> will
> > therefore be restored.
>
> That is a big "maybe". My concern is that it will actually make memory
> pressure worse by forcing us to reduce the number of uses for a lower
> order page. One bad actor will have us flushing memory like mad so a
> guy expecting a small fragment may end up allocating 32K pages because
> someone else is trying to allocate them.
>
> I recommend we do not optimize for a case which this code was not
> designed for. Try to optimize for the standard case that most of the
> drivers are using. These drivers that are allocating higher order
> pages worth of memory should really be using alloc_pages. Using this
> to allocate pages over 4K in size is just a waste since they are not
> likely to see page reuse which is what this code expects to see.


Ok, thanks for the review, I will submit a V2 soon.

Maurizio

[-- Attachment #2: Type: text/html, Size: 2563 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: prevent page_frag_alloc() from corrupting the memory
  2022-07-11 16:17   ` Maurizio Lombardi
@ 2022-07-11 18:23     ` Alexander Duyck
  2022-07-11 18:36       ` Maurizio Lombardi
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Duyck @ 2022-07-11 18:23 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: Jakub Kicinski, Andrew Morton, linux-mm, LKML, Netdev, Chen Lin

On Mon, Jul 11, 2022 at 9:17 AM Maurizio Lombardi <mlombard@redhat.com> wrote:
>
> po 11. 7. 2022 v 17:34 odesílatel Alexander Duyck
> <alexander.duyck@gmail.com> napsal:
> >
> > Rather than forcing us to free the page it might be better to move the
> > lines getting the size and computing the offset to the top of the "if
> > (unlikely(offset < 0)) {" block. Then instead of freeing the page we
> > could just return NULL and don't have to change the value of any
> > fields in the page_frag_cache.
> >
> > That way a driver performing bad requests can't force us to start
> > allocating and freeing pages like mad by repeatedly flushing the
> > cache.
> >
>
> I understand. On the other hand, if we free the cache page then the
> next time __page_frag_cache_refill() runs it may be successful
> at allocating the order=3 cache, the normal page_frag_alloc() behaviour will
> therefore be restored.

That is a big "maybe". My concern is that it will actually make memory
pressure worse by forcing us to reduce the number of uses for a lower
order page. One bad actor will have us flushing memory like mad so a
guy expecting a small fragment may end up allocating 32K pages because
someone else is trying to allocate them.

I recommend we do not optimize for a case which this code was not
designed for. Try to optimize for the standard case that most of the
drivers are using. These drivers that are allocating higher order
pages worth of memory should really be using alloc_pages. Using this
to allocate pages over 4K in size is just a waste since they are not
likely to see page reuse which is what this code expects to see.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: prevent page_frag_alloc() from corrupting the memory
  2022-07-11 15:34 ` Alexander Duyck
@ 2022-07-11 16:17   ` Maurizio Lombardi
  2022-07-11 18:23     ` Alexander Duyck
  0 siblings, 1 reply; 8+ messages in thread
From: Maurizio Lombardi @ 2022-07-11 16:17 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, Andrew Morton, linux-mm, LKML, Netdev, Chen Lin

po 11. 7. 2022 v 17:34 odesílatel Alexander Duyck
<alexander.duyck@gmail.com> napsal:
>
> Rather than forcing us to free the page it might be better to move the
> lines getting the size and computing the offset to the top of the "if
> (unlikely(offset < 0)) {" block. Then instead of freeing the page we
> could just return NULL and don't have to change the value of any
> fields in the page_frag_cache.
>
> That way a driver performing bad requests can't force us to start
> allocating and freeing pages like mad by repeatedly flushing the
> cache.
>

I understand. On the other hand, if we free the cache page then the
next time __page_frag_cache_refill() runs it may be successful
at allocating the order=3 cache, the normal page_frag_alloc() behaviour will
therefore be restored.

Maurizio



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: prevent page_frag_alloc() from corrupting the memory
  2022-07-11  7:52 Maurizio Lombardi
  2022-07-11  9:02 ` Maurizio Lombardi
@ 2022-07-11 15:34 ` Alexander Duyck
  2022-07-11 16:17   ` Maurizio Lombardi
  1 sibling, 1 reply; 8+ messages in thread
From: Alexander Duyck @ 2022-07-11 15:34 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: Jakub Kicinski, Andrew Morton, linux-mm, LKML, Netdev, Chen Lin

On Mon, Jul 11, 2022 at 12:52 AM Maurizio Lombardi <mlombard@redhat.com> wrote:
>
> A number of drivers call page_frag_alloc() with a
> fragment's size > PAGE_SIZE.
> In low memory conditions, __page_frag_cache_refill() may fail the order 3
> cache allocation and fall back to order 0;
> If this happens, the cache will be smaller than the fragment, causing
> memory corruptions.
>
> Prevent this from happening by checking if the newly allocated cache
> is large enough for the fragment; if not, the allocation will fail
> and page_frag_alloc() will return NULL.
>
> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
> ---
>  mm/page_alloc.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e008a3df0485..7fb000d7e90c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5611,12 +5611,17 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
>                 /* if size can vary use size else just use PAGE_SIZE */
>                 size = nc->size;
>  #endif
> -               /* OK, page count is 0, we can safely set it */
> -               set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
> -
>                 /* reset page count bias and offset to start of new frag */
>                 nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
>                 offset = size - fragsz;
> +               if (unlikely(offset < 0)) {
> +                       free_the_page(page, compound_order(page));
> +                       nc->va = NULL;
> +                       return NULL;
> +               }
> +
> +               /* OK, page count is 0, we can safely set it */
> +               set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>         }
>
>         nc->pagecnt_bias--;

Rather than forcing us to free the page it might be better to move the
lines getting the size and computing the offset to the top of the "if
(unlikely(offset < 0)) {" block. Then instead of freeing the page we
could just return NULL and don't have to change the value of any
fields in the page_frag_cache.

That way a driver performing bad requests can't force us to start
allocating and freeing pages like mad by repeatedly flushing the
cache.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: prevent page_frag_alloc() from corrupting the memory
  2022-07-11  7:52 Maurizio Lombardi
@ 2022-07-11  9:02 ` Maurizio Lombardi
  2022-07-11 15:34 ` Alexander Duyck
  1 sibling, 0 replies; 8+ messages in thread
From: Maurizio Lombardi @ 2022-07-11  9:02 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, Andrew Morton, linux-mm, LKML, Netdev, 愚树

Tested with this kernel module:

http://bsdbackstore.eu/misc/oomk/

It requires 2 parameters: the first one is the amount of memory you
want to allocate with page_frag_alloc(), the second one is the size of
the fragment
I tested it on a machine with ~7Gb of free memory.

Without the patch:
-------------------------------------------------
3Gb of memory will be used with frag size = 1024 byte. No issue:

#insmod oomk.ko memory_size_gb=3 fragsize=1024

[  177.875107] Test begins, memory size = 3 fragsize = 1024
[  177.974538] Test completed!

10 Gb of memory, 1024 byte frag. page allocation failure but the
kernel handles it and doesn't crash:

#insmod oomk.ko memory_size_gb=10 fragsize=1024

[  215.104801] Test begins, memory size = 10 fragsize = 1024
[  215.227854] insmod: page allocation failure: order:0,
mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
[  215.230231] CPU: 1 PID: 1738 Comm: insmod Kdump: loaded Tainted: G
         OE    --------- ---  5.14.0-124.kpq0.el9.x86_64 #1
[  215.232344] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  215.233523] Call Trace:
[  215.234001]  dump_stack_lvl+0x34/0x44
[  215.234894]  warn_alloc+0x134/0x160
[  215.235592]  __alloc_pages_slowpath.constprop.0+0x809/0x840
[  215.236687]  ? get_page_from_freelist+0xc6/0x500
[  215.237569]  __alloc_pages+0x1fa/0x230
[  215.238381]  page_frag_alloc_align+0x16c/0x1a0
[...]
[  215.315722] allocation number 7379888 failed!
[  215.426227] Test completed!

10Gb, 4097 byte frag. Kernel crashes:

#insmod oomk.ko memory_size_gb=10 fragsize=4097
[  623.461505] BUG: Bad page state in process insmod  pfn:10a80c
[  623.462634] page:000000000654dc14 refcount:0 mapcount:0
mapping:000000007a56d6cd index:0x0 pfn:0x10a80c
[  623.464401] memcg:ffff900343a5b501
[  623.465058] aops:0xffff9003409e5d38 with invalid host inode 00003524480055f0
[  623.466394] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[  623.467632] raw: 0017ffffc0000000 dead000000000100 dead000000000122
ffff900346cf2900
[  623.469069] raw: 0000000000000000 0000000000100010 00000000ffffffff
ffff900343a5b501
[  623.470521] page dumped because: page still charged to cgroup
[...]
[  626.632838] general protection fault, probably for non-canonical
address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI
[  626.633913] ------------[ cut here ]------------
[  626.639981] CPU: 0 PID: 722 Comm: agetty Kdump: loaded Tainted: G
 B      OE    --------- ---  5.14.0-124.kpq0.el9.x86_64 #1
[  626.640923] WARNING: CPU: 1 PID: 22 at mm/slub.c:4566 __ksize+0xc4/0xe0
[  626.645018] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  626.645021] RIP: 0010:___slab_alloc+0x1b7/0x5c0


------------------------------------------

With the patch the kernel doesn't crash:

#insmod oomk.ko memory_size_gb=10 fragsize=4097
[ 4859.358496] Test begins, memory size = 10 fragsize = 4097
[ 4859.459674] allocation number 607754 failed!
[ 4859.495489] Test completed!

#insmod oomk.ko memory_size_gb=10 fragsize=40000
[ 8428.021491] Test begins, memory size = 10 fragsize = 40000
[ 8428.024308] allocation number 0 failed!
[ 8428.025709] Test completed!

Maurizio



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] mm: prevent page_frag_alloc() from corrupting the memory
@ 2022-07-11  7:52 Maurizio Lombardi
  2022-07-11  9:02 ` Maurizio Lombardi
  2022-07-11 15:34 ` Alexander Duyck
  0 siblings, 2 replies; 8+ messages in thread
From: Maurizio Lombardi @ 2022-07-11  7:52 UTC (permalink / raw)
  To: alexander.duyck; +Cc: kuba, akpm, linux-mm, linux-kernel, netdev, chen45464546

A number of drivers call page_frag_alloc() with a
fragment's size > PAGE_SIZE.
In low memory conditions, __page_frag_cache_refill() may fail the order 3
cache allocation and fall back to order 0;
If this happens, the cache will be smaller than the fragment, causing
memory corruptions.

Prevent this from happening by checking if the newly allocated cache
is large enough for the fragment; if not, the allocation will fail
and page_frag_alloc() will return NULL.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 mm/page_alloc.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e008a3df0485..7fb000d7e90c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5611,12 +5611,17 @@ void *page_frag_alloc_align(struct page_frag_cache *nc,
 		/* if size can vary use size else just use PAGE_SIZE */
 		size = nc->size;
 #endif
-		/* OK, page count is 0, we can safely set it */
-		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
-
 		/* reset page count bias and offset to start of new frag */
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
 		offset = size - fragsz;
+		if (unlikely(offset < 0)) {
+			free_the_page(page, compound_order(page));
+			nc->va = NULL;
+			return NULL;
+		}
+
+		/* OK, page count is 0, we can safely set it */
+		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
 	}
 
 	nc->pagecnt_bias--;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-07-13 15:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-13 14:58 [PATCH] mm: prevent page_frag_alloc() from corrupting the memory Maurizio Lombardi
2022-07-13 15:01 ` Maurizio Lombardi
  -- strict thread matches above, loose matches on Subject: below --
2022-07-11  7:52 Maurizio Lombardi
2022-07-11  9:02 ` Maurizio Lombardi
2022-07-11 15:34 ` Alexander Duyck
2022-07-11 16:17   ` Maurizio Lombardi
2022-07-11 18:23     ` Alexander Duyck
2022-07-11 18:36       ` Maurizio Lombardi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).