linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
@ 2022-03-04 14:26 Daniel Borkmann
  2022-03-04 15:34 ` Michal Hocko
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Daniel Borkmann @ 2022-03-04 14:26 UTC (permalink / raw)
  To: torvalds
  Cc: bpf, netdev, linux-kernel, Daniel Borkmann,
	syzbot+11421fbbff99b989670e, Björn Töpel,
	Magnus Karlsson, Willy Tarreau, Andrew Morton,
	Alexei Starovoitov, Andrii Nakryiko, Jakub Kicinski,
	David S . Miller

syzkaller was recently triggering an oversized kvmalloc() warning via
xdp_umem_create().

The triggered warning was added back in 7661809d493b ("mm: don't allow
oversized kvmalloc() calls"). The rationale for the warning for huge
kvmalloc sizes was as a reaction to a security bug where the size was
more than UINT_MAX but not everything was prepared to handle unsigned
long sizes.

Anyway, the AF_XDP related call trace from this syzkaller report was:

  kvmalloc include/linux/mm.h:806 [inline]
  kvmalloc_array include/linux/mm.h:824 [inline]
  kvcalloc include/linux/mm.h:829 [inline]
  xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
  xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
  xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
  xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
  __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
  __do_sys_setsockopt net/socket.c:2187 [inline]
  __se_sys_setsockopt net/socket.c:2184 [inline]
  __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Björn mentioned that requests for >2GB allocation can still be valid:

  The structure that is being allocated is the page-pinning accounting.
  AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
  still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
  PAGE_SIZE on 64 bit systems). [...]

  I could just change from U32_MAX to INT_MAX, but as I stated earlier
  that has a hacky feeling to it. [...] From my perspective, the code
  isn't broken, with the memcg limits in consideration. [...]

Linus says:

  [...] Pretty much every time this has come up, the kernel warning has
  shown that yes, the code was broken and there really wasn't a reason
  for doing allocations that big.

  Of course, some people would be perfectly fine with the allocation
  failing, they just don't want the warning. I didn't want __GFP_NOWARN
  to shut it up originally because I wanted people to see all those
  cases, but these days I think we can just say "yeah, people can shut
  it up explicitly by saying 'go ahead and fail this allocation, don't
  warn about it'".

  So enough time has passed that by now I'd certainly be ok with [it].

Thus allow call-sites to silence such userspace triggered splats if the
allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
to kvcalloc() this is already the case, so nothing else needed there.

Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
Cc: Björn Töpel <bjorn@kernel.org>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org
---
 [ Hi Linus, just to follow-up on the discussion from here [0], I've cooked
   up proper and tested patch. Feel free to take it directly to your tree if
   you prefer, or we could also either route it via bpf or mm, whichever way
   is best. Thanks!
   [0] https://lore.kernel.org/bpf/CAHk-=wiRq+_jd_O1gz3J6-ANtXMY7iLpi8XFUcmtB3rBixvUXQ@mail.gmail.com/ ]

 mm/util.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/util.c b/mm/util.c
index 7e43369064c8..d3102081add0 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -587,8 +587,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 		return ret;
 
 	/* Don't even allow crazy sizes */
-	if (WARN_ON_ONCE(size > INT_MAX))
+	if (unlikely(size > INT_MAX)) {
+		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
 		return NULL;
+	}
 
 	return __vmalloc_node(size, 1, flags, node,
 			__builtin_return_address(0));
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
  2022-03-04 14:26 [PATCH] mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls Daniel Borkmann
@ 2022-03-04 15:34 ` Michal Hocko
  2022-03-04 17:45 ` Leon Romanovsky
  2022-03-04 18:01 ` Linus Torvalds
  2 siblings, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2022-03-04 15:34 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: torvalds, bpf, netdev, linux-kernel, syzbot+11421fbbff99b989670e,
	Björn Töpel, Magnus Karlsson, Willy Tarreau,
	Andrew Morton, Alexei Starovoitov, Andrii Nakryiko,
	Jakub Kicinski, David S . Miller

On Fri 04-03-22 15:26:32, Daniel Borkmann wrote:
> syzkaller was recently triggering an oversized kvmalloc() warning via
> xdp_umem_create().
> 
> The triggered warning was added back in 7661809d493b ("mm: don't allow
> oversized kvmalloc() calls"). The rationale for the warning for huge
> kvmalloc sizes was as a reaction to a security bug where the size was
> more than UINT_MAX but not everything was prepared to handle unsigned
> long sizes.
> 
> Anyway, the AF_XDP related call trace from this syzkaller report was:
> 
>   kvmalloc include/linux/mm.h:806 [inline]
>   kvmalloc_array include/linux/mm.h:824 [inline]
>   kvcalloc include/linux/mm.h:829 [inline]
>   xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
>   xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
>   xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
>   xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
>   __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
>   __do_sys_setsockopt net/socket.c:2187 [inline]
>   __se_sys_setsockopt net/socket.c:2184 [inline]
>   __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
>   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>   do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> Björn mentioned that requests for >2GB allocation can still be valid:
> 
>   The structure that is being allocated is the page-pinning accounting.
>   AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
>   still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
>   PAGE_SIZE on 64 bit systems). [...]
> 
>   I could just change from U32_MAX to INT_MAX, but as I stated earlier
>   that has a hacky feeling to it. [...] From my perspective, the code
>   isn't broken, with the memcg limits in consideration. [...]
> 
> Linus says:
> 
>   [...] Pretty much every time this has come up, the kernel warning has
>   shown that yes, the code was broken and there really wasn't a reason
>   for doing allocations that big.
> 
>   Of course, some people would be perfectly fine with the allocation
>   failing, they just don't want the warning. I didn't want __GFP_NOWARN
>   to shut it up originally because I wanted people to see all those
>   cases, but these days I think we can just say "yeah, people can shut
>   it up explicitly by saying 'go ahead and fail this allocation, don't
>   warn about it'".
> 
>   So enough time has passed that by now I'd certainly be ok with [it].
> 
> Thus allow call-sites to silence such userspace triggered splats if the
> allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
> to kvcalloc() this is already the case, so nothing else needed there.
> 
> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
> Cc: Björn Töpel <bjorn@kernel.org>
> Cc: Magnus Karlsson <magnus.karlsson@intel.com>
> Cc: Willy Tarreau <w@1wt.eu>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: David S. Miller <davem@davemloft.net>
> Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
> Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org

This makes sense to me.
Ackd-by: Michal Hocko <mhocko@suse.com>

> ---
>  [ Hi Linus, just to follow-up on the discussion from here [0], I've cooked
>    up proper and tested patch. Feel free to take it directly to your tree if
>    you prefer, or we could also either route it via bpf or mm, whichever way
>    is best. Thanks!
>    [0] https://lore.kernel.org/bpf/CAHk-=wiRq+_jd_O1gz3J6-ANtXMY7iLpi8XFUcmtB3rBixvUXQ@mail.gmail.com/ ]
> 
>  mm/util.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 7e43369064c8..d3102081add0 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -587,8 +587,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  		return ret;
>  
>  	/* Don't even allow crazy sizes */
> -	if (WARN_ON_ONCE(size > INT_MAX))
> +	if (unlikely(size > INT_MAX)) {
> +		WARN_ON_ONCE(!(flags & __GFP_NOWARN));
>  		return NULL;
> +	}
>  
>  	return __vmalloc_node(size, 1, flags, node,
>  			__builtin_return_address(0));
> -- 
> 2.21.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
  2022-03-04 14:26 [PATCH] mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls Daniel Borkmann
  2022-03-04 15:34 ` Michal Hocko
@ 2022-03-04 17:45 ` Leon Romanovsky
  2022-03-04 18:01 ` Linus Torvalds
  2 siblings, 0 replies; 4+ messages in thread
From: Leon Romanovsky @ 2022-03-04 17:45 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: torvalds, bpf, netdev, linux-kernel, syzbot+11421fbbff99b989670e,
	Björn Töpel, Magnus Karlsson, Willy Tarreau,
	Andrew Morton, Alexei Starovoitov, Andrii Nakryiko,
	Jakub Kicinski, David S . Miller

On Fri, Mar 04, 2022 at 03:26:32PM +0100, Daniel Borkmann wrote:
> syzkaller was recently triggering an oversized kvmalloc() warning via
> xdp_umem_create().
> 
> The triggered warning was added back in 7661809d493b ("mm: don't allow
> oversized kvmalloc() calls"). The rationale for the warning for huge
> kvmalloc sizes was as a reaction to a security bug where the size was
> more than UINT_MAX but not everything was prepared to handle unsigned
> long sizes.
> 
> Anyway, the AF_XDP related call trace from this syzkaller report was:
> 
>   kvmalloc include/linux/mm.h:806 [inline]
>   kvmalloc_array include/linux/mm.h:824 [inline]
>   kvcalloc include/linux/mm.h:829 [inline]
>   xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
>   xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
>   xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
>   xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
>   __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
>   __do_sys_setsockopt net/socket.c:2187 [inline]
>   __se_sys_setsockopt net/socket.c:2184 [inline]
>   __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
>   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>   do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> Björn mentioned that requests for >2GB allocation can still be valid:
> 
>   The structure that is being allocated is the page-pinning accounting.
>   AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
>   still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
>   PAGE_SIZE on 64 bit systems). [...]
> 
>   I could just change from U32_MAX to INT_MAX, but as I stated earlier
>   that has a hacky feeling to it. [...] From my perspective, the code
>   isn't broken, with the memcg limits in consideration. [...]
> 
> Linus says:
> 
>   [...] Pretty much every time this has come up, the kernel warning has
>   shown that yes, the code was broken and there really wasn't a reason
>   for doing allocations that big.
> 
>   Of course, some people would be perfectly fine with the allocation
>   failing, they just don't want the warning. I didn't want __GFP_NOWARN
>   to shut it up originally because I wanted people to see all those
>   cases, but these days I think we can just say "yeah, people can shut
>   it up explicitly by saying 'go ahead and fail this allocation, don't
>   warn about it'".
> 
>   So enough time has passed that by now I'd certainly be ok with [it].
> 
> Thus allow call-sites to silence such userspace triggered splats if the
> allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
> to kvcalloc() this is already the case, so nothing else needed there.
> 
> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
> Cc: Björn Töpel <bjorn@kernel.org>
> Cc: Magnus Karlsson <magnus.karlsson@intel.com>
> Cc: Willy Tarreau <w@1wt.eu>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: David S. Miller <davem@davemloft.net>
> Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
> Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org
> ---
>  [ Hi Linus, just to follow-up on the discussion from here [0], I've cooked
>    up proper and tested patch. Feel free to take it directly to your tree if
>    you prefer, or we could also either route it via bpf or mm, whichever way
>    is best. Thanks!
>    [0] https://lore.kernel.org/bpf/CAHk-=wiRq+_jd_O1gz3J6-ANtXMY7iLpi8XFUcmtB3rBixvUXQ@mail.gmail.com/ ]

It will be great to see this patch applied directly to Linus's tree.
We (RDMA) have same false alarms [1].

[1] https://lore.kernel.org/linux-mm/YayptO82EvG3EwKA@unreal/

Thanks,
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls
  2022-03-04 14:26 [PATCH] mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls Daniel Borkmann
  2022-03-04 15:34 ` Michal Hocko
  2022-03-04 17:45 ` Leon Romanovsky
@ 2022-03-04 18:01 ` Linus Torvalds
  2 siblings, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2022-03-04 18:01 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: bpf, Netdev, Linux Kernel Mailing List, syzbot,
	Björn Töpel, Magnus Karlsson, Willy Tarreau,
	Andrew Morton, Alexei Starovoitov, Andrii Nakryiko,
	Jakub Kicinski, David S . Miller

On Fri, Mar 4, 2022 at 6:27 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
>  [ Hi Linus, just to follow-up on the discussion from here [0], I've cooked
>    up proper and tested patch. Feel free to take it directly to your tree if
>    you prefer, or we could also either route it via bpf or mm, whichever way
>    is best. Thanks!

Applied.

Thanks,
          Linus

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-03-04 18:01 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-04 14:26 [PATCH] mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls Daniel Borkmann
2022-03-04 15:34 ` Michal Hocko
2022-03-04 17:45 ` Leon Romanovsky
2022-03-04 18:01 ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).