linux-kernel.vger.kernel.org archive mirror
* [PATCH] skbuff: Reallocate to ksize() in __build_skb_around()
@ 2022-12-06 23:17 Kees Cook
  2022-12-07  1:55 ` Jakub Kicinski
  2022-12-07  9:19 ` Vlastimil Babka
  0 siblings, 2 replies; 6+ messages in thread
From: Kees Cook @ 2022-12-06 23:17 UTC (permalink / raw)
  To: David S. Miller
  Cc: Kees Cook, syzbot+fda18eaa8c12534ccb3b, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Pavel Begunkov, pepsipu,
	Vlastimil Babka, kasan-dev, Andrii Nakryiko, ast, bpf,
	Daniel Borkmann, Hao Luo, Jesper Dangaard Brouer, John Fastabend,
	jolsa, KP Singh, martin.lau, Stanislav Fomichev, song,
	Yonghong Song, netdev, LKML, Menglong Dong, David Ahern,
	Martin KaFai Lau, Luiz Augusto von Dentz, Richard Gobert,
	Andrey Konovalov, David Rientjes, linux-hardening

When build_skb() is passed a frag_size of 0, it means the buffer came
from kmalloc. In these cases, ksize() is used to find its actual size,
but since the allocation may not have been made to that size, actually
perform the krealloc() call so that all the associated buffer size
checking will be correctly notified. For example, syzkaller reported:

  BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
  Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295

For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to
build_skb().

Reported-by: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: pepsipu <soopthegoop@gmail.com>
Cc: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: ast@kernel.org
Cc: bpf <bpf@vger.kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: jolsa@kernel.org
Cc: KP Singh <kpsingh@kernel.org>
Cc: martin.lau@linux.dev
Cc: Stanislav Fomichev <sdf@google.com>
Cc: song@kernel.org
Cc: Yonghong Song <yhs@fb.com>
Cc: netdev@vger.kernel.org
Cc: LKML <linux-kernel@vger.kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 net/core/skbuff.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1d9719e72f9d..b55d061ed8b4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -274,7 +274,23 @@ static void __build_skb_around(struct sk_buff *skb, void *data,
 			       unsigned int frag_size)
 {
 	struct skb_shared_info *shinfo;
-	unsigned int size = frag_size ? : ksize(data);
+	unsigned int size = frag_size;
+
+	/* When frag_size == 0, the buffer came from kmalloc, so we
+	 * must find its true allocation size (and grow it to match).
+	 */
+	if (unlikely(size == 0)) {
+		void *resized;
+
+		size = ksize(data);
+		/* krealloc() will immediately return "data" when
+		 * "ksize(data)" is requested: it is the existing upper
+		 * bounds. As a result, GFP_ATOMIC will be ignored.
+		 */
+		resized = krealloc(data, size, GFP_ATOMIC);
+		if (WARN_ON(resized != data))
+			data = resized;
+	}
 
 	size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
-- 
2.34.1
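
To make the reported failure mode concrete, here is a minimal userspace sketch, not kernel code. It assumes a toy allocator with power-of-two size classes (toy_bucket_size()) and a made-up metadata size (TOY_SHINFO_SIZE). Both are hypothetical stand-ins for the real kmalloc size classes and for SKB_DATA_ALIGN(sizeof(struct skb_shared_info)). The sketch only shows how trailing metadata, when placed relative to the size-class size rather than the requested size, can land in bytes the caller never asked for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size of the trailing metadata, standing in for
 * SKB_DATA_ALIGN(sizeof(struct skb_shared_info)).
 */
#define TOY_SHINFO_SIZE 320

/* Toy stand-in for slab size classes: next power of two, minimum 8.
 * Assumption for illustration only; the real kmalloc caches differ.
 */
static size_t toy_bucket_size(size_t requested)
{
	size_t bucket = 8;

	while (bucket < requested)
		bucket <<= 1;
	return bucket;
}

/* Offset at which the metadata would start if the usable size is taken
 * from the size class (what a ksize()-like query reports) instead of
 * from the original request. The bucket must be able to hold the metadata.
 */
static size_t toy_shinfo_offset(size_t requested)
{
	size_t size = toy_bucket_size(requested);

	assert(size >= TOY_SHINFO_SIZE);
	return size - TOY_SHINFO_SIZE;
}

/* Returns 1 when the metadata ends beyond the requested size, i.e. when
 * writing it touches slack that a bounds checker keyed on the request
 * would flag as out of bounds.
 */
static int toy_metadata_in_slack(size_t requested)
{
	return toy_shinfo_offset(requested) + TOY_SHINFO_SIZE > requested;
}
```

With these toy numbers, a 700-byte request is served from a 1024-byte class, so the metadata would occupy offsets 704 through 1023, all beyond the 700 bytes requested. A request of exactly 1024 bytes has no slack, and the metadata stays in bounds.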



* Re: [PATCH] skbuff: Reallocate to ksize() in __build_skb_around()
  2022-12-06 23:17 [PATCH] skbuff: Reallocate to ksize() in __build_skb_around() Kees Cook
@ 2022-12-07  1:55 ` Jakub Kicinski
  2022-12-07  3:47   ` Kees Cook
  2022-12-07 10:30   ` Eric Dumazet
  2022-12-07  9:19 ` Vlastimil Babka
  1 sibling, 2 replies; 6+ messages in thread
From: Jakub Kicinski @ 2022-12-07  1:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, syzbot+fda18eaa8c12534ccb3b, Eric Dumazet,
	Paolo Abeni, Pavel Begunkov, pepsipu, Vlastimil Babka, kasan-dev,
	Andrii Nakryiko, ast, bpf, Daniel Borkmann, Hao Luo,
	Jesper Dangaard Brouer, John Fastabend, jolsa, KP Singh,
	martin.lau, Stanislav Fomichev, song, Yonghong Song, netdev,
	LKML, Menglong Dong, David Ahern, Martin KaFai Lau,
	Luiz Augusto von Dentz, Richard Gobert, Andrey Konovalov,
	David Rientjes, linux-hardening

On Tue,  6 Dec 2022 15:17:14 -0800 Kees Cook wrote:
> -	unsigned int size = frag_size ? : ksize(data);
> +	unsigned int size = frag_size;
> +
> +	/* When frag_size == 0, the buffer came from kmalloc, so we
> +	 * must find its true allocation size (and grow it to match).
> +	 */
> +	if (unlikely(size == 0)) {
> +		void *resized;
> +
> +		size = ksize(data);
> +		/* krealloc() will immediately return "data" when
> +		 * "ksize(data)" is requested: it is the existing upper
> +		 * bounds. As a result, GFP_ATOMIC will be ignored.
> +		 */
> +		resized = krealloc(data, size, GFP_ATOMIC);
> +		if (WARN_ON(resized != data))
> +			data = resized;
> +	}
>  

Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of
using kmalloc()'ed heads is large because GRO can't free the metadata.
So we end up carrying per-MTU skbs across to the application and then
freeing them one by one. With pages we just aggregate up to 64k of data
in a single skb.

I can only grep out 3 cases of build_skb(.. 0), could we instead
convert them into a new build_skb_slab(), and handle all the silliness
in such a new helper? That'd be a win both for the memory safety and one
fewer branch for the fast path.

I think it's worth doing, so LMK if you're okay to do this extra work,
otherwise I can help (unless e.g. Eric tells me I'm wrong..).
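
The split suggested above can be sketched in userspace as follows. This is only an illustration under stated assumptions: the toy_* names, the header-based toy slab, and the toy_skb layout are all hypothetical, and the slab helper is modelled on the build_skb_slab() name proposed in this mail, not on any existing kernel function. The point is that the page-fragment path never has to test for a magic zero, while the size lookup lives only in the slab helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy slab: a header in front of each buffer records its usable size,
 * standing in for what a ksize()-like query would report.
 */
struct toy_slab_hdr {
	size_t usable;
};

static void *toy_slab_alloc(size_t size)
{
	struct toy_slab_hdr *hdr = malloc(sizeof(*hdr) + size);

	if (!hdr)
		return NULL;
	hdr->usable = size;
	return hdr + 1;
}

static size_t toy_slab_usable(const void *data)
{
	assert(data != NULL);
	return ((const struct toy_slab_hdr *)data - 1)->usable;
}

static void toy_slab_free(void *data)
{
	if (data)
		free((struct toy_slab_hdr *)data - 1);
}

struct toy_skb {
	void *head;
	size_t size;
	int head_frag;	/* 1: page fragment, 0: slab buffer */
};

/* Fast path: page-fragment callers always know their size, so there is
 * no "0 means look it up" special case and no extra branch.
 */
static void toy_build_skb(struct toy_skb *skb, void *data, size_t frag_size)
{
	skb->head = data;
	skb->size = frag_size;
	skb->head_frag = 1;
}

/* Slow path for slab-backed buffers only. A real helper would be the
 * place for a documentation warning that page frags are preferred.
 */
static void toy_build_skb_slab(struct toy_skb *skb, void *data)
{
	skb->head = data;
	skb->size = toy_slab_usable(data);
	skb->head_frag = 0;
}
```

A caller with a slab buffer would use toy_build_skb_slab(&skb, buf), while page-fragment callers keep calling toy_build_skb() with an explicit size.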


* Re: [PATCH] skbuff: Reallocate to ksize() in __build_skb_around()
  2022-12-07  1:55 ` Jakub Kicinski
@ 2022-12-07  3:47   ` Kees Cook
  2022-12-07  4:04     ` Jakub Kicinski
  2022-12-07 10:30   ` Eric Dumazet
  1 sibling, 1 reply; 6+ messages in thread
From: Kees Cook @ 2022-12-07  3:47 UTC (permalink / raw)
  To: Jakub Kicinski, Kees Cook
  Cc: David S. Miller, syzbot+fda18eaa8c12534ccb3b, Eric Dumazet,
	Paolo Abeni, Pavel Begunkov, pepsipu, Vlastimil Babka, kasan-dev,
	Andrii Nakryiko, ast, bpf, Daniel Borkmann, Hao Luo,
	Jesper Dangaard Brouer, John Fastabend, jolsa, KP Singh,
	martin.lau, Stanislav Fomichev, song, Yonghong Song, netdev,
	LKML, Menglong Dong, David Ahern, Martin KaFai Lau,
	Luiz Augusto von Dentz, Richard Gobert, Andrey Konovalov,
	David Rientjes, linux-hardening

On December 6, 2022 5:55:57 PM PST, Jakub Kicinski <kuba@kernel.org> wrote:
>On Tue,  6 Dec 2022 15:17:14 -0800 Kees Cook wrote:
>> -	unsigned int size = frag_size ? : ksize(data);
>> +	unsigned int size = frag_size;
>> +
>> +	/* When frag_size == 0, the buffer came from kmalloc, so we
>> +	 * must find its true allocation size (and grow it to match).
>> +	 */
>> +	if (unlikely(size == 0)) {
>> +		void *resized;
>> +
>> +		size = ksize(data);
>> +		/* krealloc() will immediately return "data" when
>> +		 * "ksize(data)" is requested: it is the existing upper
>> +		 * bounds. As a result, GFP_ATOMIC will be ignored.
>> +		 */
>> +		resized = krealloc(data, size, GFP_ATOMIC);
>> +		if (WARN_ON(resized != data))
>> +			data = resized;
>> +	}
>>  
>
>Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of
>using kmalloc()'ed heads is large because GRO can't free the metadata.
>So we end up carrying per-MTU skbs across to the application and then
>freeing them one by one. With pages we just aggregate up to 64k of data
>in a single skb.

This isn't changed by this patch, though? The users of kmalloc+build_skb are pre-existing.

>I can only grep out 3 cases of build_skb(.. 0), could we instead
>convert them into a new build_skb_slab(), and handle all the silliness
>in such a new helper? That'd be a win both for the memory safety and one
>fewer branch for the fast path.

When I went through callers, it was many more than 3. Regardless, I don't see the point: my patch has no more branches than the original code (in fact, it may actually be faster because I made the initial assignment unconditional, and zero-test-after-assign is almost free, whereas before it tested before the assign). And now it's marked as unlikely to keep it out-of-line.

>I think it's worth doing, so LMK if you're okay to do this extra work,
>otherwise I can help (unless e.g. Eric tells me I'm wrong..).

I had been changing callers to round up (e.g. bnx2), but it seemed like centralizing this makes more sense. I don't think a different helper will clean this up.

-Kees


-- 
Kees Cook


* Re: [PATCH] skbuff: Reallocate to ksize() in __build_skb_around()
  2022-12-07  3:47   ` Kees Cook
@ 2022-12-07  4:04     ` Jakub Kicinski
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2022-12-07  4:04 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kees Cook, David S. Miller, syzbot+fda18eaa8c12534ccb3b,
	Eric Dumazet, Paolo Abeni, Pavel Begunkov, pepsipu,
	Vlastimil Babka, kasan-dev, Andrii Nakryiko, ast, bpf,
	Daniel Borkmann, Hao Luo, Jesper Dangaard Brouer, John Fastabend,
	jolsa, KP Singh, martin.lau, Stanislav Fomichev, song,
	Yonghong Song, netdev, LKML, Menglong Dong, David Ahern,
	Martin KaFai Lau, Luiz Augusto von Dentz, Richard Gobert,
	Andrey Konovalov, David Rientjes, linux-hardening

On Tue, 06 Dec 2022 19:47:13 -0800 Kees Cook wrote:
> >Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of
> >using kmalloc()'ed heads is large because GRO can't free the metadata.
> >So we end up carrying per-MTU skbs across to the application and then
> >freeing them one by one. With pages we just aggregate up to 64k of data
> >in a single skb.  
> 
> This isn't changed by this patch, though? The users of
> kmalloc+build_skb are pre-existing.

Yes.

> >I can only grep out 3 cases of build_skb(.. 0), could we instead
> >convert them into a new build_skb_slab(), and handle all the silliness
> >in such a new helper? That'd be a win both for the memory safety and one
> >fewer branch for the fast path.  
> 
> When I went through callers, it was many more than 3. Regardless, I
> don't see the point: my patch has no more branches than the original
> code (in fact, it may actually be faster because I made the initial
> assignment unconditional, and zero-test-after-assign is almost free,
> whereas before it tested before the assign). And now it's marked as
> unlikely to keep it out-of-line.

Maybe.

> >I think it's worth doing, so LMK if you're okay to do this extra
> >work, otherwise I can help (unless e.g. Eric tells me I'm wrong..).  
> 
> I had been changing callers to round up (e.g. bnx2), but it seemed
> like centralizing this makes more sense. I don't think a different
> helper will clean this up.

It's a combination of the fact that I think "0 is magic" falls in 
the "garbage" category of APIs, and the fact that driver developers
have many things to worry about, so they often don't know that using
slab is a bad idea. So I want a helper out of the normal path, where 
I can put a kdoc warning that says "if you're doing this - GRO will
suck, use page frags".


* Re: [PATCH] skbuff: Reallocate to ksize() in __build_skb_around()
  2022-12-06 23:17 [PATCH] skbuff: Reallocate to ksize() in __build_skb_around() Kees Cook
  2022-12-07  1:55 ` Jakub Kicinski
@ 2022-12-07  9:19 ` Vlastimil Babka
  1 sibling, 0 replies; 6+ messages in thread
From: Vlastimil Babka @ 2022-12-07  9:19 UTC (permalink / raw)
  To: Kees Cook, David S. Miller
  Cc: syzbot+fda18eaa8c12534ccb3b, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Pavel Begunkov, pepsipu, kasan-dev, Andrii Nakryiko,
	ast, bpf, Daniel Borkmann, Hao Luo, Jesper Dangaard Brouer,
	John Fastabend, jolsa, KP Singh, martin.lau, Stanislav Fomichev,
	song, Yonghong Song, netdev, LKML, Menglong Dong, David Ahern,
	Martin KaFai Lau, Luiz Augusto von Dentz, Richard Gobert,
	Andrey Konovalov, David Rientjes, linux-hardening

On 12/7/22 00:17, Kees Cook wrote:
> When build_skb() is passed a frag_size of 0, it means the buffer came
> from kmalloc. In these cases, ksize() is used to find its actual size,
> but since the allocation may not have been made to that size, actually
> perform the krealloc() call so that all the associated buffer size
> checking will be correctly notified. For example, syzkaller reported:
> 
>   BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
>   Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295
> 
> For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to
> build_skb().

Weren't all such kmalloc() users converted to kmalloc_size_roundup() to
prevent this?

> Reported-by: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
> Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
> Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function")
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Pavel Begunkov <asml.silence@gmail.com>
> Cc: pepsipu <soopthegoop@gmail.com>
> Cc: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: kasan-dev <kasan-dev@googlegroups.com>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: ast@kernel.org
> Cc: bpf <bpf@vger.kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Hao Luo <haoluo@google.com>
> Cc: Jesper Dangaard Brouer <hawk@kernel.org>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: jolsa@kernel.org
> Cc: KP Singh <kpsingh@kernel.org>
> Cc: martin.lau@linux.dev
> Cc: Stanislav Fomichev <sdf@google.com>
> Cc: song@kernel.org
> Cc: Yonghong Song <yhs@fb.com>
> Cc: netdev@vger.kernel.org
> Cc: LKML <linux-kernel@vger.kernel.org>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  net/core/skbuff.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 1d9719e72f9d..b55d061ed8b4 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -274,7 +274,23 @@ static void __build_skb_around(struct sk_buff *skb, void *data,
>  			       unsigned int frag_size)
>  {
>  	struct skb_shared_info *shinfo;
> -	unsigned int size = frag_size ? : ksize(data);
> +	unsigned int size = frag_size;
> +
> +	/* When frag_size == 0, the buffer came from kmalloc, so we
> +	 * must find its true allocation size (and grow it to match).
> +	 */
> +	if (unlikely(size == 0)) {
> +		void *resized;
> +
> +		size = ksize(data);
> +		/* krealloc() will immediately return "data" when
> +		 * "ksize(data)" is requested: it is the existing upper
> +		 * bounds. As a result, GFP_ATOMIC will be ignored.
> +		 */
> +		resized = krealloc(data, size, GFP_ATOMIC);
> +		if (WARN_ON(resized != data))

WARN_ON_ONCE() could be sufficient as either this is impossible to hit by
definition, or something went very wrong (a patch screwed ksize/krealloc?)
and it can be hit many times?

> +			data = resized;

In that "impossible" case, this could also end up as NULL due to GFP_ATOMIC
allocation failure, but maybe it's really impractical to do anything about it...

> +	}
>  
>  	size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  
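
The two review points above (warn only once, and do not let a failed reallocation turn the pointer into NULL) can be illustrated with a small userspace sketch. It is not a proposed change to the patch: the toy_* names are hypothetical, the resize function is passed in so a failure can be simulated, and the behaviour on failure follows standard C realloc() semantics, where a NULL return leaves the original buffer valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef void *(*toy_realloc_fn)(void *, size_t);

static bool toy_warned;
static int toy_warn_reports;

/* Toy analogue of a warn-once check: returns cond, but reports only
 * the first time cond is true.
 */
static bool toy_warn_on_once(bool cond)
{
	if (cond && !toy_warned) {
		toy_warned = true;
		toy_warn_reports++;
		fprintf(stderr, "toy warning: unexpected resize result\n");
	}
	return cond;
}

/* Always-failing resize function, used to simulate allocation failure. */
static void *toy_realloc_fail(void *p, size_t n)
{
	(void)p;
	(void)n;
	return NULL;
}

/* Resize "data" to "size" and return the buffer the caller should keep
 * using: the resized buffer on success, or the untouched original when
 * the resize failed. A NULL or moved result is treated as unexpected.
 */
static void *toy_resize_or_keep(void *data, size_t size, toy_realloc_fn resize)
{
	uintptr_t before = (uintptr_t)data;
	void *resized;

	assert(resize != NULL);
	resized = resize(data, size);
	toy_warn_on_once(!resized || (uintptr_t)resized != before);
	return resized ? resized : data;
}
```

Calling toy_resize_or_keep(p, 64, toy_realloc_fail) twice returns p both times and reports a single warning, which is the behaviour the two comments ask about.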



* Re: [PATCH] skbuff: Reallocate to ksize() in __build_skb_around()
  2022-12-07  1:55 ` Jakub Kicinski
  2022-12-07  3:47   ` Kees Cook
@ 2022-12-07 10:30   ` Eric Dumazet
  1 sibling, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2022-12-07 10:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Kees Cook, David S. Miller, syzbot+fda18eaa8c12534ccb3b,
	Paolo Abeni, Pavel Begunkov, pepsipu, Vlastimil Babka, kasan-dev,
	Andrii Nakryiko, ast, bpf, Daniel Borkmann, Hao Luo,
	Jesper Dangaard Brouer, John Fastabend, jolsa, KP Singh,
	martin.lau, Stanislav Fomichev, song, Yonghong Song, netdev,
	LKML, Menglong Dong, David Ahern, Martin KaFai Lau,
	Luiz Augusto von Dentz, Richard Gobert, Andrey Konovalov,
	David Rientjes, linux-hardening

On Wed, Dec 7, 2022 at 2:56 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue,  6 Dec 2022 15:17:14 -0800 Kees Cook wrote:
> > -     unsigned int size = frag_size ? : ksize(data);
> > +     unsigned int size = frag_size;
> > +
> > +     /* When frag_size == 0, the buffer came from kmalloc, so we
> > +      * must find its true allocation size (and grow it to match).
> > +      */
> > +     if (unlikely(size == 0)) {
> > +             void *resized;
> > +
> > +             size = ksize(data);
> > +             /* krealloc() will immediately return "data" when
> > +              * "ksize(data)" is requested: it is the existing upper
> > +              * bounds. As a result, GFP_ATOMIC will be ignored.
> > +              */
> > +             resized = krealloc(data, size, GFP_ATOMIC);
> > +             if (WARN_ON(resized != data))
> > +                     data = resized;
> > +     }
> >
>
> Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of
> using kmalloc()'ed heads is large because GRO can't free the metadata.
> So we end up carrying per-MTU skbs across to the application and then
> freeing them one by one. With pages we just aggregate up to 64k of data
> in a single skb.
>
> I can only grep out 3 cases of build_skb(.. 0), could we instead
> convert them into a new build_skb_slab(), and handle all the silliness
> in such a new helper? That'd be a win both for the memory safety and one
> fewer branch for the fast path.
>
> I think it's worth doing, so LMK if you're okay to do this extra work,
> otherwise I can help (unless e.g. Eric tells me I'm wrong..).

I totally agree, I would indeed remove ksize() use completely,
let callers give us the size, and the head_frag boolean,
instead of inferring from size==0
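
As a rough userspace illustration of that interface shape (an assumption-laden sketch, not the kernel API): the caller passes both the buffer size and a head_frag flag, so nothing is inferred from size == 0 and no ksize()-style lookup is needed. The toy_* names, the struct layout, and TOY_SHINFO_SIZE are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size of the trailing metadata, standing in for
 * SKB_DATA_ALIGN(sizeof(struct skb_shared_info)).
 */
#define TOY_SHINFO_SIZE 320

struct toy_skb {
	unsigned char *head;
	size_t end;		/* offset at which the trailing metadata begins */
	bool head_frag;		/* true: page fragment, false: slab buffer */
};

/* Build around a caller-described buffer. The size and the buffer kind
 * are explicit arguments; a size too small to hold the metadata is
 * rejected instead of being treated as a request to look the size up.
 * Returns 0 on success, -1 on invalid arguments.
 */
static int toy_build_skb_around(struct toy_skb *skb, void *data,
				size_t size, bool head_frag)
{
	assert(skb != NULL);
	if (!data || size <= TOY_SHINFO_SIZE)
		return -1;

	skb->head = data;
	skb->end = size - TOY_SHINFO_SIZE;
	skb->head_frag = head_frag;
	return 0;
}
```

A slab caller would pass the size it actually requested together with head_frag == false, which keeps the metadata inside the bytes the allocator agreed to hand out.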
