All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC net v7] net: skb_expand_head() adjust skb->truesize incorrectly
@ 2021-09-17 16:24 Jakub Kicinski
  2021-09-17 18:15 ` Vasily Averin
  2021-09-18 10:05 ` Vasily Averin
  0 siblings, 2 replies; 19+ messages in thread
From: Jakub Kicinski @ 2021-09-17 16:24 UTC (permalink / raw)
  To: eric.dumazet
  Cc: netdev, Vasily Averin, Christoph Paasch, Hao Sun, Jakub Kicinski

From: Vasily Averin <vvs@virtuozzo.com>

Christoph Paasch reports [1] about incorrect skb->truesize
after skb_expand_head() call in ip6_xmit.
This may happen because of two reasons:
 - skb_set_owner_w() for newly cloned skb is called too early,
   before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
 - pskb_expand_head() does not adjust truesize in (skb->sk) case.
   In this case sk->sk_wmem_alloc should be adjusted too.

Eric cautions us against increasing sk_wmem_alloc if the old
skb did not hold any wmem references.

[1] https://lkml.org/lkml/2021/8/20/1082

Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v7: - shift more magic into helpers
    - follow Eric's advice and don't inherit non-wmem sks for now

Looks like we stalled here, let me try to push this forward.
This builds, is it possible to repro without syzcaller?
Anyone willing to test?
---
 include/net/sock.h |  2 ++
 net/core/skbuff.c  | 50 +++++++++++++++++++++++++++++++++++-----------
 net/core/sock.c    | 10 ++++++++++
 3 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 66a9a90f9558..102e3e1009d1 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1707,6 +1707,8 @@ void sock_pfree(struct sk_buff *skb);
 #define sock_edemux sock_efree
 #endif
 
+bool is_skb_wmem(const struct sk_buff *skb);
+
 int sock_setsockopt(struct socket *sock, int level, int op,
 		    sockptr_t optval, unsigned int optlen);
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7c2ab27fcbf9..5093321c2b65 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1786,6 +1786,24 @@ struct sk_buff *skb_realloc_headroom(struct sk_buff *skb, unsigned int headroom)
 }
 EXPORT_SYMBOL(skb_realloc_headroom);
 
+static void skb_owner_inherit(struct sk_buff *nskb, struct sk_buff *oskb)
+{
+	if (is_skb_wmem(oskb))
+		skb_set_owner_w(nskb, oskb->sk);
+
+	/* handle rmem sock etc. as needed .. */
+}
+
+static void skb_increase_truesize(struct sk_buff *skb, unsigned int add)
+{
+	if (is_skb_wmem(skb))
+		refcount_add(add, &skb->sk->sk_wmem_alloc);
+	/* handle rmem sock etc. as needed .. */
+	WARN_ON(skb->destructor == sock_rfree);
+
+	skb->truesize += add;
+}
+
 /**
  *	skb_expand_head - reallocate header of &sk_buff
  *	@skb: buffer to reallocate
@@ -1801,6 +1819,7 @@ EXPORT_SYMBOL(skb_realloc_headroom);
 struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
 {
 	int delta = headroom - skb_headroom(skb);
+	int osize = skb_end_offset(skb);
 
 	if (WARN_ONCE(delta <= 0,
 		      "%s is expecting an increase in the headroom", __func__))
@@ -1810,21 +1829,28 @@ struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
 	if (skb_shared(skb)) {
 		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
 
-		if (likely(nskb)) {
-			if (skb->sk)
-				skb_set_owner_w(nskb, skb->sk);
-			consume_skb(skb);
-		} else {
-			kfree_skb(skb);
-		}
+		if (unlikely(!nskb))
+			goto err_free;
+
+		skb_owner_inherit(nskb, skb);
+		consume_skb(skb);
 		skb = nskb;
 	}
-	if (skb &&
-	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
-		kfree_skb(skb);
-		skb = NULL;
-	}
+
+	if (pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC))
+		goto err_free;
+	delta = skb_end_offset(skb) - osize;
+
+	/* pskb_expand_head() will adjust truesize itself for non-sk cases
+	 * todo: move the adjustment there at some point?
+	 */
+	if (skb->sk && skb->destructor != sock_edemux)
+		skb_increase_truesize(skb, delta);
+
 	return skb;
+err_free:
+	kfree_skb(skb);
+	return NULL;
 }
 EXPORT_SYMBOL(skb_expand_head);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 62627e868e03..1483b4f755ef 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2227,6 +2227,16 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
 }
 EXPORT_SYMBOL(skb_set_owner_w);
 
+/* Should clones of this skb count towards skb->sk->sk_wmem_alloc
+ * and use sock_wfree() as their destructor?
+ */
+bool is_skb_wmem(const struct sk_buff *skb)
+{
+	return skb->destructor == sock_wfree ||
+		skb->destructor == __sock_wfree ||
+		(IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
+}
+
 static bool can_skb_orphan_partial(const struct sk_buff *skb)
 {
 #ifdef CONFIG_TLS_DEVICE
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC net v7] net: skb_expand_head() adjust skb->truesize incorrectly
  2021-09-17 16:24 [RFC net v7] net: skb_expand_head() adjust skb->truesize incorrectly Jakub Kicinski
@ 2021-09-17 18:15 ` Vasily Averin
  2021-09-18 10:05 ` Vasily Averin
  1 sibling, 0 replies; 19+ messages in thread
From: Vasily Averin @ 2021-09-17 18:15 UTC (permalink / raw)
  To: Jakub Kicinski, eric.dumazet; +Cc: netdev, Christoph Paasch, Hao Sun

On 9/17/21 7:24 PM, Jakub Kicinski wrote:
> From: Vasily Averin <vvs@virtuozzo.com>
> ---
> v7: - shift more magic into helpers
>     - follow Eric's advice and don't inherit non-wmem sks for now
> 
> Looks like we stalled here, let me try to push this forward.
> This builds, is it possible to repro without syzcaller?
> Anyone willing to test?

I'm going to review and test this on this weekend.

Thank you,
	Vasily Averin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC net v7] net: skb_expand_head() adjust skb->truesize incorrectly
  2021-09-17 16:24 [RFC net v7] net: skb_expand_head() adjust skb->truesize incorrectly Jakub Kicinski
  2021-09-17 18:15 ` Vasily Averin
@ 2021-09-18 10:05 ` Vasily Averin
  2021-09-20 18:12   ` Jakub Kicinski
  1 sibling, 1 reply; 19+ messages in thread
From: Vasily Averin @ 2021-09-18 10:05 UTC (permalink / raw)
  To: Jakub Kicinski, eric.dumazet; +Cc: netdev, Christoph Paasch, Hao Sun

On 9/17/21 7:24 PM, Jakub Kicinski wrote:
> From: Vasily Averin <vvs@virtuozzo.com>
> 
> Christoph Paasch reports [1] about incorrect skb->truesize
> after skb_expand_head() call in ip6_xmit.
> This may happen because of two reasons:
>  - skb_set_owner_w() for newly cloned skb is called too early,
>    before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
>  - pskb_expand_head() does not adjust truesize in (skb->sk) case.
>    In this case sk->sk_wmem_alloc should be adjusted too.
> 
> Eric cautions us against increasing sk_wmem_alloc if the old
> skb did not hold any wmem references.
> 
> [1] https://lkml.org/lkml/2021/8/20/1082
> 
> Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
> Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
> Reported-by: Hao Sun <sunhao.th@gmail.com>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v7: - shift more magic into helpers
>     - follow Eric's advice and don't inherit non-wmem sks for now
> 
> Looks like we stalled here, let me try to push this forward.
> This builds, is it possible to repro without syzcaller?
> Anyone willing to test?
> ---
>  include/net/sock.h |  2 ++
>  net/core/skbuff.c  | 50 +++++++++++++++++++++++++++++++++++-----------
>  net/core/sock.c    | 10 ++++++++++
>  3 files changed, 50 insertions(+), 12 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 66a9a90f9558..102e3e1009d1 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1707,6 +1707,8 @@ void sock_pfree(struct sk_buff *skb);
>  #define sock_edemux sock_efree
>  #endif
>  
> +bool is_skb_wmem(const struct sk_buff *skb);
> +
>  int sock_setsockopt(struct socket *sock, int level, int op,
>  		    sockptr_t optval, unsigned int optlen);
>  
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7c2ab27fcbf9..5093321c2b65 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1786,6 +1786,24 @@ struct sk_buff *skb_realloc_headroom(struct sk_buff *skb, unsigned int headroom)
>  }
>  EXPORT_SYMBOL(skb_realloc_headroom);
>  
> +static void skb_owner_inherit(struct sk_buff *nskb, struct sk_buff *oskb)
> +{
> +	if (is_skb_wmem(oskb))
> +		skb_set_owner_w(nskb, oskb->sk);
> +
> +	/* handle rmem sock etc. as needed .. */
> +}
> +
> +static void skb_increase_truesize(struct sk_buff *skb, unsigned int add)
> +{
> +	if (is_skb_wmem(skb))
> +		refcount_add(add, &skb->sk->sk_wmem_alloc);
> +	/* handle rmem sock etc. as needed .. */
> +	WARN_ON(skb->destructor == sock_rfree);
> +
> +	skb->truesize += add;
> +}
> +
>  /**
>   *	skb_expand_head - reallocate header of &sk_buff
>   *	@skb: buffer to reallocate
> @@ -1801,6 +1819,7 @@ EXPORT_SYMBOL(skb_realloc_headroom);
>  struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
>  {
>  	int delta = headroom - skb_headroom(skb);
> +	int osize = skb_end_offset(skb);
>  
>  	if (WARN_ONCE(delta <= 0,
>  		      "%s is expecting an increase in the headroom", __func__))
> @@ -1810,21 +1829,28 @@ struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
>  	if (skb_shared(skb)) {
>  		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>  
> -		if (likely(nskb)) {
> -			if (skb->sk)
> -				skb_set_owner_w(nskb, skb->sk);
> -			consume_skb(skb);
> -		} else {
> -			kfree_skb(skb);
> -		}
> +		if (unlikely(!nskb))
> +			goto err_free;
> +
> +		skb_owner_inherit(nskb, skb);
> +		consume_skb(skb);
>  		skb = nskb;
>  	}
> -	if (skb &&
> -	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
> -		kfree_skb(skb);
> -		skb = NULL;
> -	}
> +
> +	if (pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC))
> +		goto err_free;
> +	delta = skb_end_offset(skb) - osize;
> +
> +	/* pskb_expand_head() will adjust truesize itself for non-sk cases
> +	 * todo: move the adjustment there at some point?
> +	 */
> +	if (skb->sk && skb->destructor != sock_edemux)
> +		skb_increase_truesize(skb, delta);

I think it is wrong.
1) there are a few skb destructors called sock_wfree inside. I've found: 
   tpacket_destruct_skb, sctp_wfree, unix_destruct_scm and xsk_destruct_skb.
   If any such skb can be use here it will not adjust sk_wmem_alloc.   I afraid there might be other similar destructors, out of tree,
   so we cannot have full white list for wfree-compatible destructors.

2) in fact you increase truesize here for all skb types.
   If it is acceptable it could be done directly inside pskb_expand_head().
   However it isn't.  As you pointed sock_rfree case is handled incorrectly. 
   I've found other similar destructors: sock_rmem_free, netlink_skb_destructor,
   kcm_rfree, sock_ofree. They will be handled incorrectly too, but even without WARN_ON.
   Few other descriptors seems should not fail but do not require truesize update.

From my POV v6 patch version works correctly in any cases. If necessary it calls
original destructor, correctly set up new one and correctly adjust truesize
and sk_wmem_alloc.
If you still have doubts, we can just go back and clone non-wmem skb, 
like we did before.

Thank you,
	Vasily Averin

>  	return skb;
> +err_free:
> +	kfree_skb(skb);
> +	return NULL;
>  }
>  EXPORT_SYMBOL(skb_expand_head);
>  
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 62627e868e03..1483b4f755ef 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2227,6 +2227,16 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
>  }
>  EXPORT_SYMBOL(skb_set_owner_w);
>  
> +/* Should clones of this skb count towards skb->sk->sk_wmem_alloc
> + * and use sock_wfree() as their destructor?
> + */
> +bool is_skb_wmem(const struct sk_buff *skb)
> +{
> +	return skb->destructor == sock_wfree ||
> +		skb->destructor == __sock_wfree ||
> +		(IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
> +}
> +
>  static bool can_skb_orphan_partial(const struct sk_buff *skb)
>  {
>  #ifdef CONFIG_TLS_DEVICE
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC net v7] net: skb_expand_head() adjust skb->truesize incorrectly
  2021-09-18 10:05 ` Vasily Averin
@ 2021-09-20 18:12   ` Jakub Kicinski
  2021-09-20 21:41     ` Vasily Averin
  2021-09-20 21:41     ` [PATCH net v8] " Vasily Averin
  0 siblings, 2 replies; 19+ messages in thread
From: Jakub Kicinski @ 2021-09-20 18:12 UTC (permalink / raw)
  To: Vasily Averin; +Cc: eric.dumazet, netdev, Christoph Paasch, Hao Sun

On Sat, 18 Sep 2021 13:05:28 +0300 Vasily Averin wrote:
> On 9/17/21 7:24 PM, Jakub Kicinski wrote:
> > From: Vasily Averin <vvs@virtuozzo.com>
> > 
> > Christoph Paasch reports [1] about incorrect skb->truesize
> > after skb_expand_head() call in ip6_xmit.
> > This may happen because of two reasons:
> >  - skb_set_owner_w() for newly cloned skb is called too early,
> >    before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
> >  - pskb_expand_head() does not adjust truesize in (skb->sk) case.
> >    In this case sk->sk_wmem_alloc should be adjusted too.
> > 
> > Eric cautions us against increasing sk_wmem_alloc if the old
> > skb did not hold any wmem references.

> > @@ -1810,21 +1829,28 @@ struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
> >  	if (skb_shared(skb)) {
> >  		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
> >  
> > -		if (likely(nskb)) {
> > -			if (skb->sk)
> > -				skb_set_owner_w(nskb, skb->sk);
> > -			consume_skb(skb);
> > -		} else {
> > -			kfree_skb(skb);
> > -		}
> > +		if (unlikely(!nskb))
> > +			goto err_free;
> > +
> > +		skb_owner_inherit(nskb, skb);
> > +		consume_skb(skb);
> >  		skb = nskb;
> >  	}
> > -	if (skb &&
> > -	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
> > -		kfree_skb(skb);
> > -		skb = NULL;
> > -	}
> > +
> > +	if (pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC))
> > +		goto err_free;
> > +	delta = skb_end_offset(skb) - osize;
> > +
> > +	/* pskb_expand_head() will adjust truesize itself for non-sk cases
> > +	 * todo: move the adjustment there at some point?
> > +	 */
> > +	if (skb->sk && skb->destructor != sock_edemux)
> > +		skb_increase_truesize(skb, delta);  
> 
> I think it is wrong.
> 1) there are a few skb destructors called sock_wfree inside. I've found: 
>    tpacket_destruct_skb, sctp_wfree, unix_destruct_scm and xsk_destruct_skb.
>    If any such skb can be use here it will not adjust sk_wmem_alloc.   I afraid there might be other similar destructors, out of tree,
>    so we cannot have full white list for wfree-compatible destructors.
> 
> 2) in fact you increase truesize here for all skb types.
>    If it is acceptable it could be done directly inside pskb_expand_head().
>    However it isn't.  As you pointed sock_rfree case is handled incorrectly. 
>    I've found other similar destructors: sock_rmem_free, netlink_skb_destructor,
>    kcm_rfree, sock_ofree. They will be handled incorrectly too, but even without WARN_ON.
>    Few other descriptors seems should not fail but do not require truesize update.
> 
> From my POV v6 patch version works correctly in any cases. If necessary it calls
> original destructor, correctly set up new one and correctly adjust truesize
> and sk_wmem_alloc.
> If you still have doubts, we can just go back and clone non-wmem skb, 
> like we did before.

Thanks for taking a look. I would prefer not to bake any ideas about
the skb's function into generic functions. Enumerating every destructor
callback in generic code is impossible (technically so, since the code
may reside in modules).

Let me think about it. Perhaps we can extend sock callbacks with
skb_sock_inherit, and skb_adjust_trusize? That'd transfer the onus of
handling the adjustments done on splitting to the protocols. I'll see
if that's feasible unless someone can immediately call this path
ghastly.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC net v7] net: skb_expand_head() adjust skb->truesize incorrectly
  2021-09-20 18:12   ` Jakub Kicinski
@ 2021-09-20 21:41     ` Vasily Averin
  2021-09-21  0:39       ` Jakub Kicinski
  2021-09-20 21:41     ` [PATCH net v8] " Vasily Averin
  1 sibling, 1 reply; 19+ messages in thread
From: Vasily Averin @ 2021-09-20 21:41 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: eric.dumazet, netdev, Christoph Paasch, Hao Sun

On 9/20/21 9:12 PM, Jakub Kicinski wrote:
> On Sat, 18 Sep 2021 13:05:28 +0300 Vasily Averin wrote:
>> On 9/17/21 7:24 PM, Jakub Kicinski wrote:
>>> From: Vasily Averin <vvs@virtuozzo.com>
>>>
>>> Christoph Paasch reports [1] about incorrect skb->truesize
>>> after skb_expand_head() call in ip6_xmit.
>>> This may happen because of two reasons:
>>>  - skb_set_owner_w() for newly cloned skb is called too early,
>>>    before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
>>>  - pskb_expand_head() does not adjust truesize in (skb->sk) case.
>>>    In this case sk->sk_wmem_alloc should be adjusted too.
>>>
>>> Eric cautions us against increasing sk_wmem_alloc if the old
>>> skb did not hold any wmem references.
> 
>>> @@ -1810,21 +1829,28 @@ struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
>>>  	if (skb_shared(skb)) {
>>>  		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>>>  
>>> -		if (likely(nskb)) {
>>> -			if (skb->sk)
>>> -				skb_set_owner_w(nskb, skb->sk);
>>> -			consume_skb(skb);
>>> -		} else {
>>> -			kfree_skb(skb);
>>> -		}
>>> +		if (unlikely(!nskb))
>>> +			goto err_free;
>>> +
>>> +		skb_owner_inherit(nskb, skb);
>>> +		consume_skb(skb);
>>>  		skb = nskb;
>>>  	}
>>> -	if (skb &&
>>> -	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
>>> -		kfree_skb(skb);
>>> -		skb = NULL;
>>> -	}
>>> +
>>> +	if (pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC))
>>> +		goto err_free;
>>> +	delta = skb_end_offset(skb) - osize;
>>> +
>>> +	/* pskb_expand_head() will adjust truesize itself for non-sk cases
>>> +	 * todo: move the adjustment there at some point?
>>> +	 */
>>> +	if (skb->sk && skb->destructor != sock_edemux)
>>> +		skb_increase_truesize(skb, delta);  
>>
>> I think it is wrong.
>> 1) there are a few skb destructors called sock_wfree inside. I've found: 
>>    tpacket_destruct_skb, sctp_wfree, unix_destruct_scm and xsk_destruct_skb.
>>    If any such skb can be use here it will not adjust sk_wmem_alloc.   I afraid there might be other similar destructors, out of tree,
>>    so we cannot have full white list for wfree-compatible destructors.
>>
>> 2) in fact you increase truesize here for all skb types.
>>    If it is acceptable it could be done directly inside pskb_expand_head().
>>    However it isn't.  As you pointed sock_rfree case is handled incorrectly. 
>>    I've found other similar destructors: sock_rmem_free, netlink_skb_destructor,
>>    kcm_rfree, sock_ofree. They will be handled incorrectly too, but even without WARN_ON.
>>    Few other descriptors seems should not fail but do not require truesize update.
>>
>> From my POV v6 patch version works correctly in any cases. If necessary it calls
>> original destructor, correctly set up new one and correctly adjust truesize
>> and sk_wmem_alloc.
>> If you still have doubts, we can just go back and clone non-wmem skb, 
>> like we did before.
> 
> Thanks for taking a look. I would prefer not to bake any ideas about
> the skb's function into generic functions. Enumerating every destructor
> callback in generic code is impossible (technically so, since the code
> may reside in modules).
> 
> Let me think about it. Perhaps we can extend sock callbacks with
> skb_sock_inherit, and skb_adjust_trusize? That'd transfer the onus of
> handling the adjustments done on splitting to the protocols. I'll see
> if that's feasible unless someone can immediately call this path
> ghastly.

This is similar to Alexey Kuznetsov's suggestion for me, 
see https://lkml.org/lkml/2021/8/27/460

However I think we can do it later,
right now we need to fix somehow broken skb_expand_head(),
please take look at v8.

Thank you,
	Vasily Averin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net v8] skb_expand_head() adjust skb->truesize incorrectly
  2021-09-20 18:12   ` Jakub Kicinski
  2021-09-20 21:41     ` Vasily Averin
@ 2021-09-20 21:41     ` Vasily Averin
  2021-09-21  5:21       ` Vasily Averin
  1 sibling, 1 reply; 19+ messages in thread
From: Vasily Averin @ 2021-09-20 21:41 UTC (permalink / raw)
  To: Christoph Paasch, Jakub Kicinski, Eric Dumazet
  Cc: David S. Miller, Hideaki YOSHIFUJI, David Ahern, netdev,
	linux-kernel, kernel, Julian Wiedmann

Christoph Paasch reports [1] about incorrect skb->truesize
after skb_expand_head() call in ip6_xmit.
This may happen because of two reasons:
- skb_set_owner_w() for newly cloned skb is called too early,
before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
- pskb_expand_head() does not adjust truesize in (skb->sk) case.
In this case sk->sk_wmem_alloc should be adjusted too.

[1] https://lkml.org/lkml/2021/8/20/1082

Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
v8: clone non-wmem skb
V7 (from kuba@):
    shift more magic into helpers,
    follow Eric's advice and don't inherit non-wmem skbs for now
v6: fixed delta,
    improved comments
v5: fixed else condition, thanks to Eric
    reworked update of expanded skb,
    added corresponding comments
v4: decided to use is_skb_wmem() after pskb_expand_head() call
    fixed 'return (EXPRESSION);' in os_skb_wmem according to Eric Dumazet
v3: removed __pskb_expand_head(),
    added is_skb_wmem() helper for skb with wmem-compatible destructors
    there are 2 ways to use it:
     - before pskb_expand_head(), to create skb clones
     - after successfull pskb_expand_head() to change owner on extended skb.
v2: based on patch version from Eric Dumazet,
    added __pskb_expand_head() function, which can be forced
    to adjust skb->truesize and sk->sk_wmem_alloc.
---
 include/net/sock.h |  1 +
 net/core/skbuff.c  | 33 +++++++++++++++++++++------------
 net/core/sock.c    |  8 ++++++++
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 95b2577..173d58c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1695,6 +1695,7 @@ struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
 			     gfp_t priority);
 void __sock_wfree(struct sk_buff *skb);
 void sock_wfree(struct sk_buff *skb);
+bool is_skb_wmem(const struct sk_buff *skb);
 struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,
 			     gfp_t priority);
 void skb_orphan_partial(struct sk_buff *skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f931176..4b49f63 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1804,30 +1804,39 @@ struct sk_buff *skb_realloc_headroom(struct sk_buff *skb, unsigned int headroom)
 struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
 {
 	int delta = headroom - skb_headroom(skb);
+	int osize = skb_end_offset(skb);
+	struct sock *sk = skb->sk;
 
 	if (WARN_ONCE(delta <= 0,
 		      "%s is expecting an increase in the headroom", __func__))
 		return skb;
 
+	delta = SKB_DATA_ALIGN(delta);
 	/* pskb_expand_head() might crash, if skb is shared */
-	if (skb_shared(skb)) {
+	if (skb_shared(skb) || !is_skb_wmem(skb)) {
 		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
 
-		if (likely(nskb)) {
-			if (skb->sk)
-				skb_set_owner_w(nskb, skb->sk);
-			consume_skb(skb);
-		} else {
-			kfree_skb(skb);
-		}
+		if (unlikely(!nskb))
+			goto fail;
+
+		if (sk)
+			skb_set_owner_w(nskb, sk);
+		consume_skb(skb);
 		skb = nskb;
 	}
-	if (skb &&
-	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
-		kfree_skb(skb);
-		skb = NULL;
+	if (pskb_expand_head(skb, delta, 0, GFP_ATOMIC))
+		goto fail;
+
+	if (sk) {
+		delta = skb_end_offset(skb) - osize;
+		refcount_add(delta, &sk->sk_wmem_alloc);
+		skb->truesize += delta;
 	}
 	return skb;
+
+fail:
+	kfree_skb(skb);
+	return NULL;
 }
 EXPORT_SYMBOL(skb_expand_head);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 950f1e7..6cbda43 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2227,6 +2227,14 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
 }
 EXPORT_SYMBOL(skb_set_owner_w);
 
+bool is_skb_wmem(const struct sk_buff *skb)
+{
+	return skb->destructor == sock_wfree ||
+	       skb->destructor == __sock_wfree ||
+	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
+}
+EXPORT_SYMBOL(is_skb_wmem);
+
 static bool can_skb_orphan_partial(const struct sk_buff *skb)
 {
 #ifdef CONFIG_TLS_DEVICE
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC net v7] net: skb_expand_head() adjust skb->truesize incorrectly
  2021-09-20 21:41     ` Vasily Averin
@ 2021-09-21  0:39       ` Jakub Kicinski
  2021-09-21  6:36         ` Vasily Averin
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2021-09-21  0:39 UTC (permalink / raw)
  To: Vasily Averin; +Cc: eric.dumazet, netdev, Christoph Paasch, Hao Sun

On Tue, 21 Sep 2021 00:41:15 +0300 Vasily Averin wrote:
> > Thanks for taking a look. I would prefer not to bake any ideas about
> > the skb's function into generic functions. Enumerating every destructor
> > callback in generic code is impossible (technically so, since the code
> > may reside in modules).
> > 
> > Let me think about it. Perhaps we can extend sock callbacks with
> > skb_sock_inherit, and skb_adjust_trusize? That'd transfer the onus of
> > handling the adjustments done on splitting to the protocols. I'll see
> > if that's feasible unless someone can immediately call this path
> > ghastly.  
> 
> This is similar to Alexey Kuznetsov's suggestion for me, 
> see https://lkml.org/lkml/2021/8/27/460

Interesting, I wasn't thinking of keeping the ops pointer in every skb.

> However I think we can do it later,
> right now we need to fix somehow broken skb_expand_head(),
> please take look at v8.

I think v8 still has the issue that Eric was explaining over and over.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net v8] skb_expand_head() adjust skb->truesize incorrectly
  2021-09-20 21:41     ` [PATCH net v8] " Vasily Averin
@ 2021-09-21  5:21       ` Vasily Averin
  2021-10-04 13:00         ` [PATCH net v9] " Vasily Averin
  0 siblings, 1 reply; 19+ messages in thread
From: Vasily Averin @ 2021-09-21  5:21 UTC (permalink / raw)
  To: Christoph Paasch, Jakub Kicinski, Eric Dumazet
  Cc: David S. Miller, Hideaki YOSHIFUJI, David Ahern, netdev,
	linux-kernel, kernel, Julian Wiedmann

On 9/21/21 12:41 AM, Vasily Averin wrote:
> Christoph Paasch reports [1] about incorrect skb->truesize
> after skb_expand_head() call in ip6_xmit.
> This may happen because of two reasons:
> - skb_set_owner_w() for newly cloned skb is called too early,
> before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
> - pskb_expand_head() does not adjust truesize in (skb->sk) case.
> In this case sk->sk_wmem_alloc should be adjusted too.
> 
> [1] https://lkml.org/lkml/2021/8/20/1082
> 
> Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
> Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
> Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---
> v8: clone non-wmem skb
> V7 (from kuba@):
>     shift more magic into helpers,
>     follow Eric's advice and don't inherit non-wmem skbs for now
> v6: fixed delta,
>     improved comments
> v5: fixed else condition, thanks to Eric
>     reworked update of expanded skb,
>     added corresponding comments
> v4: decided to use is_skb_wmem() after pskb_expand_head() call
>     fixed 'return (EXPRESSION);' in os_skb_wmem according to Eric Dumazet
> v3: removed __pskb_expand_head(),
>     added is_skb_wmem() helper for skb with wmem-compatible destructors
>     there are 2 ways to use it:
>      - before pskb_expand_head(), to create skb clones
>      - after successfull pskb_expand_head() to change owner on extended skb.
> v2: based on patch version from Eric Dumazet,
>     added __pskb_expand_head() function, which can be forced
>     to adjust skb->truesize and sk->sk_wmem_alloc.
> ---
>  include/net/sock.h |  1 +
>  net/core/skbuff.c  | 33 +++++++++++++++++++++------------
>  net/core/sock.c    |  8 ++++++++
>  3 files changed, 30 insertions(+), 12 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 95b2577..173d58c 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1695,6 +1695,7 @@ struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
>  			     gfp_t priority);
>  void __sock_wfree(struct sk_buff *skb);
>  void sock_wfree(struct sk_buff *skb);
> +bool is_skb_wmem(const struct sk_buff *skb);
>  struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,
>  			     gfp_t priority);
>  void skb_orphan_partial(struct sk_buff *skb);
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f931176..4b49f63 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1804,30 +1804,39 @@ struct sk_buff *skb_realloc_headroom(struct sk_buff *skb, unsigned int headroom)
>  struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
>  {
>  	int delta = headroom - skb_headroom(skb);
> +	int osize = skb_end_offset(skb);
> +	struct sock *sk = skb->sk;
>  
>  	if (WARN_ONCE(delta <= 0,
>  		      "%s is expecting an increase in the headroom", __func__))
>  		return skb;
>  
> +	delta = SKB_DATA_ALIGN(delta);
>  	/* pskb_expand_head() might crash, if skb is shared */
> -	if (skb_shared(skb)) {
> +	if (skb_shared(skb) || !is_skb_wmem(skb)) {
>  		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>  
> -		if (likely(nskb)) {
> -			if (skb->sk)
> -				skb_set_owner_w(nskb, skb->sk);
> -			consume_skb(skb);
> -		} else {
> -			kfree_skb(skb);
> -		}
> +		if (unlikely(!nskb))
> +			goto fail;
> +
> +		if (sk)
> +			skb_set_owner_w(nskb, sk);
> +		consume_skb(skb);
>  		skb = nskb;
>  	}
> -	if (skb &&
> -	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
> -		kfree_skb(skb);
> -		skb = NULL;
> +	if (pskb_expand_head(skb, delta, 0, GFP_ATOMIC))
> +		goto fail;
> +
> +	if (sk) {
sock_edemux check is still required here too.
> +		delta = skb_end_offset(skb) - osize;
> +		refcount_add(delta, &sk->sk_wmem_alloc);
> +		skb->truesize += delta;
>  	}
>  	return skb;
> +
> +fail:
> +	kfree_skb(skb);
> +	return NULL;
>  }
>  EXPORT_SYMBOL(skb_expand_head);
>  
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 950f1e7..6cbda43 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2227,6 +2227,14 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
>  }
>  EXPORT_SYMBOL(skb_set_owner_w);
>  
> +bool is_skb_wmem(const struct sk_buff *skb)
> +{
> +	return skb->destructor == sock_wfree ||
> +	       skb->destructor == __sock_wfree ||
> +	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
> +}
> +EXPORT_SYMBOL(is_skb_wmem);
> +
>  static bool can_skb_orphan_partial(const struct sk_buff *skb)
>  {
>  #ifdef CONFIG_TLS_DEVICE
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC net v7] net: skb_expand_head() adjust skb->truesize incorrectly
  2021-09-21  0:39       ` Jakub Kicinski
@ 2021-09-21  6:36         ` Vasily Averin
  2021-09-21 21:25           ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Vasily Averin @ 2021-09-21  6:36 UTC (permalink / raw)
  To: Jakub Kicinski, Eric Dumazet; +Cc: netdev, Christoph Paasch, Hao Sun, kernel

On 9/21/21 3:39 AM, Jakub Kicinski wrote:
> On Tue, 21 Sep 2021 00:41:15 +0300 Vasily Averin wrote:
>>> Thanks for taking a look. I would prefer not to bake any ideas about
>>> the skb's function into generic functions. Enumerating every destructor
>>> callback in generic code is impossible (technically so, since the code
>>> may reside in modules).
>>>
>>> Let me think about it. Perhaps we can extend sock callbacks with
>>> skb_sock_inherit, and skb_adjust_trusize? That'd transfer the onus of
>>> handling the adjustments done on splitting to the protocols. I'll see
>>> if that's feasible unless someone can immediately call this path
>>> ghastly.  
>>
>> This is similar to Alexey Kuznetsov's suggestion for me, 
>> see https://lkml.org/lkml/2021/8/27/460
> 
> Interesting, I wasn't thinking of keeping the ops pointer in every skb.
> 
>> However I think we can do it later,
>> right now we need to fix somehow broken skb_expand_head(),
>> please take look at v8.
> 
> I think v8 still has the issue that Eric was explaining over and over.

I've missed sock_edemux check, however I do not see any other issues.
Could you please explain what problem you talking about?

Eric said:
"it is not valid to call skb_set_owner_w(skb, sk) on all kind of sockets",
because socket might have been closed already.

Before the call we have old skb with sk reference, so sk is not closed yet
and have nonzero sk->sk_wmem_alloc.

During the call, skb_set_owner_w calls skb_orphan that calls old skb destructor.
Yes, it can decrement last sk reference and release the socket, 
and I think this is exactly the problem that Eric was pointing out: 
now sk access is unsafe.

However it can be prevented in at least 2 ways:
a) clone old skb and call skb_set_owner_w(nskb, sk) before skb_consume(oskb).
   In this case, skb_orphan does not call old destructor, because at this point
   nskb->sk = NULL and nskb->destructor = NULL, and sk reference is kept by oskb.  
   This is widely used in current code (ppp_xmit, ipip6_tunnel_xmit, 
   ip_vs_prepare_tunneled_skb and so on).
   This is used in v8 too.
b) Alternatively, extra refs on sk->sk_wmem_alloc and sk->sk_refcnt can be
   carefully taken before skb_set_owner_w() call. These references will not allow
   to release sk during old destructor's execution. 
   This was used in v6, and I think this should works correctly too. 

Could you please explain where I am wrong?
Do you talking about some other issue perhaps?

Thank you,
	Vasily Averin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC net v7] net: skb_expand_head() adjust skb->truesize incorrectly
  2021-09-21  6:36         ` Vasily Averin
@ 2021-09-21 21:25           ` Jakub Kicinski
  0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2021-09-21 21:25 UTC (permalink / raw)
  To: Vasily Averin; +Cc: Eric Dumazet, netdev, Christoph Paasch, Hao Sun, kernel

On Tue, 21 Sep 2021 09:36:26 +0300 Vasily Averin wrote:
> >> However I think we can do it later,
> >> right now we need to fix somehow broken skb_expand_head(),
> >> please take look at v8.  
> > 
> > I think v8 still has the issue that Eric was explaining over and over.  
> 
> I've missed sock_edemux check, however I do not see any other issues.
> Could you please explain what problem you talking about?
> 
> Eric said:
> "it is not valid to call skb_set_owner_w(skb, sk) on all kind of sockets",
> because socket might have been closed already.
> 
> Before the call we have old skb with sk reference, so sk is not closed yet
> and have nonzero sk->sk_wmem_alloc.
> 
> During the call, skb_set_owner_w calls skb_orphan that calls old skb destructor.
> Yes, it can decrement last sk reference and release the socket, 
> and I think this is exactly the problem that Eric was pointing out: 
> now sk access is unsafe.
> 
> However it can be prevented in at least 2 ways:
> a) clone old skb and call skb_set_owner_w(nskb, sk) before skb_consume(oskb).
>    In this case, skb_orphan does not call old destructor, because at this point
>    nskb->sk = NULL and nskb->destructor = NULL, and sk reference is kept by oskb.  
>    This is widely used in current code (ppp_xmit, ipip6_tunnel_xmit, 
>    ip_vs_prepare_tunneled_skb and so on).
>    This is used in v8 too.
> b) Alternatively, extra refs on sk->sk_wmem_alloc and sk->sk_refcnt can be
>    carefully taken before skb_set_owner_w() call. These references will not allow
>    to release sk during old destructor's execution. 
>    This was used in v6, and I think this should works correctly too. 
> 
> Could you please explain where I am wrong?
> Do you talking about some other issue perhaps?

I'm not particularly interested in being part of the arguing here.
If Eric acks your code it will be applied. I can do my cleanups on top.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net v9] skb_expand_head() adjust skb->truesize incorrectly
  2021-09-21  5:21       ` Vasily Averin
@ 2021-10-04 13:00         ` Vasily Averin
  2021-10-04 13:14           ` Vasily Averin
  2021-10-04 19:26           ` Eric Dumazet
  0 siblings, 2 replies; 19+ messages in thread
From: Vasily Averin @ 2021-10-04 13:00 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski
  Cc: netdev, David S. Miller, David S. Miller, Hideaki YOSHIFUJI,
	David Ahern, Julian Wiedmann, Christoph Paasch, linux-kernel,
	kernel

Christoph Paasch reports [1] about incorrect skb->truesize
after skb_expand_head() call in ip6_xmit.
This may happen because of two reasons:
- skb_set_owner_w() for newly cloned skb is called too early,
before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
- pskb_expand_head() does not adjust truesize in (skb->sk) case.
In this case sk->sk_wmem_alloc should be adjusted too.

[1] https://lkml.org/lkml/2021/8/20/1082

Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
v9: restored sock_edemux check
v8: clone non-wmem skb
v7 (from kuba@):
    shift more magic into helpers,
    follow Eric's advice and don't inherit non-wmem skbs for now
v6: fixed delta,
    improved comments
v5: fixed else condition, thanks to Eric
    reworked update of expanded skb,
    added corresponding comments
v4: decided to use is_skb_wmem() after pskb_expand_head() call
    fixed 'return (EXPRESSION);' in os_skb_wmem according to Eric Dumazet
v3: removed __pskb_expand_head(),
    added is_skb_wmem() helper for skb with wmem-compatible destructors
    there are 2 ways to use it:
     - before pskb_expand_head(), to create skb clones
     - after successfull pskb_expand_head() to change owner on extended
       skb.
v2: based on patch version from Eric Dumazet,
    added __pskb_expand_head() function, which can be forced
    to adjust skb->truesize and sk->sk_wmem_alloc.
---
 include/net/sock.h |  1 +
 net/core/skbuff.c  | 35 ++++++++++++++++++++++-------------
 net/core/sock.c    |  8 ++++++++
 3 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 66a9a90f9558..a547651d46a7 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1695,6 +1695,7 @@ struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
 			     gfp_t priority);
 void __sock_wfree(struct sk_buff *skb);
 void sock_wfree(struct sk_buff *skb);
+bool is_skb_wmem(const struct sk_buff *skb);
 struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,
 			     gfp_t priority);
 void skb_orphan_partial(struct sk_buff *skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2170bea2c7de..8cb6d360cda5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1804,30 +1804,39 @@ EXPORT_SYMBOL(skb_realloc_headroom);
 struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
 {
 	int delta = headroom - skb_headroom(skb);
+	int osize = skb_end_offset(skb);
+	struct sock *sk = skb->sk;
 
 	if (WARN_ONCE(delta <= 0,
 		      "%s is expecting an increase in the headroom", __func__))
 		return skb;
 
-	/* pskb_expand_head() might crash, if skb is shared */
-	if (skb_shared(skb)) {
+	delta = SKB_DATA_ALIGN(delta);
+	/* pskb_expand_head() might crash, if skb is shared. */
+	if (skb_shared(skb) || !is_skb_wmem(skb)) {
 		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
 
-		if (likely(nskb)) {
-			if (skb->sk)
-				skb_set_owner_w(nskb, skb->sk);
-			consume_skb(skb);
-		} else {
-			kfree_skb(skb);
-		}
+		if (unlikely(!nskb))
+			goto fail;
+
+		if (sk)
+			skb_set_owner_w(nskb, sk);
+		consume_skb(skb);
 		skb = nskb;
 	}
-	if (skb &&
-	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
-		kfree_skb(skb);
-		skb = NULL;
+	if (pskb_expand_head(skb, delta, 0, GFP_ATOMIC))
+		goto fail;
+
+	if (sk && skb->destructor != sock_edemux) {
+		delta = skb_end_offset(skb) - osize;
+		refcount_add(delta, &sk->sk_wmem_alloc);
+		skb->truesize += delta;
 	}
 	return skb;
+
+fail:
+	kfree_skb(skb);
+	return NULL;
 }
 EXPORT_SYMBOL(skb_expand_head);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 62627e868e03..1932755ae9ba 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2227,6 +2227,14 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
 }
 EXPORT_SYMBOL(skb_set_owner_w);
 
+bool is_skb_wmem(const struct sk_buff *skb)
+{
+	return skb->destructor == sock_wfree ||
+	       skb->destructor == __sock_wfree ||
+	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
+}
+EXPORT_SYMBOL(is_skb_wmem);
+
 static bool can_skb_orphan_partial(const struct sk_buff *skb)
 {
 #ifdef CONFIG_TLS_DEVICE
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net v9] skb_expand_head() adjust skb->truesize incorrectly
  2021-10-04 13:00         ` [PATCH net v9] " Vasily Averin
@ 2021-10-04 13:14           ` Vasily Averin
  2021-10-04 19:26           ` Eric Dumazet
  1 sibling, 0 replies; 19+ messages in thread
From: Vasily Averin @ 2021-10-04 13:14 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski
  Cc: netdev, David S. Miller, Hideaki YOSHIFUJI, David Ahern,
	Julian Wiedmann, Christoph Paasch, linux-kernel, kernel

Dear Eric,
could you please take look at this patch?

Original issue reported by Christoph Paasch is still not fixed,
however now buggy paches was merged into upstream.

v6 patch version [1] fixes the problem by careful change of destructor
on existing skb. I still think it is correct however I'm agree
it requires careful review.

[1] https://lkml.org/lkml/2021/9/6/584

This patch version is more simple and returns to cloning of non-wmem skb.

Both variants (i.e. this one and v6) resolves the problem.

Could you please review the patches, select one of them or propose some
better solution?

Thank you,
	Vasily Averin

On 10/4/21 4:00 PM, Vasily Averin wrote:
> Christoph Paasch reports [1] about incorrect skb->truesize
> after skb_expand_head() call in ip6_xmit.
> This may happen because of two reasons:
> - skb_set_owner_w() for newly cloned skb is called too early,
> before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
> - pskb_expand_head() does not adjust truesize in (skb->sk) case.
> In this case sk->sk_wmem_alloc should be adjusted too.
> 
> [1] https://lkml.org/lkml/2021/8/20/1082
> 
> Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
> Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
> Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---
> v9: restored sock_edemux check
> v8: clone non-wmem skb
> v7 (from kuba@):
>     shift more magic into helpers,
>     follow Eric's advice and don't inherit non-wmem skbs for now
> v6: fixed delta,
>     improved comments
> v5: fixed else condition, thanks to Eric
>     reworked update of expanded skb,
>     added corresponding comments
> v4: decided to use is_skb_wmem() after pskb_expand_head() call
>     fixed 'return (EXPRESSION);' in os_skb_wmem according to Eric Dumazet
> v3: removed __pskb_expand_head(),
>     added is_skb_wmem() helper for skb with wmem-compatible destructors
>     there are 2 ways to use it:
>      - before pskb_expand_head(), to create skb clones
>      - after successfull pskb_expand_head() to change owner on extended
>        skb.
> v2: based on patch version from Eric Dumazet,
>     added __pskb_expand_head() function, which can be forced
>     to adjust skb->truesize and sk->sk_wmem_alloc.
> ---
>  include/net/sock.h |  1 +
>  net/core/skbuff.c  | 35 ++++++++++++++++++++++-------------
>  net/core/sock.c    |  8 ++++++++
>  3 files changed, 31 insertions(+), 13 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 66a9a90f9558..a547651d46a7 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1695,6 +1695,7 @@ struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
>  			     gfp_t priority);
>  void __sock_wfree(struct sk_buff *skb);
>  void sock_wfree(struct sk_buff *skb);
> +bool is_skb_wmem(const struct sk_buff *skb);
>  struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,
>  			     gfp_t priority);
>  void skb_orphan_partial(struct sk_buff *skb);
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 2170bea2c7de..8cb6d360cda5 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1804,30 +1804,39 @@ EXPORT_SYMBOL(skb_realloc_headroom);
>  struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
>  {
>  	int delta = headroom - skb_headroom(skb);
> +	int osize = skb_end_offset(skb);
> +	struct sock *sk = skb->sk;
>  
>  	if (WARN_ONCE(delta <= 0,
>  		      "%s is expecting an increase in the headroom", __func__))
>  		return skb;
>  
> -	/* pskb_expand_head() might crash, if skb is shared */
> -	if (skb_shared(skb)) {
> +	delta = SKB_DATA_ALIGN(delta);
> +	/* pskb_expand_head() might crash, if skb is shared. */
> +	if (skb_shared(skb) || !is_skb_wmem(skb)) {
>  		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>  
> -		if (likely(nskb)) {
> -			if (skb->sk)
> -				skb_set_owner_w(nskb, skb->sk);
> -			consume_skb(skb);
> -		} else {
> -			kfree_skb(skb);
> -		}
> +		if (unlikely(!nskb))
> +			goto fail;
> +
> +		if (sk)
> +			skb_set_owner_w(nskb, sk);
> +		consume_skb(skb);
>  		skb = nskb;
>  	}
> -	if (skb &&
> -	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
> -		kfree_skb(skb);
> -		skb = NULL;
> +	if (pskb_expand_head(skb, delta, 0, GFP_ATOMIC))
> +		goto fail;
> +
> +	if (sk && skb->destructor != sock_edemux) {
> +		delta = skb_end_offset(skb) - osize;
> +		refcount_add(delta, &sk->sk_wmem_alloc);
> +		skb->truesize += delta;
>  	}
>  	return skb;
> +
> +fail:
> +	kfree_skb(skb);
> +	return NULL;
>  }
>  EXPORT_SYMBOL(skb_expand_head);
>  
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 62627e868e03..1932755ae9ba 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2227,6 +2227,14 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
>  }
>  EXPORT_SYMBOL(skb_set_owner_w);
>  
> +bool is_skb_wmem(const struct sk_buff *skb)
> +{
> +	return skb->destructor == sock_wfree ||
> +	       skb->destructor == __sock_wfree ||
> +	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
> +}
> +EXPORT_SYMBOL(is_skb_wmem);
> +
>  static bool can_skb_orphan_partial(const struct sk_buff *skb)
>  {
>  #ifdef CONFIG_TLS_DEVICE
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net v9] skb_expand_head() adjust skb->truesize incorrectly
  2021-10-04 13:00         ` [PATCH net v9] " Vasily Averin
  2021-10-04 13:14           ` Vasily Averin
@ 2021-10-04 19:26           ` Eric Dumazet
  2021-10-05  5:57             ` Vasily Averin
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2021-10-04 19:26 UTC (permalink / raw)
  To: Vasily Averin, Eric Dumazet, Jakub Kicinski
  Cc: netdev, David S. Miller, Hideaki YOSHIFUJI, David Ahern,
	Julian Wiedmann, Christoph Paasch, linux-kernel, kernel



On 10/4/21 6:00 AM, Vasily Averin wrote:
> Christoph Paasch reports [1] about incorrect skb->truesize
> after skb_expand_head() call in ip6_xmit.
> This may happen because of two reasons:
> - skb_set_owner_w() for newly cloned skb is called too early,
> before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
> - pskb_expand_head() does not adjust truesize in (skb->sk) case.
> In this case sk->sk_wmem_alloc should be adjusted too.
> 
> [1] https://lkml.org/lkml/2021/8/20/1082
> 
> Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
> Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
> Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---
> v9: restored sock_edemux check
> v8: clone non-wmem skb
> v7 (from kuba@):
>     shift more magic into helpers,
>     follow Eric's advice and don't inherit non-wmem skbs for now
> v6: fixed delta,
>     improved comments
> v5: fixed else condition, thanks to Eric
>     reworked update of expanded skb,
>     added corresponding comments
> v4: decided to use is_skb_wmem() after pskb_expand_head() call
>     fixed 'return (EXPRESSION);' in os_skb_wmem according to Eric Dumazet
> v3: removed __pskb_expand_head(),
>     added is_skb_wmem() helper for skb with wmem-compatible destructors
>     there are 2 ways to use it:
>      - before pskb_expand_head(), to create skb clones
>      - after successfull pskb_expand_head() to change owner on extended
>        skb.
> v2: based on patch version from Eric Dumazet,
>     added __pskb_expand_head() function, which can be forced
>     to adjust skb->truesize and sk->sk_wmem_alloc.
> ---
>  include/net/sock.h |  1 +
>  net/core/skbuff.c  | 35 ++++++++++++++++++++++-------------
>  net/core/sock.c    |  8 ++++++++
>  3 files changed, 31 insertions(+), 13 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 66a9a90f9558..a547651d46a7 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1695,6 +1695,7 @@ struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
>  			     gfp_t priority);
>  void __sock_wfree(struct sk_buff *skb);
>  void sock_wfree(struct sk_buff *skb);
> +bool is_skb_wmem(const struct sk_buff *skb);
>  struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,
>  			     gfp_t priority);
>  void skb_orphan_partial(struct sk_buff *skb);
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 2170bea2c7de..8cb6d360cda5 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1804,30 +1804,39 @@ EXPORT_SYMBOL(skb_realloc_headroom);
>  struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
>  {
>  	int delta = headroom - skb_headroom(skb);
> +	int osize = skb_end_offset(skb);
> +	struct sock *sk = skb->sk;
>  
>  	if (WARN_ONCE(delta <= 0,
>  		      "%s is expecting an increase in the headroom", __func__))
>  		return skb;
>  
> -	/* pskb_expand_head() might crash, if skb is shared */
> -	if (skb_shared(skb)) {
> +	delta = SKB_DATA_ALIGN(delta);
> +	/* pskb_expand_head() might crash, if skb is shared. */
> +	if (skb_shared(skb) || !is_skb_wmem(skb)) {
>  		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>  
> -		if (likely(nskb)) {
> -			if (skb->sk)
> -				skb_set_owner_w(nskb, skb->sk);
> -			consume_skb(skb);
> -		} else {
> -			kfree_skb(skb);
> -		}
> +		if (unlikely(!nskb))
> +			goto fail;
> +
> +		if (sk)
> +			skb_set_owner_w(nskb, sk);
> +		consume_skb(skb);
>  		skb = nskb;
>  	}
> -	if (skb &&
> -	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
> -		kfree_skb(skb);
> -		skb = NULL;
> +	if (pskb_expand_head(skb, delta, 0, GFP_ATOMIC))
> +		goto fail;
> +
> +	if (sk && skb->destructor != sock_edemux) {

    Why not re-using is_skb_wmem() here ?
    Testing != sock_edemux looks strange.
> +		delta = skb_end_offset(skb) - osize;
> +		refcount_add(delta, &sk->sk_wmem_alloc);
> +		skb->truesize += delta;
>  	}
>  	return skb;
> +
> +fail:
> +	kfree_skb(skb);
> +	return NULL;
>  }
>  EXPORT_SYMBOL(skb_expand_head);
>  
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 62627e868e03..1932755ae9ba 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2227,6 +2227,14 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
>  }
>  EXPORT_SYMBOL(skb_set_owner_w);
>  
> +bool is_skb_wmem(const struct sk_buff *skb)
> +{
> +	return skb->destructor == sock_wfree ||
> +	       skb->destructor == __sock_wfree ||
> +	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
> +}
> +EXPORT_SYMBOL(is_skb_wmem);
> +

This probably should be inlined.

>  static bool can_skb_orphan_partial(const struct sk_buff *skb)
>  {
>  #ifdef CONFIG_TLS_DEVICE
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net v9] skb_expand_head() adjust skb->truesize incorrectly
  2021-10-04 19:26           ` Eric Dumazet
@ 2021-10-05  5:57             ` Vasily Averin
  2021-10-20 16:14               ` Eric Dumazet
  2021-10-20 16:18               ` Eric Dumazet
  0 siblings, 2 replies; 19+ messages in thread
From: Vasily Averin @ 2021-10-05  5:57 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski
  Cc: netdev, David S. Miller, Hideaki YOSHIFUJI, David Ahern,
	Julian Wiedmann, Christoph Paasch, linux-kernel, kernel

On 10/4/21 10:26 PM, Eric Dumazet wrote:
> 
> 
> On 10/4/21 6:00 AM, Vasily Averin wrote:
>> Christoph Paasch reports [1] about incorrect skb->truesize
>> after skb_expand_head() call in ip6_xmit.
>> This may happen because of two reasons:
>> - skb_set_owner_w() for newly cloned skb is called too early,
>> before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
>> - pskb_expand_head() does not adjust truesize in (skb->sk) case.
>> In this case sk->sk_wmem_alloc should be adjusted too.
>>
>> [1] https://lkml.org/lkml/2021/8/20/1082
>>
>> Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
>> Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
>> Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
>> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
>> ---
>> v9: restored sock_edemux check
>> v8: clone non-wmem skb
>> v7 (from kuba@):
>>     shift more magic into helpers,
>>     follow Eric's advice and don't inherit non-wmem skbs for now
>> v6: fixed delta,
>>     improved comments
>> v5: fixed else condition, thanks to Eric
>>     reworked update of expanded skb,
>>     added corresponding comments
>> v4: decided to use is_skb_wmem() after pskb_expand_head() call
>>     fixed 'return (EXPRESSION);' in os_skb_wmem according to Eric Dumazet
>> v3: removed __pskb_expand_head(),
>>     added is_skb_wmem() helper for skb with wmem-compatible destructors
>>     there are 2 ways to use it:
>>      - before pskb_expand_head(), to create skb clones
>>      - after successfull pskb_expand_head() to change owner on extended
>>        skb.
>> v2: based on patch version from Eric Dumazet,
>>     added __pskb_expand_head() function, which can be forced
>>     to adjust skb->truesize and sk->sk_wmem_alloc.
>> ---
>>  include/net/sock.h |  1 +
>>  net/core/skbuff.c  | 35 ++++++++++++++++++++++-------------
>>  net/core/sock.c    |  8 ++++++++
>>  3 files changed, 31 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index 66a9a90f9558..a547651d46a7 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -1695,6 +1695,7 @@ struct sk_buff *sock_wmalloc(struct sock *sk, unsigned long size, int force,
>>  			     gfp_t priority);
>>  void __sock_wfree(struct sk_buff *skb);
>>  void sock_wfree(struct sk_buff *skb);
>> +bool is_skb_wmem(const struct sk_buff *skb);
>>  struct sk_buff *sock_omalloc(struct sock *sk, unsigned long size,
>>  			     gfp_t priority);
>>  void skb_orphan_partial(struct sk_buff *skb);
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 2170bea2c7de..8cb6d360cda5 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -1804,30 +1804,39 @@ EXPORT_SYMBOL(skb_realloc_headroom);
>>  struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
>>  {
>>  	int delta = headroom - skb_headroom(skb);
>> +	int osize = skb_end_offset(skb);
>> +	struct sock *sk = skb->sk;
>>  
>>  	if (WARN_ONCE(delta <= 0,
>>  		      "%s is expecting an increase in the headroom", __func__))
>>  		return skb;
>>  
>> -	/* pskb_expand_head() might crash, if skb is shared */
>> -	if (skb_shared(skb)) {
>> +	delta = SKB_DATA_ALIGN(delta);
>> +	/* pskb_expand_head() might crash, if skb is shared. */
>> +	if (skb_shared(skb) || !is_skb_wmem(skb)) {
>>  		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
>>  
>> -		if (likely(nskb)) {
>> -			if (skb->sk)
>> -				skb_set_owner_w(nskb, skb->sk);
>> -			consume_skb(skb);
>> -		} else {
>> -			kfree_skb(skb);
>> -		}
>> +		if (unlikely(!nskb))
>> +			goto fail;
>> +
>> +		if (sk)
>> +			skb_set_owner_w(nskb, sk);
>> +		consume_skb(skb);
>>  		skb = nskb;
>>  	}
>> -	if (skb &&
>> -	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
>> -		kfree_skb(skb);
>> -		skb = NULL;
>> +	if (pskb_expand_head(skb, delta, 0, GFP_ATOMIC))
>> +		goto fail;
>> +
>> +	if (sk && skb->destructor != sock_edemux) {
> 
>     Why not re-using is_skb_wmem() here ?
>     Testing != sock_edemux looks strange.

All non-wmem skbs was cloned and then was freed already.
After pskb_expand_head() call we can have:
(1) either original wmem skbs 
(2) or cloned skbs: 
 (2a) either without sk at all,
 (2b) or with sock_edemux destructor (that was set inside skb_set_owner_w() for !sk_fullsock(sk))
 (2c) or with sock_wfree destructor (that was set inside skb_set_owner_w() for sk_fullsock(sk))

(2a) and (2b) do not require truesize/sk_wmem_alloc update, it was handled inside pskb_expand_head()
(1) and (2c) cases are processed here.

If required I can add this explanation either into patch description or as comment.

Btw I just noticed that we can avoid cloning for original skbs without sk.
How do you think should I do it?

>> +		delta = skb_end_offset(skb) - osize;
>> +		refcount_add(delta, &sk->sk_wmem_alloc);
>> +		skb->truesize += delta;
>>  	}
>>  	return skb;
>> +
>> +fail:
>> +	kfree_skb(skb);
>> +	return NULL;
>>  }
>>  EXPORT_SYMBOL(skb_expand_head);
>>  
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 62627e868e03..1932755ae9ba 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -2227,6 +2227,14 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
>>  }
>>  EXPORT_SYMBOL(skb_set_owner_w);
>>  
>> +bool is_skb_wmem(const struct sk_buff *skb)
>> +{
>> +	return skb->destructor == sock_wfree ||
>> +	       skb->destructor == __sock_wfree ||
>> +	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
>> +}
>> +EXPORT_SYMBOL(is_skb_wmem);
>> +
> 
> This probably should be inlined.

David Miller pointed me out in the comments to an early version of the patch
"Please do not use inline in foo.c files, let the compiler decide."

>>  static bool can_skb_orphan_partial(const struct sk_buff *skb)
>>  {
>>  #ifdef CONFIG_TLS_DEVICE
>>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net v9] skb_expand_head() adjust skb->truesize incorrectly
  2021-10-05  5:57             ` Vasily Averin
@ 2021-10-20 16:14               ` Eric Dumazet
  2021-10-20 16:18               ` Eric Dumazet
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2021-10-20 16:14 UTC (permalink / raw)
  To: Vasily Averin, Eric Dumazet, Jakub Kicinski
  Cc: netdev, David S. Miller, Hideaki YOSHIFUJI, David Ahern,
	Julian Wiedmann, Christoph Paasch, linux-kernel, kernel



On 10/4/21 10:57 PM, Vasily Averin wrote:

>>>  
>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>> index 62627e868e03..1932755ae9ba 100644
>>> --- a/net/core/sock.c
>>> +++ b/net/core/sock.c
>>> @@ -2227,6 +2227,14 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk)
>>>  }
>>>  EXPORT_SYMBOL(skb_set_owner_w);
>>>  
>>> +bool is_skb_wmem(const struct sk_buff *skb)
>>> +{
>>> +	return skb->destructor == sock_wfree ||
>>> +	       skb->destructor == __sock_wfree ||
>>> +	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
>>> +}
>>> +EXPORT_SYMBOL(is_skb_wmem);
>>> +
>>
>> This probably should be inlined.
> 
> David Miller pointed me out in the comments to an early version of the patch
> "Please do not use inline in foo.c files, let the compiler decide."
> 

Sure, my suggestion was to move this helper in an include file,
and use

static inline bool ....

I would not suggest add an inline in a C file, unless absolutely critical.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net v9] skb_expand_head() adjust skb->truesize incorrectly
  2021-10-05  5:57             ` Vasily Averin
  2021-10-20 16:14               ` Eric Dumazet
@ 2021-10-20 16:18               ` Eric Dumazet
  2021-10-22 10:28                 ` [PATCH net v10] " Vasily Averin
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2021-10-20 16:18 UTC (permalink / raw)
  To: Vasily Averin, Eric Dumazet, Jakub Kicinski
  Cc: netdev, David S. Miller, Hideaki YOSHIFUJI, David Ahern,
	Julian Wiedmann, Christoph Paasch, linux-kernel, kernel



On 10/4/21 10:57 PM, Vasily Averin wrote:
> On 10/4/21 10:26 PM, Eric Dumazet wrote:

>>
>>     Why not re-using is_skb_wmem() here ?
>>     Testing != sock_edemux looks strange.
> 
> All non-wmem skbs was cloned and then was freed already.
> After pskb_expand_head() call we can have:
> (1) either original wmem skbs 
> (2) or cloned skbs: 
>  (2a) either without sk at all,
>  (2b) or with sock_edemux destructor (that was set inside skb_set_owner_w() for !sk_fullsock(sk))
>  (2c) or with sock_wfree destructor (that was set inside skb_set_owner_w() for sk_fullsock(sk))
> 
> (2a) and (2b) do not require truesize/sk_wmem_alloc update, it was handled inside pskb_expand_head()
> (1) and (2c) cases are processed here.
> 
> If required I can add this explanation either into patch description or as comment.
> 

sock_edemux is one of the current destructors.

New ones will be added later. We can not expect that in two or three years,
at least one reviewer will remember this special case.

I would prefer you list the known destructors (allow-list, instead of disallow-list)



> Btw I just noticed that we can avoid cloning for original skbs without sk.
> How do you think should I do it?

Lets wait a bit before new features...


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net v10] skb_expand_head() adjust skb->truesize incorrectly
  2021-10-20 16:18               ` Eric Dumazet
@ 2021-10-22 10:28                 ` Vasily Averin
  2021-10-22 19:32                   ` Eric Dumazet
  2021-10-22 20:50                   ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 19+ messages in thread
From: Vasily Averin @ 2021-10-22 10:28 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski
  Cc: netdev, David S. Miller, David S. Miller, Hideaki YOSHIFUJI,
	David Ahern, Julian Wiedmann, Christoph Paasch, linux-kernel,
	kernel

Christoph Paasch reports [1] about incorrect skb->truesize
after skb_expand_head() call in ip6_xmit.
This may happen because of two reasons:
- skb_set_owner_w() for newly cloned skb is called too early,
before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
- pskb_expand_head() does not adjust truesize in (skb->sk) case.
In this case sk->sk_wmem_alloc should be adjusted too.

[1] https://lkml.org/lkml/2021/8/20/1082

Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
v10: is_skb_wmem() was moved into separate header (it depends on net/tcp.h)
     use it after pskb_expand_head() insted of strange sock_edemux check
v9: restored sock_edemux check
v8: clone non-wmem skb
v7 (from kuba@):
    shift more magic into helpers,
    follow Eric's advice and don't inherit non-wmem skbs for now
v6: fixed delta,
    improved comments
v5: fixed else condition, thanks to Eric
    reworked update of expanded skb,
    added corresponding comments
v4: decided to use is_skb_wmem() after pskb_expand_head() call
    fixed 'return (EXPRESSION);' in os_skb_wmem according to Eric
Dumazet
v3: removed __pskb_expand_head(),
    added is_skb_wmem() helper for skb with wmem-compatible destructors
    there are 2 ways to use it:
     - before pskb_expand_head(), to create skb clones
     - after successfull pskb_expand_head() to change owner on extended
       skb.
v2: based on patch version from Eric Dumazet,
    added __pskb_expand_head() function, which can be forced
    to adjust skb->truesize and sk->sk_wmem_alloc.

 net/core/skbuff.c          | 36 +++++++++++++++++++++++-------------
 net/core/sock_destructor.h | 12 ++++++++++++
 2 files changed, 35 insertions(+), 13 deletions(-)
 create mode 100644 net/core/sock_destructor.h

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2170bea2c7de..fe9358437380 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -80,6 +80,7 @@
 #include <linux/indirect_call_wrapper.h>
 
 #include "datagram.h"
+#include "sock_destructor.h"
 
 struct kmem_cache *skbuff_head_cache __ro_after_init;
 static struct kmem_cache *skbuff_fclone_cache __ro_after_init;
@@ -1804,30 +1805,39 @@ EXPORT_SYMBOL(skb_realloc_headroom);
 struct sk_buff *skb_expand_head(struct sk_buff *skb, unsigned int headroom)
 {
 	int delta = headroom - skb_headroom(skb);
+	int osize = skb_end_offset(skb);
+	struct sock *sk = skb->sk;
 
 	if (WARN_ONCE(delta <= 0,
 		      "%s is expecting an increase in the headroom", __func__))
 		return skb;
 
-	/* pskb_expand_head() might crash, if skb is shared */
-	if (skb_shared(skb)) {
+	delta = SKB_DATA_ALIGN(delta);
+	/* pskb_expand_head() might crash, if skb is shared. */
+	if (skb_shared(skb) || !is_skb_wmem(skb)) {
 		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
 
-		if (likely(nskb)) {
-			if (skb->sk)
-				skb_set_owner_w(nskb, skb->sk);
-			consume_skb(skb);
-		} else {
-			kfree_skb(skb);
-		}
+		if (unlikely(!nskb))
+			goto fail;
+
+		if (sk)
+			skb_set_owner_w(nskb, sk);
+		consume_skb(skb);
 		skb = nskb;
 	}
-	if (skb &&
-	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
-		kfree_skb(skb);
-		skb = NULL;
+	if (pskb_expand_head(skb, delta, 0, GFP_ATOMIC))
+		goto fail;
+
+	if (sk && is_skb_wmem(skb)) {
+		delta = skb_end_offset(skb) - osize;
+		refcount_add(delta, &sk->sk_wmem_alloc);
+		skb->truesize += delta;
 	}
 	return skb;
+
+fail:
+	kfree_skb(skb);
+	return NULL;
 }
 EXPORT_SYMBOL(skb_expand_head);
 
diff --git a/net/core/sock_destructor.h b/net/core/sock_destructor.h
new file mode 100644
index 000000000000..2f396e6bfba5
--- /dev/null
+++ b/net/core/sock_destructor.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _NET_CORE_SOCK_DESTRUCTOR_H
+#define _NET_CORE_SOCK_DESTRUCTOR_H
+#include <net/tcp.h>
+
+static inline bool is_skb_wmem(const struct sk_buff *skb)
+{
+	return skb->destructor == sock_wfree ||
+	       skb->destructor == __sock_wfree ||
+	       (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree);
+}
+#endif
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net v10] skb_expand_head() adjust skb->truesize incorrectly
  2021-10-22 10:28                 ` [PATCH net v10] " Vasily Averin
@ 2021-10-22 19:32                   ` Eric Dumazet
  2021-10-22 20:50                   ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2021-10-22 19:32 UTC (permalink / raw)
  To: Vasily Averin, Eric Dumazet, Jakub Kicinski
  Cc: netdev, David S. Miller, Hideaki YOSHIFUJI, David Ahern,
	Julian Wiedmann, Christoph Paasch, linux-kernel, kernel



On 10/22/21 3:28 AM, Vasily Averin wrote:
> Christoph Paasch reports [1] about incorrect skb->truesize
> after skb_expand_head() call in ip6_xmit.
> This may happen because of two reasons:
> - skb_set_owner_w() for newly cloned skb is called too early,
> before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
> - pskb_expand_head() does not adjust truesize in (skb->sk) case.
> In this case sk->sk_wmem_alloc should be adjusted too.
> 
> [1] https://lkml.org/lkml/2021/8/20/1082
> 
> Fixes: f1260ff15a71 ("skbuff: introduce skb_expand_head()")
> Fixes: 2d85a1b31dde ("ipv6: ip6_finish_output2: set sk into newly allocated nskb")
> Reported-by: Christoph Paasch <christoph.paasch@gmail.com>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---
> v10: is_skb_wmem() was moved into separate header (it depends on net/tcp.h)
>      use it after pskb_expand_head() insted of strange sock_edemux check

SGTM, thanks !

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net v10] skb_expand_head() adjust skb->truesize incorrectly
  2021-10-22 10:28                 ` [PATCH net v10] " Vasily Averin
  2021-10-22 19:32                   ` Eric Dumazet
@ 2021-10-22 20:50                   ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 19+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-10-22 20:50 UTC (permalink / raw)
  To: Vasily Averin
  Cc: eric.dumazet, kuba, netdev, davem, yoshfuji, dsahern, jwi,
	christoph.paasch, linux-kernel, kernel

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 22 Oct 2021 13:28:37 +0300 you wrote:
> Christoph Paasch reports [1] about incorrect skb->truesize
> after skb_expand_head() call in ip6_xmit.
> This may happen because of two reasons:
> - skb_set_owner_w() for newly cloned skb is called too early,
> before pskb_expand_head() where truesize is adjusted for (!skb-sk) case.
> - pskb_expand_head() does not adjust truesize in (skb->sk) case.
> In this case sk->sk_wmem_alloc should be adjusted too.
> 
> [...]

Here is the summary with links:
  - [net,v10] skb_expand_head() adjust skb->truesize incorrectly
    https://git.kernel.org/netdev/net/c/7f678def99d2

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-10-22 20:50 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-17 16:24 [RFC net v7] net: skb_expand_head() adjust skb->truesize incorrectly Jakub Kicinski
2021-09-17 18:15 ` Vasily Averin
2021-09-18 10:05 ` Vasily Averin
2021-09-20 18:12   ` Jakub Kicinski
2021-09-20 21:41     ` Vasily Averin
2021-09-21  0:39       ` Jakub Kicinski
2021-09-21  6:36         ` Vasily Averin
2021-09-21 21:25           ` Jakub Kicinski
2021-09-20 21:41     ` [PATCH net v8] " Vasily Averin
2021-09-21  5:21       ` Vasily Averin
2021-10-04 13:00         ` [PATCH net v9] " Vasily Averin
2021-10-04 13:14           ` Vasily Averin
2021-10-04 19:26           ` Eric Dumazet
2021-10-05  5:57             ` Vasily Averin
2021-10-20 16:14               ` Eric Dumazet
2021-10-20 16:18               ` Eric Dumazet
2021-10-22 10:28                 ` [PATCH net v10] " Vasily Averin
2021-10-22 19:32                   ` Eric Dumazet
2021-10-22 20:50                   ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.