All of lore.kernel.org
 help / color / mirror / Atom feed
* [net-next  1/1] tipc: avoid unnecessary copying of bundled messages
@ 2018-02-14 12:50 Jon Maloy
  2018-02-14 13:42 ` Eric Dumazet
  2018-02-14 20:27 ` David Miller
  0 siblings, 2 replies; 5+ messages in thread
From: Jon Maloy @ 2018-02-14 12:50 UTC (permalink / raw)
  To: davem, netdev
  Cc: mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen, hoang.h.le,
	jon.maloy, canh.d.luu, ying.xue, tipc-discussion

A received sk buffer may contain dozens of smaller 'bundled' messages
which after extraction go each in their own direction.

Unfortunately, when we extract those messages using skb_clone() each
of the extracted buffers inherit the truesize value of the original
buffer. Apart from causing massive overaccounting of the base buffer's
memory, this often causes tipc_msg_validate() to come to the false
conclusion that the ratio truesize/datasize > 4, and perform an
unnecessary copying of the extracted buffer.

We now fix this problem by explicitly correcting the truesize value of
the buffer clones to be the truesize of the clone itself. This change
eliminates both the overaccounting and the unnecessary buffer copying.

Reported-by: Hoang Le <hoang.h.le@dektek.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/msg.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index 4e1c6f6..a368fa8 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -434,6 +434,9 @@ bool tipc_msg_extract(struct sk_buff *skb, struct sk_buff **iskb, int *pos)
 	skb_pull(*iskb, offset);
 	imsz = msg_size(buf_msg(*iskb));
 	skb_trim(*iskb, imsz);
+
+	/* Scale extracted buffer's truesize to avoid double accounting */
+	(*iskb)->truesize = SKB_TRUESIZE(imsz);
 	if (unlikely(!tipc_msg_validate(iskb)))
 		goto none;
 	*pos += align(imsz);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [net-next  1/1] tipc: avoid unnecessary copying of bundled messages
  2018-02-14 12:50 [net-next 1/1] tipc: avoid unnecessary copying of bundled messages Jon Maloy
@ 2018-02-14 13:42 ` Eric Dumazet
  2018-02-14 20:27 ` David Miller
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2018-02-14 13:42 UTC (permalink / raw)
  To: Jon Maloy, davem, netdev
  Cc: mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen, hoang.h.le,
	canh.d.luu, ying.xue, tipc-discussion

On Wed, 2018-02-14 at 13:50 +0100, Jon Maloy wrote:
> A received sk buffer may contain dozens of smaller 'bundled' messages
> which after extraction go each in their own direction.
> 
> Unfortunately, when we extract those messages using skb_clone() each
> of the extracted buffers inherit the truesize value of the original
> buffer. Apart from causing massive overaccounting of the base buffer's
> memory, this often causes tipc_msg_validate() to come to the false
> conclusion that the ratio truesize/datasize > 4, and perform an
> unnecessary copying of the extracted buffer.
> 
> We now fix this problem by explicitly correcting the truesize value of
> the buffer clones to be the truesize of the clone itself. This change
> eliminates both the overaccounting and the unnecessary buffer copying.
> 
> Reported-by: Hoang Le <hoang.h.le@dektek.com.au>
> Acked-by: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> ---
>  net/tipc/msg.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/tipc/msg.c b/net/tipc/msg.c
> index 4e1c6f6..a368fa8 100644
> --- a/net/tipc/msg.c
> +++ b/net/tipc/msg.c
> @@ -434,6 +434,9 @@ bool tipc_msg_extract(struct sk_buff *skb, struct sk_buff **iskb, int *pos)
>  	skb_pull(*iskb, offset);
>  	imsz = msg_size(buf_msg(*iskb));
>  	skb_trim(*iskb, imsz);
> +
> +	/* Scale extracted buffer's truesize to avoid double accounting */
> +	(*iskb)->truesize = SKB_TRUESIZE(imsz);

How do you guarantee that under accounting wont happen here ?

Copying data to avoid OOM is not necessarily bad.

TCP stack does this under stress (this is called collapsing),
and this definitely can happen.

skb_clone() will also clones frags, and you absolutely do not know what
memory each frag can hold (that could be 64KB on arches with 64KB
pages)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [net-next 1/1] tipc: avoid unnecessary copying of bundled messages
  2018-02-14 12:50 [net-next 1/1] tipc: avoid unnecessary copying of bundled messages Jon Maloy
  2018-02-14 13:42 ` Eric Dumazet
@ 2018-02-14 20:27 ` David Miller
  2018-02-15  8:57   ` Jon Maloy
  1 sibling, 1 reply; 5+ messages in thread
From: David Miller @ 2018-02-14 20:27 UTC (permalink / raw)
  To: jon.maloy
  Cc: netdev, mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen,
	hoang.h.le, canh.d.luu, ying.xue, tipc-discussion

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Wed, 14 Feb 2018 13:50:31 +0100

> diff --git a/net/tipc/msg.c b/net/tipc/msg.c
> index 4e1c6f6..a368fa8 100644
> --- a/net/tipc/msg.c
> +++ b/net/tipc/msg.c
> @@ -434,6 +434,9 @@ bool tipc_msg_extract(struct sk_buff *skb, struct sk_buff **iskb, int *pos)
>  	skb_pull(*iskb, offset);
>  	imsz = msg_size(buf_msg(*iskb));
>  	skb_trim(*iskb, imsz);
> +
> +	/* Scale extracted buffer's truesize to avoid double accounting */
> +	(*iskb)->truesize = SKB_TRUESIZE(imsz);
>  	if (unlikely(!tipc_msg_validate(iskb)))
>  		goto none;
>  	*pos += align(imsz);

As Eric said, you have to be really careful here.

If you clone a 10K SKB 10 times, you really have to account for the full
truesize 10 times.

That is unless you explicitly trim off frags in the new clone, then adjust
the truesize by explicitly decreasing it by the amount of memory backing
the frags you trimmed off completely (not partially).

Finally, you can only do this on an SKB that has never entered a socket
SKB queue, otherwise you corrupt memory accounting.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [net-next 1/1] tipc: avoid unnecessary copying of bundled messages
  2018-02-14 20:27 ` David Miller
@ 2018-02-15  8:57   ` Jon Maloy
  2018-02-16 20:25     ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Jon Maloy @ 2018-02-15  8:57 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Mohan Krishna Ghanta Krishnamurthy, Tung Quang Nguyen,
	Hoang Huu Le, Canh Duc Luu, Ying Xue, tipc-discussion



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of David Miller
> Sent: Wednesday, February 14, 2018 21:27
> To: Jon Maloy <jon.maloy@ericsson.com>
> Cc: netdev@vger.kernel.org; Mohan Krishna Ghanta Krishnamurthy
> <mohan.krishna.ghanta.krishnamurthy@ericsson.com>; Tung Quang Nguyen
> <tung.q.nguyen@dektech.com.au>; Hoang Huu Le
> <hoang.h.le@dektech.com.au>; Canh Duc Luu
> <canh.d.luu@dektech.com.au>; Ying Xue <ying.xue@windriver.com>; tipc-
> discussion@lists.sourceforge.net
> Subject: Re: [net-next 1/1] tipc: avoid unnecessary copying of bundled
> messages
> 
> From: Jon Maloy <jon.maloy@ericsson.com>
> Date: Wed, 14 Feb 2018 13:50:31 +0100
> 
> > diff --git a/net/tipc/msg.c b/net/tipc/msg.c index 4e1c6f6..a368fa8
> > 100644
> > --- a/net/tipc/msg.c
> > +++ b/net/tipc/msg.c
> > @@ -434,6 +434,9 @@ bool tipc_msg_extract(struct sk_buff *skb, struct
> sk_buff **iskb, int *pos)
> >  	skb_pull(*iskb, offset);
> >  	imsz = msg_size(buf_msg(*iskb));
> >  	skb_trim(*iskb, imsz);
> > +
> > +	/* Scale extracted buffer's truesize to avoid double accounting */
> > +	(*iskb)->truesize = SKB_TRUESIZE(imsz);
> >  	if (unlikely(!tipc_msg_validate(iskb)))
> >  		goto none;
> >  	*pos += align(imsz);
> 
> As Eric said, you have to be really careful here.
> 
> If you clone a 10K SKB 10 times, you really have to account for the full
> truesize 10 times.
> 
> That is unless you explicitly trim off frags in the new clone, then adjust the
> truesize by explicitly decreasing it by the amount of memory backing the
> frags you trimmed off completely (not partially).

The buffers we are cloning are linearized 1 MTU incoming buffers. There are no fragments. 
Each clone normally points to only a tiny fraction of the data area of the base buffer.
I don't claim that copying always is bad, but in this case it happens in the majority of cases, and as I see it completely unnecessarily.

There is actually some under accounting, however, since we now only count the data area of the base buffer (== the sum of the data area of the clones) plus the overhead of the clones.
A more accurate calculation, taking into account even the overhead of the base buffer, would look  like this:
(*iskb)->truesize =  SKB_TRUSIZE(imsz) + (skb->truesize - skb->len)  / msg_msgcnt(msg);

I.e.,  we calculate the overhead of the base buffer and divide it equally among the clones.
Now I really can't see we are missing anything.

BR
///jon

> 
> Finally, you can only do this on an SKB that has never entered a socket SKB
> queue, otherwise you corrupt memory accounting.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [net-next 1/1] tipc: avoid unnecessary copying of bundled messages
  2018-02-15  8:57   ` Jon Maloy
@ 2018-02-16 20:25     ` David Miller
  0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2018-02-16 20:25 UTC (permalink / raw)
  To: jon.maloy
  Cc: netdev, mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen,
	hoang.h.le, canh.d.luu, ying.xue, tipc-discussion

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Thu, 15 Feb 2018 08:57:14 +0000

> The buffers we are cloning are linearized 1 MTU incoming
> buffers. There are no fragments.  Each clone normally points to only
> a tiny fraction of the data area of the base buffer.  I don't claim
> that copying always is bad, but in this case it happens in the
> majority of cases, and as I see it completely unnecessarily.
> 
> There is actually some under accounting, however, since we now only
> count the data area of the base buffer (== the sum of the data area
> of the clones) plus the overhead of the clones.  A more accurate
> calculation, taking into account even the overhead of the base
> buffer, would look like this: (*iskb)->truesize = SKB_TRUSIZE(imsz)
> + (skb->truesize - skb->len) / msg_msgcnt(msg);
> 
> I.e., we calculate the overhead of the base buffer and divide it
> equally among the clones.  Now I really can't see we are missing
> anything.

Maybe there is some miscommunication between us.  Let me try using
different words.

I you have a single SKB which has a truesize of, say, 10K and then you
clone this several times to point the SKB into small windows into that
original SKB's buffer, then you must use the same 10K truesize for the
clones that you did for the original SKB.

It absolutely does not matter that the clones are only pointing to a
small part of the original SKB's buffer.  Until they are freed up each
individual SKB still causes that _ENTIRE_ buffer to be held up and not
freeable.

This is why you have to be careful how you manage SKBs, and in fact
what you are doing by cloning and constricting the data pointers, is
actually troublesome and you can see this with the truesize problems
you are running into.

SKBs are really not built to be managed like this.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-02-16 20:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-14 12:50 [net-next 1/1] tipc: avoid unnecessary copying of bundled messages Jon Maloy
2018-02-14 13:42 ` Eric Dumazet
2018-02-14 20:27 ` David Miller
2018-02-15  8:57   ` Jon Maloy
2018-02-16 20:25     ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.