All of lore.kernel.org
* [PATCH][RFC] udp: cache sock to avoid searching it twice
@ 2018-11-09  6:21 Li RongQing
  2018-11-09 13:41 ` Paolo Abeni
  2018-11-09 17:25 ` Eric Dumazet
  0 siblings, 2 replies; 5+ messages in thread
From: Li RongQing @ 2018-11-09  6:21 UTC (permalink / raw)
  To: netdev

GRO for UDP looks up the socket twice: once in gro_receive and again in
gro_complete. Caching the socket in the skb avoids the second lookup and
gives a small performance boost.

netperf -t UDP_RR -l 10

Before:
	Rate per sec: 28746.01
After:
	Rate per sec: 29401.67

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 net/ipv4/udp_offload.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 0646d61f4fa8..429570112a33 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -408,6 +408,11 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 
 	if (udp_sk(sk)->gro_enabled) {
 		pp = call_gro_receive(udp_gro_receive_segment, head, skb);
+
+		if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
+			sock_hold(sk);
+			pp->sk = sk;
+		}
 		rcu_read_unlock();
 		return pp;
 	}
@@ -444,6 +449,10 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 	skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
 	pp = call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb);
 
+	if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
+		sock_hold(sk);
+		pp->sk = sk;
+	}
 out_unlock:
 	rcu_read_unlock();
 	skb_gro_flush_final(skb, pp, flush);
@@ -502,7 +511,9 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
 	uh->len = newlen;
 
 	rcu_read_lock();
-	sk = (*lookup)(skb, uh->source, uh->dest);
+	sk = skb->sk;
+	if (!sk)
+		sk = (*lookup)(skb, uh->source, uh->dest);
 	if (sk && udp_sk(sk)->gro_enabled) {
 		err = udp_gro_complete_segment(skb);
 	} else if (sk && udp_sk(sk)->gro_complete) {
@@ -516,6 +527,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
 		err = udp_sk(sk)->gro_complete(sk, skb,
 				nhoff + sizeof(struct udphdr));
 	}
+
+	if (skb->sk) {
+		sock_put(skb->sk);
+		skb->sk = NULL;
+	}
 	rcu_read_unlock();
 
 	if (skb->remcsum_offload)
-- 
2.16.2


* Re: [PATCH][RFC] udp: cache sock to avoid searching it twice
  2018-11-09  6:21 [PATCH][RFC] udp: cache sock to avoid searching it twice Li RongQing
@ 2018-11-09 13:41 ` Paolo Abeni
  2018-11-12  5:46   ` Re: " Li,Rongqing
  2018-11-09 17:25 ` Eric Dumazet
  1 sibling, 1 reply; 5+ messages in thread
From: Paolo Abeni @ 2018-11-09 13:41 UTC (permalink / raw)
  To: Li RongQing, netdev; +Cc: Willem de Bruijn

Hi,

Adding Willem; I think he may be interested.

On Fri, 2018-11-09 at 14:21 +0800, Li RongQing wrote:
> GRO for UDP looks up the socket twice: once in gro_receive and again in
> gro_complete. Caching the socket in the skb avoids the second lookup and
> gives a small performance boost.
> 
> netperf -t UDP_RR -l 10
> 
> Before:
> 	Rate per sec: 28746.01
> After:
> 	Rate per sec: 29401.67
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  net/ipv4/udp_offload.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 0646d61f4fa8..429570112a33 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -408,6 +408,11 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
>  
>  	if (udp_sk(sk)->gro_enabled) {
>  		pp = call_gro_receive(udp_gro_receive_segment, head, skb);
> +
> +		if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
> +			sock_hold(sk);
> +			pp->sk = sk;
> +		}
>  		rcu_read_unlock();
>  		return pp;
>  	}

What if 'pp' is NULL?

Aside from that, this replaces a lookup with 2 atomic ops, and the saving
only pays off when the lookup cost is amortized over multiple aggregated
packets: I'm unsure it's worth it, and I don't understand how it improves
RR tests (where the socket can't see multiple, consecutive skbs, AFAIK).

Cheers,

Paolo


* Re: [PATCH][RFC] udp: cache sock to avoid searching it twice
  2018-11-09  6:21 [PATCH][RFC] udp: cache sock to avoid searching it twice Li RongQing
  2018-11-09 13:41 ` Paolo Abeni
@ 2018-11-09 17:25 ` Eric Dumazet
  2018-11-12  5:39   ` Re: " Li,Rongqing
  1 sibling, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2018-11-09 17:25 UTC (permalink / raw)
  To: Li RongQing, netdev



On 11/08/2018 10:21 PM, Li RongQing wrote:
> GRO for UDP looks up the socket twice: once in gro_receive and again in
> gro_complete. Caching the socket in the skb avoids the second lookup and
> gives a small performance boost.
> 
> netperf -t UDP_RR -l 10
> 
> Before:
> 	Rate per sec: 28746.01
> After:
> 	Rate per sec: 29401.67
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  net/ipv4/udp_offload.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 0646d61f4fa8..429570112a33 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -408,6 +408,11 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
>  
>  	if (udp_sk(sk)->gro_enabled) {
>  		pp = call_gro_receive(udp_gro_receive_segment, head, skb);
> +
> +		if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
> +			sock_hold(sk);
> +			pp->sk = sk;


You also have to set pp->destructor to sock_edemux

flush_gro_hash -> kfree_skb()

If there is no destructor, the reference on pp->sk will never be released.
> +		}
>  		rcu_read_unlock();
>  		return pp;
>  	}
> @@ -444,6 +449,10 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
>  	skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
>  	pp = call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb);
>  
> +	if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
> +		sock_hold(sk);
> +		pp->sk = sk;
> +	}
>  out_unlock:
>  	rcu_read_unlock();
>  	skb_gro_flush_final(skb, pp, flush);
> @@ -502,7 +511,9 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
>  	uh->len = newlen;
>  
>  	rcu_read_lock();
> -	sk = (*lookup)(skb, uh->source, uh->dest);
> +	sk = skb->sk;
> +	if (!sk)
> +		sk = (*lookup)(skb, uh->source, uh->dest);
>  	if (sk && udp_sk(sk)->gro_enabled) {
>  		err = udp_gro_complete_segment(skb);
>  	} else if (sk && udp_sk(sk)->gro_complete) {
> @@ -516,6 +527,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
>  		err = udp_sk(sk)->gro_complete(sk, skb,
>  				nhoff + sizeof(struct udphdr));
>  	}
> +
> +	if (skb->sk) {
> +		sock_put(skb->sk);
> +		skb->sk = NULL;
> +	}
>  	rcu_read_unlock();
>  
>  	if (skb->remcsum_offload)
> 


* Re: [PATCH][RFC] udp: cache sock to avoid searching it twice
  2018-11-09 17:25 ` Eric Dumazet
@ 2018-11-12  5:39   ` Li,Rongqing
  0 siblings, 0 replies; 5+ messages in thread
From: Li,Rongqing @ 2018-11-12  5:39 UTC (permalink / raw)
  To: Eric Dumazet, netdev


On Sat, Nov 10, 2018 at 1:29 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 11/08/2018 10:21 PM, Li RongQing wrote:
> > GRO for UDP looks up the socket twice: once in gro_receive and again in
> > gro_complete. Caching the socket in the skb avoids the second lookup and
> > gives a small performance boost.
> >
> > netperf -t UDP_RR -l 10
> >
> > Before:
> >       Rate per sec: 28746.01
> > After:
> >       Rate per sec: 29401.67
> >
> > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > ---
> >  net/ipv4/udp_offload.c | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > index 0646d61f4fa8..429570112a33 100644
> > --- a/net/ipv4/udp_offload.c
> > +++ b/net/ipv4/udp_offload.c
> > @@ -408,6 +408,11 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
> >
> >       if (udp_sk(sk)->gro_enabled) {
> >               pp = call_gro_receive(udp_gro_receive_segment, head, skb);
> > +
> > +             if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
> > +                     sock_hold(sk);
> > +                     pp->sk = sk;
>
>
> You also have to set pp->destructor to sock_edemux
>
> flush_gro_hash -> kfree_skb()
>
> If there is no destructor, the reference on pp->sk will never be released.
>
>

OK, thanks.

Does the sk also need to be reset in udp_gro_complete? IP early demux will
look up the UDP socket again; if we keep the reference, we can avoid that
second lookup.


-RongQing
>
>
> > +             }
> >               rcu_read_unlock();
> >               return pp;
> >       }
> > @@ -444,6 +449,10 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
> >       skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
> >       pp = call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb);
> >
> > +     if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
> > +             sock_hold(sk);
> > +             pp->sk = sk;
> > +     }
> >  out_unlock:
> >       rcu_read_unlock();
> >       skb_gro_flush_final(skb, pp, flush);
> > @@ -502,7 +511,9 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
> >       uh->len = newlen;
> >
> >       rcu_read_lock();
> > -     sk = (*lookup)(skb, uh->source, uh->dest);
> > +     sk = skb->sk;
> > +     if (!sk)
> > +             sk = (*lookup)(skb, uh->source, uh->dest);
> >       if (sk && udp_sk(sk)->gro_enabled) {
> >               err = udp_gro_complete_segment(skb);
> >       } else if (sk && udp_sk(sk)->gro_complete) {
> > @@ -516,6 +527,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
> >               err = udp_sk(sk)->gro_complete(sk, skb,
> >                               nhoff + sizeof(struct udphdr));
> >       }
> > +
> > +     if (skb->sk) {
> > +             sock_put(skb->sk);
> > +             skb->sk = NULL;
> > +     }
> >       rcu_read_unlock();
> >
> >       if (skb->remcsum_offload)
> >


* Re: [PATCH][RFC] udp: cache sock to avoid searching it twice
  2018-11-09 13:41 ` Paolo Abeni
@ 2018-11-12  5:46   ` Li,Rongqing
  0 siblings, 0 replies; 5+ messages in thread
From: Li,Rongqing @ 2018-11-12  5:46 UTC (permalink / raw)
  To: Paolo Abeni, netdev; +Cc: Willem de Bruijn

> >               return pp;
> >       }
>
> What if 'pp' is NULL?
>
> Aside from that, this replace a lookup with 2 atomic ops, and only when
> such lookup is amortized on multiple aggregated packets: I'm unsure if
> it's worthy and I don't understand how that improves RR tests (where
> the socket can't see multiple, consecutive skbs, AFAIK).
>
> Cheers,
>
> Paolo
>

If we do not release the socket in udp_gro_complete, we can avoid another
UDP socket lookup when IP early demux runs; that may make the change more
worthwhile.

I tested UDP_STREAM and found no difference: both can reach the NIC's 10G
limit. So I tested RR; I will do more tests.

-RongQing

