* [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case @ 2016-01-05 15:57 Craig Gallek 2016-01-05 16:39 ` Daniel Borkmann ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Craig Gallek @ 2016-01-05 15:57 UTC (permalink / raw) To: netdev, David Miller From: Craig Gallek <kraig@google.com> Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Craig Gallek <kraig@google.com> --- net/core/sock_reuseport.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index ae0969c0fc2e..1df98c557440 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -173,7 +173,7 @@ static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks, /* temporarily advance data past protocol header */ if (!pskb_pull(skb, hdr_len)) { - consume_skb(nskb); + kfree_skb(nskb); return NULL; } index = bpf_prog_run_save_cb(prog, skb); -- 2.6.0.rc2.230.g3dd15c0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case 2016-01-05 15:57 [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case Craig Gallek @ 2016-01-05 16:39 ` Daniel Borkmann 2016-01-05 17:38 ` Eric Dumazet 2016-01-06 6:31 ` David Miller 2 siblings, 0 replies; 8+ messages in thread From: Daniel Borkmann @ 2016-01-05 16:39 UTC (permalink / raw) To: Craig Gallek, netdev, David Miller On 01/05/2016 04:57 PM, Craig Gallek wrote: > From: Craig Gallek <kraig@google.com> > > Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") > Suggested-by: Daniel Borkmann <daniel@iogearbox.net> > Signed-off-by: Craig Gallek <kraig@google.com> Thanks, Craig. Acked-by: Daniel Borkmann <daniel@iogearbox.net> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case 2016-01-05 15:57 [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case Craig Gallek 2016-01-05 16:39 ` Daniel Borkmann @ 2016-01-05 17:38 ` Eric Dumazet 2016-01-05 17:47 ` Craig Gallek 2016-01-06 6:31 ` David Miller 2 siblings, 1 reply; 8+ messages in thread From: Eric Dumazet @ 2016-01-05 17:38 UTC (permalink / raw) To: Craig Gallek; +Cc: netdev, David Miller On Tue, 2016-01-05 at 10:57 -0500, Craig Gallek wrote: > From: Craig Gallek <kraig@google.com> > > Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") > Suggested-by: Daniel Borkmann <daniel@iogearbox.net> > Signed-off-by: Craig Gallek <kraig@google.com> > --- > net/core/sock_reuseport.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c > index ae0969c0fc2e..1df98c557440 100644 > --- a/net/core/sock_reuseport.c > +++ b/net/core/sock_reuseport.c > @@ -173,7 +173,7 @@ static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks, > > /* temporarily advance data past protocol header */ > if (!pskb_pull(skb, hdr_len)) { > - consume_skb(nskb); > + kfree_skb(nskb); > return NULL; > } > index = bpf_prog_run_save_cb(prog, skb); Note that we always call reuseport_select_sock() after pulling the headers in skb->head anyway, so the pskb_pull() can never fail. It really could be __skb_pull() BTW, why UDP calls reuseport_select_sock() with hdr_len == 0 sometimes ? I believe the following patch is needed. diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 835378365f25..52387096dbba 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -514,7 +514,8 @@ begin: struct sock *sk2; hash = udp_ehashfn(net, daddr, hnum, saddr, sport); - sk2 = reuseport_select_sock(sk, hash, NULL, 0); + sk2 = reuseport_select_sock(sk, hash, NULL, + sizeof(struct udphdr)); if (sk2) { result = sk2; goto found; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 56fcb55fda31..da0a5fa02b0f 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -272,7 +272,8 @@ begin: struct sock *sk2; hash = udp6_ehashfn(net, daddr, hnum, saddr, sport); - sk2 = reuseport_select_sock(sk, hash, NULL, 0); + sk2 = reuseport_select_sock(sk, hash, NULL, + sizeof(struct udphdr)); if (sk2) { result = sk2; goto found; ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case 2016-01-05 17:38 ` Eric Dumazet @ 2016-01-05 17:47 ` Craig Gallek 2016-01-05 18:31 ` Eric Dumazet 0 siblings, 1 reply; 8+ messages in thread From: Craig Gallek @ 2016-01-05 17:47 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, David Miller On Tue, Jan 5, 2016 at 12:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2016-01-05 at 10:57 -0500, Craig Gallek wrote: >> From: Craig Gallek <kraig@google.com> >> >> Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") >> Suggested-by: Daniel Borkmann <daniel@iogearbox.net> >> Signed-off-by: Craig Gallek <kraig@google.com> >> --- >> net/core/sock_reuseport.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c >> index ae0969c0fc2e..1df98c557440 100644 >> --- a/net/core/sock_reuseport.c >> +++ b/net/core/sock_reuseport.c >> @@ -173,7 +173,7 @@ static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks, >> >> /* temporarily advance data past protocol header */ >> if (!pskb_pull(skb, hdr_len)) { >> - consume_skb(nskb); >> + kfree_skb(nskb); >> return NULL; >> } >> index = bpf_prog_run_save_cb(prog, skb); > > Note that we always call reuseport_select_sock() after pulling the > headers in skb->head anyway, so the pskb_pull() can never fail. > > It really could be __skb_pull() > > BTW, why UDP calls reuseport_select_sock() with hdr_len == 0 sometimes ? hdr_len only matters when you have an skb to work with. In both of the call sites of your suggested patch, NULL is passed for the skb parameter so hdr_len is never used. When no skb is available, the code falls back to the hash-based method used in the non-BPF case. > I believe the following patch is needed. > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > index 835378365f25..52387096dbba 100644 > --- a/net/ipv4/udp.c > +++ b/net/ipv4/udp.c > @@ -514,7 +514,8 @@ begin: > struct sock *sk2; > hash = udp_ehashfn(net, daddr, hnum, > saddr, sport); > - sk2 = reuseport_select_sock(sk, hash, NULL, 0); > + sk2 = reuseport_select_sock(sk, hash, NULL, > + sizeof(struct udphdr)); > if (sk2) { > result = sk2; > goto found; > diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c > index 56fcb55fda31..da0a5fa02b0f 100644 > --- a/net/ipv6/udp.c > +++ b/net/ipv6/udp.c > @@ -272,7 +272,8 @@ begin: > struct sock *sk2; > hash = udp6_ehashfn(net, daddr, hnum, > saddr, sport); > - sk2 = reuseport_select_sock(sk, hash, NULL, 0); > + sk2 = reuseport_select_sock(sk, hash, NULL, > + sizeof(struct udphdr)); > if (sk2) { > result = sk2; > goto found; > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case 2016-01-05 17:47 ` Craig Gallek @ 2016-01-05 18:31 ` Eric Dumazet 2016-01-05 19:12 ` Craig Gallek 0 siblings, 1 reply; 8+ messages in thread From: Eric Dumazet @ 2016-01-05 18:31 UTC (permalink / raw) To: Craig Gallek; +Cc: netdev, David Miller On Tue, 2016-01-05 at 12:47 -0500, Craig Gallek wrote: > On Tue, Jan 5, 2016 at 12:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > BTW, why UDP calls reuseport_select_sock() with hdr_len == 0 sometimes ? > hdr_len only matters when you have an skb to work with. In both of > the call sites of your suggested patch, NULL is passed for the skb > parameter so hdr_len is never used. When no skb is available, the > code falls back to the hash-based method used in the non-BPF case. But skb _is_ available ! We expect the BPF to always be run to get consistent selection. udp4_lib_lookup2() can be used if the hash bucket has more than 10 sockets. SO_REUSEPORT should work the same way, ie BPF filter shoulw work regardless of number of sockets in hash table. diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index ae0969c0fc2e..2b0bbecbc4b5 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -220,7 +220,7 @@ struct sock *reuseport_select_sock(struct sock *sk, /* paired with smp_wmb() in reuseport_add_sock() */ smp_rmb(); - if (prog && skb) + if (prog) sk2 = run_bpf(reuse, socks, prog, skb, hdr_len); else sk2 = reuse->socks[reciprocal_scale(hash, socks)]; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 835378365f25..3a66731e3af6 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -493,7 +493,8 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr, static struct sock *udp4_lib_lookup2(struct net *net, __be32 saddr, __be16 sport, __be32 daddr, unsigned int hnum, int dif, - struct udp_hslot *hslot2, unsigned int slot2) + struct udp_hslot *hslot2, unsigned int slot2, + struct sk_buff *skb) { struct sock *sk, *result; struct hlist_nulls_node *node; @@ -514,7 +515,8 @@ begin: struct sock *sk2; hash = udp_ehashfn(net, daddr, hnum, saddr, sport); - sk2 = reuseport_select_sock(sk, hash, NULL, 0); + sk2 = reuseport_select_sock(sk, hash, skb, + sizeof(struct udphdr)); if (sk2) { result = sk2; goto found; @@ -573,7 +575,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, result = udp4_lib_lookup2(net, saddr, sport, daddr, hnum, dif, - hslot2, slot2); + hslot2, slot2, skb); if (!result) { hash2 = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum); slot2 = hash2 & udptable->mask; @@ -583,7 +585,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, result = udp4_lib_lookup2(net, saddr, sport, htonl(INADDR_ANY), hnum, dif, - hslot2, slot2); + hslot2, slot2, skb); } rcu_read_unlock(); return result; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 56fcb55fda31..5d2c2afffe7b 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -251,7 +251,8 @@ static inline int compute_score2(struct sock *sk, struct net *net, static struct sock *udp6_lib_lookup2(struct net *net, const struct in6_addr *saddr, __be16 sport, const struct in6_addr *daddr, unsigned int hnum, int dif, - struct udp_hslot *hslot2, unsigned int slot2) + struct udp_hslot *hslot2, unsigned int slot2, + struct sk_buff *skb) { struct sock *sk, *result; struct hlist_nulls_node *node; @@ -272,7 +273,8 @@ begin: struct sock *sk2; hash = udp6_ehashfn(net, daddr, hnum, saddr, sport); - sk2 = reuseport_select_sock(sk, hash, NULL, 0); + sk2 = reuseport_select_sock(sk, hash, skb, + sizeof(struct udphdr)); if (sk2) { result = sk2; goto found; @@ -331,7 +333,7 @@ struct sock *__udp6_lib_lookup(struct net *net, result = udp6_lib_lookup2(net, saddr, sport, daddr, hnum, dif, - hslot2, slot2); + hslot2, slot2, skb); if (!result) { hash2 = udp6_portaddr_hash(net, &in6addr_any, hnum); slot2 = hash2 & udptable->mask; @@ -341,7 +343,7 @@ struct sock *__udp6_lib_lookup(struct net *net, result = udp6_lib_lookup2(net, saddr, sport, &in6addr_any, hnum, dif, - hslot2, slot2); + hslot2, slot2, skb); } rcu_read_unlock(); return result; ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case 2016-01-05 18:31 ` Eric Dumazet @ 2016-01-05 19:12 ` Craig Gallek 2016-01-05 19:34 ` Eric Dumazet 0 siblings, 1 reply; 8+ messages in thread From: Craig Gallek @ 2016-01-05 19:12 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, David Miller On Tue, Jan 5, 2016 at 1:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2016-01-05 at 12:47 -0500, Craig Gallek wrote: >> On Tue, Jan 5, 2016 at 12:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > >> > >> > BTW, why UDP calls reuseport_select_sock() with hdr_len == 0 sometimes ? >> hdr_len only matters when you have an skb to work with. In both of >> the call sites of your suggested patch, NULL is passed for the skb >> parameter so hdr_len is never used. When no skb is available, the >> code falls back to the hash-based method used in the non-BPF case. > > But skb _is_ available ! > > We expect the BPF to always be run to get consistent selection. > > udp4_lib_lookup2() can be used if the hash bucket has more than 10 > sockets. SO_REUSEPORT should work the same way, ie BPF filter shoulw > work regardless of number of sockets in hash table. OK, I can buy that an skb should be piped through udp4|6_lib_lookup2, but I don't think it's safe to remove the skb NULL check in reuseport_select_sock. There's at least one path (udp_diag.c: udp_dump_one) which does a lookup without an skb. I'll start with your patch in a separate thread and we can discuss outside the context of the kfree_skb change. > diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c > index ae0969c0fc2e..2b0bbecbc4b5 100644 > --- a/net/core/sock_reuseport.c > +++ b/net/core/sock_reuseport.c > @@ -220,7 +220,7 @@ struct sock *reuseport_select_sock(struct sock *sk, > /* paired with smp_wmb() in reuseport_add_sock() */ > smp_rmb(); > > - if (prog && skb) > + if (prog) > sk2 = run_bpf(reuse, socks, prog, skb, hdr_len); > else > sk2 = reuse->socks[reciprocal_scale(hash, socks)]; > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > index 835378365f25..3a66731e3af6 100644 > --- a/net/ipv4/udp.c > +++ b/net/ipv4/udp.c > @@ -493,7 +493,8 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr, > static struct sock *udp4_lib_lookup2(struct net *net, > __be32 saddr, __be16 sport, > __be32 daddr, unsigned int hnum, int dif, > - struct udp_hslot *hslot2, unsigned int slot2) > + struct udp_hslot *hslot2, unsigned int slot2, > + struct sk_buff *skb) > { > struct sock *sk, *result; > struct hlist_nulls_node *node; > @@ -514,7 +515,8 @@ begin: > struct sock *sk2; > hash = udp_ehashfn(net, daddr, hnum, > saddr, sport); > - sk2 = reuseport_select_sock(sk, hash, NULL, 0); > + sk2 = reuseport_select_sock(sk, hash, skb, > + sizeof(struct udphdr)); > if (sk2) { > result = sk2; > goto found; > @@ -573,7 +575,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, > > result = udp4_lib_lookup2(net, saddr, sport, > daddr, hnum, dif, > - hslot2, slot2); > + hslot2, slot2, skb); > if (!result) { > hash2 = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum); > slot2 = hash2 & udptable->mask; > @@ -583,7 +585,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, > > result = udp4_lib_lookup2(net, saddr, sport, > htonl(INADDR_ANY), hnum, dif, > - hslot2, slot2); > + hslot2, slot2, skb); > } > rcu_read_unlock(); > return result; > diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c > index 56fcb55fda31..5d2c2afffe7b 100644 > --- a/net/ipv6/udp.c > +++ b/net/ipv6/udp.c > @@ -251,7 +251,8 @@ static inline int compute_score2(struct sock *sk, struct net *net, > static struct sock *udp6_lib_lookup2(struct net *net, > const struct in6_addr *saddr, __be16 sport, > const struct in6_addr *daddr, unsigned int hnum, int dif, > - struct udp_hslot *hslot2, unsigned int slot2) > + struct udp_hslot *hslot2, unsigned int slot2, > + struct sk_buff *skb) > { > struct sock *sk, *result; > struct hlist_nulls_node *node; > @@ -272,7 +273,8 @@ begin: > struct sock *sk2; > hash = udp6_ehashfn(net, daddr, hnum, > saddr, sport); > - sk2 = reuseport_select_sock(sk, hash, NULL, 0); > + sk2 = reuseport_select_sock(sk, hash, skb, > + sizeof(struct udphdr)); > if (sk2) { > result = sk2; > goto found; > @@ -331,7 +333,7 @@ struct sock *__udp6_lib_lookup(struct net *net, > > result = udp6_lib_lookup2(net, saddr, sport, > daddr, hnum, dif, > - hslot2, slot2); > + hslot2, slot2, skb); > if (!result) { > hash2 = udp6_portaddr_hash(net, &in6addr_any, hnum); > slot2 = hash2 & udptable->mask; > @@ -341,7 +343,7 @@ struct sock *__udp6_lib_lookup(struct net *net, > > result = udp6_lib_lookup2(net, saddr, sport, > &in6addr_any, hnum, dif, > - hslot2, slot2); > + hslot2, slot2, skb); > } > rcu_read_unlock(); > return result; > > > > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case 2016-01-05 19:12 ` Craig Gallek @ 2016-01-05 19:34 ` Eric Dumazet 0 siblings, 0 replies; 8+ messages in thread From: Eric Dumazet @ 2016-01-05 19:34 UTC (permalink / raw) To: Craig Gallek; +Cc: netdev, David Miller On Tue, 2016-01-05 at 14:12 -0500, Craig Gallek wrote: > OK, I can buy that an skb should be piped through udp4|6_lib_lookup2, > but I don't think it's safe to remove the skb NULL check in > reuseport_select_sock. There's at least one path (udp_diag.c: > udp_dump_one) which does a lookup without an skb. I'll start with > your patch in a separate thread and we can discuss outside the context > of the kfree_skb change. Well, udp_dump_one() is broken vs SO_REUSEPORT. Fortunately iproute2 does not use it (yet) We need to to get the socket designated by req->id.idiag_cookie, ie ignoring reusport hash or BPF completely. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case 2016-01-05 15:57 [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case Craig Gallek 2016-01-05 16:39 ` Daniel Borkmann 2016-01-05 17:38 ` Eric Dumazet @ 2016-01-06 6:31 ` David Miller 2 siblings, 0 replies; 8+ messages in thread From: David Miller @ 2016-01-06 6:31 UTC (permalink / raw) To: kraigatgoog; +Cc: netdev From: Craig Gallek <kraigatgoog@gmail.com> Date: Tue, 5 Jan 2016 10:57:13 -0500 > From: Craig Gallek <kraig@google.com> > > Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") > Suggested-by: Daniel Borkmann <daniel@iogearbox.net> > Signed-off-by: Craig Gallek <kraig@google.com> Also applied, thanks again Craig. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-01-06 6:31 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-01-05 15:57 [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case Craig Gallek 2016-01-05 16:39 ` Daniel Borkmann 2016-01-05 17:38 ` Eric Dumazet 2016-01-05 17:47 ` Craig Gallek 2016-01-05 18:31 ` Eric Dumazet 2016-01-05 19:12 ` Craig Gallek 2016-01-05 19:34 ` Eric Dumazet 2016-01-06 6:31 ` David Miller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.