* [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case
@ 2016-01-05 15:57 Craig Gallek
2016-01-05 16:39 ` Daniel Borkmann
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Craig Gallek @ 2016-01-05 15:57 UTC (permalink / raw)
To: netdev, David Miller
From: Craig Gallek <kraig@google.com>
Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Craig Gallek <kraig@google.com>
---
net/core/sock_reuseport.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index ae0969c0fc2e..1df98c557440 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -173,7 +173,7 @@ static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks,
/* temporarily advance data past protocol header */
if (!pskb_pull(skb, hdr_len)) {
- consume_skb(nskb);
+ kfree_skb(nskb);
return NULL;
}
index = bpf_prog_run_save_cb(prog, skb);
--
2.6.0.rc2.230.g3dd15c0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case
2016-01-05 15:57 [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case Craig Gallek
@ 2016-01-05 16:39 ` Daniel Borkmann
2016-01-05 17:38 ` Eric Dumazet
2016-01-06 6:31 ` David Miller
2 siblings, 0 replies; 8+ messages in thread
From: Daniel Borkmann @ 2016-01-05 16:39 UTC (permalink / raw)
To: Craig Gallek, netdev, David Miller
On 01/05/2016 04:57 PM, Craig Gallek wrote:
> From: Craig Gallek <kraig@google.com>
>
> Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Craig Gallek <kraig@google.com>
Thanks, Craig.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case
2016-01-05 15:57 [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case Craig Gallek
2016-01-05 16:39 ` Daniel Borkmann
@ 2016-01-05 17:38 ` Eric Dumazet
2016-01-05 17:47 ` Craig Gallek
2016-01-06 6:31 ` David Miller
2 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2016-01-05 17:38 UTC (permalink / raw)
To: Craig Gallek; +Cc: netdev, David Miller
On Tue, 2016-01-05 at 10:57 -0500, Craig Gallek wrote:
> From: Craig Gallek <kraig@google.com>
>
> Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Craig Gallek <kraig@google.com>
> ---
> net/core/sock_reuseport.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
> index ae0969c0fc2e..1df98c557440 100644
> --- a/net/core/sock_reuseport.c
> +++ b/net/core/sock_reuseport.c
> @@ -173,7 +173,7 @@ static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks,
>
> /* temporarily advance data past protocol header */
> if (!pskb_pull(skb, hdr_len)) {
> - consume_skb(nskb);
> + kfree_skb(nskb);
> return NULL;
> }
> index = bpf_prog_run_save_cb(prog, skb);
Note that we always call reuseport_select_sock() after pulling the
headers in skb->head anyway, so the pskb_pull() can never fail.
It really could be __skb_pull()
BTW, why UDP calls reuseport_select_sock() with hdr_len == 0 sometimes ?
I believe the following patch is needed.
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 835378365f25..52387096dbba 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -514,7 +514,8 @@ begin:
struct sock *sk2;
hash = udp_ehashfn(net, daddr, hnum,
saddr, sport);
- sk2 = reuseport_select_sock(sk, hash, NULL, 0);
+ sk2 = reuseport_select_sock(sk, hash, NULL,
+ sizeof(struct udphdr));
if (sk2) {
result = sk2;
goto found;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 56fcb55fda31..da0a5fa02b0f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -272,7 +272,8 @@ begin:
struct sock *sk2;
hash = udp6_ehashfn(net, daddr, hnum,
saddr, sport);
- sk2 = reuseport_select_sock(sk, hash, NULL, 0);
+ sk2 = reuseport_select_sock(sk, hash, NULL,
+ sizeof(struct udphdr));
if (sk2) {
result = sk2;
goto found;
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case
2016-01-05 17:38 ` Eric Dumazet
@ 2016-01-05 17:47 ` Craig Gallek
2016-01-05 18:31 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Craig Gallek @ 2016-01-05 17:47 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, David Miller
On Tue, Jan 5, 2016 at 12:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2016-01-05 at 10:57 -0500, Craig Gallek wrote:
>> From: Craig Gallek <kraig@google.com>
>>
>> Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
>> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
>> Signed-off-by: Craig Gallek <kraig@google.com>
>> ---
>> net/core/sock_reuseport.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
>> index ae0969c0fc2e..1df98c557440 100644
>> --- a/net/core/sock_reuseport.c
>> +++ b/net/core/sock_reuseport.c
>> @@ -173,7 +173,7 @@ static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks,
>>
>> /* temporarily advance data past protocol header */
>> if (!pskb_pull(skb, hdr_len)) {
>> - consume_skb(nskb);
>> + kfree_skb(nskb);
>> return NULL;
>> }
>> index = bpf_prog_run_save_cb(prog, skb);
>
> Note that we always call reuseport_select_sock() after pulling the
> headers in skb->head anyway, so the pskb_pull() can never fail.
>
> It really could be __skb_pull()
>
> BTW, why UDP calls reuseport_select_sock() with hdr_len == 0 sometimes ?
hdr_len only matters when you have an skb to work with. In both of
the call sites of your suggested patch, NULL is passed for the skb
parameter so hdr_len is never used. When no skb is available, the
code falls back to the hash-based method used in the non-BPF case.
> I believe the following patch is needed.
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 835378365f25..52387096dbba 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -514,7 +514,8 @@ begin:
> struct sock *sk2;
> hash = udp_ehashfn(net, daddr, hnum,
> saddr, sport);
> - sk2 = reuseport_select_sock(sk, hash, NULL, 0);
> + sk2 = reuseport_select_sock(sk, hash, NULL,
> + sizeof(struct udphdr));
> if (sk2) {
> result = sk2;
> goto found;
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 56fcb55fda31..da0a5fa02b0f 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -272,7 +272,8 @@ begin:
> struct sock *sk2;
> hash = udp6_ehashfn(net, daddr, hnum,
> saddr, sport);
> - sk2 = reuseport_select_sock(sk, hash, NULL, 0);
> + sk2 = reuseport_select_sock(sk, hash, NULL,
> + sizeof(struct udphdr));
> if (sk2) {
> result = sk2;
> goto found;
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case
2016-01-05 17:47 ` Craig Gallek
@ 2016-01-05 18:31 ` Eric Dumazet
2016-01-05 19:12 ` Craig Gallek
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2016-01-05 18:31 UTC (permalink / raw)
To: Craig Gallek; +Cc: netdev, David Miller
On Tue, 2016-01-05 at 12:47 -0500, Craig Gallek wrote:
> On Tue, Jan 5, 2016 at 12:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > BTW, why UDP calls reuseport_select_sock() with hdr_len == 0 sometimes ?
> hdr_len only matters when you have an skb to work with. In both of
> the call sites of your suggested patch, NULL is passed for the skb
> parameter so hdr_len is never used. When no skb is available, the
> code falls back to the hash-based method used in the non-BPF case.
But skb _is_ available !
We expect the BPF to always be run to get consistent selection.
udp4_lib_lookup2() can be used if the hash bucket has more than 10
sockets. SO_REUSEPORT should work the same way, ie BPF filter shoulw
work regardless of number of sockets in hash table.
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index ae0969c0fc2e..2b0bbecbc4b5 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -220,7 +220,7 @@ struct sock *reuseport_select_sock(struct sock *sk,
/* paired with smp_wmb() in reuseport_add_sock() */
smp_rmb();
- if (prog && skb)
+ if (prog)
sk2 = run_bpf(reuse, socks, prog, skb, hdr_len);
else
sk2 = reuse->socks[reciprocal_scale(hash, socks)];
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 835378365f25..3a66731e3af6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -493,7 +493,8 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr,
static struct sock *udp4_lib_lookup2(struct net *net,
__be32 saddr, __be16 sport,
__be32 daddr, unsigned int hnum, int dif,
- struct udp_hslot *hslot2, unsigned int slot2)
+ struct udp_hslot *hslot2, unsigned int slot2,
+ struct sk_buff *skb)
{
struct sock *sk, *result;
struct hlist_nulls_node *node;
@@ -514,7 +515,8 @@ begin:
struct sock *sk2;
hash = udp_ehashfn(net, daddr, hnum,
saddr, sport);
- sk2 = reuseport_select_sock(sk, hash, NULL, 0);
+ sk2 = reuseport_select_sock(sk, hash, skb,
+ sizeof(struct udphdr));
if (sk2) {
result = sk2;
goto found;
@@ -573,7 +575,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
result = udp4_lib_lookup2(net, saddr, sport,
daddr, hnum, dif,
- hslot2, slot2);
+ hslot2, slot2, skb);
if (!result) {
hash2 = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
slot2 = hash2 & udptable->mask;
@@ -583,7 +585,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
result = udp4_lib_lookup2(net, saddr, sport,
htonl(INADDR_ANY), hnum, dif,
- hslot2, slot2);
+ hslot2, slot2, skb);
}
rcu_read_unlock();
return result;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 56fcb55fda31..5d2c2afffe7b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -251,7 +251,8 @@ static inline int compute_score2(struct sock *sk, struct net *net,
static struct sock *udp6_lib_lookup2(struct net *net,
const struct in6_addr *saddr, __be16 sport,
const struct in6_addr *daddr, unsigned int hnum, int dif,
- struct udp_hslot *hslot2, unsigned int slot2)
+ struct udp_hslot *hslot2, unsigned int slot2,
+ struct sk_buff *skb)
{
struct sock *sk, *result;
struct hlist_nulls_node *node;
@@ -272,7 +273,8 @@ begin:
struct sock *sk2;
hash = udp6_ehashfn(net, daddr, hnum,
saddr, sport);
- sk2 = reuseport_select_sock(sk, hash, NULL, 0);
+ sk2 = reuseport_select_sock(sk, hash, skb,
+ sizeof(struct udphdr));
if (sk2) {
result = sk2;
goto found;
@@ -331,7 +333,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
result = udp6_lib_lookup2(net, saddr, sport,
daddr, hnum, dif,
- hslot2, slot2);
+ hslot2, slot2, skb);
if (!result) {
hash2 = udp6_portaddr_hash(net, &in6addr_any, hnum);
slot2 = hash2 & udptable->mask;
@@ -341,7 +343,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
result = udp6_lib_lookup2(net, saddr, sport,
&in6addr_any, hnum, dif,
- hslot2, slot2);
+ hslot2, slot2, skb);
}
rcu_read_unlock();
return result;
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case
2016-01-05 18:31 ` Eric Dumazet
@ 2016-01-05 19:12 ` Craig Gallek
2016-01-05 19:34 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Craig Gallek @ 2016-01-05 19:12 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, David Miller
On Tue, Jan 5, 2016 at 1:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2016-01-05 at 12:47 -0500, Craig Gallek wrote:
>> On Tue, Jan 5, 2016 at 12:38 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> >
>> > BTW, why UDP calls reuseport_select_sock() with hdr_len == 0 sometimes ?
>> hdr_len only matters when you have an skb to work with. In both of
>> the call sites of your suggested patch, NULL is passed for the skb
>> parameter so hdr_len is never used. When no skb is available, the
>> code falls back to the hash-based method used in the non-BPF case.
>
> But skb _is_ available !
>
> We expect the BPF to always be run to get consistent selection.
>
> udp4_lib_lookup2() can be used if the hash bucket has more than 10
> sockets. SO_REUSEPORT should work the same way, ie BPF filter shoulw
> work regardless of number of sockets in hash table.
OK, I can buy that an skb should be piped through udp4|6_lib_lookup2,
but I don't think it's safe to remove the skb NULL check in
reuseport_select_sock. There's at least one path (udp_diag.c:
udp_dump_one) which does a lookup without an skb. I'll start with
your patch in a separate thread and we can discuss outside the context
of the kfree_skb change.
> diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
> index ae0969c0fc2e..2b0bbecbc4b5 100644
> --- a/net/core/sock_reuseport.c
> +++ b/net/core/sock_reuseport.c
> @@ -220,7 +220,7 @@ struct sock *reuseport_select_sock(struct sock *sk,
> /* paired with smp_wmb() in reuseport_add_sock() */
> smp_rmb();
>
> - if (prog && skb)
> + if (prog)
> sk2 = run_bpf(reuse, socks, prog, skb, hdr_len);
> else
> sk2 = reuse->socks[reciprocal_scale(hash, socks)];
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 835378365f25..3a66731e3af6 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -493,7 +493,8 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr,
> static struct sock *udp4_lib_lookup2(struct net *net,
> __be32 saddr, __be16 sport,
> __be32 daddr, unsigned int hnum, int dif,
> - struct udp_hslot *hslot2, unsigned int slot2)
> + struct udp_hslot *hslot2, unsigned int slot2,
> + struct sk_buff *skb)
> {
> struct sock *sk, *result;
> struct hlist_nulls_node *node;
> @@ -514,7 +515,8 @@ begin:
> struct sock *sk2;
> hash = udp_ehashfn(net, daddr, hnum,
> saddr, sport);
> - sk2 = reuseport_select_sock(sk, hash, NULL, 0);
> + sk2 = reuseport_select_sock(sk, hash, skb,
> + sizeof(struct udphdr));
> if (sk2) {
> result = sk2;
> goto found;
> @@ -573,7 +575,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
>
> result = udp4_lib_lookup2(net, saddr, sport,
> daddr, hnum, dif,
> - hslot2, slot2);
> + hslot2, slot2, skb);
> if (!result) {
> hash2 = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
> slot2 = hash2 & udptable->mask;
> @@ -583,7 +585,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
>
> result = udp4_lib_lookup2(net, saddr, sport,
> htonl(INADDR_ANY), hnum, dif,
> - hslot2, slot2);
> + hslot2, slot2, skb);
> }
> rcu_read_unlock();
> return result;
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 56fcb55fda31..5d2c2afffe7b 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -251,7 +251,8 @@ static inline int compute_score2(struct sock *sk, struct net *net,
> static struct sock *udp6_lib_lookup2(struct net *net,
> const struct in6_addr *saddr, __be16 sport,
> const struct in6_addr *daddr, unsigned int hnum, int dif,
> - struct udp_hslot *hslot2, unsigned int slot2)
> + struct udp_hslot *hslot2, unsigned int slot2,
> + struct sk_buff *skb)
> {
> struct sock *sk, *result;
> struct hlist_nulls_node *node;
> @@ -272,7 +273,8 @@ begin:
> struct sock *sk2;
> hash = udp6_ehashfn(net, daddr, hnum,
> saddr, sport);
> - sk2 = reuseport_select_sock(sk, hash, NULL, 0);
> + sk2 = reuseport_select_sock(sk, hash, skb,
> + sizeof(struct udphdr));
> if (sk2) {
> result = sk2;
> goto found;
> @@ -331,7 +333,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
>
> result = udp6_lib_lookup2(net, saddr, sport,
> daddr, hnum, dif,
> - hslot2, slot2);
> + hslot2, slot2, skb);
> if (!result) {
> hash2 = udp6_portaddr_hash(net, &in6addr_any, hnum);
> slot2 = hash2 & udptable->mask;
> @@ -341,7 +343,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
>
> result = udp6_lib_lookup2(net, saddr, sport,
> &in6addr_any, hnum, dif,
> - hslot2, slot2);
> + hslot2, slot2, skb);
> }
> rcu_read_unlock();
> return result;
>
>
>
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case
2016-01-05 19:12 ` Craig Gallek
@ 2016-01-05 19:34 ` Eric Dumazet
0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2016-01-05 19:34 UTC (permalink / raw)
To: Craig Gallek; +Cc: netdev, David Miller
On Tue, 2016-01-05 at 14:12 -0500, Craig Gallek wrote:
> OK, I can buy that an skb should be piped through udp4|6_lib_lookup2,
> but I don't think it's safe to remove the skb NULL check in
> reuseport_select_sock. There's at least one path (udp_diag.c:
> udp_dump_one) which does a lookup without an skb. I'll start with
> your patch in a separate thread and we can discuss outside the context
> of the kfree_skb change.
Well, udp_dump_one() is broken vs SO_REUSEPORT. Fortunately iproute2
does not use it (yet)
We need to to get the socket designated by req->id.idiag_cookie,
ie ignoring reusport hash or BPF completely.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case
2016-01-05 15:57 [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case Craig Gallek
2016-01-05 16:39 ` Daniel Borkmann
2016-01-05 17:38 ` Eric Dumazet
@ 2016-01-06 6:31 ` David Miller
2 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2016-01-06 6:31 UTC (permalink / raw)
To: kraigatgoog; +Cc: netdev
From: Craig Gallek <kraigatgoog@gmail.com>
Date: Tue, 5 Jan 2016 10:57:13 -0500
> From: Craig Gallek <kraig@google.com>
>
> Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Craig Gallek <kraig@google.com>
Also applied, thanks again Craig.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-01-06 6:31 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-05 15:57 [PATCH net-next] soreuseport: change consume_skb to kfree_skb in error case Craig Gallek
2016-01-05 16:39 ` Daniel Borkmann
2016-01-05 17:38 ` Eric Dumazet
2016-01-05 17:47 ` Craig Gallek
2016-01-05 18:31 ` Eric Dumazet
2016-01-05 19:12 ` Craig Gallek
2016-01-05 19:34 ` Eric Dumazet
2016-01-06 6:31 ` David Miller
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.