* [PATCH net-next] bpf: add support for SO_PRIORITY in bpf_getsockopt

From: Vlad Dumitrescu <vladum@google.com>
Date: 2017-11-09 23:04 UTC
To: davem, ast, daniel, brakmo
Cc: netdev, kraigatgoog

Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.

Signed-off-by: Vlad Dumitrescu <vladum@google.com>
---
 net/core/filter.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 1afa17935954..61c791f9f628 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3292,8 +3292,20 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	if (!sk_fullsock(sk))
 		goto err_clear;

+	if (level == SOL_SOCKET) {
+		if (optlen != sizeof(int))
+			goto err_clear;
+
+		switch (optname) {
+		case SO_PRIORITY:
+			*((int *)optval) = sk->sk_priority;
+			break;
+		default:
+			goto err_clear;
+		}
 #ifdef CONFIG_INET
-	if (level == SOL_TCP && sk->sk_prot->getsockopt == tcp_getsockopt) {
+	} else if (level == SOL_TCP &&
+		   sk->sk_prot->getsockopt == tcp_getsockopt) {
 		if (optname == TCP_CONGESTION) {
 			struct inet_connection_sock *icsk = inet_csk(sk);

@@ -3304,11 +3316,11 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	} else {
 		goto err_clear;
 	}
+#endif
 	} else {
 		goto err_clear;
 	}
 	return 0;
-#endif
 err_clear:
 	memset(optval, 0, optlen);
 	return -EINVAL;
--
2.15.0.448.gf294e3d99a-goog
* Re: [PATCH net-next] bpf: add support for SO_PRIORITY in bpf_getsockopt

From: Alexei Starovoitov <ast@fb.com>
Date: 2017-11-10 0:43 UTC
To: Vlad Dumitrescu, davem, daniel, brakmo
Cc: netdev, kraigatgoog

On 11/10/17 8:04 AM, Vlad Dumitrescu wrote:
> From: Vlad Dumitrescu <vladum@google.com>
>
> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>
> Signed-off-by: Vlad Dumitrescu <vladum@google.com>
> ---
>  net/core/filter.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 1afa17935954..61c791f9f628 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3292,8 +3292,20 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
>  	if (!sk_fullsock(sk))
>  		goto err_clear;
>
> +	if (level == SOL_SOCKET) {
> +		if (optlen != sizeof(int))
> +			goto err_clear;
> +
> +		switch (optname) {
> +		case SO_PRIORITY:
> +			*((int *)optval) = sk->sk_priority;

would be cleaner to add sk_priority to 'struct bpf_sock_ops' instead.
Faster runtime too.
* Re: [PATCH net-next] bpf: add support for SO_PRIORITY in bpf_getsockopt

From: Vlad Dumitrescu
Date: 2017-11-10 17:51 UTC
To: Alexei Starovoitov, davem, daniel, brakmo
Cc: netdev, kraigatgoog

On Thu, Nov 9, 2017 at 4:43 PM, Alexei Starovoitov <ast@fb.com> wrote:
> On 11/10/17 8:04 AM, Vlad Dumitrescu wrote:
>>
>> From: Vlad Dumitrescu <vladum@google.com>
>>
>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>>
>> Signed-off-by: Vlad Dumitrescu <vladum@google.com>
>> ---
>>  net/core/filter.c | 16 ++++++++++++++--
>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 1afa17935954..61c791f9f628 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -3292,8 +3292,20 @@ BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
>>  	if (!sk_fullsock(sk))
>>  		goto err_clear;
>>
>> +	if (level == SOL_SOCKET) {
>> +		if (optlen != sizeof(int))
>> +			goto err_clear;
>> +
>> +		switch (optname) {
>> +		case SO_PRIORITY:
>> +			*((int *)optval) = sk->sk_priority;
>
> would be cleaner to add sk_priority to 'struct bpf_sock_ops' instead.
> Faster runtime too.

I agree it will be faster, and I considered that as well. However, I
was aiming for consistency with the set function, which supports
SO_PRIORITY. Maybe both (I have no preference)?

I'll prepare the patch for bpf_sock_ops in the meantime.
* [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops

From: Vlad Dumitrescu <vladum@google.com>
Date: 2017-11-10 19:17 UTC
To: davem, ast, daniel, brakmo
Cc: netdev, kraigatgoog

Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.

Signed-off-by: Vlad Dumitrescu <vladum@google.com>
---
 include/uapi/linux/bpf.h       |  1 +
 net/core/filter.c              | 11 +++++++++++
 tools/include/uapi/linux/bpf.h |  1 +
 3 files changed, 13 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e880ae6434ee..9757a2002513 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -947,6 +947,7 @@ struct bpf_sock_ops {
 	__u32 local_ip6[4];	/* Stored in network byte order */
 	__u32 remote_port;	/* Stored in network byte order */
 	__u32 local_port;	/* stored in host byte order */
+	__u32 priority;
 };

 /* List of known BPF sock_ops operators.
diff --git a/net/core/filter.c b/net/core/filter.c
index 61c791f9f628..a6329642d047 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
 				      offsetof(struct sock_common, skc_num));
 		break;
+
+	case offsetof(struct bpf_sock_ops, priority):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
+
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_sock_ops_kern, sk),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sock_ops_kern, sk));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct sock, sk_priority));
+		break;
 	}
 	return insn - insn_buf;
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e880ae6434ee..9757a2002513 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -947,6 +947,7 @@ struct bpf_sock_ops {
 	__u32 local_ip6[4];	/* Stored in network byte order */
 	__u32 remote_port;	/* Stored in network byte order */
 	__u32 local_port;	/* stored in host byte order */
+	__u32 priority;
 };

 /* List of known BPF sock_ops operators.
--
2.15.0.448.gf294e3d99a-goog
* Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops

From: Daniel Borkmann
Date: 2017-11-10 21:07 UTC
To: Vlad Dumitrescu, davem, ast, brakmo
Cc: netdev, kraigatgoog

On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
> From: Vlad Dumitrescu <vladum@google.com>
>
> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>
> Signed-off-by: Vlad Dumitrescu <vladum@google.com>
> ---
>  include/uapi/linux/bpf.h       |  1 +
>  net/core/filter.c              | 11 +++++++++++
>  tools/include/uapi/linux/bpf.h |  1 +
>  3 files changed, 13 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index e880ae6434ee..9757a2002513 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>  	__u32 local_ip6[4];	/* Stored in network byte order */
>  	__u32 remote_port;	/* Stored in network byte order */
>  	__u32 local_port;	/* stored in host byte order */
> +	__u32 priority;
>  };
>
>  /* List of known BPF sock_ops operators.
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 61c791f9f628..a6329642d047 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
>  		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>  				      offsetof(struct sock_common, skc_num));
>  		break;
> +
> +	case offsetof(struct bpf_sock_ops, priority):
> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
> +
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +						struct bpf_sock_ops_kern, sk),
> +				      si->dst_reg, si->src_reg,
> +				      offsetof(struct bpf_sock_ops_kern, sk));
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> +				      offsetof(struct sock, sk_priority));
> +		break;

Hm, I don't think this would work, I actually think your initial patch was ok.
bpf_setsockopt() as well as bpf_getsockopt() check for sk_fullsock(sk) right
before accessing options on either socket or TCP level, and bail out with error
otherwise; in such cases we'd read something else here and assume it's
sk_priority.

>  	}
>  	return insn - insn_buf;
>  }
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index e880ae6434ee..9757a2002513 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>  	__u32 local_ip6[4];	/* Stored in network byte order */
>  	__u32 remote_port;	/* Stored in network byte order */
>  	__u32 local_port;	/* stored in host byte order */
> +	__u32 priority;
>  };
>
>  /* List of known BPF sock_ops operators.
* Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops

From: Alexei Starovoitov
Date: 2017-11-11 4:06 UTC
To: Daniel Borkmann, Vlad Dumitrescu, davem, brakmo
Cc: netdev, kraigatgoog

On 11/11/17 6:07 AM, Daniel Borkmann wrote:
> On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
>> From: Vlad Dumitrescu <vladum@google.com>
>>
>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>>
>> Signed-off-by: Vlad Dumitrescu <vladum@google.com>
>> ---
>>  include/uapi/linux/bpf.h       |  1 +
>>  net/core/filter.c              | 11 +++++++++++
>>  tools/include/uapi/linux/bpf.h |  1 +
>>  3 files changed, 13 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index e880ae6434ee..9757a2002513 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>>  	__u32 local_ip6[4];	/* Stored in network byte order */
>>  	__u32 remote_port;	/* Stored in network byte order */
>>  	__u32 local_port;	/* stored in host byte order */
>> +	__u32 priority;
>>  };
>>
>>  /* List of known BPF sock_ops operators.
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 61c791f9f628..a6329642d047 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum
>> bpf_access_type type,
>>  		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>>  				      offsetof(struct sock_common, skc_num));
>>  		break;
>> +
>> +	case offsetof(struct bpf_sock_ops, priority):
>> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
>> +
>> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
>> +						struct bpf_sock_ops_kern, sk),
>> +				      si->dst_reg, si->src_reg,
>> +				      offsetof(struct bpf_sock_ops_kern, sk));
>> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
>> +				      offsetof(struct sock, sk_priority));
>> +		break;
>
> Hm, I don't think this would work, I actually think your initial patch
> was ok.
> bpf_setsockopt() as well as bpf_getsockopt() check for sk_fullsock(sk)
> right
> before accessing options on either socket or TCP level, and bail out
> with error
> otherwise; in such cases we'd read something else here and assume it's
> sk_priority.

even if it's not fullsock, it will just read zero, no? what's a problem
with that?
In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
the program author will know that it's meaningless to read sk_priority,
so returning zero with minimal checks is fine.
While adding extra runtime if (sk_fullsock(sk)) is unnecessary,
since the safety is not compromised.
* Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops

From: Daniel Borkmann
Date: 2017-11-11 20:46 UTC
To: Alexei Starovoitov, Vlad Dumitrescu, davem, brakmo
Cc: netdev, kraigatgoog

On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:
> On 11/11/17 6:07 AM, Daniel Borkmann wrote:
>> On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
>>> From: Vlad Dumitrescu <vladum@google.com>
>>>
>>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>>>
>>> Signed-off-by: Vlad Dumitrescu <vladum@google.com>
>>> ---
>>>  include/uapi/linux/bpf.h       |  1 +
>>>  net/core/filter.c              | 11 +++++++++++
>>>  tools/include/uapi/linux/bpf.h |  1 +
>>>  3 files changed, 13 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index e880ae6434ee..9757a2002513 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>>>  	__u32 local_ip6[4];	/* Stored in network byte order */
>>>  	__u32 remote_port;	/* Stored in network byte order */
>>>  	__u32 local_port;	/* stored in host byte order */
>>> +	__u32 priority;
>>>  };
>>>
>>>  /* List of known BPF sock_ops operators.
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 61c791f9f628..a6329642d047 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum
>>> bpf_access_type type,
>>>  		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>>>  				      offsetof(struct sock_common, skc_num));
>>>  		break;
>>> +
>>> +	case offsetof(struct bpf_sock_ops, priority):
>>> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
>>> +
>>> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
>>> +						struct bpf_sock_ops_kern, sk),
>>> +				      si->dst_reg, si->src_reg,
>>> +				      offsetof(struct bpf_sock_ops_kern, sk));
>>> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
>>> +				      offsetof(struct sock, sk_priority));
>>> +		break;
>>
>> Hm, I don't think this would work, I actually think your initial patch
>> was ok.
>> bpf_setsockopt() as well as bpf_getsockopt() check for sk_fullsock(sk)
>> right
>> before accessing options on either socket or TCP level, and bail out
>> with error
>> otherwise; in such cases we'd read something else here and assume it's
>> sk_priority.
>
> even if it's not fullsock, it will just read zero, no? what's a problem
> with that?
> In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
> the program author will know that it's meaningless to read sk_priority,
> so returning zero with minimal checks is fine.
> While adding extra runtime if (sk_fullsock(sk)) is unnecessary,
> since the safety is not compromised.

Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440,
struct request_sock itself is only 232 byte long in total, and the struct
inet_timewait_sock is 208 byte long, so you'd be accessing out of bounds
that way, so it cannot be ignored and assumed zero. If we don't care about
error when !fullsock, then you could code the sk_fullsock(sk) check in BPF
itself above in the ctx conversion, and set it to 0 manually when !fullsock.
It might make it harder in the future to change sk_fullsock() itself, but
in any case sk_fullsock() helper should get a comment in its function saying
that when contents are changed, also above BPF bits need to be adjusted to
remain an equivalent test.
* Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops

From: Alexei Starovoitov
Date: 2017-11-11 22:38 UTC
To: Daniel Borkmann, Vlad Dumitrescu, davem, brakmo
Cc: netdev, kraigatgoog

On 11/12/17 4:46 AM, Daniel Borkmann wrote:
> On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:
>> On 11/11/17 6:07 AM, Daniel Borkmann wrote:
>>> On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
>>>> From: Vlad Dumitrescu <vladum@google.com>
>>>>
>>>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>>>>
>>>> Signed-off-by: Vlad Dumitrescu <vladum@google.com>
>>>> ---
>>>>  include/uapi/linux/bpf.h       |  1 +
>>>>  net/core/filter.c              | 11 +++++++++++
>>>>  tools/include/uapi/linux/bpf.h |  1 +
>>>>  3 files changed, 13 insertions(+)
>>>>
>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>> index e880ae6434ee..9757a2002513 100644
>>>> --- a/include/uapi/linux/bpf.h
>>>> +++ b/include/uapi/linux/bpf.h
>>>> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>>>>  	__u32 local_ip6[4];	/* Stored in network byte order */
>>>>  	__u32 remote_port;	/* Stored in network byte order */
>>>>  	__u32 local_port;	/* stored in host byte order */
>>>> +	__u32 priority;
>>>>  };
>>>>
>>>>  /* List of known BPF sock_ops operators.
>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>> index 61c791f9f628..a6329642d047 100644
>>>> --- a/net/core/filter.c
>>>> +++ b/net/core/filter.c
>>>> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum
>>>> bpf_access_type type,
>>>>  		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>>>>  				      offsetof(struct sock_common, skc_num));
>>>>  		break;
>>>> +
>>>> +	case offsetof(struct bpf_sock_ops, priority):
>>>> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
>>>> +
>>>> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
>>>> +						struct bpf_sock_ops_kern, sk),
>>>> +				      si->dst_reg, si->src_reg,
>>>> +				      offsetof(struct bpf_sock_ops_kern, sk));
>>>> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
>>>> +				      offsetof(struct sock, sk_priority));
>>>> +		break;
>>>
>>> Hm, I don't think this would work, I actually think your initial patch
>>> was ok.
>>> bpf_setsockopt() as well as bpf_getsockopt() check for sk_fullsock(sk)
>>> right
>>> before accessing options on either socket or TCP level, and bail out
>>> with error
>>> otherwise; in such cases we'd read something else here and assume it's
>>> sk_priority.
>>
>> even if it's not fullsock, it will just read zero, no? what's a problem
>> with that?
>> In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
>> the program author will know that it's meaningless to read sk_priority,
>> so returning zero with minimal checks is fine.
>> While adding extra runtime if (sk_fullsock(sk)) is unnecessary,
>> since the safety is not compromised.
>
> Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440,
> struct request_sock itself is only 232 byte long in total, and the struct
> inet_timewait_sock is 208 byte long, so you'd be accessing out of bounds
> that way, so it cannot be ignored and assumed zero.

I thought we always pass a fully allocated sock, but technically it's not
fullsock yet. My mistake. We do: tcp_timeout_init((struct sock *)req)
so yeah, the ctx rewrite approach won't work.
Let's go back to access via helper.
* Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops

From: Vlad Dumitrescu
Date: 2017-11-13 19:00 UTC
To: Alexei Starovoitov, Daniel Borkmann, brakmo, davem
Cc: netdev, Craig Gallek

On Sat, Nov 11, 2017 at 2:38 PM, Alexei Starovoitov <ast@fb.com> wrote:
> On 11/12/17 4:46 AM, Daniel Borkmann wrote:
>> On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:
>>> On 11/11/17 6:07 AM, Daniel Borkmann wrote:
>>>> On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
>>>>> From: Vlad Dumitrescu <vladum@google.com>
>>>>>
>>>>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>>>>>
>>>>> Signed-off-by: Vlad Dumitrescu <vladum@google.com>
>>>>> ---
>>>>>  include/uapi/linux/bpf.h       |  1 +
>>>>>  net/core/filter.c              | 11 +++++++++++
>>>>>  tools/include/uapi/linux/bpf.h |  1 +
>>>>>  3 files changed, 13 insertions(+)
>>>>>
>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>>> index e880ae6434ee..9757a2002513 100644
>>>>> --- a/include/uapi/linux/bpf.h
>>>>> +++ b/include/uapi/linux/bpf.h
>>>>> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>>>>>  	__u32 local_ip6[4];	/* Stored in network byte order */
>>>>>  	__u32 remote_port;	/* Stored in network byte order */
>>>>>  	__u32 local_port;	/* stored in host byte order */
>>>>> +	__u32 priority;
>>>>>  };
>>>>>
>>>>>  /* List of known BPF sock_ops operators.
>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>> index 61c791f9f628..a6329642d047 100644
>>>>> --- a/net/core/filter.c
>>>>> +++ b/net/core/filter.c
>>>>> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum
>>>>> bpf_access_type type,
>>>>>  		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>>>>>  				      offsetof(struct sock_common, skc_num));
>>>>>  		break;
>>>>> +
>>>>> +	case offsetof(struct bpf_sock_ops, priority):
>>>>> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
>>>>> +
>>>>> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
>>>>> +						struct bpf_sock_ops_kern, sk),
>>>>> +				      si->dst_reg, si->src_reg,
>>>>> +				      offsetof(struct bpf_sock_ops_kern, sk));
>>>>> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
>>>>> +				      offsetof(struct sock, sk_priority));
>>>>> +		break;
>>>>
>>>> Hm, I don't think this would work, I actually think your initial patch
>>>> was ok.
>>>> bpf_setsockopt() as well as bpf_getsockopt() check for sk_fullsock(sk)
>>>> right
>>>> before accessing options on either socket or TCP level, and bail out
>>>> with error
>>>> otherwise; in such cases we'd read something else here and assume it's
>>>> sk_priority.
>>>
>>> even if it's not fullsock, it will just read zero, no? what's a problem
>>> with that?
>>> In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
>>> the program author will know that it's meaningless to read sk_priority,
>>> so returning zero with minimal checks is fine.
>>> While adding extra runtime if (sk_fullsock(sk)) is unnecessary,
>>> since the safety is not compromised.
>>
>> Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440,
>> struct request_sock itself is only 232 byte long in total, and the struct
>> inet_timewait_sock is 208 byte long, so you'd be accessing out of bounds
>> that way, so it cannot be ignored and assumed zero.
>
> I thought we always pass a fully allocated sock, but technically it's not
> fullsock yet. My mistake. We do: tcp_timeout_init((struct sock *)req)
> so yeah, the ctx rewrite approach won't work.
> Let's go back to access via helper.

TIL. Thanks!

Is there anything else needed from me to get the helper approach accepted?
* Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops

From: Lawrence Brakmo
Date: 2017-11-13 20:09 UTC
To: Vlad Dumitrescu, Alexei Starovoitov, Daniel Borkmann, davem
Cc: netdev, Craig Gallek

On 11/13/17, 11:01 AM, "Vlad Dumitrescu" <vlad@dumitrescu.ro> wrote:

    On Sat, Nov 11, 2017 at 2:38 PM, Alexei Starovoitov <ast@fb.com> wrote:
    > On 11/12/17 4:46 AM, Daniel Borkmann wrote:
    >> On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:
    >>> On 11/11/17 6:07 AM, Daniel Borkmann wrote:
    >>>> On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
    >>>>> From: Vlad Dumitrescu <vladum@google.com>
    >>>>>
    >>>>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
    >>>>>
    >>>>> Signed-off-by: Vlad Dumitrescu <vladum@google.com>
    >>>>> ---
    >>>>>  include/uapi/linux/bpf.h       |  1 +
    >>>>>  net/core/filter.c              | 11 +++++++++++
    >>>>>  tools/include/uapi/linux/bpf.h |  1 +
    >>>>>  3 files changed, 13 insertions(+)
    >>>>>
    >>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
    >>>>> index e880ae6434ee..9757a2002513 100644
    >>>>> --- a/include/uapi/linux/bpf.h
    >>>>> +++ b/include/uapi/linux/bpf.h
    >>>>> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
    >>>>>  	__u32 local_ip6[4];	/* Stored in network byte order */
    >>>>>  	__u32 remote_port;	/* Stored in network byte order */
    >>>>>  	__u32 local_port;	/* stored in host byte order */
    >>>>> +	__u32 priority;
    >>>>>  };
    >>>>>
    >>>>>  /* List of known BPF sock_ops operators.
    >>>>> diff --git a/net/core/filter.c b/net/core/filter.c
    >>>>> index 61c791f9f628..a6329642d047 100644
    >>>>> --- a/net/core/filter.c
    >>>>> +++ b/net/core/filter.c
    >>>>> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum
    >>>>> bpf_access_type type,
    >>>>>  		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
    >>>>>  				      offsetof(struct sock_common, skc_num));
    >>>>>  		break;
    >>>>> +
    >>>>> +	case offsetof(struct bpf_sock_ops, priority):
    >>>>> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
    >>>>> +
    >>>>> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
    >>>>> +						struct bpf_sock_ops_kern, sk),
    >>>>> +				      si->dst_reg, si->src_reg,
    >>>>> +				      offsetof(struct bpf_sock_ops_kern, sk));
    >>>>> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
    >>>>> +				      offsetof(struct sock, sk_priority));
    >>>>> +		break;
    >>>>
    >>>> Hm, I don't think this would work, I actually think your initial patch
    >>>> was ok.
    >>>> bpf_setsockopt() as well as bpf_getsockopt() check for sk_fullsock(sk)
    >>>> right
    >>>> before accessing options on either socket or TCP level, and bail out
    >>>> with error
    >>>> otherwise; in such cases we'd read something else here and assume it's
    >>>> sk_priority.
    >>>
    >>> even if it's not fullsock, it will just read zero, no? what's a problem
    >>> with that?
    >>> In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
    >>> the program author will know that it's meaningless to read sk_priority,
    >>> so returning zero with minimal checks is fine.
    >>> While adding extra runtime if (sk_fullsock(sk)) is unnecessary,
    >>> since the safety is not compromised.
    >>
    >> Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440,
    >> struct request_sock itself is only 232 byte long in total, and the struct
    >> inet_timewait_sock is 208 byte long, so you'd be accessing out of bounds
    >> that way, so it cannot be ignored and assumed zero.
    >
    > I thought we always pass a fully allocated sock, but technically it's not
    > fullsock yet. My mistake. We do: tcp_timeout_init((struct sock *)req)
    > so yeah, the ctx rewrite approach won't work.
    > Let's go back to access via helper.

    TIL. Thanks!

    Is there anything else needed from me to get the helper approach accepted?

I plan to add access to TCP state variables (cwnd, rtt, etc.) and I have been thinking
about this issue. I think it is possible to access it directly as long as we use a value
like 0xffffffff to represent an invalid value (e.g. not fullsock). The ctx rewrite just
needs to add a conditional to determine what to return. I would probably add a
field into the internal kernel struct to indicate if it is fullsock or not (we should
know when we call tcp_call_bpf whether it is a fullsock or not based on context).

Let me do a sample patch that I can send for review and get feedback from
Alexei and Daniel.
* Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops

From: Daniel Borkmann
Date: 2017-11-13 20:20 UTC
To: Lawrence Brakmo, Vlad Dumitrescu, Alexei Starovoitov, davem
Cc: netdev, Craig Gallek

On 11/13/2017 09:09 PM, Lawrence Brakmo wrote:
> On 11/13/17, 11:01 AM, "Vlad Dumitrescu" <vlad@dumitrescu.ro> wrote:
>
>     On Sat, Nov 11, 2017 at 2:38 PM, Alexei Starovoitov <ast@fb.com> wrote:
>     > On 11/12/17 4:46 AM, Daniel Borkmann wrote:
>     >> On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:
>     >>> On 11/11/17 6:07 AM, Daniel Borkmann wrote:
>     >>>> On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
>     >>>>> From: Vlad Dumitrescu <vladum@google.com>
>     >>>>>
>     >>>>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>     >>>>>
>     >>>>> Signed-off-by: Vlad Dumitrescu <vladum@google.com>
>     >>>>> ---
>     >>>>>  include/uapi/linux/bpf.h       |  1 +
>     >>>>>  net/core/filter.c              | 11 +++++++++++
>     >>>>>  tools/include/uapi/linux/bpf.h |  1 +
>     >>>>>  3 files changed, 13 insertions(+)
>     >>>>>
>     >>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>     >>>>> index e880ae6434ee..9757a2002513 100644
>     >>>>> --- a/include/uapi/linux/bpf.h
>     >>>>> +++ b/include/uapi/linux/bpf.h
>     >>>>> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>     >>>>>  	__u32 local_ip6[4];	/* Stored in network byte order */
>     >>>>>  	__u32 remote_port;	/* Stored in network byte order */
>     >>>>>  	__u32 local_port;	/* stored in host byte order */
>     >>>>> +	__u32 priority;
>     >>>>>  };
>     >>>>>
>     >>>>>  /* List of known BPF sock_ops operators.
>     >>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>     >>>>> index 61c791f9f628..a6329642d047 100644
>     >>>>> --- a/net/core/filter.c
>     >>>>> +++ b/net/core/filter.c
>     >>>>> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum
>     >>>>> bpf_access_type type,
>     >>>>>  		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>     >>>>>  				      offsetof(struct sock_common, skc_num));
>     >>>>>  		break;
>     >>>>> +
>     >>>>> +	case offsetof(struct bpf_sock_ops, priority):
>     >>>>> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
>     >>>>> +
>     >>>>> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
>     >>>>> +						struct bpf_sock_ops_kern, sk),
>     >>>>> +				      si->dst_reg, si->src_reg,
>     >>>>> +				      offsetof(struct bpf_sock_ops_kern, sk));
>     >>>>> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
>     >>>>> +				      offsetof(struct sock, sk_priority));
>     >>>>> +		break;
>     >>>>
>     >>>> Hm, I don't think this would work, I actually think your initial patch
>     >>>> was ok.
>     >>>> bpf_setsockopt() as well as bpf_getsockopt() check for sk_fullsock(sk)
>     >>>> right
>     >>>> before accessing options on either socket or TCP level, and bail out
>     >>>> with error
>     >>>> otherwise; in such cases we'd read something else here and assume it's
>     >>>> sk_priority.
>     >>>
>     >>> even if it's not fullsock, it will just read zero, no? what's a problem
>     >>> with that?
>     >>> In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
>     >>> the program author will know that it's meaningless to read sk_priority,
>     >>> so returning zero with minimal checks is fine.
>     >>> While adding extra runtime if (sk_fullsock(sk)) is unnecessary,
>     >>> since the safety is not compromised.
>     >>
>     >> Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440,
>     >> struct request_sock itself is only 232 byte long in total, and the struct
>     >> inet_timewait_sock is 208 byte long, so you'd be accessing out of bounds
>     >> that way, so it cannot be ignored and assumed zero.
>     >
>     > I thought we always pass a fully allocated sock, but technically it's not
>     > fullsock yet. My mistake. We do: tcp_timeout_init((struct sock *)req)
>     > so yeah, the ctx rewrite approach won't work.
>     > Let's go back to access via helper.
>
>     TIL. Thanks!
>
>     Is there anything else needed from me to get the helper approach accepted?
>
> I plan to add access to TCP state variables (cwnd, rtt, etc.) and I have been thinking
> about this issue. I think it is possible to access it directly as long as we use a value
> like 0xffffffff to represent an invalid value (e.g. not fullsock). The ctx rewrite just
> needs to add a conditional to determine what to return. I would probably add a
> field into the internal kernel struct to indicate if it is fullsock or not (we should
> know when we call tcp_call_bpf whether it is a fullsock or not based on context).
>
> Let me do a sample patch that I can send for review and get feedback from
> Alexei and Daniel.

Agree, if the mov op from the ctx rewrite to read(/write) a sk member, for
example, is just a BPF_W, then we know upper reg bits are zero anyway for
the success case, so we might be able to utilize this for writing a signed
error back to the user if !fullsk.
* Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops
  2017-11-13 20:20             ` Daniel Borkmann
@ 2017-11-13 22:51               ` Alexei Starovoitov
  0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2017-11-13 22:51 UTC (permalink / raw)
  To: Daniel Borkmann, Lawrence Brakmo, Vlad Dumitrescu, davem
  Cc: netdev, Craig Gallek

On 11/14/17 4:20 AM, Daniel Borkmann wrote:
> On 11/13/2017 09:09 PM, Lawrence Brakmo wrote:
>> On 11/13/17, 11:01 AM, "Vlad Dumitrescu" <vlad@dumitrescu.ro> wrote:
>>
>> On Sat, Nov 11, 2017 at 2:38 PM, Alexei Starovoitov <ast@fb.com> wrote:
>> >
>> > On 11/12/17 4:46 AM, Daniel Borkmann wrote:
>> >>
>> >> On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:
>> >>>
>> >>> On 11/11/17 6:07 AM, Daniel Borkmann wrote:
>> >>>>
>> >>>> On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
>> >>>>>
>> >>>>> From: Vlad Dumitrescu <vladum@google.com>
>> >>>>>
>> >>>>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>> >>>>>
>> >>>>> Signed-off-by: Vlad Dumitrescu <vladum@google.com>
>> >>>>> ---
>> >>>>>   include/uapi/linux/bpf.h       |  1 +
>> >>>>>   net/core/filter.c              | 11 +++++++++++
>> >>>>>   tools/include/uapi/linux/bpf.h |  1 +
>> >>>>>   3 files changed, 13 insertions(+)
>> >>>>>
>> >>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> >>>>> index e880ae6434ee..9757a2002513 100644
>> >>>>> --- a/include/uapi/linux/bpf.h
>> >>>>> +++ b/include/uapi/linux/bpf.h
>> >>>>> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>> >>>>>         __u32 local_ip6[4];     /* Stored in network byte order */
>> >>>>>         __u32 remote_port;      /* Stored in network byte order */
>> >>>>>         __u32 local_port;       /* stored in host byte order */
>> >>>>> +       __u32 priority;
>> >>>>>   };
>> >>>>>
>> >>>>>   /* List of known BPF sock_ops operators.
>> >>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>> >>>>> index 61c791f9f628..a6329642d047 100644
>> >>>>> --- a/net/core/filter.c
>> >>>>> +++ b/net/core/filter.c
>> >>>>> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum
>> >>>>> bpf_access_type type,
>> >>>>>                 *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>> >>>>>                                       offsetof(struct sock_common, skc_num));
>> >>>>>                 break;
>> >>>>> +
>> >>>>> +       case offsetof(struct bpf_sock_ops, priority):
>> >>>>> +               BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
>> >>>>> +
>> >>>>> +               *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
>> >>>>> +                                       struct bpf_sock_ops_kern, sk),
>> >>>>> +                                     si->dst_reg, si->src_reg,
>> >>>>> +                                     offsetof(struct bpf_sock_ops_kern, sk));
>> >>>>> +               *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
>> >>>>> +                                     offsetof(struct sock, sk_priority));
>> >>>>> +               break;
>> >>>>
>> >>>> Hm, I don't think this would work, I actually think your initial patch
>> >>>> was ok.
>> >>>> bpf_setsockopt() as well as bpf_getsockopt() check for sk_fullsock(sk)
>> >>>> right
>> >>>> before accessing options on either socket or TCP level, and bail out
>> >>>> with error
>> >>>> otherwise; in such cases we'd read something else here and assume it's
>> >>>> sk_priority.
>> >>>
>> >>> even if it's not fullsock, it will just read zero, no? what's a problem
>> >>> with that?
>> >>> In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
>> >>> the program author will know that it's meaningless to read sk_priority,
>> >>> so returning zero with minimal checks is fine.
>> >>> While adding extra runtime if (sk_fullsock(sk)) is unnecessary,
>> >>> since the safety is not compromised.
>> >>
>> >> Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440,
>> >> struct request_sock itself is only 232 byte long in total, and the struct
>> >> inet_timewait_sock is 208 byte long, so you'd be accessing out of bounds
>> >> that way, so it cannot be ignored and assumed zero.
>> >
>> > I thought we always pass fully allocated sock but technically not fullsock yet. My mistake. We do: tcp_timeout_init((struct sock *)req))
>> > so yeah ctx rewrite approach won't work.
>> > Let's go back to access via helper.
>> >
>>
>> TIL. Thanks!
>>
>> Is there anything else needed from me to get the helper approach accepted?
>>
>> I plan to add access to TCP state variables (cwnd, rtt, etc.) and I have been thinking
>> about this issue. I think it is possible to access it directly as long as we use a value
>> like 0xffffffff to represent an invalid value (e.g. not fullsock). The ctx rewrite just
>> needs to add a conditional to determine what to return. I would probably add a
>> field into the internal kernel struct to indicate if it is fullsock or not (we should
>> know when we call tcp_call_bpf whether it is a fullsock or not based on context).
>>
>> Let me do a sample patch that I can send for review and get feedback from
>> Alexi and Daniel.
>
> Agree, if the mov op from the ctx rewrite to read(/write) a sk member, for
> example, is just a BPF_W, then we know upper reg bits are zero anyway for the
> success case, so we might be able to utilize this for writing a signed error
> back to the user if !fullsk.

it can be __u64 in bpf_sock_ops too, while real read is 32-bit or less,
then guaranteed no conflicts if we return (s64)-enoent or (s64)-einval
in case of !fullsock.
I like the idea of copying boolean value of sk_fullsock() into hidden
part of bpf_sock_ops_kern, since it's been accessed by tcp_call_bpf()
anyway.

^ permalink raw reply	[flat|nested] 12+ messages in thread
end of thread, other threads:[~2017-11-13 22:51 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-09 23:04 [PATCH net-next] bpf: add support for SO_PRIORITY in bpf_getsockopt Vlad Dumitrescu
2017-11-10  0:43 ` Alexei Starovoitov
2017-11-10 17:51   ` Vlad Dumitrescu
2017-11-10 19:17   ` [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops Vlad Dumitrescu
2017-11-10 21:07     ` Daniel Borkmann
2017-11-11  4:06       ` Alexei Starovoitov
2017-11-11 20:46         ` Daniel Borkmann
2017-11-11 22:38           ` Alexei Starovoitov
2017-11-13 19:00             ` Vlad Dumitrescu
2017-11-13 20:09               ` Lawrence Brakmo
2017-11-13 20:20                 ` Daniel Borkmann
2017-11-13 22:51                   ` Alexei Starovoitov