bpf.vger.kernel.org archive mirror
From: Martin KaFai Lau <kafai@fb.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Menglong Dong <menglong8.dong@gmail.com>,
	Jakub Sitnicki <jakub@cloudflare.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
	KP Singh <kpsingh@kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	Mengen Sun <mengensun@tencent.com>, <flyingpeng@tencent.com>,
	<mungerjiang@tencent.com>, Menglong Dong <imagedong@tencent.com>
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'
Date: Mon, 24 Jan 2022 17:16:55 -0800
Message-ID: <20220125011655.qpb7gelbik4tdwcf@kafai-mbp.dhcp.thefacebook.com>
In-Reply-To: <CAADnVQ+xnnuf3ssgmkR3Nui46WT6h37RUU1zsjhOhy+vCfVdXA@mail.gmail.com>

On Mon, Jan 24, 2022 at 05:03:20PM -0800, Alexei Starovoitov wrote:
> On Mon, Jan 24, 2022 at 4:35 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Thu, Jan 20, 2022 at 09:17:27PM -0800, Alexei Starovoitov wrote:
> > > On Thu, Jan 20, 2022 at 6:18 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > >
> > > > On Thu, Jan 20, 2022 at 12:17 PM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jan 20, 2022 at 11:02:27AM +0800, Menglong Dong wrote:
> > > > > > Hello!
> > > > > >
> > > > > > On Thu, Jan 20, 2022 at 6:03 AM Alexei Starovoitov
> > > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > > >
> > > > > > [...]
> > > > > > >
> > > > > > > Looks like
> > > > > > >  __sk_buff->remote_port
> > > > > > >  bpf_sock_ops->remote_port
> > > > > > >  sk_msg_md->remote_port
> > > > > > > are doing the right thing,
> > > > > > > but bpf_sock->dst_port is not correct?
> > > > > > >
> > > > > > > I think it's better to fix it,
> > > > > > > but probably need to consolidate it with
> > > > > > > convert_ctx_accesses() that deals with narrow access.
> > > > > > > I suspect reading u8 from three flavors of 'remote_port'
> > > > > > > won't be correct.
> > > > > >
> > > > > > What's the meaning of 'narrow access'? Do you mean to
> > > > > > make 'remote_port' u16? Or 'remote_port' should be made
> > > > > > accessible with u8? In fact, '*((u16 *)&skops->remote_port + 1)'
> > > > > > won't work, as it only is accessible with u32.
> > > > >
> > > > > u8 access to remote_port won't pass the verifier,
> > > > > but u8 access to dst_port will.
> > > > > Though it will return incorrect data.
> > > > > See how convert_ctx_accesses() handles narrow loads.
> > > > > I think we need to generalize it for different endian fields.
> > > >
> > > > > Yeah, I understand narrow loads in convert_ctx_accesses()
> > > > > now. It seems u8 access to dst_port can't pass the verifier either,
> > > > > which can be seen from bpf_sock_is_valid_access():
> > > >
> > > > $    switch (off) {
> > > > $    case offsetof(struct bpf_sock, state):
> > > > $    case offsetof(struct bpf_sock, family):
> > > > $    case offsetof(struct bpf_sock, type):
> > > > $    case offsetof(struct bpf_sock, protocol):
> > > > $    case offsetof(struct bpf_sock, dst_port):  // u8 access is not allowed
> > > > $    case offsetof(struct bpf_sock, src_port):
> > > > $    case offsetof(struct bpf_sock, rx_queue_mapping):
> > > > $    case bpf_ctx_range(struct bpf_sock, src_ip4):
> > > > $    case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
> > > > $    case bpf_ctx_range(struct bpf_sock, dst_ip4):
> > > > $    case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
> > > > $        bpf_ctx_record_field_size(info, size_default);
> > > > $        return bpf_ctx_narrow_access_ok(off, size, size_default);
> > > > $    }
> > > >
> > > > > I'm still not sure what we should do now. Should we make all
> > > > > remote_port and dst_port fields narrow-accessible and get their
> > > > > endianness right? For example the remote_port in struct bpf_sock_ops:
> > > >
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -8414,6 +8414,7 @@ static bool sock_ops_is_valid_access(int off, int size,
> > > >                                 return false;
> > > >                         info->reg_type = PTR_TO_PACKET_END;
> > > >                         break;
> > > > +               case bpf_ctx_range(struct bpf_sock_ops, remote_port):
> > >
> > > > Ahh. bpf_sock_ops doesn't have it.
> > > But bpf_sk_lookup and sk_msg_md have it.
> > >
> > > bpf_sk_lookup->remote_port
> > > supports narrow access.
> > >
> > > When it accesses sport from bpf_sk_lookup_kern.
> > >
> > > and we have tests that do u8 access from remote_port.
> > > See verifier/ctx_sk_lookup.c
> > >
> > > >                 case offsetof(struct bpf_sock_ops, skb_tcp_flags):
> > > >                         bpf_ctx_record_field_size(info, size_default);
> > > >                         return bpf_ctx_narrow_access_ok(off, size,
> > > >
> > > > > If remote_port/dst_port are made narrow-accessible, the
> > > > > result will be right. Therefore, *((u16*)&sk->remote_port) will
> > > > > be the port in network byte order. And the port in host byte
> > > > > order can be obtained with:
> > > > bpf_ntohs(*((u16*)&sk->remote_port))
> > > > or
> > > > bpf_htonl(sk->remote_port)
> > >
> > > So u8, u16, u32 will work if we make them narrow-accessible, right?
> > >
> > > The summary if I understood it:
> > > . only bpf_sk_lookup->remote_port is doing it correctly for u8,u16,u32 ?
> > > . bpf_sock->dst_port is not correct for u32,
> > >   since it's missing bpf_ctx_range() ?
> > > . __sk_buff->remote_port
> > >  bpf_sock_ops->remote_port
> > >  sk_msg_md->remote_port
> > >  correct for u32 access only. They don't support narrow access.
> > >
> > > but wait
> > > we have a test for bpf_sock->dst_port in progs/test_sock_fields.c.
> > > How does it work then?
> > >
> > > I think we need more eyes on the problem.
> > > cc-ing more experts.
> > iiuc,  I think both bpf_sk_lookup and bpf_sock allow narrow access.
> > bpf_sock only allows ((__u8 *)&bpf_sock->dst_port)[0] but
> > not ((__u8 *)&bpf_sock->dst_port)[1].  bpf_sk_lookup allows reading
> > a byte at [0], [1], [2], and [3].
> >
> > The test_sock_fields.c currently works because it is comparing
> > with another __u16: "sk->dst_port == srv_sa6.sin6_port".
> > It should also work with bpf_ntohS() which usually is what the
> > userspace program expects when dealing with port instead of using bpf_ntohl()?
> > Thus, I think we can keep the lower 16 bits way that bpf_sock->dst_port
> > and bpf_sk_lookup->remote_port (and also bpf_sock_addr->user_port ?) are
> > using.  Also, changing it to the upper 16 bits will break existing
> > bpf progs.
> >
> > Narrow access with any number of bytes at any offset may be useful
> > for IP[6] addr.  Not sure about the port though.  Ideally it should only
> > allow sizeof(__u16) read at offset 0.  However, I think at this point it makes
> > sense to make them consistent with how bpf_sk_lookup does it,
> > i.e. allow byte [0], [1], [2], and [3] access.
> 
> Sounds like the proposal is to do:
> diff --git a/net/core/filter.c b/net/core/filter.c
> index a06931c27eeb..1a8c97bc1927 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -8276,9 +8276,9 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
>         case offsetof(struct bpf_sock, family):
>         case offsetof(struct bpf_sock, type):
>         case offsetof(struct bpf_sock, protocol):
> -       case offsetof(struct bpf_sock, dst_port):
>         case offsetof(struct bpf_sock, src_port):
>         case offsetof(struct bpf_sock, rx_queue_mapping):
> +       case bpf_ctx_range(struct bpf_sock, dst_port):
>         case bpf_ctx_range(struct bpf_sock, src_ip4):
> 
> and then document bpf_sock->dst_port and bpf_sk_lookup->remote_port
also bpf_sock_addr->user_port

> behavior and their difference vs
>   __sk_buff->remote_port
>   bpf_sock_ops->remote_port
>   sk_msg_md->remote_port
> ?
Yes, agree on the code change and adding doc.
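
For illustration, here is a minimal userspace model of why switching the
case label from offsetof() to bpf_ctx_range() changes which narrow reads
are accepted. It is a sketch, not the kernel code: struct sock_model and
the valid_with_*() helpers are hypothetical names, narrow_access_ok() is
only modeled on bpf_ctx_narrow_access_ok(), and the alignment check is
assumed to mirror the one done before the switch statement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few bpf_sock fields discussed here. */
struct sock_model {
	uint32_t protocol;
	uint32_t dst_port;
	uint32_t src_port;
};

/* Modeled on bpf_ctx_narrow_access_ok(): the size must be a power of
 * two and no larger than the field's default size. The zero check is
 * added here so that the modulo below is always safe.
 */
static bool narrow_access_ok(uint32_t size, uint32_t size_default)
{
	return size != 0 && size <= size_default && (size & (size - 1)) == 0;
}

/* "case offsetof(struct bpf_sock, dst_port):" matches only the first
 * byte offset of the field, so byte [1] falls through and is rejected.
 */
static bool valid_with_offsetof(size_t off, uint32_t size)
{
	if (!narrow_access_ok(size, sizeof(uint32_t)) || off % size != 0)
		return false;
	return off == offsetof(struct sock_model, dst_port);
}

/* "case bpf_ctx_range(struct bpf_sock, dst_port):" matches every
 * offset inside the field, so bytes [0], [1], [2] and [3] are accepted.
 */
static bool valid_with_ctx_range(size_t off, uint32_t size)
{
	size_t start = offsetof(struct sock_model, dst_port);
	size_t end = start + sizeof(uint32_t);

	if (!narrow_access_ok(size, sizeof(uint32_t)) || off % size != 0)
		return false;
	return off >= start && off < end;
}
```

With this model, a one-byte read at offsetof(dst_port) + 1 is rejected by
valid_with_offsetof() and accepted by valid_with_ctx_range(), which is
the behavior difference between bpf_sock and bpf_sk_lookup described
above.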

> I suspect we cannot remove lshift_16 from them either,
> since it might break some prog as well.
Right, I believe the existing lshift_16 has to stay.
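
To make the two conventions concrete, below is a small userspace sketch
of the layouts as I understand them from this thread. It assumes that
the dst_port style keeps the network-order port in the lower 16 bits of
the 32-bit field, and that the remote_port style (with lshift_16 on
little-endian) behaves like a 32-bit network-order value. All function
names are hypothetical and exist only for this illustration.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* dst_port style: the __be16 port, zero-extended into the lower
 * 16 bits of a 32-bit field.
 */
static uint32_t make_dst_port_style(uint16_t host_port)
{
	return (uint32_t)htons(host_port);
}

/* remote_port style: the whole 32-bit field is in network byte order,
 * which is what the lshift_16 achieves on little-endian hosts.
 */
static uint32_t make_remote_port_style(uint16_t host_port)
{
	return htonl((uint32_t)host_port);
}

/* Host-order port from the dst_port style: a 16-bit swap, the
 * equivalent of bpf_ntohs() in a BPF program.
 */
static uint16_t read_dst_port_style(uint32_t field)
{
	return ntohs((uint16_t)field);
}

/* Host-order port from the remote_port style: a 32-bit swap, the
 * equivalent of bpf_ntohl() in a BPF program.
 */
static uint16_t read_remote_port_style(uint32_t field)
{
	return (uint16_t)ntohl(field);
}

/* The dst_port style can be compared directly against a network-order
 * 16-bit port such as sin6_port, without any conversion.
 */
static bool dst_port_matches(uint32_t field, uint16_t port_be)
{
	return field == (uint32_t)port_be;
}
```

Under these assumptions, dst_port_matches() reflects why the comparison
"sk->dst_port == srv_sa6.sin6_port" in test_sock_fields.c works, while
the remote_port style needs the 32-bit conversion to recover the port.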

