All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	bpf@vger.kernel.org, Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>
Subject: Re: [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets
Date: Mon, 14 Mar 2022 21:57:55 +0100	[thread overview]
Message-ID: <87k0cwxkzs.fsf@cloudflare.com> (raw)
In-Reply-To: <f46bff60be49ab2062770d39a20d1874b10c70ae.camel@linux.ibm.com>


On Mon, Mar 14, 2022 at 07:25 PM +01, Ilya Leoshkevich wrote:
> On Mon, 2022-03-14 at 18:35 +0100, Jakub Sitnicki wrote:
>> On Thu, Mar 10, 2022 at 11:57 PM +01, Jakub Sitnicki wrote:
>> > On Wed, Mar 09, 2022 at 01:34 PM +01, Ilya Leoshkevich wrote:
>> > > On Wed, 2022-03-09 at 09:36 +0100, Jakub Sitnicki wrote:
>> > 
>> > [...]
>> > 
>> > > > 
>> > > > Consider this - today the below is true on both LE and BE,
>> > > > right?
>> > > > 
>> > > >   *(u32 *)&ctx->remote_port == *(u16 *)&ctx->remote_port
>> > > > 
>> > > > because the loads get converted to:
>> > > > 
>> > > >   *(u16 *)&ctx_kern->sport == *(u16 *)&ctx_kern->sport
>> > > > 
>> > > > IOW, today, because of the bug that you are fixing here, the
>> > > > data
>> > > > layout
>> > > > changes from the PoV of the BPF program depending on the load
>> > > > size.
>> > > > 
>> > > > With 2-byte loads, without this patch, the data layout appears
>> > > > as:
>> > > > 
>> > > >   struct bpf_sk_lookup {
>> > > >     ...
>> > > >     __be16 remote_port;
>> > > >     __be16 remote_port;
>> > > >     ...
>> > > >   }
>> > > 
>> > > I see, one can indeed argue that this is also a part of the ABI
>> > > now.
>> > > So we're stuck between a rock and a hard place.
>> > > 
>> > > > While for 4-byte loads, it appears as in your 2nd patch:
>> > > > 
>> > > >   struct bpf_sk_lookup {
>> > > >     ...
>> > > >     #if little-endian
>> > > >     __be16 remote_port;
>> > > >     __u16  :16; /* zero padding */
>> > > >     #elif big-endian
>> > > >     __u16  :16; /* zero padding */
>> > > >     __be16 remote_port;
>> > > >     #endif
>> > > >     ...
>> > > >   }
>> > > > 
>> > > > Because of that I don't see how we could keep complete ABI
>> > > > compatiblity,
>> > > > and have just one definition of struct bpf_sk_lookup that
>> > > > reflects
>> > > > it. These are conflicting requirements.
>> > > > 
>> > > > I'd bite the bullet for 4-byte loads, for the sake of having an
>> > > > endian-agnostic struct bpf_sk_lookup and struct bpf_sock
>> > > > definition
>> > > > in
>> > > > the uAPI header.
>> > > > 
>> > > > The sacrifice here is that the access converter will have to
>> > > > keep
>> > > > rewriting 4-byte access to bpf_sk_lookup.remote_port and
>> > > > bpf_sock.dst_port in this unexpected, quirky manner.
>> > > > 
>> > > > The expectation is that with time users will recompile their
>> > > > BPF
>> > > > progs
>> > > > against the updated bpf.h, and switch to 2-byte loads. That
>> > > > will make
>> > > > the quirk in the access converter dead code in time.
>> > > > 
>> > > > I don't have any better ideas. Sorry.
>> > > > 
>> > > > [...]
>> > > 
>> > > I agree, let's go ahead with this solution.
>> > > 
>> > > The only remaining problem that I see is: the bug is in the
>> > > common
>> > > code, and it will affect the fields that we add in the future.
>> > > 
>> > > Can we either document this state of things in a comment, or fix
>> > > the
>> > > bug and emulate the old behavior for certain fields?
>> > 
>> > I think we can fix the bug in the common code, and compensate for
>> > the
>> > quirky 4-byte access to bpf_sk_lookup.remote_port and
>> > bpf_sock.dst_port
>> > in the is_valid_access and convert_ctx_access.
>> > 
>> > With the patch as below, access to remote_port gets rewritten to:
>> > 
>> > * size=1, offset=0, r2 = *(u8 *)(r1 +36)
>> >    0: (69) r2 = *(u16 *)(r1 +4)
>> >    1: (54) w2 &= 255
>> >    2: (b7) r0 = 0
>> >    3: (95) exit
>> > 
>> > * size=1, offset=1, r2 = *(u8 *)(r1 +37)
>> >    0: (69) r2 = *(u16 *)(r1 +4)
>> >    1: (74) w2 >>= 8
>> >    2: (54) w2 &= 255
>> >    3: (b7) r0 = 0
>> >    4: (95) exit
>> > 
>> > * size=1, offset=2, r2 = *(u8 *)(r1 +38)
>> >    0: (b4) w2 = 0
>> >    1: (54) w2 &= 255
>> >    2: (b7) r0 = 0
>> >    3: (95) exit
>> > 
>> > * size=1, offset=3, r2 = *(u8 *)(r1 +39)
>> >    0: (b4) w2 = 0
>> >    1: (74) w2 >>= 8
>> >    2: (54) w2 &= 255
>> >    3: (b7) r0 = 0
>> >    4: (95) exit
>> > 
>> > * size=2, offset=0, r2 = *(u16 *)(r1 +36)
>> >    0: (69) r2 = *(u16 *)(r1 +4)
>> >    1: (b7) r0 = 0
>> >    2: (95) exit
>> > 
>> > * size=2, offset=2, r2 = *(u16 *)(r1 +38)
>> >    0: (b4) w2 = 0
>> >    1: (b7) r0 = 0
>> >    2: (95) exit
>> > 
>> > * size=4, offset=0, r2 = *(u32 *)(r1 +36)
>> >    0: (69) r2 = *(u16 *)(r1 +4)
>> >    1: (b7) r0 = 0
>> >    2: (95) exit
>> > 
>> > How does that look to you?
>> > 
>> > Still need to give it a test on s390x.
>> 
>> Context conversion with patch below applied looks correct to me on
>> s390x
>> as well:
>> 
>> * size=1, offset=0, r2 = *(u8 *)(r1 +36)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (bc) w2 = w2
>>    2: (74) w2 >>= 8
>>    3: (bc) w2 = w2
>>    4: (54) w2 &= 255
>>    5: (bc) w2 = w2
>>    6: (b7) r0 = 0
>>    7: (95) exit
>> 
>> * size=1, offset=1, r2 = *(u8 *)(r1 +37)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (bc) w2 = w2
>>    2: (54) w2 &= 255
>>    3: (bc) w2 = w2
>>    4: (b7) r0 = 0
>>    5: (95) exit
>> 
>> * size=1, offset=2, r2 = *(u8 *)(r1 +38)
>>    0: (b4) w2 = 0
>>    1: (bc) w2 = w2
>>    2: (74) w2 >>= 8
>>    3: (bc) w2 = w2
>>    4: (54) w2 &= 255
>>    5: (bc) w2 = w2
>>    6: (b7) r0 = 0
>>    7: (95) exit
>> 
>> * size=1, offset=3, r2 = *(u8 *)(r1 +39)
>>    0: (b4) w2 = 0
>>    1: (bc) w2 = w2
>>    2: (54) w2 &= 255
>>    3: (bc) w2 = w2
>>    4: (b7) r0 = 0
>>    5: (95) exit
>> 
>> * size=2, offset=0, r2 = *(u16 *)(r1 +36)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (bc) w2 = w2
>>    2: (b7) r0 = 0
>>    3: (95) exit
>> 
>> * size=2, offset=2, r2 = *(u16 *)(r1 +38)
>>    0: (b4) w2 = 0
>>    1: (bc) w2 = w2
>>    2: (b7) r0 = 0
>>    3: (95) exit
>> 
>> * size=4, offset=0, r2 = *(u32 *)(r1 +36)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (bc) w2 = w2
>>    2: (b7) r0 = 0
>>    3: (95) exit
>> 
>> If we go this way, this should unbreak the bpf selftests on BE,
>> independently of the patch 1 from this series.
>> 
>> Will send it as a patch, so that we continue the review discussion.
>
> I applied this patch to bpf-next, and while it looks reasonable, the
> test still fails, e.g. here:
>
> 	/* Load from remote_port field with zero padding (backward
> compatibility) */
> 	val_u32 = *(__u32 *)&ctx->remote_port;
> 	if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
> 		return SK_DROP;
>

You are right. That's that the check I recently added that broke the bpf
CI runs for s390x [1], and started our thread.

I had a stale build of test_progs with a fix akin to patch [2] that I
was testing and missed that. Thanks for giving it a run.

If we go with Martin's suggestion [3] here and avoid bpf_htonl(), then
we could make it work and save ourselves endianess checks.

IOW, a patch like below would be also needed to unbreak the tests.

First chunk is copied from your fixes to test_sk_lookup in patch 3 in
this RFC series, naturally.

[1] https://lore.kernel.org/bpf/CAEf4BzaRNLw9_EnaMo5e46CdEkzbJiVU3j9oxnsemBKjNFf3wQ@mail.gmail.com/
[2] https://lore.kernel.org/bpf/20220227202757.519015-4-jakub@cloudflare.com/
[3] https://lore.kernel.org/bpf/20220301062207.d3aqge5qg623asr6@kafai-mbp.dhcp.thefacebook.com/

---8<---

diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup.c b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
index bf5b7caefdd0..2765a4bd500c 100644
--- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
+++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
@@ -413,15 +413,14 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)

        /* Narrow loads from remote_port field. Expect SRC_PORT. */
        if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) ||
-           LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) ||
-           LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port, 3) != 0)
+           LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff))
                return SK_DROP;
        if (LSW(ctx->remote_port, 0) != SRC_PORT)
                return SK_DROP;

        /* Load from remote_port field with zero padding (backward compatibility) */
        val_u32 = *(__u32 *)&ctx->remote_port;
-       if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
+       if (val_u32 != SRC_PORT)
                return SK_DROP;

        /* Narrow loads from local_port field. Expect DST_PORT. */

  reply	other threads:[~2022-03-14 21:49 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-22 18:25 [PATCH RFC bpf-next 0/3] bpf_sk_lookup.remote_port fixes Ilya Leoshkevich
2022-02-22 18:25 ` [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets Ilya Leoshkevich
2022-03-08 15:01   ` Jakub Sitnicki
2022-03-08 23:58     ` Ilya Leoshkevich
2022-03-09  8:36       ` Jakub Sitnicki
2022-03-09 12:34         ` Ilya Leoshkevich
2022-03-10 22:57           ` Jakub Sitnicki
2022-03-14 17:35             ` Jakub Sitnicki
2022-03-14 18:25               ` Ilya Leoshkevich
2022-03-14 20:57                 ` Jakub Sitnicki [this message]
2022-02-22 18:25 ` [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian Ilya Leoshkevich
2022-02-27  2:44   ` Alexei Starovoitov
2022-02-27 20:30     ` Jakub Sitnicki
2022-02-28 10:19       ` Ilya Leoshkevich
2022-02-28 13:26         ` Jakub Sitnicki
2022-03-01  0:39           ` Ilya Leoshkevich
2022-03-01  0:40           ` Ilya Leoshkevich
2022-02-22 18:25 ` [PATCH RFC bpf-next 3/3] selftests/bpf: Adapt bpf_sk_lookup.remote_port loads Ilya Leoshkevich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k0cwxkzs.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=iii@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.