From: John Fastabend <john.fastabend@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
kafai@fb.com, ast@kernel.org
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case
Date: Fri, 31 Jul 2020 15:46:50 -0700 [thread overview]
Message-ID: <5f249f5a74483_54fa2b1d9fe285b4c5@john-XPS-13-9370.notmuch> (raw)
In-Reply-To: <546828c9-a6bb-57d3-9a9d-83f4e0131163@iogearbox.net>
Daniel Borkmann wrote:
> On 7/29/20 6:22 PM, John Fastabend wrote:
> > I had a sockmap program that after doing some refactoring started spewing
> > this splat at me:
> >
> > [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > [...]
> > [18610.807359] Call Trace:
> > [18610.807370] ? 0xffffffffc114d0d5
> > [18610.807382] __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0
> > [18610.807391] tcp_connect+0x895/0xd50
> > [18610.807400] tcp_v4_connect+0x465/0x4e0
> > [18610.807407] __inet_stream_connect+0xd6/0x3a0
> > [18610.807412] ? __inet_stream_connect+0x5/0x3a0
> > [18610.807417] inet_stream_connect+0x3b/0x60
> > [18610.807425] __sys_connect+0xed/0x120
> >
> > After some debugging I was able to build this simple reproducer,
> >
> > __section("sockops/reproducer_bad")
> > int bpf_reproducer_bad(struct bpf_sock_ops *skops)
> > {
> > volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > return 0;
> > }
> >
> > And along the way noticed that below program ran without splat,
> >
> > __section("sockops/reproducer_good")
> > int bpf_reproducer_good(struct bpf_sock_ops *skops)
> > {
> > volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > volatile __maybe_unused __u32 family;
> >
> > compiler_barrier();
> >
> > family = skops->family;
> > return 0;
> > }
> >
> > So I decided to check out the code we generate for the above two
> > programs and noticed each generates the BPF code you would expect,
> >
> > 0000000000000000 <bpf_reproducer_bad>:
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > 0: r1 = *(u32 *)(r1 + 96)
> > 1: *(u32 *)(r10 - 4) = r1
> > ; return 0;
> > 2: r0 = 0
> > 3: exit
> >
> > 0000000000000000 <bpf_reproducer_good>:
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > 0: r2 = *(u32 *)(r1 + 96)
> > 1: *(u32 *)(r10 - 4) = r2
> > ; family = skops->family;
> > 2: r1 = *(u32 *)(r1 + 20)
> > 3: *(u32 *)(r10 - 8) = r1
> > ; return 0;
> > 4: r0 = 0
> > 5: exit
> >
> > So we get reasonable assembly, but still something was causing the null
> > pointer dereference. So, we load the programs and dump the xlated version
> > observing that line 0 above 'r* = *(u32 *)(r1 +96)' is going to be
> > translated by the skops access helpers.
> >
> > int bpf_reproducer_bad(struct bpf_sock_ops * skops):
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > 0: (61) r1 = *(u32 *)(r1 +28)
> > 1: (15) if r1 == 0x0 goto pc+2
> > 2: (79) r1 = *(u64 *)(r1 +0)
> > 3: (61) r1 = *(u32 *)(r1 +2340)
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > 4: (63) *(u32 *)(r10 -4) = r1
> > ; return 0;
> > 5: (b7) r0 = 0
> > 6: (95) exit
> >
> > int bpf_reproducer_good(struct bpf_sock_ops * skops):
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > 0: (61) r2 = *(u32 *)(r1 +28)
> > 1: (15) if r2 == 0x0 goto pc+2
> > 2: (79) r2 = *(u64 *)(r1 +0)
> > 3: (61) r2 = *(u32 *)(r2 +2340)
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> > 4: (63) *(u32 *)(r10 -4) = r2
> > ; family = skops->family;
> > 5: (79) r1 = *(u64 *)(r1 +0)
> > 6: (69) r1 = *(u16 *)(r1 +16)
> > ; family = skops->family;
> > 7: (63) *(u32 *)(r10 -8) = r1
> > ; return 0;
> > 8: (b7) r0 = 0
> > 9: (95) exit
> >
> > Then we look at lines 0 and 2 above. In the good case we do the zero
> > check in r2 and then load 'r1 + 0' at line 2. Do a quick cross-check
> > into the bpf_sock_ops check and we can confirm that is the 'struct
> > sock *sk' pointer field. But, in the bad case,
> >
> > 0: (61) r1 = *(u32 *)(r1 +28)
> > 1: (15) if r1 == 0x0 goto pc+2
> > 2: (79) r1 = *(u64 *)(r1 +0)
> >
> > Oh no, we read 'r1 +28' into r1, this is skops->fullsock and then in
> > line 2 we read the 'r1 +0' as a pointer. Now jumping back to our spat,
> >
> > [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> >
> > The 0x01 makes sense because that is exactly the fullsock value. And
> > its not a valid dereference so we splat.
> >
> > To fix we need to guard the case when a program is doing a sock_ops field
> > access with src_reg == dst_reg. This is already handled in the load case
> > where the ctx_access handler uses a tmp register being careful to
> > store the old value and restore it. To fix the get case test if
> > src_reg == dst_reg and in this case do the is_fullsock test in the
> > temporary register. Remembering to restore the temporary register before
> > writing to either dst_reg or src_reg to avoid smashing the pointer into
> > the struct holding the tmp variable.
> >
> > Adding this inline code to test_tcpbpf_kern will now be generated
> > correctly from,
> >
> > 9: r2 = *(u32 *)(r2 + 96)
> >
> > to xlated code,
I have this in my logs at line 12,
*(u64 *)(r2 +32) = r9
> > 13: (61) r9 = *(u32 *)(r2 +28)
> > 14: (15) if r9 == 0x0 goto pc+4
> > 15: (79) r9 = *(u64 *)(r2 +32)
> > 16: (79) r2 = *(u64 *)(r2 +0)
> > 17: (61) r2 = *(u32 *)(r2 +2348)
> > 18: (05) goto pc+1
> > 19: (79) r9 = *(u64 *)(r2 +32)
>
> The diff below looks good to me, but I'm confused on this one above. I'm probably
> missing something, but given this is the dst == src case with the r2 register, where
> in the dump do we first saves the content of r9 into the scratch tmp store?
> Line 19 seems to restore it, but the save is missing, no?
>
> Please double check whether this was just omitted, but I would really like to have
> the commit message 100% correct as it otherwise causes confusion when we stare at it
> again a month later wrt what was the original intention.
off-by-one on the cut'n'paste into the commit message. Let me send a v3
with a correction to the commit. I do want this to be correct.
next prev parent reply other threads:[~2020-07-31 22:47 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-29 16:22 [bpf PATCH v2 0/5] Fix sock_ops field read splat John Fastabend
2020-07-29 16:22 ` [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case John Fastabend
2020-07-29 21:29 ` Song Liu
2020-07-31 12:25 ` Daniel Borkmann
2020-07-31 22:46 ` John Fastabend [this message]
2020-07-29 16:23 ` [bpf PATCH v2 2/5] bpf: sock_ops sk access may stomp registers when dst_reg = src_reg John Fastabend
2020-07-29 21:30 ` Song Liu
2020-07-29 16:23 ` [bpf PATCH v2 3/5] bpf, selftests: Add tests for ctx access in sock_ops with single register John Fastabend
2020-07-29 21:35 ` Song Liu
2020-07-29 16:23 ` [bpf PATCH v2 4/5] bpf, selftests: Add tests for sock_ops load with r9, r8.r7 registers John Fastabend
2020-07-29 21:36 ` Song Liu
2020-07-29 16:24 ` [bpf PATCH v2 5/5] bpf, selftests: Add tests to sock_ops for loading sk John Fastabend
2020-07-29 21:36 ` Song Liu
2020-07-29 21:57 ` [bpf PATCH v2 0/5] Fix sock_ops field read splat Martin KaFai Lau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5f249f5a74483_54fa2b1d9fe285b4c5@john-XPS-13-9370.notmuch \
--to=john.fastabend@gmail.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kafai@fb.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).