All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiri Olsa <jolsa@kernel.org>
To: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>
Cc: "Lee Jones" <lee@kernel.org>,
	"Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
	syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com,
	bpf@vger.kernel.org, "Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@chromium.org>,
	"Stanislav Fomichev" <sdf@google.com>,
	"Hao Luo" <haoluo@google.com>, "Xu Kuohai" <xukuohai@huawei.com>,
	"Will Deacon" <will@kernel.org>,
	"Nathan Chancellor" <nathan@kernel.org>,
	"Pu Lehui" <pulehui@huawei.com>, "Björn Töpel" <bjorn@kernel.org>,
	"Ilya Leoshkevich" <iii@linux.ibm.com>
Subject: [PATCHv2 bpf 2/2] bpf: Fix prog_array_map_poke_run map poke update
Date: Tue, 28 Nov 2023 10:28:50 +0100	[thread overview]
Message-ID: <20231128092850.1545199-3-jolsa@kernel.org> (raw)
In-Reply-To: <20231128092850.1545199-1-jolsa@kernel.org>

Lee pointed out issue found by syscaller [0] hitting BUG in prog array
map poke update in prog_array_map_poke_run function due to error value
returned from bpf_arch_text_poke function.

There's race window where bpf_arch_text_poke can fail due to missing
bpf program kallsym symbols, which is accounted for with check for
-EINVAL in that BUG_ON call.

The problem is that in such case we won't update the tail call jump
and cause imballance for the next tail call update check which will
fail with -EBUSY in bpf_arch_text_poke.

I'm hitting following race during the program load:

  CPU 0                             CPU 1

  bpf_prog_load
    bpf_check
      do_misc_fixups
        prog_array_map_poke_track

                                    map_update_elem
                                      bpf_fd_array_map_update_elem
                                        prog_array_map_poke_run

                                          bpf_arch_text_poke returns -EINVAL

    bpf_prog_kallsyms_add

After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the next
poke update fails on expected jump instruction check in bpf_arch_text_poke
with -EBUSY and triggers the BUG_ON in prog_array_map_poke_run.

Similar race exists on the program unload.

Fixing this by calling bpf_arch_text_poke with 'checkip=false' to skip the
bpf symbol check like we do in bpf_tail_call_direct_fixup. This way the
prog_array_map_poke_run does not depend on bpf program having the kallsym
symbol in place.

[0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810

Cc: Lee Jones <lee@kernel.org>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Reported-by: syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/bpf/arraymap.c | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 7ba389f7212f..b194282eacbb 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -1044,20 +1044,11 @@ static void prog_array_map_poke_run(struct bpf_map *map, u32 key,
 			 *    activated, so tail call updates can arrive from here
 			 *    while JIT is still finishing its final fixup for
 			 *    non-activated poke entries.
-			 * 3) On program teardown, the program's kallsym entry gets
-			 *    removed out of RCU callback, but we can only untrack
-			 *    from sleepable context, therefore bpf_arch_text_poke()
-			 *    might not see that this is in BPF text section and
-			 *    bails out with -EINVAL. As these are unreachable since
-			 *    RCU grace period already passed, we simply skip them.
-			 * 4) Also programs reaching refcount of zero while patching
+			 * 3) Also programs reaching refcount of zero while patching
 			 *    is in progress is okay since we're protected under
 			 *    poke_mutex and untrack the programs before the JIT
-			 *    buffer is freed. When we're still in the middle of
-			 *    patching and suddenly kallsyms entry of the program
-			 *    gets evicted, we just skip the rest which is fine due
-			 *    to point 3).
-			 * 5) Any other error happening below from bpf_arch_text_poke()
+			 *    buffer is freed.
+			 * 4) Any error happening below from bpf_arch_text_poke()
 			 *    is a unexpected bug.
 			 */
 			if (!READ_ONCE(poke->tailcall_target_stable))
@@ -1075,21 +1066,21 @@ static void prog_array_map_poke_run(struct bpf_map *map, u32 key,
 			if (new) {
 				ret = bpf_arch_text_poke(poke->tailcall_target,
 							 BPF_MOD_JUMP,
-							 old_addr, new_addr, true);
-				BUG_ON(ret < 0 && ret != -EINVAL);
+							 old_addr, new_addr, false);
+				BUG_ON(ret < 0);
 				if (!old) {
 					ret = bpf_arch_text_poke(poke->tailcall_bypass,
 								 BPF_MOD_JUMP,
 								 poke->bypass_addr,
-								 NULL, true);
-					BUG_ON(ret < 0 && ret != -EINVAL);
+								 NULL, false);
+					BUG_ON(ret < 0);
 				}
 			} else {
 				ret = bpf_arch_text_poke(poke->tailcall_bypass,
 							 BPF_MOD_JUMP,
 							 old_bypass_addr,
-							 poke->bypass_addr, true);
-				BUG_ON(ret < 0 && ret != -EINVAL);
+							 poke->bypass_addr, false);
+				BUG_ON(ret < 0);
 				/* let other CPUs finish the execution of program
 				 * so that it will not possible to expose them
 				 * to invalid nop, stack unwind, nop state
@@ -1098,8 +1089,8 @@ static void prog_array_map_poke_run(struct bpf_map *map, u32 key,
 					synchronize_rcu();
 				ret = bpf_arch_text_poke(poke->tailcall_target,
 							 BPF_MOD_JUMP,
-							 old_addr, NULL, true);
-				BUG_ON(ret < 0 && ret != -EINVAL);
+							 old_addr, NULL, false);
+				BUG_ON(ret < 0);
 			}
 		}
 	}
-- 
2.43.0


  parent reply	other threads:[~2023-11-28  9:29 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-28  9:28 [PATCHv2 bpf 0/2] bpf: Fix prog_array_map_poke_run map poke update Jiri Olsa
2023-11-28  9:28 ` [PATCHv2 bpf 1/2] bpf: Add checkip argument to bpf_arch_text_poke Jiri Olsa
2023-11-28 21:24   ` Stanislav Fomichev
2023-11-29 14:05     ` Jiri Olsa
2023-11-29 14:55       ` Jiri Olsa
2023-11-29 18:10         ` Stanislav Fomichev
2023-12-01  9:10           ` Jiri Olsa
2023-12-01 14:36   ` Ilya Leoshkevich
2023-12-03 20:50     ` Jiri Olsa
2023-11-28  9:28 ` Jiri Olsa [this message]
2023-11-28 22:44 ` [PATCHv2 bpf 0/2] bpf: Fix prog_array_map_poke_run map poke update Ilya Leoshkevich
2023-11-29 13:23   ` Jiri Olsa
2023-12-01 13:13     ` Jiri Olsa
2023-12-01 14:31       ` Ilya Leoshkevich
2023-12-01 14:52         ` Jiri Olsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231128092850.1545199-3-jolsa@kernel.org \
    --to=jolsa@kernel.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=iii@linux.ibm.com \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@chromium.org \
    --cc=lee@kernel.org \
    --cc=maciej.fijalkowski@intel.com \
    --cc=nathan@kernel.org \
    --cc=pulehui@huawei.com \
    --cc=sdf@google.com \
    --cc=songliubraving@fb.com \
    --cc=syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com \
    --cc=will@kernel.org \
    --cc=xukuohai@huawei.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.