All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Song Liu <songliubraving@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Song Liu <song@kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	bpf <bpf@vger.kernel.org>, Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <Kernel-team@fb.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	Christoph Hellwig <hch@infradead.org>,
	Andrii Nakryiko <andrii@kernel.org>
Subject: Re: [PATCH bpf] bpf: invalidate unused part of bpf_prog_pack
Date: Fri, 22 Apr 2022 09:31:18 +0200	[thread overview]
Message-ID: <20220422073118.GR2731@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <8F788446-899C-4BA3-8236-612A94D98582@fb.com>

> > On Apr 21, 2022, at 3:30 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > I actually think bpf_arch_text_copy() is another horribly badly done thing.
> > 
> > It seems only implemented on x86 (I'm not sure how anything else is
> > supposed to work, I didn't go look), and there it is horribly badly
> > done, using __text_poke() that does all these magical things just to
> > make it atomic wrt concurrent code execution.
> > 
> > None of which is *AT*ALL* relevant for this case, since concurrent
> > code execution simply isn't a thing (and if it were, you would already
> > have lost).
> > 
> > And if that wasn't pointless enough, it does all that magic "map the
> > page writably at a different virtual address using poking_addr in
> > poking_mm" and a different address space entirely.
> > 
> > All of that is required for REAL KERNEL CODE.
> > 
> > But the thing is, for bpf_prog_pack, all of that is just completely
> > pointless and stupid complexity.

I think the point is that this hole will likely share a page with active
code, and as such there should not be a writable mapping mapping to it,
necessitating the whole __text_poke() mess.

That said; it does seem somewhat silly have a whole page worth of int3
around just for this.

Perhaps we can do something like the completely untested below?

---
 arch/x86/kernel/alternative.c | 48 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index d374cb3cf024..60afa9105307 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -994,7 +994,20 @@ static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
 __ro_after_init struct mm_struct *poking_mm;
 __ro_after_init unsigned long poking_addr;
 
-static void *__text_poke(void *addr, const void *opcode, size_t len)
+static void text_poke_memcpy(void *dst, const void *src, size_t len)
+{
+	memcpy(dst, src, len);
+}
+
+static void text_poke_memset(void *dst, const void *src, size_t len)
+{
+	int c = *(int *)src;
+	memset(dst, c, len);
+}
+
+typedef void text_poke_f(void *dst, const void *src, size_t len);
+
+static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t len)
 {
 	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
 	struct page *pages[2] = {NULL};
@@ -1059,7 +1072,7 @@ static void *__text_poke(void *addr, const void *opcode, size_t len)
 	prev = use_temporary_mm(poking_mm);
 
 	kasan_disable_current();
-	memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len);
+	func((void *)poking_addr + offset_in_page(addr), src, len);
 	kasan_enable_current();
 
 	/*
@@ -1091,7 +1104,8 @@ static void *__text_poke(void *addr, const void *opcode, size_t len)
 	 * If the text does not match what we just wrote then something is
 	 * fundamentally screwy; there's nothing we can really do about that.
 	 */
-	BUG_ON(memcmp(addr, opcode, len));
+	if (func == text_poke_memcpy)
+		BUG_ON(memcmp(addr, src, len));
 
 	local_irq_restore(flags);
 	pte_unmap_unlock(ptep, ptl);
@@ -1118,7 +1132,7 @@ void *text_poke(void *addr, const void *opcode, size_t len)
 {
 	lockdep_assert_held(&text_mutex);
 
-	return __text_poke(addr, opcode, len);
+	return __text_poke(text_poke_memcpy, addr, opcode, len);
 }
 
 /**
@@ -1137,7 +1151,7 @@ void *text_poke(void *addr, const void *opcode, size_t len)
  */
 void *text_poke_kgdb(void *addr, const void *opcode, size_t len)
 {
-	return __text_poke(addr, opcode, len);
+	return __text_poke(text_poke_memcpy, addr, opcode, len);
 }
 
 /**
@@ -1167,7 +1181,29 @@ void *text_poke_copy(void *addr, const void *opcode, size_t len)
 
 		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
 
-		__text_poke((void *)ptr, opcode + patched, s);
+		__text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, s);
+		patched += s;
+	}
+	mutex_unlock(&text_mutex);
+	return addr;
+}
+
+void *text_poke_set(void *addr, int c, size_t len)
+{
+	unsigned long start = (unsigned long)addr;
+	size_t patched = 0;
+
+	if (WARN_ON_ONCE(core_kernel_text(start)))
+		return NULL;
+
+	mutex_lock(&text_mutex);
+	while (patched < len) {
+		unsigned long ptr = start + patched;
+		size_t s;
+
+		s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched);
+
+		__text_poke(text_poke_memset, (void *)ptr, (void *)&c, s);
 		patched += s;
 	}
 	mutex_unlock(&text_mutex);

  parent reply	other threads:[~2022-04-22  7:31 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-21  7:22 [PATCH bpf] bpf: invalidate unused part of bpf_prog_pack Song Liu
2022-04-21 17:09 ` Linus Torvalds
2022-04-21 18:24   ` Alexei Starovoitov
2022-04-21 18:59     ` Linus Torvalds
2022-04-21 19:40       ` Song Liu
2022-04-21 21:28         ` Linus Torvalds
2022-04-21 21:52           ` Song Liu
2022-04-21 22:30             ` Linus Torvalds
2022-04-21 22:51               ` Song Liu
2022-04-21 23:10                 ` Linus Torvalds
2022-04-22  1:31                   ` Song Liu
2022-04-22  7:31                 ` Peter Zijlstra [this message]
2022-04-23  5:25                   ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220422073118.GR2731@worktop.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=Kernel-team@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=song@kernel.org \
    --cc=songliubraving@fb.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.