linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9] livepatch: Clear relocation targets on a module removal
@ 2023-01-18 20:47 Song Liu
  2023-01-18 22:08 ` Josh Poimboeuf
  2023-01-20 19:16 ` Josh Poimboeuf
  0 siblings, 2 replies; 11+ messages in thread
From: Song Liu @ 2023-01-18 20:47 UTC (permalink / raw)
  To: linux-kernel, linux-modules, live-patching
  Cc: x86, jpoimboe, jikos, pmladek, joe.lawrence, Miroslav Benes,
	Song Liu, Josh Poimboeuf

From: Miroslav Benes <mbenes@suse.cz>

Josh reported a bug:

  When the object to be patched is a module, and that module is
  rmmod'ed and reloaded, it fails to load with:

  module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
  livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
  livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'

  The livepatch module has a relocation which references a symbol
  in the _previous_ loading of nfsd. When apply_relocate_add()
  tries to replace the old relocation with a new one, it sees that
  the previous one is nonzero and it errors out.

  On ppc64le, we have a similar issue:

  module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
  livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
  livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'

He also proposed three different solutions. We could remove the error
check in apply_relocate_add() introduced by commit eda9cec4c9a1
("x86/module: Detect and skip invalid relocations"). However the check
is useful for detecting corrupted modules.

We could also deny the patched modules to be removed. If it proved to be
a major drawback for users, we could still implement a different
approach. The solution would also complicate the existing code a lot.

We thus decided to reverse the relocation patching (clear all relocation
targets on x86_64). The solution is not
universal and is too much arch-specific, but it may prove to be simpler
in the end.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Co-developed-by: Song Liu <song@kernel.org>
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>

---

NOTE: powerpc32 code is only compile tested.

Changes v8 => v9:
1. Fix overflow check for R_X86_64_PC32 and R_X86_64_PLT32. (Petr Mladek)

Changes v7 = v8:
1. Remove the logic in powerpc/kernel/module_64.c, as there is ongoing
   discussions.
2. For x86_64, add check for expected value during clear_relocate_add().
   (Petr Mladek)
3. Optimize the logic in klp_write_section_relocs(). (Petr Mladek)
4. Optimize __write_relocate_add (x86_64). (Joe Lawrence)

Changes v6 = v7:
1. Reduce code duplication in livepatch/core.c and x86/kernel/module.c.
2. Add more comments to powerpc/kernel/module_64.c.
3. Added Joe's Tested-by (which I should have added in v6).

Changes v5 = v6:
1. Fix powerpc64.
2. Fix compile for powerpc32.

Changes v4 = v5:
1. Fix compile with powerpc.

Changes v3 = v4:
1. Reuse __apply_relocate_add to make it more reliable in long term.
   (Josh Poimboeuf)
2. Add back ppc64 logic from v2, with changes to match current code.
   (Josh Poimboeuf)

Changes v2 => v3:
1. Rewrite x86 changes to match current code style.
2. Remove powerpc changes as there is no test coverage in v3.
3. Only keep 1/3 of v2.

v2: https://lore.kernel.org/all/20190905124514.8944-1-mbenes@suse.cz/T/#u

fix
---
 arch/powerpc/kernel/module_32.c | 10 ++++
 arch/powerpc/kernel/module_64.c | 10 ++++
 arch/s390/kernel/module.c       |  8 +++
 arch/x86/kernel/module.c        | 92 +++++++++++++++++++++------------
 include/linux/moduleloader.h    |  7 +++
 kernel/livepatch/core.c         | 54 ++++++++++++++-----
 6 files changed, 134 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index ea6536171778..e3c312770453 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -285,6 +285,16 @@ int apply_relocate_add(Elf32_Shdr *sechdrs,
 	return 0;
 }
 
+#ifdef CONFIG_LIVEPATCH
+void clear_relocate_add(Elf32_Shdr *sechdrs,
+		   const char *strtab,
+		   unsigned int symindex,
+		   unsigned int relsec,
+		   struct module *me)
+{
+}
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 notrace int module_trampoline_target(struct module *mod, unsigned long addr,
 				     unsigned long *target)
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index ff045644f13f..1096d6b3a62c 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -749,6 +749,16 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 	return 0;
 }
 
+#ifdef CONFIG_LIVEPATCH
+void clear_relocate_add(Elf64_Shdr *sechdrs,
+		       const char *strtab,
+		       unsigned int symindex,
+		       unsigned int relsec,
+		       struct module *me)
+{
+}
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 int module_trampoline_target(struct module *mod, unsigned long addr,
 			     unsigned long *target)
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 2d159b32885b..cc6784fbc1ac 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -500,6 +500,14 @@ static int module_alloc_ftrace_hotpatch_trampolines(struct module *me,
 }
 #endif /* CONFIG_FUNCTION_TRACER */
 
+#ifdef CONFIG_LIVEPATCH
+void clear_relocate_add(Elf64_Shdr *sechdrs, const char *strtab,
+			unsigned int symindex, unsigned int relsec,
+			struct module *me)
+{
+}
+#endif
+
 int module_finalize(const Elf_Ehdr *hdr,
 		    const Elf_Shdr *sechdrs,
 		    struct module *me)
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 705fb2a41d7d..034969ad7533 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -129,22 +129,27 @@ int apply_relocate(Elf32_Shdr *sechdrs,
 	return 0;
 }
 #else /*X86_64*/
-static int __apply_relocate_add(Elf64_Shdr *sechdrs,
+static int __write_relocate_add(Elf64_Shdr *sechdrs,
 		   const char *strtab,
 		   unsigned int symindex,
 		   unsigned int relsec,
 		   struct module *me,
-		   void *(*write)(void *dest, const void *src, size_t len))
+		   void *(*write)(void *dest, const void *src, size_t len),
+		   bool apply)
 {
 	unsigned int i;
 	Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
 	Elf64_Sym *sym;
 	void *loc;
 	u64 val;
+	u64 zero = 0ULL;
 
-	DEBUGP("Applying relocate section %u to %u\n",
+	DEBUGP("%s relocate section %u to %u\n",
+	       (apply) ? "Applying" : "Clearing",
 	       relsec, sechdrs[relsec].sh_info);
 	for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) {
+		int write_size = 4;
+
 		/* This is where to make the change */
 		loc = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
 			+ rel[i].r_offset;
@@ -162,56 +167,53 @@ static int __apply_relocate_add(Elf64_Shdr *sechdrs,
 
 		switch (ELF64_R_TYPE(rel[i].r_info)) {
 		case R_X86_64_NONE:
-			break;
+			continue;  /* nothing to write */
 		case R_X86_64_64:
-			if (*(u64 *)loc != 0)
-				goto invalid_relocation;
-			write(loc, &val, 8);
+			write_size = 8;
 			break;
 		case R_X86_64_32:
-			if (*(u32 *)loc != 0)
-				goto invalid_relocation;
-			write(loc, &val, 4);
-			if (val != *(u32 *)loc)
+			if (val != *(u32 *)&val)
 				goto overflow;
 			break;
 		case R_X86_64_32S:
-			if (*(s32 *)loc != 0)
-				goto invalid_relocation;
-			write(loc, &val, 4);
-			if ((s64)val != *(s32 *)loc)
+			if ((s64)val != *(s32 *)&val)
 				goto overflow;
 			break;
 		case R_X86_64_PC32:
 		case R_X86_64_PLT32:
-			if (*(u32 *)loc != 0)
-				goto invalid_relocation;
 			val -= (u64)loc;
-			write(loc, &val, 4);
 #if 0
-			if ((s64)val != *(s32 *)loc)
+			if ((s64)val != *(s32 *)&val)
 				goto overflow;
 #endif
 			break;
 		case R_X86_64_PC64:
-			if (*(u64 *)loc != 0)
-				goto invalid_relocation;
 			val -= (u64)loc;
-			write(loc, &val, 8);
+			write_size = 8;
 			break;
 		default:
 			pr_err("%s: Unknown rela relocation: %llu\n",
 			       me->name, ELF64_R_TYPE(rel[i].r_info));
 			return -ENOEXEC;
 		}
+
+		if (apply) {
+			if (memcmp(loc, &zero, write_size)) {
+				pr_err("x86/modules: Skipping invalid relocation target, existing value is nonzero for type %d, loc %p, val %Lx\n",
+				       (int)ELF64_R_TYPE(rel[i].r_info), loc, val);
+				return -ENOEXEC;
+			}
+			write(loc, &val, write_size);
+		} else {
+			if (memcmp(loc, &val, write_size)) {
+				pr_warn("x86/modules: Clearing invalid relocation target, existing value does not match expected value for type %d, loc %p, val %Lx\n",
+					(int)ELF64_R_TYPE(rel[i].r_info), loc, val);
+			}
+			write(loc, &zero, write_size);
+		}
 	}
 	return 0;
 
-invalid_relocation:
-	pr_err("x86/modules: Skipping invalid relocation target, existing value is nonzero for type %d, loc %p, val %Lx\n",
-	       (int)ELF64_R_TYPE(rel[i].r_info), loc, val);
-	return -ENOEXEC;
-
 overflow:
 	pr_err("overflow in relocation type %d val %Lx\n",
 	       (int)ELF64_R_TYPE(rel[i].r_info), val);
@@ -220,11 +222,12 @@ static int __apply_relocate_add(Elf64_Shdr *sechdrs,
 	return -ENOEXEC;
 }
 
-int apply_relocate_add(Elf64_Shdr *sechdrs,
-		   const char *strtab,
-		   unsigned int symindex,
-		   unsigned int relsec,
-		   struct module *me)
+static int write_relocate_add(Elf64_Shdr *sechdrs,
+			      const char *strtab,
+			      unsigned int symindex,
+			      unsigned int relsec,
+			      struct module *me,
+			      bool apply)
 {
 	int ret;
 	bool early = me->state == MODULE_STATE_UNFORMED;
@@ -235,8 +238,8 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 		mutex_lock(&text_mutex);
 	}
 
-	ret = __apply_relocate_add(sechdrs, strtab, symindex, relsec, me,
-				   write);
+	ret = __write_relocate_add(sechdrs, strtab, symindex, relsec, me,
+				   write, apply);
 
 	if (!early) {
 		text_poke_sync();
@@ -246,6 +249,27 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 	return ret;
 }
 
+int apply_relocate_add(Elf64_Shdr *sechdrs,
+		   const char *strtab,
+		   unsigned int symindex,
+		   unsigned int relsec,
+		   struct module *me)
+{
+	return write_relocate_add(sechdrs, strtab, symindex, relsec, me, true);
+}
+
+#ifdef CONFIG_LIVEPATCH
+
+void clear_relocate_add(Elf64_Shdr *sechdrs,
+			const char *strtab,
+			unsigned int symindex,
+			unsigned int relsec,
+			struct module *me)
+{
+	write_relocate_add(sechdrs, strtab, symindex, relsec, me, false);
+}
+#endif
+
 #endif
 
 int module_finalize(const Elf_Ehdr *hdr,
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 7b4587a19189..0b54ec9856df 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -75,6 +75,13 @@ int apply_relocate_add(Elf_Shdr *sechdrs,
 		       unsigned int symindex,
 		       unsigned int relsec,
 		       struct module *mod);
+#ifdef CONFIG_LIVEPATCH
+void clear_relocate_add(Elf_Shdr *sechdrs,
+		   const char *strtab,
+		   unsigned int symindex,
+		   unsigned int relsec,
+		   struct module *me);
+#endif
 #else
 static inline int apply_relocate_add(Elf_Shdr *sechdrs,
 				     const char *strtab,
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 201f0c0482fb..c72f378181ce 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -291,10 +291,10 @@ static int klp_resolve_symbols(Elf_Shdr *sechdrs, const char *strtab,
  *    the to-be-patched module to be loaded and patched sometime *after* the
  *    klp module is loaded.
  */
-int klp_apply_section_relocs(struct module *pmod, Elf_Shdr *sechdrs,
-			     const char *shstrtab, const char *strtab,
-			     unsigned int symndx, unsigned int secndx,
-			     const char *objname)
+static int klp_write_section_relocs(struct module *pmod, Elf_Shdr *sechdrs,
+				    const char *shstrtab, const char *strtab,
+				    unsigned int symndx, unsigned int secndx,
+				    const char *objname, bool apply)
 {
 	int cnt, ret;
 	char sec_objname[MODULE_NAME_LEN];
@@ -316,11 +316,26 @@ int klp_apply_section_relocs(struct module *pmod, Elf_Shdr *sechdrs,
 	if (strcmp(objname ? objname : "vmlinux", sec_objname))
 		return 0;
 
-	ret = klp_resolve_symbols(sechdrs, strtab, symndx, sec, sec_objname);
-	if (ret)
-		return ret;
+	if (apply) {
+		ret = klp_resolve_symbols(sechdrs, strtab, symndx,
+					  sec, sec_objname);
+		if (ret)
+			return ret;
 
-	return apply_relocate_add(sechdrs, strtab, symndx, secndx, pmod);
+		return apply_relocate_add(sechdrs, strtab, symndx, secndx, pmod);
+	}
+
+	clear_relocate_add(sechdrs, strtab, symndx, secndx, pmod);
+	return 0;
+}
+
+int klp_apply_section_relocs(struct module *pmod, Elf_Shdr *sechdrs,
+			     const char *shstrtab, const char *strtab,
+			     unsigned int symndx, unsigned int secndx,
+			     const char *objname)
+{
+	return klp_write_section_relocs(pmod, sechdrs, shstrtab, strtab, symndx,
+					secndx, objname, true);
 }
 
 /*
@@ -769,8 +784,9 @@ static int klp_init_func(struct klp_object *obj, struct klp_func *func)
 			   func->old_sympos ? func->old_sympos : 1);
 }
 
-static int klp_apply_object_relocs(struct klp_patch *patch,
-				   struct klp_object *obj)
+static int klp_write_object_relocs(struct klp_patch *patch,
+				   struct klp_object *obj,
+				   bool apply)
 {
 	int i, ret;
 	struct klp_modinfo *info = patch->mod->klp_info;
@@ -781,10 +797,10 @@ static int klp_apply_object_relocs(struct klp_patch *patch,
 		if (!(sec->sh_flags & SHF_RELA_LIVEPATCH))
 			continue;
 
-		ret = klp_apply_section_relocs(patch->mod, info->sechdrs,
+		ret = klp_write_section_relocs(patch->mod, info->sechdrs,
 					       info->secstrings,
 					       patch->mod->core_kallsyms.strtab,
-					       info->symndx, i, obj->name);
+					       info->symndx, i, obj->name, apply);
 		if (ret)
 			return ret;
 	}
@@ -792,6 +808,18 @@ static int klp_apply_object_relocs(struct klp_patch *patch,
 	return 0;
 }
 
+static int klp_apply_object_relocs(struct klp_patch *patch,
+				   struct klp_object *obj)
+{
+	return klp_write_object_relocs(patch, obj, true);
+}
+
+static void klp_clear_object_relocs(struct klp_patch *patch,
+				    struct klp_object *obj)
+{
+	klp_write_object_relocs(patch, obj, false);
+}
+
 /* parts of the initialization that is done only when the object is loaded */
 static int klp_init_object_loaded(struct klp_patch *patch,
 				  struct klp_object *obj)
@@ -1179,7 +1207,7 @@ static void klp_cleanup_module_patches_limited(struct module *mod,
 			klp_unpatch_object(obj);
 
 			klp_post_unpatch_callback(obj);
-
+			klp_clear_object_relocs(patch, obj);
 			klp_free_object_loaded(obj);
 			break;
 		}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v9] livepatch: Clear relocation targets on a module removal
  2023-01-18 20:47 [PATCH v9] livepatch: Clear relocation targets on a module removal Song Liu
@ 2023-01-18 22:08 ` Josh Poimboeuf
  2023-01-19 19:06   ` Song Liu
  2023-01-20 19:16 ` Josh Poimboeuf
  1 sibling, 1 reply; 11+ messages in thread
From: Josh Poimboeuf @ 2023-01-18 22:08 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, linux-modules, live-patching, x86, jikos, pmladek,
	joe.lawrence, Miroslav Benes, Josh Poimboeuf

On Wed, Jan 18, 2023 at 12:47:28PM -0800, Song Liu wrote:
> From: Miroslav Benes <mbenes@suse.cz>
> 
> Josh reported a bug:
> 
>   When the object to be patched is a module, and that module is
>   rmmod'ed and reloaded, it fails to load with:
> 
>   module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
>   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
>   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> 
>   The livepatch module has a relocation which references a symbol
>   in the _previous_ loading of nfsd. When apply_relocate_add()
>   tries to replace the old relocation with a new one, it sees that
>   the previous one is nonzero and it errors out.
> 
>   On ppc64le, we have a similar issue:
> 
>   module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
>   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
>   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'

Shouldn't there also be a fix for this powerpc issue?

-- 
Josh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9] livepatch: Clear relocation targets on a module removal
  2023-01-18 22:08 ` Josh Poimboeuf
@ 2023-01-19 19:06   ` Song Liu
  2023-01-20  6:42     ` Josh Poimboeuf
  0 siblings, 1 reply; 11+ messages in thread
From: Song Liu @ 2023-01-19 19:06 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, linux-modules, live-patching, x86, jikos, pmladek,
	joe.lawrence, Miroslav Benes, Josh Poimboeuf

On Wed, Jan 18, 2023 at 2:08 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Wed, Jan 18, 2023 at 12:47:28PM -0800, Song Liu wrote:
> > From: Miroslav Benes <mbenes@suse.cz>
> >
> > Josh reported a bug:
> >
> >   When the object to be patched is a module, and that module is
> >   rmmod'ed and reloaded, it fails to load with:
> >
> >   module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> >   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> >   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> >
> >   The livepatch module has a relocation which references a symbol
> >   in the _previous_ loading of nfsd. When apply_relocate_add()
> >   tries to replace the old relocation with a new one, it sees that
> >   the previous one is nonzero and it errors out.
> >
> >   On ppc64le, we have a similar issue:
> >
> >   module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> >   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> >   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>
> Shouldn't there also be a fix for this powerpc issue?

There was a working version, but it was not very clean. We couldn't agree
on the path forward for powerpc, so we are hoping to ship the fix to x86 (and
s390?) first [1].

Thanks,
Song

[1] https://lore.kernel.org/live-patching/Y7hLvpHqgY0oJ4GY@alley/#t

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9] livepatch: Clear relocation targets on a module removal
  2023-01-19 19:06   ` Song Liu
@ 2023-01-20  6:42     ` Josh Poimboeuf
  2023-01-20 17:03       ` Song Liu
  2023-01-24 20:08       ` Joe Lawrence
  0 siblings, 2 replies; 11+ messages in thread
From: Josh Poimboeuf @ 2023-01-20  6:42 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, linux-modules, live-patching, x86, jikos, pmladek,
	joe.lawrence, Miroslav Benes, Josh Poimboeuf

On Thu, Jan 19, 2023 at 11:06:35AM -0800, Song Liu wrote:
> On Wed, Jan 18, 2023 at 2:08 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > On Wed, Jan 18, 2023 at 12:47:28PM -0800, Song Liu wrote:
> > > From: Miroslav Benes <mbenes@suse.cz>
> > >
> > > Josh reported a bug:
> > >
> > >   When the object to be patched is a module, and that module is
> > >   rmmod'ed and reloaded, it fails to load with:
> > >
> > >   module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> > >   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > >   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > >
> > >   The livepatch module has a relocation which references a symbol
> > >   in the _previous_ loading of nfsd. When apply_relocate_add()
> > >   tries to replace the old relocation with a new one, it sees that
> > >   the previous one is nonzero and it errors out.
> > >
> > >   On ppc64le, we have a similar issue:
> > >
> > >   module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> > >   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > >   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> >
> > Shouldn't there also be a fix for this powerpc issue?
> 
> There was a working version, but it was not very clean. We couldn't agree
> on the path forward for powerpc, so we are hoping to ship the fix to x86 (and
> s390?) first [1].

Sorry for coming in late, I was on leave so I missed a lot of the
discussions on previous versions.  The decision to leave powerpc broken
wasn't clear from reading the commit message.  The bug is mentioned, and
the fix is implied, but surprisingly there's no fix.

I agree that the powerpc fix should be in a separate patch, but I still
don't feel comfortable merging the x86 fix without the corresponding
powerpc fix.

powerpc is a major arch and not a second-class citizen.  If we don't fix
it now then it'll probably never get fixed until it blows up in the real
world.

For powerpc, instead of clearing, how about just "fixing" the warning
site, something like so (untested)?


diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 1096d6b3a62c..1a12463ba674 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -499,9 +499,11 @@ static unsigned long stub_for_addr(const Elf64_Shdr *sechdrs,
 
 /* We expect a noop next: if it is, replace it with instruction to
    restore r2. */
-static int restore_r2(const char *name, u32 *instruction, struct module *me)
+static int restore_r2(const char *name, u32 *instruction, struct module *me,
+		      bool klp_sym)
 {
 	u32 *prev_insn = instruction - 1;
+	u32 insn_val = *instruction;
 
 	if (is_mprofile_ftrace_call(name))
 		return 1;
@@ -514,9 +516,18 @@ static int restore_r2(const char *name, u32 *instruction, struct module *me)
 	if (!instr_is_relative_link_branch(ppc_inst(*prev_insn)))
 		return 1;
 
-	if (*instruction != PPC_RAW_NOP()) {
+	/*
+	 * For a livepatch relocation, the restore r2 instruction might have
+	 * been previously written if the relocation references a symbol in a
+	 * module which was unloaded and is now being reloaded.  In that case,
+	 * skip the warning and instruction write.
+	 */
+	if (klp_sym && insn_val == PPC_INST_LD_TOC)
+		return 0;
+
+	if (insn_val != PPC_RAW_NOP()) {
 		pr_err("%s: Expected nop after call, got %08x at %pS\n",
-			me->name, *instruction, instruction);
+			me->name, insn_val, instruction);
 		return 0;
 	}
 
@@ -649,7 +660,8 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 				if (!value)
 					return -ENOENT;
 				if (!restore_r2(strtab + sym->st_name,
-							(u32 *)location + 1, me))
+						(u32 *)location + 1, me,
+						sym->st_shndx == SHN_LIVEPATCH))
 					return -ENOEXEC;
 			} else
 				value += local_entry_offset(sym);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v9] livepatch: Clear relocation targets on a module removal
  2023-01-20  6:42     ` Josh Poimboeuf
@ 2023-01-20 17:03       ` Song Liu
  2023-01-20 17:28         ` Josh Poimboeuf
  2023-01-24 20:08       ` Joe Lawrence
  1 sibling, 1 reply; 11+ messages in thread
From: Song Liu @ 2023-01-20 17:03 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, linux-modules, live-patching, x86, jikos, pmladek,
	joe.lawrence, Miroslav Benes, Josh Poimboeuf

On Thu, Jan 19, 2023 at 10:42 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Thu, Jan 19, 2023 at 11:06:35AM -0800, Song Liu wrote:
> > On Wed, Jan 18, 2023 at 2:08 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > >
> > > On Wed, Jan 18, 2023 at 12:47:28PM -0800, Song Liu wrote:
> > > > From: Miroslav Benes <mbenes@suse.cz>
> > > >
> > > > Josh reported a bug:
> > > >
> > > >   When the object to be patched is a module, and that module is
> > > >   rmmod'ed and reloaded, it fails to load with:
> > > >
> > > >   module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> > > >   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > >   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > > >
> > > >   The livepatch module has a relocation which references a symbol
> > > >   in the _previous_ loading of nfsd. When apply_relocate_add()
> > > >   tries to replace the old relocation with a new one, it sees that
> > > >   the previous one is nonzero and it errors out.
> > > >
> > > >   On ppc64le, we have a similar issue:
> > > >
> > > >   module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> > > >   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > > >   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> > >
> > > Shouldn't there also be a fix for this powerpc issue?
> >
> > There was a working version, but it was not very clean. We couldn't agree
> > on the path forward for powerpc, so we are hoping to ship the fix to x86 (and
> > s390?) first [1].
>
> Sorry for coming in late, I was on leave so I missed a lot of the
> discussions on previous versions.  The decision to leave powerpc broken
> wasn't clear from reading the commit message.  The bug is mentioned, and
> the fix is implied, but surprisingly there's no fix.
>
> I agree that the powerpc fix should be in a separate patch, but I still
> don't feel comfortable merging the x86 fix without the corresponding
> powerpc fix.
>
> powerpc is a major arch and not a second-class citizen.  If we don't fix
> it now then it'll probably never get fixed until it blows up in the real
> world.
>
> For powerpc, instead of clearing, how about just "fixing" the warning
> site, something like so (untested)?

This version looks reasonable to me.

Thanks,
Song

>
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 1096d6b3a62c..1a12463ba674 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -499,9 +499,11 @@ static unsigned long stub_for_addr(const Elf64_Shdr *sechdrs,
>
>  /* We expect a noop next: if it is, replace it with instruction to
>     restore r2. */
> -static int restore_r2(const char *name, u32 *instruction, struct module *me)
> +static int restore_r2(const char *name, u32 *instruction, struct module *me,
> +                     bool klp_sym)
>  {
>         u32 *prev_insn = instruction - 1;
> +       u32 insn_val = *instruction;
>
>         if (is_mprofile_ftrace_call(name))
>                 return 1;
> @@ -514,9 +516,18 @@ static int restore_r2(const char *name, u32 *instruction, struct module *me)
>         if (!instr_is_relative_link_branch(ppc_inst(*prev_insn)))
>                 return 1;
>
> -       if (*instruction != PPC_RAW_NOP()) {
> +       /*
> +        * For a livepatch relocation, the restore r2 instruction might have
> +        * been previously written if the relocation references a symbol in a
> +        * module which was unloaded and is now being reloaded.  In that case,
> +        * skip the warning and instruction write.
> +        */
> +       if (klp_sym && insn_val == PPC_INST_LD_TOC)
> +               return 0;
> +
> +       if (insn_val != PPC_RAW_NOP()) {
>                 pr_err("%s: Expected nop after call, got %08x at %pS\n",
> -                       me->name, *instruction, instruction);
> +                       me->name, insn_val, instruction);
>                 return 0;
>         }
>
> @@ -649,7 +660,8 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>                                 if (!value)
>                                         return -ENOENT;
>                                 if (!restore_r2(strtab + sym->st_name,
> -                                                       (u32 *)location + 1, me))
> +                                               (u32 *)location + 1, me,
> +                                               sym->st_shndx == SHN_LIVEPATCH))
>                                         return -ENOEXEC;
>                         } else
>                                 value += local_entry_offset(sym);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9] livepatch: Clear relocation targets on a module removal
  2023-01-20 17:03       ` Song Liu
@ 2023-01-20 17:28         ` Josh Poimboeuf
  0 siblings, 0 replies; 11+ messages in thread
From: Josh Poimboeuf @ 2023-01-20 17:28 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, linux-modules, live-patching, x86, jikos, pmladek,
	joe.lawrence, Miroslav Benes, Josh Poimboeuf

On Fri, Jan 20, 2023 at 09:03:55AM -0800, Song Liu wrote:
> > > > Shouldn't there also be a fix for this powerpc issue?
> > >
> > > There was a working version, but it was not very clean. We couldn't agree
> > > on the path forward for powerpc, so we are hoping to ship the fix to x86 (and
> > > s390?) first [1].
> >
> > Sorry for coming in late, I was on leave so I missed a lot of the
> > discussions on previous versions.  The decision to leave powerpc broken
> > wasn't clear from reading the commit message.  The bug is mentioned, and
> > the fix is implied, but surprisingly there's no fix.
> >
> > I agree that the powerpc fix should be in a separate patch, but I still
> > don't feel comfortable merging the x86 fix without the corresponding
> > powerpc fix.
> >
> > powerpc is a major arch and not a second-class citizen.  If we don't fix
> > it now then it'll probably never get fixed until it blows up in the real
> > world.
> >
> > For powerpc, instead of clearing, how about just "fixing" the warning
> > site, something like so (untested)?
> 
> This version looks reasonable to me.

Ok, I'll run it through testing and work up a proper patch.  I just
noticed the one I posted has a major bug thanks to restore_r2()'s
surprising return semantics.

-- 
Josh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9] livepatch: Clear relocation targets on a module removal
  2023-01-18 20:47 [PATCH v9] livepatch: Clear relocation targets on a module removal Song Liu
  2023-01-18 22:08 ` Josh Poimboeuf
@ 2023-01-20 19:16 ` Josh Poimboeuf
  2023-01-20 19:41   ` Song Liu
  1 sibling, 1 reply; 11+ messages in thread
From: Josh Poimboeuf @ 2023-01-20 19:16 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, linux-modules, live-patching, x86, jikos, pmladek,
	joe.lawrence, Miroslav Benes, Josh Poimboeuf

On Wed, Jan 18, 2023 at 12:47:28PM -0800, Song Liu wrote:
> From: Miroslav Benes <mbenes@suse.cz>
> 
> Josh reported a bug:
> 
>   When the object to be patched is a module, and that module is
>   rmmod'ed and reloaded, it fails to load with:
> 
>   module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
>   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
>   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> 
>   The livepatch module has a relocation which references a symbol
>   in the _previous_ loading of nfsd. When apply_relocate_add()
>   tries to replace the old relocation with a new one, it sees that
>   the previous one is nonzero and it errors out.

Should we add a selftest to make sure this problem doesn't come back?

>   On ppc64le, we have a similar issue:
> 
>   module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
>   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
>   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'

Just to clarify my previous comment, the reference to the powerpc issue
should be removed because it'll be fixed in a separate patch.

> He also proposed three different solutions. We could remove the error
> check in apply_relocate_add() introduced by commit eda9cec4c9a1
> ("x86/module: Detect and skip invalid relocations"). However the check
> is useful for detecting corrupted modules.
> 
> We could also deny the patched modules to be removed. If it proved to be
> a major drawback for users, we could still implement a different
> approach. The solution would also complicate the existing code a lot.
> 
> We thus decided to reverse the relocation patching (clear all relocation
> targets on x86_64). The solution is not
> universal and is too much arch-specific, but it may prove to be simpler
> in the end.
> 
> Signed-off-by: Miroslav Benes <mbenes@suse.cz>
> Signed-off-by: Song Liu <song@kernel.org>
> Acked-by: Miroslav Benes <mbenes@suse.cz>
> Co-developed-by: Song Liu <song@kernel.org>
> Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>

According to submitting-patches.rst the Co-developed-by is supposed to
be immediately before your Signed-off-by.

Also, other than the commit log, this patch is almost completely
unrecognizable compared to Miroslav's original patch.  Does it still
make sense for him to be listed as the author?

In the -tip tree they sometimes use an Originally-by tag, which might be
relevant here.

> -static int __apply_relocate_add(Elf64_Shdr *sechdrs,
> +static int __write_relocate_add(Elf64_Shdr *sechdrs,
>  		   const char *strtab,
>  		   unsigned int symindex,
>  		   unsigned int relsec,
>  		   struct module *me,
> -		   void *(*write)(void *dest, const void *src, size_t len))
> +		   void *(*write)(void *dest, const void *src, size_t len),
> +		   bool apply)
>  {
>  	unsigned int i;
>  	Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
>  	Elf64_Sym *sym;
>  	void *loc;
>  	u64 val;
> +	u64 zero = 0ULL;
>  
> -	DEBUGP("Applying relocate section %u to %u\n",
> +	DEBUGP("%s relocate section %u to %u\n",
> +	       (apply) ? "Applying" : "Clearing",

"(apply)" has unnecessary parentheses.

>  	       relsec, sechdrs[relsec].sh_info);
>  	for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) {
> +		int write_size = 4;

We already know we're writing, I suggest just calling it 'size' to be
more consistent with kernel style.

Also, it can be an unsigned value like size_t.

Also, instead of initializing it here with a potentially bogus value
which may need to be overwritten for a 64-bit reloc, it's better to
explicitly set the size in each individual case below.  That's makes the
logic clearer and helps prevent future bugs if new 64-bit relocation
types are added later.

>  		case R_X86_64_PC32:
>  		case R_X86_64_PLT32:
> -			if (*(u32 *)loc != 0)
> -				goto invalid_relocation;
>  			val -= (u64)loc;
> -			write(loc, &val, 4);
>  #if 0
> -			if ((s64)val != *(s32 *)loc)
> +			if ((s64)val != *(s32 *)&val)
>  				goto overflow;
>  #endif

Why is the compiled-out code getting changed?  Is it actually fixing a
hypothetical bug?

This code really needs to be removed anyway, it's been dead for at least
15 years.

>  			break;
>  		case R_X86_64_PC64:
> -			if (*(u64 *)loc != 0)
> -				goto invalid_relocation;
>  			val -= (u64)loc;
> -			write(loc, &val, 8);
> +			write_size = 8;
>  			break;
>  		default:
>  			pr_err("%s: Unknown rela relocation: %llu\n",
>  			       me->name, ELF64_R_TYPE(rel[i].r_info));
>  			return -ENOEXEC;
>  		}
> +
> +		if (apply) {
> +			if (memcmp(loc, &zero, write_size)) {
> +				pr_err("x86/modules: Skipping invalid relocation target, existing value is nonzero for type %d, loc %p, val %Lx\n",

It's not just "skipping", it's erroring out completely.  Yes, it's is a
pre-existing error message but we might as well improve it while
touching it.

Maybe just remove the "Skipping", i.e. change "Skipping invalid ..." to
"Invalid ..." ?

> +				       (int)ELF64_R_TYPE(rel[i].r_info), loc, val);
> +				return -ENOEXEC;
> +			}
> +			write(loc, &val, write_size);
> +		} else {
> +			if (memcmp(loc, &val, write_size)) {
> +				pr_warn("x86/modules: Clearing invalid relocation target, existing value does not match expected value for type %d, loc %p, val %Lx\n",
> +					(int)ELF64_R_TYPE(rel[i].r_info), loc, val);
> +			}
> +			write(loc, &zero, write_size);

If the value doesn't match then something has gone badly wrong.  Why go
ahead with the clearing in that case?

> +#ifdef CONFIG_LIVEPATCH
> +
> +void clear_relocate_add(Elf64_Shdr *sechdrs,
> +			const char *strtab,
> +			unsigned int symindex,
> +			unsigned int relsec,
> +			struct module *me)
> +{
> +	write_relocate_add(sechdrs, strtab, symindex, relsec, me, false);
> +}
> +#endif

Superflous newline after the '#ifdef CONFIG_LIVEPATCH'.

> +
>  #endif
>  
>  int module_finalize(const Elf_Ehdr *hdr,
> diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
> index 7b4587a19189..0b54ec9856df 100644
> --- a/include/linux/moduleloader.h
> +++ b/include/linux/moduleloader.h
> @@ -75,6 +75,13 @@ int apply_relocate_add(Elf_Shdr *sechdrs,
>  		       unsigned int symindex,
>  		       unsigned int relsec,
>  		       struct module *mod);
> +#ifdef CONFIG_LIVEPATCH

This could use a comment explaining the purpose of this function:

/*
 * Some architectures (namely x86_64 and ppc64) perform sanity checks when
 * applying relocations.  If a patched module gets unloaded and then later
 * reloaded (and re-patched), klp re-applies relocations to the replacement
 * function(s).  Any leftover relocations from the previous loading of the
 * patched module might trigger the sanity checks.
 *
 * To prevent that, when unloading a patched module, clear out any relocations
 * that might trigger arch-specific sanity checks on a future module reload.
 */

> +void clear_relocate_add(Elf_Shdr *sechdrs,
> +		   const char *strtab,
> +		   unsigned int symindex,
> +		   unsigned int relsec,
> +		   struct module *me);
> +#endif


-- 
Josh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9] livepatch: Clear relocation targets on a module removal
  2023-01-20 19:16 ` Josh Poimboeuf
@ 2023-01-20 19:41   ` Song Liu
  2023-01-20 20:32     ` Josh Poimboeuf
  0 siblings, 1 reply; 11+ messages in thread
From: Song Liu @ 2023-01-20 19:41 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, linux-modules, live-patching, x86, jikos, pmladek,
	joe.lawrence, Miroslav Benes, Josh Poimboeuf

On Fri, Jan 20, 2023 at 11:16 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Wed, Jan 18, 2023 at 12:47:28PM -0800, Song Liu wrote:
> > From: Miroslav Benes <mbenes@suse.cz>
> >
> > Josh reported a bug:
> >
> >   When the object to be patched is a module, and that module is
> >   rmmod'ed and reloaded, it fails to load with:
> >
> >   module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> >   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> >   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> >
> >   The livepatch module has a relocation which references a symbol
> >   in the _previous_ loading of nfsd. When apply_relocate_add()
> >   tries to replace the old relocation with a new one, it sees that
> >   the previous one is nonzero and it errors out.
>
> Should we add a selftest to make sure this problem doesn't come back?

IIRC, a selftest for this issue is not easy without Joe's klp-convert work.
At the moment I use kpatch-build for testing.

>
> >   On ppc64le, we have a similar issue:
> >
> >   module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> >   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> >   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>
> Just to clarify my previous comment, the reference to the powerpc issue
> should be removed because it'll be fixed in a separate patch.

Will remove.

>
> > He also proposed three different solutions. We could remove the error
> > check in apply_relocate_add() introduced by commit eda9cec4c9a1
> > ("x86/module: Detect and skip invalid relocations"). However the check
> > is useful for detecting corrupted modules.
> >
> > We could also deny the patched modules to be removed. If it proved to be
> > a major drawback for users, we could still implement a different
> > approach. The solution would also complicate the existing code a lot.
> >
> > We thus decided to reverse the relocation patching (clear all relocation
> > targets on x86_64). The solution is not
> > universal and is too much arch-specific, but it may prove to be simpler
> > in the end.
> >
> > Signed-off-by: Miroslav Benes <mbenes@suse.cz>
> > Signed-off-by: Song Liu <song@kernel.org>
> > Acked-by: Miroslav Benes <mbenes@suse.cz>
> > Co-developed-by: Song Liu <song@kernel.org>
> > Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
>
> According to submitting-patches.rst the Co-developed-by is supposed to
> be immediately before your Signed-off-by.
>
> Also, other than the commit log, this patch is almost completely
> unrecognizable compared to Miroslav's original patch.  Does it still
> make sense for him to be listed as the author?
>
> In the -tip tree they sometimes use an Originally-by tag, which might be
> relevant here.

How about:

Signed-off-by: Song Liu <song@kernel.org>
Originally-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>

>
> > -static int __apply_relocate_add(Elf64_Shdr *sechdrs,
> > +static int __write_relocate_add(Elf64_Shdr *sechdrs,
> >                  const char *strtab,
> >                  unsigned int symindex,
> >                  unsigned int relsec,
> >                  struct module *me,
> > -                void *(*write)(void *dest, const void *src, size_t len))
> > +                void *(*write)(void *dest, const void *src, size_t len),
> > +                bool apply)
> >  {
> >       unsigned int i;
> >       Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
> >       Elf64_Sym *sym;
> >       void *loc;
> >       u64 val;
> > +     u64 zero = 0ULL;
> >
> > -     DEBUGP("Applying relocate section %u to %u\n",
> > +     DEBUGP("%s relocate section %u to %u\n",
> > +            (apply) ? "Applying" : "Clearing",
>
> "(apply)" has unnecessary parentheses.
>
> >              relsec, sechdrs[relsec].sh_info);
> >       for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) {
> > +             int write_size = 4;
>
> We already know we're writing, I suggest just calling it 'size' to be
> more consistent with kernel style.
>
> Also, it can be an unsigned value like size_t.
>
> Also, instead of initializing it here with a potentially bogus value
> which may need to be overwritten for a 64-bit reloc, it's better to
> explicitly set the size in each individual case below.  That's makes the
> logic clearer and helps prevent future bugs if new 64-bit relocation
> types are added later.

Will fix in v10.

>
> >               case R_X86_64_PC32:
> >               case R_X86_64_PLT32:
> > -                     if (*(u32 *)loc != 0)
> > -                             goto invalid_relocation;
> >                       val -= (u64)loc;
> > -                     write(loc, &val, 4);
> >  #if 0
> > -                     if ((s64)val != *(s32 *)loc)
> > +                     if ((s64)val != *(s32 *)&val)
> >                               goto overflow;
> >  #endif
>
> Why is the compiled-out code getting changed?  Is it actually fixing a
> hypothetical bug?

I just changed them the same way as other cases (assuming we remove
the #if 0, of course).

>
> This code really needs to be removed anyway, it's been dead for at least
> 15 years.

Shall we remove it now? Within the same patch? Or with a preparation
patch?

>
> >                       break;
> >               case R_X86_64_PC64:
> > -                     if (*(u64 *)loc != 0)
> > -                             goto invalid_relocation;
> >                       val -= (u64)loc;
> > -                     write(loc, &val, 8);
> > +                     write_size = 8;
> >                       break;
> >               default:
> >                       pr_err("%s: Unknown rela relocation: %llu\n",
> >                              me->name, ELF64_R_TYPE(rel[i].r_info));
> >                       return -ENOEXEC;
> >               }
> > +
> > +             if (apply) {
> > +                     if (memcmp(loc, &zero, write_size)) {
> > +                             pr_err("x86/modules: Skipping invalid relocation target, existing value is nonzero for type %d, loc %p, val %Lx\n",
>
> It's not just "skipping", it's erroring out completely.  Yes, it's is a
> pre-existing error message but we might as well improve it while
> touching it.
>
> Maybe just remove the "Skipping", i.e. change "Skipping invalid ..." to
> "Invalid ..." ?

Sounds good to me.

>
> > +                                    (int)ELF64_R_TYPE(rel[i].r_info), loc, val);
> > +                             return -ENOEXEC;
> > +                     }
> > +                     write(loc, &val, write_size);
> > +             } else {
> > +                     if (memcmp(loc, &val, write_size)) {
> > +                             pr_warn("x86/modules: Clearing invalid relocation target, existing value does not match expected value for type %d, loc %p, val %Lx\n",
> > +                                     (int)ELF64_R_TYPE(rel[i].r_info), loc, val);
> > +                     }
> > +                     write(loc, &zero, write_size);
>
> If the value doesn't match then something has gone badly wrong.  Why go
> ahead with the clearing in that case?

We can pr_err() then return -ENOEXEC (?). But I guess we need to
handle the error case in:
  klp_cleanup_module_patches_limited()
  klp_module_coming()
  klp_module_going()
and all the functions that call klp_module_going().

This seems a big overkill to me...

Or do you mean we just skip the write()?

>
> > +#ifdef CONFIG_LIVEPATCH
> > +
> > +void clear_relocate_add(Elf64_Shdr *sechdrs,
> > +                     const char *strtab,
> > +                     unsigned int symindex,
> > +                     unsigned int relsec,
> > +                     struct module *me)
> > +{
> > +     write_relocate_add(sechdrs, strtab, symindex, relsec, me, false);
> > +}
> > +#endif
>
> Superflous newline after the '#ifdef CONFIG_LIVEPATCH'.
>
> > +
> >  #endif
> >
> >  int module_finalize(const Elf_Ehdr *hdr,
> > diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
> > index 7b4587a19189..0b54ec9856df 100644
> > --- a/include/linux/moduleloader.h
> > +++ b/include/linux/moduleloader.h
> > @@ -75,6 +75,13 @@ int apply_relocate_add(Elf_Shdr *sechdrs,
> >                      unsigned int symindex,
> >                      unsigned int relsec,
> >                      struct module *mod);
> > +#ifdef CONFIG_LIVEPATCH
>
> This could use a comment explaining the purpose of this function:
>
> /*
>  * Some architectures (namely x86_64 and ppc64) perform sanity checks when
>  * applying relocations.  If a patched module gets unloaded and then later
>  * reloaded (and re-patched), klp re-applies relocations to the replacement
>  * function(s).  Any leftover relocations from the previous loading of the
>  * patched module might trigger the sanity checks.
>  *
>  * To prevent that, when unloading a patched module, clear out any relocations
>  * that might trigger arch-specific sanity checks on a future module reload.
>  */

Sounds great. Will add this to the next version.

>
> > +void clear_relocate_add(Elf_Shdr *sechdrs,
> > +                const char *strtab,
> > +                unsigned int symindex,
> > +                unsigned int relsec,
> > +                struct module *me);
> > +#endif
>
>
> --
> Josh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9] livepatch: Clear relocation targets on a module removal
  2023-01-20 19:41   ` Song Liu
@ 2023-01-20 20:32     ` Josh Poimboeuf
  0 siblings, 0 replies; 11+ messages in thread
From: Josh Poimboeuf @ 2023-01-20 20:32 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, linux-modules, live-patching, x86, jikos, pmladek,
	joe.lawrence, Miroslav Benes, Josh Poimboeuf

On Fri, Jan 20, 2023 at 11:41:02AM -0800, Song Liu wrote:
> > >   The livepatch module has a relocation which references a symbol
> > >   in the _previous_ loading of nfsd. When apply_relocate_add()
> > >   tries to replace the old relocation with a new one, it sees that
> > >   the previous one is nonzero and it errors out.
> >
> > Should we add a selftest to make sure this problem doesn't come back?
> 
> IIRC, a selftest for this issue is not easy without Joe's klp-convert work.
> At the moment I use kpatch-build for testing.

Ah right, I remember that now.

> How about:
> 
> Signed-off-by: Song Liu <song@kernel.org>
> Originally-by: Miroslav Benes <mbenes@suse.cz>
> Acked-by: Miroslav Benes <mbenes@suse.cz>
> Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>

Yes, but the ordering looks off, I think it should be more like:

Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Originally-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>

And then make sure 'From:' is you.

BTW, this patch affects both livepatch and x86, so the subject prefix
should have "x86" added, something like:

  livepatch,x86: Clear relocations on module removal

> > This code really needs to be removed anyway, it's been dead for at least
> > 15 years.
> 
> Shall we remove it now? Within the same patch? Or with a preparation
> patch?
> 

A preparatory patch sounds good.

> > > +                                    (int)ELF64_R_TYPE(rel[i].r_info), loc, val);
> > > +                             return -ENOEXEC;
> > > +                     }
> > > +                     write(loc, &val, write_size);
> > > +             } else {
> > > +                     if (memcmp(loc, &val, write_size)) {
> > > +                             pr_warn("x86/modules: Clearing invalid relocation target, existing value does not match expected value for type %d, loc %p, val %Lx\n",
> > > +                                     (int)ELF64_R_TYPE(rel[i].r_info), loc, val);
> > > +                     }
> > > +                     write(loc, &zero, write_size);
> >
> > If the value doesn't match then something has gone badly wrong.  Why go
> > ahead with the clearing in that case?
> 
> We can pr_err() then return -ENOEXEC (?). But I guess we need to
> handle the error case in:
>   klp_cleanup_module_patches_limited()
>   klp_module_coming()
>   klp_module_going()
> and all the functions that call klp_module_going().
> 
> This seems a big overkill to me...
> 
> Or do you mean we just skip the write()?

At the very least, skip the write.

But I really think it should just break out of the loop and return an
error, there's no point in trying to continue clearing the rest of the
relocations if one of them failed.

It's probably fine for the callers to ignore the error, the module's
going to get unloaded regardless.

-- 
Josh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9] livepatch: Clear relocation targets on a module removal
  2023-01-20  6:42     ` Josh Poimboeuf
  2023-01-20 17:03       ` Song Liu
@ 2023-01-24 20:08       ` Joe Lawrence
  2023-01-24 22:32         ` Josh Poimboeuf
  1 sibling, 1 reply; 11+ messages in thread
From: Joe Lawrence @ 2023-01-24 20:08 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, linux-modules, live-patching, x86, jikos, pmladek,
	Miroslav Benes, Josh Poimboeuf, Song Liu

On 1/20/23 01:42, Josh Poimboeuf wrote:
> On Thu, Jan 19, 2023 at 11:06:35AM -0800, Song Liu wrote:
>> On Wed, Jan 18, 2023 at 2:08 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>>>
>>> On Wed, Jan 18, 2023 at 12:47:28PM -0800, Song Liu wrote:
>>>> From: Miroslav Benes <mbenes@suse.cz>
>>>>
>>>> Josh reported a bug:
>>>>
>>>>   When the object to be patched is a module, and that module is
>>>>   rmmod'ed and reloaded, it fails to load with:
>>>>
>>>>   module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
>>>>   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
>>>>   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>>>>
>>>>   The livepatch module has a relocation which references a symbol
>>>>   in the _previous_ loading of nfsd. When apply_relocate_add()
>>>>   tries to replace the old relocation with a new one, it sees that
>>>>   the previous one is nonzero and it errors out.
>>>>
>>>>   On ppc64le, we have a similar issue:
>>>>
>>>>   module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
>>>>   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
>>>>   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>>>
>>> Shouldn't there also be a fix for this powerpc issue?
>>
>> There was a working version, but it was not very clean. We couldn't agree
>> on the path forward for powerpc, so we are hoping to ship the fix to x86 (and
>> s390?) first [1].
> 
> Sorry for coming in late, I was on leave so I missed a lot of the
> discussions on previous versions.  The decision to leave powerpc broken
> wasn't clear from reading the commit message.  The bug is mentioned, and
> the fix is implied, but surprisingly there's no fix.
> 
> I agree that the powerpc fix should be in a separate patch, but I still
> don't feel comfortable merging the x86 fix without the corresponding
> powerpc fix.
> 
> powerpc is a major arch and not a second-class citizen.  If we don't fix
> it now then it'll probably never get fixed until it blows up in the real
> world.
> 
> For powerpc, instead of clearing, how about just "fixing" the warning
> site, something like so (untested)?
> 
> 
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 1096d6b3a62c..1a12463ba674 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -499,9 +499,11 @@ static unsigned long stub_for_addr(const Elf64_Shdr *sechdrs,
>  
>  /* We expect a noop next: if it is, replace it with instruction to
>     restore r2. */
> -static int restore_r2(const char *name, u32 *instruction, struct module *me)
> +static int restore_r2(const char *name, u32 *instruction, struct module *me,
> +		      bool klp_sym)
>  {
>  	u32 *prev_insn = instruction - 1;
> +	u32 insn_val = *instruction;
>  
>  	if (is_mprofile_ftrace_call(name))
>  		return 1;
> @@ -514,9 +516,18 @@ static int restore_r2(const char *name, u32 *instruction, struct module *me)
>  	if (!instr_is_relative_link_branch(ppc_inst(*prev_insn)))
>  		return 1;
>  
> -	if (*instruction != PPC_RAW_NOP()) {
> +	/*
> +	 * For a livepatch relocation, the restore r2 instruction might have
> +	 * been previously written if the relocation references a symbol in a
> +	 * module which was unloaded and is now being reloaded.  In that case,
> +	 * skip the warning and instruction write.
> +	 */
> +	if (klp_sym && insn_val == PPC_INST_LD_TOC)
> +		return 0;

Hi Josh,

Nit: shouldn't this return 1?

And if you're willing to entertain a small refactor, wouldn't
restore_r2() be clearer if it returned -ESOMETHING on error?

Maybe converting to a boolean could work, but then I'd suggest a name
that clearly implies success/fail given true/false return.  Maybe
replace_nop_with_ld_toc() or replace_nop_to_restore_r2() ... still
-ESOMETHING is more intuitive to me as there are cases like this where
the function safely returns w/o replacing anything.

> +
> +	if (insn_val != PPC_RAW_NOP()) {
>  		pr_err("%s: Expected nop after call, got %08x at %pS\n",
> -			me->name, *instruction, instruction);
> +			me->name, insn_val, instruction);
>  		return 0;
>  	}
>  
> @@ -649,7 +660,8 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>  				if (!value)
>  					return -ENOENT;
>  				if (!restore_r2(strtab + sym->st_name,
> -							(u32 *)location + 1, me))
> +						(u32 *)location + 1, me,
> +						sym->st_shndx == SHN_LIVEPATCH))
>  					return -ENOEXEC;
>  			} else
>  				value += local_entry_offset(sym);
> 

klp-convert-tree tests* ran OK with this patch (with the nit fixed) on
top of Song's v10.  LMK if you want me to push a branch with some or all
of these patches for further testing.

* I removed the tests that check for relocation clearing, only tested
module reloading

-- 
Joe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v9] livepatch: Clear relocation targets on a module removal
  2023-01-24 20:08       ` Joe Lawrence
@ 2023-01-24 22:32         ` Josh Poimboeuf
  0 siblings, 0 replies; 11+ messages in thread
From: Josh Poimboeuf @ 2023-01-24 22:32 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-kernel, linux-modules, live-patching, x86, jikos, pmladek,
	Miroslav Benes, Josh Poimboeuf, Song Liu

On Tue, Jan 24, 2023 at 03:08:27PM -0500, Joe Lawrence wrote:
> > +	/*
> > +	 * For a livepatch relocation, the restore r2 instruction might have
> > +	 * been previously written if the relocation references a symbol in a
> > +	 * module which was unloaded and is now being reloaded.  In that case,
> > +	 * skip the warning and instruction write.
> > +	 */
> > +	if (klp_sym && insn_val == PPC_INST_LD_TOC)
> > +		return 0;
> 
> Hi Josh,
> 
> Nit: shouldn't this return 1?
> 
> And if you're willing to entertain a small refactor, wouldn't
> restore_r2() be clearer if it returned -ESOMETHING on error?
> 
> Maybe converting to a boolean could work, but then I'd suggest a name
> that clearly implies success/fail given true/false return.  Maybe
> replace_nop_with_ld_toc() or replace_nop_to_restore_r2() ... still
> -ESOMETHING is more intuitive to me as there are cases like this where
> the function safely returns w/o replacing anything.

Indeed, and I actually already discovered that and made such changes,
just need to get around to posting the patches.

-- 
Josh

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-01-24 22:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-18 20:47 [PATCH v9] livepatch: Clear relocation targets on a module removal Song Liu
2023-01-18 22:08 ` Josh Poimboeuf
2023-01-19 19:06   ` Song Liu
2023-01-20  6:42     ` Josh Poimboeuf
2023-01-20 17:03       ` Song Liu
2023-01-20 17:28         ` Josh Poimboeuf
2023-01-24 20:08       ` Joe Lawrence
2023-01-24 22:32         ` Josh Poimboeuf
2023-01-20 19:16 ` Josh Poimboeuf
2023-01-20 19:41   ` Song Liu
2023-01-20 20:32     ` Josh Poimboeuf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).