linux-modules.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: mod->klp set on copy ok ?
       [not found] ` <ZA8NBuXbVP+PRPp0@alley>
@ 2023-03-16 21:50   ` Luis Chamberlain
  2023-03-17 16:16     ` Josh Poimboeuf
  0 siblings, 1 reply; 4+ messages in thread
From: Luis Chamberlain @ 2023-03-16 21:50 UTC (permalink / raw)
  To: Petr Mladek, Song Liu; +Cc: patches, linux-modules, live-patching, mcgrof

I had a hiccup evaluating if we copy or not over the mod->klp over from
live patching to the new copy of the module we process, and so I cross
checked with Petr on this.

I had jumped the gun on my analysis leading me to believe we could very
well not be copying it over once we discard the copy of the module. I
did a simple test and I see the mod->klp set to true propagated but
understanding *why* is a pit perplexing and I frankly don't think I see it
yet.  Eearlier  today I though I had clarity on this but as I review
things more with patience, the more puzzling this seems.

Given the lack of documentation on how this works I figured I'd cc
linux-modules and live-patching to ensure others can review / so we
can enhance this documentation and code to be made clearer. It's not
clear to me yet where the copy of the data is coming from yet, I see
where it should be, but the logic seems fragile and perhaps error prone
so I'd love some more eyeballs on this now.

Song is working on some of these areas as well, if we can also clean
things up in a better way as we go along it's a good time to review
that now too.

Below code analysis as of linux-next today.

On Mon, Mar 13, 2023 at 12:46:14PM +0100, Petr Mladek wrote:
> On Sat 2023-03-11 18:16:35, Luis Chamberlain wrote:
> > Hey Petr, while working on my V2 to reduce unnecessary memory pressure
> > (with my fixes from V1 I'm seeing considerable savings) I came to realize
> > that the place I put mod aliases was wrong as it works on the copy of the
> > module, not the final thing.
> 
> I see.
> 
> > So I'm fixing that now so I allocate the aliases after
> > layout_and_allocate(). However since mod->klp is set to true
> > on the copy on check_mod_info() I started wondering if it's not propagated
> > later. Code wise I think that true.
> 
> It seems to be propagated, see below. Well, I do not see it in the code.

I didn't see where this could be propagated either at first glance. Now
I  have some idea but it's still a bit obfuscated and I'm not quite sure
of it to be frank. It seems that it should all happen because move_module()
actually copies all the data we fudged on the copy into its final
section header location. But the copy is constrained by the ELF section
data it sees and length, and I don't see why we are getting more
information than what modpost generates on the mod.c files for modules
and it is minimal.

Since its obscure let me elaborate on my explanation. If folks find
issues please let me know.

We have two ways to shove modules into the kernel, the old system call
init_module() and then the new finit_module(). Both end up having a
temporary buffer allocated with a copy of userspace module information
in it, and set the struct load_info structure, passed to load_module().
The non-decompression finit_module() has the simplest userspace copy
example:

SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
{
	int len;
	struct load_info info = { };
	...
	len = kernel_read_file_from_fd(fd, 0, &buf, INT_MAX, NULL, READING_MODULE);  
	...
	if (not_compression) {
		info.len = len;
		info.hdr = buf;
	}
	return load_module(&info, uargs, flags);
}

The call load_module() eventually will have to vfree(info->hdr), it
eventually does this right before load_module() calls do_init_module(mod)
at the very end.

static int load_module(struct load_info *info, const char __user *uargs,
                       int flags)
{
	...
	/* Get rid of temporary copy. */
	free_copy(info, flags); 

	/* Done! */                                                             
	trace_module_load(mod);                                                 

	return do_init_module(mod);
	...
}

So we have proof we get rid of it. How about the copy? First let's
define the copy of the module as being on info->mod. This gets first
initialized through setup_load_info() early on load_module().

static int setup_load_info(struct load_info *info, int flags)
{
	...
        /* This is temporary: point mod into copy of data. */                   
	info->mod = (void *)info->hdr + info->sechdrs[info->index.mod].sh_offset;
	...
}

Understanding this is key, as well as where and how the non-copy
of mod pointer is set and what it points to.

We know info->hdr is initialized to the copy of the user buffer (info.hdr = buf).
So we just need now to look at info->sechdrs[info->index.mod].sh_offset.
This is all set up on early on setup_load_info(). But first we must look
for when the hell info->sechdrs is initialized. Sadly this is obfuscated
in elf_validity_check() which is called right before setup_load_info():

static int elf_validity_check(struct load_info *info)                           
{
	...
	info->sechdrs = (void *)info->hdr + info->hdr->e_shoff;
	...
}

So back to info->mod pointer. So now we know where info->sechdrs points to.
It's just info->hdr + info->hdr->e_shoff. And the info->index.mod is set
in setup_load_info:

static int setup_load_info(struct load_info *info, int flags)
{
	...
	info->index.mod = find_sec(info, ".gnu.linkonce.this_module");          
	if (!info->index.mod) {
		pr_warn("%s: No module found in object\n",                      
			info->name ? : "(missing .modinfo section or name field)");
		return -ENOEXEC;                                                
	}
	...
}

The nice thing is we can rest assured all modules have this section set.
So mod just just points to the ELF section for ".gnu.linkonce.this_module"
from the copy in userspace at first.

That is generated for the module on the mod.c file by modpost, my fs/xfs/xfs.mod.c
for example has:

__visible struct module __this_module
__section(".gnu.linkonce.this_module") = {
	.name = KBUILD_MODNAME,
	.init = init_module,
#ifdef CONFIG_MODULE_UNLOAD
	.exit = cleanup_module,
#endif
	.arch = MODULE_ARCH_INIT,
};

This is a nasty way to just define statically a struct module,
using the variable name __this_module. But its just the same as:

struct module __this_module = {
	.name = KBUILD_MODNAME,
	.init = init_module,
	.exit = cleanup_module,
	.arch = MODULE_ARCH_INIT,
};

But what I'm seeing is that if we grow the struct module in include/linux/module.h
the section for .gnu.linkonce.this_module for my ELF modules does not grow.

But let's look at how we copy this to the final mod that is used...

Note that layout_and_allocate() sets the final mod as:

static struct module *layout_and_allocate(struct load_info *info, int flags)
{
	...
	/* Allocate and move to the final place */                              
	err = move_module(info->mod, info);                                     
	if (err)
		goto err_out;

	/* Module has been copied to its final place now: return it. */         
	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
	...
}

We have to look at what move_module() does then. The
layout_sections() stuff sets up the corresponding mod->mem stuff to
correspond with the copy. Then move_module() allocates the the same
mod->mem stuff and copies the data from the old module copy. The only
difference is that now sections allocated with module_memory_alloc().
The relevant parts:

static int move_module(struct module *mod, struct load_info *info)              
{
	int i;
	void *ptr; 
	...
	for_each_mod_mem_type(type) {
		ptr = module_memory_alloc(mod->mem[type].size, type);
		...
		memset(ptr, 0, mod->mem[type].size);
		mod->mem[type].base = ptr;
	}
	for (i = 0; i < info->hdr->e_shnum; i++) {
		void *dest;
		Elf_Shdr *shdr = &info->sechdrs[i];
		...
		dest = mod->mem[type].base + (shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK);
		if (shdr->sh_type != SHT_NOBITS)
			memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
		/* Update sh_addr to point to copy in image. */
		shdr->sh_addr = (unsigned long)dest;
		...
	}
	...
}

The comment for "Update sh_addr to point to copy in image." seems pretty
misleading to me, what we are doing there is actually ensuring that we update
the copy's ELF section address to point to our newly allocated memory.
Do folks agree?

And how about the size on the memcpy()? That's a shd->sh_size. No matter
how much I increase my struct module in include/linux/module.h I see
thes same sh_size. Do folks see same?

nm --print-size --size-sort fs/xfs/xfs.ko | grep __this_module
0000000000000000 0000000000000500 D __this_module

This is what is supposed to make the final part of layout_and_allocate() work:

	mod = (void *)info->sechdrs[info->index.mod].sh_addr;

This works off of the copy of the module. Let's recall that
setup_load_info() sets the copy mod to:

	info->mod = (void *)info->hdr + info->sechdrs[info->index.mod].sh_offset;

The memcpy() in move_module() is what *should* be copying over the entire
mod stuff properly over, that includes the mod->klp for live patching
but also any new data we muck with in-kernel as the new mod->mem stuff
in layout_sections(). In short, anything in struct module should be
shoved into an ELF section. But I'm not quite sure this is all right.

My biggest fear is that as we work on the copy of the module we are
actually overwriting data that the user provided.

> > But we don't seem to need mod->klp to be true post settings up the
> > module from my quick glance. We check for it on layout_symtab().
> > So it seem just by chance and context that is_livepatch_module() works.
> 
> Hmm, is_livepatch_module() is called also from klp_enable_patch().
> 
> And klp_enable_patch() is called from the mod->init() callbacks,
> for example, see samples/livepatch/livepatch-sample.c.
> IMHO, it is done when the module is already on the final location.
> 
> It would be great if mod->klp is set correctly in the final struct
> module. Otherwise, is_livepatch_module() would be error prone
> and should not be in the global include/linux/module.h.

Yeah, that's what I was thinking, but indeed we do need it for both
layout_and_allocate() where we work with what userspace provided
and after layout_and_allocate() where we are supposed to be just working
with the new copy.

> > I just wanted you to confirm before I continue my cleanup. I'll add a pre
> > check and post check. All the taints for instance are wrong as they are
> > used and called before we even know if we have memory to load the module.
> > The taints are global though but still it's not accurate to call before we
> > load the actual module successfully.
> 
> Thanks a lot for keeping livepatching in mind.

Help me unravel the above special historic spaghetti code :)

  Luis

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: mod->klp set on copy ok ?
  2023-03-16 21:50   ` mod->klp set on copy ok ? Luis Chamberlain
@ 2023-03-17 16:16     ` Josh Poimboeuf
  2023-03-17 19:00       ` Luis Chamberlain
  0 siblings, 1 reply; 4+ messages in thread
From: Josh Poimboeuf @ 2023-03-17 16:16 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Petr Mladek, Song Liu, patches, linux-modules, live-patching

On Thu, Mar 16, 2023 at 02:50:55PM -0700, Luis Chamberlain wrote:
> The comment for "Update sh_addr to point to copy in image." seems pretty
> misleading to me, what we are doing there is actually ensuring that we update
> the copy's ELF section address to point to our newly allocated memory.
> Do folks agree?
> 
> And how about the size on the memcpy()? That's a shd->sh_size. No matter
> how much I increase my struct module in include/linux/module.h I see
> thes same sh_size. Do folks see same?
> 
> nm --print-size --size-sort fs/xfs/xfs.ko | grep __this_module
> 0000000000000000 0000000000000500 D __this_module
> 
> This is what is supposed to make the final part of layout_and_allocate() work:
> 
> 	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
> 
> This works off of the copy of the module. Let's recall that
> setup_load_info() sets the copy mod to:
> 
> 	info->mod = (void *)info->hdr + info->sechdrs[info->index.mod].sh_offset;
> 
> The memcpy() in move_module() is what *should* be copying over the entire
> mod stuff properly over, that includes the mod->klp for live patching
> but also any new data we muck with in-kernel as the new mod->mem stuff
> in layout_sections(). In short, anything in struct module should be
> shoved into an ELF section. But I'm not quite sure this is all right.

I dug into that code years ago, and the above sounds right.

The .ko file has a .gnu.linkonce.this_module section whose data is just
the original "struct module __this_module" which is created by the
module build (from foo.mod.c).

At the beginning of the finit_module() syscall, the .ko file's ELF
sections get copied (and optionally decompressed) into kernel memory.
Then 'mod' just points to the copied __this_module struct.

Then mod->klp (and possibly mod->taint) get set.

Then in layout_and_allocate(), that 'mod' gets memcpy'd into the second
(and final) in-kernel copy of 'struct module':

 		if (shdr->sh_type != SHT_NOBITS)
 			memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
 		/* Update sh_addr to point to copy in image. */
 		shdr->sh_addr = (unsigned long)dest;

I suspect you don't see the size changing when you add to 'struct
module' because it's ____cacheline_aligned.

It's all rather obtuse, but working as designed as far as I can tell.

-- 
Josh

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: mod->klp set on copy ok ?
  2023-03-17 16:16     ` Josh Poimboeuf
@ 2023-03-17 19:00       ` Luis Chamberlain
  2023-03-17 23:31         ` Josh Poimboeuf
  0 siblings, 1 reply; 4+ messages in thread
From: Luis Chamberlain @ 2023-03-17 19:00 UTC (permalink / raw)
  To: Josh Poimboeuf, Masahiro Yamada
  Cc: Petr Mladek, Song Liu, patches, linux-modules, live-patching,
	linux-kbuild

+ linux-kbuild

On Fri, Mar 17, 2023 at 09:16:39AM -0700, Josh Poimboeuf wrote:
> On Thu, Mar 16, 2023 at 02:50:55PM -0700, Luis Chamberlain wrote:
> > The comment for "Update sh_addr to point to copy in image." seems pretty
> > misleading to me, what we are doing there is actually ensuring that we update
> > the copy's ELF section address to point to our newly allocated memory.
> > Do folks agree?
> > 
> > And how about the size on the memcpy()? That's a shd->sh_size. No matter
> > how much I increase my struct module in include/linux/module.h I see
> > thes same sh_size. Do folks see same?
> > 
> > nm --print-size --size-sort fs/xfs/xfs.ko | grep __this_module
> > 0000000000000000 0000000000000500 D __this_module
> > 
> > This is what is supposed to make the final part of layout_and_allocate() work:
> > 
> > 	mod = (void *)info->sechdrs[info->index.mod].sh_addr;
> > 
> > This works off of the copy of the module. Let's recall that
> > setup_load_info() sets the copy mod to:
> > 
> > 	info->mod = (void *)info->hdr + info->sechdrs[info->index.mod].sh_offset;
> > 
> > The memcpy() in move_module() is what *should* be copying over the entire
> > mod stuff properly over, that includes the mod->klp for live patching
> > but also any new data we muck with in-kernel as the new mod->mem stuff
> > in layout_sections(). In short, anything in struct module should be
> > shoved into an ELF section. But I'm not quite sure this is all right.
> 
> I dug into that code years ago, and the above sounds right.
> 
> The .ko file has a .gnu.linkonce.this_module section whose data is just
> the original "struct module __this_module" which is created by the
> module build (from foo.mod.c).
> 
> At the beginning of the finit_module() syscall, the .ko file's ELF
> sections get copied (and optionally decompressed) into kernel memory.
> Then 'mod' just points to the copied __this_module struct.
> 
> Then mod->klp (and possibly mod->taint) get set.
> 
> Then in layout_and_allocate(), that 'mod' gets memcpy'd into the second
> (and final) in-kernel copy of 'struct module':
> 
>  		if (shdr->sh_type != SHT_NOBITS)
>  			memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
>  		/* Update sh_addr to point to copy in image. */
>  		shdr->sh_addr = (unsigned long)dest;
> 
> I suspect you don't see the size changing when you add to 'struct
> module' because it's ____cacheline_aligned.
> 
> It's all rather obtuse, but working as designed as far as I can tell.

Ah, well it is beyond a ____cacheline_aligned issue! It would seem our build
system does not incur a full re-build of $foo.mod.c if the size of struct module
changes. Doing a full rebuild does get the right drift size change in
struct module:

nm --print-size --radix=dec --size-sort fs/xfs/xfs.ko | grep __this_module
0000000000000000 0000000000001280 D __this_module

I validated that sizeof(struct module) does match 1280. And so, to remove all
these cob webs for good, and to extend our ELF validity checker, what about
the following patch (boot tested):

From cdd78ff0cb6d521930305026089a72776a982845 Mon Sep 17 00:00:00 2001
From: Luis Chamberlain <mcgrof@kernel.org>
Date: Fri, 17 Mar 2023 11:41:02 -0700
Subject: [PATCH] module: add sanity check for ELF module section

The ELF ".gnu.linkonce.this_module" section is special, it is what we
use to construct the struct module __this_module, which THIS_MODULE
points to. When userspace loads a module we always deal first with a
copy of the userspace buffer, and twiddle with the userspace copy's
version of the struct module. Eventually we allocate memory to do a
memcpy() of that struct module, under the assumption that the module
size is right. But we have no validity checks against the size or
the requirements for the section.

Add some validity checks for the special module section early and while
at it, cache the module section index early, so we don't have to do that
later.

While at it, just move over the assigment of the info->mod to make the
code clearer. The validity checker also adds an explicit size check to
ensure the module section size matches the kernel's run time size for
sizeof(struct module). This should prevent sloppy loads of modules
which are built today *without* actually increasing the size of
the struct module. A developer today can for example expand the size
of struct module, rebuild a directoroy 'make fs/xfs/' for example and
then try to insmode the driver there. That module would in effect have
an incorrect size. This new size check would put a stop gap against such
mistakes.

This also makes the entire goal of ".gnu.linkonce.this_module" pretty
clear. Before this patch verification of the goal / intent required some
Indian Jones whips, torches and cleaning up big old spider webs.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/module/main.c | 60 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 51 insertions(+), 9 deletions(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 57cc10a20b45..d1aee35ab928 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1656,6 +1656,7 @@ static int elf_validity_check(struct load_info *info)
 	unsigned int i;
 	Elf_Shdr *shdr, *strhdr;
 	int err;
+	unsigned int num_mod_secs = 0, mod_idx;
 
 	if (info->len < sizeof(*(info->hdr))) {
 		pr_err("Invalid ELF header len %lu\n", info->len);
@@ -1767,6 +1768,11 @@ static int elf_validity_check(struct load_info *info)
 					i, shdr->sh_type);
 				return err;
 			}
+			if (strcmp(info->secstrings + shdr->sh_name,
+				   ".gnu.linkonce.this_module") == 0) {
+				num_mod_secs++;
+				mod_idx = i;
+			}
 
 			if (shdr->sh_flags & SHF_ALLOC) {
 				if (shdr->sh_name >= strhdr->sh_size) {
@@ -1779,6 +1785,51 @@ static int elf_validity_check(struct load_info *info)
 		}
 	}
 
+	/* The ".gnu.linkonce.this_module" ELF section is special. It is
+	 * what modpost uses to refer to __this_module and let's use rely
+	 * on THIS_MODULE to point to &__this_module properly. The kernel's
+	 * modpost declares it on each modules's *.mod.c file. If the struct
+	 * module of the kernel changes a full kernel rebuild is required.
+	 *
+	 * We have a few expectaions for this special section, the following
+	 * code validates all this for us:
+	 *
+	 *   o Only one section must exist
+	 *   o We expect the kernel to always have to allocate it: SHF_ALLOC
+	 *   o The section size must match the kernel's run time's struct module
+	 *     size
+	 */
+	if (num_mod_secs != 1) {
+		pr_err("Only one .gnu.linkonce.this_module section must exist.\n");
+		goto no_exec;
+	}
+
+	shdr = &info->sechdrs[mod_idx];
+
+	/*
+	 * This is already implied on the switch above, however let's be
+	 * pedantic about it.
+	 */
+	if (shdr->sh_type == SHT_NOBITS) {
+		pr_err(".gnu.linkonce.this_module section must have a size set\n");
+		goto no_exec;
+	}
+
+	if (!(shdr->sh_flags & SHF_ALLOC)) {
+		pr_err(".gnu.linkonce.this_module must occupy memory during process execution\n");
+		goto no_exec;
+	}
+
+	if (shdr->sh_size != sizeof(struct module)) {
+		pr_err(".gnu.linkonce.this_module section size must match the kernel's built struct module size at run time\n");
+		goto no_exec;
+	}
+
+	info->index.mod = mod_idx;
+
+	/* This is temporary: point mod into copy of data. */
+	info->mod = (void *)info->hdr + shdr->sh_offset;
+
 	return 0;
 
 no_exec:
@@ -1925,15 +1976,6 @@ static int setup_load_info(struct load_info *info, int flags)
 		return -ENOEXEC;
 	}
 
-	info->index.mod = find_sec(info, ".gnu.linkonce.this_module");
-	if (!info->index.mod) {
-		pr_warn("%s: No module found in object\n",
-			info->name ?: "(missing .modinfo section or name field)");
-		return -ENOEXEC;
-	}
-	/* This is temporary: point mod into copy of data. */
-	info->mod = (void *)info->hdr + info->sechdrs[info->index.mod].sh_offset;
-
 	/*
 	 * If we didn't load the .modinfo 'name' field earlier, fall back to
 	 * on-disk struct mod 'name' field.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: mod->klp set on copy ok ?
  2023-03-17 19:00       ` Luis Chamberlain
@ 2023-03-17 23:31         ` Josh Poimboeuf
  0 siblings, 0 replies; 4+ messages in thread
From: Josh Poimboeuf @ 2023-03-17 23:31 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Masahiro Yamada, Petr Mladek, Song Liu, patches, linux-modules,
	live-patching, linux-kbuild

On Fri, Mar 17, 2023 at 12:00:40PM -0700, Luis Chamberlain wrote:
> > I dug into that code years ago, and the above sounds right.
> > 
> > The .ko file has a .gnu.linkonce.this_module section whose data is just
> > the original "struct module __this_module" which is created by the
> > module build (from foo.mod.c).
> > 
> > At the beginning of the finit_module() syscall, the .ko file's ELF
> > sections get copied (and optionally decompressed) into kernel memory.
> > Then 'mod' just points to the copied __this_module struct.
> > 
> > Then mod->klp (and possibly mod->taint) get set.
> > 
> > Then in layout_and_allocate(), that 'mod' gets memcpy'd into the second
> > (and final) in-kernel copy of 'struct module':
> > 
> >  		if (shdr->sh_type != SHT_NOBITS)
> >  			memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
> >  		/* Update sh_addr to point to copy in image. */
> >  		shdr->sh_addr = (unsigned long)dest;
> > 
> > I suspect you don't see the size changing when you add to 'struct
> > module' because it's ____cacheline_aligned.
> > 
> > It's all rather obtuse, but working as designed as far as I can tell.
> 
> Ah, well it is beyond a ____cacheline_aligned issue! It would seem our build
> system does not incur a full re-build of $foo.mod.c if the size of struct module
> changes. Doing a full rebuild does get the right drift size change in
> struct module:

Ah, ok.  It sounds like build dependencies are broken for *.mod.c.

The added validations in the patch look reasonable, though the broken
build dependencies should also get fixed.

-- 
Josh

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-03-17 23:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAB=NE6Vo4AXVrn1GPEoZWVF3NkXRoPwWOuUEJqJ35S9VMGTM2Q@mail.gmail.com>
     [not found] ` <ZA8NBuXbVP+PRPp0@alley>
2023-03-16 21:50   ` mod->klp set on copy ok ? Luis Chamberlain
2023-03-17 16:16     ` Josh Poimboeuf
2023-03-17 19:00       ` Luis Chamberlain
2023-03-17 23:31         ` Josh Poimboeuf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).