linux-modules.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/6] Introduce "sysctl:" module aliases
@ 2022-07-22  2:24 Mauricio Faria de Oliveira
  2022-07-22  2:24 ` [RFC PATCH 1/6] modpost: factor out elf/arch-specific code from section_rel[a]() Mauricio Faria de Oliveira
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Mauricio Faria de Oliveira @ 2022-07-22  2:24 UTC (permalink / raw)
  To: linux-kernel, linux-modules, linux-kbuild, linux-fsdevel
  Cc: Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Luis Chamberlain, Kees Cook, Iurii Zaikin

This series allows modules to have "sysctl:<procname>" module aliases
for sysctl entries, registering sysctl tables with modpost/file2alias.
(Similarly to "pci:<IDs>" aliases for PCI ID tables in device drivers.)

The issue behind it: if a sysctl value is in /etc/sysctl.{conf,d/*.conf}
but does not exist in /proc/sys/ when the userspace tool that applies it
runs, it does not get set.

It would be nice if the tool could run 'modprobe sysctl:<something>' and
get that '/proc/sys/.../something' up (as an administrator configured it)
and then set it, as intended. (A bit like PCI ID-based module loading.)

...

The series is relatively simple, except for patch 4 (IMHO) due to ELF.

- Patches 1-2 simplify ELF code in modpost.c (code moves, not in-depth).
- Patches 3-4 implement the feature (patch 4 is more in-depth).
- Patches 5-6 consume and expose it.

I have tested it on x86_64 with next-20220721, and it looks correct
('modprobe sysctl:nf_conntrack_max' works; other aliases there; see below).

I plan to test other archs by cross-building 'allmodconfig' and checking
the .mod.c files and modpost output (eg, warnings) for no changes at all,
and nf_conntrack.mod.c for expected sysctl aliases. [based on feedback.]
(i.e., changes didn't break modpost, and ELF code works on other archs.)

Happy to receive suggestions to improve test coverage and functionality.

I didn't look much at auto-registration with modpost using the register
functions for sysctl, but it seems it would need plumbing, if possible.

Let's see review/feedback on the basics first.

thanks,
Mauricio

...

Some context.

Even though that issue might be expected and obvious, its consequences
sometimes are not.

An example is the nf_conntrack_max value, that in busy gateways/routers
/cloud deployments can affect performance and functionality more subtly,
or even fill the kernel log non-stop with 'table full, dropping packet',
if a value greater than the default value is not used.

The current solution (workaround, arguably) for this is to include such
modules in /etc/modules (or in /etc/modules-load.d/*.conf with systemd),
which loads them before an userspace tool (procps's sysctl or systemd's
systemd-sysctl{,.service}) runs, so /proc/sys/... exists when it runs.

...

That is simple, indeed, but comes w/ technical debt. (ugly stuff warning!)

Now there are many _different_ pieces of code that use the _same_ module
doing that (eg, deployment tools/scripts for openstack nova and neutron,
firewalls, and maybe more).

And sometimes when components are split or deployed to different nodes
it turns out that in the next reboot we figure (through an issue) that
some component did set /etc/sysctl.conf but not /etc/modules.conf, or
relied in the ex-colocated component doing that.

This has generated several one-off fixes at this point in some projects.
(I have submitted one of those, actually, a while ago.)

Also, some of those fixes (or original code) put 'nf_conntrack_ipv{4,6}'
in /etc/modules, getting 'nf_conntrack' loaded via module dependencies
(maybe it was the right module for them at the time, for some reason).

So, that component (or a colocated component) got nf_conntrack.ko too.

*BUT* after an upgrade from Ubuntu 18.04 (4.15-based kernel) to 20.04
(5.4-based kernel), the nf_conntrack_ipv{4,6}.ko modules do not exist
anymore, and now nf_conntrack.ko is no longer loaded, and the sysctl
nf_conntrack_max is no longer applied. (Someone had to figure it out.)

And now maybe we'd need release/kernel-version checks in scripts that
use the workaround of /etc/modules for /etc/sysctl.conf configuration.

(Yes, it was ugly stuff.)

...

Well, this last point seemed like "ok, that's enough; we can do better."

I'm not sure this approach is "better" in all reasons, but hopefully it
might help starting something that is. 🙏

cheers,
Mauricio

...

Tests:

    $ cat /proc/sys/kernel/modprobe_sysctl_alias
    1
    
    $ cat /proc/sys/net/netfilter/nf_conntrack_max
    cat: /proc/sys/net/netfilter/nf_conntrack_max: No such file or directory
    
    $ lsmod | grep nf_conntrack
    $
    
    $ sudo modprobe sysctl:nf_conntrack_max
    
    $ cat /proc/sys/net/netfilter/nf_conntrack_max
    262144
    
    $ lsmod | grep nf_conntrack
    nf_conntrack          110592  0
    nf_defrag_ipv6         20480  1 nf_conntrack
    nf_defrag_ipv4         16384  1 nf_conntrack
    
    $ modinfo nf_conntrack | grep ^alias:
    alias:          nf_conntrack-10
    alias:          nf_conntrack-2
    alias:          ip_conntrack
    alias:          sysctl:nf_conntrack_icmpv6_timeout
    alias:          sysctl:nf_conntrack_icmp_timeout
    alias:          sysctl:nf_conntrack_udp_timeout_stream
    alias:          sysctl:nf_conntrack_udp_timeout
    alias:          sysctl:nf_conntrack_tcp_max_retrans
    alias:          sysctl:nf_conntrack_tcp_ignore_invalid_rst
    alias:          sysctl:nf_conntrack_tcp_be_liberal
    alias:          sysctl:nf_conntrack_tcp_loose
    alias:          sysctl:nf_conntrack_tcp_timeout_unacknowledged
    alias:          sysctl:nf_conntrack_tcp_timeout_max_retrans
    alias:          sysctl:nf_conntrack_tcp_timeout_close
    alias:          sysctl:nf_conntrack_tcp_timeout_time_wait
    alias:          sysctl:nf_conntrack_tcp_timeout_last_ack
    alias:          sysctl:nf_conntrack_tcp_timeout_close_wait
    alias:          sysctl:nf_conntrack_tcp_timeout_fin_wait
    alias:          sysctl:nf_conntrack_tcp_timeout_established
    alias:          sysctl:nf_conntrack_tcp_timeout_syn_recv
    alias:          sysctl:nf_conntrack_tcp_timeout_syn_sent
    alias:          sysctl:nf_conntrack_generic_timeout
    alias:          sysctl:nf_conntrack_helper
    alias:          sysctl:nf_conntrack_acct
    alias:          sysctl:nf_conntrack_expect_max
    alias:          sysctl:nf_conntrack_log_invalid
    alias:          sysctl:nf_conntrack_checksum
    alias:          sysctl:nf_conntrack_buckets
    alias:          sysctl:nf_conntrack_count
    alias:          sysctl:nf_conntrack_max
    
    $ modinfo r8169 | grep ^alias:
    alias:          pci:v000010ECd00003000sv*sd*bc*sc*i*
    alias:          pci:v000010ECd00008125sv*sd*bc*sc*i*
    alias:          pci:v00000001d00008168sv*sd00002410bc*sc*i*
    alias:          pci:v00001737d00001032sv*sd00000024bc*sc*i*
    alias:          pci:v000016ECd00000116sv*sd*bc*sc*i*
    alias:          pci:v00001259d0000C107sv*sd*bc*sc*i*
    alias:          pci:v00001186d00004302sv*sd*bc*sc*i*
    alias:          pci:v00001186d00004300sv*sd*bc*sc*i*
    alias:          pci:v00001186d00004300sv00001186sd00004B10bc*sc*i*
    alias:          pci:v000010ECd00008169sv*sd*bc*sc*i*
    alias:          pci:v000010FFd00008168sv*sd*bc*sc*i*
    alias:          pci:v000010ECd00008168sv*sd*bc*sc*i*
    alias:          pci:v000010ECd00008167sv*sd*bc*sc*i*
    alias:          pci:v000010ECd00008162sv*sd*bc*sc*i*
    alias:          pci:v000010ECd00008161sv*sd*bc*sc*i*
    alias:          pci:v000010ECd00008136sv*sd*bc*sc*i*
    alias:          pci:v000010ECd00008129sv*sd*bc*sc*i*
    alias:          pci:v000010ECd00002600sv*sd*bc*sc*i*
    alias:          pci:v000010ECd00002502sv*sd*bc*sc*i*

Mauricio Faria de Oliveira (6):
  modpost: factor out elf/arch-specific code from section_rel[a]()
  modpost: deduplicate section_rel[a]()
  sysctl, mod_devicetable: shadow struct ctl_table.procname for
    file2alias
  module, modpost: introduce support for MODULE_SYSCTL_TABLE
  netfilter: conntrack: use MODULE_SYSCTL_TABLE
  sysctl: introduce /proc/sys/kernel/modprobe_sysctl_alias

 fs/proc/proc_sysctl.c                   |  27 ++++
 include/linux/mod_devicetable.h         |  25 ++++
 include/linux/module.h                  |   8 ++
 include/linux/sysctl.h                  |  11 +-
 kernel/sysctl.c                         |  10 ++
 net/netfilter/nf_conntrack_standalone.c |   4 +
 scripts/mod/devicetable-offsets.c       |   3 +
 scripts/mod/file2alias.c                | 111 +++++++++++++++
 scripts/mod/modpost.c                   | 178 +++++++++++++-----------
 scripts/mod/modpost.h                   |   3 +
 10 files changed, 296 insertions(+), 84 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH 1/6] modpost: factor out elf/arch-specific code from section_rel[a]()
  2022-07-22  2:24 [RFC PATCH 0/6] Introduce "sysctl:" module aliases Mauricio Faria de Oliveira
@ 2022-07-22  2:24 ` Mauricio Faria de Oliveira
  2022-07-22  2:24 ` [RFC PATCH 2/6] modpost: deduplicate section_rel[a]() Mauricio Faria de Oliveira
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Mauricio Faria de Oliveira @ 2022-07-22  2:24 UTC (permalink / raw)
  To: linux-kernel, linux-modules, linux-kbuild, linux-fsdevel
  Cc: Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Luis Chamberlain, Kees Cook, Iurii Zaikin

There's elf/arch-specific code identical in both functions, with some
in section_rel() only.

In order to factor that out, generalize the different relocation types
Elf_Rela/Elf_Rel (relocation with/without an addend) with the Elf_Rela
type, that is just Elf_Rel with a '.r_addend' field.

Most of this code only uses Elf_Rel fields ('.r_offset' and '.r_info').
Make usage of '.r_addend' conditional on section header type SHT_RELA.

(Note, though, that '.r_addend' is used on SHT_REL in some archs/formats
for the _output_ relocation entry, but this is fine and existing code.)

This change also seems to help with readability of section_rel[a]().

Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
---
 scripts/mod/modpost.c | 141 +++++++++++++++++++++++++-----------------
 1 file changed, 84 insertions(+), 57 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index a0f59d7a8875..4c1038dccae0 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1716,13 +1716,90 @@ static int addend_mips_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
 #define R_RISCV_SUB32		39
 #endif
 
+/*
+ * RelX: refers to usage of Elf_Rela or Elf_Rel interchangeably where possible.
+ *
+ * The usage of Elf_Rela (relocation with an addend) even for Elf_Rel (without)
+ * as an input parameter is possible for .r_offset and .r_info (same offset on
+ * both struct types) *BUT* .r_addend can ONLY be accessed on SHT_RELA headers
+ * (i.e., where it is valid in the input).
+ *
+ * Note: .r_addend on SHT_REL is calculated/accessed for the _output_ parameter,
+ * via the addend_ARCH_rel() functions, but that is fine, as it's not the input.
+ *
+ * Return value 1 indicates to skip further processing on this relocation entry.
+ *
+ * Output parameters:
+ * - 'r' is the relocation entry (i.e., replace data at 'r.r_offset' in section
+ * w/ header 'sechdr' w/ data from symbol w/ symbol table index in 'r.r_info',
+ * w/ possible relocation addend 'r.r_addend').
+ * - 'sym' is that symbol (a pointer to the symbol table + symbol table index).
+ */
+static int get_relx_sym(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *rela,
+			Elf_Rela *out_r, Elf_Sym **out_sym)
+{
+	Elf_Sym *sym;
+	Elf_Rela r;
+	unsigned int r_sym;
+
+	/* Get .r_offset/.r_info and r_sym */
+	r.r_offset = TO_NATIVE(rela->r_offset);
+#if KERNEL_ELFCLASS == ELFCLASS64
+	if (elf->hdr->e_machine == EM_MIPS) {
+		unsigned int r_typ;
+
+		r_sym = ELF64_MIPS_R_SYM(rela->r_info);
+		r_sym = TO_NATIVE(r_sym);
+		r_typ = ELF64_MIPS_R_TYPE(rela->r_info);
+		r.r_info = ELF64_R_INFO(r_sym, r_typ);
+	} else {
+		r.r_info = TO_NATIVE(rela->r_info);
+		r_sym = ELF_R_SYM(r.r_info);
+	}
+#else
+	r.r_info = TO_NATIVE(rela->r_info);
+	r_sym = ELF_R_SYM(r.r_info);
+#endif
+
+	/* Get .r_addend (only output on SHT_REL) */
+	if (sechdr->sh_type == SHT_RELA) {
+		r.r_addend = TO_NATIVE(rela->r_addend);
+	} else if (sechdr->sh_type == SHT_REL) {
+		r.r_addend = 0;
+		switch (elf->hdr->e_machine) {
+		case EM_386:
+			if (addend_386_rel(elf, sechdr, &r))
+				return 1;
+			break;
+		case EM_ARM:
+			if (addend_arm_rel(elf, sechdr, &r))
+				return 1;
+			break;
+		case EM_MIPS:
+			if (addend_mips_rel(elf, sechdr, &r))
+				return 1;
+			break;
+		}
+	}
+
+	sym = elf->symtab_start + r_sym;
+
+	/* Skip special sections */
+	if (is_shndx_special(sym->st_shndx))
+		return 1;
+
+	/* Done */
+	*out_r = r;
+	*out_sym = sym;
+	return 0;
+}
+
 static void section_rela(const char *modname, struct elf_info *elf,
 			 Elf_Shdr *sechdr)
 {
 	Elf_Sym  *sym;
 	Elf_Rela *rela;
 	Elf_Rela r;
-	unsigned int r_sym;
 	const char *fromsec;
 
 	Elf_Rela *start = (void *)elf->hdr + sechdr->sh_offset;
@@ -1735,23 +1812,9 @@ static void section_rela(const char *modname, struct elf_info *elf,
 		return;
 
 	for (rela = start; rela < stop; rela++) {
-		r.r_offset = TO_NATIVE(rela->r_offset);
-#if KERNEL_ELFCLASS == ELFCLASS64
-		if (elf->hdr->e_machine == EM_MIPS) {
-			unsigned int r_typ;
-			r_sym = ELF64_MIPS_R_SYM(rela->r_info);
-			r_sym = TO_NATIVE(r_sym);
-			r_typ = ELF64_MIPS_R_TYPE(rela->r_info);
-			r.r_info = ELF64_R_INFO(r_sym, r_typ);
-		} else {
-			r.r_info = TO_NATIVE(rela->r_info);
-			r_sym = ELF_R_SYM(r.r_info);
-		}
-#else
-		r.r_info = TO_NATIVE(rela->r_info);
-		r_sym = ELF_R_SYM(r.r_info);
-#endif
-		r.r_addend = TO_NATIVE(rela->r_addend);
+		if (get_relx_sym(elf, sechdr, rela, &r, &sym))
+			continue;
+
 		switch (elf->hdr->e_machine) {
 		case EM_RISCV:
 			if (!strcmp("__ex_table", fromsec) &&
@@ -1759,10 +1822,7 @@ static void section_rela(const char *modname, struct elf_info *elf,
 				continue;
 			break;
 		}
-		sym = elf->symtab_start + r_sym;
-		/* Skip special sections */
-		if (is_shndx_special(sym->st_shndx))
-			continue;
+
 		if (is_second_extable_reloc(start, rela, fromsec))
 			find_extable_entry_size(fromsec, &r);
 		check_section_mismatch(modname, elf, &r, sym, fromsec);
@@ -1775,7 +1835,6 @@ static void section_rel(const char *modname, struct elf_info *elf,
 	Elf_Sym *sym;
 	Elf_Rel *rel;
 	Elf_Rela r;
-	unsigned int r_sym;
 	const char *fromsec;
 
 	Elf_Rel *start = (void *)elf->hdr + sechdr->sh_offset;
@@ -1788,41 +1847,9 @@ static void section_rel(const char *modname, struct elf_info *elf,
 		return;
 
 	for (rel = start; rel < stop; rel++) {
-		r.r_offset = TO_NATIVE(rel->r_offset);
-#if KERNEL_ELFCLASS == ELFCLASS64
-		if (elf->hdr->e_machine == EM_MIPS) {
-			unsigned int r_typ;
-			r_sym = ELF64_MIPS_R_SYM(rel->r_info);
-			r_sym = TO_NATIVE(r_sym);
-			r_typ = ELF64_MIPS_R_TYPE(rel->r_info);
-			r.r_info = ELF64_R_INFO(r_sym, r_typ);
-		} else {
-			r.r_info = TO_NATIVE(rel->r_info);
-			r_sym = ELF_R_SYM(r.r_info);
-		}
-#else
-		r.r_info = TO_NATIVE(rel->r_info);
-		r_sym = ELF_R_SYM(r.r_info);
-#endif
-		r.r_addend = 0;
-		switch (elf->hdr->e_machine) {
-		case EM_386:
-			if (addend_386_rel(elf, sechdr, &r))
-				continue;
-			break;
-		case EM_ARM:
-			if (addend_arm_rel(elf, sechdr, &r))
-				continue;
-			break;
-		case EM_MIPS:
-			if (addend_mips_rel(elf, sechdr, &r))
-				continue;
-			break;
-		}
-		sym = elf->symtab_start + r_sym;
-		/* Skip special sections */
-		if (is_shndx_special(sym->st_shndx))
+		if (get_relx_sym(elf, sechdr, (Elf_Rela *)rel, &r, &sym)
 			continue;
+
 		if (is_second_extable_reloc(start, rel, fromsec))
 			find_extable_entry_size(fromsec, &r);
 		check_section_mismatch(modname, elf, &r, sym, fromsec);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 2/6] modpost: deduplicate section_rel[a]()
  2022-07-22  2:24 [RFC PATCH 0/6] Introduce "sysctl:" module aliases Mauricio Faria de Oliveira
  2022-07-22  2:24 ` [RFC PATCH 1/6] modpost: factor out elf/arch-specific code from section_rel[a]() Mauricio Faria de Oliveira
@ 2022-07-22  2:24 ` Mauricio Faria de Oliveira
  2022-07-26  9:19   ` Masahiro Yamada
  2022-07-22  2:24 ` [RFC PATCH 3/6] sysctl, mod_devicetable: shadow struct ctl_table.procname for file2alias Mauricio Faria de Oliveira
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Mauricio Faria de Oliveira @ 2022-07-22  2:24 UTC (permalink / raw)
  To: linux-kernel, linux-modules, linux-kbuild, linux-fsdevel
  Cc: Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Luis Chamberlain, Kees Cook, Iurii Zaikin

Now both functions are almost identical, and we can again generalize
the relocation types Elf_Rela/Elf_Rel with Elf_Rela, and handle some
differences with conditionals on section header type (SHT_RELA/REL).

The important bit is to make sure the loop increment uses the right
size for pointer arithmethic.

The original reason for split functions to make program logic easier
to follow; commit 5b24c0715fc4 ("kbuild: code refactoring in modpost").

Hopefully these 2 commits may help improving that, without an impact
in understanding the code due to generalization of relocation types.

Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
---
 scripts/mod/modpost.c | 61 ++++++++++++++++---------------------------
 1 file changed, 23 insertions(+), 38 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 4c1038dccae0..d1ed67fa290b 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1794,63 +1794,49 @@ static int get_relx_sym(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *rela,
 	return 0;
 }
 
-static void section_rela(const char *modname, struct elf_info *elf,
+/* The caller must ensure sechdr->sh_type == SHT_RELA or SHT_REL. */
+static void section_relx(const char *modname, struct elf_info *elf,
 			 Elf_Shdr *sechdr)
 {
 	Elf_Sym  *sym;
-	Elf_Rela *rela;
+	Elf_Rela *relx; /* access .r_addend in SHT_RELA _only_! */
 	Elf_Rela r;
+	size_t relx_size;
 	const char *fromsec;
 
 	Elf_Rela *start = (void *)elf->hdr + sechdr->sh_offset;
 	Elf_Rela *stop  = (void *)start + sechdr->sh_size;
 
 	fromsec = sech_name(elf, sechdr);
-	fromsec += strlen(".rela");
+	if (sechdr->sh_type == SHT_RELA) {
+		relx_size = sizeof(Elf_Rela);
+		fromsec += strlen(".rela");
+	} else if (sechdr->sh_type == SHT_REL) {
+		relx_size = sizeof(Elf_Rel);
+		fromsec += strlen(".rel");
+	} else {
+		error("%s: [%s.ko] not relocation section\n", fromsec, modname);
+		return;
+	}
+
 	/* if from section (name) is know good then skip it */
 	if (match(fromsec, section_white_list))
 		return;
 
-	for (rela = start; rela < stop; rela++) {
-		if (get_relx_sym(elf, sechdr, rela, &r, &sym))
+	for (relx = start; relx < stop; relx = (void *)relx + relx_size) {
+		if (get_relx_sym(elf, sechdr, relx, &r, &sym))
 			continue;
 
 		switch (elf->hdr->e_machine) {
 		case EM_RISCV:
-			if (!strcmp("__ex_table", fromsec) &&
+			if (sechdr->sh_type == SHT_RELA &&
+			    !strcmp("__ex_table", fromsec) &&
 			    ELF_R_TYPE(r.r_info) == R_RISCV_SUB32)
 				continue;
 			break;
 		}
 
-		if (is_second_extable_reloc(start, rela, fromsec))
-			find_extable_entry_size(fromsec, &r);
-		check_section_mismatch(modname, elf, &r, sym, fromsec);
-	}
-}
-
-static void section_rel(const char *modname, struct elf_info *elf,
-			Elf_Shdr *sechdr)
-{
-	Elf_Sym *sym;
-	Elf_Rel *rel;
-	Elf_Rela r;
-	const char *fromsec;
-
-	Elf_Rel *start = (void *)elf->hdr + sechdr->sh_offset;
-	Elf_Rel *stop  = (void *)start + sechdr->sh_size;
-
-	fromsec = sech_name(elf, sechdr);
-	fromsec += strlen(".rel");
-	/* if from section (name) is know good then skip it */
-	if (match(fromsec, section_white_list))
-		return;
-
-	for (rel = start; rel < stop; rel++) {
-		if (get_relx_sym(elf, sechdr, (Elf_Rela *)rel, &r, &sym)
-			continue;
-
-		if (is_second_extable_reloc(start, rel, fromsec))
+		if (is_second_extable_reloc(start, relx, fromsec))
 			find_extable_entry_size(fromsec, &r);
 		check_section_mismatch(modname, elf, &r, sym, fromsec);
 	}
@@ -1877,10 +1863,9 @@ static void check_sec_ref(const char *modname, struct elf_info *elf)
 	for (i = 0; i < elf->num_sections; i++) {
 		check_section(modname, elf, &elf->sechdrs[i]);
 		/* We want to process only relocation sections and not .init */
-		if (sechdrs[i].sh_type == SHT_RELA)
-			section_rela(modname, elf, &elf->sechdrs[i]);
-		else if (sechdrs[i].sh_type == SHT_REL)
-			section_rel(modname, elf, &elf->sechdrs[i]);
+		if (sechdrs[i].sh_type == SHT_RELA ||
+		    sechdrs[i].sh_type == SHT_REL)
+			section_relx(modname, elf, &elf->sechdrs[i]);
 	}
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 3/6] sysctl, mod_devicetable: shadow struct ctl_table.procname for file2alias
  2022-07-22  2:24 [RFC PATCH 0/6] Introduce "sysctl:" module aliases Mauricio Faria de Oliveira
  2022-07-22  2:24 ` [RFC PATCH 1/6] modpost: factor out elf/arch-specific code from section_rel[a]() Mauricio Faria de Oliveira
  2022-07-22  2:24 ` [RFC PATCH 2/6] modpost: deduplicate section_rel[a]() Mauricio Faria de Oliveira
@ 2022-07-22  2:24 ` Mauricio Faria de Oliveira
  2022-07-26  9:25   ` Masahiro Yamada
  2022-07-22  2:24 ` [RFC PATCH 4/6] module, modpost: introduce support for MODULE_SYSCTL_TABLE Mauricio Faria de Oliveira
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Mauricio Faria de Oliveira @ 2022-07-22  2:24 UTC (permalink / raw)
  To: linux-kernel, linux-modules, linux-kbuild, linux-fsdevel
  Cc: Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Luis Chamberlain, Kees Cook, Iurii Zaikin

In order to expose a sysctl entry to modpost (file2alias.c, precisely)
we have to shadow 'struct ctl_table' in mod_devicetable.h, as scripts
should not access kernel headers or its types (see file2alias.c).

The required field is '.procname' (basename of '/proc/sys/.../entry').

Since 'struct ctl_table' is annotated for structure randomization and
we need a known offset for '.procname' (remember, no kernel headers),
take it out of the randomized portion (as in, eg, 'struct task_struct').

Of course, add build-time checks for struct size and .procname offset
between both structs. (This has to be done on kernel side; for headers.)

With that in place, use the regular macros in devicetable-offsets.c to
define SIZE_... and OFF_... macros for the shadow struct and the field
of interest.

Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
---
 fs/proc/proc_sysctl.c             | 19 +++++++++++++++++++
 include/linux/mod_devicetable.h   | 25 +++++++++++++++++++++++++
 include/linux/sysctl.h            | 11 ++++++++++-
 kernel/sysctl.c                   |  1 +
 scripts/mod/devicetable-offsets.c |  3 +++
 5 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 021e83fe831f..ebbf8702387e 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -19,6 +19,24 @@
 #include <linux/kmemleak.h>
 #include "internal.h"
 
+#ifdef CONFIG_MODULES
+#include <linux/mod_devicetable.h>
+
+static void check_struct_sysctl_device_id(void)
+{
+	/*
+	 * The shadow struct sysctl_device_id for file2alias.c needs
+	 * the same size of struct ctl_table and offset for procname.
+	 */
+	BUILD_BUG_ON(sizeof(struct sysctl_device_id)
+			!= sizeof(struct ctl_table));
+	BUILD_BUG_ON(offsetof(struct sysctl_device_id, procname)
+			!= offsetof(struct ctl_table, procname));
+}
+#else
+static void check_struct_sysctl_device_id(void) {}
+#endif
+
 #define list_for_each_table_entry(entry, table) \
 	for ((entry) = (table); (entry)->procname; (entry)++)
 
@@ -1779,6 +1797,7 @@ int __init proc_sys_init(void)
 	proc_sys_root->proc_dir_ops = &proc_sys_dir_file_operations;
 	proc_sys_root->nlink = 0;
 
+	check_struct_sysctl_device_id();
 	return sysctl_init_bases();
 }
 
diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
index 549590e9c644..9cee024d8f2f 100644
--- a/include/linux/mod_devicetable.h
+++ b/include/linux/mod_devicetable.h
@@ -20,6 +20,31 @@ enum {
 	PCI_ID_F_VFIO_DRIVER_OVERRIDE = 1,
 };
 
+/*
+ * "Device" table entry for a sysctl file (shadow of struct ctl_table).
+ *
+ * Only the procname field is reliable (known offset); all other fields
+ * are in the randomized portion of struct ctl_table, do NOT use them.
+ */
+struct sysctl_device_id {
+
+	/* This must be the first field (shadowed from struct ctl_table). */
+	const char *procname;
+
+	/* Here begins the randomizable portion of struct ctl_table. */
+
+	void *data;
+	int maxlen;
+	unsigned short mode; // umode_t in <linux/types.h>
+	void *child;
+	void *proc_handler;
+	void *poll;
+	void *extra1;
+	void *extra2;
+
+	/* Here ends the randomizable portion of struct ctl_table. */
+};
+
 /**
  * struct pci_device_id - PCI device ID structure
  * @vendor:		Vendor ID to match (or PCI_ANY_ID)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 780690dc08cd..676112fde5ff 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -133,7 +133,13 @@ static inline void *proc_sys_poll_event(struct ctl_table_poll *poll)
 
 /* A sysctl table is an array of struct ctl_table: */
 struct ctl_table {
+
+	/* This must be the first field (shadowed to struct sysctl_device_id) */
 	const char *procname;		/* Text ID for /proc/sys, or zero */
+
+	/* This begins the randomizable portion of the struct. */
+	randomized_struct_fields_start
+
 	void *data;
 	int maxlen;
 	umode_t mode;
@@ -142,7 +148,10 @@ struct ctl_table {
 	struct ctl_table_poll *poll;
 	void *extra1;
 	void *extra2;
-} __randomize_layout;
+
+	/* New fields go above here, so they are in the randomized portion. */
+	randomized_struct_fields_end
+};
 
 struct ctl_node {
 	struct rb_node node;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 223376959d29..15073621cfa8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2487,6 +2487,7 @@ int __init sysctl_init_bases(void)
 
 	return 0;
 }
+
 #endif /* CONFIG_SYSCTL */
 /*
  * No sense putting this after each symbol definition, twice,
diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
index c0d3bcb99138..43b2549940d2 100644
--- a/scripts/mod/devicetable-offsets.c
+++ b/scripts/mod/devicetable-offsets.c
@@ -262,5 +262,8 @@ int main(void)
 	DEVID(ishtp_device_id);
 	DEVID_FIELD(ishtp_device_id, guid);
 
+	DEVID(sysctl_device_id);
+	DEVID_FIELD(sysctl_device_id, procname);
+
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 4/6] module, modpost: introduce support for MODULE_SYSCTL_TABLE
  2022-07-22  2:24 [RFC PATCH 0/6] Introduce "sysctl:" module aliases Mauricio Faria de Oliveira
                   ` (2 preceding siblings ...)
  2022-07-22  2:24 ` [RFC PATCH 3/6] sysctl, mod_devicetable: shadow struct ctl_table.procname for file2alias Mauricio Faria de Oliveira
@ 2022-07-22  2:24 ` Mauricio Faria de Oliveira
  2022-07-22  2:24 ` [RFC PATCH 5/6] netfilter: conntrack: use MODULE_SYSCTL_TABLE Mauricio Faria de Oliveira
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Mauricio Faria de Oliveira @ 2022-07-22  2:24 UTC (permalink / raw)
  To: linux-kernel, linux-modules, linux-kbuild, linux-fsdevel
  Cc: Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Luis Chamberlain, Kees Cook, Iurii Zaikin

Now that 'struct ctl_table' is exposed to modpost/file2alias,
introduce MODULE_SYSCTL_TABLE to register some sysctl tables,
just like MODULE_DEVICE_TABLE for, eg, drivers PCI ID tables.

This adds a '__mod_sysctl__<name>_device_table' symbol in the
module that points to the array of 'struct ctl_table' entries.

This is identified and handled by file2alias, as other tables,
that emits MODULE_ALIAS("sysctl:<struct ctl_table.procname>")
statements for each entry.

That would be relatively simple for 'procname' as a char array
(eg, do_{rpmsg,i2c,spi}_entry()) but since it's a char pointer
an ELF relocation is used (it sets that pointer value to where
the string it should point to actually ends up).

...

It's probably not ideal to convert 'struct ctl_table.procname'
to a char array, as the max length it uses in some modules is
long, and all other users would pay the memory price of a one-
size-fits-all array size.

Also, some places set it to NULL, which requires special care
and changes (i.e., set/check the first byte is '\0').

Anyway, the resulting disadvantages in the general case aren't
worth the simplicity gain in this particular case.

...

So, add ELF relocation handling code, borrowing that function
we factored-out in modpost.c earlier.

The logic and details are commented in the source, but briefly:

0) modpost.c calls file2alias.c's handle_moddevtable()

1) handle_moddevtable()
1.1) matches a '__mod__sysctl__..._device_table' symbol
1.2) calls do_sysctl_table() for it (array of struct ctl_table)

2) do_sysctl_table()
2.1) finds the relocation section that _references_ the section
     that defines that symbol (ie, that holds the actual values
     of 'procname' char pointers to be replaced in that symbol).
2.2) calls do_sysctl_entry() for each entry in that table/array

3) do_sysctl_entry()
3.1) calls do_sysctl_section_relx() to scan a relocation section
     for the relocation entry that targets a particular procname
     pointer of this entry (struct ctl_table).
3.2) do_sysctl_section_relx() returns its source string pointer.
3.3) do_sysctl_entry() stores 'sysctl:<procname>' in the buffer.

4) do_sysctl_table() emits a 'MODULE_ALIAS()' statement with it.

...

This algorithm is likely iterating over the earlier relocation
entries many times as each 'struct ctl_table' entry starts the
loop over the relocation section again, but keep it simple now.

We could keep the relocation entry cursor pointer across calls,
and start over only when we can't find a relocation, I guess.

Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
---
 include/linux/module.h   |   7 +++
 scripts/mod/file2alias.c | 111 +++++++++++++++++++++++++++++++++++++++
 scripts/mod/modpost.c    |   4 +-
 scripts/mod/modpost.h    |   3 ++
 4 files changed, 123 insertions(+), 2 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 518296ea7f73..3010f687df19 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -247,6 +247,13 @@ extern typeof(name) __mod_##type##__##name##_device_table		\
 #define MODULE_DEVICE_TABLE(type, name)
 #endif
 
+#if defined(MODULE) && defined(CONFIG_PROC_SYSCTL)
+/* Creates an alias so file2alias.c can find sysctl "device" table. */
+#define MODULE_SYSCTL_TABLE(name) MODULE_DEVICE_TABLE(sysctl, name)
+#else /* !MODULE || !CONFIG_PROC_SYSCTL */
+#define MODULE_SYSCTL_TABLE(name)
+#endif
+
 /* Version of form [<epoch>:]<version>[-<extra-version>].
  * Or for CVS/RCS ID version, everything but the number is stripped.
  * <epoch>: A (small) unsigned integer which allows you to start versions
diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
index cbd6b0f48b4e..3bf9cb84c548 100644
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@ -1452,6 +1452,115 @@ static int do_dfl_entry(const char *filename, void *symval, char *alias)
 	return 1;
 }
 
+/*
+ * Scan the relocation section with section header index 'relx_shndx'
+ * for the relocation entry for target 'offset' and return a pointer
+ * to its source value (from another section/offset).
+ *
+ * The caller must ensure sechdr->sh_type == SHT_RELA or SHT_REL.
+ */
+static void *do_sysctl_section_relx(struct elf_info *info,
+				    unsigned int relx_shndx, int offset)
+{
+	/* Relocation section's header, and start/stop addresses. */
+	Elf_Shdr *sechdr = &info->sechdrs[relx_shndx];
+	Elf_Rela *start = (void *)info->hdr + sechdr->sh_offset;
+	Elf_Rela *stop  = (void *)start + sechdr->sh_size;
+
+	/* Relocation entry cursor and size in SHT_RELA or SHT_REL. */
+	Elf_Rela *relx; /* access .r_addend in SHT_RELA _only_! */
+	size_t relx_size;
+
+	if (sechdr->sh_type == SHT_RELA)
+		relx_size = sizeof(Elf_Rela);
+	else if (sechdr->sh_type == SHT_REL)
+		relx_size = sizeof(Elf_Rel);
+	else
+		return NULL;
+
+	for (relx = start; relx < stop; relx = (void *)relx + relx_size) {
+		/*
+		 * 'r' is the relocation entry, applied to the 'target offset'
+		 * in the 'target section' of this (relocation) section, with
+		 * the value from symbol 'sym' (in/at 'source section/offset').
+		 */
+		Elf_Rela r;
+		Elf_Sym *sym;
+		unsigned int sym_shndx;
+		int sym_offset;
+
+		if (get_relx_sym(info, sechdr, relx, &r, &sym))
+			continue;
+
+		/* Looking for this target offset. */
+		if (r.r_offset != offset)
+			continue;
+
+		/* Pointer to source section/offset (note: addend is needed). */
+		sym_shndx = get_secindex(info, sym);
+		sym_offset = sym->st_value;
+		return (void *)info->hdr + info->sechdrs[sym_shndx].sh_offset
+			+ sym_offset + r.r_addend;
+	}
+
+	return NULL;
+}
+
+/* Looks like: sysctl:S */
+static int do_sysctl_entry(const char *modname, int sym_entry_offset,
+			   char *alias, const char *symname,
+			   unsigned int relx_shndx, struct elf_info *info)
+{
+	/* Find the relocation entry for procname's offset and use its string */
+	int offset = sym_entry_offset + OFF_sysctl_device_id_procname;
+	const char *procname = do_sysctl_section_relx(info, relx_shndx, offset);
+
+	if (procname) {
+		sprintf(alias, "sysctl:%s", procname);
+		return 1;
+	}
+
+	error("%s: [%s.ko] cannot find relocation string.\n", symname, modname);
+	return 0;
+}
+
+static void do_sysctl_table(void *symval, unsigned long size,
+			    struct module *mod, const char *symname,
+			    Elf_Sym *sym, struct elf_info *info)
+{
+	unsigned long id_size = SIZE_sysctl_device_id;
+	unsigned int i, secindex, shndx;
+	char alias[ALIAS_SIZE];
+
+	device_id_check(mod->name, "sysctl", size, id_size, symval);
+	/* Leave last one: it's the terminator. */
+	size -= id_size;
+
+	/* Find relocation section that references the section w/ the symbol. */
+	shndx = sym->st_shndx;
+	for (secindex = 0; secindex < info->num_sections; secindex++) {
+		Elf_Shdr *shdr = &info->sechdrs[secindex];
+
+		if ((shdr->sh_type == SHT_REL || shdr->sh_type == SHT_RELA) &&
+		    (shdr->sh_flags & SHF_INFO_LINK) && shdr->sh_info == shndx)
+			break;
+	}
+
+	if (secindex == info->num_sections) {
+		error("%s: [%s.ko] cannot find relocation section.\n",
+		      symname, mod->name);
+		return;
+	}
+
+	/* The symbol is an array of struct ctl_table elements at offset 'i'. */
+	for (i = 0; i < size; i += id_size) {
+		if (do_sysctl_entry(mod->name, sym->st_value+i, alias, symname, secindex, info)) {
+			buf_printf(&mod->dev_table_buf,
+				   "MODULE_ALIAS(\"%s\");\n", alias);
+		}
+	}
+}
+
 /* Does namelen bytes of name exactly match the symbol? */
 static bool sym_is(const char *name, unsigned namelen, const char *symbol)
 {
@@ -1585,6 +1694,8 @@ void handle_moddevtable(struct module *mod, struct elf_info *info,
 		do_pnp_device_entry(symval, sym->st_size, mod);
 	else if (sym_is(name, namelen, "pnp_card"))
 		do_pnp_card_entries(symval, sym->st_size, mod);
+	if (sym_is(name, namelen, "sysctl"))
+		do_sysctl_table(symval, sym->st_size, mod, symname, sym, info);
 	else {
 		int i;
 
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index d1ed67fa290b..e2df2fbb0909 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1735,8 +1735,8 @@ static int addend_mips_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
  * w/ possible relocation addend 'r.r_addend').
  * - 'sym' is that symbol (a pointer to the symbol table + symbol table index).
  */
-static int get_relx_sym(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *rela,
-			Elf_Rela *out_r, Elf_Sym **out_sym)
+int get_relx_sym(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *rela,
+		 Elf_Rela *out_r, Elf_Sym **out_sym)
 {
 	Elf_Sym *sym;
 	Elf_Rela r;
diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h
index 044bdfb894b7..cdb95c7e03a9 100644
--- a/scripts/mod/modpost.h
+++ b/scripts/mod/modpost.h
@@ -212,3 +212,6 @@ void modpost_log(enum loglevel loglevel, const char *fmt, ...);
 #define warn(fmt, args...)	modpost_log(LOG_WARN, fmt, ##args)
 #define error(fmt, args...)	modpost_log(LOG_ERROR, fmt, ##args)
 #define fatal(fmt, args...)	modpost_log(LOG_FATAL, fmt, ##args)
+
+int get_relx_sym(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *rela,
+			Elf_Rela *out_r, Elf_Sym **out_sym);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 5/6] netfilter: conntrack: use MODULE_SYSCTL_TABLE
  2022-07-22  2:24 [RFC PATCH 0/6] Introduce "sysctl:" module aliases Mauricio Faria de Oliveira
                   ` (3 preceding siblings ...)
  2022-07-22  2:24 ` [RFC PATCH 4/6] module, modpost: introduce support for MODULE_SYSCTL_TABLE Mauricio Faria de Oliveira
@ 2022-07-22  2:24 ` Mauricio Faria de Oliveira
  2022-07-22  2:24 ` [RFC PATCH 6/6] sysctl: introduce /proc/sys/kernel/modprobe_sysctl_alias Mauricio Faria de Oliveira
  2022-07-26  9:02 ` [RFC PATCH 0/6] Introduce "sysctl:" module aliases Masahiro Yamada
  6 siblings, 0 replies; 15+ messages in thread
From: Mauricio Faria de Oliveira @ 2022-07-22  2:24 UTC (permalink / raw)
  To: linux-kernel, linux-modules, linux-kbuild, linux-fsdevel
  Cc: Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Luis Chamberlain, Kees Cook, Iurii Zaikin

Let's take nf_conntrack as an (actually helpful) example and exerciser,
as it has many sysctl entries, and other module aliases already.

Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
---
 net/netfilter/nf_conntrack_standalone.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 05895878610c..2da628f054cf 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -980,6 +980,8 @@ static struct ctl_table nf_ct_sysctl_table[] = {
 	{}
 };
 
+MODULE_SYSCTL_TABLE(nf_ct_sysctl_table);
+
 static struct ctl_table nf_ct_netfilter_table[] = {
 	{
 		.procname	= "nf_conntrack_max",
@@ -991,6 +993,8 @@ static struct ctl_table nf_ct_netfilter_table[] = {
 	{ }
 };
 
+/* MODULE_SYSCTL_TABLE(nf_ct_sysctl_table) already includes nf_conntrack_max. */
+
 static void nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
 						    struct ctl_table *table)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 6/6] sysctl: introduce /proc/sys/kernel/modprobe_sysctl_alias
  2022-07-22  2:24 [RFC PATCH 0/6] Introduce "sysctl:" module aliases Mauricio Faria de Oliveira
                   ` (4 preceding siblings ...)
  2022-07-22  2:24 ` [RFC PATCH 5/6] netfilter: conntrack: use MODULE_SYSCTL_TABLE Mauricio Faria de Oliveira
@ 2022-07-22  2:24 ` Mauricio Faria de Oliveira
  2022-07-26  9:22   ` Masahiro Yamada
  2022-07-26  9:02 ` [RFC PATCH 0/6] Introduce "sysctl:" module aliases Masahiro Yamada
  6 siblings, 1 reply; 15+ messages in thread
From: Mauricio Faria de Oliveira @ 2022-07-22  2:24 UTC (permalink / raw)
  To: linux-kernel, linux-modules, linux-kbuild, linux-fsdevel
  Cc: Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Luis Chamberlain, Kees Cook, Iurii Zaikin

The goal of the earlier patches is to let sysctl userspace tools
load the kernel module with a sysctl entry that is not available
yet in /proc/sys/ when the tool runs (so it can become available).

Let's expose this file for userspace for two reasons:

1) Allow such tools to identify that the running kernel has the
   code which produces sysctl module aliases, so they could run
   'modprobe sysctl:<entry>' only when it may actually help.

2) Allow an administrator to hint such tools not to do that, if
   that is desired for some reason (e.g., rather have the tools
   fail if something is misconfigured in a critical deployment).

Also add a module parameter for that (proc.modprobe_sysctl_alias),
for another method that doesn't depend on sysctl tools to be set
(that wouldn't fail them to try and set it if it's not there yet).

Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
---
 fs/proc/proc_sysctl.c  | 8 ++++++++
 include/linux/module.h | 1 +
 kernel/sysctl.c        | 9 +++++++++
 3 files changed, 18 insertions(+)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index ebbf8702387e..1e63819fcda8 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -33,6 +33,14 @@ static void check_struct_sysctl_device_id(void)
 	BUILD_BUG_ON(offsetof(struct sysctl_device_id, procname)
 			!= offsetof(struct ctl_table, procname));
 }
+
+/*
+ * Hint sysctl userspace tools whether or not to run modprobe with sysctl alias
+ * ('modprobe sysctl:entry') if they cannot find the file '/proc/sys/.../entry'
+ */
+int modprobe_sysctl_alias = 1;
+module_param(modprobe_sysctl_alias, int, 0644);
+
 #else
 static void check_struct_sysctl_device_id(void) {}
 #endif
diff --git a/include/linux/module.h b/include/linux/module.h
index 3010f687df19..5f565491c596 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -304,6 +304,7 @@ struct notifier_block;
 #ifdef CONFIG_MODULES
 
 extern int modules_disabled; /* for sysctl */
+extern int modprobe_sysctl_alias; /* for proc sysctl */
 /* Get/put a kernel symbol (calls must be symmetric) */
 void *__symbol_get(const char *symbol);
 void *__symbol_get_gpl(const char *symbol);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 15073621cfa8..b396cfcb55fc 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1763,6 +1763,15 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dostring,
 	},
+#ifdef CONFIG_PROC_SYSCTL
+	{
+		.procname	= "modprobe_sysctl_alias",
+		.data		= &modprobe_sysctl_alias,
+		.maxlen		= sizeof(modprobe_sysctl_alias),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+#endif
 	{
 		.procname	= "modules_disabled",
 		.data		= &modules_disabled,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/6] Introduce "sysctl:" module aliases
  2022-07-22  2:24 [RFC PATCH 0/6] Introduce "sysctl:" module aliases Mauricio Faria de Oliveira
                   ` (5 preceding siblings ...)
  2022-07-22  2:24 ` [RFC PATCH 6/6] sysctl: introduce /proc/sys/kernel/modprobe_sysctl_alias Mauricio Faria de Oliveira
@ 2022-07-26  9:02 ` Masahiro Yamada
  2022-07-27 17:09   ` Mauricio Faria de Oliveira
  6 siblings, 1 reply; 15+ messages in thread
From: Masahiro Yamada @ 2022-07-26  9:02 UTC (permalink / raw)
  To: Mauricio Faria de Oliveira
  Cc: Linux Kernel Mailing List, linux-modules,
	Linux Kbuild mailing list, Linux FS-devel Mailing List,
	Michal Marek, Nick Desaulniers, Luis Chamberlain, Kees Cook,
	Iurii Zaikin

On Fri, Jul 22, 2022 at 11:24 AM Mauricio Faria de Oliveira
<mfo@canonical.com> wrote:
>
> This series allows modules to have "sysctl:<procname>" module aliases
> for sysctl entries, registering sysctl tables with modpost/file2alias.
> (Similarly to "pci:<IDs>" aliases for PCI ID tables in device drivers.)
>
> The issue behind it: if a sysctl value is in /etc/sysctl.{conf,d/*.conf}
> but does not exist in /proc/sys/ when the userspace tool that applies it
> runs, it does not get set.
>
> It would be nice if the tool could run 'modprobe sysctl:<something>' and
> get that '/proc/sys/.../something' up (as an administrator configured it)
> and then set it, as intended. (A bit like PCI ID-based module loading.)
>
> ...
>
> The series is relatively simple, except for patch 4 (IMHO) due to ELF.
>
> - Patches 1-2 simplify ELF code in modpost.c (code moves, not in-depth).
> - Patches 3-4 implement the feature (patch 4 is more in-depth).
> - Patches 5-6 consume and expose it.
>
> I have tested it on x86_64 with next-20220721, and it looks correct
> ('modprobe sysctl:nf_conntrack_max' works; other aliases there; see below).


I did not test this patch set at all, but I am afraid
you took a good case as an example.



I see two locations for the "fib_multipath_hash_fields" parameter
for example.

#  find /proc/sys/ -name fib_multipath_hash_fields
/proc/sys/net/ipv4/fib_multipath_hash_fields
/proc/sys/net/ipv6/fib_multipath_hash_fields


If I run

   modprobe sysctl:fib_multipath_hash_fields

Which one will be loaded, net/ipv4/sysctl_net_ipv4.c
or ipv6/sysctl_net_ipv6.c ?

Of course, IPv4 is always built-in, so ipv6.ko will be loaded in this case.
But, let's think. The basename is not enough to identify
which code resulted in that sysctl property.
The PCI vendor/device ID is meant to be unique. That's the difference.


You may argue the full path is globally unique, so

  modprobe  sysctl:net/ipv6/fib_multipath_hash_fields

should work, but that may not be so feasible to implement
because not all file paths are static.


On my machine:

# find  /proc/sys  -name  forwarding
/proc/sys/net/ipv4/conf/all/forwarding
/proc/sys/net/ipv4/conf/br-22440b7735e7/forwarding
/proc/sys/net/ipv4/conf/br-3e8284a56053/forwarding
/proc/sys/net/ipv4/conf/br-9b27f0f9e130/forwarding
/proc/sys/net/ipv4/conf/br-bc5fbfa838fc/forwarding
/proc/sys/net/ipv4/conf/br-ca51e25e8af8/forwarding
/proc/sys/net/ipv4/conf/default/forwarding
/proc/sys/net/ipv4/conf/docker0/forwarding
/proc/sys/net/ipv4/conf/lo/forwarding
/proc/sys/net/ipv4/conf/lxcbr0/forwarding
/proc/sys/net/ipv4/conf/veth6e3e4b8/forwarding
/proc/sys/net/ipv4/conf/virbr0/forwarding
/proc/sys/net/ipv4/conf/vpn0/forwarding
/proc/sys/net/ipv4/conf/wlp0s20f3/forwarding
/proc/sys/net/ipv6/conf/all/forwarding
/proc/sys/net/ipv6/conf/br-22440b7735e7/forwarding
/proc/sys/net/ipv6/conf/br-3e8284a56053/forwarding
/proc/sys/net/ipv6/conf/br-9b27f0f9e130/forwarding
/proc/sys/net/ipv6/conf/br-bc5fbfa838fc/forwarding
/proc/sys/net/ipv6/conf/br-ca51e25e8af8/forwarding
/proc/sys/net/ipv6/conf/default/forwarding
/proc/sys/net/ipv6/conf/docker0/forwarding
/proc/sys/net/ipv6/conf/lo/forwarding
/proc/sys/net/ipv6/conf/lxcbr0/forwarding
/proc/sys/net/ipv6/conf/veth6e3e4b8/forwarding
/proc/sys/net/ipv6/conf/virbr0/forwarding
/proc/sys/net/ipv6/conf/vpn0/forwarding
/proc/sys/net/ipv6/conf/wlp0s20f3/forwarding


I do not know how to do it correctly.




>
> I plan to test other archs by cross-building 'allmodconfig' and checking
> the .mod.c files and modpost output (eg, warnings) for no changes at all,
> and nf_conntrack.mod.c for expected sysctl aliases. [based on feedback.]
> (i.e., changes didn't break modpost, and ELF code works on other archs.)
>
> Happy to receive suggestions to improve test coverage and functionality.
>
> I didn't look much at auto-registration with modpost using the register
> functions for sysctl, but it seems it would need plumbing, if possible.
>
> Let's see review/feedback on the basics first.
>
> thanks,
> Mauricio
>
> ...
>
> Some context.
>
> Even though that issue might be expected and obvious, its consequences
> sometimes are not.
>
> An example is the nf_conntrack_max value, that in busy gateways/routers
> /cloud deployments can affect performance and functionality more subtly,
> or even fill the kernel log non-stop with 'table full, dropping packet',
> if a value greater than the default value is not used.
>
> The current solution (workaround, arguably) for this is to include such
> modules in /etc/modules (or in /etc/modules-load.d/*.conf with systemd),
> which loads them before an userspace tool (procps's sysctl or systemd's
> systemd-sysctl{,.service}) runs, so /proc/sys/... exists when it runs.
>
> ...
>
> That is simple, indeed, but comes w/ technical debt. (ugly stuff warning!)
>
> Now there are many _different_ pieces of code that use the _same_ module
> doing that (eg, deployment tools/scripts for openstack nova and neutron,
> firewalls, and maybe more).
>
> And sometimes when components are split or deployed to different nodes
> it turns out that in the next reboot we figure (through an issue) that
> some component did set /etc/sysctl.conf but not /etc/modules.conf, or
> relied in the ex-colocated component doing that.
>
> This has generated several one-off fixes at this point in some projects.
> (I have submitted one of those, actually, a while ago.)
>
> Also, some of those fixes (or original code) put 'nf_conntrack_ipv{4,6}'
> in /etc/modules, getting 'nf_conntrack' loaded via module dependencies
> (maybe it was the right module for them at the time, for some reason).
>
> So, that component (or a colocated component) got nf_conntrack.ko too.
>
> *BUT* after an upgrade from Ubuntu 18.04 (4.15-based kernel) to 20.04
> (5.4-based kernel), the nf_conntrack_ipv{4,6}.ko modules do not exist
> anymore, and now nf_conntrack.ko is no longer loaded, and the sysctl
> nf_conntrack_max is no longer applied. (Someone had to figure it out.)
>
> And now maybe we'd need release/kernel-version checks in scripts that
> use the workaround of /etc/modules for /etc/sysctl.conf configuration.
>
> (Yes, it was ugly stuff.)
>
> ...
>
> Well, this last point seemed like "ok, that's enough; we can do better."
>
> I'm not sure this approach is "better" in all reasons, but hopefully it
> might help starting something that is. 🙏
>
> cheers,
> Mauricio
>
> ...
>
> Tests:
>
>     $ cat /proc/sys/kernel/modprobe_sysctl_alias
>     1
>
>     $ cat /proc/sys/net/netfilter/nf_conntrack_max
>     cat: /proc/sys/net/netfilter/nf_conntrack_max: No such file or directory
>
>     $ lsmod | grep nf_conntrack
>     $
>
>     $ sudo modprobe sysctl:nf_conntrack_max
>
>     $ cat /proc/sys/net/netfilter/nf_conntrack_max
>     262144
>
>     $ lsmod | grep nf_conntrack
>     nf_conntrack          110592  0
>     nf_defrag_ipv6         20480  1 nf_conntrack
>     nf_defrag_ipv4         16384  1 nf_conntrack
>
>     $ modinfo nf_conntrack | grep ^alias:
>     alias:          nf_conntrack-10
>     alias:          nf_conntrack-2
>     alias:          ip_conntrack
>     alias:          sysctl:nf_conntrack_icmpv6_timeout
>     alias:          sysctl:nf_conntrack_icmp_timeout
>     alias:          sysctl:nf_conntrack_udp_timeout_stream
>     alias:          sysctl:nf_conntrack_udp_timeout
>     alias:          sysctl:nf_conntrack_tcp_max_retrans
>     alias:          sysctl:nf_conntrack_tcp_ignore_invalid_rst
>     alias:          sysctl:nf_conntrack_tcp_be_liberal
>     alias:          sysctl:nf_conntrack_tcp_loose
>     alias:          sysctl:nf_conntrack_tcp_timeout_unacknowledged
>     alias:          sysctl:nf_conntrack_tcp_timeout_max_retrans
>     alias:          sysctl:nf_conntrack_tcp_timeout_close
>     alias:          sysctl:nf_conntrack_tcp_timeout_time_wait
>     alias:          sysctl:nf_conntrack_tcp_timeout_last_ack
>     alias:          sysctl:nf_conntrack_tcp_timeout_close_wait
>     alias:          sysctl:nf_conntrack_tcp_timeout_fin_wait
>     alias:          sysctl:nf_conntrack_tcp_timeout_established
>     alias:          sysctl:nf_conntrack_tcp_timeout_syn_recv
>     alias:          sysctl:nf_conntrack_tcp_timeout_syn_sent
>     alias:          sysctl:nf_conntrack_generic_timeout
>     alias:          sysctl:nf_conntrack_helper
>     alias:          sysctl:nf_conntrack_acct
>     alias:          sysctl:nf_conntrack_expect_max
>     alias:          sysctl:nf_conntrack_log_invalid
>     alias:          sysctl:nf_conntrack_checksum
>     alias:          sysctl:nf_conntrack_buckets
>     alias:          sysctl:nf_conntrack_count
>     alias:          sysctl:nf_conntrack_max
>
>     $ modinfo r8169 | grep ^alias:
>     alias:          pci:v000010ECd00003000sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008125sv*sd*bc*sc*i*
>     alias:          pci:v00000001d00008168sv*sd00002410bc*sc*i*
>     alias:          pci:v00001737d00001032sv*sd00000024bc*sc*i*
>     alias:          pci:v000016ECd00000116sv*sd*bc*sc*i*
>     alias:          pci:v00001259d0000C107sv*sd*bc*sc*i*
>     alias:          pci:v00001186d00004302sv*sd*bc*sc*i*
>     alias:          pci:v00001186d00004300sv*sd*bc*sc*i*
>     alias:          pci:v00001186d00004300sv00001186sd00004B10bc*sc*i*
>     alias:          pci:v000010ECd00008169sv*sd*bc*sc*i*
>     alias:          pci:v000010FFd00008168sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008168sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008167sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008162sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008161sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008136sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00008129sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00002600sv*sd*bc*sc*i*
>     alias:          pci:v000010ECd00002502sv*sd*bc*sc*i*
>
> Mauricio Faria de Oliveira (6):
>   modpost: factor out elf/arch-specific code from section_rel[a]()
>   modpost: deduplicate section_rel[a]()
>   sysctl, mod_devicetable: shadow struct ctl_table.procname for
>     file2alias
>   module, modpost: introduce support for MODULE_SYSCTL_TABLE
>   netfilter: conntrack: use MODULE_SYSCTL_TABLE
>   sysctl: introduce /proc/sys/kernel/modprobe_sysctl_alias
>
>  fs/proc/proc_sysctl.c                   |  27 ++++
>  include/linux/mod_devicetable.h         |  25 ++++
>  include/linux/module.h                  |   8 ++
>  include/linux/sysctl.h                  |  11 +-
>  kernel/sysctl.c                         |  10 ++
>  net/netfilter/nf_conntrack_standalone.c |   4 +
>  scripts/mod/devicetable-offsets.c       |   3 +
>  scripts/mod/file2alias.c                | 111 +++++++++++++++
>  scripts/mod/modpost.c                   | 178 +++++++++++++-----------
>  scripts/mod/modpost.h                   |   3 +
>  10 files changed, 296 insertions(+), 84 deletions(-)
>
> --
> 2.25.1
>


--
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 2/6] modpost: deduplicate section_rel[a]()
  2022-07-22  2:24 ` [RFC PATCH 2/6] modpost: deduplicate section_rel[a]() Mauricio Faria de Oliveira
@ 2022-07-26  9:19   ` Masahiro Yamada
  2022-07-27 17:10     ` Mauricio Faria de Oliveira
  0 siblings, 1 reply; 15+ messages in thread
From: Masahiro Yamada @ 2022-07-26  9:19 UTC (permalink / raw)
  To: Mauricio Faria de Oliveira
  Cc: Linux Kernel Mailing List, linux-modules,
	Linux Kbuild mailing list, Linux FS-devel Mailing List,
	Michal Marek, Nick Desaulniers, Luis Chamberlain, Kees Cook,
	Iurii Zaikin

On Fri, Jul 22, 2022 at 11:24 AM Mauricio Faria de Oliveira
<mfo@canonical.com> wrote:
>
> Now both functions are almost identical, and we can again generalize
> the relocation types Elf_Rela/Elf_Rel with Elf_Rela, and handle some
> differences with conditionals on section header type (SHT_RELA/REL).
>
> The important bit is to make sure the loop increment uses the right
> size for pointer arithmethic.
>
> The original reason for split functions to make program logic easier
> to follow; commit 5b24c0715fc4 ("kbuild: code refactoring in modpost").
>
> Hopefully these 2 commits may help improving that, without an impact
> in understanding the code due to generalization of relocation types.
>
> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
> ---
>  scripts/mod/modpost.c | 61 ++++++++++++++++---------------------------
>  1 file changed, 23 insertions(+), 38 deletions(-)
>
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index 4c1038dccae0..d1ed67fa290b 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -1794,63 +1794,49 @@ static int get_relx_sym(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *rela,
>         return 0;
>  }
>
> -static void section_rela(const char *modname, struct elf_info *elf,
> +/* The caller must ensure sechdr->sh_type == SHT_RELA or SHT_REL. */
> +static void section_relx(const char *modname, struct elf_info *elf,
>                          Elf_Shdr *sechdr)
>  {
>         Elf_Sym  *sym;
> -       Elf_Rela *rela;
> +       Elf_Rela *relx; /* access .r_addend in SHT_RELA _only_! */
>         Elf_Rela r;
> +       size_t relx_size;
>         const char *fromsec;
>
>         Elf_Rela *start = (void *)elf->hdr + sechdr->sh_offset;
>         Elf_Rela *stop  = (void *)start + sechdr->sh_size;
>
>         fromsec = sech_name(elf, sechdr);
> -       fromsec += strlen(".rela");
> +       if (sechdr->sh_type == SHT_RELA) {
> +               relx_size = sizeof(Elf_Rela);
> +               fromsec += strlen(".rela");
> +       } else if (sechdr->sh_type == SHT_REL) {
> +               relx_size = sizeof(Elf_Rel);
> +               fromsec += strlen(".rel");
> +       } else {
> +               error("%s: [%s.ko] not relocation section\n", fromsec, modname);


Nit.

modname already contains the suffix  ".o".

For vmlinux, the error message will print like this:
[vmlinux.o.ko]





> +               return;
> +       }
> +
>         /* if from section (name) is know good then skip it */
>         if (match(fromsec, section_white_list))
>                 return;
>
> -       for (rela = start; rela < stop; rela++) {
> -               if (get_relx_sym(elf, sechdr, rela, &r, &sym))
> +       for (relx = start; relx < stop; relx = (void *)relx + relx_size) {
> +               if (get_relx_sym(elf, sechdr, relx, &r, &sym))
>                         continue;
>
>                 switch (elf->hdr->e_machine) {
>                 case EM_RISCV:
> -                       if (!strcmp("__ex_table", fromsec) &&
> +                       if (sechdr->sh_type == SHT_RELA &&
> +                           !strcmp("__ex_table", fromsec) &&
>                             ELF_R_TYPE(r.r_info) == R_RISCV_SUB32)
>                                 continue;
>                         break;
>                 }
>
> -               if (is_second_extable_reloc(start, rela, fromsec))
> -                       find_extable_entry_size(fromsec, &r);
> -               check_section_mismatch(modname, elf, &r, sym, fromsec);
> -       }
> -}
> -
> -static void section_rel(const char *modname, struct elf_info *elf,
> -                       Elf_Shdr *sechdr)
> -{
> -       Elf_Sym *sym;
> -       Elf_Rel *rel;
> -       Elf_Rela r;
> -       const char *fromsec;
> -
> -       Elf_Rel *start = (void *)elf->hdr + sechdr->sh_offset;
> -       Elf_Rel *stop  = (void *)start + sechdr->sh_size;
> -
> -       fromsec = sech_name(elf, sechdr);
> -       fromsec += strlen(".rel");
> -       /* if from section (name) is know good then skip it */
> -       if (match(fromsec, section_white_list))
> -               return;
> -
> -       for (rel = start; rel < stop; rel++) {
> -               if (get_relx_sym(elf, sechdr, (Elf_Rela *)rel, &r, &sym)
> -                       continue;
> -
> -               if (is_second_extable_reloc(start, rel, fromsec))
> +               if (is_second_extable_reloc(start, relx, fromsec))
>                         find_extable_entry_size(fromsec, &r);
>                 check_section_mismatch(modname, elf, &r, sym, fromsec);
>         }
> @@ -1877,10 +1863,9 @@ static void check_sec_ref(const char *modname, struct elf_info *elf)
>         for (i = 0; i < elf->num_sections; i++) {
>                 check_section(modname, elf, &elf->sechdrs[i]);
>                 /* We want to process only relocation sections and not .init */
> -               if (sechdrs[i].sh_type == SHT_RELA)
> -                       section_rela(modname, elf, &elf->sechdrs[i]);
> -               else if (sechdrs[i].sh_type == SHT_REL)
> -                       section_rel(modname, elf, &elf->sechdrs[i]);
> +               if (sechdrs[i].sh_type == SHT_RELA ||
> +                   sechdrs[i].sh_type == SHT_REL)
> +                       section_relx(modname, elf, &elf->sechdrs[i]);
>         }
>  }
>
> --
> 2.25.1
>


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 6/6] sysctl: introduce /proc/sys/kernel/modprobe_sysctl_alias
  2022-07-22  2:24 ` [RFC PATCH 6/6] sysctl: introduce /proc/sys/kernel/modprobe_sysctl_alias Mauricio Faria de Oliveira
@ 2022-07-26  9:22   ` Masahiro Yamada
  2022-07-27 17:11     ` Mauricio Faria de Oliveira
  0 siblings, 1 reply; 15+ messages in thread
From: Masahiro Yamada @ 2022-07-26  9:22 UTC (permalink / raw)
  To: Mauricio Faria de Oliveira
  Cc: Linux Kernel Mailing List, linux-modules,
	Linux Kbuild mailing list, Linux FS-devel Mailing List,
	Michal Marek, Nick Desaulniers, Luis Chamberlain, Kees Cook,
	Iurii Zaikin

On Fri, Jul 22, 2022 at 11:24 AM Mauricio Faria de Oliveira
<mfo@canonical.com> wrote:
>
> The goal of the earlier patches is to let sysctl userspace tools
> load the kernel module with a sysctl entry that is not available
> yet in /proc/sys/ when the tool runs (so it can become available).
>
> Let's expose this file for userspace for two reasons:
>
> 1) Allow such tools to identify that the running kernel has the
>    code which produces sysctl module aliases, so they could run
>    'modprobe sysctl:<entry>' only when it may actually help.
>
> 2) Allow an administrator to hint such tools not to do that, if
>    that is desired for some reason (e.g., rather have the tools
>    fail if something is misconfigured in a critical deployment).

This flag is just a hint.
User-space tools are still able to ignore it.

Perhaps, such administrator's choice might be specified in
tools' configuration file.

For example,

/etc/modprobe.d/forbid-sysctl-alias.conf

may specify

    blacklist:  sysctl:*

if they want to forbid sysctl aliasing.
(but I do not know if this works or not).














> Also add a module parameter for that (proc.modprobe_sysctl_alias),
> for another method that doesn't depend on sysctl tools to be set
> (that wouldn't fail them to try and set it if it's not there yet).
>
> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
> ---
>  fs/proc/proc_sysctl.c  | 8 ++++++++
>  include/linux/module.h | 1 +
>  kernel/sysctl.c        | 9 +++++++++
>  3 files changed, 18 insertions(+)
>
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index ebbf8702387e..1e63819fcda8 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -33,6 +33,14 @@ static void check_struct_sysctl_device_id(void)
>         BUILD_BUG_ON(offsetof(struct sysctl_device_id, procname)
>                         != offsetof(struct ctl_table, procname));
>  }
> +
> +/*
> + * Hint sysctl userspace tools whether or not to run modprobe with sysctl alias
> + * ('modprobe sysctl:entry') if they cannot find the file '/proc/sys/.../entry'
> + */
> +int modprobe_sysctl_alias = 1;
> +module_param(modprobe_sysctl_alias, int, 0644);
> +
>  #else
>  static void check_struct_sysctl_device_id(void) {}
>  #endif
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 3010f687df19..5f565491c596 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -304,6 +304,7 @@ struct notifier_block;
>  #ifdef CONFIG_MODULES
>
>  extern int modules_disabled; /* for sysctl */
> +extern int modprobe_sysctl_alias; /* for proc sysctl */
>  /* Get/put a kernel symbol (calls must be symmetric) */
>  void *__symbol_get(const char *symbol);
>  void *__symbol_get_gpl(const char *symbol);
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 15073621cfa8..b396cfcb55fc 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1763,6 +1763,15 @@ static struct ctl_table kern_table[] = {
>                 .mode           = 0644,
>                 .proc_handler   = proc_dostring,
>         },
> +#ifdef CONFIG_PROC_SYSCTL
> +       {
> +               .procname       = "modprobe_sysctl_alias",
> +               .data           = &modprobe_sysctl_alias,
> +               .maxlen         = sizeof(modprobe_sysctl_alias),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec,
> +       },
> +#endif
>         {
>                 .procname       = "modules_disabled",
>                 .data           = &modules_disabled,
> --
> 2.25.1
>


--
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 3/6] sysctl, mod_devicetable: shadow struct ctl_table.procname for file2alias
  2022-07-22  2:24 ` [RFC PATCH 3/6] sysctl, mod_devicetable: shadow struct ctl_table.procname for file2alias Mauricio Faria de Oliveira
@ 2022-07-26  9:25   ` Masahiro Yamada
  2022-07-27 17:11     ` Mauricio Faria de Oliveira
  0 siblings, 1 reply; 15+ messages in thread
From: Masahiro Yamada @ 2022-07-26  9:25 UTC (permalink / raw)
  To: Mauricio Faria de Oliveira
  Cc: Linux Kernel Mailing List, linux-modules,
	Linux Kbuild mailing list, Linux FS-devel Mailing List,
	Michal Marek, Nick Desaulniers, Luis Chamberlain, Kees Cook,
	Iurii Zaikin

On Fri, Jul 22, 2022 at 11:24 AM Mauricio Faria de Oliveira
<mfo@canonical.com> wrote:
>
> In order to expose a sysctl entry to modpost (file2alias.c, precisely)
> we have to shadow 'struct ctl_table' in mod_devicetable.h, as scripts
> should not access kernel headers or its types (see file2alias.c).
>
> The required field is '.procname' (basename of '/proc/sys/.../entry').
>
> Since 'struct ctl_table' is annotated for structure randomization and
> we need a known offset for '.procname' (remember, no kernel headers),
> take it out of the randomized portion (as in, eg, 'struct task_struct').
>
> Of course, add build-time checks for struct size and .procname offset
> between both structs. (This has to be done on kernel side; for headers.)
>
> With that in place, use the regular macros in devicetable-offsets.c to
> define SIZE_... and OFF_... macros for the shadow struct and the field
> of interest.
>
> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
> ---
>  fs/proc/proc_sysctl.c             | 19 +++++++++++++++++++
>  include/linux/mod_devicetable.h   | 25 +++++++++++++++++++++++++
>  include/linux/sysctl.h            | 11 ++++++++++-
>  kernel/sysctl.c                   |  1 +
>  scripts/mod/devicetable-offsets.c |  3 +++
>  5 files changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index 021e83fe831f..ebbf8702387e 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -19,6 +19,24 @@
>  #include <linux/kmemleak.h>
>  #include "internal.h"
>
> +#ifdef CONFIG_MODULES
> +#include <linux/mod_devicetable.h>
> +
> +static void check_struct_sysctl_device_id(void)
> +{
> +       /*
> +        * The shadow struct sysctl_device_id for file2alias.c needs
> +        * the same size of struct ctl_table and offset for procname.
> +        */
> +       BUILD_BUG_ON(sizeof(struct sysctl_device_id)
> +                       != sizeof(struct ctl_table));
> +       BUILD_BUG_ON(offsetof(struct sysctl_device_id, procname)
> +                       != offsetof(struct ctl_table, procname));


Nit:

If you use static_assert(), you can remove
 check_struct_sysctl_device_id().


You can write static_assert() out of a function.



> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 223376959d29..15073621cfa8 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -2487,6 +2487,7 @@ int __init sysctl_init_bases(void)
>
>         return 0;
>  }
> +


Noise.




>  #endif /* CONFIG_SYSCTL */
>  /*
>   * No sense putting this after each symbol definition, twice,
> diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
> index c0d3bcb99138..43b2549940d2 100644
> --- a/scripts/mod/devicetable-offsets.c
> +++ b/scripts/mod/devicetable-offsets.c
> @@ -262,5 +262,8 @@ int main(void)
>         DEVID(ishtp_device_id);
>         DEVID_FIELD(ishtp_device_id, guid);
>
> +       DEVID(sysctl_device_id);
> +       DEVID_FIELD(sysctl_device_id, procname);
> +
>         return 0;
>  }
> --
> 2.25.1
>


-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 0/6] Introduce "sysctl:" module aliases
  2022-07-26  9:02 ` [RFC PATCH 0/6] Introduce "sysctl:" module aliases Masahiro Yamada
@ 2022-07-27 17:09   ` Mauricio Faria de Oliveira
  0 siblings, 0 replies; 15+ messages in thread
From: Mauricio Faria de Oliveira @ 2022-07-27 17:09 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Linux Kernel Mailing List, linux-modules,
	Linux Kbuild mailing list, Linux FS-devel Mailing List,
	Michal Marek, Nick Desaulniers, Luis Chamberlain, Kees Cook,
	Iurii Zaikin

Hey Masahiro,

Thanks for looking into this!

On Tue, Jul 26, 2022 at 6:04 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
[...]
> > I have tested it on x86_64 with next-20220721, and it looks correct
> > ('modprobe sysctl:nf_conntrack_max' works; other aliases there; see below).
[...]

> I see two locations for the "fib_multipath_hash_fields" parameter
> for example.
>
> #  find /proc/sys/ -name fib_multipath_hash_fields
> /proc/sys/net/ipv4/fib_multipath_hash_fields
> /proc/sys/net/ipv6/fib_multipath_hash_fields
>
>
> If I run
>
>    modprobe sysctl:fib_multipath_hash_fields
>
> Which one will be loaded, net/ipv4/sysctl_net_ipv4.c
> or ipv6/sysctl_net_ipv6.c ?
>
> Of course, IPv4 is always built-in, so ipv6.ko will be loaded in this case.
> But, let's think. The basename is not enough to identify
> which code resulted in that sysctl property.
> The PCI vendor/device ID is meant to be unique. That's the difference.
>
>
> You may argue the full path is globally unique, so
>
>   modprobe  sysctl:net/ipv6/fib_multipath_hash_fields
>
> should work, but that may not be so feasible to implement
> because not all file paths are static.
>
>
> On my machine:
>
> # find  /proc/sys  -name  forwarding
> /proc/sys/net/ipv4/conf/all/forwarding
> /proc/sys/net/ipv4/conf/br-22440b7735e7/forwarding
> /proc/sys/net/ipv4/conf/br-3e8284a56053/forwarding
> /proc/sys/net/ipv4/conf/br-9b27f0f9e130/forwarding
> /proc/sys/net/ipv4/conf/br-bc5fbfa838fc/forwarding
> /proc/sys/net/ipv4/conf/br-ca51e25e8af8/forwarding
> /proc/sys/net/ipv4/conf/default/forwarding
> /proc/sys/net/ipv4/conf/docker0/forwarding
> /proc/sys/net/ipv4/conf/lo/forwarding
> /proc/sys/net/ipv4/conf/lxcbr0/forwarding
> /proc/sys/net/ipv4/conf/veth6e3e4b8/forwarding
> /proc/sys/net/ipv4/conf/virbr0/forwarding
> /proc/sys/net/ipv4/conf/vpn0/forwarding
> /proc/sys/net/ipv4/conf/wlp0s20f3/forwarding
> /proc/sys/net/ipv6/conf/all/forwarding
> /proc/sys/net/ipv6/conf/br-22440b7735e7/forwarding
> /proc/sys/net/ipv6/conf/br-3e8284a56053/forwarding
> /proc/sys/net/ipv6/conf/br-9b27f0f9e130/forwarding
> /proc/sys/net/ipv6/conf/br-bc5fbfa838fc/forwarding
> /proc/sys/net/ipv6/conf/br-ca51e25e8af8/forwarding
> /proc/sys/net/ipv6/conf/default/forwarding
> /proc/sys/net/ipv6/conf/docker0/forwarding
> /proc/sys/net/ipv6/conf/lo/forwarding
> /proc/sys/net/ipv6/conf/lxcbr0/forwarding
> /proc/sys/net/ipv6/conf/veth6e3e4b8/forwarding
> /proc/sys/net/ipv6/conf/virbr0/forwarding
> /proc/sys/net/ipv6/conf/vpn0/forwarding
> /proc/sys/net/ipv6/conf/wlp0s20f3/forwarding
>
>
> I do not know how to do it correctly.

Good point. So, these are actually 2 similar, but subtly different cases.

1) Multiple sysctl entries with identical procname in the _same_
module (e.g., forwarding in either ipv4/ipv6).

This should be fine, as the same module is backing the entries.

2) Multiple sysctl entries with identical procname in _different_
modules (e.g., forwarding in both ipv4/ipv6).

This would load all the different modules, per modprobe's behavior.

Note that a similar case exists with PCI IDs too: alternative device drivers;
and a way is to define which module to choose/ignore, as in modprobe.d(5).
(e.g., alias a particular, duplicated sysctl entry to the chosen
module/ignored).

Sure enough, this isn't efficient, and a kernel-only approach is required.

I'd say it's possible to compromise with a wildcard (e.g., sysctl:*/procname),
so the user/tool knows it's not necessarily unique -- this can be done now.

For some uniqueness, I guess we could add the static parts of the path
(as you mentioned, not all parts of the path are static) in some field(s)
in the alias (similar to PCI IDs, as well), and introduce logic in modprobe
to match closer it multiple modules are found.

This would likely need some of the plumbing I mentioned below, between
the syscl register functions and module macros, I guess; so it'd be new.

But for an initial implementation, maybe the compromise above is fine?

(ie, that if only the basename or '*/basename' is specified you may get
more modules loaded (and will get the sysctl asked!), but that you can
configure appropriately with modprobe.d if needed.)

Thanks,
Mauricio



>
>
>
>
> >
> > I plan to test other archs by cross-building 'allmodconfig' and checking
> > the .mod.c files and modpost output (eg, warnings) for no changes at all,
> > and nf_conntrack.mod.c for expected sysctl aliases. [based on feedback.]
> > (i.e., changes didn't break modpost, and ELF code works on other archs.)
> >
> > Happy to receive suggestions to improve test coverage and functionality.
> >
> > I didn't look much at auto-registration with modpost using the register
> > functions for sysctl, but it seems it would need plumbing, if possible.
> >
> > Let's see review/feedback on the basics first.
> >
> > thanks,
> > Mauricio
> >
> > ...
> >
> > Some context.
> >
> > Even though that issue might be expected and obvious, its consequences
> > sometimes are not.
> >
> > An example is the nf_conntrack_max value, that in busy gateways/routers
> > /cloud deployments can affect performance and functionality more subtly,
> > or even fill the kernel log non-stop with 'table full, dropping packet',
> > if a value greater than the default value is not used.
> >
> > The current solution (workaround, arguably) for this is to include such
> > modules in /etc/modules (or in /etc/modules-load.d/*.conf with systemd),
> > which loads them before an userspace tool (procps's sysctl or systemd's
> > systemd-sysctl{,.service}) runs, so /proc/sys/... exists when it runs.
> >
> > ...
> >
> > That is simple, indeed, but comes w/ technical debt. (ugly stuff warning!)
> >
> > Now there are many _different_ pieces of code that use the _same_ module
> > doing that (eg, deployment tools/scripts for openstack nova and neutron,
> > firewalls, and maybe more).
> >
> > And sometimes when components are split or deployed to different nodes
> > it turns out that in the next reboot we figure (through an issue) that
> > some component did set /etc/sysctl.conf but not /etc/modules.conf, or
> > relied in the ex-colocated component doing that.
> >
> > This has generated several one-off fixes at this point in some projects.
> > (I have submitted one of those, actually, a while ago.)
> >
> > Also, some of those fixes (or original code) put 'nf_conntrack_ipv{4,6}'
> > in /etc/modules, getting 'nf_conntrack' loaded via module dependencies
> > (maybe it was the right module for them at the time, for some reason).
> >
> > So, that component (or a colocated component) got nf_conntrack.ko too.
> >
> > *BUT* after an upgrade from Ubuntu 18.04 (4.15-based kernel) to 20.04
> > (5.4-based kernel), the nf_conntrack_ipv{4,6}.ko modules do not exist
> > anymore, and now nf_conntrack.ko is no longer loaded, and the sysctl
> > nf_conntrack_max is no longer applied. (Someone had to figure it out.)
> >
> > And now maybe we'd need release/kernel-version checks in scripts that
> > use the workaround of /etc/modules for /etc/sysctl.conf configuration.
> >
> > (Yes, it was ugly stuff.)
> >
> > ...
> >
> > Well, this last point seemed like "ok, that's enough; we can do better."
> >
> > I'm not sure this approach is "better" in all reasons, but hopefully it
> > might help starting something that is. 🙏
> >
> > cheers,
> > Mauricio
> >
> > ...
> >
> > Tests:
> >
> >     $ cat /proc/sys/kernel/modprobe_sysctl_alias
> >     1
> >
> >     $ cat /proc/sys/net/netfilter/nf_conntrack_max
> >     cat: /proc/sys/net/netfilter/nf_conntrack_max: No such file or directory
> >
> >     $ lsmod | grep nf_conntrack
> >     $
> >
> >     $ sudo modprobe sysctl:nf_conntrack_max
> >
> >     $ cat /proc/sys/net/netfilter/nf_conntrack_max
> >     262144
> >
> >     $ lsmod | grep nf_conntrack
> >     nf_conntrack          110592  0
> >     nf_defrag_ipv6         20480  1 nf_conntrack
> >     nf_defrag_ipv4         16384  1 nf_conntrack
> >
> >     $ modinfo nf_conntrack | grep ^alias:
> >     alias:          nf_conntrack-10
> >     alias:          nf_conntrack-2
> >     alias:          ip_conntrack
> >     alias:          sysctl:nf_conntrack_icmpv6_timeout
> >     alias:          sysctl:nf_conntrack_icmp_timeout
> >     alias:          sysctl:nf_conntrack_udp_timeout_stream
> >     alias:          sysctl:nf_conntrack_udp_timeout
> >     alias:          sysctl:nf_conntrack_tcp_max_retrans
> >     alias:          sysctl:nf_conntrack_tcp_ignore_invalid_rst
> >     alias:          sysctl:nf_conntrack_tcp_be_liberal
> >     alias:          sysctl:nf_conntrack_tcp_loose
> >     alias:          sysctl:nf_conntrack_tcp_timeout_unacknowledged
> >     alias:          sysctl:nf_conntrack_tcp_timeout_max_retrans
> >     alias:          sysctl:nf_conntrack_tcp_timeout_close
> >     alias:          sysctl:nf_conntrack_tcp_timeout_time_wait
> >     alias:          sysctl:nf_conntrack_tcp_timeout_last_ack
> >     alias:          sysctl:nf_conntrack_tcp_timeout_close_wait
> >     alias:          sysctl:nf_conntrack_tcp_timeout_fin_wait
> >     alias:          sysctl:nf_conntrack_tcp_timeout_established
> >     alias:          sysctl:nf_conntrack_tcp_timeout_syn_recv
> >     alias:          sysctl:nf_conntrack_tcp_timeout_syn_sent
> >     alias:          sysctl:nf_conntrack_generic_timeout
> >     alias:          sysctl:nf_conntrack_helper
> >     alias:          sysctl:nf_conntrack_acct
> >     alias:          sysctl:nf_conntrack_expect_max
> >     alias:          sysctl:nf_conntrack_log_invalid
> >     alias:          sysctl:nf_conntrack_checksum
> >     alias:          sysctl:nf_conntrack_buckets
> >     alias:          sysctl:nf_conntrack_count
> >     alias:          sysctl:nf_conntrack_max
> >
> >     $ modinfo r8169 | grep ^alias:
> >     alias:          pci:v000010ECd00003000sv*sd*bc*sc*i*
> >     alias:          pci:v000010ECd00008125sv*sd*bc*sc*i*
> >     alias:          pci:v00000001d00008168sv*sd00002410bc*sc*i*
> >     alias:          pci:v00001737d00001032sv*sd00000024bc*sc*i*
> >     alias:          pci:v000016ECd00000116sv*sd*bc*sc*i*
> >     alias:          pci:v00001259d0000C107sv*sd*bc*sc*i*
> >     alias:          pci:v00001186d00004302sv*sd*bc*sc*i*
> >     alias:          pci:v00001186d00004300sv*sd*bc*sc*i*
> >     alias:          pci:v00001186d00004300sv00001186sd00004B10bc*sc*i*
> >     alias:          pci:v000010ECd00008169sv*sd*bc*sc*i*
> >     alias:          pci:v000010FFd00008168sv*sd*bc*sc*i*
> >     alias:          pci:v000010ECd00008168sv*sd*bc*sc*i*
> >     alias:          pci:v000010ECd00008167sv*sd*bc*sc*i*
> >     alias:          pci:v000010ECd00008162sv*sd*bc*sc*i*
> >     alias:          pci:v000010ECd00008161sv*sd*bc*sc*i*
> >     alias:          pci:v000010ECd00008136sv*sd*bc*sc*i*
> >     alias:          pci:v000010ECd00008129sv*sd*bc*sc*i*
> >     alias:          pci:v000010ECd00002600sv*sd*bc*sc*i*
> >     alias:          pci:v000010ECd00002502sv*sd*bc*sc*i*
> >
> > Mauricio Faria de Oliveira (6):
> >   modpost: factor out elf/arch-specific code from section_rel[a]()
> >   modpost: deduplicate section_rel[a]()
> >   sysctl, mod_devicetable: shadow struct ctl_table.procname for
> >     file2alias
> >   module, modpost: introduce support for MODULE_SYSCTL_TABLE
> >   netfilter: conntrack: use MODULE_SYSCTL_TABLE
> >   sysctl: introduce /proc/sys/kernel/modprobe_sysctl_alias
> >
> >  fs/proc/proc_sysctl.c                   |  27 ++++
> >  include/linux/mod_devicetable.h         |  25 ++++
> >  include/linux/module.h                  |   8 ++
> >  include/linux/sysctl.h                  |  11 +-
> >  kernel/sysctl.c                         |  10 ++
> >  net/netfilter/nf_conntrack_standalone.c |   4 +
> >  scripts/mod/devicetable-offsets.c       |   3 +
> >  scripts/mod/file2alias.c                | 111 +++++++++++++++
> >  scripts/mod/modpost.c                   | 178 +++++++++++++-----------
> >  scripts/mod/modpost.h                   |   3 +
> >  10 files changed, 296 insertions(+), 84 deletions(-)
> >
> > --
> > 2.25.1
> >
>
>
> --
> Best Regards
> Masahiro Yamada



--
Mauricio Faria de Oliveira

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 2/6] modpost: deduplicate section_rel[a]()
  2022-07-26  9:19   ` Masahiro Yamada
@ 2022-07-27 17:10     ` Mauricio Faria de Oliveira
  0 siblings, 0 replies; 15+ messages in thread
From: Mauricio Faria de Oliveira @ 2022-07-27 17:10 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Linux Kernel Mailing List, linux-modules,
	Linux Kbuild mailing list, Linux FS-devel Mailing List,
	Michal Marek, Nick Desaulniers, Luis Chamberlain, Kees Cook,
	Iurii Zaikin

On Tue, Jul 26, 2022 at 6:20 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> On Fri, Jul 22, 2022 at 11:24 AM Mauricio Faria de Oliveira
> <mfo@canonical.com> wrote:
> >
> > Now both functions are almost identical, and we can again generalize
> > the relocation types Elf_Rela/Elf_Rel with Elf_Rela, and handle some
> > differences with conditionals on section header type (SHT_RELA/REL).
> >
> > The important bit is to make sure the loop increment uses the right
> > size for pointer arithmethic.
> >
> > The original reason for split functions to make program logic easier
> > to follow; commit 5b24c0715fc4 ("kbuild: code refactoring in modpost").
> >
> > Hopefully these 2 commits may help improving that, without an impact
> > in understanding the code due to generalization of relocation types.
> >
> > Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
> > ---
> >  scripts/mod/modpost.c | 61 ++++++++++++++++---------------------------
> >  1 file changed, 23 insertions(+), 38 deletions(-)
> >
> > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> > index 4c1038dccae0..d1ed67fa290b 100644
> > --- a/scripts/mod/modpost.c
> > +++ b/scripts/mod/modpost.c
> > @@ -1794,63 +1794,49 @@ static int get_relx_sym(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *rela,
> >         return 0;
> >  }
> >
> > -static void section_rela(const char *modname, struct elf_info *elf,
> > +/* The caller must ensure sechdr->sh_type == SHT_RELA or SHT_REL. */
> > +static void section_relx(const char *modname, struct elf_info *elf,
> >                          Elf_Shdr *sechdr)
> >  {
> >         Elf_Sym  *sym;
> > -       Elf_Rela *rela;
> > +       Elf_Rela *relx; /* access .r_addend in SHT_RELA _only_! */
> >         Elf_Rela r;
> > +       size_t relx_size;
> >         const char *fromsec;
> >
> >         Elf_Rela *start = (void *)elf->hdr + sechdr->sh_offset;
> >         Elf_Rela *stop  = (void *)start + sechdr->sh_size;
> >
> >         fromsec = sech_name(elf, sechdr);
> > -       fromsec += strlen(".rela");
> > +       if (sechdr->sh_type == SHT_RELA) {
> > +               relx_size = sizeof(Elf_Rela);
> > +               fromsec += strlen(".rela");
> > +       } else if (sechdr->sh_type == SHT_REL) {
> > +               relx_size = sizeof(Elf_Rel);
> > +               fromsec += strlen(".rel");
> > +       } else {
> > +               error("%s: [%s.ko] not relocation section\n", fromsec, modname);
>
>
> Nit.
>
> modname already contains the suffix  ".o".
>
> For vmlinux, the error message will print like this:
> [vmlinux.o.ko]

Oops, I missed the '.o' suffix difference between modname and mod->name.

If it's OK, I just removed '.ko' as it's simpler than plumbing 'mod' in
(i.e., for  "[%s%s]" with mod->name, mod->is_vmlinux ? "" : ".ko" ).

And just noting for myself/other readers:

Similar calls in do_sysctl_{entry,table}() with '.ko' are OK because their
modname is mod->name (without '.o' suffix), and shouldn't run for vmlinux,
just modules (MODULE_SYSCTL_TABLE is defined if MODULE is defined).

Thanks!


>
>
>
>
>
> > +               return;
> > +       }
> > +
> >         /* if from section (name) is know good then skip it */
> >         if (match(fromsec, section_white_list))
> >                 return;
> >
> > -       for (rela = start; rela < stop; rela++) {
> > -               if (get_relx_sym(elf, sechdr, rela, &r, &sym))
> > +       for (relx = start; relx < stop; relx = (void *)relx + relx_size) {
> > +               if (get_relx_sym(elf, sechdr, relx, &r, &sym))
> >                         continue;
> >
> >                 switch (elf->hdr->e_machine) {
> >                 case EM_RISCV:
> > -                       if (!strcmp("__ex_table", fromsec) &&
> > +                       if (sechdr->sh_type == SHT_RELA &&
> > +                           !strcmp("__ex_table", fromsec) &&
> >                             ELF_R_TYPE(r.r_info) == R_RISCV_SUB32)
> >                                 continue;
> >                         break;
> >                 }
> >
> > -               if (is_second_extable_reloc(start, rela, fromsec))
> > -                       find_extable_entry_size(fromsec, &r);
> > -               check_section_mismatch(modname, elf, &r, sym, fromsec);
> > -       }
> > -}
> > -
> > -static void section_rel(const char *modname, struct elf_info *elf,
> > -                       Elf_Shdr *sechdr)
> > -{
> > -       Elf_Sym *sym;
> > -       Elf_Rel *rel;
> > -       Elf_Rela r;
> > -       const char *fromsec;
> > -
> > -       Elf_Rel *start = (void *)elf->hdr + sechdr->sh_offset;
> > -       Elf_Rel *stop  = (void *)start + sechdr->sh_size;
> > -
> > -       fromsec = sech_name(elf, sechdr);
> > -       fromsec += strlen(".rel");
> > -       /* if from section (name) is know good then skip it */
> > -       if (match(fromsec, section_white_list))
> > -               return;
> > -
> > -       for (rel = start; rel < stop; rel++) {
> > -               if (get_relx_sym(elf, sechdr, (Elf_Rela *)rel, &r, &sym)
> > -                       continue;
> > -
> > -               if (is_second_extable_reloc(start, rel, fromsec))
> > +               if (is_second_extable_reloc(start, relx, fromsec))
> >                         find_extable_entry_size(fromsec, &r);
> >                 check_section_mismatch(modname, elf, &r, sym, fromsec);
> >         }
> > @@ -1877,10 +1863,9 @@ static void check_sec_ref(const char *modname, struct elf_info *elf)
> >         for (i = 0; i < elf->num_sections; i++) {
> >                 check_section(modname, elf, &elf->sechdrs[i]);
> >                 /* We want to process only relocation sections and not .init */
> > -               if (sechdrs[i].sh_type == SHT_RELA)
> > -                       section_rela(modname, elf, &elf->sechdrs[i]);
> > -               else if (sechdrs[i].sh_type == SHT_REL)
> > -                       section_rel(modname, elf, &elf->sechdrs[i]);
> > +               if (sechdrs[i].sh_type == SHT_RELA ||
> > +                   sechdrs[i].sh_type == SHT_REL)
> > +                       section_relx(modname, elf, &elf->sechdrs[i]);
> >         }
> >  }
> >
> > --
> > 2.25.1
> >
>
>
> --
> Best Regards
> Masahiro Yamada



--
Mauricio Faria de Oliveira

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 6/6] sysctl: introduce /proc/sys/kernel/modprobe_sysctl_alias
  2022-07-26  9:22   ` Masahiro Yamada
@ 2022-07-27 17:11     ` Mauricio Faria de Oliveira
  0 siblings, 0 replies; 15+ messages in thread
From: Mauricio Faria de Oliveira @ 2022-07-27 17:11 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Linux Kernel Mailing List, linux-modules,
	Linux Kbuild mailing list, Linux FS-devel Mailing List,
	Michal Marek, Nick Desaulniers, Luis Chamberlain, Kees Cook,
	Iurii Zaikin

On Tue, Jul 26, 2022 at 6:24 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> On Fri, Jul 22, 2022 at 11:24 AM Mauricio Faria de Oliveira
> <mfo@canonical.com> wrote:
> >
> > The goal of the earlier patches is to let sysctl userspace tools
> > load the kernel module with a sysctl entry that is not available
> > yet in /proc/sys/ when the tool runs (so it can become available).
> >
> > Let's expose this file for userspace for two reasons:
> >
> > 1) Allow such tools to identify that the running kernel has the
> >    code which produces sysctl module aliases, so they could run
> >    'modprobe sysctl:<entry>' only when it may actually help.
> >
> > 2) Allow an administrator to hint such tools not to do that, if
> >    that is desired for some reason (e.g., rather have the tools
> >    fail if something is misconfigured in a critical deployment).
>
> This flag is just a hint.
> User-space tools are still able to ignore it.
>
> Perhaps, such administrator's choice might be specified in
> tools' configuration file.
>
> For example,
>
> /etc/modprobe.d/forbid-sysctl-alias.conf
>
> may specify
>
>     blacklist:  sysctl:*
>
> if they want to forbid sysctl aliasing.
> (but I do not know if this works or not).

Yes, it's just a hint. I considered this isn't strong enough, but
didn't think more into it.

Now, your idea with modprobe.d is strong enough. We have to change it a bit, as
only 'alias' supports wildcards per modprobe.d(5), then add 'install'
to make sure.

# cat /etc/modprobe.d/disable-sysctl-alias.conf
alias sysctl:* sysctl_alias_off
install sysctl_alias_off /bin/false
# or /bin/true, per the sysadmin.

# modprobe sysctl:nf_conntrack_max
modprobe: ERROR: ../libkmod/libkmod-module.c:990 command_do() Error
running install command '/bin/false' for module sysctl_alias_off:
retcode 1
modprobe: ERROR: could not insert 'sysctl_alias_off': Invalid argument

I'll document this in the commit message for now.

P.S.: Since the flag is a hint to userspace tools in sense 1) as well
(so they know not to run modprobe if sysctl aliases aren't expected),
the idea or the file itself seems worth keeping -- but maybe differently.

Thanks,


>
>
>
>
>
>
>
>
>
>
>
>
>
>
> > Also add a module parameter for that (proc.modprobe_sysctl_alias),
> > for another method that doesn't depend on sysctl tools to be set
> > (that wouldn't fail them to try and set it if it's not there yet).
> >
> > Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
> > ---
> >  fs/proc/proc_sysctl.c  | 8 ++++++++
> >  include/linux/module.h | 1 +
> >  kernel/sysctl.c        | 9 +++++++++
> >  3 files changed, 18 insertions(+)
> >
> > diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> > index ebbf8702387e..1e63819fcda8 100644
> > --- a/fs/proc/proc_sysctl.c
> > +++ b/fs/proc/proc_sysctl.c
> > @@ -33,6 +33,14 @@ static void check_struct_sysctl_device_id(void)
> >         BUILD_BUG_ON(offsetof(struct sysctl_device_id, procname)
> >                         != offsetof(struct ctl_table, procname));
> >  }
> > +
> > +/*
> > + * Hint sysctl userspace tools whether or not to run modprobe with sysctl alias
> > + * ('modprobe sysctl:entry') if they cannot find the file '/proc/sys/.../entry'
> > + */
> > +int modprobe_sysctl_alias = 1;
> > +module_param(modprobe_sysctl_alias, int, 0644);
> > +
> >  #else
> >  static void check_struct_sysctl_device_id(void) {}
> >  #endif
> > diff --git a/include/linux/module.h b/include/linux/module.h
> > index 3010f687df19..5f565491c596 100644
> > --- a/include/linux/module.h
> > +++ b/include/linux/module.h
> > @@ -304,6 +304,7 @@ struct notifier_block;
> >  #ifdef CONFIG_MODULES
> >
> >  extern int modules_disabled; /* for sysctl */
> > +extern int modprobe_sysctl_alias; /* for proc sysctl */
> >  /* Get/put a kernel symbol (calls must be symmetric) */
> >  void *__symbol_get(const char *symbol);
> >  void *__symbol_get_gpl(const char *symbol);
> > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > index 15073621cfa8..b396cfcb55fc 100644
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -1763,6 +1763,15 @@ static struct ctl_table kern_table[] = {
> >                 .mode           = 0644,
> >                 .proc_handler   = proc_dostring,
> >         },
> > +#ifdef CONFIG_PROC_SYSCTL
> > +       {
> > +               .procname       = "modprobe_sysctl_alias",
> > +               .data           = &modprobe_sysctl_alias,
> > +               .maxlen         = sizeof(modprobe_sysctl_alias),
> > +               .mode           = 0644,
> > +               .proc_handler   = proc_dointvec,
> > +       },
> > +#endif
> >         {
> >                 .procname       = "modules_disabled",
> >                 .data           = &modules_disabled,
> > --
> > 2.25.1
> >
>
>
> --
> Best Regards
> Masahiro Yamada



--
Mauricio Faria de Oliveira

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 3/6] sysctl, mod_devicetable: shadow struct ctl_table.procname for file2alias
  2022-07-26  9:25   ` Masahiro Yamada
@ 2022-07-27 17:11     ` Mauricio Faria de Oliveira
  0 siblings, 0 replies; 15+ messages in thread
From: Mauricio Faria de Oliveira @ 2022-07-27 17:11 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Linux Kernel Mailing List, linux-modules,
	Linux Kbuild mailing list, Linux FS-devel Mailing List,
	Michal Marek, Nick Desaulniers, Luis Chamberlain, Kees Cook,
	Iurii Zaikin

On Tue, Jul 26, 2022 at 6:27 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> On Fri, Jul 22, 2022 at 11:24 AM Mauricio Faria de Oliveira
> <mfo@canonical.com> wrote:
> >
> > In order to expose a sysctl entry to modpost (file2alias.c, precisely)
> > we have to shadow 'struct ctl_table' in mod_devicetable.h, as scripts
> > should not access kernel headers or its types (see file2alias.c).
> >
> > The required field is '.procname' (basename of '/proc/sys/.../entry').
> >
> > Since 'struct ctl_table' is annotated for structure randomization and
> > we need a known offset for '.procname' (remember, no kernel headers),
> > take it out of the randomized portion (as in, eg, 'struct task_struct').
> >
> > Of course, add build-time checks for struct size and .procname offset
> > between both structs. (This has to be done on kernel side; for headers.)
> >
> > With that in place, use the regular macros in devicetable-offsets.c to
> > define SIZE_... and OFF_... macros for the shadow struct and the field
> > of interest.
> >
> > Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
> > ---
> >  fs/proc/proc_sysctl.c             | 19 +++++++++++++++++++
> >  include/linux/mod_devicetable.h   | 25 +++++++++++++++++++++++++
> >  include/linux/sysctl.h            | 11 ++++++++++-
> >  kernel/sysctl.c                   |  1 +
> >  scripts/mod/devicetable-offsets.c |  3 +++
> >  5 files changed, 58 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> > index 021e83fe831f..ebbf8702387e 100644
> > --- a/fs/proc/proc_sysctl.c
> > +++ b/fs/proc/proc_sysctl.c
> > @@ -19,6 +19,24 @@
> >  #include <linux/kmemleak.h>
> >  #include "internal.h"
> >
> > +#ifdef CONFIG_MODULES
> > +#include <linux/mod_devicetable.h>
> > +
> > +static void check_struct_sysctl_device_id(void)
> > +{
> > +       /*
> > +        * The shadow struct sysctl_device_id for file2alias.c needs
> > +        * the same size of struct ctl_table and offset for procname.
> > +        */
> > +       BUILD_BUG_ON(sizeof(struct sysctl_device_id)
> > +                       != sizeof(struct ctl_table));
> > +       BUILD_BUG_ON(offsetof(struct sysctl_device_id, procname)
> > +                       != offsetof(struct ctl_table, procname));
>
>
> Nit:
>
> If you use static_assert(), you can remove
>  check_struct_sysctl_device_id().
>
>
> You can write static_assert() out of a function.

That's a nice cleanup; thanks!

>
>
>
> > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > index 223376959d29..15073621cfa8 100644
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -2487,6 +2487,7 @@ int __init sysctl_init_bases(void)
> >
> >         return 0;
> >  }
> > +
>
>
> Noise.

Fixed.


>
>
>
>
> >  #endif /* CONFIG_SYSCTL */
> >  /*
> >   * No sense putting this after each symbol definition, twice,
> > diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
> > index c0d3bcb99138..43b2549940d2 100644
> > --- a/scripts/mod/devicetable-offsets.c
> > +++ b/scripts/mod/devicetable-offsets.c
> > @@ -262,5 +262,8 @@ int main(void)
> >         DEVID(ishtp_device_id);
> >         DEVID_FIELD(ishtp_device_id, guid);
> >
> > +       DEVID(sysctl_device_id);
> > +       DEVID_FIELD(sysctl_device_id, procname);
> > +
> >         return 0;
> >  }
> > --
> > 2.25.1
> >
>
>
> --
> Best Regards
> Masahiro Yamada



--
Mauricio Faria de Oliveira

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-07-27 18:08 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-22  2:24 [RFC PATCH 0/6] Introduce "sysctl:" module aliases Mauricio Faria de Oliveira
2022-07-22  2:24 ` [RFC PATCH 1/6] modpost: factor out elf/arch-specific code from section_rel[a]() Mauricio Faria de Oliveira
2022-07-22  2:24 ` [RFC PATCH 2/6] modpost: deduplicate section_rel[a]() Mauricio Faria de Oliveira
2022-07-26  9:19   ` Masahiro Yamada
2022-07-27 17:10     ` Mauricio Faria de Oliveira
2022-07-22  2:24 ` [RFC PATCH 3/6] sysctl, mod_devicetable: shadow struct ctl_table.procname for file2alias Mauricio Faria de Oliveira
2022-07-26  9:25   ` Masahiro Yamada
2022-07-27 17:11     ` Mauricio Faria de Oliveira
2022-07-22  2:24 ` [RFC PATCH 4/6] module, modpost: introduce support for MODULE_SYSCTL_TABLE Mauricio Faria de Oliveira
2022-07-22  2:24 ` [RFC PATCH 5/6] netfilter: conntrack: use MODULE_SYSCTL_TABLE Mauricio Faria de Oliveira
2022-07-22  2:24 ` [RFC PATCH 6/6] sysctl: introduce /proc/sys/kernel/modprobe_sysctl_alias Mauricio Faria de Oliveira
2022-07-26  9:22   ` Masahiro Yamada
2022-07-27 17:11     ` Mauricio Faria de Oliveira
2022-07-26  9:02 ` [RFC PATCH 0/6] Introduce "sysctl:" module aliases Masahiro Yamada
2022-07-27 17:09   ` Mauricio Faria de Oliveira

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).