linux-modules.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFT 0/5] module: avoid userspace pressure on unwanted allocations
@ 2023-03-19 21:49 Luis Chamberlain
  2023-03-19 21:49 ` [RFT 1/5] module: move finished_loading() Luis Chamberlain
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Luis Chamberlain @ 2023-03-19 21:49 UTC (permalink / raw)
  To: linux-modules, linux-kernel, pmladek, david, petr.pavlu, prarit
  Cc: christophe.leroy, song, mcgrof

Finally this third patch set spins the first RFC I put out to help
reduce memory pressure [0]. It updates the commit log with some stats
obtained on a guest, but I need to do more tests on more systems and
then also with stress-ng. I posted a patch to run stress-ng for modules,
so it stresses running finit_module() [1]. Using that instead of
kmod test 0008 should be useful as that really puts some heavy load
without going through the kernel module auto-loader, that has a
restriction of just allowing 50 threads concurrently. The issue with
that stress test so far is that unloading doesn't seem to unload yet.

The last patch is purely for testing purposes and its value can only be
shown if it really does help the use case of a large system with many
CPUs. That situation is known currently to cuase issues with subsystems
which end up loading tons of the same drivers and so this tries to be
a bit defensive for subsystems that might need some love in this area.

I have two trees for this patchset, the first one had the ELF checker
and validity tests at the end [2], and the latest one re-adjusts the
ordering to put this patch set as the last series [3], in line with
the order in which I've submitted the patches. I had only run time
tested the patch order on [2] but it makes sense to put more of the
heavier functional changes at the very end, and request for further
testing.

[0] https://lkml.kernel.org/r/20230311051712.4095040-1-mcgrof@kernel.org
[1] https://lore.kernel.org/all/ZBUA6E3kYh0Xuu/c@bombadil.infradead.org/?q=stress-ng+mcgrof
[2] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20230319-module-alloc-opts

Luis Chamberlain (5):
  module: move finished_loading()
  module: extract patient module check into helper
  module: avoid allocation if module is already present and ready
  module: use list_add_tail_rcu() when adding module
  module: add a sanity check prior to allowing kernel module
    auto-loading

 kernel/module/internal.h |   1 +
 kernel/module/kmod.c     |   7 ++
 kernel/module/main.c     | 139 ++++++++++++++++++++++++---------------
 3 files changed, 93 insertions(+), 54 deletions(-)

-- 
2.39.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFT 1/5] module: move finished_loading()
  2023-03-19 21:49 [RFT 0/5] module: avoid userspace pressure on unwanted allocations Luis Chamberlain
@ 2023-03-19 21:49 ` Luis Chamberlain
  2023-03-19 21:49 ` [RFT 2/5] module: extract patient module check into helper Luis Chamberlain
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Luis Chamberlain @ 2023-03-19 21:49 UTC (permalink / raw)
  To: linux-modules, linux-kernel, pmladek, david, petr.pavlu, prarit
  Cc: christophe.leroy, song, mcgrof

This has no functional change, just moves a routine earlier
as we'll make use of it next.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/module/main.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 929644d79d38..560d00e60744 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2444,27 +2444,6 @@ static int post_relocation(struct module *mod, const struct load_info *info)
 	return module_finalize(info->hdr, info->sechdrs, mod);
 }
 
-/* Is this module of this name done loading?  No locks held. */
-static bool finished_loading(const char *name)
-{
-	struct module *mod;
-	bool ret;
-
-	/*
-	 * The module_mutex should not be a heavily contended lock;
-	 * if we get the occasional sleep here, we'll go an extra iteration
-	 * in the wait_event_interruptible(), which is harmless.
-	 */
-	sched_annotate_sleep();
-	mutex_lock(&module_mutex);
-	mod = find_module_all(name, strlen(name), true);
-	ret = !mod || mod->state == MODULE_STATE_LIVE
-		|| mod->state == MODULE_STATE_GOING;
-	mutex_unlock(&module_mutex);
-
-	return ret;
-}
-
 /* Call module constructors. */
 static void do_mod_ctors(struct module *mod)
 {
@@ -2628,6 +2607,27 @@ static int may_init_module(void)
 	return 0;
 }
 
+/* Is this module of this name done loading?  No locks held. */
+static bool finished_loading(const char *name)
+{
+	struct module *mod;
+	bool ret;
+
+	/*
+	 * The module_mutex should not be a heavily contended lock;
+	 * if we get the occasional sleep here, we'll go an extra iteration
+	 * in the wait_event_interruptible(), which is harmless.
+	 */
+	sched_annotate_sleep();
+	mutex_lock(&module_mutex);
+	mod = find_module_all(name, strlen(name), true);
+	ret = !mod || mod->state == MODULE_STATE_LIVE
+		|| mod->state == MODULE_STATE_GOING;
+	mutex_unlock(&module_mutex);
+
+	return ret;
+}
+
 /*
  * We try to place it in the list now to make sure it's unique before
  * we dedicate too many resources.  In particular, temporary percpu
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFT 2/5] module: extract patient module check into helper
  2023-03-19 21:49 [RFT 0/5] module: avoid userspace pressure on unwanted allocations Luis Chamberlain
  2023-03-19 21:49 ` [RFT 1/5] module: move finished_loading() Luis Chamberlain
@ 2023-03-19 21:49 ` Luis Chamberlain
  2023-03-19 21:49 ` [RFT 3/5] module: avoid allocation if module is already present and ready Luis Chamberlain
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Luis Chamberlain @ 2023-03-19 21:49 UTC (permalink / raw)
  To: linux-modules, linux-kernel, pmladek, david, petr.pavlu, prarit
  Cc: christophe.leroy, song, mcgrof

The patient module check inside add_unformed_module() is large
enough as we need it. It is a bit hard to read too, so just
move it to a helper and do the inverse checks first to help
shift the code and make it easier to read. The new helper then
is module_patient_check_exists().

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/module/main.c | 71 +++++++++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 31 deletions(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 560d00e60744..3644438ff96e 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2628,6 +2628,43 @@ static bool finished_loading(const char *name)
 	return ret;
 }
 
+/* Must be called with module_mutex held */
+static int module_patient_check_exists(const char *name)
+{
+	struct module *old;
+	int err = 0;
+
+	old = find_module_all(name, strlen(name), true);
+	if (old == NULL)
+		return 0;
+
+	if (old->state == MODULE_STATE_COMING
+	    || old->state == MODULE_STATE_UNFORMED) {
+		/* Wait in case it fails to load. */
+		mutex_unlock(&module_mutex);
+		err = wait_event_interruptible(module_wq,
+				       finished_loading(name));
+		if (err)
+			return err;
+
+		/* The module might have gone in the meantime. */
+		mutex_lock(&module_mutex);
+		old = find_module_all(name, strlen(name), true);
+	}
+
+	/*
+	 * We are here only when the same module was being loaded. Do
+	 * not try to load it again right now. It prevents long delays
+	 * caused by serialized module load failures. It might happen
+	 * when more devices of the same type trigger load of
+	 * a particular module.
+	 */
+	if (old && old->state == MODULE_STATE_LIVE)
+		return -EEXIST;
+	else
+		return -EBUSY;
+}
+
 /*
  * We try to place it in the list now to make sure it's unique before
  * we dedicate too many resources.  In particular, temporary percpu
@@ -2636,41 +2673,14 @@ static bool finished_loading(const char *name)
 static int add_unformed_module(struct module *mod)
 {
 	int err;
-	struct module *old;
 
 	mod->state = MODULE_STATE_UNFORMED;
 
 	mutex_lock(&module_mutex);
-	old = find_module_all(mod->name, strlen(mod->name), true);
-	if (old != NULL) {
-		if (old->state == MODULE_STATE_COMING
-		    || old->state == MODULE_STATE_UNFORMED) {
-			/* Wait in case it fails to load. */
-			mutex_unlock(&module_mutex);
-			err = wait_event_interruptible(module_wq,
-					       finished_loading(mod->name));
-			if (err)
-				goto out_unlocked;
-
-			/* The module might have gone in the meantime. */
-			mutex_lock(&module_mutex);
-			old = find_module_all(mod->name, strlen(mod->name),
-					      true);
-		}
-
-		/*
-		 * We are here only when the same module was being loaded. Do
-		 * not try to load it again right now. It prevents long delays
-		 * caused by serialized module load failures. It might happen
-		 * when more devices of the same type trigger load of
-		 * a particular module.
-		 */
-		if (old && old->state == MODULE_STATE_LIVE)
-			err = -EEXIST;
-		else
-			err = -EBUSY;
+	err = module_patient_check_exists(mod->name);
+	if (err)
 		goto out;
-	}
+
 	mod_update_bounds(mod);
 	list_add_rcu(&mod->list, &modules);
 	mod_tree_insert(mod);
@@ -2678,7 +2688,6 @@ static int add_unformed_module(struct module *mod)
 
 out:
 	mutex_unlock(&module_mutex);
-out_unlocked:
 	return err;
 }
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFT 3/5] module: avoid allocation if module is already present and ready
  2023-03-19 21:49 [RFT 0/5] module: avoid userspace pressure on unwanted allocations Luis Chamberlain
  2023-03-19 21:49 ` [RFT 1/5] module: move finished_loading() Luis Chamberlain
  2023-03-19 21:49 ` [RFT 2/5] module: extract patient module check into helper Luis Chamberlain
@ 2023-03-19 21:49 ` Luis Chamberlain
  2023-03-19 21:49 ` [RFT 4/5] module: use list_add_tail_rcu() when adding module Luis Chamberlain
  2023-03-19 21:49 ` [RFT 5/5] module: add a sanity check prior to allowing kernel module auto-loading Luis Chamberlain
  4 siblings, 0 replies; 6+ messages in thread
From: Luis Chamberlain @ 2023-03-19 21:49 UTC (permalink / raw)
  To: linux-modules, linux-kernel, pmladek, david, petr.pavlu, prarit
  Cc: christophe.leroy, song, mcgrof

load_module() will allocate a struct module before even checking
if the module is already loaded. This can create unecessary memory
pressure since we can easily just check if the module is already
present early with the copy of the module information from userspace
after we've validated it a bit.

This can only be an issue if a system is getting hammered with userspace
loading modules. Note that there are two ways to load modules, one is
auto-loading in-kernel and that pings back to userspace to just call
modprobe, and we already have a way to restrict the count and load
from there on the kernel usermode helper. However that does not stop
a system from issuing tons of system calls to load a module. So userspace
itself *is* supposed to check if a module is present before loading it.
But we're observing situations where tons of the same module are in effect
being loaded. Although some of these are acknolwedged as in-kernel bugs
such as the ACPI frequency modules we can also help a bit more in the
modules side to avoid those dramatic situations. All that is just memory
being allocated to then be thrown away.

To avoid memory pressure for such stupid cases put a stop gap for them.
We now check for the module being present *before* allocation, and then
right after we are going to add it to the system.

On a 8vcpu 8 GiB RAM system using kdevops and testing against selftests
kmod.sh -t 0008 I see a saving in the *highest* side of memory
consumption of up to ~ 84 MiB. This can be obvserved and visualized
below. The time it takes to run the test is also not affected.

The gnuplot is set to a range from 400000 KiB (390 Mib) - 580000 (566 Mib)
given the tests peak around that range.

cat kmod-simple.plot
set term dumb
set output fileout
set yrange [400000:580000]
plot filein with linespoints title "Memory usage (KiB)"

Before:
root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-before.txt ^C
root@kmod ~ # sort -n -r log-0008-before.txt | head -1
528732

So ~516.33 MiB

After:

root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-after.txt ^C

root@kmod ~ # sort -n -r log-0008-after.txt | head -1
442516

So ~432.14 MiB

That's about 84 ~MiB in savings in the worst case. The graphs:

root@kmod ~ # gnuplot -e "filein='log-0008-before.txt'; fileout='graph-0008-before.txt'" kmod.plot
root@kmod ~ # gnuplot -e "filein='log-0008-after.txt';  fileout='graph-0008-after.txt'"  kmod.plot

root@kmod ~ # cat graph-0008-before.txt

  580000 +-----------------------------------------------------------------+
         |       +        +       +       +       +        +       +       |
  560000 |-+                                    Memory usage (KiB) ***A***-|
         |                                                                 |
  540000 |-+                                                             +-|
         |                                                                 |
         |        *A     *AA*AA*A*AA          *A*AA    A*A*A *AA*A*AA*A  A |
  520000 |-+A*A*AA  *AA*A           *A*AA*A*AA     *A*A     A          *A+-|
         |*A                                                               |
  500000 |-+                                                             +-|
         |                                                                 |
  480000 |-+                                                             +-|
         |                                                                 |
  460000 |-+                                                             +-|
         |                                                                 |
         |                                                                 |
  440000 |-+                                                             +-|
         |                                                                 |
  420000 |-+                                                             +-|
         |       +        +       +       +       +        +       +       |
  400000 +-----------------------------------------------------------------+
         0       5        10      15      20      25       30      35      40

root@kmod ~ # cat graph-0008-after.txt

  580000 +-----------------------------------------------------------------+
         |       +        +       +       +       +        +       +       |
  560000 |-+                                    Memory usage (KiB) ***A***-|
         |                                                                 |
  540000 |-+                                                             +-|
         |                                                                 |
         |                                                                 |
  520000 |-+                                                             +-|
         |                                                                 |
  500000 |-+                                                             +-|
         |                                                                 |
  480000 |-+                                                             +-|
         |                                                                 |
  460000 |-+                                                             +-|
         |                                                                 |
         |          *A              *A*A                                   |
  440000 |-+A*A*AA*A  A       A*A*AA    A*A*AA*A*AA*A*AA*A*AA*AA*A*AA*A*AA-|
         |*A           *A*AA*A                                             |
  420000 |-+                                                             +-|
         |       +        +       +       +       +        +       +       |
  400000 +-----------------------------------------------------------------+
         0       5        10      15      20      25       30      35      40

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/module/main.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 3644438ff96e..0ad26455def2 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2787,7 +2787,11 @@ static int early_mod_check(struct load_info *info, int flags)
 	if (err)
 		return err;
 
-	return 0;
+	mutex_lock(&module_mutex);
+	err = module_patient_check_exists(info->mod->name);
+	mutex_unlock(&module_mutex);
+
+	return err;
 }
 
 /*
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFT 4/5] module: use list_add_tail_rcu() when adding module
  2023-03-19 21:49 [RFT 0/5] module: avoid userspace pressure on unwanted allocations Luis Chamberlain
                   ` (2 preceding siblings ...)
  2023-03-19 21:49 ` [RFT 3/5] module: avoid allocation if module is already present and ready Luis Chamberlain
@ 2023-03-19 21:49 ` Luis Chamberlain
  2023-03-19 21:49 ` [RFT 5/5] module: add a sanity check prior to allowing kernel module auto-loading Luis Chamberlain
  4 siblings, 0 replies; 6+ messages in thread
From: Luis Chamberlain @ 2023-03-19 21:49 UTC (permalink / raw)
  To: linux-modules, linux-kernel, pmladek, david, petr.pavlu, prarit
  Cc: christophe.leroy, song, mcgrof

Put a new module at the end of the list intead of making
new modules at the top of the list. find_module_all() start
the hunt using the first entry on the list, if we assume
that the modules which are first loaded are the most
frequently looked for modules this should provide a tiny
optimization.

With a 8vcpu 8 GiB RAM kvm guest with kdevops with kdevops
this saves about 4 MiB of RAM max use during the kmod tests
0008 which loops and runs loading a module in a loop through
kthreads while not affecting the time to test.

There could be some minor savings for systems that have repeated
module requests for subsystems that are not yet optimized (ACPI
frequency drivers come to mind) so in theory this could help there
but that is just pure speculation.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/module/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 0ad26455def2..32955b7819b3 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2682,7 +2682,7 @@ static int add_unformed_module(struct module *mod)
 		goto out;
 
 	mod_update_bounds(mod);
-	list_add_rcu(&mod->list, &modules);
+	list_add_tail_rcu(&mod->list, &modules);
 	mod_tree_insert(mod);
 	err = 0;
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFT 5/5] module: add a sanity check prior to allowing kernel module auto-loading
  2023-03-19 21:49 [RFT 0/5] module: avoid userspace pressure on unwanted allocations Luis Chamberlain
                   ` (3 preceding siblings ...)
  2023-03-19 21:49 ` [RFT 4/5] module: use list_add_tail_rcu() when adding module Luis Chamberlain
@ 2023-03-19 21:49 ` Luis Chamberlain
  4 siblings, 0 replies; 6+ messages in thread
From: Luis Chamberlain @ 2023-03-19 21:49 UTC (permalink / raw)
  To: linux-modules, linux-kernel, pmladek, david, petr.pavlu, prarit
  Cc: christophe.leroy, song, mcgrof

request_module() is what we use for kernel-module autoloading. But
this today is pretty stupid, it just requests for the module without
even checking if its needed. And so we bounce to userspace and hope
for the best.

We can instead do a simple check to see if the module is already
present. This will however contend the module_mutex, but it should
never be heavily contended. However, module auto-loading is a special
use case in the kernel and kernel/kmod already limits the amount of
concurrent requests from the kernel to 50 so we know the kernel can
only contend on the module_mutex for querying if a module exists 50
times at any single point in time.

This work is only valuable if it proves useful for booting up large
systems where a lot of current kernel heuristics use many kernel
module auto-loading features and they could benefit from this. There
are no metrics yet to show, but at least this doesn't penalize much
existing systems.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/module/internal.h |  1 +
 kernel/module/kmod.c     |  7 +++++++
 kernel/module/main.c     | 18 ++++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index 6ae29bb8836f..cb00555780bd 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -86,6 +86,7 @@ struct find_symbol_arg {
 	enum mod_license license;
 };
 
+bool module_autoload_proceed(const char *name);
 int mod_verify_sig(const void *mod, struct load_info *info);
 int try_to_force_load(struct module *mod, const char *reason);
 bool find_symbol(struct find_symbol_arg *fsa);
diff --git a/kernel/module/kmod.c b/kernel/module/kmod.c
index b717134ebe17..67efc6de902f 100644
--- a/kernel/module/kmod.c
+++ b/kernel/module/kmod.c
@@ -28,6 +28,8 @@
 
 #include <trace/events/module.h>
 
+#include "internal.h"
+
 /*
  * Assuming:
  *
@@ -167,8 +169,13 @@ int __request_module(bool wait, const char *fmt, ...)
 
 	trace_module_request(module_name, wait, _RET_IP_);
 
+	if (!module_autoload_proceed(module_name)) {
+		ret = 0;
+		goto out;
+	}
 	ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
 
+out:
 	atomic_inc(&kmod_concurrent_max);
 	wake_up(&kmod_wq);
 
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 32955b7819b3..2a84d747865a 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2665,6 +2665,24 @@ static int module_patient_check_exists(const char *name)
 		return -EBUSY;
 }
 
+/*
+ * We allow contention of the module_mutex here because kernel module
+ * auto-loading a special feature *and* because we are only allowing
+ * MAX_KMOD_CONCURRENT possible checks at a time for a module.
+ */
+bool module_autoload_proceed(const char *name)
+{
+	int err;
+
+	mutex_lock(&module_mutex);
+	err = module_patient_check_exists(name);
+	mutex_unlock(&module_mutex);
+
+	if (err == -EEXIST)
+		return false;
+	return true;
+}
+
 /*
  * We try to place it in the list now to make sure it's unique before
  * we dedicate too many resources.  In particular, temporary percpu
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-03-19 21:49 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-19 21:49 [RFT 0/5] module: avoid userspace pressure on unwanted allocations Luis Chamberlain
2023-03-19 21:49 ` [RFT 1/5] module: move finished_loading() Luis Chamberlain
2023-03-19 21:49 ` [RFT 2/5] module: extract patient module check into helper Luis Chamberlain
2023-03-19 21:49 ` [RFT 3/5] module: avoid allocation if module is already present and ready Luis Chamberlain
2023-03-19 21:49 ` [RFT 4/5] module: use list_add_tail_rcu() when adding module Luis Chamberlain
2023-03-19 21:49 ` [RFT 5/5] module: add a sanity check prior to allowing kernel module auto-loading Luis Chamberlain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).