linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Luis Chamberlain <mcgrof@kernel.org>
To: david@redhat.com, tglx@linutronix.de, hch@lst.de,
	patches@lists.linux.dev, linux-modules@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	pmladek@suse.com, petr.pavlu@suse.com, prarit@redhat.com,
	torvalds@linux-foundation.org, lennart@poettering.net
Cc: gregkh@linuxfoundation.org, rafael@kernel.org, song@kernel.org,
	lucas.de.marchi@gmail.com, lucas.demarchi@intel.com,
	christophe.leroy@csgroup.eu, peterz@infradead.org,
	rppt@kernel.org, dave@stgolabs.net, willy@infradead.org,
	vbabka@suse.cz, mhocko@suse.com, dave.hansen@linux.intel.com,
	colin.i.king@gmail.com, jim.cromie@gmail.com,
	catalin.marinas@arm.com, jbaron@akamai.com,
	rick.p.edgecombe@intel.com, yujie.liu@intel.com,
	mcgrof@kernel.org
Subject: [PATCH 2/2] module: add support to avoid duplicates early on load
Date: Wed, 24 May 2023 14:36:20 -0700	[thread overview]
Message-ID: <20230524213620.3509138-3-mcgrof@kernel.org> (raw)
In-Reply-To: <20230524213620.3509138-1-mcgrof@kernel.org>

Add support to use the new kread_uniq_fd() to avoid duplicate kernel
reads on modules. At the cost of about ~945 bytes to your kernel size,
enabling this on a 255 CPU x86_64 qemu guest this saves about ~1.8 GiB
of memory during boot which would otherwise be free'd, and reduces boot
time by about ~11 seconds.

Userspace loads modules through finit_module(), this in turn will
use vmalloc space up to 3 times:

  a) The kernel_read_file() call
  b) Optional module decompression
  c) Our final copy of the module

Commit 064f4536d139 ("module: avoid allocation if module is already
present and ready") shows a graph of the amount of vmalloc space
observed allocated but freed for duplicate module request which end
up in the trash bin. Since there is a linear relationship with the
number of CPUs eventually this will bite us and you end up not being
able to boot. That commit put a stop gap for c) but to avoid the
vmalloc() space wasted on a) and b) we need to detect duplicates
earlier.

We could just have userspace fix this, but as reviewed at LSFMM 2023
this year in Vancouver, fixing this in userspace can be complex and we
also can't know when userpace is fixed. Fixing this in kernel turned
out to be easy with the inode and with a simple kconfig option we can
let users / distros decide if this full stop gap is worthy to enable.

With this enabled I'm now able to see 0 bytes wasted on vmalloc space
due to duplicates.

Before:
 # sudo systemd-analyze
 Startup finished in 41.653s (kernel) + 44.305s (userspace) = 1min 25.958s
 graphical.target reached after 44.178s in userspace.

 # grep "Virtual mem wasted bytes" /sys/kernel/debug/modules/stats
 Virtual mem wasted bytes       1949006968

So ~1.8 GiB... of vmalloc space wasted during boot.

After:

 # sudo systemd-analyze
 Startup finished in 29.883s (kernel) + 45.817s (userspace) = 1min 15.700s
 graphical.target reached after 45.682s in userspace.

 # grep "Virtual mem wasted bytes" /sys/kernel/debug/modules/stats
 Virtual mem wasted bytes       0

Suggested-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 include/linux/module.h   |  1 +
 kernel/module/Kconfig    | 20 ++++++++++++++++++++
 kernel/module/internal.h |  1 +
 kernel/module/main.c     | 19 ++++++++++++-------
 4 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 9e56763dff81..afc44df96278 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -557,6 +557,7 @@ struct module {
 	unsigned int printk_index_size;
 	struct pi_entry **printk_index_start;
 #endif
+	unsigned long i_ino;
 
 #ifdef CONFIG_MODULE_UNLOAD
 	/* What modules depend on me? */
diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
index 33a2e991f608..85a6c7c5ddc0 100644
--- a/kernel/module/Kconfig
+++ b/kernel/module/Kconfig
@@ -157,6 +157,26 @@ config MODULE_UNLOAD_TAINT_TRACKING
 	  page (see bad_page()), the aforementioned details are also
 	  shown. If unsure, say N.
 
+config MODULE_KREAD_UNIQ
+	bool "Avoid duplicate module kernel reads"
+	select KREAD_UNIQ
+	help
+	  Enable this option to avoid vmalloc() space for duplicate module
+	  requests early before we can even check for the module name. This
+	  is useful to avoid duplicate module requests which userspace or kernel
+	  can issue. On some architectures such as x86_64 there is only 128 MiB
+	  of virtual memory space and since in the worst case we can end up
+	  allocating up to 3 times the module size in vmalloc space, avoiding
+	  duplicates can save virtual memory on boot.
+
+	  Enabling this will incrase your kernel by about 945 bytes, but can
+	  save considerable memory during bootup which would otherwise be freed
+	  and this in turn can help speed up kernel boot time.
+
+	  Disable this if you have enabled CONFIG_MODULE_STATS and have
+	  verified you see no duplicates or virtual memory being freed on
+	  bootup.
+
 config MODVERSIONS
 	bool "Module versioning support"
 	help
diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index dc7b0160c480..7ea7f177f907 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -67,6 +67,7 @@ struct load_info {
 	unsigned int max_pages;
 	unsigned int used_pages;
 #endif
+	unsigned long i_ino;
 	struct {
 		unsigned int sym, str, mod, vers, info, pcpu;
 	} index;
diff --git a/kernel/module/main.c b/kernel/module/main.c
index ea7d0c7f3e60..e35900ee616a 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1283,6 +1283,7 @@ static void free_module(struct module *mod)
 	kfree(mod->args);
 	percpu_modfree(mod);
 
+	kread_uniq_fd_free(mod->i_ino);
 	free_mod_mem(mod);
 }
 
@@ -1964,12 +1965,14 @@ static int copy_module_from_user(const void __user *umod, unsigned long len,
 	return err;
 }
 
-static void free_copy(struct load_info *info, int flags)
+static void free_copy(struct load_info *info, int flags, bool error)
 {
 	if (flags & MODULE_INIT_COMPRESSED_FILE)
 		module_decompress_cleanup(info);
 	else
 		vfree(info->hdr);
+	if (error)
+		kread_uniq_fd_free(info->i_ino);
 }
 
 static int rewrite_section_headers(struct load_info *info, int flags)
@@ -2965,7 +2968,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	}
 
 	/* Get rid of temporary copy. */
-	free_copy(info, flags);
+	free_copy(info, flags, false);
 
 	/* Done! */
 	trace_module_load(mod);
@@ -3023,7 +3026,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	 */
 	if (!module_allocated)
 		mod_stat_bump_becoming(info, flags);
-	free_copy(info, flags);
+	free_copy(info, flags, true);
 	return err;
 }
 
@@ -3068,11 +3071,12 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 		      |MODULE_INIT_COMPRESSED_FILE))
 		return -EINVAL;
 
-	len = kernel_read_file_from_fd(fd, 0, &buf, INT_MAX, NULL,
-				       READING_MODULE);
+	len = kread_uniq_fd(fd, 0, &buf, &info.i_ino, INT_MAX, NULL, READING_MODULE);
 	if (len < 0) {
-		mod_stat_inc(&failed_kreads);
-		mod_stat_add_long(len, &invalid_kread_bytes);
+		if (len != -EBUSY) {
+			mod_stat_inc(&failed_kreads);
+			mod_stat_add_long(len, &invalid_kread_bytes);
+		}
 		return len;
 	}
 
@@ -3082,6 +3086,7 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 		if (err) {
 			mod_stat_inc(&failed_decompress);
 			mod_stat_add_long(len, &invalid_decompress_bytes);
+			kread_uniq_fd_free(info.i_ino);
 			return err;
 		}
 	} else {
-- 
2.39.2


  parent reply	other threads:[~2023-05-24 21:36 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-24 21:36 [PATCH 0/2] module: avoid all memory pressure due to duplicates Luis Chamberlain
2023-05-24 21:36 ` [PATCH 1/2] fs/kernel_read_file: add support for duplicate detection Luis Chamberlain
2023-05-24 21:52   ` Linus Torvalds
2023-05-24 21:56     ` Linus Torvalds
2023-05-24 22:07       ` Luis Chamberlain
2023-05-25  4:00     ` Linus Torvalds
2023-05-25 18:08       ` Luis Chamberlain
2023-05-25 18:35         ` Luis Chamberlain
2023-05-25 18:50         ` Linus Torvalds
2023-05-25 19:32           ` Luis Chamberlain
2023-05-25  7:01     ` Christian Brauner
2023-05-24 21:36 ` Luis Chamberlain [this message]
2023-05-25 11:40   ` [PATCH 2/2] module: add support to avoid duplicates early on load Petr Pavlu
2023-05-25 16:07     ` Linus Torvalds
2023-05-25 16:42       ` Greg KH
2023-05-25 18:22         ` Luis Chamberlain
2023-05-25 17:52       ` Linus Torvalds
2023-05-25 18:45       ` Lucas De Marchi
2023-05-25 21:12         ` Linus Torvalds
2023-05-25 22:02           ` Luis Chamberlain
2023-05-26  1:39             ` Linus Torvalds
2023-05-29  8:58               ` Johan Hovold
2023-05-29 11:00                 ` Linus Torvalds
2023-05-29 12:44                   ` Johan Hovold
2023-05-29 15:18                     ` Johan Hovold
2023-05-30  1:55                       ` Linus Torvalds
2023-05-30  9:40                         ` Johan Hovold
2023-06-05 12:25                           ` Johan Hovold
2023-05-30 16:22                         ` Luis Chamberlain
2023-05-30 17:16                           ` Lucas De Marchi
2023-05-30 19:41                             ` Luis Chamberlain
2023-05-30 22:17                               ` Linus Torvalds
2023-05-31  5:30                                 ` Lucas De Marchi
2023-05-31  0:31                           ` Luis Chamberlain
2023-05-31  7:51                           ` David Hildenbrand
2023-05-31 16:57                             ` Luis Chamberlain
2023-06-02 15:19                               ` David Hildenbrand
2023-06-02 16:04                                 ` Luis Chamberlain
2023-06-05 11:26                                   ` David Hildenbrand
2023-06-05 15:17                                     ` Luis Chamberlain
2023-06-05 15:28                                       ` Luis Chamberlain
2023-06-28 18:52                                         ` Luis Chamberlain
2023-06-28 20:14                                           ` Linus Torvalds
2023-06-28 22:07                                             ` Linus Torvalds
2023-06-28 23:17                                               ` Linus Torvalds
2023-06-29  0:18                                                 ` Luis Chamberlain
2023-06-02 16:06                                 ` Linus Torvalds
2023-06-02 16:37                                   ` David Hildenbrand
2023-05-30 22:45                         ` Dan Williams
2023-06-04 14:26                         ` Rudi Heitbaum
2023-05-29 17:47                     ` Linus Torvalds
2023-05-30 10:01                       ` Johan Hovold
2023-05-25 16:54     ` Lucas De Marchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230524213620.3509138-3-mcgrof@kernel.org \
    --to=mcgrof@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=colin.i.king@gmail.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@lst.de \
    --cc=jbaron@akamai.com \
    --cc=jim.cromie@gmail.com \
    --cc=lennart@poettering.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-modules@vger.kernel.org \
    --cc=lucas.de.marchi@gmail.com \
    --cc=lucas.demarchi@intel.com \
    --cc=mhocko@suse.com \
    --cc=patches@lists.linux.dev \
    --cc=peterz@infradead.org \
    --cc=petr.pavlu@suse.com \
    --cc=pmladek@suse.com \
    --cc=prarit@redhat.com \
    --cc=rafael@kernel.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rppt@kernel.org \
    --cc=song@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yujie.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).