patches.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Johan Hovold <johan@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>,
	Lucas De Marchi <lucas.demarchi@intel.com>,
	 Petr Pavlu <petr.pavlu@suse.com>,
	gregkh@linuxfoundation.org, rafael@kernel.org,  song@kernel.org,
	lucas.de.marchi@gmail.com, christophe.leroy@csgroup.eu,
	 peterz@infradead.org, rppt@kernel.org, dave@stgolabs.net,
	willy@infradead.org,  vbabka@suse.cz, mhocko@suse.com,
	dave.hansen@linux.intel.com,  colin.i.king@gmail.com,
	jim.cromie@gmail.com, catalin.marinas@arm.com,
	 jbaron@akamai.com, rick.p.edgecombe@intel.com,
	yujie.liu@intel.com,  david@redhat.com, tglx@linutronix.de,
	hch@lst.de, patches@lists.linux.dev,
	 linux-modules@vger.kernel.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org, pmladek@suse.com,
	prarit@redhat.com,  lennart@poettering.net
Subject: Re: [PATCH 2/2] module: add support to avoid duplicates early on load
Date: Mon, 29 May 2023 21:55:15 -0400	[thread overview]
Message-ID: <CAHk-=wg7ihygotpO9x5a6QJO5oAom9o91==L_Kx-gUHvRYuXiQ@mail.gmail.com> (raw)
In-Reply-To: <ZHTCK2_1pF61yWIr@hovoldconsulting.com>

[-- Attachment #1: Type: text/plain, Size: 2044 bytes --]

On Mon, May 29, 2023 at 11:18 AM Johan Hovold <johan@kernel.org> wrote:
>
> I took a closer look at some of the modules that failed to load and
> noticed a pattern in that they have dependencies that are needed by more
> than one device.

Ok, this is a "maybe something like this" RFC series of two patches -
one trivial one to re-organize things a bit so that we can then do the
real one which uses a filter based on the inode pointer to return an
"idempotent return value" for module loads that share the same inode.

It's entirely untested, and since I'm on the road I'm going to not
really be able to test it. It compiles for me, and the code looks
fairly straightforward, but it's probably buggy.

It's very loosely based on Luis' attempt,  but it
 (a) is internal to module loading
 (b) uses a reliable cookie
 (c) doesn't leave the cookie around randomly for later
 (d) has seen absolutely no testing

Put another way: if somebody wants to play with this, please treat it
as a starting point, not the final thing. You might need to debug
things, and fix silly mistakes.

The idea is to just have a simple hash list of currently executing
module loads, protected by a trivial spinlock. Every module loader
adds itself to the right hash list, and if they were the *first* one
(ie no other pending module loads for that inode), will actually do
the module load.

Everybody who *isn't* the first one will just wait for completion and
return the same error code that the first one returned.

This is technically bogus. The first one might fail due to arguments.
So the cookie shouldn't be just the inode, it should be the inode and
a hash of the arguments or something like that. But it is what it is,
and apart from possible show-stopper bugs this is no worse than the
failed "exclusive write deny" attempt. IOW - maybe worth trying?

And if *that* didn't sell people on this patch series, I don't know
what will. I should be in marketing! Two drink minimums, here I come!

               Linus

[-- Attachment #2: 0001-module-split-up-finit_module-into-init_module_from_f.patch --]
[-- Type: text/x-patch, Size: 2888 bytes --]

From b7d19af1b2a3ca9f789df9c04147576fca7c5b8f Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 29 May 2023 20:55:13 -0400
Subject: [PATCH 1/2] module: split up 'finit_module()' into
 init_module_from_file() helper

This will simplify the next step, where we can then key off the inode to
do one idempotent module load.

Let's do the obvious re-organization in one step, and then the new code
in another.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/module/main.c | 42 +++++++++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 044aa2c9e3cb..427bffa2844f 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3057,26 +3057,16 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
 	return load_module(&info, uargs, 0);
 }
 
-SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
+static int init_module_from_file(struct file *f, const char __user * uargs, int flags)
 {
 	struct load_info info = { };
 	void *buf = NULL;
 	int len;
-	int err;
 
-	err = may_init_module();
-	if (err)
-		return err;
+	if (!f || !(f->f_mode & FMODE_READ))
+		return -EBADF;
 
-	pr_debug("finit_module: fd=%d, uargs=%p, flags=%i\n", fd, uargs, flags);
-
-	if (flags & ~(MODULE_INIT_IGNORE_MODVERSIONS
-		      |MODULE_INIT_IGNORE_VERMAGIC
-		      |MODULE_INIT_COMPRESSED_FILE))
-		return -EINVAL;
-
-	len = kernel_read_file_from_fd(fd, 0, &buf, INT_MAX, NULL,
-				       READING_MODULE);
+	len = kernel_read_file(f, 0, &buf, INT_MAX, NULL, READING_MODULE);
 	if (len < 0) {
 		mod_stat_inc(&failed_kreads);
 		mod_stat_add_long(len, &invalid_kread_bytes);
@@ -3084,7 +3074,7 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 	}
 
 	if (flags & MODULE_INIT_COMPRESSED_FILE) {
-		err = module_decompress(&info, buf, len);
+		int err = module_decompress(&info, buf, len);
 		vfree(buf); /* compressed data is no longer needed */
 		if (err) {
 			mod_stat_inc(&failed_decompress);
@@ -3099,6 +3089,28 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
 	return load_module(&info, uargs, flags);
 }
 
+SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
+{
+	int err;
+	struct fd f;
+
+	err = may_init_module();
+	if (err)
+		return err;
+
+	pr_debug("finit_module: fd=%d, uargs=%p, flags=%i\n", fd, uargs, flags);
+
+	if (flags & ~(MODULE_INIT_IGNORE_MODVERSIONS
+		      |MODULE_INIT_IGNORE_VERMAGIC
+		      |MODULE_INIT_COMPRESSED_FILE))
+		return -EINVAL;
+
+	f = fdget(fd);
+	err = init_module_from_file(f.file, uargs, flags);
+	fdput(f);
+	return err;
+}
+
 /* Keep in sync with MODULE_FLAGS_BUF_SIZE !!! */
 char *module_flags(struct module *mod, char *buf, bool show_state)
 {
-- 
2.40.0.rc1.2.gd15644fe02


[-- Attachment #3: 0002-modules-catch-concurrent-module-loads-take-two.patch --]
[-- Type: text/x-patch, Size: 3300 bytes --]

From c844099ab4d3032424b4bf8720e761fbccf88ea5 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 29 May 2023 21:39:51 -0400
Subject: [PATCH 2/2] modules: catch concurrent module loads, take two

This treats concurrent module loads from a file as "idempotent" in the
inode, ie if one module load is ongoing, we don't start a new one, but
instead just expect the first one to complete and return the same return
value as it did.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/module/main.c | 73 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 71 insertions(+), 2 deletions(-)

diff --git a/kernel/module/main.c b/kernel/module/main.c
index 427bffa2844f..82b0dcc1fe77 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3057,15 +3057,82 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
 	return load_module(&info, uargs, 0);
 }
 
+struct idempotent {
+	const void *cookie;
+	struct hlist_node entry;
+	struct completion complete;
+	int ret;
+};
+
+#define IDEM_HASH_BITS 8
+static struct hlist_head idem_hash[1 << IDEM_HASH_BITS];
+static struct spinlock idem_lock;
+
+static bool idempotent(struct idempotent *u, const void *cookie)
+{
+	int hash = hash_ptr(cookie, IDEM_HASH_BITS);
+	struct hlist_head *head = idem_hash + hash;
+	struct idempotent *existing;
+	bool first;
+
+	u->ret = 0;
+	u->cookie = cookie;
+	init_completion(&u->complete);
+
+	spin_lock(&idem_lock);
+	first = true;
+	hlist_for_each_entry(existing, head, entry) {
+		if (existing->cookie != cookie)
+			continue;
+		first = false;
+		break;
+	}
+	hlist_add_head(&u->entry, idem_hash+hash);
+	spin_unlock(&idem_lock);
+
+	return !first;
+}
+
+/*
+ * We were the first one with 'cookie' on the list, and we ended
+ * up completing the operation. We now need to walk the list,
+ * remove everybody - which includes ourselfs - fill in the return
+ * value, and then complete the operation.
+ */
+static void idempotent_complete(struct idempotent *u, int ret)
+{
+	const void *cookie = u->cookie;
+	int hash = hash_ptr(cookie, IDEM_HASH_BITS);
+	struct hlist_head *head = idem_hash + hash;
+	struct hlist_node *next;
+	struct idempotent *pos;
+
+	spin_lock(&idem_lock);
+	hlist_for_each_entry_safe(pos, next, head, entry) {
+		if (pos->cookie != cookie)
+			continue;
+		hlist_del(&pos->entry);
+		pos->ret = ret;
+		complete(&pos->complete);
+	}
+	spin_unlock(&idem_lock);
+}
+
 static int init_module_from_file(struct file *f, const char __user * uargs, int flags)
 {
+	struct idempotent idem;
 	struct load_info info = { };
 	void *buf = NULL;
-	int len;
+	int len, ret;
 
 	if (!f || !(f->f_mode & FMODE_READ))
 		return -EBADF;
 
+	if (idempotent(&idem, file_inode(f))) {
+		wait_for_completion(&idem.complete);
+		return idem.ret;
+	}
+
 	len = kernel_read_file(f, 0, &buf, INT_MAX, NULL, READING_MODULE);
 	if (len < 0) {
 		mod_stat_inc(&failed_kreads);
@@ -3086,7 +3153,9 @@ static int init_module_from_file(struct file *f, const char __user * uargs, int
 		info.len = len;
 	}
 
-	return load_module(&info, uargs, flags);
+	ret = load_module(&info, uargs, flags);
+	idempotent_complete(&idem, ret);
+	return ret;
 }
 
 SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
-- 
2.40.0.rc1.2.gd15644fe02


  reply	other threads:[~2023-05-30  1:55 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-24 21:36 [PATCH 0/2] module: avoid all memory pressure due to duplicates Luis Chamberlain
2023-05-24 21:36 ` [PATCH 1/2] fs/kernel_read_file: add support for duplicate detection Luis Chamberlain
2023-05-24 21:52   ` Linus Torvalds
2023-05-24 21:56     ` Linus Torvalds
2023-05-24 22:07       ` Luis Chamberlain
2023-05-25  4:00     ` Linus Torvalds
2023-05-25 18:08       ` Luis Chamberlain
2023-05-25 18:35         ` Luis Chamberlain
2023-05-25 18:50         ` Linus Torvalds
2023-05-25 19:32           ` Luis Chamberlain
2023-05-25  7:01     ` Christian Brauner
2023-05-24 21:36 ` [PATCH 2/2] module: add support to avoid duplicates early on load Luis Chamberlain
2023-05-25 11:40   ` Petr Pavlu
2023-05-25 16:07     ` Linus Torvalds
2023-05-25 16:42       ` Greg KH
2023-05-25 18:22         ` Luis Chamberlain
2023-05-25 17:52       ` Linus Torvalds
2023-05-25 18:45       ` Lucas De Marchi
2023-05-25 21:12         ` Linus Torvalds
2023-05-25 22:02           ` Luis Chamberlain
2023-05-26  1:39             ` Linus Torvalds
2023-05-29  8:58               ` Johan Hovold
2023-05-29 11:00                 ` Linus Torvalds
2023-05-29 12:44                   ` Johan Hovold
2023-05-29 15:18                     ` Johan Hovold
2023-05-30  1:55                       ` Linus Torvalds [this message]
2023-05-30  9:40                         ` Johan Hovold
2023-06-05 12:25                           ` Johan Hovold
2023-05-30 16:22                         ` Luis Chamberlain
2023-05-30 17:16                           ` Lucas De Marchi
2023-05-30 19:41                             ` Luis Chamberlain
2023-05-30 22:17                               ` Linus Torvalds
2023-05-31  5:30                                 ` Lucas De Marchi
2023-05-31  0:31                           ` Luis Chamberlain
2023-05-31  7:51                           ` David Hildenbrand
2023-05-31 16:57                             ` Luis Chamberlain
2023-06-02 15:19                               ` David Hildenbrand
2023-06-02 16:04                                 ` Luis Chamberlain
2023-06-05 11:26                                   ` David Hildenbrand
2023-06-05 15:17                                     ` Luis Chamberlain
2023-06-05 15:28                                       ` Luis Chamberlain
2023-06-28 18:52                                         ` Luis Chamberlain
2023-06-28 20:14                                           ` Linus Torvalds
2023-06-28 22:07                                             ` Linus Torvalds
2023-06-28 23:17                                               ` Linus Torvalds
2023-06-29  0:18                                                 ` Luis Chamberlain
2023-06-02 16:06                                 ` Linus Torvalds
2023-06-02 16:37                                   ` David Hildenbrand
2023-05-30 22:45                         ` Dan Williams
2023-06-04 14:26                         ` Rudi Heitbaum
2023-05-29 17:47                     ` Linus Torvalds
2023-05-30 10:01                       ` Johan Hovold
2023-05-25 16:54     ` Lucas De Marchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wg7ihygotpO9x5a6QJO5oAom9o91==L_Kx-gUHvRYuXiQ@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=catalin.marinas@arm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=colin.i.king@gmail.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@lst.de \
    --cc=jbaron@akamai.com \
    --cc=jim.cromie@gmail.com \
    --cc=johan@kernel.org \
    --cc=lennart@poettering.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-modules@vger.kernel.org \
    --cc=lucas.de.marchi@gmail.com \
    --cc=lucas.demarchi@intel.com \
    --cc=mcgrof@kernel.org \
    --cc=mhocko@suse.com \
    --cc=patches@lists.linux.dev \
    --cc=peterz@infradead.org \
    --cc=petr.pavlu@suse.com \
    --cc=pmladek@suse.com \
    --cc=prarit@redhat.com \
    --cc=rafael@kernel.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rppt@kernel.org \
    --cc=song@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yujie.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).