linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: David Miller <davem@davemloft.net>
Cc: Andy Lutomirski <luto@amacapital.net>,
	Alexei Starovoitov <ast@fb.com>,
	Kees Cook <keescook@chromium.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Djalal Harouni <tixxdz@gmail.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Luis R. Rodriguez" <mcgrof@kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	kernel-team <kernel-team@fb.com>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
Date: Fri, 9 Mar 2018 11:38:25 -0800	[thread overview]
Message-ID: <CA+55aFx-kDp87_uv9TW5pYD2+nr8c65=ervvwcxhTBJVg7c_pQ@mail.gmail.com> (raw)
In-Reply-To: <CA+55aFyN8JDh76bm1VinBvrgMH5jNL6RbuFAkrH8vuYU1s9GDQ@mail.gmail.com>

On Fri, Mar 9, 2018 at 11:12 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> How are you going to handle five processes doing the same setup concurrently?

Side note: it's not just serialization. It's also "is it actually up
and running".

The rule for "request_module()" (for a real module) has been that it
returns when the module is actually alive and active and have done
their initcalls.

The UMH_WAIT_EXEC behavior (ignore the serialization - you could do
that in the caller) behavior doesn't actually have any semantics AT
ALL. It only means that you get the error returns from execve()
itself, so you know that the executable file actually existed and
parsed right enough to get started.

But you don't actually have any reason to believe that it has *done*
anything, and started processing any requests. There's no reason
what-so-ever to believe that it has registered itself for any
asynchronous requests or anything like that.

So in the real module case, you can do

    request_module("modulename");

and just start using whatever resource you just requested. So the
netfilter code literally does

                request_module("nft-chain-%u-%.*s", family,
                               nla_len(nla), (const char *)nla_data(nla));
                nfnl_lock(NFNL_SUBSYS_NFTABLES);
                type = __nf_tables_chain_type_lookup(nla, family);
                if (type != NULL)
                        return ERR_PTR(-EAGAIN);

and doesn't even care about error handling for request_module()
itself, because it knows that either the module got loaded and is
ready, or something failed. And it needs to look that chain type up
anyway, so the failure is indicated by _that_.

With a UMH_WAIT_EXEC? No. You have *nothing*. You know the thing
started, but it might have SIGSEGV'd immediately, and you have
absolutely no way of knowing, and absolutely no way of even figuring
it out. You can wait - forever - for something to bind to whatever
dynamic resource you're expecting. You'll just fundamentally never
know.

You can try again, of course. Add a timeout, and try again in five
seconds or something. Maybe it will work then. Maybe it won't. You
won't have any way to know the _second_ time around either. Or the
third. Or...

See why I say it has to be synchronous?

If it's synchronous, you can actually do things like

 (a) maybe you only need a one-time thing, and don't have any state
("load fixed tables, be done") and that's it. If the process returns
with no error code, you're all done, and you know it's fine.

 (b) maybe the process wants to start a listener daemon or something
like the traditional inetd model. It can open the socket, it can start
listening on it, and it can fork off a child and check it's status. It
can then do exit(0) if everything is fine, and now request_module()
returns.

see the difference? Even if you ended up with a background process
(like in that (b) case), you did so with *error* handling, and you did
so knowing that the state has actually been set up by the time the
request_module() returns.

And if you use the proper module loading exclusion, it also means that
that (b) can know it's the only process starting up, and it's not
racing with another one.  It might still want to do the usual
lock-files in user space to protect against just the admin starting it
manually, but at least you don't have the situation that a hundred
threads just had a thundering herd where they all ended up using the
same kernel facility, and they all independently started a hundred
usermode helpers.

                  Linus

  reply	other threads:[~2018-03-09 19:38 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-06  1:34 [PATCH net-next] modules: allow modprobe load regular elf binaries Alexei Starovoitov
2018-03-06  2:13 ` Randy Dunlap
2018-03-06  3:02   ` Alexei Starovoitov
2018-03-06 11:05 ` Greg KH
2018-03-07  1:07   ` Alexei Starovoitov
2018-03-07  3:24     ` Greg KH
2018-03-06 19:12 ` Linus Torvalds
2018-03-06 23:42   ` Chris Mason
2018-05-02  9:12     ` Jesper Dangaard Brouer
2018-03-06 20:01 ` Andy Lutomirski
2018-03-06 20:26   ` Linus Torvalds
2018-03-07 17:22 ` David Miller
2018-03-08  1:23 ` Luis R. Rodriguez
2018-03-08 23:07   ` Alexei Starovoitov
2018-03-09  1:58     ` Luis R. Rodriguez
2018-03-09  0:24 ` Kees Cook
2018-03-09  0:57   ` Alexei Starovoitov
2018-03-09  1:04     ` Andy Lutomirski
2018-03-09  1:25       ` Alexei Starovoitov
2018-03-09  1:24     ` Kees Cook
2018-03-09  0:59   ` Andy Lutomirski
2018-03-09  1:20     ` Alexei Starovoitov
2018-03-09  2:12       ` Andy Lutomirski
2018-03-09  2:31         ` David Miller
2018-03-09  3:10           ` Andy Lutomirski
2018-03-09  3:27         ` Alexei Starovoitov
2018-03-09  1:38     ` Linus Torvalds
2018-03-09  1:44       ` Kees Cook
2018-03-09  3:06         ` Linus Torvalds
2018-03-09  3:17           ` Linus Torvalds
2018-03-09  3:54           ` Andy Lutomirski
2018-03-09  5:08             ` Alexei Starovoitov
2018-03-09 15:16               ` Andy Lutomirski
2018-03-09 15:39                 ` Alexei Starovoitov
2018-03-09 16:24                   ` Andy Lutomirski
2018-03-09 17:32                     ` Alexei Starovoitov
2018-03-09 18:15                       ` Greg KH
2018-03-09 18:23                         ` Andy Lutomirski
2018-03-09 18:29                           ` Greg KH
2018-03-09 18:50                           ` Alexei Starovoitov
2018-03-09 18:55                             ` David Miller
2018-03-09 19:37                               ` Andy Lutomirski
2018-03-10  1:43                                 ` Alexei Starovoitov
2018-03-11  2:17                                   ` Andy Lutomirski
2018-03-09 18:17               ` Linus Torvalds
2018-03-09 18:35                 ` David Miller
2018-03-09 18:43                   ` Kees Cook
2018-03-09 18:50                     ` Linus Torvalds
2018-03-09 18:54                       ` Kees Cook
2018-03-09 18:58                       ` Alexei Starovoitov
2018-03-12 12:02                         ` Edward Cree
2018-03-12 17:49                           ` Alexei Starovoitov
2018-03-09 18:48                 ` Andy Lutomirski
2018-03-09 18:53                   ` Linus Torvalds
2018-03-09 18:57                     ` David Miller
2018-03-09 19:12                       ` Linus Torvalds
2018-03-09 19:38                         ` Linus Torvalds [this message]
2018-03-09 19:45                           ` Andy Lutomirski
2018-03-10  2:34                           ` Alexei Starovoitov
2018-03-10 14:08                             ` Luis R. Rodriguez
2018-03-10 15:16                               ` Luis R. Rodriguez
2018-03-10 15:34                                 ` Luis R. Rodriguez
2018-03-12 17:22                                   ` Alexei Starovoitov
2018-03-13  8:48                                     ` Greg Kroah-Hartman
2018-03-22 20:54                                 ` Luis R. Rodriguez
2018-03-22 22:15                                   ` Andy Lutomirski
2018-03-22 22:21                                     ` Alexei Starovoitov
2018-03-23  2:47                                     ` Luis R. Rodriguez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+55aFx-kDp87_uv9TW5pYD2+nr8c65=ervvwcxhTBJVg7c_pQ@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=ast@fb.com \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=keescook@chromium.org \
    --cc=kernel-team@fb.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mcgrof@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tixxdz@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).