From: Alexey Gladkov <legion@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
"Eric W . Biederman" <ebiederm@xmission.com>,
Andrew Morton <akpm@linux-foundation.org>,
Christian Brauner <brauner@kernel.org>,
Iurii Zaikin <yzaikin@google.com>,
Kees Cook <keescook@chromium.org>,
Linux Containers <containers@lists.linux.dev>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Luis Chamberlain <mcgrof@kernel.org>,
Vasily Averin <vvs@virtuozzo.com>
Subject: Re: [RFC PATCH 2/4] sysctl: ipc: Do not use dynamic memory
Date: Wed, 1 Jun 2022 20:24:55 +0200
Message-ID: <Ypeu97GDg6mNiKQ8@example.org>
In-Reply-To: <CAHk-=wjJ2CP0ugbOnwAd-=Cw0i-q_xC1PbJ-_1jrvR-aisiAAA@mail.gmail.com>

On Wed, Jun 01, 2022 at 09:45:15AM -0700, Linus Torvalds wrote:
> On Wed, Jun 1, 2022 at 6:20 AM Alexey Gladkov <legion@kernel.org> wrote:
> >
> > Dynamic memory allocation is needed to modify .data and specify the per
> > namespace parameter. The new sysctl API is allowed to get rid of the
> > need for such modification.
>
> Ok, this is looking better. That said, a few comments:
>
> >
> > diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
> > index ef313ecfb53a..833b670c38f3 100644
> > --- a/ipc/ipc_sysctl.c
> > +++ b/ipc/ipc_sysctl.c
> > @@ -68,26 +68,94 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
> > return ret;
> > }
> >
> > +static inline void *data_from_ns(struct ctl_context *ctx, struct ctl_table *table);
> > +
> > +static int ipc_sys_open(struct ctl_context *ctx, struct inode *inode, struct file *file)
> > +{
> > + struct ipc_namespace *ns = current->nsproxy->ipc_ns;
> > +
> > + // For now, we only allow changes in init_user_ns.
> > + if (ns->user_ns != &init_user_ns)
> > + return -EPERM;
> > +
> > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > + int index = (ctx->table - ipc_sysctls);
> > +
> > + switch (index) {
> > + case IPC_SYSCTL_SEM_NEXT_ID:
> > + case IPC_SYSCTL_MSG_NEXT_ID:
> [...]
>
> I don't think you actually even compile-tested this, because you're
> using these IPC_SYSCTL_SEM_NEXT_ID etc enums before you even declared
> them later in the same file.

I did compile-test it, but without CONFIG_CHECKPOINT_RESTORE enabled, so
this block was never built.

This is the place where I'm not sure who should be able to write to the
ipc sysctls inside an ipc_namespace.
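
For reference, a rough userspace sketch of the ordering that avoids this
kind of build break: the index enum is declared before both the table and
the function that switches on it, and the index is recovered from the
entry pointer by subtraction, as in the quoted hunk. All names here
(demo_sysctl, DEMO_*, demo_index, demo_needs_cr_check) are hypothetical
stand-ins and not the actual ipc_sysctls table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ctl_table; only the name is kept. */
struct demo_sysctl {
	const char *procname;
};

/*
 * Declared before the table and before any user, so the names are
 * visible no matter which config options are enabled.
 */
enum demo_sysctl_index {
	DEMO_SHMMAX,
	DEMO_SEM_NEXT_ID,
	DEMO_MSG_NEXT_ID,
	DEMO_SHM_NEXT_ID,
	DEMO_SYSCTL_COUNT,
};

/* Designated initializers tie each slot to its enum value. */
static const struct demo_sysctl demo_sysctls[DEMO_SYSCTL_COUNT] = {
	[DEMO_SHMMAX]      = { .procname = "shmmax" },
	[DEMO_SEM_NEXT_ID] = { .procname = "sem_next_id" },
	[DEMO_MSG_NEXT_ID] = { .procname = "msg_next_id" },
	[DEMO_SHM_NEXT_ID] = { .procname = "shm_next_id" },
};

/* Recover the table index of an entry by pointer subtraction. */
static int demo_index(const struct demo_sysctl *entry)
{
	ptrdiff_t idx = entry - demo_sysctls;

	assert(idx >= 0 && idx < DEMO_SYSCTL_COUNT);
	return (int)idx;
}

/* Returns 1 for entries that would need the extra checkpoint/restore check. */
static int demo_needs_cr_check(const struct demo_sysctl *entry)
{
	switch (demo_index(entry)) {
	case DEMO_SEM_NEXT_ID:
	case DEMO_MSG_NEXT_ID:
	case DEMO_SHM_NEXT_ID:
		return 1;
	default:
		return 0;
	}
}
```

The pointer subtraction is only valid for entries that live inside the
table itself, which is why the sketch asserts on the range.
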
> > +static ssize_t ipc_sys_read(struct ctl_context *ctx, struct file *file,
> > + char *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > + struct ctl_table table = *ctx->table;
> > + table.data = data_from_ns(ctx, ctx->table);
> > + return table.proc_handler(&table, 0, buffer, lenp, ppos);
> > +}
>
> Can we please fix the names and the types of this new 'ctx' structure?
>
> Yes, yes, I know the old legacy "sysctl table" is horribly named, and
> uses "ctl_table".

Sure.

> But let's just write it out. It's not some random control table for
> anything. It's a very odd and specific thing: "sysctl". Let's use the
> full name.
>
> Also, Please just make that "ctl_data" member in that "ctl_context"
> struct not just have a real name, but a real type. Make it literally
> be
>
> struct ipc_namespace *ipc_ns;
>
> and if we end up with other things wanting other pointers, just add a
> new one (or make a union if we care about the size of that allocation,
> which I don't see any reason we'd do when it's literally just like a
> couple of pointers in size).
>
> There is no reason to have some pseudo-generic "void *ctl_data" that
> makes it ambiguous and allows for type confusion and isn't
> self-documenting. I'd rather have a properly typed pointer that is
> just initialized to NULL and is not always used or needed, but always
> has a clear case for *what* it would be used for.
>
> Yes, yes, we have f_private etc for things that are really very very
> generic and have arbitrary users. But 'sysctl' is not that kind of
> truly generic use.

Yep. I modeled ctl_data on f_private. My idea was that anyone who needs to
store more than one pointer could put a struct there. But it turns out
that, at least for now, nothing other than ipc_namespace is needed.
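
A rough userspace sketch of the typed member suggested above, for
illustration only. The struct and function names (demo_sysctl_context,
demo_ipc_namespace, demo_ctx_open, demo_ctx_read_msgmax) are hypothetical
and do not come from the patch; the point is only that the pointer has a
real type, starts out NULL, and is checked before use.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ipc_namespace; one field is enough. */
struct demo_ipc_namespace {
	int msg_ctlmax;
};

/*
 * Hypothetical context: a typed pointer instead of "void *ctl_data".
 * It stays NULL for users that have no ipc namespace to carry.
 */
struct demo_sysctl_context {
	struct demo_ipc_namespace *ipc_ns;
};

/* Open-time hook: remember which namespace this open refers to. */
static int demo_ctx_open(struct demo_sysctl_context *ctx,
			 struct demo_ipc_namespace *ns)
{
	ctx->ipc_ns = ns;
	return 0;
}

/* Read-time hook: no cast needed, the compiler checks the type. */
static int demo_ctx_read_msgmax(const struct demo_sysctl_context *ctx, int *val)
{
	if (ctx->ipc_ns == NULL)
		return -1;	/* context was never bound to a namespace */

	*val = ctx->ipc_ns->msg_ctlmax;
	return 0;
}
```

If another subsystem later needs its own pointer, it would get its own
typed member next to ipc_ns rather than reusing this one.
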
> I wish we didn't have that silly "create a temporary ctl_table entry"
> either, and I wish it was properly named. But it's not worth the
> pointless churn to fix old bad interfaces. But the new ones should
> have better names, and try to avoid those bad old decisions.

Currently, a temporary ctl_table is the main strategy for handling sysctl
entries.

Perhaps it will be possible to get rid of it if we add another callback,
get_data(), that returns what is currently stored in .data in ctl_table.
I mean, make the lookup of .data dynamic.
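
To make the get_data() idea a bit more concrete, here is a minimal
userspace sketch. It assumes a hypothetical per-entry callback that
resolves the storage for a given namespace at access time, so the shared
table stays const and no temporary copy is needed. None of these names
(demo_ns, demo_entry, demo_read_int, demo_write_int) exist in the kernel,
and the real callback signature may well end up different.

```c
#include <stddef.h>

/* Hypothetical per-namespace storage. */
struct demo_ns {
	int shmmni;
	int msgmni;
};

/*
 * Hypothetical table entry: instead of a fixed .data pointer it carries
 * a callback that returns the storage for the namespace in use.
 */
struct demo_entry {
	const char *procname;
	void *(*get_data)(struct demo_ns *ns);
};

static void *demo_get_shmmni(struct demo_ns *ns) { return &ns->shmmni; }
static void *demo_get_msgmni(struct demo_ns *ns) { return &ns->msgmni; }

/* One shared, read-only table for all namespaces. */
static const struct demo_entry demo_table[] = {
	{ "shmmni", demo_get_shmmni },
	{ "msgmni", demo_get_msgmni },
};

/* Read handler: resolves the data pointer on every call. */
static int demo_read_int(const struct demo_entry *e, struct demo_ns *ns)
{
	return *(int *)e->get_data(ns);
}

/* Write handler: same lookup, the table itself is never modified. */
static void demo_write_int(const struct demo_entry *e, struct demo_ns *ns,
			   int val)
{
	*(int *)e->get_data(ns) = val;
}
```

The callback still returns void *, like .data does today, so this removes
the table copy but not the untyped pointer.
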
> But yeah, I think this all is a step in the right direction. And maybe
> some of those cases and old 'ctl_table' things can be migrated to just
> using individual read() functions entirely. The whole 'ctl_table'
> model was broken, and came from the bad old days with an actual
> 'sysctl()' system call.

I'm not sure how to get rid of ctl_table, since the net sysctls depend
heavily on it.

I was wondering whether it's possible to get rid of ctl_table, but if we
can't rewrite everything to some kind of new magic API, then maintaining
two of them would be a nightmare.

Another problem is that ctl_table is used by
__cgroup_bpf_run_filter_sysctl:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/cgroup.c#n1392

> Because I think it would be lovely if people would move away from the
> 'sysctl table' approach entirely for cases where that makes sense, and
> these guys that already need special handling are very much in that
> situation.

Since you think these patches are a step in the right direction, I will
prepare the first version with your comments in mind.

--
Rgrds, legion