From: Alexey Gladkov <legion@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	"Eric W . Biederman" <ebiederm@xmission.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christian Brauner <brauner@kernel.org>,
	Iurii Zaikin <yzaikin@google.com>,
	Kees Cook <keescook@chromium.org>,
	Linux Containers <containers@lists.linux.dev>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Vasily Averin <vvs@virtuozzo.com>
Subject: Re: [RFC PATCH 2/4] sysctl: ipc: Do not use dynamic memory
Date: Wed, 1 Jun 2022 20:24:55 +0200
Message-ID: <Ypeu97GDg6mNiKQ8@example.org>
In-Reply-To: <CAHk-=wjJ2CP0ugbOnwAd-=Cw0i-q_xC1PbJ-_1jrvR-aisiAAA@mail.gmail.com>

On Wed, Jun 01, 2022 at 09:45:15AM -0700, Linus Torvalds wrote:
> On Wed, Jun 1, 2022 at 6:20 AM Alexey Gladkov <legion@kernel.org> wrote:
> >
> > Dynamic memory allocation is needed to modify .data and specify the per
> > namespace parameter. The new sysctl API is allowed to get rid of the
> > need for such modification.
> 
> Ok, this is looking better. That said, a few comments:
> 
> >
> > diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
> > index ef313ecfb53a..833b670c38f3 100644
> > --- a/ipc/ipc_sysctl.c
> > +++ b/ipc/ipc_sysctl.c
> > @@ -68,26 +68,94 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
> >         return ret;
> >  }
> >
> > +static inline void *data_from_ns(struct ctl_context *ctx, struct ctl_table *table);
> > +
> > +static int ipc_sys_open(struct ctl_context *ctx, struct inode *inode, struct file *file)
> > +{
> > +       struct ipc_namespace *ns = current->nsproxy->ipc_ns;
> > +
> > +       // For now, we only allow changes in init_user_ns.
> > +       if (ns->user_ns != &init_user_ns)
> > +               return -EPERM;
> > +
> > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > +       int index = (ctx->table - ipc_sysctls);
> > +
> > +       switch (index) {
> > +               case IPC_SYSCTL_SEM_NEXT_ID:
> > +               case IPC_SYSCTL_MSG_NEXT_ID

> [...]
> 
> I don't think you actually even compile-tested this, because you're
> using these IPC_SYSCTL_SEM_NEXT_ID etc enums before you even declared
> them later in the same file.

I did compile-test it, but without CONFIG_CHECKPOINT_RESTORE enabled.
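
To illustrate why such a build passes: anything inside an #ifdef block is
dropped by the preprocessor when the option is off, so the compiler never
sees a use-before-declaration there. The sketch below is a userspace
stand-in, not the patch itself. DEMO_CHECKPOINT_RESTORE, the enum tag and
needs_cr_check() are hypothetical names; only the two *_NEXT_ID enumerators
come from the quoted diff. Declaring the enum above its first use is what
lets the guarded branch build in both configurations.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the enum that the patch declared too late.
 * Placing it before the first use lets the guarded block compile.
 */
enum demo_ipc_sysctl_index {
	IPC_SYSCTL_SEM_NEXT_ID,
	IPC_SYSCTL_MSG_NEXT_ID,
	IPC_SYSCTL_OTHER,
};

/* Returns 1 if the entry would need the extra checkpoint/restore check. */
static int needs_cr_check(int index)
{
#ifdef DEMO_CHECKPOINT_RESTORE
	/* Only compiled when the (hypothetical) option is defined. */
	switch (index) {
	case IPC_SYSCTL_SEM_NEXT_ID:
	case IPC_SYSCTL_MSG_NEXT_ID:
		return 1;
	default:
		return 0;
	}
#else
	/* With the option off, none of the code above is ever parsed. */
	(void)index;
	return 0;
#endif
}
```

Building this once with -DDEMO_CHECKPOINT_RESTORE and once without covers
both branches, which is the step that was missed here.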

This is the part where I'm not sure who should be allowed to write to ipc
sysctls inside an ipc_namespace.

> > +static ssize_t ipc_sys_read(struct ctl_context *ctx, struct file *file,
> > +                    char *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > +       struct ctl_table table = *ctx->table;
> > +       table.data = data_from_ns(ctx, ctx->table);
> > +       return table.proc_handler(&table, 0, buffer, lenp, ppos);
> > +}
> 
> Can we please fix the names and the types of this new 'ctx' structure?
> 
> Yes, yes, I know the old legacy "sysctl table" is horribly named, and
> uses "ctl_table".

Sure.

> But let's just write it out. It's not some random control table for
> anything. It's a very odd and specific thing: "sysctl". Let's use the
> full name.
> 
> Also, Please just make that "ctl_data" member in that "ctl_context"
> struct not just have a real name, but a real type. Make it literally
> be
> 
>     struct ipc_namespace *ipc_ns;
> 
> and if we end up with other things wanting other pointers, just add a
> new one (or make a union if we care about the size of that allocation,
> which I don't see any reason we'd do when it's literally just like a
> couple of pointers in size).
> 
> There is no reason to have some pseudo-generic "void *ctl_data" that
> makes it ambiguous and allows for type confusion and isn't
> self-documenting. I'd rather have a properly typed pointer that is
> just initialized to NULL and is not always used or needed, but always
> has a clear case for *what* it would be used for.
> 
> Yes, yes, we have f_private etc for things that are really very very
> generic and have arbitrary users. But 'sysctl' is not that kind of
> truly generic use.

Yep. I made ctl_data work the same way as f_private. My idea was that if
someone needs to store more than one pointer, they can put a struct there.
But it turned out that, at least for now, nothing is needed apart from
ipc_namespace.
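
A minimal userspace sketch of the typed-member suggestion, under stated
assumptions: struct sysctl_context and sysctl_ctx_msgmax() are hypothetical
names, and struct ipc_namespace is reduced here to a single field rather
than the real kernel definition. The point is only that a typed pointer
needs no cast and that NULL clearly means "not used".

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct ipc_namespace (assumption). */
struct ipc_namespace {
	int msg_ctlmax;
};

/* Hypothetical context with a typed member instead of "void *ctl_data". */
struct sysctl_context {
	struct ipc_namespace *ipc_ns;	/* NULL when the entry does not need it */
};

/* Reads a per-namespace value; no cast is needed, so no type confusion. */
static int sysctl_ctx_msgmax(const struct sysctl_context *ctx, int fallback)
{
	if (!ctx->ipc_ns)
		return fallback;
	return ctx->ipc_ns->msg_ctlmax;
}
```

If another user needs a different pointer later, it would get its own
typed member next to ipc_ns instead of reusing a generic slot.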

> I wish we didn't have that silly "create a temporary ctl_table entry"
> either, and I wish it was properly named. But it's not worth the
> pointless churn to fix old bad interfaces. But the new ones should
> have better names, and try to avoid those bad old decisions.

Currently, a temporary ctl_table is the main strategy for handling sysctl
entries.

Perhaps it will be possible to get rid of this if we add another
get_data() callback that returns what is currently placed in .data in
ctl_table. That is, make the .data lookup dynamic.
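
Roughly, the two approaches compare as in the userspace sketch below. None
of this is kernel API: struct demo_ns, struct demo_entry and all function
names are hypothetical, and the get_data() signature is only one possible
shape for the idea.

```c
#include <stddef.h>

/* Hypothetical per-namespace state. */
struct demo_ns {
	int msgmax;
	int msgmnb;
};

/* Hypothetical table entry modelled loosely on ctl_table. */
struct demo_entry {
	const char *name;
	int *data;				/* fixed pointer, like .data */
	int *(*get_data)(struct demo_ns *ns);	/* proposed dynamic lookup */
};

/* A handler that only knows how to read through entry->data. */
static int demo_handler(const struct demo_entry *e)
{
	return *e->data;
}

/* Lookup callback for the "msgmax" entry. */
static int *msgmax_of(struct demo_ns *ns)
{
	return &ns->msgmax;
}

/* Current strategy: copy the entry, redirect .data, run the handler. */
static int read_with_temp_copy(const struct demo_entry *e, int *ns_data)
{
	struct demo_entry tmp = *e;	/* temporary copy on the stack */

	tmp.data = ns_data;
	return demo_handler(&tmp);
}

/* Proposed strategy: ask the entry where its data lives, no copy. */
static int read_with_get_data(const struct demo_entry *e, struct demo_ns *ns)
{
	return *e->get_data(ns);
}
```

Both functions return the same per-namespace value; the second one leaves
the shared entry untouched and needs no temporary copy.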

> But yeah, I think this all is a step in the right direction. And maybe
> some of those cases and old 'ctl_table' things can be migrated to just
> using individual read() functions entirely. The whole 'ctl_table'
> model was broken, and came from the bad old days with an actual
> 'sysctl()' system call.

I'm not sure how to get rid of ctl_table since net sysctls are heavily
dependent on it.

I was wondering whether it's possible to get rid of ctl_table. But if we
cannot rewrite everything to some kind of new magic API, then keeping
two of them would be a nightmare.

Another problem is that ctl_table is used by __cgroup_bpf_run_filter_sysctl():

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/cgroup.c#n1392

> Because I think it would be lovely if people would move away from the
> 'sysctl table' approach entirely for cases where that makes sense, and
> these guys that already need special handling are very much in that
> situation.

Since you think these patches are a step in the right direction, I will
prepare the first version with your comments in mind.

-- 
Rgrds, legion

