All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexey Gladkov <legion@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	"Eric W . Biederman" <ebiederm@xmission.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christian Brauner <brauner@kernel.org>,
	Iurii Zaikin <yzaikin@google.com>,
	Kees Cook <keescook@chromium.org>,
	Linux Containers <containers@lists.linux.dev>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Vasily Averin <vvs@virtuozzo.com>
Subject: Re: [RFC PATCH 2/4] sysctl: ipc: Do not use dynamic memory
Date: Wed, 1 Jun 2022 20:24:55 +0200	[thread overview]
Message-ID: <Ypeu97GDg6mNiKQ8@example.org> (raw)
In-Reply-To: <CAHk-=wjJ2CP0ugbOnwAd-=Cw0i-q_xC1PbJ-_1jrvR-aisiAAA@mail.gmail.com>

On Wed, Jun 01, 2022 at 09:45:15AM -0700, Linus Torvalds wrote:
> On Wed, Jun 1, 2022 at 6:20 AM Alexey Gladkov <legion@kernel.org> wrote:
> >
> > Dynamic memory allocation is needed to modify .data and specify the per
> > namespace parameter. The new sysctl API is allowed to get rid of the
> > need for such modification.
> 
> Ok, this is looking better. That said, a few comments:
> 
> >
> > diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
> > index ef313ecfb53a..833b670c38f3 100644
> > --- a/ipc/ipc_sysctl.c
> > +++ b/ipc/ipc_sysctl.c
> > @@ -68,26 +68,94 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
> >         return ret;
> >  }
> >
> > +static inline void *data_from_ns(struct ctl_context *ctx, struct ctl_table *table);
> > +
> > +static int ipc_sys_open(struct ctl_context *ctx, struct inode *inode, struct file *file)
> > +{
> > +       struct ipc_namespace *ns = current->nsproxy->ipc_ns;
> > +
> > +       // For now, we only allow changes in init_user_ns.
> > +       if (ns->user_ns != &init_user_ns)
> > +               return -EPERM;
> > +
> > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > +       int index = (ctx->table - ipc_sysctls);
> > +
> > +       switch (index) {
> > +               case IPC_SYSCTL_SEM_NEXT_ID:
> > +               case IPC_SYSCTL_MSG_NEXT_ID

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/cgroup.c#n1392:
> [...]
> 
> I don't think you actually even compile-tested this, because you're
> using these IPC_SYSCTL_SEM_NEXT_ID etc enums before you even declared
> them later in the same file.

I did it but without CONFIG_CHECKPOINT_RESTORE.

This is where I'm not sure who can write to ipc sysctls inside
ipc_namespace.

> > +static ssize_t ipc_sys_read(struct ctl_context *ctx, struct file *file,
> > +                    char *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > +       struct ctl_table table = *ctx->table;
> > +       table.data = data_from_ns(ctx, ctx->table);
> > +       return table.proc_handler(&table, 0, buffer, lenp, ppos);
> > +}
> 
> Can we please fix the names and the types of this new 'ctx' structure?
> 
> Yes, yes, I know the old legacy "sysctl table" is horribly named, and
> uses "ctl_table".

Sure.

> But let's just write it out. It's not some random control table for
> anything. It's a very odd and specific thing: "sysctl". Let's use the
> full name.
> 
> Also, Please just make that "ctl_data" member in that "ctl_context"
> struct not just have a real name, but a real type. Make it literally
> be
> 
>     struct ipc_namespace *ipc_ns;
> 
> and if we end up with other things wanting other pointers, just add a
> new one (or make a union if we care about the size of that allocation,
> which I don't see any reason we'd do when it's literally just like a
> couple of pointers in size).
> 
> There is no reason to have some pseudo-generic "void *ctl_data" that
> makes it ambiguous and allows for type confusion and isn't
> self-documenting. I'd rather have a properly typed pointer that is
> just initialized to NULL and is not always used or needed, but always
> has a clear case for *what* it would be used for.
> 
> Yes, yes, we have f_private etc for things that are really very very
> generic and have arbitrary users. But 'sysctl' is not that kind of
> truly generic use.

Yep. I made ctl_data in the same way as f_private. My idea is that if
someone needs to store more than one pointer, they can put a struct there.
But it turned out that at least now, apart from ipc_namespace, nothing is
needed.

> I wish we didn't have that silly "create a temporary ctl_table entry"
> either, and I wish it was properly named. But it's not worth the
> pointless churn to fix old bad interfaces. But the new ones should
> have better names, and try to avoid those bad old decisions.

Currently temporary ctl_table is the main strategy for handling sysctl
entries.

Perhaps it will be possible to get rid of this if we add another
get_data() that would return what is currently placed in .data in
ctl_table. I mean make getting .data dynamic.

> But yeah, I think this all is a step in the right direction. And maybe
> some of those cases and old 'ctl_table' things can be migrated to just
> using individual read() functions entirely. The whole 'ctl_table'
> model was broken, and came from the bad old days with an actual
> 'sysctl()' system call.

I'm not sure how to get rid of ctl_table since net sysctls are heavily
dependent on it.

I was wondering if it's possible to get rid of ctl_table but if it's not
possible to rewrite everything to some kind of new magic API, then keeping
two of them would be a nightmare.

Another problem is that ctl_table is being used by __cgroup_bpf_run_filter_sysctl.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/cgroup.c#n1392

> Because I think it would be lovely if people would move away from the
> 'sysctl table' approach entirely for cases where that makes sense, and
> these guys that already need special handling are very much in that
> situation.

Since you think that these patches are a step in the right direction, then
I will prepare the first version with your comments in mind.

-- 
Rgrds, legion


  reply	other threads:[~2022-06-01 18:25 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-14 18:18 [PATCH v4 0/2] ipc: Store mq and ipc sysctls in the ipc namespace Alexey Gladkov
2022-02-14 18:18 ` [PATCH v4 1/2] ipc: Store mqueue " Alexey Gladkov
2022-02-14 18:18 ` [PATCH v4 2/2] ipc: Store ipc " Alexey Gladkov
2022-03-23 20:24 ` [GIT PULL] ipc: Bind to the ipc namespace at open time Eric W. Biederman
2022-03-24 18:12   ` Linus Torvalds
2022-03-24 21:48     ` Eric W. Biederman
2022-03-24 22:16       ` Linus Torvalds
2022-03-25 12:10     ` Alexey Gladkov
2022-04-22 12:53     ` [PATCH v1 0/4] ipc: Remove extra1 field abuse to pass ipc namespace Alexey Gladkov
2022-04-22 12:53       ` [PATCH v1 1/4] " Alexey Gladkov
2022-05-02 16:07         ` Eric W. Biederman
2022-04-22 12:53       ` [PATCH v1 2/4] ipc: Use proper " Alexey Gladkov
2022-05-02 16:09         ` Eric W. Biederman
2022-05-03 13:39           ` Alexey Gladkov
2022-05-03 13:39             ` [PATCH v2 1/4] ipc: Use the same namespace to modify and validate Alexey Gladkov
2022-05-03 13:39             ` [PATCH v2 2/4] ipc: Remove extra1 field abuse to pass ipc namespace Alexey Gladkov
2022-05-03 13:39             ` [PATCH v2 3/4] ipc: Check permissions for checkpoint_restart sysctls at open time Alexey Gladkov
2022-05-03 13:39             ` [PATCH v2 4/4] ipc: Remove extra braces Alexey Gladkov
2022-04-22 12:53       ` [PATCH v1 3/4] ipc: Check permissions for checkpoint_restart sysctls at open time Alexey Gladkov
2022-04-22 12:53       ` [PATCH v1 4/4] ipc: Remove extra braces Alexey Gladkov
2022-04-22 20:44       ` [PATCH v1 0/4] ipc: Remove extra1 field abuse to pass ipc namespace Linus Torvalds
2022-05-04  3:42         ` Philip Rhoades
2022-06-01 13:20         ` [RFC PATCH 0/4] API extension for handling sysctl Alexey Gladkov
2022-06-01 13:20           ` [RFC PATCH 1/4] sysctl: " Alexey Gladkov
2022-06-01 19:19             ` Matthew Wilcox
2022-06-01 19:23               ` Linus Torvalds
2022-06-01 19:25                 ` Matthew Wilcox
2022-06-01 19:31                   ` Linus Torvalds
2022-06-01 19:32               ` Alexey Gladkov
2022-06-01 13:20           ` [RFC PATCH 2/4] sysctl: ipc: Do not use dynamic memory Alexey Gladkov
2022-06-01 16:45             ` Linus Torvalds
2022-06-01 18:24               ` Alexey Gladkov [this message]
2022-06-01 18:34                 ` Linus Torvalds
2022-06-01 19:05                   ` Alexey Gladkov
2022-06-09 18:51                   ` Luis Chamberlain
2022-06-01 13:20           ` [RFC PATCH 3/4] sysctl: userns: " Alexey Gladkov
2022-06-01 13:20           ` [RFC PATCH 4/4] sysctl: mqueue: " Alexey Gladkov
2022-06-09 16:45           ` [RFC PATCH 0/4] API extension for handling sysctl Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ypeu97GDg6mNiKQ8@example.org \
    --to=legion@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=containers@lists.linux.dev \
    --cc=ebiederm@xmission.com \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vvs@virtuozzo.com \
    --cc=yzaikin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.