linux-fsdevel.vger.kernel.org archive mirror
From: Christian Brauner <christian.brauner@ubuntu.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>,
	Andrei Vagin <avagin@gmail.com>,
	adobriyan@gmail.com, viro@zeniv.linux.org.uk,
	davem@davemloft.net, akpm@linux-foundation.org,
	areber@redhat.com, serge@hallyn.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Subject: Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary
Date: Mon, 17 Aug 2020 19:47:45 +0200
Message-ID: <20200817174745.jssxjdcwoqxeg5pu@wittgenstein>
In-Reply-To: <87d03pb7f2.fsf@x220.int.ebiederm.org>

On Mon, Aug 17, 2020 at 10:48:01AM -0500, Eric W. Biederman wrote:
> 
> Creating names in the kernel for namespaces is very difficult and
> problematic.  I have not seen anything that looks like  all of the
> problems have been solved with restoring these new names.
> 
> When your filter for your list of namespaces is user namespace creating
> a new directory in proc is highly questionable.
> 
> As everyone uses proc placing this functionality in proc also amplifies
> the problem of creating names.
> 
> 
> Rather than proc, having a way to mount a namespace filesystem filtered
> by the user namespace of the mounter is likely to have many fewer
> problems.  Especially as we are limiting/not allowing new non-process
> things and ideally finding a way to remove the non-process things.
> 
> 
> Kirill you have a good point that taking the case where a pid namespace
> does not exist in a user namespace is likely quite unrealistic.
> 
> Kirill mentioned upthread that the list of namespaces is the list that
> can appear in a container.  Except by discipline in creating containers
> it is not possible to know which namespaces may appear attached to a
> process.  It is possible to be very creative with setns, and violate any
> constraint you may have.  Which means your filtered list of namespaces
> may not contain all of the namespaces used by a set of processes.  This

Indeed. We use setns() quite creatively when intercepting syscalls and
when attaching to a container.

> further argues that attaching the list of namespaces to proc does not
> make sense.
> 
> Andrei has a good point that placing the names in a hierarchy by
> user namespace has the potential to create more freedom when
> assigning names to namespaces, as it means the names for namespaces
> do not need to be globally unique, and while still allowing the names
> to stay the same.
> 
> 
> To recap the possibilities for names for namespaces that I have seen
> mentioned in this thread are:
>   - Names per mount
>   - Names per user namespace
> 
> I personally suspect that names per mount are likely to be so flexible
> that they are confusing, while names per user namespace are likely to be
> rigid, possibly too rigid to use.
> 
> It all depends upon how everything is used.  I have yet to see a
> complete story of how these names will be generated and used.  So I can
> not really judge.

I haven't fully understood the motivation for this patchset either.
I can only speak to the use-case I had when I started prototyping
something similar: we needed a view of all namespaces that exist on
the system because we wanted to do namespace debugging on a live
system. This interface could've easily lived in debugfs. The main point
was that it should contain all namespaces. Note that it wasn't supposed
to be hierarchical; it was only meant to list all namespaces and be
accessible only to real root.
The interface here is far more flexible and complex, and I haven't yet
figured out what exactly it is supposed to be used for.

> 
> 
> Let me add another take on this idea that might give this work a path
> forward. If I were solving this I would explore giving nsfs directories
> per user namespace, and a way to mount it that exposed the directory of
> the mounters current user namespace (something like btrfs snapshots).
> 
> Hmm.  For the user namespace directory I think I would give it a file
> "ns" that can be opened to get a file handle on the user namespace.
> Plus a set of subdirectories ("cgroup", "ipc", "mnt", "net", "pid",
> "user", "uts") for each type of namespace.  In each directory I think
> I would just have a 64bit counter and each new entry I would assign the
> next number from that counter.
> 
> The restore could either have the ability to rename files or simply the
> ability to bump the counter (like we do with pids) so the names of the
> namespaces can be restored.
> 
> That winds up making a user namespace the namespace of namespaces, so
> I am not 100% about the idea. 
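
To make the counter scheme quoted above concrete, here is a minimal
userspace sketch. It assumes one 64-bit counter per namespace-type
directory, with names handed out sequentially and a restore path that may
only move the counter forward. The names ns_name_counter, ns_name_alloc and
ns_name_bump are hypothetical and do not come from any patch in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-directory name counter; all names are illustrative. */
struct ns_name_counter {
	uint64_t next;	/* name the next new entry will receive */
};

/* Assign the next name to a newly created namespace entry. */
static uint64_t ns_name_alloc(struct ns_name_counter *c)
{
	assert(c != NULL);
	return c->next++;
}

/*
 * Restore path: move the counter forward so that the next allocation
 * yields `want`. Returns 0 on success, -1 if `want` was already passed
 * (moving backwards could hand out a name twice).
 */
static int ns_name_bump(struct ns_name_counter *c, uint64_t want)
{
	assert(c != NULL);
	if (want < c->next)
		return -1;
	c->next = want;
	return 0;
}
```

A restorer would call ns_name_bump() with the saved name right before
recreating the namespace, so the following ns_name_alloc() returns that
name. This is only one reading of "bump the counter"; the rename-based
alternative is not shown.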

I think you're right that we need to understand the use-case better.
If I understand your suggestion correctly, it wouldn't allow showing
nested user namespaces if the nsfs mount is per-user namespace.

Let me throw in a crazy idea: couldn't we just make ioctl_ns() walk
a namespace hierarchy? For example, you could pass in a user namespace
fd and get back a struct with fds for the namespaces owned by that user
namespace, and then you could use NS_GET_USERNS/NS_GET_PARENT to walk
upwards from the user namespace fd passed in initially, and so on? Or
something similar/simpler. This would also decouple this from procfs
somewhat.

Christian

