From: Christian Brauner <christian.brauner@ubuntu.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>,
	Andrei Vagin <avagin@gmail.com>,
	adobriyan@gmail.com, viro@zeniv.linux.org.uk,
	davem@davemloft.net, akpm@linux-foundation.org,
	areber@redhat.com, serge@hallyn.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Subject: Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary
Date: Mon, 17 Aug 2020 19:47:45 +0200
Message-ID: <20200817174745.jssxjdcwoqxeg5pu@wittgenstein>
In-Reply-To: <87d03pb7f2.fsf@x220.int.ebiederm.org>

On Mon, Aug 17, 2020 at 10:48:01AM -0500, Eric W. Biederman wrote:
> 
> Creating names in the kernel for namespaces is very difficult and
> problematic.  I have not seen anything that looks like all of the
> problems have been solved with restoring these new names.
> 
> When your filter for your list of namespaces is user namespace creating
> a new directory in proc is highly questionable.
> 
> As everyone uses proc placing this functionality in proc also amplifies
> the problem of creating names.
> 
> 
> Rather than proc, having a way to mount a namespace filesystem filtered
> by the user namespace of the mounter is likely to have many, many fewer
> problems.  Especially as we are limiting/not allowing new non-process
> things and ideally finding a way to remove the non-process things.
> 
> 
> Kirill you have a good point that taking the case where a pid namespace
> does not exist in a user namespace is likely quite unrealistic.
> 
> Kirill mentioned upthread that the list of namespaces is the list that
> can appear in a container.  Except by discipline in creating containers
> it is not possible to know which namespaces may appear attached to a
> process.  It is possible to be very creative with setns, and violate any
> constraint you may have.  Which means your filtered list of namespaces
> may not contain all of the namespaces used by a set of processes.  This

Indeed. We use setns() quite creatively when intercepting syscalls and
when attaching to a container.

> further argues that attaching the list of namespaces to proc does not
> make sense.
> 
> Andrei has a good point that placing the names in a hierarchy by
> user namespace has the potential to create more freedom when
> assigning names to namespaces, as it means the names for namespaces
> do not need to be globally unique, while still allowing the names
> to stay the same.
> 
> 
> To recap the possibilities for names for namespaces that I have seen
> mentioned in this thread are:
>   - Names per mount
>   - Names per user namespace
> 
> I personally suspect that names per mount are likely to be so flexible
> that they are confusing, while names per user namespace are likely to be
> rigid, possibly too rigid to use.
> 
> It all depends upon how everything is used.  I have yet to see a
> complete story of how these names will be generated and used.  So I can
> not really judge.

So I haven't fully understood what the motivation for this patchset is
either.
I can only speak to the use-case I had when I started prototyping
something similar: We needed a way to get a view on all namespaces
that exist on the system because we wanted a way to do namespace
debugging on a live system. This interface could've easily lived in
debugfs. The main point was that it should contain all namespaces.
Note that it wasn't supposed to be a hierarchical format; it was only
meant to list all namespaces and be accessible to real root.
The interface here is way more flexible/complex and I haven't yet
figured out what exactly it is supposed to be used for.
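
For contrast with a system-wide listing, here is a rough sketch of the
per-process view userspace already has: stat()ing the entries under
/proc/<pid>/ns and printing the device and inode numbers that identify
each namespace. The helper name print_task_namespaces() and the fixed
list of namespace types are assumptions made for illustration; they are
not taken from the patchset.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Namespace types that may appear under /proc/<pid>/ns (assumed list). */
static const char *const ns_types[] = {
	"cgroup", "ipc", "mnt", "net", "pid", "time", "user", "uts",
};

/*
 * Hypothetical helper: print "<type> <dev>:<inode>" for every namespace
 * entry found under "<proc_dir>/ns", e.g. proc_dir = "/proc/self".
 * Entries that are missing or cannot be stat()ed are skipped.
 * Returns the number of entries printed.
 */
int print_task_namespaces(const char *proc_dir)
{
	int found = 0;

	for (size_t i = 0; i < sizeof(ns_types) / sizeof(ns_types[0]); i++) {
		char path[4096];
		struct stat st;
		int n = snprintf(path, sizeof(path), "%s/ns/%s",
				 proc_dir, ns_types[i]);

		if (n < 0 || (size_t)n >= sizeof(path))
			continue;	/* path did not fit into the buffer */
		if (stat(path, &st) != 0)
			continue;	/* type not supported or no permission */

		/* Two tasks share a namespace iff dev and inode both match. */
		printf("%-6s %lu:%lu\n", ns_types[i],
		       (unsigned long)st.st_dev, (unsigned long)st.st_ino);
		found++;
	}
	return found;
}
```

Calling print_task_namespaces("/proc/self") only shows the namespaces
one task is attached to, which is exactly why it is not enough for the
debugging case above: a namespace kept alive only by a bind mount or an
open fd never shows up this way.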

> 
> 
> Let me add another take on this idea that might give this work a path
> forward. If I were solving this I would explore giving nsfs directories
> per user namespace, and a way to mount it that exposed the directory of
> the mounters current user namespace (something like btrfs snapshots).
> 
> Hmm.  For the user namespace directory I think I would give it a file
> "ns" that can be opened to get a file handle on the user namespace.
> Plus a set of subdirectories ("cgroup", "ipc", "mnt", "net", "pid",
> "user", "uts") for each type of namespace.  In each directory I think
> I would just have a 64bit counter and each new entry I would assign the
> next number from that counter.
> 
> The restore could either have the ability to rename files or simply the
> ability to bump the counter (like we do with pids) so the names of the
> namespaces can be restored.
> 
> That winds up making a user namespace the namespace of namespaces, so
> I am not 100% sure about the idea.

I think you're right that we need to understand better what the use-case
is. If I understand your suggestion correctly, it wouldn't allow showing
nested user namespaces if the nsfs mount is per-user namespace.

Let me throw in a crazy idea: couldn't we just make the ioctl_ns() walk
a namespace hierarchy? For example, you could pass in a user namespace
fd and then you'd get back a struct with handles for fds for the
namespaces owned by that user namespace and then you could use
NS_GET_USERNS/NS_GET_PARENT to walk upwards from the user namespace fd
passed in initially and so on? Or something similar/simpler. This would
also decouple this from procfs somewhat.
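
To make the upward half of that walk concrete, here is a minimal sketch
using only the ioctls that exist today (NS_GET_PARENT and NS_GET_USERNS
from <linux/nsfs.h>). The downward half, getting handles for the
namespaces owned by a user namespace, is the part that would need a new
ioctl and is not shown. The helper names count_userns_ancestors() and
owning_userns() are made up for illustration.

```c
#include <fcntl.h>
#include <linux/nsfs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the user namespace file at `path` (e.g.
 * "/proc/self/ns/user") and follow NS_GET_PARENT until the kernel
 * refuses, which happens once the parent is outside the caller's view.
 * Returns the number of ancestors reached, or -1 if `path` cannot be
 * opened. A file that is not a namespace simply yields 0.
 */
int count_userns_ancestors(const char *path)
{
	int fd = open(path, O_RDONLY);
	int depth = 0;

	if (fd < 0)
		return -1;

	for (;;) {
		/* On success the ioctl returns a new fd for the parent. */
		int parent = ioctl(fd, NS_GET_PARENT);

		if (parent < 0)
			break;
		close(fd);
		fd = parent;
		depth++;
	}
	close(fd);
	return depth;
}

/*
 * Hypothetical helper: return a new fd for the user namespace that owns
 * the namespace behind ns_fd, or -1 with errno set on failure.
 */
int owning_userns(int ns_fd)
{
	return ioctl(ns_fd, NS_GET_USERNS);
}
```

A caller would typically start from an fd of any namespace type, use
owning_userns() to get to its user namespace, and then walk upwards as
count_userns_ancestors() does.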

Christian
