From: ebiederm@xmission.com (Eric W. Biederman)
To: "Michael Kerrisk \(man-pages\)" <mtk.manpages@gmail.com>
Cc: Andrew Vagin <avagin@virtuozzo.com>,
	Andrey Vagin <avagin@openvz.org>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	"criu\@openvz.org" <criu@openvz.org>,
	Linux API <linux-api@vger.kernel.org>,
	Linux Containers <containers@lists.linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
Date: Fri, 29 Jul 2016 13:05:48 -0500
Message-ID: <87h9b8e2v7.fsf@x220.int.ebiederm.org>
In-Reply-To: <40e35f1a-10e6-b7a5-936e-a09f008be0d0@gmail.com> (Michael Kerrisk's message of "Thu, 28 Jul 2016 21:00:32 +0200")

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> Hi Eric,
>
> On 07/28/2016 02:56 PM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>
>>> On 07/26/2016 10:39 PM, Andrew Vagin wrote:
>>>> On Tue, Jul 26, 2016 at 09:17:31PM +0200, Michael Kerrisk (man-pages) wrote:
>>
>>>> If we want to compare two file descriptors of the current process,
>>>> it is one of cases for which kcmp can be used. We can call kcmp to
>>>> compare two namespaces which are opened in other processes.
>>>
>>> Is there really a use case there? I assume we're talking about the
>>> scenario where a process in one namespace opens a /proc/PID/ns/*
>>> file descriptor and passes that FD to another process via a UNIX
>>> domain socket. Is that correct?
>>>
>>> So, supposing that we want to build a map of the relationships
>>> between namespaces using the proposed kcmp() API, and there are
>>> say N namespaces? Does this mean we make (N * (N-1) / 2) calls
>>> to kcmp()?
>>
>> Potentially.  The numbers are small enough O(N^2) isn't fatal.
>
> Define "small", please.
>
> O(N^2) makes me nervous about what other use cases lurk out
> there that may get bitten by this.

The worst case for N (one namespace per thread) is about 60k.
A typical heavy use case may be 1000 namespaces of any type.
So we are talking about an O(N^2) operation that rarely happens and
should be done in a couple of seconds.
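
For scale, the N * (N-1) / 2 count from the question above works out
as follows for the two values of N mentioned here.  This is only the
number of calls; the cost of a single kcmp() call is not stated in this
thread and is not modeled.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n namespaces: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# 1000 is the "typical heavy use" figure, 60k the worst case quoted above.
for n in (1_000, 60_000):
    print(f"N = {n:>6}: {pairwise_comparisons(n):>13,} comparisons")
```

That is roughly half a million comparisons for the typical heavy case
and about 1.8 billion for the worst case.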

>> Where kcmp shines is that it allows migration to happen.  Inode numbers
>> can change (which they very much will today), and things still work.
>
>
>> We can keep it O(N log(N)) by taking advantage of not just the equality
>> but the ordering relationship.  Although, ugh.
>
> Yes, that sounds pretty ugly...

Actually, having thought about this a little more: if kcmp returns an
ordering by inode, and migration preserves the relative order of
the inodes (which should just be creation order), it should be quite
solvable.
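
A minimal sketch of how an ordering (rather than just equality) keeps
the work at O(N log(N)): sort the handles with the comparison, then
walk the sorted list once.  The return convention (0 equal, 1 less,
2 greater) follows kcmp(2); fake_compare and the (label, identity)
tuples are hypothetical stand-ins, since a namespace comparison type
for kcmp is only a proposal in this thread.

```python
from functools import cmp_to_key


def group_by_ordering(handles, compare):
    """Group handles that refer to the same object using one sort.

    compare(a, b) stands in for a kcmp()-style call and must return
    0 (same object), 1 (a orders before b) or 2 (a orders after b).
    """
    def as_sort_cmp(a, b):
        return {0: 0, 1: -1, 2: 1}[compare(a, b)]

    ordered = sorted(handles, key=cmp_to_key(as_sort_cmp))
    groups = []
    for handle in ordered:
        # Equal handles are adjacent after sorting, so comparing against
        # the previous group's first member is enough.
        if groups and compare(groups[-1][0], handle) == 0:
            groups[-1].append(handle)
        else:
            groups.append([handle])
    return groups


def fake_compare(a, b):
    """Hypothetical comparison on a hidden identity (second tuple field)."""
    if a[1] == b[1]:
        return 0
    return 1 if a[1] < b[1] else 2


# Real code would hold file descriptors and never see the identity.
handles = [("fd7", 30), ("fd3", 10), ("fd9", 30), ("fd4", 20), ("fd5", 10)]
groups = group_by_ordering(handles, fake_compare)
print([[label for label, _ in group] for group in groups])
```

The sort only works if the ordering is a consistent total order for the
lifetime of the scan, which is exactly what a migration in the middle
would have to preserve.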

Switch from an order by inode number to an order by object creation
time, and guarantee that all creations have an order (which with
tasklist_lock we practically already have), and it should be even easier
to create.  (A 64bit nanosecond resolution timestamp is good for 584
years of uptime.)  A 64bit number that increments each time an object is
created should have an even better lifespan.
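
A rough check of those lifetimes.  By this arithmetic an unsigned 64bit
nanosecond counter wraps after about 584 years (about 292 if one bit
is lost to a sign).  The rate of one million object creations per
second is purely an assumption for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_until_wrap(bits: int, increments_per_second: float) -> float:
    """Years before a counter of the given width wraps at a steady rate."""
    return (2 ** bits) / increments_per_second / SECONDS_PER_YEAR


ns_clock = years_until_wrap(64, 1e9)          # nanosecond timestamp
signed_ns_clock = years_until_wrap(63, 1e9)   # same, one bit used for sign
creations = years_until_wrap(64, 1e6)         # assumed 1M creations/second

print(f"64bit ns timestamp:   {ns_clock:,.0f} years")
print(f"63bit ns timestamp:   {signed_ns_clock:,.0f} years")
print(f"64bit creation count: {creations:,.0f} years")
```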

I don't know if we can find a way to give that guarantee for other kcmp
comparisons but it is worth a thought.

>> One disadvantage of
>> kcmp currently is that the way the ordering relationship is defined
>> the order is not preserved over migration :(
>
> So, does kcmp() fully solve the problem(s) at hand? It sounds like
> not, if I understand your last point correctly.

There are 3 possibilities I see for migration in migration, listed
in order of implementation difficulty.
1) Have a clear signal that migration happened and a nested migration
   needs to restart.
2) Use kcmp so that only the relative order needs to be preserved.
3) Preserve the device number and inode numbers.

At a practical level I think (2) may, on net, actually be the simplest.
It requires a little more care to implement and you have to opt in,
but it should not require any rolling back of activity (merely careful
ordering of object creation).
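
A toy model of what (2) asks of a restore: the absolute identifiers
change, but recreating objects in their original creation order keeps
every pairwise ordering answer the same.  Host and its creation ids
are hypothetical; nothing here corresponds to an existing kernel or
CRIU interface.

```python
import itertools


class Host:
    """Toy stand-in for a kernel that numbers objects in creation order."""

    def __init__(self, first_id):
        self._ids = itertools.count(first_id)

    def create(self, name):
        # Each new object gets the next, strictly larger, creation id.
        return (name, next(self._ids))


def names_in_order(objects):
    """Names sorted by creation id, i.e. what an ordering compare exposes."""
    return [name for name, _ in sorted(objects, key=lambda obj: obj[1])]


def restore(objects, target):
    """Recreate objects on `target` in their original creation order."""
    return [target.create(name) for name in names_in_order(objects)]


source = Host(first_id=100)
before = [source.create(n) for n in ("user", "pid", "net", "mnt")]
after = restore(before, Host(first_id=90_000))

print(before)
print(after)
```

The ids differ between the two lists, but sorting either one by id
yields the same sequence of names, so no comparison result changes and
nothing has to be rolled back.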

I definitely like kcmp knowing how to compare things by inode
(aka st_dev, st_ino) because then, even if you have to restart
the comparisons after a migration, the exact details you are comparing
are hidden and so it is easier to support and harder to get wrong.
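
For comparison, this is roughly the userspace check that such a kcmp
mode would hide: stat both paths and compare (st_dev, st_ino).  The
helper names are made up for this sketch, and the /proc paths in the
usage part are just examples (reading another process's ns links may
need privileges).

```python
import os


def identity(path):
    """Return (st_dev, st_ino) for path; os.stat follows symlinks."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_object(path_a, path_b):
    """True if both paths resolve to the same device and inode."""
    return identity(path_a) == identity(path_b)


if __name__ == "__main__":
    mine = "/proc/self/ns/net"
    parent = f"/proc/{os.getppid()}/ns/net"
    if os.path.exists(mine):
        try:
            print("same net namespace as parent:", same_object(mine, parent))
        except OSError as err:
            print("could not stat parent namespace:", err)
```

Anything written this way depends on st_dev staying meaningful, which
is the part that is hard to preserve across a migration.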

I can imagine how to preserve inode numbers by creating a new
nsfs instance and using the old inode numbers upon restore.  I don't
currently see how we could possibly preserve st_dev over migration short of
a device number namespace.

So if we are going to continue making device numbers a legacy
attribute that applications should not care about, we need a way to
compare things without looking at st_dev.  Which brings us back to kcmp.

Hmm.  Unplugging a disk and plugging it back in will likely change the
device number and give the same kind of challenge with st_dev (although
you can't keep a file descriptor open across that kind of event).  So
certainly a hotplug event on a device should be enough to say "don't care
about the device number".

Eric
