linux-kernel.vger.kernel.org archive mirror
From: ebiederm@xmission.com (Eric W. Biederman)
To: Nikolay Borisov <kernel@kyup.com>
Cc: Jan Kara <jack@suse.cz>,
	john@johnmccutchan.com, eparis@redhat.com,
	linux-kernel@vger.kernel.org, gorcunov@openvz.org,
	avagin@openvz.org, netdev@vger.kernel.org,
	operations@siteground.com,
	Linux Containers <containers@lists.linux-foundation.org>
Subject: Re: [RFC PATCH 0/4] Make inotify instance/watches be accounted per userns
Date: Fri, 03 Jun 2016 15:41:55 -0500
Message-ID: <87inxqovho.fsf@x220.int.ebiederm.org>
In-Reply-To: <5751667D.7010207@kyup.com> (Nikolay Borisov's message of "Fri, 3 Jun 2016 14:14:05 +0300")

Nikolay Borisov <kernel@kyup.com> writes:

> On 06/02/2016 07:58 PM, Eric W. Biederman wrote:
>> 
>> Nikolay please see my question for you at the end.
[snip] 
>> All of that said there is definitely a practical question that needs to
>> be asked.  Nikolay how did you get into this situation?  A typical user
>> namespace configuration will set up uid and gid maps with the help of a
>> privileged program and not map the uid of the user who created the user
>> namespace.  Thus avoiding exhausting the limits of the user who created
>> the container.
>
> Right, but imagine having multiple containers with identical uid/gid maps.
> For LXC-based setups, imagine this:
>
> lxc.id_map = u 0 1337 65536

So I am only moderately concerned when the containers have overlapping
ids.  Because at some level overlapping ids means they are the same
user.  This is certainly true for file permissions and for other
permissions.  To isolate one container from another it fundamentally
needs to have separate uids and gids on the host system.

> Now all processes which are running as the same user in different
> containers will actually share the underlying user_struct, and thus the
> inotify limits. In such cases even running multiple instances of 'tail'
> in one container will eventually use up all allowed inotify instances
> and marks. For this to happen you don't even need the uid maps to overlap
> completely; it's enough for at least one UID to overlap between two
> containers.
>
>
> So the risk of exhaustion doesn't apply to the privileged user that
> created the container and the uid mapping, but rather to the users under
> which the various processes in the container are running. Does that make
> it clear?

Yes.  That is clear.
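To make the overlap concrete: a line such as `lxc.id_map = u 0 1337 65536` maps container uid N to host uid 1337 + N for N below 65536. The C sketch here is illustrative only; the `struct id_extent` type and its helpers are hypothetical, not LXC or kernel code. It translates a container uid to a host uid and checks whether two containers' host ranges share any uid. Per the discussion above, any shared host uid means a shared user_struct, and therefore shared inotify counts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel for "this id has no mapping". */
#define ID_UNMAPPED UINT32_MAX

/* One "u <ns_start> <host_start> <count>" map line, e.g.
 * lxc.id_map = u 0 1337 65536. Hypothetical struct for illustration. */
struct id_extent {
	uint32_t ns_start;   /* first uid as seen inside the container */
	uint32_t host_start; /* first uid on the host */
	uint32_t count;      /* length of the range */
};

/* Translate a container uid to its host uid, or ID_UNMAPPED. */
static uint32_t map_to_host(const struct id_extent *e, uint32_t ns_id)
{
	if (ns_id < e->ns_start || ns_id - e->ns_start >= e->count)
		return ID_UNMAPPED;
	return e->host_start + (ns_id - e->ns_start);
}

/* True if two extents share at least one host uid. To the host every
 * such uid is the same user, so its per-user counters are shared too. */
static bool host_ranges_overlap(const struct id_extent *a,
				const struct id_extent *b)
{
	uint64_t a_end = (uint64_t)a->host_start + a->count; /* exclusive */
	uint64_t b_end = (uint64_t)b->host_start + b->count; /* exclusive */

	return a->count && b->count &&
	       a->host_start < b_end && b->host_start < a_end;
}
```

With two containers both using `u 0 1337 65536`, uid 1000 in either one is host uid 2337. A container mapped to host uids starting at 1337 + 65536 does not overlap with them, but one starting at 1337 + 65535 shares a single uid, which is already enough to share limits.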

>> Which makes me personally more worried about escaping the existing
>> limits than exhausting the limits of a particular user.
>
> So I thought a bit about it, and I guess a solution can be concocted which
> utilizes the hierarchical nature of the page counter, with the inotify
> limits settable per namespace if you have capable(CAP_SYS_ADMIN). That way
> the admin can set one fairly large limit on the init_user_ns and then set
> smaller limits in every namespace created beneath it. For a branch in
> the tree (in the nomenclature you used in your previous reply to me) you
> will really be upper-bounded by the limit set in the namespace which has
> ->level = 1. For the width of the tree, you will be bound by the
> "global" init_user_ns limits. How does that sound?
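The hierarchical scheme proposed above can be modeled in a few lines of C. This is a minimal, single-threaded sketch under stated assumptions: `struct ns_counter`, `ns_try_charge()` and `ns_uncharge()` are hypothetical names, not the kernel's `page_counter` API. They only illustrate the same idea: charge a namespace and every ancestor, and roll back if any level would exceed its limit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-user-namespace counter; no locking, single-threaded. */
struct ns_counter {
	struct ns_counter *parent; /* NULL for the init_user_ns counter */
	unsigned long count;       /* inotify instances (or watches) charged */
	unsigned long limit;       /* limit set at this level */
};

/* Charge n to c and every ancestor. If any level would exceed its limit,
 * undo the levels already charged and fail, leaving all counts unchanged. */
static bool ns_try_charge(struct ns_counter *c, unsigned long n)
{
	struct ns_counter *it, *undo;

	for (it = c; it != NULL; it = it->parent) {
		if (n > it->limit - it->count) /* assumes count <= limit */
			goto fail;
		it->count += n;
	}
	return true;

fail:
	for (undo = c; undo != it; undo = undo->parent)
		undo->count -= n;
	return false;
}

/* Release a charge previously taken with ns_try_charge(). */
static void ns_uncharge(struct ns_counter *c, unsigned long n)
{
	for (; c != NULL; c = c->parent)
		c->count -= n;
}
```

For example, take init_user_ns limited to 15, two level-1 namespaces limited to 10 each, and a namespace nested under the first with its own limit of 50. Charging 10 in the nested namespace succeeds, but a further charge of 1 fails because its level-1 parent is full (the depth bound). After the second level-1 namespace charges 5, it cannot charge more because init_user_ns is full (the width bound).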

As an addendum to that design, I think there should be an additional
sysctl or two that specifies how much the limit decreases when creating
a new user namespace and when creating a new user in that user
namespace.  That way, with a good selection of limits and limit
decreases, people can use the kernel defaults without needing to change
them.
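One way to read the "limit decrease" suggestion is as a fixed amount subtracted at each nesting level, and once more for each new user inside a namespace. The sketch below assumes exactly that interpretation. `struct limit_decrease` and both helpers are hypothetical stand-ins for the proposed sysctls, not an existing kernel interface, and a fractional decrease would be an equally valid design.

```c
/* Hypothetical tunables standing in for the "sysctl or two" suggested above. */
struct limit_decrease {
	unsigned long per_userns; /* subtracted for each nested user namespace */
	unsigned long per_user;   /* subtracted for a new user inside a namespace */
};

/* Subtract without wrapping below zero. */
static unsigned long sat_sub(unsigned long a, unsigned long b)
{
	return a > b ? a - b : 0;
}

/* Default limit of a user namespace `depth` levels below init_user_ns. */
static unsigned long userns_default_limit(unsigned long init_limit,
					  const struct limit_decrease *d,
					  unsigned int depth)
{
	unsigned long limit = init_limit;

	for (unsigned int i = 0; i < depth; i++)
		limit = sat_sub(limit, d->per_userns);
	return limit;
}

/* Default limit for a newly seen user in a namespace whose limit is ns_limit. */
static unsigned long user_default_limit(unsigned long ns_limit,
					const struct limit_decrease *d)
{
	return sat_sub(ns_limit, d->per_user);
}
```

Take per_userns = 1000, per_user = 100 and an init_user_ns limit of 8192 (arbitrary example numbers, not kernel defaults). A namespace one level down then defaults to 7192 and one three levels down to 5192, and from nine levels down the value bottoms out at 0. A new user in a level-1 namespace would default to 7092. The saturating subtraction keeps deep nesting from wrapping around to a huge limit.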

Having default settings that are good enough 99% of the time, and that
people don't need to tune, would be my biggest requirement (aside from
being lightweight) for merging something like this.

If things are set-and-forget, and even the container case does not need
to be aware of them, then I think we have a design sufficiently robust
and different from what cgroups is doing to make it worthwhile to have a
userns-based solution.

I can see a lot of different limits implemented this way.

Eric


Thread overview: 17+ messages
2016-06-01  7:52 [RFC PATCH 0/4] Make inotify instance/watches be accounted per userns Nikolay Borisov
2016-06-01  7:52 ` [PATCH 1/4] inotify: Add infrastructure to account inotify limits per-namespace Nikolay Borisov
2016-06-06  8:05   ` Cyrill Gorcunov
2016-06-06  9:26     ` Nikolay Borisov
2016-06-01  7:52 ` [PATCH 2/4] inotify: Convert inotify limits to be accounted per-realuser/per-namespace Nikolay Borisov
2016-06-01  7:52 ` [PATCH 3/4] misc: Rename the HASH_SIZE macro Nikolay Borisov
2016-06-01 18:13   ` David Miller
2016-06-01  7:53 ` [PATCH 4/4] inotify: Don't include inotify.h when !CONFIG_INOTIFY_USER Nikolay Borisov
2016-06-01 16:00 ` [RFC PATCH 0/4] Make inotify instance/watches be accounted per userns Eric W. Biederman
2016-06-02  6:27   ` Nikolay Borisov
2016-06-02 16:19     ` Eric W. Biederman
2016-06-02  7:49   ` Jan Kara
2016-06-02 16:58     ` Eric W. Biederman
2016-06-03 11:14       ` Nikolay Borisov
2016-06-03 20:41         ` Eric W. Biederman [this message]
2016-06-06  6:41           ` Nikolay Borisov
2016-06-06 20:00             ` Eric W. Biederman
