linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Nikolay Borisov <kernel@kyup.com>
Cc: john@johnmccutchan.com, eparis@redhat.com, jack@suse.cz,
	linux-kernel@vger.kernel.org, gorcunov@openvz.org,
	avagin@openvz.org, netdev@vger.kernel.org,
	operations@siteground.com,
	Linux Containers <containers@lists.linux-foundation.org>
Subject: Re: [RFC PATCH 0/4] Make inotify instance/watches be accounted per userns
Date: Thu, 02 Jun 2016 11:19:23 -0500	[thread overview]
Message-ID: <87y46n36no.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <574FD1E4.8090109@kyup.com> (Nikolay Borisov's message of "Thu, 2 Jun 2016 09:27:48 +0300")

Nikolay Borisov <kernel@kyup.com> writes:

> On 06/01/2016 07:00 PM, Eric W. Biederman wrote:
>> Cc'd the containers list.
>> 
>> 
>> Nikolay Borisov <kernel@kyup.com> writes:
>> 
>>> Currently the inotify instances/watches are being accounted in the 
>>> user_struct structure. This means that in setups where multiple 
>>> users in unprivileged containers map to the same underlying 
>>> real user (e.g. user_struct) the inotify limits are going to be 
>>> shared as well which can lead to unplesantries. This is a problem 
>>> since any user inside any of the containers can potentially exhaust 
>>> the instance/watches limit which in turn might prevent certain 
>>> services from other containers from starting.
>> 
>> On a high level this is a bit problematic as it appears to escapes the
>> current limits and allows anyone creating a user namespace to have their
>> own fresh set of limits.  Given that anyone should be able to create a
>> user namespace whenever they feel like escaping limits is a problem.
>> That however is solvable.
>
> This is indeed a problem and the presented solution is rather dumb in
> that regard. I'm happy to work with you on suggestions so that I arrive
> at a solution that is upstreamable.

The one in kernel solution to hierarchical resource limits that I am
aware of is the current include/linux/page_counter.h which evolved from
include/linux/res_counter.h

>> A practical question.  What kind of limits are we looking at here?
>> 
>> Are these loose limits for detecting buggy programs that have gone
>> off their rails?
>
> Loose limits.
>
>> 
>> Are these tight limits to ensure multitasking is possible?
>> 
>> 
>> 
>> For tight limits where something is actively controlling the limits you
>> probably want a cgroup base solution.
>> 
>> For loose limits that are the kind where you set a good default and
>> forget about I think a user namespace based solution is reasonable.
>
> That's exactly the use case I had in mind.
>
>> 
>>> The solution I propose is rather simple, instead of accounting the 
>>> watches/instances per user_struct, start accounting them in a hashtable, 
>>> where the index used is the hashed pointer of the userns. This way
>>> the administrator needn't set the inotify limits very high and also 
>>> the risk of one container breaching the limits and affecting every 
>>> other container is alleviated.
>> 
>> I don't think this is the right data structure for a user namespace
>> based solution, at least in part because it does not account for users
>> escaping.
>
> Admittedly this is a naive solution, what are you ideas on something
> which achieves my initial aim of having limits per users, yet not
> allowing them to just create another namespace and escape them. The
> current namespace code has a hard-coded limit of 32 for nesting user
> namespaces. So currently at the worst case one can escape the limits up
> to 32 * current_limits.

32 is the nesting depth not the width of the tree.  But see above.

Eric

  reply	other threads:[~2016-06-02 16:31 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-01  7:52 [RFC PATCH 0/4] Make inotify instance/watches be accounted per userns Nikolay Borisov
2016-06-01  7:52 ` [PATCH 1/4] inotify: Add infrastructure to account inotify limits per-namespace Nikolay Borisov
2016-06-06  8:05   ` Cyrill Gorcunov
2016-06-06  9:26     ` Nikolay Borisov
2016-06-01  7:52 ` [PATCH 2/4] inotify: Convert inotify limits to be accounted per-realuser/per-namespace Nikolay Borisov
2016-06-01  7:52 ` [PATCH 3/4] misc: Rename the HASH_SIZE macro Nikolay Borisov
2016-06-01 18:13   ` David Miller
2016-06-01  7:53 ` [PATCH 4/4] inotify: Don't include inotify.h when !CONFIG_INOTIFY_USER Nikolay Borisov
2016-06-01 16:00 ` [RFC PATCH 0/4] Make inotify instance/watches be accounted per userns Eric W. Biederman
2016-06-02  6:27   ` Nikolay Borisov
2016-06-02 16:19     ` Eric W. Biederman [this message]
2016-06-02  7:49   ` Jan Kara
2016-06-02 16:58     ` Eric W. Biederman
2016-06-03 11:14       ` Nikolay Borisov
2016-06-03 20:41         ` Eric W. Biederman
2016-06-06  6:41           ` Nikolay Borisov
2016-06-06 20:00             ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y46n36no.fsf@x220.int.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=avagin@openvz.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=eparis@redhat.com \
    --cc=gorcunov@openvz.org \
    --cc=jack@suse.cz \
    --cc=john@johnmccutchan.com \
    --cc=kernel@kyup.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=operations@siteground.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).