From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932419AbcFBHtZ (ORCPT );
	Thu, 2 Jun 2016 03:49:25 -0400
Received: from mx2.suse.de ([195.135.220.15]:51880 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932175AbcFBHtY (ORCPT );
	Thu, 2 Jun 2016 03:49:24 -0400
Date: Thu, 2 Jun 2016 09:49:20 +0200
From: Jan Kara
To: "Eric W. Biederman"
Cc: Nikolay Borisov, john@johnmccutchan.com, eparis@redhat.com,
	jack@suse.cz, linux-kernel@vger.kernel.org, gorcunov@openvz.org,
	avagin@openvz.org, netdev@vger.kernel.org,
	operations@siteground.com, Linux Containers
Subject: Re: [RFC PATCH 0/4] Make inotify instance/watches be accounted per userns
Message-ID: <20160602074920.GG19636@quack2.suse.cz>
References: <1464767580-22732-1-git-send-email-kernel@kyup.com>
	<8737ow7vcp.fsf@x220.int.ebiederm.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <8737ow7vcp.fsf@x220.int.ebiederm.org>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 01-06-16 11:00:06, Eric W. Biederman wrote:
> Cc'd the containers list.
> 
> Nikolay Borisov writes:
> 
> > Currently the inotify instances/watches are being accounted in the
> > user_struct structure. This means that in setups where multiple
> > users in unprivileged containers map to the same underlying
> > real user (e.g. user_struct), the inotify limits are going to be
> > shared as well, which can lead to unpleasantness. This is a problem
> > since any user inside any of the containers can potentially exhaust
> > the instance/watch limit, which in turn might prevent certain
> > services in other containers from starting.
> 
> On a high level this is a bit problematic, as it appears to escape the
> current limits and allows anyone creating a user namespace to have their
> own fresh set of limits.
> Given that anyone should be able to create a
> user namespace whenever they feel like it, escaping limits is a problem.
> That, however, is solvable.
> 
> A practical question: what kind of limits are we looking at here?
> 
> Are these loose limits for detecting buggy programs that have gone
> off the rails?
> 
> Are these tight limits to ensure multitasking is possible?

The original motivation for these limits is to bound resource usage. There
is an in-kernel data structure associated with each notification mark you
create, and we don't want users to be able to DoS the system by creating
too many of them. Thus we limit the number of notification marks for each
user. There is also a limit on the number of notification instances -
those are naturally limited by the number of open file descriptors, but an
admin may want to limit them further... So cgroups would probably be the
best fit for this, but I'm not sure whether that isn't overkill...

								Honza
-- 
Jan Kara
SUSE Labs, CR