From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mx2.suse.de ([195.135.220.15]:55027 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932136AbcDDIeN (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Mon, 4 Apr 2016 04:34:13 -0400
Date: Thu, 31 Mar 2016 13:17:34 +0200
From: Jan Kara <jack@suse.cz>
To: Steve Grubb <sgrubb@redhat.com>
Cc: Jan Kara <jack@suse.cz>, Eric Sandeen <sandeen@redhat.com>,
	fsdevel <linux-fsdevel@vger.kernel.org>, eparis@redhat.com
Subject: Re: [PATCH 1/1] fanotify: pre-approve listener's OPEN_PERM access
 requests
Message-ID: <20160331111734.GA6651@quack.suse.cz>
References: <56A7FF64.1050301@redhat.com>
 <20160128135611.GJ7726@quack.suse.cz>
 <2101761.USnoJ1A03o@x2>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <2101761.USnoJ1A03o@x2>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Wed 30-03-16 14:47:31, Steve Grubb wrote:
> On Thursday, January 28, 2016 02:56:11 PM Jan Kara wrote:
> > On Tue 26-01-16 17:21:08, Eric Sandeen wrote:
> > > From: Steve Grubb <sgrubb@redhat.com>
> > > 
> > > If a daemon using FANOTIFY needs to open a file on a watched filesystem
> > > and
> > > its wanting OPEN_PERM events, we get deadlock. (This could happen because
> > > of a library the daemon is using suddenly decides it needs to look in a
> > > new
> > > file.) Even though the man page says that the daemon should approve its
> > > own
> > > access decision, it really can't. If its in the middle of working on a
> > > request and that in turn generates another request, the second request is
> > > going to sit in the queue inside the kernel and not be read because the
> > > daemon is waiting on a library call that will never finish. We also have
> > > no
> > > idea how many requests are stacked up before we get to it. So, it really
> > > can't approve its own access requests.
> > > 
> > > The solution is to assume that the daemon is going to approve its own file
> > > access requests. So, any requested access that matches the pid of the
> > > program receiving fanotify events should be pre-approved in the kernel
> > > and not sent to user space for approval. This should prevent deadlock.
> > > 
> > > This behavior only exists if FAN_SELF_APPROVE is in the flags at
> > > fanotify_init() time.
> > > 
> > > [Eric Sandeen: Make behavior contingent on fanotify_init flag]
> > > 
> > > Signed-off-by: Steve Grubb <sgrubb@redhat.com>
> > > Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> > > ---
> > > 
> > > Resending this; first submission to lkml generated no responses, but
> > > offline Eric Paris indicated that the original patch was "policy in the
> > > kernel," so I'll see what people think of making it contingent on an
> > > fanotify_init flag at syscall time.
> > 
> > AKPM is merging fanotify patches these days so it is good to CC him. I'm
> > keeping eye on the notification infrastructure so you can CC me when you
> > need an opinion.
> > 
> > Now to the patch: This solution really looks half baked to me. What if
> > there are two processes mediating the access?
> 
> While this is certainly possible, is this actually done in real life?

Honestly, I don't know. But it doesn't seem like too exotic configuration
to me. Think for example about containers. It just seems as a bad design to
restrict fanotify permission events to: "only one process in the system
can safely use this". That has always proven to be a bad idea in the long
run.

> > You'll get the deadlock again: We have processes A and B mediating access. A
> > accesses some file f, A selfapproves the event for itself but the access is
> > still blocked on the approval from B. B proceeds to process the access
> > request by A. It accesses some file which generates the permission event -
> > now B is blocked waiting for A to approve its event. Dang.
> > 
> > This really resembles a situation when e.g. multipathd must be damn carful
> > not to generate any IO while it is running lest it deadlocks while
> > reconfiguring paths. So here you have to be damn careful not to generate
> > any events when processing fanotify permission events... And I know that is
> > inconvenient but that's how things were designed and duck taping some
> > special cases IMHO is not acceptable.
> 
> The issue here is that any relatively interesting program will have several 
> libraries who could at any time decide it needs to open a file. Perhaps even 
> /etc/hosts for network name resolution. You really don't know what the third 
> party libraries might do and the consequences are pretty severe - deadlock.
> 
> You could make it multi-threaded and have one thread dequeuing approval 
> requests and another servicing them. But...each request has a live descriptor 
> that counts towards the rlimit for max open descriptors. Hitting that is bad 
> also.

I agree this reduces the attractivity of the two thread design.
 
> The simplest solution is to assume that the daemon is going to approve
> its own request. It would never refuse its own request, should it?

I agree this is a solution which fixes your usecase but I don't think it is
good enough for general use. I won't accept anything that doesn't work for
several users of fanotify permission events. As I wrote below in my
original email, you can use for example capabilities to make things work for
several processes without too big hassle...

> > One solution which would look reasonably clean to me would be that a
> > process could have a capability which when set would mean that accesses by
> > that process would not generate fanotify permission events. This would
> > really avoid the deadlocks. But then botnet / virus writers would quickly
> > learn to set this capability for their processes so I'm not sure if this
> > doesn't somewhat defeat the purpose of permission events. OTOH setting
> > this capability would be gated by CAP_SYS_ADMIN anyway and once you have
> > this you have other ways to circumvent any protection so it is not
> > catastrophical. But still it makes life easier for the bad guys.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR