From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932292AbeARK3d (ORCPT <rfc822;w@1wt.eu>);
        Thu, 18 Jan 2018 05:29:33 -0500
Received: from mail-pg0-f65.google.com ([74.125.83.65]:38233 "EHLO
        mail-pg0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1755290AbeARK3a (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 18 Jan 2018 05:29:30 -0500
X-Google-Smtp-Source: ACJfBosDJbG3ub0ITJO4bf5iOM+WinZsBOKUdUbVYqleDM5cbok0beBVjdP5334dBLxV0+XFn7hvgnapHrJpL+rYJFo=
MIME-Version: 1.0
In-Reply-To: <20180117193124.GC3723@ubuntu-xps13>
References: <cover.1512741134.git.dongsu@kinvolk.io> <c85c293e19a478353aba8e6e3ee39e5914f798d5.1512041070.git.dongsu@kinvolk.io>
 <CADZs7q5NA7Kox62vnCOkL=TGgzTxX+oNYz6=oNXKWkQkQwSMrA@mail.gmail.com>
 <20180117142935.GA3723@ubuntu-xps13> <CADZs7q6ZHGHbrdL96Bmy148Zc6TxruiJrEeDjaDYEX8U-5QV1A@mail.gmail.com>
 <20180117193124.GC3723@ubuntu-xps13>
From: Alban Crequy <alban@kinvolk.io>
Date: Thu, 18 Jan 2018 11:29:29 +0100
Message-ID: <CADZs7q45--wKbWmuANHfTHbMYo1thQgtZ1u1Yg3LWpU4BNd-TQ@mail.gmail.com>
Subject: Re: [PATCH 08/11] fuse: Support fuse filesystems outside of init_user_ns
To: Seth Forshee <seth.forshee@canonical.com>
Cc: Dongsu Park <dongsu@kinvolk.io>, linux-kernel@vger.kernel.org,
        containers@lists.linux-foundation.org,
        "Eric W . Biederman" <ebiederm@xmission.com>,
        Miklos Szeredi <mszeredi@redhat.com>,
        Sargun Dhillon <sargun@sargun.me>, linux-fsdevel@vger.kernel.org,
        Tejun Heo <tj@kernel.org>, David Herrmann <dh.herrmann@googlemail.com>,
        Tom Gundersen <teg@jklm.no>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 17, 2018 at 8:31 PM, Seth Forshee
<seth.forshee@canonical.com> wrote:
> On Wed, Jan 17, 2018 at 07:56:59PM +0100, Alban Crequy wrote:
>> On Wed, Jan 17, 2018 at 3:29 PM, Seth Forshee
>> <seth.forshee@canonical.com> wrote:
>> > On Wed, Jan 17, 2018 at 11:59:06AM +0100, Alban Crequy wrote:
>> >> [Adding Tejun, David, Tom for question about cuse]
>> >>
>> >> On Fri, Dec 22, 2017 at 3:32 PM, Dongsu Park <dongsu@kinvolk.io> wrote:
>> >> > From: Seth Forshee <seth.forshee@canonical.com>
>> >> >
>> >> > In order to support mounts from namespaces other than
>> >> > init_user_ns, fuse must translate uids and gids to/from the
>> >> > userns of the process servicing requests on /dev/fuse. This
>> >> > patch does that, with a couple of restrictions on the namespace:
>> >> >
>> >> >  - The userns for the fuse connection is fixed to the namespace
>> >> >    from which /dev/fuse is opened.
>> >> >
>> >> >  - The namespace must be the same as s_user_ns.
>> >> >
>> >> > These restrictions simplify the implementation by avoiding the
>> >> > need to pass around userns references and by allowing fuse to
>> >> > rely on the checks in inode_change_ok for ownership changes.
>> >> > Either restriction could be relaxed in the future if needed.
>> >> >
>> >> > For cuse the namespace used for the connection is also simply
>> >> > current_user_ns() at the time /dev/cuse is opened.
>> >>
>> >> Was a use case discussed for using cuse in a new unprivileged userns?
>> >>
>> >> I ran some tests yesterday with cusexmp [1] and I could add a new char
>> >> device as an unprivileged user with:
>> >>
>> >> $ unshare -U -r -m sh -c 'mount --bind /mnt/cuse /dev/cuse ; cusexmp
>> >> --maj=99 --min=30 --name=foo'
>> >>
>> >> where /mnt/cuse is previously mknod'ed correctly and chmod'ed 777.
>> >> Then, I could see the new device:
>> >>
>> >> $ cat /proc/devices | grep foo
>> >>  99 foo
>> >>
>> >> On normal distros, we don't have a /mnt/cuse chmod'ed 777 but still it
>> >> seems dangerous if the dev node can be provided otherwise and if we
>> >> don't have a use case for it.
>> >>
>> >> Thoughts?
>> >
>> > I can't remember the specific reasons, but I had concluded that letting
>> > unprivileged users use cuse within a user namespace isn't safe. But
>> > having a cuse device node usable by regular users at all is equally
>> > unsafe I suspect,
>>
>> This makes sense.
>>
>> > so I don't think your example demonstrates any problem
>> > specific to user namespaces. There shouldn't be any way to use a user
>> > namespace to gain access permissions towards /dev/cuse, otherwise we
>> > have bigger problems than cuse to worry about.
>>
>> From my tests, the patch seems safe but I don't fully understand why that is.
>>
>> I am not trying to gain more permissions towards /dev/cuse but to
>> create another cuse char file from within the unprivileged userns. I
>> tested the scenario by patching the memfs userspace FUSE driver to
>> generate the char device whenever the file is named "cuse" (turning
>> the regular file into a char device with the cuse major/minor behind
>> the scenes):
>>
>> $ unshare -U -r -m
>> # memfs /mnt/memfs &
>> # ls -l /mnt/memfs
>> # echo -n > /mnt/memfs/cuse
>> -bash: /mnt/memfs/cuse: Input/output error
>> # ls -l /mnt/memfs/cuse
>> crwxrwxrwx. 1 root root 10, 203 Jan 17 18:24 /mnt/memfs/cuse
>> # cat /mnt/memfs/cuse
>> cat: /mnt/memfs/cuse: Permission denied
>>
>> But then, I could not use that char device, even though it seems to
>> have the correct major/minor and permissions. The kernel FUSE code
>> seems to call init_special_inode() to handle character devices. I
>> don't understand why it seems to be safe.
>
> Because for new mounts in non-init user namespaces alloc_super() sets
> the SB_I_NODEV flag in s_iflags, which disallows opening device nodes
> in that filesystem.

I see. Thanks for the explanation!
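
For the archives, here is how I read that explanation, written as a rough
userspace model rather than the actual kernel code. The toy_* names, the
structs and the flag values are made up for illustration. Only
alloc_super(), SB_I_NODEV and the "Permission denied" (EACCES) result come
from this thread. Folding the nodev mount option (MNT_NODEV) into the same
check is my assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag values; the real definitions live in the kernel headers. */
#define SB_I_NODEV 0x4u	/* superblock: device nodes on this fs are ignored */
#define MNT_NODEV  0x2u	/* mount: mounted with the nodev option */

enum file_kind { KIND_REG, KIND_CHR, KIND_BLK };

/* Toy stand-ins for struct super_block and struct vfsmount. */
struct toy_sb  { unsigned int s_iflags; };
struct toy_mnt { unsigned int mnt_flags; struct toy_sb *sb; };

/* Models alloc_super(): a superblock created outside init_user_ns gets SB_I_NODEV. */
static void toy_alloc_super(struct toy_sb *sb, bool in_init_user_ns)
{
	sb->s_iflags = 0;
	if (!in_init_user_ns)
		sb->s_iflags |= SB_I_NODEV;
}

/* Models the device check at open time: 0 if allowed, -EACCES if refused. */
static int toy_may_open(const struct toy_mnt *mnt, enum file_kind kind)
{
	bool is_dev = (kind == KIND_CHR || kind == KIND_BLK);
	bool nodev = (mnt->mnt_flags & MNT_NODEV) ||
		     (mnt->sb->s_iflags & SB_I_NODEV);

	if (is_dev && nodev)
		return -EACCES;
	return 0;
}
```

With a superblock set up by toy_alloc_super(&sb, false), toy_may_open()
returns -EACCES for KIND_CHR and 0 for KIND_REG, which matches the failing
cat on /mnt/memfs/cuse in my memfs test even though the mode bits and the
major/minor looked right.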