From: Daniel Walsh <dwalsh@redhat.com>
To: Vivek Goyal <vgoyal@redhat.com>, Roman Mohr <rmohr@redhat.com>
Cc: "vromanso@redhat.com" <vromanso@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"virtio-fs@redhat.com" <virtio-fs@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
"misono.tomohiro@fujitsu.com" <misono.tomohiro@fujitsu.com>,
"mpatel@redhat.com" <mpatel@redhat.com>
Subject: Re: [PATCH v2 3/3] virtiofsd: probe unshare(CLONE_FS) and print an error
Date: Tue, 28 Jul 2020 15:12:54 -0400
Message-ID: <e982e87a-d5a8-264d-f591-0f1523464c97@redhat.com>
In-Reply-To: <20200728131250.GB78409@redhat.com>
On 7/28/20 09:12, Vivek Goyal wrote:
> On Tue, Jul 28, 2020 at 12:00:20PM +0200, Roman Mohr wrote:
>> On Tue, Jul 28, 2020 at 3:07 AM misono.tomohiro@fujitsu.com <misono.tomohiro@fujitsu.com> wrote:
>>
>>>> Subject: [PATCH v2 3/3] virtiofsd: probe unshare(CLONE_FS) and print an error
>>>> An assertion failure is raised during request processing if
>>>> unshare(CLONE_FS) fails. Implement a probe at startup so the problem can
>>>> be detected right away.
>>>>
>>>> Unfortunately Docker/Moby does not include unshare in the seccomp.json
>>>> list unless CAP_SYS_ADMIN is given. Other seccomp.json lists always
>>>> include unshare (e.g. podman is unaffected):
>>>>
>>>> https://raw.githubusercontent.com/seccomp/containers-golang/master/seccomp.json
>>>> Use "docker run --security-opt seccomp=path/to/seccomp.json ..." if the
>>>> default seccomp.json is missing unshare.
>>> Hi, sorry for being a bit late.
>>>
>>> unshare() was added to fix an xattr problem:
>>>
>>> https://github.com/qemu/qemu/commit/bdfd66788349acc43cd3f1298718ad491663cfcc#
>>> In theory we don't need to call unshare() if xattr is disabled, but it is
>>> hard to know in fv_queue_worker() whether xattr is enabled or disabled,
>>> right?
>>>
>>>
>> In kubevirt we want to run virtiofsd in containers. We would already not
>> have xattr support for e.g. overlayfs in the VM after this patch series (an
>> acceptable con at least for us right now).
>> If we can get rid of the unshare (and potentially of needing root) that
>> would be great. We always assume that everything which we run in containers
>> should work for cri-o and docker.
> But cri-o and docker containers run as root, don't they? (Or at least have
> the capability to run as root.) Having said that, it would be nice to be able
> to run virtiofsd without root.
>
> There are a few hurdles though.
>
> - For file creation, we switch uid/gid (seteuid/setegid) and that seems
> to require root. If we were to run unprivileged, probably all files
> on the host would have to be owned by the unprivileged user, and the
> guest-visible uid/gid would have to be stored in xattrs. I think virtfs
> supports something similar.
>
> I am sure there are other restrictions but this probably is the biggest
> one to overcome.
>
>
You should be able to run it within a user namespace, with namespaced
capabilities.
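A minimal sketch of what that could look like, assuming an unprivileged,
single-threaded process and a kernel that permits unprivileged user
namespaces. The helper names (write_proc_file, enter_userns_as_root) are
made up for illustration and are not part of virtiofsd. It maps only the
caller's own uid/gid to 0 inside the namespace; mapping a range of ids
would need newuidmap/newgidmap or a privileged helper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a short string to a /proc file. Returns 0 or a negative errno. */
static int write_proc_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY);
    if (fd == -1) {
        return -errno;
    }

    ssize_t len = (ssize_t)strlen(text);
    ssize_t n = write(fd, text, (size_t)len);
    /* Capture the result before close() can overwrite errno. */
    int err = (n == len) ? 0 : (n == -1 ? -errno : -EIO);

    close(fd);
    return err;
}

/*
 * Hypothetical helper: enter a new user namespace in which the caller's
 * uid/gid are mapped to 0. Returns 0 or a negative errno.
 */
static int enter_userns_as_root(void)
{
    /* Read the ids before unshare(); they are unmapped afterwards. */
    uid_t uid = geteuid();
    gid_t gid = getegid();
    char map[64];
    int ret;

    if (unshare(CLONE_NEWUSER) == -1) {
        return -errno;
    }

    /* setgroups must be denied before an unprivileged gid_map write. */
    ret = write_proc_file("/proc/self/setgroups", "deny");
    if (ret < 0) {
        return ret;
    }

    snprintf(map, sizeof(map), "0 %u 1", (unsigned)uid);
    ret = write_proc_file("/proc/self/uid_map", map);
    if (ret < 0) {
        return ret;
    }

    snprintf(map, sizeof(map), "0 %u 1", (unsigned)gid);
    return write_proc_file("/proc/self/gid_map", map);
}
```

After a successful call the process holds a full capability set, but only
relative to the new namespace. With the single-id map above only uid/gid 0
is usable, so whether this is enough for virtiofsd's uid/gid switching
depends on how many ids get mapped.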
>> "Just" pointing docker to a different seccomp.json file is something which
>> k8s users/admin in many cases can't do.
> Or maybe the issue is that the standard seccomp.json does not allow unshare()
> and hence you are forced to use a non-standard seccomp.json.
>
> Vivek
>
>> Best Regards,
>> Roman
>>
>>
>>> So, it looks good to me.
>>> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
>>>
>>> Regards,
>>> Misono
>>>
>>>> Cc: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
>>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>> ---
>>>> tools/virtiofsd/fuse_virtio.c | 16 ++++++++++++++++
>>>> 1 file changed, 16 insertions(+)
>>>>
>>>> diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
>>>> index 3b6d16a041..9e5537506c 100644
>>>> --- a/tools/virtiofsd/fuse_virtio.c
>>>> +++ b/tools/virtiofsd/fuse_virtio.c
>>>> @@ -949,6 +949,22 @@ int virtio_session_mount(struct fuse_session *se)
>>>>  {
>>>>      int ret;
>>>>
>>>> +    /*
>>>> +     * Test that unshare(CLONE_FS) works. fv_queue_worker() will need it. It's
>>>> +     * an unprivileged system call but some Docker/Moby versions are known to
>>>> +     * reject it via seccomp when CAP_SYS_ADMIN is not given.
>>>> +     *
>>>> +     * Note that the program is single-threaded here so this syscall has no
>>>> +     * visible effect and is safe to make.
>>>> +     */
>>>> +    ret = unshare(CLONE_FS);
>>>> +    if (ret == -1 && errno == EPERM) {
>>>> +        fuse_log(FUSE_LOG_ERR, "unshare(CLONE_FS) failed with EPERM. If "
>>>> +                 "running in a container please check that the container "
>>>> +                 "runtime seccomp policy allows unshare.\n");
>>>> +        return -1;
>>>> +    }
>>>> +
>>>>      ret = fv_create_listen_socket(se);
>>>>      if (ret < 0) {
>>>>          return ret;
>>>> --
>>>> 2.26.2
>>>
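For anyone who wants to check a container runtime before deploying, the
startup probe from the quoted patch can be approximated outside virtiofsd.
This is only a sketch: probe_unshare_clone_fs is a made-up name, and it
returns the errno instead of logging through fuse_log() as the patch does.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical standalone version of the probe. Returns 0 if
 * unshare(CLONE_FS) is permitted, or a negative errno otherwise
 * (-EPERM is what a seccomp policy that rejects unshare typically
 * produces).
 *
 * In a single-threaded process the call has no visible effect, since
 * there is no other task sharing the filesystem attributes.
 */
static int probe_unshare_clone_fs(void)
{
    if (unshare(CLONE_FS) == -1) {
        return -errno;
    }
    return 0;
}
```

Call it once at startup, before any threads are created, and treat -EPERM
as a hint to look at the runtime's seccomp profile.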