On Fri, Jul 31, 2020 at 10:26 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
On Thu, Jul 30, 2020 at 06:21:34PM -0400, Daniel Walsh wrote:
> On 7/29/20 10:40, Stefan Hajnoczi wrote:
> > On Wed, Jul 29, 2020 at 09:59:01AM +0200, Roman Mohr wrote:
> >> On Tue, Jul 28, 2020 at 3:13 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >>
> >>> On Tue, Jul 28, 2020 at 12:00:20PM +0200, Roman Mohr wrote:
> >>>> On Tue, Jul 28, 2020 at 3:07 AM misono.tomohiro@fujitsu.com <
> >>>> misono.tomohiro@fujitsu.com> wrote:
> >>>>
> >>>>>> Subject: [PATCH v2 3/3] virtiofsd: probe unshare(CLONE_FS) and print an error
> >> Yes they can run as root. I can tell you what we plan to do with the
> >> containerized virtiofsd: We run it as part of the user-owned pod (a set of
> >> containers).
> >> One of our main goals at the moment is to run VMs in a user-owned pod
> >> without additional privileges.
> >> So that in case the user (VM-creator/owner) enters the pod or something
> >> breaks out of the VM they are just in the unprivileged container sandbox.
> >> As part of that we try to get also rid of running containers in the
> >> user-context with the root user.
> >>
> >> One possible scenario which I could think of as being desirable from a
> >> kubevirt perspective:
> >> We would run the VM in one container and have an unprivileged
> >> virtiofsd container in parallel.
> >> This container already has its own mount namespace and it is not that
> >> critical if something manages to enter this sandbox.
> >>
> >> But we are not as far yet as getting completely rid of root right now in
> >> kubevirt, so if as a temporary step it needs root, the current proposed
> >> changes would still be very useful for us.
> > What is the issue with root in user namespaces?
> >
> > I remember a few years ago it was seen as a major security issue but
> > don't remember if container runtimes were already using user namespaces
> > back then.
> >
> > I guess the goal might be simply to minimize Linux capabilities as much
> > as possible?
> >
> > virtiofsd could nominally run with an arbitrary uid/gid but it still
> > needs the Linux capabilities that allow it to change uid/gid and
> > override file system permission checks just like the root user. Not sure
> > if there is any advantage to running with uid 1000 when you still have
> > these Linux capabilities.
> >
> > Stefan
>
> When you run in a user namespace, virtiofsd would only have
> setuid/setgid over the range of UIDs mapped into the user namespace.  So
> if UID=0 on the host is not mapped, then the container can not create
> real UID=0 files on disk.
>
> Similarly you can protect the user directories and any content by
> running the containers in a really high UID Mapping.

Roman, do user namespaces address your concerns about uid 0 in
containers?

They may eventually solve it. I would not get hung up on this right now since, as I said, at least in kubevirt we can't get rid of root right now anyway.
Even if at some point in the future it is safe and supported on bleeding-edge managed k8s clusters to allow ordinary users to run with uid 0, from my perspective it is currently common to restrict namespaces with PodSecurityPolicies or SecurityContexts so that normal users cannot run pods as root.
It is also common for a significant part of the community to run docker and/or managed k8s clusters where they cannot influence whether user namespaces are enabled, whether they can run pods as root, whether the runtime points to a seccomp file they like, or whether their preferred runtime is used.

But let me repeat that we require root right now anyway, and that we do not yet run the pods with the user's privileges (but we should, and we aim for that). Right now PSPs and SCCs restrict users' access to these pods.
So for our use case, root is acceptable at this exact moment; the unshare call is a little more problematic.

Best Regards,
Roman

 

Stefan