On Fri, Jul 31, 2020 at 10:39:37AM +0200, Roman Mohr wrote:
> On Fri, Jul 31, 2020 at 10:26 AM Stefan Hajnoczi wrote:
>
> > On Thu, Jul 30, 2020 at 06:21:34PM -0400, Daniel Walsh wrote:
> > > On 7/29/20 10:40, Stefan Hajnoczi wrote:
> > > > On Wed, Jul 29, 2020 at 09:59:01AM +0200, Roman Mohr wrote:
> > > >> On Tue, Jul 28, 2020 at 3:13 PM Vivek Goyal wrote:
> > > >>
> > > >>> On Tue, Jul 28, 2020 at 12:00:20PM +0200, Roman Mohr wrote:
> > > >>>> On Tue, Jul 28, 2020 at 3:07 AM misono.tomohiro@fujitsu.com <
> > > >>>> misono.tomohiro@fujitsu.com> wrote:
> > > >>>>
> > > >>>>>> Subject: [PATCH v2 3/3] virtiofsd: probe unshare(CLONE_FS) and print an error
> > > >> Yes they can run as root. I can tell you what we plan to do with
> > > >> the containerized virtiofsd: We run it as part of the user-owned
> > > >> pod (a set of containers).
> > > >> One of our main goals at the moment is to run VMs in a user-owned
> > > >> pod without additional privileges.
> > > >> So that in case the user (VM-creator/owner) enters the pod or
> > > >> something breaks out of the VM they are just in the unprivileged
> > > >> container sandbox.
> > > >> As part of that we try to get also rid of running containers in
> > > >> the user-context with the root user.
> > > >>
> > > >> One possible scenario which I could think of as being desirable
> > > >> from a kubevirt perspective:
> > > >> We would run the VM in one container and have an unprivileged
> > > >> virtiofsd container in parallel.
> > > >> This container already has its own mount namespace and it is not
> > > >> that critical if something manages to enter this sandbox.
> > > >>
> > > >> But we are not as far yet as getting completely rid of root right
> > > >> now in kubevirt, so if as a temporary step it needs root, the
> > > >> current proposed changes would still be very useful for us.
> > > > What is the issue with root in user namespaces?
> > > >
> > > > I remember a few years ago it was seen as a major security issue
> > > > but don't remember if container runtimes were already using user
> > > > namespaces back then.
> > > >
> > > > I guess the goal might be simply to minimize Linux capabilities as
> > > > much as possible?
> > > >
> > > > virtiofsd could nominally run with an arbitrary uid/gid but it
> > > > still needs the Linux capabilities that allow it to change uid/gid
> > > > and override file system permission checks just like the root
> > > > user. Not sure if there is any advantage to running with uid 1000
> > > > when you still have these Linux capabilities.
> > > >
> > > > Stefan
> > >
> > > When you run in a user namespace, virtiofsd would only have
> > > setuid/setgid over the range of UIDs mapped into the user namespace.
> > > So if UID=0 on the host is not mapped, then the container can not
> > > create real UID=0 files on disk.
> > >
> > > Similarly you can protect the user directories and any content by
> > > running the containers in a really high UID Mapping.
> >
> > Roman, do user namespaces address your concerns about uid 0 in
> > containers?
> >
>
> They may eventually solve it. I would not let us hang up on this right
> now, since as said at least in kubevirt we can't get rid right now of
> root anyway.
> Even if it is at some point in the future safe and supported on
> bleeding-edge managed k8s clusters to allow ordinary users to run with
> uid 0, from my perspective it is right now common to restrict
> namespaces with PodSecurityPolicies or SecurityContexts to not allow
> running pods as root for normal users.
> It is also common that a significant part of the community users run
> docker and/or run on managed k8s clusters where they can not influence
> if user-namespaces are enabled, if they can run pods as root, if the
> runtime points to a seccomp file they like or if the runtime they
> prefer is used.
>
> But let me repeat again that we require root right now anyway and that
> we don't run the pods right now with the user privileges (but we should
> and we aim for that). Right now PSPs and SCCs restrict access to these
> pods by the users.
> So for our use case, at this exact moment root is acceptable, the
> unshare call is a little bit more problematic.

Okay, thanks for explaining.

Stefan