* virtiofsd: Any reason why there's not an "openat2" sandbox mode?
@ 2022-09-09 21:24 Colin Walters
  2022-09-27 16:37   ` [Virtio-fs] " Vivek Goyal
  0 siblings, 1 reply; 21+ messages in thread
From: Colin Walters @ 2022-09-09 21:24 UTC (permalink / raw)
  To: qemu-devel

We previously had a chat here https://lore.kernel.org/all/348d4774-bd5f-4832-bd7e-a21491fdac8d@www.fastmail.com/T/
around virtiofsd and privileges and the case of trying to run virtiofsd inside an unprivileged (Kubernetes) container.

Right now we're still using 9p, and it has bugs (basically it seems like the 9p inode flushing callback tries to allocate memory to send an RPC, and this causes OOM problems)
https://github.com/coreos/coreos-assembler/issues/1812

Coming back to this... recent Linux kernels support strongly isolated filesystem access via openat2():
https://lwn.net/Articles/796868/

Is there any reason we couldn't add an -o sandbox=openat2 mode?  It would operate without any privileges at all, and should be usable (and secure enough) for our use case.

I may try a patch if this sounds OK...


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-09 21:24 virtiofsd: Any reason why there's not an "openat2" sandbox mode? Colin Walters
@ 2022-09-27 16:37   ` Vivek Goyal
  0 siblings, 0 replies; 21+ messages in thread
From: Vivek Goyal @ 2022-09-27 16:37 UTC (permalink / raw)
  To: Colin Walters; +Cc: qemu-devel, virtio-fs-list, German Maglione, Sergio Lopez

On Fri, Sep 09, 2022 at 05:24:03PM -0400, Colin Walters wrote:
> We previously had a chat here https://lore.kernel.org/all/348d4774-bd5f-4832-bd7e-a21491fdac8d@www.fastmail.com/T/
> around virtiofsd and privileges and the case of trying to run virtiofsd inside an unprivileged (Kubernetes) container.
> 
> Right now we're still using 9p, and it has bugs (basically it seems like the 9p inode flushing callback tries to allocate memory to send an RPC, and this causes OOM problems)
> https://github.com/coreos/coreos-assembler/issues/1812
> 
> Coming back to this...as of lately in Linux, there's support for strongly isolated filesystem access via openat2():
> https://lwn.net/Articles/796868/
> 
> Is there any reason we couldn't do an -o sandbox=openat2 ?  This operates without any privileges at all, and should be usable (and secure enough) in our use case.

[ cc virtio-fs-list, german, sergio ]

Hi Colin,

Using openat2(RESOLVE_IN_ROOT) (if the kernel is new enough) sounds like a
good idea. We have talked about it a few times, but nobody ever wrote a
patch to implement it.

And it probably makes sense with all the sandboxes (chroot(), namespaces).

I wonder whether it should be a new sandbox mode at all.
It should probably be the default if the kernel offers the openat2() syscall.
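
As a rough illustration of "if the kernel offers openat2()", a runtime probe could look like the sketch below. It is written in Python with ctypes for brevity, not in virtiofsd's own Rust. The syscall number 437 is an assumption that holds on x86_64 and arm64 (among others), and the helper name openat2_available is hypothetical.

```python
import ctypes
import os
import sys

SYS_OPENAT2 = 437        # assumption: syscall number on x86_64/arm64
RESOLVE_IN_ROOT = 0x10   # value from linux/openat2.h
AT_FDCWD = -100          # "relative to the current directory"


class OpenHow(ctypes.Structure):
    """Mirror of struct open_how from linux/openat2.h."""
    _fields_ = [
        ("flags", ctypes.c_uint64),
        ("mode", ctypes.c_uint64),
        ("resolve", ctypes.c_uint64),
    ]


def openat2_available() -> bool:
    """Return True if openat2(RESOLVE_IN_ROOT) succeeds on this system."""
    if not sys.platform.startswith("linux"):
        return False
    libc = ctypes.CDLL(None, use_errno=True)
    how = OpenHow(
        flags=os.O_RDONLY | os.O_DIRECTORY | os.O_CLOEXEC,
        mode=0,
        resolve=RESOLVE_IN_ROOT,
    )
    fd = libc.syscall(
        ctypes.c_long(SYS_OPENAT2),
        ctypes.c_int(AT_FDCWD),
        b".",
        ctypes.byref(how),
        ctypes.c_size_t(ctypes.sizeof(how)),
    )
    if fd < 0:
        # ENOSYS on old kernels; a seccomp filter may also reject the call.
        return False
    os.close(fd)
    return True
```

A daemon could run such a probe once at startup and fall back to its existing path handling when it returns False.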

All development has now moved to the Rust virtiofsd:

https://gitlab.com/virtio-fs/virtiofsd

The C version of virtiofsd is only receiving small critical fixes.

The Rust version also allows running unprivileged (inside a user namespace).
German is also working on allowing it to run unprivileged without
user namespaces, but this will not allow arbitrary uid/gid switching.

https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/136

If one wants to run unprivileged and also do arbitrary uid/gid switching,
then one needs to use user namespaces and map a range of subuid/subgid
into the user namespace virtiofsd is running in.
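
To make the subuid/subgid mapping concrete, here is a small sketch of how an /etc/subuid entry translates into the text of a uid_map. The function names are hypothetical, and mapping ID 0 inside the namespace to the caller's own ID is just one common layout, assumed here for illustration. An unprivileged process generally cannot write a multi-range map itself; a helper such as newuidmap is normally used for that.

```python
from typing import Tuple


def parse_subid_line(line: str) -> Tuple[str, int, int]:
    """Parse one /etc/subuid or /etc/subgid entry: 'name:start:count'."""
    name, start, count = line.strip().split(":")
    return name, int(start), int(count)


def build_id_map(own_id: int, sub_start: int, sub_count: int) -> str:
    """Build uid_map/gid_map text with lines 'inside outside length'.

    ID 0 inside the namespace maps to the caller's own ID, and IDs
    1..sub_count map onto the delegated subordinate range.
    """
    return f"0 {own_id} 1\n1 {sub_start} {sub_count}\n"


# Example with made-up values: user 'alice' has uid 1000.
_, start, count = parse_subid_line("alice:100000:65536")
example_map = build_id_map(1000, start, count)
```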

If possible, please try to use the Rust virtiofsd for your situation. It's
already packaged for Fedora.

Coming back to the original idea of using openat2(), I think we should
probably give it a try in the Rust virtiofsd, and if it works, it should
work across all the sandboxing modes.

Thanks
Vivek

> 
> I may try a patch if this sounds OK...
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-27 16:37   ` [Virtio-fs] " Vivek Goyal
@ 2022-09-27 16:57     ` Vivek Goyal
  -1 siblings, 0 replies; 21+ messages in thread
From: Vivek Goyal @ 2022-09-27 16:57 UTC (permalink / raw)
  To: Colin Walters; +Cc: qemu-devel, virtio-fs-list, German Maglione, Sergio Lopez

On Tue, Sep 27, 2022 at 12:37:15PM -0400, Vivek Goyal wrote:
> On Fri, Sep 09, 2022 at 05:24:03PM -0400, Colin Walters wrote:
> > We previously had a chat here https://lore.kernel.org/all/348d4774-bd5f-4832-bd7e-a21491fdac8d@www.fastmail.com/T/
> > around virtiofsd and privileges and the case of trying to run virtiofsd inside an unprivileged (Kubernetes) container.
> > 
> > Right now we're still using 9p, and it has bugs (basically it seems like the 9p inode flushing callback tries to allocate memory to send an RPC, and this causes OOM problems)
> > https://github.com/coreos/coreos-assembler/issues/1812
> > 
> > Coming back to this...as of lately in Linux, there's support for strongly isolated filesystem access via openat2():
> > https://lwn.net/Articles/796868/
> > 
> > Is there any reason we couldn't do an -o sandbox=openat2 ?  This operates without any privileges at all, and should be usable (and secure enough) in our use case.
> 
> [ cc virtio-fs-list, german, sergio ]
> 
> Hi Colin,
> 
> Using openat2(RESOLVE_IN_ROOT) (if kernel is new enough), sounds like a
> good idea. We talked about it few times but nobody ever wrote a patch to
> implement it.
> 
> And it probably makes sense with all the sandboxes (chroot(), namespaces).
> 
> I am wondering that it probably should not be a new sandbox mode at all.
> It probably should be the default if kernel offers openat2() syscall.
> 
> Now all the development has moved to rust virtiofsd.
> 
> https://gitlab.com/virtio-fs/virtiofsd
> 
> C version of virtiofsd is just seeing small critical fixes.
> 
> And rust version allows running unprivileged (inside a user namespace).
> German is also working on allowing running unprivileged without
> user namespaces but this will not allow arbitrary uid/gid switching.
> 
> https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/136
> 
> If one wants to run unprivileged and also do arbitrary uid/gid switching,
> then you need to use user namespaces and map a range of subuid/subgid
> into the user namespace virtiofsd is running in.
> 
> If possible, please try to use rust virtiofsd for your situation. It's
> already packaged for fedora.
> 
> Coming back to original idea of using openat2(), I think we should
> probably give it a try in rust virtiofsd and if it works, it should
> work across all the sandboxing modes.

Thinking more about it, enabling openat2() usage conditionally based on
some option is probably not a bad idea. I was assuming that using
openat2() by default would not break any of the existing use cases, but
I am not sure. I have burnt my fingers so many times and had to back
out default settings that enabling usage of openat2() conditionally
will probably be the safer choice. :-)

Vivek



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-27 16:57     ` [Virtio-fs] " Vivek Goyal
@ 2022-09-27 17:27       ` German Maglione
  -1 siblings, 0 replies; 21+ messages in thread
From: German Maglione @ 2022-09-27 17:27 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Colin Walters, qemu-devel, virtio-fs-list, Sergio Lopez

On Tue, Sep 27, 2022 at 6:57 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Tue, Sep 27, 2022 at 12:37:15PM -0400, Vivek Goyal wrote:
> > On Fri, Sep 09, 2022 at 05:24:03PM -0400, Colin Walters wrote:
> > > We previously had a chat here https://lore.kernel.org/all/348d4774-bd5f-4832-bd7e-a21491fdac8d@www.fastmail.com/T/
> > > around virtiofsd and privileges and the case of trying to run virtiofsd inside an unprivileged (Kubernetes) container.
> > >
> > > Right now we're still using 9p, and it has bugs (basically it seems like the 9p inode flushing callback tries to allocate memory to send an RPC, and this causes OOM problems)
> > > https://github.com/coreos/coreos-assembler/issues/1812
> > >
> > > Coming back to this...as of lately in Linux, there's support for strongly isolated filesystem access via openat2():
> > > https://lwn.net/Articles/796868/
> > >
> > > Is there any reason we couldn't do an -o sandbox=openat2 ?  This operates without any privileges at all, and should be usable (and secure enough) in our use case.
> >
> > [ cc virtio-fs-list, german, sergio ]
> >
> > Hi Colin,
> >
> > Using openat2(RESOLVE_IN_ROOT) (if kernel is new enough), sounds like a
> > good idea. We talked about it few times but nobody ever wrote a patch to
> > implement it.
> >
> > And it probably makes sense with all the sandboxes (chroot(), namespaces).
> >
> > I am wondering that it probably should not be a new sandbox mode at all.
> > It probably should be the default if kernel offers openat2() syscall.
> >
> > Now all the development has moved to rust virtiofsd.
> >
> > https://gitlab.com/virtio-fs/virtiofsd
> >
> > C version of virtiofsd is just seeing small critical fixes.
> >
> > And rust version allows running unprivileged (inside a user namespace).
> > German is also working on allowing running unprivileged without
> > user namespaces but this will not allow arbitrary uid/gid switching.
> >
> > https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/136
> >
> > If one wants to run unprivileged and also do arbitrary uid/gid switching,
> > then you need to use user namespaces and map a range of subuid/subgid
> > into the user namespace virtiofsd is running in.
> >
> > If possible, please try to use rust virtiofsd for your situation. It's
> > already packaged for fedora.
> >
> > Coming back to original idea of using openat2(), I think we should
> > probably give it a try in rust virtiofsd and if it works, it should
> > work across all the sandboxing modes.
>
> Thinking more about it, enabling openat2() usage conditionally based on
> some option probably is not a bad idea. I was assuming that using
> openat2() by default will not break any of the existing use cases. But
> I am not sure. I have burnt my fingers so many times and had to back
> out on default settings that enabling usage of openat2() conditionally
> will probably be a safer choice. :-)
>

I could work on this for the next major version and see if anything breaks.
But I would prefer to add this as a compilation feature rather than a
command-line option that we will then have to maintain for a while.

Also, I don't see it as a sandbox feature: as Stefan mentioned, a compromised
process can call openat2() without RESOLVE_IN_ROOT. I did some tests with
Landlock to lock virtiofsd inside the shared directory, but IIRC it requires
kernel 5.13 or later.

Cheers,
-- 
German



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-27 17:27       ` [Virtio-fs] " German Maglione
@ 2022-09-27 17:51         ` Colin Walters
  -1 siblings, 0 replies; 21+ messages in thread
From: Colin Walters @ 2022-09-27 17:51 UTC (permalink / raw)
  To: German Maglione, Vivek Goyal; +Cc: qemu-devel, virtio-fs-list, Sergio Lopez



On Tue, Sep 27, 2022, at 1:27 PM, German Maglione wrote:
>
>> > Now all the development has moved to rust virtiofsd.

Oh, awesome!!  The code there looks great.

> I could work on this for the next major version and see if anything breaks.
> But I prefer to add this as a compilation feature, instead of a command line
> option that we will then have to maintain for a while.

Hmm, what would be the issue with having the code there by default?  I think rather than any new command line option, we automatically use `openat2+RESOLVE_IN_ROOT` if the process is run as a nonzero uid.

> Also, I don't see it as a sandbox feature, as Stefan mentioned, a compromised
> process can call openat2() without RESOLVE_IN_ROOT. 

I'm a bit skeptical honestly about how secure the existing namespace code is against a compromised virtiofsd process.  The primary worry is guest filesystem traversals, right?  openat2+RESOLVE_IN_ROOT addresses that.  Plus being in Rust makes this dramatically safer.

> I did some test with
> Landlock to lock virtiofsd inside the shared directory, but IIRC it requires a
> kernel 5.13

But yes, landlock and other things make sense, I just don't see these things as strongly linked.  IOW we shouldn't in my opinion block unprivileged virtiofsd on more sandboxing than openat2 already gives us.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Virtio-fs] virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-27 17:51         ` [Virtio-fs] " Colin Walters
  (?)
@ 2022-09-27 20:14         ` Stefan Hajnoczi
  2022-09-28  8:33           ` Sergio Lopez
  -1 siblings, 1 reply; 21+ messages in thread
From: Stefan Hajnoczi @ 2022-09-27 20:14 UTC (permalink / raw)
  To: Colin Walters; +Cc: German Maglione, Vivek Goyal, virtio-fs-list, qemu-devel

On Tue, Sep 27, 2022 at 01:51:41PM -0400, Colin Walters wrote:
> 
> 
> On Tue, Sep 27, 2022, at 1:27 PM, German Maglione wrote:
> >
> >> > Now all the development has moved to rust virtiofsd.
> 
> Oh, awesome!!  The code there looks great.
> 
> > I could work on this for the next major version and see if anything breaks.
> > But I prefer to add this as a compilation feature, instead of a command line
> > option that we will then have to maintain for a while.
> 
> Hmm, what would be the issue with having the code there by default?  I think rather than any new command line option, we automatically use `openat2+RESOLVE_IN_ROOT` if the process is run as a nonzero uid.
> 
> > Also, I don't see it as a sandbox feature, as Stefan mentioned, a compromised
> > process can call openat2() without RESOLVE_IN_ROOT. 
> 
> I'm a bit skeptical honestly about how secure the existing namespace code is against a compromised virtiofsd process.  The primary worry is guest filesystem traversals, right?  openat2+RESOLVE_IN_ROOT addresses that.  Plus being in Rust makes this dramatically safer.
> 
> > I did some test with
> > Landlock to lock virtiofsd inside the shared directory, but IIRC it requires a
> > kernel 5.13
> 
> But yes, landlock and other things make sense, I just don't see these things as strongly linked.  IOW we shouldn't in my opinion block unprivileged virtiofsd on more sandboxing than openat2 already gives us.

I think openat2(RESOLVE_IN_ROOT) support should be added unless there is
another unprivileged mechanism that is stronger.

The security implications need to be covered in the user documentation
so people can decide whether using this mode is appropriate.

We should continue to explain the difference between a voluntary
mechanism like openat2(RESOLVE_IN_ROOT) and a mandatory mechanism like
mount namespaces with pivot_root(2). Rust programs are not immune to
arbitrary code execution, but it's less likely than with a C program.
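
The voluntary nature of the mechanism can be seen in a small experiment. The sketch below (Python with ctypes, for illustration only; syscall number 437 is assumed for x86_64/arm64, and open_in_root and demo are hypothetical names) plants a symlink that points out of a "shared" directory. A plain open relative to the directory fd follows the symlink out, while the same process using openat2(RESOLVE_IN_ROOT) is kept inside. Nothing forces the process to pick the second call.

```python
import ctypes
import errno
import os
import tempfile

SYS_OPENAT2 = 437        # assumption: syscall number on x86_64/arm64
RESOLVE_IN_ROOT = 0x10   # value from linux/openat2.h


class OpenHow(ctypes.Structure):
    """Mirror of struct open_how from linux/openat2.h."""
    _fields_ = [
        ("flags", ctypes.c_uint64),
        ("mode", ctypes.c_uint64),
        ("resolve", ctypes.c_uint64),
    ]


_libc = ctypes.CDLL(None, use_errno=True)


def open_in_root(root_fd: int, path: str, flags: int = os.O_RDONLY) -> int:
    """Open path with root_fd treated as the root directory."""
    how = OpenHow(flags=flags | os.O_CLOEXEC, mode=0, resolve=RESOLVE_IN_ROOT)
    fd = _libc.syscall(
        ctypes.c_long(SYS_OPENAT2),
        ctypes.c_int(root_fd),
        os.fsencode(path),
        ctypes.byref(how),
        ctypes.c_size_t(ctypes.sizeof(how)),
    )
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return fd


def demo():
    """Return (data read via plain open, outcome of the confined open)."""
    with tempfile.TemporaryDirectory() as outer:
        shared = os.path.join(outer, "shared")
        os.mkdir(shared)
        with open(os.path.join(outer, "secret"), "w") as f:
            f.write("host data")
        # Symlink inside the shared directory that points outside of it.
        os.symlink("../secret", os.path.join(shared, "escape"))
        root_fd = os.open(shared, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # Voluntary check skipped: a plain openat() follows the link out.
            fd = os.open("escape", os.O_RDONLY, dir_fd=root_fd)
            plain = os.read(fd, 64).decode()
            os.close(fd)
            try:
                fd = open_in_root(root_fd, "escape")
                os.close(fd)
                confined = "opened"
            except OSError as exc:
                confined = errno.errorcode.get(exc.errno, str(exc.errno))
        finally:
            os.close(root_fd)
    return plain, confined
```

On a kernel with openat2() the confined open is expected to fail with ENOENT, because "../secret" is resolved inside the shared directory. On older kernels or under a restrictive seccomp filter the call itself fails instead.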

Stefan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Virtio-fs] virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-27 20:14         ` Stefan Hajnoczi
@ 2022-09-28  8:33           ` Sergio Lopez
  2022-09-28 19:28             ` Vivek Goyal
  0 siblings, 1 reply; 21+ messages in thread
From: Sergio Lopez @ 2022-09-28  8:33 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Colin Walters, virtio-fs-list, qemu-devel, Vivek Goyal

On Tue, Sep 27, 2022 at 04:14:20PM -0400, Stefan Hajnoczi wrote:
> On Tue, Sep 27, 2022 at 01:51:41PM -0400, Colin Walters wrote:
> > 
> > 
> > On Tue, Sep 27, 2022, at 1:27 PM, German Maglione wrote:
> > >
> > >> > Now all the development has moved to rust virtiofsd.
> > 
> > Oh, awesome!!  The code there looks great.
> > 
> > > I could work on this for the next major version and see if anything breaks.
> > > But I prefer to add this as a compilation feature, instead of a command line
> > > option that we will then have to maintain for a while.
> > 
> > Hmm, what would be the issue with having the code there by default?  I think rather than any new command line option, we automatically use `openat2+RESOLVE_IN_ROOT` if the process is run as a nonzero uid.
> > 
> > > Also, I don't see it as a sandbox feature, as Stefan mentioned, a compromised
> > > process can call openat2() without RESOLVE_IN_ROOT. 
> > 
> > I'm a bit skeptical honestly about how secure the existing namespace code is against a compromised virtiofsd process.  The primary worry is guest filesystem traversals, right?  openat2+RESOLVE_IN_ROOT addresses that.  Plus being in Rust makes this dramatically safer.
> > 
> > > I did some test with
> > > Landlock to lock virtiofsd inside the shared directory, but IIRC it requires a
> > > kernel 5.13
> > 
> > But yes, landlock and other things make sense, I just don't see these things as strongly linked.  IOW we shouldn't in my opinion block unprivileged virtiofsd on more sandboxing than openat2 already gives us.
> 
> I think openat2(RESOLVE_IN_ROOT) support should be added unless there is
> another unprivileged mechanism that is stronger.
> 
> The security implications need to be covered in the user documentation
> so people can decide whether using this mode is appropriate.
> 
> We should continue to explain the difference between a voluntary
> mechanism like openat2(RESOLVE_IN_ROOT) and a mandatory mechanism like
> mount namespaces with pivot_root(2). Rust programs are not immune to
> arbitrary code execution, but it's less likely than with a C program.

I agree. Perhaps we could modify the "none" sandbox mode to use
openat2, if available, and add an "openat2" mode which does basically
the same thing, but bailing out if openat2 is not available.

And explain this clearly in the docs, of course.
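
The proposed behaviour of the two modes could be sketched as follows. This is only an illustration of the idea in Python, not virtiofsd code: the function name and the returned strings are hypothetical, and only the "none" and "openat2" modes discussed here are modelled.

```python
def pick_path_resolution(sandbox_mode: str, openat2_supported: bool) -> str:
    """Decide how paths are resolved for a given sandbox mode.

    "none":    use openat2() opportunistically, fall back silently.
    "openat2": same behaviour, but refuse to start without openat2().
    """
    if sandbox_mode == "openat2":
        if not openat2_supported:
            raise RuntimeError(
                "sandbox mode 'openat2' requested but openat2() is unavailable"
            )
        return "openat2"
    if sandbox_mode == "none":
        return "openat2" if openat2_supported else "plain"
    raise ValueError(f"mode not covered by this sketch: {sandbox_mode!r}")
```

The strict mode gives users who depend on the confinement a way to fail early instead of silently running without it.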

Sergio.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-27 17:27       ` [Virtio-fs] " German Maglione
@ 2022-09-28 19:26         ` Vivek Goyal
  -1 siblings, 0 replies; 21+ messages in thread
From: Vivek Goyal @ 2022-09-28 19:26 UTC (permalink / raw)
  To: German Maglione; +Cc: Colin Walters, qemu-devel, virtio-fs-list, Sergio Lopez

On Tue, Sep 27, 2022 at 07:27:02PM +0200, German Maglione wrote:
> On Tue, Sep 27, 2022 at 6:57 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Tue, Sep 27, 2022 at 12:37:15PM -0400, Vivek Goyal wrote:
> > > On Fri, Sep 09, 2022 at 05:24:03PM -0400, Colin Walters wrote:
> > > > We previously had a chat here https://lore.kernel.org/all/348d4774-bd5f-4832-bd7e-a21491fdac8d@www.fastmail.com/T/
> > > > around virtiofsd and privileges and the case of trying to run virtiofsd inside an unprivileged (Kubernetes) container.
> > > >
> > > > Right now we're still using 9p, and it has bugs (basically it seems like the 9p inode flushing callback tries to allocate memory to send an RPC, and this causes OOM problems)
> > > > https://github.com/coreos/coreos-assembler/issues/1812
> > > >
> > > > Coming back to this...as of lately in Linux, there's support for strongly isolated filesystem access via openat2():
> > > > https://lwn.net/Articles/796868/
> > > >
> > > > Is there any reason we couldn't do an -o sandbox=openat2 ?  This operates without any privileges at all, and should be usable (and secure enough) in our use case.
> > >
> > > [ cc virtio-fs-list, german, sergio ]
> > >
> > > Hi Colin,
> > >
> > > Using openat2(RESOLVE_IN_ROOT) (if kernel is new enough), sounds like a
> > > good idea. We talked about it few times but nobody ever wrote a patch to
> > > implement it.
> > >
> > > And it probably makes sense with all the sandboxes (chroot(), namespaces).
> > >
> > > I am wondering that it probably should not be a new sandbox mode at all.
> > > It probably should be the default if kernel offers openat2() syscall.
> > >
> > > Now all the development has moved to rust virtiofsd.
> > >
> > > https://gitlab.com/virtio-fs/virtiofsd
> > >
> > > C version of virtiofsd is just seeing small critical fixes.
> > >
> > > And rust version allows running unprivileged (inside a user namespace).
> > > German is also working on allowing running unprivileged without
> > > user namespaces but this will not allow arbitrary uid/gid switching.
> > >
> > > https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/136
> > >
> > > If one wants to run unprivileged and also do arbitrary uid/gid switching,
> > > then you need to use user namespaces and map a range of subuid/subgid
> > > into the user namespace virtiofsd is running in.
> > >
> > > If possible, please try to use rust virtiofsd for your situation. It's
> > > already packaged for fedora.
> > >
> > > Coming back to original idea of using openat2(), I think we should
> > > probably give it a try in rust virtiofsd and if it works, it should
> > > work across all the sandboxing modes.
> >
> > Thinking more about it, enabling openat2() usage conditionally based on
> > some option probably is not a bad idea. I was assuming that using
> > openat2() by default will not break any of the existing use cases. But
> > I am not sure. I have burnt my fingers so many times and had to back
> > out on default settings that enabling usage of openat2() conditionally
> > will probably be a safer choice. :-)
> >
> 
> I could work on this for the next major version and see if anything breaks.
> But I prefer to add this as a compilation feature, instead of a command line
> option that we will then have to maintain for a while.

What does "compilation feature" mean? Can one compile it out? And if it is
compiled in, is it enabled by default?

> 
> Also, I don't see it as a sandbox feature, as Stefan mentioned, a compromised
> process can call openat2() without RESOLVE_IN_ROOT. I did some test with
> Landlock to lock virtiofsd inside the shared directory, but IIRC it requires a
> kernel 5.13

Landlock sounds interesting. Maybe use it by default if the kernel offers it.
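
To make "if the kernel offers it" concrete, here is a minimal sketch of a
runtime probe. It assumes Linux, reaches the raw syscall through ctypes, and
assumes syscall number 444 (the value used on x86_64 and aarch64). The helper
name landlock_abi_version is hypothetical and not part of virtiofsd.

```python
import ctypes

# Assumption: syscall number of landlock_create_ruleset(2) on x86_64/aarch64.
SYS_LANDLOCK_CREATE_RULESET = 444
# Flag asking the kernel to report the highest Landlock ABI version it supports.
LANDLOCK_CREATE_RULESET_VERSION = 1 << 0


def landlock_abi_version():
    """Return the Landlock ABI version (>= 1), or None if it is unavailable."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ret = libc.syscall(
        ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET),
        ctypes.c_void_p(None),  # attr == NULL when only querying the version
        ctypes.c_size_t(0),     # size == 0 when only querying the version
        ctypes.c_long(LANDLOCK_CREATE_RULESET_VERSION),
    )
    # Any failure (missing syscall, Landlock disabled, seccomp denial) is
    # treated the same way here: Landlock is not usable.
    return ret if ret > 0 else None
```

A caller could enable Landlock only when this returns a version and otherwise
continue with the other mechanisms.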

The question will be: of the security mechanisms we are using, how many
are mutually exclusive and how many can be used together?

A. pivot_root()
B. chroot()
C. openat2()
D. landlock
E. seccomp

Seccomp goes well with everything.
Landlock will probably go well with everything too.

pivot_root() and chroot() are currently mutually exclusive.

openat2() is probably redundant if pivot_root()/chroot()/landlock is
being used, but it should work anyway.

Something to document as Stefan suggested.
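
For readers who have not used it, here is a minimal sketch of what an
openat2(RESOLVE_IN_ROOT) lookup looks like. It is not virtiofsd code: it
assumes Linux 5.6 or newer, calls the raw syscall through ctypes, and assumes
syscall number 437 (x86_64 and aarch64). The names OpenHow and open_in_root
are made up for illustration.

```python
import ctypes
import os

SYS_OPENAT2 = 437       # assumption: value on x86_64 and aarch64
RESOLVE_IN_ROOT = 0x10  # from <linux/openat2.h>


class OpenHow(ctypes.Structure):
    # Mirrors struct open_how from <linux/openat2.h>.
    _fields_ = [
        ("flags", ctypes.c_uint64),
        ("mode", ctypes.c_uint64),
        ("resolve", ctypes.c_uint64),
    ]


def open_in_root(root_fd, path, flags=os.O_RDONLY):
    """Open path with root_fd acting as "/" for the whole lookup."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    how = OpenHow(flags=flags | os.O_CLOEXEC, mode=0, resolve=RESOLVE_IN_ROOT)
    fd = libc.syscall(
        ctypes.c_long(SYS_OPENAT2),
        ctypes.c_long(root_fd),
        ctypes.c_char_p(os.fsencode(path)),
        ctypes.byref(how),
        ctypes.c_size_t(ctypes.sizeof(how)),
    )
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return fd
```

With this flag, ".." components and absolute symlinks are resolved as if
root_fd were the filesystem root. The restriction is voluntary: it applies
only to lookups that pass the flag.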

Vivek




* Re: [Virtio-fs] virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-28  8:33           ` Sergio Lopez
@ 2022-09-28 19:28             ` Vivek Goyal
  2022-09-29 14:04               ` Colin Walters
  0 siblings, 1 reply; 21+ messages in thread
From: Vivek Goyal @ 2022-09-28 19:28 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: Stefan Hajnoczi, Colin Walters, virtio-fs-list, qemu-devel

On Wed, Sep 28, 2022 at 10:33:40AM +0200, Sergio Lopez wrote:
> On Tue, Sep 27, 2022 at 04:14:20PM -0400, Stefan Hajnoczi wrote:
> > On Tue, Sep 27, 2022 at 01:51:41PM -0400, Colin Walters wrote:
> > > 
> > > 
> > > On Tue, Sep 27, 2022, at 1:27 PM, German Maglione wrote:
> > > >
> > > >> > Now all the development has moved to rust virtiofsd.
> > > 
> > > Oh, awesome!!  The code there looks great.
> > > 
> > > > I could work on this for the next major version and see if anything breaks.
> > > > But I prefer to add this as a compilation feature, instead of a command line
> > > > option that we will then have to maintain for a while.
> > > 
> > > Hmm, what would be the issue with having the code there by default?  I think rather than any new command line option, we automatically use `openat2+RESOLVE_IN_ROOT` if the process is run as a nonzero uid.
> > > 
> > > > Also, I don't see it as a sandbox feature, as Stefan mentioned, a compromised
> > > > process can call openat2() without RESOLVE_IN_ROOT. 
> > > 
> > > I'm a bit skeptical honestly about how secure the existing namespace code is against a compromised virtiofsd process.  The primary worry is guest filesystem traversals, right?  openat2+RESOLVE_IN_ROOT addresses that.  Plus being in Rust makes this dramatically safer.
> > > 
> > > > I did some test with
> > > > Landlock to lock virtiofsd inside the shared directory, but IIRC it requires a
> > > > kernel 5.13
> > > 
> > > But yes, landlock and other things make sense, I just don't see these things as strongly linked.  IOW we shouldn't in my opinion block unprivileged virtiofsd on more sandboxing than openat2 already gives us.
> > 
> > I think openat2(RESOLVE_IN_ROOT) support should be added unless there is
> > another unprivileged mechanism that is stronger.
> > 
> > The security implications need to be covered in the user documentation
> > so people can decide whether using this mode is appropriate.
> > 
> > We should continue to explain the difference between a voluntary
> > mechanism like openat2(RESOLVE_IN_ROOT) and a mandatory mechanism like
> > mount namespaces with pivot_root(2). Rust programs are not immune to
> > arbitrary code execution, but it's less likely than with a C program.
> 
> I agree. Perhaps we could modify the "none" sandbox mode to use
> openat2, if available, and add an "openat2" mode which does basically
> the same thing, but bailing out if openat2 is not available.

Sounds reasonable. In fact, we could probably do something similar
for "landlock" as well.
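
A minimal sketch of the selection logic Sergio describes, under the
assumption that the caller has already probed for openat2(). The function
name choose_path_resolver and the return values "openat2" and "plain" are
hypothetical; only the mode names "none" and "openat2" come from this thread.

```python
def choose_path_resolver(sandbox_mode, have_openat2):
    """Pick how paths get resolved for the given sandbox mode.

    "none":    use openat2() opportunistically, fall back if it is missing.
    "openat2": require openat2(), bail out if it is missing.
    """
    if sandbox_mode == "openat2":
        if not have_openat2:
            raise RuntimeError(
                "sandbox mode 'openat2' requested but openat2() is unavailable"
            )
        return "openat2"
    if sandbox_mode == "none":
        return "openat2" if have_openat2 else "plain"
    raise ValueError(f"mode not covered by this sketch: {sandbox_mode!r}")
```

The same shape would work for a "landlock" mode: opportunistic in "none",
mandatory when requested by name.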

Vivek

> 
> And explain this clearly in the docs, of course.
> 
> Sergio.




* Re: [Virtio-fs] virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-28 19:28             ` Vivek Goyal
@ 2022-09-29 14:04               ` Colin Walters
  2022-09-29 14:10                 ` Vivek Goyal
  0 siblings, 1 reply; 21+ messages in thread
From: Colin Walters @ 2022-09-29 14:04 UTC (permalink / raw)
  To: Vivek Goyal, Sergio Lopez; +Cc: Stefan Hajnoczi, virtio-fs-list, qemu-devel

On Wed, Sep 28, 2022, at 3:28 PM, Vivek Goyal wrote:

> Sounds reasonable. In fact, we could probably do something similar
> for "landlock" as well.

Thanks for the discussion all!  Can someone (vaguely) commit to look into this in say the next few months?  It's not *urgent*, we can live with the 9p flakes and problems short term, just trying to figure out if this needs to be on our medium-term radar or not.  Thanks!



* Re: [Virtio-fs] virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-29 14:04               ` Colin Walters
@ 2022-09-29 14:10                 ` Vivek Goyal
  2022-09-29 15:47                   ` Colin Walters
  0 siblings, 1 reply; 21+ messages in thread
From: Vivek Goyal @ 2022-09-29 14:10 UTC (permalink / raw)
  To: Colin Walters; +Cc: Sergio Lopez, Stefan Hajnoczi, virtio-fs-list, qemu-devel

On Thu, Sep 29, 2022 at 10:04:36AM -0400, Colin Walters wrote:
> On Wed, Sep 28, 2022, at 3:28 PM, Vivek Goyal wrote:
> 
> > Sounds reasonable. In fact, we could probably do something similar
> > for "landlock" as well.
> 
> Thanks for the discussion all!  Can someone (vaguely) commit to look into this in say the next few months?  It's not *urgent*, we can live with the 9p flakes and problems short term, just trying to figure out if this needs to be on our medium-term radar or not.  Thanks!

Hi Colin,

What's your use case? How do you plan to use virtiofs?

Thanks
Vivek


* Re: [Virtio-fs] virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-29 14:10                 ` Vivek Goyal
@ 2022-09-29 15:47                   ` Colin Walters
  2022-09-29 17:03                     ` Vivek Goyal
  0 siblings, 1 reply; 21+ messages in thread
From: Colin Walters @ 2022-09-29 15:47 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Sergio Lopez, Stefan Hajnoczi, virtio-fs-list, qemu-devel



On Thu, Sep 29, 2022, at 10:10 AM, Vivek Goyal wrote:

> What's your use case. How do you plan to use virtiofs.

At the current time, the Kubernetes that we run does not support user namespaces.  We want to do the production builds of our operating system (Fedora CoreOS and RHEL CoreOS) today inside an *unprivileged* Kubernetes pod (actually in OpenShift using anyuid, i.e. random unprivileged uid too), just with /dev/kvm exposed from the host (which is safe).  Operating system builds *and* tests in qemu are just another workload that can be shared with other tenants.

qemu works fine in this model, as does 9p.  It's just the virtiofs isolation requires privileges to be used today.




* Re: [Virtio-fs] virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-29 15:47                   ` Colin Walters
@ 2022-09-29 17:03                     ` Vivek Goyal
  2022-09-30  8:13                       ` German Maglione
  2022-10-03 22:51                       ` Colin Walters
  0 siblings, 2 replies; 21+ messages in thread
From: Vivek Goyal @ 2022-09-29 17:03 UTC (permalink / raw)
  To: Colin Walters
  Cc: Sergio Lopez, Stefan Hajnoczi, virtio-fs-list, qemu-devel,
	German Maglione

On Thu, Sep 29, 2022 at 11:47:32AM -0400, Colin Walters wrote:
> 
> 
> On Thu, Sep 29, 2022, at 10:10 AM, Vivek Goyal wrote:
> 
> > What's your use case. How do you plan to use virtiofs.
> 
> At the current time, the Kubernetes that we run does not support user namespaces.  We want to do the production builds of our operating system (Fedora CoreOS and RHEL CoreOS) today inside an *unprivileged* Kubernetes pod (actually in OpenShift using anyuid, i.e. random unprivileged uid too), just with /dev/kvm exposed from the host (which is safe).  Operating system builds *and* tests in qemu are just another workload that can be shared with other tenants.
> 
> qemu works fine in this model, as does 9p.  It's just the virtiofs isolation requires privileges to be used today.

[ cc German ]

Hi Colin,

So the Rust version of virtiofsd already supports running unprivileged
(inside a user namespace).

https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md#running-as-non-privileged-user

host$ podman unshare -- virtiofsd --socket-path=/tmp/vfsd.sock --shared-dir /mnt \
        --announce-submounts --sandbox chroot &

I think the only privileged operation it needs is assigning a range of
subuid/subgid to the uid you are using on the host.

I think that should be usable for you as of now.
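
As a rough illustration of checking that prerequisite, the sketch below
parses /etc/subuid-style text (one "name:first_id:count" entry per line) and
returns the ranges assigned to a user. The function name subid_ranges and the
user name in the usage note are made up.

```python
def subid_ranges(subid_text, user):
    """Return [(first_id, count), ...] for user from /etc/subuid-style text."""
    ranges = []
    for line in subid_text.splitlines():
        fields = line.strip().split(":")
        # Skip blank or malformed lines and entries for other users.
        if len(fields) != 3 or fields[0] != user:
            continue
        ranges.append((int(fields[1]), int(fields[2])))
    return ranges
```

For example, subid_ranges(open("/etc/subuid").read(), "alice") returning an
empty list would mean no range has been assigned to "alice" yet.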

Having said that, openat2() and landlock are interesting improvements,
especially when somebody does not want to use user namespaces. Without
user namespaces, one will not be able to do arbitrary switching of uid/gid.
IOW, inside the guest, you will be limited to one uid/gid.

I am hoping German or somebody else can have a look at the openat2() and
landlock improvements in the near future.

I am assuming you are fine with using user namespaces on the host. Assigning
a subuid/subgid range will then allow arbitrary switching of uid/gid inside
the guest.

Can you give rust virtiofsd (unprivileged) a try?

Thanks
Vivek


* Re: [Virtio-fs] virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-29 17:03                     ` Vivek Goyal
@ 2022-09-30  8:13                       ` German Maglione
  2022-10-03 22:51                       ` Colin Walters
  1 sibling, 0 replies; 21+ messages in thread
From: German Maglione @ 2022-09-30  8:13 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Colin Walters, Sergio Lopez, Stefan Hajnoczi, virtio-fs-list, qemu-devel

On Thu, Sep 29, 2022 at 7:03 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Thu, Sep 29, 2022 at 11:47:32AM -0400, Colin Walters wrote:
> >
> >
> > On Thu, Sep 29, 2022, at 10:10 AM, Vivek Goyal wrote:
> >
> > > What's your use case. How do you plan to use virtiofs.
> >
> > At the current time, the Kubernetes that we run does not support user namespaces.  We want to do the production builds of our operating system (Fedora CoreOS and RHEL CoreOS) today inside an *unprivileged* Kubernetes pod (actually in OpenShift using anyuid, i.e. random unprivileged uid too), just with /dev/kvm exposed from the host (which is safe).  Operating system builds *and* tests in qemu are just another workload that can be shared with other tenants.
> >
> > qemu works fine in this model, as does 9p.  It's just the virtiofs isolation requires privileges to be used today.
>
> [ cc German ]
>
> Hi Colin,
>
> So rust version of virtiofsd, already supports running unprivileged
> (inside a user namespace).
>
> https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md#running-as-non-privileged-user
>
> host$ podman unshare -- virtiofsd --socket-path=/tmp/vfsd.sock --shared-dir /mnt \
>         --announce-submounts --sandbox chroot &
>
> I think only privileged operation it needs is assigning a range of
> subuid/subgid to the uid you are using on host.
>
> I think that should be usable for you as of now.
>
> Having said that, openat2() and landlock are interesting improvements,
> especially when somebody does not want to use user namespaces. Without
> user namespaces, one will not be able to do arbitrary switching of uid/gid.
> IOW, inside guest, you will be limited to one uid/gid.
>
> I am hoping German or somebody else can have a look openat2() and landlock
> improvements in near future.

I will do it.

>
> I am assuming you are fine with using user namespaces on host. And by
> assigning subuid/subgid range, it will allow you arbitrary switching
> of uid/gid inside guest.
>
> Can you give rust virtiofsd (unprivileged) a try.
>
> Thanks
> Vivek
>


-- 
German



* Re: [Virtio-fs] virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-09-29 17:03                     ` Vivek Goyal
  2022-09-30  8:13                       ` German Maglione
@ 2022-10-03 22:51                       ` Colin Walters
  2022-10-05 21:29                         ` Vivek Goyal
  1 sibling, 1 reply; 21+ messages in thread
From: Colin Walters @ 2022-10-03 22:51 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Sergio Lopez, Stefan Hajnoczi, virtio-fs-list, qemu-devel,
	German Maglione



On Thu, Sep 29, 2022, at 1:03 PM, Vivek Goyal wrote:
> 
> So rust version of virtiofsd, already supports running unprivileged
> (inside a user namespace).

I know, but as I already said, the use case here is running inside an OpenShift unprivileged pod where *we are already in a container*.

> host$ podman unshare -- virtiofsd --socket-path=/tmp/vfsd.sock 
> --shared-dir /mnt \
>         --announce-submounts --sandbox chroot &

Yes, but in current OCP 4.11 our seccomp policy denies CLONE_NEWUSER:

```
$ unshare -m
unshare: unshare failed: Function not implemented
```

https://docs.openshift.com/container-platform/4.11/security/seccomp-profiles.html
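
For anyone who wants to check their own environment, here is a minimal sketch
of a probe for whether user namespace creation is permitted. It assumes Linux
and calls unshare(2) through ctypes in a forked child so the calling process
is left untouched. The helper name userns_unshare_errno is made up.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # value from <linux/sched.h>


def userns_unshare_errno():
    """Return 0 if unshare(CLONE_NEWUSER) works in a child, else its errno."""
    libc = ctypes.CDLL(None, use_errno=True)  # load before fork()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: attempt the unshare, report the errno, and exit immediately.
        try:
            os.close(read_fd)
            rc = libc.unshare(CLONE_NEWUSER)
            err = 0 if rc == 0 else ctypes.get_errno()
            os.write(write_fd, err.to_bytes(4, "little"))
        finally:
            os._exit(0)
    os.close(write_fd)
    data = os.read(read_fd, 4)
    os.close(read_fd)
    os.waitpid(pid, 0)
    if len(data) != 4:
        raise RuntimeError("probe child exited without reporting a result")
    return int.from_bytes(data, "little")
```

A result equal to errno.ENOSYS corresponds to the "Function not implemented"
message in the log above; errno.EPERM would be the usual result when the call
is denied by ordinary permission checks instead.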

> I think only privileged operation it needs is assigning a range of
> subuid/subgid to the uid you are using on host.

We also turn on NO_NEW_PRIVILEGES by default in OCP pods.  

Now, I *could* in general get elevated permissions where I need to today.  But it's also really important to me to have a long term goal of having operating system builds and tests work well as "just another workload" in our production container platform (now, one *does* want to bind in /dev/kvm, but that's generally safe, and even that strictly speaking is optional if one can stomach the ~10x perf hit).

> Can you give rust virtiofsd (unprivileged) a try.

I admit to not actually trying it in a pod, but I think we all agree it can't work, and the only thing that can today is openat2.



* Re: [Virtio-fs] virtiofsd: Any reason why there's not an "openat2" sandbox mode?
  2022-10-03 22:51                       ` Colin Walters
@ 2022-10-05 21:29                         ` Vivek Goyal
  0 siblings, 0 replies; 21+ messages in thread
From: Vivek Goyal @ 2022-10-05 21:29 UTC (permalink / raw)
  To: Colin Walters
  Cc: Sergio Lopez, Stefan Hajnoczi, virtio-fs-list, qemu-devel,
	German Maglione

On Mon, Oct 03, 2022 at 06:51:42PM -0400, Colin Walters wrote:
> 
> 
> On Thu, Sep 29, 2022, at 1:03 PM, Vivek Goyal wrote:
> > 
> > So rust version of virtiofsd, already supports running unprivileged
> > (inside a user namespace).
> 
> I know, but as I already said, the use case here is running inside an OpenShift unprivileged pod where *we are already in a container*.
> 
> > host$ podman unshare -- virtiofsd --socket-path=/tmp/vfsd.sock 
> > --shared-dir /mnt \
> >         --announce-submounts --sandbox chroot &
> 
> Yes, but in current OCP 4.11 our seccomp policy denies CLONE_NEWUSER:

Hmm..., no user namespaces allowed. 

So sandbox=none should in theory work once we fix it for unprivileged
users.

https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/136

Given you are already running inside a pod/container, I am not sure that
locking down virtiofsd with openat2(RESOLVE_IN_ROOT)/landlock is a
must for you from a security point of view. virtiofsd should not be
able to access anything outside the pod/container anyway and can
only affect things inside the pod/container.

Once we add support for openat2(), the next issue is whether you need
arbitrary uid/gid support. By default it will be a single uid/gid
filesystem. Is that enough for your use case? Or do you need to be able
to switch between arbitrary uid/gid inside the guest on this
virtiofs filesystem?



> 
> ```
> $ unshare -m
> unshare: unshare failed: Function not implemented
> ```
> 
> https://docs.openshift.com/container-platform/4.11/security/seccomp-profiles.html
> 
> > I think only privileged operation it needs is assigning a range of
> > subuid/subgid to the uid you are using on host.
> 
> We also turn on NO_NEW_PRIVILEGES by default in OCP pods.  
> 
> Now, I *could* in general get elevated permissions where I need to today.  But it's also really important to me to have a long term goal of having operating system builds and tests work well as "just another workload" in our production container platform (now, one *does* want to bind in /dev/kvm, but that's generally safe, and even that strictly speaking is optional if one can stomach the ~10x perf hit).

I am assuming this 10x performance hit is being compared with native
container build and test where no VM will be launched.


> 
> > Can you give rust virtiofsd (unprivileged) a try.
> 
> I admit to not actually trying it in a pod, but I think we all agree it can't work, and the only thing that can today is openat2.

Agreed. Right now we rely on user namespaces for the unprivileged use
case.

We should be able to enable sandbox=none for an unprivileged user (no user
namespace) and possibly add openat2() support as well.

I think providing arbitrary uid/gid support will be trickier and more
work. It will need to store the actual uid/gid in some sort of user
xattr (as done by 9pfs, fuse-overlay, libkrun, etc.). And I will not be
surprised if there are a bunch of corner cases with that approach
(automatic setuid/setgid clearing, etc.).
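
To illustrate the xattr idea only: a minimal sketch that records the
guest-visible owner in user xattrs and falls back to the real owner when none
is recorded. The attribute names and helper names are hypothetical, the
backing filesystem must support user xattrs, and none of the corner cases
mentioned above are handled.

```python
import errno
import os

# Hypothetical attribute names, chosen only for this illustration.
XATTR_UID = "user.demo_virtiofs.uid"
XATTR_GID = "user.demo_virtiofs.gid"


def store_guest_owner(path, uid, gid):
    """Record the owner the guest asked for, without calling chown()."""
    os.setxattr(path, XATTR_UID, str(uid).encode())
    os.setxattr(path, XATTR_GID, str(gid).encode())


def load_guest_owner(path):
    """Return the (uid, gid) to report to the guest."""
    try:
        uid = int(os.getxattr(path, XATTR_UID).decode())
        gid = int(os.getxattr(path, XATTR_GID).decode())
    except OSError as e:
        if e.errno != errno.ENODATA:
            raise
        # No recorded owner: report the real on-disk owner instead.
        st = os.stat(path)
        return st.st_uid, st.st_gid
    return uid, gid
```

The daemon itself keeps running under a single host uid/gid. Only the values
reported back to the guest change.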

Thanks
Vivek


Thread overview: 21+ messages
2022-09-09 21:24 virtiofsd: Any reason why there's not an "openat2" sandbox mode? Colin Walters
2022-09-27 16:37 ` Vivek Goyal
2022-09-27 16:37   ` [Virtio-fs] " Vivek Goyal
2022-09-27 16:57   ` Vivek Goyal
2022-09-27 16:57     ` [Virtio-fs] " Vivek Goyal
2022-09-27 17:27     ` German Maglione
2022-09-27 17:27       ` [Virtio-fs] " German Maglione
2022-09-27 17:51       ` Colin Walters
2022-09-27 17:51         ` [Virtio-fs] " Colin Walters
2022-09-27 20:14         ` Stefan Hajnoczi
2022-09-28  8:33           ` Sergio Lopez
2022-09-28 19:28             ` Vivek Goyal
2022-09-29 14:04               ` Colin Walters
2022-09-29 14:10                 ` Vivek Goyal
2022-09-29 15:47                   ` Colin Walters
2022-09-29 17:03                     ` Vivek Goyal
2022-09-30  8:13                       ` German Maglione
2022-10-03 22:51                       ` Colin Walters
2022-10-05 21:29                         ` Vivek Goyal
2022-09-28 19:26       ` Vivek Goyal
2022-09-28 19:26         ` [Virtio-fs] " Vivek Goyal
