All of lore.kernel.org
 help / color / mirror / Atom feed
* Use of unshare(CLONE_FS) in virtiofsd
@ 2022-11-04  7:50 ` Florian Weimer
  0 siblings, 0 replies; 6+ messages in thread
From: Florian Weimer @ 2022-11-04  7:50 UTC (permalink / raw)
  To: virtio-fs
  Cc: qemu-devel, Misono Tomohiro, Vivek Goyal, Dr. David Alan Gilbert,
	Miklos Szeredi, Stefan Hajnoczi

I've got a proposed extension for glibc's pthread_create which allows
the creation of threads with a dedicated current working
directory/umask/chroot:

  [PATCH 0/2] Introduce per-thread file system properties on Linux
  <https://sourceware.org/pipermail/libc-alpha/2022-October/142640.html>

I expect that glibc integration will work around the seccomp issue
mentioned in a comment (also brought up by the Samba people for their
use) because glibc will perform the unshare directly during the clone
system call, and not via a separate system call.

I see that unshare(CLONE_FS) was introduced in this commit:

commit bdfd66788349acc43cd3f1298718ad491663cfcc
Author: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Date:   Thu Feb 27 14:59:27 2020 +0900

    virtiofsd: Fix xattr operations
    
    Current virtiofsd has problems about xattr operations and
    they does not work properly for directory/symlink/special file.
    
    The fundamental cause is that virtiofsd uses openat() + f...xattr()
    systemcalls for xattr operation but we should not open symlink/special
    file in the daemon. Therefore the function is restricted.
    
    Fix this problem by:
     1. during setup of each thread, call unshare(CLONE_FS)
     2. in xattr operations (i.e. lo_getxattr), if inode is not a regular
        file or directory, use fchdir(proc_loot_fd) + ...xattr() +
        fchdir(root.fd) instead of openat() + f...xattr()
    
        (Note: for a regular file/directory openat() + f...xattr()
         is still used for performance reason)
    
    With this patch, xfstests generic/062 passes on virtiofs.
    
    This fix is suggested by Miklos Szeredi and Stefan Hajnoczi.
    The original discussion can be found here:
      https://www.redhat.com/archives/virtio-fs/2019-October/msg00046.html
    
    Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
    Message-Id: <20200227055927.24566-3-misono.tomohiro@jp.fujitsu.com>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Now the question has come up on the libc-coord list why the *at
interfaces are not used in such cases:

  <https://www.openwall.com/lists/libc-coord/2022/10/24/3>

Clearly the kernel lacks support for fgetxattrat today.  The usual
recommendation for emulating it is to use openat with O_PATH, and then
use getxattr on the virtual /proc/self/fd path.  This needs an
additional system call (openat, getxattr, close instead of fchdir,
getxattr), but it avoids the unshare(CLONE_FS) call behind libc's back.
The directory entries in /proc/self/fd present as symbolic links, but
are not implemented as such by the kernel: there is no separate pathname
lookup for already-open O_PATH descriptors, so there is no race.

Thoughts?

Thanks,
Florian



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Virtio-fs] Use of unshare(CLONE_FS) in virtiofsd
@ 2022-11-04  7:50 ` Florian Weimer
  0 siblings, 0 replies; 6+ messages in thread
From: Florian Weimer @ 2022-11-04  7:50 UTC (permalink / raw)
  To: virtio-fs
  Cc: qemu-devel, Misono Tomohiro, Vivek Goyal, Dr. David Alan Gilbert,
	Miklos Szeredi, Stefan Hajnoczi

I've got a proposed extension for glibc's pthread_create which allows
the creation of threads with a dedicated current working
directory/umask/chroot:

  [PATCH 0/2] Introduce per-thread file system properties on Linux
  <https://sourceware.org/pipermail/libc-alpha/2022-October/142640.html>

I expect that glibc integration will work around the seccomp issue
mentioned in a comment (also brought up by the Samba people for their
use) because glibc will perform the unshare directly during the clone
system call, and not via a separate system call.

I see that unshare(CLONE_FS) was introduced in this commit:

commit bdfd66788349acc43cd3f1298718ad491663cfcc
Author: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Date:   Thu Feb 27 14:59:27 2020 +0900

    virtiofsd: Fix xattr operations
    
    Current virtiofsd has problems about xattr operations and
    they does not work properly for directory/symlink/special file.
    
    The fundamental cause is that virtiofsd uses openat() + f...xattr()
    systemcalls for xattr operation but we should not open symlink/special
    file in the daemon. Therefore the function is restricted.
    
    Fix this problem by:
     1. during setup of each thread, call unshare(CLONE_FS)
     2. in xattr operations (i.e. lo_getxattr), if inode is not a regular
        file or directory, use fchdir(proc_loot_fd) + ...xattr() +
        fchdir(root.fd) instead of openat() + f...xattr()
    
        (Note: for a regular file/directory openat() + f...xattr()
         is still used for performance reason)
    
    With this patch, xfstests generic/062 passes on virtiofs.
    
    This fix is suggested by Miklos Szeredi and Stefan Hajnoczi.
    The original discussion can be found here:
      https://www.redhat.com/archives/virtio-fs/2019-October/msg00046.html
    
    Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
    Message-Id: <20200227055927.24566-3-misono.tomohiro@jp.fujitsu.com>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Now the question has come up on the libc-coord list why the *at
interfaces are not used in such cases:

  <https://www.openwall.com/lists/libc-coord/2022/10/24/3>

Clearly the kernel lacks support for fgetxattrat today.  The usual
recommendation for emulating it is to use openat with O_PATH, and then
use getxattr on the virtual /proc/self/fd path.  This needs an
additional system call (openat, getxattr, close instead of fchdir,
getxattr), but it avoids the unshare(CLONE_FS) call behind libc's back.
The directory entries in /proc/self/fd present as symbolic links, but
are not implemented as such by the kernel: there is no separate pathname
lookup for already-open O_PATH descriptors, so there is no race.

Thoughts?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Use of unshare(CLONE_FS) in virtiofsd
  2022-11-04  7:50 ` [Virtio-fs] " Florian Weimer
@ 2022-11-04 13:00   ` Vivek Goyal
  -1 siblings, 0 replies; 6+ messages in thread
From: Vivek Goyal @ 2022-11-04 13:00 UTC (permalink / raw)
  To: Florian Weimer
  Cc: virtio-fs, qemu-devel, Misono Tomohiro, Dr. David Alan Gilbert,
	Miklos Szeredi, Stefan Hajnoczi, German Maglione, Sergio Lopez


On Fri, Nov 04, 2022 at 08:50:45AM +0100, Florian Weimer wrote:
> I've got a proposed extension for glibc's pthread_create which allows
> the creation of threads with a dedicated current working
> directory/umask/chroot:
> 
>   [PATCH 0/2] Introduce per-thread file system properties on Linux
>   <https://sourceware.org/pipermail/libc-alpha/2022-October/142640.html>
> 
> I expect that glibc integration will work around the seccomp issue
> mentioned in a comment (also brought up by the Samba people for their
> use) because glibc will perform the unshare directly during the clone
> system call, and not via a separate system call.
> 
> I see that unshare(CLONE_FS) was introduced in this commit:
> 
> commit bdfd66788349acc43cd3f1298718ad491663cfcc
> Author: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
> Date:   Thu Feb 27 14:59:27 2020 +0900
> 
>     virtiofsd: Fix xattr operations
>     
>     Current virtiofsd has problems about xattr operations and
>     they does not work properly for directory/symlink/special file.
>     
>     The fundamental cause is that virtiofsd uses openat() + f...xattr()
>     systemcalls for xattr operation but we should not open symlink/special
>     file in the daemon. Therefore the function is restricted.
>     
>     Fix this problem by:
>      1. during setup of each thread, call unshare(CLONE_FS)
>      2. in xattr operations (i.e. lo_getxattr), if inode is not a regular
>         file or directory, use fchdir(proc_loot_fd) + ...xattr() +
>         fchdir(root.fd) instead of openat() + f...xattr()
>     
>         (Note: for a regular file/directory openat() + f...xattr()
>          is still used for performance reason)
>     
>     With this patch, xfstests generic/062 passes on virtiofs.
>     
>     This fix is suggested by Miklos Szeredi and Stefan Hajnoczi.
>     The original discussion can be found here:
>       https://www.redhat.com/archives/virtio-fs/2019-October/msg00046.html
>     
>     Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
>     Message-Id: <20200227055927.24566-3-misono.tomohiro@jp.fujitsu.com>
>     Acked-by: Vivek Goyal <vgoyal@redhat.com>
>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>     Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Now the question has come up on the libc-coord list why the *at
> interfaces are not used in such cases:
> 
>   <https://www.openwall.com/lists/libc-coord/2022/10/24/3>
> 
> Clearly the kernel lacks support for fgetxattrat today.

[ CC German, Sergio ]

fgetxattrat() will be nice.

>  The usual
> recommendation for emulating it is to use openat with O_PATH, and then
> use getxattr on the virtual /proc/self/fd path.  This needs an
> additional system call (openat, getxattr, close instead of fchdir,
> getxattr),

openat(O_PATH) + getxattr(/proc/self/fd) + close() sounds reasonable
too. Not sure why did we not take that path. May be due to that extra
syscall or something else.

> but it avoids the unshare(CLONE_FS) call behind libc's back.

Hmm.., did not know that libc does not like threads calling
unshare(CLONE_FS). Not sure why that is a problem.

BTW, we need separate umask per thread as well. During file creation 
we might be switching to umask provide in fuse protocol message
and then switch back. Given multiple therads might be doing this
creation in parallel, so we ofcourse need this to be per thread
property.

So if your patches for pthread_create() with per thread filesystem
attributes finally goes upstream, I guess we should be able to
make use of it and drop unshare(CLONE_FS).

Thanks
Vivek

> The directory entries in /proc/self/fd present as symbolic links, but
> are not implemented as such by the kernel: there is no separate pathname
> lookup for already-open O_PATH descriptors, so there is no race.
> 
> Thoughts?
> 
> Thanks,
> Florian
> 



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Virtio-fs] Use of unshare(CLONE_FS) in virtiofsd
@ 2022-11-04 13:00   ` Vivek Goyal
  0 siblings, 0 replies; 6+ messages in thread
From: Vivek Goyal @ 2022-11-04 13:00 UTC (permalink / raw)
  To: Florian Weimer
  Cc: virtio-fs, qemu-devel, Misono Tomohiro, Dr. David Alan Gilbert,
	Miklos Szeredi, Stefan Hajnoczi, German Maglione, Sergio Lopez


On Fri, Nov 04, 2022 at 08:50:45AM +0100, Florian Weimer wrote:
> I've got a proposed extension for glibc's pthread_create which allows
> the creation of threads with a dedicated current working
> directory/umask/chroot:
> 
>   [PATCH 0/2] Introduce per-thread file system properties on Linux
>   <https://sourceware.org/pipermail/libc-alpha/2022-October/142640.html>
> 
> I expect that glibc integration will work around the seccomp issue
> mentioned in a comment (also brought up by the Samba people for their
> use) because glibc will perform the unshare directly during the clone
> system call, and not via a separate system call.
> 
> I see that unshare(CLONE_FS) was introduced in this commit:
> 
> commit bdfd66788349acc43cd3f1298718ad491663cfcc
> Author: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
> Date:   Thu Feb 27 14:59:27 2020 +0900
> 
>     virtiofsd: Fix xattr operations
>     
>     Current virtiofsd has problems about xattr operations and
>     they does not work properly for directory/symlink/special file.
>     
>     The fundamental cause is that virtiofsd uses openat() + f...xattr()
>     systemcalls for xattr operation but we should not open symlink/special
>     file in the daemon. Therefore the function is restricted.
>     
>     Fix this problem by:
>      1. during setup of each thread, call unshare(CLONE_FS)
>      2. in xattr operations (i.e. lo_getxattr), if inode is not a regular
>         file or directory, use fchdir(proc_loot_fd) + ...xattr() +
>         fchdir(root.fd) instead of openat() + f...xattr()
>     
>         (Note: for a regular file/directory openat() + f...xattr()
>          is still used for performance reason)
>     
>     With this patch, xfstests generic/062 passes on virtiofs.
>     
>     This fix is suggested by Miklos Szeredi and Stefan Hajnoczi.
>     The original discussion can be found here:
>       https://www.redhat.com/archives/virtio-fs/2019-October/msg00046.html
>     
>     Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
>     Message-Id: <20200227055927.24566-3-misono.tomohiro@jp.fujitsu.com>
>     Acked-by: Vivek Goyal <vgoyal@redhat.com>
>     Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>     Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Now the question has come up on the libc-coord list why the *at
> interfaces are not used in such cases:
> 
>   <https://www.openwall.com/lists/libc-coord/2022/10/24/3>
> 
> Clearly the kernel lacks support for fgetxattrat today.

[ CC German, Sergio ]

fgetxattrat() will be nice.

>  The usual
> recommendation for emulating it is to use openat with O_PATH, and then
> use getxattr on the virtual /proc/self/fd path.  This needs an
> additional system call (openat, getxattr, close instead of fchdir,
> getxattr),

openat(O_PATH) + getxattr(/proc/self/fd) + close() sounds reasonable
too. Not sure why did we not take that path. May be due to that extra
syscall or something else.

> but it avoids the unshare(CLONE_FS) call behind libc's back.

Hmm.., did not know that libc does not like threads calling
unshare(CLONE_FS). Not sure why that is a problem.

BTW, we need separate umask per thread as well. During file creation 
we might be switching to umask provide in fuse protocol message
and then switch back. Given multiple therads might be doing this
creation in parallel, so we ofcourse need this to be per thread
property.

So if your patches for pthread_create() with per thread filesystem
attributes finally goes upstream, I guess we should be able to
make use of it and drop unshare(CLONE_FS).

Thanks
Vivek

> The directory entries in /proc/self/fd present as symbolic links, but
> are not implemented as such by the kernel: there is no separate pathname
> lookup for already-open O_PATH descriptors, so there is no race.
> 
> Thoughts?
> 
> Thanks,
> Florian
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Use of unshare(CLONE_FS) in virtiofsd
  2022-11-04 13:00   ` [Virtio-fs] " Vivek Goyal
@ 2022-11-04 17:44     ` Florian Weimer
  -1 siblings, 0 replies; 6+ messages in thread
From: Florian Weimer @ 2022-11-04 17:44 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Misono Tomohiro, Dr. David Alan Gilbert,
	Miklos Szeredi, Stefan Hajnoczi, German Maglione, Sergio Lopez

* Vivek Goyal:

>>  The usual
>> recommendation for emulating it is to use openat with O_PATH, and then
>> use getxattr on the virtual /proc/self/fd path.  This needs an
>> additional system call (openat, getxattr, close instead of fchdir,
>> getxattr),
>
> openat(O_PATH) + getxattr(/proc/self/fd) + close() sounds reasonable
> too. Not sure why did we not take that path. May be due to that extra
> syscall or something else.

Thanks.

>> but it avoids the unshare(CLONE_FS) call behind libc's back.
>
> Hmm.., did not know that libc does not like threads calling
> unshare(CLONE_FS). Not sure why that is a problem.

Here's a corner case: We plan to add chroot detection to NSS module
loading (so that we do not load NSS modules after chroot), as a form of
security hardening.  If the application calls unshare(CLONE_FS) and then
chroot, which NSS modules are loaded depends on which threads call NSS
functions.

One could argue that chdir/chroot are problematic, not
unshare(CLONE_FS). 8-)

> BTW, we need separate umask per thread as well. During file creation 
> we might be switching to umask provide in fuse protocol message
> and then switch back. Given multiple therads might be doing this
> creation in parallel, so we ofcourse need this to be per thread
> property.

That's a good point.  That's not something that can be worked around
with *at functions.

> So if your patches for pthread_create() with per thread filesystem
> attributes finally goes upstream, I guess we should be able to
> make use of it and drop unshare(CLONE_FS).

Thank you for the feedback.

Florian



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Virtio-fs] Use of unshare(CLONE_FS) in virtiofsd
@ 2022-11-04 17:44     ` Florian Weimer
  0 siblings, 0 replies; 6+ messages in thread
From: Florian Weimer @ 2022-11-04 17:44 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: virtio-fs, qemu-devel, Misono Tomohiro, Dr. David Alan Gilbert,
	Miklos Szeredi, Stefan Hajnoczi, German Maglione, Sergio Lopez

* Vivek Goyal:

>>  The usual
>> recommendation for emulating it is to use openat with O_PATH, and then
>> use getxattr on the virtual /proc/self/fd path.  This needs an
>> additional system call (openat, getxattr, close instead of fchdir,
>> getxattr),
>
> openat(O_PATH) + getxattr(/proc/self/fd) + close() sounds reasonable
> too. Not sure why did we not take that path. May be due to that extra
> syscall or something else.

Thanks.

>> but it avoids the unshare(CLONE_FS) call behind libc's back.
>
> Hmm.., did not know that libc does not like threads calling
> unshare(CLONE_FS). Not sure why that is a problem.

Here's a corner case: We plan to add chroot detection to NSS module
loading (so that we do not load NSS modules after chroot), as a form of
security hardening.  If the application calls unshare(CLONE_FS) and then
chroot, which NSS modules are loaded depends on which threads call NSS
functions.

One could argue that chdir/chroot are problematic, not
unshare(CLONE_FS). 8-)

> BTW, we need separate umask per thread as well. During file creation 
> we might be switching to umask provide in fuse protocol message
> and then switch back. Given multiple therads might be doing this
> creation in parallel, so we ofcourse need this to be per thread
> property.

That's a good point.  That's not something that can be worked around
with *at functions.

> So if your patches for pthread_create() with per thread filesystem
> attributes finally goes upstream, I guess we should be able to
> make use of it and drop unshare(CLONE_FS).

Thank you for the feedback.

Florian

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-11-04 17:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-04  7:50 Use of unshare(CLONE_FS) in virtiofsd Florian Weimer
2022-11-04  7:50 ` [Virtio-fs] " Florian Weimer
2022-11-04 13:00 ` Vivek Goyal
2022-11-04 13:00   ` [Virtio-fs] " Vivek Goyal
2022-11-04 17:44   ` Florian Weimer
2022-11-04 17:44     ` [Virtio-fs] " Florian Weimer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.