All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/5] 9pfs: fix qemu_mknodat(S_IFREG) on macOS
  2022-04-21 15:08 [PATCH v2 0/5] 9pfs: macOS host fixes Christian Schoenebeck
@ 2022-04-21 15:07 ` Christian Schoenebeck
  2022-04-21 16:32   ` Greg Kurz
  2022-04-21 15:07 ` [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) " Christian Schoenebeck
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-21 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Will Cohen, Greg Kurz, Michael Roitzsch, Keno Fischer,
	Akihiko Odaki, qemu-stable

mknod() on macOS does not support creating regular files, so
divert to openat_file() if S_IFREG is passed with mode argument.

Furthermore, 'man 2 mknodat' on Linux says: "Zero file type is
equivalent to type S_IFREG".

Link: https://lore.kernel.org/qemu-devel/17933734.zYzKuhC07K@silver/
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Will Cohen <wwcohen@gmail.com>
---
 hw/9pfs/9p-util-darwin.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/9pfs/9p-util-darwin.c b/hw/9pfs/9p-util-darwin.c
index bec0253474..e24d09763a 100644
--- a/hw/9pfs/9p-util-darwin.c
+++ b/hw/9pfs/9p-util-darwin.c
@@ -77,6 +77,15 @@ int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name,
 int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t dev)
 {
     int preserved_errno, err;
+
+    if (S_ISREG(mode) || !(mode & S_IFMT)) {
+        int fd = openat_file(dirfd, filename, O_CREAT, mode);
+        if (fd == -1) {
+            return fd;
+        }
+        close(fd);
+        return 0;
+    }
     if (!pthread_fchdir_np) {
         error_report_once("pthread_fchdir_np() not available on this version of macOS");
         return -ENOTSUP;
-- 
2.32.0 (Apple Git-132)



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-21 15:08 [PATCH v2 0/5] 9pfs: macOS host fixes Christian Schoenebeck
  2022-04-21 15:07 ` [PATCH v2 1/5] 9pfs: fix qemu_mknodat(S_IFREG) on macOS Christian Schoenebeck
@ 2022-04-21 15:07 ` Christian Schoenebeck
  2022-04-21 16:36   ` Greg Kurz
  2022-04-22  2:43   ` Akihiko Odaki
  2022-04-21 15:07 ` [PATCH v2 3/5] 9pfs: fix wrong encoding of rdev field in Rgetattr " Christian Schoenebeck
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-21 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Will Cohen, Greg Kurz, Michael Roitzsch, Keno Fischer,
	Akihiko Odaki, qemu-stable

mknod() on macOS does not support creating sockets, so divert to
call sequence socket(), bind() and chmod() respectively if S_IFSOCK
was passed with mode argument.

Link: https://lore.kernel.org/qemu-devel/17933734.zYzKuhC07K@silver/
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Will Cohen <wwcohen@gmail.com>
---
 hw/9pfs/9p-util-darwin.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/hw/9pfs/9p-util-darwin.c b/hw/9pfs/9p-util-darwin.c
index e24d09763a..39308f2a45 100644
--- a/hw/9pfs/9p-util-darwin.c
+++ b/hw/9pfs/9p-util-darwin.c
@@ -74,6 +74,27 @@ int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name,
  */
 #if defined CONFIG_PTHREAD_FCHDIR_NP
 
+static int create_socket_file_at_cwd(const char *filename, mode_t mode) {
+    int fd, err;
+    struct sockaddr_un addr = {
+        .sun_family = AF_UNIX
+    };
+
+    fd = socket(PF_UNIX, SOCK_DGRAM, 0);
+    if (fd == -1) {
+        return fd;
+    }
+    snprintf(addr.sun_path, sizeof(addr.sun_path), "./%s", filename);
+    err = bind(fd, (struct sockaddr *) &addr, sizeof(addr));
+    if (err == -1) {
+        goto out;
+    }
+    err = chmod(addr.sun_path, mode);
+out:
+    close(fd);
+    return err;
+}
+
 int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t dev)
 {
     int preserved_errno, err;
@@ -93,7 +114,11 @@ int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t dev)
     if (pthread_fchdir_np(dirfd) < 0) {
         return -1;
     }
-    err = mknod(filename, mode, dev);
+    if (S_ISSOCK(mode)) {
+        err = create_socket_file_at_cwd(filename, mode);
+    } else {
+        err = mknod(filename, mode, dev);
+    }
     preserved_errno = errno;
     /* Stop using the thread-local cwd */
     pthread_fchdir_np(-1);
-- 
2.32.0 (Apple Git-132)



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 3/5] 9pfs: fix wrong encoding of rdev field in Rgetattr on macOS
  2022-04-21 15:08 [PATCH v2 0/5] 9pfs: macOS host fixes Christian Schoenebeck
  2022-04-21 15:07 ` [PATCH v2 1/5] 9pfs: fix qemu_mknodat(S_IFREG) on macOS Christian Schoenebeck
  2022-04-21 15:07 ` [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) " Christian Schoenebeck
@ 2022-04-21 15:07 ` Christian Schoenebeck
  2022-04-21 16:39   ` Greg Kurz
  2022-04-21 15:07 ` [PATCH v2 4/5] 9pfs: fix wrong errno being sent to Linux client on macOS host Christian Schoenebeck
  2022-04-21 15:07 ` [PATCH v2 5/5] 9pfs: fix removing non-existent POSIX ACL xattr " Christian Schoenebeck
  4 siblings, 1 reply; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-21 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Will Cohen, Greg Kurz, Michael Roitzsch, Keno Fischer,
	Akihiko Odaki, qemu-stable

The 'rdev' field in 9p reponse 'Rgetattr' is of type dev_t,
which is actually a system dependant type and therefore both the
size and encoding of dev_t differ between macOS and Linux.

So far we have sent 'rdev' to guest in host's dev_t format as-is,
which caused devices to appear with wrong device numbers on
guests running on macOS hosts, eventually leading to various
misbehaviours on guest in conjunction with device files.

This patch fixes this issue by converting the device number from
host's dev_t format to Linux dev_t format. As 9p request
'Tgettattr' is exclusive to protocol version 9p2000.L, it should
be fair to assume that 'rdev' field is assumed to be in Linux dev_t
format by client as well.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Link: https://lore.kernel.org/qemu-devel/20220421093056.5ab1e7ed@bahia/
Reviewed-by: Greg Kurz <groug@kaod.org>
---
 hw/9pfs/9p-util.h | 39 +++++++++++++++++++++++++++++++++++++++
 hw/9pfs/9p.c      |  2 +-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
index 97e681e167..2cc9a5dbfb 100644
--- a/hw/9pfs/9p-util.h
+++ b/hw/9pfs/9p-util.h
@@ -19,6 +19,45 @@
 #define O_PATH_9P_UTIL 0
 #endif
 
+#if !defined(CONFIG_LINUX)
+
+/*
+ * Generates a Linux device number (a.k.a. dev_t) for given device major
+ * and minor numbers.
+ *
+ * To be more precise: it generates a device number in glibc's format
+ * (MMMM_Mmmm_mmmM_MMmm, 64 bits) actually, which is compatible with
+ * Linux's format (mmmM_MMmm, 32 bits), as described in <bits/sysmacros.h>.
+ */
+static inline uint64_t makedev_dotl(uint32_t dev_major, uint32_t dev_minor)
+{
+    uint64_t dev;
+
+    // from glibc sysmacros.h:
+    dev  = (((uint64_t) (dev_major & 0x00000fffu)) <<  8);
+    dev |= (((uint64_t) (dev_major & 0xfffff000u)) << 32);
+    dev |= (((uint64_t) (dev_minor & 0x000000ffu)) <<  0);
+    dev |= (((uint64_t) (dev_minor & 0xffffff00u)) << 12);
+    return dev;
+}
+
+#endif
+
+/*
+ * Converts given device number from host's device number format to Linux
+ * device number format. As both the size of type dev_t and encoding of
+ * dev_t is system dependant, we have to convert them for Linux guests if
+ * host is not running Linux.
+ */
+static inline uint64_t host_dev_to_dotl_dev(dev_t dev)
+{
+#ifdef CONFIG_LINUX
+    return dev;
+#else
+    return makedev_dotl(major(dev), minor(dev));
+#endif
+}
+
 #ifdef CONFIG_DARWIN
 #define qemu_fgetxattr(...) fgetxattr(__VA_ARGS__, 0, 0)
 #define qemu_lgetxattr(...) getxattr(__VA_ARGS__, 0, XATTR_NOFOLLOW)
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 225f31fc31..4a296a0b94 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1327,7 +1327,7 @@ static int stat_to_v9stat_dotl(V9fsPDU *pdu, const struct stat *stbuf,
     v9lstat->st_nlink = stbuf->st_nlink;
     v9lstat->st_uid = stbuf->st_uid;
     v9lstat->st_gid = stbuf->st_gid;
-    v9lstat->st_rdev = stbuf->st_rdev;
+    v9lstat->st_rdev = host_dev_to_dotl_dev(stbuf->st_rdev);
     v9lstat->st_size = stbuf->st_size;
     v9lstat->st_blksize = stat_to_iounit(pdu, stbuf);
     v9lstat->st_blocks = stbuf->st_blocks;
-- 
2.32.0 (Apple Git-132)



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 4/5] 9pfs: fix wrong errno being sent to Linux client on macOS host
  2022-04-21 15:08 [PATCH v2 0/5] 9pfs: macOS host fixes Christian Schoenebeck
                   ` (2 preceding siblings ...)
  2022-04-21 15:07 ` [PATCH v2 3/5] 9pfs: fix wrong encoding of rdev field in Rgetattr " Christian Schoenebeck
@ 2022-04-21 15:07 ` Christian Schoenebeck
  2022-04-21 16:39   ` Greg Kurz
  2022-04-21 15:07 ` [PATCH v2 5/5] 9pfs: fix removing non-existent POSIX ACL xattr " Christian Schoenebeck
  4 siblings, 1 reply; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-21 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Will Cohen, Greg Kurz, Michael Roitzsch, Keno Fischer,
	Akihiko Odaki, qemu-stable

Linux and macOS only share some errno definitions with equal macro
name and value. In fact most mappings for errno are completely
different on the two systems.

This patch converts some important errno values from macOS host to
corresponding Linux errno values before eventually sending such error
codes along with 'Rlerror' replies (if 9p2000.L is used that is). Not
having translated errnos before violated the 9p2000.L protocol spec,
which says:

  "
  size[4] Rlerror tag[2] ecode[4]

  ... ecode is a numerical Linux errno.
  "

  https://github.com/chaos/diod/wiki/protocol#lerror----return-error-code

This patch fixes a bunch of misbehaviours when running a Linux client
on macOS host. For instance this patch fixes:

  mount -t 9p -o posixacl ...

on Linux guest if security_mode=mapped was used for 9p server, which
refused to mount successfully, because macOS returned ENOATTR==93
when client tried to retrieve POSIX ACL xattrs, because errno 93
is defined as EPROTONOSUPPORT==93 on Linux, so Linux client believed
that xattrs were not supported by filesystem on host in general.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Link: https://lore.kernel.org/qemu-devel/20220421124835.3e664669@bahia/
---
 hw/9pfs/9p-util.h | 30 ++++++++++++++++++++++++++++++
 hw/9pfs/9p.c      |  2 ++
 2 files changed, 32 insertions(+)

diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
index 2cc9a5dbfb..c3526144c9 100644
--- a/hw/9pfs/9p-util.h
+++ b/hw/9pfs/9p-util.h
@@ -58,6 +58,36 @@ static inline uint64_t host_dev_to_dotl_dev(dev_t dev)
 #endif
 }
 
+/* Translates errno from host -> Linux if needed */
+static inline int errno_to_dotl(int err) {
+#if defined(CONFIG_LINUX)
+    /* nothing to translate (Linux -> Linux) */
+#elif defined(CONFIG_DARWIN)
+    /*
+     * translation mandatory for macOS hosts
+     *
+     * FIXME: Only most important errnos translated here yet, this should be
+     * extended to as many errnos being translated as possible in future.
+     */
+    if (err == ENAMETOOLONG) {
+        err = 36; /* ==ENAMETOOLONG on Linux */
+    } else if (err == ENOTEMPTY) {
+        err = 39; /* ==ENOTEMPTY on Linux */
+    } else if (err == ELOOP) {
+        err = 40; /* ==ELOOP on Linux */
+    } else if (err == ENOATTR) {
+        err = 61; /* ==ENODATA on Linux */
+    } else if (err == ENOTSUP) {
+        err = 95; /* ==EOPNOTSUPP on Linux */
+    } else if (err == EOPNOTSUPP) {
+        err = 95; /* ==EOPNOTSUPP on Linux */
+    }
+#else
+#error Missing errno translation to Linux for this host system
+#endif
+    return err;
+}
+
 #ifdef CONFIG_DARWIN
 #define qemu_fgetxattr(...) fgetxattr(__VA_ARGS__, 0, 0)
 #define qemu_lgetxattr(...) getxattr(__VA_ARGS__, 0, XATTR_NOFOLLOW)
diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 4a296a0b94..0cd0c14c2a 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1054,6 +1054,8 @@ static void coroutine_fn pdu_complete(V9fsPDU *pdu, ssize_t len)
             }
             len += ret;
             id = P9_RERROR;
+        } else {
+            err = errno_to_dotl(err);
         }
 
         ret = pdu_marshal(pdu, len, "d", err);
-- 
2.32.0 (Apple Git-132)



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 5/5] 9pfs: fix removing non-existent POSIX ACL xattr on macOS host
  2022-04-21 15:08 [PATCH v2 0/5] 9pfs: macOS host fixes Christian Schoenebeck
                   ` (3 preceding siblings ...)
  2022-04-21 15:07 ` [PATCH v2 4/5] 9pfs: fix wrong errno being sent to Linux client on macOS host Christian Schoenebeck
@ 2022-04-21 15:07 ` Christian Schoenebeck
  2022-04-21 16:40   ` Greg Kurz
  4 siblings, 1 reply; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-21 15:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Will Cohen, Greg Kurz, Michael Roitzsch, Keno Fischer,
	Akihiko Odaki, qemu-stable

When mapped POSIX ACL is used, we are ignoring errors when trying
to remove a POSIX ACL xattr that does not exist. On Linux hosts we
would get ENODATA in such cases, on macOS hosts however we get
ENOATTR instead.

As we can be sure that ENOATTR is defined as being identical on Linux
hosts (at least by qemu/xattr.h), it is safe to fix this issue by
simply comparing against ENOATTR instead of ENODATA.

This patch fixes e.g. a command on Linux guest like:

  cp --preserve=mode old new

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Link: https://lore.kernel.org/qemu-devel/2866993.yOYK24bMf6@silver/
---
 hw/9pfs/9p-posix-acl.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/9pfs/9p-posix-acl.c b/hw/9pfs/9p-posix-acl.c
index eadae270dd..4b2cb3c66c 100644
--- a/hw/9pfs/9p-posix-acl.c
+++ b/hw/9pfs/9p-posix-acl.c
@@ -65,7 +65,11 @@ static int mp_pacl_removexattr(FsContext *ctx,
     int ret;
 
     ret = local_removexattr_nofollow(ctx, path, MAP_ACL_ACCESS);
-    if (ret == -1 && errno == ENODATA) {
+    /*
+     * macOS returns ENOATTR (!=ENODATA on macOS), whereas Linux returns
+     * ENODATA (==ENOATTR on Linux), so checking for ENOATTR is fine
+     */
+    if (ret == -1 && errno == ENOATTR) {
         /*
          * We don't get ENODATA error when trying to remove a
          * posix acl that is not present. So don't throw the error
@@ -115,7 +119,11 @@ static int mp_dacl_removexattr(FsContext *ctx,
     int ret;
 
     ret = local_removexattr_nofollow(ctx, path, MAP_ACL_DEFAULT);
-    if (ret == -1 && errno == ENODATA) {
+    /*
+     * macOS returns ENOATTR (!=ENODATA on macOS), whereas Linux returns
+     * ENODATA (==ENOATTR on Linux), so checking for ENOATTR is fine
+     */
+    if (ret == -1 && errno == ENOATTR) {
         /*
          * We don't get ENODATA error when trying to remove a
          * posix acl that is not present. So don't throw the error
-- 
2.32.0 (Apple Git-132)



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 0/5] 9pfs: macOS host fixes
@ 2022-04-21 15:08 Christian Schoenebeck
  2022-04-21 15:07 ` [PATCH v2 1/5] 9pfs: fix qemu_mknodat(S_IFREG) on macOS Christian Schoenebeck
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-21 15:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Will Cohen, Greg Kurz, Michael Roitzsch, Keno Fischer,
	Akihiko Odaki, qemu-stable

A bunch of fixes for recently added (QEMU 7.0) 9p support on macOS hosts.

Note: there are still issues to address with case-insensitive file systems
on macOS hosts. But I'll send a separate RFC on that issue.

v1 -> v2:

  * Close open file descriptor and return 0 on success instead of descriptor.
    [patch 1], [patch 2]

  * Compare res == -1 instead of res < 0. [patch 1], [patch 2]

  * Move dev_t and errno translation functions from 9p.c to 9p-util.h
    [patch 3], [patch 4]

  * Extend in-code comment that dev_t conversion is actually in glibc format.
    [patch 3]

  * Make it clear from the commit log that not having translated errnos
    before actually violated 9p2000.L protocol. [patch 4]

  * Simply compare against ENOATTR only, instead of comparing against both
    ENOATTR and ENODATA and describe in commit log and in-code comment why
    that is fine. [patch 5]

Christian Schoenebeck (5):
  9pfs: fix qemu_mknodat(S_IFREG) on macOS
  9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  9pfs: fix wrong encoding of rdev field in Rgetattr on macOS
  9pfs: fix wrong errno being sent to Linux client on macOS host
  9pfs: fix removing non-existent POSIX ACL xattr on macOS host

 hw/9pfs/9p-posix-acl.c   | 12 +++++--
 hw/9pfs/9p-util-darwin.c | 36 ++++++++++++++++++++-
 hw/9pfs/9p-util.h        | 69 ++++++++++++++++++++++++++++++++++++++++
 hw/9pfs/9p.c             |  4 ++-
 4 files changed, 117 insertions(+), 4 deletions(-)

-- 
2.32.0 (Apple Git-132)



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/5] 9pfs: fix qemu_mknodat(S_IFREG) on macOS
  2022-04-21 15:07 ` [PATCH v2 1/5] 9pfs: fix qemu_mknodat(S_IFREG) on macOS Christian Schoenebeck
@ 2022-04-21 16:32   ` Greg Kurz
  0 siblings, 0 replies; 27+ messages in thread
From: Greg Kurz @ 2022-04-21 16:32 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: qemu-devel, qemu-stable, Keno Fischer, Michael Roitzsch,
	Will Cohen, Akihiko Odaki

On Thu, 21 Apr 2022 17:07:38 +0200
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> mknod() on macOS does not support creating regular files, so
> divert to openat_file() if S_IFREG is passed with mode argument.
> 
> Furthermore, 'man 2 mknodat' on Linux says: "Zero file type is
> equivalent to type S_IFREG".
> 
> Link: https://lore.kernel.org/qemu-devel/17933734.zYzKuhC07K@silver/
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> Reviewed-by: Will Cohen <wwcohen@gmail.com>
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/9pfs/9p-util-darwin.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/hw/9pfs/9p-util-darwin.c b/hw/9pfs/9p-util-darwin.c
> index bec0253474..e24d09763a 100644
> --- a/hw/9pfs/9p-util-darwin.c
> +++ b/hw/9pfs/9p-util-darwin.c
> @@ -77,6 +77,15 @@ int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name,
>  int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t dev)
>  {
>      int preserved_errno, err;
> +
> +    if (S_ISREG(mode) || !(mode & S_IFMT)) {
> +        int fd = openat_file(dirfd, filename, O_CREAT, mode);
> +        if (fd == -1) {
> +            return fd;
> +        }
> +        close(fd);
> +        return 0;
> +    }
>      if (!pthread_fchdir_np) {
>          error_report_once("pthread_fchdir_np() not available on this version of macOS");
>          return -ENOTSUP;



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-21 15:07 ` [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) " Christian Schoenebeck
@ 2022-04-21 16:36   ` Greg Kurz
  2022-04-21 17:29     ` Christian Schoenebeck
  2022-04-22  2:43   ` Akihiko Odaki
  1 sibling, 1 reply; 27+ messages in thread
From: Greg Kurz @ 2022-04-21 16:36 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: qemu-devel, qemu-stable, Keno Fischer, Michael Roitzsch,
	Will Cohen, Akihiko Odaki

On Thu, 21 Apr 2022 17:07:43 +0200
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> mknod() on macOS does not support creating sockets, so divert to
> call sequence socket(), bind() and chmod() respectively if S_IFSOCK
> was passed with mode argument.
> 
> Link: https://lore.kernel.org/qemu-devel/17933734.zYzKuhC07K@silver/
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> Reviewed-by: Will Cohen <wwcohen@gmail.com>
> ---
>  hw/9pfs/9p-util-darwin.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/9pfs/9p-util-darwin.c b/hw/9pfs/9p-util-darwin.c
> index e24d09763a..39308f2a45 100644
> --- a/hw/9pfs/9p-util-darwin.c
> +++ b/hw/9pfs/9p-util-darwin.c
> @@ -74,6 +74,27 @@ int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name,
>   */
>  #if defined CONFIG_PTHREAD_FCHDIR_NP
>  
> +static int create_socket_file_at_cwd(const char *filename, mode_t mode) {
> +    int fd, err;
> +    struct sockaddr_un addr = {
> +        .sun_family = AF_UNIX
> +    };
> +
> +    fd = socket(PF_UNIX, SOCK_DGRAM, 0);
> +    if (fd == -1) {
> +        return fd;
> +    }
> +    snprintf(addr.sun_path, sizeof(addr.sun_path), "./%s", filename);
> +    err = bind(fd, (struct sockaddr *) &addr, sizeof(addr));
> +    if (err == -1) {
> +        goto out;
> +    }
> +    err = chmod(addr.sun_path, mode);
> +out:
> +    close(fd);

You need close_preserve_errno() here.

Rest LGTM, so with that fixed, you can add:

Reviewed-by: Greg Kurz <groug@kaod.org>

> +    return err;
> +}
> +
>  int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t dev)
>  {
>      int preserved_errno, err;
> @@ -93,7 +114,11 @@ int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t dev)
>      if (pthread_fchdir_np(dirfd) < 0) {
>          return -1;
>      }
> -    err = mknod(filename, mode, dev);
> +    if (S_ISSOCK(mode)) {
> +        err = create_socket_file_at_cwd(filename, mode);
> +    } else {
> +        err = mknod(filename, mode, dev);
> +    }
>      preserved_errno = errno;
>      /* Stop using the thread-local cwd */
>      pthread_fchdir_np(-1);



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 3/5] 9pfs: fix wrong encoding of rdev field in Rgetattr on macOS
  2022-04-21 15:07 ` [PATCH v2 3/5] 9pfs: fix wrong encoding of rdev field in Rgetattr " Christian Schoenebeck
@ 2022-04-21 16:39   ` Greg Kurz
  0 siblings, 0 replies; 27+ messages in thread
From: Greg Kurz @ 2022-04-21 16:39 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: qemu-devel, qemu-stable, Keno Fischer, Michael Roitzsch,
	Will Cohen, Akihiko Odaki

On Thu, 21 Apr 2022 17:07:46 +0200
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> The 'rdev' field in 9p reponse 'Rgetattr' is of type dev_t,
> which is actually a system dependant type and therefore both the
> size and encoding of dev_t differ between macOS and Linux.
> 
> So far we have sent 'rdev' to guest in host's dev_t format as-is,
> which caused devices to appear with wrong device numbers on
> guests running on macOS hosts, eventually leading to various
> misbehaviours on guest in conjunction with device files.
> 
> This patch fixes this issue by converting the device number from
> host's dev_t format to Linux dev_t format. As 9p request
> 'Tgettattr' is exclusive to protocol version 9p2000.L, it should
> be fair to assume that 'rdev' field is assumed to be in Linux dev_t
> format by client as well.
> 
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> Link: https://lore.kernel.org/qemu-devel/20220421093056.5ab1e7ed@bahia/
> Reviewed-by: Greg Kurz <groug@kaod.org>
> ---

Reviewed-again-by: Greg Kurz <groug@kaod.org>

>  hw/9pfs/9p-util.h | 39 +++++++++++++++++++++++++++++++++++++++
>  hw/9pfs/9p.c      |  2 +-
>  2 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
> index 97e681e167..2cc9a5dbfb 100644
> --- a/hw/9pfs/9p-util.h
> +++ b/hw/9pfs/9p-util.h
> @@ -19,6 +19,45 @@
>  #define O_PATH_9P_UTIL 0
>  #endif
>  
> +#if !defined(CONFIG_LINUX)
> +
> +/*
> + * Generates a Linux device number (a.k.a. dev_t) for given device major
> + * and minor numbers.
> + *
> + * To be more precise: it generates a device number in glibc's format
> + * (MMMM_Mmmm_mmmM_MMmm, 64 bits) actually, which is compatible with
> + * Linux's format (mmmM_MMmm, 32 bits), as described in <bits/sysmacros.h>.
> + */
> +static inline uint64_t makedev_dotl(uint32_t dev_major, uint32_t dev_minor)
> +{
> +    uint64_t dev;
> +
> +    // from glibc sysmacros.h:
> +    dev  = (((uint64_t) (dev_major & 0x00000fffu)) <<  8);
> +    dev |= (((uint64_t) (dev_major & 0xfffff000u)) << 32);
> +    dev |= (((uint64_t) (dev_minor & 0x000000ffu)) <<  0);
> +    dev |= (((uint64_t) (dev_minor & 0xffffff00u)) << 12);
> +    return dev;
> +}
> +
> +#endif
> +
> +/*
> + * Converts given device number from host's device number format to Linux
> + * device number format. As both the size of type dev_t and encoding of
> + * dev_t is system dependant, we have to convert them for Linux guests if
> + * host is not running Linux.
> + */
> +static inline uint64_t host_dev_to_dotl_dev(dev_t dev)
> +{
> +#ifdef CONFIG_LINUX
> +    return dev;
> +#else
> +    return makedev_dotl(major(dev), minor(dev));
> +#endif
> +}
> +
>  #ifdef CONFIG_DARWIN
>  #define qemu_fgetxattr(...) fgetxattr(__VA_ARGS__, 0, 0)
>  #define qemu_lgetxattr(...) getxattr(__VA_ARGS__, 0, XATTR_NOFOLLOW)
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index 225f31fc31..4a296a0b94 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -1327,7 +1327,7 @@ static int stat_to_v9stat_dotl(V9fsPDU *pdu, const struct stat *stbuf,
>      v9lstat->st_nlink = stbuf->st_nlink;
>      v9lstat->st_uid = stbuf->st_uid;
>      v9lstat->st_gid = stbuf->st_gid;
> -    v9lstat->st_rdev = stbuf->st_rdev;
> +    v9lstat->st_rdev = host_dev_to_dotl_dev(stbuf->st_rdev);
>      v9lstat->st_size = stbuf->st_size;
>      v9lstat->st_blksize = stat_to_iounit(pdu, stbuf);
>      v9lstat->st_blocks = stbuf->st_blocks;



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 4/5] 9pfs: fix wrong errno being sent to Linux client on macOS host
  2022-04-21 15:07 ` [PATCH v2 4/5] 9pfs: fix wrong errno being sent to Linux client on macOS host Christian Schoenebeck
@ 2022-04-21 16:39   ` Greg Kurz
  0 siblings, 0 replies; 27+ messages in thread
From: Greg Kurz @ 2022-04-21 16:39 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: qemu-devel, qemu-stable, Keno Fischer, Michael Roitzsch,
	Will Cohen, Akihiko Odaki

On Thu, 21 Apr 2022 17:07:49 +0200
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> Linux and macOS only share some errno definitions with equal macro
> name and value. In fact most mappings for errno are completely
> different on the two systems.
> 
> This patch converts some important errno values from macOS host to
> corresponding Linux errno values before eventually sending such error
> codes along with 'Rlerror' replies (if 9p2000.L is used that is). Not
> having translated errnos before violated the 9p2000.L protocol spec,
> which says:
> 
>   "
>   size[4] Rlerror tag[2] ecode[4]
> 
>   ... ecode is a numerical Linux errno.
>   "
> 
>   https://github.com/chaos/diod/wiki/protocol#lerror----return-error-code
> 
> This patch fixes a bunch of misbehaviours when running a Linux client
> on macOS host. For instance this patch fixes:
> 
>   mount -t 9p -o posixacl ...
> 
> on Linux guest if security_mode=mapped was used for 9p server, which
> refused to mount successfully, because macOS returned ENOATTR==93
> when client tried to retrieve POSIX ACL xattrs, because errno 93
> is defined as EPROTONOSUPPORT==93 on Linux, so Linux client believed
> that xattrs were not supported by filesystem on host in general.
> 
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> Link: https://lore.kernel.org/qemu-devel/20220421124835.3e664669@bahia/
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/9pfs/9p-util.h | 30 ++++++++++++++++++++++++++++++
>  hw/9pfs/9p.c      |  2 ++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
> index 2cc9a5dbfb..c3526144c9 100644
> --- a/hw/9pfs/9p-util.h
> +++ b/hw/9pfs/9p-util.h
> @@ -58,6 +58,36 @@ static inline uint64_t host_dev_to_dotl_dev(dev_t dev)
>  #endif
>  }
>  
> +/* Translates errno from host -> Linux if needed */
> +static inline int errno_to_dotl(int err) {
> +#if defined(CONFIG_LINUX)
> +    /* nothing to translate (Linux -> Linux) */
> +#elif defined(CONFIG_DARWIN)
> +    /*
> +     * translation mandatory for macOS hosts
> +     *
> +     * FIXME: Only most important errnos translated here yet, this should be
> +     * extended to as many errnos being translated as possible in future.
> +     */
> +    if (err == ENAMETOOLONG) {
> +        err = 36; /* ==ENAMETOOLONG on Linux */
> +    } else if (err == ENOTEMPTY) {
> +        err = 39; /* ==ENOTEMPTY on Linux */
> +    } else if (err == ELOOP) {
> +        err = 40; /* ==ELOOP on Linux */
> +    } else if (err == ENOATTR) {
> +        err = 61; /* ==ENODATA on Linux */
> +    } else if (err == ENOTSUP) {
> +        err = 95; /* ==EOPNOTSUPP on Linux */
> +    } else if (err == EOPNOTSUPP) {
> +        err = 95; /* ==EOPNOTSUPP on Linux */
> +    }
> +#else
> +#error Missing errno translation to Linux for this host system
> +#endif
> +    return err;
> +}
> +
>  #ifdef CONFIG_DARWIN
>  #define qemu_fgetxattr(...) fgetxattr(__VA_ARGS__, 0, 0)
>  #define qemu_lgetxattr(...) getxattr(__VA_ARGS__, 0, XATTR_NOFOLLOW)
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index 4a296a0b94..0cd0c14c2a 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -1054,6 +1054,8 @@ static void coroutine_fn pdu_complete(V9fsPDU *pdu, ssize_t len)
>              }
>              len += ret;
>              id = P9_RERROR;
> +        } else {
> +            err = errno_to_dotl(err);
>          }
>  
>          ret = pdu_marshal(pdu, len, "d", err);



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 5/5] 9pfs: fix removing non-existent POSIX ACL xattr on macOS host
  2022-04-21 15:07 ` [PATCH v2 5/5] 9pfs: fix removing non-existent POSIX ACL xattr " Christian Schoenebeck
@ 2022-04-21 16:40   ` Greg Kurz
  0 siblings, 0 replies; 27+ messages in thread
From: Greg Kurz @ 2022-04-21 16:40 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: qemu-devel, qemu-stable, Keno Fischer, Michael Roitzsch,
	Will Cohen, Akihiko Odaki

On Thu, 21 Apr 2022 17:07:55 +0200
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> When mapped POSIX ACL is used, we are ignoring errors when trying
> to remove a POSIX ACL xattr that does not exist. On Linux hosts we
> would get ENODATA in such cases, on macOS hosts however we get
> ENOATTR instead.
> 
> As we can be sure that ENOATTR is defined as being identical on Linux
> hosts (at least by qemu/xattr.h), it is safe to fix this issue by
> simply comparing against ENOATTR instead of ENODATA.
> 
> This patch fixes e.g. a command on Linux guest like:
> 
>   cp --preserve=mode old new
> 
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> Link: https://lore.kernel.org/qemu-devel/2866993.yOYK24bMf6@silver/
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/9pfs/9p-posix-acl.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/9pfs/9p-posix-acl.c b/hw/9pfs/9p-posix-acl.c
> index eadae270dd..4b2cb3c66c 100644
> --- a/hw/9pfs/9p-posix-acl.c
> +++ b/hw/9pfs/9p-posix-acl.c
> @@ -65,7 +65,11 @@ static int mp_pacl_removexattr(FsContext *ctx,
>      int ret;
>  
>      ret = local_removexattr_nofollow(ctx, path, MAP_ACL_ACCESS);
> -    if (ret == -1 && errno == ENODATA) {
> +    /*
> +     * macOS returns ENOATTR (!=ENODATA on macOS), whereas Linux returns
> +     * ENODATA (==ENOATTR on Linux), so checking for ENOATTR is fine
> +     */
> +    if (ret == -1 && errno == ENOATTR) {
>          /*
>           * We don't get ENODATA error when trying to remove a
>           * posix acl that is not present. So don't throw the error
> @@ -115,7 +119,11 @@ static int mp_dacl_removexattr(FsContext *ctx,
>      int ret;
>  
>      ret = local_removexattr_nofollow(ctx, path, MAP_ACL_DEFAULT);
> -    if (ret == -1 && errno == ENODATA) {
> +    /*
> +     * macOS returns ENOATTR (!=ENODATA on macOS), whereas Linux returns
> +     * ENODATA (==ENOATTR on Linux), so checking for ENOATTR is fine
> +     */
> +    if (ret == -1 && errno == ENOATTR) {
>          /*
>           * We don't get ENODATA error when trying to remove a
>           * posix acl that is not present. So don't throw the error



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-21 16:36   ` Greg Kurz
@ 2022-04-21 17:29     ` Christian Schoenebeck
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-21 17:29 UTC (permalink / raw)
  To: qemu-devel, qemu-stable
  Cc: Keno Fischer, Michael Roitzsch, Will Cohen, Akihiko Odaki, Greg Kurz

On Donnerstag, 21. April 2022 18:36:31 CEST Greg Kurz wrote:
> On Thu, 21 Apr 2022 17:07:43 +0200
> 
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > mknod() on macOS does not support creating sockets, so divert to
> > call sequence socket(), bind() and chmod() respectively if S_IFSOCK
> > was passed with mode argument.
> > 
> > Link: https://lore.kernel.org/qemu-devel/17933734.zYzKuhC07K@silver/
> > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > Reviewed-by: Will Cohen <wwcohen@gmail.com>
> > ---
> > 
> >  hw/9pfs/9p-util-darwin.c | 27 ++++++++++++++++++++++++++-
> >  1 file changed, 26 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/9pfs/9p-util-darwin.c b/hw/9pfs/9p-util-darwin.c
> > index e24d09763a..39308f2a45 100644
> > --- a/hw/9pfs/9p-util-darwin.c
> > +++ b/hw/9pfs/9p-util-darwin.c
> > @@ -74,6 +74,27 @@ int fsetxattrat_nofollow(int dirfd, const char
> > *filename, const char *name,> 
> >   */
> >  
> >  #if defined CONFIG_PTHREAD_FCHDIR_NP
> > 
> > +static int create_socket_file_at_cwd(const char *filename, mode_t mode) {
> > +    int fd, err;
> > +    struct sockaddr_un addr = {
> > +        .sun_family = AF_UNIX
> > +    };
> > +
> > +    fd = socket(PF_UNIX, SOCK_DGRAM, 0);
> > +    if (fd == -1) {
> > +        return fd;
> > +    }
> > +    snprintf(addr.sun_path, sizeof(addr.sun_path), "./%s", filename);
> > +    err = bind(fd, (struct sockaddr *) &addr, sizeof(addr));
> > +    if (err == -1) {
> > +        goto out;
> > +    }
> > +    err = chmod(addr.sun_path, mode);
> > +out:
> > +    close(fd);
> 
> You need close_preserve_errno() here.
> 
> Rest LGTM, so with that fixed, you can add:
> 
> Reviewed-by: Greg Kurz <groug@kaod.org>

Right, unlike patch 1, we might come from an error path here when closing.

I'll just s/close/close_preserve_errno/ this on my end before queuing, without 
sending a v3, unless somebody finds something else in this series.

Thanks!

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-21 15:07 ` [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) " Christian Schoenebeck
  2022-04-21 16:36   ` Greg Kurz
@ 2022-04-22  2:43   ` Akihiko Odaki
  2022-04-22 14:06     ` Christian Schoenebeck
  1 sibling, 1 reply; 27+ messages in thread
From: Akihiko Odaki @ 2022-04-22  2:43 UTC (permalink / raw)
  To: Christian Schoenebeck, qemu-devel
  Cc: Michael Roitzsch, Keno Fischer, Will Cohen, Greg Kurz, qemu-stable

On 2022/04/22 0:07, Christian Schoenebeck wrote:
> mknod() on macOS does not support creating sockets, so divert to
> call sequence socket(), bind() and chmod() respectively if S_IFSOCK
> was passed with mode argument.
> 
> Link: https://lore.kernel.org/qemu-devel/17933734.zYzKuhC07K@silver/
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> Reviewed-by: Will Cohen <wwcohen@gmail.com>
> ---
>   hw/9pfs/9p-util-darwin.c | 27 ++++++++++++++++++++++++++-
>   1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/9pfs/9p-util-darwin.c b/hw/9pfs/9p-util-darwin.c
> index e24d09763a..39308f2a45 100644
> --- a/hw/9pfs/9p-util-darwin.c
> +++ b/hw/9pfs/9p-util-darwin.c
> @@ -74,6 +74,27 @@ int fsetxattrat_nofollow(int dirfd, const char *filename, const char *name,
>    */
>   #if defined CONFIG_PTHREAD_FCHDIR_NP
>   
> +static int create_socket_file_at_cwd(const char *filename, mode_t mode) {
> +    int fd, err;
> +    struct sockaddr_un addr = {
> +        .sun_family = AF_UNIX
> +    };
> +
> +    fd = socket(PF_UNIX, SOCK_DGRAM, 0);
> +    if (fd == -1) {
> +        return fd;
> +    }
> +    snprintf(addr.sun_path, sizeof(addr.sun_path), "./%s", filename);

It would result in an incorrect path if the path does not fit in 
addr.sun_path. It should report an explicit error instead.

> +    err = bind(fd, (struct sockaddr *) &addr, sizeof(addr));
> +    if (err == -1) {
> +        goto out;

You may close(fd) as soon as bind() returns (before checking the 
returned value) and eliminate goto.

> +    }
> +    err = chmod(addr.sun_path, mode);

I'm not sure if it is fine to have a time window between bind() and 
chmod(). Do you have some rationale?

Regards,
Akihiko Odaki

> +out:
> +    close(fd);
> +    return err;
> +}
> +
>   int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t dev)
>   {
>       int preserved_errno, err;
> @@ -93,7 +114,11 @@ int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t dev)
>       if (pthread_fchdir_np(dirfd) < 0) {
>           return -1;
>       }
> -    err = mknod(filename, mode, dev);
> +    if (S_ISSOCK(mode)) {
> +        err = create_socket_file_at_cwd(filename, mode);
> +    } else {
> +        err = mknod(filename, mode, dev);
> +    }
>       preserved_errno = errno;
>       /* Stop using the thread-local cwd */
>       pthread_fchdir_np(-1);



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-22  2:43   ` Akihiko Odaki
@ 2022-04-22 14:06     ` Christian Schoenebeck
  2022-04-23  4:33       ` Akihiko Odaki
  0 siblings, 1 reply; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-22 14:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael Roitzsch, Keno Fischer, Will Cohen, Greg Kurz,
	qemu-stable, Akihiko Odaki

On Freitag, 22. April 2022 04:43:40 CEST Akihiko Odaki wrote:
> On 2022/04/22 0:07, Christian Schoenebeck wrote:
> > mknod() on macOS does not support creating sockets, so divert to
> > call sequence socket(), bind() and chmod() respectively if S_IFSOCK
> > was passed with mode argument.
> > 
> > Link: https://lore.kernel.org/qemu-devel/17933734.zYzKuhC07K@silver/
> > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > Reviewed-by: Will Cohen <wwcohen@gmail.com>
> > ---
> > 
> >   hw/9pfs/9p-util-darwin.c | 27 ++++++++++++++++++++++++++-
> >   1 file changed, 26 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/9pfs/9p-util-darwin.c b/hw/9pfs/9p-util-darwin.c
> > index e24d09763a..39308f2a45 100644
> > --- a/hw/9pfs/9p-util-darwin.c
> > +++ b/hw/9pfs/9p-util-darwin.c
> > @@ -74,6 +74,27 @@ int fsetxattrat_nofollow(int dirfd, const char
> > *filename, const char *name,> 
> >    */
> >   
> >   #if defined CONFIG_PTHREAD_FCHDIR_NP
> > 
> > +static int create_socket_file_at_cwd(const char *filename, mode_t mode) {
> > +    int fd, err;
> > +    struct sockaddr_un addr = {
> > +        .sun_family = AF_UNIX
> > +    };
> > +
> > +    fd = socket(PF_UNIX, SOCK_DGRAM, 0);
> > +    if (fd == -1) {
> > +        return fd;
> > +    }
> > +    snprintf(addr.sun_path, sizeof(addr.sun_path), "./%s", filename);
> 
> It would result in an incorrect path if the path does not fit in
> addr.sun_path. It should report an explicit error instead.

Looking at its header file, 'sun_path' is indeed defined on macOS with an 
oddly small size of only 104 bytes. So yes, I should explicitly handle that 
error case.

I'll post a v3.

> > +    err = bind(fd, (struct sockaddr *) &addr, sizeof(addr));
> > +    if (err == -1) {
> > +        goto out;
> 
> You may close(fd) as soon as bind() returns (before checking the
> returned value) and eliminate goto.

Yeah, I thought about that alternative, but found it a bit ugly, and probably 
also counter-productive in case this function might get extended with more 
error pathes in future. Not that I would insist on the current solution 
though.

> > +    }
> > +    err = chmod(addr.sun_path, mode);
> 
> I'm not sure if it is fine to have a time window between bind() and
> chmod(). Do you have some rationale?

Good question. QEMU's 9p server is multi-threaded; all 9p requests come in 
serialized and the 9p server controller portion (9p.c) is only running on QEMU 
main thread, but the actual filesystem driver calls are then dispatched to 
QEMU worker threads and therefore running concurrently at this point:

https://wiki.qemu.org/Documentation/9p#Threads_and_Coroutines

Similar situation on Linux 9p client side: it handles access to a mounted 9p 
filesystem concurrently, requests are then serialized by 9p driver on Linux 
and sent over wire to 9p server (host).

So yes, there might be implications by that short time windows. But could that 
be exploited on macOS hosts in practice?

The socket file would have mode srwxr-xr-x for a short moment.

For security_model=mapped* this should not be a problem.

For security_model=none|passhrough, in theory, maybe? But how likely is that? 
If you are using a Linux client for instance, trying to brute-force opening 
the socket file, the client would send several 9p commands (Twalk, Tgetattr, 
Topen, probably more). The time window of the two commands above should be 
much smaller than that and I would expect one of the 9p commands to error out 
in between.

What would be a viable approach to avoid this issue on macOS?

> > +out:
> > +    close(fd);
> > +    return err;
> > +}
> > +
> > 
> >   int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t
> >   dev)
> >   {
> >   
> >       int preserved_errno, err;
> > 
> > @@ -93,7 +114,11 @@ int qemu_mknodat(int dirfd, const char *filename,
> > mode_t mode, dev_t dev)> 
> >       if (pthread_fchdir_np(dirfd) < 0) {
> >       
> >           return -1;
> >       
> >       }
> > 
> > -    err = mknod(filename, mode, dev);
> > +    if (S_ISSOCK(mode)) {
> > +        err = create_socket_file_at_cwd(filename, mode);
> > +    } else {
> > +        err = mknod(filename, mode, dev);
> > +    }
> > 
> >       preserved_errno = errno;
> >       /* Stop using the thread-local cwd */
> >       pthread_fchdir_np(-1);




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-22 14:06     ` Christian Schoenebeck
@ 2022-04-23  4:33       ` Akihiko Odaki
  2022-04-24 18:45         ` Christian Schoenebeck
  0 siblings, 1 reply; 27+ messages in thread
From: Akihiko Odaki @ 2022-04-23  4:33 UTC (permalink / raw)
  To: Christian Schoenebeck, qemu-devel
  Cc: Keno Fischer, Michael Roitzsch, Will Cohen, Greg Kurz, qemu-stable

On 2022/04/22 23:06, Christian Schoenebeck wrote:
> On Freitag, 22. April 2022 04:43:40 CEST Akihiko Odaki wrote:
>> On 2022/04/22 0:07, Christian Schoenebeck wrote:
>>> mknod() on macOS does not support creating sockets, so divert to
>>> call sequence socket(), bind() and chmod() respectively if S_IFSOCK
>>> was passed with mode argument.
>>>
>>> Link: https://lore.kernel.org/qemu-devel/17933734.zYzKuhC07K@silver/
>>> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
>>> Reviewed-by: Will Cohen <wwcohen@gmail.com>
>>> ---
>>>
>>>    hw/9pfs/9p-util-darwin.c | 27 ++++++++++++++++++++++++++-
>>>    1 file changed, 26 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/9pfs/9p-util-darwin.c b/hw/9pfs/9p-util-darwin.c
>>> index e24d09763a..39308f2a45 100644
>>> --- a/hw/9pfs/9p-util-darwin.c
>>> +++ b/hw/9pfs/9p-util-darwin.c
>>> @@ -74,6 +74,27 @@ int fsetxattrat_nofollow(int dirfd, const char
>>> *filename, const char *name,>
>>>     */
>>>    
>>>    #if defined CONFIG_PTHREAD_FCHDIR_NP
>>>
>>> +static int create_socket_file_at_cwd(const char *filename, mode_t mode) {
>>> +    int fd, err;
>>> +    struct sockaddr_un addr = {
>>> +        .sun_family = AF_UNIX
>>> +    };
>>> +
>>> +    fd = socket(PF_UNIX, SOCK_DGRAM, 0);
>>> +    if (fd == -1) {
>>> +        return fd;
>>> +    }
>>> +    snprintf(addr.sun_path, sizeof(addr.sun_path), "./%s", filename);
>>
>> It would result in an incorrect path if the path does not fit in
>> addr.sun_path. It should report an explicit error instead.
> 
> Looking at its header file, 'sun_path' is indeed defined on macOS with an
> oddly small size of only 104 bytes. So yes, I should explicitly handle that
> error case.
> 
> I'll post a v3.
> 
>>> +    err = bind(fd, (struct sockaddr *) &addr, sizeof(addr));
>>> +    if (err == -1) {
>>> +        goto out;
>>
>> You may close(fd) as soon as bind() returns (before checking the
>> returned value) and eliminate goto.
> 
> Yeah, I thought about that alternative, but found it a bit ugly, and probably
> also counter-productive in case this function might get extended with more
> error pathes in future. Not that I would insist on the current solution
> though.

I'm happy with the explanation. Thanks.

> 
>>> +    }
>>> +    err = chmod(addr.sun_path, mode);
>>
>> I'm not sure if it is fine to have a time window between bind() and
>> chmod(). Do you have some rationale?
> 
> Good question. QEMU's 9p server is multi-threaded; all 9p requests come in
> serialized and the 9p server controller portion (9p.c) is only running on QEMU
> main thread, but the actual filesystem driver calls are then dispatched to
> QEMU worker threads and therefore running concurrently at this point:
> 
> https://wiki.qemu.org/Documentation/9p#Threads_and_Coroutines
> 
> Similar situation on Linux 9p client side: it handles access to a mounted 9p
> filesystem concurrently, requests are then serialized by 9p driver on Linux
> and sent over wire to 9p server (host).
> 
> So yes, there might be implications by that short time windows. But could that
> be exploited on macOS hosts in practice?
> 
> The socket file would have mode srwxr-xr-x for a short moment.
> 
> For security_model=mapped* this should not be a problem.
> 
> For security_model=none|passhrough, in theory, maybe? But how likely is that?
> If you are using a Linux client for instance, trying to brute-force opening
> the socket file, the client would send several 9p commands (Twalk, Tgetattr,
> Topen, probably more). The time window of the two commands above should be
> much smaller than that and I would expect one of the 9p commands to error out
> in between.
> 
> What would be a viable approach to avoid this issue on macOS?

It is unlikely that a naive brute-force approach will succeed to 
exploit. The more concerning scenario is that the attacker uses the 
knowledge of the underlying implementation of macOS to cause resource 
contention to widen the window. Whether an exploitation is viable 
depends on how much time you spend digging XNU.

However, I'm also not sure if it really *has* a race condition. Looking 
at v9fs_co_mknod(), it sequentially calls s->ops->mknod() and 
s->ops->lstat(). It also results in an entity called "path name based 
fid" in the code, which inherently cannot identify a file when it is 
renamed or recreated.

If there is some rationale it is safe, it may also be applied to the 
sequence of bind() and chmod(). Can anyone explain the sequence of 
s->ops->mknod() and s->ops->lstat() or path name based fid in general?

Regards,
Akihiko Odaki

> 
>>> +out:
>>> +    close(fd);
>>> +    return err;
>>> +}
>>> +
>>>
>>>    int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t
>>>    dev)
>>>    {
>>>    
>>>        int preserved_errno, err;
>>>
>>> @@ -93,7 +114,11 @@ int qemu_mknodat(int dirfd, const char *filename,
>>> mode_t mode, dev_t dev)>
>>>        if (pthread_fchdir_np(dirfd) < 0) {
>>>        
>>>            return -1;
>>>        
>>>        }
>>>
>>> -    err = mknod(filename, mode, dev);
>>> +    if (S_ISSOCK(mode)) {
>>> +        err = create_socket_file_at_cwd(filename, mode);
>>> +    } else {
>>> +        err = mknod(filename, mode, dev);
>>> +    }
>>>
>>>        preserved_errno = errno;
>>>        /* Stop using the thread-local cwd */
>>>        pthread_fchdir_np(-1);
> 
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-23  4:33       ` Akihiko Odaki
@ 2022-04-24 18:45         ` Christian Schoenebeck
  2022-04-26  3:57           ` Akihiko Odaki
  0 siblings, 1 reply; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-24 18:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Keno Fischer, Michael Roitzsch, Will Cohen, Greg Kurz,
	qemu-stable, Akihiko Odaki

On Samstag, 23. April 2022 06:33:50 CEST Akihiko Odaki wrote:
> On 2022/04/22 23:06, Christian Schoenebeck wrote:
> > On Freitag, 22. April 2022 04:43:40 CEST Akihiko Odaki wrote:
> >> On 2022/04/22 0:07, Christian Schoenebeck wrote:
> >>> mknod() on macOS does not support creating sockets, so divert to
> >>> call sequence socket(), bind() and chmod() respectively if S_IFSOCK
> >>> was passed with mode argument.
> >>> 
> >>> Link: https://lore.kernel.org/qemu-devel/17933734.zYzKuhC07K@silver/
> >>> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> >>> Reviewed-by: Will Cohen <wwcohen@gmail.com>
> >>> ---
> >>> 
> >>>    hw/9pfs/9p-util-darwin.c | 27 ++++++++++++++++++++++++++-
> >>>    1 file changed, 26 insertions(+), 1 deletion(-)
> >>> 
> >>> diff --git a/hw/9pfs/9p-util-darwin.c b/hw/9pfs/9p-util-darwin.c
> >>> index e24d09763a..39308f2a45 100644
> >>> --- a/hw/9pfs/9p-util-darwin.c
> >>> +++ b/hw/9pfs/9p-util-darwin.c
> >>> @@ -74,6 +74,27 @@ int fsetxattrat_nofollow(int dirfd, const char
> >>> *filename, const char *name,>
> >>> 
> >>>     */
> >>>    
> >>>    #if defined CONFIG_PTHREAD_FCHDIR_NP
> >>> 
> >>> +static int create_socket_file_at_cwd(const char *filename, mode_t mode)
> >>> {
> >>> +    int fd, err;
> >>> +    struct sockaddr_un addr = {
> >>> +        .sun_family = AF_UNIX
> >>> +    };
> >>> +
> >>> +    fd = socket(PF_UNIX, SOCK_DGRAM, 0);
> >>> +    if (fd == -1) {
> >>> +        return fd;
> >>> +    }
> >>> +    snprintf(addr.sun_path, sizeof(addr.sun_path), "./%s", filename);
> >> 
> >> It would result in an incorrect path if the path does not fit in
> >> addr.sun_path. It should report an explicit error instead.
> > 
> > Looking at its header file, 'sun_path' is indeed defined on macOS with an
> > oddly small size of only 104 bytes. So yes, I should explicitly handle
> > that
> > error case.
> > 
> > I'll post a v3.
> > 
> >>> +    err = bind(fd, (struct sockaddr *) &addr, sizeof(addr));
> >>> +    if (err == -1) {
> >>> +        goto out;
> >> 
> >> You may close(fd) as soon as bind() returns (before checking the
> >> returned value) and eliminate goto.
> > 
> > Yeah, I thought about that alternative, but found it a bit ugly, and
> > probably also counter-productive in case this function might get extended
> > with more error pathes in future. Not that I would insist on the current
> > solution though.
> 
> I'm happy with the explanation. Thanks.
> 
> >>> +    }
> >>> +    err = chmod(addr.sun_path, mode);
> >> 
> >> I'm not sure if it is fine to have a time window between bind() and
> >> chmod(). Do you have some rationale?
> > 
> > Good question. QEMU's 9p server is multi-threaded; all 9p requests come in
> > serialized and the 9p server controller portion (9p.c) is only running on
> > QEMU main thread, but the actual filesystem driver calls are then
> > dispatched to QEMU worker threads and therefore running concurrently at
> > this point:
> > 
> > https://wiki.qemu.org/Documentation/9p#Threads_and_Coroutines
> > 
> > Similar situation on Linux 9p client side: it handles access to a mounted
> > 9p filesystem concurrently, requests are then serialized by 9p driver on
> > Linux and sent over wire to 9p server (host).
> > 
> > So yes, there might be implications by that short time windows. But could
> > that be exploited on macOS hosts in practice?
> > 
> > The socket file would have mode srwxr-xr-x for a short moment.
> > 
> > For security_model=mapped* this should not be a problem.
> > 
> > For security_model=none|passhrough, in theory, maybe? But how likely is
> > that? If you are using a Linux client for instance, trying to brute-force
> > opening the socket file, the client would send several 9p commands
> > (Twalk, Tgetattr, Topen, probably more). The time window of the two
> > commands above should be much smaller than that and I would expect one of
> > the 9p commands to error out in between.
> > 
> > What would be a viable approach to avoid this issue on macOS?
> 
> It is unlikely that a naive brute-force approach will succeed to
> exploit. The more concerning scenario is that the attacker uses the
> knowledge of the underlying implementation of macOS to cause resource
> contention to widen the window. Whether an exploitation is viable
> depends on how much time you spend digging XNU.
> 
> However, I'm also not sure if it really *has* a race condition. Looking
> at v9fs_co_mknod(), it sequentially calls s->ops->mknod() and
> s->ops->lstat(). It also results in an entity called "path name based
> fid" in the code, which inherently cannot identify a file when it is
> renamed or recreated.
> 
> If there is some rationale it is safe, it may also be applied to the
> sequence of bind() and chmod(). Can anyone explain the sequence of
> s->ops->mknod() and s->ops->lstat() or path name based fid in general?

You are talking about 9p server's controller level: I don't see something that 
would prevent a concurrent open() during this bind() ... chmod() time window 
unfortunately.

Argument 'fidp' passed to function v9fs_co_mknod() reflects the directory in 
which the new device file shall be created. So 'fidp' is not the device file 
here, nor is 'fidp' modified during this function.

Function v9fs_co_mknod() is entered by 9p server on QEMU main thread. At the 
beginning of the function it first acquires a read lock on a (per 9p export) 
global coroutine mutex:

    v9fs_path_read_lock(s);

and holds this lock until returning from function v9fs_co_mknod(). But that's 
just a read lock. Function v9fs_co_open() also just gains a read lock. So they 
can happen concurrently.

Then v9fs_co_run_in_worker({...}) is called to dispatch and execute all the 
code block (think of it as an Obj-C "block") inside this (macro actually) on a 
QEMU worker thread. So an arbitrary background thread would then call the fs 
driver functions:

    s->ops->mknod()
    v9fs_name_to_path()
    s->ops->lstat()

and then at the end of the code block the background thread would dispatch 
back to QEMU main thread. So when we are reaching:

    v9fs_path_unlock(s);

we are already back on QEMU main thread, hence unlocking on main thread now 
and finally leaving function v9fs_co_mknod().

The important thing to understand is, while that

    v9fs_co_run_in_worker({...})

code block is executed on a QEMU worker thread, the QEMU main thread (9p 
server controller portion, i.e. 9p.c) is *not* sleeping, QEMU main thread 
rather continues to process other (if any) client requests in the meantime. In 
other words v9fs_co_run_in_worker() neither behaves exactly like Apple's GCD 
dispatch_async(), nor like dispatch_sync(), as GCD is not coroutine based.

So 9p server might pull a pending 'Topen' client request from the input FIFO 
in the meantime and likewise dispatch that to a worker thread, etc. Hence a 
concurrent open() might in theory be possible, but I find it quite unlikely to 
succeed in practice as the open() call on guest is translated by Linux client 
into a bunch of synchronous 9p requests on the path passed with the open() 
call on guest, and a round trip for each 9p message is like what, ~0.3ms or 
something in this order. That's quite huge compared to the time window I would 
expect between bind() ... open().

Does this answer your questions?

> Regards,
> Akihiko Odaki
> 
> >>> +out:
> >>> +    close(fd);
> >>> +    return err;
> >>> +}
> >>> +
> >>> 
> >>>    int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t
> >>>    dev)
> >>>    {
> >>>    
> >>>        int preserved_errno, err;
> >>> 
> >>> @@ -93,7 +114,11 @@ int qemu_mknodat(int dirfd, const char *filename,
> >>> mode_t mode, dev_t dev)>
> >>> 
> >>>        if (pthread_fchdir_np(dirfd) < 0) {
> >>>        
> >>>            return -1;
> >>>        
> >>>        }
> >>> 
> >>> -    err = mknod(filename, mode, dev);
> >>> +    if (S_ISSOCK(mode)) {
> >>> +        err = create_socket_file_at_cwd(filename, mode);
> >>> +    } else {
> >>> +        err = mknod(filename, mode, dev);
> >>> +    }
> >>> 
> >>>        preserved_errno = errno;
> >>>        /* Stop using the thread-local cwd */
> >>>        pthread_fchdir_np(-1);




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-24 18:45         ` Christian Schoenebeck
@ 2022-04-26  3:57           ` Akihiko Odaki
  2022-04-26 12:38             ` Greg Kurz
  0 siblings, 1 reply; 27+ messages in thread
From: Akihiko Odaki @ 2022-04-26  3:57 UTC (permalink / raw)
  To: Christian Schoenebeck, qemu-devel
  Cc: Keno Fischer, Michael Roitzsch, Will Cohen, Greg Kurz, qemu-stable

On 2022/04/25 3:45, Christian Schoenebeck wrote:
>>>>> +    }
>>>>> +    err = chmod(addr.sun_path, mode);
>>>>
>>>> I'm not sure if it is fine to have a time window between bind() and
>>>> chmod(). Do you have some rationale?
>>>
>>> Good question. QEMU's 9p server is multi-threaded; all 9p requests come in
>>> serialized and the 9p server controller portion (9p.c) is only running on
>>> QEMU main thread, but the actual filesystem driver calls are then
>>> dispatched to QEMU worker threads and therefore running concurrently at
>>> this point:
>>>
>>> https://wiki.qemu.org/Documentation/9p#Threads_and_Coroutines
>>>
>>> Similar situation on Linux 9p client side: it handles access to a mounted
>>> 9p filesystem concurrently, requests are then serialized by 9p driver on
>>> Linux and sent over wire to 9p server (host).
>>>
>>> So yes, there might be implications by that short time windows. But could
>>> that be exploited on macOS hosts in practice?
>>>
>>> The socket file would have mode srwxr-xr-x for a short moment.
>>>
>>> For security_model=mapped* this should not be a problem.
>>>
>>> For security_model=none|passhrough, in theory, maybe? But how likely is
>>> that? If you are using a Linux client for instance, trying to brute-force
>>> opening the socket file, the client would send several 9p commands
>>> (Twalk, Tgetattr, Topen, probably more). The time window of the two
>>> commands above should be much smaller than that and I would expect one of
>>> the 9p commands to error out in between.
>>>
>>> What would be a viable approach to avoid this issue on macOS?
>>
>> It is unlikely that a naive brute-force approach will succeed to
>> exploit. The more concerning scenario is that the attacker uses the
>> knowledge of the underlying implementation of macOS to cause resource
>> contention to widen the window. Whether an exploitation is viable
>> depends on how much time you spend digging XNU.
>>
>> However, I'm also not sure if it really *has* a race condition. Looking
>> at v9fs_co_mknod(), it sequentially calls s->ops->mknod() and
>> s->ops->lstat(). It also results in an entity called "path name based
>> fid" in the code, which inherently cannot identify a file when it is
>> renamed or recreated.
>>
>> If there is some rationale it is safe, it may also be applied to the
>> sequence of bind() and chmod(). Can anyone explain the sequence of
>> s->ops->mknod() and s->ops->lstat() or path name based fid in general?
> 
> You are talking about 9p server's controller level: I don't see something that
> would prevent a concurrent open() during this bind() ... chmod() time window
> unfortunately.
> 
> Argument 'fidp' passed to function v9fs_co_mknod() reflects the directory in
> which the new device file shall be created. So 'fidp' is not the device file
> here, nor is 'fidp' modified during this function.
> 
> Function v9fs_co_mknod() is entered by 9p server on QEMU main thread. At the
> beginning of the function it first acquires a read lock on a (per 9p export)
> global coroutine mutex:
> 
>      v9fs_path_read_lock(s);
> 
> and holds this lock until returning from function v9fs_co_mknod(). But that's
> just a read lock. Function v9fs_co_open() also just gains a read lock. So they
> can happen concurrently.
> 
> Then v9fs_co_run_in_worker({...}) is called to dispatch and execute all the
> code block (think of it as an Obj-C "block") inside this (macro actually) on a
> QEMU worker thread. So an arbitrary background thread would then call the fs
> driver functions:
> 
>      s->ops->mknod()
>      v9fs_name_to_path()
>      s->ops->lstat()
> 
> and then at the end of the code block the background thread would dispatch
> back to QEMU main thread. So when we are reaching:
> 
>      v9fs_path_unlock(s);
> 
> we are already back on QEMU main thread, hence unlocking on main thread now
> and finally leaving function v9fs_co_mknod().
> 
> The important thing to understand is, while that
> 
>      v9fs_co_run_in_worker({...})
> 
> code block is executed on a QEMU worker thread, the QEMU main thread (9p
> server controller portion, i.e. 9p.c) is *not* sleeping, QEMU main thread
> rather continues to process other (if any) client requests in the meantime. In
> other words v9fs_co_run_in_worker() neither behaves exactly like Apple's GCD
> dispatch_async(), nor like dispatch_sync(), as GCD is not coroutine based.
> 
> So 9p server might pull a pending 'Topen' client request from the input FIFO
> in the meantime and likewise dispatch that to a worker thread, etc. Hence a
> concurrent open() might in theory be possible, but I find it quite unlikely to
> succeed in practice as the open() call on guest is translated by Linux client
> into a bunch of synchronous 9p requests on the path passed with the open()
> call on guest, and a round trip for each 9p message is like what, ~0.3ms or
> something in this order. That's quite huge compared to the time window I would
> expect between bind() ... open().
> 
> Does this answer your questions?

The time window may be widened by a malicious actor if the actor knows 
XNU well so the window length inferred from experiences is not really 
enough to claim it safe, particularly when considering about security.

On the other hand, I'm wondering if there is same kind of a time window 
between s->ops->mknodat() and s->ops->lstat(). Also, there should be 
similar time windows among operations with "path name based fid" as they 
also use path names as identifiers. If there is a rationale that it is 
considered secure, we may be able to apply the same logic to the time 
window between bind() and chmod() and claim it secure.
I need a review from someone who understands that part of the code, 
therefore.

Regards,
Akihiko Odaki


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-26  3:57           ` Akihiko Odaki
@ 2022-04-26 12:38             ` Greg Kurz
  2022-04-27  2:27               ` Akihiko Odaki
  0 siblings, 1 reply; 27+ messages in thread
From: Greg Kurz @ 2022-04-26 12:38 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Michael Roitzsch, Christian Schoenebeck, qemu-devel, qemu-stable,
	Keno Fischer, Will Cohen

On Tue, 26 Apr 2022 12:57:37 +0900
Akihiko Odaki <akihiko.odaki@gmail.com> wrote:

> On 2022/04/25 3:45, Christian Schoenebeck wrote:
> >>>>> +    }
> >>>>> +    err = chmod(addr.sun_path, mode);
> >>>>
> >>>> I'm not sure if it is fine to have a time window between bind() and
> >>>> chmod(). Do you have some rationale?
> >>>
> >>> Good question. QEMU's 9p server is multi-threaded; all 9p requests come in
> >>> serialized and the 9p server controller portion (9p.c) is only running on
> >>> QEMU main thread, but the actual filesystem driver calls are then
> >>> dispatched to QEMU worker threads and therefore running concurrently at
> >>> this point:
> >>>
> >>> https://wiki.qemu.org/Documentation/9p#Threads_and_Coroutines
> >>>
> >>> Similar situation on Linux 9p client side: it handles access to a mounted
> >>> 9p filesystem concurrently, requests are then serialized by 9p driver on
> >>> Linux and sent over wire to 9p server (host).
> >>>
> >>> So yes, there might be implications by that short time windows. But could
> >>> that be exploited on macOS hosts in practice?
> >>>
> >>> The socket file would have mode srwxr-xr-x for a short moment.
> >>>
> >>> For security_model=mapped* this should not be a problem.
> >>>
> >>> For security_model=none|passhrough, in theory, maybe? But how likely is
> >>> that? If you are using a Linux client for instance, trying to brute-force
> >>> opening the socket file, the client would send several 9p commands
> >>> (Twalk, Tgetattr, Topen, probably more). The time window of the two
> >>> commands above should be much smaller than that and I would expect one of
> >>> the 9p commands to error out in between.
> >>>
> >>> What would be a viable approach to avoid this issue on macOS?
> >>
> >> It is unlikely that a naive brute-force approach will succeed to
> >> exploit. The more concerning scenario is that the attacker uses the
> >> knowledge of the underlying implementation of macOS to cause resource
> >> contention to widen the window. Whether an exploitation is viable
> >> depends on how much time you spend digging XNU.
> >>
> >> However, I'm also not sure if it really *has* a race condition. Looking
> >> at v9fs_co_mknod(), it sequentially calls s->ops->mknod() and
> >> s->ops->lstat(). It also results in an entity called "path name based
> >> fid" in the code, which inherently cannot identify a file when it is
> >> renamed or recreated.
> >>
> >> If there is some rationale it is safe, it may also be applied to the
> >> sequence of bind() and chmod(). Can anyone explain the sequence of
> >> s->ops->mknod() and s->ops->lstat() or path name based fid in general?
> > 
> > You are talking about 9p server's controller level: I don't see something that
> > would prevent a concurrent open() during this bind() ... chmod() time window
> > unfortunately.
> > 
> > Argument 'fidp' passed to function v9fs_co_mknod() reflects the directory in
> > which the new device file shall be created. So 'fidp' is not the device file
> > here, nor is 'fidp' modified during this function.
> > 
> > Function v9fs_co_mknod() is entered by 9p server on QEMU main thread. At the
> > beginning of the function it first acquires a read lock on a (per 9p export)
> > global coroutine mutex:
> > 
> >      v9fs_path_read_lock(s);
> > 
> > and holds this lock until returning from function v9fs_co_mknod(). But that's
> > just a read lock. Function v9fs_co_open() also just gains a read lock. So they
> > can happen concurrently.
> > 
> > Then v9fs_co_run_in_worker({...}) is called to dispatch and execute all the
> > code block (think of it as an Obj-C "block") inside this (macro actually) on a
> > QEMU worker thread. So an arbitrary background thread would then call the fs
> > driver functions:
> > 
> >      s->ops->mknod()
> >      v9fs_name_to_path()
> >      s->ops->lstat()
> > 
> > and then at the end of the code block the background thread would dispatch
> > back to QEMU main thread. So when we are reaching:
> > 
> >      v9fs_path_unlock(s);
> > 
> > we are already back on QEMU main thread, hence unlocking on main thread now
> > and finally leaving function v9fs_co_mknod().
> > 
> > The important thing to understand is, while that
> > 
> >      v9fs_co_run_in_worker({...})
> > 
> > code block is executed on a QEMU worker thread, the QEMU main thread (9p
> > server controller portion, i.e. 9p.c) is *not* sleeping, QEMU main thread
> > rather continues to process other (if any) client requests in the meantime. In
> > other words v9fs_co_run_in_worker() neither behaves exactly like Apple's GCD
> > dispatch_async(), nor like dispatch_sync(), as GCD is not coroutine based.
> > 
> > So 9p server might pull a pending 'Topen' client request from the input FIFO
> > in the meantime and likewise dispatch that to a worker thread, etc. Hence a
> > concurrent open() might in theory be possible, but I find it quite unlikely to
> > succeed in practice as the open() call on guest is translated by Linux client
> > into a bunch of synchronous 9p requests on the path passed with the open()
> > call on guest, and a round trip for each 9p message is like what, ~0.3ms or
> > something in this order. That's quite huge compared to the time window I would
> > expect between bind() ... open().
> > 
> > Does this answer your questions?
> 
> The time window may be widened by a malicious actor if the actor knows 
> XNU well so the window length inferred from experiences is not really 
> enough to claim it safe, particularly when considering about security.
> 
> On the other hand, I'm wondering if there is same kind of a time window 
> between s->ops->mknodat() and s->ops->lstat(). Also, there should be 
> similar time windows among operations with "path name based fid" as they 
> also use path names as identifiers. If there is a rationale that it is 
> considered secure, we may be able to apply the same logic to the time 
> window between bind() and chmod() and claim it secure.
> I need a review from someone who understands that part of the code, 
> therefore.
> 

I think Christian's explanation is clear enough. We don't guarantee
that v9fs_co_foo() calls run atomically. As a consequence, the client
might see transient states or be able to interact with an ongoing
request. And to answer your question, we have no specific rationale
on security with that.

I'm not sure what the concerns are but unless you come up with a
valid scenario [*] I don't see any reason to prevent this patch
to go forward.

[*] things like:
    - client escaping the shared directory
    - QEMU crashing
    - QEMU hogging host resources
    - client-side unprivileged user gaining elevated privleges
      in the guest

Cheers,

--
Greg

> Regards,
> Akihiko Odaki



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-26 12:38             ` Greg Kurz
@ 2022-04-27  2:27               ` Akihiko Odaki
  2022-04-27 10:18                 ` Greg Kurz
  0 siblings, 1 reply; 27+ messages in thread
From: Akihiko Odaki @ 2022-04-27  2:27 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Michael Roitzsch, Christian Schoenebeck, qemu-devel, qemu-stable,
	Keno Fischer, Will Cohen

On 2022/04/26 21:38, Greg Kurz wrote:
> On Tue, 26 Apr 2022 12:57:37 +0900
> Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
> 
>> On 2022/04/25 3:45, Christian Schoenebeck wrote:
>>>>>>> +    }
>>>>>>> +    err = chmod(addr.sun_path, mode);
>>>>>>
>>>>>> I'm not sure if it is fine to have a time window between bind() and
>>>>>> chmod(). Do you have some rationale?
>>>>>
>>>>> Good question. QEMU's 9p server is multi-threaded; all 9p requests come in
>>>>> serialized and the 9p server controller portion (9p.c) is only running on
>>>>> QEMU main thread, but the actual filesystem driver calls are then
>>>>> dispatched to QEMU worker threads and therefore running concurrently at
>>>>> this point:
>>>>>
>>>>> https://wiki.qemu.org/Documentation/9p#Threads_and_Coroutines
>>>>>
>>>>> Similar situation on Linux 9p client side: it handles access to a mounted
>>>>> 9p filesystem concurrently, requests are then serialized by 9p driver on
>>>>> Linux and sent over wire to 9p server (host).
>>>>>
>>>>> So yes, there might be implications by that short time windows. But could
>>>>> that be exploited on macOS hosts in practice?
>>>>>
>>>>> The socket file would have mode srwxr-xr-x for a short moment.
>>>>>
>>>>> For security_model=mapped* this should not be a problem.
>>>>>
>>>>> For security_model=none|passhrough, in theory, maybe? But how likely is
>>>>> that? If you are using a Linux client for instance, trying to brute-force
>>>>> opening the socket file, the client would send several 9p commands
>>>>> (Twalk, Tgetattr, Topen, probably more). The time window of the two
>>>>> commands above should be much smaller than that and I would expect one of
>>>>> the 9p commands to error out in between.
>>>>>
>>>>> What would be a viable approach to avoid this issue on macOS?
>>>>
>>>> It is unlikely that a naive brute-force approach will succeed to
>>>> exploit. The more concerning scenario is that the attacker uses the
>>>> knowledge of the underlying implementation of macOS to cause resource
>>>> contention to widen the window. Whether an exploitation is viable
>>>> depends on how much time you spend digging XNU.
>>>>
>>>> However, I'm also not sure if it really *has* a race condition. Looking
>>>> at v9fs_co_mknod(), it sequentially calls s->ops->mknod() and
>>>> s->ops->lstat(). It also results in an entity called "path name based
>>>> fid" in the code, which inherently cannot identify a file when it is
>>>> renamed or recreated.
>>>>
>>>> If there is some rationale it is safe, it may also be applied to the
>>>> sequence of bind() and chmod(). Can anyone explain the sequence of
>>>> s->ops->mknod() and s->ops->lstat() or path name based fid in general?
>>>
>>> You are talking about 9p server's controller level: I don't see something that
>>> would prevent a concurrent open() during this bind() ... chmod() time window
>>> unfortunately.
>>>
>>> Argument 'fidp' passed to function v9fs_co_mknod() reflects the directory in
>>> which the new device file shall be created. So 'fidp' is not the device file
>>> here, nor is 'fidp' modified during this function.
>>>
>>> Function v9fs_co_mknod() is entered by 9p server on QEMU main thread. At the
>>> beginning of the function it first acquires a read lock on a (per 9p export)
>>> global coroutine mutex:
>>>
>>>       v9fs_path_read_lock(s);
>>>
>>> and holds this lock until returning from function v9fs_co_mknod(). But that's
>>> just a read lock. Function v9fs_co_open() also just gains a read lock. So they
>>> can happen concurrently.
>>>
>>> Then v9fs_co_run_in_worker({...}) is called to dispatch and execute all the
>>> code block (think of it as an Obj-C "block") inside this (macro actually) on a
>>> QEMU worker thread. So an arbitrary background thread would then call the fs
>>> driver functions:
>>>
>>>       s->ops->mknod()
>>>       v9fs_name_to_path()
>>>       s->ops->lstat()
>>>
>>> and then at the end of the code block the background thread would dispatch
>>> back to QEMU main thread. So when we are reaching:
>>>
>>>       v9fs_path_unlock(s);
>>>
>>> we are already back on QEMU main thread, hence unlocking on main thread now
>>> and finally leaving function v9fs_co_mknod().
>>>
>>> The important thing to understand is, while that
>>>
>>>       v9fs_co_run_in_worker({...})
>>>
>>> code block is executed on a QEMU worker thread, the QEMU main thread (9p
>>> server controller portion, i.e. 9p.c) is *not* sleeping, QEMU main thread
>>> rather continues to process other (if any) client requests in the meantime. In
>>> other words v9fs_co_run_in_worker() neither behaves exactly like Apple's GCD
>>> dispatch_async(), nor like dispatch_sync(), as GCD is not coroutine based.
>>>
>>> So 9p server might pull a pending 'Topen' client request from the input FIFO
>>> in the meantime and likewise dispatch that to a worker thread, etc. Hence a
>>> concurrent open() might in theory be possible, but I find it quite unlikely to
>>> succeed in practice as the open() call on guest is translated by Linux client
>>> into a bunch of synchronous 9p requests on the path passed with the open()
>>> call on guest, and a round trip for each 9p message is like what, ~0.3ms or
>>> something in this order. That's quite huge compared to the time window I would
>>> expect between bind() ... open().
>>>
>>> Does this answer your questions?
>>
>> The time window may be widened by a malicious actor if the actor knows
>> XNU well so the window length inferred from experiences is not really
>> enough to claim it safe, particularly when considering about security.
>>
>> On the other hand, I'm wondering if there is same kind of a time window
>> between s->ops->mknodat() and s->ops->lstat(). Also, there should be
>> similar time windows among operations with "path name based fid" as they
>> also use path names as identifiers. If there is a rationale that it is
>> considered secure, we may be able to apply the same logic to the time
>> window between bind() and chmod() and claim it secure.
>> I need a review from someone who understands that part of the code,
>> therefore.
>>
> 
> I think Christian's explanation is clear enough. We don't guarantee
> that v9fs_co_foo() calls run atomically. As a consequence, the client
> might see transient states or be able to interact with an ongoing
> request. And to answer your question, we have no specific rationale
> on security with that.
> 
> I'm not sure what the concerns are but unless you come up with a
> valid scenario [*] I don't see any reason to prevent this patch
> to go forward.
> 
> [*] things like:
>      - client escaping the shared directory
>      - QEMU crashing
>      - QEMU hogging host resources
>      - client-side unprivileged user gaining elevated privleges
>        in the guest

I was just not sure if such transient states are safe. The past 
discussion was about the length of the non-atomic time window where a 
path name is used to identify a particular file, but if such states are 
not considered problematic, the length does not matter all and we can 
confidently say the sequence of bind() and chmod() is safe.

Considering the transient states are tolerated in 9pfs, we need to 
design this function to be tolerant with transient states as well. The 
use of chmod() is not safe when we consider about transient states. A 
malicious actor may replace the file at the path with a symlink which 
may escape the shared directory and chmod() will naively follow it. 
chmod() should be replaced with fchmodat_nofollow() or something similar.

Regards,
Akihiko Odaki

> 
> Cheers,
> 
> --
> Greg
> 
>> Regards,
>> Akihiko Odaki
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-27  2:27               ` Akihiko Odaki
@ 2022-04-27 10:18                 ` Greg Kurz
  2022-04-27 12:32                   ` Christian Schoenebeck
  0 siblings, 1 reply; 27+ messages in thread
From: Greg Kurz @ 2022-04-27 10:18 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Michael Roitzsch, Christian Schoenebeck, qemu-devel, qemu-stable,
	Keno Fischer, Will Cohen

On Wed, 27 Apr 2022 11:27:28 +0900
Akihiko Odaki <akihiko.odaki@gmail.com> wrote:

> On 2022/04/26 21:38, Greg Kurz wrote:

[..skip..]

> > 
> > I think Christian's explanation is clear enough. We don't guarantee
> > that v9fs_co_foo() calls run atomically. As a consequence, the client
> > might see transient states or be able to interact with an ongoing
> > request. And to answer your question, we have no specific rationale
> > on security with that.
> > 
> > I'm not sure what the concerns are but unless you come up with a
> > valid scenario [*] I don't see any reason to prevent this patch
> > to go forward.
> > 
> > [*] things like:
> >      - client escaping the shared directory
> >      - QEMU crashing
> >      - QEMU hogging host resources
> >      - client-side unprivileged user gaining elevated privleges
> >        in the guest
> 
> I was just not sure if such transient states are safe. The past 
> discussion was about the length of the non-atomic time window where a 
> path name is used to identify a particular file, but if such states are 
> not considered problematic, the length does not matter all and we can 
> confidently say the sequence of bind() and chmod() is safe.
> 
> Considering the transient states are tolerated in 9pfs, we need to 
> design this function to be tolerant with transient states as well. The 
> use of chmod() is not safe when we consider about transient states. A 
> malicious actor may replace the file at the path with a symlink which 
> may escape the shared directory and chmod() will naively follow it. 

You get a point here. Thanks for your tenacity ! :-)

> chmod() should be replaced with fchmodat_nofollow() or something similar.
> 

On a GNU/Linux system, this could be achieved by calling fchmod() on
the socket fd *before* calling bind() but I'm afraid this hack might
not work with a BSDish OS.

Replacing chmod() with fchmodat_nofollow(dirfd, addr.sun_path, mode)
won't make things atomic as above but at least it won't follow a
malicious symbolic link : mknod() on the client will fail with
ELOOP, which is fine when it comes to not breaking out of the shared
directory.

This brings up a new problem I hadn't realized before : the
fchmodat_nofollow() implementation in 9p-local.c is really
a linux only thing to cope with AT_SYMLINK_NOFOLLOW not being
supported with fchmodat(). It looks that this should move to
9p-util-linux.c and a proper version should be added for macOS
in 9p-util-darwin.c

Cheers,

--
Greg

> Regards,
> Akihiko Odaki
> 
> > 
> > Cheers,
> > 
> > --
> > Greg
> > 
> >> Regards,
> >> Akihiko Odaki
> > 
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-27 10:18                 ` Greg Kurz
@ 2022-04-27 12:32                   ` Christian Schoenebeck
  2022-04-27 13:31                     ` Greg Kurz
  0 siblings, 1 reply; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-27 12:32 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Akihiko Odaki, qemu-devel, Michael Roitzsch, qemu-stable,
	Keno Fischer, Will Cohen

On Mittwoch, 27. April 2022 12:18:10 CEST Greg Kurz wrote:
> On Wed, 27 Apr 2022 11:27:28 +0900
> 
> Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
> > On 2022/04/26 21:38, Greg Kurz wrote:
> [..skip..]
> 
> > > I think Christian's explanation is clear enough. We don't guarantee
> > > that v9fs_co_foo() calls run atomically. As a consequence, the client
> > > might see transient states or be able to interact with an ongoing
> > > request. And to answer your question, we have no specific rationale
> > > on security with that.
> > > 
> > > I'm not sure what the concerns are but unless you come up with a
> > > valid scenario [*] I don't see any reason to prevent this patch
> > > to go forward.
> > > 
> > > [*] things like:
> > >      - client escaping the shared directory
> > >      - QEMU crashing
> > >      - QEMU hogging host resources
> > >      - client-side unprivileged user gaining elevated privleges
> > >      
> > >        in the guest
> > 
> > I was just not sure if such transient states are safe. The past
> > discussion was about the length of the non-atomic time window where a
> > path name is used to identify a particular file, but if such states are
> > not considered problematic, the length does not matter all and we can
> > confidently say the sequence of bind() and chmod() is safe.
> > 
> > Considering the transient states are tolerated in 9pfs, we need to
> > design this function to be tolerant with transient states as well. The
> > use of chmod() is not safe when we consider about transient states. A
> > malicious actor may replace the file at the path with a symlink which
> > may escape the shared directory and chmod() will naively follow it.
> 
> You get a point here. Thanks for your tenacity ! :-)

Yep, I send a v4 with fchmodat_nofollow() instead of chmod(), thanks!

BTW, why is it actually allowed for client to create a symlink pointing 
outside exported directory tree with security_model=passthrough/none? Did 
anybody want that?

> > chmod() should be replaced with fchmodat_nofollow() or something similar.
> 
> On a GNU/Linux system, this could be achieved by calling fchmod() on
> the socket fd *before* calling bind() but I'm afraid this hack might
> not work with a BSDish OS.

As you already imagined, this is unfortunately not supported by any BSDs, 
including macOS. I'll file a bug report with Apple though.

> Replacing chmod() with fchmodat_nofollow(dirfd, addr.sun_path, mode)
> won't make things atomic as above but at least it won't follow a
> malicious symbolic link : mknod() on the client will fail with
> ELOOP, which is fine when it comes to not breaking out of the shared
> directory.

Current security_model=passthrough/none already has similar non-atomic 
operations BTW, so this was not something new. E.g.:

static int local_symlink(FsContext *fs_ctx, const char *oldpath,
                         V9fsPath *dir_path, const char *name, FsCred *credp)
{
    ...
    } else if (fs_ctx->export_flags & V9FS_SM_PASSTHROUGH ||
               fs_ctx->export_flags & V9FS_SM_NONE) {
        err = symlinkat(oldpath, dirfd, name);
        if (err) {
            goto out;
        }
        err = fchownat(dirfd, name, credp->fc_uid, credp->fc_gid,
                       AT_SYMLINK_NOFOLLOW);
    ...
}

In general, if you care about a higher degree of security, I'd always 
recommend to use security_model=mapped in the first place.

> This brings up a new problem I hadn't realized before : the
> fchmodat_nofollow() implementation in 9p-local.c is really
> a linux only thing to cope with AT_SYMLINK_NOFOLLOW not being
> supported with fchmodat(). It looks that this should move to
> 9p-util-linux.c and a proper version should be added for macOS
> in 9p-util-darwin.c

Like already agreed on the other thread, yes, that makes sense. But I think 
this can be handled with a follow-up, separate from this series.

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-27 12:32                   ` Christian Schoenebeck
@ 2022-04-27 13:31                     ` Greg Kurz
  2022-04-27 16:18                       ` Christian Schoenebeck
  0 siblings, 1 reply; 27+ messages in thread
From: Greg Kurz @ 2022-04-27 13:31 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: qemu-devel, qemu-stable, Keno Fischer, Michael Roitzsch,
	Will Cohen, Akihiko Odaki

On Wed, 27 Apr 2022 14:32:53 +0200
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Mittwoch, 27. April 2022 12:18:10 CEST Greg Kurz wrote:
> > On Wed, 27 Apr 2022 11:27:28 +0900
> > 
> > Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
> > > On 2022/04/26 21:38, Greg Kurz wrote:
> > [..skip..]
> > 
> > > > I think Christian's explanation is clear enough. We don't guarantee
> > > > that v9fs_co_foo() calls run atomically. As a consequence, the client
> > > > might see transient states or be able to interact with an ongoing
> > > > request. And to answer your question, we have no specific rationale
> > > > on security with that.
> > > > 
> > > > I'm not sure what the concerns are but unless you come up with a
> > > > valid scenario [*] I don't see any reason to prevent this patch
> > > > to go forward.
> > > > 
> > > > [*] things like:
> > > >      - client escaping the shared directory
> > > >      - QEMU crashing
> > > >      - QEMU hogging host resources
> > > >      - client-side unprivileged user gaining elevated privleges
> > > >      
> > > >        in the guest
> > > 
> > > I was just not sure if such transient states are safe. The past
> > > discussion was about the length of the non-atomic time window where a
> > > path name is used to identify a particular file, but if such states are
> > > not considered problematic, the length does not matter all and we can
> > > confidently say the sequence of bind() and chmod() is safe.
> > > 
> > > Considering the transient states are tolerated in 9pfs, we need to
> > > design this function to be tolerant with transient states as well. The
> > > use of chmod() is not safe when we consider about transient states. A
> > > malicious actor may replace the file at the path with a symlink which
> > > may escape the shared directory and chmod() will naively follow it.
> > 
> > You get a point here. Thanks for your tenacity ! :-)
> 
> Yep, I send a v4 with fchmodat_nofollow() instead of chmod(), thanks!
> 
> BTW, why is it actually allowed for client to create a symlink pointing 
> outside exported directory tree with security_model=passthrough/none? Did 
> anybody want that?
> 

The target argument to symlink() is merely a string that is stored in
the inode. It is only evaluated as a path at the time an action is
made on the link. Checking at symlink() time is thus useless.

Anyway, we're safe on this side since it's the client's job to
resolve links and we explicitly don't follow them in the server.

> > > chmod() should be replaced with fchmodat_nofollow() or something similar.
> > 
> > On a GNU/Linux system, this could be achieved by calling fchmod() on
> > the socket fd *before* calling bind() but I'm afraid this hack might
> > not work with a BSDish OS.
> 
> As you already imagined, this is unfortunately not supported by any BSDs, 
> including macOS. I'll file a bug report with Apple though.
> 

I'm not sure if this is documented and supported behavior on linux either.

> > Replacing chmod() with fchmodat_nofollow(dirfd, addr.sun_path, mode)
> > won't make things atomic as above but at least it won't follow a
> > malicious symbolic link : mknod() on the client will fail with
> > ELOOP, which is fine when it comes to not breaking out of the shared
> > directory.
> 
> Current security_model=passthrough/none already has similar non-atomic 
> operations BTW, so this was not something new. E.g.:
> 
> static int local_symlink(FsContext *fs_ctx, const char *oldpath,
>                          V9fsPath *dir_path, const char *name, FsCred *credp)
> {
>     ...
>     } else if (fs_ctx->export_flags & V9FS_SM_PASSTHROUGH ||
>                fs_ctx->export_flags & V9FS_SM_NONE) {
>         err = symlinkat(oldpath, dirfd, name);
>         if (err) {
>             goto out;
>         }
>         err = fchownat(dirfd, name, credp->fc_uid, credp->fc_gid,
>                        AT_SYMLINK_NOFOLLOW);
>     ...
> }
> 

Yes similar window but this is secure since fchownat() supports
AT_SYMLINK_NOFOLLOW.

> In general, if you care about a higher degree of security, I'd always 
> recommend to use security_model=mapped in the first place.
> 
> > This brings up a new problem I hadn't realized before : the
> > fchmodat_nofollow() implementation in 9p-local.c is really
> > a linux only thing to cope with AT_SYMLINK_NOFOLLOW not being
> > supported with fchmodat(). It looks that this should move to
> > 9p-util-linux.c and a proper version should be added for macOS
> > in 9p-util-darwin.c
> 
> Like already agreed on the other thread, yes, that makes sense. But I think 
> this can be handled with a follow-up, separate from this series.
> 

Fair enough if you want to handle fchmodat_nofollow() later but you
must at least use fchmodat(AT_SYMLINK_NOFOLLOW) in this patch
instead of chmod().

> Best regards,
> Christian Schoenebeck
> 
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-27 13:31                     ` Greg Kurz
@ 2022-04-27 16:18                       ` Christian Schoenebeck
  2022-04-27 17:12                         ` Will Cohen
  2022-04-27 17:37                         ` Greg Kurz
  0 siblings, 2 replies; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-27 16:18 UTC (permalink / raw)
  To: Greg Kurz
  Cc: qemu-devel, qemu-stable, Keno Fischer, Michael Roitzsch,
	Will Cohen, Akihiko Odaki

On Mittwoch, 27. April 2022 15:31:42 CEST Greg Kurz wrote:
> On Wed, 27 Apr 2022 14:32:53 +0200
> 
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > On Mittwoch, 27. April 2022 12:18:10 CEST Greg Kurz wrote:
> > > On Wed, 27 Apr 2022 11:27:28 +0900
> > > 
> > > Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
> > > > On 2022/04/26 21:38, Greg Kurz wrote:
[...]
> > > > Considering the transient states are tolerated in 9pfs, we need to
> > > > design this function to be tolerant with transient states as well. The
> > > > use of chmod() is not safe when we consider about transient states. A
> > > > malicious actor may replace the file at the path with a symlink which
> > > > may escape the shared directory and chmod() will naively follow it.
> > > 
> > > You get a point here. Thanks for your tenacity ! :-)
> > 
> > Yep, I send a v4 with fchmodat_nofollow() instead of chmod(), thanks!
> > 
> > BTW, why is it actually allowed for client to create a symlink pointing
> > outside exported directory tree with security_model=passthrough/none? Did
> > anybody want that?
> 
> The target argument to symlink() is merely a string that is stored in
> the inode. It is only evaluated as a path at the time an action is
> made on the link. Checking at symlink() time is thus useless.
> 
> Anyway, we're safe on this side since it's the client's job to
> resolve links and we explicitly don't follow them in the server.

I wouldn't call it useless, because it is easier to keep exactly 1 hole
stuffed instead of being forced to continuously stuff N holes as new ones may
popup up over time, as this case shows (mea culpa).

So you are trading (probably minor) performance for security.

But my question was actually meant seriously: whether there was anybody in the
past who probably actually wanted to create relative symlinks outside the
exported tree. For instance for another service with wider host filesystem
access.

[...]
> > > This brings up a new problem I hadn't realized before : the
> > > fchmodat_nofollow() implementation in 9p-local.c is really
> > > a linux only thing to cope with AT_SYMLINK_NOFOLLOW not being
> > > supported with fchmodat(). It looks that this should move to
> > > 9p-util-linux.c and a proper version should be added for macOS
> > > in 9p-util-darwin.c
> > 
> > Like already agreed on the other thread, yes, that makes sense. But I
> > think
> > this can be handled with a follow-up, separate from this series.
> 
> Fair enough if you want to handle fchmodat_nofollow() later but you
> must at least use fchmodat(AT_SYMLINK_NOFOLLOW) in this patch
> instead of chmod().

Sounds like a quick and easy workaround. However looking at 'man fchmodat' on
macOS, this probably does not exactly do what you wanted it to, at least not
in this particular case:

     AT_SYMLINK_NOFOLLOW
             If path names a symbolic link, then the mode of the symbolic link is changed.

     AT_SYMLINK_NOFOLLOW_ANY
             If path names a symbolic link, then the mode of the symbolic link is changed and
             if if the path has any other symbolic links, an error is returned.

So if somebody replaced the socket file by a 1st order symlink, you would
adjust the symlink's permissions with both AT_SYMLINK_NOFOLLOW as well as with 
AT_SYMLINK_NOFOLLOW_ANY. I mean it's better than chmod(), but acceptable?

Using our existing fchmodat_nofollow() instead is no viable alternative
either, as it uses operations that are not supported on socket files on macOS
(tested).

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-27 16:18                       ` Christian Schoenebeck
@ 2022-04-27 17:12                         ` Will Cohen
  2022-04-27 18:16                           ` Christian Schoenebeck
  2022-04-27 17:37                         ` Greg Kurz
  1 sibling, 1 reply; 27+ messages in thread
From: Will Cohen @ 2022-04-27 17:12 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: qemu Developers, qemu-stable, Greg Kurz, Keno Fischer,
	Michael Roitzsch, Akihiko Odaki

[-- Attachment #1: Type: text/plain, Size: 4818 bytes --]

On Wed, Apr 27, 2022 at 12:18 PM Christian Schoenebeck <
qemu_oss@crudebyte.com> wrote:

> On Mittwoch, 27. April 2022 15:31:42 CEST Greg Kurz wrote:
> > On Wed, 27 Apr 2022 14:32:53 +0200
> >
> > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > > On Mittwoch, 27. April 2022 12:18:10 CEST Greg Kurz wrote:
> > > > On Wed, 27 Apr 2022 11:27:28 +0900
> > > >
> > > > Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
> > > > > On 2022/04/26 21:38, Greg Kurz wrote:
> [...]
> > > > > Considering the transient states are tolerated in 9pfs, we need to
> > > > > design this function to be tolerant with transient states as well.
> The
> > > > > use of chmod() is not safe when we consider about transient
> states. A
> > > > > malicious actor may replace the file at the path with a symlink
> which
> > > > > may escape the shared directory and chmod() will naively follow it.
> > > >
> > > > You get a point here. Thanks for your tenacity ! :-)
> > >
> > > Yep, I send a v4 with fchmodat_nofollow() instead of chmod(), thanks!
> > >
> > > BTW, why is it actually allowed for client to create a symlink pointing
> > > outside exported directory tree with security_model=passthrough/none?
> Did
> > > anybody want that?
> >
> > The target argument to symlink() is merely a string that is stored in
> > the inode. It is only evaluated as a path at the time an action is
> > made on the link. Checking at symlink() time is thus useless.
> >
> > Anyway, we're safe on this side since it's the client's job to
> > resolve links and we explicitly don't follow them in the server.
>
> I wouldn't call it useless, because it is easier to keep exactly 1 hole
> stuffed instead of being forced to continuously stuff N holes as new ones
> may
> popup up over time, as this case shows (mea culpa).
>
> So you are trading (probably minor) performance for security.
>
> But my question was actually meant seriously: whether there was anybody in
> the
> past who probably actually wanted to create relative symlinks outside the
> exported tree. For instance for another service with wider host filesystem
> access.
>

For what it's worth, the inability to follow symlinks read-write outside of
the tree appears to be, at the moment, the primary pain point for new users
of 9pfs in containerization software (see the later comments in
https://github.com/lima-vm/lima/pull/726 and to a lesser extent
https://github.com/containers/podman/issues/13784).

To my knowledge, neither podman nor lima have come up with conclusive
preferred solutions for how to address this, much less had a robust
discussion with the QEMU team.
The lima discussion notes that it works read-only with passthrough/none, so
I'd suggest that if there weren't users of it before, there are now! As I
understand it, one partial solution for downstream software that allows for
read-write may just be to more proactively mount larger directories to
minimize the number of external paths that symlinks might get tripped up
on. That said, this will stop working when it comes to linking to
additional mounts, since /Volumes on darwin will never line up with /mnt.


> [...]
> > > > This brings up a new problem I hadn't realized before : the
> > > > fchmodat_nofollow() implementation in 9p-local.c is really
> > > > a linux only thing to cope with AT_SYMLINK_NOFOLLOW not being
> > > > supported with fchmodat(). It looks that this should move to
> > > > 9p-util-linux.c and a proper version should be added for macOS
> > > > in 9p-util-darwin.c
> > >
> > > Like already agreed on the other thread, yes, that makes sense. But I
> > > think
> > > this can be handled with a follow-up, separate from this series.
> >
> > Fair enough if you want to handle fchmodat_nofollow() later but you
> > must at least use fchmodat(AT_SYMLINK_NOFOLLOW) in this patch
> > instead of chmod().
>
> Sounds like a quick and easy workaround. However looking at 'man fchmodat'
> on
> macOS, this probably does not exactly do what you wanted it to, at least
> not
> in this particular case:
>
>      AT_SYMLINK_NOFOLLOW
>              If path names a symbolic link, then the mode of the symbolic
> link is changed.
>
>      AT_SYMLINK_NOFOLLOW_ANY
>              If path names a symbolic link, then the mode of the symbolic
> link is changed and
>              if if the path has any other symbolic links, an error is
> returned.
>
> So if somebody replaced the socket file by a 1st order symlink, you would
> adjust the symlink's permissions with both AT_SYMLINK_NOFOLLOW as well as
> with
> AT_SYMLINK_NOFOLLOW_ANY. I mean it's better than chmod(), but acceptable?
>
> Using our existing fchmodat_nofollow() instead is no viable alternative
> either, as it uses operations that are not supported on socket files on
> macOS
> (tested).
>
> Best regards,
> Christian Schoenebeck
>
>
>

[-- Attachment #2: Type: text/html, Size: 6135 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-27 16:18                       ` Christian Schoenebeck
  2022-04-27 17:12                         ` Will Cohen
@ 2022-04-27 17:37                         ` Greg Kurz
  2022-04-27 18:36                           ` Christian Schoenebeck
  1 sibling, 1 reply; 27+ messages in thread
From: Greg Kurz @ 2022-04-27 17:37 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: qemu-devel, qemu-stable, Keno Fischer, Michael Roitzsch,
	Will Cohen, Akihiko Odaki

On Wed, 27 Apr 2022 18:18:31 +0200
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> On Mittwoch, 27. April 2022 15:31:42 CEST Greg Kurz wrote:
> > On Wed, 27 Apr 2022 14:32:53 +0200
> > 
> > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > > On Mittwoch, 27. April 2022 12:18:10 CEST Greg Kurz wrote:
> > > > On Wed, 27 Apr 2022 11:27:28 +0900
> > > > 
> > > > Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
> > > > > On 2022/04/26 21:38, Greg Kurz wrote:
> [...]
> > > > > Considering the transient states are tolerated in 9pfs, we need to
> > > > > design this function to be tolerant with transient states as well. The
> > > > > use of chmod() is not safe when we consider about transient states. A
> > > > > malicious actor may replace the file at the path with a symlink which
> > > > > may escape the shared directory and chmod() will naively follow it.
> > > > 
> > > > You get a point here. Thanks for your tenacity ! :-)
> > > 
> > > Yep, I send a v4 with fchmodat_nofollow() instead of chmod(), thanks!
> > > 
> > > BTW, why is it actually allowed for client to create a symlink pointing
> > > outside exported directory tree with security_model=passthrough/none? Did
> > > anybody want that?
> > 
> > The target argument to symlink() is merely a string that is stored in
> > the inode. It is only evaluated as a path at the time an action is
> > made on the link. Checking at symlink() time is thus useless.
> > 
> > Anyway, we're safe on this side since it's the client's job to
> > resolve links and we explicitly don't follow them in the server.
> 
> I wouldn't call it useless, because it is easier to keep exactly 1 hole
> stuffed instead of being forced to continuously stuff N holes as new ones may
> popup up over time, as this case shows (mea culpa).
> 
> So you are trading (probably minor) performance for security.
> 
> But my question was actually meant seriously: whether there was anybody in the
> past who probably actually wanted to create relative symlinks outside the
> exported tree. For instance for another service with wider host filesystem
> access.
> 

I took the question seriously :-), the problem is that even if you
do a check on the target at symlink() time, you don't know how it
will be evaluated in the end.

Quick demonstration:

$ cd /var/tmp/
$ mkdir foo
$ cd foo/
$ # Suppose foo is the jail
$ mkdir bar
$ ln -sf .. bar/link
$ realpath bar/link
/var/tmp/foo
$ # Good, we're still under foo
$ mv bar/link .
$ realpath link
/var/tmp
$ # Ouch we've escaped

So in the end, the only real fix is to ban path based syscalls and
pass AT_SYMLINK_NOFOLLOW everywhere. This was the justification for
rewriting nearly all 9p local in order to fix CVE-2016-9602.

https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg06225.html

> [...]
> > > > This brings up a new problem I hadn't realized before : the
> > > > fchmodat_nofollow() implementation in 9p-local.c is really
> > > > a linux only thing to cope with AT_SYMLINK_NOFOLLOW not being
> > > > supported with fchmodat(). It looks that this should move to
> > > > 9p-util-linux.c and a proper version should be added for macOS
> > > > in 9p-util-darwin.c
> > > 
> > > Like already agreed on the other thread, yes, that makes sense. But I
> > > think
> > > this can be handled with a follow-up, separate from this series.
> > 
> > Fair enough if you want to handle fchmodat_nofollow() later but you
> > must at least use fchmodat(AT_SYMLINK_NOFOLLOW) in this patch
> > instead of chmod().
> 
> Sounds like a quick and easy workaround. However looking at 'man fchmodat' on
> macOS, this probably does not exactly do what you wanted it to, at least not
> in this particular case:
> 
>      AT_SYMLINK_NOFOLLOW
>              If path names a symbolic link, then the mode of the symbolic link is changed.
> 
>      AT_SYMLINK_NOFOLLOW_ANY
>              If path names a symbolic link, then the mode of the symbolic link is changed and
>              if if the path has any other symbolic links, an error is returned.
> 
> So if somebody replaced the socket file by a 1st order symlink, you would
> adjust the symlink's permissions with both AT_SYMLINK_NOFOLLOW as well as with 
> AT_SYMLINK_NOFOLLOW_ANY. I mean it's better than chmod(), but acceptable?
> 

As long as the link is not followed outside, we're good : it will change the
symlink mode and then what ?

> Using our existing fchmodat_nofollow() instead is no viable alternative
> either, as it uses operations that are not supported on socket files on macOS
> (tested).
> 
> Best regards,
> Christian Schoenebeck
> 
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-27 17:12                         ` Will Cohen
@ 2022-04-27 18:16                           ` Christian Schoenebeck
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-27 18:16 UTC (permalink / raw)
  To: Will Cohen
  Cc: qemu-devel, qemu-stable, Greg Kurz, Keno Fischer,
	Michael Roitzsch, Akihiko Odaki

On Mittwoch, 27. April 2022 19:12:15 CEST Will Cohen wrote:
> On Wed, Apr 27, 2022 at 12:18 PM Christian Schoenebeck <
> 
> qemu_oss@crudebyte.com> wrote:
> > On Mittwoch, 27. April 2022 15:31:42 CEST Greg Kurz wrote:
> > > On Wed, 27 Apr 2022 14:32:53 +0200
> > > 
> > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > > > On Mittwoch, 27. April 2022 12:18:10 CEST Greg Kurz wrote:
> > > > > On Wed, 27 Apr 2022 11:27:28 +0900
> > > > > 
> > > > > Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
> > > > > > On 2022/04/26 21:38, Greg Kurz wrote:
> > [...]
> > 
> > > > > > Considering the transient states are tolerated in 9pfs, we need to
> > > > > > design this function to be tolerant with transient states as well.
> > 
> > The
> > 
> > > > > > use of chmod() is not safe when we consider about transient
> > 
> > states. A
> > 
> > > > > > malicious actor may replace the file at the path with a symlink
> > 
> > which
> > 
> > > > > > may escape the shared directory and chmod() will naively follow
> > > > > > it.
> > > > > 
> > > > > You get a point here. Thanks for your tenacity ! :-)
> > > > 
> > > > Yep, I send a v4 with fchmodat_nofollow() instead of chmod(), thanks!
> > > > 
> > > > BTW, why is it actually allowed for client to create a symlink
> > > > pointing
> > > > outside exported directory tree with security_model=passthrough/none?
> > 
> > Did
> > 
> > > > anybody want that?
> > > 
> > > The target argument to symlink() is merely a string that is stored in
> > > the inode. It is only evaluated as a path at the time an action is
> > > made on the link. Checking at symlink() time is thus useless.
> > > 
> > > Anyway, we're safe on this side since it's the client's job to
> > > resolve links and we explicitly don't follow them in the server.
> > 
> > I wouldn't call it useless, because it is easier to keep exactly 1 hole
> > stuffed instead of being forced to continuously stuff N holes as new ones
> > may
> > popup up over time, as this case shows (mea culpa).
> > 
> > So you are trading (probably minor) performance for security.
> > 
> > But my question was actually meant seriously: whether there was anybody in
> > the
> > past who probably actually wanted to create relative symlinks outside the
> > exported tree. For instance for another service with wider host filesystem
> > access.
> 
> For what it's worth, the inability to follow symlinks read-write outside of
> the tree appears to be, at the moment, the primary pain point for new users
> of 9pfs in containerization software (see the later comments in
> https://github.com/lima-vm/lima/pull/726 and to a lesser extent
> https://github.com/containers/podman/issues/13784).
> 
> To my knowledge, neither podman nor lima have come up with conclusive
> preferred solutions for how to address this, much less had a robust
> discussion with the QEMU team.
> The lima discussion notes that it works read-only with passthrough/none, so
> I'd suggest that if there weren't users of it before, there are now! As I
> understand it, one partial solution for downstream software that allows for
> read-write may just be to more proactively mount larger directories to
> minimize the number of external paths that symlinks might get tripped up
> on. That said, this will stop working when it comes to linking to
> additional mounts, since /Volumes on darwin will never line up with /mnt.

That's a different thing. People in those discussions were using 
security_model=mapped where symlinks on guest are virtually mapped as textual 
file content (try 'cat <symlink>' on host). So in this mode symlinks on host 
and symlinks on guest are intentionally separated from each other.

The issue I was referring to was about security_model=passthrough|none which 
has the exact same symlinks accessible both on host and guest side, and more 
specifically I meant: symlinks created by guest that would point to a location 
*above* the 9p export root. E.g. say guest has access to the following host 
directory via 9p, that is access *below* the following directory on host:

  /vm/foo/

And say guest now mounts that host directory and creates a symlink like:

  mount -t 9p foo /mnt
  cd /mnt
  ln -s ../bar bar

That symlink would now point to /bar from guest's PoV, and to /vm/bar from 
host's PoV (i.e. a location on host where guest should not have access to at 
all).

BTW some of the other issues mentioned in the linked discussion, like the 
timeout errors, are fixed with this patch set.

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
  2022-04-27 17:37                         ` Greg Kurz
@ 2022-04-27 18:36                           ` Christian Schoenebeck
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Schoenebeck @ 2022-04-27 18:36 UTC (permalink / raw)
  To: Greg Kurz
  Cc: qemu-devel, qemu-stable, Keno Fischer, Michael Roitzsch,
	Will Cohen, Akihiko Odaki

On Mittwoch, 27. April 2022 19:37:39 CEST Greg Kurz wrote:
> On Wed, 27 Apr 2022 18:18:31 +0200
> 
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > On Mittwoch, 27. April 2022 15:31:42 CEST Greg Kurz wrote:
> > > On Wed, 27 Apr 2022 14:32:53 +0200
> > > 
> > > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > > > On Mittwoch, 27. April 2022 12:18:10 CEST Greg Kurz wrote:
> > > > > On Wed, 27 Apr 2022 11:27:28 +0900
> > > > > 
> > > > > Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
> > > > > > On 2022/04/26 21:38, Greg Kurz wrote:
> > [...]
> > 
> > > > > > Considering the transient states are tolerated in 9pfs, we need to
> > > > > > design this function to be tolerant with transient states as well.
> > > > > > The
> > > > > > use of chmod() is not safe when we consider about transient
> > > > > > states. A
> > > > > > malicious actor may replace the file at the path with a symlink
> > > > > > which
> > > > > > may escape the shared directory and chmod() will naively follow
> > > > > > it.
> > > > > 
> > > > > You get a point here. Thanks for your tenacity ! :-)
> > > > 
> > > > Yep, I send a v4 with fchmodat_nofollow() instead of chmod(), thanks!
> > > > 
> > > > BTW, why is it actually allowed for client to create a symlink
> > > > pointing
> > > > outside exported directory tree with security_model=passthrough/none?
> > > > Did
> > > > anybody want that?
> > > 
> > > The target argument to symlink() is merely a string that is stored in
> > > the inode. It is only evaluated as a path at the time an action is
> > > made on the link. Checking at symlink() time is thus useless.
> > > 
> > > Anyway, we're safe on this side since it's the client's job to
> > > resolve links and we explicitly don't follow them in the server.
> > 
> > I wouldn't call it useless, because it is easier to keep exactly 1 hole
> > stuffed instead of being forced to continuously stuff N holes as new ones
> > may popup up over time, as this case shows (mea culpa).
> > 
> > So you are trading (probably minor) performance for security.
> > 
> > But my question was actually meant seriously: whether there was anybody in
> > the past who probably actually wanted to create relative symlinks outside
> > the exported tree. For instance for another service with wider host
> > filesystem access.
> 
> I took the question seriously :-), the problem is that even if you
> do a check on the target at symlink() time, you don't know how it
> will be evaluated in the end.
> 
> Quick demonstration:
> 
> $ cd /var/tmp/
> $ mkdir foo
> $ cd foo/
> $ # Suppose foo is the jail
> $ mkdir bar
> $ ln -sf .. bar/link
> $ realpath bar/link
> /var/tmp/foo
> $ # Good, we're still under foo
> $ mv bar/link .
> $ realpath link
> /var/tmp
> $ # Ouch we've escaped
> 
> So in the end, the only real fix is to ban path based syscalls and
> pass AT_SYMLINK_NOFOLLOW everywhere. This was the justification for
> rewriting nearly all 9p local in order to fix CVE-2016-9602.
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg06225.html

Touché :) Agreed, it's not worth it.

I mean this simple example could still be addressed by catching the move, but 
if you have like several nested directories, each with a huge number of 
chained symlinks, on top of it non-atomic issues etc., then things would get 
way expensive to check, as you would actually have to traverse an entire tree 
and validate an even bigger amount of symlink pathes for every single symlink 
modification attempt on guest, probably even with exclusive locks, and so on.

> > [...]
> > 
> > > > > This brings up a new problem I hadn't realized before : the
> > > > > fchmodat_nofollow() implementation in 9p-local.c is really
> > > > > a linux only thing to cope with AT_SYMLINK_NOFOLLOW not being
> > > > > supported with fchmodat(). It looks that this should move to
> > > > > 9p-util-linux.c and a proper version should be added for macOS
> > > > > in 9p-util-darwin.c
> > > > 
> > > > Like already agreed on the other thread, yes, that makes sense. But I
> > > > think
> > > > this can be handled with a follow-up, separate from this series.
> > > 
> > > Fair enough if you want to handle fchmodat_nofollow() later but you
> > > must at least use fchmodat(AT_SYMLINK_NOFOLLOW) in this patch
> > > instead of chmod().
> > 
> > Sounds like a quick and easy workaround. However looking at 'man fchmodat'
> > on macOS, this probably does not exactly do what you wanted it to, at
> > least not> 
> > in this particular case:
> >      AT_SYMLINK_NOFOLLOW
> >      
> >              If path names a symbolic link, then the mode of the symbolic
> >              link is changed.>      
> >      AT_SYMLINK_NOFOLLOW_ANY
> >      
> >              If path names a symbolic link, then the mode of the symbolic
> >              link is changed and if if the path has any other symbolic
> >              links, an error is returned.> 
> > So if somebody replaced the socket file by a 1st order symlink, you would
> > adjust the symlink's permissions with both AT_SYMLINK_NOFOLLOW as well as
> > with AT_SYMLINK_NOFOLLOW_ANY. I mean it's better than chmod(), but
> > acceptable?
> As long as the link is not followed outside, we're good : it will change the
> symlink mode and then what ?

OK, then fine with me. I already filed a bug report with Apple for supporting 
fchmod(socket()). Hopefully we'll have a clean atomic solution in near future 
for this issue then.

I'll post v4 now. Thanks!

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2022-04-27 18:38 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-21 15:08 [PATCH v2 0/5] 9pfs: macOS host fixes Christian Schoenebeck
2022-04-21 15:07 ` [PATCH v2 1/5] 9pfs: fix qemu_mknodat(S_IFREG) on macOS Christian Schoenebeck
2022-04-21 16:32   ` Greg Kurz
2022-04-21 15:07 ` [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) " Christian Schoenebeck
2022-04-21 16:36   ` Greg Kurz
2022-04-21 17:29     ` Christian Schoenebeck
2022-04-22  2:43   ` Akihiko Odaki
2022-04-22 14:06     ` Christian Schoenebeck
2022-04-23  4:33       ` Akihiko Odaki
2022-04-24 18:45         ` Christian Schoenebeck
2022-04-26  3:57           ` Akihiko Odaki
2022-04-26 12:38             ` Greg Kurz
2022-04-27  2:27               ` Akihiko Odaki
2022-04-27 10:18                 ` Greg Kurz
2022-04-27 12:32                   ` Christian Schoenebeck
2022-04-27 13:31                     ` Greg Kurz
2022-04-27 16:18                       ` Christian Schoenebeck
2022-04-27 17:12                         ` Will Cohen
2022-04-27 18:16                           ` Christian Schoenebeck
2022-04-27 17:37                         ` Greg Kurz
2022-04-27 18:36                           ` Christian Schoenebeck
2022-04-21 15:07 ` [PATCH v2 3/5] 9pfs: fix wrong encoding of rdev field in Rgetattr " Christian Schoenebeck
2022-04-21 16:39   ` Greg Kurz
2022-04-21 15:07 ` [PATCH v2 4/5] 9pfs: fix wrong errno being sent to Linux client on macOS host Christian Schoenebeck
2022-04-21 16:39   ` Greg Kurz
2022-04-21 15:07 ` [PATCH v2 5/5] 9pfs: fix removing non-existent POSIX ACL xattr " Christian Schoenebeck
2022-04-21 16:40   ` Greg Kurz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.