All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd
@ 2012-05-01 15:31 Stefan Hajnoczi
  2012-05-01 15:31 ` [Qemu-devel] [RFC 1/5] block: add open() wrapper that can be hooked by libvirt Stefan Hajnoczi
                   ` (7 more replies)
  0 siblings, 8 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2012-05-01 15:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, libvir-list, Corey Bryant

Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
it from opening files that it should not have access to.  This improves
security because it prevents the attacker from escaping the QEMU process if
they manage to gain control.

NFS has been a pain point for SELinux because it does not support labels (which
I believe are stored in extended attributes).  In other words, it's not
possible to use SELinux goodness on QEMU when image files are located on NFS.
Today we have to allow QEMU access to any file on the NFS export rather than
restricting specifically to the image files that the guest requires.

File descriptor passing is a solution to this problem and might also come in
handy elsewhere.  Libvirt or another external process chooses files which QEMU
is allowed to access and provides just those file descriptors - QEMU cannot
open the files itself.

This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
open an image file it sends a request over the given UNIX domain socket.  The
response includes the file descriptor or an errno on failure.  Please see the
patches for details on the protocol.

The -open-hook-fd approach allows QEMU to support file descriptor passing
without changing -drive.  It also supports snapshot_blkdev and other commands
that re-open image files.

Anthony Liguori <aliguori@us.ibm.com> wrote most of these patches.  I added a
demo -open-hook-fd server and added some small fixes.  Since Anthony is
traveling right now I'm sending the RFC for discussion.

Anthony Liguori (3):
  block: add open() wrapper that can be hooked by libvirt
  block: add new command line parameter that and protocol description
  block: plumb up open-hook-fd option

Stefan Hajnoczi (2):
  osdep: add qemu_recvmsg() wrapper
  Example -open-hook-fd server

 block.c           |  107 ++++++++++++++++++++++++++++++++++++++
 block.h           |    2 +
 block/raw-posix.c |   18 +++----
 block/raw-win32.c |    2 +-
 block/vdi.c       |    2 +-
 block/vmdk.c      |    6 +--
 block/vpc.c       |    2 +-
 block/vvfat.c     |    4 +-
 block_int.h       |   12 +++++
 osdep.c           |   46 +++++++++++++++++
 qemu-common.h     |    2 +
 qemu-options.hx   |   42 +++++++++++++++
 test-fd-passing.c |  147 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 vl.c              |    3 ++
 14 files changed, 378 insertions(+), 17 deletions(-)
 create mode 100644 test-fd-passing.c

-- 
1.7.10

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] [RFC 1/5] block: add open() wrapper that can be hooked by libvirt
  2012-05-01 15:31 [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Stefan Hajnoczi
@ 2012-05-01 15:31 ` Stefan Hajnoczi
  2012-05-01 15:31 ` [Qemu-devel] [RFC 2/5] block: add new command line parameter that and protocol description Stefan Hajnoczi
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2012-05-01 15:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, libvir-list, Corey Bryant

From: Anthony Liguori <aliguori@us.ibm.com>

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block.c           |    5 +++++
 block/raw-posix.c |   16 ++++++++--------
 block/raw-win32.c |    2 +-
 block/vdi.c       |    2 +-
 block/vmdk.c      |    6 +++---
 block/vpc.c       |    2 +-
 block/vvfat.c     |    4 ++--
 block_int.h       |   12 ++++++++++++
 8 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/block.c b/block.c
index 43c794c..f9c4633 100644
--- a/block.c
+++ b/block.c
@@ -102,6 +102,11 @@ static BlockDriverState *bs_snapshots;
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
+int file_open(const char *filename, int flags, mode_t mode)
+{
+    return open(filename, flags, mode);
+}
+
 #ifdef _WIN32
 static int is_windows_drive_prefix(const char *filename)
 {
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2d1bc13..b6bc6bc 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -568,7 +568,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
         options++;
     }
 
-    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
+    fd = file_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
               0644);
     if (fd < 0) {
         result = -errno;
@@ -741,7 +741,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
         if ( bsdPath[ 0 ] != '\0' ) {
             strcat(bsdPath,"s0");
             /* some CDs don't have a partition 0 */
-            fd = open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE);
+            fd = file_open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE, 0);
             if (fd < 0) {
                 bsdPath[strlen(bsdPath)-1] = '1';
             } else {
@@ -798,7 +798,7 @@ static int fd_open(BlockDriverState *bs)
 #endif
             return -EIO;
         }
-        s->fd = open(bs->filename, s->open_flags & ~O_NONBLOCK);
+        s->fd = file_open(bs->filename, s->open_flags & ~O_NONBLOCK, 0);
         if (s->fd < 0) {
             s->fd_error_time = get_clock();
             s->fd_got_error = 1;
@@ -872,7 +872,7 @@ static int hdev_create(const char *filename, QEMUOptionParameter *options)
         options++;
     }
 
-    fd = open(filename, O_WRONLY | O_BINARY);
+    fd = file_open(filename, O_WRONLY | O_BINARY, 0);
     if (fd < 0)
         return -errno;
 
@@ -950,7 +950,7 @@ static int floppy_probe_device(const char *filename)
     if (strstart(filename, "/dev/fd", NULL))
         prio = 50;
 
-    fd = open(filename, O_RDONLY | O_NONBLOCK);
+    fd = file_open(filename, O_RDONLY | O_NONBLOCK, 0);
     if (fd < 0) {
         goto out;
     }
@@ -1003,7 +1003,7 @@ static void floppy_eject(BlockDriverState *bs, bool eject_flag)
         close(s->fd);
         s->fd = -1;
     }
-    fd = open(bs->filename, s->open_flags | O_NONBLOCK);
+    fd = file_open(bs->filename, s->open_flags | O_NONBLOCK, 0);
     if (fd >= 0) {
         if (ioctl(fd, FDEJECT, 0) < 0)
             perror("FDEJECT");
@@ -1053,7 +1053,7 @@ static int cdrom_probe_device(const char *filename)
     int prio = 0;
     struct stat st;
 
-    fd = open(filename, O_RDONLY | O_NONBLOCK);
+    fd = file_open(filename, O_RDONLY | O_NONBLOCK, 0);
     if (fd < 0) {
         goto out;
     }
@@ -1177,7 +1177,7 @@ static int cdrom_reopen(BlockDriverState *bs)
      */
     if (s->fd >= 0)
         close(s->fd);
-    fd = open(bs->filename, s->open_flags, 0644);
+    fd = file_open(bs->filename, s->open_flags, 0644);
     if (fd < 0) {
         s->fd = -1;
         return -EIO;
diff --git a/block/raw-win32.c b/block/raw-win32.c
index e4b0b75..a64651b 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -255,7 +255,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
         options++;
     }
 
-    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
+    fd = file_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
               0644);
     if (fd < 0)
         return -EIO;
diff --git a/block/vdi.c b/block/vdi.c
index 119d3c7..c8be34b 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -648,7 +648,7 @@ static int vdi_create(const char *filename, QEMUOptionParameter *options)
         options++;
     }
 
-    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
+    fd = file_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
               0644);
     if (fd < 0) {
         return -errno;
diff --git a/block/vmdk.c b/block/vmdk.c
index 18e9b4c..1cf86f1 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1161,7 +1161,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
     VMDK4Header header;
     uint32_t tmp, magic, grains, gd_size, gt_size, gt_count;
 
-    fd = open(
+    fd = file_open(
         filename,
         O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
         0644);
@@ -1484,12 +1484,12 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options)
             (flags & BLOCK_FLAG_COMPAT6 ? 6 : 4),
             total_size / (int64_t)(63 * 16 * 512));
     if (split || flat) {
-        fd = open(
+        fd = file_open(
                 filename,
                 O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
                 0644);
     } else {
-        fd = open(
+        fd = file_open(
                 filename,
                 O_WRONLY | O_BINARY | O_LARGEFILE,
                 0644);
diff --git a/block/vpc.c b/block/vpc.c
index 5cd13d1..c1efb10 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -678,7 +678,7 @@ static int vpc_create(const char *filename, QEMUOptionParameter *options)
     }
 
     /* Create the file */
-    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644);
+    fd = file_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644);
     if (fd < 0) {
         return -EIO;
     }
diff --git a/block/vvfat.c b/block/vvfat.c
index 9ef21dd..052343a 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1150,7 +1150,7 @@ static int open_file(BDRVVVFATState* s,mapping_t* mapping)
     if(!s->current_mapping ||
 	    strcmp(s->current_mapping->path,mapping->path)) {
 	/* open file */
-	int fd = open(mapping->path, O_RDONLY | O_BINARY | O_LARGEFILE);
+	int fd = file_open(mapping->path, O_RDONLY | O_BINARY | O_LARGEFILE, 0);
 	if(fd<0)
 	    return -1;
 	vvfat_close_current_file(s);
@@ -2209,7 +2209,7 @@ static int commit_one_file(BDRVVVFATState* s,
     for (i = s->cluster_size; i < offset; i += s->cluster_size)
 	c = modified_fat_get(s, c);
 
-    fd = open(mapping->path, O_RDWR | O_CREAT | O_BINARY, 0666);
+    fd = file_open(mapping->path, O_RDWR | O_CREAT | O_BINARY, 0666);
     if (fd < 0) {
 	fprintf(stderr, "Could not open %s... (%s, %d)\n", mapping->path,
 		strerror(errno), errno);
diff --git a/block_int.h b/block_int.h
index 086832a..7ccc8ef 100644
--- a/block_int.h
+++ b/block_int.h
@@ -435,4 +435,16 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
+/**
+ * file_open:
+ * @filename: the filename to attempt to open
+ * @flags: O_ flags that determine how the file is open
+ * @mode: the mode to create the file with if #O_CREAT is included in @flags
+ *
+ * This function behaves just like the @open libc function.  It may, however,
+ * delegate the open to a separate process if QEMU is being run with fewer
+ * privileges.
+ */
+int file_open(const char *filename, int flags, mode_t mode);
+
 #endif /* BLOCK_INT_H */
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [RFC 2/5] block: add new command line parameter that and protocol description
  2012-05-01 15:31 [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Stefan Hajnoczi
  2012-05-01 15:31 ` [Qemu-devel] [RFC 1/5] block: add open() wrapper that can be hooked by libvirt Stefan Hajnoczi
@ 2012-05-01 15:31 ` Stefan Hajnoczi
  2012-05-02  8:58   ` Daniel P. Berrange
  2012-05-02  9:03   ` Daniel P. Berrange
  2012-05-01 15:31 ` [Qemu-devel] [RFC 3/5] block: plumb up open-hook-fd option Stefan Hajnoczi
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2012-05-01 15:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, libvir-list, Corey Bryant

From: Anthony Liguori <aliguori@us.ibm.com>

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 qemu-options.hx |   42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/qemu-options.hx b/qemu-options.hx
index a169792..ccf4d1d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2724,6 +2724,48 @@ DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log,
     "-qtest-log LOG  specify tracing options\n",
     QEMU_ARCH_ALL)
 
+DEF("open-hook-fd", HAS_ARG, QEMU_OPTION_open_hook_fd,
+    "-open-hook-fd <fd>\n"
+    "                delegate opens to external process using <fd>\n", QEMU_ARCH_ALL)
+STEXI
+@item -open-hook-fd @var{fd}
+@findex -open-hook-fd
+Delegates open()s to an external process using @var<fd> to communicate commands.
+@var<fd> should be an open Unix Domain socket pipe that file descriptors can be
+received from.  The protocol the socket uses is a simple request/response initiated
+by the client.  All integers are in host byte order.  It is assumed that this protocol
+is only ever used on the same physical machine.  It is currently defined as:
+
+u32 message_size
+u32 command
+u8  payload[message_size - 8]
+
+The contents of payload depend on command.  Currently the following commands are
+defined:
+
+1. QEMU_OPEN (1)
+
+The full message will be:
+
+u32 message_size
+u32 command = 1
+u32 flags (O_ flags defined by libc)
+u32 mode (mode_t flags as defined by libc)
+u16 filename_len;
+u8  filename[filename_len]
+
+The server will then respond with:
+
+u32 message_size
+u32 command = 1
+s32 result
+
+If result is < 0, the value will be negated errno value as defined in errno.h.  If
+result is equal to 0, then there will also be a file descriptor available via SCM_RIGHTS
+in the extended sendmsg data.
+
+ETEXI
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [RFC 3/5] block: plumb up open-hook-fd option
  2012-05-01 15:31 [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Stefan Hajnoczi
  2012-05-01 15:31 ` [Qemu-devel] [RFC 1/5] block: add open() wrapper that can be hooked by libvirt Stefan Hajnoczi
  2012-05-01 15:31 ` [Qemu-devel] [RFC 2/5] block: add new command line parameter that and protocol description Stefan Hajnoczi
@ 2012-05-01 15:31 ` Stefan Hajnoczi
  2012-05-01 15:31 ` [Qemu-devel] [RFC 4/5] osdep: add qemu_recvmsg() wrapper Stefan Hajnoczi
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2012-05-01 15:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, libvir-list, Corey Bryant

From: Anthony Liguori <aliguori@us.ibm.com>

Implement the open hook UNIX domain socket protocol and accept passed
file descriptors.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block.c           |  107 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 block.h           |    2 +
 block/raw-posix.c |    2 +-
 qemu-options.hx   |    4 +-
 vl.c              |    3 ++
 5 files changed, 114 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index f9c4633..d3bf443 100644
--- a/block.c
+++ b/block.c
@@ -31,6 +31,7 @@
 #include "qemu-coroutine.h"
 #include "qmp-commands.h"
 #include "qemu-timer.h"
+#include "qemu_socket.h"
 
 #ifdef CONFIG_BSD
 #include <sys/types.h>
@@ -102,9 +103,113 @@ static BlockDriverState *bs_snapshots;
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
+static int remote_file_fd = -1;
+
+void remote_file_open_init(int fd)
+{
+    remote_file_fd = fd;
+}
+
+typedef struct OpenRequest
+{
+    uint32_t message_len;
+    uint32_t type;
+    uint32_t flags;
+    uint32_t mode;
+    uint32_t filename_len;
+    uint8_t filename[0];
+} OpenRequest;
+
+typedef struct OpenResponse
+{
+    uint32_t message_len;
+    uint32_t type;
+    int32_t result;
+} OpenResponse;
+
 int file_open(const char *filename, int flags, mode_t mode)
 {
-    return open(filename, flags, mode);
+#ifdef _WIN32
+    return qemu_open(filename, flags, mode);
+#else
+    struct msghdr msg = { NULL, };
+    struct cmsghdr *cmsg;
+    struct iovec iov[1];
+    union {
+        struct cmsghdr cmsg;
+        char control[CMSG_SPACE(sizeof(int))];
+    } msg_control;
+    ssize_t ret;
+    uint8_t buffer[1024];
+    OpenRequest *req = (void *)buffer;
+    OpenResponse *rsp = (void *)buffer;
+
+    if (remote_file_fd == -1) {
+        return qemu_open(filename, flags, mode);
+    }
+
+    req->filename_len = strlen(filename);
+    req->message_len = sizeof(OpenRequest) + req->filename_len;
+    req->type = 1; /* OpenRequest */
+    req->flags = flags;
+    req->mode = mode;
+
+    if (req->message_len > sizeof(buffer)) {
+        errno = EFAULT;
+        return -1;
+    }
+    memcpy(req->filename, filename, req->filename_len);
+
+    do {
+        ret = send(remote_file_fd, req, req->message_len, 0);
+    } while (ret == -1 && errno == EINTR);
+    if (ret != req->message_len) {
+        errno = EPIPE;
+        return -1;
+    }
+
+    iov[0].iov_base = buffer;
+    iov[0].iov_len = sizeof(buffer);
+
+    msg.msg_iov = iov;
+    msg.msg_iovlen = 1;
+    msg.msg_control = &msg_control;
+    msg.msg_controllen = sizeof(msg_control);
+
+    do {
+        ret = recvmsg(remote_file_fd, &msg, 0);
+    } while (ret == -1 && errno == EINTR);
+    if (ret != sizeof(OpenResponse)) {
+        errno = EPIPE;
+        return -1;
+    }
+
+    rsp = (void *)buffer;
+    if (rsp->result < 0) {
+        errno = -rsp->result;
+        return -1;
+    }
+
+    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+        int fd;
+
+        if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)) ||
+            cmsg->cmsg_level != SOL_SOCKET ||
+            cmsg->cmsg_type != SCM_RIGHTS) {
+            continue;
+        }
+
+        fd = *((int *)CMSG_DATA(cmsg));
+        if (fd < 0) {
+            continue;
+        }
+
+        return fd;
+    }
+
+    errno = ENOENT;
+    return -1;
+#endif
 }
 
 #ifdef _WIN32
diff --git a/block.h b/block.h
index f163e54..b8b78c7 100644
--- a/block.h
+++ b/block.h
@@ -336,6 +336,8 @@ int bdrv_img_create(const char *filename, const char *fmt,
 void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
+void remote_file_open_init(int fd);
+
 #define BDRV_SECTORS_PER_DIRTY_CHUNK 2048
 
 void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index b6bc6bc..9946e5f 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -208,7 +208,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
         s->open_flags |= O_DSYNC;
 
     s->fd = -1;
-    fd = qemu_open(filename, s->open_flags, 0644);
+    fd = file_open(filename, s->open_flags, 0644);
     if (fd < 0) {
         ret = -errno;
         if (ret == -EROFS)
diff --git a/qemu-options.hx b/qemu-options.hx
index ccf4d1d..0c54cd5 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2730,8 +2730,8 @@ DEF("open-hook-fd", HAS_ARG, QEMU_OPTION_open_hook_fd,
 STEXI
 @item -open-hook-fd @var{fd}
 @findex -open-hook-fd
-Delegates open()s to an external process using @var<fd> to communicate commands.
-@var<fd> should be an open Unix Domain socket pipe that file descriptors can be
+Delegates open()s to an external process using @var{fd} to communicate commands.
+@var{fd} should be an open Unix Domain socket pipe that file descriptors can be
 received from.  The protocol the socket uses is a simple request/response initiated
 by the client.  All integers are in host byte order.  It is assumed that this protocol
 is only ever used on the same physical machine.  It is currently defined as:
diff --git a/vl.c b/vl.c
index ae91a8a..b418865 100644
--- a/vl.c
+++ b/vl.c
@@ -3191,6 +3191,9 @@ int main(int argc, char **argv, char **envp)
             case QEMU_OPTION_qtest_log:
                 qtest_log = optarg;
                 break;
+            case QEMU_OPTION_open_hook_fd:
+                remote_file_open_init(atoi(optarg));
+                break;
             default:
                 os_parse_cmd_args(popt->index, optarg);
             }
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [RFC 4/5] osdep: add qemu_recvmsg() wrapper
  2012-05-01 15:31 [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2012-05-01 15:31 ` [Qemu-devel] [RFC 3/5] block: plumb up open-hook-fd option Stefan Hajnoczi
@ 2012-05-01 15:31 ` Stefan Hajnoczi
  2012-05-01 15:31 ` [Qemu-devel] [RFC 5/5] Example -open-hook-fd server Stefan Hajnoczi
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2012-05-01 15:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, libvir-list, Corey Bryant

Usually we need to set O_CLOEXEC, which is platform-specific.  Add a
wrapper like qemu_open() but for qemu_recvmsg().

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block.c       |    5 +----
 osdep.c       |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 qemu-common.h |    2 ++
 3 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index d3bf443..e66d64f 100644
--- a/block.c
+++ b/block.c
@@ -176,10 +176,7 @@ int file_open(const char *filename, int flags, mode_t mode)
     msg.msg_control = &msg_control;
     msg.msg_controllen = sizeof(msg_control);
 
-    do {
-        ret = recvmsg(remote_file_fd, &msg, 0);
-    } while (ret == -1 && errno == EINTR);
-    if (ret != sizeof(OpenResponse)) {
+    if (qemu_recvmsg(remote_file_fd, &msg, 0) != sizeof(OpenResponse)) {
         errno = EPIPE;
         return -1;
     }
diff --git a/osdep.c b/osdep.c
index 3e6bada..834e78f 100644
--- a/osdep.c
+++ b/osdep.c
@@ -103,6 +103,52 @@ int qemu_open(const char *name, int flags, ...)
 }
 
 /*
+ * Receive a message from a socket
+ *
+ * This behaves like recvmsg(2) except that EINTR is handled internally and
+ * never returned.  Passed file descriptors will have O_CLOEXEC set.
+ */
+ssize_t qemu_recvmsg(int fd, struct msghdr *msg, int flags)
+{
+    ssize_t ret;
+
+#ifdef MSG_CMSG_CLOEXEC
+    /* Receive file descriptors with O_CLOEXEC */
+    flags |= MSG_CMSG_CLOEXEC;
+#endif
+
+    do {
+        ret = recvmsg(fd, msg, flags);
+    } while (ret == -1 && errno == EINTR);
+    if (ret < 0) {
+        return ret;
+    }
+
+#ifndef MSG_CMSG_CLOEXEC
+    /* As a fallback, set O_CLOEXEC in a way that is not thread-safe */
+    {
+        struct cmsghdr *cmsg;
+        for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+            int new_fd;
+
+            if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)) ||
+                cmsg->cmsg_level != SOL_SOCKET ||
+                cmsg->cmsg_type != SCM_RIGHTS) {
+                continue;
+            }
+
+            new_fd = *((int *)CMSG_DATA(cmsg));
+            if (new_fd >= 0) {
+                qemu_set_cloexec(new_fd);
+            }
+        }
+    }
+#endif
+
+    return ret;
+}
+
+/*
  * A variant of write(2) which handles partial write.
  *
  * Return the number of bytes transferred.
diff --git a/qemu-common.h b/qemu-common.h
index 50f659a..e3a6c4d 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -184,6 +184,8 @@ const char *path(const char *pathname);
 void *qemu_oom_check(void *ptr);
 
 int qemu_open(const char *name, int flags, ...);
+struct msghdr;
+ssize_t qemu_recvmsg(int fd, struct msghdr *msg, int flags);
 ssize_t qemu_write_full(int fd, const void *buf, size_t count)
     QEMU_WARN_UNUSED_RESULT;
 ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags)
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [RFC 5/5] Example -open-hook-fd server
  2012-05-01 15:31 [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2012-05-01 15:31 ` [Qemu-devel] [RFC 4/5] osdep: add qemu_recvmsg() wrapper Stefan Hajnoczi
@ 2012-05-01 15:31 ` Stefan Hajnoczi
  2012-05-01 16:04   ` [Qemu-devel] [libvirt] " Stefan Hajnoczi
  2012-05-01 20:25 ` [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Anthony Liguori
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Stefan Hajnoczi @ 2012-05-01 15:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, libvir-list, Corey Bryant

This patch implements a demo server for the new -open-hook-fd feature.
It opens any filename given to it by QEMU and therefore adds no true
security.  But it serves as a good debugging tool to see what requests
QEMU is making.

  $ gcc -o test-fd-passing -Wall test-fd-passing.c
  $ ./test-fd-passing path/to/my/vm.img

Try:

  (qemu) change ide1-cd0 path/to/a/cdrom.iso

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 test-fd-passing.c |  147 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)
 create mode 100644 test-fd-passing.c

diff --git a/test-fd-passing.c b/test-fd-passing.c
new file mode 100644
index 0000000..43b2e86
--- /dev/null
+++ b/test-fd-passing.c
@@ -0,0 +1,147 @@
+/*
+ * QEMU -open-hook-fd test server
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Stefan Hajnoczi   <stefanha@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ * gcc -o test-fd-passing -Wall test-fd-passing.c
+ */
+
+#define _GNU_SOURCE
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <spawn.h>
+
+typedef struct {
+    uint32_t message_len;
+    uint32_t type;
+    uint32_t flags;
+    uint32_t mode;
+    uint32_t filename_len;
+    uint8_t filename[0];
+} OpenRequest;
+
+typedef struct {
+    uint32_t message_len;
+    uint32_t type;
+    int32_t result;
+} OpenResponse;
+
+int main(int argc, char **argv)
+{
+    if (argc != 2) {
+        fprintf(stderr, "usage: %s <image-file>\n", argv[0]);
+        return EXIT_FAILURE;
+    }
+
+    int fds[2];
+    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
+        perror("socketpair");
+        return EXIT_FAILURE;
+    }
+
+    char *fdstr = NULL;
+    asprintf(&fdstr, "%d", fds[1]);
+
+    char *drivestr = NULL;
+    asprintf(&drivestr, "file=%s,cache=none,if=virtio", argv[1]);
+
+    char *child_argv[] = {
+        "qemu-system-x86_64",
+        "-enable-kvm",
+        "-m", "1024",
+        "-drive", drivestr,
+        "-open-hook-fd", fdstr,
+        NULL,
+    };
+
+    pid_t child_pid;
+    if (posix_spawn(&child_pid, "x86_64-softmmu/qemu-system-x86_64",
+                    NULL, NULL, child_argv, environ) != 0) {
+        fprintf(stderr, "posix_spawn failed\n");
+        return EXIT_FAILURE;
+    }
+    free(drivestr);
+    free(fdstr);
+    close(fds[1]);
+
+    for (;;) {
+        OpenRequest req;
+        char filename[1024];
+
+        if (recv(fds[0], &req, sizeof(req), 0) != sizeof(req)) {
+            perror("recv");
+            return EXIT_FAILURE;
+        }
+
+        if (req.type != 1 /* OpenRequest */) {
+            fprintf(stderr, "Expected request type 1, got %u\n", req.type);
+            return EXIT_FAILURE;
+        }
+
+        if (req.filename_len > sizeof(filename) - 1) {
+            fprintf(stderr, "Filename length too large (%u)\n",
+                    req.filename_len);
+            return EXIT_FAILURE;
+        }
+
+        if (recv(fds[0], filename, req.filename_len, 0) != req.filename_len) {
+            perror("recv");
+            return EXIT_FAILURE;
+        }
+        filename[req.filename_len] = '\0';
+
+        fprintf(stderr, "open(\"%s\", %#x, %#o) = ",
+                filename, req.flags, req.mode);
+
+        int fd, ret;
+        fd = ret = open(filename, req.flags, req.mode);
+
+        fprintf(stderr, "%d (errno %d)\n", ret, errno);
+
+        OpenResponse resp = {
+            .message_len = sizeof(resp),
+            .type = 1,
+            .result = ret < 0 ? -errno : 0,
+        };
+        struct iovec iov = {
+            .iov_base = &resp,
+            .iov_len = sizeof(resp),
+        };
+        char buf[CMSG_SPACE(sizeof(int))];
+        struct msghdr msg = {
+            .msg_iov = &iov,
+            .msg_iovlen = 1,
+        };
+        if (ret >= 0) {
+            msg.msg_control = buf;
+            msg.msg_controllen = sizeof(buf);
+
+            struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
+            cmsg->cmsg_level = SOL_SOCKET;
+            cmsg->cmsg_type = SCM_RIGHTS;
+            cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+
+            memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
+        }
+
+        ret = sendmsg(fds[0], &msg, 0);
+        if (ret < 0) {
+            perror("sendmsg");
+            return EXIT_FAILURE;
+        }
+        close(fd);
+    }
+}
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 5/5] Example -open-hook-fd server
  2012-05-01 15:31 ` [Qemu-devel] [RFC 5/5] Example -open-hook-fd server Stefan Hajnoczi
@ 2012-05-01 16:04   ` Stefan Hajnoczi
  0 siblings, 0 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2012-05-01 16:04 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, libvir-list, Anthony Liguori, qemu-devel

On Tue, May 1, 2012 at 4:31 PM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> This patch implements a demo server for the new -open-hook-fd feature.
> It opens any filename given to it by QEMU and therefore adds no true
> security.  But it serves as a good debugging tool to see what requests
> QEMU is making.
>
>  $ gcc -o test-fd-passing -Wall test-fd-passing.c
>  $ ./test-fd-passing path/to/my/vm.img
>
> Try:
>
>  (qemu) change ide1-cd0 path/to/a/cdrom.iso
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  test-fd-passing.c |  147 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 147 insertions(+)
>  create mode 100644 test-fd-passing.c

Although this whole series is RFC, this particular patch is not meant
for qemu.git - it's just an example of how to talk to QEMU.  It's not
100% portable or QEMU coding style but please don't worry, this will
not dirty our beautiful source tree :D.

Stefan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-01 15:31 [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2012-05-01 15:31 ` [Qemu-devel] [RFC 5/5] Example -open-hook-fd server Stefan Hajnoczi
@ 2012-05-01 20:25 ` Anthony Liguori
  2012-05-01 20:56   ` [Qemu-devel] [libvirt] " Eric Blake
                     ` (2 more replies)
  2012-05-02  9:01 ` [Qemu-devel] " Daniel P. Berrange
  2012-05-04  3:28 ` [Qemu-devel] [libvirt] " Zhi Yong Wu
  7 siblings, 3 replies; 37+ messages in thread
From: Anthony Liguori @ 2012-05-01 20:25 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, libvir-list, Corey Bryant, qemu-devel

Thanks for sending this out Stefan.

On 05/01/2012 10:31 AM, Stefan Hajnoczi wrote:
> Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
> it from opening files that it should not have access to.  This improves
> security because it prevents the attacker from escaping the QEMU process if
> they manage to gain control.
>
> NFS has been a pain point for SELinux because it does not support labels (which
> I believe are stored in extended attributes).  In other words, it's not
> possible to use SELinux goodness on QEMU when image files are located on NFS.
> Today we have to allow QEMU access to any file on the NFS export rather than
> restricting specifically to the image files that the guest requires.
>
> File descriptor passing is a solution to this problem and might also come in
> handy elsewhere.  Libvirt or another external process chooses files which QEMU
> is allowed to access and provides just those file descriptors - QEMU cannot
> open the files itself.
>
> This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
> open an image file it sends a request over the given UNIX domain socket.  The
> response includes the file descriptor or an errno on failure.  Please see the
> patches for details on the protocol.
>
> The -open-hook-fd approach allows QEMU to support file descriptor passing
> without changing -drive.  It also supports snapshot_blkdev and other commands
> that re-open image files.
>
> Anthony Liguori<aliguori@us.ibm.com>  wrote most of these patches.  I added a
> demo -open-hook-fd server and added some small fixes.  Since Anthony is
> traveling right now I'm sending the RFC for discussion.

What I like about this approach is that it's useful outside the block layer and 
is conceptionally simple from a QEMU PoV.  We simply delegate open() to libvirt 
and let libvirt enforce whatever rules it wants.

This is not meant to be an alternative to blockdev, but even with blockdev, I 
think we still want to use a mechanism like this even with blockdev.

Regards,

Anthony Liguori

>
> Anthony Liguori (3):
>    block: add open() wrapper that can be hooked by libvirt
>    block: add new command line parameter that and protocol description
>    block: plumb up open-hook-fd option
>
> Stefan Hajnoczi (2):
>    osdep: add qemu_recvmsg() wrapper
>    Example -open-hook-fd server
>
>   block.c           |  107 ++++++++++++++++++++++++++++++++++++++
>   block.h           |    2 +
>   block/raw-posix.c |   18 +++----
>   block/raw-win32.c |    2 +-
>   block/vdi.c       |    2 +-
>   block/vmdk.c      |    6 +--
>   block/vpc.c       |    2 +-
>   block/vvfat.c     |    4 +-
>   block_int.h       |   12 +++++
>   osdep.c           |   46 +++++++++++++++++
>   qemu-common.h     |    2 +
>   qemu-options.hx   |   42 +++++++++++++++
>   test-fd-passing.c |  147 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>   vl.c              |    3 ++
>   14 files changed, 378 insertions(+), 17 deletions(-)
>   create mode 100644 test-fd-passing.c
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-01 20:25 ` [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Anthony Liguori
@ 2012-05-01 20:56   ` Eric Blake
  2012-05-01 21:52     ` Anthony Liguori
  2012-05-02 16:40     ` Paolo Bonzini
  2012-05-01 21:45   ` [Qemu-devel] " Corey Bryant
  2012-05-02  8:20   ` [Qemu-devel] " Kevin Wolf
  2 siblings, 2 replies; 37+ messages in thread
From: Eric Blake @ 2012-05-01 20:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Kevin Wolf, libvir-list, Stefan Hajnoczi, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2279 bytes --]

On 05/01/2012 02:25 PM, Anthony Liguori wrote:
> Thanks for sending this out Stefan.

Indeed.


>> This series adds the -open-hook-fd command-line option.  Whenever QEMU
>> needs to
>> open an image file it sends a request over the given UNIX domain
>> socket.  The
>> response includes the file descriptor or an errno on failure.  Please
>> see the
>> patches for details on the protocol.
>>
>> The -open-hook-fd approach allows QEMU to support file descriptor passing
>> without changing -drive.  It also supports snapshot_blkdev and other
>> commands
>> that re-open image files.
>>
>> Anthony Liguori<aliguori@us.ibm.com>  wrote most of these patches.  I
>> added a
>> demo -open-hook-fd server and added some small fixes.  Since Anthony is
>> traveling right now I'm sending the RFC for discussion.
> 
> What I like about this approach is that it's useful outside the block
> layer and is conceptionally simple from a QEMU PoV.  We simply delegate
> open() to libvirt and let libvirt enforce whatever rules it wants.
> 
> This is not meant to be an alternative to blockdev, but even with
> blockdev, I think we still want to use a mechanism like this even with
> blockdev.

The overall series looks like it would be rather interesting.  What sort
of timing restrictions are there?  For example, the proposed
'drive-reopen' command (probably now delegated to qemu 1.2) would mean
that qemu would be calling back into libvirt in order to do the reopen.
 If libvirt takes its time in passing back an open fd, is it going to
starve qemu from answering unrelated monitor commands in the meantime?
I definitely want to make sure we avoid deadlock where libvirt is
waiting on a monitor command, but the monitor command is waiting on
libvirt to pass an fd.

Is this also an opportunity to request whether a particular fd must be
seekable vs. acceptable as a one-pass read or write, perhaps by whether
the command is 1 (seekable open) or 2 (one-pass open)?  For example,
migration is one-pass (and therefore libvirt passes a pipe which is
hooked up to a helper app that uses O_DIRECT), while block devices must
be seekable.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-01 20:25 ` [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Anthony Liguori
  2012-05-01 20:56   ` [Qemu-devel] [libvirt] " Eric Blake
@ 2012-05-01 21:45   ` Corey Bryant
  2012-05-01 21:53     ` Anthony Liguori
  2012-05-02  8:20   ` [Qemu-devel] " Kevin Wolf
  2 siblings, 1 reply; 37+ messages in thread
From: Corey Bryant @ 2012-05-01 21:45 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Kevin Wolf, libvir-list, Stefan Hajnoczi, qemu-devel



On 05/01/2012 04:25 PM, Anthony Liguori wrote:
> Thanks for sending this out Stefan.
>
> On 05/01/2012 10:31 AM, Stefan Hajnoczi wrote:
>> Libvirt can take advantage of SELinux to restrict the QEMU process and
>> prevent
>> it from opening files that it should not have access to. This improves
>> security because it prevents the attacker from escaping the QEMU
>> process if
>> they manage to gain control.
>>
>> NFS has been a pain point for SELinux because it does not support
>> labels (which
>> I believe are stored in extended attributes). In other words, it's not
>> possible to use SELinux goodness on QEMU when image files are located
>> on NFS.
>> Today we have to allow QEMU access to any file on the NFS export
>> rather than
>> restricting specifically to the image files that the guest requires.
>>
>> File descriptor passing is a solution to this problem and might also
>> come in
>> handy elsewhere. Libvirt or another external process chooses files
>> which QEMU
>> is allowed to access and provides just those file descriptors - QEMU
>> cannot
>> open the files itself.
>>
>> This series adds the -open-hook-fd command-line option. Whenever QEMU
>> needs to
>> open an image file it sends a request over the given UNIX domain
>> socket. The
>> response includes the file descriptor or an errno on failure. Please
>> see the
>> patches for details on the protocol.
>>
>> The -open-hook-fd approach allows QEMU to support file descriptor passing
>> without changing -drive. It also supports snapshot_blkdev and other
>> commands
>> that re-open image files.
>>
>> Anthony Liguori<aliguori@us.ibm.com> wrote most of these patches. I
>> added a
>> demo -open-hook-fd server and added some small fixes. Since Anthony is
>> traveling right now I'm sending the RFC for discussion.
>
> What I like about this approach is that it's useful outside the block
> layer and is conceptionally simple from a QEMU PoV. We simply delegate
> open() to libvirt and let libvirt enforce whatever rules it wants.
>
> This is not meant to be an alternative to blockdev, but even with
> blockdev, I think we still want to use a mechanism like this even with
> blockdev.
>
> Regards,
>
> Anthony Liguori
>

I like it too and I think it's a better solution than the fd: protocol 
approach.

I think (correct me if I'm wrong) libvirt should be aware of any file 
that qemu asks it to open.  So from a security point of view, libvirt 
can prevent opening a file if it isn't affiliated with the guest.

-- 
Regards,
Corey

>>
>> Anthony Liguori (3):
>> block: add open() wrapper that can be hooked by libvirt
>> block: add new command line parameter that and protocol description
>> block: plumb up open-hook-fd option
>>
>> Stefan Hajnoczi (2):
>> osdep: add qemu_recvmsg() wrapper
>> Example -open-hook-fd server
>>
>> block.c | 107 ++++++++++++++++++++++++++++++++++++++
>> block.h | 2 +
>> block/raw-posix.c | 18 +++----
>> block/raw-win32.c | 2 +-
>> block/vdi.c | 2 +-
>> block/vmdk.c | 6 +--
>> block/vpc.c | 2 +-
>> block/vvfat.c | 4 +-
>> block_int.h | 12 +++++
>> osdep.c | 46 +++++++++++++++++
>> qemu-common.h | 2 +
>> qemu-options.hx | 42 +++++++++++++++
>> test-fd-passing.c | 147
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> vl.c | 3 ++
>> 14 files changed, 378 insertions(+), 17 deletions(-)
>> create mode 100644 test-fd-passing.c
>>
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-01 20:56   ` [Qemu-devel] [libvirt] " Eric Blake
@ 2012-05-01 21:52     ` Anthony Liguori
  2012-05-02 16:40     ` Paolo Bonzini
  1 sibling, 0 replies; 37+ messages in thread
From: Anthony Liguori @ 2012-05-01 21:52 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, libvir-list, Anthony Liguori, Stefan Hajnoczi, qemu-devel

On 05/01/2012 03:56 PM, Eric Blake wrote:
> On 05/01/2012 02:25 PM, Anthony Liguori wrote:
>> Thanks for sending this out Stefan.
>
> Indeed.
>
>
>>> This series adds the -open-hook-fd command-line option.  Whenever QEMU
>>> needs to
>>> open an image file it sends a request over the given UNIX domain
>>> socket.  The
>>> response includes the file descriptor or an errno on failure.  Please
>>> see the
>>> patches for details on the protocol.
>>>
>>> The -open-hook-fd approach allows QEMU to support file descriptor passing
>>> without changing -drive.  It also supports snapshot_blkdev and other
>>> commands
>>> that re-open image files.
>>>
>>> Anthony Liguori<aliguori@us.ibm.com>   wrote most of these patches.  I
>>> added a
>>> demo -open-hook-fd server and added some small fixes.  Since Anthony is
>>> traveling right now I'm sending the RFC for discussion.
>>
>> What I like about this approach is that it's useful outside the block
>> layer and is conceptionally simple from a QEMU PoV.  We simply delegate
>> open() to libvirt and let libvirt enforce whatever rules it wants.
>>
>> This is not meant to be an alternative to blockdev, but even with
>> blockdev, I think we still want to use a mechanism like this even with
>> blockdev.
>
> The overall series looks like it would be rather interesting.  What sort
> of timing restrictions are there?  For example, the proposed
> 'drive-reopen' command (probably now delegated to qemu 1.2) would mean
> that qemu would be calling back into libvirt in order to do the reopen.
>   If libvirt takes its time in passing back an open fd, is it going to
> starve qemu from answering unrelated monitor commands in the meantime?

s/libvirt/kernel/g and your concerns are equally valid.

Doing open() should never be done in a path that could block things.  There's 
always the possibility that we're on top of NFS and the open could timeout.

For something like drive_reopen, we should use an asynchronous open() that 
dispatched the open() in the posix-aio thread pool.

That's part of what's nice about this approach, we could still call file_open() 
in the posix-aio thread pool...

> I definitely want to make sure we avoid deadlock where libvirt is
> waiting on a monitor command, but the monitor command is waiting on
> libvirt to pass an fd.
>
> Is this also an opportunity to request whether a particular fd must be
> seekable vs. acceptable as a one-pass read or write, perhaps by whether
> the command is 1 (seekable open) or 2 (one-pass open)?

I'm not really sure where the distinction lies...

I want the RPC to behave exactly like open().  So if we're assuming that open() 
of a /dev/ file returns something that is ioctl()'able, then that's what libvirt 
should return.

If we want to sort of do fd-transformation where a special protocol is used for 
things like ioctl, that's fine, but it ought to be a different mechanism (that's 
probably not nearly as generic).

> For example,
> migration is one-pass (and therefore libvirt passes a pipe which is
> hooked up to a helper app that uses O_DIRECT), while block devices must
> be seekable.

But migration doesn't involve doing an open().  This is not a replacement for fd 
passing.  This is a replacement for open() to make up for the facts that (1) 
some management tools like libvirt cannot isolate guests with DAC and (2) 
SELinux cannot be used to isolate guests across all file systems.

I would really prefer that the kernel fix this problem for us, but from what I'm 
told, the problem lies in the NFS standards committee so short of forking the 
NFS protocol, there isn't much that the kernel can do.

Regards,

Anthony Liguori

>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-01 21:45   ` [Qemu-devel] " Corey Bryant
@ 2012-05-01 21:53     ` Anthony Liguori
  2012-05-01 22:15       ` [Qemu-devel] [libvirt] " Eric Blake
  0 siblings, 1 reply; 37+ messages in thread
From: Anthony Liguori @ 2012-05-01 21:53 UTC (permalink / raw)
  To: Corey Bryant
  Cc: Kevin Wolf, libvir-list, Anthony Liguori, Stefan Hajnoczi, qemu-devel

On 05/01/2012 04:45 PM, Corey Bryant wrote:
>
>
> On 05/01/2012 04:25 PM, Anthony Liguori wrote:
>> Thanks for sending this out Stefan.
>>
>> On 05/01/2012 10:31 AM, Stefan Hajnoczi wrote:
>>> Libvirt can take advantage of SELinux to restrict the QEMU process and
>>> prevent
>>> it from opening files that it should not have access to. This improves
>>> security because it prevents the attacker from escaping the QEMU
>>> process if
>>> they manage to gain control.
>>>
>>> NFS has been a pain point for SELinux because it does not support
>>> labels (which
>>> I believe are stored in extended attributes). In other words, it's not
>>> possible to use SELinux goodness on QEMU when image files are located
>>> on NFS.
>>> Today we have to allow QEMU access to any file on the NFS export
>>> rather than
>>> restricting specifically to the image files that the guest requires.
>>>
>>> File descriptor passing is a solution to this problem and might also
>>> come in
>>> handy elsewhere. Libvirt or another external process chooses files
>>> which QEMU
>>> is allowed to access and provides just those file descriptors - QEMU
>>> cannot
>>> open the files itself.
>>>
>>> This series adds the -open-hook-fd command-line option. Whenever QEMU
>>> needs to
>>> open an image file it sends a request over the given UNIX domain
>>> socket. The
>>> response includes the file descriptor or an errno on failure. Please
>>> see the
>>> patches for details on the protocol.
>>>
>>> The -open-hook-fd approach allows QEMU to support file descriptor passing
>>> without changing -drive. It also supports snapshot_blkdev and other
>>> commands
>>> that re-open image files.
>>>
>>> Anthony Liguori<aliguori@us.ibm.com> wrote most of these patches. I
>>> added a
>>> demo -open-hook-fd server and added some small fixes. Since Anthony is
>>> traveling right now I'm sending the RFC for discussion.
>>
>> What I like about this approach is that it's useful outside the block
>> layer and is conceptionally simple from a QEMU PoV. We simply delegate
>> open() to libvirt and let libvirt enforce whatever rules it wants.
>>
>> This is not meant to be an alternative to blockdev, but even with
>> blockdev, I think we still want to use a mechanism like this even with
>> blockdev.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>
> I like it too and I think it's a better solution than the fd: protocol approach.
>
> I think (correct me if I'm wrong) libvirt should be aware of any file that qemu
> asks it to open. So from a security point of view, libvirt can prevent opening a
> file if it isn't affiliated with the guest.

Right, libvirt can maintain a whitelist of files QEMU is allowed to open (which 
is already has because it needs to label these files).  The only complexity is 
that it's not a straight strcmp().  The path needs to be (carefully) broken into 
components with '.' and '..' handled appropriately.  But this shouldn't be that 
difficult to do.

Regards,

Anthony Liguori

>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-01 21:53     ` Anthony Liguori
@ 2012-05-01 22:15       ` Eric Blake
  2012-05-01 22:21         ` Anthony Liguori
  2012-05-07 16:10         ` Corey Bryant
  0 siblings, 2 replies; 37+ messages in thread
From: Eric Blake @ 2012-05-01 22:15 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, libvir-list,
	Corey Bryant, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1595 bytes --]

On 05/01/2012 03:53 PM, Anthony Liguori wrote:

>> I think (correct me if I'm wrong) libvirt should be aware of any file
>> that qemu
>> asks it to open. So from a security point of view, libvirt can prevent
>> opening a
>> file if it isn't affiliated with the guest.
> 
> Right, libvirt can maintain a whitelist of files QEMU is allowed to open
> (which is already has because it needs to label these files).

Indeed.

>  The only
> complexity is that it's not a straight strcmp().  The path needs to be
> (carefully) broken into components with '.' and '..' handled
> appropriately.  But this shouldn't be that difficult to do.

Libvirt would probably canonicalize path names, both when sticking them
in the whitelist, and in validating the requests from qemu - agreed that
it's not difficult.

More importantly, libvirt needs to start tracking the backing chain of
any qcow2 or qed file as part of the domain XML; and operations like
'block-stream' would update not only the chain, but also the whitelist.
 In the drive-reopen case, this means that libvirt would have to be
careful when to change labeling - provide access to the new files before
drive-reopen, then revoke access to files after drive-reopen completes.
 In other words, having the -open-hook-fd client pass a command to
libvirt at the time it is closing an fd would help libvirt know when
qemu has quit using a file, which might make it easier to revoke SELinux
labels at that time.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-01 22:15       ` [Qemu-devel] [libvirt] " Eric Blake
@ 2012-05-01 22:21         ` Anthony Liguori
  2012-05-07 16:10         ` Corey Bryant
  1 sibling, 0 replies; 37+ messages in thread
From: Anthony Liguori @ 2012-05-01 22:21 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, libvir-list,
	Corey Bryant, qemu-devel

On 05/01/2012 05:15 PM, Eric Blake wrote:
> On 05/01/2012 03:53 PM, Anthony Liguori wrote:
>
>>> I think (correct me if I'm wrong) libvirt should be aware of any file
>>> that qemu
>>> asks it to open. So from a security point of view, libvirt can prevent
>>> opening a
>>> file if it isn't affiliated with the guest.
>>
>> Right, libvirt can maintain a whitelist of files QEMU is allowed to open
>> (which is already has because it needs to label these files).
>
> Indeed.
>
>>   The only
>> complexity is that it's not a straight strcmp().  The path needs to be
>> (carefully) broken into components with '.' and '..' handled
>> appropriately.  But this shouldn't be that difficult to do.
>
> Libvirt would probably canonicalize path names, both when sticking them
> in the whitelist, and in validating the requests from qemu - agreed that
> it's not difficult.
>
> More importantly, libvirt needs to start tracking the backing chain of
> any qcow2 or qed file as part of the domain XML; and operations like
> 'block-stream' would update not only the chain, but also the whitelist.
>   In the drive-reopen case, this means that libvirt would have to be
> careful when to change labeling

Would you give QEMU open access or change the way you label to only allow 
read/write access?  I think the later is probably the better approach.

So presumably, you'll need to adjust the sVirt policy too...

You'll need to detect if a file is on NFS too and figure out what the default 
label is that was given so you can build the rules correctly.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-01 20:25 ` [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Anthony Liguori
  2012-05-01 20:56   ` [Qemu-devel] [libvirt] " Eric Blake
  2012-05-01 21:45   ` [Qemu-devel] " Corey Bryant
@ 2012-05-02  8:20   ` Kevin Wolf
  2012-05-02  8:27     ` Stefan Hajnoczi
  2012-05-02  8:53     ` [Qemu-devel] [libvirt] " Daniel P. Berrange
  2 siblings, 2 replies; 37+ messages in thread
From: Kevin Wolf @ 2012-05-02  8:20 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: libvir-list, Corey Bryant, Stefan Hajnoczi, qemu-devel

Am 01.05.2012 22:25, schrieb Anthony Liguori:
> Thanks for sending this out Stefan.
> 
> On 05/01/2012 10:31 AM, Stefan Hajnoczi wrote:
>> Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
>> it from opening files that it should not have access to.  This improves
>> security because it prevents the attacker from escaping the QEMU process if
>> they manage to gain control.
>>
>> NFS has been a pain point for SELinux because it does not support labels (which
>> I believe are stored in extended attributes).  In other words, it's not
>> possible to use SELinux goodness on QEMU when image files are located on NFS.
>> Today we have to allow QEMU access to any file on the NFS export rather than
>> restricting specifically to the image files that the guest requires.
>>
>> File descriptor passing is a solution to this problem and might also come in
>> handy elsewhere.  Libvirt or another external process chooses files which QEMU
>> is allowed to access and provides just those file descriptors - QEMU cannot
>> open the files itself.
>>
>> This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
>> open an image file it sends a request over the given UNIX domain socket.  The
>> response includes the file descriptor or an errno on failure.  Please see the
>> patches for details on the protocol.
>>
>> The -open-hook-fd approach allows QEMU to support file descriptor passing
>> without changing -drive.  It also supports snapshot_blkdev and other commands
>> that re-open image files.
>>
>> Anthony Liguori<aliguori@us.ibm.com>  wrote most of these patches.  I added a
>> demo -open-hook-fd server and added some small fixes.  Since Anthony is
>> traveling right now I'm sending the RFC for discussion.
> 
> What I like about this approach is that it's useful outside the block layer and 
> is conceptionally simple from a QEMU PoV.  We simply delegate open() to libvirt 
> and let libvirt enforce whatever rules it wants.
> 
> This is not meant to be an alternative to blockdev, but even with blockdev, I 
> think we still want to use a mechanism like this even with blockdev.

What does it provide on top?

This doesn't look like something that I'd like a lot. qemu should be
able to continue to run no matter what the management tool does, whether
it responds to RPCs properly or whether it has crashed. You need a
really good use case for the RPC that cannot be covered otherwise in
order to justify this.

Kevin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-02  8:20   ` [Qemu-devel] " Kevin Wolf
@ 2012-05-02  8:27     ` Stefan Hajnoczi
  2012-05-02  9:38       ` Kevin Wolf
  2012-05-02  8:53     ` [Qemu-devel] [libvirt] " Daniel P. Berrange
  1 sibling, 1 reply; 37+ messages in thread
From: Stefan Hajnoczi @ 2012-05-02  8:27 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: libvir-list, Anthony Liguori, Corey Bryant, Stefan Hajnoczi, qemu-devel

On Wed, May 2, 2012 at 9:20 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 01.05.2012 22:25, schrieb Anthony Liguori:
>> Thanks for sending this out Stefan.
>>
>> On 05/01/2012 10:31 AM, Stefan Hajnoczi wrote:
>>> Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
>>> it from opening files that it should not have access to.  This improves
>>> security because it prevents the attacker from escaping the QEMU process if
>>> they manage to gain control.
>>>
>>> NFS has been a pain point for SELinux because it does not support labels (which
>>> I believe are stored in extended attributes).  In other words, it's not
>>> possible to use SELinux goodness on QEMU when image files are located on NFS.
>>> Today we have to allow QEMU access to any file on the NFS export rather than
>>> restricting specifically to the image files that the guest requires.
>>>
>>> File descriptor passing is a solution to this problem and might also come in
>>> handy elsewhere.  Libvirt or another external process chooses files which QEMU
>>> is allowed to access and provides just those file descriptors - QEMU cannot
>>> open the files itself.
>>>
>>> This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
>>> open an image file it sends a request over the given UNIX domain socket.  The
>>> response includes the file descriptor or an errno on failure.  Please see the
>>> patches for details on the protocol.
>>>
>>> The -open-hook-fd approach allows QEMU to support file descriptor passing
>>> without changing -drive.  It also supports snapshot_blkdev and other commands
>>> that re-open image files.
>>>
>>> Anthony Liguori<aliguori@us.ibm.com>  wrote most of these patches.  I added a
>>> demo -open-hook-fd server and added some small fixes.  Since Anthony is
>>> traveling right now I'm sending the RFC for discussion.
>>
>> What I like about this approach is that it's useful outside the block layer and
>> is conceptionally simple from a QEMU PoV.  We simply delegate open() to libvirt
>> and let libvirt enforce whatever rules it wants.
>>
>> This is not meant to be an alternative to blockdev, but even with blockdev, I
>> think we still want to use a mechanism like this even with blockdev.
>
> What does it provide on top?

It solves the problem of snapshot_blkdev and other operations that
re-open files.  Using -blockdev and hotplug for image files as file
descriptors only solves the static configuration problem, not the
runtime problem we get with snapshot_blkdev.  That's why this approach
is more powerful than -blockdev fd=N.

Stefan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-02  8:20   ` [Qemu-devel] " Kevin Wolf
  2012-05-02  8:27     ` Stefan Hajnoczi
@ 2012-05-02  8:53     ` Daniel P. Berrange
  2012-05-02  9:45       ` Kevin Wolf
  1 sibling, 1 reply; 37+ messages in thread
From: Daniel P. Berrange @ 2012-05-02  8:53 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: libvir-list, Anthony Liguori, Stefan Hajnoczi, qemu-devel

On Wed, May 02, 2012 at 10:20:17AM +0200, Kevin Wolf wrote:
> Am 01.05.2012 22:25, schrieb Anthony Liguori:
> > Thanks for sending this out Stefan.
> > 
> > On 05/01/2012 10:31 AM, Stefan Hajnoczi wrote:
> >> Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
> >> it from opening files that it should not have access to.  This improves
> >> security because it prevents the attacker from escaping the QEMU process if
> >> they manage to gain control.
> >>
> >> NFS has been a pain point for SELinux because it does not support labels (which
> >> I believe are stored in extended attributes).  In other words, it's not
> >> possible to use SELinux goodness on QEMU when image files are located on NFS.
> >> Today we have to allow QEMU access to any file on the NFS export rather than
> >> restricting specifically to the image files that the guest requires.
> >>
> >> File descriptor passing is a solution to this problem and might also come in
> >> handy elsewhere.  Libvirt or another external process chooses files which QEMU
> >> is allowed to access and provides just those file descriptors - QEMU cannot
> >> open the files itself.
> >>
> >> This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
> >> open an image file it sends a request over the given UNIX domain socket.  The
> >> response includes the file descriptor or an errno on failure.  Please see the
> >> patches for details on the protocol.
> >>
> >> The -open-hook-fd approach allows QEMU to support file descriptor passing
> >> without changing -drive.  It also supports snapshot_blkdev and other commands
> >> that re-open image files.
> >>
> >> Anthony Liguori<aliguori@us.ibm.com>  wrote most of these patches.  I added a
> >> demo -open-hook-fd server and added some small fixes.  Since Anthony is
> >> traveling right now I'm sending the RFC for discussion.
> > 
> > What I like about this approach is that it's useful outside the block layer and 
> > is conceptionally simple from a QEMU PoV.  We simply delegate open() to libvirt 
> > and let libvirt enforce whatever rules it wants.
> > 
> > This is not meant to be an alternative to blockdev, but even with blockdev, I 
> > think we still want to use a mechanism like this even with blockdev.
> 
> What does it provide on top?
> 
> This doesn't look like something that I'd like a lot. qemu should be
> able to continue to run no matter what the management tool does, whether
> it responds to RPCs properly or whether it has crashed. You need a
> really good use case for the RPC that cannot be covered otherwise in
> order to justify this.

Indeed, this solution breaks if you stop or restart libvirtd while
QEMU is running.  Restarting libvirt while QEMU is running is something
we must support, since installing RPM updates will restart libvirtd
and we cannot let guests die in this case.

I would much prefer to see us be able to pass FDs in directly alongside
the disk config as we do for netdev TAP/etc, and for QEMU / kernel to be
fixed so that you do not need to re-open FDs on the fly.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [RFC 2/5] block: add new command line parameter that and protocol description
  2012-05-01 15:31 ` [Qemu-devel] [RFC 2/5] block: add new command line parameter that and protocol description Stefan Hajnoczi
@ 2012-05-02  8:58   ` Daniel P. Berrange
  2012-05-02  9:03   ` Daniel P. Berrange
  1 sibling, 0 replies; 37+ messages in thread
From: Daniel P. Berrange @ 2012-05-02  8:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, libvir-list, Anthony Liguori, Corey Bryant, qemu-devel

On Tue, May 01, 2012 at 04:31:44PM +0100, Stefan Hajnoczi wrote:
> From: Anthony Liguori <aliguori@us.ibm.com>
> 
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  qemu-options.hx |   42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/qemu-options.hx b/qemu-options.hx
> index a169792..ccf4d1d 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2724,6 +2724,48 @@ DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log,
>      "-qtest-log LOG  specify tracing options\n",
>      QEMU_ARCH_ALL)
>  
> +DEF("open-hook-fd", HAS_ARG, QEMU_OPTION_open_hook_fd,
> +    "-open-hook-fd <fd>\n"
> +    "                delegate opens to external process using <fd>\n", QEMU_ARCH_ALL)
> +STEXI
> +@item -open-hook-fd @var{fd}
> +@findex -open-hook-fd
> +Delegates open()s to an external process using @var<fd> to communicate commands.
> +@var<fd> should be an open Unix Domain socket pipe that file descriptors can be
> +received from.  The protocol the socket uses is a simple request/response initiated
> +by the client.  All integers are in host byte order.  It is assumed that this protocol
> +is only ever used on the same physical machine.  It is currently defined as:
> +
> +u32 message_size
> +u32 command
> +u8  payload[message_size - 8]
> +
> +The contents of payload depend on command.  Currently the following commands are
> +defined:
> +
> +1. QEMU_OPEN (1)
> +
> +The full message will be:
> +
> +u32 message_size
> +u32 command = 1
> +u32 flags (O_ flags defined by libc)
> +u32 mode (mode_t flags as defined by libc)
> +u16 filename_len;
> +u8  filename[filename_len]
> +
> +The server will then respond with:
> +
> +u32 message_size
> +u32 command = 1
> +s32 result

If we're going for a binary protocol, then I'd like to see it defined
based on the XDR specification, so we can auto-generate our data
marshallers/demarshallers using existing tools / libraries and not
have to write something custom by hand. Your spec here is close enough
that it would not be significant work. The changes would be

 - Everything is always big-endian
 - Each field has 4-byte alignment
 - Strings would have a u32 length, and the payload padded with NUL
   to the 4 byte boundary

Daniel.
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-01 15:31 [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2012-05-01 20:25 ` [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Anthony Liguori
@ 2012-05-02  9:01 ` Daniel P. Berrange
  2012-05-04  3:28 ` [Qemu-devel] [libvirt] " Zhi Yong Wu
  7 siblings, 0 replies; 37+ messages in thread
From: Daniel P. Berrange @ 2012-05-02  9:01 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, libvir-list, Anthony Liguori, Corey Bryant, qemu-devel

On Tue, May 01, 2012 at 04:31:42PM +0100, Stefan Hajnoczi wrote:
> Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
> it from opening files that it should not have access to.  This improves
> security because it prevents the attacker from escaping the QEMU process if
> they manage to gain control.
> 
> NFS has been a pain point for SELinux because it does not support labels (which
> I believe are stored in extended attributes).  In other words, it's not
> possible to use SELinux goodness on QEMU when image files are located on NFS.
> Today we have to allow QEMU access to any file on the NFS export rather than
> restricting specifically to the image files that the guest requires.
> 
> File descriptor passing is a solution to this problem and might also come in
> handy elsewhere.  Libvirt or another external process chooses files which QEMU
> is allowed to access and provides just those file descriptors - QEMU cannot
> open the files itself.
> 
> This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
> open an image file it sends a request over the given UNIX domain socket.  The
> response includes the file descriptor or an errno on failure.  Please see the
> patches for details on the protocol.
> 
> The -open-hook-fd approach allows QEMU to support file descriptor passing
> without changing -drive.  It also supports snapshot_blkdev and other commands
> that re-open image files.

While it would work, I really am not a fan of this approach overall, since
I think it adds significant complexity & great potential for problems. In
particular the potential for deadlock between the mgmt app & QEMU (sure
we should design to avoid deadlocks, but mistakes happen). It is also not
resilient against the mgmt app going away (whether due to crash, or intentional
restart for RPM upgrades). I strongly prefer to see FDs being passed in at
time of device configuration/creation as with netdevs, and then fixing the
places which re-open files to not require that.


Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [RFC 2/5] block: add new command line parameter that and protocol description
  2012-05-01 15:31 ` [Qemu-devel] [RFC 2/5] block: add new command line parameter that and protocol description Stefan Hajnoczi
  2012-05-02  8:58   ` Daniel P. Berrange
@ 2012-05-02  9:03   ` Daniel P. Berrange
  1 sibling, 0 replies; 37+ messages in thread
From: Daniel P. Berrange @ 2012-05-02  9:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, libvir-list, Anthony Liguori, Corey Bryant, qemu-devel

On Tue, May 01, 2012 at 04:31:44PM +0100, Stefan Hajnoczi wrote:
> From: Anthony Liguori <aliguori@us.ibm.com>
> 
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  qemu-options.hx |   42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/qemu-options.hx b/qemu-options.hx
> index a169792..ccf4d1d 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2724,6 +2724,48 @@ DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log,
>      "-qtest-log LOG  specify tracing options\n",
>      QEMU_ARCH_ALL)
>  
> +DEF("open-hook-fd", HAS_ARG, QEMU_OPTION_open_hook_fd,
> +    "-open-hook-fd <fd>\n"
> +    "                delegate opens to external process using <fd>\n", QEMU_ARCH_ALL)
> +STEXI
> +@item -open-hook-fd @var{fd}
> +@findex -open-hook-fd
> +Delegates open()s to an external process using @var<fd> to communicate commands.
> +@var<fd> should be an open Unix Domain socket pipe that file descriptors can be
> +received from.  The protocol the socket uses is a simple request/response initiated
> +by the client.  All integers are in host byte order.  It is assumed that this protocol
> +is only ever used on the same physical machine.  It is currently defined as:
> +
> +u32 message_size
> +u32 command
> +u8  payload[message_size - 8]
> +
> +The contents of payload depend on command.  Currently the following commands are
> +defined:
> +
> +1. QEMU_OPEN (1)
> +
> +The full message will be:
> +
> +u32 message_size
> +u32 command = 1
> +u32 flags (O_ flags defined by libc)
> +u32 mode (mode_t flags as defined by libc)
> +u16 filename_len;
> +u8  filename[filename_len]

I think this should include the ID of the backend of the drive wanting
this file opened, so you can reliably match up to the -drive that mgmt
configured at startup. ie not have to search through every single
device in the guest & all their backing files to find if there is a
match.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-02  8:27     ` Stefan Hajnoczi
@ 2012-05-02  9:38       ` Kevin Wolf
  0 siblings, 0 replies; 37+ messages in thread
From: Kevin Wolf @ 2012-05-02  9:38 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: libvir-list, Anthony Liguori, Corey Bryant, Stefan Hajnoczi, qemu-devel

Am 02.05.2012 10:27, schrieb Stefan Hajnoczi:
> On Wed, May 2, 2012 at 9:20 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 01.05.2012 22:25, schrieb Anthony Liguori:
>>> Thanks for sending this out Stefan.
>>>
>>> On 05/01/2012 10:31 AM, Stefan Hajnoczi wrote:
>>>> Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
>>>> it from opening files that it should not have access to.  This improves
>>>> security because it prevents the attacker from escaping the QEMU process if
>>>> they manage to gain control.
>>>>
>>>> NFS has been a pain point for SELinux because it does not support labels (which
>>>> I believe are stored in extended attributes).  In other words, it's not
>>>> possible to use SELinux goodness on QEMU when image files are located on NFS.
>>>> Today we have to allow QEMU access to any file on the NFS export rather than
>>>> restricting specifically to the image files that the guest requires.
>>>>
>>>> File descriptor passing is a solution to this problem and might also come in
>>>> handy elsewhere.  Libvirt or another external process chooses files which QEMU
>>>> is allowed to access and provides just those file descriptors - QEMU cannot
>>>> open the files itself.
>>>>
>>>> This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
>>>> open an image file it sends a request over the given UNIX domain socket.  The
>>>> response includes the file descriptor or an errno on failure.  Please see the
>>>> patches for details on the protocol.
>>>>
>>>> The -open-hook-fd approach allows QEMU to support file descriptor passing
>>>> without changing -drive.  It also supports snapshot_blkdev and other commands
>>>> that re-open image files.
>>>>
>>>> Anthony Liguori<aliguori@us.ibm.com>  wrote most of these patches.  I added a
>>>> demo -open-hook-fd server and added some small fixes.  Since Anthony is
>>>> traveling right now I'm sending the RFC for discussion.
>>>
>>> What I like about this approach is that it's useful outside the block layer and
>>> is conceptionally simple from a QEMU PoV.  We simply delegate open() to libvirt
>>> and let libvirt enforce whatever rules it wants.
>>>
>>> This is not meant to be an alternative to blockdev, but even with blockdev, I
>>> think we still want to use a mechanism like this even with blockdev.
>>
>> What does it provide on top?
> 
> It solves the problem of snapshot_blkdev and other operations that
> re-open files.  Using -blockdev and hotplug for image files as file
> descriptors only solves the static configuration problem, not the
> runtime problem we get with snapshot_blkdev.  That's why this approach
> is more powerful than -blockdev fd=N.

The important thing that blockdev-add must define is a QAPI type that
describes a block device. This is the same type that can be used for all
other operations as well.

In fact, live snapshots don't even require this. You just pass fd:42 (or
actually whatever the -blockdev syntax for it will be) as the new image
and use 'existing' mode. You don't even need blockdev-add for this, you
can do it with what we have today. (Warning, hack: You can even use
"live snapshots" with existing files to describe the full backing file
structure today and use only FDs. The only new thing you need there is
the fd protocol. But... don't use this.)

Kevin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-02  8:53     ` [Qemu-devel] [libvirt] " Daniel P. Berrange
@ 2012-05-02  9:45       ` Kevin Wolf
  2012-05-02  9:56         ` Daniel P. Berrange
  2012-05-03 19:19         ` Anthony Liguori
  0 siblings, 2 replies; 37+ messages in thread
From: Kevin Wolf @ 2012-05-02  9:45 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: libvir-list, Anthony Liguori, Stefan Hajnoczi, qemu-devel

Am 02.05.2012 10:53, schrieb Daniel P. Berrange:
> On Wed, May 02, 2012 at 10:20:17AM +0200, Kevin Wolf wrote:
>> Am 01.05.2012 22:25, schrieb Anthony Liguori:
>>> Thanks for sending this out Stefan.
>>>
>>> On 05/01/2012 10:31 AM, Stefan Hajnoczi wrote:
>>>> Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
>>>> it from opening files that it should not have access to.  This improves
>>>> security because it prevents the attacker from escaping the QEMU process if
>>>> they manage to gain control.
>>>>
>>>> NFS has been a pain point for SELinux because it does not support labels (which
>>>> I believe are stored in extended attributes).  In other words, it's not
>>>> possible to use SELinux goodness on QEMU when image files are located on NFS.
>>>> Today we have to allow QEMU access to any file on the NFS export rather than
>>>> restricting specifically to the image files that the guest requires.
>>>>
>>>> File descriptor passing is a solution to this problem and might also come in
>>>> handy elsewhere.  Libvirt or another external process chooses files which QEMU
>>>> is allowed to access and provides just those file descriptors - QEMU cannot
>>>> open the files itself.
>>>>
>>>> This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
>>>> open an image file it sends a request over the given UNIX domain socket.  The
>>>> response includes the file descriptor or an errno on failure.  Please see the
>>>> patches for details on the protocol.
>>>>
>>>> The -open-hook-fd approach allows QEMU to support file descriptor passing
>>>> without changing -drive.  It also supports snapshot_blkdev and other commands
>>>> that re-open image files.
>>>>
>>>> Anthony Liguori<aliguori@us.ibm.com>  wrote most of these patches.  I added a
>>>> demo -open-hook-fd server and added some small fixes.  Since Anthony is
>>>> traveling right now I'm sending the RFC for discussion.
>>>
>>> What I like about this approach is that it's useful outside the block layer and 
>>> is conceptionally simple from a QEMU PoV.  We simply delegate open() to libvirt 
>>> and let libvirt enforce whatever rules it wants.
>>>
>>> This is not meant to be an alternative to blockdev, but even with blockdev, I 
>>> think we still want to use a mechanism like this even with blockdev.
>>
>> What does it provide on top?
>>
>> This doesn't look like something that I'd like a lot. qemu should be
>> able to continue to run no matter what the management tool does, whether
>> it responds to RPCs properly or whether it has crashed. You need a
>> really good use case for the RPC that cannot be covered otherwise in
>> order to justify this.
> 
> Indeed, this solution breaks if you stop or restart libvirtd while
> QEMU is running.  Restarting libvirt while QEMU is running is something
> we must support, since installing RPM updates will restart libvirtd
> and we cannot let guests die in this case.
> 
> I would much prefer to see us be able to pass FDs in directly alongside
> the disk config as we do for netdev TAP/etc, and for QEMU / kernel to be
> fixed so that you do not need to re-open FDs on the fly.

I agree, and this is what -blockdev would give us.

Part of why I don't like the RFC (apart from RPCing the management tool
being just wrong) is that once again it's trying to take shortcuts and
only provide a hack for the urgent need instead of doing it properly and
implementing -blockdev. I suspect that if we take something half-baked
like this, we will keep being unhappy with the situation in the block
layer, but it won't hurt enough any more to actually spend effort on it,
so that we'll go another five years with it.

Kevin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-02  9:45       ` Kevin Wolf
@ 2012-05-02  9:56         ` Daniel P. Berrange
  2012-05-02 19:25           ` Paolo Bonzini
  2012-05-03 19:19         ` Anthony Liguori
  1 sibling, 1 reply; 37+ messages in thread
From: Daniel P. Berrange @ 2012-05-02  9:56 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: libvir-list, Anthony Liguori, Stefan Hajnoczi, qemu-devel

On Wed, May 02, 2012 at 11:45:26AM +0200, Kevin Wolf wrote:
> Am 02.05.2012 10:53, schrieb Daniel P. Berrange:
> > On Wed, May 02, 2012 at 10:20:17AM +0200, Kevin Wolf wrote:
> >> Am 01.05.2012 22:25, schrieb Anthony Liguori:
> >>> Thanks for sending this out Stefan.
> >>>
> >>> On 05/01/2012 10:31 AM, Stefan Hajnoczi wrote:
> >>>> Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
> >>>> it from opening files that it should not have access to.  This improves
> >>>> security because it prevents the attacker from escaping the QEMU process if
> >>>> they manage to gain control.
> >>>>
> >>>> NFS has been a pain point for SELinux because it does not support labels (which
> >>>> I believe are stored in extended attributes).  In other words, it's not
> >>>> possible to use SELinux goodness on QEMU when image files are located on NFS.
> >>>> Today we have to allow QEMU access to any file on the NFS export rather than
> >>>> restricting specifically to the image files that the guest requires.
> >>>>
> >>>> File descriptor passing is a solution to this problem and might also come in
> >>>> handy elsewhere.  Libvirt or another external process chooses files which QEMU
> >>>> is allowed to access and provides just those file descriptors - QEMU cannot
> >>>> open the files itself.
> >>>>
> >>>> This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
> >>>> open an image file it sends a request over the given UNIX domain socket.  The
> >>>> response includes the file descriptor or an errno on failure.  Please see the
> >>>> patches for details on the protocol.
> >>>>
> >>>> The -open-hook-fd approach allows QEMU to support file descriptor passing
> >>>> without changing -drive.  It also supports snapshot_blkdev and other commands
> >>>> that re-open image files.
> >>>>
> >>>> Anthony Liguori<aliguori@us.ibm.com>  wrote most of these patches.  I added a
> >>>> demo -open-hook-fd server and added some small fixes.  Since Anthony is
> >>>> traveling right now I'm sending the RFC for discussion.
> >>>
> >>> What I like about this approach is that it's useful outside the block layer and 
> >>> is conceptionally simple from a QEMU PoV.  We simply delegate open() to libvirt 
> >>> and let libvirt enforce whatever rules it wants.
> >>>
> >>> This is not meant to be an alternative to blockdev, but even with blockdev, I 
> >>> think we still want to use a mechanism like this even with blockdev.
> >>
> >> What does it provide on top?
> >>
> >> This doesn't look like something that I'd like a lot. qemu should be
> >> able to continue to run no matter what the management tool does, whether
> >> it responds to RPCs properly or whether it has crashed. You need a
> >> really good use case for the RPC that cannot be covered otherwise in
> >> order to justify this.
> > 
> > Indeed, this solution breaks if you stop or restart libvirtd while
> > QEMU is running.  Restarting libvirt while QEMU is running is something
> > we must support, since installing RPM updates will restart libvirtd
> > and we cannot let guests die in this case.
> > 
> > I would much prefer to see us be able to pass FDs in directly alongside
> > the disk config as we do for netdev TAP/etc, and for QEMU / kernel to be
> > fixed so that you do not need to re-open FDs on the fly.
> 
> I agree, and this is what -blockdev would give us.
> 
> Part of why I don't like the RFC (apart from RPCing the management tool
> being just wrong) is that once again it's trying to take shortcuts and
> only provide a hack for the urgent need instead of doing it properly and
> implementing -blockdev. I suspect that if we take something half-baked
> like this, we will keep being unhappy with the situation in the block
> layer, but it won't hurt enough any more to actually spend effort on it,
> so that we'll go another five years with it.

I tend to agree - we have been talking about -blockdev for faar to long
without (AFAICT) making any real progress towards getting it done. I'd
love to see someone bite the bullet & have a go at implementing it


Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-01 20:56   ` [Qemu-devel] [libvirt] " Eric Blake
  2012-05-01 21:52     ` Anthony Liguori
@ 2012-05-02 16:40     ` Paolo Bonzini
  1 sibling, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2012-05-02 16:40 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, libvir-list, Anthony Liguori, Stefan Hajnoczi, qemu-devel

Il 01/05/2012 22:56, Eric Blake ha scritto:
> What sort
> of timing restrictions are there?  For example, the proposed
> 'drive-reopen' command (probably now delegated to qemu 1.2) would mean
> that qemu would be calling back into libvirt in order to do the reopen.
>  If libvirt takes its time in passing back an open fd, is it going to
> starve qemu from answering unrelated monitor commands in the meantime?
> I definitely want to make sure we avoid deadlock where libvirt is
> waiting on a monitor command, but the monitor command is waiting on
> libvirt to pass an fd.

FWIW I'm going to kill drive-reopen in favor of something like
block-job-complete that will not require reopening (it will require
opening the backing files though, and that can also take time).

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-02  9:56         ` Daniel P. Berrange
@ 2012-05-02 19:25           ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2012-05-02 19:25 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kevin Wolf, libvir-list, Anthony Liguori, Stefan Hajnoczi, qemu-devel

Il 02/05/2012 11:56, Daniel P. Berrange ha scritto:
> I tend to agree - we have been talking about -blockdev for faar to long
> without (AFAICT) making any real progress towards getting it done. I'd
> love to see someone bite the bullet & have a go at implementing it

Having a spec would help somewhat...

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-02  9:45       ` Kevin Wolf
  2012-05-02  9:56         ` Daniel P. Berrange
@ 2012-05-03 19:19         ` Anthony Liguori
  1 sibling, 0 replies; 37+ messages in thread
From: Anthony Liguori @ 2012-05-03 19:19 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: libvir-list, Stefan Hajnoczi, qemu-devel

On 05/02/2012 04:45 AM, Kevin Wolf wrote:
> Am 02.05.2012 10:53, schrieb Daniel P. Berrange:
>> I would much prefer to see us be able to pass FDs in directly alongside
>> the disk config as we do for netdev TAP/etc, and for QEMU / kernel to be
>> fixed so that you do not need to re-open FDs on the fly.
>
> I agree, and this is what -blockdev would give us.
>
> Part of why I don't like the RFC (apart from RPCing the management tool
> being just wrong) is that once again it's trying to take shortcuts and
> only provide a hack for the urgent need instead of doing it properly and
> implementing -blockdev.

The proper way to address this problem is *not* -blockdev.  -blockdev is another 
short cut.

The proper way to solve this problem is to add extended attribute to SELinux. 
Another proper solution is for libvirt to launch guests with different UIDs and 
use DAC to prevent guests from opening files.

> I suspect that if we take something half-baked
> like this, we will keep being unhappy with the situation in the block
> layer, but it won't hurt enough any more to actually spend effort on it,
> so that we'll go another five years with it.

Wanting to refactor the block layer is great.  I am fully in support of it.  But 
holding practical features hostage is not reasonable.

There is nothing intrinsically cleaner about using -blockdev fd=X verses using 
an RPC like this.  -blockdev has a lot of nice characteristics but solving this 
problem is not one of them.

Regards,

Anthony Liguori

> Kevin
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-01 15:31 [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2012-05-02  9:01 ` [Qemu-devel] " Daniel P. Berrange
@ 2012-05-04  3:28 ` Zhi Yong Wu
  2012-05-17 13:42   ` Stefan Hajnoczi
  7 siblings, 1 reply; 37+ messages in thread
From: Zhi Yong Wu @ 2012-05-04  3:28 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, libvir-list, Anthony Liguori, qemu-devel

On Tue, May 1, 2012 at 11:31 PM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
> it from opening files that it should not have access to.  This improves
> security because it prevents the attacker from escaping the QEMU process if
> they manage to gain control.
>
> NFS has been a pain point for SELinux because it does not support labels (which
> I believe are stored in extended attributes).  In other words, it's not
> possible to use SELinux goodness on QEMU when image files are located on NFS.
> Today we have to allow QEMU access to any file on the NFS export rather than
> restricting specifically to the image files that the guest requires.
>
> File descriptor passing is a solution to this problem and might also come in
> handy elsewhere.  Libvirt or another external process chooses files which QEMU
> is allowed to access and provides just those file descriptors - QEMU cannot
> open the files itself.
>
> This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
> open an image file it sends a request over the given UNIX domain socket.  The
> response includes the file descriptor or an errno on failure.  Please see the
> patches for details on the protocol.
>
> The -open-hook-fd approach allows QEMU to support file descriptor passing
> without changing -drive.  It also supports snapshot_blkdev and other commands
By the way, How will it support them?
> that re-open image files.
>
> Anthony Liguori <aliguori@us.ibm.com> wrote most of these patches.  I added a
> demo -open-hook-fd server and added some small fixes.  Since Anthony is
> traveling right now I'm sending the RFC for discussion.
>
> Anthony Liguori (3):
>  block: add open() wrapper that can be hooked by libvirt
>  block: add new command line parameter that and protocol description
>  block: plumb up open-hook-fd option
>
> Stefan Hajnoczi (2):
>  osdep: add qemu_recvmsg() wrapper
>  Example -open-hook-fd server
>
>  block.c           |  107 ++++++++++++++++++++++++++++++++++++++
>  block.h           |    2 +
>  block/raw-posix.c |   18 +++----
>  block/raw-win32.c |    2 +-
>  block/vdi.c       |    2 +-
>  block/vmdk.c      |    6 +--
>  block/vpc.c       |    2 +-
>  block/vvfat.c     |    4 +-
>  block_int.h       |   12 +++++
>  osdep.c           |   46 +++++++++++++++++
>  qemu-common.h     |    2 +
>  qemu-options.hx   |   42 +++++++++++++++
>  test-fd-passing.c |  147 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  vl.c              |    3 ++
>  14 files changed, 378 insertions(+), 17 deletions(-)
>  create mode 100644 test-fd-passing.c
>
> --
> 1.7.10
>
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-01 22:15       ` [Qemu-devel] [libvirt] " Eric Blake
  2012-05-01 22:21         ` Anthony Liguori
@ 2012-05-07 16:10         ` Corey Bryant
  1 sibling, 0 replies; 37+ messages in thread
From: Corey Bryant @ 2012-05-07 16:10 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, libvir-list,
	qemu-devel, Anthony Liguori



On 05/01/2012 06:15 PM, Eric Blake wrote:
> On 05/01/2012 03:53 PM, Anthony Liguori wrote:
>
>>> I think (correct me if I'm wrong) libvirt should be aware of any file
>>> that qemu
>>> asks it to open. So from a security point of view, libvirt can prevent
>>> opening a
>>> file if it isn't affiliated with the guest.
>>
>> Right, libvirt can maintain a whitelist of files QEMU is allowed to open
>> (which is already has because it needs to label these files).
>
> Indeed.
>
>>   The only
>> complexity is that it's not a straight strcmp().  The path needs to be
>> (carefully) broken into components with '.' and '..' handled
>> appropriately.  But this shouldn't be that difficult to do.
>
> Libvirt would probably canonicalize path names, both when sticking them
> in the whitelist, and in validating the requests from qemu - agreed that
> it's not difficult.
>
> More importantly, libvirt needs to start tracking the backing chain of
> any qcow2 or qed file as part of the domain XML; and operations like
> 'block-stream' would update not only the chain, but also the whitelist.
>   In the drive-reopen case, this means that libvirt would have to be
> careful when to change labeling - provide access to the new files before
> drive-reopen, then revoke access to files after drive-reopen completes.
>   In other words, having the -open-hook-fd client pass a command to
> libvirt at the time it is closing an fd would help libvirt know when
> qemu has quit using a file, which might make it easier to revoke SELinux
> labels at that time.
>

If we were to go with this approach, I think the following updates would 
be required for libvirt.  Could you let me know if I'm missing anything?

libvirt tasks:
- Introduce a data structure to store file whitelist per guest
- Add -open-hook-fd option to QEMU command line and pass Unix
   domain socket fd to QEMU
- Create open() handler that handles requests from QEMU to open
   files and passes back fd
- Potentially also handle close requests from QEMU?  Would allow
   libvirt to update XML and whitelist (as well as SELinux labels).
- Canonicalize path names when putting them in whitelist and
   when validating requests from QEMU
- XML updates to track backing chain of qcow2 and qed files
- Update whitelist and XML chain when QEMU monitor commands are
   used to open new files: block-stream, drive-reopen, drive_add,
   savevm, snapshot_blkdev, change

Updates would also be required for SELinux and AppArmor policy to allow 
libvirt open of NFS files, and allow QEMU read/write (no open allowed) 
of NFS Files.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-04  3:28 ` [Qemu-devel] [libvirt] " Zhi Yong Wu
@ 2012-05-17 13:42   ` Stefan Hajnoczi
  2012-05-17 13:57     ` Zhi Yong Wu
                       ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2012-05-17 13:42 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: Kevin Wolf, libvir-list, Anthony Liguori, qemu-devel

On Fri, May 04, 2012 at 11:28:47AM +0800, Zhi Yong Wu wrote:
> On Tue, May 1, 2012 at 11:31 PM, Stefan Hajnoczi
> <stefanha@linux.vnet.ibm.com> wrote:
> > Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
> > it from opening files that it should not have access to.  This improves
> > security because it prevents the attacker from escaping the QEMU process if
> > they manage to gain control.
> >
> > NFS has been a pain point for SELinux because it does not support labels (which
> > I believe are stored in extended attributes).  In other words, it's not
> > possible to use SELinux goodness on QEMU when image files are located on NFS.
> > Today we have to allow QEMU access to any file on the NFS export rather than
> > restricting specifically to the image files that the guest requires.
> >
> > File descriptor passing is a solution to this problem and might also come in
> > handy elsewhere.  Libvirt or another external process chooses files which QEMU
> > is allowed to access and provides just those file descriptors - QEMU cannot
> > open the files itself.
> >
> > This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
> > open an image file it sends a request over the given UNIX domain socket.  The
> > response includes the file descriptor or an errno on failure.  Please see the
> > patches for details on the protocol.
> >
> > The -open-hook-fd approach allows QEMU to support file descriptor passing
> > without changing -drive.  It also supports snapshot_blkdev and other commands
> By the way, How will it support them?

The problem with snapshot_blkdev is that closing a file and opening a
new file cannot be done by the QEMU process when an SELinux policy is in
place to prevent opening files.

The -open-hook-fd approach works even when the QEMU process is not
allowed to open files since file descriptor passing over a UNIX domain
socket is used to open files on behalf of QEMU.

Stefan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-17 13:42   ` Stefan Hajnoczi
@ 2012-05-17 13:57     ` Zhi Yong Wu
  2012-05-17 14:02     ` Zhi Yong Wu
  2012-05-17 14:14     ` Eric Blake
  2 siblings, 0 replies; 37+ messages in thread
From: Zhi Yong Wu @ 2012-05-17 13:57 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, libvir-list, Anthony Liguori, qemu-devel

On Thu, May 17, 2012 at 9:42 PM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> On Fri, May 04, 2012 at 11:28:47AM +0800, Zhi Yong Wu wrote:
>> On Tue, May 1, 2012 at 11:31 PM, Stefan Hajnoczi
>> <stefanha@linux.vnet.ibm.com> wrote:
>> > Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
>> > it from opening files that it should not have access to.  This improves
>> > security because it prevents the attacker from escaping the QEMU process if
>> > they manage to gain control.
>> >
>> > NFS has been a pain point for SELinux because it does not support labels (which
>> > I believe are stored in extended attributes).  In other words, it's not
>> > possible to use SELinux goodness on QEMU when image files are located on NFS.
>> > Today we have to allow QEMU access to any file on the NFS export rather than
>> > restricting specifically to the image files that the guest requires.
>> >
>> > File descriptor passing is a solution to this problem and might also come in
>> > handy elsewhere.  Libvirt or another external process chooses files which QEMU
>> > is allowed to access and provides just those file descriptors - QEMU cannot
>> > open the files itself.
>> >
>> > This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
>> > open an image file it sends a request over the given UNIX domain socket.  The
>> > response includes the file descriptor or an errno on failure.  Please see the
>> > patches for details on the protocol.
>> >
>> > The -open-hook-fd approach allows QEMU to support file descriptor passing
>> > without changing -drive.  It also supports snapshot_blkdev and other commands
>> By the way, How will it support them?
>
> The problem with snapshot_blkdev is that closing a file and opening a
> new file cannot be done by the QEMU process when an SELinux policy is in
> place to prevent opening files.
>
> The -open-hook-fd approach works even when the QEMU process is not
> allowed to open files since file descriptor passing over a UNIX domain
> socket is used to open files on behalf of QEMU.
Do you mean that libvirt will provide QEMU with one service? When QEMU
need open or close one new file, it can send one request to libvirt?
>
> Stefan
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-17 13:42   ` Stefan Hajnoczi
  2012-05-17 13:57     ` Zhi Yong Wu
@ 2012-05-17 14:02     ` Zhi Yong Wu
  2012-05-18 10:38       ` Stefan Hajnoczi
  2012-05-17 14:14     ` Eric Blake
  2 siblings, 1 reply; 37+ messages in thread
From: Zhi Yong Wu @ 2012-05-17 14:02 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, libvir-list, Anthony Liguori, qemu-devel

On Thu, May 17, 2012 at 9:42 PM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> On Fri, May 04, 2012 at 11:28:47AM +0800, Zhi Yong Wu wrote:
>> On Tue, May 1, 2012 at 11:31 PM, Stefan Hajnoczi
>> <stefanha@linux.vnet.ibm.com> wrote:
>> > Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
>> > it from opening files that it should not have access to.  This improves
>> > security because it prevents the attacker from escaping the QEMU process if
>> > they manage to gain control.
>> >
>> > NFS has been a pain point for SELinux because it does not support labels (which
>> > I believe are stored in extended attributes).  In other words, it's not
>> > possible to use SELinux goodness on QEMU when image files are located on NFS.
>> > Today we have to allow QEMU access to any file on the NFS export rather than
>> > restricting specifically to the image files that the guest requires.
>> >
>> > File descriptor passing is a solution to this problem and might also come in
>> > handy elsewhere.  Libvirt or another external process chooses files which QEMU
>> > is allowed to access and provides just those file descriptors - QEMU cannot
>> > open the files itself.
>> >
>> > This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
>> > open an image file it sends a request over the given UNIX domain socket.  The
>> > response includes the file descriptor or an errno on failure.  Please see the
>> > patches for details on the protocol.
>> >
>> > The -open-hook-fd approach allows QEMU to support file descriptor passing
>> > without changing -drive.  It also supports snapshot_blkdev and other commands
>> By the way, How will it support them?
>
> The problem with snapshot_blkdev is that closing a file and opening a
> new file cannot be done by the QEMU process when an SELinux policy is in
> place to prevent opening files.
>
> The -open-hook-fd approach works even when the QEMU process is not
> allowed to open files since file descriptor passing over a UNIX domain
> socket is used to open files on behalf of QEMU.
I thought that the patchset can only let QEMU passively get passed fd
parameter from upper application.
>
> Stefan
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-17 13:42   ` Stefan Hajnoczi
  2012-05-17 13:57     ` Zhi Yong Wu
  2012-05-17 14:02     ` Zhi Yong Wu
@ 2012-05-17 14:14     ` Eric Blake
  2012-05-18 10:38       ` Stefan Hajnoczi
  2012-07-09 20:00       ` Anthony Liguori
  2 siblings, 2 replies; 37+ messages in thread
From: Eric Blake @ 2012-05-17 14:14 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Zhi Yong Wu, Kevin Wolf, Anthony Liguori, qemu-devel, libvir-list

[-- Attachment #1: Type: text/plain, Size: 2266 bytes --]

On 05/17/2012 07:42 AM, Stefan Hajnoczi wrote:

>>>
>>> The -open-hook-fd approach allows QEMU to support file descriptor passing
>>> without changing -drive.  It also supports snapshot_blkdev and other commands
>> By the way, How will it support them?
> 
> The problem with snapshot_blkdev is that closing a file and opening a
> new file cannot be done by the QEMU process when an SELinux policy is in
> place to prevent opening files.

snapshot_blkdev can take an fd:name instead of a /path/to/file for the
file to open, in which case libvirt can pass in the named fd _prior_ to
the snapshot_blkdev using the 'getfd' monitor command.

> 
> The -open-hook-fd approach works even when the QEMU process is not
> allowed to open files since file descriptor passing over a UNIX domain
> socket is used to open files on behalf of QEMU.

The -open-hook-fd approach would indeed allow snapshot_blokdev to ask
for the fd after the fact, but it's much more painful.  Consider a case
with a two-disk snapshot:

with the fd:name approach, the sequence is:

libvirt calls getfd:name1 over normal monitor
qemu responds
libvirt calls getfd:name2 over normal monitor
qemu responds
libvirt calls transaction around blockdev-snapshot-sync over normal
monitor, using fd:name1 and fd:name2
qemu responds

but with -open-hook-fd, the approach would be:

libvirt calls transaction
qemu calls open(file1) over hook
libvirt responds
qemu calls open(file2) over hook
libvirt responds
qemu responds to the original transaction

The 'transaction' operation is thus blocked by the time it takes to do
two intermediate opens over a second channel, which kind of defeats the
purpose of making the transaction take effect with minimal guest
downtime.  And libvirt code becomes a lot trickier to deal with the fact
that two channels are in use, and that the channel that issued the
'transaction' command must block while the other channel for handling
hooks must be responsive.

I'm really disliking the hook-fd approach, when a better solution is to
make use of 'getfd' in advance of any operation that will need to open
new fds.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-17 14:14     ` Eric Blake
@ 2012-05-18 10:38       ` Stefan Hajnoczi
  2012-07-09 20:00       ` Anthony Liguori
  1 sibling, 0 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2012-05-18 10:38 UTC (permalink / raw)
  To: Eric Blake
  Cc: Zhi Yong Wu, Kevin Wolf, Anthony Liguori, qemu-devel, libvir-list

On Thu, May 17, 2012 at 08:14:15AM -0600, Eric Blake wrote:
> On 05/17/2012 07:42 AM, Stefan Hajnoczi wrote:
> 
> >>>
> >>> The -open-hook-fd approach allows QEMU to support file descriptor passing
> >>> without changing -drive.  It also supports snapshot_blkdev and other commands
> >> By the way, How will it support them?
> > 
> > The problem with snapshot_blkdev is that closing a file and opening a
> > new file cannot be done by the QEMU process when an SELinux policy is in
> > place to prevent opening files.
> 
> snapshot_blkdev can take an fd:name instead of a /path/to/file for the
> file to open, in which case libvirt can pass in the named fd _prior_ to
> the snapshot_blkdev using the 'getfd' monitor command.
> 
> > 
> > The -open-hook-fd approach works even when the QEMU process is not
> > allowed to open files since file descriptor passing over a UNIX domain
> > socket is used to open files on behalf of QEMU.
> 
> The -open-hook-fd approach would indeed allow snapshot_blokdev to ask
> for the fd after the fact, but it's much more painful.  Consider a case
> with a two-disk snapshot:
> 
> with the fd:name approach, the sequence is:
> 
> libvirt calls getfd:name1 over normal monitor
> qemu responds
> libvirt calls getfd:name2 over normal monitor
> qemu responds
> libvirt calls transaction around blockdev-snapshot-sync over normal
> monitor, using fd:name1 and fd:name2
> qemu responds
> 
> but with -open-hook-fd, the approach would be:
> 
> libvirt calls transaction
> qemu calls open(file1) over hook
> libvirt responds
> qemu calls open(file2) over hook
> libvirt responds
> qemu responds to the original transaction
> 
> The 'transaction' operation is thus blocked by the time it takes to do
> two intermediate opens over a second channel, which kind of defeats the
> purpose of making the transaction take effect with minimal guest
> downtime.  And libvirt code becomes a lot trickier to deal with the fact
> that two channels are in use, and that the channel that issued the
> 'transaction' command must block while the other channel for handling
> hooks must be responsive.
> 
> I'm really disliking the hook-fd approach, when a better solution is to
> make use of 'getfd' in advance of any operation that will need to open
> new fds.

This is a good technical argument for using getfd.  I agree with you.

Stefan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-17 14:02     ` Zhi Yong Wu
@ 2012-05-18 10:38       ` Stefan Hajnoczi
  0 siblings, 0 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2012-05-18 10:38 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: Kevin Wolf, libvir-list, Anthony Liguori, qemu-devel

On Thu, May 17, 2012 at 10:02:01PM +0800, Zhi Yong Wu wrote:
> On Thu, May 17, 2012 at 9:42 PM, Stefan Hajnoczi
> <stefanha@linux.vnet.ibm.com> wrote:
> > On Fri, May 04, 2012 at 11:28:47AM +0800, Zhi Yong Wu wrote:
> >> On Tue, May 1, 2012 at 11:31 PM, Stefan Hajnoczi
> >> <stefanha@linux.vnet.ibm.com> wrote:
> >> > Libvirt can take advantage of SELinux to restrict the QEMU process and prevent
> >> > it from opening files that it should not have access to.  This improves
> >> > security because it prevents the attacker from escaping the QEMU process if
> >> > they manage to gain control.
> >> >
> >> > NFS has been a pain point for SELinux because it does not support labels (which
> >> > I believe are stored in extended attributes).  In other words, it's not
> >> > possible to use SELinux goodness on QEMU when image files are located on NFS.
> >> > Today we have to allow QEMU access to any file on the NFS export rather than
> >> > restricting specifically to the image files that the guest requires.
> >> >
> >> > File descriptor passing is a solution to this problem and might also come in
> >> > handy elsewhere.  Libvirt or another external process chooses files which QEMU
> >> > is allowed to access and provides just those file descriptors - QEMU cannot
> >> > open the files itself.
> >> >
> >> > This series adds the -open-hook-fd command-line option.  Whenever QEMU needs to
> >> > open an image file it sends a request over the given UNIX domain socket.  The
> >> > response includes the file descriptor or an errno on failure.  Please see the
> >> > patches for details on the protocol.
> >> >
> >> > The -open-hook-fd approach allows QEMU to support file descriptor passing
> >> > without changing -drive.  It also supports snapshot_blkdev and other commands
> >> By the way, How will it support them?
> >
> > The problem with snapshot_blkdev is that closing a file and opening a
> > new file cannot be done by the QEMU process when an SELinux policy is in
> > place to prevent opening files.
> >
> > The -open-hook-fd approach works even when the QEMU process is not
> > allowed to open files since file descriptor passing over a UNIX domain
> > socket is used to open files on behalf of QEMU.
> I thought that the patchset can only let QEMU passively get passed fd
> parameter from upper application.

No.  What this patch series does is make QEMU request file descriptors
from an external process (e.g. libvirt) each time it wants to open an
image file.

Stefan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-05-17 14:14     ` Eric Blake
  2012-05-18 10:38       ` Stefan Hajnoczi
@ 2012-07-09 20:00       ` Anthony Liguori
  2012-07-09 20:29         ` Eric Blake
  1 sibling, 1 reply; 37+ messages in thread
From: Anthony Liguori @ 2012-07-09 20:00 UTC (permalink / raw)
  To: Eric Blake
  Cc: Zhi Yong Wu, Kevin Wolf, Stefan Hajnoczi, libvir-list, qemu-devel

On 05/17/2012 09:14 AM, Eric Blake wrote:
> On 05/17/2012 07:42 AM, Stefan Hajnoczi wrote:
>
>>>>
>>>> The -open-hook-fd approach allows QEMU to support file descriptor passing
>>>> without changing -drive.  It also supports snapshot_blkdev and other commands
>>> By the way, How will it support them?
>>
>> The problem with snapshot_blkdev is that closing a file and opening a
>> new file cannot be done by the QEMU process when an SELinux policy is in
>> place to prevent opening files.
>
> snapshot_blkdev can take an fd:name instead of a /path/to/file for the
> file to open, in which case libvirt can pass in the named fd _prior_ to
> the snapshot_blkdev using the 'getfd' monitor command.
>
>>
>> The -open-hook-fd approach works even when the QEMU process is not
>> allowed to open files since file descriptor passing over a UNIX domain
>> socket is used to open files on behalf of QEMU.
>
> The -open-hook-fd approach would indeed allow snapshot_blokdev to ask
> for the fd after the fact, but it's much more painful.  Consider a case
> with a two-disk snapshot:
>
> with the fd:name approach, the sequence is:
>
> libvirt calls getfd:name1 over normal monitor
> qemu responds
> libvirt calls getfd:name2 over normal monitor
> qemu responds
> libvirt calls transaction around blockdev-snapshot-sync over normal
> monitor, using fd:name1 and fd:name2
> qemu responds
>
> but with -open-hook-fd, the approach would be:
>
> libvirt calls transaction
> qemu calls open(file1) over hook
> libvirt responds
> qemu calls open(file2) over hook
> libvirt responds
> qemu responds to the original transaction
>
> The 'transaction' operation is thus blocked by the time it takes to do
> two intermediate opens over a second channel, which kind of defeats the
> purpose of making the transaction take effect with minimal guest
> downtime.

How are you defining "guest down time"?

It's important to note that code running in QEMU does not equate to guest 
visible down time unless QEMU does an explicit vm_stop() which is not happening 
here.

Instead, a VCPU may become blocked *if* it attempts to acquire qemu_mute while 
QEMU is holding it.

If your concern is qemu_mutex being held while waiting for libvirt, it would be 
fairly easy to implement a qemu_open_async() that dropped allowed dropping back 
to the main loop and then calling a callback when the open completes.

It would be pretty trivial to convert qmp_transaction to use such a command.

But this is all speculative.  There's no reason to believe that an RPC would 
have a noticable guest visible latency unless you assume there's lot contention. 
  I would strongly suspect that the bdrv_flush() is going to be a much greater 
source of lock contention than the RPC would be.  An RPC is only bound by 
scheduler latency whereas synchronous disk I/O is bound spinning a platter.

> And libvirt code becomes a lot trickier to deal with the fact
> that two channels are in use, and that the channel that issued the
> 'transaction' command must block while the other channel for handling
> hooks must be responsive.

All libvirt needs to do is listen on a socket and delegate access according to a 
white list.  Whatever is providing fd's needs to have no knowledge of anythign 
other than what the guest is allowed to access which shouldn't depend on an 
executing command.

Regards,

Anthony Liguori

> I'm really disliking the hook-fd approach, when a better solution is to
> make use of 'getfd' in advance of any operation that will need to open
> new fds.
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-07-09 20:00       ` Anthony Liguori
@ 2012-07-09 20:29         ` Eric Blake
  2012-07-09 20:46           ` Anthony Liguori
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Blake @ 2012-07-09 20:29 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Corey Bryant,
	qemu-devel, Zhi Yong Wu

[-- Attachment #1: Type: text/plain, Size: 6083 bytes --]

On 07/09/2012 02:00 PM, Anthony Liguori wrote:

>> with the fd:name approach, the sequence is:
>>
>> libvirt calls getfd:name1 over normal monitor
>> qemu responds
>> libvirt calls getfd:name2 over normal monitor
>> qemu responds
>> libvirt calls transaction around blockdev-snapshot-sync over normal
>> monitor, using fd:name1 and fd:name2
>> qemu responds

This general layout is true whether we rewrite all commands to
understand fd:nnn (proposal 1) or whether we add new magic parsing
(/dev/fd/nnn of proposal 3, or even /dev/fdset/nnn of proposal 5), all
as called out in these messages:

https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00227.html
https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01098.html

>>
>> but with -open-hook-fd, the approach would be:
>>
>> libvirt calls transaction
>> qemu calls open(file1) over hook
>> libvirt responds
>> qemu calls open(file2) over hook
>> libvirt responds
>> qemu responds to the original transaction

whereas this approach is quite different in semantics, but may indeed be
easier for qemu to implement, at the expense of some more complexity on
the part of libvirt.

At the high level, I think both approaches have one thing in common: by
refactoring all qemu code to go through qemu_open(), we can then
implement our desired complexity (whether fd:nn, /dev/fd/nnn,
/dev/fdset/nnn, or some other magic name parsing; or whether it is an
rpc call over a second socket in parallel to the monitor socket) in just
one location.  Likewise, both approaches have to deal with libvirtd
restarts (magic name parsing by changing an 'inuse' flag when the
monitor detects EOF; rpc passing by failing a qemu_open() when the rpc
socket detects EOF).

>>
>> The 'transaction' operation is thus blocked by the time it takes to do
>> two intermediate opens over a second channel, which kind of defeats the
>> purpose of making the transaction take effect with minimal guest
>> downtime.
> 
> How are you defining "guest down time"?
> 
> It's important to note that code running in QEMU does not equate to
> guest visible down time unless QEMU does an explicit vm_stop() which is
> not happening here.
> 
> Instead, a VCPU may become blocked *if* it attempts to acquire qemu_mute
> while QEMU is holding it.
> 
> If your concern is qemu_mutex being held while waiting for libvirt, it
> would be fairly easy to implement a qemu_open_async() that dropped
> allowed dropping back to the main loop and then calling a callback when
> the open completes.
> 
> It would be pretty trivial to convert qmp_transaction to use such a
> command.

In other words, remembering that transactions are divided into phases:

phase 1 - prepare: obtain all needed fds (whether by pre-opening them
via 'pass-fd' or other new 'getfd' relative, or whether by RPC calls);
no guest downtime, and with cleanup that avoids any leaks on any failures
phase 2 - commit: flush all devices and actually make the changes in
qemu state to use the fds obtained in phase 1

and where the guest downtime (if any) is more likely due to flushing
changes in phase 2

> 
> But this is all speculative.  There's no reason to believe that an RPC
> would have a noticable guest visible latency unless you assume there's
> lot contention.  I would strongly suspect that the bdrv_flush() is going
> to be a much greater source of lock contention than the RPC would be. 
> An RPC is only bound by scheduler latency whereas synchronous disk I/O
> is bound spinning a platter.
> 
>> And libvirt code becomes a lot trickier to deal with the fact
>> that two channels are in use, and that the channel that issued the
>> 'transaction' command must block while the other channel for handling
>> hooks must be responsive.
> 
> All libvirt needs to do is listen on a socket and delegate access
> according to a white list.  Whatever is providing fd's needs to have no
> knowledge of anythign other than what the guest is allowed to access
> which shouldn't depend on an executing command.

That's not quite accurate.  What the guest is allowed to access should
indeed change depending on the executing command.  That is, if I start a
guest with:

base <- delta

then I only want to permet O_RDONLY access to base but O_RDWR access to
delta.  If I then call 'blockdev-snapshot-sync', I want to change to the
situation:

base <- delta <- snap

and give O_RDWR permissions to snap; it would also be nice if qemu
attempts to reopen delta with O_RDONLY permissions (although from a
trust perspective, libvirt must assume that delta is still O_RDWR
because qemu may have been compromised and lie about the tightening of
permissions); at any rate, depending on SELinux capabilities of the
file, libvirt may be able to enforce no further writes to 'delta' by
toggling a SELinux label (obviously, this should only be done after
'blockdev-snapshot-sync' completes).

On the other hand, the user could decide to do a 'block-commit', to
squash things into:

base

where base is now O_RDWR.  But libvirt doesn't want to grant
write-access to 'base' up-front, so the whitelist allowing O_RDWR access
to base is indeed dependent on the user running a monitor command that
would cause qemu to need to change to a more permissive access in the
first place.

Maybe the only reason that I'm still leaning towards a 'pass-fd'
solution instead of a hook fd solution is that libvirt would have less
code to write to get it working.  But it was originally Dan's complaint
that an rpc solution has too much risk for deadlock or leaks; if we are
confident that we can avoid deadlock, and that the idea of passing in
fds in response to an rpc involves less bookkeeping and speculation on
libvirt's part about which monitor commands will require opening an fd,
then maybe it really is a better technical solution, even if it costs
more libvirt code to implement.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
  2012-07-09 20:29         ` Eric Blake
@ 2012-07-09 20:46           ` Anthony Liguori
  0 siblings, 0 replies; 37+ messages in thread
From: Anthony Liguori @ 2012-07-09 20:46 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, Stefan Hajnoczi, libvir-list, Corey Bryant,
	qemu-devel, Zhi Yong Wu

On 07/09/2012 03:29 PM, Eric Blake wrote:
> On 07/09/2012 02:00 PM, Anthony Liguori wrote:
>
>>> with the fd:name approach, the sequence is:
>>>
>>> libvirt calls getfd:name1 over normal monitor
>>> qemu responds
>>> libvirt calls getfd:name2 over normal monitor
>>> qemu responds
>>> libvirt calls transaction around blockdev-snapshot-sync over normal
>>> monitor, using fd:name1 and fd:name2
>>> qemu responds
>
> This general layout is true whether we rewrite all commands to
> understand fd:nnn (proposal 1) or whether we add new magic parsing
> (/dev/fd/nnn of proposal 3, or even /dev/fdset/nnn of proposal 5), all
> as called out in these messages:
>
> https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg00227.html
> https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01098.html
>
>>>
>>> but with -open-hook-fd, the approach would be:
>>>
>>> libvirt calls transaction
>>> qemu calls open(file1) over hook
>>> libvirt responds
>>> qemu calls open(file2) over hook
>>> libvirt responds
>>> qemu responds to the original transaction
>
> whereas this approach is quite different in semantics, but may indeed be
> easier for qemu to implement, at the expense of some more complexity on
> the part of libvirt.
>
> At the high level, I think both approaches have one thing in common: by
> refactoring all qemu code to go through qemu_open(), we can then
> implement our desired complexity (whether fd:nn, /dev/fd/nnn,
> /dev/fdset/nnn, or some other magic name parsing; or whether it is an
> rpc call over a second socket in parallel to the monitor socket) in just
> one location.  Likewise, both approaches have to deal with libvirtd
> restarts (magic name parsing by changing an 'inuse' flag when the
> monitor detects EOF; rpc passing by failing a qemu_open() when the rpc
> socket detects EOF).

Ack.

>
>>>
>>> The 'transaction' operation is thus blocked by the time it takes to do
>>> two intermediate opens over a second channel, which kind of defeats the
>>> purpose of making the transaction take effect with minimal guest
>>> downtime.
>>
>> How are you defining "guest down time"?
>>
>> It's important to note that code running in QEMU does not equate to
>> guest visible down time unless QEMU does an explicit vm_stop() which is
>> not happening here.
>>
>> Instead, a VCPU may become blocked *if* it attempts to acquire qemu_mute
>> while QEMU is holding it.
>>
>> If your concern is qemu_mutex being held while waiting for libvirt, it
>> would be fairly easy to implement a qemu_open_async() that dropped
>> allowed dropping back to the main loop and then calling a callback when
>> the open completes.
>>
>> It would be pretty trivial to convert qmp_transaction to use such a
>> command.
>
> In other words, remembering that transactions are divided into phases:
>
> phase 1 - prepare: obtain all needed fds (whether by pre-opening them
> via 'pass-fd' or other new 'getfd' relative, or whether by RPC calls);
> no guest downtime, and with cleanup that avoids any leaks on any failures
> phase 2 - commit: flush all devices and actually make the changes in
> qemu state to use the fds obtained in phase 1
>
> and where the guest downtime (if any) is more likely due to flushing
> changes in phase 2

Not quite.  A synchronous flush can cause lock contention.  We need to separate 
out the problem of lock contention from guest down time.

Also, there's no obvious need to move the flushes before opens.  The main issue 
is that we use qemu_mutex to effectively create a write queue.

You can imagine a simple write queueing mechanism that would obviate the need 
need for this such that we could flush, queue upcoming writes, and drop 
qemu_mutex to sleep waiting for libvirt to send us our fds.

>> But this is all speculative.  There's no reason to believe that an RPC
>> would have a noticable guest visible latency unless you assume there's
>> lot contention.  I would strongly suspect that the bdrv_flush() is going
>> to be a much greater source of lock contention than the RPC would be.
>> An RPC is only bound by scheduler latency whereas synchronous disk I/O
>> is bound spinning a platter.
>>
>>> And libvirt code becomes a lot trickier to deal with the fact
>>> that two channels are in use, and that the channel that issued the
>>> 'transaction' command must block while the other channel for handling
>>> hooks must be responsive.
>>
>> All libvirt needs to do is listen on a socket and delegate access
>> according to a white list.  Whatever is providing fd's needs to have no
>> knowledge of anythign other than what the guest is allowed to access
>> which shouldn't depend on an executing command.
>
> That's not quite accurate.  What the guest is allowed to access should
> indeed change depending on the executing command.  That is, if I start a
> guest with:

I should have spoke more clearly.  libvirt may change the white list for various 
reasons dynamically.  But there shouldn't be a direct dependency on whatever is 
serving up fd's and whatever is changing the white list.

Basically, you just need a shared hash table for each guest.  It should be quite 
simple.

> Maybe the only reason that I'm still leaning towards a 'pass-fd'
> solution instead of a hook fd solution is that libvirt would have less
> code to write to get it working.  But it was originally Dan's complaint
> that an rpc solution has too much risk for deadlock or leaks;

The reason I came back to this is that after reading through the threads, I 
started thinking about how to solve the leak problem.

You need clear ownership.  Having QEMU "own" a file descriptor because it "asks" 
for an fd allows QEMU to be the clear owner.  OTOH, having libvirt give QEMU an 
fd with a floating reference that QEMU may or not may not pin ends up being 
extremely complex in practice.  I'm not sure you can really solve the reliable 
closing problem either.  If you did have a "kill all floating references" 
command, that could introduce other problems (what about multiple clients?).

> if we are
> confident that we can avoid deadlock,

I don't think deadlocks are possible FWIW.

> and that the idea of passing in
> fds in response to an rpc involves less bookkeeping and speculation on
> libvirt's part about which monitor commands will require opening an fd,
> then maybe it really is a better technical solution, even if it costs
> more libvirt code to implement.

I think the important part is that it allows libvirt to not have to have 
intimate knowledge of how QEMU commands work.  If we decide we need to change 
the flags/perms on a file descriptor down the road, it's a lot easier to cope 
with that as it is to cope with changing the order in which we open files.

Plus, once you implement this in libvirt, you don't have to worry about it for 
future block commands.  With fdsets, would need to deal with figuring out the 
magic incantation of setfd commands for all future block commands.

Plus, making /dev/fdset be treated as not a valid file path is just asking for 
trouble down the road...

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2012-07-09 20:47 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-01 15:31 [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Stefan Hajnoczi
2012-05-01 15:31 ` [Qemu-devel] [RFC 1/5] block: add open() wrapper that can be hooked by libvirt Stefan Hajnoczi
2012-05-01 15:31 ` [Qemu-devel] [RFC 2/5] block: add new command line parameter that and protocol description Stefan Hajnoczi
2012-05-02  8:58   ` Daniel P. Berrange
2012-05-02  9:03   ` Daniel P. Berrange
2012-05-01 15:31 ` [Qemu-devel] [RFC 3/5] block: plumb up open-hook-fd option Stefan Hajnoczi
2012-05-01 15:31 ` [Qemu-devel] [RFC 4/5] osdep: add qemu_recvmsg() wrapper Stefan Hajnoczi
2012-05-01 15:31 ` [Qemu-devel] [RFC 5/5] Example -open-hook-fd server Stefan Hajnoczi
2012-05-01 16:04   ` [Qemu-devel] [libvirt] " Stefan Hajnoczi
2012-05-01 20:25 ` [Qemu-devel] [RFC 0/5] block: File descriptor passing using -open-hook-fd Anthony Liguori
2012-05-01 20:56   ` [Qemu-devel] [libvirt] " Eric Blake
2012-05-01 21:52     ` Anthony Liguori
2012-05-02 16:40     ` Paolo Bonzini
2012-05-01 21:45   ` [Qemu-devel] " Corey Bryant
2012-05-01 21:53     ` Anthony Liguori
2012-05-01 22:15       ` [Qemu-devel] [libvirt] " Eric Blake
2012-05-01 22:21         ` Anthony Liguori
2012-05-07 16:10         ` Corey Bryant
2012-05-02  8:20   ` [Qemu-devel] " Kevin Wolf
2012-05-02  8:27     ` Stefan Hajnoczi
2012-05-02  9:38       ` Kevin Wolf
2012-05-02  8:53     ` [Qemu-devel] [libvirt] " Daniel P. Berrange
2012-05-02  9:45       ` Kevin Wolf
2012-05-02  9:56         ` Daniel P. Berrange
2012-05-02 19:25           ` Paolo Bonzini
2012-05-03 19:19         ` Anthony Liguori
2012-05-02  9:01 ` [Qemu-devel] " Daniel P. Berrange
2012-05-04  3:28 ` [Qemu-devel] [libvirt] " Zhi Yong Wu
2012-05-17 13:42   ` Stefan Hajnoczi
2012-05-17 13:57     ` Zhi Yong Wu
2012-05-17 14:02     ` Zhi Yong Wu
2012-05-18 10:38       ` Stefan Hajnoczi
2012-05-17 14:14     ` Eric Blake
2012-05-18 10:38       ` Stefan Hajnoczi
2012-07-09 20:00       ` Anthony Liguori
2012-07-09 20:29         ` Eric Blake
2012-07-09 20:46           ` Anthony Liguori

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.