All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v5 0/6] file descriptor passing using fd sets
@ 2012-07-23 13:07 Corey Bryant
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 1/6] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg Corey Bryant
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: Corey Bryant @ 2012-07-23 13:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, aliguori, stefanha, libvir-list, Corey Bryant,
	lcapitulino, eblake

libvirt's sVirt security driver provides SELinux MAC isolation for
Qemu guest processes and their corresponding image files.  In other
words, sVirt uses SELinux to prevent a QEMU process from opening
files that do not belong to it.

sVirt provides this support by labeling guests and resources with
security labels that are stored in file system extended attributes.
Some file systems, such as NFS, do not support the extended
attribute security namespace, and therefore cannot support sVirt
isolation.

A solution to this problem is to provide fd passing support, where
libvirt opens files and passes file descriptors to QEMU.  This,
along with SELinux policy to prevent QEMU from opening files, can
provide image file isolation for NFS files stored on the same NFS
mount.

This patch series adds the add-fd, remove-fd, and query-fdsets
QMP monitor commands, which allow file descriptors to be passed
via SCM_RIGHTS, and assigned to specified fd sets.  This allows
fd sets to be created per file with fds having, for example,
different access rights.  When QEMU needs to reopen a file with
different access rights, it can search for a matching fd in the
fd set.  Fd sets also allow for easy tracking of fds per file,
helping to prevent fd leaks.

Support is also added to the block layer to allow QEMU to dup an
fd from an fdset when the filename is of the /dev/fdset/nnn format,
where nnn is the fd set ID.

No new SELinux policy is required to prevent open of NFS files
(files with type nfs_t).  The virt_use_nfs boolean type simply
needs to be set to false, and open will be prevented (and dup will
be allowed).  For example:

    # setsebool virt_use_nfs 0
    # getsebool virt_use_nfs
    virt_use_nfs --> off

Corey Bryant (6):
  qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg
  qapi: Introduce add-fd, remove-fd, query-fdsets
  monitor: Clean up fd sets on monitor disconnect
  block: Convert open calls to qemu_open
  block: Convert close calls to qemu_close
  block: Enable qemu_open/close to work with fd sets

 block/raw-posix.c |   42 ++++-----
 block/raw-win32.c |    6 +-
 block/vdi.c       |    5 +-
 block/vmdk.c      |   25 +++---
 block/vpc.c       |    4 +-
 block/vvfat.c     |   16 ++--
 cutils.c          |    5 ++
 monitor.c         |  244 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 monitor.h         |    4 +
 osdep.c           |  141 +++++++++++++++++++++++++++++++
 qapi-schema.json  |   97 +++++++++++++++++++++
 qemu-char.c       |   10 ++-
 qemu-common.h     |    2 +
 qemu-tool.c       |   12 +++
 qmp-commands.hx   |  121 ++++++++++++++++++++++++++
 savevm.c          |    4 +-
 16 files changed, 684 insertions(+), 54 deletions(-)

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v5 1/6] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg
  2012-07-23 13:07 [Qemu-devel] [PATCH v5 0/6] file descriptor passing using fd sets Corey Bryant
@ 2012-07-23 13:08 ` Corey Bryant
  2012-07-23 22:50   ` Eric Blake
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 2/6] qapi: Introduce add-fd, remove-fd, query-fdsets Corey Bryant
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Corey Bryant @ 2012-07-23 13:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, aliguori, stefanha, libvir-list, Corey Bryant,
	lcapitulino, eblake

Set the close-on-exec flag for the file descriptor received
via SCM_RIGHTS.

v4
 -This patch is new in v4 (eblake@redhat.com)

v5
 -Fallback to FD_CLOEXEC if MSG_CMSG_CLOEXEC is not available
  (eblake@redhat.com, stefanha@linux.vnet.ibm.com)

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
---
 qemu-char.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/qemu-char.c b/qemu-char.c
index c2aaaee..eedf66d 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2263,9 +2263,17 @@ static ssize_t tcp_chr_recv(CharDriverState *chr, char *buf, size_t len)
     msg.msg_control = &msg_control;
     msg.msg_controllen = sizeof(msg_control);
 
+#ifdef MSG_CMSG_CLOEXEC
+    ret = recvmsg(s->fd, &msg, MSG_CMSG_CLOEXEC);
+#else
     ret = recvmsg(s->fd, &msg, 0);
-    if (ret > 0 && s->is_unix)
+    if (ret > 0) {
+        qemu_set_cloexec(s->fd);
+    }
+#endif
+    if (ret > 0 && s->is_unix) {
         unix_process_msgfd(chr, &msg);
+    }
 
     return ret;
 }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v5 2/6] qapi: Introduce add-fd, remove-fd, query-fdsets
  2012-07-23 13:07 [Qemu-devel] [PATCH v5 0/6] file descriptor passing using fd sets Corey Bryant
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 1/6] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg Corey Bryant
@ 2012-07-23 13:08 ` Corey Bryant
  2012-07-25 18:16   ` Eric Blake
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 3/6] monitor: Clean up fd sets on monitor disconnect Corey Bryant
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Corey Bryant @ 2012-07-23 13:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, aliguori, stefanha, libvir-list, Corey Bryant,
	lcapitulino, eblake

This patch adds support that enables passing of file descriptors
to the QEMU monitor where they will be stored in specified file
descriptor sets.

A file descriptor set can be used by a client like libvirt to
store file descriptors for the same file.  This allows the
client to open a file with different access modes (O_RDWR,
O_WRONLY, O_RDONLY) and add/remove the passed fds to/from an fd
set as needed.  This will allow QEMU to (in a later patch in this
series) "open" and "reopen" the same file by dup()ing the fd in
the fd set that corresponds to the file, where the fd has the
matching access mode flag that QEMU requests.

The new QMP commands are:
  add-fd: Add a file descriptor to an fd set
  remove-fd: Remove a file descriptor from an fd set
  query-fdsets: Return information describing all fd sets

Note: These commands are not compatible with the existing getfd
and closefd QMP commands.

v5:
 -This patch is new in v5 and replaces the pass-fd QMP command
  from v4.
 -By grouping fds in fd sets, we ease managability with an fd
  set per file, addressing concerns raised in v4 about handling
  "reopens" and preventing fd leakage. (eblake@redhat.com,
  kwolf@redhat.com, dberrange@redhat.com)

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
---
 monitor.c        |  144 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 qapi-schema.json |   97 ++++++++++++++++++++++++++++++++++++
 qmp-commands.hx  |  121 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 361 insertions(+), 1 deletion(-)

diff --git a/monitor.c b/monitor.c
index 09aa3cd..e27dbbe 100644
--- a/monitor.c
+++ b/monitor.c
@@ -140,6 +140,24 @@ struct mon_fd_t {
     QLIST_ENTRY(mon_fd_t) next;
 };
 
+/* file descriptor associated with a file descriptor set */
+typedef struct mon_fdset_fd_t mon_fdset_fd_t;
+struct mon_fdset_fd_t {
+    int fd;
+    bool removed;
+    QLIST_ENTRY(mon_fdset_fd_t) next;
+};
+
+/* file descriptor set containing fds passed via SCM_RIGHTS */
+typedef struct mon_fdset_t mon_fdset_t;
+struct mon_fdset_t {
+    int64_t id;
+    int refcount;
+    bool in_use;
+    QLIST_HEAD(, mon_fdset_fd_t) fds;
+    QLIST_ENTRY(mon_fdset_t) next;
+};
+
 typedef struct MonitorControl {
     QObject *id;
     JSONMessageParser parser;
@@ -176,7 +194,8 @@ struct Monitor {
     int print_calls_nr;
 #endif
     QError *error;
-    QLIST_HEAD(,mon_fd_t) fds;
+    QLIST_HEAD(, mon_fd_t) fds;
+    QLIST_HEAD(, mon_fdset_t) fdsets;
     QLIST_ENTRY(Monitor) entry;
 };
 
@@ -2396,6 +2415,129 @@ int monitor_get_fd(Monitor *mon, const char *fdname)
     return -1;
 }
 
+static void monitor_fdset_cleanup(mon_fdset_t *mon_fdset)
+{
+    mon_fdset_fd_t *mon_fdset_fd;
+    mon_fdset_fd_t *mon_fdset_fd_next;
+
+    if (mon_fdset->refcount != 0) {
+        return;
+    }
+
+    QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
+        if (!mon_fdset->in_use || mon_fdset_fd->removed) {
+            close(mon_fdset_fd->fd);
+            QLIST_REMOVE(mon_fdset_fd, next);
+            g_free(mon_fdset_fd);
+        }
+    }
+
+    if (QLIST_EMPTY(&mon_fdset->fds)) {
+        QLIST_REMOVE(mon_fdset, next);
+        g_free(mon_fdset);
+    }
+}
+
+AddfdInfo *qmp_add_fd(int64_t fdset_id, Error **errp)
+{
+    int fd;
+    Monitor *mon = cur_mon;
+    mon_fdset_t *mon_fdset;
+    mon_fdset_fd_t *mon_fdset_fd;
+    AddfdInfo *fdinfo;
+
+    fd = qemu_chr_fe_get_msgfd(mon->chr);
+    if (fd == -1) {
+        qerror_report(QERR_FD_NOT_SUPPLIED);
+        return NULL;
+    }
+
+    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
+        if (mon_fdset->id == fdset_id) {
+            break;
+        }
+    }
+
+    if (!mon_fdset) {
+        mon_fdset = g_malloc0(sizeof(*mon_fdset));
+        mon_fdset->id = fdset_id;
+        mon_fdset->refcount = 0;
+        mon_fdset->in_use = true;
+        QLIST_INSERT_HEAD(&mon->fdsets, mon_fdset, next);
+    }
+
+    mon_fdset_fd = g_malloc0(sizeof(*mon_fdset_fd));
+    mon_fdset_fd->fd = fd;
+    mon_fdset_fd->removed = false;
+    QLIST_INSERT_HEAD(&mon_fdset->fds, mon_fdset_fd, next);
+
+    fdinfo = g_malloc0(sizeof(*fdinfo));
+    fdinfo->fdset_id = fdset_id;
+    fdinfo->fd = fd;
+
+    return fdinfo;
+}
+
+void qmp_remove_fd(int64_t fdset_id, int64_t fd, Error **errp)
+{
+    Monitor *mon = cur_mon;
+    mon_fdset_t *mon_fdset;
+    mon_fdset_fd_t *mon_fdset_fd;
+    char fd_str[20];
+
+    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
+        if (mon_fdset->id != fdset_id) {
+            continue;
+        }
+        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
+            if (mon_fdset_fd->fd != fd) {
+                continue;
+            }
+            mon_fdset_fd->removed = true;
+            monitor_fdset_cleanup(mon_fdset);
+            return;
+        }
+    }
+    snprintf(fd_str, sizeof(fd_str), "%ld", fd);
+    qerror_report(QERR_FD_NOT_FOUND, fd_str);
+}
+
+FdsetInfoList *qmp_query_fdsets(Error **errp)
+{
+    Monitor *mon = cur_mon;
+    mon_fdset_t *mon_fdset;
+    mon_fdset_fd_t *mon_fdset_fd;
+    FdsetInfoList *fdset_list = NULL;
+
+    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
+        FdsetInfoList *fdset_info = g_malloc0(sizeof(*fdset_info));
+        FdsetFdInfoList *fdsetfd_list = NULL;
+
+        fdset_info->value = g_malloc0(sizeof(*fdset_info->value));
+        fdset_info->value->fdset_id = mon_fdset->id;
+        fdset_info->value->refcount = mon_fdset->refcount;
+        fdset_info->value->in_use = mon_fdset->in_use;
+
+        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
+            FdsetFdInfoList *fdsetfd_info = g_malloc0(sizeof(*fdsetfd_info));
+
+            fdsetfd_info->value = g_malloc0(sizeof(*fdsetfd_info->value));
+            fdsetfd_info->value->fd = mon_fdset_fd->fd;
+            fdsetfd_info->value->removed = mon_fdset_fd->removed;
+
+            fdsetfd_info->next = fdsetfd_list;
+            fdsetfd_list = fdsetfd_info;
+        }
+
+        fdset_info->value->fds = fdsetfd_list;
+
+        fdset_info->next = fdset_list;
+        fdset_list = fdset_info;
+    }
+
+    return fdset_list;
+}
+
 /* mon_cmds and info_cmds would be sorted at runtime */
 static mon_cmd_t mon_cmds[] = {
 #include "hmp-commands.h"
diff --git a/qapi-schema.json b/qapi-schema.json
index a92adb1..c07ff07 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1905,3 +1905,100 @@
 # Since: 0.14.0
 ##
 { 'command': 'closefd', 'data': {'fdname': 'str'} }
+
+# @AddfdInfo:
+#
+# Information about a file descriptor that was added to an fd set.
+#
+# @fdset_id: The ID of the fd set that @fd was added to.
+#
+# @fd: The file descriptor that was received via SCM rights and
+#      added to the fd set.
+#
+# Since: 1.2.0
+##
+{ 'type': 'AddfdInfo', 'data': {'fdset_id': 'int', 'fd': 'int'} }
+
+##
+# @add-fd:
+#
+# Add a file descriptor, that was passed via SCM rights, to an fd set.
+#
+# @fdset_id: The ID of the fd set to add the file descriptor to.
+#
+# Returns: @AddfdInfo on success
+#          If file descriptor was not received, FdNotSupplied
+#
+# Since: 1.2.0
+##
+{ 'command': 'add-fd', 'data': {'fdset_id': 'int'},
+  'returns': 'AddfdInfo' }
+
+##
+# @remove-fd:
+#
+# Remove a file descriptor from an fd set.
+#
+# @fdset_id: The ID of the fd set that the file descriptor belongs to.
+#
+# @fd: The file descriptor that is to be removed.
+#
+# Returns: Nothing on success
+#          If @fdset_id or @fd is not found, FdNotFound
+#
+# Since: 1.2.0
+#
+# Notes: File descriptors that are removed:
+#        o will not be closed until the reference count corresponding
+#          to @fdset_id reaches zero.
+#        o will not be available for use after successful completion
+#          of the remove-fd command.
+##
+{ 'command': 'remove-fd', 'data': {'fdset_id': 'int', 'fd': 'int'} }
+
+##
+# @FdsetFdInfo:
+#
+# Information about a file descriptor that belongs to an fd set.
+#
+# @fd: The file descriptor value.
+#
+# @removed: If true, the remove-fd command has been issued for this fd.
+#
+# Since: 1.2.0
+##
+{ 'type': 'FdsetFdInfo', 'data': {'fd': 'int', 'removed': 'bool'} }
+
+##
+# @FdsetInfo:
+#
+# Information about an fd set.
+#
+# @fdset_id: The ID of the fd set.
+#
+# @refcount: A count of the number of outstanding dup() references to
+#            this fd set.
+#
+# @in_use: If true, this fd set is in use by a connected QMP monitor.
+#
+# @fds: A list of file descriptors that belong to this fd set.
+#
+# Since: 1.2.0
+##
+{ 'type': 'FdsetInfo',
+  'data': {'fdset_id': 'int', 'refcount': 'int', 'in_use': 'bool',
+           'fds': ['FdsetFdInfo']} }
+
+##
+# @query-fdsets:
+#
+# Return information describing all fd sets.
+#
+# Returns: A list of @FdsetInfo
+#
+# Since: 1.2.0
+#
+# Notes: File descriptors are not closed until @refcount is zero,
+#        and either @in_use is false or @removed is true.
+##
+{ 'command': 'query-fdsets', 'returns': ['FdsetInfo'] }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index e3cf3c5..d485df4 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -926,6 +926,127 @@ Example:
 
 EQMP
 
+     {
+        .name       = "add-fd",
+        .args_type  = "fdset_id:i",
+        .params     = "add-fd fdset_id",
+        .help       = "Add a file descriptor, that was passed via SCM rights, to an fd set",
+        .mhandler.cmd_new = qmp_marshal_input_add_fd,
+    },
+
+SQMP
+add-fd
+-------
+
+Add a file descriptor, that was passed via SCM rights, to an fd set.
+
+Arguments:
+
+- "fdset_id": The ID of the fd set to add the file descriptor to.  (json-int)
+
+Return a json-object with the following information:
+
+- "fdset_id": The ID of the fd set that the fd was added to. (json-int)
+- "fd": The file descriptor that was received via SCM rights and added to the
+        fd set. (json-int)
+
+Example:
+
+-> { "execute": "add-fd", "arguments": { "fdset_id": 1 } }
+<- { "return": { "fdset_id": 1, "fd": 3 } }
+
+EQMP
+
+     {
+        .name       = "remove-fd",
+        .args_type  = "fdset_id:i,fd:i",
+        .params     = "remove-fd fdset_id fd",
+        .help       = "Remove a file descriptor from an fd set",
+        .mhandler.cmd_new = qmp_marshal_input_remove_fd,
+    },
+
+SQMP
+remove-fd
+---------
+
+Remove a file descriptor from an fd set.
+
+Arguments:
+
+- "fdset_id": The ID of the fd set that the file descriptor belongs to.
+              (json-int)
+- "fd": The file descriptor that is to be removed. (json-int)
+
+Example:
+
+-> { "execute": "remove-fd", "arguments": { "fdset_id": 1, "fd": 3 } }
+<- { "return": {} }
+
+Notes:
+
+(1) File descriptors that are removed:
+    (a) will not be closed until the reference count corresponding to fdset_id
+        reaches zero.
+    (b) will not be available for use after successful completion of the
+        remove-fd command.
+
+EQMP
+
+    {
+        .name       = "query-fdsets",
+        .args_type  = "",
+        .help       = "Return information describing all fd sets",
+        .mhandler.cmd_new = qmp_marshal_input_query_fdsets,
+    },
+
+SQMP
+query-fdsets
+-------------
+
+Return information describing all fd sets.
+
+Arguments: None
+
+Example:
+
+-> { "execute": "query-fdsets" }
+<- { "return": [
+       {
+         "fdset_id": 1
+         "refcount": 0,
+         "in_use": true,
+         "fds": [
+           {
+             "fd": 21,
+             "removed": false
+           },
+           {
+             "fd": 23,
+             "removed": false
+           }
+         ],
+       },
+       {
+         "fdset_id": 2
+         "refcount": 0,
+         "in_use": true,
+         "fds": [
+           {
+             "fd": 22,
+             "removed": false
+           }
+         ],
+       }
+     ]
+   }
+
+Notes:
+
+(1) File descriptors are not closed until refcount is zero, and
+    either in_use is false or removed is true.
+
+EQMP
+
     {
         .name       = "block_passwd",
         .args_type  = "device:B,password:s",
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v5 3/6] monitor: Clean up fd sets on monitor disconnect
  2012-07-23 13:07 [Qemu-devel] [PATCH v5 0/6] file descriptor passing using fd sets Corey Bryant
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 1/6] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg Corey Bryant
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 2/6] qapi: Introduce add-fd, remove-fd, query-fdsets Corey Bryant
@ 2012-07-23 13:08 ` Corey Bryant
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 4/6] block: Convert open calls to qemu_open Corey Bryant
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Corey Bryant @ 2012-07-23 13:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, aliguori, stefanha, libvir-list, Corey Bryant,
	lcapitulino, eblake

Each fd set has a boolean that keeps track of whether or not the
fd set is in use by a monitor connection.  When a monitor
disconnects, all fds that are members of an fd set with refcount
of zero are closed.  This prevents any fd leakage associated with
a client disconnect prior to using a passed fd.

v5:
 -This patch is new in v5.
 -This support addresses concerns from v4 regarding fd leakage
  if the client disconnects unexpectedly. (eblake@redhat.com,
  kwolf@redhat.com, dberrange@redhat.com)

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
---
 monitor.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/monitor.c b/monitor.c
index e27dbbe..30b085f 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2538,6 +2538,19 @@ FdsetInfoList *qmp_query_fdsets(Error **errp)
     return fdset_list;
 }
 
+static void monitor_fdsets_set_in_use(Monitor *mon, bool in_use)
+{
+    mon_fdset_t *mon_fdset;
+    mon_fdset_t *mon_fdset_next;
+
+    QLIST_FOREACH_SAFE(mon_fdset, &mon->fdsets, next, mon_fdset_next) {
+        mon_fdset->in_use = in_use;
+        if (!in_use) {
+            monitor_fdset_cleanup(mon_fdset);
+        }
+    }
+}
+
 /* mon_cmds and info_cmds would be sorted at runtime */
 static mon_cmd_t mon_cmds[] = {
 #include "hmp-commands.h"
@@ -4751,9 +4764,11 @@ static void monitor_control_event(void *opaque, int event)
         data = get_qmp_greeting();
         monitor_json_emitter(mon, data);
         qobject_decref(data);
+        monitor_fdsets_set_in_use(mon, true);
         break;
     case CHR_EVENT_CLOSED:
         json_message_parser_destroy(&mon->mc->parser);
+        monitor_fdsets_set_in_use(mon, false);
         break;
     }
 }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v5 4/6] block: Convert open calls to qemu_open
  2012-07-23 13:07 [Qemu-devel] [PATCH v5 0/6] file descriptor passing using fd sets Corey Bryant
                   ` (2 preceding siblings ...)
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 3/6] monitor: Clean up fd sets on monitor disconnect Corey Bryant
@ 2012-07-23 13:08 ` Corey Bryant
  2012-07-25 19:22   ` Eric Blake
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 5/6] block: Convert close calls to qemu_close Corey Bryant
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Corey Bryant @ 2012-07-23 13:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, aliguori, stefanha, libvir-list, Corey Bryant,
	lcapitulino, eblake

This patch converts all block layer open calls to qemu_open.

Note that this adds the O_CLOEXEC flag to the changed open paths
when the O_CLOEXEC macro is defined.

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
---
v2:
 -Convert calls to qemu_open instead of file_open (kwolf@redhat.com)
 -Mention introduction of O_CLOEXEC (kwolf@redhat.com)

v3-v5:
 -No changes
---
 block/raw-posix.c |   18 +++++++++---------
 block/raw-win32.c |    4 ++--
 block/vdi.c       |    5 +++--
 block/vmdk.c      |   21 +++++++++------------
 block/vpc.c       |    2 +-
 block/vvfat.c     |    4 ++--
 6 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 0dce089..7408a42 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -572,8 +572,8 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
         options++;
     }
 
-    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
-              0644);
+    fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
+                   0644);
     if (fd < 0) {
         result = -errno;
     } else {
@@ -846,7 +846,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
         if ( bsdPath[ 0 ] != '\0' ) {
             strcat(bsdPath,"s0");
             /* some CDs don't have a partition 0 */
-            fd = open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE);
+            fd = qemu_open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE);
             if (fd < 0) {
                 bsdPath[strlen(bsdPath)-1] = '1';
             } else {
@@ -903,7 +903,7 @@ static int fd_open(BlockDriverState *bs)
 #endif
             return -EIO;
         }
-        s->fd = open(bs->filename, s->open_flags & ~O_NONBLOCK);
+        s->fd = qemu_open(bs->filename, s->open_flags & ~O_NONBLOCK);
         if (s->fd < 0) {
             s->fd_error_time = get_clock();
             s->fd_got_error = 1;
@@ -977,7 +977,7 @@ static int hdev_create(const char *filename, QEMUOptionParameter *options)
         options++;
     }
 
-    fd = open(filename, O_WRONLY | O_BINARY);
+    fd = qemu_open(filename, O_WRONLY | O_BINARY);
     if (fd < 0)
         return -errno;
 
@@ -1055,7 +1055,7 @@ static int floppy_probe_device(const char *filename)
     if (strstart(filename, "/dev/fd", NULL))
         prio = 50;
 
-    fd = open(filename, O_RDONLY | O_NONBLOCK);
+    fd = qemu_open(filename, O_RDONLY | O_NONBLOCK);
     if (fd < 0) {
         goto out;
     }
@@ -1108,7 +1108,7 @@ static void floppy_eject(BlockDriverState *bs, bool eject_flag)
         close(s->fd);
         s->fd = -1;
     }
-    fd = open(bs->filename, s->open_flags | O_NONBLOCK);
+    fd = qemu_open(bs->filename, s->open_flags | O_NONBLOCK);
     if (fd >= 0) {
         if (ioctl(fd, FDEJECT, 0) < 0)
             perror("FDEJECT");
@@ -1158,7 +1158,7 @@ static int cdrom_probe_device(const char *filename)
     int prio = 0;
     struct stat st;
 
-    fd = open(filename, O_RDONLY | O_NONBLOCK);
+    fd = qemu_open(filename, O_RDONLY | O_NONBLOCK);
     if (fd < 0) {
         goto out;
     }
@@ -1282,7 +1282,7 @@ static int cdrom_reopen(BlockDriverState *bs)
      */
     if (s->fd >= 0)
         close(s->fd);
-    fd = open(bs->filename, s->open_flags, 0644);
+    fd = qemu_open(bs->filename, s->open_flags, 0644);
     if (fd < 0) {
         s->fd = -1;
         return -EIO;
diff --git a/block/raw-win32.c b/block/raw-win32.c
index e4b0b75..8d7838d 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -255,8 +255,8 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
         options++;
     }
 
-    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
-              0644);
+    fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
+                   0644);
     if (fd < 0)
         return -EIO;
     set_sparse(fd);
diff --git a/block/vdi.c b/block/vdi.c
index 57325d6..c4f1529 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -653,8 +653,9 @@ static int vdi_create(const char *filename, QEMUOptionParameter *options)
         options++;
     }
 
-    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
-              0644);
+    fd = qemu_open(filename,
+                   O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
+                   0644);
     if (fd < 0) {
         return -errno;
     }
diff --git a/block/vmdk.c b/block/vmdk.c
index 18e9b4c..557dc1b 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1161,10 +1161,9 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
     VMDK4Header header;
     uint32_t tmp, magic, grains, gd_size, gt_size, gt_count;
 
-    fd = open(
-        filename,
-        O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
-        0644);
+    fd = qemu_open(filename,
+                   O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
+                   0644);
     if (fd < 0) {
         return -errno;
     }
@@ -1484,15 +1483,13 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options)
             (flags & BLOCK_FLAG_COMPAT6 ? 6 : 4),
             total_size / (int64_t)(63 * 16 * 512));
     if (split || flat) {
-        fd = open(
-                filename,
-                O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
-                0644);
+        fd = qemu_open(filename,
+                       O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
+                       0644);
     } else {
-        fd = open(
-                filename,
-                O_WRONLY | O_BINARY | O_LARGEFILE,
-                0644);
+        fd = qemu_open(filename,
+                       O_WRONLY | O_BINARY | O_LARGEFILE,
+                       0644);
     }
     if (fd < 0) {
         return -errno;
diff --git a/block/vpc.c b/block/vpc.c
index 5cd13d1..60ebf5a 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -678,7 +678,7 @@ static int vpc_create(const char *filename, QEMUOptionParameter *options)
     }
 
     /* Create the file */
-    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644);
+    fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644);
     if (fd < 0) {
         return -EIO;
     }
diff --git a/block/vvfat.c b/block/vvfat.c
index 7b1dcee..22b586a 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1162,7 +1162,7 @@ static int open_file(BDRVVVFATState* s,mapping_t* mapping)
     if(!s->current_mapping ||
 	    strcmp(s->current_mapping->path,mapping->path)) {
 	/* open file */
-	int fd = open(mapping->path, O_RDONLY | O_BINARY | O_LARGEFILE);
+	int fd = qemu_open(mapping->path, O_RDONLY | O_BINARY | O_LARGEFILE);
 	if(fd<0)
 	    return -1;
 	vvfat_close_current_file(s);
@@ -2221,7 +2221,7 @@ static int commit_one_file(BDRVVVFATState* s,
     for (i = s->cluster_size; i < offset; i += s->cluster_size)
 	c = modified_fat_get(s, c);
 
-    fd = open(mapping->path, O_RDWR | O_CREAT | O_BINARY, 0666);
+    fd = qemu_open(mapping->path, O_RDWR | O_CREAT | O_BINARY, 0666);
     if (fd < 0) {
 	fprintf(stderr, "Could not open %s... (%s, %d)\n", mapping->path,
 		strerror(errno), errno);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v5 5/6] block: Convert close calls to qemu_close
  2012-07-23 13:07 [Qemu-devel] [PATCH v5 0/6] file descriptor passing using fd sets Corey Bryant
                   ` (3 preceding siblings ...)
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 4/6] block: Convert open calls to qemu_open Corey Bryant
@ 2012-07-23 13:08 ` Corey Bryant
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets Corey Bryant
  2012-07-24 12:09 ` [Qemu-devel] [PATCH v5 0/6] file descriptor passing using " Kevin Wolf
  6 siblings, 0 replies; 36+ messages in thread
From: Corey Bryant @ 2012-07-23 13:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, aliguori, stefanha, libvir-list, Corey Bryant,
	lcapitulino, eblake

This patch converts all block layer close calls, that correspond
to qemu_open calls, to qemu_close.

v5:
 -This patch is new in v5. (kwolf@redhat.com, eblake@redhat.com)

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
---
 block/raw-posix.c |   24 ++++++++++++------------
 block/raw-win32.c |    2 +-
 block/vmdk.c      |    4 ++--
 block/vpc.c       |    2 +-
 block/vvfat.c     |   12 ++++++------
 osdep.c           |    5 +++++
 qemu-common.h     |    1 +
 savevm.c          |    4 ++--
 8 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7408a42..a172de3 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -271,7 +271,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
 out_free_buf:
     qemu_vfree(s->aligned_buf);
 out_close:
-    close(fd);
+    qemu_close(fd);
     return -errno;
 }
 
@@ -376,7 +376,7 @@ static void raw_close(BlockDriverState *bs)
 {
     BDRVRawState *s = bs->opaque;
     if (s->fd >= 0) {
-        close(s->fd);
+        qemu_close(s->fd);
         s->fd = -1;
         if (s->aligned_buf != NULL)
             qemu_vfree(s->aligned_buf);
@@ -580,7 +580,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
         if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) {
             result = -errno;
         }
-        if (close(fd) != 0) {
+        if (qemu_close(fd) != 0) {
             result = -errno;
         }
     }
@@ -850,7 +850,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
             if (fd < 0) {
                 bsdPath[strlen(bsdPath)-1] = '1';
             } else {
-                close(fd);
+                qemu_close(fd);
             }
             filename = bsdPath;
         }
@@ -889,7 +889,7 @@ static int fd_open(BlockDriverState *bs)
     last_media_present = (s->fd >= 0);
     if (s->fd >= 0 &&
         (get_clock() - s->fd_open_time) >= FD_OPEN_TIMEOUT) {
-        close(s->fd);
+        qemu_close(s->fd);
         s->fd = -1;
 #ifdef DEBUG_FLOPPY
         printf("Floppy closed\n");
@@ -988,7 +988,7 @@ static int hdev_create(const char *filename, QEMUOptionParameter *options)
     else if (lseek(fd, 0, SEEK_END) < total_size * BDRV_SECTOR_SIZE)
         ret = -ENOSPC;
 
-    close(fd);
+    qemu_close(fd);
     return ret;
 }
 
@@ -1038,7 +1038,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
         return ret;
 
     /* close fd so that we can reopen it as needed */
-    close(s->fd);
+    qemu_close(s->fd);
     s->fd = -1;
     s->fd_media_changed = 1;
 
@@ -1070,7 +1070,7 @@ static int floppy_probe_device(const char *filename)
         prio = 100;
 
 outc:
-    close(fd);
+    qemu_close(fd);
 out:
     return prio;
 }
@@ -1105,14 +1105,14 @@ static void floppy_eject(BlockDriverState *bs, bool eject_flag)
     int fd;
 
     if (s->fd >= 0) {
-        close(s->fd);
+        qemu_close(s->fd);
         s->fd = -1;
     }
     fd = qemu_open(bs->filename, s->open_flags | O_NONBLOCK);
     if (fd >= 0) {
         if (ioctl(fd, FDEJECT, 0) < 0)
             perror("FDEJECT");
-        close(fd);
+        qemu_close(fd);
     }
 }
 
@@ -1173,7 +1173,7 @@ static int cdrom_probe_device(const char *filename)
         prio = 100;
 
 outc:
-    close(fd);
+    qemu_close(fd);
 out:
     return prio;
 }
@@ -1281,7 +1281,7 @@ static int cdrom_reopen(BlockDriverState *bs)
      * FreeBSD seems to not notice sometimes...
      */
     if (s->fd >= 0)
-        close(s->fd);
+        qemu_close(s->fd);
     fd = qemu_open(bs->filename, s->open_flags, 0644);
     if (fd < 0) {
         s->fd = -1;
diff --git a/block/raw-win32.c b/block/raw-win32.c
index 8d7838d..c56bf83 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -261,7 +261,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
         return -EIO;
     set_sparse(fd);
     ftruncate(fd, total_size * 512);
-    close(fd);
+    qemu_close(fd);
     return 0;
 }
 
diff --git a/block/vmdk.c b/block/vmdk.c
index 557dc1b..daee426 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1258,7 +1258,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
 
     ret = 0;
  exit:
-    close(fd);
+    qemu_close(fd);
     return ret;
 }
 
@@ -1506,7 +1506,7 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options)
     }
     ret = 0;
 exit:
-    close(fd);
+    qemu_close(fd);
     return ret;
 }
 
diff --git a/block/vpc.c b/block/vpc.c
index 60ebf5a..c0b82c4 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -744,7 +744,7 @@ static int vpc_create(const char *filename, QEMUOptionParameter *options)
     }
 
  fail:
-    close(fd);
+    qemu_close(fd);
     return ret;
 }
 
diff --git a/block/vvfat.c b/block/vvfat.c
index 22b586a..59d3c5b 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1105,7 +1105,7 @@ static inline void vvfat_close_current_file(BDRVVVFATState *s)
     if(s->current_mapping) {
 	s->current_mapping = NULL;
 	if (s->current_fd) {
-		close(s->current_fd);
+		qemu_close(s->current_fd);
 		s->current_fd = 0;
 	}
     }
@@ -2230,7 +2230,7 @@ static int commit_one_file(BDRVVVFATState* s,
     }
     if (offset > 0) {
         if (lseek(fd, offset, SEEK_SET) != offset) {
-            close(fd);
+            qemu_close(fd);
             g_free(cluster);
             return -3;
         }
@@ -2251,13 +2251,13 @@ static int commit_one_file(BDRVVVFATState* s,
 	    (uint8_t*)cluster, (rest_size + 0x1ff) / 0x200);
 
         if (ret < 0) {
-            close(fd);
+            qemu_close(fd);
             g_free(cluster);
             return ret;
         }
 
         if (write(fd, cluster, rest_size) < 0) {
-            close(fd);
+            qemu_close(fd);
             g_free(cluster);
             return -2;
         }
@@ -2268,11 +2268,11 @@ static int commit_one_file(BDRVVVFATState* s,
 
     if (ftruncate(fd, size)) {
         perror("ftruncate()");
-        close(fd);
+        qemu_close(fd);
         g_free(cluster);
         return -4;
     }
-    close(fd);
+    qemu_close(fd);
     g_free(cluster);
 
     return commit_mappings(s, first_cluster, dir_index);
diff --git a/osdep.c b/osdep.c
index 03817f0..7b09dff 100644
--- a/osdep.c
+++ b/osdep.c
@@ -104,6 +104,11 @@ int qemu_open(const char *name, int flags, ...)
     return ret;
 }
 
+int qemu_close(int fd)
+{
+    return close(fd);
+}
+
 /*
  * A variant of write(2) which handles partial write.
  *
diff --git a/qemu-common.h b/qemu-common.h
index 09676f5..c8ddf2f 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -188,6 +188,7 @@ const char *path(const char *pathname);
 void *qemu_oom_check(void *ptr);
 
 int qemu_open(const char *name, int flags, ...);
+int qemu_close(int fd);
 ssize_t qemu_write_full(int fd, const void *buf, size_t count)
     QEMU_WARN_UNUSED_RESULT;
 ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags)
diff --git a/savevm.c b/savevm.c
index a15c163..3269dd4 100644
--- a/savevm.c
+++ b/savevm.c
@@ -513,7 +513,7 @@ static void qemu_fill_buffer(QEMUFile *f)
  *
  * Returns f->close() return value, or 0 if close function is not set.
  */
-static int qemu_close(QEMUFile *f)
+static int _qemu_fclose(QEMUFile *f)
 {
     int ret = 0;
     if (f->close) {
@@ -535,7 +535,7 @@ int qemu_fclose(QEMUFile *f)
 {
     int ret;
     qemu_fflush(f);
-    ret = qemu_close(f);
+    ret = _qemu_fclose(f);
     /* If any error was spotted before closing, we should report it
      * instead of the close() return value.
      */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-23 13:07 [Qemu-devel] [PATCH v5 0/6] file descriptor passing using fd sets Corey Bryant
                   ` (4 preceding siblings ...)
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 5/6] block: Convert close calls to qemu_close Corey Bryant
@ 2012-07-23 13:08 ` Corey Bryant
  2012-07-23 13:14   ` Corey Bryant
                     ` (2 more replies)
  2012-07-24 12:09 ` [Qemu-devel] [PATCH v5 0/6] file descriptor passing using " Kevin Wolf
  6 siblings, 3 replies; 36+ messages in thread
From: Corey Bryant @ 2012-07-23 13:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, aliguori, stefanha, libvir-list, Corey Bryant,
	lcapitulino, eblake

When qemu_open is passed a filename of the "/dev/fdset/nnn"
format (where nnn is the fdset ID), an fd with matching access
mode flags will be searched for within the specified monitor
fd set.  If the fd is found, a dup of the fd will be returned
from qemu_open.

Each fd set has a reference count.  The purpose of the reference
count is to determine if an fd set contains file descriptors that
have open dup() references that have not yet been closed.  It is
incremented on qemu_open and decremented on qemu_close.  It is
not until the refcount is zero that file desriptors in an fd set
can be closed.  If an fd set has dup() references open, then we
must keep the other fds in the fd set open in case a reopen
of the file occurs that requires an fd with a different access
mode.

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>

v2:
 -Get rid of file_open and move dup code to qemu_open
  (kwolf@redhat.com)
 -Use strtol wrapper instead of atoi (kwolf@redhat.com)

v3:
 -Add note about fd leakage (eblake@redhat.com)

v4
 -Moved patch to be later in series (lcapitulino@redhat.com)
 -Update qemu_open to check access mode flags and set flags that
  can be set (eblake@redhat.com, kwolf@redhat.com)

v5:
 -This patch was overhauled quite a bit in this version, with
  the addition of fd set and refcount support.
 -Use qemu_set_cloexec() on dup'd fd (eblake@redhat.com)
 -Modify flags set by fcntl on dup'd fd (eblake@redhat.com)
 -Reduce syscalls when setting flags for dup'd fd (eblake@redhat.com)
 -Fix O_RDWR, O_RDONLY, O_WRONLY checks (eblake@redhat.com)
---
 block/raw-posix.c |   24 +++++-----
 block/raw-win32.c |    2 +-
 block/vmdk.c      |    4 +-
 block/vpc.c       |    2 +-
 block/vvfat.c     |   12 ++---
 cutils.c          |    5 ++
 monitor.c         |   85 +++++++++++++++++++++++++++++++++
 monitor.h         |    4 ++
 osdep.c           |  138 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 qemu-common.h     |    3 +-
 qemu-tool.c       |   12 +++++
 11 files changed, 267 insertions(+), 24 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index a172de3..5d0a801 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -271,7 +271,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
 out_free_buf:
     qemu_vfree(s->aligned_buf);
 out_close:
-    qemu_close(fd);
+    qemu_close(fd, filename);
     return -errno;
 }
 
@@ -376,7 +376,7 @@ static void raw_close(BlockDriverState *bs)
 {
     BDRVRawState *s = bs->opaque;
     if (s->fd >= 0) {
-        qemu_close(s->fd);
+        qemu_close(s->fd, bs->filename);
         s->fd = -1;
         if (s->aligned_buf != NULL)
             qemu_vfree(s->aligned_buf);
@@ -580,7 +580,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
         if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) {
             result = -errno;
         }
-        if (qemu_close(fd) != 0) {
+        if (qemu_close(fd, filename) != 0) {
             result = -errno;
         }
     }
@@ -850,7 +850,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
             if (fd < 0) {
                 bsdPath[strlen(bsdPath)-1] = '1';
             } else {
-                qemu_close(fd);
+                qemu_close(fd, bsdPath);
             }
             filename = bsdPath;
         }
@@ -889,7 +889,7 @@ static int fd_open(BlockDriverState *bs)
     last_media_present = (s->fd >= 0);
     if (s->fd >= 0 &&
         (get_clock() - s->fd_open_time) >= FD_OPEN_TIMEOUT) {
-        qemu_close(s->fd);
+        qemu_close(s->fd, bs->filename);
         s->fd = -1;
 #ifdef DEBUG_FLOPPY
         printf("Floppy closed\n");
@@ -988,7 +988,7 @@ static int hdev_create(const char *filename, QEMUOptionParameter *options)
     else if (lseek(fd, 0, SEEK_END) < total_size * BDRV_SECTOR_SIZE)
         ret = -ENOSPC;
 
-    qemu_close(fd);
+    qemu_close(fd, filename);
     return ret;
 }
 
@@ -1038,7 +1038,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
         return ret;
 
     /* close fd so that we can reopen it as needed */
-    qemu_close(s->fd);
+    qemu_close(s->fd, filename);
     s->fd = -1;
     s->fd_media_changed = 1;
 
@@ -1070,7 +1070,7 @@ static int floppy_probe_device(const char *filename)
         prio = 100;
 
 outc:
-    qemu_close(fd);
+    qemu_close(fd, filename);
 out:
     return prio;
 }
@@ -1105,14 +1105,14 @@ static void floppy_eject(BlockDriverState *bs, bool eject_flag)
     int fd;
 
     if (s->fd >= 0) {
-        qemu_close(s->fd);
+        qemu_close(s->fd, bs->filename);
         s->fd = -1;
     }
     fd = qemu_open(bs->filename, s->open_flags | O_NONBLOCK);
     if (fd >= 0) {
         if (ioctl(fd, FDEJECT, 0) < 0)
             perror("FDEJECT");
-        qemu_close(fd);
+        qemu_close(fd, bs->filename);
     }
 }
 
@@ -1173,7 +1173,7 @@ static int cdrom_probe_device(const char *filename)
         prio = 100;
 
 outc:
-    qemu_close(fd);
+    qemu_close(fd, filename);
 out:
     return prio;
 }
@@ -1281,7 +1281,7 @@ static int cdrom_reopen(BlockDriverState *bs)
      * FreeBSD seems to not notice sometimes...
      */
     if (s->fd >= 0)
-        qemu_close(s->fd);
+        qemu_close(s->fd, bs->filename);
     fd = qemu_open(bs->filename, s->open_flags, 0644);
     if (fd < 0) {
         s->fd = -1;
diff --git a/block/raw-win32.c b/block/raw-win32.c
index c56bf83..b795871 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -261,7 +261,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
         return -EIO;
     set_sparse(fd);
     ftruncate(fd, total_size * 512);
-    qemu_close(fd);
+    qemu_close(fd, filename);
     return 0;
 }
 
diff --git a/block/vmdk.c b/block/vmdk.c
index daee426..7f1206b 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1258,7 +1258,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
 
     ret = 0;
  exit:
-    qemu_close(fd);
+    qemu_close(fd, filename);
     return ret;
 }
 
@@ -1506,7 +1506,7 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options)
     }
     ret = 0;
 exit:
-    qemu_close(fd);
+    qemu_close(fd, filename);
     return ret;
 }
 
diff --git a/block/vpc.c b/block/vpc.c
index c0b82c4..20f648a 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -744,7 +744,7 @@ static int vpc_create(const char *filename, QEMUOptionParameter *options)
     }
 
  fail:
-    qemu_close(fd);
+    qemu_close(fd, filename);
     return ret;
 }
 
diff --git a/block/vvfat.c b/block/vvfat.c
index 59d3c5b..cbc7543 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1105,7 +1105,7 @@ static inline void vvfat_close_current_file(BDRVVVFATState *s)
     if(s->current_mapping) {
 	s->current_mapping = NULL;
 	if (s->current_fd) {
-		qemu_close(s->current_fd);
+		qemu_close(s->current_fd, s->current_mapping->path);
 		s->current_fd = 0;
 	}
     }
@@ -2230,7 +2230,7 @@ static int commit_one_file(BDRVVVFATState* s,
     }
     if (offset > 0) {
         if (lseek(fd, offset, SEEK_SET) != offset) {
-            qemu_close(fd);
+            qemu_close(fd, mapping->path);
             g_free(cluster);
             return -3;
         }
@@ -2251,13 +2251,13 @@ static int commit_one_file(BDRVVVFATState* s,
 	    (uint8_t*)cluster, (rest_size + 0x1ff) / 0x200);
 
         if (ret < 0) {
-            qemu_close(fd);
+            qemu_close(fd, mapping->path);
             g_free(cluster);
             return ret;
         }
 
         if (write(fd, cluster, rest_size) < 0) {
-            qemu_close(fd);
+            qemu_close(fd, mapping->path);
             g_free(cluster);
             return -2;
         }
@@ -2268,11 +2268,11 @@ static int commit_one_file(BDRVVVFATState* s,
 
     if (ftruncate(fd, size)) {
         perror("ftruncate()");
-        qemu_close(fd);
+        qemu_close(fd, mapping->path);
         g_free(cluster);
         return -4;
     }
-    qemu_close(fd);
+    qemu_close(fd, mapping->path);
     g_free(cluster);
 
     return commit_mappings(s, first_cluster, dir_index);
diff --git a/cutils.c b/cutils.c
index e2bc1b8..4e6bfdc 100644
--- a/cutils.c
+++ b/cutils.c
@@ -375,3 +375,8 @@ int qemu_parse_fd(const char *param)
     }
     return fd;
 }
+
+int qemu_parse_fdset(const char *param)
+{
+    return qemu_parse_fd(param);
+}
diff --git a/monitor.c b/monitor.c
index 30b085f..fddd2b5 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2551,6 +2551,91 @@ static void monitor_fdsets_set_in_use(Monitor *mon, bool in_use)
     }
 }
 
+void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
+{
+    mon_fdset_t *mon_fdset;
+
+    if (!mon) {
+        return;
+    }
+
+    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
+        if (mon_fdset->id == fdset_id) {
+            mon_fdset->refcount++;
+            break;
+        }
+    }
+}
+
+void monitor_fdset_decrement_refcount(Monitor *mon, int64_t fdset_id)
+{
+    mon_fdset_t *mon_fdset;
+
+    if (!mon) {
+        return;
+    }
+
+    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
+        if (mon_fdset->id == fdset_id) {
+            mon_fdset->refcount--;
+            if (mon_fdset->refcount == 0) {
+                monitor_fdset_cleanup(mon_fdset);
+            }
+            break;
+        }
+    }
+}
+
+int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
+{
+    mon_fdset_t *mon_fdset;
+    mon_fdset_fd_t *mon_fdset_fd;
+    int mon_fd_flags;
+
+    if (!mon) {
+        errno = ENOENT;
+        return -1;
+    }
+
+    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
+        if (mon_fdset->id != fdset_id) {
+            continue;
+        }
+        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
+            if (mon_fdset_fd->removed) {
+                continue;
+            }
+
+            mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
+            if (mon_fd_flags == -1) {
+                return -1;
+            }
+
+            switch (flags & O_ACCMODE) {
+            case O_RDWR:
+                if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
+                    return mon_fdset_fd->fd;
+                }
+                break;
+            case O_RDONLY:
+                if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
+                    return mon_fdset_fd->fd;
+                }
+                break;
+            case O_WRONLY:
+                if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) {
+                    return mon_fdset_fd->fd;
+                }
+                break;
+            }
+        }
+        errno = EACCES;
+        return -1;
+    }
+    errno = ENOENT;
+    return -1;
+}
+
 /* mon_cmds and info_cmds would be sorted at runtime */
 static mon_cmd_t mon_cmds[] = {
 #include "hmp-commands.h"
diff --git a/monitor.h b/monitor.h
index 5f4de1b..1efed38 100644
--- a/monitor.h
+++ b/monitor.h
@@ -86,4 +86,8 @@ int qmp_qom_set(Monitor *mon, const QDict *qdict, QObject **ret);
 
 int qmp_qom_get(Monitor *mon, const QDict *qdict, QObject **ret);
 
+int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags);
+void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id);
+void monitor_fdset_decrement_refcount(Monitor *mon, int64_t fdset_id);
+
 #endif /* !MONITOR_H */
diff --git a/osdep.c b/osdep.c
index 7b09dff..0303cdd 100644
--- a/osdep.c
+++ b/osdep.c
@@ -47,6 +47,7 @@ extern int madvise(caddr_t, size_t, int);
 #include "qemu-common.h"
 #include "trace.h"
 #include "qemu_socket.h"
+#include "monitor.h"
 
 static const char *qemu_version = QEMU_VERSION;
 
@@ -75,6 +76,79 @@ int qemu_madvise(void *addr, size_t len, int advice)
 #endif
 }
 
+/*
+ * Dups an fd and sets the flags
+ */
+static int qemu_dup(int fd, int flags)
+{
+    int i;
+    int ret;
+    int serrno;
+    int dup_flags;
+    int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
+                          O_NONBLOCK, 0 };
+
+    if (flags & O_CLOEXEC) {
+        ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
+        if (ret == -1 && errno == EINVAL) {
+            ret = dup(fd);
+            if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) {
+                goto fail;
+            }
+        }
+    } else {
+        ret = dup(fd);
+    }
+
+    if (ret == -1) {
+        goto fail;
+    }
+
+    dup_flags = fcntl(ret, F_GETFL);
+    if (dup_flags == -1) {
+        goto fail;
+    }
+
+    if ((flags & O_SYNC) != (dup_flags & O_SYNC)) {
+        errno = EINVAL;
+        goto fail;
+    }
+
+    /* Set/unset flags that we can with fcntl */
+    i = 0;
+    while (setfl_flags[i] != 0) {
+        if (flags & setfl_flags[i]) {
+            dup_flags = (dup_flags | setfl_flags[i]);
+        } else {
+            dup_flags = (dup_flags & ~setfl_flags[i]);
+        }
+        i++;
+    }
+
+    if (fcntl(ret, F_SETFL, dup_flags) == -1) {
+        goto fail;
+    }
+
+    /* Truncate the file in the cases that open() would truncate it */
+    if (flags & O_TRUNC ||
+            ((flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))) {
+        if (ftruncate(ret, 0) == -1) {
+            goto fail;
+        }
+    }
+
+    qemu_set_cloexec(ret);
+
+    return ret;
+
+fail:
+    serrno = errno;
+    if (ret != -1) {
+        close(ret);
+    }
+    errno = serrno;
+    return -1;
+}
 
 /*
  * Opens a file with FD_CLOEXEC set
@@ -84,6 +158,36 @@ int qemu_open(const char *name, int flags, ...)
     int ret;
     int mode = 0;
 
+#ifndef _WIN32
+    const char *fdset_id_str;
+
+    /* Attempt dup of fd from fd set */
+    if (strstart(name, "/dev/fdset/", &fdset_id_str)) {
+        int64_t fdset_id;
+        int fd, dupfd;
+
+        fdset_id = qemu_parse_fdset(fdset_id_str);
+        if (fdset_id == -1) {
+            errno = EINVAL;
+            return -1;
+        }
+
+        fd = monitor_fdset_get_fd(default_mon, fdset_id, flags);
+        if (fd == -1) {
+            return -1;
+        }
+
+        dupfd = qemu_dup(fd, flags);
+        if (fd == -1) {
+            return -1;
+        }
+
+        monitor_fdset_increment_refcount(default_mon, fdset_id);
+
+        return dupfd;
+    }
+#endif
+
     if (flags & O_CREAT) {
         va_list ap;
 
@@ -104,8 +208,40 @@ int qemu_open(const char *name, int flags, ...)
     return ret;
 }
 
-int qemu_close(int fd)
+int qemu_close(int fd, const char *name)
 {
+    const char *fdset_id_str;
+
+    /* Close fd that was dup'd from an fdset */
+    if (strstart(name, "/dev/fdset/", &fdset_id_str)) {
+        int ret;
+        int64_t fdset_id;
+        int fdset_fd;
+        int flags;
+
+        fdset_id = qemu_parse_fdset(fdset_id_str);
+        if (fdset_id == -1) {
+            return -1;
+        }
+
+        flags = fcntl(fd, F_GETFL);
+        if (flags == -1) {
+            return -1;
+        }
+
+        fdset_fd = monitor_fdset_get_fd(default_mon, fdset_id, flags);
+        if (fdset_fd == -1) {
+            return -1;
+        }
+
+        ret = close(fd);
+        if (ret == 0) {
+            monitor_fdset_decrement_refcount(default_mon, fdset_id);
+        }
+
+        return ret;
+    }
+
     return close(fd);
 }
 
diff --git a/qemu-common.h b/qemu-common.h
index c8ddf2f..6b4123c 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -147,6 +147,7 @@ int qemu_fls(int i);
 int qemu_fdatasync(int fd);
 int fcntl_setfl(int fd, int flag);
 int qemu_parse_fd(const char *param);
+int qemu_parse_fdset(const char *param);
 
 /*
  * strtosz() suffixes used to specify the default treatment of an
@@ -188,7 +189,7 @@ const char *path(const char *pathname);
 void *qemu_oom_check(void *ptr);
 
 int qemu_open(const char *name, int flags, ...);
-int qemu_close(int fd);
+int qemu_close(int fd, const char *name);
 ssize_t qemu_write_full(int fd, const void *buf, size_t count)
     QEMU_WARN_UNUSED_RESULT;
 ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags)
diff --git a/qemu-tool.c b/qemu-tool.c
index 318c5fc..f4db9db 100644
--- a/qemu-tool.c
+++ b/qemu-tool.c
@@ -31,6 +31,7 @@ struct QEMUBH
 };
 
 Monitor *cur_mon;
+Monitor *default_mon;
 
 int monitor_cur_is_qmp(void)
 {
@@ -57,6 +58,17 @@ void monitor_protocol_event(MonitorEvent event, QObject *data)
 {
 }
 
+int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
+{
+    return -1;
+}
+void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
+{
+}
+void monitor_fdset_decrement_refcount(Monitor *mon, int64_t fdset_id)
+{
+}
+
 int64_t cpu_get_clock(void)
 {
     return qemu_get_clock_ns(rt_clock);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets Corey Bryant
@ 2012-07-23 13:14   ` Corey Bryant
  2012-08-02 22:21     ` Corey Bryant
  2012-07-24 12:07   ` Kevin Wolf
  2012-07-25 19:43   ` Eric Blake
  2 siblings, 1 reply; 36+ messages in thread
From: Corey Bryant @ 2012-07-23 13:14 UTC (permalink / raw)
  To: Corey Bryant
  Cc: kwolf, aliguori, stefanha, libvir-list, qemu-devel, lcapitulino, eblake



On 07/23/2012 09:08 AM, Corey Bryant wrote:
> When qemu_open is passed a filename of the "/dev/fdset/nnn"
> format (where nnn is the fdset ID), an fd with matching access
> mode flags will be searched for within the specified monitor
> fd set.  If the fd is found, a dup of the fd will be returned
> from qemu_open.
>
> Each fd set has a reference count.  The purpose of the reference
> count is to determine if an fd set contains file descriptors that
> have open dup() references that have not yet been closed.  It is
> incremented on qemu_open and decremented on qemu_close.  It is
> not until the refcount is zero that file desriptors in an fd set
> can be closed.  If an fd set has dup() references open, then we
> must keep the other fds in the fd set open in case a reopen
> of the file occurs that requires an fd with a different access
> mode.
>
> Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
>
> v2:
>   -Get rid of file_open and move dup code to qemu_open
>    (kwolf@redhat.com)
>   -Use strtol wrapper instead of atoi (kwolf@redhat.com)
>
> v3:
>   -Add note about fd leakage (eblake@redhat.com)
>
> v4
>   -Moved patch to be later in series (lcapitulino@redhat.com)
>   -Update qemu_open to check access mode flags and set flags that
>    can be set (eblake@redhat.com, kwolf@redhat.com)
>
> v5:
>   -This patch was overhauled quite a bit in this version, with
>    the addition of fd set and refcount support.
>   -Use qemu_set_cloexec() on dup'd fd (eblake@redhat.com)
>   -Modify flags set by fcntl on dup'd fd (eblake@redhat.com)
>   -Reduce syscalls when setting flags for dup'd fd (eblake@redhat.com)
>   -Fix O_RDWR, O_RDONLY, O_WRONLY checks (eblake@redhat.com)
> ---
>   block/raw-posix.c |   24 +++++-----
>   block/raw-win32.c |    2 +-
>   block/vmdk.c      |    4 +-
>   block/vpc.c       |    2 +-
>   block/vvfat.c     |   12 ++---
>   cutils.c          |    5 ++
>   monitor.c         |   85 +++++++++++++++++++++++++++++++++
>   monitor.h         |    4 ++
>   osdep.c           |  138 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>   qemu-common.h     |    3 +-
>   qemu-tool.c       |   12 +++++
>   11 files changed, 267 insertions(+), 24 deletions(-)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index a172de3..5d0a801 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -271,7 +271,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>   out_free_buf:
>       qemu_vfree(s->aligned_buf);
>   out_close:
> -    qemu_close(fd);
> +    qemu_close(fd, filename);
>       return -errno;
>   }
>
> @@ -376,7 +376,7 @@ static void raw_close(BlockDriverState *bs)
>   {
>       BDRVRawState *s = bs->opaque;
>       if (s->fd >= 0) {
> -        qemu_close(s->fd);
> +        qemu_close(s->fd, bs->filename);
>           s->fd = -1;
>           if (s->aligned_buf != NULL)
>               qemu_vfree(s->aligned_buf);
> @@ -580,7 +580,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
>           if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) {
>               result = -errno;
>           }
> -        if (qemu_close(fd) != 0) {
> +        if (qemu_close(fd, filename) != 0) {
>               result = -errno;
>           }
>       }
> @@ -850,7 +850,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
>               if (fd < 0) {
>                   bsdPath[strlen(bsdPath)-1] = '1';
>               } else {
> -                qemu_close(fd);
> +                qemu_close(fd, bsdPath);
>               }
>               filename = bsdPath;
>           }
> @@ -889,7 +889,7 @@ static int fd_open(BlockDriverState *bs)
>       last_media_present = (s->fd >= 0);
>       if (s->fd >= 0 &&
>           (get_clock() - s->fd_open_time) >= FD_OPEN_TIMEOUT) {
> -        qemu_close(s->fd);
> +        qemu_close(s->fd, bs->filename);
>           s->fd = -1;
>   #ifdef DEBUG_FLOPPY
>           printf("Floppy closed\n");
> @@ -988,7 +988,7 @@ static int hdev_create(const char *filename, QEMUOptionParameter *options)
>       else if (lseek(fd, 0, SEEK_END) < total_size * BDRV_SECTOR_SIZE)
>           ret = -ENOSPC;
>
> -    qemu_close(fd);
> +    qemu_close(fd, filename);
>       return ret;
>   }
>
> @@ -1038,7 +1038,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
>           return ret;
>
>       /* close fd so that we can reopen it as needed */
> -    qemu_close(s->fd);
> +    qemu_close(s->fd, filename);
>       s->fd = -1;
>       s->fd_media_changed = 1;
>
> @@ -1070,7 +1070,7 @@ static int floppy_probe_device(const char *filename)
>           prio = 100;
>
>   outc:
> -    qemu_close(fd);
> +    qemu_close(fd, filename);
>   out:
>       return prio;
>   }
> @@ -1105,14 +1105,14 @@ static void floppy_eject(BlockDriverState *bs, bool eject_flag)
>       int fd;
>
>       if (s->fd >= 0) {
> -        qemu_close(s->fd);
> +        qemu_close(s->fd, bs->filename);
>           s->fd = -1;
>       }
>       fd = qemu_open(bs->filename, s->open_flags | O_NONBLOCK);
>       if (fd >= 0) {
>           if (ioctl(fd, FDEJECT, 0) < 0)
>               perror("FDEJECT");
> -        qemu_close(fd);
> +        qemu_close(fd, bs->filename);
>       }
>   }
>
> @@ -1173,7 +1173,7 @@ static int cdrom_probe_device(const char *filename)
>           prio = 100;
>
>   outc:
> -    qemu_close(fd);
> +    qemu_close(fd, filename);
>   out:
>       return prio;
>   }
> @@ -1281,7 +1281,7 @@ static int cdrom_reopen(BlockDriverState *bs)
>        * FreeBSD seems to not notice sometimes...
>        */
>       if (s->fd >= 0)
> -        qemu_close(s->fd);
> +        qemu_close(s->fd, bs->filename);
>       fd = qemu_open(bs->filename, s->open_flags, 0644);
>       if (fd < 0) {
>           s->fd = -1;
> diff --git a/block/raw-win32.c b/block/raw-win32.c
> index c56bf83..b795871 100644
> --- a/block/raw-win32.c
> +++ b/block/raw-win32.c
> @@ -261,7 +261,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
>           return -EIO;
>       set_sparse(fd);
>       ftruncate(fd, total_size * 512);
> -    qemu_close(fd);
> +    qemu_close(fd, filename);
>       return 0;
>   }
>
> diff --git a/block/vmdk.c b/block/vmdk.c
> index daee426..7f1206b 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1258,7 +1258,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
>
>       ret = 0;
>    exit:
> -    qemu_close(fd);
> +    qemu_close(fd, filename);
>       return ret;
>   }
>
> @@ -1506,7 +1506,7 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options)
>       }
>       ret = 0;
>   exit:
> -    qemu_close(fd);
> +    qemu_close(fd, filename);
>       return ret;
>   }
>
> diff --git a/block/vpc.c b/block/vpc.c
> index c0b82c4..20f648a 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -744,7 +744,7 @@ static int vpc_create(const char *filename, QEMUOptionParameter *options)
>       }
>
>    fail:
> -    qemu_close(fd);
> +    qemu_close(fd, filename);
>       return ret;
>   }
>
> diff --git a/block/vvfat.c b/block/vvfat.c
> index 59d3c5b..cbc7543 100644
> --- a/block/vvfat.c
> +++ b/block/vvfat.c
> @@ -1105,7 +1105,7 @@ static inline void vvfat_close_current_file(BDRVVVFATState *s)
>       if(s->current_mapping) {
>   	s->current_mapping = NULL;
>   	if (s->current_fd) {
> -		qemu_close(s->current_fd);
> +		qemu_close(s->current_fd, s->current_mapping->path);
>   		s->current_fd = 0;
>   	}
>       }
> @@ -2230,7 +2230,7 @@ static int commit_one_file(BDRVVVFATState* s,
>       }
>       if (offset > 0) {
>           if (lseek(fd, offset, SEEK_SET) != offset) {
> -            qemu_close(fd);
> +            qemu_close(fd, mapping->path);
>               g_free(cluster);
>               return -3;
>           }
> @@ -2251,13 +2251,13 @@ static int commit_one_file(BDRVVVFATState* s,
>   	    (uint8_t*)cluster, (rest_size + 0x1ff) / 0x200);
>
>           if (ret < 0) {
> -            qemu_close(fd);
> +            qemu_close(fd, mapping->path);
>               g_free(cluster);
>               return ret;
>           }
>
>           if (write(fd, cluster, rest_size) < 0) {
> -            qemu_close(fd);
> +            qemu_close(fd, mapping->path);
>               g_free(cluster);
>               return -2;
>           }
> @@ -2268,11 +2268,11 @@ static int commit_one_file(BDRVVVFATState* s,
>
>       if (ftruncate(fd, size)) {
>           perror("ftruncate()");
> -        qemu_close(fd);
> +        qemu_close(fd, mapping->path);
>           g_free(cluster);
>           return -4;
>       }
> -    qemu_close(fd);
> +    qemu_close(fd, mapping->path);
>       g_free(cluster);
>
>       return commit_mappings(s, first_cluster, dir_index);
> diff --git a/cutils.c b/cutils.c
> index e2bc1b8..4e6bfdc 100644
> --- a/cutils.c
> +++ b/cutils.c
> @@ -375,3 +375,8 @@ int qemu_parse_fd(const char *param)
>       }
>       return fd;
>   }
> +
> +int qemu_parse_fdset(const char *param)
> +{
> +    return qemu_parse_fd(param);
> +}
> diff --git a/monitor.c b/monitor.c
> index 30b085f..fddd2b5 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2551,6 +2551,91 @@ static void monitor_fdsets_set_in_use(Monitor *mon, bool in_use)
>       }
>   }
>
> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
> +{
> +    mon_fdset_t *mon_fdset;
> +
> +    if (!mon) {
> +        return;
> +    }
> +
> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
> +        if (mon_fdset->id == fdset_id) {
> +            mon_fdset->refcount++;
> +            break;
> +        }
> +    }
> +}
> +
> +void monitor_fdset_decrement_refcount(Monitor *mon, int64_t fdset_id)
> +{
> +    mon_fdset_t *mon_fdset;
> +
> +    if (!mon) {
> +        return;
> +    }
> +
> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
> +        if (mon_fdset->id == fdset_id) {
> +            mon_fdset->refcount--;
> +            if (mon_fdset->refcount == 0) {
> +                monitor_fdset_cleanup(mon_fdset);
> +            }
> +            break;
> +        }
> +    }
> +}
> +
> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
> +{
> +    mon_fdset_t *mon_fdset;
> +    mon_fdset_fd_t *mon_fdset_fd;
> +    int mon_fd_flags;
> +
> +    if (!mon) {
> +        errno = ENOENT;
> +        return -1;
> +    }
> +
> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
> +        if (mon_fdset->id != fdset_id) {
> +            continue;
> +        }
> +        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
> +            if (mon_fdset_fd->removed) {
> +                continue;
> +            }
> +
> +            mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
> +            if (mon_fd_flags == -1) {
> +                return -1;
> +            }
> +
> +            switch (flags & O_ACCMODE) {
> +            case O_RDWR:
> +                if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
> +                    return mon_fdset_fd->fd;
> +                }
> +                break;
> +            case O_RDONLY:
> +                if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
> +                    return mon_fdset_fd->fd;
> +                }
> +                break;
> +            case O_WRONLY:
> +                if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) {
> +                    return mon_fdset_fd->fd;
> +                }
> +                break;
> +            }
> +        }
> +        errno = EACCES;
> +        return -1;
> +    }
> +    errno = ENOENT;
> +    return -1;
> +}
> +
>   /* mon_cmds and info_cmds would be sorted at runtime */
>   static mon_cmd_t mon_cmds[] = {
>   #include "hmp-commands.h"
> diff --git a/monitor.h b/monitor.h
> index 5f4de1b..1efed38 100644
> --- a/monitor.h
> +++ b/monitor.h
> @@ -86,4 +86,8 @@ int qmp_qom_set(Monitor *mon, const QDict *qdict, QObject **ret);
>
>   int qmp_qom_get(Monitor *mon, const QDict *qdict, QObject **ret);
>
> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags);
> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id);
> +void monitor_fdset_decrement_refcount(Monitor *mon, int64_t fdset_id);
> +
>   #endif /* !MONITOR_H */
> diff --git a/osdep.c b/osdep.c
> index 7b09dff..0303cdd 100644
> --- a/osdep.c
> +++ b/osdep.c
> @@ -47,6 +47,7 @@ extern int madvise(caddr_t, size_t, int);
>   #include "qemu-common.h"
>   #include "trace.h"
>   #include "qemu_socket.h"
> +#include "monitor.h"
>
>   static const char *qemu_version = QEMU_VERSION;
>
> @@ -75,6 +76,79 @@ int qemu_madvise(void *addr, size_t len, int advice)
>   #endif
>   }
>
> +/*
> + * Dups an fd and sets the flags
> + */
> +static int qemu_dup(int fd, int flags)
> +{
> +    int i;
> +    int ret;
> +    int serrno;
> +    int dup_flags;
> +    int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
> +                          O_NONBLOCK, 0 };
> +
> +    if (flags & O_CLOEXEC) {
> +        ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
> +        if (ret == -1 && errno == EINVAL) {
> +            ret = dup(fd);
> +            if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) {
> +                goto fail;
> +            }
> +        }
> +    } else {
> +        ret = dup(fd);
> +    }
> +
> +    if (ret == -1) {
> +        goto fail;
> +    }
> +
> +    dup_flags = fcntl(ret, F_GETFL);
> +    if (dup_flags == -1) {
> +        goto fail;
> +    }
> +
> +    if ((flags & O_SYNC) != (dup_flags & O_SYNC)) {
> +        errno = EINVAL;
> +        goto fail;
> +    }
> +
> +    /* Set/unset flags that we can with fcntl */
> +    i = 0;
> +    while (setfl_flags[i] != 0) {
> +        if (flags & setfl_flags[i]) {
> +            dup_flags = (dup_flags | setfl_flags[i]);
> +        } else {
> +            dup_flags = (dup_flags & ~setfl_flags[i]);
> +        }
> +        i++;
> +    }
> +
> +    if (fcntl(ret, F_SETFL, dup_flags) == -1) {
> +        goto fail;
> +    }
> +
> +    /* Truncate the file in the cases that open() would truncate it */
> +    if (flags & O_TRUNC ||
> +            ((flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))) {
> +        if (ftruncate(ret, 0) == -1) {
> +            goto fail;
> +        }
> +    }
> +
> +    qemu_set_cloexec(ret);
> +
> +    return ret;
> +
> +fail:
> +    serrno = errno;
> +    if (ret != -1) {
> +        close(ret);
> +    }
> +    errno = serrno;
> +    return -1;
> +}
>
>   /*
>    * Opens a file with FD_CLOEXEC set
> @@ -84,6 +158,36 @@ int qemu_open(const char *name, int flags, ...)
>       int ret;
>       int mode = 0;
>
> +#ifndef _WIN32
> +    const char *fdset_id_str;
> +
> +    /* Attempt dup of fd from fd set */
> +    if (strstart(name, "/dev/fdset/", &fdset_id_str)) {
> +        int64_t fdset_id;
> +        int fd, dupfd;
> +
> +        fdset_id = qemu_parse_fdset(fdset_id_str);
> +        if (fdset_id == -1) {
> +            errno = EINVAL;
> +            return -1;
> +        }
> +
> +        fd = monitor_fdset_get_fd(default_mon, fdset_id, flags);

I know that use of default_mon in this patch is not correct, but I 
wanted to get these patches out for review. I used default_mon for 
testing because cur_mon wasn't pointing to the monitor I'd added fd sets 
to.  I need to figure out why.

> +        if (fd == -1) {
> +            return -1;
> +        }
> +
> +        dupfd = qemu_dup(fd, flags);
> +        if (fd == -1) {
> +            return -1;
> +        }
> +
> +        monitor_fdset_increment_refcount(default_mon, fdset_id);
> +
> +        return dupfd;
> +    }
> +#endif
> +
>       if (flags & O_CREAT) {
>           va_list ap;
>
> @@ -104,8 +208,40 @@ int qemu_open(const char *name, int flags, ...)
>       return ret;
>   }
>
> -int qemu_close(int fd)
> +int qemu_close(int fd, const char *name)
>   {
> +    const char *fdset_id_str;
> +
> +    /* Close fd that was dup'd from an fdset */
> +    if (strstart(name, "/dev/fdset/", &fdset_id_str)) {
> +        int ret;
> +        int64_t fdset_id;
> +        int fdset_fd;
> +        int flags;
> +
> +        fdset_id = qemu_parse_fdset(fdset_id_str);
> +        if (fdset_id == -1) {
> +            return -1;
> +        }
> +
> +        flags = fcntl(fd, F_GETFL);
> +        if (flags == -1) {
> +            return -1;
> +        }
> +
> +        fdset_fd = monitor_fdset_get_fd(default_mon, fdset_id, flags);
> +        if (fdset_fd == -1) {
> +            return -1;
> +        }
> +
> +        ret = close(fd);
> +        if (ret == 0) {
> +            monitor_fdset_decrement_refcount(default_mon, fdset_id);
> +        }
> +
> +        return ret;
> +    }
> +
>       return close(fd);
>   }
>
> diff --git a/qemu-common.h b/qemu-common.h
> index c8ddf2f..6b4123c 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -147,6 +147,7 @@ int qemu_fls(int i);
>   int qemu_fdatasync(int fd);
>   int fcntl_setfl(int fd, int flag);
>   int qemu_parse_fd(const char *param);
> +int qemu_parse_fdset(const char *param);
>
>   /*
>    * strtosz() suffixes used to specify the default treatment of an
> @@ -188,7 +189,7 @@ const char *path(const char *pathname);
>   void *qemu_oom_check(void *ptr);
>
>   int qemu_open(const char *name, int flags, ...);
> -int qemu_close(int fd);
> +int qemu_close(int fd, const char *name);
>   ssize_t qemu_write_full(int fd, const void *buf, size_t count)
>       QEMU_WARN_UNUSED_RESULT;
>   ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags)
> diff --git a/qemu-tool.c b/qemu-tool.c
> index 318c5fc..f4db9db 100644
> --- a/qemu-tool.c
> +++ b/qemu-tool.c
> @@ -31,6 +31,7 @@ struct QEMUBH
>   };
>
>   Monitor *cur_mon;
> +Monitor *default_mon;
>
>   int monitor_cur_is_qmp(void)
>   {
> @@ -57,6 +58,17 @@ void monitor_protocol_event(MonitorEvent event, QObject *data)
>   {
>   }
>
> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
> +{
> +    return -1;
> +}
> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
> +{
> +}
> +void monitor_fdset_decrement_refcount(Monitor *mon, int64_t fdset_id)
> +{
> +}
> +
>   int64_t cpu_get_clock(void)
>   {
>       return qemu_get_clock_ns(rt_clock);
>

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/6] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 1/6] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg Corey Bryant
@ 2012-07-23 22:50   ` Eric Blake
  2012-07-24  2:19     ` Corey Bryant
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2012-07-23 22:50 UTC (permalink / raw)
  To: Corey Bryant
  Cc: kwolf, aliguori, stefanha, libvir-list, qemu-devel, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 1451 bytes --]

On 07/23/2012 07:08 AM, Corey Bryant wrote:
> Set the close-on-exec flag for the file descriptor received
> via SCM_RIGHTS.
> 

> +++ b/qemu-char.c
> @@ -2263,9 +2263,17 @@ static ssize_t tcp_chr_recv(CharDriverState *chr, char *buf, size_t len)
>      msg.msg_control = &msg_control;
>      msg.msg_controllen = sizeof(msg_control);
>  
> +#ifdef MSG_CMSG_CLOEXEC
> +    ret = recvmsg(s->fd, &msg, MSG_CMSG_CLOEXEC);
> +#else
>      ret = recvmsg(s->fd, &msg, 0);
> -    if (ret > 0 && s->is_unix)
> +    if (ret > 0) {
> +        qemu_set_cloexec(s->fd);

Wrong fd.  You aren't changing cloexec on the socket (s->fd), but on the
fd that was received via msg (which you don't know at this point in time).

> +    }
> +#endif
> +    if (ret > 0 && s->is_unix) {
>          unix_process_msgfd(chr, &msg);

Only here do you know what fd you received.

I would write it more like:

int flags = 0;
#ifdef MSG_CMSG_CLOEXEC
flags |= MSG_CMSG_CLOEXEC
#endif
ret = recvmsg(s->fd, &msg, flags);
if (ret > 0 && s->is_unix) {
    unix_process_msgfd(chr, &msg);
#ifndef MSG_CMSG_CLOEXEC
    qemu_set_cloexec(/* fd determined from msg */)
#endif
}

which almost implies that unix_process_msgfd() should be the function
that sets cloexec, but without wasting the time doing so if recvmsg
already did the job.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/6] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg
  2012-07-23 22:50   ` Eric Blake
@ 2012-07-24  2:19     ` Corey Bryant
  0 siblings, 0 replies; 36+ messages in thread
From: Corey Bryant @ 2012-07-24  2:19 UTC (permalink / raw)
  To: Eric Blake
  Cc: kwolf, aliguori, stefanha, libvir-list, qemu-devel, lcapitulino



On 07/23/2012 06:50 PM, Eric Blake wrote:
> On 07/23/2012 07:08 AM, Corey Bryant wrote:
>> Set the close-on-exec flag for the file descriptor received
>> via SCM_RIGHTS.
>>
>
>> +++ b/qemu-char.c
>> @@ -2263,9 +2263,17 @@ static ssize_t tcp_chr_recv(CharDriverState *chr, char *buf, size_t len)
>>       msg.msg_control = &msg_control;
>>       msg.msg_controllen = sizeof(msg_control);
>>
>> +#ifdef MSG_CMSG_CLOEXEC
>> +    ret = recvmsg(s->fd, &msg, MSG_CMSG_CLOEXEC);
>> +#else
>>       ret = recvmsg(s->fd, &msg, 0);
>> -    if (ret > 0 && s->is_unix)
>> +    if (ret > 0) {
>> +        qemu_set_cloexec(s->fd);
>
> Wrong fd.  You aren't changing cloexec on the socket (s->fd), but on the
> fd that was received via msg (which you don't know at this point in time).
>

Ugh, that's bad.

>> +    }
>> +#endif
>> +    if (ret > 0 && s->is_unix) {
>>           unix_process_msgfd(chr, &msg);
>
> Only here do you know what fd you received.
>
> I would write it more like:
>
> int flags = 0;
> #ifdef MSG_CMSG_CLOEXEC
> flags |= MSG_CMSG_CLOEXEC
> #endif
> ret = recvmsg(s->fd, &msg, flags);
> if (ret > 0 && s->is_unix) {
>      unix_process_msgfd(chr, &msg);
> #ifndef MSG_CMSG_CLOEXEC
>      qemu_set_cloexec(/* fd determined from msg */)
> #endif
> }
>
> which almost implies that unix_process_msgfd() should be the function
> that sets cloexec, but without wasting the time doing so if recvmsg
> already did the job.
>

Thanks for the suggestion and catching this.  I'll take this into 
account in the next version.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets Corey Bryant
  2012-07-23 13:14   ` Corey Bryant
@ 2012-07-24 12:07   ` Kevin Wolf
  2012-07-25  3:41     ` Corey Bryant
  2012-07-25 19:43   ` Eric Blake
  2 siblings, 1 reply; 36+ messages in thread
From: Kevin Wolf @ 2012-07-24 12:07 UTC (permalink / raw)
  To: Corey Bryant
  Cc: aliguori, stefanha, libvir-list, qemu-devel, lcapitulino, eblake

Am 23.07.2012 15:08, schrieb Corey Bryant:
> When qemu_open is passed a filename of the "/dev/fdset/nnn"
> format (where nnn is the fdset ID), an fd with matching access
> mode flags will be searched for within the specified monitor
> fd set.  If the fd is found, a dup of the fd will be returned
> from qemu_open.
> 
> Each fd set has a reference count.  The purpose of the reference
> count is to determine if an fd set contains file descriptors that
> have open dup() references that have not yet been closed.  It is
> incremented on qemu_open and decremented on qemu_close.  It is
> not until the refcount is zero that file desriptors in an fd set
> can be closed.  If an fd set has dup() references open, then we
> must keep the other fds in the fd set open in case a reopen
> of the file occurs that requires an fd with a different access
> mode.
> 
> Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
> 
> v2:
>  -Get rid of file_open and move dup code to qemu_open
>   (kwolf@redhat.com)
>  -Use strtol wrapper instead of atoi (kwolf@redhat.com)
> 
> v3:
>  -Add note about fd leakage (eblake@redhat.com)
> 
> v4
>  -Moved patch to be later in series (lcapitulino@redhat.com)
>  -Update qemu_open to check access mode flags and set flags that
>   can be set (eblake@redhat.com, kwolf@redhat.com)
> 
> v5:
>  -This patch was overhauled quite a bit in this version, with
>   the addition of fd set and refcount support.
>  -Use qemu_set_cloexec() on dup'd fd (eblake@redhat.com)
>  -Modify flags set by fcntl on dup'd fd (eblake@redhat.com)
>  -Reduce syscalls when setting flags for dup'd fd (eblake@redhat.com)
>  -Fix O_RDWR, O_RDONLY, O_WRONLY checks (eblake@redhat.com)
> ---
>  block/raw-posix.c |   24 +++++-----
>  block/raw-win32.c |    2 +-
>  block/vmdk.c      |    4 +-
>  block/vpc.c       |    2 +-
>  block/vvfat.c     |   12 ++---
>  cutils.c          |    5 ++
>  monitor.c         |   85 +++++++++++++++++++++++++++++++++
>  monitor.h         |    4 ++
>  osdep.c           |  138 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  qemu-common.h     |    3 +-
>  qemu-tool.c       |   12 +++++
>  11 files changed, 267 insertions(+), 24 deletions(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index a172de3..5d0a801 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -271,7 +271,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>  out_free_buf:
>      qemu_vfree(s->aligned_buf);
>  out_close:
> -    qemu_close(fd);
> +    qemu_close(fd, filename);
>      return -errno;
>  }

Hm, not a nice interface where qemu_close() needs the filename and
(worse) could be given a wrong filename. Maybe it would be better to
maintain a list of fd -> fdset mappings in qemu_open/close?

But if we decided to keep it like this, please use the right interface
from the beginning in patch 5 instead of updating it here.

> @@ -2551,6 +2551,91 @@ static void monitor_fdsets_set_in_use(Monitor *mon, bool in_use)
>      }
>  }
>  
> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
> +{
> +    mon_fdset_t *mon_fdset;
> +
> +    if (!mon) {
> +        return;
> +    }
> +
> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
> +        if (mon_fdset->id == fdset_id) {
> +            mon_fdset->refcount++;
> +            break;
> +        }
> +    }
> +}
> +
> +void monitor_fdset_decrement_refcount(Monitor *mon, int64_t fdset_id)
> +{
> +    mon_fdset_t *mon_fdset;
> +
> +    if (!mon) {
> +        return;
> +    }
> +
> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
> +        if (mon_fdset->id == fdset_id) {
> +            mon_fdset->refcount--;
> +            if (mon_fdset->refcount == 0) {
> +                monitor_fdset_cleanup(mon_fdset);
> +            }
> +            break;
> +        }
> +    }
> +}

These two functions are almost the same. Would a
monitor_fdset_update_refcount(mon, fdset_id, value) make sense? These
functions could then be kept as thin wrappers around it, or they could
even be dropped completely.

> +
> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
> +{
> +    mon_fdset_t *mon_fdset;
> +    mon_fdset_fd_t *mon_fdset_fd;
> +    int mon_fd_flags;
> +
> +    if (!mon) {
> +        errno = ENOENT;
> +        return -1;
> +    }
> +
> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
> +        if (mon_fdset->id != fdset_id) {
> +            continue;
> +        }
> +        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
> +            if (mon_fdset_fd->removed) {
> +                continue;
> +            }
> +
> +            mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
> +            if (mon_fd_flags == -1) {
> +                return -1;
> +            }
> +
> +            switch (flags & O_ACCMODE) {
> +            case O_RDWR:
> +                if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
> +                    return mon_fdset_fd->fd;
> +                }
> +                break;
> +            case O_RDONLY:
> +                if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
> +                    return mon_fdset_fd->fd;
> +                }
> +                break;
> +            case O_WRONLY:
> +                if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) {
> +                    return mon_fdset_fd->fd;
> +                }
> +                break;
> +            }

I think you mean:

  if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
      return mon_fdset_fd->fd;
  }

What about other flags that cannot be set with fcntl(), like O_SYNC on
older kernels or possibly non-Linux? (The block layer doesn't use it any
more, but I think we want to keep the function generally useful)

> +        }
> +        errno = EACCES;
> +        return -1;
> +    }
> +    errno = ENOENT;
> +    return -1;
> +}
> +
>  /* mon_cmds and info_cmds would be sorted at runtime */
>  static mon_cmd_t mon_cmds[] = {
>  #include "hmp-commands.h"

> @@ -75,6 +76,79 @@ int qemu_madvise(void *addr, size_t len, int advice)
>  #endif
>  }
>  
> +/*
> + * Dups an fd and sets the flags
> + */
> +static int qemu_dup(int fd, int flags)
> +{
> +    int i;
> +    int ret;
> +    int serrno;
> +    int dup_flags;
> +    int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
> +                          O_NONBLOCK, 0 };
> +
> +    if (flags & O_CLOEXEC) {
> +        ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
> +        if (ret == -1 && errno == EINVAL) {
> +            ret = dup(fd);
> +            if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) {
> +                goto fail;
> +            }
> +        }
> +    } else {
> +        ret = dup(fd);
> +    }
> +
> +    if (ret == -1) {
> +        goto fail;
> +    }
> +
> +    dup_flags = fcntl(ret, F_GETFL);
> +    if (dup_flags == -1) {
> +        goto fail;
> +    }
> +
> +    if ((flags & O_SYNC) != (dup_flags & O_SYNC)) {
> +        errno = EINVAL;
> +        goto fail;
> +    }

It's worth trying to set it before failing, newer kernels can do it. But
as I said above, if you can fail here, it makes sense to consider O_SYNC
when selecting the right file descriptor from the fdset.

> +    /* Set/unset flags that we can with fcntl */
> +    i = 0;
> +    while (setfl_flags[i] != 0) {
> +        if (flags & setfl_flags[i]) {
> +            dup_flags = (dup_flags | setfl_flags[i]);
> +        } else {
> +            dup_flags = (dup_flags & ~setfl_flags[i]);
> +        }
> +        i++;
> +    }

What about this instead of the loop:

  int setfl_flags = O_APPEND | O_ASYNC | ... ;

  dup_flags &= ~setfl_flags;
  dup_flags |= (flags & setfl_flags);


> +
> +    if (fcntl(ret, F_SETFL, dup_flags) == -1) {
> +        goto fail;
> +    }
> +
> +    /* Truncate the file in the cases that open() would truncate it */
> +    if (flags & O_TRUNC ||
> +            ((flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))) {
> +        if (ftruncate(ret, 0) == -1) {
> +            goto fail;
> +        }
> +    }

O_CREAT | O_EXCL kind of loses its meaning here, but okay, it's hard to
do better with file descriptors.

> +
> +    qemu_set_cloexec(ret);

Wait... If O_CLOEXEC is set, you set the flag immediately and if it
isn't you set it at the end of the function? What's the intended use
case for not using O_CLOEXEC then?

Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/6] file descriptor passing using fd sets
  2012-07-23 13:07 [Qemu-devel] [PATCH v5 0/6] file descriptor passing using fd sets Corey Bryant
                   ` (5 preceding siblings ...)
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets Corey Bryant
@ 2012-07-24 12:09 ` Kevin Wolf
  2012-07-25  3:42   ` Corey Bryant
  6 siblings, 1 reply; 36+ messages in thread
From: Kevin Wolf @ 2012-07-24 12:09 UTC (permalink / raw)
  To: Corey Bryant
  Cc: aliguori, stefanha, libvir-list, qemu-devel, lcapitulino, eblake

Am 23.07.2012 15:07, schrieb Corey Bryant:
> Corey Bryant (6):
>   qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg
>   qapi: Introduce add-fd, remove-fd, query-fdsets
>   monitor: Clean up fd sets on monitor disconnect
>   block: Convert open calls to qemu_open
>   block: Convert close calls to qemu_close
>   block: Enable qemu_open/close to work with fd sets
> 
>  block/raw-posix.c |   42 ++++-----
>  block/raw-win32.c |    6 +-
>  block/vdi.c       |    5 +-
>  block/vmdk.c      |   25 +++---
>  block/vpc.c       |    4 +-
>  block/vvfat.c     |   16 ++--
>  cutils.c          |    5 ++
>  monitor.c         |  244 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  monitor.h         |    4 +
>  osdep.c           |  141 +++++++++++++++++++++++++++++++
>  qapi-schema.json  |   97 +++++++++++++++++++++
>  qemu-char.c       |   10 ++-
>  qemu-common.h     |    2 +
>  qemu-tool.c       |   12 +++
>  qmp-commands.hx   |  121 ++++++++++++++++++++++++++
>  savevm.c          |    4 +-
>  16 files changed, 684 insertions(+), 54 deletions(-)

Apart from the points I commented on in patch 6, and what Eric and you
found, this looks good to me.

Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-24 12:07   ` Kevin Wolf
@ 2012-07-25  3:41     ` Corey Bryant
  2012-07-25  8:22       ` Kevin Wolf
  0 siblings, 1 reply; 36+ messages in thread
From: Corey Bryant @ 2012-07-25  3:41 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: aliguori, stefanha, libvir-list, qemu-devel, lcapitulino, eblake



On 07/24/2012 08:07 AM, Kevin Wolf wrote:
> Am 23.07.2012 15:08, schrieb Corey Bryant:
>> When qemu_open is passed a filename of the "/dev/fdset/nnn"
>> format (where nnn is the fdset ID), an fd with matching access
>> mode flags will be searched for within the specified monitor
>> fd set.  If the fd is found, a dup of the fd will be returned
>> from qemu_open.
>>
>> Each fd set has a reference count.  The purpose of the reference
>> count is to determine if an fd set contains file descriptors that
>> have open dup() references that have not yet been closed.  It is
>> incremented on qemu_open and decremented on qemu_close.  It is
>> not until the refcount is zero that file desriptors in an fd set
>> can be closed.  If an fd set has dup() references open, then we
>> must keep the other fds in the fd set open in case a reopen
>> of the file occurs that requires an fd with a different access
>> mode.
>>
>> Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
>>
>> v2:
>>   -Get rid of file_open and move dup code to qemu_open
>>    (kwolf@redhat.com)
>>   -Use strtol wrapper instead of atoi (kwolf@redhat.com)
>>
>> v3:
>>   -Add note about fd leakage (eblake@redhat.com)
>>
>> v4
>>   -Moved patch to be later in series (lcapitulino@redhat.com)
>>   -Update qemu_open to check access mode flags and set flags that
>>    can be set (eblake@redhat.com, kwolf@redhat.com)
>>
>> v5:
>>   -This patch was overhauled quite a bit in this version, with
>>    the addition of fd set and refcount support.
>>   -Use qemu_set_cloexec() on dup'd fd (eblake@redhat.com)
>>   -Modify flags set by fcntl on dup'd fd (eblake@redhat.com)
>>   -Reduce syscalls when setting flags for dup'd fd (eblake@redhat.com)
>>   -Fix O_RDWR, O_RDONLY, O_WRONLY checks (eblake@redhat.com)
>> ---
>>   block/raw-posix.c |   24 +++++-----
>>   block/raw-win32.c |    2 +-
>>   block/vmdk.c      |    4 +-
>>   block/vpc.c       |    2 +-
>>   block/vvfat.c     |   12 ++---
>>   cutils.c          |    5 ++
>>   monitor.c         |   85 +++++++++++++++++++++++++++++++++
>>   monitor.h         |    4 ++
>>   osdep.c           |  138 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   qemu-common.h     |    3 +-
>>   qemu-tool.c       |   12 +++++
>>   11 files changed, 267 insertions(+), 24 deletions(-)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index a172de3..5d0a801 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -271,7 +271,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>>   out_free_buf:
>>       qemu_vfree(s->aligned_buf);
>>   out_close:
>> -    qemu_close(fd);
>> +    qemu_close(fd, filename);
>>       return -errno;
>>   }
>
> Hm, not a nice interface where qemu_close() needs the filename and
> (worse) could be given a wrong filename. Maybe it would be better to
> maintain a list of fd -> fdset mappings in qemu_open/close?
>

I agree, I don't really like it either.

We already have a list of fd -> fdset mappings (mon_fdset_fd_t -> 
mon_fdset_t).  Would it be too costly to loop through all the fdsets/fds 
at the beginning of every qemu_close()?

> But if we decided to keep it like this, please use the right interface
> from the beginning in patch 5 instead of updating it here.
>

Ok

>> @@ -2551,6 +2551,91 @@ static void monitor_fdsets_set_in_use(Monitor *mon, bool in_use)
>>       }
>>   }
>>
>> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
>> +{
>> +    mon_fdset_t *mon_fdset;
>> +
>> +    if (!mon) {
>> +        return;
>> +    }
>> +
>> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
>> +        if (mon_fdset->id == fdset_id) {
>> +            mon_fdset->refcount++;
>> +            break;
>> +        }
>> +    }
>> +}
>> +
>> +void monitor_fdset_decrement_refcount(Monitor *mon, int64_t fdset_id)
>> +{
>> +    mon_fdset_t *mon_fdset;
>> +
>> +    if (!mon) {
>> +        return;
>> +    }
>> +
>> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
>> +        if (mon_fdset->id == fdset_id) {
>> +            mon_fdset->refcount--;
>> +            if (mon_fdset->refcount == 0) {
>> +                monitor_fdset_cleanup(mon_fdset);
>> +            }
>> +            break;
>> +        }
>> +    }
>> +}
>
> These two functions are almost the same. Would a
> monitor_fdset_update_refcount(mon, fdset_id, value) make sense? These
> functions could then be kept as thin wrappers around it, or they could
> even be dropped completely.
>

This makes sense and I'll try one of these approaches.  I actually 
started to do something along these lines in v5 but reverted back to the 
two independent functions because it was easier to read the code.

>> +
>> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
>> +{
>> +    mon_fdset_t *mon_fdset;
>> +    mon_fdset_fd_t *mon_fdset_fd;
>> +    int mon_fd_flags;
>> +
>> +    if (!mon) {
>> +        errno = ENOENT;
>> +        return -1;
>> +    }
>> +
>> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
>> +        if (mon_fdset->id != fdset_id) {
>> +            continue;
>> +        }
>> +        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
>> +            if (mon_fdset_fd->removed) {
>> +                continue;
>> +            }
>> +
>> +            mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
>> +            if (mon_fd_flags == -1) {
>> +                return -1;
>> +            }
>> +
>> +            switch (flags & O_ACCMODE) {
>> +            case O_RDWR:
>> +                if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
>> +                    return mon_fdset_fd->fd;
>> +                }
>> +                break;
>> +            case O_RDONLY:
>> +                if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
>> +                    return mon_fdset_fd->fd;
>> +                }
>> +                break;
>> +            case O_WRONLY:
>> +                if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) {
>> +                    return mon_fdset_fd->fd;
>> +                }
>> +                break;
>> +            }
>
> I think you mean:
>
>    if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
>        return mon_fdset_fd->fd;
>    }

Yes, that would be a bit simpler wouldn't it. :)

>
> What about other flags that cannot be set with fcntl(), like O_SYNC on
> older kernels or possibly non-Linux? (The block layer doesn't use it any
> more, but I think we want to keep the function generally useful)
>

I see what you're getting at here.  Basically you could have 2 fds in an 
fdset with the same access mode flags, but one has O_SYNC on and the 
other has O_SYNC off.  That seems like it would make sense to implement. 
  As a work-around, I think a user could just create a separate fdset 
for the same file with different O_SYNC value.  But from a client 
perspective, it would be nicer to have this taken care of for you.

>> +        }
>> +        errno = EACCES;
>> +        return -1;
>> +    }
>> +    errno = ENOENT;
>> +    return -1;
>> +}
>> +
>>   /* mon_cmds and info_cmds would be sorted at runtime */
>>   static mon_cmd_t mon_cmds[] = {
>>   #include "hmp-commands.h"
>
>> @@ -75,6 +76,79 @@ int qemu_madvise(void *addr, size_t len, int advice)
>>   #endif
>>   }
>>
>> +/*
>> + * Dups an fd and sets the flags
>> + */
>> +static int qemu_dup(int fd, int flags)
>> +{
>> +    int i;
>> +    int ret;
>> +    int serrno;
>> +    int dup_flags;
>> +    int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
>> +                          O_NONBLOCK, 0 };
>> +
>> +    if (flags & O_CLOEXEC) {
>> +        ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
>> +        if (ret == -1 && errno == EINVAL) {
>> +            ret = dup(fd);
>> +            if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) {
>> +                goto fail;
>> +            }
>> +        }
>> +    } else {
>> +        ret = dup(fd);
>> +    }
>> +
>> +    if (ret == -1) {
>> +        goto fail;
>> +    }
>> +
>> +    dup_flags = fcntl(ret, F_GETFL);
>> +    if (dup_flags == -1) {
>> +        goto fail;
>> +    }
>> +
>> +    if ((flags & O_SYNC) != (dup_flags & O_SYNC)) {
>> +        errno = EINVAL;
>> +        goto fail;
>> +    }
>
> It's worth trying to set it before failing, newer kernels can do it. But
> as I said above, if you can fail here, it makes sense to consider O_SYNC
> when selecting the right file descriptor from the fdset.
>

I'm on a 3.4.4 Fedora kernel that doesn't appear to support 
fcntl(O_SYNC), but perhaps I'm doing something wrong.  Here's my test 
code (shortened for simplicty):

int main() {
     int fd;
     int flags;

     fd = open("/tmp/corey", O_RDWR | O_CREAT,
               S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);

     flags = fcntl(fd, F_GETFL);
     printf("flags=%08x\n", flags); //A

     fcntl(fd, F_SETFL, O_SYNC);
     printf("O_SYNC=%08x\n", O_SYNC);

     flags = fcntl(fd, F_GETFL);
     printf("flags=%08x\n", flags); //B
}

fcntl doesn't fail, but the flags I print out at A are the same as the 
flags printed out at B, so it appears that O_SYNC doesn't get set.

>> +    /* Set/unset flags that we can with fcntl */
>> +    i = 0;
>> +    while (setfl_flags[i] != 0) {
>> +        if (flags & setfl_flags[i]) {
>> +            dup_flags = (dup_flags | setfl_flags[i]);
>> +        } else {
>> +            dup_flags = (dup_flags & ~setfl_flags[i]);
>> +        }
>> +        i++;
>> +    }
>
> What about this instead of the loop:
>
>    int setfl_flags = O_APPEND | O_ASYNC | ... ;
>
>    dup_flags &= ~setfl_flags;
>    dup_flags |= (flags & setfl_flags);
>
>

I like your suggestion, it's much simpler.

>> +
>> +    if (fcntl(ret, F_SETFL, dup_flags) == -1) {
>> +        goto fail;
>> +    }
>> +
>> +    /* Truncate the file in the cases that open() would truncate it */
>> +    if (flags & O_TRUNC ||
>> +            ((flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))) {
>> +        if (ftruncate(ret, 0) == -1) {
>> +            goto fail;
>> +        }
>> +    }
>
> O_CREAT | O_EXCL kind of loses its meaning here, but okay, it's hard to
> do better with file descriptors.
>

I agree and I don't know if we can do any better.

>> +
>> +    qemu_set_cloexec(ret);
>
> Wait... If O_CLOEXEC is set, you set the flag immediately and if it
> isn't you set it at the end of the function? What's the intended use
> case for not using O_CLOEXEC then?
>

This is a mistake.  I think I just need to be using qemu_set_cloexec() 
instead of fcntl_setfl() earlier in this function and get rid of this 
latter call to qemu_set_cloexec().

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/6] file descriptor passing using fd sets
  2012-07-24 12:09 ` [Qemu-devel] [PATCH v5 0/6] file descriptor passing using " Kevin Wolf
@ 2012-07-25  3:42   ` Corey Bryant
  0 siblings, 0 replies; 36+ messages in thread
From: Corey Bryant @ 2012-07-25  3:42 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: aliguori, stefanha, libvir-list, qemu-devel, lcapitulino, eblake



On 07/24/2012 08:09 AM, Kevin Wolf wrote:
> Am 23.07.2012 15:07, schrieb Corey Bryant:
>> Corey Bryant (6):
>>    qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg
>>    qapi: Introduce add-fd, remove-fd, query-fdsets
>>    monitor: Clean up fd sets on monitor disconnect
>>    block: Convert open calls to qemu_open
>>    block: Convert close calls to qemu_close
>>    block: Enable qemu_open/close to work with fd sets
>>
>>   block/raw-posix.c |   42 ++++-----
>>   block/raw-win32.c |    6 +-
>>   block/vdi.c       |    5 +-
>>   block/vmdk.c      |   25 +++---
>>   block/vpc.c       |    4 +-
>>   block/vvfat.c     |   16 ++--
>>   cutils.c          |    5 ++
>>   monitor.c         |  244 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   monitor.h         |    4 +
>>   osdep.c           |  141 +++++++++++++++++++++++++++++++
>>   qapi-schema.json  |   97 +++++++++++++++++++++
>>   qemu-char.c       |   10 ++-
>>   qemu-common.h     |    2 +
>>   qemu-tool.c       |   12 +++
>>   qmp-commands.hx   |  121 ++++++++++++++++++++++++++
>>   savevm.c          |    4 +-
>>   16 files changed, 684 insertions(+), 54 deletions(-)
>
> Apart from the points I commented on in patch 6, and what Eric and you
> found, this looks good to me.
>
> Kevin
>

Great, thanks for the review!

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-25  3:41     ` Corey Bryant
@ 2012-07-25  8:22       ` Kevin Wolf
  2012-07-25 19:25         ` Eric Blake
  0 siblings, 1 reply; 36+ messages in thread
From: Kevin Wolf @ 2012-07-25  8:22 UTC (permalink / raw)
  To: Corey Bryant
  Cc: aliguori, stefanha, libvir-list, qemu-devel, lcapitulino, eblake

Am 25.07.2012 05:41, schrieb Corey Bryant:
>>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>>> index a172de3..5d0a801 100644
>>> --- a/block/raw-posix.c
>>> +++ b/block/raw-posix.c
>>> @@ -271,7 +271,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>>>   out_free_buf:
>>>       qemu_vfree(s->aligned_buf);
>>>   out_close:
>>> -    qemu_close(fd);
>>> +    qemu_close(fd, filename);
>>>       return -errno;
>>>   }
>>
>> Hm, not a nice interface where qemu_close() needs the filename and
>> (worse) could be given a wrong filename. Maybe it would be better to
>> maintain a list of fd -> fdset mappings in qemu_open/close?
>>
> 
> I agree, I don't really like it either.
> 
> We already have a list of fd -> fdset mappings (mon_fdset_fd_t -> 
> mon_fdset_t).  Would it be too costly to loop through all the fdsets/fds 
> at the beginning of every qemu_close()?

I don't think so. qemu_close() is not a fast path and happens almost
never, and the list is short enough that searching it isn't a problem
anyway.

>>> +            switch (flags & O_ACCMODE) {
>>> +            case O_RDWR:
>>> +                if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
>>> +                    return mon_fdset_fd->fd;
>>> +                }
>>> +                break;
>>> +            case O_RDONLY:
>>> +                if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
>>> +                    return mon_fdset_fd->fd;
>>> +                }
>>> +                break;
>>> +            case O_WRONLY:
>>> +                if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) {
>>> +                    return mon_fdset_fd->fd;
>>> +                }
>>> +                break;
>>> +            }
>>
>> I think you mean:
>>
>>    if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
>>        return mon_fdset_fd->fd;
>>    }
> 
> Yes, that would be a bit simpler wouldn't it. :)
> 
>>
>> What about other flags that cannot be set with fcntl(), like O_SYNC on
>> older kernels or possibly non-Linux? (The block layer doesn't use it any
>> more, but I think we want to keep the function generally useful)
>>
> 
> I see what you're getting at here.  Basically you could have 2 fds in an 
> fdset with the same access mode flags, but one has O_SYNC on and the 
> other has O_SYNC off.  That seems like it would make sense to implement. 
>   As a work-around, I think a user could just create a separate fdset 
> for the same file with different O_SYNC value.  But from a client 
> perspective, it would be nicer to have this taken care of for you.

Now that the block layer doesn't use O_SYNC any more, it's more of a
theoretical point. I don't think there's any other place, where we'd
need to switch O_SYNC during runtime.

Taking it into consideration is complicated by the fact that some
kernels allow to fcntl() O_SYNC and others don't, so enforcing a match
here wouldn't feel completely right either.

Maybe just leave it as it is. :-)

> 
>>> +        }
>>> +        errno = EACCES;
>>> +        return -1;
>>> +    }
>>> +    errno = ENOENT;
>>> +    return -1;
>>> +}
>>> +
>>>   /* mon_cmds and info_cmds would be sorted at runtime */
>>>   static mon_cmd_t mon_cmds[] = {
>>>   #include "hmp-commands.h"
>>
>>> @@ -75,6 +76,79 @@ int qemu_madvise(void *addr, size_t len, int advice)
>>>   #endif
>>>   }
>>>
>>> +/*
>>> + * Dups an fd and sets the flags
>>> + */
>>> +static int qemu_dup(int fd, int flags)
>>> +{
>>> +    int i;
>>> +    int ret;
>>> +    int serrno;
>>> +    int dup_flags;
>>> +    int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
>>> +                          O_NONBLOCK, 0 };
>>> +
>>> +    if (flags & O_CLOEXEC) {
>>> +        ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
>>> +        if (ret == -1 && errno == EINVAL) {
>>> +            ret = dup(fd);
>>> +            if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) {
>>> +                goto fail;
>>> +            }
>>> +        }
>>> +    } else {
>>> +        ret = dup(fd);
>>> +    }
>>> +
>>> +    if (ret == -1) {
>>> +        goto fail;
>>> +    }
>>> +
>>> +    dup_flags = fcntl(ret, F_GETFL);
>>> +    if (dup_flags == -1) {
>>> +        goto fail;
>>> +    }
>>> +
>>> +    if ((flags & O_SYNC) != (dup_flags & O_SYNC)) {
>>> +        errno = EINVAL;
>>> +        goto fail;
>>> +    }
>>
>> It's worth trying to set it before failing, newer kernels can do it. But
>> as I said above, if you can fail here, it makes sense to consider O_SYNC
>> when selecting the right file descriptor from the fdset.
>>
> 
> I'm on a 3.4.4 Fedora kernel that doesn't appear to support 
> fcntl(O_SYNC), but perhaps I'm doing something wrong.  Here's my test 
> code (shortened for simplicty): [...]

Hm, true. So it seems that patch has never made it into the kernel, in
fact...

>>> +
>>> +    qemu_set_cloexec(ret);
>>
>> Wait... If O_CLOEXEC is set, you set the flag immediately and if it
>> isn't you set it at the end of the function? What's the intended use
>> case for not using O_CLOEXEC then?
>>
> 
> This is a mistake.  I think I just need to be using qemu_set_cloexec() 
> instead of fcntl_setfl() earlier in this function and get rid of this 
> latter call to qemu_set_cloexec().

Yes, probably. And in fact, I think this shouldn't even be conditional
on flags & O_CLOEXEC. The whole reason qemu_open() was introduced
originally was to always set O_CLOEXEC.

Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/6] qapi: Introduce add-fd, remove-fd, query-fdsets
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 2/6] qapi: Introduce add-fd, remove-fd, query-fdsets Corey Bryant
@ 2012-07-25 18:16   ` Eric Blake
  2012-07-26  2:55     ` Corey Bryant
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2012-07-25 18:16 UTC (permalink / raw)
  To: Corey Bryant
  Cc: kwolf, aliguori, stefanha, libvir-list, qemu-devel, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 1880 bytes --]

On 07/23/2012 07:08 AM, Corey Bryant wrote:
> This patch adds support that enables passing of file descriptors
> to the QEMU monitor where they will be stored in specified file
> descriptor sets.
> 
> +##
> +# @add-fd:
> +#
> +# Add a file descriptor, that was passed via SCM rights, to an fd set.
> +#
> +# @fdset_id: The ID of the fd set to add the file descriptor to.

Should this parameter be optional, in which case qemu generates the set
number as the next available?  The return information would include the
set number whether or not this parameter was present from the caller,
and allowing it to be optional makes it slightly easier for callers to
generate a new set without having to make the caller track which set ids
are in use.

> +##
> +# @remove-fd:
> +#
> +# Remove a file descriptor from an fd set.
> +#
> +# @fdset_id: The ID of the fd set that the file descriptor belongs to.
> +#
> +# @fd: The file descriptor that is to be removed.

Should 'fd' be optional, as a shorthand for saying that _all_ fds
belonging to the set be marked as removed at once?

> +++ b/qmp-commands.hx
> @@ -926,6 +926,127 @@ Example:
>  
>  EQMP
>  
> +     {
> +        .name       = "add-fd",
> +        .args_type  = "fdset_id:i",
> +        .params     = "add-fd fdset_id",
> +        .help       = "Add a file descriptor, that was passed via SCM rights, to an fd set",
> +        .mhandler.cmd_new = qmp_marshal_input_add_fd,
> +    },
> +
> +SQMP
> +add-fd
> +-------
> +
> +Add a file descriptor, that was passed via SCM rights, to an fd set.
> +
> +Arguments:
> +
> +- "fdset_id": The ID of the fd set to add the file descriptor to.  (json-int)

May need tweaking, based on the answers to the above questions.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/6] block: Convert open calls to qemu_open
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 4/6] block: Convert open calls to qemu_open Corey Bryant
@ 2012-07-25 19:22   ` Eric Blake
  2012-07-26  3:11     ` Corey Bryant
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2012-07-25 19:22 UTC (permalink / raw)
  To: Corey Bryant
  Cc: kwolf, aliguori, stefanha, libvir-list, qemu-devel, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 2109 bytes --]

On 07/23/2012 07:08 AM, Corey Bryant wrote:
> This patch converts all block layer open calls to qemu_open.
> 
> Note that this adds the O_CLOEXEC flag to the changed open paths
> when the O_CLOEXEC macro is defined.

Is it actually adding O_CLOEXEC, or just the ability to use O_CLOEXEC?

Or is the actual change that the end result is that the fd now has
FD_CLOEXEC set unconditionally, whether by O_CLOEXEC (which the caller
need not pass) or by fcntl()?

> +++ b/block/raw-posix.c
> @@ -572,8 +572,8 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
>          options++;
>      }
>  
> -    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
> -              0644);
> +    fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
> +                   0644);

After all, I don't see O_CLOEXEC used here.

>      if (fd < 0) {
>          result = -errno;
>      } else {
> @@ -846,7 +846,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
>          if ( bsdPath[ 0 ] != '\0' ) {
>              strcat(bsdPath,"s0");
>              /* some CDs don't have a partition 0 */
> -            fd = open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE);
> +            fd = qemu_open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE);

Also, I still stand by my earlier claim that we don't need O_LARGEFILE
here (we should already be configuring for 64-bit off_t by default),
although cleaning that up is probably worth an independent commit.


> +++ b/block/vdi.c
> @@ -653,8 +653,9 @@ static int vdi_create(const char *filename, QEMUOptionParameter *options)
>          options++;
>      }
>  
> -    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
> -              0644);
> +    fd = qemu_open(filename,
> +                   O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
> +                   0644);

Another pointless O_LARGEFILE, and so forth.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-25  8:22       ` Kevin Wolf
@ 2012-07-25 19:25         ` Eric Blake
  2012-07-26  3:21           ` Corey Bryant
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2012-07-25 19:25 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: aliguori, stefanha, libvir-list, Corey Bryant, qemu-devel, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 914 bytes --]

On 07/25/2012 02:22 AM, Kevin Wolf wrote:
>>> Hm, not a nice interface where qemu_close() needs the filename and
>>> (worse) could be given a wrong filename. Maybe it would be better to
>>> maintain a list of fd -> fdset mappings in qemu_open/close?
>>>
>>
>> I agree, I don't really like it either.
>>
>> We already have a list of fd -> fdset mappings (mon_fdset_fd_t -> 
>> mon_fdset_t).  Would it be too costly to loop through all the fdsets/fds 
>> at the beginning of every qemu_close()?
> 
> I don't think so. qemu_close() is not a fast path and happens almost
> never, and the list is short enough that searching it isn't a problem
> anyway.

I agree - just do the loop to do the reverse lookup yourself, rather
than making qemu_close() have a different signature than close().

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets Corey Bryant
  2012-07-23 13:14   ` Corey Bryant
  2012-07-24 12:07   ` Kevin Wolf
@ 2012-07-25 19:43   ` Eric Blake
  2012-07-26  3:57     ` Corey Bryant
  2 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2012-07-25 19:43 UTC (permalink / raw)
  To: Corey Bryant
  Cc: kwolf, aliguori, stefanha, libvir-list, qemu-devel, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 6009 bytes --]

On 07/23/2012 07:08 AM, Corey Bryant wrote:
> When qemu_open is passed a filename of the "/dev/fdset/nnn"
> format (where nnn is the fdset ID), an fd with matching access
> mode flags will be searched for within the specified monitor
> fd set.  If the fd is found, a dup of the fd will be returned
> from qemu_open.
> 
> Each fd set has a reference count.  The purpose of the reference
> count is to determine if an fd set contains file descriptors that
> have open dup() references that have not yet been closed.  It is
> incremented on qemu_open and decremented on qemu_close.  It is
> not until the refcount is zero that file desriptors in an fd set
> can be closed.  If an fd set has dup() references open, then we
> must keep the other fds in the fd set open in case a reopen
> of the file occurs that requires an fd with a different access
> mode.
> 

> +++ b/monitor.c
> @@ -2551,6 +2551,91 @@ static void monitor_fdsets_set_in_use(Monitor *mon, bool in_use)
>      }
>  }
>  
> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
> +{
> +    mon_fdset_t *mon_fdset;
> +
> +    if (!mon) {
> +        return;
> +    }

Am I reading this code right by stating that 'if there is no monitor, we
don't increment the refcount'?  How does a monitor reattach affect
things?  Or am I missing something fundamental about the cases when
'mon==NULL' will exist?

> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
> +{
> +    mon_fdset_t *mon_fdset;
> +    mon_fdset_fd_t *mon_fdset_fd;
> +    int mon_fd_flags;
> +
> +    if (!mon) {
> +        errno = ENOENT;
> +        return -1;
> +    }
> +
> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
> +        if (mon_fdset->id != fdset_id) {
> +            continue;
> +        }
> +        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
> +            if (mon_fdset_fd->removed) {
> +                continue;
> +            }
> +
> +            mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
> +            if (mon_fd_flags == -1) {
> +                return -1;

This says we fail on the first fcntl() failure, instead of trying other
fds in the set.  Granted, an fcntl() failure is probably the sign of a
bigger bug (such as closing an fd at the wrong point in time), so I
guess trying to go on doesn't make much sense once we already know we
are hosed.

> +            }
> +
> +            switch (flags & O_ACCMODE) {
> +            case O_RDWR:
> +                if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
> +                    return mon_fdset_fd->fd;
> +                }
> +                break;
> +            case O_RDONLY:
> +                if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
> +                    return mon_fdset_fd->fd;
> +                }
> +                break;

Do we want to allow the case where the caller asked for O_RDONLY, but
the set only has O_RDWR?  After all, the caller is getting a compatible
subset of what the set offers.

> +            case O_WRONLY:
> +                if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) {
> +                    return mon_fdset_fd->fd;
> +                }
> +                break;

Likewise, should we allow a caller asking for O_WRONLY when the set
provides only O_RDWR?

>  
> +/*
> + * Dups an fd and sets the flags
> + */
> +static int qemu_dup(int fd, int flags)
> +{
> +    int i;
> +    int ret;
> +    int serrno;
> +    int dup_flags;
> +    int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
> +                          O_NONBLOCK, 0 };
> +
> +    if (flags & O_CLOEXEC) {
> +        ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);

F_DUPFD_CLOEXEC is required by POSIX but not implemented on all modern
OS yet; you probably need some #ifdef and/or configure guards.

> +        if (ret == -1 && errno == EINVAL) {
> +            ret = dup(fd);
> +            if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) {

You _can't_ call F_SETFL with O_CLOEXEC.  O_CLOEXEC only causes open()
to set FD_CLOEXEC; thereafter, including in the case of this dup, what
you want to do is instead set FD_CLOEXEC via F_SETFD (aka call
qemu_set_cloexec, not fcntl_setfl).

> +                goto fail;
> +            }
> +        }
> +    } else {
> +        ret = dup(fd);
> +    }
> +
> +    if (ret == -1) {
> +        goto fail;
> +    }
> +
> +    dup_flags = fcntl(ret, F_GETFL);
> +    if (dup_flags == -1) {
> +        goto fail;
> +    }
> +
> +    if ((flags & O_SYNC) != (dup_flags & O_SYNC)) {
> +        errno = EINVAL;
> +        goto fail;
> +    }
> +
> +    /* Set/unset flags that we can with fcntl */
> +    i = 0;
> +    while (setfl_flags[i] != 0) {
> +        if (flags & setfl_flags[i]) {
> +            dup_flags = (dup_flags | setfl_flags[i]);
> +        } else {
> +            dup_flags = (dup_flags & ~setfl_flags[i]);
> +        }
> +        i++;
> +    }

Rather than looping one bit at a time, you should do this as a mask
operation.

> +
> +    if (fcntl(ret, F_SETFL, dup_flags) == -1) {
> +        goto fail;
> +    }
> +
> +    /* Truncate the file in the cases that open() would truncate it */
> +    if (flags & O_TRUNC ||
> +            ((flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))) {
> +        if (ftruncate(ret, 0) == -1) {
> +            goto fail;
> +        }
> +    }
> +
> +    qemu_set_cloexec(ret);

If we're going to blindly set FD_CLOEXEC at the end of the day, rather
than try to honor O_CLOEXEC, then why not simplify the beginning of this
function:

    ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
    if (ret == -1 && errno == EINVAL) {
        ret = dup(fd);
        if (ret != -1) {
            qemu_set_cloexec(ret);
        }
    }
    if (ret == -1) {
        goto fail;
    }

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/6] qapi: Introduce add-fd, remove-fd, query-fdsets
  2012-07-25 18:16   ` Eric Blake
@ 2012-07-26  2:55     ` Corey Bryant
  0 siblings, 0 replies; 36+ messages in thread
From: Corey Bryant @ 2012-07-26  2:55 UTC (permalink / raw)
  To: Eric Blake
  Cc: kwolf, aliguori, stefanha, libvir-list, qemu-devel, lcapitulino



On 07/25/2012 02:16 PM, Eric Blake wrote:
> On 07/23/2012 07:08 AM, Corey Bryant wrote:
>> This patch adds support that enables passing of file descriptors
>> to the QEMU monitor where they will be stored in specified file
>> descriptor sets.
>>
>> +##
>> +# @add-fd:
>> +#
>> +# Add a file descriptor, that was passed via SCM rights, to an fd set.
>> +#
>> +# @fdset_id: The ID of the fd set to add the file descriptor to.
>
> Should this parameter be optional, in which case qemu generates the set
> number as the next available?  The return information would include the
> set number whether or not this parameter was present from the caller,
> and allowing it to be optional makes it slightly easier for callers to
> generate a new set without having to make the caller track which set ids
> are in use.
>

Yes, I agree, I think @fdset_id should be optional.

>> +##
>> +# @remove-fd:
>> +#
>> +# Remove a file descriptor from an fd set.
>> +#
>> +# @fdset_id: The ID of the fd set that the file descriptor belongs to.
>> +#
>> +# @fd: The file descriptor that is to be removed.
>
> Should 'fd' be optional, as a shorthand for saying that _all_ fds
> belonging to the set be marked as removed at once?
>

Yes, this makes sense.

>> +++ b/qmp-commands.hx
>> @@ -926,6 +926,127 @@ Example:
>>
>>   EQMP
>>
>> +     {
>> +        .name       = "add-fd",
>> +        .args_type  = "fdset_id:i",
>> +        .params     = "add-fd fdset_id",
>> +        .help       = "Add a file descriptor, that was passed via SCM rights, to an fd set",
>> +        .mhandler.cmd_new = qmp_marshal_input_add_fd,
>> +    },
>> +
>> +SQMP
>> +add-fd
>> +-------
>> +
>> +Add a file descriptor, that was passed via SCM rights, to an fd set.
>> +
>> +Arguments:
>> +
>> +- "fdset_id": The ID of the fd set to add the file descriptor to.  (json-int)
>
> May need tweaking, based on the answers to the above questions.
>

Ok.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/6] block: Convert open calls to qemu_open
  2012-07-25 19:22   ` Eric Blake
@ 2012-07-26  3:11     ` Corey Bryant
  0 siblings, 0 replies; 36+ messages in thread
From: Corey Bryant @ 2012-07-26  3:11 UTC (permalink / raw)
  To: Eric Blake
  Cc: kwolf, aliguori, stefanha, libvir-list, qemu-devel, lcapitulino



On 07/25/2012 03:22 PM, Eric Blake wrote:
> On 07/23/2012 07:08 AM, Corey Bryant wrote:
>> This patch converts all block layer open calls to qemu_open.
>>
>> Note that this adds the O_CLOEXEC flag to the changed open paths
>> when the O_CLOEXEC macro is defined.
>
> Is it actually adding O_CLOEXEC, or just the ability to use O_CLOEXEC?

It is adding O_CLOEXEC in qemu_open() on the open() call (as long as it 
is defined).

>
> Or is the actual change that the end result is that the fd now has
> FD_CLOEXEC set unconditionally, whether by O_CLOEXEC (which the caller
> need not pass) or by fcntl()?
>

The statement in the commit message isn't referring to anything dealing 
with qemu_dup(). The statement is specifically talking about the open() 
call in qemu_open(), and the point is to alert folks that old open() 
calls are now being routed through qemu_open() and likely are using the 
O_CLOEXEC flag, which is new behavior.

>> +++ b/block/raw-posix.c
>> @@ -572,8 +572,8 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
>>           options++;
>>       }
>>
>> -    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
>> -              0644);
>> +    fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
>> +                   0644);
>
> After all, I don't see O_CLOEXEC used here.
>

That's right.  qemu_open() adds O_CLOEXEC on the open() call.

>>       if (fd < 0) {
>>           result = -errno;
>>       } else {
>> @@ -846,7 +846,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
>>           if ( bsdPath[ 0 ] != '\0' ) {
>>               strcat(bsdPath,"s0");
>>               /* some CDs don't have a partition 0 */
>> -            fd = open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE);
>> +            fd = qemu_open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE);
>
> Also, I still stand by my earlier claim that we don't need O_LARGEFILE
> here (we should already be configuring for 64-bit off_t by default),
> although cleaning that up is probably worth an independent commit.
>

I have a note to get rid of O_LARGEFILE as a separate follow-on patch.

>
>> +++ b/block/vdi.c
>> @@ -653,8 +653,9 @@ static int vdi_create(const char *filename, QEMUOptionParameter *options)
>>           options++;
>>       }
>>
>> -    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
>> -              0644);
>> +    fd = qemu_open(filename,
>> +                   O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
>> +                   0644);
>
> Another pointless O_LARGEFILE, and so forth.
>

Yep.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-25 19:25         ` Eric Blake
@ 2012-07-26  3:21           ` Corey Bryant
  2012-07-26 13:13             ` Eric Blake
  0 siblings, 1 reply; 36+ messages in thread
From: Corey Bryant @ 2012-07-26  3:21 UTC (permalink / raw)
  To: Eric Blake
  Cc: Kevin Wolf, aliguori, stefanha, libvir-list, qemu-devel, lcapitulino



On 07/25/2012 03:25 PM, Eric Blake wrote:
> On 07/25/2012 02:22 AM, Kevin Wolf wrote:
>>>> Hm, not a nice interface where qemu_close() needs the filename and
>>>> (worse) could be given a wrong filename. Maybe it would be better to
>>>> maintain a list of fd -> fdset mappings in qemu_open/close?
>>>>
>>>
>>> I agree, I don't really like it either.
>>>
>>> We already have a list of fd -> fdset mappings (mon_fdset_fd_t ->
>>> mon_fdset_t).  Would it be too costly to loop through all the fdsets/fds
>>> at the beginning of every qemu_close()?
>>
>> I don't think so. qemu_close() is not a fast path and happens almost
>> never, and the list is short enough that searching it isn't a problem
>> anyway.
>
> I agree - just do the loop to do the reverse lookup yourself, rather
> than making qemu_close() have a different signature than close().
>

Great, I'll do this then.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-25 19:43   ` Eric Blake
@ 2012-07-26  3:57     ` Corey Bryant
  2012-07-26  9:07       ` Kevin Wolf
  2012-08-02 15:08       ` Corey Bryant
  0 siblings, 2 replies; 36+ messages in thread
From: Corey Bryant @ 2012-07-26  3:57 UTC (permalink / raw)
  To: Eric Blake
  Cc: kwolf, aliguori, stefanha, libvir-list, qemu-devel, lcapitulino



On 07/25/2012 03:43 PM, Eric Blake wrote:
> On 07/23/2012 07:08 AM, Corey Bryant wrote:
>> When qemu_open is passed a filename of the "/dev/fdset/nnn"
>> format (where nnn is the fdset ID), an fd with matching access
>> mode flags will be searched for within the specified monitor
>> fd set.  If the fd is found, a dup of the fd will be returned
>> from qemu_open.
>>
>> Each fd set has a reference count.  The purpose of the reference
>> count is to determine if an fd set contains file descriptors that
>> have open dup() references that have not yet been closed.  It is
>> incremented on qemu_open and decremented on qemu_close.  It is
>> not until the refcount is zero that file desriptors in an fd set
>> can be closed.  If an fd set has dup() references open, then we
>> must keep the other fds in the fd set open in case a reopen
>> of the file occurs that requires an fd with a different access
>> mode.
>>
>
>> +++ b/monitor.c
>> @@ -2551,6 +2551,91 @@ static void monitor_fdsets_set_in_use(Monitor *mon, bool in_use)
>>       }
>>   }
>>
>> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
>> +{
>> +    mon_fdset_t *mon_fdset;
>> +
>> +    if (!mon) {
>> +        return;
>> +    }
>
> Am I reading this code right by stating that 'if there is no monitor, we
> don't increment the refcount'?  How does a monitor reattach affect
> things?  Or am I missing something fundamental about the cases when
> 'mon==NULL' will exist?
>

Yes you're reading this correctly.

I'm pretty sure that mon will only be NULL if QEMU is started without a 
monitor.

If QEMU has a monitor, and libvirt disconnects it's connection to the 
qemu monitor, then I believe mon will remain non-NULL.

I'll plan on testing this out to verify though.  (I'm out most of this 
week and will be back full time starting next Tues.)

>> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
>> +{
>> +    mon_fdset_t *mon_fdset;
>> +    mon_fdset_fd_t *mon_fdset_fd;
>> +    int mon_fd_flags;
>> +
>> +    if (!mon) {
>> +        errno = ENOENT;
>> +        return -1;
>> +    }
>> +
>> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
>> +        if (mon_fdset->id != fdset_id) {
>> +            continue;
>> +        }
>> +        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
>> +            if (mon_fdset_fd->removed) {
>> +                continue;
>> +            }
>> +
>> +            mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
>> +            if (mon_fd_flags == -1) {
>> +                return -1;
>
> This says we fail on the first fcntl() failure, instead of trying other
> fds in the set.  Granted, an fcntl() failure is probably the sign of a
> bigger bug (such as closing an fd at the wrong point in time), so I
> guess trying to go on doesn't make much sense once we already know we
> are hosed.
>

I think I'll stick with it the way it is.  If fcntl() fails we might 
have a tainted fd set so I think we should fail.

>> +            }
>> +
>> +            switch (flags & O_ACCMODE) {
>> +            case O_RDWR:
>> +                if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
>> +                    return mon_fdset_fd->fd;
>> +                }
>> +                break;
>> +            case O_RDONLY:
>> +                if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
>> +                    return mon_fdset_fd->fd;
>> +                }
>> +                break;
>
> Do we want to allow the case where the caller asked for O_RDONLY, but
> the set only has O_RDWR?  After all, the caller is getting a compatible
> subset of what the set offers.
>

I don't see a problem with it.

>> +            case O_WRONLY:
>> +                if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) {
>> +                    return mon_fdset_fd->fd;
>> +                }
>> +                break;
>
> Likewise, should we allow a caller asking for O_WRONLY when the set
> provides only O_RDWR?
>

I don't see a problem with it.

>>
>> +/*
>> + * Dups an fd and sets the flags
>> + */
>> +static int qemu_dup(int fd, int flags)
>> +{
>> +    int i;
>> +    int ret;
>> +    int serrno;
>> +    int dup_flags;
>> +    int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
>> +                          O_NONBLOCK, 0 };
>> +
>> +    if (flags & O_CLOEXEC) {
>> +        ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
>
> F_DUPFD_CLOEXEC is required by POSIX but not implemented on all modern
> OS yet; you probably need some #ifdef and/or configure guards.
>

Ok

>> +        if (ret == -1 && errno == EINVAL) {
>> +            ret = dup(fd);
>> +            if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) {
>
> You _can't_ call F_SETFL with O_CLOEXEC.  O_CLOEXEC only causes open()
> to set FD_CLOEXEC; thereafter, including in the case of this dup, what
> you want to do is instead set FD_CLOEXEC via F_SETFD (aka call
> qemu_set_cloexec, not fcntl_setfl).
>

I know, this is a mistake.  I'm planning to replace fcntl_setfl() here 
with qemu_set_cloexec().

>> +                goto fail;
>> +            }
>> +        }
>> +    } else {
>> +        ret = dup(fd);
>> +    }
>> +
>> +    if (ret == -1) {
>> +        goto fail;
>> +    }
>> +
>> +    dup_flags = fcntl(ret, F_GETFL);
>> +    if (dup_flags == -1) {
>> +        goto fail;
>> +    }
>> +
>> +    if ((flags & O_SYNC) != (dup_flags & O_SYNC)) {
>> +        errno = EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    /* Set/unset flags that we can with fcntl */
>> +    i = 0;
>> +    while (setfl_flags[i] != 0) {
>> +        if (flags & setfl_flags[i]) {
>> +            dup_flags = (dup_flags | setfl_flags[i]);
>> +        } else {
>> +            dup_flags = (dup_flags & ~setfl_flags[i]);
>> +        }
>> +        i++;
>> +    }
>
> Rather than looping one bit at a time, you should do this as a mask
> operation.
>

I agree.

>> +
>> +    if (fcntl(ret, F_SETFL, dup_flags) == -1) {
>> +        goto fail;
>> +    }
>> +
>> +    /* Truncate the file in the cases that open() would truncate it */
>> +    if (flags & O_TRUNC ||
>> +            ((flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))) {
>> +        if (ftruncate(ret, 0) == -1) {
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    qemu_set_cloexec(ret);
>
> If we're going to blindly set FD_CLOEXEC at the end of the day, rather
> than try to honor O_CLOEXEC, then why not simplify the beginning of this
> function:

This call to qemu_set_cloexec() was a mistake.  I'm planning on removing it.

>
>      ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
>      if (ret == -1 && errno == EINVAL) {
>          ret = dup(fd);
>          if (ret != -1) {
>              qemu_set_cloexec(ret);
>          }
>      }
>      if (ret == -1) {
>          goto fail;
>      }
>

I'll plan on sticking with the existing code in the beginning of this 
function with the modifications mentioned above.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-26  3:57     ` Corey Bryant
@ 2012-07-26  9:07       ` Kevin Wolf
  2012-07-27  3:59         ` Corey Bryant
  2012-07-27  4:03         ` Corey Bryant
  2012-08-02 15:08       ` Corey Bryant
  1 sibling, 2 replies; 36+ messages in thread
From: Kevin Wolf @ 2012-07-26  9:07 UTC (permalink / raw)
  To: Corey Bryant
  Cc: aliguori, stefanha, libvir-list, qemu-devel, lcapitulino, Eric Blake

Am 26.07.2012 05:57, schrieb Corey Bryant:
> On 07/25/2012 03:43 PM, Eric Blake wrote:
>> On 07/23/2012 07:08 AM, Corey Bryant wrote:
>>> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
>>> +{
>>> +    mon_fdset_t *mon_fdset;
>>> +    mon_fdset_fd_t *mon_fdset_fd;
>>> +    int mon_fd_flags;
>>> +
>>> +    if (!mon) {
>>> +        errno = ENOENT;
>>> +        return -1;
>>> +    }
>>> +
>>> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
>>> +        if (mon_fdset->id != fdset_id) {
>>> +            continue;
>>> +        }
>>> +        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
>>> +            if (mon_fdset_fd->removed) {
>>> +                continue;
>>> +            }
>>> +
>>> +            mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
>>> +            if (mon_fd_flags == -1) {
>>> +                return -1;
>>
>> This says we fail on the first fcntl() failure, instead of trying other
>> fds in the set.  Granted, an fcntl() failure is probably the sign of a
>> bigger bug (such as closing an fd at the wrong point in time), so I
>> guess trying to go on doesn't make much sense once we already know we
>> are hosed.
>>
> 
> I think I'll stick with it the way it is.  If fcntl() fails we might 
> have a tainted fd set so I think we should fail.

The alternative would be s/return 1/continue/, right? I think either way
is acceptable.

>>> +            }
>>> +
>>> +            switch (flags & O_ACCMODE) {
>>> +            case O_RDWR:
>>> +                if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
>>> +                    return mon_fdset_fd->fd;
>>> +                }
>>> +                break;
>>> +            case O_RDONLY:
>>> +                if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
>>> +                    return mon_fdset_fd->fd;
>>> +                }
>>> +                break;
>>
>> Do we want to allow the case where the caller asked for O_RDONLY, but
>> the set only has O_RDWR?  After all, the caller is getting a compatible
>> subset of what the set offers.
> 
> I don't see a problem with it.

I would require exact matches like you implemented, in order to prevent
damage if we ever had a bug that writes to a read-only file. I believe
it also makes the semantics clearer and the code simpler, while it
shouldn't make much of a difference for clients.

Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-26  3:21           ` Corey Bryant
@ 2012-07-26 13:13             ` Eric Blake
  2012-07-26 13:16               ` Kevin Wolf
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2012-07-26 13:13 UTC (permalink / raw)
  To: Corey Bryant
  Cc: Kevin Wolf, aliguori, stefanha, libvir-list, qemu-devel, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 1589 bytes --]

On 07/25/2012 09:21 PM, Corey Bryant wrote:
> 
> 
> On 07/25/2012 03:25 PM, Eric Blake wrote:
>> On 07/25/2012 02:22 AM, Kevin Wolf wrote:
>>>>> Hm, not a nice interface where qemu_close() needs the filename and
>>>>> (worse) could be given a wrong filename. Maybe it would be better to
>>>>> maintain a list of fd -> fdset mappings in qemu_open/close?
>>>>>
>>>>
>>>> I agree, I don't really like it either.
>>>>
>>>> We already have a list of fd -> fdset mappings (mon_fdset_fd_t ->
>>>> mon_fdset_t).  Would it be too costly to loop through all the
>>>> fdsets/fds
>>>> at the beginning of every qemu_close()?
>>>
>>> I don't think so. qemu_close() is not a fast path and happens almost
>>> never, and the list is short enough that searching it isn't a problem
>>> anyway.
>>
>> I agree - just do the loop to do the reverse lookup yourself, rather
>> than making qemu_close() have a different signature than close().
>>
> 
> Great, I'll do this then.

You may want an optimization of using a bitset for tracking which fds
are tracked by fdset in the first place, so that the fast path of
qemu_close() will be a check against the bitset to see if you even have
to waste time on the reverse lookup in the first place.  The bitset will
typically be small (bounded not only by the maximum possible fd, but
further by the fact that we don't usually open that many fds in the
first place), but I'm not sure if you can get away with static sizing.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-26 13:13             ` Eric Blake
@ 2012-07-26 13:16               ` Kevin Wolf
  2012-07-27  4:07                 ` Corey Bryant
  0 siblings, 1 reply; 36+ messages in thread
From: Kevin Wolf @ 2012-07-26 13:16 UTC (permalink / raw)
  To: Eric Blake
  Cc: aliguori, stefanha, libvir-list, Corey Bryant, qemu-devel, lcapitulino

Am 26.07.2012 15:13, schrieb Eric Blake:
> On 07/25/2012 09:21 PM, Corey Bryant wrote:
>>
>>
>> On 07/25/2012 03:25 PM, Eric Blake wrote:
>>> On 07/25/2012 02:22 AM, Kevin Wolf wrote:
>>>>>> Hm, not a nice interface where qemu_close() needs the filename and
>>>>>> (worse) could be given a wrong filename. Maybe it would be better to
>>>>>> maintain a list of fd -> fdset mappings in qemu_open/close?
>>>>>>
>>>>>
>>>>> I agree, I don't really like it either.
>>>>>
>>>>> We already have a list of fd -> fdset mappings (mon_fdset_fd_t ->
>>>>> mon_fdset_t).  Would it be too costly to loop through all the
>>>>> fdsets/fds
>>>>> at the beginning of every qemu_close()?
>>>>
>>>> I don't think so. qemu_close() is not a fast path and happens almost
>>>> never, and the list is short enough that searching it isn't a problem
>>>> anyway.
>>>
>>> I agree - just do the loop to do the reverse lookup yourself, rather
>>> than making qemu_close() have a different signature than close().
>>>
>>
>> Great, I'll do this then.
> 
> You may want an optimization of using a bitset for tracking which fds
> are tracked by fdset in the first place, so that the fast path of
> qemu_close() will be a check against the bitset to see if you even have
> to waste time on the reverse lookup in the first place.  The bitset will
> typically be small (bounded not only by the maximum possible fd, but
> further by the fact that we don't usually open that many fds in the
> first place), but I'm not sure if you can get away with static sizing.

Premature optimisation, in my opinion. The list is really small.

Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-26  9:07       ` Kevin Wolf
@ 2012-07-27  3:59         ` Corey Bryant
  2012-07-27  4:03         ` Corey Bryant
  1 sibling, 0 replies; 36+ messages in thread
From: Corey Bryant @ 2012-07-27  3:59 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: aliguori, stefanha, libvir-list, qemu-devel, lcapitulino, Eric Blake



On 07/26/2012 05:07 AM, Kevin Wolf wrote:
> Am 26.07.2012 05:57, schrieb Corey Bryant:
>> On 07/25/2012 03:43 PM, Eric Blake wrote:
>>> On 07/23/2012 07:08 AM, Corey Bryant wrote:
>>>> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
>>>> +{
>>>> +    mon_fdset_t *mon_fdset;
>>>> +    mon_fdset_fd_t *mon_fdset_fd;
>>>> +    int mon_fd_flags;
>>>> +
>>>> +    if (!mon) {
>>>> +        errno = ENOENT;
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
>>>> +        if (mon_fdset->id != fdset_id) {
>>>> +            continue;
>>>> +        }
>>>> +        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
>>>> +            if (mon_fdset_fd->removed) {
>>>> +                continue;
>>>> +            }
>>>> +
>>>> +            mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
>>>> +            if (mon_fd_flags == -1) {
>>>> +                return -1;
>>>
>>> This says we fail on the first fcntl() failure, instead of trying other
>>> fds in the set.  Granted, an fcntl() failure is probably the sign of a
>>> bigger bug (such as closing an fd at the wrong point in time), so I
>>> guess trying to go on doesn't make much sense once we already know we
>>> are hosed.
>>>
>>
>> I think I'll stick with it the way it is.  If fcntl() fails we might
>> have a tainted fd set so I think we should fail.
>
> The alternative would be s/return 1/continue/, right? I think either way
> is acceptable.
>
>>>> +            }
>>>> +
>>>> +            switch (flags & O_ACCMODE) {
>>>> +            case O_RDWR:
>>>> +                if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
>>>> +                    return mon_fdset_fd->fd;
>>>> +                }
>>>> +                break;
>>>> +            case O_RDONLY:
>>>> +                if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
>>>> +                    return mon_fdset_fd->fd;
>>>> +                }
>>>> +                break;
>>>
>>> Do we want to allow the case where the caller asked for O_RDONLY, but
>>> the set only has O_RDWR?  After all, the caller is getting a compatible
>>> subset of what the set offers.
>>
>> I don't see a problem with it.
>
> I would require exact matches like you implemented, in order to prevent
> damage if we ever had a bug that writes to a read-only file. I believe
> it also makes the semantics clearer and the code simpler, while it
> shouldn't make much of a difference for clients.
>
> Kevin
>

Alright, then I'll plan on requiring exact matches of access mode flags.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-26  9:07       ` Kevin Wolf
  2012-07-27  3:59         ` Corey Bryant
@ 2012-07-27  4:03         ` Corey Bryant
  1 sibling, 0 replies; 36+ messages in thread
From: Corey Bryant @ 2012-07-27  4:03 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: aliguori, stefanha, libvir-list, qemu-devel, lcapitulino, Eric Blake



On 07/26/2012 05:07 AM, Kevin Wolf wrote:
> Am 26.07.2012 05:57, schrieb Corey Bryant:
>> On 07/25/2012 03:43 PM, Eric Blake wrote:
>>> On 07/23/2012 07:08 AM, Corey Bryant wrote:
>>>> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
>>>> +{
>>>> +    mon_fdset_t *mon_fdset;
>>>> +    mon_fdset_fd_t *mon_fdset_fd;
>>>> +    int mon_fd_flags;
>>>> +
>>>> +    if (!mon) {
>>>> +        errno = ENOENT;
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
>>>> +        if (mon_fdset->id != fdset_id) {
>>

>> +            continue;
>>>> +        }
>>>> +        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
>>>> +            if (mon_fdset_fd->removed) {
>>>> +                continue;
>>>> +            }
>>>> +
>>>> +            mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
>>>> +            if (mon_fd_flags == -1) {
>>>> +                return -1;
>>>
>>> This says we fail on the first fcntl() failure, instead of trying other
>>> fds in the set.  Granted, an fcntl() failure is probably the sign of a
>>> bigger bug (such as closing an fd at the wrong point in time), so I
>>> guess trying to go on doesn't make much sense once we already know we
>>> are hosed.
>>>
>>
>> I think I'll stick with it the way it is.  If fcntl() fails we might
>> have a tainted fd set so I think we should fail.
>
> The alternative would be s/return 1/continue/, right? I think either way
> is acceptable.
>

Yes, we'd continue the loop instead of returning -1.  I prefer to return 
on the first failure, but if anyone feels strongly about continuing the 
loop, please let me know.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-26 13:16               ` Kevin Wolf
@ 2012-07-27  4:07                 ` Corey Bryant
  0 siblings, 0 replies; 36+ messages in thread
From: Corey Bryant @ 2012-07-27  4:07 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: aliguori, stefanha, libvir-list, qemu-devel, lcapitulino, Eric Blake



On 07/26/2012 09:16 AM, Kevin Wolf wrote:
> Am 26.07.2012 15:13, schrieb Eric Blake:
>> On 07/25/2012 09:21 PM, Corey Bryant wrote:
>>>
>>>
>>> On 07/25/2012 03:25 PM, Eric Blake wrote:
>>>> On 07/25/2012 02:22 AM, Kevin Wolf wrote:
>>>>>>> Hm, not a nice interface where qemu_close() needs the filename and
>>>>>>> (worse) could be given a wrong filename. Maybe it would be better to
>>>>>>> maintain a list of fd -> fdset mappings in qemu_open/close?
>>>>>>>
>>>>>>
>>>>>> I agree, I don't really like it either.
>>>>>>
>>>>>> We already have a list of fd -> fdset mappings (mon_fdset_fd_t ->
>>>>>> mon_fdset_t).  Would it be too costly to loop through all the
>>>>>> fdsets/fds
>>>>>> at the beginning of every qemu_close()?
>>>>>
>>>>> I don't think so. qemu_close() is not a fast path and happens almost
>>>>> never, and the list is short enough that searching it isn't a problem
>>>>> anyway.
>>>>
>>>> I agree - just do the loop to do the reverse lookup yourself, rather
>>>> than making qemu_close() have a different signature than close().
>>>>
>>>
>>> Great, I'll do this then.
>>
>> You may want an optimization of using a bitset for tracking which fds
>> are tracked by fdset in the first place, so that the fast path of
>> qemu_close() will be a check against the bitset to see if you even have
>> to waste time on the reverse lookup in the first place.  The bitset will
>> typically be small (bounded not only by the maximum possible fd, but
>> further by the fact that we don't usually open that many fds in the
>> first place), but I'm not sure if you can get away with static sizing.
>
> Premature optimisation, in my opinion. The list is really small.
>
> Kevin
>

I'll probably hold off on any optimisation at this point, but I can 
revisit it in the future if it's needed.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-26  3:57     ` Corey Bryant
  2012-07-26  9:07       ` Kevin Wolf
@ 2012-08-02 15:08       ` Corey Bryant
  1 sibling, 0 replies; 36+ messages in thread
From: Corey Bryant @ 2012-08-02 15:08 UTC (permalink / raw)
  To: Eric Blake
  Cc: kwolf, aliguori, stefanha, libvir-list, qemu-devel, lcapitulino



On 07/25/2012 11:57 PM, Corey Bryant wrote:
>
>
> On 07/25/2012 03:43 PM, Eric Blake wrote:
>> On 07/23/2012 07:08 AM, Corey Bryant wrote:
>>> When qemu_open is passed a filename of the "/dev/fdset/nnn"
>>> format (where nnn is the fdset ID), an fd with matching access
>>> mode flags will be searched for within the specified monitor
>>> fd set.  If the fd is found, a dup of the fd will be returned
>>> from qemu_open.
>>>
>>> Each fd set has a reference count.  The purpose of the reference
>>> count is to determine if an fd set contains file descriptors that
>>> have open dup() references that have not yet been closed.  It is
>>> incremented on qemu_open and decremented on qemu_close.  It is
>>> not until the refcount is zero that file desriptors in an fd set
>>> can be closed.  If an fd set has dup() references open, then we
>>> must keep the other fds in the fd set open in case a reopen
>>> of the file occurs that requires an fd with a different access
>>> mode.
>>>
>>
>>> +++ b/monitor.c
>>> @@ -2551,6 +2551,91 @@ static void monitor_fdsets_set_in_use(Monitor
>>> *mon, bool in_use)
>>>       }
>>>   }
>>>
>>> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
>>> +{
>>> +    mon_fdset_t *mon_fdset;
>>> +
>>> +    if (!mon) {
>>> +        return;
>>> +    }
>>
>> Am I reading this code right by stating that 'if there is no monitor, we
>> don't increment the refcount'?  How does a monitor reattach affect
>> things?  Or am I missing something fundamental about the cases when
>> 'mon==NULL' will exist?
>>
>
> Yes you're reading this correctly.
>
> I'm pretty sure that mon will only be NULL if QEMU is started without a
> monitor.
>
> If QEMU has a monitor, and libvirt disconnects it's connection to the
> qemu monitor, then I believe mon will remain non-NULL.

I've verified this to be true and everything is working as expected.

If libvirt's connection to the monitor fd is closed, mon will remain 
non-NULL and the refcount will still be incremented/decremented on 
qemu_open()/qemu_close().  When libvirt reconnects, any fdsets that 
haven't been cleaned up will still be available.  query-fdsets can be 
used to determine what's available.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-07-23 13:14   ` Corey Bryant
@ 2012-08-02 22:21     ` Corey Bryant
  2012-08-06  9:15       ` Kevin Wolf
  0 siblings, 1 reply; 36+ messages in thread
From: Corey Bryant @ 2012-08-02 22:21 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: kwolf, aliguori, stefanha, libvir-list, qemu-devel, eblake



On 07/23/2012 09:14 AM, Corey Bryant wrote:
>
>
> On 07/23/2012 09:08 AM, Corey Bryant wrote:
>> When qemu_open is passed a filename of the "/dev/fdset/nnn"
>> format (where nnn is the fdset ID), an fd with matching access
>> mode flags will be searched for within the specified monitor
>> fd set.  If the fd is found, a dup of the fd will be returned
>> from qemu_open.
>>
>> Each fd set has a reference count.  The purpose of the reference
>> count is to determine if an fd set contains file descriptors that
>> have open dup() references that have not yet been closed.  It is
>> incremented on qemu_open and decremented on qemu_close.  It is
>> not until the refcount is zero that file desriptors in an fd set
>> can be closed.  If an fd set has dup() references open, then we
>> must keep the other fds in the fd set open in case a reopen
>> of the file occurs that requires an fd with a different access
>> mode.
>>
>> Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
>>
>> v2:
>>   -Get rid of file_open and move dup code to qemu_open
>>    (kwolf@redhat.com)
>>   -Use strtol wrapper instead of atoi (kwolf@redhat.com)
>>
>> v3:
>>   -Add note about fd leakage (eblake@redhat.com)
>>
>> v4
>>   -Moved patch to be later in series (lcapitulino@redhat.com)
>>   -Update qemu_open to check access mode flags and set flags that
>>    can be set (eblake@redhat.com, kwolf@redhat.com)
>>
>> v5:
>>   -This patch was overhauled quite a bit in this version, with
>>    the addition of fd set and refcount support.
>>   -Use qemu_set_cloexec() on dup'd fd (eblake@redhat.com)
>>   -Modify flags set by fcntl on dup'd fd (eblake@redhat.com)
>>   -Reduce syscalls when setting flags for dup'd fd (eblake@redhat.com)
>>   -Fix O_RDWR, O_RDONLY, O_WRONLY checks (eblake@redhat.com)
>> ---
>>   block/raw-posix.c |   24 +++++-----
>>   block/raw-win32.c |    2 +-
>>   block/vmdk.c      |    4 +-
>>   block/vpc.c       |    2 +-
>>   block/vvfat.c     |   12 ++---
>>   cutils.c          |    5 ++
>>   monitor.c         |   85 +++++++++++++++++++++++++++++++++
>>   monitor.h         |    4 ++
>>   osdep.c           |  138
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   qemu-common.h     |    3 +-
>>   qemu-tool.c       |   12 +++++
>>   11 files changed, 267 insertions(+), 24 deletions(-)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index a172de3..5d0a801 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -271,7 +271,7 @@ static int raw_open_common(BlockDriverState *bs,
>> const char *filename,
>>   out_free_buf:
>>       qemu_vfree(s->aligned_buf);
>>   out_close:
>> -    qemu_close(fd);
>> +    qemu_close(fd, filename);
>>       return -errno;
>>   }
>>
>> @@ -376,7 +376,7 @@ static void raw_close(BlockDriverState *bs)
>>   {
>>       BDRVRawState *s = bs->opaque;
>>       if (s->fd >= 0) {
>> -        qemu_close(s->fd);
>> +        qemu_close(s->fd, bs->filename);
>>           s->fd = -1;
>>           if (s->aligned_buf != NULL)
>>               qemu_vfree(s->aligned_buf);
>> @@ -580,7 +580,7 @@ static int raw_create(const char *filename,
>> QEMUOptionParameter *options)
>>           if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) {
>>               result = -errno;
>>           }
>> -        if (qemu_close(fd) != 0) {
>> +        if (qemu_close(fd, filename) != 0) {
>>               result = -errno;
>>           }
>>       }
>> @@ -850,7 +850,7 @@ static int hdev_open(BlockDriverState *bs, const
>> char *filename, int flags)
>>               if (fd < 0) {
>>                   bsdPath[strlen(bsdPath)-1] = '1';
>>               } else {
>> -                qemu_close(fd);
>> +                qemu_close(fd, bsdPath);
>>               }
>>               filename = bsdPath;
>>           }
>> @@ -889,7 +889,7 @@ static int fd_open(BlockDriverState *bs)
>>       last_media_present = (s->fd >= 0);
>>       if (s->fd >= 0 &&
>>           (get_clock() - s->fd_open_time) >= FD_OPEN_TIMEOUT) {
>> -        qemu_close(s->fd);
>> +        qemu_close(s->fd, bs->filename);
>>           s->fd = -1;
>>   #ifdef DEBUG_FLOPPY
>>           printf("Floppy closed\n");
>> @@ -988,7 +988,7 @@ static int hdev_create(const char *filename,
>> QEMUOptionParameter *options)
>>       else if (lseek(fd, 0, SEEK_END) < total_size * BDRV_SECTOR_SIZE)
>>           ret = -ENOSPC;
>>
>> -    qemu_close(fd);
>> +    qemu_close(fd, filename);
>>       return ret;
>>   }
>>
>> @@ -1038,7 +1038,7 @@ static int floppy_open(BlockDriverState *bs,
>> const char *filename, int flags)
>>           return ret;
>>
>>       /* close fd so that we can reopen it as needed */
>> -    qemu_close(s->fd);
>> +    qemu_close(s->fd, filename);
>>       s->fd = -1;
>>       s->fd_media_changed = 1;
>>
>> @@ -1070,7 +1070,7 @@ static int floppy_probe_device(const char
>> *filename)
>>           prio = 100;
>>
>>   outc:
>> -    qemu_close(fd);
>> +    qemu_close(fd, filename);
>>   out:
>>       return prio;
>>   }
>> @@ -1105,14 +1105,14 @@ static void floppy_eject(BlockDriverState *bs,
>> bool eject_flag)
>>       int fd;
>>
>>       if (s->fd >= 0) {
>> -        qemu_close(s->fd);
>> +        qemu_close(s->fd, bs->filename);
>>           s->fd = -1;
>>       }
>>       fd = qemu_open(bs->filename, s->open_flags | O_NONBLOCK);
>>       if (fd >= 0) {
>>           if (ioctl(fd, FDEJECT, 0) < 0)
>>               perror("FDEJECT");
>> -        qemu_close(fd);
>> +        qemu_close(fd, bs->filename);
>>       }
>>   }
>>
>> @@ -1173,7 +1173,7 @@ static int cdrom_probe_device(const char *filename)
>>           prio = 100;
>>
>>   outc:
>> -    qemu_close(fd);
>> +    qemu_close(fd, filename);
>>   out:
>>       return prio;
>>   }
>> @@ -1281,7 +1281,7 @@ static int cdrom_reopen(BlockDriverState *bs)
>>        * FreeBSD seems to not notice sometimes...
>>        */
>>       if (s->fd >= 0)
>> -        qemu_close(s->fd);
>> +        qemu_close(s->fd, bs->filename);
>>       fd = qemu_open(bs->filename, s->open_flags, 0644);
>>       if (fd < 0) {
>>           s->fd = -1;
>> diff --git a/block/raw-win32.c b/block/raw-win32.c
>> index c56bf83..b795871 100644
>> --- a/block/raw-win32.c
>> +++ b/block/raw-win32.c
>> @@ -261,7 +261,7 @@ static int raw_create(const char *filename,
>> QEMUOptionParameter *options)
>>           return -EIO;
>>       set_sparse(fd);
>>       ftruncate(fd, total_size * 512);
>> -    qemu_close(fd);
>> +    qemu_close(fd, filename);
>>       return 0;
>>   }
>>
>> diff --git a/block/vmdk.c b/block/vmdk.c
>> index daee426..7f1206b 100644
>> --- a/block/vmdk.c
>> +++ b/block/vmdk.c
>> @@ -1258,7 +1258,7 @@ static int vmdk_create_extent(const char
>> *filename, int64_t filesize,
>>
>>       ret = 0;
>>    exit:
>> -    qemu_close(fd);
>> +    qemu_close(fd, filename);
>>       return ret;
>>   }
>>
>> @@ -1506,7 +1506,7 @@ static int vmdk_create(const char *filename,
>> QEMUOptionParameter *options)
>>       }
>>       ret = 0;
>>   exit:
>> -    qemu_close(fd);
>> +    qemu_close(fd, filename);
>>       return ret;
>>   }
>>
>> diff --git a/block/vpc.c b/block/vpc.c
>> index c0b82c4..20f648a 100644
>> --- a/block/vpc.c
>> +++ b/block/vpc.c
>> @@ -744,7 +744,7 @@ static int vpc_create(const char *filename,
>> QEMUOptionParameter *options)
>>       }
>>
>>    fail:
>> -    qemu_close(fd);
>> +    qemu_close(fd, filename);
>>       return ret;
>>   }
>>
>> diff --git a/block/vvfat.c b/block/vvfat.c
>> index 59d3c5b..cbc7543 100644
>> --- a/block/vvfat.c
>> +++ b/block/vvfat.c
>> @@ -1105,7 +1105,7 @@ static inline void
>> vvfat_close_current_file(BDRVVVFATState *s)
>>       if(s->current_mapping) {
>>       s->current_mapping = NULL;
>>       if (s->current_fd) {
>> -        qemu_close(s->current_fd);
>> +        qemu_close(s->current_fd, s->current_mapping->path);
>>           s->current_fd = 0;
>>       }
>>       }
>> @@ -2230,7 +2230,7 @@ static int commit_one_file(BDRVVVFATState* s,
>>       }
>>       if (offset > 0) {
>>           if (lseek(fd, offset, SEEK_SET) != offset) {
>> -            qemu_close(fd);
>> +            qemu_close(fd, mapping->path);
>>               g_free(cluster);
>>               return -3;
>>           }
>> @@ -2251,13 +2251,13 @@ static int commit_one_file(BDRVVVFATState* s,
>>           (uint8_t*)cluster, (rest_size + 0x1ff) / 0x200);
>>
>>           if (ret < 0) {
>> -            qemu_close(fd);
>> +            qemu_close(fd, mapping->path);
>>               g_free(cluster);
>>               return ret;
>>           }
>>
>>           if (write(fd, cluster, rest_size) < 0) {
>> -            qemu_close(fd);
>> +            qemu_close(fd, mapping->path);
>>               g_free(cluster);
>>               return -2;
>>           }
>> @@ -2268,11 +2268,11 @@ static int commit_one_file(BDRVVVFATState* s,
>>
>>       if (ftruncate(fd, size)) {
>>           perror("ftruncate()");
>> -        qemu_close(fd);
>> +        qemu_close(fd, mapping->path);
>>           g_free(cluster);
>>           return -4;
>>       }
>> -    qemu_close(fd);
>> +    qemu_close(fd, mapping->path);
>>       g_free(cluster);
>>
>>       return commit_mappings(s, first_cluster, dir_index);
>> diff --git a/cutils.c b/cutils.c
>> index e2bc1b8..4e6bfdc 100644
>> --- a/cutils.c
>> +++ b/cutils.c
>> @@ -375,3 +375,8 @@ int qemu_parse_fd(const char *param)
>>       }
>>       return fd;
>>   }
>> +
>> +int qemu_parse_fdset(const char *param)
>> +{
>> +    return qemu_parse_fd(param);
>> +}
>> diff --git a/monitor.c b/monitor.c
>> index 30b085f..fddd2b5 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -2551,6 +2551,91 @@ static void monitor_fdsets_set_in_use(Monitor
>> *mon, bool in_use)
>>       }
>>   }
>>
>> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
>> +{
>> +    mon_fdset_t *mon_fdset;
>> +
>> +    if (!mon) {
>> +        return;
>> +    }
>> +
>> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
>> +        if (mon_fdset->id == fdset_id) {
>> +            mon_fdset->refcount++;
>> +            break;
>> +        }
>> +    }
>> +}
>> +
>> +void monitor_fdset_decrement_refcount(Monitor *mon, int64_t fdset_id)
>> +{
>> +    mon_fdset_t *mon_fdset;
>> +
>> +    if (!mon) {
>> +        return;
>> +    }
>> +
>> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
>> +        if (mon_fdset->id == fdset_id) {
>> +            mon_fdset->refcount--;
>> +            if (mon_fdset->refcount == 0) {
>> +                monitor_fdset_cleanup(mon_fdset);
>> +            }
>> +            break;
>> +        }
>> +    }
>> +}
>> +
>> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
>> +{
>> +    mon_fdset_t *mon_fdset;
>> +    mon_fdset_fd_t *mon_fdset_fd;
>> +    int mon_fd_flags;
>> +
>> +    if (!mon) {
>> +        errno = ENOENT;
>> +        return -1;
>> +    }
>> +
>> +    QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
>> +        if (mon_fdset->id != fdset_id) {
>> +            continue;
>> +        }
>> +        QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
>> +            if (mon_fdset_fd->removed) {
>> +                continue;
>> +            }
>> +
>> +            mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
>> +            if (mon_fd_flags == -1) {
>> +                return -1;
>> +            }
>> +
>> +            switch (flags & O_ACCMODE) {
>> +            case O_RDWR:
>> +                if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
>> +                    return mon_fdset_fd->fd;
>> +                }
>> +                break;
>> +            case O_RDONLY:
>> +                if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
>> +                    return mon_fdset_fd->fd;
>> +                }
>> +                break;
>> +            case O_WRONLY:
>> +                if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) {
>> +                    return mon_fdset_fd->fd;
>> +                }
>> +                break;
>> +            }
>> +        }
>> +        errno = EACCES;
>> +        return -1;
>> +    }
>> +    errno = ENOENT;
>> +    return -1;
>> +}
>> +
>>   /* mon_cmds and info_cmds would be sorted at runtime */
>>   static mon_cmd_t mon_cmds[] = {
>>   #include "hmp-commands.h"
>> diff --git a/monitor.h b/monitor.h
>> index 5f4de1b..1efed38 100644
>> --- a/monitor.h
>> +++ b/monitor.h
>> @@ -86,4 +86,8 @@ int qmp_qom_set(Monitor *mon, const QDict *qdict,
>> QObject **ret);
>>
>>   int qmp_qom_get(Monitor *mon, const QDict *qdict, QObject **ret);
>>
>> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags);
>> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id);
>> +void monitor_fdset_decrement_refcount(Monitor *mon, int64_t fdset_id);
>> +
>>   #endif /* !MONITOR_H */
>> diff --git a/osdep.c b/osdep.c
>> index 7b09dff..0303cdd 100644
>> --- a/osdep.c
>> +++ b/osdep.c
>> @@ -47,6 +47,7 @@ extern int madvise(caddr_t, size_t, int);
>>   #include "qemu-common.h"
>>   #include "trace.h"
>>   #include "qemu_socket.h"
>> +#include "monitor.h"
>>
>>   static const char *qemu_version = QEMU_VERSION;
>>
>> @@ -75,6 +76,79 @@ int qemu_madvise(void *addr, size_t len, int advice)
>>   #endif
>>   }
>>
>> +/*
>> + * Dups an fd and sets the flags
>> + */
>> +static int qemu_dup(int fd, int flags)
>> +{
>> +    int i;
>> +    int ret;
>> +    int serrno;
>> +    int dup_flags;
>> +    int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
>> +                          O_NONBLOCK, 0 };
>> +
>> +    if (flags & O_CLOEXEC) {
>> +        ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
>> +        if (ret == -1 && errno == EINVAL) {
>> +            ret = dup(fd);
>> +            if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) {
>> +                goto fail;
>> +            }
>> +        }
>> +    } else {
>> +        ret = dup(fd);
>> +    }
>> +
>> +    if (ret == -1) {
>> +        goto fail;
>> +    }
>> +
>> +    dup_flags = fcntl(ret, F_GETFL);
>> +    if (dup_flags == -1) {
>> +        goto fail;
>> +    }
>> +
>> +    if ((flags & O_SYNC) != (dup_flags & O_SYNC)) {
>> +        errno = EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    /* Set/unset flags that we can with fcntl */
>> +    i = 0;
>> +    while (setfl_flags[i] != 0) {
>> +        if (flags & setfl_flags[i]) {
>> +            dup_flags = (dup_flags | setfl_flags[i]);
>> +        } else {
>> +            dup_flags = (dup_flags & ~setfl_flags[i]);
>> +        }
>> +        i++;
>> +    }
>> +
>> +    if (fcntl(ret, F_SETFL, dup_flags) == -1) {
>> +        goto fail;
>> +    }
>> +
>> +    /* Truncate the file in the cases that open() would truncate it */
>> +    if (flags & O_TRUNC ||
>> +            ((flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))) {
>> +        if (ftruncate(ret, 0) == -1) {
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    qemu_set_cloexec(ret);
>> +
>> +    return ret;
>> +
>> +fail:
>> +    serrno = errno;
>> +    if (ret != -1) {
>> +        close(ret);
>> +    }
>> +    errno = serrno;
>> +    return -1;
>> +}
>>
>>   /*
>>    * Opens a file with FD_CLOEXEC set
>> @@ -84,6 +158,36 @@ int qemu_open(const char *name, int flags, ...)
>>       int ret;
>>       int mode = 0;
>>
>> +#ifndef _WIN32
>> +    const char *fdset_id_str;
>> +
>> +    /* Attempt dup of fd from fd set */
>> +    if (strstart(name, "/dev/fdset/", &fdset_id_str)) {
>> +        int64_t fdset_id;
>> +        int fd, dupfd;
>> +
>> +        fdset_id = qemu_parse_fdset(fdset_id_str);
>> +        if (fdset_id == -1) {
>> +            errno = EINVAL;
>> +            return -1;
>> +        }
>> +
>> +        fd = monitor_fdset_get_fd(default_mon, fdset_id, flags);
>
> I know that use of default_mon in this patch is not correct, but I
> wanted to get these patches out for review. I used default_mon for
> testing because cur_mon wasn't pointing to the monitor I'd added fd sets
> to.  I need to figure out why.
>

Does it make sense to use default_mon here?  After digging into this 
some more, I'm thinking it makes sense, and I'll explain why.

It looks like cur_mon can't be used.  cur_mon will point to the monitor 
object for the duration of a command, and be reset to old_mon (NULL in 
my case) after the command completes.

qemu_open() and qemu_close() are frequently called long after a monitor 
command has come and gone, so cur_mon won't work.  For example, 
drive_add will cause qemu_open() to be called, but after the command has 
completed, the file will keep getting opened/closed during normal QEMU 
operation.  I'm not sure why, I've just noticed this behavior.

Does anyone have any thoughts on this?  It would require fd sets to be 
added to the default monitor only.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-08-02 22:21     ` Corey Bryant
@ 2012-08-06  9:15       ` Kevin Wolf
  2012-08-06 13:32         ` Corey Bryant
  0 siblings, 1 reply; 36+ messages in thread
From: Kevin Wolf @ 2012-08-06  9:15 UTC (permalink / raw)
  To: Corey Bryant
  Cc: aliguori, stefanha, libvir-list, qemu-devel, Luiz Capitulino, eblake

Am 03.08.2012 00:21, schrieb Corey Bryant:
>>> @@ -84,6 +158,36 @@ int qemu_open(const char *name, int flags, ...)
>>>       int ret;
>>>       int mode = 0;
>>>
>>> +#ifndef _WIN32
>>> +    const char *fdset_id_str;
>>> +
>>> +    /* Attempt dup of fd from fd set */
>>> +    if (strstart(name, "/dev/fdset/", &fdset_id_str)) {
>>> +        int64_t fdset_id;
>>> +        int fd, dupfd;
>>> +
>>> +        fdset_id = qemu_parse_fdset(fdset_id_str);
>>> +        if (fdset_id == -1) {
>>> +            errno = EINVAL;
>>> +            return -1;
>>> +        }
>>> +
>>> +        fd = monitor_fdset_get_fd(default_mon, fdset_id, flags);
>>
>> I know that use of default_mon in this patch is not correct, but I
>> wanted to get these patches out for review. I used default_mon for
>> testing because cur_mon wasn't pointing to the monitor I'd added fd sets
>> to.  I need to figure out why.
>>
> 
> Does it make sense to use default_mon here?  After digging into this 
> some more, I'm thinking it makes sense, and I'll explain why.
> 
> It looks like cur_mon can't be used.  cur_mon will point to the monitor 
> object for the duration of a command, and be reset to old_mon (NULL in 
> my case) after the command completes.
> 
> qemu_open() and qemu_close() are frequently called long after a monitor 
> command has come and gone, so cur_mon won't work.  For example, 
> drive_add will cause qemu_open() to be called, but after the command has 
> completed, the file will keep getting opened/closed during normal QEMU 
> operation.  I'm not sure why, I've just noticed this behavior.
> 
> Does anyone have any thoughts on this?  It would require fd sets to be 
> added to the default monitor only.

I think we have two design options that would make sense:

1. Make the file descriptors global instead of per-monitor. Is there a
   reason why each monitor has its own set of fds? (Also I'm wondering
   if they survive a monitor disconnect this way?)

2. Save a monitor reference with the fdset information.

Allowing to send file descriptors on every monitor, but making only
those of the default monitor actually usable, sounds like a bad choice
to me.

Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-08-06  9:15       ` Kevin Wolf
@ 2012-08-06 13:32         ` Corey Bryant
  2012-08-06 13:51           ` Kevin Wolf
  0 siblings, 1 reply; 36+ messages in thread
From: Corey Bryant @ 2012-08-06 13:32 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: aliguori, stefanha, libvir-list, qemu-devel, Luiz Capitulino, eblake



On 08/06/2012 05:15 AM, Kevin Wolf wrote:
> Am 03.08.2012 00:21, schrieb Corey Bryant:
>>>> @@ -84,6 +158,36 @@ int qemu_open(const char *name, int flags, ...)
>>>>        int ret;
>>>>        int mode = 0;
>>>>
>>>> +#ifndef _WIN32
>>>> +    const char *fdset_id_str;
>>>> +
>>>> +    /* Attempt dup of fd from fd set */
>>>> +    if (strstart(name, "/dev/fdset/", &fdset_id_str)) {
>>>> +        int64_t fdset_id;
>>>> +        int fd, dupfd;
>>>> +
>>>> +        fdset_id = qemu_parse_fdset(fdset_id_str);
>>>> +        if (fdset_id == -1) {
>>>> +            errno = EINVAL;
>>>> +            return -1;
>>>> +        }
>>>> +
>>>> +        fd = monitor_fdset_get_fd(default_mon, fdset_id, flags);
>>>
>>> I know that use of default_mon in this patch is not correct, but I
>>> wanted to get these patches out for review. I used default_mon for
>>> testing because cur_mon wasn't pointing to the monitor I'd added fd sets
>>> to.  I need to figure out why.
>>>
>>
>> Does it make sense to use default_mon here?  After digging into this
>> some more, I'm thinking it makes sense, and I'll explain why.
>>
>> It looks like cur_mon can't be used.  cur_mon will point to the monitor
>> object for the duration of a command, and be reset to old_mon (NULL in
>> my case) after the command completes.
>>
>> qemu_open() and qemu_close() are frequently called long after a monitor
>> command has come and gone, so cur_mon won't work.  For example,
>> drive_add will cause qemu_open() to be called, but after the command has
>> completed, the file will keep getting opened/closed during normal QEMU
>> operation.  I'm not sure why, I've just noticed this behavior.
>>
>> Does anyone have any thoughts on this?  It would require fd sets to be
>> added to the default monitor only.
>
> I think we have two design options that would make sense:
>
> 1. Make the file descriptors global instead of per-monitor. Is there a
>     reason why each monitor has its own set of fds? (Also I'm wondering
>     if they survive a monitor disconnect this way?)

I'd prefer to have them associated with a monitor object so that we can 
more easily keep track of whether or not they're in use by a monitor 
connection.

>
> 2. Save a monitor reference with the fdset information.
>

Are you saying that each monitor would have the same copy of fdset 
information?

> Allowing to send file descriptors on every monitor, but making only
> those of the default monitor actually usable, sounds like a bad choice
> to me.

What if we also allow them to be added only to the default monitor?

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-08-06 13:32         ` Corey Bryant
@ 2012-08-06 13:51           ` Kevin Wolf
  2012-08-06 14:15             ` Corey Bryant
  0 siblings, 1 reply; 36+ messages in thread
From: Kevin Wolf @ 2012-08-06 13:51 UTC (permalink / raw)
  To: Corey Bryant
  Cc: aliguori, stefanha, libvir-list, qemu-devel, Luiz Capitulino, eblake

Am 06.08.2012 15:32, schrieb Corey Bryant:
> On 08/06/2012 05:15 AM, Kevin Wolf wrote:
>> Am 03.08.2012 00:21, schrieb Corey Bryant:
>>>>> @@ -84,6 +158,36 @@ int qemu_open(const char *name, int flags, ...)
>>>>>        int ret;
>>>>>        int mode = 0;
>>>>>
>>>>> +#ifndef _WIN32
>>>>> +    const char *fdset_id_str;
>>>>> +
>>>>> +    /* Attempt dup of fd from fd set */
>>>>> +    if (strstart(name, "/dev/fdset/", &fdset_id_str)) {
>>>>> +        int64_t fdset_id;
>>>>> +        int fd, dupfd;
>>>>> +
>>>>> +        fdset_id = qemu_parse_fdset(fdset_id_str);
>>>>> +        if (fdset_id == -1) {
>>>>> +            errno = EINVAL;
>>>>> +            return -1;
>>>>> +        }
>>>>> +
>>>>> +        fd = monitor_fdset_get_fd(default_mon, fdset_id, flags);
>>>>
>>>> I know that use of default_mon in this patch is not correct, but I
>>>> wanted to get these patches out for review. I used default_mon for
>>>> testing because cur_mon wasn't pointing to the monitor I'd added fd sets
>>>> to.  I need to figure out why.
>>>>
>>>
>>> Does it make sense to use default_mon here?  After digging into this
>>> some more, I'm thinking it makes sense, and I'll explain why.
>>>
>>> It looks like cur_mon can't be used.  cur_mon will point to the monitor
>>> object for the duration of a command, and be reset to old_mon (NULL in
>>> my case) after the command completes.
>>>
>>> qemu_open() and qemu_close() are frequently called long after a monitor
>>> command has come and gone, so cur_mon won't work.  For example,
>>> drive_add will cause qemu_open() to be called, but after the command has
>>> completed, the file will keep getting opened/closed during normal QEMU
>>> operation.  I'm not sure why, I've just noticed this behavior.
>>>
>>> Does anyone have any thoughts on this?  It would require fd sets to be
>>> added to the default monitor only.
>>
>> I think we have two design options that would make sense:
>>
>> 1. Make the file descriptors global instead of per-monitor. Is there a
>>     reason why each monitor has its own set of fds? (Also I'm wondering
>>     if they survive a monitor disconnect this way?)
> 
> I'd prefer to have them associated with a monitor object so that we can 
> more easily keep track of whether or not they're in use by a monitor 
> connection.

Hm, I see.

>> 2. Save a monitor reference with the fdset information.
>>
> 
> Are you saying that each monitor would have the same copy of fdset 
> information?

This one doesn't really make sense indeed...

> 
>> Allowing to send file descriptors on every monitor, but making only
>> those of the default monitor actually usable, sounds like a bad choice
>> to me.
> 
> What if we also allow them to be added only to the default monitor?

Would get you some kind of consistency at least, even though I don't
like that secondary monitors can't use the functionality.

Can't we make the fdset information global, so that a qemu_open/close()
searches all of them, but let it have a Monitor* owner for keeping track
whether it's in use?

Kevin

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-08-06 13:51           ` Kevin Wolf
@ 2012-08-06 14:15             ` Corey Bryant
  2012-08-07 16:43               ` Corey Bryant
  0 siblings, 1 reply; 36+ messages in thread
From: Corey Bryant @ 2012-08-06 14:15 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: aliguori, stefanha, libvir-list, qemu-devel, Luiz Capitulino, eblake



On 08/06/2012 09:51 AM, Kevin Wolf wrote:
> Am 06.08.2012 15:32, schrieb Corey Bryant:
>> On 08/06/2012 05:15 AM, Kevin Wolf wrote:
>>> Am 03.08.2012 00:21, schrieb Corey Bryant:
>>>>>> @@ -84,6 +158,36 @@ int qemu_open(const char *name, int flags, ...)
>>>>>>         int ret;
>>>>>>         int mode = 0;
>>>>>>
>>>>>> +#ifndef _WIN32
>>>>>> +    const char *fdset_id_str;
>>>>>> +
>>>>>> +    /* Attempt dup of fd from fd set */
>>>>>> +    if (strstart(name, "/dev/fdset/", &fdset_id_str)) {
>>>>>> +        int64_t fdset_id;
>>>>>> +        int fd, dupfd;
>>>>>> +
>>>>>> +        fdset_id = qemu_parse_fdset(fdset_id_str);
>>>>>> +        if (fdset_id == -1) {
>>>>>> +            errno = EINVAL;
>>>>>> +            return -1;
>>>>>> +        }
>>>>>> +
>>>>>> +        fd = monitor_fdset_get_fd(default_mon, fdset_id, flags);
>>>>>
>>>>> I know that use of default_mon in this patch is not correct, but I
>>>>> wanted to get these patches out for review. I used default_mon for
>>>>> testing because cur_mon wasn't pointing to the monitor I'd added fd sets
>>>>> to.  I need to figure out why.
>>>>>
>>>>
>>>> Does it make sense to use default_mon here?  After digging into this
>>>> some more, I'm thinking it makes sense, and I'll explain why.
>>>>
>>>> It looks like cur_mon can't be used.  cur_mon will point to the monitor
>>>> object for the duration of a command, and be reset to old_mon (NULL in
>>>> my case) after the command completes.
>>>>
>>>> qemu_open() and qemu_close() are frequently called long after a monitor
>>>> command has come and gone, so cur_mon won't work.  For example,
>>>> drive_add will cause qemu_open() to be called, but after the command has
>>>> completed, the file will keep getting opened/closed during normal QEMU
>>>> operation.  I'm not sure why, I've just noticed this behavior.
>>>>
>>>> Does anyone have any thoughts on this?  It would require fd sets to be
>>>> added to the default monitor only.
>>>
>>> I think we have two design options that would make sense:
>>>
>>> 1. Make the file descriptors global instead of per-monitor. Is there a
>>>      reason why each monitor has its own set of fds? (Also I'm wondering
>>>      if they survive a monitor disconnect this way?)
>>
>> I'd prefer to have them associated with a monitor object so that we can
>> more easily keep track of whether or not they're in use by a monitor
>> connection.
>
> Hm, I see.
>
>>> 2. Save a monitor reference with the fdset information.
>>>
>>
>> Are you saying that each monitor would have the same copy of fdset
>> information?
>
> This one doesn't really make sense indeed...
>
>>
>>> Allowing to send file descriptors on every monitor, but making only
>>> those of the default monitor actually usable, sounds like a bad choice
>>> to me.
>>
>> What if we also allow them to be added only to the default monitor?
>
> Would get you some kind of consistency at least, even though I don't
> like that secondary monitors can't use the functionality.
>
> Can't we make the fdset information global, so that a qemu_open/close()
> searches all of them, but let it have a Monitor* owner for keeping track
> whether it's in use?

I think global fdsets might work (sorry I didn't think it through enough 
on my first reply).  I think I'll need to drop the "in_use" flag and tie 
monitor references into the refcount.  (I know I know, you suggested 
that a while back.. :).  I'll give it a shot and see how it goes.
-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
  2012-08-06 14:15             ` Corey Bryant
@ 2012-08-07 16:43               ` Corey Bryant
  0 siblings, 0 replies; 36+ messages in thread
From: Corey Bryant @ 2012-08-07 16:43 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: aliguori, stefanha, libvir-list, qemu-devel, Luiz Capitulino, eblake



On 08/06/2012 10:15 AM, Corey Bryant wrote:
>
>
> On 08/06/2012 09:51 AM, Kevin Wolf wrote:
>> Am 06.08.2012 15:32, schrieb Corey Bryant:
>>> On 08/06/2012 05:15 AM, Kevin Wolf wrote:
>>>> Am 03.08.2012 00:21, schrieb Corey Bryant:
>>>>>>> @@ -84,6 +158,36 @@ int qemu_open(const char *name, int flags, ...)
>>>>>>>         int ret;
>>>>>>>         int mode = 0;
>>>>>>>
>>>>>>> +#ifndef _WIN32
>>>>>>> +    const char *fdset_id_str;
>>>>>>> +
>>>>>>> +    /* Attempt dup of fd from fd set */
>>>>>>> +    if (strstart(name, "/dev/fdset/", &fdset_id_str)) {
>>>>>>> +        int64_t fdset_id;
>>>>>>> +        int fd, dupfd;
>>>>>>> +
>>>>>>> +        fdset_id = qemu_parse_fdset(fdset_id_str);
>>>>>>> +        if (fdset_id == -1) {
>>>>>>> +            errno = EINVAL;
>>>>>>> +            return -1;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        fd = monitor_fdset_get_fd(default_mon, fdset_id, flags);
>>>>>>
>>>>>> I know that use of default_mon in this patch is not correct, but I
>>>>>> wanted to get these patches out for review. I used default_mon for
>>>>>> testing because cur_mon wasn't pointing to the monitor I'd added
>>>>>> fd sets
>>>>>> to.  I need to figure out why.
>>>>>>
>>>>>
>>>>> Does it make sense to use default_mon here?  After digging into this
>>>>> some more, I'm thinking it makes sense, and I'll explain why.
>>>>>
>>>>> It looks like cur_mon can't be used.  cur_mon will point to the
>>>>> monitor
>>>>> object for the duration of a command, and be reset to old_mon (NULL in
>>>>> my case) after the command completes.
>>>>>
>>>>> qemu_open() and qemu_close() are frequently called long after a
>>>>> monitor
>>>>> command has come and gone, so cur_mon won't work.  For example,
>>>>> drive_add will cause qemu_open() to be called, but after the
>>>>> command has
>>>>> completed, the file will keep getting opened/closed during normal QEMU
>>>>> operation.  I'm not sure why, I've just noticed this behavior.
>>>>>
>>>>> Does anyone have any thoughts on this?  It would require fd sets to be
>>>>> added to the default monitor only.
>>>>
>>>> I think we have two design options that would make sense:
>>>>
>>>> 1. Make the file descriptors global instead of per-monitor. Is there a
>>>>      reason why each monitor has its own set of fds? (Also I'm
>>>> wondering
>>>>      if they survive a monitor disconnect this way?)
>>>
>>> I'd prefer to have them associated with a monitor object so that we can
>>> more easily keep track of whether or not they're in use by a monitor
>>> connection.
>>
>> Hm, I see.
>>
>>>> 2. Save a monitor reference with the fdset information.
>>>>
>>>
>>> Are you saying that each monitor would have the same copy of fdset
>>> information?
>>
>> This one doesn't really make sense indeed...
>>
>>>
>>>> Allowing to send file descriptors on every monitor, but making only
>>>> those of the default monitor actually usable, sounds like a bad choice
>>>> to me.
>>>
>>> What if we also allow them to be added only to the default monitor?
>>
>> Would get you some kind of consistency at least, even though I don't
>> like that secondary monitors can't use the functionality.
>>
>> Can't we make the fdset information global, so that a qemu_open/close()
>> searches all of them, but let it have a Monitor* owner for keeping track
>> whether it's in use?
>
> I think global fdsets might work (sorry I didn't think it through enough
> on my first reply).  I think I'll need to drop the "in_use" flag and tie
> monitor references into the refcount.  (I know I know, you suggested
> that a while back.. :).  I'll give it a shot and see how it goes.

I just submitted the v7 patch series which makes fd sets global, rather 
than each Monitor object having an fd set.  This allows the list of fd 
sets to be shared among all monitor connections.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2012-08-07 16:45 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-23 13:07 [Qemu-devel] [PATCH v5 0/6] file descriptor passing using fd sets Corey Bryant
2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 1/6] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg Corey Bryant
2012-07-23 22:50   ` Eric Blake
2012-07-24  2:19     ` Corey Bryant
2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 2/6] qapi: Introduce add-fd, remove-fd, query-fdsets Corey Bryant
2012-07-25 18:16   ` Eric Blake
2012-07-26  2:55     ` Corey Bryant
2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 3/6] monitor: Clean up fd sets on monitor disconnect Corey Bryant
2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 4/6] block: Convert open calls to qemu_open Corey Bryant
2012-07-25 19:22   ` Eric Blake
2012-07-26  3:11     ` Corey Bryant
2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 5/6] block: Convert close calls to qemu_close Corey Bryant
2012-07-23 13:08 ` [Qemu-devel] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets Corey Bryant
2012-07-23 13:14   ` Corey Bryant
2012-08-02 22:21     ` Corey Bryant
2012-08-06  9:15       ` Kevin Wolf
2012-08-06 13:32         ` Corey Bryant
2012-08-06 13:51           ` Kevin Wolf
2012-08-06 14:15             ` Corey Bryant
2012-08-07 16:43               ` Corey Bryant
2012-07-24 12:07   ` Kevin Wolf
2012-07-25  3:41     ` Corey Bryant
2012-07-25  8:22       ` Kevin Wolf
2012-07-25 19:25         ` Eric Blake
2012-07-26  3:21           ` Corey Bryant
2012-07-26 13:13             ` Eric Blake
2012-07-26 13:16               ` Kevin Wolf
2012-07-27  4:07                 ` Corey Bryant
2012-07-25 19:43   ` Eric Blake
2012-07-26  3:57     ` Corey Bryant
2012-07-26  9:07       ` Kevin Wolf
2012-07-27  3:59         ` Corey Bryant
2012-07-27  4:03         ` Corey Bryant
2012-08-02 15:08       ` Corey Bryant
2012-07-24 12:09 ` [Qemu-devel] [PATCH v5 0/6] file descriptor passing using " Kevin Wolf
2012-07-25  3:42   ` Corey Bryant

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.