* [Qemu-devel] [PATCH 0/4] -net tap: rootless bridge support for qemu
@ 2011-10-06 15:38 Richa Marwaha
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper Richa Marwaha
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Richa Marwaha @ 2011-10-06 15:38 UTC
  To: qemu-devel; +Cc: aliguori, coreyb, Richa Marwaha

With qemu it is possible to run a guest as an unprivileged user, but
until now, if we wanted the guest to communicate with the outside world,
we had to switch to root.

We address this problem by introducing a new network option.  This
option is less flexible than the other -net tap options because
it relies on a helper with elevated privileges to do the heavy lifting
of allocating a tap device and attaching it to a bridge.  We use a
special-purpose helper because we don't want to elevate the privileges
of more generic tools like brctl.

Qemu can be run with the default network helper as follows (in
this case attaching the tap device to the default qemubr0 bridge):

     qemu -hda linux.img -net tap,helper=/usr/local/libexec/qemu-bridge-helper -net nic

We're not overly thrilled with having to spell out the helper file name,
but we didn't want to regress any current behavior of -net tap.
Additionally, we feel that this support makes sense in the -net tap backend.
Any suggestions to improve on this are more than welcome.

The default helper uses its own ACL mechanism for access control, but
future network helpers could be developed, for example, to support PolicyKit
for access control.

More details are included in the individual patches.  The helper is broken
into a series of patches to improve reviewability.

Richa Marwaha (4):
  Add basic version of bridge helper
  Add access control support to qemu-bridge-helper
  Add cap reduction support to enable use as SUID
  Add support for bridge

 Makefile             |   12 ++-
 configure            |   37 +++++
 net.c                |    8 +
 net.h                |    2 +
 net/tap.c            |  150 ++++++++++++++++++-
 qemu-bridge-helper.c |  402 ++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx      |   48 +++++--
 7 files changed, 637 insertions(+), 22 deletions(-)
 create mode 100644 qemu-bridge-helper.c


* [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
  2011-10-06 15:38 [Qemu-devel] [PATCH 0/4] -net tap: rootless bridge support for qemu Richa Marwaha
@ 2011-10-06 15:38 ` Richa Marwaha
  2011-10-06 16:41   ` Daniel P. Berrange
  2011-10-06 17:44   ` Anthony Liguori
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 2/4] Add access control support to qemu-bridge-helper Richa Marwaha
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 23+ messages in thread
From: Richa Marwaha @ 2011-10-06 15:38 UTC
  To: qemu-devel; +Cc: aliguori, coreyb, Richa Marwaha

This patch adds a helper that can be used to create a tap device attached to
a bridge device.  Since this helper is minimal in what it does, it can be
given CAP_NET_ADMIN which allows qemu to avoid running as root while still
satisfying the majority of what users tend to want to do with tap devices.

The way this all works is that qemu launches this helper passing a bridge
name and the name of an inherited file descriptor.  The descriptor is one
end of a socketpair() of domain sockets.  This domain socket is used to
transmit a file descriptor of the opened tap device from the helper to qemu.

The helper can then exit and let qemu use the tap device.
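
The descriptor hand-off described above can be sketched in isolation.  This
is a minimal illustration, not the code from this patch: the names
send_one_fd() and recv_one_fd() are hypothetical, and unlike the helper it
validates the control message on the receiving side before trusting it.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one descriptor; the union keeps the
 * buffer aligned for struct cmsghdr. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Hypothetical helper: send one descriptor over a connected AF_UNIX
 * socket using SCM_RIGHTS.  Returns 0 on success, -1 on error. */
static int send_one_fd(int sock, int fd)
{
    char payload = 0;               /* at least one data byte must be sent */
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor.  Returns the new
 * descriptor, or -1 if nothing usable arrived. */
static int recv_one_fd(int sock)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }

    /* Only accept a single SCM_RIGHTS message carrying one int. */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL ||
        cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In the flow described here, the helper would play the sender role on its end
of the socketpair() and qemu the receiver; the received descriptor refers to
the same open tap device even after the helper exits.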

Signed-off-by: Richa Marwaha <rmarwah@linux.vnet.ibm.com>
---
 Makefile             |   12 +++-
 configure            |    1 +
 qemu-bridge-helper.c |  205 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 216 insertions(+), 2 deletions(-)
 create mode 100644 qemu-bridge-helper.c

diff --git a/Makefile b/Makefile
index 6ed3194..f2caedc 100644
--- a/Makefile
+++ b/Makefile
@@ -34,6 +34,8 @@ $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw)
 
 LIBS+=-lz $(LIBS_TOOLS)
 
+HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)
+
 ifdef BUILD_DOCS
 DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 QMP/qmp-commands.txt
 else
@@ -74,7 +76,7 @@ defconfig:
 
 -include config-all-devices.mak
 
-build-all: $(DOCS) $(TOOLS) recurse-all
+build-all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all
 
 config-host.h: config-host.h-timestamp
 config-host.h-timestamp: config-host.mak
@@ -151,6 +153,8 @@ qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-ob
 
 qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) qemu-timer-common.o
 
+qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o
+
 qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
 	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN   $@")
 
@@ -208,7 +212,7 @@ clean:
 # avoid old build problems by removing potentially incorrect old files
 	rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h gen-op-arm.h
 	rm -f qemu-options.def
-	rm -f *.o *.d *.a *.lo $(TOOLS) qemu-ga TAGS cscope.* *.pod *~ */*~
+	rm -f *.o *.d *.a *.lo $(TOOLS) $(HELPERS-y) qemu-ga TAGS cscope.* *.pod *~ */*~
 	rm -Rf .libs
 	rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d qapi/*.o qapi/*.d qga/*.o qga/*.d
 	rm -f qemu-img-cmds.h
@@ -275,6 +279,10 @@ install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig
 ifneq ($(TOOLS),)
 	$(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) "$(DESTDIR)$(bindir)"
 endif
+ifneq ($(HELPERS-y),)
+	$(INSTALL_DIR) "$(DESTDIR)$(libexecdir)"
+	$(INSTALL_PROG) $(STRIP_OPT) $(HELPERS-y) "$(DESTDIR)$(libexecdir)"
+endif
 ifneq ($(BLOBS),)
 	$(INSTALL_DIR) "$(DESTDIR)$(datadir)"
 	set -e; for x in $(BLOBS); do \
diff --git a/configure b/configure
index 59b1494..3e32834 100755
--- a/configure
+++ b/configure
@@ -2742,6 +2742,7 @@ echo "mandir=$mandir" >> $config_host_mak
 echo "datadir=$datadir" >> $config_host_mak
 echo "sysconfdir=$sysconfdir" >> $config_host_mak
 echo "docdir=$docdir" >> $config_host_mak
+echo "libexecdir=\${prefix}/libexec" >> $config_host_mak
 echo "confdir=$confdir" >> $config_host_mak
 
 case "$cpu" in
diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c
new file mode 100644
index 0000000..4ac7b36
--- /dev/null
+++ b/qemu-bridge-helper.c
@@ -0,0 +1,205 @@
+/*
+ * QEMU Bridge Helper
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ * Anthony Liguori   <address@hidden>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "config-host.h"
+
+#include <stdio.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include <ctype.h>
+
+#include <sys/types.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/prctl.h>
+
+#include <net/if.h>
+
+#include <linux/sockios.h>
+
+#include "net/tap-linux.h"
+
+static int has_vnet_hdr(int fd)
+{
+    unsigned int features = 0;
+    struct ifreq ifreq;
+
+    if (ioctl(fd, TUNGETFEATURES, &features) == -1) {
+        return -errno;
+    }
+
+    if (!(features & IFF_VNET_HDR)) {
+        return -ENOTSUP;
+    }
+
+    if (ioctl(fd, TUNGETIFF, &ifreq) != -1 || errno != EBADFD) {
+        return -ENOTSUP;
+    }
+
+    return 1;
+}
+
+static void prep_ifreq(struct ifreq *ifr, const char *ifname)
+{
+    memset(ifr, 0, sizeof(*ifr));
+    snprintf(ifr->ifr_name, IFNAMSIZ, "%s", ifname);
+}
+
+static int send_fd(int c, int fd)
+{
+    char msgbuf[CMSG_SPACE(sizeof(fd))];
+    struct msghdr msg = {
+        .msg_control = msgbuf,
+        .msg_controllen = sizeof(msgbuf),
+    };
+    struct cmsghdr *cmsg;
+    struct iovec iov;
+    char req[1] = { 0x00 };
+
+    cmsg = CMSG_FIRSTHDR(&msg);
+    cmsg->cmsg_level = SOL_SOCKET;
+    cmsg->cmsg_type = SCM_RIGHTS;
+    cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
+    msg.msg_controllen = cmsg->cmsg_len;
+
+    iov.iov_base = req;
+    iov.iov_len = sizeof(req);
+
+    msg.msg_iov = &iov;
+    msg.msg_iovlen = 1;
+    memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
+
+    return sendmsg(c, &msg, 0);
+}
+
+int main(int argc, char **argv)
+{
+    struct ifreq ifr;
+    int fd, ctlfd, unixfd;
+    int use_vnet = 0;
+    int mtu;
+    const char *bridge;
+    char iface[IFNAMSIZ];
+    int index;
+
+    /* parse arguments */
+    if (argc < 3 || argc > 4) {
+        fprintf(stderr, "Usage: %s [--use-vnet] BRIDGE FD\n", argv[0]);
+        return 1;
+    }
+
+    index = 1;
+    if (strcmp(argv[index], "--use-vnet") == 0) {
+        use_vnet = 1;
+        index++;
+        if (argc == 3) {
+            fprintf(stderr, "invalid number of arguments\n");
+            return -1;
+        }
+    }
+
+    bridge = argv[index++];
+    unixfd = atoi(argv[index++]);
+
+    /* open a socket to use to control the network interfaces */
+    ctlfd = socket(AF_INET, SOCK_STREAM, 0);
+    if (ctlfd == -1) {
+        fprintf(stderr, "failed to open control socket\n");
+        return -errno;
+    }
+
+    /* open the tap device */
+    fd = open("/dev/net/tun", O_RDWR);
+    if (fd == -1) {
+        fprintf(stderr, "failed to open /dev/net/tun\n");
+        return -errno;
+    }
+
+    /* request a tap device, disable PI, and add vnet header support if
+     * requested and it's available. */
+    prep_ifreq(&ifr, "tap%d");
+    ifr.ifr_flags = IFF_TAP|IFF_NO_PI;
+    if (use_vnet && has_vnet_hdr(fd) > 0) {
+        ifr.ifr_flags |= IFF_VNET_HDR;
+    }
+
+    if (ioctl(fd, TUNSETIFF, &ifr) == -1) {
+        fprintf(stderr, "failed to create tun device\n");
+        return -errno;
+    }
+
+    /* save tap device name */
+    snprintf(iface, sizeof(iface), "%s", ifr.ifr_name);
+
+    /* get the mtu of the bridge */
+    prep_ifreq(&ifr, bridge);
+    if (ioctl(ctlfd, SIOCGIFMTU, &ifr) == -1) {
+        fprintf(stderr, "failed to get mtu of bridge `%s'\n", bridge);
+        return -errno;
+    }
+
+    /* save mtu */
+    mtu = ifr.ifr_mtu;
+
+    /* set the mtu of the interface based on the bridge */
+    prep_ifreq(&ifr, iface);
+    ifr.ifr_mtu = mtu;
+    if (ioctl(ctlfd, SIOCSIFMTU, &ifr) == -1) {
+        fprintf(stderr, "failed to set mtu of device `%s' to %d\n",
+                iface, mtu);
+        return -errno;
+    }
+
+    /* add the interface to the bridge */
+    prep_ifreq(&ifr, bridge);
+    ifr.ifr_ifindex = if_nametoindex(iface);
+
+    if (ioctl(ctlfd, SIOCBRADDIF, &ifr) == -1) {
+        fprintf(stderr, "failed to add interface `%s' to bridge `%s'\n",
+                iface, bridge);
+        return -errno;
+    }
+
+    /* bring the interface up */
+    prep_ifreq(&ifr, iface);
+    if (ioctl(ctlfd, SIOCGIFFLAGS, &ifr) == -1) {
+        fprintf(stderr, "failed to get interface flags for `%s'\n", iface);
+        return -errno;
+    }
+
+    ifr.ifr_flags |= IFF_UP;
+    if (ioctl(ctlfd, SIOCSIFFLAGS, &ifr) == -1) {
+        fprintf(stderr, "failed to bring up interface `%s'\n", iface);
+        return -errno;
+    }
+
+    /* write fd to the domain socket */
+    if (send_fd(unixfd, fd) == -1) {
+        fprintf(stderr, "failed to write fd to unix socket\n");
+        return -errno;
+    }
+
+    /* ... */
+
+    /* profit! */
+
+    close(fd);
+
+    close(ctlfd);
+
+    return 0;
+}
-- 
1.7.1


* [Qemu-devel] [PATCH 2/4] Add access control support to qemu-bridge-helper
  2011-10-06 15:38 [Qemu-devel] [PATCH 0/4] -net tap: rootless bridge support for qemu Richa Marwaha
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper Richa Marwaha
@ 2011-10-06 15:38 ` Richa Marwaha
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID Richa Marwaha
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 4/4] Add support for bridge Richa Marwaha
  3 siblings, 0 replies; 23+ messages in thread
From: Richa Marwaha @ 2011-10-06 15:38 UTC
  To: qemu-devel; +Cc: aliguori, coreyb, Richa Marwaha

We go to great lengths to restrict ourselves to just cap_net_admin as an OS
enforced security mechanism.  However, we further restrict what we allow users
to do to simply adding a tap device to a bridge interface by virtue of the fact
that this is the only functionality we expose.

This is not good enough though.  An administrator is likely to want to restrict
the bridges that an unprivileged user can access, in particular, to restrict
an unprivileged user from putting a guest on what should be isolated networks.

This patch implements an ACL mechanism that is enforced by qemu-bridge-helper.
The ACLs are fairly simple whitelist/blacklist mechanisms with a wildcard of
'all'.
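
As the parser in this patch reads them, rule lines consist of a command
(allow, deny, or include) followed by a single argument, with '#' starting a
comment line.  The splitting step can be sketched as follows; this is a
simplified illustration under that assumption, and split_acl_line() is a
hypothetical name, not a function from the patch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split one config line in place into a command and
 * its argument.
 * Returns 1 if both were found, 0 for a blank or comment line, and -1 if
 * the line has a command but no argument. */
static int split_acl_line(char *line, char **cmd, char **arg)
{
    char *end;

    /* skip leading whitespace */
    line += strspn(line, " \t\r\n");
    if (*line == '\0' || *line == '#') {
        return 0;
    }

    /* the command runs up to the first whitespace character */
    *cmd = line;
    line += strcspn(line, " \t\r\n");
    if (*line == '\0') {
        return -1;
    }
    *line++ = '\0';

    /* the argument is whatever follows, minus surrounding whitespace */
    line += strspn(line, " \t\r\n");
    if (*line == '\0') {
        return -1;
    }
    *arg = line;

    end = line + strlen(line);
    while (end > line && isspace((unsigned char)end[-1])) {
        end--;
    }
    *end = '\0';

    return 1;
}
```

A caller would then compare the command against "allow", "deny", and
"include" and treat anything else as an error, as the patch does.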

An interesting feature of this ACL mechanism is that you can include external
ACL files.  The main reason to support this is so that you can set different
file system permissions on those external ACL files.  This allows an
administrator to implement rather sophisticated ACL policies based on user/group
policies via the file system.

As an example:

/etc/qemu/bridge.conf root:qemu 0640

 deny all
 allow br0
 include /etc/qemu/alice.conf
 include /etc/qemu/bob.conf

/etc/qemu/alice.conf root:alice 0640
 allow br1

/etc/qemu/bob.conf root:bob 0640
 allow br2

This ACL pattern allows any user in the qemu group to get a tap device
connected to br0 (which is bridged to the physical network).

Users in the alice group can additionally get a tap device connected to br1.
This allows br1 to act as a private bridge for the alice group.

Users in the bob group can additionally get a tap device connected to br2.
This allows br2 to act as a private bridge for the bob group.

Under no circumstance can the bob group get access to br1 or can the alice
group get access to br2.
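
The evaluation rule used by the helper in this patch (default deny, and a
matching deny always beats a matching allow) can be sketched on its own.
The type and function names here are illustrative, not the ones in the
patch.

```c
#include <stddef.h>
#include <string.h>

enum acl_type { ACL_ALLOW, ACL_ALLOW_ALL, ACL_DENY, ACL_DENY_ALL };

struct acl_rule {
    enum acl_type type;
    const char *iface;      /* ignored for the *_ALL rule types */
};

/* Hypothetical helper: returns 1 if the bridge is permitted, 0 otherwise.
 * Every rule is examined; order does not matter.  Access needs at least
 * one matching allow and no matching deny. */
static int acl_permits(const struct acl_rule *rules, int count,
                       const char *bridge)
{
    int allowed = 0;
    int denied = 0;
    int i;

    for (i = 0; i < count; i++) {
        switch (rules[i].type) {
        case ACL_ALLOW_ALL:
            allowed = 1;
            break;
        case ACL_ALLOW:
            if (strcmp(bridge, rules[i].iface) == 0) {
                allowed = 1;
            }
            break;
        case ACL_DENY_ALL:
            denied = 1;
            break;
        case ACL_DENY:
            if (strcmp(bridge, rules[i].iface) == 0) {
                denied = 1;
            }
            break;
        }
    }

    return allowed && !denied;
}
```

One consequence worth checking against the example configuration: because a
matching deny wins regardless of position, a "deny all" rule appears to
reject every bridge under this logic, including ones named by a later allow.
Since the default with no matching rule is already deny, the example would
seem to behave as described only without its "deny all" line.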

Signed-off-by: Richa Marwaha <rmarwah@linux.vnet.ibm.com>
---
 qemu-bridge-helper.c |  141 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 141 insertions(+), 0 deletions(-)

diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c
index 4ac7b36..5e09fea 100644
--- a/qemu-bridge-helper.c
+++ b/qemu-bridge-helper.c
@@ -33,6 +33,105 @@
 
 #include "net/tap-linux.h"
 
+#define MAX_ACLS (128)
+#define DEFAULT_ACL_FILE CONFIG_QEMU_CONFDIR "/bridge.conf"
+
+enum {
+    ACL_ALLOW = 0,
+    ACL_ALLOW_ALL,
+    ACL_DENY,
+    ACL_DENY_ALL,
+};
+
+typedef struct ACLRule {
+    int type;
+    char iface[IFNAMSIZ];
+} ACLRule;
+
+static int parse_acl_file(const char *filename, ACLRule *acls, int *pacl_count)
+{
+    int acl_count = *pacl_count;
+    FILE *f;
+    char line[4096];
+
+    f = fopen(filename, "r");
+    if (f == NULL) {
+        return -1;
+    }
+
+    while (acl_count != MAX_ACLS &&
+            fgets(line, sizeof(line), f) != NULL) {
+        char *ptr = line;
+        char *cmd, *arg, *argend;
+
+        while (isspace(*ptr)) {
+            ptr++;
+        }
+
+        /* skip comments and empty lines */
+        if (*ptr == '#' || *ptr == 0) {
+            continue;
+        }
+
+        cmd = ptr;
+        arg = strchr(cmd, ' ');
+        if (arg == NULL) {
+            arg = strchr(cmd, '\t');
+        }
+
+        if (arg == NULL) {
+            fprintf(stderr, "Invalid config line:\n  %s\n", line);
+            fclose(f);
+            errno = EINVAL;
+            return -1;
+        }
+
+        *arg = 0;
+        arg++;
+        while (isspace(*arg)) {
+            arg++;
+        }
+
+        argend = arg + strlen(arg);
+        while (arg != argend && isspace(*(argend - 1))) {
+            argend--;
+        }
+        *argend = 0;
+
+        if (strcmp(cmd, "deny") == 0) {
+            if (strcmp(arg, "all") == 0) {
+                acls[acl_count].type = ACL_DENY_ALL;
+            } else {
+                acls[acl_count].type = ACL_DENY;
+                snprintf(acls[acl_count].iface, IFNAMSIZ, "%s", arg);
+            }
+            acl_count++;
+        } else if (strcmp(cmd, "allow") == 0) {
+            if (strcmp(arg, "all") == 0) {
+                acls[acl_count].type = ACL_ALLOW_ALL;
+            } else {
+                acls[acl_count].type = ACL_ALLOW;
+                snprintf(acls[acl_count].iface, IFNAMSIZ, "%s", arg);
+            }
+            acl_count++;
+        } else if (strcmp(cmd, "include") == 0) {
+            /* ignore errors */
+            parse_acl_file(arg, acls, &acl_count);
+        } else {
+            fprintf(stderr, "Unknown command `%s'\n", cmd);
+            fclose(f);
+            errno = EINVAL;
+            return -1;
+        }
+    }
+
+    *pacl_count = acl_count;
+
+    fclose(f);
+
+    return 0;
+}
+
 static int has_vnet_hdr(int fd)
 {
     unsigned int features = 0;
@@ -95,6 +194,9 @@ int main(int argc, char **argv)
     const char *bridge;
     char iface[IFNAMSIZ];
     int index;
+    ACLRule acls[MAX_ACLS];
+    int acl_count = 0;
+    int i, access_allowed, access_denied;
 
     /* parse arguments */
     if (argc < 3 || argc > 4) {
@@ -115,6 +217,45 @@ int main(int argc, char **argv)
     bridge = argv[index++];
     unixfd = atoi(argv[index++]);
 
+    /* parse default acl file */
+    if (parse_acl_file(DEFAULT_ACL_FILE, acls, &acl_count) == -1) {
+        fprintf(stderr, "failed to parse default acl file `%s'\n",
+                DEFAULT_ACL_FILE);
+        return -errno;
+    }
+
+    /* validate bridge against acl -- default policy is to deny;
+     * if both a deny rule and an allow rule match, deny always
+     * wins over allow
+     */
+    access_allowed = 0;
+    access_denied = 0;
+    for (i = 0; i < acl_count; i++) {
+        switch (acls[i].type) {
+        case ACL_ALLOW_ALL:
+            access_allowed = 1;
+            break;
+        case ACL_ALLOW:
+            if (strcmp(bridge, acls[i].iface) == 0) {
+                access_allowed = 1;
+            }
+            break;
+        case ACL_DENY_ALL:
+            access_denied = 1;
+            break;
+        case ACL_DENY:
+            if (strcmp(bridge, acls[i].iface) == 0) {
+                access_denied = 1;
+            }
+            break;
+        }
+    }
+
+    if ((access_allowed == 0) || (access_denied == 1)) {
+        fprintf(stderr, "access denied by acl file\n");
+        return -EPERM;
+    }
+
     /* open a socket to use to control the network interfaces */
     ctlfd = socket(AF_INET, SOCK_STREAM, 0);
     if (ctlfd == -1) {
-- 
1.7.1


* [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID
  2011-10-06 15:38 [Qemu-devel] [PATCH 0/4] -net tap: rootless bridge support for qemu Richa Marwaha
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper Richa Marwaha
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 2/4] Add access control support to qemu-bridge-helper Richa Marwaha
@ 2011-10-06 15:38 ` Richa Marwaha
  2011-10-06 16:34   ` Daniel P. Berrange
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 4/4] Add support for bridge Richa Marwaha
  3 siblings, 1 reply; 23+ messages in thread
From: Richa Marwaha @ 2011-10-06 15:38 UTC
  To: qemu-devel; +Cc: aliguori, coreyb, Richa Marwaha

The ideal way to use qemu-bridge-helper is to give it an fscap using:

 setcap cap_net_admin=ep qemu-bridge-helper

Unfortunately, most distros still do not have a mechanism to package files
with fscaps applied.  This means they'll have to SUID the qemu-bridge-helper
binary.

To improve security, use libcap to reduce our capability set to just
cap_net_admin, then reduce privileges down to the calling user.  This is
hopefully close to equivalent to fscap support from a security perspective.
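
The "reduce privileges down to the calling user" step can be sketched
without libcap.  This is an illustration only: drop_to_real_ids() is a
hypothetical name, and the capability handling done with libcap in the
patch is deliberately left out.  Unlike the patch, the sketch checks the
return values of setegid() and seteuid(), since a silent failure would
leave the process running with the elevated IDs.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch the effective group and user IDs to the
 * real (calling) user's IDs.  The group is changed first, because once
 * the effective UID is no longer 0 the process may not be allowed to
 * change its group.  Only the effective IDs change; the saved IDs are
 * left as they were.
 * Returns 0 on success, -1 on failure. */
static int drop_to_real_ids(void)
{
    if (setegid(getgid()) == -1) {
        return -1;
    }
    if (seteuid(getuid()) == -1) {
        return -1;
    }

    /* confirm that the change actually took effect */
    if (getegid() != getgid() || geteuid() != getuid()) {
        return -1;
    }

    return 0;
}
```

When the binary is not installed SUID, the real and effective IDs already
match and the function is effectively a no-op that returns 0.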

Signed-off-by: Richa Marwaha <rmarwah@linux.vnet.ibm.com>
---
 configure            |   34 ++++++++++++++++++++++++++++++
 qemu-bridge-helper.c |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 3e32834..f46e9b7 100755
--- a/configure
+++ b/configure
@@ -128,6 +128,7 @@ vnc_thread="no"
 xen=""
 xen_ctrl_version=""
 linux_aio=""
+cap=""
 attr=""
 xfs=""
 
@@ -653,6 +654,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-cap)  cap="no"
+  ;;
+  --enable-cap) cap="yes"
+  ;;
   --disable-spice) spice="no"
   ;;
   --enable-spice) spice="yes"
@@ -1032,6 +1037,8 @@ echo "  --disable-vde            disable support for vde network"
 echo "  --enable-vde             enable support for vde network"
 echo "  --disable-linux-aio      disable Linux AIO support"
 echo "  --enable-linux-aio       enable Linux AIO support"
+echo "  --disable-cap            disable libcap support"
+echo "  --enable-cap             enable libcap support"
 echo "  --disable-attr           disables attr and xattr support"
 echo "  --enable-attr            enable attr and xattr support"
 echo "  --disable-blobs          disable installing provided firmware blobs"
@@ -1638,6 +1645,29 @@ EOF
 fi
 
 ##########################################
+# cap library probe
+if test "$cap" != "no" ; then
+  cap_libs="-lcap"
+  cat > $TMPC << EOF
+#include <sys/capability.h>
+int main(void)
+{
+    cap_init();
+    return 0;
+}
+EOF
+  if compile_prog "" "$cap_libs" ; then
+    cap=yes
+    libs_tools="$cap_libs $libs_tools"
+  else
+    if test "$cap" = "yes" ; then
+      feature_not_found "cap"
+    fi
+    cap=no
+  fi
+fi
+
+##########################################
 # Sound support libraries probe
 
 audio_drv_probe()
@@ -2710,6 +2740,7 @@ echo "fdatasync         $fdatasync"
 echo "madvise           $madvise"
 echo "posix_madvise     $posix_madvise"
 echo "uuid support      $uuid"
+echo "libcap support    $cap"
 echo "vhost-net support $vhost_net"
 echo "Trace backend     $trace_backend"
 echo "Trace output file $trace_file-<pid>"
@@ -2821,6 +2852,9 @@ fi
 if test "$vde" = "yes" ; then
   echo "CONFIG_VDE=y" >> $config_host_mak
 fi
+if test "$cap" = "yes" ; then
+  echo "CONFIG_LIBCAP=y" >> $config_host_mak
+fi
 for card in $audio_card_list; do
     def=CONFIG_`echo $card | tr '[:lower:]' '[:upper:]'`
     echo "$def=y" >> $config_host_mak
diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c
index 5e09fea..b1519e0 100644
--- a/qemu-bridge-helper.c
+++ b/qemu-bridge-helper.c
@@ -33,6 +33,10 @@
 
 #include "net/tap-linux.h"
 
+#ifdef CONFIG_LIBCAP
+#include <sys/capability.h>
+#endif
+
 #define MAX_ACLS (128)
 #define DEFAULT_ACL_FILE CONFIG_QEMU_CONFDIR "/bridge.conf"
 
@@ -185,6 +189,47 @@ static int send_fd(int c, int fd)
     return sendmsg(c, &msg, 0);
 }
 
+#ifdef CONFIG_LIBCAP
+static int drop_privileges(void)
+{
+    cap_t cap;
+    cap_value_t new_caps[] = {CAP_NET_ADMIN};
+
+    cap = cap_init();
+
+    /* set capabilities to be permitted and inheritable.  we don't need the
+     * caps to be effective right now as they'll get reset when we seteuid
+     * anyway */
+    cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);
+    cap_set_flag(cap, CAP_INHERITABLE, 1, new_caps, CAP_SET);
+
+    if (cap_set_proc(cap) == -1) {
+        return -1;
+    }
+
+    cap_free(cap);
+
+    /* reduce our privileges to a normal user */
+    setegid(getgid());
+    seteuid(getuid());
+
+    cap = cap_init();
+
+    /* enable our capabilities.  we marked them as inheritable earlier
+     * which is what allows this to work. */
+    cap_set_flag(cap, CAP_EFFECTIVE, 1, new_caps, CAP_SET);
+    cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);
+
+    if (cap_set_proc(cap) == -1) {
+        return -1;
+    }
+
+    cap_free(cap);
+
+    return 0;
+}
+#endif
+
 int main(int argc, char **argv)
 {
     struct ifreq ifr;
@@ -198,6 +243,17 @@ int main(int argc, char **argv)
     int acl_count = 0;
     int i, access_allowed, access_denied;
 
+#ifdef CONFIG_LIBCAP
+    /* if we're run from an suid binary, immediately drop privileges preserving
+     * cap_net_admin */
+    if (geteuid() == 0 && getuid() != geteuid()) {
+        if (drop_privileges() == -1) {
+            fprintf(stderr, "failed to drop privileges\n");
+            return 1;
+        }
+    }
+#endif
+
     /* parse arguments */
     if (argc < 3 || argc > 4) {
         fprintf(stderr, "Usage: %s [--use-vnet] BRIDGE FD\n", argv[0]);
-- 
1.7.1


* [Qemu-devel] [PATCH 4/4] Add support for bridge
  2011-10-06 15:38 [Qemu-devel] [PATCH 0/4] -net tap: rootless bridge support for qemu Richa Marwaha
                   ` (2 preceding siblings ...)
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID Richa Marwaha
@ 2011-10-06 15:38 ` Richa Marwaha
  2011-10-06 17:49   ` Anthony Liguori
  3 siblings, 1 reply; 23+ messages in thread
From: Richa Marwaha @ 2011-10-06 15:38 UTC
  To: qemu-devel; +Cc: aliguori, coreyb, Richa Marwaha

The most common use of -net tap is to connect a tap device to a bridge.  This
requires the use of a script and running qemu as root in order to allocate a
tap device to pass to the script.

This model is great for portability and flexibility but it's incredibly
difficult to eliminate the need to run qemu as root.  The only really viable
mechanism is to use tunctl to create a tap device, attach it to a bridge as
root, and then hand that tap device to qemu.  The problem with this mechanism
is that it requires administrator intervention whenever a user wants to create
a guest.

By essentially writing a helper that implements the most common qemu-ifup
script that can be safely given cap_net_admin, we can dramatically simplify
things for non-privileged users.  We still support existing -net tap options
as a mechanism for advanced users and backwards compatibility.

Currently, this is very Linux centric but there's really no reason why it
couldn't be extended for other Unixes.

The default bridge that we attach to is qemubr0.  The thinking is that a distro
could preconfigure such an interface to allow out-of-the-box bridged networking.

Alternatively, if a user wants to use a different bridge, they can say:

  qemu -hda linux.img -net tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper
                     -net nic,model=virtio

Signed-off-by: Richa Marwaha <rmarwah@linux.vnet.ibm.com>
---
 configure       |    2 +
 net.c           |    8 +++
 net.h           |    2 +
 net/tap.c       |  150 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 qemu-options.hx |   48 +++++++++++++-----
 5 files changed, 190 insertions(+), 20 deletions(-)

diff --git a/configure b/configure
index f46e9b7..ef05954 100755
--- a/configure
+++ b/configure
@@ -2775,6 +2775,8 @@ echo "sysconfdir=$sysconfdir" >> $config_host_mak
 echo "docdir=$docdir" >> $config_host_mak
 echo "libexecdir=\${prefix}/libexec" >> $config_host_mak
 echo "confdir=$confdir" >> $config_host_mak
+echo "CONFIG_QEMU_SHAREDIR=\"$prefix$datasuffix\"" >> $config_host_mak
+echo "CONFIG_QEMU_HELPERDIR=\"$prefix/libexec\"" >> $config_host_mak
 
 case "$cpu" in
   i386|x86_64|alpha|cris|hppa|ia64|lm32|m68k|microblaze|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64|unicore32)
diff --git a/net.c b/net.c
index d05930c..4c3c551 100644
--- a/net.c
+++ b/net.c
@@ -956,6 +956,14 @@ static const struct {
                 .type = QEMU_OPT_STRING,
                 .help = "script to shut down the interface",
             }, {
+                .name = "br",
+                .type = QEMU_OPT_STRING,
+                .help = "bridge name",
+            }, {
+                .name = "helper",
+                .type = QEMU_OPT_STRING,
+                .help = "command to execute to configure bridge",
+            }, {
                 .name = "sndbuf",
                 .type = QEMU_OPT_SIZE,
                 .help = "send buffer limit"
diff --git a/net.h b/net.h
index 9f633f8..eeb19a7 100644
--- a/net.h
+++ b/net.h
@@ -174,6 +174,8 @@ int do_netdev_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup"
 #define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown"
+#define DEFAULT_BRIDGE_HELPER CONFIG_QEMU_HELPERDIR "/qemu-bridge-helper"
+#define DEFAULT_BRIDGE_INTERFACE "qemubr0"
 
 void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd);
 
diff --git a/net/tap.c b/net/tap.c
index 1f26dc9..74f103a 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -388,6 +388,108 @@ static int launch_script(const char *setup_script, const char *ifname, int fd)
     return -1;
 }
 
+static int recv_fd(int c)
+{
+    int fd;
+    uint8_t msgbuf[CMSG_SPACE(sizeof(fd))];
+    struct msghdr msg = {
+        .msg_control = msgbuf,
+        .msg_controllen = sizeof(msgbuf),
+    };
+    struct cmsghdr *cmsg;
+    struct iovec iov;
+    uint8_t req[1];
+    ssize_t len;
+
+    cmsg = CMSG_FIRSTHDR(&msg);
+    cmsg->cmsg_level = SOL_SOCKET;
+    cmsg->cmsg_type = SCM_RIGHTS;
+    cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
+    msg.msg_controllen = cmsg->cmsg_len;
+
+    iov.iov_base = req;
+    iov.iov_len = sizeof(req);
+
+    msg.msg_iov = &iov;
+    msg.msg_iovlen = 1;
+
+    len = recvmsg(c, &msg, 0);
+    if (len > 0) {
+        memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
+        return fd;
+    }
+
+    return len;
+}
+
+static int net_bridge_run_helper(const char *helper, const char *bridge)
+{
+    sigset_t oldmask, mask;
+    int pid, status;
+    char *args[5];
+    char **parg;
+    int sv[2];
+
+    sigemptyset(&mask);
+    sigaddset(&mask, SIGCHLD);
+    sigprocmask(SIG_BLOCK, &mask, &oldmask);
+
+    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+        return -1;
+    }
+
+    /* try to launch bridge helper */
+    pid = fork();
+    if (pid == 0) {
+        int open_max = sysconf(_SC_OPEN_MAX), i;
+        char buf[32];
+
+        snprintf(buf, sizeof(buf), "%d", sv[1]);
+
+        for (i = 0; i < open_max; i++) {
+            if (i != STDIN_FILENO &&
+                i != STDOUT_FILENO &&
+                i != STDERR_FILENO &&
+                i != sv[1]) {
+                close(i);
+            }
+        }
+        parg = args;
+        *parg++ = (char *)helper;
+        *parg++ = (char *)"--use-vnet";
+        *parg++ = (char *)bridge;
+        *parg++ = buf;
+        *parg++ = NULL;
+        execv(helper, args);
+        _exit(1);
+    } else if (pid > 0) {
+        int fd;
+
+        close(sv[1]);
+
+        do {
+            fd = recv_fd(sv[0]);
+        } while (fd == -1 && errno == EINTR);
+
+        close(sv[0]);
+
+        while (waitpid(pid, &status, 0) != pid) {
+            /* loop */
+        }
+        sigprocmask(SIG_SETMASK, &oldmask, NULL);
+        if (fd < 0) {
+            fprintf(stderr, "failed to recv file descriptor\n");
+            return -1;
+        }
+
+        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
+            return fd;
+        }
+    }
+    fprintf(stderr, "failed to launch bridge helper\n");
+    return -1;
+}
+
 static int net_tap_init(QemuOpts *opts, int *vnet_hdr)
 {
     int fd, vnet_hdr_required;
@@ -433,8 +535,11 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan
         if (qemu_opt_get(opts, "ifname") ||
             qemu_opt_get(opts, "script") ||
             qemu_opt_get(opts, "downscript") ||
-            qemu_opt_get(opts, "vnet_hdr")) {
-            error_report("ifname=, script=, downscript= and vnet_hdr= is invalid with fd=");
+            qemu_opt_get(opts, "vnet_hdr") ||
+            qemu_opt_get(opts, "br") ||
+            qemu_opt_get(opts, "helper")) {
+            error_report("ifname=, script=, downscript=, vnet_hdr=, "
+                         "br= and helper= are invalid with fd=");
             return -1;
         }
 
@@ -446,6 +551,37 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan
         fcntl(fd, F_SETFL, O_NONBLOCK);
 
         vnet_hdr = tap_probe_vnet_hdr(fd);
+    } else if (qemu_opt_get(opts, "helper")) {
+        if (qemu_opt_get(opts, "ifname") ||
+            qemu_opt_get(opts, "script") ||
+            qemu_opt_get(opts, "downscript")) {
+            error_report("ifname=, script= and downscript= "
+                         "are invalid with helper=");
+            return -1;
+        }
+
+        if (!qemu_opt_get(opts, "br")) {
+            qemu_opt_set(opts, "br", DEFAULT_BRIDGE_INTERFACE);
+        }
+
+        fd = net_bridge_run_helper(qemu_opt_get(opts, "helper"),
+                                   qemu_opt_get(opts, "br"));
+
+        fcntl(fd, F_SETFL, O_NONBLOCK);
+
+        vnet_hdr = tap_probe_vnet_hdr(fd);
+
+        s = net_tap_fd_init(vlan, "bridge", name, fd, vnet_hdr);
+
+        if (!s) {
+            close(fd);
+            return -1;
+        }
+
+        snprintf(s->nc.info_str, sizeof(s->nc.info_str),
+                "br=%s", qemu_opt_get(opts, "br"));
+
+        return 0;
     } else {
         if (!qemu_opt_get(opts, "script")) {
             qemu_opt_set(opts, "script", DEFAULT_NETWORK_SCRIPT);
@@ -459,12 +595,12 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan
         if (fd == -1) {
             return -1;
         }
-    }
 
-    s = net_tap_fd_init(vlan, "tap", name, fd, vnet_hdr);
-    if (!s) {
-        close(fd);
-        return -1;
+        s = net_tap_fd_init(vlan, "tap", name, fd, vnet_hdr);
+        if (!s) {
+            close(fd);
+            return -1;
+        }
     }
 
     if (tap_set_sndbuf(s->fd, opts) < 0) {
diff --git a/qemu-options.hx b/qemu-options.hx
index dfbabd0..ad4afa9 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1149,11 +1149,15 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
     "-net tap[,vlan=n][,name=str],ifname=name\n"
     "                connect the host TAP network interface to VLAN 'n'\n"
 #else
-    "-net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostforce=on|off]\n"
-    "                connect the host TAP network interface to VLAN 'n' and use the\n"
-    "                network scripts 'file' (default=" DEFAULT_NETWORK_SCRIPT ")\n"
-    "                and 'dfile' (default=" DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
+    "-net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,br=bridge][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostforce=on|off]\n"
+    "                connect the host TAP network interface to VLAN 'n' \n"
+    "                use network scripts 'file' (default=" DEFAULT_NETWORK_SCRIPT ")\n"
+    "                to configure it and 'dfile' (default=" DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
+    "                to deconfigure it.  This requires root privilege.\n"
     "                use '[down]script=no' to disable script execution\n"
+    "                use network helper 'helper' (default=" DEFAULT_BRIDGE_HELPER ") and\n"
+    "                use bridge 'br' (default=" DEFAULT_BRIDGE_INTERFACE ") to configure it. This\n"
+    "                does not require root privilege.\n"
     "                use 'fd=h' to connect to an already opened TAP interface\n"
     "                use 'sndbuf=nbytes' to limit the size of the send buffer (the\n"
     "                default is disabled 'sndbuf=0' to enable flow control set 'sndbuf=1048576')\n"
@@ -1322,26 +1326,44 @@ processed and applied to -net user. Mixing them with the new configuration
 syntax gives undefined results. Their use for new applications is discouraged
 as they will be removed from future versions.
 
-@item -net tap[,vlan=@var{n}][,name=@var{name}][,fd=@var{h}][,ifname=@var{name}] [,script=@var{file}][,downscript=@var{dfile}]
-Connect the host TAP network interface @var{name} to VLAN @var{n}, use
-the network script @var{file} to configure it and the network script
+@item -net tap[,vlan=@var{n}][,name=@var{name}][,fd=@var{h}][,ifname=@var{name}] [,script=@var{file}][,downscript=@var{dfile}][,br=@var{bridge}][,helper=@var{helper}]
+Connect the host TAP network interface @var{name} to VLAN @var{n}.
+
+Use the network script @var{file} to configure it and the network script
 @var{dfile} to deconfigure it. If @var{name} is not provided, the OS
-automatically provides one. @option{fd}=@var{h} can be used to specify
-the handle of an already opened host TAP interface. The default network
-configure script is @file{/etc/qemu-ifup} and the default network
-deconfigure script is @file{/etc/qemu-ifdown}. Use @option{script=no}
-or @option{downscript=no} to disable script execution. Example:
+automatically provides one. The default network configure script is
+@file{/etc/qemu-ifup} and the default network deconfigure script is
+@file{/etc/qemu-ifdown}. Use @option{script=no} or @option{downscript=no}
+to disable script execution.
+
+If running QEMU as an unprivileged user, use the network helper
+@var{helper} to configure the TAP interface. The default network
+bridge helper executable is @file{/usr/local/libexec/qemu-bridge-helper}
+and the default bridge interface is @file{qemubr0}.
+
+@option{fd}=@var{h} can be used to specify the handle of an already
+opened host TAP interface.
+
+Examples:
 
 @example
+#launch a QEMU instance with the default network script
 qemu linux.img -net nic -net tap
 @end example
 
-More complicated example (two NICs, each one connected to a TAP device)
 @example
+#launch a QEMU instance with two NICs, each one connected
+#to a TAP device
 qemu linux.img -net nic,vlan=0 -net tap,vlan=0,ifname=tap0 \
                -net nic,vlan=1 -net tap,vlan=1,ifname=tap1
 @end example
 
+@example
+#launch a QEMU instance with the default network helper to
+#connect a TAP device to bridge br0
+qemu linux.img -net nic -net tap,helper=/usr/local/libexec/qemu-bridge-helper,br=br0
+@end example
+
 @item -net socket[,vlan=@var{n}][,name=@var{name}][,fd=@var{h}] [,listen=[@var{host}]:@var{port}][,connect=@var{host}:@var{port}]
 
 Connect the VLAN @var{n} to a remote VLAN in another QEMU virtual
-- 
1.7.1


* Re: [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID Richa Marwaha
@ 2011-10-06 16:34   ` Daniel P. Berrange
  2011-10-06 17:42     ` Anthony Liguori
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel P. Berrange @ 2011-10-06 16:34 UTC (permalink / raw)
  To: Richa Marwaha; +Cc: aliguori, coreyb, qemu-devel

On Thu, Oct 06, 2011 at 11:38:27AM -0400, Richa Marwaha wrote:
> The ideal way to use qemu-bridge-helper is to give it an fscap of using:
> 
>  setcap cap_net_admin=ep qemu-bridge-helper
> 
> Unfortunately, most distros still do not have a mechanism to package files
> with fscaps applied.  This means they'll have to SUID the qemu-bridge-helper
> binary.
> 
> To improve security, use libcap to reduce our capability set to just
> cap_net_admin, then reduce privileges down to the calling user.  This is
> hopefully close to equivalent to fscap support from a security perspective.
> +#ifdef CONFIG_LIBCAP
> +static int drop_privileges(void)
> +{
> +    cap_t cap;
> +    cap_value_t new_caps[] = {CAP_NET_ADMIN};
> +
> +    cap = cap_init();

Check for NULL ?

> +
> +    /* set capabilities to be permitted and inheritable.  we don't need the
> +     * caps to be effective right now as they'll get reset when we seteuid
> +     * anyway */
> +    cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);
> +    cap_set_flag(cap, CAP_INHERITABLE, 1, new_caps, CAP_SET);

Check for failure ?

> +
> +    if (cap_set_proc(cap) == -1) {
> +        return -1;
> +    }
> +
> +    cap_free(cap);

Check for failure ?

> +
> +    /* reduce our privileges to a normal user */
> +    setegid(getgid());
> +    seteuid(getuid());

Check for failure ?

> +    cap = cap_init();

Check for NULL ?

> +
> +    /* enable the our capabilities.  we marked them as inheritable earlier
> +     * which is what allows this to work. */
> +    cap_set_flag(cap, CAP_EFFECTIVE, 1, new_caps, CAP_SET);
> +    cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);

Check for failure ?

> +
> +    if (cap_set_proc(cap) == -1) {
> +        return -1;
> +    }
> +
> +    cap_free(cap);

Check for failure ?

> +
> +    return 0;
> +}
> +#endif

It may seem like checking for failure on cap_free/cap_set_flag is
not required because they can only return EINVAL for invalid
args, but since this is missing the check for NULL on cap_init
you can actually see errors from those latter functions in an
OOM scenario.

I think I'd suggest not using libcap, instead try libcap-ng [1] whose
APIs are designed with safety in mind & result in much simpler and
clearer code:

eg, that entire function above can be expressed using capng with
something approximating:

     capng_clear(CAPNG_SELECT_BOTH);
     if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_NET_ADMIN) < 0)
         error(...);
     if (capng_change_id(getuid(), getgid(), CAPNG_DROP_SUPP_GRP | CAPNG_CLEAR_BOUNDING))
         error(...);
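
On the setegid()/seteuid() step in particular, a checked version might
look roughly like the sketch below. It is plain POSIX and, like the
patch, only changes the effective IDs; it does not touch capabilities or
supplementary groups. drop_to_real_ids() is a made-up name for
illustration, not something from the patch:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set the effective group and user IDs to the real
 * (calling) IDs, checking every step.  Returns 0 on success, -1 on error. */
int drop_to_real_ids(void)
{
    gid_t rgid = getgid();
    uid_t ruid = getuid();

    /* group first, then user: the same order as in the quoted patch */
    if (setegid(rgid) == -1) {
        fprintf(stderr, "setegid failed: %s\n", strerror(errno));
        return -1;
    }

    if (seteuid(ruid) == -1) {
        fprintf(stderr, "seteuid failed: %s\n", strerror(errno));
        return -1;
    }

    /* verify the result instead of trusting the return values alone */
    if (getegid() != rgid || geteuid() != ruid) {
        fprintf(stderr, "effective IDs were not changed as expected\n");
        return -1;
    }

    return 0;
}
```

A caller would treat a non-zero return as fatal and exit before doing
anything privileged.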


Regards,
Daniel

[1] http://people.redhat.com/sgrubb/libcap-ng/

-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


* Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper Richa Marwaha
@ 2011-10-06 16:41   ` Daniel P. Berrange
  2011-10-06 18:04     ` Anthony Liguori
  2011-10-06 17:44   ` Anthony Liguori
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel P. Berrange @ 2011-10-06 16:41 UTC (permalink / raw)
  To: Richa Marwaha; +Cc: aliguori, coreyb, qemu-devel

On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote:
> This patch adds a helper that can be used to create a tap device attached to
> a bridge device.  Since this helper is minimal in what it does, it can be
> given CAP_NET_ADMIN which allows qemu to avoid running as root while still
> satisfying the majority of what users tend to want to do with tap devices.
> 
> The way this all works is that qemu launches this helper passing a bridge
> name and the name of an inherited file descriptor.  The descriptor is one
> end of a socketpair() of domain sockets.  This domain socket is used to
> transmit a file descriptor of the opened tap device from the helper to qemu.
> 
> The helper can then exit and let qemu use the tap device.

When QEMU is run by libvirt, we generally like to use capng to
remove the ability for QEMU to run setuid programs at all. So
obviously it will struggle to run the qemu-bridge-helper binary
in such a scenario.

With the way you transmit the TAP device FD back to the caller,
it looks like libvirt itself could execute the qemu-bridge-helper
receiving the FD, and then pass the FD onto QEMU using the
traditional tap,fd=XX syntax.
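
To make that FD hand-off concrete, here is a rough, self-contained
sketch of passing a descriptor over a socketpair() with SCM_RIGHTS and
validating the control message on the receiving side. It is not the
patch's code: fd_roundtrip_demo() is a hypothetical name, a pipe stands
in for the TAP device FD, and a single process plays both the helper and
the caller:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one int, aligned for struct cmsghdr. */
union fd_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd over UNIX socket sock.  Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    union fd_ctl u;
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock.  Returns the new fd or -1. */
int recv_fd(int sock)
{
    union fd_ctl u;
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0 || (msg.msg_flags & MSG_CTRUNC)) {
        return -1;
    }

    /* only trust the payload if the kernel really attached one fd */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Pass the read end of a pipe across a socketpair and read through it. */
int fd_roundtrip_demo(void)
{
    int sv[2], p[2], got, ok;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -1;
    }
    if (pipe(p) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* "helper" side sends the pipe's read end; "caller" side receives it */
    got = (send_fd(sv[1], p[0]) == 0) ? recv_fd(sv[0]) : -1;

    /* data written to the pipe must be readable via the received fd */
    ok = got >= 0 &&
         write(p[1], "x", 1) == 1 &&
         read(got, &c, 1) == 1 &&
         c == 'x';

    if (got >= 0) {
        close(got);
    }
    close(p[0]);
    close(p[1]);
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

In the libvirt case the receiving side would hand the resulting
descriptor number to QEMU via tap,fd=XX rather than reading from it.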

The TAP device FD is only one FD we normally pass to QEMU. How about
support for vhost net ? Is it reasonable to ask the qemu-bridge-helper
to send back a vhost net FD also. Or indeed multiple vhost net FDs
when we get multiqueue NICs.  Should we expect the bridge helper to
be strictly limited to just connecting a TAP dev to a bridge, or is
the expectation that it will grow more & more functionality over
time ?

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


* Re: [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID
  2011-10-06 16:34   ` Daniel P. Berrange
@ 2011-10-06 17:42     ` Anthony Liguori
  2011-10-06 18:05       ` Corey Bryant
  2011-10-06 18:08       ` Corey Bryant
  0 siblings, 2 replies; 23+ messages in thread
From: Anthony Liguori @ 2011-10-06 17:42 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Richa Marwaha, coreyb, qemu-devel

On 10/06/2011 11:34 AM, Daniel P. Berrange wrote:
> On Thu, Oct 06, 2011 at 11:38:27AM -0400, Richa Marwaha wrote:
>> The ideal way to use qemu-bridge-helper is to give it an fscap of using:
>>
>>   setcap cap_net_admin=ep qemu-bridge-helper
>>
>> Unfortunately, most distros still do not have a mechanism to package files
>> with fscaps applied.  This means they'll have to SUID the qemu-bridge-helper
>> binary.
>>
>> To improve security, use libcap to reduce our capability set to just
>> cap_net_admin, then reduce privileges down to the calling user.  This is
>> hopefully close to equivalent to fscap support from a security perspective.
>> +#ifdef CONFIG_LIBCAP
>> +static int drop_privileges(void)
>> +{
>> +    cap_t cap;
>> +    cap_value_t new_caps[] = {CAP_NET_ADMIN};
>> +
>> +    cap = cap_init();
>
> Check for NULL ?
>
>> +
>> +    /* set capabilities to be permitted and inheritable.  we don't need the
>> +     * caps to be effective right now as they'll get reset when we seteuid
>> +     * anyway */
>> +    cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);
>> +    cap_set_flag(cap, CAP_INHERITABLE, 1, new_caps, CAP_SET);
>
> Check for failure ?
>
>> +
>> +    if (cap_set_proc(cap) == -1) {
>> +        return -1;
>> +    }
>> +
>> +    cap_free(cap);
>
> Check for failure ?
>
>> +
>> +    /* reduce our privileges to a normal user */
>> +    setegid(getgid());
>> +    seteuid(getuid());
>
> Check for failure ?
>
>> +    cap = cap_init();
>
> Check for NULL ?
>
>> +
>> +    /* enable the our capabilities.  we marked them as inheritable earlier
>> +     * which is what allows this to work. */
>> +    cap_set_flag(cap, CAP_EFFECTIVE, 1, new_caps, CAP_SET);
>> +    cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);
>
> Check for failure ?
>
>> +
>> +    if (cap_set_proc(cap) == -1) {
>> +        return -1;
>> +    }
>> +
>> +    cap_free(cap);
>
> Check for failure ?
>
>> +
>> +    return 0;
>> +}
>> +#endif
>
> It may seem like checking for failure on cap_free/cap_set_flag is
> not required because they can only return EINVAL for invalid
> args, but since this is missing the check for NULL on cap_init
> you can actually see errors from those latter functions in an
> OOM scenario.
>
> I think I'd suggest not using libcap, instead try libcap-ng [1] whose
> APIs are designed with safety in mind & result in much simpler and
> clearer code:
>
> eg, that entire function above can be expressed using capng with
> something approximating:
>
>       capng_clear(CAPNG_SELECT_BOTH);
>       if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_NET_ADMIN) < 0)
>           error(...);
>       if (capng_change_id(getuid(), getgid(), CAPNG_DROP_SUPP_GRP | CAPNG_CLEAR_BOUNDING))
>           error(...);

Ah, libcap-ng didn't exist when the code was initially written but I agree, it 
looks like a nice library.

Regards,

Anthony Liguori

>
>
> Regards,
> Daniel
>
> [1] http://people.redhat.com/sgrubb/libcap-ng/
>


* Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper Richa Marwaha
  2011-10-06 16:41   ` Daniel P. Berrange
@ 2011-10-06 17:44   ` Anthony Liguori
  2011-10-06 18:10     ` Corey Bryant
  1 sibling, 1 reply; 23+ messages in thread
From: Anthony Liguori @ 2011-10-06 17:44 UTC (permalink / raw)
  To: Richa Marwaha; +Cc: coreyb, qemu-devel

On 10/06/2011 10:38 AM, Richa Marwaha wrote:
> This patch adds a helper that can be used to create a tap device attached to
> a bridge device.  Since this helper is minimal in what it does, it can be
> given CAP_NET_ADMIN which allows qemu to avoid running as root while still
> satisfying the majority of what users tend to want to do with tap devices.
>
> The way this all works is that qemu launches this helper passing a bridge
> name and the name of an inherited file descriptor.  The descriptor is one
> end of a socketpair() of domain sockets.  This domain socket is used to
> transmit a file descriptor of the opened tap device from the helper to qemu.
>
> The helper can then exit and let qemu use the tap device.
>
> Signed-off-by: Richa Marwaha <rmarwah@linux.vnet.ibm.com>
> ---
>   Makefile             |   12 +++-
>   configure            |    1 +
>   qemu-bridge-helper.c |  205 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 216 insertions(+), 2 deletions(-)
>   create mode 100644 qemu-bridge-helper.c
>
> diff --git a/Makefile b/Makefile
> index 6ed3194..f2caedc 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -34,6 +34,8 @@ $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw)
>
>   LIBS+=-lz $(LIBS_TOOLS)
>
> +HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)
> +
>   ifdef BUILD_DOCS
>   DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 QMP/qmp-commands.txt
>   else
> @@ -74,7 +76,7 @@ defconfig:
>
>   -include config-all-devices.mak
>
> -build-all: $(DOCS) $(TOOLS) recurse-all
> +build-all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all
>
>   config-host.h: config-host.h-timestamp
>   config-host.h-timestamp: config-host.mak
> @@ -151,6 +153,8 @@ qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-ob
>
>   qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) qemu-timer-common.o
>
> +qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o
> +
>   qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
>   	$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN   $@")
>
> @@ -208,7 +212,7 @@ clean:
>   # avoid old build problems by removing potentially incorrect old files
>   	rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h gen-op-arm.h
>   	rm -f qemu-options.def
> -	rm -f *.o *.d *.a *.lo $(TOOLS) qemu-ga TAGS cscope.* *.pod *~ */*~
> +	rm -f *.o *.d *.a *.lo $(TOOLS) $(HELPERS-y) qemu-ga TAGS cscope.* *.pod *~ */*~
>   	rm -Rf .libs
>   	rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d qapi/*.o qapi/*.d qga/*.o qga/*.d
>   	rm -f qemu-img-cmds.h
> @@ -275,6 +279,10 @@ install: all $(if $(BUILD_DOCS),install-doc) install-sysconfig
>   ifneq ($(TOOLS),)
>   	$(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) "$(DESTDIR)$(bindir)"
>   endif
> +ifneq ($(HELPERS-y),)
> +	$(INSTALL_DIR) "$(DESTDIR)$(libexecdir)"
> +	$(INSTALL_PROG) $(STRIP_OPT) $(HELPERS-y) "$(DESTDIR)$(libexecdir)"
> +endif
>   ifneq ($(BLOBS),)
>   	$(INSTALL_DIR) "$(DESTDIR)$(datadir)"
>   	set -e; for x in $(BLOBS); do \
> diff --git a/configure b/configure
> index 59b1494..3e32834 100755
> --- a/configure
> +++ b/configure
> @@ -2742,6 +2742,7 @@ echo "mandir=$mandir" >> $config_host_mak
>   echo "datadir=$datadir" >> $config_host_mak
>   echo "sysconfdir=$sysconfdir" >> $config_host_mak
>   echo "docdir=$docdir" >> $config_host_mak
> +echo "libexecdir=\${prefix}/libexec" >> $config_host_mak
>   echo "confdir=$confdir" >> $config_host_mak
>
>   case "$cpu" in
> diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c
> new file mode 100644
> index 0000000..4ac7b36
> --- /dev/null
> +++ b/qemu-bridge-helper.c
> @@ -0,0 +1,205 @@
> +/*
> + * QEMU Bridge Helper
> + *
> + * Copyright IBM, Corp. 2011
> + *
> + * Authors:
> + * Anthony Liguori<address@hidden>

Heh, fairly sure that's not my email address ;-)

> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "config-host.h"
> +
> +#include <stdio.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <string.h>
> +#include <stdlib.h>
> +#include <ctype.h>
> +
> +#include <sys/types.h>
> +#include <sys/ioctl.h>
> +#include <sys/socket.h>
> +#include <sys/un.h>
> +#include <sys/prctl.h>
> +
> +#include <net/if.h>
> +
> +#include <linux/sockios.h>
> +
> +#include "net/tap-linux.h"
> +
> +static int has_vnet_hdr(int fd)
> +{
> +    unsigned int features = 0;
> +    struct ifreq ifreq;
> +
> +    if (ioctl(fd, TUNGETFEATURES, &features) == -1) {
> +        return -errno;
> +    }
> +
> +    if (!(features & IFF_VNET_HDR)) {
> +        return -ENOTSUP;
> +    }
> +
> +    if (ioctl(fd, TUNGETIFF, &ifreq) != -1 || errno != EBADFD) {
> +        return -ENOTSUP;
> +    }
> +
> +    return 1;
> +}
> +
> +static void prep_ifreq(struct ifreq *ifr, const char *ifname)
> +{
> +    memset(ifr, 0, sizeof(*ifr));
> +    snprintf(ifr->ifr_name, IFNAMSIZ, "%s", ifname);
> +}
> +
> +static int send_fd(int c, int fd)
> +{
> +    char msgbuf[CMSG_SPACE(sizeof(fd))];
> +    struct msghdr msg = {
> +        .msg_control = msgbuf,
> +        .msg_controllen = sizeof(msgbuf),
> +    };
> +    struct cmsghdr *cmsg;
> +    struct iovec iov;
> +    char req[1] = { 0x00 };
> +
> +    cmsg = CMSG_FIRSTHDR(&msg);
> +    cmsg->cmsg_level = SOL_SOCKET;
> +    cmsg->cmsg_type = SCM_RIGHTS;
> +    cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
> +    msg.msg_controllen = cmsg->cmsg_len;
> +
> +    iov.iov_base = req;
> +    iov.iov_len = sizeof(req);
> +
> +    msg.msg_iov = &iov;
> +    msg.msg_iovlen = 1;
> +    memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
> +
> +    return sendmsg(c, &msg, 0);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +    struct ifreq ifr;
> +    int fd, ctlfd, unixfd;
> +    int use_vnet = 0;
> +    int mtu;
> +    const char *bridge;
> +    char iface[IFNAMSIZ];
> +    int index;
> +
> +    /* parse arguments */
> +    if (argc < 3 || argc > 4) {
> +        fprintf(stderr, "Usage: %s [--use-vnet] BRIDGE FD\n", argv[0]);
> +        return 1;
> +    }
> +
> +    index = 1;
> +    if (strcmp(argv[index], "--use-vnet") == 0) {
> +        use_vnet = 1;
> +        index++;
> +        if (argc == 3) {
> +            fprintf(stderr, "invalid number of arguments\n");
> +            return -1;
> +        }
> +    }
> +
> +    bridge = argv[index++];
> +    unixfd = atoi(argv[index++]);
> +
> +    /* open a socket to use to control the network interfaces */
> +    ctlfd = socket(AF_INET, SOCK_STREAM, 0);
> +    if (ctlfd == -1) {
> +        fprintf(stderr, "failed to open control socket\n");
> +        return -errno;
> +    }
> +
> +    /* open the tap device */
> +    fd = open("/dev/net/tun", O_RDWR);
> +    if (fd == -1) {
> +        fprintf(stderr, "failed to open /dev/net/tun\n");
> +        return -errno;
> +    }
> +
> +    /* request a tap device, disable PI, and add vnet header support if
> +     * requested and it's available. */
> +    prep_ifreq(&ifr, "tap%d");
> +    ifr.ifr_flags = IFF_TAP|IFF_NO_PI;
> +    if (use_vnet && has_vnet_hdr(fd)) {
> +        ifr.ifr_flags |= IFF_VNET_HDR;
> +    }
> +
> +    if (ioctl(fd, TUNSETIFF, &ifr) == -1) {
> +        fprintf(stderr, "failed to create tun device\n");
> +        return -errno;
> +    }
> +
> +    /* save tap device name */
> +    snprintf(iface, sizeof(iface), "%s", ifr.ifr_name);
> +
> +    /* get the mtu of the bridge */
> +    prep_ifreq(&ifr, bridge);
> +    if (ioctl(ctlfd, SIOCGIFMTU, &ifr) == -1) {
> +        fprintf(stderr, "failed to get mtu of bridge `%s'\n", bridge);
> +        return -errno;
> +    }
> +
> +    /* save mtu */
> +    mtu = ifr.ifr_mtu;
> +
> +    /* set the mtu of the interface based on the bridge */
> +    prep_ifreq(&ifr, iface);
> +    ifr.ifr_mtu = mtu;
> +    if (ioctl(ctlfd, SIOCSIFMTU, &ifr) == -1) {
> +        fprintf(stderr, "failed to set mtu of device `%s' to %d\n",
> +                iface, mtu);
> +        return -errno;
> +    }
> +
> +    /* add the interface to the bridge */
> +    prep_ifreq(&ifr, bridge);
> +    ifr.ifr_ifindex = if_nametoindex(iface);
> +
> +    if (ioctl(ctlfd, SIOCBRADDIF, &ifr) == -1) {
> +        fprintf(stderr, "failed to add interface `%s' to bridge `%s'\n",
> +                iface, bridge);
> +        return -errno;
> +    }
> +
> +    /* bring the interface up */
> +    prep_ifreq(&ifr, iface);
> +    if (ioctl(ctlfd, SIOCGIFFLAGS, &ifr) == -1) {
> +        fprintf(stderr, "failed to get interface flags for `%s'\n", iface);
> +        return -errno;
> +    }
> +
> +    ifr.ifr_flags |= IFF_UP;
> +    if (ioctl(ctlfd, SIOCSIFFLAGS, &ifr) == -1) {
> +        fprintf(stderr, "failed to set bring up interface `%s'\n", iface);
> +        return -errno;
> +    }
> +
> +    /* write fd to the domain socket */
> +    if (send_fd(unixfd, fd) == -1) {
> +        fprintf(stderr, "failed to write fd to unix socket\n");
> +        return -errno;
> +    }
> +
> +    /* ... */
> +
> +    /* profit! */

Sold!

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

Please put my SoB before yours in the next submission.

Regards,

Anthony Liguori

> +
> +    close(fd);
> +
> +    close(ctlfd);
> +
> +    return 0;
> +}


* Re: [Qemu-devel] [PATCH 4/4] Add support for bridge
  2011-10-06 15:38 ` [Qemu-devel] [PATCH 4/4] Add support for bridge Richa Marwaha
@ 2011-10-06 17:49   ` Anthony Liguori
  2011-10-06 18:15     ` Corey Bryant
  0 siblings, 1 reply; 23+ messages in thread
From: Anthony Liguori @ 2011-10-06 17:49 UTC (permalink / raw)
  To: Richa Marwaha; +Cc: coreyb, qemu-devel

On 10/06/2011 10:38 AM, Richa Marwaha wrote:
> The most common use of -net tap is to connect a tap device to a bridge.  This
> requires the use of a script and running qemu as root in order to allocate a
> tap device to pass to the script.
>
> This model is great for portability and flexibility but it's incredibly
> difficult to eliminate the need to run qemu as root.  The only really viable
> mechanism is to use tunctl to create a tap device, attach it to a bridge as
> root, and then hand that tap device to qemu.  The problem with this mechanism
> is that it requires administrator intervention whenever a user wants to create
> a guest.
>
> By essentially writing a helper that implements the most common qemu-ifup
> script that can be safely given cap_net_admin, we can dramatically simplify
> things for non-privileged users.  We still support existing -net tap options
> as a mechanism for advanced users and backwards compatibility.
>
> Currently, this is very Linux centric but there's really no reason why it
> couldn't be extended for other Unixes.
>
> The default bridge that we attach to is qemubr0.  The thinking is that a distro
> could preconfigure such an interface to allow out-of-the-box bridged networking.
>
> Alternatively, if a user wants to use a different bridge, they can say:
>
>    qemu -hda linux.img -net tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper \
>                        -net nic,model=virtio


Wouldn't it be better to make the syntax:

-net bridge[,br=BRIDGE][,helper=HELPER]

And default BRIDGE to br0 and HELPER to ${prefix}/libexec/qemu-bridge-helper ?

That gives distros a proper way to configure a default bridge making -net bridge 
Just Work for most people.
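
A minimal sketch of that defaulting, assuming a hypothetical
struct bridge_opts and assuming prefix=/usr/local (neither is taken from
the patch):

```c
#include <stddef.h>

/* Assumed defaults for illustration only: "br0" as suggested above, and
 * ${prefix}/libexec/qemu-bridge-helper with prefix=/usr/local. */
#define SKETCH_DEFAULT_BRIDGE "br0"
#define SKETCH_DEFAULT_HELPER "/usr/local/libexec/qemu-bridge-helper"

/* Hypothetical container for the two -net bridge options; NULL means
 * the user did not specify the option. */
struct bridge_opts {
    const char *br;
    const char *helper;
};

/* Fill in the default for every option that was left unset. */
void bridge_opts_apply_defaults(struct bridge_opts *o)
{
    if (!o->br) {
        o->br = SKETCH_DEFAULT_BRIDGE;
    }
    if (!o->helper) {
        o->helper = SKETCH_DEFAULT_HELPER;
    }
}
```

With both fields NULL this would resolve a bare "-net bridge" to br0 and
the installed helper; anything the user did specify is left alone.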

Regards,

Anthony Liguori

>
> Signed-off-by: Richa Marwaha <rmarwah@linux.vnet.ibm.com>
> ---
>   configure       |    2 +
>   net.c           |    8 +++
>   net.h           |    2 +
>   net/tap.c       |  150 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>   qemu-options.hx |   48 +++++++++++++-----
>   5 files changed, 190 insertions(+), 20 deletions(-)
>
> diff --git a/configure b/configure
> index f46e9b7..ef05954 100755
> --- a/configure
> +++ b/configure
> @@ -2775,6 +2775,8 @@ echo "sysconfdir=$sysconfdir" >> $config_host_mak
>   echo "docdir=$docdir" >> $config_host_mak
>   echo "libexecdir=\${prefix}/libexec" >> $config_host_mak
>   echo "confdir=$confdir" >> $config_host_mak
> +echo "CONFIG_QEMU_SHAREDIR=\"$prefix$datasuffix\"" >> $config_host_mak
> +echo "CONFIG_QEMU_HELPERDIR=\"$prefix/libexec\"" >> $config_host_mak
>
>   case "$cpu" in
>     i386|x86_64|alpha|cris|hppa|ia64|lm32|m68k|microblaze|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64|unicore32)
> diff --git a/net.c b/net.c
> index d05930c..4c3c551 100644
> --- a/net.c
> +++ b/net.c
> @@ -956,6 +956,14 @@ static const struct {
>                   .type = QEMU_OPT_STRING,
>                   .help = "script to shut down the interface",
>               }, {
> +                .name = "br",
> +                .type = QEMU_OPT_STRING,
> +                .help = "bridge name",
> +            }, {
> +                .name = "helper",
> +                .type = QEMU_OPT_STRING,
> +                .help = "command to execute to configure bridge",
> +            }, {
>                   .name = "sndbuf",
>                   .type = QEMU_OPT_SIZE,
>                   .help = "send buffer limit"
> diff --git a/net.h b/net.h
> index 9f633f8..eeb19a7 100644
> --- a/net.h
> +++ b/net.h
> @@ -174,6 +174,8 @@ int do_netdev_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
>
>   #define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup"
>   #define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown"
> +#define DEFAULT_BRIDGE_HELPER CONFIG_QEMU_HELPERDIR "/qemu-bridge-helper"
> +#define DEFAULT_BRIDGE_INTERFACE "qemubr0"
>
>   void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd);
>
> diff --git a/net/tap.c b/net/tap.c
> index 1f26dc9..74f103a 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -388,6 +388,108 @@ static int launch_script(const char *setup_script, const char *ifname, int fd)
>       return -1;
>   }
>
> +static int recv_fd(int c)
> +{
> +    int fd;
> +    uint8_t msgbuf[CMSG_SPACE(sizeof(fd))];
> +    struct msghdr msg = {
> +        .msg_control = msgbuf,
> +        .msg_controllen = sizeof(msgbuf),
> +    };
> +    struct cmsghdr *cmsg;
> +    struct iovec iov;
> +    uint8_t req[1];
> +    ssize_t len;
> +
> +    cmsg = CMSG_FIRSTHDR(&msg);
> +    cmsg->cmsg_level = SOL_SOCKET;
> +    cmsg->cmsg_type = SCM_RIGHTS;
> +    cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
> +    msg.msg_controllen = cmsg->cmsg_len;
> +
> +    iov.iov_base = req;
> +    iov.iov_len = sizeof(req);
> +
> +    msg.msg_iov = &iov;
> +    msg.msg_iovlen = 1;
> +
> +    len = recvmsg(c, &msg, 0);
> +    if (len > 0) {
> +        memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
> +        return fd;
> +    }
> +
> +    return len;
> +}
> +
> +static int net_bridge_run_helper(const char *helper, const char *bridge)
> +{
> +    sigset_t oldmask, mask;
> +    int pid, status;
> +    char *args[5];
> +    char **parg;
> +    int sv[2];
> +
> +    sigemptyset(&mask);
> +    sigaddset(&mask, SIGCHLD);
> +    sigprocmask(SIG_BLOCK, &mask, &oldmask);
> +
> +    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
> +        return -1;
> +    }
> +
> +    /* try to launch bridge helper */
> +    pid = fork();
> +    if (pid == 0) {
> +        int open_max = sysconf(_SC_OPEN_MAX), i;
> +        char buf[32];
> +
> +        snprintf(buf, sizeof(buf), "%d", sv[1]);
> +
> +        for (i = 0; i < open_max; i++) {
> +            if (i != STDIN_FILENO &&
> +                i != STDOUT_FILENO &&
> +                i != STDERR_FILENO &&
> +                i != sv[1]) {
> +                close(i);
> +            }
> +        }
> +        parg = args;
> +        *parg++ = (char *)helper;
> +        *parg++ = (char *)"--use-vnet";
> +        *parg++ = (char *)bridge;
> +        *parg++ = buf;
> +        *parg++ = NULL;
> +        execv(helper, args);
> +        _exit(1);
> +    } else if (pid>  0) {
> +        int fd;
> +
> +        close(sv[1]);
> +
> +        do {
> +            fd = recv_fd(sv[0]);
> +        } while (fd == -1 && errno == EINTR);
> +
> +        close(sv[0]);
> +
> +        while (waitpid(pid, &status, 0) != pid) {
> +            /* loop */
> +        }
> +        sigprocmask(SIG_SETMASK, &oldmask, NULL);
> +        if (fd < 0) {
> +            fprintf(stderr, "failed to recv file descriptor\n");
> +            return -1;
> +        }
> +
> +        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
> +            return fd;
> +        }
> +    }
> +    fprintf(stderr, "failed to launch bridge helper\n");
> +    return -1;
> +}
> +
>   static int net_tap_init(QemuOpts *opts, int *vnet_hdr)
>   {
>       int fd, vnet_hdr_required;
> @@ -433,8 +535,11 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan
>           if (qemu_opt_get(opts, "ifname") ||
>               qemu_opt_get(opts, "script") ||
>               qemu_opt_get(opts, "downscript") ||
> -            qemu_opt_get(opts, "vnet_hdr")) {
> -            error_report("ifname=, script=, downscript= and vnet_hdr= is invalid with fd=");
> +            qemu_opt_get(opts, "vnet_hdr") ||
> +            qemu_opt_get(opts, "br") ||
> +            qemu_opt_get(opts, "helper")) {
> +            error_report("ifname=, script=, downscript=, vnet_hdr=, "
> +                         "br= and helper= are invalid with fd=");
>               return -1;
>           }
>
> @@ -446,6 +551,37 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan
>           fcntl(fd, F_SETFL, O_NONBLOCK);
>
>           vnet_hdr = tap_probe_vnet_hdr(fd);
> +    } else if (qemu_opt_get(opts, "helper")) {
> +        if (qemu_opt_get(opts, "ifname") ||
> +            qemu_opt_get(opts, "script") ||
> +            qemu_opt_get(opts, "downscript")) {
> +            error_report("ifname=, script= and downscript= "
> +                         "are invalid with helper=");
> +            return -1;
> +        }
> +
> +        if (!qemu_opt_get(opts, "br")) {
> +            qemu_opt_set(opts, "br", DEFAULT_BRIDGE_INTERFACE);
> +        }
> +
> +        fd = net_bridge_run_helper(qemu_opt_get(opts, "helper"),
> +                                   qemu_opt_get(opts, "br"));
> +
> +        fcntl(fd, F_SETFL, O_NONBLOCK);
> +
> +        vnet_hdr = tap_probe_vnet_hdr(fd);
> +
> +        s = net_tap_fd_init(vlan, "bridge", name, fd, vnet_hdr);
> +
> +        if (!s) {
> +            close(fd);
> +            return -1;
> +        }
> +
> +        snprintf(s->nc.info_str, sizeof(s->nc.info_str),
> +                "br=%s", qemu_opt_get(opts, "br"));
> +
> +        return 0;
>       } else {
>           if (!qemu_opt_get(opts, "script")) {
>               qemu_opt_set(opts, "script", DEFAULT_NETWORK_SCRIPT);
> @@ -459,12 +595,12 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char *name, VLANState *vlan
>           if (fd == -1) {
>               return -1;
>           }
> -    }
>
> -    s = net_tap_fd_init(vlan, "tap", name, fd, vnet_hdr);
> -    if (!s) {
> -        close(fd);
> -        return -1;
> +        s = net_tap_fd_init(vlan, "tap", name, fd, vnet_hdr);
> +        if (!s) {
> +            close(fd);
> +            return -1;
> +        }
>       }
>
>       if (tap_set_sndbuf(s->fd, opts)<  0) {
> diff --git a/qemu-options.hx b/qemu-options.hx
> index dfbabd0..ad4afa9 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1149,11 +1149,15 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
>       "-net tap[,vlan=n][,name=str],ifname=name\n"
>       "                connect the host TAP network interface to VLAN 'n'\n"
>   #else
> -    "-net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostforce=on|off]\n"
> -    "                connect the host TAP network interface to VLAN 'n' and use the\n"
> -    "                network scripts 'file' (default=" DEFAULT_NETWORK_SCRIPT ")\n"
> -    "                and 'dfile' (default=" DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
> +    "-net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,br=bridge][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostforce=on|off]\n"
> +    "                connect the host TAP network interface to VLAN 'n' \n"
> +    "                use network scripts 'file' (default=" DEFAULT_NETWORK_SCRIPT ")\n"
> +    "                to configure it and 'dfile' (default=" DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
> +    "                to deconfigure it.  This requires root privilege.\n"
>       "                use '[down]script=no' to disable script execution\n"
> +    "                use network helper 'helper' (default=" DEFAULT_BRIDGE_HELPER ") and\n"
> +    "                use bridge 'br' (default=" DEFAULT_BRIDGE_INTERFACE ") to configure it. This\n"
> +    "                does not require root privilege.\n"
>       "                use 'fd=h' to connect to an already opened TAP interface\n"
>       "                use 'sndbuf=nbytes' to limit the size of the send buffer (the\n"
>       "                default is disabled 'sndbuf=0' to enable flow control set 'sndbuf=1048576')\n"
> @@ -1322,26 +1326,44 @@ processed and applied to -net user. Mixing them with the new configuration
>   syntax gives undefined results. Their use for new applications is discouraged
>   as they will be removed from future versions.
>
> -@item -net tap[,vlan=@var{n}][,name=@var{name}][,fd=@var{h}][,ifname=@var{name}] [,script=@var{file}][,downscript=@var{dfile}]
> -Connect the host TAP network interface @var{name} to VLAN @var{n}, use
> -the network script @var{file} to configure it and the network script
> +@item -net tap[,vlan=@var{n}][,name=@var{name}][,fd=@var{h}][,ifname=@var{name}] [,script=@var{file}][,downscript=@var{dfile}][,br=@var{bridge}][,helper=@var{helper}]
> +Connect the host TAP network interface @var{name} to VLAN @var{n}.
> +
> +Use the network script @var{file} to configure it and the network script
>   @var{dfile} to deconfigure it. If @var{name} is not provided, the OS
> -automatically provides one. @option{fd}=@var{h} can be used to specify
> -the handle of an already opened host TAP interface. The default network
> -configure script is @file{/etc/qemu-ifup} and the default network
> -deconfigure script is @file{/etc/qemu-ifdown}. Use @option{script=no}
> -or @option{downscript=no} to disable script execution. Example:
> +automatically provides one. The default network configure script is
> +@file{/etc/qemu-ifup} and the default network deconfigure script is
> +@file{/etc/qemu-ifdown}. Use @option{script=no} or @option{downscript=no}
> +to disable script execution.
> +
> +If running QEMU as an unprivileged user, use the network helper
> +@var{helper} to configure the TAP interface. The default network
> +bridge helper executable is @file{/usr/local/libexec/qemu-bridge-helper}
> +and the default bridge interface is @file{qemubr0}.
> +
> +@option{fd}=@var{h} can be used to specify the handle of an already
> +opened host TAP interface.
> +
> +Examples:
>
>   @example
> +#launch a QEMU instance with the default network script
>   qemu linux.img -net nic -net tap
>   @end example
>
> -More complicated example (two NICs, each one connected to a TAP device)
>   @example
> +#launch a QEMU instance with two NICs, each one connected
> +#to a TAP device
>   qemu linux.img -net nic,vlan=0 -net tap,vlan=0,ifname=tap0 \
>                  -net nic,vlan=1 -net tap,vlan=1,ifname=tap1
>   @end example
>
> +@example
> +#launch a QEMU instance with the default network helper to
> +#connect a TAP device to bridge br0
> +qemu linux.img -net nic -net tap,helper=/usr/local/libexec/qemu-bridge-helper,br=br0
> +@end example
> +
>   @item -net socket[,vlan=@var{n}][,name=@var{name}][,fd=@var{h}] [,listen=[@var{host}]:@var{port}][,connect=@var{host}:@var{port}]
>
>   Connect the VLAN @var{n} to a remote VLAN in another QEMU virtual

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
  2011-10-06 16:41   ` Daniel P. Berrange
@ 2011-10-06 18:04     ` Anthony Liguori
  2011-10-06 18:38       ` Corey Bryant
  0 siblings, 1 reply; 23+ messages in thread
From: Anthony Liguori @ 2011-10-06 18:04 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Richa Marwaha, coreyb, qemu-devel

On 10/06/2011 11:41 AM, Daniel P. Berrange wrote:
> On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote:
>> This patch adds a helper that can be used to create a tap device attached to
>> a bridge device.  Since this helper is minimal in what it does, it can be
>> given CAP_NET_ADMIN which allows qemu to avoid running as root while still
>> satisfying the majority of what users tend to want to do with tap devices.
>>
>> The way this all works is that qemu launches this helper passing a bridge
>> name and the name of an inherited file descriptor.  The descriptor is one
>> end of a socketpair() of domain sockets.  This domain socket is used to
>> transmit a file descriptor of the opened tap device from the helper to qemu.
>>
>> The helper can then exit and let qemu use the tap device.
>
> When QEMU is run by libvirt, we generally like to use capng to
> remove the ability for QEMU to run setuid programs at all. So
> obviously it will struggle to run the qemu-bridge-helper binary
> in such a scenario.
>
> With the way you transmit the TAP device FD back to the caller,
> it looks like libvirt itself could execute the qemu-bridge-helper
> receiving the FD, and then pass the FD onto QEMU using the
> traditional tap,fd=XX syntax.

Exactly.  This would allow tap-based networking using libvirt session:// URIs.
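
For anyone following along, the hand-off discussed here is ordinary SCM_RIGHTS descriptor passing over a Unix domain socket, so any parent process (QEMU or a management layer) could do the receiving. Below is a minimal sketch, not the code from the patch. The names send_fd_sketch(), recv_fd_sketch() and demo_roundtrip() are hypothetical, and a pipe stands in for the tap device so the example runs without privileges.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized for one descriptor; the union keeps it aligned. */
union fd_ctl_sketch {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send 'fd' over the Unix socket 'sock' along with one dummy data byte. */
int send_fd_sketch(int sock, int fd)
{
    union fd_ctl_sketch u;
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns it, or -1 if none arrived. */
int recv_fd_sketch(int sock)
{
    union fd_ctl_sketch u;
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }

    /* Only trust the payload if the kernel really attached one fd. */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Pass the write end of a pipe across a socketpair and write through it. */
int demo_roundtrip(void)
{
    int sv[2], p[2], got;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1 || pipe(p) == -1) {
        return -1;
    }
    if (send_fd_sketch(sv[0], p[1]) == -1) {
        return -1;
    }
    got = recv_fd_sketch(sv[1]);
    if (got < 0) {
        return -1;
    }
    /* The receiver gets a new descriptor number for the same open file. */
    assert(got != p[1]);
    if (write(got, "x", 1) != 1 || read(p[0], &c, 1) != 1) {
        return -1;
    }
    close(got);
    close(p[0]);
    close(p[1]);
    close(sv[0]);
    close(sv[1]);
    return c == 'x' ? 0 : -1;
}
```

Unlike the recv_fd() in the patch, this version inspects the received control message before copying the descriptor out, which is one way to avoid returning stale buffer contents when the peer sends data without an attached fd.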

>
> The TAP device FD is only one FD we normally pass to QEMU. How about
> support for vhost net ? Is it reasonable to ask the qemu-bridge-helper
> to send back a vhost net FD also.

Absolutely.

> Or indeed multiple vhost net FDs
> when we get multiqueue NICs.  Should we expect the bridge helper to
> be strictly limited to just connecting a TAP dev to a bridge, or is
> the expectation that it will grow more & more functionality over
> time ?

I would not expect it to do more than create virtual network interfaces and
add them to bridges.  Multiqueue virtual NICs, vhost, etc. would all be in
scope as they are part of creating a virtual network interface.

Creating and managing the bridges should be done statically by an
administrator and would be out of scope.

Regards,

Anthony Liguori

>
> Daniel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID
  2011-10-06 17:42     ` Anthony Liguori
@ 2011-10-06 18:05       ` Corey Bryant
  2011-10-06 18:08       ` Corey Bryant
  1 sibling, 0 replies; 23+ messages in thread
From: Corey Bryant @ 2011-10-06 18:05 UTC (permalink / raw)
  To: qemu-devel



On 10/06/2011 01:42 PM, Anthony Liguori wrote:
> On 10/06/2011 11:34 AM, Daniel P. Berrange wrote:
>> On Thu, Oct 06, 2011 at 11:38:27AM -0400, Richa Marwaha wrote:
>>> The ideal way to use qemu-bridge-helper is to give it an fscap using:
>>>
>>> setcap cap_net_admin=ep qemu-bridge-helper
>>>
>>> Unfortunately, most distros still do not have a mechanism to package
>>> files
>>> with fscaps applied. This means they'll have to SUID the
>>> qemu-bridge-helper
>>> binary.
>>>
>>> To improve security, use libcap to reduce our capability set to just
>>> cap_net_admin, then reduce privileges down to the calling user. This is
>>> hopefully close to equivalent to fscap support from a security
>>> perspective.
>>> +#ifdef CONFIG_LIBCAP
>>> +static int drop_privileges(void)
>>> +{
>>> + cap_t cap;
>>> + cap_value_t new_caps[] = {CAP_NET_ADMIN};
>>> +
>>> + cap = cap_init();
>>
>> Check for NULL ?
>>
>>> +
>>> + /* set capabilities to be permitted and inheritable. we don't need the
>>> + * caps to be effective right now as they'll get reset when we seteuid
>>> + * anyway */
>>> + cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);
>>> + cap_set_flag(cap, CAP_INHERITABLE, 1, new_caps, CAP_SET);
>>
>> Check for failure ?
>>
>>> +
>>> + if (cap_set_proc(cap) == -1) {
>>> + return -1;
>>> + }
>>> +
>>> + cap_free(cap);
>>
>> Check for failure ?
>>
>>> +
>>> + /* reduce our privileges to a normal user */
>>> + setegid(getgid());
>>> + seteuid(getuid());
>>
>> Check for failure ?
>>
>>> + cap = cap_init();
>>
>> Check for NULL ?
>>
>>> +
>>> + /* enable our capabilities. we marked them as inheritable earlier
>>> + * which is what allows this to work. */
>>> + cap_set_flag(cap, CAP_EFFECTIVE, 1, new_caps, CAP_SET);
>>> + cap_set_flag(cap, CAP_PERMITTED, 1, new_caps, CAP_SET);
>>
>> Check for failure ?
>>
>>> +
>>> + if (cap_set_proc(cap) == -1) {
>>> + return -1;
>>> + }
>>> +
>>> + cap_free(cap);
>>
>> Check for failure ?
>>
>>> +
>>> + return 0;
>>> +}
>>> +#endif
>>
>> It may seem like checking for failure on cap_free/cap_set_flag is
>> not required because they can only return EINVAL for invalid
>> args, but since this is missing the check for NULL on cap_init
>> you can actually see errors from those latter functions in an
>> OOM scenario.
>>
>> I think I'd suggest not using libcap, instead try libcap-ng [1] whose
>> APIs are designed with safety in mind & result in much simpler and
>> clearer code:
>>
>> eg, that entire function above can be expressed using capng with
>> something approximating:
>>
>> capng_clear(CAPNG_SELECT_BOTH);
>> if (capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED,
>> CAP_NET_ADMIN) < 0)
>> error(...);
>> if (capng_change_id(getuid(), getgid(), CAPNG_DROP_SUPP_GRP |
>> CAPNG_CLEAR_BOUNDING))
>> error(...);
>
> Ah, libcap-ng didn't exist when the code was initially written but I
> agree, it looks like a nice library.
>
> Regards,
>
> Anthony Liguori
>

This looks a lot simpler.  We'll definitely look into implementing this 
in v2.
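
Independent of which capability library ends up being used, the "Check for failure ?" comments on setegid()/seteuid() can be addressed with plain libc calls. A minimal sketch follows, assuming a hypothetical function name drop_effective_ids(); it is not the code planned for v2 and it leaves capability handling out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Drop the effective gid/uid to the real ones, reporting any failure. */
int drop_effective_ids(void)
{
    /* Group first: once the effective uid is unprivileged, changing
     * the gid may no longer be permitted. */
    if (setegid(getgid()) == -1) {
        fprintf(stderr, "setegid failed: %s\n", strerror(errno));
        return -1;
    }
    if (seteuid(getuid()) == -1) {
        fprintf(stderr, "seteuid failed: %s\n", strerror(errno));
        return -1;
    }

    /* Post-condition: the effective ids now match the real ids. */
    assert(getegid() == getgid() && geteuid() == getuid());
    return 0;
}
```

A caller would treat a non-zero return as fatal and exit before touching any network device, since continuing with unexpected privileges is the case the checks are meant to catch.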

-- 
Regards,
Corey

>>
>>
>> Regards,
>> Daniel
>>
>> [1] http://people.redhat.com/sgrubb/libcap-ng/
>>
>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID
  2011-10-06 17:42     ` Anthony Liguori
  2011-10-06 18:05       ` Corey Bryant
@ 2011-10-06 18:08       ` Corey Bryant
  1 sibling, 0 replies; 23+ messages in thread
From: Corey Bryant @ 2011-10-06 18:08 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Anthony Liguori, Richa Marwaha, qemu-devel



On 10/06/2011 01:42 PM, Anthony Liguori wrote:
> Ah, libcap-ng didn't exist when the code was initially written but I
> agree, it looks like a nice library.
>
> Regards,
>
> Anthony Liguori
>

This looks a lot simpler.  We'll definitely look into implementing this
in v2.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
  2011-10-06 17:44   ` Anthony Liguori
@ 2011-10-06 18:10     ` Corey Bryant
  0 siblings, 0 replies; 23+ messages in thread
From: Corey Bryant @ 2011-10-06 18:10 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Richa Marwaha, qemu-devel



On 10/06/2011 01:44 PM, Anthony Liguori wrote:
> On 10/06/2011 10:38 AM, Richa Marwaha wrote:
>> This patch adds a helper that can be used to create a tap device
>> attached to
>> a bridge device. Since this helper is minimal in what it does, it can be
>> given CAP_NET_ADMIN which allows qemu to avoid running as root while
>> still
>> satisfying the majority of what users tend to want to do with tap
>> devices.
>>
>> The way this all works is that qemu launches this helper passing a bridge
>> name and the name of an inherited file descriptor. The descriptor is one
>> end of a socketpair() of domain sockets. This domain socket is used to
>> transmit a file descriptor of the opened tap device from the helper to
>> qemu.
>>
>> The helper can then exit and let qemu use the tap device.
>>
>> Signed-off-by: Richa Marwaha <rmarwah@linux.vnet.ibm.com>
>> ---
>> Makefile | 12 +++-
>> configure | 1 +
>> qemu-bridge-helper.c | 205
>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 216 insertions(+), 2 deletions(-)
>> create mode 100644 qemu-bridge-helper.c
>>
>> diff --git a/Makefile b/Makefile
>> index 6ed3194..f2caedc 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -34,6 +34,8 @@ $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw)
>>
>> LIBS+=-lz $(LIBS_TOOLS)
>>
>> +HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)
>> +
>> ifdef BUILD_DOCS
>> DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8
>> QMP/qmp-commands.txt
>> else
>> @@ -74,7 +76,7 @@ defconfig:
>>
>> -include config-all-devices.mak
>>
>> -build-all: $(DOCS) $(TOOLS) recurse-all
>> +build-all: $(DOCS) $(TOOLS) $(HELPERS-y) recurse-all
>>
>> config-host.h: config-host.h-timestamp
>> config-host.h-timestamp: config-host.mak
>> @@ -151,6 +153,8 @@ qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o
>> qemu-error.o $(oslib-obj-y) $(trace-ob
>>
>> qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o
>> $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y)
>> $(version-obj-y) qemu-timer-common.o
>>
>> +qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o
>> +
>> qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
>> $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h< $< > $@," GEN $@")
>>
>> @@ -208,7 +212,7 @@ clean:
>> # avoid old build problems by removing potentially incorrect old files
>> rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h
>> gen-op-arm.h
>> rm -f qemu-options.def
>> - rm -f *.o *.d *.a *.lo $(TOOLS) qemu-ga TAGS cscope.* *.pod *~ */*~
>> + rm -f *.o *.d *.a *.lo $(TOOLS) $(HELPERS-y) qemu-ga TAGS cscope.*
>> *.pod *~ */*~
>> rm -Rf .libs
>> rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d
>> net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d qapi/*.o qapi/*.d
>> qga/*.o qga/*.d
>> rm -f qemu-img-cmds.h
>> @@ -275,6 +279,10 @@ install: all $(if $(BUILD_DOCS),install-doc)
>> install-sysconfig
>> ifneq ($(TOOLS),)
>> $(INSTALL_PROG) $(STRIP_OPT) $(TOOLS) "$(DESTDIR)$(bindir)"
>> endif
>> +ifneq ($(HELPERS-y),)
>> + $(INSTALL_DIR) "$(DESTDIR)$(libexecdir)"
>> + $(INSTALL_PROG) $(STRIP_OPT) $(HELPERS-y) "$(DESTDIR)$(libexecdir)"
>> +endif
>> ifneq ($(BLOBS),)
>> $(INSTALL_DIR) "$(DESTDIR)$(datadir)"
>> set -e; for x in $(BLOBS); do \
>> diff --git a/configure b/configure
>> index 59b1494..3e32834 100755
>> --- a/configure
>> +++ b/configure
>> @@ -2742,6 +2742,7 @@ echo "mandir=$mandir" >> $config_host_mak
>> echo "datadir=$datadir" >> $config_host_mak
>> echo "sysconfdir=$sysconfdir" >> $config_host_mak
>> echo "docdir=$docdir" >> $config_host_mak
>> +echo "libexecdir=\${prefix}/libexec" >> $config_host_mak
>> echo "confdir=$confdir" >> $config_host_mak
>>
>> case "$cpu" in
>> diff --git a/qemu-bridge-helper.c b/qemu-bridge-helper.c
>> new file mode 100644
>> index 0000000..4ac7b36
>> --- /dev/null
>> +++ b/qemu-bridge-helper.c
>> @@ -0,0 +1,205 @@
>> +/*
>> + * QEMU Bridge Helper
>> + *
>> + * Copyright IBM, Corp. 2011
>> + *
>> + * Authors:
>> + * Anthony Liguori <address@hidden>
>
> Heh, fairly sure that's not my email address ;-)
>

I thought that was a secret identity. :) We'll update that.

>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2. See
>> + * the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "config-host.h"
>> +
>> +#include <stdio.h>
>> +#include <errno.h>
>> +#include <fcntl.h>
>> +#include <unistd.h>
>> +#include <string.h>
>> +#include <stdlib.h>
>> +#include <ctype.h>
>> +
>> +#include <sys/types.h>
>> +#include <sys/ioctl.h>
>> +#include <sys/socket.h>
>> +#include <sys/un.h>
>> +#include <sys/prctl.h>
>> +
>> +#include <net/if.h>
>> +
>> +#include <linux/sockios.h>
>> +
>> +#include "net/tap-linux.h"
>> +
>> +static int has_vnet_hdr(int fd)
>> +{
>> + unsigned int features = 0;
>> + struct ifreq ifreq;
>> +
>> + if (ioctl(fd, TUNGETFEATURES, &features) == -1) {
>> + return -errno;
>> + }
>> +
>> + if (!(features & IFF_VNET_HDR)) {
>> + return -ENOTSUP;
>> + }
>> +
>> + if (ioctl(fd, TUNGETIFF, &ifreq) != -1 || errno != EBADFD) {
>> + return -ENOTSUP;
>> + }
>> +
>> + return 1;
>> +}
>> +
>> +static void prep_ifreq(struct ifreq *ifr, const char *ifname)
>> +{
>> + memset(ifr, 0, sizeof(*ifr));
>> + snprintf(ifr->ifr_name, IFNAMSIZ, "%s", ifname);
>> +}
>> +
>> +static int send_fd(int c, int fd)
>> +{
>> + char msgbuf[CMSG_SPACE(sizeof(fd))];
>> + struct msghdr msg = {
>> + .msg_control = msgbuf,
>> + .msg_controllen = sizeof(msgbuf),
>> + };
>> + struct cmsghdr *cmsg;
>> + struct iovec iov;
>> + char req[1] = { 0x00 };
>> +
>> + cmsg = CMSG_FIRSTHDR(&msg);
>> + cmsg->cmsg_level = SOL_SOCKET;
>> + cmsg->cmsg_type = SCM_RIGHTS;
>> + cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
>> + msg.msg_controllen = cmsg->cmsg_len;
>> +
>> + iov.iov_base = req;
>> + iov.iov_len = sizeof(req);
>> +
>> + msg.msg_iov = &iov;
>> + msg.msg_iovlen = 1;
>> + memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
>> +
>> + return sendmsg(c, &msg, 0);
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> + struct ifreq ifr;
>> + int fd, ctlfd, unixfd;
>> + int use_vnet = 0;
>> + int mtu;
>> + const char *bridge;
>> + char iface[IFNAMSIZ];
>> + int index;
>> +
>> + /* parse arguments */
>> + if (argc < 3 || argc > 4) {
>> + fprintf(stderr, "Usage: %s [--use-vnet] BRIDGE FD\n", argv[0]);
>> + return 1;
>> + }
>> +
>> + index = 1;
>> + if (strcmp(argv[index], "--use-vnet") == 0) {
>> + use_vnet = 1;
>> + index++;
>> + if (argc == 3) {
>> + fprintf(stderr, "invalid number of arguments\n");
>> + return -1;
>> + }
>> + }
>> +
>> + bridge = argv[index++];
>> + unixfd = atoi(argv[index++]);
>> +
>> + /* open a socket to use to control the network interfaces */
>> + ctlfd = socket(AF_INET, SOCK_STREAM, 0);
>> + if (ctlfd == -1) {
>> + fprintf(stderr, "failed to open control socket\n");
>> + return -errno;
>> + }
>> +
>> + /* open the tap device */
>> + fd = open("/dev/net/tun", O_RDWR);
>> + if (fd == -1) {
>> + fprintf(stderr, "failed to open /dev/net/tun\n");
>> + return -errno;
>> + }
>> +
>> + /* request a tap device, disable PI, and add vnet header support if
>> + * requested and it's available. */
>> + prep_ifreq(&ifr, "tap%d");
>> + ifr.ifr_flags = IFF_TAP|IFF_NO_PI;
>> + if (use_vnet && has_vnet_hdr(fd) > 0) {
>> + ifr.ifr_flags |= IFF_VNET_HDR;
>> + }
>> +
>> + if (ioctl(fd, TUNSETIFF, &ifr) == -1) {
>> + fprintf(stderr, "failed to create tun device\n");
>> + return -errno;
>> + }
>> +
>> + /* save tap device name */
>> + snprintf(iface, sizeof(iface), "%s", ifr.ifr_name);
>> +
>> + /* get the mtu of the bridge */
>> + prep_ifreq(&ifr, bridge);
>> + if (ioctl(ctlfd, SIOCGIFMTU, &ifr) == -1) {
>> + fprintf(stderr, "failed to get mtu of bridge `%s'\n", bridge);
>> + return -errno;
>> + }
>> +
>> + /* save mtu */
>> + mtu = ifr.ifr_mtu;
>> +
>> + /* set the mtu of the interface based on the bridge */
>> + prep_ifreq(&ifr, iface);
>> + ifr.ifr_mtu = mtu;
>> + if (ioctl(ctlfd, SIOCSIFMTU, &ifr) == -1) {
>> + fprintf(stderr, "failed to set mtu of device `%s' to %d\n",
>> + iface, mtu);
>> + return -errno;
>> + }
>> +
>> + /* add the interface to the bridge */
>> + prep_ifreq(&ifr, bridge);
>> + ifr.ifr_ifindex = if_nametoindex(iface);
>> +
>> + if (ioctl(ctlfd, SIOCBRADDIF, &ifr) == -1) {
>> + fprintf(stderr, "failed to add interface `%s' to bridge `%s'\n",
>> + iface, bridge);
>> + return -errno;
>> + }
>> +
>> + /* bring the interface up */
>> + prep_ifreq(&ifr, iface);
>> + if (ioctl(ctlfd, SIOCGIFFLAGS, &ifr) == -1) {
>> + fprintf(stderr, "failed to get interface flags for `%s'\n", iface);
>> + return -errno;
>> + }
>> +
>> + ifr.ifr_flags |= IFF_UP;
>> + if (ioctl(ctlfd, SIOCSIFFLAGS, &ifr) == -1) {
>> + fprintf(stderr, "failed to bring up interface `%s'\n", iface);
>> + return -errno;
>> + }
>> +
>> + /* write fd to the domain socket */
>> + if (send_fd(unixfd, fd) == -1) {
>> + fprintf(stderr, "failed to write fd to unix socket\n");
>> + return -errno;
>> + }
>> +
>> + /* ... */
>> +
>> + /* profit! */
>
> Sold!
>
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
>
> Please put my SoB before yours in the next submission.
>
> Regards,
>
> Anthony Liguori
>

Will do.

>> +
>> + close(fd);
>> +
>> + close(ctlfd);
>> +
>> + return 0;
>> +}
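
A side note on the `unixfd = atoi(...)` line in the helper quoted above: atoi() returns 0 for non-numeric input, which would silently select stdin as the socket. One possible stricter parse is sketched below. The name parse_fd_arg() is hypothetical and this is only an illustration of the idea, not something proposed in the series.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a decimal descriptor number.  Returns the value, or -1 if the
 * string is empty, has trailing junk, is negative, or does not fit. */
int parse_fd_arg(const char *s)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0') {
        return -1;
    }

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 0 || v > INT_MAX) {
        return -1;
    }
    return (int)v;
}

/* Small self-check covering the cases mentioned above. */
void parse_fd_arg_selfcheck(void)
{
    assert(parse_fd_arg("7") == 7);
    assert(parse_fd_arg("abc") == -1);
    assert(parse_fd_arg("7x") == -1);
    assert(parse_fd_arg("-3") == -1);
}
```

With something like this, main() could print the usage message and exit when the FD argument is rejected instead of proceeding with descriptor 0.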
>
>

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] Add support for bridge
  2011-10-06 17:49   ` Anthony Liguori
@ 2011-10-06 18:15     ` Corey Bryant
  2011-10-06 18:19       ` Anthony Liguori
  0 siblings, 1 reply; 23+ messages in thread
From: Corey Bryant @ 2011-10-06 18:15 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Richa Marwaha, qemu-devel



On 10/06/2011 01:49 PM, Anthony Liguori wrote:
> On 10/06/2011 10:38 AM, Richa Marwaha wrote:
>> The most common use of -net tap is to connect a tap device to a
>> bridge. This
>> requires the use of a script and running qemu as root in order to
>> allocate a
>> tap device to pass to the script.
>>
>> This model is great for portability and flexibility but it's incredibly
>> difficult to eliminate the need to run qemu as root. The only really
>> viable
>> mechanism is to use tunctl to create a tap device, attach it to a
>> bridge as
>> root, and then hand that tap device to qemu. The problem with this
>> mechanism
>> is that it requires administrator intervention whenever a user wants
>> to create
>> a guest.
>>
>> By essentially writing a helper that implements the most common qemu-ifup
>> script that can be safely given cap_net_admin, we can dramatically
>> simplify
>> things for non-privileged users. We still support existing -net tap
>> options
>> as a mechanism for advanced users and backwards compatibility.
>>
>> Currently, this is very Linux centric but there's really no reason why it
>> couldn't be extended for other Unixes.
>>
>> The default bridge that we attach to is qemubr0. The thinking is that
>> a distro
>> could preconfigure such an interface to allow out-of-the-box bridged
>> networking.
>>
>> Alternatively, if a user wants to use a different bridge, they can say:
>>
>> qemu -hda linux.img -net
>> tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper
>> -net nic,model=virtio
>
>
> Wouldn't it be better to make the syntax:
>
> -net bridge[,br=BRIDGE][,helper=HELPER]
>
> And default BRIDGE to br0 and HELPER to
> ${prefix}/libexec/qemu-bridge-helper ?
>
> That gives distros a proper way to configure a default bridge making
> -net bridge Just Work for most people.
>
> Regards,
>
> Anthony Liguori
>

Yes, I think it would be much more usable under -net bridge.  I really
wanted this to work under -net tap (where fd and init are), but now we
know there's no good way to default to the helper without spelling out
the path.

We'll move to -net bridge if folks are in agreement and default to
bridge br0.
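
To make the defaulting concrete, here is a rough sketch of what "br and helper are both optional" could look like. Everything in it is an assumption for illustration: the struct, the function name and the SKETCH_* macros are hypothetical, and the helper path assumes a /usr/local prefix as in the examples in this thread. The real option handling would go through QemuOpts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical defaults; the helper path assumes prefix=/usr/local. */
#define SKETCH_DEFAULT_BRIDGE "br0"
#define SKETCH_DEFAULT_HELPER "/usr/local/libexec/qemu-bridge-helper"

/* Stand-in for the parsed "-net bridge" options; NULL means "not given". */
struct bridge_opts_sketch {
    const char *br;
    const char *helper;
};

/* Fill in any option the user left out. */
void bridge_opts_apply_defaults(struct bridge_opts_sketch *o)
{
    assert(o != NULL);

    if (o->br == NULL || o->br[0] == '\0') {
        o->br = SKETCH_DEFAULT_BRIDGE;
    }
    if (o->helper == NULL || o->helper[0] == '\0') {
        o->helper = SKETCH_DEFAULT_HELPER;
    }
}
```

With defaults applied this way, a bare "-net bridge" would need no arguments at all, while "br=" or "helper=" would still override each value independently.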

>>
>> Signed-off-by: Richa Marwaha <rmarwah@linux.vnet.ibm.com>
>> ---
>> configure | 2 +
>> net.c | 8 +++
>> net.h | 2 +
>> net/tap.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>> qemu-options.hx | 48 +++++++++++++-----
>> 5 files changed, 190 insertions(+), 20 deletions(-)
>>
>> diff --git a/configure b/configure
>> index f46e9b7..ef05954 100755
>> --- a/configure
>> +++ b/configure
>> @@ -2775,6 +2775,8 @@ echo "sysconfdir=$sysconfdir" >> $config_host_mak
>> echo "docdir=$docdir" >> $config_host_mak
>> echo "libexecdir=\${prefix}/libexec" >> $config_host_mak
>> echo "confdir=$confdir" >> $config_host_mak
>> +echo "CONFIG_QEMU_SHAREDIR=\"$prefix$datasuffix\"" >> $config_host_mak
>> +echo "CONFIG_QEMU_HELPERDIR=\"$prefix/libexec\"" >> $config_host_mak
>>
>> case "$cpu" in
>> i386|x86_64|alpha|cris|hppa|ia64|lm32|m68k|microblaze|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64|unicore32)
>>
>> diff --git a/net.c b/net.c
>> index d05930c..4c3c551 100644
>> --- a/net.c
>> +++ b/net.c
>> @@ -956,6 +956,14 @@ static const struct {
>> .type = QEMU_OPT_STRING,
>> .help = "script to shut down the interface",
>> }, {
>> + .name = "br",
>> + .type = QEMU_OPT_STRING,
>> + .help = "bridge name",
>> + }, {
>> + .name = "helper",
>> + .type = QEMU_OPT_STRING,
>> + .help = "command to execute to configure bridge",
>> + }, {
>> .name = "sndbuf",
>> .type = QEMU_OPT_SIZE,
>> .help = "send buffer limit"
>> diff --git a/net.h b/net.h
>> index 9f633f8..eeb19a7 100644
>> --- a/net.h
>> +++ b/net.h
>> @@ -174,6 +174,8 @@ int do_netdev_del(Monitor *mon, const QDict
>> *qdict, QObject **ret_data);
>>
>> #define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup"
>> #define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown"
>> +#define DEFAULT_BRIDGE_HELPER CONFIG_QEMU_HELPERDIR
>> "/qemu-bridge-helper"
>> +#define DEFAULT_BRIDGE_INTERFACE "qemubr0"
>>
>> void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd);
>>
>> diff --git a/net/tap.c b/net/tap.c
>> index 1f26dc9..74f103a 100644
>> --- a/net/tap.c
>> +++ b/net/tap.c
>> @@ -388,6 +388,108 @@ static int launch_script(const char
>> *setup_script, const char *ifname, int fd)
>> return -1;
>> }
>>
>> +static int recv_fd(int c)
>> +{
>> +    int fd;
>> +    uint8_t msgbuf[CMSG_SPACE(sizeof(fd))];
>> +    struct msghdr msg = {
>> +        .msg_control = msgbuf,
>> +        .msg_controllen = sizeof(msgbuf),
>> +    };
>> +    struct cmsghdr *cmsg;
>> +    struct iovec iov;
>> +    uint8_t req[1];
>> +    ssize_t len;
>> +
>> +    cmsg = CMSG_FIRSTHDR(&msg);
>> +    cmsg->cmsg_level = SOL_SOCKET;
>> +    cmsg->cmsg_type = SCM_RIGHTS;
>> +    cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
>> +    msg.msg_controllen = cmsg->cmsg_len;
>> +
>> +    iov.iov_base = req;
>> +    iov.iov_len = sizeof(req);
>> +
>> +    msg.msg_iov = &iov;
>> +    msg.msg_iovlen = 1;
>> +
>> +    len = recvmsg(c, &msg, 0);
>> +    if (len > 0) {
>> +        memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
>> +        return fd;
>> +    }
>> +
>> +    return len;
>> +}
>> +
>> +static int net_bridge_run_helper(const char *helper, const char *bridge)
>> +{
>> +    sigset_t oldmask, mask;
>> +    int pid, status;
>> +    char *args[5];
>> +    char **parg;
>> +    int sv[2];
>> +
>> +    sigemptyset(&mask);
>> +    sigaddset(&mask, SIGCHLD);
>> +    sigprocmask(SIG_BLOCK, &mask, &oldmask);
>> +
>> +    if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
>> +        return -1;
>> +    }
>> +
>> +    /* try to launch bridge helper */
>> +    pid = fork();
>> +    if (pid == 0) {
>> +        int open_max = sysconf(_SC_OPEN_MAX), i;
>> +        char buf[32];
>> +
>> +        snprintf(buf, sizeof(buf), "%d", sv[1]);
>> +
>> +        for (i = 0; i < open_max; i++) {
>> +            if (i != STDIN_FILENO &&
>> +                i != STDOUT_FILENO &&
>> +                i != STDERR_FILENO &&
>> +                i != sv[1]) {
>> +                close(i);
>> +            }
>> +        }
>> +        parg = args;
>> +        *parg++ = (char *)helper;
>> +        *parg++ = (char *)"--use-vnet";
>> +        *parg++ = (char *)bridge;
>> +        *parg++ = buf;
>> +        *parg++ = NULL;
>> +        execv(helper, args);
>> +        _exit(1);
>> +    } else if (pid > 0) {
>> +        int fd;
>> +
>> +        close(sv[1]);
>> +
>> +        do {
>> +            fd = recv_fd(sv[0]);
>> +        } while (fd == -1 && errno == EINTR);
>> +
>> +        close(sv[0]);
>> +
>> +        while (waitpid(pid, &status, 0) != pid) {
>> +            /* loop */
>> +        }
>> +        sigprocmask(SIG_SETMASK, &oldmask, NULL);
>> +        if (fd < 0) {
>> +            fprintf(stderr, "failed to recv file descriptor\n");
>> +            return -1;
>> +        }
>> +
>> +        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
>> +            return fd;
>> +        }
>> +    }
>> +    fprintf(stderr, "failed to launch bridge helper\n");
>> +    return -1;
>> +}
>> +
>> static int net_tap_init(QemuOpts *opts, int *vnet_hdr)
>> {
>> int fd, vnet_hdr_required;
>> @@ -433,8 +535,11 @@ int net_init_tap(QemuOpts *opts, Monitor *mon,
>> const char *name, VLANState *vlan
>> if (qemu_opt_get(opts, "ifname") ||
>> qemu_opt_get(opts, "script") ||
>> qemu_opt_get(opts, "downscript") ||
>> - qemu_opt_get(opts, "vnet_hdr")) {
>> - error_report("ifname=, script=, downscript= and vnet_hdr= is invalid
>> with fd=");
>> + qemu_opt_get(opts, "vnet_hdr") ||
>> + qemu_opt_get(opts, "br") ||
>> + qemu_opt_get(opts, "helper")) {
>> + error_report("ifname=, script=, downscript=, vnet_hdr=,"
>> + "br= and helper= are invalid with fd=");
>> return -1;
>> }
>>
>> @@ -446,6 +551,37 @@ int net_init_tap(QemuOpts *opts, Monitor *mon,
>> const char *name, VLANState *vlan
>> fcntl(fd, F_SETFL, O_NONBLOCK);
>>
>> vnet_hdr = tap_probe_vnet_hdr(fd);
>> +    } else if (qemu_opt_get(opts, "helper")) {
>> +        if (qemu_opt_get(opts, "ifname") ||
>> +            qemu_opt_get(opts, "script") ||
>> +            qemu_opt_get(opts, "downscript")) {
>> +            error_report("ifname=, script= and downscript="
>> +                         "are invalid with helper=");
>> +            return -1;
>> +        }
>> +
>> +        if (!qemu_opt_get(opts, "br")) {
>> +            qemu_opt_set(opts, "br", DEFAULT_BRIDGE_INTERFACE);
>> +        }
>> +
>> +        fd = net_bridge_run_helper(qemu_opt_get(opts, "helper"),
>> +                                   qemu_opt_get(opts, "br"));
>> +
>> +        fcntl(fd, F_SETFL, O_NONBLOCK);
>> +
>> +        vnet_hdr = tap_probe_vnet_hdr(fd);
>> +
>> +        s = net_tap_fd_init(vlan, "bridge", name, fd, vnet_hdr);
>> +
>> +        if (!s) {
>> +            close(fd);
>> +            return -1;
>> +        }
>> +
>> +        snprintf(s->nc.info_str, sizeof(s->nc.info_str),
>> +                 "br=%s", qemu_opt_get(opts, "br"));
>> +
>> +        return 0;
>> } else {
>> if (!qemu_opt_get(opts, "script")) {
>> qemu_opt_set(opts, "script", DEFAULT_NETWORK_SCRIPT);
>> @@ -459,12 +595,12 @@ int net_init_tap(QemuOpts *opts, Monitor *mon,
>> const char *name, VLANState *vlan
>> if (fd == -1) {
>> return -1;
>> }
>> - }
>>
>> - s = net_tap_fd_init(vlan, "tap", name, fd, vnet_hdr);
>> - if (!s) {
>> - close(fd);
>> - return -1;
>> + s = net_tap_fd_init(vlan, "tap", name, fd, vnet_hdr);
>> + if (!s) {
>> + close(fd);
>> + return -1;
>> + }
>> }
>>
>> if (tap_set_sndbuf(s->fd, opts)< 0) {
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index dfbabd0..ad4afa9 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -1149,11 +1149,15 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
>> "-net tap[,vlan=n][,name=str],ifname=name\n"
>> " connect the host TAP network interface to VLAN 'n'\n"
>> #else
>> - "-net
>> tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostforce=on|off]\n"
>>
>> - " connect the host TAP network interface to VLAN 'n' and use the\n"
>> - " network scripts 'file' (default=" DEFAULT_NETWORK_SCRIPT ")\n"
>> - " and 'dfile' (default=" DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
>> + "-net
>> tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,br=bridge][,helper=helper][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h][,vhostforce=on|off]\n"
>>
>> + " connect the host TAP network interface to VLAN 'n' \n"
>> + " use network scripts 'file' (default=" DEFAULT_NETWORK_SCRIPT ")\n"
>> + " to configure it and 'dfile' (default=" DEFAULT_NETWORK_DOWN_SCRIPT
>> ")\n"
>> + " to deconfigure it. This requires root privilege.\n"
>> " use '[down]script=no' to disable script execution\n"
>> + " use network helper 'helper' (default=" DEFAULT_BRIDGE_HELPER ")
>> and\n"
>> + " use bridge 'br' (default=" DEFAULT_BRIDGE_INTERFACE ") to
>> configure it. This\n"
>> + " does not require root privilege.\n"
>> " use 'fd=h' to connect to an already opened TAP interface\n"
>> " use 'sndbuf=nbytes' to limit the size of the send buffer (the\n"
>> " default is disabled 'sndbuf=0' to enable flow control set
>> 'sndbuf=1048576')\n"
>> @@ -1322,26 +1326,44 @@ processed and applied to -net user. Mixing
>> them with the new configuration
>> syntax gives undefined results. Their use for new applications is
>> discouraged
>> as they will be removed from future versions.
>>
>> -@item -net
>> tap[,vlan=@var{n}][,name=@var{name}][,fd=@var{h}][,ifname=@var{name}]
>> [,script=@var{file}][,downscript=@var{dfile}]
>> -Connect the host TAP network interface @var{name} to VLAN @var{n}, use
>> -the network script @var{file} to configure it and the network script
>> +@item -net
>> tap[,vlan=@var{n}][,name=@var{name}][,fd=@var{h}][,ifname=@var{name}]
>> [,script=@var{file}][,downscript=@var{dfile}][,br=@var{bridge}][,helper=@var{helper}]
>>
>> +Connect the host TAP network interface @var{name} to VLAN @var{n}.
>> +
>> +Use the network script @var{file} to configure it and the network script
>> @var{dfile} to deconfigure it. If @var{name} is not provided, the OS
>> -automatically provides one. @option{fd}=@var{h} can be used to specify
>> -the handle of an already opened host TAP interface. The default network
>> -configure script is @file{/etc/qemu-ifup} and the default network
>> -deconfigure script is @file{/etc/qemu-ifdown}. Use @option{script=no}
>> -or @option{downscript=no} to disable script execution. Example:
>> +automatically provides one. The default network configure script is
>> +@file{/etc/qemu-ifup} and the default network deconfigure script is
>> +@file{/etc/qemu-ifdown}. Use @option{script=no} or
>> @option{downscript=no}
>> +to disable script execution.
>> +
>> +If running QEMU as an unprivileged user, use the network helper
>> +@var{helper} to configure the TAP interface. The default network
>> +bridge helper executable is @file{/usr/local/libexec/qemu-bridge-helper}
>> +and bridge name interface is @file{qemubr0}.
>> +
>> +@option{fd}=@var{h} can be used to specify the handle of an already
>> +opened host TAP interface.
>> +
>> +Examples:
>>
>> @example
>> +#launch a QEMU instance with the default network script
>> qemu linux.img -net nic -net tap
>> @end example
>>
>> -More complicated example (two NICs, each one connected to a TAP device)
>> @example
>> +#launch a QEMU instance with two NICs, each one connected
>> +#to a TAP device
>> qemu linux.img -net nic,vlan=0 -net tap,vlan=0,ifname=tap0 \
>> -net nic,vlan=1 -net tap,vlan=1,ifname=tap1
>> @end example
>>
>> +@example
>> +#launch a QEMU instance with the default network helper to
>> +#connect a TAP device to bridge br0
>> +qemu linux.img -net nic -net
>> tap,helper=/usr/local/libexec/qemu-bridge-helper,br=br0
>> +@end example
>> +
>> @item -net socket[,vlan=@var{n}][,name=@var{name}][,fd=@var{h}]
>> [,listen=[@var{host}]:@var{port}][,connect=@var{host}:@var{port}]
>>
>> Connect the VLAN @var{n} to a remote VLAN in another QEMU virtual
>
>

-- 
Regards,
Corey


* Re: [Qemu-devel] [PATCH 4/4] Add support for bridge
  2011-10-06 18:15     ` Corey Bryant
@ 2011-10-06 18:19       ` Anthony Liguori
  2011-10-06 18:24         ` Corey Bryant
  0 siblings, 1 reply; 23+ messages in thread
From: Anthony Liguori @ 2011-10-06 18:19 UTC (permalink / raw)
  To: Corey Bryant; +Cc: Anthony Liguori, Richa Marwaha, qemu-devel

On 10/06/2011 01:15 PM, Corey Bryant wrote:
>
>
> On 10/06/2011 01:49 PM, Anthony Liguori wrote:
>> On 10/06/2011 10:38 AM, Richa Marwaha wrote:
>>> The most common use of -net tap is to connect a tap device to a
>>> bridge. This
>>> requires the use of a script and running qemu as root in order to
>>> allocate a
>>> tap device to pass to the script.
>>>
>>> This model is great for portability and flexibility but it's incredibly
>>> difficult to eliminate the need to run qemu as root. The only really
>>> viable
>>> mechanism is to use tunctl to create a tap device, attach it to a
>>> bridge as
>>> root, and then hand that tap device to qemu. The problem with this
>>> mechanism
>>> is that it requires administrator intervention whenever a user wants
>>> to create
>>> a guest.
>>>
>>> By essentially writing a helper that implements the most common qemu-ifup
>>> script that can be safely given cap_net_admin, we can dramatically
>>> simplify
>>> things for non-privileged users. We still support existing -net tap
>>> options
>>> as a mechanism for advanced users and backwards compatibility.
>>>
>>> Currently, this is very Linux centric but there's really no reason why it
>>> couldn't be extended for other Unixes.
>>>
>>> The default bridge that we attach to is qemubr0. The thinking is that
>>> a distro
>>> could preconfigure such an interface to allow out-of-the-box bridged
>>> networking.
>>>
>>> Alternatively, if a user wants to use a different bridge, they can say:
>>>
>>> qemu -hda linux.img -net
>>> tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper
>>> -net nic,model=virtio
>>
>>
>> Wouldn't it be better to make the syntax:
>>
>> -net bridge[,br=BRIDGE][,helper=HELPER]
>>
>> And default BRIDGE to br0 and HELPER to
>> ${prefix}/libexec/qemu-bridge-helper ?
>>
>> That gives distros a proper way to configure a default bridge making
>> -net bridge Just Work for most people.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>
> Yes I think it would be much more usable under -net bridge. I really wanted this
> to work under -net tap (where fd and init are) but now we know there's no good
> way to default to the helper without spelling out the path.

I'm certainly in favor of leaving helper as part of -net tap, but I think there 
should be a -net bridge in addition.
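
For illustration only, here is a minimal sketch of how the defaults for such a -net bridge option might be resolved. It is not taken from the patch: struct bridge_opts, bridge_opts_resolve() and the SKETCH_* macros are hypothetical names, the default bridge follows the br0 suggestion quoted above, and the helper path assumes prefix=/usr/local.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed defaults: br0 as suggested in this thread, helper under /usr/local. */
#define SKETCH_DEFAULT_BRIDGE "br0"
#define SKETCH_DEFAULT_HELPER "/usr/local/libexec/qemu-bridge-helper"

/* Hypothetical container for the two resolved option values. */
struct bridge_opts {
    const char *br;
    const char *helper;
};

/* Use the caller's br=/helper= values when given, else fall back to defaults. */
static struct bridge_opts bridge_opts_resolve(const char *br, const char *helper)
{
    struct bridge_opts o;

    o.br = (br != NULL && br[0] != '\0') ? br : SKETCH_DEFAULT_BRIDGE;
    o.helper = (helper != NULL && helper[0] != '\0') ? helper
                                                     : SKETCH_DEFAULT_HELPER;
    assert(o.br != NULL && o.helper != NULL);
    return o;
}
```

With something like this, a bare "-net bridge" would resolve to the default bridge and helper, while "-net tap,helper=..." could keep requiring the explicit path so existing -net tap behavior does not change.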

Regards,

Anthony Liguori


* Re: [Qemu-devel] [PATCH 4/4] Add support for bridge
  2011-10-06 18:19       ` Anthony Liguori
@ 2011-10-06 18:24         ` Corey Bryant
  0 siblings, 0 replies; 23+ messages in thread
From: Corey Bryant @ 2011-10-06 18:24 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Anthony Liguori, Richa Marwaha, qemu-devel



On 10/06/2011 02:19 PM, Anthony Liguori wrote:
> On 10/06/2011 01:15 PM, Corey Bryant wrote:
>>
>>
>> On 10/06/2011 01:49 PM, Anthony Liguori wrote:
>>> On 10/06/2011 10:38 AM, Richa Marwaha wrote:
>>>> The most common use of -net tap is to connect a tap device to a
>>>> bridge. This
>>>> requires the use of a script and running qemu as root in order to
>>>> allocate a
>>>> tap device to pass to the script.
>>>>
>>>> This model is great for portability and flexibility but it's incredibly
>>>> difficult to eliminate the need to run qemu as root. The only really
>>>> viable
>>>> mechanism is to use tunctl to create a tap device, attach it to a
>>>> bridge as
>>>> root, and then hand that tap device to qemu. The problem with this
>>>> mechanism
>>>> is that it requires administrator intervention whenever a user wants
>>>> to create
>>>> a guest.
>>>>
>>>> By essentially writing a helper that implements the most common
>>>> qemu-ifup
>>>> script that can be safely given cap_net_admin, we can dramatically
>>>> simplify
>>>> things for non-privileged users. We still support existing -net tap
>>>> options
>>>> as a mechanism for advanced users and backwards compatibility.
>>>>
>>>> Currently, this is very Linux centric but there's really no reason
>>>> why it
>>>> couldn't be extended for other Unixes.
>>>>
>>>> The default bridge that we attach to is qemubr0. The thinking is that
>>>> a distro
>>>> could preconfigure such an interface to allow out-of-the-box bridged
>>>> networking.
>>>>
>>>> Alternatively, if a user wants to use a different bridge, they can say:
>>>>
>>>> qemu -hda linux.img -net
>>>> tap,br=br0,helper=/usr/local/libexec/qemu-bridge-helper
>>>> -net nic,model=virtio
>>>
>>>
>>> Wouldn't it be better to make the syntax:
>>>
>>> -net bridge[,br=BRIDGE][,helper=HELPER]
>>>
>>> And default BRIDGE to br0 and HELPER to
>>> ${prefix}/libexec/qemu-bridge-helper ?
>>>
>>> That gives distros a proper way to configure a default bridge making
>>> -net bridge Just Work for most people.
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>>
>> Yes I think it would be much more usable under -net bridge. I really
>> wanted this
>> to work under -net tap (where fd and init are) but now we know there's
>> no good
>> way to default to the helper without spelling out the path.
>
> I'm certainly in favor of leaving helper as part of -net tap, but I
> think there should be a -net bridge in addition.
>
> Regards,
>
> Anthony Liguori

Ok, yes.  The best of both worlds.

-- 
Regards,
Corey


* Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
  2011-10-06 18:04     ` Anthony Liguori
@ 2011-10-06 18:38       ` Corey Bryant
  2011-10-07  9:04         ` Daniel P. Berrange
  0 siblings, 1 reply; 23+ messages in thread
From: Corey Bryant @ 2011-10-06 18:38 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Richa Marwaha, qemu-devel



On 10/06/2011 02:04 PM, Anthony Liguori wrote:
> On 10/06/2011 11:41 AM, Daniel P. Berrange wrote:
>> On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote:
>>> This patch adds a helper that can be used to create a tap device
>>> attached to
>>> a bridge device. Since this helper is minimal in what it does, it can be
>>> given CAP_NET_ADMIN which allows qemu to avoid running as root while
>>> still
>>> satisfying the majority of what users tend to want to do with tap
>>> devices.
>>>
>>> The way this all works is that qemu launches this helper passing a
>>> bridge
>>> name and the name of an inherited file descriptor. The descriptor is one
>>> end of a socketpair() of domain sockets. This domain socket is used to
>>> transmit a file descriptor of the opened tap device from the helper
>>> to qemu.
>>>
>>> The helper can then exit and let qemu use the tap device.
>>
>> When QEMU is run by libvirt, we generally like to use capng to
>> remove the ability for QEMU to run setuid programs at all. So
>> obviously it will struggle to run the qemu-bridge-helper binary
>> in such a scenario.
>>
>> With the way you transmit the TAP device FD back to the caller,
>> it looks like libvirt itself could execute the qemu-bridge-helper
>> receiving the FD, and then pass the FD onto QEMU using the
>> traditional tap,fd=XX syntax.
>
> Exactly. This would allow tap-based networking using libvirt session://
> URIs.
>

I'll take note of this.  It seems like it would be a nice future 
addition to libvirt.
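
As a rough sketch of the mechanism being discussed (a caller such as libvirt receiving the tap FD from the helper over a domain socket), the two sides could look like the following. This is not the patch's code: send_fd() and recv_fd_checked() are hypothetical names, and unlike the recv_fd() in patch 4/4 the receiver here inspects the control message before trusting it.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_cmsg {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Helper side: send one descriptor plus a single dummy data byte. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Caller side: returns the received descriptor, or -1 on any failure. */
static int recv_fd_checked(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }

    /* Only accept a well-formed SCM_RIGHTS message carrying one fd. */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len < CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

A management layer would create the socketpair(), fork and exec the helper with one end inherited, call the receive side on the other end, and then hand the resulting descriptor to qemu via tap,fd=.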

A slight tangent, but a point on DAC isolation.  The helper enables DAC 
isolation for qemu:///session but we still need some work in libvirt to 
provide DAC isolation for qemu:///system.  This could be done by 
allowing management applications to specify custom user/group IDs when 
creating guests rather than hard coding the IDs in the configuration file.

>>
>> The TAP device FD is only one FD we normally pass to QEMU. How about
>> support for vhost net ? Is it reasonable to ask the qemu-bridge-helper
>> to send back a vhost net FD also.
>
> Absolutely.
>
>> Or indeed multiple vhost net FDs
>> when we get multiqueue NICs. Should we expect the bridge helper to
>> be strictly limited to just connecting a TAP dev to a bridge, or is
>> the expectation that it will grow more & more functionality over
>> time ?
>
> I would not expect it to do more than create virtual network interfaces,
> and add them to bridges. Multiqueue virtual nics, vhost, etc. would all
> be in scope as they are part of creating a virtual network interface.
>
> Creating the bridges and managing the bridges should be done statically
> by an administrator and would be out of scope.
>
> Regards,
>
> Anthony Liguori
>
>>
>> Daniel
>

-- 
Regards,
Corey


* Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
  2011-10-06 18:38       ` Corey Bryant
@ 2011-10-07  9:04         ` Daniel P. Berrange
  2011-10-07 14:40           ` Corey Bryant
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel P. Berrange @ 2011-10-07  9:04 UTC (permalink / raw)
  To: Corey Bryant; +Cc: Anthony Liguori, Richa Marwaha, qemu-devel

On Thu, Oct 06, 2011 at 02:38:56PM -0400, Corey Bryant wrote:
> 
> 
> On 10/06/2011 02:04 PM, Anthony Liguori wrote:
> >On 10/06/2011 11:41 AM, Daniel P. Berrange wrote:
> >>On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote:
> >>>This patch adds a helper that can be used to create a tap device
> >>>attached to
> >>>a bridge device. Since this helper is minimal in what it does, it can be
> >>>given CAP_NET_ADMIN which allows qemu to avoid running as root while
> >>>still
> >>>satisfying the majority of what users tend to want to do with tap
> >>>devices.
> >>>
> >>>The way this all works is that qemu launches this helper passing a
> >>>bridge
> >>>name and the name of an inherited file descriptor. The descriptor is one
> >>>end of a socketpair() of domain sockets. This domain socket is used to
> >>>transmit a file descriptor of the opened tap device from the helper
> >>>to qemu.
> >>>
> >>>The helper can then exit and let qemu use the tap device.
> >>
> >>When QEMU is run by libvirt, we generally like to use capng to
> >>remove the ability for QEMU to run setuid programs at all. So
> >>obviously it will struggle to run the qemu-bridge-helper binary
> >>in such a scenario.
> >>
> >>With the way you transmit the TAP device FD back to the caller,
> >>it looks like libvirt itself could execute the qemu-bridge-helper
> >>receiving the FD, and then pass the FD onto QEMU using the
> >>traditional tap,fd=XX syntax.
> >
> >Exactly. This would allow tap-based networking using libvirt session://
> >URIs.
> >
> 
> I'll take note of this.  It seems like it would be a nice future
> addition to libvirt.
> 
> A slight tangent, but a point on DAC isolation.  The helper enables
> DAC isolation for qemu:///session but we still need some work in
> libvirt to provide DAC isolation for qemu:///system.  This could be
> done by allowing management applications to specify custom
> user/group IDs when creating guests rather than hard coding the IDs
> in the configuration file.

Yes, this is an item on our todo list for libvirt. There are a couple of
work items involved:

 - Extend the XML to allow multiple <seclabel> elements, one per
   security driver in use.
 - Add a new API to allow fetching of live seclabel data per
   security driver
 - Extend the current DAC security driver to automatically allocate
   UIDs from an admin defined range, and/or pull them from the XML
   provided by app.

Technically we could do item 3 without doing items 1/2, but that would
necessitate *not* using the sVirt security driver. I don't think that's
too useful, so items 1/2 let us use both the sVirt & enhanced DAC driver
at the same time.
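
To make the third item concrete, a minimal sketch of allocating UIDs from an admin-defined range might look like this. It is purely illustrative and not libvirt code: struct uid_pool and the uid_pool_*() functions are hypothetical, and the fixed-size table is an arbitrary simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define UID_POOL_MAX 64   /* arbitrary cap chosen for this sketch */

/* Hypothetical pool covering the UIDs [first, first + count). */
struct uid_pool {
    uid_t first;
    size_t count;
    bool in_use[UID_POOL_MAX];
};

static void uid_pool_init(struct uid_pool *p, uid_t first, size_t count)
{
    size_t i;

    assert(p != NULL && count <= UID_POOL_MAX);
    p->first = first;
    p->count = count;
    for (i = 0; i < UID_POOL_MAX; i++) {
        p->in_use[i] = false;
    }
}

/* Hand out the lowest free UID; returns -1 when the range is exhausted. */
static int uid_pool_alloc(struct uid_pool *p, uid_t *out)
{
    size_t i;

    for (i = 0; i < p->count; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            *out = p->first + (uid_t)i;
            return 0;
        }
    }
    return -1;
}

/* Return a UID to the pool; -1 if it is out of range or was not allocated. */
static int uid_pool_release(struct uid_pool *p, uid_t uid)
{
    if (uid < p->first || (size_t)(uid - p->first) >= p->count) {
        return -1;
    }
    if (!p->in_use[uid - p->first]) {
        return -1;
    }
    p->in_use[uid - p->first] = false;
    return 0;
}
```

An allocator like this only guarantees uniqueness on a single host, which is why pulling the IDs from the XML provided by the app is listed as the alternative.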

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


* Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
  2011-10-07  9:04         ` Daniel P. Berrange
@ 2011-10-07 14:40           ` Corey Bryant
  2011-10-07 14:45             ` Daniel P. Berrange
  0 siblings, 1 reply; 23+ messages in thread
From: Corey Bryant @ 2011-10-07 14:40 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Anthony Liguori, Richa Marwaha, qemu-devel



On 10/07/2011 05:04 AM, Daniel P. Berrange wrote:
> On Thu, Oct 06, 2011 at 02:38:56PM -0400, Corey Bryant wrote:
>>
>>
>> On 10/06/2011 02:04 PM, Anthony Liguori wrote:
>>> On 10/06/2011 11:41 AM, Daniel P. Berrange wrote:
>>>> On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote:
>>>>> This patch adds a helper that can be used to create a tap device
>>>>> attached to
>>>>> a bridge device. Since this helper is minimal in what it does, it can be
>>>>> given CAP_NET_ADMIN which allows qemu to avoid running as root while
>>>>> still
>>>>> satisfying the majority of what users tend to want to do with tap
>>>>> devices.
>>>>>
>>>>> The way this all works is that qemu launches this helper passing a
>>>>> bridge
>>>>> name and the name of an inherited file descriptor. The descriptor is one
>>>>> end of a socketpair() of domain sockets. This domain socket is used to
>>>>> transmit a file descriptor of the opened tap device from the helper
>>>>> to qemu.
>>>>>
>>>>> The helper can then exit and let qemu use the tap device.
>>>>
>>>> When QEMU is run by libvirt, we generally like to use capng to
>>>> remove the ability for QEMU to run setuid programs at all. So
>>>> obviously it will struggle to run the qemu-bridge-helper binary
>>>> in such a scenario.
>>>>
>>>> With the way you transmit the TAP device FD back to the caller,
>>>> it looks like libvirt itself could execute the qemu-bridge-helper
>>>> receiving the FD, and then pass the FD onto QEMU using the
>>>> traditional tap,fd=XX syntax.
>>>
>>> Exactly. This would allow tap-based networking using libvirt session://
>>> URIs.
>>>
>>
>> I'll take note of this.  It seems like it would be a nice future
>> addition to libvirt.
>>
>> A slight tangent, but a point on DAC isolation.  The helper enables
>> DAC isolation for qemu:///session but we still need some work in
>> libvirt to provide DAC isolation for qemu:///system.  This could be
>> done by allowing management applications to specify custom
>> user/group IDs when creating guests rather than hard coding the IDs
>> in the configuration file.
>
> Yes, this is an item on our todo list for libvirt. There are a couple of
> work items involved:
>
>   - Extend the XML to allow multiple <seclabel> elements, one per
>     security driver in use.
>   - Add a new API to allow fetching of live seclabel data per
>     security driver
>   - Extend the current DAC security driver to automatically allocate
>     UIDs from an admin defined range, and/or pull them from the XML
>     provided by app.
>
> Technically we could do item 3 without doing items 1/2, but that would
> necessitate *not* using the sVirt security driver. I don't think that's
> too useful, so items 1/2 let us use both the sVirt & enhanced DAC driver
> at the same time.
>

I think I'm missing something here and could use some more details to 
understand 1 & 2.  Here's what I'm currently picturing.

With DAC isolation:
     QEMU A runs under userA:groupA and QEMU B runs under userB:groupB

versus currently:
     QEMU A runs under qemu:qemu and QEMU B runs under qemu:qemu

In either case, guests A and B have separate domain XML and a single 
unique seclabel, such as this dynamic SELinux label:

<seclabel type='dynamic' model='selinux'>
   <label>system_u:system_r:svirt_t:s0:c633,c712</label>
   <imagelabel>system_u:object_r:svirt_image_t:s0:c633,c712</imagelabel>
</seclabel>


> Regards,
> Daniel

-- 
Regards,
Corey


* Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
  2011-10-07 14:40           ` Corey Bryant
@ 2011-10-07 14:45             ` Daniel P. Berrange
  2011-10-07 14:51               ` Corey Bryant
  2011-10-07 14:52               ` Corey Bryant
  0 siblings, 2 replies; 23+ messages in thread
From: Daniel P. Berrange @ 2011-10-07 14:45 UTC (permalink / raw)
  To: Corey Bryant; +Cc: Anthony Liguori, Richa Marwaha, qemu-devel

On Fri, Oct 07, 2011 at 10:40:56AM -0400, Corey Bryant wrote:
> 
> 
> On 10/07/2011 05:04 AM, Daniel P. Berrange wrote:
> >On Thu, Oct 06, 2011 at 02:38:56PM -0400, Corey Bryant wrote:
> >>
> >>
> >>On 10/06/2011 02:04 PM, Anthony Liguori wrote:
> >>>On 10/06/2011 11:41 AM, Daniel P. Berrange wrote:
> >>>>On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote:
> >>>>>This patch adds a helper that can be used to create a tap device
> >>>>>attached to
> >>>>>a bridge device. Since this helper is minimal in what it does, it can be
> >>>>>given CAP_NET_ADMIN which allows qemu to avoid running as root while
> >>>>>still
> >>>>>satisfying the majority of what users tend to want to do with tap
> >>>>>devices.
> >>>>>
> >>>>>The way this all works is that qemu launches this helper passing a
> >>>>>bridge
> >>>>>name and the name of an inherited file descriptor. The descriptor is one
> >>>>>end of a socketpair() of domain sockets. This domain socket is used to
> >>>>>transmit a file descriptor of the opened tap device from the helper
> >>>>>to qemu.
> >>>>>
> >>>>>The helper can then exit and let qemu use the tap device.
> >>>>
> >>>>When QEMU is run by libvirt, we generally like to use capng to
> >>>>remove the ability for QEMU to run setuid programs at all. So
> >>>>obviously it will struggle to run the qemu-bridge-helper binary
> >>>>in such a scenario.
> >>>>
> >>>>With the way you transmit the TAP device FD back to the caller,
> >>>>it looks like libvirt itself could execute the qemu-bridge-helper
> >>>>receiving the FD, and then pass the FD onto QEMU using the
> >>>>traditional tap,fd=XX syntax.
> >>>
> >>>Exactly. This would allow tap-based networking using libvirt session://
> >>>URIs.
> >>>
> >>
> >>I'll take note of this.  It seems like it would be a nice future
> >>addition to libvirt.
> >>
> >>A slight tangent, but a point on DAC isolation.  The helper enables
> >>DAC isolation for qemu:///session but we still need some work in
> >>libvirt to provide DAC isolation for qemu:///system.  This could be
> >>done by allowing management applications to specify custom
> >>user/group IDs when creating guests rather than hard coding the IDs
> >>in the configuration file.
> >
> >Yes, this is an item on our todo list for libvirt. There are a couple of
> >work items involved:
> >
> >  - Extend the XML to allow multiple <seclabel> elements, one per
> >    security driver in use.
> >  - Add a new API to allow fetching of live seclabel data per
> >    security driver
> >  - Extend the current DAC security driver to automatically allocate
> >    UIDs from an admin defined range, and/or pull them from the XML
> >    provided by app.
> >
> >Technically we could do item 3 without doing items 1/2, but that would
> >necessitate *not* using the sVirt security driver. I don't think that's
> >too useful, so items 1/2 let us use both the sVirt & enhanced DAC driver
> >at the same time.
> >
> 
> I think I'm missing something here and could use some more details
> to understand 1 & 2.  Here's what I'm currently picturing.
> 
> With DAC isolation:
>     QEMU A runs under userA:groupA and QEMU B runs under userB:groupB
> 
> versus currently:
>     QEMU A runs under qemu:qemu and QEMU B runs under qemu:qemu
> 
> In either case, guests A and B have separate domain XML and a single
> unique seclabel, such as this dynamic SELinux label:
> 
> <seclabel type='dynamic' model='selinux'>
>   <label>system_u:system_r:svirt_t:s0:c633,c712</label>
>   <imagelabel>system_u:object_r:svirt_image_t:s0:c633,c712</imagelabel>
> </seclabel>

If we're going to make the DAC user ID/group ID configurable, then we
need to expose this to applications in the XML so that:

 a. apps can allocate unique user/group IDs *cluster wide* when shared
    filesystems are in use. libvirt can only ensure per-host uniqueness.

 b. apps can know what user/group ID has been allocated to each guest
    and this can be reported in virsh dominfo, as with svirt info.

i.e., we'll need something like this:

  <seclabel type='dynamic' model='selinux'>
    <label>system_u:system_r:svirt_t:s0:c633,c712</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c633,c712</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac'>
    <label>102:102</label>
    <imagelabel>102:102</imagelabel>
  </seclabel>


And:

# virsh dominfo f16x86_64
Id:             29
Name:           f16x86_64
UUID:           1e9f3097-0a45-ea06-d0d8-40507999a1cd
OS Type:        hvm
State:          running
CPU(s):         1
CPU time:       19.5s
Max memory:     819200 kB
Used memory:    819200 kB
Persistent:     yes
Autostart:      disable
Security model: selinux
Security DOI:   0
Security label: system_u:system_r:svirt_t:s0:c244,c424 (permissive)
Security model: dac
Security DOI:   0
Security label: 102:102 (enforcing)
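
As a sketch of how an application might consume a DAC label of the "uid:gid" form shown above, the following parses the label string into numeric IDs. The function names parse_id() and dac_label_parse() are hypothetical and the strict digits-only format is an assumption for illustration, not a description of what libvirt accepts.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Parse a decimal number that must be followed by 'terminator'. */
static int parse_id(const char *s, char terminator,
                    unsigned long *val, const char **next)
{
    char *end;

    if (s[0] < '0' || s[0] > '9') {
        return -1;                      /* reject empty, signs, whitespace */
    }
    errno = 0;
    *val = strtoul(s, &end, 10);
    if (errno != 0 || *end != terminator) {
        return -1;
    }
    *next = end;
    return 0;
}

/* Split "uid:gid" into its two IDs; returns 0 on success, -1 on bad input. */
static int dac_label_parse(const char *label, uid_t *uid, gid_t *gid)
{
    unsigned long u, g;
    const char *p;

    assert(label != NULL && uid != NULL && gid != NULL);
    if (parse_id(label, ':', &u, &p) < 0) {
        return -1;
    }
    if (parse_id(p + 1, '\0', &g, &p) < 0) {
        return -1;
    }
    /* Reject values that do not fit the platform's ID types. */
    if ((unsigned long)(uid_t)u != u || (unsigned long)(gid_t)g != g) {
        return -1;
    }
    *uid = (uid_t)u;
    *gid = (gid_t)g;
    return 0;
}
```

Note that the "(enforcing)" suffix printed by virsh dominfo is not part of the label itself, so this sketch rejects it rather than silently ignoring it.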

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


* Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
  2011-10-07 14:45             ` Daniel P. Berrange
@ 2011-10-07 14:51               ` Corey Bryant
  2011-10-07 14:52               ` Corey Bryant
  1 sibling, 0 replies; 23+ messages in thread
From: Corey Bryant @ 2011-10-07 14:51 UTC (permalink / raw)
  To: qemu-devel



On 10/07/2011 10:45 AM, Daniel P. Berrange wrote:
> On Fri, Oct 07, 2011 at 10:40:56AM -0400, Corey Bryant wrote:
>>
>>
>> On 10/07/2011 05:04 AM, Daniel P. Berrange wrote:
>>> On Thu, Oct 06, 2011 at 02:38:56PM -0400, Corey Bryant wrote:
>>>>
>>>>
>>>> On 10/06/2011 02:04 PM, Anthony Liguori wrote:
>>>>> On 10/06/2011 11:41 AM, Daniel P. Berrange wrote:
>>>>>> On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote:
>>>>>>> This patch adds a helper that can be used to create a tap device
>>>>>>> attached to
>>>>>>> a bridge device. Since this helper is minimal in what it does, it can be
>>>>>>> given CAP_NET_ADMIN which allows qemu to avoid running as root while
>>>>>>> still
>>>>>>> satisfying the majority of what users tend to want to do with tap
>>>>>>> devices.
>>>>>>>
>>>>>>> The way this all works is that qemu launches this helper passing a
>>>>>>> bridge
>>>>>>> name and the name of an inherited file descriptor. The descriptor is one
>>>>>>> end of a socketpair() of domain sockets. This domain socket is used to
>>>>>>> transmit a file descriptor of the opened tap device from the helper
>>>>>>> to qemu.
>>>>>>>
>>>>>>> The helper can then exit and let qemu use the tap device.
>>>>>>
>>>>>> When QEMU is run by libvirt, we generally like to use capng to
>>>>>> remove the ability for QEMU to run setuid programs at all. So
>>>>>> obviously it will struggle to run the qemu-bridge-helper binary
>>>>>> in such a scenario.
>>>>>>
>>>>>> With the way you transmit the TAP device FD back to the caller,
>>>>>> it looks like libvirt itself could execute the qemu-bridge-helper
>>>>>> receiving the FD, and then pass the FD onto QEMU using the
>>>>>> traditional tap,fd=XX syntax.
>>>>>
>>>>> Exactly. This would allow tap-based networking using libvirt session://
>>>>> URIs.
>>>>>
>>>>
>>>> I'll take note of this.  It seems like it would be a nice future
>>>> addition to libvirt.
>>>>
>>>> A slight tangent, but a point on DAC isolation.  The helper enables
>>>> DAC isolation for qemu:///session but we still need some work in
>>>> libvirt to provide DAC isolation for qemu:///system.  This could be
>>>> done by allowing management applications to specify custom
>>>> user/group IDs when creating guests rather than hard coding the IDs
>>>> in the configuration file.
>>>
>>> Yes, this is an item on our todo list for libvirt. There are a couple of
>>> work items involved:
>>>
>>>   - Extend the XML to allow multiple <seclabel> elements, one per
>>>     security driver in use.
>>>   - Add a new API to allow fetching of live seclabel data per
>>>     security driver
>>>   - Extend the current DAC security driver to automatically allocate
>>>     UIDs from an admin defined range, and/or pull them from the XML
>>>     provided by app.
>>>
>>> Technically we could do item 3 without doing items 1/2, but that would
>>> necessitate *not* using the sVirt security driver. I don't think that's
>>> too useful, so items 1/2 let us use both the sVirt & enhanced DAC driver
>>> at the same time.
>>>
>>
>> I think I'm missing something here and could use some more details
>> to understand 1 & 2.  Here's what I'm currently picturing.
>>
>> With DAC isolation:
>>      QEMU A runs under userA:groupA and QEMU B runs under userB:groupB
>>
>> versus currently:
>>      QEMU A runs under qemu:qemu and QEMU B runs under qemu:qemu
>>
>> In either case, guests A and B have separate domain XML and a single
>> unique seclabel, such as this dynamic SELinux label:
>>
>> <seclabel type='dynamic' model='selinux'>
>>    <label>system_u:system_r:svirt_t:s0:c633,c712</label>
>>    <imagelabel>system_u:object_r:svirt_image_t:s0:c633,c712</imagelabel>
>> </seclabel>
>
> If we're going to make the DAC user ID/group ID configurable, then we
> need to expose this to applications in the XML so that
>
>   a. apps can allocate unique user/group IDs *cluster wide* when shared
>      filesystems are in use. libvirt can only ensure per-host uniqueness.
>
>   b. apps can know what user/group ID has been allocated to each guest
>      and this can be reported in virsh dominfo, as with svirt info.
>
> ie, we'll need something like this:
>
>    <seclabel type='dynamic' model='selinux'>
>      <label>system_u:system_r:svirt_t:s0:c633,c712</label>
>      <imagelabel>system_u:object_r:svirt_image_t:s0:c633,c712</imagelabel>
>    </seclabel>
>    <seclabel type='dynamic' model='dac'>
>      <label>102:102</label>
>      <imagelabel>102:102</imagelabel>
>    </seclabel>
>
>
> And:
>
> # virsh dominfo f16x86_64
> Id:             29
> Name:           f16x86_64
> UUID:           1e9f3097-0a45-ea06-d0d8-40507999a1cd
> OS Type:        hvm
> State:          running
> CPU(s):         1
> CPU time:       19.5s
> Max memory:     819200 kB
> Used memory:    819200 kB
> Persistent:     yes
> Autostart:      disable
> Security model: selinux
> Security DOI:   0
> Security label: system_u:system_r:svirt_t:s0:c244,c424 (permissive)
> Security model: dac
> Security DOI:   0
> Security label: 102:102 (enforcing)
>
> Regards,
> Daniel

Ah, yes.  That makes complete sense.  Thanks for the clarification.
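
For illustration only, here is a minimal sketch of how an application
could read the two-<seclabel> XML quoted above. The multi-seclabel
layout is the proposal from this thread rather than an existing libvirt
format, the <domain> wrapper is added just to make the fragment
well-formed, and parse_seclabels and dac_ids are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Fragment from the quoted proposal: one <seclabel> per security driver.
DOMAIN_XML = """
<domain>
  <seclabel type='dynamic' model='selinux'>
    <label>system_u:system_r:svirt_t:s0:c633,c712</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c633,c712</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac'>
    <label>102:102</label>
    <imagelabel>102:102</imagelabel>
  </seclabel>
</domain>
"""


def parse_seclabels(xml_text):
    """Return a dict keyed by security model ('selinux', 'dac', ...)."""
    root = ET.fromstring(xml_text)
    labels = {}
    for sec in root.iter("seclabel"):
        labels[sec.get("model")] = {
            "type": sec.get("type"),
            "label": sec.findtext("label"),
            "imagelabel": sec.findtext("imagelabel"),
        }
    return labels


def dac_ids(labels):
    """Split the DAC 'uid:gid' label into a pair of integers."""
    uid, gid = labels["dac"]["label"].split(":")
    return int(uid), int(gid)
```

With the fragment above, dac_ids(parse_seclabels(DOMAIN_XML)) gives the
per-guest user and group ID that a management application would need
when allocating IDs cluster wide.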

-- 
Regards,
Corey


* Re: [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper
  2011-10-07 14:45             ` Daniel P. Berrange
  2011-10-07 14:51               ` Corey Bryant
@ 2011-10-07 14:52               ` Corey Bryant
  1 sibling, 0 replies; 23+ messages in thread
From: Corey Bryant @ 2011-10-07 14:52 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Anthony Liguori, Richa Marwaha, qemu-devel



On 10/07/2011 10:45 AM, Daniel P. Berrange wrote:
> On Fri, Oct 07, 2011 at 10:40:56AM -0400, Corey Bryant wrote:
>>
>>
>> On 10/07/2011 05:04 AM, Daniel P. Berrange wrote:
>>> On Thu, Oct 06, 2011 at 02:38:56PM -0400, Corey Bryant wrote:
>>>>
>>>>
>>>> On 10/06/2011 02:04 PM, Anthony Liguori wrote:
>>>>> On 10/06/2011 11:41 AM, Daniel P. Berrange wrote:
>>>>>> On Thu, Oct 06, 2011 at 11:38:25AM -0400, Richa Marwaha wrote:
>>>>>>> This patch adds a helper that can be used to create a tap device
>>>>>>> attached to
>>>>>>> a bridge device. Since this helper is minimal in what it does, it can be
>>>>>>> given CAP_NET_ADMIN which allows qemu to avoid running as root while
>>>>>>> still
>>>>>>> satisfying the majority of what users tend to want to do with tap
>>>>>>> devices.
>>>>>>>
>>>>>>> The way this all works is that qemu launches this helper passing a
>>>>>>> bridge
>>>>>>> name and the name of an inherited file descriptor. The descriptor is one
>>>>>>> end of a socketpair() of domain sockets. This domain socket is used to
>>>>>>> transmit a file descriptor of the opened tap device from the helper
>>>>>>> to qemu.
>>>>>>>
>>>>>>> The helper can then exit and let qemu use the tap device.
>>>>>>
>>>>>> When QEMU is run by libvirt, we generally like to use capng to
>>>>>> remove the ability for QEMU to run setuid programs at all. So
>>>>>> obviously it will struggle to run the qemu-bridge-helper binary
>>>>>> in such a scenario.
>>>>>>
>>>>>> With the way you transmit the TAP device FD back to the caller,
>>>>>> it looks like libvirt itself could execute the qemu-bridge-helper
>>>>>> receiving the FD, and then pass the FD onto QEMU using the
>>>>>> traditional tap,fd=XX syntax.
>>>>>
>>>>> Exactly. This would allow tap-based networking using libvirt session://
>>>>> URIs.
>>>>>
>>>>
>>>> I'll take note of this.  It seems like it would be a nice future
>>>> addition to libvirt.
>>>>
>>>> A slight tangent, but a point on DAC isolation.  The helper enables
>>>> DAC isolation for qemu:///session but we still need some work in
>>>> libvirt to provide DAC isolation for qemu:///system.  This could be
>>>> done by allowing management applications to specify custom
>>>> user/group IDs when creating guests rather than hard coding the IDs
>>>> in the configuration file.
>>>
>>> Yes, this is an item on our todo list for libvirt. There are a couple of
>>> work items involved:
>>>
>>>   - Extend the XML to allow multiple <seclabel> elements, one per
>>>     security driver in use.
>>>   - Add a new API to allow fetching of live seclabel data per
>>>     security driver
>>>   - Extend the current DAC security driver to automatically allocate
>>>     UIDs from an admin defined range, and/or pull them from the XML
>>>     provided by app.
>>>
>>> Technically we could do item 3 without doing items 1/2, but that would
>>> necessitate *not* using the sVirt security driver. I don't think that's
>>> too useful, so items 1/2 let us use both the sVirt & enhanced DAC driver
>>> at the same time.
>>>
>>
>> I think I'm missing something here and could use some more details
>> to understand 1 & 2.  Here's what I'm currently picturing.
>>
>> With DAC isolation:
>>      QEMU A runs under userA:groupA and QEMU B runs under userB:groupB
>>
>> versus currently:
>>      QEMU A runs under qemu:qemu and QEMU B runs under qemu:qemu
>>
>> In either case, guests A and B have separate domain XML and a single
>> unique seclabel, such as this dynamic SELinux label:
>>
>> <seclabel type='dynamic' model='selinux'>
>>    <label>system_u:system_r:svirt_t:s0:c633,c712</label>
>>    <imagelabel>system_u:object_r:svirt_image_t:s0:c633,c712</imagelabel>
>> </seclabel>
>
> If we're going to make the DAC user ID/group ID configurable, then we
> need to expose this to applications in the XML so that
>
>   a. apps can allocate unique user/group IDs *cluster wide* when shared
>      filesystems are in use. libvirt can only ensure per-host uniqueness.
>
>   b. apps can know what user/group ID has been allocated to each guest
>      and this can be reported in virsh dominfo, as with svirt info.
>
> ie, we'll need something like this:
>
>    <seclabel type='dynamic' model='selinux'>
>      <label>system_u:system_r:svirt_t:s0:c633,c712</label>
>      <imagelabel>system_u:object_r:svirt_image_t:s0:c633,c712</imagelabel>
>    </seclabel>
>    <seclabel type='dynamic' model='dac'>
>      <label>102:102</label>
>      <imagelabel>102:102</imagelabel>
>    </seclabel>
>
>
> And:
>
> # virsh dominfo f16x86_64
> Id:             29
> Name:           f16x86_64
> UUID:           1e9f3097-0a45-ea06-d0d8-40507999a1cd
> OS Type:        hvm
> State:          running
> CPU(s):         1
> CPU time:       19.5s
> Max memory:     819200 kB
> Used memory:    819200 kB
> Persistent:     yes
> Autostart:      disable
> Security model: selinux
> Security DOI:   0
> Security label: system_u:system_r:svirt_t:s0:c244,c424 (permissive)
> Security model: dac
> Security DOI:   0
> Security label: 102:102 (enforcing)
>
> Regards,
> Daniel

Ah, yes.  That makes complete sense.  Thanks for the clarification.
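
As a side note on the descriptor hand-off described in the quoted patch
text (helper opens the tap device, then sends its fd to qemu over one
end of a socketpair()), here is a minimal sketch of that mechanism. It
is not the helper's actual code: it assumes Python 3.9+ on a Unix host,
runs both ends in one process for brevity, and uses a pipe as a
stand-in for the tap device. send_fd, recv_fd and demo are hypothetical
names:

```python
import os
import socket


def send_fd(sock, fd):
    # At least one byte of ordinary data must accompany the
    # SCM_RIGHTS ancillary message that carries the descriptor.
    socket.send_fds(sock, [b"\0"], [fd])


def recv_fd(sock):
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo():
    # In the real flow the helper is a separate process that inherits
    # helper_end; both ends live in this process only for the sketch.
    qemu_end, helper_end = socket.socketpair(socket.AF_UNIX,
                                             socket.SOCK_STREAM)
    r, w = os.pipe()  # stand-in for the opened tap device
    try:
        send_fd(helper_end, r)        # "helper" side hands over the fd
        os.close(r)                   # helper may now drop its copy and exit
        received = recv_fd(qemu_end)  # "qemu" side gets a new fd number
        os.write(w, b"ping")
        result = os.read(received, 4)
        os.close(received)
        return result
    finally:
        os.close(w)
        qemu_end.close()
        helper_end.close()
```

The received descriptor refers to the same open file as the one that
was sent, which is why the sending side can close its copy and exit
while the receiver keeps using the device.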

-- 
Regards,
Corey


end of thread, other threads:[~2011-10-07 14:54 UTC | newest]

Thread overview: 23+ messages
2011-10-06 15:38 [Qemu-devel] [PATCH 0/4] -net tap: rootless bridge support for qemu Richa Marwaha
2011-10-06 15:38 ` [Qemu-devel] [PATCH 1/4] Add basic version of bridge helper Richa Marwaha
2011-10-06 16:41   ` Daniel P. Berrange
2011-10-06 18:04     ` Anthony Liguori
2011-10-06 18:38       ` Corey Bryant
2011-10-07  9:04         ` Daniel P. Berrange
2011-10-07 14:40           ` Corey Bryant
2011-10-07 14:45             ` Daniel P. Berrange
2011-10-07 14:51               ` Corey Bryant
2011-10-07 14:52               ` Corey Bryant
2011-10-06 17:44   ` Anthony Liguori
2011-10-06 18:10     ` Corey Bryant
2011-10-06 15:38 ` [Qemu-devel] [PATCH 2/4] Add access control support to qemu-bridge-helper Richa Marwaha
2011-10-06 15:38 ` [Qemu-devel] [PATCH 3/4] Add cap reduction support to enable use as SUID Richa Marwaha
2011-10-06 16:34   ` Daniel P. Berrange
2011-10-06 17:42     ` Anthony Liguori
2011-10-06 18:05       ` Corey Bryant
2011-10-06 18:08       ` Corey Bryant
2011-10-06 15:38 ` [Qemu-devel] [PATCH 4/4] Add support for bridge Richa Marwaha
2011-10-06 17:49   ` Anthony Liguori
2011-10-06 18:15     ` Corey Bryant
2011-10-06 18:19       ` Anthony Liguori
2011-10-06 18:24         ` Corey Bryant
