xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans
@ 2018-10-05 16:56 George Dunlap
  2018-10-05 16:56 ` [PATCH 2/5] tools/dm_restrict: Ask QEMU to chroot George Dunlap
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: George Dunlap @ 2018-10-05 16:56 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, Andrew Cooper,
	Tim Deegan, George Dunlap, Ross Lagerwall, Julien Grall,
	Jan Beulich, Anthony Perard, Ian Jackson

docs/qemu-deprivilege.txt had some basic instructions for using
dm_restrict, but it was incomplete, misleading, and stale.

Update the docs in a number of ways.

First, separate user-facing documentation and technical description
into docs/features and docs/design, respectively.

In the feature doc:

* Introduce a section mentioning minimim versions of Linux, Xen, and
qemu required (TBD)

* Fix the discussion of qemu userid.  Mention xen-qemuuser-range-base,
and provide example shell code that actually has some hope of working
(instead of failing out after creating 900 userids).

* Describe how to enable restrictions, as well as features which
probably don't or definitely don't work.

In the design doc, introduce a "Technical Details" section which
describes specifically what restrictions are currently done, and also
what restrictions we are looking at doing in the future.

The idea here is that as we implement the various items for the
future, we move them from "Restrictions still to do" to "Restrictions
done".  This can also act as a design document -- a place for public
discussion of what can or should be done and how.

Also add an entry to SUPPORT.md.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
Changes since v2:
- Extraneous privcmd / evtchn instances aren't closed
- Expand description of how to test fd deprivileging
- Rework and clarify two namespace sections, give reference for QEMU NAK
- Add more information about migration technical challenges
- In UID section, mention possibility of container ID collisions.
- Fix name of design document.
- Add SUPPORT.md statement.  Specify Linux, to make sure that FreeBSD is
  evaluated separately.
- Mention that `-sandbox` is a blacklist and why

Changes since v1:
- Break into two, and move into appropriate directories (rather than 'misc')
- Updated version requirements
- Distinguish between features which "don't yet work" and features which we never expect to work
- Update description of xen-restrict functionality
- Reorder and expand further restrictions
- Make it more clear which restrictions are available on Linux only
- Include detailed description of how to kill a process
- Add RLIMIT_NPROC as something we can do without further changes to qemu
- Document the need to check for the sandbox feature before using it

Thank you to Ross Lagerwall, whose description of what XenServer is
doing formed much of the basis for the text here.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Konrad Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
CC: Anthony Perard <anthony.perard@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 SUPPORT.md                            |  20 ++
 docs/designs/qemu-deprivilege.md      | 322 ++++++++++++++++++++++++++
 docs/features/qemu-deprivilege.pandoc |  94 ++++++++
 docs/misc/qemu-deprivilege.txt        |  36 ---
 4 files changed, 436 insertions(+), 36 deletions(-)
 create mode 100644 docs/designs/qemu-deprivilege.md
 create mode 100644 docs/features/qemu-deprivilege.pandoc
 delete mode 100644 docs/misc/qemu-deprivilege.txt

diff --git a/SUPPORT.md b/SUPPORT.md
index 3727446b83..b5e7e44fb3 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -525,6 +525,26 @@ Vulnerabilities of a device model stub domain
 to a hostile driver domain (either compromised or untrusted)
 are excluded from security support.
 
+### Device Model Deprivileging
+
+    Status, Linux: Tech Preview, with limited support
+
+This means adding extra restrictions to a device model running in
+domain 0 in order to prevent a compromised device model to attack the
+rest of the system.
+
+"Tech preview with limited support" means we will not issue XSAs for
+the _additional_ functionality provided by the feature; but we will
+issue XSAs in the event that enabling this feature opens up a security
+hole that would not be present without the feature disabled.
+
+For example, while this is classified as tech preview, a bug in libxl
+which failed to change the user ID of QEMU would not receive an XSA,
+since without this feature the user ID wouldn't be changed. But a
+change which made it possible for a compromised guest to read
+arbitrary files on the host filesystem without compromising QEMU would
+be issued an XSA, since that does weaken security.
+
 ### KCONFIG Expert
 
     Status: Experimental
diff --git a/docs/designs/qemu-deprivilege.md b/docs/designs/qemu-deprivilege.md
new file mode 100644
index 0000000000..d3c6495030
--- /dev/null
+++ b/docs/designs/qemu-deprivilege.md
@@ -0,0 +1,322 @@
+# Introduction
+
+The goal of deprilvileging qemu is this: Even if there is a bug (for
+example in qemu) which permits a domain to gain control of the device
+model, the compromised device model process is prevented from
+violating the system's overall security properties.  Ie, a guest
+cannot "escape" from the virtualisation by using a qemu bug.
+
+This document lists the various technical measures which we either
+have taken, or plan to take to effect this goal.  Some of them are
+required to be considered secure (that is, there are known attack
+vectors which they close); others are "just in case" (that is, there
+are no known attack vectors, but we perform the restrictions to reduce
+the possibility of unknown attack vectors).
+
+# Restrictions done
+
+The following restrictions are currently implemented.
+
+## Having qemu switch user
+
+'''Description''': As mentioned above, having QEMU switch to a
+non-root user, one per domain id.  Not being the root user limits what
+a compromised QEMU process can do to the system, and having one user
+per domain id limits what a comprimised QEMU process can do to the
+QEMU processes of other VMs.
+
+'''Implementation''': The toolstack adds the following to the qemu command-line:
+
+    -runas <uid>:<gid>
+
+'''How to test''':
+
+    grep /proc/<qpid>/status [UG]id
+
+'''Testing Status''': Not tested
+
+## Xen library / file-descriptor restrictions
+
+'''Description''': Close and restrict Xen-related file descriptors.
+Specifically:
+ * Close all xenstore-related file descriptors
+ * Make sure that all open instances of `privcmd` and `evtchn` file
+descriptors have had `IOCTL_PRIVCMD_RESTRICT` and
+`IOCTL_EVTCHN_RESTRICT_DOMID` ioctls called on them, respectively.
+
+FIXME: Double-check the correctness of the above
+
+'''Implementation''': Toolstack adds the following to the qemu command-line:
+
+    -xen-domid-restrict
+
+'''How to test''':
+
+Use `fishdescriptor` to pull a file descriptor from a running QEMU,
+then use `depriv-fd-checker` to check that it has the desired
+properties, and that hypercalls which are meant to fail do fail.  (In
+Debian `fishdescriptor` can be found in the binary package
+`chiark-scripts`; the `depriv-fd-checker` is included in the Xen
+source tree.)
+
+'''Testing status''': Tested
+
+# Restrictions / improvements still to do
+
+This lists potential restrictions still to do.  It is meant to be
+listed in order of ease of implementation, with low-hanging fruit
+first.
+
+## Chroot
+
+'''Description''': Qemu runs in its own chroot, such that even if it
+could call an 'open' command of some sort, there would be nothing for
+it to see.
+
+'''Implementation''': The toolstack creates a directory in the libxl "run-dir"; e.g.
+`/var/run/xen/qemu-root-<domid>`
+
+Then adds the following to the qemu command-line:
+
+    -chroot /var/run/xen/qemu-root-<domid>
+	
+'''How to test''':  Check `/proc/<qpid>/root`
+	
+'''Tested''': Not tested
+
+## Namespaces for unused functionality (Linux only)
+
+'''Description''': QEMU doesn't use the functionality associated with
+mount and IPC namespaces. (IPC namespaces contol non-file-based IPC
+mechanisms within the kernel; unix and network sockets are not
+affected by this.)  Making separate namespaces for these for QEMU
+won't affect normal operation, but it does mean that even if other
+restrictions fail, the process won't be able to even name system mount
+points or exsting non-file-based IPC descriptors to attempt to attack
+them.
+
+'''Implementation''':
+
+In theory this could be done in QEMU (similar to -sandbox, -runas,
+-chroot, and so on), but a patch doing this in QEMU was NAKed upstream
+(see [qemu-namespaces]). They preferred that this was done as a setup step by
+whatever executes QEMU; i.e., have the process which exec's QEMU first
+call:
+
+    unshare(CLONE_NEWNS | CLONE_NEWIPC)
+	
+'''How to test''':  Check `/proc/<qpid>/ns/[ipc,mnt]`
+
+'''Tested''': Not tested
+
+[qemu-namespaces]: https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg04723.html
+
+### Basic RLIMITs
+
+'''Description''': A number of limits on the resources that a given
+process / userid is allowed to consume.  These can limit the ability
+of a compromised QEMU process to DoS domain 0 by exhausting various
+resources available to it.
+
+'''Implementation'''
+
+Limits that can be implemented immediately without much effort:
+ - RLIMIT_FSIZE` (file size) to 256KiB.
+ - RLIMIT_NPROC (after uid changes to a unique uid)
+
+Probably not necessary but why not:
+ - RLIMIT_CORE: 0
+ - RLIMIT_MSGQUEUE: 0
+ - RLIMIT_LOCKS: 0
+ - RLIMIT_MEMLOCK: 0
+ 
+Note: mlock() is used by QEMU only when both "realtime" and "mlock"
+are specified; this does not apply to QEMU running as a Xen DM.
+   
+'''How to test''': Check `/proc/<qpid>/limits`
+
+'''Tested''': Not tested
+
+### Further RLIMITs
+
+RLIMIT_AS limits the total amount of memory; but this includes the
+virtual memory which QEMU uses as a mapcache.  xen-mapcache.c already
+fiddles with this; it would be straightforward to make it *set* the
+rlimit to what it thinks a sensible limit is.
+
+Other things that would take some cleverness / changes to QEMU to
+utilize due to ordering constrants:
+ - RLIMIT_NOFILES (after all necessary files are opened)
+
+### libxl UID cleanup
+
+'''Description''': Domain IDs are reused, and thus restricted UIDs are
+reused.  If a compromised QEMU can fork (due to seccomp or
+RLIMIT_NPROC limits being ineffective for some reason), it may avoid
+being killed when its domain dies, then wait until the domain ID is
+reused again, at which point it will have control over the domain in
+question (which probably belongs to someone else).
+
+libxl should kill all UIDs associated with a domain both when the VM
+is destroyed, and before starting a VM with the same UID.
+
+'''Implementation''': This is unnecessarily tricky.
+
+The kill() system call can have three kinds of targets:
+ - A single pid
+ - A process group
+ - "Every process except me to which I am allowed to send a signal" (-1)
+
+Targeting a single pid is racy and likely to be beaten by the
+following loop:
+
+    while(1) {
+        if(fork())
+	    _exit(0);
+    }	  
+
+That is, by the time you've read the process list and found the
+process id you want to kill, that process has exited and there is a
+new process whose pid you don't know about.
+
+Targeting a process group will be ineffective, as unprivileged
+processes are allowed to make their own process groups.
+
+kill(-1) can be used but must be done with care.  Consider the
+following code, for example:
+
+    setuid(target_uid);
+    kill(-1, 9);
+
+This looks like it will do the trick; but by setting all of the user
+ids (effective, real, and saved), it opens the 'killing' process up to
+being killed by the target process:
+
+    while(1) {
+        if(fork())
+            _exit(0);
+        else
+            kill(-1, 9);
+    }
+
+Fortunately there is an assymetry we can take advantage of.  From the
+POSIX spec:
+
+> For a process to have permission to send a signal to a process
+> designated by pid, unless the sending process has appropriate
+> privileges, the real or effective user ID of the sending process shall
+> match the real or saved set-user-ID of the receiving process.
+
+The solution is to allocate a second "reaper" uid that is only used to kill
+target processes.  We set the euid of the killing process to the `target_uid`,
+but the ruid of the killing process to `reaper_uid`, leaving the suid of the
+killing process as 0:
+
+    setresuid(reaper_uid, target_uid, 0);
+    kill(-1, 9);
+
+NOTE: We cannot use `setreuid(reaper_uid, target_uid)` here, as that
+will set *both* euid *and* suid to `target_uid`, making the killing
+process vulnerable to the target process again.
+
+Since this will kill all other `reaper_uid` processes as well, we must
+either allocate a separate `reaper_uid` per domain, or use locking to
+ensure that only one killing process is active at a time.
+
+## libxl: Treat QMP connection as untrusted
+
+'''Description''': Currently libxl talks with QEMU via QMP; but its
+interactions have not historically considered from a security point of
+view.  For example, qmp_synchronous_send() waits for a response from
+QEMU, which a compromised QEMU could simply not send (thus preventing
+the toolstack from making forward progress).
+
+'''Implementation''': Audit toolstack interactions with QEMU which
+happen after the guest has started running, and assume QEMU has been
+compromised.
+
+### seccomp filtering (Linux only)
+
+'''Description''': Turn on seccomp filtering to disable syscalls which
+QEMU doesn't need. 
+
+'''Implementation''': Enable from the command-line:
+
+    -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny
+
+`elevateprivileges` is currently required to allow `-runas` to work.
+Removing this requirement would mean making sure that the uid change
+happened before the seccomp2 call, perhaps by changing the uid before
+executing QEMU.  (But this would then require other changes to create
+the QMP socket, VNC socket, and so on).
+
+It should be noted that `-sandbox` is implemented as a blacklist, not
+a whitelist; that is, it disables known-unsed functionality which may
+be harmful, rather than disabling all functionality except that known
+to be safe and needed.  This is unfortunately necessary since qemu
+doesn't know what system calls libraries might end up making.  (See
+[lwn-seccomp] for a more complete discussion.)
+
+This feature is not on by default and may not be available in all
+environments.  We therefore need to either:
+ 1. Require that this feature be enabled to build qemu
+ 2. Check for `-sandbox` support at runtime before 
+
+[lwn-seccomp]: https://lwn.net/Articles/738694/
+
+### Disks
+
+The chroot (and seccomp?) happens late enough such that QEMU can
+initialize itself and open its disks. If you want to add a disk at run
+time via or insert a CD, you can't pass a path because QEMU is
+chrooted. Instead use the add-fd QMP command and use
+/dev/fdset/<fdset-id> as the path.
+
+A further layer of restriction could be to set RLIMIT_NOFILES to '0',
+and hand all disks over QMP.
+
+## Migration
+
+When calling xen-save-devices-state, since QEMU is running in a chroot
+it is not useful to pass a filename (it doesn't even have write access
+inside the chroot). Instead, give it an open fd using the add-fd
+mechanism.
+
+Additionally, all the restrictions need to be applied to the qemu
+started up on the post-migration side.  One issue that needs to be
+solved is how to signal the toolstack on restore that qemu is ready
+for the domain to be started (since this is normally done via
+xenstore, and at this point the xenstore connections will have been
+closed).
+
+### Network namespacing (Linux only)
+
+Enter QEMU into its own network namespace (in addition to mount & IPC
+namespaces):
+
+    unshare(CLONE_NEWNET);
+
+QEMU does actually use the network namespace as a Xen DM for two
+purposes: 1) To set up network tap devices 2) To open vnc connections.
+
+#### Network
+
+If QEMU runs in its own network namespace, it can't open the tap
+device itself because the interface won't be visible outside of its
+own namespace. So instead, have the toolstack open the device and pass
+it as an fd on the command-line:
+
+    -device rtl8139,netdev=tapnet0,mac=... -netdev tap,id=tapnet0,fd=<tapfd>
+
+#### VNC
+
+If QEMU runs in its own network namespace, it is not straightforward
+to listen on a TCP socket outside of its own network namespace. One
+option would be to use VNC over a UNIX socket:
+
+    -vnc unix:/var/run/xen/vnc-<domid>
+
+However, this would break functionality in the general case; I think
+we need to have the toolstack open a socket and pass the fd to QEMU
+(which requires changes to QEMU).
+
diff --git a/docs/features/qemu-deprivilege.pandoc b/docs/features/qemu-deprivilege.pandoc
new file mode 100644
index 0000000000..6fb48f3e40
--- /dev/null
+++ b/docs/features/qemu-deprivilege.pandoc
@@ -0,0 +1,94 @@
+% QEMU Deprivileging / dm_restrict
+% Revision 1
+
+\clearpage
+
+# Basics
+
+---------------- ----------------------------------------------------
+         Status: **Tech Preview**
+
+Architecture(s): x86
+
+   Component(s): toolstack
+
+---------------- ----------------------------------------------------
+
+# Overview
+
+By default, the QEMU device model is run in domain 0.  If an attacker
+can gain control of a QEMU process, it could easily take control of a
+system.
+
+dm_restrict is a set of operations to restrict QEMU running in domain
+0.  It consists of two halves:
+
+ 1. Mechanisms to restrict QEMU to only being able to affect its own
+domain
+ 2. Mechanisms to restruct QEMU's ability to interact with domain 0.
+
+# User details
+
+## Getting the right versions of software
+
+Linux: 4.11+
+
+Qemu: 3.0+ (Or the version that comes with Xen 4.12+)
+
+## Setting up a userid range
+
+For maximum security, libxl needs to run the devicemodel for each
+domain under a user id (UID) corresponding to its domain id.  There
+are 32752 possible domain IDs, and so libxl needs 32752 user ids set
+aside for it.
+
+The simplest and most effective way to do this is to allocate a
+contiguous block of UIDs, and create a single user named
+`xen-qemuuser-range-base` with the first UID.  For example, under Debian:
+
+    adduser --no-create-home --uid 65536 --system xen-qemuuser-range-base
+
+NOTE: Most modern systems have 32-bit UIDs, and so can in theory go up
+to 2^31 (or 2^32 if uids are unsigned).  POSIX only guarantees 16-bit
+UIDs however; UID 65535 is reserved for an invalid value, and 65534 is
+normally allocated to "nobody".  Additionally, some container systems
+have proposed using the upper 32 bits of the uid for a container ID.
+
+Another, less-secure way is to run all QEMUs as the same UID.  To do
+this, create a user named `xen-qemuuser-shared`; for example:
+
+    adduser --no-create-home --system xen-qemuuser-shared
+
+## Domain config changes
+
+The core domain config change is to add the following line to the
+domain configuration:
+
+    dm_restrict=1
+
+This will perform a number of restrictions, outlined below in the
+'Technical details' section.
+
+# Technical details
+
+See docs/design/qemu-deprivilege.md for technical details.
+
+# Limitations
+
+The following features still need to be implemented:
+ * Inserting a new cdrom while the guest is running (xl cdrom-insert)
+ * Migration / save / restore
+
+Additionally, getting PCI passthrough to work securely would require a
+significant rework of how passthrough works at the moment.  It may be
+implemented at some point but is not a near-term priority.
+
+See SUPPORT.md for security support status.
+
+# History
+
+------------------------------------------------------------------------
+Date       Revision Version  Notes
+---------- -------- -------- -------------------------------------------
+2018-09-14 1        Xen 4.12 Imported from docs/misc
+---------- -------- -------- -------------------------------------------
diff --git a/docs/misc/qemu-deprivilege.txt b/docs/misc/qemu-deprivilege.txt
deleted file mode 100644
index 58b86a3908..0000000000
--- a/docs/misc/qemu-deprivilege.txt
+++ /dev/null
@@ -1,36 +0,0 @@
-For security reasons, libxl tries to pass a non-root username to QEMU as
-argument. During initialization QEMU calls setuid and setgid with the
-user ID and the group ID of the user passed as argument.
-Libxl looks for the following users in this order:
-
-1) a user named "xen-qemuuser-domid$domid",
-Where $domid is the domid of the domain being created.
-This requires the reservation of 65535 uids from xen-qemuuser-domid1
-to xen-qemuuser-domid65535. To use this mechanism, you might want to
-create a large number of users at installation time. For example:
-
-for ((i=1; i<65536; i++))
-do
-    adduser --no-create-home --system xen-qemuuser-domid$i
-done
-
-You might want to consider passing --group to adduser to create a new
-group for each new user.
-
-
-2) a user named "xen-qemuuser-shared"
-As a fall back if both 1) fails, libxl will use a single user for
-all QEMU instances. The user is named xen-qemuuser-shared. This is
-less secure but still better than running QEMU as root. Using this is as
-simple as creating just one more user on your host:
-
-adduser --no-create-home --system xen-qemuuser-shared
-
-
-3) root
-As a last resort, libxl will start QEMU as root.
-
-
-Please note that running QEMU as non-root causes several features like
-migration and PCI passthrough to not work properly and may prevent the guest
-from booting.
-- 
2.19.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/5] tools/dm_restrict: Ask QEMU to chroot
  2018-10-05 16:56 [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans George Dunlap
@ 2018-10-05 16:56 ` George Dunlap
  2018-10-26 13:52   ` Ian Jackson
  2018-10-05 16:56 ` [PATCH 3/5] tools/dm_restrict: Unshare mount and IPC namespaces on Linux George Dunlap
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: George Dunlap @ 2018-10-05 16:56 UTC (permalink / raw)
  To: xen-devel; +Cc: Anthony Perard, Ian Jackson, Wei Liu, George Dunlap

When dm_restrict is enabled, ask QEMU to chroot into an empty directory.

* Create /var/run/qemu/root-domid (deleting the old one if it's there)
* Pass the -chroot option to QEMU

Rather than running `rm -rf` on the directory before creating it
(since there is no library function to do this), simply rmdir the
directory, relying on the fact that the previous QEMU instance, if
properly restricted, shouldn't have been able to write anything
anyway.

Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
Changes since v2:
- Style fixes
- Testing moved to a different patch

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Anthony Perard <anthony.perard@citrix.com>
---
 docs/designs/qemu-deprivilege.md | 12 +++++-----
 tools/libxl/libxl_dm.c           | 41 +++++++++++++++++++++++++++++++-
 2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/docs/designs/qemu-deprivilege.md b/docs/designs/qemu-deprivilege.md
index d3c6495030..6dc3c6d149 100644
--- a/docs/designs/qemu-deprivilege.md
+++ b/docs/designs/qemu-deprivilege.md
@@ -61,12 +61,6 @@ source tree.)
 
 '''Testing status''': Tested
 
-# Restrictions / improvements still to do
-
-This lists potential restrictions still to do.  It is meant to be
-listed in order of ease of implementation, with low-hanging fruit
-first.
-
 ## Chroot
 
 '''Description''': Qemu runs in its own chroot, such that even if it
@@ -84,6 +78,12 @@ Then adds the following to the qemu command-line:
 	
 '''Tested''': Not tested
 
+## Restrictions / improvements still to do
+
+This lists potential restrictions still to do.  It is meant to be
+listed in order of ease of implementation, with low-hanging fruit
+first.
+
 ## Namespaces for unused functionality (Linux only)
 
 '''Description''': QEMU doesn't use the functionality associated with
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index abd31ee6f2..385643b52c 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1410,9 +1410,48 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
         }
     }
 
-    if (libxl_defbool_val(b_info->dm_restrict))
+    if (libxl_defbool_val(b_info->dm_restrict)) {
+        char *chroot_dir = GCSPRINTF("%s/qemu-root-%d",
+                                      libxl__run_dir_path(), guest_domid);
+        int r;
+        
         flexarray_append(dm_args, "-xen-domid-restrict");
 
+        /* 
+         * Run QEMU in a chroot at XEN_RUN_DIR/qemu-root-%d
+         *
+         * There is no library function to do the equivalent of `rm
+         * -rf`.  However deprivileged QEMU in theory shouldn't be
+         * able to write any files, as the chroot would be owned by
+         * root, but it would be running as an unprivileged process.
+         * So in theory, old chroots should always be empty.
+         * 
+         * rmdir the directory before attempting to create
+         * it; if it returns anything other than ENOENT, fail domain
+         * creation.
+         */
+        r = rmdir(chroot_dir);
+        if (r != 0 && errno != ENOENT) {
+            LOGED(ERROR, guest_domid,
+                  "failed to remove existing chroot dir %s", chroot_dir);
+            return ERROR_FAIL;
+        }
+        
+        for (;;) {
+            r = mkdir(chroot_dir, 0000);
+            if (!r)
+                break;
+            if (errno == EINTR) continue;
+            LOGED(ERROR, guest_domid,
+                  "failed to create chroot dir %s", chroot_dir);
+            return ERROR_FAIL;
+        }
+
+        /* Add "-chroot [dir]" to command-line */
+        flexarray_append(dm_args, "-chroot");
+        flexarray_append(dm_args, chroot_dir);
+    }
+
     if (state->saved_state) {
         /* This file descriptor is meant to be used by QEMU */
         *dm_state_fd = open(state->saved_state, O_RDONLY);
-- 
2.19.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/5] tools/dm_restrict: Unshare mount and IPC namespaces on Linux
  2018-10-05 16:56 [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans George Dunlap
  2018-10-05 16:56 ` [PATCH 2/5] tools/dm_restrict: Ask QEMU to chroot George Dunlap
@ 2018-10-05 16:56 ` George Dunlap
  2018-10-26 14:00   ` Ian Jackson
  2018-10-05 16:57 ` [PATCH 4/5] tools/dm_depriv: Add first cut RLIMITs George Dunlap
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: George Dunlap @ 2018-10-05 16:56 UTC (permalink / raw)
  To: xen-devel; +Cc: Anthony Perard, Ian Jackson, Wei Liu, George Dunlap

QEMU running under Xen doesn't need mount or IPC functionality.
Create and enter separate namespaces for each of these before
executing QEMU, so that in the event that other restrictions fail, the
process won't be able to even name system mount points or exsting
non-file-based IPC descriptors to attempt to attack them.

Unsharing is something a process can only do to itself (it would
seem); so add an os-specific "dm_preexec_restrict()" hook just before
we exec() the device model.

Also add checks to depriv-process-checker.sh to verify that dm is
running in a new namespace (or at least, a different one than the
caller).

Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
Changes in v2:
- Return an error rather than calling exit()
- Use LOGE() and print to the current stderr fd, rather than
  printing to the new stderr fd via write()
- Use r for external return values rather than rc.

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Anthony Perard <anthony.perard@citrix.com>
---
 docs/designs/qemu-deprivilege.md | 12 ++++++------
 tools/libxl/libxl_dm.c           |  6 ++++++
 tools/libxl/libxl_freebsd.c      |  5 +++++
 tools/libxl/libxl_internal.h     |  5 +++++
 tools/libxl/libxl_linux.c        | 14 ++++++++++++++
 tools/libxl/libxl_netbsd.c       |  5 +++++
 6 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/docs/designs/qemu-deprivilege.md b/docs/designs/qemu-deprivilege.md
index 6dc3c6d149..e1c9763a71 100644
--- a/docs/designs/qemu-deprivilege.md
+++ b/docs/designs/qemu-deprivilege.md
@@ -78,12 +78,6 @@ Then adds the following to the qemu command-line:
 	
 '''Tested''': Not tested
 
-## Restrictions / improvements still to do
-
-This lists potential restrictions still to do.  It is meant to be
-listed in order of ease of implementation, with low-hanging fruit
-first.
-
 ## Namespaces for unused functionality (Linux only)
 
 '''Description''': QEMU doesn't use the functionality associated with
@@ -111,6 +105,12 @@ call:
 
 [qemu-namespaces]: https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg04723.html
 
+# Restrictions / improvements still to do
+
+This lists potential restrictions still to do.  It is meant to be
+listed in order of ease of implementation, with low-hanging fruit
+first.
+
 ### Basic RLIMITs
 
 '''Description''': A number of limits on the resources that a given
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 385643b52c..702ea75149 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -2393,6 +2393,12 @@ retry_transaction:
         goto out_close;
     if (!rc) { /* inner child */
         setsid();
+        if (libxl_defbool_val(b_info->dm_restrict))
+        {
+            rc = libxl__local_dm_preexec_restrict(gc);
+            if (rc != 0)
+                _exit(-1);
+        }
         libxl__exec(gc, null, logfile_w, logfile_w, dm, args, envs);
     }
 
diff --git a/tools/libxl/libxl_freebsd.c b/tools/libxl/libxl_freebsd.c
index 6442ccec72..f7ef4a8910 100644
--- a/tools/libxl/libxl_freebsd.c
+++ b/tools/libxl/libxl_freebsd.c
@@ -245,3 +245,8 @@ int libxl__pci_topology_init(libxl__gc *gc,
 {
     return ERROR_NI;
 }
+
+int libxl__local_dm_preexec_restrict(libxl__gc *gc)
+{
+    return 0;
+}
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 802382c704..df85ddff27 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3782,6 +3782,11 @@ struct libxl__dm_spawn_state {
 
 _hidden void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state*);
 
+/* 
+ * Called after forking but before executing the local devicemodel.
+ */
+_hidden int libxl__local_dm_preexec_restrict(libxl__gc *gc);
+
 /* Stubdom device models. */
 
 typedef struct {
diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
index 6ef0abc693..34af2e0d90 100644
--- a/tools/libxl/libxl_linux.c
+++ b/tools/libxl/libxl_linux.c
@@ -307,6 +307,20 @@ int libxl__pci_topology_init(libxl__gc *gc,
     return err;
 }
 
+int libxl__local_dm_preexec_restrict(libxl__gc *gc)
+{
+    int r;
+
+    /* Unshare mount and IPC namespaces.  These are unused by QEMU. */
+    r = unshare(CLONE_NEWNS | CLONE_NEWIPC);
+    if (r < 0) {
+        LOGE(ERROR, "libxl: Mount and IPC namespace unfailed");
+        return ERROR_FAIL;
+    }
+
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_netbsd.c b/tools/libxl/libxl_netbsd.c
index 2edfb00641..dce3f1fdce 100644
--- a/tools/libxl/libxl_netbsd.c
+++ b/tools/libxl/libxl_netbsd.c
@@ -124,3 +124,8 @@ int libxl__pci_topology_init(libxl__gc *gc,
 {
     return ERROR_NI;
 }
+
+void libxl__local_dm_preexec_restrict(libxl__gc *gc, int stderrfd)
+{
+    return;
+}
-- 
2.19.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/5] tools/dm_depriv: Add first cut RLIMITs
  2018-10-05 16:56 [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans George Dunlap
  2018-10-05 16:56 ` [PATCH 2/5] tools/dm_restrict: Ask QEMU to chroot George Dunlap
  2018-10-05 16:56 ` [PATCH 3/5] tools/dm_restrict: Unshare mount and IPC namespaces on Linux George Dunlap
@ 2018-10-05 16:57 ` George Dunlap
  2018-10-26 14:02   ` Ian Jackson
  2018-10-05 16:57 ` [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv George Dunlap
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: George Dunlap @ 2018-10-05 16:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Anthony Perard, Ian Jackson, Wei Liu, George Dunlap

Limit the ability of a potentially compromised QEMU to consume system
resources.  Key limits:
 - RLIMIT_FSIZE (file size): 256KiB
 - RLIMIT_NPROC (after uid changes to a unique uid)

Probably unnecessary limits but why not:
 - RLIMIT_CORE: 0
 - RLIMIT_MSGQUEUE: 0
 - RLIMIT_LOCKS: 0
 - RLIMIT_MEMLOCK: 0

NB that we do not yet set RLIMIT_AS (total virtual memory) or
RLIMIT_NOFILES (number of open files), since these require more care
and/or more coordination with QEMU to implement.

Suggested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
Changes since v2:
- Use a macro to define rlimit entries
- Use RLIMIT_NLIMITS as an end-of-list marker, rather than -1
- Various style clean-ups

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Anthony Perard <anthony.perard@citrix.com>
---
 docs/designs/qemu-deprivilege.md | 12 +++++-----
 tools/libxl/libxl_linux.c        | 39 ++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/docs/designs/qemu-deprivilege.md b/docs/designs/qemu-deprivilege.md
index e1c9763a71..9028e68460 100644
--- a/docs/designs/qemu-deprivilege.md
+++ b/docs/designs/qemu-deprivilege.md
@@ -105,12 +105,6 @@ call:
 
 [qemu-namespaces]: https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg04723.html
 
-# Restrictions / improvements still to do
-
-This lists potential restrictions still to do.  It is meant to be
-listed in order of ease of implementation, with low-hanging fruit
-first.
-
 ### Basic RLIMITs
 
 '''Description''': A number of limits on the resources that a given
@@ -137,6 +131,12 @@ are specified; this does not apply to QEMU running as a Xen DM.
 
 '''Tested''': Not tested
 
+# Restrictions / improvements still to do
+
+This lists potential restrictions still to do.  It is meant to be
+listed in order of ease of implementation, with low-hanging fruit
+first.
+
 ### Further RLIMITs
 
 RLIMIT_AS limits the total amount of memory; but this includes the
diff --git a/tools/libxl/libxl_linux.c b/tools/libxl/libxl_linux.c
index 34af2e0d90..bd28f9271b 100644
--- a/tools/libxl/libxl_linux.c
+++ b/tools/libxl/libxl_linux.c
@@ -16,6 +16,7 @@
 #include "libxl_osdeps.h" /* must come before any other headers */
 
 #include "libxl_internal.h"
+#include <sys/resource.h>
  
 int libxl__try_phy_backend(mode_t st_mode)
 {
@@ -307,9 +308,31 @@ int libxl__pci_topology_init(libxl__gc *gc,
     return err;
 }
 
+static struct {
+    int resource;
+    rlim_t limit;
+} rlimits[] = {
+#define RLIMIT_ENTRY(r, l) \
+    { .resource = r, .limit = l }
+    /* Big enough for log files, not big enough for a DoS */
+    RLIMIT_ENTRY(RLIMIT_FSIZE, 256*1024),
+
+    /* Shouldn't need any of these */
+    RLIMIT_ENTRY(RLIMIT_NPROC, 0),
+    RLIMIT_ENTRY(RLIMIT_CORE, 0),
+    RLIMIT_ENTRY(RLIMIT_MSGQUEUE, 0),
+    RLIMIT_ENTRY(RLIMIT_LOCKS, 0),
+    RLIMIT_ENTRY(RLIMIT_MEMLOCK, 0),
+
+    /* End-of-list marker */
+    RLIMIT_ENTRY(RLIMIT_NLIMITS, 0),
+};
+#undef RLIMIT_ENTRY
+
 int libxl__local_dm_preexec_restrict(libxl__gc *gc)
 {
     int r;
+    unsigned i;
 
     /* Unshare mount and IPC namespaces.  These are unused by QEMU. */
     r = unshare(CLONE_NEWNS | CLONE_NEWIPC);
@@ -318,6 +341,22 @@ int libxl__local_dm_preexec_restrict(libxl__gc *gc)
         return ERROR_FAIL;
     }
 
+    /* Set various "easy" rlimits */
+    for (i = 0; rlimits[i].resource != RLIMIT_NLIMITS; i++) {
+        struct rlimit rlim;
+
+        rlim.rlim_cur = rlim.rlim_max = rlimits[i].limit;
+        
+        r = setrlimit(rlimits[i].resource, &rlim);
+        if (r < 0) {
+            LOGE(ERROR, "Setting rlimit %d to %lld failed\n",
+                                  rlimits[i].resource,
+                                  (unsigned long long)rlimits[i].limit);
+            return ERROR_FAIL;
+        }
+
+    }
+
     return 0;
 }
 
-- 
2.19.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv
  2018-10-05 16:56 [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans George Dunlap
                   ` (2 preceding siblings ...)
  2018-10-05 16:57 ` [PATCH 4/5] tools/dm_depriv: Add first cut RLIMITs George Dunlap
@ 2018-10-05 16:57 ` George Dunlap
  2018-10-08 16:28   ` Anthony PERARD
  2018-10-26 14:06   ` Ian Jackson
  2018-10-26 12:45 ` [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans George Dunlap
  2018-10-26 13:45 ` Ian Jackson
  5 siblings, 2 replies; 20+ messages in thread
From: George Dunlap @ 2018-10-05 16:57 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Ross Lagerwall,
	Anthony Perard, Ian Jackson

Add a tool to check whether the various process-level deprivileging
operations have actually taken place on the process.

The tool takes a domname or domid, and returns success or failure.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
Changes since v2:
- Make grep for Uid line more strict
- Fix Gid grep, make more strict
- Match strictly more than one space
- Look up the group ID for `nobody` rather than hard-coding it
- Move tests from other patches into one patch
- Remove suffix (in case we change the language)
- Install in the path

NB that a number of other requested changes (such as using `set -e`,
changing the output, &c) have not been made, while I consider whether
to leave this as a stand-alone script, or whether to merge osstest's
fd checker functionality into it (perhaps changing the language to perl
at the same time).

CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Anthony Perard <anthony.perard@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 tools/tests/depriv/Makefile               |   2 +-
 tools/tests/depriv/depriv-process-checker | 146 ++++++++++++++++++++++
 2 files changed, 147 insertions(+), 1 deletion(-)
 create mode 100755 tools/tests/depriv/depriv-process-checker

diff --git a/tools/tests/depriv/Makefile b/tools/tests/depriv/Makefile
index 3cba28da25..1b3d09e97d 100644
--- a/tools/tests/depriv/Makefile
+++ b/tools/tests/depriv/Makefile
@@ -23,7 +23,7 @@ LDLIBS += $(LDLIBS_libxendevicemodel)
 LDLIBS += $(LDLIBS_libxentoolcore)
 LDLIBS += $(LDLIBS_libxentoollog)
 
-INSTALL_PRIVBIN-y += depriv-fd-checker
+INSTALL_PRIVBIN-y += depriv-fd-checker depriv-process-checker
 INSTALL_PRIVBIN := $(INSTALL_PRIVBIN-y)
 TARGETS += $(INSTALL_PRIVBIN)
 
diff --git a/tools/tests/depriv/depriv-process-checker b/tools/tests/depriv/depriv-process-checker
new file mode 100755
index 0000000000..18a3c9b45c
--- /dev/null
+++ b/tools/tests/depriv/depriv-process-checker
@@ -0,0 +1,146 @@
+#!/bin/bash
+
+domain="$1"
+
+if [[ "$domain" =~ ^[0-9]+$ ]] ; then
+    domid="$domain"
+else
+    domid=$(xl domid "$domain")
+fi
+
+dmpid=$(xenstore-read /local/domain/$domid/image/device-model-pid 2>/dev/null)
+if [[ -z "$dmpid" ]] ; then
+    echo "xenstore-read failed"
+    exit 1
+fi
+
+failed="false"
+
+# TEST: Process / group id
+#
+# Read /proc/<qpid>/status, checking Uid and Gid lines
+#
+# Uid should be xen-qemuuser-range-base+$domid
+# Gid should be 65534 ("nobody")
+# FIXME: deal with other UID configurations?
+echo -n "Process UID: "
+tgt_uid=$(id -u xen-qemuuser-range-base)
+tgt_uid=$(( $tgt_uid + $domid ))
+
+# Example input:
+# Uid:	1193	1193	1193	1193
+input=$(grep ^Uid: /proc/$dmpid/status)
+if [[ "$input" =~ ^Uid:[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)$ ]] ; then
+    result="PASSED"
+    for i in {1..4}; do
+	if [[ "${BASH_REMATCH[$i]}" != "$tgt_uid" ]] ; then
+	    result="FAILED"
+	    failed="true"
+	    break
+	fi
+    done
+else
+    result="FAILED"
+    failed="true"
+fi
+echo $result
+
+# Example input:
+# Gid:	10020	10020	10020	10020
+echo -n "Process GID: "
+tgt_gid=$(id -g nobody)
+input=$(grep ^Gid: /proc/$dmpid/status)
+if [[ "$input" =~ ^Gid:[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)$ ]] ; then
+    result="PASSED"
+    for i in {1..4}; do
+	if [[ "${BASH_REMATCH[$i]}" != "$tgt_gid" ]] ; then
+	    result="FAILED"
+	    failed="true"
+	    break
+	fi
+    done
+else
+    result="FAILED"
+    failed="true"
+fi
+echo $result
+
+# TEST: chroot
+#
+# Read /proc/<dmpid>/root to see if it's correct.
+echo -n "Chroot: "
+if [[ -n "$XEN_RUN_DIR" ]] ; then
+    tgt_chroot=$XEN_RUN_DIR/qemu-root-$domid
+    root=$(readlink /proc/$dmpid/root)
+    if [[ "$root" != "$tgt_chroot" ]] ; then
+	echo "FAILED"
+	failed="true"
+    else
+	echo "PASSED"
+    fi
+else
+    echo "FAILED (XEN_RUN_DIR undefined)"
+    failed="true"
+fi
+
+# TEST: Namespace unsharing
+#
+# Read /proc/<dmpid>/ns/<namespace> and make sure it's not equal to
+# the current processes' value
+for nsname in ipc mnt; do
+    echo -n "Unshare namespace $nsname: "
+    dmns=$(readlink /proc/$dmpid/ns/$nsname)
+    myns=$(readlink /proc/self/ns/$nsname)
+
+    if [[ "$dmns" == "$myns" ]] ; then
+	echo "FAILED"
+	failed="true"
+    else
+	echo "PASSED"
+    fi
+done
+
+# TEST: RLIMITs
+#
+# Read /proc/<dmpid>/limits
+function check_rlimit() {
+    limit_name=$1
+    limit_string=$2
+    tgt=$3
+
+    echo -n "rlimit $limit_name: "
+    input=$(grep "^$limit_string" /proc/$dmpid/limits)
+    
+    if [[ -z "$input" ]] ; then
+	echo "Couldn't find limit $limit"
+	echo FAILED
+	failed="true"
+	return
+    fi
+    
+    if [[ "$input" =~ ^$limit_string[[:space:]]*([^[:space:]]+)[[:space:]]*([^[:space:]]+)[[:space:]]*[^[:space:]]+ ]] ; then
+	if [[ "${BASH_REMATCH[1]}" != $tgt ||
+		  "${BASH_REMATCH[2]}" != $tgt ]] ; then
+	    echo "FAILED"
+	    failed="true"
+	else
+	    echo "PASSED"
+	fi
+    else
+	echo "Couldn't parse /proc/<dmpid>/limits"
+	echo "FAILED"
+	failed="true"
+    fi
+}
+check_rlimit FSIZE "Max file size" "262144"
+check_rlimit NPROC "Max processes" 0
+check_rlimit CORE "Max core file size" "0"
+check_rlimit MSGQUEUE "Max msgqueue size" 0
+check_rlimit LOCKS "Max file locks" 0
+check_rlimit MEMLOCK "Max locked memory" 0
+
+if $failed ; then
+    exit 1
+else
+    exit 0
+fi
-- 
2.19.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv
  2018-10-05 16:57 ` [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv George Dunlap
@ 2018-10-08 16:28   ` Anthony PERARD
  2018-10-08 16:44     ` Ian Jackson
  2018-10-26 15:28     ` George Dunlap
  2018-10-26 14:06   ` Ian Jackson
  1 sibling, 2 replies; 20+ messages in thread
From: Anthony PERARD @ 2018-10-08 16:28 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ross Lagerwall, xen-devel, Stefano Stabellini, Wei Liu, Ian Jackson

On Fri, Oct 05, 2018 at 05:57:01PM +0100, George Dunlap wrote:
> +# TEST: Process / group id
> +#
> +# Read /proc/<qpid>/status, checking Uid and Gid lines
> +#
> +# Uid should be xen-qemuuser-range-base+$domid
> +# Gid should be 65534 ("nobody")

That is wrong. Gid doesn't have to be nobody. gid can be chosen when
creating the base user id. (And I'm pretty sure "nobody" should be
avoided.)

> +# FIXME: deal with other UID configurations?
> +echo -n "Process UID: "
> +tgt_uid=$(id -u xen-qemuuser-range-base)
> +tgt_uid=$(( $tgt_uid + $domid ))
> +
> +# Example input:
> +# Uid:	1193	1193	1193	1193
> +input=$(grep ^Uid: /proc/$dmpid/status)
> +if [[ "$input" =~ ^Uid:[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)$ ]] ; then
> +    result="PASSED"
> +    for i in {1..4}; do
> +	if [[ "${BASH_REMATCH[$i]}" != "$tgt_uid" ]] ; then
> +	    result="FAILED"
> +	    failed="true"
> +	    break
> +	fi
> +    done
> +else
> +    result="FAILED"
> +    failed="true"
> +fi
> +echo $result
> +
> +# Example input:
> +# Gid:	10020	10020	10020	10020
> +echo -n "Process GID: "
> +tgt_gid=$(id -g nobody)

This should be `id -g xen-qemuuser-range-base`.

-- 
Anthony PERARD

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv
  2018-10-08 16:28   ` Anthony PERARD
@ 2018-10-08 16:44     ` Ian Jackson
  2018-10-26 15:28     ` George Dunlap
  1 sibling, 0 replies; 20+ messages in thread
From: Ian Jackson @ 2018-10-08 16:44 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: xen-devel, Stefano Stabellini, Wei Liu, George Dunlap, Ross Lagerwall

Anthony PERARD writes ("Re: [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv"):
> On Fri, Oct 05, 2018 at 05:57:01PM +0100, George Dunlap wrote:
> > +# TEST: Process / group id
> > +#
> > +# Read /proc/<qpid>/status, checking Uid and Gid lines
> > +#
> > +# Uid should be xen-qemuuser-range-base+$domid
> > +# Gid should be 65534 ("nobody")
> 
> That is wrong. Gid doesn't have to be nobody. gid can be chosen when
> creating the base user id. (And I'm pretty sure "nobody" should be
> avoided.)

The gid is not really relevant but nobody is sometimes chosen as a gid
that no process has so is perhaps a poor choice.  A single specific
gid for all of these would be fine.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans
  2018-10-05 16:56 [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans George Dunlap
                   ` (3 preceding siblings ...)
  2018-10-05 16:57 ` [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv George Dunlap
@ 2018-10-26 12:45 ` George Dunlap
  2018-10-26 13:45 ` Ian Jackson
  5 siblings, 0 replies; 20+ messages in thread
From: George Dunlap @ 2018-10-26 12:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, Andrew Cooper,
	Tim Deegan, Ross Lagerwall, Julien Grall, Jan Beulich,
	Anthony Perard, Ian Jackson

Ping?  It's been nearly 3 weeks with only minor review for this series.

 -George

On 10/05/2018 05:56 PM, George Dunlap wrote:
> docs/qemu-deprivilege.txt had some basic instructions for using
> dm_restrict, but it was incomplete, misleading, and stale.
> 
> Update the docs in a number of ways.
> 
> First, separate user-facing documentation and technical description
> into docs/features and docs/design, respectively.
> 
> In the feature doc:
> 
> * Introduce a section mentioning minimim versions of Linux, Xen, and
> qemu required (TBD)
> 
> * Fix the discussion of qemu userid.  Mention xen-qemuuser-range-base,
> and provide example shell code that actually has some hope of working
> (instead of failing out after creating 900 userids).
> 
> * Describe how to enable restrictions, as well as features which
> probably don't or definitely don't work.
> 
> In the design doc, introduce a "Technical Details" section which
> describes specifically what restrictions are currently done, and also
> what restrictions we are looking at doing in the future.
> 
> The idea here is that as we implement the various items for the
> future, we move them from "Restrictions still to do" to "Restrictions
> done".  This can also act as a design document -- a place for public
> discussion of what can or should be done and how.
> 
> Also add an entry to SUPPORT.md.
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> Changes since v2:
> - Extraneous privcmd / evtchn instances aren't closed
> - Expand description of how to test fd deprivileging
> - Rework and clarify two namespace sections, give reference for QEMU NAK
> - Add more information about migration technical challenges
> - In UID section, mention possibility of container ID collisions.
> - Fix name of design document.
> - Add SUPPORT.md statement.  Specify Linux, to make sure that FreeBSD is
>   evaluated separately.
> - Mention that `-sandbox` is a blacklist and why
> 
> Changes since v1:
> - Break into two, and move into appropriate directories (rather than 'misc')
> - Updated version requirements
> - Distinguish between features which "don't yet work" and features which we never expect to work
> - Update description of xen-restrict functionality
> - Reorder and expand further restrictions
> - Make it more clear which restrictions are available on Linux only
> - Include detailed description of how to kill a process
> - Add RLIMIT_NPROC as something we can do without further changes to qemu
> - Document the need to check for the sandbox feature before using it
> 
> Thank you to Ross Lagerwall, whose description of what XenServer is
> doing formed much of the basis for the text here.
> 
> CC: Ian Jackson <ian.jackson@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Tim Deegan <tim@xen.org>
> CC: Konrad Wilk <konrad.wilk@oracle.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
> CC: Anthony Perard <anthony.perard@citrix.com>
> CC: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  SUPPORT.md                            |  20 ++
>  docs/designs/qemu-deprivilege.md      | 322 ++++++++++++++++++++++++++
>  docs/features/qemu-deprivilege.pandoc |  94 ++++++++
>  docs/misc/qemu-deprivilege.txt        |  36 ---
>  4 files changed, 436 insertions(+), 36 deletions(-)
>  create mode 100644 docs/designs/qemu-deprivilege.md
>  create mode 100644 docs/features/qemu-deprivilege.pandoc
>  delete mode 100644 docs/misc/qemu-deprivilege.txt
> 
> diff --git a/SUPPORT.md b/SUPPORT.md
> index 3727446b83..b5e7e44fb3 100644
> --- a/SUPPORT.md
> +++ b/SUPPORT.md
> @@ -525,6 +525,26 @@ Vulnerabilities of a device model stub domain
>  to a hostile driver domain (either compromised or untrusted)
>  are excluded from security support.
>  
> +### Device Model Deprivileging
> +
> +    Status, Linux: Tech Preview, with limited support
> +
> +This means adding extra restrictions to a device model running in
> +domain 0 in order to prevent a compromised device model to attack the
> +rest of the system.
> +
> +"Tech preview with limited support" means we will not issue XSAs for
> +the _additional_ functionality provided by the feature; but we will
> +issue XSAs in the event that enabling this feature opens up a security
> +hole that would not be present without the feature disabled.
> +
> +For example, while this is classified as tech preview, a bug in libxl
> +which failed to change the user ID of QEMU would not receive an XSA,
> +since without this feature the user ID wouldn't be changed. But a
> +change which made it possible for a compromised guest to read
> +arbitrary files on the host filesystem without compromising QEMU would
> +be issued an XSA, since that does weaken security.
> +
>  ### KCONFIG Expert
>  
>      Status: Experimental
> diff --git a/docs/designs/qemu-deprivilege.md b/docs/designs/qemu-deprivilege.md
> new file mode 100644
> index 0000000000..d3c6495030
> --- /dev/null
> +++ b/docs/designs/qemu-deprivilege.md
> @@ -0,0 +1,322 @@
> +# Introduction
> +
> +The goal of deprilvileging qemu is this: Even if there is a bug (for
> +example in qemu) which permits a domain to gain control of the device
> +model, the compromised device model process is prevented from
> +violating the system's overall security properties.  Ie, a guest
> +cannot "escape" from the virtualisation by using a qemu bug.
> +
> +This document lists the various technical measures which we either
> +have taken, or plan to take to effect this goal.  Some of them are
> +required to be considered secure (that is, there are known attack
> +vectors which they close); others are "just in case" (that is, there
> +are no known attack vectors, but we perform the restrictions to reduce
> +the possibility of unknown attack vectors).
> +
> +# Restrictions done
> +
> +The following restrictions are currently implemented.
> +
> +## Having qemu switch user
> +
> +'''Description''': As mentioned above, having QEMU switch to a
> +non-root user, one per domain id.  Not being the root user limits what
> +a compromised QEMU process can do to the system, and having one user
> +per domain id limits what a comprimised QEMU process can do to the
> +QEMU processes of other VMs.
> +
> +'''Implementation''': The toolstack adds the following to the qemu command-line:
> +
> +    -runas <uid>:<gid>
> +
> +'''How to test''':
> +
> +    grep /proc/<qpid>/status [UG]id
> +
> +'''Testing Status''': Not tested
> +
> +## Xen library / file-descriptor restrictions
> +
> +'''Description''': Close and restrict Xen-related file descriptors.
> +Specifically:
> + * Close all xenstore-related file descriptors
> + * Make sure that all open instances of `privcmd` and `evtchn` file
> +descriptors have had `IOCTL_PRIVCMD_RESTRICT` and
> +`IOCTL_EVTCHN_RESTRICT_DOMID` ioctls called on them, respectively.
> +
> +FIXME: Double-check the correctness of the above
> +
> +'''Implementation''': Toolstack adds the following to the qemu command-line:
> +
> +    -xen-domid-restrict
> +
> +'''How to test''':
> +
> +Use `fishdescriptor` to pull a file descriptor from a running QEMU,
> +then use `depriv-fd-checker` to check that it has the desired
> +properties, and that hypercalls which are meant to fail do fail.  (In
> +Debian `fishdescriptor` can be found in the binary package
> +`chiark-scripts`; the `depriv-fd-checker` is included in the Xen
> +source tree.)
> +
> +'''Testing status''': Tested
> +
> +# Restrictions / improvements still to do
> +
> +This lists potential restrictions still to do.  It is meant to be
> +listed in order of ease of implementation, with low-hanging fruit
> +first.
> +
> +## Chroot
> +
> +'''Description''': Qemu runs in its own chroot, such that even if it
> +could call an 'open' command of some sort, there would be nothing for
> +it to see.
> +
> +'''Implementation''': The toolstack creates a directory in the libxl "run-dir"; e.g.
> +`/var/run/xen/qemu-root-<domid>`
> +
> +Then adds the following to the qemu command-line:
> +
> +    -chroot /var/run/xen/qemu-root-<domid>
> +	
> +'''How to test''':  Check `/proc/<qpid>/root`
> +	
> +'''Tested''': Not tested
> +
> +## Namespaces for unused functionality (Linux only)
> +
> +'''Description''': QEMU doesn't use the functionality associated with
> +mount and IPC namespaces. (IPC namespaces contol non-file-based IPC
> +mechanisms within the kernel; unix and network sockets are not
> +affected by this.)  Making separate namespaces for these for QEMU
> +won't affect normal operation, but it does mean that even if other
> +restrictions fail, the process won't be able to even name system mount
> +points or exsting non-file-based IPC descriptors to attempt to attack
> +them.
> +
> +'''Implementation''':
> +
> +In theory this could be done in QEMU (similar to -sandbox, -runas,
> +-chroot, and so on), but a patch doing this in QEMU was NAKed upstream
> +(see [qemu-namespaces]). They preferred that this was done as a setup step by
> +whatever executes QEMU; i.e., have the process which exec's QEMU first
> +call:
> +
> +    unshare(CLONE_NEWNS | CLONE_NEWIPC)
> +	
> +'''How to test''':  Check `/proc/<qpid>/ns/[ipc,mnt]`
> +
> +'''Tested''': Not tested
> +
> +[qemu-namespaces]: https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg04723.html
> +
> +### Basic RLIMITs
> +
> +'''Description''': A number of limits on the resources that a given
> +process / userid is allowed to consume.  These can limit the ability
> +of a compromised QEMU process to DoS domain 0 by exhausting various
> +resources available to it.
> +
> +'''Implementation'''
> +
> +Limits that can be implemented immediately without much effort:
> + - RLIMIT_FSIZE` (file size) to 256KiB.
> + - RLIMIT_NPROC (after uid changes to a unique uid)
> +
> +Probably not necessary but why not:
> + - RLIMIT_CORE: 0
> + - RLIMIT_MSGQUEUE: 0
> + - RLIMIT_LOCKS: 0
> + - RLIMIT_MEMLOCK: 0
> + 
> +Note: mlock() is used by QEMU only when both "realtime" and "mlock"
> +are specified; this does not apply to QEMU running as a Xen DM.
> +   
> +'''How to test''': Check `/proc/<qpid>/limits`
> +
> +'''Tested''': Not tested
> +
> +### Further RLIMITs
> +
> +RLIMIT_AS limits the total amount of memory; but this includes the
> +virtual memory which QEMU uses as a mapcache.  xen-mapcache.c already
> +fiddles with this; it would be straightforward to make it *set* the
> +rlimit to what it thinks a sensible limit is.
> +
> +Other things that would take some cleverness / changes to QEMU to
> +utilize due to ordering constrants:
> + - RLIMIT_NOFILES (after all necessary files are opened)
> +
> +### libxl UID cleanup
> +
> +'''Description''': Domain IDs are reused, and thus restricted UIDs are
> +reused.  If a compromised QEMU can fork (due to seccomp or
> +RLIMIT_NPROC limits being ineffective for some reason), it may avoid
> +being killed when its domain dies, then wait until the domain ID is
> +reused again, at which point it will have control over the domain in
> +question (which probably belongs to someone else).
> +
> +libxl should kill all UIDs associated with a domain both when the VM
> +is destroyed, and before starting a VM with the same UID.
> +
> +'''Implementation''': This is unnecessarily tricky.
> +
> +The kill() system call can have three kinds of targets:
> + - A single pid
> + - A process group
> + - "Every process except me to which I am allowed to send a signal" (-1)
> +
> +Targeting a single pid is racy and likely to be beaten by the
> +following loop:
> +
> +    while(1) {
> +        if(fork())
> +	    _exit(0);
> +    }	  
> +
> +That is, by the time you've read the process list and found the
> +process id you want to kill, that process has exited and there is a
> +new process whose pid you don't know about.
> +
> +Targeting a process group will be ineffective, as unprivileged
> +processes are allowed to make their own process groups.
> +
> +kill(-1) can be used but must be done with care.  Consider the
> +following code, for example:
> +
> +    setuid(target_uid);
> +    kill(-1, 9);
> +
> +This looks like it will do the trick; but by setting all of the user
> +ids (effective, real, and saved), it opens the 'killing' process up to
> +being killed by the target process:
> +
> +    while(1) {
> +        if(fork())
> +            _exit(0);
> +        else
> +            kill(-1, 9);
> +    }
> +
> +Fortunately there is an assymetry we can take advantage of.  From the
> +POSIX spec:
> +
> +> For a process to have permission to send a signal to a process
> +> designated by pid, unless the sending process has appropriate
> +> privileges, the real or effective user ID of the sending process shall
> +> match the real or saved set-user-ID of the receiving process.
> +
> +The solution is to allocate a second "reaper" uid that is only used to kill
> +target processes.  We set the euid of the killing process to the `target_uid`,
> +but the ruid of the killing process to `reaper_uid`, leaving the suid of the
> +killing process as 0:
> +
> +    setresuid(reaper_uid, target_uid, 0);
> +    kill(-1, 9);
> +
> +NOTE: We cannot use `setreuid(reaper_uid, target_uid)` here, as that
> +will set *both* euid *and* suid to `target_uid`, making the killing
> +process vulnerable to the target process again.
> +
> +Since this will kill all other `reaper_uid` processes as well, we must
> +either allocate a separate `reaper_uid` per domain, or use locking to
> +ensure that only one killing process is active at a time.
> +
> +## libxl: Treat QMP connection as untrusted
> +
> +'''Description''': Currently libxl talks with QEMU via QMP; but its
> +interactions have not historically considered from a security point of
> +view.  For example, qmp_synchronous_send() waits for a response from
> +QEMU, which a compromised QEMU could simply not send (thus preventing
> +the toolstack from making forward progress).
> +
> +'''Implementation''': Audit toolstack interactions with QEMU which
> +happen after the guest has started running, and assume QEMU has been
> +compromised.
> +
> +### seccomp filtering (Linux only)
> +
> +'''Description''': Turn on seccomp filtering to disable syscalls which
> +QEMU doesn't need. 
> +
> +'''Implementation''': Enable from the command-line:
> +
> +    -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny
> +
> +`elevateprivileges` is currently required to allow `-runas` to work.
> +Removing this requirement would mean making sure that the uid change
> +happened before the seccomp2 call, perhaps by changing the uid before
> +executing QEMU.  (But this would then require other changes to create
> +the QMP socket, VNC socket, and so on).
> +
> +It should be noted that `-sandbox` is implemented as a blacklist, not
> +a whitelist; that is, it disables known-unsed functionality which may
> +be harmful, rather than disabling all functionality except that known
> +to be safe and needed.  This is unfortunately necessary since qemu
> +doesn't know what system calls libraries might end up making.  (See
> +[lwn-seccomp] for a more complete discussion.)
> +
> +This feature is not on by default and may not be available in all
> +environments.  We therefore need to either:
> + 1. Require that this feature be enabled to build qemu
> + 2. Check for `-sandbox` support at runtime before 
> +
> +[lwn-seccomp]: https://lwn.net/Articles/738694/
> +
> +### Disks
> +
> +The chroot (and seccomp?) happens late enough such that QEMU can
> +initialize itself and open its disks. If you want to add a disk at run
> +time via or insert a CD, you can't pass a path because QEMU is
> +chrooted. Instead use the add-fd QMP command and use
> +/dev/fdset/<fdset-id> as the path.
> +
> +A further layer of restriction could be to set RLIMIT_NOFILES to '0',
> +and hand all disks over QMP.
> +
> +## Migration
> +
> +When calling xen-save-devices-state, since QEMU is running in a chroot
> +it is not useful to pass a filename (it doesn't even have write access
> +inside the chroot). Instead, give it an open fd using the add-fd
> +mechanism.
> +
> +Additionally, all the restrictions need to be applied to the qemu
> +started up on the post-migration side.  One issue that needs to be
> +solved is how to signal the toolstack on restore that qemu is ready
> +for the domain to be started (since this is normally done via
> +xenstore, and at this point the xenstore connections will have been
> +closed).
> +
> +### Network namespacing (Linux only)
> +
> +Enter QEMU into its own network namespace (in addition to mount & IPC
> +namespaces):
> +
> +    unshare(CLONE_NEWNET);
> +
> +QEMU does actually use the network namespace as a Xen DM for two
> +purposes: 1) To set up network tap devices 2) To open vnc connections.
> +
> +#### Network
> +
> +If QEMU runs in its own network namespace, it can't open the tap
> +device itself because the interface won't be visible outside of its
> +own namespace. So instead, have the toolstack open the device and pass
> +it as an fd on the command-line:
> +
> +    -device rtl8139,netdev=tapnet0,mac=... -netdev tap,id=tapnet0,fd=<tapfd>
> +
> +#### VNC
> +
> +If QEMU runs in its own network namespace, it is not straightforward
> +to listen on a TCP socket outside of its own network namespace. One
> +option would be to use VNC over a UNIX socket:
> +
> +    -vnc unix:/var/run/xen/vnc-<domid>
> +
> +However, this would break functionality in the general case; I think
> +we need to have the toolstack open a socket and pass the fd to QEMU
> +(which requires changes to QEMU).
> +
> diff --git a/docs/features/qemu-deprivilege.pandoc b/docs/features/qemu-deprivilege.pandoc
> new file mode 100644
> index 0000000000..6fb48f3e40
> --- /dev/null
> +++ b/docs/features/qemu-deprivilege.pandoc
> @@ -0,0 +1,94 @@
> +% QEMU Deprivileging / dm_restrict
> +% Revision 1
> +
> +\clearpage
> +
> +# Basics
> +
> +---------------- ----------------------------------------------------
> +         Status: **Tech Preview**
> +
> +Architecture(s): x86
> +
> +   Component(s): toolstack
> +
> +---------------- ----------------------------------------------------
> +
> +# Overview
> +
> +By default, the QEMU device model is run in domain 0.  If an attacker
> +can gain control of a QEMU process, it could easily take control of a
> +system.
> +
> +dm_restrict is a set of operations to restrict QEMU running in domain
> +0.  It consists of two halves:
> +
> + 1. Mechanisms to restrict QEMU to only being able to affect its own
> +domain
> + 2. Mechanisms to restruct QEMU's ability to interact with domain 0.
> +
> +# User details
> +
> +## Getting the right versions of software
> +
> +Linux: 4.11+
> +
> +Qemu: 3.0+ (Or the version that comes with Xen 4.12+)
> +
> +## Setting up a userid range
> +
> +For maximum security, libxl needs to run the devicemodel for each
> +domain under a user id (UID) corresponding to its domain id.  There
> +are 32752 possible domain IDs, and so libxl needs 32752 user ids set
> +aside for it.
> +
> +The simplest and most effective way to do this is to allocate a
> +contiguous block of UIDs, and create a single user named
> +`xen-qemuuser-range-base` with the first UID.  For example, under Debian:
> +
> +    adduser --no-create-home --uid 65536 --system xen-qemuuser-range-base
> +
> +NOTE: Most modern systems have 32-bit UIDs, and so can in theory go up
> +to 2^31 (or 2^32 if uids are unsigned).  POSIX only guarantees 16-bit
> +UIDs however; UID 65535 is reserved for an invalid value, and 65534 is
> +normally allocated to "nobody".  Additionally, some container systems
> +have proposed using the upper 32 bits of the uid for a container ID.
> +
> +Another, less-secure way is to run all QEMUs as the same UID.  To do
> +this, create a user named `xen-qemuuser-shared`; for example:
> +
> +    adduser --no-create-home --system xen-qemuuser-shared
> +
> +## Domain config changes
> +
> +The core domain config change is to add the following line to the
> +domain configuration:
> +
> +    dm_restrict=1
> +
> +This will perform a number of restrictions, outlined below in the
> +'Technical details' section.
> +
> +# Technical details
> +
> +See docs/design/qemu-deprivilege.md for technical details.
> +
> +# Limitations
> +
> +The following features still need to be implemented:
> + * Inserting a new cdrom while the guest is running (xl cdrom-insert)
> + * Migration / save / restore
> +
> +Additionally, getting PCI passthrough to work securely would require a
> +significant rework of how passthrough works at the moment.  It may be
> +implemented at some point but is not a near-term priority.
> +
> +See SUPPORT.md for security support status.
> +
> +# History
> +
> +------------------------------------------------------------------------
> +Date       Revision Version  Notes
> +---------- -------- -------- -------------------------------------------
> +2018-09-14 1        Xen 4.12 Imported from docs/misc
> +---------- -------- -------- -------------------------------------------
> diff --git a/docs/misc/qemu-deprivilege.txt b/docs/misc/qemu-deprivilege.txt
> deleted file mode 100644
> index 58b86a3908..0000000000
> --- a/docs/misc/qemu-deprivilege.txt
> +++ /dev/null
> @@ -1,36 +0,0 @@
> -For security reasons, libxl tries to pass a non-root username to QEMU as
> -argument. During initialization QEMU calls setuid and setgid with the
> -user ID and the group ID of the user passed as argument.
> -Libxl looks for the following users in this order:
> -
> -1) a user named "xen-qemuuser-domid$domid",
> -Where $domid is the domid of the domain being created.
> -This requires the reservation of 65535 uids from xen-qemuuser-domid1
> -to xen-qemuuser-domid65535. To use this mechanism, you might want to
> -create a large number of users at installation time. For example:
> -
> -for ((i=1; i<65536; i++))
> -do
> -    adduser --no-create-home --system xen-qemuuser-domid$i
> -done
> -
> -You might want to consider passing --group to adduser to create a new
> -group for each new user.
> -
> -
> -2) a user named "xen-qemuuser-shared"
> -As a fall back if both 1) fails, libxl will use a single user for
> -all QEMU instances. The user is named xen-qemuuser-shared. This is
> -less secure but still better than running QEMU as root. Using this is as
> -simple as creating just one more user on your host:
> -
> -adduser --no-create-home --system xen-qemuuser-shared
> -
> -
> -3) root
> -As a last resort, libxl will start QEMU as root.
> -
> -
> -Please note that running QEMU as non-root causes several features like
> -migration and PCI passthrough to not work properly and may prevent the guest
> -from booting.
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans
  2018-10-05 16:56 [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans George Dunlap
                   ` (4 preceding siblings ...)
  2018-10-26 12:45 ` [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans George Dunlap
@ 2018-10-26 13:45 ` Ian Jackson
       [not found]   ` <bdf85f15-4eb7-a1b1-75cf-b31cae07010b@citrix.com>
  5 siblings, 1 reply; 20+ messages in thread
From: Ian Jackson @ 2018-10-26 13:45 UTC (permalink / raw)
  To: George Dunlap
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, Andrew Cooper,
	Tim Deegan, Ross Lagerwall, Julien Grall, Jan Beulich,
	Anthony Perard, xen-devel

George Dunlap writes ("[PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans"):
> docs/qemu-deprivilege.txt had some basic instructions for using
> dm_restrict, but it was incomplete, misleading, and stale.

Thanks for the updates to the unshare stuff.

> +### Device Model Deprivileging
> +
> +    Status, Linux: Tech Preview, with limited support
                    ^
                     dom0

I think this maybe needs

  +    Status, FreeBSD dom0: Unsupported

too ?  The usual default is supported and not listing it at all is
confusing.

> +NOTE: Most modern systems have 32-bit UIDs, and so can in theory go up
> +to 2^31 (or 2^32 if uids are unsigned).  POSIX only guarantees 16-bit
> +UIDs however; UID 65535 is reserved for an invalid value, and 65534 is
> +normally allocated to "nobody".  Additionally, some container systems
> +have proposed using the upper 32 bits of the uid for a container ID.
                                 ^^
                                 16
This is a good paragraph.

Can I suggest we pick a different example to 65536 ?  It's visually
similar to the familiar values of 65534 and 65535 and abuts them.

osstest uses 200000 but that's not a multiple of 2^16.
How about 131072 ?

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] tools/dm_restrict: Ask QEMU to chroot
  2018-10-05 16:56 ` [PATCH 2/5] tools/dm_restrict: Ask QEMU to chroot George Dunlap
@ 2018-10-26 13:52   ` Ian Jackson
  0 siblings, 0 replies; 20+ messages in thread
From: Ian Jackson @ 2018-10-26 13:52 UTC (permalink / raw)
  To: George Dunlap; +Cc: Anthony Perard, xen-devel, Wei Liu

George Dunlap writes ("[PATCH 2/5] tools/dm_restrict: Ask QEMU to chroot"):
> When dm_restrict is enabled, ask QEMU to chroot into an empty directory.
> 
> * Create /var/run/qemu/root-domid (deleting the old one if it's there)
> * Pass the -chroot option to QEMU
> 
> Rather than running `rm -rf` on the directory before creating it
> (since there is no library function to do this), simply rmdir the
> directory, relying on the fact that the previous QEMU instance, if
> properly restricted, shouldn't have been able to write anything
> anyway.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/5] tools/dm_restrict: Unshare mount and IPC namespaces on Linux
  2018-10-05 16:56 ` [PATCH 3/5] tools/dm_restrict: Unshare mount and IPC namespaces on Linux George Dunlap
@ 2018-10-26 14:00   ` Ian Jackson
  2018-10-26 16:18     ` George Dunlap
  0 siblings, 1 reply; 20+ messages in thread
From: Ian Jackson @ 2018-10-26 14:00 UTC (permalink / raw)
  To: George Dunlap; +Cc: Anthony Perard, xen-devel, Wei Liu

Thanks, just tiny comments on this.

George Dunlap writes ("[PATCH 3/5] tools/dm_restrict: Unshare mount and IPC namespaces on Linux"):
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 385643b52c..702ea75149 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -2393,6 +2393,12 @@ retry_transaction:
>          goto out_close;
>      if (!rc) { /* inner child */
>          setsid();
> +        if (libxl_defbool_val(b_info->dm_restrict))
> +        {

Style: misplaced {.

> +            if (rc != 0)
> +                _exit(-1);

I won't insist on a change here, since CODING_STYLE is not explicit,
but you may want to take note of these statistics:

$ git-grep 'if (rc)' tools/libxl | wc -l
520
$ git-grep 'if (rc != 0)' tools/libxl | wc -l
23

> +int libxl__local_dm_preexec_restrict(libxl__gc *gc)
> +{
> +    int r;
> +
> +    /* Unshare mount and IPC namespaces.  These are unused by QEMU. */
> +    r = unshare(CLONE_NEWNS | CLONE_NEWIPC);
> +    if (r < 0) {
> +        LOGE(ERROR, "libxl: Mount and IPC namespace unfailed");

I think I would slightly prefer
  +    if (r) {

unshare is supposed to return -1 or 0 so this should make no
difference.  If it does something else we should not carry on.  It's
unlikely enough that I don't mind it printing a garbage errno value in
that case.

Anyway,

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

assuming you change at least the misplaced { which contradicts
CODING_STYLE.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] tools/dm_depriv: Add first cut RLIMITs
  2018-10-05 16:57 ` [PATCH 4/5] tools/dm_depriv: Add first cut RLIMITs George Dunlap
@ 2018-10-26 14:02   ` Ian Jackson
  2018-10-26 16:31     ` George Dunlap
  0 siblings, 1 reply; 20+ messages in thread
From: Ian Jackson @ 2018-10-26 14:02 UTC (permalink / raw)
  To: George Dunlap; +Cc: Anthony Perard, xen-devel, Wei Liu

George Dunlap writes ("[PATCH 4/5] tools/dm_depriv: Add first cut RLIMITs"):
> Limit the ability of a potentially compromised QEMU to consume system
> resources.  Key limits:
>  - RLIMIT_FSIZE (file size): 256KiB
>  - RLIMIT_NPROC (after uid changes to a unique uid)

Thanks.

> +static struct {
> +    int resource;
> +    rlim_t limit;
> +} rlimits[] = {
> +#define RLIMIT_ENTRY(r, l) \
> +    { .resource = r, .limit = l }
> +    /* Big enough for log files, not big enough for a DoS */
> +    RLIMIT_ENTRY(RLIMIT_FSIZE, 256*1024),
> +
> +    /* Shouldn't need any of these */
> +    RLIMIT_ENTRY(RLIMIT_NPROC, 0),
> +    RLIMIT_ENTRY(RLIMIT_CORE, 0),
> +    RLIMIT_ENTRY(RLIMIT_MSGQUEUE, 0),
> +    RLIMIT_ENTRY(RLIMIT_LOCKS, 0),
> +    RLIMIT_ENTRY(RLIMIT_MEMLOCK, 0),

I would have justified the values so this looked more tabular.

> +    /* Set various "easy" rlimits */
> +    for (i = 0; rlimits[i].resource != RLIMIT_NLIMITS; i++) {
> +        struct rlimit rlim;
> +
> +        rlim.rlim_cur = rlim.rlim_max = rlimits[i].limit;
> +        
> +        r = setrlimit(rlimits[i].resource, &rlim);
> +        if (r < 0) {
> +            LOGE(ERROR, "Setting rlimit %d to %lld failed\n",
> +                                  rlimits[i].resource,
> +                                  (unsigned long long)rlimits[i].limit);

I think you mean %llu not %lld.  With that last point changed,

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv
  2018-10-05 16:57 ` [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv George Dunlap
  2018-10-08 16:28   ` Anthony PERARD
@ 2018-10-26 14:06   ` Ian Jackson
  2018-10-26 14:24     ` George Dunlap
  1 sibling, 1 reply; 20+ messages in thread
From: Ian Jackson @ 2018-10-26 14:06 UTC (permalink / raw)
  To: George Dunlap
  Cc: Anthony Perard, xen-devel, Stefano Stabellini, Wei Liu, Ross Lagerwall

George Dunlap writes ("[PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv"):
> Add a tool to check whether the various process-level deprivileging
> operations have actually taken place on the process.
...
> NB that a number of other requested changes (such as using `set -e`,
> changing the output, &c) have not been made, while I consider whether
> to leave this as a stand-alone script, or whether to merge osstest's
> fd checker functionality into it (perhaps changing the language to perl
> at the same time).

OK.  But, unfortunately, it is very hard to review a shell script that
is written without `set -e'.  Generally a big focus of my usual review
style is error handling.

Also, I suggested some refactoring.  Seeing the script as it is makes
it more obvious that a systematic approach to printing FAILED
etc. would be a good idea.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv
  2018-10-26 14:06   ` Ian Jackson
@ 2018-10-26 14:24     ` George Dunlap
  2018-10-26 14:27       ` Ian Jackson
  0 siblings, 1 reply; 20+ messages in thread
From: George Dunlap @ 2018-10-26 14:24 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Anthony Perard, xen-devel, Stefano Stabellini, Wei Liu, Ross Lagerwall

On 10/26/2018 03:06 PM, Ian Jackson wrote:
> George Dunlap writes ("[PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv"):
>> Add a tool to check whether the various process-level deprivileging
>> operations have actually taken place on the process.
> ...
>> NB that a number of other requested changes (such as using `set -e`,
>> changing the output, &c) have not been made, while I consider whether
>> to leave this as a stand-alone script, or whether to merge osstest's
>> fd checker functionality into it (perhaps changing the language to perl
>> at the same time).
> 
> OK.  But, unfortunately, it is very hard to review a shell script that
> is written without `set -e'.  Generally a big focus of my usual review
> style is error handling.
> 
> Also, I suggested some refactoring.  Seeing the script as it is makes
> it more obvious that a systematic approach to printing FAILED
> etc. would be a good idea.

FYI I do agree with all of those suggestions (both `set -e` and having
functions to handle failure in a consistent way); but I didn't want to
fix everything up in bash only to have to write it over again in perl
(should I decide to do so).  I wasn't expecting a review for this patch
yet; I only included it for completeness.  I guess I figured "I didn't
make all the changes you wanted" would make that obvious, but next time
I'll be more explicit when I don't expect a patch to be reviewed. :-)

Thanks,
 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv
  2018-10-26 14:24     ` George Dunlap
@ 2018-10-26 14:27       ` Ian Jackson
  0 siblings, 0 replies; 20+ messages in thread
From: Ian Jackson @ 2018-10-26 14:27 UTC (permalink / raw)
  To: George Dunlap
  Cc: Anthony Perard, xen-devel, Stefano Stabellini, Wei Liu, Ross Lagerwall

George Dunlap writes ("Re: [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv"):
> FYI I do agree with all of those suggestions (both `set -e` and having
> functions to handle failure in a consistent way); but I didn't want to
> fix everything up in bash only to have to write it over again in perl
> (should I decide to do so).  I wasn't expecting a review for this patch
> yet; I only included it for completeness.  I guess I figured "I didn't
> make all the changes you wanted" would make that obvious, but next time
> I'll be more explicit when I don't expect a patch to be reviewed. :-)

Err, right, good.  I'm glad I didn't try to review it anyway :-).

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv
  2018-10-08 16:28   ` Anthony PERARD
  2018-10-08 16:44     ` Ian Jackson
@ 2018-10-26 15:28     ` George Dunlap
  2018-10-29 10:16       ` Ian Jackson
  1 sibling, 1 reply; 20+ messages in thread
From: George Dunlap @ 2018-10-26 15:28 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: Ross Lagerwall, xen-devel, Stefano Stabellini, Wei Liu, Ian Jackson

On 10/08/2018 05:28 PM, Anthony PERARD wrote:
> On Fri, Oct 05, 2018 at 05:57:01PM +0100, George Dunlap wrote:
>> +# TEST: Process / group id
>> +#
>> +# Read /proc/<qpid>/status, checking Uid and Gid lines
>> +#
>> +# Uid should be xen-qemuuser-range-base+$domid
>> +# Gid should be 65534 ("nobody")
> 
> That is wrong. Gid doesn't have to be nobody. gid can be chosen when
> creating the base user id. (And I'm pretty sure "nobody" should be
> avoided.)

Oh, actually, 65534 is "nogroup", which is the default when you don't
add a specific group.

Should we recommend creating a separate group for the Xen qemus in our
feature doc?  Or should we just mention the possibility, but leave the
actual example to the default (which will normally end up with the
`nogroup` group)?

> 
>> +# FIXME: deal with other UID configurations?
>> +echo -n "Process UID: "
>> +tgt_uid=$(id -u xen-qemuuser-range-base)
>> +tgt_uid=$(( $tgt_uid + $domid ))
>> +
>> +# Example input:
>> +# Uid:	1193	1193	1193	1193
>> +input=$(grep ^Uid: /proc/$dmpid/status)
>> +if [[ "$input" =~ ^Uid:[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)$ ]] ; then
>> +    result="PASSED"
>> +    for i in {1..4}; do
>> +	if [[ "${BASH_REMATCH[$i]}" != "$tgt_uid" ]] ; then
>> +	    result="FAILED"
>> +	    failed="true"
>> +	    break
>> +	fi
>> +    done
>> +else
>> +    result="FAILED"
>> +    failed="true"
>> +fi
>> +echo $result
>> +
>> +# Example input:
>> +# Gid:	10020	10020	10020	10020
>> +echo -n "Process GID: "
>> +tgt_gid=$(id -g nobody)
> 
> This should be `id -g xen-qemuuser-range-base`.

Got it

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/5] tools/dm_restrict: Unshare mount and IPC namespaces on Linux
  2018-10-26 14:00   ` Ian Jackson
@ 2018-10-26 16:18     ` George Dunlap
  0 siblings, 0 replies; 20+ messages in thread
From: George Dunlap @ 2018-10-26 16:18 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Anthony Perard, xen-devel, Wei Liu

On 10/26/2018 03:00 PM, Ian Jackson wrote:
> Thanks, just tiny comments on this.
> 
> George Dunlap writes ("[PATCH 3/5] tools/dm_restrict: Unshare mount and IPC namespaces on Linux"):
>> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
>> index 385643b52c..702ea75149 100644
>> --- a/tools/libxl/libxl_dm.c
>> +++ b/tools/libxl/libxl_dm.c
>> @@ -2393,6 +2393,12 @@ retry_transaction:
>>          goto out_close;
>>      if (!rc) { /* inner child */
>>          setsid();
>> +        if (libxl_defbool_val(b_info->dm_restrict))
>> +        {
> 
> Style: misplaced {.
> 
>> +            if (rc != 0)
>> +                _exit(-1);
> 
> I won't insist on a change here, since CODING_STYLE is not explicit,
> but you may want to take note of these statistics:
> 
> $ git-grep 'if (rc)' tools/libxl | wc -l
> 520
> $ git-grep 'if (rc != 0)' tools/libxl | wc -l
> 23
> 
>> +int libxl__local_dm_preexec_restrict(libxl__gc *gc)
>> +{
>> +    int r;
>> +
>> +    /* Unshare mount and IPC namespaces.  These are unused by QEMU. */
>> +    r = unshare(CLONE_NEWNS | CLONE_NEWIPC);
>> +    if (r < 0) {
>> +        LOGE(ERROR, "libxl: Mount and IPC namespace unfailed");
> 
> I think I would slightly prefer
>   +    if (r) {
> 
> unshare is supposed to return -1 or 0 so this should make no
> difference.  If it does something else we should not carry on.  It's
> unlikely enough that I don't mind it printing a garbage errno value in
> that case.

All three done.

> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Thanks,
 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/5] tools/dm_depriv: Add first cut RLIMITs
  2018-10-26 14:02   ` Ian Jackson
@ 2018-10-26 16:31     ` George Dunlap
  0 siblings, 0 replies; 20+ messages in thread
From: George Dunlap @ 2018-10-26 16:31 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Anthony Perard, xen-devel, Wei Liu

On 10/26/2018 03:02 PM, Ian Jackson wrote:
> George Dunlap writes ("[PATCH 4/5] tools/dm_depriv: Add first cut RLIMITs"):
>> Limit the ability of a potentially compromised QEMU to consume system
>> resources.  Key limits:
>>  - RLIMIT_FSIZE (file size): 256KiB
>>  - RLIMIT_NPROC (after uid changes to a unique uid)
> 
> Thanks.
> 
>> +static struct {
>> +    int resource;
>> +    rlim_t limit;
>> +} rlimits[] = {
>> +#define RLIMIT_ENTRY(r, l) \
>> +    { .resource = r, .limit = l }
>> +    /* Big enough for log files, not big enough for a DoS */
>> +    RLIMIT_ENTRY(RLIMIT_FSIZE, 256*1024),
>> +
>> +    /* Shouldn't need any of these */
>> +    RLIMIT_ENTRY(RLIMIT_NPROC, 0),
>> +    RLIMIT_ENTRY(RLIMIT_CORE, 0),
>> +    RLIMIT_ENTRY(RLIMIT_MSGQUEUE, 0),
>> +    RLIMIT_ENTRY(RLIMIT_LOCKS, 0),
>> +    RLIMIT_ENTRY(RLIMIT_MEMLOCK, 0),
> 
> I would have justified the values so this looked more tabular.

Sure.

> 
>> +    /* Set various "easy" rlimits */
>> +    for (i = 0; rlimits[i].resource != RLIMIT_NLIMITS; i++) {
>> +        struct rlimit rlim;
>> +
>> +        rlim.rlim_cur = rlim.rlim_max = rlimits[i].limit;
>> +        
>> +        r = setrlimit(rlimits[i].resource, &rlim);
>> +        if (r < 0) {
>> +            LOGE(ERROR, "Setting rlimit %d to %lld failed\n",
>> +                                  rlimits[i].resource,
>> +                                  (unsigned long long)rlimits[i].limit);
> 
> I think you mean %llu not %lld.  With that last point changed,
> 
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Thanks.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv
  2018-10-26 15:28     ` George Dunlap
@ 2018-10-29 10:16       ` Ian Jackson
  0 siblings, 0 replies; 20+ messages in thread
From: Ian Jackson @ 2018-10-29 10:16 UTC (permalink / raw)
  To: George Dunlap
  Cc: Anthony PERARD, xen-devel, Stefano Stabellini, Wei Liu, Ross Lagerwall

George Dunlap writes ("Re: [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv"):
> Oh, actually, 65534 is "nogroup", which is the default when you don't
> add a specific group.
> 
> Should we recommend creating a separate group for the Xen qemus in our
> feature doc?  Or should we just mention the possibility, but leave the
> actual example to the default (which will normally end up with the
> `nogroup` group)?

`nogroup' isn't as big a problem in general as `nobody'.  (No
processes may ever run as nobody because to avoid unintendedly
permitting access, such a non-id must either have no principals or no
objects, and a process running with a particular uid is both; whereas
running as a particular group does not turn a process into an object
accessible via that group.)

But it's still probably best avoided in case of mistakes.  Also
assigning a group to all the qemus may make some kinds of
configuration applicable to all of them easier.

So I think we should recommend creating one group for this.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans
       [not found]   ` <bdf85f15-4eb7-a1b1-75cf-b31cae07010b@citrix.com>
@ 2018-11-02 11:50     ` George Dunlap
  0 siblings, 0 replies; 20+ messages in thread
From: George Dunlap @ 2018-11-02 11:50 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Stefano Stabellini, Wei Liu, Konrad Wilk, Andrew Cooper,
	Tim (Xen.org),
	Ross Lagerwall, Julien Grall, Jan Beulich, Anthony Perard,
	xen-devel

[Re-sending as this mail seems to have gone missing]

> On Oct 26, 2018, at 3:52 PM, George Dunlap <george.dunlap@citrix.com> wrote:
> 
> On 10/26/2018 02:45 PM, Ian Jackson wrote:
>> George Dunlap writes ("[PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans"):
>>> docs/qemu-deprivilege.txt had some basic instructions for using
>>> dm_restrict, but it was incomplete, misleading, and stale.
>> 
>> Thanks for the updates to the unshare stuff.
>> 
>>> +### Device Model Deprivileging
>>> +
>>> +    Status, Linux: Tech Preview, with limited support
>>                    ^
>>                     dom0
> 
> "Deprivileging" only makes sense in a dom0 context; the definition in
> the first paragraph should make that clear.  I think adding 'dom0' would
> confuse the issue by implying that non-dom0 deprivileging is possible.
> 
>> I think this maybe needs
>> 
>>  +    Status, FreeBSD dom0: Unsupported
>> 
>> too ?  The usual default is supported and not listing it at all is
>> confusing.
> 
> Where do we say the default is supported?  I thought the default for a
> _feature_ not mentioned was "no information" (i.e., might be either
> supported or not -- if there's a question ask), and the default for a
> _configuration_ not mentioned was "unsupported".
> 
>>> +NOTE: Most modern systems have 32-bit UIDs, and so can in theory go up
>>> +to 2^31 (or 2^32 if uids are unsigned).  POSIX only guarantees 16-bit
>>> +UIDs however; UID 65535 is reserved for an invalid value, and 65534 is
>>> +normally allocated to "nobody".  Additionally, some container systems
>>> +have proposed using the upper 32 bits of the uid for a container ID.
>>                                 ^^
>>                                 16
> 
> Ack
> 
>> This is a good paragraph.
>> 
>> Can I suggest we pick a different example to 65536 ?  It's visually
>> similar to the familiar values of 65534 and 65535 and abuts them.
>> 
>> osstest uses 200000 but that's not a multiple of 2^16.
>> How about 131072 ?
> 
> Is the idea for making it a multiple of 2^16 that the values will then
> only take up one entry in the "container ID" space?
> 
> -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-11-02 11:50 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-05 16:56 [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans George Dunlap
2018-10-05 16:56 ` [PATCH 2/5] tools/dm_restrict: Ask QEMU to chroot George Dunlap
2018-10-26 13:52   ` Ian Jackson
2018-10-05 16:56 ` [PATCH 3/5] tools/dm_restrict: Unshare mount and IPC namespaces on Linux George Dunlap
2018-10-26 14:00   ` Ian Jackson
2018-10-26 16:18     ` George Dunlap
2018-10-05 16:57 ` [PATCH 4/5] tools/dm_depriv: Add first cut RLIMITs George Dunlap
2018-10-26 14:02   ` Ian Jackson
2018-10-26 16:31     ` George Dunlap
2018-10-05 16:57 ` [PATCH 5/5] RFC: test/depriv: Add a tool to check process-level depriv George Dunlap
2018-10-08 16:28   ` Anthony PERARD
2018-10-08 16:44     ` Ian Jackson
2018-10-26 15:28     ` George Dunlap
2018-10-29 10:16       ` Ian Jackson
2018-10-26 14:06   ` Ian Jackson
2018-10-26 14:24     ` George Dunlap
2018-10-26 14:27       ` Ian Jackson
2018-10-26 12:45 ` [PATCH 1/5] docs/qemu-deprivilege: Revise and update with status and future plans George Dunlap
2018-10-26 13:45 ` Ian Jackson
     [not found]   ` <bdf85f15-4eb7-a1b1-75cf-b31cae07010b@citrix.com>
2018-11-02 11:50     ` George Dunlap

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).