linux-kernel.vger.kernel.org archive mirror
* [capabilities] Allow normal inheritance for a configurable set of capabilities
@ 2015-02-02 16:21 Christoph Lameter
  2015-02-02 17:12 ` Serge Hallyn
  2015-02-02 17:54 ` Casey Schaufler
  0 siblings, 2 replies; 57+ messages in thread
From: Christoph Lameter @ 2015-02-02 16:21 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Andy Lutomirski, Jonathan Corbet, Aaron Jones, Ted Ts'o,
	linux-security-module, linux-kernel, akpm

Linux capabilities suffer from the problem that they are not inheritable
like regular process characteristics under Unix. This behavior is
counterintuitive given how processes are expected to behave under Unix.

In particular, software has recently appeared that controls NICs from user
space and also provides IP-stack-like behavior in user space (DPDK and RDMA
kernel API based implementations). Such software typically either needs
capabilities that allow raw network access or has to be run setuid. There
is scripting, LD_PRELOAD etc. involved, and arbitrary binaries may be run
from those scripts. That does not go well with having file capabilities
set that would enable the capabilities. Maybe it would work if one set up
capabilities on all executables, but that would also defeat a secure design
since these binaries may only need those caps in certain situations. OK,
setting the inheritable flags on everything may also get one there (if
there were not the issues with LD_PRELOAD, debugging etc.).

The easy solution is for capabilities to be inherited the way setuid
is. We really prefer to use capabilities instead of setuid (we want to
limit what damage someone can do, after all!). Therefore we have been
running a patch like this in production for the last 6 years. At some
point it becomes tedious to run your own custom kernel so we would like
to have this functionality upstream.

See some of the earlier related discussions on the problems with capability
inheritance:

0. Recent surprise:
		https://lkml.org/lkml/2014/1/21/175

1. Attempt to revise caps
		http://www.madore.org/~david/linux/newcaps/

2. Problems of passing caps through exec
		http://unix.stackexchange.com/questions/128394/passing-capabilities-through-exec

3. Problems of binding to privileged ports
		http://stackoverflow.com/questions/413807/is-there-a-way-for-non-root-processes-to-bind-to-privileged-ports-1024-on-l

4. Reviving capabilities
		http://lwn.net/Articles/199004/



There does not seem to be an alternative on the horizon. Some of those
involved in security development under Linux have even stated that they
want to rip out the whole thing and replace it. It has been a couple of
years now and we are still suffering from the capabilities mess. Let us
just fix it.

This patch does not change the default behavior, but it allows setting up
a list of capabilities in the proc filesystem that enables regular
Unix inheritance only for the selected group of capabilities.

With that it is then possible to do something trivial like setting
CAP_NET_RAW on an executable that can then allow that capability to
be inherited by others.

e.g.

echo 12,13,23 >/proc/sys/kernel/cap_inheritable

This allows the inheritance of CAP_NET_ADMIN (12), CAP_NET_RAW (13) and
CAP_SYS_NICE (23). With that, raw device access is possible and real-time
priorities can be set from user space. This is a frequently needed set of
privileged operations in HPC and HFT applications. User space
processes need to be able to directly access devices as well as
have full control over scheduling.
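
To make the numbers in the echo line concrete, here is a minimal user-space
sketch of how a list such as "12,13,23" maps onto capability bits. The
helper cap_list_to_mask() is hypothetical and not part of the patch; it
only illustrates the numbering. The kernel side uses proc_do_large_bitmap,
which also accepts ranges, and ranges are left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Capability numbers as defined in include/uapi/linux/capability.h. */
#define CAP_NET_ADMIN 12
#define CAP_NET_RAW   13
#define CAP_SYS_NICE  23

/*
 * Hypothetical helper (not part of the patch): turn a comma-separated
 * list of capability numbers such as "12,13,23" into a 64-bit mask.
 * Ranges ("1-3") are deliberately not handled in this sketch.
 * Returns 0 on success, -1 on malformed input.
 */
static int cap_list_to_mask(const char *list, uint64_t *mask)
{
	uint64_t m = 0;
	const char *p = list;

	assert(list != NULL && mask != NULL);

	while (*p != '\0') {
		char *end;
		unsigned long n;

		errno = 0;
		n = strtoul(p, &end, 10);
		/* Reject non-numbers, overflow and bits beyond the mask. */
		if (end == p || errno != 0 || n > 63)
			return -1;
		m |= UINT64_C(1) << n;

		if (*end == ',')
			end++;		/* skip the separator */
		else if (*end != '\0')
			return -1;	/* unexpected character */
		p = end;
	}

	*mask = m;
	return 0;
}
```

For "12,13,23" this yields bits 12, 13 and 23 set, i.e. the mask 0x803000.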

Setting capabilities on an executable is not always possible if
for example LD_PRELOAD or other things also have to be used. In that
case it is possible to build a classic wrapper after applying this
patch that sets up the proper privileges for running processes
that need these.

I usually do not dabble in security and I am not sure if this is
done correctly. If someone has a better solution then please tell
me, but so far we have not seen anything else that actually works.
This keeps coming up in various contexts and we need the issue
fixed!

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/include/linux/capability.h
===================================================================
--- linux.orig/include/linux/capability.h
+++ linux/include/linux/capability.h
@@ -44,6 +44,7 @@ struct user_namespace *current_user_ns(v

 extern const kernel_cap_t __cap_empty_set;
 extern const kernel_cap_t __cap_init_eff_set;
+extern const unsigned long *sysctl_cap_inheritable;

 /*
  * Internal kernel functions only
Index: linux/kernel/capability.c
===================================================================
--- linux.orig/kernel/capability.c
+++ linux/kernel/capability.c
@@ -26,6 +26,16 @@
 const kernel_cap_t __cap_empty_set = CAP_EMPTY_SET;
 EXPORT_SYMBOL(__cap_empty_set);

+/*
+ * Allow inheritance with typical unix semantics for capabilities.
+ * This means that the inheritable flag can be omitted on the file
+ * that inherits the capabilities. Capabilities will be passed down
+ * via exec like other process characteristics. This is the behavior
+ * sysadmins expect.
+ */
+static unsigned long cap_inheritable[BITS_TO_LONGS(CAP_LAST_CAP)];
+const unsigned long *sysctl_cap_inheritable = cap_inheritable;
+
 int file_caps_enabled = 1;

 static int __init file_caps_disable(char *str)
Index: linux/kernel/sysctl.c
===================================================================
--- linux.orig/kernel/sysctl.c
+++ linux/kernel/sysctl.c
@@ -840,6 +840,14 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0444,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "cap_inheritable",
+		.data		= &sysctl_cap_inheritable,
+		.maxlen		= CAP_LAST_CAP,
+		.mode		= 0644,
+		.proc_handler	= proc_do_large_bitmap,
+	},
+
 #if defined(CONFIG_LOCKUP_DETECTOR)
 	{
 		.procname       = "watchdog",
Index: linux/security/commoncap.c
===================================================================
--- linux.orig/security/commoncap.c
+++ linux/security/commoncap.c
@@ -437,6 +437,9 @@ static int get_file_caps(struct linux_bi
 	struct dentry *dentry;
 	int rc = 0;
 	struct cpu_vfs_cap_data vcaps;
+	kernel_cap_t inherit = CAP_EMPTY_SET;
+	bool does_inherit = false;
+	int i;

 	bprm_clear_caps(bprm);

@@ -446,6 +449,17 @@ static int get_file_caps(struct linux_bi
 	if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
 		return 0;

+	/*
+	 * Figure out if any capabilities are inheritable without
+	 * setting bits in the target file.
+	 */
+	for_each_set_bit(i, sysctl_cap_inheritable, CAP_LAST_CAP)
+		if (capable(i)) {
+			cap_raise(inherit, i);
+			does_inherit = true;
+		}
+
+
 	dentry = dget(bprm->file->f_path.dentry);

 	rc = get_vfs_caps_from_disk(dentry, &vcaps);
@@ -455,7 +469,8 @@ static int get_file_caps(struct linux_bi
 				__func__, rc, bprm->filename);
 		else if (rc == -ENODATA)
 			rc = 0;
-		goto out;
+		if (!does_inherit)
+			goto out;
 	}

 	rc = bprm_caps_from_vfs_caps(&vcaps, bprm, effective, has_cap);
@@ -463,6 +478,15 @@ static int get_file_caps(struct linux_bi
 		printk(KERN_NOTICE "%s: cap_from_disk returned %d for %s\n",
 		       __func__, rc, bprm->filename);

+	if (does_inherit) {
+		struct cred *new = bprm->cred;
+		/* Add new capabilities from inheritance mask */
+		new->cap_inheritable = cap_combine(inherit, new->cap_inheritable);
+		new->cap_permitted = cap_combine(inherit, new->cap_permitted);
+		*effective = true;
+		*has_cap = true;
+	}
+
 out:
 	dput(dentry);
 	if (rc)


