[PATCH v3 0/2] Restrict dangerous open in sticky directories

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 0/2] Restrict dangerous open in sticky directories
@ 2017-11-22  8:01 ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-22  8:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kernel Hardening, linux-fsdevel, Salvatore Mesoraca,
	Alexander Viro, Jann Horn, Kees Cook, Solar Designer,
	Eric W. Biederman

This patch-set introduces two separate features aimed at restricting
dangerous open in world or group writable sticky directories.
The purpose is to prevent exploitable bugs in user-space programs
that don't access sticky directories in the proper way.
The first patch prevents the O_CREAT open of FIFOs and regular files
in world or group writable sticky directories, if they already exists
and are owned by someone else.
The second patch prevents O_CREAT open in world or group writable
sticky when the O_EXCL flag is not set, even if the file doesn't
exist yet.
More details can be found in the respective commit messages.

Changes in v3:
	- Fixed format string for uid_t that is unsigned
	  (suggested by Jann Horn).
	- Stop checking if file's and parent dir's owners match in
	  may_create_no_excl. This will allow to discover potential
	  vulnerabilities more easily.

Salvatore Mesoraca (2):
  Protected FIFOs and regular files
  Protected O_CREAT open in sticky directories

 Documentation/sysctl/fs.txt |  66 +++++++++++++++++++++++++
 fs/namei.c                  | 117 ++++++++++++++++++++++++++++++++++++++++++--
 include/linux/fs.h          |   3 ++
 kernel/sysctl.c             |  27 ++++++++++
 4 files changed, 210 insertions(+), 3 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] [PATCH v3 0/2] Restrict dangerous open in sticky directories
@ 2017-11-22  8:01 ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-22  8:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kernel Hardening, linux-fsdevel, Salvatore Mesoraca,
	Alexander Viro, Jann Horn, Kees Cook, Solar Designer,
	Eric W. Biederman

This patch-set introduces two separate features aimed at restricting
dangerous open in world or group writable sticky directories.
The purpose is to prevent exploitable bugs in user-space programs
that don't access sticky directories in the proper way.
The first patch prevents the O_CREAT open of FIFOs and regular files
in world or group writable sticky directories, if they already exists
and are owned by someone else.
The second patch prevents O_CREAT open in world or group writable
sticky when the O_EXCL flag is not set, even if the file doesn't
exist yet.
More details can be found in the respective commit messages.

Changes in v3:
	- Fixed format string for uid_t that is unsigned
	  (suggested by Jann Horn).
	- Stop checking if file's and parent dir's owners match in
	  may_create_no_excl. This will allow to discover potential
	  vulnerabilities more easily.

Salvatore Mesoraca (2):
  Protected FIFOs and regular files
  Protected O_CREAT open in sticky directories

 Documentation/sysctl/fs.txt |  66 +++++++++++++++++++++++++
 fs/namei.c                  | 117 ++++++++++++++++++++++++++++++++++++++++++--
 include/linux/fs.h          |   3 ++
 kernel/sysctl.c             |  27 ++++++++++
 4 files changed, 210 insertions(+), 3 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v3 1/2] Protected FIFOs and regular files
  2017-11-22  8:01 ` [kernel-hardening] " Salvatore Mesoraca
@ 2017-11-22  8:01   ` Salvatore Mesoraca
  -1 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-22  8:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kernel Hardening, linux-fsdevel, Salvatore Mesoraca,
	Alexander Viro, Jann Horn, Kees Cook, Solar Designer,
	Eric W. Biederman

Disallows open of FIFOs or regular files not owned by the user in world
writable sticky directories, unless the owner is the same as that of
the directory or the file is opened without the O_CREAT flag.
The purpose is to make data spoofing attacks harder.
This protection can be turned on and off separately for FIFOs and regular
files via sysctl, just like the symlinks/hardlinks protection.
This patch is based on Openwall's "HARDEN_FIFO" feature by Solar
Designer.

This is a brief list of old vulnerabilities that could have been prevented
by this feature, some of them even allow for privilege escalation:
CVE-2000-1134
CVE-2007-3852
CVE-2008-0525
CVE-2009-0416
CVE-2011-4834
CVE-2015-1838
CVE-2015-7442
CVE-2016-7489

This list is not meant to be complete. It's difficult to track down
all vulnerabilities of this kind because they were often reported
without any mention of this particular attack vector.
In fact, before symlinks restrictions, fifos/regular files were not the
favorite vehicle to exploit them.

Suggested-by: Solar Designer <solar@openwall.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
---
 Documentation/sysctl/fs.txt | 36 ++++++++++++++++++++++++++
 fs/namei.c                  | 61 ++++++++++++++++++++++++++++++++++++++++++---
 include/linux/fs.h          |  2 ++
 kernel/sysctl.c             | 18 +++++++++++++
 4 files changed, 114 insertions(+), 3 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 6c00c1e..f3cf2cd 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -34,7 +34,9 @@ Currently, these files are in /proc/sys/fs:
 - overflowgid
 - pipe-user-pages-hard
 - pipe-user-pages-soft
+- protected_fifos
 - protected_hardlinks
+- protected_regular
 - protected_symlinks
 - suid_dumpable
 - super-max
@@ -182,6 +184,24 @@ applied.
 
 ==============================================================
 
+protected_fifos:
+
+The intent of this protection is to avoid unintentional writes to
+an attacker-controlled FIFO, where a program expected to create a regular
+file.
+
+When set to "0", FIFOs writing is unrestricted.
+
+When set to "1" don't allow O_CREAT open on FIFOs that we don't own
+in world writable sticky directories, unless they are owned by the
+owner of the directory.
+
+When set to "2" it also applies to group writable sticky directories.
+
+This protection is based on the restrictions in Openwall.
+
+==============================================================
+
 protected_hardlinks:
 
 A long-standing class of security issues is the hardlink-based
@@ -202,6 +222,22 @@ This protection is based on the restrictions in Openwall and grsecurity.
 
 ==============================================================
 
+protected_regular:
+
+This protection is similar to protected_fifos, but it
+avoids writes to an attacker-controlled regular file, where a program
+expected to create one.
+
+When set to "0", regular files writing is unrestricted.
+
+When set to "1" don't allow O_CREAT open on regular files that we
+don't own in world writable sticky directories, unless they are
+owned by the owner of the directory.
+
+When set to "2" it also applies to group writable sticky directories.
+
+==============================================================
+
 protected_symlinks:
 
 A long-standing class of security issues is the symlink-based
diff --git a/fs/namei.c b/fs/namei.c
index f0c7a7b..92992ad 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -902,6 +902,8 @@ static inline void put_link(struct nameidata *nd)
 
 int sysctl_protected_symlinks __read_mostly = 0;
 int sysctl_protected_hardlinks __read_mostly = 0;
+int sysctl_protected_fifos __read_mostly;
+int sysctl_protected_regular __read_mostly;
 
 /**
  * may_follow_link - Check symlink following for unsafe situations
@@ -1015,6 +1017,54 @@ static int may_linkat(struct path *link)
 	return -EPERM;
 }
 
+/**
+ * may_create_in_sticky - Check whether an O_CREAT open in a sticky directory
+ *			  should be allowed or not, when the file already
+ *			  existed.
+ * @dir: the stick parent directory
+ * @name: the file name
+ * @inode: the inode of the file to open
+ *
+ * Block an O_CREAT open of a FIFO (or a regular file) when:
+ *   - sysctl_protected_fifos (or sysctl_protected_regular) is enabled
+ *   - the file already exists
+ *   - we are in a sticky directory
+ *   - we don't own the file
+ *   - the owner of the directory doesn't own the file
+ *   - the directory is world writable
+ * If the sysctl_protected_fifos (or sysctl_protected_regular) is set to 2
+ * the directory doesn't have to be world writable: being group writable will
+ * be enough.
+ *
+ * Returns 0 if the open is allowed, -ve on error.
+ */
+static int may_create_in_sticky(struct dentry * const dir,
+				const unsigned char * const name,
+				struct inode * const inode)
+{
+	if ((!sysctl_protected_fifos && S_ISFIFO(inode->i_mode)) ||
+	    (!sysctl_protected_regular && S_ISREG(inode->i_mode)) ||
+	    likely(!(dir->d_inode->i_mode & S_ISVTX)) ||
+	    uid_eq(inode->i_uid, dir->d_inode->i_uid) ||
+	    uid_eq(current_fsuid(), inode->i_uid))
+		return 0;
+
+	if (likely(dir->d_inode->i_mode & 0002) ||
+	    (dir->d_inode->i_mode & 0020 &&
+	     ((sysctl_protected_fifos >= 2 && S_ISFIFO(inode->i_mode)) ||
+	      (sysctl_protected_regular >= 2 && S_ISREG(inode->i_mode))))) {
+		pr_notice_ratelimited("denied writing in '%s' of %u.%u in a sticky directory by UID %u, EUID %u, process %s:%d.\n",
+				      name,
+				      from_kuid(&init_user_ns, inode->i_uid),
+				      from_kgid(&init_user_ns, inode->i_gid),
+				      from_kuid(&init_user_ns, current_uid()),
+				      from_kuid(&init_user_ns, current_euid()),
+				      current->comm, current->pid);
+		return -EACCES;
+	}
+	return 0;
+}
+
 static __always_inline
 const char *get_link(struct nameidata *nd)
 {
@@ -3365,9 +3415,14 @@ static int do_last(struct nameidata *nd,
 	if (error)
 		return error;
 	audit_inode(nd->name, nd->path.dentry, 0);
-	error = -EISDIR;
-	if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry))
-		goto out;
+	if (open_flag & O_CREAT) {
+		error = -EISDIR;
+		if (d_is_dir(nd->path.dentry))
+			goto out;
+		error = may_create_in_sticky(dir, nd->last.name, inode);
+		if (unlikely(error))
+			goto out;
+	}
 	error = -ENOTDIR;
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
 		goto out;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2995a27..6fb45a52 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -72,6 +72,8 @@
 extern int leases_enable, lease_break_time;
 extern int sysctl_protected_symlinks;
 extern int sysctl_protected_hardlinks;
+extern int sysctl_protected_fifos;
+extern int sysctl_protected_regular;
 
 typedef __kernel_rwf_t rwf_t;
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 557d467..590fbc9 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1799,6 +1799,24 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
 		.extra2		= &one,
 	},
 	{
+		.procname	= "protected_fifos",
+		.data		= &sysctl_protected_fifos,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &two,
+	},
+	{
+		.procname	= "protected_regular",
+		.data		= &sysctl_protected_regular,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &two,
+	},
+	{
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
 		.maxlen		= sizeof(int),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [kernel-hardening] [PATCH v3 1/2] Protected FIFOs and regular files
@ 2017-11-22  8:01   ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-22  8:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kernel Hardening, linux-fsdevel, Salvatore Mesoraca,
	Alexander Viro, Jann Horn, Kees Cook, Solar Designer,
	Eric W. Biederman

Disallows open of FIFOs or regular files not owned by the user in world
writable sticky directories, unless the owner is the same as that of
the directory or the file is opened without the O_CREAT flag.
The purpose is to make data spoofing attacks harder.
This protection can be turned on and off separately for FIFOs and regular
files via sysctl, just like the symlinks/hardlinks protection.
This patch is based on Openwall's "HARDEN_FIFO" feature by Solar
Designer.

This is a brief list of old vulnerabilities that could have been prevented
by this feature, some of them even allow for privilege escalation:
CVE-2000-1134
CVE-2007-3852
CVE-2008-0525
CVE-2009-0416
CVE-2011-4834
CVE-2015-1838
CVE-2015-7442
CVE-2016-7489

This list is not meant to be complete. It's difficult to track down
all vulnerabilities of this kind because they were often reported
without any mention of this particular attack vector.
In fact, before symlinks restrictions, fifos/regular files were not the
favorite vehicle to exploit them.

Suggested-by: Solar Designer <solar@openwall.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
---
 Documentation/sysctl/fs.txt | 36 ++++++++++++++++++++++++++
 fs/namei.c                  | 61 ++++++++++++++++++++++++++++++++++++++++++---
 include/linux/fs.h          |  2 ++
 kernel/sysctl.c             | 18 +++++++++++++
 4 files changed, 114 insertions(+), 3 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 6c00c1e..f3cf2cd 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -34,7 +34,9 @@ Currently, these files are in /proc/sys/fs:
 - overflowgid
 - pipe-user-pages-hard
 - pipe-user-pages-soft
+- protected_fifos
 - protected_hardlinks
+- protected_regular
 - protected_symlinks
 - suid_dumpable
 - super-max
@@ -182,6 +184,24 @@ applied.
 
 ==============================================================
 
+protected_fifos:
+
+The intent of this protection is to avoid unintentional writes to
+an attacker-controlled FIFO, where a program expected to create a regular
+file.
+
+When set to "0", FIFOs writing is unrestricted.
+
+When set to "1" don't allow O_CREAT open on FIFOs that we don't own
+in world writable sticky directories, unless they are owned by the
+owner of the directory.
+
+When set to "2" it also applies to group writable sticky directories.
+
+This protection is based on the restrictions in Openwall.
+
+==============================================================
+
 protected_hardlinks:
 
 A long-standing class of security issues is the hardlink-based
@@ -202,6 +222,22 @@ This protection is based on the restrictions in Openwall and grsecurity.
 
 ==============================================================
 
+protected_regular:
+
+This protection is similar to protected_fifos, but it
+avoids writes to an attacker-controlled regular file, where a program
+expected to create one.
+
+When set to "0", regular files writing is unrestricted.
+
+When set to "1" don't allow O_CREAT open on regular files that we
+don't own in world writable sticky directories, unless they are
+owned by the owner of the directory.
+
+When set to "2" it also applies to group writable sticky directories.
+
+==============================================================
+
 protected_symlinks:
 
 A long-standing class of security issues is the symlink-based
diff --git a/fs/namei.c b/fs/namei.c
index f0c7a7b..92992ad 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -902,6 +902,8 @@ static inline void put_link(struct nameidata *nd)
 
 int sysctl_protected_symlinks __read_mostly = 0;
 int sysctl_protected_hardlinks __read_mostly = 0;
+int sysctl_protected_fifos __read_mostly;
+int sysctl_protected_regular __read_mostly;
 
 /**
  * may_follow_link - Check symlink following for unsafe situations
@@ -1015,6 +1017,54 @@ static int may_linkat(struct path *link)
 	return -EPERM;
 }
 
+/**
+ * may_create_in_sticky - Check whether an O_CREAT open in a sticky directory
+ *			  should be allowed or not, when the file already
+ *			  existed.
+ * @dir: the stick parent directory
+ * @name: the file name
+ * @inode: the inode of the file to open
+ *
+ * Block an O_CREAT open of a FIFO (or a regular file) when:
+ *   - sysctl_protected_fifos (or sysctl_protected_regular) is enabled
+ *   - the file already exists
+ *   - we are in a sticky directory
+ *   - we don't own the file
+ *   - the owner of the directory doesn't own the file
+ *   - the directory is world writable
+ * If the sysctl_protected_fifos (or sysctl_protected_regular) is set to 2
+ * the directory doesn't have to be world writable: being group writable will
+ * be enough.
+ *
+ * Returns 0 if the open is allowed, -ve on error.
+ */
+static int may_create_in_sticky(struct dentry * const dir,
+				const unsigned char * const name,
+				struct inode * const inode)
+{
+	if ((!sysctl_protected_fifos && S_ISFIFO(inode->i_mode)) ||
+	    (!sysctl_protected_regular && S_ISREG(inode->i_mode)) ||
+	    likely(!(dir->d_inode->i_mode & S_ISVTX)) ||
+	    uid_eq(inode->i_uid, dir->d_inode->i_uid) ||
+	    uid_eq(current_fsuid(), inode->i_uid))
+		return 0;
+
+	if (likely(dir->d_inode->i_mode & 0002) ||
+	    (dir->d_inode->i_mode & 0020 &&
+	     ((sysctl_protected_fifos >= 2 && S_ISFIFO(inode->i_mode)) ||
+	      (sysctl_protected_regular >= 2 && S_ISREG(inode->i_mode))))) {
+		pr_notice_ratelimited("denied writing in '%s' of %u.%u in a sticky directory by UID %u, EUID %u, process %s:%d.\n",
+				      name,
+				      from_kuid(&init_user_ns, inode->i_uid),
+				      from_kgid(&init_user_ns, inode->i_gid),
+				      from_kuid(&init_user_ns, current_uid()),
+				      from_kuid(&init_user_ns, current_euid()),
+				      current->comm, current->pid);
+		return -EACCES;
+	}
+	return 0;
+}
+
 static __always_inline
 const char *get_link(struct nameidata *nd)
 {
@@ -3365,9 +3415,14 @@ static int do_last(struct nameidata *nd,
 	if (error)
 		return error;
 	audit_inode(nd->name, nd->path.dentry, 0);
-	error = -EISDIR;
-	if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry))
-		goto out;
+	if (open_flag & O_CREAT) {
+		error = -EISDIR;
+		if (d_is_dir(nd->path.dentry))
+			goto out;
+		error = may_create_in_sticky(dir, nd->last.name, inode);
+		if (unlikely(error))
+			goto out;
+	}
 	error = -ENOTDIR;
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
 		goto out;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2995a27..6fb45a52 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -72,6 +72,8 @@
 extern int leases_enable, lease_break_time;
 extern int sysctl_protected_symlinks;
 extern int sysctl_protected_hardlinks;
+extern int sysctl_protected_fifos;
+extern int sysctl_protected_regular;
 
 typedef __kernel_rwf_t rwf_t;
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 557d467..590fbc9 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1799,6 +1799,24 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
 		.extra2		= &one,
 	},
 	{
+		.procname	= "protected_fifos",
+		.data		= &sysctl_protected_fifos,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &two,
+	},
+	{
+		.procname	= "protected_regular",
+		.data		= &sysctl_protected_regular,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &two,
+	},
+	{
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
 		.maxlen		= sizeof(int),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-22  8:01 ` [kernel-hardening] " Salvatore Mesoraca
@ 2017-11-22  8:01   ` Salvatore Mesoraca
  -1 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-22  8:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kernel Hardening, linux-fsdevel, Salvatore Mesoraca,
	Alexander Viro, Jann Horn, Kees Cook, Solar Designer,
	Eric W. Biederman

Disallows O_CREAT open missing the O_EXCL flag, in world or
group writable directories, even if the file doesn't exist yet.
With few exceptions (e.g. shared lock files based on flock())
if a program tries to open a file, in a sticky directory,
with the O_CREAT flag and without the O_EXCL, it probably has a bug.
This feature allows to detect and potentially block programs that
act this way, it can be used to find vulnerabilities (like those
prevented by patch #1) and to do policy enforcement.

Suggested-by: Solar Designer <solar@openwall.com>
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
---
 Documentation/sysctl/fs.txt | 30 ++++++++++++++++++++++++
 fs/namei.c                  | 56 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h          |  1 +
 kernel/sysctl.c             |  9 ++++++++
 4 files changed, 96 insertions(+)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index f3cf2cd..7f24b4f 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs:
 - protected_fifos
 - protected_hardlinks
 - protected_regular
+- protected_sticky_child_create
 - protected_symlinks
 - suid_dumpable
 - super-max
@@ -238,6 +239,35 @@ When set to "2" it also applies to group writable sticky directories.
 
 ==============================================================
 
+protected_sticky_child_create:
+
+An O_CREAT open missing the O_EXCL flag in a sticky directory is,
+often, a bug or a synthom of the fact that the program is not
+using appropriate procedures to access sticky directories.
+This protection allow to detect and possibly block these unsafe
+open invocations, even if the files don't exist yet.
+Though should be noted that, sometimes, it's OK to open a file
+with O_CREAT and without O_EXCL (e.g. shared lock files based
+on flock()), for this reason values above 2 should be set
+with care.
+
+When set to "0" the protection is disabled.
+
+When set to "1", notify about O_CREAT open missing the O_EXCL flag
+in world writable sticky directories.
+
+When set to "2", notify about O_CREAT open missing the O_EXCL flag
+in world or group writable sticky directories.
+
+When set to "3", block O_CREAT open missing the O_EXCL flag
+in world writable sticky directories and notify (but don't block)
+in group writable sticky directories.
+
+When set to "4", block O_CREAT open missing the O_EXCL flag
+in world writable and group writable sticky directories.
+
+==============================================================
+
 protected_symlinks:
 
 A long-standing class of security issues is the symlink-based
diff --git a/fs/namei.c b/fs/namei.c
index 92992ad..fcee423 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -904,6 +904,7 @@ static inline void put_link(struct nameidata *nd)
 int sysctl_protected_hardlinks __read_mostly = 0;
 int sysctl_protected_fifos __read_mostly;
 int sysctl_protected_regular __read_mostly;
+int sysctl_protected_sticky_child_create __read_mostly;
 
 /**
  * may_follow_link - Check symlink following for unsafe situations
@@ -1065,6 +1066,53 @@ static int may_create_in_sticky(struct dentry * const dir,
 	return 0;
 }
 
+/**
+ * may_create_no_excl - Detect and possibly block unsafe O_CREAT open
+ *			without O_EXCL.
+ * @dir: the stick parent directory
+ * @name: the file name
+ * @inode: the inode of the file to open (can be NULL to skip uid checks)
+ *
+ * When sysctl_protected_sticky_child_create is set to "0" the
+ * protection is disabled.
+ * When it's set to "1", notify about O_CREAT open missing the O_EXCL flag
+ * in world writable sticky directories.
+ * When it's set to "2", notify about O_CREAT open missing the O_EXCL flag
+ * in group writable sticky directories.
+ * When it's set to "3", block O_CREAT open missing the O_EXCL flag
+ * in world writable sticky directories and notify (but don't block)
+ * in group writable sticky directories.
+ * When it's set to "4", block O_CREAT open missing the O_EXCL flag
+ * in world writable and group writable sticky directories.
+ *
+ * Returns 0 if the open is allowed, -ve on error.
+ */
+static int may_create_no_excl(struct dentry * const dir,
+			      const unsigned char * const name,
+			      struct inode * const inode)
+{
+	umode_t mode = dir->d_inode->i_mode;
+
+	if (likely(!(mode & S_ISVTX)))
+		return 0;
+	if (inode && uid_eq(current_fsuid(), inode->i_uid))
+		return 0;
+
+	if ((sysctl_protected_sticky_child_create && likely(mode & 0002)) ||
+	    (sysctl_protected_sticky_child_create >= 2 && mode & 0020)) {
+		pr_notice_ratelimited("unsafe O_CREAT open (missing O_EXCL) of '%s' in a sticky directory by UID %u, EUID %u, process %s:%d.\n",
+				      name,
+				      from_kuid(&init_user_ns, current_uid()),
+				      from_kuid(&init_user_ns, current_euid()),
+				      current->comm, current->pid);
+		if (sysctl_protected_sticky_child_create >= 4 ||
+		    (sysctl_protected_sticky_child_create == 3 &&
+		     likely(mode & 0002)))
+			return -EACCES;
+	}
+	return 0;
+}
+
 static __always_inline
 const char *get_link(struct nameidata *nd)
 {
@@ -3256,6 +3304,11 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 			error = -EACCES;
 			goto out_dput;
 		}
+		if (!(open_flag & O_EXCL)) {
+			error = may_create_no_excl(dir, nd->last.name, NULL);
+			if (unlikely(error))
+				goto out_dput;
+		}
 		error = dir_inode->i_op->create(dir_inode, dentry, mode,
 						open_flag & O_EXCL);
 		if (error)
@@ -3422,6 +3475,9 @@ static int do_last(struct nameidata *nd,
 		error = may_create_in_sticky(dir, nd->last.name, inode);
 		if (unlikely(error))
 			goto out;
+		error = may_create_no_excl(dir, nd->last.name, inode);
+		if (unlikely(error))
+			goto out;
 	}
 	error = -ENOTDIR;
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6fb45a52..3ab37e0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -74,6 +74,7 @@
 extern int sysctl_protected_hardlinks;
 extern int sysctl_protected_fifos;
 extern int sysctl_protected_regular;
+extern int sysctl_protected_sticky_child_create;
 
 typedef __kernel_rwf_t rwf_t;
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 590fbc9..012c739 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1817,6 +1817,15 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
 		.extra2		= &two,
 	},
 	{
+		.procname	= "protected_sticky_child_create",
+		.data		= &sysctl_protected_sticky_child_create,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &four,
+	},
+	{
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
 		.maxlen		= sizeof(int),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-22  8:01   ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-22  8:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kernel Hardening, linux-fsdevel, Salvatore Mesoraca,
	Alexander Viro, Jann Horn, Kees Cook, Solar Designer,
	Eric W. Biederman

Disallows O_CREAT open missing the O_EXCL flag, in world or
group writable directories, even if the file doesn't exist yet.
With few exceptions (e.g. shared lock files based on flock())
if a program tries to open a file, in a sticky directory,
with the O_CREAT flag and without the O_EXCL, it probably has a bug.
This feature allows to detect and potentially block programs that
act this way, it can be used to find vulnerabilities (like those
prevented by patch #1) and to do policy enforcement.

Suggested-by: Solar Designer <solar@openwall.com>
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
---
 Documentation/sysctl/fs.txt | 30 ++++++++++++++++++++++++
 fs/namei.c                  | 56 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h          |  1 +
 kernel/sysctl.c             |  9 ++++++++
 4 files changed, 96 insertions(+)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index f3cf2cd..7f24b4f 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs:
 - protected_fifos
 - protected_hardlinks
 - protected_regular
+- protected_sticky_child_create
 - protected_symlinks
 - suid_dumpable
 - super-max
@@ -238,6 +239,35 @@ When set to "2" it also applies to group writable sticky directories.
 
 ==============================================================
 
+protected_sticky_child_create:
+
+An O_CREAT open missing the O_EXCL flag in a sticky directory is,
+often, a bug or a synthom of the fact that the program is not
+using appropriate procedures to access sticky directories.
+This protection allow to detect and possibly block these unsafe
+open invocations, even if the files don't exist yet.
+Though should be noted that, sometimes, it's OK to open a file
+with O_CREAT and without O_EXCL (e.g. shared lock files based
+on flock()), for this reason values above 2 should be set
+with care.
+
+When set to "0" the protection is disabled.
+
+When set to "1", notify about O_CREAT open missing the O_EXCL flag
+in world writable sticky directories.
+
+When set to "2", notify about O_CREAT open missing the O_EXCL flag
+in world or group writable sticky directories.
+
+When set to "3", block O_CREAT open missing the O_EXCL flag
+in world writable sticky directories and notify (but don't block)
+in group writable sticky directories.
+
+When set to "4", block O_CREAT open missing the O_EXCL flag
+in world writable and group writable sticky directories.
+
+==============================================================
+
 protected_symlinks:
 
 A long-standing class of security issues is the symlink-based
diff --git a/fs/namei.c b/fs/namei.c
index 92992ad..fcee423 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -904,6 +904,7 @@ static inline void put_link(struct nameidata *nd)
 int sysctl_protected_hardlinks __read_mostly = 0;
 int sysctl_protected_fifos __read_mostly;
 int sysctl_protected_regular __read_mostly;
+int sysctl_protected_sticky_child_create __read_mostly;
 
 /**
  * may_follow_link - Check symlink following for unsafe situations
@@ -1065,6 +1066,53 @@ static int may_create_in_sticky(struct dentry * const dir,
 	return 0;
 }
 
+/**
+ * may_create_no_excl - Detect and possibly block unsafe O_CREAT open
+ *			without O_EXCL.
+ * @dir: the stick parent directory
+ * @name: the file name
+ * @inode: the inode of the file to open (can be NULL to skip uid checks)
+ *
+ * When sysctl_protected_sticky_child_create is set to "0" the
+ * protection is disabled.
+ * When it's set to "1", notify about O_CREAT open missing the O_EXCL flag
+ * in world writable sticky directories.
+ * When it's set to "2", notify about O_CREAT open missing the O_EXCL flag
+ * in group writable sticky directories.
+ * When it's set to "3", block O_CREAT open missing the O_EXCL flag
+ * in world writable sticky directories and notify (but don't block)
+ * in group writable sticky directories.
+ * When it's set to "4", block O_CREAT open missing the O_EXCL flag
+ * in world writable and group writable sticky directories.
+ *
+ * Returns 0 if the open is allowed, -ve on error.
+ */
+static int may_create_no_excl(struct dentry * const dir,
+			      const unsigned char * const name,
+			      struct inode * const inode)
+{
+	umode_t mode = dir->d_inode->i_mode;
+
+	if (likely(!(mode & S_ISVTX)))
+		return 0;
+	if (inode && uid_eq(current_fsuid(), inode->i_uid))
+		return 0;
+
+	if ((sysctl_protected_sticky_child_create && likely(mode & 0002)) ||
+	    (sysctl_protected_sticky_child_create >= 2 && mode & 0020)) {
+		pr_notice_ratelimited("unsafe O_CREAT open (missing O_EXCL) of '%s' in a sticky directory by UID %u, EUID %u, process %s:%d.\n",
+				      name,
+				      from_kuid(&init_user_ns, current_uid()),
+				      from_kuid(&init_user_ns, current_euid()),
+				      current->comm, current->pid);
+		if (sysctl_protected_sticky_child_create >= 4 ||
+		    (sysctl_protected_sticky_child_create == 3 &&
+		     likely(mode & 0002)))
+			return -EACCES;
+	}
+	return 0;
+}
+
 static __always_inline
 const char *get_link(struct nameidata *nd)
 {
@@ -3256,6 +3304,11 @@ static int lookup_open(struct nameidata *nd, struct path *path,
 			error = -EACCES;
 			goto out_dput;
 		}
+		if (!(open_flag & O_EXCL)) {
+			error = may_create_no_excl(dir, nd->last.name, NULL);
+			if (unlikely(error))
+				goto out_dput;
+		}
 		error = dir_inode->i_op->create(dir_inode, dentry, mode,
 						open_flag & O_EXCL);
 		if (error)
@@ -3422,6 +3475,9 @@ static int do_last(struct nameidata *nd,
 		error = may_create_in_sticky(dir, nd->last.name, inode);
 		if (unlikely(error))
 			goto out;
+		error = may_create_no_excl(dir, nd->last.name, inode);
+		if (unlikely(error))
+			goto out;
 	}
 	error = -ENOTDIR;
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6fb45a52..3ab37e0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -74,6 +74,7 @@
 extern int sysctl_protected_hardlinks;
 extern int sysctl_protected_fifos;
 extern int sysctl_protected_regular;
+extern int sysctl_protected_sticky_child_create;
 
 typedef __kernel_rwf_t rwf_t;
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 590fbc9..012c739 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1817,6 +1817,15 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
 		.extra2		= &two,
 	},
 	{
+		.procname	= "protected_sticky_child_create",
+		.data		= &sysctl_protected_sticky_child_create,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &four,
+	},
+	{
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
 		.maxlen		= sizeof(int),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-22  8:01   ` [kernel-hardening] " Salvatore Mesoraca
  (?)
@ 2017-11-22 12:03   ` Pavel Vasilyev
  2017-11-24  8:28     ` Salvatore Mesoraca
  -1 siblings, 1 reply; 60+ messages in thread
From: Pavel Vasilyev @ 2017-11-22 12:03 UTC (permalink / raw)
  To: kernel-hardening



22.11.2017 11:01, Salvatore Mesoraca пишет:
> Disallows O_CREAT open missing the O_EXCL flag, in world or
> group writable directories, even if the file doesn't exist yet.
> With few exceptions (e.g. shared lock files based on flock())
> if a program tries to open a file, in a sticky directory,
> with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> This feature allows to detect and potentially block programs that
> act this way, it can be used to find vulnerabilities (like those
> prevented by patch #1) and to do policy enforcement.
> 
> Suggested-by: Solar Designer <solar@openwall.com>
> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
> ---
>  Documentation/sysctl/fs.txt | 30 ++++++++++++++++++++++++
>  fs/namei.c                  | 56 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h          |  1 +
>  kernel/sysctl.c             |  9 ++++++++
>  4 files changed, 96 insertions(+)
> 
> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> index f3cf2cd..7f24b4f 100644
> --- a/Documentation/sysctl/fs.txt
> +++ b/Documentation/sysctl/fs.txt
> @@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs:
>  - protected_fifos
>  - protected_hardlinks
>  - protected_regular
> +- protected_sticky_child_create
>  - protected_symlinks
>  - suid_dumpable
>  - super-max
> @@ -238,6 +239,35 @@ When set to "2" it also applies to group writable sticky directories.
>  
>  ==============================================================
>  
> +protected_sticky_child_create:
> +
> +An O_CREAT open missing the O_EXCL flag in a sticky directory is,
> +often, a bug or a synthom of the fact that the program is not
> +using appropriate procedures to access sticky directories.
> +This protection allow to detect and possibly block these unsafe
> +open invocations, even if the files don't exist yet.
> +Though should be noted that, sometimes, it's OK to open a file
> +with O_CREAT and without O_EXCL (e.g. shared lock files based
> +on flock()), for this reason values above 2 should be set
> +with care.
> +
> +When set to "0" the protection is disabled.
> +
> +When set to "1", notify about O_CREAT open missing the O_EXCL flag
> +in world writable sticky directories.
> +
> +When set to "2", notify about O_CREAT open missing the O_EXCL flag
> +in world or group writable sticky directories.
> +
> +When set to "3", block O_CREAT open missing the O_EXCL flag
> +in world writable sticky directories and notify (but don't block)
> +in group writable sticky directories.
> +
> +When set to "4", block O_CREAT open missing the O_EXCL flag
> +in world writable and group writable sticky directories.

May be add:

When set to "X", notify O_CREAT open missing the O_EXCL flag
in world writable sticky directories and notify in group
writable sticky directories.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-22  8:01   ` [kernel-hardening] " Salvatore Mesoraca
@ 2017-11-22 13:22     ` Matthew Wilcox
  -1 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2017-11-22 13:22 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote:
> +An O_CREAT open missing the O_EXCL flag in a sticky directory is,
> +often, a bug or a synthom of the fact that the program is not
> +using appropriate procedures to access sticky directories.
> +This protection allow to detect and possibly block these unsafe
> +open invocations, even if the files don't exist yet.
> +Though should be noted that, sometimes, it's OK to open a file
> +with O_CREAT and without O_EXCL (e.g. shared lock files based
> +on flock()), for this reason values above 2 should be set
> +with care.
> +
> +When set to "0" the protection is disabled.
> +
> +When set to "1", notify about O_CREAT open missing the O_EXCL flag
> +in world writable sticky directories.
> +
> +When set to "2", notify about O_CREAT open missing the O_EXCL flag
> +in world or group writable sticky directories.
> +
> +When set to "3", block O_CREAT open missing the O_EXCL flag
> +in world writable sticky directories and notify (but don't block)
> +in group writable sticky directories.
> +
> +When set to "4", block O_CREAT open missing the O_EXCL flag
> +in world writable and group writable sticky directories.

This seems insufficiently flexible.  For example, there is no way for me
to specify that I want to block O_CREAT without O_EXCL in world-writable,
but not be notified about O_CREAT without O_EXCL in group-writable.

And maybe I want to be notified that blocking has happened?

Why not make it bits?  So:

0 => notify in world
1 => block in world
2 => notify in group
3 => block in group

So you'd have the following meaningful values:

 0 - permit all (your option 0)
 1 - notify world; permit group (your option 1)
 2 - block world; permit group
 3 - block,notify world; permit group
 4 - permit world; notify group (?)
 5 - notify world; notify group (your option 2)
 6 - block world; notify group (your option 3)
 7 - block,notify world; notify group
 8 - permit world; block group (?)
 9 - notify world; block group (?)
10 - block world; block group (your option 4)
11 - block,notify world; block group
12 - permit world; block, notify group (?)
13 - notify world; block, notify group (?)
14 - block world; block, notify group
15 - block, notify world; block, notify group

Some of these don't make a lot of sense (marked with ?), but I don't see
the harm in permitting a sysadmin to do something that seems nonsensical
to me.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-22 13:22     ` Matthew Wilcox
  0 siblings, 0 replies; 60+ messages in thread
From: Matthew Wilcox @ 2017-11-22 13:22 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote:
> +An O_CREAT open missing the O_EXCL flag in a sticky directory is,
> +often, a bug or a synthom of the fact that the program is not
> +using appropriate procedures to access sticky directories.
> +This protection allow to detect and possibly block these unsafe
> +open invocations, even if the files don't exist yet.
> +Though should be noted that, sometimes, it's OK to open a file
> +with O_CREAT and without O_EXCL (e.g. shared lock files based
> +on flock()), for this reason values above 2 should be set
> +with care.
> +
> +When set to "0" the protection is disabled.
> +
> +When set to "1", notify about O_CREAT open missing the O_EXCL flag
> +in world writable sticky directories.
> +
> +When set to "2", notify about O_CREAT open missing the O_EXCL flag
> +in world or group writable sticky directories.
> +
> +When set to "3", block O_CREAT open missing the O_EXCL flag
> +in world writable sticky directories and notify (but don't block)
> +in group writable sticky directories.
> +
> +When set to "4", block O_CREAT open missing the O_EXCL flag
> +in world writable and group writable sticky directories.

This seems insufficiently flexible.  For example, there is no way for me
to specify that I want to block O_CREAT without O_EXCL in world-writable,
but not be notified about O_CREAT without O_EXCL in group-writable.

And maybe I want to be notified that blocking has happened?

Why not make it bits?  So:

0 => notify in world
1 => block in world
2 => notify in group
3 => block in group

So you'd have the following meaningful values:

 0 - permit all (your option 0)
 1 - notify world; permit group (your option 1)
 2 - block world; permit group
 3 - block,notify world; permit group
 4 - permit world; notify group (?)
 5 - notify world; notify group (your option 2)
 6 - block world; notify group (your option 3)
 7 - block,notify world; notify group
 8 - permit world; block group (?)
 9 - notify world; block group (?)
10 - block world; block group (your option 4)
11 - block,notify world; block group
12 - permit world; block, notify group (?)
13 - notify world; block, notify group (?)
14 - block world; block, notify group
15 - block, notify world; block, notify group

Some of these don't make a lot of sense (marked with ?), but I don't see
the harm in permitting a sysadmin to do something that seems nonsensical
to me.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-22 13:22     ` [kernel-hardening] " Matthew Wilcox
  (?)
@ 2017-11-22 14:38     ` Pavel Vasilyev
  -1 siblings, 0 replies; 60+ messages in thread
From: Pavel Vasilyev @ 2017-11-22 14:38 UTC (permalink / raw)
  To: kernel-hardening



22.11.2017 16:22, Matthew Wilcox пишет:

> 
> So you'd have the following meaningful values:
> 
>  0 - permit all (your option 0)
>  1 - notify world; permit group (your option 1)
>  2 - block world; permit group
>  3 - block,notify world; permit group
>  4 - permit world; notify group (?)
>  5 - notify world; notify group (your option 2)
>  6 - block world; notify group (your option 3)
>  7 - block,notify world; notify group
>  8 - permit world; block group (?)
>  9 - notify world; block group (?)
> 10 - block world; block group (your option 4)
> 11 - block,notify world; block group
> 12 - permit world; block, notify group (?)
> 13 - notify world; block, notify group (?)
> 14 - block world; block, notify group
> 15 - block, notify world; block, notify group
> 
> Some of these don't make a lot of sense (marked with ?), but I don't see
> the harm in permitting a sysadmin to do something that seems nonsensical
> to me.
> 

I think that notification in block mode by default.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-22  8:01   ` [kernel-hardening] " Salvatore Mesoraca
@ 2017-11-22 16:51     ` Alan Cox
  -1 siblings, 0 replies; 60+ messages in thread
From: Alan Cox @ 2017-11-22 16:51 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

On Wed, 22 Nov 2017 09:01:46 +0100
Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:

> Disallows O_CREAT open missing the O_EXCL flag, in world or
> group writable directories, even if the file doesn't exist yet.
> With few exceptions (e.g. shared lock files based on flock())

Enough exceptions to make it a bad idea.

Firstly if you care this much *stop* having shared writable directories.
We have namespaces, you don't need them. You can give every user their
own /tmp etc.

The rest of this only make sense on a per application and directory basis
because there are valid use cases, and that means it wants to be part of
an existing LSM security module where you've got the context required and
you can attach it to a specific directory and/or process.

Alan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-22 16:51     ` Alan Cox
  0 siblings, 0 replies; 60+ messages in thread
From: Alan Cox @ 2017-11-22 16:51 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

On Wed, 22 Nov 2017 09:01:46 +0100
Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:

> Disallows O_CREAT open missing the O_EXCL flag, in world or
> group writable directories, even if the file doesn't exist yet.
> With few exceptions (e.g. shared lock files based on flock())

Enough exceptions to make it a bad idea.

Firstly if you care this much *stop* having shared writable directories.
We have namespaces, you don't need them. You can give every user their
own /tmp etc.

The rest of this only make sense on a per application and directory basis
because there are valid use cases, and that means it wants to be part of
an existing LSM security module where you've got the context required and
you can attach it to a specific directory and/or process.

Alan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] [PATCH v3 1/2] Protected FIFOs and regular files
  2017-11-22  8:01   ` [kernel-hardening] " Salvatore Mesoraca
  (?)
@ 2017-11-23 22:43   ` Tobin C. Harding
  2017-11-24  8:24     ` Salvatore Mesoraca
  -1 siblings, 1 reply; 60+ messages in thread
From: Tobin C. Harding @ 2017-11-23 22:43 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

On Wed, Nov 22, 2017 at 09:01:45AM +0100, Salvatore Mesoraca wrote:

Please take these comments in all humility, my English is a long way
from perfect. These are English grammar comments only. If this is viewed
as trivial please stop reading now and ignore.

> Disallows open of FIFOs or regular files not owned by the user in world
> writable sticky directories, unless the owner is the same as that of
> the directory or the file is opened without the O_CREAT flag.
> The purpose is to make data spoofing attacks harder.
> This protection can be turned on and off separately for FIFOs and regular
> files via sysctl, just like the symlinks/hardlinks protection.
> This patch is based on Openwall's "HARDEN_FIFO" feature by Solar
> Designer.
> 
> This is a brief list of old vulnerabilities that could have been prevented
> by this feature, some of them even allow for privilege escalation:
> CVE-2000-1134
> CVE-2007-3852
> CVE-2008-0525
> CVE-2009-0416
> CVE-2011-4834
> CVE-2015-1838
> CVE-2015-7442
> CVE-2016-7489
> 
> This list is not meant to be complete. It's difficult to track down
> all vulnerabilities of this kind because they were often reported
> without any mention of this particular attack vector.
> In fact, before symlinks restrictions, fifos/regular files were not the
> favorite vehicle to exploit them.
> 
> Suggested-by: Solar Designer <solar@openwall.com>
> Suggested-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
> ---
>  Documentation/sysctl/fs.txt | 36 ++++++++++++++++++++++++++
>  fs/namei.c                  | 61 ++++++++++++++++++++++++++++++++++++++++++---
>  include/linux/fs.h          |  2 ++
>  kernel/sysctl.c             | 18 +++++++++++++
>  4 files changed, 114 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> index 6c00c1e..f3cf2cd 100644
> --- a/Documentation/sysctl/fs.txt
> +++ b/Documentation/sysctl/fs.txt
> @@ -34,7 +34,9 @@ Currently, these files are in /proc/sys/fs:
>  - overflowgid
>  - pipe-user-pages-hard
>  - pipe-user-pages-soft
> +- protected_fifos
>  - protected_hardlinks
> +- protected_regular
>  - protected_symlinks
>  - suid_dumpable
>  - super-max
> @@ -182,6 +184,24 @@ applied.
>  
>  ==============================================================
>  
> +protected_fifos:
> +
> +The intent of this protection is to avoid unintentional writes to
> +an attacker-controlled FIFO, where a program expected to create a regular
> +file.
> +
> +When set to "0", FIFOs writing is unrestricted.

 When set to "0", writing to FIFOs is unrestricted.

> +When set to "1" don't allow O_CREAT open on FIFOs that we don't own
> +in world writable sticky directories, unless they are owned by the
> +owner of the directory.
> +
> +When set to "2" it also applies to group writable sticky directories.
> +
> +This protection is based on the restrictions in Openwall.
> +
> +==============================================================
> +
>  protected_hardlinks:
>  
>  A long-standing class of security issues is the hardlink-based
> @@ -202,6 +222,22 @@ This protection is based on the restrictions in Openwall and grsecurity.
>  
>  ==============================================================
>  
> +protected_regular:
> +
> +This protection is similar to protected_fifos, but it
> +avoids writes to an attacker-controlled regular file, where a program
> +expected to create one.
> +
> +When set to "0", regular files writing is unrestricted.

 When set to "0", writing to regular files is unrestricted.

> +When set to "1" don't allow O_CREAT open on regular files that we
> +don't own in world writable sticky directories, unless they are
> +owned by the owner of the directory.
> +
> +When set to "2" it also applies to group writable sticky directories.
> +
> +==============================================================
> +
>  protected_symlinks:
>  
>  A long-standing class of security issues is the symlink-based
> diff --git a/fs/namei.c b/fs/namei.c
> index f0c7a7b..92992ad 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -902,6 +902,8 @@ static inline void put_link(struct nameidata *nd)
>  
>  int sysctl_protected_symlinks __read_mostly = 0;
>  int sysctl_protected_hardlinks __read_mostly = 0;
> +int sysctl_protected_fifos __read_mostly;
> +int sysctl_protected_regular __read_mostly;
>  
>  /**
>   * may_follow_link - Check symlink following for unsafe situations
> @@ -1015,6 +1017,54 @@ static int may_linkat(struct path *link)
>  	return -EPERM;
>  }
>  
> +/**
> + * may_create_in_sticky - Check whether an O_CREAT open in a sticky directory
> + *			  should be allowed or not, when the file already
> + *			  existed.

Perhaps

 + * may_create_in_sticky - Check whether an O_CREAT open, in a sticky directory,
 			 should be allowed, or not, on files that already exist.



Hope this helps,
Tobin.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-22  8:01   ` [kernel-hardening] " Salvatore Mesoraca
                     ` (3 preceding siblings ...)
  (?)
@ 2017-11-23 22:57   ` Tobin C. Harding
  2017-11-24  8:34     ` Salvatore Mesoraca
  -1 siblings, 1 reply; 60+ messages in thread
From: Tobin C. Harding @ 2017-11-23 22:57 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote:

Same caveat about this being English language comments only as for patch
1/2. Please ignore if this is too trivial. My grammar is a long way from
perfect, especially please feel free to ignore my placement of commas,
they are often wrong. 

> Disallows O_CREAT open missing the O_EXCL flag, in world or
> group writable directories, even if the file doesn't exist yet.
> With few exceptions (e.g. shared lock files based on flock())
> if a program tries to open a file, in a sticky directory,
> with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> This feature allows to detect and potentially block programs that
> act this way, it can be used to find vulnerabilities (like those
> prevented by patch #1) and to do policy enforcement.
> 
> Suggested-by: Solar Designer <solar@openwall.com>
> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
> ---
>  Documentation/sysctl/fs.txt | 30 ++++++++++++++++++++++++
>  fs/namei.c                  | 56 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h          |  1 +
>  kernel/sysctl.c             |  9 ++++++++
>  4 files changed, 96 insertions(+)
> 
> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> index f3cf2cd..7f24b4f 100644
> --- a/Documentation/sysctl/fs.txt
> +++ b/Documentation/sysctl/fs.txt
> @@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs:
>  - protected_fifos
>  - protected_hardlinks
>  - protected_regular
> +- protected_sticky_child_create
>  - protected_symlinks
>  - suid_dumpable
>  - super-max
> @@ -238,6 +239,35 @@ When set to "2" it also applies to group writable sticky directories.
>  
>  ==============================================================
>  
> +protected_sticky_child_create:
> +
> +An O_CREAT open missing the O_EXCL flag in a sticky directory is,
> +often, a bug or a synthom of the fact that the program is not

s/synthom/symptom

> +using appropriate procedures to access sticky directories.
> +This protection allow to detect and possibly block these unsafe

Perhaps

 This protection allows us to detect, and possibly block, these unsafe

> +open invocations, even if the files don't exist yet.
> +Though should be noted that, sometimes, it's OK to open a file

Perhaps

 +Although it should be noted, sometimes it's OK to open a file

(I looked up 'although' vs 'though' and am not quite sure on this one,
it seems to read better with 'although'. Again, apologies if this is
overly trivial.)


Hope this helps,
Tobin.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] [PATCH v3 1/2] Protected FIFOs and regular files
  2017-11-23 22:43   ` Tobin C. Harding
@ 2017-11-24  8:24     ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-24  8:24 UTC (permalink / raw)
  To: Tobin C. Harding
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

2017-11-23 23:43 GMT+01:00 Tobin C. Harding <me@tobin.cc>:
> On Wed, Nov 22, 2017 at 09:01:45AM +0100, Salvatore Mesoraca wrote:
>
> Please take these comments in all humility, my English is a long way
> from perfect. These are English grammar comments only. If this is viewed
> as trivial please stop reading now and ignore.

Any help is always greatly appreciated!
And I like your proposed changes, they sound better to me too.
Thank you for your time,

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-22 12:03   ` Pavel Vasilyev
@ 2017-11-24  8:28     ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-24  8:28 UTC (permalink / raw)
  To: Pavel Vasilyev; +Cc: Kernel Hardening

2017-11-22 13:03 GMT+01:00 Pavel Vasilyev <dixlor@gmail.com>:
> ...
>> +
>> +When set to "2", notify about O_CREAT open missing the O_EXCL flag
>> +in world or group writable sticky directories.
>> +
>> +When set to "3", block O_CREAT open missing the O_EXCL flag
>> +in world writable sticky directories and notify (but don't block)
>> +in group writable sticky directories.
>> +
>> +When set to "4", block O_CREAT open missing the O_EXCL flag
>> +in world writable and group writable sticky directories.
>
> May be add:
>
> When set to "X", notify O_CREAT open missing the O_EXCL flag
> in world writable sticky directories and notify in group
> writable sticky directories.

Doesn't the "2" option already do this?
Did I misunderstood something?

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-22 13:22     ` [kernel-hardening] " Matthew Wilcox
@ 2017-11-24  8:29       ` Salvatore Mesoraca
  -1 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-24  8:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

2017-11-22 14:22 GMT+01:00 Matthew Wilcox <willy@infradead.org>:
> On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote:
>> +An O_CREAT open missing the O_EXCL flag in a sticky directory is,
>> +often, a bug or a synthom of the fact that the program is not
>> +using appropriate procedures to access sticky directories.
>> +This protection allow to detect and possibly block these unsafe
>> +open invocations, even if the files don't exist yet.
>> +Though should be noted that, sometimes, it's OK to open a file
>> +with O_CREAT and without O_EXCL (e.g. shared lock files based
>> +on flock()), for this reason values above 2 should be set
>> +with care.
>> +
>> +When set to "0" the protection is disabled.
>> +
>> +When set to "1", notify about O_CREAT open missing the O_EXCL flag
>> +in world writable sticky directories.
>> +
>> +When set to "2", notify about O_CREAT open missing the O_EXCL flag
>> +in world or group writable sticky directories.
>> +
>> +When set to "3", block O_CREAT open missing the O_EXCL flag
>> +in world writable sticky directories and notify (but don't block)
>> +in group writable sticky directories.
>> +
>> +When set to "4", block O_CREAT open missing the O_EXCL flag
>> +in world writable and group writable sticky directories.
>
> This seems insufficiently flexible.  For example, there is no way for me
> to specify that I want to block O_CREAT without O_EXCL in world-writable,
> but not be notified about O_CREAT without O_EXCL in group-writable.

I understand your concern, I did it like this because I wanted to keep the
interface as simple as possible. But, maybe, more flexibility it's better.

> And maybe I want to be notified that blocking has happened?

I didn't write it explicitly in the doc, but you will always be notified
that blocking has happened. On the other hand you don't have any way to
suppress those notification.

> Why not make it bits?  So:
>
> 0 => notify in world
> 1 => block in world
> 2 => notify in group
> 3 => block in group
>
> So you'd have the following meaningful values:
>
>  0 - permit all (your option 0)
>  1 - notify world; permit group (your option 1)
>  2 - block world; permit group
>  3 - block,notify world; permit group
>  4 - permit world; notify group (?)
>  5 - notify world; notify group (your option 2)
>  6 - block world; notify group (your option 3)
>  7 - block,notify world; notify group
>  8 - permit world; block group (?)
>  9 - notify world; block group (?)
> 10 - block world; block group (your option 4)
> 11 - block,notify world; block group
> 12 - permit world; block, notify group (?)
> 13 - notify world; block, notify group (?)
> 14 - block world; block, notify group
> 15 - block, notify world; block, notify group
>
> Some of these don't make a lot of sense (marked with ?), but I don't see
> the harm in permitting a sysadmin to do something that seems nonsensical
> to me.

I like your idea of using "bits" this way, even if it will allow sysadmins
to set values that don't make too much sense.
Thank you very much for your suggestions,

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-24  8:29       ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-24  8:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

2017-11-22 14:22 GMT+01:00 Matthew Wilcox <willy@infradead.org>:
> On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote:
>> +An O_CREAT open missing the O_EXCL flag in a sticky directory is,
>> +often, a bug or a synthom of the fact that the program is not
>> +using appropriate procedures to access sticky directories.
>> +This protection allow to detect and possibly block these unsafe
>> +open invocations, even if the files don't exist yet.
>> +Though should be noted that, sometimes, it's OK to open a file
>> +with O_CREAT and without O_EXCL (e.g. shared lock files based
>> +on flock()), for this reason values above 2 should be set
>> +with care.
>> +
>> +When set to "0" the protection is disabled.
>> +
>> +When set to "1", notify about O_CREAT open missing the O_EXCL flag
>> +in world writable sticky directories.
>> +
>> +When set to "2", notify about O_CREAT open missing the O_EXCL flag
>> +in world or group writable sticky directories.
>> +
>> +When set to "3", block O_CREAT open missing the O_EXCL flag
>> +in world writable sticky directories and notify (but don't block)
>> +in group writable sticky directories.
>> +
>> +When set to "4", block O_CREAT open missing the O_EXCL flag
>> +in world writable and group writable sticky directories.
>
> This seems insufficiently flexible.  For example, there is no way for me
> to specify that I want to block O_CREAT without O_EXCL in world-writable,
> but not be notified about O_CREAT without O_EXCL in group-writable.

I understand your concern, I did it like this because I wanted to keep the
interface as simple as possible. But, maybe, more flexibility it's better.

> And maybe I want to be notified that blocking has happened?

I didn't write it explicitly in the doc, but you will always be notified
that blocking has happened. On the other hand you don't have any way to
suppress those notification.

> Why not make it bits?  So:
>
> 0 => notify in world
> 1 => block in world
> 2 => notify in group
> 3 => block in group
>
> So you'd have the following meaningful values:
>
>  0 - permit all (your option 0)
>  1 - notify world; permit group (your option 1)
>  2 - block world; permit group
>  3 - block,notify world; permit group
>  4 - permit world; notify group (?)
>  5 - notify world; notify group (your option 2)
>  6 - block world; notify group (your option 3)
>  7 - block,notify world; notify group
>  8 - permit world; block group (?)
>  9 - notify world; block group (?)
> 10 - block world; block group (your option 4)
> 11 - block,notify world; block group
> 12 - permit world; block, notify group (?)
> 13 - notify world; block, notify group (?)
> 14 - block world; block, notify group
> 15 - block, notify world; block, notify group
>
> Some of these don't make a lot of sense (marked with ?), but I don't see
> the harm in permitting a sysadmin to do something that seems nonsensical
> to me.

I like your idea of using "bits" this way, even if it will allow sysadmins
to set values that don't make too much sense.
Thank you very much for your suggestions,

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-22 16:51     ` [kernel-hardening] " Alan Cox
@ 2017-11-24  8:31       ` Salvatore Mesoraca
  -1 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-24  8:31 UTC (permalink / raw)
  To: Alan Cox
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

2017-11-22 17:51 GMT+01:00 Alan Cox <gnomes@lxorguk.ukuu.org.uk>:
> On Wed, 22 Nov 2017 09:01:46 +0100
> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
>
>> Disallows O_CREAT open missing the O_EXCL flag, in world or
>> group writable directories, even if the file doesn't exist yet.
>> With few exceptions (e.g. shared lock files based on flock())
>
> Enough exceptions to make it a bad idea.
>
> Firstly if you care this much *stop* having shared writable directories.
> We have namespaces, you don't need them. You can give every user their
> own /tmp etc.
>
> The rest of this only make sense on a per application and directory basis
> because there are valid use cases, and that means it wants to be part of
> an existing LSM security module where you've got the context required and
> you can attach it to a specific directory and/or process.

I think that this feature should be intended more as a "debugging" feature
than as a "security" one.
When the feature implemented in the first patch is enabled, this restriction
doesn't improve security at all and it's not supposed to do it.
The first patch blocks attacks that exploit some unsafe usage of sticky
directories.
This patch, instead, doesn't block actual attacks: it detects (and maybe
blocks) the bad code that can be exploited for the attacks blocked by #1,
even if no one is attacking you in that moment.
This looks like a useful feature to me, even if you already use more
sophisticated security apparatus like LSMs or namespaces, because it makes
easy to find real vulnerabilities in software: the commit message of
patch #1 has a short list of some CVEs that this feature can detect.
Also, being just a sysctl away, it's within anyone's reach.
Probably the "debugging" goal wasn't clear from my previous message,
I'm sorry for the misunderstanding.
Thank you very much for your time,

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-24  8:31       ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-24  8:31 UTC (permalink / raw)
  To: Alan Cox
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

2017-11-22 17:51 GMT+01:00 Alan Cox <gnomes@lxorguk.ukuu.org.uk>:
> On Wed, 22 Nov 2017 09:01:46 +0100
> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
>
>> Disallows O_CREAT open missing the O_EXCL flag, in world or
>> group writable directories, even if the file doesn't exist yet.
>> With few exceptions (e.g. shared lock files based on flock())
>
> Enough exceptions to make it a bad idea.
>
> Firstly if you care this much *stop* having shared writable directories.
> We have namespaces, you don't need them. You can give every user their
> own /tmp etc.
>
> The rest of this only make sense on a per application and directory basis
> because there are valid use cases, and that means it wants to be part of
> an existing LSM security module where you've got the context required and
> you can attach it to a specific directory and/or process.

I think that this feature should be intended more as a "debugging" feature
than as a "security" one.
When the feature implemented in the first patch is enabled, this restriction
doesn't improve security at all and it's not supposed to do it.
The first patch blocks attacks that exploit some unsafe usage of sticky
directories.
This patch, instead, doesn't block actual attacks: it detects (and maybe
blocks) the bad code that can be exploited for the attacks blocked by #1,
even if no one is attacking you in that moment.
This looks like a useful feature to me, even if you already use more
sophisticated security apparatus like LSMs or namespaces, because it makes
easy to find real vulnerabilities in software: the commit message of
patch #1 has a short list of some CVEs that this feature can detect.
Also, being just a sysctl away, it's within anyone's reach.
Probably the "debugging" goal wasn't clear from my previous message,
I'm sorry for the misunderstanding.
Thank you very much for your time,

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-23 22:57   ` Tobin C. Harding
@ 2017-11-24  8:34     ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-24  8:34 UTC (permalink / raw)
  To: Tobin C. Harding
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

2017-11-23 23:57 GMT+01:00 Tobin C. Harding <me@tobin.cc>:
> On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote:
>
> Same caveat about this being English language comments only as for patch
> 1/2. Please ignore if this is too trivial. My grammar is a long way from
> perfect, especially please feel free to ignore my placement of commas,
> they are often wrong.

As I wrote before: any help is always welcome.
Thank you,

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-22 16:51     ` [kernel-hardening] " Alan Cox
@ 2017-11-24 10:53       ` David Laight
  -1 siblings, 0 replies; 60+ messages in thread
From: David Laight @ 2017-11-24 10:53 UTC (permalink / raw)
  To: 'Alan Cox', Salvatore Mesoraca
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

From: Alan Cox
> Sent: 22 November 2017 16:52
> 
> On Wed, 22 Nov 2017 09:01:46 +0100
> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
> 
> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> > group writable directories, even if the file doesn't exist yet.
> > With few exceptions (e.g. shared lock files based on flock())
> 
> Enough exceptions to make it a bad idea.
> 
> Firstly if you care this much *stop* having shared writable directories.
> We have namespaces, you don't need them. You can give every user their
> own /tmp etc.

Looks like a very bad idea to me as well.

Doesn't this stop all shell redirects into a shared /tmp ?
I'm pretty sure most programs use O_CREAT | O_TRUNC for output
files - they'll all stop working.

If there are some directories where you need to force O_EXCL you
need to make it a property of the directory, not the kernel.

	David

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-24 10:53       ` David Laight
  0 siblings, 0 replies; 60+ messages in thread
From: David Laight @ 2017-11-24 10:53 UTC (permalink / raw)
  To: 'Alan Cox', Salvatore Mesoraca
  Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman

From: Alan Cox
> Sent: 22 November 2017 16:52
> 
> On Wed, 22 Nov 2017 09:01:46 +0100
> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
> 
> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> > group writable directories, even if the file doesn't exist yet.
> > With few exceptions (e.g. shared lock files based on flock())
> 
> Enough exceptions to make it a bad idea.
> 
> Firstly if you care this much *stop* having shared writable directories.
> We have namespaces, you don't need them. You can give every user their
> own /tmp etc.

Looks like a very bad idea to me as well.

Doesn't this stop all shell redirects into a shared /tmp ?
I'm pretty sure most programs use O_CREAT | O_TRUNC for output
files - they'll all stop working.

If there are some directories where you need to force O_EXCL you
need to make it a property of the directory, not the kernel.

	David

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-24 10:53       ` [kernel-hardening] " David Laight
@ 2017-11-24 11:43         ` Salvatore Mesoraca
  -1 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-24 11:43 UTC (permalink / raw)
  To: David Laight
  Cc: Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel,
	Alexander Viro, Jann Horn, Kees Cook, Solar Designer,
	Eric W. Biederman

2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> From: Alan Cox
>> Sent: 22 November 2017 16:52
>>
>> On Wed, 22 Nov 2017 09:01:46 +0100
>> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
>>
>> > Disallows O_CREAT open missing the O_EXCL flag, in world or
>> > group writable directories, even if the file doesn't exist yet.
>> > With few exceptions (e.g. shared lock files based on flock())
>>
>> Enough exceptions to make it a bad idea.
>>
>> Firstly if you care this much *stop* having shared writable directories.
>> We have namespaces, you don't need them. You can give every user their
>> own /tmp etc.
>
> Looks like a very bad idea to me as well.
>
> Doesn't this stop all shell redirects into a shared /tmp ?
> I'm pretty sure most programs use O_CREAT | O_TRUNC for output
> files - they'll all stop working.

If some program does such a thing, that's a potential vulnerability.
With "protected_hardlinks" you are, in most cases, safe.
But, still, that program has a bug and having this feature enabled will
help you notice it soon.
For that matter, I'm using this patch on my system and I don't have any
program behaving like this.

> If there are some directories where you need to force O_EXCL you
> need to make it a property of the directory, not the kernel.
>
>         David
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-24 11:43         ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-24 11:43 UTC (permalink / raw)
  To: David Laight
  Cc: Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel,
	Alexander Viro, Jann Horn, Kees Cook, Solar Designer,
	Eric W. Biederman

2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> From: Alan Cox
>> Sent: 22 November 2017 16:52
>>
>> On Wed, 22 Nov 2017 09:01:46 +0100
>> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
>>
>> > Disallows O_CREAT open missing the O_EXCL flag, in world or
>> > group writable directories, even if the file doesn't exist yet.
>> > With few exceptions (e.g. shared lock files based on flock())
>>
>> Enough exceptions to make it a bad idea.
>>
>> Firstly if you care this much *stop* having shared writable directories.
>> We have namespaces, you don't need them. You can give every user their
>> own /tmp etc.
>
> Looks like a very bad idea to me as well.
>
> Doesn't this stop all shell redirects into a shared /tmp ?
> I'm pretty sure most programs use O_CREAT | O_TRUNC for output
> files - they'll all stop working.

If some program does such a thing, that's a potential vulnerability.
With "protected_hardlinks" you are, in most cases, safe.
But, still, that program has a bug and having this feature enabled will
help you notice it soon.
For that matter, I'm using this patch on my system and I don't have any
program behaving like this.

> If there are some directories where you need to force O_EXCL you
> need to make it a property of the directory, not the kernel.
>
>         David
>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-24 11:43         ` [kernel-hardening] " Salvatore Mesoraca
@ 2017-11-24 11:53           ` David Laight
  -1 siblings, 0 replies; 60+ messages in thread
From: David Laight @ 2017-11-24 11:53 UTC (permalink / raw)
  To: 'Salvatore Mesoraca'
  Cc: Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel,
	Alexander Viro, Jann Horn, Kees Cook, Solar Designer,
	Eric W. Biederman

From: Salvatore Mesoraca [mailto:s.mesoraca16@gmail.com]
> Sent: 24 November 2017 11:44
> 
> 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> > From: Alan Cox
> >> Sent: 22 November 2017 16:52
> >>
> >> On Wed, 22 Nov 2017 09:01:46 +0100
> >> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
> >>
> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> >> > group writable directories, even if the file doesn't exist yet.
> >> > With few exceptions (e.g. shared lock files based on flock())
> >>
> >> Enough exceptions to make it a bad idea.
> >>
> >> Firstly if you care this much *stop* having shared writable directories.
> >> We have namespaces, you don't need them. You can give every user their
> >> own /tmp etc.
> >
> > Looks like a very bad idea to me as well.
> >
> > Doesn't this stop all shell redirects into a shared /tmp ?
> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output
> > files - they'll all stop working.
> 
> If some program does such a thing, that's a potential vulnerability.
> With "protected_hardlinks" you are, in most cases, safe.
> But, still, that program has a bug and having this feature enabled will
> help you notice it soon.
> For that matter, I'm using this patch on my system and I don't have any
> program behaving like this.

Hmmm.... a quick strace shows cp and vi doing stat("/tmp/foo") and then
open(O_WRONLY|O_TRUNC) if it exists and O_CREATE|O_EXCL if it doesn't.
I can't help feeling that is just hiding a race.

	David

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-24 11:53           ` David Laight
  0 siblings, 0 replies; 60+ messages in thread
From: David Laight @ 2017-11-24 11:53 UTC (permalink / raw)
  To: 'Salvatore Mesoraca'
  Cc: Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel,
	Alexander Viro, Jann Horn, Kees Cook, Solar Designer,
	Eric W. Biederman

From: Salvatore Mesoraca [mailto:s.mesoraca16@gmail.com]
> Sent: 24 November 2017 11:44
> 
> 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> > From: Alan Cox
> >> Sent: 22 November 2017 16:52
> >>
> >> On Wed, 22 Nov 2017 09:01:46 +0100
> >> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
> >>
> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> >> > group writable directories, even if the file doesn't exist yet.
> >> > With few exceptions (e.g. shared lock files based on flock())
> >>
> >> Enough exceptions to make it a bad idea.
> >>
> >> Firstly if you care this much *stop* having shared writable directories.
> >> We have namespaces, you don't need them. You can give every user their
> >> own /tmp etc.
> >
> > Looks like a very bad idea to me as well.
> >
> > Doesn't this stop all shell redirects into a shared /tmp ?
> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output
> > files - they'll all stop working.
> 
> If some program does such a thing, that's a potential vulnerability.
> With "protected_hardlinks" you are, in most cases, safe.
> But, still, that program has a bug and having this feature enabled will
> help you notice it soon.
> For that matter, I'm using this patch on my system and I don't have any
> program behaving like this.

Hmmm.... a quick strace shows cp and vi doing stat("/tmp/foo") and then
open(O_WRONLY|O_TRUNC) if it exists and O_CREATE|O_EXCL if it doesn't.
I can't help feeling that is just hiding a race.

	David


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-24 11:53           ` [kernel-hardening] " David Laight
@ 2017-11-26 11:29             ` Salvatore Mesoraca
  -1 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-26 11:29 UTC (permalink / raw)
  To: David Laight
  Cc: Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel,
	Alexander Viro, Jann Horn, Kees Cook, Solar Designer,
	Eric W. Biederman

2017-11-24 12:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> From: Salvatore Mesoraca [mailto:s.mesoraca16@gmail.com]
>> Sent: 24 November 2017 11:44
>>
>> 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
>> > From: Alan Cox
>> >> Sent: 22 November 2017 16:52
>> >>
>> >> On Wed, 22 Nov 2017 09:01:46 +0100
>> >> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
>> >>
>> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
>> >> > group writable directories, even if the file doesn't exist yet.
>> >> > With few exceptions (e.g. shared lock files based on flock())
>> >>
>> >> Enough exceptions to make it a bad idea.
>> >>
>> >> Firstly if you care this much *stop* having shared writable directories.
>> >> We have namespaces, you don't need them. You can give every user their
>> >> own /tmp etc.
>> >
>> > Looks like a very bad idea to me as well.
>> >
>> > Doesn't this stop all shell redirects into a shared /tmp ?
>> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output
>> > files - they'll all stop working.
>>
>> If some program does such a thing, that's a potential vulnerability.
>> With "protected_hardlinks" you are, in most cases, safe.
>> But, still, that program has a bug and having this feature enabled will
>> help you notice it soon.
>> For that matter, I'm using this patch on my system and I don't have any
>> program behaving like this.
>
> Hmmm.... a quick strace shows cp and vi doing stat("/tmp/foo") and then
> open(O_WRONLY|O_TRUNC) if it exists and O_CREATE|O_EXCL if it doesn't.
> I can't help feeling that is just hiding a race.

Yes, unfortunately, doing something like "cp somefile /tmp/" is a bad
practice that
in most cases will go unnoticed by this feature. Nevertheless there
are many other
real world vulnerability that it would have been able to detect.
Thank you very much for taking the time to do some experiments.

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-26 11:29             ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-26 11:29 UTC (permalink / raw)
  To: David Laight
  Cc: Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel,
	Alexander Viro, Jann Horn, Kees Cook, Solar Designer,
	Eric W. Biederman

2017-11-24 12:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> From: Salvatore Mesoraca [mailto:s.mesoraca16@gmail.com]
>> Sent: 24 November 2017 11:44
>>
>> 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
>> > From: Alan Cox
>> >> Sent: 22 November 2017 16:52
>> >>
>> >> On Wed, 22 Nov 2017 09:01:46 +0100
>> >> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
>> >>
>> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
>> >> > group writable directories, even if the file doesn't exist yet.
>> >> > With few exceptions (e.g. shared lock files based on flock())
>> >>
>> >> Enough exceptions to make it a bad idea.
>> >>
>> >> Firstly if you care this much *stop* having shared writable directories.
>> >> We have namespaces, you don't need them. You can give every user their
>> >> own /tmp etc.
>> >
>> > Looks like a very bad idea to me as well.
>> >
>> > Doesn't this stop all shell redirects into a shared /tmp ?
>> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output
>> > files - they'll all stop working.
>>
>> If some program does such a thing, that's a potential vulnerability.
>> With "protected_hardlinks" you are, in most cases, safe.
>> But, still, that program has a bug and having this feature enabled will
>> help you notice it soon.
>> For that matter, I'm using this patch on my system and I don't have any
>> program behaving like this.
>
> Hmmm.... a quick strace shows cp and vi doing stat("/tmp/foo") and then
> open(O_WRONLY|O_TRUNC) if it exists and O_CREATE|O_EXCL if it doesn't.
> I can't help feeling that is just hiding a race.

Yes, unfortunately, doing something like "cp somefile /tmp/" is a bad
practice that
in most cases will go unnoticed by this feature. Nevertheless there
are many other
real world vulnerability that it would have been able to detect.
Thank you very much for taking the time to do some experiments.

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-24 11:43         ` [kernel-hardening] " Salvatore Mesoraca
@ 2017-11-27  0:26           ` Solar Designer
  -1 siblings, 0 replies; 60+ messages in thread
From: Solar Designer @ 2017-11-27  0:26 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: David Laight, Alan Cox, linux-kernel, Kernel Hardening,
	linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook,
	Eric W. Biederman

On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote:
> 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> > From: Alan Cox
> >> Sent: 22 November 2017 16:52
> >>
> >> On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
> >>
> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> >> > group writable directories, even if the file doesn't exist yet.
> >> > With few exceptions (e.g. shared lock files based on flock())

Why would "shared lock files based on flock()" need O_CREAT without
O_EXCL?  Where specifically are they currently used that way?  My guess
would be a group-writable mail spool (that would be fcntl() locks on
Linux now, but it's the same thing for the purpose of this discussion),
but even then I don't see a need for such open flags.  If a program does
that, we could want to find out and revise it (if O_CREAT|O_EXCL fails,
retry without these to open the existing file, then flock() either way).

> >> Enough exceptions to make it a bad idea.

Maybe, but as Salvatore points out we're considering this for rather
limited use - easy opt-in policy enforcement during software development
and auditing (years ago, I was using an LKM for this purpose, which
caught some issues in Postfix that were promptly patched), and maybe on
specific production systems where the sysadmins know what they're doing.

> >> Firstly if you care this much *stop* having shared writable directories.
> >> We have namespaces, you don't need them. You can give every user their
> >> own /tmp etc.

This is good advice (and I've been doing similar with pam_mktemp, prior
to namespaces), but it's not exactly for the same use case.

Also, there are shared writable directories that have to remain shared
unless more things are reworked.  For example, mail spools and crontab
directories, which could be avoided by having the corresponding files in
user homes instead (and it's been done), but that's also a related risk
(a privileged process accessing a directory writable by a user, even if
accessing only for reading, lowering the risk impact).

> > Looks like a very bad idea to me as well.
> >
> > Doesn't this stop all shell redirects into a shared /tmp ?

The above sounds like it'd be something bad - but it's actually just
right for an opt-in policy like this.  A command like "program >
/tmp/file" is generally unsafe (unless the file was pre-created safely,
see below) and should be avoided.  The fact that a lot of documentation
gives commands of this kind only emphasizes the need for easy policy
enforcement against such misuses of /tmp.

Ideally, we'd stop most "shell redirects into a shared /tmp" - all but
those into files correctly pre-created with mktemp(1) and the like.

Technically, we could achieve similar by allowing O_CREAT without O_EXCL
if the file exists _and_ has the same owner as our current fsuid or the
owner is root.  Most scripts that use mktemp(1) then use that file
without switching privileges, so they'll continue working.  If there's
an exception to that on a system where I'd configure a policy like
what's discussed here, I'd rather learn of it and be interested in
looking at that unusual script.  That script would be taking a risk by
making one (pseudo-)user (or root) write into a file in /tmp created
(even if initially safely) by another pseudo-user (not root), thus
letting the latter pseudo-user attack the former (or root).

Ditto for programs using mkstemp(3), but then somehow going through the
pathname rather than the fd.  That's not ideal, but it's sometimes safe
and sometimes makes sense (such as when they need to exec other programs
only accepting a pathname), so the same exception and approach applies.

> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output
> > files - they'll all stop working.

That would be great.  Unfortunately (in this context), you've identified
exceptions, where programs try to be smart but are not necessarily safe.

It's beneficial to be able to easily stop at least some misuses of /tmp
and the like, even if we can't stop all.

> If some program does such a thing, that's a potential vulnerability.

Exactly.

> With "protected_hardlinks" you are, in most cases, safe.

And "protected_symlinks", and not in terms of data spoofing attacks via
FIFOs and regular files (the initial focus of this patchset).

> But, still, that program has a bug and having this feature enabled will
> help you notice it soon.
> For that matter, I'm using this patch on my system and I don't have any
> program behaving like this.
> 
> > If there are some directories where you need to force O_EXCL you
> > need to make it a property of the directory, not the kernel.
> >
> >         David

Good idea, but this would need to be checked and enforced by the kernel
anyhow, so it's a matter of coarse vs. fine policy.  Coarse is much
easier to implement and to start using, albeit perhaps mostly not by
general-purpose distros, but by a software developer/auditor, an
adventurous sysadmin, or a sysadmin team or a developer of specific
systems where a simple policy like this is known to be valid (e.g.,
where there's no expectation of arbitrary software being added, so a
simple rule like this can be introduced system-wide).

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-27  0:26           ` Solar Designer
  0 siblings, 0 replies; 60+ messages in thread
From: Solar Designer @ 2017-11-27  0:26 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: David Laight, Alan Cox, linux-kernel, Kernel Hardening,
	linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook,
	Eric W. Biederman

On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote:
> 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> > From: Alan Cox
> >> Sent: 22 November 2017 16:52
> >>
> >> On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
> >>
> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> >> > group writable directories, even if the file doesn't exist yet.
> >> > With few exceptions (e.g. shared lock files based on flock())

Why would "shared lock files based on flock()" need O_CREAT without
O_EXCL?  Where specifically are they currently used that way?  My guess
would be a group-writable mail spool (that would be fcntl() locks on
Linux now, but it's the same thing for the purpose of this discussion),
but even then I don't see a need for such open flags.  If a program does
that, we could want to find out and revise it (if O_CREAT|O_EXCL fails,
retry without these to open the existing file, then flock() either way).

> >> Enough exceptions to make it a bad idea.

Maybe, but as Salvatore points out we're considering this for rather
limited use - easy opt-in policy enforcement during software development
and auditing (years ago, I was using an LKM for this purpose, which
caught some issues in Postfix that were promptly patched), and maybe on
specific production systems where the sysadmins know what they're doing.

> >> Firstly if you care this much *stop* having shared writable directories.
> >> We have namespaces, you don't need them. You can give every user their
> >> own /tmp etc.

This is good advice (and I've been doing similar with pam_mktemp, prior
to namespaces), but it's not exactly for the same use case.

Also, there are shared writable directories that have to remain shared
unless more things are reworked.  For example, mail spools and crontab
directories, which could be avoided by having the corresponding files in
user homes instead (and it's been done), but that's also a related risk
(a privileged process accessing a directory writable by a user, even if
accessing only for reading, lowering the risk impact).

> > Looks like a very bad idea to me as well.
> >
> > Doesn't this stop all shell redirects into a shared /tmp ?

The above sounds like it'd be something bad - but it's actually just
right for an opt-in policy like this.  A command like "program >
/tmp/file" is generally unsafe (unless the file was pre-created safely,
see below) and should be avoided.  The fact that a lot of documentation
gives commands of this kind only emphasizes the need for easy policy
enforcement against such misuses of /tmp.

Ideally, we'd stop most "shell redirects into a shared /tmp" - all but
those into files correctly pre-created with mktemp(1) and the like.

Technically, we could achieve similar by allowing O_CREAT without O_EXCL
if the file exists _and_ has the same owner as our current fsuid or the
owner is root.  Most scripts that use mktemp(1) then use that file
without switching privileges, so they'll continue working.  If there's
an exception to that on a system where I'd configure a policy like
what's discussed here, I'd rather learn of it and be interested in
looking at that unusual script.  That script would be taking a risk by
making one (pseudo-)user (or root) write into a file in /tmp created
(even if initially safely) by another pseudo-user (not root), thus
letting the latter pseudo-user attack the former (or root).

Ditto for programs using mkstemp(3), but then somehow going through the
pathname rather than the fd.  That's not ideal, but it's sometimes safe
and sometimes makes sense (such as when they need to exec other programs
only accepting a pathname), so the same exception and approach applies.

> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output
> > files - they'll all stop working.

That would be great.  Unfortunately (in this context), you've identified
exceptions, where programs try to be smart but are not necessarily safe.

It's beneficial to be able to easily stop at least some misuses of /tmp
and the like, even if we can't stop all.

> If some program does such a thing, that's a potential vulnerability.

Exactly.

> With "protected_hardlinks" you are, in most cases, safe.

And "protected_symlinks", and not in terms of data spoofing attacks via
FIFOs and regular files (the initial focus of this patchset).

> But, still, that program has a bug and having this feature enabled will
> help you notice it soon.
> For that matter, I'm using this patch on my system and I don't have any
> program behaving like this.
> 
> > If there are some directories where you need to force O_EXCL you
> > need to make it a property of the directory, not the kernel.
> >
> >         David

Good idea, but this would need to be checked and enforced by the kernel
anyhow, so it's a matter of coarse vs. fine policy.  Coarse is much
easier to implement and to start using, albeit perhaps mostly not by
general-purpose distros, but by a software developer/auditor, an
adventurous sysadmin, or a sysadmin team or a developer of specific
systems where a simple policy like this is known to be valid (e.g.,
where there's no expectation of arbitrary software being added, so a
simple rule like this can be introduced system-wide).

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-11-22  8:01 ` [kernel-hardening] " Salvatore Mesoraca
                   ` (2 preceding siblings ...)
  (?)
@ 2017-11-27  1:14 ` Solar Designer
  2017-11-27 13:18   ` Solar Designer
  2017-11-30 14:37   ` Salvatore Mesoraca
  -1 siblings, 2 replies; 60+ messages in thread
From: Solar Designer @ 2017-11-27  1:14 UTC (permalink / raw)
  To: Salvatore Mesoraca; +Cc: Kernel Hardening

Replying to kernel-hardening only, so we can consider the below thoughts
in a smaller group before bothering LKML:

On Wed, Nov 22, 2017 at 09:01:44AM +0100, Salvatore Mesoraca wrote:
> This patch-set introduces two separate features aimed at restricting
> dangerous open in world or group writable sticky directories.
> The purpose is to prevent exploitable bugs in user-space programs
> that don't access sticky directories in the proper way.
> The first patch prevents the O_CREAT open of FIFOs and regular files
> in world or group writable sticky directories, if they already exists
> and are owned by someone else.
> The second patch prevents O_CREAT open in world or group writable
> sticky when the O_EXCL flag is not set, even if the file doesn't
> exist yet.

When I suggested the O_CREAT-without-O_EXCL checks, I didn't mean you'd
try to introduce them at the same time with the restrictions on FIFOs
and regular files.  I think that bundling these together might be a
recipe for never getting any of them in, or for getting in the latter
with unnecessarily limited scope.  I think we need an optional policy
against O_CREAT-without-O_EXCL not only for sticky directories, but also
for other directories writable by other than the current fsuid - there
may be several levels or individual flags here.

So maybe unbundle these or at least avoid forever limiting them to
sticky directories (don't include "sticky" in the sysctl name).

Meanwhile, here's a scenario where I think the FIFO restrictions are
most obviously still worthwhile: attacks on system bootup scripts, where
the attacker would plant a FIFO on a persistent /tmp or the like and the
system would access it on next bootup.  The attacker's exploit program
would not yet be running during the bootup, so it wouldn't be able to
even try winning a race, but a FIFO would be able to keep the vulnerable
service (or its startup script, etc.) stuck until the attacker has
access again.  On the other hand, /tmp is commonly on tmpfs these days,
whereas some other shared writable directories (not necessarily world
writable and not necessarily sticky) are persistent (mail and cron job
spools, preformatted man page cache, etc.)  So maybe this, too, is a
reason to eventually extend these protections beyond sticky directories
(of course, only as an option).  We shouldn't using naming that would be
against such extension, or/and we shouldn't think of the
O_CREAT-without-O_EXCL checks as permanently only for extra detection
and reporting - they may also prevent more attacks if we eventually
extend them to non-sticky, but not (yet) similarly extent the file type
specific protections.

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-11-27  1:14 ` [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous " Solar Designer
@ 2017-11-27 13:18   ` Solar Designer
  2017-11-30 14:38     ` Salvatore Mesoraca
  2017-11-30 14:37   ` Salvatore Mesoraca
  1 sibling, 1 reply; 60+ messages in thread
From: Solar Designer @ 2017-11-27 13:18 UTC (permalink / raw)
  To: Salvatore Mesoraca; +Cc: Kernel Hardening

Salvatore,

On Mon, Nov 27, 2017 at 02:14:41AM +0100, Solar Designer wrote:
> When I suggested the O_CREAT-without-O_EXCL checks, I didn't mean you'd
> try to introduce them at the same time with the restrictions on FIFOs
> and regular files.

I re-read what I wrote in that discussion in September, and I can see
how it was confusing: first I suggested those checks as possible extra
settings in the restrictions on FIFOs and regular files, then agreed
with you that we need a separate sysctl in a separate patch - but we
never discussed whether that separate patch should be part of the same
patch series or not.

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-11-27  1:14 ` [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous " Solar Designer
  2017-11-27 13:18   ` Solar Designer
@ 2017-11-30 14:37   ` Salvatore Mesoraca
  2017-11-30 19:05     ` Solar Designer
  1 sibling, 1 reply; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-30 14:37 UTC (permalink / raw)
  To: Solar Designer; +Cc: Kernel Hardening

2017-11-27 2:14 GMT+01:00 Solar Designer <solar@openwall.com>:
> Replying to kernel-hardening only, so we can consider the below thoughts
> in a smaller group before bothering LKML:
>
> On Wed, Nov 22, 2017 at 09:01:44AM +0100, Salvatore Mesoraca wrote:
> > This patch-set introduces two separate features aimed at restricting
> > dangerous open in world or group writable sticky directories.
> > The purpose is to prevent exploitable bugs in user-space programs
> > that don't access sticky directories in the proper way.
> > The first patch prevents the O_CREAT open of FIFOs and regular files
> > in world or group writable sticky directories, if they already exists
> > and are owned by someone else.
> > The second patch prevents O_CREAT open in world or group writable
> > sticky when the O_EXCL flag is not set, even if the file doesn't
> > exist yet.
>
> When I suggested the O_CREAT-without-O_EXCL checks, I didn't mean you'd
> try to introduce them at the same time with the restrictions on FIFOs
> and regular files.  I think that bundling these together might be a
> recipe for never getting any of them in, or for getting in the latter
> with unnecessarily limited scope.

Yes, I think you are right. I'll split them.

> I think we need an optional policy
> against O_CREAT-without-O_EXCL not only for sticky directories, but also
> for other directories writable by other than the current fsuid - there
> may be several levels or individual flags here.
>
> So maybe unbundle these or at least avoid forever limiting them to
> sticky directories (don't include "sticky" in the sysctl name).

Actually I never liked that sysctl name :) I'll try to come up with something
less ugly and that doesn't include the "sticky" word.
If I implement something like what Matthew proposed[1] it will be easy to
extend scope and functionalities of this feature without complicating too much
the interface.

> Meanwhile, here's a scenario where I think the FIFO restrictions are
> most obviously still worthwhile: attacks on system bootup scripts, where
> the attacker would plant a FIFO on a persistent /tmp or the like and the
> system would access it on next bootup.  The attacker's exploit program
> would not yet be running during the bootup, so it wouldn't be able to
> even try winning a race, but a FIFO would be able to keep the vulnerable
> service (or its startup script, etc.) stuck until the attacker has
> access again.  On the other hand, /tmp is commonly on tmpfs these days,
> whereas some other shared writable directories (not necessarily world
> writable and not necessarily sticky) are persistent (mail and cron job
> spools, preformatted man page cache, etc.)  So maybe this, too, is a
> reason to eventually extend these protections beyond sticky directories
> (of course, only as an option).  We shouldn't using naming that would be
> against such extension, or/and we shouldn't think of the
> O_CREAT-without-O_EXCL checks as permanently only for extra detection
> and reporting - they may also prevent more attacks if we eventually
> extend them to non-sticky, but not (yet) similarly extent the file type
> specific protections.

So, are you suggesting that I should extend "O_CREAT-without-O_EXCL"
and "FIFOs restrictions" to work (optionally) on non-sticky directories too,
while leaving untouched (for the moment) "normal files restrictions"?
Thank you very much for you suggestions,

Salvatore


[1] https://lkml.org/lkml/2017/11/22/319

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-11-27 13:18   ` Solar Designer
@ 2017-11-30 14:38     ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-30 14:38 UTC (permalink / raw)
  To: Solar Designer; +Cc: Kernel Hardening

2017-11-27 14:18 GMT+01:00 Solar Designer <solar@openwall.com>:
> Salvatore,
>
> On Mon, Nov 27, 2017 at 02:14:41AM +0100, Solar Designer wrote:
>> When I suggested the O_CREAT-without-O_EXCL checks, I didn't mean you'd
>> try to introduce them at the same time with the restrictions on FIFOs
>> and regular files.
>
> I re-read what I wrote in that discussion in September, and I can see
> how it was confusing: first I suggested those checks as possible extra
> settings in the restrictions on FIFOs and regular files, then agreed
> with you that we need a separate sysctl in a separate patch - but we
> never discussed whether that separate patch should be part of the same
> patch series or not.

No problem, misunderstandings happen.
I really appreciate your help with this patch!

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-27  0:26           ` [kernel-hardening] " Solar Designer
@ 2017-11-30 14:39             ` Salvatore Mesoraca
  -1 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-30 14:39 UTC (permalink / raw)
  To: Solar Designer
  Cc: David Laight, Alan Cox, linux-kernel, Kernel Hardening,
	linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook,
	Eric W. Biederman

2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>:
> On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote:
> > 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> > > From: Alan Cox
> > >> Sent: 22 November 2017 16:52
> > >>
> > >> On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
> > >>
> > >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> > >> > group writable directories, even if the file doesn't exist yet.
> > >> > With few exceptions (e.g. shared lock files based on flock())
>
> Why would "shared lock files based on flock()" need O_CREAT without
> O_EXCL?  Where specifically are they currently used that way?

I don't think that they *need* to act like this, but this is how
util-linux's flock(1) currently works.
And it doesn't seem an unreasonable behavior from their perspective,
so maybe other programs do that too.
I was citing that case just to make it clear that, if someone gets
a warning because of flock(1), they shouldn't be worried about it.
That behavior can be certainly avoided, but of course it isn't a
security problem per se.

> If a program does
> that, we could want to find out and revise it (if O_CREAT|O_EXCL fails,
> retry without these to open the existing file, then flock() either way).

Yes, this would probably be the best thing to do, good idea.
Thanks again for your time,

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-30 14:39             ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-11-30 14:39 UTC (permalink / raw)
  To: Solar Designer
  Cc: David Laight, Alan Cox, linux-kernel, Kernel Hardening,
	linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook,
	Eric W. Biederman

2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>:
> On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote:
> > 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> > > From: Alan Cox
> > >> Sent: 22 November 2017 16:52
> > >>
> > >> On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
> > >>
> > >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> > >> > group writable directories, even if the file doesn't exist yet.
> > >> > With few exceptions (e.g. shared lock files based on flock())
>
> Why would "shared lock files based on flock()" need O_CREAT without
> O_EXCL?  Where specifically are they currently used that way?

I don't think that they *need* to act like this, but this is how
util-linux's flock(1) currently works.
And it doesn't seem an unreasonable behavior from their perspective,
so maybe other programs do that too.
I was citing that case just to make it clear that, if someone gets
a warning because of flock(1), they shouldn't be worried about it.
That behavior can be certainly avoided, but of course it isn't a
security problem per se.

> If a program does
> that, we could want to find out and revise it (if O_CREAT|O_EXCL fails,
> retry without these to open the existing file, then flock() either way).

Yes, this would probably be the best thing to do, good idea.
Thanks again for your time,

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-30 14:39             ` [kernel-hardening] " Salvatore Mesoraca
  (?)
@ 2017-11-30 14:57             ` Ian Campbell
  2017-11-30 16:30               ` [kernel-hardening] " Solar Designer
  -1 siblings, 1 reply; 60+ messages in thread
From: Ian Campbell @ 2017-11-30 14:57 UTC (permalink / raw)
  To: Salvatore Mesoraca, Solar Designer
  Cc: David Laight, Alan Cox, linux-kernel, Kernel Hardening,
	linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook,
	Eric W. Biederman

On Thu, 2017-11-30 at 15:39 +0100, Salvatore Mesoraca wrote:
> 2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>:
> > On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote:
> > > 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>
> > > :
> > > > From: Alan Cox
> > > > > Sent: 22 November 2017 16:52
> > > > > 
> > > > > On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.meso
> > > > > raca16@gmail.com> wrote:
> > > > > 
> > > > > > Disallows O_CREAT open missing the O_EXCL flag, in world or
> > > > > > group writable directories, even if the file doesn't exist
> > > > > > yet.
> > > > > > With few exceptions (e.g. shared lock files based on
> > > > > > flock())
> > 
> > Why would "shared lock files based on flock()" need O_CREAT without
> > O_EXCL?  Where specifically are they currently used that way?
> 
> I don't think that they *need* to act like this, but this is how
> util-linux's flock(1) currently works.
> And it doesn't seem an unreasonable behavior from their perspective,

I thought that too, specifically I reasoned that using O_EXCL would
defeat the purpose of the _shared_ flock(), wouldn't it?

Ian.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-30 14:57             ` Ian Campbell
@ 2017-11-30 16:30               ` Solar Designer
  2017-12-05 10:21                 ` Salvatore Mesoraca
  0 siblings, 1 reply; 60+ messages in thread
From: Solar Designer @ 2017-11-30 16:30 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Salvatore Mesoraca, David Laight, Alan Cox, linux-kernel,
	Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn,
	Kees Cook, Eric W. Biederman, H. Peter Anvin, Karel Zak

Replying to Salvatore and Ian at once, and CC'ing H. Peter Anvin and
Karel Zak for util-linux flock(1).

On Thu, Nov 30, 2017 at 02:57:06PM +0000, Ian Campbell wrote:
> On Thu, 2017-11-30 at 15:39 +0100, Salvatore Mesoraca wrote:
> > 2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > Why would "shared lock files based on flock()" need O_CREAT without
> > > O_EXCL?  Where specifically are they currently used that way?
> > 
> > I don't think that they *need* to act like this, but this is how
> > util-linux's flock(1) currently works.

Oh, but you never referred to flock(1) in there, so I read flock() as
implying flock(2).  I think util-linux's flock(1) is relatively obscure,
and as far as I can tell it isn't documented anywhere whether it may be
used on untrusted lock file directories or not.  A revision of the
flock(1) man page seen on RHEL7 gives really weird examples using /tmp.
The current util-linux-2.31/sys-utils/flock.1 is similar.  strace'ing
those examples, I see flock(1) literally uses the /tmp directory itself
(not a file inside) as the lock, which surprisingly appears to work.
Checking the util-linux tree, it looks like this was added as a feature.
I suppose the good news is this doesn't appear to allow for worse than a
DoS against the script using flock(1) (since a different user can also
take that lock), unless any of the upper level directories are also
untrusted.  The man page as seen at https://linux.die.net/man/1/flock
currently only lists a more complex example with /var/lock/mylockfile,
where flock(1) itself is asked to access it via a pre-opened fd.
Permissions on /var/lock vary between distros (e.g., a RHEL6'ish system
I just checked has it as 775 root:lock, whereas a RHEL7'ish has it as
symlink to ../run/lock, which is 755 root:root).  Anyway, these are just
examples.  The reality is people will use flock(1) in various directories,
some of them trusted and some not.  If our proposed policy can detect and
optionally break some uses in untrusted directories, that's good since
such uses are currently unsafe (see below).

> > And it doesn't seem an unreasonable behavior from their perspective,
> 
> I thought that too, specifically I reasoned that using O_EXCL would
> defeat the purpose of the _shared_ flock(), wouldn't it?

No.  O_EXCL means the attempt to O_CREAT will fail, at which point the
program can retry without both of these flags.  (The actual lock is
obtained later via flock(2) or fcntl(2) anyway.)  In fact, flock(1)
already does something similar, as tested using the very first example
from the man page on a RHEL7'ish system:

$ strace flock /tmp -c cat
[...]
open("/tmp", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = -1 EISDIR (Is a directory)
open("/tmp", O_RDONLY|O_NOCTTY)         = 3
flock(3, LOCK_EX)                       = 0

As you can see, when O_CREAT failed, the program retried without that
flag.  It could just as easily have tried O_CREAT|O_EXCL, then dropped
both at once.  Testing with a lock file (rather than lock directory):

$ strace flock /tmp/lockfile -c cat
[...]
open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
flock(3, LOCK_EX)                       = 0

This use of flock(1) would be a worse vulnerability, not limited to DoS
against the script itself but also allowing for privilege escalation
unless (sym)link restrictions are enabled in the kernel.  Adding O_EXCL
would help (reduce the impact back to DoS against itself), and would
require that the retry logic (like what is seen in the lock directory
example above) be present.

On Thu, Nov 30, 2017 at 03:39:48PM +0100, Salvatore Mesoraca wrote:
> so maybe other programs do that too.

Maybe, and they would be similarly vulnerable if they do that in
untrusted directories, and they would also need to be fixed.

A subtle case is when something like this is technically done in a
directory writable by someone else, but is not necessarily crossing
what the program's author or the sysadmin recognize as a privilege
boundary.  Maybe they have accepted that this one pseudo-user can
attack that other one anyway, such as because under a certain daemon's
privsep model the would-be attacking pseudo-user is necessarily more
privileged anyway.  This is why we shouldn't by default extend this
policy to all directories writable by someone else, but instead we
consider introducing a sysctl with varying policy levels - initially
focusing on sticky directories writable by someone else and only later
optionally extending to non-sticky directories writable by someone else.
The latter mode would have even greater focus on development and
debugging rather than on production use, although I imagine that
specific systems could eventually afford it in production as well.

> I was citing that case just to make it clear that, if someone gets
> a warning because of flock(1), they shouldn't be worried about it.

Checking the util-linux-2.31/sys-utils/flock.c, I see that it in fact
doesn't retry without O_CREAT in the non-directory case.  That would
need to be corrected, along with introduction of O_EXCL.

> That behavior can be certainly avoided, but of course it isn't a
> security problem per se.

I think it is a security problem per se, except in the "subtle case"
above, and it's great that our proposed policy would catch it.

Perhaps flock(1) should be extended as follows:

> > If a program does
> > that, we could want to find out and revise it (if O_CREAT|O_EXCL fails,
> > retry without these to open the existing file, then flock() either way).
> 
> Yes, this would probably be the best thing to do, good idea.

util-linux-2.31/sys-utils/flock.c currently has:

static int open_file(const char *filename, int *flags)
{

	int fd;
	int fl = *flags == 0 ? O_RDONLY : *flags;

	errno = 0;
	fl |= O_NOCTTY | O_CREAT;
	fd = open(filename, fl, 0666);

	/* Linux doesn't like O_CREAT on a directory, even though it
	 * should be a no-op; POSIX doesn't allow O_RDWR or O_WRONLY
	 */
	if (fd < 0 && errno == EISDIR) {
		fl = O_RDONLY | O_NOCTTY;
		fd = open(filename, fl);
	}

I think this should be revised to:

	fl |= O_NOCTTY | O_CREAT | O_EXCL;
	fd = open(filename, fl, 0666);

	/* Linux doesn't like O_CREAT on a directory, even though it
	 * should be a no-op; POSIX doesn't allow O_RDWR or O_WRONLY
	 */
	if (fd < 0) {
		if (errno == EISDIR)
			fl = O_RDONLY | O_NOCTTY;
		else
			fl &= ~(O_CREAT | O_EXCL);
		fd = open(filename, fl);
	}

This adds O_EXCL and retry without O_CREAT|O_EXCL for non-directories.

[ The pre-umask permissions of 0666 are also a potential concern since
many users have a relaxed umask as distro default, but changing these to
0600 would break some otherwise valid uses where a lock file may be
shared between different users on purpose.  Maybe a future major version
of flock(1) could default to 0600 and/or accept a "-m" option to set the
lock file's mode in case a new file gets created.  mktemp(1) may be
viewed as precedent: it uses pre-umask permissions of 0600.  Lock files
are arguably somewhat similar to temporary files.  This is beyond scope
of this thread and LKML, though, and should be discussed elsewhere. ]

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-22  8:01   ` [kernel-hardening] " Salvatore Mesoraca
@ 2017-11-30 16:53     ` David Laight
  -1 siblings, 0 replies; 60+ messages in thread
From: David Laight @ 2017-11-30 16:53 UTC (permalink / raw)
  To: 'Salvatore Mesoraca', linux-kernel
  Cc: Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn,
	Kees Cook, Solar Designer, Eric W. Biederman

From: Salvatore Mesoraca
> Sent: 22 November 2017 08:02
> 
> Disallows O_CREAT open missing the O_EXCL flag, in world or
> group writable directories, even if the file doesn't exist yet.
> With few exceptions (e.g. shared lock files based on flock())
> if a program tries to open a file, in a sticky directory,
> with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> This feature allows to detect and potentially block programs that
> act this way, it can be used to find vulnerabilities (like those
> prevented by patch #1) and to do policy enforcement.

(Going back to the original post)

I presume the 'vulnerabilities' are related to symlinks being created
just before the open?

Trouble is this change breaks a lot of general use of /tmp.
I always assumed that code that cared would use O_EXCL and
everything else wasn't worth subverting.

I found code in vi (and elsewhere) that subverted these checks
by opening with O_WRONLY if stat() showed the file existed and
O_CREAT|O_EXCL if it didn't.

I'm pretty sure that traditionally a lot of these opens were done
with O_CREAT|O_TRUNC.
Implementing that as unlink() followed by a create would stop
'random' (ok all) symlinks being followed.

Overall I'm pretty sure this change will break things badly somewhere.

	David

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-30 16:53     ` David Laight
  0 siblings, 0 replies; 60+ messages in thread
From: David Laight @ 2017-11-30 16:53 UTC (permalink / raw)
  To: 'Salvatore Mesoraca', linux-kernel
  Cc: Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn,
	Kees Cook, Solar Designer, Eric W. Biederman

From: Salvatore Mesoraca
> Sent: 22 November 2017 08:02
> 
> Disallows O_CREAT open missing the O_EXCL flag, in world or
> group writable directories, even if the file doesn't exist yet.
> With few exceptions (e.g. shared lock files based on flock())
> if a program tries to open a file, in a sticky directory,
> with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> This feature allows to detect and potentially block programs that
> act this way, it can be used to find vulnerabilities (like those
> prevented by patch #1) and to do policy enforcement.

(Going back to the original post)

I presume the 'vulnerabilities' are related to symlinks being created
just before the open?

Trouble is this change breaks a lot of general use of /tmp.
I always assumed that code that cared would use O_EXCL and
everything else wasn't worth subverting.

I found code in vi (and elsewhere) that subverted these checks
by opening with O_WRONLY if stat() showed the file existed and
O_CREAT|O_EXCL if it didn't.

I'm pretty sure that traditionally a lot of these opens were done
with O_CREAT|O_TRUNC.
Implementing that as unlink() followed by a create would stop
'random' (ok all) symlinks being followed.

Overall I'm pretty sure this change will break things badly somewhere.

	David

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-30 16:53     ` [kernel-hardening] " David Laight
@ 2017-11-30 17:51       ` Solar Designer
  -1 siblings, 0 replies; 60+ messages in thread
From: Solar Designer @ 2017-11-30 17:51 UTC (permalink / raw)
  To: David Laight
  Cc: 'Salvatore Mesoraca',
	linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Eric W. Biederman

On Thu, Nov 30, 2017 at 04:53:06PM +0000, David Laight wrote:
> From: Salvatore Mesoraca
> > if a program tries to open a file, in a sticky directory,
> > with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> > This feature allows to detect and potentially block programs that
> > act this way, it can be used to find vulnerabilities (like those
> > prevented by patch #1) and to do policy enforcement.

> I presume the 'vulnerabilities' are related to symlinks being created
> just before the open?

That's one way to exploit them.  We already have a sysctl to try and
block that specific attack, and we're considering adding more, for other
attacks.  But we'd also like an easy way to learn of the vulnerabilities
without anyone trying to exploit them yet, hence this patch.

> Trouble is this change breaks a lot of general use of /tmp.

That's general misuse of /tmp.  Things like "command > /tmp/file"
without having pre-created the file with O_EXCL e.g. by mktemp(1).

> I always assumed that code that cared would use O_EXCL and
> everything else wasn't worth subverting.

Opinions will vary on whether it's worth subverting or not, and someone
may reasonably want to configure this differently on different systems.

> I found code in vi (and elsewhere) that subverted these checks
> by opening with O_WRONLY if stat() showed the file existed and
> O_CREAT|O_EXCL if it didn't.

Yes, such misuses of /tmp and the like by use of those programs - where
the potential consequences would often be less severe (if your example
above is correct, it sounds like it'd be data spoofing but not
overwriting another file over a link) - could remain unnoticed.

> I'm pretty sure that traditionally a lot of these opens were done
> with O_CREAT|O_TRUNC.
> Implementing that as unlink() followed by a create would stop
> 'random' (ok all) symlinks being followed.

That's racy.  We have O_EXCL, and it must be used along with O_CREAT
when the directory is untrusted.  (If it were only about symlinks, we
also already have O_NOFOLLOW.)

> Overall I'm pretty sure this change will break things badly somewhere.

Sure.  We want it to visibly break what was subtly broken, so that the
breakage can be seen and corrected without an attempted attack.

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-11-30 17:51       ` Solar Designer
  0 siblings, 0 replies; 60+ messages in thread
From: Solar Designer @ 2017-11-30 17:51 UTC (permalink / raw)
  To: David Laight
  Cc: 'Salvatore Mesoraca',
	linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Eric W. Biederman

On Thu, Nov 30, 2017 at 04:53:06PM +0000, David Laight wrote:
> From: Salvatore Mesoraca
> > if a program tries to open a file, in a sticky directory,
> > with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> > This feature allows to detect and potentially block programs that
> > act this way, it can be used to find vulnerabilities (like those
> > prevented by patch #1) and to do policy enforcement.

> I presume the 'vulnerabilities' are related to symlinks being created
> just before the open?

That's one way to exploit them.  We already have a sysctl to try and
block that specific attack, and we're considering adding more, for other
attacks.  But we'd also like an easy way to learn of the vulnerabilities
without anyone trying to exploit them yet, hence this patch.

> Trouble is this change breaks a lot of general use of /tmp.

That's general misuse of /tmp.  Things like "command > /tmp/file"
without having pre-created the file with O_EXCL e.g. by mktemp(1).

> I always assumed that code that cared would use O_EXCL and
> everything else wasn't worth subverting.

Opinions will vary on whether it's worth subverting or not, and someone
may reasonably want to configure this differently on different systems.

> I found code in vi (and elsewhere) that subverted these checks
> by opening with O_WRONLY if stat() showed the file existed and
> O_CREAT|O_EXCL if it didn't.

Yes, such misuses of /tmp and the like by use of those programs - where
the potential consequences would often be less severe (if your example
above is correct, it sounds like it'd be data spoofing but not
overwriting another file over a link) - could remain unnoticed.

> I'm pretty sure that traditionally a lot of these opens were done
> with O_CREAT|O_TRUNC.
> Implementing that as unlink() followed by a create would stop
> 'random' (ok all) symlinks being followed.

That's racy.  We have O_EXCL, and it must be used along with O_CREAT
when the directory is untrusted.  (If it were only about symlinks, we
also already have O_NOFOLLOW.)

> Overall I'm pretty sure this change will break things badly somewhere.

Sure.  We want it to visibly break what was subtly broken, so that the
breakage can be seen and corrected without an attempted attack.

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-11-30 14:37   ` Salvatore Mesoraca
@ 2017-11-30 19:05     ` Solar Designer
  2017-11-30 19:14       ` Solar Designer
  2017-12-05 10:13       ` Salvatore Mesoraca
  0 siblings, 2 replies; 60+ messages in thread
From: Solar Designer @ 2017-11-30 19:05 UTC (permalink / raw)
  To: Salvatore Mesoraca; +Cc: Kernel Hardening

On Thu, Nov 30, 2017 at 03:37:29PM +0100, Salvatore Mesoraca wrote:
> 2017-11-27 2:14 GMT+01:00 Solar Designer <solar@openwall.com>:
> > When I suggested the O_CREAT-without-O_EXCL checks, I didn't mean you'd
> > try to introduce them at the same time with the restrictions on FIFOs
> > and regular files.  I think that bundling these together might be a
> > recipe for never getting any of them in, or for getting in the latter
> > with unnecessarily limited scope.
> 
> Yes, I think you are right. I'll split them.

Thanks.

> > I think we need an optional policy
> > against O_CREAT-without-O_EXCL not only for sticky directories, but also
> > for other directories writable by other than the current fsuid - there
> > may be several levels or individual flags here.
> >
> > So maybe unbundle these or at least avoid forever limiting them to
> > sticky directories (don't include "sticky" in the sysctl name).
> 
> Actually I never liked that sysctl name :) I'll try to come up with something
> less ugly and that doesn't include the "sticky" word.

So the current name is "protected_sticky_child_create" (I couldn't even
recall it, had to look it up for this reply).  This unnecessarily
bundles this potentially more general policy stuff with the existing
"protections" against specific attacks, unnecessarily limits scope to
"sticky" and "create", and talks about some "child".  How about we use
something totally different, focusing on "policy"?  It could be simply
"policy" (we're already in "fs"), or if that won't fly then how about
"security_policy" or "dac_policy"?  We will be imposing extra
restrictions on top of usual Unix discretionary access control
permission bits while not going all the way to mandatory access control
(not tying objects to subjects).  So in a sense we'll have an extension
of Unix DAC.

BTW, this and all of fs.protected* should be configurable per-container.

> If I implement something like what Matthew proposed[1] it will be easy to
> extend scope and functionalities of this feature without complicating too much
> the interface.

> [1] https://lkml.org/lkml/2017/11/22/319

Right.  I tried thinking of a way to specify all reasonable combinations
without the likely unreasonable ones, but couldn't come up with anything
elegant.  So I'm fine with Matthew's proposal as-is.

A new thought: a directory that has someone else as its owner is for our
purposes effectively the same as a group-writable directory.  So maybe
whatever we'll implement for group-writable directories should also be
done for directories that have other than the current fsuid as their
owner.  It's currently very uncommon to have directories with the sticky
bit set that are not at least group-writable, so this will rarely make a
difference (and when it does, that's just right), but also it provides a
way to explicitly include any directory under this monitoring (if the
group-writable protection is on) - e.g., if a sysadmin wants this
monitoring for users' home directories, they can change permissions for
those from e.g. 700 to 1700.  This could be handy for development and
auditing of software, even though in production it could be easily
circumvented by the directory owner (who can remove the sticky bit,
which we should document to avoid providing a false sense of security)
and it won't automatically apply to subdirectories.  It'd also cover
part of what we intend to achieve later by possibly extending the
feature to non-sticky directories, where we might also want to treat
different owner the same as group-writable (without the circumvention).

> > Meanwhile, here's a scenario where I think the FIFO restrictions are
> > most obviously still worthwhile: attacks on system bootup scripts, where
> > the attacker would plant a FIFO on a persistent /tmp or the like and the
> > system would access it on next bootup.  The attacker's exploit program
> > would not yet be running during the bootup, so it wouldn't be able to
> > even try winning a race, but a FIFO would be able to keep the vulnerable
> > service (or its startup script, etc.) stuck until the attacker has
> > access again.  On the other hand, /tmp is commonly on tmpfs these days,
> > whereas some other shared writable directories (not necessarily world
> > writable and not necessarily sticky) are persistent (mail and cron job
> > spools, preformatted man page cache, etc.)  So maybe this, too, is a
> > reason to eventually extend these protections beyond sticky directories
> > (of course, only as an option).  We shouldn't using naming that would be
> > against such extension, or/and we shouldn't think of the
> > O_CREAT-without-O_EXCL checks as permanently only for extra detection
> > and reporting - they may also prevent more attacks if we eventually
> > extend them to non-sticky, but not (yet) similarly extent the file type
> > specific protections.
> 
> So, are you suggesting that I should extend "O_CREAT-without-O_EXCL"
> and "FIFOs restrictions" to work (optionally) on non-sticky directories too,
> while leaving untouched (for the moment) "normal files restrictions"?

No, I think all of these and the existing symlink restrictions should
potentially be extended "to work (optionally) on non-sticky directories
too", but perhaps with separate patches later.

An even further extension may be to cover non-O_CREAT: writing or/and
reading an existing file in an untrusted directory is also potentially
unsafe.  Unfortunately, we can't reliably know whether the program
possibly takes precautions by using lstat() and fstat() and comparing
st_dev/st_ino, so we won't be able to distinguish likely unsafe and
likely mostly safe accesses, but we'll be able to flag all of them for
manual analysis on a developer's or an auditor's system.  A reasonable
strict policy one might want to follow is to have all accesses done as
the right fsuid, without needing those unreliable st_dev/st_ino checks.
(They're unreliable because of potential side-effects on open() and
inode reuse.)  For example, we chose this strict policy in Openwall's
"tcb suite" and shadow suite patches: http://www.openwall.com/tcb/

Of course, then there's the question on whether something exotic(?) like
this should be in the kernel.

To me, it's like a "gcc -Wall" for filesystem accesses.  Sure it can
"falsely" detect many technically valid uses, but it's also helpful to
improve our filesystem access safety.

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-11-30 19:05     ` Solar Designer
@ 2017-11-30 19:14       ` Solar Designer
  2017-12-05 10:13       ` Salvatore Mesoraca
  1 sibling, 0 replies; 60+ messages in thread
From: Solar Designer @ 2017-11-30 19:14 UTC (permalink / raw)
  To: Salvatore Mesoraca; +Cc: Kernel Hardening

On Thu, Nov 30, 2017 at 08:05:39PM +0100, Solar Designer wrote:
> A new thought: a directory that has someone else as its owner is for our
> purposes effectively the same as a group-writable directory.  So maybe
> whatever we'll implement for group-writable directories should also be
> done for directories that have other than the current fsuid as their
> owner.  It's currently very uncommon to have directories with the sticky
> bit set that are not at least group-writable, so this will rarely make a
> difference (and when it does, that's just right), but also it provides a
> way to explicitly include any directory under this monitoring (if the
> group-writable protection is on) - e.g., if a sysadmin wants this
> monitoring for users' home directories, they can change permissions for
> those from e.g. 700 to 1700.  This could be handy for development and
> auditing of software, even though in production it could be easily
> circumvented by the directory owner (who can remove the sticky bit,
> which we should document to avoid providing a false sense of security)
> and it won't automatically apply to subdirectories.  It'd also cover
> part of what we intend to achieve later by possibly extending the
> feature to non-sticky directories, where we might also want to treat
> different owner the same as group-writable (without the circumvention).

Of course, where I wrote "other than the current fsuid", it should be "a
non-root user other than the current fsuid".  We trust root.

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-30 17:51       ` [kernel-hardening] " Solar Designer
@ 2017-12-01  9:46         ` David Laight
  -1 siblings, 0 replies; 60+ messages in thread
From: David Laight @ 2017-12-01  9:46 UTC (permalink / raw)
  To: 'Solar Designer'
  Cc: 'Salvatore Mesoraca',
	linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Eric W. Biederman

From: Solar Designer
> Sent: 30 November 2017 17:52
> 
> On Thu, Nov 30, 2017 at 04:53:06PM +0000, David Laight wrote:
> > From: Salvatore Mesoraca
> > > if a program tries to open a file, in a sticky directory,
> > > with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> > > This feature allows to detect and potentially block programs that
> > > act this way, it can be used to find vulnerabilities (like those
> > > prevented by patch #1) and to do policy enforcement.
> 
> > I presume the 'vulnerabilities' are related to symlinks being created
> > just before the open?
> 
> That's one way to exploit them.  We already have a sysctl to try and
> block that specific attack, and we're considering adding more, for other
> attacks.  But we'd also like an easy way to learn of the vulnerabilities
> without anyone trying to exploit them yet, hence this patch.
> 
> > Trouble is this change breaks a lot of general use of /tmp.
> 
> That's general misuse of /tmp.  Things like "command > /tmp/file"
> without having pre-created the file with O_EXCL e.g. by mktemp(1).

I'm sorry, I've been using Unix for over 30 years.
/tmp is a place that temporary files were created - nothing special.
Traditionally it was emptied on every boot.
There was never anything that required files be created in any
specific way.

> > I always assumed that code that cared would use O_EXCL and
> > everything else wasn't worth subverting.
> 
> Opinions will vary on whether it's worth subverting or not, and someone
> may reasonably want to configure this differently on different systems.
> 
> > I found code in vi (and elsewhere) that subverted these checks
> > by opening with O_WRONLY if stat() showed the file existed and
> > O_CREAT|O_EXCL if it didn't.
> 
> Yes, such misuses of /tmp and the like by use of those programs - where
> the potential consequences would often be less severe (if your example
> above is correct, it sounds like it'd be data spoofing but not
> overwriting another file over a link) - could remain unnoticed.

It seemed to me that code had been added to avoid issues caused by
this policing of opens.
But the code added is itself racy - so makes the whole thing pointless.

> > I'm pretty sure that traditionally a lot of these opens were done
> > with O_CREAT|O_TRUNC.
> > Implementing that as unlink() followed by a create would stop
> > 'random' (ok all) symlinks being followed.
> 
> That's racy.  We have O_EXCL, and it must be used along with O_CREAT
> when the directory is untrusted.  (If it were only about symlinks, we
> also already have O_NOFOLLOW.)

Not racy if done in the kernel itself.

> > Overall I'm pretty sure this change will break things badly somewhere.
> 
> Sure.  We want it to visibly break what was subtly broken, so that the
> breakage can be seen and corrected without an attempted attack.

Maybe, but it will break some user system when they do a kernel update.
It won't necessarily break a developer's system first.

And, AFAICT, there is already code that subverts any tests by doing
a check for the file existing and then selecting between two different
types on open.

	David

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-12-01  9:46         ` David Laight
  0 siblings, 0 replies; 60+ messages in thread
From: David Laight @ 2017-12-01  9:46 UTC (permalink / raw)
  To: 'Solar Designer'
  Cc: 'Salvatore Mesoraca',
	linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Eric W. Biederman

From: Solar Designer
> Sent: 30 November 2017 17:52
> 
> On Thu, Nov 30, 2017 at 04:53:06PM +0000, David Laight wrote:
> > From: Salvatore Mesoraca
> > > if a program tries to open a file, in a sticky directory,
> > > with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> > > This feature allows to detect and potentially block programs that
> > > act this way, it can be used to find vulnerabilities (like those
> > > prevented by patch #1) and to do policy enforcement.
> 
> > I presume the 'vulnerabilities' are related to symlinks being created
> > just before the open?
> 
> That's one way to exploit them.  We already have a sysctl to try and
> block that specific attack, and we're considering adding more, for other
> attacks.  But we'd also like an easy way to learn of the vulnerabilities
> without anyone trying to exploit them yet, hence this patch.
> 
> > Trouble is this change breaks a lot of general use of /tmp.
> 
> That's general misuse of /tmp.  Things like "command > /tmp/file"
> without having pre-created the file with O_EXCL e.g. by mktemp(1).

I'm sorry, I've been using Unix for over 30 years.
/tmp is a place that temporary files were created - nothing special.
Traditionally it was emptied on every boot.
There was never anything that required files be created in any
specific way.

> > I always assumed that code that cared would use O_EXCL and
> > everything else wasn't worth subverting.
> 
> Opinions will vary on whether it's worth subverting or not, and someone
> may reasonably want to configure this differently on different systems.
> 
> > I found code in vi (and elsewhere) that subverted these checks
> > by opening with O_WRONLY if stat() showed the file existed and
> > O_CREAT|O_EXCL if it didn't.
> 
> Yes, such misuses of /tmp and the like by use of those programs - where
> the potential consequences would often be less severe (if your example
> above is correct, it sounds like it'd be data spoofing but not
> overwriting another file over a link) - could remain unnoticed.

It seemed to me that code had been added to avoid issues caused by
this policing of opens.
But the code added is itself racy - so makes the whole thing pointless.

> > I'm pretty sure that traditionally a lot of these opens were done
> > with O_CREAT|O_TRUNC.
> > Implementing that as unlink() followed by a create would stop
> > 'random' (ok all) symlinks being followed.
> 
> That's racy.  We have O_EXCL, and it must be used along with O_CREAT
> when the directory is untrusted.  (If it were only about symlinks, we
> also already have O_NOFOLLOW.)

Not racy if done in the kernel itself.

> > Overall I'm pretty sure this change will break things badly somewhere.
> 
> Sure.  We want it to visibly break what was subtly broken, so that the
> breakage can be seen and corrected without an attempted attack.

Maybe, but it will break some user system when they do a kernel update.
It won't necessarily break a developer's system first.

And, AFAICT, there is already code that subverts any tests by doing
a check for the file existing and then selecting between two different
types on open.

	David

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-12-01  9:46         ` [kernel-hardening] " David Laight
@ 2017-12-01 15:52           ` Alan Cox
  -1 siblings, 0 replies; 60+ messages in thread
From: Alan Cox @ 2017-12-01 15:52 UTC (permalink / raw)
  To: David Laight
  Cc: 'Solar Designer', 'Salvatore Mesoraca',
	linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Eric W. Biederman

> > That's general misuse of /tmp.  Things like "command > /tmp/file"
> > without having pre-created the file with O_EXCL e.g. by mktemp(1).  
> 
> I'm sorry, I've been using Unix for over 30 years.
> /tmp is a place that temporary files were created - nothing special.
> Traditionally it was emptied on every boot.
> There was never anything that required files be created in any
> specific way.

And in 1978 you had to boot single user and use nckeck and icheck to fix
the filesystem up by hand, you had no networking, no systemd, no
sysvinit, no ANSI C. no X11 ... (shall I go on...)

There are reasons it all changed. The origin of /tmp is a compromise of
security and disk performance made in the 1970s about an OS that was
quite different, running on a machine with typically 256K of RAM, no RAM
disks, a single very expensive fixed head drive and a larger moving head
one.

The existence of /tmp in that form today is a bizarre historic quirk.
Fortunately if you want a perfectly safe /tmp/ use namespaces and every
user can have their own private /tmp.

Alan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories
@ 2017-12-01 15:52           ` Alan Cox
  0 siblings, 0 replies; 60+ messages in thread
From: Alan Cox @ 2017-12-01 15:52 UTC (permalink / raw)
  To: David Laight
  Cc: 'Solar Designer', 'Salvatore Mesoraca',
	linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro,
	Jann Horn, Kees Cook, Eric W. Biederman

> > That's general misuse of /tmp.  Things like "command > /tmp/file"
> > without having pre-created the file with O_EXCL e.g. by mktemp(1).  
> 
> I'm sorry, I've been using Unix for over 30 years.
> /tmp is a place that temporary files were created - nothing special.
> Traditionally it was emptied on every boot.
> There was never anything that required files be created in any
> specific way.

And in 1978 you had to boot single user and use nckeck and icheck to fix
the filesystem up by hand, you had no networking, no systemd, no
sysvinit, no ANSI C. no X11 ... (shall I go on...)

There are reasons it all changed. The origin of /tmp is a compromise of
security and disk performance made in the 1970s about an OS that was
quite different, running on a machine with typically 256K of RAM, no RAM
disks, a single very expensive fixed head drive and a larger moving head
one.

The existence of /tmp in that form today is a bizarre historic quirk.
Fortunately if you want a perfectly safe /tmp/ use namespaces and every
user can have their own private /tmp.

Alan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-11-30 19:05     ` Solar Designer
  2017-11-30 19:14       ` Solar Designer
@ 2017-12-05 10:13       ` Salvatore Mesoraca
  2017-12-07 22:23         ` Solar Designer
  1 sibling, 1 reply; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-12-05 10:13 UTC (permalink / raw)
  To: Solar Designer; +Cc: Kernel Hardening

2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>:
> On Thu, Nov 30, 2017 at 03:37:29PM +0100, Salvatore Mesoraca wrote:
> > 2017-11-27 2:14 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > When I suggested the O_CREAT-without-O_EXCL checks, I didn't mean you'd
> > > try to introduce them at the same time with the restrictions on FIFOs
> > > and regular files.  I think that bundling these together might be a
> > > recipe for never getting any of them in, or for getting in the latter
> > > with unnecessarily limited scope.
> >
> > Yes, I think you are right. I'll split them.
>
> Thanks.
>
> > > I think we need an optional policy
> > > against O_CREAT-without-O_EXCL not only for sticky directories, but also
> > > for other directories writable by other than the current fsuid - there
> > > may be several levels or individual flags here.
> > >
> > > So maybe unbundle these or at least avoid forever limiting them to
> > > sticky directories (don't include "sticky" in the sysctl name).
> >
> > Actually I never liked that sysctl name  I'll try to come up with something
> > less ugly and that doesn't include the "sticky" word.
>
> So the current name is "protected_sticky_child_create" (I couldn't even
> recall it, had to look it up for this reply).  This unnecessarily
> bundles this potentially more general policy stuff with the existing
> "protections" against specific attacks, unnecessarily limits scope to
> "sticky" and "create", and talks about some "child".  How about we use
> something totally different, focusing on "policy"?  It could be simply
> "policy" (we're already in "fs"), or if that won't fly then how about
> "security_policy" or "dac_policy"?  We will be imposing extra
> restrictions on top of usual Unix discretionary access control
> permission bits while not going all the way to mandatory access control
> (not tying objects to subjects).  So in a sense we'll have an extension
> of Unix DAC.

Yea, I like "dac_policy" very much.

> BTW, this and all of fs.protected* should be configurable per-container.

This would be a nice thing to do in the future.

> > If I implement something like what Matthew proposed[1] it will be easy to
> > extend scope and functionalities of this feature without complicating too much
> > the interface.
>
> > [1] https://lkml.org/lkml/2017/11/22/319
>
> Right.  I tried thinking of a way to specify all reasonable combinations
> without the likely unreasonable ones, but couldn't come up with anything
> elegant.  So I'm fine with Matthew's proposal as-is.

Great.

> A new thought: a directory that has someone else as its owner is for our
> purposes effectively the same as a group-writable directory.  So maybe
> whatever we'll implement for group-writable directories should also be
> done for directories that have other than the current fsuid as their
> owner. It's currently very uncommon to have directories with the sticky
> bit set that are not at least group-writable, so this will rarely make a
> difference (and when it does, that's just right), but also it provides a
> way to explicitly include any directory under this monitoring (if the
> group-writable protection is on) - e.g., if a sysadmin wants this
> monitoring for users' home directories, they can change permissions for
> those from e.g. 700 to 1700.  This could be handy for development and
> auditing of software, even though in production it could be easily
> circumvented by the directory owner (who can remove the sticky bit,
> which we should document to avoid providing a false sense of security)
> and it won't automatically apply to subdirectories.  It'd also cover
> part of what we intend to achieve later by possibly extending the
> feature to non-sticky directories, where we might also want to treat
> different owner the same as group-writable (without the circumvention).

Agreed. I'll extend it to also check for the directory owner.
Maybe I could use another bit and make this additional restriction
independent from the "group-writable" one.

> > So, are you suggesting that I should extend "O_CREAT-without-O_EXCL"
> > and "FIFOs restrictions" to work (optionally) on non-sticky directories too,
> > while leaving untouched (for the moment) "normal files restrictions"?
>
> No, I think all of these and the existing symlink restrictions should
> potentially be extended "to work (optionally) on non-sticky directories
> too", but perhaps with separate patches later.

OK.

> An even further extension may be to cover non-O_CREAT: writing or/and
> reading an existing file in an untrusted directory is also potentially
> unsafe.  Unfortunately, we can't reliably know whether the program
> possibly takes precautions by using lstat() and fstat() and comparing
> st_dev/st_ino, so we won't be able to distinguish likely unsafe and
> likely mostly safe accesses, but we'll be able to flag all of them for
> manual analysis on a developer's or an auditor's system.  A reasonable
> strict policy one might want to follow is to have all accesses done as
> the right fsuid, without needing those unreliable st_dev/st_ino checks.
> (They're unreliable because of potential side-effects on open() and
> inode reuse.)  For example, we chose this strict policy in Openwall's
> "tcb suite" and shadow suite patches: http://www.openwall.com/tcb/
>
> Of course, then there's the question on whether something exotic(?) like
> this should be in the kernel.
>
> To me, it's like a "gcc -Wall" for filesystem accesses.  Sure it can
> "falsely" detect many technically valid uses, but it's also helpful to
> improve our filesystem access safety.

Yes, this could be a useful improvement for the future.

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-11-30 16:30               ` [kernel-hardening] " Solar Designer
@ 2017-12-05 10:21                 ` Salvatore Mesoraca
  2017-12-07 21:47                   ` Solar Designer
  0 siblings, 1 reply; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-12-05 10:21 UTC (permalink / raw)
  To: Solar Designer
  Cc: Ian Campbell, David Laight, Alan Cox, linux-kernel,
	Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn,
	Kees Cook, Eric W. Biederman, H. Peter Anvin, Karel Zak

2017-11-30 17:30 GMT+01:00 Solar Designer <solar@openwall.com>:
> Replying to Salvatore and Ian at once, and CC'ing H. Peter Anvin and
> Karel Zak for util-linux flock(1).
>
> On Thu, Nov 30, 2017 at 02:57:06PM +0000, Ian Campbell wrote:
> > On Thu, 2017-11-30 at 15:39 +0100, Salvatore Mesoraca wrote:
> > > 2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > > Why would "shared lock files based on flock()" need O_CREAT without
> > > > O_EXCL?  Where specifically are they currently used that way?
> > >
> > > I don't think that they *need* to act like this, but this is how
> > > util-linux's flock(1) currently works.
>
> Oh, but you never referred to flock(1) in there, so I read flock() as
> implying flock(2).

That's true, I observed that behavior in flock(1), but I didn't specify
it was flock(1) because I thought that there may be some other users of
flock(2) that works that way.

> I think util-linux's flock(1) is relatively obscure,
> and as far as I can tell it isn't documented anywhere whether it may be
> used on untrusted lock file directories or not.  A revision of the
> flock(1) man page seen on RHEL7 gives really weird examples using /tmp.
> The current util-linux-2.31/sys-utils/flock.1 is similar.  strace'ing
> those examples, I see flock(1) literally uses the /tmp directory itself
> (not a file inside) as the lock, which surprisingly appears to work.
> Checking the util-linux tree, it looks like this was added as a feature.
> I suppose the good news is this doesn't appear to allow for worse than a
> DoS against the script using flock(1) (since a different user can also
> take that lock), unless any of the upper level directories are also
> untrusted.  The man page as seen at https://linux.die.net/man/1/flock
> currently only lists a more complex example with /var/lock/mylockfile,
> where flock(1) itself is asked to access it via a pre-opened fd.
> Permissions on /var/lock vary between distros (e.g., a RHEL6'ish system
> I just checked has it as 775 root:lock, whereas a RHEL7'ish has it as
> symlink to ../run/lock, which is 755 root:root).  Anyway, these are just
> examples.  The reality is people will use flock(1) in various directories,
> some of them trusted and some not.  If our proposed policy can detect and
> optionally break some uses in untrusted directories, that's good since
> such uses are currently unsafe (see below).
>
> > > And it doesn't seem an unreasonable behavior from their perspective,
> >
> > I thought that too, specifically I reasoned that using O_EXCL would
> > defeat the purpose of the _shared_ flock(), wouldn't it?
>
> No.  O_EXCL means the attempt to O_CREAT will fail, at which point the
> program can retry without both of these flags.  (The actual lock is
> obtained later via flock(2) or fcntl(2) anyway.)  In fact, flock(1)
> already does something similar, as tested using the very first example
> from the man page on a RHEL7'ish system:
>
> $ strace flock /tmp -c cat
> [...]
> open("/tmp", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = -1 EISDIR (Is a directory)
> open("/tmp", O_RDONLY|O_NOCTTY)         = 3
> flock(3, LOCK_EX)                       = 0
>
> As you can see, when O_CREAT failed, the program retried without that
> flag.  It could just as easily have tried O_CREAT|O_EXCL, then dropped
> both at once.  Testing with a lock file (rather than lock directory):
>
> $ strace flock /tmp/lockfile -c cat
> [...]
> open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
> flock(3, LOCK_EX)                       = 0
>
> This use of flock(1) would be a worse vulnerability, not limited to DoS
> against the script itself but also allowing for privilege escalation
> unless (sym)link restrictions are enabled in the kernel.  Adding O_EXCL
> would help (reduce the impact back to DoS against itself), and would
> require that the retry logic (like what is seen in the lock directory
> example above) be present.
>
> On Thu, Nov 30, 2017 at 03:39:48PM +0100, Salvatore Mesoraca wrote:
> > so maybe other programs do that too.
>
> Maybe, and they would be similarly vulnerable if they do that in
> untrusted directories, and they would also need to be fixed.
>
> A subtle case is when something like this is technically done in a
> directory writable by someone else, but is not necessarily crossing
> what the program's author or the sysadmin recognize as a privilege
> boundary.  Maybe they have accepted that this one pseudo-user can
> attack that other one anyway, such as because under a certain daemon's
> privsep model the would-be attacking pseudo-user is necessarily more
> privileged anyway.  This is why we shouldn't by default extend this
> policy to all directories writable by someone else, but instead we
> consider introducing a sysctl with varying policy levels - initially
> focusing on sticky directories writable by someone else and only later
> optionally extending to non-sticky directories writable by someone else.
> The latter mode would have even greater focus on development and
> debugging rather than on production use, although I imagine that
> specific systems could eventually afford it in production as well.
>
> > I was citing that case just to make it clear that, if someone gets
> > a warning because of flock(1), they shouldn't be worried about it.
>
> Checking the util-linux-2.31/sys-utils/flock.c, I see that it in fact
> doesn't retry without O_CREAT in the non-directory case.  That would
> need to be corrected, along with introduction of O_EXCL.
>
> > That behavior can be certainly avoided, but of course it isn't a
> > security problem per se.
>
> I think it is a security problem per se, except in the "subtle case"
> above, and it's great that our proposed policy would catch it.

I agree on the DoS, though, at first, I didn't consider it a "bug" because
there isn't any open mode that can prevent the DoS in this case.
If you want to avoid it, you must avoid other-users-writable directories
at all. So, It think that, if you are using a sticky directory, it's
intended behavior to let someone else "lock" you.
But maybe many flock(1) users are not aware of the issue and so, sometimes,
it can be unintended.
I didn't consider privilege escalation as an issue because I always
looked at flock(1) under the assumption that the lockfile is never actually
read or modified in any way and so it shouldn't make too much difference if
it's an already existing regular file or a symlink etc.
Am I missing something?

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-12-05 10:21                 ` Salvatore Mesoraca
@ 2017-12-07 21:47                   ` Solar Designer
  2017-12-11 12:08                     ` Salvatore Mesoraca
  0 siblings, 1 reply; 60+ messages in thread
From: Solar Designer @ 2017-12-07 21:47 UTC (permalink / raw)
  To: Salvatore Mesoraca
  Cc: Ian Campbell, David Laight, Alan Cox, linux-kernel,
	Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn,
	Kees Cook, Eric W. Biederman, H. Peter Anvin, Karel Zak

On Tue, Dec 05, 2017 at 11:21:00AM +0100, Salvatore Mesoraca wrote:
> 2017-11-30 17:30 GMT+01:00 Solar Designer <solar@openwall.com>:
> > $ strace flock /tmp/lockfile -c cat
> > [...]
> > open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
> > flock(3, LOCK_EX)                       = 0
> >
> > This use of flock(1) would be a worse vulnerability, not limited to DoS
> > against the script itself but also allowing for privilege escalation
> > unless (sym)link restrictions are enabled in the kernel.  Adding O_EXCL
> > would help (reduce the impact back to DoS against itself), and would
> > require that the retry logic (like what is seen in the lock directory
> > example above) be present.

> > > That behavior can be certainly avoided, but of course it isn't a
> > > security problem per se.
> >
> > I think it is a security problem per se, except in the "subtle case"
> > above, and it's great that our proposed policy would catch it.
> 
> I agree on the DoS, though, at first, I didn't consider it a "bug" because
> there isn't any open mode that can prevent the DoS in this case.
> If you want to avoid it, you must avoid other-users-writable directories
> at all. So, It think that, if you are using a sticky directory, it's
> intended behavior to let someone else "lock" you.

Right.  There's a worse DoS I had overlooked, though: flock(1) can also
be made to create and/or lock another file (maybe in another directory).
Perhaps adding O_NOFOLLOW (alongside flock(1)'s existing use of
O_NOCTTY) would be a good idea, even though uses in a directory writable
by someone else are inherently risky anyway.

> But maybe many flock(1) users are not aware of the issue and so, sometimes,
> it can be unintended.
> I didn't consider privilege escalation as an issue because I always
> looked at flock(1) under the assumption that the lockfile is never actually
> read or modified in any way and so it shouldn't make too much difference if
> it's an already existing regular file or a symlink etc.
> Am I missing something?

You made a good point, but yes: O_CREAT will follow a dangling symlink
and there are cases where creating an empty file of an attacker-chosen
pathname may allow for privilege escalation.  For example, crontab(1)
man page on RHEL7 says regarding /etc/cron.allow and /etc/cron.deny:

"If neither of these files exists, only the super user is allowed to
use cron."

In that configuration, simply creating empty /etc/cron.deny grants
access to crontab(1) to all users.  As user:

$ crontab -l
You (solar) are not allowed to use this program (crontab)
See crontab(1) for more information
$ ln -s /etc/cron.deny /tmp/lockfile

As root:

# sysctl -w fs.protected_symlinks=0
fs.protected_symlinks = 0
# flock /tmp/lockfile -c echo

As user:

$ crontab -l
no crontab for solar

There may also be side-effects on open of device files (the best known
example is rewinding a tape), and we won't avoid those by retrying
without O_CREAT|O_EXCL.  O_NOFOLLOW will help against symlinks to device
files, but not against hard links (if on same device).  The kernel's
symlink and hardlink protections help, but in this sub-thread we were
discussing detecting userspace software issues without waiting for an
attack.  Things like this fit David Laight's point well: programs trying
to make risky (mis)uses less risky sometimes also avoid being detected
by our proposed policy.  That's life.

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-12-05 10:13       ` Salvatore Mesoraca
@ 2017-12-07 22:23         ` Solar Designer
  2017-12-08 12:17           ` Solar Designer
  2017-12-11 16:07           ` Solar Designer
  0 siblings, 2 replies; 60+ messages in thread
From: Solar Designer @ 2017-12-07 22:23 UTC (permalink / raw)
  To: Salvatore Mesoraca; +Cc: Kernel Hardening, Matthew Wilcox

CC: Matthew Wilcox

On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote:
> 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>:
> > So the current name is "protected_sticky_child_create" (I couldn't even
> > recall it, had to look it up for this reply).  This unnecessarily
> > bundles this potentially more general policy stuff with the existing
> > "protections" against specific attacks, unnecessarily limits scope to
> > "sticky" and "create", and talks about some "child".  How about we use
> > something totally different, focusing on "policy"?  It could be simply
> > "policy" (we're already in "fs"), or if that won't fly then how about
> > "security_policy" or "dac_policy"?  We will be imposing extra
> > restrictions on top of usual Unix discretionary access control
> > permission bits while not going all the way to mandatory access control
> > (not tying objects to subjects).  So in a sense we'll have an extension
> > of Unix DAC.
> 
> Yea, I like "dac_policy" very much.

OK.  Thinking of this further, what we might want to have is a generic
policy against file accesses that would negate a privilege boundary
between Unix (pseudo-)users.  And one of those would be execve(2) of a
file writable by someone else, or with any of its parent directories
writable by someone else (unless that's root, indeed).  I think the name
"dac_policy" still fits this well, but anything with "create" or even
"open" in the name would not fit it.

Another way of looking at this is that we'd be creating a reverse DAC -
optionally blocking unsafe accesses made possible by too permissive DAC.
But I don't have a good sysctl name suggestion along these lines.

> > BTW, this and all of fs.protected* should be configurable per-container.
> 
> This would be a nice thing to do in the future.

OK.

> > > If I implement something like what Matthew proposed[1] it will be easy to
> > > extend scope and functionalities of this feature without complicating too much
> > > the interface.
> >
> > > [1] https://lkml.org/lkml/2017/11/22/319
> >
> > Right.  I tried thinking of a way to specify all reasonable combinations
> > without the likely unreasonable ones, but couldn't come up with anything
> > elegant.  So I'm fine with Matthew's proposal as-is.
> 
> Great.

Thinking of this further, maybe it'd be friendlier to further expansion
if we separate the policy and notify vs. block into two bitmasks, in
separate sysctl's.  If a sysadmin wants only notifications or only
blocking, that would then be easier to achieve, by setting the notify
vs. block sysctl to 0 or to all 1's (to -1 maybe, although this would
give the wrong impression of being "below 0").  Then the sysadmin would
be able to focus on the actual policy bits separately, without the
distraction of the notify vs. block bits inbetween.

The names could then be "dac_policy" and "dac_policy_enforce" maybe?
Any other suggestions?

BTW, if we negate the integers corresponding to these, then two 1's
would mean full enforcement of the strictest policy, and negative
bitmask values could be used to specify fine-grained policy and/or
enforcement.  Good idea or too unusual/confusing?  I'm afraid it is
indeed confusing.  dac_policy_enforce=1 meaning full enforcement makes
more sense to me, but negating only one of these two integers would also
be confusing.  So we probably shouldn't do any of this.  I just thought
I'd share the idea anyway.

> > A new thought: a directory that has someone else as its owner is for our
> > purposes effectively the same as a group-writable directory.  So maybe
> > whatever we'll implement for group-writable directories should also be
> > done for directories that have other than the current fsuid as their
> > owner. It's currently very uncommon to have directories with the sticky
> > bit set that are not at least group-writable, so this will rarely make a
> > difference (and when it does, that's just right), but also it provides a
> > way to explicitly include any dibrectory under this monitoring (if the
> > group-writable protection is on) - e.g., if a sysadmin wants this
> > monitoring for users' home directories, they can change permissions for
> > those from e.g. 700 to 1700.  This could be handy for development and
> > auditing of software, even though in production it could be easily
> > circumvented by the directory owner (who can remove the sticky bit,
> > which we should document to avoid providing a false sense of security)
> > and it won't automatically apply to subdirectories.  It'd also cover
> > part of what we intend to achieve later by possibly extending the
> > feature to non-sticky directories, where we might also want to treat
> > different owner the same as group-writable (without the circumvention).
> 
> Agreed. I'll extend it to also check for the directory owner.
> Maybe I could use another bit and make this additional restriction
> independent from the "group-writable" one.

I'd prefer not to waste a bit on this.  We'll need plenty of bits later
on, perhaps also in pairs or (with your suggestion to separate this one)
in triplets, and every extra bit complicates the calculation of the
bitmask value.

> > > So, are you suggesting that I should extend "O_CREAT-without-O_EXCL"
> > > and "FIFOs restrictions" to work (optionally) on non-sticky directories too,
> > > while leaving untouched (for the moment) "normal files restrictions"?
> >
> > No, I think all of these and the existing symlink restrictions should
> > potentially be extended "to work (optionally) on non-sticky directories
> > too", but perhaps with separate patches later.
> 
> OK.
> 
> > An even further extension may be to cover non-O_CREAT: writing or/and
> > reading an existing file in an untrusted directory is also potentially
> > unsafe.  Unfortunately, we can't reliably know whether the program
> > possibly takes precautions by using lstat() and fstat() and comparing
> > st_dev/st_ino, so we won't be able to distinguish likely unsafe and
> > likely mostly safe accesses, but we'll be able to flag all of them for
> > manual analysis on a developer's or an auditor's system.  A reasonable
> > strict policy one might want to follow is to have all accesses done as
> > the right fsuid, without needing those unreliable st_dev/st_ino checks.
> > (They're unreliable because of potential side-effects on open() and
> > inode reuse.)  For example, we chose this strict policy in Openwall's
> > "tcb suite" and shadow suite patches: http://www.openwall.com/tcb/
> >
> > Of course, then there's the question on whether something exotic(?) like
> > this should be in the kernel.
> >
> > To me, it's like a "gcc -Wall" for filesystem accesses.  Sure it can
> > "falsely" detect many technically valid uses, but it's also helpful to
> > improve our filesystem access safety.
> 
> Yes, this could be a useful improvement for the future.
> 
> Salvatore

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-12-07 22:23         ` Solar Designer
@ 2017-12-08 12:17           ` Solar Designer
  2017-12-11 12:07             ` Salvatore Mesoraca
  2017-12-11 16:07           ` Solar Designer
  1 sibling, 1 reply; 60+ messages in thread
From: Solar Designer @ 2017-12-08 12:17 UTC (permalink / raw)
  To: Salvatore Mesoraca; +Cc: Kernel Hardening, Matthew Wilcox

On Thu, Dec 07, 2017 at 11:23:34PM +0100, Solar Designer wrote:
> On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote:
> > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > So the current name is "protected_sticky_child_create" (I couldn't even
> > > recall it, had to look it up for this reply).  This unnecessarily
> > > bundles this potentially more general policy stuff with the existing
> > > "protections" against specific attacks, unnecessarily limits scope to
> > > "sticky" and "create", and talks about some "child".  How about we use
> > > something totally different, focusing on "policy"?  It could be simply
> > > "policy" (we're already in "fs"), or if that won't fly then how about
> > > "security_policy" or "dac_policy"?  We will be imposing extra
> > > restrictions on top of usual Unix discretionary access control
> > > permission bits while not going all the way to mandatory access control
> > > (not tying objects to subjects).  So in a sense we'll have an extension
> > > of Unix DAC.
> > 
> > Yea, I like "dac_policy" very much.
> 
> OK.  Thinking of this further, what we might want to have is a generic
> policy against file accesses that would negate a privilege boundary
> between Unix (pseudo-)users.  And one of those would be execve(2) of a
> file writable by someone else, or with any of its parent directories
> writable by someone else (unless that's root, indeed).  I think the name
> "dac_policy" still fits this well, but anything with "create" or even
> "open" in the name would not fit it.
> 
> Another way of looking at this is that we'd be creating a reverse DAC -
> optionally blocking unsafe accesses made possible by too permissive DAC.
> But I don't have a good sysctl name suggestion along these lines.

Yet another thought is that while we won't go for MAC ("not tying
objects to subjects") in our additional policy against unsafe file
accesses, it could reasonably be extended to cover not only accesses
made possible by too permissive DAC, but also those made possible by too
permissive ACLs.  I doubt we'd want to actually add ACL support in
there (too obscure, unnecessary when auditing software), but if someone
eventually does that would fit in with our general approach (and
wouldn't even require separate flags, as it's conceptually similar to
group-writable).  So an ideal name would not conflict with that.

What would be a good short name for "file access safety policy"?  Maybe
we can omit "file" considering we're in "fs".  "access_safety_policy"?

The individual flags might correspond to detecting and/or blocking:

O_CREAT w/o O_EXCL in +t world-writable directory except if file exists
and is owned by root or by current fsuid and the directory owner is root
or current fsuid

O_CREAT w/o O_EXCL in +t group-writable directory except if file exists
and is owned by root or by current fsuid and the directory owner is root
or current fsuid, or O_CREAT w/o O_EXCL in +t directory owned by another
non-root user (regardless of whether the file exists and its ownership)

(Note: the "+t directory owned by another non-root user" case is for
consistency and for software testing only, not for enforcing a policy
like this in production, because the directory owner can remove the +t.)

O_CREAT w/o O_EXCL in -t world-writable directory (no exceptions)

O_CREAT w/o O_EXCL in -t group-writable directory (no exceptions), or
O_CREAT w/o O_EXCL in any directory owned by another non-root user
(regardless of the sticky bit, and also without exceptions)

Same as above for O_WRONLY or O_RDWR, so 4 more flags.

Same as above for O_RDONLY, so 4 more flags.

Other unsafe file accesses via pathname, including execve(2) (would
consider permissions on the file itself, as well as the directory),
chmod(2), chown(2).  That's maybe 6 more flags if we continue to
separate world vs. group/user, even if we don't separate by +t (but we
could nevertheless trust a pre-existing file in +t under the same
conditions listed for "O_CREAT w/o O_EXCL in +t").

Then we could want to eventually add checking of all parent directories,
and this in turn would make more syscalls relevant (in the sense of
being usable for an immediate attack on the invoking user, making them
do something unintended) in the special case when an upper level
directory is unsafe: symlink(2), link(2), unlink(2), rename(2),
mkdir(2), rmdir(2), perhaps more.  Just for these 6 I quickly
identified, and if we continue to separate world vs. group/user, this
could be 12 more flags.

So this would be 4+4+6+12 = 26 flags already.  This feels like way too
many, as well as not leaving enough bits e.g. in a 32-bit mask in case
we end up identifying many more relevant syscalls.  So perhaps the
approach above is too fine-grained and should be revised.  If so, maybe
our start with two bits for "O_CREAT w/o O_EXCL in +t" is already too
fine-grained and should be revised as well.

What do you think?

How to group these categories in fewer flags best?  Perhaps less
separation of the open(2) flags and/or of the syscalls relevant only
with unsafe upper-level directories (one flag for all of them?)

Drop the +t / -t separation for open(2)?  (But nevertheless trust a
pre-existing file in +t under the same conditions as I listed for
"O_CREAT w/o O_EXCL in +t".)

Drop the world vs. group/user separation?

Introduce more sysctl's - separate for open(2) vs. other syscalls?

And/or maybe separate for world vs. group/user-writable?  This would be
convenient in that the same bit positions would refer to the same
open(2) modes in the same directory categories, or the same syscalls.
Extending a previously tested policy for world-writable to cover also
group/user-writable would be as simple as setting the other sysctl to
the same value.

Any other ideas?

> > > > If I implement something like what Matthew proposed[1] it will be easy to
> > > > extend scope and functionalities of this feature without complicating too much
> > > > the interface.
> > >
> > > > [1] https://lkml.org/lkml/2017/11/22/319
> > >
> > > Right.  I tried thinking of a way to specify all reasonable combinations
> > > without the likely unreasonable ones, but couldn't come up with anything
> > > elegant.  So I'm fine with Matthew's proposal as-is.
> > 
> > Great.
> 
> Thinking of this further, maybe it'd be friendlier to further expansion
> if we separate the policy and notify vs. block into two bitmasks, in
> separate sysctl's.  If a sysadmin wants only notifications or only
> blocking, that would then be easier to achieve, by setting the notify
> vs. block sysctl to 0 or to all 1's (to -1 maybe, although this would
> give the wrong impression of being "below 0").  Then the sysadmin would
> be able to focus on the actual policy bits separately, without the
> distraction of the notify vs. block bits inbetween.
> 
> The names could then be "dac_policy" and "dac_policy_enforce" maybe?
> Any other suggestions?

Unfortunately, my suggestion above wouldn't address Matthew's criticism
about the lack of flexibility.  It wouldn't be possible to independently
control notifications and enforcement.

So perhaps the two sysctl name suffixes can be "notify" and "enforce",
and they'd independently define two policies - one producing
notifications only and the other actually blocking syscalls.  Usually
they'd be the same or one of them would be 0, but arbitrary two policies
could also be specified.

Now how to combine this with extensibility of the policies themselves,
potentially eventually going to the extent I outlined above, which might
require multiple sysctl's per policy?  Perhaps just double their number?
If so, that may be a reason for keeping each of the two policies in
fewer sysctl's, which is in turn a reason for lower granularity.

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-12-08 12:17           ` Solar Designer
@ 2017-12-11 12:07             ` Salvatore Mesoraca
  2017-12-11 15:33               ` Solar Designer
  0 siblings, 1 reply; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-12-11 12:07 UTC (permalink / raw)
  To: Solar Designer; +Cc: Kernel Hardening, Matthew Wilcox

2017-12-08 13:17 GMT+01:00 Solar Designer <solar@openwall.com>:
> On Thu, Dec 07, 2017 at 11:23:34PM +0100, Solar Designer wrote:
> > On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote:
> > > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > > So the current name is "protected_sticky_child_create" (I couldn't even
> > > > recall it, had to look it up for this reply).  This unnecessarily
> > > > bundles this potentially more general policy stuff with the existing
> > > > "protections" against specific attacks, unnecessarily limits scope to
> > > > "sticky" and "create", and talks about some "child".  How about we use
> > > > something totally different, focusing on "policy"?  It could be simply
> > > > "policy" (we're already in "fs"), or if that won't fly then how about
> > > > "security_policy" or "dac_policy"?  We will be imposing extra
> > > > restrictions on top of usual Unix discretionary access control
> > > > permission bits while not going all the way to mandatory access control
> > > > (not tying objects to subjects).  So in a sense we'll have an extension
> > > > of Unix DAC.
> > >
> > > Yea, I like "dac_policy" very much.
> >
> > OK.  Thinking of this further, what we might want to have is a generic
> > policy against file accesses that would negate a privilege boundary
> > between Unix (pseudo-)users.  And one of those would be execve(2) of a
> > file writable by someone else, or with any of its parent directories
> > writable by someone else (unless that's root, indeed).

This looks similar to what grsecurity calls TPE, I think that there is
already some ongoing work to port it upstream as an LSM. I don't know
if it's still under development, I didn't heard about it in a while.
Actually what you are proposing here is a bit different, because it will
never prevent the user from running its own executables, so its usefulness
isn't completely defeated by language interpreters etc. Still, scripts that
are directly "sourced" will escape this protection, but at least we'll
cover many other cases.
We should add a restriction on executable mmaps, too.

> > I think the name
> > "dac_policy" still fits this well, but anything with "create" or even
> > "open" in the name would not fit it.
> >
> > Another way of looking at this is that we'd be creating a reverse DAC -
> > optionally blocking unsafe accesses made possible by too permissive DAC.
> > But I don't have a good sysctl name suggestion along these lines.
>
> Yet another thought is that while we won't go for MAC ("not tying
> objects to subjects") in our additional policy against unsafe file
> accesses, it could reasonably be extended to cover not only accesses
> made possible by too permissive DAC, but also those made possible by too
> permissive ACLs.  I doubt we'd want to actually add ACL support in
> there (too obscure, unnecessary when auditing software), but if someone
> eventually does that would fit in with our general approach (and
> wouldn't even require separate flags, as it's conceptually similar to
> group-writable).

I'm not sure if I understood what you meant, but the "group-writable" check
implicitly checks for permissive ACLs, because they force the group bits on.

> So an ideal name would not conflict with that.
>
> What would be a good short name for "file access safety policy"?  Maybe
> we can omit "file" considering we're in "fs".  "access_safety_policy"?
>
> The individual flags might correspond to detecting and/or blocking:
>
> O_CREAT w/o O_EXCL in +t world-writable directory except if file exists
> and is owned by root or by current fsuid and the directory owner is root
> or current fsuid
>
> O_CREAT w/o O_EXCL in +t group-writable directory except if file exists
> and is owned by root or by current fsuid and the directory owner is root
> or current fsuid, or O_CREAT w/o O_EXCL in +t directory owned by another
> non-root user (regardless of whether the file exists and its ownership)
>
> (Note: the "+t directory owned by another non-root user" case is for
> consistency and for software testing only, not for enforcing a policy
> like this in production, because the directory owner can remove the +t.)
>
> O_CREAT w/o O_EXCL in -t world-writable directory (no exceptions)
>
> O_CREAT w/o O_EXCL in -t group-writable directory (no exceptions), or
> O_CREAT w/o O_EXCL in any directory owned by another non-root user
> (regardless of the sticky bit, and also without exceptions)
>
> Same as above for O_WRONLY or O_RDWR, so 4 more flags.
>
> Same as above for O_RDONLY, so 4 more flags.
>
> Other unsafe file accesses via pathname, including execve(2) (would
> consider permissions on the file itself, as well as the directory),

I'll use the same bit used to restrict execve to restrict also
executable mmaps.
What do you think?

> chmod(2), chown(2).  That's maybe 6 more flags if we continue to
> separate world vs. group/user, even if we don't separate by +t (but we
> could nevertheless trust a pre-existing file in +t under the same
> conditions listed for "O_CREAT w/o O_EXCL in +t").
>
> Then we could want to eventually add checking of all parent directories,
> and this in turn would make more syscalls relevant (in the sense of
> being usable for an immediate attack on the invoking user, making them
> do something unintended) in the special case when an upper level
> directory is unsafe: symlink(2), link(2), unlink(2), rename(2),
> mkdir(2), rmdir(2), perhaps more.  Just for these 6 I quickly
> identified, and if we continue to separate world vs. group/user, this
> could be 12 more flags.
>
> So this would be 4+4+6+12 = 26 flags already.  This feels like way too
> many, as well as not leaving enough bits e.g. in a 32-bit mask in case
> we end up identifying many more relevant syscalls.  So perhaps the
> approach above is too fine-grained and should be revised.  If so, maybe
> our start with two bits for "O_CREAT w/o O_EXCL in +t" is already too
> fine-grained and should be revised as well.
>
> What do you think?
>
> How to group these categories in fewer flags best?  Perhaps less
> separation of the open(2) flags and/or of the syscalls relevant only
> with unsafe upper-level directories (one flag for all of them?)
>
> Drop the +t / -t separation for open(2)?  (But nevertheless trust a
> pre-existing file in +t under the same conditions as I listed for
> "O_CREAT w/o O_EXCL in +t".)
>
> Drop the world vs. group/user separation?
>
> Introduce more sysctl's - separate for open(2) vs. other syscalls?
>
> And/or maybe separate for world vs. group/user-writable?  This would be
> convenient in that the same bit positions would refer to the same
> open(2) modes in the same directory categories, or the same syscalls.
> Extending a previously tested policy for world-writable to cover also
> group/user-writable would be as simple as setting the other sysctl to
> the same value.
>
> Any other ideas?

I'd group some of those restrictions to be controlled by the same bit,
something like:
 - open(2)
 - execve(2), mmap(2)
 - chmod(2), chown(2)
 - mkdir(2), rmdir(2)
 - symlink(2), link(2), unlink(2), rename(2)

And I'd use the same bitmasks in the following sysctls:
 - access_control_policy
 - access_control_policy_notify
 - access_control_policy_group
 - access_control_policy_group_notify

If those names are too long we could change "access_control" to just "ac".

What do you think?

Anyway, given that the scope of this feature is growing bigger and bigger,
should it be reimplemented as an LSM?

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories
  2017-12-07 21:47                   ` Solar Designer
@ 2017-12-11 12:08                     ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-12-11 12:08 UTC (permalink / raw)
  To: Solar Designer
  Cc: Ian Campbell, David Laight, Alan Cox, linux-kernel,
	Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn,
	Kees Cook, Eric W. Biederman, H. Peter Anvin, Karel Zak

2017-12-07 22:47 GMT+01:00 Solar Designer <solar@openwall.com>:
> On Tue, Dec 05, 2017 at 11:21:00AM +0100, Salvatore Mesoraca wrote:
> > 2017-11-30 17:30 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > $ strace flock /tmp/lockfile -c cat
> > > [...]
> > > open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
> > > flock(3, LOCK_EX)                       = 0
> > >
> > > This use of flock(1) would be a worse vulnerability, not limited to DoS
> > > against the script itself but also allowing for privilege escalation
> > > unless (sym)link restrictions are enabled in the kernel.  Adding O_EXCL
> > > would help (reduce the impact back to DoS against itself), and would
> > > require that the retry logic (like what is seen in the lock directory
> > > example above) be present.
>
> > > > That behavior can be certainly avoided, but of course it isn't a
> > > > security problem per se.
> > >
> > > I think it is a security problem per se, except in the "subtle case"
> > > above, and it's great that our proposed policy would catch it.
> >
> > I agree on the DoS, though, at first, I didn't consider it a "bug" because
> > there isn't any open mode that can prevent the DoS in this case.
> > If you want to avoid it, you must avoid other-users-writable directories
> > at all. So, It think that, if you are using a sticky directory, it's
> > intended behavior to let someone else "lock" you.
>
> Right.  There's a worse DoS I had overlooked, though: flock(1) can also
> be made to create and/or lock another file (maybe in another directory).
> Perhaps adding O_NOFOLLOW (alongside flock(1)'s existing use of
> O_NOCTTY) would be a good idea, even though uses in a directory writable
> by someone else are inherently risky anyway.
>
> > But maybe many flock(1) users are not aware of the issue and so, sometimes,
> > it can be unintended.
> > I didn't consider privilege escalation as an issue because I always
> > looked at flock(1) under the assumption that the lockfile is never actually
> > read or modified in any way and so it shouldn't make too much difference if
> > it's an already existing regular file or a symlink etc.
> > Am I missing something?
>
> You made a good point, but yes: O_CREAT will follow a dangling symlink
> and there are cases where creating an empty file of an attacker-chosen
> pathname may allow for privilege escalation.  For example, crontab(1)
> man page on RHEL7 says regarding /etc/cron.allow and /etc/cron.deny:
>
> "If neither of these files exists, only the super user is allowed to
> use cron."
>
> In that configuration, simply creating empty /etc/cron.deny grants
> access to crontab(1) to all users.  As user:
>
> $ crontab -l
> You (solar) are not allowed to use this program (crontab)
> See crontab(1) for more information
> $ ln -s /etc/cron.deny /tmp/lockfile
>
> As root:
>
> # sysctl -w fs.protected_symlinks=0
> fs.protected_symlinks = 0
> # flock /tmp/lockfile -c echo
>
> As user:
>
> $ crontab -l
> no crontab for solar

Ah, right. I didn't think of it.

> There may also be side-effects on open of device files (the best known
> example is rewinding a tape), and we won't avoid those by retrying
> without O_CREAT|O_EXCL.  O_NOFOLLOW will help against symlinks to device
> files, but not against hard links (if on same device).  The kernel's
> symlink and hardlink protections help, but in this sub-thread we were
> discussing detecting userspace software issues without waiting for an
> attack.  Things like this fit David Laight's point well: programs trying
> to make risky (mis)uses less risky sometimes also avoid being detected
> by our proposed policy.  That's life.

Yes, unfortunately some bad behaviors will go unnoticed. But many others won't,
so I thinks this is still a useful feature to have.

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-12-11 12:07             ` Salvatore Mesoraca
@ 2017-12-11 15:33               ` Solar Designer
  2017-12-12 18:00                 ` Salvatore Mesoraca
  0 siblings, 1 reply; 60+ messages in thread
From: Solar Designer @ 2017-12-11 15:33 UTC (permalink / raw)
  To: Salvatore Mesoraca; +Cc: Kernel Hardening, Matthew Wilcox

On Mon, Dec 11, 2017 at 01:07:36PM +0100, Salvatore Mesoraca wrote:
> 2017-12-08 13:17 GMT+01:00 Solar Designer <solar@openwall.com>:
> > On Thu, Dec 07, 2017 at 11:23:34PM +0100, Solar Designer wrote:
> > > On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote:
> > > > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > > > So the current name is "protected_sticky_child_create" (I couldn't even
> > > > > recall it, had to look it up for this reply).  This unnecessarily
> > > > > bundles this potentially more general policy stuff with the existing
> > > > > "protections" against specific attacks, unnecessarily limits scope to
> > > > > "sticky" and "create", and talks about some "child".  How about we use
> > > > > something totally different, focusing on "policy"?  It could be simply
> > > > > "policy" (we're already in "fs"), or if that won't fly then how about
> > > > > "security_policy" or "dac_policy"?  We will be imposing extra
> > > > > restrictions on top of usual Unix discretionary access control
> > > > > permission bits while not going all the way to mandatory access control
> > > > > (not tying objects to subjects).  So in a sense we'll have an extension
> > > > > of Unix DAC.
> > > >
> > > > Yea, I like "dac_policy" very much.
> > >
> > > OK.  Thinking of this further, what we might want to have is a generic
> > > policy against file accesses that would negate a privilege boundary
> > > between Unix (pseudo-)users.  And one of those would be execve(2) of a
> > > file writable by someone else, or with any of its parent directories
> > > writable by someone else (unless that's root, indeed).
> 
> This looks similar to what grsecurity calls TPE,

Related, but not very similar.  The goal is almost the opposite.

TPE aims to prevent the user from bringing and running their own (and
other non-root users') binaries, so it restricts the invoking user.

[ FWIW, it's restrictive tendencies like this that led to the name
Openwall at the time when TPE was new (it was already called that in a
1998 Phrack article, pre-dating grsecurity), meaning we'd focus on
security without restricting the user's reasonable use of the system.
In late 1990s and early 2000s, it almost looked like e.g. shared web
hosting providers would forbid convenient and full Unix shell access
(while nevertheless allowing PHP script uploads, etc.)  Some did.
But then this trend reversed. ]

The potential inclusion of execve(2) in our safety policy would prevent
the user from invoking binaries that are under another non-root user's
control.

So the similarity to TPE is that either policy would allow invoking
binaries that are part of the system.  The difference, roughly speaking,
is that TPE would disallow the user's own binaries whereas we would
disallow only someone else's binaries.

> I think that there is
> already some ongoing work to port it upstream as an LSM.

Yes, this was discussed in here and I criticized it, but not CC'ing LKML
not to obstruct that effort.

> I don't know
> if it's still under development, I didn't heard about it in a while.

I also don't know what its current status is.

> Actually what you are proposing here is a bit different, because it will
> never prevent the user from running its own executables,

Exactly.

> so its usefulness
> isn't completely defeated by language interpreters etc. Still, scripts that
> are directly "sourced" will escape this protection, but at least we'll
> cover many other cases.

We should indeed consider the limitations of our approach, but I think
without reference to TPE (and its bypasses) as TPE's goal is different.

For scripts that are directly "sourced", we could have a policy on
open(2), where I suggested optionally including even O_RDONLY.  There's
no fundamental difference between (interpreted) code and data input to a
program - either may control that program.  This is why merely reading a
file under someone else's control is a risk.

However, as I suggested so far, for open(2) we'd only potentially
restrict file access in unsafe directories (eventually including their
chain of parent directories).  To deal with the risk of someone
"sourcing" a script or reading a crucial configuration file directly
writable by someone else, yet located in a safe directory, we'd have to
extend those restrictions to target file permissions, like I suggested
we do for execve(2).  Unfortunately, this will probably cover many more
reasonable and safe uses (where the program reading either knows to
distrust the file contents or deliberately trusts whoever else can write
the file).  So if we go for this policy extension at all (perhaps for
special-purpose/embedded distros that are developed to meet the
policy?), we should probably have a separate bit to control it.

> We should add a restriction on executable mmaps, too.

We're focusing on accesses via pathnames, and that would be a
restriction on what can be done on an open fd (what file permissions
would we use? as of when?)  It'd make some sense if we were to add
restrictions on open(2) of files with unsafe permissions (on the file
itself, not the directories), as I mentioned in the paragraph above.
Then it'd allow us to make that restriction more fine-grained: apply it
separately to shared libraries and such vs. arbitrary other files.  But
it's tricky.  So while we may have it in mind and not rule it out with
the interface we design now, it's not something we should decide on now.

Another drawback of these policy extensions (including some from my
previous message) is that we'd be getting into restricting the user (not
our goal) rather than just preventing the user from doing things
unsafely yet having a (preferably safe) way to achieve their goal (this
is our goal).  Sometimes a user will deliberately share a file with
another user, and the other user should ideally have a way to read that
file, even if by knowingly accepting a risk (although that's not ideal).

This makes me sad, but I don't have a good suggestion except to ensure
we're not forcing a sysadmin to enable these parts of the policy along
with those parts that do not have such unfortunate user-restrictive
properties.  So this should affect our decisions on the grouping of
potential policy bits.

> > > I think the name
> > > "dac_policy" still fits this well, but anything with "create" or even
> > > "open" in the name would not fit it.
> > >
> > > Another way of looking at this is that we'd be creating a reverse DAC -
> > > optionally blocking unsafe accesses made possible by too permissive DAC.
> > > But I don't have a good sysctl name suggestion along these lines.
> >
> > Yet another thought is that while we won't go for MAC ("not tying
> > objects to subjects") in our additional policy against unsafe file
> > accesses, it could reasonably be extended to cover not only accesses
> > made possible by too permissive DAC, but also those made possible by too
> > permissive ACLs.  I doubt we'd want to actually add ACL support in
> > there (too obscure, unnecessary when auditing software), but if someone
> > eventually does that would fit in with our general approach (and
> > wouldn't even require separate flags, as it's conceptually similar to
> > group-writable).
> 
> I'm not sure if I understood what you meant, but the "group-writable" check
> implicitly checks for permissive ACLs, because they force the group bits on.

Are you referring to how ACLs are currently implemented in the kernel?
I'm not familiar with that.  If you like and have time, you might want
to illustrate your point by a code snippet (if I understood you right,
that would be code already in the kernel), although I think it's not
essential to the rest of this discussion now (we should prefer sysctl
names not conflicting with this potential goal either way).

> > So an ideal name would not conflict with that.
> >
> > What would be a good short name for "file access safety policy"?  Maybe
> > we can omit "file" considering we're in "fs".  "access_safety_policy"?
> >
> > The individual flags might correspond to detecting and/or blocking:
> >
> > O_CREAT w/o O_EXCL in +t world-writable directory except if file exists
> > and is owned by root or by current fsuid and the directory owner is root
> > or current fsuid
> >
> > O_CREAT w/o O_EXCL in +t group-writable directory except if file exists
> > and is owned by root or by current fsuid and the directory owner is root
> > or current fsuid, or O_CREAT w/o O_EXCL in +t directory owned by another
> > non-root user (regardless of whether the file exists and its ownership)
> >
> > (Note: the "+t directory owned by another non-root user" case is for
> > consistency and for software testing only, not for enforcing a policy
> > like this in production, because the directory owner can remove the +t.)
> >
> > O_CREAT w/o O_EXCL in -t world-writable directory (no exceptions)
> >
> > O_CREAT w/o O_EXCL in -t group-writable directory (no exceptions), or
> > O_CREAT w/o O_EXCL in any directory owned by another non-root user
> > (regardless of the sticky bit, and also without exceptions)
> >
> > Same as above for O_WRONLY or O_RDWR, so 4 more flags.
> >
> > Same as above for O_RDONLY, so 4 more flags.
> >
> > Other unsafe file accesses via pathname, including execve(2) (would
> > consider permissions on the file itself, as well as the directory),
> 
> I'll use the same bit used to restrict execve to restrict also
> executable mmaps.
> What do you think?

I think we're not ready to discuss that.

execve(2) itself more obviously fits in with what we were discussing so
far because of relevance of directory permissions and because of access
via pathname in this very syscall.

> > chmod(2), chown(2).  That's maybe 6 more flags if we continue to
> > separate world vs. group/user, even if we don't separate by +t (but we
> > could nevertheless trust a pre-existing file in +t under the same
> > conditions listed for "O_CREAT w/o O_EXCL in +t").
> >
> > Then we could want to eventually add checking of all parent directories,
> > and this in turn would make more syscalls relevant (in the sense of
> > being usable for an immediate attack on the invoking user, making them
> > do something unintended) in the special case when an upper level
> > directory is unsafe: symlink(2), link(2), unlink(2), rename(2),
> > mkdir(2), rmdir(2), perhaps more.  Just for these 6 I quickly
> > identified, and if we continue to separate world vs. group/user, this
> > could be 12 more flags.
> >
> > So this would be 4+4+6+12 = 26 flags already.  This feels like way too
> > many, as well as not leaving enough bits e.g. in a 32-bit mask in case
> > we end up identifying many more relevant syscalls.  So perhaps the
> > approach above is too fine-grained and should be revised.  If so, maybe
> > our start with two bits for "O_CREAT w/o O_EXCL in +t" is already too
> > fine-grained and should be revised as well.
> >
> > What do you think?
> >
> > How to group these categories in fewer flags best?  Perhaps less
> > separation of the open(2) flags and/or of the syscalls relevant only
> > with unsafe upper-level directories (one flag for all of them?)
> >
> > Drop the +t / -t separation for open(2)?  (But nevertheless trust a
> > pre-existing file in +t under the same conditions as I listed for
> > "O_CREAT w/o O_EXCL in +t".)
> >
> > Drop the world vs. group/user separation?
> >
> > Introduce more sysctl's - separate for open(2) vs. other syscalls?
> >
> > And/or maybe separate for world vs. group/user-writable?  This would be
> > convenient in that the same bit positions would refer to the same
> > open(2) modes in the same directory categories, or the same syscalls.
> > Extending a previously tested policy for world-writable to cover also
> > group/user-writable would be as simple as setting the other sysctl to
> > the same value.
> >
> > Any other ideas?
> 
> I'd group some of those restrictions to be controlled by the same bit,
> something like:
>  - open(2)

I think at least writes vs. reads are very different.

>  - execve(2), mmap(2)
>  - chmod(2), chown(2)
>  - mkdir(2), rmdir(2)
>  - symlink(2), link(2), unlink(2), rename(2)

Maybe, perhaps except for mmap(2).

> And I'd use the same bitmasks in the following sysctls:
>  - access_control_policy
>  - access_control_policy_notify
>  - access_control_policy_group
>  - access_control_policy_group_notify

Our policy is not about access control - it is about access safety.
(It's the same confusion with the comparison to TPE again.  We should
actually try and avoid getting into the access control territory here.)

Also, in your list "access_control_policy" feels like it's the main
thing, whereas actually it's just one of the four (for world-writable
directories, right?)

We need something like:

safety_world_enforce
safety_world_notify
safety_group_enforce
safety_group_notify

but I'm afraid these are not self-explanatory at all.

> What do you think?

This will probably need even further discussion, implementation, some
usage experience, then changes or maybe even a redesign before we
possibly propose it for inclusion upstream.  For now, we're just
thinking out loud, and appear unready to propose anything upstream.

As discussed previously, you should nevertheless submit the FIFO and
regular file protections in +t as a separate patch.

> Anyway, given that the scope of this feature is growing bigger and bigger,
> should it be reimplemented as an LSM?

I had this thought, too.  I think we should consider possible races (are
there races that would apply in an LSM but would be avoided in a patch?),
maintenance (which implementation would likely be better maintained?),
and easy availability (so that e.g. someone developing software on e.g.
Ubuntu could easily partially turn this on temporarily, even if Ubuntu
itself does not fully conform to the policy and thus wouldn't normally
ship with these features - and would likely not include an LSM then?)

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-12-07 22:23         ` Solar Designer
  2017-12-08 12:17           ` Solar Designer
@ 2017-12-11 16:07           ` Solar Designer
  2017-12-12 18:01             ` Salvatore Mesoraca
  1 sibling, 1 reply; 60+ messages in thread
From: Solar Designer @ 2017-12-11 16:07 UTC (permalink / raw)
  To: Salvatore Mesoraca; +Cc: Kernel Hardening, Matthew Wilcox

On Thu, Dec 07, 2017 at 11:23:34PM +0100, Solar Designer wrote:
> On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote:
> > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > A new thought: a directory that has someone else as its owner is for our
> > > purposes effectively the same as a group-writable directory.  So maybe
> > > whatever we'll implement for group-writable directories should also be
> > > done for directories that have other than the current fsuid as their
> > > owner. It's currently very uncommon to have directories with the sticky
> > > bit set that are not at least group-writable, so this will rarely make a
> > > difference (and when it does, that's just right), but also it provides a
> > > way to explicitly include any dibrectory under this monitoring (if the
> > > group-writable protection is on) - e.g., if a sysadmin wants this
> > > monitoring for users' home directories, they can change permissions for
> > > those from e.g. 700 to 1700.  This could be handy for development and
> > > auditing of software, even though in production it could be easily
> > > circumvented by the directory owner (who can remove the sticky bit,
> > > which we should document to avoid providing a false sense of security)
> > > and it won't automatically apply to subdirectories.  It'd also cover
> > > part of what we intend to achieve later by possibly extending the
> > > feature to non-sticky directories, where we might also want to treat
> > > different owner the same as group-writable (without the circumvention).
> > 
> > Agreed. I'll extend it to also check for the directory owner.
> > Maybe I could use another bit and make this additional restriction
> > independent from the "group-writable" one.
> 
> I'd prefer not to waste a bit on this.  We'll need plenty of bits later
> on, perhaps also in pairs or (with your suggestion to separate this one)
> in triplets, and every extra bit complicates the calculation of the
> bitmask value.

OTOH, if we were to interleave the world and group bits, then maybe
instead of having the world and group bits as separate we could have a
two bit number, which naturally leaves us an extra value we'd use for
owner.  So:

0 - none
1 - world
2 - group and world
3 - owner, group, and world

This would remove the ability to enable our restrictions for group (and
owner) while disabling them for world, but that's likely just an obscure
misconfiguration.

Unfortunately, this is not compatible with my other idea of easily
extending a tested policy from world onto group by simply assigning the
same value to the other sysctl.

I'm not sure which of these approaches is cleaner.

Alexander

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-12-11 15:33               ` Solar Designer
@ 2017-12-12 18:00                 ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-12-12 18:00 UTC (permalink / raw)
  To: Solar Designer; +Cc: Kernel Hardening, Matthew Wilcox

2017-12-11 16:33 GMT+01:00 Solar Designer <solar@openwall.com>:
> On Mon, Dec 11, 2017 at 01:07:36PM +0100, Salvatore Mesoraca wrote:
> > 2017-12-08 13:17 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > On Thu, Dec 07, 2017 at 11:23:34PM +0100, Solar Designer wrote:
> > > > On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote:
> > > > > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > > > > So the current name is "protected_sticky_child_create" (I couldn't even
> > > > > > recall it, had to look it up for this reply).  This unnecessarily
> > > > > > bundles this potentially more general policy stuff with the existing
> > > > > > "protections" against specific attacks, unnecessarily limits scope to
> > > > > > "sticky" and "create", and talks about some "child".  How about we use
> > > > > > something totally different, focusing on "policy"?  It could be simply
> > > > > > "policy" (we're already in "fs"), or if that won't fly then how about
> > > > > > "security_policy" or "dac_policy"?  We will be imposing extra
> > > > > > restrictions on top of usual Unix discretionary access control
> > > > > > permission bits while not going all the way to mandatory access control
> > > > > > (not tying objects to subjects).  So in a sense we'll have an extension
> > > > > > of Unix DAC.
> > > > >
> > > > > Yea, I like "dac_policy" very much.
> > > >
> > > > OK.  Thinking of this further, what we might want to have is a generic
> > > > policy against file accesses that would negate a privilege boundary
> > > > between Unix (pseudo-)users.  And one of those would be execve(2) of a
> > > > file writable by someone else, or with any of its parent directories
> > > > writable by someone else (unless that's root, indeed).
> >
> > This looks similar to what grsecurity calls TPE,
>
> Related, but not very similar.  The goal is almost the opposite.
>
> TPE aims to prevent the user from bringing and running their own (and
> other non-root users') binaries, so it restricts the invoking user.
>
> [ FWIW, it's restrictive tendencies like this that led to the name
> Openwall at the time when TPE was new (it was already called that in a
> 1998 Phrack article, pre-dating grsecurity), meaning we'd focus on
> security without restricting the user's reasonable use of the system.
> In late 1990s and early 2000s, it almost looked like e.g. shared web
> hosting providers would forbid convenient and full Unix shell access
> (while nevertheless allowing PHP script uploads, etc.)  Some did.
> But then this trend reversed. ]
>
> The potential inclusion of execve(2) in our safety policy would prevent
> the user from invoking binaries that are under another non-root user's
> control.
>
> So the similarity to TPE is that either policy would allow invoking
> binaries that are part of the system.  The difference, roughly speaking,
> is that TPE would disallow the user's own binaries whereas we would
> disallow only someone else's binaries.
>
> > I think that there is
> > already some ongoing work to port it upstream as an LSM.
>
> Yes, this was discussed in here and I criticized it, but not CC'ing LKML
> not to obstruct that effort.
>
> > I don't know
> > if it's still under development, I didn't heard about it in a while.
>
> I also don't know what its current status is.
>
> > Actually what you are proposing here is a bit different, because it will
> > never prevent the user from running its own executables,
>
> Exactly.
>
> > so its usefulness
> > isn't completely defeated by language interpreters etc. Still, scripts that
> > are directly "sourced" will escape this protection, but at least we'll
> > cover many other cases.
>
> We should indeed consider the limitations of our approach, but I think
> without reference to TPE (and its bypasses) as TPE's goal is different.
>
> For scripts that are directly "sourced", we could have a policy on
> open(2), where I suggested optionally including even O_RDONLY.  There's
> no fundamental difference between (interpreted) code and data input to a
> program - either may control that program.  This is why merely reading a
> file under someone else's control is a risk.
>
> However, as I suggested so far, for open(2) we'd only potentially
> restrict file access in unsafe directories (eventually including their
> chain of parent directories).  To deal with the risk of someone
> "sourcing" a script or reading a crucial configuration file directly
> writable by someone else, yet located in a safe directory, we'd have to
> extend those restrictions to target file permissions, like I suggested
> we do for execve(2).  Unfortunately, this will probably cover many more
> reasonable and safe uses (where the program reading either knows to
> distrust the file contents or deliberately trusts whoever else can write
> the file).  So if we go for this policy extension at all (perhaps for
> special-purpose/embedded distros that are developed to meet the
> policy?), we should probably have a separate bit to control it.

Yes, a restriction like this will probably break many safe uses, but it can
also be useful in some cases.

> > We should add a restriction on executable mmaps, too.
>
> We're focusing on accesses via pathnames, and that would be a
> restriction on what can be done on an open fd (what file permissions
> would we use? as of when?)  It'd make some sense if we were to add
> restrictions on open(2) of files with unsafe permissions (on the file
> itself, not the directories), as I mentioned in the paragraph above.
> Then it'd allow us to make that restriction more fine-grained: apply it
> separately to shared libraries and such vs. arbitrary other files.  But
> it's tricky.  So while we may have it in mind and not rule it out with
> the interface we design now, it's not something we should decide on now.
> Another drawback of these policy extensions (including some from my
> previous message) is that we'd be getting into restricting the user (not
> our goal) rather than just preventing the user from doing things
> unsafely yet having a (preferably safe) way to achieve their goal (this
> is our goal).  Sometimes a user will deliberately share a file with
> another user, and the other user should ideally have a way to read that
> file, even if by knowingly accepting a risk (although that's not ideal).
>
> This makes me sad, but I don't have a good suggestion except to ensure
> we're not forcing a sysadmin to enable these parts of the policy along
> with those parts that do not have such unfortunate user-restrictive
> properties.  So this should affect our decisions on the grouping of
> potential policy bits.

Agreed.

> > > > I think the name
> > > > "dac_policy" still fits this well, but anything with "create" or even
> > > > "open" in the name would not fit it.
> > > >
> > > > Another way of looking at this is that we'd be creating a reverse DAC -
> > > > optionally blocking unsafe accesses made possible by too permissive DAC.
> > > > But I don't have a good sysctl name suggestion along these lines.
> > >
> > > Yet another thought is that while we won't go for MAC ("not tying
> > > objects to subjects") in our additional policy against unsafe file
> > > accesses, it could reasonably be extended to cover not only accesses
> > > made possible by too permissive DAC, but also those made possible by too
> > > permissive ACLs.  I doubt we'd want to actually add ACL support in
> > > there (too obscure, unnecessary when auditing software), but if someone
> > > eventually does that would fit in with our general approach (and
> > > wouldn't even require separate flags, as it's conceptually similar to
> > > group-writable).
> >
> > I'm not sure if I understood what you meant, but the "group-writable" check
> > implicitly checks for permissive ACLs, because they force the group bits on.
>
> Are you referring to how ACLs are currently implemented in the kernel?
> I'm not familiar with that.  If you like and have time, you might want
> to illustrate your point by a code snippet (if I understood you right,
> that would be code already in the kernel), although I think it's not
> essential to the rest of this discussion now (we should prefer sysctl
> names not conflicting with this potential goal either way).

[disclaimer]
While working on v1 of this patchset I was wondering if ACLs could be used
to bypass these protections and so I dug a bit in the ACLs code in the
kernel. I'm far from an expert on the subject and, I have to admit, I never
used ACLs in real world.
[/disclaimer]
In my understanding, the permission bits granted to some user or group via ACLs
are always masked against a "mask object" [1], so that ACLs cannot grant
permissions that are not allowed by the "mask object" (except for the file
owner). This "mask object" appears to be tied to the group permissions. In fact,
whenever the group permissions are changed, it gets changed too [2] and whenever
the "mask object" is changed, the file "p_mode" is changed accordingly [3][4].
In practice, ACLs cannot grant permissions that are not already granted in the
mode bits. This is confirmed, in a convoluted way, in acl(5):
"The ACL_MASK entry denotes the maximum access rights that can be granted by
entries of type ACL_USER, ACL_GROUP_OBJ, or ACL_GROUP."
...
"An ACL that contains entries of ACL_USER or ACL_GROUP tag types must contain
exactly one entry of the ACL_MASK tag type"
...
"If the ACL has an ACL_MASK entry, the group permissions correspond to the
permissions of the ACL_MASK entry"

e.g.
    host $ SOMEUSER='someuser'
    host $ touch FILE
    host $ chmod 0600 FILE
    host $ stat -c '%a' FILE
    600
    host $ strace -echmod,fchmod setfacl -m "u:$SOMEUSER:rwx" FILE
    +++ exited with 0 +++
    host $ stat -c '%a' FILE    #setfacl sets the mask_obj for us
    670
    host $ sudo -u $SOMEUSER touch FILE
    host $ chmod 0600 FILE
    host $ getfacl FILE
    # file: FILE
    # owner: root
    # group: root
    user::rw-
    user:someuser:rwx                 #effective:---
    group::---
    mask::---
    other::---
    host $ LC_ALL=C sudo -u $SOMEUSER touch FILE
    touch: cannot touch 'FILE': Permission denied

[1] http://elixir.free-electrons.com/linux/v4.15-rc3/source/fs/posix_acl.c#L392
[2] http://elixir.free-electrons.com/linux/v4.15-rc3/source/fs/posix_acl.c#L507
[3] http://elixir.free-electrons.com/linux/v4.15-rc3/source/fs/posix_acl.c#L643
[4] http://elixir.free-electrons.com/linux/v4.15-rc3/source/fs/posix_acl.c#L302

> > > So an ideal name would not conflict with that.
> > >
> > > What would be a good short name for "file access safety policy"?  Maybe
> > > we can omit "file" considering we're in "fs".  "access_safety_policy"?
> > >
> > > The individual flags might correspond to detecting and/or blocking:
> > >
> > > O_CREAT w/o O_EXCL in +t world-writable directory except if file exists
> > > and is owned by root or by current fsuid and the directory owner is root
> > > or current fsuid
> > >
> > > O_CREAT w/o O_EXCL in +t group-writable directory except if file exists
> > > and is owned by root or by current fsuid and the directory owner is root
> > > or current fsuid, or O_CREAT w/o O_EXCL in +t directory owned by another
> > > non-root user (regardless of whether the file exists and its ownership)
> > >
> > > (Note: the "+t directory owned by another non-root user" case is for
> > > consistency and for software testing only, not for enforcing a policy
> > > like this in production, because the directory owner can remove the +t.)
> > >
> > > O_CREAT w/o O_EXCL in -t world-writable directory (no exceptions)
> > >
> > > O_CREAT w/o O_EXCL in -t group-writable directory (no exceptions), or
> > > O_CREAT w/o O_EXCL in any directory owned by another non-root user
> > > (regardless of the sticky bit, and also without exceptions)
> > >
> > > Same as above for O_WRONLY or O_RDWR, so 4 more flags.
> > >
> > > Same as above for O_RDONLY, so 4 more flags.
> > >
> > > Other unsafe file accesses via pathname, including execve(2) (would
> > > consider permissions on the file itself, as well as the directory),
> >
> > I'll use the same bit used to restrict execve to restrict also
> > executable mmaps.
> > What do you think?
>
> I think we're not ready to discuss that.
>
> execve(2) itself more obviously fits in with what we were discussing so
> far because of relevance of directory permissions and because of access
> via pathname in this very syscall.

OK.

> > > chmod(2), chown(2).  That's maybe 6 more flags if we continue to
> > > separate world vs. group/user, even if we don't separate by +t (but we
> > > could nevertheless trust a pre-existing file in +t under the same
> > > conditions listed for "O_CREAT w/o O_EXCL in +t").
> > >
> > > Then we could want to eventually add checking of all parent directories,
> > > and this in turn would make more syscalls relevant (in the sense of
> > > being usable for an immediate attack on the invoking user, making them
> > > do something unintended) in the special case when an upper level
> > > directory is unsafe: symlink(2), link(2), unlink(2), rename(2),
> > > mkdir(2), rmdir(2), perhaps more.  Just for these 6 I quickly
> > > identified, and if we continue to separate world vs. group/user, this
> > > could be 12 more flags.
> > >
> > > So this would be 4+4+6+12 = 26 flags already.  This feels like way too
> > > many, as well as not leaving enough bits e.g. in a 32-bit mask in case
> > > we end up identifying many more relevant syscalls.  So perhaps the
> > > approach above is too fine-grained and should be revised.  If so, maybe
> > > our start with two bits for "O_CREAT w/o O_EXCL in +t" is already too
> > > fine-grained and should be revised as well.
> > >
> > > What do you think?
> > >
> > > How to group these categories in fewer flags best?  Perhaps less
> > > separation of the open(2) flags and/or of the syscalls relevant only
> > > with unsafe upper-level directories (one flag for all of them?)
> > >
> > > Drop the +t / -t separation for open(2)?  (But nevertheless trust a
> > > pre-existing file in +t under the same conditions as I listed for
> > > "O_CREAT w/o O_EXCL in +t".)
> > >
> > > Drop the world vs. group/user separation?
> > >
> > > Introduce more sysctl's - separate for open(2) vs. other syscalls?
> > >
> > > And/or maybe separate for world vs. group/user-writable?  This would be
> > > convenient in that the same bit positions would refer to the same
> > > open(2) modes in the same directory categories, or the same syscalls.
> > > Extending a previously tested policy for world-writable to cover also
> > > group/user-writable would be as simple as setting the other sysctl to
> > > the same value.
> > >
> > > Any other ideas?
> >
> > I'd group some of those restrictions to be controlled by the same bit,
> > something like:
> >  - open(2)
>
> I think at least writes vs. reads are very different.

Agreed.

> >  - execve(2), mmap(2)
> >  - chmod(2), chown(2)
> >  - mkdir(2), rmdir(2)
> >  - symlink(2), link(2), unlink(2), rename(2)
>
> Maybe, perhaps except for mmap(2).

OK.

> > And I'd use the same bitmasks in the following sysctls:
> >  - access_control_policy
> >  - access_control_policy_notify
> >  - access_control_policy_group
> >  - access_control_policy_group_notify
>
> Our policy is not about access control - it is about access safety.

Yes, you are right.

> (It's the same confusion with the comparison to TPE again.  We should
> actually try and avoid getting into the access control territory here.)
>
> Also, in your list "access_control_policy" feels like it's the main
> thing, whereas actually it's just one of the four (for world-writable
> directories, right?)
>
> We need something like:
>
> safety_world_enforce
> safety_world_notify
> safety_group_enforce
> safety_group_notify
>
> but I'm afraid these are not self-explanatory at all.

Maybe we could use these names for the moment and change them later,
once we'll have better defined this feature.

> > What do you think?
>
> This will probably need even further discussion, implementation, some
> usage experience, then changes or maybe even a redesign before we
> possibly propose it for inclusion upstream.  For now, we're just
> thinking out loud, and appear unready to propose anything upstream.
>
> As discussed previously, you should nevertheless submit the FIFO and
> regular file protections in +t as a separate patch.

Yes, sure.

> > Anyway, given that the scope of this feature is growing bigger and bigger,
> > should it be reimplemented as an LSM?
>
> I had this thought, too.  I think we should consider possible races (are
> there races that would apply in an LSM but would be avoided in a patch?),

Now that I think about it,  I don't believe there is any way, for an LSM,
to know if a file as been opened using the O_EXCL flag or not.
So, with the current framework, this feature (or at least part of it)
can't be implemented (AFAIK).
Of course we can always add a new hook and, if we go down that route, we
can prevent any possible race.

> maintenance (which implementation would likely be better maintained?),

Both solutions have pros and cons in this regard, I'm not sure which one
would be better.

> and easy availability (so that e.g. someone developing software on e.g.
> Ubuntu could easily partially turn this on temporarily, even if Ubuntu
> itself does not fully conform to the policy and thus wouldn't normally
> ship with these features - and would likely not include an LSM then?)

I don't think that this would be a big problem. If it's stackable and "off
by default", distro maintainers could enable it without issues.
Ubuntu kernel, in particular, is configured with support for any existing
LSM, it defaults to AppArmor + Yama, but one could enable SELinux (or any
other LSM) at boot.

My impression is that an LSM has more chances to be accepted by upstream,
especially if this feature needs to touch many different parts of fs/

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories
  2017-12-11 16:07           ` Solar Designer
@ 2017-12-12 18:01             ` Salvatore Mesoraca
  0 siblings, 0 replies; 60+ messages in thread
From: Salvatore Mesoraca @ 2017-12-12 18:01 UTC (permalink / raw)
  To: Solar Designer; +Cc: Kernel Hardening, Matthew Wilcox

2017-12-11 17:07 GMT+01:00 Solar Designer <solar@openwall.com>:
> On Thu, Dec 07, 2017 at 11:23:34PM +0100, Solar Designer wrote:
> > On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote:
> > > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > > A new thought: a directory that has someone else as its owner is for our
> > > > purposes effectively the same as a group-writable directory.  So maybe
> > > > whatever we'll implement for group-writable directories should also be
> > > > done for directories that have other than the current fsuid as their
> > > > owner. It's currently very uncommon to have directories with the sticky
> > > > bit set that are not at least group-writable, so this will rarely make a
> > > > difference (and when it does, that's just right), but also it provides a
> > > > way to explicitly include any dibrectory under this monitoring (if the
> > > > group-writable protection is on) - e.g., if a sysadmin wants this
> > > > monitoring for users' home directories, they can change permissions for
> > > > those from e.g. 700 to 1700.  This could be handy for development and
> > > > auditing of software, even though in production it could be easily
> > > > circumvented by the directory owner (who can remove the sticky bit,
> > > > which we should document to avoid providing a false sense of security)
> > > > and it won't automatically apply to subdirectories.  It'd also cover
> > > > part of what we intend to achieve later by possibly extending the
> > > > feature to non-sticky directories, where we might also want to treat
> > > > different owner the same as group-writable (without the circumvention).
> > >
> > > Agreed. I'll extend it to also check for the directory owner.
> > > Maybe I could use another bit and make this additional restriction
> > > independent from the "group-writable" one.
> >
> > I'd prefer not to waste a bit on this.  We'll need plenty of bits later
> > on, perhaps also in pairs or (with your suggestion to separate this one)
> > in triplets, and every extra bit complicates the calculation of the
> > bitmask value.
>
> OTOH, if we were to interleave the world and group bits, then maybe
> instead of having the world and group bits as separate we could have a
> two bit number, which naturally leaves us an extra value we'd use for
> owner.  So:
>
> 0 - none
> 1 - world
> 2 - group and world
> 3 - owner, group, and world
>
> This would remove the ability to enable our restrictions for group (and
> owner) while disabling them for world, but that's likely just an obscure
> misconfiguration.
>
> Unfortunately, this is not compatible with my other idea of easily
> extending a tested policy from world onto group by simply assigning the
> same value to the other sysctl.
>
> I'm not sure which of these approaches is cleaner.

I prefer the version with separate sysctls for world/group, it seems more
usable.

Salvatore

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2017-12-12 18:01 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-22  8:01 [PATCH v3 0/2] Restrict dangerous open in sticky directories Salvatore Mesoraca
2017-11-22  8:01 ` [kernel-hardening] " Salvatore Mesoraca
2017-11-22  8:01 ` [PATCH v3 1/2] Protected FIFOs and regular files Salvatore Mesoraca
2017-11-22  8:01   ` [kernel-hardening] " Salvatore Mesoraca
2017-11-23 22:43   ` Tobin C. Harding
2017-11-24  8:24     ` Salvatore Mesoraca
2017-11-22  8:01 ` [PATCH v3 2/2] Protected O_CREAT open in sticky directories Salvatore Mesoraca
2017-11-22  8:01   ` [kernel-hardening] " Salvatore Mesoraca
2017-11-22 12:03   ` Pavel Vasilyev
2017-11-24  8:28     ` Salvatore Mesoraca
2017-11-22 13:22   ` Matthew Wilcox
2017-11-22 13:22     ` [kernel-hardening] " Matthew Wilcox
2017-11-22 14:38     ` Pavel Vasilyev
2017-11-24  8:29     ` Salvatore Mesoraca
2017-11-24  8:29       ` [kernel-hardening] " Salvatore Mesoraca
2017-11-22 16:51   ` Alan Cox
2017-11-22 16:51     ` [kernel-hardening] " Alan Cox
2017-11-24  8:31     ` Salvatore Mesoraca
2017-11-24  8:31       ` [kernel-hardening] " Salvatore Mesoraca
2017-11-24 10:53     ` David Laight
2017-11-24 10:53       ` [kernel-hardening] " David Laight
2017-11-24 11:43       ` Salvatore Mesoraca
2017-11-24 11:43         ` [kernel-hardening] " Salvatore Mesoraca
2017-11-24 11:53         ` David Laight
2017-11-24 11:53           ` [kernel-hardening] " David Laight
2017-11-26 11:29           ` Salvatore Mesoraca
2017-11-26 11:29             ` [kernel-hardening] " Salvatore Mesoraca
2017-11-27  0:26         ` Solar Designer
2017-11-27  0:26           ` [kernel-hardening] " Solar Designer
2017-11-30 14:39           ` Salvatore Mesoraca
2017-11-30 14:39             ` [kernel-hardening] " Salvatore Mesoraca
2017-11-30 14:57             ` Ian Campbell
2017-11-30 16:30               ` [kernel-hardening] " Solar Designer
2017-12-05 10:21                 ` Salvatore Mesoraca
2017-12-07 21:47                   ` Solar Designer
2017-12-11 12:08                     ` Salvatore Mesoraca
2017-11-23 22:57   ` Tobin C. Harding
2017-11-24  8:34     ` Salvatore Mesoraca
2017-11-30 16:53   ` David Laight
2017-11-30 16:53     ` [kernel-hardening] " David Laight
2017-11-30 17:51     ` Solar Designer
2017-11-30 17:51       ` [kernel-hardening] " Solar Designer
2017-12-01  9:46       ` David Laight
2017-12-01  9:46         ` [kernel-hardening] " David Laight
2017-12-01 15:52         ` Alan Cox
2017-12-01 15:52           ` [kernel-hardening] " Alan Cox
2017-11-27  1:14 ` [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous " Solar Designer
2017-11-27 13:18   ` Solar Designer
2017-11-30 14:38     ` Salvatore Mesoraca
2017-11-30 14:37   ` Salvatore Mesoraca
2017-11-30 19:05     ` Solar Designer
2017-11-30 19:14       ` Solar Designer
2017-12-05 10:13       ` Salvatore Mesoraca
2017-12-07 22:23         ` Solar Designer
2017-12-08 12:17           ` Solar Designer
2017-12-11 12:07             ` Salvatore Mesoraca
2017-12-11 15:33               ` Solar Designer
2017-12-12 18:00                 ` Salvatore Mesoraca
2017-12-11 16:07           ` Solar Designer
2017-12-12 18:01             ` Salvatore Mesoraca

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.