All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2012.5 0/2] fs: add link restrictions
@ 2012-07-02 20:17 ` Kees Cook
  0 siblings, 0 replies; 14+ messages in thread
From: Kees Cook @ 2012-07-02 20:17 UTC (permalink / raw)
  To: Al Viro, Andrew Morton
  Cc: linux-kernel, linux-fsdevel, Eric Paris, Matthew Wilcox,
	Doug Ledford, Joe Korty, Eric W. Biederman, Ingo Molnar,
	David Howells, James Morris, linux-doc, Dan Rosenberg,
	kernel-hardening, Kees Cook

This cleans up the various issues that Al Viro pointed out[1], and is 
meant to be applied on top of the recent vfs tree, which changed how 
path_openat handles its error conditions[2].

Thanks,

-Kees

[1] https://lkml.org/lkml/2012/6/30/42
[2] http://git.kernel.org/?p=linux/kernel/git/viro/vfs.git;a=commitdiff;h=09dd2a0348dead8f8eaefe301f1f8cfd0473a4b2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [kernel-hardening] [PATCH v2012.5 0/2] fs: add link restrictions
@ 2012-07-02 20:17 ` Kees Cook
  0 siblings, 0 replies; 14+ messages in thread
From: Kees Cook @ 2012-07-02 20:17 UTC (permalink / raw)
  To: Al Viro, Andrew Morton
  Cc: linux-kernel, linux-fsdevel, Eric Paris, Matthew Wilcox,
	Doug Ledford, Joe Korty, Eric W. Biederman, Ingo Molnar,
	David Howells, James Morris, linux-doc, Dan Rosenberg,
	kernel-hardening, Kees Cook

This cleans up the various issues that Al Viro pointed out[1], and is 
meant to be applied on top of the recent vfs tree, which changed how 
path_openat handles its error conditions[2].

Thanks,

-Kees

[1] https://lkml.org/lkml/2012/6/30/42
[2] http://git.kernel.org/?p=linux/kernel/git/viro/vfs.git;a=commitdiff;h=09dd2a0348dead8f8eaefe301f1f8cfd0473a4b2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] fs: add link restrictions
  2012-07-02 20:17 ` [kernel-hardening] " Kees Cook
@ 2012-07-02 20:17   ` Kees Cook
  -1 siblings, 0 replies; 14+ messages in thread
From: Kees Cook @ 2012-07-02 20:17 UTC (permalink / raw)
  To: Al Viro, Andrew Morton
  Cc: linux-kernel, linux-fsdevel, Eric Paris, Matthew Wilcox,
	Doug Ledford, Joe Korty, Eric W. Biederman, Ingo Molnar,
	David Howells, James Morris, linux-doc, Dan Rosenberg,
	kernel-hardening, Kees Cook, Andrew Morton

This adds symlink and hardlink restrictions to the Linux VFS.

Symlinks:

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

Some pointers to the history of earlier discussion that I could find:

 1996 Aug, Zygo Blaxell
  http://marc.info/?l=bugtraq&m=87602167419830&w=2
 1996 Oct, Andrew Tridgell
  http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
 1997 Dec, Albert D Cahalan
  http://lkml.org/lkml/1997/12/16/4
 2005 Feb, Lorenzo Hernández García-Hierro
  http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
 2010 May, Kees Cook
  https://lkml.org/lkml/2010/5/30/144

Past objections and rebuttals could be summarized as:

 - Violates POSIX.
   - POSIX didn't consider this situation and it's not useful to follow
     a broken specification at the cost of security.
 - Might break unknown applications that use this feature.
   - Applications that break because of the change are easy to spot and
     fix. Applications that are vulnerable to symlink ToCToU by not having
     the change aren't. Additionally, no applications have yet been found
     that rely on this behavior.
 - Applications should just use mkstemp() or O_CREATE|O_EXCL.
   - True, but applications are not perfect, and new software is written
     all the time that makes these mistakes; blocking this flaw at the
     kernel is a single solution to the entire class of vulnerability.
 - This should live in the core VFS.
   - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
 - This should live in an LSM.
   - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

Hardlinks:

On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.

Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

---
v2012.5:
 - updates requested by Al Viro:
   - remove CONFIGs
   - pass nd for parent checking
   - release path on error
v2012.4:
 - split audit functions into a separate patch, suggested by Eric Paris.
v2012.3:
 While this code has been living in -mm and linux-next for 2 releases,
 this is a small rework based on feedback from Al Viro:
 - Moved audit functions into audit.c.
 - Added tests directly to path_openat/path_lookupat.
 - Merged with hardlink restriction patch to make things more sensible.
v2012.2:
 - Change sysctl mode to 0600, suggested by Ingo Molnar.
 - Rework CONFIG logic to split code from default behavior.
 - Renamed sysctl to have a "sysctl_" prefix, suggested by Andrew Morton.
 - Use "true/false" instead of "1/0" for bool arg, thanks to Andrew Morton.
 - Do not trust s_id to be safe to print, suggested by Andrew Morton.
v2012.1:
 - Use GFP_KERNEL for audit log allocation, thanks to Ingo Molnar.
v2011.3:
 - Add pid/comm back to logging.
v2011.2:
 - Updated documentation, thanks to Randy Dunlap.
 - Switched Kconfig default to "y", added __read_mostly to sysctl,
   thanks to Ingo Molnar.
 - Switched to audit logging to gain safe path and name reporting when
   hitting the restriction.
v2011.1:
 - back from hiatus
---
 Documentation/sysctl/fs.txt |   42 +++++++++++++++
 fs/namei.c                  |  121 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h          |    2 +
 kernel/sysctl.c             |   18 ++++++
 4 files changed, 183 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 13d6166..d4a372e 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -32,6 +32,8 @@ Currently, these files are in /proc/sys/fs:
 - nr_open
 - overflowuid
 - overflowgid
+- protected_hardlinks
+- protected_symlinks
 - suid_dumpable
 - super-max
 - super-nr
@@ -157,6 +159,46 @@ The default is 65534.
 
 ==============================================================
 
+protected_hardlinks:
+
+A long-standing class of security issues is the hardlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given hardlink (i.e. a
+root process follows a hardlink created by another user). Additionally,
+on systems without separated partitions, this stops unauthorized users
+from "pinning" vulnerable setuid/setgid files against being upgraded by
+the administrator, or linking to special files.
+
+When set to "0", hardlink creation behavior is unrestricted.
+
+When set to "1" hardlinks cannot be created by users if they do not
+already own the source file, or do not have read/write access to it.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+==============================================================
+
+protected_symlinks:
+
+A long-standing class of security issues is the symlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given symlink (i.e. a
+root process follows a symlink belonging to another user). For a likely
+incomplete list of hundreds of examples across the years, please see:
+http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
+
+When set to "0", symlink following behavior is unrestricted.
+
+When set to "1" symlinks are permitted to be followed only when outside
+a sticky world-writable directory, or when the uid of the symlink and
+follower match, or when the directory owner matches the symlink's owner.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+==============================================================
+
 suid_dumpable:
 
 This value can be used to query and set the core dump mode for setuid
diff --git a/fs/namei.c b/fs/namei.c
index 1b64746..8712c14 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -641,6 +641,118 @@ static inline void put_link(struct nameidata *nd, struct path *link, void *cooki
 	path_put(link);
 }
 
+int sysctl_protected_symlinks __read_mostly = 1;
+int sysctl_protected_hardlinks __read_mostly = 1;
+
+/**
+ * may_follow_link - Check symlink following for unsafe situations
+ * @link: The path of the symlink
+ *
+ * In the case of the sysctl_protected_symlinks sysctl being enabled,
+ * CAP_DAC_OVERRIDE needs to be specifically ignored if the symlink is
+ * in a sticky world-writable directory. This is to protect privileged
+ * processes from failing races against path names that may change out
+ * from under them by way of other users creating malicious symlinks.
+ * It will permit symlinks to be followed only when outside a sticky
+ * world-writable directory, or when the uid of the symlink and follower
+ * match, or when the directory owner matches the symlink's owner.
+ *
+ * Returns 0 if following the symlink is allowed, -ve on error.
+ */
+static inline int may_follow_link(struct path *link, struct nameidata *nd)
+{
+	const struct inode *inode;
+	const struct inode *parent;
+
+	if (!sysctl_protected_symlinks)
+		return 0;
+
+	/* Allowed if owner and follower match. */
+	inode = link->dentry->d_inode;
+	if (current_cred()->fsuid == inode->i_uid)
+		return 0;
+
+	/* Allowed if parent directory not sticky and world-writable. */
+	parent = nd->path.dentry->d_inode;
+	if ((parent->i_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
+		return 0;
+
+	/* Allowed if parent directory and link owner match. */
+	if (parent->i_uid == inode->i_uid)
+		return 0;
+
+	path_put(&nd->path);
+	return -EACCES;
+}
+
+/**
+ * safe_hardlink_source - Check for safe hardlink conditions
+ * @inode: the source inode to hardlink from
+ *
+ * Return false if at least one of the following conditions:
+ *    - inode is not a regular file
+ *    - inode is setuid
+ *    - inode is setgid and group-exec
+ *    - access failure for read and write
+ *
+ * Otherwise returns true.
+ */
+static bool safe_hardlink_source(struct inode *inode)
+{
+	umode_t mode = inode->i_mode;
+
+	/* Special files should not get pinned to the filesystem. */
+	if (!S_ISREG(mode))
+		return false;
+
+	/* Setuid files should not get pinned to the filesystem. */
+	if (mode & S_ISUID)
+		return false;
+
+	/* Executable setgid files should not get pinned to the filesystem. */
+	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+		return false;
+
+	/* Hardlinking to unreadable or unwritable sources is dangerous. */
+	if (inode_permission(inode, MAY_READ | MAY_WRITE))
+		return false;
+
+	return true;
+}
+
+/**
+ * may_linkat - Check permissions for creating a hardlink
+ * @link: the source to hardlink from
+ *
+ * Block hardlink when all of:
+ *  - sysctl_protected_hardlinks enabled
+ *  - fsuid does not match inode
+ *  - hardlink source is unsafe (see safe_hardlink_source() above)
+ *  - not CAP_FOWNER
+ *
+ * Returns 0 if successful, -ve on error.
+ */
+static int may_linkat(struct path *link)
+{
+	const struct cred *cred;
+	struct inode *inode;
+
+	if (!sysctl_protected_hardlinks)
+		return 0;
+
+	cred = current_cred();
+	inode = link->dentry->d_inode;
+
+	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
+	 * otherwise, it must be a safe source.
+	 */
+	if (cred->fsuid == inode->i_uid || safe_hardlink_source(inode) ||
+	    capable(CAP_FOWNER))
+		return 0;
+
+	return -EPERM;
+}
+
 static __always_inline int
 follow_link(struct path *link, struct nameidata *nd, void **p)
 {
@@ -1815,6 +1927,9 @@ static int path_lookupat(int dfd, const char *name,
 		while (err > 0) {
 			void *cookie;
 			struct path link = path;
+			err = may_follow_link(&link, nd);
+			if (unlikely(err))
+				break;
 			nd->flags |= LOOKUP_PARENT;
 			err = follow_link(&link, nd, &cookie);
 			if (err)
@@ -2774,6 +2889,9 @@ static struct file *path_openat(int dfd, const char *pathname,
 			error = -ELOOP;
 			break;
 		}
+		error = may_follow_link(&link, nd);
+		if (unlikely(error))
+			break;
 		nd->flags |= LOOKUP_PARENT;
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 		error = follow_link(&link, nd, &cookie);
@@ -3433,6 +3551,9 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 	error = -EXDEV;
 	if (old_path.mnt != new_path.mnt)
 		goto out_dput;
+	error = may_linkat(&old_path);
+	if (unlikely(error))
+		goto out_dput;
 	error = mnt_want_write(new_path.mnt);
 	if (error)
 		goto out_dput;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 48548bd..de74812 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -437,6 +437,8 @@ extern unsigned long get_max_files(void);
 extern int sysctl_nr_open;
 extern struct inodes_stat_t inodes_stat;
 extern int leases_enable, lease_break_time;
+extern int sysctl_protected_symlinks;
+extern int sysctl_protected_hardlinks;
 
 struct buffer_head;
 typedef int (get_block_t)(struct inode *inode, sector_t iblock,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4ab1187..5d9a1d2 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1494,6 +1494,24 @@ static struct ctl_table fs_table[] = {
 #endif
 #endif
 	{
+		.procname	= "protected_symlinks",
+		.data		= &sysctl_protected_symlinks,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
+		.procname	= "protected_hardlinks",
+		.data		= &sysctl_protected_hardlinks,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
 		.maxlen		= sizeof(int),
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [kernel-hardening] [PATCH 1/2] fs: add link restrictions
@ 2012-07-02 20:17   ` Kees Cook
  0 siblings, 0 replies; 14+ messages in thread
From: Kees Cook @ 2012-07-02 20:17 UTC (permalink / raw)
  To: Al Viro, Andrew Morton
  Cc: linux-kernel, linux-fsdevel, Eric Paris, Matthew Wilcox,
	Doug Ledford, Joe Korty, Eric W. Biederman, Ingo Molnar,
	David Howells, James Morris, linux-doc, Dan Rosenberg,
	kernel-hardening, Kees Cook

This adds symlink and hardlink restrictions to the Linux VFS.

Symlinks:

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

Some pointers to the history of earlier discussion that I could find:

 1996 Aug, Zygo Blaxell
  http://marc.info/?l=bugtraq&m=87602167419830&w=2
 1996 Oct, Andrew Tridgell
  http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
 1997 Dec, Albert D Cahalan
  http://lkml.org/lkml/1997/12/16/4
 2005 Feb, Lorenzo Hernández García-Hierro
  http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
 2010 May, Kees Cook
  https://lkml.org/lkml/2010/5/30/144

Past objections and rebuttals could be summarized as:

 - Violates POSIX.
   - POSIX didn't consider this situation and it's not useful to follow
     a broken specification at the cost of security.
 - Might break unknown applications that use this feature.
   - Applications that break because of the change are easy to spot and
     fix. Applications that are vulnerable to symlink ToCToU by not having
     the change aren't. Additionally, no applications have yet been found
     that rely on this behavior.
 - Applications should just use mkstemp() or O_CREATE|O_EXCL.
   - True, but applications are not perfect, and new software is written
     all the time that makes these mistakes; blocking this flaw at the
     kernel is a single solution to the entire class of vulnerability.
 - This should live in the core VFS.
   - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
 - This should live in an LSM.
   - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

Hardlinks:

On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.

Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

---
v2012.5:
 - updates requested by Al Viro:
   - remove CONFIGs
   - pass nd for parent checking
   - release path on error
v2012.4:
 - split audit functions into a separate patch, suggested by Eric Paris.
v2012.3:
 While this code has been living in -mm and linux-next for 2 releases,
 this is a small rework based on feedback from Al Viro:
 - Moved audit functions into audit.c.
 - Added tests directly to path_openat/path_lookupat.
 - Merged with hardlink restriction patch to make things more sensible.
v2012.2:
 - Change sysctl mode to 0600, suggested by Ingo Molnar.
 - Rework CONFIG logic to split code from default behavior.
 - Renamed sysctl to have a "sysctl_" prefix, suggested by Andrew Morton.
 - Use "true/false" instead of "1/0" for bool arg, thanks to Andrew Morton.
 - Do not trust s_id to be safe to print, suggested by Andrew Morton.
v2012.1:
 - Use GFP_KERNEL for audit log allocation, thanks to Ingo Molnar.
v2011.3:
 - Add pid/comm back to logging.
v2011.2:
 - Updated documentation, thanks to Randy Dunlap.
 - Switched Kconfig default to "y", added __read_mostly to sysctl,
   thanks to Ingo Molnar.
 - Switched to audit logging to gain safe path and name reporting when
   hitting the restriction.
v2011.1:
 - back from hiatus
---
 Documentation/sysctl/fs.txt |   42 +++++++++++++++
 fs/namei.c                  |  121 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h          |    2 +
 kernel/sysctl.c             |   18 ++++++
 4 files changed, 183 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 13d6166..d4a372e 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -32,6 +32,8 @@ Currently, these files are in /proc/sys/fs:
 - nr_open
 - overflowuid
 - overflowgid
+- protected_hardlinks
+- protected_symlinks
 - suid_dumpable
 - super-max
 - super-nr
@@ -157,6 +159,46 @@ The default is 65534.
 
 ==============================================================
 
+protected_hardlinks:
+
+A long-standing class of security issues is the hardlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given hardlink (i.e. a
+root process follows a hardlink created by another user). Additionally,
+on systems without separated partitions, this stops unauthorized users
+from "pinning" vulnerable setuid/setgid files against being upgraded by
+the administrator, or linking to special files.
+
+When set to "0", hardlink creation behavior is unrestricted.
+
+When set to "1" hardlinks cannot be created by users if they do not
+already own the source file, or do not have read/write access to it.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+==============================================================
+
+protected_symlinks:
+
+A long-standing class of security issues is the symlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given symlink (i.e. a
+root process follows a symlink belonging to another user). For a likely
+incomplete list of hundreds of examples across the years, please see:
+http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
+
+When set to "0", symlink following behavior is unrestricted.
+
+When set to "1" symlinks are permitted to be followed only when outside
+a sticky world-writable directory, or when the uid of the symlink and
+follower match, or when the directory owner matches the symlink's owner.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+==============================================================
+
 suid_dumpable:
 
 This value can be used to query and set the core dump mode for setuid
diff --git a/fs/namei.c b/fs/namei.c
index 1b64746..8712c14 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -641,6 +641,118 @@ static inline void put_link(struct nameidata *nd, struct path *link, void *cooki
 	path_put(link);
 }
 
+int sysctl_protected_symlinks __read_mostly = 1;
+int sysctl_protected_hardlinks __read_mostly = 1;
+
+/**
+ * may_follow_link - Check symlink following for unsafe situations
+ * @link: The path of the symlink
+ *
+ * In the case of the sysctl_protected_symlinks sysctl being enabled,
+ * CAP_DAC_OVERRIDE needs to be specifically ignored if the symlink is
+ * in a sticky world-writable directory. This is to protect privileged
+ * processes from failing races against path names that may change out
+ * from under them by way of other users creating malicious symlinks.
+ * It will permit symlinks to be followed only when outside a sticky
+ * world-writable directory, or when the uid of the symlink and follower
+ * match, or when the directory owner matches the symlink's owner.
+ *
+ * Returns 0 if following the symlink is allowed, -ve on error.
+ */
+static inline int may_follow_link(struct path *link, struct nameidata *nd)
+{
+	const struct inode *inode;
+	const struct inode *parent;
+
+	if (!sysctl_protected_symlinks)
+		return 0;
+
+	/* Allowed if owner and follower match. */
+	inode = link->dentry->d_inode;
+	if (current_cred()->fsuid == inode->i_uid)
+		return 0;
+
+	/* Allowed if parent directory not sticky and world-writable. */
+	parent = nd->path.dentry->d_inode;
+	if ((parent->i_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
+		return 0;
+
+	/* Allowed if parent directory and link owner match. */
+	if (parent->i_uid == inode->i_uid)
+		return 0;
+
+	path_put(&nd->path);
+	return -EACCES;
+}
+
+/**
+ * safe_hardlink_source - Check for safe hardlink conditions
+ * @inode: the source inode to hardlink from
+ *
+ * Return false if at least one of the following conditions:
+ *    - inode is not a regular file
+ *    - inode is setuid
+ *    - inode is setgid and group-exec
+ *    - access failure for read and write
+ *
+ * Otherwise returns true.
+ */
+static bool safe_hardlink_source(struct inode *inode)
+{
+	umode_t mode = inode->i_mode;
+
+	/* Special files should not get pinned to the filesystem. */
+	if (!S_ISREG(mode))
+		return false;
+
+	/* Setuid files should not get pinned to the filesystem. */
+	if (mode & S_ISUID)
+		return false;
+
+	/* Executable setgid files should not get pinned to the filesystem. */
+	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+		return false;
+
+	/* Hardlinking to unreadable or unwritable sources is dangerous. */
+	if (inode_permission(inode, MAY_READ | MAY_WRITE))
+		return false;
+
+	return true;
+}
+
+/**
+ * may_linkat - Check permissions for creating a hardlink
+ * @link: the source to hardlink from
+ *
+ * Block hardlink when all of:
+ *  - sysctl_protected_hardlinks enabled
+ *  - fsuid does not match inode
+ *  - hardlink source is unsafe (see safe_hardlink_source() above)
+ *  - not CAP_FOWNER
+ *
+ * Returns 0 if successful, -ve on error.
+ */
+static int may_linkat(struct path *link)
+{
+	const struct cred *cred;
+	struct inode *inode;
+
+	if (!sysctl_protected_hardlinks)
+		return 0;
+
+	cred = current_cred();
+	inode = link->dentry->d_inode;
+
+	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
+	 * otherwise, it must be a safe source.
+	 */
+	if (cred->fsuid == inode->i_uid || safe_hardlink_source(inode) ||
+	    capable(CAP_FOWNER))
+		return 0;
+
+	return -EPERM;
+}
+
 static __always_inline int
 follow_link(struct path *link, struct nameidata *nd, void **p)
 {
@@ -1815,6 +1927,9 @@ static int path_lookupat(int dfd, const char *name,
 		while (err > 0) {
 			void *cookie;
 			struct path link = path;
+			err = may_follow_link(&link, nd);
+			if (unlikely(err))
+				break;
 			nd->flags |= LOOKUP_PARENT;
 			err = follow_link(&link, nd, &cookie);
 			if (err)
@@ -2774,6 +2889,9 @@ static struct file *path_openat(int dfd, const char *pathname,
 			error = -ELOOP;
 			break;
 		}
+		error = may_follow_link(&link, nd);
+		if (unlikely(error))
+			break;
 		nd->flags |= LOOKUP_PARENT;
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 		error = follow_link(&link, nd, &cookie);
@@ -3433,6 +3551,9 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 	error = -EXDEV;
 	if (old_path.mnt != new_path.mnt)
 		goto out_dput;
+	error = may_linkat(&old_path);
+	if (unlikely(error))
+		goto out_dput;
 	error = mnt_want_write(new_path.mnt);
 	if (error)
 		goto out_dput;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 48548bd..de74812 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -437,6 +437,8 @@ extern unsigned long get_max_files(void);
 extern int sysctl_nr_open;
 extern struct inodes_stat_t inodes_stat;
 extern int leases_enable, lease_break_time;
+extern int sysctl_protected_symlinks;
+extern int sysctl_protected_hardlinks;
 
 struct buffer_head;
 typedef int (get_block_t)(struct inode *inode, sector_t iblock,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4ab1187..5d9a1d2 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1494,6 +1494,24 @@ static struct ctl_table fs_table[] = {
 #endif
 #endif
 	{
+		.procname	= "protected_symlinks",
+		.data		= &sysctl_protected_symlinks,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
+		.procname	= "protected_hardlinks",
+		.data		= &sysctl_protected_hardlinks,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
 		.maxlen		= sizeof(int),
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] fs: add link restriction audit reporting
  2012-07-02 20:17 ` [kernel-hardening] " Kees Cook
@ 2012-07-02 20:17   ` Kees Cook
  -1 siblings, 0 replies; 14+ messages in thread
From: Kees Cook @ 2012-07-02 20:17 UTC (permalink / raw)
  To: Al Viro, Andrew Morton
  Cc: linux-kernel, linux-fsdevel, Eric Paris, Matthew Wilcox,
	Doug Ledford, Joe Korty, Eric W. Biederman, Ingo Molnar,
	David Howells, James Morris, linux-doc, Dan Rosenberg,
	kernel-hardening, Kees Cook

Adds audit messages for unexpected link restriction violations so that
system owners will have some sort of potentially actionable information
about misbehaving processes.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/namei.c            |    2 ++
 include/linux/audit.h |    4 ++++
 kernel/audit.c        |   21 +++++++++++++++++++++
 3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8712c14..6167420 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -682,6 +682,7 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
 		return 0;
 
 	path_put(&nd->path);
+	audit_log_link_denied("follow_link", link);
 	return -EACCES;
 }
 
@@ -750,6 +751,7 @@ static int may_linkat(struct path *link)
 	    capable(CAP_FOWNER))
 		return 0;
 
+	audit_log_link_denied("linkat", link);
 	return -EPERM;
 }
 
diff --git a/include/linux/audit.h b/include/linux/audit.h
index 22f292a..36abf2a 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -130,6 +130,7 @@
 #define AUDIT_LAST_KERN_ANOM_MSG    1799
 #define AUDIT_ANOM_PROMISCUOUS      1700 /* Device changed promiscuous mode */
 #define AUDIT_ANOM_ABEND            1701 /* Process ended abnormally */
+#define AUDIT_ANOM_LINK		    1702 /* Suspicious use of file links */
 #define AUDIT_INTEGRITY_DATA	    1800 /* Data integrity verification */
 #define AUDIT_INTEGRITY_METADATA    1801 /* Metadata integrity verification */
 #define AUDIT_INTEGRITY_STATUS	    1802 /* Integrity enable status */
@@ -687,6 +688,8 @@ extern void		    audit_log_d_path(struct audit_buffer *ab,
 					     const struct path *path);
 extern void		    audit_log_key(struct audit_buffer *ab,
 					  char *key);
+extern void		    audit_log_link_denied(const char *operation,
+						  struct path *link);
 extern void		    audit_log_lost(const char *message);
 #ifdef CONFIG_SECURITY
 extern void 		    audit_log_secctx(struct audit_buffer *ab, u32 secid);
@@ -716,6 +719,7 @@ extern int audit_enabled;
 #define audit_log_untrustedstring(a,s) do { ; } while (0)
 #define audit_log_d_path(b, p, d) do { ; } while (0)
 #define audit_log_key(b, k) do { ; } while (0)
+#define audit_log_link_denied(o, l) do { ; } while (0)
 #define audit_log_secctx(b,s) do { ; } while (0)
 #define audit_enabled 0
 #endif
diff --git a/kernel/audit.c b/kernel/audit.c
index 1c7f2c6..fda8bd9 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1450,6 +1450,27 @@ void audit_log_key(struct audit_buffer *ab, char *key)
 }
 
 /**
+ * audit_log_link_denied - report a link restriction denial
+ * @operation: specific link opreation
+ * @link: the path that triggered the restriction
+ */
+void audit_log_link_denied(const char *operation, struct path *link)
+{
+	struct audit_buffer *ab;
+
+	ab = audit_log_start(current->audit_context, GFP_KERNEL,
+			     AUDIT_ANOM_LINK);
+	audit_log_format(ab, "op=%s action=denied", operation);
+	audit_log_format(ab, " pid=%d comm=", current->pid);
+	audit_log_untrustedstring(ab, current->comm);
+	audit_log_d_path(ab, " path=", link);
+	audit_log_format(ab, " dev=");
+	audit_log_untrustedstring(ab, link->dentry->d_inode->i_sb->s_id);
+	audit_log_format(ab, " ino=%lu", link->dentry->d_inode->i_ino);
+	audit_log_end(ab);
+}
+
+/**
  * audit_log_end - end one audit record
  * @ab: the audit_buffer
  *
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [kernel-hardening] [PATCH 2/2] fs: add link restriction audit reporting
@ 2012-07-02 20:17   ` Kees Cook
  0 siblings, 0 replies; 14+ messages in thread
From: Kees Cook @ 2012-07-02 20:17 UTC (permalink / raw)
  To: Al Viro, Andrew Morton
  Cc: linux-kernel, linux-fsdevel, Eric Paris, Matthew Wilcox,
	Doug Ledford, Joe Korty, Eric W. Biederman, Ingo Molnar,
	David Howells, James Morris, linux-doc, Dan Rosenberg,
	kernel-hardening, Kees Cook

Adds audit messages for unexpected link restriction violations so that
system owners will have some sort of potentially actionable information
about misbehaving processes.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/namei.c            |    2 ++
 include/linux/audit.h |    4 ++++
 kernel/audit.c        |   21 +++++++++++++++++++++
 3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 8712c14..6167420 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -682,6 +682,7 @@ static inline int may_follow_link(struct path *link, struct nameidata *nd)
 		return 0;
 
 	path_put(&nd->path);
+	audit_log_link_denied("follow_link", link);
 	return -EACCES;
 }
 
@@ -750,6 +751,7 @@ static int may_linkat(struct path *link)
 	    capable(CAP_FOWNER))
 		return 0;
 
+	audit_log_link_denied("linkat", link);
 	return -EPERM;
 }
 
diff --git a/include/linux/audit.h b/include/linux/audit.h
index 22f292a..36abf2a 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -130,6 +130,7 @@
 #define AUDIT_LAST_KERN_ANOM_MSG    1799
 #define AUDIT_ANOM_PROMISCUOUS      1700 /* Device changed promiscuous mode */
 #define AUDIT_ANOM_ABEND            1701 /* Process ended abnormally */
+#define AUDIT_ANOM_LINK		    1702 /* Suspicious use of file links */
 #define AUDIT_INTEGRITY_DATA	    1800 /* Data integrity verification */
 #define AUDIT_INTEGRITY_METADATA    1801 /* Metadata integrity verification */
 #define AUDIT_INTEGRITY_STATUS	    1802 /* Integrity enable status */
@@ -687,6 +688,8 @@ extern void		    audit_log_d_path(struct audit_buffer *ab,
 					     const struct path *path);
 extern void		    audit_log_key(struct audit_buffer *ab,
 					  char *key);
+extern void		    audit_log_link_denied(const char *operation,
+						  struct path *link);
 extern void		    audit_log_lost(const char *message);
 #ifdef CONFIG_SECURITY
 extern void 		    audit_log_secctx(struct audit_buffer *ab, u32 secid);
@@ -716,6 +719,7 @@ extern int audit_enabled;
 #define audit_log_untrustedstring(a,s) do { ; } while (0)
 #define audit_log_d_path(b, p, d) do { ; } while (0)
 #define audit_log_key(b, k) do { ; } while (0)
+#define audit_log_link_denied(o, l) do { ; } while (0)
 #define audit_log_secctx(b,s) do { ; } while (0)
 #define audit_enabled 0
 #endif
diff --git a/kernel/audit.c b/kernel/audit.c
index 1c7f2c6..fda8bd9 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1450,6 +1450,27 @@ void audit_log_key(struct audit_buffer *ab, char *key)
 }
 
 /**
+ * audit_log_link_denied - report a link restriction denial
+ * @operation: specific link opreation
+ * @link: the path that triggered the restriction
+ */
+void audit_log_link_denied(const char *operation, struct path *link)
+{
+	struct audit_buffer *ab;
+
+	ab = audit_log_start(current->audit_context, GFP_KERNEL,
+			     AUDIT_ANOM_LINK);
+	audit_log_format(ab, "op=%s action=denied", operation);
+	audit_log_format(ab, " pid=%d comm=", current->pid);
+	audit_log_untrustedstring(ab, current->comm);
+	audit_log_d_path(ab, " path=", link);
+	audit_log_format(ab, " dev=");
+	audit_log_untrustedstring(ab, link->dentry->d_inode->i_sb->s_id);
+	audit_log_format(ab, " ino=%lu", link->dentry->d_inode->i_ino);
+	audit_log_end(ab);
+}
+
+/**
  * audit_log_end - end one audit record
  * @ab: the audit_buffer
  *
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] fs: add link restrictions
  2012-08-08 12:19   ` [kernel-hardening] " Vasily Kulikov
@ 2012-08-12  6:34     ` Kees Cook
  0 siblings, 0 replies; 14+ messages in thread
From: Kees Cook @ 2012-08-12  6:34 UTC (permalink / raw)
  To: Vasily Kulikov
  Cc: kernel-hardening, Al Viro, Andrew Morton, linux-kernel,
	linux-fsdevel, Eric Paris, Matthew Wilcox, Doug Ledford,
	Joe Korty, Eric W. Biederman, Ingo Molnar, David Howells,
	James Morris, linux-doc, Dan Rosenberg

On Wed, Aug 8, 2012 at 5:19 AM, Vasily Kulikov <segoon@openwall.com> wrote:
> Hi Kees,
>
> On Wed, Jul 25, 2012 at 17:29 -0700, Kees Cook wrote:
>> +/**
>> + * safe_hardlink_source - Check for safe hardlink conditions
>> + * @inode: the source inode to hardlink from
>> + *
>> + * Return false if at least one of the following conditions:
>> + *    - inode is not a regular file
>> + *    - inode is setuid
>> + *    - inode is setgid and group-exec
>> + *    - access failure for read and write
>> + *
>> + * Otherwise returns true.
>> + */
>> +static bool safe_hardlink_source(struct inode *inode)
>> +{
>> +     umode_t mode = inode->i_mode;
>> +
>> +     /* Special files should not get pinned to the filesystem. */
>> +     if (!S_ISREG(mode))
>> +             return false;
>> +
>> +     /* Setuid files should not get pinned to the filesystem. */
>> +     if (mode & S_ISUID)
>> +             return false;
>
> We don't want to make hardlinks of SUID files, but we still allow to create
> hardlinks to SUID'ish cap'ed files.  Probably check whether the inode is
> setcap'ed?

Excellent idea. It doesn't look like there is anything "simple" to do
this already. It'd be close to get_file_caps() but without the bprm.
Maybe just get_vfs_caps_from_disk() and a walk of the caps? What would
you recommend?

> Probably we can enhance this further and allow LSMs to define whether this
> particular file is special in LSM's point of view (IOW, it can be able to move
> a process to another security domain which is served by LSM).

Yeah. Perhaps implementing the needed check above with a new security
check and have commoncaps do the vfs fetch with LSMs able to override?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] fs: add link restrictions
  2012-08-06 23:55       ` Eric W. Biederman
@ 2012-08-06 23:57         ` Kees Cook
  0 siblings, 0 replies; 14+ messages in thread
From: Kees Cook @ 2012-08-06 23:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: James Morris, kernel-hardening, Al Viro, Andrew Morton,
	linux-kernel, linux-fsdevel, Eric Paris, Matthew Wilcox,
	Doug Ledford, Joe Korty, Ingo Molnar, David Howells,
	James Morris, linux-doc, Dan Rosenberg

[-- Attachment #1: Type: text/plain, Size: 1201 bytes --]

On Mon, Aug 6, 2012 at 4:55 PM, Eric W. Biederman <ebiederm@xmission.com>wrote:

> Kees Cook <keescook@chromium.org> writes:
>
> > On Thu, Aug 2, 2012 at 9:26 PM, James Morris <jmorris@namei.org> wrote:
> >> On Wed, 25 Jul 2012, Kees Cook wrote:
> >>
> >>> This adds symlink and hardlink restrictions to the Linux VFS.
> >>
> >> Is Al happy with this now?
> >
> > Looks like it; thanks for checking. It's in mainline now:
> >
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=800179c9b8a1e796e441674776d11cd4c05d61d7
>
> So there was one trivial little issue with your patch.  You were
> directly comparing kuids instead of using uid_eq.  This only practically
> matters when user namespaces are enabled which is currently impossible
> in 3.6-rc1 :(
>
> I have added the following fixup patch to my for-next branch of
> user-namespace.git
>
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> Date: Fri, 3 Aug 2012 09:38:08 -0700
> Subject: [PATCH] userns:  Fix link restrictions to use uid_eq
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>

Ah-ha! Thanks for fixing this.

Acked-by: Kees Cook <keescook@chromium.org>


-- 
Kees Cook
Chrome OS Security

[-- Attachment #2: Type: text/html, Size: 2110 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] fs: add link restrictions
  2012-07-26  0:29 ` [PATCH 1/2] " Kees Cook
@ 2012-08-03  4:26   ` James Morris
  2012-08-03 17:01     ` [kernel-hardening] " Kees Cook
  2012-08-08 12:19   ` [kernel-hardening] " Vasily Kulikov
  1 sibling, 1 reply; 14+ messages in thread
From: James Morris @ 2012-08-03  4:26 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Al Viro, Andrew Morton, linux-kernel, linux-fsdevel, Eric Paris,
	Matthew Wilcox, Doug Ledford, Joe Korty, Eric W. Biederman,
	Ingo Molnar, David Howells, James Morris, linux-doc,
	Dan Rosenberg, Kees Cook

On Wed, 25 Jul 2012, Kees Cook wrote:

> This adds symlink and hardlink restrictions to the Linux VFS.

Is Al happy with this now?


> 
> Symlinks:
> 
> A long-standing class of security issues is the symlink-based
> time-of-check-time-of-use race, most commonly seen in world-writable
> directories like /tmp. The common method of exploitation of this flaw
> is to cross privilege boundaries when following a given symlink (i.e. a
> root process follows a symlink belonging to another user). For a likely
> incomplete list of hundreds of examples across the years, please see:
> http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
> 
> The solution is to permit symlinks to only be followed when outside
> a sticky world-writable directory, or when the uid of the symlink and
> follower match, or when the directory owner matches the symlink's owner.
> 
> Some pointers to the history of earlier discussion that I could find:
> 
>  1996 Aug, Zygo Blaxell
>   http://marc.info/?l=bugtraq&m=87602167419830&w=2
>  1996 Oct, Andrew Tridgell
>   http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
>  1997 Dec, Albert D Cahalan
>   http://lkml.org/lkml/1997/12/16/4
>  2005 Feb, Lorenzo Hern?ndez Garc?a-Hierro
>   http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
>  2010 May, Kees Cook
>   https://lkml.org/lkml/2010/5/30/144
> 
> Past objections and rebuttals could be summarized as:
> 
>  - Violates POSIX.
>    - POSIX didn't consider this situation and it's not useful to follow
>      a broken specification at the cost of security.
>  - Might break unknown applications that use this feature.
>    - Applications that break because of the change are easy to spot and
>      fix. Applications that are vulnerable to symlink ToCToU by not having
>      the change aren't. Additionally, no applications have yet been found
>      that rely on this behavior.
>  - Applications should just use mkstemp() or O_CREATE|O_EXCL.
>    - True, but applications are not perfect, and new software is written
>      all the time that makes these mistakes; blocking this flaw at the
>      kernel is a single solution to the entire class of vulnerability.
>  - This should live in the core VFS.
>    - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
>  - This should live in an LSM.
>    - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)
> 
> Hardlinks:
> 
> On systems that have user-writable directories on the same partition
> as system files, a long-standing class of security issues is the
> hardlink-based time-of-check-time-of-use race, most commonly seen in
> world-writable directories like /tmp. The common method of exploitation
> of this flaw is to cross privilege boundaries when following a given
> hardlink (i.e. a root process follows a hardlink created by another
> user). Additionally, an issue exists where users can "pin" a potentially
> vulnerable setuid/setgid file so that an administrator will not actually
> upgrade a system fully.
> 
> The solution is to permit hardlinks to only be created when the user is
> already the existing file's owner, or if they already have read/write
> access to the existing file.
> 
> Many Linux users are surprised when they learn they can link to files
> they have no access to, so this change appears to follow the doctrine
> of "least surprise". Additionally, this change does not violate POSIX,
> which states "the implementation may require that the calling process
> has permission to access the existing file"[1].
> 
> This change is known to break some implementations of the "at" daemon,
> though the version used by Fedora and Ubuntu has been fixed[2] for
> a while. Otherwise, the change has been undisruptive while in use in
> Ubuntu for the last 1.5 years.
> 
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
> [2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279
> 
> This patch is based on the patches in Openwall and grsecurity, along with
> suggestions from Al Viro. I have added a sysctl to enable the protected
> behavior, and documentation.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Acked-by: Ingo Molnar <mingo@elte.hu>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> 
> ---
> v2012.5:
>  - updates requested by Al Viro:
>    - remove CONFIGs
>    - pass nd for parent checking
>    - release path on error
> v2012.4:
>  - split audit functions into a separate patch, suggested by Eric Paris.
> v2012.3:
>  While this code has been living in -mm and linux-next for 2 releases,
>  this is a small rework based on feedback from Al Viro:
>  - Moved audit functions into audit.c.
>  - Added tests directly to path_openat/path_lookupat.
>  - Merged with hardlink restriction patch to make things more sensible.
> v2012.2:
>  - Change sysctl mode to 0600, suggested by Ingo Molnar.
>  - Rework CONFIG logic to split code from default behavior.
>  - Renamed sysctl to have a "sysctl_" prefix, suggested by Andrew Morton.
>  - Use "true/false" instead of "1/0" for bool arg, thanks to Andrew Morton.
>  - Do not trust s_id to be safe to print, suggested by Andrew Morton.
> v2012.1:
>  - Use GFP_KERNEL for audit log allocation, thanks to Ingo Molnar.
> v2011.3:
>  - Add pid/comm back to logging.
> v2011.2:
>  - Updated documentation, thanks to Randy Dunlap.
>  - Switched Kconfig default to "y", added __read_mostly to sysctl,
>    thanks to Ingo Molnar.
>  - Switched to audit logging to gain safe path and name reporting when
>    hitting the restriction.
> v2011.1:
>  - back from hiatus
> ---
>  Documentation/sysctl/fs.txt |   42 +++++++++++++++
>  fs/namei.c                  |  121 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h          |    2 +
>  kernel/sysctl.c             |   18 ++++++
>  4 files changed, 183 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> index 13d6166..d4a372e 100644
> --- a/Documentation/sysctl/fs.txt
> +++ b/Documentation/sysctl/fs.txt
> @@ -32,6 +32,8 @@ Currently, these files are in /proc/sys/fs:
>  - nr_open
>  - overflowuid
>  - overflowgid
> +- protected_hardlinks
> +- protected_symlinks
>  - suid_dumpable
>  - super-max
>  - super-nr
> @@ -157,6 +159,46 @@ The default is 65534.
>  
>  ==============================================================
>  
> +protected_hardlinks:
> +
> +A long-standing class of security issues is the hardlink-based
> +time-of-check-time-of-use race, most commonly seen in world-writable
> +directories like /tmp. The common method of exploitation of this flaw
> +is to cross privilege boundaries when following a given hardlink (i.e. a
> +root process follows a hardlink created by another user). Additionally,
> +on systems without separated partitions, this stops unauthorized users
> +from "pinning" vulnerable setuid/setgid files against being upgraded by
> +the administrator, or linking to special files.
> +
> +When set to "0", hardlink creation behavior is unrestricted.
> +
> +When set to "1" hardlinks cannot be created by users if they do not
> +already own the source file, or do not have read/write access to it.
> +
> +This protection is based on the restrictions in Openwall and grsecurity.
> +
> +==============================================================
> +
> +protected_symlinks:
> +
> +A long-standing class of security issues is the symlink-based
> +time-of-check-time-of-use race, most commonly seen in world-writable
> +directories like /tmp. The common method of exploitation of this flaw
> +is to cross privilege boundaries when following a given symlink (i.e. a
> +root process follows a symlink belonging to another user). For a likely
> +incomplete list of hundreds of examples across the years, please see:
> +http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
> +
> +When set to "0", symlink following behavior is unrestricted.
> +
> +When set to "1" symlinks are permitted to be followed only when outside
> +a sticky world-writable directory, or when the uid of the symlink and
> +follower match, or when the directory owner matches the symlink's owner.
> +
> +This protection is based on the restrictions in Openwall and grsecurity.
> +
> +==============================================================
> +
>  suid_dumpable:
>  
>  This value can be used to query and set the core dump mode for setuid
> diff --git a/fs/namei.c b/fs/namei.c
> index 2ccc35c..e5ad2db 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -650,6 +650,118 @@ static inline void put_link(struct nameidata *nd, struct path *link, void *cooki
>  	path_put(link);
>  }
>  
> +int sysctl_protected_symlinks __read_mostly = 1;
> +int sysctl_protected_hardlinks __read_mostly = 1;
> +
> +/**
> + * may_follow_link - Check symlink following for unsafe situations
> + * @link: The path of the symlink
> + *
> + * In the case of the sysctl_protected_symlinks sysctl being enabled,
> + * CAP_DAC_OVERRIDE needs to be specifically ignored if the symlink is
> + * in a sticky world-writable directory. This is to protect privileged
> + * processes from failing races against path names that may change out
> + * from under them by way of other users creating malicious symlinks.
> + * It will permit symlinks to be followed only when outside a sticky
> + * world-writable directory, or when the uid of the symlink and follower
> + * match, or when the directory owner matches the symlink's owner.
> + *
> + * Returns 0 if following the symlink is allowed, -ve on error.
> + */
> +static inline int may_follow_link(struct path *link, struct nameidata *nd)
> +{
> +	const struct inode *inode;
> +	const struct inode *parent;
> +
> +	if (!sysctl_protected_symlinks)
> +		return 0;
> +
> +	/* Allowed if owner and follower match. */
> +	inode = link->dentry->d_inode;
> +	if (current_cred()->fsuid == inode->i_uid)
> +		return 0;
> +
> +	/* Allowed if parent directory not sticky and world-writable. */
> +	parent = nd->path.dentry->d_inode;
> +	if ((parent->i_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
> +		return 0;
> +
> +	/* Allowed if parent directory and link owner match. */
> +	if (parent->i_uid == inode->i_uid)
> +		return 0;
> +
> +	path_put(&nd->path);
> +	return -EACCES;
> +}
> +
> +/**
> + * safe_hardlink_source - Check for safe hardlink conditions
> + * @inode: the source inode to hardlink from
> + *
> + * Return false if at least one of the following conditions:
> + *    - inode is not a regular file
> + *    - inode is setuid
> + *    - inode is setgid and group-exec
> + *    - access failure for read and write
> + *
> + * Otherwise returns true.
> + */
> +static bool safe_hardlink_source(struct inode *inode)
> +{
> +	umode_t mode = inode->i_mode;
> +
> +	/* Special files should not get pinned to the filesystem. */
> +	if (!S_ISREG(mode))
> +		return false;
> +
> +	/* Setuid files should not get pinned to the filesystem. */
> +	if (mode & S_ISUID)
> +		return false;
> +
> +	/* Executable setgid files should not get pinned to the filesystem. */
> +	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
> +		return false;
> +
> +	/* Hardlinking to unreadable or unwritable sources is dangerous. */
> +	if (inode_permission(inode, MAY_READ | MAY_WRITE))
> +		return false;
> +
> +	return true;
> +}
> +
> +/**
> + * may_linkat - Check permissions for creating a hardlink
> + * @link: the source to hardlink from
> + *
> + * Block hardlink when all of:
> + *  - sysctl_protected_hardlinks enabled
> + *  - fsuid does not match inode
> + *  - hardlink source is unsafe (see safe_hardlink_source() above)
> + *  - not CAP_FOWNER
> + *
> + * Returns 0 if successful, -ve on error.
> + */
> +static int may_linkat(struct path *link)
> +{
> +	const struct cred *cred;
> +	struct inode *inode;
> +
> +	if (!sysctl_protected_hardlinks)
> +		return 0;
> +
> +	cred = current_cred();
> +	inode = link->dentry->d_inode;
> +
> +	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
> +	 * otherwise, it must be a safe source.
> +	 */
> +	if (cred->fsuid == inode->i_uid || safe_hardlink_source(inode) ||
> +	    capable(CAP_FOWNER))
> +		return 0;
> +
> +	return -EPERM;
> +}
> +
>  static __always_inline int
>  follow_link(struct path *link, struct nameidata *nd, void **p)
>  {
> @@ -1818,6 +1930,9 @@ static int path_lookupat(int dfd, const char *name,
>  		while (err > 0) {
>  			void *cookie;
>  			struct path link = path;
> +			err = may_follow_link(&link, nd);
> +			if (unlikely(err))
> +				break;
>  			nd->flags |= LOOKUP_PARENT;
>  			err = follow_link(&link, nd, &cookie);
>  			if (err)
> @@ -2777,6 +2892,9 @@ static struct file *path_openat(int dfd, const char *pathname,
>  			error = -ELOOP;
>  			break;
>  		}
> +		error = may_follow_link(&link, nd);
> +		if (unlikely(error))
> +			break;
>  		nd->flags |= LOOKUP_PARENT;
>  		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
>  		error = follow_link(&link, nd, &cookie);
> @@ -3436,6 +3554,9 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
>  	error = -EXDEV;
>  	if (old_path.mnt != new_path.mnt)
>  		goto out_dput;
> +	error = may_linkat(&old_path);
> +	if (unlikely(error))
> +		goto out_dput;
>  	error = mnt_want_write(new_path.mnt);
>  	if (error)
>  		goto out_dput;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 8fabb03..c8fb6df 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -437,6 +437,8 @@ extern unsigned long get_max_files(void);
>  extern int sysctl_nr_open;
>  extern struct inodes_stat_t inodes_stat;
>  extern int leases_enable, lease_break_time;
> +extern int sysctl_protected_symlinks;
> +extern int sysctl_protected_hardlinks;
>  
>  struct buffer_head;
>  typedef int (get_block_t)(struct inode *inode, sector_t iblock,
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 4ab1187..5d9a1d2 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1494,6 +1494,24 @@ static struct ctl_table fs_table[] = {
>  #endif
>  #endif
>  	{
> +		.procname	= "protected_symlinks",
> +		.data		= &sysctl_protected_symlinks,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0600,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &one,
> +	},
> +	{
> +		.procname	= "protected_hardlinks",
> +		.data		= &sysctl_protected_hardlinks,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0600,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &one,
> +	},
> +	{
>  		.procname	= "suid_dumpable",
>  		.data		= &suid_dumpable,
>  		.maxlen		= sizeof(int),
> -- 
> 1.7.0.4
> 

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] fs: add link restrictions
  2012-07-26  0:29 [RESEND][PATCH v2012.5 0/2] fs: add link restrictions Kees Cook
@ 2012-07-26  0:29 ` Kees Cook
  2012-08-03  4:26   ` James Morris
  2012-08-08 12:19   ` [kernel-hardening] " Vasily Kulikov
  0 siblings, 2 replies; 14+ messages in thread
From: Kees Cook @ 2012-07-26  0:29 UTC (permalink / raw)
  To: Al Viro, Andrew Morton
  Cc: linux-kernel, linux-fsdevel, Eric Paris, Matthew Wilcox,
	Doug Ledford, Joe Korty, Eric W. Biederman, Ingo Molnar,
	David Howells, James Morris, linux-doc, Dan Rosenberg,
	kernel-hardening, Kees Cook, Andrew Morton

This adds symlink and hardlink restrictions to the Linux VFS.

Symlinks:

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

Some pointers to the history of earlier discussion that I could find:

 1996 Aug, Zygo Blaxell
  http://marc.info/?l=bugtraq&m=87602167419830&w=2
 1996 Oct, Andrew Tridgell
  http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
 1997 Dec, Albert D Cahalan
  http://lkml.org/lkml/1997/12/16/4
 2005 Feb, Lorenzo Hernández García-Hierro
  http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
 2010 May, Kees Cook
  https://lkml.org/lkml/2010/5/30/144

Past objections and rebuttals could be summarized as:

 - Violates POSIX.
   - POSIX didn't consider this situation and it's not useful to follow
     a broken specification at the cost of security.
 - Might break unknown applications that use this feature.
   - Applications that break because of the change are easy to spot and
     fix. Applications that are vulnerable to symlink ToCToU by not having
     the change aren't. Additionally, no applications have yet been found
     that rely on this behavior.
 - Applications should just use mkstemp() or O_CREATE|O_EXCL.
   - True, but applications are not perfect, and new software is written
     all the time that makes these mistakes; blocking this flaw at the
     kernel is a single solution to the entire class of vulnerability.
 - This should live in the core VFS.
   - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
 - This should live in an LSM.
   - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

Hardlinks:

On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.

Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

---
v2012.5:
 - updates requested by Al Viro:
   - remove CONFIGs
   - pass nd for parent checking
   - release path on error
v2012.4:
 - split audit functions into a separate patch, suggested by Eric Paris.
v2012.3:
 While this code has been living in -mm and linux-next for 2 releases,
 this is a small rework based on feedback from Al Viro:
 - Moved audit functions into audit.c.
 - Added tests directly to path_openat/path_lookupat.
 - Merged with hardlink restriction patch to make things more sensible.
v2012.2:
 - Change sysctl mode to 0600, suggested by Ingo Molnar.
 - Rework CONFIG logic to split code from default behavior.
 - Renamed sysctl to have a "sysctl_" prefix, suggested by Andrew Morton.
 - Use "true/false" instead of "1/0" for bool arg, thanks to Andrew Morton.
 - Do not trust s_id to be safe to print, suggested by Andrew Morton.
v2012.1:
 - Use GFP_KERNEL for audit log allocation, thanks to Ingo Molnar.
v2011.3:
 - Add pid/comm back to logging.
v2011.2:
 - Updated documentation, thanks to Randy Dunlap.
 - Switched Kconfig default to "y", added __read_mostly to sysctl,
   thanks to Ingo Molnar.
 - Switched to audit logging to gain safe path and name reporting when
   hitting the restriction.
v2011.1:
 - back from hiatus
---
 Documentation/sysctl/fs.txt |   42 +++++++++++++++
 fs/namei.c                  |  121 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h          |    2 +
 kernel/sysctl.c             |   18 ++++++
 4 files changed, 183 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 13d6166..d4a372e 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -32,6 +32,8 @@ Currently, these files are in /proc/sys/fs:
 - nr_open
 - overflowuid
 - overflowgid
+- protected_hardlinks
+- protected_symlinks
 - suid_dumpable
 - super-max
 - super-nr
@@ -157,6 +159,46 @@ The default is 65534.
 
 ==============================================================
 
+protected_hardlinks:
+
+A long-standing class of security issues is the hardlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given hardlink (i.e. a
+root process follows a hardlink created by another user). Additionally,
+on systems without separated partitions, this stops unauthorized users
+from "pinning" vulnerable setuid/setgid files against being upgraded by
+the administrator, or linking to special files.
+
+When set to "0", hardlink creation behavior is unrestricted.
+
+When set to "1" hardlinks cannot be created by users if they do not
+already own the source file, or do not have read/write access to it.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+==============================================================
+
+protected_symlinks:
+
+A long-standing class of security issues is the symlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given symlink (i.e. a
+root process follows a symlink belonging to another user). For a likely
+incomplete list of hundreds of examples across the years, please see:
+http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
+
+When set to "0", symlink following behavior is unrestricted.
+
+When set to "1" symlinks are permitted to be followed only when outside
+a sticky world-writable directory, or when the uid of the symlink and
+follower match, or when the directory owner matches the symlink's owner.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+==============================================================
+
 suid_dumpable:
 
 This value can be used to query and set the core dump mode for setuid
diff --git a/fs/namei.c b/fs/namei.c
index 2ccc35c..e5ad2db 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -650,6 +650,118 @@ static inline void put_link(struct nameidata *nd, struct path *link, void *cooki
 	path_put(link);
 }
 
+int sysctl_protected_symlinks __read_mostly = 1;
+int sysctl_protected_hardlinks __read_mostly = 1;
+
+/**
+ * may_follow_link - Check symlink following for unsafe situations
+ * @link: The path of the symlink
+ *
+ * In the case of the sysctl_protected_symlinks sysctl being enabled,
+ * CAP_DAC_OVERRIDE needs to be specifically ignored if the symlink is
+ * in a sticky world-writable directory. This is to protect privileged
+ * processes from failing races against path names that may change out
+ * from under them by way of other users creating malicious symlinks.
+ * It will permit symlinks to be followed only when outside a sticky
+ * world-writable directory, or when the uid of the symlink and follower
+ * match, or when the directory owner matches the symlink's owner.
+ *
+ * Returns 0 if following the symlink is allowed, -ve on error.
+ */
+static inline int may_follow_link(struct path *link, struct nameidata *nd)
+{
+	const struct inode *inode;
+	const struct inode *parent;
+
+	if (!sysctl_protected_symlinks)
+		return 0;
+
+	/* Allowed if owner and follower match. */
+	inode = link->dentry->d_inode;
+	if (current_cred()->fsuid == inode->i_uid)
+		return 0;
+
+	/* Allowed if parent directory not sticky and world-writable. */
+	parent = nd->path.dentry->d_inode;
+	if ((parent->i_mode & (S_ISVTX|S_IWOTH)) != (S_ISVTX|S_IWOTH))
+		return 0;
+
+	/* Allowed if parent directory and link owner match. */
+	if (parent->i_uid == inode->i_uid)
+		return 0;
+
+	path_put(&nd->path);
+	return -EACCES;
+}
+
+/**
+ * safe_hardlink_source - Check for safe hardlink conditions
+ * @inode: the source inode to hardlink from
+ *
+ * Return false if at least one of the following conditions:
+ *    - inode is not a regular file
+ *    - inode is setuid
+ *    - inode is setgid and group-exec
+ *    - access failure for read and write
+ *
+ * Otherwise returns true.
+ */
+static bool safe_hardlink_source(struct inode *inode)
+{
+	umode_t mode = inode->i_mode;
+
+	/* Special files should not get pinned to the filesystem. */
+	if (!S_ISREG(mode))
+		return false;
+
+	/* Setuid files should not get pinned to the filesystem. */
+	if (mode & S_ISUID)
+		return false;
+
+	/* Executable setgid files should not get pinned to the filesystem. */
+	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+		return false;
+
+	/* Hardlinking to unreadable or unwritable sources is dangerous. */
+	if (inode_permission(inode, MAY_READ | MAY_WRITE))
+		return false;
+
+	return true;
+}
+
+/**
+ * may_linkat - Check permissions for creating a hardlink
+ * @link: the source to hardlink from
+ *
+ * Block hardlink when all of:
+ *  - sysctl_protected_hardlinks enabled
+ *  - fsuid does not match inode
+ *  - hardlink source is unsafe (see safe_hardlink_source() above)
+ *  - not CAP_FOWNER
+ *
+ * Returns 0 if successful, -ve on error.
+ */
+static int may_linkat(struct path *link)
+{
+	const struct cred *cred;
+	struct inode *inode;
+
+	if (!sysctl_protected_hardlinks)
+		return 0;
+
+	cred = current_cred();
+	inode = link->dentry->d_inode;
+
+	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
+	 * otherwise, it must be a safe source.
+	 */
+	if (cred->fsuid == inode->i_uid || safe_hardlink_source(inode) ||
+	    capable(CAP_FOWNER))
+		return 0;
+
+	return -EPERM;
+}
+
 static __always_inline int
 follow_link(struct path *link, struct nameidata *nd, void **p)
 {
@@ -1818,6 +1930,9 @@ static int path_lookupat(int dfd, const char *name,
 		while (err > 0) {
 			void *cookie;
 			struct path link = path;
+			err = may_follow_link(&link, nd);
+			if (unlikely(err))
+				break;
 			nd->flags |= LOOKUP_PARENT;
 			err = follow_link(&link, nd, &cookie);
 			if (err)
@@ -2777,6 +2892,9 @@ static struct file *path_openat(int dfd, const char *pathname,
 			error = -ELOOP;
 			break;
 		}
+		error = may_follow_link(&link, nd);
+		if (unlikely(error))
+			break;
 		nd->flags |= LOOKUP_PARENT;
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 		error = follow_link(&link, nd, &cookie);
@@ -3436,6 +3554,9 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 	error = -EXDEV;
 	if (old_path.mnt != new_path.mnt)
 		goto out_dput;
+	error = may_linkat(&old_path);
+	if (unlikely(error))
+		goto out_dput;
 	error = mnt_want_write(new_path.mnt);
 	if (error)
 		goto out_dput;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8fabb03..c8fb6df 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -437,6 +437,8 @@ extern unsigned long get_max_files(void);
 extern int sysctl_nr_open;
 extern struct inodes_stat_t inodes_stat;
 extern int leases_enable, lease_break_time;
+extern int sysctl_protected_symlinks;
+extern int sysctl_protected_hardlinks;
 
 struct buffer_head;
 typedef int (get_block_t)(struct inode *inode, sector_t iblock,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4ab1187..5d9a1d2 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1494,6 +1494,24 @@ static struct ctl_table fs_table[] = {
 #endif
 #endif
 	{
+		.procname	= "protected_symlinks",
+		.data		= &sysctl_protected_symlinks,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
+		.procname	= "protected_hardlinks",
+		.data		= &sysctl_protected_hardlinks,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
 		.maxlen		= sizeof(int),
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] fs: add link restrictions
  2012-07-02  0:43     ` Kees Cook
@ 2012-07-02  1:26       ` Kees Cook
  0 siblings, 0 replies; 14+ messages in thread
From: Kees Cook @ 2012-07-02  1:26 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-kernel, Andrew Morton, linux-fsdevel, Eric Paris,
	Matthew Wilcox, Doug Ledford, Joe Korty, Eric W. Biederman,
	Ingo Molnar, David Howells, James Morris, linux-doc,
	Dan Rosenberg, kernel-hardening

On Sun, Jul 1, 2012 at 5:43 PM, Kees Cook <keescook@chromium.org> wrote:
> On Sat, Jun 30, 2012 at 2:14 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> On Mon, Jun 25, 2012 at 02:05:26PM -0700, Kees Cook wrote:
>>> +                     err = may_follow_link(&link);
>>> +                     if (unlikely(err))
>>> +                             break;
>>
>> No.  This is definitely wrong - you are leaking dentries and vfsmount here.
>
> What should I do to avoid the leak? I thought it was avoiding the need
> to call put_link because it aborts before calling follow_link.

Does this need "path_put(&nd->path);" added to the abort case?

If so, is this also missing from follow_link()'s final "return error",
or is it the responsibility of dentry->d_inode->i_op->follow_link() to
have already called path_put()?

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] fs: add link restrictions
  2012-06-30  9:14   ` Al Viro
@ 2012-07-02  0:43     ` Kees Cook
  2012-07-02  1:26       ` Kees Cook
  0 siblings, 1 reply; 14+ messages in thread
From: Kees Cook @ 2012-07-02  0:43 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-kernel, Andrew Morton, linux-fsdevel, Eric Paris,
	Matthew Wilcox, Doug Ledford, Joe Korty, Eric W. Biederman,
	Ingo Molnar, David Howells, James Morris, linux-doc,
	Dan Rosenberg, kernel-hardening

On Sat, Jun 30, 2012 at 2:14 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Jun 25, 2012 at 02:05:26PM -0700, Kees Cook wrote:
>
>> +config PROTECTED_LINKS
>> +     bool "Evaluate vulnerable link conditions"
>> +     default y
>
> Remember Linus' rants about 'default y' in general?

I added these configs due to other people's requests. I am happy to
remove them all and have the sysctls start their life == 1. It would
eliminate all the #ifdef logic too.

>> +     /* Check parent directory mode and owner. */
>
> I suspect that we ought to simply pass that parent directory as argument - caller
> *does* have a reference to it, so we don't need to mess with ->d_lock, etc.

I don't see where the parent is held in either path_openat nor
path_lookupat. What should I be passing into may_follow_link() for the
parent?

>> +                     err = may_follow_link(&link);
>> +                     if (unlikely(err))
>> +                             break;
>
> No.  This is definitely wrong - you are leaking dentries and vfsmount here.

What should I do to avoid the leak? I thought it was avoiding the need
to call put_link because it aborts before calling follow_link.

>> +             error = may_follow_link(&link);
>> +             if (unlikely(error))
>> +                     break;
>
> Ditto.

Same thing here -- it aborts before the follow_link. I must be
misunderstanding something. What am I missing?

Thanks for the feedback! I'll clean up the other things you mentioned as well.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] fs: add link restrictions
  2012-06-25 21:05 ` [PATCH 1/2] " Kees Cook
@ 2012-06-30  9:14   ` Al Viro
  2012-07-02  0:43     ` Kees Cook
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2012-06-30  9:14 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Andrew Morton, linux-fsdevel, Eric Paris,
	Matthew Wilcox, Doug Ledford, Joe Korty, Eric W. Biederman,
	Ingo Molnar, David Howells, James Morris, linux-doc,
	Dan Rosenberg, kernel-hardening

On Mon, Jun 25, 2012 at 02:05:26PM -0700, Kees Cook wrote:

> +config PROTECTED_LINKS
> +	bool "Evaluate vulnerable link conditions"
> +	default y

Remember Linus' rants about 'default y' in general?

> +#ifdef CONFIG_PROTECTED_LINKS
> +int sysctl_protected_symlinks __read_mostly =
> +	CONFIG_PROTECTED_SYMLINKS_SYSCTL;
> +int sysctl_protected_hardlinks __read_mostly =
> +	CONFIG_PROTECTED_HARDLINKS_SYSCTL;
> +
> +/**
> + * may_follow_link - Check symlink following for unsafe situations
> + * @link: The path of the symlink
> + *
> + * In the case of the sysctl_protected_symlinks sysctl being enabled,
> + * CAP_DAC_OVERRIDE needs to be specifically ignored if the symlink is
> + * in a sticky world-writable directory. This is to protect privileged
> + * processes from failing races against path names that may change out
> + * from under them by way of other users creating malicious symlinks.
> + * It will permit symlinks to be followed only when outside a sticky
> + * world-writable directory, or when the uid of the symlink and follower
> + * match, or when the directory owner matches the symlink's owner.
> + *
> + * Returns 0 if following the symlink is allowed, -ve on error.
> + */
> +static inline int may_follow_link(struct path *link)
> +{
> +	int error = 0;
> +	const struct inode *parent;
> +	const struct inode *inode;
> +	const struct cred *cred;
> +	struct dentry *dentry;
> +
> +	if (!sysctl_protected_symlinks)
> +		return 0;

Um.  What this says to me is "this function should be outside of ifdef, with
#else of that ifdef defining sysctl_protected_symlinks to 0".

> +	/* Check parent directory mode and owner. */

I suspect that we ought to simply pass that parent directory as argument - caller
*does* have a reference to it, so we don't need to mess with ->d_lock, etc.

> +	mode_t mode = inode->i_mode;

umode_t, please.

> +static int may_linkat(struct path *link)
> +{
> +	const struct cred *cred;
> +	struct inode *inode;
> +
> +	if (!sysctl_protected_hardlinks)
> +		return 0;

Ditto about taking it outside of ifdef.

> +			err = may_follow_link(&link);
> +			if (unlikely(err))
> +				break;

No.  This is definitely wrong - you are leaking dentries and vfsmount here.
> +		error = may_follow_link(&link);
> +		if (unlikely(error))
> +			break;

Ditto.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] fs: add link restrictions
  2012-06-25 21:05 [PATCH v2012.4 0/2] " Kees Cook
@ 2012-06-25 21:05 ` Kees Cook
  2012-06-30  9:14   ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Kees Cook @ 2012-06-25 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alexander Viro, Andrew Morton, linux-fsdevel, Eric Paris,
	Matthew Wilcox, Doug Ledford, Joe Korty, Eric W. Biederman,
	Ingo Molnar, David Howells, James Morris, linux-doc,
	Dan Rosenberg, kernel-hardening

This adds symlink and hardlink restrictions to the Linux VFS.

Symlinks:

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

Some pointers to the history of earlier discussion that I could find:

 1996 Aug, Zygo Blaxell
  http://marc.info/?l=bugtraq&m=87602167419830&w=2
 1996 Oct, Andrew Tridgell
  http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
 1997 Dec, Albert D Cahalan
  http://lkml.org/lkml/1997/12/16/4
 2005 Feb, Lorenzo Hernández García-Hierro
  http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
 2010 May, Kees Cook
  https://lkml.org/lkml/2010/5/30/144

Past objections and rebuttals could be summarized as:

 - Violates POSIX.
   - POSIX didn't consider this situation and it's not useful to follow
     a broken specification at the cost of security.
 - Might break unknown applications that use this feature.
   - Applications that break because of the change are easy to spot and
     fix. Applications that are vulnerable to symlink ToCToU by not having
     the change aren't. Additionally, no applications have yet been found
     that rely on this behavior.
 - Applications should just use mkstemp() or O_CREATE|O_EXCL.
   - True, but applications are not perfect, and new software is written
     all the time that makes these mistakes; blocking this flaw at the
     kernel is a single solution to the entire class of vulnerability.
 - This should live in the core VFS.
   - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
 - This should live in an LSM.
   - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

Hardlinks:

On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.

Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

---
v2012.4:
 - split audit functions into a separate patch, suggested by Eric Paris.
v2012.3:
 While this code has been living in -mm and linux-next for 2 releases,
 this is a small rework based on feedback from Al Viro:
 - Moved audit functions into audit.c.
 - Added tests directly to path_openat/path_lookupat.
 - Merged with hardlink restriction patch to make things more sensible.
v2012.2:
 - Change sysctl mode to 0600, suggested by Ingo Molnar.
 - Rework CONFIG logic to split code from default behavior.
 - Renamed sysctl to have a "sysctl_" prefix, suggested by Andrew Morton.
 - Use "true/false" instead of "1/0" for bool arg, thanks to Andrew Morton.
 - Do not trust s_id to be safe to print, suggested by Andrew Morton.
v2012.1:
 - Use GFP_KERNEL for audit log allocation, thanks to Ingo Molnar.
v2011.3:
 - Add pid/comm back to logging.
v2011.2:
 - Updated documentation, thanks to Randy Dunlap.
 - Switched Kconfig default to "y", added __read_mostly to sysctl,
   thanks to Ingo Molnar.
 - Switched to audit logging to gain safe path and name reporting when
   hitting the restriction.
v2011.1:
 - back from hiatus
---
 Documentation/sysctl/fs.txt |   42 +++++++++++++
 fs/Kconfig                  |   60 ++++++++++++++++++
 fs/namei.c                  |  139 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h          |    2 +
 kernel/sysctl.c             |   20 ++++++
 5 files changed, 263 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 13d6166..d4a372e 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -32,6 +32,8 @@ Currently, these files are in /proc/sys/fs:
 - nr_open
 - overflowuid
 - overflowgid
+- protected_hardlinks
+- protected_symlinks
 - suid_dumpable
 - super-max
 - super-nr
@@ -157,6 +159,46 @@ The default is 65534.
 
 ==============================================================
 
+protected_hardlinks:
+
+A long-standing class of security issues is the hardlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given hardlink (i.e. a
+root process follows a hardlink created by another user). Additionally,
+on systems without separated partitions, this stops unauthorized users
+from "pinning" vulnerable setuid/setgid files against being upgraded by
+the administrator, or linking to special files.
+
+When set to "0", hardlink creation behavior is unrestricted.
+
+When set to "1" hardlinks cannot be created by users if they do not
+already own the source file, or do not have read/write access to it.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+==============================================================
+
+protected_symlinks:
+
+A long-standing class of security issues is the symlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given symlink (i.e. a
+root process follows a symlink belonging to another user). For a likely
+incomplete list of hundreds of examples across the years, please see:
+http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
+
+When set to "0", symlink following behavior is unrestricted.
+
+When set to "1" symlinks are permitted to be followed only when outside
+a sticky world-writable directory, or when the uid of the symlink and
+follower match, or when the directory owner matches the symlink's owner.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+==============================================================
+
 suid_dumpable:
 
 This value can be used to query and set the core dump mode for setuid
diff --git a/fs/Kconfig b/fs/Kconfig
index f95ae3a..ba8974e 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -276,4 +276,64 @@ endif # NETWORK_FILESYSTEMS
 source "fs/nls/Kconfig"
 source "fs/dlm/Kconfig"
 
+config PROTECTED_LINKS
+	bool "Evaluate vulnerable link conditions"
+	default y
+	help
+	  A long-standing class of security issues is the link-based
+	  time-of-check-time-of-use race, most commonly seen in
+	  world-writable directories like /tmp. The common method of
+	  exploitation of this flaw is to cross privilege boundaries
+	  when following a given link (i.e. a root process follows
+	  a malicious symlink belonging to another user, or a hardlink
+	  created to a root-owned file).
+
+	  Enabling this adds the logic to examine these dangerous link
+	  conditions. Whether or not the dangerous link situations are
+	  allowed is controlled by PROTECTED_HARDLINKS_ENABLED and
+	  PROTECTED_SYMLINKS_ENABLED.
+
+config PROTECTED_SYMLINKS
+	depends on PROTECTED_LINKS
+	bool "Disallow symlink following in sticky world-writable dirs"
+	default y
+	help
+	  Solve ToCToU symlink race vulnerabilities by permitting symlinks
+	  to be followed only when outside a sticky world-writable directory,
+	  or when the uid of the symlink and follower match, or when the
+	  directory and symlink owners match.
+
+	  When PROC_SYSCTL is enabled, this setting can also be controlled
+	  via /proc/sys/kernel/protected_symlinks.
+
+	  See Documentation/sysctl/fs.txt for details.
+
+config PROTECTED_SYMLINKS_SYSCTL
+	depends on PROTECTED_LINKS
+	int
+	default "1" if PROTECTED_SYMLINKS
+	default "0"
+
+config PROTECTED_HARDLINKS
+	depends on PROTECTED_LINKS
+	bool "Disallow hardlink creation to non-accessible files"
+	default y
+	help
+	  Solve ToCToU hardlink race vulnerabilities by permitting hardlinks
+	  to be created only when to a regular file that is owned by the user,
+	  or is readable and writable by the user. Also blocks users from
+	  "pinning" vulnerable setuid/setgid programs from being upgraded by
+	  the administrator.
+
+	  When PROC_SYSCTL is enabled, this setting can also be controlled
+	  via /proc/sys/kernel/protected_hardlinks.
+
+	  See Documentation/sysctl/fs.txt for details.
+
+config PROTECTED_HARDLINKS_SYSCTL
+	depends on PROTECTED_LINKS
+	int
+	default "1" if PROTECTED_HARDLINKS
+	default "0"
+
 endmenu
diff --git a/fs/namei.c b/fs/namei.c
index 7d69419..71bb13c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -610,6 +610,136 @@ static inline void put_link(struct nameidata *nd, struct path *link, void *cooki
 	path_put(link);
 }
 
+#ifdef CONFIG_PROTECTED_LINKS
+int sysctl_protected_symlinks __read_mostly =
+	CONFIG_PROTECTED_SYMLINKS_SYSCTL;
+int sysctl_protected_hardlinks __read_mostly =
+	CONFIG_PROTECTED_HARDLINKS_SYSCTL;
+
+/**
+ * may_follow_link - Check symlink following for unsafe situations
+ * @link: The path of the symlink
+ *
+ * In the case of the sysctl_protected_symlinks sysctl being enabled,
+ * CAP_DAC_OVERRIDE needs to be specifically ignored if the symlink is
+ * in a sticky world-writable directory. This is to protect privileged
+ * processes from failing races against path names that may change out
+ * from under them by way of other users creating malicious symlinks.
+ * It will permit symlinks to be followed only when outside a sticky
+ * world-writable directory, or when the uid of the symlink and follower
+ * match, or when the directory owner matches the symlink's owner.
+ *
+ * Returns 0 if following the symlink is allowed, -ve on error.
+ */
+static inline int may_follow_link(struct path *link)
+{
+	int error = 0;
+	const struct inode *parent;
+	const struct inode *inode;
+	const struct cred *cred;
+	struct dentry *dentry;
+
+	if (!sysctl_protected_symlinks)
+		return 0;
+
+	/* Allowed if owner and follower match. */
+	cred = current_cred();
+	dentry = link->dentry;
+	inode = dentry->d_inode;
+	if (cred->fsuid == inode->i_uid)
+		return 0;
+
+	/* Check parent directory mode and owner. */
+	spin_lock(&dentry->d_lock);
+	parent = dentry->d_parent->d_inode;
+	if ((parent->i_mode & (S_ISVTX|S_IWOTH)) == (S_ISVTX|S_IWOTH) &&
+	    parent->i_uid != inode->i_uid) {
+		error = -EACCES;
+	}
+	spin_unlock(&dentry->d_lock);
+
+	return error;
+}
+
+/**
+ * safe_hardlink_source - Check for safe hardlink conditions
+ * @inode: the source inode to hardlink from
+ *
+ * Return false if at least one of the following conditions:
+ *    - inode is not a regular file
+ *    - inode is setuid
+ *    - inode is setgid and group-exec
+ *    - access failure for read and write
+ *
+ * Otherwise returns true.
+ */
+static bool safe_hardlink_source(struct inode *inode)
+{
+	mode_t mode = inode->i_mode;
+
+	/* Special files should not get pinned to the filesystem. */
+	if (!S_ISREG(mode))
+		return false;
+
+	/* Setuid files should not get pinned to the filesystem. */
+	if (mode & S_ISUID)
+		return false;
+
+	/* Executable setgid files should not get pinned to the filesystem. */
+	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+		return false;
+
+	/* Hardlinking to unreadable or unwritable sources is dangerous. */
+	if (inode_permission(inode, MAY_READ | MAY_WRITE))
+		return false;
+
+	return true;
+}
+
+/**
+ * may_linkat - Check permissions for creating a hardlink
+ * @link: the source to hardlink from
+ *
+ * Block hardlink when all of:
+ *  - sysctl_protected_hardlinks enabled
+ *  - fsuid does not match inode
+ *  - hardlink source is unsafe (see safe_hardlink_source() above)
+ *  - not CAP_FOWNER
+ *
+ * Returns 0 if successful, -ve on error.
+ */
+static int may_linkat(struct path *link)
+{
+	const struct cred *cred;
+	struct inode *inode;
+
+	if (!sysctl_protected_hardlinks)
+		return 0;
+
+	cred = current_cred();
+	inode = link->dentry->d_inode;
+
+	/* Source inode owner (or CAP_FOWNER) can hardlink all they like,
+	 * otherwise, it must be a safe source.
+	 */
+	if (cred->fsuid == inode->i_uid || safe_hardlink_source(inode) ||
+	    capable(CAP_FOWNER))
+		return 0;
+
+	return -EPERM;
+}
+#else
+static inline int may_follow_link(struct path *link)
+{
+	return 0;
+}
+
+static inline int may_linkat(struct path *link)
+{
+	return 0;
+}
+#endif
+
 static __always_inline int
 follow_link(struct path *link, struct nameidata *nd, void **p)
 {
@@ -1775,6 +1905,9 @@ static int path_lookupat(int dfd, const char *name,
 		while (err > 0) {
 			void *cookie;
 			struct path link = path;
+			err = may_follow_link(&link);
+			if (unlikely(err))
+				break;
 			nd->flags |= LOOKUP_PARENT;
 			err = follow_link(&link, nd, &cookie);
 			if (!err)
@@ -2471,6 +2604,9 @@ static struct file *path_openat(int dfd, const char *pathname,
 			filp = ERR_PTR(-ELOOP);
 			break;
 		}
+		error = may_follow_link(&link);
+		if (unlikely(error))
+			break;
 		nd->flags |= LOOKUP_PARENT;
 		nd->flags &= ~(LOOKUP_OPEN|LOOKUP_CREATE|LOOKUP_EXCL);
 		error = follow_link(&link, nd, &cookie);
@@ -3130,6 +3266,9 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 	error = -EXDEV;
 	if (old_path.mnt != new_path.mnt)
 		goto out_dput;
+	error = may_linkat(&old_path);
+	if (unlikely(error))
+		goto out_dput;
 	error = mnt_want_write(new_path.mnt);
 	if (error)
 		goto out_dput;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 17fd887..b0a6d44 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -437,6 +437,8 @@ extern unsigned long get_max_files(void);
 extern int sysctl_nr_open;
 extern struct inodes_stat_t inodes_stat;
 extern int leases_enable, lease_break_time;
+extern int sysctl_protected_symlinks;
+extern int sysctl_protected_hardlinks;
 
 struct buffer_head;
 typedef int (get_block_t)(struct inode *inode, sector_t iblock,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4ab1187..ba133ec 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1493,6 +1493,26 @@ static struct ctl_table fs_table[] = {
 	},
 #endif
 #endif
+#ifdef CONFIG_PROTECTED_LINKS
+	{
+		.procname	= "protected_symlinks",
+		.data		= &sysctl_protected_symlinks,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
+		.procname	= "protected_hardlinks",
+		.data		= &sysctl_protected_hardlinks,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+#endif
 	{
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-08-12  6:34 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-02 20:17 [PATCH v2012.5 0/2] fs: add link restrictions Kees Cook
2012-07-02 20:17 ` [kernel-hardening] " Kees Cook
2012-07-02 20:17 ` [PATCH 1/2] " Kees Cook
2012-07-02 20:17   ` [kernel-hardening] " Kees Cook
2012-07-02 20:17 ` [PATCH 2/2] fs: add link restriction audit reporting Kees Cook
2012-07-02 20:17   ` [kernel-hardening] " Kees Cook
  -- strict thread matches above, loose matches on Subject: below --
2012-07-26  0:29 [RESEND][PATCH v2012.5 0/2] fs: add link restrictions Kees Cook
2012-07-26  0:29 ` [PATCH 1/2] " Kees Cook
2012-08-03  4:26   ` James Morris
2012-08-03 17:01     ` [kernel-hardening] " Kees Cook
2012-08-06 23:55       ` Eric W. Biederman
2012-08-06 23:57         ` Kees Cook
2012-08-08 12:19   ` [kernel-hardening] " Vasily Kulikov
2012-08-12  6:34     ` Kees Cook
2012-06-25 21:05 [PATCH v2012.4 0/2] " Kees Cook
2012-06-25 21:05 ` [PATCH 1/2] " Kees Cook
2012-06-30  9:14   ` Al Viro
2012-07-02  0:43     ` Kees Cook
2012-07-02  1:26       ` Kees Cook

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.