* [PATCH v3 0/2] Restrict dangerous open in sticky directories @ 2017-11-22 8:01 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-22 8:01 UTC (permalink / raw) To: linux-kernel Cc: Kernel Hardening, linux-fsdevel, Salvatore Mesoraca, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman This patch-set introduces two separate features aimed at restricting dangerous open in world or group writable sticky directories. The purpose is to prevent exploitable bugs in user-space programs that don't access sticky directories in the proper way. The first patch prevents the O_CREAT open of FIFOs and regular files in world or group writable sticky directories, if they already exists and are owned by someone else. The second patch prevents O_CREAT open in world or group writable sticky when the O_EXCL flag is not set, even if the file doesn't exist yet. More details can be found in the respective commit messages. Changes in v3: - Fixed format string for uid_t that is unsigned (suggested by Jann Horn). - Stop checking if file's and parent dir's owners match in may_create_no_excl. This will allow to discover potential vulnerabilities more easily. Salvatore Mesoraca (2): Protected FIFOs and regular files Protected O_CREAT open in sticky directories Documentation/sysctl/fs.txt | 66 +++++++++++++++++++++++++ fs/namei.c | 117 ++++++++++++++++++++++++++++++++++++++++++-- include/linux/fs.h | 3 ++ kernel/sysctl.c | 27 ++++++++++ 4 files changed, 210 insertions(+), 3 deletions(-) -- 1.9.1 ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] [PATCH v3 0/2] Restrict dangerous open in sticky directories @ 2017-11-22 8:01 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-22 8:01 UTC (permalink / raw) To: linux-kernel Cc: Kernel Hardening, linux-fsdevel, Salvatore Mesoraca, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman This patch-set introduces two separate features aimed at restricting dangerous open in world or group writable sticky directories. The purpose is to prevent exploitable bugs in user-space programs that don't access sticky directories in the proper way. The first patch prevents the O_CREAT open of FIFOs and regular files in world or group writable sticky directories, if they already exists and are owned by someone else. The second patch prevents O_CREAT open in world or group writable sticky when the O_EXCL flag is not set, even if the file doesn't exist yet. More details can be found in the respective commit messages. Changes in v3: - Fixed format string for uid_t that is unsigned (suggested by Jann Horn). - Stop checking if file's and parent dir's owners match in may_create_no_excl. This will allow to discover potential vulnerabilities more easily. Salvatore Mesoraca (2): Protected FIFOs and regular files Protected O_CREAT open in sticky directories Documentation/sysctl/fs.txt | 66 +++++++++++++++++++++++++ fs/namei.c | 117 ++++++++++++++++++++++++++++++++++++++++++-- include/linux/fs.h | 3 ++ kernel/sysctl.c | 27 ++++++++++ 4 files changed, 210 insertions(+), 3 deletions(-) -- 1.9.1 ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v3 1/2] Protected FIFOs and regular files 2017-11-22 8:01 ` [kernel-hardening] " Salvatore Mesoraca @ 2017-11-22 8:01 ` Salvatore Mesoraca -1 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-22 8:01 UTC (permalink / raw) To: linux-kernel Cc: Kernel Hardening, linux-fsdevel, Salvatore Mesoraca, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman Disallows open of FIFOs or regular files not owned by the user in world writable sticky directories, unless the owner is the same as that of the directory or the file is opened without the O_CREAT flag. The purpose is to make data spoofing attacks harder. This protection can be turned on and off separately for FIFOs and regular files via sysctl, just like the symlinks/hardlinks protection. This patch is based on Openwall's "HARDEN_FIFO" feature by Solar Designer. This is a brief list of old vulnerabilities that could have been prevented by this feature, some of them even allow for privilege escalation: CVE-2000-1134 CVE-2007-3852 CVE-2008-0525 CVE-2009-0416 CVE-2011-4834 CVE-2015-1838 CVE-2015-7442 CVE-2016-7489 This list is not meant to be complete. It's difficult to track down all vulnerabilities of this kind because they were often reported without any mention of this particular attack vector. In fact, before symlinks restrictions, fifos/regular files were not the favorite vehicle to exploit them. Suggested-by: Solar Designer <solar@openwall.com> Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> --- Documentation/sysctl/fs.txt | 36 ++++++++++++++++++++++++++ fs/namei.c | 61 ++++++++++++++++++++++++++++++++++++++++++--- include/linux/fs.h | 2 ++ kernel/sysctl.c | 18 +++++++++++++ 4 files changed, 114 insertions(+), 3 deletions(-) diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index 6c00c1e..f3cf2cd 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -34,7 +34,9 @@ Currently, these files are in /proc/sys/fs: - overflowgid - pipe-user-pages-hard - pipe-user-pages-soft +- protected_fifos - protected_hardlinks +- protected_regular - protected_symlinks - suid_dumpable - super-max @@ -182,6 +184,24 @@ applied. ============================================================== +protected_fifos: + +The intent of this protection is to avoid unintentional writes to +an attacker-controlled FIFO, where a program expected to create a regular +file. + +When set to "0", FIFOs writing is unrestricted. + +When set to "1" don't allow O_CREAT open on FIFOs that we don't own +in world writable sticky directories, unless they are owned by the +owner of the directory. + +When set to "2" it also applies to group writable sticky directories. + +This protection is based on the restrictions in Openwall. + +============================================================== + protected_hardlinks: A long-standing class of security issues is the hardlink-based @@ -202,6 +222,22 @@ This protection is based on the restrictions in Openwall and grsecurity. ============================================================== +protected_regular: + +This protection is similar to protected_fifos, but it +avoids writes to an attacker-controlled regular file, where a program +expected to create one. + +When set to "0", regular files writing is unrestricted. + +When set to "1" don't allow O_CREAT open on regular files that we +don't own in world writable sticky directories, unless they are +owned by the owner of the directory. + +When set to "2" it also applies to group writable sticky directories. + +============================================================== + protected_symlinks: A long-standing class of security issues is the symlink-based diff --git a/fs/namei.c b/fs/namei.c index f0c7a7b..92992ad 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -902,6 +902,8 @@ static inline void put_link(struct nameidata *nd) int sysctl_protected_symlinks __read_mostly = 0; int sysctl_protected_hardlinks __read_mostly = 0; +int sysctl_protected_fifos __read_mostly; +int sysctl_protected_regular __read_mostly; /** * may_follow_link - Check symlink following for unsafe situations @@ -1015,6 +1017,54 @@ static int may_linkat(struct path *link) return -EPERM; } +/** + * may_create_in_sticky - Check whether an O_CREAT open in a sticky directory + * should be allowed or not, when the file already + * existed. + * @dir: the stick parent directory + * @name: the file name + * @inode: the inode of the file to open + * + * Block an O_CREAT open of a FIFO (or a regular file) when: + * - sysctl_protected_fifos (or sysctl_protected_regular) is enabled + * - the file already exists + * - we are in a sticky directory + * - we don't own the file + * - the owner of the directory doesn't own the file + * - the directory is world writable + * If the sysctl_protected_fifos (or sysctl_protected_regular) is set to 2 + * the directory doesn't have to be world writable: being group writable will + * be enough. + * + * Returns 0 if the open is allowed, -ve on error. + */ +static int may_create_in_sticky(struct dentry * const dir, + const unsigned char * const name, + struct inode * const inode) +{ + if ((!sysctl_protected_fifos && S_ISFIFO(inode->i_mode)) || + (!sysctl_protected_regular && S_ISREG(inode->i_mode)) || + likely(!(dir->d_inode->i_mode & S_ISVTX)) || + uid_eq(inode->i_uid, dir->d_inode->i_uid) || + uid_eq(current_fsuid(), inode->i_uid)) + return 0; + + if (likely(dir->d_inode->i_mode & 0002) || + (dir->d_inode->i_mode & 0020 && + ((sysctl_protected_fifos >= 2 && S_ISFIFO(inode->i_mode)) || + (sysctl_protected_regular >= 2 && S_ISREG(inode->i_mode))))) { + pr_notice_ratelimited("denied writing in '%s' of %u.%u in a sticky directory by UID %u, EUID %u, process %s:%d.\n", + name, + from_kuid(&init_user_ns, inode->i_uid), + from_kgid(&init_user_ns, inode->i_gid), + from_kuid(&init_user_ns, current_uid()), + from_kuid(&init_user_ns, current_euid()), + current->comm, current->pid); + return -EACCES; + } + return 0; +} + static __always_inline const char *get_link(struct nameidata *nd) { @@ -3365,9 +3415,14 @@ static int do_last(struct nameidata *nd, if (error) return error; audit_inode(nd->name, nd->path.dentry, 0); - error = -EISDIR; - if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry)) - goto out; + if (open_flag & O_CREAT) { + error = -EISDIR; + if (d_is_dir(nd->path.dentry)) + goto out; + error = may_create_in_sticky(dir, nd->last.name, inode); + if (unlikely(error)) + goto out; + } error = -ENOTDIR; if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry)) goto out; diff --git a/include/linux/fs.h b/include/linux/fs.h index 2995a27..6fb45a52 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -72,6 +72,8 @@ extern int leases_enable, lease_break_time; extern int sysctl_protected_symlinks; extern int sysctl_protected_hardlinks; +extern int sysctl_protected_fifos; +extern int sysctl_protected_regular; typedef __kernel_rwf_t rwf_t; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 557d467..590fbc9 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1799,6 +1799,24 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write, .extra2 = &one, }, { + .procname = "protected_fifos", + .data = &sysctl_protected_fifos, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &two, + }, + { + .procname = "protected_regular", + .data = &sysctl_protected_regular, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &two, + }, + { .procname = "suid_dumpable", .data = &suid_dumpable, .maxlen = sizeof(int), -- 1.9.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [kernel-hardening] [PATCH v3 1/2] Protected FIFOs and regular files @ 2017-11-22 8:01 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-22 8:01 UTC (permalink / raw) To: linux-kernel Cc: Kernel Hardening, linux-fsdevel, Salvatore Mesoraca, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman Disallows open of FIFOs or regular files not owned by the user in world writable sticky directories, unless the owner is the same as that of the directory or the file is opened without the O_CREAT flag. The purpose is to make data spoofing attacks harder. This protection can be turned on and off separately for FIFOs and regular files via sysctl, just like the symlinks/hardlinks protection. This patch is based on Openwall's "HARDEN_FIFO" feature by Solar Designer. This is a brief list of old vulnerabilities that could have been prevented by this feature, some of them even allow for privilege escalation: CVE-2000-1134 CVE-2007-3852 CVE-2008-0525 CVE-2009-0416 CVE-2011-4834 CVE-2015-1838 CVE-2015-7442 CVE-2016-7489 This list is not meant to be complete. It's difficult to track down all vulnerabilities of this kind because they were often reported without any mention of this particular attack vector. In fact, before symlinks restrictions, fifos/regular files were not the favorite vehicle to exploit them. Suggested-by: Solar Designer <solar@openwall.com> Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> --- Documentation/sysctl/fs.txt | 36 ++++++++++++++++++++++++++ fs/namei.c | 61 ++++++++++++++++++++++++++++++++++++++++++--- include/linux/fs.h | 2 ++ kernel/sysctl.c | 18 +++++++++++++ 4 files changed, 114 insertions(+), 3 deletions(-) diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index 6c00c1e..f3cf2cd 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -34,7 +34,9 @@ Currently, these files are in /proc/sys/fs: - overflowgid - pipe-user-pages-hard - pipe-user-pages-soft +- protected_fifos - protected_hardlinks +- protected_regular - protected_symlinks - suid_dumpable - super-max @@ -182,6 +184,24 @@ applied. ============================================================== +protected_fifos: + +The intent of this protection is to avoid unintentional writes to +an attacker-controlled FIFO, where a program expected to create a regular +file. + +When set to "0", FIFOs writing is unrestricted. + +When set to "1" don't allow O_CREAT open on FIFOs that we don't own +in world writable sticky directories, unless they are owned by the +owner of the directory. + +When set to "2" it also applies to group writable sticky directories. + +This protection is based on the restrictions in Openwall. + +============================================================== + protected_hardlinks: A long-standing class of security issues is the hardlink-based @@ -202,6 +222,22 @@ This protection is based on the restrictions in Openwall and grsecurity. ============================================================== +protected_regular: + +This protection is similar to protected_fifos, but it +avoids writes to an attacker-controlled regular file, where a program +expected to create one. + +When set to "0", regular files writing is unrestricted. + +When set to "1" don't allow O_CREAT open on regular files that we +don't own in world writable sticky directories, unless they are +owned by the owner of the directory. + +When set to "2" it also applies to group writable sticky directories. + +============================================================== + protected_symlinks: A long-standing class of security issues is the symlink-based diff --git a/fs/namei.c b/fs/namei.c index f0c7a7b..92992ad 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -902,6 +902,8 @@ static inline void put_link(struct nameidata *nd) int sysctl_protected_symlinks __read_mostly = 0; int sysctl_protected_hardlinks __read_mostly = 0; +int sysctl_protected_fifos __read_mostly; +int sysctl_protected_regular __read_mostly; /** * may_follow_link - Check symlink following for unsafe situations @@ -1015,6 +1017,54 @@ static int may_linkat(struct path *link) return -EPERM; } +/** + * may_create_in_sticky - Check whether an O_CREAT open in a sticky directory + * should be allowed or not, when the file already + * existed. + * @dir: the stick parent directory + * @name: the file name + * @inode: the inode of the file to open + * + * Block an O_CREAT open of a FIFO (or a regular file) when: + * - sysctl_protected_fifos (or sysctl_protected_regular) is enabled + * - the file already exists + * - we are in a sticky directory + * - we don't own the file + * - the owner of the directory doesn't own the file + * - the directory is world writable + * If the sysctl_protected_fifos (or sysctl_protected_regular) is set to 2 + * the directory doesn't have to be world writable: being group writable will + * be enough. + * + * Returns 0 if the open is allowed, -ve on error. + */ +static int may_create_in_sticky(struct dentry * const dir, + const unsigned char * const name, + struct inode * const inode) +{ + if ((!sysctl_protected_fifos && S_ISFIFO(inode->i_mode)) || + (!sysctl_protected_regular && S_ISREG(inode->i_mode)) || + likely(!(dir->d_inode->i_mode & S_ISVTX)) || + uid_eq(inode->i_uid, dir->d_inode->i_uid) || + uid_eq(current_fsuid(), inode->i_uid)) + return 0; + + if (likely(dir->d_inode->i_mode & 0002) || + (dir->d_inode->i_mode & 0020 && + ((sysctl_protected_fifos >= 2 && S_ISFIFO(inode->i_mode)) || + (sysctl_protected_regular >= 2 && S_ISREG(inode->i_mode))))) { + pr_notice_ratelimited("denied writing in '%s' of %u.%u in a sticky directory by UID %u, EUID %u, process %s:%d.\n", + name, + from_kuid(&init_user_ns, inode->i_uid), + from_kgid(&init_user_ns, inode->i_gid), + from_kuid(&init_user_ns, current_uid()), + from_kuid(&init_user_ns, current_euid()), + current->comm, current->pid); + return -EACCES; + } + return 0; +} + static __always_inline const char *get_link(struct nameidata *nd) { @@ -3365,9 +3415,14 @@ static int do_last(struct nameidata *nd, if (error) return error; audit_inode(nd->name, nd->path.dentry, 0); - error = -EISDIR; - if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry)) - goto out; + if (open_flag & O_CREAT) { + error = -EISDIR; + if (d_is_dir(nd->path.dentry)) + goto out; + error = may_create_in_sticky(dir, nd->last.name, inode); + if (unlikely(error)) + goto out; + } error = -ENOTDIR; if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry)) goto out; diff --git a/include/linux/fs.h b/include/linux/fs.h index 2995a27..6fb45a52 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -72,6 +72,8 @@ extern int leases_enable, lease_break_time; extern int sysctl_protected_symlinks; extern int sysctl_protected_hardlinks; +extern int sysctl_protected_fifos; +extern int sysctl_protected_regular; typedef __kernel_rwf_t rwf_t; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 557d467..590fbc9 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1799,6 +1799,24 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write, .extra2 = &one, }, { + .procname = "protected_fifos", + .data = &sysctl_protected_fifos, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &two, + }, + { + .procname = "protected_regular", + .data = &sysctl_protected_regular, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &two, + }, + { .procname = "suid_dumpable", .data = &suid_dumpable, .maxlen = sizeof(int), -- 1.9.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] [PATCH v3 1/2] Protected FIFOs and regular files 2017-11-22 8:01 ` [kernel-hardening] " Salvatore Mesoraca (?) @ 2017-11-23 22:43 ` Tobin C. Harding 2017-11-24 8:24 ` Salvatore Mesoraca -1 siblings, 1 reply; 60+ messages in thread From: Tobin C. Harding @ 2017-11-23 22:43 UTC (permalink / raw) To: Salvatore Mesoraca Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman On Wed, Nov 22, 2017 at 09:01:45AM +0100, Salvatore Mesoraca wrote: Please take these comments in all humility, my English is a long way from perfect. These are English grammar comments only. If this is viewed as trivial please stop reading now and ignore. > Disallows open of FIFOs or regular files not owned by the user in world > writable sticky directories, unless the owner is the same as that of > the directory or the file is opened without the O_CREAT flag. > The purpose is to make data spoofing attacks harder. > This protection can be turned on and off separately for FIFOs and regular > files via sysctl, just like the symlinks/hardlinks protection. > This patch is based on Openwall's "HARDEN_FIFO" feature by Solar > Designer. > > This is a brief list of old vulnerabilities that could have been prevented > by this feature, some of them even allow for privilege escalation: > CVE-2000-1134 > CVE-2007-3852 > CVE-2008-0525 > CVE-2009-0416 > CVE-2011-4834 > CVE-2015-1838 > CVE-2015-7442 > CVE-2016-7489 > > This list is not meant to be complete. It's difficult to track down > all vulnerabilities of this kind because they were often reported > without any mention of this particular attack vector. > In fact, before symlinks restrictions, fifos/regular files were not the > favorite vehicle to exploit them. > > Suggested-by: Solar Designer <solar@openwall.com> > Suggested-by: Kees Cook <keescook@chromium.org> > Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> > --- > Documentation/sysctl/fs.txt | 36 ++++++++++++++++++++++++++ > fs/namei.c | 61 ++++++++++++++++++++++++++++++++++++++++++--- > include/linux/fs.h | 2 ++ > kernel/sysctl.c | 18 +++++++++++++ > 4 files changed, 114 insertions(+), 3 deletions(-) > > diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt > index 6c00c1e..f3cf2cd 100644 > --- a/Documentation/sysctl/fs.txt > +++ b/Documentation/sysctl/fs.txt > @@ -34,7 +34,9 @@ Currently, these files are in /proc/sys/fs: > - overflowgid > - pipe-user-pages-hard > - pipe-user-pages-soft > +- protected_fifos > - protected_hardlinks > +- protected_regular > - protected_symlinks > - suid_dumpable > - super-max > @@ -182,6 +184,24 @@ applied. > > ============================================================== > > +protected_fifos: > + > +The intent of this protection is to avoid unintentional writes to > +an attacker-controlled FIFO, where a program expected to create a regular > +file. > + > +When set to "0", FIFOs writing is unrestricted. When set to "0", writing to FIFOs is unrestricted. > +When set to "1" don't allow O_CREAT open on FIFOs that we don't own > +in world writable sticky directories, unless they are owned by the > +owner of the directory. > + > +When set to "2" it also applies to group writable sticky directories. > + > +This protection is based on the restrictions in Openwall. > + > +============================================================== > + > protected_hardlinks: > > A long-standing class of security issues is the hardlink-based > @@ -202,6 +222,22 @@ This protection is based on the restrictions in Openwall and grsecurity. > > ============================================================== > > +protected_regular: > + > +This protection is similar to protected_fifos, but it > +avoids writes to an attacker-controlled regular file, where a program > +expected to create one. > + > +When set to "0", regular files writing is unrestricted. When set to "0", writing to regular files is unrestricted. > +When set to "1" don't allow O_CREAT open on regular files that we > +don't own in world writable sticky directories, unless they are > +owned by the owner of the directory. > + > +When set to "2" it also applies to group writable sticky directories. > + > +============================================================== > + > protected_symlinks: > > A long-standing class of security issues is the symlink-based > diff --git a/fs/namei.c b/fs/namei.c > index f0c7a7b..92992ad 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -902,6 +902,8 @@ static inline void put_link(struct nameidata *nd) > > int sysctl_protected_symlinks __read_mostly = 0; > int sysctl_protected_hardlinks __read_mostly = 0; > +int sysctl_protected_fifos __read_mostly; > +int sysctl_protected_regular __read_mostly; > > /** > * may_follow_link - Check symlink following for unsafe situations > @@ -1015,6 +1017,54 @@ static int may_linkat(struct path *link) > return -EPERM; > } > > +/** > + * may_create_in_sticky - Check whether an O_CREAT open in a sticky directory > + * should be allowed or not, when the file already > + * existed. Perhaps + * may_create_in_sticky - Check whether an O_CREAT open, in a sticky directory, should be allowed, or not, on files that already exist. Hope this helps, Tobin. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] [PATCH v3 1/2] Protected FIFOs and regular files 2017-11-23 22:43 ` Tobin C. Harding @ 2017-11-24 8:24 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-24 8:24 UTC (permalink / raw) To: Tobin C. Harding Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman 2017-11-23 23:43 GMT+01:00 Tobin C. Harding <me@tobin.cc>: > On Wed, Nov 22, 2017 at 09:01:45AM +0100, Salvatore Mesoraca wrote: > > Please take these comments in all humility, my English is a long way > from perfect. These are English grammar comments only. If this is viewed > as trivial please stop reading now and ignore. Any help is always greatly appreciated! And I like your proposed changes, they sound better to me too. Thank you for your time, Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-22 8:01 ` [kernel-hardening] " Salvatore Mesoraca @ 2017-11-22 8:01 ` Salvatore Mesoraca -1 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-22 8:01 UTC (permalink / raw) To: linux-kernel Cc: Kernel Hardening, linux-fsdevel, Salvatore Mesoraca, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman Disallows O_CREAT open missing the O_EXCL flag, in world or group writable directories, even if the file doesn't exist yet. With few exceptions (e.g. shared lock files based on flock()) if a program tries to open a file, in a sticky directory, with the O_CREAT flag and without the O_EXCL, it probably has a bug. This feature allows to detect and potentially block programs that act this way, it can be used to find vulnerabilities (like those prevented by patch #1) and to do policy enforcement. Suggested-by: Solar Designer <solar@openwall.com> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> --- Documentation/sysctl/fs.txt | 30 ++++++++++++++++++++++++ fs/namei.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 1 + kernel/sysctl.c | 9 ++++++++ 4 files changed, 96 insertions(+) diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index f3cf2cd..7f24b4f 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs: - protected_fifos - protected_hardlinks - protected_regular +- protected_sticky_child_create - protected_symlinks - suid_dumpable - super-max @@ -238,6 +239,35 @@ When set to "2" it also applies to group writable sticky directories. ============================================================== +protected_sticky_child_create: + +An O_CREAT open missing the O_EXCL flag in a sticky directory is, +often, a bug or a synthom of the fact that the program is not +using appropriate procedures to access sticky directories. +This protection allow to detect and possibly block these unsafe +open invocations, even if the files don't exist yet. +Though should be noted that, sometimes, it's OK to open a file +with O_CREAT and without O_EXCL (e.g. shared lock files based +on flock()), for this reason values above 2 should be set +with care. + +When set to "0" the protection is disabled. + +When set to "1", notify about O_CREAT open missing the O_EXCL flag +in world writable sticky directories. + +When set to "2", notify about O_CREAT open missing the O_EXCL flag +in world or group writable sticky directories. + +When set to "3", block O_CREAT open missing the O_EXCL flag +in world writable sticky directories and notify (but don't block) +in group writable sticky directories. + +When set to "4", block O_CREAT open missing the O_EXCL flag +in world writable and group writable sticky directories. + +============================================================== + protected_symlinks: A long-standing class of security issues is the symlink-based diff --git a/fs/namei.c b/fs/namei.c index 92992ad..fcee423 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -904,6 +904,7 @@ static inline void put_link(struct nameidata *nd) int sysctl_protected_hardlinks __read_mostly = 0; int sysctl_protected_fifos __read_mostly; int sysctl_protected_regular __read_mostly; +int sysctl_protected_sticky_child_create __read_mostly; /** * may_follow_link - Check symlink following for unsafe situations @@ -1065,6 +1066,53 @@ static int may_create_in_sticky(struct dentry * const dir, return 0; } +/** + * may_create_no_excl - Detect and possibly block unsafe O_CREAT open + * without O_EXCL. + * @dir: the stick parent directory + * @name: the file name + * @inode: the inode of the file to open (can be NULL to skip uid checks) + * + * When sysctl_protected_sticky_child_create is set to "0" the + * protection is disabled. + * When it's set to "1", notify about O_CREAT open missing the O_EXCL flag + * in world writable sticky directories. + * When it's set to "2", notify about O_CREAT open missing the O_EXCL flag + * in group writable sticky directories. + * When it's set to "3", block O_CREAT open missing the O_EXCL flag + * in world writable sticky directories and notify (but don't block) + * in group writable sticky directories. + * When it's set to "4", block O_CREAT open missing the O_EXCL flag + * in world writable and group writable sticky directories. + * + * Returns 0 if the open is allowed, -ve on error. + */ +static int may_create_no_excl(struct dentry * const dir, + const unsigned char * const name, + struct inode * const inode) +{ + umode_t mode = dir->d_inode->i_mode; + + if (likely(!(mode & S_ISVTX))) + return 0; + if (inode && uid_eq(current_fsuid(), inode->i_uid)) + return 0; + + if ((sysctl_protected_sticky_child_create && likely(mode & 0002)) || + (sysctl_protected_sticky_child_create >= 2 && mode & 0020)) { + pr_notice_ratelimited("unsafe O_CREAT open (missing O_EXCL) of '%s' in a sticky directory by UID %u, EUID %u, process %s:%d.\n", + name, + from_kuid(&init_user_ns, current_uid()), + from_kuid(&init_user_ns, current_euid()), + current->comm, current->pid); + if (sysctl_protected_sticky_child_create >= 4 || + (sysctl_protected_sticky_child_create == 3 && + likely(mode & 0002))) + return -EACCES; + } + return 0; +} + static __always_inline const char *get_link(struct nameidata *nd) { @@ -3256,6 +3304,11 @@ static int lookup_open(struct nameidata *nd, struct path *path, error = -EACCES; goto out_dput; } + if (!(open_flag & O_EXCL)) { + error = may_create_no_excl(dir, nd->last.name, NULL); + if (unlikely(error)) + goto out_dput; + } error = dir_inode->i_op->create(dir_inode, dentry, mode, open_flag & O_EXCL); if (error) @@ -3422,6 +3475,9 @@ static int do_last(struct nameidata *nd, error = may_create_in_sticky(dir, nd->last.name, inode); if (unlikely(error)) goto out; + error = may_create_no_excl(dir, nd->last.name, inode); + if (unlikely(error)) + goto out; } error = -ENOTDIR; if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry)) diff --git a/include/linux/fs.h b/include/linux/fs.h index 6fb45a52..3ab37e0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -74,6 +74,7 @@ extern int sysctl_protected_hardlinks; extern int sysctl_protected_fifos; extern int sysctl_protected_regular; +extern int sysctl_protected_sticky_child_create; typedef __kernel_rwf_t rwf_t; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 590fbc9..012c739 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1817,6 +1817,15 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write, .extra2 = &two, }, { + .procname = "protected_sticky_child_create", + .data = &sysctl_protected_sticky_child_create, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &four, + }, + { .procname = "suid_dumpable", .data = &suid_dumpable, .maxlen = sizeof(int), -- 1.9.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-22 8:01 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-22 8:01 UTC (permalink / raw) To: linux-kernel Cc: Kernel Hardening, linux-fsdevel, Salvatore Mesoraca, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman Disallows O_CREAT open missing the O_EXCL flag, in world or group writable directories, even if the file doesn't exist yet. With few exceptions (e.g. shared lock files based on flock()) if a program tries to open a file, in a sticky directory, with the O_CREAT flag and without the O_EXCL, it probably has a bug. This feature allows to detect and potentially block programs that act this way, it can be used to find vulnerabilities (like those prevented by patch #1) and to do policy enforcement. Suggested-by: Solar Designer <solar@openwall.com> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> --- Documentation/sysctl/fs.txt | 30 ++++++++++++++++++++++++ fs/namei.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 1 + kernel/sysctl.c | 9 ++++++++ 4 files changed, 96 insertions(+) diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index f3cf2cd..7f24b4f 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs: - protected_fifos - protected_hardlinks - protected_regular +- protected_sticky_child_create - protected_symlinks - suid_dumpable - super-max @@ -238,6 +239,35 @@ When set to "2" it also applies to group writable sticky directories. ============================================================== +protected_sticky_child_create: + +An O_CREAT open missing the O_EXCL flag in a sticky directory is, +often, a bug or a synthom of the fact that the program is not +using appropriate procedures to access sticky directories. +This protection allow to detect and possibly block these unsafe +open invocations, even if the files don't exist yet. +Though should be noted that, sometimes, it's OK to open a file +with O_CREAT and without O_EXCL (e.g. shared lock files based +on flock()), for this reason values above 2 should be set +with care. + +When set to "0" the protection is disabled. + +When set to "1", notify about O_CREAT open missing the O_EXCL flag +in world writable sticky directories. + +When set to "2", notify about O_CREAT open missing the O_EXCL flag +in world or group writable sticky directories. + +When set to "3", block O_CREAT open missing the O_EXCL flag +in world writable sticky directories and notify (but don't block) +in group writable sticky directories. + +When set to "4", block O_CREAT open missing the O_EXCL flag +in world writable and group writable sticky directories. + +============================================================== + protected_symlinks: A long-standing class of security issues is the symlink-based diff --git a/fs/namei.c b/fs/namei.c index 92992ad..fcee423 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -904,6 +904,7 @@ static inline void put_link(struct nameidata *nd) int sysctl_protected_hardlinks __read_mostly = 0; int sysctl_protected_fifos __read_mostly; int sysctl_protected_regular __read_mostly; +int sysctl_protected_sticky_child_create __read_mostly; /** * may_follow_link - Check symlink following for unsafe situations @@ -1065,6 +1066,53 @@ static int may_create_in_sticky(struct dentry * const dir, return 0; } +/** + * may_create_no_excl - Detect and possibly block unsafe O_CREAT open + * without O_EXCL. + * @dir: the stick parent directory + * @name: the file name + * @inode: the inode of the file to open (can be NULL to skip uid checks) + * + * When sysctl_protected_sticky_child_create is set to "0" the + * protection is disabled. + * When it's set to "1", notify about O_CREAT open missing the O_EXCL flag + * in world writable sticky directories. + * When it's set to "2", notify about O_CREAT open missing the O_EXCL flag + * in group writable sticky directories. + * When it's set to "3", block O_CREAT open missing the O_EXCL flag + * in world writable sticky directories and notify (but don't block) + * in group writable sticky directories. + * When it's set to "4", block O_CREAT open missing the O_EXCL flag + * in world writable and group writable sticky directories. + * + * Returns 0 if the open is allowed, -ve on error. + */ +static int may_create_no_excl(struct dentry * const dir, + const unsigned char * const name, + struct inode * const inode) +{ + umode_t mode = dir->d_inode->i_mode; + + if (likely(!(mode & S_ISVTX))) + return 0; + if (inode && uid_eq(current_fsuid(), inode->i_uid)) + return 0; + + if ((sysctl_protected_sticky_child_create && likely(mode & 0002)) || + (sysctl_protected_sticky_child_create >= 2 && mode & 0020)) { + pr_notice_ratelimited("unsafe O_CREAT open (missing O_EXCL) of '%s' in a sticky directory by UID %u, EUID %u, process %s:%d.\n", + name, + from_kuid(&init_user_ns, current_uid()), + from_kuid(&init_user_ns, current_euid()), + current->comm, current->pid); + if (sysctl_protected_sticky_child_create >= 4 || + (sysctl_protected_sticky_child_create == 3 && + likely(mode & 0002))) + return -EACCES; + } + return 0; +} + static __always_inline const char *get_link(struct nameidata *nd) { @@ -3256,6 +3304,11 @@ static int lookup_open(struct nameidata *nd, struct path *path, error = -EACCES; goto out_dput; } + if (!(open_flag & O_EXCL)) { + error = may_create_no_excl(dir, nd->last.name, NULL); + if (unlikely(error)) + goto out_dput; + } error = dir_inode->i_op->create(dir_inode, dentry, mode, open_flag & O_EXCL); if (error) @@ -3422,6 +3475,9 @@ static int do_last(struct nameidata *nd, error = may_create_in_sticky(dir, nd->last.name, inode); if (unlikely(error)) goto out; + error = may_create_no_excl(dir, nd->last.name, inode); + if (unlikely(error)) + goto out; } error = -ENOTDIR; if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry)) diff --git a/include/linux/fs.h b/include/linux/fs.h index 6fb45a52..3ab37e0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -74,6 +74,7 @@ extern int sysctl_protected_hardlinks; extern int sysctl_protected_fifos; extern int sysctl_protected_regular; +extern int sysctl_protected_sticky_child_create; typedef __kernel_rwf_t rwf_t; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 590fbc9..012c739 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1817,6 +1817,15 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write, .extra2 = &two, }, { + .procname = "protected_sticky_child_create", + .data = &sysctl_protected_sticky_child_create, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &four, + }, + { .procname = "suid_dumpable", .data = &suid_dumpable, .maxlen = sizeof(int), -- 1.9.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-22 8:01 ` [kernel-hardening] " Salvatore Mesoraca (?) @ 2017-11-22 12:03 ` Pavel Vasilyev 2017-11-24 8:28 ` Salvatore Mesoraca -1 siblings, 1 reply; 60+ messages in thread From: Pavel Vasilyev @ 2017-11-22 12:03 UTC (permalink / raw) To: kernel-hardening 22.11.2017 11:01, Salvatore Mesoraca пишет: > Disallows O_CREAT open missing the O_EXCL flag, in world or > group writable directories, even if the file doesn't exist yet. > With few exceptions (e.g. shared lock files based on flock()) > if a program tries to open a file, in a sticky directory, > with the O_CREAT flag and without the O_EXCL, it probably has a bug. > This feature allows to detect and potentially block programs that > act this way, it can be used to find vulnerabilities (like those > prevented by patch #1) and to do policy enforcement. > > Suggested-by: Solar Designer <solar@openwall.com> > Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> > --- > Documentation/sysctl/fs.txt | 30 ++++++++++++++++++++++++ > fs/namei.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ > include/linux/fs.h | 1 + > kernel/sysctl.c | 9 ++++++++ > 4 files changed, 96 insertions(+) > > diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt > index f3cf2cd..7f24b4f 100644 > --- a/Documentation/sysctl/fs.txt > +++ b/Documentation/sysctl/fs.txt > @@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs: > - protected_fifos > - protected_hardlinks > - protected_regular > +- protected_sticky_child_create > - protected_symlinks > - suid_dumpable > - super-max > @@ -238,6 +239,35 @@ When set to "2" it also applies to group writable sticky directories. > > ============================================================== > > +protected_sticky_child_create: > + > +An O_CREAT open missing the O_EXCL flag in a sticky directory is, > +often, a bug or a synthom of the fact that the program is not > +using appropriate procedures to access sticky directories. > +This protection allow to detect and possibly block these unsafe > +open invocations, even if the files don't exist yet. > +Though should be noted that, sometimes, it's OK to open a file > +with O_CREAT and without O_EXCL (e.g. shared lock files based > +on flock()), for this reason values above 2 should be set > +with care. > + > +When set to "0" the protection is disabled. > + > +When set to "1", notify about O_CREAT open missing the O_EXCL flag > +in world writable sticky directories. > + > +When set to "2", notify about O_CREAT open missing the O_EXCL flag > +in world or group writable sticky directories. > + > +When set to "3", block O_CREAT open missing the O_EXCL flag > +in world writable sticky directories and notify (but don't block) > +in group writable sticky directories. > + > +When set to "4", block O_CREAT open missing the O_EXCL flag > +in world writable and group writable sticky directories. May be add: When set to "X", notify O_CREAT open missing the O_EXCL flag in world writable sticky directories and notify in group writable sticky directories. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-22 12:03 ` Pavel Vasilyev @ 2017-11-24 8:28 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-24 8:28 UTC (permalink / raw) To: Pavel Vasilyev; +Cc: Kernel Hardening 2017-11-22 13:03 GMT+01:00 Pavel Vasilyev <dixlor@gmail.com>: > ... >> + >> +When set to "2", notify about O_CREAT open missing the O_EXCL flag >> +in world or group writable sticky directories. >> + >> +When set to "3", block O_CREAT open missing the O_EXCL flag >> +in world writable sticky directories and notify (but don't block) >> +in group writable sticky directories. >> + >> +When set to "4", block O_CREAT open missing the O_EXCL flag >> +in world writable and group writable sticky directories. > > May be add: > > When set to "X", notify O_CREAT open missing the O_EXCL flag > in world writable sticky directories and notify in group > writable sticky directories. Doesn't the "2" option already do this? Did I misunderstood something? Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-22 8:01 ` [kernel-hardening] " Salvatore Mesoraca @ 2017-11-22 13:22 ` Matthew Wilcox -1 siblings, 0 replies; 60+ messages in thread From: Matthew Wilcox @ 2017-11-22 13:22 UTC (permalink / raw) To: Salvatore Mesoraca Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote: > +An O_CREAT open missing the O_EXCL flag in a sticky directory is, > +often, a bug or a synthom of the fact that the program is not > +using appropriate procedures to access sticky directories. > +This protection allow to detect and possibly block these unsafe > +open invocations, even if the files don't exist yet. > +Though should be noted that, sometimes, it's OK to open a file > +with O_CREAT and without O_EXCL (e.g. shared lock files based > +on flock()), for this reason values above 2 should be set > +with care. > + > +When set to "0" the protection is disabled. > + > +When set to "1", notify about O_CREAT open missing the O_EXCL flag > +in world writable sticky directories. > + > +When set to "2", notify about O_CREAT open missing the O_EXCL flag > +in world or group writable sticky directories. > + > +When set to "3", block O_CREAT open missing the O_EXCL flag > +in world writable sticky directories and notify (but don't block) > +in group writable sticky directories. > + > +When set to "4", block O_CREAT open missing the O_EXCL flag > +in world writable and group writable sticky directories. This seems insufficiently flexible. For example, there is no way for me to specify that I want to block O_CREAT without O_EXCL in world-writable, but not be notified about O_CREAT without O_EXCL in group-writable. And maybe I want to be notified that blocking has happened? Why not make it bits? So: 0 => notify in world 1 => block in world 2 => notify in group 3 => block in group So you'd have the following meaningful values: 0 - permit all (your option 0) 1 - notify world; permit group (your option 1) 2 - block world; permit group 3 - block,notify world; permit group 4 - permit world; notify group (?) 5 - notify world; notify group (your option 2) 6 - block world; notify group (your option 3) 7 - block,notify world; notify group 8 - permit world; block group (?) 9 - notify world; block group (?) 10 - block world; block group (your option 4) 11 - block,notify world; block group 12 - permit world; block, notify group (?) 13 - notify world; block, notify group (?) 14 - block world; block, notify group 15 - block, notify world; block, notify group Some of these don't make a lot of sense (marked with ?), but I don't see the harm in permitting a sysadmin to do something that seems nonsensical to me. ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-22 13:22 ` Matthew Wilcox 0 siblings, 0 replies; 60+ messages in thread From: Matthew Wilcox @ 2017-11-22 13:22 UTC (permalink / raw) To: Salvatore Mesoraca Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote: > +An O_CREAT open missing the O_EXCL flag in a sticky directory is, > +often, a bug or a synthom of the fact that the program is not > +using appropriate procedures to access sticky directories. > +This protection allow to detect and possibly block these unsafe > +open invocations, even if the files don't exist yet. > +Though should be noted that, sometimes, it's OK to open a file > +with O_CREAT and without O_EXCL (e.g. shared lock files based > +on flock()), for this reason values above 2 should be set > +with care. > + > +When set to "0" the protection is disabled. > + > +When set to "1", notify about O_CREAT open missing the O_EXCL flag > +in world writable sticky directories. > + > +When set to "2", notify about O_CREAT open missing the O_EXCL flag > +in world or group writable sticky directories. > + > +When set to "3", block O_CREAT open missing the O_EXCL flag > +in world writable sticky directories and notify (but don't block) > +in group writable sticky directories. > + > +When set to "4", block O_CREAT open missing the O_EXCL flag > +in world writable and group writable sticky directories. This seems insufficiently flexible. For example, there is no way for me to specify that I want to block O_CREAT without O_EXCL in world-writable, but not be notified about O_CREAT without O_EXCL in group-writable. And maybe I want to be notified that blocking has happened? Why not make it bits? So: 0 => notify in world 1 => block in world 2 => notify in group 3 => block in group So you'd have the following meaningful values: 0 - permit all (your option 0) 1 - notify world; permit group (your option 1) 2 - block world; permit group 3 - block,notify world; permit group 4 - permit world; notify group (?) 5 - notify world; notify group (your option 2) 6 - block world; notify group (your option 3) 7 - block,notify world; notify group 8 - permit world; block group (?) 9 - notify world; block group (?) 10 - block world; block group (your option 4) 11 - block,notify world; block group 12 - permit world; block, notify group (?) 13 - notify world; block, notify group (?) 14 - block world; block, notify group 15 - block, notify world; block, notify group Some of these don't make a lot of sense (marked with ?), but I don't see the harm in permitting a sysadmin to do something that seems nonsensical to me. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-22 13:22 ` [kernel-hardening] " Matthew Wilcox (?) @ 2017-11-22 14:38 ` Pavel Vasilyev -1 siblings, 0 replies; 60+ messages in thread From: Pavel Vasilyev @ 2017-11-22 14:38 UTC (permalink / raw) To: kernel-hardening 22.11.2017 16:22, Matthew Wilcox пишет: > > So you'd have the following meaningful values: > > 0 - permit all (your option 0) > 1 - notify world; permit group (your option 1) > 2 - block world; permit group > 3 - block,notify world; permit group > 4 - permit world; notify group (?) > 5 - notify world; notify group (your option 2) > 6 - block world; notify group (your option 3) > 7 - block,notify world; notify group > 8 - permit world; block group (?) > 9 - notify world; block group (?) > 10 - block world; block group (your option 4) > 11 - block,notify world; block group > 12 - permit world; block, notify group (?) > 13 - notify world; block, notify group (?) > 14 - block world; block, notify group > 15 - block, notify world; block, notify group > > Some of these don't make a lot of sense (marked with ?), but I don't see > the harm in permitting a sysadmin to do something that seems nonsensical > to me. > I think that notification in block mode by default. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-22 13:22 ` [kernel-hardening] " Matthew Wilcox @ 2017-11-24 8:29 ` Salvatore Mesoraca -1 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-24 8:29 UTC (permalink / raw) To: Matthew Wilcox Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman 2017-11-22 14:22 GMT+01:00 Matthew Wilcox <willy@infradead.org>: > On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote: >> +An O_CREAT open missing the O_EXCL flag in a sticky directory is, >> +often, a bug or a synthom of the fact that the program is not >> +using appropriate procedures to access sticky directories. >> +This protection allow to detect and possibly block these unsafe >> +open invocations, even if the files don't exist yet. >> +Though should be noted that, sometimes, it's OK to open a file >> +with O_CREAT and without O_EXCL (e.g. shared lock files based >> +on flock()), for this reason values above 2 should be set >> +with care. >> + >> +When set to "0" the protection is disabled. >> + >> +When set to "1", notify about O_CREAT open missing the O_EXCL flag >> +in world writable sticky directories. >> + >> +When set to "2", notify about O_CREAT open missing the O_EXCL flag >> +in world or group writable sticky directories. >> + >> +When set to "3", block O_CREAT open missing the O_EXCL flag >> +in world writable sticky directories and notify (but don't block) >> +in group writable sticky directories. >> + >> +When set to "4", block O_CREAT open missing the O_EXCL flag >> +in world writable and group writable sticky directories. > > This seems insufficiently flexible. For example, there is no way for me > to specify that I want to block O_CREAT without O_EXCL in world-writable, > but not be notified about O_CREAT without O_EXCL in group-writable. I understand your concern, I did it like this because I wanted to keep the interface as simple as possible. But, maybe, more flexibility it's better. > And maybe I want to be notified that blocking has happened? I didn't write it explicitly in the doc, but you will always be notified that blocking has happened. On the other hand you don't have any way to suppress those notification. > Why not make it bits? So: > > 0 => notify in world > 1 => block in world > 2 => notify in group > 3 => block in group > > So you'd have the following meaningful values: > > 0 - permit all (your option 0) > 1 - notify world; permit group (your option 1) > 2 - block world; permit group > 3 - block,notify world; permit group > 4 - permit world; notify group (?) > 5 - notify world; notify group (your option 2) > 6 - block world; notify group (your option 3) > 7 - block,notify world; notify group > 8 - permit world; block group (?) > 9 - notify world; block group (?) > 10 - block world; block group (your option 4) > 11 - block,notify world; block group > 12 - permit world; block, notify group (?) > 13 - notify world; block, notify group (?) > 14 - block world; block, notify group > 15 - block, notify world; block, notify group > > Some of these don't make a lot of sense (marked with ?), but I don't see > the harm in permitting a sysadmin to do something that seems nonsensical > to me. I like your idea of using "bits" this way, even if it will allow sysadmins to set values that don't make too much sense. Thank you very much for your suggestions, Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-24 8:29 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-24 8:29 UTC (permalink / raw) To: Matthew Wilcox Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman 2017-11-22 14:22 GMT+01:00 Matthew Wilcox <willy@infradead.org>: > On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote: >> +An O_CREAT open missing the O_EXCL flag in a sticky directory is, >> +often, a bug or a synthom of the fact that the program is not >> +using appropriate procedures to access sticky directories. >> +This protection allow to detect and possibly block these unsafe >> +open invocations, even if the files don't exist yet. >> +Though should be noted that, sometimes, it's OK to open a file >> +with O_CREAT and without O_EXCL (e.g. shared lock files based >> +on flock()), for this reason values above 2 should be set >> +with care. >> + >> +When set to "0" the protection is disabled. >> + >> +When set to "1", notify about O_CREAT open missing the O_EXCL flag >> +in world writable sticky directories. >> + >> +When set to "2", notify about O_CREAT open missing the O_EXCL flag >> +in world or group writable sticky directories. >> + >> +When set to "3", block O_CREAT open missing the O_EXCL flag >> +in world writable sticky directories and notify (but don't block) >> +in group writable sticky directories. >> + >> +When set to "4", block O_CREAT open missing the O_EXCL flag >> +in world writable and group writable sticky directories. > > This seems insufficiently flexible. For example, there is no way for me > to specify that I want to block O_CREAT without O_EXCL in world-writable, > but not be notified about O_CREAT without O_EXCL in group-writable. I understand your concern, I did it like this because I wanted to keep the interface as simple as possible. But, maybe, more flexibility it's better. > And maybe I want to be notified that blocking has happened? I didn't write it explicitly in the doc, but you will always be notified that blocking has happened. On the other hand you don't have any way to suppress those notification. > Why not make it bits? So: > > 0 => notify in world > 1 => block in world > 2 => notify in group > 3 => block in group > > So you'd have the following meaningful values: > > 0 - permit all (your option 0) > 1 - notify world; permit group (your option 1) > 2 - block world; permit group > 3 - block,notify world; permit group > 4 - permit world; notify group (?) > 5 - notify world; notify group (your option 2) > 6 - block world; notify group (your option 3) > 7 - block,notify world; notify group > 8 - permit world; block group (?) > 9 - notify world; block group (?) > 10 - block world; block group (your option 4) > 11 - block,notify world; block group > 12 - permit world; block, notify group (?) > 13 - notify world; block, notify group (?) > 14 - block world; block, notify group > 15 - block, notify world; block, notify group > > Some of these don't make a lot of sense (marked with ?), but I don't see > the harm in permitting a sysadmin to do something that seems nonsensical > to me. I like your idea of using "bits" this way, even if it will allow sysadmins to set values that don't make too much sense. Thank you very much for your suggestions, Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-22 8:01 ` [kernel-hardening] " Salvatore Mesoraca @ 2017-11-22 16:51 ` Alan Cox -1 siblings, 0 replies; 60+ messages in thread From: Alan Cox @ 2017-11-22 16:51 UTC (permalink / raw) To: Salvatore Mesoraca Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > Disallows O_CREAT open missing the O_EXCL flag, in world or > group writable directories, even if the file doesn't exist yet. > With few exceptions (e.g. shared lock files based on flock()) Enough exceptions to make it a bad idea. Firstly if you care this much *stop* having shared writable directories. We have namespaces, you don't need them. You can give every user their own /tmp etc. The rest of this only make sense on a per application and directory basis because there are valid use cases, and that means it wants to be part of an existing LSM security module where you've got the context required and you can attach it to a specific directory and/or process. Alan ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-22 16:51 ` Alan Cox 0 siblings, 0 replies; 60+ messages in thread From: Alan Cox @ 2017-11-22 16:51 UTC (permalink / raw) To: Salvatore Mesoraca Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > Disallows O_CREAT open missing the O_EXCL flag, in world or > group writable directories, even if the file doesn't exist yet. > With few exceptions (e.g. shared lock files based on flock()) Enough exceptions to make it a bad idea. Firstly if you care this much *stop* having shared writable directories. We have namespaces, you don't need them. You can give every user their own /tmp etc. The rest of this only make sense on a per application and directory basis because there are valid use cases, and that means it wants to be part of an existing LSM security module where you've got the context required and you can attach it to a specific directory and/or process. Alan ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-22 16:51 ` [kernel-hardening] " Alan Cox @ 2017-11-24 8:31 ` Salvatore Mesoraca -1 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-24 8:31 UTC (permalink / raw) To: Alan Cox Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman 2017-11-22 17:51 GMT+01:00 Alan Cox <gnomes@lxorguk.ukuu.org.uk>: > On Wed, 22 Nov 2017 09:01:46 +0100 > Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > >> Disallows O_CREAT open missing the O_EXCL flag, in world or >> group writable directories, even if the file doesn't exist yet. >> With few exceptions (e.g. shared lock files based on flock()) > > Enough exceptions to make it a bad idea. > > Firstly if you care this much *stop* having shared writable directories. > We have namespaces, you don't need them. You can give every user their > own /tmp etc. > > The rest of this only make sense on a per application and directory basis > because there are valid use cases, and that means it wants to be part of > an existing LSM security module where you've got the context required and > you can attach it to a specific directory and/or process. I think that this feature should be intended more as a "debugging" feature than as a "security" one. When the feature implemented in the first patch is enabled, this restriction doesn't improve security at all and it's not supposed to do it. The first patch blocks attacks that exploit some unsafe usage of sticky directories. This patch, instead, doesn't block actual attacks: it detects (and maybe blocks) the bad code that can be exploited for the attacks blocked by #1, even if no one is attacking you in that moment. This looks like a useful feature to me, even if you already use more sophisticated security apparatus like LSMs or namespaces, because it makes easy to find real vulnerabilities in software: the commit message of patch #1 has a short list of some CVEs that this feature can detect. Also, being just a sysctl away, it's within anyone's reach. Probably the "debugging" goal wasn't clear from my previous message, I'm sorry for the misunderstanding. Thank you very much for your time, Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-24 8:31 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-24 8:31 UTC (permalink / raw) To: Alan Cox Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman 2017-11-22 17:51 GMT+01:00 Alan Cox <gnomes@lxorguk.ukuu.org.uk>: > On Wed, 22 Nov 2017 09:01:46 +0100 > Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > >> Disallows O_CREAT open missing the O_EXCL flag, in world or >> group writable directories, even if the file doesn't exist yet. >> With few exceptions (e.g. shared lock files based on flock()) > > Enough exceptions to make it a bad idea. > > Firstly if you care this much *stop* having shared writable directories. > We have namespaces, you don't need them. You can give every user their > own /tmp etc. > > The rest of this only make sense on a per application and directory basis > because there are valid use cases, and that means it wants to be part of > an existing LSM security module where you've got the context required and > you can attach it to a specific directory and/or process. I think that this feature should be intended more as a "debugging" feature than as a "security" one. When the feature implemented in the first patch is enabled, this restriction doesn't improve security at all and it's not supposed to do it. The first patch blocks attacks that exploit some unsafe usage of sticky directories. This patch, instead, doesn't block actual attacks: it detects (and maybe blocks) the bad code that can be exploited for the attacks blocked by #1, even if no one is attacking you in that moment. This looks like a useful feature to me, even if you already use more sophisticated security apparatus like LSMs or namespaces, because it makes easy to find real vulnerabilities in software: the commit message of patch #1 has a short list of some CVEs that this feature can detect. Also, being just a sysctl away, it's within anyone's reach. Probably the "debugging" goal wasn't clear from my previous message, I'm sorry for the misunderstanding. Thank you very much for your time, Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-22 16:51 ` [kernel-hardening] " Alan Cox @ 2017-11-24 10:53 ` David Laight -1 siblings, 0 replies; 60+ messages in thread From: David Laight @ 2017-11-24 10:53 UTC (permalink / raw) To: 'Alan Cox', Salvatore Mesoraca Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman From: Alan Cox > Sent: 22 November 2017 16:52 > > On Wed, 22 Nov 2017 09:01:46 +0100 > Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > > > Disallows O_CREAT open missing the O_EXCL flag, in world or > > group writable directories, even if the file doesn't exist yet. > > With few exceptions (e.g. shared lock files based on flock()) > > Enough exceptions to make it a bad idea. > > Firstly if you care this much *stop* having shared writable directories. > We have namespaces, you don't need them. You can give every user their > own /tmp etc. Looks like a very bad idea to me as well. Doesn't this stop all shell redirects into a shared /tmp ? I'm pretty sure most programs use O_CREAT | O_TRUNC for output files - they'll all stop working. If there are some directories where you need to force O_EXCL you need to make it a property of the directory, not the kernel. David ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-24 10:53 ` David Laight 0 siblings, 0 replies; 60+ messages in thread From: David Laight @ 2017-11-24 10:53 UTC (permalink / raw) To: 'Alan Cox', Salvatore Mesoraca Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman From: Alan Cox > Sent: 22 November 2017 16:52 > > On Wed, 22 Nov 2017 09:01:46 +0100 > Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > > > Disallows O_CREAT open missing the O_EXCL flag, in world or > > group writable directories, even if the file doesn't exist yet. > > With few exceptions (e.g. shared lock files based on flock()) > > Enough exceptions to make it a bad idea. > > Firstly if you care this much *stop* having shared writable directories. > We have namespaces, you don't need them. You can give every user their > own /tmp etc. Looks like a very bad idea to me as well. Doesn't this stop all shell redirects into a shared /tmp ? I'm pretty sure most programs use O_CREAT | O_TRUNC for output files - they'll all stop working. If there are some directories where you need to force O_EXCL you need to make it a property of the directory, not the kernel. David ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-24 10:53 ` [kernel-hardening] " David Laight @ 2017-11-24 11:43 ` Salvatore Mesoraca -1 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-24 11:43 UTC (permalink / raw) To: David Laight Cc: Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>: > From: Alan Cox >> Sent: 22 November 2017 16:52 >> >> On Wed, 22 Nov 2017 09:01:46 +0100 >> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: >> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or >> > group writable directories, even if the file doesn't exist yet. >> > With few exceptions (e.g. shared lock files based on flock()) >> >> Enough exceptions to make it a bad idea. >> >> Firstly if you care this much *stop* having shared writable directories. >> We have namespaces, you don't need them. You can give every user their >> own /tmp etc. > > Looks like a very bad idea to me as well. > > Doesn't this stop all shell redirects into a shared /tmp ? > I'm pretty sure most programs use O_CREAT | O_TRUNC for output > files - they'll all stop working. If some program does such a thing, that's a potential vulnerability. With "protected_hardlinks" you are, in most cases, safe. But, still, that program has a bug and having this feature enabled will help you notice it soon. For that matter, I'm using this patch on my system and I don't have any program behaving like this. > If there are some directories where you need to force O_EXCL you > need to make it a property of the directory, not the kernel. > > David > ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-24 11:43 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-24 11:43 UTC (permalink / raw) To: David Laight Cc: Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>: > From: Alan Cox >> Sent: 22 November 2017 16:52 >> >> On Wed, 22 Nov 2017 09:01:46 +0100 >> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: >> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or >> > group writable directories, even if the file doesn't exist yet. >> > With few exceptions (e.g. shared lock files based on flock()) >> >> Enough exceptions to make it a bad idea. >> >> Firstly if you care this much *stop* having shared writable directories. >> We have namespaces, you don't need them. You can give every user their >> own /tmp etc. > > Looks like a very bad idea to me as well. > > Doesn't this stop all shell redirects into a shared /tmp ? > I'm pretty sure most programs use O_CREAT | O_TRUNC for output > files - they'll all stop working. If some program does such a thing, that's a potential vulnerability. With "protected_hardlinks" you are, in most cases, safe. But, still, that program has a bug and having this feature enabled will help you notice it soon. For that matter, I'm using this patch on my system and I don't have any program behaving like this. > If there are some directories where you need to force O_EXCL you > need to make it a property of the directory, not the kernel. > > David > ^ permalink raw reply [flat|nested] 60+ messages in thread
* RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-24 11:43 ` [kernel-hardening] " Salvatore Mesoraca @ 2017-11-24 11:53 ` David Laight -1 siblings, 0 replies; 60+ messages in thread From: David Laight @ 2017-11-24 11:53 UTC (permalink / raw) To: 'Salvatore Mesoraca' Cc: Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman From: Salvatore Mesoraca [mailto:s.mesoraca16@gmail.com] > Sent: 24 November 2017 11:44 > > 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>: > > From: Alan Cox > >> Sent: 22 November 2017 16:52 > >> > >> On Wed, 22 Nov 2017 09:01:46 +0100 > >> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > >> > >> > Disallows O_CREAT open missing the O_EXCL flag, in world or > >> > group writable directories, even if the file doesn't exist yet. > >> > With few exceptions (e.g. shared lock files based on flock()) > >> > >> Enough exceptions to make it a bad idea. > >> > >> Firstly if you care this much *stop* having shared writable directories. > >> We have namespaces, you don't need them. You can give every user their > >> own /tmp etc. > > > > Looks like a very bad idea to me as well. > > > > Doesn't this stop all shell redirects into a shared /tmp ? > > I'm pretty sure most programs use O_CREAT | O_TRUNC for output > > files - they'll all stop working. > > If some program does such a thing, that's a potential vulnerability. > With "protected_hardlinks" you are, in most cases, safe. > But, still, that program has a bug and having this feature enabled will > help you notice it soon. > For that matter, I'm using this patch on my system and I don't have any > program behaving like this. Hmmm.... a quick strace shows cp and vi doing stat("/tmp/foo") and then open(O_WRONLY|O_TRUNC) if it exists and O_CREATE|O_EXCL if it doesn't. I can't help feeling that is just hiding a race. David ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-24 11:53 ` David Laight 0 siblings, 0 replies; 60+ messages in thread From: David Laight @ 2017-11-24 11:53 UTC (permalink / raw) To: 'Salvatore Mesoraca' Cc: Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman From: Salvatore Mesoraca [mailto:s.mesoraca16@gmail.com] > Sent: 24 November 2017 11:44 > > 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>: > > From: Alan Cox > >> Sent: 22 November 2017 16:52 > >> > >> On Wed, 22 Nov 2017 09:01:46 +0100 > >> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > >> > >> > Disallows O_CREAT open missing the O_EXCL flag, in world or > >> > group writable directories, even if the file doesn't exist yet. > >> > With few exceptions (e.g. shared lock files based on flock()) > >> > >> Enough exceptions to make it a bad idea. > >> > >> Firstly if you care this much *stop* having shared writable directories. > >> We have namespaces, you don't need them. You can give every user their > >> own /tmp etc. > > > > Looks like a very bad idea to me as well. > > > > Doesn't this stop all shell redirects into a shared /tmp ? > > I'm pretty sure most programs use O_CREAT | O_TRUNC for output > > files - they'll all stop working. > > If some program does such a thing, that's a potential vulnerability. > With "protected_hardlinks" you are, in most cases, safe. > But, still, that program has a bug and having this feature enabled will > help you notice it soon. > For that matter, I'm using this patch on my system and I don't have any > program behaving like this. Hmmm.... a quick strace shows cp and vi doing stat("/tmp/foo") and then open(O_WRONLY|O_TRUNC) if it exists and O_CREATE|O_EXCL if it doesn't. I can't help feeling that is just hiding a race. David ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-24 11:53 ` [kernel-hardening] " David Laight @ 2017-11-26 11:29 ` Salvatore Mesoraca -1 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-26 11:29 UTC (permalink / raw) To: David Laight Cc: Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman 2017-11-24 12:53 GMT+01:00 David Laight <David.Laight@aculab.com>: > From: Salvatore Mesoraca [mailto:s.mesoraca16@gmail.com] >> Sent: 24 November 2017 11:44 >> >> 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>: >> > From: Alan Cox >> >> Sent: 22 November 2017 16:52 >> >> >> >> On Wed, 22 Nov 2017 09:01:46 +0100 >> >> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: >> >> >> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or >> >> > group writable directories, even if the file doesn't exist yet. >> >> > With few exceptions (e.g. shared lock files based on flock()) >> >> >> >> Enough exceptions to make it a bad idea. >> >> >> >> Firstly if you care this much *stop* having shared writable directories. >> >> We have namespaces, you don't need them. You can give every user their >> >> own /tmp etc. >> > >> > Looks like a very bad idea to me as well. >> > >> > Doesn't this stop all shell redirects into a shared /tmp ? >> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output >> > files - they'll all stop working. >> >> If some program does such a thing, that's a potential vulnerability. >> With "protected_hardlinks" you are, in most cases, safe. >> But, still, that program has a bug and having this feature enabled will >> help you notice it soon. >> For that matter, I'm using this patch on my system and I don't have any >> program behaving like this. > > Hmmm.... a quick strace shows cp and vi doing stat("/tmp/foo") and then > open(O_WRONLY|O_TRUNC) if it exists and O_CREATE|O_EXCL if it doesn't. > I can't help feeling that is just hiding a race. Yes, unfortunately, doing something like "cp somefile /tmp/" is a bad practice that in most cases will go unnoticed by this feature. Nevertheless there are many other real world vulnerability that it would have been able to detect. Thank you very much for taking the time to do some experiments. Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-26 11:29 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-26 11:29 UTC (permalink / raw) To: David Laight Cc: Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman 2017-11-24 12:53 GMT+01:00 David Laight <David.Laight@aculab.com>: > From: Salvatore Mesoraca [mailto:s.mesoraca16@gmail.com] >> Sent: 24 November 2017 11:44 >> >> 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>: >> > From: Alan Cox >> >> Sent: 22 November 2017 16:52 >> >> >> >> On Wed, 22 Nov 2017 09:01:46 +0100 >> >> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: >> >> >> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or >> >> > group writable directories, even if the file doesn't exist yet. >> >> > With few exceptions (e.g. shared lock files based on flock()) >> >> >> >> Enough exceptions to make it a bad idea. >> >> >> >> Firstly if you care this much *stop* having shared writable directories. >> >> We have namespaces, you don't need them. You can give every user their >> >> own /tmp etc. >> > >> > Looks like a very bad idea to me as well. >> > >> > Doesn't this stop all shell redirects into a shared /tmp ? >> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output >> > files - they'll all stop working. >> >> If some program does such a thing, that's a potential vulnerability. >> With "protected_hardlinks" you are, in most cases, safe. >> But, still, that program has a bug and having this feature enabled will >> help you notice it soon. >> For that matter, I'm using this patch on my system and I don't have any >> program behaving like this. > > Hmmm.... a quick strace shows cp and vi doing stat("/tmp/foo") and then > open(O_WRONLY|O_TRUNC) if it exists and O_CREATE|O_EXCL if it doesn't. > I can't help feeling that is just hiding a race. Yes, unfortunately, doing something like "cp somefile /tmp/" is a bad practice that in most cases will go unnoticed by this feature. Nevertheless there are many other real world vulnerability that it would have been able to detect. Thank you very much for taking the time to do some experiments. Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-24 11:43 ` [kernel-hardening] " Salvatore Mesoraca @ 2017-11-27 0:26 ` Solar Designer -1 siblings, 0 replies; 60+ messages in thread From: Solar Designer @ 2017-11-27 0:26 UTC (permalink / raw) To: Salvatore Mesoraca Cc: David Laight, Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote: > 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>: > > From: Alan Cox > >> Sent: 22 November 2017 16:52 > >> > >> On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > >> > >> > Disallows O_CREAT open missing the O_EXCL flag, in world or > >> > group writable directories, even if the file doesn't exist yet. > >> > With few exceptions (e.g. shared lock files based on flock()) Why would "shared lock files based on flock()" need O_CREAT without O_EXCL? Where specifically are they currently used that way? My guess would be a group-writable mail spool (that would be fcntl() locks on Linux now, but it's the same thing for the purpose of this discussion), but even then I don't see a need for such open flags. If a program does that, we could want to find out and revise it (if O_CREAT|O_EXCL fails, retry without these to open the existing file, then flock() either way). > >> Enough exceptions to make it a bad idea. Maybe, but as Salvatore points out we're considering this for rather limited use - easy opt-in policy enforcement during software development and auditing (years ago, I was using an LKM for this purpose, which caught some issues in Postfix that were promptly patched), and maybe on specific production systems where the sysadmins know what they're doing. > >> Firstly if you care this much *stop* having shared writable directories. > >> We have namespaces, you don't need them. You can give every user their > >> own /tmp etc. This is good advice (and I've been doing similar with pam_mktemp, prior to namespaces), but it's not exactly for the same use case. Also, there are shared writable directories that have to remain shared unless more things are reworked. For example, mail spools and crontab directories, which could be avoided by having the corresponding files in user homes instead (and it's been done), but that's also a related risk (a privileged process accessing a directory writable by a user, even if accessing only for reading, lowering the risk impact). > > Looks like a very bad idea to me as well. > > > > Doesn't this stop all shell redirects into a shared /tmp ? The above sounds like it'd be something bad - but it's actually just right for an opt-in policy like this. A command like "program > /tmp/file" is generally unsafe (unless the file was pre-created safely, see below) and should be avoided. The fact that a lot of documentation gives commands of this kind only emphasizes the need for easy policy enforcement against such misuses of /tmp. Ideally, we'd stop most "shell redirects into a shared /tmp" - all but those into files correctly pre-created with mktemp(1) and the like. Technically, we could achieve similar by allowing O_CREAT without O_EXCL if the file exists _and_ has the same owner as our current fsuid or the owner is root. Most scripts that use mktemp(1) then use that file without switching privileges, so they'll continue working. If there's an exception to that on a system where I'd configure a policy like what's discussed here, I'd rather learn of it and be interested in looking at that unusual script. That script would be taking a risk by making one (pseudo-)user (or root) write into a file in /tmp created (even if initially safely) by another pseudo-user (not root), thus letting the latter pseudo-user attack the former (or root). Ditto for programs using mkstemp(3), but then somehow going through the pathname rather than the fd. That's not ideal, but it's sometimes safe and sometimes makes sense (such as when they need to exec other programs only accepting a pathname), so the same exception and approach applies. > > I'm pretty sure most programs use O_CREAT | O_TRUNC for output > > files - they'll all stop working. That would be great. Unfortunately (in this context), you've identified exceptions, where programs try to be smart but are not necessarily safe. It's beneficial to be able to easily stop at least some misuses of /tmp and the like, even if we can't stop all. > If some program does such a thing, that's a potential vulnerability. Exactly. > With "protected_hardlinks" you are, in most cases, safe. And "protected_symlinks", and not in terms of data spoofing attacks via FIFOs and regular files (the initial focus of this patchset). > But, still, that program has a bug and having this feature enabled will > help you notice it soon. > For that matter, I'm using this patch on my system and I don't have any > program behaving like this. > > > If there are some directories where you need to force O_EXCL you > > need to make it a property of the directory, not the kernel. > > > > David Good idea, but this would need to be checked and enforced by the kernel anyhow, so it's a matter of coarse vs. fine policy. Coarse is much easier to implement and to start using, albeit perhaps mostly not by general-purpose distros, but by a software developer/auditor, an adventurous sysadmin, or a sysadmin team or a developer of specific systems where a simple policy like this is known to be valid (e.g., where there's no expectation of arbitrary software being added, so a simple rule like this can be introduced system-wide). Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-27 0:26 ` Solar Designer 0 siblings, 0 replies; 60+ messages in thread From: Solar Designer @ 2017-11-27 0:26 UTC (permalink / raw) To: Salvatore Mesoraca Cc: David Laight, Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote: > 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>: > > From: Alan Cox > >> Sent: 22 November 2017 16:52 > >> > >> On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > >> > >> > Disallows O_CREAT open missing the O_EXCL flag, in world or > >> > group writable directories, even if the file doesn't exist yet. > >> > With few exceptions (e.g. shared lock files based on flock()) Why would "shared lock files based on flock()" need O_CREAT without O_EXCL? Where specifically are they currently used that way? My guess would be a group-writable mail spool (that would be fcntl() locks on Linux now, but it's the same thing for the purpose of this discussion), but even then I don't see a need for such open flags. If a program does that, we could want to find out and revise it (if O_CREAT|O_EXCL fails, retry without these to open the existing file, then flock() either way). > >> Enough exceptions to make it a bad idea. Maybe, but as Salvatore points out we're considering this for rather limited use - easy opt-in policy enforcement during software development and auditing (years ago, I was using an LKM for this purpose, which caught some issues in Postfix that were promptly patched), and maybe on specific production systems where the sysadmins know what they're doing. > >> Firstly if you care this much *stop* having shared writable directories. > >> We have namespaces, you don't need them. You can give every user their > >> own /tmp etc. This is good advice (and I've been doing similar with pam_mktemp, prior to namespaces), but it's not exactly for the same use case. Also, there are shared writable directories that have to remain shared unless more things are reworked. For example, mail spools and crontab directories, which could be avoided by having the corresponding files in user homes instead (and it's been done), but that's also a related risk (a privileged process accessing a directory writable by a user, even if accessing only for reading, lowering the risk impact). > > Looks like a very bad idea to me as well. > > > > Doesn't this stop all shell redirects into a shared /tmp ? The above sounds like it'd be something bad - but it's actually just right for an opt-in policy like this. A command like "program > /tmp/file" is generally unsafe (unless the file was pre-created safely, see below) and should be avoided. The fact that a lot of documentation gives commands of this kind only emphasizes the need for easy policy enforcement against such misuses of /tmp. Ideally, we'd stop most "shell redirects into a shared /tmp" - all but those into files correctly pre-created with mktemp(1) and the like. Technically, we could achieve similar by allowing O_CREAT without O_EXCL if the file exists _and_ has the same owner as our current fsuid or the owner is root. Most scripts that use mktemp(1) then use that file without switching privileges, so they'll continue working. If there's an exception to that on a system where I'd configure a policy like what's discussed here, I'd rather learn of it and be interested in looking at that unusual script. That script would be taking a risk by making one (pseudo-)user (or root) write into a file in /tmp created (even if initially safely) by another pseudo-user (not root), thus letting the latter pseudo-user attack the former (or root). Ditto for programs using mkstemp(3), but then somehow going through the pathname rather than the fd. That's not ideal, but it's sometimes safe and sometimes makes sense (such as when they need to exec other programs only accepting a pathname), so the same exception and approach applies. > > I'm pretty sure most programs use O_CREAT | O_TRUNC for output > > files - they'll all stop working. That would be great. Unfortunately (in this context), you've identified exceptions, where programs try to be smart but are not necessarily safe. It's beneficial to be able to easily stop at least some misuses of /tmp and the like, even if we can't stop all. > If some program does such a thing, that's a potential vulnerability. Exactly. > With "protected_hardlinks" you are, in most cases, safe. And "protected_symlinks", and not in terms of data spoofing attacks via FIFOs and regular files (the initial focus of this patchset). > But, still, that program has a bug and having this feature enabled will > help you notice it soon. > For that matter, I'm using this patch on my system and I don't have any > program behaving like this. > > > If there are some directories where you need to force O_EXCL you > > need to make it a property of the directory, not the kernel. > > > > David Good idea, but this would need to be checked and enforced by the kernel anyhow, so it's a matter of coarse vs. fine policy. Coarse is much easier to implement and to start using, albeit perhaps mostly not by general-purpose distros, but by a software developer/auditor, an adventurous sysadmin, or a sysadmin team or a developer of specific systems where a simple policy like this is known to be valid (e.g., where there's no expectation of arbitrary software being added, so a simple rule like this can be introduced system-wide). Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-27 0:26 ` [kernel-hardening] " Solar Designer @ 2017-11-30 14:39 ` Salvatore Mesoraca -1 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-30 14:39 UTC (permalink / raw) To: Solar Designer Cc: David Laight, Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman 2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>: > On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote: > > 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>: > > > From: Alan Cox > > >> Sent: 22 November 2017 16:52 > > >> > > >> On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > > >> > > >> > Disallows O_CREAT open missing the O_EXCL flag, in world or > > >> > group writable directories, even if the file doesn't exist yet. > > >> > With few exceptions (e.g. shared lock files based on flock()) > > Why would "shared lock files based on flock()" need O_CREAT without > O_EXCL? Where specifically are they currently used that way? I don't think that they *need* to act like this, but this is how util-linux's flock(1) currently works. And it doesn't seem an unreasonable behavior from their perspective, so maybe other programs do that too. I was citing that case just to make it clear that, if someone gets a warning because of flock(1), they shouldn't be worried about it. That behavior can be certainly avoided, but of course it isn't a security problem per se. > If a program does > that, we could want to find out and revise it (if O_CREAT|O_EXCL fails, > retry without these to open the existing file, then flock() either way). Yes, this would probably be the best thing to do, good idea. Thanks again for your time, Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-30 14:39 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-30 14:39 UTC (permalink / raw) To: Solar Designer Cc: David Laight, Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman 2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>: > On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote: > > 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>: > > > From: Alan Cox > > >> Sent: 22 November 2017 16:52 > > >> > > >> On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote: > > >> > > >> > Disallows O_CREAT open missing the O_EXCL flag, in world or > > >> > group writable directories, even if the file doesn't exist yet. > > >> > With few exceptions (e.g. shared lock files based on flock()) > > Why would "shared lock files based on flock()" need O_CREAT without > O_EXCL? Where specifically are they currently used that way? I don't think that they *need* to act like this, but this is how util-linux's flock(1) currently works. And it doesn't seem an unreasonable behavior from their perspective, so maybe other programs do that too. I was citing that case just to make it clear that, if someone gets a warning because of flock(1), they shouldn't be worried about it. That behavior can be certainly avoided, but of course it isn't a security problem per se. > If a program does > that, we could want to find out and revise it (if O_CREAT|O_EXCL fails, > retry without these to open the existing file, then flock() either way). Yes, this would probably be the best thing to do, good idea. Thanks again for your time, Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-30 14:39 ` [kernel-hardening] " Salvatore Mesoraca (?) @ 2017-11-30 14:57 ` Ian Campbell 2017-11-30 16:30 ` [kernel-hardening] " Solar Designer -1 siblings, 1 reply; 60+ messages in thread From: Ian Campbell @ 2017-11-30 14:57 UTC (permalink / raw) To: Salvatore Mesoraca, Solar Designer Cc: David Laight, Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman On Thu, 2017-11-30 at 15:39 +0100, Salvatore Mesoraca wrote: > 2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>: > > On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote: > > > 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com> > > > : > > > > From: Alan Cox > > > > > Sent: 22 November 2017 16:52 > > > > > > > > > > On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.meso > > > > > raca16@gmail.com> wrote: > > > > > > > > > > > Disallows O_CREAT open missing the O_EXCL flag, in world or > > > > > > group writable directories, even if the file doesn't exist > > > > > > yet. > > > > > > With few exceptions (e.g. shared lock files based on > > > > > > flock()) > > > > Why would "shared lock files based on flock()" need O_CREAT without > > O_EXCL? Where specifically are they currently used that way? > > I don't think that they *need* to act like this, but this is how > util-linux's flock(1) currently works. > And it doesn't seem an unreasonable behavior from their perspective, I thought that too, specifically I reasoned that using O_EXCL would defeat the purpose of the _shared_ flock(), wouldn't it? Ian. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-30 14:57 ` Ian Campbell @ 2017-11-30 16:30 ` Solar Designer 2017-12-05 10:21 ` Salvatore Mesoraca 0 siblings, 1 reply; 60+ messages in thread From: Solar Designer @ 2017-11-30 16:30 UTC (permalink / raw) To: Ian Campbell Cc: Salvatore Mesoraca, David Laight, Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman, H. Peter Anvin, Karel Zak Replying to Salvatore and Ian at once, and CC'ing H. Peter Anvin and Karel Zak for util-linux flock(1). On Thu, Nov 30, 2017 at 02:57:06PM +0000, Ian Campbell wrote: > On Thu, 2017-11-30 at 15:39 +0100, Salvatore Mesoraca wrote: > > 2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>: > > > Why would "shared lock files based on flock()" need O_CREAT without > > > O_EXCL? Where specifically are they currently used that way? > > > > I don't think that they *need* to act like this, but this is how > > util-linux's flock(1) currently works. Oh, but you never referred to flock(1) in there, so I read flock() as implying flock(2). I think util-linux's flock(1) is relatively obscure, and as far as I can tell it isn't documented anywhere whether it may be used on untrusted lock file directories or not. A revision of the flock(1) man page seen on RHEL7 gives really weird examples using /tmp. The current util-linux-2.31/sys-utils/flock.1 is similar. strace'ing those examples, I see flock(1) literally uses the /tmp directory itself (not a file inside) as the lock, which surprisingly appears to work. Checking the util-linux tree, it looks like this was added as a feature. I suppose the good news is this doesn't appear to allow for worse than a DoS against the script using flock(1) (since a different user can also take that lock), unless any of the upper level directories are also untrusted. The man page as seen at https://linux.die.net/man/1/flock currently only lists a more complex example with /var/lock/mylockfile, where flock(1) itself is asked to access it via a pre-opened fd. Permissions on /var/lock vary between distros (e.g., a RHEL6'ish system I just checked has it as 775 root:lock, whereas a RHEL7'ish has it as symlink to ../run/lock, which is 755 root:root). Anyway, these are just examples. The reality is people will use flock(1) in various directories, some of them trusted and some not. If our proposed policy can detect and optionally break some uses in untrusted directories, that's good since such uses are currently unsafe (see below). > > And it doesn't seem an unreasonable behavior from their perspective, > > I thought that too, specifically I reasoned that using O_EXCL would > defeat the purpose of the _shared_ flock(), wouldn't it? No. O_EXCL means the attempt to O_CREAT will fail, at which point the program can retry without both of these flags. (The actual lock is obtained later via flock(2) or fcntl(2) anyway.) In fact, flock(1) already does something similar, as tested using the very first example from the man page on a RHEL7'ish system: $ strace flock /tmp -c cat [...] open("/tmp", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = -1 EISDIR (Is a directory) open("/tmp", O_RDONLY|O_NOCTTY) = 3 flock(3, LOCK_EX) = 0 As you can see, when O_CREAT failed, the program retried without that flag. It could just as easily have tried O_CREAT|O_EXCL, then dropped both at once. Testing with a lock file (rather than lock directory): $ strace flock /tmp/lockfile -c cat [...] open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3 flock(3, LOCK_EX) = 0 This use of flock(1) would be a worse vulnerability, not limited to DoS against the script itself but also allowing for privilege escalation unless (sym)link restrictions are enabled in the kernel. Adding O_EXCL would help (reduce the impact back to DoS against itself), and would require that the retry logic (like what is seen in the lock directory example above) be present. On Thu, Nov 30, 2017 at 03:39:48PM +0100, Salvatore Mesoraca wrote: > so maybe other programs do that too. Maybe, and they would be similarly vulnerable if they do that in untrusted directories, and they would also need to be fixed. A subtle case is when something like this is technically done in a directory writable by someone else, but is not necessarily crossing what the program's author or the sysadmin recognize as a privilege boundary. Maybe they have accepted that this one pseudo-user can attack that other one anyway, such as because under a certain daemon's privsep model the would-be attacking pseudo-user is necessarily more privileged anyway. This is why we shouldn't by default extend this policy to all directories writable by someone else, but instead we consider introducing a sysctl with varying policy levels - initially focusing on sticky directories writable by someone else and only later optionally extending to non-sticky directories writable by someone else. The latter mode would have even greater focus on development and debugging rather than on production use, although I imagine that specific systems could eventually afford it in production as well. > I was citing that case just to make it clear that, if someone gets > a warning because of flock(1), they shouldn't be worried about it. Checking the util-linux-2.31/sys-utils/flock.c, I see that it in fact doesn't retry without O_CREAT in the non-directory case. That would need to be corrected, along with introduction of O_EXCL. > That behavior can be certainly avoided, but of course it isn't a > security problem per se. I think it is a security problem per se, except in the "subtle case" above, and it's great that our proposed policy would catch it. Perhaps flock(1) should be extended as follows: > > If a program does > > that, we could want to find out and revise it (if O_CREAT|O_EXCL fails, > > retry without these to open the existing file, then flock() either way). > > Yes, this would probably be the best thing to do, good idea. util-linux-2.31/sys-utils/flock.c currently has: static int open_file(const char *filename, int *flags) { int fd; int fl = *flags == 0 ? O_RDONLY : *flags; errno = 0; fl |= O_NOCTTY | O_CREAT; fd = open(filename, fl, 0666); /* Linux doesn't like O_CREAT on a directory, even though it * should be a no-op; POSIX doesn't allow O_RDWR or O_WRONLY */ if (fd < 0 && errno == EISDIR) { fl = O_RDONLY | O_NOCTTY; fd = open(filename, fl); } I think this should be revised to: fl |= O_NOCTTY | O_CREAT | O_EXCL; fd = open(filename, fl, 0666); /* Linux doesn't like O_CREAT on a directory, even though it * should be a no-op; POSIX doesn't allow O_RDWR or O_WRONLY */ if (fd < 0) { if (errno == EISDIR) fl = O_RDONLY | O_NOCTTY; else fl &= ~(O_CREAT | O_EXCL); fd = open(filename, fl); } This adds O_EXCL and retry without O_CREAT|O_EXCL for non-directories. [ The pre-umask permissions of 0666 are also a potential concern since many users have a relaxed umask as distro default, but changing these to 0600 would break some otherwise valid uses where a lock file may be shared between different users on purpose. Maybe a future major version of flock(1) could default to 0600 and/or accept a "-m" option to set the lock file's mode in case a new file gets created. mktemp(1) may be viewed as precedent: it uses pre-umask permissions of 0600. Lock files are arguably somewhat similar to temporary files. This is beyond scope of this thread and LKML, though, and should be discussed elsewhere. ] Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-30 16:30 ` [kernel-hardening] " Solar Designer @ 2017-12-05 10:21 ` Salvatore Mesoraca 2017-12-07 21:47 ` Solar Designer 0 siblings, 1 reply; 60+ messages in thread From: Salvatore Mesoraca @ 2017-12-05 10:21 UTC (permalink / raw) To: Solar Designer Cc: Ian Campbell, David Laight, Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman, H. Peter Anvin, Karel Zak 2017-11-30 17:30 GMT+01:00 Solar Designer <solar@openwall.com>: > Replying to Salvatore and Ian at once, and CC'ing H. Peter Anvin and > Karel Zak for util-linux flock(1). > > On Thu, Nov 30, 2017 at 02:57:06PM +0000, Ian Campbell wrote: > > On Thu, 2017-11-30 at 15:39 +0100, Salvatore Mesoraca wrote: > > > 2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>: > > > > Why would "shared lock files based on flock()" need O_CREAT without > > > > O_EXCL? Where specifically are they currently used that way? > > > > > > I don't think that they *need* to act like this, but this is how > > > util-linux's flock(1) currently works. > > Oh, but you never referred to flock(1) in there, so I read flock() as > implying flock(2). That's true, I observed that behavior in flock(1), but I didn't specify it was flock(1) because I thought that there may be some other users of flock(2) that works that way. > I think util-linux's flock(1) is relatively obscure, > and as far as I can tell it isn't documented anywhere whether it may be > used on untrusted lock file directories or not. A revision of the > flock(1) man page seen on RHEL7 gives really weird examples using /tmp. > The current util-linux-2.31/sys-utils/flock.1 is similar. strace'ing > those examples, I see flock(1) literally uses the /tmp directory itself > (not a file inside) as the lock, which surprisingly appears to work. > Checking the util-linux tree, it looks like this was added as a feature. > I suppose the good news is this doesn't appear to allow for worse than a > DoS against the script using flock(1) (since a different user can also > take that lock), unless any of the upper level directories are also > untrusted. The man page as seen at https://linux.die.net/man/1/flock > currently only lists a more complex example with /var/lock/mylockfile, > where flock(1) itself is asked to access it via a pre-opened fd. > Permissions on /var/lock vary between distros (e.g., a RHEL6'ish system > I just checked has it as 775 root:lock, whereas a RHEL7'ish has it as > symlink to ../run/lock, which is 755 root:root). Anyway, these are just > examples. The reality is people will use flock(1) in various directories, > some of them trusted and some not. If our proposed policy can detect and > optionally break some uses in untrusted directories, that's good since > such uses are currently unsafe (see below). > > > > And it doesn't seem an unreasonable behavior from their perspective, > > > > I thought that too, specifically I reasoned that using O_EXCL would > > defeat the purpose of the _shared_ flock(), wouldn't it? > > No. O_EXCL means the attempt to O_CREAT will fail, at which point the > program can retry without both of these flags. (The actual lock is > obtained later via flock(2) or fcntl(2) anyway.) In fact, flock(1) > already does something similar, as tested using the very first example > from the man page on a RHEL7'ish system: > > $ strace flock /tmp -c cat > [...] > open("/tmp", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = -1 EISDIR (Is a directory) > open("/tmp", O_RDONLY|O_NOCTTY) = 3 > flock(3, LOCK_EX) = 0 > > As you can see, when O_CREAT failed, the program retried without that > flag. It could just as easily have tried O_CREAT|O_EXCL, then dropped > both at once. Testing with a lock file (rather than lock directory): > > $ strace flock /tmp/lockfile -c cat > [...] > open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3 > flock(3, LOCK_EX) = 0 > > This use of flock(1) would be a worse vulnerability, not limited to DoS > against the script itself but also allowing for privilege escalation > unless (sym)link restrictions are enabled in the kernel. Adding O_EXCL > would help (reduce the impact back to DoS against itself), and would > require that the retry logic (like what is seen in the lock directory > example above) be present. > > On Thu, Nov 30, 2017 at 03:39:48PM +0100, Salvatore Mesoraca wrote: > > so maybe other programs do that too. > > Maybe, and they would be similarly vulnerable if they do that in > untrusted directories, and they would also need to be fixed. > > A subtle case is when something like this is technically done in a > directory writable by someone else, but is not necessarily crossing > what the program's author or the sysadmin recognize as a privilege > boundary. Maybe they have accepted that this one pseudo-user can > attack that other one anyway, such as because under a certain daemon's > privsep model the would-be attacking pseudo-user is necessarily more > privileged anyway. This is why we shouldn't by default extend this > policy to all directories writable by someone else, but instead we > consider introducing a sysctl with varying policy levels - initially > focusing on sticky directories writable by someone else and only later > optionally extending to non-sticky directories writable by someone else. > The latter mode would have even greater focus on development and > debugging rather than on production use, although I imagine that > specific systems could eventually afford it in production as well. > > > I was citing that case just to make it clear that, if someone gets > > a warning because of flock(1), they shouldn't be worried about it. > > Checking the util-linux-2.31/sys-utils/flock.c, I see that it in fact > doesn't retry without O_CREAT in the non-directory case. That would > need to be corrected, along with introduction of O_EXCL. > > > That behavior can be certainly avoided, but of course it isn't a > > security problem per se. > > I think it is a security problem per se, except in the "subtle case" > above, and it's great that our proposed policy would catch it. I agree on the DoS, though, at first, I didn't consider it a "bug" because there isn't any open mode that can prevent the DoS in this case. If you want to avoid it, you must avoid other-users-writable directories at all. So, It think that, if you are using a sticky directory, it's intended behavior to let someone else "lock" you. But maybe many flock(1) users are not aware of the issue and so, sometimes, it can be unintended. I didn't consider privilege escalation as an issue because I always looked at flock(1) under the assumption that the lockfile is never actually read or modified in any way and so it shouldn't make too much difference if it's an already existing regular file or a symlink etc. Am I missing something? Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-12-05 10:21 ` Salvatore Mesoraca @ 2017-12-07 21:47 ` Solar Designer 2017-12-11 12:08 ` Salvatore Mesoraca 0 siblings, 1 reply; 60+ messages in thread From: Solar Designer @ 2017-12-07 21:47 UTC (permalink / raw) To: Salvatore Mesoraca Cc: Ian Campbell, David Laight, Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman, H. Peter Anvin, Karel Zak On Tue, Dec 05, 2017 at 11:21:00AM +0100, Salvatore Mesoraca wrote: > 2017-11-30 17:30 GMT+01:00 Solar Designer <solar@openwall.com>: > > $ strace flock /tmp/lockfile -c cat > > [...] > > open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3 > > flock(3, LOCK_EX) = 0 > > > > This use of flock(1) would be a worse vulnerability, not limited to DoS > > against the script itself but also allowing for privilege escalation > > unless (sym)link restrictions are enabled in the kernel. Adding O_EXCL > > would help (reduce the impact back to DoS against itself), and would > > require that the retry logic (like what is seen in the lock directory > > example above) be present. > > > That behavior can be certainly avoided, but of course it isn't a > > > security problem per se. > > > > I think it is a security problem per se, except in the "subtle case" > > above, and it's great that our proposed policy would catch it. > > I agree on the DoS, though, at first, I didn't consider it a "bug" because > there isn't any open mode that can prevent the DoS in this case. > If you want to avoid it, you must avoid other-users-writable directories > at all. So, It think that, if you are using a sticky directory, it's > intended behavior to let someone else "lock" you. Right. There's a worse DoS I had overlooked, though: flock(1) can also be made to create and/or lock another file (maybe in another directory). Perhaps adding O_NOFOLLOW (alongside flock(1)'s existing use of O_NOCTTY) would be a good idea, even though uses in a directory writable by someone else are inherently risky anyway. > But maybe many flock(1) users are not aware of the issue and so, sometimes, > it can be unintended. > I didn't consider privilege escalation as an issue because I always > looked at flock(1) under the assumption that the lockfile is never actually > read or modified in any way and so it shouldn't make too much difference if > it's an already existing regular file or a symlink etc. > Am I missing something? You made a good point, but yes: O_CREAT will follow a dangling symlink and there are cases where creating an empty file of an attacker-chosen pathname may allow for privilege escalation. For example, crontab(1) man page on RHEL7 says regarding /etc/cron.allow and /etc/cron.deny: "If neither of these files exists, only the super user is allowed to use cron." In that configuration, simply creating empty /etc/cron.deny grants access to crontab(1) to all users. As user: $ crontab -l You (solar) are not allowed to use this program (crontab) See crontab(1) for more information $ ln -s /etc/cron.deny /tmp/lockfile As root: # sysctl -w fs.protected_symlinks=0 fs.protected_symlinks = 0 # flock /tmp/lockfile -c echo As user: $ crontab -l no crontab for solar There may also be side-effects on open of device files (the best known example is rewinding a tape), and we won't avoid those by retrying without O_CREAT|O_EXCL. O_NOFOLLOW will help against symlinks to device files, but not against hard links (if on same device). The kernel's symlink and hardlink protections help, but in this sub-thread we were discussing detecting userspace software issues without waiting for an attack. Things like this fit David Laight's point well: programs trying to make risky (mis)uses less risky sometimes also avoid being detected by our proposed policy. That's life. Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-12-07 21:47 ` Solar Designer @ 2017-12-11 12:08 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-12-11 12:08 UTC (permalink / raw) To: Solar Designer Cc: Ian Campbell, David Laight, Alan Cox, linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman, H. Peter Anvin, Karel Zak 2017-12-07 22:47 GMT+01:00 Solar Designer <solar@openwall.com>: > On Tue, Dec 05, 2017 at 11:21:00AM +0100, Salvatore Mesoraca wrote: > > 2017-11-30 17:30 GMT+01:00 Solar Designer <solar@openwall.com>: > > > $ strace flock /tmp/lockfile -c cat > > > [...] > > > open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3 > > > flock(3, LOCK_EX) = 0 > > > > > > This use of flock(1) would be a worse vulnerability, not limited to DoS > > > against the script itself but also allowing for privilege escalation > > > unless (sym)link restrictions are enabled in the kernel. Adding O_EXCL > > > would help (reduce the impact back to DoS against itself), and would > > > require that the retry logic (like what is seen in the lock directory > > > example above) be present. > > > > > That behavior can be certainly avoided, but of course it isn't a > > > > security problem per se. > > > > > > I think it is a security problem per se, except in the "subtle case" > > > above, and it's great that our proposed policy would catch it. > > > > I agree on the DoS, though, at first, I didn't consider it a "bug" because > > there isn't any open mode that can prevent the DoS in this case. > > If you want to avoid it, you must avoid other-users-writable directories > > at all. So, It think that, if you are using a sticky directory, it's > > intended behavior to let someone else "lock" you. > > Right. There's a worse DoS I had overlooked, though: flock(1) can also > be made to create and/or lock another file (maybe in another directory). > Perhaps adding O_NOFOLLOW (alongside flock(1)'s existing use of > O_NOCTTY) would be a good idea, even though uses in a directory writable > by someone else are inherently risky anyway. > > > But maybe many flock(1) users are not aware of the issue and so, sometimes, > > it can be unintended. > > I didn't consider privilege escalation as an issue because I always > > looked at flock(1) under the assumption that the lockfile is never actually > > read or modified in any way and so it shouldn't make too much difference if > > it's an already existing regular file or a symlink etc. > > Am I missing something? > > You made a good point, but yes: O_CREAT will follow a dangling symlink > and there are cases where creating an empty file of an attacker-chosen > pathname may allow for privilege escalation. For example, crontab(1) > man page on RHEL7 says regarding /etc/cron.allow and /etc/cron.deny: > > "If neither of these files exists, only the super user is allowed to > use cron." > > In that configuration, simply creating empty /etc/cron.deny grants > access to crontab(1) to all users. As user: > > $ crontab -l > You (solar) are not allowed to use this program (crontab) > See crontab(1) for more information > $ ln -s /etc/cron.deny /tmp/lockfile > > As root: > > # sysctl -w fs.protected_symlinks=0 > fs.protected_symlinks = 0 > # flock /tmp/lockfile -c echo > > As user: > > $ crontab -l > no crontab for solar Ah, right. I didn't think of it. > There may also be side-effects on open of device files (the best known > example is rewinding a tape), and we won't avoid those by retrying > without O_CREAT|O_EXCL. O_NOFOLLOW will help against symlinks to device > files, but not against hard links (if on same device). The kernel's > symlink and hardlink protections help, but in this sub-thread we were > discussing detecting userspace software issues without waiting for an > attack. Things like this fit David Laight's point well: programs trying > to make risky (mis)uses less risky sometimes also avoid being detected > by our proposed policy. That's life. Yes, unfortunately some bad behaviors will go unnoticed. But many others won't, so I thinks this is still a useful feature to have. Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-22 8:01 ` [kernel-hardening] " Salvatore Mesoraca ` (3 preceding siblings ...) (?) @ 2017-11-23 22:57 ` Tobin C. Harding 2017-11-24 8:34 ` Salvatore Mesoraca -1 siblings, 1 reply; 60+ messages in thread From: Tobin C. Harding @ 2017-11-23 22:57 UTC (permalink / raw) To: Salvatore Mesoraca Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote: Same caveat about this being English language comments only as for patch 1/2. Please ignore if this is too trivial. My grammar is a long way from perfect, especially please feel free to ignore my placement of commas, they are often wrong. > Disallows O_CREAT open missing the O_EXCL flag, in world or > group writable directories, even if the file doesn't exist yet. > With few exceptions (e.g. shared lock files based on flock()) > if a program tries to open a file, in a sticky directory, > with the O_CREAT flag and without the O_EXCL, it probably has a bug. > This feature allows to detect and potentially block programs that > act this way, it can be used to find vulnerabilities (like those > prevented by patch #1) and to do policy enforcement. > > Suggested-by: Solar Designer <solar@openwall.com> > Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> > --- > Documentation/sysctl/fs.txt | 30 ++++++++++++++++++++++++ > fs/namei.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ > include/linux/fs.h | 1 + > kernel/sysctl.c | 9 ++++++++ > 4 files changed, 96 insertions(+) > > diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt > index f3cf2cd..7f24b4f 100644 > --- a/Documentation/sysctl/fs.txt > +++ b/Documentation/sysctl/fs.txt > @@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs: > - protected_fifos > - protected_hardlinks > - protected_regular > +- protected_sticky_child_create > - protected_symlinks > - suid_dumpable > - super-max > @@ -238,6 +239,35 @@ When set to "2" it also applies to group writable sticky directories. > > ============================================================== > > +protected_sticky_child_create: > + > +An O_CREAT open missing the O_EXCL flag in a sticky directory is, > +often, a bug or a synthom of the fact that the program is not s/synthom/symptom > +using appropriate procedures to access sticky directories. > +This protection allow to detect and possibly block these unsafe Perhaps This protection allows us to detect, and possibly block, these unsafe > +open invocations, even if the files don't exist yet. > +Though should be noted that, sometimes, it's OK to open a file Perhaps +Although it should be noted, sometimes it's OK to open a file (I looked up 'although' vs 'though' and am not quite sure on this one, it seems to read better with 'although'. Again, apologies if this is overly trivial.) Hope this helps, Tobin. ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-23 22:57 ` Tobin C. Harding @ 2017-11-24 8:34 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-24 8:34 UTC (permalink / raw) To: Tobin C. Harding Cc: linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman 2017-11-23 23:57 GMT+01:00 Tobin C. Harding <me@tobin.cc>: > On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote: > > Same caveat about this being English language comments only as for patch > 1/2. Please ignore if this is too trivial. My grammar is a long way from > perfect, especially please feel free to ignore my placement of commas, > they are often wrong. As I wrote before: any help is always welcome. Thank you, Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-22 8:01 ` [kernel-hardening] " Salvatore Mesoraca @ 2017-11-30 16:53 ` David Laight -1 siblings, 0 replies; 60+ messages in thread From: David Laight @ 2017-11-30 16:53 UTC (permalink / raw) To: 'Salvatore Mesoraca', linux-kernel Cc: Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman From: Salvatore Mesoraca > Sent: 22 November 2017 08:02 > > Disallows O_CREAT open missing the O_EXCL flag, in world or > group writable directories, even if the file doesn't exist yet. > With few exceptions (e.g. shared lock files based on flock()) > if a program tries to open a file, in a sticky directory, > with the O_CREAT flag and without the O_EXCL, it probably has a bug. > This feature allows to detect and potentially block programs that > act this way, it can be used to find vulnerabilities (like those > prevented by patch #1) and to do policy enforcement. (Going back to the original post) I presume the 'vulnerabilities' are related to symlinks being created just before the open? Trouble is this change breaks a lot of general use of /tmp. I always assumed that code that cared would use O_EXCL and everything else wasn't worth subverting. I found code in vi (and elsewhere) that subverted these checks by opening with O_WRONLY if stat() showed the file existed and O_CREAT|O_EXCL if it didn't. I'm pretty sure that traditionally a lot of these opens were done with O_CREAT|O_TRUNC. Implementing that as unlink() followed by a create would stop 'random' (ok all) symlinks being followed. Overall I'm pretty sure this change will break things badly somewhere. David ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-30 16:53 ` David Laight 0 siblings, 0 replies; 60+ messages in thread From: David Laight @ 2017-11-30 16:53 UTC (permalink / raw) To: 'Salvatore Mesoraca', linux-kernel Cc: Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Solar Designer, Eric W. Biederman From: Salvatore Mesoraca > Sent: 22 November 2017 08:02 > > Disallows O_CREAT open missing the O_EXCL flag, in world or > group writable directories, even if the file doesn't exist yet. > With few exceptions (e.g. shared lock files based on flock()) > if a program tries to open a file, in a sticky directory, > with the O_CREAT flag and without the O_EXCL, it probably has a bug. > This feature allows to detect and potentially block programs that > act this way, it can be used to find vulnerabilities (like those > prevented by patch #1) and to do policy enforcement. (Going back to the original post) I presume the 'vulnerabilities' are related to symlinks being created just before the open? Trouble is this change breaks a lot of general use of /tmp. I always assumed that code that cared would use O_EXCL and everything else wasn't worth subverting. I found code in vi (and elsewhere) that subverted these checks by opening with O_WRONLY if stat() showed the file existed and O_CREAT|O_EXCL if it didn't. I'm pretty sure that traditionally a lot of these opens were done with O_CREAT|O_TRUNC. Implementing that as unlink() followed by a create would stop 'random' (ok all) symlinks being followed. Overall I'm pretty sure this change will break things badly somewhere. David ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-30 16:53 ` [kernel-hardening] " David Laight @ 2017-11-30 17:51 ` Solar Designer -1 siblings, 0 replies; 60+ messages in thread From: Solar Designer @ 2017-11-30 17:51 UTC (permalink / raw) To: David Laight Cc: 'Salvatore Mesoraca', linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman On Thu, Nov 30, 2017 at 04:53:06PM +0000, David Laight wrote: > From: Salvatore Mesoraca > > if a program tries to open a file, in a sticky directory, > > with the O_CREAT flag and without the O_EXCL, it probably has a bug. > > This feature allows to detect and potentially block programs that > > act this way, it can be used to find vulnerabilities (like those > > prevented by patch #1) and to do policy enforcement. > I presume the 'vulnerabilities' are related to symlinks being created > just before the open? That's one way to exploit them. We already have a sysctl to try and block that specific attack, and we're considering adding more, for other attacks. But we'd also like an easy way to learn of the vulnerabilities without anyone trying to exploit them yet, hence this patch. > Trouble is this change breaks a lot of general use of /tmp. That's general misuse of /tmp. Things like "command > /tmp/file" without having pre-created the file with O_EXCL e.g. by mktemp(1). > I always assumed that code that cared would use O_EXCL and > everything else wasn't worth subverting. Opinions will vary on whether it's worth subverting or not, and someone may reasonably want to configure this differently on different systems. > I found code in vi (and elsewhere) that subverted these checks > by opening with O_WRONLY if stat() showed the file existed and > O_CREAT|O_EXCL if it didn't. Yes, such misuses of /tmp and the like by use of those programs - where the potential consequences would often be less severe (if your example above is correct, it sounds like it'd be data spoofing but not overwriting another file over a link) - could remain unnoticed. > I'm pretty sure that traditionally a lot of these opens were done > with O_CREAT|O_TRUNC. > Implementing that as unlink() followed by a create would stop > 'random' (ok all) symlinks being followed. That's racy. We have O_EXCL, and it must be used along with O_CREAT when the directory is untrusted. (If it were only about symlinks, we also already have O_NOFOLLOW.) > Overall I'm pretty sure this change will break things badly somewhere. Sure. We want it to visibly break what was subtly broken, so that the breakage can be seen and corrected without an attempted attack. Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-11-30 17:51 ` Solar Designer 0 siblings, 0 replies; 60+ messages in thread From: Solar Designer @ 2017-11-30 17:51 UTC (permalink / raw) To: David Laight Cc: 'Salvatore Mesoraca', linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman On Thu, Nov 30, 2017 at 04:53:06PM +0000, David Laight wrote: > From: Salvatore Mesoraca > > if a program tries to open a file, in a sticky directory, > > with the O_CREAT flag and without the O_EXCL, it probably has a bug. > > This feature allows to detect and potentially block programs that > > act this way, it can be used to find vulnerabilities (like those > > prevented by patch #1) and to do policy enforcement. > I presume the 'vulnerabilities' are related to symlinks being created > just before the open? That's one way to exploit them. We already have a sysctl to try and block that specific attack, and we're considering adding more, for other attacks. But we'd also like an easy way to learn of the vulnerabilities without anyone trying to exploit them yet, hence this patch. > Trouble is this change breaks a lot of general use of /tmp. That's general misuse of /tmp. Things like "command > /tmp/file" without having pre-created the file with O_EXCL e.g. by mktemp(1). > I always assumed that code that cared would use O_EXCL and > everything else wasn't worth subverting. Opinions will vary on whether it's worth subverting or not, and someone may reasonably want to configure this differently on different systems. > I found code in vi (and elsewhere) that subverted these checks > by opening with O_WRONLY if stat() showed the file existed and > O_CREAT|O_EXCL if it didn't. Yes, such misuses of /tmp and the like by use of those programs - where the potential consequences would often be less severe (if your example above is correct, it sounds like it'd be data spoofing but not overwriting another file over a link) - could remain unnoticed. > I'm pretty sure that traditionally a lot of these opens were done > with O_CREAT|O_TRUNC. > Implementing that as unlink() followed by a create would stop > 'random' (ok all) symlinks being followed. That's racy. We have O_EXCL, and it must be used along with O_CREAT when the directory is untrusted. (If it were only about symlinks, we also already have O_NOFOLLOW.) > Overall I'm pretty sure this change will break things badly somewhere. Sure. We want it to visibly break what was subtly broken, so that the breakage can be seen and corrected without an attempted attack. Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-11-30 17:51 ` [kernel-hardening] " Solar Designer @ 2017-12-01 9:46 ` David Laight -1 siblings, 0 replies; 60+ messages in thread From: David Laight @ 2017-12-01 9:46 UTC (permalink / raw) To: 'Solar Designer' Cc: 'Salvatore Mesoraca', linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman From: Solar Designer > Sent: 30 November 2017 17:52 > > On Thu, Nov 30, 2017 at 04:53:06PM +0000, David Laight wrote: > > From: Salvatore Mesoraca > > > if a program tries to open a file, in a sticky directory, > > > with the O_CREAT flag and without the O_EXCL, it probably has a bug. > > > This feature allows to detect and potentially block programs that > > > act this way, it can be used to find vulnerabilities (like those > > > prevented by patch #1) and to do policy enforcement. > > > I presume the 'vulnerabilities' are related to symlinks being created > > just before the open? > > That's one way to exploit them. We already have a sysctl to try and > block that specific attack, and we're considering adding more, for other > attacks. But we'd also like an easy way to learn of the vulnerabilities > without anyone trying to exploit them yet, hence this patch. > > > Trouble is this change breaks a lot of general use of /tmp. > > That's general misuse of /tmp. Things like "command > /tmp/file" > without having pre-created the file with O_EXCL e.g. by mktemp(1). I'm sorry, I've been using Unix for over 30 years. /tmp is a place that temporary files were created - nothing special. Traditionally it was emptied on every boot. There was never anything that required files be created in any specific way. > > I always assumed that code that cared would use O_EXCL and > > everything else wasn't worth subverting. > > Opinions will vary on whether it's worth subverting or not, and someone > may reasonably want to configure this differently on different systems. > > > I found code in vi (and elsewhere) that subverted these checks > > by opening with O_WRONLY if stat() showed the file existed and > > O_CREAT|O_EXCL if it didn't. > > Yes, such misuses of /tmp and the like by use of those programs - where > the potential consequences would often be less severe (if your example > above is correct, it sounds like it'd be data spoofing but not > overwriting another file over a link) - could remain unnoticed. It seemed to me that code had been added to avoid issues caused by this policing of opens. But the code added is itself racy - so makes the whole thing pointless. > > I'm pretty sure that traditionally a lot of these opens were done > > with O_CREAT|O_TRUNC. > > Implementing that as unlink() followed by a create would stop > > 'random' (ok all) symlinks being followed. > > That's racy. We have O_EXCL, and it must be used along with O_CREAT > when the directory is untrusted. (If it were only about symlinks, we > also already have O_NOFOLLOW.) Not racy if done in the kernel itself. > > Overall I'm pretty sure this change will break things badly somewhere. > > Sure. We want it to visibly break what was subtly broken, so that the > breakage can be seen and corrected without an attempted attack. Maybe, but it will break some user system when they do a kernel update. It won't necessarily break a developer's system first. And, AFAICT, there is already code that subverts any tests by doing a check for the file existing and then selecting between two different types on open. David ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] RE: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-12-01 9:46 ` David Laight 0 siblings, 0 replies; 60+ messages in thread From: David Laight @ 2017-12-01 9:46 UTC (permalink / raw) To: 'Solar Designer' Cc: 'Salvatore Mesoraca', linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman From: Solar Designer > Sent: 30 November 2017 17:52 > > On Thu, Nov 30, 2017 at 04:53:06PM +0000, David Laight wrote: > > From: Salvatore Mesoraca > > > if a program tries to open a file, in a sticky directory, > > > with the O_CREAT flag and without the O_EXCL, it probably has a bug. > > > This feature allows to detect and potentially block programs that > > > act this way, it can be used to find vulnerabilities (like those > > > prevented by patch #1) and to do policy enforcement. > > > I presume the 'vulnerabilities' are related to symlinks being created > > just before the open? > > That's one way to exploit them. We already have a sysctl to try and > block that specific attack, and we're considering adding more, for other > attacks. But we'd also like an easy way to learn of the vulnerabilities > without anyone trying to exploit them yet, hence this patch. > > > Trouble is this change breaks a lot of general use of /tmp. > > That's general misuse of /tmp. Things like "command > /tmp/file" > without having pre-created the file with O_EXCL e.g. by mktemp(1). I'm sorry, I've been using Unix for over 30 years. /tmp is a place that temporary files were created - nothing special. Traditionally it was emptied on every boot. There was never anything that required files be created in any specific way. > > I always assumed that code that cared would use O_EXCL and > > everything else wasn't worth subverting. > > Opinions will vary on whether it's worth subverting or not, and someone > may reasonably want to configure this differently on different systems. > > > I found code in vi (and elsewhere) that subverted these checks > > by opening with O_WRONLY if stat() showed the file existed and > > O_CREAT|O_EXCL if it didn't. > > Yes, such misuses of /tmp and the like by use of those programs - where > the potential consequences would often be less severe (if your example > above is correct, it sounds like it'd be data spoofing but not > overwriting another file over a link) - could remain unnoticed. It seemed to me that code had been added to avoid issues caused by this policing of opens. But the code added is itself racy - so makes the whole thing pointless. > > I'm pretty sure that traditionally a lot of these opens were done > > with O_CREAT|O_TRUNC. > > Implementing that as unlink() followed by a create would stop > > 'random' (ok all) symlinks being followed. > > That's racy. We have O_EXCL, and it must be used along with O_CREAT > when the directory is untrusted. (If it were only about symlinks, we > also already have O_NOFOLLOW.) Not racy if done in the kernel itself. > > Overall I'm pretty sure this change will break things badly somewhere. > > Sure. We want it to visibly break what was subtly broken, so that the > breakage can be seen and corrected without an attempted attack. Maybe, but it will break some user system when they do a kernel update. It won't necessarily break a developer's system first. And, AFAICT, there is already code that subverts any tests by doing a check for the file existing and then selecting between two different types on open. David ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories 2017-12-01 9:46 ` [kernel-hardening] " David Laight @ 2017-12-01 15:52 ` Alan Cox -1 siblings, 0 replies; 60+ messages in thread From: Alan Cox @ 2017-12-01 15:52 UTC (permalink / raw) To: David Laight Cc: 'Solar Designer', 'Salvatore Mesoraca', linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman > > That's general misuse of /tmp. Things like "command > /tmp/file" > > without having pre-created the file with O_EXCL e.g. by mktemp(1). > > I'm sorry, I've been using Unix for over 30 years. > /tmp is a place that temporary files were created - nothing special. > Traditionally it was emptied on every boot. > There was never anything that required files be created in any > specific way. And in 1978 you had to boot single user and use nckeck and icheck to fix the filesystem up by hand, you had no networking, no systemd, no sysvinit, no ANSI C. no X11 ... (shall I go on...) There are reasons it all changed. The origin of /tmp is a compromise of security and disk performance made in the 1970s about an OS that was quite different, running on a machine with typically 256K of RAM, no RAM disks, a single very expensive fixed head drive and a larger moving head one. The existence of /tmp in that form today is a bizarre historic quirk. Fortunately if you want a perfectly safe /tmp/ use namespaces and every user can have their own private /tmp. Alan ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 2/2] Protected O_CREAT open in sticky directories @ 2017-12-01 15:52 ` Alan Cox 0 siblings, 0 replies; 60+ messages in thread From: Alan Cox @ 2017-12-01 15:52 UTC (permalink / raw) To: David Laight Cc: 'Solar Designer', 'Salvatore Mesoraca', linux-kernel, Kernel Hardening, linux-fsdevel, Alexander Viro, Jann Horn, Kees Cook, Eric W. Biederman > > That's general misuse of /tmp. Things like "command > /tmp/file" > > without having pre-created the file with O_EXCL e.g. by mktemp(1). > > I'm sorry, I've been using Unix for over 30 years. > /tmp is a place that temporary files were created - nothing special. > Traditionally it was emptied on every boot. > There was never anything that required files be created in any > specific way. And in 1978 you had to boot single user and use nckeck and icheck to fix the filesystem up by hand, you had no networking, no systemd, no sysvinit, no ANSI C. no X11 ... (shall I go on...) There are reasons it all changed. The origin of /tmp is a compromise of security and disk performance made in the 1970s about an OS that was quite different, running on a machine with typically 256K of RAM, no RAM disks, a single very expensive fixed head drive and a larger moving head one. The existence of /tmp in that form today is a bizarre historic quirk. Fortunately if you want a perfectly safe /tmp/ use namespaces and every user can have their own private /tmp. Alan ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-11-22 8:01 ` [kernel-hardening] " Salvatore Mesoraca ` (2 preceding siblings ...) (?) @ 2017-11-27 1:14 ` Solar Designer 2017-11-27 13:18 ` Solar Designer 2017-11-30 14:37 ` Salvatore Mesoraca -1 siblings, 2 replies; 60+ messages in thread From: Solar Designer @ 2017-11-27 1:14 UTC (permalink / raw) To: Salvatore Mesoraca; +Cc: Kernel Hardening Replying to kernel-hardening only, so we can consider the below thoughts in a smaller group before bothering LKML: On Wed, Nov 22, 2017 at 09:01:44AM +0100, Salvatore Mesoraca wrote: > This patch-set introduces two separate features aimed at restricting > dangerous open in world or group writable sticky directories. > The purpose is to prevent exploitable bugs in user-space programs > that don't access sticky directories in the proper way. > The first patch prevents the O_CREAT open of FIFOs and regular files > in world or group writable sticky directories, if they already exists > and are owned by someone else. > The second patch prevents O_CREAT open in world or group writable > sticky when the O_EXCL flag is not set, even if the file doesn't > exist yet. When I suggested the O_CREAT-without-O_EXCL checks, I didn't mean you'd try to introduce them at the same time with the restrictions on FIFOs and regular files. I think that bundling these together might be a recipe for never getting any of them in, or for getting in the latter with unnecessarily limited scope. I think we need an optional policy against O_CREAT-without-O_EXCL not only for sticky directories, but also for other directories writable by other than the current fsuid - there may be several levels or individual flags here. So maybe unbundle these or at least avoid forever limiting them to sticky directories (don't include "sticky" in the sysctl name). Meanwhile, here's a scenario where I think the FIFO restrictions are most obviously still worthwhile: attacks on system bootup scripts, where the attacker would plant a FIFO on a persistent /tmp or the like and the system would access it on next bootup. The attacker's exploit program would not yet be running during the bootup, so it wouldn't be able to even try winning a race, but a FIFO would be able to keep the vulnerable service (or its startup script, etc.) stuck until the attacker has access again. On the other hand, /tmp is commonly on tmpfs these days, whereas some other shared writable directories (not necessarily world writable and not necessarily sticky) are persistent (mail and cron job spools, preformatted man page cache, etc.) So maybe this, too, is a reason to eventually extend these protections beyond sticky directories (of course, only as an option). We shouldn't using naming that would be against such extension, or/and we shouldn't think of the O_CREAT-without-O_EXCL checks as permanently only for extra detection and reporting - they may also prevent more attacks if we eventually extend them to non-sticky, but not (yet) similarly extent the file type specific protections. Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-11-27 1:14 ` [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous " Solar Designer @ 2017-11-27 13:18 ` Solar Designer 2017-11-30 14:38 ` Salvatore Mesoraca 2017-11-30 14:37 ` Salvatore Mesoraca 1 sibling, 1 reply; 60+ messages in thread From: Solar Designer @ 2017-11-27 13:18 UTC (permalink / raw) To: Salvatore Mesoraca; +Cc: Kernel Hardening Salvatore, On Mon, Nov 27, 2017 at 02:14:41AM +0100, Solar Designer wrote: > When I suggested the O_CREAT-without-O_EXCL checks, I didn't mean you'd > try to introduce them at the same time with the restrictions on FIFOs > and regular files. I re-read what I wrote in that discussion in September, and I can see how it was confusing: first I suggested those checks as possible extra settings in the restrictions on FIFOs and regular files, then agreed with you that we need a separate sysctl in a separate patch - but we never discussed whether that separate patch should be part of the same patch series or not. Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-11-27 13:18 ` Solar Designer @ 2017-11-30 14:38 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-30 14:38 UTC (permalink / raw) To: Solar Designer; +Cc: Kernel Hardening 2017-11-27 14:18 GMT+01:00 Solar Designer <solar@openwall.com>: > Salvatore, > > On Mon, Nov 27, 2017 at 02:14:41AM +0100, Solar Designer wrote: >> When I suggested the O_CREAT-without-O_EXCL checks, I didn't mean you'd >> try to introduce them at the same time with the restrictions on FIFOs >> and regular files. > > I re-read what I wrote in that discussion in September, and I can see > how it was confusing: first I suggested those checks as possible extra > settings in the restrictions on FIFOs and regular files, then agreed > with you that we need a separate sysctl in a separate patch - but we > never discussed whether that separate patch should be part of the same > patch series or not. No problem, misunderstandings happen. I really appreciate your help with this patch! Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-11-27 1:14 ` [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous " Solar Designer 2017-11-27 13:18 ` Solar Designer @ 2017-11-30 14:37 ` Salvatore Mesoraca 2017-11-30 19:05 ` Solar Designer 1 sibling, 1 reply; 60+ messages in thread From: Salvatore Mesoraca @ 2017-11-30 14:37 UTC (permalink / raw) To: Solar Designer; +Cc: Kernel Hardening 2017-11-27 2:14 GMT+01:00 Solar Designer <solar@openwall.com>: > Replying to kernel-hardening only, so we can consider the below thoughts > in a smaller group before bothering LKML: > > On Wed, Nov 22, 2017 at 09:01:44AM +0100, Salvatore Mesoraca wrote: > > This patch-set introduces two separate features aimed at restricting > > dangerous open in world or group writable sticky directories. > > The purpose is to prevent exploitable bugs in user-space programs > > that don't access sticky directories in the proper way. > > The first patch prevents the O_CREAT open of FIFOs and regular files > > in world or group writable sticky directories, if they already exists > > and are owned by someone else. > > The second patch prevents O_CREAT open in world or group writable > > sticky when the O_EXCL flag is not set, even if the file doesn't > > exist yet. > > When I suggested the O_CREAT-without-O_EXCL checks, I didn't mean you'd > try to introduce them at the same time with the restrictions on FIFOs > and regular files. I think that bundling these together might be a > recipe for never getting any of them in, or for getting in the latter > with unnecessarily limited scope. Yes, I think you are right. I'll split them. > I think we need an optional policy > against O_CREAT-without-O_EXCL not only for sticky directories, but also > for other directories writable by other than the current fsuid - there > may be several levels or individual flags here. > > So maybe unbundle these or at least avoid forever limiting them to > sticky directories (don't include "sticky" in the sysctl name). Actually I never liked that sysctl name :) I'll try to come up with something less ugly and that doesn't include the "sticky" word. If I implement something like what Matthew proposed[1] it will be easy to extend scope and functionalities of this feature without complicating too much the interface. > Meanwhile, here's a scenario where I think the FIFO restrictions are > most obviously still worthwhile: attacks on system bootup scripts, where > the attacker would plant a FIFO on a persistent /tmp or the like and the > system would access it on next bootup. The attacker's exploit program > would not yet be running during the bootup, so it wouldn't be able to > even try winning a race, but a FIFO would be able to keep the vulnerable > service (or its startup script, etc.) stuck until the attacker has > access again. On the other hand, /tmp is commonly on tmpfs these days, > whereas some other shared writable directories (not necessarily world > writable and not necessarily sticky) are persistent (mail and cron job > spools, preformatted man page cache, etc.) So maybe this, too, is a > reason to eventually extend these protections beyond sticky directories > (of course, only as an option). We shouldn't using naming that would be > against such extension, or/and we shouldn't think of the > O_CREAT-without-O_EXCL checks as permanently only for extra detection > and reporting - they may also prevent more attacks if we eventually > extend them to non-sticky, but not (yet) similarly extent the file type > specific protections. So, are you suggesting that I should extend "O_CREAT-without-O_EXCL" and "FIFOs restrictions" to work (optionally) on non-sticky directories too, while leaving untouched (for the moment) "normal files restrictions"? Thank you very much for you suggestions, Salvatore [1] https://lkml.org/lkml/2017/11/22/319 ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-11-30 14:37 ` Salvatore Mesoraca @ 2017-11-30 19:05 ` Solar Designer 2017-11-30 19:14 ` Solar Designer 2017-12-05 10:13 ` Salvatore Mesoraca 0 siblings, 2 replies; 60+ messages in thread From: Solar Designer @ 2017-11-30 19:05 UTC (permalink / raw) To: Salvatore Mesoraca; +Cc: Kernel Hardening On Thu, Nov 30, 2017 at 03:37:29PM +0100, Salvatore Mesoraca wrote: > 2017-11-27 2:14 GMT+01:00 Solar Designer <solar@openwall.com>: > > When I suggested the O_CREAT-without-O_EXCL checks, I didn't mean you'd > > try to introduce them at the same time with the restrictions on FIFOs > > and regular files. I think that bundling these together might be a > > recipe for never getting any of them in, or for getting in the latter > > with unnecessarily limited scope. > > Yes, I think you are right. I'll split them. Thanks. > > I think we need an optional policy > > against O_CREAT-without-O_EXCL not only for sticky directories, but also > > for other directories writable by other than the current fsuid - there > > may be several levels or individual flags here. > > > > So maybe unbundle these or at least avoid forever limiting them to > > sticky directories (don't include "sticky" in the sysctl name). > > Actually I never liked that sysctl name :) I'll try to come up with something > less ugly and that doesn't include the "sticky" word. So the current name is "protected_sticky_child_create" (I couldn't even recall it, had to look it up for this reply). This unnecessarily bundles this potentially more general policy stuff with the existing "protections" against specific attacks, unnecessarily limits scope to "sticky" and "create", and talks about some "child". How about we use something totally different, focusing on "policy"? It could be simply "policy" (we're already in "fs"), or if that won't fly then how about "security_policy" or "dac_policy"? We will be imposing extra restrictions on top of usual Unix discretionary access control permission bits while not going all the way to mandatory access control (not tying objects to subjects). So in a sense we'll have an extension of Unix DAC. BTW, this and all of fs.protected* should be configurable per-container. > If I implement something like what Matthew proposed[1] it will be easy to > extend scope and functionalities of this feature without complicating too much > the interface. > [1] https://lkml.org/lkml/2017/11/22/319 Right. I tried thinking of a way to specify all reasonable combinations without the likely unreasonable ones, but couldn't come up with anything elegant. So I'm fine with Matthew's proposal as-is. A new thought: a directory that has someone else as its owner is for our purposes effectively the same as a group-writable directory. So maybe whatever we'll implement for group-writable directories should also be done for directories that have other than the current fsuid as their owner. It's currently very uncommon to have directories with the sticky bit set that are not at least group-writable, so this will rarely make a difference (and when it does, that's just right), but also it provides a way to explicitly include any directory under this monitoring (if the group-writable protection is on) - e.g., if a sysadmin wants this monitoring for users' home directories, they can change permissions for those from e.g. 700 to 1700. This could be handy for development and auditing of software, even though in production it could be easily circumvented by the directory owner (who can remove the sticky bit, which we should document to avoid providing a false sense of security) and it won't automatically apply to subdirectories. It'd also cover part of what we intend to achieve later by possibly extending the feature to non-sticky directories, where we might also want to treat different owner the same as group-writable (without the circumvention). > > Meanwhile, here's a scenario where I think the FIFO restrictions are > > most obviously still worthwhile: attacks on system bootup scripts, where > > the attacker would plant a FIFO on a persistent /tmp or the like and the > > system would access it on next bootup. The attacker's exploit program > > would not yet be running during the bootup, so it wouldn't be able to > > even try winning a race, but a FIFO would be able to keep the vulnerable > > service (or its startup script, etc.) stuck until the attacker has > > access again. On the other hand, /tmp is commonly on tmpfs these days, > > whereas some other shared writable directories (not necessarily world > > writable and not necessarily sticky) are persistent (mail and cron job > > spools, preformatted man page cache, etc.) So maybe this, too, is a > > reason to eventually extend these protections beyond sticky directories > > (of course, only as an option). We shouldn't using naming that would be > > against such extension, or/and we shouldn't think of the > > O_CREAT-without-O_EXCL checks as permanently only for extra detection > > and reporting - they may also prevent more attacks if we eventually > > extend them to non-sticky, but not (yet) similarly extent the file type > > specific protections. > > So, are you suggesting that I should extend "O_CREAT-without-O_EXCL" > and "FIFOs restrictions" to work (optionally) on non-sticky directories too, > while leaving untouched (for the moment) "normal files restrictions"? No, I think all of these and the existing symlink restrictions should potentially be extended "to work (optionally) on non-sticky directories too", but perhaps with separate patches later. An even further extension may be to cover non-O_CREAT: writing or/and reading an existing file in an untrusted directory is also potentially unsafe. Unfortunately, we can't reliably know whether the program possibly takes precautions by using lstat() and fstat() and comparing st_dev/st_ino, so we won't be able to distinguish likely unsafe and likely mostly safe accesses, but we'll be able to flag all of them for manual analysis on a developer's or an auditor's system. A reasonable strict policy one might want to follow is to have all accesses done as the right fsuid, without needing those unreliable st_dev/st_ino checks. (They're unreliable because of potential side-effects on open() and inode reuse.) For example, we chose this strict policy in Openwall's "tcb suite" and shadow suite patches: http://www.openwall.com/tcb/ Of course, then there's the question on whether something exotic(?) like this should be in the kernel. To me, it's like a "gcc -Wall" for filesystem accesses. Sure it can "falsely" detect many technically valid uses, but it's also helpful to improve our filesystem access safety. Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-11-30 19:05 ` Solar Designer @ 2017-11-30 19:14 ` Solar Designer 2017-12-05 10:13 ` Salvatore Mesoraca 1 sibling, 0 replies; 60+ messages in thread From: Solar Designer @ 2017-11-30 19:14 UTC (permalink / raw) To: Salvatore Mesoraca; +Cc: Kernel Hardening On Thu, Nov 30, 2017 at 08:05:39PM +0100, Solar Designer wrote: > A new thought: a directory that has someone else as its owner is for our > purposes effectively the same as a group-writable directory. So maybe > whatever we'll implement for group-writable directories should also be > done for directories that have other than the current fsuid as their > owner. It's currently very uncommon to have directories with the sticky > bit set that are not at least group-writable, so this will rarely make a > difference (and when it does, that's just right), but also it provides a > way to explicitly include any directory under this monitoring (if the > group-writable protection is on) - e.g., if a sysadmin wants this > monitoring for users' home directories, they can change permissions for > those from e.g. 700 to 1700. This could be handy for development and > auditing of software, even though in production it could be easily > circumvented by the directory owner (who can remove the sticky bit, > which we should document to avoid providing a false sense of security) > and it won't automatically apply to subdirectories. It'd also cover > part of what we intend to achieve later by possibly extending the > feature to non-sticky directories, where we might also want to treat > different owner the same as group-writable (without the circumvention). Of course, where I wrote "other than the current fsuid", it should be "a non-root user other than the current fsuid". We trust root. Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-11-30 19:05 ` Solar Designer 2017-11-30 19:14 ` Solar Designer @ 2017-12-05 10:13 ` Salvatore Mesoraca 2017-12-07 22:23 ` Solar Designer 1 sibling, 1 reply; 60+ messages in thread From: Salvatore Mesoraca @ 2017-12-05 10:13 UTC (permalink / raw) To: Solar Designer; +Cc: Kernel Hardening 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>: > On Thu, Nov 30, 2017 at 03:37:29PM +0100, Salvatore Mesoraca wrote: > > 2017-11-27 2:14 GMT+01:00 Solar Designer <solar@openwall.com>: > > > When I suggested the O_CREAT-without-O_EXCL checks, I didn't mean you'd > > > try to introduce them at the same time with the restrictions on FIFOs > > > and regular files. I think that bundling these together might be a > > > recipe for never getting any of them in, or for getting in the latter > > > with unnecessarily limited scope. > > > > Yes, I think you are right. I'll split them. > > Thanks. > > > > I think we need an optional policy > > > against O_CREAT-without-O_EXCL not only for sticky directories, but also > > > for other directories writable by other than the current fsuid - there > > > may be several levels or individual flags here. > > > > > > So maybe unbundle these or at least avoid forever limiting them to > > > sticky directories (don't include "sticky" in the sysctl name). > > > > Actually I never liked that sysctl name I'll try to come up with something > > less ugly and that doesn't include the "sticky" word. > > So the current name is "protected_sticky_child_create" (I couldn't even > recall it, had to look it up for this reply). This unnecessarily > bundles this potentially more general policy stuff with the existing > "protections" against specific attacks, unnecessarily limits scope to > "sticky" and "create", and talks about some "child". How about we use > something totally different, focusing on "policy"? It could be simply > "policy" (we're already in "fs"), or if that won't fly then how about > "security_policy" or "dac_policy"? We will be imposing extra > restrictions on top of usual Unix discretionary access control > permission bits while not going all the way to mandatory access control > (not tying objects to subjects). So in a sense we'll have an extension > of Unix DAC. Yea, I like "dac_policy" very much. > BTW, this and all of fs.protected* should be configurable per-container. This would be a nice thing to do in the future. > > If I implement something like what Matthew proposed[1] it will be easy to > > extend scope and functionalities of this feature without complicating too much > > the interface. > > > [1] https://lkml.org/lkml/2017/11/22/319 > > Right. I tried thinking of a way to specify all reasonable combinations > without the likely unreasonable ones, but couldn't come up with anything > elegant. So I'm fine with Matthew's proposal as-is. Great. > A new thought: a directory that has someone else as its owner is for our > purposes effectively the same as a group-writable directory. So maybe > whatever we'll implement for group-writable directories should also be > done for directories that have other than the current fsuid as their > owner. It's currently very uncommon to have directories with the sticky > bit set that are not at least group-writable, so this will rarely make a > difference (and when it does, that's just right), but also it provides a > way to explicitly include any directory under this monitoring (if the > group-writable protection is on) - e.g., if a sysadmin wants this > monitoring for users' home directories, they can change permissions for > those from e.g. 700 to 1700. This could be handy for development and > auditing of software, even though in production it could be easily > circumvented by the directory owner (who can remove the sticky bit, > which we should document to avoid providing a false sense of security) > and it won't automatically apply to subdirectories. It'd also cover > part of what we intend to achieve later by possibly extending the > feature to non-sticky directories, where we might also want to treat > different owner the same as group-writable (without the circumvention). Agreed. I'll extend it to also check for the directory owner. Maybe I could use another bit and make this additional restriction independent from the "group-writable" one. > > So, are you suggesting that I should extend "O_CREAT-without-O_EXCL" > > and "FIFOs restrictions" to work (optionally) on non-sticky directories too, > > while leaving untouched (for the moment) "normal files restrictions"? > > No, I think all of these and the existing symlink restrictions should > potentially be extended "to work (optionally) on non-sticky directories > too", but perhaps with separate patches later. OK. > An even further extension may be to cover non-O_CREAT: writing or/and > reading an existing file in an untrusted directory is also potentially > unsafe. Unfortunately, we can't reliably know whether the program > possibly takes precautions by using lstat() and fstat() and comparing > st_dev/st_ino, so we won't be able to distinguish likely unsafe and > likely mostly safe accesses, but we'll be able to flag all of them for > manual analysis on a developer's or an auditor's system. A reasonable > strict policy one might want to follow is to have all accesses done as > the right fsuid, without needing those unreliable st_dev/st_ino checks. > (They're unreliable because of potential side-effects on open() and > inode reuse.) For example, we chose this strict policy in Openwall's > "tcb suite" and shadow suite patches: http://www.openwall.com/tcb/ > > Of course, then there's the question on whether something exotic(?) like > this should be in the kernel. > > To me, it's like a "gcc -Wall" for filesystem accesses. Sure it can > "falsely" detect many technically valid uses, but it's also helpful to > improve our filesystem access safety. Yes, this could be a useful improvement for the future. Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-12-05 10:13 ` Salvatore Mesoraca @ 2017-12-07 22:23 ` Solar Designer 2017-12-08 12:17 ` Solar Designer 2017-12-11 16:07 ` Solar Designer 0 siblings, 2 replies; 60+ messages in thread From: Solar Designer @ 2017-12-07 22:23 UTC (permalink / raw) To: Salvatore Mesoraca; +Cc: Kernel Hardening, Matthew Wilcox CC: Matthew Wilcox On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote: > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>: > > So the current name is "protected_sticky_child_create" (I couldn't even > > recall it, had to look it up for this reply). This unnecessarily > > bundles this potentially more general policy stuff with the existing > > "protections" against specific attacks, unnecessarily limits scope to > > "sticky" and "create", and talks about some "child". How about we use > > something totally different, focusing on "policy"? It could be simply > > "policy" (we're already in "fs"), or if that won't fly then how about > > "security_policy" or "dac_policy"? We will be imposing extra > > restrictions on top of usual Unix discretionary access control > > permission bits while not going all the way to mandatory access control > > (not tying objects to subjects). So in a sense we'll have an extension > > of Unix DAC. > > Yea, I like "dac_policy" very much. OK. Thinking of this further, what we might want to have is a generic policy against file accesses that would negate a privilege boundary between Unix (pseudo-)users. And one of those would be execve(2) of a file writable by someone else, or with any of its parent directories writable by someone else (unless that's root, indeed). I think the name "dac_policy" still fits this well, but anything with "create" or even "open" in the name would not fit it. Another way of looking at this is that we'd be creating a reverse DAC - optionally blocking unsafe accesses made possible by too permissive DAC. But I don't have a good sysctl name suggestion along these lines. > > BTW, this and all of fs.protected* should be configurable per-container. > > This would be a nice thing to do in the future. OK. > > > If I implement something like what Matthew proposed[1] it will be easy to > > > extend scope and functionalities of this feature without complicating too much > > > the interface. > > > > > [1] https://lkml.org/lkml/2017/11/22/319 > > > > Right. I tried thinking of a way to specify all reasonable combinations > > without the likely unreasonable ones, but couldn't come up with anything > > elegant. So I'm fine with Matthew's proposal as-is. > > Great. Thinking of this further, maybe it'd be friendlier to further expansion if we separate the policy and notify vs. block into two bitmasks, in separate sysctl's. If a sysadmin wants only notifications or only blocking, that would then be easier to achieve, by setting the notify vs. block sysctl to 0 or to all 1's (to -1 maybe, although this would give the wrong impression of being "below 0"). Then the sysadmin would be able to focus on the actual policy bits separately, without the distraction of the notify vs. block bits inbetween. The names could then be "dac_policy" and "dac_policy_enforce" maybe? Any other suggestions? BTW, if we negate the integers corresponding to these, then two 1's would mean full enforcement of the strictest policy, and negative bitmask values could be used to specify fine-grained policy and/or enforcement. Good idea or too unusual/confusing? I'm afraid it is indeed confusing. dac_policy_enforce=1 meaning full enforcement makes more sense to me, but negating only one of these two integers would also be confusing. So we probably shouldn't do any of this. I just thought I'd share the idea anyway. > > A new thought: a directory that has someone else as its owner is for our > > purposes effectively the same as a group-writable directory. So maybe > > whatever we'll implement for group-writable directories should also be > > done for directories that have other than the current fsuid as their > > owner. It's currently very uncommon to have directories with the sticky > > bit set that are not at least group-writable, so this will rarely make a > > difference (and when it does, that's just right), but also it provides a > > way to explicitly include any dibrectory under this monitoring (if the > > group-writable protection is on) - e.g., if a sysadmin wants this > > monitoring for users' home directories, they can change permissions for > > those from e.g. 700 to 1700. This could be handy for development and > > auditing of software, even though in production it could be easily > > circumvented by the directory owner (who can remove the sticky bit, > > which we should document to avoid providing a false sense of security) > > and it won't automatically apply to subdirectories. It'd also cover > > part of what we intend to achieve later by possibly extending the > > feature to non-sticky directories, where we might also want to treat > > different owner the same as group-writable (without the circumvention). > > Agreed. I'll extend it to also check for the directory owner. > Maybe I could use another bit and make this additional restriction > independent from the "group-writable" one. I'd prefer not to waste a bit on this. We'll need plenty of bits later on, perhaps also in pairs or (with your suggestion to separate this one) in triplets, and every extra bit complicates the calculation of the bitmask value. > > > So, are you suggesting that I should extend "O_CREAT-without-O_EXCL" > > > and "FIFOs restrictions" to work (optionally) on non-sticky directories too, > > > while leaving untouched (for the moment) "normal files restrictions"? > > > > No, I think all of these and the existing symlink restrictions should > > potentially be extended "to work (optionally) on non-sticky directories > > too", but perhaps with separate patches later. > > OK. > > > An even further extension may be to cover non-O_CREAT: writing or/and > > reading an existing file in an untrusted directory is also potentially > > unsafe. Unfortunately, we can't reliably know whether the program > > possibly takes precautions by using lstat() and fstat() and comparing > > st_dev/st_ino, so we won't be able to distinguish likely unsafe and > > likely mostly safe accesses, but we'll be able to flag all of them for > > manual analysis on a developer's or an auditor's system. A reasonable > > strict policy one might want to follow is to have all accesses done as > > the right fsuid, without needing those unreliable st_dev/st_ino checks. > > (They're unreliable because of potential side-effects on open() and > > inode reuse.) For example, we chose this strict policy in Openwall's > > "tcb suite" and shadow suite patches: http://www.openwall.com/tcb/ > > > > Of course, then there's the question on whether something exotic(?) like > > this should be in the kernel. > > > > To me, it's like a "gcc -Wall" for filesystem accesses. Sure it can > > "falsely" detect many technically valid uses, but it's also helpful to > > improve our filesystem access safety. > > Yes, this could be a useful improvement for the future. > > Salvatore Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-12-07 22:23 ` Solar Designer @ 2017-12-08 12:17 ` Solar Designer 2017-12-11 12:07 ` Salvatore Mesoraca 2017-12-11 16:07 ` Solar Designer 1 sibling, 1 reply; 60+ messages in thread From: Solar Designer @ 2017-12-08 12:17 UTC (permalink / raw) To: Salvatore Mesoraca; +Cc: Kernel Hardening, Matthew Wilcox On Thu, Dec 07, 2017 at 11:23:34PM +0100, Solar Designer wrote: > On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote: > > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>: > > > So the current name is "protected_sticky_child_create" (I couldn't even > > > recall it, had to look it up for this reply). This unnecessarily > > > bundles this potentially more general policy stuff with the existing > > > "protections" against specific attacks, unnecessarily limits scope to > > > "sticky" and "create", and talks about some "child". How about we use > > > something totally different, focusing on "policy"? It could be simply > > > "policy" (we're already in "fs"), or if that won't fly then how about > > > "security_policy" or "dac_policy"? We will be imposing extra > > > restrictions on top of usual Unix discretionary access control > > > permission bits while not going all the way to mandatory access control > > > (not tying objects to subjects). So in a sense we'll have an extension > > > of Unix DAC. > > > > Yea, I like "dac_policy" very much. > > OK. Thinking of this further, what we might want to have is a generic > policy against file accesses that would negate a privilege boundary > between Unix (pseudo-)users. And one of those would be execve(2) of a > file writable by someone else, or with any of its parent directories > writable by someone else (unless that's root, indeed). I think the name > "dac_policy" still fits this well, but anything with "create" or even > "open" in the name would not fit it. > > Another way of looking at this is that we'd be creating a reverse DAC - > optionally blocking unsafe accesses made possible by too permissive DAC. > But I don't have a good sysctl name suggestion along these lines. Yet another thought is that while we won't go for MAC ("not tying objects to subjects") in our additional policy against unsafe file accesses, it could reasonably be extended to cover not only accesses made possible by too permissive DAC, but also those made possible by too permissive ACLs. I doubt we'd want to actually add ACL support in there (too obscure, unnecessary when auditing software), but if someone eventually does that would fit in with our general approach (and wouldn't even require separate flags, as it's conceptually similar to group-writable). So an ideal name would not conflict with that. What would be a good short name for "file access safety policy"? Maybe we can omit "file" considering we're in "fs". "access_safety_policy"? The individual flags might correspond to detecting and/or blocking: O_CREAT w/o O_EXCL in +t world-writable directory except if file exists and is owned by root or by current fsuid and the directory owner is root or current fsuid O_CREAT w/o O_EXCL in +t group-writable directory except if file exists and is owned by root or by current fsuid and the directory owner is root or current fsuid, or O_CREAT w/o O_EXCL in +t directory owned by another non-root user (regardless of whether the file exists and its ownership) (Note: the "+t directory owned by another non-root user" case is for consistency and for software testing only, not for enforcing a policy like this in production, because the directory owner can remove the +t.) O_CREAT w/o O_EXCL in -t world-writable directory (no exceptions) O_CREAT w/o O_EXCL in -t group-writable directory (no exceptions), or O_CREAT w/o O_EXCL in any directory owned by another non-root user (regardless of the sticky bit, and also without exceptions) Same as above for O_WRONLY or O_RDWR, so 4 more flags. Same as above for O_RDONLY, so 4 more flags. Other unsafe file accesses via pathname, including execve(2) (would consider permissions on the file itself, as well as the directory), chmod(2), chown(2). That's maybe 6 more flags if we continue to separate world vs. group/user, even if we don't separate by +t (but we could nevertheless trust a pre-existing file in +t under the same conditions listed for "O_CREAT w/o O_EXCL in +t"). Then we could want to eventually add checking of all parent directories, and this in turn would make more syscalls relevant (in the sense of being usable for an immediate attack on the invoking user, making them do something unintended) in the special case when an upper level directory is unsafe: symlink(2), link(2), unlink(2), rename(2), mkdir(2), rmdir(2), perhaps more. Just for these 6 I quickly identified, and if we continue to separate world vs. group/user, this could be 12 more flags. So this would be 4+4+6+12 = 26 flags already. This feels like way too many, as well as not leaving enough bits e.g. in a 32-bit mask in case we end up identifying many more relevant syscalls. So perhaps the approach above is too fine-grained and should be revised. If so, maybe our start with two bits for "O_CREAT w/o O_EXCL in +t" is already too fine-grained and should be revised as well. What do you think? How to group these categories in fewer flags best? Perhaps less separation of the open(2) flags and/or of the syscalls relevant only with unsafe upper-level directories (one flag for all of them?) Drop the +t / -t separation for open(2)? (But nevertheless trust a pre-existing file in +t under the same conditions as I listed for "O_CREAT w/o O_EXCL in +t".) Drop the world vs. group/user separation? Introduce more sysctl's - separate for open(2) vs. other syscalls? And/or maybe separate for world vs. group/user-writable? This would be convenient in that the same bit positions would refer to the same open(2) modes in the same directory categories, or the same syscalls. Extending a previously tested policy for world-writable to cover also group/user-writable would be as simple as setting the other sysctl to the same value. Any other ideas? > > > > If I implement something like what Matthew proposed[1] it will be easy to > > > > extend scope and functionalities of this feature without complicating too much > > > > the interface. > > > > > > > [1] https://lkml.org/lkml/2017/11/22/319 > > > > > > Right. I tried thinking of a way to specify all reasonable combinations > > > without the likely unreasonable ones, but couldn't come up with anything > > > elegant. So I'm fine with Matthew's proposal as-is. > > > > Great. > > Thinking of this further, maybe it'd be friendlier to further expansion > if we separate the policy and notify vs. block into two bitmasks, in > separate sysctl's. If a sysadmin wants only notifications or only > blocking, that would then be easier to achieve, by setting the notify > vs. block sysctl to 0 or to all 1's (to -1 maybe, although this would > give the wrong impression of being "below 0"). Then the sysadmin would > be able to focus on the actual policy bits separately, without the > distraction of the notify vs. block bits inbetween. > > The names could then be "dac_policy" and "dac_policy_enforce" maybe? > Any other suggestions? Unfortunately, my suggestion above wouldn't address Matthew's criticism about the lack of flexibility. It wouldn't be possible to independently control notifications and enforcement. So perhaps the two sysctl name suffixes can be "notify" and "enforce", and they'd independently define two policies - one producing notifications only and the other actually blocking syscalls. Usually they'd be the same or one of them would be 0, but arbitrary two policies could also be specified. Now how to combine this with extensibility of the policies themselves, potentially eventually going to the extent I outlined above, which might require multiple sysctl's per policy? Perhaps just double their number? If so, that may be a reason for keeping each of the two policies in fewer sysctl's, which is in turn a reason for lower granularity. Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-12-08 12:17 ` Solar Designer @ 2017-12-11 12:07 ` Salvatore Mesoraca 2017-12-11 15:33 ` Solar Designer 0 siblings, 1 reply; 60+ messages in thread From: Salvatore Mesoraca @ 2017-12-11 12:07 UTC (permalink / raw) To: Solar Designer; +Cc: Kernel Hardening, Matthew Wilcox 2017-12-08 13:17 GMT+01:00 Solar Designer <solar@openwall.com>: > On Thu, Dec 07, 2017 at 11:23:34PM +0100, Solar Designer wrote: > > On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote: > > > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>: > > > > So the current name is "protected_sticky_child_create" (I couldn't even > > > > recall it, had to look it up for this reply). This unnecessarily > > > > bundles this potentially more general policy stuff with the existing > > > > "protections" against specific attacks, unnecessarily limits scope to > > > > "sticky" and "create", and talks about some "child". How about we use > > > > something totally different, focusing on "policy"? It could be simply > > > > "policy" (we're already in "fs"), or if that won't fly then how about > > > > "security_policy" or "dac_policy"? We will be imposing extra > > > > restrictions on top of usual Unix discretionary access control > > > > permission bits while not going all the way to mandatory access control > > > > (not tying objects to subjects). So in a sense we'll have an extension > > > > of Unix DAC. > > > > > > Yea, I like "dac_policy" very much. > > > > OK. Thinking of this further, what we might want to have is a generic > > policy against file accesses that would negate a privilege boundary > > between Unix (pseudo-)users. And one of those would be execve(2) of a > > file writable by someone else, or with any of its parent directories > > writable by someone else (unless that's root, indeed). This looks similar to what grsecurity calls TPE, I think that there is already some ongoing work to port it upstream as an LSM. I don't know if it's still under development, I didn't heard about it in a while. Actually what you are proposing here is a bit different, because it will never prevent the user from running its own executables, so its usefulness isn't completely defeated by language interpreters etc. Still, scripts that are directly "sourced" will escape this protection, but at least we'll cover many other cases. We should add a restriction on executable mmaps, too. > > I think the name > > "dac_policy" still fits this well, but anything with "create" or even > > "open" in the name would not fit it. > > > > Another way of looking at this is that we'd be creating a reverse DAC - > > optionally blocking unsafe accesses made possible by too permissive DAC. > > But I don't have a good sysctl name suggestion along these lines. > > Yet another thought is that while we won't go for MAC ("not tying > objects to subjects") in our additional policy against unsafe file > accesses, it could reasonably be extended to cover not only accesses > made possible by too permissive DAC, but also those made possible by too > permissive ACLs. I doubt we'd want to actually add ACL support in > there (too obscure, unnecessary when auditing software), but if someone > eventually does that would fit in with our general approach (and > wouldn't even require separate flags, as it's conceptually similar to > group-writable). I'm not sure if I understood what you meant, but the "group-writable" check implicitly checks for permissive ACLs, because they force the group bits on. > So an ideal name would not conflict with that. > > What would be a good short name for "file access safety policy"? Maybe > we can omit "file" considering we're in "fs". "access_safety_policy"? > > The individual flags might correspond to detecting and/or blocking: > > O_CREAT w/o O_EXCL in +t world-writable directory except if file exists > and is owned by root or by current fsuid and the directory owner is root > or current fsuid > > O_CREAT w/o O_EXCL in +t group-writable directory except if file exists > and is owned by root or by current fsuid and the directory owner is root > or current fsuid, or O_CREAT w/o O_EXCL in +t directory owned by another > non-root user (regardless of whether the file exists and its ownership) > > (Note: the "+t directory owned by another non-root user" case is for > consistency and for software testing only, not for enforcing a policy > like this in production, because the directory owner can remove the +t.) > > O_CREAT w/o O_EXCL in -t world-writable directory (no exceptions) > > O_CREAT w/o O_EXCL in -t group-writable directory (no exceptions), or > O_CREAT w/o O_EXCL in any directory owned by another non-root user > (regardless of the sticky bit, and also without exceptions) > > Same as above for O_WRONLY or O_RDWR, so 4 more flags. > > Same as above for O_RDONLY, so 4 more flags. > > Other unsafe file accesses via pathname, including execve(2) (would > consider permissions on the file itself, as well as the directory), I'll use the same bit used to restrict execve to restrict also executable mmaps. What do you think? > chmod(2), chown(2). That's maybe 6 more flags if we continue to > separate world vs. group/user, even if we don't separate by +t (but we > could nevertheless trust a pre-existing file in +t under the same > conditions listed for "O_CREAT w/o O_EXCL in +t"). > > Then we could want to eventually add checking of all parent directories, > and this in turn would make more syscalls relevant (in the sense of > being usable for an immediate attack on the invoking user, making them > do something unintended) in the special case when an upper level > directory is unsafe: symlink(2), link(2), unlink(2), rename(2), > mkdir(2), rmdir(2), perhaps more. Just for these 6 I quickly > identified, and if we continue to separate world vs. group/user, this > could be 12 more flags. > > So this would be 4+4+6+12 = 26 flags already. This feels like way too > many, as well as not leaving enough bits e.g. in a 32-bit mask in case > we end up identifying many more relevant syscalls. So perhaps the > approach above is too fine-grained and should be revised. If so, maybe > our start with two bits for "O_CREAT w/o O_EXCL in +t" is already too > fine-grained and should be revised as well. > > What do you think? > > How to group these categories in fewer flags best? Perhaps less > separation of the open(2) flags and/or of the syscalls relevant only > with unsafe upper-level directories (one flag for all of them?) > > Drop the +t / -t separation for open(2)? (But nevertheless trust a > pre-existing file in +t under the same conditions as I listed for > "O_CREAT w/o O_EXCL in +t".) > > Drop the world vs. group/user separation? > > Introduce more sysctl's - separate for open(2) vs. other syscalls? > > And/or maybe separate for world vs. group/user-writable? This would be > convenient in that the same bit positions would refer to the same > open(2) modes in the same directory categories, or the same syscalls. > Extending a previously tested policy for world-writable to cover also > group/user-writable would be as simple as setting the other sysctl to > the same value. > > Any other ideas? I'd group some of those restrictions to be controlled by the same bit, something like: - open(2) - execve(2), mmap(2) - chmod(2), chown(2) - mkdir(2), rmdir(2) - symlink(2), link(2), unlink(2), rename(2) And I'd use the same bitmasks in the following sysctls: - access_control_policy - access_control_policy_notify - access_control_policy_group - access_control_policy_group_notify If those names are too long we could change "access_control" to just "ac". What do you think? Anyway, given that the scope of this feature is growing bigger and bigger, should it be reimplemented as an LSM? Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-12-11 12:07 ` Salvatore Mesoraca @ 2017-12-11 15:33 ` Solar Designer 2017-12-12 18:00 ` Salvatore Mesoraca 0 siblings, 1 reply; 60+ messages in thread From: Solar Designer @ 2017-12-11 15:33 UTC (permalink / raw) To: Salvatore Mesoraca; +Cc: Kernel Hardening, Matthew Wilcox On Mon, Dec 11, 2017 at 01:07:36PM +0100, Salvatore Mesoraca wrote: > 2017-12-08 13:17 GMT+01:00 Solar Designer <solar@openwall.com>: > > On Thu, Dec 07, 2017 at 11:23:34PM +0100, Solar Designer wrote: > > > On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote: > > > > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>: > > > > > So the current name is "protected_sticky_child_create" (I couldn't even > > > > > recall it, had to look it up for this reply). This unnecessarily > > > > > bundles this potentially more general policy stuff with the existing > > > > > "protections" against specific attacks, unnecessarily limits scope to > > > > > "sticky" and "create", and talks about some "child". How about we use > > > > > something totally different, focusing on "policy"? It could be simply > > > > > "policy" (we're already in "fs"), or if that won't fly then how about > > > > > "security_policy" or "dac_policy"? We will be imposing extra > > > > > restrictions on top of usual Unix discretionary access control > > > > > permission bits while not going all the way to mandatory access control > > > > > (not tying objects to subjects). So in a sense we'll have an extension > > > > > of Unix DAC. > > > > > > > > Yea, I like "dac_policy" very much. > > > > > > OK. Thinking of this further, what we might want to have is a generic > > > policy against file accesses that would negate a privilege boundary > > > between Unix (pseudo-)users. And one of those would be execve(2) of a > > > file writable by someone else, or with any of its parent directories > > > writable by someone else (unless that's root, indeed). > > This looks similar to what grsecurity calls TPE, Related, but not very similar. The goal is almost the opposite. TPE aims to prevent the user from bringing and running their own (and other non-root users') binaries, so it restricts the invoking user. [ FWIW, it's restrictive tendencies like this that led to the name Openwall at the time when TPE was new (it was already called that in a 1998 Phrack article, pre-dating grsecurity), meaning we'd focus on security without restricting the user's reasonable use of the system. In late 1990s and early 2000s, it almost looked like e.g. shared web hosting providers would forbid convenient and full Unix shell access (while nevertheless allowing PHP script uploads, etc.) Some did. But then this trend reversed. ] The potential inclusion of execve(2) in our safety policy would prevent the user from invoking binaries that are under another non-root user's control. So the similarity to TPE is that either policy would allow invoking binaries that are part of the system. The difference, roughly speaking, is that TPE would disallow the user's own binaries whereas we would disallow only someone else's binaries. > I think that there is > already some ongoing work to port it upstream as an LSM. Yes, this was discussed in here and I criticized it, but not CC'ing LKML not to obstruct that effort. > I don't know > if it's still under development, I didn't heard about it in a while. I also don't know what its current status is. > Actually what you are proposing here is a bit different, because it will > never prevent the user from running its own executables, Exactly. > so its usefulness > isn't completely defeated by language interpreters etc. Still, scripts that > are directly "sourced" will escape this protection, but at least we'll > cover many other cases. We should indeed consider the limitations of our approach, but I think without reference to TPE (and its bypasses) as TPE's goal is different. For scripts that are directly "sourced", we could have a policy on open(2), where I suggested optionally including even O_RDONLY. There's no fundamental difference between (interpreted) code and data input to a program - either may control that program. This is why merely reading a file under someone else's control is a risk. However, as I suggested so far, for open(2) we'd only potentially restrict file access in unsafe directories (eventually including their chain of parent directories). To deal with the risk of someone "sourcing" a script or reading a crucial configuration file directly writable by someone else, yet located in a safe directory, we'd have to extend those restrictions to target file permissions, like I suggested we do for execve(2). Unfortunately, this will probably cover many more reasonable and safe uses (where the program reading either knows to distrust the file contents or deliberately trusts whoever else can write the file). So if we go for this policy extension at all (perhaps for special-purpose/embedded distros that are developed to meet the policy?), we should probably have a separate bit to control it. > We should add a restriction on executable mmaps, too. We're focusing on accesses via pathnames, and that would be a restriction on what can be done on an open fd (what file permissions would we use? as of when?) It'd make some sense if we were to add restrictions on open(2) of files with unsafe permissions (on the file itself, not the directories), as I mentioned in the paragraph above. Then it'd allow us to make that restriction more fine-grained: apply it separately to shared libraries and such vs. arbitrary other files. But it's tricky. So while we may have it in mind and not rule it out with the interface we design now, it's not something we should decide on now. Another drawback of these policy extensions (including some from my previous message) is that we'd be getting into restricting the user (not our goal) rather than just preventing the user from doing things unsafely yet having a (preferably safe) way to achieve their goal (this is our goal). Sometimes a user will deliberately share a file with another user, and the other user should ideally have a way to read that file, even if by knowingly accepting a risk (although that's not ideal). This makes me sad, but I don't have a good suggestion except to ensure we're not forcing a sysadmin to enable these parts of the policy along with those parts that do not have such unfortunate user-restrictive properties. So this should affect our decisions on the grouping of potential policy bits. > > > I think the name > > > "dac_policy" still fits this well, but anything with "create" or even > > > "open" in the name would not fit it. > > > > > > Another way of looking at this is that we'd be creating a reverse DAC - > > > optionally blocking unsafe accesses made possible by too permissive DAC. > > > But I don't have a good sysctl name suggestion along these lines. > > > > Yet another thought is that while we won't go for MAC ("not tying > > objects to subjects") in our additional policy against unsafe file > > accesses, it could reasonably be extended to cover not only accesses > > made possible by too permissive DAC, but also those made possible by too > > permissive ACLs. I doubt we'd want to actually add ACL support in > > there (too obscure, unnecessary when auditing software), but if someone > > eventually does that would fit in with our general approach (and > > wouldn't even require separate flags, as it's conceptually similar to > > group-writable). > > I'm not sure if I understood what you meant, but the "group-writable" check > implicitly checks for permissive ACLs, because they force the group bits on. Are you referring to how ACLs are currently implemented in the kernel? I'm not familiar with that. If you like and have time, you might want to illustrate your point by a code snippet (if I understood you right, that would be code already in the kernel), although I think it's not essential to the rest of this discussion now (we should prefer sysctl names not conflicting with this potential goal either way). > > So an ideal name would not conflict with that. > > > > What would be a good short name for "file access safety policy"? Maybe > > we can omit "file" considering we're in "fs". "access_safety_policy"? > > > > The individual flags might correspond to detecting and/or blocking: > > > > O_CREAT w/o O_EXCL in +t world-writable directory except if file exists > > and is owned by root or by current fsuid and the directory owner is root > > or current fsuid > > > > O_CREAT w/o O_EXCL in +t group-writable directory except if file exists > > and is owned by root or by current fsuid and the directory owner is root > > or current fsuid, or O_CREAT w/o O_EXCL in +t directory owned by another > > non-root user (regardless of whether the file exists and its ownership) > > > > (Note: the "+t directory owned by another non-root user" case is for > > consistency and for software testing only, not for enforcing a policy > > like this in production, because the directory owner can remove the +t.) > > > > O_CREAT w/o O_EXCL in -t world-writable directory (no exceptions) > > > > O_CREAT w/o O_EXCL in -t group-writable directory (no exceptions), or > > O_CREAT w/o O_EXCL in any directory owned by another non-root user > > (regardless of the sticky bit, and also without exceptions) > > > > Same as above for O_WRONLY or O_RDWR, so 4 more flags. > > > > Same as above for O_RDONLY, so 4 more flags. > > > > Other unsafe file accesses via pathname, including execve(2) (would > > consider permissions on the file itself, as well as the directory), > > I'll use the same bit used to restrict execve to restrict also > executable mmaps. > What do you think? I think we're not ready to discuss that. execve(2) itself more obviously fits in with what we were discussing so far because of relevance of directory permissions and because of access via pathname in this very syscall. > > chmod(2), chown(2). That's maybe 6 more flags if we continue to > > separate world vs. group/user, even if we don't separate by +t (but we > > could nevertheless trust a pre-existing file in +t under the same > > conditions listed for "O_CREAT w/o O_EXCL in +t"). > > > > Then we could want to eventually add checking of all parent directories, > > and this in turn would make more syscalls relevant (in the sense of > > being usable for an immediate attack on the invoking user, making them > > do something unintended) in the special case when an upper level > > directory is unsafe: symlink(2), link(2), unlink(2), rename(2), > > mkdir(2), rmdir(2), perhaps more. Just for these 6 I quickly > > identified, and if we continue to separate world vs. group/user, this > > could be 12 more flags. > > > > So this would be 4+4+6+12 = 26 flags already. This feels like way too > > many, as well as not leaving enough bits e.g. in a 32-bit mask in case > > we end up identifying many more relevant syscalls. So perhaps the > > approach above is too fine-grained and should be revised. If so, maybe > > our start with two bits for "O_CREAT w/o O_EXCL in +t" is already too > > fine-grained and should be revised as well. > > > > What do you think? > > > > How to group these categories in fewer flags best? Perhaps less > > separation of the open(2) flags and/or of the syscalls relevant only > > with unsafe upper-level directories (one flag for all of them?) > > > > Drop the +t / -t separation for open(2)? (But nevertheless trust a > > pre-existing file in +t under the same conditions as I listed for > > "O_CREAT w/o O_EXCL in +t".) > > > > Drop the world vs. group/user separation? > > > > Introduce more sysctl's - separate for open(2) vs. other syscalls? > > > > And/or maybe separate for world vs. group/user-writable? This would be > > convenient in that the same bit positions would refer to the same > > open(2) modes in the same directory categories, or the same syscalls. > > Extending a previously tested policy for world-writable to cover also > > group/user-writable would be as simple as setting the other sysctl to > > the same value. > > > > Any other ideas? > > I'd group some of those restrictions to be controlled by the same bit, > something like: > - open(2) I think at least writes vs. reads are very different. > - execve(2), mmap(2) > - chmod(2), chown(2) > - mkdir(2), rmdir(2) > - symlink(2), link(2), unlink(2), rename(2) Maybe, perhaps except for mmap(2). > And I'd use the same bitmasks in the following sysctls: > - access_control_policy > - access_control_policy_notify > - access_control_policy_group > - access_control_policy_group_notify Our policy is not about access control - it is about access safety. (It's the same confusion with the comparison to TPE again. We should actually try and avoid getting into the access control territory here.) Also, in your list "access_control_policy" feels like it's the main thing, whereas actually it's just one of the four (for world-writable directories, right?) We need something like: safety_world_enforce safety_world_notify safety_group_enforce safety_group_notify but I'm afraid these are not self-explanatory at all. > What do you think? This will probably need even further discussion, implementation, some usage experience, then changes or maybe even a redesign before we possibly propose it for inclusion upstream. For now, we're just thinking out loud, and appear unready to propose anything upstream. As discussed previously, you should nevertheless submit the FIFO and regular file protections in +t as a separate patch. > Anyway, given that the scope of this feature is growing bigger and bigger, > should it be reimplemented as an LSM? I had this thought, too. I think we should consider possible races (are there races that would apply in an LSM but would be avoided in a patch?), maintenance (which implementation would likely be better maintained?), and easy availability (so that e.g. someone developing software on e.g. Ubuntu could easily partially turn this on temporarily, even if Ubuntu itself does not fully conform to the policy and thus wouldn't normally ship with these features - and would likely not include an LSM then?) Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-12-11 15:33 ` Solar Designer @ 2017-12-12 18:00 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-12-12 18:00 UTC (permalink / raw) To: Solar Designer; +Cc: Kernel Hardening, Matthew Wilcox 2017-12-11 16:33 GMT+01:00 Solar Designer <solar@openwall.com>: > On Mon, Dec 11, 2017 at 01:07:36PM +0100, Salvatore Mesoraca wrote: > > 2017-12-08 13:17 GMT+01:00 Solar Designer <solar@openwall.com>: > > > On Thu, Dec 07, 2017 at 11:23:34PM +0100, Solar Designer wrote: > > > > On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote: > > > > > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>: > > > > > > So the current name is "protected_sticky_child_create" (I couldn't even > > > > > > recall it, had to look it up for this reply). This unnecessarily > > > > > > bundles this potentially more general policy stuff with the existing > > > > > > "protections" against specific attacks, unnecessarily limits scope to > > > > > > "sticky" and "create", and talks about some "child". How about we use > > > > > > something totally different, focusing on "policy"? It could be simply > > > > > > "policy" (we're already in "fs"), or if that won't fly then how about > > > > > > "security_policy" or "dac_policy"? We will be imposing extra > > > > > > restrictions on top of usual Unix discretionary access control > > > > > > permission bits while not going all the way to mandatory access control > > > > > > (not tying objects to subjects). So in a sense we'll have an extension > > > > > > of Unix DAC. > > > > > > > > > > Yea, I like "dac_policy" very much. > > > > > > > > OK. Thinking of this further, what we might want to have is a generic > > > > policy against file accesses that would negate a privilege boundary > > > > between Unix (pseudo-)users. And one of those would be execve(2) of a > > > > file writable by someone else, or with any of its parent directories > > > > writable by someone else (unless that's root, indeed). > > > > This looks similar to what grsecurity calls TPE, > > Related, but not very similar. The goal is almost the opposite. > > TPE aims to prevent the user from bringing and running their own (and > other non-root users') binaries, so it restricts the invoking user. > > [ FWIW, it's restrictive tendencies like this that led to the name > Openwall at the time when TPE was new (it was already called that in a > 1998 Phrack article, pre-dating grsecurity), meaning we'd focus on > security without restricting the user's reasonable use of the system. > In late 1990s and early 2000s, it almost looked like e.g. shared web > hosting providers would forbid convenient and full Unix shell access > (while nevertheless allowing PHP script uploads, etc.) Some did. > But then this trend reversed. ] > > The potential inclusion of execve(2) in our safety policy would prevent > the user from invoking binaries that are under another non-root user's > control. > > So the similarity to TPE is that either policy would allow invoking > binaries that are part of the system. The difference, roughly speaking, > is that TPE would disallow the user's own binaries whereas we would > disallow only someone else's binaries. > > > I think that there is > > already some ongoing work to port it upstream as an LSM. > > Yes, this was discussed in here and I criticized it, but not CC'ing LKML > not to obstruct that effort. > > > I don't know > > if it's still under development, I didn't heard about it in a while. > > I also don't know what its current status is. > > > Actually what you are proposing here is a bit different, because it will > > never prevent the user from running its own executables, > > Exactly. > > > so its usefulness > > isn't completely defeated by language interpreters etc. Still, scripts that > > are directly "sourced" will escape this protection, but at least we'll > > cover many other cases. > > We should indeed consider the limitations of our approach, but I think > without reference to TPE (and its bypasses) as TPE's goal is different. > > For scripts that are directly "sourced", we could have a policy on > open(2), where I suggested optionally including even O_RDONLY. There's > no fundamental difference between (interpreted) code and data input to a > program - either may control that program. This is why merely reading a > file under someone else's control is a risk. > > However, as I suggested so far, for open(2) we'd only potentially > restrict file access in unsafe directories (eventually including their > chain of parent directories). To deal with the risk of someone > "sourcing" a script or reading a crucial configuration file directly > writable by someone else, yet located in a safe directory, we'd have to > extend those restrictions to target file permissions, like I suggested > we do for execve(2). Unfortunately, this will probably cover many more > reasonable and safe uses (where the program reading either knows to > distrust the file contents or deliberately trusts whoever else can write > the file). So if we go for this policy extension at all (perhaps for > special-purpose/embedded distros that are developed to meet the > policy?), we should probably have a separate bit to control it. Yes, a restriction like this will probably break many safe uses, but it can also be useful in some cases. > > We should add a restriction on executable mmaps, too. > > We're focusing on accesses via pathnames, and that would be a > restriction on what can be done on an open fd (what file permissions > would we use? as of when?) It'd make some sense if we were to add > restrictions on open(2) of files with unsafe permissions (on the file > itself, not the directories), as I mentioned in the paragraph above. > Then it'd allow us to make that restriction more fine-grained: apply it > separately to shared libraries and such vs. arbitrary other files. But > it's tricky. So while we may have it in mind and not rule it out with > the interface we design now, it's not something we should decide on now. > Another drawback of these policy extensions (including some from my > previous message) is that we'd be getting into restricting the user (not > our goal) rather than just preventing the user from doing things > unsafely yet having a (preferably safe) way to achieve their goal (this > is our goal). Sometimes a user will deliberately share a file with > another user, and the other user should ideally have a way to read that > file, even if by knowingly accepting a risk (although that's not ideal). > > This makes me sad, but I don't have a good suggestion except to ensure > we're not forcing a sysadmin to enable these parts of the policy along > with those parts that do not have such unfortunate user-restrictive > properties. So this should affect our decisions on the grouping of > potential policy bits. Agreed. > > > > I think the name > > > > "dac_policy" still fits this well, but anything with "create" or even > > > > "open" in the name would not fit it. > > > > > > > > Another way of looking at this is that we'd be creating a reverse DAC - > > > > optionally blocking unsafe accesses made possible by too permissive DAC. > > > > But I don't have a good sysctl name suggestion along these lines. > > > > > > Yet another thought is that while we won't go for MAC ("not tying > > > objects to subjects") in our additional policy against unsafe file > > > accesses, it could reasonably be extended to cover not only accesses > > > made possible by too permissive DAC, but also those made possible by too > > > permissive ACLs. I doubt we'd want to actually add ACL support in > > > there (too obscure, unnecessary when auditing software), but if someone > > > eventually does that would fit in with our general approach (and > > > wouldn't even require separate flags, as it's conceptually similar to > > > group-writable). > > > > I'm not sure if I understood what you meant, but the "group-writable" check > > implicitly checks for permissive ACLs, because they force the group bits on. > > Are you referring to how ACLs are currently implemented in the kernel? > I'm not familiar with that. If you like and have time, you might want > to illustrate your point by a code snippet (if I understood you right, > that would be code already in the kernel), although I think it's not > essential to the rest of this discussion now (we should prefer sysctl > names not conflicting with this potential goal either way). [disclaimer] While working on v1 of this patchset I was wondering if ACLs could be used to bypass these protections and so I dug a bit in the ACLs code in the kernel. I'm far from an expert on the subject and, I have to admit, I never used ACLs in real world. [/disclaimer] In my understanding, the permission bits granted to some user or group via ACLs are always masked against a "mask object" [1], so that ACLs cannot grant permissions that are not allowed by the "mask object" (except for the file owner). This "mask object" appears to be tied to the group permissions. In fact, whenever the group permissions are changed, it gets changed too [2] and whenever the "mask object" is changed, the file "p_mode" is changed accordingly [3][4]. In practice, ACLs cannot grant permissions that are not already granted in the mode bits. This is confirmed, in a convoluted way, in acl(5): "The ACL_MASK entry denotes the maximum access rights that can be granted by entries of type ACL_USER, ACL_GROUP_OBJ, or ACL_GROUP." ... "An ACL that contains entries of ACL_USER or ACL_GROUP tag types must contain exactly one entry of the ACL_MASK tag type" ... "If the ACL has an ACL_MASK entry, the group permissions correspond to the permissions of the ACL_MASK entry" e.g. host $ SOMEUSER='someuser' host $ touch FILE host $ chmod 0600 FILE host $ stat -c '%a' FILE 600 host $ strace -echmod,fchmod setfacl -m "u:$SOMEUSER:rwx" FILE +++ exited with 0 +++ host $ stat -c '%a' FILE #setfacl sets the mask_obj for us 670 host $ sudo -u $SOMEUSER touch FILE host $ chmod 0600 FILE host $ getfacl FILE # file: FILE # owner: root # group: root user::rw- user:someuser:rwx #effective:--- group::--- mask::--- other::--- host $ LC_ALL=C sudo -u $SOMEUSER touch FILE touch: cannot touch 'FILE': Permission denied [1] http://elixir.free-electrons.com/linux/v4.15-rc3/source/fs/posix_acl.c#L392 [2] http://elixir.free-electrons.com/linux/v4.15-rc3/source/fs/posix_acl.c#L507 [3] http://elixir.free-electrons.com/linux/v4.15-rc3/source/fs/posix_acl.c#L643 [4] http://elixir.free-electrons.com/linux/v4.15-rc3/source/fs/posix_acl.c#L302 > > > So an ideal name would not conflict with that. > > > > > > What would be a good short name for "file access safety policy"? Maybe > > > we can omit "file" considering we're in "fs". "access_safety_policy"? > > > > > > The individual flags might correspond to detecting and/or blocking: > > > > > > O_CREAT w/o O_EXCL in +t world-writable directory except if file exists > > > and is owned by root or by current fsuid and the directory owner is root > > > or current fsuid > > > > > > O_CREAT w/o O_EXCL in +t group-writable directory except if file exists > > > and is owned by root or by current fsuid and the directory owner is root > > > or current fsuid, or O_CREAT w/o O_EXCL in +t directory owned by another > > > non-root user (regardless of whether the file exists and its ownership) > > > > > > (Note: the "+t directory owned by another non-root user" case is for > > > consistency and for software testing only, not for enforcing a policy > > > like this in production, because the directory owner can remove the +t.) > > > > > > O_CREAT w/o O_EXCL in -t world-writable directory (no exceptions) > > > > > > O_CREAT w/o O_EXCL in -t group-writable directory (no exceptions), or > > > O_CREAT w/o O_EXCL in any directory owned by another non-root user > > > (regardless of the sticky bit, and also without exceptions) > > > > > > Same as above for O_WRONLY or O_RDWR, so 4 more flags. > > > > > > Same as above for O_RDONLY, so 4 more flags. > > > > > > Other unsafe file accesses via pathname, including execve(2) (would > > > consider permissions on the file itself, as well as the directory), > > > > I'll use the same bit used to restrict execve to restrict also > > executable mmaps. > > What do you think? > > I think we're not ready to discuss that. > > execve(2) itself more obviously fits in with what we were discussing so > far because of relevance of directory permissions and because of access > via pathname in this very syscall. OK. > > > chmod(2), chown(2). That's maybe 6 more flags if we continue to > > > separate world vs. group/user, even if we don't separate by +t (but we > > > could nevertheless trust a pre-existing file in +t under the same > > > conditions listed for "O_CREAT w/o O_EXCL in +t"). > > > > > > Then we could want to eventually add checking of all parent directories, > > > and this in turn would make more syscalls relevant (in the sense of > > > being usable for an immediate attack on the invoking user, making them > > > do something unintended) in the special case when an upper level > > > directory is unsafe: symlink(2), link(2), unlink(2), rename(2), > > > mkdir(2), rmdir(2), perhaps more. Just for these 6 I quickly > > > identified, and if we continue to separate world vs. group/user, this > > > could be 12 more flags. > > > > > > So this would be 4+4+6+12 = 26 flags already. This feels like way too > > > many, as well as not leaving enough bits e.g. in a 32-bit mask in case > > > we end up identifying many more relevant syscalls. So perhaps the > > > approach above is too fine-grained and should be revised. If so, maybe > > > our start with two bits for "O_CREAT w/o O_EXCL in +t" is already too > > > fine-grained and should be revised as well. > > > > > > What do you think? > > > > > > How to group these categories in fewer flags best? Perhaps less > > > separation of the open(2) flags and/or of the syscalls relevant only > > > with unsafe upper-level directories (one flag for all of them?) > > > > > > Drop the +t / -t separation for open(2)? (But nevertheless trust a > > > pre-existing file in +t under the same conditions as I listed for > > > "O_CREAT w/o O_EXCL in +t".) > > > > > > Drop the world vs. group/user separation? > > > > > > Introduce more sysctl's - separate for open(2) vs. other syscalls? > > > > > > And/or maybe separate for world vs. group/user-writable? This would be > > > convenient in that the same bit positions would refer to the same > > > open(2) modes in the same directory categories, or the same syscalls. > > > Extending a previously tested policy for world-writable to cover also > > > group/user-writable would be as simple as setting the other sysctl to > > > the same value. > > > > > > Any other ideas? > > > > I'd group some of those restrictions to be controlled by the same bit, > > something like: > > - open(2) > > I think at least writes vs. reads are very different. Agreed. > > - execve(2), mmap(2) > > - chmod(2), chown(2) > > - mkdir(2), rmdir(2) > > - symlink(2), link(2), unlink(2), rename(2) > > Maybe, perhaps except for mmap(2). OK. > > And I'd use the same bitmasks in the following sysctls: > > - access_control_policy > > - access_control_policy_notify > > - access_control_policy_group > > - access_control_policy_group_notify > > Our policy is not about access control - it is about access safety. Yes, you are right. > (It's the same confusion with the comparison to TPE again. We should > actually try and avoid getting into the access control territory here.) > > Also, in your list "access_control_policy" feels like it's the main > thing, whereas actually it's just one of the four (for world-writable > directories, right?) > > We need something like: > > safety_world_enforce > safety_world_notify > safety_group_enforce > safety_group_notify > > but I'm afraid these are not self-explanatory at all. Maybe we could use these names for the moment and change them later, once we'll have better defined this feature. > > What do you think? > > This will probably need even further discussion, implementation, some > usage experience, then changes or maybe even a redesign before we > possibly propose it for inclusion upstream. For now, we're just > thinking out loud, and appear unready to propose anything upstream. > > As discussed previously, you should nevertheless submit the FIFO and > regular file protections in +t as a separate patch. Yes, sure. > > Anyway, given that the scope of this feature is growing bigger and bigger, > > should it be reimplemented as an LSM? > > I had this thought, too. I think we should consider possible races (are > there races that would apply in an LSM but would be avoided in a patch?), Now that I think about it, I don't believe there is any way, for an LSM, to know if a file as been opened using the O_EXCL flag or not. So, with the current framework, this feature (or at least part of it) can't be implemented (AFAIK). Of course we can always add a new hook and, if we go down that route, we can prevent any possible race. > maintenance (which implementation would likely be better maintained?), Both solutions have pros and cons in this regard, I'm not sure which one would be better. > and easy availability (so that e.g. someone developing software on e.g. > Ubuntu could easily partially turn this on temporarily, even if Ubuntu > itself does not fully conform to the policy and thus wouldn't normally > ship with these features - and would likely not include an LSM then?) I don't think that this would be a big problem. If it's stackable and "off by default", distro maintainers could enable it without issues. Ubuntu kernel, in particular, is configured with support for any existing LSM, it defaults to AppArmor + Yama, but one could enable SELinux (or any other LSM) at boot. My impression is that an LSM has more chances to be accepted by upstream, especially if this feature needs to touch many different parts of fs/ Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-12-07 22:23 ` Solar Designer 2017-12-08 12:17 ` Solar Designer @ 2017-12-11 16:07 ` Solar Designer 2017-12-12 18:01 ` Salvatore Mesoraca 1 sibling, 1 reply; 60+ messages in thread From: Solar Designer @ 2017-12-11 16:07 UTC (permalink / raw) To: Salvatore Mesoraca; +Cc: Kernel Hardening, Matthew Wilcox On Thu, Dec 07, 2017 at 11:23:34PM +0100, Solar Designer wrote: > On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote: > > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>: > > > A new thought: a directory that has someone else as its owner is for our > > > purposes effectively the same as a group-writable directory. So maybe > > > whatever we'll implement for group-writable directories should also be > > > done for directories that have other than the current fsuid as their > > > owner. It's currently very uncommon to have directories with the sticky > > > bit set that are not at least group-writable, so this will rarely make a > > > difference (and when it does, that's just right), but also it provides a > > > way to explicitly include any dibrectory under this monitoring (if the > > > group-writable protection is on) - e.g., if a sysadmin wants this > > > monitoring for users' home directories, they can change permissions for > > > those from e.g. 700 to 1700. This could be handy for development and > > > auditing of software, even though in production it could be easily > > > circumvented by the directory owner (who can remove the sticky bit, > > > which we should document to avoid providing a false sense of security) > > > and it won't automatically apply to subdirectories. It'd also cover > > > part of what we intend to achieve later by possibly extending the > > > feature to non-sticky directories, where we might also want to treat > > > different owner the same as group-writable (without the circumvention). > > > > Agreed. I'll extend it to also check for the directory owner. > > Maybe I could use another bit and make this additional restriction > > independent from the "group-writable" one. > > I'd prefer not to waste a bit on this. We'll need plenty of bits later > on, perhaps also in pairs or (with your suggestion to separate this one) > in triplets, and every extra bit complicates the calculation of the > bitmask value. OTOH, if we were to interleave the world and group bits, then maybe instead of having the world and group bits as separate we could have a two bit number, which naturally leaves us an extra value we'd use for owner. So: 0 - none 1 - world 2 - group and world 3 - owner, group, and world This would remove the ability to enable our restrictions for group (and owner) while disabling them for world, but that's likely just an obscure misconfiguration. Unfortunately, this is not compatible with my other idea of easily extending a tested policy from world onto group by simply assigning the same value to the other sysctl. I'm not sure which of these approaches is cleaner. Alexander ^ permalink raw reply [flat|nested] 60+ messages in thread
* [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous open in sticky directories 2017-12-11 16:07 ` Solar Designer @ 2017-12-12 18:01 ` Salvatore Mesoraca 0 siblings, 0 replies; 60+ messages in thread From: Salvatore Mesoraca @ 2017-12-12 18:01 UTC (permalink / raw) To: Solar Designer; +Cc: Kernel Hardening, Matthew Wilcox 2017-12-11 17:07 GMT+01:00 Solar Designer <solar@openwall.com>: > On Thu, Dec 07, 2017 at 11:23:34PM +0100, Solar Designer wrote: > > On Tue, Dec 05, 2017 at 11:13:50AM +0100, Salvatore Mesoraca wrote: > > > 2017-11-30 20:05 GMT+01:00 Solar Designer <solar@openwall.com>: > > > > A new thought: a directory that has someone else as its owner is for our > > > > purposes effectively the same as a group-writable directory. So maybe > > > > whatever we'll implement for group-writable directories should also be > > > > done for directories that have other than the current fsuid as their > > > > owner. It's currently very uncommon to have directories with the sticky > > > > bit set that are not at least group-writable, so this will rarely make a > > > > difference (and when it does, that's just right), but also it provides a > > > > way to explicitly include any dibrectory under this monitoring (if the > > > > group-writable protection is on) - e.g., if a sysadmin wants this > > > > monitoring for users' home directories, they can change permissions for > > > > those from e.g. 700 to 1700. This could be handy for development and > > > > auditing of software, even though in production it could be easily > > > > circumvented by the directory owner (who can remove the sticky bit, > > > > which we should document to avoid providing a false sense of security) > > > > and it won't automatically apply to subdirectories. It'd also cover > > > > part of what we intend to achieve later by possibly extending the > > > > feature to non-sticky directories, where we might also want to treat > > > > different owner the same as group-writable (without the circumvention). > > > > > > Agreed. I'll extend it to also check for the directory owner. > > > Maybe I could use another bit and make this additional restriction > > > independent from the "group-writable" one. > > > > I'd prefer not to waste a bit on this. We'll need plenty of bits later > > on, perhaps also in pairs or (with your suggestion to separate this one) > > in triplets, and every extra bit complicates the calculation of the > > bitmask value. > > OTOH, if we were to interleave the world and group bits, then maybe > instead of having the world and group bits as separate we could have a > two bit number, which naturally leaves us an extra value we'd use for > owner. So: > > 0 - none > 1 - world > 2 - group and world > 3 - owner, group, and world > > This would remove the ability to enable our restrictions for group (and > owner) while disabling them for world, but that's likely just an obscure > misconfiguration. > > Unfortunately, this is not compatible with my other idea of easily > extending a tested policy from world onto group by simply assigning the > same value to the other sysctl. > > I'm not sure which of these approaches is cleaner. I prefer the version with separate sysctls for world/group, it seems more usable. Salvatore ^ permalink raw reply [flat|nested] 60+ messages in thread
end of thread, other threads:[~2017-12-12 18:01 UTC | newest] Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-11-22 8:01 [PATCH v3 0/2] Restrict dangerous open in sticky directories Salvatore Mesoraca 2017-11-22 8:01 ` [kernel-hardening] " Salvatore Mesoraca 2017-11-22 8:01 ` [PATCH v3 1/2] Protected FIFOs and regular files Salvatore Mesoraca 2017-11-22 8:01 ` [kernel-hardening] " Salvatore Mesoraca 2017-11-23 22:43 ` Tobin C. Harding 2017-11-24 8:24 ` Salvatore Mesoraca 2017-11-22 8:01 ` [PATCH v3 2/2] Protected O_CREAT open in sticky directories Salvatore Mesoraca 2017-11-22 8:01 ` [kernel-hardening] " Salvatore Mesoraca 2017-11-22 12:03 ` Pavel Vasilyev 2017-11-24 8:28 ` Salvatore Mesoraca 2017-11-22 13:22 ` Matthew Wilcox 2017-11-22 13:22 ` [kernel-hardening] " Matthew Wilcox 2017-11-22 14:38 ` Pavel Vasilyev 2017-11-24 8:29 ` Salvatore Mesoraca 2017-11-24 8:29 ` [kernel-hardening] " Salvatore Mesoraca 2017-11-22 16:51 ` Alan Cox 2017-11-22 16:51 ` [kernel-hardening] " Alan Cox 2017-11-24 8:31 ` Salvatore Mesoraca 2017-11-24 8:31 ` [kernel-hardening] " Salvatore Mesoraca 2017-11-24 10:53 ` David Laight 2017-11-24 10:53 ` [kernel-hardening] " David Laight 2017-11-24 11:43 ` Salvatore Mesoraca 2017-11-24 11:43 ` [kernel-hardening] " Salvatore Mesoraca 2017-11-24 11:53 ` David Laight 2017-11-24 11:53 ` [kernel-hardening] " David Laight 2017-11-26 11:29 ` Salvatore Mesoraca 2017-11-26 11:29 ` [kernel-hardening] " Salvatore Mesoraca 2017-11-27 0:26 ` Solar Designer 2017-11-27 0:26 ` [kernel-hardening] " Solar Designer 2017-11-30 14:39 ` Salvatore Mesoraca 2017-11-30 14:39 ` [kernel-hardening] " Salvatore Mesoraca 2017-11-30 14:57 ` Ian Campbell 2017-11-30 16:30 ` [kernel-hardening] " Solar Designer 2017-12-05 10:21 ` Salvatore Mesoraca 2017-12-07 21:47 ` Solar Designer 2017-12-11 12:08 ` Salvatore Mesoraca 2017-11-23 22:57 ` Tobin C. Harding 2017-11-24 8:34 ` Salvatore Mesoraca 2017-11-30 16:53 ` David Laight 2017-11-30 16:53 ` [kernel-hardening] " David Laight 2017-11-30 17:51 ` Solar Designer 2017-11-30 17:51 ` [kernel-hardening] " Solar Designer 2017-12-01 9:46 ` David Laight 2017-12-01 9:46 ` [kernel-hardening] " David Laight 2017-12-01 15:52 ` Alan Cox 2017-12-01 15:52 ` [kernel-hardening] " Alan Cox 2017-11-27 1:14 ` [kernel-hardening] Re: [PATCH v3 0/2] Restrict dangerous " Solar Designer 2017-11-27 13:18 ` Solar Designer 2017-11-30 14:38 ` Salvatore Mesoraca 2017-11-30 14:37 ` Salvatore Mesoraca 2017-11-30 19:05 ` Solar Designer 2017-11-30 19:14 ` Solar Designer 2017-12-05 10:13 ` Salvatore Mesoraca 2017-12-07 22:23 ` Solar Designer 2017-12-08 12:17 ` Solar Designer 2017-12-11 12:07 ` Salvatore Mesoraca 2017-12-11 15:33 ` Solar Designer 2017-12-12 18:00 ` Salvatore Mesoraca 2017-12-11 16:07 ` Solar Designer 2017-12-12 18:01 ` Salvatore Mesoraca
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.